Mark Erdmann's Highlights on 'Damien Teney on X: "Lots of confusion here: 🤔 ● There was already no doubt that GPT-2 could implement generalizing arithmetic. ● It's also not an 'optimization' problem. It's not about finding a smaller value of the training loss. The limitation comes from the fact... ⬇️" / X' | Glasp