Mark Erdmann's Highlights on 'Damien Teney on X: "Lots of confusion here: 🤔 ● There was already no doubt that GPT-2 could implement generalizing arithmetic. ● It's also not an 'optimization' problem. It's not about finding a smaller value of the training loss. The limitation comes from the fact... ⬇️" / X'

Patrick Collison on X: "Was chatting with a well-known founder yesterday about the "founder mode" discussion. We were both wondering if people would misinterpret it, and undervalue the importance of hiring great leaders. Steve Jobs, the canonical example of "founder mode", was also gifted at" / X

x.com/patrickc/status/1835434966072836483

foundermode

Sep 18, 2024

CLS on X: "Automating AI research is exciting! But can LLMs actually produce novel, expert-level research ideas? After a year-long study, we obtained the first statistically significant conclusion: LLM-generated ideas are more novel than ideas written by expert human researchers. https://t.co/vAU65mBikQ" / X

x.com/ChengleiSi/status/1833166031134806330

Sep 10, 2024

Rohan Paul on X: "Simply adding "Repeat the question before answering it." somehow make the models answer the trick question correctly. 🤔 Probable explanations:✨ 📌 Repeating the question in the model's context, significantly increasing the likelihood of the model detecting any potential https://t.co/kGxwHeyBVp" / X

x.com/rohanpaul_ai/status/1830230678673223737

Sep 3, 2024

Victor M on X: "This is probably the most beautiful photorealistic LoRA ever trained ☀️ https://t.co/wwHHKckky6 https://t.co/RBWOBVLGhA" / X

x.com/victormustar/status/1828895056738374098

Aug 29, 2024

Yohei on X: "Proposing a new benchmark for autonomous agents… Start with one code base, $1000 in a digital wallet, and one email address. Score: how much money it can make with no human intervention - just press go. Let’s call it… the HustleAGI benchmark." / X

x.com/yoheinakajima/status/1828498398313652531

Aug 27, 2024

elvis on X: "This Python tool looks super useful to crawl websites and convert data into LLM-ready markdown or structured data. I find myself doing this a lot and most of the time it is a tedious effort. Great to see a service that does data extraction catered for LLM-based pipelines. https://t.co/yaiKttOSU2" / X

x.com/omarsar0/status/1828470077798183403

Aug 27, 2024

Dylan Freedman on X: "Microsoft's new open source Phi 3.5 vision model is really good at OCR/text extraction — even on handwriting! You can prompt it to extract tabular data as well. It's permissively licensed (MIT). Play around with it here: https://t.co/5onmYAwNu7 https://t.co/hjYieofnKw" / X

x.com/dylfreed/status/1828132226523131931

Aug 27, 2024

Patrick Collison on X: "After trying many different strategies over the years, recent yoyo-ing across several continents has convinced me that the best jet lag strategy is simply to limit adjustment to 1–2 hours/day, especially when traveling east. Means a lot of reading and email at odd hours, but" / X

x.com/patrickc/status/1828286610334720355

Aug 27, 2024

Ron Mokady on X: "My short analysis of the (technical) difference between Flux and SD3: 1. The most significant architecture change IMO is that RoPE (Rotary Position Embedding) is injected before each attention layer [1/N] https://t.co/n8x83tcOJ6" / X

x.com/MokadyRon/status/1821533077396390063

Aug 8, 2024

Nat Friedman on X: "I successfully used galantamine to induce lucid dreaming after a couple of tries. Other people might enjoy: https://t.co/zMzPw6Wj3g" / X

x.com/natfriedman/status/1821632980915466698

Aug 8, 2024

Rohan Paul on X: "MASSIVE achievement by @GoogleDeepMind. Just released Gemma-2 2B surpasses all GPT-3.5 models on Chatbot Arena. 🤯 Take that a 2B param model surpasses GPT-3.5 (175B+ param ) - almost can't believe it. Totally they released three new additions to the Gemma 2 family:They 📌 https://t.co/jzN1LGyblZ" / X

x.com/rohanpaul_ai/status/1818697538360295897

Jul 31, 2024

Damien Teney on X: "Lots of confusion here: 🤔 ● There was already no doubt that GPT-2 could implement generalizing arithmetic. ● It's also not an 'optimization' problem. It's not about finding a smaller value of the training loss. The limitation comes from the fact... ⬇️" / X

x.com/DamienTeney/status/1817501437078679796

Jul 30, 2024

Allen Downey on X: "On Reddit's statistics forum, the most common question is "What test should I use?" My answer, from 2011, is "There is only one test" https://t.co/J5Ar4olekz https://t.co/qaHhZjMt8C" / X

x.com/AllenDowney/status/1817908776072028475

Jul 30, 2024

from:emollick gender - Search / X

x.com/search?q=from%3Aemollick%20gender&src=typed_query&f=top

Jul 17, 2024

Michael Antonelli on X: "Wild list. How about @miamiuniversity https://t.co/RrDAi4bMsC" / X

x.com/BullandBaird/status/1813193905887666277

Jul 16, 2024

Kangwook Lee on X: "🧵Let me explain why the early ascent phenomenon occurs🔥 We must first understand that in-context learning exhibits two distinct modes. When given samples from a novel task, the model actually learns the pattern from the examples. We call this mode the "task learning" mode. https://t.co/AirPHIjAVp" / X

x.com/Kangwook_Lee/status/1767603595619246530

Jul 16, 2024

mutable.ai

mutable.ai/

Jul 16, 2024

Matt Holden on X: "I want a MIDI controller for the 9 Enneagram types of my LLM Claude tries way too hard to help, gotta dial that 2 energy down Let's crank up the 4/7 for this visual design, then dial up 1 to write the unit tests, and then use some 3/5 (with a bit of 6) for the strategy doc" / X

x.com/holdenmatt/status/1813249808657969505

Jul 16, 2024

Gradio on X: "Florence-2 to generate image captions and AuraFlow to generate the image! Yields stunning results. Try out the fun app by @nonda30 at : https://t.co/FrpS5HH1r1 https://t.co/R9sXtYCh7s" / X

x.com/Gradio/status/1812811631522361527

Jul 16, 2024

Andrej Karpathy on X: "We will see that a lot of weird behaviors and problems of LLMs actually trace back to tokenization. We'll go through a number of these issues, discuss why tokenization is at fault, and why someone out there ideally finds a way to delete this stage entirely. https://t.co/5haV7FvbBx" / X

x.com/karpathy/status/1759996551378940395

Jul 16, 2024

Rohan Paul on X: "✨ Intriguing paper - Synthetic data proves to be nearly as effective as real data and shows no clear saturation when scaled up to approximately one million samples. 🗞️ "Common 7B Language Models Already Possess Strong Math Capabilities" 📌 To overcome this limitation of the https://t.co/qP7doGiEH3" / X

x.com/rohanpaul_ai/status/1812125722091143344

Jul 15, 2024

Ethan Mollick on X: "It begins. This is another sign that LLMs are going to be able to work with structured & unstructured spreadsheet data soon. This will unlock a lot of use cases (projections, financials, valuations, etc.) and having a spreadsheet source of truth will tend to lower hallucinations https://t.co/ovYalYips5" / X

x.com/emollick/status/1812684733538541694

Jul 15, 2024

Soami Kapadia on X: "Mixture of Agents on Groq Introducing a fully configurable, Mixture-of-Agents framework powered by @GroqInc using @LangChainAI You can configure your own MoA version using the @streamlit UI through the framework. details + links below👇🧵 https://t.co/nItnqbPtgi" / X

x.com/KapadiaSoami/status/1811657156082712605

Jul 13, 2024

Ted Werbel on X: "Few things pretty obvious to a few AI researchers but that most don't want to believe: 1. 90% of the most impactful AI research is already on arxiv, x, or company blog posts 2. q* aka strawberry = STaR (self-taught reasoners) with dynamic self-discover + something like DSPy for" / X

x.com/tedx_ai/status/1811945091696853431

Jul 13, 2024

ben on X: "super interesting report on @openai's revenue they make 5x more from ChatGPT than they make from every single product that is built on top of OpenAI, in the entire world, combined https://t.co/JinXW4GHZY https://t.co/HTY4c9PqVe" / X

x.com/benhylak/status/1811448374349943263

Jul 12, 2024

Patrick Hsu on X: "Collaborate with people who have a strong sense of aesthetic This is surprisingly underappreciated" / X

x.com/pdhsu/status/1811416100393083002

Jul 12, 2024

Yifei Hu on X: "Releasing TF-ID: Table/Figure Identifier for academic papers. SoTA performance: 98%+ sucess rate for perfect table/figure detection 📈 mit license: free for any use cases ✅ 2 sizes: 0.23B and 0.77B 📏 2 variants: with or without caption text 🛠️ Finetuned on Florence 2 with https://t.co/kEoLCVFaRs" / X

x.com/hu_yifei/status/1811187540009042417

Jul 12, 2024

xjdr on X: "Claude Power Move (CPM): Come up with an abstract idea "Help me create a step-by-step plan to <do x> and in order to accomplish <y goal>". Send that to sonnet 3.5 for the reasoning engine. Take the sonnet 3.5 output and feed it into opus with a "Please elaborate and improve" / X

x.com/_xjdr/status/1811470145426194602

Jul 12, 2024

Tanay Jaipuria on X: "Software IPOs by Year 😲 via @avenirgrowth https://t.co/j1UAeWomCQ" / X

x.com/tanayj/status/1811517130963083408

Jul 12, 2024

Ethan Mollick on X: "An experiment shows temperature has different effects on test-taking for men vs. women. For verbal tests, women beat men when it is over 70° F (maxing at 90°). In math, men & women do the same when its 80°. They suggest setting office thermostats higher! https://t.co/JJnSXs4VV4 https://t.co/LcxbIpdcqR" / X

x.com/emollick/status/1360763996484366336

Jul 11, 2024

Morgan McGuire (Hiring 👋) on X: "RIP RAG “I think long context is definitely the future rather than RAG” On domain specialisation: “If you want a model for medical domain, legal domain…it (finetuning) definitely makes sense…finetuning can also be an alternative to RAG” Great episode, had to listen 0.75x 😂" / X

x.com/morgymcg/status/1810973158331072630

Jul 10, 2024

Ethan Mollick on X: "We know good management is causal in part because, a decade ago, teams of consultants introduced basic management practices to some Indian plants & left others as a control. The practices boosted performance then. A followup shows about half the effects persist 10 years later! https://t.co/x4NROBczIJ" / X

x.com/emollick/status/1808922231302422840

Jul 4, 2024

Lingming Zhang on X: "Introducing OpenAutoCoder-Agentless😺: A simple agentless solution solves 27.3% GitHub issues on SWE-bench Lite with ~$0.34 each, outperforming all open-source AI SW agents! It's fully open-source, try it out: 🧑‍💻https://t.co/AKyiZhmi7B 📝https://t.co/Oc4QCaQult https://t.co/gQDfCrLzs3" / X

x.com/LingmingZhang/status/1808501612056629569

Jul 4, 2024

Rohan Paul on X: "Quite an wild idea in this paper - Proposes a persona-driven data synthesis methodology using Persona Hub, a collection of 1 billion diverse personas, to create scalable and diverse synthetic data for LLM training and evaluation. 📌 Persona Hub contains 1bn+ personas derived https://t.co/r3pjDYa49u" / X

x.com/rohanpaul_ai/status/1808096574997770590

Jul 3, 2024

Pat Walls on X: "Free business idea for anyone that can code: Build a tiny saas around a SINGLE Zapier integration. Hear me out... So Zapier has 6,000 integrations. 6,000! 1. Find an integration that's (1) popular and (2) limited in functionality 2. Make it 10x better, cover edge cases, etc. https://t.co/TppPkhRDNf" / X

x.com/thepatwalls/status/1808150786804707755

Jul 3, 2024

AGI will drastically increase economies of scale — LessWrong

www.lesswrong.com/posts/Sn5NiiD5WBi4dLzaB/agi-will-drastically-increase-economies-of-scale

Jul 3, 2024

Aidan McLau on X: "livebench (https://t.co/3fKC4vaoTE) is my new favorite eval: > contamination proof (new questions monthly) >tests model iq (unlike arena nowadays) >matches my intuition on relative perf quite well thanks @jpohhhh for the pointer https://t.co/fDXfG51wJe" / X

x.com/aidan_mclau/status/1807875944088326271

Jul 2, 2024

Rohan Paul on X: "Brilliant new paper, HUGE for LLM's internalized knowledge 🔥 Out Of Context Learning > In Context Learning | Fine-tuning can teach new concepts better than ICL 📌 Finds a surprising capability of LLMs through a process called inductive out-of-context reasoning (OOCR). In the https://t.co/Ys5LUgLNKp" / X

x.com/rohanpaul_ai/status/1807774433550950816

Jul 1, 2024

elvis on X: "This is one of the coolest ideas for scaling synthetic data that I've come across. Proposes 1 billion diverse personas to facilitate the creation of diverse synthetic data for different scenarios. It's easy to generate synthetic data but hard to scale up its diversity which is https://t.co/UR998d49hE" / X

x.com/omarsar0/status/1807827401122238628

Jul 1, 2024

Jeff Morris Jr. on X: "“How to ship fast as a small company looking for product-market fit” — by @varunsrin @farcaster_xyz is one of the fastest engineering teams I’ve ever seen… Here is how they operate: https://t.co/AARMozy2zy" / X

x.com/jmj/status/1806125131024183452

Jun 28, 2024

Steve Stewart-Williams on X: "The Big 5 personality traits strongly predict life satisfaction (r = .8 - one of the largest effects I’ve seen in a psychology paper). https://t.co/K2OaLCSD0L https://t.co/TlZiLN3ibe" / X

x.com/SteveStuWill/status/1806087660139946432

Jun 28, 2024