Mark Erdmann

@8ybco2inhg7qeql5

Joined Jun 18, 2024

0 Following · 3 Followers

111 pages · 163 highlights · 367 views
[Reading activity heatmap, Mar–Jun · streak: 31 days / 11 weeks 📚]
Tags: foundermode (1)

Highlights

Patrick Collison on X: "Was chatting with a well-known founder yesterday about the "founder mode" discussion. We were both wondering if people would misinterpret it, and undervalue the importance of hiring great leaders. Steve Jobs, the canonical example of "founder mode", was also gifted at" / X

x.com/patrickc/status/1835434966072836483

foundermode

Sep 18, 2024

1

CLS on X: "Automating AI research is exciting! But can LLMs actually produce novel, expert-level research ideas? After a year-long study, we obtained the first statistically significant conclusion: LLM-generated ideas are more novel than ideas written by expert human researchers. https://t.co/vAU65mBikQ" / X

x.com/ChengleiSi/status/1833166031134806330

Sep 10, 2024

1

Rohan Paul on X: "Simply adding "Repeat the question before answering it." somehow make the models answer the trick question correctly. 🤔 Probable explanations:✨ 📌 Repeating the question in the model's context, significantly increasing the likelihood of the model detecting any potential https://t.co/kGxwHeyBVp" / X

x.com/rohanpaul_ai/status/1830230678673223737

Sep 3, 2024

2

Victor M on X: "This is probably the most beautiful photorealistic LoRA ever trained ☀️ https://t.co/wwHHKckky6 https://t.co/RBWOBVLGhA" / X

x.com/victormustar/status/1828895056738374098

Aug 29, 2024

1

Yohei on X: "Proposing a new benchmark for autonomous agents… Start with one code base, $1000 in a digital wallet, and one email address. Score: how much money it can make with no human intervention - just press go. Let’s call it… the HustleAGI benchmark." / X

x.com/yoheinakajima/status/1828498398313652531

Aug 27, 2024

1

elvis on X: "This Python tool looks super useful to crawl websites and convert data into LLM-ready markdown or structured data. I find myself doing this a lot and most of the time it is a tedious effort. Great to see a service that does data extraction catered for LLM-based pipelines. https://t.co/yaiKttOSU2" / X

x.com/omarsar0/status/1828470077798183403

Aug 27, 2024

1

Dylan Freedman on X: "Microsoft's new open source Phi 3.5 vision model is really good at OCR/text extraction — even on handwriting! You can prompt it to extract tabular data as well. It's permissively licensed (MIT). Play around with it here: https://t.co/5onmYAwNu7 https://t.co/hjYieofnKw" / X

x.com/dylfreed/status/1828132226523131931

Aug 27, 2024

1

Patrick Collison on X: "After trying many different strategies over the years, recent yoyo-ing across several continents has convinced me that the best jet lag strategy is simply to limit adjustment to 1–2 hours/day, especially when traveling east. Means a lot of reading and email at odd hours, but" / X

x.com/patrickc/status/1828286610334720355

Aug 27, 2024

6

Ron Mokady on X: "My short analysis of the (technical) difference between Flux and SD3: 1. The most significant architecture change IMO is that RoPE (Rotary Position Embedding) is injected before each attention layer [1/N] https://t.co/n8x83tcOJ6" / X

x.com/MokadyRon/status/1821533077396390063

Aug 8, 2024

1

Nat Friedman on X: "I successfully used galantamine to induce lucid dreaming after a couple of tries. Other people might enjoy: https://t.co/zMzPw6Wj3g" / X

x.com/natfriedman/status/1821632980915466698

Aug 8, 2024

1

Rohan Paul on X: "MASSIVE achievement by @GoogleDeepMind. Just released Gemma-2 2B surpasses all GPT-3.5 models on Chatbot Arena. 🤯 Take that a 2B param model surpasses GPT-3.5 (175B+ param ) - almost can't believe it. Totally they released three new additions to the Gemma 2 family:They 📌 https://t.co/jzN1LGyblZ" / X

x.com/rohanpaul_ai/status/1818697538360295897

Jul 31, 2024

1

Damien Teney on X: "Lots of confusion here: 🤔 ● There was already no doubt that GPT-2 could implement generalizing arithmetic. ● It's also not an 'optimization' problem. It's not about finding a smaller value of the training loss. The limitation comes from the fact... ⬇️" / X

x.com/DamienTeney/status/1817501437078679796

Jul 30, 2024

1

Allen Downey on X: "On Reddit's statistics forum, the most common question is "What test should I use?" My answer, from 2011, is "There is only one test" https://t.co/J5Ar4olekz https://t.co/qaHhZjMt8C" / X

x.com/AllenDowney/status/1817908776072028475

Jul 30, 2024

1

from:emollick gender - Search / X

x.com/search?q=from%3Aemollick%20gender&src=typed_query&f=top

Jul 17, 2024

1

Michael Antonelli on X: "Wild list. How about @miamiuniversity https://t.co/RrDAi4bMsC" / X

x.com/BullandBaird/status/1813193905887666277

Jul 16, 2024

1

Kangwook Lee on X: "🧵Let me explain why the early ascent phenomenon occurs🔥 We must first understand that in-context learning exhibits two distinct modes. When given samples from a novel task, the model actually learns the pattern from the examples. We call this mode the "task learning" mode. https://t.co/AirPHIjAVp" / X

x.com/Kangwook_Lee/status/1767603595619246530

Jul 16, 2024

1

mutable.ai

mutable.ai/

Jul 16, 2024

1

Matt Holden on X: "I want a MIDI controller for the 9 Enneagram types of my LLM Claude tries way too hard to help, gotta dial that 2 energy down Let's crank up the 4/7 for this visual design, then dial up 1 to write the unit tests, and then use some 3/5 (with a bit of 6) for the strategy doc" / X

x.com/holdenmatt/status/1813249808657969505

Jul 16, 2024

1

Gradio on X: "Florence-2 to generate image captions and AuraFlow to generate the image! Yields stunning results. Try out the fun app by @nonda30 at : https://t.co/FrpS5HH1r1 https://t.co/R9sXtYCh7s" / X

x.com/Gradio/status/1812811631522361527

Jul 16, 2024

1

Andrej Karpathy on X: "We will see that a lot of weird behaviors and problems of LLMs actually trace back to tokenization. We'll go through a number of these issues, discuss why tokenization is at fault, and why someone out there ideally finds a way to delete this stage entirely. https://t.co/5haV7FvbBx" / X

x.com/karpathy/status/1759996551378940395

Jul 16, 2024

2

Rohan Paul on X: "✨ Intriguing paper - Synthetic data proves to be nearly as effective as real data and shows no clear saturation when scaled up to approximately one million samples. 🗞️ "Common 7B Language Models Already Possess Strong Math Capabilities" 📌 To overcome this limitation of the https://t.co/qP7doGiEH3" / X

x.com/rohanpaul_ai/status/1812125722091143344

Jul 15, 2024

2

Ethan Mollick on X: "It begins. This is another sign that LLMs are going to be able to work with structured & unstructured spreadsheet data soon. This will unlock a lot of use cases (projections, financials, valuations, etc.) and having a spreadsheet source of truth will tend to lower hallucinations https://t.co/ovYalYips5" / X

x.com/emollick/status/1812684733538541694

Jul 15, 2024

1

Posts / X

x.com/i/timeline

Jul 15, 2024

4

Soami Kapadia on X: "Mixture of Agents on Groq Introducing a fully configurable, Mixture-of-Agents framework powered by @GroqInc using @LangChainAI You can configure your own MoA version using the @streamlit UI through the framework. details + links below👇🧵 https://t.co/nItnqbPtgi" / X

x.com/KapadiaSoami/status/1811657156082712605

Jul 13, 2024

3

Ted Werbel on X: "Few things pretty obvious to a few AI researchers but that most don't want to believe: 1. 90% of the most impactful AI research is already on arxiv, x, or company blog posts 2. q* aka strawberry = STaR (self-taught reasoners) with dynamic self-discover + something like DSPy for" / X

x.com/tedx_ai/status/1811945091696853431

Jul 13, 2024

1

ben on X: "super interesting report on @openai's revenue they make 5x more from ChatGPT than they make from every single product that is built on top of OpenAI, in the entire world, combined https://t.co/JinXW4GHZY https://t.co/HTY4c9PqVe" / X

x.com/benhylak/status/1811448374349943263

Jul 12, 2024

1

Patrick Hsu on X: "Collaborate with people who have a strong sense of aesthetic This is surprisingly underappreciated" / X

x.com/pdhsu/status/1811416100393083002

Jul 12, 2024

1

Yifei Hu on X: "Releasing TF-ID: Table/Figure Identifier for academic papers. SoTA performance: 98%+ sucess rate for perfect table/figure detection 📈 mit license: free for any use cases ✅ 2 sizes: 0.23B and 0.77B 📏 2 variants: with or without caption text 🛠️ Finetuned on Florence 2 with https://t.co/kEoLCVFaRs" / X

x.com/hu_yifei/status/1811187540009042417

Jul 12, 2024

1

xjdr on X: "Claude Power Move (CPM): Come up with an abstract idea "Help me create a step-by-step plan to <do x> and in order to accomplish <y goal>". Send that to sonnet 3.5 for the reasoning engine. Take the sonnet 3.5 output and feed it into opus with a "Please elaborate and improve" / X

x.com/_xjdr/status/1811470145426194602

Jul 12, 2024

1

Tanay Jaipuria on X: "Software IPOs by Year 😲 via @avenirgrowth https://t.co/j1UAeWomCQ" / X

x.com/tanayj/status/1811517130963083408

Jul 12, 2024

1

Ethan Mollick on X: "An experiment shows temperature has different effects on test-taking for men vs. women. For verbal tests, women beat men when it is over 70° F (maxing at 90°). In math, men & women do the same when its 80°. They suggest setting office thermostats higher! https://t.co/JJnSXs4VV4 https://t.co/LcxbIpdcqR" / X

x.com/emollick/status/1360763996484366336

Jul 11, 2024

1

Morgan McGuire (Hiring 👋) on X: "RIP RAG “I think long context is definitely the future rather than RAG” On domain specialisation: “If you want a model for medical domain, legal domain…it (finetuning) definitely makes sense…finetuning can also be an alternative to RAG” Great episode, had to listen 0.75x 😂" / X

x.com/morgymcg/status/1810973158331072630

Jul 10, 2024

1

Ethan Mollick on X: "We know good management is causal in part because, a decade ago, teams of consultants introduced basic management practices to some Indian plants & left others as a control. The practices boosted performance then. A followup shows about half the effects persist 10 years later! https://t.co/x4NROBczIJ" / X

x.com/emollick/status/1808922231302422840

Jul 4, 2024

1

Lingming Zhang on X: "Introducing OpenAutoCoder-Agentless😺: A simple agentless solution solves 27.3% GitHub issues on SWE-bench Lite with ~$0.34 each, outperforming all open-source AI SW agents! It's fully open-source, try it out: 🧑‍💻https://t.co/AKyiZhmi7B 📝https://t.co/Oc4QCaQult https://t.co/gQDfCrLzs3" / X

x.com/LingmingZhang/status/1808501612056629569

Jul 4, 2024

1

Rohan Paul on X: "Quite an wild idea in this paper - Proposes a persona-driven data synthesis methodology using Persona Hub, a collection of 1 billion diverse personas, to create scalable and diverse synthetic data for LLM training and evaluation. 📌 Persona Hub contains 1bn+ personas derived https://t.co/r3pjDYa49u" / X

x.com/rohanpaul_ai/status/1808096574997770590

Jul 3, 2024

1

Pat Walls on X: "Free business idea for anyone that can code: Build a tiny saas around a SINGLE Zapier integration. Hear me out... So Zapier has 6,000 integrations. 6,000! 1. Find an integration that's (1) popular and (2) limited in functionality 2. Make it 10x better, cover edge cases, etc. https://t.co/TppPkhRDNf" / X

x.com/thepatwalls/status/1808150786804707755

Jul 3, 2024

1

AGI will drastically increase economies of scale — LessWrong

www.lesswrong.com/posts/Sn5NiiD5WBi4dLzaB/agi-will-drastically-increase-economies-of-scale

Jul 3, 2024

1

Aidan McLau on X: "livebench (https://t.co/3fKC4vaoTE) is my new favorite eval: > contamination proof (new questions monthly) >tests model iq (unlike arena nowadays) >matches my intuition on relative perf quite well thanks @jpohhhh for the pointer https://t.co/fDXfG51wJe" / X

x.com/aidan_mclau/status/1807875944088326271

Jul 2, 2024

1

Rohan Paul on X: "Brilliant new paper, HUGE for LLM's internalized knowledge 🔥 Out Of Context Learning > In Context Learning | Fine-tuning can teach new concepts better than ICL 📌 Finds a surprising capability of LLMs through a process called inductive out-of-context reasoning (OOCR). In the https://t.co/Ys5LUgLNKp" / X

x.com/rohanpaul_ai/status/1807774433550950816

Jul 1, 2024

1

elvis on X: "This is one of the coolest ideas for scaling synthetic data that I've come across. Proposes 1 billion diverse personas to facilitate the creation of diverse synthetic data for different scenarios. It's easy to generate synthetic data but hard to scale up its diversity which is https://t.co/UR998d49hE" / X

x.com/omarsar0/status/1807827401122238628

Jul 1, 2024

1

Jeff Morris Jr. on X: "“How to ship fast as a small company looking for product-market fit” — by @varunsrin @farcaster_xyz is one of the fastest engineering teams I’ve ever seen… Here is how they operate: https://t.co/AARMozy2zy" / X

x.com/jmj/status/1806125131024183452

Jun 28, 2024

1

Steve Stewart-Williams on X: "The Big 5 personality traits strongly predict life satisfaction (r = .8 - one of the largest effects I’ve seen in a psychology paper). https://t.co/K2OaLCSD0L https://t.co/TlZiLN3ibe" / X

x.com/SteveStuWill/status/1806087660139946432

Jun 28, 2024

1

Greg Kamradt on X: "How do SOTA LLMs do on ARC Prize? We wanted to see how gpt-4o, claude sonnet, and gemini did on public tasks So we made a baseline template with @LangChainAI that tests them all Scores: * Claude Sonnet: 21% * gpt-4o: 9% * gemini 1.5: 8% https://t.co/6wXW8E3vOE" / X

x.com/GregKamradt/status/1806373849333653975

Jun 28, 2024

1

Ethan Mollick on X: "Two big lessons in the new OpenAI paper on training AI to detect AI bugs, 1) Cyborgs rule: AI detected more bugs than humans alone, but humans & AI working together had lower hallucination rates… 2)…for now: human error rates were also high. And read the highlighted conclusion. https://t.co/FbPTQVNPeI" / X

x.com/emollick/status/1806401500194672742

Jun 28, 2024

1

BioBootloader on X: "1/ Thrilled to announce that our team has created the most advanced coding AI in the world, smashing the previous State-of-the-Art by solving 38.33% of SWE-bench Lite! MentatBot is not only the most accurate, but runs extremely quickly and is available for you to use today! https://t.co/FNJ7nPKaNi" / X

x.com/bio_bootloader/status/1806342922893394290

Jun 27, 2024

1

Finding GPT-4’s mistakes with GPT-4

openai.com/index/finding-gpt4s-mistakes-with-gpt-4/

Jun 27, 2024

1

Greg Kamradt on X: "Last week @RyanPGreenblatt shared his gpt-4o based attempt on ARC-AGI We verified his score, excited to say his method got 42% on public tasks We’re publishing a secondary leaderboard to measure attempts like these So of course we tested gpt-4, claude sonnet, and gemini https://t.co/lyfIKNOioL" / X

x.com/GregKamradt/status/1806372523170533457

Jun 27, 2024

1

MultiOn on X: "Introducing Retrieve API: the best-in-class autonomous web information retrieval API. Developers love our Agent API ❤️. Since its launch, we have consistently received feedback that many use cases rely on intelligently leveraging the Agent API to retrieve information from the https://t.co/upOn8TflUj" / X

x.com/MultiOn_AI/status/1806007797030834521

Jun 27, 2024

1

University to Replace Students With ChatGPT After It Outperforms Them in Exams / X

x.com/i/trending/1806333778765271252

Jun 27, 2024

1

Ethan Mollick on X: "This isn't reliable enough yet, but it is a sign of what is coming: Claude 3.5 here's excel of my startup's finances, make a dashboard Add sensitivity analysis of key assumptions Run it as a Monte Carlo simulation Assuming a normal distribution, what are outcomes? All first try https://t.co/JBnudGihja" / X

x.com/emollick/status/1806321738734600337

Jun 27, 2024

1

Marzena Karpinska on X: "Can #LLMs truly reason over loooong context? 🤔 NoCha asks LLMs to verify claims about *NEW* fictional books 🪄 📚 ⛔ LLMs that solve needle-in-the-haystack (~100%) struggle on NoCha! ⛔ None of 11 tested LLMs reach human performance → 97%. The best, #GPT-4o, gets only 55.8%. https://t.co/beuo7q9KIj" / X

x.com/mar_kar_/status/1805660949023793224

Jun 26, 2024

1

Gizem Akdag on X: "Here is a Midjourney Style Reference that I think you'll like: --sref 3721090848. Save this code for calm, soft vibes. This one works really well with architecture, city, and interior design shots. Also, with a 35 mm film look, it gives a vintage feel. This time, I used Krea https://t.co/hKQhkCx6MI" / X

x.com/gizakdag/status/1785257036151656918

Jun 26, 2024

1

Tuhin Chakrabarty on X: "New paper with students @BarnardCollege on testing orthogonal thinking / abstract reasoning capabilities of Large Language Models using the fascinating yet frustratingly difficult @nytimes Connections game. #NLProc #LLMs #GPT4o #Claude3opus 🧵(1/n) https://t.co/jDfCbpPi2Z" / X

x.com/TuhinChakr/status/1805999559585227002

Jun 26, 2024

2

François Fleuret on X: "This is an argument often used (by me included), but I find it slightly unsatisfactory. Consider natural selection as a process that given *tons of training data* produce a 100k x 100k configuration of the game of life (that's roughly the information in human dna). 1/3" / X

x.com/francoisfleuret/status/1806037891757293626

Jun 26, 2024

1

clem 🤗 on X: "Pumped to announce the brand new open LLM leaderboard. We burned 300 H100 to re-run new evaluations like MMLU-pro for all major open LLMs! Some learning: - Qwen 72B is the king and Chinese open models are dominating overall - Previous evaluations have become too easy for recent" / X

x.com/ClementDelangue/status/1805989925080219927

Jun 26, 2024

1

Anna Mills, annamillsoer.bsky.social, she/her on X: "Can we tell when a student submission is AI? This study from the University of Reading suggests not. "The university’s markers – who were not told about the project – flagged only one of the 33 entries." https://t.co/dTuSCTJnHF" / X

x.com/AnnaRMills/status/1806033830027182423

Jun 26, 2024

1

Ethan Mollick on X: "Researchers secretly added AI-created the papers to the exam pool: “We found that 94% of our AI submissions were undetected. The grades awarded to our AI submissions were on average half a grade boundary higher than that achieved by real students.“ https://t.co/z8IX14133B. https://t.co/JDmET3q7pw" / X

x.com/emollick/status/1806040241104470228

Jun 26, 2024

1

https://arxiv.org/abs/2406.13121 - Search / X

x.com/search?q=https%3A%2F%2Farxiv.org%2Fabs%2F2406.13121&src=typed_query&f=top

Jun 26, 2024

1

Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

arxiv.org/abs/2406.13121

Jun 26, 2024

1

Posts liked by mark erdmann (@markerdmann) / X

x.com/markerdmann/likes

Jun 25, 2024

2

Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs

arxiv.org/abs/2406.11695

Jun 25, 2024

1

Dan Hendrycks on X: "Nat's right so I think I'm going to make 2-3 more benchmarks to replace MMLU and MATH." / X

x.com/DanHendrycks/status/1804929811703591345

Jun 24, 2024

1

Eugene Yan (SF 22 - 28 June) on X: "i previously spoke to a team who only used embedding-based retrieval. i suggested, insisted, they try lexical search. at our next chat, they shared that 80% of the relevant docs now come from lexical search. i.e., without lexical search they were missing 80% of the juice for RAG. https://t.co/2N92Xygw1G" / X

x.com/eugeneyan/status/1804270554033328359

Jun 21, 2024

1

Detecting hallucinations in large language models using semantic entropy - Nature

www.nature.com/articles/s41586-024-07421-0

Jun 21, 2024

1

Andrej Karpathy on X: "The way to think about asking a factual question to an LLM is that it's a bit like asking a person who read about the topic previously, but they are not allowed to reference any material and have to answer just from memory. LLMs are a lot better at memorizing than humans, but the" / X

x.com/karpathy/status/1804208334033371213

Jun 21, 2024

1

How to use Claude’s artifacts

medium.com/@simeon.emanuilov/how-to-use-claudes-artifacts-908835dbd96a

Jun 21, 2024

1

Keyon Vafa on X: "New paper: How can you tell if a transformer has the right world model? We trained a transformer to predict directions for NYC taxi rides. The model was good. It could find shortest paths between new points But had it built a map of NYC? We reconstructed its map and found this: https://t.co/5z6sglnRIQ" / X

x.com/keyonV/status/1803838591371555252

Jun 20, 2024

2

2406.04692v1.pdf

arxiv.org/pdf/2406.04692

Jun 20, 2024

1

Aqua Voice - Voice-only Document Editor

withaqua.com/?ref=upstract.com

Jun 20, 2024

1

Rob Wiblin on X: ""The results were otherworldly. Claude is fully capable of acting as a Supreme Court Justice right now. When used as a law clerk, Claude is easily as insightful and accurate as human clerks, while towering over humans in efficiency." https://t.co/tfdYtHSqnT https://t.co/83t85g5Wtp" / X

x.com/robertwiblin/status/1803388400084381787

Jun 20, 2024

1

NeurIPS-2021-attention-approximates-sparse-distributed-memory-Paper.pdf

proceedings.neurips.cc/paper_files/paper/2021/file/8171ac2c5544a5cb54ac0f38bf477af4-Paper.pdf

Jun 19, 2024

1

Lienid on X: "@mikeknoop been clear for a while that transformers have nailed associative memory. it even maps to a biologically plausible mechanism. frankly i’m not sure how people haven’t come to this conclusion yet https://t.co/remVwHBlYd" / X

x.com/0xLienid/status/1803530958207066114

Jun 19, 2024

1

Mike Knoop on X: "If superintelligence is human-level skill acquisition (AGI) plus narrow super-human characteristics, like memorization or inference speed, this is plausibly within reach. The former still requires new 0 to 1 ideas (see ARC Prize) but the latter already exists." / X

x.com/mikeknoop/status/1803528066616246478

Jun 19, 2024

1

Rohan Paul on X: "Transformer models can learn robust reasoning skills (beyond those of GPT-4-Turbo and Gemini-1.5-Pro) through a stage of training dynamics that continues far beyond the point of overfitting (i.e. with 'Grokking') 🤯 For a challenging reasoning task with a large search space,… https://t.co/Tl9bND5PHq" / X

x.com/rohanpaul_ai/status/1803478727067603055

Jun 19, 2024

1

Gary Basin 🍍 on X: "Why deep learning is ngmi in one graph https://t.co/lZwvEnXy8H" / X

x.com/garybasin/status/1802465723215737112

Jun 19, 2024

1

davidad 🎇 on X: "When @GaryMarcus and others (including myself) say that LLMs do not “reason,” we mean something quite specific, but it’s hard to put one’s finger on it, until now. Specifically, Transformers do not generalize algebraic structures out of distribution." / X

x.com/davidad/status/1802576341470216362

Jun 19, 2024

1

abhav on X: "Something weird is afoot. quick story involving: - open source "reasoning" SOTA LLM (only 7B params, and from china!) - big math doing small math - a $1M opportunity well, almost $1m and that is really tough. anyway, strap in 🍿🧵" / X

x.com/abhav_k/status/1802572167617626399

Jun 19, 2024

1

Hesam on X: "Reasoning with LLM is Hard! ​ Large Language Models need help with generalized reasoning capabilities, and a key factor is how we prompt them. ​ 📌 Traditional prompting methods such as Chain-of-Thought (CoT) or Tree-of-Thought (ToT) often require multiple assumptions or numerous… https://t.co/XtRVXNSydX" / X

x.com/itsHesamSheikh/status/1801934604334477355

Jun 19, 2024

1

AlphaMath Almost Zero: Process Supervision without Process

arxiv.org/abs/2405.03553

Jun 19, 2024

1

Aran Komatsuzaki on X: "@jeremyphoward Btw there are many other recent papers with LLM + MCTS for reasoning with successful results. Here are some interesting ones: - https://t.co/TpE92UMx2C - https://t.co/1Kh8rVyTat" / X

x.com/arankomatsuzaki/status/1803482585378726379

Jun 19, 2024

1

Alessio Fanelli on X: "How AI is eating Finance 📈 @vagabondjack is back on @latentspacepod! He shared all the AI Engineering wisdom he acquired while turning LLMs into AI thought partners @brightwaveio for customers with >$120B under management 💰 - Why he lost faith in long context windows - 3 https://t.co/AKJ82amHDC" / X

x.com/FanaHOVA/status/1800553625607155856

Jun 19, 2024

1

Xing Han Lu on X: "Announcing ⚡BM25S, a fast lexical retrieval library! 🏎️ Up to 500x faster than the most popular Python lib, matches @Elastic search (BM25 default) 🤗 First BM25 library that is directly integrated with @huggingface hub: load or save in 1 line! GitHub: https://t.co/iuQleXIGgX https://t.co/trNv0QbUao" / X

x.com/xhluca/status/1803100958408241597

Jun 19, 2024

1

Naomi Saphra on X: "Modern generative models are trained to imitate human experts, but can they actually beat those experts? Our new paper uses imitative chess agents to explore when a model can "transcend" its training distribution and outperform every human it's trained on. https://t.co/oKsIh5nVBk https://t.co/rA3TzmIXm7" / X

x.com/nsaphra/status/1803114822445465824

Jun 19, 2024

1

Patterns for Building LLM-based Systems & Products

eugeneyan.com/writing/llm-patterns/

Jun 19, 2024

1

Arvind Narayanan on X: "Tired: train/test leakage. Wired: benchmark contamination. Inspired: resample until answer is correct." / X

x.com/random_walker/status/1803392358093857127

Jun 19, 2024

1

Alex Cheema - e/acc on X: "Llama 3 running locally on iPhone with MLX Built by @exolabs_ team @mo_baioumy h/t @awnihannun MLX & @Prince_Canuma for the port https://t.co/4swkM7mOfI" / X

x.com/ac_crypto/status/1781061013716037741

Jun 19, 2024

1

2406.11741v1.pdf

arxiv.org/pdf/2406.11741

Jun 19, 2024

1

Context caching  |  Google AI for Developers  |  Google for Developers

ai.google.dev/gemini-api/docs/caching?lang=python

Jun 19, 2024

1

x.com/johnathanbi/status/1803096216299090267?s=12

Jun 19, 2024

Pass@k or Pass@1? · Issue #1 · trotsky1997/MathBlackBox

github.com/trotsky1997/MathBlackBox/issues/1

Jun 18, 2024

1

quickwit-oss/tantivy: Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust

github.com/quickwit-oss/tantivy

Jun 18, 2024

1

Olympiad Solutions - Search / X

x.com/search?q=Olympiad%20Solutions&src=typed_query

Jun 18, 2024

1

The 100 Rep Squat Challenge

kettlebellaerobics.substack.com/p/the-100-rep-squat-challenge

Jun 18, 2024

1

Applied LLMs - What We’ve Learned From A Year of Building with LLMs

applied-llms.org/

Jun 18, 2024

1

Terry Yue Zhuo on X: "In the past few months, we’ve seen SOTA LLMs saturating basic coding benchmarks with short and simplified coding tasks. It's time to enter the next stage of coding challenge under comprehensive and realistic scenarios! -- Here comes BigCodeBench, benchmarking LLMs on solving… https://t.co/w3Z6N5wnVk" / X

x.com/terryyuezhuo/status/1803076834520945117

Jun 18, 2024

1

Patrick Collison on X: "Was chatting with a well-known founder yesterday about the "founder mode" discussion. We were both wondering if people would misinterpret it, and undervalue the importance of hiring great leaders. Steve Jobs, the canonical example of "founder mode", was also gifted at" / X

URL: https://x.com/patrickc/status/1835434966072836483
Tag: foundermode

Highlights & Notes

Was chatting with a well-known founder yesterday about the "founder mode" discussion.

We were both wondering if people would misinterpret it, and undervalue the importance of hiring great leaders. Steve Jobs, the canonical example of "founder mode", was also gifted at identifying stellar leaders, without whom no great organization gets built. (And we're lucky to have many at Stripe.)

To the extent that there's an ostensible tension here (founder-mode micromanagement vs the classic view that one should focus on enablement), this founder pointed out that the lens of domain-specific judgment helps reconcile the dichotomy.

1. You need to have excellent judgment in your problem area.
2. You need to recognize the importance of good judgment as a phenomenon.
3. You need to demand it in others.

He argued that many companies and founders fail at (2) and (3). That is, individuals can be effective "people managers", or have strong resumes, or whatever, but just not be deep enough in their domains to be right on the substantive merits of questions within their purview (and unable to recursively detect/insist on that correctness in others, or to elevate and prize it when they see people who do