Mark Erdmann's Highlights on 'François Chollet on X: "@mahaoo_ASI @wintermoat SOTA did not go from 35% to 50%. The 50% is on the evaluation set, the 35% is on the private test set. The solution that does ~35% on the private test set also did ~50% on the evaluation set, so 50% on the eval set is not clearly a new SOTA (it might be, but it isn't clear)" / X'

Patterns for Building LLM-based Systems & Products

eugeneyan.com/writing/llm-patterns/

Jun 19, 2024

Arvind Narayanan on X: "Tired: train/test leakage. Wired: benchmark contamination. Inspired: resample until answer is correct." / X

x.com/random_walker/status/1803392358093857127

Jun 19, 2024

Alex Cheema - e/acc on X: "Llama 3 running locally on iPhone with MLX Built by @exolabs_ team @mo_baioumy h/t @awnihannun MLX & @Prince_Canuma for the port https://t.co/4swkM7mOfI" / X

x.com/ac_crypto/status/1781061013716037741

Jun 19, 2024

2406.11741v1.pdf

arxiv.org/pdf/2406.11741

Jun 19, 2024

Context caching | Google AI for Developers | Google for Developers

ai.google.dev/gemini-api/docs/caching?lang=python

Jun 19, 2024

x.com/johnathanbi/status/1803096216299090267?s=12

Jun 19, 2024

Pass@k or Pass@1? · Issue #1 · trotsky1997/MathBlackBox

github.com/trotsky1997/MathBlackBox/issues/1

Jun 18, 2024

quickwit-oss/tantivy: Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust

github.com/quickwit-oss/tantivy

Jun 18, 2024

Olympiad Solutions - Search / X

x.com/search?q=Olympiad%20Solutions&src=typed_query

Jun 18, 2024

The 100 Rep Squat Challenge

kettlebellaerobics.substack.com/p/the-100-rep-squat-challenge

Jun 18, 2024

Applied LLMs - What We’ve Learned From A Year of Building with LLMs

applied-llms.org/

Jun 18, 2024

Terry Yue Zhuo on X: "In the past few months, we’ve seen SOTA LLMs saturating basic coding benchmarks with short and simplified coding tasks. It's time to enter the next stage of coding challenge under comprehensive and realistic scenarios! -- Here comes BigCodeBench, benchmarking LLMs on solving… https://t.co/w3Z6N5wnVk" / X

x.com/terryyuezhuo/status/1803076834520945117

Jun 18, 2024

François Chollet on X: "I believe that program synthesis will solve reasoning. And I believe that deep learning will solve program synthesis (by guiding a discrete program search process). But I don't think you can go all that far with just prompting a LLM to generate end-to-end Python programs (even…" / X

x.com/fchollet/status/1803096195684012371

Jun 18, 2024

Caiming Xiong on X: "🎆I am pleased to announce the release of the latest version of the Salesforce Embedding Model (SFR-embedding-v2), which has reclaimed the top-1 position on the MTEB benchmark. ✨ Key Highlights: 🥇 Achieved the distinction of being the second model to surpass a 70+ performance… https://t.co/ucs4gXfp1v" / X

x.com/CaimingXiong/status/1802879572385714496

Jun 18, 2024

Debunking the Chessboard: Confronting GPTs Against Chess Engines to Estimate Elo Ratings and Assess Legal Move Abilities

blog.mathieuacher.com/GPTsChessEloRatingLegalMoves/

Jun 18, 2024

Beyond the Basics of Retrieval for Augmenting Generation – Parlance

parlance-labs.com/education/rag/ben.html

Jun 18, 2024

TaskMeAnything

www.task-me-anything.org/

Jun 18, 2024

John David Pressman on X: "My problem with "transformers don't generalize algebraic structures and therefore don't reason" is that while I agree this is a real limitation there are important aspects of reason which these models in fact do and other methods don't. We may need to divide "reason" up." / X

x.com/jd_pressman/status/1802835378451185733

Jun 18, 2024

François Chollet on X: "@RyanPGreenblatt @TomDAAVID @AndrewTBurks @dwarkesh_sp This isn't reasoning, it's intuition. Intuition is a fast, perception-like, inexact, approximate way of navigating a complex space. Your LLM has "intuition" over the space of program, which can be used to fight combinatorial complexity and make discrete program search more…" / X

x.com/fchollet/status/1802790666420035646

Jun 18, 2024

François Chollet on X: "@mahaoo_ASI @wintermoat SOTA did not go from 35% to 50%. The 50% is on the evaluation set, the 35% is on the private test set. The solution that does ~35% on the private test set also did ~50% on the evaluation set, so 50% on the eval set is not clearly a new SOTA (it might be, but it isn't clear)" / X

x.com/fchollet/status/1802807579489468846

Jun 18, 2024

Warp on X: "Type plain English on the command line. Accomplish any dev task. This is the command line for the AI era. New Agent Mode is available today. https://t.co/ptqib32w8o" / X

x.com/warpdotdev/status/1802736163507118387

Jun 18, 2024

Gergely Orosz on X: "After having read it, I can say this is probably the best book to explain how ChatGPT (and LLMs) work (written by Stephen Wolfram, who excels at explaining complex topics simply, so perhaps not a surprise) There's also a blog post form for those wanting to read online." / X

x.com/GergelyOrosz/status/1802798002081251497

Jun 18, 2024