Mark Erdmann's Highlights on 'Aidan McLau on X: "livebench (https://t.co/3fKC4vaoTE) is my new favorite eval: > contamination proof (new questions monthly) > tests model iq (unlike arena nowadays) > matches my intuition on relative perf quite well thanks @jpohhhh for the pointer https://t.co/fDXfG51wJe" / X' | Glasp