Kazuki
@kazuki
Cofounder of Glasp. I collect ideas and stories worth sharing
San Francisco, CA
Joined Oct 9, 2020
1068 Following
5613 Followers
metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
Mar 20, 2025
82
map.simonsarris.com/p/reading-well
Mar 16, 2025
74
collabfund.com/blog/pure-independence/
Mar 11, 2025
187
map.simonsarris.com/p/the-most-precious-resource-is-agency
Mar 8, 2025
83
usefulfictions.substack.com/p/how-to-be-more-agentic
Mar 4, 2025
93
x.com/karpathy/status/1894099637218545984/
Mar 3, 2025
41
www.youtube.com/watch?v=MhVZTzMy-BA
Mar 3, 2025
204
www.jasonfeifer.com/how-failure-builds-trust/
Mar 3, 2025
72
www.theverge.com/press-room/617654/internet-community-future-research
Feb 27, 2025
72
read.glasp.co/p/why-im-building-glasp
Feb 26, 2025
125
multitudes.weisser.io/p/the-dam-has-burst
Feb 25, 2025
31
investing101.substack.com/p/barking-in-public
Feb 24, 2025
73
investing101.substack.com/p/the-wrath-of-reading-and-writing
Feb 24, 2025
93
investing101.substack.com/p/on-writing
Feb 24, 2025
51
investing101.substack.com/p/2024-in-books
Feb 24, 2025
22
www.implications.com/p/outmaneuvering-friction-stages-of
Feb 21, 2025
94
www.jasonfeifer.com/sharing-something-personal-purposefully/
Feb 21, 2025
81
glasp.co/posts/e387a4bb-4be1-4cbd-ba2b-cfbe5d25920c
Feb 18, 2025
41
andysblog.uk/why-blog-if-nobody-reads-it/
Feb 18, 2025
62
fs.blog/richard-feynman-what-problems-to-solve/
Feb 13, 2025
31
putsomethingback.stevejobsarchive.com/internal-meeting-at-apple
Feb 13, 2025
2
putsomethingback.stevejobsarchive.com/
Feb 13, 2025
1
andrewchen.substack.com/p/the-growth-maze-vs-the-idea-maze
Feb 11, 2025
134
www.simplypsychology.org/what-is-the-yerkes-dodson-law.html
Feb 11, 2025
5
blog.samaltman.com/three-observations
Feb 10, 2025
133
openai.com/index/introducing-deep-research/
Feb 6, 2025
72
worldaftercapital.gitbook.io/worldaftercapital/part-three/power
Feb 6, 2025
121
www.jasonfeifer.com/how-to-be-a-powerful-communicator/
Feb 5, 2025
112
www.jasonfeifer.com/how-to-solve-your-big-problems-by-finding-your-real-problem/
Feb 5, 2025
7
www.jasonfeifer.com/how-to-recover-after-you-screw-up/
Feb 5, 2025
3
www.sarahtavel.com/p/james-raybould-on-being-ai-forward
Feb 3, 2025
43
blogs.cisco.com/security/evaluating-security-risk-in-deepseek-and-other-frontier-reasoning-models
Feb 1, 2025
82
apnews.com/article/ai-copyright-office-artificial-intelligence-363f1c537eb86b624bf5e81bed70d459
Jan 31, 2025
31
openai.com/index/introducing-operator/
Jan 25, 2025
9
x.com/waitbutwhy/status/1873499122075828270
Jan 24, 2025
1
www.linkedin.com/feed/update/urn:li:activity:7284736784595406848/
Jan 24, 2025
41
www.sarahtavel.com/p/pushing-yourself-beyond-the-google
Jan 23, 2025
3
x.com/StartupArchive_/status/1882110879795384589
Jan 22, 2025
31
blakebutler.substack.com/p/maximizing-time-for-reading
Jan 21, 2025
175
blog.glasp.co/glasp-selected-for-the-2025-gsv-cup-50/
Jan 15, 2025
41
x.com/StartupArchive_/status/1876974655602618798
Jan 15, 2025
2
multitudes.weisser.io/p/founders-and-momentum
Jan 13, 2025
95
www.psychologytoday.com/us/blog/everyday-resilience/202501/do-whats-best-for-yourself-this-year
Jan 13, 2025
125
textswithfounders.substack.com/p/obstacles-for-young-founders-and
Jan 13, 2025
41
x.com/StartupArchive_/status/1872688898947830219
Jan 10, 2025
42
www.noahpinion.blog/p/learn-smart-lessons-from-the-la-fires
Jan 10, 2025
81
mikegreenfield.substack.com/p/unhealthy-incentives
Jan 9, 2025
4
www.digitalnative.tech/p/25-predictions-for-2025
Jan 8, 2025
61
blog.samaltman.com/reflections
Jan 7, 2025
5
www.technologyreview.com/2025/01/06/1108679/ai-generative-search-internet-breakthroughs/
Jan 7, 2025
92
The length of tasks (measured by how long they take human professionals) that generalist frontier model agents can complete autonomously with 50% reliability has been doubling approximately every 7 months for the last 6 years.
AI performance has increased rapidly on many benchmarks across a variety of domains. However, translating this increase in performance into predictions of the real-world usefulness of AI can be challenging.
We find that measuring the length of tasks that models can complete is a helpful lens for understanding current AI capabilities.[1] This makes sense: AI agents often seem to struggle with stringing together longer sequences of actions more than they lack skills or knowledge needed to solve single steps.
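To make the chaining intuition concrete: if we assume, purely for illustration, that each step of a task succeeds independently with the same probability, the chance of finishing an n-step task is that probability raised to the n-th power. This independence model is a simplifying assumption, not METR's method, and the 0.99 step reliability below is a hypothetical value.

```python
def chain_success(step_reliability: float, num_steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes (for illustration only) that steps succeed independently,
    each with the same probability `step_reliability`.
    """
    if not 0.0 <= step_reliability <= 1.0:
        raise ValueError("step_reliability must be in [0, 1]")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return step_reliability ** num_steps


if __name__ == "__main__":
    # Hypothetical 99%-reliable steps: short chains usually succeed,
    # long chains rarely do, even though every single step is "easy".
    for n in (10, 100, 500):
        print(n, round(chain_success(0.99, n), 3))
```

Even a highly reliable single step compounds into a low end-to-end success rate once the chain is long enough, which is one way to read "struggle with stringing together longer sequences of actions".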
The time taken by human experts is strongly predictive of model success on a given task: current models have almost 100% success rate on tasks taking humans less than 4 minutes, but succeed less than 10% of the time on tasks taking more than around 4 hours.
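One simple way to model this link between human task time and model success is a logistic curve in the logarithm of task length, where the task length at which success falls to 50% defines a "time horizon". The sketch below assumes that functional form; the function name, the 30-minute horizon, and the slope of 1.0 are hypothetical values chosen for illustration, not fitted results.

```python
import math


def success_probability(task_minutes: float, horizon_minutes: float, slope: float) -> float:
    """Logistic success model in log2(task length).

    p = 1 / (1 + exp(slope * (log2(t) - log2(h))))
    At t == h the probability is exactly 0.5, so h is the 50% time horizon.
    A positive slope makes longer tasks less likely to succeed.
    All parameters here are illustrative assumptions.
    """
    if task_minutes <= 0 or horizon_minutes <= 0:
        raise ValueError("durations must be positive")
    x = slope * (math.log2(task_minutes) - math.log2(horizon_minutes))
    return 1.0 / (1.0 + math.exp(x))


if __name__ == "__main__":
    # Hypothetical 30-minute horizon and slope 1.0.
    for minutes in (4, 30, 240):
        print(minutes, round(success_probability(minutes, 30.0, 1.0), 3))
```

Working in log task length means each doubling of task length moves the curve by the same amount, so a model can look near-perfect on short tasks and near-useless on long ones without any single parameter changing.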
The length of tasks models can complete is well predicted by an exponential trend, with a doubling time of around 7 months.
If the measured trend from the past 6 years continues for 2-4 more years, generalist autonomous agents will be capable of performing a wide range of week-long tasks.
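The doubling claim is simple exponential arithmetic: with a doubling time of d months, a horizon h grows to h · 2^(m/d) after m months, and reaching a target T takes d · log2(T/h) months. This sketch assumes the 7-month doubling time from the text, a hypothetical 1-hour starting horizon, and treats a "week-long" task as a 40-hour work week; those last two numbers are assumptions for illustration, not figures from the source.

```python
import math

DOUBLING_MONTHS = 7.0  # doubling time stated in the text


def projected_horizon(current_hours: float, months_ahead: float,
                      doubling_months: float = DOUBLING_MONTHS) -> float:
    """Horizon after `months_ahead`, assuming the exponential trend continues."""
    return current_hours * 2 ** (months_ahead / doubling_months)


def months_to_reach(current_hours: float, target_hours: float,
                    doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months until the horizon reaches `target_hours` under the same trend."""
    if current_hours <= 0 or target_hours <= 0:
        raise ValueError("horizons must be positive")
    return doubling_months * math.log2(target_hours / current_hours)


if __name__ == "__main__":
    # Growth factor over the "2-4 more years" window from the text.
    for years in (2, 4):
        print(years, round(projected_horizon(1.0, 12 * years), 1))
    # Hypothetical: 1-hour horizon today, 40-hour "week-long" target.
    print(round(months_to_reach(1.0, 40.0), 1))
```

Under those assumptions, 2 to 4 more years multiplies the horizon by roughly 11x to 116x, and a 40-hour target is reached after a little over 3 years, which is how a 7-month doubling time turns into week-long tasks within the stated window.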