2410.14815v2.pdf
arxiv.org/pdf/2410.14815v2
Aug 20, 2025
15
2208.03306v1.pdf
arxiv.org/pdf/2208.03306
Jul 31, 2025
34
Enhancing Multilingual LLM Pretraining with Model-Based Data Selection - 2502.10361v1.pdf
arxiv.org/pdf/2502.10361
Jul 29, 2025
35
2504.11336v2.pdf
arxiv.org/pdf/2504.11336
Jul 29, 2025
11
Training a Generally Curious Agent - 2502.17543v3.pdf
arxiv.org/pdf/2502.17543
Jul 29, 2025
10
2409.05816v2.pdf
arxiv.org/pdf/2409.05816
Jul 25, 2025
30
The Era of Experience Paper.pdf
storage.googleapis.com/deepmind-media/Era-of-Experience%20/The%20Era%20of%20Experience%20Paper.pdf
Jul 25, 2025
25
2310.10638v6.pdf
arxiv.org/pdf/2310.10638
Jul 24, 2025
31
2304.09151v1.pdf
arxiv.org/pdf/2304.09151
Jul 24, 2025
231
2503.13423v2.pdf
arxiv.org/pdf/2503.13423
Jul 24, 2025
29
2410.20796v1.pdf
arxiv.org/pdf/2410.20796
Jul 24, 2025
282
2406.13361v1.pdf
arxiv.org/pdf/2406.13361
Jul 17, 2025
16
2312.06134v1.pdf
arxiv.org/pdf/2312.06134
Jul 17, 2025
251
2408.14960v1.pdf
arxiv.org/pdf/2408.14960
Jul 15, 2025
361
The Product-Minded Software Engineer
blog.pragmaticengineer.com/the-product-minded-engineer/
Jul 11, 2025
6
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language - 2506.20920v1.pdf
arxiv.org/pdf/2506.20920
Jul 10, 2025
39
2504.15431v1.pdf
arxiv.org/pdf/2504.15431
Jul 10, 2025
413
MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining - 2507.01785v1.pdf
arxiv.org/pdf/2507.01785
Jul 9, 2025
162
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
arxiv.org/html/2305.18290v3
Jun 20, 2025
28
Impossible Distillation for Paraphrasing and Summarization: How to Make High-quality Lemonade out of Small, Low-quality Models
arxiv.org/html/2305.16635v4
Jun 19, 2025
22
A Closer Look at AUROC and AUPRC under Class Imbalance - 2401.06091v3.pdf
arxiv.org/pdf/2401.06091
Dec 4, 2024
6
2306.11644v2.pdf
arxiv.org/pdf/2306.11644
Nov 5, 2024
22
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only - 2306.01116v1.pdf
arxiv.org/pdf/2306.01116
Nov 5, 2024
40
Capacity Management - Fall Quarter 2022 - Online LaTeX Editor Overleaf
www.overleaf.com/project/63195c317e9f10d7572b1fb2
Feb 12, 2024
1
A Big Data Approach to Public Speaking
www.gsb.stanford.edu/insights/big-data-approach-public-speaking
Jun 29, 2023
15
How to Read Academic Content Once and Remember it Forever
betterhumans.pub/how-to-read-academic-content-once-and-remember-it-forever-e44f26d82566
Nov 22, 2022