NVIDIA-Nemotron-3-Nano-Technical-Report.pdf
research.nvidia.com/labs/nemotron/files/NVIDIA-Nemotron-3-Nano-Technical-Report.pdf
Dec 15, 2025
13
2407.01492v2.pdf
arxiv.org/pdf/2407.01492
Nov 5, 2025
291
All About Rooflines | How To Scale Your Model
jax-ml.github.io/scaling-book/roofline/
Oct 31, 2025
10
How To Scale Your Model
jax-ml.github.io/scaling-book/
Oct 31, 2025
2
The Data-Quality Illusion: Rethinking Classifier-Based Quality Filtering for LLM Pretraining
arxiv.org/pdf/2510.00866
Oct 22, 2025
143
MobileLLM-R1: Exploring the Limits of Sub-Billion Language Model Reasoners with Open Training Recipes
arxiv.org/pdf/2509.24945
Oct 6, 2025
9
Inside vLLM: Anatomy of a High-Throughput LLM Inference System - Aleksa Gordić
www.aleksagordic.com/blog/vllm
Sep 17, 2025
1
Qwen
qwen.ai/blog?id=4074cca80393150c248e508aa62983f9cb7d27cd&from=research.latest-advancements-list
Sep 15, 2025
12
SARVAM - TRANSLATE
www.sarvam.ai/blogs/sarvam-translate
Aug 28, 2025
22
Sarvam M
www.sarvam.ai/blogs/sarvam-m
Aug 26, 2025
31
2410.14815v2.pdf
arxiv.org/pdf/2410.14815v2
Aug 20, 2025
15
2208.03306v1.pdf
arxiv.org/pdf/2208.03306
Jul 31, 2025
34
Enhancing Multilingual LLM Pretraining with Model-Based Data Selection
arxiv.org/pdf/2502.10361
Jul 29, 2025
35
2504.11336v2.pdf
arxiv.org/pdf/2504.11336
Jul 29, 2025
11
Training a Generally Curious Agent
arxiv.org/pdf/2502.17543
Jul 29, 2025
10
2409.05816v2.pdf
arxiv.org/pdf/2409.05816
Jul 25, 2025
30
The Era of Experience Paper.pdf
storage.googleapis.com/deepmind-media/Era-of-Experience%20/The%20Era%20of%20Experience%20Paper.pdf
Jul 25, 2025
25
2310.10638v6.pdf
arxiv.org/pdf/2310.10638
Jul 24, 2025
31
2304.09151v1.pdf
arxiv.org/pdf/2304.09151
Jul 24, 2025
231
2503.13423v2.pdf
arxiv.org/pdf/2503.13423
Jul 24, 2025
29
2410.20796v1.pdf
arxiv.org/pdf/2410.20796
Jul 24, 2025
282
2406.13361v1.pdf
arxiv.org/pdf/2406.13361
Jul 17, 2025
16
2312.06134v1.pdf
arxiv.org/pdf/2312.06134
Jul 17, 2025
251
2408.14960v1.pdf
arxiv.org/pdf/2408.14960
Jul 15, 2025
361
The Product-Minded Software Engineer
blog.pragmaticengineer.com/the-product-minded-engineer/
Jul 11, 2025
6
FineWeb2: One Pipeline to Scale Them All - Adapting Pre-Training Data Processing to Every Language
arxiv.org/pdf/2506.20920
Jul 10, 2025
39
2504.15431v1.pdf
arxiv.org/pdf/2504.15431
Jul 10, 2025
413
MuRating: A High Quality Data Selecting Approach to Multilingual Large Language Model Pretraining
arxiv.org/pdf/2507.01785
Jul 9, 2025
162
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
arxiv.org/html/2305.18290v3
Jun 20, 2025
28
Impossible Distillation for Paraphrasing and Summarization: How to Make High-quality Lemonade out of Small, Low-quality Models
arxiv.org/html/2305.16635v4
Jun 19, 2025
22