
Standing at a Fork in the Road of the AI Scale-Up Domain
zhuanlan.zhihu.com/p/707355769?utm_psn=1796087465674674176
Jul 23, 2024
3

AI fabric is a bus or a network?
zhuanlan.zhihu.com/p/708602042
Jul 23, 2024
2

The Five Tiger Generals of Disaggregated LLM Inference Architectures
zhuanlan.zhihu.com/p/706218732
Jul 22, 2024
4

Finding PMF for Token-Level Pipeline Parallelism: From TeraPipe, Seq1F1B, and HPipe to PipeFusion
zhuanlan.zhihu.com/p/706475158
Jul 22, 2024
1

Musings on the Software and Hardware Changes That Disaggregated LLM Inference May Bring
zhuanlan.zhihu.com/p/707199343
Jul 22, 2024
1

GB200 Hardware Architecture - Component Supply Chain & BOM
www.semianalysis.com/p/gb200-hardware-architecture-and-component
Jul 17, 2024
3

Mooncake (1): Making Mooncakes at Moonshot AI, Kimi's KVCache-Centric Disaggregated Inference Architecture
zhuanlan.zhihu.com/p/705754254
Jul 12, 2024
3

AI Inference: Observations from Frontier Technology to Commercial Practice (Community Edition) - Feishu Docs
miracleplus.feishu.cn/docx/Lqe1dgVTho0vEVxZqLZcFpmgnkb
Jul 10, 2024
1

From bare metal to a 70B model: infrastructure set-up and scripts
imbue.com/research/70b-infrastructure/
Jul 3, 2024
1

Asterfusion Releases the Xingzhi AI Network Solution for LLM Bearer Networks
asterfusion.com/a20240205-ai-llm-solution/
Jun 13, 2024
4

A Technical Survey of Cloud-Native Machine Learning Platforms (Orchestration and Scheduling) - Laiye Technology
laiye.com/news/post/2627.html
May 20, 2024
1

DeepSpeed/blogs/deepspeed-fastgen/2024-01-19 at master · microsoft/DeepSpeed
github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-fastgen/chinese
May 20, 2024
8

Nvidia Blackwell Perf TCO Analysis - B100 vs B200 vs GB200NVL72
www.semianalysis.com/p/nvidia-blackwell-perf-tco-analysis
May 11, 2024
2

Running Kubernetes Clusters on OpenStack in Production - Motianlun
www.modb.pro/db/47575
May 9, 2024
2

The Relationship Between OpenStack and K8s: Differences Between OpenStack and Kubernetes (K8s)
www.usa-idc.com/news/idc/2023071414.shtml
May 9, 2024
1

Intel Introduces Gaudi 3 AI Accelerator: Going Bigger and Aiming Higher In AI Market
www.anandtech.com/show/21342/intel-introduces-gaudi-3-accelerator-going-bigger-and-aiming-higher
Apr 10, 2024
1

NVIDIA GB200 Architecture Analysis: Interconnect Architecture and Future Evolution - EE Times China
www.eet-china.com/mp/a301182.html
Apr 8, 2024
5

Making Brute-Force Aesthetics Elegant: NVIDIA's Rack Scale
zhuanlan.zhihu.com/p/689424234
Apr 8, 2024
1

Analysis and Interpretation of NVIDIA's AI Chip Roadmap
wallstreetcn.com/articles/3712058
Apr 7, 2024
2

The GPT-4 "Alchemy" Guide: The Secrets of MoE, Parameter Count, Training Cost, and Inference
www.aixinzhijie.com/article/6825966
Apr 2, 2024
4

NVIDIA A100 Knowledge Sharing: GPU Board Set Value per Machine of 12,000
www.jaeaiot.com/news/detail/32.html
Feb 22, 2024
3

SemiAnalysis | Dylan Patel | Substack
www.semianalysis.com/p/groq-inference-tokenomics-speed-but?utm_source=post-email-title&publication_id=329241&post_id=141888751&utm_campaign=email-post-title&isFreemail=true&r=b0aiz&utm_medium=email
Feb 22, 2024
1