Positional encoding

Multi-Head Attention & RoPE & GQA & KV cache

Informal notes

flash attention

DeepSeek

Understanding GPipe & 1F1B & VP

Data parallelism (DP)

RLHF

LoRA

Adam

paged attention

prefix caching

vLLM dynamic batching