Prefix Caching

Automatic Prefix Caching (APC) lets the vLLM engine reuse the KV (key-value) cache computed for earlier prompts when a new request shares the same prefix. The shared part of the prompt is not recomputed, which cuts redundant prefill computation and speeds up inference.
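To make the reuse concrete, here is a minimal toy sketch of block-level prefix matching. It is not vLLM's implementation: the names `ToyPrefixCache`, `block_hashes`, and `BLOCK_SIZE` are hypothetical, and the assumption is simply that the prompt is split into fixed-size token blocks, each identified by a hash that chains in the hash of the preceding block, so a block only matches when the entire prefix before it matches too.

```python
import hashlib

BLOCK_SIZE = 4  # tokens per KV block (illustrative value, chosen small for the demo)


def block_hashes(token_ids, block_size=BLOCK_SIZE):
    """Return one chained hash per full block of tokens.

    Each hash covers the block's tokens plus the previous block's hash,
    so equal hashes imply an equal prefix up to and including that block.
    """
    hashes = []
    parent = b""
    n_full = len(token_ids) // block_size  # a trailing partial block is never cached
    for i in range(n_full):
        block = token_ids[i * block_size:(i + 1) * block_size]
        payload = parent + b"|" + ",".join(map(str, block)).encode()
        parent = hashlib.sha256(payload).digest()
        hashes.append(parent)
    return hashes


class ToyPrefixCache:
    """Tracks which prefix blocks have already been computed (hypothetical helper)."""

    def __init__(self):
        self.cached = set()

    def process(self, token_ids):
        """Return (tokens_reused, tokens_computed) for one prompt."""
        hashes = block_hashes(token_ids)
        hit_blocks = 0
        for h in hashes:
            if h not in self.cached:
                break  # the first miss ends the reusable prefix
            hit_blocks += 1
        self.cached.update(hashes)
        reused = hit_blocks * BLOCK_SIZE
        return reused, len(token_ids) - reused


if __name__ == "__main__":
    cache = ToyPrefixCache()
    system_prompt = list(range(8))                  # shared 8-token prefix
    print(cache.process(system_prompt + [50, 51]))  # first request: nothing cached yet
    print(cache.process(system_prompt + [60, 61, 62, 63]))  # second request reuses the prefix
```

In this sketch the second request reuses the two blocks that cover the shared prefix and only computes its own suffix. In vLLM itself the feature is controlled by the `enable_prefix_caching` engine argument.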

Extending Context Length

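rope_scaling: the YaRN method (Yet another RoPE extensioN), which combines frequency-domain interpolation of the RoPE frequencies with a scaling of the attention logits before the softmax.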
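The two YaRN ingredients can be sketched numerically as follows. This is a standalone illustration, not code from vLLM or any library: the function names are hypothetical, and the defaults (`alpha=1`, `beta=32`, and the `0.1 * ln(s) + 1` factor) are taken as assumptions from the values the YaRN paper suggests for LLaMA-style models. High-frequency dimensions (many rotations within the original context) are left unchanged, low-frequency dimensions are interpolated by the scale factor, and a linear ramp blends the two in between.

```python
import math


def yarn_inv_freq(dim, base=10000.0, scale=4.0, orig_ctx=4096,
                  alpha=1.0, beta=32.0):
    """Return YaRN-adjusted RoPE inverse frequencies for a head of size `dim`."""
    inv_freqs = []
    for i in range(0, dim, 2):
        theta = base ** (-i / dim)         # original RoPE inverse frequency
        wavelength = 2 * math.pi / theta   # tokens per full rotation
        rotations = orig_ctx / wavelength  # rotations within the original context
        # Ramp: 0 -> fully interpolate (divide by scale), 1 -> keep unchanged.
        gamma = min(1.0, max(0.0, (rotations - alpha) / (beta - alpha)))
        inv_freqs.append((1 - gamma) * theta / scale + gamma * theta)
    return inv_freqs


def yarn_attention_factor(scale):
    """Factor applied to queries and keys, which rescales logits before softmax."""
    if scale <= 1.0:
        return 1.0
    return 0.1 * math.log(scale) + 1.0


if __name__ == "__main__":
    freqs = yarn_inv_freq(dim=128, scale=4.0, orig_ctx=4096)
    print("highest-frequency dim (unchanged):", freqs[0])
    print("lowest-frequency dim (interpolated):", freqs[-1])
    print("attention factor for scale 4:", yarn_attention_factor(4.0))
```

Because the factor multiplies both queries and keys, the logits end up scaled by its square, which corresponds to the pre-softmax scaling mentioned above.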

References

vLLM documentation
PagedAttention paper
