#AI

Prefix caching

Automatic Prefix Caching (APC) lets the vLLM engine reuse the cached KV (key-value) tensors of earlier prompts when a new request shares the same prefix. The shared part skips prefill computation, which removes redundant work and shortens the time to first token.
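The reuse idea can be sketched with a toy block-hash cache. This is a simplified illustration, not vLLM's implementation: `ToyPrefixCache`, `BLOCK_SIZE`, and `lookup_and_store` are hypothetical names, and a real engine stores KV tensors per block rather than just remembering keys.

```python
from typing import List, Set

BLOCK_SIZE = 4  # hypothetical number of tokens per cached block


class ToyPrefixCache:
    """Toy model of prefix caching: remembers which token blocks were seen."""

    def __init__(self) -> None:
        self._seen: Set[int] = set()

    def _block_keys(self, tokens: List[int]) -> List[int]:
        # Each key depends on the block's tokens AND on the previous key,
        # so a block only matches when the whole prefix before it matches.
        keys: List[int] = []
        parent = 0
        full = len(tokens) - len(tokens) % BLOCK_SIZE  # ignore the partial tail
        for start in range(0, full, BLOCK_SIZE):
            block = tuple(tokens[start:start + BLOCK_SIZE])
            parent = hash((parent, block))
            keys.append(parent)
        return keys

    def lookup_and_store(self, tokens: List[int]) -> int:
        """Return how many leading tokens could reuse cached KV, then cache the prompt."""
        keys = self._block_keys(tokens)
        hit_blocks = 0
        for key in keys:
            if key not in self._seen:
                break
            hit_blocks += 1
        self._seen.update(keys)
        return hit_blocks * BLOCK_SIZE


cache = ToyPrefixCache()
system_prompt = list(range(10))                          # shared 10-token prefix
first = cache.lookup_and_store(system_prompt + [100, 101])
second = cache.lookup_and_store(system_prompt + [200, 201, 202])
print(first, second)
```

Only whole blocks are reused in this sketch, so a 10-token shared prefix yields 8 reusable tokens with a block size of 4. In vLLM itself the feature is switched on with the `enable_prefix_caching` engine argument.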

Extending context length

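rope_scaling: YaRN (Yet another RoPE extensioN) combines frequency-domain interpolation of the RoPE frequencies with a scaling of the attention logits before the softmax.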
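The two ingredients can be sketched numerically as follows. This is a minimal sketch that follows the formulas in the YaRN paper as I understand them; the function names and the defaults (`base=10000`, `scale=4`, `orig_ctx=4096`, `alpha=1`, `beta=32`) are illustrative assumptions, not values read from any particular model config.

```python
import math
from typing import List


def yarn_inv_freq(dim: int, base: float = 10000.0, scale: float = 4.0,
                  orig_ctx: int = 4096, alpha: float = 1.0,
                  beta: float = 32.0) -> List[float]:
    """Per-dimension RoPE frequencies after YaRN-style interpolation."""
    out: List[float] = []
    for i in range(0, dim, 2):
        freq = base ** (-i / dim)                     # original RoPE frequency
        # Number of full rotations this dimension makes over the original context.
        rotations = orig_ctx * freq / (2 * math.pi)
        # Ramp: 0 -> interpolate fully (low frequency), 1 -> keep as is (high frequency).
        gamma = min(1.0, max(0.0, (rotations - alpha) / (beta - alpha)))
        out.append((1.0 - gamma) * freq / scale + gamma * freq)
    return out


def yarn_attention_factor(scale: float) -> float:
    """Factor applied to both queries and keys, so the logits are scaled before the softmax."""
    return 0.1 * math.log(scale) + 1.0 if scale > 1.0 else 1.0


freqs = yarn_inv_freq(dim=128)
print(freqs[0], freqs[-1], yarn_attention_factor(4.0))
```

High-frequency dimensions are left untouched, low-frequency dimensions are divided by the scale factor, and the dimensions in between are blended linearly. The exact keys accepted under `rope_scaling` depend on the library and version in use.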

References