Prefix caching
Automatic Prefix Caching (APC) lets the vLLM engine reuse the KV cache (attention key/value tensors) already computed for earlier prompts when a new request shares the same prefix. The shared prefix does not have to be recomputed during prefill, which removes redundant computation and speeds up inference.
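A toy sketch of how block-level prefix matching can work, loosely modeled on the hash-per-block idea behind APC: the prompt is split into fixed-size token blocks, each full block is keyed by a hash of its tokens chained with its parent block's hash, and a new request reuses every leading block whose key is already cached. `PrefixCache`, `allocate` and the block size of 4 are hypothetical names and values chosen for illustration; this is not vLLM's code, and it omits eviction, reference counting and partial blocks.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class PrefixCache:
    """Toy block-level prefix cache (hypothetical, illustrative only)."""

    def __init__(self, block_size: int = 4) -> None:
        self.block_size = block_size
        self.blocks: Dict[int, int] = {}  # chained block hash -> physical block id
        self.next_id = 0

    def _block_hashes(self, tokens: Sequence[int]) -> List[int]:
        # Each full block's hash covers its own tokens plus its parent's hash,
        # so equal hashes imply equal prefixes (barring hash collisions).
        hashes: List[int] = []
        parent: Optional[int] = None
        for start in range(0, len(tokens) - self.block_size + 1, self.block_size):
            chunk = tuple(tokens[start:start + self.block_size])
            parent = hash((parent, chunk))
            hashes.append(parent)
        return hashes

    def allocate(self, tokens: Sequence[int]) -> Tuple[List[int], int]:
        """Map full blocks to physical ids; return (block ids, number of cached tokens)."""
        ids: List[int] = []
        hits = 0
        for h in self._block_hashes(tokens):
            if h in self.blocks:
                hits += 1  # KV for this block already exists: its prefill is skipped
            else:
                self.blocks[h] = self.next_id  # new block: its KV would be computed now
                self.next_id += 1
            ids.append(self.blocks[h])
        return ids, hits * self.block_size


# Two requests sharing an 8-token "system prompt" prefix.
cache = PrefixCache(block_size=4)
system = [1, 2, 3, 4, 5, 6, 7, 8]
ids_a, cached_a = cache.allocate(system + [9, 10, 11, 12])
ids_b, cached_b = cache.allocate(system + [20, 21, 22, 23])
print(cached_a, cached_b)  # 0 8
```

Because each hash is chained to its parent, a miss on one block means every later block also misses, so cache hits always form a contiguous prefix of the request.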
Extending context length
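rope_scaling: the YaRN method (Yet another RoPE extensioN), which interpolates RoPE in the frequency domain and applies a pre-softmax scaling (attention temperature).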
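A minimal sketch of the two YaRN ingredients, following the formulation in the YaRN paper: each RoPE frequency is blended between its original value and a linearly interpolated value `theta / scale` by a ramp on how many full rotations that dimension makes within the original context, and the attention logits are rescaled before softmax by a factor derived from `scale`. The function and parameter names (`yarn_inv_freq`, `yarn_attention_factor`, `head_dim`, `base`, `scale`, `orig_ctx`, `alpha`, `beta`) are assumptions for illustration; `alpha=1` and `beta=32` are the values the paper suggests for LLaMA-family models. Real implementations (for example the one in Hugging Face Transformers) compute the ramp over dimension indices and may differ in detail.

```python
import math
from typing import List


def yarn_inv_freq(head_dim: int, base: float, scale: float, orig_ctx: int,
                  alpha: float = 1.0, beta: float = 32.0) -> List[float]:
    """RoPE frequencies after YaRN's "NTK-by-parts" interpolation (sketch)."""
    freqs: List[float] = []
    for d in range(head_dim // 2):
        theta = base ** (-2 * d / head_dim)       # original RoPE frequency of pair d
        wavelength = 2 * math.pi / theta
        r = orig_ctx / wavelength                 # full rotations within the original context
        if r < alpha:
            gamma = 0.0                           # long wavelength: interpolate fully (theta / scale)
        elif r > beta:
            gamma = 1.0                           # short wavelength: keep theta unchanged
        else:
            gamma = (r - alpha) / (beta - alpha)  # linear ramp between the two regimes
        freqs.append((1 - gamma) * theta / scale + gamma * theta)
    return freqs


def yarn_attention_factor(scale: float) -> float:
    """sqrt(1/t) = 0.1 * ln(s) + 1; multiplying both q and k by it scales logits by 1/t before softmax."""
    return 0.1 * math.log(scale) + 1.0


# Example: extend a model trained on 4096 tokens by a factor of 4.
freqs = yarn_inv_freq(head_dim=128, base=10000.0, scale=4.0, orig_ctx=4096)
mscale = yarn_attention_factor(4.0)
```

High-frequency dimensions (many rotations inside the original window) keep their exact frequencies, so local position information is preserved, while the lowest-frequency dimensions are stretched by the full factor. Applying the temperature as a multiplier on q and k (for instance by folding it into the cos/sin tables) keeps the attention kernel itself unchanged.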