KVBoost

Chunk-level KV cache reuse for faster HuggingFace inference.

5-48x reduction in time to first token (TTFT) on models of 3B parameters and larger when the same long context is repeated across requests.

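from kvboost import KVBoost

engine = KVBoost.from_pretrained("Qwen/Qwen2.5-3B")
# Pre-compute the KV cache for context that later prompts will repeat
engine.warm("You are a helpful assistant...")
# This prompt begins with the warmed context, so its cached KV can be reused
result = engine.generate("You are a helpful assistant...\n\nHello!")
print(result.output_text)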
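This section does not describe how KVBoost works internally. As a rough, library-independent illustration of what chunk-level KV cache reuse means, the sketch below splits a token sequence into fixed-size chunks, keys each chunk by a hash chained over everything before it, and counts how many leading tokens could be served from cache. Everything here (`ChunkKVCache`, `chunk_keys`, `CHUNK_SIZE`, and the placeholder strings standing in for KV tensors) is hypothetical and is not the KVBoost API.

```python
import hashlib
from typing import Dict, List, Tuple

# Hypothetical chunk size, kept tiny for the example.
CHUNK_SIZE = 4


def chunk_keys(token_ids: List[int], chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Return one key per *full* chunk.

    Each key hashes the chunk's tokens together with the previous chunk's key,
    so identical tokens behind a different prefix get a different key.
    """
    keys: List[str] = []
    prev = ""
    for i in range(len(token_ids) // chunk_size):
        chunk = token_ids[i * chunk_size:(i + 1) * chunk_size]
        payload = prev + ":" + ",".join(map(str, chunk))
        prev = hashlib.sha256(payload.encode()).hexdigest()
        keys.append(prev)
    return keys


class ChunkKVCache:
    """Toy cache mapping chunk keys to (placeholder) KV entries."""

    def __init__(self, chunk_size: int = CHUNK_SIZE):
        self.chunk_size = chunk_size
        # A real system would store per-layer key/value tensors here.
        self.store: Dict[str, str] = {}

    def warm(self, token_ids: List[int]) -> None:
        # Record an entry for every full chunk of the shared context.
        for key in chunk_keys(token_ids, self.chunk_size):
            self.store.setdefault(key, f"kv[{key[:8]}]")

    def lookup(self, token_ids: List[int]) -> Tuple[int, int]:
        """Return (reused_tokens, tokens_to_prefill) for a new prompt."""
        reused = 0
        for key in chunk_keys(token_ids, self.chunk_size):
            if key not in self.store:
                break  # reuse stops at the first chunk that misses
            reused += self.chunk_size
        return reused, len(token_ids) - reused


cache = ChunkKVCache()
context = list(range(10))        # 10 tokens -> 2 full chunks cached
cache.warm(context)
print(cache.lookup(context + [100, 101, 102]))  # (8, 5)
```

Chaining the hash over the prefix reflects that, under standard causal attention, a chunk's keys and values depend on every earlier token, so a cached chunk is only valid behind the same prefix. In this toy model only the `tokens_to_prefill` suffix would need a forward pass, which is the general source of TTFT savings from prefix reuse.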

Development