KVBoost
Chunk-level KV cache reuse for faster HuggingFace inference.
Cuts time to first token (TTFT) by 5-48x on models with 3B+ parameters when requests share long, repeated context.
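from kvboost import KVBoost

# Load a HuggingFace model through the KVBoost engine.
engine = KVBoost.from_pretrained("Qwen/Qwen2.5-3B")
# Warm the cache with the context that later requests will repeat.
engine.warm("You are a helpful assistant...")
# This prompt starts with the warmed text, so its cached KV entries can be reused.
result = engine.generate("You are a helpful assistant...\n\nHello!")
print(result.output_text)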
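To give a rough intuition for what chunk-level reuse means, the toy sketch below splits a token sequence into fixed-size chunks, keys each chunk by a hash of everything up to its end, and counts how many leading tokens of a new request are already cached. It is not KVBoost's implementation: `ToyChunkCache`, `chunk_keys`, the chunk size of 4, and the prefix-chained hashing scheme are all assumptions made for illustration, and the stored values are placeholders rather than real key/value tensors.

```python
import hashlib
from typing import Dict, List


def chunk_keys(token_ids: List[int], chunk_size: int) -> List[str]:
    """Return one key per full chunk; each key hashes the whole prefix up to that chunk."""
    keys = []
    hasher = hashlib.sha256()
    for i in range(len(token_ids) // chunk_size):
        chunk = token_ids[i * chunk_size:(i + 1) * chunk_size]
        hasher.update(",".join(map(str, chunk)).encode() + b";")
        keys.append(hasher.copy().hexdigest())
    return keys


class ToyChunkCache:
    """Hypothetical cache that maps chunk keys to placeholder KV entries."""

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size
        self.store: Dict[str, str] = {}

    def warm(self, token_ids: List[int]) -> None:
        # A real engine would run the model here and store KV tensors per chunk.
        for key in chunk_keys(token_ids, self.chunk_size):
            self.store.setdefault(key, "kv-placeholder")

    def reusable_tokens(self, token_ids: List[int]) -> int:
        """Count leading tokens whose chunks are already cached."""
        hits = 0
        for key in chunk_keys(token_ids, self.chunk_size):
            if key not in self.store:
                break
            hits += 1
        return hits * self.chunk_size


cache = ToyChunkCache(chunk_size=4)
system_prompt = list(range(10))           # 10 tokens: 2 full chunks + 2 leftover
cache.warm(system_prompt)
request = system_prompt + [99, 100, 101]  # same prefix plus a new user turn
print(cache.reusable_tokens(request))     # 8
```

In this sketch only whole chunks are reused, so the two leftover tokens of the warmed prompt are recomputed along with the new user turn. A real engine may key and align chunks differently.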
Getting Started
User Guide
Benchmarks
Architecture
API Reference
Development