KVBoost

Chunk-level KV cache reuse for faster HuggingFace inference.

5-48x reduction in time to first token (TTFT) on 3B+ models when prompts repeat long context.

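from kvboost import KVBoost

# Load the model through KVBoost.
engine = KVBoost.from_pretrained("Qwen/Qwen2.5-3B")
# Warm the cache with the shared prefix (here, the system prompt).
engine.warm("You are a helpful assistant...")
# This prompt starts with the warmed prefix, so its cached KV entries can be reused.
result = engine.generate("You are a helpful assistant...\n\nHello!")
print(result.output_text)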
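
To give a feel for what chunk-level reuse means, here is a minimal, generic sketch of one way to key cached chunks. It is not KVBoost's implementation: the chunk size, the hash-chaining scheme, and the names `chunk_keys` and `count_reusable_tokens` are all assumptions made for illustration, and the library's actual key design is described under Cache Key Design and may differ.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # hypothetical chunk length in tokens, kept tiny for illustration


def chunk_keys(token_ids: List[int], chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Return one key per full chunk; each key also covers every earlier token."""
    keys = []
    running = hashlib.sha256()
    full_length = len(token_ids) - len(token_ids) % chunk_size
    for start in range(0, full_length, chunk_size):
        chunk = token_ids[start:start + chunk_size]
        # Feed the chunk into the running hash so the key depends on the whole prefix.
        running.update(b"".join(t.to_bytes(4, "little") for t in chunk))
        keys.append(running.hexdigest())
    return keys


def count_reusable_tokens(cache: Dict[str, object], token_ids: List[int],
                          chunk_size: int = CHUNK_SIZE) -> int:
    """Count leading tokens whose chunks are already present in the cache."""
    reused = 0
    for key in chunk_keys(token_ids, chunk_size):
        if key not in cache:
            break  # the first miss ends the reusable prefix
        reused += chunk_size
    return reused


# "Warm" a toy cache with an 8-token shared prefix (placeholder values stand in for KV tensors).
cache: Dict[str, object] = {}
system_prompt = [1, 2, 3, 4, 5, 6, 7, 8]
for key in chunk_keys(system_prompt):
    cache[key] = "kv-placeholder"

# A new prompt that starts with the same prefix hits both cached chunks.
prompt = system_prompt + [9, 10, 11]
print(count_reusable_tokens(cache, prompt))  # 8
```

In this sketch only the trailing three tokens would need a fresh prefill pass; the longer the repeated prefix, the larger the share of work that a cache hit avoids.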

Getting Started

Quick Start
Installation

Benchmarks

Benchmark Results
KVBoost vs vLLM Prefix Caching

Architecture

Architecture Overview
- Module Responsibilities
- Supporting Modules
Cache Key Design

Development

Changelog