Cache Key Design
KVBoost uses a two-tier keying system inspired by vLLM v1’s block hashes.
The Problem with Content-Only Keys
A naive SHA256(token_bytes) key means identical text at different positions
shares the same cache entry. This causes two bugs:
- RoPE position collision: KV tensors cached at position 0 get loaded into a prompt where they should be at position 512. The assembled sequence has positions [0..127, 512..639] instead of [0..639].
- Cross-chunk attention contamination: The same text in different conversations attended to different preceding context. The KV vectors encode "what this token is given everything before it", so reusing them across contexts introduces semantic error.
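The collision behind both bugs can be reproduced in a few lines. This is a minimal sketch, not KVBoost code: `naive_content_key` and the 32-bit little-endian token packing are assumptions made for illustration.

```python
import hashlib
import struct


def naive_content_key(tokens):
    """Content-only key: SHA-256 over packed token IDs (hypothetical encoding)."""
    token_bytes = struct.pack(f"<{len(tokens)}I", *tokens)
    return hashlib.sha256(token_bytes).hexdigest()


chunk = [101, 7592, 2088, 102]

# Prompt A starts with the chunk; prompt B has it after a 512-token prefix.
prompt_a = chunk + [5, 6, 7]
prompt_b = list(range(1000, 1512)) + chunk

key_a = naive_content_key(prompt_a[0:4])      # chunk at positions 0..3
key_b = naive_content_key(prompt_b[512:516])  # chunk at positions 512..515

# Same key, although position and preceding context both differ.
collides = key_a == key_b
```

A cache keyed this way would return the KV tensors computed for prompt A when prompt B asks for the chunk, which is the situation both bullets describe.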
Two-Tier Keys

| Key Type | Formula | What It Encodes |
|---|---|---|
| prefix_hash | | Full prefix chain: same tokens at different positions get different keys |
| content_hash | | Content only: used for approximate reuse with mandatory full recompute |
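To make the two key types concrete, here is a rough sketch of how both keys could be derived for each chunk of a prompt. The hashing scheme (SHA-256 over the parent digest plus packed token IDs, with a `"root"` seed for the first chunk) and the helper names `content_key`, `prefix_key`, and `keys_for_prompt` are assumptions for illustration, not KVBoost's actual formulas.

```python
import hashlib
import struct


def _token_bytes(tokens):
    # Hypothetical encoding: 32-bit little-endian token IDs.
    return struct.pack(f"<{len(tokens)}I", *tokens)


def content_key(tokens):
    """Content-only key: depends on the chunk's tokens and nothing else."""
    return hashlib.sha256(_token_bytes(tokens)).hexdigest()


def prefix_key(tokens, parent=None):
    """Chained key: mixes in the previous chunk's key (assumed scheme)."""
    h = hashlib.sha256()
    h.update((parent or "root").encode("ascii"))
    h.update(_token_bytes(tokens))
    return h.hexdigest()


def keys_for_prompt(tokens, chunk_size):
    """Return one (prefix_key, content_key) pair per chunk, in order."""
    keys, parent = [], None
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        parent = prefix_key(chunk, parent)
        keys.append((parent, content_key(chunk)))
    return keys


# The same four tokens appear twice, at positions 0..3 and 4..7.
keys = keys_for_prompt([1, 2, 3, 4, 1, 2, 3, 4], chunk_size=4)
same_content = keys[0][1] == keys[1][1]    # content keys agree
different_prefix = keys[0][0] != keys[1][0]  # prefix keys do not
```

Under these assumptions the repeated chunk shares a content key but gets a distinct prefix key at each position, matching the "What It Encodes" column.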
Lookup Order

1. Try prefix_hash (exact match): positionally correct, use directly.
2. Fall back to content_hash (approximate match): flag for CacheBlend full recompute, not just boundary repair.
This preserves KVBoost’s differentiator (non-prefix chunk reuse) while making it correct. Approximate matches still hit the cache, but the system knows the KV tensors need full correction.
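The lookup order can be sketched with two dictionaries. `TwoTierIndex`, `Hit`, and the `needs_full_recompute` flag are hypothetical names chosen for this example; the real cache structure and how it signals CacheBlend may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Hit:
    kv: Any
    exact: bool                 # True when matched on prefix_hash
    needs_full_recompute: bool  # True for approximate (content-only) hits


class TwoTierIndex:
    """Toy index: one map per key tier (illustrative only)."""

    def __init__(self) -> None:
        self._by_prefix: Dict[str, Any] = {}
        self._by_content: Dict[str, Any] = {}

    def put(self, prefix_hash: str, content_hash: str, kv: Any) -> None:
        self._by_prefix[prefix_hash] = kv
        self._by_content[content_hash] = kv

    def lookup(self, prefix_hash: str, content_hash: str) -> Optional[Hit]:
        # Tier 1: exact, positionally correct match.
        if prefix_hash in self._by_prefix:
            return Hit(self._by_prefix[prefix_hash], True, False)
        # Tier 2: same content, different context; flag for full recompute.
        if content_hash in self._by_content:
            return Hit(self._by_content[content_hash], False, True)
        return None


index = TwoTierIndex()
index.put("p-at-0", "c-greeting", kv="kv@0")

exact = index.lookup("p-at-0", "c-greeting")     # tier 1 hit
approx = index.lookup("p-at-512", "c-greeting")  # tier 2 hit, flagged
miss = index.lookup("p-other", "c-other")        # no hit
```

Returning the flag alongside the tensors keeps the decision about correction with the caller, so an approximate hit is never mistaken for an exact one.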
Hash Chaining
from kvboost.models import chained_hash, content_hash_from_tokens
tokens = [1, 2, 3, 4]
# Content hash -- same for same tokens regardless of position
c = content_hash_from_tokens(tokens)
# Chained hash -- different for different preceding context
h1 = chained_hash(tokens, parent_hash=None) # first chunk
h2 = chained_hash(tokens, parent_hash="abc") # after chunk "abc"
assert h1 != h2 # same tokens, different keys
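One consequence of chaining is that a difference in an early chunk changes every later key. The sketch below uses a hypothetical stand-in, `chain`, built on hashlib; it is not the `chained_hash` function imported above and makes no claim about how that function is implemented.

```python
import hashlib


def chain(chunks, seed="root"):
    """Hypothetical chained hashing: each key covers the whole prefix so far."""
    parent, out = seed, []
    for chunk in chunks:
        payload = parent + "|" + ",".join(str(t) for t in chunk)
        parent = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        out.append(parent)
    return out


shared = [[10, 11], [12, 13]]

conv_a = chain([[1, 2]] + shared)
conv_b = chain([[9, 9]] + shared)                   # differs in the first chunk
conv_c = chain([[1, 2]] + shared[:1] + [[99, 99]])  # differs in the last chunk

# A different first chunk changes every key after it.
all_differ = all(a != b for a, b in zip(conv_a, conv_b))

# A shared prefix keeps its keys; only the diverging chunk gets a new one.
shared_prefix_matches = conv_a[:2] == conv_c[:2] and conv_a[2] != conv_c[2]
```

This is why an exact match on a chained key can be trusted positionally: two prompts only share it if everything before the chunk was identical as well.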