Data Structures
CachedChunk
- class kvboost.models.CachedChunk(chunk_id, text, token_ids, past_key_values, position_start, position_end, prefix_hash='', content_hash='', created_at=<factory>, access_count=0, recomputed=False)
A single cached chunk: a slice of tokenized text + its KV tensors. position_start / position_end are the absolute token positions at which this chunk was originally encoded. They drive position_ids on reuse so RoPE offsets stay consistent.
- Parameters:
- past_key_values: PastKVType
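As a rough illustration of how position_start and position_end could drive position_ids on reuse, the sketch below rebuilds the absolute positions for a chunk. ChunkSpan and position_ids_for are hypothetical names used only here, and the example assumes position_end is exclusive, which the description above does not state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ChunkSpan:
    """Hypothetical stand-in holding only the positional fields of a cached chunk."""
    token_ids: List[int]
    position_start: int
    position_end: int  # assumption: exclusive upper bound


def position_ids_for(chunk: ChunkSpan) -> List[int]:
    """Return the absolute position of every token in the chunk."""
    ids = list(range(chunk.position_start, chunk.position_end))
    # Guard against a span that disagrees with the number of cached tokens.
    if len(ids) != len(chunk.token_ids):
        raise ValueError("position span does not match token count")
    return ids


chunk = ChunkSpan(token_ids=[101, 102, 103, 104], position_start=128, position_end=132)
print(position_ids_for(chunk))
```

Under these assumptions, a chunk encoded at positions 128 through 131 is replayed with those same positions, so rotary offsets baked into its cached keys match the positions the model sees.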
AssembledPrompt
- class kvboost.models.AssembledPrompt(full_token_ids, cached_past_kv, cached_length, live_token_ids, live_position_ids, chunk_boundaries, cache_hit_ratio, has_approximate=False)
Result of stitching cached chunks + live (uncached) tail tokens.
- cached_past_kv: merged KV tensors for all cached tokens
- cached_length: number of tokens covered by cached_past_kv
- live_token_ids: tokens that still need a fresh forward pass
- live_position_ids: absolute positions for each live token
- chunk_boundaries: list of (start, end) for each reused chunk (used by SelectiveRecompute to find seam positions)
- cache_hit_ratio: fraction of total tokens served from cache
- has_approximate: True if any chunk was matched by content_hash (not prefix_hash), which signals that full recompute is needed, not just boundary repair
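The relationship between these fields can be sketched in plain Python. This is not the library's assembly code: split_prompt is a hypothetical helper, and it assumes the reused chunks form a contiguous prefix starting at position 0 with half-open (start, end) boundaries.

```python
from typing import Dict, List, Tuple


def split_prompt(
    full_token_ids: List[int],
    chunk_boundaries: List[Tuple[int, int]],
) -> Dict[str, object]:
    """Derive the live tail and hit ratio from the reused chunk boundaries."""
    cached_length = 0
    for start, end in chunk_boundaries:
        # Assumption: chunks are sorted and butt up against each other.
        if start != cached_length or end <= start:
            raise ValueError("chunks must form a contiguous prefix")
        cached_length = end
    if cached_length > len(full_token_ids):
        raise ValueError("cached span exceeds prompt length")

    total = len(full_token_ids)
    return {
        "cached_length": cached_length,
        # Tokens after the cached prefix still need a forward pass.
        "live_token_ids": full_token_ids[cached_length:],
        # Their absolute positions continue where the cache ends.
        "live_position_ids": list(range(cached_length, total)),
        "cache_hit_ratio": cached_length / total if total else 0.0,
    }


result = split_prompt(list(range(100, 110)), [(0, 4), (4, 8)])
print(result)
```

With ten prompt tokens and two reused chunks covering positions 0 to 7, the sketch leaves two live tokens at positions 8 and 9 and reports a hit ratio of 0.8.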
WarmResult
Hashing Functions
- kvboost.models.content_hash_from_tokens(token_ids)
Content-only hash. Same tokens always produce the same key.
- kvboost.models.chained_hash(token_ids, parent_hash=None)
Prefix-chained hash (vLLM-style). key = SHA256(parent_hash || token_bytes)
Same tokens with different parent hashes produce different keys, so the same text at different positions in different conversations correctly gets separate cache entries.
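The difference between the two hashing schemes can be illustrated with a short sketch built on hashlib. The function names below are hypothetical and do not reproduce the kvboost implementation. The sketch makes three assumptions: token ids are packed as unsigned 32-bit little-endian integers, hashes are hex strings, and the content hash also uses SHA-256.

```python
import hashlib
import struct
from typing import Optional, Sequence


def _token_bytes(token_ids: Sequence[int]) -> bytes:
    # Assumption: each token id is packed as an unsigned 32-bit little-endian int.
    return struct.pack(f"<{len(token_ids)}I", *token_ids)


def sketch_content_hash(token_ids: Sequence[int]) -> str:
    """Content-only key: depends on the tokens and nothing else."""
    return hashlib.sha256(_token_bytes(token_ids)).hexdigest()


def sketch_chained_hash(token_ids: Sequence[int], parent_hash: Optional[str] = None) -> str:
    """Prefix-chained key: SHA256(parent_hash || token_bytes)."""
    h = hashlib.sha256()
    if parent_hash is not None:
        h.update(parent_hash.encode("utf-8"))
    h.update(_token_bytes(token_ids))
    return h.hexdigest()


shared = [11, 12, 13]
# Two conversations with different first chunks, followed by the same tokens.
parent_a = sketch_chained_hash([1, 2, 3])
parent_b = sketch_chained_hash([7, 8, 9])

same_content = sketch_content_hash(shared) == sketch_content_hash(list(shared))
same_chained = sketch_chained_hash(shared, parent_a) == sketch_chained_hash(shared, parent_b)
print(same_content, same_chained)
```

In this sketch the shared chunk has one content key but two chained keys, one per preceding context, which is the property that keeps entries for different conversations separate.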