Cache Manager
KVCacheManager
- class kvboost.cache_manager.KVCacheManager(max_chunks=64, disk_dir=None, device='cpu', kv_cache_bits=16)
- store(chunk)
Store a chunk. If the cache is over capacity, evicts the lowest-frequency (least frequently used) entry.
- Parameters:
chunk (CachedChunk)
- Return type:
None
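The eviction rule for store() is least-frequently-used (LFU). The following is a minimal sketch, assuming that "capacity" means max_chunks and that an entry's frequency counts successful get() calls. LFUStore and its methods are hypothetical and not kvboost's implementation. Ties go to the oldest-inserted entry, because Python dicts iterate in insertion order and min() returns the first minimum it encounters.

```python
from typing import Any, Dict, Optional


class LFUStore:
    """Hypothetical LFU store keyed by chunk ID (not kvboost's actual class)."""

    def __init__(self, max_chunks: int = 64) -> None:
        if max_chunks < 1:
            raise ValueError("max_chunks must be at least 1")
        self.max_chunks = max_chunks
        self._items: Dict[str, Any] = {}
        self._freq: Dict[str, int] = {}  # assumed: number of successful get() hits

    def store(self, chunk_id: str, chunk: Any) -> None:
        # Make room before inserting a new key so the newcomer is never the victim.
        if chunk_id not in self._items and len(self._items) >= self.max_chunks:
            victim = min(self._freq, key=self._freq.__getitem__)  # lowest frequency, oldest on ties
            del self._items[victim]
            del self._freq[victim]
        self._items[chunk_id] = chunk
        self._freq.setdefault(chunk_id, 0)

    def get(self, chunk_id: str) -> Optional[Any]:
        if chunk_id not in self._items:
            return None
        self._freq[chunk_id] += 1  # each hit makes the entry less likely to be evicted
        return self._items[chunk_id]
```

With max_chunks=2, storing "a" and "b", reading "a" once, then storing "c" evicts "b", the only entry that was never read.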
- get(chunk_id)
Retrieve a chunk by its prefix_hash (exact match only), dequantizing it if it was stored quantized.
- Parameters:
chunk_id (str)
- Return type:
CachedChunk | None
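The reference only says that get() "dequantizes if needed." It does not say which scheme kvboost uses. The sketch below illustrates one common choice, symmetric per-tensor integer quantization, under the assumption that kv_cache_bits below 16 selects the integer width. quantize() and dequantize() are hypothetical helpers, and plain Python lists stand in for tensors so the example runs without torch.

```python
from typing import List, Tuple


def quantize(values: List[float], bits: int = 8) -> Tuple[List[int], float]:
    # Hypothetical symmetric scheme: one scale for the whole tensor, zero maps to zero.
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    amax = max((abs(v) for v in values), default=0.0)
    scale = amax / qmax if amax > 0 else 1.0
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    # Inverse mapping; per-element error is at most scale / 2.
    return [x * scale for x in q]
```

Under this scheme, each value is rounded to the nearest multiple of scale, so dequantized values are within half a step of the originals.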
- get_by_content(content_hash)
Look up a chunk by content_hash (approximate match). If the chunk is found via the content index, returns a ChunkMatch with approximate=True.
- Parameters:
content_hash (str)
- Return type:
ChunkMatch | None
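get() keys on a prefix hash, while get_by_content() keys on a content hash. The sketch below illustrates the difference, assuming that a prefix hash chains the parent chunk's hash with the chunk's own tokens and that a content hash covers the tokens alone. Both functions are hypothetical; the document does not specify kvboost's hash function or encoding, and SHA-256 is used here only for illustration.

```python
import hashlib
from typing import Optional, Sequence


def content_hash(tokens: Sequence[int]) -> str:
    """Hash of the chunk's tokens only: the same text anywhere yields the same value."""
    return hashlib.sha256(",".join(map(str, tokens)).encode()).hexdigest()


def prefix_hash(tokens: Sequence[int], parent_hash: Optional[str] = None) -> str:
    """Hash chained to the previous chunk's hash, so it also encodes all preceding tokens."""
    h = hashlib.sha256()
    h.update((parent_hash or "").encode())
    h.update(b"|")  # separator between the parent digest and this chunk's tokens
    h.update(",".join(map(str, tokens)).encode())
    return h.hexdigest()
```

Because the chained hash folds in parent_hash, a match on it implies that the entire preceding sequence also matched. The same tokens after a different prefix share a content hash but not a prefix hash, which is why only the content match is "approximate."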
- lookup(token_ids, parent_hash=None)
Two-tier lookup:
1. Try the prefix-chained hash (exact match): the cached chunk has the correct position and preceding context.
2. Fall back to the content hash (approximate match): the match is flagged for full recompute.
Returns a ChunkMatch, or None if neither tier matches.
- Parameters:
token_ids
parent_hash (defaults to None)
- Return type:
ChunkMatch | None
- build_prefix_kv(token_ids, chunk_size)
Greedily assemble the longest cached prefix using chained hashes. Only exact matches are used (no approximate fallback for prefix mode).
- Parameters:
token_ids
chunk_size
- Return type:
Tuple[Tuple[Tuple[torch.Tensor, torch.Tensor], …] | None, int]
- find_matching_chunks(token_ids, chunk_size)
Scan token_ids for all matching chunks using the two-tier lookup. Returns a list of (start_pos, ChunkMatch) pairs in positional order; each ChunkMatch has approximate set to True or False.
- static merge_kv_list(kv_list)
- Parameters:
kv_list (List[Tuple[Tuple[torch.Tensor, torch.Tensor], ...]])
- Return type:
Tuple[Tuple[torch.Tensor, torch.Tensor], …]
- static slice_kv(kv, start, end)
- Parameters:
kv (Tuple[Tuple[torch.Tensor, torch.Tensor], ...])
start (int)
end (int)
- Return type:
Tuple[Tuple[torch.Tensor, torch.Tensor], …]
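The reference gives only signatures for merge_kv_list() and slice_kv(). Read from those types, each KV is a per-layer tuple of (key, value) tensors. The sketch below assumes the layout (batch, num_heads, seq_len, head_dim), with the sequence on dim 2, as in the Hugging Face legacy past_key_values tuples. It treats merging as concatenation and slicing as selecting positions [start, end). These free-function versions are hypothetical and not kvboost's code.

```python
from typing import List, Tuple

import torch

# One (key, value) pair per layer; assumed shape (batch, heads, seq_len, head_dim).
KV = Tuple[Tuple[torch.Tensor, torch.Tensor], ...]


def merge_kv_list(kv_list: List[KV]) -> KV:
    # Concatenate every layer's keys and values along the sequence axis (dim=2).
    return tuple(
        (torch.cat([kv[layer][0] for kv in kv_list], dim=2),
         torch.cat([kv[layer][1] for kv in kv_list], dim=2))
        for layer in range(len(kv_list[0]))
    )


def slice_kv(kv: KV, start: int, end: int) -> KV:
    # Keep sequence positions [start, end) in every layer.
    return tuple((k[:, :, start:end], v[:, :, start:end]) for k, v in kv)
```

Slicing is the inverse of merging here. If a has n positions, slice_kv(merge_kv_list([a, b]), 0, n) recovers a, which makes a convenient sanity check.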
- static kv_seq_len(kv)
- Parameters:
kv (Tuple[Tuple[torch.Tensor, torch.Tensor], ...])
- Return type: