Cache Manager

KVCacheManager

class kvboost.cache_manager.KVCacheManager(max_chunks=64, disk_dir=None, device='cpu', kv_cache_bits=16)

Parameters:

max_chunks (int)
disk_dir (Optional[str])
device (str)
kv_cache_bits (int)

store(chunk)

Store a chunk. Evicts the lowest-frequency entry if the cache is over capacity.

Parameters: chunk (CachedChunk)
Return type: None
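The eviction rule in store() can be sketched as a least-frequently-used (LFU) store. This is a minimal sketch under assumptions: the class LFUChunkStore, its plain string keys, counting frequency as the number of get() hits, evicting before a new key is inserted into a full store, and breaking ties by evicting the oldest entry are all hypothetical choices, not details confirmed for kvboost.

```python
from typing import Any, Dict, Optional


class LFUChunkStore:
    """Hypothetical LFU store illustrating the eviction rule; not kvboost's code."""

    def __init__(self, max_chunks: int = 64) -> None:
        self.max_chunks = max_chunks
        self._data: Dict[str, Any] = {}
        self._freq: Dict[str, int] = {}  # hit count per key, kept in insertion order

    def store(self, key: str, value: Any) -> None:
        # Make room only when a new key is added to a full store.
        if key not in self._data and len(self._data) >= self.max_chunks:
            # min() keeps the first minimum it sees, so ties evict the oldest key.
            victim = min(self._freq, key=self._freq.__getitem__)
            del self._data[victim]
            del self._freq[victim]
        self._data[key] = value
        self._freq.setdefault(key, 0)

    def get(self, key: str) -> Optional[Any]:
        if key not in self._data:
            return None
        self._freq[key] += 1  # each hit makes the key less likely to be evicted
        return self._data[key]
```

Evicting before insertion means a freshly stored chunk, whose count is still zero, is never evicted by its own store() call.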

get(chunk_id)

Retrieve a chunk by its prefix_hash (exact match only). Dequantizes the chunk if it was stored quantized.

Parameters: chunk_id (str)
Return type: CachedChunk | None
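The kv_cache_bits parameter suggests that entries can be kept at reduced precision and dequantized on retrieval. The quantization scheme kvboost actually uses is not documented here; the sketch below assumes symmetric per-tensor integer quantization with one shared float scale, and works on plain Python lists instead of torch tensors. quantize_symmetric, dequantize_symmetric and load_values are hypothetical helpers.

```python
from typing import List, Tuple, Union

Quantized = Tuple[List[int], float]  # (integer codes, shared scale)


def quantize_symmetric(values: List[float], bits: int = 8) -> Quantized:
    """Map floats to signed ints in [-qmax, qmax] using one shared scale."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return [round(v / scale) for v in values], scale


def dequantize_symmetric(q: List[int], scale: float) -> List[float]:
    """Undo quantize_symmetric up to a rounding error of about scale / 2 per value."""
    return [x * scale for x in q]


def load_values(stored: Union[List[float], Quantized], kv_cache_bits: int) -> List[float]:
    """Dequantize only when the entry was stored below 16 bits (assumed rule)."""
    if kv_cache_bits >= 16:
        return stored  # stored at full precision: nothing to undo
    codes, scale = stored
    return dequantize_symmetric(codes, scale)
```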

get_by_content(content_hash)

Look up a chunk by content_hash (approximate match). Returns a ChunkMatch with approximate=True if found via the content index.

Parameters: content_hash (str)
Return type: ChunkMatch | None

lookup(token_ids, parent_hash=None)

Two-tier lookup:

1. Try the prefix-chained hash (exact): correct position and context.
2. Fall back to the content hash (approximate): flagged for full recompute.

Returns ChunkMatch or None.

Parameters:

token_ids (List[int])
parent_hash (str | None)

Return type:

ChunkMatch | None
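The two tiers differ only in what each hash covers. In the minimal sketch below, which assumes SHA-256 over comma-joined token IDs and plain set/dict indexes (none of which is confirmed by the source), the prefix hash is chained on the parent chunk's hash, so it changes whenever any earlier token changes, while the content hash depends on the chunk's own tokens only. prefix_hash, content_hash, Match and two_tier_lookup are hypothetical stand-ins for the library's hashing and ChunkMatch.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


def content_hash(token_ids: List[int]) -> str:
    """Hash of the tokens alone: the same chunk anywhere in a prompt gives the same key."""
    return hashlib.sha256(",".join(map(str, token_ids)).encode()).hexdigest()


def prefix_hash(token_ids: List[int], parent_hash: Optional[str]) -> str:
    """Hash chained on the previous chunk's hash, so it also encodes all earlier tokens."""
    h = hashlib.sha256((parent_hash or "").encode())
    h.update(b"|" + ",".join(map(str, token_ids)).encode())
    return h.hexdigest()


@dataclass
class Match:
    chunk_id: str
    approximate: bool


def two_tier_lookup(
    token_ids: List[int],
    parent_hash: Optional[str],
    prefix_index: Set[str],
    content_index: Dict[str, str],
) -> Optional[Match]:
    key = prefix_hash(token_ids, parent_hash)
    if key in prefix_index:
        # Tier 1: same tokens at the same position after the same context.
        return Match(key, approximate=False)
    chunk_id = content_index.get(content_hash(token_ids))
    if chunk_id is not None:
        # Tier 2: same tokens, but cached after a different context, so the
        # stored KV does not reflect this prompt; the caller should recompute.
        return Match(chunk_id, approximate=True)
    return None
```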

build_prefix_kv(token_ids, chunk_size)

Greedily assemble the longest cached prefix using chained hashes. Only exact matches are used (no approximate fallback for prefix mode).

Parameters:

token_ids (List[int])
chunk_size (int)

Return type:

Tuple[Tuple[Tuple[torch.Tensor, torch.Tensor], …] | None, int]
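The greedy assembly can be sketched as walking the prompt chunk by chunk, chaining each hash on the previous one, and stopping at the first miss. This is a sketch under assumptions: build_prefix and prefix_hash are hypothetical, lists of placeholder strings stand in for the per-layer (key, value) tensor tuples, and the returned int is assumed to be the number of prompt tokens the cached prefix covers.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def prefix_hash(token_ids: List[int], parent_hash: Optional[str]) -> str:
    """Hash of this chunk chained on the hash of everything before it."""
    h = hashlib.sha256((parent_hash or "").encode())
    h.update(b"|" + ",".join(map(str, token_ids)).encode())
    return h.hexdigest()


def build_prefix(
    token_ids: List[int], chunk_size: int, cache: Dict[str, list]
) -> Tuple[Optional[list], int]:
    pieces: List[list] = []
    parent: Optional[str] = None
    matched = 0
    for start in range(0, len(token_ids), chunk_size):
        chunk = token_ids[start:start + chunk_size]
        key = prefix_hash(chunk, parent)
        if key not in cache:
            break  # first miss ends the prefix: a contiguous prefix cannot skip a chunk
        pieces.append(cache[key])
        matched += len(chunk)
        parent = key
    if not pieces:
        return None, 0
    # Concatenate the cached pieces in order (the real method would join tensors).
    return [kv for piece in pieces for kv in piece], matched
```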

find_matching_chunks(token_ids, chunk_size)

Scan for all matching chunks using the two-tier lookup. Returns a list of (start_pos, ChunkMatch) pairs in order. Each ChunkMatch carries approximate=True/False.
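Unlike build_prefix_kv, which stops at the first miss, this scan can keep going past unmatched chunks. A minimal sketch, assuming the hash chain keeps following the query's own tokens whether or not a chunk matched (an assumption, not stated in the source); scan_chunks and prefix_hash are hypothetical, and the lookup callable stands in for the two-tier lookup().

```python
import hashlib
from typing import Callable, List, Optional, Tuple, TypeVar

M = TypeVar("M")


def prefix_hash(token_ids: List[int], parent_hash: Optional[str]) -> str:
    """Hash of this chunk chained on the hash of everything before it."""
    h = hashlib.sha256((parent_hash or "").encode())
    h.update(b"|" + ",".join(map(str, token_ids)).encode())
    return h.hexdigest()


def scan_chunks(
    token_ids: List[int],
    chunk_size: int,
    lookup: Callable[[List[int], Optional[str]], Optional[M]],
) -> List[Tuple[int, M]]:
    results: List[Tuple[int, M]] = []
    parent: Optional[str] = None
    for start in range(0, len(token_ids), chunk_size):
        chunk = token_ids[start:start + chunk_size]
        match = lookup(chunk, parent)
        if match is not None:
            results.append((start, match))  # start offset of the chunk in the query
        # A miss does not stop the scan; the chain advances on the query's own tokens.
        parent = prefix_hash(chunk, parent)
    return results
```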

Parameters: