Cache Manager

KVCacheManager

class kvboost.cache_manager.KVCacheManager(max_chunks=64, disk_dir=None, device='cpu', kv_cache_bits=16)[source]
Parameters:
  • max_chunks (int)

  • disk_dir (Optional[str])

  • device (str)

  • kv_cache_bits (int)

store(chunk)[source]

Store a chunk. Evicts lowest-frequency entry if over capacity.

Parameters:

chunk (CachedChunk)

Return type:

None

get(chunk_id)[source]

Retrieve a chunk by prefix_hash (exact match only). Dequantizes if needed.

Parameters:

chunk_id (str)

Return type:

CachedChunk | None

get_by_content(content_hash)[source]

Look up by content_hash (approximate match). Returns ChunkMatch with approximate=True if found via content index.

Parameters:

content_hash (str)

Return type:

ChunkMatch | None

lookup(token_ids, parent_hash=None)[source]

Two-tier lookup: 1. Try prefix-chained hash (exact) — correct position + context 2. Fall back to content hash (approximate) — flagged for full recompute

Returns ChunkMatch or None.

Parameters:
Return type:

ChunkMatch | None

build_prefix_kv(token_ids, chunk_size)[source]

Greedily assemble the longest cached prefix using chained hashes. Only exact matches are used (no approximate fallback for prefix mode).

Parameters:
Return type:

Tuple[Tuple[Tuple[torch.Tensor, torch.Tensor], …] | None, int]

find_matching_chunks(token_ids, chunk_size)[source]

Scan for all matching chunks using two-tier lookup. Returns list of (start_pos, ChunkMatch) pairs in order. Each ChunkMatch carries approximate=True/False.

Parameters:
Return type:

List[Tuple[int, ChunkMatch]]

invalidate(chunk_id)[source]
Parameters:

chunk_id (str)

Return type:

None

stats()[source]
Return type:

Dict

static merge_kv_list(kv_list)[source]
Parameters:

kv_list (List[Tuple[Tuple[torch.Tensor, torch.Tensor], ...]])

Return type:

Tuple[Tuple[torch.Tensor, torch.Tensor], …]

static slice_kv(kv, start, end)[source]
Parameters:
Return type:

Tuple[Tuple[torch.Tensor, torch.Tensor], …]

static kv_seq_len(kv)[source]
Parameters:

kv (Tuple[Tuple[torch.Tensor, torch.Tensor], ...])

Return type:

int

ChunkMatch

class kvboost.cache_manager.ChunkMatch(chunk, approximate=False)[source]

Result of a cache lookup — tracks whether the match is exact or approximate.

Parameters:
chunk
approximate