Data Structures

CachedChunk

class kvboost.models.CachedChunk(chunk_id, text, token_ids, past_key_values, position_start, position_end, prefix_hash='', content_hash='', created_at=<factory>, access_count=0, recomputed=False)

A single cached chunk: a slice of tokenized text plus its KV tensors. position_start and position_end are the absolute token positions at which the chunk was originally encoded. They determine the position_ids used on reuse, so that RoPE offsets stay consistent.
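As a rough illustration of how the stored positions can drive position_ids on reuse, the sketch below rebuilds the position range for a chunk. It assumes position_end is exclusive (a half-open range), which this reference does not state. position_ids_for_chunk is a hypothetical helper, not part of kvboost.

```python
from typing import List


def position_ids_for_chunk(position_start: int, position_end: int) -> List[int]:
    """Rebuild absolute position ids for a cached chunk.

    Assumption: position_end is exclusive, so the chunk covers
    position_end - position_start tokens.
    """
    if position_end < position_start:
        raise ValueError("position_end must be >= position_start")
    return list(range(position_start, position_end))


# A chunk originally encoded at absolute positions 128..191 keeps those
# positions when reused, instead of restarting at 0.
chunk_positions = position_ids_for_chunk(128, 192)
```

Keeping the original absolute positions matters because RoPE bakes the position into the cached keys. Feeding different position_ids on reuse would make new queries disagree with the stored keys.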

Parameters:

chunk_id (str)
text (str)
token_ids (List[int])
past_key_values (PastKVType)
position_start (int)
position_end (int)
prefix_hash (str)
content_hash (str)
created_at (float)
access_count (int)
recomputed (bool)

chunk_id: str

text: str

token_ids: List[int]

past_key_values: PastKVType

position_start: int

position_end: int

prefix_hash: str = ''

content_hash: str = ''

created_at: float

access_count: int = 0

recomputed: bool = False

property length: int

touch()

Return type: None

memory_bytes()

Return type: int
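The layout of past_key_values is not specified here. As a back-of-the-envelope sketch, assuming one key tensor and one value tensor per layer, each of shape (num_kv_heads, seq_len, head_dim), the footprint of a chunk can be estimated as below. The function name and its parameters are hypothetical and may not match what memory_bytes() actually computes.

```python
def estimate_kv_bytes(
    num_layers: int,
    num_kv_heads: int,
    seq_len: int,
    head_dim: int,
    bytes_per_element: int = 2,  # assumption: fp16/bf16 storage
) -> int:
    """Estimate the size of a KV cache slice in bytes.

    Assumption: each layer holds one key and one value tensor of shape
    (num_kv_heads, seq_len, head_dim).
    """
    per_tensor = num_kv_heads * seq_len * head_dim * bytes_per_element
    return 2 * num_layers * per_tensor  # factor 2 = keys + values


# Illustrative numbers only: 32 layers, 8 KV heads, 128-token chunk,
# head_dim 128, 2 bytes per element -> 16 MiB.
example_bytes = estimate_kv_bytes(32, 8, 128, 128)
```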

AssembledPrompt

class kvboost.models.AssembledPrompt(full_token_ids, cached_past_kv, cached_length, live_token_ids, live_position_ids, chunk_boundaries, cache_hit_ratio, has_approximate=False)

Result of stitching cached chunks + live (uncached) tail tokens.

cached_past_kv: merged KV tensors for all cached tokens
cached_length: number of tokens covered by cached_past_kv
live_token_ids: tokens that still need a fresh forward pass
live_position_ids: absolute positions for each live token
chunk_boundaries: list of (start, end) for each reused chunk (used by SelectiveRecompute to find seam positions)
cache_hit_ratio: fraction of total tokens served from cache
has_approximate: True if any chunk was matched by content_hash (not prefix_hash); signals that full recompute is needed, not just boundary repair
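To make the relationship between these fields concrete, the sketch below derives the live tail, its positions, and the hit ratio from a token list and a cached length. It assumes the cached tokens form one contiguous prefix of the prompt, which this reference does not guarantee. split_prompt is a hypothetical helper, not a kvboost function.

```python
from typing import List, Tuple


def split_prompt(
    full_token_ids: List[int], cached_length: int
) -> Tuple[List[int], List[int], float]:
    """Split a prompt into a cached prefix and a live (uncached) tail.

    Assumption: cached tokens are a contiguous prefix of the prompt.
    Returns (live_token_ids, live_position_ids, cache_hit_ratio).
    """
    total = len(full_token_ids)
    if not 0 <= cached_length <= total:
        raise ValueError("cached_length must lie within the prompt")
    live_token_ids = full_token_ids[cached_length:]
    # Live tokens continue at the absolute positions after the cached part.
    live_position_ids = list(range(cached_length, total))
    cache_hit_ratio = cached_length / total if total else 0.0
    return live_token_ids, live_position_ids, cache_hit_ratio


# Ten tokens, the first eight served from cache.
live_ids, live_pos, hit_ratio = split_prompt(list(range(100, 110)), cached_length=8)
```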

Parameters:

full_token_ids (List[int])
cached_past_kv (Optional[PastKVType])
cached_length (int)
live_token_ids (List[int])
live_position_ids (List[int])
chunk_boundaries (List[Tuple[int, int]])
cache_hit_ratio (float)
has_approximate (bool)

full_token_ids: List[int]

cached_past_kv: PastKVType | None

cached_length: int

live_token_ids: List[int]

live_position_ids: List[int]

chunk_boundaries: List[Tuple[int, int]]

cache_hit_ratio: float

has_approximate: bool = False

property total_length: int

WarmResult

class kvboost.models.WarmResult(chunks_stored, token_count, chunk_size, chunk_boundary_aligned, partial_tail_tokens, alignment_warning=None)

Diagnostic returned by engine.warm() to help catch alignment issues.

Parameters:

chunks_stored (int)
token_count (int)
chunk_size (int)
chunk_boundary_aligned (bool)
partial_tail_tokens (int)
alignment_warning (str | None)

chunks_stored: int

token_count: int

chunk_size: int

chunk_boundary_aligned: bool

partial_tail_tokens: int

alignment_warning: str | None = None
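One plausible reading of these fields is sketched below: the token count is divided into full chunks, and any remainder is reported as a partial tail. This assumes that only full chunks are stored and that "aligned" means an empty tail. Neither is confirmed by this reference, and both alignment_summary and the warning text are hypothetical.

```python
from typing import Dict, Optional, Union


def alignment_summary(
    token_count: int, chunk_size: int
) -> Dict[str, Union[int, bool, Optional[str]]]:
    """Derive WarmResult-like diagnostics from a token count.

    Assumptions: only full chunks are stored, and the text is
    boundary-aligned when no partial tail remains.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    full_chunks, tail = divmod(token_count, chunk_size)
    aligned = tail == 0
    warning = (
        None
        if aligned
        else f"{tail} trailing tokens do not fill a chunk of {chunk_size}"
    )
    return {
        "chunks_stored": full_chunks,
        "token_count": token_count,
        "chunk_size": chunk_size,
        "chunk_boundary_aligned": aligned,
        "partial_tail_tokens": tail,
        "alignment_warning": warning,
    }


# 300 tokens with 128-token chunks: two full chunks and a 44-token tail.
summary = alignment_summary(300, 128)
```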

Hashing Functions

kvboost.models.content_hash_from_tokens(token_ids)

Content-only hash. Same tokens always produce the same key.

Parameters: token_ids (List[int])
Return type: str
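A content-only hash of this kind might look like the sketch below. The digest algorithm and the byte encoding of the tokens are assumptions (SHA-256 over little-endian 32-bit integers). The library's actual encoding may differ, so keys produced here should not be expected to match kvboost's.

```python
import hashlib
import struct
from typing import List


def content_hash_sketch(token_ids: List[int]) -> str:
    """Hash token ids alone, ignoring position and preceding context.

    Assumptions: SHA-256 digest, tokens packed as unsigned 32-bit
    little-endian integers.
    """
    token_bytes = struct.pack(f"<{len(token_ids)}I", *token_ids)
    return hashlib.sha256(token_bytes).hexdigest()


# The same tokens give the same key wherever they appear in a prompt.
key_first = content_hash_sketch([101, 2023, 2003])
key_again = content_hash_sketch([101, 2023, 2003])
key_other = content_hash_sketch([101, 2023, 2004])
```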

kvboost.models.chained_hash(token_ids, parent_hash=None)

Prefix-chained hash (vLLM-style). key = SHA256(parent_hash || token_bytes)

Same tokens with different parent hashes produce different keys, so the same text at different positions in different conversations correctly gets separate cache entries.

Parameters:

token_ids (List[int])
parent_hash (str | None)

Return type:

str
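The formula key = SHA256(parent_hash || token_bytes) can be sketched as follows. The byte encoding of the tokens, the UTF-8 encoding of the parent hash, and the treatment of a missing parent as an empty string are all assumptions. chained_hash_sketch is a hypothetical stand-in, not the library function.

```python
import hashlib
import struct
from typing import List, Optional


def chained_hash_sketch(token_ids: List[int], parent_hash: Optional[str] = None) -> str:
    """Compute SHA256(parent_hash || token_bytes) as a hex string.

    Assumptions: a missing parent is treated as an empty string, and
    tokens are packed as unsigned 32-bit little-endian integers.
    """
    hasher = hashlib.sha256()
    hasher.update((parent_hash or "").encode("utf-8"))
    hasher.update(struct.pack(f"<{len(token_ids)}I", *token_ids))
    return hasher.hexdigest()


chunk = [11, 22, 33]
root_key = chained_hash_sketch(chunk)                        # first chunk, no parent
key_a = chained_hash_sketch(chunk, parent_hash=root_key)     # same tokens after one prefix
key_b = chained_hash_sketch(chunk, parent_hash="other-prefix")  # same tokens after another
```

Because each key folds in its parent, two chunks with identical tokens share a key only when everything before them is identical as well.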