Data Structures

CachedChunk

class kvboost.models.CachedChunk(chunk_id, text, token_ids, past_key_values, position_start, position_end, prefix_hash='', content_hash='', created_at=<factory>, access_count=0, recomputed=False)

A single cached chunk: a slice of tokenized text plus its key/value (KV) tensors. position_start and position_end are the absolute token positions at which this chunk was originally encoded. On reuse they determine the chunk's position_ids, so RoPE offsets stay consistent with the rotation already baked into the cached keys.
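Why the original positions matter can be sketched in a few lines. This assumes position_end is exclusive (a half-open range) and uses the standard RoPE angle formula, angle = pos * base^(-2i/d); reuse_position_ids and rope_angle are hypothetical helpers for illustration, not kvboost functions.

```python
from typing import List


def reuse_position_ids(position_start: int, position_end: int) -> List[int]:
    # Hypothetical helper. Assumes position_end is exclusive: [start, end).
    return list(range(position_start, position_end))


def rope_angle(position: int, pair_index: int, head_dim: int, base: float = 10000.0) -> float:
    # Standard RoPE rotation angle for one dimension pair: pos * base^(-2i/d).
    return position * base ** (-2 * pair_index / head_dim)


# A chunk originally encoded at absolute positions 512..515.
ids = reuse_position_ids(512, 516)
print(ids)  # [512, 513, 514, 515]

# If the chunk were treated as starting at position 0 instead, its first token
# would be rotated by a different angle than the one stored in the cached keys.
print(rope_angle(ids[0], 0, 64), rope_angle(0, 0, 64))  # 512.0 0.0
```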

Parameters:
  • chunk_id (str)

  • text (str)

  • token_ids (List[int])

  • past_key_values (PastKVType)

  • position_start (int)

  • position_end (int)

  • prefix_hash (str)

  • content_hash (str)

  • created_at (float)

  • access_count (int)

  • recomputed (bool)

chunk_id: str
text: str
token_ids: List[int]
past_key_values: PastKVType
position_start: int
position_end: int
prefix_hash: str = ''
content_hash: str = ''
created_at: float
access_count: int = 0
recomputed: bool = False
property length: int
touch()
Return type:

None

memory_bytes()
Return type:

int
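In the signature above, created_at=<factory> is how Python's dataclasses render a field declared with field(default_factory=...): a fresh value is produced for each instance. The stand-in below mirrors the documented fields. ChunkSketch is hypothetical, the time.time factory is an assumption, and because this reference does not describe length or touch(), they are given assumed behaviours (token count, and incrementing access_count).

```python
import time
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class ChunkSketch:
    # Hypothetical stand-in mirroring CachedChunk's documented fields.
    chunk_id: str
    text: str
    token_ids: List[int]
    past_key_values: Any  # PastKVType in kvboost; left untyped here
    position_start: int
    position_end: int
    prefix_hash: str = ""
    content_hash: str = ""
    # default_factory is what the signature shows as created_at=<factory>.
    # Using time.time as the factory is an assumption.
    created_at: float = field(default_factory=time.time)
    access_count: int = 0
    recomputed: bool = False

    @property
    def length(self) -> int:
        # Assumed: the number of tokens in the chunk.
        return len(self.token_ids)

    def touch(self) -> None:
        # Assumed behaviour: record one more cache hit.
        self.access_count += 1


chunk = ChunkSketch("c0", "hello world", [15496, 995], None, 0, 2)
chunk.touch()
print(chunk.length, chunk.access_count)  # 2 1
```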

AssembledPrompt

class kvboost.models.AssembledPrompt(full_token_ids, cached_past_kv, cached_length, live_token_ids, live_position_ids, chunk_boundaries, cache_hit_ratio, has_approximate=False)

Result of stitching cached chunks + live (uncached) tail tokens.

  • cached_past_kv: merged KV tensors for all cached tokens

  • cached_length: number of tokens covered by cached_past_kv

  • live_token_ids: tokens that still need a fresh forward pass

  • live_position_ids: absolute position of each live token

  • chunk_boundaries: list of (start, end) for each reused chunk, used by SelectiveRecompute to find seam positions

  • cache_hit_ratio: fraction of total tokens served from cache

  • has_approximate: True if any chunk was matched by content_hash rather than prefix_hash; signals that a full recompute is needed, not just boundary repair
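How these fields relate can be sketched as follows, assuming the cached chunks cover a prefix of full_token_ids and the live tokens are the remaining tail. split_prompt and seam_positions are hypothetical helpers; treating every chunk start after the first as a seam is an assumption about what SelectiveRecompute looks for, not its documented logic.

```python
from typing import List, Tuple


def split_prompt(full_token_ids: List[int], cached_length: int):
    # Hypothetical helper: everything after the cached prefix is "live".
    live_token_ids = full_token_ids[cached_length:]
    # Live tokens continue the absolute positions right after the cached span.
    live_position_ids = list(range(cached_length, len(full_token_ids)))
    total = len(full_token_ids)
    cache_hit_ratio = cached_length / total if total else 0.0
    return live_token_ids, live_position_ids, cache_hit_ratio


def seam_positions(chunk_boundaries: List[Tuple[int, int]]) -> List[int]:
    # Assumed: each chunk start after the first is a seam between two
    # independently encoded chunks.
    return [start for start, _ in chunk_boundaries[1:]]


live, pos, ratio = split_prompt(list(range(100, 110)), cached_length=8)
print(live, pos, ratio)  # [108, 109] [8, 9] 0.8
print(seam_positions([(0, 4), (4, 8)]))  # [4]
```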

Parameters:
  • full_token_ids (List[int])

  • cached_past_kv (Optional[PastKVType])

  • cached_length (int)

  • live_token_ids (List[int])

  • live_position_ids (List[int])

  • chunk_boundaries (List[Tuple[int, int]])

  • cache_hit_ratio (float)

  • has_approximate (bool)

full_token_ids: List[int]
cached_past_kv: PastKVType | None
cached_length: int
live_token_ids: List[int]
live_position_ids: List[int]
chunk_boundaries: List[Tuple[int, int]]
cache_hit_ratio: float
has_approximate: bool = False
property total_length: int

WarmResult

class kvboost.models.WarmResult(chunks_stored, token_count, chunk_size, chunk_boundary_aligned, partial_tail_tokens, alignment_warning=None)

Diagnostic returned by engine.warm() to help catch alignment issues.

Parameters:
  • chunks_stored (int)

  • token_count (int)

  • chunk_size (int)

  • chunk_boundary_aligned (bool)

  • partial_tail_tokens (int)

  • alignment_warning (str | None)

chunks_stored: int
token_count: int
chunk_size: int
chunk_boundary_aligned: bool
partial_tail_tokens: int
alignment_warning: str | None = None
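One plausible way these diagnostic fields fit together is sketched below. It assumes chunks_stored counts only full chunk_size chunks and that partial_tail_tokens is the remainder; WarmSketch and diagnose are hypothetical, and the warning text is illustrative rather than kvboost's actual message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WarmSketch:
    # Hypothetical stand-in for WarmResult's documented fields.
    chunks_stored: int
    token_count: int
    chunk_size: int
    chunk_boundary_aligned: bool
    partial_tail_tokens: int
    alignment_warning: Optional[str] = None


def diagnose(token_count: int, chunk_size: int) -> WarmSketch:
    # Assumed: only full chunks are counted; the remainder is the partial tail.
    full_chunks, tail = divmod(token_count, chunk_size)
    warning = None
    if tail:
        warning = (f"prompt length {token_count} is not a multiple of "
                   f"chunk_size {chunk_size}")
    return WarmSketch(full_chunks, token_count, chunk_size, tail == 0, tail, warning)


r = diagnose(1000, 128)
print(r.chunks_stored, r.partial_tail_tokens, r.chunk_boundary_aligned)  # 7 104 False
```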

Hashing Functions

kvboost.models.content_hash_from_tokens(token_ids)

Content-only hash: the same token sequence always produces the same key, regardless of its position or the context that precedes it.

Parameters:

token_ids (List[int])

Return type:

str
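A content-only hash can be sketched as a digest over the token ids alone. SHA-256 is borrowed from chained_hash's documented formula, and the 4-byte little-endian token encoding is an assumption; content_hash_sketch is hypothetical and will not reproduce kvboost's actual keys.

```python
import hashlib
from typing import List


def content_hash_sketch(token_ids: List[int]) -> str:
    # Assumed encoding: each token id as a 4-byte little-endian unsigned int.
    token_bytes = b"".join(t.to_bytes(4, "little") for t in token_ids)
    return hashlib.sha256(token_bytes).hexdigest()


a = content_hash_sketch([1, 2, 3])
b = content_hash_sketch([1, 2, 3])
c = content_hash_sketch([3, 2, 1])
print(a == b, a == c, len(a))  # True False 64
```

Because nothing but the tokens enters the digest, a match only says the text is identical, not that it was encoded at the same position or after the same prefix.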

kvboost.models.chained_hash(token_ids, parent_hash=None)

Prefix-chained hash (vLLM-style): key = SHA256(parent_hash || token_bytes), where || denotes concatenation.

The same tokens with different parent hashes produce different keys, so identical text that follows different prefixes (for example, at different positions or in different conversations) correctly gets separate cache entries.

Parameters:
  • token_ids (List[int])

  • parent_hash (str | None)

Return type:

str
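The documented formula key = SHA256(parent_hash || token_bytes) can be sketched chunk by chunk, so each key depends on every chunk before it (presumably what a CachedChunk's prefix_hash captures). The 4-byte little-endian token encoding and treating a None parent as contributing no bytes are assumptions; chained_hash_sketch is hypothetical.

```python
import hashlib
from typing import List, Optional


def _token_bytes(token_ids: List[int]) -> bytes:
    # Assumed encoding: each token id as a 4-byte little-endian unsigned int.
    return b"".join(t.to_bytes(4, "little") for t in token_ids)


def chained_hash_sketch(token_ids: List[int], parent_hash: Optional[str] = None) -> str:
    # key = SHA256(parent_hash || token_bytes); a None parent adds no bytes (assumption).
    h = hashlib.sha256()
    if parent_hash is not None:
        h.update(parent_hash.encode("ascii"))
    h.update(_token_bytes(token_ids))
    return h.hexdigest()


# Hash a prompt chunk by chunk; each key folds in the key of the previous chunk.
chunks = [[1, 2, 3, 4], [5, 6, 7, 8]]
prefix = None
keys = []
for chunk in chunks:
    prefix = chained_hash_sketch(chunk, prefix)
    keys.append(prefix)

# The same second chunk after a different first chunk gets a different key.
other = chained_hash_sketch([5, 6, 7, 8], chained_hash_sketch([9, 9, 9, 9]))
print(keys[1] != other)  # True
```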