nn.KVCache

class lucid.nn.KVCache

Overview

KVCache is the abstract base class for Transformer KV cache implementations. It extends lucid.nn.Cache with KV-specific APIs used by attention modules and generation loops.

Class Signature

class lucid.nn.KVCache()

All concrete caches must implement:

update(key, value, layer_idx, cache_position=None)
get(layer_idx)
get_seq_length(layer_idx=0)
reset()
the internal cropping logic that the public crop(max_length) method delegates to
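The contract above can be illustrated with a small pure-Python sketch. Everything here is hypothetical and is not lucid's implementation: `ToyKVCacheBase`, `ListKVCache`, and the `_crop` hook name are invented for illustration, tokens are plain strings rather than tensors, and the crop semantics (keep the most recent entries) follow the comment in the crop example on this page.

```python
from abc import ABC, abstractmethod


class ToyKVCacheBase(ABC):
    """Hypothetical stand-in for the contract above; not lucid's implementation."""

    @abstractmethod
    def update(self, key, value, layer_idx, cache_position=None): ...

    @abstractmethod
    def get(self, layer_idx): ...

    @abstractmethod
    def get_seq_length(self, layer_idx=0): ...

    @abstractmethod
    def reset(self): ...

    @abstractmethod
    def _crop(self, max_length): ...  # hypothetical hook name

    def crop(self, max_length):
        # Shared public entry point that delegates to the subclass hook.
        self._crop(max_length)


class ListKVCache(ToyKVCacheBase):
    """Toy cache: each layer stores per-token keys and values in Python lists."""

    def __init__(self):
        self._keys = {}
        self._values = {}

    def update(self, key, value, layer_idx, cache_position=None):
        # cache_position is accepted but ignored here: new tokens are appended.
        self._keys.setdefault(layer_idx, []).extend(key)
        self._values.setdefault(layer_idx, []).extend(value)

    def get(self, layer_idx):
        return self._keys[layer_idx], self._values[layer_idx]

    def get_seq_length(self, layer_idx=0):
        return len(self._keys.get(layer_idx, []))

    def reset(self):
        self._keys.clear()
        self._values.clear()

    def _crop(self, max_length):
        # Keep the most recent max_length tokens in every layer.
        for layer_idx in self._keys:
            start = max(len(self._keys[layer_idx]) - max_length, 0)
            self._keys[layer_idx] = self._keys[layer_idx][start:]
            self._values[layer_idx] = self._values[layer_idx][start:]


cache = ListKVCache()
cache.update(["k0", "k1", "k2"], ["v0", "v1", "v2"], layer_idx=0)
cache.crop(2)
print(cache.get(0))             # (['k1', 'k2'], ['v1', 'v2'])
print(cache.get_seq_length(0))  # 2
```

Instantiating `ToyKVCacheBase` directly raises `TypeError`, which mirrors the idea that only concrete caches are meant to be constructed.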

Common public methods

KVCache also provides shared utility methods:

reorder_cache(beam_idx)
batch_select_indices(indices)
batch_repeat_interleave(repeats)
crop(max_length)
get_max_cache_shape() (returns None by default)
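The batch-oriented utilities can be pictured as operations on the batch axis. The sketch below uses plain Python lists in place of tensors and assumes the conventional semantics of repeat-interleave (each entry repeated in place) and index selection (gather by index); the helper names `repeat_interleave` and `select_indices` are hypothetical and are not part of lucid.

```python
def repeat_interleave(batch, repeats):
    """Repeat every batch entry `repeats` times, keeping copies adjacent."""
    return [entry for entry in batch for _ in range(repeats)]


def select_indices(batch, indices):
    """Gather batch entries in the order given by `indices`."""
    return [batch[i] for i in indices]


# Two sequences in the batch, each represented by a label instead of a tensor.
batch = ["seq_a", "seq_b"]

expanded = repeat_interleave(batch, 2)
print(expanded)  # ['seq_a', 'seq_a', 'seq_b', 'seq_b']

# Beam search keeps beams 3 and 1 and duplicates beam 0.
reordered = select_indices(expanded, [3, 1, 0, 0])
print(reordered)  # ['seq_b', 'seq_a', 'seq_a', 'seq_a']
```

Under this reading, reordering beams is simply index selection with the beam indices, which is consistent with the page describing batch_select_indices as the generic form of reorder_cache.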

Shape convention

Key and value tensors typically have the shape:

\[(B, H, T, D_h)\]

\(B\): batch size
\(H\): number of attention heads
\(T\): sequence length along the cache axis
\(D_h\): per-head dimension
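Given this shape, a rough memory estimate follows from counting elements: one key tensor and one value tensor per layer, each with \(B \cdot H \cdot T \cdot D_h\) elements. The helper below is a hypothetical sketch; the layer count and the 4-byte element size (32-bit floats) are assumptions chosen for illustration, not values taken from lucid.

```python
def kv_cache_bytes(num_layers, batch, heads, seq_len, head_dim, bytes_per_elem=4):
    """Estimate bytes for keys plus values across all layers."""
    per_tensor = batch * heads * seq_len * head_dim      # one (B, H, T, D_h) tensor
    return 2 * num_layers * per_tensor * bytes_per_elem  # 2 = key + value


size = kv_cache_bytes(num_layers=12, batch=1, heads=8, seq_len=512, head_dim=64)
print(size)          # 25165824
print(size / 2**20)  # 24.0
```

Because the estimate is linear in \(T\), halving the cached sequence length halves the footprint under these assumptions.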

Minimal API example

import lucid
import lucid.nn as nn

cache: nn.KVCache = nn.DynamicKVCache()
key = lucid.randn(1, 8, 1, 64)
value = lucid.randn(1, 8, 1, 64)

cache.update(key, value, layer_idx=0)
kv = cache.get(0)
print(cache.get_seq_length(0))  # 1

Beam utility example

import lucid
import lucid.nn as nn

cache = nn.DynamicKVCache()
# ... cache is already populated ...

# Expand batch for beam search
cache.batch_repeat_interleave(4)

# Select/reorder active beams
active = lucid.Tensor([3, 1, 0, 2], dtype=lucid.Int32)
cache.reorder_cache(active)

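# Equivalent generic call (an alternative to reorder_cache, not an extra step;
# calling both would apply the permutation twice):
# cache.batch_select_indices(active)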

Crop and reset example

cache.crop(512)  # keep only the latest 512 tokens
cache.reset()    # clear all layers

When to use this class directly

Use KVCache as a type annotation and public API contract. Instantiate DynamicKVCache or StaticKVCache for actual runtime behavior.
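The idea of annotating against the abstract type while passing a concrete cache at runtime can be sketched without lucid. In the example below, `KVCacheLike`, `CountingCache`, and `tokens_left` are hypothetical names; with lucid, the annotation would presumably be `nn.KVCache` and the argument a `DynamicKVCache` or `StaticKVCache` instance.

```python
from typing import Protocol


class KVCacheLike(Protocol):
    """Hypothetical structural stand-in for the abstract cache type."""

    def get_seq_length(self, layer_idx: int = 0) -> int: ...
    def reset(self) -> None: ...


def tokens_left(cache: KVCacheLike, max_tokens: int) -> int:
    """Return how many more tokens fit under a caller-chosen budget."""
    return max(max_tokens - cache.get_seq_length(), 0)


class CountingCache:
    """Toy concrete cache that only tracks a length."""

    def __init__(self, length: int = 0) -> None:
        self._length = length

    def get_seq_length(self, layer_idx: int = 0) -> int:
        return self._length

    def reset(self) -> None:
        self._length = 0


cache = CountingCache(length=500)
print(tokens_left(cache, 512))  # 12
cache.reset()
print(tokens_left(cache, 512))  # 512
```

Code written this way depends only on the shared contract, so swapping one concrete cache for another does not require changing the calling function.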