nn.KVCache¶
class lucid.nn.KVCache¶
Overview¶
KVCache is the abstract base class for Transformer KV cache implementations.
It extends lucid.nn.Cache with KV-specific APIs used by attention
modules and generation loops.
Class Signature¶
class lucid.nn.KVCache()
All concrete caches must implement:
update(key, value, layer_idx, cache_position=None)
get(layer_idx)
get_seq_length(layer_idx=0)
reset()
an internal crop hook, invoked by crop(max_length)
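The contract above can be sketched in plain Python. This is a toy illustration only: ListKVCache is a hypothetical class name, and nested Python lists stand in for lucid tensors (a real subclass would concatenate tensors along the sequence axis).

```python
class ListKVCache:
    """Toy KV cache: per-layer lists of key/value steps (not part of lucid)."""

    def __init__(self):
        self._keys = {}    # layer_idx -> list of cached key steps
        self._values = {}  # layer_idx -> list of cached value steps

    def update(self, key, value, layer_idx, cache_position=None):
        # Append the new key/value step along the sequence axis.
        self._keys.setdefault(layer_idx, []).append(key)
        self._values.setdefault(layer_idx, []).append(value)
        return self._keys[layer_idx], self._values[layer_idx]

    def get(self, layer_idx):
        return self._keys.get(layer_idx, []), self._values.get(layer_idx, [])

    def get_seq_length(self, layer_idx=0):
        return len(self._keys.get(layer_idx, []))

    def reset(self):
        self._keys.clear()
        self._values.clear()

    def crop(self, max_length):
        # Keep only the most recent max_length positions per layer.
        for idx in self._keys:
            self._keys[idx] = self._keys[idx][-max_length:]
            self._values[idx] = self._values[idx][-max_length:]


cache = ListKVCache()
cache.update("k0", "v0", layer_idx=0)
cache.update("k1", "v1", layer_idx=0)
print(cache.get_seq_length(0))  # 2
cache.crop(1)
print(cache.get(0))             # (['k1'], ['v1'])
cache.reset()
print(cache.get_seq_length(0))  # 0
```

The essential invariant is that update appends along the sequence axis and get_seq_length reports the current length of that axis for a given layer.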
Common public methods¶
KVCache also provides shared utility methods:
reorder_cache(beam_idx)
batch_select_indices(indices)
batch_repeat_interleave(repeats)
crop(max_length)
get_max_cache_shape() (returns None by default)
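The batch-axis behavior of batch_repeat_interleave can be illustrated without lucid at all. Below, a plain Python list stands in for the batch dimension of the cached tensors; the semantics shown (consecutive repetition of each batch entry) are an assumption based on the method name.

```python
# Illustrative semantics of batch_repeat_interleave(repeats):
# each batch entry is repeated consecutively, e.g. to expand every
# sequence into `repeats` beams before beam search begins.
batch = ["seq0", "seq1"]
repeats = 3
expanded = [entry for entry in batch for _ in range(repeats)]
print(expanded)  # ['seq0', 'seq0', 'seq0', 'seq1', 'seq1', 'seq1']
```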
Shape convention¶
Typical key/value shape is \((B, H, T, D_h)\), where:
\(B\): batch size
\(H\): number of attention heads
\(T\): sequence length along the cache axis
\(D_h\): per-head dimension
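Under this convention, each decoding step concatenates one position onto the \(T\) axis. The sketch below tracks only shape tuples to show the growth, assuming concatenation-based caching (as a dynamic cache would do); concat_seq is a hypothetical helper, not a lucid function.

```python
# Track only shape tuples (B, H, T, D_h) to show how the cache
# grows along the sequence axis T during generation.
def concat_seq(cached, new):
    B, H, T, Dh = cached
    b, h, t, dh = new
    assert (B, H, Dh) == (b, h, dh), "only T may differ"
    return (B, H, T + t, Dh)

shape = (1, 8, 1, 64)           # prefill with one token
for _ in range(3):              # three single-token decode steps
    shape = concat_seq(shape, (1, 8, 1, 64))
print(shape)  # (1, 8, 4, 64)
```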
Minimal API example¶
import lucid
import lucid.nn as nn
cache: nn.KVCache = nn.DynamicKVCache()
key = lucid.randn(1, 8, 1, 64)
value = lucid.randn(1, 8, 1, 64)
cache.update(key, value, layer_idx=0)
kv = cache.get(0)
print(cache.get_seq_length(0)) # 1
Beam utility example¶
import lucid
import lucid.nn as nn
cache = nn.DynamicKVCache()
# ... cache is already populated ...
# Expand batch for beam search
cache.batch_repeat_interleave(4)
# Select/reorder active beams
active = lucid.Tensor([3, 1, 0, 2], dtype=lucid.Int32)
cache.reorder_cache(active)
# batch_select_indices is the equivalent generic call
# (use one or the other, not both in sequence)
cache.batch_select_indices(active)
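What reorder_cache does to the batch axis can be shown with a plain Python list standing in for one layer's cached keys; the gather-by-index semantics here are the standard beam-search reordering, assumed to match lucid's behavior.

```python
# reorder_cache([3, 1, 0, 2]) gathers batch rows 3, 1, 0, 2,
# so each beam slot now holds the cache of its selected parent beam.
layer_keys = ["beam0", "beam1", "beam2", "beam3"]
beam_idx = [3, 1, 0, 2]
reordered = [layer_keys[i] for i in beam_idx]
print(reordered)  # ['beam3', 'beam1', 'beam0', 'beam2']
```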
Crop and reset example¶
cache.crop(512) # keep only the latest 512 tokens
cache.reset() # clear all layers
When to use this class directly¶
Use KVCache as a type annotation and public API contract. Instantiate DynamicKVCache or StaticKVCache for actual runtime behavior.