mpcompress.latent_codecs
FeatureScaleHyperprior
FeatureScaleHyperprior(N, M, **kwargs)
Scale Hyperprior model from J. Balle, D. Minnen, S. Singh, S.J. Hwang,
N. Johnston: "Variational Image Compression with a Scale Hyperprior"
<https://arxiv.org/abs/1802.01436>, Int. Conf. on Learning Representations
(ICLR), 2018.
```
              ┌───┐    y     ┌───┐  z  ┌───┐ z_hat      z_hat ┌───┐
        x ──►─┤g_a├──►─┬──►──┤h_a├──►──┤ Q ├───►───·⋯⋯·───►───┤h_s├─┐
              └───┘    │     └───┘     └───┘        EB        └───┘ │
                       ▼                                            │
                     ┌─┴─┐                                          │
                     │ Q │                                          ▼
                     └─┬─┘                                          │
                       │                                            │
                 y_hat ▼                                            │
                       │                                            │
                       ·                                            │
                    GC : ◄─────────────────────◄────────────────────┘
                       ·        scales_hat
                       │
                 y_hat ▼
                       │
              ┌───┐    │
    x_hat ──◄─┤g_s├────┘
              └───┘

    EB = Entropy bottleneck
    GC = Gaussian conditional
```
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `N` | | Number of channels in the main network. | *required* |
| `M` | | Number of channels in the expansion layers (last layer of the encoder and last layer of the hyperprior decoder). | *required* |
| `**kwargs` | | Additional keyword arguments passed to the parent class. | `{}` |
downsampling_factor
property
downsampling_factor: int
Compute the downsampling factor of the model.
Returns:

| Name | Type | Description |
|---|---|---|
| `factor` | `int` | Downsampling factor (64 for this architecture). |
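
Inputs whose spatial dimensions are not multiples of the downsampling factor generally need padding before `compress`. A minimal sketch, assuming the `mpcompress.latent_codecs` import path shown above, placeholder channel sizes, and a replicate-padding strategy (all assumptions, not part of this API):

```python
import torch
import torch.nn.functional as F

from mpcompress.latent_codecs import FeatureScaleHyperprior  # import path assumed from this page

model = FeatureScaleHyperprior(N=128, M=192)   # channel sizes are placeholders
factor = model.downsampling_factor             # 64 for this architecture

x = torch.rand(1, 3, 500, 750)                 # dummy input; channel count is an assumption
h, w = x.shape[-2:]
pad_h = (factor - h % factor) % factor
pad_w = (factor - w % factor) % factor
# Pad bottom/right so both spatial dimensions become multiples of the factor.
x_padded = F.pad(x, (0, pad_w, 0, pad_h), mode="replicate")
```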
compress
compress(x)
Compress input tensor to bitstrings.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input tensor to compress. | *required* |

Returns:

| Name | Type | Description |
|---|---|---|
| `output` | | Dictionary containing: |
decompress
decompress(strings, shape)
Decompress bitstrings to reconstructed tensor.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `strings` | | List of compressed bitstrings `[y_strings, z_strings]`. Must contain exactly 2 elements. | *required* |
| `shape` | | Spatial shape of the hyper latents (H, W). | *required* |

Returns:

| Name | Type | Description |
|---|---|---|
| `output` | | Dictionary containing: |
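
A hedged round-trip sketch using the `compress`/`decompress` signatures documented above; the import path, the channel sizes, the `update()` call, and the dictionary keys follow CompressAI conventions and are assumptions here:

```python
import torch

from mpcompress.latent_codecs import FeatureScaleHyperprior  # import path assumed from this page

model = FeatureScaleHyperprior(N=128, M=192).eval()  # channel sizes are placeholders
model.update()  # build entropy-coder CDF tables before coding (CompressAI-style; assumed)

x = torch.rand(1, 3, 256, 256)  # dummy input; channel count is an assumption
with torch.no_grad():
    enc = model.compress(x)
    # The "strings" / "shape" / "x_hat" keys follow the usual CompressAI layout (assumed).
    dec = model.decompress(enc["strings"], enc["shape"])
x_hat = dec["x_hat"]
```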
forward
forward(x)
Forward pass through the Scale Hyperprior model.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input tensor to compress. | *required* |

Returns:

| Name | Type | Description |
|---|---|---|
| `output` | | Dictionary containing: |
from_state_dict
classmethod
from_state_dict(state_dict)
Create a new model instance from state dictionary.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `state_dict` | | State dictionary containing model weights. | *required* |

Returns:

| Name | Type | Description |
|---|---|---|
| `model` | | New model instance with loaded weights. |
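
A minimal loading sketch, assuming a plain `torch.save`d state dict at a hypothetical path (checkpoints that wrap the weights in an extra key would need unwrapping first):

```python
import torch

from mpcompress.latent_codecs import FeatureScaleHyperprior  # import path assumed from this page

state_dict = torch.load("scale_hyperprior.pth", map_location="cpu")  # hypothetical checkpoint path
model = FeatureScaleHyperprior.from_state_dict(state_dict).eval()
```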
HyperLatentCodecWithCtx
HyperLatentCodecWithCtx(entropy_bottleneck: EntropyBottleneck, h_a: Module, h_s: Module, quantizer: str = 'noise', **kwargs)
Entropy bottleneck codec with surrounding h_a and h_s transforms.
"Hyper" side-information branch introduced in
"Variational Image Compression with a Scale Hyperprior"
<https://arxiv.org/abs/1802.01436>,
by J. Balle, D. Minnen, S. Singh, S.J. Hwang, and N. Johnston,
International Conference on Learning Representations (ICLR), 2018.
HyperLatentCodecWithCtx should be used inside
HyperpriorLatentCodecWithCtx to construct a full hyperprior.
```
           ┌───┐  z  ┌───┐ z_hat      z_hat ┌───┐
    y ──►──┤h_a├──►──┤ Q ├───►───····───►───┤h_s├──►── params
           └───┘     └───┘        EB        └───┘
```
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `entropy_bottleneck` | `EntropyBottleneck` | Entropy bottleneck module for compressing hyper latents. | *required* |
| `h_a` | `Module` | Analysis transform that maps input to hyper latents. | *required* |
| `h_s` | `Module` | Synthesis transform that maps hyper latents to parameters. | *required* |
| `quantizer` | `str` | Quantization method. Options: "noise" (default) or "ste". | `'noise'` |
| `**kwargs` | | Additional keyword arguments passed to the parent class. | `{}` |
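
A minimal construction sketch. The convolutional `h_a`/`h_s` stacks below are placeholders, and the `EntropyBottleneck` import from CompressAI as well as the `mpcompress.latent_codecs` import path are assumptions:

```python
import torch.nn as nn
from compressai.entropy_models import EntropyBottleneck  # assumed dependency

from mpcompress.latent_codecs import HyperLatentCodecWithCtx  # import path assumed from this page

M, Z = 192, 128  # main / hyper latent channel counts (placeholders)

hyper_codec = HyperLatentCodecWithCtx(
    entropy_bottleneck=EntropyBottleneck(Z),
    # Placeholder analysis transform: y -> z.
    h_a=nn.Sequential(
        nn.Conv2d(M, Z, kernel_size=3, stride=2, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(Z, Z, kernel_size=3, padding=1),
    ),
    # Placeholder synthesis transform: z_hat -> entropy parameters for y.
    h_s=nn.Sequential(
        nn.ConvTranspose2d(Z, M, kernel_size=3, stride=2, padding=1, output_padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(M, 2 * M, kernel_size=3, padding=1),
    ),
    quantizer="noise",
)
```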
compress
compress(y: Tensor, ctx: Tensor) -> Dict[str, Any]
Compress main latents to bitstrings.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `y` | `Tensor` | Main latents to compress. | *required* |
| `ctx` | `Tensor` | Context tensor for conditional processing. | *required* |

Returns:

| Name | Type | Description |
|---|---|---|
| `output` | `Dict[str, Any]` | Dictionary containing: |
decompress
decompress(strings: List[List[bytes]], shape: Tuple[int, int], ctx: Tensor, **kwargs) -> Dict[str, Any]
Decompress bitstrings to parameters.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `strings` | `List[List[bytes]]` | List containing compressed bitstrings `[z_strings]`. | *required* |
| `shape` | `Tuple[int, int]` | Spatial shape of hyper latents (H, W). | *required* |
| `ctx` | `Tensor` | Context tensor for conditional processing. | *required* |
| `**kwargs` | | Additional keyword arguments (unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `output` | `Dict[str, Any]` | Dictionary containing: |
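
Continuing the construction sketch above, a hedged compress/decompress round trip through the hyper branch; the tensor shapes, the `update()` call, and the dictionary keys are assumptions:

```python
import torch

hyper_codec.entropy_bottleneck.update()  # build CDF tables before entropy coding (CompressAI-style; assumed)

y = torch.rand(1, 192, 16, 16)    # main latents (placeholder shape)
ctx = torch.rand(1, 192, 16, 16)  # context tensor (placeholder shape)

with torch.no_grad():
    enc = hyper_codec.compress(y, ctx)
    dec = hyper_codec.decompress(enc["strings"], enc["shape"], ctx)  # dict keys assumed
params = dec["params"]  # entropy parameters for the y branch (key assumed)
```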
forward
forward(y: Tensor, ctx: Tensor) -> Dict[str, Any]
Forward pass through the hyper latent codec.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `y` | `Tensor` | Main latents to process. | *required* |
| `ctx` | `Tensor` | Context tensor for conditional processing. | *required* |

Returns:

| Name | Type | Description |
|---|---|---|
| `output` | `Dict[str, Any]` | Dictionary containing: |
HyperpriorLatentCodecWithCtx
HyperpriorLatentCodecWithCtx(latent_codec: Mapping[str, LatentCodec], **kwargs)
Hyperprior codec constructed from a latent codec for `y` that
compresses `y` using parameters from the hyper branch.
Hyperprior entropy modeling introduced in
"Variational Image Compression with a Scale Hyperprior"
<https://arxiv.org/abs/1802.01436>,
by J. Balle, D. Minnen, S. Singh, S.J. Hwang, and N. Johnston,
International Conference on Learning Representations (ICLR), 2018.
```
             ┌──────────┐
        ┌─►──┤ lc_hyper ├──►─┐
        │    └──────────┘    │
        │                    ▼ params
        │                    │
        │                 ┌──┴───┐
    y ──┴───────►─────────┤ lc_y ├───►── y_hat
                          └──────┘
```
By default, the following codec is constructed:
```
         ┌───┐  z  ┌───┐ z_hat      z_hat ┌───┐
    ┌─►──┤h_a├──►──┤ Q ├───►───····───►───┤h_s├──►─┐
    │    └───┘     └───┘        EB        └───┘    │
    │                                              │
    │                  ┌──────────────◄────────────┘
    │                  │            params
    │               ┌──┴──┐
    │               │  EP │
    │               └──┬──┘
    │                  │
    │   ┌───┐  y_hat   ▼
y ──┴─►─┤ Q ├────►────····────►── y_hat
        └───┘          GC
```
Common configurations of latent codecs include:

- entropy bottleneck `hyper` (default) and Gaussian conditional `y` (default)
- entropy bottleneck `hyper` (default) and autoregressive `y`
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `latent_codec` | `Mapping[str, LatentCodec]` | Dictionary of latent codecs containing at least "y" and "hyper" keys: "y" is the codec for the main latents; "hyper" is the codec for the hyper latents (side information). | *required* |
| `**kwargs` | | Additional keyword arguments passed to the parent class. | `{}` |
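
Continuing the `HyperLatentCodecWithCtx` sketch above, a minimal construction of the codec dictionary. The Gaussian-conditional codec for `"y"` mirrors the default diagram and is an assumption, as is the CompressAI import:

```python
from compressai.latent_codecs import GaussianConditionalLatentCodec  # assumed dependency

from mpcompress.latent_codecs import HyperpriorLatentCodecWithCtx  # import path assumed from this page

codec = HyperpriorLatentCodecWithCtx(
    latent_codec={
        "y": GaussianConditionalLatentCodec(),  # entropy-codes y with params from the hyper branch
        "hyper": hyper_codec,                   # HyperLatentCodecWithCtx instance from the sketch above
    },
)
```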
__getitem__
__getitem__(key: str) -> LatentCodec
Get a latent codec by key.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `key` | `str` | Key to access latent codec (e.g., "y" or "hyper"). | *required* |

Returns:

| Name | Type | Description |
|---|---|---|
| `codec` | `LatentCodec` | Requested latent codec. |
compress
compress(y: Tensor, ctx: Tensor) -> Dict[str, Any]
Compress main latents to bitstrings.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `y` | `Tensor` | Main latents to compress. | *required* |
| `ctx` | `Tensor` | Context tensor for conditional processing. | *required* |

Returns:

| Name | Type | Description |
|---|---|---|
| `output` | `Dict[str, Any]` | Dictionary containing: |
decompress
decompress(strings: List[List[bytes]], shape: Dict[str, Tuple[int, ...]], ctx: Tensor, **kwargs) -> Dict[str, Any]
Decompress bitstrings to reconstructed main latents.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `strings` | `List[List[bytes]]` | List of compressed bitstrings, with y_strings followed by z_strings. All y_strings must have the same length as z_strings. | *required* |
| `shape` | `Dict[str, Tuple[int, ...]]` | Dictionary with keys "y" and "hyper" containing spatial shapes for main and hyper latents respectively. | *required* |
| `ctx` | `Tensor` | Context tensor for conditional processing. | *required* |
| `**kwargs` | | Additional keyword arguments (unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `output` | `Dict[str, Any]` | Dictionary containing: |
forward
forward(y: Tensor, ctx: Tensor) -> Dict[str, Any]
Forward pass through the hyperprior codec.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `y` | `Tensor` | Main latents to process. | *required* |
| `ctx` | `Tensor` | Context tensor for conditional processing. | *required* |

Returns:

| Name | Type | Description |
|---|---|---|
| `output` | `Dict[str, Any]` | Dictionary containing: |
VitUnionLatentCodec
VitUnionLatentCodec(h_dim=384, y_dim=256, z_dim=192, groups=16, num_prefix_tokens=1, **kwargs)
ViT-based latent codec with joint modeling of the cls token and patch tokens.
This codec takes ViT features as input and compresses the 2D patch tokens using a hyperprior + space-channel context model (SCCTX) as in [He2022]. It reconstructs the ViT feature map and re-injects learned register tokens before passing through transformer blocks.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `h_dim` | | Channel dimension of ViT features. | `384` |
| `y_dim` | | Channel dimension of primary latent representation. | `256` |
| `z_dim` | | Channel dimension of hyperprior latent representation. | `192` |
| `groups` | | Channel groups for channel-wise context modeling. If int, the channels are evenly split; if list, must sum to | `16` |
| `num_prefix_tokens` | | Number of prefix/register tokens in the ViT feature. | `1` |
| `**kwargs` | | Extra keyword arguments for compatibility (unused). | `{}` |
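
A hedged usage sketch using the documented constructor defaults and the `forward(h, token_res)` signature; the token layout of `h` (one prefix token followed by the patch tokens) and the import path are assumptions:

```python
import torch

from mpcompress.latent_codecs import VitUnionLatentCodec  # import path assumed from this page

codec = VitUnionLatentCodec(h_dim=384, y_dim=256, z_dim=192, groups=16, num_prefix_tokens=1)

token_res = (16, 16)                                      # spatial token resolution (placeholder)
h = torch.rand(1, 1 + token_res[0] * token_res[1], 384)   # assumed (B, 1 + H*W, C) token layout

out = codec(h, token_res)  # forward pass for end-to-end rate-distortion training
```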
compress
compress(h, token_res, **kwargs)
Compress ViT features into entropy-coded bitstreams.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `h` | | ViT output tensor of shape | *required* |
| `token_res` | | Spatial token resolution | *required* |
| `**kwargs` | | Unused keyword arguments for API compatibility. | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `out` | | A dictionary with keys: |
decompress
decompress(strings, pstate, **kwargs)
Decompress entropy-coded bitstreams back to ViT features.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `strings` | | Bitstreams produced by `compress`. | *required* |
| `pstate` | | Side information produced by `compress`. | *required* |
| `**kwargs` | | Unused keyword arguments for API compatibility. | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `out` | | A dictionary with key: |
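
Continuing the sketch above, a compress/decompress round trip at inference time; the dictionary keys passed between `compress` and `decompress`, and the `"h_hat"` output key, are assumptions:

```python
import torch

codec.eval()
with torch.no_grad():
    enc = codec.compress(h, token_res)
    dec = codec.decompress(enc["strings"], enc["pstate"])  # "strings"/"pstate" keys assumed
h_hat = dec["h_hat"]  # reconstructed ViT features (key assumed)
```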
forward
forward(h, token_res, **kwargs)
Forward pass for end-to-end rate–distortion training.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `h` | | ViT output tensor of shape | *required* |
| `token_res` | | Spatial token resolution | *required* |
| `**kwargs` | | Unused keyword arguments for API compatibility. | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `out` | | A dictionary with keys: |
VitUnionLatentCodecWithCtx
VitUnionLatentCodecWithCtx(h_dim=384, y_dim=256, z_dim=192, ctx_dim=256, groups=16, **kwargs)
ViT union latent codec conditioned on an external context feature map.
This codec jointly compresses ViT patch tokens and an additional context feature map. The context is injected into both the analysis and synthesis transforms as well as the hyperprior pathway.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `h_dim` | | Channel dimension of ViT features. | `384` |
| `y_dim` | | Channel dimension of primary latent representation. | `256` |
| `z_dim` | | Channel dimension of hyperprior latent representation. | `192` |
| `ctx_dim` | | Channel dimension of the external context feature map. | `256` |
| `groups` | | Channel groups for channel-wise context modeling. | `16` |
| `**kwargs` | | Extra keyword arguments for compatibility (unused). | `{}` |
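
A hedged instantiation sketch using the documented defaults; the token layout of `h`, the spatial layout of `ctx`, and the import path are assumptions:

```python
import torch

from mpcompress.latent_codecs import VitUnionLatentCodecWithCtx  # import path assumed from this page

codec = VitUnionLatentCodecWithCtx(h_dim=384, y_dim=256, z_dim=192, ctx_dim=256, groups=16)

token_res = (16, 16)                                      # spatial token resolution (placeholder)
h = torch.rand(1, 1 + token_res[0] * token_res[1], 384)   # assumed (B, 1 + H*W, C) token layout
ctx = torch.rand(1, 256, 16, 16)                          # assumed (B, ctx_dim, H, W) context map

out = codec(h, ctx, token_res)  # forward pass with the context-conditioned hyperprior
```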
compress
compress(h, ctx, token_res)
Compress ViT features conditioned on a context feature map.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `h` | | ViT output tensor of shape | *required* |
| `ctx` | | Context feature map of shape | *required* |
| `token_res` | | Spatial token resolution | *required* |

Returns:

| Name | Type | Description |
|---|---|---|
| `out` | | A dictionary with keys: |
decompress
decompress(strings, pstate, ctx, **kwargs)
Decompress context-conditioned bitstreams back to ViT features.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `strings` | | Bitstreams with keys | *required* |
| `pstate` | | Side information produced by `compress`. | *required* |
| `ctx` | | Context feature map, also used at decoding time. | *required* |
| `**kwargs` | | Unused keyword arguments for API compatibility. | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `out` | | A dictionary with keys: |
forward
forward(h, ctx, token_res)
Forward pass with context-conditioned hyperprior.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `h` | | ViT output tensor of shape | *required* |
| `ctx` | | Context feature map of shape | *required* |
| `token_res` | | Spatial token resolution | *required* |

Returns:

| Name | Type | Description |
|---|---|---|
| `out` | | A dictionary with keys: |
VtmCodec
VtmCodec(repo_dir)
VTM (VVC Test Model) codec wrapper for video encoding and decoding.
This class provides an interface to VTM encoder and decoder executables for compressing and decompressing video data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `repo_dir` | | Path to VTM repository directory containing bin/ and cfg/ folders. | *required* |
compress
compress(raw_path, bin_path, width, height, qp: int, bitdepth: int = 8, chroma_format: str = '400')
Compress raw video file using VTM encoder.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `raw_path` | | Path to input raw YUV file. | *required* |
| `bin_path` | | Path to output compressed bitstream file. | *required* |
| `width` | | Video width in pixels. | *required* |
| `height` | | Video height in pixels. | *required* |
| `qp` | `int` | Quantization parameter (0-51, lower is higher quality). | *required* |
| `bitdepth` | `int` | Bit depth (8 or 10). Defaults to 8. | `8` |
| `chroma_format` | `str` | Chroma format. Defaults to "400" (grayscale). | `'400'` |
decompress
decompress(bin_path, rec_path, bit_depth=8)
Decompress VTM bitstream to raw video file.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `bin_path` | | Path to input compressed bitstream file. | *required* |
| `rec_path` | | Path to output reconstructed YUV file. | *required* |
| `bit_depth` | | Bit depth (8 or 10). Defaults to 8. | `8` |
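
A hedged usage sketch of the wrapper using the documented signatures; all file paths are placeholders, and the example assumes a VTM checkout with pre-built executables under bin/:

```python
from mpcompress.latent_codecs import VtmCodec  # import path assumed from this page

codec = VtmCodec(repo_dir="/path/to/VVCSoftware_VTM")  # placeholder path to a built VTM checkout

# Encode a raw 8-bit 4:0:0 (grayscale) planar sequence to a bitstream ...
codec.compress(
    raw_path="input.yuv",
    bin_path="stream.bin",
    width=1280,
    height=720,
    qp=32,
    bitdepth=8,
    chroma_format="400",
)
# ... and decode it back to a reconstructed YUV file.
codec.decompress(bin_path="stream.bin", rec_path="recon.yuv", bit_depth=8)
```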
VtmFeatureCodec
VtmFeatureCodec(cfg)
VTM-based feature codec for compressing neural network features.
This codec applies truncation, quantization, packing, VTM encoding/decoding, and post-processing to compress features from various model types (llama3, dinov2, sd3).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `cfg` | | Configuration object containing: | *required* |
compress
compress(org_feat, qp: int)
Compress features using VTM codec.
Expected feature shape: (N_crop, N_layer, H*W+1, C)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `org_feat` | | Original features to compress. | *required* |
| `qp` | `int` | Quantization parameter for VTM encoding. | *required* |

Returns:

| Name | Type | Description |
|---|---|---|
| `output` | | Dictionary containing: |
decompress
decompress(strings, pstate, **kwargs)
Decompress features from VTM bitstream.
Note: model_type, bit_depth, trun_low, trun_high are fixed in self.cfg.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `strings` | | Dictionary with key "vtm" containing compressed bitstring. | *required* |
| `pstate` | | State dictionary containing: | *required* |
| `**kwargs` | | Additional keyword arguments (unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `decoded` | | Dictionary containing: "h_hat" (numpy.ndarray): Decoded features with original shape. |
forward_test
forward_test(org_feat, qp: int)
Forward test method for debugging (includes timing measurements).
This method performs full encode-decode cycle and returns both compressed representation and decoded features with timing information.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `org_feat` | | Original features to compress. | *required* |
| `qp` | `int` | Quantization parameter for VTM encoding. | *required* |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_unit` | | Dictionary containing: |
| `decoded` | | Dictionary containing: |
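
A heavily hedged usage sketch. The configuration object is only partially documented on this page, so `cfg` is left as a placeholder; the feature shape follows the (N_crop, N_layer, H*W+1, C) convention noted above, and the dictionary keys passed between `compress` and `decompress` are assumptions:

```python
import numpy as np

from mpcompress.latent_codecs import VtmFeatureCodec  # import path assumed from this page

cfg = ...  # configuration object as described above (model type, bit depth, truncation range, VTM paths, ...)
codec = VtmFeatureCodec(cfg)

# Dummy features in the documented (N_crop, N_layer, H*W+1, C) layout; sizes are placeholders.
org_feat = np.random.rand(1, 4, 16 * 16 + 1, 384).astype(np.float32)

coded = codec.compress(org_feat, qp=32)
decoded = codec.decompress(coded["strings"], coded["pstate"])  # "strings"/"pstate" keys assumed
h_hat = decoded["h_hat"]  # decoded features with the original shape
```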