mpcompress.models
Dinov2OrigSlideOnlyPatchCodec
Dinov2OrigSlideOnlyPatchCodec(slide_size=[518, 518], slide_stride=[259, 259], dino_backbone={}, dino_codec={}, **kwargs)
Compression model using DINOv2-Original backbone with sliding window and VTM codec.
This model handles large images with a sliding window, processing them as overlapping patches; with the default 518-pixel window and 259-pixel stride, adjacent windows overlap by half. It extracts features with the DINOv2-Original backbone and compresses them with the VTM codec. It supports segmentation tasks but not classification tasks.
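The window placement can be illustrated with a short sketch. It shows one common sliding-window scheme in which the last window in each dimension is shifted back so that it stays inside the image. The helper name `sliding_window_boxes` is hypothetical, and the actual cropping logic inside `Dinov2OrigSlideOnlyPatchCodec` may differ.

```python
def sliding_window_boxes(img_size, slide_size=(518, 518), slide_stride=(259, 259)):
    """Return (y1, x1, y2, x2) crop boxes covering an image of size (H, W).

    Hypothetical helper: assumes the last window in each dimension is
    shifted back so that it stays inside the image.
    """
    h_img, w_img = img_size
    h_crop, w_crop = slide_size
    h_stride, w_stride = slide_stride
    # Number of window positions per dimension (always at least one).
    h_grids = max(h_img - h_crop + h_stride - 1, 0) // h_stride + 1
    w_grids = max(w_img - w_crop + w_stride - 1, 0) // w_stride + 1
    boxes = []
    for h_idx in range(h_grids):
        for w_idx in range(w_grids):
            y2 = min(h_idx * h_stride + h_crop, h_img)
            x2 = min(w_idx * w_stride + w_crop, w_img)
            # Shift the window back if it would cross the image border.
            y1 = max(y2 - h_crop, 0)
            x1 = max(x2 - w_crop, 0)
            boxes.append((y1, x1, y2, x2))
    return boxes


boxes = sliding_window_boxes((518, 1036))
print(len(boxes))  # number of crops under this scheme
```

Under these assumptions, a 518×1036 input with the default settings is covered by three windows whose left edges sit at columns 0, 259, and 518.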
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `slide_size` | list of int | Size of each sliding window patch [height, width]. Defaults to [518, 518]. | `[518, 518]` |
| `slide_stride` | list of int | Stride for sliding window [height_stride, width_stride]. Defaults to [259, 259]. | `[259, 259]` |
| `dino_backbone` | | Configuration dictionary for the DINOv2-Original backbone. Passed directly to Dinov2OrgBackbone constructor. | `{}` |
| `dino_codec` | | Configuration dictionary for the VTM feature codec. Passed directly to VtmFeatureCodec constructor. | `{}` |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Attributes:

| Name | Type | Description |
|---|---|---|
| | | The DINOv2-Original backbone model. |
| | | The VTM feature codec for compression. |
| | | Patch size used by the backbone model. |
| | | Image size expected by the backbone. |
| | | Whether the model supports dynamic input sizes. |
| | list of int | Size of each sliding window patch. |
| | list of int | Stride for sliding window. |
compress
compress(x, qp)
Compress input image to byte strings using sliding window approach.
Processes input image using sliding window, extracts features, and compresses them using VTM codec.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor of shape (B, C, H, W). | required |
| `qp` | | Quantization parameter for VTM compression. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_unit` | | Dictionary containing compressed data in CompressAI-compatible format: |
Note:
h_dino_list structure: [ [(B,L,C), ...], ..., [(B,L,C), ...] ]
stacked_feature shape: (N_crop, N_layer, H*W+1, C)
where N_crop is the number of sliding window crops.
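As background on the `qp` argument: in HEVC/VVC-style codecs such as VTM, the quantization step size roughly doubles for every increase of 6 in QP. The sketch below shows that general relationship only. The function name `approx_qstep` is hypothetical, and how `VtmFeatureCodec` maps features to VTM input and forwards `qp` is not documented here.

```python
def approx_qstep(qp):
    """Approximate quantization step size for an HEVC/VVC-style QP.

    General codec relationship, not taken from the mpcompress source:
    the step size doubles every 6 QP units.
    """
    return 2.0 ** ((qp - 4) / 6.0)


# Larger QP -> coarser quantization -> fewer bits, more distortion.
for qp in (22, 28, 34):
    print(qp, approx_qstep(qp))
```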
decompress
decompress(coded_unit, tasks=[], **kwargs)
Decompress byte strings to task-specific features using sliding window.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `coded_unit` | | Dictionary containing compressed data: | required |
| `tasks` | list of str | List of tasks to perform. Supported tasks: | `[]` |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `task_feats` | | Dictionary of task-specific features: |
forward
forward(x)
Forward pass for training (not implemented).
VTM codec does not require training, so this method raises an error.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor. | required |

Raises:

| Type | Description |
|---|---|
| | Always raised as VTM does not need training. |
forward_test
forward_test(x, qp, tasks=[], **kwargs)
Forward pass for testing/inference with compression using sliding window.
Processes input image using sliding window approach, extracts features, compresses them with VTM codec, and generates task-specific features.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor of shape (B, C, H, W). | required |
| `qp` | | Quantization parameter for VTM compression. | required |
| `tasks` | list of str | List of tasks to perform. Supported tasks: | `[]` |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_unit` | | Dictionary containing compressed data: |
| `task_feats` | | Dictionary of task-specific features: |
Note:
h_dino_list structure: [ [(B,L,C), ...], ..., [(B,L,C), ...] ]
stacked_feature shape: (N_crop, N_layer, H*W+1, C)
where N_crop is the number of sliding window crops.
get_feature_numel
get_feature_numel(x)
Calculate the total number of elements in the extracted features.
Uses sliding window approach to extract features and calculates the total number of elements across all crops and layers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor of shape (B, C, H, W). | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `numel` | | Total number of elements in the stacked feature tensor. |
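The element count follows directly from the `(N_crop, N_layer, H*W+1, C)` shape given in the notes above. The sketch below works through the arithmetic under stated assumptions: `H` and `W` are taken to be the token-grid dimensions of one crop, the extra `+1` token is assumed to be the class token, and the patch size (14), layer count (4), and channel width (768) are illustrative values, not read from the library. The function name is hypothetical.

```python
from math import prod


def stacked_feature_numel(n_crop, n_layer, crop_size=(518, 518), patch_size=14, channels=768):
    """Shape and element count of a (N_crop, N_layer, H*W+1, C) tensor.

    Hypothetical helper; patch_size and channels are illustrative defaults.
    """
    h_tokens = crop_size[0] // patch_size
    w_tokens = crop_size[1] // patch_size
    # One token per patch, plus one extra token (assumed to be the class token).
    shape = (n_crop, n_layer, h_tokens * w_tokens + 1, channels)
    return shape, prod(shape)


shape, numel = stacked_feature_numel(n_crop=3, n_layer=4)
print(shape, numel)  # (3, 4, 1370, 768) 12625920
```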
Dinov2TimmBypass
Dinov2TimmBypass(dino_backbone={}, **kwargs)
A bypass model using DINOv2-Timm backbone for feature extraction without compression.
This model performs feature extraction using a DINOv2-Timm backbone and supports multiple downstream tasks (classification, segmentation) without any compression operations. It returns empty byte strings as a placeholder for compressed data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dino_backbone` | | Configuration dictionary for the DINOv2-Timm backbone. Passed directly to Dinov2TimmBackbone constructor. | `{}` |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Attributes:

| Name | Type | Description |
|---|---|---|
| | | The DINOv2-Timm backbone model. |
| | | Patch size used by the backbone model. |
forward_test
forward_test(x, tasks=[], **kwargs)
Forward pass for testing/inference without compression.
Extracts features using the DINOv2 backbone and generates task-specific features (classification, segmentation) without performing any compression. Returns empty byte strings as a placeholder for compressed data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor of shape (B, C, H, W). | required |
| `tasks` | list of str | List of tasks to perform. Supported tasks: | `[]` |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_unit` | | Dictionary containing: |
| `task_feats` | | Dictionary of task-specific features |
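The bypass pattern can be sketched in a few lines: features go straight to the task heads, and an empty byte string stands in for the bitstream so that downstream rate accounting still works and reports zero bytes. Everything named here is hypothetical, including the `"strings"` key (borrowed from the CompressAI convention), the tuple return layout, and the way features are handed to tasks. It is not the library's implementation.

```python
def bypass_forward_test(features, tasks=()):
    """Hypothetical stand-in for a no-compression forward_test."""
    # No bitstream is produced, so an empty byte string acts as a placeholder.
    coded_unit = {"strings": [[b""]]}
    # Hand the uncompressed features to every requested task unchanged.
    task_feats = {task: features for task in tasks}
    return coded_unit, task_feats


def total_bytes(coded_unit):
    """Sum the length of every byte string in a coded unit."""
    return sum(len(s) for group in coded_unit["strings"] for s in group)


coded_unit, task_feats = bypass_forward_test([0.1, 0.2], tasks=["seg"])
print(total_bytes(coded_unit))  # 0
```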
get_feature_numel
get_feature_numel(x)
Calculate the total number of elements in the extracted features.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor of shape (B, C, H, W). | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `numel` | | Total number of elements in the feature tensor. |
Dinov2TimmOnlyPatchCodec
Dinov2TimmOnlyPatchCodec(dino_backbone={}, dino_codec={}, **kwargs)
Compression model using DINOv2-Timm backbone with VTM feature codec.
This model extracts features using a DINOv2-Timm backbone and compresses them using VTM (Video Test Model) codec. It supports segmentation tasks but not classification tasks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dino_backbone` | | Configuration dictionary for the DINOv2-Timm backbone. Passed directly to Dinov2TimmBackbone constructor. | `{}` |
| `dino_codec` | | Configuration dictionary for the VTM feature codec. Passed directly to VtmFeatureCodec constructor. | `{}` |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Attributes:

| Name | Type | Description |
|---|---|---|
| | | The DINOv2-Timm backbone model. |
| | | The VTM feature codec for compression. |
| | | Patch size used by the backbone model. |
| | | Image size expected by the backbone. |
| | | Whether the model supports dynamic input sizes. |
compress
compress(x, qp)
Compress input image to byte strings.
Extracts features using DINOv2 backbone and compresses them using VTM codec.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor of shape (B, C, H, W). | required |
| `qp` | | Quantization parameter for VTM compression. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_unit` | | Dictionary containing compressed data: |
decompress
decompress(coded_unit, tasks=[], **kwargs)
Decompress byte strings to task-specific features.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `coded_unit` | | Dictionary containing compressed data: | required |
| `tasks` | list of str | List of tasks to perform. Supported tasks: | `[]` |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `task_feats` | | Dictionary of task-specific features: |
forward
forward(x)
Forward pass for training (not implemented).
VTM codec does not require training, so this method raises an error.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor. | required |

Raises:

| Type | Description |
|---|---|
| | Always raised as VTM does not need training. |
forward_test
forward_test(x, qp, tasks, **kwargs)
Forward pass for testing/inference with compression.
Extracts features using DINOv2 backbone, compresses them with VTM codec, and generates task-specific features.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor of shape (B, C, H, W). | required |
| `qp` | | Quantization parameter for VTM compression. | required |
| `tasks` | list of str | List of tasks to perform. Supported tasks: | required |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_unit` | | Dictionary containing compressed data: |
| `task_feats` | | Dictionary of task-specific features: |
get_feature_numel
get_feature_numel(x)
Calculate the total number of elements in the extracted features.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor of shape (B, C, H, W). | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `numel` | | Total number of elements in the feature tensor after segmentation decoding. |
MPC_I1
MPC_I1(vqgan_config, **kwargs)
Multi-Purpose Compression model using VQGAN backbone only.
This is a single-layer compression model that uses VQGAN for feature extraction and a uniform token codec for compression. It provides basic image reconstruction capabilities.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `vqgan_config` | | Configuration dictionary for the VQGAN backbone. Passed directly to VqganBackbone constructor. | required |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Attributes:

| Name | Type | Description |
|---|---|---|
| | | The VQGAN backbone model. |
| | | The uniform token codec for compression. |
| | | Patch size used by the model (fixed at 16). |
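To give a feel for the rate of a token-based codec with a fixed patch size of 16, the sketch below counts tokens and bits for one image. It rests on two assumptions that are not confirmed by this page: that a "uniform" token codec spends a fixed `ceil(log2(K))` bits per token, and that the codebook holds `K = 1024` entries. The function name `vqgan_token_budget` is hypothetical.

```python
import math


def vqgan_token_budget(height, width, patch_size=16, codebook_size=1024):
    """Token count, total bits, and bits per pixel under fixed-length coding.

    Hypothetical helper; codebook_size is an illustrative value.
    """
    # One token per patch_size x patch_size block of pixels.
    n_tokens = (height // patch_size) * (width // patch_size)
    # Fixed-length code: every token costs the same number of bits.
    bits_per_token = math.ceil(math.log2(codebook_size))
    total_bits = n_tokens * bits_per_token
    return n_tokens, total_bits, total_bits / (height * width)


n_tokens, total_bits, bpp = vqgan_token_budget(256, 256)
print(n_tokens, total_bits, bpp)  # 256 2560 0.0390625
```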
compress
compress(x, **kwargs)
Compress input image to byte strings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor of shape (B, C, H, W). | required |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_unit` | | Dictionary containing compressed data: |
decompress
decompress(coded_unit, **kwargs)
Decompress byte strings to reconstructed image and features.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `coded_unit` | | Dictionary containing compressed data: | required |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `task_feats` | | Dictionary of task-specific features: |
forward
forward(x, **kwargs)
Forward pass for training.
Encodes input image using VQGAN, compresses tokens, and decodes to reconstructed image.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor of shape (B, C, H, W). | required |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `out` | | Dictionary containing: |
MPC_I12
MPC_I12(vqgan_backbone={}, vqgan_codec={}, dino_backbone={}, dino_codec={}, **kwargs)
Multi-Purpose Compression model with two layers: VQGAN and DINOv2.
This is a two-layer compression model that combines VQGAN (layer 1) and DINOv2 (layer 2) for hierarchical compression. The DINOv2 codec uses VQGAN context for conditional compression. Supports multiple tasks including reconstruction, classification, and segmentation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `vqgan_backbone` | | Configuration dictionary for the VQGAN backbone. Passed directly to VqganBackbone constructor. | `{}` |
| `vqgan_codec` | | Configuration dictionary for the VQGAN codec. Passed directly to UniformTokenCodec constructor. | `{}` |
| `dino_backbone` | | Configuration dictionary for the DINOv2 backbone. Passed directly to Dinov2TimmBackbone constructor. | `{}` |
| `dino_codec` | | Configuration dictionary for the DINO codec. Must contain "h_dim" and "ctx_dim" keys for the conditional decoder. Passed directly to VitUnionLatentCodecWithCtx constructor. | `{}` |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Attributes:

| Name | Type | Description |
|---|---|---|
| | | The VQGAN backbone model. |
| | | The VQGAN codec. |
| | | The DINOv2 backbone model. |
| | | The DINO codec with context. |
| | | Patch size used by the DINOv2 backbone. |
| | | Conditional decoder for enhancing VQGAN reconstruction using DINOv2 features. |
extract_feature
extract_feature(x, **kwargs)
Extract features from input image for offline training.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor of shape (B, C, H, W). | required |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `out` | | Dictionary containing: |
forward
forward(x, **kwargs)
Forward pass for training.
Processes input through both VQGAN and DINOv2 layers, performs conditional compression, and returns features and likelihoods for training.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor of shape (B, C, H, W). | required |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `out` | | Dictionary containing: |
forward_test
forward_test(x, tasks, **kwargs)
Forward pass for testing/inference with compression.
Processes input through both layers, compresses features, and generates task-specific outputs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor of shape (B, C, H, W). | required |
| `tasks` | list of str | List of tasks to perform. Supported tasks: | required |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_data` | | Dictionary containing compressed data: |
| `task_feats` | | Dictionary of task-specific features: |
offline_forward
offline_forward(data, device, **kwargs)
Offline forward pass for training with pre-extracted features.
Processes pre-extracted VQGAN tokens and DINO features for training. This method is used when features are extracted separately to save memory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | | Dictionary containing: | required |
| `device` | | Device to move tensors to. | required |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `out` | | Dictionary containing: |
MPC_I12_CtxAsHyper
MPC_I12_CtxAsHyper(vqgan_backbone={}, vqgan_codec={}, dino_backbone={}, dino_codec={}, **kwargs)
Multi-Purpose Compression model with context as hyperprior.
This is a variant of MPC_I12 where the VQGAN context is treated as hyperprior for the DINOv2 codec. Similar to MPC_I12, it combines VQGAN (layer 1) and DINOv2 (layer 2) for hierarchical compression, but uses a different codec architecture that treats context as hyperprior.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `vqgan_backbone` | | Configuration dictionary for the VQGAN backbone. Passed directly to VqganBackbone constructor. | `{}` |
| `vqgan_codec` | | Configuration dictionary for the VQGAN codec. Passed directly to UniformTokenCodec constructor. | `{}` |
| `dino_backbone` | | Configuration dictionary for the DINOv2 backbone. Passed directly to Dinov2TimmBackbone constructor. | `{}` |
| `dino_codec` | | Configuration dictionary for the DINO codec. Must contain "h_dim" and "ctx_dim" keys for the conditional decoder. Passed directly to VitUnionLatentCodecCtxAsHyper constructor. | `{}` |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Attributes:

| Name | Type | Description |
|---|---|---|
| | | The VQGAN backbone model. |
| | | The VQGAN codec. |
| | | The DINOv2 backbone model. |
| | | The DINO codec with context as hyperprior. |
| | | Patch size used by the DINOv2 backbone. |
| | | Conditional decoder for enhancing VQGAN reconstruction using DINOv2 features. |
compress
compress(x, **kwargs)
Compress input image to byte strings using both layers.
Processes input through both VQGAN and DINOv2 layers, compresses features with context as hyperprior, and returns compressed data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor of shape (B, C, H, W). | required |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_data` | | Dictionary containing compressed data: |
decompress
decompress(coded_data, tasks=[], **kwargs)
Decompress byte strings to task-specific features.
Decompresses both VQGAN and DINOv2 layers, uses context as hyperprior for DINO decompression, and generates task-specific outputs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `coded_data` | | Dictionary containing compressed data: | required |
| `tasks` | list of str | List of tasks to perform. Supported tasks: | `[]` |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `task_feats` | | Dictionary of task-specific features: |
forward
forward(x, **kwargs)
Forward pass for training.
Processes input through both VQGAN and DINOv2 layers, performs conditional compression with context as hyperprior, and returns features and likelihoods.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor of shape (B, C, H, W). | required |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `out` | | Dictionary containing: |
forward_test
forward_test(x, tasks, **kwargs)
Forward pass for testing/inference with compression.
Processes input through both layers, compresses features with context as hyperprior, and generates task-specific outputs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor of shape (B, C, H, W). | required |
| `tasks` | list of str | List of tasks to perform. Supported tasks: | required |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_data` | | Dictionary containing compressed data: |
| `task_feats` | | Dictionary of task-specific features: |
MPC_I2
MPC_I2(dino_backbone={}, dino_codec={}, **kwargs)
Multi-Purpose Compression model using DINOv2 backbone only.
This is a single-layer compression model that uses DINOv2 for feature extraction and a ViT-based latent codec for compression. It supports multiple downstream tasks, including classification and segmentation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dino_backbone` | | Configuration dictionary for the DINOv2 backbone. If "type" key is present, uses instantiate_class for dynamic instantiation. Otherwise, uses Dinov2TimmBackbone with provided config. | `{}` |
| `dino_codec` | | Configuration dictionary for the DINO codec. If "type" key is present in dino_backbone, uses instantiate_class. Otherwise, uses VitUnionLatentCodec with provided config. | `{}` |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |
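The "type"-key dispatch described for `dino_backbone` and `dino_codec` can be sketched as below. This is an illustration only: the real `instantiate_class` helper is not shown on this page, so its signature and the format of the "type" value are assumptions. Here "type" is taken to be a dotted import path, and `build_from_config` is a hypothetical name.

```python
import importlib


def build_from_config(config, default_cls):
    """Instantiate default_cls, or the class named by config["type"].

    Hypothetical helper: assumes "type" holds a dotted import path and
    that the remaining keys are constructor keyword arguments.
    """
    config = dict(config)  # copy so the caller's dict is not mutated
    type_path = config.pop("type", None)
    if type_path is None:
        # No "type" key: fall back to the default class.
        return default_cls(**config)
    module_name, _, class_name = type_path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**config)


# Stand-in classes from the standard library keep the example self-contained.
default_obj = build_from_config({"a": 1}, default_cls=dict)
dynamic_obj = build_from_config({"type": "collections.Counter", "a": 2}, default_cls=dict)
print(type(default_obj).__name__, type(dynamic_obj).__name__)  # dict Counter
```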
Attributes:
| Name | Type | Description |
|---|---|---|
| | | The DINOv2 backbone model (Dinov2TimmBackbone or dynamically instantiated). |
| | | The DINO codec (VitUnionLatentCodec or dynamically instantiated). |
| | | Patch size used by the backbone model. |
compress
compress(x, qp=0, **kwargs)
Compress input image to byte strings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor of shape (B, C, H, W). | required |
| `qp` | | Quantization parameter. Defaults to 0. | `0` |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_unit` | | Dictionary containing compressed data: |
decompress
decompress(coded_unit, tasks=[], **kwargs)
Decompress byte strings to task-specific features.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `coded_unit` | | Dictionary containing compressed data: | required |
| `tasks` | list of str | List of tasks to perform. Supported tasks: | `[]` |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `task_feats` | | Dictionary of task-specific features: |
extract_feature
extract_feature(x, **kwargs)
Extract features from input image for offline training.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor of shape (B, C, H, W). | required |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `out` | | Dictionary containing: |
forward
forward(x, qp=0, **kwargs)
Forward pass for training with learned image compression (LIC).
Encodes input image using DINOv2, compresses features with codec, and returns reconstructed features and likelihoods for training.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor of shape (B, C, H, W). | required |
| `qp` | | Quantization parameter. Defaults to 0. | `0` |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `out` | | Dictionary containing: |
forward_test
forward_test(x, qp=0, tasks=[], **kwargs)
Forward pass for testing/inference with compression.
Encodes input image, compresses features, and generates task-specific outputs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor of shape (B, C, H, W). | required |
| `qp` | | Quantization parameter. Defaults to 0. | `0` |
| `tasks` | list of str | List of tasks to perform. Supported tasks: | `[]` |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_unit` | | Dictionary containing compressed data: |
| `task_feats` | | Dictionary of task-specific features: |
get_feature_numel
get_feature_numel(x)
Calculate the total number of elements in the extracted features.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `x` | | Input image tensor of shape (B, C, H, W). | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `numel` | | Total number of elements in the feature tensor. |
offline_forward
offline_forward(data, device, qp=0, **kwargs)
Offline forward pass for training with pre-extracted features.
Processes pre-extracted DINO features for learned image compression training. This method is used when features are extracted separately to save memory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | | Dictionary containing: | required |
| `device` | | Device to move tensors to. | required |
| `qp` | | Quantization parameter. Defaults to 0. | `0` |
| `**kwargs` | | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `out` | | Dictionary containing: |