mpcompress.models

Dinov2OrigSlideOnlyPatchCodec

Dinov2OrigSlideOnlyPatchCodec(slide_size=[518, 518], slide_stride=[259, 259], dino_backbone={}, dino_codec={}, **kwargs)

Compression model using DINOv2-Original backbone with sliding window and VTM codec.

This model handles large images by splitting them into overlapping sliding-window crops. With the default 518-pixel window and 259-pixel stride, adjacent crops overlap by half a window. It extracts features from each crop using the DINOv2-Original backbone and compresses them using the VTM codec. It supports segmentation tasks but not classification tasks.

Parameters:

Name Type Description Default
slide_size list of int

Size of each sliding window patch [height, width]. Defaults to [518, 518].

[518, 518]
slide_stride list of int

Stride for sliding window [height_stride, width_stride]. Defaults to [259, 259].

[259, 259]
dino_backbone dict

Configuration dictionary for the DINOv2-Original backbone. Passed directly to Dinov2OrgBackbone constructor.

{}
dino_codec dict

Configuration dictionary for the VTM feature codec. Passed directly to VtmFeatureCodec constructor.

{}
**kwargs dict

Additional keyword arguments (currently unused).

{}

Attributes:

Name Type Description
dino Dinov2OrgBackbone

The DINOv2-Original backbone model.

dino_codec VtmFeatureCodec

The VTM feature codec for compression.

patch_size int

Patch size used by the backbone model.

img_size int or tuple

Image size expected by the backbone.

dynamic_size bool

Whether the model supports dynamic input sizes.

slide_size list of int

Size of each sliding window patch.

slide_stride list of int

Stride for sliding window.
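
The crop layout implied by slide_size and slide_stride can be sketched as follows. This is a minimal illustration, assuming the common sliding-window convention in which the last window along each axis is shifted back so that it stays inside the image; the helper name sliding_windows is hypothetical and is not part of mpcompress, and the model's actual cropping logic may differ.

```python
def sliding_windows(height, width, slide_size=(518, 518), slide_stride=(259, 259)):
    """Return (top, left, bottom, right) crop boxes that cover the image.

    Assumption: the last window on each axis is shifted back to fit inside
    the image, rather than padding the image.
    """
    crop_h, crop_w = slide_size
    stride_h, stride_w = slide_stride
    # Ceil-divide the uncovered extent by the stride; always at least one window.
    rows = max(-(-(height - crop_h) // stride_h), 0) + 1
    cols = max(-(-(width - crop_w) // stride_w), 0) + 1
    boxes = []
    for r in range(rows):
        for c in range(cols):
            bottom = min(r * stride_h + crop_h, height)
            right = min(c * stride_w + crop_w, width)
            # Derive the top-left corner from the clamped bottom-right corner.
            top = max(bottom - crop_h, 0)
            left = max(right - crop_w, 0)
            boxes.append((top, left, bottom, right))
    return boxes


boxes = sliding_windows(777, 1036)
print(len(boxes), boxes[-1])
```

With the defaults, an image exactly 518 x 518 yields a single crop, and each additional 259 pixels along an axis adds one more window position on that axis.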

compress

compress(x, qp)

Compress input image to byte strings using sliding window approach.

Processes input image using sliding window, extracts features, and compresses them using VTM codec.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
qp int

Quantization parameter for VTM compression.

required

Returns:

Name Type Description
coded_unit dict

Dictionary containing compressed data in CompressAI-compatible format:

  • "strings": Compressed byte strings
  • "pstate": Compression state information

Note:

h_dino_list structure: [ [(B,L,C), ...], ..., [(B,L,C), ...] ]
stacked_feature shape: (N_crop, N_layer, H*W+1, C)
where N_crop is the number of sliding window crops.
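
As a worked example of the shape in the note above, the sketch below computes (N_crop, N_layer, H*W+1, C) for one configuration. The values patch_size=14, embed_dim=768, n_layer=4 and n_crop=6 are illustrative assumptions, not values taken from this page, and the "+1" is assumed to be one extra (class) token per crop. The helper name stacked_feature_shape is hypothetical.

```python
import math


def stacked_feature_shape(n_crop, n_layer, crop_size=(518, 518),
                          patch_size=14, embed_dim=768):
    """Shape of the stacked feature tensor: (N_crop, N_layer, H*W+1, C).

    H and W here are the token-grid height and width of one crop.
    """
    tokens_h = crop_size[0] // patch_size
    tokens_w = crop_size[1] // patch_size
    # One extra token per crop (assumed to be the class token).
    return (n_crop, n_layer, tokens_h * tokens_w + 1, embed_dim)


shape = stacked_feature_shape(n_crop=6, n_layer=4)
numel = math.prod(shape)  # total element count of the stacked tensor
print(shape, numel)
```

Under these assumptions a 518 x 518 crop gives a 37 x 37 token grid, so each layer of each crop contributes 1370 tokens.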

decompress

decompress(coded_unit, tasks=[], **kwargs)

Decompress byte strings to task-specific features using sliding window.

Parameters:

Name Type Description Default
coded_unit dict

Dictionary containing compressed data:

  • "strings": Compressed byte strings
  • "pstate": Compression state information
required
tasks list of str

List of tasks to perform. Recognized task names:

  • "seg": Segmentation task
  • "cls": Classification task (not supported)
[]
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
task_feats dict

Dictionary of task-specific features:

  • "cls": Classification features (not supported by this model)
  • "seg": Segmentation features (if "seg" in tasks)
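
Because only "seg" is supported here, a caller may want to validate the task list before decoding. The guard below is a caller-side sketch; this page does not say how the model itself reacts to an unsupported task, and check_tasks is a hypothetical helper, not part of mpcompress.

```python
SUPPORTED_TASKS = frozenset({"seg"})  # per the task list above


def check_tasks(tasks):
    """Return the task list unchanged, or raise if it names an unsupported task."""
    unsupported = sorted(set(tasks) - SUPPORTED_TASKS)
    if unsupported:
        raise ValueError(f"unsupported tasks for this codec: {unsupported}")
    return list(tasks)


print(check_tasks(["seg"]))
```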

forward

forward(x)

Forward pass for training (not implemented).

VTM codec does not require training, so this method raises an error.

Parameters:

Name Type Description Default
x Tensor

Input image tensor.

required

Raises:

Type Description
NotImplementedError

Always raised as VTM does not need training.

forward_test

forward_test(x, qp, tasks=[], **kwargs)

Forward pass for testing/inference with compression using sliding window.

Processes input image using sliding window approach, extracts features, compresses them with VTM codec, and generates task-specific features.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
qp int

Quantization parameter for VTM compression.

required
tasks list of str

List of tasks to perform. Recognized task names:

  • "seg": Segmentation task
  • "cls": Classification task (not supported)
[]
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
coded_unit dict

Dictionary containing compressed data:

  • "strings": Compressed byte strings
  • "pstate": Compression state information
task_feats dict

Dictionary of task-specific features:

  • "cls": Classification features (not supported by this model)
  • "seg": Segmentation features (if "seg" in tasks)

Note:

h_dino_list structure: [ [(B,L,C), ...], ..., [(B,L,C), ...] ]
stacked_feature shape: (N_crop, N_layer, H*W+1, C)
where N_crop is the number of sliding window crops.

get_feature_numel

get_feature_numel(x)

Calculate the total number of elements in the extracted features.

Uses sliding window approach to extract features and calculates the total number of elements across all crops and layers.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required

Returns:

Name Type Description
numel int

Total number of elements in the stacked feature tensor.

Dinov2TimmBypass

Dinov2TimmBypass(dino_backbone={}, **kwargs)

A bypass model using DINOv2-Timm backbone for feature extraction without compression.

This model performs feature extraction using a DINOv2-Timm backbone and supports multiple downstream tasks (classification, segmentation) without any compression operations. It returns empty byte strings as a placeholder for compressed data.

Parameters:

Name Type Description Default
dino_backbone dict

Configuration dictionary for the DINOv2-Timm backbone. Passed directly to Dinov2TimmBackbone constructor.

{}
**kwargs dict

Additional keyword arguments (currently unused).

{}

Attributes:

Name Type Description
dino Dinov2TimmBackbone

The DINOv2-Timm backbone model.

patch_size int

Patch size used by the backbone model.

forward_test

forward_test(x, tasks=[], **kwargs)

Forward pass for testing/inference without compression.

Extracts features using the DINOv2 backbone and generates task-specific features (classification, segmentation) without performing any compression. Returns empty byte strings as a placeholder for compressed data.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
tasks list of str

List of tasks to perform. Supported tasks:

  • "cls": Classification task
  • "seg": Segmentation task
[]
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
coded_unit dict

Dictionary containing:

  • "strings": Dictionary with "bypass" key containing empty bytes
  • "pstate": Dictionary with "token_res" (token resolution)
task_feats dict

Dictionary of task-specific features:

  • "cls": Classification features (if "cls" in tasks)
  • "seg": Segmentation features (if "seg" in tasks)
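
The placeholder coded_unit described above can be pictured with plain Python objects. This is a sketch under assumptions: token_res is taken to be the (height, width) of the token grid, patch_size=14 is an illustrative value, and both helper names are hypothetical.

```python
def bypass_coded_unit(height, width, patch_size=14):
    """Build a coded_unit shaped like the one documented for the bypass model."""
    # Assumption: token resolution is the image size divided by the patch size.
    token_res = (height // patch_size, width // patch_size)
    return {
        "strings": {"bypass": b""},          # empty placeholder, nothing is coded
        "pstate": {"token_res": token_res},
    }


def coded_bytes(coded_unit):
    """Total payload size in bytes across all entries of "strings"."""
    return sum(len(s) for s in coded_unit["strings"].values())


unit = bypass_coded_unit(518, 518)
print(unit["pstate"]["token_res"], coded_bytes(unit))
```

Since the payload is empty, any bitrate computed from this coded_unit is zero, which makes the bypass model usable as an uncompressed reference when comparing task accuracy.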

get_feature_numel

get_feature_numel(x)

Calculate the total number of elements in the extracted features.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required

Returns:

Name Type Description
numel int

Total number of elements in the feature tensor.

Dinov2TimmOnlyPatchCodec

Dinov2TimmOnlyPatchCodec(dino_backbone={}, dino_codec={}, **kwargs)

Compression model using DINOv2-Timm backbone with VTM feature codec.

This model extracts features using a DINOv2-Timm backbone and compresses them using the VTM (VVC Test Model) codec. It supports segmentation tasks but not classification tasks.

Parameters:

Name Type Description Default
dino_backbone dict

Configuration dictionary for the DINOv2-Timm backbone. Passed directly to Dinov2TimmBackbone constructor.

{}
dino_codec dict

Configuration dictionary for the VTM feature codec. Passed directly to VtmFeatureCodec constructor.

{}
**kwargs dict

Additional keyword arguments (currently unused).

{}

Attributes:

Name Type Description
dino Dinov2TimmBackbone

The DINOv2-Timm backbone model.

dino_codec VtmFeatureCodec

The VTM feature codec for compression.

patch_size int

Patch size used by the backbone model.

img_size int or tuple

Image size expected by the backbone.

dynamic_size bool

Whether the model supports dynamic input sizes.

compress

compress(x, qp)

Compress input image to byte strings.

Extracts features using DINOv2 backbone and compresses them using VTM codec.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
qp int

Quantization parameter for VTM compression.

required

Returns:

Name Type Description
coded_unit dict

Dictionary containing compressed data:

  • "strings": Compressed byte strings
  • "pstate": Compression state information

decompress

decompress(coded_unit, tasks=[], **kwargs)

Decompress byte strings to task-specific features.

Parameters:

Name Type Description Default
coded_unit dict

Dictionary containing compressed data:

  • "strings": Compressed byte strings
  • "pstate": Compression state information
required
tasks list of str

List of tasks to perform. Recognized task names:

  • "seg": Segmentation task
  • "cls": Classification task (not supported)
[]
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
task_feats dict

Dictionary of task-specific features:

  • "seg": Segmentation features (if "seg" in tasks)

forward

forward(x)

Forward pass for training (not implemented).

VTM codec does not require training, so this method raises an error.

Parameters:

Name Type Description Default
x Tensor

Input image tensor.

required

Raises:

Type Description
NotImplementedError

Always raised as VTM does not need training.

forward_test

forward_test(x, qp, tasks, **kwargs)

Forward pass for testing/inference with compression.

Extracts features using DINOv2 backbone, compresses them with VTM codec, and generates task-specific features.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
qp int

Quantization parameter for VTM compression.

required
tasks list of str

List of tasks to perform. Recognized task names:

  • "seg": Segmentation task
  • "cls": Classification task (not supported)
required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
coded_unit dict

Dictionary containing compressed data:

  • "strings": Compressed byte strings
  • "pstate": Compression state information
task_feats dict

Dictionary of task-specific features:

  • "seg": Segmentation features (if "seg" in tasks)

get_feature_numel

get_feature_numel(x)

Calculate the total number of elements in the extracted features.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required

Returns:

Name Type Description
numel int

Total number of elements in the feature tensor after segmentation decoding.

MPC_I1

MPC_I1(vqgan_config, **kwargs)

Multi-Purpose Compression model using VQGAN backbone only.

This is a single-layer compression model that uses VQGAN for feature extraction and a uniform token codec for compression. It provides basic image reconstruction.

Parameters:

Name Type Description Default
vqgan_config dict

Configuration dictionary for the VQGAN backbone. Passed directly to VqganBackbone constructor.

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Attributes:

Name Type Description
vqgan VqganBackbone

The VQGAN backbone model.

vqgan_codec UniformTokenCodec

The uniform token codec for compression.

patch_size int

Patch size used by the model (fixed at 16).
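
With a fixed patch size of 16, the token count and a rough bit cost follow directly from the image size. The sketch below assumes that "uniform" means every token is coded with the same fixed number of bits, enough to index the codebook; the real UniformTokenCodec may behave differently. The codebook size of 1024 and the helper name uniform_token_bits are illustrative assumptions.

```python
def uniform_token_bits(height, width, codebook_size, patch_size=16):
    """Bits needed to code one image's tokens with a fixed-length code."""
    n_tokens = (height // patch_size) * (width // patch_size)
    # Smallest bit width that can index every codebook entry.
    bits_per_token = (codebook_size - 1).bit_length()
    return n_tokens * bits_per_token


bits = uniform_token_bits(256, 256, codebook_size=1024)
bpp = bits / (256 * 256)  # bits per pixel
print(bits, bpp)
```

Under this assumption the rate depends only on the image size and codebook size, not on the image content.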

compress

compress(x, **kwargs)

Compress input image to byte strings.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
coded_unit dict

Dictionary containing compressed data:

  • "strings": Compressed byte strings
  • "pstate": Compression state information

decompress

decompress(coded_unit, **kwargs)

Decompress byte strings to reconstructed image and features.

Parameters:

Name Type Description Default
coded_unit dict

Dictionary containing compressed data:

  • "strings": Compressed byte strings
  • "pstate": Compression state information
required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
task_feats dict

Dictionary of task-specific features:

  • "z_q": Quantized features
  • "tokens": VQGAN tokens
  • "x_hat": Reconstructed image tensor

forward

forward(x, **kwargs)

Forward pass for training.

Encodes input image using VQGAN, compresses tokens, and decodes to reconstructed image.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
out dict

Dictionary containing:

  • "likelihoods": Likelihoods from the codec
  • "x_hat": Reconstructed image tensor

MPC_I12

MPC_I12(vqgan_backbone={}, vqgan_codec={}, dino_backbone={}, dino_codec={}, **kwargs)

Multi-Purpose Compression model with two layers: VQGAN and DINOv2.

This is a two-layer compression model that combines VQGAN (layer 1) and DINOv2 (layer 2) for hierarchical compression. The DINOv2 codec uses VQGAN context for conditional compression. Supports multiple tasks including reconstruction, classification, and segmentation.

Parameters:

Name Type Description Default
vqgan_backbone dict

Configuration dictionary for the VQGAN backbone. Passed directly to VqganBackbone constructor.

{}
vqgan_codec dict

Configuration dictionary for the VQGAN codec. Passed directly to UniformTokenCodec constructor.

{}
dino_backbone dict

Configuration dictionary for the DINOv2 backbone. Passed directly to Dinov2TimmBackbone constructor.

{}
dino_codec dict

Configuration dictionary for the DINO codec. Must contain "h_dim" and "ctx_dim" keys for the conditional decoder. Passed directly to VitUnionLatentCodecWithCtx constructor.

{}
**kwargs dict

Additional keyword arguments (currently unused).

{}

Attributes:

Name Type Description
vqgan VqganBackbone

The VQGAN backbone model.

vqgan_codec UniformTokenCodec

The VQGAN codec.

dino Dinov2TimmBackbone

The DINOv2 backbone model.

dino_codec VitUnionLatentCodecWithCtx

The DINO codec with context.

patch_size int

Patch size used by the DINOv2 backbone.

cond_dec_for_vqgan Sequential

Conditional decoder for enhancing VQGAN reconstruction using DINOv2 features.

extract_feature

extract_feature(x, **kwargs)

Extract features from input image for offline training.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
out dict

Dictionary containing:

  • "tokens": VQGAN tokens
  • "h_dino": Extracted DINO features

forward

forward(x, **kwargs)

Forward pass for training.

Processes input through both VQGAN and DINOv2 layers, performs conditional compression, and returns features and likelihoods for training.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
out dict

Dictionary containing:

  • "h_vqgan": Original VQGAN features
  • "h_vqgan_hat": Enhanced VQGAN features
  • "h_dino": Original DINO features
  • "h_dino_hat": Reconstructed DINO features
  • "likelihoods": Likelihoods from the DINO codec
  • "x_hat": Reconstructed image (only in eval mode, None during training)

forward_test

forward_test(x, tasks, **kwargs)

Forward pass for testing/inference with compression.

Processes input through both layers, compresses features, and generates task-specific outputs.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
tasks list of str

List of tasks to perform. Supported tasks:

  • "rec1": Basic VQGAN reconstruction
  • "rec2": Enhanced reconstruction using DINOv2 features
  • "cls": Classification task
  • "seg": Segmentation task
required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
coded_data dict

Dictionary containing compressed data:

  • "type": "frame"
  • "data": Dictionary with:
    • "layer1": VQGAN compressed data
    • "layer2": DINO compressed data
task_feats dict

Dictionary of task-specific features:

  • "rec1": Basic reconstruction (if "rec1" in tasks)
  • "rec2": Enhanced reconstruction (if "rec2" in tasks)
  • "cls": Classification features (if "cls" in tasks)
  • "seg": Segmentation features (if "seg" in tasks)
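
To turn the nested coded_data into a bitrate, one option is to walk the dictionary and sum the lengths of all byte strings. The sketch below assumes nothing about the inner layout of "layer1" and "layer2" beyond their containing bytes objects somewhere in nested dicts, lists, or tuples; the sample payload is made up for illustration, and count_bytes and bits_per_pixel are hypothetical helpers.

```python
def count_bytes(node):
    """Recursively sum the lengths of all bytes objects inside a nested structure."""
    if isinstance(node, (bytes, bytearray)):
        return len(node)
    if isinstance(node, dict):
        return sum(count_bytes(v) for v in node.values())
    if isinstance(node, (list, tuple)):
        return sum(count_bytes(v) for v in node)
    return 0  # strings such as "frame", numbers, and other metadata carry no payload


def bits_per_pixel(coded_data, height, width):
    """Bitrate of the coded payload relative to the original image size."""
    return 8 * count_bytes(coded_data) / (height * width)


# Made-up payload that only mimics the documented top-level keys.
sample = {
    "type": "frame",
    "data": {
        "layer1": {"strings": [b"\x00" * 100]},
        "layer2": {"strings": [[b"\x01" * 28]]},
    },
}
print(count_bytes(sample), bits_per_pixel(sample, 64, 64))
```

Calling count_bytes on coded_data["data"]["layer1"] and coded_data["data"]["layer2"] separately gives the per-layer split of the rate.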

offline_forward

offline_forward(data, device, **kwargs)

Offline forward pass for training with pre-extracted features.

Processes pre-extracted VQGAN tokens and DINO features for training. This method is used when features are extracted separately to save memory.

Parameters:

Name Type Description Default
data dict

Dictionary containing:

  • "h_dino": Pre-extracted DINO features
  • "tokens": Pre-extracted VQGAN tokens
  • "x_shape": Original image shape (B, C, H, W)
required
device device

Device to move tensors to.

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
out dict

Dictionary containing:

  • "h_vqgan": VQGAN context features
  • "h_vqgan_hat": Enhanced VQGAN features
  • "h_dino": Original DINO features
  • "h_dino_hat": Reconstructed DINO features
  • "likelihoods": Likelihoods from the DINO codec
  • "x_hat": Reconstructed image (only in eval mode, None during training)

MPC_I12_CtxAsHyper

MPC_I12_CtxAsHyper(vqgan_backbone={}, vqgan_codec={}, dino_backbone={}, dino_codec={}, **kwargs)

Multi-Purpose Compression model with context as hyperprior.

This is a variant of MPC_I12 in which the VQGAN context serves as the hyperprior for the DINOv2 codec. Like MPC_I12, it combines VQGAN (layer 1) and DINOv2 (layer 2) for hierarchical compression; the difference lies in the codec, VitUnionLatentCodecCtxAsHyper.

Parameters:

Name Type Description Default
vqgan_backbone dict

Configuration dictionary for the VQGAN backbone. Passed directly to VqganBackbone constructor.

{}
vqgan_codec dict

Configuration dictionary for the VQGAN codec. Passed directly to UniformTokenCodec constructor.

{}
dino_backbone dict

Configuration dictionary for the DINOv2 backbone. Passed directly to Dinov2TimmBackbone constructor.

{}
dino_codec dict

Configuration dictionary for the DINO codec. Must contain "h_dim" and "ctx_dim" keys for the conditional decoder. Passed directly to VitUnionLatentCodecCtxAsHyper constructor.

{}
**kwargs dict

Additional keyword arguments (currently unused).

{}

Attributes:

Name Type Description
vqgan VqganBackbone

The VQGAN backbone model.

vqgan_codec UniformTokenCodec

The VQGAN codec.

dino Dinov2TimmBackbone

The DINOv2 backbone model.

dino_codec VitUnionLatentCodecCtxAsHyper

The DINO codec with context as hyperprior.

patch_size int

Patch size used by the DINOv2 backbone.

cond_dec_for_vqgan Sequential

Conditional decoder for enhancing VQGAN reconstruction using DINOv2 features.

compress

compress(x, **kwargs)

Compress input image to byte strings using both layers.

Processes input through both VQGAN and DINOv2 layers, compresses features with context as hyperprior, and returns compressed data.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
coded_data dict

Dictionary containing compressed data:

  • "type": "frame"
  • "data": Dictionary with:
    • "layer1": VQGAN compressed data
    • "layer2": DINO compressed data

decompress

decompress(coded_data, tasks=[], **kwargs)

Decompress byte strings to task-specific features.

Decompresses both VQGAN and DINOv2 layers, uses context as hyperprior for DINO decompression, and generates task-specific outputs.

Parameters:

Name Type Description Default
coded_data dict

Dictionary containing compressed data:

  • "type": "frame"
  • "data": Dictionary with:
    • "layer1": VQGAN compressed data
    • "layer2": DINO compressed data
required
tasks list of str

List of tasks to perform. Supported tasks:

  • "rec1": Basic VQGAN reconstruction
  • "rec2": Enhanced reconstruction using DINOv2 features
  • "cls": Classification task
  • "seg": Segmentation task
[]
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
task_feats dict

Dictionary of task-specific features:

  • "rec1": Basic reconstruction (if "rec1" in tasks)
  • "rec2": Enhanced reconstruction (if "rec2" in tasks)
  • "cls": Classification features (if "cls" in tasks)
  • "seg": Segmentation features (if "seg" in tasks)

forward

forward(x, **kwargs)

Forward pass for training.

Processes input through both VQGAN and DINOv2 layers, performs conditional compression with context as hyperprior, and returns features and likelihoods.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
out dict

Dictionary containing:

  • "h_vqgan": Original VQGAN features
  • "h_vqgan_hat": Enhanced VQGAN features
  • "h_dino": Original DINO features
  • "h_dino_hat": Reconstructed DINO features
  • "likelihoods": Likelihoods from the DINO codec
  • "x_hat": Reconstructed image (only in eval mode, None during training)

forward_test

forward_test(x, tasks, **kwargs)

Forward pass for testing/inference with compression.

Processes input through both layers, compresses features with context as hyperprior, and generates task-specific outputs.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
tasks list of str

List of tasks to perform. Supported tasks:

  • "rec1": Basic VQGAN reconstruction
  • "rec2": Enhanced reconstruction using DINOv2 features
  • "cls": Classification task
  • "seg": Segmentation task
required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
coded_data dict

Dictionary containing compressed data:

  • "type": "frame"
  • "data": Dictionary with:
    • "layer1": VQGAN compressed data
    • "layer2": DINO compressed data
task_feats dict

Dictionary of task-specific features:

  • "rec1": Basic reconstruction (if "rec1" in tasks)
  • "rec2": Enhanced reconstruction (if "rec2" in tasks)
  • "cls": Classification features (if "cls" in tasks)
  • "seg": Segmentation features (if "seg" in tasks)

MPC_I2

MPC_I2(dino_backbone={}, dino_codec={}, **kwargs)

Multi-Purpose Compression model using DINOv2 backbone only.

This is a single-layer compression model that uses DINOv2 for feature extraction and a ViT-based latent codec for compression. It supports multiple downstream tasks, including classification and segmentation.

Parameters:

Name Type Description Default
dino_backbone dict

Configuration dictionary for the DINOv2 backbone. If "type" key is present, uses instantiate_class for dynamic instantiation. Otherwise, uses Dinov2TimmBackbone with provided config.

{}
dino_codec dict

Configuration dictionary for the DINO codec. If "type" key is present in dino_backbone, uses instantiate_class. Otherwise, uses VitUnionLatentCodec with provided config.

{}
**kwargs dict

Additional keyword arguments (currently unused).

{}
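
The "type"-based dispatch described for dino_backbone and dino_codec follows a common configuration pattern, sketched below. This is not the signature or behavior of instantiate_class, which this page does not specify; build_from_config is a hypothetical helper, and the assumption that "type" holds a dotted import path is only for illustration.

```python
import importlib


def build_from_config(config, default_cls):
    """Instantiate default_cls, or the class named by config["type"] if present."""
    cfg = dict(config)                 # copy so the caller's dict is not mutated
    type_path = cfg.pop("type", None)
    if type_path is None:
        return default_cls(**cfg)      # fall back to the default class
    # Assumption: "type" is a dotted path such as "package.module.ClassName".
    module_name, _, class_name = type_path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**cfg)


# Standard-library classes stand in for backbone and codec classes.
half = build_from_config(
    {"type": "fractions.Fraction", "numerator": 1, "denominator": 2}, dict
)
plain = build_from_config({"a": 1}, dict)
print(half, plain)
```

Popping "type" before the call keeps the remaining keys usable as constructor keyword arguments in both branches.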

Attributes:

Name Type Description
dino

The DINOv2 backbone model (Dinov2TimmBackbone or dynamically instantiated).

dino_codec

The DINO codec (VitUnionLatentCodec or dynamically instantiated).

patch_size int

Patch size used by the backbone model.

compress

compress(x, qp=0, **kwargs)

Compress input image to byte strings.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
qp int

Quantization parameter. Defaults to 0.

0
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
coded_unit dict

Dictionary containing compressed data:

  • "strings": Compressed byte strings
  • "pstate": Compression state information

decompress

decompress(coded_unit, tasks=[], **kwargs)

Decompress byte strings to task-specific features.

Parameters:

Name Type Description Default
coded_unit dict

Dictionary containing compressed data:

  • "strings": Compressed byte strings
  • "pstate": Compression state information
required
tasks list of str

List of tasks to perform. Supported tasks:

  • "cls": Classification task
  • "seg": Segmentation task
[]
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
task_feats dict

Dictionary of task-specific features:

  • "cls": Classification features (if "cls" in tasks)
  • "seg": Segmentation features (if "seg" in tasks)

extract_feature

extract_feature(x, **kwargs)

Extract features from input image for offline training.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
out dict

Dictionary containing:

  • "h_dino": Extracted DINO features

forward

forward(x, qp=0, **kwargs)

Forward pass for training with learned image compression (LIC).

Encodes input image using DINOv2, compresses features with codec, and returns reconstructed features and likelihoods for training.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
qp int

Quantization parameter. Defaults to 0.

0
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
out dict

Dictionary containing:

  • "h_dino_hat": Reconstructed DINO features
  • "h_dino": Original DINO features
  • "likelihoods": Likelihoods from the codec
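
In learned image compression, the returned likelihoods are typically converted into an estimated rate by summing -log2 of every likelihood value and dividing by the number of pixels. The sketch below shows that arithmetic on plain Python lists; it assumes "likelihoods" is a dictionary of per-element probabilities (tensors in practice), which this page does not spell out. The helper name estimate_bpp and the key names "y" and "z" are illustrative.

```python
import math


def estimate_bpp(likelihoods, num_pixels):
    """Estimated bits per pixel: sum of -log2(p) over all elements / pixel count."""
    total_bits = 0.0
    for values in likelihoods.values():
        total_bits += sum(-math.log2(p) for p in values)
    return total_bits / num_pixels


# Toy example: eight elements at p=0.5 (1 bit each), four at p=0.25 (2 bits each).
toy = {"y": [0.5] * 8, "z": [0.25] * 4}
print(estimate_bpp(toy, num_pixels=16))
```

A training loss would usually combine this rate term with a distortion term between "h_dino" and "h_dino_hat"; the exact weighting is not described on this page.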

forward_test

forward_test(x, qp=0, tasks=[], **kwargs)

Forward pass for testing/inference with compression.

Encodes input image, compresses features, and generates task-specific outputs.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
qp int

Quantization parameter. Defaults to 0.

0
tasks list of str

List of tasks to perform. Supported tasks:

  • "cls": Classification task
  • "seg": Segmentation task
[]
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
coded_unit dict

Dictionary containing compressed data:

  • "strings": Compressed byte strings
  • "pstate": Compression state information
  • "h_hat": Reconstructed features
task_feats dict

Dictionary of task-specific features:

  • "cls": Classification features (if "cls" in tasks)
  • "seg": Segmentation features (if "seg" in tasks)

get_feature_numel

get_feature_numel(x)

Calculate the total number of elements in the extracted features.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required

Returns:

Name Type Description
numel int

Total number of elements in the feature tensor.

offline_forward

offline_forward(data, device, qp=0, **kwargs)

Offline forward pass for training with pre-extracted features.

Processes pre-extracted DINO features for learned image compression training. This method is used when features are extracted separately to save memory.

Parameters:

Name Type Description Default
data dict

Dictionary containing:

  • "h_dino": Pre-extracted DINO features
  • "x_shape": Original image shape (B, C, H, W)
required
device device

Device to move tensors to.

required
qp int

Quantization parameter. Defaults to 0.

0
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
out dict

Dictionary containing:

  • "h_dino_hat": Reconstructed DINO features
  • "h_dino": Original DINO features
  • "likelihoods": Likelihoods from the codec