mpcompress.models

Dinov2OrigClsBypass

Dinov2OrigClsBypass(dino_backbone={}, **kwargs)

DINOv2-Original backbone for classification, WITHOUT compression (bypass).

This model extracts features via encode and decodes them for classification, with no compression step. It is useful for measuring baseline classification accuracy without compression.

Parameters:

Name Type Description Default
dino_backbone dict

Configuration for Dinov2OrgBackbone.

{}

Attributes:

Name Type Description
dino Dinov2OrgBackbone

The DINOv2 backbone.

patch_size int

Patch size used by the backbone.

img_size int or tuple

Image size expected by the backbone.

forward_test

forward_test(x, qp=None, tasks=[], **kwargs)

Forward pass WITHOUT compression - just extract and decode for classification.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
qp int

Ignored (no compression).

None
tasks list of str

List of tasks. Supported: "cls".

[]

Returns:

Name Type Description
coded_data dict

Mock coded data with zero bits.

task_feats dict

Dictionary with "cls" key if "cls" in tasks.

Dinov2OrigClsFCVQ

Dinov2OrigClsFCVQ(dino_backbone={}, fcvq_codec={}, **kwargs)

DINOv2-Original backbone with sliding window + FCVQ compression for Classification.

This model combines Dinov2OrgBackbone with FCVQ codec for feature compression on classification tasks.

Parameters:

Name Type Description Default
dino_backbone dict

Configuration for Dinov2OrgBackbone.

{}
fcvq_codec dict

Configuration for FCVQ codec.

{}

Attributes:

Name Type Description
dino Dinov2OrgBackbone

The DINOv2 backbone.

fcvq FCVQ

The FCVQ codec.

compress

compress(x, qp=None)

Compress input image to byte strings using FCVQ for classification.

Extracts features via encode, compresses them with FCVQ, and returns coded_unit. The qp parameter is unused (FCVQ has no QP).

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
qp int

Unused; kept for API compatibility.

None

Returns:

Name Type Description
coded_unit dict

Dictionary containing:

  • "strings": {"fcvq": [strings]}
  • "pstate": {"feat_shape": tuple}

decompress

decompress(coded_unit, tasks=[], **kwargs)

Decompress byte strings to task-specific features for classification.

Parameters:

Name Type Description Default
coded_unit dict

Dictionary containing:

  • "strings": {"fcvq": [strings]} from compress
  • "pstate": {"feat_shape": tuple}

required
tasks list of str

List of tasks. Supported: "cls".

[]
**kwargs

Additional arguments (unused).

{}

Returns:

Name Type Description
task_feats dict

Dictionary with "cls" key if "cls" in tasks.

forward_test

forward_test(x, qp=None, tasks=[], **kwargs)

Forward pass with FCVQ compression for classification.

get_feature_numel

get_feature_numel(x)

Total number of elements in encoded features (for bpfp).
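As a rough illustration of how the element count might be used, the sketch below divides the size of a coded_unit payload by the feature element count. It assumes that bpfp stands for bits per feature point and that every entry under "strings" is a flat list of bytes objects; `total_bits`, `bits_per_feature_point`, and the sample payloads are hypothetical and not part of mpcompress.

```python
def total_bits(coded_unit):
    """Sum the payload size, in bits, of every byte string under coded_unit["strings"]."""
    bits = 0
    for stream in coded_unit["strings"].values():
        for chunk in stream:
            bits += 8 * len(chunk)
    return bits


def bits_per_feature_point(coded_unit, feature_numel):
    """Bits spent per feature element (assumed meaning of bpfp)."""
    if feature_numel <= 0:
        raise ValueError("feature_numel must be positive")
    return total_bits(coded_unit) / feature_numel


# Hypothetical coded_unit in the documented layout, with made-up payloads.
coded_unit = {
    "strings": {"fcvq": [b"\x00" * 100, b"\x01" * 28]},
    "pstate": {"feat_shape": (2, 257, 768)},
}
# In practice feature_numel would come from get_feature_numel(x).
bpfp = bits_per_feature_point(coded_unit, feature_numel=2 * 257 * 768)
```

Normalizing by feature elements rather than by pixels keeps the rate comparable across inputs of different resolution.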

Dinov2OrigSlideOnlyPatchCodec

Dinov2OrigSlideOnlyPatchCodec(slide_size=[518, 518], slide_stride=[259, 259], dino_backbone={}, dino_codec={}, **kwargs)

Compression model using DINOv2-Original backbone with sliding window and VTM codec.

This model uses a sliding window approach to handle large images by processing them in overlapping patches. It extracts features using DINOv2-Original backbone and compresses them using VTM codec. Supports segmentation tasks but not classification tasks.

Parameters:

Name Type Description Default
slide_size list of int

Size of each sliding window patch [height, width]. Defaults to [518, 518].

[518, 518]
slide_stride list of int

Stride for sliding window [height_stride, width_stride]. Defaults to [259, 259].

[259, 259]
dino_backbone dict

Configuration dictionary for the DINOv2-Original backbone. Passed directly to Dinov2OrgBackbone constructor.

{}
dino_codec dict

Configuration dictionary for the VTM feature codec. Passed directly to VtmFeatureCodec constructor.

{}
**kwargs dict

Additional keyword arguments (currently unused).

{}

Attributes:

Name Type Description
dino Dinov2OrgBackbone

The DINOv2-Original backbone model.

dino_codec VtmFeatureCodec

The VTM feature codec for compression.

patch_size int

Patch size used by the backbone model.

img_size int or tuple

Image size expected by the backbone.

dynamic_size bool

Whether the model supports dynamic input sizes.

slide_size list of int

Size of each sliding window patch.

slide_stride list of int

Stride for sliding window.

compress

compress(x, qp)

Compress input image to byte strings using sliding window approach.

Processes input image using sliding window, extracts features, and compresses them using VTM codec.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
qp int

Quantization parameter for VTM compression.

required

Returns:

Name Type Description
coded_unit dict

Dictionary containing compressed data in CompressAI-compatible format:

  • "strings": Compressed byte strings
  • "pstate": Compression state information

Note:

h_dino_list structure: [ [(B,L,C), ...], ..., [(B,L,C), ...] ]
stacked_feature shape: (N_crop, N_layer, H*W+1, C)
where N_crop is the number of sliding window crops.
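To make N_crop concrete, here is a minimal sketch of one common way to lay out overlapping windows (the grid used by many segmentation toolkits). Whether the backbone's sliding-window encoder uses exactly this grid, or pads instead of shifting the last window, is an assumption; `sliding_window_boxes` is a hypothetical helper.

```python
def sliding_window_boxes(height, width, slide_size=(518, 518), slide_stride=(259, 259)):
    """Return (top, left, bottom, right) crop boxes covering a height x width image."""
    crop_h, crop_w = slide_size
    stride_h, stride_w = slide_stride
    # Number of window positions along each axis (always at least one).
    rows = max(height - crop_h + stride_h - 1, 0) // stride_h + 1
    cols = max(width - crop_w + stride_w - 1, 0) // stride_w + 1
    boxes = []
    for r in range(rows):
        for c in range(cols):
            bottom = min(r * stride_h + crop_h, height)
            right = min(c * stride_w + crop_w, width)
            # Shift the last window back so it stays inside the image.
            top = max(bottom - crop_h, 0)
            left = max(right - crop_w, 0)
            boxes.append((top, left, bottom, right))
    return boxes


# A 1036 x 1554 image with the default window and stride.
boxes = sliding_window_boxes(1036, 1554)
n_crop = len(boxes)  # 3 rows x 5 columns of windows
```

With the default stride of half the window size, neighbouring crops overlap by 50 percent along each axis.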

decompress

decompress(coded_unit, tasks=[], **kwargs)

Decompress byte strings to task-specific features using sliding window.

Parameters:

Name Type Description Default
coded_unit dict

Dictionary containing compressed data:

  • "strings": Compressed byte strings
  • "pstate": Compression state information

required
tasks list of str

List of tasks to perform:

  • "seg": Segmentation task
  • "cls": Classification task (not supported)

[]
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
task_feats dict

Dictionary of task-specific features:

  • "cls": Classification task (not supported)
  • "seg": Segmentation features (if "seg" in tasks)

forward

forward(x)

Forward pass for training (not implemented).

VTM codec does not require training, so this method raises an error.

Parameters:

Name Type Description Default
x Tensor

Input image tensor.

required

Raises:

Type Description
NotImplementedError

Always raised as VTM does not need training.

forward_test

forward_test(x, qp, tasks=[], **kwargs)

Forward pass for testing/inference with compression using sliding window.

Processes input image using sliding window approach, extracts features, compresses them with VTM codec, and generates task-specific features.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
qp int

Quantization parameter for VTM compression.

required
tasks list of str

List of tasks to perform:

  • "seg": Segmentation task
  • "cls": Classification task (not supported)

[]
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
coded_unit dict

Dictionary containing compressed data:

  • "strings": Compressed byte strings
  • "pstate": Compression state information

task_feats dict

Dictionary of task-specific features:

  • "cls": Classification task (not supported)
  • "seg": Segmentation features (if "seg" in tasks)

Note:

h_dino_list structure: [ [(B,L,C), ...], ..., [(B,L,C), ...] ]
stacked_feature shape: (N_crop, N_layer, H*W+1, C)
where N_crop is the number of sliding window crops.

get_feature_numel

get_feature_numel(x)

Calculate the total number of elements in the extracted features.

Uses sliding window approach to extract features and calculates the total number of elements across all crops and layers.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required

Returns:

Name Type Description
numel int

Total number of elements in the stacked feature tensor.
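For a sense of scale, the sketch below computes the element count directly from the documented stacked_feature shape (N_crop, N_layer, H*W+1, C). The patch size of 14, the channel width of 768, the layer count of 4, and reading the "+1" as one extra non-patch token are illustrative assumptions; `stacked_feature_numel` is a hypothetical helper.

```python
from math import prod


def stacked_feature_numel(n_crop, n_layer, slide_size=(518, 518), patch_size=14, channels=768):
    """Return the stacked feature shape and its total element count."""
    tokens_h = slide_size[0] // patch_size
    tokens_w = slide_size[1] // patch_size
    # H*W patch tokens plus one extra token per crop, as in the documented shape.
    shape = (n_crop, n_layer, tokens_h * tokens_w + 1, channels)
    return shape, prod(shape)


# 15 crops and 4 layers are example values, not library defaults.
shape, numel = stacked_feature_numel(n_crop=15, n_layer=4)
```

A 518 x 518 window with 14-pixel patches gives a 37 x 37 token grid, so each crop contributes 1370 tokens per layer.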

Dinov2OrigSlideSegBypass

Dinov2OrigSlideSegBypass(slide_size=[518, 518], slide_stride=[259, 259], dino_backbone={}, **kwargs)

DINOv2-Original backbone with sliding window, WITHOUT compression (bypass).

This model is identical to Dinov2OrigSlideOnlyPatchCodec but skips the VTM compression step. Useful for testing baseline mIoU without compression.

Parameters:

Name Type Description Default
slide_size list of int

Size of each sliding window patch [height, width].

[518, 518]
slide_stride list of int

Stride for sliding window [height_stride, width_stride].

[259, 259]
dino_backbone dict

Configuration dictionary for the DINOv2-Original backbone.

{}

Attributes:

Name Type Description
dino Dinov2OrgBackbone

The DINOv2-Original backbone model.

patch_size int

Patch size used by the backbone model.

img_size int or tuple

Image size expected by the backbone.

dynamic_size bool

Whether the model supports dynamic input sizes.

slide_size list of int

Size of each sliding window patch.

slide_stride list of int

Stride for sliding window.

compress

compress(x, qp)

Compression is not supported; returns an empty result.

decompress

decompress(*args, **kwargs)

Decompression is not supported; returns an empty result.

forward_test

forward_test(x, qp=None, tasks=[], **kwargs)

Forward pass WITHOUT compression - just extract and decode features.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
qp int

Ignored (no compression).

None
tasks list of str

List of tasks to perform:

  • "seg": Segmentation task
  • "cls": Classification task (not supported)

[]

Returns:

Name Type Description
coded_data dict

Mock coded data with zero bits.

task_feats dict

Dictionary of task-specific features:

  • "seg": Segmentation features (if "seg" in tasks)

Dinov2OrigSlideSegFCVQ

Dinov2OrigSlideSegFCVQ(slide_size=[518, 518], slide_stride=[259, 259], dino_backbone={}, fcvq_codec={}, **kwargs)

DINOv2-Original backbone with sliding window + FCVQ compression for Segmentation.

This model combines Dinov2OrgBackbone (slide encode) with FCVQ codec for feature compression on segmentation tasks.

Parameters:

Name Type Description Default
slide_size list of int

Size of each sliding window patch [height, width].

[518, 518]
slide_stride list of int

Stride for sliding window [height_stride, width_stride].

[259, 259]
dino_backbone dict

Configuration for Dinov2OrgBackbone.

{}
fcvq_codec dict

Configuration for FCVQ codec.

{}

Attributes:

Name Type Description
dino Dinov2OrgBackbone

The DINOv2 backbone.

fcvq FCVQ

The FCVQ codec.

compress

compress(x, qp=None)

Compress input image to byte strings using sliding window + FCVQ.

Extracts features via slide_encode, compresses patch tokens with FCVQ, and returns coded_unit. The qp parameter is unused (FCVQ has no QP).

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
qp int

Unused; kept for API compatibility with VTM models.

None

Returns:

Name Type Description
coded_unit dict

Dictionary containing:

  • "strings": {"vtm": [strings]}, where strings is a list of bytes from FCVQ
  • "pstate": {"feat_shape": stacked_feat_hat.shape}

decompress

decompress(coded_unit, tasks=[], **kwargs)

Decompress byte strings to task-specific features using sliding window.

Parameters:

Name Type Description Default
coded_unit dict

Dictionary containing:

  • "strings": {"fcvq": [strings]} from compress
  • "pstate": {"feat_shape": tuple}

required
tasks list of str

List of tasks. Supported: "seg".

[]
**kwargs

Additional arguments (unused).

{}

Returns:

Name Type Description
task_feats dict

Dictionary with "seg" key if "seg" in tasks.

forward_test

forward_test(x, qp=None, tasks=[], **kwargs)

Forward pass with FCVQ compression for segmentation.

get_feature_numel

get_feature_numel(x)

Calculate the total number of elements in the extracted features.

Uses sliding window approach to extract features and calculates the total number of elements across all crops and layers.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required

Returns:

Name Type Description
numel int

Total number of elements in the stacked feature tensor.

Dinov2TimmBypass

Dinov2TimmBypass(dino_backbone={}, **kwargs)

A bypass model using DINOv2-Timm backbone for feature extraction without compression.

This model performs feature extraction using a DINOv2-Timm backbone and supports multiple downstream tasks (classification, segmentation) without any compression operations. It returns empty byte strings as a placeholder for compressed data.

Parameters:

Name Type Description Default
dino_backbone dict

Configuration dictionary for the DINOv2-Timm backbone. Passed directly to Dinov2TimmBackbone constructor.

{}
**kwargs dict

Additional keyword arguments (currently unused).

{}

Attributes:

Name Type Description
dino Dinov2TimmBackbone

The DINOv2-Timm backbone model.

patch_size int

Patch size used by the backbone model.

forward_test

forward_test(x, tasks=[], **kwargs)

Forward pass for testing/inference without compression.

Extracts features using the DINOv2 backbone and generates task-specific features (classification, segmentation) without performing any compression. Returns empty byte strings as a placeholder for compressed data.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
tasks list of str

List of tasks to perform. Supported tasks:

  • "cls": Classification task
  • "seg": Segmentation task

[]
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
coded_unit dict

Dictionary containing:

  • "strings": Dictionary with "bypass" key containing empty bytes
  • "pstate": Dictionary with "token_res" (token resolution)

task_feats dict

Dictionary of task-specific features:

  • "cls": Classification features (if "cls" in tasks)
  • "seg": Segmentation features (if "seg" in tasks)

get_feature_numel

get_feature_numel(x)

Calculate the total number of elements in the extracted features.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required

Returns:

Name Type Description
numel int

Total number of elements in the feature tensor.

Dinov2TimmOnlyPatchCodec

Dinov2TimmOnlyPatchCodec(dino_backbone={}, dino_codec={}, **kwargs)

Compression model using DINOv2-Timm backbone with VTM feature codec.

This model extracts features using a DINOv2-Timm backbone and compresses them using VTM (Video Test Model) codec. It supports segmentation tasks but not classification tasks.

Parameters:

Name Type Description Default
dino_backbone dict

Configuration dictionary for the DINOv2-Timm backbone. Passed directly to Dinov2TimmBackbone constructor.

{}
dino_codec dict

Configuration dictionary for the VTM feature codec. Passed directly to VtmFeatureCodec constructor.

{}
**kwargs dict

Additional keyword arguments (currently unused).

{}

Attributes:

Name Type Description
dino Dinov2TimmBackbone

The DINOv2-Timm backbone model.

dino_codec VtmFeatureCodec

The VTM feature codec for compression.

patch_size int

Patch size used by the backbone model.

img_size int or tuple

Image size expected by the backbone.

dynamic_size bool

Whether the model supports dynamic input sizes.

compress

compress(x, qp)

Compress input image to byte strings.

Extracts features using DINOv2 backbone and compresses them using VTM codec.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
qp int

Quantization parameter for VTM compression.

required

Returns:

Name Type Description
coded_unit dict

Dictionary containing compressed data:

  • "strings": Compressed byte strings
  • "pstate": Compression state information

decompress

decompress(coded_unit, tasks=[], **kwargs)

Decompress byte strings to task-specific features.

Parameters:

Name Type Description Default
coded_unit dict

Dictionary containing compressed data:

  • "strings": Compressed byte strings
  • "pstate": Compression state information

required
tasks list of str

List of tasks to perform:

  • "seg": Segmentation task
  • "cls": Classification task (not supported)

[]
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
task_feats dict

Dictionary of task-specific features:

  • "seg": Segmentation features (if "seg" in tasks)

forward

forward(x)

Forward pass for training (not implemented).

VTM codec does not require training, so this method raises an error.

Parameters:

Name Type Description Default
x Tensor

Input image tensor.

required

Raises:

Type Description
NotImplementedError

Always raised as VTM does not need training.

forward_test

forward_test(x, qp, tasks, **kwargs)

Forward pass for testing/inference with compression.

Extracts features using DINOv2 backbone, compresses them with VTM codec, and generates task-specific features.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
qp int

Quantization parameter for VTM compression.

required
tasks list of str

List of tasks to perform:

  • "seg": Segmentation task
  • "cls": Classification task (not supported)

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
coded_unit dict

Dictionary containing compressed data:

  • "strings": Compressed byte strings
  • "pstate": Compression state information

task_feats dict

Dictionary of task-specific features:

  • "seg": Segmentation features (if "seg" in tasks)

get_feature_numel

get_feature_numel(x)

Calculate the total number of elements in the extracted features.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required

Returns:

Name Type Description
numel int

Total number of elements in the feature tensor after segmentation decoding.

MLoREFrameCodec

MLoREFrameCodec(p=None, stage='stage1', pretrained=True, img_size=(512, 512), drop_path_rate=0.15, device='cuda', **kwargs)

MLoRE Frame-level Codec for multi-task feature compression.

This FrameCodec implements the MPCompress framework specification for compressing a single Access Unit (frame) with multi-task support.

Architecture follows the Feature DU design:

  • Feature Encoder: ViT backbone front-end extracts intermediate features
  • Compress: Hyperprior-style codec compresses features to bitstream
  • Feature Decoder: ViT backbone back-end + task heads produce outputs

Parameters:

Name Type Description Default
p

Configuration dict containing model parameters

None
stage

Training stage ('stage0', 'stage1', 'stage2')

'stage1'
pretrained

Whether to load pretrained ViT weights

True
img_size

Input image size (H, W)

(512, 512)
drop_path_rate

Drop path rate for stochastic depth

0.15
device

Device to run on

'cuda'

compress

compress(x: Tensor, tasks: List[str] = None, **kwargs) -> Dict[str, Any]

Compress a single frame (Access Unit).

Following the MPCompress DataUnitCodec interface, compresses input image to coded_unit format.

Parameters:

Name Type Description Default
x Tensor

Input image tensor (B, 3, H, W)

required
tasks List[str]

List of tasks for encoding hints

None
**kwargs

Additional encoding parameters

{}

Returns:

Name Type Description
coded_unit Dict[str, Any]

Dictionary following framework specification:

  {
    "strings": {"y": [[bytes]], "z": [[bytes]]},  # Compressed bitstreams
    "pstate": {
      "shape": (H, W),
      "input_shape": (B, C, H, W),
      "tasks": [...],
      "resolution": [H, W]
    }
  }

decompress

decompress(coded_unit: Dict[str, Any] = None, strings: Dict[str, List[List[bytes]]] = None, pstate: Dict[str, Any] = None, tasks: List[str] = None, **kwargs) -> Dict[str, torch.Tensor]

Decompress a single frame (Access Unit).

Following the MPCompress DataUnitCodec interface, decodes coded_unit to task outputs.

Parameters:

Name Type Description Default
coded_unit Dict[str, Any]

Dictionary from compress() (alternative to strings/pstate)

None
strings Dict[str, List[List[bytes]]]

Compressed bitstreams dict

None
pstate Dict[str, Any]

State dict for decoding

None
tasks List[str]

List of tasks to decode (default: from pstate)

None
**kwargs

Additional decoding parameters

{}

Returns:

Name Type Description
task_feats Dict[str, Tensor]

Dictionary with task outputs: { "semseg": tensor, "edge": tensor, ... }
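The two calling conventions for decompress (a single coded_unit, or separate strings and pstate) could be reconciled along the following lines. This is a sketch only: `resolve_decode_inputs` is a hypothetical helper, and letting coded_unit take precedence when both forms are passed is an assumption rather than documented behaviour.

```python
def resolve_decode_inputs(coded_unit=None, strings=None, pstate=None, tasks=None):
    """Normalize decompress arguments to a (strings, pstate, tasks) triple."""
    if coded_unit is not None:
        # Assumed precedence: a full coded_unit overrides separate arguments.
        strings = coded_unit["strings"]
        pstate = coded_unit["pstate"]
    if strings is None or pstate is None:
        raise ValueError("pass either coded_unit or both strings and pstate")
    if tasks is None:
        # Documented default: fall back to the tasks recorded at encode time.
        tasks = list(pstate.get("tasks", []))
    return strings, pstate, tasks


# Made-up coded_unit in the documented layout.
example_unit = {
    "strings": {"y": [[b"\x01"]], "z": [[b"\x02"]]},
    "pstate": {"shape": (32, 32), "tasks": ["semseg", "edge"]},
}
_, _, default_tasks = resolve_decode_inputs(coded_unit=example_unit)
_, _, chosen_tasks = resolve_decode_inputs(
    strings=example_unit["strings"], pstate=example_unit["pstate"], tasks=["edge"]
)
```

Resolving the arguments once at the top of the method keeps the rest of the decoding path independent of how the caller supplied the data.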

forward

forward(x, tasks=None, episode_tasks=None, return_feat=False)

Forward pass for training.

Parameters:

Name Type Description Default
x

Input image tensor (B, 3, H, W)

required
tasks

List of tasks to compute (default: all)

None
episode_tasks

Task grouping for multi-task routing

None
return_feat

If True, return features before heads

False

Returns:

Type Description

Dict with task predictions and auxiliary info (bpp_loss, mse_loss)

forward_test

forward_test(x: Tensor, tasks: List[str] = None, return_likelihoods: bool = True, **kwargs) -> Dict[str, Any]

Forward pass for testing/evaluation.

This method performs inference and returns task predictions along with rate estimation via likelihoods.

Parameters:

Name Type Description Default
x Tensor

Input image tensor (B, 3, H, W)

required
tasks List[str]

List of tasks to compute (default: all)

None
return_likelihoods bool

Whether to return likelihoods for rate estimation

True
**kwargs

Additional arguments

{}

Returns:

Type Description
Dict[str, Any]

Dict containing:

  • Task predictions (e.g., 'semseg', 'edge', etc.)
  • 'likelihoods': Dict of likelihoods for rate estimation (if return_likelihoods)
  • 'bpp_loss', 'mse_loss': Compression losses
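Likelihoods are typically turned into a rate estimate by summing the negative base-2 logarithm of every value and dividing by the pixel count. The sketch below shows that calculation with plain Python floats standing in for tensors; `estimate_bpp` is a hypothetical helper, and whether the library normalizes by pixels in exactly this way is an assumption.

```python
import math


def estimate_bpp(likelihoods, num_pixels):
    """Estimate bits per pixel from per-symbol likelihoods grouped by latent name."""
    total_bits = 0.0
    for values in likelihoods.values():
        # Each symbol costs -log2(p) bits under an ideal entropy coder.
        total_bits += sum(-math.log2(p) for p in values)
    return total_bits / num_pixels


# Made-up likelihoods for two latents, "y" and "z".
likelihoods = {"y": [0.5, 0.25, 0.125], "z": [0.5]}
bpp = estimate_bpp(likelihoods, num_pixels=4)
```

This estimate is a lower bound on what an actual entropy coder produces, which is why it is useful for evaluation without writing a bitstream.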

get_compression_module

get_compression_module()

Get the feature compression module.

get_feature_numel

get_feature_numel(x: Tensor) -> int

Get number of elements in feature representation.

load_checkpoint

load_checkpoint(checkpoint_path: str, strict: bool = False)

Load model weights from checkpoint.

Parameters:

Name Type Description Default
checkpoint_path str

Path to checkpoint file

required
strict bool

Whether to strictly enforce state_dict key matching

False

set_grad_mode

set_grad_mode(mode: str)

Set gradient mode for different training phases.

Parameters:

Name Type Description Default
mode str

Gradient mode:

  • 'full_finetune': Train all parameters
  • 'only_compress': Train only compression module
  • 'finetune_mona': Fine-tune Mona adapters and heads

required

update

update(scale_table=None, force=False)

Update entropy model parameters.

MLoREVideoCodec

MLoREVideoCodec(frame_codec: MLoREFrameCodec = None, p=None, stage: str = 'stage1', pretrained: bool = True, img_size: Tuple[int, int] = (512, 512), device: str = 'cuda', **kwargs)

MLoRE Video-level Codec for multi-task feature compression.

This VideoCodec implements the MPCompress framework specification for compressing entire video sequences with multi-task support.

Supports both frame-wise and layer-wise organization:

  • frame_wise: Each frame compressed independently
  • layer_wise: Features and metadata organized by layer

Framework interface methods:

  • compress_video(video_reader, meta, codec_args) -> coded_data
  • decompress_video(coded_data, codec_args) -> results
  • compress_frame(frame, codec_args) -> coded_unit
  • decompress_frame(coded_unit, codec_args) -> task_feats

Parameters:

Name Type Description Default
frame_codec MLoREFrameCodec

MLoREFrameCodec instance or config

None
p

Configuration dict (if frame_codec not provided)

None
stage str

Training stage ('stage0', 'stage1', 'stage2')

'stage1'
pretrained bool

Whether to load pretrained weights

True
img_size Tuple[int, int]

Input image size

(512, 512)
device str

Device to run on

'cuda'

compress_frame

compress_frame(frame: Tensor, codec_args: Dict[str, Any] = None) -> Dict[str, Any]

Compress a single frame (Access Unit).

Parameters:

Name Type Description Default
frame Tensor

Input frame tensor (B, 3, H, W) or (3, H, W)

required
codec_args Dict[str, Any]

Encoding arguments

None

Returns:

Name Type Description
coded_unit Dict[str, Any]

Compressed frame data

compress_video

compress_video(video_reader, meta: Dict[str, Any] = None, codec_args: Dict[str, Any] = None) -> Dict[str, Any]

Compress an entire video sequence.

Following the MPCompress VideoCodec interface, compresses all frames and returns coded_data intermediate representation.

Parameters:

Name Type Description Default
video_reader

Video reader object supporting iteration

required
meta Dict[str, Any]

Video metadata dict with keys like:

  • seq_name: Sequence name
  • src_width, src_height: Original dimensions
  • frame_num: Total frame count

None
codec_args Dict[str, Any]

Encoding arguments:

  • tasks: List of tasks to encode for

None

Returns:

Name Type Description
coded_data Dict[str, Any]

Dictionary following framework specification:

  {
    "type": "frame_wise_video",
    "data": {
      0: coded_unit_0,
      1: coded_unit_1,
      ...
    },
    "meta": {...}
  }
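The frame-wise coded_data layout can be assembled with a simple loop over frames. The sketch below uses a stand-in frame compressor so that it runs without the library; `compress_video_frame_wise` and `fake_compress_frame` are hypothetical names, and the real compress_video may record additional metadata.

```python
def compress_video_frame_wise(frames, compress_frame, meta=None):
    """Compress each frame independently and index the coded units by frame number."""
    coded_data = {"type": "frame_wise_video", "data": {}, "meta": dict(meta or {})}
    for index, frame in enumerate(frames):
        coded_data["data"][index] = compress_frame(frame)
    return coded_data


def fake_compress_frame(frame):
    """Stand-in for a real frame codec: stores the raw values as one byte string."""
    return {"strings": {"y": [[bytes(frame)]]}, "pstate": {"shape": (len(frame),)}}


# Two tiny made-up "frames" of byte values.
frames = [[1, 2, 3], [4, 5, 6]]
coded_data = compress_video_frame_wise(
    frames, fake_compress_frame, meta={"seq_name": "demo", "frame_num": 2}
)
```

Because every frame is coded on its own, any entry of "data" can be decoded without touching the others.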

decompress_frame

decompress_frame(coded_unit: Dict[str, Any], codec_args: Dict[str, Any] = None) -> Dict[str, torch.Tensor]

Decompress a single frame (Access Unit).

Parameters:

Name Type Description Default
coded_unit Dict[str, Any]

Compressed frame data from compress_frame()

required
codec_args Dict[str, Any]

Decoding arguments

None

Returns:

Name Type Description
task_feats Dict[str, Tensor]

Task prediction outputs

decompress_video

decompress_video(coded_data: Dict[str, Any], codec_args: Dict[str, Any] = None) -> Dict[int, Dict[str, torch.Tensor]]

Decompress an entire video sequence.

Following the MPCompress VideoCodec interface, decodes coded_data to task outputs for each frame.

Parameters:

Name Type Description Default
coded_data Dict[str, Any]

Dictionary from compress_video()

required
codec_args Dict[str, Any]

Decoding arguments:

  • tasks: List of tasks to decode

None

Returns:

Name Type Description
results Dict[int, Dict[str, Tensor]]

Dictionary with frame index as key:

  {
    0: {"semseg": tensor, "edge": tensor, ...},
    1: {...},
    ...
  }

forward

forward(x: Tensor, tasks: List[str] = None, **kwargs)

Forward pass through frame codec.

forward_test

forward_test(x: Tensor, tasks: List[str] = None, **kwargs)

Test forward pass through frame codec.

load_checkpoint

load_checkpoint(checkpoint_path: str, strict: bool = False)

Load model weights from checkpoint.

update

update(scale_table=None, force: bool = False)

Update entropy model parameters.

MLoREWrapperCodec

MLoREWrapperCodec(p, checkpoint_path=None)

Wrapper to use RFC's original MLoREWrapper_coding class.

This provides exact compatibility with pretrained RFC weights while conforming to MPCompress interfaces.

Parameters:

Name Type Description Default
p

Configuration dict

required
checkpoint_path

Path to pretrained weights

None

compress

compress(x, tasks=None, **kwargs)

Compress features.

decompress

decompress(coded_unit, tasks=None, **kwargs)

Decompress to task outputs.

forward

forward(x, tasks=None, episode_tasks=None, **kwargs)

Forward pass.

load_checkpoint

load_checkpoint(checkpoint_path)

Load pretrained weights.

MPC_I1

MPC_I1(vqgan_config, **kwargs)

Multi-Purpose Compression model using VQGAN backbone only.

This is a single-layer compression model that uses VQGAN for feature extraction and uniform token codec for compression. It provides basic image reconstruction capabilities.

Parameters:

Name Type Description Default
vqgan_config dict

Configuration dictionary for the VQGAN backbone. Passed directly to VqganBackbone constructor.

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Attributes:

Name Type Description
vqgan VqganBackbone

The VQGAN backbone model.

vqgan_codec UniformTokenCodec

The uniform token codec for compression.

patch_size int

Patch size used by the model (fixed at 16).
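With a fixed patch size of 16, the token grid and a fixed-length bit budget follow from simple arithmetic. The sketch below assumes that a uniform token codec spends the same number of bits on every token and that the image sides are multiples of the patch size; the codebook size of 1024 and both helper names are hypothetical.

```python
def token_grid(height, width, patch_size=16):
    """Return the (rows, cols) token grid for an image, one token per patch."""
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be multiples of patch_size")
    return height // patch_size, width // patch_size


def fixed_length_bits(height, width, codebook_size, patch_size=16):
    """Total bits if every token is written with a fixed-length index."""
    rows, cols = token_grid(height, width, patch_size)
    # Smallest number of bits that can index codebook_size entries.
    bits_per_token = (codebook_size - 1).bit_length()
    return rows * cols * bits_per_token


# Example: a 256 x 256 image with an assumed codebook of 1024 entries.
grid = token_grid(256, 256)
bits = fixed_length_bits(256, 256, codebook_size=1024)
bpp = bits / (256 * 256)
```

Under these assumptions the rate depends only on resolution and codebook size, not on image content.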

compress

compress(x, **kwargs)

Compress input image to byte strings.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
coded_unit dict

Dictionary containing compressed data:

  • "strings": Compressed byte strings
  • "pstate": Compression state information

decompress

decompress(coded_unit, **kwargs)

Decompress byte strings to reconstructed image and features.

Parameters:

Name Type Description Default
coded_unit dict

Dictionary containing compressed data:

  • "strings": Compressed byte strings
  • "pstate": Compression state information

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
task_feats dict

Dictionary of task-specific features:

  • "z_q": Quantized features
  • "tokens": VQGAN tokens
  • "x_hat": Reconstructed image tensor

forward

forward(x, **kwargs)

Forward pass for training.

Encodes input image using VQGAN, compresses tokens, and decodes to reconstructed image.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
out dict

Dictionary containing:

  • "likelihoods": Likelihoods from the codec
  • "x_hat": Reconstructed image tensor

MPC_I12

MPC_I12(vqgan_backbone={}, vqgan_codec={}, dino_backbone={}, dino_codec={}, **kwargs)

Multi-Purpose Compression model with two layers: VQGAN and DINOv2.

This is a two-layer compression model that combines VQGAN (layer 1) and DINOv2 (layer 2) for hierarchical compression. The DINOv2 codec uses VQGAN context for conditional compression. Supports multiple tasks including reconstruction, classification, and segmentation.

Parameters:

Name Type Description Default
vqgan_backbone dict

Configuration dictionary for the VQGAN backbone. Passed directly to VqganBackbone constructor.

{}
vqgan_codec dict

Configuration dictionary for the VQGAN codec. Passed directly to UniformTokenCodec constructor.

{}
dino_backbone dict

Configuration dictionary for the DINOv2 backbone. Passed directly to Dinov2TimmBackbone constructor.

{}
dino_codec dict

Configuration dictionary for the DINO codec. Must contain "h_dim" and "ctx_dim" keys for the conditional decoder. Passed directly to VitUnionLatentCodecWithCtx constructor.

{}
**kwargs dict

Additional keyword arguments (currently unused).

{}

Attributes:

Name Type Description
vqgan VqganBackbone

The VQGAN backbone model.

vqgan_codec UniformTokenCodec

The VQGAN codec.

dino Dinov2TimmBackbone

The DINOv2 backbone model.

dino_codec VitUnionLatentCodecWithCtx

The DINO codec with context.

patch_size int

Patch size used by the DINOv2 backbone.

cond_dec_for_vqgan Sequential

Conditional decoder for enhancing VQGAN reconstruction using DINOv2 features.

extract_feature

extract_feature(x, **kwargs)

Extract features from input image for offline training.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
out dict

Dictionary containing:

  • "tokens": VQGAN tokens
  • "h_dino": Extracted DINO features

forward

forward(x, **kwargs)

Forward pass for training.

Processes input through both VQGAN and DINOv2 layers, performs conditional compression, and returns features and likelihoods for training.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
out dict

Dictionary containing:

  • "h_vqgan": Original VQGAN features
  • "h_vqgan_hat": Enhanced VQGAN features
  • "h_dino": Original DINO features
  • "h_dino_hat": Reconstructed DINO features
  • "likelihoods": Likelihoods from the DINO codec
  • "x_hat": Reconstructed image (only in eval mode, None during training)

forward_test

forward_test(x, tasks, **kwargs)

Forward pass for testing/inference with compression.

Processes input through both layers, compresses features, and generates task-specific outputs.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
tasks list of str

List of tasks to perform. Supported tasks:

  • "rec1": Basic VQGAN reconstruction
  • "rec2": Enhanced reconstruction using DINOv2 features
  • "cls": Classification task
  • "seg": Segmentation task

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
coded_data dict

Dictionary containing compressed data:

  • "type": "frame"
  • "data": Dictionary with:
    • "layer1": VQGAN compressed data
    • "layer2": DINO compressed data

task_feats dict

Dictionary of task-specific features:

  • "rec1": Basic reconstruction (if "rec1" in tasks)
  • "rec2": Enhanced reconstruction (if "rec2" in tasks)
  • "cls": Classification features (if "cls" in tasks)
  • "seg": Segmentation features (if "seg" in tasks)
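
The task names listed above can be checked before calling forward_test. The following is a minimal sketch and not part of mpcompress: SUPPORTED_TASKS and validate_tasks are hypothetical names, and the library may treat unknown task names differently (for example by ignoring them).

```python
# Hypothetical helper, not part of mpcompress: validates a task list
# against the task names documented for forward_test.
SUPPORTED_TASKS = ("rec1", "rec2", "cls", "seg")


def validate_tasks(tasks):
    """Return the tasks as a list, raising ValueError on unknown names."""
    tasks = list(tasks)
    unknown = [t for t in tasks if t not in SUPPORTED_TASKS]
    if unknown:
        raise ValueError(f"unsupported tasks: {unknown}")
    return tasks


checked = validate_tasks(["rec1", "cls"])
```

The checked list could then be passed as the tasks argument, so that a typo in a task name fails early instead of silently producing an empty task_feats entry.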

offline_forward

offline_forward(data, device, **kwargs)

Offline forward pass for training with pre-extracted features.

Processes pre-extracted VQGAN tokens and DINO features for training. This method is used when features are extracted separately to save memory.

Parameters:

Name Type Description Default
data dict

Dictionary containing:

  • "h_dino": Pre-extracted DINO features
  • "tokens": Pre-extracted VQGAN tokens
  • "x_shape": Original image shape (B, C, H, W)

required
device device

Device to move tensors to.

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
out dict

Dictionary containing:

  • "h_vqgan": VQGAN context features
  • "h_vqgan_hat": Enhanced VQGAN features
  • "h_dino": Original DINO features
  • "h_dino_hat": Reconstructed DINO features
  • "likelihoods": Likelihoods from the DINO codec
  • "x_hat": Reconstructed image (only in eval mode, None during training)
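
Because offline_forward consumes a plain dictionary, a cached sample can be sanity-checked before it reaches the model. This is a sketch under assumptions: check_offline_sample and REQUIRED_KEYS are hypothetical names, and plain lists stand in for the tensors a real sample would hold.

```python
# Hypothetical helper, not part of mpcompress: checks that a cached sample
# has the keys documented for offline_forward before it is used.
REQUIRED_KEYS = ("h_dino", "tokens", "x_shape")


def check_offline_sample(data):
    """Raise if a documented key is missing or x_shape is not (B, C, H, W)."""
    missing = [k for k in REQUIRED_KEYS if k not in data]
    if missing:
        raise KeyError(f"offline sample is missing keys: {missing}")
    if len(data["x_shape"]) != 4:
        raise ValueError("x_shape must have four entries: (B, C, H, W)")
    return data


# Placeholder values stand in for real tensors here.
sample = {"h_dino": [[0.0]], "tokens": [[0]], "x_shape": (1, 3, 224, 224)}
check_offline_sample(sample)
```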

MPC_I12_CtxAsHyper

MPC_I12_CtxAsHyper(vqgan_backbone={}, vqgan_codec={}, dino_backbone={}, dino_codec={}, **kwargs)

Multi-Purpose Compression model with context as hyperprior.

This variant of MPC_I12 treats the VQGAN context as a hyperprior for the DINOv2 codec. Like MPC_I12, it combines VQGAN (layer 1) and DINOv2 (layer 2) for hierarchical compression, but its DINO codec is VitUnionLatentCodecCtxAsHyper.

Parameters:

Name Type Description Default
vqgan_backbone dict

Configuration dictionary for the VQGAN backbone. Passed directly to VqganBackbone constructor.

{}
vqgan_codec dict

Configuration dictionary for the VQGAN codec. Passed directly to UniformTokenCodec constructor.

{}
dino_backbone dict

Configuration dictionary for the DINOv2 backbone. Passed directly to Dinov2TimmBackbone constructor.

{}
dino_codec dict

Configuration dictionary for the DINO codec. Must contain "h_dim" and "ctx_dim" keys for the conditional decoder. Passed directly to VitUnionLatentCodecCtxAsHyper constructor.

{}
**kwargs dict

Additional keyword arguments (currently unused).

{}

Attributes:

Name Type Description
vqgan VqganBackbone

The VQGAN backbone model.

vqgan_codec UniformTokenCodec

The VQGAN codec.

dino Dinov2TimmBackbone

The DINOv2 backbone model.

dino_codec VitUnionLatentCodecCtxAsHyper

The DINO codec with context as hyperprior.

patch_size int

Patch size used by the DINOv2 backbone.

cond_dec_for_vqgan Sequential

Conditional decoder for enhancing VQGAN reconstruction using DINOv2 features.

compress

compress(x, **kwargs)

Compress input image to byte strings using both layers.

Processes input through both VQGAN and DINOv2 layers, compresses features with context as hyperprior, and returns compressed data.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
coded_data dict

Dictionary containing compressed data:

  • "type": "frame"
  • "data": Dictionary with:
    • "layer1": VQGAN compressed data
    • "layer2": DINO compressed data
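
To see how the bitstream splits across the two layers, the returned container can be walked and its byte strings counted. This is a sketch under assumptions: the internal layout of "layer1" and "layer2" is not documented here, so the helper simply sums every bytes object it finds, and count_bytes and bytes_per_layer are hypothetical names.

```python
def count_bytes(obj):
    """Recursively sum the lengths of all bytes objects inside obj."""
    if isinstance(obj, (bytes, bytearray)):
        return len(obj)
    if isinstance(obj, dict):
        return sum(count_bytes(v) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return sum(count_bytes(v) for v in obj)
    return 0  # shapes, ints, and other metadata carry no payload here


def bytes_per_layer(coded_data):
    """Map each layer name under "data" to its payload size in bytes."""
    return {name: count_bytes(payload) for name, payload in coded_data["data"].items()}


# Made-up payloads; the real per-layer layout is an assumption.
coded_data = {
    "type": "frame",
    "data": {
        "layer1": {"strings": [[b"\x01\x02\x03"]]},
        "layer2": {"strings": [[b"\x04\x05"], [b"\x06"]], "pstate": {"shape": (16, 16)}},
    },
}
sizes = bytes_per_layer(coded_data)  # {"layer1": 3, "layer2": 3}
```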

decompress

decompress(coded_data, tasks=[], **kwargs)

Decompress byte strings to task-specific features.

Decompresses both VQGAN and DINOv2 layers, uses context as hyperprior for DINO decompression, and generates task-specific outputs.

Parameters:

Name Type Description Default
coded_data dict

Dictionary containing compressed data:

  • "type": "frame"
  • "data": Dictionary with:
    • "layer1": VQGAN compressed data
    • "layer2": DINO compressed data

required
tasks list of str

List of tasks to perform. Supported tasks:

  • "rec1": Basic VQGAN reconstruction
  • "rec2": Enhanced reconstruction using DINOv2 features
  • "cls": Classification task
  • "seg": Segmentation task

[]
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
task_feats dict

Dictionary of task-specific features:

  • "rec1": Basic reconstruction (if "rec1" in tasks)
  • "rec2": Enhanced reconstruction (if "rec2" in tasks)
  • "cls": Classification features (if "cls" in tasks)
  • "seg": Segmentation features (if "seg" in tasks)

forward

forward(x, **kwargs)

Forward pass for training.

Processes input through both VQGAN and DINOv2 layers, performs conditional compression with context as hyperprior, and returns features and likelihoods.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
out dict

Dictionary containing:

  • "h_vqgan": Original VQGAN features
  • "h_vqgan_hat": Enhanced VQGAN features
  • "h_dino": Original DINO features
  • "h_dino_hat": Reconstructed DINO features
  • "likelihoods": Likelihoods from the DINO codec
  • "x_hat": Reconstructed image (only in eval mode, None during training)

forward_test

forward_test(x, tasks, **kwargs)

Forward pass for testing/inference with compression.

Processes input through both layers, compresses features with context as hyperprior, and generates task-specific outputs.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
tasks list of str

List of tasks to perform. Supported tasks:

  • "rec1": Basic VQGAN reconstruction
  • "rec2": Enhanced reconstruction using DINOv2 features
  • "cls": Classification task
  • "seg": Segmentation task

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
coded_data dict

Dictionary containing compressed data:

  • "type": "frame"
  • "data": Dictionary with:
    • "layer1": VQGAN compressed data
    • "layer2": DINO compressed data

task_feats dict

Dictionary of task-specific features:

  • "rec1": Basic reconstruction (if "rec1" in tasks)
  • "rec2": Enhanced reconstruction (if "rec2" in tasks)
  • "cls": Classification features (if "cls" in tasks)
  • "seg": Segmentation features (if "seg" in tasks)

MPC_I2

MPC_I2(dino_backbone={}, dino_codec={}, **kwargs)

Multi-Purpose Compression model using DINOv2 backbone only.

This is a single-layer compression model that uses DINOv2 for feature extraction and a ViT-based latent codec for compression. It supports multiple downstream tasks, including classification and segmentation.

Parameters:

Name Type Description Default
dino_backbone dict

Configuration dictionary for the DINOv2 backbone. If a "type" key is present, the backbone is created dynamically via instantiate_class. Otherwise, Dinov2TimmBackbone is built from the provided config.

{}
dino_codec dict

Configuration dictionary for the DINO codec. If a "type" key is present in dino_backbone, the codec is created dynamically via instantiate_class. Otherwise, VitUnionLatentCodec is built from the provided config.

{}
**kwargs dict

Additional keyword arguments (currently unused).

{}
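
The "type"-based dispatch described for dino_backbone and dino_codec can be illustrated as follows. This is only a sketch of the general pattern: build_from_config is a hypothetical name, the assumption that "type" holds a dotted import path is not confirmed by this page, and the real instantiate_class may accept a different format.

```python
import importlib
from fractions import Fraction


def build_from_config(cfg, default_cls):
    """Instantiate cfg["type"] if given, else default_cls, with the rest as kwargs."""
    cfg = dict(cfg)  # copy so the caller's dict is not mutated
    type_path = cfg.pop("type", None)
    if type_path is None:
        return default_cls(**cfg)
    # Assumption: "type" is a dotted path such as "package.module.ClassName".
    module_name, _, class_name = type_path.rpartition(".")
    cls = getattr(importlib.import_module(module_name), class_name)
    return cls(**cfg)


# Standard-library classes stand in for backbone and codec classes in this demo.
a = build_from_config({"numerator": 1, "denominator": 2}, Fraction)
b = build_from_config({"type": "fractions.Fraction", "numerator": 3, "denominator": 4}, dict)
```

In the first call no "type" key is present, so the default class is used. In the second, the class named by "type" overrides the default.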

Attributes:

Name Type Description
dino

The DINOv2 backbone model (Dinov2TimmBackbone or dynamically instantiated).

dino_codec

The DINO codec (VitUnionLatentCodec or dynamically instantiated).

patch_size int

Patch size used by the backbone model.

compress

compress(x, qp=0, **kwargs)

Compress input image to byte strings.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
qp int

Quantization parameter. Defaults to 0.

0
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
coded_unit dict

Dictionary containing compressed data:

  • "strings": Compressed byte strings
  • "pstate": Compression state information
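
A common use of the returned byte strings is to report the bitrate in bits per pixel. The sketch below assumes that "strings" is a (possibly nested) list of bytes objects, which this page does not confirm. bits_per_pixel is a hypothetical helper, not part of mpcompress.

```python
def bits_per_pixel(coded_unit, x_shape):
    """Bits per pixel of the payload, averaged over the batch."""

    def flatten(obj):
        # Yield every bytes object found in arbitrarily nested lists/tuples.
        if isinstance(obj, (bytes, bytearray)):
            yield obj
        elif isinstance(obj, (list, tuple)):
            for item in obj:
                yield from flatten(item)

    batch, _, height, width = x_shape
    total_bits = 8 * sum(len(s) for s in flatten(coded_unit["strings"]))
    return total_bits / (batch * height * width)


# Made-up payload of 1568 bytes for a single 224x224 image.
coded_unit = {"strings": [[b"\x00" * 1568]], "pstate": None}
bpp = bits_per_pixel(coded_unit, (1, 3, 224, 224))  # 0.25
```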

decompress

decompress(coded_unit, tasks=[], **kwargs)

Decompress byte strings to task-specific features.

Parameters:

Name Type Description Default
coded_unit dict

Dictionary containing compressed data:

  • "strings": Compressed byte strings
  • "pstate": Compression state information

required
tasks list of str

List of tasks to perform. Supported tasks:

  • "cls": Classification task
  • "seg": Segmentation task

[]
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
task_feats dict

Dictionary of task-specific features:

  • "cls": Classification features (if "cls" in tasks)
  • "seg": Segmentation features (if "seg" in tasks)

extract_feature

extract_feature(x, **kwargs)

Extract features from input image for offline training.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
out dict

Dictionary containing:

  • "h_dino": Extracted DINO features
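
For offline training, the extracted features are typically written to disk once and read back later together with the original image shape, since offline_forward expects both "h_dino" and "x_shape". The following sketch shows that round trip with the standard library only. save_features and load_features are hypothetical names, and a nested list stands in for the feature tensor (with PyTorch tensors, torch.save and torch.load would be the usual choice).

```python
import pickle
import tempfile
from pathlib import Path


def save_features(out, x_shape, path):
    """Store extracted features together with the original image shape."""
    record = dict(out, x_shape=tuple(x_shape))
    Path(path).write_bytes(pickle.dumps(record))


def load_features(path):
    """Read back a record written by save_features."""
    return pickle.loads(Path(path).read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "sample_0000.pkl"
    # A nested list stands in for the tensor returned under "h_dino".
    save_features({"h_dino": [[0.1, 0.2]]}, (1, 3, 224, 224), cache_file)
    data = load_features(cache_file)
```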

forward

forward(x, qp=0, **kwargs)

Forward pass for training with learned image compression (LIC).

Encodes input image using DINOv2, compresses features with codec, and returns reconstructed features and likelihoods for training.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
qp int

Quantization parameter. Defaults to 0.

0
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
out dict

Dictionary containing:

  • "h_dino_hat": Reconstructed DINO features
  • "h_dino": Original DINO features
  • "likelihoods": Likelihoods from the codec
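
The three outputs are what a rate-distortion objective needs: the likelihoods give the rate and the feature pair gives the distortion. This page does not state the training loss, so the sketch below uses the common form "bits per pixel plus weighted MSE" purely as an illustration. rate_distortion_loss, num_pixels, and lmbda are hypothetical names, and flat Python lists stand in for tensors.

```python
import math


def rate_distortion_loss(out, num_pixels, lmbda=0.01):
    """Return (loss, bpp, mse) for flat lists of floats in `out`."""
    # Rate: negative log2-likelihood of every coded symbol, per pixel.
    bits = sum(-math.log2(p) for p in out["likelihoods"])
    bpp = bits / num_pixels
    # Distortion: mean squared error between original and reconstructed features.
    pairs = list(zip(out["h_dino"], out["h_dino_hat"]))
    mse = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
    return bpp + lmbda * mse, bpp, mse


# Toy values chosen so the arithmetic is easy to follow.
out = {
    "h_dino": [1.0, 2.0],
    "h_dino_hat": [1.0, 4.0],
    "likelihoods": [0.5, 0.25, 0.5, 0.25],
}
loss, bpp, mse = rate_distortion_loss(out, num_pixels=4, lmbda=0.5)
```

Here the four likelihoods contribute 1 + 2 + 1 + 2 = 6 bits over 4 pixels (1.5 bpp), the MSE is 2.0, and the loss is 1.5 + 0.5 × 2.0 = 2.5.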

forward_test

forward_test(x, qp=0, tasks=[], **kwargs)

Forward pass for testing/inference with compression.

Encodes input image, compresses features, and generates task-specific outputs.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required
qp int

Quantization parameter. Defaults to 0.

0
tasks list of str

List of tasks to perform. Supported tasks:

  • "cls": Classification task
  • "seg": Segmentation task

[]
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
coded_unit dict

Dictionary containing compressed data:

  • "strings": Compressed byte strings
  • "pstate": Compression state information
  • "h_hat": Reconstructed features

task_feats dict

Dictionary of task-specific features:

  • "cls": Classification features (if "cls" in tasks)
  • "seg": Segmentation features (if "seg" in tasks)

get_feature_numel

get_feature_numel(x)

Calculate the total number of elements in the extracted features.

Parameters:

Name Type Description Default
x Tensor

Input image tensor of shape (B, C, H, W).

required

Returns:

Name Type Description
numel int

Total number of elements in the feature tensor.
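
As a rough cross-check of the returned value, the element count of a ViT-style patch feature can be estimated from the input shape. This is a sketch under assumptions that this page does not confirm: the feature holds one embedding per non-overlapping patch and no class token, and patch_feature_numel and embed_dim are hypothetical names. The values 14 and 384 are illustrative only.

```python
def patch_feature_numel(x_shape, patch_size, embed_dim):
    """Elements in a (B, num_patches, embed_dim) patch-token feature."""
    batch, _, height, width = x_shape
    if height % patch_size or width % patch_size:
        raise ValueError("H and W must be multiples of patch_size in this sketch")
    num_patches = (height // patch_size) * (width // patch_size)
    return batch * num_patches * embed_dim


# 224 / 14 = 16 patches per side, so 256 tokens of 384 values per image.
numel = patch_feature_numel((2, 3, 224, 224), patch_size=14, embed_dim=384)  # 196608
```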

offline_forward

offline_forward(data, device, qp=0, **kwargs)

Offline forward pass for training with pre-extracted features.

Processes pre-extracted DINO features for learned image compression training. This method is used when features are extracted separately to save memory.

Parameters:

Name Type Description Default
data dict

Dictionary containing:

  • "h_dino": Pre-extracted DINO features
  • "x_shape": Original image shape (B, C, H, W)

required
device device

Device to move tensors to.

required
qp int

Quantization parameter. Defaults to 0.

0
**kwargs dict

Additional keyword arguments (currently unused).

{}

Returns:

Name Type Description
out dict

Dictionary containing:

  • "h_dino_hat": Reconstructed DINO features
  • "h_dino": Original DINO features
  • "likelihoods": Likelihoods from the codec