mpcompress.models
Dinov2OrigClsBypass
Dinov2OrigClsBypass(dino_backbone={}, **kwargs)
DINOv2-Original backbone for classification, WITHOUT compression (bypass).
This model extracts features via encode and decodes for classification, without any compression step. Useful for testing baseline classification accuracy without compression.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dino_backbone` | `dict` | Configuration for Dinov2OrgBackbone. | `{}` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `dino` | `Dinov2OrgBackbone` | The DINOv2 backbone. |
| `patch_size` | `int` | Patch size used by the backbone. |
| `img_size` | `int or tuple` | Image size expected by the backbone. |
forward_test
forward_test(x, qp=None, tasks=[], **kwargs)
Forward pass WITHOUT compression - just extract and decode for classification.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor of shape (B, C, H, W). | required |
| `qp` | `int` | Ignored (no compression). | `None` |
| `tasks` | `list of str` | List of tasks. Supported: "cls". | `[]` |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_data` | `dict` | Mock coded data with zero bits. |
| `task_feats` | `dict` | Dictionary with "cls" key if "cls" in tasks. |
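The bypass contract above can be sketched as plain Python: no compression happens, so the coded data carries zero bits, and task features are only produced for requested tasks. All names here (`bypass_forward_test`, `num_bits`, the placeholder feature value) are illustrative assumptions, not the real mpcompress API.

```python
# Hypothetical stand-in for the bypass forward_test contract:
# zero-bit mock coded data, task features gated by the tasks list.

def bypass_forward_test(x, qp=None, tasks=(), **kwargs):
    # qp is accepted but ignored, matching the documented signature
    coded_data = {"strings": {"bypass": [b""]}, "num_bits": 0}
    task_feats = {}
    if "cls" in tasks:
        task_feats["cls"] = f"cls-features-for-{x}"  # placeholder for decoded features
    return coded_data, task_feats

coded, feats = bypass_forward_test("img", tasks=["cls"])
```

This makes it easy to compare baseline accuracy against compressed variants: the evaluation loop is identical, only the bit count is always zero.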
Dinov2OrigClsFCVQ
Dinov2OrigClsFCVQ(dino_backbone={}, fcvq_codec={}, **kwargs)
DINOv2-Original backbone with sliding window + FCVQ compression for Classification.
This model combines Dinov2OrgBackbone with FCVQ codec for feature compression on classification tasks.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dino_backbone` | `dict` | Configuration for Dinov2OrgBackbone. | `{}` |
| `fcvq_codec` | `dict` | Configuration for FCVQ codec. | `{}` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `dino` | `Dinov2OrgBackbone` | The DINOv2 backbone. |
| `fcvq` | `FCVQ` | The FCVQ codec. |
compress
compress(x, qp=None)
Compress input image to byte strings using FCVQ for classification.
Extracts features via encode, compresses with FCVQ, returns coded_unit. The qp parameter is unused (FCVQ has no QP).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor of shape (B, C, H, W). | required |
| `qp` | `int` | Unused; kept for API compatibility. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_unit` | `dict` | Dictionary containing "strings": {"fcvq": [strings]} and "pstate": {"feat_shape": tuple}. |
decompress
decompress(coded_unit, tasks=[], **kwargs)
Decompress byte strings to task-specific features for classification.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `coded_unit` | `dict` | Dictionary containing "strings": {"fcvq": [strings]} from compress and "pstate": {"feat_shape": tuple}. | required |
| `tasks` | `list of str` | List of tasks. Supported: "cls". | `[]` |
| `**kwargs` | | Additional arguments (unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `task_feats` | `dict` | Dictionary with "cls" key if "cls" in tasks. |
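The compress/decompress round trip above revolves around one coded_unit dict. A minimal sketch of that layout, with the byte strings and feature decoding mocked (only the "strings"/"pstate" key structure follows the documentation; the token shape and payload bytes are assumptions):

```python
# Illustrative round trip over the documented coded_unit structure.

def mock_compress(x, qp=None):
    feat_shape = (1, 257, 768)           # assumed (B, L, C) token shape
    strings = [b"\x00\x01"]              # stand-in FCVQ bitstream
    return {"strings": {"fcvq": strings}, "pstate": {"feat_shape": feat_shape}}

def mock_decompress(coded_unit, tasks=()):
    shape = coded_unit["pstate"]["feat_shape"]
    task_feats = {}
    if "cls" in tasks:
        task_feats["cls"] = shape        # placeholder for decoded features
    return task_feats

unit = mock_compress("img")
feats = mock_decompress(unit, tasks=["cls"])
```

Note that `qp` is accepted but never consulted, mirroring the documented "kept for API compatibility" behavior.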
forward_test
forward_test(x, qp=None, tasks=[], **kwargs)
Forward pass with FCVQ compression for classification.
get_feature_numel
get_feature_numel(x)
Total number of elements in encoded features (for bpfp).
Dinov2OrigSlideOnlyPatchCodec
Dinov2OrigSlideOnlyPatchCodec(slide_size=[518, 518], slide_stride=[259, 259], dino_backbone={}, dino_codec={}, **kwargs)
Compression model using DINOv2-Original backbone with sliding window and VTM codec.
This model uses a sliding window approach to handle large images by processing them in overlapping patches. It extracts features using DINOv2-Original backbone and compresses them using VTM codec. Supports segmentation tasks but not classification tasks.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `slide_size` | `list of int` | Size of each sliding window patch [height, width]. | `[518, 518]` |
| `slide_stride` | `list of int` | Stride for sliding window [height_stride, width_stride]. | `[259, 259]` |
| `dino_backbone` | `dict` | Configuration dictionary for the DINOv2-Original backbone. Passed directly to Dinov2OrgBackbone constructor. | `{}` |
| `dino_codec` | `dict` | Configuration dictionary for the VTM feature codec. Passed directly to VtmFeatureCodec constructor. | `{}` |
| `**kwargs` | `dict` | Additional keyword arguments (currently unused). | `{}` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `dino` | `Dinov2OrgBackbone` | The DINOv2-Original backbone model. |
| `dino_codec` | `VtmFeatureCodec` | The VTM feature codec for compression. |
| `patch_size` | `int` | Patch size used by the backbone model. |
| `img_size` | `int or tuple` | Image size expected by the backbone. |
| `dynamic_size` | `bool` | Whether the model supports dynamic input sizes. |
| `slide_size` | `list of int` | Size of each sliding window patch. |
| `slide_stride` | `list of int` | Stride for sliding window. |
compress
compress(x, qp)
Compress input image to byte strings using sliding window approach.
Processes input image using sliding window, extracts features, and compresses them using VTM codec.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor of shape (B, C, H, W). | required |
| `qp` | `int` | Quantization parameter for VTM compression. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_unit` | `dict` | Dictionary containing compressed data in CompressAI-compatible format. |
Note:
h_dino_list structure: [ [(B,L,C), ...], ..., [(B,L,C), ...] ]
stacked_feature shape: (N_crop, N_layer, H*W+1, C)
where N_crop is the number of sliding window crops.
decompress
decompress(coded_unit, tasks=[], **kwargs)
Decompress byte strings to task-specific features using sliding window.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `coded_unit` | `dict` | Dictionary containing compressed data. | required |
| `tasks` | `list of str` | List of tasks to perform. Supported: "seg". | `[]` |
| `**kwargs` | `dict` | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `task_feats` | `dict` | Dictionary of task-specific features. |
forward
forward(x)
Forward pass for training (not implemented).
VTM codec does not require training, so this method raises an error.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor. | required |

Raises:

| Type | Description |
|---|---|
| `NotImplementedError` | Always raised as VTM does not need training. |
forward_test
forward_test(x, qp, tasks=[], **kwargs)
Forward pass for testing/inference with compression using sliding window.
Processes input image using sliding window approach, extracts features, compresses them with VTM codec, and generates task-specific features.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor of shape (B, C, H, W). | required |
| `qp` | `int` | Quantization parameter for VTM compression. | required |
| `tasks` | `list of str` | List of tasks to perform. Supported: "seg". | `[]` |
| `**kwargs` | `dict` | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_unit` | `dict` | Dictionary containing compressed data. |
| `task_feats` | `dict` | Dictionary of task-specific features. |
Note:
h_dino_list structure: [ [(B,L,C), ...], ..., [(B,L,C), ...] ]
stacked_feature shape: (N_crop, N_layer, H*W+1, C)
where N_crop is the number of sliding window crops.
get_feature_numel
get_feature_numel(x)
Calculate the total number of elements in the extracted features.
Uses sliding window approach to extract features and calculates the total number of elements across all crops and layers.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor of shape (B, C, H, W). | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `numel` | `int` | Total number of elements in the stacked feature tensor. |
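The N_crop term in the shapes above is determined entirely by the image size, `slide_size`, and `slide_stride`. A small helper to compute it, assuming the common "cover the whole image, clamp the last window" sliding convention (this convention is an assumption about the implementation, not stated in the source):

```python
import math

def num_windows(length, win, stride):
    # windows needed to cover one axis; the last window is clamped to the edge
    if length <= win:
        return 1
    return math.ceil((length - win) / stride) + 1

def num_crops(h, w, slide_size=(518, 518), slide_stride=(259, 259)):
    # N_crop = windows along height * windows along width
    return num_windows(h, slide_size[0], slide_stride[0]) * \
           num_windows(w, slide_size[1], slide_stride[1])
```

With the defaults, a 518x518 image yields a single crop, while a 1036x777 image yields 3x2 = 6 crops, each contributing one (N_layer, H*W+1, C) slice to the stacked feature.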
Dinov2OrigSlideSegBypass
Dinov2OrigSlideSegBypass(slide_size=[518, 518], slide_stride=[259, 259], dino_backbone={}, **kwargs)
DINOv2-Original backbone with sliding window, WITHOUT compression (bypass).
This model is identical to Dinov2OrigSlideOnlyPatchCodec but skips the VTM compression step. Useful for testing baseline mIoU without compression.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `slide_size` | `list of int` | Size of each sliding window patch [height, width]. | `[518, 518]` |
| `slide_stride` | `list of int` | Stride for sliding window [height_stride, width_stride]. | `[259, 259]` |
| `dino_backbone` | `dict` | Configuration dictionary for the DINOv2-Original backbone. | `{}` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `dino` | `Dinov2OrgBackbone` | The DINOv2-Original backbone model. |
| `patch_size` | `int` | Patch size used by the backbone model. |
| `img_size` | `int or tuple` | Image size expected by the backbone. |
| `dynamic_size` | `bool` | Whether the model supports dynamic input sizes. |
| `slide_size` | `list of int` | Size of each sliding window patch. |
| `slide_stride` | `list of int` | Stride for sliding window. |
compress
compress(x, qp)
Compress is not supported; returns an empty result.
decompress
decompress(*args, **kwargs)
Decompress is not supported; returns an empty result.
forward_test
forward_test(x, qp=None, tasks=[], **kwargs)
Forward pass WITHOUT compression - just extract and decode features.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor of shape (B, C, H, W). | required |
| `qp` | `int` | Ignored (no compression). | `None` |
| `tasks` | `list of str` | List of tasks to perform: "seg" (segmentation); "cls" (classification) is not supported. | `[]` |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_data` | `dict` | Mock coded data with zero bits. |
| `task_feats` | `dict` | Dictionary of task-specific features: "seg" (if "seg" in tasks). |
Dinov2OrigSlideSegFCVQ
Dinov2OrigSlideSegFCVQ(slide_size=[518, 518], slide_stride=[259, 259], dino_backbone={}, fcvq_codec={}, **kwargs)
DINOv2-Original backbone with sliding window + FCVQ compression for Segmentation.
This model combines Dinov2OrgBackbone (slide encode) with FCVQ codec for feature compression on segmentation tasks.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `slide_size` | `list of int` | Size of each sliding window patch [height, width]. | `[518, 518]` |
| `slide_stride` | `list of int` | Stride for sliding window [height_stride, width_stride]. | `[259, 259]` |
| `dino_backbone` | `dict` | Configuration for Dinov2OrgBackbone. | `{}` |
| `fcvq_codec` | `dict` | Configuration for FCVQ codec. | `{}` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `dino` | `Dinov2OrgBackbone` | The DINOv2 backbone. |
| `fcvq` | `FCVQ` | The FCVQ codec. |
compress
compress(x, qp=None)
Compress input image to byte strings using sliding window + FCVQ.
Extracts features via slide_encode, compresses patch tokens with FCVQ, and returns coded_unit. The qp parameter is unused (FCVQ has no QP).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor of shape (B, C, H, W). | required |
| `qp` | `int` | Unused; kept for API compatibility with VTM models. | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_unit` | `dict` | Dictionary containing "strings": {"fcvq": [strings]}, where strings is a list of bytes from FCVQ, and "pstate": {"feat_shape": stacked_feat_hat.shape}. |
decompress
decompress(coded_unit, tasks=[], **kwargs)
Decompress byte strings to task-specific features using sliding window.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `coded_unit` | `dict` | Dictionary containing "strings": {"fcvq": [strings]} from compress and "pstate": {"feat_shape": tuple}. | required |
| `tasks` | `list of str` | List of tasks. Supported: "seg". | `[]` |
| `**kwargs` | | Additional arguments (unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `task_feats` | `dict` | Dictionary with "seg" key if "seg" in tasks. |
forward_test
forward_test(x, qp=None, tasks=[], **kwargs)
Forward pass with FCVQ compression for segmentation.
get_feature_numel
get_feature_numel(x)
Calculate the total number of elements in the extracted features.
Uses sliding window approach to extract features and calculates the total number of elements across all crops and layers.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor of shape (B, C, H, W). | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `numel` | `int` | Total number of elements in the stacked feature tensor. |
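`get_feature_numel` exists to support a bits-per-feature-element style rate metric: total coded bits divided by feature elements. A sketch with mocked values (the `bpfp` naming and the specific byte strings and numel are illustrative assumptions based on the docstrings above):

```python
def bits_of(strings):
    # total bit count of a list of FCVQ byte strings
    return sum(8 * len(s) for s in strings)

def bpfp(strings, feature_numel):
    # bits per feature element: coded bits / get_feature_numel(x)
    return bits_of(strings) / feature_numel

streams = [b"\xab" * 100, b"\xcd" * 28]   # mock FCVQ byte strings (128 bytes total)
numel = 4 * 1370 * 768                     # mock stacked-feature numel (N_crop * L * C)
rate = bpfp(streams, numel)
```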
Dinov2TimmBypass
Dinov2TimmBypass(dino_backbone={}, **kwargs)
A bypass model using DINOv2-Timm backbone for feature extraction without compression.
This model performs feature extraction using a DINOv2-Timm backbone and supports multiple downstream tasks (classification, segmentation) without any compression operations. It returns empty byte strings as a placeholder for compressed data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dino_backbone` | `dict` | Configuration dictionary for the DINOv2-Timm backbone. Passed directly to Dinov2TimmBackbone constructor. | `{}` |
| `**kwargs` | `dict` | Additional keyword arguments (currently unused). | `{}` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `dino` | `Dinov2TimmBackbone` | The DINOv2-Timm backbone model. |
| `patch_size` | `int` | Patch size used by the backbone model. |
forward_test
forward_test(x, tasks=[], **kwargs)
Forward pass for testing/inference without compression.
Extracts features using the DINOv2 backbone and generates task-specific features (classification, segmentation) without performing any compression. Returns empty byte strings as a placeholder for compressed data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor of shape (B, C, H, W). | required |
| `tasks` | `list of str` | List of tasks to perform. Supported: "cls", "seg". | `[]` |
| `**kwargs` | `dict` | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_unit` | `dict` | Dictionary with empty byte strings as a placeholder for compressed data. |
| `task_feats` | `dict` | Dictionary of task-specific features. |
get_feature_numel
get_feature_numel(x)
Calculate the total number of elements in the extracted features.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor of shape (B, C, H, W). | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `numel` | `int` | Total number of elements in the feature tensor. |
Dinov2TimmOnlyPatchCodec
Dinov2TimmOnlyPatchCodec(dino_backbone={}, dino_codec={}, **kwargs)
Compression model using DINOv2-Timm backbone with VTM feature codec.
This model extracts features using a DINOv2-Timm backbone and compresses them using VTM (Video Test Model) codec. It supports segmentation tasks but not classification tasks.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `dino_backbone` | `dict` | Configuration dictionary for the DINOv2-Timm backbone. Passed directly to Dinov2TimmBackbone constructor. | `{}` |
| `dino_codec` | `dict` | Configuration dictionary for the VTM feature codec. Passed directly to VtmFeatureCodec constructor. | `{}` |
| `**kwargs` | `dict` | Additional keyword arguments (currently unused). | `{}` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `dino` | `Dinov2TimmBackbone` | The DINOv2-Timm backbone model. |
| `dino_codec` | `VtmFeatureCodec` | The VTM feature codec for compression. |
| `patch_size` | `int` | Patch size used by the backbone model. |
| `img_size` | `int or tuple` | Image size expected by the backbone. |
| `dynamic_size` | `bool` | Whether the model supports dynamic input sizes. |
compress
compress(x, qp)
Compress input image to byte strings.
Extracts features using DINOv2 backbone and compresses them using VTM codec.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor of shape (B, C, H, W). | required |
| `qp` | `int` | Quantization parameter for VTM compression. | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_unit` | `dict` | Dictionary containing compressed data. |
decompress
decompress(coded_unit, tasks=[], **kwargs)
Decompress byte strings to task-specific features.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `coded_unit` | `dict` | Dictionary containing compressed data. | required |
| `tasks` | `list of str` | List of tasks to perform. Supported: "seg". | `[]` |
| `**kwargs` | `dict` | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `task_feats` | `dict` | Dictionary of task-specific features. |
forward
forward(x)
Forward pass for training (not implemented).
VTM codec does not require training, so this method raises an error.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor. | required |

Raises:

| Type | Description |
|---|---|
| `NotImplementedError` | Always raised as VTM does not need training. |
forward_test
forward_test(x, qp, tasks, **kwargs)
Forward pass for testing/inference with compression.
Extracts features using DINOv2 backbone, compresses them with VTM codec, and generates task-specific features.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor of shape (B, C, H, W). | required |
| `qp` | `int` | Quantization parameter for VTM compression. | required |
| `tasks` | `list of str` | List of tasks to perform. Supported: "seg". | required |
| `**kwargs` | `dict` | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_unit` | `dict` | Dictionary containing compressed data. |
| `task_feats` | `dict` | Dictionary of task-specific features. |
get_feature_numel
get_feature_numel(x)
Calculate the total number of elements in the extracted features.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor of shape (B, C, H, W). | required |

Returns:

| Name | Type | Description |
|---|---|---|
| `numel` | `int` | Total number of elements in the feature tensor after segmentation decoding. |
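Because this codec takes an explicit `qp`, the typical evaluation pattern is a QP sweep to trace a rate curve. A self-contained sketch with a stand-in codec whose rate simply shrinks as QP grows (the toy rate model and all names here are assumptions; only the coded_unit "strings"/"pstate" layout follows the documentation):

```python
# Sketch of a QP sweep as one would run it against a compress/decompress
# pair like Dinov2TimmOnlyPatchCodec, using a mock codec.

def mock_compress(x, qp):
    n_bytes = max(1, 64 - qp)            # toy model: higher QP -> fewer bytes
    return {"strings": {"vtm": [b"\x00" * n_bytes]}, "pstate": {}}

results = {}
for qp in (22, 27, 32, 37):
    unit = mock_compress("img", qp)
    # record total coded bits at this QP
    results[qp] = sum(8 * len(s) for s in unit["strings"]["vtm"])
```

Dividing each bit count by `get_feature_numel(x)` (or by image pixels) turns this into a rate curve for plotting against task accuracy.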
MLoREFrameCodec
MLoREFrameCodec(p=None, stage='stage1', pretrained=True, img_size=(512, 512), drop_path_rate=0.15, device='cuda', **kwargs)
MLoRE Frame-level Codec for multi-task feature compression.
This FrameCodec implements the MPCompress framework specification for compressing a single Access Unit (frame) with multi-task support.
Architecture follows the Feature DU design:

- Feature Encoder: ViT backbone front-end extracts intermediate features
- Compress: Hyperprior-style codec compresses features to bitstream
- Feature Decoder: ViT backbone back-end + task heads produce outputs
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `p` | `dict` | Configuration dict containing model parameters. | `None` |
| `stage` | `str` | Training stage ('stage0', 'stage1', 'stage2'). | `'stage1'` |
| `pretrained` | `bool` | Whether to load pretrained ViT weights. | `True` |
| `img_size` | `tuple` | Input image size (H, W). | `(512, 512)` |
| `drop_path_rate` | `float` | Drop path rate for stochastic depth. | `0.15` |
| `device` | `str` | Device to run on. | `'cuda'` |
compress
compress(x: Tensor, tasks: List[str] = None, **kwargs) -> Dict[str, Any]
Compress a single frame (Access Unit).
Following the MPCompress DataUnitCodec interface, compresses input image to coded_unit format.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor (B, 3, H, W). | required |
| `tasks` | `List[str]` | List of tasks for encoding hints. | `None` |
| `**kwargs` | | Additional encoding parameters. | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_unit` | `Dict[str, Any]` | Dictionary following the framework specification: {"strings": {"y": [[bytes]], "z": [[bytes]]}, "pstate": {"shape": (H, W), "input_shape": (B, C, H, W), "tasks": [...], "resolution": [H, W]}}. |
decompress
decompress(coded_unit: Dict[str, Any] = None, strings: Dict[str, List[List[bytes]]] = None, pstate: Dict[str, Any] = None, tasks: List[str] = None, **kwargs) -> Dict[str, torch.Tensor]
Decompress a single frame (Access Unit).
Following the MPCompress DataUnitCodec interface, decodes coded_unit to task outputs.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `coded_unit` | `Dict[str, Any]` | Dictionary from compress() (alternative to strings/pstate). | `None` |
| `strings` | `Dict[str, List[List[bytes]]]` | Compressed bitstreams dict. | `None` |
| `pstate` | `Dict[str, Any]` | State dict for decoding. | `None` |
| `tasks` | `List[str]` | List of tasks to decode (default: from pstate). | `None` |
| `**kwargs` | | Additional decoding parameters. | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `task_feats` | `Dict[str, Tensor]` | Dictionary with task outputs: {"semseg": tensor, "edge": tensor, ...}. |
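The coded_unit layout documented for compress/decompress can be assembled by hand to see the nesting. The field names below follow the specification quoted above; the byte payloads and concrete shapes are mock values, and `total_bits` is an illustrative helper, not part of the API:

```python
# Hand-built coded_unit in the documented MLoREFrameCodec layout.
coded_unit = {
    "strings": {"y": [[b"\x01\x02"]], "z": [[b"\x03"]]},   # nested List[List[bytes]]
    "pstate": {
        "shape": (32, 32),
        "input_shape": (1, 3, 512, 512),
        "tasks": ["semseg", "edge"],
        "resolution": [512, 512],
    },
}

def total_bits(unit):
    # walk strings -> per-latent groups -> byte strings
    return sum(8 * len(b)
               for group in unit["strings"].values()
               for lst in group
               for b in lst)
```

Note the double nesting on "y" and "z": an outer list over latent groups, an inner list of byte strings, matching the `Dict[str, List[List[bytes]]]` type of decompress's `strings` argument.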
forward
forward(x, tasks=None, episode_tasks=None, return_feat=False)
Forward pass for training.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor (B, 3, H, W). | required |
| `tasks` | `List[str]` | List of tasks to compute (default: all). | `None` |
| `episode_tasks` | | Task grouping for multi-task routing. | `None` |
| `return_feat` | `bool` | If True, return features before heads. | `False` |

Returns:

| Type | Description |
|---|---|
| `dict` | Dict with task predictions and auxiliary info (bpp_loss, mse_loss). |
forward_test
forward_test(x: Tensor, tasks: List[str] = None, return_likelihoods: bool = True, **kwargs) -> Dict[str, Any]
Forward pass for testing/evaluation.
This method performs inference and returns task predictions along with rate estimation via likelihoods.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor (B, 3, H, W). | required |
| `tasks` | `List[str]` | List of tasks to compute (default: all). | `None` |
| `return_likelihoods` | `bool` | Whether to return likelihoods for rate estimation. | `True` |
| `**kwargs` | | Additional arguments. | `{}` |

Returns:

| Type | Description |
|---|---|
| `Dict[str, Any]` | Dict containing: task predictions (e.g., 'semseg', 'edge'); 'likelihoods': dict of likelihoods for rate estimation (if return_likelihoods); 'bpp_loss', 'mse_loss': compression losses. |
get_compression_module
get_compression_module()
Get the feature compression module.
get_feature_numel
get_feature_numel(x: Tensor) -> int
Get number of elements in feature representation.
load_checkpoint
load_checkpoint(checkpoint_path: str, strict: bool = False)
Load model weights from checkpoint.
set_grad_mode
set_grad_mode(mode: str)
Set gradient mode for different training phases.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `mode` | `str` | Gradient mode: 'full_finetune' (train all parameters), 'only_compress' (train only the compression module), or 'finetune_mona' (fine-tune Mona adapters and heads). | required |
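Conceptually, `set_grad_mode` gates `requires_grad` per parameter group. A minimal stand-in over named parameter flags: the three mode names come from the documentation above, but the grouping rule (matching "compress", "mona", or "head" substrings in parameter names) is purely an assumption for illustration:

```python
# Hypothetical sketch of per-mode gradient gating.

def grad_mask(param_names, mode):
    if mode == "full_finetune":
        return {n: True for n in param_names}
    if mode == "only_compress":
        return {n: ("compress" in n) for n in param_names}
    if mode == "finetune_mona":
        return {n: ("mona" in n or "head" in n) for n in param_names}
    raise ValueError(f"unknown mode: {mode}")

names = ["backbone.block0", "compress.hyper", "mona.adapter", "head.semseg"]
mask = grad_mask(names, "only_compress")
```

In the real model the flags would be set via `param.requires_grad_(flag)`; the dict here just makes the per-mode partition visible.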
update
update(scale_table=None, force=False)
Update entropy model parameters.
MLoREVideoCodec
MLoREVideoCodec(frame_codec: MLoREFrameCodec = None, p=None, stage: str = 'stage1', pretrained: bool = True, img_size: Tuple[int, int] = (512, 512), device: str = 'cuda', **kwargs)
MLoRE Video-level Codec for multi-task feature compression.
This VideoCodec implements the MPCompress framework specification for compressing entire video sequences with multi-task support.
Supports both frame-wise and layer-wise organization:

- frame_wise: Each frame compressed independently
- layer_wise: Features and metadata organized by layer

Framework interface methods:

- compress_video(video_reader, meta, codec_args) -> coded_data
- decompress_video(coded_data, codec_args) -> results
- compress_frame(frame, codec_args) -> coded_unit
- decompress_frame(coded_unit, codec_args) -> task_feats
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `frame_codec` | `MLoREFrameCodec` | MLoREFrameCodec instance or config. | `None` |
| `p` | `dict` | Configuration dict (if frame_codec not provided). | `None` |
| `stage` | `str` | Training stage ('stage0', 'stage1', 'stage2'). | `'stage1'` |
| `pretrained` | `bool` | Whether to load pretrained weights. | `True` |
| `img_size` | `Tuple[int, int]` | Input image size. | `(512, 512)` |
| `device` | `str` | Device to run on. | `'cuda'` |
compress_frame
compress_frame(frame: Tensor, codec_args: Dict[str, Any] = None) -> Dict[str, Any]
compress_video
compress_video(video_reader, meta: Dict[str, Any] = None, codec_args: Dict[str, Any] = None) -> Dict[str, Any]
Compress an entire video sequence.
Following the MPCompress VideoCodec interface, compresses all frames and returns coded_data intermediate representation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `video_reader` | | Video reader object supporting iteration. | required |
| `meta` | `Dict[str, Any]` | Video metadata dict with keys such as seq_name (sequence name), src_width/src_height (original dimensions), and frame_num (total frame count). | `None` |
| `codec_args` | `Dict[str, Any]` | Encoding arguments: tasks (list of tasks to encode for). | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_data` | `Dict[str, Any]` | Dictionary following the framework specification: {"type": "frame_wise_video", "data": {0: coded_unit_0, 1: coded_unit_1, ...}, "meta": {...}}. |
decompress_frame
decompress_frame(coded_unit: Dict[str, Any], codec_args: Dict[str, Any] = None) -> Dict[str, torch.Tensor]
decompress_video
decompress_video(coded_data: Dict[str, Any], codec_args: Dict[str, Any] = None) -> Dict[int, Dict[str, torch.Tensor]]
Decompress an entire video sequence.
Following the MPCompress VideoCodec interface, decodes coded_data to task outputs for each frame.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `coded_data` | `Dict[str, Any]` | Dictionary from compress_video(). | required |
| `codec_args` | `Dict[str, Any]` | Decoding arguments: tasks (list of tasks to decode). | `None` |

Returns:

| Name | Type | Description |
|---|---|---|
| `results` | `Dict[int, Dict[str, Tensor]]` | Dictionary keyed by frame index: {0: {"semseg": tensor, "edge": tensor, ...}, 1: {...}, ...}. |
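The frame-wise coded_data layout above can be assembled with mock per-frame units and iterated the way decompress_video would. Only the "type"/"data"/"meta" keys and the frame-index keying follow the documentation; the frame list, payloads, and stand-in task outputs are illustrative:

```python
# Build a "frame_wise_video" coded_data dict from mock per-frame coded units.
frames = ["f0", "f1", "f2"]
coded_data = {
    "type": "frame_wise_video",
    "data": {i: {"strings": {"y": [[b"\x00"]]}, "pstate": {}}
             for i, _ in enumerate(frames)},
    "meta": {"seq_name": "demo", "frame_num": len(frames)},
}

# Decode each frame independently, keyed by frame index, as decompress_video does.
results = {idx: {"semseg": f"decoded-{idx}"}     # stand-in for per-frame task outputs
           for idx, unit in coded_data["data"].items()}
```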
forward
forward(x: Tensor, tasks: List[str] = None, **kwargs)
Forward pass through frame codec.
forward_test
forward_test(x: Tensor, tasks: List[str] = None, **kwargs)
Test forward pass through frame codec.
load_checkpoint
load_checkpoint(checkpoint_path: str, strict: bool = False)
Load model weights from checkpoint.
update
update(scale_table=None, force: bool = False)
Update entropy model parameters.
MLoREWrapperCodec
MLoREWrapperCodec(p, checkpoint_path=None)
Wrapper to use RFC's original MLoREWrapper_coding class.
This provides exact compatibility with pretrained RFC weights while conforming to MPCompress interfaces.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `p` | | Configuration dict. | required |
| `checkpoint_path` | | Path to pretrained weights. | `None` |
compress
compress(x, tasks=None, **kwargs)
Compress features.
decompress
decompress(coded_unit, tasks=None, **kwargs)
Decompress to task outputs.
forward
forward(x, tasks=None, episode_tasks=None, **kwargs)
Forward pass.
load_checkpoint
load_checkpoint(checkpoint_path)
Load pretrained weights.
MPC_I1
MPC_I1(vqgan_config, **kwargs)
Multi-Purpose Compression model using VQGAN backbone only.
This is a single-layer compression model that uses VQGAN for feature extraction and uniform token codec for compression. It provides basic image reconstruction capabilities.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `vqgan_config` | `dict` | Configuration dictionary for the VQGAN backbone. Passed directly to VqganBackbone constructor. | required |
| `**kwargs` | `dict` | Additional keyword arguments (currently unused). | `{}` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `vqgan` | `VqganBackbone` | The VQGAN backbone model. |
| `vqgan_codec` | `UniformTokenCodec` | The uniform token codec for compression. |
| `patch_size` | `int` | Patch size used by the model (fixed at 16). |
compress
compress(x, **kwargs)
Compress input image to byte strings.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor of shape (B, C, H, W). | required |
| `**kwargs` | `dict` | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `coded_unit` | `dict` | Dictionary containing compressed data. |
decompress
decompress(coded_unit, **kwargs)
Decompress byte strings to reconstructed image and features.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `coded_unit` | `dict` | Dictionary containing compressed data. | required |
| `**kwargs` | `dict` | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `task_feats` | `dict` | Dictionary of task-specific features. |
forward
forward(x, **kwargs)
Forward pass for training.
Encodes input image using VQGAN, compresses tokens, and decodes to reconstructed image.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor of shape (B, C, H, W). | required |
| `**kwargs` | `dict` | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `out` | `dict` | Dictionary of training outputs. |
MPC_I12
MPC_I12(vqgan_backbone={}, vqgan_codec={}, dino_backbone={}, dino_codec={}, **kwargs)
Multi-Purpose Compression model with two layers: VQGAN and DINOv2.
This is a two-layer compression model that combines VQGAN (layer 1) and DINOv2 (layer 2) for hierarchical compression. The DINOv2 codec uses VQGAN context for conditional compression. Supports multiple tasks including reconstruction, classification, and segmentation.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `vqgan_backbone` | `dict` | Configuration dictionary for the VQGAN backbone. Passed directly to VqganBackbone constructor. | `{}` |
| `vqgan_codec` | `dict` | Configuration dictionary for the VQGAN codec. Passed directly to UniformTokenCodec constructor. | `{}` |
| `dino_backbone` | `dict` | Configuration dictionary for the DINOv2 backbone. Passed directly to Dinov2TimmBackbone constructor. | `{}` |
| `dino_codec` | `dict` | Configuration dictionary for the DINO codec. Must contain "h_dim" and "ctx_dim" keys for the conditional decoder. Passed directly to VitUnionLatentCodecWithCtx constructor. | `{}` |
| `**kwargs` | `dict` | Additional keyword arguments (currently unused). | `{}` |

Attributes:

| Name | Type | Description |
|---|---|---|
| `vqgan` | `VqganBackbone` | The VQGAN backbone model. |
| `vqgan_codec` | `UniformTokenCodec` | The VQGAN codec. |
| `dino` | `Dinov2TimmBackbone` | The DINOv2 backbone model. |
| `dino_codec` | `VitUnionLatentCodecWithCtx` | The DINO codec with context. |
| `patch_size` | `int` | Patch size used by the DINOv2 backbone. |
| `cond_dec_for_vqgan` | `Sequential` | Conditional decoder for enhancing VQGAN reconstruction using DINOv2 features. |
extract_feature
extract_feature(x, **kwargs)
Extract features from input image for offline training.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor of shape (B, C, H, W). | required |
| `**kwargs` | `dict` | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `out` | `dict` | Dictionary of extracted features. |
forward
forward(x, **kwargs)
Forward pass for training.
Processes input through both VQGAN and DINOv2 layers, performs conditional compression, and returns features and likelihoods for training.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `x` | `Tensor` | Input image tensor of shape (B, C, H, W). | required |
| `**kwargs` | `dict` | Additional keyword arguments (currently unused). | `{}` |

Returns:

| Name | Type | Description |
|---|---|---|
| `out` | `dict` | Dictionary of features and likelihoods for training. |
forward_test
forward_test(x, tasks, **kwargs)
Forward pass for testing/inference with compression.
Processes input through both layers, compresses features, and generates task-specific outputs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | Input image tensor of shape (B, C, H, W). | required |
| tasks | list of str | List of tasks to perform. Supported tasks: | required |
| **kwargs | dict | Additional keyword arguments (currently unused). | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| coded_data | dict | Dictionary containing compressed data: |
| task_feats | dict | Dictionary of task-specific features: |
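Callers usually measure rate from the byte strings in the returned coded data. The stub below mimics the (coded_data, task_feats) contract described above so the bits-per-pixel bookkeeping can be shown end to end; the "strings" key and the stub model are assumptions modeled on common learned-codec conventions, not this library's verified implementation.

```python
class StubModel:
    """Hypothetical stand-in mimicking forward_test's return contract."""

    def forward_test(self, x, tasks, **kwargs):
        # Pretend the two codec layers each produced one byte string,
        # and each requested task produced a feature entry.
        coded_data = {"strings": [b"\x00" * 128, b"\x00" * 512]}
        task_feats = {t: f"{t}-features" for t in tasks}
        return coded_data, task_feats

def bits_per_pixel(coded_data, height, width):
    # Rate bookkeeping: 8 bits per byte, normalized by pixel count.
    total_bits = 8 * sum(len(s) for s in coded_data["strings"])
    return total_bits / (height * width)

model = StubModel()
coded, feats = model.forward_test(x=None, tasks=["cls", "seg"])
bpp = bits_per_pixel(coded, height=224, width=224)
```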
offline_forward
offline_forward(data, device, **kwargs)
Offline forward pass for training with pre-extracted features.
Processes pre-extracted VQGAN tokens and DINO features for training. This method is used when features are extracted separately to save memory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | dict | Dictionary containing: | required |
| device | device | Device to move tensors to. | required |
| **kwargs | dict | Additional keyword arguments (currently unused). | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| out | dict | Dictionary containing: |
MPC_I12_CtxAsHyper
MPC_I12_CtxAsHyper(vqgan_backbone={}, vqgan_codec={}, dino_backbone={}, dino_codec={}, **kwargs)
Multi-Purpose Compression model with context as hyperprior.
This is a variant of MPC_I12 where the VQGAN context is treated as hyperprior for the DINOv2 codec. Similar to MPC_I12, it combines VQGAN (layer 1) and DINOv2 (layer 2) for hierarchical compression, but uses a different codec architecture that treats context as hyperprior.
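The benefit of treating the layer-1 context as a hyperprior is that layer-2 symbols are coded under a conditional distribution, which can only reduce the expected code length. A toy information-theoretic illustration with discrete distributions (independent of this library's actual codec) is:

```python
import math

def expected_bits(probs):
    """Expected code length in bits for a source with distribution `probs`."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Marginal distribution of a binary symbol: H(X) = 1 bit per symbol.
marginal_bits = expected_bits([0.5, 0.5])

# With side information C available, the symbol is far more predictable:
# p(X | C=0) = (0.9, 0.1) and p(X | C=1) = (0.1, 0.9), with each context
# value occurring half the time.
conditional_bits = 0.5 * expected_bits([0.9, 0.1]) + 0.5 * expected_bits([0.1, 0.9])

# Conditioning never increases entropy: H(X | C) <= H(X).
```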
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| vqgan_backbone | dict | Configuration dictionary for the VQGAN backbone. Passed directly to VqganBackbone constructor. | {} |
| vqgan_codec | dict | Configuration dictionary for the VQGAN codec. Passed directly to UniformTokenCodec constructor. | {} |
| dino_backbone | dict | Configuration dictionary for the DINOv2 backbone. Passed directly to Dinov2TimmBackbone constructor. | {} |
| dino_codec | dict | Configuration dictionary for the DINO codec. Must contain "h_dim" and "ctx_dim" keys for the conditional decoder. Passed directly to VitUnionLatentCodecCtxAsHyper constructor. | {} |
| **kwargs | dict | Additional keyword arguments (currently unused). | {} |
Attributes:
| Name | Type | Description |
|---|---|---|
| vqgan | VqganBackbone | The VQGAN backbone model. |
| vqgan_codec | UniformTokenCodec | The VQGAN codec. |
| dino | Dinov2TimmBackbone | The DINOv2 backbone model. |
| dino_codec | VitUnionLatentCodecCtxAsHyper | The DINO codec with context as hyperprior. |
| patch_size | int | Patch size used by the DINOv2 backbone. |
| cond_dec_for_vqgan | Sequential | Conditional decoder for enhancing VQGAN reconstruction using DINOv2 features. |
compress
compress(x, **kwargs)
Compress input image to byte strings using both layers.
Processes input through both VQGAN and DINOv2 layers, compresses features with context as hyperprior, and returns compressed data.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | Input image tensor of shape (B, C, H, W). | required |
| **kwargs | dict | Additional keyword arguments (currently unused). | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| coded_data | dict | Dictionary containing compressed data: |
decompress
decompress(coded_data, tasks=[], **kwargs)
Decompress byte strings to task-specific features.
Decompresses both VQGAN and DINOv2 layers, uses context as hyperprior for DINO decompression, and generates task-specific outputs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| coded_data | dict | Dictionary containing compressed data: | required |
| tasks | list of str | List of tasks to perform. Supported tasks: | [] |
| **kwargs | dict | Additional keyword arguments (currently unused). | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| task_feats | dict | Dictionary of task-specific features: |
forward
forward(x, **kwargs)
Forward pass for training.
Processes input through both VQGAN and DINOv2 layers, performs conditional compression with context as hyperprior, and returns features and likelihoods.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | Input image tensor of shape (B, C, H, W). | required |
| **kwargs | dict | Additional keyword arguments (currently unused). | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| out | dict | Dictionary containing: |
forward_test
forward_test(x, tasks, **kwargs)
Forward pass for testing/inference with compression.
Processes input through both layers, compresses features with context as hyperprior, and generates task-specific outputs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | Input image tensor of shape (B, C, H, W). | required |
| tasks | list of str | List of tasks to perform. Supported tasks: | required |
| **kwargs | dict | Additional keyword arguments (currently unused). | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| coded_data | dict | Dictionary containing compressed data: |
| task_feats | dict | Dictionary of task-specific features: |
MPC_I2
MPC_I2(dino_backbone={}, dino_codec={}, **kwargs)
Multi-Purpose Compression model using DINOv2 backbone only.
This is a single-layer compression model that uses DINOv2 for feature extraction and ViT-based latent codec for compression. It supports multiple downstream tasks including classification and segmentation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dino_backbone | dict | Configuration dictionary for the DINOv2 backbone. If the "type" key is present, uses instantiate_class for dynamic instantiation. Otherwise, uses Dinov2TimmBackbone with the provided config. | {} |
| dino_codec | dict | Configuration dictionary for the DINO codec. If the "type" key is present in dino_backbone, uses instantiate_class. Otherwise, uses VitUnionLatentCodec with the provided config. | {} |
| **kwargs | dict | Additional keyword arguments (currently unused). | {} |
Attributes:
| Name | Type | Description |
|---|---|---|
| dino | | The DINOv2 backbone model (Dinov2TimmBackbone or dynamically instantiated). |
| dino_codec | | The DINO codec (VitUnionLatentCodec or dynamically instantiated). |
| patch_size | int | Patch size used by the backbone model. |
compress
compress(x, qp=0, **kwargs)
Compress input image to byte strings.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | Input image tensor of shape (B, C, H, W). | required |
| qp | int | Quantization parameter. | 0 |
| **kwargs | dict | Additional keyword arguments (currently unused). | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| coded_unit | dict | Dictionary containing compressed data: |
decompress
decompress(coded_unit, tasks=[], **kwargs)
Decompress byte strings to task-specific features.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| coded_unit | dict | Dictionary containing compressed data: | required |
| tasks | list of str | List of tasks to perform. Supported tasks: | [] |
| **kwargs | dict | Additional keyword arguments (currently unused). | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| task_feats | dict | Dictionary of task-specific features: |
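A round trip through compress and decompress typically follows the pattern sketched below. The toy codec only packs integer token indices into bytes and back; the coded_unit keys ("strings", "shape") and the per-task fan-out are assumptions modeled on common learned-codec conventions, not confirmed from this library.

```python
import struct

def compress(tokens, shape):
    """Hypothetical codec: pack uint16 token indices into one byte string."""
    payload = struct.pack(f"<{len(tokens)}H", *tokens)
    return {"strings": [payload], "shape": shape}

def decompress(coded_unit, tasks=()):
    n = coded_unit["shape"][0] * coded_unit["shape"][1]
    tokens = list(struct.unpack(f"<{n}H", coded_unit["strings"][0]))
    # A real model would decode a separate head per task; here every
    # requested task just receives the recovered token grid.
    return {t: tokens for t in tasks}

coded = compress([3, 1, 4, 1, 5, 9], shape=(2, 3))
feats = decompress(coded, tasks=["cls"])
```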
extract_feature
extract_feature(x, **kwargs)
Extract features from input image for offline training.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | Input image tensor of shape (B, C, H, W). | required |
| **kwargs | dict | Additional keyword arguments (currently unused). | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| out | dict | Dictionary containing: |
forward
forward(x, qp=0, **kwargs)
Forward pass for training with learned image compression (LIC).
Encodes input image using DINOv2, compresses features with codec, and returns reconstructed features and likelihoods for training.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | Input image tensor of shape (B, C, H, W). | required |
| qp | int | Quantization parameter. | 0 |
| **kwargs | dict | Additional keyword arguments (currently unused). | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| out | dict | Dictionary containing: |
forward_test
forward_test(x, qp=0, tasks=[], **kwargs)
Forward pass for testing/inference with compression.
Encodes input image, compresses features, and generates task-specific outputs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | Input image tensor of shape (B, C, H, W). | required |
| qp | int | Quantization parameter. | 0 |
| tasks | list of str | List of tasks to perform. Supported tasks: | [] |
| **kwargs | dict | Additional keyword arguments (currently unused). | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| coded_unit | dict | Dictionary containing compressed data: |
| task_feats | dict | Dictionary of task-specific features: |
get_feature_numel
get_feature_numel(x)
Calculate the total number of elements in the extracted features.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Tensor | Input image tensor of shape (B, C, H, W). | required |
Returns:
| Name | Type | Description |
|---|---|---|
| numel | int | Total number of elements in the feature tensor. |
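For a ViT-style backbone the element count follows directly from the patch grid: a (B, C, H, W) input yields (H/p) x (W/p) patch tokens of the embedding dimension. A plain-Python version of that arithmetic follows; the patch size of 14, the embedding dimension of 768, and the absence of a CLS token are illustrative assumptions, not values read from this library.

```python
def feature_numel(batch, height, width, patch_size=14, embed_dim=768):
    """Elements in a (B, N, D) patch-token tensor for a (B, C, H, W) input."""
    tokens = (height // patch_size) * (width // patch_size)
    return batch * tokens * embed_dim

# A 224x224 image with 14-pixel patches yields a 16x16 = 256-token grid,
# so the feature tensor holds 256 * 768 elements per image.
n = feature_numel(batch=1, height=224, width=224)
```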
offline_forward
offline_forward(data, device, qp=0, **kwargs)
Offline forward pass for training with pre-extracted features.
Processes pre-extracted DINO features for learned image compression training. This method is used when features are extracted separately to save memory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | dict | Dictionary containing: | required |
| device | device | Device to move tensors to. | required |
| qp | int | Quantization parameter. | 0 |
| **kwargs | dict | Additional keyword arguments (currently unused). | {} |
Returns:
| Name | Type | Description |
|---|---|---|
| out | dict | Dictionary containing: |
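The point of offline_forward is to run the large frozen backbone once and reuse its features across training epochs. The caching pattern itself is simple and is sketched below with plain dictionaries; the stub extractor, the "feat" key, and the stand-in training step are all hypothetical.

```python
def extract_once(images, extractor):
    """Run the expensive backbone a single time and cache its outputs."""
    return {i: {"feat": extractor(img)} for i, img in enumerate(images)}

def offline_epoch(cache, step):
    """One epoch over pre-extracted features; the backbone is never rerun."""
    return [step(data["feat"]) for data in cache.values()]

# Stub extractor and training step: pretend "features" are doubled pixels
# and the per-sample loss just adds one.
cache = extract_once(images=[1, 2, 3], extractor=lambda img: img * 2)
losses = offline_epoch(cache, step=lambda feat: feat + 1)
```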