
mpcompress.token_codecs

UniformTokenCodec

UniformTokenCodec(alphabet_size, **kwargs)

Uniform token codec for compression.

This codec assumes a uniform distribution over the alphabet and encodes tokens using uniform quantization. It extends CompressionModel to provide compression and decompression functionality for discrete tokens.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| alphabet_size | int | Size of the token alphabet (number of possible values). | required |
| **kwargs | dict | Additional keyword arguments passed to parent class. | {} |
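As a rough illustration of what the uniform assumption implies, the sketch below computes the cost of one token when every symbol has probability 1 / alphabet_size. The helper name `bits_per_token` is hypothetical and not part of mpcompress, and the fixed-width figure assumes whole bits per token, which the library's actual bitstream may not use.

```python
import math


def bits_per_token(alphabet_size: int) -> tuple:
    """Return (ideal_bits, fixed_width_bits) per token under a uniform model.

    Hypothetical helper for illustration only; not part of mpcompress.
    """
    if alphabet_size < 1:
        raise ValueError("alphabet_size must be positive")
    # Ideal code length of a symbol with probability 1 / alphabet_size.
    ideal = math.log2(alphabet_size)
    # Whole number of bits needed to index alphabet_size values (assumption:
    # a fixed-width code; at least one bit even for a one-symbol alphabet).
    fixed = max(1, (alphabet_size - 1).bit_length())
    return ideal, fixed


ideal, fixed = bits_per_token(1000)
print(f"ideal: {ideal:.3f} bits, fixed-width: {fixed} bits")
```

When alphabet_size is a power of two the two numbers coincide; otherwise the fixed-width figure rounds up to the next whole bit.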

compress

compress(tokens)

Compress tokens to bitstring.

Note: tokens should not have a batch dimension.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| tokens | Tensor | Input tokens to compress. Shape should be (H, W, ...) without batch dimension. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| coded_unit | dict | Dictionary with two entries, "strings" and "pstate", described below. |

  • "strings" (dict): Dictionary with key "t" containing nested list with encoded bitstring. Nested structure is for consistent API.
  • "pstate" (dict): Dictionary with key "t_shape" containing the original shape of tokens as a tuple.
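To make the shape of the returned dictionary concrete, here is a minimal sketch that produces the same layout from a nested Python list. It is not the library's implementation: `pack_tokens` is a hypothetical function, plain lists stand in for tensors, and the fixed-width, MSB-first bit packing is an assumption chosen only because it matches a uniform model.

```python
def pack_tokens(tokens, alphabet_size):
    """Hypothetical stand-in for compress(): fixed-width, MSB-first packing.

    `tokens` is a non-empty rectangular nested list with no batch dimension.
    """
    # Record the shape of the nested list.
    shape = []
    level = tokens
    while isinstance(level, list):
        shape.append(len(level))
        level = level[0]

    # Flatten one nesting level per trailing dimension.
    flat = tokens
    for _ in shape[1:]:
        flat = [x for row in flat for x in row]

    width = max(1, (alphabet_size - 1).bit_length())  # bits per token
    acc = 0
    for t in flat:
        if not 0 <= t < alphabet_size:
            raise ValueError(f"token {t} is outside the alphabet")
        acc = (acc << width) | t

    nbits = width * len(flat)
    pad = (-nbits) % 8  # zero-pad to a whole number of bytes
    data = (acc << pad).to_bytes((nbits + pad) // 8, "big")

    return {
        "strings": {"t": [[data]]},            # nested list, as documented
        "pstate": {"t_shape": tuple(shape)},   # shape needed to undo the flatten
    }


unit = pack_tokens([[1, 2], [3, 0]], alphabet_size=4)
print(unit["pstate"]["t_shape"], len(unit["strings"]["t"][0][0]), "byte(s)")
```

Storing the shape in "pstate" is what lets the decoder rebuild the original layout, since the byte string itself carries no dimension information.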

decompress

decompress(strings, pstate, **kwargs)

Decompress bitstring to tokens.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| strings | dict | Dictionary with key "t" containing nested list with encoded bitstring. Nested structure is for consistent API. | required |
| pstate | dict | Dictionary with key "t_shape" containing the original shape of tokens as a tuple. | required |
| **kwargs | dict | Additional keyword arguments (unused). | {} |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| task_feats | dict | Dictionary with key "tokens" containing decompressed tokens of shape specified in pstate["t_shape"]. Note: tokens do not have batch dimension. |
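The decoding direction can be sketched in the same spirit. The function below is hypothetical and assumes the byte string holds fixed-width, MSB-first token indices padded with zeros to a byte boundary, which may differ from what mpcompress actually writes. Because it is a free function rather than a method, it takes `alphabet_size` explicitly, whereas the documented `decompress` gets it from the codec instance.

```python
def unpack_tokens(strings, pstate, alphabet_size):
    """Hypothetical stand-in for decompress(): inverse of fixed-width packing."""
    data = strings["t"][0][0]     # unwrap the nested list
    shape = pstate["t_shape"]

    count = 1
    for dim in shape:
        count *= dim              # number of tokens to read

    width = max(1, (alphabet_size - 1).bit_length())
    total_bits = len(data) * 8
    acc = int.from_bytes(data, "big")
    mask = (1 << width) - 1

    # Read `count` tokens from the most significant bits downwards;
    # any trailing padding bits are simply never read.
    flat = [
        (acc >> (total_bits - width * (i + 1))) & mask
        for i in range(count)
    ]

    # Rebuild the nesting from the innermost dimension outwards.
    for dim in reversed(shape[1:]):
        flat = [flat[i:i + dim] for i in range(0, len(flat), dim)]
    return {"tokens": flat}


out = unpack_tokens({"t": [[b"\x6c"]]}, {"t_shape": (2, 2)}, alphabet_size=4)
print(out["tokens"])  # [[1, 2], [3, 0]]
```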

forward

forward(tokens)

Forward pass to compute uniform likelihoods.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| tokens | Tensor | Input tokens of any shape. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| output | dict | Dictionary with two entries, "likelihoods" and "tokens", described below. |

  • "likelihoods" (dict): Dictionary with key "t" containing uniform likelihoods of shape matching tokens.
  • "tokens" (torch.Tensor): Original input tokens.
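The output layout described above can be mimicked without tensors. This sketch assumes that a "uniform likelihood" means every entry equals 1 / alphabet_size; `uniform_forward` and `total_bits` are hypothetical helpers operating on nested lists, not mpcompress functions.

```python
import math


def uniform_forward(tokens, alphabet_size):
    """Hypothetical stand-in for forward() on nested lists instead of tensors."""
    p = 1.0 / alphabet_size

    def fill(x):
        # Mirror the nesting of `tokens`, replacing each leaf with p.
        return [fill(v) for v in x] if isinstance(x, list) else p

    return {"likelihoods": {"t": fill(tokens)}, "tokens": tokens}


def total_bits(likelihoods):
    """Sum of -log2(p) over all leaves, a common rate estimate."""
    if isinstance(likelihoods, list):
        return sum(total_bits(v) for v in likelihoods)
    return -math.log2(likelihoods)


out = uniform_forward([[1, 2], [3, 0]], alphabet_size=4)
print(out["likelihoods"]["t"])              # [[0.25, 0.25], [0.25, 0.25]]
print(total_bits(out["likelihoods"]["t"]))  # 8.0
```

Under this assumption the rate estimate depends only on the number of tokens and the alphabet size, never on the token values themselves.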