# mpcompress.token_codecs

## UniformTokenCodec

`UniformTokenCodec(alphabet_size, **kwargs)`

Uniform token codec for compression.

This codec assumes a uniform distribution over the alphabet and encodes tokens using uniform quantization. It extends `CompressionModel` to provide compression and decompression of discrete tokens.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `alphabet_size` | | Size of the token alphabet (number of possible values). | *required* |
| `**kwargs` | | Additional keyword arguments passed to the parent class. | `{}` |
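Because every symbol is assumed equally likely, the rate is constant: each token costs log2(alphabet_size) bits regardless of its value. A minimal sketch of this property (the helper name `uniform_rate` is hypothetical, not part of mpcompress):

```python
import math

def uniform_rate(num_tokens, alphabet_size):
    """Total bit cost of `num_tokens` symbols under a uniform model.

    Under a uniform distribution, every token has probability
    1 / alphabet_size, so its information content is log2(alphabet_size)
    bits; the total is simply linear in the token count.
    """
    return num_tokens * math.log2(alphabet_size)
```

For example, 1024 tokens drawn from a 256-symbol alphabet always cost 1024 × 8 = 8192 bits, independent of the token values.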
### compress

`compress(tokens)`

Compress tokens to a bitstring.

Note: `tokens` should not have a batch dimension.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `tokens` | | Input tokens to compress. Shape should be `(H, W, ...)` without a batch dimension. | *required* |
Returns:

| Name | Type | Description |
|---|---|---|
| `coded_unit` | | Dictionary containing the encoded bitstring. |
### decompress

`decompress(strings, pstate, **kwargs)`

Decompress a bitstring to tokens.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `strings` | | Dictionary with key `"t"` containing a nested list with the encoded bitstring. The nested structure exists for a consistent API. | *required* |
| `pstate` | | Dictionary with key `"t_shape"` containing the original shape of the tokens as a tuple. | *required* |
| `**kwargs` | | Additional keyword arguments (unused). | `{}` |
Returns:

| Name | Type | Description |
|---|---|---|
| `task_feats` | | Dictionary with key `"tokens"` containing the decompressed tokens, reshaped to `pstate["t_shape"]`. Note: the tokens do not have a batch dimension. |
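The container layout documented above (`strings["t"]` holding a nested list with the bitstring, `pstate["t_shape"]` holding the original shape) can be illustrated with a self-contained fixed-rate round trip. This is a sketch of the uniform-coding idea, not the library's implementation: both function names are hypothetical, plain Python lists stand in for tensors, and only 2-D `(H, W)` tokens are handled.

```python
import math

def compress_uniform(tokens, alphabet_size):
    """Hypothetical fixed-rate encoder: pack each token into
    ceil(log2(alphabet_size)) bits and return the documented container
    layout (nested list under "t", original shape under "t_shape")."""
    bits_per_token = max(1, math.ceil(math.log2(alphabet_size)))
    flat = [t for row in tokens for t in row]  # flatten (H, W)
    bitstream = "".join(format(t, f"0{bits_per_token}b") for t in flat)
    padded = bitstream + "0" * (-len(bitstream) % 8)  # pad to whole bytes
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return {
        "strings": {"t": [[data]]},
        "pstate": {"t_shape": (len(tokens), len(tokens[0]))},
    }

def decompress_uniform(strings, pstate, alphabet_size):
    """Hypothetical decoder: read fixed-width symbols back out and
    restore the shape recorded in pstate["t_shape"]."""
    bits_per_token = max(1, math.ceil(math.log2(alphabet_size)))
    data = strings["t"][0][0]
    h, w = pstate["t_shape"]
    bitstream = "".join(format(b, "08b") for b in data)
    flat = [int(bitstream[i * bits_per_token:(i + 1) * bits_per_token], 2)
            for i in range(h * w)]
    return {"tokens": [flat[r * w:(r + 1) * w] for r in range(h)]}
```

A round trip recovers the input exactly: `decompress_uniform(coded["strings"], coded["pstate"], alphabet_size)` returns the original token grid under key `"tokens"`.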
### forward

`forward(tokens)`

Forward pass to compute uniform likelihoods.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `tokens` | | Input tokens of any shape. | *required* |
Returns:

| Name | Type | Description |
|---|---|---|
| `output` | | Dictionary containing: `"likelihoods"` (dict), a dictionary with key `"t"` holding uniform likelihoods with the same shape as `tokens`; and `"tokens"` (`torch.Tensor`), the original input tokens. |
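The uniform forward pass is simple to sketch: every position gets the same likelihood, `1 / alphabet_size`, in the same shape as the input. The sketch below is hypothetical (the function name is not part of mpcompress, and nested Python lists stand in for tensors):

```python
def uniform_forward(tokens, alphabet_size):
    """Sketch of the documented `forward` output: uniform likelihoods
    under key "t" with the same shape as `tokens`, plus the tokens."""
    def like(x):
        # Recurse through nested lists, replacing each scalar token
        # with its uniform likelihood 1 / alphabet_size.
        if isinstance(x, list):
            return [like(v) for v in x]
        return 1.0 / alphabet_size
    return {"likelihoods": {"t": like(tokens)}, "tokens": tokens}
```

For a 4-symbol alphabet, every entry of `output["likelihoods"]["t"]` is 0.25, whatever the token values are.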