Issues with user-defined codecs #108

Open
@aleixalcacer

I'm running into a problem while writing a simple codec that just copies the data, as a way to get familiar with Blosc's codec registration machinery. Here is the code:

import blosc2
import numpy as np

# Create a user-defined codec (just a memcpy)

def encoder(input, output, meta, schunk: blosc2.SChunk):
    print(f"Encoder output size: {output.size}")
    output[:schunk.blocksize] = input[:schunk.blocksize]
    return schunk.blocksize

def decoder(input, output, meta, schunk: blosc2.SChunk):
    output[:schunk.blocksize] = input[:schunk.blocksize]
    return schunk.blocksize

# Register the codec
codec_id = 200
blosc2.register_codec('test1', codec_id, encoder, decoder)

# Compress this array with the new codec

shape = (100, 100)
a = np.ones(shape, dtype=np.int64)


cparams = {
    'codec': codec_id,
    'nthreads': 1,
    'filters': [],
    'splitmode': blosc2.SplitMode.NEVER_SPLIT,
}

dparams = {
    'nthreads': 1,
}

chunks = shape
blocks = (50, 50)

c_a = blosc2.asarray(a, chunks=chunks, blocks=blocks, cparams=cparams, dparams=dparams)

However, running this code produces the following output and error:

Encoder output size: 20000
Encoder output size: 20000
Encoder output size: 20000
Encoder output size: 19968

ValueError: could not broadcast input array from shape (20000,) into shape (19968,)

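Judging by this output, the output buffer passed to the encoder for the last block of the chunk is smaller than for the others (19968 bytes instead of 20000), so the copy of a full block no longer fits. Do you know why this happens? Is there something I'm doing wrong?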
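
To make the numbers concrete, here is a small NumPy-only sketch (it does not call Blosc2 at all) that redoes the size arithmetic for this array and reproduces the same `ValueError` in isolation. The `guarded_copy` helper is hypothetical and not part of any Blosc2 API; it only illustrates checking buffer sizes before copying. The value a real codec should return when the output buffer is too small depends on Blosc2's codec contract, which is not stated in this issue, so the `0` below is only a placeholder.

```python
import numpy as np

# Sizes implied by the example: int64 data, 100x100 chunk, 50x50 blocks.
itemsize = np.dtype(np.int64).itemsize        # 8 bytes per element
block_nbytes = 50 * 50 * itemsize             # 20000 bytes per block
chunk_nbytes = 100 * 100 * itemsize           # 80000 bytes per chunk
nblocks = chunk_nbytes // block_nbytes        # 4 blocks per chunk

# Output sizes copied from the log in this issue.
reported_output_sizes = [20000, 20000, 20000, 19968]
shortfall = block_nbytes - reported_output_sizes[-1]   # 32 bytes missing


def guarded_copy(src, dst, nbytes):
    """Hypothetical helper: copy nbytes only if both buffers can hold them.

    Returns the number of bytes copied, or 0 (a placeholder value, not a
    documented Blosc2 return code) when the copy would not fit.
    """
    if src.size < nbytes or dst.size < nbytes:
        return 0
    dst[:nbytes] = src[:nbytes]
    return nbytes


# Stand-ins for the buffers the encoder sees on the last block.
src = np.ones(block_nbytes, dtype=np.uint8)
small_dst = np.empty(reported_output_sizes[-1], dtype=np.uint8)

# The unguarded slice assignment fails exactly like the encoder does:
# small_dst[:20000] is clipped to 19968 elements, but src[:20000] has 20000.
try:
    small_dst[:block_nbytes] = src[:block_nbytes]
    error = None
except ValueError as exc:
    error = str(exc)

print(f"blocks per chunk: {nblocks}, shortfall: {shortfall} bytes")
print(f"unguarded copy error: {error}")
print(f"guarded copy result: {guarded_copy(src, small_dst, block_nbytes)}")
```

The sketch shows that the error comes purely from NumPy slicing: a slice past the end of the smaller buffer is silently clipped, so the two sides of the assignment end up with different lengths. It does not explain why the last output buffer is 32 bytes short; that part still needs an answer from the Blosc2 side.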
