
Datasets are created with scaleoffset filter #4

hanslovsky opened this issue Feb 16, 2019 · 2 comments
hanslovsky commented Feb 16, 2019

Create a dataset like this (Kotlin):

import net.imglib2.img.array.ArrayImgs
import org.janelia.saalfeldlab.n5.GzipCompression
import org.janelia.saalfeldlab.n5.hdf5.N5HDF5Writer
import org.janelia.saalfeldlab.n5.imglib2.N5Utils
import kotlin.random.Random

fun main(args: Array<String>) {
	val filename = "/home/hanslovskyp/local/tmp/some-file.h5"
	val rai = ArrayImgs.unsignedLongs(10, 20, 30)
	val rng = Random(100L)
	rai.forEach { it.set(rng.nextLong()) }
	N5Utils.save(rai, N5HDF5Writer(filename), "dataset", intArrayOf(3,4,7), GzipCompression())
}

Look at dataset info using h5dump:

$ h5dump -H -p -d "dataset" /home/hanslovskyp/local/tmp/some-file.h5
HDF5 "/home/hanslovskyp/local/tmp/some-file.h5" {
DATASET "dataset" {
   DATATYPE  H5T_STD_U64LE
   DATASPACE  SIMPLE { ( 30, 20, 10 ) / ( H5S_UNLIMITED, H5S_UNLIMITED, H5S_UNLIMITED ) }
   STORAGE_LAYOUT {
      CHUNKED ( 7, 4, 3 )
      SIZE 52306 (0.918:1 COMPRESSION)
   }
   FILTERS {
      COMPRESSION SCALEOFFSET { MIN BITS 2 }
      COMPRESSION DEFLATE { LEVEL 6 }
   }
   FILLVALUE {
      FILL_TIME H5D_FILL_TIME_ALLOC
      VALUE  H5D_FILL_VALUE_DEFAULT
   }
   ALLOCATION_TIME {
      H5D_ALLOC_TIME_INCR
   }
}
}

Scale-offset is a lossy compression filter, and the HDF5 library seems to have a memory leak when this parameter is set (see also h5py/h5py#984).

@hanslovsky (Contributor, Author)

Could be a bug upstream, either in the Java bindings or in the HDF5 library itself.

@axtimwalde (Contributor)

Citing http://svnsis.ethz.ch/doc/hdf5/current/ch/systemsx/cisd/hdf5/HDF5IntStorageFeatures.html#INT_AUTO_SCALING

Note that this compression is lossless if scalingFactor >= ceil(log2(max(values) - min(values) + 1)). This is made sure when using INT_AUTO_SCALING, thus INT_AUTO_SCALING is always lossless.

Nevertheless, I changed the code to use INT_AUTO_SCALING_UNSIGNED for unsigned types, which sounds more appropriate, although I do not understand the difference (a7b3735).
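The lossless condition quoted above is just a bit-width calculation: the scale-offset filter stores each value as an offset from the chunk minimum, so it is lossless as long as the configured bit count can represent the full value range. A minimal sketch of that arithmetic (`minBits` is a hypothetical helper, not part of n5-hdf5 or JHDF5):

```java
public class ScaleOffsetBits {

	// Minimum number of bits the scale-offset filter needs to store all
	// values losslessly: ceil(log2(max(values) - min(values) + 1)).
	static int minBits(long[] values) {
		long min = values[0];
		long max = values[0];
		for (long v : values) {
			if (v < min) min = v;
			if (v > max) max = v;
		}
		long range = max - min + 1; // number of distinct offsets to encode
		// ceil(log2(range)) without floating point; 0 for a constant dataset.
		return 64 - Long.numberOfLeadingZeros(range - 1);
	}

	public static void main(String[] args) {
		// Range 300 - 100 + 1 = 201, and ceil(log2(201)) = 8 bits.
		System.out.println(minBits(new long[] {100, 300, 250}));
		// A constant dataset has range 1 and needs 0 bits.
		System.out.println(minBits(new long[] {42, 42}));
	}
}
```

This also shows why the random uint64 data in the report above is a worst case: its range approaches 2^64, so auto-scaling cannot shave off bits and the filter adds overhead without compressing.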
