Multi-part gzip file random access (in Java)?

The design of GZIP, as you have realized, is not friendly to random access. You can do as you describe, and then if you run into an error in the decompressor, conclude that the signature you found was actually compressed data. If you finish decompressing, then it's easy to verify the validity of the stream just decompressed, via the CRC32.
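Here is a minimal sketch of that probe-and-verify idea, assuming the whole file fits in a byte array. The class and method names (GzipProbe, candidates, decodesCleanly) are made up for illustration. It leans on java.util.zip.GZIPInputStream, which throws an IOException on a bad header, corrupt deflate data, or a CRC32 mismatch in the trailer. One caveat: as far as I know, in current JDKs GZIPInputStream keeps reading into any members that follow, so a clean result covers everything from that offset to the end of the file, not just one member.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

final class GzipProbe {

    /** Offsets where the bytes 1f 8b 08 (gzip magic plus the deflate method id) occur. */
    static List<Integer> candidates(byte[] file) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i + 2 < file.length; i++) {
            if ((file[i] & 0xff) == 0x1f && (file[i + 1] & 0xff) == 0x8b && file[i + 2] == 8) {
                hits.add(i);
            }
        }
        return hits;
    }

    /** True if decompressing from the given offset reaches the end without an error. */
    static boolean decodesCleanly(byte[] file, int offset) {
        byte[] sink = new byte[8192];
        try (GZIPInputStream in = new GZIPInputStream(
                new ByteArrayInputStream(file, offset, file.length - offset))) {
            while (in.read(sink) != -1) {
                // Discard the output; only a thrown exception matters here.
            }
            return true;
        } catch (IOException e) {
            // Bad header, corrupt data, truncated input, or CRC32 mismatch:
            // the signature at this offset was a false positive.
            return false;
        }
    }
}
```

You would call candidates(...) once, then keep only the offsets for which decodesCleanly(...) returns true.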

If the files are not so big, you might consider just decompressing all of the entries in series and recording the offset of each signature as you go, so as to build a directory. As you decompress, discard the output bytes. At that point you will have generated a directory, and you can then support random access based on filename, date, or other metadata.
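The directory pass could look roughly like the sketch below. It assumes the file is held in memory and that members sit back to back with nothing in between. GzipDirectory, Entry, and build are hypothetical names. It parses each gzip header by hand (per RFC 1952), inflates the raw deflate data with java.util.zip.Inflater into a throwaway buffer, and uses Inflater.getRemaining() to work out where the member ended. It does not check the CRC32, and a malformed header can still trigger an ArrayIndexOutOfBoundsException, so treat it as a starting point only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

final class GzipDirectory {

    /** One directory row: where a member starts, plus the optional FNAME from its header. */
    static final class Entry {
        final int offset;
        final String name; // null when the header has no FNAME field
        Entry(int offset, String name) { this.offset = offset; this.name = name; }
    }

    static List<Entry> build(byte[] file) throws IOException, DataFormatException {
        List<Entry> dir = new ArrayList<>();
        byte[] bitBucket = new byte[8192];
        int pos = 0;
        while (pos < file.length) {
            // A member needs at least a 10-byte header and an 8-byte trailer.
            if (file.length - pos < 18 || (file[pos] & 0xff) != 0x1f
                    || (file[pos + 1] & 0xff) != 0x8b || file[pos + 2] != 8) {
                throw new IOException("no gzip member at offset " + pos);
            }
            int flags = file[pos + 3] & 0xff;
            int p = pos + 10;
            if ((flags & 0x04) != 0) { // FEXTRA: 2-byte little-endian length, then payload
                p += 2 + ((file[p] & 0xff) | ((file[p + 1] & 0xff) << 8));
            }
            String name = null;
            if ((flags & 0x08) != 0) { // FNAME: zero-terminated ISO 8859-1 string
                int start = p;
                while (file[p] != 0) p++;
                name = new String(file, start, p - start, StandardCharsets.ISO_8859_1);
                p++;
            }
            if ((flags & 0x10) != 0) { // FCOMMENT: zero-terminated string
                while (file[p++] != 0) { }
            }
            if ((flags & 0x02) != 0) p += 2; // FHCRC: 2-byte header checksum
            dir.add(new Entry(pos, name));

            Inflater inf = new Inflater(true); // true = raw deflate, no zlib wrapper
            try {
                inf.setInput(file, p, file.length - p);
                while (!inf.finished()) {
                    int n = inf.inflate(bitBucket); // output is simply overwritten
                    if (n == 0 && !inf.finished() && (inf.needsInput() || inf.needsDictionary())) {
                        throw new IOException("truncated member at offset " + pos);
                    }
                }
                int consumed = (file.length - p) - inf.getRemaining();
                pos = p + consumed + 8; // skip the CRC32 and ISIZE trailer
            } finally {
                inf.end();
            }
        }
        return dir;
    }
}
```

With the list from build(...) in hand, you can look up an entry by name and open a GZIPInputStream at its offset. Reading MTIME from bytes 4 to 7 of the header would give you the date as well.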

This will be reasonably fast for files below about 100 KB. Just as a guess, if you had 10 files of around 100 KB each, it would probably be done in about 2 s on a modern CPU. This is what I mean by "reasonably fast".

But only you know the performance requirements of your application. Do you have a GZIPInputStream class? In Java it ships with the JDK as java.util.zip.GZIPInputStream, so you are halfway there.

The BGZF file format, which is compatible with GZIP, was developed by biologists. (...) The advantage of BGZF over conventional gzip is that BGZF allows for seeking without having to scan through the entire file up to the position being sought. In picard.svn.sourceforge.net/viewvc/picard... , have a look at BlockCompressedOutputStream and BlockCompressedInputStream.java.
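To illustrate why BGZF can be walked without decompressing anything: as I understand the layout described in the SAM specification, each BGZF block is a gzip member whose extra field carries a "BC" subfield holding BSIZE, the total block size minus one. The sketch below is not the Picard code. BgzfBlocks, blockSize, and blockOffsets are hypothetical names, and it assumes the file is in memory. It hops from block to block using only that size field.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class BgzfBlocks {

    /** Total size in bytes of the BGZF block at pos, read from its BC extra subfield. */
    static int blockSize(byte[] file, int pos) throws IOException {
        if (file.length - pos < 12 || (file[pos] & 0xff) != 0x1f
                || (file[pos + 1] & 0xff) != 0x8b || file[pos + 2] != 8
                || (file[pos + 3] & 0x04) == 0) { // the FEXTRA flag must be set
            throw new IOException("no BGZF block at offset " + pos);
        }
        int xlen = u16(file, pos + 10);
        int p = pos + 12;
        int end = p + xlen;
        if (end > file.length) {
            throw new IOException("truncated extra field at offset " + pos);
        }
        // Walk the extra subfields: 2 id bytes, 2-byte length, then the payload.
        while (p + 4 <= end) {
            int slen = u16(file, p + 2);
            if (file[p] == 'B' && file[p + 1] == 'C' && slen == 2 && p + 6 <= end) {
                return u16(file, p + 4) + 1; // BSIZE is the block size minus one
            }
            p += 4 + slen;
        }
        throw new IOException("gzip member without a BC subfield at offset " + pos);
    }

    /** Start offsets of all blocks, found by hopping over each block's declared size. */
    static List<Integer> blockOffsets(byte[] file) throws IOException {
        List<Integer> offsets = new ArrayList<>();
        int pos = 0;
        while (pos < file.length) {
            offsets.add(pos);
            pos += blockSize(file, pos);
        }
        return offsets;
    }

    /** Little-endian unsigned 16-bit read. */
    private static int u16(byte[] b, int i) {
        return (b[i] & 0xff) | ((b[i + 1] & 0xff) << 8);
    }
}
```

Because each block declares its own length, locating block N costs a handful of header reads rather than a full decompression pass.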
