I'm working on implementing a differential update mechanism in .NET. The idea is to split data into 64 KB blocks and compress each block independently using SharpZipLib in raw mode (i.e., without the zlib header and checksum). This approach allows me to only download the blocks that have changed during an update.
However, the issue I'm encountering is that each compressed block is a valid deflate stream on its own and, by default, ends with the BFINAL bit set to 1. When I concatenate these blocks, standard decompressors (like 7zip) only decompress the first block because they interpret the BFINAL bit as the end of the stream.
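The behaviour can be reproduced outside .NET. The following is a minimal sketch that uses Python's zlib module in place of SharpZipLib (an assumption made for illustration only; `raw_deflate` is a hypothetical helper name). It shows a raw-deflate decoder stopping at the end of the first independently finished stream:

```python
import zlib

def raw_deflate(block):
    """Compress one block as a complete raw deflate stream (its last block has BFINAL = 1)."""
    # Negative wbits selects raw deflate: no zlib header, no Adler-32 trailer.
    c = zlib.compressobj(6, zlib.DEFLATED, -15)
    return c.compress(block) + c.flush()  # flush() defaults to Z_FINISH

first = raw_deflate(b"A" * 1000)
second = raw_deflate(b"B" * 1000)

d = zlib.decompressobj(-15)
out = d.decompress(first + second)

# The decoder ends at the first BFINAL = 1 block; everything after it
# is left untouched in unused_data.
print(len(out), d.eof, d.unused_data == second)
```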
My question is: Is there any way—through an API setting, configuration tweak, or perhaps by modifying SharpZipLib or using another .NET library—to generate deflate blocks so that the intermediate blocks do not have the BFINAL bit set (only the final block should have BFINAL = 1)? This would ensure that the concatenation of all blocks forms a single continuous deflate stream that can be decompressed by standard tools.
Any guidance or suggestions on this matter would be greatly appreciated!
Asked Mar 16 at 15:14 by SuperJMN

1 Answer
If you're using DeflaterOutputStream, then you can try Flush() instead of Finish(). The documentation doesn't say exactly what Flush() does, but I'm guessing it calls zlib with Z_SYNC_FLUSH, which would compress all the data provided so far and end it with an empty stored block. That would end the deflate stream on a byte boundary with BFINAL false, and so permit concatenating another deflate stream after it.
You should use Finish() on your last block to end the deflate stream.
Update:
The above answer was accepted too quickly. I tested it, and found that my guess was wrong. Flush() unfortunately does a Z_PARTIAL_FLUSH, which does not end the deflate stream on a byte boundary, and so does not facilitate the concatenation of streams constructed this way.
A quick perusal of the documentation does not turn up any promising methods for what would be needed here. You may need to use zlib directly.
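As an illustration of the "use zlib directly" route, here is a minimal sketch using Python's zlib binding rather than a .NET one (an assumption for illustration; `compress_block` and `compress_blocks` are hypothetical helper names, and the 64 KB block size is taken from the question). Each block gets a fresh compressor, so blocks stay independent; every block but the last ends with Z_SYNC_FLUSH, and the last ends with Z_FINISH:

```python
import zlib
from typing import List

BLOCK_SIZE = 64 * 1024  # block size from the question

def compress_block(block: bytes, last: bool) -> bytes:
    """Raw-deflate one block with a fresh compressor (no shared history)."""
    c = zlib.compressobj(6, zlib.DEFLATED, -15)  # negative wbits: raw deflate
    # Z_SYNC_FLUSH appends an empty non-final stored block, ending on a
    # byte boundary; Z_FINISH emits the block with BFINAL = 1.
    mode = zlib.Z_FINISH if last else zlib.Z_SYNC_FLUSH
    return c.compress(block) + c.flush(mode)

def compress_blocks(data: bytes) -> List[bytes]:
    """Split data into blocks and compress each one independently."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    return [compress_block(b, last=(i == len(blocks) - 1))
            for i, b in enumerate(blocks)]

data = bytes(range(256)) * 1000          # 256,000 bytes of sample input
pieces = compress_blocks(data)           # one compressed piece per block
stream = b"".join(pieces)                # concatenation is one deflate stream

d = zlib.decompressobj(-15)
restored = d.decompress(stream) + d.flush()
print(restored == data, d.eof)
```

Because no piece refers to data outside its own block, a changed block can be recompressed and swapped in without touching its neighbours, as long as the final piece is the only one finished with Z_FINISH.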
Or, there is some bit twiddling you could do. What you'll get from a Z_PARTIAL_FLUSH is a non-final empty fixed block appended after the last deflate block with data. That block consists of ten bits, of which zero to seven may be held back because they did not complete a byte. Those bits, in the order they are appended, are 0100000000. That 1 bit will certainly be in the output you get. You can find it by searching backwards through any 0 bits at the end until you get to the first 1 bit. Back up one more bit, and you are at the start of that appended fixed block. You can replace it with a non-final empty stored block, which will end at a byte boundary, so that another deflate stream can be appended after it. That block is the bits 000 (BFINAL = 0, BTYPE = 00), followed by zero to seven 0 bits to bring the stream to a byte boundary, then the four bytes 0x00 0x00 0xff 0xff.
Note that the bits noted above are placed in each byte in order from the least to the most significant bit. The search for the 1 of the fixed block therefore runs backwards, from the most significant bit towards the least.
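The bit surgery described above can be sketched as follows. This is a hedged illustration in Python, with zlib's own Z_PARTIAL_FLUSH standing in for SharpZipLib's Flush() (an assumption; it has not been checked against SharpZipLib output). `partial_flush_block` and `to_stored_terminator` are hypothetical names. The stored-block header written here is three 0 bits (BFINAL = 0, BTYPE = 00, per RFC 1951):

```python
import zlib

def partial_flush_block(block: bytes) -> bytes:
    """Raw-deflate a block and end it with Z_PARTIAL_FLUSH (empty fixed block)."""
    c = zlib.compressobj(6, zlib.DEFLATED, -15)
    return c.compress(block) + c.flush(zlib.Z_PARTIAL_FLUSH)

def to_stored_terminator(data: bytes) -> bytes:
    """Replace the trailing empty fixed block with an empty stored block."""
    buf = bytearray(data)
    # Search backwards for the last 1 bit: skip all-zero bytes first.
    i = len(buf) - 1
    while i >= 0 and buf[i] == 0:
        i -= 1
    if i < 0:
        raise ValueError("no 1 bit found; not a partial-flushed stream")
    # Bits fill each byte from least to most significant, so the last
    # 1 bit in stream order is the highest set bit of this byte.
    top = buf[i].bit_length() - 1
    # One bit earlier is the BFINAL bit that starts the empty fixed block.
    pos = i * 8 + top - 1
    if pos < 0:
        raise ValueError("fixed block header is truncated")
    byte_idx, bit_idx = divmod(pos, 8)
    # Drop everything after that byte and clear the bits from pos upward,
    # which writes the stored header 000 plus zero padding in one step.
    del buf[byte_idx + 1:]
    buf[byte_idx] &= (1 << bit_idx) - 1
    # If the 3 header bits do not fit in this byte, they spill into a new one.
    if bit_idx > 5:
        buf.append(0)
    # LEN = 0x0000 and NLEN = 0xffff of the empty stored block.
    buf += b"\x00\x00\xff\xff"
    return bytes(buf)

first = to_stored_terminator(partial_flush_block(b"hello, " * 100))
c = zlib.compressobj(6, zlib.DEFLATED, -15)
last = c.compress(b"world") + c.flush()  # Z_FINISH: BFINAL = 1
print(zlib.decompress(first + last, -15) == b"hello, " * 100 + b"world")
```

The patched piece is at most a few bytes longer than the original, and it now ends on a byte boundary, so pieces can be concatenated directly.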