On May 3, 11:03 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> Steve Howell <showel...@yahoo.com> writes:
> > Sounds like a useful technique. The text snippets that I'm
> > compressing are indeed mostly English words, and 7-bit ASCII, so it
> > would be practical to use a compression library that just uses the
> > same good-enough encodings every time, so that you don't have to write
> > the encoding dictionary as part of every small payload.
> Zlib stays adaptive, the idea is just to start with some ready-made
> compression state that reflects the statistics of your data.
> > Sort of as you suggest, you could build a Huffman encoding for a
> > representative run of data, save that tree off somewhere, and then use
> > it for all your future encoding/decoding.
> Zlib is better than Huffman in my experience, and Python's zlib module
> already has the right entry points. Looking at the docs,
> Compress.flush(Z_SYNC_FLUSH) is the important one. I did something like
> this before and it was around 20 lines of code. I don't have it around
> any more but maybe I can write something else like it sometime.
> > Is there a name to describe this technique?
> Incremental compression maybe?
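
For what it's worth, the "ready-made compression state" idea is exposed
directly in newer Pythons (3.3+) as a preset dictionary via the zdict
argument. A minimal sketch, with a made-up sample string standing in for
real representative data:

```python
import zlib

# Hypothetical stand-in for a representative run of your real data;
# both sides must agree on this exact byte string.
SAMPLE = b"the quick brown fox jumps over the lazy dog " * 4

def compress_small(payload):
    # Seed the compressor with the sample so short payloads can
    # back-reference it instead of shipping their own statistics.
    c = zlib.compressobj(zdict=SAMPLE)
    return c.compress(payload) + c.flush()

def decompress_small(blob):
    # The decompressor needs the same preset dictionary.
    d = zlib.decompressobj(zdict=SAMPLE)
    return d.decompress(blob) + d.flush()

msg = b"the lazy dog jumps over the quick brown fox"
blob = compress_small(msg)
assert decompress_small(blob) == msg
# The dictionary pays off on tiny payloads that resemble the sample:
assert len(blob) < len(zlib.compress(msg))
```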
Many thanks, this is getting me on the right path:

import zlib

compressor = zlib.compressobj()
# Compress the shared prefix and flush all pending output without
# ending the stream, so the state can still be reused.
s = compressor.compress(b"foobar")
s += compressor.flush(zlib.Z_SYNC_FLUSH)
s_start = s

# Snapshot the compressor state before finishing the first branch.
compressor2 = compressor.copy()

# Branch 1: "foobar" + "baz"
s += compressor.compress(b"baz")
s += compressor.flush(zlib.Z_FINISH)

# Branch 2: "foobar" + "spam", reusing the already-compressed prefix.
s = s_start
s += compressor2.compress(b"spam")
s += compressor2.flush(zlib.Z_FINISH)
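
Wrapped up as a helper (a sketch along the same lines; the function name
is my own), with a round trip to check that each forked stream decodes
to the prefix plus its branch:

```python
import zlib

def fork_compress(prefix, branches):
    # Compress the shared prefix once, sync-flush so the bytes are
    # complete, then fork the compressor state for each branch.
    comp = zlib.compressobj()
    head = comp.compress(prefix)
    head += comp.flush(zlib.Z_SYNC_FLUSH)
    streams = []
    for branch in branches:
        c = comp.copy()  # independent copy of the compression state
        streams.append(head + c.compress(branch) + c.flush(zlib.Z_FINISH))
    return streams

streams = fork_compress(b"foobar", [b"baz", b"spam"])
assert zlib.decompress(streams[0]) == b"foobarbaz"
assert zlib.decompress(streams[1]) == b"foobarspam"
```

Each element of streams is a complete, self-contained zlib stream, so a
plain zlib.decompress on the other side recovers prefix + branch without
any extra bookkeeping.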