05-04-2012, 06:29 AM
Steve Howell
Re: key/value store optimized for disk storage

On May 3, 11:03 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> Steve Howell <showel...@yahoo.com> writes:
> > Sounds like a useful technique. The text snippets that I'm
> > compressing are indeed mostly English words, and 7-bit ascii, so it
> > would be practical to use a compression library that just uses the
> > same good-enough encodings every time, so that you don't have to write
> > the encoding dictionary as part of every small payload.
>
> Zlib stays adaptive, the idea is just to start with some ready-made
> compression state that reflects the statistics of your data.
>
> > Sort of as you suggest, you could build a Huffman encoding for a
> > representative run of data, save that tree off somewhere, and then use
> > it for all your future encoding/decoding.
>
> Zlib is better than Huffman in my experience, and Python's zlib module
> already has the right entry points. Looking at the docs,
> Compress.flush(Z_SYNC_FLUSH) is the important one. I did something like
> this before and it was around 20 lines of code. I don't have it around
> any more but maybe I can write something else like it sometime.
>
> > Is there a name to describe this technique?
>
> Incremental compression maybe?


Many thanks, this is getting me on the right path:

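import zlib

# Compress a shared prefix, then sync-flush so all of its output is emitted.
compressor = zlib.compressobj()
s = compressor.compress(b"foobar")
s += compressor.flush(zlib.Z_SYNC_FLUSH)

# Save the prefix bytes and snapshot the compressor state at this point.
s_start = s
compressor2 = compressor.copy()

# Finish the stream one way...
s += compressor.compress(b"baz")
s += compressor.flush(zlib.Z_FINISH)
print(zlib.decompress(s))

# ...and finish it another way from the same snapshot.
s = s_start
s += compressor2.compress(b"spam")
s += compressor2.flush(zlib.Z_FINISH)
print(zlib.decompress(s))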
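
Taking that one step further, here is a rough sketch of how the snapshot trick might be wrapped up for storing lots of small records. The helper names and the PRIMER sample text are made up for illustration, and I'm assuming the primer is chosen once and kept alongside the store. The idea is to compress the primer a single time, keep only the per-record suffix bytes, and on the read side prime a decompressor with the same prefix and copy it for each record.

```python
import zlib

# Hypothetical stand-in for "a representative run of data".
PRIMER = b"the quick brown fox jumps over the lazy dog and other common English words "


def make_primed_compressor(primer):
    """Return (compressor, prefix): a compressor that has already seen
    `primer`, plus the compressed bytes it emitted for it."""
    comp = zlib.compressobj()
    prefix = comp.compress(primer) + comp.flush(zlib.Z_SYNC_FLUSH)
    return comp, prefix


def make_primed_decompressor(prefix):
    """Return a decompressor that has already consumed the shared prefix."""
    decomp = zlib.decompressobj()
    decomp.decompress(prefix)  # output is just the primer; discard it
    return decomp


def compress_record(primed_comp, data):
    """Compress `data` from a copy of the primed state.
    Only the bytes produced after the prefix are returned."""
    comp = primed_comp.copy()
    return comp.compress(data) + comp.flush(zlib.Z_FINISH)


def decompress_record(primed_decomp, suffix):
    """Recover one record from its suffix bytes using a copy of the
    primed decompressor, so the primer is not decompressed again."""
    decomp = primed_decomp.copy()
    return decomp.decompress(suffix) + decomp.flush()


comp, prefix = make_primed_compressor(PRIMER)
decomp = make_primed_decompressor(prefix)

record = b"the lazy dog jumps over the quick brown fox"
stored = compress_record(comp, record)
print(len(record), len(stored), len(zlib.compress(record)))
print(decompress_record(decomp, stored))
```

The printed lengths let you compare the stored suffix against a plain zlib.compress() of the same record; how much is saved will depend on how well the primer matches the real data.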