05-03-2012, 08:42 AM
Steve Howell
Re: key/value store optimized for disk storage

On May 2, 11:48 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> Paul Rubin <no.em...@nospam.invalid> writes:
> > looking at the spec more closely, there are 256 hash tables.. ...
>
> You know, there is a much simpler way to do this, if you can afford to
> use a few hundred MB of memory and you don't mind some load time when
> the program first starts. Just dump all the data sequentially into a
> file. Then scan through the file, building up a Python dictionary
> mapping data keys to byte offsets in the file (this is a few hundred MB
> if you have 3M keys). Then dump the dictionary as a Python pickle and
> read it back in when you start the program.
>
> You may want to turn off the cyclic garbage collector when building or
> loading the dictionary, as it can badly slow down the construction of
> big lists and maybe dicts (I'm not sure of the latter).
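
For anyone following along, here is a rough sketch of how I read the suggestion above. It is not tested at scale. The record layout (a length header before each key and value), the function names, and storing the value length next to the offset are all my own assumptions for illustration; the quoted text only says to map keys to byte offsets.

```python
import gc
import pickle
import struct

# Assumed record layout: 4-byte key length, 4-byte value length, key, value.
HEADER = struct.Struct("<II")


def dump_records(data_path, records):
    """Write (key, value) byte pairs sequentially into one data file."""
    with open(data_path, "wb") as f:
        for key, value in records:
            f.write(HEADER.pack(len(key), len(value)))
            f.write(key)
            f.write(value)


def build_index(data_path):
    """Scan the data file and map each key to (value offset, value length)."""
    index = {}
    was_enabled = gc.isenabled()
    gc.disable()  # cyclic GC turned off while the big dict is being built
    try:
        with open(data_path, "rb") as f:
            while True:
                header = f.read(HEADER.size)
                if not header:
                    break  # end of file
                key_len, value_len = HEADER.unpack(header)
                key = f.read(key_len)
                index[key] = (f.tell(), value_len)
                f.seek(value_len, 1)  # skip over the value itself
    finally:
        if was_enabled:
            gc.enable()
    return index


def save_index(index, index_path):
    """Pickle the index so later runs can skip the scan."""
    with open(index_path, "wb") as f:
        pickle.dump(index, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_index(index_path):
    """Load the pickled index at program start, with the cyclic GC off."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        with open(index_path, "rb") as f:
            return pickle.load(f)
    finally:
        if was_enabled:
            gc.enable()


def lookup(data_file, index, key):
    """Seek to the stored offset in an open data file and read the value."""
    offset, length = index[key]
    data_file.seek(offset)
    return data_file.read(length)
```

The idea is to call dump_records and build_index once, pickle the result with save_index, and then on each program start call load_index and keep the data file open in "rb" mode for lookup. Whether disabling the collector actually helps for dicts is something I have not measured.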


I'm starting to lean toward the file-offset/seek approach. I am
writing some benchmarks on it, comparing it to a more file-system
based approach like I mentioned in my original post. I'll report back
when I get results, but it's already way past my bedtime for tonight.

Thanks for all your help and suggestions.