On Fri, 16 Dec 2011, Roedy Green wrote:
> Let's say I had a million records each with a text field. I wanted to
> find all records that contained a given substring. Are there fast
> algorithms to do that or do you have to scan the whole thing linearly?
http://en.wikipedia.org/wiki/Inverted_index
> I imagine you could build an index of words. Then maybe you could
> figure out which words contain the string, and narrow your search to
> records with those words.
Exactly that!
> Similarly, I was wondering how you might build an index to all the files
> on a computer so that you could find one just by giving a wild card or
> partial filename.
There is a standard utility on unix that does just that:
http://en.wikipedia.org/wiki/Locate_%28Unix%29
The source is in C, but might be informative (this is for the 'mlocate'
version):
https://fedorahosted.org/mlocate/browser/src
tom
--
Get a fucking hobby that isn't breathing, browsing 4chan, or fapping. --
The Well Cultured Anonymous, on Manners