View Single Post
  #2 (permalink)  
Old 12-16-2011, 05:58 PM
Tom Anderson
Guest
 
Posts: n/a
Default Re: Fast String search algorithm

On Fri, 16 Dec 2011, Roedy Green wrote:

> Let's say I had a million records each with a text field. I wanted to
> find all records that contained a given substring. Are there fast
> algorithms to do that or do you have to scan the whole thing linearly?


http://en.wikipedia.org/wiki/Inverted_index

> I imagine you could build an index of words. Then maybe you could
> figure out which words contain the string, and narrow your search to
> records with those words.


Exactly that!

> Similarly, I was wondering how you might build an index to all the files
> on a computer so that you could find one just by giving a wild card or
> partial filename.


There is a standard utility on unix that does just that:

http://en.wikipedia.org/wiki/Locate_%28Unix%29

The source is in C, but might be informative (this is for the 'mlocate'
version):

https://fedorahosted.org/mlocate/browser/src

tom

--
Get a fucking hobby that isn't breathing, browsing 4chan, or fapping. --
The Well Cultured Anonymous, on Manners
Reply With Quote