|
|||
|
Do any of you have a way around the "short word feature", other than modifying ft_min_word_len? I'm getting more and more complaints that my "search engine sucks" because it can't tell the difference between (for example) "9mm pistol" and "pistol", but I don't really want to slow it down by modifying ft_min_word_len, either.
I read one solution of having a second column, and adding __ to every word,then modifying the query to add __ to each of the query words to make it match the word in the table. But of course, that doubles the size of the table, and since one of my tables are 1.3G, I'm not sure that doubling it is very wise. If I did lower ft_min_word_len to 2, would I REALLY notice that much of a query speed difference? Have any of you benchmarked it? Are we talking abouta difference of 0.007 to 0.009 (which is acceptable), or 0.007 to 2.0 (which is not)? TIA, Jason |
|
|
||||
|
||||
|
|
|
|||
|
Jason C <jwcarlton@gmail.com> wrote:
> Do any of you have a way around the "short word feature", other than > modifying ft_min_word_len? I'm getting more and more complaints that > my "search engine sucks" because it can't tell the difference between > (for example) "9mm pistol" and "pistol", but I don't really want to > slow it down by modifying ft_min_word_len, either. Congratulations! You have just found why the MySQL fulltext search sucks. And any other search engines without "term preprocessor" suck as well. If you want terms like "9mm", "C++" or "AS/400" indexable (and thus: findable) then you must transform them, such that the search engine recognizes them as words. > I read one solution of having a second column, and adding __ to > every word, then modifying the query to add __ to each of the query > words to make it match the word in the table. That's the basic idea, but I rather suggest "9mm" -> nine_millimeter "C++" -> C_plus_plus etc. pp. XL |
|
|||
|
On Monday, July 16, 2012 7:17:12 AM UTC-4, Axel Schwenke wrote:
> That's the basic idea, but I rather suggest > > "9mm" -> nine_millimeter > "C++" -> C_plus_plus > > etc. pp. That makes sense, but I'm not entirely sure how to do that on the fly. There are millions of potential combinations that could be queried, so I'm not sure I would write a program to know when it should be "momma" or "momillimetera" (replacing all "mm" with "millimeter"). Would it be practical to convert all characters to hex code? This is probably crazy, but is there a way to query MySQL as hex, or would I still have to have a second column with everything converted to hex? Or, would it simply be faster to use LIKE, convert all whitespaces in the query to "%", and just forget about MATCH...AGAINST? |
|
|||
|
On 7/16/2012 7:03 PM, Jason C wrote:
> On Monday, July 16, 2012 7:17:12 AM UTC-4, Axel Schwenke wrote: >> That's the basic idea, but I rather suggest >> >> "9mm" -> nine_millimeter >> "C++" -> C_plus_plus >> >> etc. pp. > > That makes sense, but I'm not entirely sure how to do that on the fly. There are millions of potential combinations that could be queried, so I'm not sure I would write a program to know when it should be "momma" or "momillimetera" (replacing all "mm" with "millimeter"). > > Would it be practical to convert all characters to hex code? This is probably crazy, but is there a way to query MySQL as hex, or would I still have to have a second column with everything converted to hex? > > Or, would it simply be faster to use LIKE, convert all whitespaces in the query to "%", and just forget about MATCH...AGAINST? > If you want to search for "millions of potential combinations", then your best bet is to just change your ft_min_word_len to 2. Yes, it will slow down searches - but it will still be better than trying to keep up with the millions of short words. This is a problem with any database, not just mysql. Full text searches are not handled well with any of the databases. -- ================== Remove the "x" from my email address Jerry Stuckle JDS Computer Training Corp. jstucklex@attglobal.net ================== |
|
|||
|
On Mon, 16 Jul 2012 19:03:59 -0400, Jason C wrote
(in article <31d1a18c-295c-4826-b4ec-c9ef9624f444@googlegroups.com>): > On Monday, July 16, 2012 7:17:12 AM UTC-4, Axel Schwenke wrote: >> That's the basic idea, but I rather suggest >> >> "9mm" -> nine_millimeter >> "C++" -> C_plus_plus >> >> etc. pp. > > That makes sense, but I'm not entirely sure how to do that on the fly. There > are millions of potential combinations that could be queried, so I'm not sure > I would write a program to know when it should be "momma" or "momillimetera" > (replacing all "mm" with "millimeter"). > > Would it be practical to convert all characters to hex code? This is probably > crazy, but is there a way to query MySQL as hex, or would I still have to > have a second column with everything converted to hex? > > Or, would it simply be faster to use LIKE, convert all whitespaces in the > query to "%", and just forget about MATCH...AGAINST? Try Sphinx search. I was created specifically for this problem. http://sphinxsearch.com/ -- Jonas |
|
|
![]() |
| Thread Tools | |
| Display Modes | |
|
|