Thread: sound index
Hello. Does anybody know of any solutions to the problem of searching for words/phrases that sound similar to each other? E.g. a soundex index or something like that. The problem I have: a tag suggestion mechanism, similar to Google Suggest, which is intended to suggest people's names (a "person's name" search field in a web form). It would be great if it worked smarter than a simple LIKE. Also, I'd be happy to hear opinions from people who have experience using things like soundex. -- Best regards, Nikolay
On Tue, Apr 11, 2006 at 05:28:12AM -0700, Nikolay Samokhvalov wrote:
> hello.
>
> does anybody know any solutions to the problem of searching
> words/phrases, which are close to each other by sounding? e.g. soundex
> index or smth.
Check out contrib/fuzzystrmatch. It has a number of such algorithms.
Have a nice day,
-- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> also, i'd be happy to listen opinions from people who have experience of usage
> of such things like soundex.
Soundex is grossly outdated. It was designed for manual use by 19th-century census takers, and I'm always surprised to see it still in use. Metaphone (a Google search for it turns up good material) does a much better job of matching names, and Double Metaphone does even better, although having each word map to possibly two equivalents might complicate your logic, depending on your queries.
-- Scott Ribe scott_ribe@killerbytes.com http://www.killerbytes.com/ (303) 722-0567 voice
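To make the "grossly outdated" point concrete, here is a minimal Python sketch of the classic American Soundex rules: keep the first letter, map the remaining consonants to six digit classes, drop vowels, collapse repeated codes, and pad or truncate to four characters. It is an illustrative re-implementation, not the fuzzystrmatch source, and details such as the H/W rule may differ between implementations.

```python
# Digit classes of classic American Soundex; vowels, Y, H and W get no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code of name ('' if it has no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            # H and W do not separate two consonants with the same code.
            continue
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        # A vowel resets prev to "", so a repeated code after a vowel is kept.
        prev = code
        if len(result) == 4:
            break
    return result.ljust(4, "0")
```

With these rules "Robert" and "Rupert" both become R163: only the first letter and the rough consonant skeleton survive, which is exactly why the code is so coarse for name matching.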
I remember now that over the years I found a few places where Metaphone needed improvement. Double Metaphone seemed to incorporate all my revisions, so the best approach would be to start with it, and if your system can't accommodate the notion of multiple equivalents, then just use the primary.
-- Scott Ribe
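The difference between "multiple equivalents" and "just use the primary" is mostly in the matching logic. A minimal Python sketch, assuming each stored name already has its Double Metaphone primary and alternate codes precomputed (in fuzzystrmatch these come from dmetaphone() and dmetaphone_alt()). The names and codes used with it are hypothetical placeholders, not real Double Metaphone output.

```python
from typing import Dict, List, NamedTuple, Optional, Set


class PhoneticKeys(NamedTuple):
    """Precomputed Double Metaphone codes for one name."""
    primary: str
    alternate: Optional[str] = None  # may be absent or equal to primary


def codes(keys: PhoneticKeys, primary_only: bool) -> Set[str]:
    """The set of codes to compare; a set collapses alternate == primary."""
    if primary_only or not keys.alternate:
        return {keys.primary}
    return {keys.primary, keys.alternate}


def find_matches(query: PhoneticKeys, table: Dict[str, PhoneticKeys],
                 primary_only: bool = False) -> List[str]:
    """Names whose codes share at least one code with the query's codes."""
    wanted = codes(query, primary_only)
    return sorted(name for name, keys in table.items()
                  if wanted & codes(keys, primary_only))
```

With both codes, a row matches if any of its codes equals any of the query's codes (a set intersection); with `primary_only=True` the test degrades to a single equality, which is the simpler fallback Scott describes.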
Have a look at contrib/pg_trgm.
-- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
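pg_trgm takes a different approach from phonetic codes: it splits strings into three-character groups (trigrams) and scores two strings by how many trigrams they share. A rough Python approximation of that idea, assuming lowercase comparison and the convention of padding each word with two leading spaces and one trailing space. It is not the extension's actual code, and details such as which characters count as word characters may differ.

```python
import re


def trigrams(text: str) -> set[str]:
    """Set of padded trigrams for every alphanumeric word in text."""
    grams = set()
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams (0.0 .. 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

For example, "Nikolay" and "Nikolai" share 6 of their 10 distinct trigrams, giving 0.6. Because a single-word query shares all of its trigrams with a longer name containing that word, it can still score reasonably well on partial input, which suits a suggest box.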
Teodor Sigaev wrote:
>> also, i'd be happy to listen opinions from people who have experience
>> of usage of such things like soundex.
I'm using metaphone() together with levenshtein() to search a place-name gazetteer database and order the results. That works reasonably well and gives interesting results ("places with similar names"). However, it does not cover partial matches: it only compares the whole string, so it does not find multi-word names when just a single word is entered (e.g. it would not find "santa cruz" when you enter just "cruz"). Regarding the DB structure: I've specifically added a column that contains the metaphone string (loaded with "UPDATE places set pname_metaphone = metaphone(pname, 11)"); this column is indexed (and, with functional indexes, actually redundant ;). I'm then using "SELECT * from places where pname_metaphone = metaphone('searchstring', 11)" to retrieve similar names, and levenshtein() is used to order those rows by string distance. Try it at http://nona.net/features/map/
I haven't attempted yet to combine tsearch2 and metaphone results; that would probably be the PerfectSolution(tm).
Hope that helps,
Alex
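The message doesn't show the ordering query itself (in SQL it would presumably be an ORDER BY levenshtein(pname, 'searchstring') clause). A minimal Python sketch of that ordering step, assuming the candidate rows have already been fetched by the metaphone equality lookup. The candidate names are hypothetical, and this is a plain re-implementation of the standard Levenshtein edit distance rather than fuzzystrmatch's C code. Lowercasing both strings is a choice of this sketch; fuzzystrmatch's levenshtein() compares the strings as given.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute (or keep)
        prev = cur
    return prev[-1]


def rank_candidates(query: str, candidates: List[str]) -> List[str]:
    """Order phonetic-match candidates by edit distance, ties broken by name."""
    q = query.lower()
    return sorted(candidates, key=lambda name: (levenshtein(q, name.lower()), name))
```

For a query of "Wein", the hypothetical candidates "Vienna", "Wien" and "Wein" come back as Wein (distance 0), Wien (2), Vienna (4). The phonetic key narrows the set cheaply through an index; the edit distance is only computed on that small set.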