Re: Enhancing phonetic search support for more languages - GSoC 2010 - Mailing list pgsql-hackers

From Josh Berkus
Subject Re: Enhancing phonetic search support for more languages - GSoC 2010
Date
Msg-id 4BBD17C8.3040509@agliodbs.com
In response to Enhancing phonetic search support for more languages - GSoC 2010  (Dhiraj Lohiya <lohiya.dhiraj@gmail.com>)
Responses Re: Enhancing phonetic search support for more languages - GSoC 2010  (Dhiraj Lohiya <lohiya.dhiraj@gmail.com>)
List pgsql-hackers
Dhiraj,

> For instance, if many users(above a threshold set by us) insert some 
> search  string for which no wanted  search  result is retrieved, we
> could track what he finally selects and then accordingly append/modify
> our set of phonetic rules based on the phonetic mismatch amongst the
>  query inserted and result wanted according to our set of rules. Using
> this, the *  rule sets it could evolve itself when we collect usage
> statistics from users based on their experience.  * This feature would
> add a new dimension to the  search functionality and would surely stand
> out.

You're mixing two completely different kinds of features here.  One is a
backend function and the other is an application for building soundex
rules.  While both of these are interesting projects, it is unlikely you
can complete both in one summer.

What I'd suggest focussing on for SoC is creating a new soundex function
(suggested name: soundex_ml) which includes a facility for loadable
algorithms and callability on a per-language basis.  That would be
plenty of work by itself.  From there, you could then continue your
undergraduate work on the tool to build the algorithms in the first place.
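
As a rough illustration of the shape such a function might take, here is a
minimal Python sketch. It is not backend code (a real implementation would
live in C inside PostgreSQL), and apart from the suggested name soundex_ml,
every name in it (register_algorithm, american_soundex, the registry dict) is
a hypothetical choice made for the example. It assumes "loadable algorithms"
means a lookup table from language code to encoder function, and it ships
only the standard American Soundex rules for "en".

```python
from typing import Callable, Dict

# Hypothetical registry: language code -> phonetic encoder function.
_ALGORITHMS: Dict[str, Callable[[str], str]] = {}


def register_algorithm(lang: str, encoder: Callable[[str], str]) -> None:
    """Load (or replace) the phonetic encoder used for one language."""
    _ALGORITHMS[lang.lower()] = encoder


# Consonant classes of standard American Soundex.
_CODES: Dict[str, str] = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def american_soundex(word: str) -> str:
    """Encode a word as one letter plus three digits (American Soundex)."""
    letters = [c for c in word.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


def soundex_ml(text: str, lang: str = "en") -> str:
    """Dispatch to whichever algorithm is loaded for the given language."""
    try:
        encoder = _ALGORITHMS[lang.lower()]
    except KeyError:
        raise ValueError(
            f"no phonetic algorithm loaded for language {lang!r}") from None
    return encoder(text)


register_algorithm("en", american_soundex)
```

With this layout, supporting another language is a matter of calling
register_algorithm with a new encoder; the soundex_ml entry point stays the
same.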

I'm also curious why you chose to focus on the extremely imprecise
soundex instead of the more discriminating metaphone.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
 

