Re: String Similarity - Mailing list pgsql-hackers

From Christopher Kings-Lynne
Subject Re: String Similarity
Date
Msg-id 44712CDE.1090608@calorieking.com
In response to String Similarity  ("Mark Woodward" <pgsql@mohawksoft.com>)
Responses Re: String Similarity  ("Mark Woodward" <pgsql@mohawksoft.com>)
List pgsql-hackers
Try contrib/pg_trgm...
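
For readers unfamiliar with the module: pg_trgm scores two strings by how many three-character sequences (trigrams) they share. The Python below is only a rough sketch of that idea, not the module's actual code. The names trigrams() and similarity() are chosen for illustration, and the padding rule (two spaces before each word, one after) and the shared/total ratio are assumptions about how the extension behaves.

```python
import re


def trigrams(text):
    """Return the set of padded, per-word trigrams in text."""
    grams = set()
    # Assumption: lowercase the input and treat any run of letters or
    # digits as a word, so punctuation such as " - " is ignored.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "  # two leading spaces, one trailing
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

Because the trigrams are collected per word into a set, word order does not affect the score in this sketch. That suits the reordered artist/album/track strings quoted below, but it also means the alphabetically scrambled string would score the same as the sensible orderings.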

Chris

Mark Woodward wrote:
> I have a side project that needs to "intelligently" know if two strings
> are contextually similar. Think about how CDDB information is collected
> and sorted. It isn't perfect, but there should be enough information to be
> usable.
> 
> Think about this:
> 
> "pink floyd - dark side of the moon - money"
> "dark side of the moon - pink floyd - money"
> "money - dark side of the moon - pink floyd"
> etc.
> 
> To a human, these strings are almost identical. Similarly:
> 
> "dark floyd of money moon pink side the"
> 
> Is a puzzle to be solved by 13-year-old children before the movie starts.
> 
> My post has three questions:
> 
> (1) Does anyone know of an efficient and numerically quantified method of
> detecting these sorts of things? I currently have a fairly inefficient and
> numerically bogus solution that may be the only non-impossible solution
> for the problem.
> 
> (2) Does anyone see a need for this feature in PostgreSQL? If so, what
> kind of interface would be best accepted as a patch? I am currently
> returning a match likelihood between 0 and 100.
> 
> (3) Is there also a desire for a Levenshtein distance function for text
> and varchars? I experimented with it, and was forced to write the function
> in item #1.
> 
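
On item (3), a minimal sketch of Levenshtein distance may help show what it measures: the fewest single-character insertions, deletions, and substitutions needed to turn one string into the other. This is the textbook dynamic-programming form written for illustration; the function name levenshtein() is an assumption here, not the poster's code or an existing PostgreSQL function.

```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b."""
    # prev[j] holds the distance between the first i-1 characters of a
    # and the first j characters of b; start with the empty prefix of a.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(
                prev[j] + 1,         # delete ca
                cur[j - 1] + 1,      # insert cb
                prev[j - 1] + cost,  # substitute, or keep if equal
            ))
        prev = cur
    return prev[-1]
```

Since every moved character counts as an edit, strings that contain the same words in a different order end up far apart under this measure. That may be why it was not sufficient on its own for the examples above.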

-- 
Christopher Kings-Lynne

Technical Manager
CalorieKing
Tel: +618.9389.8777
Fax: +618.9389.8444
chris.kings-lynne@calorieking.com
www.calorieking.com


