Re: Searching for "bare" letters - Mailing list pgsql-general

From	Uwe Schroeder
Subject	Re: Searching for "bare" letters
Date	October 2, 2011 05:20:22
Msg-id	201110020120.11175.uwe@oss4u.com
In response to	Re: Searching for "bare" letters ("Reuven M. Lerner" <reuven@lerner.co.il>)
Responses	Re: Searching for "bare" letters
List	pgsql-general

> Hi, everyone.  Uwe wrote:
> > What kind of "client" are the users using?  I assume you will have some
> > kind of user interface. For me this is a typical job for a user
> > interface. The number of letters with "equivalents" in different
> > languages are extremely limited, so a simple matching routine in the
> > user interface should give you a way to issue the proper query.
>
> The user interface will be via a Web application.  But we need to store
> the data with the European characters, such as ñ, so that we can display
> them appropriately.  So much as I like your suggestion, we need to do
> the opposite of what you're saying -- namely, take a bare letter, and
> then search for letters with accents and such on them.
>
> I am beginning to think that storing two versions of each name, one bare
> and the other not, might be the easiest way to go.   But hey, I'm open
> to more suggestions.
>
> Reuven

That still doesn't stop you from using a matching algorithm. Here is a simple
example (as I understand the problem):
You have texts stored in the db, some containing "n" and some "ñ". Now a client
enters "n" on the website. What you want to do is look for both variations, so
"n" translates into "n" or "ñ".
There you have it. The routine that receives the request has a matching
method that matches on "n" (or any of the few other characters with
equivalents) and issues a query with xx LIKE '%n%' OR xx LIKE '%ñ%'
(personally I would use ILIKE, since that eliminates the case problem).
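
A minimal sketch of such a routine in Python, just to illustrate the idea. The
equivalence table, the function names, and the table/column names ("people",
"name") are all made up for the example, and the %s placeholders assume a
psycopg2-style driver; adjust to whatever you actually use.

```python
from itertools import product

# Hypothetical equivalence table: bare letter -> the letter plus its variants.
# Extend it to cover whatever characters the stored data actually contains.
EQUIVALENTS = {
    "n": "nñ",
    "a": "aáàâä",
    "e": "eéèêë",
    "o": "oóòôö",
    "u": "uúùûü",
}


def expand_variants(term):
    """Return every spelling of term, with each bare letter optionally accented."""
    # Characters without an entry in the table stand only for themselves.
    choices = [EQUIVALENTS.get(ch, ch) for ch in term.lower()]
    return ["".join(combo) for combo in product(*choices)]


def build_query(term, table="people", column="name"):
    """Build a parameterized ILIKE query; table and column are placeholder names."""
    variants = expand_variants(term)
    # One "column ILIKE %s" clause per variant, joined with OR.
    where = " OR ".join(f"{column} ILIKE %s" for _ in variants)
    sql = f"SELECT * FROM {table} WHERE {where}"
    # The search text goes in as bound parameters, never into the SQL string.
    params = [f"%{v}%" for v in variants]
    return sql, params
```

Calling build_query("n") gives a WHERE clause with two ILIKE conditions and the
parameters '%n%' and '%ñ%'. Note that the number of variants multiplies with
every letter that has equivalents, so this only stays practical for short
search terms.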

You're referring to a "name", and I don't know the specifics of the problem or
data layout, but from what I do know, I think you can tackle this with a
rather primitive "match -> translate to" kind of algorithm.

One thing I wouldn't do: store duplicate versions. There's always a way to deal
with data the way it is. In my opinion, storing different versions of the same
data just bloats the database when a smarter way of dealing with the original
data would do.

Uwe
