Re: Searching for "bare" letters - Mailing list pgsql-general

From Oleg Bartunov
Subject Re: Searching for "bare" letters
Date
Msg-id Pine.LNX.4.64.1110021333280.26195@sn.sai.msu.ru
In response to Re: Searching for "bare" letters  (Uwe Schroeder <uwe@oss4u.com>)
Responses Re: Searching for "bare" letters  ("Reuven M. Lerner" <reuven@lerner.co.il>)
List pgsql-general
I don't see the problem: you can have a dictionary that does all the work of
recognizing bare letters and outputs several versions. Have you seen the
unaccent dictionary?
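
To illustrate the idea of folding accented letters to bare ones, here is a minimal Python sketch. It is not the unaccent dictionary itself, only an approximation based on Unicode decomposition; the names `strip_accents` and `bare_match` are hypothetical. In PostgreSQL, the unaccent contrib module provides an `unaccent()` function that plays a similar role and is driven by a rules file rather than by decomposition.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Return text with combining marks removed, e.g. 'ñ' becomes 'n'."""
    # NFD splits a precomposed letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks and keep everything else.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def bare_match(needle: str, haystack: str) -> bool:
    """Case-insensitive substring match that ignores accents on both sides."""
    return strip_accents(needle).casefold() in strip_accents(haystack).casefold()
```

With this, a search for "munoz" would match a stored "Muñoz" while the stored value keeps its accents for display. Letters that have no Unicode decomposition (for example "ø") are left unchanged by this sketch, which is one reason a rules-based dictionary is the more complete tool.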

Oleg
On Sun, 2 Oct 2011, Uwe Schroeder wrote:

>> Hi, everyone.  Uwe wrote:
>>> What kind of "client" are the users using?  I assume you will have some
>>> kind of user interface. For me this is a typical job for a user
>>> interface. The number of letters with "equivalents" in different
>>> languages is extremely limited, so a simple matching routine in the
>>> user interface should give you a way to issue the proper query.
>>
>> The user interface will be via a Web application.  But we need to store
>> the data with the European characters, such as ñ, so that we can display
>> them appropriately.  So much as I like your suggestion, we need to do
>> the opposite of what you're saying -- namely, take a bare letter, and
>> then search for letters with accents and such on them.
>>
>> I am beginning to think that storing two versions of each name, one bare
>> and the other not, might be the easiest way to go.   But hey, I'm open
>> to more suggestions.
>>
>> Reuven
>
>
> That still doesn't prevent you from using a matching algorithm. Here is a
> simple example (as I understand the problem):
> You have texts stored in the db containing both an "n" and an "ñ". Now a
> client enters "n" on the website. What you want to do is look for both
> variations, so "n" translates into "n" or "ñ".
> There you have it. The routine that receives the request has a matching
> method that matches on "n" (or any of the few other characters with
> equivalents) and issues a query with xx LIKE '%n%' OR xx LIKE '%ñ%'
> (personally I would use ILIKE, since that eliminates the case problem).
>
> Since you're referring to a "name", I don't know the specifics of the
> problem or data layout, but from what I do know I think you can tackle this
> with a rather primitive "match -> translate to" kind of algorithm.
>
> One thing I wouldn't do: store duplicate versions. There's always a way to
> deal with data the way it is. In my opinion, storing different versions of
> the same data just bloats the database when a smarter way of handling the
> original data would do.
>
> Uwe
>
>
>
>
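
For comparison, the "match -> translate to" routine Uwe describes could be sketched roughly as below. This is only an illustration under assumptions: the `EQUIVALENTS` table, the `people` table name, and the function names are hypothetical, and the `%s` placeholders assume a psycopg2-style driver.

```python
from itertools import product

# Hypothetical equivalence table: bare letter -> variants to search for.
EQUIVALENTS = {
    "n": ["n", "ñ"],
    "a": ["a", "á"],
}


def expand_variants(term, limit=64):
    """Expand a search term into every accent combination, capped at limit."""
    # Letters without an entry in the table are kept as typed.
    choices = [EQUIVALENTS.get(ch.lower(), [ch]) for ch in term]
    variants = []
    for combo in product(*choices):
        variants.append("".join(combo))
        if len(variants) >= limit:
            break
    return variants


def build_query(column, term):
    """Return (sql, params) with one ILIKE clause per variant.

    The column name is interpolated directly, so it must come from trusted
    code, never from user input. LIKE wildcards in term are not escaped here.
    """
    variants = expand_variants(term)
    where = " OR ".join(f"{column} ILIKE %s" for _ in variants)
    params = [f"%{v}%" for v in variants]
    return f"SELECT * FROM people WHERE {where}", params
```

For the search term "n" this produces two ILIKE clauses, one for "%n%" and one for "%ñ%". The number of variants grows multiplicatively with each letter that has equivalents, hence the cap; that growth is the main practical drawback compared with normalizing both sides of the comparison.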

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
