Thread: Searching with and without diacritical marks

Searching with and without diacritical marks

From
"Ullrich Ralf"
Date:
Hello,

I have a multilingual portal running on PostgreSQL 7.4.2.
My clients come mainly from Spain, Portugal, Latin America and Germany.
The main feature of the site is a search engine that retrieves bibliographic data, which is stored in
my database (Unicode!) with diacritical marks (e.g. Panamá, América). When users enter their search terms with
diacritical marks, Postgres finds the requested records, but if a German user enters Panama or America (without
diacritical marks), the search fails.
Is there any extension for Postgres that allows both modes of searching on the same data?

Yours,

Ralf Ullrich

______________________
Ralf Ullrich, M.A.
Virtuelle Fachbibliothek
Ibero-Amerikanisches Institut
Preußischer Kulturbesitz
Potsdamer Str. 37
D-10785 Berlin
Germany
Phone: +49 +30 266-2512
Fax: +49 +30 266-2503
E-Mail: ullrich@iai.spk-berlin.de
http://www.iai.spk-berlin.de
______________________



Re: Searching with and without diacritical marks

From
Mike G
Date:
I don't believe there is anything that does this automatically.

You can modify your queries though.

IF default search returns 0 records THEN
    SET CLIENT_ENCODING TO 'some_other_encoding';
    SELECT ...
END IF;

Or maybe check the search term for those special characters before issuing the SELECT statement.
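
A rough sketch of those two ideas (checking the term for accented characters, and retrying when the first search finds nothing) might look like the following. It uses only Python's standard unicodedata module. The TITLES list and the exact_search/relaxed_search helpers are hypothetical stand-ins for the real SELECT statements, and the relaxed comparison shown is just one possible fallback, not the client-encoding switch itself.

```python
import unicodedata

def has_diacritics(term):
    """Return True if any character in term decomposes to include a combining mark."""
    decomposed = unicodedata.normalize("NFD", term)
    return any(unicodedata.combining(ch) for ch in decomposed)

def strip_marks(text):
    """Drop combining marks after canonical decomposition (á becomes a)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def search_with_fallback(term, exact_search, relaxed_search):
    """Run the default search first; retry with the relaxed one if it finds nothing."""
    rows = exact_search(term)
    if rows:
        return rows
    return relaxed_search(term)

# Hypothetical data standing in for the bibliographic table.
TITLES = ["Panamá", "América", "Berlin"]

def exact_search(term):
    # Stand-in for the default SELECT: exact comparison.
    return [t for t in TITLES if t == term]

def relaxed_search(term):
    # Stand-in for the second SELECT: compare with marks removed on both sides.
    return [t for t in TITLES if strip_marks(t) == strip_marks(term)]

print(has_diacritics("Panamá"), has_diacritics("Panama"))
print(search_with_fallback("Panama", exact_search, relaxed_search))
```

The has_diacritics() check could be used to decide up front which query to send, instead of waiting for the first one to come back empty.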

HTH.

On Tue, Jul 13, 2004 at 02:20:48PM +0200, Ullrich Ralf wrote:
> Hello,
>
> I have a multilingual portal running on PostgreSQL 7.4.2.
> My clients come mainly from Spain, Portugal, Latin America and Germany.
> The main feature of the site is a search engine that retrieves bibliographic data, which is stored in
> my database (Unicode!) with diacritical marks (e.g. Panamá, América). When users enter their search terms with
> diacritical marks, Postgres finds the requested records, but if a German user enters Panama or America (without
> diacritical marks), the search fails.
> Is there any extension for Postgres that allows both modes of searching on the same data?
>
> Yours,
>
> Ralf Ullrich
>


Re: Searching with and without diacritical marks

From
Lynna Landstreet
Date:
on 7/13/04 8:20 AM, Ullrich Ralf at Ullrich@iai.spk-berlin.de wrote:

> I have a multilingual portal running on PostgreSQL 7.4.2.
> My clients come mainly from Spain, Portugal, Latin America and Germany.
> The main feature of the site is a search engine that retrieves bibliographic
> data, which is stored in
> my database (Unicode!) with diacritical marks (e.g. Panamá, América). When
> users enter their search terms with diacritical marks, Postgres finds the
> requested records, but if a German user enters Panama or America (without
> diacritical marks), the search fails.
> Is there any extension for Postgres that allows both modes of searching on
> the same data?

I've been wrestling with this issue too - I'm working on an art gallery
database which includes work from a number of French-Canadian artists, plus
a few from other countries where names and image titles typically involve
accents as well.

What I've been tentatively planning to do is to handle it in the PHP
frontend rather than the database itself, by setting up a function with
strtr() (string translate) that would strip out the accents while searching
so that results would come up regardless of whether users entered the right
accent, the wrong accent or no accent at all. The strtr() function allows
you to specify a number of pairs of strings to translate, so I could make up
a list of all the commonly used accented characters and have it translate
all search text with those. I'd apply it to both the search terms entered
and the text found, so that any "a" would match any other "a", regardless of
whether it was really an à, á, ä, â or just plain a (let's see if those
accents show up in anyone's e-mail...). It's kind of the way I'm handling
case sensitivity now.
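
To make the idea concrete, here is a minimal sketch of that translation-table approach. It is written in Python rather than PHP, with str.translate() playing roughly the role strtr() would play in the frontend. The ACCENT_GROUPS list and the fold()/matches() names are made up for illustration, and the character list is deliberately short, so a real one would need extending for the languages involved.

```python
# Plain letters and the accented variants that should be treated as equal to them.
# Hypothetical, deliberately incomplete list.
ACCENT_GROUPS = {
    "a": "àáäâã",
    "e": "èéëê",
    "i": "ìíïî",
    "o": "òóöôõ",
    "u": "ùúüû",
    "n": "ñ",
    "c": "ç",
}

# Build one character-to-character table covering lower and upper case.
pairs = {}
for plain, accented in ACCENT_GROUPS.items():
    for ch in accented:
        pairs[ch] = plain
        pairs[ch.upper()] = plain.upper()
ACCENT_TABLE = str.maketrans(pairs)

def fold(text):
    """Strip the listed accents, then lower-case, so comparisons ignore both."""
    return text.translate(ACCENT_TABLE).lower()

def matches(search_term, stored_text):
    """Apply the same folding to both sides before a substring comparison."""
    return fold(search_term) in fold(stored_text)

print(fold("Panamá"))
print(matches("america", "América Latina"))
```

Because both the search term and the stored text go through the same fold(), a term with the right accent, the wrong accent or no accent at all ends up compared in the same plain form.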


Lynna

--
Resource Centre Database Coordinator
Gallery 44: www.gallery44.org
Database Project: www.gallery44db.org