Re: [SQL] Internationalisation: SELECT str (ignoring Umlauts/Accents) - Mailing list pgsql-sql

From Benedikt Eric Heinen
Subject Re: [SQL] Internationalisation: SELECT str (ignoring Umlauts/Accents)
Msg-id Pine.LNX.3.96.980617210549.30824C-100000@fenun.icemark.ch
In response to Re: [SQL] Internationalisation: SELECT str (ignoring Umlauts/Accents)  (Patrice Hédé <patrice@idf.net>)
Responses Re: [SQL] Internationalisation: SELECT str (ignoring Umlauts/Accents)  (Patrice Hédé <patrice@idf.net>)
List pgsql-sql
> Do you mean you have a field with German *and* French *and* Italian *and*
> English words in it, and you want people, be they German-, French-,
> Italian- or English-speaking, to be able to access this field without
> typing accents and all?

Right. Basically, I am building a web database with the addresses of a
group of people all over Switzerland who are members of the same club. The
problem is just that, for a Mr. "á Porta", I (since I can't speak French or
Italian) don't know what the correct spelling with accents is. In much the
same way, a native French speaker from the western part of Switzerland
possibly doesn't know whether (and which) umlaut has to be used in a German
name...



> As I said earlier, you may have problems, since `ae' doesn't mean `ä' for
> most of these people (except the German-speaking ones), and they may put
> `a' instead. As the rules differ among the languages, it's
> difficult to have a single solution. However, you *need* a solution.
> Maybe I, or others ;), can help, though. Some questions: what is your
> interface language (if it's Perl, it can be much easier :) )? Can it be a
> client-side solution, or do you absolutely need a server-side one (which
> would then have to be a C function, I think)?

The program is a server-side C++ CGI (I can't program Perl).

I just thought that I am certainly not the first to have had this kind of
problem...



> And then, what kind of conversions do you need? For example, for French,
> I decided that all variants of a, e, i, o, u, y should be equal, which meant:
>
> any of a, A, à, À, æ, Æ, å, Å, â, Â, á, Á, ä, Ä => a, A, à, À, æ, Æ, å, Å, â, Â, á, Á, ä, Ä
> etc.
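For comparison, the symmetric rule in the quote above (every variant of a vowel counts as the same letter, on both the search string and the stored text) could be sketched in C++ as a folding function applied to both sides before comparing. This is a minimal sketch under stated assumptions, not code from the thread: `FoldVowels` is a hypothetical name, the text is assumed to be valid UTF-8, and since the quote spells out only the `a` group and then says "etc.", the members of the e, i, o, u and y groups are assumptions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the thread): folds a UTF-8 string so that every
// accented variant of a, e, i, o, u, y collapses to its lowercase base letter.
// Only U+00C0..U+00FF (encoded as 0xC3 followed by 0x80..0xBF) is folded; all
// other non-ASCII bytes are copied through. Assumes valid UTF-8 input.
std::string FoldVowels(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        if (b < 0x80) {  // ASCII: lowercase only
            out += static_cast<char>(std::tolower(b));
            continue;
        }
        if (b != 0xC3 || i + 1 >= in.size()) {  // not a Latin-1 letter: copy byte
            out += in[i];
            continue;
        }
        // Decode the two-byte sequence C3 xx into a code point U+00C0..U+00FF.
        unsigned cp = 0xC0u + (static_cast<unsigned char>(in[i + 1]) & 0x3Fu);
        ++i;  // the continuation byte has been consumed
        if (cp <= 0xDE && cp != 0xD7) cp += 0x20;  // Latin-1 uppercase -> lowercase
        char base = 0;
        switch (cp) {
            case 0xE0: case 0xE1: case 0xE2:
            case 0xE4: case 0xE5: case 0xE6: base = 'a'; break;  // à á â ä å æ
            case 0xE8: case 0xE9: case 0xEA: case 0xEB: base = 'e'; break;  // è é ê ë
            case 0xEC: case 0xED: case 0xEE: case 0xEF: base = 'i'; break;  // ì í î ï
            case 0xF2: case 0xF3: case 0xF4: case 0xF6: base = 'o'; break;  // ò ó ô ö
            case 0xF9: case 0xFA: case 0xFB: case 0xFC: base = 'u'; break;  // ù ú û ü
            case 0xFD: case 0xFF: base = 'y'; break;                        // ý ÿ
            default: break;
        }
        if (base != 0) {
            out += base;
        } else {  // any other Latin-1 letter (ç, ñ, ...): re-encode its lowercase form
            out += static_cast<char>(0xC3);
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Two strings are then treated as equal when `FoldVowels(x) == FoldVowels(y)`. Note that this maps `ä` to `a`, never to `ae`, which is why the German `ae`/`ä` convention needs separate handling.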

Let's say only the search string should ever be modified, so an "ä" in the
search string should never match "ae" in a database string.

The modifications should be:

    part of search string    can match in database-side string

    a                        a, a umlaut, a with acute/grave/circumflex accent
    ae                       ae, a umlaut
    c                        c, c cedilla
    e                        e, e with acute/grave/circumflex accent
    i                        i, i with acute/grave/circumflex accent
    o                        o, o umlaut, o with acute/grave/circumflex accent
    oe                       oe, o umlaut
    u                        u, u umlaut, u with acute/grave/circumflex accent
    ue                       ue, u umlaut


    [all searches will be case-insensitive]
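A minimal C++ sketch of how the CGI could turn a search term into a POSIX regular expression that follows the table above, for use with PostgreSQL's case-insensitive regex operator `~*`. The function name `BuildAccentPattern` is hypothetical, and it assumes both the CGI input and the database use UTF-8 (with a Latin-1 database the accented literals would have to be emitted in that encoding instead). Digraphs are tried before single letters, and anything typed with an accent is passed through literally, so an `ä` in the search string never expands to `ae`.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: turns a search term into a POSIX regular expression
// following the matching table. Assumes UTF-8 on both sides and that the
// result is used with PostgreSQL's case-insensitive operator ~*.
std::string BuildAccentPattern(const std::string& search) {
    // Digraphs are checked first, so "ae" becomes (ae|ä|Ä), not a-class + e-class.
    static const std::vector<std::pair<std::string, std::string>> kDigraphs = {
        {"ae", "(ae|ä|Ä)"}, {"oe", "(oe|ö|Ö)"}, {"ue", "(ue|ü|Ü)"}};
    // Uppercase accented forms are listed explicitly so the result does not
    // depend on how the server's locale case-folds non-ASCII letters.
    static const std::vector<std::pair<char, std::string>> kSingles = {
        {'a', "(a|ä|á|à|â|Ä|Á|À|Â)"}, {'c', "(c|ç|Ç)"},
        {'e', "(e|é|è|ê|É|È|Ê)"},     {'i', "(i|í|ì|î|Í|Ì|Î)"},
        {'o', "(o|ö|ó|ò|ô|Ö|Ó|Ò|Ô)"}, {'u', "(u|ü|ú|ù|û|Ü|Ú|Ù|Û)"}};
    static const std::string kSpecial = "\\^$.|?*+()[]{}";

    // Lowercase ASCII bytes only; bytes of UTF-8 multibyte sequences are left
    // alone, so they never match a table entry and are copied literally.
    auto fold = [](char ch) {
        unsigned char u = static_cast<unsigned char>(ch);
        return u < 0x80 ? static_cast<char>(std::tolower(u)) : ch;
    };

    std::string out;
    std::size_t i = 0;
    while (i < search.size()) {
        const char c0 = fold(search[i]);
        std::string piece;
        std::size_t used = 1;
        if (i + 1 < search.size()) {
            const char c1 = fold(search[i + 1]);
            for (const auto& d : kDigraphs) {
                if (d.first[0] == c0 && d.first[1] == c1) {
                    piece = d.second;
                    used = 2;
                    break;
                }
            }
        }
        if (piece.empty()) {
            for (const auto& s : kSingles) {
                if (s.first == c0) {
                    piece = s.second;
                    break;
                }
            }
        }
        if (piece.empty()) {  // any other byte: copy it, escaping regex metacharacters
            if (kSpecial.find(search[i]) != std::string::npos) piece += '\\';
            piece += search[i];
        }
        out += piece;
        i += used;
    }
    return out;
}
```

For example, `BuildAccentPattern("a Porta")` yields `(a|ä|á|à|â|Ä|Á|À|Â) P(o|ö|ó|ò|ô|Ö|Ó|Ò|Ô)rt(a|ä|á|à|â|Ä|Á|À|Â)`, which matches a stored "á Porta". The pattern is safest sent as a bound parameter (for example with libpq's `PQexecParams`) in a query such as `SELECT * FROM members WHERE name ~* $1`, so the backslashes need no SQL quoting; the table and column names here are made up for illustration. Since `~*` is unanchored, the term is found anywhere in the field.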



> Obviously, in your case, it will be more complex, since `ae' *may* have a
> special meaning... (that's where it's getting difficult :( )...

I hope the above description is somewhat useful to you (unfortunately my US
keyboard lacks the matching characters, so I described which ones should be
matched).

I guess the ideal way would be to build a general, pluggable module for
PostgreSQL so that it can handle this more or less transparently.


   Benedikt

Windows 95: n.
    32-bit extensions and a graphical shell for a 16-bit patch to an 8-bit
    operating system originally coded for a 4-bit microprocessor,  written
         by a 2-bit company that can't stand for 1 bit of competition.

