Re: search on accents -> Why not include this function - Mailing list pgsql-admin

From Patrice Hédé
Subject Re: search on accents -> Why not include this function
Msg-id 20010329230047.H6360@idf.net
In response to Re: search on accents -> Why not include this function  (Jaume Teixi <teixi@6tems.com>)
List pgsql-admin
Hi,

First, thank you for including me in this thread: I haven't
been involved with PostgreSQL for 3 years now, and it's nice to see
that this hack is still useful to some people! (I should, however,
soon get involved with databases again :) ).

About this programme, I agree with Peter that it is too biased to be
included as a standard function. It is biased towards ISO-8859-1, and
towards some European languages I know ("d" or "dh" => "ð" is for
Icelandic, for example)... although "a" => "å" makes sense: not all
people dealing with Swedish/Norwegian/Danish have a Scandinavian
keyboard, and they may not be sure whether the programme will do the
"aa" => "å" translation correctly (which this function does ;) ).

Back to the subject, though. This function has another
limitation: it uses a fixed-length buffer of 4096 bytes, which is
not so nice (although it does guard against buffer overflows...).

Maybe, if that's not already the case, the source code could be put in a
contrib directory, available for anyone to adapt to his/her needs
without having to dig through 3 years of archives, since it seems
to be a fairly common problem. The code should be simple enough for
anyone with a basic knowledge of C to customise :)

I know that localisation, collation, and "acceptable alternatives"
follow quite different rules from country to country, making it
difficult to come up with a general solution. This is why I didn't even
try to make one ;)

Patrice

* Jaume Teixi <teixi@6tems.com> [010329 22:04]:
> But the thing is that you must explicitly call this function in order
> to use it.
> Also, for the sake of aesthetics, maybe you should call it
> accents_iso-8859-1. The thing is that this should be considered a big
> need for non-English languages.
>
> As a more ambitious approach, it could also be modified to
> accept parameters such as ('å','à') or ('ca_ES','fr_FR')....
>
> bests,
> jaume.
>
>
> > For the reason I cited above: it is too abstract an approach for
> > many languages and/or applications. For example, in Swedish a
> > search for 'e' should probably include 'é', since most users will
> > not type that in explicitly (it's not on the keyboard), but a
> > search for 'a' should normally not include 'å', since that is a
> > completely separate letter (and it is on the keyboard).
> > Additionally, this particular implementation seems to be
> > ISO-8859-1 charset specific.  I know a number of accented
> > letters that are a lot closer "siblings" to 'd' than 'ð' is.
> >
> > --
> > Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/
>

--
Patrice HÉDÉ --------------------------------- patrice@islande.org -----
  --  Isn't it weird  how scientists  can imagine  all the matter of the
universe exploding out of a dot smaller than the head of a pin, but they
can't come up with a more evocative name for it than "The Big Bang" ?
  -- What would _you_ call the creation of the universe ?
  -- "The HORRENDOUS SPACE KABLOOIE !"               - Calvin and Hobbes
------------------------------------------ http://www.islande.org/ -----
