Re: search on accents -> Why not include this function - Mailing list pgsql-admin
From | Patrice Hédé |
---|---|
Subject | Re: search on accents -> Why not include this function |
Date | |
Msg-id | 20010329230047.H6360@idf.net |
In response to | Re: search on accents -> Why not include this function (Jaume Teixi <teixi@6tems.com>) |
List | pgsql-admin |
Hi,

First, thank you for including me in this thread: I haven't been involved with PostgreSQL for 3 years now, and it's nice to see that this hack is still useful to some people! (I should, however, soon get involved with databases again :) ).

About this program, I agree with Peter that it is too biased to be included as a standard function. It is biased towards ISO-8859-1, and towards some European languages I know ("d" or "dh" => "ð" is for Icelandic, for example)... although "a" => "å" makes sense: not everyone working with Swedish/Norwegian/Danish has a Scandinavian keyboard, and they may not be sure whether the program will do the "aa" => "å" translation correctly (which this function does ;) ).

Back to the subject, though. This function also has another limitation: it uses a fixed-length buffer of 4096 bytes, which is not so nice (but it does guard against buffer overflows...).

Maybe, if it's not already the case, the source code could be put in a contribution directory, available for anyone to adapt to his/her needs without having to go through 3 years of archives, since it seems to be a fairly common problem. The code should be simple enough for anyone with a basic knowledge of C to customise :)

I know that localisation, collation, and "acceptable alternatives" follow quite different rules from country to country, making it difficult to come up with a general solution. This is why I didn't even try to make one ;)

Patrice

* Jaume Teixi <teixi@6tems.com> [010329 22:04]:
> But the thing is that you must explicitly call this function in order
> to use it.
> Also, for the sake of aesthetics, maybe you should call it
> accents_iso-8859-1. The thing is that this should be considered a big
> need for non-English languages.
>
> As a broader approach, it could also be possible to modify it to
> accept parameters such as ('å','à') or ('ca_ES','fr_FR')....
>
> bests,
> jaume.
> >
> > For the reason I cited above: it is too abstract an approach for
> > many languages and/or applications. For example in Swedish, a
> > search for 'e' should probably include 'é', since most users will
> > not type that in explicitly (it's not on the keyboard), but a
> > search for 'a' should normally not include 'å', since that is a
> > completely separate letter (and it is on the keyboard).
> > Additionally, this particular implementation seems to be
> > ISO-8859-1 charset specific. I know a number of accented
> > letters that are a lot closer "siblings" to 'd' than 'ð' is.
> >
> > --
> > Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
>

--
Patrice HÉDÉ --------------------------------- patrice@islande.org -----
 -- Isn't it weird how scientists can imagine all the matter of the
universe exploding out of a dot smaller than the head of a pin, but they
can't come up with a more evocative name for it than "The Big Bang" ?
 -- What would _you_ call the creation of the universe ?
 -- "The HORRENDOUS SPACE KABLOOIE !" - Calvin and Hobbes
------------------------------------------ http://www.islande.org/ -----
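
For readers who want to see what the fixed-buffer limitation in the message looks like in practice, here is a minimal sketch in C. It is not the function discussed in the thread (its source is not shown here); it uses a simpler technique, folding ISO-8859-1 text to unaccented ASCII. The names `FOLD_BUFSZ`, `fold_byte`, and `fold_latin1` are hypothetical, and the mapping table is deliberately tiny and carries exactly the kind of language bias the message describes (for instance, treating 'å' as a plain 'a').

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed buffer size, taken from the 4096-byte figure in the message. */
#define FOLD_BUFSZ 4096

/* Hypothetical helper: ASCII replacement for one ISO-8859-1 byte.
   Only a few lowercase letters are covered; NULL means "no mapping". */
static const char *fold_byte(unsigned char c)
{
    switch (c) {
    case 0xE0: case 0xE1: case 0xE2: case 0xE3: case 0xE4: case 0xE5:
        return "a";              /* à á â ã ä å */
    case 0xE6: return "ae";      /* æ */
    case 0xE7: return "c";       /* ç */
    case 0xE8: case 0xE9: case 0xEA: case 0xEB:
        return "e";              /* è é ê ë */
    case 0xF0: return "d";       /* ð */
    case 0xFE: return "th";      /* þ */
    default:   return NULL;      /* copy the byte unchanged */
    }
}

/* Folds src into dst, whose capacity dstsz includes the terminating NUL.
   Returns 0 on success, or -1 if the folded text would not fit; in that
   case dst holds the NUL-terminated part that did fit. */
static int fold_latin1(char *dst, size_t dstsz, const char *src)
{
    size_t used = 0;

    assert(dst != NULL && src != NULL);
    if (dstsz == 0)
        return -1;

    for (; *src != '\0'; src++) {
        char one[2] = { *src, '\0' };
        const char *rep = fold_byte((unsigned char)*src);
        size_t len;

        if (rep == NULL)
            rep = one;
        len = strlen(rep);

        /* Replacements can be longer than the input byte, so check
           that rep plus the final NUL still fit before copying. */
        if (len >= dstsz - used) {
            dst[used] = '\0';
            return -1;
        }
        memcpy(dst + used, rep, len);
        used += len;
    }
    dst[used] = '\0';
    return 0;
}
```

With `char buf[FOLD_BUFSZ];`, calling `fold_latin1(buf, sizeof buf, "caf\xE9")` returns 0 and leaves `"cafe"` in `buf`. Input whose folded form would exceed the buffer is rejected with -1 instead of overflowing, which is the trade-off the message points out: safe, but capped at a fixed length.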