Thread: search on accents over all possible matches
Hello,

I'm developing a search tool with PHP against a PostgreSQL database.
Since the data is in Catalan and Spanish, it is obvious that a simple
search like:

(SELECT * FROM painters WHERE artist_name ~* 'Dali');

matches over Dd Aa Ll Ii only (and will not find Dalí),
but for an accented language it should also match over ÍíÌìÏï.

The question is:

Is this C function from Patrice Hédé the most appropriate tool for
searching in an accented language?
http://www.postgresql.org/mhonarc/pgsql-sql/1998-06/msg00119.html

Or should I use a function that is already implemented inside Postgres?

Best regards from Barcelona,
jaume teixi.
At 18.24 27/3/01 +0200, you wrote:
>Hello,
>
>I'm developing a search tool with PHP against a PostgreSQL database.
>Since the data is in Catalan and Spanish, it is obvious that a simple
>search like:
>(SELECT * FROM painters WHERE artist_name ~* 'Dali');
>
>matches over Dd Aa Ll Ii only (and will not find Dalí),
>but for an accented language it should also match over ÍíÌìÏï.
>
>The question is:
>
>Is this C function from Patrice Hédé the most appropriate tool for
>searching in an accented language?
>http://www.postgresql.org/mhonarc/pgsql-sql/1998-06/msg00119.html
>
>Or should I use a function that is already implemented inside Postgres?
>
>Best regards from Barcelona,
>jaume teixi.

Using regular expressions in PHP you can convert each "a" into "[Aaáä]", so that from the original SQL query:

(SELECT * FROM painters WHERE artist_name ~* 'Dali');

you obtain

(SELECT * FROM painters WHERE artist_name ~* 'D[Aaáä]l[Iiíï]');

That is, PHP generates a complete new regular expression for the SQL query. It should match Dali, Dáli, Dalí, Dálí, and others.

David Lizano - Technical director
I Z A N E T - http://www.izanet.com/
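The expansion David describes can be sketched as follows. This is a minimal illustration in Python rather than PHP; the `VARIANTS` table and the `accent_pattern` name are assumptions made up for this example, and the table lists only a few Catalan/Spanish vowels.

```python
import re

# Hypothetical variant table: each base letter maps to the forms it should
# match. Only a few Catalan/Spanish vowels are listed; extend as needed.
VARIANTS = {
    "a": "aàá",
    "e": "eèé",
    "i": "iíï",
    "o": "oòó",
    "u": "uúü",
}

def accent_pattern(term: str) -> str:
    """Expand each known letter of `term` into a bracketed character class."""
    parts = []
    for ch in term:
        forms = VARIANTS.get(ch.lower())
        if forms is None:
            # Letters without variants are kept literally; regex
            # metacharacters are escaped (using Python's escaping rules).
            parts.append(re.escape(ch))
        else:
            # Include both lower-case and upper-case forms in the class.
            parts.append("[" + forms + forms.upper() + "]")
    return "".join(parts)

pattern = accent_pattern("Dali")
print(pattern)
```

The resulting string would then be passed to the `~*` operator as a bound query parameter instead of being pasted into the SQL text, so that user input cannot alter the query.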
Jaume Teixi writes:

> Is this C function from Patrice Hédé the most appropriate tool for
> searching in an accented language?
> http://www.postgresql.org/mhonarc/pgsql-sql/1998-06/msg00119.html

Looks good to me.

> Or should I use a function that is already implemented inside Postgres?

The reason there is no such implementation, and probably won't be any time
soon, is that this tool would either have to hard-code or ignore natural
language semantics, neither of which would make it practical. Not all
languages have the same accent ignoring or accent folding rules or
conventions.

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
On Tue, 27 Mar 2001 19:24:34 +0200 (CEST), Peter Eisentraut <peter_e@gmx.net> wrote:

> Jaume Teixi writes:
>
> > Is this C function from Patrice Hédé the most appropriate tool for
> > searching in an accented language?
> > http://www.postgresql.org/mhonarc/pgsql-sql/1998-06/msg00119.html
>
> Looks good to me.
>
> > Or should I use a function that is already implemented inside Postgres?
>
> The reason there is no such implementation, and probably won't be any time
> soon, is that this tool would either have to hard-code or ignore natural
> language semantics, neither of which would make it practical. Not all
> languages have the same accent ignoring or accent folding rules or
> conventions.

This function is really fast.
Accent handling is a REAL need for almost all non-English languages.
You have to call this function explicitly, like:

select accents ('dali');
             accents
----------------------------------
 [dðÐ][aáÁàÀâÂäÄåÅãÃ]l[iíÍìÌîÎïÏ]

So why not include it in the next release?

Best regards from Barcelona,
jaume teixi.
Jaume Teixi writes:

> > The reason there is no such implementation, and probably won't be any time
> > soon, is that this tool would either have to hard-code or ignore natural
> > language semantics, neither of which would make it practical. Not all
> > languages have the same accent ignoring or accent folding rules or
> > conventions.
>
> This function is really fast.
> Accent handling is a REAL need for almost all non-English languages.
> You have to call this function explicitly, like:
> select accents ('dali');
>              accents
> ----------------------------------
>  [dðÐ][aáÁàÀâÂäÄåÅãÃ]l[iíÍìÌîÎïÏ]
>
> So why not include it in the next release?

For the reason I cited above: it is too abstract an approach for many
languages and/or applications. For example in Swedish, a search for 'e'
should probably include 'é', since most users will not type that in
explicitly (it's not on the keyboard), but a search for 'a' should
normally not include 'å', since that is a completely separate letter (and
it is on the keyboard). Additionally, this particular implementation
seems to be ISO-8859-1 charset specific. I know a number of accented
letters that are a lot closer "siblings" to 'd' than 'ð' is.

--
Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
But the point is that you must call this function explicitly in order
to use it.
Also, for the sake of aesthetics, maybe it should be called
accents_iso-8859-1. This should be considered a major need for
non-English languages.

As a more general approach, it could also be modified to accept
parameters such as ('å','à') or ('ca_ES','fr_FR')....

Best regards,
jaume.

> For the reason I cited above: it is too abstract an approach for many
> languages and/or applications. For example in Swedish, a search for 'e'
> should probably include 'é', since most users will not type that in
> explicitly (it's not on the keyboard), but a search for 'a' should
> normally not include 'å', since that is a completely separate letter (and
> it is on the keyboard). Additionally, this particular implementation
> seems to be ISO-8859-1 charset specific. I know a number of accented
> letters that are a lot closer "siblings" to 'd' than 'ð' is.
>
> --
> Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
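Peter's Swedish example and Jaume's idea of a locale parameter can be combined in a small sketch. Everything here is an assumption for illustration: the `FOLDING` tables, the `locale_pattern` name, and the choice of letters, which is not linguistically complete. The output is lower-case and meant for a case-insensitive match such as `~*`.

```python
import re

# Hypothetical per-locale folding tables (illustrative only).
FOLDING = {
    # Swedish: 'e' also finds 'é', but 'å', 'ä' and 'ö' are separate
    # letters, so 'a' and 'o' get no variants.
    "sv_SE": {"e": "eé"},
    # Catalan: accents and diaeresis are folded onto the base vowel.
    "ca_ES": {"a": "aà", "e": "eèé", "i": "iíï", "o": "oòó", "u": "uúü"},
}

def locale_pattern(term: str, locale: str) -> str:
    """Build a lower-case regex for `term` using the rules of `locale`."""
    table = FOLDING[locale]
    parts = []
    for ch in term.lower():
        forms = table.get(ch)
        if forms is None:
            # No folding rule in this locale: match the character literally.
            parts.append(re.escape(ch))
        else:
            parts.append("[" + forms + "]")
    return "".join(parts)

print(locale_pattern("Dali", "ca_ES"))
print(locale_pattern("gard", "sv_SE"))
```

With these tables the same search term produces different patterns per locale, which is the behaviour a single hard-coded function cannot offer.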
Hi,

I am putting together a large database and want the table to reside on a partition separate from the default under 'base'. How can I do this?

I tried the following. First I created the place I wanted to put the table:

mkdir /taos/01/postgres
chown taos /taos/01/postgres

Then I modified the .profile of postgres so that:

PGDATA2=/taos/01/postgres
export PGDATA2

Then on the command line of the postgres account I typed

initlocation $PGDATA2

which was successful. Then I typed

initdb -D $PGDATA2 -i 1095

Now I had to kill the old postmaster and restart it. However, I could only give it one location to use for the database, i.e.,

postmaster -D /taos/01/postgres
createdb -U taos -D $PGDATA2 large_table

So does this mean I have to start a separate postmaster for the new location? How would I do that?

What we would like is a single postmaster to handle all database queries, etc. However, for the especially large table we would have a special user and that table only.

Rodin
>I am putting together a large database and want the
>table to reside on a partition separate from the
>default under 'base'. How can I do this?

You could create symlinks for the larger tables pointing to another location:

ln -s /path/to/table/bigtable /usr/local/pgsql/data/base/whatever/bigtable

(assuming /usr/local/pgsql is your Postgres directory)

If serious trouble is to be expected, I'd like to know, because we have used this method once (on a not-so-important DB, without any problems so far).

Stefan
Hi,

First, thank you for including me in this thread: I haven't been involved with PostgreSQL for 3 years now, and it's nice to see that this hack is still useful to some people! (I should however soon get involved again with databases :) ).

About this programme, I agree with Peter that it is too biased to be included as a standard function. It is biased towards ISO-8859-1, and towards some European languages I know ("d" or "dh" => "ð" is for Icelandic, for example)... although "a" => "å" makes sense: not all people involved with Swedish/Norwegian/Danish have a Scandinavian keyboard, and they may not be sure whether the programme will do the "aa" => "å" translation correctly (which this function does ;) ).

Back to the subject, though. This function also has another limitation, namely, it has a fixed-length buffer of 4096 bytes, and that's not so nice (but it takes care of buffer overflows...).

Maybe, if it's not already the case, the source code could be put in a contribution directory, available for anyone to adapt to his/her needs without having to go through 3 years of archives, since it seems to be a fairly common problem. The code should be simple enough for anyone with a basic knowledge of C to customise :)

I know that localisation, collation, and "acceptable alternatives" follow quite different rules from country to country, making it difficult to come up with a general solution. This is why I didn't even try to make one ;)

Patrice

* Jaume Teixi <teixi@6tems.com> [010329 22:04]:
> But the point is that you must call this function explicitly in order
> to use it.
> Also, for the sake of aesthetics, maybe it should be called
> accents_iso-8859-1. This should be considered a major need for
> non-English languages.
>
> As a more general approach, it could also be modified to accept
> parameters such as ('å','à') or ('ca_ES','fr_FR')....
>
> Best regards,
> jaume.
>
> > For the reason I cited above: it is too abstract an approach for
> > many languages and/or applications. For example in Swedish, a
> > search for 'e' should probably include 'é', since most users will
> > not type that in explicitly (it's not on the keyboard), but a
> > search for 'a' should normally not include 'å', since that is a
> > completely separate letter (and it is on the keyboard).
> > Additionally, this particular implementation seems to be
> > ISO-8859-1 charset specific. I know a number of accented
> > letters that are a lot closer "siblings" to 'd' than 'ð' is.
> >
> > --
> > Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/

--
Patrice HÉDÉ - patrice@islande.org - http://www.islande.org/
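The multi-letter rules Patrice mentions ("aa" => "å", "dh" => "ð") need longest-match-first handling, which a per-character table cannot express. The sketch below is not the original C code; the `RULES` table and the `expand` name are assumptions for illustration. Because the result is built from a list of strings, there is no fixed-size output buffer to overflow.

```python
import re

# Hypothetical rules: keys may be longer than one letter.
RULES = {
    "aa": "(aa|å)",   # Scandinavian digraph
    "dh": "(dh|ð)",   # Icelandic transliteration
    "d": "[dð]",
}

def expand(term: str) -> str:
    """Expand `term` into a regex, trying the longest rule keys first."""
    keys = sorted(RULES, key=len, reverse=True)
    text = term.lower()
    parts = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                parts.append(RULES[key])
                i += len(key)
                break
        else:
            # No rule applies at this position: keep the character literally.
            parts.append(re.escape(text[i]))
            i += 1
    return "".join(parts)

print(expand("aalborg"))
```

Sorting the keys by length ensures that "dh" is consumed as one unit before the single-letter rule for "d" gets a chance to match.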