Thread: COLLATE
Hi all, let's assume that we keep Unicode text data in the column. sometimes we want to sort it according to specific collation order. how can we force collation when running a query? ideal solution would be having SQL92 standard COLLATE clauses in SELECT statements. I heard it's work in progress, what's the status? are there any "partial" solutions to the problem? regards, Filip Rembiałkowski
On Mon, Jun 05, 2006 at 05:04:25PM +0200, Filip Rembia??kowski wrote: > Hi all, > > let's assume that we keep Unicode text data in the column. > sometimes we want to sort it according to specific collation order. > how can we force collation when running a query? > ideal solution would be having SQL92 standard COLLATE clauses in > SELECT statements. I heard it's work in progress, what's the status? Yeah, I was working on it but got stuck on the planner/optimiser changes. In the mean time the tree drifted and lack of interest, which gets us where we are now... > are there any "partial" solutions to the problem? Not that I know of. Have a ncie day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
"Filip Rembiaâkowski" <plk.zuber@gmail.com> writes: > let's assume that we keep Unicode text data in the column. > sometimes we want to sort it according to specific collation order. > how can we force collation when running a query? > ideal solution would be having SQL92 standard COLLATE clauses in > SELECT statements. I heard it's work in progress, what's the status? > > are there any "partial" solutions to the problem? I don't know the status of the full COLLATE support. But the closest partial solution suggested so far is the pg_xfrm function that has been implemented and posted at least three times by three different posters to the postgres mailing lists. In the interest of avoiding a fourth independent implementation I'll attach the one I use below, it's not big. -- greg
Attachment
On 6/5/06, Martijn van Oosterhout <kleptog@svana.org> wrote: > > Yeah, I was working on it but got stuck on the planner/optimiser > changes. In the mean time the tree drifted and lack of interest, which > gets us where we are now... Very bad news :-( I were looking forward to this feature... So many troubles in my projects would be vanished... What is 'lack of interest'? Interest from community, or major developers, or your personal one?
(group dupe) 05 Jun 2006 12:53:57 -0400, Greg Stark <gsstark@mit.edu>: > > But the closest partial solution suggested so far is the pg_xfrm function that > has been implemented and posted at least three times by three different > posters to the postgres mailing lists. In the interest of avoiding a fourth > independent implementation I'll attach the one I use below, it's not big. > Thanks! It may be the only solution available at the moment. But I have no idea how to compile/install it. I imagine there should be separate /contrib/pg_strxfrm directory, with the makefile etc. is it correct? if so, how should the makefile look like? I tried compiling but it failed: pgdba@sulaco:~/src/postgresql-8.1.3/src$ gcc -I ./include -fPIC -c pg_strxfrm.c pg_strxfrm.c: In function 'pg_strxfrm': pg_strxfrm.c:98: error: 'Warn_restart' undeclared (first use in this function) pg_strxfrm.c:98: error: (Each undeclared identifier is reported only once pg_strxfrm.c:98: error: for each function it appears in.)
On Wed, Jun 07, 2006 at 12:40:49AM +0400, Nikolay Samokhvalov wrote: > I were looking forward to this feature... So many troubles in my > projects would be vanished... > > What is 'lack of interest'? Interest from community, or major > developers, or your personal one? Kind of all three, feeding off eachother. There's just not enough interest from any front to really get it moving. It's a fairly invasive change and without significant support and interest from somewhere, chances of completion let alone acceptance are pretty slim... Have a nice day -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
On 6/7/06, Martijn van Oosterhout <kleptog@svana.org> wrote: > Kind of all three, feeding off eachother. There's just not enough > interest from any front to really get it moving. It's a fairly invasive > change and without significant support and interest from somewhere, > chances of completion let alone acceptance are pretty slim... I don't get it. Maybe it's me or my environment. About half of the work we do in my company is building webs on the (excellent) eZ publish CMS. We routinely use MySQL even though the CMS (at least declaratively) wolks on top of pgsql. We use MySQL on inertia: it's the default db under eZ publish and it's easy to get hosting. On the other hand, I'd rather bite my hand off than use MySQL in some other Internet apps (e-shop with a card payment interface and others). Where am I going with all this? Well, if you don't live in an english-speaking country, there's no such thing as a single language web. Because of collation limitations, however, postgresql would be the first to be crossed out on my list no matter how good it is in all other respects. I understand that the needed change is uncomfortably invasive, but not beeing able to collate correctly is a show-stopping problem on a professional site. The user opens a drop-down with several dozens of cities, scrolls down a bit to where his city should be, dooesn't see it because the collator places the weird letter all the way down at the end of the list - and the user walks away. The other user can't get info about the bus lines to the city or he doesn't buy a product he want's because he doesn't see it where it's supposed to be. Another user notices the error and dissregards the site as amateurish. I understand I'm talking about a specific area of use, but isn't that where the biggest growth in both the number of new applications as well as the number of users is? As it is, I'm happy to have a great RDBMS to build intranet, rich client apps on - but that's about it. I've used MySQL and MSSQL (unfortunately) and they both support collation much better than pgsql. Not perfect, but much much better. Maybe postgresql should try to set less ambitious goals and instead of going for the holy grail of collation management (which is usualy praiseworthy) try to provide at least db-level collation definitions if table/row/cell level collation settings should proove too challenging at the moment. t.n.a.
Martijn van Oosterhout <kleptog@svana.org> writes: > On Wed, Jun 07, 2006 at 12:40:49AM +0400, Nikolay Samokhvalov wrote: >> What is 'lack of interest'? Interest from community, or major >> developers, or your personal one? > Kind of all three, feeding off eachother. There's just not enough > interest from any front to really get it moving. It's a fairly invasive > change and without significant support and interest from somewhere, > chances of completion let alone acceptance are pretty slim... I think there was also considerable concern about introducing a dependency on a very large chunk of outside software (viz, ICU). Loss of control, licensing questions, etc. Of course, doing it *without* ICU is even more daunting :-( but I'd like to think we'll get there someday. regards, tom lane
Filip Rembiałkowski wrote: > let's assume that we keep Unicode text data in the column. > sometimes we want to sort it according to specific collation order. > how can we force collation when running a query? Hi Filip, I had the same problem you have. As a solution I implemented a wrapper function named collkey(), which transforms a Unicode string into a sortable collation key using IBM's ICU library. Of course it's not as nice as having a standard SQL command for it, but i think it's a nice solution, until a good collation support is provided by PostgreSQL itself. You can find the source here: http://www.flexiguided.de/publications.pgcollkey.en.html I can understand that people don't want PostgreSQL being dependent of other libraries. The best solution for future would be implementing independent collation functions inside of PostgreSQL, which also support standard SQL syntax, but that's a lot of work, i think. Jan Behrens
Filip Rembiałkowski wrote: > Greg Stark wrote: > > But the closest partial solution suggested so far is the pg_xfrm (sic) function that > has been implemented and posted at least three times by three different > posters to the postgres mailing lists. In the interest of avoiding a fourth > independent implementation I'll attach the one I use below, it's not big. > > > But I have no idea how to compile/install it. > > I tried compiling but it failed: > > pgdba ( at ) sulaco:~/src/postgresql-8.1.3/src$ gcc -I ./include -fPIC -c pg_strxfrm.c > pg_strxfrm.c: In function 'pg_strxfrm': > pg_strxfrm.c:98: error: 'Warn_restart' undeclared (first use in this function) > pg_strxfrm.c:98: error: (Each undeclared identifier is reported only once > pg_strxfrm.c:98: error: for each function it appears in.) What is the answer to Filip's question? I didn't see an answer in the list archives. I've seen several copies of Joe Conway'spg_strxfrm.c code on the web, and it always refers to the Warn_restart variable, which doesn't seem to exist in the8.1.4 code that I'm using. I am working with a database in UTF-8 encoding using "C" collation - but I'd occasionally like to ORDER BY columns containingreal UTF-8 data. Would the pg_strxfrm() function get used in a new operator class function? I'll read up on operator classes in chapter32.14 of the docs, but if someone has a simple example, it might help other searchers of the archives. Thanks, Kevin Murphy
Kevin Murphy <murphy@genome.chop.edu> writes: > What is the answer to Filip's question? I didn't see an answer in the list archives. I've seen several copies of JoeConway's pg_strxfrm.c code on the web, and it always refers to the Warn_restart variable, which doesn't seem to existin the 8.1.4 code that I'm using. Warn_restart hasn't existed since PG 7.4. I would imagine that the code needs to be tweaked to use a PG_TRY construct instead of direct setjmp hacking. regards, tom lane
Tom Lane wrote: > Kevin Murphy <murphy@genome.chop.edu> writes: >> What is the answer to Filip's question? I didn't see an answer in the list archives. I've seen several copies of JoeConway's pg_strxfrm.c code on the web, and it always refers to the Warn_restart variable, which doesn't seem to existin the 8.1.4 code that I'm using. > > Warn_restart hasn't existed since PG 7.4. I would imagine that the code > needs to be tweaked to use a PG_TRY construct instead of direct setjmp > hacking. Yes, I'm a user, not a hacker. I was hoping that someone had done this already. Anyway, I gave PG_TRY a try, and the code superficially works. I have no idea what I'm doing; you can see what I did below. Confirm that instead of: memcpy(&save_restart, &Warn_restart, sizeof(save_restart)); if (sigsetjmp(Warn_restart, 1) != 0) { memcpy(&Warn_restart, &save_restart, sizeof(Warn_restart)); newlocale = setlocale(LC_COLLATE, oldlocale); if (!newlocale) elog(PANIC, "setlocale failed to reset locale: %s", localestr); siglongjmp(Warn_restart, 1); } ... code here ... memcpy(&Warn_restart, &save_restart, sizeof(Warn_restart)); it should be: PG_TRY(); { ... code here ... } PG_CATCH(); { newlocale = setlocale(LC_COLLATE, oldlocale); if (!newlocale) elog(PANIC, "setlocale failed to reset locale: %s", localestr); } PG_END_TRY(); Thanks, Kevin Murphy