Re: How to add locale support for each column? - Mailing list pgsql-hackers

From	Greg Stark
Subject	Re: How to add locale support for each column?
Date	September 26, 2004 07:52:10
Msg-id	87wtyh8quu.fsf@stark.xeocode.com
In response to	Re: How to add locale support for each column? (Stephan Szabo <sszabo@megazone.bigpanda.com>)
List	pgsql-hackers

Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

> I'd thought there was still a question of where such a thing would live?
> If it's an external project or a contrib thing, the above might be true,
> but if it's meant to be a truly supported internal builtin then the
> function call cost is part of the implementation and is significant data
> that cannot be thrown out.

Well, the consensus seems to be that it would be good to have complete locale
handling as envisioned by the spec. But I don't see that as relevant to this
discussion. I'm comparing a function handling strxfrm with a function handling
lower() and with sorting on a column directly. The point was to demonstrate
that it is practical (if not ideal) to switch locales repeatedly, especially
when you take into account that *any* function will have some overhead
anyway. If it were built into postgres the overhead might be lower, but I
doubt by much, and in any case it's just not an option for me now.
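
For readers following along, here is a minimal sketch of what "switching
locales repeatedly" looks like with plain libc. It is not the function from
this thread; the name strxfrm_in_locale and its interface are made up for
illustration. It saves the current LC_COLLATE setting, switches to the
requested locale, runs strxfrm, and switches back, so every call pays for two
setlocale() calls on top of the transformation itself.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the thread): return a malloc'd strxfrm()
 * key for s as computed under the named locale, restoring the previous
 * LC_COLLATE setting before returning. Returns NULL if the locale cannot
 * be selected or memory runs out. Not thread-safe: setlocale() changes
 * process-wide state. */
char *strxfrm_in_locale(const char *s, const char *locale)
{
    assert(s != NULL && locale != NULL);

    /* setlocale() may reuse its return buffer on the next call,
     * so copy the current locale name before switching. */
    const char *cur = setlocale(LC_COLLATE, NULL);
    if (cur == NULL)
        return NULL;
    char *saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return NULL;
    strcpy(saved, cur);

    char *key = NULL;
    if (setlocale(LC_COLLATE, locale) != NULL) {
        /* First call with size 0 only reports the needed length. */
        size_t need = strxfrm(NULL, s, 0) + 1;
        key = malloc(need);
        if (key != NULL)
            strxfrm(key, s, need);
    }

    /* Switch back whether or not the transformation succeeded. */
    setlocale(LC_COLLATE, saved);
    free(saved);
    return key;
}
```

A caller would use it as strxfrm_in_locale(value, "en_US") and free() the
result. Which locale names are available depends on the platform.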

> Apparently the message I responded to hung around for a while before
> getting to me because they came out of order.

That seems to be happening a lot lately.

> I agree in general, but if part of this involves forcing "C" locale (see
> my question at the end) and so any locale sorting is forced to do this,
> then if a query in en_US currently takes 7 seconds, but now will take 17,
> I think that's significant.

I compared against sorting in C locale. It would be interesting to know how
much of the penalty came from simply having to do the strxfrm work versus the
overhead of switching locales. The former is inevitable. *Any* implementation
of locale collation orders is going to have to do it.

The latter is maybe something we can work on reducing, though not without
considerable cost in terms of code complexity. It will mean either lobbying
for API changes in libc or growing the codebase of postgres by the size of an
entire i18n package. I strongly suspect maintaining i18n packages turns out to
be a *lot* of work.

> Was your strxfrm comparison against a column comparison in "C" locale then
> rather than one using en_US or some other such locale?

C.

I could compare it against sorting in a database created in a given locale,
but I suspect I'll find gprof output more directly helpful.

> But we don't presumably have to look up the locale each time as you note.

The question is whether looking up the locale is significant compared to
executing strxfrm. I suspect it'll be significant, but not the majority of the
time. 

The real question is whether speeding up sorting by removing that overhead is
worth the complexity of abandoning libc.

I would strongly urge people to consider writing postgres support to assume
standard libc functionality. If we can convince glibc and BSD libc people to
add a more reasonable interface we can optionally use it, just as we do other
more modern interfaces to old features. 

If some platforms are just terminally braindead we should look for ways to
support people installing gnu libintl (or whatever the glibc i18n chunk is
called) separately and using it like we do libreadline, libkrb, or libz.

> More importantly, do we know whether or not this function really works
> properly in non-C locales? Is the strxfrm result guaranteed to sort
> correctly (using strcoll) in others?

Well, you wouldn't want to use strcoll on the strxfrm output at all, just
strcmp. Actually, Conway's reimplementation returns a bytea, which is probably
more correct than my original plan to return text. Though I should check
whether postgres has to do extra work to sort bytea data instead of varchar
data, especially since strxfrm should never return strings containing nuls.
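
To make the strcmp point concrete: the C standard defines strxfrm so that
strcmp on two transformed strings gives the same ordering as strcoll on the
two originals. The sketch below checks that for a pair of strings under
whatever LC_COLLATE is currently set. The helper names xfrm_key and
keys_agree_with_strcoll are invented for this example.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: malloc'd strxfrm() key for s in the current
 * LC_COLLATE locale, or NULL on allocation failure. */
char *xfrm_key(const char *s)
{
    assert(s != NULL);
    /* With size 0 strxfrm only reports the length it needs. */
    size_t need = strxfrm(NULL, s, 0) + 1;
    char *key = malloc(need);
    if (key != NULL)
        strxfrm(key, s, need);
    return key;
}

/* Reduce a comparison result to -1, 0, or 1. */
int sign(int v)
{
    return (v > 0) - (v < 0);
}

/* Returns 1 if strcmp on the transformed keys orders a and b the same
 * way strcoll orders the originals, 0 if not, -1 on allocation failure. */
int keys_agree_with_strcoll(const char *a, const char *b)
{
    char *ka = xfrm_key(a);
    char *kb = xfrm_key(b);
    int result = -1;

    if (ka != NULL && kb != NULL)
        result = (sign(strcmp(ka, kb)) == sign(strcoll(a, b)));

    free(ka);
    free(kb);
    return result;
}
```

Call setlocale(LC_COLLATE, ...) first to pick the locale under test. The keys
are NUL-terminated C strings, which is why plain strcmp is enough to compare
them.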

-- 
greg
