Re: How to add locale support for each column? - Mailing list pgsql-hackers

From Stephan Szabo
Subject Re: How to add locale support for each column?
Date
Msg-id 20040925083506.I92255@megazone.bigpanda.com
In response to Re: How to add locale support for each column?  (Greg Stark <gsstark@mit.edu>)
Responses Re: How to add locale support for each column?  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
On Sun, 19 Sep 2004, Greg Stark wrote:

> Tom Lane <tgl@sss.pgh.pa.us> writes:
>
> > Greg Stark <gsstark@mit.edu> writes:
> > > Peter Eisentraut <peter_e@gmx.net> writes:
> > >> 2) switching the locale at run time is too expensive when using the system
> > >> library.
> >
> > > Fwiw I did some experiments with this and found it wasn't true.
> >
> > Really?
>
> We're following two different methodologies so the results aren't comparable.
> I exposed strxfrm to postgres and then did a sort on strxfrm(col). The
> resulting query times were slower than sorting on lower(col) by negligible
> amounts.

But shouldn't the comparison be against sorting on col not lower(col)?
strxfrm(col) sorts seem comparable to col, strxfrm(lower(col)) sorts seem
comparable to lower(col). Some collations do treat 'A' and 'a' as
adjacent in sort order, but that's not a guarantee, so it's not valid to
say, "everywhere you'd use lower(col) you can use strxfrm instead."
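
To make the distinction concrete, here is a minimal sketch using Python's locale module, which wraps the C library's collation routines. The function names are made up for illustration and are not anything in postgres. It assumes the "C" collation, where every uppercase letter sorts before every lowercase one, so a strxfrm sort matches a strcoll sort but not a sort on lowercased values.

```python
import locale
from functools import cmp_to_key


def sort_by_strcoll(values):
    """Order values by pairwise strcoll under the current LC_COLLATE."""
    return sorted(values, key=cmp_to_key(locale.strcoll))


def sort_by_strxfrm(values):
    """Order values by their strxfrm keys; should agree with strcoll."""
    return sorted(values, key=locale.strxfrm)


def sort_by_lower(values):
    """Order values by the strxfrm key of their lowercased form."""
    return sorted(values, key=lambda v: locale.strxfrm(v.lower()))
```

With LC_COLLATE set to "C" and the input ["b", "A", "a", "B"], the first two functions return the same ordering and the third returns a different one. In a locale that places 'A' next to 'a' the three may happen to agree, which is exactly why the substitution can't be relied on in general.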

And in past numbers you sent, it looked like the amounts were: 1s for sort
on col, 1.5s for sort on lower(col), 2.5s for sort on strxfrm(col).  That
doesn't seem negligible to me unless that doesn't grow linearly with the
number of rows. It also seems that, if that was the only difference
between the queries, then the time spent in strxfrm was significant
compared to the rest of the query time.

> > These are on machines of widely varying horsepower, so the absolute
> > numbers shouldn't be compared across rows, but the general story holds:
> > setlocale should be considered to be at least an order of magnitude
> > slower than strcoll, and on non-glibc machines it can be a whole lot
> > worse than that.
>
> I don't see how this is relevant though. One way or another postgres is going
> to have to sort strings in varying locales chosen at run-time. Comparing
> against strcoll's execution time without changing locales is a straw
> man. It's like comparing your tcp/ip bandwidth with the loopback interface's
> bandwidth.
>
> I see no reason to think Postgres's implementation of looking up xfrm rules
> for the specified locale will be any faster than the OS's. We know some OS's
> suck but some certainly don't.

But do you have to change locales per row or per sort? Presumably, a
built-in implementation may be able to do the latter rather than the former.
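
A rough sketch of the two strategies, again in Python on top of the system library. Both function names are hypothetical and this says nothing about how postgres would actually do it. The first switches LC_COLLATE once around the whole sort; the second switches and restores it inside every comparison, which is the expensive pattern the setlocale timings are about.

```python
import locale
from functools import cmp_to_key


def sort_switch_per_sort(values, collation):
    """Set LC_COLLATE once, sort on strxfrm keys, then restore it."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    locale.setlocale(locale.LC_COLLATE, collation)
    try:
        return sorted(values, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


def sort_switch_per_comparison(values, collation):
    """Set and restore LC_COLLATE around every single strcoll call."""
    def compare(a, b):
        saved = locale.setlocale(locale.LC_COLLATE)
        locale.setlocale(locale.LC_COLLATE, collation)
        try:
            return locale.strcoll(a, b)
        finally:
            locale.setlocale(locale.LC_COLLATE, saved)

    return sorted(values, key=cmp_to_key(compare))
```

Both return the same ordering; the difference is two setlocale calls per sort versus two per comparison. Note that setlocale changes process-wide state, so neither variant is safe if other threads are collating at the same time.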


