Re: How to add locale support for each column? - Mailing list pgsql-hackers
From | Greg Stark |
---|---|
Subject | Re: How to add locale support for each column? |
Date | |
Msg-id | 873c15agzi.fsf@stark.xeocode.com |
In response to | Re: How to add locale support for each column? (Stephan Szabo <sszabo@megazone.bigpanda.com>) |
Responses | Re: How to add locale support for each column? |
List | pgsql-hackers |
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

> But shouldn't the comparison be against sorting on col not lower(col)?
> strxfrm(col) sorts seem comparable to col, strxfrm(lower(col)) sorts seem
> comparable to lower(col). Some collations do treat 'A' and 'a' as being
> adjacent in sort order, but that's not a guarantee, so it's not valid to
> say, "everywhere you'd use lower(col) you can use strxfrm instead."

Well, in my implementation strxfrm is a postgresql function. So I wanted to compare it with an expression that had at least as much overhead as a postgresql expression with a single function call.

> And in past numbers you sent, it looked like the amounts were: 1s for sort
> on col, 1.5s for sort on lower(col), 2.5s for sort on strxfrm(col). That
> doesn't seem negligible to me

Right, I amended my "negligible" claim. It's a significant but reasonable cost. A 1.5s delay on sorting 100k rows is certainly not the kind of delay that would make the idea of switching locales intolerable.

> unless that doesn't grow linearly with the number of rows.

Well, I was comparing sorting 206,000 rows. Even if it scales linearly, a 10s delay on sorting 2M records isn't really fatal.

I certainly wouldn't want to remove the ability to sort using strcmp if the data is ascii or binary. But if you're going to use locale collation order it's going to be slower. strxfrm has to do quite a bit of work. Even a postgres-internal mechanism is going to have to do that same work.

The only time you could save is the time it takes to look up "en_US" in a list (or hash) of cached locales and switch a pointer. I suspect that's going to be a small (but not negligible) portion of the overhead. I guess this is subject to analysis; I'll try to do a gprof run at some point to answer that.

> > I see no reason to think Postgres's implementation of looking up xfrm rules
> > for the specified locale will be any faster than the OS's. We know some OS's
> > suck but some certainly don't.
> But do you have to change locales per row or per sort? Presumably, a built-in
> implementation may be able to do the latter rather than the former.

We certainly need the ability to change locales per row, in fact possibly multiple times per row. Consider

    select en,fr from translations order by en,fr

which is actually something I could reasonably have to do in my current project.

However, changing locales should be nigh-instantaneous; it really ought to be just changing a pointer. And in the API Tom foresees it shouldn't even happen. The only cost of sorting on many locales (aside from the initial load) would be in the reduced cache hit rate from using more locale tables.

-- greg
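To make the per-column case concrete, here is a minimal sketch in Python rather than the actual PostgreSQL function discussed above. The helper names `xfrm` and `sort_rows` are hypothetical, and the assumption that the function takes a locale name as its second argument is mine. It switches LC_COLLATE around each strxfrm call, which is the "change locale per row, possibly several times" pattern, and then sorts on the transformed keys. The demo uses only the "C" locale so it runs anywhere; names such as "en_US.UTF-8" or "fr_FR.UTF-8" work only if the OS has them installed.

```python
import locale


def xfrm(text, locale_name="C"):
    """Return a collation sort key for `text` under `locale_name`.

    Hypothetical helper: it sets LC_COLLATE, calls strxfrm, and restores
    the previous setting. setlocale is process-global, so this is not
    safe to call from several threads at once.
    """
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        if saved != locale_name:
            # Raises locale.Error if the locale is not installed.
            locale.setlocale(locale.LC_COLLATE, locale_name)
        return locale.strxfrm(text)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


def sort_rows(rows, locales):
    """Sort rows column by column, each column under its own locale.

    Roughly what "order by en, fr" would mean if every column carried
    its own collation: the key is a tuple of per-column strxfrm results.
    """
    return sorted(
        rows,
        key=lambda row: tuple(xfrm(v, loc) for v, loc in zip(row, locales)),
    )


if __name__ == "__main__":
    translations = [("b", "y"), ("a", "z"), ("a", "x")]
    # Substitute real locale names here if they are installed.
    for row in sort_rows(translations, ("C", "C")):
        print(row)
```

Each key computation here pays for two setlocale calls, which is the overhead in question; a mechanism that keeps the locale tables cached and only swaps a pointer would avoid most of that, while the strxfrm work itself stays the same.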