Re: Collation rules and multi-lingual databases - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Collation rules and multi-lingual databases
Msg-id 87isoohfcb.fsf@stark.dyndns.tv
In response to Re: Collation rules and multi-lingual databases  (Stephan Szabo <sszabo@megazone.bigpanda.com>)
Responses Re: Collation rules and multi-lingual databases
Re: Collation rules and multi-lingual databases
Re: Collation rules and multi-lingual databases
List pgsql-hackers
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

> Since most of that work is for an exceptional case, maybe it'd be safer
> (although slower) to structure the function as

Yeah, I thought of that. But if making it a critical section is cheap, then
it's probably a better approach. The problem with restoring the locale for the
palloc is that, if the user is unlucky, he might sort a table of thousands of
strings that all trigger the exceptional case.

The glibc docs' sample code suggests using 2x the original string length for
the initial buffer. My testing showed that this *always* triggered the
exceptional case. A bit of experimentation led to 3x+4, which eliminates it
except for 0- and 1-byte strings. I'm still tweaking it. But on another OS, or
with a more complex collation locale, you might still trigger it a lot. Even as
it is, if you happen to sort a large list of single-character strings you would
trigger it a lot.
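A minimal sketch of the sizing heuristic described above, in C. It assumes the caller has already set LC_COLLATE and uses plain malloc/realloc in place of palloc; the name xfrm_dup is hypothetical. The first call uses a 3x+4 byte buffer, and a second, exactly sized call happens only when strxfrm() reports that it needed more room:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: transform `src` with strxfrm() under the current
 * LC_COLLATE. Returns a malloc'd buffer (caller frees) or NULL if
 * allocation fails; stores the transformed length in *out_len if given.
 */
static char *
xfrm_dup(const char *src, size_t *out_len)
{
    size_t len = strlen(src);
    size_t bufsize = 3 * len + 4;     /* initial guess: 3x + 4 bytes */
    char  *buf = malloc(bufsize);
    size_t n;

    if (buf == NULL)
        return NULL;

    n = strxfrm(buf, src, bufsize);
    if (n >= bufsize)
    {
        /* Exceptional case: buf contents are unspecified, so grow it
         * to exactly n + 1 bytes and transform again. */
        char *bigger = realloc(buf, n + 1);

        if (bigger == NULL)
        {
            free(buf);
            return NULL;
        }
        buf = bigger;
        bufsize = n + 1;
        n = strxfrm(buf, src, bufsize);
        assert(n < bufsize);          /* the exact size must now suffice */
    }
    if (out_len != NULL)
        *out_len = n;
    return buf;
}
```

Because strxfrm() returns the full length it needs even when the buffer is too small, the retry path never costs more than one extra call; the initial guess only decides how often that path is taken.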

I have some documentation reading to do apparently before I can fix this up.


> setlocale
> call strxfrm (and that's it)
> setlocale back
> if there wasn't enough space
>  make a new buffer
>  setlocale
>  call strxfrm (and that's it)
>  setlocale back
> 
> Probably putting the sl/strxfrm/sl into its own function.
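Stephan's outline might look roughly like this in C. This is a sketch, not PostgreSQL code: the helper names xfrm_in_locale and xfrm_locale_dup are hypothetical, malloc stands in for palloc, and the initial size reuses the 3x+4 heuristic above. The point of the structure is that every allocation happens while the original LC_COLLATE is in effect, so only setlocale/strxfrm/setlocale run under the switched locale:

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

#define XFRM_ERROR ((size_t) -1)

/* Hypothetical helper: setlocale, call strxfrm (and that's it), setlocale back. */
static size_t
xfrm_in_locale(char *dst, const char *src, size_t n, const char *locale_name)
{
    const char *cur;
    char       *saved;
    size_t      len;

    assert(locale_name != NULL);      /* NULL would only query, not switch */

    /* Copy the current name first: setlocale() may overwrite its buffer. */
    cur = setlocale(LC_COLLATE, NULL);
    if (cur == NULL)
        return XFRM_ERROR;
    len = strlen(cur) + 1;
    saved = malloc(len);
    if (saved == NULL)
        return XFRM_ERROR;
    memcpy(saved, cur, len);

    if (setlocale(LC_COLLATE, locale_name) == NULL)
    {
        free(saved);
        return XFRM_ERROR;
    }
    len = strxfrm(dst, src, n);       /* no allocation while switched */
    setlocale(LC_COLLATE, saved);     /* restore the original locale */
    free(saved);
    return len;
}

/* Hypothetical caller: try once; only if there wasn't enough space,
 * make a new buffer (with the locale restored) and do it again. */
static char *
xfrm_locale_dup(const char *src, const char *locale_name)
{
    size_t bufsize = 3 * strlen(src) + 4;
    char  *buf = malloc(bufsize);
    size_t n;

    if (buf == NULL)
        return NULL;
    n = xfrm_in_locale(buf, src, bufsize, locale_name);
    if (n != XFRM_ERROR && n >= bufsize)
    {
        free(buf);
        bufsize = n + 1;
        buf = malloc(bufsize);
        if (buf == NULL)
            return NULL;
        n = xfrm_in_locale(buf, src, bufsize, locale_name);
    }
    if (n == XFRM_ERROR || n >= bufsize)
    {
        free(buf);
        return NULL;
    }
    return buf;
}
```

Note that setlocale() changes process-wide state, so this only stays safe if nothing else (an error handler, a signal, another thread) runs between the two setlocale calls; that is the concern behind the critical-section question above.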

-- 
greg


