Re: How to add locale support for each column? - Mailing list pgsql-hackers

From Greg Stark
Subject Re: How to add locale support for each column?
Date
Msg-id 87r7oyjd4r.fsf@stark.xeocode.com
Whole thread Raw
In response to Re: How to add locale support for each column?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: How to add locale support for each column?  (Mahmoud Taghizadeh <m_taghi@yahoo.com>)
Re: How to add locale support for each column?  (Stephan Szabo <sszabo@megazone.bigpanda.com>)
List pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > Peter Eisentraut <peter_e@gmx.net> writes:
> >> 2) switching the locale at run time is too expensive when using the system
> >> library.
>
> > Fwiw I did some experiments with this and found it wasn't true.
>
> Really?

We're following two different methodologies so the results aren't comparable.
I exposed strxfrm to postgres and then did a sort on strxfrm(col). The
resulting query times were slower than sorting on lower(col) by negligible
amounts.

I have the original code I wrote and Joe Conway's reimplementation of it using
setjmp/longjmp to protect against errors. However the list archives appear to
have been down that month so I've attached Joe Conway's implementation below.

> These are on machines of widely varying horsepower, so the absolute
> numbers shouldn't be compared across rows, but the general story holds:
> setlocale should be considered to be at least an order of magnitude
> slower than strcoll, and on non-glibc machines it can be a whole lot
> worse than that.

I don't see how this is relevant though. One way or another postgres is going
to have to sort strings in varying locales chosen at run-time. Comparing
against strcoll's execution time without changing changing locales is a straw
man. It's like comparing your tcp/ip bandwidth with the loopback interface's
bandwidth.

I see no reason to think Postgres's implementation of looking up xfrm rules
for the specified locale will be any faster than the OS's. We know some OS's
suck but some certainly don't.

Perhaps glibc's locale handling functions ought to be available as a separate
library users of those OS's could install -- if it isn't already.

> I don't think we can take that attitude when the cost penalty involved
> can be a couple of orders of magnitude.

Aside from the above complaint there's another problem with your methodology.
An order of magnitude change in the cost for strcoll isn't really relevant
unless the cost for strcoll is significant to begin with. I suspect strcoll
costs are currently dwarfed by the palloc costs to evaluate the expression
already.



Here's the implementation in postgres from Joe Conway btw:


Joe Conway wrote:
> What about something like this?

Oops! Forgot to restrore error handling. See below:

Joe

> 8<--------------------------------
>
> #include <setjmp.h>
> #include <string.h>
>
> #include "postgres.h"
> #include "fmgr.h"
> #include "tcop/tcopprot.h"
> #include "utils/builtins.h"
>
> #define GET_STR(textp) \
>   DatumGetCString(DirectFunctionCall1(textout, PointerGetDatum(textp)))
> #define GET_BYTEA(str_) \
>   DatumGetTextP(DirectFunctionCall1(byteain, CStringGetDatum(str_)))
> #define MAX_BYTEA_LEN    0x3fffffff
>
> /*
>  * pg_strxfrm - Function to convert string similar to the strxfrm C
>  * function using a specified locale.
>  */
> extern Datum pg_strxfrm(PG_FUNCTION_ARGS);
> PG_FUNCTION_INFO_V1(pg_strxfrm);
>
> Datum
> pg_strxfrm(PG_FUNCTION_ARGS)
> {
>   char       *str = GET_STR(PG_GETARG_TEXT_P(0));
>   size_t      str_len = strlen(str);
>   char       *localestr = GET_STR(PG_GETARG_TEXT_P(1));
>   size_t      approx_trans_len = 4 + (str_len * 3);
>   char       *trans = (char *) palloc(approx_trans_len + 1);
>   size_t      actual_trans_len;
>   char       *oldlocale;
>   char       *newlocale;
>   sigjmp_buf  save_restart;
>
>   if (approx_trans_len > MAX_BYTEA_LEN)
>     elog(ERROR, "source string too long to transform");
>
>   oldlocale = setlocale(LC_COLLATE, NULL);
>   if (!oldlocale)
>     elog(ERROR, "setlocale failed to return a locale");
>
>   /* catch elog while locale is set other than the default */
>   memcpy(&save_restart, &Warn_restart, sizeof(save_restart));
>   if (sigsetjmp(Warn_restart, 1) != 0)
>   {
>     memcpy(&Warn_restart, &save_restart, sizeof(Warn_restart));
>     newlocale = setlocale(LC_COLLATE, oldlocale);
>     if (!newlocale)
>       elog(PANIC, "setlocale failed to reset locale: %s", localestr);
>     siglongjmp(Warn_restart, 1);
>   }
>
>   newlocale = setlocale(LC_COLLATE, localestr);
>   if (!newlocale)
>     elog(ERROR, "setlocale failed to set a locale: %s", localestr);
>
>   actual_trans_len = strxfrm(trans, str, approx_trans_len + 1);
>
>   /* if the buffer was not large enough, resize it and try again */
>   if (actual_trans_len >= approx_trans_len)
>   {
>     approx_trans_len = actual_trans_len + 1;
>     if (approx_trans_len > MAX_BYTEA_LEN)
>       elog(ERROR, "source string too long to transform");
>
>     trans = (char *) repalloc(trans, approx_trans_len + 1);
>     actual_trans_len = strxfrm(trans, str, approx_trans_len + 1);
>
>     /* if the buffer still not large enough, punt */
>     if (actual_trans_len >= approx_trans_len)
>       elog(ERROR, "strxfrm failed, buffer insufficient");
>   }
>
>   newlocale = setlocale(LC_COLLATE, oldlocale);
>   if (!newlocale)
>     elog(PANIC, "setlocale failed to reset locale: %s", localestr);

   /* restore normal error handling */
   memcpy(&Warn_restart, &save_restart, sizeof(Warn_restart));

>
>   PG_RETURN_BYTEA_P(GET_BYTEA(trans));
> }
>
> 8<--------------------------------
>






--
greg

pgsql-hackers by date:

Previous
From: Abhijit Menon-Sen
Date:
Subject: Re: libpq and prepared statements progress for 8.0
Next
From: mcolosimo@mitre.org
Date:
Subject: Re: tweaking MemSet() performance - 7.4.5