Re: Locale timings - Mailing list pgsql-hackers

From Michael Tiemann
Subject Re: Locale timings
Date
Msg-id 3C029AEB.2030902@redhat.com
In response to Locale timings  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
This is a common way of doing things inside glibc, and the happy result is that 
if you really want to build a non-locale-aware system, you can use a 
compile-time option that replaces the "locale_is_not_C" test with a constant. 
It makes for more maintainable code because there's less chance for bitrot in 
the usual case.

M
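
A minimal sketch of the pattern described above, not taken from glibc or PostgreSQL. The build switch NO_LOCALE_SUPPORT and the helpers init_locale_flag() and compare_text() are hypothetical names used only for illustration. When the switch is defined, "locale_is_not_C" becomes the constant 0, so the compiler can discard the locale-aware branch while the source still contains both paths.

```c
#include <locale.h>
#include <string.h>

/*
 * Hypothetical build switch: defining NO_LOCALE_SUPPORT turns the run-time
 * test into a compile-time constant, so the strcoll() branch is dead code.
 */
#ifdef NO_LOCALE_SUPPORT
#define locale_is_not_C 0
static void init_locale_flag(void) { /* nothing to detect */ }
#else
static int locale_is_not_C = 0;

/* Query the current collation locale once, e.g. at startup. */
static void init_locale_flag(void)
{
    const char *name = setlocale(LC_COLLATE, NULL);

    locale_is_not_C = name != NULL
        && strcmp(name, "C") != 0
        && strcmp(name, "POSIX") != 0;
}
#endif

/* Compare two NUL-terminated strings; both code paths are always compiled. */
static int compare_text(const char *a, const char *b)
{
    if (locale_is_not_C)
        return strcoll(a, b);   /* locale-aware path */
    return strcmp(a, b);        /* plain byte-wise path */
}
```

Because both branches are seen by the compiler in every build, a change that breaks one of them is noticed immediately, which is the maintainability point made above.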

Peter Eisentraut wrote:

> I did some "benchmarks" to check whether --enable-locale with LC_ALL=C is
> just as fast as --disable-locale, to possibly justify making locale
> support the default.  This test only covers locale-aware comparisons,
> which seems to be the critical aspect for all intents and purposes.
> 
> I loaded a table of a single text column with 454240 rows of English
> words.  The table had a size of 21.5 MB.  The values were explicitly
> de-sorted, but the order was the same across all test runs.  Then I ran
> SELECT * FROM test ORDER BY 1; and timed the wall-clock response time a
> few times.  All configuration parameters were left at the default.
> 
> The averaged results follow.  Some logarithmic buffering cleverness
> appeared to surface, but the results are still distinct enough to be
> useful.
> 
> no locale:    58s
> locale=C:    78s    (ca. 33% slower)
> locale=en_US:    118s    (ca. 100% slower)
> 
> This confused me, because in my C library a strcoll() call with locale=C
> is handed to strcmp() quite directly.  A look into varlena.c:varstr_cmp()
> shows that the locale-aware path does some extra copying because there is
> no strncoll() function we can use with non-terminated strings.
> 
> For testing's sake I replaced the two palloc() calls in that function with
> alloca(), which is presumably the fastest possible memory allocator.
> Result:
> 
> locale=C,alloca:    67s    (ca. 15% slower)
> 
> This shows that we're wasting quite a bit of time allocating memory --
> probably not only in this place.  I'm pretty sure that the majority of the
> rest of the gap comes from the memcpy() operations.  Not that there's a
> whole lot we can do about either of these things.
> 
> However, I feel that we could reasonably cope with this situation by
> replacing
> 
> #ifdef USE_LOCALE
> /* locale-aware code */
> #else
> /* non-locale code */
> #endif
> 
> with
> 
> if (locale_is_not_C)
> {
>     /* locale-aware code */
> }
> else
> {
>     /* non-locale code */
> }
> 
> This practice should have minuscule impact, and it's probably the plan for
> the multibyte side of things as well.
> 
> 
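
The extra copying Peter describes can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual varlena.c code: the function name sketch_varstr_cmp() and its flag parameter are hypothetical, and malloc()/free() stand in for palloc()/pfree(). Since strcoll() needs NUL-terminated input, the locale-aware branch has to allocate and copy both strings, while the other branch can compare the raw bytes in place.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Compare two byte strings that are NOT NUL-terminated.
 * locale_is_not_C is passed in here only to keep the sketch self-contained.
 */
static int sketch_varstr_cmp(const char *arg1, size_t len1,
                             const char *arg2, size_t len2,
                             int locale_is_not_C)
{
    int result;

    if (locale_is_not_C)
    {
        /* locale-aware path: two allocations and two copies per comparison */
        char *a1 = malloc(len1 + 1);
        char *a2 = malloc(len2 + 1);

        if (a1 == NULL || a2 == NULL)
            abort();
        memcpy(a1, arg1, len1);
        a1[len1] = '\0';
        memcpy(a2, arg2, len2);
        a2[len2] = '\0';

        result = strcoll(a1, a2);

        free(a1);
        free(a2);
    }
    else
    {
        /* non-locale path: compare the common prefix, then the lengths */
        size_t n = (len1 < len2) ? len1 : len2;

        result = memcmp(arg1, arg2, n);
        if (result == 0 && len1 != len2)
            result = (len1 < len2) ? -1 : 1;
    }
    return result;
}
```

In a sort over several hundred thousand rows this function runs millions of times, so the per-call allocation and copying in the first branch is a plausible source of the gap measured above.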



