Re: [PATCH] Fix severe performance regression with gettext 0.20+ on Windows - Mailing list pgsql-hackers

From Bryan Green
Subject Re: [PATCH] Fix severe performance regression with gettext 0.20+ on Windows
Date
Msg-id 87ebc57c-71ea-4722-a3f7-46a4dd1b52c2@gmail.com
Whole thread Raw
In response to Re: [PATCH] Fix severe performance regression with gettext 0.20+ on Windows  (Peter Eisentraut <peter@eisentraut.org>)
List pgsql-hackers
On 12/17/2025 3:54 AM, Peter Eisentraut wrote:
> On 10.12.25 01:45, Bryan Green wrote:
>> The attached patch takes a pragmatic approach: for gettext 0.20.1+, we
>> avoid triggering the bug by using Windows locale format instead of
>> calling IsoLocaleName(). This works because gettext 0.20.1+ internally
>> converts the Windows format back to POSIX for catalog lookups, whereas
>> 0.19.8 and earlier need POSIX format directly.
> 
> A few comments on the patch itself:
> 
> 1) The description from the commit message, 'gettext 0.20.1+ expects
> Windows locale format ("English_United States") not POSIX format
> ("en_US")', or similar, should be added as a comment to the code.
> 
> 2) Is the cutoff actually 0.20.1 or should it be 0.20?
> 
> 3) Similarly some comment about why "C" and "POSIX" are handled specially.
> 
> 4) There is already earlier in pg_perm_setlocale() code that handles
> LC_MESSAGES specially on Windows, and it deals with the (locale == NULL
> || locale[0] == '\0') case, which is then repeated later on in your
> patch.  I wonder if this could be simplified or combined somehow.
> 
> 5) With the patch applied, building with a new-enough gettext will leave
> the function IsoLocaleName() unused, which could result in a compiler
> warning.  You could add pg_attribute_unused() to avoid that.
> 

Peter,

I agree with the above changes and have implemented them, including the
correction to the cutoff version.  But, before sharing the patch with
those changes I think we should discuss 1) should we short-circuit
C/POSIX and not ever call gettext in that case, 2) should we try to
convert "ISO" to Windows legacy format.

Even with the patch, if locale is not in Windows legacy locale format we
will pay the performance penalty.  I think we should short-circuit the
C/POSIX variants and just return the message untranslated.  This is the
intent of setting LC_MESSAGES to one of those values.  There is no
point, that I am aware of, in calling gettext in this case.

As for handling the "ISO" locale format if a Windows system has it set--
we could write something to convert from "ISO" to the Windows legacy
format. At first glance it is not obvious to me how to determine the
codepage-- unless we just use the default for the locale setting.
Otherwise, we seem to be at a position to require changes by the user. I
am considering submitting a patch to have the Windows enumeration code
in gettext handle "ISO" as well.

But, there seems to be no timeline for when any of these fixes will be
released for gettext upstream.  I have also found another patch that
needs to happen on gettext and will get that submitted shortly.  A more
radical choice would be to just have our own gettext with these patches
applied until upstream is ready.



BACKGROUND:

We call gettext() with three locale format types:

1) "C"/"POSIX" variants
2) ISO locales (language-region format like "en-US")
3) Windows legacy locales (like "English_United States.1252")

Current Problems
Problem 1: Version incompatibility

Versions before 0.20: Handled ISO locales correctly on Windows
Versions 0.20-0.26: Broke ISO locale handling with flawed enumeration logic

Problem 2: Performance degradation (versions 0.20+)

gettext enumerates all Windows locales on every call to find a match
Enumeration misses are not cached, forcing re-enumeration
Since we convert legacy locales to ISO format, we always miss.
Result: Severe performance penalty (180s → 32s for 1M exceptions when fixed)

A patch fixing the cache issue has been accepted into gettext's
development branch, but no release timeline exists.

Problem 3: C/POSIX locale handling

Windows doesn't recognize "C" or "POSIX" as valid locales
gettext correctly returns untranslated messages for these locales
However, it still performs the full locale enumeration before doing so
This adds unnecessary overhead to every call with C/POSIX locales

I'm preparing a patch for upstream regarding this issue.  Although, I
won't be surprised if they say "why are you setting it to those values
on Windows...those don't make any sense on Windows".

For gettext versions 0.20 and later, we must:

1) Stop converting Windows legacy locales to ISO format
2) Short-circuit gettext calls when locale is C/POSIX variant
3) Document that Windows users must use legacy locale format

Trade-offs

Versions before 0.20: Work correctly but can't upgrade to escape CVEs if
needed.
Versions 0.20+: Require workarounds above to maintain acceptable
performance.



-- 
Bryan Green
EDB: https://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Minor _bt_checkkeys comment improvements
Next
From: Bruce Momjian
Date:
Subject: Re: [PATCH] Provide support for trailing commas