Re: Solaris testers wanted for strxfrm() behavior - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Solaris testers wanted for strxfrm() behavior
Date
Msg-id 20150630055741.GA774647@tornado.leadboat.com
In response to Re: Solaris testers wanted for strxfrm() behavior  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: Solaris testers wanted for strxfrm() behavior
List pgsql-hackers
On Mon, Jun 29, 2015 at 11:52:26AM +1200, Thomas Munro wrote:
> On Mon, Jun 29, 2015 at 10:57 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Thomas Munro <thomas.munro@enterprisedb.com> writes:
> >> Just by the way, I wonder if this was that bug:
> >> https://illumos.org/issues/1594
> >
> > Oooh.  Might or might not be *same* bug, but it sure looks like it could
> > have the right symptom.  If this is indeed inherited from old Solaris,
> > I'm afraid we are totally fooling ourselves if we guess that it's no
> > longer present in the wild.

Very interesting.  Looks like the illumos strxfrm() came from FreeBSD, not
from Solaris; illumos introduced their bug independently:

https://illumos.org/issues/2
https://github.com/illumos/illumos-gate/commits/master/usr/src/lib/libc/port/locale/collate.c

> Also, here is an interesting patch that went into the Apache C++
> standard library.  Maybe the problem was limited to amd64 system...
> 
> https://github.com/illumos/illumos-userland/blob/master/components/stdcxx/patches/047-collate.cpp.patch

That's a useful data point.  Based on Oskari Saarenmaa's report, newer Solaris
10 is not affected.  The fix presumably showed up after the 05/08 release and
no later than the 01/13 release.

On Sun, Jun 28, 2015 at 07:00:14PM -0400, Tom Lane wrote:
> > On Sun, Jun 28, 2015 at 12:58 PM, Josh Berkus <josh@agliodbs.com> wrote:
> >> My perspective is that if both SmartOS and OmniOS pass, it's not our
> >> responsibility to support OldSolaris if they won't update libraries.

> Another idea would be to make a test during postmaster start to see
> if this bug exists, and fail if so.  I'm generally on board with the
> thought that we don't need to work on systems with such a bad bug,
> but it would be a good thing if the failure was clean and produced
> a helpful error message, rather than looking like a Postgres bug.

Failing cleanly on unpatched Solaris is adequate, agreed.  A check at
postmaster start isn't enough, because the postmaster might run in the C
locale while individual databases or collations use problem locales.  The
safest thing is to test after every setlocale(LC_COLLATE) and
newlocale(LC_COLLATE).  That's once at backend start and once per backend per
collation used, more frequent than I would like.  Hmm.
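
A minimal sketch of what such a per-locale check could look like. It assumes the defect is that strxfrm() writes past the length limit it is given; this message does not spell out the symptom. The function names, the canary value, and the range of limits probed are illustrative and not taken from any PostgreSQL patch. A newlocale()-based collation would need the analogous probe through strxfrm_l(), which is omitted here.

```c
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if strxfrm(), under the current
 * LC_COLLATE setting, leaves every byte at or beyond "limit" untouched.
 * A conforming strxfrm() writes at most "limit" bytes into the buffer.
 */
bool
strxfrm_respects_limit(size_t limit)
{
    char        buf[64];
    const char  canary = 0x7F;
    size_t      i;

    if (limit >= sizeof(buf))
        return true;            /* nothing beyond the limit to inspect */

    /* Fill the whole buffer so any stray write is detectable. */
    memset(buf, canary, sizeof(buf));
    (void) strxfrm(buf, "ab", limit);

    for (i = limit; i < sizeof(buf); i++)
    {
        if (buf[i] != canary)
            return false;       /* strxfrm() wrote past the limit */
    }
    return true;
}

/*
 * Hypothetical helper: probe a range of small limits (an assumption; the
 * limits that trigger a given libc bug may differ) for the locale that is
 * currently in effect.
 */
bool
current_collation_is_safe(void)
{
    size_t      limit;

    for (limit = 1; limit <= 16; limit++)
    {
        if (!strxfrm_respects_limit(limit))
            return false;
    }
    return true;
}
```

A caller would run current_collation_is_safe() right after each successful setlocale(LC_COLLATE, ...) and report a clear error if it returns false, instead of continuing with a libc that corrupts memory.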


