Re: orangutan seizes up during isolation-check - Mailing list pgsql-hackers

From Noah Misch
Subject Re: orangutan seizes up during isolation-check
Date
Msg-id 20140915142213.GB1332666@tornado.leadboat.com
Whole thread Raw
In response to Re: orangutan seizes up during isolation-check  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On Mon, Sep 15, 2014 at 10:11:57AM +0300, Heikki Linnakangas wrote:
> On 09/15/2014 07:51 AM, Noah Misch wrote:
> >libintl replaces setlocale().  Its setlocale(LC_x, "") uses OS-specific APIs
> >to determine the default locale when $LANG and similar environment variables
> >are empty, as they are during "make check NO_LOCALE=1".  On OS X, it calls[1]
> >CFLocaleCopyCurrent(), which in turn spins up a thread.  See the end of this
> >message for the postmaster thread stacks active upon hitting a breakpoint set
> >at _dispatch_mgr_thread.
> 
> Ugh. I'd call that a bug in libintl. setlocale() has no business to
> make the process multi-threaded.

Fair interpretation.  libintl's options for mitigating the problem are even
more constrained, unfortunately.

> Do we have the same problem in backends? At a quick glance, aside
> from postmaster we only use PG_SETMASK(&BlockSig) in signal
> handlers, to prevent another signal handler from running
> concurrently.

In backends, we pass a specific value, not "", to pg_perm_setlocale().  (I
expect the problem in EXEC_BACKEND builds, but I doubt anyone uses that as a
production configuration on OS X.)  Every frontend program does call
setlocale(LC_ALL, "") via set_pglocale_pgservice().  None use sigprocmask(),
but a few do use fork().

> >1. Fork, call setlocale(LC_x, "") in the child, pass back the effective locale
> >    name through a pipe, and pass that name to setlocale() in the original
> >    process.  The short-lived child will get the extra threads, and the
> >    postmaster will remain clean.

> >I'm skeptical of the value of looking up locale information using other OS X
> >facilities when the usual environment variables are inconclusive, but I see no
> >clear cause to reverse that decision now.  I lean toward (1).
> 
> Both of those are horrible hacks. And who's to say that calling
> setlocale(LC_x, "foo") won't also call some function that makes the
> process multi-threaded. If not in any current OS X release, it might
> still happen in a future one.

Right.  Not just setlocale(); a large body of APIs could change that way.  The
CAVEATS section[1] of the OS X fork() manual page suggests Apple had that in
mind at some point.  To be 100% safe, we would write the postmaster as though
it's always multithreaded, including making[2] EXEC_BACKEND the only supported
configuration on OS X.

Assuming we don't do that anytime soon, I did have in mind to make the
postmaster raise an error if it becomes multithreaded.  (Apple's
pthread_is_threaded_np() is usable for this check.)  Then, at least, we'll
more-easily hear about new problems of this nature.

> One idea would be to use an extra pthread mutex or similar, in
> addition to PG_SETMASK(). Whenever you do PG_SETMASK(&BlockSig),
> also grab the mutex, and release it when you do
> PG_SETMASK(&UnBlockSig).

I had considered and tested such a thing, and it sufficed to clear orangutan's
present troubles.  You still face undefined behavior after fork().  It's okay
in practice if libdispatch threads are careful to register pthread_atfork()
handlers for any relevant resources.  It's a fair option, but running
setlocale(LC_x, "") in a short-lived child assumes less about the system
library implementation, seems no more of a hack to me, and isolates the
overhead at postmaster start.

[1] https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/fork.2.html
[2] http://www.postgresql.org/message-id/1349280184-sup-326@alvh.no-ip.org



pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Triconsistent catalog declaration
Next
From: Kevin Grittner
Date:
Subject: Re: delta relations in AFTER triggers