BUG #13454: Embedded python can stop WAL streaming and hot standby mode - Mailing list pgsql-bugs

From chris+postgresql@qwirx.com
Subject BUG #13454: Embedded python can stop WAL streaming and hot standby mode
Date
Msg-id 20150618165827.2737.42412@wrigleys.postgresql.org
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      13454
Logged by:          Chris Wilson
Email address:      chris+postgresql@qwirx.com
PostgreSQL version: 9.4.1
Operating system:   Linux 2.6.32-220.30.1.el6.x86_64
Description:

I don't think this is actually a bug in Postgres, but perhaps the
documentation can be improved. I thought I should at least report it
somewhere public in case anyone else has the same problem.

One of our replicating hot standbys failed to come up properly after a
restart. It reached a consistent state:

LOG:  consistent recovery state reached at CEC/AD9B8660

but not followed by:

LOG:  database system is ready to accept read only connections

nor:

LOG:  started streaming WAL from primary at CEE/17000000 on timeline 1

Instead, nothing in the log explained why hot_standby mode didn't start, even
at debug3 level, and the WAL source kept flip-flopping between stream and
archive rather than streaming successfully:

DEBUG:  switched WAL source from stream to archive after failure
DEBUG:  switched WAL source from archive to stream after failure

This turned out to be a problem with our embedded Python interpreter
(plpython2). A site-wide sitecustomize.py script in PYTHONPATH did something
bad to Postgres: it installed a fault handler for SIGUSR1, a signal that
PostgreSQL uses internally to signal between its own processes. I guess that
stopped it from initialising completely.
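
The report doesn't include the offending sitecustomize.py. The sketch below is therefore a hypothetical guarded version. It assumes the fault handler came from Python's standard faulthandler module registered on SIGUSR1, and it targets Python 3.5+ (plpython2 would need a backport). The guard relies on signal.getsignal(). That call returns SIG_DFL when the signal still has its default disposition. It returns None when the current handler was installed outside Python, which is how an embedding program such as a PostgreSQL backend should appear, assuming its handler is already in place when the interpreter starts. The function name register_dump_signal is made up for illustration.

```python
# Hypothetical guarded sitecustomize.py; assumes faulthandler was the culprit.
import faulthandler
import signal


def register_dump_signal(signum=signal.SIGUSR1):
    """Register a traceback dump on `signum` only if nobody else handles it.

    signal.getsignal() returns SIG_DFL for the default disposition, a Python
    callable or SIG_IGN if Python code set it, and None if the handler was
    installed outside Python (e.g. by the program embedding the interpreter).
    """
    if signal.getsignal(signum) is not signal.SIG_DFL:
        # The signal is already owned; leave the existing handler alone.
        return False
    # Write dumps to file descriptor 2 (stderr); fd arguments need Python 3.5+.
    faulthandler.register(signum, file=2, all_threads=True)
    return True


if hasattr(signal, "SIGUSR1"):  # SIGUSR1 does not exist on Windows
    register_dump_signal()
```

Passing chain=True to faulthandler.register() would also call the previous handler. However, every SIGUSR1 the server sends would then dump tracebacks, so skipping registration looks like the safer choice.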

The documentation implies that "consistent recovery state reached" will
always be followed by "database system is ready to accept read only
connections", but it isn't, and it's not clear why not.

There's also no clue what "failure" caused the "switched WAL source from
stream to archive after failure". Strace showed that postgres didn't even
try to connect to the remote server, so it must have known internally that
something was wrong, but it didn't tell us :)

I understand that you can't defend against everything that can be done in a
Turing-complete embedded language, but it might be worth pointing a finger
at plugins if either of these expected progressions doesn't hold.
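
The two expected progressions can be checked mechanically. The hypothetical helper below is not part of PostgreSQL. It flags a standby log that contains "consistent recovery state reached" but never "database system is ready to accept read only connections" or "started streaming WAL from primary". It also counts "switched WAL source ... after failure" lines. It matches only the message texts quoted in this report. It assumes hot_standby = on and a configured primary_conninfo, because without them those later messages are legitimately absent.

```python
import re

# Message texts as quoted in this report; wording may differ between versions.
CONSISTENT = "consistent recovery state reached"
READY = "database system is ready to accept read only connections"
STREAMING = "started streaming WAL from primary"
SOURCE_FLIP = re.compile(r"switched WAL source from \w+ to \w+ after failure")


def check_standby_log(lines):
    """Return warnings about hot standby milestones missing from `lines`.

    Assumes hot_standby = on and primary_conninfo is set; otherwise the
    READY / STREAMING messages are expected to be absent.
    """
    seen_consistent = seen_ready = seen_streaming = False
    flips = 0
    for line in lines:
        if CONSISTENT in line:
            seen_consistent = True
        elif READY in line:
            seen_ready = True
        elif STREAMING in line:
            seen_streaming = True
        elif SOURCE_FLIP.search(line):
            flips += 1

    warnings = []
    if seen_consistent and not seen_ready:
        warnings.append("consistent state reached but hot standby never became ready")
    if seen_consistent and not seen_streaming:
        warnings.append("consistent state reached but WAL streaming never started")
    if flips:
        warnings.append(f"WAL source switched {flips} time(s) after failure")
    return warnings


if __name__ == "__main__":
    # The log excerpt from this report: all three warnings fire.
    sample = [
        "LOG:  consistent recovery state reached at CEC/AD9B8660",
        "DEBUG:  switched WAL source from stream to archive after failure",
        "DEBUG:  switched WAL source from archive to stream after failure",
    ]
    for warning in check_standby_log(sample):
        print(warning)
```

If any warning appears, loaded plugins and embedded interpreters (such as plpython and its sitecustomize.py) are worth checking alongside the usual replication settings.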
