Thread: Re: [GENERAL] Template zero xid issue

Re: [GENERAL] Template zero xid issue

From: Tom Lane
[ redirected to -hackers for discussion of bug fix ]

"Keaton Adams" <kadams@mxlogic.com> writes:
> Some additional information I just received from one of the operations
> personnel:

> 1. I set pg_database.datallowconn to true on template0 in an attempt to
> log into the database with psql and perform a VACUUM there.  It was our
> last-ditch attempt to perform a vacuum against that database without
> shutting down postgres altogether, and it worked fine when I tested it
> in our ORT environment.

Hah, and in fact the log you sent me shows that that is exactly what
started the problem --- everything was fine until here:

<2007-08-06 09:42:50 MDT>LOG:  transaction ID wrap limit is 2147484146, limited by database "template0"
<2007-08-06 09:42:50 MDT>STATEMENT:  UPDATE pg_database SET datallowconn='t' WHERE datname='template0';
<2007-08-06 09:42:50 MDT>WARNING:  database "template0" must be vacuumed within 736811 transactions
<2007-08-06 09:42:50 MDT>HINT:  To avoid a database shutdown, execute a full-database VACUUM in "template0".
<2007-08-06 09:42:50 MDT>STATEMENT:  UPDATE pg_database SET datallowconn='t' WHERE datname='template0';
<2007-08-06 09:42:50 MDT>ERROR:  database is not accepting commands to avoid wraparound data loss in database "template0"
<2007-08-06 09:42:50 MDT>HINT:  Stop the postmaster and use a standalone backend to vacuum database "template0".

So the whole thing boiled down to pilot error :-( ... if you hadn't
misinterpreted template0's age() reading as something you needed to
Do Something about, everything would have been fine.

FWIW, in 8.2 and up template0 is autovacuumed the same as every other
database, which hopefully will prevent people from making this sort
of mistake in the future.

That explains most of what I was wondering about in connection with the
log excerpts.  The remaining loose end was why, after successfully
stopping the bgwriter, the postmaster chose to restart everything:

<2007-08-06 09:44:07 MDT>LOG:  database system is shut down
<2007-08-06 09:44:07 MDT>FATAL:  the database system is shutting down
[ lots of these snipped ]
<2007-08-06 09:44:07 MDT>LOG:  background writer process (PID 5203) exited with exit code 0
<2007-08-06 09:44:07 MDT>LOG:  terminating any other active server processes
<2007-08-06 09:44:07 MDT>LOG:  all server processes terminated; reinitializing
<2007-08-06 09:44:07 MDT>LOG:  database system was shut down at 2007-08-06 09:44:07 MDT
<2007-08-06 09:44:07 MDT>FATAL:  the database system is shutting down

But I think I see what's going on there: the test for successful
system shutdown is
    if (exitstatus == 0 && Shutdown > NoShutdown && !FatalError &&
        !DLGetHead(BackendList) && AutoVacPID == 0)

ie, we saw bgwriter do exit(0), and we are shutting down, and there
is no other activity in the system.  The problem with this apparently
is that BackendList wasn't empty, which is not surprising given the
storm of clients attempting to reconnect, as evidenced by all the
"system is shutting down" log messages.  Evidently, at the moment the
bgwriter exited, there was at least one child that had been forked but
hadn't yet finished telling its client to go away.

I think this needs to be fixed, because in principle a sufficiently
heavy load of connection requests could prevent the postmaster from
*ever* finishing the shutdown sequence --- after it's restarted and
re-stopped the bgwriter, it will come right back to this same test, and
if there are more transient children it would repeat the whole process.
Since we've not heard of any complaints of such happening in the field,
it probably needn't be back-patched, but I want a fix for 8.3.

I'm not entirely sure what's the most reasonable way to fix it though.
I can see a number of alternatives:

1. Remove the test on BackendList shown above.  This seems like a bad
idea as it puts us at risk of the postmaster shutting down prematurely,
should the bgwriter happen to exit(0) while there are still live
backends.

2. Stop paying attention to incoming connection requests once we've
begun the shutdown sequence, so that no new children will be spawned.
This is probably not acceptable, particularly in the case of a "smart"
shutdown which can go on for a long time; we really want would-be
clients to get some kind of response and not just hang waiting for
a connection.

3. Don't spawn a child process to reject connection requests once
we have begun shutdown, but have the postmaster just send the response
directly, much as it does in the case of fork failure.  This is not
too bad; the only real downside is that we can't allow the postmaster
to block on any one would-be client, and so we'd have to send the
error message on a one-try-and-give-up basis, the same as we do with the
fork failure case.  But the odds of failure for that are really pretty
low.  (A rough sketch of this appears below.)

4. Keep spawning a child, but mark it in the BackendList as known
doomed, and don't count such children when deciding if it's OK to
terminate.  The problem with this idea is that such children will
still be connected to shared memory, and we really don't want to
terminate the postmaster before all connections to shmem are gone.
(This objection also applies to #1, now that I think about it.)

I'm sort of leaning to solution #3, but I wondered if anyone had
a different opinion or a better idea.
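
For concreteness, here's a rough (untested) sketch of what #3 might
look like, modeled on the existing fork-failure path in postmaster.c.
The function name is made up; pg_set_noblock() is the existing helper
in src/port/noblock.c:

    #include <errno.h>
    #include <stdbool.h>
    #include <string.h>
    #include <sys/socket.h>

    extern bool pg_set_noblock(int sock);   /* src/port/noblock.c */

    /*
     * Sketch: reject a would-be client directly from the postmaster,
     * without forking.  One try and give up, since the postmaster
     * must never block waiting on a client.
     */
    static void
    report_shutdown_to_client(int sock)
    {
        /* V2-protocol error packet: 'E' then a NUL-terminated message */
        const char *msg = "Ethe database system is shutting down\n";
        int         rc;

        /* If we can't make the socket nonblocking, don't risk send() */
        if (!pg_set_noblock(sock))
            return;

        /* strlen + 1 so the terminating NUL goes out too */
        do
        {
            rc = send(sock, msg, strlen(msg) + 1, 0);
        } while (rc < 0 && errno == EINTR);
    }
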
        regards, tom lane


Re: [GENERAL] Template zero xid issue

From: Gregory Stark
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> 4. Keep spawning a child, but mark it in the BackendList as known
> doomed, and don't count such children when deciding if it's OK to
> terminate.  The problem with this idea is that such children will
> still be connected to shared memory, and we really don't want to
> terminate the postmaster before all connections to shmem are gone.
> (This objection also applies to #1, now that I think about it.)
>
> I'm sort of leaning to solution #3, but I wondered if anyone had
> a different opinion or a better idea.

A variant on option 4 would be to stop accepting new connections once there
are only known-doomed clients left. Ie, behave as if we're shut down already
but not actually exit until all the known-doomed clients drain out.

I think I agree that option 3 sounds simpler though.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com



Re: [GENERAL] Template zero xid issue

From: Tom Lane
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> 4. Keep spawning a child, but mark it in the BackendList as known
>> doomed, and don't count such children when deciding if it's OK to
>> terminate.  The problem with this idea is that such children will
>> still be connected to shared memory, and we really don't want to
>> terminate the postmaster before all connections to shmem are gone.
>> (This objection also applies to #1, now that I think about it.)
>> 
>> I'm sort of leaning to solution #3, but I wondered if anyone had
>> a different opinion or a better idea.

> A variant on option 4 would be to stop accepting new connections once there
> are only known-doomed clients left. Ie, behave as if we're shut down already
> but not actually exit until all the known-doomed clients drain out.

> I think I agree that option 3 sounds simpler though.

After further thought, option 3 is not as nice as it looks.  The
postmaster hasn't read the client's connection request message yet and
therefore doesn't know which protocol version it wants.  We deal with
this in the fork-failure case by always sending a V2-protocol error
message (which newer clients are still expected to be able to handle).
However, that's not very nice, because there's no way to include a
SQLSTATE, and so client software wouldn't know *why* it got a failure,
unless it wants to try to recognize the possibly-localized message text
(ick).  So doing this for startup/shutdown cases would be a noticeable
step backwards in client friendliness.
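
For reference, the two wire formats (byte layouts per the FE/BE
protocol documentation; the snippet is illustrative, not backend
code):

    /*
     * V2 error message: just 'E' followed by a NUL-terminated string.
     * There is nowhere to put a SQLSTATE.
     */
    static const char v2_err[] =
        "Ethe database system is shutting down\n";

    /*
     * V3 error message: 'E', an int32 length word (counting itself),
     * then a list of fields, each a 1-byte code plus a NUL-terminated
     * value, ended by a zero byte: 'S' = severity, 'C' = SQLSTATE,
     * 'M' = message text.  With SQLSTATE 57P03 (cannot_connect_now)
     * the client can tell *why* it was rejected without parsing
     * localized text.  Only the field portion is shown here; the
     * array's implicit final NUL doubles as the terminator byte.
     */
    static const char v3_fields[] =
        "SFATAL\0"
        "C57P03\0"
        "Mthe database system is shutting down\0";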

I'm tempted to stop accepting connections once we start the shutdown
checkpoint (ie, tell the bgwriter to shut down).  This would at least
limit the width of the window in which incoming connections get ignored.
We could also close the sockets at that point, so that clients get an
immediate failure instead of hanging for awhile --- then it will look
to them like the postmaster is already gone.
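
Concretely, something like this (untested) at the point where we
signal the bgwriter; ListenSocket[], MAXLISTEN, and StreamClose() are
the existing postmaster.c/pqcomm.c facilities:

    extern int  ListenSocket[];         /* unused slots are -1 */
    extern void StreamClose(int sock);

    /*
     * Sketch: once the shutdown checkpoint begins, close and forget
     * the listen sockets, so later connection attempts fail
     * immediately instead of hanging.
     */
    static void
    close_listen_sockets(void)
    {
        int         i;

        for (i = 0; i < MAXLISTEN; i++)
        {
            if (ListenSocket[i] != -1)
            {
                StreamClose(ListenSocket[i]);
                ListenSocket[i] = -1;
            }
        }
    }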

Thoughts, better ideas?
        regards, tom lane


Re: [GENERAL] Template zero xid issue

From: Decibel!
On Tue, Aug 07, 2007 at 06:46:05PM -0400, Tom Lane wrote:
> I'm tempted to stop accepting connections once we start the shutdown
> checkpoint (ie, tell the bgwriter to shut down).  This would at least
> limit the width of the window in which incoming connections get ignored.
> We could also close the sockets at that point, so that clients get an
> immediate failure instead of hanging for awhile --- then it will look
> to them like the postmaster is already gone.

Well, we're effectively shut down at that point... I don't see much
benefit to sending a "We're in the middle of shutting down" message over
"couldn't connect to the server", which is what the client would have
gotten anyway in very short order.
--
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

Re: [GENERAL] Template zero xid issue

From: Tom Lane
Decibel! <decibel@decibel.org> writes:
> On Tue, Aug 07, 2007 at 06:46:05PM -0400, Tom Lane wrote:
>> I'm tempted to stop accepting connections once we start the shutdown
>> checkpoint (ie, tell the bgwriter to shut down).  This would at least
>> limit the width of the window in which incoming connections get ignored.
>> We could also close the sockets at that point, so that clients get an
>> immediate failure instead of hanging for awhile --- then it will look
>> to them like the postmaster is already gone.

> Well, we're effectively shut down at that point... I don't see much
> benefit to sending a "We're in the middle of shutting down" message over
> "couldn't connect to the server", which is what the client would have
> gotten anyway in very short order.

After sleeping on it, I see that doesn't work either.  We have exactly
the same problem during crash recovery: if there's a sufficiently
constant flow of connection requests, the postmaster will never decide
that all the old backends are gone and it can initiate recovery.  (This
is if anything a more plausible scenario than the shutdown case, since
the clients will surely be trying to reconnect, whereas in a commanded
shutdown you'd mostly expect the clients to be gone already.)  And we
can't just throw away the sockets if we intend to restart.

The only way I can see to fix it is to have the postmaster just stop
doing accept()s for a short time while the "doomed" children drain out
of the system.  If we don't do this until all the "live" children are
gone then it should just be a short interval, since we know the "doomed"
children will exit as soon as they get a few cycles to run.
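
In ServerLoop terms that's just leaving the listen sockets out of the
select() read mask while we wait.  Rough sketch only;
DrainingDoomedChildren is a hypothetical flag meaning "all live
children gone, doomed ones still draining":

    /* inside the ServerLoop select() setup */
    fd_set      rmask;
    struct timeval timeout;

    timeout.tv_sec = 0;
    timeout.tv_usec = 100000;       /* recheck for child exits soon */

    FD_ZERO(&rmask);
    if (!DrainingDoomedChildren)    /* hypothetical flag */
    {
        int         i;

        for (i = 0; i < MAXLISTEN; i++)
        {
            if (ListenSocket[i] != -1)
                FD_SET(ListenSocket[i], &rmask);
        }
    }
    /* sockets stay open, so pending clients just sit in the backlog */
    (void) select(FD_SETSIZE, &rmask, NULL, NULL, &timeout);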

BTW, the crash-recovery case also destroys the option of just ignoring
doomed children while deciding whether it's OK to move on to the next
phase.  Because doomed children will still be attached to the old shmem
segment, they would prevent it from being deleted immediately when we
IPC_RMID it.  That could mean that our subsequent attempt to create a
new one fails due to SHMALL violation, and thus that we fail to recover
automatically after a crash.
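
To spell out the SysV mechanics for the archives (illustration only,
not backend code; old_shmid, new_shmid, key, and size stand in for the
real bookkeeping):

    #include <sys/ipc.h>
    #include <sys/shm.h>

    /*
     * IPC_RMID only *marks* the segment for deletion; it is actually
     * freed when shm_nattch drops to zero, ie when the last attached
     * process (here, a doomed child) detaches.  Until then it still
     * counts against SHMALL ...
     */
    shmctl(old_shmid, IPC_RMID, NULL);  /* marked, not yet freed */

    /* ... so creating the replacement can fail with ENOSPC: */
    new_shmid = shmget(key, size, IPC_CREAT | IPC_EXCL | 0600);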

Can anyone think of a better adjective than "doomed"?  Basically we're
going to remember whether we sent the child a canAcceptConnections state
different from CAC_OK.
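
Whatever the adjective ends up being, the bookkeeping is just one more
flag in the per-child state, something like this sketch against the
existing Backend struct in postmaster.c:

    typedef struct bkend
    {
        pid_t   pid;            /* process id of backend */
        long    cancel_key;     /* cancel key for cancels for this backend */
        bool    doomed;         /* sketch: child was started with
                                 * canAcceptConnections != CAC_OK and
                                 * will exit right after rejecting its
                                 * client */
    } Backend;
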
        regards, tom lane