Thread: Re: [GENERAL] Template zero xid issue
[ redirected to -hackers for discussion of bug fix ]

"Keaton Adams" <kadams@mxlogic.com> writes:
> Some additional information I just received from one of the operations
> personnel:
> 1. I set pg_database.datallowconn to true on template0 in an attempt to
> log into the database with psql and perform a VACUUM there. It was our
> last-ditch attempt to perform a vacuum against that database without
> shutting down postgres altogether, and it worked fine when I tested it
> in our ORT environment.

Hah, and in fact the log you sent me shows that that is exactly what
started the problem --- everything was fine until here:

<2007-08-06 09:42:50 MDT>LOG: transaction ID wrap limit is 2147484146, limited by database "template0"
<2007-08-06 09:42:50 MDT>STATEMENT: UPDATE pg_database SET datallowconn='t' WHERE datname='template0';
<2007-08-06 09:42:50 MDT>WARNING: database "template0" must be vacuumed within 736811 transactions
<2007-08-06 09:42:50 MDT>HINT: To avoid a database shutdown, execute a full-database VACUUM in "template0".
<2007-08-06 09:42:50 MDT>STATEMENT: UPDATE pg_database SET datallowconn='t' WHERE datname='template0';
<2007-08-06 09:42:50 MDT>ERROR: database is not accepting commands to avoid wraparound data loss in database "template0"
<2007-08-06 09:42:50 MDT>HINT: Stop the postmaster and use a standalone backend to vacuum database "template0".

So the whole thing boiled down to pilot error :-( ... if you hadn't
misinterpreted template0's age() reading as something you needed to Do
Something about, everything would have been fine.

FWIW, in 8.2 and up template0 is autovacuumed the same as every other
database, which hopefully will prevent people from making this sort of
mistake in the future.

That explains most of what I was wondering about in connection with the
log excerpts.
The remaining loose end was why, after successfully stopping the
bgwriter, the postmaster chose to restart everything:

<2007-08-06 09:44:07 MDT>LOG: database system is shut down
<2007-08-06 09:44:07 MDT>FATAL: the database system is shutting down
	[ lots of these snipped ]
<2007-08-06 09:44:07 MDT>LOG: background writer process (PID 5203) exited with exit code 0
<2007-08-06 09:44:07 MDT>LOG: terminating any other active server processes
<2007-08-06 09:44:07 MDT>LOG: all server processes terminated; reinitializing
<2007-08-06 09:44:07 MDT>LOG: database system was shut down at 2007-08-06 09:44:07 MDT
<2007-08-06 09:44:07 MDT>FATAL: the database system is shutting down

But I think I see what's going on there: the test for successful system
shutdown is

	if (exitstatus == 0 && Shutdown > NoShutdown && !FatalError &&
	    !DLGetHead(BackendList) && AutoVacPID == 0)

ie, we saw bgwriter do exit(0), and we are shutting down, and there is
no other activity in the system.  The problem with this apparently is
that BackendList wasn't empty, which is not surprising given the storm
of clients attempting to reconnect, as evidenced by all the "system is
shutting down" log messages.  Evidently, at the moment the bgwriter
exited, there was at least one child that had been forked but hadn't
yet finished telling its client to go away.

I think this needs to be fixed, because in principle a sufficiently
heavy load of connection requests could prevent the postmaster from
*ever* finishing the shutdown sequence --- after it's restarted and
re-stopped the bgwriter, it will come right back to this same test, and
if there are more transient children it would repeat the whole process.
Since we've not heard of any complaints of such happening in the field,
it probably needn't be back-patched, but I want a fix for 8.3.

I'm not entirely sure what's the most reasonable way to fix it though.
I can see a number of alternatives:

1. Remove the test on BackendList shown above.
This seems like a bad idea as it puts us at risk of the postmaster
shutting down prematurely, should the bgwriter happen to exit(0) while
there are still live backends.

2. Stop paying attention to incoming connection requests once we've
begun the shutdown sequence, so that no new children will be spawned.
This is probably not acceptable, particularly in the case of a "smart"
shutdown which can go on for a long time; we really want would-be
clients to get some kind of response and not just hang waiting for a
connection.

3. Don't spawn a child process to reject connection requests once we
have begun shutdown, but have the postmaster just send the response
directly, much as it does in the case of fork failure.  This is not too
bad; the only real downside is that we can't allow the postmaster to
block on any one would-be client, and so we'd have to send the error
message on a one-try-and-give-up basis, the same as we do with the fork
failure case.  But the odds of failure for that are really pretty low.

4. Keep spawning a child, but mark it in the BackendList as known
doomed, and don't count such children when deciding if it's OK to
terminate.  The problem with this idea is that such children will still
be connected to shared memory, and we really don't want to terminate
the postmaster before all connections to shmem are gone.  (This
objection also applies to #1, now that I think about it.)

I'm sort of leaning to solution #3, but I wondered if anyone had a
different opinion or a better idea.

			regards, tom lane
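[ Editorial note: a minimal sketch of the one-try-and-give-up send that option 3 calls for. The function name and message text here are hypothetical, not actual postmaster code: switch the socket to non-blocking mode, attempt a single send(), and close regardless of the outcome, so the postmaster can never block on a misbehaving client. ]

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper sketching option 3: the postmaster itself answers
 * the would-be client instead of forking a child to do it.  The socket
 * is switched to non-blocking mode and we try send() exactly once; if
 * the client's buffer is full we simply give up, the same policy the
 * fork-failure path uses.
 */
static void
reject_connection_once(int sock, const char *msg)
{
	int		flags = fcntl(sock, F_GETFL, 0);

	if (flags >= 0)
		(void) fcntl(sock, F_SETFL, flags | O_NONBLOCK);

	/* One try only: the postmaster must never block on a single client. */
	(void) send(sock, msg, strlen(msg), 0);

	close(sock);
}
```

Against a responsive client the message is delivered; against a stalled one the send() fails with EAGAIN and the connection is simply dropped, which is the trade-off accepted above.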
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> 4. Keep spawning a child, but mark it in the BackendList as known
> doomed, and don't count such children when deciding if it's OK to
> terminate.  The problem with this idea is that such children will
> still be connected to shared memory, and we really don't want to
> terminate the postmaster before all connections to shmem are gone.
> (This objection also applies to #1, now that I think about it.)
>
> I'm sort of leaning to solution #3, but I wondered if anyone had
> a different opinion or a better idea.

A variant on option 4 would be to stop accepting new connections once
there are only known-doomed clients left.  Ie, behave as if we're shut
down already but not actually exit until all the known-doomed clients
drain out.

I think I agree that option 3 sounds simpler though.

-- 
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> 4. Keep spawning a child, but mark it in the BackendList as known
>> doomed, and don't count such children when deciding if it's OK to
>> terminate.  The problem with this idea is that such children will
>> still be connected to shared memory, and we really don't want to
>> terminate the postmaster before all connections to shmem are gone.
>> (This objection also applies to #1, now that I think about it.)
>>
>> I'm sort of leaning to solution #3, but I wondered if anyone had
>> a different opinion or a better idea.

> A variant on option 4 would be to stop accepting new connections once
> there are only known-doomed clients left.  Ie, behave as if we're shut
> down already but not actually exit until all the known-doomed clients
> drain out.

> I think I agree that option 3 sounds simpler though.

After further thought, option 3 is not as nice as it looks.  The
postmaster hasn't read the client's connection request message yet and
therefore doesn't know which protocol version it wants.  We deal with
this in the fork-failure case by always sending a V2-protocol error
message (which newer clients are still expected to be able to handle).
However, that's not very nice, because there's no way to include a
SQLSTATE, and so client software wouldn't know *why* it got a failure,
unless it wants to try to recognize the possibly-localized message text
(ick).  So doing this for startup/shutdown cases would be a noticeable
step backwards in client friendliness.

I'm tempted to stop accepting connections once we start the shutdown
checkpoint (ie, tell the bgwriter to shut down).  This would at least
limit the width of the window in which incoming connections get
ignored.  We could also close the sockets at that point, so that
clients get an immediate failure instead of hanging for awhile --- then
it will look to them like the postmaster is already gone.

Thoughts, better ideas?

			regards, tom lane
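[ Editorial note: a hand-rolled sketch of the protocol difference discussed above, following the frontend/backend protocol documentation; these helpers are illustrative, not backend code. A V2 ErrorResponse is just 'E' plus a null-terminated string, so there is nowhere to carry a SQLSTATE such as 57P03 (cannot_connect_now); a V3 ErrorResponse carries tagged fields, including field code 'C' for SQLSTATE. ]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* V2 ErrorResponse: 'E' followed by a null-terminated message string.
 * No structured fields, hence no SQLSTATE. */
static size_t
build_v2_error(char *buf, const char *msg)
{
	buf[0] = 'E';
	memcpy(buf + 1, msg, strlen(msg) + 1);
	return 1 + strlen(msg) + 1;
}

/* V3 ErrorResponse: 'E', a 4-byte length (counting itself but not the
 * type byte), then tagged null-terminated fields ('S' severity,
 * 'C' SQLSTATE, 'M' message), ended by a zero byte. */
static size_t
build_v3_error(char *buf, const char *sqlstate, const char *msg)
{
	size_t		len = 5;	/* 'E' + length word, patched below */

	buf[0] = 'E';
	buf[len++] = 'S';
	memcpy(buf + len, "FATAL", 6);
	len += 6;
	buf[len++] = 'C';
	memcpy(buf + len, sqlstate, strlen(sqlstate) + 1);
	len += strlen(sqlstate) + 1;
	buf[len++] = 'M';
	memcpy(buf + len, msg, strlen(msg) + 1);
	len += strlen(msg) + 1;
	buf[len++] = '\0';		/* field-list terminator */

	uint32_t	msglen = (uint32_t) (len - 1);

	buf[1] = (char) (msglen >> 24);
	buf[2] = (char) (msglen >> 16);
	buf[3] = (char) (msglen >> 8);
	buf[4] = (char) msglen;
	return len;
}
```

The V2 packet leaves a client with nothing but the (possibly localized) message text to parse, which is exactly the "ick" above.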
On Tue, Aug 07, 2007 at 06:46:05PM -0400, Tom Lane wrote:
> I'm tempted to stop accepting connections once we start the shutdown
> checkpoint (ie, tell the bgwriter to shut down).  This would at least
> limit the width of the window in which incoming connections get
> ignored.  We could also close the sockets at that point, so that
> clients get an immediate failure instead of hanging for awhile ---
> then it will look to them like the postmaster is already gone.

Well, we're effectively shut down at that point... I don't see much
benefit to sending a "We're in the middle of shutting down" message
over "couldn't connect to the server", which is what the client would
have gotten anyway in very short order.

-- 
Decibel!, aka Jim Nasby                        decibel@decibel.org
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
Decibel! <decibel@decibel.org> writes:
> On Tue, Aug 07, 2007 at 06:46:05PM -0400, Tom Lane wrote:
>> I'm tempted to stop accepting connections once we start the shutdown
>> checkpoint (ie, tell the bgwriter to shut down).  This would at least
>> limit the width of the window in which incoming connections get
>> ignored.  We could also close the sockets at that point, so that
>> clients get an immediate failure instead of hanging for awhile ---
>> then it will look to them like the postmaster is already gone.

> Well, we're effectively shutdown at that point... I don't see much
> benefit to sending a "We're in the middle of shutting down" message
> over "couldn't connect to the server", which is what the client would
> have gotten anyway in very short order.

After sleeping on it, I see that doesn't work either.  We have exactly
the same problem during crash recovery: if there's a sufficiently
constant flow of connection requests, the postmaster will never decide
that all the old backends are gone and it can initiate recovery.  (This
is if anything a more plausible scenario than the shutdown case, since
the clients will surely be trying to reconnect, whereas in a commanded
shutdown you'd mostly expect the clients to be gone already.)  And we
can't just throw away the sockets if we intend to restart.

The only way I can see to fix it is to have the postmaster just stop
doing accept()s for a short time while the "doomed" children drain out
of the system.  If we don't do this until all the "live" children are
gone then it should just be a short interval, since we know the
"doomed" children will exit as soon as they get a few cycles to run.

BTW, the crash-recovery case also destroys the option of just ignoring
doomed children while deciding whether it's OK to move on to the next
phase.  Because doomed children will still be attached to the old shmem
segment, they would prevent it from being deleted immediately when we
IPC_RMID it.
That could mean that our subsequent attempt to create a new one fails
due to SHMALL violation, and thus that we fail to recover automatically
after a crash.

Can anyone think of a better adjective than "doomed"?  Basically we're
going to remember whether we sent the child a canAcceptConnections
state different from CAC_OK.

			regards, tom lane
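[ Editorial note: one way to picture the proposed rule, as a sketch. The struct and function names are invented for illustration; the real postmaster tracks this state differently. accept() is paused only in the narrow window where nothing but doomed children remain, and the shutdown or recovery sequence advances only once every child, doomed ones included, has exited, since they all still hold shared-memory connections. ]

```c
#include <assert.h>

/* Invented bookkeeping for the sketch; not actual postmaster state. */
typedef struct
{
	int		live_children;		/* backends started with CAC_OK */
	int		doomed_children;	/* started only to report an error */
	int		shutting_down;		/* shutdown or crash recovery begun */
} PostmasterState;

static int
should_accept_connections(const PostmasterState *pm)
{
	/*
	 * Pause accept() only in the short window where nothing but doomed
	 * children remain; they will exit as soon as they get a few cycles,
	 * so no more such children can pile up and starve the state change.
	 */
	if (pm->shutting_down && pm->live_children == 0 &&
		pm->doomed_children > 0)
		return 0;
	return 1;
}

static int
can_advance_shutdown(const PostmasterState *pm)
{
	/*
	 * All children count here, doomed ones included: they are still
	 * attached to the old shmem segment, and ignoring them risks the
	 * IPC_RMID / SHMALL failure described above.
	 */
	return pm->live_children == 0 && pm->doomed_children == 0;
}
```

With live backends still present, connections keep being accepted (and rejected by doomed children as before); the accept() pause only ever lasts as long as the doomed stragglers take to exit.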