Thread: Race-condition with failed block-write?
Hi list,

I noticed a development machine was responding terribly slowly, and found out that postgresql had consumed all available memory:

USER       PID %CPU %MEM     VSZ    RSS TTY STAT START  TIME COMMAND
postgres  7465  0.0  0.0   16636    928 ?   S    Sep05  0:00 /usr/bin/postmaster -D /var/lib/postgresql/data
postgres  7472  0.0 81.0 3144408 832032 ?   D    Sep05  0:56  \_ postgres: writer process
postgres  7473  0.0  0.0    7552    916 ?   S    Sep05  0:00  \_ postgres: stats buffer process
postgres  7474  0.0  0.0    6848    900 ?   S    Sep05  0:00  \_ postgres: stats collector process

It had taken over 800MB of memory. The machine is a P4 3.0 GHz (with HT enabled) with 1GB of memory and a few SATA disks, and runs a recent Gentoo with a 2.6.12.5 kernel. The postgresql version that failed was 8.0.3; it may or may not be worth knowing that an 8.1devel also runs on another port and from another directory. The 8.0.3 database was not actively in use, since our current development work was using the 8.1devel.

In postgresql.log a write-failure message was repeated often enough to make the log file 50MB larger. This was the first one:

[ - 2005-09-07 13:03:47 CEST @] ERROR: xlog flush request 29/67713428 is not satisfied --- flushed only to 29/2E73E488
[ - 2005-09-07 13:03:47 CEST @] CONTEXT: writing block 21 of relation 1663/2013826/9975789
[ - 2005-09-07 13:03:48 CEST @] ERROR: xlog flush request 29/67713428 is not satisfied --- flushed only to 29/2E73E488
[ - 2005-09-07 13:03:48 CEST @] CONTEXT: writing block 21 of relation 1663/2013826/9975789
[ - 2005-09-07 13:03:48 CEST @] WARNING: could not write block 21 of 1663/2013826/9975789
[ - 2005-09-07 13:03:48 CEST @] DETAIL: Multiple failures --- write error may be permanent.

And those 4 lines were repeated until memory was exhausted, as it seems from the log:

TopMemoryContext: -1095880208 total in 264213 blocks; 537938888 free (924739 chunks); -1633819096 used
MdSmgr: 8192 total in 1 blocks; 8024 free (0 chunks); 168 used
Pending Ops Table: 8192 total in 1 blocks; 6112 free (0 chunks); 2080 used
DynaHash: 8192 total in 1 blocks; 7488 free (0 chunks); 704 used
smgr relation table: 8192 total in 1 blocks; 4048 free (0 chunks); 4144 used
LockTable (locallock hash): 8192 total in 1 blocks; 6112 free (0 chunks); 2080 used
ErrorContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
[ - 2005-09-09 02:42:22 CEST @] ERROR: out of memory
[ - 2005-09-09 02:42:22 CEST @] DETAIL: Failed on request of size 16000.
After issuing an immediate shutdown, it starts and runs just fine:

[ - 2005-09-09 10:19:51 CEST @] LOG: received fast shutdown request
[ - 2005-09-09 10:23:01 CEST @] LOG: received immediate shutdown request
[ - 2005-09-09 10:27:20 CEST @] LOG: database system was interrupted at 2005-09-05 11:24:08 CEST
[ - 2005-09-09 10:27:20 CEST @] LOG: checkpoint record is at 29/2E73E44C
[ - 2005-09-09 10:27:20 CEST @] LOG: redo record is at 29/2E73E44C; undo record is at 0/0; shutdown TRUE
[ - 2005-09-09 10:27:20 CEST @] LOG: next transaction ID: 166361; next OID: 55768251
[ - 2005-09-09 10:27:20 CEST @] LOG: database system was not properly shut down; automatic recovery in progress
[ - 2005-09-09 10:27:20 CEST @] LOG: record with zero length at 29/2E73E488
[ - 2005-09-09 10:27:20 CEST @] LOG: redo is not required
[ - 2005-09-09 10:27:20 CEST @] LOG: database system is ready

On 2005-09-05 the machine was rebooted, and the only query since then (which may have triggered the issue) was:

[2005-09-07 13:03:35 CEST - 2005-09-07 13:03:37 CEST acm@acm] LOG: statement: SELECT c.oid, n.nspname, c.relname FROM pg_catalog.pg_class c LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace WHERE pg_catalog.pg_table_is_visible(c.oid) AND c.relname ~ '^pwprijs$' ORDER BY 2, 3;

After the restart I shut the PostgreSQL server down again, to prevent further changes to its data files.

Best regards,

Arjen van der Meijden

Arjen van der Meijden <acm@tweakers.net> writes:
> In postgresql.log a write-failure message was repeated often enough to
> make the log file 50MB larger:

> [ - 2005-09-07 13:03:47 CEST @] ERROR: xlog flush request 29/67713428
> is not satisfied --- flushed only to 29/2E73E488
> [ - 2005-09-07 13:03:47 CEST @] CONTEXT: writing block 21 of relation
> 1663/2013826/9975789
> ...

> TopMemoryContext: -1095880208 total in 264213 blocks; 537938888 free
> (924739 chunks); -1633819096 used
> MdSmgr: 8192 total in 1 blocks; 8024 free (0 chunks); 168 used
> Pending Ops Table: 8192 total in 1 blocks; 6112 free (0 chunks); 2080 used
> DynaHash: 8192 total in 1 blocks; 7488 free (0 chunks); 704 used
> smgr relation table: 8192 total in 1 blocks; 4048 free (0 chunks); 4144 used
> LockTable (locallock hash): 8192 total in 1 blocks; 6112 free (0
> chunks); 2080 used
> ErrorContext: 8192 total in 1 blocks; 8176 free (0 chunks); 16 used
> [ - 2005-09-09 02:42:22 CEST @] ERROR: out of memory
> [ - 2005-09-09 02:42:22 CEST @] DETAIL: Failed on request of size 16000.

The pending-ops table only exists in the bgwriter, so it's evidently the
bgwriter that was out of memory. I have an old note to myself:

: Doesn't bgwriter risk leaking memory if ereport out of a checkpoint?
: Seems we should have it run checkpoints in a short-term context.
: Don't the other daemons have similar issues?

It looks to me like you have a case of this actually happening: it kept
failing to execute a checkpoint and leaking some more memory each time.
I'll move the priority of fixing that up a bit ...

The other question is why the failure was occurring in the first place
--- corrupt LSN value, apparently, but how'd it get that way?

			regards, tom lane

Arjen van der Meijden <acm@tweakers.net> writes:
> It had taken over 800MB of memory. The machine is a P4 3.0 GHz (with HT
> enabled) with 1GB of memory and a few SATA disks, and runs a recent
> Gentoo with a 2.6.12.5 kernel.
> The postgresql version that failed was 8.0.3; it may or may not be worth
> knowing that an 8.1devel also runs on another port and from another directory.

> In postgresql.log a write-failure message was repeated often enough to
> make the log file 50MB larger:

I was able to replicate the memory leak in 8.0 by artificially poking a
bad value into the LSN field of a dirty page. CVS tip did not seem to
leak anything under the same conditions --- probably the difference is
that BufferSync() uses palloc'd temporary space in 8.0 but not in HEAD.
Nonetheless, we ought to have a positive defense against such leaks in
future, so I've applied the attached patch to both branches.

There is still the question of how the LSN value got corrupted, but
we've probably missed any chance of finding that out now.

			regards, tom lane


Index: bgwriter.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/postmaster/bgwriter.c,v
retrieving revision 1.19
diff -c -r1.19 bgwriter.c
*** bgwriter.c	20 Aug 2005 23:26:17 -0000	1.19
--- bgwriter.c	12 Sep 2005 21:42:48 -0000
***************
*** 160,165 ****
--- 160,166 ----
  BackgroundWriterMain(void)
  {
  	sigjmp_buf	local_sigjmp_buf;
+ 	MemoryContext bgwriter_context;
  
  	Assert(BgWriterShmem != NULL);
  	BgWriterShmem->bgwriter_pid = MyProcPid;
***************
*** 208,213 ****
--- 209,227 ----
  	last_checkpoint_time = time(NULL);
  
  	/*
+ 	 * Create a memory context that we will do all our work in.  We do this
+ 	 * so that we can reset the context during error recovery and thereby
+ 	 * avoid possible memory leaks.  Formerly this code just ran in
+ 	 * TopMemoryContext, but resetting that would be a really bad idea.
+ 	 */
+ 	bgwriter_context = AllocSetContextCreate(TopMemoryContext,
+ 						 "Background Writer",
+ 						 ALLOCSET_DEFAULT_MINSIZE,
+ 						 ALLOCSET_DEFAULT_INITSIZE,
+ 						 ALLOCSET_DEFAULT_MAXSIZE);
+ 	MemoryContextSwitchTo(bgwriter_context);
+ 
+ 	/*
  	 * If an exception is encountered, processing resumes here.
  	 *
  	 * See notes in postgres.c about the design of this coding.
***************
*** 247,255 ****
  		 * Now return to normal top-level context and clear ErrorContext
  		 * for next time.
  		 */
! 		MemoryContextSwitchTo(TopMemoryContext);
  		FlushErrorState();
  
  		/* Now we can allow interrupts again */
  		RESUME_INTERRUPTS();
--- 261,272 ----
  		 * Now return to normal top-level context and clear ErrorContext
  		 * for next time.
  		 */
! 		MemoryContextSwitchTo(bgwriter_context);
  		FlushErrorState();
  
+ 		/* Flush any leaked data in the top-level context */
+ 		MemoryContextResetAndDeleteChildren(bgwriter_context);
+ 
  		/* Now we can allow interrupts again */
  		RESUME_INTERRUPTS();

Arjen van der Meijden <acm@tweakers.net> writes:
> As a matter of fact, I just tested it again. Issuing that same query
> will trigger the issue again when I execute it. I don't know whether the
> query matters, but I know this one will trigger it, so I didn't try others.

It's highly unlikely that that query has anything to do with it, since
it's not touching anything but system catalogs and not trying to write
them either.

> Since it's a development machine, access to it is a possibility. But if
> you can give me a few pointers on how to gather useful information
> without any "stranger" accessing the machine, that'd be nice.

The first thing you ought to find out is which table 1663/2013826/9975789
is, and look to see if the corrupted LSN value is already present on disk
in that block. If it is, then we've probably not got much chance of
finding out how it got there. If it is *not* on disk, but you have a
repeatable way of causing this to happen starting from a clean postmaster
start, then that's pretty interesting --- but I don't know any way of
figuring it out short of groveling through the code with a debugger.
If you're not already pretty familiar with the PG code, coaching you
remotely isn't going to work very well :-(. I'd be glad to look into it
if you can get me access to the machine though.

			regards, tom lane

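(For reference: the relation behind the 1663/2013826/9975789 path in that error context can be looked up in the system catalogs along these lines. This is only a sketch, assuming the three numbers are tablespace OID / database OID / relfilenode; the pg_class query has to be run while connected to the database whose OID is 2013826.)

-- which relation owns relfilenode 9975789 (run in the database with OID 2013826)
SELECT oid, relname, relkind FROM pg_class WHERE relfilenode = 9975789;

-- cross-check the database and tablespace parts of the path
SELECT datname FROM pg_database WHERE oid = 2013826;
SELECT spcname FROM pg_tablespace WHERE oid = 1663;

In this case the lookup is what identifies the file as pg_class_relname_nsp_index, as reported later in the thread.
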
Arjen van der Meijden <acm@tweakers.net> writes:
> On 13-9-2005 16:25, Tom Lane wrote:
>> The first thing you ought to find out is which table
>> 1663/2013826/9975789 is, and look to see if the corrupted LSN value is
>> already present on disk in that block.

> Well, it's an index, not a table. It was the index:
> "pg_class_relname_nsp_index" on pg_class(relname, relnamespace).

Ah. So you've reindexed pg_class at some point. Reindexing it again
would likely get you out of this.

> Using pg_filedump I extracted the LSN for block 21 and indeed, that was
> already 67713428 instead of something below 2E73E53C. It wasn't that
> block alone though; here are a few LSN lines from it:

>  LSN: logid 41 recoff 0x676f5174 Special 8176 (0x1ff0)
>  LSN: logid 25 recoff 0x3c6c5504 Special 8176 (0x1ff0)
>  LSN: logid 41 recoff 0x2ea8a270 Special 8176 (0x1ff0)
>  LSN: logid 41 recoff 0x2ea88190 Special 8176 (0x1ff0)
>  LSN: logid 1 recoff 0x68e2f660 Special 8176 (0x1ff0)
>  LSN: logid 41 recoff 0x2ea8a270 Special 8176 (0x1ff0)
>  LSN: logid 1 recoff 0x68e2f6a4 Special 8176 (0x1ff0)

logid is the high-order half of the LSN, so there's nothing wrong with
those other pages --- it's only the first one you show there that seems
to be past the current end of WAL.

> On that day I did some active query-tuning, but a few times it took too
> long, so I issued immediate shutdowns when the selects took too long.
> There were no warnings about broken records afterwards in the log
> though, so I don't believe anything got damaged afterwards.

I have a feeling something may have gone wrong here, though it's hard to
say what. If the bogus pages in the other tables all have LSNs close to
this one, then that makes it less likely that this is a random corruption
event --- what would be more plausible is that end of WAL really was
that high and somehow the WAL counter got reset back during one of those
forced restarts.

Can you show us ls -l output for the pg_xlog directory? I'm interested
to see the file names and mod dates there.

			regards, tom lane

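(For completeness: the repair suggested here would look roughly like the following. This is a sketch only; whether 8.0 lets you rebuild pg_class's own indexes from an ordinary session, or requires extra precautions such as a standalone backend, is not verified here.)

-- rebuild all of pg_class's indexes, including pg_class_relname_nsp_index
REINDEX TABLE pg_class;

-- or, more narrowly, just the index reporting the bad LSN
REINDEX INDEX pg_class_relname_nsp_index;
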
On 13-9-2005 20:04, Tom Lane wrote:
> Arjen van der Meijden <acm@tweakers.net> writes:
>
>> On 13-9-2005 16:25, Tom Lane wrote:
>>
>> Well, it's an index, not a table. It was the index:
>> "pg_class_relname_nsp_index" on pg_class(relname, relnamespace).
>
> Ah. So you've reindexed pg_class at some point. Reindexing it again
> would likely get you out of this.

Unless reindexing is part of other commands, I didn't do that. The last
time 'grep' was able to find a reference to something being reindexed
was in June; something (maybe me, but I doubt it, since I'd also have
reindexed the user tables, I suppose) was reindexing all system tables
back then.

Besides, it's not just the index on pg_class: pg_class itself (and
pg_index) have wrong LSNs as well.

>> Using pg_filedump I extracted the LSN for block 21 and indeed, that was
>> already 67713428 instead of something below 2E73E53C. It wasn't that
>> block alone though; here are a few LSN lines from it:
>
>>  LSN: logid 41 recoff 0x676f5174 Special 8176 (0x1ff0)
>>  LSN: logid 25 recoff 0x3c6c5504 Special 8176 (0x1ff0)
>>  LSN: logid 41 recoff 0x2ea8a270 Special 8176 (0x1ff0)
>>  LSN: logid 41 recoff 0x2ea88190 Special 8176 (0x1ff0)
>>  LSN: logid 1 recoff 0x68e2f660 Special 8176 (0x1ff0)
>>  LSN: logid 41 recoff 0x2ea8a270 Special 8176 (0x1ff0)
>>  LSN: logid 1 recoff 0x68e2f6a4 Special 8176 (0x1ff0)
>
> logid is the high-order half of the LSN, so there's nothing wrong with
> those other pages --- it's only the first one you show there that seems
> to be past the current end of WAL.

There were 3 blocks out of 40 with an LSN like the first one above in
that index file, so with high-order 41 and recoff 0x67[67]-something.
In the pg_class file there were 6 blocks, of which 5 had LSNs like those
in the index. And for pg_index 3 blocks, with 1 wrong.

>> On that day I did some active query-tuning, but a few times it took too
>> long, so I issued immediate shutdowns when the selects took too long.
>> There were no warnings about broken records afterwards in the log
>> though, so I don't believe anything got damaged afterwards.
>
> I have a feeling something may have gone wrong here, though it's hard to
> say what. If the bogus pages in the other tables all have LSNs close to
> this one, then that makes it less likely that this is a random corruption
> event --- what would be more plausible is that end of WAL really was
> that high and somehow the WAL counter got reset back during one of those
> forced restarts.
>
> Can you show us ls -l output for the pg_xlog directory? I'm interested
> to see the file names and mod dates there.

Here you go:

l /var/lib/postgresql/data/pg_xlog/
total 145M
drwx------ 3 postgres postgres 4.0K Sep 1 12:37 .
drwx------ 8 postgres postgres 4.0K Sep 13 20:31 ..
-rw------- 1 postgres postgres 16M Sep 13 19:25 00000001000000290000002E
-rw------- 1 postgres postgres 16M Sep 1 12:36 000000010000002900000067
-rw------- 1 postgres postgres 16M Aug 25 11:40 000000010000002900000068
-rw------- 1 postgres postgres 16M Aug 25 11:40 000000010000002900000069
-rw------- 1 postgres postgres 16M Aug 25 11:40 00000001000000290000006A
-rw------- 1 postgres postgres 16M Aug 25 11:40 00000001000000290000006B
-rw------- 1 postgres postgres 16M Aug 25 11:40 00000001000000290000006C
-rw------- 1 postgres postgres 16M Aug 25 11:40 00000001000000290000006D
-rw------- 1 postgres postgres 16M Aug 25 11:40 00000001000000290000006E

During data load it was warning about too frequent checkpoints, but I do
hope that's mostly performance-related, not a stability issue?
Best regards, Arjen van der Meijden
Arjen van der Meijden <acm@tweakers.net> writes:
> Unless reindexing is part of other commands, I didn't do that. The last
> time 'grep' was able to find a reference to something being reindexed
> was in June; something (maybe me, but I doubt it, since I'd also have
> reindexed the user tables, I suppose) was reindexing all system tables
> back then.

Well, that would have done it. But if you have such complete logs, do
you have the postmaster log from the whole period? Look for sequences
approximately like this:

2005-09-13 14:48:33.450 EDT 25857 LOG: database system was interrupted at 2005-09-13 14:42:08 EDT
2005-09-13 14:48:33.450 EDT 25857 LOG: checkpoint record is at 0/12B65A40
2005-09-13 14:48:33.450 EDT 25857 LOG: redo record is at 0/12B65A40; undo record is at 0/0; shutdown FALSE
2005-09-13 14:48:33.450 EDT 25857 LOG: next transaction ID: 67789; next OID: 240494
2005-09-13 14:48:33.450 EDT 25857 LOG: next MultiXactId: 1; next MultiXactOffset: 0
2005-09-13 14:48:33.450 EDT 25857 LOG: database system was not properly shut down; automatic recovery in progress
2005-09-13 14:48:33.452 EDT 25857 LOG: record with zero length at 0/12B65A88
2005-09-13 14:48:33.452 EDT 25857 LOG: redo is not required
2005-09-13 14:48:33.474 EDT 25857 LOG: database system is ready

and note whether there's any one that shows a WAL end address (here
0/12B65A88) that's less than the one before.

> l /var/lib/postgresql/data/pg_xlog/
> total 145M
> drwx------ 3 postgres postgres 4.0K Sep 1 12:37 .
> drwx------ 8 postgres postgres 4.0K Sep 13 20:31 ..
> -rw------- 1 postgres postgres 16M Sep 13 19:25 00000001000000290000002E
> -rw------- 1 postgres postgres 16M Sep 1 12:36 000000010000002900000067
> -rw------- 1 postgres postgres 16M Aug 25 11:40 000000010000002900000068
> -rw------- 1 postgres postgres 16M Aug 25 11:40 000000010000002900000069
> -rw------- 1 postgres postgres 16M Aug 25 11:40 00000001000000290000006A
> -rw------- 1 postgres postgres 16M Aug 25 11:40 00000001000000290000006B
> -rw------- 1 postgres postgres 16M Aug 25 11:40 00000001000000290000006C
> -rw------- 1 postgres postgres 16M Aug 25 11:40 00000001000000290000006D
> -rw------- 1 postgres postgres 16M Aug 25 11:40 00000001000000290000006E

This is definitely ugly: there's no way all those higher-numbered
segments would be in place if the WAL counter hadn't been up to that
range. So I think we are definitely looking at a WAL-counter-went-
backward problem, not some random data corruption.

Judging from the file dates, the WAL counter was still OK (ie, above
67xxx) on Sep 1, and the problem must have occurred since then.

			regards, tom lane

On 13-9-2005 20:53, Tom Lane wrote:
> Arjen van der Meijden <acm@tweakers.net> writes:
>
>> Unless reindexing is part of other commands, I didn't do that. The last
>> time 'grep' was able to find a reference to something being reindexed
>> was in June; something (maybe me, but I doubt it, since I'd also have
>> reindexed the user tables, I suppose) was reindexing all system tables
>> back then.
>
> Well, that would have done it. But if you have such complete logs, do
> you have the postmaster log from the whole period?

I have logs from the time I started using it seriously, indeed. It was
just never rotated or anything. I only adjusted it to have proper log
prefixes on the 23rd of August, so everything before that is a bit less
easy to read. Besides that, it contains lots of "debug" stuff, like
logged queries and their statistics. It is a lot of text and the file is
currently 52MB, but without all those repeated failure messages it's
about 862KB. So I can e-mail a compressed version to you, if you're
interested in it.

> Look for sequences approximately like this:
>
> and note whether there's any one that shows a WAL end address (here
> 0/12B65A88) that's less than the one before.

I take it you mean something like this:

[ - 2005-08-23 15:29:26 CEST @] LOG: checkpoint record is at 29/2E73E410
[ - 2005-08-23 15:37:27 CEST @] LOG: checkpoint record is at 29/2E73E410
[ - 2005-08-25 10:26:27 CEST @] LOG: checkpoint record is at 29/2E8D26B4
[ - 2005-08-25 10:47:59 CEST @] LOG: checkpoint record is at 29/2E8D26F0
[ - 2005-09-05 11:24:08 CEST @] LOG: checkpoint record is at 29/2E73E44C
[ - 2005-09-09 10:27:20 CEST @] LOG: checkpoint record is at 29/2E73E44C

The last start-up on the 25th of August has a higher second value than
the start-up on the 5th of September, which was a clean start, since the
server had been shut down on Sep 1.

>> l /var/lib/postgresql/data/pg_xlog/
>> total 145M
>> drwx------ 3 postgres postgres 4.0K Sep 1 12:37 .
>> drwx------ 8 postgres postgres 4.0K Sep 13 20:31 ..
>> -rw------- 1 postgres postgres 16M Sep 13 19:25 00000001000000290000002E
>> -rw------- 1 postgres postgres 16M Sep 1 12:36 000000010000002900000067
>> -rw------- 1 postgres postgres 16M Aug 25 11:40 000000010000002900000068
>> -rw------- 1 postgres postgres 16M Aug 25 11:40 000000010000002900000069
>> -rw------- 1 postgres postgres 16M Aug 25 11:40 00000001000000290000006A
>> -rw------- 1 postgres postgres 16M Aug 25 11:40 00000001000000290000006B
>> -rw------- 1 postgres postgres 16M Aug 25 11:40 00000001000000290000006C
>> -rw------- 1 postgres postgres 16M Aug 25 11:40 00000001000000290000006D
>> -rw------- 1 postgres postgres 16M Aug 25 11:40 00000001000000290000006E
>
> This is definitely ugly: there's no way all those higher-numbered
> segments would be in place if the WAL counter hadn't been up to that
> range. So I think we are definitely looking at a WAL-counter-went-
> backward problem, not some random data corruption.
>
> Judging from the file dates, the WAL counter was still OK (ie, above
> 67xxx) on Sep 1, and the problem must have occurred since then.

That's odd. But it must have been my hard attempts to shut it down then.
These are all the lines there are for Sep 1:

[ - 2005-09-01 12:36:50 CEST @] LOG: received fast shutdown request
[ - 2005-09-01 12:36:50 CEST @] LOG: shutting down
[ - 2005-09-01 12:36:50 CEST @] LOG: database system is shut down
[ - 2005-09-01 12:37:01 CEST @] LOG: received smart shutdown request
[ - 2005-09-01 12:37:01 CEST @] LOG: shutting down
[ - 2005-09-01 12:37:01 CEST @] LOG: database system is shut down
[ - 2005-09-05 11:24:08 CEST @] LOG: database system was shut down at 2005-09-01 12:37:01 CEST
[ - 2005-09-05 11:24:08 CEST @] LOG: checkpoint record is at 29/2E73E44C
[ - 2005-09-05 11:24:08 CEST @] LOG: redo record is at 29/2E73E44C; undo record is at 0/0; shutdown TRUE
[ - 2005-09-05 11:24:08 CEST @] LOG: next transaction ID: 166361; next OID: 55768251
[ - 2005-09-05 11:24:08 CEST @] LOG: database system is ready

I'm not sure why I needed two shutdown calls at that time. It may have
been due to a psql session on a terminal I'd forgotten to close first.

Best regards,

Arjen

Arjen van der Meijden <acm@tweakers.net> writes:
> These are all the lines there are for Sep 1:

> [ - 2005-09-01 12:36:50 CEST @] LOG: received fast shutdown request
> [ - 2005-09-01 12:36:50 CEST @] LOG: shutting down
> [ - 2005-09-01 12:36:50 CEST @] LOG: database system is shut down
> [ - 2005-09-01 12:37:01 CEST @] LOG: received smart shutdown request
> [ - 2005-09-01 12:37:01 CEST @] LOG: shutting down
> [ - 2005-09-01 12:37:01 CEST @] LOG: database system is shut down

That's all? There's something awfully suspicious about that. You're
sure this is 8.0.3? AFAICS it is absolutely impossible for the 8.0
postmaster.c code to emit "received smart shutdown request" after
emitting "received fast shutdown request". The SIGINT code looks like

        if (Shutdown >= FastShutdown)
            break;
        Shutdown = FastShutdown;
        ereport(LOG,
                (errmsg("received fast shutdown request")));

and the SIGTERM code looks like

        if (Shutdown >= SmartShutdown)
            break;
        Shutdown = SmartShutdown;
        ereport(LOG,
                (errmsg("received smart shutdown request")));

and there are no other places that change the value of Shutdown, and
certainly FastShutdown > SmartShutdown. So I wonder if something got
lost in the log entries.

Another question is why the postmaster didn't exit at 12:36:50. It was
not waiting on any backends, else it would not have launched the
shutdown process (which is what emits the other two messages).

[ thinks for a bit ... ] I wonder if Shutdown ought to be marked
volatile, since it is after all changed by a signal handler. But given
the way the postmaster is coded, this doesn't seem likely to be an issue.
Basically all of the code runs with signals blocked.

Can you try to reconstruct what you did on Sep 1, and see whether you
can reproduce the above behavior?

			regards, tom lane

On 13-9-2005 23:01, Tom Lane wrote:
> Arjen van der Meijden <acm@tweakers.net> writes:
>
> That's all? There's something awfully suspicious about that. You're
> sure this is 8.0.3?

When I do "select version();":

PostgreSQL 8.0.3 on i686-pc-linux-gnu, compiled by GCC i686-pc-linux-gnu-gcc (GCC) 3.3.5-20050130 (Gentoo Linux 3.3.5.20050130-r1, ssp-3.3.5.20050130-1, pie-8.7.7.1)

I checked the emerge.log from Gentoo, and that says I installed version
8.0.3 on the 15th of August. So well before September the 1st.

> AFAICS it is absolutely impossible for the 8.0
> postmaster.c code to emit "received smart shutdown request" after
> emitting "received fast shutdown request". The SIGINT code looks like
>
> and the SIGTERM code looks like
>
> and there are no other places that change the value of Shutdown, and
> certainly FastShutdown > SmartShutdown. So I wonder if something got
> lost in the log entries.

That'd surprise me, but it would explain this behaviour. I doubt, though,
that much happened in those 11 seconds that are missing. It can't have
been a start-up without logging, since it wouldn't have logged the
shutdown then, would it? Besides that, I normally start it and shut it
down using the /etc/init.d scripts, and that script issues a fast
shutdown, so a smart shutdown should not be necessary anymore.

> Another question is why the postmaster didn't exit at 12:36:50. It was
> not waiting on any backends, else it would not have launched the
> shutdown process (which is what emits the other two messages).
>
> [ thinks for a bit ... ] I wonder if Shutdown ought to be marked
> volatile, since it is after all changed by a signal handler. But given
> the way the postmaster is coded, this doesn't seem likely to be an issue.
> Basically all of the code runs with signals blocked.
>
> Can you try to reconstruct what you did on Sep 1, and see whether you
> can reproduce the above behavior?

The only time I really recall having trouble shutting it down was when
the leak had eaten up all system memory (on 9-9). I don't know what
happened on 1-9 anymore; as far as I remember and can read back from the
log, I just (tried to) shut it down. Most likely I tried to shut it down
to free up some extra memory for the postgres 8.1 running at that time;
the 8.0.3 wasn't in use anyway.

I'll try and see if I can dig up more from the logs, and see if I can
test a few reasonable scenarios tomorrow though.

Best regards,

Arjen

Arjen van der Meijden wrote:
> On 13-9-2005 23:01, Tom Lane wrote:
>
> I'll try and see if I can dig up more from the logs, and see if I can
> test a few reasonable scenarios tomorrow though.

I wasn't able to find anything suspicious at all. A smart shutdown
should have come from the user postgres, and I can trace all the
'pg_ctl ...' calls for that postgres installation to dates other than
Sep 1, or to the 8.1 installation. Besides that, if the init.d script
with its fast shutdown hadn't responded, it seems very unlikely to me
that I'd have tried a normal shutdown after the fast shutdown; I would
have tried an immediate shutdown instead.

Besides that, the settings hadn't been changed in days at that time, so
any query issued should have been logged (log_statement was enabled),
even if it got cancelled before it finished. I also can't find kills
from root or similar actions pointing to other incorrect usage. Nor was
I silly enough to start postgres 8.1 on the 8.0.3 data store, or
postgresql 8.0.3 with pg_ctl instead of the init.d script.

So why it was able to do two shutdowns in a row, I have no clue; I can't
even think of a reasonable scenario for what happened. I wasn't actively
using or testing the database at that time, and the machine itself
wasn't shut down on that date either.

Best regards,

Arjen

Tom Lane wrote:
> Arjen van der Meijden <acm@tweakers.net> writes:
>>> I'll try and see if I can dig up more from the logs, and see if I can
>>> test a few reasonable scenarios tomorrow though.
>
>> I wasn't able to find anything suspicious at all.
>
> Is it conceivable that the init-script itself did something funny?
> There are so many different init scripts around ...

It's the default Gentoo init script, but I checked whether it could do
weird things. Anyway, this is the stop code:

stop() {
    ebegin "Stopping PostgreSQL"
    su - $PGUSER -c "/usr/bin/pg_ctl stop -D '$PGDATA' -s -m fast"
    eend $?
}

start and restart are similarly simple. ebegin and eend are, afaik,
simple macros to pretty-print to the console.

Best regards,

Arjen

Arjen van der Meijden <acm@tweakers.net> writes:
>> I'll try and see if I can dig up more from the logs, and see if I can
>> test a few reasonable scenarios tomorrow though.

> I wasn't able to find anything suspicious at all.

Is it conceivable that the init-script itself did something funny?
There are so many different init scripts around ...

			regards, tom lane

On 13-9-2005 0:26, Tom Lane wrote:
> Arjen van der Meijden <acm@tweakers.net> writes:
>
> There is still the question of how the LSN value got corrupted, but
> we've probably missed any chance of finding that out now.

As a matter of fact, I just tested it again. Issuing that same query
will trigger the issue again when I execute it. I don't know whether the
query matters, but I know this one will trigger it, so I didn't try
others. This one has the issue as a result:

SELECT c.oid, n.nspname, c.relname
  FROM pg_catalog.pg_class c
    LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
  WHERE pg_catalog.pg_table_is_visible(c.oid)
    AND c.relname ~ '^pwprijs$'
  ORDER BY 2, 3;

After the above query nothing happens for a few seconds, and then these
errors show up again:

[ - 2005-09-13 08:23:10 CEST @] ERROR: xlog flush request 29/67713428 is not satisfied --- flushed only to 29/2E73E53C
[ - 2005-09-13 08:23:10 CEST @] CONTEXT: writing block 21 of relation 1663/2013826/9975789
[ - 2005-09-13 08:23:10 CEST @] WARNING: could not write block 21 of 1663/2013826/9975789

I stopped postgresql again to prevent changes to the on-disk data. But I
did start postgresql a second time to see if that query would trigger it
again, and it did.

So what can I do to help you find the issue? Dumping the data to a file
and sending it to you would probably be a waste of bandwidth and
trouble, apart from the fact that I'd need my employer's permission to
do so anyway.

Since it's a development machine, access to it is a possibility. But if
you can give me a few pointers on how to gather useful information
without any "stranger" accessing the machine, that'd be nice.

Best regards,

Arjen van der Meijden

On 13-9-2005 16:25, Tom Lane wrote:
> Arjen van der Meijden <acm@tweakers.net> writes:
>
> It's highly unlikely that that query has anything to do with it, since
> it's not touching anything but system catalogs and not trying to write
> them either.

Indeed, other things trigger it as well.

> The first thing you ought to find out is which table
> 1663/2013826/9975789 is, and look to see if the corrupted LSN value is
> already present on disk in that block.

Well, it's an index, not a table. It was the index:
"pg_class_relname_nsp_index" on pg_class(relname, relnamespace).

Using pg_filedump I extracted the LSN for block 21 and indeed, that was
already 67713428 instead of something below 2E73E53C. It wasn't that
block alone though; here are a few LSN lines from it:

 LSN: logid 41 recoff 0x676f5174 Special 8176 (0x1ff0)
 LSN: logid 25 recoff 0x3c6c5504 Special 8176 (0x1ff0)
 LSN: logid 41 recoff 0x2ea8a270 Special 8176 (0x1ff0)
 LSN: logid 41 recoff 0x2ea88190 Special 8176 (0x1ff0)
 LSN: logid 1 recoff 0x68e2f660 Special 8176 (0x1ff0)
 LSN: logid 41 recoff 0x2ea8a270 Special 8176 (0x1ff0)
 LSN: logid 1 recoff 0x68e2f6a4 Special 8176 (0x1ff0)

I tried other files, and each one I tried only had LSNs of 0. When
trying to determine which table that index belonged to (\d indexname in
psql), I noticed it got the errors again, but for another file (pg_index
this time). And another try (oid2name ...) after that hit yet another
file (the pg_class table).

All those files were last changed somewhere around August 25, so no new
changes since then. On that day I did some active query-tuning, but a
few times it took too long, so I issued immediate shutdowns when the
selects took too long. There were no warnings about broken records
afterwards in the log though, so I don't believe anything got damaged
afterwards. After that I loaded some fresh data from a production
database, using either pg_restore or psql < some-file-from-pg_dump.sql
(I don't remember which one). A few days later I shut down that postgres,
installed 8.1-beta and used that (in another directory, of course); this
8.0.3 only came back up because of a reboot and hasn't been used since
that reboot.

I guess those system tables got mixed up during that reloading?

> If it is, then we've probably
> not got much chance of finding out how it got there. If it is *not* on
> disk, but you have a repeatable way of causing this to happen starting
> from a clean postmaster start, then that's pretty interesting --- but
> I don't know any way of figuring it out short of groveling through the
> code with a debugger. If you're not already pretty familiar with the PG
> code, coaching you remotely isn't going to work very well :-(. I'd be
> glad to look into it if you can get me access to the machine though.

Well, I can very probably give you that access. But as you say, finding
out what went wrong is very hard to do.

Best regards,

Arjen van der Meijden