Thread: apparent wraparound
After a system crash, postgresql 8.1.4 restarted but reported that I have an apparent wraparound: 2006-07-13 14:03:40 PDT [10092] LOG: database system was interrupted at 2006-07-13 13:22:19 PDT 2006-07-13 14:03:40 PDT [10092] LOG: checkpoint record is at 1DD/26283E18 2006-07-13 14:03:40 PDT [10092] LOG: redo record is at 1DD/26283E18; undo record is at 0/0; shutdown FALSE 2006-07-13 14:03:40 PDT [10092] LOG: next transaction ID: 169855318; next OID: 787933 2006-07-13 14:03:40 PDT [10092] LOG: next MultiXactId: 5475264; next MultiXactOffset: 13765525 2006-07-13 14:03:40 PDT [10092] LOG: database system was not properly shut down; automatic recovery in progress 2006-07-13 14:03:40 PDT [10092] LOG: record with zero length at 1DD/26283E68 2006-07-13 14:03:40 PDT [10092] LOG: redo is not required 2006-07-13 14:03:40 PDT [10092] LOG: could not truncate directory "pg_multixact/offsets": apparent wraparound 2006-07-13 14:03:40 PDT [10092] LOG: could not truncate directory "pg_multixact/members": apparent wraparound 2006-07-13 14:03:41 PDT [10093] [unknown]%[unknown] LOG: connection received: host=[local] 2006-07-13 14:03:41 PDT [10093] postgres%postgres FATAL: the database system is starting up 2006-07-13 14:03:41 PDT [10092] LOG: database system is ready 2006-07-13 14:03:41 PDT [10092] LOG: transaction ID wrap limit is 1243594092, limited by database "csb-dev" This is from: PostgreSQL 8.1.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.0.2 20050901 (prerelease) (SUSE Linux) I'm using autovacuum and it ran around 13:15 on database csb-dev. I see no errors in the logs and therefore assume that the vacuum was successful. There are two files left in pg_multixact: -rw------- 1 postgres postgres 16K 2006-07-13 14:13 pg_multixact/members/00D2 -rw------- 1 postgres postgres 144K 2006-07-13 14:13 pg_multixact/offsets/0053 The system crash occurred during scsi rescanning that was initiated by an admin. The machine has been extremely stable otherwise and I have no reason to suspect hardware flakiness. In http://www.mail-archive.com/pgsql-general@postgresql.org/msg76635.html Tom Lane implies that I can probably ignore these messages. So, the 64K questions: Can I really ignore this? Is there anything I can do to ascertain whether it's a false alarm? Thanks, Reece -- Reece Hart, Ph.D. rkh@gene.com, http://www.gene.com/ Genentech, Inc. 650-225-6133 (voice), -5389 (fax) Bioinformatics and Protein Engineering 1 DNA Way, MS-93 http://harts.net/reece/ South San Francisco, CA 94080-4990 reece@harts.net, GPG:0x25EC91A0
Reece Hart <hart.reece@gene.com> writes: > After a system crash, postgresql 8.1.4 restarted but reported that I > have an apparent wraparound: > ... > 2006-07-13 14:03:40 PDT [10092] LOG: next MultiXactId: 5475264; next MultiXactOffset: 13765525 > ... > 2006-07-13 14:03:40 PDT [10092] LOG: could not truncate directory "pg_multixact/offsets": apparent wraparound > 2006-07-13 14:03:40 PDT [10092] LOG: could not truncate directory "pg_multixact/members": apparent wraparound > There are two files left in pg_multixact: > -rw------- 1 postgres postgres 16K 2006-07-13 14:13 pg_multixact/members/00D2 > -rw------- 1 postgres postgres 144K 2006-07-13 14:13 pg_multixact/offsets/0053 That's odd. Those files correspond to the next MultiXactId and MultiXactOffset, so there shouldn't have been any such complaint. [ looks at code... ] I wonder if this is happening because shared->latest_page_number hasn't been set up yet when we do the end-of-recovery checkpoint. > In > http://www.mail-archive.com/pgsql-general@postgresql.org/msg76635.html > Tom Lane implies that I can probably ignore these messages. No, I was saying that the "invalid server process ID -1" was harmless. The "apparent wraparound" is a distinct issue, and I'd ask you the same question I asked Thomas: do you continue to get those log messages during subsequent checkpoints? regards, tom lane
Tom Lane wrote: > I'd ask you the same question I asked Thomas: do you continue to get those log messages > during subsequent checkpoints? No, I don't. The error did not reappear during ~2h of continuous inserts since my report, didn't reappear after a forced checkpoint (i.e., via psql), and did not reappear on a recent stop/start cycle. There was a period when my cron-driven vacuuming was broken and, in principle, I might have been susceptible to wraparound. However, I don't see how we could have had 1B transactions in that period. One other tidbit: a colleague inadvertently updated ~10M records. After this, I started getting errors like: number of page slots needed (2952496) exceeds max_fsm_pages (500000) I restored from a backup, but still have: 'number of page slots needed (183248) exceeds max_fsm_pages (50000)' (I reduced max_fsm_pages after the restore.) I'm not sure whether the vacuum and fsm info is relevant. -Reece -- Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
Reece -- The "number of slots needed exceeds" messages are telling you that the current FSM (Free Space Map) does not have enoughspace allocated to track all of the old tuples that are to be reused. I suspect that having such a situation wouldeffect the wraparound issue, since you'd have "dead wood" which hasn't been recycled. You need to edit the postgresql.conf file and increase the max_fsm_pages and max_fsm_relations parameters and then restartpostgres (I think you have to actually stop and restart, as opposed to a reload, but I could be wrong). You may endup needing to adjust the total amount of RAM allocated to Shared Memory to allow for as large an FSM as you'll need. Thatrequires a system reboot. HTH, Greg Williamson DBA GlobeXplorer LLC -----Original Message----- From: pgsql-general-owner@postgresql.org on behalf of Reece Hart Sent: Fri 7/14/2006 12:30 PM To: pgsql-general Cc: Subject: Re: [GENERAL] apparent wraparound Tom Lane wrote: > I'd ask you the same question I asked Thomas: do you continue to get those log messages > during subsequent checkpoints? No, I don't. The error did not reappear during ~2h of continuous inserts since my report, didn't reappear after a forced checkpoint (i.e., via psql), and did not reappear on a recent stop/start cycle. There was a period when my cron-driven vacuuming was broken and, in principle, I might have been susceptible to wraparound. However, I don't see how we could have had 1B transactions in that period. One other tidbit: a colleague inadvertently updated ~10M records. After this, I started getting errors like: number of page slots needed (2952496) exceeds max_fsm_pages (500000) I restored from a backup, but still have: 'number of page slots needed (183248) exceeds max_fsm_pages (50000)' (I reduced max_fsm_pages after the restore.) I'm not sure whether the vacuum and fsm info is relevant. -Reece -- Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0 ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org !DSPAM:44b7f15495741414113241!
Gregory S. Williamson wrote: > You need to edit the postgresql.conf file and increase the max_fsm_pages and > max_fsm_relations parameters and then restart postgres (I think you > have to actually stop and restart, as opposed to a reload, but I could be > wrong). You may end up needing to adjust the total amount of RAM > allocated to Shared Memory to allow for as large an FSM as you'll > need. That requires a system reboot. Actually, it doesn't require a system reboot. On debian/ubunut, and AFAIK on redhat too, you just need to edit /etc/sysctl.conf, and then call sysctl -p to update the in-kernel settings. greetings, Florian Pflug
Florian G. Pflug wrote: > Gregory S. Williamson wrote: >> You need to edit the postgresql.conf file and increase the >> max_fsm_pages and > > max_fsm_relations parameters and then restart postgres (I think you > > have to actually stop and restart, as opposed to a reload, but I > could be > > wrong). You may end up needing to adjust the total amount of RAM > > allocated to Shared Memory to allow for as large an FSM as you'll > > need. That requires a system reboot. > Actually, it doesn't require a system reboot. On debian/ubunut, and > AFAIK on redhat too, you just need to edit /etc/sysctl.conf, and > then call sysctl -p to update the in-kernel settings. Most modern day unix does not need to reboot... FreeBSD (although there are a couple it still needs to reboot for IIRC) Linux MacOSX None of these need to reboot to set those parameters. Sincerely, Joshua D. Drake > > greetings, Florian Pflug > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq > -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/
Greg, Florian, Joshua, Tom- On Fri, 2006-07-14 at 17:02 -0700, Gregory S. Williamson wrote: > You need to edit the postgresql.conf file and increase the > max_fsm_pages and max_fsm_relations parameters and then restart > postgres I did this and vacuumed. I didn't need to up shmmax. The problem's disappeared (and, fortunately, my data are still intact). Thanks everyone for your help. -Reece -- Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0
I wrote: > Reece Hart <hart.reece@gene.com> writes: >> After a system crash, postgresql 8.1.4 restarted but reported that I >> have an apparent wraparound: >> 2006-07-13 14:03:40 PDT [10092] LOG: could not truncate directory "pg_multixact/offsets": apparent wraparound >> 2006-07-13 14:03:40 PDT [10092] LOG: could not truncate directory "pg_multixact/members": apparent wraparound > [ looks at code... ] I wonder if this is happening because > shared->latest_page_number hasn't been set up yet when we do the > end-of-recovery checkpoint. Indeed, I was able to reproduce this after advancing the next-mxact counters far enough. The consequences seem only cosmetic (ie, a scary log message during recovery), but still worth fixing. I committed a patch in HEAD and 8.1. regards, tom lane
Hi all, I'm trying to run pg_regress as part of a build of Postgres 8.1 on Suse Linux Enterprise Server 9. I keep getting the following error: ./pg_regress: line 264: kill: (25049) - No such process I chased the problem down to the following lines in pg_regress: "$bindir/postmaster" -D "$PGDATA" -F $postmaster_options >"$LOGDIR/postmaster.log"; 2>&1 & postmaster_pid=$! When I run sh -x pg_regress, I get: +/usr/src/packages/BUILD/postgresql-8.1.4/src/test/regress/./tmp_check/install//usr/bin/postmaster \ /usr/src/packages/BUILD/postgresql-8.1.4/src/test/regress/./tmp_check/data -F -c listen_addresses= + postmaster_pid=25049 As I understand it, $! is supposed to return the process id of the last backgrounded process. However, in my case it's return a process ID 1 greater than the actual postmaster instance. I end up with the following process tree 25048 ? Ss 0:00 /usr/src/packages/BUILD/postgresql-8.1.4/src/test/regress/./tmp_check/install//usr/bin/postmaster -D /usr/s 25054 ? S 0:00 \_ postgres: logger process 25056 ? S 0:00 \_ postgres: writer process 25057 ? S 0:00 \_ postgres: stats buffer process 25058 ? S 0:00 \_ postgres: stats collector process Then of course when the script tries kill -0 on a 25049 to see if the postmaster came up, it wrongly concludes that postmaster did not start. Can anyone help understand this behavior and/or get the proper process ID? Thanks, Eric
Eric E <whalesuit@gmail.com> writes: > I chased the problem down to the following lines in pg_regress: > "$bindir/postmaster" -D "$PGDATA" -F $postmaster_options > > "$LOGDIR/postmaster.log"; 2>&1 & ^ Is there really a semicolon there? There should not be (and isn't in my copy of 8.1) regards, tom lane
Sorry - that semicolon is an artifact of a change I made - I attempted to put postmaster_pid=$! inside the executing statement, but as I discovered, if you put it before last &, you get the backgrounded process before postmaster. I'm presently checking out how the SLES init scripts work, but any thoughts would be very welcome. Thanks, EE Tom Lane wrote: > Eric E <whalesuit@gmail.com> writes: > >> I chased the problem down to the following lines in pg_regress: >> > > >> "$bindir/postmaster" -D "$PGDATA" -F $postmaster_options >> > "$LOGDIR/postmaster.log"; 2>&1 & >> > ^ > > Is there really a semicolon there? There should not be (and isn't in > my copy of 8.1) > > regards, tom lane > >