Thread: BUG #5176: database recovery produces infinite loop
The following bug has been logged online:

Bug reference:      5176
Logged by:          Helge Milde
Email address:      helge@monsternett.no
PostgreSQL version: 8.1.18
Operating system:   Debian 2.6.18-6-686
Description:        database recovery produces infinite loop
Details:

Our database recently crashed, and since then, we haven't been able to start
it again. Here's the postmaster log with debug level 5:

--- log start ---
LOG: could not load root certificate file "root.crt": No SSL error reported
DETAIL: Will not verify client certificates.
DEBUG: invoking IpcMemoryCreate(size=10469376)
DEBUG: max_safe_fds = 985, usable_fds = 1000, already_open = 5
LOG: database system was interrupted while in recovery at 2009-11-10 13:07:47 CET
HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
LOG: checkpoint record is at 8/931B8558
LOG: redo record is at 8/931B3544; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 566148231; next OID: 25210
LOG: next MultiXactId: 18; next MultiXactOffset: 37
LOG: database system was not properly shut down; automatic recovery in progress
LOG: redo starts at 8/931B3544
--- log end ---

Nothing else of interest follows. It seems the recovery process is failing;
doing a strace on the 'postgres: startup process' pid, I see this:

-- strace log start ---
Process 29888 attached - interrupt to quit
close(14) = 0
open("pg_clog/021B", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 14
_llseek(14, 245760, [245760], SEEK_SET) = 0
write(14, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
fsync(14) = 0
close(14) = 0
open("pg_clog/021B", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 14
_llseek(14, 245760, [245760], SEEK_SET) = 0
write(14, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
fsync(14) = 0
close(14) = 0
-- strace log end ---

This repeats infinitely.

We do have some database dumps, but they are not very recent, so any help on
fixing this would be much appreciated.

Thanks, Helge Milde
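
As a rough aid for reading the strace output above, the file name and seek offset can be mapped back to a range of transaction IDs. The sketch below assumes a stock build: 8192-byte pages (which matches the 8192-byte writes), 2 commit-status bits per transaction, and 32 pages per pg_clog segment file. None of these constants are stated in the report, and the helper names are made up for illustration. Under those assumptions, the page being rewritten is the one immediately after the page that holds the "next transaction ID" from the log. That only locates the page; it does not explain the loop.

```python
# Assumed stock PostgreSQL 8.1 build constants (not stated in the thread).
BLCKSZ = 8192                 # page size in bytes; matches the 8192-byte writes
CLOG_XACTS_PER_BYTE = 4       # 2 status bits per transaction
SLRU_PAGES_PER_SEGMENT = 32   # pages stored in each pg_clog segment file

XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE


def clog_page_xid_range(segment_name, byte_offset):
    """Map a pg_clog file name and byte offset to (page, first_xid, last_xid)."""
    segment = int(segment_name, 16)          # segment files are named in hex
    page_in_segment = byte_offset // BLCKSZ
    page = segment * SLRU_PAGES_PER_SEGMENT + page_in_segment
    first_xid = page * XACTS_PER_PAGE
    return page, first_xid, first_xid + XACTS_PER_PAGE - 1


def clog_page_of_xid(xid):
    """Return the global clog page number that holds the given transaction ID."""
    return xid // XACTS_PER_PAGE


# Values taken from the strace output and the postmaster log in the report.
page, first_xid, last_xid = clog_page_xid_range("021B", 245760)
next_xid_page = clog_page_of_xid(566148231)

print("page written in the loop:", page, "covering XIDs", first_xid, "to", last_xid)
print("page holding next transaction ID:", next_xid_page)
```

If the installation was built with a non-default block size, the numbers change accordingly.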
"Helge Milde" <helge@monsternett.no> writes: > It seems the recovery process is failing; doing a strace on the 'postgres: > startup process' pid, I see this: > -- strace log start --- > Process 29888 attached - interrupt to quit > close(14) = 0 > open("pg_clog/021B", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 14 > _llseek(14, 245760, [245760], SEEK_SET) = 0 > write(14, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) > = 8192 > fsync(14) = 0 > close(14) = 0 > open("pg_clog/021B", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 14 > _llseek(14, 245760, [245760], SEEK_SET) = 0 > write(14, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) > = 8192 > fsync(14) = 0 > close(14) = 0 > -- strace log end --- > This repeats infinetly. Hmm, can you attach to that process with gdb and collect a few stack traces to show where it's looping? This isn't a symptom we've seen before, AFAIR. regards, tom lane
I'm sorry, but the issue has been fixed now; we've upgraded to postgres-8.3,
so I can't reproduce this anymore.

I'm not completely sure how it got fixed (it wasn't me), but I can ask the
person who fixed it and get back to you tomorrow; perhaps that might help
narrow the field a little bit.

On Tue, Nov 10, 2009 at 09:31:06AM -0500, Tom Lane wrote:
>"Helge Milde" <helge@monsternett.no> writes:
>> It seems the recovery process is failing; doing a strace on the 'postgres:
>> startup process' pid, I see this:
>
>> -- strace log start ---
>> Process 29888 attached - interrupt to quit
>> close(14) = 0
>> open("pg_clog/021B", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 14
>> _llseek(14, 245760, [245760], SEEK_SET) = 0
>> write(14, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
>> fsync(14) = 0
>> close(14) = 0
>> open("pg_clog/021B", O_RDWR|O_CREAT|O_LARGEFILE, 0600) = 14
>> _llseek(14, 245760, [245760], SEEK_SET) = 0
>> write(14, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
>> fsync(14) = 0
>> close(14) = 0
>> -- strace log end ---
>
>> This repeats infinitely.
>
>Hmm, can you attach to that process with gdb and collect a few stack
>traces to show where it's looping? This isn't a symptom we've seen
>before, AFAIR.
>
>			regards, tom lane

-- 
Helge Milde, 69701808
www.monsternett.no
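
For anyone who hits the same symptom and can still reproduce it, the "few stack traces" requested above could be collected along these lines. This is only a sketch: it assumes gdb is installed, that the caller is allowed to attach to the startup process (typically the postgres user or root), and that the server binary has enough symbol information for the backtraces to be readable. The function names and the sample count are arbitrary choices, not anything prescribed in the thread.

```python
import subprocess
import time


def gdb_backtrace_command(pid):
    """Build a gdb command line that attaches to pid, prints one backtrace, and exits."""
    # -p attaches to a running process, -batch exits after the commands run,
    # and -ex supplies a single gdb command ("bt" prints the call stack).
    return ["gdb", "-p", str(pid), "-batch", "-ex", "bt"]


def collect_backtraces(pid, samples=5, pause_seconds=1.0):
    """Attach repeatedly and return the text of each backtrace sample."""
    traces = []
    for _ in range(samples):
        result = subprocess.run(
            gdb_backtrace_command(pid),
            capture_output=True,
            text=True,
            check=False,  # keep going even if one attach attempt fails
        )
        traces.append(result.stdout)
        time.sleep(pause_seconds)  # let the process run between samples
    return traces


# Example call, using the pid reported by strace in the original message:
#     for trace in collect_backtraces(29888):
#         print(trace)
```

Taking several samples a second or so apart, rather than one, makes it easier to see which frames stay constant while the loop runs.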