Thread: Emergency - Need assistance
I received the following error message when trying to copy a table from
one database to another on the same cluster:

pg_dump: The command was: FETCH 100 FROM _pg_dump_cursor
pg_restore: [custom archiver] could not read data block -- expected 1, got 0
pg_restore: *** aborted because of error

The table contains a bytea column which houses PDF documents. Is this a
sign of corrupted data, and if so, would setting "zero_damaged_pages = true"
allow the copy to proceed? The table is about 25GB in size and takes a
long time to dump/restore, and I'm running out of time to get the cluster
back into production.

Note, running:
PostgreSQL 8.1beta4 on x86_64-unknown-linux-gnu, compiled by GCC gcc
(GCC) 3.4.4 20050721 (Red Hat 3.4.4-2)

--
Warren Little
CTO
Meridias Capital Inc
ph 866.369.7763
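(For context: zero_damaged_pages is a superuser-only server setting, and it
irrecoverably zeroes any page it flags, so it destroys data as well as
evidence. A minimal sketch of enabling it for just the dump session rather
than cluster-wide, assuming superuser access and using the copy command
that appears later in the thread:)

  # Hedged sketch: libpq's PGOPTIONS passes the setting to the backend at
  # session start, so only this dump session runs with pages being zeroed.
  PGOPTIONS='-c zero_damaged_pages=on' \
      pg_dump -Fc --table=casedocument -d tigrissave | pg_restore --verbose -d tigris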
warren little <warren.little@meridiascapital.com> writes:
> I received the following error message when trying to copy a table from
> one database to another on the same cluster:

> pg_dump: The command was: FETCH 100 FROM _pg_dump_cursor
> pg_restore: [custom archiver] could not read data block -- expected 1,
> got 0
> pg_restore: *** aborted because of error

You seem to have omitted the messages that would indicate what's
actually wrong; the above is all just subsidiary damage after whatever
caused the FETCH to fail.

> The table is about 25GB in size and takes a long time to dump/restore
> and I'm running out of time to get the cluster back into production.

> note running:
> PostgreSQL 8.1beta4 on x86_64-unknown-linux-gnu, compiled by GCC gcc
> (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2)

You're running a production database on a beta release??

			regards, tom lane
Tom,
The extent of the messages I received from the command

  pg_dump -Fc --table=casedocument -d tigrissave | pg_restore --verbose -d tigris

is listed below:

pg_dump: SQL command failed
pg_dump: Error message from server: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
pg_dump: The command was: FETCH 100 FROM _pg_dump_cursor
pg_restore: [custom archiver] could not read data block -- expected 1, got 0
pg_restore: *** aborted because of error

I had removed all the files in pg_log prior to getting this error, and no
new logfile was created. I'm guessing I screwed up the logger when removing
all the files, but I assumed that when writing to the error logs the
backend would create a file if one did not exist. I'm currently attempting
to run the dump/restore with zero_damaged_pages turned on to see if the
results yield something more useful.

About the beta version: this is temporary; we hadn't really planned on
running production on our development box. We haven't had any issues with
8.1beta for a few months and will be moving to 8.1.x as soon as some new
hardware arrives (about a week).

thanks

On Mon, 2006-01-02 at 15:10 -0500, Tom Lane wrote:
> warren little <warren.little@meridiascapital.com> writes:
> > I received the following error message when trying to copy a table from
> > one database to another on the same cluster:
>
> > pg_dump: The command was: FETCH 100 FROM _pg_dump_cursor
> > pg_restore: [custom archiver] could not read data block -- expected 1,
> > got 0
> > pg_restore: *** aborted because of error
>
> You seem to have omitted the messages that would indicate what's
> actually wrong; the above is all just subsidiary damage after whatever
> caused the FETCH to fail.
>
> > The table is about 25GB in size and takes a long time to dump/restore
> > and I'm running out of time to get the cluster back into production.
>
> > note running:
> > PostgreSQL 8.1beta4 on x86_64-unknown-linux-gnu, compiled by GCC gcc
> > (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2)
>
> You're running a production database on a beta release??
>
> 			regards, tom lane

--
Warren Little
CTO
Meridias Capital Inc
ph 866.369.7763
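(An aside on capturing the messages: in a pipeline like the above, both
programs write their errors to the same terminal, so output is easily
interleaved or lost. A small sketch of the same command that keeps each
program's stderr in its own file for later inspection:)

  # same pipeline, with each side's errors captured separately
  pg_dump -Fc --table=casedocument -d tigrissave 2>pg_dump.err \
      | pg_restore --verbose -d tigris 2>pg_restore.err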
warren little <warren.little@meridiascapital.com> writes:
> pg_dump: SQL command failed
> pg_dump: Error message from server: server closed the connection
> unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> pg_dump: The command was: FETCH 100 FROM _pg_dump_cursor

Hmm. This could mean corrupted data files, but it's hard to be sure
without more info.

> I had removed all the files in pg_log prior to getting this error, and no
> new logfile was created. I'm guessing I screwed up the logger when
> removing all the files, but I assumed that when writing to the error
> logs the backend would create a file if one did not exist.

The file *does* exist, there's just no directory link to it anymore :-(
You need to force a logfile rotation, which might be most easily done by
stopping and restarting the postmaster.

What you need to do is see the postmaster log entry about the backend
crash. If it's dying on a signal (likely sig11 = SEGV) then inspecting
the core file might yield useful information.

> I'm currently attempting to run the dump/restore with zero_damaged_pages
> turned on to see if the results yield something more useful.

That really ought to be the last resort, not the first one, because it
will destroy not only data but most of the evidence about what went
wrong...

			regards, tom lane
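(On Linux, an unlinked-but-still-open logfile can usually be read back
through /proc before the restart, because the logger process still holds
an open descriptor to it. A sketch, assuming lsof is available; the PID
and fd number below are placeholders to fill in from lsof's output:)

  # list deleted files still held open by postgres processes
  lsof -u postgres | grep -i deleted
  # copy the contents out through /proc while the process is still running
  cp /proc/<logger_pid>/fd/<fd_number> /tmp/recovered_pg_log.txt
  # then force the rotation by restarting the postmaster, as suggested above
  pg_ctl -D "$PGDATA" restart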
The dump/restore failed even with zero_damaged_pages=true.
The logfile (postgresql-2006-01-02_130023.log) did not have much in the
way of useful info. I've attached the section of the logfile around the
time of the crash. I cannot find any sign of a core file. Where might
the core dump have landed?

Regarding your comments about losing the evidence: the data I'm trying
to load is in another database in the same cluster, which I have no
intention of purging until I can get the table moved to the new database.

thanks

On Mon, 2006-01-02 at 16:34 -0500, Tom Lane wrote:
> warren little <warren.little@meridiascapital.com> writes:
> > pg_dump: SQL command failed
> > pg_dump: Error message from server: server closed the connection
> > unexpectedly
> >         This probably means the server terminated abnormally
> >         before or while processing the request.
> > pg_dump: The command was: FETCH 100 FROM _pg_dump_cursor
>
> Hmm. This could mean corrupted data files, but it's hard to be sure
> without more info.
>
> > I had removed all the files in pg_log prior to getting this error, and no
> > new logfile was created. I'm guessing I screwed up the logger when
> > removing all the files, but I assumed that when writing to the error
> > logs the backend would create a file if one did not exist.
>
> The file *does* exist, there's just no directory link to it anymore :-(
> You need to force a logfile rotation, which might be most easily done by
> stopping and restarting the postmaster.
>
> What you need to do is see the postmaster log entry about the backend
> crash. If it's dying on a signal (likely sig11 = SEGV) then inspecting
> the core file might yield useful information.
>
> > I'm currently attempting to run the dump/restore with zero_damaged_pages
> > turned on to see if the results yield something more useful.
>
> That really ought to be the last resort, not the first one, because it
> will destroy not only data but most of the evidence about what went
> wrong...
>
> 			regards, tom lane
Sorry, forgot the attachment.

On Mon, 2006-01-02 at 15:24 -0700, warren little wrote:
> The dump/restore failed even with zero_damaged_pages=true.
> The logfile (postgresql-2006-01-02_130023.log) did not have much in the
> way of useful info. I've attached the section of the logfile around the
> time of the crash. I cannot find any sign of a core file. Where might
> the core dump have landed?
>
> Regarding your comments about losing the evidence: the data I'm trying
> to load is in another database in the same cluster, which I have no
> intention of purging until I can get the table moved to the new database.
>
> thanks
[Attachment: excerpt of postgresql-2006-01-02_130023.log]
warren little <warren.little@meridiascapital.com> writes:
> The dump/restore failed even with zero_damaged_pages=true.
> The logfile (postgresql-2006-01-02_130023.log) did not have much in the
> way of useful info. I've attached the section of the logfile around the
> time of the crash. I cannot find any sign of a core file. Where might
> the core dump have landed?

It would typically go into $PGDATA (if you're using 8.1) or some
subdirectory thereof (if you're using an older release). There are some
platforms such as OS X that put core files in a special directory /cores,
so check for that too.

If you're not finding any corefile then the most likely bet is that the
postmaster has been launched under "ulimit -c 0" which forbids dropping
a corefile. (This seems to be the default environment under many
Linuxen.) I'd suggest adding "ulimit -c unlimited" to the postmaster
start script you're using, restarting the postmaster, and repeating the
dump to cause the crash again.

			regards, tom lane
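(A sketch of what that start-script change and the follow-up inspection
might look like. The binary path and core file name are illustrative
assumptions; adjust them to the actual installation:)

  # in the postmaster start script, before launching the server:
  ulimit -c unlimited
  su - postgres -c "pg_ctl -D $PGDATA start"

  # after reproducing the crash, look for the core under the data directory
  find "$PGDATA" -name 'core*'
  # on Linux, the kernel's core_pattern shows where cores are written
  cat /proc/sys/kernel/core_pattern
  # then pull a backtrace from the core file with gdb
  gdb /usr/local/pgsql/bin/postgres "$PGDATA/core.<pid>"
  (gdb) bt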