Thread: Postgres crashes,help to recover
I have Postgres 8 running on Windows XP The size of data subdirectory is 326 MB Today morning suddenly one table in one database, firma1.klient is corrupted. When trying to backup it using pgAdmin III I get the log below and backup is not created. I have: 1. Compressed backup of the whole database as of 15.7 2. Compressed backup of the firma1 schema of this database as of 19.7 where corrupted table klient resides. How to get the database back working by repairing firma1.klient table or by restoring this from schema backup. Why Postgres crashes ? I use default postgres.conf file which has probably fsync on Log when trying to backup table: ..... pg_dump: restoring data for table "klient" pg_dump: dumping contents of table klient pg_dump: ERROR: out of memory DETAIL: Failed on request of size 544565107. pg_dump: SQL command to dump the contents of table "klient" failed: PQendcopy() failed. pg_dump: Error message from server: ERROR: out of memory DETAIL: Failed on request of size 544565107. pg_dump: The command was: COPY firma1.klient (kood, nimi, a_a, p_kood, regnr, vatpayno, piirkond, postiindek, tanav, kontaktisi, telefon, faks, email, wwwpage, liik, viitenr, riik, riik2, riigikood, hinnak, erihinnak, myygikood, objekt2, objekt5, objekt7, maksetin, omakseti, krediit, ostukredii, masin, info, maksja, "timestamp", atimestamp, elanikud, pindala, grmaja, apindala, kpindala, idmakett, tulemus, omandisuhe, username, changedby, parool, hinnaale, mitteakt, kontakteer, klikaart, mhprotsent, aadress, grupp, verskp, firma_enne, tegevusala, instkuupae, firmarv, tookohti, versioon, teenlepkp, jur, kasutab, est, rus, miniest, minirus, plakat, keel) TO stdout; pg_dump: *** aborted because of error Process returned exit code 1.
Andrus wrote: > I have Postgres 8 running on Windows XP > The size of data subdirectory is 326 MB > > Today morning suddenly one table in one database, firma1.klient is > corrupted. When trying to backup it using pgAdmin III I get the log below > and backup is not created. What caused this? Presumably you had a power/system-failure or similar? > I have: > > 1. Compressed backup of the whole database as of 15.7 > 2. Compressed backup of the firma1 schema of this database as of 19.7 where > corrupted table klient resides. Do you mean file-level backups, or backups taken using pg_dump/pgadmin? You can't take file-backups of anything less than the entire data directory. > How to get the database back working by repairing firma1.klient table or by > restoring this from schema backup. If your backup is recent enough, that's probably the quickest route. > Why Postgres crashes ? I use default postgres.conf file which has probably > fsync on *WHEN* did Postgresql crash, originally that is? This error was caused by something - when did something go horribly wrong? And do your disks honour the fsync? Was a power failure the cause of this? > Log when trying to backup table: > > ..... > pg_dump: restoring data for table "klient" > pg_dump: dumping contents of table klient > pg_dump: ERROR: out of memory > DETAIL: Failed on request of size 544565107. Well - unless you have a piece of data that's 544MB that certainly looks like corruption. It's entirely possible you can identify the row that's causing this problem and dump all the data either side of it. However, if your backup is good, then I'd just restore that. -- Richard Huxton Archonet Ltd
>> Today morning suddenly one table in one database, firma1.klient is >> corrupted. When trying to backup it using pgAdmin III I get the log >> below and backup is not created. > > What caused this? Presumably you had a power/system-failure or similar? Windows XP does not respond and I pressed reset key yesterday evening. However, after that the database continues working yesterday. I use default postgres.conf file created by installer. In my knowledge this crash does not occur. I'm very intresting about reasons of this crash. I created copy of the whole data directory. running select * from firma1.klient; from pgAdmin yields: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. in windows eventlog this writes: LOG: checkpoint record is at 0/4FD3ECB8 LOG: all server processes terminated; reinitializing WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. LOG: terminating any other active server processes LOG: server process (PID 2756) was terminated by signal 5 >> I have: >> >> 1. Compressed backup of the whole database as of 15.7 >> 2. Compressed backup of the firma1 schema of this database as of 19.7 >> where corrupted table klient resides. > > Do you mean file-level backups, or backups taken using pg_dump/pgadmin? I have compressed backups created using pgadmin. Whole database backup (database contains two, andmed and firma1 schemas) is from 15.7 and fresh, firma1 schema backup is form 19.7 >> How to get the database back working by repairing firma1.klient table or >> by restoring this from schema backup. > > If your backup is recent enough, that's probably the quickest route. My problem is that I have whole backup only as 15.7 evening I have the current backup (as 19.7 evening) only firma1 schema. Corrupted table klient resides in firma1 schema. I need to restore firma1 backup as of 19.7 to the new database created from 15.7 backup. Unfortunately, firma1 schema is cross-referenced with other schema (andmed) So I have no idea how to restore. Is it possible to convert compressed backup file to plain text or to get data from it ? >> Why Postgres crashes ? I use default postgres.conf file which has >> probably fsync on > > *WHEN* did Postgresql crash, originally that is? This error was caused by > something - when did something go horribly wrong? Yesterday evening I pressed the reset button because windows task manager stops responding ( By experimenting with setforegroundwindow Windows API call I ran 20 copies of charmap.exe and tried to kill them all from task manager). However, after re-booting computer database continues working yesterday. I also restored new database yesterday with 500 tables to this cluster. > And do your disks honour the fsync? Was a power failure the cause of this? I have usual office PC using Quantum FireballP LM20.5 20 GB IDE HDD with XP drivers. How to determine is fsync working or not ? I use default postgres.conf file ( added only listen_addresses = '*' ) >> Log when trying to backup table: >> >> ..... > Well - unless you have a piece of data that's 544MB that certainly looks > like corruption. Tables are small. Whole data directory (including wal segments and 2 other nonimportant databases) sizes is about 350 MB. > It's entirely possible you can identify the row that's causing this > problem and dump all the data either side of it. However, if your backup > is good, then I'd just restore that. I have up-to date backup of firma1 schema only. Whole backup is a bit old. Is it possible to dump the corrupted table, truncate it and re-load it? I think thank referential integrity is not checked in truncate and refrential integrity does not prevent loading this table. Will truncate command fix the corrupted table? Andrus.
It seems that corrupted table klient contains data from some other table. select * from firma1.klient limit 3686; Seems to return all data from table. Starting at row 3687 table contains data from other table. VACUUM command returns: INFO: vacuuming "firma1.klient" ERROR: invalid page header in block 1639 of relation "klient"
klient table has oid 66079 file 66079 size is 13 MB in correct copy klient table file size is 5MB it seems that other table, nomenkla oid=65783 is added to the end of klient table. How to repair file 66079 so that it contains only 3686 rows from beginning ?
> > And do your disks honour the fsync? Was a power failure the > cause of this? > > > I have usual office PC using Quantum FireballP LM20.5 20 GB > IDE HDD with > XP drivers. > How to determine is fsync working or not ? > I use default postgres.conf file ( added only listen_addresses = '*' ) What version exactly was this? There was some changes in 8.0.2 in this area on win32. The default sync method was also changed at this point. What's the output of "show wal_sync_method"? Finally, go into device manager, find your disk, get properties, look under Policies, is the box for "Enable write caching on the disk" checked? //Magnus
> What version exactly was this? There was some changes in 8.0.2 in this > area on win32. The default sync method was also changed at this point. "PostgreSQL 8.0.2 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2 (mingw-special)" > What's the output of "show wal_sync_method"? "open_datasync" > Finally, go into device manager, find your disk, get properties, look > under Policies, is the box for "Enable write caching on the disk" > checked? It is checked. Does Postgres require this to be unchecked ? It is difficult to force customers to change it. Andrus.
> > What version exactly was this? There was some changes in > 8.0.2 in this > > area on win32. The default sync method was also changed at > this point. > > "PostgreSQL 8.0.2 on i686-pc-mingw32, compiled by GCC gcc.exe > (GCC) 3.4.2 (mingw-special)" There we go. That explains it - you have the new code. > > What's the output of "show wal_sync_method"? > > "open_datasync" This setting is only safe if you disable write cache. > > Finally, go into device manager, find your disk, get > properties, look > > under Policies, is the box for "Enable write caching on the disk" > > checked? > > It is checked. > > Does Postgres require this to be unchecked ? It is difficult > to force customers to change it. No, doesn't erquire it. There are a couple of different scenarios: 1) Box is checked. wal_sync_method=open_datasync. This may cause data loss! 2) Box is checked. wal_sync_method=fsync_writethrough. This is safe. 3) Box is unchecked. wal_sync_method=open_datasync. This is safe. 4) Box is unchecked. wal_sync_method=fsync_writethrough. This is safe. In general I would say that 1 is of course the fastest, but it's not safe. 3 should normally be the fastest if the data is on a disk that's only used by postgresql. 2 is probably faster if you have other applications that also write data to the same disk. 4 is probably *never* fastest :-) This all assumes you don't have a battery backed cache. If you have a controller with battery backed cache, 1 should still be the fastest, but now it's suddenly safe. (The basics of these changes are documented in the release notes at http://www.postgresql.org/docs/8.0/static/release-8-0-2.html) //Magnus
>> "open_datasync" > > This setting is only safe if you disable write cache. Thanks you for explanation. I expected that default installation does not cause data loss in any maschine configuration. I don't remember was write cache enabling Windows default setting or was it set by me. Andrus.