Thread: Postgres crashes,help to recover

Postgres crashes,help to recover

From
"Andrus"
Date:
I have Postgres 8 running on Windows XP
The size of data subdirectory is 326 MB

Today morning suddenly one table in one database, firma1.klient is
corrupted. When trying to backup it using pgAdmin III  I get the log below
and backup is not created.

I have:

1. Compressed backup of the whole database as of 15.7
2. Compressed backup of the firma1 schema of this database as of 19.7 where
corrupted table klient resides.

How to get the database back working by repairing firma1.klient table or by
restoring this from schema backup.
Why Postgres crashes ? I use default postgres.conf file which has probably
fsync on


Log when trying to backup table:

.....
pg_dump: restoring data for table "klient"
pg_dump: dumping contents of table klient
pg_dump: ERROR:  out of memory
DETAIL:  Failed on request of size 544565107.
pg_dump: SQL command to dump the contents of table "klient" failed:
PQendcopy() failed.
pg_dump: Error message from server: ERROR:  out of memory
DETAIL:  Failed on request of size 544565107.
pg_dump: The command was: COPY firma1.klient (kood, nimi, a_a, p_kood,
regnr, vatpayno, piirkond, postiindek, tanav, kontaktisi, telefon, faks,
email, wwwpage, liik, viitenr, riik, riik2, riigikood, hinnak, erihinnak,
myygikood, objekt2, objekt5, objekt7, maksetin, omakseti, krediit,
ostukredii, masin, info, maksja, "timestamp", atimestamp, elanikud, pindala,
grmaja, apindala, kpindala, idmakett, tulemus, omandisuhe, username,
changedby, parool, hinnaale, mitteakt, kontakteer, klikaart, mhprotsent,
aadress, grupp, verskp, firma_enne, tegevusala, instkuupae, firmarv,
tookohti, versioon, teenlepkp, jur, kasutab, est, rus, miniest, minirus,
plakat, keel) TO stdout;
pg_dump: *** aborted because of error

Process returned exit code 1.



Re: Postgres crashes,help to recover

From
Richard Huxton
Date:
Andrus wrote:
> I have Postgres 8 running on Windows XP
> The size of data subdirectory is 326 MB
>
> Today morning suddenly one table in one database, firma1.klient is
> corrupted. When trying to backup it using pgAdmin III  I get the log below
> and backup is not created.

What caused this? Presumably you had a power/system-failure or similar?

> I have:
>
> 1. Compressed backup of the whole database as of 15.7
> 2. Compressed backup of the firma1 schema of this database as of 19.7 where
> corrupted table klient resides.

Do you mean file-level backups, or backups taken using pg_dump/pgadmin?
You can't take file-backups of anything less than the entire data directory.

> How to get the database back working by repairing firma1.klient table or by
> restoring this from schema backup.

If your backup is recent enough, that's probably the quickest route.

> Why Postgres crashes ? I use default postgres.conf file which has probably
> fsync on

*WHEN* did Postgresql crash, originally that is? This error was caused
by something - when did something go horribly wrong?

And do your disks honour the fsync? Was a power failure the cause of this?

> Log when trying to backup table:
>
> .....
> pg_dump: restoring data for table "klient"
> pg_dump: dumping contents of table klient
> pg_dump: ERROR:  out of memory
> DETAIL:  Failed on request of size 544565107.

Well - unless you have a piece of data that's 544MB that certainly looks
like corruption.
It's entirely possible you can identify the row that's causing this
problem and dump all the data either side of it. However, if your backup
is good, then I'd just restore that.

--
   Richard Huxton
   Archonet Ltd

Re: Postgres crashes,help to recover

From
"Andrus"
Date:
>> Today morning suddenly one table in one database, firma1.klient is
>> corrupted. When trying to backup it using pgAdmin III  I get the log
>> below and backup is not created.
>
> What caused this? Presumably you had a power/system-failure or similar?


Windows XP does not respond and I pressed reset key yesterday evening.
However, after that the database continues working yesterday.

I use default postgres.conf file created by installer. In my knowledge this
crash does not occur.
I'm very intresting about reasons of this crash.
I created copy of the whole data directory.

running

select * from firma1.klient;

from pgAdmin yields:

server closed the connection unexpectedly
 This probably means the server terminated abnormally
 before or while processing the request.


in windows eventlog this writes:

LOG: checkpoint record is at 0/4FD3ECB8

LOG: all server processes terminated; reinitializing

WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.

LOG: terminating any other active server processes
LOG: server process (PID 2756) was terminated by signal 5

>> I have:
>>
>> 1. Compressed backup of the whole database as of 15.7
>> 2. Compressed backup of the firma1 schema of this database as of 19.7
>> where corrupted table klient resides.
>
> Do you mean file-level backups, or backups taken using pg_dump/pgadmin?


I have compressed backups created using pgadmin.
Whole database backup (database contains two, andmed and firma1 schemas) is
from 15.7 and fresh, firma1 schema backup is form  19.7

>> How to get the database back working by repairing firma1.klient table or
>> by restoring this from schema backup.
>
> If your backup is recent enough, that's probably the quickest route.


My problem is that I have whole backup only as 15.7 evening

I have the current backup (as 19.7 evening) only firma1 schema.

Corrupted table klient resides in firma1 schema.

I need to restore firma1 backup as of 19.7  to the new database created from
15.7 backup.

Unfortunately, firma1 schema is cross-referenced with other schema (andmed)
So I have no idea how to restore.

Is it possible to convert compressed backup file to plain text or to get
data from it ?

>> Why Postgres crashes ? I use default postgres.conf file which has
>> probably fsync on
>
> *WHEN* did Postgresql crash, originally that is? This error was caused by
> something - when did something go horribly wrong?


Yesterday evening I pressed the reset button because windows task manager
stops responding ( By experimenting with setforegroundwindow Windows API
call I ran 20 copies of charmap.exe and tried to kill them all from task
manager). However, after re-booting computer database continues working
yesterday.

I also restored new database yesterday with 500 tables to this cluster.

> And do your disks honour the fsync? Was a power failure the cause of this?


I have usual office PC using Quantum FireballP LM20.5   20 GB IDE HDD with
XP drivers.
How to determine is fsync working or not ?
I use default postgres.conf file ( added only listen_addresses = '*' )

>> Log when trying to backup table:
>>
>> .....
> Well - unless you have a piece of data that's 544MB that certainly looks
> like corruption.

Tables are small. Whole data directory (including wal segments and 2 other
nonimportant databases) sizes is about 350 MB.

> It's entirely possible you can identify the row that's causing this
> problem and dump all the data either side of it. However, if your backup
> is good, then I'd just restore that.

I have up-to date backup of firma1 schema only. Whole backup is a bit old.

Is it possible to dump the corrupted table, truncate it and re-load it? I
think thank referential integrity is not checked in truncate and refrential
integrity does not prevent loading this table.

Will truncate command fix the corrupted table?

Andrus.



Re: Postgres crashes,help to recover

From
"Andrus"
Date:
It seems that corrupted table klient contains data from some other table.

select * from firma1.klient limit 3686;

Seems to return all data from table.
Starting at row 3687  table contains data from other table.

VACUUM command returns:

INFO:  vacuuming "firma1.klient"

ERROR:  invalid page header in block 1639 of relation "klient"



Re: Postgres crashes,help to recover

From
"Andrus"
Date:
klient table has oid 66079

file 66079  size is 13 MB

in correct copy klient table file size is 5MB

it seems that other table, nomenkla oid=65783  is added to the end of
klient table.

How to repair file 66079 so that it contains only 3686 rows from beginning ?




Re: Postgres crashes,help to recover

From
"Magnus Hagander"
Date:
> > And do your disks honour the fsync? Was a power failure the
> cause of this?
>
>
> I have usual office PC using Quantum FireballP LM20.5   20 GB
> IDE HDD with
> XP drivers.
> How to determine is fsync working or not ?
> I use default postgres.conf file ( added only listen_addresses = '*' )

What version exactly was this? There was some changes in 8.0.2 in this
area on win32. The default sync method was also changed at this point.

What's the output of "show wal_sync_method"?

Finally, go into device manager, find your disk, get properties, look
under Policies, is the box for "Enable write caching on the disk"
checked?

//Magnus

Re: Postgres crashes,help to recover

From
"Andrus"
Date:
> What version exactly was this? There was some changes in 8.0.2 in this
> area on win32. The default sync method was also changed at this point.

"PostgreSQL 8.0.2 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2
(mingw-special)"

> What's the output of "show wal_sync_method"?

"open_datasync"

> Finally, go into device manager, find your disk, get properties, look
> under Policies, is the box for "Enable write caching on the disk"
> checked?

It is checked.

Does Postgres require this to be unchecked ? It is difficult to force
customers to change it.

Andrus.



Re: Postgres crashes,help to recover

From
"Magnus Hagander"
Date:
> > What version exactly was this? There was some changes in
> 8.0.2 in this
> > area on win32. The default sync method was also changed at
> this point.
>
> "PostgreSQL 8.0.2 on i686-pc-mingw32, compiled by GCC gcc.exe
> (GCC) 3.4.2 (mingw-special)"

There we go. That explains it - you have the new code.


> > What's the output of "show wal_sync_method"?
>
> "open_datasync"

This setting is only safe if you disable write cache.


> > Finally, go into device manager, find your disk, get
> properties, look
> > under Policies, is the box for "Enable write caching on the disk"
> > checked?
>
> It is checked.
>
> Does Postgres require this to be unchecked ? It is difficult
> to force customers to change it.

No, doesn't erquire it. There are a couple of different scenarios:

1) Box is checked. wal_sync_method=open_datasync. This may cause data
loss!
2) Box is checked. wal_sync_method=fsync_writethrough. This is safe.
3) Box is unchecked. wal_sync_method=open_datasync. This is safe.
4) Box is unchecked. wal_sync_method=fsync_writethrough. This is safe.

In general I would say that 1 is of course the fastest, but it's not
safe. 3 should normally be the fastest if the data is on a disk that's
only used by postgresql. 2 is probably faster if you have other
applications that also write data to the same disk. 4 is probably
*never* fastest :-)


This all assumes you don't have a battery backed cache. If you have a
controller with battery backed cache, 1 should still be the fastest, but
now it's suddenly safe.

(The basics of these changes are documented in the release notes at
http://www.postgresql.org/docs/8.0/static/release-8-0-2.html)

//Magnus

Re: Postgres crashes,help to recover

From
"Andrus"
Date:
>> "open_datasync"
>
> This setting is only safe if you disable write cache.

Thanks you for explanation.

I expected that default installation does not cause data loss in any
maschine configuration.

I don't remember was write cache enabling Windows default setting or was it
set by me.

Andrus.