Database Corruption - last chance recovery options? - Mailing list pgsql-general
From | Michael Best |
---|---|
Subject | Database Corruption - last chance recovery options? |
Date | |
Msg-id | 459DE100.9030302@pendragon.org Whole thread Raw |
Responses |
Re: Database Corruption - last chance recovery options?
|
List | pgsql-general |
Had some database corruption problems today. Since they came on the heels of making some minor database changes yesterday, they may or may not be related to that. Centos 4.x, Postgresql 8.1.4 I modified the following settings and then issued a reload. I hadn't turned up the kernel.shmmax to allow for these bigger memory settings, but the database continued to run fine. shared_buffers = 10000 work_mem = 2048 autovacuum = on # enable autovacuum subprocess? autovacuum_naptime = 60 # time between autovacuum runs, in secs autovacuum_vacuum_threshold = 1000 # min # of tuple updates before autovacuum_analyze_threshold = 500 # min # of tuple updates before autovacuum_vacuum_scale_factor = 0.4 # fraction of rel size before autovacuum_analyze_scale_factor = 0.2 # fraction of rel size before autovacuum_vacuum_cost_delay = -1 # default vacuum cost delay for autovacuum_vacuum_cost_limit = -1 # default vacuum cost limit for After reloading I ran a number of vacuumdb -a -z which completed successfully. Sometime after the vacuum or reload one of our clients started to have database problems. The other databases in the same postgresql on the server don't seem to be affected. When I finally got the error report in the morning the database was in this state: $ psql dbname dbname=# \dt ERROR: cache lookup failed for relation 20884 Doing a select * from pg_tables seemed to indicate that some of the tables were no longer in the database, also some other tables were inaccessible. I made a backup and then some functionality was restored by issuing a reindex system dbname Using the "broken database" pg_dump on all tables in pg_table gives this for some tables: pg_dump: SQL command failed pg_dump: Error message from server: ERROR: cache lookup failed for relation 20871 pg_dump: The command was: SELECT tableoid, oid, conname, pg_catalog.pg_get_const raintdef(oid) as condef FROM pg_catalog.pg_constraint WHERE conrelid = '20876':: pg_catalog.oid AND contype = 'f' pg_dump: SQL command failed pg_dump: Error message from server: ERROR: relation "public.auth_groups_permissions" does not exist pg_dump: The command was: LOCK TABLE public.auth_groups_permissions IN ACCESS SH ARE MODE pg_dump: SQL command failed pg_dump: Error message from server: ERROR: relation with OID 21186 does not exist pg_dump: The command was: LOCK TABLE public.ght_ght_shippingorders IN ACCESS SHA RE MODE Our backups failed of course, and we have made a good attempt at recovery, which we are willing to accept as all the recovery we can do. As a last chance to get some of the data back, I would be interested if there is any way to read through the raw database files to see if I can recover some more data from them. I'm also curious if any of the settings/reload caused this problem, or perhaps the vacuum or autovacuum is what caused our error. I'll file a bug report if it's somehow repeatable. Oh, and make backups. -Mike
pgsql-general by date: