Thread: VACUUM PANIC: corrupted item pointer
Hi, We got a corrupted database server, and I need some advice on what else can I do. Specs: - Postgres 8.3.3 @ Linux dbserver4 2.6.24-etchnhalf.1-686-bigmem - 4 GB RAM - PGDATA with 700GB (70GB used), software RAID-0 with 3 sata-2 (blame it on me, ugh...) So, the steps till now: 1 - Vacuum analyze fails this night: 2008-10-17 02:36:30 BRST: PANIC: corrupted item pointer: 44 2008-10-17 02:36:30 BRST: STATEMENT: VACUUM ANALYZE; 2 - We found the corrupted table, but the data was ok: # vacuumdb -vz matriz -t total PANIC: corrupted item pointer: 44 # psql matriz matriz# select * from total; OK 3. Tried postgres single # pg_ctl stop # postgres --single -D /postgres matriz matriz# vacuum analyze PANIC: corrupted item pointer: 44 4. reindex # postgres --single -D /postgres matriz matriz# reindexdb OK 5. Again, vacuum single # postgres --single -D /postgres matriz matriz# vacuum analyze PANIC: corrupted item pointer: 44 6. Tried to connect again, but... # postgres --single -D /postgres matriz I got that message when Postgres cant create the shared memory segment, as if the OS has not enough shmmax. Weird. # pg_ctl start Again, same message complaining aboyt shared memory. 7. Reboot 8. Postgres is up, but... # psql matriz psql: FATAL: could not read block 0 of relation 1664/0/2676: read only 0 of 8192 bytes 9. Reset xlog # pg_resetxlog OK 10. Same error # psql matriz psql: FATAL: could not read block 0 of relation 1664/0/2676: read only 0 of 8192 bytes Right now this is the message I got every time I try to connect, even in single mode. We tried to fsck the raid partition, but with no luck. Hope there is something else I can do... -- []´s, ACV
André Volpato escreveu: > 10. Same error > # psql matriz > psql: FATAL: could not read block 0 of relation 1664/0/2676: read > only 0 of 8192 bytes > > Right now this is the message I got every time I try to connect, even > in single mode. > We tried to fsck the raid partition, but with no luck. > > Hope there is something else I can do... > Digging a little bit more, I found that global/2676 has zero bytes, so we restore global/ from backup. Now the only databases we can log in the ones wich hasn´t been vacuumed regulary, e.g. "postgres": $ psql matriz psql: FATAL: cache lookup failed for database 87421438 $ psql volpato psql: FATAL: cache lookup failed for database 94680947 $ psql postgres # \d List of relations Schema | Name | Type | Owner --------+----------------+------+---------- public | pg_buffercache | view | postgres Not sure if pg_buffercache should be there anyway ... Ideas? -- []´s, ACV
=?ISO-8859-1?Q?Andr=E9_Volpato?= <andre.volpato@ecomtecnologia.com.br> writes: > 8. Postgres is up, but... > # psql matriz > psql: FATAL: could not read block 0 of relation 1664/0/2676: read only > 0 of 8192 bytes Looks like your raid array is dying under you :-( ... 2676 is pg_authid_rolname_index, so you could probably get around that particular problem by reindexing pg_authid in standalone mode. (You might need to use the -P option too, not sure.) But the "corrupted item pointer" complaint is heap corruption and no amount of reindexing will fix it. Since you seem to be able to select from the broken table without problem, I'd suggest just copying its data into a new table and dropping the broken one (or truncate it and copy the data back). regards, tom lane
Tom Lane escreveu: > =?ISO-8859-1?Q?Andr=E9_Volpato?= <andre.volpato@ecomtecnologia.com.br> writes: > >> 8. Postgres is up, but... >> # psql matriz >> psql: FATAL: could not read block 0 of relation 1664/0/2676: read only >> 0 of 8192 bytes >> > > Looks like your raid array is dying under you :-( ... We hope it last more 3 weeks, new servers are coming... > 2676 is pg_authid_rolname_index, so you could probably get around that > particular problem by reindexing pg_authid in standalone mode. (You > might need to use the -P option too, not sure.) But the "corrupted item > pointer" complaint is heap corruption and no amount of reindexing will > fix it. > Since you seem to be able to select from the broken table without > problem, I'd suggest just copying its data into a new table and dropping > the broken one (or truncate it and copy the data back). > Not anymore... I cant even connect to any database since reboot (#7). I tryed to restore global/ directory from backup, and I got "cache lookup failed for database 87421438". Now we ran initdb again, and start recreating dbs with backup data. > regards, tom lane > -- []´s, ACV