Re: Data corruption - Mailing list pgsql-admin

From Ron
Subject Re: Data corruption
Date
Msg-id 608090c5-0972-9fee-fb6f-9462ee3fc941@gmail.com
In response to Data corruption  (Konrad Garus <konrad.garus@gmail.com>)
List pgsql-admin
I've zeroed out Pg 8.4 pg_clog files using dd (instructions are out there on the web) without any visible negative effect.

Naturally, YMMV.
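
For what it's worth, the recipe looked roughly like this (a sketch only, not gospel: the data directory path is an example, the segment name is just the one from your error, and it assumes the stock 8 kB block size, i.e. 256 kB clog segments):

# safest with the server shut down; overwrite the clog segment with zeroes
dd if=/dev/zero of=/var/lib/pgsql/data/pg_clog/0A89 bs=8192 count=32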

On 7/21/21 11:14 AM, Konrad Garus wrote:
Hello,

we have a database that apparently got corrupted in a crash a few months ago, long enough ago that there is no "good" backup left.

There are at least two tables occasionally emitting errors like this:

ERROR: invalid page header in block 39684 of relation base/12926/32397 while performing SQL query: SQL DML: insert into foo (a, b, c) values (?, ?, ?) | PARAMETERS: foo, bar, baz: org.postgresql.util.PSQLException: ERROR: invalid page header in block 39684 of relation base/12926/32397

The files in question (e.g. 32397) correspond to the table itself, not an index. The larger of these tables has two files (32397 & 32397.1), almost 2G total.

To make things worse, vacuum ends with:

ERROR:  could not access status of transaction 2828785257
DETAIL:  Could not open file "pg_clog/0A89": No such file or directory.

That file is nowhere to be found.

When I do this:

set zero_damaged_pages = on;
select count(*) from foo;

It emits 3 warnings like these:

WARNING:  invalid page header in block 78550 of relation base/12926/31642; zeroing out page
WARNING:  invalid page header in block 78551 of relation base/12926/31642; zeroing out page
WARNING:  invalid page header in block 78552 of relation base/12926/31642; zeroing out page

One idea I found is to generate a zero file for the missing clog, set zero_damaged_pages=on, then vacuum or recreate the table. Looks like it might work, but I'm trying to understand the implications.
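
Concretely, I picture something like this (a rough sketch only; "mydb" and "foo" stand in for the real database and table, $PGDATA is assumed to point at the cluster's data directory, the segment name comes from the vacuum error below, and it assumes the default 256 kB clog segment size):

# create the missing clog segment, filled with zeroes
dd if=/dev/zero of=$PGDATA/pg_clog/0A89 bs=8192 count=32

# then rewrite the table with damaged pages zeroed out
# (zero_damaged_pages requires superuser)
psql mydb <<'SQL'
SET zero_damaged_pages = on;
VACUUM FULL foo;
SQL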

1. When I generate a zero clog file like this, what impact will it have on the rest of the database? Can it break anything else, or introduce some kind of inconsistency with other (correctly committed) table data?

2. How can I gauge how much actual data (e.g. how many rows) will be lost by doing this (zero clog file + vacuum/recreate table)?

3. Is there some place I can look to reliably tell how many rows the table should have, so I can compare to how many I can read after setting zero_damaged_pages?

4. Is there any way at all to recover/repair the corrupted data at this point?

--
Konrad Garus

--
Angular momentum makes the world go 'round.
