Thread: [BUGS] BUG #14651: Uninitialized page fix corrupted TOAST table

[BUGS] BUG #14651: Uninitialized page fix corrupted TOAST table

From
nilesoien@gmail.com
Date:
The following bug has been logged on the website:

Bug reference:      14651
Logged by:          Niles Oien
Email address:      nilesoien@gmail.com
PostgreSQL version: 9.5.5
Operating system:   CentOS 6.8
Description:


Hi,

Recently one of our postgres databases gave us this warning in the *.csv log
file :

2017-05-10 19:30:36.938 GMT,,,16791,,5913487e.4197,1,,
2017-05-10 17:06:06 GMT,5/517942,0,WARNING,01000,
"relation ""pg_toast_9303780"" page 2201123 is uninitialized ---
fixing",,,,,,,,,""

Almost immediately thereafter, it started throwing an error once a minute
:

2017-05-10 19:30:36.939 GMT,,,16791,,5913487e.4197,2,,
2017-05-10 17:06:06 GMT,5/517942,0,
ERROR,XX001,"invalid page in block 2201124 of relation
base/16384/9303783",,,,,"automatic vacuum of table
""data.pg_toast.pg_toast_9303780""",,,,""

2017-05-10 19:31:11.100 GMT,,,25310,,59136a7f.62de,1,,
2017-05-10 19:31:11 GMT,5/517948,0,
ERROR,XX001,"invalid page in block 2201124 of relation
base/16384/9303783",,,,,"automatic vacuum of table
""data.pg_toast.pg_toast_9303780""",,,,""

2017-05-10 19:32:11.138 GMT,,,25451,,59136abb.636b,1,,
2017-05-10 19:32:11 GMT,5/517956,0,
ERROR,XX001,"invalid page in block 2201124 of relation
base/16384/9303783",,,,,"automatic vacuum of table
""data.pg_toast.pg_toast_9303780""",,,,""

I think this is a bug? That the fix of page 2201123 caused problems for,
at least, page 2201124 in this TOAST table?

It's been doing this ever since (about two days). I think I figured out
which table data.pg_toast.pg_toast_9303780 supports :

$ oid2name -o 9303780 -d data
From database "data": Filenode     Table Name
-------------------------  9303780  rdvtrack_fd05


And interestingly I can still select from that table :

data=# select count(*) from hmi.rdvtrack_fd05; count   
----------18321734
(1 row)


It looks like there's a damaged block in the TOAST table, but
does the fact that I can do the select on the table supported by
the TOAST table mean that this damaged block has no entries?

Can I do this :

SET zero_damaged_pages = on;
VACUUM FULL data.pg_toast.pg_toast_9303780;
(I'll get a warning that it's zeroing out the damaged block)
REINDEX TABLE data.pg_toast.pg_toast_9303780;

And not lose any data in this case?

And if not, what's a reasonable thing to do? I can shutdown updates
to the database for a little bit if that helps for instance if I have
to dump this table out and re-read it in again.

I'm running postgres on CentOS, and both are reasonably recent, I think :


$ psql --version
psql (PostgreSQL) 9.5.5

$ uname -a
Linux netdrms01 2.6.32-642.11.1.el6.x86_64 #1 SMP Fri Nov 18 19:25:05 UTC
2016 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/issue
CentOS release 6.8 (Final)

Thanks,

Niles Oien, Boulder, Colorado, USA nilesOien@gmail.com




--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14651: Uninitialized page fix corrupted TOAST table

From
Tom Lane
Date:
nilesoien@gmail.com writes:
> It's been doing this ever since (about two days). I think I figured out
> which table data.pg_toast.pg_toast_9303780 supports :

> $ oid2name -o 9303780 -d data
>> From database "data":
>   Filenode     Table Name
> -------------------------
>    9303780  rdvtrack_fd05

That's not hugely reliable because filenode isn't necessarily equal to
oid.  I'd try this to be sure:

select relname from pg_class where reltoastrelid = 'pg_toast.pg_toast_9303780'::regclass;

> And interestingly I can still select from that table :

> data=# select count(*) from hmi.rdvtrack_fd05;

A "select count(*)" would only fetch the main table rows, not out-of-line
TOAST data, so this proves little.  What you want is to see if you can
retrieve all the data in the table.  An easy way to do that is to try to
pg_dump it ("pg_dump -t problem_table ...").

> It looks like there's a damaged block in the TOAST table, but
> does the fact that I can do the select on the table supported by
> the TOAST table mean that this damaged block has no entries?

It's possible but I'd hardly bet on it yet.

> Can I do this :

> SET zero_damaged_pages = on;
> VACUUM FULL data.pg_toast.pg_toast_9303780;
> (I'll get a warning that it's zeroing out the damaged block)
> REINDEX TABLE data.pg_toast.pg_toast_9303780;

You can, but my bet is that you'll lose data.  You may have little
choice though.  (I wouldn't bother with the FULL, btw, since you
already know that a regular vacuum will hit the page.)  Where you
probably will end up is identifying the main-table row(s) that reference
this page and deleting them.

An angle that might be interesting is to check the physical length of
the toast table and see whether the problem page is the last page or
nearly last.  If so, the problem might have originated in a failed
relation-extension attempt.  This would also suggest that the main
table row(s) referencing the bad page are recent.
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs