Re: TOAST table repeatedly corrupted - Mailing list pgsql-bugs

From Niles Oien
Subject Re: TOAST table repeatedly corrupted
Msg-id CANQ3m6OKUsPZ9c-=5hRBi7_CCVNDy+ED+mJOdwFCPq8u=2nNGA@mail.gmail.com
In response to Re: TOAST table repeatedly corrupted  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs

Thanks. I don't have checksums on. I'll look into enabling them on the next build.

The file 36298640.10 didn't show anything unusual under pg_filedump.

I'm betting that we are suffering from a now-fixed TOAST issue, if not the recently fixed one you mentioned. Given how dated our production box is, that's probably all the chasing that's worth doing here. On our development box, where we have some room to move, we're running something a bit newer:

 PostgreSQL 10.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-16), 64-bit

Does that need an upgrade to get this week's TOAST fixes, too? I'm not sure whether CentOS's 'yum upgrade' will pick it up. I have the pgdg10/7/x86_64 repo enabled; will the update show up that way?

Thanks,

Niles.






On Wed, May 9, 2018 at 2:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Niles Oien <noien@nso.edu> writes:
> I am running a reasonably recent version of postgres :
>  PostgreSQL 9.5.5 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7
> 20120313 (Red Hat 4.4.7-17), 64-bit

As David said, that's not terribly recent.  If you are going to upgrade,
I'd suggest waiting till tomorrow and grabbing 9.5.13, because we fixed
a pretty serious TOAST data corruption bug in this week's batch of
releases.  The expected symptoms of it don't match what you're seeing,
unfortunately, but nonetheless you ought to be using the latest, just
in case this is an already-fixed issue.

> 2018-05-09 16:14:03.834 GMT,,,27018,,5af31e4b.698a,1,,2018-05-09 16:14:03
> GMT,12/611211,0,ERROR,XX001,"invalid page in block 1374551 of relation
> base/16384/36298640",,,,,"automatic vacuum of table
> ""data.pg_toast.pg_toast_36298637""",,,,""

Block 1374551 would be well past the first segment of the file, since
in a standard build (1GB segments, 8K blocks) there are only 131072
pages per segment.  This explains why you didn't see any complaints
from pg_filedump, if you only ran it over the first segment.

If you've not clobbered the DB yet, file 36298640.10 would be what
to look at, I believe.
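To make the segment arithmetic concrete, here is a minimal Python sketch that maps a relation-wide block number onto its segment file. It assumes a default build (8 KB BLCKSZ, 1 GB segments, hence 131072 blocks per segment); the helper name `locate_block` is hypothetical and not part of any PostgreSQL tool.

```python
# Assumption: default build with BLCKSZ = 8 KB and 1 GB segment files.
BLOCK_SIZE = 8192
SEGMENT_BYTES = 1024 ** 3
BLOCKS_PER_SEGMENT = SEGMENT_BYTES // BLOCK_SIZE  # 131072 blocks per segment file


def locate_block(relpath, block, blocks_per_segment=BLOCKS_PER_SEGMENT,
                 block_size=BLOCK_SIZE):
    """Hypothetical helper: map a relation-wide block number to its segment file.

    Returns (segment file path, block number within that segment,
    byte offset of the block within that segment file).
    """
    segment, block_in_segment = divmod(block, blocks_per_segment)
    # The first segment has no suffix; later ones are named <relfilenode>.<N>.
    path = relpath if segment == 0 else f"{relpath}.{segment}"
    return path, block_in_segment, block_in_segment * block_size


if __name__ == "__main__":
    # Block number and relation path taken from the autovacuum error above.
    print(*locate_block("base/16384/36298640", 1374551))
```

For block 1374551 this prints `base/16384/36298640.10 63831 522903552`: the bad block is block 63831 of the `.10` segment, roughly 499 MiB into that file. Block numbers inside a segment file restart at 0, so that relative number is what matters when inspecting the `.10` file on its own.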

> And sure enough, I now cannot dump that table :
> pg_dump: Error message from server: ERROR:  compressed data is corrupted

That's interesting, because it seems to indicate an independent problem.
The "invalid page" error indicates a bad page header, or possibly a
page checksum failure; either way the page would not have been allowed
into the buffer pool.  But "compressed data is corrupted" implies that
we did read a page but the data in it seems messed up.  So this evidence
says you have at least two different corrupted places in that table.
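As a rough illustration of what "invalid page" means, here is a Python sketch that approximates the page-header sanity checks PostgreSQL applies when reading a page from disk, modeled on `PageIsVerified()` in `bufpage.c` as I understand it for 9.x. It assumes a little-endian 64-bit build with 8 KB pages and does not verify checksums; the helper names `page_header_sane` and `scan_segment` are hypothetical.

```python
import struct

# Assumptions: 8 KB pages, 8-byte MAXALIGN, little-endian (x86_64) build.
BLCKSZ = 8192
MAXALIGN = 8
PD_VALID_FLAG_BITS = 0x0007  # PD_HAS_FREE_LINES | PD_PAGE_FULL | PD_ALL_VISIBLE

# Fixed 24-byte page header: pd_lsn (two uint32), then pd_checksum, pd_flags,
# pd_lower, pd_upper, pd_special, pd_pagesize_version (uint16 each),
# then pd_prune_xid (uint32).
HEADER = struct.Struct("<IIHHHHHHI")


def page_header_sane(page):
    """Approximate the header checks applied when a page is read from disk.

    Checksums are NOT verified here.
    """
    if len(page) != BLCKSZ:
        return False  # truncated or oversized block
    (_lsn_hi, _lsn_lo, _checksum, flags, lower, upper,
     special, _pagesize_version, _prune_xid) = HEADER.unpack_from(page)
    if upper == 0:
        # An uninitialized ("new") page is accepted only if it is all zeroes.
        return not any(page)
    return ((flags & ~PD_VALID_FLAG_BITS) == 0
            and lower <= upper <= special <= BLCKSZ
            and special % MAXALIGN == 0)


def scan_segment(path):
    """Yield block numbers, relative to this segment file, that fail the check."""
    with open(path, "rb") as f:
        blkno = 0
        while True:
            page = f.read(BLCKSZ)
            if not page:
                break
            if not page_header_sane(page):
                yield blkno
            blkno += 1
```

For example, `list(scan_segment("base/16384/36298640.10"))` would list suspect blocks in that segment. This is only a screening aid: a page can pass these header checks and still hold damaged TOAST data, which is exactly the "compressed data is corrupted" case, and pg_filedump remains the better tool for seeing what is actually on a page.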

Do you have checksums enabled in this installation?  If you're going
to have to rebuild it, you should probably turn those on (use
initdb --data-checksums), in hopes of narrowing down what's happening. 

> I think this is probably a bug? Every time it happens
> it affects the same table, hmi.rdvtrack_fd05.

That's mighty suggestive all right, but unfortunately it doesn't
do much to narrow down the problem :-(

                        regards, tom lane



--
Niles Oien, National Solar Observatory, Boulder Colorado USA
