Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100 - Mailing list pgsql-general

From: Pavan Deolasee
Subject: Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Msg-id: CABOikdPe3Zq8VXVH+QWb2Kj6JemFyuz4y91SwmfKdNL=BsHo9w@mail.gmail.com
In response to: Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100 (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
List: pgsql-general


On Fri, Apr 6, 2018 at 2:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> adsj@novozymes.com (Adam Sjøgren) writes:
>>> [... still waiting for the result, I will return with what it said
>>> when the server does ...]
>
>> It did eventually finish, with the same result:
>
> Huh.  So what we have here, apparently, is that regular MVCC snapshots
> think there is exactly one copy of the 1698936148/0 row, but TOAST fetches
> think there is more than one.  This is darn odd, not least because we
> never do UPDATEs in toast tables, only inserts and deletes, so there
> certainly shouldn't be update chains there.
>
> It seems like you've got some corner case wherein SnapshotToast sees a row
> that isn't visible according to MVCC --- probably a row left over from
> some previous cycle of life.  That is, I'm imagining the OID counter
> wrapped around and we've reused a toast OID, but for some reason there's
> still a row in the table with that OID.  I'm not sure offhand how we could
> get into such a state.  Alvaro, does this ring any bells (remembering that
> this is 9.3)?
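
A toy Python model can make the hypothesis in the quoted paragraph concrete. It is a hypothetical simplification, not PostgreSQL code: `ChunkRow`, `fetch`, and the `see_deleted` flag are illustrative names, and the model assumes that the toast-style read (standing in for SnapshotToast) does not filter out deleted-but-not-yet-vacuumed rows, while an MVCC-style read does. If a leftover chunk still carries a toast OID that has been handed out again, reading the chunks in `chunk_seq` order sees chunk 0 twice, which yields the same shape of error as in the subject line.

```python
from dataclasses import dataclass


@dataclass
class ChunkRow:
    """Hypothetical stand-in for one row of a toast table."""
    chunk_id: int          # OID of the toasted value
    chunk_seq: int         # chunk number within that value: 0, 1, 2, ...
    data: bytes
    deleted: bool = False  # deleted, but not yet removed by VACUUM


def fetch(rows, value_id, see_deleted):
    """Reassemble one toasted value, checking chunk numbers in order.

    see_deleted=False models an MVCC snapshot (deleted rows are skipped);
    see_deleted=True models a read that, by assumption, ignores deletion.
    """
    chunks = sorted(
        (r for r in rows
         if r.chunk_id == value_id and (see_deleted or not r.deleted)),
        key=lambda r: r.chunk_seq,  # like scanning an index on (chunk_id, chunk_seq)
    )
    expected = 0
    out = b""
    for r in chunks:
        if r.chunk_seq != expected:
            raise ValueError(
                f"unexpected chunk number {r.chunk_seq} (expected {expected}) "
                f"for toast value {value_id}"
            )
        out += r.data
        expected += 1
    return out


VALUE_OID = 76753264  # toast OID from the report, assumed here to have been reused

rows = [
    ChunkRow(VALUE_OID, 0, b"old", deleted=True),  # leftover from an earlier value
    ChunkRow(VALUE_OID, 0, b"new-"),               # current value, chunk 0
    ChunkRow(VALUE_OID, 1, b"value"),              # current value, chunk 1
]

print(fetch(rows, VALUE_OID, see_deleted=False))  # b'new-value'
try:
    fetch(rows, VALUE_OID, see_deleted=True)
except ValueError as err:
    print(err)  # unexpected chunk number 0 (expected 1) for toast value 76753264
```

The real fetch code also validates chunk sizes and the total chunk count; the model keeps only the sequence check that this particular error message comes from.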

FWIW, one of our support customers reported a very similar TOAST table corruption issue last week, which nearly caused an outage. After a lot of analysis, I think I now fully understand the reasons behind the corruption, the underlying bug(s), and a possible remedy. I am currently writing a reproducible test case that demonstrates the problem, and working on the fix. More details on that soon.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
