Home > mailing lists

Re: ERROR: unexpected chunk number 0 (expected 1) for toast value76753264 in pg_toast_10920100 - Mailing list pgsql-general

From	Pavan Deolasee
Subject	Re: ERROR: unexpected chunk number 0 (expected 1) for toast value76753264 in pg_toast_10920100
Date	April 10, 2018 13:42:47
Msg-id	CABOikdPJUv5EyxRZnJY2oA7EKi=bZdoryXNmUGK3dFAQ08tZdw@mail.gmail.com Whole thread
In response to	Re: ERROR: unexpected chunk number 0 (expected 1) for toast value76753264 in pg_toast_10920100 (Pavan Deolasee <pavan.deolasee@gmail.com>)
List	pgsql-general

Tree view

On Fri, Apr 6, 2018 at 8:55 AM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:

On Fri, Apr 6, 2018 at 2:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
adsj@novozymes.com (Adam Sjøgren) writes:
>> [... still waiting for the result, I will return with what it said
>> when the server does ...]

> It did eventually finish, with the same result:

Huh. So what we have here, apparently, is that regular MVCC snapshots
think there is exactly one copy of the 1698936148/0 row, but TOAST fetches
think there is more than one. This is darn odd, not least because we
never do UPDATEs in toast tables, only inserts and deletes, so there
certainly shouldn't be update chains there.

It seems like you've got some corner case wherein SnapshotToast sees a row
that isn't visible according to MVCC --- probably a row left over from
some previous cycle of life. That is, I'm imagining the OID counter
wrapped around and we've reused a toast OID, but for some reason there's
still a row in the table with that OID. I'm not sure offhand how we could
get into such a state. Alvaro, does this ring any bells (remembering that
this is 9.3)?

FWIW one of our support customers reported a very similar TOAST table corruption issue last week which nearly caused an outage. After a lot of analysis, I think I've now fully understood the reasons behind the corruption, the underlying bug(s) and possible remedy. I am currently working on writing a reproducible test case to demonstrate the problem and writing the fix. More details on that soon.

I've posted a reproducer and a proposed fix to -hackers [1]

In the particular case that I investigated, a database crash/recovery was involved. But I think we should be able to create a scenario where OID wrap-around or a standby promotion triggers the problem. I don't know if any of that was involved in the cases reported on this thread, but I've a strong suspicion that the underlying bug is probably the same.

Thanks,

Pavan

[1] https://www.postgresql.org/message-id/CABOikdOgWT2hHkYG3Wwo2cyZJq2zfs1FH0FgX-%3Dh4OLosXHf9w%40mail.gmail.com

Pavan Deolasee http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

pgsql-general by date:

From: hmidi slim
Date: 10 April 2018, 13:24:49
Subject: Using enum instead of join tables

From: Alexandre Arruda
Date: 10 April 2018, 14:31:20
Subject: Re: ERROR: found multixact from before relminmxid

Re: ERROR: unexpected chunk number 0 (expected 1) for toast value76753264 in pg_toast_10920100 - Mailing list pgsql-general

Previous

Next