Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100 - Mailing list pgsql-general

From adsj@novozymes.com (Adam Sjøgren)
Subject Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Date
Msg-id 87efjnp9cb.fsf@tullinup.koldfront.dk
Whole thread Raw
In response to [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100  (ADSJ (Adam Sjøgren) <adsj@novozymes.com>)
List pgsql-general
Pavan writes:

>> FWIW one of our support customers reported a very similar TOAST table
>> corruption issue last week which nearly caused an outage. After a lot of
>> analysis, I think I've now fully understood the reasons behind the
>> corruption, the underlying bug(s) and possible remedy. I am currently
>> working on writing a reproducible test case to demonstrate the problem and
>> writing the fix. More details on that soon.
>
> I've posted a reproducer and a proposed fix to -hackers [1]
>
> In the particular case that I investigated, a database crash/recovery was
> involved. But I think we should be able to create a scenario where OID
> wrap-around or a standby promotion triggers the problem.  I don't know if
> any of that was involved in the cases reported on this thread, but I've a
> strong suspicion that the underlying bug is probably the same.

In our case there was no crash+recovery, but we do have a high write
load (and the problem occurring quite seldom), so it sounds like it
fits.

> [1] https://www.postgresql.org/message-id/CABOikdOgWT2hHkYG3Wwo2cyZJq2zfs1FH0FgX-%3Dh4OLosXHf9w%40mail.gmail.com

Impressive investigation!

Just a couple of data points: your script to reproduce the problem does
so for all three versions of PostgreSQL we have in use: 9.3.22, 9.6.8
and 10.3.

And I specifically tested on our production machine which exhibits the
problem, running 9.3.22, and your script reproduced the problem there as
well.

I.e. ending with:

  ERROR:  unexpected chunk number 0 (expected 1) for toast value 16396 in pg_toast_16384
  REINDEX
  ERROR:  unexpected chunk number 0 (expected 1) for toast value 16396 in pg_toast_16384
  VACUUM
  ERROR:  unexpected chunk number 0 (expected 1) for toast value 16396 in pg_toast_16384


  Best regards,

    Adam

-- 
 "No more than that, but very powerful all the                Adam Sjøgren
  same; simple things are good."                        adsj@novozymes.com



pgsql-general by date:

Previous
From: Alexandre Arruda
Date:
Subject: Re: ERROR: found multixact from before relminmxid
Next
From: Vikas Sharma
Date:
Subject: Postgresql Split Brain: Which one is latest