Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100 - Mailing list pgsql-general

From Achilleas Mantzios
Subject Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
Date
Msg-id f972af88-b5a0-ecc5-3fd6-ce22faa1bddc@matrix.gatewaynet.com
In response to [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100  (ADSJ (Adam Sjøgren) <adsj@novozymes.com>)
Responses Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100  (Harry Ambrose <harry.ambrose@gmail.com>)
List pgsql-general
On 07/06/2017 16:33, ADSJ (Adam Sjøgren) wrote:
> Our database has started reporting errors like this:
>
>    2017-05-31 13:48:10 CEST ERROR:  unexpected chunk number 0 (expected 1) for toast value 14242189 in pg_toast_10919630
>        ...
>    2017-06-01 11:06:56 CEST ERROR:  unexpected chunk number 0 (expected 1) for toast value 19573520 in pg_toast_10919630
>
> (157 times, for different toast values, same pg_toast_nnn). pg_toast_10919630
> corresponds to a table with around 168 million rows.
>
> These went away, but the next day we got similar errors from another
> table:
>
>    2017-06-02 05:59:50 CEST ERROR:  unexpected chunk number 0 (expected 1) for toast value 47060150 in pg_toast_10920100
>        ...
>    2017-06-02 06:14:54 CEST ERROR:  unexpected chunk number 0 (expected 1) for toast value 47226455 in pg_toast_10920100
>
> (Only 4 this time) pg_toast_10920100 corresponds to a table which holds
> around 320 million rows (these are our two large tables).
>
> The next day we got 6 such errors and the day after 10 such errors. On
> June 5th we got 94, yesterday we got 111, of which one looked a little
> different:
>
>    2017-06-06 17:32:21 CEST ERROR:  unexpected chunk size 1996 (expected 1585) in final chunk 0 for toast value 114925100 in pg_toast_10920100
>
> and today the logs have 65 lines, ending with these:
>
>    2017-06-07 14:49:53 CEST ERROR:  unexpected chunk number 0 (expected 1) for toast value 131114834 in pg_toast_10920100
>    2017-06-07 14:53:41 CEST ERROR:  unexpected chunk number 0 (expected 1) for toast value 131149566 in pg_toast_10920100

First, try to find which tables those toast relations belong to:
select 10919630::regclass , 10920100::regclass ;
Are those critical tables? Can you restore them somehow?

You may also consider reindexing the toast relations and their parent tables:
REINDEX TABLE pg_toast.pg_toast_10920100;
REINDEX TABLE pg_toast.pg_toast_10919630;
REINDEX TABLE <name of table 10920100>;
REINDEX TABLE <name of table 10919630>;

Also VACUUM the above tables.

You might want to write a function that iterates over the damaged table's rows in order to identify the damaged
row(s), and then update those rows with good values so that a new row version is created.
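
A minimal sketch of such a row-by-row check, written as an anonymous PL/pgSQL block. The table name my_big_table is a placeholder (an assumption, not a name from this thread); substitute the name returned by the ::regclass lookup. The idea is to read each row on its own inside an exception block, forcing every column to be detoasted, and to report the rows whose read fails. Rows are located by ctid here so that no key column has to be assumed; on a table that is being updated continuously a row may move between the two reads and then be skipped silently, so looping over the primary key instead is likely the safer variant.

```sql
-- Sketch only: my_big_table is a hypothetical name.
DO $$
DECLARE
    r   record;
    bad bigint := 0;
BEGIN
    -- Selecting only ctid does not fetch any toasted column value.
    FOR r IN SELECT ctid FROM my_big_table LOOP
        BEGIN
            -- Casting the whole row to text runs the output function of
            -- every column, which forces toasted values to be fetched.
            PERFORM length(t::text) FROM my_big_table t WHERE t.ctid = r.ctid;
        EXCEPTION WHEN OTHERS THEN
            bad := bad + 1;
            RAISE NOTICE 'damaged row at ctid %: %', r.ctid, SQLERRM;
        END;
    END LOOP;
    RAISE NOTICE 'scan finished, % damaged row(s) found', bad;
END
$$;
```

Each loop iteration opens a subtransaction, so on tables of this size the scan will be slow; it is probably worth restricting the outer SELECT to a range of rows and running it in batches.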

> The database is 10 TB on disk (SSDs) and runs on a 48 core server with 3
> TB RAM on Ubuntu 14.04 (Linux 3.18.13).
>
> We are updating rows in the database a lot/continuously.
>
> There are no apparent indications of hardware errors (like ECC) in
> dmesg, nor any error messages logged by the LSI MegaRAID controller, as
> far as I can tell.
>
> We are running PostgreSQL 9.3.14 currently.
>
> The only thing I could see in the release notes since 9.3.14 that might
> be related is this:
>
>   "* Avoid very-low-probability data corruption due to testing tuple
>      visibility without holding buffer lock (Thomas Munro, Peter Geoghegan,
>      Tom Lane)"
>
> Although reading more about it, it doesn't sound like it would exhibit
> the symptoms we see?
>
> We have recently increased the load (to around twice the number of
> cores), though, which made me think we could be triggering corner cases
> we haven't hit before.
>
> We will be upgrading to PostgreSQL 9.3.17 during the weekend, but I'd like to hear
> if anyone has seen something like this, or has some ideas of how to
> investigate/what the cause might be.
>
>
>    Best regards,
>
>      Adam
>

--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt


