Thread: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
[GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Our database has started reporting errors like this: 2017-05-31 13:48:10 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 14242189 in pg_toast_10919630 ... 2017-06-01 11:06:56 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 19573520 in pg_toast_10919630 (157 times, for different toast values, same pg_toast_nnn). pg_toast_10919630 corresponds to a table with around 168 million rows. These went away, but the next day we got similar errors from another table: 2017-06-02 05:59:50 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 47060150 in pg_toast_10920100 ... 2017-06-02 06:14:54 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 47226455 in pg_toast_10920100 (Only 4 this time) pg_toast_10920100 corresponds to a table with holds around 320 million rows (these are our two large tables). The next day we got 6 such errors and the day after 10 such errors. On June 5th we got 94, yesterday we got 111, of which one looked a little different: 2017-06-06 17:32:21 CEST ERROR: unexpected chunk size 1996 (expected 1585) in final chunk 0 for toast value 114925100in pg_toast_10920100 and today the logs have 65 lines, ending with these: 2017-06-07 14:49:53 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 131114834 in pg_toast_10920100 2017-06-07 14:53:41 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 131149566 in pg_toast_10920100 The database is 10 TB on disk (SSDs) and runs on a 48 core server with 3 TB RAM on Ubuntu 14.04 (Linux 3.18.13). We are updating rows in the database a lot/continuously. There are no apparent indications of hardware errors (like ECC) in dmesg, nor any error messages logged by the LSI MegaRAID controller, as far as I can tell. We are running PostgreSQL 9.3.14 currently. The only thing I could see in the release notes since 9.3.14 that might be related is this: "* Avoid very-low-probability data corruption due to testing tuple visibility without holding buffer lock (Thomas Munro, Peter Geoghegan, Tom Lane)" Although reading more about it, it doesn't sound like it would exhibit the symptoms we see? We have recently increased the load (to around twice the number of cores), though, which made me think we could be triggering corner cases we haven't hit before. We will be upgrading to PostgreSQL 9.3.17 during the weekend, but I'd like to hear if anyone has seen something like this, or have some ideas of how to investigate/what the cause might be. Best regards, Adam -- "Lägg ditt liv i min hand Adam Sjøgren Sälj din själ till ett band" adsj@novozymes.com
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
On 07/06/2017 16:33, ADSJ (Adam Sjøgren) wrote: > Our database has started reporting errors like this: > > 2017-05-31 13:48:10 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 14242189 in pg_toast_10919630 > ... > 2017-06-01 11:06:56 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 19573520 in pg_toast_10919630 > > (157 times, for different toast values, same pg_toast_nnn). pg_toast_10919630 > corresponds to a table with around 168 million rows. > > These went away, but the next day we got similar errors from another > table: > > 2017-06-02 05:59:50 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 47060150 in pg_toast_10920100 > ... > 2017-06-02 06:14:54 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 47226455 in pg_toast_10920100 > > (Only 4 this time) pg_toast_10920100 corresponds to a table with holds > around 320 million rows (these are our two large tables). > > The next day we got 6 such errors and the day after 10 such errors. On > June 5th we got 94, yesterday we got 111, of which one looked a little > different: > > 2017-06-06 17:32:21 CEST ERROR: unexpected chunk size 1996 (expected 1585) in final chunk 0 for toast value 114925100in pg_toast_10920100 > > and today the logs have 65 lines, ending with these: > > 2017-06-07 14:49:53 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 131114834 in pg_toast_10920100 > 2017-06-07 14:53:41 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 131149566 in pg_toast_10920100 First try to find which tables those toast relations refer to : select 10919630::regclass , 10920100::regclass ; Are those critical tables? Can you restore them somehow? Also you may consider REINDEX TABLE pg_toast.pg_toast_10920100; REINDEX TABLE pg_toast.pg_toast_10919630; REINDEX TABLE <name of table 10920100>; REINDEX TABLE <name of table 10919630>; also VACUUM the above tables. You might want to write a function which iterates over the damaged table's rows in order to identify the damaged row(s).And then do some good update to create a new version. > The database is 10 TB on disk (SSDs) and runs on a 48 core server with 3 > TB RAM on Ubuntu 14.04 (Linux 3.18.13). > > We are updating rows in the database a lot/continuously. > > There are no apparent indications of hardware errors (like ECC) in > dmesg, nor any error messages logged by the LSI MegaRAID controller, as > far as I can tell. > > We are running PostgreSQL 9.3.14 currently. > > The only thing I could see in the release notes since 9.3.14 that might > be related is this: > > "* Avoid very-low-probability data corruption due to testing tuple > visibility without holding buffer lock (Thomas Munro, Peter Geoghegan, > Tom Lane)" > > Although reading more about it, it doesn't sound like it would exhibit > the symptoms we see? > > We have recently increased the load (to around twice the number of > cores), though, which made me think we could be triggering corner cases > we haven't hit before. > > We will be upgrading to PostgreSQL 9.3.17 during the weekend, but I'd like to hear > if anyone has seen something like this, or have some ideas of how to > investigate/what the cause might be. > > > Best regards, > > Adam > -- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
On 7 Jun 2017, at 15:22, Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:On 07/06/2017 16:33, ADSJ (Adam Sjøgren) wrote:Our database has started reporting errors like this:First try to find which tables those toast relations refer to :
2017-05-31 13:48:10 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 14242189 in pg_toast_10919630
...
2017-06-01 11:06:56 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 19573520 in pg_toast_10919630
(157 times, for different toast values, same pg_toast_nnn). pg_toast_10919630
corresponds to a table with around 168 million rows.
These went away, but the next day we got similar errors from another
table:
2017-06-02 05:59:50 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 47060150 in pg_toast_10920100
...
2017-06-02 06:14:54 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 47226455 in pg_toast_10920100
(Only 4 this time) pg_toast_10920100 corresponds to a table with holds
around 320 million rows (these are our two large tables).
The next day we got 6 such errors and the day after 10 such errors. On
June 5th we got 94, yesterday we got 111, of which one looked a little
different:
2017-06-06 17:32:21 CEST ERROR: unexpected chunk size 1996 (expected 1585) in final chunk 0 for toast value 114925100 in pg_toast_10920100
and today the logs have 65 lines, ending with these:
2017-06-07 14:49:53 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 131114834 in pg_toast_10920100
2017-06-07 14:53:41 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 131149566 in pg_toast_10920100
select 10919630::regclass , 10920100::regclass ;
Are those critical tables? Can you restore them somehow?
Also you may consider
REINDEX TABLE pg_toast.pg_toast_10920100;
REINDEX TABLE pg_toast.pg_toast_10919630;
REINDEX TABLE <name of table 10920100>;
REINDEX TABLE <name of table 10919630>;
also VACUUM the above tables.
You might want to write a function which iterates over the damaged table's rows in order to identify the damaged row(s). And then do some good update to create a new version.The database is 10 TB on disk (SSDs) and runs on a 48 core server with 3
TB RAM on Ubuntu 14.04 (Linux 3.18.13).
We are updating rows in the database a lot/continuously.
There are no apparent indications of hardware errors (like ECC) in
dmesg, nor any error messages logged by the LSI MegaRAID controller, as
far as I can tell.
We are running PostgreSQL 9.3.14 currently.
The only thing I could see in the release notes since 9.3.14 that might
be related is this:
"* Avoid very-low-probability data corruption due to testing tuple
visibility without holding buffer lock (Thomas Munro, Peter Geoghegan,
Tom Lane)"
Although reading more about it, it doesn't sound like it would exhibit
the symptoms we see?
We have recently increased the load (to around twice the number of
cores), though, which made me think we could be triggering corner cases
we haven't hit before.
We will be upgrading to PostgreSQL 9.3.17 during the weekend, but I'd like to hear
if anyone has seen something like this, or have some ideas of how to
investigate/what the cause might be.
Best regards,
Adam
--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
Hi,One lesson I learned from the BSD camp when dealing with random freezes and panics : when all else fails to give an answer it is time to start blaming my hardware. Are those tablespaces on any cheap SSD's ?Out of interest, are you using any tablespaces other than pg_default? I can only replicate the issue when using separately mounted tablespaces.
You don't use index when pg_dump . If only the index is corrupted you can get away with dump/reload (but for big DBs this is unrealistic)I have been investigating this quite extensively and everything I can find on the web suggests data corruption. However running the the following DO reports no errors and I can dump the database without issue.
I also found the following has been reported: https://www.postgresql.org/message-id/20161201165505.4360.28203@wrigleys.postgresql.orgBest wishes,HarryOn 7 Jun 2017, at 15:22, Achilleas Mantzios <achill@matrix.gatewaynet.com> wrote:On 07/06/2017 16:33, ADSJ (Adam Sjøgren) wrote:Our database has started reporting errors like this:First try to find which tables those toast relations refer to :
2017-05-31 13:48:10 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 14242189 in pg_toast_10919630
...
2017-06-01 11:06:56 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 19573520 in pg_toast_10919630
(157 times, for different toast values, same pg_toast_nnn). pg_toast_10919630
corresponds to a table with around 168 million rows.
These went away, but the next day we got similar errors from another
table:
2017-06-02 05:59:50 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 47060150 in pg_toast_10920100
...
2017-06-02 06:14:54 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 47226455 in pg_toast_10920100
(Only 4 this time) pg_toast_10920100 corresponds to a table with holds
around 320 million rows (these are our two large tables).
The next day we got 6 such errors and the day after 10 such errors. On
June 5th we got 94, yesterday we got 111, of which one looked a little
different:
2017-06-06 17:32:21 CEST ERROR: unexpected chunk size 1996 (expected 1585) in final chunk 0 for toast value 114925100 in pg_toast_10920100
and today the logs have 65 lines, ending with these:
2017-06-07 14:49:53 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 131114834 in pg_toast_10920100
2017-06-07 14:53:41 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 131149566 in pg_toast_10920100
select 10919630::regclass , 10920100::regclass ;
Are those critical tables? Can you restore them somehow?
Also you may consider
REINDEX TABLE pg_toast.pg_toast_10920100;
REINDEX TABLE pg_toast.pg_toast_10919630;
REINDEX TABLE <name of table 10920100>;
REINDEX TABLE <name of table 10919630>;
also VACUUM the above tables.
You might want to write a function which iterates over the damaged table's rows in order to identify the damaged row(s). And then do some good update to create a new version.The database is 10 TB on disk (SSDs) and runs on a 48 core server with 3
TB RAM on Ubuntu 14.04 (Linux 3.18.13).
We are updating rows in the database a lot/continuously.
There are no apparent indications of hardware errors (like ECC) in
dmesg, nor any error messages logged by the LSI MegaRAID controller, as
far as I can tell.
We are running PostgreSQL 9.3.14 currently.
The only thing I could see in the release notes since 9.3.14 that might
be related is this:
"* Avoid very-low-probability data corruption due to testing tuple
visibility without holding buffer lock (Thomas Munro, Peter Geoghegan,
Tom Lane)"
Although reading more about it, it doesn't sound like it would exhibit
the symptoms we see?
We have recently increased the load (to around twice the number of
cores), though, which made me think we could be triggering corner cases
we haven't hit before.
We will be upgrading to PostgreSQL 9.3.17 during the weekend, but I'd like to hear
if anyone has seen something like this, or have some ideas of how to
investigate/what the cause might be.
Best regards,
Adam
--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Harry Ambrose <harry.ambrose@gmail.com> writes: > I have been following the updates to the 9.4 branch hoping a fix will appear, but sadly no luck yet. I have manually replicatedthe issue on 9.4.4, 9.4.10 and 9.4.12. My replication steps are: This is a very interesting report, but you didn't actually provide a reproducer, just a handwavy outline. If you submit a script that makes this happen, we will most definitely look into it. But people aren't going to be excited about trying to reverse-engineer a test case out of a vague description. > I also found the following has been reported: > https://www.postgresql.org/message-id/20161201165505.4360.28203@wrigleys.postgresql.org That person never came back with a self-contained test case, either. https://wiki.postgresql.org/wiki/Guide_to_reporting_problems regards, tom lane
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
Hi, Thanks for the responses. > "One lesson I learned from the BSD camp when dealing with random freezes and panics : when all else fails to give an answerit is time to start blaming my hardware. Are those tablespaces on any cheap SSD's ?” The tablespaces are not sat on SSD’s. Something I had also considered. Tom - I can provide a jar that I have been using to replicate the issue. Whats the best transport method to send it over? Best wishes, Harry > On 7 Jun 2017, at 16:27, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Harry Ambrose <harry.ambrose@gmail.com> writes: >> I have been following the updates to the 9.4 branch hoping a fix will appear, but sadly no luck yet. I have manually replicatedthe issue on 9.4.4, 9.4.10 and 9.4.12. My replication steps are: > > This is a very interesting report, but you didn't actually provide a > reproducer, just a handwavy outline. If you submit a script that > makes this happen, we will most definitely look into it. But > people aren't going to be excited about trying to reverse-engineer > a test case out of a vague description. > >> I also found the following has been reported: >> https://www.postgresql.org/message-id/20161201165505.4360.28203@wrigleys.postgresql.org > > That person never came back with a self-contained test case, either. > > https://wiki.postgresql.org/wiki/Guide_to_reporting_problems > > regards, tom lane
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Harry Ambrose <harry.ambrose@gmail.com> writes: > Tom - I can provide a jar that I have been using to replicate the issue. Whats the best transport method to send it over? If it's not enormous, just send it as an email attachment. regards, tom lane
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
Harry Ambrose <harry.ambrose@gmail.com> writes:
> Tom - I can provide a jar that I have been using to replicate the issue. Whats the best transport method to send it over?
If it's not enormous, just send it as an email attachment.
regards, tom lane
Attachment
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
Harry Ambrose <harry.ambrose@gmail.com> writes:
> Tom - I can provide a jar that I have been using to replicate the issue. Whats the best transport method to send it over?
If it's not enormous, just send it as an email attachment.
regards, tom lane
Attachment
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
Harry Ambrose <harry.ambrose@gmail.com> writes:
> Tom - I can provide a jar that I have been using to replicate the issue. Whats the best transport method to send it over?
If it's not enormous, just send it as an email attachment.
regards, tom lane
Attachment
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Harry Ambrose <harry.ambrose@gmail.com> writes: > Please find the jar attached (renamed with a .txt extension as I know some > email services deem jars a security issue). Hmm, the output from this script reminds me quite a lot of one I was sent in connection with bug #14444 awhile back: https://www.postgresql.org/message-id/20161201165505.4360.28203%40wrigleys.postgresql.org Was that a colleague of yours? Anyway, the bad news is I couldn't reproduce the problem then and I can't now. I don't know if it's a timing issue or if there's something critical about configuration that I'm not duplicating. Can you explain what sort of platform you're testing on, and what nondefault configuration settings you're using? regards, tom lane
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
Anyway, the bad news is I couldn't reproduce the problem then and I can't
now. I don't know if it's a timing issue or if there's something critical
about configuration that I'm not duplicating. Can you explain what sort
of platform you're testing on, and what nondefault configuration settings
you're using?
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
Maybe you could give some info on :Hi Tom,Thanks for attempting to replicate the issue.Anyway, the bad news is I couldn't reproduce the problem then and I can't
now. I don't know if it's a timing issue or if there's something critical
about configuration that I'm not duplicating. Can you explain what sort
of platform you're testing on, and what nondefault configuration settings
you're using?Further details about the environment that I can replicate on below:- Non default postgresql.conf settings:checkpoint_segments = 192checkpoint_completion_target = 0.9checkpoint_timeout = 5minwal_keep_segments = 256wal_writer_delay = 200msarchive_mode = onarchive_command = 'rsync -e ssh -arv /wal/pg_xlog/%f postgres@<removed hostname>:/wal/pg_xlog'archive_timeout = 60syslog_facility = 'LOCAL0'log_statement = 'mod'syslog_ident = 'postgres'log_line_prefix = '%h %m %p %c %u %a %e 'log_timezone = 'GB'track_activities = ontrack_counts = ondatestyle = 'iso, mdy'timezone = 'GB'default_text_search_config = 'pg_catalog.english'array_nulls = onsql_inheritance = onstandard_conforming_strings = onsynchronize_seqscans = ontransform_null_equals = off- Two node master/slave setup using streaming replication (without slots).- CentOS 6.9 (2.6.32-696.el6.x86_64).- PostgreSQL 9.4.10 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-17), 64-bit.- 64GiB RAM.- AMD Opteron(TM) Processor 6238.- pg_default sat on 2 disk RAID-1 conifugration (ext3 filesystem).- Custom tablespaces (where the errors occur) sat on 4 disk RAID-10 (ext3 filesystem).- All disks are HP 600G SAS 6.0Gbps with P420(i) controllers and battery backed cache enabled.
- your ext3 mkfs and mount options (journal, barriers, etc)
- your controller setup (battery should be working good and cache mode set to write back)
- your disks setup (write cache should be disabled)
- you should check your syslogs/messages for any errors related to storage
- is your RAM ECC? Did you run any memtest?
- is your CPU overheating ?
- have you experienced any crashes/freezes ?
Please let me know if you require further info.Best wishes,Harry
-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
Maybe you could give some info on :
- your ext3 mkfs and mount options (journal, barriers, etc)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 36634624
Block count: 146506767
Reserved block count: 7325338
Free blocks: 143785740
Free inodes: 36627866
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 989
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
RAID stride: 64
RAID stripe width: 64
Filesystem created: Fri Aug 9 16:11:53 2013
Last mount time: Fri Apr 21 22:37:02 2017
Last write time: Fri Apr 21 22:37:02 2017
Mount count: 2
Maximum mount count: 100
Last checked: Thu Sep 15 18:52:43 2016
Check interval: 31536000 (12 months, 5 days)
Next check after: Fri Sep 15 18:52:43 2017
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Journal backup: inode blocks
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Filesystem flags: signed_directory_hash
Default mount options: (none)
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 73261056
Block count: 293013543
Reserved block count: 14650677
Free blocks: 286208439
Free inodes: 73174728
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 954
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 8192
Inode blocks per group: 512
RAID stride: 64
RAID stripe width: 128
Filesystem created: Fri Aug 9 16:11:53 2013
Last mount time: Fri Apr 21 22:37:02 2017
Last write time: Fri Apr 21 22:37:02 2017
Mount count: 2
Maximum mount count: 100
Last checked: Thu Sep 15 18:52:43 2016
Check interval: 31536000 (12 months, 5 days)
Next check after: Fri Sep 15 18:52:43 2017
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 256
Required extra isize: 28
Desired extra isize: 28
Journal inode: 8
Default directory hash: half_md4
Journal backup: inode blocks
- your controller setup (battery should be working good and cache mode set to write back)
- your disks setup (write cache should be disabled)
- you should check your syslogs/messages for any errors related to storage
- is your RAM ECC? Did you run any memtest?
- is your CPU overheating ?
- have you experienced any crashes/freezes ?
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
CREATE OR REPLACE FUNCTION listings_audit() RETURNS TRIGGER AS
$listings_audit$
BEGIN
IF (TG_OP = 'DELETE') THEN
IF (EXISTS (
SELECT 1
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'schema_name'
AND c.relname = 'table_name'
AND c.relkind = 'r' -- only tables
)) THEN
INSERT INTO listings_changes
SELECT now(), 'DELETE', OLD.*;
RETURN OLD;
ELSE RAISE EXCEPTION 'Table does not exists';
END IF;
ELSIF (TG_OP = 'UPDATE') THEN
IF (EXISTS (
SELECT 1
FROM pg_catalog.pg_class c
JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'schema_name'
AND c.relname = 'table_name'
AND c.relkind = 'r' -- only tables
)) THEN
INSERT INTO listings_changes
SELECT now(), 'UPDATE', NEW.*;
RETURN NEW;
ELSE RAISE EXCEPTION 'Table does not exists';
END IF;
ELSEIF (TG_OP = 'INSERT') THEN
INSERT INTO listings_changes
SELECT now(), 'INSERT', NEW.*;
RETURN NEW;
END IF;
EXCEPTION
WHEN SQLSTATE '42611' THEN
RAISE EXCEPTION 'Columns do not match audit file does not match user file';
WHEN SQLSTATE '42P16' THEN
RAISE EXCEPTION 'Table does not exists';
WHEN OTHERS THEN
RAISE EXCEPTION 'PostgresSQL error code that has occurred';
RETURN SQLSTATE;
END;
$listings_audit$ LANGUAGE plpgsql;
Harry Ambrose <harry.ambrose@gmail.com> writes:
> Please find the jar attached (renamed with a .txt extension as I know some
> email services deem jars a security issue).
Hmm, the output from this script reminds me quite a lot of one I was
sent in connection with bug #14444 awhile back:
https://www.postgresql.org/message-id/20161201165505. 4360.28203%40wrigleys. postgresql.org
Was that a colleague of yours?
Anyway, the bad news is I couldn't reproduce the problem then and I can't
now. I don't know if it's a timing issue or if there's something critical
about configuration that I'm not duplicating. Can you explain what sort
of platform you're testing on, and what nondefault configuration settings
you're using?
regards, tom lane
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Robert Lakes <robertl@propaas.com> writes: > I am new to postgress and I am trying to write my first function to insert, > update or delete and trap errors as a result of the table not existing , > the columns not exist or if any other error simply pass back the sqlstate Please do not hijack an existing thread to ask an unrelated question. Start a new thread (ie "compose" don't "reply") and use an appropriate subject line. regards, tom lane
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) fortoast value 76753264 in pg_toast_10920100
ADSJ (Adam Sjøgren) wrote: > Our database has started reporting errors like this: > > 2017-05-31 13:48:10 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 14242189 in pg_toast_10919630 Does the problem still reproduce if you revert commit 6c243f90ab6904f27fa990f1f3261e1d09a11853? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) fortoast value 76753264 in pg_toast_10920100
Harry Ambrose wrote: > Hi, > > Please find the jar attached (renamed with a .txt extension as I know some > email services deem jars a security issue). > > The jar accepts the following arguments: > > $1 = host > $2 = database > $3 = username > $4 = password > $5 = port > > It returns its logging to STDOUT. Please let me know if you require further > info. I'm unable to run this file. Maybe it was corrupted in transit, given that it was considered to be text. $ md5sum toast-corrupter-aio.jar 7b1f5854c286f9b956b9442afd455b7a toast-corrupter-aio.jar $ java -jar toast-corrupter-aio.jar Error: Invalid or corrupt jarfile toast-corrupter-aio.jar Even unzip complains (after extracting a bunch of JDBC .class files) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > I'm unable to run this file. Maybe it was corrupted in transit, given > that it was considered to be text. My copy came through fine, so I tried to forward it to you off-list, but gmail rejected it as a possible security hazard. Do you know which binary mime types they won't reject? regards, tom lane
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
On 06/11/2017 08:34 AM, Tom Lane wrote: > Alvaro Herrera <alvherre@2ndquadrant.com> writes: >> I'm unable to run this file. Maybe it was corrupted in transit, given >> that it was considered to be text. > > My copy came through fine, so I tried to forward it to you off-list, > but gmail rejected it as a possible security hazard. Do you know > which binary mime types they won't reject? Well this is a list that they do reject: https://support.google.com/mail/answer/6590?hl=en So I guess you can work backwards from there. Their suggestion is to upload to Google Drive. That or use a third party site, like Dropbox. > > regards, tom lane > > -- Adrian Klaver adrian.klaver@aklaver.com
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
Hi,
So I guess you run memtest86+ and it reported that your memory is indeed ECC and also that it is working properly?No error messages found.- is your RAM ECC? Did you run any memtest?Yes, memory is ECC. No error messages found.
Best wishes,Harry
-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > ADSJ (Adam Sjøgren) wrote: > >> Our database has started reporting errors like this: >> >> 2017-05-31 13:48:10 CEST ERROR: unexpected chunk number 0 (expected 1) for toast value 14242189 in pg_toast_10919630 > Does the problem still reproduce if you revert commit > 6c243f90ab6904f27fa990f1f3261e1d09a11853? I will try and get back to you with the results (building new .deb packages as I type this). Meanwhile, I can report that I have upgraded from 9.3.14 to 9.3.17 and the errors keep appearing the log. We have tried running a function similar to the one described in http://www.databasesoup.com/2013/10/de-corrupting-toast-tables.html as suggested by Adrian Klaver, but we haven't been able to get any errors from that. This is the function we have run over our two tables: CREATE OR REPLACE FUNCTION check_table_a(from_id int, to_id int) RETURNS VOID LANGUAGE PLPGSQL AS $f$ declare curid INT := 0; rec RECORD; badid INT; detoast TEXT; begin FOR badid IN SELECT id FROM table_a where id >= from_id and id <= to_id LOOP curid = curid + 1; if curid % 10000 = 0 then raise notice '% rows inspected (%, %,%)', curid, badid, from_id, to_id; end if; begin SELECT * INTO rec FROM table_a where id = badid; detoast := substr(rec.fts::text,1,2000); exception when others then raise notice 'data for table_a id: % is corrupt', badid; continue; end; end loop; end; $f$; -- The other function has: -- detoast := substr(vcontent.document,1,2000); -- -- and is otherwise identical. But no 'data for table... is corrupt' is printed. We are only substr()'ing one field (which we know is big) for each row. Should we do so for _all_ fields? Is there an elegant way to do so? Best regards, Adam -- "Lägg ditt liv i min hand Adam Sjøgren Sälj din själ till ett band" adsj@novozymes.com
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
After 2 full attempts, (and after bringing my poor - old workstation to its knees) it still does not produce the supposed ERROR :Hi,> Their suggestion is to upload to Google Drive. That or use a third party site, like Dropbox.I have uploaded the jar to dropbox, link below (please let me know if you have any issues downloading):
........
update 160000
update 180000
Updated all
Attempting vacuum
Vacuum completed
Dropping the table
=========================================
New attempt - number 3
Creating the table if it does not exist
Inserting the rows
Executing 0
Executing 40000
^C
PostgreSQL version : 9.3.4
Best wishes,Harry
-- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
pls tell me, I am currently running 2nd run in my box, (New attempt 2), and its in the "Attempting vacuum" phase.
What is it supposed to do next?
I got no errors , it has gotten my machine to its knees.
c.setAutoCommit(true);
System.out.println("Attempting vacuum");
c.prepareCall("VACUUM (FULL, ANALYZE, VERBOSE) x").execute();
System.out.println("Vacuum end. 30 sec sleep");
Thread.sleep(60000);
System.out.println("Attempting vacuum");
c.prepareCall("VACUUM (FULL, ANALYZE, VERBOSE) x").execute();
System.out.println("Vacuum completed");
Thread.sleep(30000);
System.out.println("Dropping the table");
c.createStatement().execute("drop table if exists x");
Thread.sleep(30000);
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
update 100000
update 120000
update 140000
update 160000
update 180000
Update all
Vacuum
org.postgresql.util.PSQLException: ERROR: unexpected chunk number 2285 (expected 0) for toast value 187504167 in pg_toast_187504156
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2455)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2155)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:288)
at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:430)
at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:356)
at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:168)
at org.postgresql.jdbc.PgCallableStatement.executeWithFlags(PgCallableStatement.java:78)
at org.postgresql.jdbc.PgPreparedStatement.execute(PgPreparedStatement.java:157)
at Start.execute(Start.java:118)
at Start.main(Start.java:20)
=========================================
New attempt - number 2
Inserting the rows
Executing 0
Executing 40000
Executing 80000
Executing 120000
Executing 160000
Executing 200000
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Achilleas Mantzios <achill@matrix.gatewaynet.com> writes: > After 2 full attempts, (and after bringing my poor - old workstation to its knees) it still does not produce the supposedERROR : Yeah, I've had little luck reproducing it either. However, I noticed a few messages back that Harry is testing against a master/slave setup not just a standalone server. Maybe that's not as irrelevant as it should be. regards, tom lane
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Harry Ambrose <harry.ambrose@gmail.com> writes: > - Custom tablespaces (where the errors occur) sat on 4 disk RAID-10 (ext3 > filesystem). BTW, how do you get that jar to make the test table on a non-default tablespace? Or are you just putting the whole test DB on a tablespace? regards, tom lane
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
BTW, how do you get that jar to make the test table on a non-default
tablespace? Or are you just putting the whole test DB on a tablespace?
regards, tom lane
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
(...)
Vacuum end. 30 sec sleep
Update selective
Inserting the rows
update 0
update 20000
update 40000
update 60000
update 80000
update 100000
update 120000
update 140000
update 160000
update 180000
Update all
Attempting vacuum
Vacuum completed
dropping the table
=========================================
New attempt - number 96
Inserting the rows
Executing 0
Executing 40000
Executing 80000
Executing 120000
Executing 160000
Executing 200000
Executing 240000
Executing 280000
(...)
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Harry Ambrose <harry.ambrose@gmail.com> writes: > Not sure whether its relevant or not, however upon adding an ANALYSE before > the second vacuum the issue has not presented when testing. I have managed > 95 cycles thus far. I'm still unable to reproduce :-( --- I ran about two dozen cycles overnight with no sign of trouble. This time I was using a master/slave pair with the test database in a non-default partition, so neither of those aspects seem to be key after all. I suspect the reason for it being so hard to reproduce is that there's a timing window involved. But that doesn't offer much to go on in terms of being able to make a more reproducible case. regards, tom lane
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Adam Sjøgren <adsj@novozymes.com> wrote: > Meanwhile, I can report that I have upgraded from 9.3.14 to 9.3.17 and > the errors keep appearing the log. Just a quick update with more observations: All the errors in the postgres.log from one of the tables are triggered by a stored procedure that gathers data to put in a field used for full text search - this stored procedure is called by a before update trigger on the table. We have only seen it in the log, but not been able to reproduce it. We have, however, now got a row in the other big table where we can get the error just by running a SELECT * on the row, in psql: user@server db=# select * from ourschema.table_a where id = 6121931; ERROR: unexpected chunk number 0 (expected 1) for toast value 339846807 in pg_toast_10919630 user@server db=# Which is both nice - we can show the error on demand - but also more worrying, I guess, because that means the problem is "on disk". Running this in a stored procedure over the record in question: > SELECT * > INTO rec > FROM table_a where id = badid; > detoast := substr(rec.fts::text,1,2000); > exception > when others then > raise notice 'data for table_a id: % is corrupt', badid; > continue; also shows the error: user@server db=# SELECT ourschema.check_sequence(6121931, 6121931); NOTICE: data for table_a id: 6121931 is corrupt check_sequence ---------------- (1 row) We are running this over the entire (160M+ row) table now, to see if any other rows are affected. So, we can reproduce the error message, but we can't reproduce the problem from scratch. Any ideas on what to look at, given a non-transient problem-row? Our next step will be to try to switch to 9.3.17 with 6c243f90ab6904f27fa990f1f3261e1d09a11853 reverted as suggested by Alvaro Herrera last week. Best regards, Adam -- "Lägg ditt liv i min hand Adam Sjøgren Sälj din själ till ett band" adsj@novozymes.com
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
int hasfailed;
hasfailed = 0;
(...)
while ((ttup = systable_getnext_ordered(toastscan, ForwardScanDirection)) != NULL)
{
/*
* Have a chunk, extract the sequence number and the data
*/
residx = DatumGetInt32(fastgetattr(ttup, 2, toasttupDesc, &isnull));
Assert(!isnull);
chunk = DatumGetPointer(fastgetattr(ttup, 3, toasttupDesc, &isnull));
Assert(!isnull);
(...)
/*
* Some checks on the data we've found
*/
if (residx != nextidx)
{
/*
* Remove error status and convert to DEBUG5. Its important to see what other values are returned. We should error outside the loop.
elog(ERROR, "unexpected chunk number %d (expected %d) for toast value %u in %s",
residx, nextidx,
toast_pointer.va_valueid,
RelationGetRelationName(toastrel));
*/
elog(DEBUG5, "unexpected chunk number %d (expected %d) for toast value %u in %s",
residx, nextidx,
toast_pointer.va_valueid,
RelationGetRelationName(toastrel));
hasfailed = 1;
}
(...)
/*
* Log the current chunk number and the expected chunk number
*/
if (hasfailed!=0)
{
elog(DEBUG5, "HARRYAMBROSE - Current chunk number is %d, expecting %d.", residx, nextidx);
}
nextidx++;
} // End while loop
elog(DEBUG5, "HARRYAMBROSE - Has failed is %d after loop", hasfailed);
if (hasfailed!=0)
{
elog(ERROR, "unexpected chunk number exception, please see DEBUG.");
}
Happy to provide further info if required. I have tried to pick only the interesting changes above ^.
Thanks again for your help!
Have a great weekend :)
Harry
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
On 2017-06-21 Adam Sjøgren <adsj@novozymes.com> wrote: > Adam Sjøgren <adsj@novozymes.com> wrote: >> Meanwhile, I can report that I have upgraded from 9.3.14 to 9.3.17 and >> the errors keep appearing the log. Just to close this, for the record: We haven't seen the errors since 2017-06-30. We upgraded to 9.3.17 (latest 9.3 point-release at the time of writing) on 2017-06-10. Whether this means that the affected rows gradually got overwritten after switching to .17 and thus got fixed, or if something subtle in our workflow changed, so we aren't hitting this anymore, or something else entirely is the answer, we're not sure. We didn't get to trying Alvaro Herrera's suggestion of removing 6c243f90ab6904f27fa990f1f3261e1d09a11853 before the errors stopped appearing "by themselves". Best regards, Adam -- "My Dear Babbage. I am in much dismay at having Adam Sjøgren got into so amazing a quagmire & botheration with adsj@novozymes.com these Numbers, that I cannot possibly get the thing done today."
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
On 09/08/2017 15:27, ADSJ (Adam Sjøgren) wrote: > On 2017-06-21 Adam Sjøgren <adsj@novozymes.com> wrote: > >> Adam Sjøgren <adsj@novozymes.com> wrote: >>> Meanwhile, I can report that I have upgraded from 9.3.14 to 9.3.17 and >>> the errors keep appearing the log. > Just to close this, for the record: We haven't seen the errors since > 2017-06-30. We upgraded to 9.3.17 (latest 9.3 point-release at the time > of writing) on 2017-06-10. > > Whether this means that the affected rows gradually got overwritten > after switching to .17 and thus got fixed, or if something subtle in our > workflow changed, so we aren't hitting this anymore, or something else > entirely is the answer, we're not sure. Glad you sorted it out! You have been consistent in your effort to chase this down, and reverted back with your findingsto close the case. Thumbs up! > We didn't get to trying Alvaro Herrera's suggestion of removing > 6c243f90ab6904f27fa990f1f3261e1d09a11853 before the errors stopped > appearing "by themselves". > > > Best regards, > > Adam > -- Achilleas Mantzios IT DEV Lead IT DEPT Dynacom Tankers Mgmt
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
On 09/08/2017 15:27, ADSJ (Adam Sjøgren) wrote:On 2017-06-21 Adam Sjøgren <adsj@novozymes.com> wrote:Glad you sorted it out! You have been consistent in your effort to chase this down, and reverted back with your findings to close the case. Thumbs up!Adam Sjøgren <adsj@novozymes.com> wrote:Just to close this, for the record: We haven't seen the errors sinceMeanwhile, I can report that I have upgraded from 9.3.14 to 9.3.17 and
the errors keep appearing the log.
2017-06-30. We upgraded to 9.3.17 (latest 9.3 point-release at the time
of writing) on 2017-06-10.
Whether this means that the affected rows gradually got overwritten
after switching to .17 and thus got fixed, or if something subtle in our
workflow changed, so we aren't hitting this anymore, or something else
entirely is the answer, we're not sure.We didn't get to trying Alvaro Herrera's suggestion of removing
6c243f90ab6904f27fa990f1f3261e1d09a11853 before the errors stopped
appearing "by themselves".
Best regards,
Adam
--
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
On Wed, Aug 9, 2017 at 6:27 AM, ADSJ (Adam Sjøgren) <adsj@novozymes.com> wrote: > On 2017-06-21 Adam Sjøgren <adsj@novozymes.com> wrote: > >> Adam Sjøgren <adsj@novozymes.com> wrote: > >>> Meanwhile, I can report that I have upgraded from 9.3.14 to 9.3.17 and >>> the errors keep appearing the log. > > Just to close this, for the record: We haven't seen the errors since > 2017-06-30. We upgraded to 9.3.17 (latest 9.3 point-release at the time > of writing) on 2017-06-10. > > Whether this means that the affected rows gradually got overwritten > after switching to .17 and thus got fixed, or if something subtle in our > workflow changed, so we aren't hitting this anymore, or something else > entirely is the answer, we're not sure. > > We didn't get to trying Alvaro Herrera's suggestion of removing > 6c243f90ab6904f27fa990f1f3261e1d09a11853 before the errors stopped > appearing "by themselves". This sounds a lot like bad sectors getting remapped.
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) for toastvalue 76753264 in pg_toast_10920100
Re: [GENERAL] ERROR: unexpected chunk number 0 (expected 1) fortoast value 76753264 in pg_toast_10920100
Re: Harry Ambrose 2017-08-25 <CAK4Knu86q98fWNzYntsZk1uGca6F129wLShmrkzyb2EZkSMDWQ@mail.gmail.com> > Hi All, > > Sorry to open this can of worms again. However, we are still struggling > with this issue across quite a large amount of our estate. Hi, we've just seen exactly this error on a customer database running 9.5.3 (postgresql95.x86_64 9.5.3-2PGDG.rhel6). Luckily just one tuple was affected. Symptoms were: # select text from k... where id = 719764749; ERROR: unexpected chunk number 0 (expected 1) for toast value 3347468184 in pg_toast_922511637 The toast table itself was perfectly ok, with a single chunk: # select * from pg_toast.pg_toast_922511637 where chunk_id = 3347468184; chunk_id | 3347468184 chunk_seq | 0 chunk_data | ...valid text string... (with bytea_output = 'escape') Updating or deleting the field/row didn't work: # update k... set text = '...same text as above...' where id = 719764749; ERROR: XX000: tuple concurrently updated ORT: simple_heap_delete, heapam.c:3171 # delete from k... where id = 719764749; ERROR: XX000: tuple concurrently updated ORT: simple_heap_delete, heapam.c:3171 The problem persisted over the last two weeks (eventually noticed by pg_dump starting to fail between August 15th and 18th). The server was started on July 26th. Besides the ERRORing statements above, I didn't actively resolve it, suddenly SELECTing the original row just worked again. According to pg_stat_user_tables, autovacuum didn't hit in. I can't say if there were backends open for two weeks. At the time it resolved itself, the oldest backend was from August 27th. If xmin/xmax/multixact... data from this server is interesting, I can extract it on request. Christoph
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
We are seeing these "ERROR: unexpected chunk number 0 (expected 1) for toast value 1498303849 in pg_toast_10919630" in increasing numbers again¹. An observation is that they seem to only happen for tsvector fields. Here is an example sequence of queries for a record (we have more than a handful of these currently), which exhibits the problem. First we get two other fields, 'sequence' is large enough to be toast'ed: 2018-01-16 08:51:17.362 efam=# select id,sequence from efam.sequence where id = 164504550; Time: 1.150 ms No problem. Then we also fetch the tsvector field: 2018-01-16 08:51:27.773 efam=# select id,sequence,fts from efam.sequence where id = 164504550; ERROR: unexpected chunk number 0 (expected 1) for toast value 1498303849 in pg_toast_10919630 Time: 0.912 ms And we get the error. Getting the id and the tsvector: 2018-01-16 08:51:34.174 efam=# select id,fts from efam.sequence where id = 164504550; ERROR: unexpected chunk number 0 (expected 1) for toast value 1498303849 in pg_toast_10919630 Time: 6.138 ms gives the error. Just getting the tsvector: 2018-01-16 08:51:40.066 efam=# select fts from efam.sequence where id = 164504550; ERROR: unexpected chunk number 0 (expected 1) for toast value 1498303849 in pg_toast_10919630 Time: 1.805 ms Gives the error. Field definitions: id | integer | sequence | text | fts | tsvector | Maybe the observation that this only happens (for us) on tsvector rings a bell for someone? Best regards, Adam ¹ As reported back in June, 2017, starting here: https://www.postgresql.org/message-id/7pefuv53dl.fsf%40novozymes.com (I have to admit I never got around to trying to revert the commit Alvaro Herrera suggested we try without (https://www.postgresql.org/message-id/20170611033840.hruqadsk47qcdrqb%40alvherre.pgsql)) -- "No more than that, but very powerful all the Adam Sjøgren same; simple things are good." adsj@novozymes.com
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value76753264 in pg_toast_10920100
Hi Adam , I've been seeing this same kind of Error in my clusters for a while .
Is this running on postgresql 9.4.8? Because mine was fixed upgrading to 9.4.11 , a bug indeed .
Kind regards
Sent: Tuesday, January 16, 2018 7:18 AM
To: pgsql-general@postgresql.org
Subject: Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
toast value 1498303849 in pg_toast_10919630" in increasing numbers again¹.
An observation is that they seem to only happen for tsvector fields.
Here is an example sequence of queries for a record (we have more than a
handful of these currently), which exhibits the problem.
First we get two other fields, 'sequence' is large enough to be toast'ed:
2018-01-16 08:51:17.362 efam=# select id,sequence from efam.sequence where id = 164504550;
Time: 1.150 ms
No problem.
Then we also fetch the tsvector field:
2018-01-16 08:51:27.773 efam=# select id,sequence,fts from efam.sequence where id = 164504550;
ERROR: unexpected chunk number 0 (expected 1) for toast value 1498303849 in pg_toast_10919630
Time: 0.912 ms
And we get the error.
Getting the id and the tsvector:
2018-01-16 08:51:34.174 efam=# select id,fts from efam.sequence where id = 164504550;
ERROR: unexpected chunk number 0 (expected 1) for toast value 1498303849 in pg_toast_10919630
Time: 6.138 ms
gives the error.
Just getting the tsvector:
2018-01-16 08:51:40.066 efam=# select fts from efam.sequence where id = 164504550;
ERROR: unexpected chunk number 0 (expected 1) for toast value 1498303849 in pg_toast_10919630
Time: 1.805 ms
Gives the error.
Field definitions:
id | integer |
sequence | text |
fts | tsvector |
Maybe the observation that this only happens (for us) on tsvector rings
a bell for someone?
Best regards,
Adam
¹ As reported back in June, 2017, starting here:
https://www.postgresql.org/message-id/7pefuv53dl.fsf%40novozymes.com
(I have to admit I never got around to trying to revert the commit
Alvaro Herrera suggested we try without
(https://www.postgresql.org/message-id/20170611033840.hruqadsk47qcdrqb%40alvherre.pgsql))
--
"No more than that, but very powerful all the Adam Sjøgren
same; simple things are good." adsj@novozymes.com
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Hi Jorge, This sounds very interesting - we are running PostgreSQL 9.3.20. Did you ever find out exactly what the change that solved the problem between 9.4.8 and 9.4.11 was? Best regards, Adam Jorge writes: > Hi Adam , I've been seeing this same kind of Error in my clusters for a while . > > Is this running on postgresql 9.4.8? Because mine was fixed upgrading to 9.4.11 , a bug indeed . > > > > Kind regards > > > > Jorge Daniel Fernandez > > > ________________________________ > From: Adam Sjøgren <adsj@novozymes.com> > Sent: Tuesday, January 16, 2018 7:18 AM > To: pgsql-general@postgresql.org > Subject: Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100 > > We are seeing these "ERROR: unexpected chunk number 0 (expected 1) for > toast value 1498303849 in pg_toast_10919630" in increasing numbers again¹. > > An observation is that they seem to only happen for tsvector fields. > > Here is an example sequence of queries for a record (we have more than a > handful of these currently), which exhibits the problem. > > First we get two other fields, 'sequence' is large enough to be toast'ed: > > 2018-01-16 08:51:17.362 efam=# select id,sequence from efam.sequence where id = 164504550; > Time: 1.150 ms > > No problem. > > Then we also fetch the tsvector field: > > 2018-01-16 08:51:27.773 efam=# select id,sequence,fts from efam.sequence where id = 164504550; > ERROR: unexpected chunk number 0 (expected 1) for toast value 1498303849 in pg_toast_10919630 > Time: 0.912 ms > > And we get the error. > > Getting the id and the tsvector: > > 2018-01-16 08:51:34.174 efam=# select id,fts from efam.sequence where id = 164504550; > ERROR: unexpected chunk number 0 (expected 1) for toast value 1498303849 in pg_toast_10919630 > Time: 6.138 ms > > gives the error. > > Just getting the tsvector: > > 2018-01-16 08:51:40.066 efam=# select fts from efam.sequence where id = 164504550; > ERROR: unexpected chunk number 0 (expected 1) for toast value 1498303849 in pg_toast_10919630 > Time: 1.805 ms > > Gives the error. > > Field definitions: > > id | integer | > sequence | text | > fts | tsvector | > > Maybe the observation that this only happens (for us) on tsvector rings > a bell for someone? > > > Best regards, > > Adam > > > ¹ As reported back in June, 2017, starting here: > https://www.postgresql.org/message-id/7pefuv53dl.fsf%40novozymes.com > (I have to admit I never got around to trying to revert the commit > Alvaro Herrera suggested we try without > (https://www.postgresql.org/message-id/20170611033840.hruqadsk47qcdrqb%40alvherre.pgsql)) > > -- > "No more than that, but very powerful all the Adam Sjøgren > same; simple things are good." adsj@novozymes.com > > -- "No more than that, but very powerful all the Adam Sjøgren same; simple things are good." adsj@novozymes.com
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value76753264 in pg_toast_10920100
On Tue, Jan 16, 2018 at 07:05:19PM +0100, Adam Sjøgren wrote: > This sounds very interesting - we are running PostgreSQL 9.3.20. Which means that we may be looking at a new bug, 9.3.20 is the latest in the 9.3 set as of today. > Did you ever find out exactly what the change that solved the problem > between 9.4.8 and 9.4.11 was? In this case, I think that you are looking for this thread: https://www.postgresql.org/message-id/20160826072658.15676.7628@wrigleys.postgresql.org And this commit: commit: a694435641faf26a9a4c210d20576ae836e86c48 author: Tom Lane <tgl@sss.pgh.pa.us> date: Sat, 3 Sep 2016 13:28:53 -0400 Fix corrupt GIN_SEGMENT_ADDITEMS WAL records on big-endian hardware. Both involved 9.4.8. -- Michael
Attachment
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Michael writes: > On Tue, Jan 16, 2018 at 07:05:19PM +0100, Adam Sjøgren wrote: >> This sounds very interesting - we are running PostgreSQL 9.3.20. > Which means that we may be looking at a new bug, 9.3.20 is the latest in > the 9.3 set as of today. Yes; unfortunately we have failed to reproduce it outside production. The fact that the tsvector field is always involved when we see the error might be of interest (the tsvector field is the most updated in our database, however). Just for completeness, the tsvector field has a GIN index on it: "sequence_fts_idx" gin (fts) WITH (fastupdate=off) and it is updated by a BEFORE INSERT OR UPDATE trigger. A new observation is that when we previously were able to get the "unexpected chunk number" to go away by simply updating the tsvector field of the offending record, we now have a record where we get "ERROR: tuple concurrently updated" when we try overwriting the field. On another record exhibiting the "unexpected chunk number" error, we could overwrite the fts field, as can we on rows not affected by the "unexpected chunk number"-error. So it seems the two errors might somehow be related. We tried stopping all activity on the database, and still got the "ERROR: tuple concurrently updated" on the row with "unexpected chunk number". Also, the error we are getting is now: "unexpected chunk number 2 (expected 3) for toast value 1498303849 in pg_toast_10919630", where previously we've only seen "unexpected chunk number 0 (expected 1)". We are kind of at a loss, so any suggestions on what we could try are welcome. >> Did you ever find out exactly what the change that solved the problem >> between 9.4.8 and 9.4.11 was? > In this case, I think that you are looking for this thread: > https://www.postgresql.org/message-id/20160826072658.15676.7628@wrigleys.postgresql.org > And this commit: > commit: a694435641faf26a9a4c210d20576ae836e86c48 > author: Tom Lane <tgl@sss.pgh.pa.us> > date: Sat, 3 Sep 2016 13:28:53 -0400 > Fix corrupt GIN_SEGMENT_ADDITEMS WAL records on big-endian hardware. > > Both involved 9.4.8. We run on x86_64-hardware, so I guess this wouldn't affect us? Best regards, Adam -- "No more than that, but very powerful all the Adam Sjøgren same; simple things are good." adsj@novozymes.com
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
adsj@novozymes.com (Adam =?utf-8?Q?Sj=C3=B8gren?=) writes: > Also, the error we are getting is now: "unexpected chunk number 2 > (expected 3) for toast value 1498303849 in pg_toast_10919630", where > previously we've only seen "unexpected chunk number 0 (expected 1)". > We are kind of at a loss, so any suggestions on what we could try are > welcome. The basic thrust of these messages is "I'm reading what should be sequentially numbered data chunks for this toast OID, and the sequence numbers are wrong". Both of these instances could be explained by duplicate toast rows (or duplicate index entries pointing at one row), though of course that would just move to the next question of how it got that way. Anyway, you could move the investigation along with some manual checking into what's in that toast table. For instance select chunk_id, chunk_seq, ctid, xmin, xmax, length(chunk_data) from pg_toast.pg_toast_10919630 where chunk_id = 1498303849 order by 1,2; might be informative. If you do see what seem to be duplicate chunk_seq values, checking whether they're still there in a seqscan would be good. regards, tom lane
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value76753264 in pg_toast_10920100
On Wed, Jan 17, 2018 at 12:16:19PM -0500, Tom Lane wrote: > The basic thrust of these messages is "I'm reading what should be > sequentially numbered data chunks for this toast OID, and the sequence > numbers are wrong". Both of these instances could be explained by > duplicate toast rows (or duplicate index entries pointing at one row), > though of course that would just move to the next question of how it > got that way. Good point here. This could be a consequence of freeze-the-dead whose fix will be available in the next round of minor releases. -- Michael
Attachment
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Tom writes: > adsj@novozymes.com (Adam =?utf-8?Q?Sj=C3=B8gren?=) writes: >> Also, the error we are getting is now: "unexpected chunk number 2 >> (expected 3) for toast value 1498303849 in pg_toast_10919630", where >> previously we've only seen "unexpected chunk number 0 (expected 1)". > >> We are kind of at a loss, so any suggestions on what we could try are >> welcome. > > The basic thrust of these messages is "I'm reading what should be > sequentially numbered data chunks for this toast OID, and the sequence > numbers are wrong". Both of these instances could be explained by > duplicate toast rows (or duplicate index entries pointing at one row), > though of course that would just move to the next question of how it > got that way. Anyway, you could move the investigation along with > some manual checking into what's in that toast table. For instance > > select chunk_id, chunk_seq, ctid, xmin, xmax, length(chunk_data) > from pg_toast.pg_toast_10919630 > where chunk_id = 1498303849 > order by 1,2; > > might be informative. If you do see what seem to be duplicate > chunk_seq values, checking whether they're still there in a > seqscan would be good. Here's a statement which currently gives an unexpected chunk error: efamroot@kat efam=# SELECT * FROM efam.sendreference WHERE id = '189909908'; ERROR: unexpected chunk number 0 (expected 1) for toast value 1698936148 in pg_toast_10919630 And when I run the suggested query, I get: efamroot@kat efam=# select chunk_id, chunk_seq, ctid, xmin, xmax, length(chunk_data) from pg_toast.pg_toast_10919630 wherechunk_id = 1698936148 order by 1,2; chunk_id | chunk_seq | ctid | xmin | xmax | length ------------+-----------+--------------+------------+------+-------- 1698936148 | 0 | (52888694,2) | 1511390221 | 0 | 1996 1698936148 | 1 | (52888694,4) | 1511390221 | 0 | 1148 (2 rows) How would I go about checking if they are still in a seqscan? (Note: this is on PostgreSQL 9.3.22.) Best regards, Adam -- "No more than that, but very powerful all the Adam Sjøgren same; simple things are good." adsj@novozymes.com
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Adam writes: > Here's a statement which currently gives an unexpected chunk error: > > efamroot@kat efam=# SELECT * FROM efam.sendreference WHERE id = '189909908'; > ERROR: unexpected chunk number 0 (expected 1) for toast value 1698936148 in pg_toast_10919630 > > And when I run the suggested query, I get: > > efamroot@kat efam=# select chunk_id, chunk_seq, ctid, xmin, xmax, length(chunk_data) from pg_toast.pg_toast_10919630where chunk_id = 1698936148 order by 1,2; > chunk_id | chunk_seq | ctid | xmin | xmax | length > ------------+-----------+--------------+------------+------+-------- > 1698936148 | 0 | (52888694,2) | 1511390221 | 0 | 1996 > 1698936148 | 1 | (52888694,4) | 1511390221 | 0 | 1148 > (2 rows) More examples: efamroot@kat efam=# SELECT * FROM efam.sendreference WHERE id = '237764759'; ERROR: unexpected chunk number 0 (expected 1) for toast value 1698958350 in pg_toast_10919630 efamroot@kat efam=# select chunk_id, chunk_seq, ctid, xmin, xmax, length(chunk_data) from pg_toast.pg_toast_10919630 wherechunk_id = 1698958350 order by 1,2; chunk_id | chunk_seq | ctid | xmin | xmax | length ------------+-----------+--------------+------------+------+-------- 1698958350 | 0 | (54859821,2) | 1511487270 | 0 | 1448 (1 row) And: efamroot@kat efam=# SELECT * FROM efam.sendreference WHERE id = '366275833'; ERROR: unexpected chunk number 0 (expected 1) for toast value 1698945095 in pg_toast_10919630 efamroot@kat efam=# select chunk_id, chunk_seq, ctid, xmin, xmax, length(chunk_data) from pg_toast.pg_toast_10919630 wherechunk_id = 1698945095 order by 1,2; chunk_id | chunk_seq | ctid | xmin | xmax | length ------------+-----------+--------------+------------+------+-------- 1698945095 | 0 | (53706565,3) | 1511426847 | 0 | 1996 1698945095 | 1 | (53706565,6) | 1511426847 | 0 | 108 (2 rows) One more: efamroot@kat efam=# SELECT * FROM efam.sendreference WHERE id = '189909908'; ERROR: unexpected chunk number 0 (expected 1) for toast value 1698936148 in pg_toast_10919630 efamroot@kat efam=# select chunk_id, chunk_seq, ctid, xmin, xmax, length(chunk_data) from pg_toast.pg_toast_10919630 wherechunk_id = 1698936148 order by 1,2; chunk_id | chunk_seq | ctid | xmin | xmax | length ------------+-----------+--------------+------------+------+-------- 1698936148 | 0 | (52888694,2) | 1511390221 | 0 | 1996 1698936148 | 1 | (52888694,4) | 1511390221 | 0 | 1148 (2 rows) And here is one from another table: efamroot@kat efam=# SELECT * FROM efam.sequence WHERE id = '235887163'; ERROR: unexpected chunk number 0 (expected 1) for toast value 1698750544 in pg_toast_10919630 efamroot@kat efam=# select chunk_id, chunk_seq, ctid, xmin, xmax, length(chunk_data) from pg_toast.pg_toast_10919630 wherechunk_id = 1698750544 order by 1,2; chunk_id | chunk_seq | ctid | xmin | xmax | length ------------+-----------+--------------+------------+------+-------- 1698750544 | 0 | (39575142,3) | 1510704835 | 0 | 1996 1698750544 | 1 | (39575142,4) | 1510704835 | 0 | 716 (2 rows) Let me know what other relevant info I can provide. Best regards, Adam -- "No more than that, but very powerful all the Adam Sjøgren same; simple things are good." adsj@novozymes.com
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
adsj@novozymes.com (Adam =?utf-8?Q?Sj=C3=B8gren?=) writes: > Here's a statement which currently gives an unexpected chunk error: > efamroot@kat efam=# SELECT * FROM efam.sendreference WHERE id = '189909908'; > ERROR: unexpected chunk number 0 (expected 1) for toast value 1698936148 in pg_toast_10919630 OK ... > And when I run the suggested query, I get: > efamroot@kat efam=# select chunk_id, chunk_seq, ctid, xmin, xmax, length(chunk_data) from pg_toast.pg_toast_10919630where chunk_id = 1698936148 order by 1,2; > chunk_id | chunk_seq | ctid | xmin | xmax | length > ------------+-----------+--------------+------------+------+-------- > 1698936148 | 0 | (52888694,2) | 1511390221 | 0 | 1996 > 1698936148 | 1 | (52888694,4) | 1511390221 | 0 | 1148 > (2 rows) Hmph. So if you EXPLAIN that query, does it show it's doing it as an indexscan? I'd expect so, but it's always good to make sure. Assuming it does say that, then the other test I had in mind would involve "set enable_indexscan = 0", then repeat the EXPLAIN to make sure that you now get a seqscan plan (you might need to turn off enable_bitmapscan too), then do the query again and see whether the results are the same. regards, tom lane
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Tom writes: >> And when I run the suggested query, I get: > >> efamroot@kat efam=# select chunk_id, chunk_seq, ctid, xmin, xmax, length(chunk_data) from pg_toast.pg_toast_10919630where chunk_id = 1698936148 order by 1,2; >> chunk_id | chunk_seq | ctid | xmin | xmax | length >> ------------+-----------+--------------+------------+------+-------- >> 1698936148 | 0 | (52888694,2) | 1511390221 | 0 | 1996 >> 1698936148 | 1 | (52888694,4) | 1511390221 | 0 | 1148 >> (2 rows) > Hmph. So if you EXPLAIN that query, does it show it's doing it as an > indexscan? I'd expect so, but it's always good to make sure. It does: efamroot@kat efam=# SELECT * FROM efam.sendreference WHERE id = '189909908'; ERROR: unexpected chunk number 0 (expected 1) for toast value 1698936148 in pg_toast_10919630 efamroot@kat efam=# select chunk_id, chunk_seq, ctid, xmin, xmax, length(chunk_data) from pg_toast.pg_toast_10919630 wherechunk_id = 1698936148 order by 1,2; chunk_id | chunk_seq | ctid | xmin | xmax | length ------------+-----------+--------------+------------+------+-------- 1698936148 | 0 | (52888694,2) | 1511390221 | 0 | 1996 1698936148 | 1 | (52888694,4) | 1511390221 | 0 | 1148 (2 rows) efamroot@kat efam=# explain select chunk_id, chunk_seq, ctid, xmin, xmax, length(chunk_data) from pg_toast.pg_toast_10919630where chunk_id = 1698936148 order by 1,2; QUERY PLAN -------------------------------------------------------------------------------------------------------------- Index Scan using pg_toast_10919630_index on pg_toast_10919630 (cost=0.57..2627179.25 rows=2135674 width=54) Index Cond: (chunk_id = 1698936148::oid) (2 rows) > Assuming it does say that, then the other test I had in mind would > involve "set enable_indexscan = 0", then repeat the EXPLAIN to make > sure that you now get a seqscan plan (you might need to turn off > enable_bitmapscan too), then do the query again and see whether the > results are the same. Ok (if I don't disable bitmapscan, I get Bitmap Heap Scans in the EXPLAIN, so): efamroot@kat efam=# set enable_indexscan = 0; SET efamroot@kat efam=# set enable_bitmapscan = 0; SET efamroot@kat efam=# explain select chunk_id, chunk_seq, ctid, xmin, xmax, length(chunk_data) from pg_toast.pg_toast_10919630where chunk_id = 1698936148 order by 1,2; QUERY PLAN ------------------------------------------------------------------------------------- Sort (cost=96465280.57..96470619.75 rows=2135674 width=54) Sort Key: chunk_seq -> Seq Scan on pg_toast_10919630 (cost=0.00..96240754.39 rows=2135674 width=54) Filter: (chunk_id = 1698936148::oid) (4 rows) efamroot@kat efam=# select chunk_id, chunk_seq, ctid, xmin, xmax, length(chunk_data) from pg_toast.pg_toast_10919630 wherechunk_id = 1698936148 order by 1,2; [... still waiting for the result, I will return with what it said when the server does ...] Best regards, Adam -- "No more than that, but very powerful all the Adam Sjøgren same; simple things are good." adsj@novozymes.com
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Adam writes: > efamroot@kat efam=# explain select chunk_id, chunk_seq, ctid, xmin, xmax, length(chunk_data) from pg_toast.pg_toast_10919630where chunk_id = 1698936148 order by 1,2; > QUERY PLAN > ------------------------------------------------------------------------------------- > Sort (cost=96465280.57..96470619.75 rows=2135674 width=54) > Sort Key: chunk_seq > -> Seq Scan on pg_toast_10919630 (cost=0.00..96240754.39 rows=2135674 width=54) > Filter: (chunk_id = 1698936148::oid) > (4 rows) > > efamroot@kat efam=# select chunk_id, chunk_seq, ctid, xmin, xmax, length(chunk_data) from pg_toast.pg_toast_10919630where chunk_id = 1698936148 order by 1,2; > [... still waiting for the result, I will return with what it said > when the server does ...] It did eventually finish, with the same result: efamroot@kat efam=# select chunk_id, chunk_seq, ctid, xmin, xmax, length(chunk_data) from pg_toast.pg_toast_10919630 wherechunk_id = 1698936148 order by 1,2; chunk_id | chunk_seq | ctid | xmin | xmax | length ------------+-----------+--------------+------------+------+-------- 1698936148 | 0 | (52888694,2) | 1511390221 | 0 | 1996 1698936148 | 1 | (52888694,4) | 1511390221 | 0 | 1148 (2 rows) Best regards, Adam -- "No more than that, but very powerful all the Adam Sjøgren same; simple things are good." adsj@novozymes.com
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
adsj@novozymes.com (Adam =?utf-8?Q?Sj=C3=B8gren?=) writes: >> [... still waiting for the result, I will return with what it said >> when the server does ...] > It did eventually finish, with the same result: Huh. So what we have here, apparently, is that regular MVCC snapshots think there is exactly one copy of the 1698936148/0 row, but TOAST fetches think there is more than one. This is darn odd, not least because we never do UPDATEs in toast tables, only inserts and deletes, so there certainly shouldn't be update chains there. It seems like you've got some corner case wherein SnapshotToast sees a row that isn't visible according to MVCC --- probably a row left over from some previous cycle of life. That is, I'm imagining the OID counter wrapped around and we've reused a toast OID, but for some reason there's still a row in the table with that OID. I'm not sure offhand how we could get into such a state. Alvaro, does this ring any bells (remembering that this is 9.3)? regards, tom lane
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value76753264 in pg_toast_10920100
adsj@novozymes.com (Adam Sjøgren) writes:
>> [... still waiting for the result, I will return with what it said
>> when the server does ...]
> It did eventually finish, with the same result:
Huh. So what we have here, apparently, is that regular MVCC snapshots
think there is exactly one copy of the 1698936148/0 row, but TOAST fetches
think there is more than one. This is darn odd, not least because we
never do UPDATEs in toast tables, only inserts and deletes, so there
certainly shouldn't be update chains there.
It seems like you've got some corner case wherein SnapshotToast sees a row
that isn't visible according to MVCC --- probably a row left over from
some previous cycle of life. That is, I'm imagining the OID counter
wrapped around and we've reused a toast OID, but for some reason there's
still a row in the table with that OID. I'm not sure offhand how we could
get into such a state. Alvaro, does this ring any bells (remembering that
this is 9.3)?
PostgreSQL Development, 24x7 Support, Training & Services
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value76753264 in pg_toast_10920100
On Fri, Apr 6, 2018 at 2:34 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:adsj@novozymes.com (Adam Sjøgren) writes:
>> [... still waiting for the result, I will return with what it said
>> when the server does ...]
> It did eventually finish, with the same result:
Huh. So what we have here, apparently, is that regular MVCC snapshots
think there is exactly one copy of the 1698936148/0 row, but TOAST fetches
think there is more than one. This is darn odd, not least because we
never do UPDATEs in toast tables, only inserts and deletes, so there
certainly shouldn't be update chains there.
It seems like you've got some corner case wherein SnapshotToast sees a row
that isn't visible according to MVCC --- probably a row left over from
some previous cycle of life. That is, I'm imagining the OID counter
wrapped around and we've reused a toast OID, but for some reason there's
still a row in the table with that OID. I'm not sure offhand how we could
get into such a state. Alvaro, does this ring any bells (remembering that
this is 9.3)?FWIW one of our support customers reported a very similar TOAST table corruption issue last week which nearly caused an outage. After a lot of analysis, I think I've now fully understood the reasons behind the corruption, the underlying bug(s) and possible remedy. I am currently working on writing a reproducible test case to demonstrate the problem and writing the fix. More details on that soon.
PostgreSQL Development, 24x7 Support, Training & Services
Re: ERROR: unexpected chunk number 0 (expected 1) for toast value 76753264 in pg_toast_10920100
Pavan writes: >> FWIW one of our support customers reported a very similar TOAST table >> corruption issue last week which nearly caused an outage. After a lot of >> analysis, I think I've now fully understood the reasons behind the >> corruption, the underlying bug(s) and possible remedy. I am currently >> working on writing a reproducible test case to demonstrate the problem and >> writing the fix. More details on that soon. > > I've posted a reproducer and a proposed fix to -hackers [1] > > In the particular case that I investigated, a database crash/recovery was > involved. But I think we should be able to create a scenario where OID > wrap-around or a standby promotion triggers the problem. I don't know if > any of that was involved in the cases reported on this thread, but I've a > strong suspicion that the underlying bug is probably the same. In our case there was no crash+recovery, but we do have a high write load (and the problem occurring quite seldom), so it sounds like it fits. > [1] https://www.postgresql.org/message-id/CABOikdOgWT2hHkYG3Wwo2cyZJq2zfs1FH0FgX-%3Dh4OLosXHf9w%40mail.gmail.com Impressive investigation! Just a couple of data points: your script to reproduce the problem does so for all three versions of PostgreSQL we have in use: 9.3.22, 9.6.8 and 10.3. And I specifically tested on our production machine which exhibits the problem, running 9.3.22, and your script reproduced the problem there as well. I.e. ending with: ERROR: unexpected chunk number 0 (expected 1) for toast value 16396 in pg_toast_16384 REINDEX ERROR: unexpected chunk number 0 (expected 1) for toast value 16396 in pg_toast_16384 VACUUM ERROR: unexpected chunk number 0 (expected 1) for toast value 16396 in pg_toast_16384 Best regards, Adam -- "No more than that, but very powerful all the Adam Sjøgren same; simple things are good." adsj@novozymes.com