Re: postgres=# VACUUM FULL pg_statistic => ERROR: missing chunknumber 0 for toast value .. in pg_toast_2619 - Mailing list pgsql-general

From Justin Pryzby
Subject Re: postgres=# VACUUM FULL pg_statistic => ERROR: missing chunknumber 0 for toast value .. in pg_toast_2619
Date
Msg-id 20190830142655.GA14011@telsasoft.com
Whole thread Raw
In response to Re: postgres=# VACUUM FULL pg_statistic => ERROR: missing chunknumber 0 for toast value .. in pg_toast_2619  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-general
Moving this old thread to -hackers
https://www.postgresql.org/message-id/flat/20180519142603.GA30060%40telsasoft.com

I wanted to mention that this seems to still be an issue, now running pg11.5.

log_time               | 2019-08-30 23:20:00.118+10
user_name              | postgres
database               | ts
session_id             | 5d69227e.235
session_line           | 1
command_tag            | CLUSTER
session_start_time     | 2019-08-30 23:19:58+10
error_severity         | ERROR
sql_state_code         | XX000
message                | unexpected chunk number 1 (expected 0) for toast value 2369261203 in pg_toast_2619
query                  | CLUSTER pg_statistic USING pg_statistic_relid_att_inh_index
application_name       | psql

Note that my original report was for "missing" chunk during "VACUUM FULL", and
the current error is "unexpected chunk" during CLUSTER.  I imagine that's
related issue.  I haven't seen this in awhile (but stopped trying to reproduce
it long ago).  A recently-deployed update to this maintenance script is
probably why it's now doing CLUSTER.

On Fri, May 25, 2018 at 08:49:50AM -0500, Justin Pryzby wrote:
> On Sat, May 19, 2018 at 02:39:26PM -0400, Tom Lane wrote:
> > Justin Pryzby <pryzby@telsasoft.com> writes:
> > > [pryzbyj@database ~]$ while :; do for db in `psql postgres -Atc "SELECT datname FROM pg_database WHERE
datallowconn"`;do for t in pg_statistic pg_attrdef pg_constraint; do echo "$db.$t...";
PGOPTIONS=-cstatement_timeout='9s'psql $db -qc "VACUUM FULL $t"; done; done; done
 
> > 
> > > ...
> > > postgres.pg_statistic...
> > > postgres.pg_attrdef...
> > > postgres.pg_constraint...
> > > template1.pg_statistic...
> > > template1.pg_attrdef...
> > > template1.pg_constraint...
> > > ts.pg_statistic...
> > > ERROR:  canceling statement due to statement timeout
> > > ts.pg_attrdef...
> > > ts.pg_constraint...
> > > postgres.pg_statistic...
> > > ERROR:  missing chunk number 0 for toast value 3372855171 in pg_toast_2619
> > 
> > Hm, so was the timeout error happening every time through on that table,
> > or just occasionally, or did you provoke it somehow?  I'm wondering how
> > your 9s timeout relates to the expected completion time.
> 
> Actually statement_timeout isn't essential for this (maybe it helps to triggers
> it more often - not sure).
> 
> Could you try:
> time sh -ec 'while :; do time psql postgres -c "VACUUM FULL VERBOSE pg_toast.pg_toast_2619"; psql postgres -c "VACUUM
FULLVERBOSE pg_statistic"; done'; date
 
> 
> Three servers experienced error within 30min, but one server didn't fail until
> 12h later, and a handful others still haven't failed..
> 
> Does this help at all ?
>  2018-05-24 21:57:49.98-03  | 5b075f8d.1ad1 | LOG            | pryzbyj   | postgres | statement: VACUUM FULL VERBOSE
pg_toast.pg_toast_2619
>  2018-05-24 21:57:50.067-03 | 5b075f8d.1ad1 | INFO           | pryzbyj   | postgres | vacuuming
"pg_toast.pg_toast_2619"
>  2018-05-24 21:57:50.09-03  | 5b075f8d.1ad1 | INFO           | pryzbyj   | postgres | "pg_toast_2619": found 0
removable,408 nonremovable row versions in 99 pages
 
>  2018-05-24 21:57:50.12-03  | 5b075f8e.1ada | LOG            | pryzbyj   | postgres | statement: VACUUM FULL VERBOSE
pg_statistic
>  2018-05-24 21:57:50.129-03 | 5b075f8e.1ada | INFO           | pryzbyj   | postgres | vacuuming
"pg_catalog.pg_statistic"
>  2018-05-24 21:57:50.185-03 | 5b075f8e.1ada | ERROR          | pryzbyj   | postgres | missing chunk number 0 for
toastvalue 3382957233 in pg_toast_2619
 
> 
> Some thing; this server has autovacuum logging, although it's not clear to me
> if that's an essential component of the problem, either:
>  2018-05-24 21:16:39.856-06 | LOG   | 5b078017.7b99 | pryzbyj   | postgres | statement: VACUUM FULL VERBOSE
pg_toast.pg_toast_2619
>  2018-05-24 21:16:39.876-06 | LOG   | 5b078010.7968 |           |          | automatic vacuum of table
"postgres.pg_toast.pg_toast_2619":index scans: 1                        +
 
>                             |       |               |           |          | pages: 0 removed, 117 r
>  2018-05-24 21:16:39.909-06 | INFO  | 5b078017.7b99 | pryzbyj   | postgres | vacuuming "pg_toast.pg_toast_2619"
>  2018-05-24 21:16:39.962-06 | INFO  | 5b078017.7b99 | pryzbyj   | postgres | "pg_toast_2619": found 0 removable, 492
nonremovablerow versions in 117 pages
 
>  2018-05-24 21:16:40.025-06 | LOG   | 5b078018.7b9b | pryzbyj   | postgres | statement: VACUUM FULL VERBOSE
pg_statistic
>  2018-05-24 21:16:40.064-06 | INFO  | 5b078018.7b9b | pryzbyj   | postgres | vacuuming "pg_catalog.pg_statistic"
>  2018-05-24 21:16:40.145-06 | ERROR | 5b078018.7b9b | pryzbyj   | postgres | missing chunk number 0 for toast value
765874692in pg_toast_2619
 
> 
> Or this one?
> 
> postgres=# SELECT log_time, database, user_name, session_id, left(message,999) FROM postgres_log WHERE
(log_time>='2018-05-2419:56' AND log_time<'2018-05-24 19:58') AND (database='postgres' OR database IS NULL OR user_name
ISNULL OR user_name='pryzbyj') AND message NOT LIKE 'statement:%' ORDER BY 1;
 
> 
> log_time   | 2018-05-24 19:56:35.396-04
> database   | 
> user_name  | 
> session_id | 5b075131.3ec0
> left       | skipping vacuum of "pg_toast_2619" --- lock not available
> 
> ...
> 
> log_time   | 2019-05-24 19:57:35.78-04
> database   | 
> user_name  | 
> session_id | 5b07516d.445e
> left       | automatic vacuum of table "postgres.pg_toast.pg_toast_2619": index scans: 1
>            : pages: 0 removed, 85 remain, 0 skipped due to pins, 0 skipped frozen
>            : tuples: 1 removed, 348 remain, 0 are dead but not yet removable, oldest xmin: 63803106
>            : buffer usage: 179 hits, 4 misses, 87 dirtied
>            : avg read rate: 1.450 MB/s, avg write rate: 31.531 MB/s
>            : system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.02 s
> 
> log_time   | 2018-05-24 19:57:35.879-04
> database   | postgres
> user_name  | pryzbyj
> session_id | 5b07516f.447f
> left       | missing chunk number 0 for toast value 624341680 in pg_toast_2619
> 
> log_time   | 2018-05-24 19:57:44.332-04
> database   |
> user_name  |
> session_id | 5af9fda3.70d5
> left       | checkpoint starting: time



pgsql-general by date:

Previous
From: Stephen Frost
Date:
Subject: Re: SSPI auth and mixed case usernames
Next
From: Thomas Kellerer
Date:
Subject: Re: SQL equivalint of #incude directive ?