BUG #5998: CLUSTER and "ERROR: missing chunk number 0 for toast value"

From: "Mark Reid"
The following bug has been logged online:

Bug reference:      5998
Logged by:          Mark Reid
Email address:      mail@markreid.org
PostgreSQL version: 8.3.5
Operating system:   Debian Etch
Description:        CLUSTER and "ERROR:  missing chunk number 0 for toast value"
Details:

Hello,

This morning I noticed that a normally small table (18 wide rows) was
bloated to 6GB in size.  This has happened before with older postgres
versions, where the main table got vacuumed but the pg_toastXX table did
not.  This is the first time I've seen the problem on this version of
postgres (8.3.5).  So I decided to manually vacuum the table to find out
what was going on.  I saw a lot of "dead row versions cannot be removed
yet" messages (on the order of 60k in the main table, and 3.8M in the
toast table).  There was a day-old "idle in transaction" connection for
that same user, so I killed that and tried again.  Same deal.  So I
tried "CLUSTER properties_pkey ON properties;", which gave the following
error message:
ERROR:  missing chunk number 0 for toast value 396950697 in
pg_toast_373928870
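
For reference, the bloat itself can be confirmed by looking up the
table's toast relation and its size; a sketch, for the table in
question:

SELECT c.relname,
       t.relname AS toast_relname,
       pg_size_pretty(pg_relation_size(t.oid)) AS toast_size
FROM pg_class c
JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relname = 'properties';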

I jumped on IRC and got some excellent help from andres, who suggested
running the following query with index scans both enabled and disabled:
SELECT chunk_id, chunk_seq, length(chunk_data) FROM
pg_toast.pg_toast_373928870 WHERE chunk_id = 396950697;

Both queries returned zero rows.
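
(The two probes can be forced with the planner's enable_* settings; a
sketch of how that might be run:

SET enable_indexscan = off;
SET enable_bitmapscan = off;   -- together these force a seqscan
SELECT chunk_id, chunk_seq, length(chunk_data) FROM
pg_toast.pg_toast_373928870 WHERE chunk_id = 396950697;

RESET enable_indexscan;
RESET enable_bitmapscan;
SET enable_seqscan = off;      -- and this forces the index scan
SELECT chunk_id, chunk_seq, length(chunk_data) FROM
pg_toast.pg_toast_373928870 WHERE chunk_id = 396950697;

Zero rows from both means the heap and the index agree the chunks are
gone, so this is not an inconsistent-index symptom.)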

I checked for other long-running transactions, and found one that had been
running since April 15th (it's now April 29th), but for a different database
user, which would definitely *not* have been accessing this table in any
way.  andres said that might cause such behaviour, so I killed the offending
process and re-tried the vacuum.  This time it was able to remove the dead
rows as expected.  After the vacuum, I also ran the CLUSTER command again,
which completed successfully and reduced the table size to 576kB.
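
On 8.3, such transactions can be spotted with something like the
following (a sketch; pg_stat_activity's columns were renamed in later
releases, e.g. procpid is now pid):

SELECT procpid, usename, xact_start, current_query
FROM pg_stat_activity
WHERE xact_start < now() - interval '1 day'
ORDER BY xact_start;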

Andres concluded "The bug is that CLUSTER seems to use the wrong xid horizon
when determining visibility, Or autovacuum. Not sure"

I have the output of the following queries, but would prefer to send it
off-list for confidentiality:
SELECT datname, datfrozenxid, age(datfrozenxid) FROM pg_database;
SELECT relname, relfrozenxid, age(relfrozenxid) FROM pg_class;
SELECT * FROM pg_stat_activity;

These queries were run shortly after the long-running transaction was
killed.

Thanks!

Mark Reid
"Mark Reid" <mail@markreid.org> writes:
> This morning I noticed that a normally small table (18 wide rows) was
> bloated to 6GB in size.  This has happened before with older postgres
> versions, where the main table got vacuumed but the pg_toastXX table did
> not.  This is the first time I've seen the problem on this version of
> postgres (8.3.5).  So I decided to manually vacuum the table to find out
> what was going on.  I saw a lot of "dead row versions cannot be removed
> yet" messages (on the order of 60k in the main table, and 3.8M in the
> toast table).  There was a day-old "idle in transaction" connection for
> that same user, so I killed that and tried again.  Same deal.  So I
> tried "CLUSTER properties_pkey ON properties;", which gave the following
> error message:
> ERROR:  missing chunk number 0 for toast value 396950697 in
> pg_toast_373928870
> [ and closing an old open transaction made the error go away ]

> Andres concluded "The bug is that CLUSTER seems to use the wrong xid horizon
> when determining visibility, Or autovacuum. Not sure"

No, I don't think so.  I've been able to replicate the failure with this
sequence, in the presence of an old open transaction:

regression=# create table foo (f1 serial primary key, f2 text);
NOTICE:  CREATE TABLE will create implicit sequence "foo_f1_seq" for serial column "foo.f1"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "foo_pkey" for table "foo"
CREATE TABLE
regression=# insert into foo(f2) select repeat('xyzzyklflencxlkslnvlzsm;pemc', 10000);
INSERT 0 1
regression=# begin;
BEGIN
regression=# update foo set f2 = f2 || 'z';
UPDATE 1
regression=# update foo set f2 = f2 || 'z';
UPDATE 1
regression=# update foo set f2 = f2 || 'z';
UPDATE 1
regression=# commit;
COMMIT
regression=# vacuum foo;
VACUUM
regression=# cluster foo using foo_pkey;
ERROR:  missing chunk number 0 for toast value 41347 in pg_toast_41336
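
At this point the reporter's probe, adapted to the names in this error
message, should find nothing, even though the parent tuple still
exists:

SELECT chunk_id, chunk_seq, length(chunk_data)
FROM pg_toast.pg_toast_41336 WHERE chunk_id = 41347;
-- zero rows: the preceding VACUUM already removed the chunks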

The problem is that CLUSTER is trying to copy a tuple that it believes
to be RECENTLY_DEAD, but that is in fact DEAD (ie, inaccessible to every
open transaction).  The preceding VACUUM did not remove the tuple, but
it did remove the underlying TOAST data, which means that CLUSTER fails
when trying to copy the tuple's toast data to the new table.  The error
goes away once OldestXmin advances enough to let the problem tuple be
seen as DEAD.
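
Consistent with the report, once the old transaction ends and
OldestXmin moves past the deleting XIDs, the same sequence goes
through:

VACUUM foo;                   -- now removes the problem tuples
CLUSTER foo USING foo_pkey;   -- succeeds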

So, how is it that the vacuum removed the underlying toast data but not
the putatively-recently-dead tuple?  The problem tuple is in the middle
of the update chain that I created with those updates, and the
difficulty arises from the following bit in HeapTupleSatisfiesVacuum:
   /*
    * Deleter committed, but check special cases.
    */
   if (TransactionIdEquals(HeapTupleHeaderGetXmin(tuple),
                           HeapTupleHeaderGetXmax(tuple)))
   {
       /*
        * Inserter also deleted it, so it was never visible to anyone
        * else.  However, we can only remove it early if it's not an
        * updated tuple; else its parent tuple is linking to it via
        * t_ctid, and this tuple mustn't go away before the parent does.
        */
       if (!(tuple->t_infomask & HEAP_UPDATED))
           return HEAPTUPLE_DEAD;
   }

   if (!TransactionIdPrecedes(HeapTupleHeaderGetXmax(tuple), OldestXmin))
   {
       /* deleting xact is too recent, tuple could still be visible */
       return HEAPTUPLE_RECENTLY_DEAD;
   }

   /* Otherwise, it's dead and removable */
   return HEAPTUPLE_DEAD;

Those middle-of-the-chain tuples have xmin=xmax, and so do their
supporting toast tuples.  The problem is that the toast tuples aren't
"updated"; they are independent tuples as far as the toast table is
concerned.
Therefore, the toast tuples are reported as DEAD regardless of
OldestXmin, but their parent tuples escape the first test and can be
reported RECENTLY_DEAD.  This allows the toast tuples to be removed
ahead of their parent tuple, and then you've got trouble waiting to
happen.
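
In the example above the asymmetry is directly visible (VACUUM output
paraphrased):

VACUUM VERBOSE foo;
-- still reports dead row versions in foo that cannot be removed yet,
-- while the chunks behind the middle-of-the-chain versions are gone:
SELECT count(*) FROM pg_toast.pg_toast_41336 WHERE chunk_id = 41347;
-- 0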

This issue clearly is problematic for CLUSTER (as well as VACUUM FULL
in 9.0+).  I wonder whether it might explain some of the heretofore
unexplainable reports of "missing chunk number 0" errors in other
operations.  You'd have to suppose that those other commands were
being run with stale snapshots, so it's not super credible, but ...

I'm inclined to think that the xmin=xmax test is just too cute and
should be dropped altogether, ie removing the first if-block quoted
above would fix the problem.  Anyone see a better answer?
        regards, tom lane