BUG #5998: CLUSTER and "ERROR: missing chunk number 0 for toast value" - Mailing list pgsql-bugs

From Mark Reid
Subject BUG #5998: CLUSTER and "ERROR: missing chunk number 0 for toast value"
Date
Msg-id 201104291506.p3TF6dA9003698@wwwmaster.postgresql.org
Whole thread Raw
Responses Re: BUG #5998: CLUSTER and "ERROR: missing chunk number 0 for toast value"
List pgsql-bugs
The following bug has been logged online:

Bug reference:      5998
Logged by:          Mark Reid
Email address:      mail@markreid.org
PostgreSQL version: 8.3.5
Operating system:   Debian Etch
Description:        CLUSTER and "ERROR:  missing chunk number 0 for toast
value"
Details:

Hello,

This morning I noticed that a normally small table (18 wide rows) was
bloated to 6GB in size.  This has happened before using older postgres
versions in the past, where the main table got vacuumed, but the pg_toastXX
table did not.  This is the first time I've seen a problem for this version
of postgres (8.3.5).  So I decided to manually vacuum the table to find out
what was going on.  I saw a lot of "dead row versions cannot be removed
yet." (on the order of 60k in the main table, and 3.8M in the toast table).
There was a day-old "idle in transaction" query for that same user, so I
killed that and tried again.  Same deal.  So I tried a "CLUSTER
properties_pkey ON properties;" which gave the following error message:
ERROR:  missing chunk number 0 for toast value 396950697 in
pg_toast_373928870

I jumped on IRC and got some excellent help from andres, who suggested
running the following query with index scans both enabled and disabled:
SELECT chunk_id, chunk_seq, length(chunk_data) FROM
pg_toast.pg_toast_373928870 WHERE chunk_id = 396950697;

Both queries returned zero rows.

I checked for other long-running transactions, and found one that had been
running since April 15th (it's now April 29th), but for a different database
user, which would definitely *not* have been accessing this table in any
way.  andres said that might cause such behaviour, so I killed the offending
process and re-tried the vacuum.  This time it was able to remove the dead
rows as expected.  After the vacuum, I also ran the CLUSTER command again,
which completed successfully and reduced the table size to 576kB.

Andres concluded "The bug is that CLUSTER seems to use the wrong xid horizon
when determining visibility, Or autovacuum. Not sure"

I have the output of the following queries, but would prefer to send it
off-list for confidentiality:
SELECT datname, datfrozenxid, age(datfrozenxid) FROM pg_database;
SELECT relname, relfrozenxid, age(relfrozenxid) FROM pg_class;
SELECT * FROM pg_stat_activity;

These queries were run shortly after the long-running transaction was
killed.

Thanks!

Mark Reid

pgsql-bugs by date:

Previous
From: "Michiel"
Date:
Subject: BUG #5997: [queryplan] PostgreSQL is sorting the query results when the result can only be 1 item.
Next
From: "Kevin Grittner"
Date:
Subject: Re: BUG #5997: [queryplan] PostgreSQL is sorting the query results when the result can only be 1 item.