GetOldestXmin going backwards is dangerous after all - Mailing list pgsql-hackers

From Tom Lane
Subject GetOldestXmin going backwards is dangerous after all
Date
Msg-id 20362.1359747327@sss.pgh.pa.us
Whole thread Raw
Responses Re: GetOldestXmin going backwards is dangerous after all  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
I've been able to reproduce the problem reported by Pius Chan in bug
#7819.  With some logging added that prints the OldestXmin values used
by vacuum and cluster operations, the reason is fairly clear:

2013-02-01 13:41:12 EST 8011 LOG:  vacuuming "test" with OldestXmin 1760160 FreezeLimit 4246727456
2013-02-01 13:41:14 EST 8011 LOG:  automatic vacuum of table "test.public.test": index scans: 1       pages: 0 removed,
635remain       tuples: 48895 removed, 91 remain       system usage: CPU 0.00s/0.03u sec elapsed 1.64 sec
 
2013-02-01 13:41:14 EST 8011 LOG:  automatic analyze of table "test.public.test" system usage: CPU 0.00s/0.00u sec
elapsed0.09 sec
 
2013-02-01 13:41:14 EST 8011 LOG:  vacuuming "pg_toast_115435" with OldestXmin 1761518 FreezeLimit 4246728814
2013-02-01 13:41:57 EST 8011 LOG:  automatic vacuum of table "test.pg_toast.pg_toast_115435": index scans: 1
pages:0 removed, 12904 remain       tuples: 7624 removed, 43992 remain       system usage: CPU 0.21s/0.13u sec elapsed
42.79sec
 
2013-02-01 13:41:58 EST 7025 LOG:  clustering "test" with OldestXmin 1752719 FreezeXid 1630979
2013-02-01 13:41:58 EST 7025 ERROR:  missing chunk number 0 for toast value 215828 in pg_toast_115435

IOW, autovacuum worker 8011 vacuumed "test" with one xmin, and then a
bit later vacuumed its toast table with a slightly newer xmin.  This
means that certain tuples' toast tuples could get cleaned out of the
toast table while the parent tuples are still there in the heap.
Normally this is fine because the parent tuples are certainly dead and
beyond snapshot range, so no one will make any attempt to fetch their
toast values.  But then, here comes CLUSTER with an older xmin ---
so it thinks it needs to copy over some of those dead-but-possibly-
still-visible-to-somebody tuples, and kaboom.

Now, CLUSTER is very careful to make sure that it gets a lock on the
toast table before computing OldestXmin, with the express intent of
preventing this type of problem (cf comment in copy_heap_data).
However, if the result of GetOldestXmin() can ever go backwards,
that doesn't help.  And guess what ... it can.

The header comment for GetOldestXmin() contains a long rationalization
for why it's safe for the result to sometimes go backwards, but now
that I've seen this example I'm pretty sure that that's all horsepucky.

I think we need to either ensure that GetOldestXmin is strictly
nondecreasing, or find some more-direct way of interlocking vacuuming
of toast tables with their parents.

We could possibly do the former by tracking the last-known value in
shared memory, but it'd be complicated because the value can
(legitimately) vary depending on which database you ask about.

The only backpatchable way I can think of to do the latter is to undo
the decoupling of main and toast-table vacuuming, and then say that
we simply don't change the value of OldestXmin when going to vacuum
the toast table.  This is a bit unpleasant.

In any case, I no longer have much faith in the idea that letting
GetOldestXmin go backwards is really safe.

The one bright spot is that AFAICS this isn't a true data loss hazard;
no possibly-still-needed data can be lost.  Any failures would be
transient failures of CLUSTER or VACUUM FULL.

Thoughts?
        regards, tom lane



pgsql-hackers by date:

Previous
From: "Marko Tiikkaja"
Date:
Subject: Re: PL/PgSQL STRICT
Next
From: Robert Haas
Date:
Subject: Re: Re: [COMMITTERS] pgsql: Tolerate timeline switches while "pg_basebackup -X fetch" is run