Home > mailing lists

GetOldestXmin going backwards is dangerous after all - Mailing list pgsql-hackers

From	Tom Lane
Subject	GetOldestXmin going backwards is dangerous after all
Date	February 1, 2013 19:35:31
Msg-id	20362.1359747327@sss.pgh.pa.us Whole thread Raw
Responses	Re: GetOldestXmin going backwards is dangerous after all
List	pgsql-hackers

Tree view

I've been able to reproduce the problem reported by Pius Chan in bug
#7819. With some logging added that prints the OldestXmin values used
by vacuum and cluster operations, the reason is fairly clear:

2013-02-01 13:41:12 EST 8011 LOG: vacuuming "test" with OldestXmin 1760160 FreezeLimit 4246727456
2013-02-01 13:41:14 EST 8011 LOG: automatic vacuum of table "test.public.test": index scans: 1 pages: 0 removed,
635remain tuples: 48895 removed, 91 remain system usage: CPU 0.00s/0.03u sec elapsed 1.64 sec

2013-02-01 13:41:14 EST 8011 LOG: automatic analyze of table "test.public.test" system usage: CPU 0.00s/0.00u sec
elapsed0.09 sec

2013-02-01 13:41:14 EST 8011 LOG: vacuuming "pg_toast_115435" with OldestXmin 1761518 FreezeLimit 4246728814
2013-02-01 13:41:57 EST 8011 LOG: automatic vacuum of table "test.pg_toast.pg_toast_115435": index scans: 1
pages:0 removed, 12904 remain tuples: 7624 removed, 43992 remain system usage: CPU 0.21s/0.13u sec elapsed
42.79sec

2013-02-01 13:41:58 EST 7025 LOG: clustering "test" with OldestXmin 1752719 FreezeXid 1630979
2013-02-01 13:41:58 EST 7025 ERROR: missing chunk number 0 for toast value 215828 in pg_toast_115435

IOW, autovacuum worker 8011 vacuumed "test" with one xmin, and then a
bit later vacuumed its toast table with a slightly newer xmin. This
means that certain tuples' toast tuples could get cleaned out of the
toast table while the parent tuples are still there in the heap.
Normally this is fine because the parent tuples are certainly dead and
beyond snapshot range, so no one will make any attempt to fetch their
toast values. But then, here comes CLUSTER with an older xmin ---
so it thinks it needs to copy over some of those dead-but-possibly-
still-visible-to-somebody tuples, and kaboom.

Now, CLUSTER is very careful to make sure that it gets a lock on the
toast table before computing OldestXmin, with the express intent of
preventing this type of problem (cf comment in copy_heap_data).
However, if the result of GetOldestXmin() can ever go backwards,
that doesn't help. And guess what ... it can.

The header comment for GetOldestXmin() contains a long rationalization
for why it's safe for the result to sometimes go backwards, but now
that I've seen this example I'm pretty sure that that's all horsepucky.

I think we need to either ensure that GetOldestXmin is strictly
nondecreasing, or find some more-direct way of interlocking vacuuming
of toast tables with their parents.

We could possibly do the former by tracking the last-known value in
shared memory, but it'd be complicated because the value can
(legitimately) vary depending on which database you ask about.

The only backpatchable way I can think of to do the latter is to undo
the decoupling of main and toast-table vacuuming, and then say that
we simply don't change the value of OldestXmin when going to vacuum
the toast table. This is a bit unpleasant.

In any case, I no longer have much faith in the idea that letting
GetOldestXmin go backwards is really safe.

The one bright spot is that AFAICS this isn't a true data loss hazard;
no possibly-still-needed data can be lost. Any failures would be
transient failures of CLUSTER or VACUUM FULL.

Thoughts?
regards, tom lane

pgsql-hackers by date:

From: "Marko Tiikkaja"
Date: 01 February 2013, 18:19:30
Subject: Re: PL/PgSQL STRICT

From: Robert Haas
Date: 01 February 2013, 19:46:55
Subject: Re: Re: [COMMITTERS] pgsql: Tolerate timeline switches while "pg_basebackup -X fetch" is run

GetOldestXmin going backwards is dangerous after all - Mailing list pgsql-hackers

Previous

Next