Re: GetOldestXmin going backwards is dangerous after all - Mailing list pgsql-hackers

From Andres Freund
Subject Re: GetOldestXmin going backwards is dangerous after all
Date
Msg-id 20130204103752.GB6645@awork2.anarazel.de
In response to Re: GetOldestXmin going backwards is dangerous after all  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: GetOldestXmin going backwards is dangerous after all  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 2013-02-01 19:24:02 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > Having said that, I agree that a fix in GetOldestXmin() would be nice
> > if we could find one, but since the comment describes at least three
> > different ways the value can move backwards, I'm not sure that there's
> > really a practical solution there, especially if you want something we
> > can back-patch.
> 
> Actually, wait a second.  As you say, the comment describes three known
> ways to make it go backwards.  It strikes me that all three are fixable:
> 
>  * if allDbs is FALSE and there are no transactions running in the current
>  * database, GetOldestXmin() returns latestCompletedXid. If a transaction
>  * begins after that, its xmin will include in-progress transactions in other
>  * databases that started earlier, so another call will return a lower value.
> 
> The reason this is a problem is that GetOldestXmin ignores XIDs of
> processes that are connected to other DBs.  It now seems to me that this
> is a flat-out bug.  It can ignore their xmins, but it should include
> their XIDs, because the point of considering those XIDs is that they may
> contribute to the xmins of snapshots computed in the future by processes
> in our own DB.  And snapshots never exclude any XIDs on the basis of
> which DB they're in.  (They can't really, since we can't know when the
> snap is taken whether it might be used to examine shared catalogs.)
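For illustration only, here is a toy Python model of the case quoted above. It is a sketch, not the procarray code: `Backend`, `snapshot_xmin`, `oldest_xmin`, the `count_foreign_xids` switch, and all XID values are made up for the example, and the starting value is simplified to latestCompletedXid as in the quoted comment. It compares ignoring the XIDs of backends in other databases with counting them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    db: str
    xid: Optional[int] = None    # XID of its running transaction, if any
    xmin: Optional[int] = None   # xmin of its current snapshot, if any


def snapshot_xmin(backends: List[Backend], latest_completed: int) -> int:
    # A snapshot's xmin covers running XIDs in every database.
    running = [b.xid for b in backends if b.xid is not None]
    return min(running, default=latest_completed)


def oldest_xmin(backends: List[Backend], my_db: str,
                latest_completed: int, count_foreign_xids: bool) -> int:
    # Toy version of the allDbs = false computation.
    result = latest_completed
    for b in backends:
        same_db = b.db == my_db
        # XIDs: optionally counted even for backends in other databases.
        if b.xid is not None and (same_db or count_foreign_xids):
            result = min(result, b.xid)
        # xmins: only backends in our own database are considered.
        if b.xmin is not None and same_db:
            result = min(result, b.xmin)
    return result


def run(count_foreign_xids: bool):
    latest_completed = 200
    # One old transaction is running in a different database.
    backends = [Backend(db="other", xid=100)]
    first = oldest_xmin(backends, "mine", latest_completed, count_foreign_xids)
    # A transaction now starts in our database and takes a snapshot.
    backends.append(
        Backend(db="mine", xmin=snapshot_xmin(backends, latest_completed)))
    second = oldest_xmin(backends, "mine", latest_completed, count_foreign_xids)
    return first, second


old_first, old_second = run(count_foreign_xids=False)
new_first, new_second = run(count_foreign_xids=True)
print("ignoring foreign XIDs:", old_first, old_second)
print("counting foreign XIDs:", new_first, new_second)
```

In this model the second call returns a lower value than the first only when foreign XIDs are ignored; once they are counted, the first call already reports the older XID.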

>  * The return value is also adjusted with vacuum_defer_cleanup_age, so
>  * increasing that setting on the fly is another easy way to make
>  * GetOldestXmin() move backwards, with no consequences for data integrity.
>
> And as for that, it's been pretty clear for awhile that allowing
> vacuum_defer_cleanup_age to change on the fly was a bad idea we'd
> eventually have to undo.  The day of reckoning has arrived: it needs
> to be PGC_POSTMASTER.

ISTM that the original problem can still occur, even after Simon's
commit.
1) start with -c vacuum_defer_cleanup_age=0
2) autovacuum vacuums "test";
3) restart with -c vacuum_defer_cleanup_age=10000
4) autovacuum vacuums "test"'s toast table;

should result in about the same ERROR, shouldn't it?
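To make the arithmetic behind the steps above concrete, a minimal sketch in Python. The helper `vacuum_cutoff` and the XID numbers are hypothetical, XID wraparound is ignored, and the only behaviour assumed from the thread is that the returned horizon is reduced by vacuum_defer_cleanup_age (clamped here to the first normal XID).

```python
FIRST_NORMAL_XID = 3  # lowest ordinary transaction ID in PostgreSQL


def vacuum_cutoff(raw_oldest_xmin: int, defer_cleanup_age: int) -> int:
    # Hypothetical helper: horizon after applying the deferral setting.
    # Wraparound is deliberately ignored in this toy model.
    return max(raw_oldest_xmin - defer_cleanup_age, FIRST_NORMAL_XID)


# Steps 1 and 2: the main table is vacuumed with the setting at 0.
main_cutoff = vacuum_cutoff(raw_oldest_xmin=50_000, defer_cleanup_age=0)

# Steps 3 and 4: after a restart with the setting at 10000, the toast
# table is vacuumed. The raw horizon has advanced a little (made-up
# number), but the adjusted one is now older than the one used before.
toast_cutoff = vacuum_cutoff(raw_oldest_xmin=50_500, defer_cleanup_age=10_000)

print(main_cutoff, toast_cutoff)
```

With these numbers the toast table is vacuumed against an older horizon than the main table was, so a restart alone is enough to move the value backwards between the two vacuums.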

Given that there doesn't yet seem to be a fix that people agree on,
and that it "only" results in a transient error, I think the fix for
this should be pushed after the next point release.

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


