Re: Toast issues with OldestXmin going backwards - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Toast issues with OldestXmin going backwards
Msg-id CAA4eK1+XejU=FH=0tonrOd63OH=z_4mjypT-W2MuzDQrXVehEA@mail.gmail.com
In response to Toast issues with OldestXmin going backwards  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Responses Re: Toast issues with OldestXmin going backwards
Re: Toast issues with OldestXmin going backwards
List pgsql-hackers
On Thu, Apr 19, 2018 at 4:07 PM, Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
> Various comments in GetOldestXmin mention the possibility of the oldest
> xmin going backward, and assert that this is actually safe. It's not.
>
> Consider:
>
> A table has a toastable column. A row is updated in a way that changes
> the toasted value. There are now two row versions pointing to different
> toast values, one live and one dead.
>
> Now suppose the toast table - but not the main table - is vacuumed; the
> dead toast entries are removed, even though they are still referenced by
> the dead main-table row. Autovacuum treats the main table and toast
> table separately, so this can happen.
>
> Now suppose that OldestXmin goes backwards so that the older main table
> row version is no longer dead, but merely recently-dead.
>
> At this point, VACUUM FULL (or similar rewrites) on the table will fail
> with "missing chunk number 0 for ..." toast errors, because it tries to
> copy the recently-dead row, but that row's toasted values have been
> vacuumed away already.
>
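
To make the sequence above concrete, here is a toy model of it.  This
is only a sketch, not PostgreSQL code: XIDs are plain integers with no
wraparound, every deleting transaction is assumed committed, and the
names MainRow, is_dead, vacuum_toast and rewrite_main are invented for
illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MainRow:
    xmax: Optional[int]  # XID that deleted/updated this version, None if live
    toast_id: int        # toast value this row version points to


class MissingToastChunk(Exception):
    """Stands in for the 'missing chunk number 0 for ...' error."""


def is_dead(xmax: Optional[int], oldest_xmin: int) -> bool:
    # Toy rule: a version is dead once its deleter precedes the horizon.
    return xmax is not None and xmax < oldest_xmin


def vacuum_toast(toast: Dict[int, Optional[int]],
                 oldest_xmin: int) -> Dict[int, Optional[int]]:
    # toast maps toast_id -> xmax of the toast value; drop the dead ones.
    return {tid: xmax for tid, xmax in toast.items()
            if not is_dead(xmax, oldest_xmin)}


def rewrite_main(rows: List[MainRow], toast: Dict[int, Optional[int]],
                 oldest_xmin: int) -> List[MainRow]:
    # Copy every version that is not dead, fetching its toast value.
    copied = []
    for row in rows:
        if is_dead(row.xmax, oldest_xmin):
            continue
        if row.toast_id not in toast:
            raise MissingToastChunk(f"toast value {row.toast_id}")
        copied.append(row)
    return copied


# One old version (updated by XID 100) and one live version.
rows = [MainRow(xmax=100, toast_id=1), MainRow(xmax=None, toast_id=2)]
toast = {1: 100, 2: None}

# Only the toast table is vacuumed, with a horizon of 110.
toast = vacuum_toast(toast, oldest_xmin=110)

# The horizon then goes backwards to 90 before the rewrite runs.
try:
    rewrite_main(rows, toast, oldest_xmin=90)
    outcome = "rewrite succeeded"
except MissingToastChunk as exc:
    outcome = f"rewrite failed: missing {exc}"
print(outcome)
```

In this model the rewrite only fails because the second horizon (90)
is older than the one used for the toast vacuum (110); with the same
horizon for both steps the old version is simply skipped.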

I haven't tried to reproduce it, but I can see how the problem you
describe could happen.  What should we do next?  I can see a few
possibilities:

(a) Vacuum of the main table and its toast table should always use the
same OldestXmin.

(b) Before removing a row from the toast table, ensure that the
corresponding row in the main table has been removed.

(c) Change the rewrite logic so that it detects this situation and
skips copying the tuple in the main table.  On a quick look this seems
tricky, because the toast table TID might have been reused by that
time, so I am not sure this is a viable solution.

(d) Ensure that GetOldestXmin doesn't move backwards, or write a new
API similar to it that doesn't allow OldestXmin to move backwards, and
use that for vacuum.
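
As a rough illustration of what I mean by (d), the idea is to clamp the
computed value against the last value already handed out.  This is a
sketch only: MonotonicHorizon is a made-up name, XIDs are treated as
plain integers, and real code would need wraparound-aware comparison
(TransactionIdPrecedes and friends) plus shared state and locking.

```python
from typing import Optional


class MonotonicHorizon:
    """Hypothetical wrapper that never returns a horizon older than
    one it has already returned."""

    def __init__(self) -> None:
        self._last: Optional[int] = None

    def get(self, computed_oldest_xmin: int) -> int:
        # Advance only when the freshly computed value is newer;
        # otherwise keep returning the previous horizon.
        if self._last is None or computed_oldest_xmin > self._last:
            self._last = computed_oldest_xmin
        return self._last


horizon = MonotonicHorizon()
first = horizon.get(110)   # horizon used when the toast table is vacuumed
second = horizon.get(90)   # computed value went backwards; clamped
print(first, second)
```

With such a clamp, a later vacuum or rewrite would never treat as
recently-dead a row that an earlier vacuum had already considered dead.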

Any better ideas?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

