Re: Logical Decoding and HeapTupleSatisfiesVacuum assumptions - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Logical Decoding and HeapTupleSatisfiesVacuum assumptions
Date
Msg-id 37c80783-557b-07b5-0a65-2cdcee63fa6d@2ndquadrant.com
Whole thread Raw
In response to Re: Logical Decoding and HeapTupleSatisfiesVacuum assumptions  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
Hi,

On 01/19/2018 03:34 PM, Simon Riggs wrote:
> On 26 December 2017 at 14:21, Nikhil Sontakke <nikhils@2ndquadrant.com> wrote:
> 
>> With logical decoding, there might arise a case that such a row, if it
>> belongs to a system catalog table or even a user catalog table
>> (https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=66abc2608c7c00fcd449e00a9e23f13f02e65d04),
>> then it might be being used for decoding at the very same moment that
>> the abort came in. If the row is re-claimed immediately, then it's
>> possible that the decoding happening alongside might face issues.
> 
> If we want to make this change, I'd like to see something that
> explains exactly how the problem in logical decoding occurs, or
> preferably a test case. I can't understand the problem by reading the
> archives. I'm not suggesting this patch doesn't work, I'm thinking
> about whether there are other ways.
> 
> Surely if you are decoding a transaction and a concurrent abort is
> requested then decoding should be interrupted at the next sensible
> point. Allowing the two actions to occur without interlock is an
> issue, so I suggest we just don't do it, rather than allow it and fix
> subsequent breakage, which is what I understand this patch to do.
> 

I don't quite understand how the interlock would work? Can you elaborate
how would that be implemented? Consider that logical decoding

(a) is a read-only process, so it can't modify database state (e.g.
    locking a row in a catalog)

(b) is asynchronous process, i.e. it may be happening quite a bit of
    time after the changes actually happened

An example of a problematic transaction was already outlined in Nikhil's
post, i.e. it's a transaction that

   (1) does DDL, e.g. to add a column to the decoded table

   (2) inserts some data into the table (with the new column)

   (3) aborts (possibly hours / gigabytes of WAL in the future)

Currently, vacuum can just go and cleanup the catalog rows added in (1)
because the transaction is aborted (and postgres already knows that),
and we do ignore OldestXmin for catalog rows.

But if you try decode the changes (and hand them over to the plugin for
apply) without waiting for the final commit/abort, that will fail.
Because it will read changes from WAL with a column missing in any
existing catalog row. Kabooooom!

Until now this was not an issue, because the decoding happens at commit
time (or more precisely, we do decode the changes incrementally, but the
apply only happens on commit). But it's an issue for both Nikhil's and
mine patches.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Konstantin Knizhnik
Date:
Subject: Re: Built-in connection pooling
Next
From: Pavel Stehule
Date:
Subject: Re: Built-in connection pooling