Re: [HACKERS] logical decoding of two-phase transactions - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] logical decoding of two-phase transactions
Date
Msg-id 00bafa2d-4742-6555-5a72-3208812dc3fe@2ndquadrant.com
In response to Re: [HACKERS] logical decoding of two-phase transactions  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] logical decoding of two-phase transactions
List pgsql-hackers
On 07/18/2018 04:56 PM, Robert Haas wrote:
> On Wed, Jul 18, 2018 at 10:08 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> The problem is you don't know if a transaction does DDL sometime later, in
>> the part that you might not have decoded yet (or perhaps concurrently with
>> the decoding). So I don't see how you could easily exclude such transactions
>> from the decoding ...
> 
> One idea is that maybe the running transaction could communicate with
> the decoding process through shared memory.  For example, suppose that
> before you begin decoding an ongoing transaction, you have to send
> some kind of notification to the process saying "hey, I'm going to
> start decoding you" and wait for that process to acknowledge receipt
> of that message (say, at the next CFI).  Once it acknowledges receipt,
> you can begin decoding.  Then, we're guaranteed that the foreground
> process knows that it must be careful about catalog changes. If
> it's going to make one, it sends a note to the decoding process and
> says, hey, sorry, I'm about to do catalog changes, please pause
> decoding.  Once it gets an acknowledgement that decoding has paused,
> it continues its work.  Decoding resumes after commit (or maybe
> earlier if it's provably safe).
> 
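Robert's handshake, in the happy path, might look like this toy model (Python, with made-up names; nothing here is actual backend code, and the "CFI" is just a stand-in for CHECK_FOR_INTERRUPTS):

```python
# Toy model of the attach/pause handshake (hypothetical; not backend code).
# The decoder may only start after the foreground acknowledges at its next
# CFI; the foreground pauses the decoder before any catalog change.
from queue import Queue

class Foreground:
    def __init__(self):
        self.inbox = Queue()
        self.decoder_attached = False

    def cfi(self):
        # Stand-in for CHECK_FOR_INTERRUPTS(): drain pending notifications.
        while not self.inbox.empty():
            msg, reply = self.inbox.get()
            if msg == "attach":
                self.decoder_attached = True
                reply.put("ack")

    def catalog_change(self, decoder):
        # Before touching the catalog, pause an attached decoder and wait.
        if self.decoder_attached:
            decoder.pause()          # blocks until the decoder acknowledges
        return "catalog changed"

class Decoder:
    def __init__(self, fg):
        self.fg = fg
        self.paused = False

    def attach(self):
        reply = Queue()
        self.fg.inbox.put(("attach", reply))
        self.fg.cfi()                # foreground reaches its next CFI
        assert reply.get() == "ack"  # decoding may begin only after the ack

    def pause(self):
        self.paused = True           # decoder acknowledges immediately here

fg = Foreground()
dec = Decoder(fg)
dec.attach()
print(fg.catalog_change(dec))  # → catalog changed
print(dec.paused)              # → True
```

In this toy the decoder acknowledges the pause immediately; the problem Tomas raises below is precisely the case where it can't.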

Let's assume the running transaction is holding an exclusive lock on 
something. We start decoding it and do this little dance with sending 
messages, confirmations etc. The decoding starts, and the plugin asks 
for the same lock (and starts waiting). Then the transaction decides to 
do some catalog changes, and sends a "pause" message to the decoding. 
Who's going to respond, considering the decoding is waiting for the 
lock? (And it's not easy to jump out, because it might be deep inside 
the output plugin, i.e. deep in some extension.)
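The failure mode is an undetectable deadlock: a cycle in the wait-for graph where one edge isn't a lock the deadlock detector can see. A toy illustration (names made up):

```python
# Wait-for graph of the scenario above (toy model; "xact"/"decoder" are
# made-up names).  The decoder blocks on a lock the transaction holds,
# while the transaction blocks waiting for the decoder's "paused" ack.
waits_for = {
    "decoder": "xact",    # plugin wants the lock held by the transaction
    "xact": "decoder",    # transaction waits for the pause acknowledgement
}

def has_cycle(graph, start):
    # Follow wait-for edges from 'start'; a revisited node means deadlock.
    seen, node = set(), start
    while node in graph:
        if node in seen:
            return True
        seen.add(node)
        node = graph[node]
    return False

print(has_cycle(waits_for, "decoder"))  # → True
```

The lock-manager deadlock detector only sees the decoder→xact edge; the xact→decoder edge is a shared-memory message wait, so neither side ever gets cancelled.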

>> But isn't this (delaying the catalog cleanup etc.) pretty much the original
>> approach, implemented by the original patch? Which you also claimed to be
>> unworkable, IIRC? Or how is this addressing the problems with broken HOT
>> chains, for example? Those issues were pretty much the reason why we started
>> looking at alternative approaches, like delaying the abort ...
> 
> I don't think so.  The original approach, IIRC, was to decode after
> the abort had already happened, and my objection was that you can't
> rely on the state of anything at that point.

Pretty much, yes. Clearly there needs to be some sort of coordination 
between the transaction and decoding process ...

> The approach here is to
> wait until the abort is in progress and then basically pause it while
> we try to read stuff, but that seems similarly riddled with problems.

Yeah :-(

> The newer approach could be considered an improvement in that you've
> tried to get your hands around the problem at an earlier point, but
> it's not early enough.  To take a very rough analogy, the original
> approach was like trying to install a sprinkler system after the
> building had already burned down, while the new approach is like
> trying to install a sprinkler system when you notice that the building
> is on fire.

When an oil well is burning, they detonate a small bomb next to it to 
extinguish it. What would be the analogy to that, here? pg_resetwal? ;-)

> But we need to install the sprinkler system in advance.

Damn causality!

> That is, we need to make all of the necessary preparations for a
> possible abort before the abort occurs.  That could perhaps be done by
> arranging things so that decoding after an abort is actually still
> safe (e.g. by making it look to certain parts of the system as though
> the aborted transaction is still in progress until decoding no longer
> cares about it) or by making sure that we are never decoding at the
> point where a problematic abort happens (e.g. as proposed above, pause
> decoding before doing dangerous things).
> 
>> I wonder if disabling HOT on catalogs with wal_level=logical would be an
>> option here. I'm not sure how important HOT on catalogs is, in practice (it
>> surely does not help with the typical catalog bloat issue, which is
>> temporary tables, because that's mostly insert+delete). I suppose we could
>> disable it only when there's a replication slot indicating support for
>> decoding of in-progress transactions, so that you still get HOT with plain
>> logical decoding.
> 
> Are you talking about HOT updates, or HOT pruning?  Disabling the
> former wouldn't help, and disabling the latter would break VACUUM,
> which assumes that any tuple not removed by HOT pruning is not a dead
> tuple (cf. 1224383e85eee580a838ff1abf1fdb03ced973dc, which was caused
> by a case where that wasn't true).
> 

I'm talking about the issue you described here:

https://www.postgresql.org/message-id/CA+TgmoZP0SxEfKW1Pn=ackUj+KdWCxs7PumMAhSYJeZ+_61_GQ@mail.gmail.com

>> I'm sure there will be other obstacles, not just the HOT chain stuff, but it
>> would mean one step closer to a solution.
> 
> Right.
> 
> Here's a crazy idea.  Instead of disabling HOT pruning or anything
> like that, have the decoding process advertise the XID of the
> transaction being decoded as its own XID in its PGPROC.  Also, using
> magic, acquire a lock on that XID even though the foreground
> transaction already holds that lock in exclusive mode.  Fix the code
> (and I'm pretty sure there is some) that relies on an XID appearing in
> the procarray only once to no longer make that assumption.  Then, if
> the foreground process aborts, it will appear to the rest of the
> system that it's still running, so HOT pruning won't remove the
> XID, CLOG won't get truncated, people who are waiting to update a
> tuple updated by the aborted transaction will keep waiting, etc.  We
> know that we do the right thing for running transactions, so if we
> make this aborted transaction look like it is running and are
> sufficiently convincing about the way we do that, then it should also
> work.  That seems more likely to be able to be made robust than
> addressing specific problems (e.g. a tuple might get removed!) one by
> one.
> 
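If I understand the idea, it amounts to a visibility model like this toy sketch (hypothetical; nothing here corresponds to real PGPROC/procarray code):

```python
# Toy model of the "advertise the XID twice" idea.  The same XID may
# appear in the array more than once; the transaction counts as running
# as long as *any* entry remains, so an abort by the foreground process
# alone doesn't make the XID disappear while decoding still needs it.
procarray = []          # may list the same XID more than once

def xid_in_progress(xid):
    return xid in procarray     # any remaining entry keeps it "running"

XID = 1234
procarray.append(XID)   # foreground transaction starts
procarray.append(XID)   # decoder advertises the same XID as its own

procarray.remove(XID)   # foreground aborts: removes only *one* entry
print(xid_in_progress(XID))  # → True: HOT pruning, CLOG truncation,
                             # tuple-lock waiters etc. still see it running

procarray.remove(XID)   # decoding no longer cares about the transaction
print(xid_in_progress(XID))  # → False: cleanup may proceed
```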

A dumb question - would this work with subtransaction-level aborts? I 
mean, a transaction that does some catalog changes in a subxact that 
then aborts, while the top-level transaction continues.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

