Re: logical decoding of two-phase transactions - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: logical decoding of two-phase transactions
Date
Msg-id CAMsr+YFOR=QAvk_2e+Yq8Jg01TBPwHCN1ccEsqbUxz5phFfFgQ@mail.gmail.com
Whole thread Raw
In response to Re: logical decoding of two-phase transactions  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 28 Mar. 2017 23:08, "Andres Freund" <andres@anarazel.de> wrote:

> >> I don't think its for us to say what the plugin is allowed to do. We
> >> decided on a plugin architecture, so we have to trust that the plugin
> >> author resolves the issues. We can document them so those choices are
> >> clear.
> >
> > I don't think this is "plugin architecture" related. The output pluging
> > can't do right here, this has to be solved at a higher level.
>
> That assertion is obviously false... the plugin can resolve this in
> various ways, if we allow it.

Handling it by breaking replication isn't handling it (e.g. timeouts in
decoding etc). 

IMO, if it's a rare condition and we can abort decoding then recover cleanly and succeed on retry, that's OK. Not dissimilar to the deadlock detector. But right now that's not the case, it's possible (however artificially) to create prepared xacts for which decoding will stall and not succeed.
 
Handling it by rolling back *prepared* transactions
(which are supposed to be guaranteed to succeed!), isn't either.

I agree, we can't rely on anything for which the only way to continue is to rollback a prepared xact.
 
> You can say that in your opinion you prefer to see this handled in
> some higher level way, though it would be good to hear why and how.

It's pretty obvious why: A bit of DDL by the user shouldn't lead to the
issues mentioned above.

I agree that it shouldn't, and in fact DDL is the main part of why I want 2PC decoding.

What's surprised me is that I haven't actually been able to create any situations where, with test_decoding, we have such a failure. Not unless I manually LOCK TABLE pg_attribute, anyway.

Notably, we already disallow prepared xacts that make changes to the relfilenodemap, which covers a lot of the problem cases like CLUSTERing system tables.
 
> Bottom line here is we shouldn't reject this patch on this point,

I think it definitely has to be rejected because of that.  And I didn't
bring this up at the last minute, I repeatedly brought it up before.
Both to Craig and Stas.

Yes, and I lost track of it while focusing on the catalog tuple visibility issues. I warned Stas of this issue when he first mentioned an interest in decoding of 2PC actually, but haven't kept a proper eye on it since.

Andres and I even discussed this back in the early BDR days, it's not new and is part of why I poked Stas to try some DDL tests etc. The tests in the initial patch didn't have enough coverage to trigger any issues - they didn't actually test decoding of a 2pc xact while it was still in-progress at all. But even once I added more tests I've actually been unable to reproduce this in a realistic real world example.

Frankly I'm confused by that, since I would expect an AEL on some_table to cause decoding of some_table to get stuck. It does not.

That doesn't mean we should accept failure cases and commit something with holes in it. But it might inform our choices about how we solve those issues.
 
One way to fix this would be to allow decoding to acquire such locks
(i.e. locks held by the prepared xact we're decoding) - there
unfortunately are some practical issues with that (e.g. the locking code
doesnt' necessarily expect a second non-exclusive locker, when there's
an exclusive one),  or we could add an exception to the locking code to
simply not acquire such locks.

I've been meaning to see if we can use the parallel infrastructure's session leader infrastructure for this, by making the 2pc fake-proc a leader and making our decoding session inherit its locks. I haven't dug into it to see if it's even remotely practical yet, and won't be able to until early pg11.

We could proceed with the caveat that decoding plugins that use 2pc support must defer decoding of 2pc xacts containing ddl until commit prepared, or must take responsibility for ensuring (via a command filter, etc) that xacts are safe to decode and 2pc lock xacts during decoding. But we're likely to change the interface for all that when we iterate for pg11 and I'd rather not carry more BC than we have to. Also, the patch has unsolved issues with how it keeps track of whether an xact was output at prepare time or not and suppresses output at commit time.

I'm inclined to shelve the patch for Pg 10. We've only got a couple of days left, the tests are still pretty minimal. We have open issues around locking, less than totally satisfactory abort handling, and potential to skip replay of transactions for both prepare and commit prepared. It's not ready to go. However, it's definitely to the point where with a little more work it'll be practical to patch into variants of Pg until we can mainstream it in Pg 11, which is nice.

--
Craig Ringer

pgsql-hackers by date:

Previous
From: "Tsunakawa, Takayuki"
Date:
Subject: Re: Supporting huge pages on Windows
Next
From: Tom Lane
Date:
Subject: Re: \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if, \quit_unless)