Re: [HACKERS] logical decoding of two-phase transactions - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: [HACKERS] logical decoding of two-phase transactions
Msg-id: 00bafa2d-4742-6555-5a72-3208812dc3fe@2ndquadrant.com
In response to: Re: [HACKERS] logical decoding of two-phase transactions (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: [HACKERS] logical decoding of two-phase transactions
List: pgsql-hackers
On 07/18/2018 04:56 PM, Robert Haas wrote:
> On Wed, Jul 18, 2018 at 10:08 AM, Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
>> The problem is you don't know if a transaction does DDL sometime later, in
>> the part that you might not have decoded yet (or perhaps concurrently with
>> the decoding). So I don't see how you could easily exclude such transactions
>> from the decoding ...
>
> One idea is that maybe the running transaction could communicate with
> the decoding process through shared memory. For example, suppose that
> before you begin decoding an ongoing transaction, you have to send
> some kind of notification to the process saying "hey, I'm going to
> start decoding you" and wait for that process to acknowledge receipt
> of that message (say, at the next CFI). Once it acknowledges receipt,
> you can begin decoding. Then, we're guaranteed that the foreground
> process knows that it must be careful about catalog changes. If
> it's going to make one, it sends a note to the decoding process and
> says, hey, sorry, I'm about to do catalog changes, please pause
> decoding. Once it gets an acknowledgement that decoding has paused,
> it continues its work. Decoding resumes after commit (or maybe
> earlier if it's provably safe).

Let's assume the running transaction is holding an exclusive lock on
something. We start decoding it and do this little dance with sending
messages, confirmations etc. The decoding starts, and the plugin asks
for the same lock (and starts waiting). Then the transaction decides to
do some catalog changes, and sends a "pause" message to the decoding.
Who's going to respond, considering the decoding is waiting for the
lock? (And it's not easy to jump out, because it might be deep inside
the output plugin, i.e. deep in some extension.)

>> But isn't this (delaying the catalog cleanup etc.) pretty much the
>> original approach, implemented by the original patch? Which you also
>> claimed to be unworkable, IIRC?
>> Or how is this addressing the problems with broken HOT
>> chains, for example? Those issues were pretty much the reason why we
>> started looking at alternative approaches, like delaying the abort ...
>
> I don't think so. The original approach, IIRC, was to decode after
> the abort had already happened, and my objection was that you can't
> rely on the state of anything at that point.

Pretty much, yes. Clearly there needs to be some sort of coordination
between the transaction and decoding process ...

> The approach here is to
> wait until the abort is in progress and then basically pause it while
> we try to read stuff, but that seems similarly riddled with problems.

Yeah :-(

> The newer approach could be considered an improvement in that you've
> tried to get your hands around the problem at an earlier point, but
> it's not early enough. To take a very rough analogy, the original
> approach was like trying to install a sprinkler system after the
> building had already burned down, while the new approach is like
> trying to install a sprinkler system when you notice that the building
> is on fire.

When an oil well is burning, they detonate a small bomb next to it to
extinguish it. What would be the analogy to that, here? pg_resetwal? ;-)

> But we need to install the sprinkler system in advance.

Damn causality!

> That is, we need to make all of the necessary preparations for a
> possible abort before the abort occurs. That could perhaps be done by
> arranging things so that decoding after an abort is actually still
> safe (e.g. by making it look to certain parts of the system as though
> the aborted transaction is still in progress until decoding no longer
> cares about it) or by making sure that we are never decoding at the
> point where a problematic abort happens (e.g. as proposed above, pause
> decoding before doing dangerous things).
>
>> I wonder if disabling HOT on catalogs with wal_level=logical would be
>> an option here.
>> I'm not sure how important HOT on catalogs is, in practice (it
>> surely does not help with the typical catalog bloat issue, which is
>> temporary tables, because that's mostly insert+delete). I suppose we
>> could disable it only when there's a replication slot indicating
>> support for decoding of in-progress transactions, so that you still
>> get HOT with plain logical decoding.
>
> Are you talking about HOT updates, or HOT pruning? Disabling the
> former wouldn't help, and disabling the latter would break VACUUM,
> which assumes that any tuple not removed by HOT pruning is not a dead
> tuple (cf. 1224383e85eee580a838ff1abf1fdb03ced973dc, which was caused
> by a case where that wasn't true).

I'm talking about the issue you described here:

https://www.postgresql.org/message-id/CA+TgmoZP0SxEfKW1Pn=ackUj+KdWCxs7PumMAhSYJeZ+_61_GQ@mail.gmail.com

>> I'm sure there will be other obstacles, not just the HOT chain stuff,
>> but it would mean one step closer to a solution.
>
> Right.
>
> Here's a crazy idea. Instead of disabling HOT pruning or anything
> like that, have the decoding process advertise the XID of the
> transaction being decoded as its own XID in its PGPROC. Also, using
> magic, acquire a lock on that XID even though the foreground
> transaction already holds that lock in exclusive mode. Fix the code
> (and I'm pretty sure there is some) that relies on an XID appearing in
> the procarray only once to no longer make that assumption. Then, if
> the foreground process aborts, it will appear to the rest of the
> system that it's still running, so HOT pruning won't remove the
> XID, CLOG won't get truncated, people who are waiting to update a
> tuple updated by the aborted transaction will keep waiting, etc. We
> know that we do the right thing for running transactions, so if we
> make this aborted transaction look like it is running and are
> sufficiently convincing about the way we do that, then it should also
> work.
> That seems more likely to be able to be made robust than
> addressing specific problems (e.g. a tuple might get removed!) one by
> one.

A dumb question - would this work with subtransaction-level aborts? I
mean a transaction that does some catalog changes in a subxact which
then aborts, while the top-level transaction continues.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services