Re: Parallel scan with SubTransGetTopmostTransaction assert coredump - Mailing list pgsql-hackers
From | Greg Nancarrow |
---|---|
Subject | Re: Parallel scan with SubTransGetTopmostTransaction assert coredump |
Date | |
Msg-id | CAJcOf-engk+dyWJcdZzNZct1aGG2gqTbPzorDeTARKxrP89QAQ@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel scan with SubTransGetTopmostTransaction assert coredump (Maxim Orlov <m.orlov@postgrespro.ru>) |
Responses |
Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
|
List | pgsql-hackers |
On Wed, Jun 23, 2021 at 1:06 AM Maxim Orlov <m.orlov@postgrespro.ru> wrote: > > The analysis in the beginning of the discussion seems to be right, but > the fix v2 looks too invasive for me. > > Personally, I'd like not to remove snapshot even if transaction is > read-only. I propose to consider "xid < TransactionXmin" as a legit case > and just promote xid to TransactionXmin. > > It's annoying this old bug still not fixed. What do you think? This v3 patch doesn't look right to me at all. It's not addressing the fundamental problem, it just seems to be working around it, by fiddling a xid value to avoid an assertion failure. I really can't see how the v2 patch "removes a snapshot" in the read-only transaction case, and is "invasive". Have you found a case that the v2 patch breaks? The v2 patch does follow the analysis in the beginning of the discussion, which identified that in setting up parallel workers, a "later transaction snapshot" was taken than the one actually used in the statement execution, and that's what eventually leads to the observed Assertion failure. The original problem reporter stated: "So the root cause is the Parallel Workers process set the TransactionXmin with later transcation snapshot". Also, as far as I can see, it makes no sense to pass parallel workers both an active snapshot and a (later) transaction snapshot. In the leader, prior to execution and running parallel workers, a transaction snapshot is obtained and pushed as the active snapshot (note: only ONE snapshot at this point). It makes little sense to me to then obtain ANOTHER transaction snapshot when setting up parallel workers, and pass that (only) to the workers along with the original (earlier) active snapshot. (If there is a reason for passing the TWO snapshots to parallel workers, original code authors and/or snapshot experts should speak up now, and code comments should be added accordingly to explain the purpose and how it is MEANT to work) This is why the v2 patch updates the code to just pass one snapshot (the active snapshot) to the parallel workers, which gets restored in each worker as the transaction snapshot and set as the active snapshot (so is then consistent with the snapshot setup in the parallel leader). The v3 patch doesn't address any of this at all. It is just checking if the xid is < the TransactionXMin in the snapshot, and if so, setting the xid to be TransactionXMin, and thus avoiding the Assertion failure. But that TransactionXMin was from the "later transaction snapshot", which was only obtained for the workers and doesn't necessarily match the earlier active snapshot taken, that is associated with the actual statement execution. As far as I am concerned, that "later transaction snapshot", taken for use by the workers, doesn't seem right, so that's why the v2 patch removes the use of it and only uses the active snapshot, for consistency with the rest of the code. I think that if there was something fundamentally wrong with the v2 patch's removal of that "later transaction snapshot" used by the parallel workers, then surely there would be a test failure somewhere - but that is not the case. I think the v2 patch should be restored as the proposed solution here. Regards, Greg Nancarrow Fujitsu Australia
pgsql-hackers by date: