Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions
Date
Msg-id CAA4eK1+=ErcJHPhW_mbTO4k+abZabAi5FtFXe3eYua9L3xrm4w@mail.gmail.com
Whole thread Raw
In response to Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions
Re: PATCH: logical_work_mem and logical streaming of largein-progress transactions
List pgsql-hackers
On Thu, May 28, 2020 at 12:46 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, May 26, 2020 at 12:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Isn't this problem only for subxact file as we anyway create changes
> > file as part of start stream message which should have come after
> > abort?  If so, can't we detect whether subxact file exists probably by
> > using nsubxacts or something like that?  Can you please once try to
> > reproduce this scenario to ensure that we are not missing anything?
>
> I have tested this, as of now, by default we create both changes and
> subxact files irrespective of whether we get any subtransactions or
> not.  Maybe this could be optimized that only if we have any subxact
> then only create that file otherwise not?  What's your opinion on the
> same.
>

Yeah, that makes sense.

> > > > > > 8.
> > > > > > @@ -2295,6 +2677,13 @@ ReorderBufferXidSetCatalogChanges(ReorderBuffer
> > > > > > *rb, TransactionId xid,
> > > > > >   txn = ReorderBufferTXNByXid(rb, xid, true, NULL, lsn, true);
> > > > > >
> > > > > >   txn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES;
> > > > > > +
> > > > > > + /*
> > > > > > + * TOCHECK: Mark toplevel transaction as having catalog changes too
> > > > > > + * if one of its children has.
> > > > > > + */
> > > > > > + if (txn->toptxn != NULL)
> > > > > > + txn->toptxn->txn_flags |= RBTXN_HAS_CATALOG_CHANGES;
> > > > > >  }
> > > > > >
> > > > > > Why are we marking top transaction here?
> > > > >
> > > > > We need to mark top transaction to decide whether to build tuplecid
> > > > > hash or not.  In non-streaming mode, we are only sending during the
> > > > > commit time, and during commit time we know whether the top
> > > > > transaction has any catalog changes or not based on the invalidation
> > > > > message so we are marking the top transaction there in DecodeCommit.
> > > > > Since here we are not waiting till commit so we need to mark the top
> > > > > transaction as soon as we mark any of its child transactions.
> > > > >
> > > >
> > > > But how does it help?  We use this flag (via
> > > > ReorderBufferXidHasCatalogChanges) in SnapBuildCommitTxn which is
> > > > anyway done in DecodeCommit and that too after setting this flag for
> > > > the top transaction if required.  So, how will it help in setting it
> > > > while processing for subxid.  Also, even if we have to do it won't it
> > > > add the xid needlessly in builder->committed.xip array?
> > >
> > > In ReorderBufferBuildTupleCidHash, we use this flag to decide whether
> > > to build the tuplecid hash or not based on whether it has catalog
> > > changes or not.
> > >
> >
> > Okay, but you haven't answered the second part of the question: "won't
> > it add the xid of top transaction needlessly in builder->committed.xip
> > array, see function SnapBuildCommitTxn?"  IIUC, this can happen
> > without patch as well because DecodeCommit also sets the flags just
> > based on invalidation messages irrespective of whether the messages
> > are generated by top transaction or not, is that right?
>
> Yes, with or without the patch it always adds the topxid.  I think
> purpose for doing this with/without patch is not for the snapshot
> instead we are marking the top itself that some of its subtxn has the
> catalog changes so that while building the tuplecid has we can know
> whether to build the hash or not.  But, having said that I feel in
> ReorderBufferBuildTupleCidHash why do we need these two checks
> if (!rbtxn_has_catalog_changes(txn) || dlist_is_empty(&txn->tuplecids))
> return;
>
> I mean it should be enough to just have the check,  because if we have
> added something to the tuplecids then catalog changes must be there
> because that time we are setting the catalog changes to true.
>
> if (dlist_is_empty(&txn->tuplecids))
> return;
>
> I think in the base code there are multiple things going on
> 1. If we get new CID we always set the catalog change in that
> transaction but add the tuplecids in the top transaction.  So
> basically, top transaction is so far not marked with catalog changes
> but it has tuplecids.
> 2. Now, in DecodeCommit the top xid will be marked that it has catalog
> changes based on the invalidation messages.
>

I don't think it is advisable to remove that check from base code
unless we have a strong reason for doing so.  I think here you can
write better comments about why you are marking the flag for top
transaction and remove TOCHECK from the comment.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: Inlining of couple of functions in pl_exec.c improves performance
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: SIGSEGV from START_REPLICATION 0/XXXXXXX in XLogSendPhysical() at walsender.c:2762