Thread: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

On Tue, Mar 16, 2021 at 1:35 AM Oh, Mike <minsoo@amazon.com> wrote:
>
> Sending this to pgsql-hackers list to create a CommitFest entry with the attached patch proposal.
>
>
>
> Hello,
>
> We noticed that the logical replication could fail when the Standby::RUNNING_XACT record is generated in the middle
> of a catalog modifying transaction and if the logical decoding has to restart from the RUNNING_XACT WAL entry.
>
> The Standby::RUNNING_XACT record is generated periodically (roughly every 15s by default) or during a CHECKPOINT
> operation.
>
>
>
> Detailed problem description:
>
> Tested on 11.8 & current master.
>
> The logical replication slot restart_lsn advances in the middle of an open txn that modified the catalog (e.g.
> TRUNCATE operation).
>
> Should the logical decoding have to restart, it could fail with an error like this:
>
> ERROR:  could not map filenode "base/13237/442428"

Thank you for reporting the issue.

I could reproduce this issue by the steps you shared.

>
> Currently, the system relies on processing Heap2::NEW_CID to keep track of catalog modifying (sub)transactions.
>
> This context is lost if the logical decoding has to restart from a Standby::RUNNING_XACTS that is written between the
> NEW_CID record and its parent txn commit.
>
> If the logical stream restarts from this restart_lsn, then it doesn't have the xid responsible for modifying the
> catalog.
>

I agree with your analysis. Since we don't use the commit WAL record to
track the transactions that have modified system catalogs, if we decode
only the commit record of such a transaction, we cannot know that the
transaction has modified system catalogs, resulting in subsequent
transactions scanning the system catalogs with the wrong snapshot.
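
For reference, the tracking that is lost in this scenario normally
happens while decoding Heap2::NEW_CID records. A simplified sketch of
that path (paraphrased from decode.c and snapbuild.c, not a verbatim
excerpt):

/*
 * NEW_CID records are written only for catalog tuples, so seeing one is
 * what marks the transaction as catalog-modifying and records the
 * cmin/cmax information needed for historic catalog scans.  If decoding
 * restarts after these records were written, nothing marks the
 * transaction any more.
 */
static void
sketch_process_new_cid(SnapBuild *builder, TransactionId xid,
                       XLogRecPtr lsn, xl_heap_new_cid *xlrec)
{
    /* remember that this xid touched the catalog */
    ReorderBufferXidSetCatalogChanges(builder->reorder, xid, lsn);

    /* remember cmin/cmax so later catalog scans use the right snapshot */
    ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
                                 xlrec->target_node, xlrec->target_tid,
                                 xlrec->cmin, xlrec->cmax, xlrec->combocid);

    /* (the real code also records the new command id for the transaction) */
}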

With the patch, if the commit WAL record has a XACT_XINFO_HAS_INVALS
flag and its xid is included in RUNNING_XACT record written at
restart_lsn,  we forcibly add the top XID and its sub XIDs as a
committed transaction that has modified system catalogs to the
snapshot. I might be missing something about your patch but I have
some comments on this approach:

1. A commit WAL record may not have an invalidation message for system
catalogs (e.g., when the commit record has only an invalidation message
for the relcache) even if it has the XACT_XINFO_HAS_INVALS flag. In this
case, the transaction is wrongly added to the snapshot; is that okay?

2. We might add a subtransaction XID as a committed transaction that
has modified system catalogs even if it actually didn't. As the
comment in SnapBuildBuildSnapshot() describes, we track only the
transactions that have modified the system catalog and store in the
snapshot (in the ‘xip' array). The patch could break that assumption.
However, I’m really not sure how to deal with this point. We cannot
know which subtransaction has actually modified system catalogs by
using only the commit WAL record.

3. The patch covers only the case where the restart_lsn exactly
matches the LSN of RUNNING_XACT. I wonder if there could be a case
where the decoding starts at a WAL record other than RUNNING_XACT but
the next WAL record is RUNNING_XACT.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



Hi,

On 7/29/21 10:25 AM, Masahiko Sawada wrote:
> Thank you for reporting the issue.
>
> I could reproduce this issue by the steps you shared.

Thanks for looking at it!

>
>> Currently, the system relies on processing Heap2::NEW_CID to keep track of catalog modifying (sub)transactions.
>>
>> This context is lost if the logical decoding has to restart from a Standby::RUNNING_XACTS that is written between
>> the NEW_CID record and its parent txn commit.
>>
>> If the logical stream restarts from this restart_lsn, then it doesn't have the xid responsible for modifying the
>> catalog.
>>
> I agree with your analysis. Since we don’t use commit WAL record to
> track the transaction that has modified system catalogs, if we decode
> only the commit record of such transaction, we cannot know the
> transaction has been modified system catalogs, resulting in the
> subsequent transaction scans system catalog with the wrong snapshot.
Right.
>
> With the patch, if the commit WAL record has a XACT_XINFO_HAS_INVALS
> flag and its xid is included in RUNNING_XACT record written at
> restart_lsn,  we forcibly add the top XID and its sub XIDs as a
> committed transaction that has modified system catalogs to the
> snapshot. I might be missing something about your patch but I have
> some comments on this approach:
>
> 1. Commit WAL record may not have invalidation message for system
> catalogs  (e.g., when commit record has only invalidation message for
> relcache) even if it has XACT_XINFO_HAS_INVALS flag.

Right, good point (create policy for example would lead to an 
invalidation for relcache only).

> In this case, the
> transaction wrongly is added to the snapshot, is that okay?
This transaction is a committed one, and IIUC the snapshot would be used
only for catalog visibility, so I don't see any issue with adding it to
the snapshot; what do you think?
>
> 2. We might add a subtransaction XID as a committed transaction that
> has modified system catalogs even if it actually didn't.

Right, like when needs_timetravel is true.

> As the
> comment in SnapBuildBuildSnapshot() describes, we track only the
> transactions that have modified the system catalog and store in the
> snapshot (in the ‘xip' array). The patch could break that assumption.
Right. It looks to me that breaking this assumption is not an issue.

IIUC currently the committed ones that are not modifying the catalog are 
not stored "just" because we don't need them.
> However, I’m really not sure how to deal with this point. We cannot
> know which subtransaction has actually modified system catalogs by
> using only the commit WAL record.
Right, unless we rewrite this patch so that a commit WAL record will 
produce this information.
>
> 3. The patch covers only the case where the restart_lsn exactly
> matches the LSN of RUNNING_XACT.
Right.
>   I wonder if there could be a case
> where the decoding starts at a WAL record other than RUNNING_XACT but
> the next WAL record is RUNNING_XACT.

Not sure, but could a restart_lsn not be a RUNNING_XACTS?

Thanks

Bertrand




On 7/29/21 01:25, Masahiko Sawada wrote:
> On Tue, Mar 16, 2021 at 1:35 AM Oh, Mike <minsoo@amazon.com> wrote:
>>
>> Sending this to pgsql-hackers list to create a CommitFest entry with the attached patch proposal.
>>
>> ...
>>
>> Detailed problem description:
>>
>> Tested on 11.8 & current master.
>>
>> The logical replication slot restart_lsn advances in the middle of an open txn that modified the catalog (e.g.
>> TRUNCATE operation).
>>
>> Should the logical decoding has to restart it could fail with an error like this:
>>
>> ERROR:  could not map filenode "base/13237/442428"
> 
> Thank you for reporting the issue.
> 
> I could reproduce this issue by the steps you shared.


I also noticed a bug report earlier this year with another PG user
reporting the same error - on version 12.3

https://www.postgresql.org/message-id/flat/16812-3d9df99bd77ff616%40postgresql.org

Today I received a report from a new PG user of this same error message
causing their logical replication to break. This customer was also
running PostgreSQL 12.3 on both source and target side.

Haven't yet dumped WAL or anything, but wanted to point out that the
error is being seen in the wild - I hope we can get a version of this
patch committed soon, as it will help with at least one cause.


-Jeremy

-- 
Jeremy Schneider
Database Engineer
Amazon Web Services



On Thu, Sep 23, 2021 at 5:44 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
>
> Hi,
>
> On 7/29/21 10:25 AM, Masahiko Sawada wrote:
> > Thank you for reporting the issue.
> >
> > I could reproduce this issue by the steps you shared.
>
> Thanks for looking at it!
>
> >
> >> Currently, the system relies on processing Heap2::NEW_CID to keep track of catalog modifying (sub)transactions.
> >>
> >> This context is lost if the logical decoding has to restart from a Standby::RUNNING_XACTS that is written between
> >> the NEW_CID record and its parent txn commit.
> >>
> >> If the logical stream restarts from this restart_lsn, then it doesn't have the xid responsible for modifying the
> >> catalog.
> >>
> > I agree with your analysis. Since we don’t use commit WAL record to
> > track the transaction that has modified system catalogs, if we decode
> > only the commit record of such transaction, we cannot know the
> > transaction has been modified system catalogs, resulting in the
> > subsequent transaction scans system catalog with the wrong snapshot.
> Right.
> >
> > With the patch, if the commit WAL record has a XACT_XINFO_HAS_INVALS
> > flag and its xid is included in RUNNING_XACT record written at
> > restart_lsn,  we forcibly add the top XID and its sub XIDs as a
> > committed transaction that has modified system catalogs to the
> > snapshot. I might be missing something about your patch but I have
> > some comments on this approach:
> >
> > 1. Commit WAL record may not have invalidation message for system
> > catalogs  (e.g., when commit record has only invalidation message for
> > relcache) even if it has XACT_XINFO_HAS_INVALS flag.
>
> Right, good point (create policy for example would lead to an
> invalidation for relcache only).
>
> > In this case, the
> > transaction wrongly is added to the snapshot, is that okay?
> This transaction is a committed one, and IIUC the snapshot would be used
> only for catalog visibility, so i don't see any issue to add it in the
> snapshot, what do you think?

It seems to me that it's no problem since we always mark the transaction
as containing catalog changes when decoding XLOG_XACT_INVALIDATIONS
records.

> >
> > 2. We might add a subtransaction XID as a committed transaction that
> > has modified system catalogs even if it actually didn't.
>
> Right, like when needs_timetravel is true.
>
> > As the
> > comment in SnapBuildBuildSnapshot() describes, we track only the
> > transactions that have modified the system catalog and store in the
> > snapshot (in the ‘xip' array). The patch could break that assumption.
> Right. It looks to me that breaking this assumption is not an issue.
>
> IIUC currently the committed ones that are not modifying the catalog are
> not stored "just" because we don't need them.
> > However, I’m really not sure how to deal with this point. We cannot
> > know which subtransaction has actually modified system catalogs by
> > using only the commit WAL record.
> Right, unless we rewrite this patch so that a commit WAL record will
> produce this information.
> >
> > 3. The patch covers only the case where the restart_lsn exactly
> > matches the LSN of RUNNING_XACT.
> Right.
> >   I wonder if there could be a case
> > where the decoding starts at a WAL record other than RUNNING_XACT but
> > the next WAL record is RUNNING_XACT.
>
> Not sure, but could a restart_lsn not be a RUNNING_XACTS?

I guess the decoding always starts from RUNNING_XACTS.
After more thought, I think that the basic approach of the proposed
patch is probably a good idea, which is to add xids whose commit record
has XACT_XINFO_HAS_INVALS to the snapshot. The problem as I see it is
that during decoding the COMMIT record we cannot know which transactions
(top transaction or subtransactions) actually did catalog changes. But
given that even if XLOG_XACT_INVALIDATION has only a relcache
invalidation message we always mark the transaction with
catalog-changed, it seems to be no problem. Therefore, in the reported
cases, we can probably add both the top transaction xid and its
subtransaction xids to the snapshot.

Regarding the patch details, I have two comments:

---
+ if ((parsed->xinfo & XACT_XINFO_HAS_INVALS) && last_running)
+ {
+     /* make last_running->xids bsearch()able */
+     qsort(last_running->xids,
+              last_running->subxcnt + last_running->xcnt,
+              sizeof(TransactionId), xidComparator);

The patch does qsort() every time the commit record has
XACT_XINFO_HAS_INVALS. IIUC the only xids we need to remember are the
ones recorded in the first replayed XLOG_RUNNING_XACTS, right? If so, we
need to do qsort() only once, can remove an xid from the array once it
gets committed, and can eventually make last_running empty so that we
can skip even the TransactionIdInArray() check.
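
Purely to illustrate what I mean, something along these lines
(hypothetical struct and function names, not actual code):

/*
 * Keep the xids from the first decoded XLOG_RUNNING_XACTS in an array
 * that is qsort()'ed once with xidComparator, and drop each xid as its
 * transaction commits, so the check becomes a no-op once all of those
 * transactions have finished.
 */
typedef struct LastRunningXacts
{
    int            nxids;       /* remaining xids, 0 once all have committed */
    TransactionId *xids;        /* sorted once at the first RUNNING_XACTS */
} LastRunningXacts;

static bool
ConsumeLastRunningXid(LastRunningXacts *lr, TransactionId xid)
{
    TransactionId *found;

    if (lr->nxids == 0)
        return false;           /* fast path once the list is exhausted */

    found = bsearch(&xid, lr->xids, lr->nxids,
                    sizeof(TransactionId), xidComparator);
    if (found == NULL)
        return false;

    /* remove the xid so the array eventually becomes empty */
    memmove(found, found + 1,
            (lr->xids + lr->nxids - (found + 1)) * sizeof(TransactionId));
    lr->nxids--;

    return true;
}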

---
Also, last_running is allocated by malloc() but isn't freed even after
logical decoding finishes.


Another idea to fix this problem would be that before calling
SnapBuildCommitTxn() we create transaction entries in ReorderBuffer
for (sub)transactions whose COMMIT record has XACT_XINFO_HAS_INVALS,
and then mark all of them as catalog-changed by calling
ReorderBufferXidSetCatalogChanges(). I've attached a PoC patch for
this idea. What the patch does is essentially the same as what the
proposed patch does. But the patch doesn't modify the
SnapBuildCommitTxn(). And we remember the list of last running
transactions in reorder buffer and the list is periodically purged
during decoding RUNNING_XACTS records, eventually making it empty.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

On Thursday, October 7, 2021 1:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Regarding the patch details, I have two comments:
> 
> ---
> + if ((parsed->xinfo & XACT_XINFO_HAS_INVALS) && last_running) {
> +     /* make last_running->xids bsearch()able */
> +     qsort(last_running->xids,
> +              last_running->subxcnt + last_running->xcnt,
> +              sizeof(TransactionId), xidComparator);
> 
> The patch does qsort() every time when the commit message has
> XACT_XINFO_HAS_INVALS. IIUC the xids we need to remember is the only
> xids that are recorded in the first replayed XLOG_RUNNING_XACTS, right? If so,
> we need to do qsort() once, can remove xid from the array once it gets
> committed, and then can eventually make last_running empty so that we can
> skip even TransactionIdInArray().
> 
> ---
> Since last_running is allocated by malloc() and it isn't freed even after finishing
> logical decoding.
> 
> 
> Another idea to fix this problem would be that before calling
> SnapBuildCommitTxn() we create transaction entries in ReorderBuffer for
> (sub)transactions whose COMMIT record has XACT_XINFO_HAS_INVALS,
> and then mark all of them as catalog-changed by calling
> ReorderBufferXidSetCatalogChanges(). I've attached a PoC patch for this idea.
> What the patch does is essentially the same as what the proposed patch does.
> But the patch doesn't modify the SnapBuildCommitTxn(). And we remember
> the list of last running transactions in reorder buffer and the list is periodically
> purged during decoding RUNNING_XACTS records, eventually making it
> empty.
Thanks for the patch.

I conducted a quick check of the PoC.

check-world passed with your patch applied on HEAD.
Also, the original scenario described in [1] looks fine with your revised
patch, after increasing LOG_SNAPSHOT_INTERVAL_MS in the procedure.

The output of the last command in the provided steps is shown below.

postgres=# select * from pg_logical_slot_get_changes('bdt_slot', null, null);
    lsn    | xid |                  data                  
-----------+-----+----------------------------------------
 0/1560020 | 710 | BEGIN 710
 0/1560020 | 710 | table public.bdt: INSERT: a[integer]:1
 0/1560140 | 710 | COMMIT 710


A couple of minor comments on the DecodeStandbyOp() changes that I noticed:

(1) A minor suggestion about your comment.


+                                * has done catalog changes without these records, we miss to add
+                                * the xid to the snapshot so up creating the wrong snapshot. To

"miss to add" would be miss adding or fail to add.
And "up creating" is natural in this sentence ?

(2) There is a full-width space between "it'" and "s" in the next sentence.

+                                * mark an xid that actually has not done that but it’s not a



[1] - https://www.postgresql.org/message-id/81D0D8B0-E7C4-4999-B616-1E5004DBDCD2%40amazon.com

Best Regards,
    Takamichi Osumi



At Thu, 7 Oct 2021 13:20:14 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in 
> Another idea to fix this problem would be that before calling
> SnapBuildCommitTxn() we create transaction entries in ReorderBuffer
> for (sub)transactions whose COMMIT record has XACT_XINFO_HAS_INVALS,
> and then mark all of them as catalog-changed by calling
> ReorderBufferXidSetCatalogChanges(). I've attached a PoC patch for
> this idea. What the patch does is essentially the same as what the
> proposed patch does. But the patch doesn't modify the
> SnapBuildCommitTxn(). And we remember the list of last running
> transactions in reorder buffer and the list is periodically purged
> during decoding RUNNING_XACTS records, eventually making it empty.

I came up with a third way.  SnapBuildCommitTxn already properly
handles the case of a ReorderBufferTXN with
RBTXN_HAS_CATALOG_CHANGES set.  So this issue can be resolved by creating
such ReorderBufferTXNs in SnapBuildProcessRunningXacts.

One problem with this is that the change creates a case where multiple
ReorderBufferTXNs share the same first_lsn.  I haven't come up with a
clean idea to avoid relaxing the restriction in AssertTXNLsnOrder.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/logical/reorderbuffer.c b/src/backend/replication/logical/reorderbuffer.c
index 46e66608cf..503116764f 100644
--- a/src/backend/replication/logical/reorderbuffer.c
+++ b/src/backend/replication/logical/reorderbuffer.c
@@ -887,9 +887,14 @@ AssertTXNLsnOrder(ReorderBuffer *rb)
         if (cur_txn->end_lsn != InvalidXLogRecPtr)
             Assert(cur_txn->first_lsn <= cur_txn->end_lsn);
 
-        /* Current initial LSN must be strictly higher than previous */
+        /*
+         * Current initial LSN must be strictly higher than previous. except
+         * this transaction is created by XLOG_RUNNING_XACTS.  If one
+         * XLOG_RUNNING_XACTS creates multiple transactions, they share the
+         * same LSN. See SnapBuildProcessRunningXacts.
+         */
         if (prev_first_lsn != InvalidXLogRecPtr)
-            Assert(prev_first_lsn < cur_txn->first_lsn);
+            Assert(prev_first_lsn <= cur_txn->first_lsn);
 
         /* known-as-subtxn txns must not be listed */
         Assert(!rbtxn_is_known_subxact(cur_txn));
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index a5333349a8..58859112dc 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1097,6 +1097,20 @@ SnapBuildProcessRunningXacts(SnapBuild *builder, XLogRecPtr lsn, xl_running_xact
      */
     if (builder->state < SNAPBUILD_CONSISTENT)
     {
+        /*
+         * At the time we passed the first XLOG_RUNNING_XACTS record, the
+         * transactions notified by the record may have updated
+         * catalogs. Register the transactions with marking them as having
+         * caused catalog changes.  The worst misbehavior here is some spurious
+         * invalidation at decoding start.
+         */
+        if (builder->state == SNAPBUILD_START)
+        {
+            for (int i = 0 ; i < running->xcnt + running->subxcnt ; i++)
+                ReorderBufferXidSetCatalogChanges(builder->reorder,
+                                                  running->xids[i], lsn);
+        }
+
         /* returns false if there's no point in performing cleanup just yet */
         if (!SnapBuildFindSnapshot(builder, lsn, running))
             return;


On Fri, Oct 8, 2021 at 4:50 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Thu, 7 Oct 2021 13:20:14 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
> > Another idea to fix this problem would be that before calling
> > SnapBuildCommitTxn() we create transaction entries in ReorderBuffer
> > for (sub)transactions whose COMMIT record has XACT_XINFO_HAS_INVALS,
> > and then mark all of them as catalog-changed by calling
> > ReorderBufferXidSetCatalogChanges(). I've attached a PoC patch for
> > this idea. What the patch does is essentially the same as what the
> > proposed patch does. But the patch doesn't modify the
> > SnapBuildCommitTxn(). And we remember the list of last running
> > transactions in reorder buffer and the list is periodically purged
> > during decoding RUNNING_XACTS records, eventually making it empty.
>
> I came up with the third way.  SnapBuildCommitTxn already properly
> handles the case where a ReorderBufferTXN with
> RBTXN_HAS_CATALOG_CHANGES.  So this issue can be resolved by create
> such ReorderBufferTXNs in SnapBuildProcessRunningXacts.

Thank you for the idea and patch!

It's much simpler than mine. I think that creating an entry for a
catalog-changed transaction in the reorder buffer before
SnapBuildCommitTxn() is the right direction.

After more thought, given that DDL is less likely to happen than DML in
practice, probably we can always mark both the top transaction and its
subtransactions as containing catalog changes if the commit record has
XACT_XINFO_HAS_INVALS? I believe this is not likely to lead to
overhead in practice. That way, the patch could be simpler and
doesn't need the change to AssertTXNLsnOrder().

I've attached another PoC patch. Also, I've added the tests for this
issue in test_decoding.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Hi,

On 10/11/21 8:27 AM, Masahiko Sawada wrote:
> On Fri, Oct 8, 2021 at 4:50 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
>> At Thu, 7 Oct 2021 13:20:14 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
>>> Another idea to fix this problem would be that before calling
>>> SnapBuildCommitTxn() we create transaction entries in ReorderBuffer
>>> for (sub)transactions whose COMMIT record has XACT_XINFO_HAS_INVALS,
>>> and then mark all of them as catalog-changed by calling
>>> ReorderBufferXidSetCatalogChanges(). I've attached a PoC patch for
>>> this idea. What the patch does is essentially the same as what the
>>> proposed patch does. But the patch doesn't modify the
>>> SnapBuildCommitTxn(). And we remember the list of last running
>>> transactions in reorder buffer and the list is periodically purged
>>> during decoding RUNNING_XACTS records, eventually making it empty.
>> I came up with the third way.  SnapBuildCommitTxn already properly
>> handles the case where a ReorderBufferTXN with
>> RBTXN_HAS_CATALOG_CHANGES.  So this issue can be resolved by create
>> such ReorderBufferTXNs in SnapBuildProcessRunningXacts.
> Thank you for the idea and patch!

Thank you both for your new patch proposals!

I liked Sawada's one, but I do "prefer" Horiguchi's one.

>
> It's much simpler than mine. I think that creating an entry of a
> catalog-changed transaction in the reorder buffer before
> SunapBuildCommitTxn() is the right direction.
+1
>
> After more thought, given DDLs are not likely to happen than DML in
> practice, probably we can always mark both the top transaction and its
> subtransactions as containing catalog changes if the commit record has
> XACT_XINFO_HAS_INVALS? I believe this is not likely to lead to
> overhead in practice. That way, the patch could be more simple and
> doesn't need the change of AssertTXNLsnOrder().
>
> I've attached another PoC patch. Also, I've added the tests for this
> issue in test_decoding.

Thanks!

It looks good to me; I just have a remark about the comment:

+   /*
+    * Mark the top transaction and its subtransactions as containing catalog
+    * changes, if the commit record has invalidation message. This is
+    * necessary for the case where we decode only the commit record of the
+    * transaction that actually has done catalog changes.
+    */

What about?

     /*
      * Mark the top transaction and its subtransactions as containing catalog
      * changes, if the commit record has invalidation message. This is
      * necessary for the case where we did not decode the transaction that
      * did the catalog change(s) (the decoding restarted after). So that we
      * are decoding only the commit record of the transaction that actually
      * has done catalog changes.
      */

Thanks

Bertrand




On 10/10/21 23:27, Masahiko Sawada wrote:
> 
> After more thought, given DDLs are not likely to happen than DML in
> practice, ...

I haven't looked closely at the patch, but I'd be careful about
workloads where people create and drop "temporary tables". I've seen
this pattern used a few times, especially by developers who came from a
SQL server background, for some reason.

I certainly don't think we need to optimize for this workload - which is
not a best practice on PostgreSQL. I'd just want to be careful not to
make PostgreSQL logical replication crumble underneath it, if PG was
previously keeping up, with difficulty. That would be a sad upgrade
experience!

-Jeremy

-- 
http://about.me/jeremy_schneider



On Monday, October 11, 2021 3:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> On Fri, Oct 8, 2021 at 4:50 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> wrote:
> >
> > At Thu, 7 Oct 2021 13:20:14 +0900, Masahiko Sawada
> > <sawada.mshk@gmail.com> wrote in
> > > Another idea to fix this problem would be that before calling
> > > SnapBuildCommitTxn() we create transaction entries in ReorderBuffer
> > > for (sub)transactions whose COMMIT record has
> XACT_XINFO_HAS_INVALS,
> > > and then mark all of them as catalog-changed by calling
> > > ReorderBufferXidSetCatalogChanges(). I've attached a PoC patch for
> > > this idea. What the patch does is essentially the same as what the
> > > proposed patch does. But the patch doesn't modify the
> > > SnapBuildCommitTxn(). And we remember the list of last running
> > > transactions in reorder buffer and the list is periodically purged
> > > during decoding RUNNING_XACTS records, eventually making it empty.
> >
> > I came up with the third way.  SnapBuildCommitTxn already properly
> > handles the case where a ReorderBufferTXN with
> > RBTXN_HAS_CATALOG_CHANGES.  So this issue can be resolved by
> create
> > such ReorderBufferTXNs in SnapBuildProcessRunningXacts.
> 
> Thank you for the idea and patch!
> 
> It's much simpler than mine. I think that creating an entry of a catalog-changed
> transaction in the reorder buffer before
> SunapBuildCommitTxn() is the right direction.
> 
> After more thought, given DDLs are not likely to happen than DML in practice,
> probably we can always mark both the top transaction and its subtransactions
> as containing catalog changes if the commit record has
> XACT_XINFO_HAS_INVALS? I believe this is not likely to lead to overhead in
> practice. That way, the patch could be more simple and doesn't need the
> change of AssertTXNLsnOrder().
> 
> I've attached another PoC patch. Also, I've added the tests for this issue in
> test_decoding.
I also felt that your patch addresses the problem in a good way.
Even without the xid being set via NEW_CID decoding, as in the original
scenario, we can set the catalog-change flag.

One really minor comment: in DecodeCommit(), you don't need to declare i;
it's already defined at the top of the function.

+               for (int i = 0; i < parsed->nsubxacts; i++)


Best Regards,
    Takamichi Osumi


On Wed, Oct 13, 2021 at 7:55 AM Jeremy Schneider
<schneider@ardentperf.com> wrote:
>
> On 10/10/21 23:27, Masahiko Sawada wrote:
> >
> > After more thought, given DDLs are not likely to happen than DML in
> > practice, ...
>
> I haven't looked closely at the patch, but I'd be careful about
> workloads where people create and drop "temporary tables". I've seen
> this pattern used a few times, especially by developers who came from a
> SQL server background, for some reason.

True. But since the snapshot builder is designed on the same
assumption, it would not be problematic. It keeps track of the
committed catalog-modifying transactions instead of keeping track of
all running transactions. See the header comment of snapbuild.c.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



At Mon, 11 Oct 2021 15:27:41 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in 
> .
> 
> On Fri, Oct 8, 2021 at 4:50 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > I came up with the third way.  SnapBuildCommitTxn already properly
> > handles the case where a ReorderBufferTXN with
> > RBTXN_HAS_CATALOG_CHANGES.  So this issue can be resolved by create
> > such ReorderBufferTXNs in SnapBuildProcessRunningXacts.
> 
> Thank you for the idea and patch!
> 
> It's much simpler than mine. I think that creating an entry of a
> catalog-changed transaction in the reorder buffer before
> SunapBuildCommitTxn() is the right direction.
> 
> After more thought, given DDLs are not likely to happen than DML in
> practice, probably we can always mark both the top transaction and its
> subtransactions as containing catalog changes if the commit record has
> XACT_XINFO_HAS_INVALS? I believe this is not likely to lead to
> overhead in practice. That way, the patch could be more simple and
> doesn't need the change of AssertTXNLsnOrder().
> 
> I've attached another PoC patch. Also, I've added the tests for this
> issue in test_decoding.

Thanks for the test script. (I did that with the TAP framework but the
isolation tester version is simpler.)

It adds a call to ReorderBufferAssignChild, but usually subtransactions
are assigned to the top level elsewhere.  In addition to that,
ReorderBufferCommitChild(), called just after, does the same thing.  We
are adding a third call to the same function, which looks a bit odd.

And I'm not sure it is wise to always mark all subtransactions as
"catalog changed" when the top transaction has XACT_XINFO_HAS_INVALS.
The reason I did that in the snapshot building phase is to avoid adding
to DecodeCommit extra code that is needed only while a transaction that
has been running since before the replication start is still alive.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Thursday, October 14, 2021 11:21 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> At Mon, 11 Oct 2021 15:27:41 +0900, Masahiko Sawada
> <sawada.mshk@gmail.com> wrote in
> >
> > On Fri, Oct 8, 2021 at 4:50 PM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> > > I came up with the third way.  SnapBuildCommitTxn already properly
> > > handles the case where a ReorderBufferTXN with
> > > RBTXN_HAS_CATALOG_CHANGES.  So this issue can be resolved by
> create
> > > such ReorderBufferTXNs in SnapBuildProcessRunningXacts.
> >
> > Thank you for the idea and patch!
> >
> > It's much simpler than mine. I think that creating an entry of a
> > catalog-changed transaction in the reorder buffer before
> > SunapBuildCommitTxn() is the right direction.
> >
> > After more thought, given DDLs are not likely to happen than DML in
> > practice, probably we can always mark both the top transaction and its
> > subtransactions as containing catalog changes if the commit record has
> > XACT_XINFO_HAS_INVALS? I believe this is not likely to lead to
> > overhead in practice. That way, the patch could be more simple and
> > doesn't need the change of AssertTXNLsnOrder().
> >
> > I've attached another PoC patch. Also, I've added the tests for this
> > issue in test_decoding.
>
> Thanks for the test script. (I did that with TAP framework but isolation tester
> version is simpler.)
>
> It adds a call to ReorderBufferAssignChild but usually subtransactions are
> assigned to top level elsewherae.  Addition to that
> ReorderBufferCommitChild() called just later does the same thing.  We are
> adding the third call to the same function, which looks a bit odd.
It can look odd. However, we have a check at the top of
ReorderBufferAssignChild to judge whether the subtransaction is already
associated, and we skip the processing if it is.

> And I'm not sure it is wise to mark all subtransactions as "catalog changed"
> always when the top transaction is XACT_XINFO_HAS_INVALS.
In order to avoid this, can't we have a new flag (for example, in the
ReorderBuffer struct) to check whether we started decoding from
RUNNING_XACTS, similar to the first patch in [1], and use it in
DecodeCommit? This still adds some extra code specific to DecodeCommit,
and this solution becomes a bit similar to the other previous patches,
though.


[1] - https://www.postgresql.org/message-id/81D0D8B0-E7C4-4999-B616-1E5004DBDCD2%40amazon.com


Best Regards,
    Takamichi Osumi




At Tue, 19 Oct 2021 02:45:24 +0000, "osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com> wrote in 
> On Thursday, October 14, 2021 11:21 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> > It adds a call to ReorderBufferAssignChild but usually subtransactions are
> > assigned to top level elsewherae.  Addition to that
> > ReorderBufferCommitChild() called just later does the same thing.  We are
> > adding the third call to the same function, which looks a bit odd.
> It can be odd. However, we
> have a check at the top of ReorderBufferAssignChild
> to judge if the sub transaction is already associated or not
> and skip the processings if it is.

My question was why we need to make the extra call to
ReorderBufferCommitChild when XACT_XINFO_HAS_INVALS is set, in spite of
the existing call to the same function that is made unconditionally.  It
doesn't cost much, but it's also not free.

> > And I'm not sure it is wise to mark all subtransactions as "catalog changed"
> > always when the top transaction is XACT_XINFO_HAS_INVALS.
> In order to avoid this,
> can't we have a new flag (for example, in reorderbuffer struct) to check
> if we start decoding from RUNNING_XACTS, which is similar to the first patch of [1]
> and use it at DecodeCommit ? This still leads to some extra specific codes added
> to DecodeCommit and this solution becomes a bit similar to other previous patches though.

If it is somehow wrong in any sense to add subtransactions in
SnapBuildProcessRunningXacts (for example, if we should avoid relaxing
the assertion condition), I think we would go another way.  Otherwise
we don't even need that additional flag.  (But Sawada-san's recent PoC
also needs that relaxation.)

AFAICS, and unless I'm missing something (the odds of which are
relatively high :p), we need the specially added subtransactions only
for the transactions that were running when passing the first
RUNNING_XACTS, because otherwise (substantial) subtransactions are
assigned to the toplevel by the first record of the subtransaction.

Before reaching consistency, DecodeCommit feeds the subtransactions to
ReorderBufferForget individually, so the subtransactions do not need
to be assigned to the top transaction at all. Since the
subtransactions added by the first RUNNING_XACTS are processed that
way, we don't need to call ReorderBufferCommitChild for such
subtransactions in the first place.
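
For reference, the pre-consistency skip path in DecodeCommit looks
roughly like this (paraphrased, not an exact excerpt):

    if (SnapBuildXactNeedsSkip(ctx->snapshot_builder, buf->origptr) ||
        (parsed->dbId != InvalidOid &&
         parsed->dbId != ctx->slot->data.database) ||
        ctx->fast_forward || FilterByOrigin(ctx, origin_id))
    {
        /* each subtransaction is forgotten individually ... */
        for (int i = 0; i < parsed->nsubxacts; i++)
            ReorderBufferForget(ctx->reorder, parsed->subxacts[i],
                                buf->origptr);

        /* ... and so is the top-level transaction */
        ReorderBufferForget(ctx->reorder, xid, buf->origptr);

        return;
    }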

> [1] - https://www.postgresql.org/message-id/81D0D8B0-E7C4-4999-B616-1E5004DBDCD2%40amazon.com

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Hi,

On 10/19/21 8:43 AM, Kyotaro Horiguchi wrote:
> At Tue, 19 Oct 2021 02:45:24 +0000, "osumi.takamichi@fujitsu.com" <osumi.takamichi@fujitsu.com> wrote in
>> On Thursday, October 14, 2021 11:21 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>>> It adds a call to ReorderBufferAssignChild but usually subtransactions are
>>> assigned to top level elsewherae.  Addition to that
>>> ReorderBufferCommitChild() called just later does the same thing.  We are
>>> adding the third call to the same function, which looks a bit odd.
>> It can be odd. However, we
>> have a check at the top of ReorderBufferAssignChild
>> to judge if the sub transaction is already associated or not
>> and skip the processings if it is.
> My question was why do we need to make the extra call to
> ReorerBufferCommitChild when XACT_XINFO_HAS_INVALS in spite of the
> existing call to the same fuction that unconditionally made.  It
> doesn't cost so much but also it's not free.
>
>>> And I'm not sure it is wise to mark all subtransactions as "catalog changed"
>>> always when the top transaction is XACT_XINFO_HAS_INVALS.
>> In order to avoid this,
>> can't we have a new flag (for example, in reorderbuffer struct) to check
>> if we start decoding from RUNNING_XACTS, which is similar to the first patch of [1]
>> and use it at DecodeCommit ? This still leads to some extra specific codes added
>> to DecodeCommit and this solution becomes a bit similar to other previous patches though.
> If it is somehow wrong in any sense that we add subtransactions in
> SnapBuildProcessRunningXacts (for example, we should avoid relaxing
> the assertion condition.), I think we would go another way.  Otherwise
> we don't even need that additional flag.  (But Sawadasan's recent PoC
> also needs that relaxation.)
>
> ASAICS, and unless I'm missing something (that odds are rtlatively
> high:p), we need the specially added subransactions only for the
> transactions that were running at passing the first RUNNING_XACTS,
> becuase otherwise (substantial) subtransactions are assigned to
> toplevel by the first record of the subtransaction.
>
> Before reaching consistency, DecodeCommit feeds the subtransactions to
> ReorderBufferForget individually so the subtransactions are not needed
> to be assigned to the top transaction at all. Since the
> subtransactions added by the first RUNNING_XACT are processed that
> way, we don't need in the first place to call ReorderBufferCommitChild
> for such subtransactions.
>
>> [1] - https://www.postgresql.org/message-id/81D0D8B0-E7C4-4999-B616-1E5004DBDCD2%40amazon.com
> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
Just rebased (minor change in the contrib/test_decoding/Makefile) the 
last POC version linked to the CF entry as it was failing the CF bot.

Thanks

Bertrand

On Mon, Oct 11, 2021 at 11:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> It's much simpler than mine. I think that creating an entry of a
> catalog-changed transaction in the reorder buffer before
> SunapBuildCommitTxn() is the right direction.
>
> After more thought, given DDLs are not likely to happen than DML in
> practice, probably we can always mark both the top transaction and its
> subtransactions as containing catalog changes if the commit record has
> XACT_XINFO_HAS_INVALS?
>

I have some observations and thoughts on this work.

1.
+# For the transaction that TRUNCATEd the table tbl1, the last decoding decodes
+# only its COMMIT record, because it starts from the RUNNING_XACT
record emitted
+# during the second checkpoint execution.  This transaction must be marked as
+# containing catalog changes during decoding the COMMIT record and the decoding
+# of the INSERT record must read the pg_class with the correct
historic snapshot.
+permutation "s0_init" "s0_begin" "s0_savepoint" "s0_truncate"
"s1_checkpoint" "s1_get_changes" "s0_commit" "s0_begin" "s0_insert"
"s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"

In the first line of the comment, do you want to say "... record emitted
during the first checkpoint"? Because only then can it start from the
commit record of the transaction that performed the truncate.

2.
+ /*
+ * Mark the top transaction and its subtransactions as containing catalog
+ * changes, if the commit record has invalidation message.  This is necessary
+ * for the case where we decode only the commit record of the transaction
+ * that actually has done catalog changes.
+ */
+ if (parsed->xinfo & XACT_XINFO_HAS_INVALS)
+ {
+ ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
+
+ for (int i = 0; i < parsed->nsubxacts; i++)
+ {
+ ReorderBufferAssignChild(ctx->reorder, xid, parsed->subxacts[i],
+ buf->origptr);
+ ReorderBufferXidSetCatalogChanges(ctx->reorder, parsed->subxacts[i],
+   buf->origptr);
+ }
+ }
+
  SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
     parsed->nsubxacts, parsed->subxacts);

Marking it before SnapBuildCommitTxn has one disadvantage: we
sometimes do this work even if the snapshot state is SNAPBUILD_START
or SNAPBUILD_BUILDING_SNAPSHOT, in which case SnapBuildCommitTxn
wouldn't do anything. Now, whereas this will fix the issue, it
seems we need to do this work even when we would have already marked
the txn as having catalog changes, and there are probably cases where we
mark them when it is not required, as discussed in this thread.

I think if we don't have any better ideas then we should go with
either this or one of the other proposals in this thread. The other
idea that occurred to me is whether we can somehow update the snapshot
we have serialized on disk with this information. On each
running_xact record, when we serialize the snapshot, we also try to
purge the committed xacts (via SnapBuildPurgeCommittedTxn). So, during
that, we can check whether there are committed xacts to be purged and
whether we have previously serialized the snapshot for the prior running
xact record; if so, we can update it with the list of xacts that have
catalog changes. If this is feasible then I think we need to somehow
remember the point where we last serialized the snapshot (maybe by
using builder->last_serialized_snapshot). Even if this is feasible, we
may not be able to do this in back-branches because of the disk-format
change required for this.

Thoughts?

-- 
With Regards,
Amit Kapila.



At Sat, 21 May 2022 15:35:58 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in 
> I think if we don't have any better ideas then we should go with
> either this or one of the other proposals in this thread. The other
> idea that occurred to me is whether we can somehow update the snapshot
> we have serialized on disk about this information. On each
> running_xact record when we serialize the snapshot, we also try to
> purge the committed xacts (via SnapBuildPurgeCommittedTxn). So, during
> that we can check if there are committed xacts to be purged and if we
> have previously serialized the snapshot for the prior running xact
> record, if so, we can update it with the list of xacts that have
> catalog changes. If this is feasible then I think we need to somehow
> remember the point where we last serialized the snapshot (maybe by
> using builder->last_serialized_snapshot). Even, if this is feasible we
> may not be able to do this in back-branches because of the disk-format
> change required for this.
> 
> Thoughts?

I didn't look at it closely, but it seems to work. I'm not sure how much
spurious invalidations at replication start impact performance,
but it is promising if the impact is significant.  That being said, I'm
a bit negative about doing that at the post-beta1 stage.

I thought for a moment that RUNNING_XACT might be able to contain
invalidation information, but it seems too complex for something that
happens with such frequency.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Mon, May 23, 2022 at 10:03 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Sat, 21 May 2022 15:35:58 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> > I think if we don't have any better ideas then we should go with
> > either this or one of the other proposals in this thread. The other
> > idea that occurred to me is whether we can somehow update the snapshot
> > we have serialized on disk about this information. On each
> > running_xact record when we serialize the snapshot, we also try to
> > purge the committed xacts (via SnapBuildPurgeCommittedTxn). So, during
> > that we can check if there are committed xacts to be purged and if we
> > have previously serialized the snapshot for the prior running xact
> > record, if so, we can update it with the list of xacts that have
> > catalog changes. If this is feasible then I think we need to somehow
> > remember the point where we last serialized the snapshot (maybe by
> > using builder->last_serialized_snapshot). Even, if this is feasible we
> > may not be able to do this in back-branches because of the disk-format
> > change required for this.
> >
> > Thoughts?
>
> I didn't look it closer, but it seems to work. I'm not sure how much
> spurious invalidations at replication start impacts on performance,
> but it is promising if the impact is significant.
>

It seems Sawada-San's patch is doing it at each commit, not at the start
of replication, and I think that is required because we need this each
time for a replication restart. So, I feel this will be an ongoing
overhead for spurious cases with the current approach.

>  That being said I'm
> a bit negative for doing that in post-beta1 stage.
>

Fair point. We can do it early in PG-16 if the approach is
feasible, and backpatch something along the lines of what Sawada-San or
you proposed.

-- 
With Regards,
Amit Kapila.



On Mon, May 23, 2022 at 2:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, May 23, 2022 at 10:03 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> >
> > At Sat, 21 May 2022 15:35:58 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> > > I think if we don't have any better ideas then we should go with
> > > either this or one of the other proposals in this thread. The other
> > > idea that occurred to me is whether we can somehow update the snapshot
> > > we have serialized on disk about this information. On each
> > > running_xact record when we serialize the snapshot, we also try to
> > > purge the committed xacts (via SnapBuildPurgeCommittedTxn). So, during
> > > that we can check if there are committed xacts to be purged and if we
> > > have previously serialized the snapshot for the prior running xact
> > > record, if so, we can update it with the list of xacts that have
> > > catalog changes. If this is feasible then I think we need to somehow
> > > remember the point where we last serialized the snapshot (maybe by
> > > using builder->last_serialized_snapshot). Even, if this is feasible we
> > > may not be able to do this in back-branches because of the disk-format
> > > change required for this.
> > >
> > > Thoughts?

It seems to work, could you draft the patch?

> >
> > I didn't look it closer, but it seems to work. I'm not sure how much
> > spurious invalidations at replication start impacts on performance,
> > but it is promising if the impact is significant.
> >
>
> It seems Sawada-San's patch is doing at each commit not at the start
> of replication and I think that is required because we need this each
> time for replication restart. So, I feel this will be an ongoing
> overhead for spurious cases with the current approach.
>
> >  That being said I'm
> > a bit negative for doing that in post-beta1 stage.
> >
>
> Fair point. We can use the do it early in PG-16 if the approach is
> feasible, and backpatch something on lines of what Sawada-San or you
> proposed.

+1.

I proposed two approaches, [1] and [2], and I prefer [1].
Horiguchi-san's idea [3] also looks good, but I think it's better to
somehow deal with the problem he mentioned:

> One problem with this is that change creates the case where multiple
> ReorderBufferTXNs share the same first_lsn.  I haven't come up with a
> clean idea to avoid relaxing the restriction of AssertTXNLsnOrder..

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoAn-k6OpZ6HSAH_G91tpTXR6KYvkf663kg6EqW-f6sz1w%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAD21AoD00wV4gt-53ze%2BZB8n4bqJrdH8J_UnDHddy8S2A%2Ba25g%40mail.gmail.com
[3] https://www.postgresql.org/message-id/20211008.165055.1621145185927268721.horikyota.ntt%40gmail.com

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Tue, May 24, 2022 at 7:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, May 23, 2022 at 2:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, May 23, 2022 at 10:03 AM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> > >
> > > At Sat, 21 May 2022 15:35:58 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> > > > I think if we don't have any better ideas then we should go with
> > > > either this or one of the other proposals in this thread. The other
> > > > idea that occurred to me is whether we can somehow update the snapshot
> > > > we have serialized on disk about this information. On each
> > > > running_xact record when we serialize the snapshot, we also try to
> > > > purge the committed xacts (via SnapBuildPurgeCommittedTxn). So, during
> > > > that we can check if there are committed xacts to be purged and if we
> > > > have previously serialized the snapshot for the prior running xact
> > > > record, if so, we can update it with the list of xacts that have
> > > > catalog changes. If this is feasible then I think we need to somehow
> > > > remember the point where we last serialized the snapshot (maybe by
> > > > using builder->last_serialized_snapshot). Even, if this is feasible we
> > > > may not be able to do this in back-branches because of the disk-format
> > > > change required for this.
> > > >
> > > > Thoughts?
>
> It seems to work, could you draft the patch?
>

I can help with the review and discussion.

-- 
With Regards,
Amit Kapila.



On Tue, May 24, 2022 at 2:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, May 24, 2022 at 7:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, May 23, 2022 at 2:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, May 23, 2022 at 10:03 AM Kyotaro Horiguchi
> > > <horikyota.ntt@gmail.com> wrote:
> > > >
> > > > At Sat, 21 May 2022 15:35:58 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> > > > > I think if we don't have any better ideas then we should go with
> > > > > either this or one of the other proposals in this thread. The other
> > > > > idea that occurred to me is whether we can somehow update the snapshot
> > > > > we have serialized on disk about this information. On each
> > > > > running_xact record when we serialize the snapshot, we also try to
> > > > > purge the committed xacts (via SnapBuildPurgeCommittedTxn). So, during
> > > > > that we can check if there are committed xacts to be purged and if we
> > > > > have previously serialized the snapshot for the prior running xact
> > > > > record, if so, we can update it with the list of xacts that have
> > > > > catalog changes. If this is feasible then I think we need to somehow
> > > > > remember the point where we last serialized the snapshot (maybe by
> > > > > using builder->last_serialized_snapshot). Even, if this is feasible we
> > > > > may not be able to do this in back-branches because of the disk-format
> > > > > change required for this.
> > > > >
> > > > > Thoughts?
> >
> > It seems to work, could you draft the patch?
> >
>
> I can help with the review and discussion.

Okay, I'll draft the patch for this idea.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Wed, May 25, 2022 at 12:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, May 24, 2022 at 2:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, May 24, 2022 at 7:58 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Mon, May 23, 2022 at 2:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Mon, May 23, 2022 at 10:03 AM Kyotaro Horiguchi
> > > > <horikyota.ntt@gmail.com> wrote:
> > > > >
> > > > > At Sat, 21 May 2022 15:35:58 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> > > > > > I think if we don't have any better ideas then we should go with
> > > > > > either this or one of the other proposals in this thread. The other
> > > > > > idea that occurred to me is whether we can somehow update the snapshot
> > > > > > we have serialized on disk about this information. On each
> > > > > > running_xact record when we serialize the snapshot, we also try to
> > > > > > purge the committed xacts (via SnapBuildPurgeCommittedTxn). So, during
> > > > > > that we can check if there are committed xacts to be purged and if we
> > > > > > have previously serialized the snapshot for the prior running xact
> > > > > > record, if so, we can update it with the list of xacts that have
> > > > > > catalog changes. If this is feasible then I think we need to somehow
> > > > > > remember the point where we last serialized the snapshot (maybe by
> > > > > > using builder->last_serialized_snapshot). Even, if this is feasible we
> > > > > > may not be able to do this in back-branches because of the disk-format
> > > > > > change required for this.
> > > > > >
> > > > > > Thoughts?
> > >
> > > It seems to work, could you draft the patch?
> > >
> >
> > I can help with the review and discussion.
>
> Okay, I'll draft the patch for this idea.

I've attached three POC patches:

poc_remember_last_running_xacts_v2.patch is a rebased patch of my
previous proposal [1]. This is based on the original proposal: we
remember the last-running-xacts list from the first decoded
RUNNING_XACTS record and, when decoding a commit record that has
XACT_XINFO_HAS_INVALS, check whether its xid is in the list. This
doesn't require any file format changes, but the transaction will end up
being added to the snapshot even if it has only relcache invalidations.

poc_add_running_catchanges_xacts_to_serialized_snapshot.patch is a
patch for the idea Amit Kapila proposed, with some changes. The basic
approach is to remember the list of xids that changed catalogs and
were running when serializing the snapshot. The list of xids is kept
in the snapshot builder (SnapBuild) and is serialized to and restored
from the serialized snapshot. When decoding a commit record, we check
whether the transaction is already marked as containing catalog changes
or whether its xid is in the list. If so, we add it to the snapshot.
Unlike the first patch, this one properly adds only transactions that
have changed catalogs, but as Amit mentioned before, this idea cannot
be back-patched as it changes the on-disk format of the serialized
snapshot.
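
The commit-time check in the second patch is conceptually something like
the following (a hypothetical sketch with made-up field names, not the
patch itself; it assumes the restored xid array is kept sorted):

static bool
SketchXidHasCatalogChanges(SnapBuild *builder, ReorderBuffer *rb,
                           TransactionId xid)
{
    /* marked during decoding (e.g. via NEW_CID records)? */
    if (ReorderBufferXidHasCatalogChanges(rb, xid))
        return true;

    /*
     * Otherwise, look it up in the list restored from the serialized
     * snapshot (catchange_xip/catchange_xcnt are assumed names).
     */
    return bsearch(&xid, builder->catchange_xip, builder->catchange_xcnt,
                   sizeof(TransactionId), xidComparator) != NULL;
}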

poc_add_regression_tests.patch adds regression tests for this bug. The
regression tests are required for both HEAD and back-patching but I've
separated this patch for testing the above two patches easily.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

On Mon, May 30, 2022 at 11:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, May 25, 2022 at 12:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
>
> poc_add_regression_tests.patch adds regression tests for this bug. The
> regression tests are required for both HEAD and back-patching but I've
> separated this patch for testing the above two patches easily.
>

Few comments on the test case patch:
===============================
1.
+# For the transaction that TRUNCATEd the table tbl1, the last decoding decodes
+# only its COMMIT record, because it starts from the RUNNING_XACT
record emitted
+# during the first checkpoint execution.  This transaction must be marked as
+# catalog-changes while decoding the COMMIT record and the decoding
of the INSERT
+# record must read the pg_class with the correct historic snapshot.
+permutation "s0_init" "s0_begin" "s0_savepoint" "s0_truncate"
"s1_checkpoint" "s1_get_changes" "s0_commit" "s0_begin" "s0_insert"
"s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"

Will this test always work? What if we get an additional running_xact
record between steps "s0_commit" and "s0_begin" that is logged via
bgwriter? You can mimic that by adding an additional checkpoint
between those two steps. If we do that, the test will pass even
without the patch because I think the last decoding will start
decoding from this new running_xact record.

2.
+step "s1_get_changes" { SELECT data FROM
pg_logical_slot_get_changes('isolation_slot', NULL, NULL,
'include-xids', '0'); }

It is better to skip empty transactions by using 'skip-empty-xacts' to
avoid any transaction from a background process like autovacuum. We
have previously seen some buildfarm failures due to that.

3. Did you intentionally omit the .out from the test case patch?

4.
This transaction must be marked as
+# catalog-changes while decoding the COMMIT record and the decoding
of the INSERT
+# record must read the pg_class with the correct historic snapshot.

/marked as catalog-changes/marked as containing catalog changes

-- 
With Regards,
Amit Kapila.



On Tue, Jun 7, 2022 at 9:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, May 30, 2022 at 11:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, May 25, 2022 at 12:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> >
> > poc_add_regression_tests.patch adds regression tests for this bug. The
> > regression tests are required for both HEAD and back-patching but I've
> > separated this patch for testing the above two patches easily.
> >

Thank you for the comments.

>
> Few comments on the test case patch:
> ===============================
> 1.
> +# For the transaction that TRUNCATEd the table tbl1, the last decoding decodes
> +# only its COMMIT record, because it starts from the RUNNING_XACT
> record emitted
> +# during the first checkpoint execution.  This transaction must be marked as
> +# catalog-changes while decoding the COMMIT record and the decoding
> of the INSERT
> +# record must read the pg_class with the correct historic snapshot.
> +permutation "s0_init" "s0_begin" "s0_savepoint" "s0_truncate"
> "s1_checkpoint" "s1_get_changes" "s0_commit" "s0_begin" "s0_insert"
> "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
>
> Will this test always work? What if we get an additional running_xact
> record between steps "s0_commit" and "s0_begin" that is logged via
> bgwriter? You can mimic that by adding an additional checkpoint
> between those two steps. If we do that, the test will pass even
> without the patch because I think the last decoding will start
> decoding from this new running_xact record.

Right. Depending on the timing it could pass even without the fix, but
it won't spuriously fail because of the timing. I think we would need
to somehow stop bgwriter to make the test case fully deterministic, but
that seems unrealistic. Do you have any better ideas?

>
> 2.
> +step "s1_get_changes" { SELECT data FROM
> pg_logical_slot_get_changes('isolation_slot', NULL, NULL,
> 'include-xids', '0'); }
>
> It is better to skip empty transactions by using 'skip-empty-xacts' to
> avoid any transaction from a background process like autovacuum. We
> have previously seen some buildfarm failures due to that.

Agreed.
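
For example, the step could become something like the following (just a
sketch; the exact option list can be adjusted):

step "s1_get_changes" { SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0'); }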

>
> 3. Did you intentionally omit the .out from the test case patch?

No, I'll add .out file in the next version patch.

>
> 4.
> This transaction must be marked as
> +# catalog-changes while decoding the COMMIT record and the decoding
> of the INSERT
> +# record must read the pg_class with the correct historic snapshot.
>
> /marked as catalog-changes/marked as containing catalog changes

Agreed.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Mon, Jun 13, 2022 at 8:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Jun 7, 2022 at 9:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, May 30, 2022 at 11:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Wed, May 25, 2022 at 12:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > >
> > > poc_add_regression_tests.patch adds regression tests for this bug. The
> > > regression tests are required for both HEAD and back-patching but I've
> > > separated this patch for testing the above two patches easily.
> > >
>
> Thank you for the comments.
>
> >
> > Few comments on the test case patch:
> > ===============================
> > 1.
> > +# For the transaction that TRUNCATEd the table tbl1, the last decoding decodes
> > +# only its COMMIT record, because it starts from the RUNNING_XACT
> > record emitted
> > +# during the first checkpoint execution.  This transaction must be marked as
> > +# catalog-changes while decoding the COMMIT record and the decoding
> > of the INSERT
> > +# record must read the pg_class with the correct historic snapshot.
> > +permutation "s0_init" "s0_begin" "s0_savepoint" "s0_truncate"
> > "s1_checkpoint" "s1_get_changes" "s0_commit" "s0_begin" "s0_insert"
> > "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
> >
> > Will this test always work? What if we get an additional running_xact
> > record between steps "s0_commit" and "s0_begin" that is logged via
> > bgwriter? You can mimic that by adding an additional checkpoint
> > between those two steps. If we do that, the test will pass even
> > without the patch because I think the last decoding will start
> > decoding from this new running_xact record.
>
> Right. It could pass depending on the timing but doesn't fail
> depending on the timing. I think we need to somehow stop bgwriter to
> make the test case stable but it seems unrealistic.
>

Agreed. In my local testing for this case, I used to increase
LOG_SNAPSHOT_INTERVAL_MS to avoid such a situation, but I understand
that is not practical in the test.

> Do you have any
> better ideas?
>

No, I don't have any better ideas. I think it is better to add some
information related to this in the comments because it may help to
improve the test in the future if we come up with a better idea.

-- 
With Regards,
Amit Kapila.



On Tue, Jun 14, 2022 at 3:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jun 13, 2022 at 8:29 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Jun 7, 2022 at 9:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, May 30, 2022 at 11:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Wed, May 25, 2022 at 12:11 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > >
> > > > poc_add_regression_tests.patch adds regression tests for this bug. The
> > > > regression tests are required for both HEAD and back-patching but I've
> > > > separated this patch for testing the above two patches easily.
> > > >
> >
> > Thank you for the comments.
> >
> > >
> > > Few comments on the test case patch:
> > > ===============================
> > > 1.
> > > +# For the transaction that TRUNCATEd the table tbl1, the last decoding decodes
> > > +# only its COMMIT record, because it starts from the RUNNING_XACT
> > > record emitted
> > > +# during the first checkpoint execution.  This transaction must be marked as
> > > +# catalog-changes while decoding the COMMIT record and the decoding
> > > of the INSERT
> > > +# record must read the pg_class with the correct historic snapshot.
> > > +permutation "s0_init" "s0_begin" "s0_savepoint" "s0_truncate"
> > > "s1_checkpoint" "s1_get_changes" "s0_commit" "s0_begin" "s0_insert"
> > > "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
> > >
> > > Will this test always work? What if we get an additional running_xact
> > > record between steps "s0_commit" and "s0_begin" that is logged via
> > > bgwriter? You can mimic that by adding an additional checkpoint
> > > between those two steps. If we do that, the test will pass even
> > > without the patch because I think the last decoding will start
> > > decoding from this new running_xact record.
> >
> > Right. It could pass depending on the timing but doesn't fail
> > depending on the timing. I think we need to somehow stop bgwriter to
> > make the test case stable but it seems unrealistic.
> >
>
> Agreed, in my local testing for this case, I use to increase
> LOG_SNAPSHOT_INTERVAL_MS to avoid such a situation but I understand it
> is not practical via test.
>
> > Do you have any
> > better ideas?
> >
>
> No, I don't have any better ideas. I think it is better to add some
> information related to this in the comments because it may help to
> improve the test in the future if we come up with a better idea.

I also don't have any better ideas to make it stable, so agreed. I've
attached an updated version of the patch adding the regression tests.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment
On Mon, May 30, 2022 at 11:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I've attached three POC patches:
>

I think it will be a good idea if you can add a short commit message
at least to say which patch is proposed for HEAD and which one is for
back branches. Also, it would be good if you can add some description
of the fix in the commit message. Let's remove poc* from the patch
name.

Review poc_add_running_catchanges_xacts_to_serialized_snapshot
=====================================================
1.
+ /*
+ * Array of transactions that were running when the snapshot serialization
+ * and changed system catalogs,

The part of the sentence after serialization is not very clear.

2.
- if (ReorderBufferXidHasCatalogChanges(builder->reorder, subxid))
+ if (ReorderBufferXidHasCatalogChanges(builder->reorder, subxid) ||
+ bsearch(&xid, builder->catchanges.xip, builder->catchanges.xcnt,
+ sizeof(TransactionId), xidComparator) != NULL)

Why are you using xid instead of subxid in bsearch call? Can we add a
comment to say why it is okay to use xid if there is a valid reason?
But note, we are using subxid to add to the committed xact array so
not sure if this is a good idea but I might be missing something.

Suggestions for improvement in comments:
-       /*
-        * Update the transactions that are running and changes
catalogs that are
-        * not committed.
-        */
+       /* Update the catalog modifying transactions that are yet not
committed. */
        if (builder->catchanges.xip)
                pfree(builder->catchanges.xip);
        builder->catchanges.xip =
ReorderBufferGetCatalogChangesXacts(builder->reorder,
@@ -1647,7 +1644,7 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
        COMP_CRC32C(ondisk->checksum, ondisk_c, sz);
        ondisk_c += sz;

-       /* copy catalog-changes xacts */
+       /* copy catalog modifying xacts */
        sz = sizeof(TransactionId) * builder->catchanges.xcnt;
        memcpy(ondisk_c, builder->catchanges.xip, sz);
        COMP_CRC32C(ondisk->checksum, ondisk_c, sz);

-- 
With Regards,
Amit Kapila.



On Mon, Jul 4, 2022 at 6:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, May 30, 2022 at 11:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I've attached three POC patches:
> >
>
> I think it will be a good idea if you can add a short commit message
> at least to say which patch is proposed for HEAD and which one is for
> back branches. Also, it would be good if you can add some description
> of the fix in the commit message. Let's remove poc* from the patch
> name.
>
> Review poc_add_running_catchanges_xacts_to_serialized_snapshot
> =====================================================

Few more comments:
1.
+
+ /* This array must be sorted in xidComparator order */
+ TransactionId *xip;
+ } catchanges;
 };

This array contains the transaction ids for subtransactions as well. I
think it is better to mention that in the comments.

2. Are we anytime removing transaction ids from catchanges->xip array?
If not, is there a reason for the same? I think we can remove it
either at commit/abort or even immediately after adding the xid/subxid
to committed->xip array.

3.
+ if (readBytes != sz)
+ {
+ int save_errno = errno;
+
+ CloseTransientFile(fd);
+
+ if (readBytes < 0)
+ {
+ errno = save_errno;
+ ereport(ERROR,
+ (errcode_for_file_access(),
+ errmsg("could not read file \"%s\": %m", path)));
+ }
+ else
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("could not read file \"%s\": read %d of %zu",
+ path, readBytes, sz)));
+ }

This is the fourth instance of similar error handling code in
SnapBuildRestore(). Isn't it better to extract this into a separate
function?

4.
+TransactionId *
+ReorderBufferGetCatalogChangesXacts(ReorderBuffer *rb, size_t *xcnt_p)
+{
+ HASH_SEQ_STATUS hash_seq;
+ ReorderBufferTXNByIdEnt *ent;
+ TransactionId *xids;
+ size_t xcnt = 0;
+ size_t xcnt_space = 64; /* arbitrary number */
+
+ xids = (TransactionId *) palloc(sizeof(TransactionId) * xcnt_space);
+
+ hash_seq_init(&hash_seq, rb->by_txn);
+ while ((ent = hash_seq_search(&hash_seq)) != NULL)
+ {
+ ReorderBufferTXN *txn = ent->txn;
+
+ if (!rbtxn_has_catalog_changes(txn))
+ continue;

It would be better to allocate memory the first time we have to store
xids. There is a good chance that, much of the time, this function will
just do a palloc without having to store any xids.

5. Do you think we should do some performance testing for a mix of
ddl/dml workload to see if it adds any overhead in decoding due to
serialize/restore doing additional work? I don't think it should add
some meaningful overhead but OTOH there is no harm in doing some
testing of the same.

-- 
With Regards,
Amit Kapila.



On Mon, Jul 4, 2022 at 9:42 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, May 30, 2022 at 11:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I've attached three POC patches:
> >
>
> I think it will be a good idea if you can add a short commit message
> at least to say which patch is proposed for HEAD and which one is for
> back branches. Also, it would be good if you can add some description
> of the fix in the commit message. Let's remove poc* from the patch
> name.

Updated.

>
> Review poc_add_running_catchanges_xacts_to_serialized_snapshot
> =====================================================
> 1.
> + /*
> + * Array of transactions that were running when the snapshot serialization
> + * and changed system catalogs,
>
> The part of the sentence after serialization is not very clear.

Updated.

>
> 2.
> - if (ReorderBufferXidHasCatalogChanges(builder->reorder, subxid))
> + if (ReorderBufferXidHasCatalogChanges(builder->reorder, subxid) ||
> + bsearch(&xid, builder->catchanges.xip, builder->catchanges.xcnt,
> + sizeof(TransactionId), xidComparator) != NULL)
>
> Why are you using xid instead of subxid in bsearch call? Can we add a
> comment to say why it is okay to use xid if there is a valid reason?
> But note, we are using subxid to add to the committed xact array so
> not sure if this is a good idea but I might be missing something.

You're right, subxid should be used here.
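
With that fixed, the condition would look roughly like (sketch only):

    if (ReorderBufferXidHasCatalogChanges(builder->reorder, subxid) ||
        bsearch(&subxid, builder->catchanges.xip, builder->catchanges.xcnt,
                sizeof(TransactionId), xidComparator) != NULL)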

>
> Suggestions for improvement in comments:
> -       /*
> -        * Update the transactions that are running and changes
> catalogs that are
> -        * not committed.
> -        */
> +       /* Update the catalog modifying transactions that are yet not
> committed. */
>         if (builder->catchanges.xip)
>                 pfree(builder->catchanges.xip);
>         builder->catchanges.xip =
> ReorderBufferGetCatalogChangesXacts(builder->reorder,
> @@ -1647,7 +1644,7 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
>         COMP_CRC32C(ondisk->checksum, ondisk_c, sz);
>         ondisk_c += sz;
>
> -       /* copy catalog-changes xacts */
> +       /* copy catalog modifying xacts */
>         sz = sizeof(TransactionId) * builder->catchanges.xcnt;
>         memcpy(ondisk_c, builder->catchanges.xip, sz);
>         COMP_CRC32C(ondisk->checksum, ondisk_c, sz);

Updated.

I'll post a new version patch in the next email with replying to other comments.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Wed, Jul 6, 2022 at 7:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I'll post a new version patch in the next email with replying to other comments.
>

Okay, thanks for working on this. Few comments/suggestions on
poc_remember_last_running_xacts_v2 patch:

1.
+ReorderBufferSetLastRunningXactsCatalogChanges(ReorderBuffer *rb,
TransactionId xid,
+    uint32 xinfo, int subxcnt,
+    TransactionId *subxacts, XLogRecPtr lsn)
+{
...
...
+
+ test = bsearch(&xid, rb->last_running_xacts, rb->n_last_running_xacts,
+    sizeof(TransactionId), xidComparator);
+
+ if (test == NULL)
+ {
+ for (int i = 0; i < subxcnt; i++)
+ {
+ test = bsearch(&subxacts[i], rb->last_running_xacts, rb->n_last_running_xacts,
+    sizeof(TransactionId), xidComparator);
...

Is there ever a possibility that the top transaction id is not in the
running_xacts list but one of its subxids is present? If yes, it is
not very obvious at least to me so adding a comment here could be
useful. If not, then why do we need this additional check for each of
the sub-transaction ids?

2.
@@ -627,6 +647,15 @@ DecodeCommit(LogicalDecodingContext *ctx,
XLogRecordBuffer *buf,
  commit_time = parsed->origin_timestamp;
  }

+ /*
+ * Set the last running xacts as containing catalog change if necessary.
+ * This must be done before SnapBuildCommitTxn() so that we include catalog
+ * change transactions to the historic snapshot.
+ */
+ ReorderBufferSetLastRunningXactsCatalogChanges(ctx->reorder, xid,
parsed->xinfo,
+    parsed->nsubxacts, parsed->subxacts,
+    buf->origptr);
+
  SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
     parsed->nsubxacts, parsed->subxacts);

As mentioned previously as well, marking it before SnapBuildCommitTxn
has one disadvantage, we sometimes do this work even if the snapshot
state is SNAPBUILD_START or SNAPBUILD_BUILDING_SNAPSHOT in which case
SnapBuildCommitTxn wouldn't do anything. Can we instead check whether
the particular txn has invalidations and is present in the
last_running_xacts list along with the check
ReorderBufferXidHasCatalogChanges? I think that has the additional
advantage that we don't need this additional marking if the xact is
already marked as containing catalog changes.

3.
1.
+ /*
+ * We rely on HEAP2_NEW_CID records and XACT_INVALIDATIONS to know
+ * if the transaction has changed the catalog, and that information
+ * is not serialized to SnapBuilder.  Therefore, if the logical
+ * decoding decodes the commit record of the transaction that actually
+ * has done catalog changes without these records, we miss to add
+ * the xid to the snapshot so up creating the wrong snapshot.

The part of the sentence "... snapshot so up creating the wrong
snapshot." is not clear. In this comment, at one place you have used
two spaces after a full stop, and at another place, there is one
space. I think let's follow nearby code practice to use a single space
before a new sentence.

4.
+void
+ReorderBufferProcessLastRunningXacts(ReorderBuffer *rb,
xl_running_xacts *running)
+{
+ /* Quick exit if there is no longer last running xacts */
+ if (likely(rb->n_last_running_xacts == 0))
+ return;
+
+ /* First call, build the last running xact list */
+ if (rb->n_last_running_xacts == -1)
+ {
+ int nxacts = running->subxcnt + running->xcnt;
+ Size sz = sizeof(TransactionId) * nxacts;;
+
+ rb->last_running_xacts = MemoryContextAlloc(rb->context, sz);
+ memcpy(rb->last_running_xacts, running->xids, sz);
+ qsort(rb->last_running_xacts, nxacts, sizeof(TransactionId), xidComparator);
+
+ rb->n_last_running_xacts = nxacts;
+
+ return;
+ }

a. Can we add the function header comments for this function?
b. We seem to be tracking the running_xact information for the first
running_xact record after start/restart. The name last_running_xacts
doesn't sound appropriate for that, how about initial_running_xacts?

5.
+ /*
+ * Purge xids in the last running xacts list if we can do that for at least
+ * one xid.
+ */
+ if (NormalTransactionIdPrecedes(rb->last_running_xacts[0],
+ running->oldestRunningXid))

I think it would be a good idea to add a few lines here explaining why
it is safe to purge. IIUC, it is because the commit for those xacts
would have already been processed and we don't need such a xid
anymore.

6. As per the discussion above in this thread, having
XACT_XINFO_HAS_INVALS in the commit record doesn't indicate that the
xact has catalog changes, so can we add somewhere in the comments that
for such a case we can't distinguish whether the txn has catalog
changes but we still mark it as containing catalog changes? Can you
please share one example of this case?

-- 
With Regards,
Amit Kapila.



On Tue, Jul 5, 2022 at 8:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jul 4, 2022 at 6:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, May 30, 2022 at 11:13 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > I've attached three POC patches:
> > >
> >
> > I think it will be a good idea if you can add a short commit message
> > at least to say which patch is proposed for HEAD and which one is for
> > back branches. Also, it would be good if you can add some description
> > of the fix in the commit message. Let's remove poc* from the patch
> > name.
> >
> > Review poc_add_running_catchanges_xacts_to_serialized_snapshot
> > =====================================================
>
> Few more comments:

Thank you for the comments.

> 1.
> +
> + /* This array must be sorted in xidComparator order */
> + TransactionId *xip;
> + } catchanges;
>  };
>
> This array contains the transaction ids for subtransactions as well. I
> think it is better mention the same in comments.

Updated.

>
> 2. Are we anytime removing transaction ids from catchanges->xip array?

No.

> If not, is there a reason for the same? I think we can remove it
> either at commit/abort or even immediately after adding the xid/subxid
> to committed->xip array.

It might be a good idea but I'm concerned that removing XID from the
array at every commit/abort or after adding it to committed->xip array
might be costly as it requires adjustment of the array to keep its
order. Removing XIDs from the array would make bsearch faster but the
array is updated reasonably often (every 15 sec).

>
> 3.
> + if (readBytes != sz)
> + {
> + int save_errno = errno;
> +
> + CloseTransientFile(fd);
> +
> + if (readBytes < 0)
> + {
> + errno = save_errno;
> + ereport(ERROR,
> + (errcode_for_file_access(),
> + errmsg("could not read file \"%s\": %m", path)));
> + }
> + else
> + ereport(ERROR,
> + (errcode(ERRCODE_DATA_CORRUPTED),
> + errmsg("could not read file \"%s\": read %d of %zu",
> + path, readBytes, sz)));
> + }
>
> This is the fourth instance of similar error handling code in
> SnapBuildRestore(). Isn't it better to extract this into a separate
> function?

Good idea, updated.
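
The shape of such a helper would be roughly the following (a sketch
only; the function name is illustrative):

static void
snapbuild_read_file(int fd, char *dest, Size sz, const char *path)
{
    int         readBytes = read(fd, dest, sz);

    if (readBytes != sz)
    {
        int         save_errno = errno;

        CloseTransientFile(fd);

        if (readBytes < 0)
        {
            errno = save_errno;
            ereport(ERROR,
                    (errcode_for_file_access(),
                     errmsg("could not read file \"%s\": %m", path)));
        }
        else
            ereport(ERROR,
                    (errcode(ERRCODE_DATA_CORRUPTED),
                     errmsg("could not read file \"%s\": read %d of %zu",
                            path, readBytes, sz)));
    }
}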

>
> 4.
> +TransactionId *
> +ReorderBufferGetCatalogChangesXacts(ReorderBuffer *rb, size_t *xcnt_p)
> +{
> + HASH_SEQ_STATUS hash_seq;
> + ReorderBufferTXNByIdEnt *ent;
> + TransactionId *xids;
> + size_t xcnt = 0;
> + size_t xcnt_space = 64; /* arbitrary number */
> +
> + xids = (TransactionId *) palloc(sizeof(TransactionId) * xcnt_space);
> +
> + hash_seq_init(&hash_seq, rb->by_txn);
> + while ((ent = hash_seq_search(&hash_seq)) != NULL)
> + {
> + ReorderBufferTXN *txn = ent->txn;
> +
> + if (!rbtxn_has_catalog_changes(txn))
> + continue;
>
> It would be better to allocate memory the first time we have to store
> xids. There is a good chance that many a time this function will do
> just palloc without having to store any xid.

Agreed.

>
> 5. Do you think we should do some performance testing for a mix of
> ddl/dml workload to see if it adds any overhead in decoding due to
> serialize/restore doing additional work? I don't think it should add
> some meaningful overhead but OTOH there is no harm in doing some
> testing of the same.

Yes, it would be worth trying. I also believe this change doesn't
introduce noticeable overhead but let's check just in case.

I've attached an updated patch.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment
On Wed, Jul 6, 2022 at 12:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Jul 5, 2022 at 8:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > 2. Are we anytime removing transaction ids from catchanges->xip array?
>
> No.
>
> > If not, is there a reason for the same? I think we can remove it
> > either at commit/abort or even immediately after adding the xid/subxid
> > to committed->xip array.
>
> It might be a good idea but I'm concerned that removing XID from the
> array at every commit/abort or after adding it to committed->xip array
> might be costly as it requires adjustment of the array to keep its
> order. Removing XIDs from the array would make bsearch faster but the
> array is updated reasonably often (every 15 sec).
>

Fair point. However, I am slightly worried that we are unnecessarily
searching in this new array even when ReorderBufferTxn has the
required information. To avoid that, in function
SnapBuildXidHasCatalogChange(), we can first check
ReorderBufferXidHasCatalogChanges() and then check the array if the
first check doesn't return true. Also, by the way, do we need to
always keep builder->catchanges.xip updated via SnapBuildRestore()?
Isn't it sufficient that we just read and throw away contents from a
snapshot if builder->catchanges.xip is non-NULL?

I had additionally thought about whether we can further optimize this
solution to store this additional information only when we need to
serialize for a checkpoint record, but I think that won't work because
the walsender can restart even without a restart of the server, in
which case the same problem can occur. I am not sure if there is a way
to further optimize this solution; let me know if you have any ideas.

-- 
With Regards,
Amit Kapila.



On Wed, Jul 6, 2022 at 5:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jul 6, 2022 at 12:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Jul 5, 2022 at 8:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > 2. Are we anytime removing transaction ids from catchanges->xip array?
> >
> > No.
> >
> > > If not, is there a reason for the same? I think we can remove it
> > > either at commit/abort or even immediately after adding the xid/subxid
> > > to committed->xip array.
> >
> > It might be a good idea but I'm concerned that removing XID from the
> > array at every commit/abort or after adding it to committed->xip array
> > might be costly as it requires adjustment of the array to keep its
> > order. Removing XIDs from the array would make bsearch faster but the
> > array is updated reasonably often (every 15 sec).
> >
>
> Fair point. However, I am slightly worried that we are unnecessarily
> searching in this new array even when ReorderBufferTxn has the
> required information. To avoid that, in function
> SnapBuildXidHasCatalogChange(), we can first check
> ReorderBufferXidHasCatalogChanges() and then check the array if the
> first check doesn't return true. Also, by the way, do we need to
> always keep builder->catchanges.xip updated via SnapBuildRestore()?
> Isn't it sufficient that we just read and throw away contents from a
> snapshot if builder->catchanges.xip is non-NULL?

IIUC catchanges.xip is restored only once when restoring a consistent
snapshot via SnapBuildRestore(). I think it's necessary to set
catchanges.xip for later use in SnapBuildXidHasCatalogChange(). Or did
you mean via SnapBuildSerialize()?

>
> I had additionally thought if can further optimize this solution to
> just store this additional information when we need to serialize for
> checkpoint record but I think that won't work because walsender can
> restart even without resatart of server in which case the same problem
> can occur.

Yes, probably we need to write catalog modifying transactions for
every serialized snapshot.

> I am not if sure there is a way to further optimize this
> solution, let me know if you have any ideas?

I suppose that writing additional information to serialized snapshots
would not be a noticeable overhead since we need 4 bytes per
transaction and we would not expect there to be a huge number of
concurrent catalog modifying transactions. But both collecting catalog
modifying transactions (especially when there are many ongoing
transactions) and bsearch'ing the XID list every time we decode a
COMMIT record could add overhead.

A solution for the first point would be to keep track of catalog
modifying transactions by using a linked list so that we can avoid
checking all ongoing transactions.

Regarding the second point, on reflection, I think we need to look up
the XID list until all XIDs in the list are committed/aborted. We can
remove XIDs from the list after adding them to committed.xip as you
suggested. Or, when decoding a RUNNING_XACTS record, we can remove XIDs
older than builder->xmin from the list like we do for committed.xip in
SnapBuildPurgeCommittedTxn().
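
The purge could then look roughly like this (a sketch, assuming the
catchanges.xip array stays sorted and mirroring what
SnapBuildPurgeCommittedTxn() does; the function name is illustrative):

/*
 * Sketch: drop entries that precede builder->xmin; since the array is
 * sorted, everything from the first surviving entry onward is kept.
 */
static void
SnapBuildPurgeCatalogChangeXids(SnapBuild *builder)
{
    int         off;

    for (off = 0; off < builder->catchanges.xcnt; off++)
    {
        /* stop at the first entry that is not older than xmin */
        if (!TransactionIdPrecedes(builder->catchanges.xip[off],
                                   builder->xmin))
            break;
    }

    if (off > 0)
    {
        builder->catchanges.xcnt -= off;
        memmove(builder->catchanges.xip, builder->catchanges.xip + off,
                builder->catchanges.xcnt * sizeof(TransactionId));
    }
}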

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Thu, Jul 7, 2022 at 8:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Jul 6, 2022 at 5:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Jul 6, 2022 at 12:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Tue, Jul 5, 2022 at 8:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > 2. Are we anytime removing transaction ids from catchanges->xip array?
> > >
> > > No.
> > >
> > > > If not, is there a reason for the same? I think we can remove it
> > > > either at commit/abort or even immediately after adding the xid/subxid
> > > > to committed->xip array.
> > >
> > > It might be a good idea but I'm concerned that removing XID from the
> > > array at every commit/abort or after adding it to committed->xip array
> > > might be costly as it requires adjustment of the array to keep its
> > > order. Removing XIDs from the array would make bsearch faster but the
> > > array is updated reasonably often (every 15 sec).
> > >
> >
> > Fair point. However, I am slightly worried that we are unnecessarily
> > searching in this new array even when ReorderBufferTxn has the
> > required information. To avoid that, in function
> > SnapBuildXidHasCatalogChange(), we can first check
> > ReorderBufferXidHasCatalogChanges() and then check the array if the
> > first check doesn't return true. Also, by the way, do we need to
> > always keep builder->catchanges.xip updated via SnapBuildRestore()?
> > Isn't it sufficient that we just read and throw away contents from a
> > snapshot if builder->catchanges.xip is non-NULL?
>
> IIUC catchanges.xip is restored only once when restoring a consistent
> snapshot via SnapBuildRestore(). I think it's necessary to set
> catchanges.xip for later use in SnapBuildXidHasCatalogChange(). Or did
> you mean via SnapBuildSerialize()?
>

Sorry, I got confused about the way restore is used. You are right, it
will be done once. My main worry is that we shouldn't look at
builder->catchanges.xip array on an ongoing basis which I think can be
dealt with by one of the ideas you mentioned below. But, I think we
can still follow the other suggestion related to moving
ReorderBufferXidHasCatalogChanges() check prior to checking array.

> >
> > I had additionally thought if can further optimize this solution to
> > just store this additional information when we need to serialize for
> > checkpoint record but I think that won't work because walsender can
> > restart even without resatart of server in which case the same problem
> > can occur.
>
> Yes, probably we need to write catalog modifying transactions for
> every serialized snapshot.
>
> > I am not if sure there is a way to further optimize this
> > solution, let me know if you have any ideas?
>
> I suppose that writing additional information to serialized snapshots
> would not be a noticeable overhead since we need 4 bytes per
> transaction and we would not expect there is a huge number of
> concurrent catalog modifying transactions. But both collecting catalog
> modifying transactions (especially when there are many ongoing
> transactions) and bsearch'ing on the XID list every time decoding the
> COMMIT record could bring overhead.
>
> A solution for the first point would be to keep track of catalog
> modifying transactions by using a linked list so that we can avoid
> checking all ongoing transactions.
>

This sounds reasonable to me.

> Regarding the second point, on reflection, I think we need to look up
> the XID list until all XID in the list is committed/aborted. We can
> remove XIDs from the list after adding it to committed.xip as you
> suggested. Or when decoding a RUNNING_XACTS record, we can remove XIDs
> older than builder->xmin from the list like we do for committed.xip in
> SnapBuildPurgeCommittedTxn().
>

I think doing it along with RUNNING_XACTS should be fine. At each
commit/abort, the cost could be high because we need to maintain the
sort order. In general, I feel any one of these should be okay because
once the array becomes empty, it won't be used again and there won't
be any operation related to it during ongoing replication.

--
With Regards,
Amit Kapila.



On Thu, Jul 7, 2022 at 3:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jul 7, 2022 at 8:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Jul 6, 2022 at 5:55 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Jul 6, 2022 at 12:19 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Tue, Jul 5, 2022 at 8:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > 2. Are we anytime removing transaction ids from catchanges->xip array?
> > > >
> > > > No.
> > > >
> > > > > If not, is there a reason for the same? I think we can remove it
> > > > > either at commit/abort or even immediately after adding the xid/subxid
> > > > > to committed->xip array.
> > > >
> > > > It might be a good idea but I'm concerned that removing XID from the
> > > > array at every commit/abort or after adding it to committed->xip array
> > > > might be costly as it requires adjustment of the array to keep its
> > > > order. Removing XIDs from the array would make bsearch faster but the
> > > > array is updated reasonably often (every 15 sec).
> > > >
> > >
> > > Fair point. However, I am slightly worried that we are unnecessarily
> > > searching in this new array even when ReorderBufferTxn has the
> > > required information. To avoid that, in function
> > > SnapBuildXidHasCatalogChange(), we can first check
> > > ReorderBufferXidHasCatalogChanges() and then check the array if the
> > > first check doesn't return true. Also, by the way, do we need to
> > > always keep builder->catchanges.xip updated via SnapBuildRestore()?
> > > Isn't it sufficient that we just read and throw away contents from a
> > > snapshot if builder->catchanges.xip is non-NULL?
> >
> > IIUC catchanges.xip is restored only once when restoring a consistent
> > snapshot via SnapBuildRestore(). I think it's necessary to set
> > catchanges.xip for later use in SnapBuildXidHasCatalogChange(). Or did
> > you mean via SnapBuildSerialize()?
> >
>
> Sorry, I got confused about the way restore is used. You are right, it
> will be done once. My main worry is that we shouldn't look at
> builder->catchanges.xip array on an ongoing basis which I think can be
> dealt with by one of the ideas you mentioned below. But, I think we
> can still follow the other suggestion related to moving
> ReorderBufferXidHasCatalogChanges() check prior to checking array.

Agreed. I've incorporated this change in the new version patch.

>
> > >
> > > I had additionally thought if can further optimize this solution to
> > > just store this additional information when we need to serialize for
> > > checkpoint record but I think that won't work because walsender can
> > > restart even without resatart of server in which case the same problem
> > > can occur.
> >
> > Yes, probably we need to write catalog modifying transactions for
> > every serialized snapshot.
> >
> > > I am not if sure there is a way to further optimize this
> > > solution, let me know if you have any ideas?
> >
> > I suppose that writing additional information to serialized snapshots
> > would not be a noticeable overhead since we need 4 bytes per
> > transaction and we would not expect there is a huge number of
> > concurrent catalog modifying transactions. But both collecting catalog
> > modifying transactions (especially when there are many ongoing
> > transactions) and bsearch'ing on the XID list every time decoding the
> > COMMIT record could bring overhead.
> >
> > A solution for the first point would be to keep track of catalog
> > modifying transactions by using a linked list so that we can avoid
> > checking all ongoing transactions.
> >
>
> This sounds reasonable to me.
>
> > Regarding the second point, on reflection, I think we need to look up
> > the XID list until all XID in the list is committed/aborted. We can
> > remove XIDs from the list after adding it to committed.xip as you
> > suggested. Or when decoding a RUNNING_XACTS record, we can remove XIDs
> > older than builder->xmin from the list like we do for committed.xip in
> > SnapBuildPurgeCommittedTxn().
> >
>
> I think doing along with RUNNING_XACTS should be fine. At each
> commit/abort, the cost could be high because we need to maintain the
> sort order. In general, I feel any one of these should be okay because
> once the array becomes empty, it won't be used again and there won't
> be any operation related to it during ongoing replication.

I've attached the new version patch that incorporates the comments and
the optimizations discussed above.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment
On Fri, Jul 8, 2022 at 6:45 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Jul 7, 2022 at 3:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Jul 7, 2022 at 8:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I've attached the new version patch that incorporates the comments and
> the optimizations discussed above.
>

Thanks, few minor comments:
1.
In ReorderBufferGetCatalogChangesXacts(), isn't it better to use the
list length of 'catchange_txns' to allocate xids array? If we can do
so, then we will save the need to repalloc as well.

2.
/* ->committed manipulation */
static void SnapBuildPurgeCommittedTxn(SnapBuild *builder);

The above comment also needs to be changed.

3. As SnapBuildPurgeCommittedTxn() removes xacts both from committed
and catchange arrays, the function name no more remains appropriate.
We can either rename to something like SnapBuildPurgeOlderTxn() or
move the catchange logic to a different function and call it from
SnapBuildProcessRunningXacts.

4.
+ if (TransactionIdEquals(builder->catchange.xip[off],
+ builder->xmin) ||
+ NormalTransactionIdFollows(builder->catchange.xip[off],
+    builder->xmin))

Can we use TransactionIdFollowsOrEquals() instead of above?

5. Comment change suggestion:
/*
  * Remove knowledge about transactions we treat as committed or
containing catalog
  * changes that are smaller than ->xmin. Those won't ever get checked via
- * the ->committed array and ->catchange, respectively. The committed xids will
- * get checked via the clog machinery.
+ * the ->committed or ->catchange array, respectively. The committed xids will
+ * get checked via the clog machinery. We can ideally remove the transaction
+ * from catchange array once it is finished (committed/aborted) but that could
+ * be costly as we need to maintain the xids order in the array.
  */

Apart from the above, I think there are pending comments for the
back-branch patch and some performance testing of this work.

-- 
With Regards,
Amit Kapila.



On Fri, Jul 8, 2022 at 3:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jul 8, 2022 at 6:45 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Jul 7, 2022 at 3:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Jul 7, 2022 at 8:21 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I've attached the new version patch that incorporates the comments and
> > the optimizations discussed above.
> >
>
> Thanks, few minor comments:

Thank you for the comments.

> 1.
> In ReorderBufferGetCatalogChangesXacts(), isn't it better to use the
> list length of 'catchange_txns' to allocate xids array? If we can do
> so, then we will save the need to repalloc as well.

Since ReorderBufferGetCatalogChangesXacts() collects all ongoing
catalog modifying transactions, the length of the array could be
bigger than the one taken last time. We can start with the previous
length but I think we cannot remove the need for repalloc.

> 2.
> /* ->committed manipulation */
> static void SnapBuildPurgeCommittedTxn(SnapBuild *builder);
>
> The above comment also needs to be changed.
>
> 3. As SnapBuildPurgeCommittedTxn() removes xacts both from committed
> and catchange arrays, the function name no more remains appropriate.
> We can either rename to something like SnapBuildPurgeOlderTxn() or
> move the catchange logic to a different function and call it from
> SnapBuildProcessRunningXacts.
>
> 4.
> + if (TransactionIdEquals(builder->catchange.xip[off],
> + builder->xmin) ||
> + NormalTransactionIdFollows(builder->catchange.xip[off],
> +    builder->xmin))
>
> Can we use TransactionIdFollowsOrEquals() instead of above?
>
> 5. Comment change suggestion:
> /*
>   * Remove knowledge about transactions we treat as committed or
> containing catalog
>   * changes that are smaller than ->xmin. Those won't ever get checked via
> - * the ->committed array and ->catchange, respectively. The committed xids will
> - * get checked via the clog machinery.
> + * the ->committed or ->catchange array, respectively. The committed xids will
> + * get checked via the clog machinery. We can ideally remove the transaction
> + * from catchange array once it is finished (committed/aborted) but that could
> + * be costly as we need to maintain the xids order in the array.
>   */
>

Agreed with the above comments.

> Apart from the above, I think there are pending comments for the
> back-branch patch and some performance testing of this work.

Right. I'll incorporate all comments I got so far into these patches
and submit them. Also, will do some benchmark tests.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Fri, Jul 8, 2022 at 12:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Jul 8, 2022 at 3:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> > 1.
> > In ReorderBufferGetCatalogChangesXacts(), isn't it better to use the
> > list length of 'catchange_txns' to allocate xids array? If we can do
> > so, then we will save the need to repalloc as well.
>
> Since ReorderBufferGetcatalogChangesXacts() collects all ongoing
> catalog modifying transactions, the length of the array could be
> bigger than the one taken last time. We can start with the previous
> length but I think we cannot remove the need for repalloc.
>

It is using the list "catchange_txns" to form xid array which
shouldn't change for the duration of
ReorderBufferGetCatalogChangesXacts(). Then the caller frees the xid
array after its use. Next time in
ReorderBufferGetCatalogChangesXacts(), the fresh allocation for xid
array happens, so not sure why repalloc would be required?

-- 
With Regards,
Amit Kapila.



On Fri, Jul 8, 2022 at 5:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jul 8, 2022 at 12:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Fri, Jul 8, 2022 at 3:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> >
> > > 1.
> > > In ReorderBufferGetCatalogChangesXacts(), isn't it better to use the
> > > list length of 'catchange_txns' to allocate xids array? If we can do
> > > so, then we will save the need to repalloc as well.
> >
> > Since ReorderBufferGetcatalogChangesXacts() collects all ongoing
> > catalog modifying transactions, the length of the array could be
> > bigger than the one taken last time. We can start with the previous
> > length but I think we cannot remove the need for repalloc.
> >
>
> It is using the list "catchange_txns" to form xid array which
> shouldn't change for the duration of
> ReorderBufferGetCatalogChangesXacts(). Then the caller frees the xid
> array after its use. Next time in
> ReorderBufferGetCatalogChangesXacts(), the fresh allocation for xid
> array happens, so not sure why repalloc would be required?

Oops, I mistook catchange_txns for catchange->xcnt. You're right.
Starting with the length of catchange_txns should be sufficient.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Wed, Jul 6, 2022 at 3:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jul 6, 2022 at 7:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I'll post a new version patch in the next email with replying to other comments.
> >
>
> Okay, thanks for working on this. Few comments/suggestions on
> poc_remember_last_running_xacts_v2 patch:
>
> 1.
> +ReorderBufferSetLastRunningXactsCatalogChanges(ReorderBuffer *rb,
> TransactionId xid,
> +    uint32 xinfo, int subxcnt,
> +    TransactionId *subxacts, XLogRecPtr lsn)
> +{
> ...
> ...
> +
> + test = bsearch(&xid, rb->last_running_xacts, rb->n_last_running_xacts,
> +    sizeof(TransactionId), xidComparator);
> +
> + if (test == NULL)
> + {
> + for (int i = 0; i < subxcnt; i++)
> + {
> + test = bsearch(&subxacts[i], rb->last_running_xacts, rb->n_last_running_xacts,
> +    sizeof(TransactionId), xidComparator);
> ...
>
> Is there ever a possibility that the top transaction id is not in the
> running_xacts list but one of its subxids is present? If yes, it is
> not very obvious at least to me so adding a comment here could be
> useful. If not, then why do we need this additional check for each of
> the sub-transaction ids?

I think there is no possibility. The check for subtransactions is not necessary.

>
> 2.
> @@ -627,6 +647,15 @@ DecodeCommit(LogicalDecodingContext *ctx,
> XLogRecordBuffer *buf,
>   commit_time = parsed->origin_timestamp;
>   }
>
> + /*
> + * Set the last running xacts as containing catalog change if necessary.
> + * This must be done before SnapBuildCommitTxn() so that we include catalog
> + * change transactions to the historic snapshot.
> + */
> + ReorderBufferSetLastRunningXactsCatalogChanges(ctx->reorder, xid,
> parsed->xinfo,
> +    parsed->nsubxacts, parsed->subxacts,
> +    buf->origptr);
> +
>   SnapBuildCommitTxn(ctx->snapshot_builder, buf->origptr, xid,
>      parsed->nsubxacts, parsed->subxacts);
>
> As mentioned previously as well, marking it before SnapBuildCommitTxn
> has one disadvantage, we sometimes do this work even if the snapshot
> state is SNAPBUILD_START or SNAPBUILD_BUILDING_SNAPSHOT in which case
> SnapBuildCommitTxn wouldn't do anything. Can we instead check whether
> the particular txn has invalidations and is present in the
> last_running_xacts list along with the check
> ReorderBufferXidHasCatalogChanges? I think that has the additional
> advantage that we don't need this additional marking if the xact is
> already marked as containing catalog changes.

Agreed.
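
With that, the check at commit time would look roughly like this (a
sketch; the signature is illustrative, and
ReorderBufferXidIsInInitialRunningXacts() stands in for whatever helper
ends up searching the initial running xacts list):

static bool
SnapBuildXidHasCatalogChange(SnapBuild *builder, TransactionId xid,
                             uint32 xinfo)
{
    if (ReorderBufferXidHasCatalogChanges(builder->reorder, xid))
        return true;

    /*
     * The commit record alone cannot tell us whether catalog changes were
     * really made, so if it carries invalidations and the xid was running
     * when decoding started, err on the side of treating it as a catalog
     * modifying transaction.
     */
    return (xinfo & XACT_XINFO_HAS_INVALS) != 0 &&
        ReorderBufferXidIsInInitialRunningXacts(builder->reorder, xid);
}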

>
> 3.
> 1.
> + /*
> + * We rely on HEAP2_NEW_CID records and XACT_INVALIDATIONS to know
> + * if the transaction has changed the catalog, and that information
> + * is not serialized to SnapBuilder.  Therefore, if the logical
> + * decoding decodes the commit record of the transaction that actually
> + * has done catalog changes without these records, we miss to add
> + * the xid to the snapshot so up creating the wrong snapshot.
>
> The part of the sentence "... snapshot so up creating the wrong
> snapshot." is not clear. In this comment, at one place you have used
> two spaces after a full stop, and at another place, there is one
> space. I think let's follow nearby code practice to use a single space
> before a new sentence.

Agreed.

>
> 4.
> +void
> +ReorderBufferProcessLastRunningXacts(ReorderBuffer *rb,
> xl_running_xacts *running)
> +{
> + /* Quick exit if there is no longer last running xacts */
> + if (likely(rb->n_last_running_xacts == 0))
> + return;
> +
> + /* First call, build the last running xact list */
> + if (rb->n_last_running_xacts == -1)
> + {
> + int nxacts = running->subxcnt + running->xcnt;
> + Size sz = sizeof(TransactionId) * nxacts;;
> +
> + rb->last_running_xacts = MemoryContextAlloc(rb->context, sz);
> + memcpy(rb->last_running_xacts, running->xids, sz);
> + qsort(rb->last_running_xacts, nxacts, sizeof(TransactionId), xidComparator);
> +
> + rb->n_last_running_xacts = nxacts;
> +
> + return;
> + }
>
> a. Can we add the function header comments for this function?

Updated.

> b. We seem to be tracking the running_xact information for the first
> running_xact record after start/restart. The name last_running_xacts
> doesn't sound appropriate for that, how about initial_running_xacts?

Sound good, updated.

>
> 5.
> + /*
> + * Purge xids in the last running xacts list if we can do that for at least
> + * one xid.
> + */
> + if (NormalTransactionIdPrecedes(rb->last_running_xacts[0],
> + running->oldestRunningXid))
>
> I think it would be a good idea to add a few lines here explaining why
> it is safe to purge. IIUC, it is because the commit for those xacts
> would have already been processed and we don't need such a xid
> anymore.

Right, updated.

>
> 6. As per the discussion above in this thread having
> XACT_XINFO_HAS_INVALS in the commit record doesn't indicate that the
> xact has catalog changes, so can we add somewhere in comments that for
> such a case we can't distinguish whether the txn has catalog change
> but we still mark the txn has catalog changes?

Agreed.

> Can you please share one example for this case?

I think it depends on what we did in the transaction but one example I
have is that a commit record for ALTER DATABASE has only a snapshot
invalidation message:

=# alter database postgrse set log_statement to 'all';
ALTER DATABASE

$ pg_waldump $PGDATA/pg_wal/000000010000000000000001 | tail -1
rmgr: Transaction len (rec/tot):     66/    66, tx:        821, lsn:
0/019B50A8, prev 0/019B5070, desc: COMMIT 2022-07-11 21:38:44.036513
JST; inval msgs: snapshot 2964

I've attached an updated patch, please review it.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment
On Fri, Jul 8, 2022 at 8:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Jul 8, 2022 at 5:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Jul 8, 2022 at 12:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Fri, Jul 8, 2022 at 3:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > >
> > > > 1.
> > > > In ReorderBufferGetCatalogChangesXacts(), isn't it better to use the
> > > > list length of 'catchange_txns' to allocate xids array? If we can do
> > > > so, then we will save the need to repalloc as well.
> > >
> > > Since ReorderBufferGetcatalogChangesXacts() collects all ongoing
> > > catalog modifying transactions, the length of the array could be
> > > bigger than the one taken last time. We can start with the previous
> > > length but I think we cannot remove the need for repalloc.
> > >
> >
> > It is using the list "catchange_txns" to form xid array which
> > shouldn't change for the duration of
> > ReorderBufferGetCatalogChangesXacts(). Then the caller frees the xid
> > array after its use. Next time in
> > ReorderBufferGetCatalogChangesXacts(), the fresh allocation for xid
> > array happens, so not sure why repalloc would be required?
>
> Oops, I mistook catchange_txns for catchange->xcnt. You're right.
> Starting with the length of catchange_txns should be sufficient.
>

I've attached an updated patch.

While trying this idea, I noticed there is no API to get the length of
a dlist, as we discussed offlist. An alternative idea was to use a List
(T_XidList) but I'm not sure it's a great idea since deleting an xid
from the list is O(N), we would need to implement list_delete_xid, and
we would need to make sure the list nodes are allocated in the reorder
buffer context. So in the patch, I added a variable, catchange_ntxns,
to keep track of the length of the dlist. Please review it.
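
To illustrate how the counter gets used (just a sketch; catchange_node
names a hypothetical dlist membership field in ReorderBufferTXN, and
the other field names follow the discussion above):

static TransactionId *
GetCatalogChangesXacts(ReorderBuffer *rb, size_t *xcnt_p)
{
    dlist_iter      iter;
    TransactionId  *xids;
    size_t          xcnt = 0;

    /* The counter kept next to the dlist lets us size the array up front. */
    xids = (TransactionId *) palloc(sizeof(TransactionId) * rb->catchange_ntxns);

    dlist_foreach(iter, &rb->catchange_txns)
    {
        ReorderBufferTXN *txn = dlist_container(ReorderBufferTXN,
                                                catchange_node, iter.cur);

        xids[xcnt++] = txn->xid;
    }

    /* keep the array in xidComparator order for later bsearch */
    qsort(xids, xcnt, sizeof(TransactionId), xidComparator);

    *xcnt_p = xcnt;
    return xids;
}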

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment
On Tue, Jul 12, 2022 at 9:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Jul 8, 2022 at 8:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Fri, Jul 8, 2022 at 5:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Fri, Jul 8, 2022 at 12:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Fri, Jul 8, 2022 at 3:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > >
> > > > > 1.
> > > > > In ReorderBufferGetCatalogChangesXacts(), isn't it better to use the
> > > > > list length of 'catchange_txns' to allocate xids array? If we can do
> > > > > so, then we will save the need to repalloc as well.
> > > >
> > > > Since ReorderBufferGetcatalogChangesXacts() collects all ongoing
> > > > catalog modifying transactions, the length of the array could be
> > > > bigger than the one taken last time. We can start with the previous
> > > > length but I think we cannot remove the need for repalloc.
> > > >
> > >
> > > It is using the list "catchange_txns" to form xid array which
> > > shouldn't change for the duration of
> > > ReorderBufferGetCatalogChangesXacts(). Then the caller frees the xid
> > > array after its use. Next time in
> > > ReorderBufferGetCatalogChangesXacts(), the fresh allocation for xid
> > > array happens, so not sure why repalloc would be required?
> >
> > Oops, I mistook catchange_txns for catchange->xcnt. You're right.
> > Starting with the length of catchange_txns should be sufficient.
> >
>
> I've attached an updated patch.
>
> While trying this idea, I noticed there is no API to get the length of
> dlist, as we discussed offlist. Alternative idea was to use List
> (T_XidList) but I'm not sure it's a great idea since deleting an xid
> from the list is O(N), we need to implement list_delete_xid, and we
> need to make sure allocating list node in the reorder buffer context.
> So in the patch, I added a variable, catchange_ntxns, to keep track of
> the length of the dlist. Please review it.
>

I'm doing benchmark tests and will share the results.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Tue, Jul 12, 2022 8:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> 
> I've attached an updated patch.
> 
> While trying this idea, I noticed there is no API to get the length of
> dlist, as we discussed offlist. Alternative idea was to use List
> (T_XidList) but I'm not sure it's a great idea since deleting an xid
> from the list is O(N), we need to implement list_delete_xid, and we
> need to make sure allocating list node in the reorder buffer context.
> So in the patch, I added a variable, catchange_ntxns, to keep track of
> the length of the dlist. Please review it.
> 

Thanks for your patch. Here are some comments on the master patch.

1.
In catalog_change_snapshot.spec, should we use "RUNNING_XACTS record" instead of
"RUNNING_XACT record" / "XACT_RUNNING record" in the comment?

2.
+         * Since catchange.xip is sorted, we find the lower bound of
+         * xids that sill are interesting.

Typo?
"sill" -> "still"

3.
+     * This array is set once when restoring the snapshot, xids are removed
+     * from the array when decoding xl_running_xacts record, and then eventually
+     * becomes an empty.

+            /* catchange list becomes an empty */
+            pfree(builder->catchange.xip);
+            builder->catchange.xip = NULL;

Should "becomes an empty" be modified to "becomes empty"?

4.
+ * changes that are smaller than ->xmin. Those won't ever get checked via
+ * the ->committed array and ->catchange, respectively. The committed xids will

Should we change 
"the ->committed array and ->catchange"
to
"the ->committed or ->catchange array"
?

Regards,
Shi yu


On Tue, Jul 12, 2022 at 10:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Jul 12, 2022 at 9:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Fri, Jul 8, 2022 at 8:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Fri, Jul 8, 2022 at 5:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Fri, Jul 8, 2022 at 12:46 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > > On Fri, Jul 8, 2022 at 3:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > >
> > > > > > 1.
> > > > > > In ReorderBufferGetCatalogChangesXacts(), isn't it better to use the
> > > > > > list length of 'catchange_txns' to allocate xids array? If we can do
> > > > > > so, then we will save the need to repalloc as well.
> > > > >
> > > > > Since ReorderBufferGetcatalogChangesXacts() collects all ongoing
> > > > > catalog modifying transactions, the length of the array could be
> > > > > bigger than the one taken last time. We can start with the previous
> > > > > length but I think we cannot remove the need for repalloc.
> > > > >
> > > >
> > > > It is using the list "catchange_txns" to form xid array which
> > > > shouldn't change for the duration of
> > > > ReorderBufferGetCatalogChangesXacts(). Then the caller frees the xid
> > > > array after its use. Next time in
> > > > ReorderBufferGetCatalogChangesXacts(), the fresh allocation for xid
> > > > array happens, so not sure why repalloc would be required?
> > >
> > > Oops, I mistook catchange_txns for catchange->xcnt. You're right.
> > > Starting with the length of catchange_txns should be sufficient.
> > >
> >
> > I've attached an updated patch.
> >
> > While trying this idea, I noticed there is no API to get the length of
> > dlist, as we discussed offlist. Alternative idea was to use List
> > (T_XidList) but I'm not sure it's a great idea since deleting an xid
> > from the list is O(N), we need to implement list_delete_xid, and we
> > need to make sure allocating list node in the reorder buffer context.
> > So in the patch, I added a variable, catchange_ntxns, to keep track of
> > the length of the dlist. Please review it.
> >
>
> I'm doing benchmark tests and will share the results.
>

I've done benchmark tests to measure the overhead introduced by doing
bsearch() every time when decoding a commit record. I've simulated a
very intensified situation where we decode 1M commit records while
keeping builder->catchange.xip array but the overhead is negligible:

HEAD: 584 ms
Patched: 614 ms

I've attached the benchmark script I used. With increasing
LOG_SNAPSHOT_INTERVAL_MS to 90000, the last decoding by
pg_logical_slot_get_changes() decodes 1M commit records while keeping
catalog modifying transactions.
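For context, the per-commit overhead measured here is essentially one
bsearch() over the sorted catchange.xip array. A self-contained sketch of
that lookup (hypothetical helper names, not the patch code) looks like this:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>

    typedef uint32_t TransactionId;

    /* plain comparator for sorted TransactionId arrays, usable with bsearch() */
    static int
    xid_cmp(const void *a, const void *b)
    {
        TransactionId xa = *(const TransactionId *) a;
        TransactionId xb = *(const TransactionId *) b;

        if (xa < xb)
            return -1;
        if (xa > xb)
            return 1;
        return 0;
    }

    /* does the committed xid appear in the remembered catalog-modifying xids? */
    static bool
    xid_in_catchange_array(TransactionId xid,
                           const TransactionId *xip, size_t xcnt)
    {
        if (xcnt == 0)
            return false;
        return bsearch(&xid, xip, xcnt, sizeof(TransactionId), xid_cmp) != NULL;
    }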

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment
On Tue, Jul 12, 2022 at 11:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Jul 12, 2022 at 10:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> >
> > I'm doing benchmark tests and will share the results.
> >
>
> I've done benchmark tests to measure the overhead introduced by doing
> bsearch() every time when decoding a commit record. I've simulated a
> very intensified situation where we decode 1M commit records while
> keeping builder->catchange.xip array but the overhead is negligible:
>
> HEAD: 584 ms
> Patched: 614 ms
>
> I've attached the benchmark script I used. With increasing
> LOG_SNAPSHOT_INTERVAL_MS to 90000, the last decoding by
> pg_logical_slot_get_changes() decodes 1M commit records while keeping
> catalog modifying transactions.
>

Thanks for the test. We should also see how it performs when (a) we
don't change LOG_SNAPSHOT_INTERVAL_MS, and (b) we have more DDL xacts
so that the array to search is somewhat bigger

-- 
With Regards,
Amit Kapila.



On Tue, Jul 12, 2022 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 12, 2022 at 11:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Jul 12, 2022 at 10:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > >
> > > I'm doing benchmark tests and will share the results.
> > >
> >
> > I've done benchmark tests to measure the overhead introduced by doing
> > bsearch() every time when decoding a commit record. I've simulated a
> > very intensified situation where we decode 1M commit records while
> > keeping builder->catchange.xip array but the overhead is negligible:
> >
> > HEAD: 584 ms
> > Patched: 614 ms
> >
> > I've attached the benchmark script I used. With increasing
> > LOG_SNAPSHOT_INTERVAL_MS to 90000, the last decoding by
> > pg_logical_slot_get_changes() decodes 1M commit records while keeping
> > catalog modifying transactions.
> >
>
> Thanks for the test. We should also see how it performs when (a) we
> don't change LOG_SNAPSHOT_INTERVAL_MS,

What point do you want to see in this test? I think the performance
overhead depends on how many times we do bsearch() and how many
transactions are in the list. I increased this value to easily
simulate the situation where we decode many commit records while
keeping catalog modifying transactions. But even if we don't change
this value, the result would not change if we don't change how many
commit records we decode.

> and (b) we have more DDL xacts
> so that the array to search is somewhat bigger

I've done the same performance tests while creating 64 catalog
modifying transactions. The result is:

HEAD: 595 ms
Patched: 628 ms

There was no big overhead.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Tue, Jul 12, 2022 at 1:13 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Jul 12, 2022 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jul 12, 2022 at 11:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Tue, Jul 12, 2022 at 10:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > >
> > > > I'm doing benchmark tests and will share the results.
> > > >
> > >
> > > I've done benchmark tests to measure the overhead introduced by doing
> > > bsearch() every time when decoding a commit record. I've simulated a
> > > very intensified situation where we decode 1M commit records while
> > > keeping builder->catchange.xip array but the overhead is negligible:
> > >
> > > HEAD: 584 ms
> > > Patched: 614 ms
> > >
> > > I've attached the benchmark script I used. With increasing
> > > LOG_SNAPSHOT_INTERVAL_MS to 90000, the last decoding by
> > > pg_logical_slot_get_changes() decodes 1M commit records while keeping
> > > catalog modifying transactions.
> > >
> >
> > Thanks for the test. We should also see how it performs when (a) we
> > don't change LOG_SNAPSHOT_INTERVAL_MS,
>
> What point do you want to see in this test? I think the performance
> overhead depends on how many times we do bsearch() and how many
> transactions are in the list.
>

Right, I am not expecting any visible performance difference in this
case. This is to ensure that we are not incurring any overhead in the
more usual scenarios (or default cases). As per my understanding, the
purpose of increasing the value of LOG_SNAPSHOT_INTERVAL_MS is to
simulate a stress case for the changes made by the patch, and keeping
its value default will test the more usual scenarios.

> I increased this value to easily
> simulate the situation where we decode many commit records while
> keeping catalog modifying transactions. But even if we don't change
> this value, the result would not change if we don't change how many
> commit records we decode.
>
> > and (b) we have more DDL xacts
> > so that the array to search is somewhat bigger
>
> I've done the same performance tests while creating 64 catalog
> modifying transactions. The result is:
>
> HEAD: 595 ms
> Patched: 628 ms
>
> There was no big overhead.
>

Yeah, especially considering you have simulated a stress case for the patch.

-- 
With Regards,
Amit Kapila.



On Tue, Jul 12, 2022 8:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> 
> I've attached an updated patch.
> 

Hi,

I met a segmentation fault in test_decoding test after applying the patch for master
branch. Attach the backtrace.

It happened when executing the following code because it tried to free a NULL
pointer (catchange_xip).

    /* be tidy */
     if (ondisk)
         pfree(ondisk);
+    if (catchange_xip)
+        pfree(catchange_xip);
 }

It seems to be related to configure option. I could reproduce it when using
`./configure --enable-debug`.
But I couldn't reproduce with `./configure --enable-debug CFLAGS="-Og -ggdb"`.

Regards,
Shi yu

Attachment
On Tue, Jul 12, 2022 at 5:58 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
> On Tue, Jul 12, 2022 8:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I've attached an updated patch.
> >
>
> Hi,
>
> I met a segmentation fault in test_decoding test after applying the patch for master
> branch. Attach the backtrace.

Thank you for testing the patch!

>
> It happened when executing the following code because it tried to free a NULL
> pointer (catchange_xip).
>
>         /* be tidy */
>         if (ondisk)
>                 pfree(ondisk);
> +       if (catchange_xip)
> +               pfree(catchange_xip);
>  }
>
> It seems to be related to configure option. I could reproduce it when using
> `./configure --enable-debug`.
> But I couldn't reproduce with `./configure --enable-debug CFLAGS="-Og -ggdb"`.

Hmm, I could not reproduce this problem even if I use ./configure
--enable-debug. And it's weird that we checked if catchange_xip is not
null but we did pfree for it:

#1  pfree (pointer=0x0) at mcxt.c:1177
#2  0x000000000078186b in SnapBuildSerialize (builder=0x1fd5e78,
lsn=25719712) at snapbuild.c:1792

Is it reproducible in your environment? If so, could you test it again
with the following changes?

diff --git a/src/backend/replication/logical/snapbuild.c
b/src/backend/replication/logical/snapbuild.c
index d015c06ced..a6e76e3781 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1788,7 +1788,7 @@ out:
    /* be tidy */
    if (ondisk)
        pfree(ondisk);
-   if (catchange_xip)
+   if (catchange_xip != NULL)
        pfree(catchange_xip);
 }

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Tue, Jul 12, 2022 at 2:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Jul 12, 2022 at 5:58 PM shiy.fnst@fujitsu.com
> <shiy.fnst@fujitsu.com> wrote:
> >
> >
> > It happened when executing the following code because it tried to free a NULL
> > pointer (catchange_xip).
> >
> >         /* be tidy */
> >         if (ondisk)
> >                 pfree(ondisk);
> > +       if (catchange_xip)
> > +               pfree(catchange_xip);
> >  }
> >
> > It seems to be related to configure option. I could reproduce it when using
> > `./configure --enable-debug`.
> > But I couldn't reproduce with `./configure --enable-debug CFLAGS="-Og -ggdb"`.
>
> Hmm, I could not reproduce this problem even if I use ./configure
> --enable-debug. And it's weird that we checked if catchange_xip is not
> null but we did pfree for it:
>

Yeah, this looks weird to me as well but one difference in running
tests could be the timing of WAL LOG for XLOG_RUNNING_XACTS. That may
change the timing of SnapBuildSerialize. The other thing we can try is
by checking the value of catchange_xcnt before pfree.

BTW, I think ReorderBufferGetCatalogChangesXacts should have an Assert
to ensure rb->catchange_ntxns and xcnt are equal. We can probably then
avoid having xcnt_p as an out parameter as the caller can use
rb->catchange_ntxns instead.
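In other words, roughly this shape (a sketch of the suggestion, not an
actual patch hunk):

    /* in ReorderBufferGetCatalogChangesXacts(), after filling xids[] */
    Assert(xcnt == rb->catchange_ntxns);
    return xids;

    /* at the caller, e.g. in SnapBuildSerialize() */
    catchange_xcnt = builder->reorder->catchange_ntxns;
    catchange_xip = ReorderBufferGetCatalogChangesXacts(builder->reorder);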

-- 
With Regards,
Amit Kapila.



On Tue, Jul 12, 2022 at 7:59 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 12, 2022 at 2:53 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Jul 12, 2022 at 5:58 PM shiy.fnst@fujitsu.com
> > <shiy.fnst@fujitsu.com> wrote:
> > >
> > >
> > > It happened when executing the following code because it tried to free a NULL
> > > pointer (catchange_xip).
> > >
> > >         /* be tidy */
> > >         if (ondisk)
> > >                 pfree(ondisk);
> > > +       if (catchange_xip)
> > > +               pfree(catchange_xip);
> > >  }
> > >
> > > It seems to be related to configure option. I could reproduce it when using
> > > `./configure --enable-debug`.
> > > But I couldn't reproduce with `./configure --enable-debug CFLAGS="-Og -ggdb"`.
> >
> > Hmm, I could not reproduce this problem even if I use ./configure
> > --enable-debug. And it's weird that we checked if catchange_xip is not
> > null but we did pfree for it:
> >
>
> Yeah, this looks weird to me as well but one difference in running
> tests could be the timing of WAL LOG for XLOG_RUNNING_XACTS. That may
> change the timing of SnapBuildSerialize. The other thing we can try is
> by checking the value of catchange_xcnt before pfree.

Yeah, we can try that.

While reading the code, I realized that we try to pfree both ondisk
and catchange_xip also when we jumped to 'out:':

out:
    ReorderBufferSetRestartPoint(builder->reorder,
                                 builder->last_serialized_snapshot);
    /* be tidy */
    if (ondisk)
        pfree(ondisk);
    if (catchange_xip)
        pfree(catchange_xip);

But we use both ondisk and catchange_xip only if we didn't jump to
'out:'. If this problem is related to compiler optimization with
'goto' statement, moving them before 'out:' might be worth trying.

>
> BTW, I think ReorderBufferGetCatalogChangesXacts should have an Assert
> to ensure rb->catchange_ntxns and xcnt are equal. We can probably then
> avoid having xcnt_p as an out parameter as the caller can use
> rb->catchange_ntxns instead.
>

Agreed.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Tue, Jul 12, 2022 at 5:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 12, 2022 at 1:13 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Jul 12, 2022 at 3:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Jul 12, 2022 at 11:38 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Tue, Jul 12, 2022 at 10:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > > >
> > > > > I'm doing benchmark tests and will share the results.
> > > > >
> > > >
> > > > I've done benchmark tests to measure the overhead introduced by doing
> > > > bsearch() every time when decoding a commit record. I've simulated a
> > > > very intensified situation where we decode 1M commit records while
> > > > keeping builder->catchange.xip array but the overhead is negligible:
> > > >
> > > > HEAD: 584 ms
> > > > Patched: 614 ms
> > > >
> > > > I've attached the benchmark script I used. With increasing
> > > > LOG_SNAPSHOT_INTERVAL_MS to 90000, the last decoding by
> > > > pg_logical_slot_get_changes() decodes 1M commit records while keeping
> > > > catalog modifying transactions.
> > > >
> > >
> > > Thanks for the test. We should also see how it performs when (a) we
> > > don't change LOG_SNAPSHOT_INTERVAL_MS,
> >
> > What point do you want to see in this test? I think the performance
> > overhead depends on how many times we do bsearch() and how many
> > transactions are in the list.
> >
>
> Right, I am not expecting any visible performance difference in this
> case. This is to ensure that we are not incurring any overhead in the
> more usual scenarios (or default cases). As per my understanding, the
> purpose of increasing the value of LOG_SNAPSHOT_INTERVAL_MS is to
> simulate a stress case for the changes made by the patch, and keeping
> its value default will test the more usual scenarios.

Agreed.

I've done simple benchmark tests to decode 100k pgbench transactions:

HEAD: 10.34 s
Patched: 10.29 s

I've attached an updated patch that incorporated comments from Amit and Shi.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment
On Tue, Jul 12, 2022 at 12:40 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
> On Tue, Jul 12, 2022 8:49 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I've attached an updated patch.
> >
> > While trying this idea, I noticed there is no API to get the length of
> > dlist, as we discussed offlist. Alternative idea was to use List
> > (T_XidList) but I'm not sure it's a great idea since deleting an xid
> > from the list is O(N), we need to implement list_delete_xid, and we
> > need to make sure allocating list node in the reorder buffer context.
> > So in the patch, I added a variable, catchange_ntxns, to keep track of
> > the length of the dlist. Please review it.
> >
>
> Thanks for your patch. Here are some comments on the master patch.

Thank you for the comments.

>
> 1.
> In catalog_change_snapshot.spec, should we use "RUNNING_XACTS record" instead of
> "RUNNING_XACT record" / "XACT_RUNNING record" in the comment?
>
> 2.
> +                * Since catchange.xip is sorted, we find the lower bound of
> +                * xids that sill are interesting.
>
> Typo?
> "sill" -> "still"
>
> 3.
> +        * This array is set once when restoring the snapshot, xids are removed
> +        * from the array when decoding xl_running_xacts record, and then eventually
> +        * becomes an empty.
>
> +                       /* catchange list becomes an empty */
> +                       pfree(builder->catchange.xip);
> +                       builder->catchange.xip = NULL;
>
> Should "becomes an empty" be modified to "becomes empty"?
>
> 4.
> + * changes that are smaller than ->xmin. Those won't ever get checked via
> + * the ->committed array and ->catchange, respectively. The committed xids will
>
> Should we change
> "the ->committed array and ->catchange"
> to
> "the ->committed or ->catchange array"
> ?

Agreed with all the above comments. These are incorporated in the
latest v4 patch I just sent[1].

Regards,

[1]
https://www.postgresql.org/message-id/CAD21AoAyNPrOFg%2BQGh%2B%3D4205TU0%3DyrE%2BQyMgzStkH85uBZXptQ%40mail.gmail.com

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Tue, Jul 12, 2022 5:23 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> 
> On Tue, Jul 12, 2022 at 5:58 PM shiy.fnst@fujitsu.com
> <shiy.fnst@fujitsu.com> wrote:
> >
> > It happened when executing the following code because it tried to free a
> NULL
> > pointer (catchange_xip).
> >
> >         /* be tidy */
> >         if (ondisk)
> >                 pfree(ondisk);
> > +       if (catchange_xip)
> > +               pfree(catchange_xip);
> >  }
> >
> > It seems to be related to configure option. I could reproduce it when using
> > `./configure --enable-debug`.
> > But I couldn't reproduce with `./configure --enable-debug CFLAGS="-Og -
> ggdb"`.
> 
> Hmm, I could not reproduce this problem even if I use ./configure
> --enable-debug. And it's weird that we checked if catchange_xip is not
> null but we did pfree for it:
> 
> #1  pfree (pointer=0x0) at mcxt.c:1177
> #2  0x000000000078186b in SnapBuildSerialize (builder=0x1fd5e78,
> lsn=25719712) at snapbuild.c:1792
> 
> Is it reproducible in your environment?

Thanks for your reply! Yes, it is reproducible. And I also reproduced it on the
v4 patch you posted [1].

[1]
https://www.postgresql.org/message-id/CAD21AoAyNPrOFg%2BQGh%2B%3D4205TU0%3DyrE%2BQyMgzStkH85uBZXptQ%40mail.gmail.com

> If so, could you test it again
> with the following changes?
> 
> diff --git a/src/backend/replication/logical/snapbuild.c
> b/src/backend/replication/logical/snapbuild.c
> index d015c06ced..a6e76e3781 100644
> --- a/src/backend/replication/logical/snapbuild.c
> +++ b/src/backend/replication/logical/snapbuild.c
> @@ -1788,7 +1788,7 @@ out:
>     /* be tidy */
>     if (ondisk)
>         pfree(ondisk);
> -   if (catchange_xip)
> +   if (catchange_xip != NULL)
>         pfree(catchange_xip);
>  }
> 

I tried this and could still reproduce the problem.

Besides, I tried the suggestion from Amit [2]; it could be fixed by checking
the value of catchange_xcnt instead of catchange_xip before pfree.

[2] https://www.postgresql.org/message-id/CAA4eK1%2BXPdm8G%3DEhUJA12Pi1YvQAfcz2%3DkTd9a4BjVx4%3Dgk-MA%40mail.gmail.com

diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index c482e906b0..68b9c4ef7d 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -1573,7 +1573,7 @@ SnapBuildSerialize(SnapBuild *builder, XLogRecPtr lsn)
        Size            needed_length;
        SnapBuildOnDisk *ondisk = NULL;
        TransactionId   *catchange_xip = NULL;
-       size_t          catchange_xcnt;
+       size_t          catchange_xcnt = 0;
        char       *ondisk_c;
        int                     fd;
        char            tmppath[MAXPGPATH];
@@ -1788,7 +1788,7 @@ out:
        /* be tidy */
        if (ondisk)
                pfree(ondisk);
-       if (catchange_xip)
+       if (catchange_xcnt != 0)
                pfree(catchange_xip);
 }


Regards,
Shi yu

On Thu, Jul 14, 2022 at 11:16 AM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
> On Tue, Jul 12, 2022 5:23 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Jul 12, 2022 at 5:58 PM shiy.fnst@fujitsu.com
> > <shiy.fnst@fujitsu.com> wrote:
> > >
> > > It happened when executing the following code because it tried to free a
> > NULL
> > > pointer (catchange_xip).
> > >
> > >         /* be tidy */
> > >         if (ondisk)
> > >                 pfree(ondisk);
> > > +       if (catchange_xip)
> > > +               pfree(catchange_xip);
> > >  }
> > >
> > > It seems to be related to configure option. I could reproduce it when using
> > > `./configure --enable-debug`.
> > > But I couldn't reproduce with `./configure --enable-debug CFLAGS="-Og -
> > ggdb"`.
> >
> > Hmm, I could not reproduce this problem even if I use ./configure
> > --enable-debug. And it's weird that we checked if catchange_xip is not
> > null but we did pfree for it:
> >
> > #1  pfree (pointer=0x0) at mcxt.c:1177
> > #2  0x000000000078186b in SnapBuildSerialize (builder=0x1fd5e78,
> > lsn=25719712) at snapbuild.c:1792
> >
> > Is it reproducible in your environment?
>
> Thanks for your reply! Yes, it is reproducible. And I also reproduced it on the
> v4 patch you posted [1].

Thank you for testing!

>
> [1]
https://www.postgresql.org/message-id/CAD21AoAyNPrOFg%2BQGh%2B%3D4205TU0%3DyrE%2BQyMgzStkH85uBZXptQ%40mail.gmail.com
>
> > If so, could you test it again
> > with the following changes?
> >
> > diff --git a/src/backend/replication/logical/snapbuild.c
> > b/src/backend/replication/logical/snapbuild.c
> > index d015c06ced..a6e76e3781 100644
> > --- a/src/backend/replication/logical/snapbuild.c
> > +++ b/src/backend/replication/logical/snapbuild.c
> > @@ -1788,7 +1788,7 @@ out:
> >     /* be tidy */
> >     if (ondisk)
> >         pfree(ondisk);
> > -   if (catchange_xip)
> > +   if (catchange_xip != NULL)
> >         pfree(catchange_xip);
> >  }
> >
>
> I tried this and could still reproduce the problem.

Does the backtrace still show we attempt to pfree a null-pointer?

>
> Besides, I tried the suggestion from Amit [2]; it could be fixed by checking
> the value of catchange_xcnt instead of catchange_xip before pfree.

Could you check if this problem occurred when we reached there via the
goto path, i.e., did we call ReorderBufferGetCatalogChangesXacts() or
not?

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Thu, Jul 14, 2022 at 12:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Jul 14, 2022 at 11:16 AM shiy.fnst@fujitsu.com
> <shiy.fnst@fujitsu.com> wrote:
> >
> > On Tue, Jul 12, 2022 5:23 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Tue, Jul 12, 2022 at 5:58 PM shiy.fnst@fujitsu.com
> > > <shiy.fnst@fujitsu.com> wrote:
> > > >
> > > > It happened when executing the following code because it tried to free a
> > > NULL
> > > > pointer (catchange_xip).
> > > >
> > > >         /* be tidy */
> > > >         if (ondisk)
> > > >                 pfree(ondisk);
> > > > +       if (catchange_xip)
> > > > +               pfree(catchange_xip);
> > > >  }
> > > >
> > > > It seems to be related to configure option. I could reproduce it when using
> > > > `./configure --enable-debug`.
> > > > But I couldn't reproduce with `./configure --enable-debug CFLAGS="-Og -
> > > ggdb"`.
> > >
> > > Hmm, I could not reproduce this problem even if I use ./configure
> > > --enable-debug. And it's weird that we checked if catchange_xip is not
> > > null but we did pfree for it:
> > >
> > > #1  pfree (pointer=0x0) at mcxt.c:1177
> > > #2  0x000000000078186b in SnapBuildSerialize (builder=0x1fd5e78,
> > > lsn=25719712) at snapbuild.c:1792
> > >
> > > Is it reproducible in your environment?
> >
> > Thanks for your reply! Yes, it is reproducible. And I also reproduced it on the
> > v4 patch you posted [1].
>
> Thank you for testing!

I've found out the exact cause of this problem and how to fix it. I'll
submit an updated patch next week with my analysis.

Thank you for testing and providing additional information off-list, Shi yu.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Mon, Jul 11, 2022 9:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> 
> I've attached an updated patch, please review it.
> 

Thanks for your patch. Here are some comments for the REL14-v1 patch.

1.
+        Size        sz = sizeof(TransactionId) * nxacts;;

There is a redundant semicolon at the end.

2.
+    workspace = MemoryContextAlloc(rb->context, rb->n_initial_running_xacts);

Should it be:
+    workspace = MemoryContextAlloc(rb->context, sizeof(TransactionId) * rb->n_initial_running_xacts);

3.
+    /* bound check if there is at least one transaction to be removed */
+    if (NormalTransactionIdPrecedes(rb->initial_running_xacts[0],
+                                    running->oldestRunningXid))
+        return;
+

Here, I think it should return if rb->initial_running_xacts[0] is older than
oldestRunningXid, right? Should it be changed to:

+    if (!NormalTransactionIdPrecedes(rb->initial_running_xacts[0],
+                                    running->oldestRunningXid))
+        return;

4.
+    if ((parsed->xinfo & XACT_XINFO_HAS_INVALS) != 0)

Maybe we can change it like the following, to be consistent with other places in
this file. It's also fine if you don't change it.

+    if (parsed->xinfo & XACT_XINFO_HAS_INVALS)


Regards,
Shi yu

On Thursday, July 14, 2022 10:31 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I've attached an updated patch that incorporated comments from Amit and Shi.
Hi,


Minor comments for v4.

(1) typo in the commit message

"When decoding a COMMIT record, we check both the list and the ReorderBuffer to see if
if the transaction has modified catalogs."

There are two 'if's in succession in the last sentence of the second paragraph.

(2) The header comment for the spec test

+# Test that decoding only the commit record of the transaction that have
+# catalog-changed.

Rewording of this part looks required, because "test that ... " requires a complete sentence
after that, right ?


(3) SnapBuildRestore

snapshot_not_interesting:
    if (ondisk.builder.committed.xip != NULL)
        pfree(ondisk.builder.committed.xip);
    return false;
}

Do we need to add pfree for ondisk.builder.catchange.xip after the 'snapshot_not_interesting' label ?


(4) SnapBuildPurgeOlderTxn

+               elog(DEBUG3, "purged catalog modifying transactions from %d to %d",
+                        (uint32) builder->catchange.xcnt, surviving_xids);

To make this part more aligned with existing codes,
probably we can have a look at another elog for debug in the same function.

We should use %u for casted xcnt & surviving_xids,
while adding a format for xmin if necessary ?


Best Regards,
    Takamichi Osumi


On Fri, Jul 15, 2022 at 10:43 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
>
> On Thursday, July 14, 2022 10:31 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > I've attached an updated patch that incorporated comments from Amit and Shi.
> Hi,
>
>
> Minor comments for v4.

Thank you for the comments!

>
> (1) typo in the commit message
>
> "When decoding a COMMIT record, we check both the list and the ReorderBuffer to see if
> if the transaction has modified catalogs."
>
> There are two 'if's in succession in the last sentence of the second paragraph.
>
> (2) The header comment for the spec test
>
> +# Test that decoding only the commit record of the transaction that have
> +# catalog-changed.
>
> Rewording of this part looks required, because "test that ... " requires a complete sentence
> after that, right ?
>
>
> (3) SnapBuildRestore
>
> snapshot_not_interesting:
>     if (ondisk.builder.committed.xip != NULL)
>         pfree(ondisk.builder.committed.xip);
>     return false;
> }
>
> Do we need to add pfree for ondisk.builder.catchange.xip after the 'snapshot_not_interesting' label ?
>
>
> (4) SnapBuildPurgeOlderTxn
>
> +               elog(DEBUG3, "purged catalog modifying transactions from %d to %d",
> +                        (uint32) builder->catchange.xcnt, surviving_xids);
>
> To make this part more aligned with existing codes,
> probably we can have a look at another elog for debug in the same function.
>
> We should use %u for casted xcnt & surviving_xids,
> while adding a format for xmin if necessary ?

I agreed with all the above comments and incorporated them into the
updated patch.

This patch should have the fix for the issue that Shi yu reported. Shi
yu, could you please test it again with this patch?

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment
On Fri, Jul 15, 2022 at 3:32 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
> On Mon, Jul 11, 2022 9:54 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I've attached an updated patch, please review it.
> >
>
> Thanks for your patch. Here are some comments for the REL14-v1 patch.
>
> 1.
> +               Size            sz = sizeof(TransactionId) * nxacts;;
>
> There is a redundant semicolon at the end.
>
> 2.
> +       workspace = MemoryContextAlloc(rb->context, rb->n_initial_running_xacts);
>
> Should it be:
> +       workspace = MemoryContextAlloc(rb->context, sizeof(TransactionId) * rb->n_initial_running_xacts);
>
> 3.
> +       /* bound check if there is at least one transaction to be removed */
> +       if (NormalTransactionIdPrecedes(rb->initial_running_xacts[0],
> +                                                                       running->oldestRunningXid))
> +               return;
> +
>
> Here, I think it should return if rb->initial_running_xacts[0] is older than
> oldestRunningXid, right? Should it be changed to:
>
> +       if (!NormalTransactionIdPrecedes(rb->initial_running_xacts[0],
> +                                                                       running->oldestRunningXid))
> +               return;
>
> 4.
> +       if ((parsed->xinfo & XACT_XINFO_HAS_INVALS) != 0)
>
> Maybe we can change it like the following, to be consistent with other places in
> this file. It's also fine if you don't change it.
>
> +       if (parsed->xinfo & XACT_XINFO_HAS_INVALS)

Thank you for the comments!

I've attached patches for all supported branches including the master.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment
On Fri, Jul 15, 2022 10:39 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> 
> This patch should have the fix for the issue that Shi yu reported. Shi
> yu, could you please test it again with this patch?
> 

Thanks for updating the patch!
I have tested and confirmed that the problem I found has been fixed.

Regards,
Shi yu

On Fri, Jul 15, 2022 at 8:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> This patch should have the fix for the issue that Shi yu reported. Shi
> yu, could you please test it again with this patch?
>

Can you explain the cause of the failure and your fix for the same?

-- 
With Regards,
Amit Kapila.



On Sun, Jul 17, 2022 at 6:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Jul 15, 2022 at 3:32 PM shiy.fnst@fujitsu.com
> <shiy.fnst@fujitsu.com> wrote:
> >
>
> I've attached patches for all supported branches including the master.
>

For back branch patches,
* Wouldn't it be better to move the purge logic into the
SnapBuildPurge* function for the sake of consistency?
* Do we really need ReorderBufferInitialXactsSetCatalogChanges()?
Can't we instead have a function similar to
SnapBuildXidHasCatalogChanges() as we have for the master branch? That
will avoid calling it when the snapshot
state is SNAPBUILD_START or SNAPBUILD_BUILDING_SNAPSHOT

-- 
With Regards,
Amit Kapila.



On Mon, Jul 18, 2022 at 1:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jul 15, 2022 at 8:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > This patch should have the fix for the issue that Shi yu reported. Shi
> > yu, could you please test it again with this patch?
> >
>
> Can you explain the cause of the failure and your fix for the same?

@@ -1694,6 +1788,8 @@ out:
    /* be tidy */
    if (ondisk)
        pfree(ondisk);
+   if (catchange_xip)
+       pfree(catchange_xip);

Regarding the above code in the previous version patch, looking at the
generated assembler code shared by Shi yu offlist, I realized that the
“if (catchange_xip)” is removed (folded) by gcc optimization. This is
because we dereference catchange_xip before the null-pointer check, as
follows:

+   /* copy catalog modifying xacts */
+   sz = sizeof(TransactionId) * catchange_xcnt;
+   memcpy(ondisk_c, catchange_xip, sz);
+   COMP_CRC32C(ondisk->checksum, ondisk_c, sz);
+   ondisk_c += sz;

Since sz is 0 in this case, memcpy doesn’t do anything actually.

By checking the assembler code, I’ve confirmed that gcc does the
optimization for this code, and that setting the
-fno-delete-null-pointer-checks flag prevents the if statement from
being folded. Also, I’ve confirmed that adding a check for
"catchange.xcnt > 0" before the null-pointer check can also prevent
that. Adding a check for "catchange.xcnt > 0" looks more robust. I’ve
added a similar check for builder->committed.xcnt as well for
consistency. builder->committed.xip could have no transactions.
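To make the compiler behaviour concrete, here is a small standalone example
(not the patch code) of the pattern involved. Since passing a NULL pointer to
memcpy() is undefined behaviour even when the length is zero, the compiler is
allowed to assume the pointer is non-NULL afterwards and fold the later check
away:

    #include <stdlib.h>
    #include <string.h>

    /* stand-in for pfree(): unlike free(), it must not be passed NULL */
    static void
    my_pfree(void *p)
    {
        if (p == NULL)
            abort();            /* pfree(NULL) crashes in PostgreSQL */
        free(p);
    }

    static void
    copy_and_free(char *dst, char *src, size_t len)
    {
        /* undefined behaviour if src is NULL, even with len == 0 */
        memcpy(dst, src, len);

        /*
         * Because of the memcpy above, an optimizing compiler may assume
         * src != NULL here and delete this check, so my_pfree() can be
         * reached with a NULL pointer when the caller passes src = NULL
         * and len = 0.  Guarding on len > 0 (or skipping the copy when
         * len is zero) sidesteps the problem.
         */
        if (src)
            my_pfree(src);
    }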

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Mon, Jul 18, 2022 at 12:28 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
> On Fri, Jul 15, 2022 10:39 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > This patch should have the fix for the issue that Shi yu reported. Shi
> > yu, could you please test it again with this patch?
> >
>
> Thanks for updating the patch!
> I have tested and confirmed that the problem I found has been fixed.

Thank you for testing!

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Tue, Jul 19, 2022 at 6:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Jul 18, 2022 at 1:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Jul 15, 2022 at 8:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > This patch should have the fix for the issue that Shi yu reported. Shi
> > > yu, could you please test it again with this patch?
> > >
> >
> > Can you explain the cause of the failure and your fix for the same?
>
> @@ -1694,6 +1788,8 @@ out:
>     /* be tidy */
>     if (ondisk)
>         pfree(ondisk);
> +   if (catchange_xip)
> +       pfree(catchange_xip);
>
> Regarding the above code in the previous version patch, looking at the
> generated assembler code shared by Shi yu offlist, I realized that the
> “if (catchange_xip)” is removed (folded) by gcc optimization. This is
> because we dereference catchange_xip before the null-pointer check, as
> follows:
>
> +   /* copy catalog modifying xacts */
> +   sz = sizeof(TransactionId) * catchange_xcnt;
> +   memcpy(ondisk_c, catchange_xip, sz);
> +   COMP_CRC32C(ondisk->checksum, ondisk_c, sz);
> +   ondisk_c += sz;
>
> Since sz is 0 in this case, memcpy doesn’t do anything actually.
>
> By checking the assembler code, I’ve confirmed that gcc does the
> optimization for this code, and that setting the
> -fno-delete-null-pointer-checks flag prevents the if statement from
> being folded. Also, I’ve confirmed that adding a check for
> "catchange.xcnt > 0" before the null-pointer check can also prevent
> that. Adding a check for "catchange.xcnt > 0" looks more robust. I’ve
> added a similar check for builder->committed.xcnt as well for
> consistency. builder->committed.xip could have no transactions.
>

Good work. I wonder without comments this may create a problem in the
future. OTOH, I don't see adding a check "catchange.xcnt > 0" before
freeing the memory any less robust. Also, for consistency, we can use
a similar check based on xcnt in the SnapBuildRestore to free the
memory in the below code:
+ /* set catalog modifying transactions */
+ if (builder->catchange.xip)
+ pfree(builder->catchange.xip);

--
With Regards,
Amit Kapila.



On Tue, Jul 19, 2022 at 1:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 19, 2022 at 6:34 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Jul 18, 2022 at 1:12 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Fri, Jul 15, 2022 at 8:09 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > This patch should have the fix for the issue that Shi yu reported. Shi
> > > > yu, could you please test it again with this patch?
> > > >
> > >
> > > Can you explain the cause of the failure and your fix for the same?
> >
> > @@ -1694,6 +1788,8 @@ out:
> >     /* be tidy */
> >     if (ondisk)
> >         pfree(ondisk);
> > +   if (catchange_xip)
> > +       pfree(catchange_xip);
> >
> > Regarding the above code in the previous version patch, looking at the
> > generated assembler code shared by Shi yu offlist, I realized that the
> > “if (catchange_xip)” is removed (folded) by gcc optimization. This is
> > because we dereference catchange_xip before the null-pointer check, as
> > follows:
> >
> > +   /* copy catalog modifying xacts */
> > +   sz = sizeof(TransactionId) * catchange_xcnt;
> > +   memcpy(ondisk_c, catchange_xip, sz);
> > +   COMP_CRC32C(ondisk->checksum, ondisk_c, sz);
> > +   ondisk_c += sz;
> >
> > Since sz is 0 in this case, memcpy doesn’t do anything actually.
> >
> > By checking the assembler code, I’ve confirmed that gcc does the
> > optimization for this code, and that setting the
> > -fno-delete-null-pointer-checks flag prevents the if statement from
> > being folded. Also, I’ve confirmed that adding a check for
> > "catchange.xcnt > 0" before the null-pointer check can also prevent
> > that. Adding a check for "catchange.xcnt > 0" looks more robust. I’ve
> > added a similar check for builder->committed.xcnt as well for
> > consistency. builder->committed.xip could have no transactions.
> >
>
> Good work. I wonder without comments this may create a problem in the
> future. OTOH, I don't see adding a check "catchange.xcnt > 0" before
> freeing the memory any less robust. Also, for consistency, we can use
> a similar check based on xcnt in the SnapBuildRestore to free the
> memory in the below code:
> + /* set catalog modifying transactions */
> + if (builder->catchange.xip)
> + pfree(builder->catchange.xip);

I would hesitate to add comments about preventing the particular
optimization. I think we do null-pointer-check-then-pfree in many places.
It seems to me that checking the array length before memcpy is more
natural than checking both the array length and the array existence
before pfree.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Sunday, July 17, 2022 9:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> I've attached patches for all supported branches including the master.
Hi,


Minor comments for REL14.

(1) There are some foreign characters in the patches (in the commit message)

When I had a look at your patch for back branches with some editor,
I could see some unfamiliar full-width characters like below two cases,
mainly around "single quotes" in the sentences.

Could you please check the entire patches,
probably by some tool that helps you to detect this kind of characters ?

* the 2nd paragraph of the commit message

...mark the transaction as containing catalog changes if it窶冱 in the list of the
initial running transactions ...

* the 3rd paragraph of the same

It doesn窶冲 have the information on which (sub) transaction has catalog changes....

FYI, this comment applies to other patches for REL13, REL12, REL11, REL10.


(2) typo in the commit message

FROM:
To fix this problem, this change the reorder buffer so that...
TO:
To fix this problem, this changes the reorder buffer so that...


(3) typo in ReorderBufferProcessInitialXacts

+       /*
+        * Remove transactions that would have been processed and we don't need to
+        * keep track off anymore.


Kindly change
FROM:
keep track off
TO:
keep track of



Best Regards,
    Takamichi Osumi


At Tue, 19 Jul 2022 10:17:15 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in 
> Good work. I wonder without comments this may create a problem in the
> future. OTOH, I don't see adding a check "catchange.xcnt > 0" before
> freeing the memory any less robust. Also, for consistency, we can use
> a similar check based on xcnt in the SnapBuildRestore to free the
> memory in the below code:
> + /* set catalog modifying transactions */
> + if (builder->catchange.xip)
> + pfree(builder->catchange.xip);

But xip must be positive there. We can add a comment explaining that.


+     * Array of transactions and subtransactions that had modified catalogs
+     * and were running when the snapshot was serialized.
+     *
+     * We normally rely on HEAP2_NEW_CID and XLOG_XACT_INVALIDATIONS records to
+     * know if the transaction has changed the catalog. But it could happen that
+     * the logical decoding decodes only the commit record of the transaction.
+     * This array keeps track of the transactions that have modified catalogs

(Might be only me, but) "track" makes me think that xids are added and
removed by activities. On the other hand the array just remembers
catalog-modifying xids in the last life until all the xids in the list
are gone.

+     * and were running when serializing a snapshot, and this array is used to
+     * add such transactions to the snapshot.
+     *
+     * This array is set once when restoring the snapshot, xids are removed

(So I want to add "only" between "are removed").

+     * from the array when decoding xl_running_xacts record, and then eventually
+     * becomes empty.


+    catchange_xip = ReorderBufferGetCatalogChangesXacts(builder->reorder);

catchange_xip is allocated in the current context, but ondisk is
allocated in builder->context.  It seems kind of inconsistent to me (even if
the current context is the same as builder->context).


+    if (builder->committed.xcnt > 0)
+    {

It seems to me committed.xip is always non-null, so we don't need this.
I don't strongly object to do that, though.

-     * Remove TXN from its containing list.
+     * Remove TXN from its containing lists.

The comment body describes only txn->nodes. I think we need to
add that for catchange_node.


+    Assert((xcnt > 0) && (xcnt == rb->catchange_ntxns));

(xcnt > 0) is obvious here (otherwise means dlist_foreach is broken..).
(xcnt == rb->catchange_ntxns) is not what should be checked here. The
assert just requires that catchange_txns and catchange_ntxns are
consistent so it should be checked just after dlist_empty.. I think.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Mon, Jul 18, 2022 at 8:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Jul 17, 2022 at 6:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Fri, Jul 15, 2022 at 3:32 PM shiy.fnst@fujitsu.com
> > <shiy.fnst@fujitsu.com> wrote:
> > >
> >
> > I've attached patches for all supported branches including the master.
> >
>
> For back branch patches,
> * Wouldn't it be better to move the purge logic into the
> SnapBuildPurge* function for the sake of consistency?

Agreed.

> * Do we really need ReorderBufferInitialXactsSetCatalogChanges()?
> Can't we instead have a function similar to
> SnapBuildXidHasCatalogChanges() as we have for the master branch? That
> will avoid calling it when the snapshot
> state is SNAPBUILD_START or SNAPBUILD_BUILDING_SNAPSHOT

Seems a good idea. We would need to pass the information about
(parsed->xinfo & XACT_XINFO_HAS_INVALS) to the function but probably
we can change ReorderBufferXidHasCatalogChanges() so that it checks
the RBTXN_HAS_CATALOG_CHANGES flag and then the initial running xacts
array.

BTW on backbranches, I think that the reason why we add
initial_running_xacts stuff to ReorderBuffer is that we cannot modify
SnapBuild that could be serialized. Can we add a (private) array for
the initial running xacts in snapbuild.c instead of adding new
variables to ReorderBuffer? That way, the code would become more
consistent with the changes on the master branch.
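A minimal sketch of that idea (hypothetical names, just to illustrate keeping
the state file-local in snapbuild.c rather than widening the exposed
ReorderBuffer struct; not the actual patch):

    /* file-local in snapbuild.c; nothing in ReorderBuffer changes */
    static TransactionId *InitialRunningXacts = NULL;
    static int            NInitialRunningXacts = 0;

    /*
     * Remember the xids from the first xl_running_xacts record decoded after
     * a restart; sorted so that later commit records can look up their xid
     * with bsearch().
     */
    NInitialRunningXacts = running->xcnt;
    InitialRunningXacts = MemoryContextAlloc(builder->context,
                                             running->xcnt * sizeof(TransactionId));
    memcpy(InitialRunningXacts, running->xids,
           running->xcnt * sizeof(TransactionId));
    qsort(InitialRunningXacts, NInitialRunningXacts,
          sizeof(TransactionId), xidComparator);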

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Tue, Jul 19, 2022 at 4:28 PM osumi.takamichi@fujitsu.com
<osumi.takamichi@fujitsu.com> wrote:
>
> On Sunday, July 17, 2022 9:59 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > I've attached patches for all supported branches including the master.
> Hi,
>
>
> Minor comments for REL14.
>
> (1) There are some foreign characters in the patches (in the commit message)
>
> When I had a look at your patch for back branches with some editor,
> I could see some unfamiliar full-width characters like below two cases,
> mainly around "single quotes" in the sentences.
>
> Could you please check the entire patches,
> probably by some tool that helps you to detect this kind of characters ?
>
> * the 2nd paragraph of the commit message
>
> ...mark the transaction as containing catalog changes if it窶冱 in the list of the
> initial running transactions ...
>
> * the 3rd paragraph of the same
>
> It doesn窶冲 have the information on which (sub) transaction has catalog changes....
>
> FYI, this comment applies to other patches for REL13, REL12, REL11, REL10.
>
>
> (2) typo in the commit message
>
> FROM:
> To fix this problem, this change the reorder buffer so that...
> TO:
> To fix this problem, this changes the reorder buffer so that...
>
>
> (3) typo in ReorderBufferProcessInitialXacts
>
> +       /*
> +        * Remove transactions that would have been processed and we don't need to
> +        * keep track off anymore.
>
>
> Kindly change
> FROM:
> keep track off
> TO:
> keep track of

Thank you for the comments! I'll address these comments in the next
version patch.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



At Tue, 19 Jul 2022 16:02:26 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in 
> On Tue, Jul 19, 2022 at 1:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Good work. I wonder without comments this may create a problem in the
> > future. OTOH, I don't see adding a check "catchange.xcnt > 0" before
> > freeing the memory any less robust. Also, for consistency, we can use
> > a similar check based on xcnt in the SnapBuildRestore to free the
> > memory in the below code:
> > + /* set catalog modifying transactions */
> > + if (builder->catchange.xip)
> > + pfree(builder->catchange.xip);
> 
> I would hesitate to add comments about preventing the particular
> optimization. I think we do null-pointer-check-then-pfree in many places.
> It seems to me that checking the array length before memcpy is more
> natural than checking both the array length and the array existence
> before pfree.

Anyway, according to the commit message of 46ab07ffda, POSIX forbids
memcpy(NULL, NULL, 0). It seems to me that it is the cause of the
false (or over) optimization.  So if we add some comment, it would be
for memcpy, not pfree..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



At Tue, 19 Jul 2022 16:57:14 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Tue, 19 Jul 2022 16:02:26 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in 
> > On Tue, Jul 19, 2022 at 1:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > Good work. I wonder without comments this may create a problem in the
> > > future. OTOH, I don't see adding a check "catchange.xcnt > 0" before
> > > freeing the memory any less robust. Also, for consistency, we can use
> > > a similar check based on xcnt in the SnapBuildRestore to free the
> > > memory in the below code:
> > > + /* set catalog modifying transactions */
> > > + if (builder->catchange.xip)
> > > + pfree(builder->catchange.xip);
> > 
> > I would hesitate to add comments about preventing the particular
> > optimization. I think we do null-pointer-check-then-pfree in many places.
> > It seems to me that checking the array length before memcpy is more
> > natural than checking both the array length and the array existence
> > before pfree.
> 
> Anyway, according to the commit message of 46ab07ffda, POSIX forbids
> memcpy(NULL, NULL, 0). It seems to me that it is the cause of the
> false (or over) optimization.  So if we add some comment, it would be
> for memcpy, not pfree..

For clarity, I meant that I don't think we need that comment.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Tue, Jul 19, 2022 at 4:35 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:

Thank you for the comments!

>
> At Tue, 19 Jul 2022 10:17:15 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> > Good work. I wonder without comments this may create a problem in the
> > future. OTOH, I don't see adding a check "catchange.xcnt > 0" before
> > freeing the memory any less robust. Also, for consistency, we can use
> > a similar check based on xcnt in the SnapBuildRestore to free the
> > memory in the below code:
> > + /* set catalog modifying transactions */
> > + if (builder->catchange.xip)
> > + pfree(builder->catchange.xip);
>
> But xip must be positive there. We can add a comment explaining that.
>

Yes, if we add a comment for it, we would probably need to explain gcc's
optimization, but that seems like too much to me.

>
> +        * Array of transactions and subtransactions that had modified catalogs
> +        * and were running when the snapshot was serialized.
> +        *
> +        * We normally rely on HEAP2_NEW_CID and XLOG_XACT_INVALIDATIONS records to
> +        * know if the transaction has changed the catalog. But it could happen that
> +        * the logical decoding decodes only the commit record of the transaction.
> +        * This array keeps track of the transactions that have modified catalogs
>
> (Might be only me, but) "track" makes me think that xids are added and
> removed by activities. On the other hand the array just remembers
> catalog-modifying xids in the last life until all the xids in the list
> are gone.
>
> +        * and were running when serializing a snapshot, and this array is used to
> +        * add such transactions to the snapshot.
> +        *
> +        * This array is set once when restoring the snapshot, xids are removed
>
> (So I want to add "only" between "are removed").
>
> +        * from the array when decoding xl_running_xacts record, and then eventually
> +        * becomes empty.

Agreed. Will fix.

>
>
> +       catchange_xip = ReorderBufferGetCatalogChangesXacts(builder->reorder);
>
> catchange_xip is allocated in the current context, but ondisk is
> allocated in builder->context.  It seems kind of inconsistent to me (even if
> the current context is the same as builder->context).

Right. I thought that since the lifetime of catchange_xip is short
(until the end of the SnapBuildSerialize() function), we didn't need to
allocate it in builder->context. But given ondisk, we need to do that
for catchange_xip as well. Will fix it.

>
>
> +       if (builder->committed.xcnt > 0)
> +       {
>
> It seems to me committed.xip is always non-null, so we don't need this.
> I don't strongly object to do that, though.

But committed.xcnt could be 0, right? We don't need to copy anything
by calling memcpy with size = 0 in this case. Also, it looks more
consistent with what we do for catchange_xcnt.

>
> -        * Remove TXN from its containing list.
> +        * Remove TXN from its containing lists.
>
> The comment body describes only txn->nodes. I think we need to
> add that for catchange_node.

Will add.

>
>
> +       Assert((xcnt > 0) && (xcnt == rb->catchange_ntxns));
>
> (xcnt > 0) is obvious here (otherwise means dlist_foreach is broken..).
> (xcnt == rb->catchange_ntxns) is not what should be checked here. The
> assert just requires that catchange_txns and catchange_ntxns are
> consistent so it should be checked just after dlist_empty.. I think.
>

If we want to check if catchange_txns and catchange_ntxns are
consistent, should we check (xcnt == rb->catchange_ntxns) as well, no?
This function requires the caller to use rb->catchange_ntxns as the
length of the returned array. I think this assertion ensures that the
actual length of the array is consistent with the length we
pre-calculated.


Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Tue, Jul 19, 2022 at 1:43 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Tue, 19 Jul 2022 16:57:14 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> > At Tue, 19 Jul 2022 16:02:26 +0900, Masahiko Sawada <sawada.mshk@gmail.com> wrote in
> > > On Tue, Jul 19, 2022 at 1:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > Good work. I wonder without comments this may create a problem in the
> > > > future. OTOH, I don't see adding a check "catchange.xcnt > 0" before
> > > > freeing the memory any less robust. Also, for consistency, we can use
> > > > a similar check based on xcnt in the SnapBuildRestore to free the
> > > > memory in the below code:
> > > > + /* set catalog modifying transactions */
> > > > + if (builder->catchange.xip)
> > > > + pfree(builder->catchange.xip);
> > >
> > > I would hesitate to add comments about preventing the particular
> > > optimization. I think we do null-pointer-check-then-pfree in many places.
> > > It seems to me that checking the array length before memcpy is more
> > > natural than checking both the array length and the array existence
> > > before pfree.
> >
> > Anyway, according to the commit message of 46ab07ffda, POSIX forbids
> > memcpy(NULL, NULL, 0). It seems to me that it is the cause of the
> > false (or over) optimization.  So if we add some comment, it would be
> > for memcpy, not pfree..
>
> For clarity, I meant that I don't think we need that comment.
>

Fair enough. I think commit 46ab07ffda clearly explains why it is a
good idea to add a check as Sawada-San did in his latest version. I
also agree that we don't need any comment for this change.

-- 
With Regards,
Amit Kapila.



On Tue, Jul 19, 2022 at 1:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Jul 18, 2022 at 8:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sun, Jul 17, 2022 at 6:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Fri, Jul 15, 2022 at 3:32 PM shiy.fnst@fujitsu.com
> > > <shiy.fnst@fujitsu.com> wrote:
> > > >
> > >
> > > I've attached patches for all supported branches including the master.
> > >
> >
> > For back branch patches,
> > * Wouldn't it be better to move the purge logic into the
> > SnapBuildPurge* function for the sake of consistency?
>
> Agreed.
>
> > * Do we really need ReorderBufferInitialXactsSetCatalogChanges()?
> > Can't we instead have a function similar to
> > SnapBuildXidHasCatalogChanges() as we have for the master branch? That
> > will avoid calling it when the snapshot
> > state is SNAPBUILD_START or SNAPBUILD_BUILDING_SNAPSHOT
>
> Seems a good idea. We would need to pass the information about
> (parsed->xinfo & XACT_XINFO_HAS_INVALS) to the function but probably
> we can change ReorderBufferXidHasCatalogChanges() so that it checks
> the RBTXN_HAS_CATALOG_CHANGES flag and then the initial running xacts
> array.
>

Let's try to keep this as much similar to the master branch patch as possible.

> BTW on backbranches, I think that the reason why we add
> initial_running_xacts stuff to ReorderBuffer is that we cannot modify
> SnapBuild that could be serialized. Can we add a (private) array for
> the initial running xacts in snapbuild.c instead of adding new
> variables to ReorderBuffer?
>

While thinking about this, I wonder if the current patch for back
branches can lead to an ABI break as it changes the exposed structure?
If so, it may be another reason to change it to some other way
probably as you are suggesting.

-- 
With Regards,
Amit Kapila.



On Tue, Jul 19, 2022 at 2:01 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Jul 19, 2022 at 4:35 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
>
> >
> >
> > +       Assert((xcnt > 0) && (xcnt == rb->catchange_ntxns));
> >
> > (xcnt > 0) is obvious here (otherwise means dlist_foreach is broken..).
> > (xcnt == rb->catchange_ntxns) is not what should be checked here. The
> > assert just requires that catchange_txns and catchange_ntxns are
> > consistent so it should be checked just after dlist_empty.. I think.
> >
>
> If we want to check if catchange_txns and catchange_ntxns are
> consistent, should we check (xcnt == rb->catchange_ntxns) as well, no?
> This function requires the caller to use rb->catchange_ntxns as the
> length of the returned array. I think this assertion ensures that the
> actual length of the array is consistent with the length we
> pre-calculated.
>

Right, so, I think it is better to keep this assertion but remove
(xcnt > 0) part as pointed out by Horiguchi-San.

-- 
With Regards,
Amit Kapila.



On Tue, Jul 19, 2022 at 9:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 19, 2022 at 1:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Jul 18, 2022 at 8:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Sun, Jul 17, 2022 at 6:29 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > > > On Fri, Jul 15, 2022 at 3:32 PM shiy.fnst@fujitsu.com
> > > > <shiy.fnst@fujitsu.com> wrote:
> > > > >
> > > >
> > > > I've attached patches for all supported branches including the master.
> > > >
> > >
> > > For back branch patches,
> > > * Wouldn't it be better to move the purge logic into the
> > > SnapBuildPurge* function for the sake of consistency?
> >
> > Agreed.
> >
> > > * Do we really need ReorderBufferInitialXactsSetCatalogChanges()?
> > > Can't we instead have a function similar to
> > > SnapBuildXidHasCatalogChanges() as we have for the master branch? That
> > > will avoid calling it when the snapshot
> > > state is SNAPBUILD_START or SNAPBUILD_BUILDING_SNAPSHOT
> >
> > Seems a good idea. We would need to pass the information about
> > (parsed->xinfo & XACT_XINFO_HAS_INVALS) to the function but probably
> > we can change ReorderBufferXidHasCatalogChanges() so that it checks
> > the RBTXN_HAS_CATALOG_CHANGES flag and then the initial running xacts
> > array.
> >
>
> Let's try to keep this as much similar to the master branch patch as possible.
>
> > BTW on backbranches, I think that the reason why we add
> > initial_running_xacts stuff to ReorderBuffer is that we cannot modify
> > SnapBuild that could be serialized. Can we add a (private) array for
> > the initial running xacts in snapbuild.c instead of adding new
> > variables to ReorderBuffer?
> >
>
> While thinking about this, I wonder if the current patch for back
> branches can lead to an ABI break as it changes the exposed structure?
> If so, it may be another reason to change it to some other way
> probably as you are suggesting.

Yeah, it changes the size of ReorderBuffer, which is not good.
Changing the function names and arguments would also break ABI. So
probably we cannot pursue the above idea of removing
ReorderBufferInitialXactsSetCatalogChanges() either.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Tue, Jul 19, 2022 at 7:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Jul 19, 2022 at 9:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jul 19, 2022 at 1:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> >
> > > BTW on backbranches, I think that the reason why we add
> > > initial_running_xacts stuff to ReorderBuffer is that we cannot modify
> > > SnapBuild that could be serialized. Can we add a (private) array for
> > > the initial running xacts in snapbuild.c instead of adding new
> > > variables to ReorderBuffer?
> > >
> >
> > While thinking about this, I wonder if the current patch for back
> > branches can lead to an ABI break as it changes the exposed structure?
> > If so, it may be another reason to change it to some other way
> > probably as you are suggesting.
>
> Yeah, it changes the size of ReorderBuffer, which is not good.
>

So, are you planning to give your idea of making a private array for
the initial running xacts a try? I am not sure, but I guess you are
proposing to add it to the SnapBuild structure; if so, that seems safe
as that structure is not exposed.

> Changing the function names and arguments would also break ABI. So
> probably we cannot do the above idea of removing
> ReorderBufferInitialXactsSetCatalogChanges() as well.
>

Why do you think we can't remove
ReorderBufferInitialXactsSetCatalogChanges() from the back branch
patch? I think we don't need to change the existing function
ReorderBufferXidHasCatalogChanges() but instead can have a wrapper
like SnapBuildXidHasCatalogChanges(), similar to the master branch patch.

-- 
With Regards,
Amit Kapila.



On Wed, Jul 20, 2022 at 12:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 19, 2022 at 7:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Tue, Jul 19, 2022 at 9:25 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Jul 19, 2022 at 1:10 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > >
> > > > BTW on backbranches, I think that the reason why we add
> > > > initial_running_xacts stuff to ReorderBuffer is that we cannot modify
> > > > SnapBuild that could be serialized. Can we add a (private) array for
> > > > the initial running xacts in snapbuild.c instead of adding new
> > > > variables to ReorderBuffer?
> > > >
> > >
> > > While thinking about this, I wonder if the current patch for back
> > > branches can lead to an ABI break as it changes the exposed structure?
> > > If so, it may be another reason to change it to some other way
> > > probably as you are suggesting.
> >
> > Yeah, it changes the size of ReorderBuffer, which is not good.
> >
>
> So, are you planning to give a try with your idea of making a private
> array for the initial running xacts?

Yes.

>  I am not sure but I guess you are
> proposing to add it in SnapBuild structure, if so, that seems safe as
> that structure is not exposed.

We cannot add it to SnapBuild as it gets serialized to disk.

>
> > Changing the function names and arguments would also break ABI. So
> > probably we cannot do the above idea of removing
> > ReorderBufferInitialXactsSetCatalogChanges() as well.
> >
>
> Why do you think we can't remove
> ReorderBufferInitialXactsSetCatalogChanges() from the back branch
> patch? I think we don't need to change the existing function
> ReorderBufferXidHasCatalogChanges() but instead can have a wrapper
> like SnapBuildXidHasCatalogChanges() similar to master branch patch.

IIUC we need to change SnapBuildCommitTxn() but it's exposed.

Currently, the call chain is DecodeCommit() -> SnapBuildCommitTxn() ->
ReorderBufferXidHasCatalogChanges(). If we have a wrapper function, it
becomes DecodeCommit() -> SnapBuildCommitTxn() ->
SnapBuildXidHasCatalogChanges() ->
ReorderBufferXidHasCatalogChanges(). In
SnapBuildXidHasCatalogChanges(), we need to check if the transaction
has XACT_XINFO_HAS_INVALS, which means DecodeCommit() needs to pass
either parsed->xinfo or (parsed->xinfo & XACT_XINFO_HAS_INVALS != 0)
down to SnapBuildXidHasCatalogChanges(). However, since
SnapBuildCommitTxn(), between DecodeCommit() and
SnapBuildXidHasCatalogChanges(), is exposed we cannot change it.

Another idea would be to have functions, say
SnapBuildCommitTxnWithXInfo() and SnapBuildCommitTxn_ext(). The latter
does the actual work of handling transaction commits, and both
SnapBuildCommitTxn() and SnapBuildCommit() call
SnapBuildCommitTxnWithXInfo() with different arguments.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Wed, Jul 20, 2022 at 9:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Jul 20, 2022 at 12:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jul 19, 2022 at 7:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Why do you think we can't remove
> > ReorderBufferInitialXactsSetCatalogChanges() from the back branch
> > patch? I think we don't need to change the existing function
> > ReorderBufferXidHasCatalogChanges() but instead can have a wrapper
> > like SnapBuildXidHasCatalogChanges() similar to master branch patch.
>
> IIUC we need to change SnapBuildCommitTxn() but it's exposed.
>
> Currently, we call like DecodeCommit() -> SnapBuildCommitTxn() ->
> ReorderBufferXidHasCatalogChanges(). If we have a wrapper function, we
> call like DecodeCommit() -> SnapBuildCommitTxn() ->
> SnapBuildXidHasCatalogChanges() ->
> ReorderBufferXidHasCatalogChanges(). In
> SnapBuildXidHasCatalogChanges(), we need to check if the transaction
> has XACT_XINFO_HAS_INVALS, which means DecodeCommit() needs to pass
> either parsed->xinfo or (parsed->xinfo & XACT_XINFO_HAS_INVALS != 0)
> down to SnapBuildXidHasCatalogChanges(). However, since
> SnapBuildCommitTxn(), between DecodeCommit() and
> SnapBuildXidHasCatalogChanges(), is exposed we cannot change it.
>

Agreed.

> Another idea would be to have functions, say
> SnapBuildCommitTxnWithXInfo() and SnapBuildCommitTxn_ext(). The latter
> does actual work of handling transaction commits and both
> SnapBuildCommitTxn() and SnapBuildCommit() call
> SnapBuildCommitTxnWithXInfo() with different arguments.
>

Do you want to say DecodeCommit() instead of SnapBuildCommit() in the
above para? Yet another idea could be to have another flag,
RBTXN_HAS_INVALS, which will be set by DecodeCommit() for the top-level TXN.
Then, we can retrieve it even for each of the subtxns if and when
required.

-- 
With Regards,
Amit Kapila.



On Wed, Jul 20, 2022 at 2:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jul 20, 2022 at 9:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Jul 20, 2022 at 12:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Jul 19, 2022 at 7:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > Why do you think we can't remove
> > > ReorderBufferInitialXactsSetCatalogChanges() from the back branch
> > > patch? I think we don't need to change the existing function
> > > ReorderBufferXidHasCatalogChanges() but instead can have a wrapper
> > > like SnapBuildXidHasCatalogChanges() similar to master branch patch.
> >
> > IIUC we need to change SnapBuildCommitTxn() but it's exposed.
> >
> > Currently, we call like DecodeCommit() -> SnapBuildCommitTxn() ->
> > ReorderBufferXidHasCatalogChanges(). If we have a wrapper function, we
> > call like DecodeCommit() -> SnapBuildCommitTxn() ->
> > SnapBuildXidHasCatalogChanges() ->
> > ReorderBufferXidHasCatalogChanges(). In
> > SnapBuildXidHasCatalogChanges(), we need to check if the transaction
> > has XACT_XINFO_HAS_INVALS, which means DecodeCommit() needs to pass
> > either parsed->xinfo or (parsed->xinfo & XACT_XINFO_HAS_INVALS != 0)
> > down to SnapBuildXidHasCatalogChanges(). However, since
> > SnapBuildCommitTxn(), between DecodeCommit() and
> > SnapBuildXidHasCatalogChanges(), is exposed we cannot change it.
> >
>
> Agreed.
>
> > Another idea would be to have functions, say
> > SnapBuildCommitTxnWithXInfo() and SnapBuildCommitTxn_ext(). The latter
> > does actual work of handling transaction commits and both
> > SnapBuildCommitTxn() and SnapBuildCommit() call
> > SnapBuildCommitTxnWithXInfo() with different arguments.
> >
>
> Do you want to say DecodeCommit() instead of SnapBuildCommit() in
> above para?

I meant that the call chain will be like DecodeCommit() ->
SnapBuildCommitTxnWithXInfo() -> SnapBuildCommitTxn_ext(has_invals =
true) -> SnapBuildXidHasCatalogChanges(has_invals = true) -> ... If
SnapBuildCommitTxn() gets called, it calls SnapBuildCommitTxn_ext()
with has_invals = false and behaves the same as before.
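
To make it concrete, here is a minimal sketch of the shape I have in mind
(the function names and the has_invals argument are placeholders, so this is
simplified from what the actual patch would look like):

static void
SnapBuildCommitTxn_ext(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
                       int nsubxacts, TransactionId *subxacts, bool has_invals)
{
    /* the existing SnapBuildCommitTxn() body, consulting has_invals where needed */
}

void
SnapBuildCommitTxn(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
                   int nsubxacts, TransactionId *subxacts)
{
    /* keep the old signature and behavior, so the ABI is unchanged */
    SnapBuildCommitTxn_ext(builder, lsn, xid, nsubxacts, subxacts, false);
}

void
SnapBuildCommitTxnWithXInfo(SnapBuild *builder, XLogRecPtr lsn, TransactionId xid,
                            int nsubxacts, TransactionId *subxacts, uint32 xinfo)
{
    SnapBuildCommitTxn_ext(builder, lsn, xid, nsubxacts, subxacts,
                           (xinfo & XACT_XINFO_HAS_INVALS) != 0);
}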

> Yet another idea could be to have another flag
> RBTXN_HAS_INVALS which will be set by DecodeCommit for top-level TXN.
> Then, we can retrieve it even for each of the subtxn's if and when
> required.

Do you mean that when checking if the subtransaction has catalog
changes, we check if its top-level XID has this new flag? Why do we
need the new flag?

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Wed, Jul 20, 2022 at 1:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Jul 20, 2022 at 2:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Jul 20, 2022 at 9:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > > Another idea would be to have functions, say
> > > SnapBuildCommitTxnWithXInfo() and SnapBuildCommitTxn_ext(). The latter
> > > does actual work of handling transaction commits and both
> > > SnapBuildCommitTxn() and SnapBuildCommit() call
> > > SnapBuildCommitTxnWithXInfo() with different arguments.
> > >
> >
> > Do you want to say DecodeCommit() instead of SnapBuildCommit() in
> > above para?
>
> I meant that we will call like DecodeCommit() ->
> SnapBuildCommitTxnWithXInfo() -> SnapBuildCommitTxn_ext(has_invals =
> true) -> SnapBuildXidHasCatalogChanges(has_invals = true) -> ... If
> SnapBuildCommitTxn() gets called, it calls SnapBuildCommitTxn_ext()
> with has_invals = false and behaves the same as before.
>

Okay, understood. This will work.

> > Yet another idea could be to have another flag
> > RBTXN_HAS_INVALS which will be set by DecodeCommit for top-level TXN.
> > Then, we can retrieve it even for each of the subtxn's if and when
> > required.
>
> Do you mean that when checking if the subtransaction has catalog
> changes, we check if its top-level XID has this new flag?
>

Yes.

> Why do we
> need the new flag?
>

This is required if we don't want to introduce a new set of functions
as you proposed above. I am not sure which one is better w.r.t.
back-patching effort later, but it seems to me the flag approach would
make future back patches easier if we make any changes in
SnapBuildCommitTxn().

-- 
With Regards,
Amit Kapila.



On Wed, Jul 20, 2022 at 5:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jul 20, 2022 at 1:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Jul 20, 2022 at 2:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Jul 20, 2022 at 9:01 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > > Another idea would be to have functions, say
> > > > SnapBuildCommitTxnWithXInfo() and SnapBuildCommitTxn_ext(). The latter
> > > > does actual work of handling transaction commits and both
> > > > SnapBuildCommitTxn() and SnapBuildCommit() call
> > > > SnapBuildCommitTxnWithXInfo() with different arguments.
> > > >
> > >
> > > Do you want to say DecodeCommit() instead of SnapBuildCommit() in
> > > above para?
> >
> > I meant that we will call like DecodeCommit() ->
> > SnapBuildCommitTxnWithXInfo() -> SnapBuildCommitTxn_ext(has_invals =
> > true) -> SnapBuildXidHasCatalogChanges(has_invals = true) -> ... If
> > SnapBuildCommitTxn() gets called, it calls SnapBuildCommitTxn_ext()
> > with has_invals = false and behaves the same as before.
> >
>
> Okay, understood. This will work.
>
> > > Yet another idea could be to have another flag
> > > RBTXN_HAS_INVALS which will be set by DecodeCommit for top-level TXN.
> > > Then, we can retrieve it even for each of the subtxn's if and when
> > > required.
> >
> > Do you mean that when checking if the subtransaction has catalog
> > changes, we check if its top-level XID has this new flag?
> >
>
> Yes.
>
> > Why do we
> > need the new flag?
> >
>
> This is required if we don't want to introduce a new set of functions
> as you proposed above. I am not sure which one is better w.r.t back
> patching effort later but it seems to me using flag stuff would make
> future back patches easier if we make any changes in
> SnapBuildCommitTxn.

Understood.

I've implemented this idea as well for discussion. Both patches have
the common change to remember the initial running transactions and to
purge them when decoding xl_running_xacts records. The difference is
how to mark the transactions as needing to be added to the snapshot.

In v7-0001-Fix-catalog-lookup-with-the-wrong-snapshot-during.patch,
when the transaction is in the initial running xact list and its
commit record has the XINFO_HAS_INVAL flag, we mark both the top
transaction and all its subtransactions as containing catalog changes
(which also means creating ReorderBufferTXN entries for them). These
transactions are added to the snapshot in SnapBuildCommitTxn() since
ReorderBufferXidHasCatalogChanges() returns true for them.

In poc_mark_top_txn_has_inval.patch, when the transaction is in the
initial running xacts list and its commit record has the XINFO_HAS_INVALS
flag, we set a new flag, say RBTXN_COMMIT_HAS_INVALS, only on the top
transaction. In SnapBuildCommitTxn(), we add all subtransactions to
the snapshot without checking ReorderBufferXidHasCatalogChanges() for
them if their top transaction has the RBTXN_COMMIT_HAS_INVALS flag.

A difference between the two ideas is the scope of changes: the former
changes only snapbuild.c but the latter changes both snapbuild.c and
reorderbuffer.c. Moreover, while the former uses the existing flag,
the latter adds a new flag to the reorder buffer for dealing with only
this case. I think the former idea is simpler in that respect. But an
advantage of the latter idea is that it avoids creating
ReorderBufferTXN entries for subtransactions.

Overall I prefer the former for now but I'd like to hear what others think.

FWIW, I didn't try the idea of adding wrapper functions since it would
be costly in terms of back patching effort in the future.
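
For reference, the core of the former idea has roughly this shape (a
simplified sketch; the actual patch adds more checks and comments):

void
SnapBuildInitialXactSetCatalogChanges(SnapBuild *builder, TransactionId xid,
                                      int subxcnt, TransactionId *subxacts,
                                      XLogRecPtr lsn)
{
    int     i;

    /* Nothing to do if the transaction was not running when decoding started */
    if (NInitialRunningXacts == 0 ||
        bsearch(&xid, InitialRunningXacts, NInitialRunningXacts,
                sizeof(TransactionId), xidComparator) == NULL)
        return;

    /* Mark the top transaction and all of its subtransactions */
    ReorderBufferXidSetCatalogChanges(builder->reorder, xid, lsn);
    for (i = 0; i < subxcnt; i++)
        ReorderBufferXidSetCatalogChanges(builder->reorder, subxacts[i], lsn);
}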

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment
On Fri, Jul 22, 2022 at 11:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Jul 20, 2022 at 5:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Jul 20, 2022 at 1:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> >
> > This is required if we don't want to introduce a new set of functions
> > as you proposed above. I am not sure which one is better w.r.t back
> > patching effort later but it seems to me using flag stuff would make
> > future back patches easier if we make any changes in
> > SnapBuildCommitTxn.
>
> Understood.
>
> I've implemented this idea as well for discussion. Both patches have
> the common change to remember the initial running transactions and to
> purge them when decoding xl_running_xacts records. The difference is
> how to mark the transactions as needing to be added to the snapshot.
>
> In v7-0001-Fix-catalog-lookup-with-the-wrong-snapshot-during.patch,
> when the transaction is in the initial running xact list and its
> commit record has XINFO_HAS_INVAL flag, we mark both the top
> transaction and its all subtransactions as containing catalog changes
> (which also means to create ReorderBufferTXN entries for them). These
> transactions are added to the snapshot in SnapBuildCommitTxn() since
> ReorderBufferXidHasCatalogChanges () for them returns true.
>
> In poc_mark_top_txn_has_inval.patch, when the transaction is in the
> initial running xacts list and its commit record has XINFO_HAS_INVALS
> flag, we set a new flag, say RBTXN_COMMIT_HAS_INVALS, only to the top
> transaction.
>

It seems that the patch misses the part that checks whether the xid is
in the initial running xacts list?

> In SnapBuildCommitTxn(), we add all subtransactions to
> the snapshot without checking ReorderBufferXidHasCatalogChanges() for
> subtransactions if its top transaction has the RBTXN_COMMIT_HAS_INVALS
> flag.
>
> A difference between the two ideas is the scope of changes: the former
> changes only snapbuild.c but the latter changes both snapbuild.c and
> reorderbuffer.c. Moreover, while the former uses the existing flag,
> the latter adds a new flag to the reorder buffer for dealing with only
> this case. I think the former idea is simpler in terms of that. But,
> an advantage of the latter idea is that the latter idea can save to
> create ReorderBufferTXN entries for subtransactions.
>
> Overall I prefer the former for now but I'd like to hear what others think.
>

I agree that the latter idea can have better performance in extremely
special scenarios, but introducing a new flag for this sounds a bit
ugly to me. So, I would also prefer to go with the former idea;
however, I would also like to hear what Horiguchi-San and others have
to say.

A few comments on v7-0001-Fix-catalog-lookup-with-the-wrong-snapshot-during:
1.
+void
+SnapBuildInitialXactSetCatalogChanges(SnapBuild *builder, TransactionId xid,
+   int subxcnt, TransactionId *subxacts,
+   XLogRecPtr lsn)
+{

I think it is better to name this function SnapBuildXIDSetCatalogChanges,
as we use it to mark a particular transaction as having catalog changes.

2. Changed/added a few comments in the attached.

-- 
With Regards,
Amit Kapila.

Attachment
On Sat, Jul 23, 2022 at 8:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jul 22, 2022 at 11:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Jul 20, 2022 at 5:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Wed, Jul 20, 2022 at 1:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> > >
> > > This is required if we don't want to introduce a new set of functions
> > > as you proposed above. I am not sure which one is better w.r.t back
> > > patching effort later but it seems to me using flag stuff would make
> > > future back patches easier if we make any changes in
> > > SnapBuildCommitTxn.
> >
> > Understood.
> >
> > I've implemented this idea as well for discussion. Both patches have
> > the common change to remember the initial running transactions and to
> > purge them when decoding xl_running_xacts records. The difference is
> > how to mark the transactions as needing to be added to the snapshot.
> >
> > In v7-0001-Fix-catalog-lookup-with-the-wrong-snapshot-during.patch,
> > when the transaction is in the initial running xact list and its
> > commit record has XINFO_HAS_INVAL flag, we mark both the top
> > transaction and its all subtransactions as containing catalog changes
> > (which also means to create ReorderBufferTXN entries for them). These
> > transactions are added to the snapshot in SnapBuildCommitTxn() since
> > ReorderBufferXidHasCatalogChanges () for them returns true.
> >
> > In poc_mark_top_txn_has_inval.patch, when the transaction is in the
> > initial running xacts list and its commit record has XINFO_HAS_INVALS
> > flag, we set a new flag, say RBTXN_COMMIT_HAS_INVALS, only to the top
> > transaction.
> >
>
> It seems that the patch has missed the part to check if the xid is in
> the initial running xacts list?

Oops, right.

>
> > In SnapBuildCommitTxn(), we add all subtransactions to
> > the snapshot without checking ReorderBufferXidHasCatalogChanges() for
> > subtransactions if its top transaction has the RBTXN_COMMIT_HAS_INVALS
> > flag.
> >
> > A difference between the two ideas is the scope of changes: the former
> > changes only snapbuild.c but the latter changes both snapbuild.c and
> > reorderbuffer.c. Moreover, while the former uses the existing flag,
> > the latter adds a new flag to the reorder buffer for dealing with only
> > this case. I think the former idea is simpler in terms of that. But,
> > an advantage of the latter idea is that the latter idea can save to
> > create ReorderBufferTXN entries for subtransactions.
> >
> > Overall I prefer the former for now but I'd like to hear what others think.
> >
>
> I agree that the latter idea can have better performance in extremely
> special scenarios but introducing a new flag for the same sounds a bit
> ugly to me. So, I would also prefer to go with the former idea,
> however, I would also like to hear what Horiguchi-San and others have
> to say.

Agreed.

>
> Few comments on v7-0001-Fix-catalog-lookup-with-the-wrong-snapshot-during:
> 1.
> +void
> +SnapBuildInitialXactSetCatalogChanges(SnapBuild *builder, TransactionId xid,
> +   int subxcnt, TransactionId *subxacts,
> +   XLogRecPtr lsn)
> +{
>
> I think it is better to name this function as
> SnapBuildXIDSetCatalogChanges as we use this to mark a particular
> transaction as having catalog changes.
>
> 2. Changed/added a few comments in the attached.

Thank you for the comments.

I've attached updated version patches for the master and back branches.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment
On Mon, Jul 25, 2022 at 10:45 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, Jul 23, 2022 at 8:32 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Jul 22, 2022 at 11:48 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Wed, Jul 20, 2022 at 5:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Wed, Jul 20, 2022 at 1:28 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > >
> > > >
> > > > This is required if we don't want to introduce a new set of functions
> > > > as you proposed above. I am not sure which one is better w.r.t back
> > > > patching effort later but it seems to me using flag stuff would make
> > > > future back patches easier if we make any changes in
> > > > SnapBuildCommitTxn.
> > >
> > > Understood.
> > >
> > > I've implemented this idea as well for discussion. Both patches have
> > > the common change to remember the initial running transactions and to
> > > purge them when decoding xl_running_xacts records. The difference is
> > > how to mark the transactions as needing to be added to the snapshot.
> > >
> > > In v7-0001-Fix-catalog-lookup-with-the-wrong-snapshot-during.patch,
> > > when the transaction is in the initial running xact list and its
> > > commit record has XINFO_HAS_INVAL flag, we mark both the top
> > > transaction and its all subtransactions as containing catalog changes
> > > (which also means to create ReorderBufferTXN entries for them). These
> > > transactions are added to the snapshot in SnapBuildCommitTxn() since
> > > ReorderBufferXidHasCatalogChanges () for them returns true.
> > >
> > > In poc_mark_top_txn_has_inval.patch, when the transaction is in the
> > > initial running xacts list and its commit record has XINFO_HAS_INVALS
> > > flag, we set a new flag, say RBTXN_COMMIT_HAS_INVALS, only to the top
> > > transaction.
> > >
> >
> > It seems that the patch has missed the part to check if the xid is in
> > the initial running xacts list?
>
> Oops, right.
>
> >
> > > In SnapBuildCommitTxn(), we add all subtransactions to
> > > the snapshot without checking ReorderBufferXidHasCatalogChanges() for
> > > subtransactions if its top transaction has the RBTXN_COMMIT_HAS_INVALS
> > > flag.
> > >
> > > A difference between the two ideas is the scope of changes: the former
> > > changes only snapbuild.c but the latter changes both snapbuild.c and
> > > reorderbuffer.c. Moreover, while the former uses the existing flag,
> > > the latter adds a new flag to the reorder buffer for dealing with only
> > > this case. I think the former idea is simpler in terms of that. But,
> > > an advantage of the latter idea is that the latter idea can save to
> > > create ReorderBufferTXN entries for subtransactions.
> > >
> > > Overall I prefer the former for now but I'd like to hear what others think.
> > >
> >
> > I agree that the latter idea can have better performance in extremely
> > special scenarios but introducing a new flag for the same sounds a bit
> > ugly to me. So, I would also prefer to go with the former idea,
> > however, I would also like to hear what Horiguchi-San and others have
> > to say.
>
> Agreed.
>
> >
> > Few comments on v7-0001-Fix-catalog-lookup-with-the-wrong-snapshot-during:
> > 1.
> > +void
> > +SnapBuildInitialXactSetCatalogChanges(SnapBuild *builder, TransactionId xid,
> > +   int subxcnt, TransactionId *subxacts,
> > +   XLogRecPtr lsn)
> > +{
> >
> > I think it is better to name this function as
> > SnapBuildXIDSetCatalogChanges as we use this to mark a particular
> > transaction as having catalog changes.
> >
> > 2. Changed/added a few comments in the attached.
>
> Thank you for the comments.
>
> I've attached updated version patches for the master and back branches.

I've attached the patch for REL15 that I forgot to attach.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment
Hi,

I did some performance tests for the master branch patch (based on the v6 patch)
to see if the bsearch() added by this patch causes any overhead.

I ran each test three times and took the average.

The results are as follows; the bar chart is attached.

case 1
---------
No catalog modifying transaction.
Decode 800k pgbench transactions. (8 clients, 100k transactions per client)

master      7.5417
patched     7.4107

case 2
---------
There's one catalog modifying transaction.
Decode 100k/500k/1M transactions.

            100k        500k        1M
master      0.0576      0.1491      0.4346
patched     0.0586      0.1500      0.4344

case 3
---------
There are 64 catalog modifying transactions.
Decode 100k/500k/1M transactions.

            100k        500k        1M
master      0.0600      0.1666      0.4876
patched     0.0620      0.1653      0.4795

(Because the result of case 3 shows an overhead of about 3% when decoding 100k
transactions with 64 catalog modifying transactions, I tested the next run of
100k xacts with or without catalog modifying transactions, to see if it affects
subsequent decoding.)

case 4.1
---------
After the test steps in case 3 (64 catalog modifying transactions, decode 100k
transactions), run 100k xacts and then decode.

master      0.3699
patched     0.3701

case 4.2
---------
After the test steps in case 3 (64 catalog modifying transactions, decode 100k
transactions), run 64 DDLs(without checkpoint) and 100k xacts, then decode.

master      0.3687
patched     0.3696

Summary of the tests:
After applying this patch, there is an overhead of about 3% when decoding 100k
transactions with 64 catalog modifying transactions. This is an extreme case,
so maybe it's okay. Cases 4.1 and 4.2 show that the patch has no effect on
subsequent decoding. In the other cases, there are no significant differences.

For your information, here are the parameters specified in postgresql.conf in
the test.

shared_buffers = 8GB
checkpoint_timeout = 30min
max_wal_size = 20GB
min_wal_size = 10GB
autovacuum = off

Regards,
Shi yu

Attachment
On Mon, Jul 25, 2022 at 7:57 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
> Hi,
>
> I did some performance test for the master branch patch (based on v6 patch) to
> see if the bsearch() added by this patch will cause any overhead.

Thank you for doing performance tests!

>
> I tested them three times and took the average.
>
> The results are as follows, and attach the bar chart.
>
> case 1
> ---------
> No catalog modifying transaction.
> Decode 800k pgbench transactions. (8 clients, 100k transactions per client)
>
> master      7.5417
> patched     7.4107
>
> case 2
> ---------
> There's one catalog modifying transaction.
> Decode 100k/500k/1M transactions.
>
>             100k        500k        1M
> master      0.0576      0.1491      0.4346
> patched     0.0586      0.1500      0.4344
>
> case 3
> ---------
> There are 64 catalog modifying transactions.
> Decode 100k/500k/1M transactions.
>
>             100k        500k        1M
> master      0.0600      0.1666      0.4876
> patched     0.0620      0.1653      0.4795
>
> (Because the result of case 3 shows that there is a overhead of about 3% in the
> case decoding 100k transactions with 64 catalog modifying transactions, I
> tested the next run of 100k xacts with or without catalog modifying
> transactions, to see if it affects subsequent decoding.)
>
> case 4.1
> ---------
> After the test steps in case 3 (64 catalog modifying transactions, decode 100k
> transactions), run 100k xacts and then decode.
>
> master      0.3699
> patched     0.3701
>
> case 4.2
> ---------
> After the test steps in case 3 (64 catalog modifying transactions, decode 100k
> transactions), run 64 DDLs(without checkpoint) and 100k xacts, then decode.
>
> master      0.3687
> patched     0.3696
>
> Summary of the tests:
> After applying this patch, there is a overhead of about 3% in the case decoding
> 100k transactions with 64 catalog modifying transactions. This is an extreme
> case, so maybe it's okay.

Yes. If we're worried about the overhead and doing bsearch() is the
cause, probably we can try simplehash instead of the array.

An improvement idea is that we pass parsed->xinfo down to
SnapBuildXidHasCatalogChanges(), and then return from that function
before doing bsearch() if parsed->xinfo doesn't have
XACT_XINFO_HAS_INVALS. That would save calling bsearch() for
non-catalog-modifying transactions. Is it worth trying?
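
Something along these lines (a rough sketch; the field names follow my patch
and the final shape may differ):

static bool
SnapBuildXidHasCatalogChanges(SnapBuild *builder, TransactionId xid,
                              uint32 xinfo)
{
    if (ReorderBufferXidHasCatalogChanges(builder->reorder, xid))
        return true;

    /*
     * A transaction that modified the catalog must have invalidation
     * messages, so without XACT_XINFO_HAS_INVALS we can skip searching
     * the catalog-change XIDs restored from the serialized snapshot.
     */
    if (!(xinfo & XACT_XINFO_HAS_INVALS))
        return false;

    return bsearch(&xid, builder->catchange.xip, builder->catchange.xcnt,
                   sizeof(TransactionId), xidComparator) != NULL;
}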

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Tue, Jul 26, 2022 at 7:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Jul 25, 2022 at 7:57 PM shiy.fnst@fujitsu.com
> <shiy.fnst@fujitsu.com> wrote:
> >
> > Hi,
> >
> > I did some performance test for the master branch patch (based on v6 patch) to
> > see if the bsearch() added by this patch will cause any overhead.
>
> Thank you for doing performance tests!
>
> >
> > I tested them three times and took the average.
> >
> > The results are as follows, and attach the bar chart.
> >
> > case 1
> > ---------
> > No catalog modifying transaction.
> > Decode 800k pgbench transactions. (8 clients, 100k transactions per client)
> >
> > master      7.5417
> > patched     7.4107
> >
> > case 2
> > ---------
> > There's one catalog modifying transaction.
> > Decode 100k/500k/1M transactions.
> >
> >             100k        500k        1M
> > master      0.0576      0.1491      0.4346
> > patched     0.0586      0.1500      0.4344
> >
> > case 3
> > ---------
> > There are 64 catalog modifying transactions.
> > Decode 100k/500k/1M transactions.
> >
> >             100k        500k        1M
> > master      0.0600      0.1666      0.4876
> > patched     0.0620      0.1653      0.4795
> >
> > (Because the result of case 3 shows that there is a overhead of about 3% in the
> > case decoding 100k transactions with 64 catalog modifying transactions, I
> > tested the next run of 100k xacts with or without catalog modifying
> > transactions, to see if it affects subsequent decoding.)
> >
> > case 4.1
> > ---------
> > After the test steps in case 3 (64 catalog modifying transactions, decode 100k
> > transactions), run 100k xacts and then decode.
> >
> > master      0.3699
> > patched     0.3701
> >
> > case 4.2
> > ---------
> > After the test steps in case 3 (64 catalog modifying transactions, decode 100k
> > transactions), run 64 DDLs(without checkpoint) and 100k xacts, then decode.
> >
> > master      0.3687
> > patched     0.3696
> >
> > Summary of the tests:
> > After applying this patch, there is a overhead of about 3% in the case decoding
> > 100k transactions with 64 catalog modifying transactions. This is an extreme
> > case, so maybe it's okay.
>
> Yes. If we're worried about the overhead and doing bsearch() is the
> cause, probably we can try simplehash instead of the array.
>

I am not sure if we need to go that far for this extreme corner
case. Let's first try your idea below.

> An improvement idea is that we pass the parsed->xinfo down to
> SnapBuildXidHasCatalogChanges(), and then return from that function
> before doing bearch() if the parsed->xinfo doesn't have
> XACT_XINFO_HAS_INVALS. That would save calling bsearch() for
> non-catalog-modifying transactions. Is it worth trying?
>

I think this is worth trying, and it might also reduce some of the
overhead in the case presented by Shi-San.

-- 
With Regards,
Amit Kapila.



On Tue, Jul 26, 2022 at 2:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 26, 2022 at 7:00 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Jul 25, 2022 at 7:57 PM shiy.fnst@fujitsu.com
> > <shiy.fnst@fujitsu.com> wrote:
> > >
> > > Hi,
> > >
> > > I did some performance test for the master branch patch (based on v6 patch) to
> > > see if the bsearch() added by this patch will cause any overhead.
> >
> > Thank you for doing performance tests!
> >
> > >
> > > I tested them three times and took the average.
> > >
> > > The results are as follows, and attach the bar chart.
> > >
> > > case 1
> > > ---------
> > > No catalog modifying transaction.
> > > Decode 800k pgbench transactions. (8 clients, 100k transactions per client)
> > >
> > > master      7.5417
> > > patched     7.4107
> > >
> > > case 2
> > > ---------
> > > There's one catalog modifying transaction.
> > > Decode 100k/500k/1M transactions.
> > >
> > >             100k        500k        1M
> > > master      0.0576      0.1491      0.4346
> > > patched     0.0586      0.1500      0.4344
> > >
> > > case 3
> > > ---------
> > > There are 64 catalog modifying transactions.
> > > Decode 100k/500k/1M transactions.
> > >
> > >             100k        500k        1M
> > > master      0.0600      0.1666      0.4876
> > > patched     0.0620      0.1653      0.4795
> > >
> > > (Because the result of case 3 shows that there is a overhead of about 3% in the
> > > case decoding 100k transactions with 64 catalog modifying transactions, I
> > > tested the next run of 100k xacts with or without catalog modifying
> > > transactions, to see if it affects subsequent decoding.)
> > >
> > > case 4.1
> > > ---------
> > > After the test steps in case 3 (64 catalog modifying transactions, decode 100k
> > > transactions), run 100k xacts and then decode.
> > >
> > > master      0.3699
> > > patched     0.3701
> > >
> > > case 4.2
> > > ---------
> > > After the test steps in case 3 (64 catalog modifying transactions, decode 100k
> > > transactions), run 64 DDLs(without checkpoint) and 100k xacts, then decode.
> > >
> > > master      0.3687
> > > patched     0.3696
> > >
> > > Summary of the tests:
> > > After applying this patch, there is a overhead of about 3% in the case decoding
> > > 100k transactions with 64 catalog modifying transactions. This is an extreme
> > > case, so maybe it's okay.
> >
> > Yes. If we're worried about the overhead and doing bsearch() is the
> > cause, probably we can try simplehash instead of the array.
> >
>
> I am not sure if we need to go that far for this extremely corner
> case. Let's first try your below idea.
>
> > An improvement idea is that we pass the parsed->xinfo down to
> > SnapBuildXidHasCatalogChanges(), and then return from that function
> > before doing bearch() if the parsed->xinfo doesn't have
> > XACT_XINFO_HAS_INVALS. That would save calling bsearch() for
> > non-catalog-modifying transactions. Is it worth trying?
> >
>
> I think this is worth trying and this might reduce some of the
> overhead as well in the case presented by Shi-San.

Okay, I've attached an updated patch that implements the above idea. Could
you please do the performance tests again to see if the idea can help
reduce the overhead, Shi yu?

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment
On Tue, Jul 26, 2022 3:52 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> 
> On Tue, Jul 26, 2022 at 2:18 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> >
> > On Tue, Jul 26, 2022 at 7:00 AM Masahiko Sawada
> <sawada.mshk@gmail.com> wrote:
> > >
> > > On Mon, Jul 25, 2022 at 7:57 PM shiy.fnst@fujitsu.com
> > > <shiy.fnst@fujitsu.com> wrote:
> > > >
> > > >
> > > > case 3
> > > > ---------
> > > > There are 64 catalog modifying transactions.
> > > > Decode 100k/500k/1M transactions.
> > > >
> > > >             100k        500k        1M
> > > > master      0.0600      0.1666      0.4876
> > > > patched     0.0620      0.1653      0.4795
> > > >
> > > >
> > > > Summary of the tests:
> > > > After applying this patch, there is a overhead of about 3% in the case
> decoding
> > > > 100k transactions with 64 catalog modifying transactions. This is an
> extreme
> > > > case, so maybe it's okay.
> > >
> > > Yes. If we're worried about the overhead and doing bsearch() is the
> > > cause, probably we can try simplehash instead of the array.
> > >
> >
> > I am not sure if we need to go that far for this extremely corner
> > case. Let's first try your below idea.
> >
> > > An improvement idea is that we pass the parsed->xinfo down to
> > > SnapBuildXidHasCatalogChanges(), and then return from that function
> > > before doing bearch() if the parsed->xinfo doesn't have
> > > XACT_XINFO_HAS_INVALS. That would save calling bsearch() for
> > > non-catalog-modifying transactions. Is it worth trying?
> > >
> >
> > I think this is worth trying and this might reduce some of the
> > overhead as well in the case presented by Shi-San.
> 
> Okay, I've attached an updated patch that does the above idea. Could
> you please do the performance tests again to see if the idea can help
> reduce the overhead, Shi yu?
> 

Thanks for your improvement. I have tested the case that showed overhead before
(decoding 100k transactions with 64 catalog modifying transactions) with the v9
patch; the result is as follows.

master      0.0607
patched     0.0613

There's almost no difference compared with master (less than 1%), which looks
good to me.

Regards,
Shi yu

On Mon, Jul 25, 2022 at 11:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Jul 25, 2022 at 10:45 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I've attached the patch for REl15 that I forgot.
>

I feel the place to remember running xacts information,
SnapBuildProcessRunningXacts, is not appropriate, because in cases
where there are no running xacts or when xl_running_xact is old enough
that we can't use it, we don't need that information. I feel we need
it only when we have to reuse an already serialized snapshot, so
won't it be better to initialize it at that place in
SnapBuildFindSnapshot()? I have changed it accordingly in the attached
and, apart from that, slightly modified the comments and commit message.
Do let me know what you think of the attached?

-- 
With Regards,
Amit Kapila.

Attachment
On Wed, Jul 27, 2022 at 8:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Jul 25, 2022 at 11:26 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Jul 25, 2022 at 10:45 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I've attached the patch for REl15 that I forgot.
> >
>
> I feel the place to remember running xacts information in
> SnapBuildProcessRunningXacts is not appropriate. Because in cases
> where there are no running xacts or when xl_running_xact is old enough
> that we can't use it, we don't need that information. I feel we need
> it only when we have to reuse the already serialized snapshot, so,
> won't it be better to initialize at that place in
> SnapBuildFindSnapshot()?

Good point, agreed.

>  I have changed accordingly in the attached
> and apart from that slightly modified the comments and commit message.
> Do let me know what you think of the attached?

Wouldn't it be better to remember the initial running xacts after
SnapBuildRestore() returns true? Otherwise, we could end up
allocating InitialRunningXacts multiple times while leaking the old
ones if there are no serialized snapshots that we are interested in.

---
+               if (builder->state == SNAPBUILD_START)
+               {
+                       int             nxacts = running->subxcnt + running->xcnt;
+                       Size            sz = sizeof(TransactionId) * nxacts;
+
+                       NInitialRunningXacts = nxacts;
+                       InitialRunningXacts = MemoryContextAlloc(builder->context, sz);
+                       memcpy(InitialRunningXacts, running->xids, sz);
+                       qsort(InitialRunningXacts, nxacts,
+                             sizeof(TransactionId), xidComparator);
+               }

We should allocate the memory for InitialRunningXacts only when
(running->subxcnt + running->xcnt) > 0.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Thu, Jul 28, 2022 at 7:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Jul 27, 2022 at 8:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> >  I have changed accordingly in the attached
> > and apart from that slightly modified the comments and commit message.
> > Do let me know what you think of the attached?
>
> It would be better to remember the initial running xacts after
> SnapBuildRestore() returns true? Because otherwise, we could end up
> allocating InitialRunningXacts multiple times while leaking the old
> ones if there are no serialized snapshots that we are interested in.
>

Right, this makes sense. But note that you can then no longer have the check
(builder->state == SNAPBUILD_START), which I believe is not required anyway.
We need to do this after restore, in whichever state the snapshot was, as
any state other than SNAPBUILD_CONSISTENT can have commits without all
their changes.

Accordingly, I think the comment: "Remember the transactions and
subtransactions that were running when xl_running_xacts record that we
decoded first was written." needs to be slightly modified to something
like: "Remember the transactions and subtransactions that were running
when xl_running_xacts record that we decoded was written.". Change
this if it is used at any other place in the patch.

> ---
> +               if (builder->state == SNAPBUILD_START)
> +               {
> +                       int                     nxacts =
> running->subxcnt + running->xcnt;
> +                       Size            sz = sizeof(TransactionId) * nxacts;
> +
> +                       NInitialRunningXacts = nxacts;
> +                       InitialRunningXacts =
> MemoryContextAlloc(builder->context, sz);
> +                       memcpy(InitialRunningXacts, running->xids, sz);
> +                       qsort(InitialRunningXacts, nxacts,
> sizeof(TransactionId), xidComparator);
> +               }
>
> We should allocate the memory for InitialRunningXacts only when
> (running->subxcnt + running->xcnt) > 0.
>

There is no harm in doing that but ideally, that case would have been
covered by an earlier check "if (running->oldestRunningXid ==
running->nextXid)" which suggests "No transactions were running, so we
can jump to consistent."

Kindly make the required changes and submit the back branch patches again.

-- 
With Regards,
Amit Kapila.



On Thu, Jul 28, 2022 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jul 28, 2022 at 7:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Wed, Jul 27, 2022 at 8:33 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> >
> > >  I have changed accordingly in the attached
> > > and apart from that slightly modified the comments and commit message.
> > > Do let me know what you think of the attached?
> >
> > It would be better to remember the initial running xacts after
> > SnapBuildRestore() returns true? Because otherwise, we could end up
> > allocating InitialRunningXacts multiple times while leaking the old
> > ones if there are no serialized snapshots that we are interested in.
> >
>
> Right, this makes sense. But note that you can no longer have a check
> (builder->state == SNAPBUILD_START) which I believe is not required.
> We need to do this after restore, in whichever state snapshot was as
> any state other than SNAPBUILD_CONSISTENT can have commits without all
> their changes.

Right.

>
> Accordingly, I think the comment: "Remember the transactions and
> subtransactions that were running when xl_running_xacts record that we
> decoded first was written." needs to be slightly modified to something
> like: "Remember the transactions and subtransactions that were running
> when xl_running_xacts record that we decoded was written.". Change
> this if it is used at any other place in the patch.

Agreed.

>
> > ---
> > +               if (builder->state == SNAPBUILD_START)
> > +               {
> > +                       int                     nxacts =
> > running->subxcnt + running->xcnt;
> > +                       Size            sz = sizeof(TransactionId) * nxacts;
> > +
> > +                       NInitialRunningXacts = nxacts;
> > +                       InitialRunningXacts =
> > MemoryContextAlloc(builder->context, sz);
> > +                       memcpy(InitialRunningXacts, running->xids, sz);
> > +                       qsort(InitialRunningXacts, nxacts,
> > sizeof(TransactionId), xidComparator);
> > +               }
> >
> > We should allocate the memory for InitialRunningXacts only when
> > (running->subxcnt + running->xcnt) > 0.
> >
>
> There is no harm in doing that but ideally, that case would have been
> covered by an earlier check "if (running->oldestRunningXid ==
> running->nextXid)" which suggests "No transactions were running, so we
> can jump to consistent."

You're right.

While editing the back branch patches, I realized that the following
two checks, (parsed->xinfo & XACT_XINFO_HAS_INVALS) and (parsed->nmsgs > 0),
are equivalent:

+   /*
+    * If the COMMIT record has invalidation messages, it could have catalog
+    * changes. It is possible that we didn't mark this transaction as
+    * containing catalog changes when the decoding starts from a commit
+    * record without decoding the transaction's other changes. So, we ensure
+    * to mark such transactions as containing catalog change.
+    *
+    * This must be done before SnapBuildCommitTxn() so that we can include
+    * these transactions in the historic snapshot.
+    */
+   if (parsed->xinfo & XACT_XINFO_HAS_INVALS)
+       SnapBuildXidSetCatalogChanges(ctx->snapshot_builder, xid,
+                                     parsed->nsubxacts, parsed->subxacts,
+                                     buf->origptr);
+
    /*
     * Process invalidation messages, even if we're not interested in the
     * transaction's contents, since the various caches need to always be
     * consistent.
     */
    if (parsed->nmsgs > 0)
    {
        if (!ctx->fast_forward)
            ReorderBufferAddInvalidations(ctx->reorder, xid, buf->origptr,
                                          parsed->nmsgs, parsed->msgs);
        ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
    }

If that's right, I think we can merge these if branches. We can call
ReorderBufferXidSetCatalogChanges() for the top-txn, and in
SnapBuildXidSetCatalogChanges() we mark its subtransactions if the top-txn
is in the list. What do you think?
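
In other words, roughly like this in DecodeCommit() (a sketch reusing the code
quoted above; SnapBuildXidSetCatalogChanges() would mark the subtransactions
only if the top-txn is in the initial running xacts list):

    if (parsed->nmsgs > 0)
    {
        if (!ctx->fast_forward)
            ReorderBufferAddInvalidations(ctx->reorder, xid, buf->origptr,
                                          parsed->nmsgs, parsed->msgs);
        ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);

        /* also mark the subxacts if xid was running when decoding started */
        SnapBuildXidSetCatalogChanges(ctx->snapshot_builder, xid,
                                      parsed->nsubxacts, parsed->subxacts,
                                      buf->origptr);
    }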

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Thu, Jul 28, 2022 at 11:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Jul 28, 2022 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Jul 28, 2022 at 7:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
>
> While editing back branch patches, I realized that the following
> (parsed->xinfo & XACT_XINFO_HAS_INVALS) and (parsed->nmsgs > 0) are
> equivalent:
>
> +   /*
> +    * If the COMMIT record has invalidation messages, it could have catalog
> +    * changes. It is possible that we didn't mark this transaction as
> +    * containing catalog changes when the decoding starts from a commit
> +    * record without decoding the transaction's other changes. So, we ensure
> +    * to mark such transactions as containing catalog change.
> +    *
> +    * This must be done before SnapBuildCommitTxn() so that we can include
> +    * these transactions in the historic snapshot.
> +    */
> +   if (parsed->xinfo & XACT_XINFO_HAS_INVALS)
> +       SnapBuildXidSetCatalogChanges(ctx->snapshot_builder, xid,
> +                                     parsed->nsubxacts, parsed->subxacts,
> +                                     buf->origptr);
> +
>     /*
>      * Process invalidation messages, even if we're not interested in the
>      * transaction's contents, since the various caches need to always be
>      * consistent.
>      */
>     if (parsed->nmsgs > 0)
>     {
>         if (!ctx->fast_forward)
>             ReorderBufferAddInvalidations(ctx->reorder, xid, buf->origptr,
>                                           parsed->nmsgs, parsed->msgs);
>         ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
>     }
>
> If that's right, I think we can merge these if branches. We can call
> ReorderBufferXidSetCatalogChanges() for top-txn and in
> SnapBuildXidSetCatalogChanges() we mark its subtransactions if top-txn
> is in the list. What do you think?
>

Note that this code doesn't exist in 14 and 15, so we need to create
different patches for those. BTW, in 13 and lower versions, how did we
identify and mark subxacts as having catalog changes without our
patch?

-- 
With Regards,
Amit Kapila.



On Thu, Jul 28, 2022 at 4:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jul 28, 2022 at 11:56 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Jul 28, 2022 at 12:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Jul 28, 2022 at 7:18 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > >
> >
> > While editing back branch patches, I realized that the following
> > (parsed->xinfo & XACT_XINFO_HAS_INVALS) and (parsed->nmsgs > 0) are
> > equivalent:
> >
> > +   /*
> > +    * If the COMMIT record has invalidation messages, it could have catalog
> > +    * changes. It is possible that we didn't mark this transaction as
> > +    * containing catalog changes when the decoding starts from a commit
> > +    * record without decoding the transaction's other changes. So, we ensure
> > +    * to mark such transactions as containing catalog change.
> > +    *
> > +    * This must be done before SnapBuildCommitTxn() so that we can include
> > +    * these transactions in the historic snapshot.
> > +    */
> > +   if (parsed->xinfo & XACT_XINFO_HAS_INVALS)
> > +       SnapBuildXidSetCatalogChanges(ctx->snapshot_builder, xid,
> > +                                     parsed->nsubxacts, parsed->subxacts,
> > +                                     buf->origptr);
> > +
> >     /*
> >      * Process invalidation messages, even if we're not interested in the
> >      * transaction's contents, since the various caches need to always be
> >      * consistent.
> >      */
> >     if (parsed->nmsgs > 0)
> >     {
> >         if (!ctx->fast_forward)
> >             ReorderBufferAddInvalidations(ctx->reorder, xid, buf->origptr,
> >                                           parsed->nmsgs, parsed->msgs);
> >         ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
> >     }
> >
> > If that's right, I think we can merge these if branches. We can call
> > ReorderBufferXidSetCatalogChanges() for top-txn and in
> > SnapBuildXidSetCatalogChanges() we mark its subtransactions if top-txn
> > is in the list. What do you think?
> >
>
> Note that this code doesn't exist in 14 and 15, so we need to create
> different patches for those.

Right.

> BTW, how in 13 and lower versions did we
> identify and mark subxacts as having catalog changes without our
> patch?

I think we use HEAP_INPLACE and XLOG_HEAP2_NEW_CID to mark subxacts as well.
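
For example, the XLOG_HEAP2_NEW_CID path goes through something like the
following (from memory, simplified); the xid here is the xid of the
(sub)transaction that wrote the record, so a subxact gets marked directly:

void
SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
                       XLogRecPtr lsn, xl_heap_new_cid *xlrec)
{
    /*
     * A new_cid record is only written when a catalog tuple is modified,
     * so mark the writing (sub)transaction as containing catalog changes.
     */
    ReorderBufferXidSetCatalogChanges(builder->reorder, xid, lsn);

    ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
                                 xlrec->target_node, xlrec->target_tid,
                                 xlrec->cmin, xlrec->cmax, xlrec->combocid);

    /* ... it then also derives and records the new command id (omitted here) */
}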

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Thu, Jul 28, 2022 at 12:56 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Jul 28, 2022 at 4:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > >
> > > While editing back branch patches, I realized that the following
> > > (parsed->xinfo & XACT_XINFO_HAS_INVALS) and (parsed->nmsgs > 0) are
> > > equivalent:
> > >
> > > +   /*
> > > +    * If the COMMIT record has invalidation messages, it could have catalog
> > > +    * changes. It is possible that we didn't mark this transaction as
> > > +    * containing catalog changes when the decoding starts from a commit
> > > +    * record without decoding the transaction's other changes. So, we ensure
> > > +    * to mark such transactions as containing catalog change.
> > > +    *
> > > +    * This must be done before SnapBuildCommitTxn() so that we can include
> > > +    * these transactions in the historic snapshot.
> > > +    */
> > > +   if (parsed->xinfo & XACT_XINFO_HAS_INVALS)
> > > +       SnapBuildXidSetCatalogChanges(ctx->snapshot_builder, xid,
> > > +                                     parsed->nsubxacts, parsed->subxacts,
> > > +                                     buf->origptr);
> > > +
> > >     /*
> > >      * Process invalidation messages, even if we're not interested in the
> > >      * transaction's contents, since the various caches need to always be
> > >      * consistent.
> > >      */
> > >     if (parsed->nmsgs > 0)
> > >     {
> > >         if (!ctx->fast_forward)
> > >             ReorderBufferAddInvalidations(ctx->reorder, xid, buf->origptr,
> > >                                           parsed->nmsgs, parsed->msgs);
> > >         ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
> > >     }
> > >
> > > If that's right, I think we can merge these if branches. We can call
> > > ReorderBufferXidSetCatalogChanges() for top-txn and in
> > > SnapBuildXidSetCatalogChanges() we mark its subtransactions if top-txn
> > > is in the list. What do you think?
> > >
> >
> > Note that this code doesn't exist in 14 and 15, so we need to create
> > different patches for those.
>
> Right.
>

Okay, then this sounds reasonable to me.

-- 
With Regards,
Amit Kapila.



On Tue, Jul 26, 2022 at 1:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> Okay, I've attached an updated patch that does the above idea. Could
> you please do the performance tests again to see if the idea can help
> reduce the overhead, Shi yu?
>

While reviewing the patch for HEAD, I have changed a few comments. See
attached; if you agree with these changes, then include them in the
next version.

-- 
With Regards,
Amit Kapila.

Attachment
On Thu, Jul 28, 2022 at 3:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 26, 2022 at 1:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Okay, I've attached an updated patch that does the above idea. Could
> > you please do the performance tests again to see if the idea can help
> > reduce the overhead, Shi yu?
> >
>
> While reviewing the patch for HEAD, I have changed a few comments. See
> attached, if you agree with these changes then include them in the
> next version.
>

I have another comment on this patch:
SnapBuildPurgeOlderTxn()
{
...
+ if (surviving_xids > 0)
+     memmove(builder->catchange.xip, &(builder->catchange.xip[off]),
+             surviving_xids * sizeof(TransactionId))
...
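
(For context, the intent in the patch is that 'off' points at the first
entry of the XID-sorted catchange.xip array that is not yet old enough to
be purged, i.e. not older than builder->xmin, so the code around the
memmove is roughly:

    surviving_xids = builder->catchange.xcnt - off;
    if (surviving_xids > 0)
        memmove(builder->catchange.xip, &(builder->catchange.xip[off]),
                surviving_xids * sizeof(TransactionId));
    builder->catchange.xcnt = surviving_xids;

keeping only the entries that must survive the purge.)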

For this code to hit, we must have a situation where one or more of
the xacts in this array must be still running. And, if that is true,
we would not have started from the restart point where the
corresponding snapshot (that contains the still running xacts) has
been serialized because we advance the restart point to not before the
oldest running xacts restart_decoding_lsn. This may not be easy to
understand so let me take an example to explain. Say we have two
transactions t1 and t2, and both have made catalog changes. We want a
situation where one of those gets purged and the other remains in
builder->catchange.xip array. I have tried variants of the below
sequence to see if I can get into the required situation but am not
able to make it.

Session-1
Checkpoint -1;
T1
DDL

Session-2
T2
DDL

Session-3
Checkpoint-2;
pg_logical_slot_get_changes()
 -- Here when we serialize the snapshot corresponding to
CHECKPOINT-2's running_xact record, we will serialize both t1 and t2
as catalog-changing xacts.

Session-1
T1
Commit;

Checkpoint;
pg_logical_slot_get_changes()
 -- Here we will restore from Checkpoint-1's serialized snapshot and
won't be able to move restart_point to Checkpoint-2 because T2 is
still open.

Now, as per my understanding, it is only possible to move
restart_point to Checkpoint-2 if T2 gets committed/rolled-back in
which case we will never have that in surviving_xids array after the
purge.

It is possible I am missing something here. Do let me know your thoughts.

-- 
With Regards,
Amit Kapila.



On Thu, Jul 28, 2022 at 8:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jul 28, 2022 at 3:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Jul 26, 2022 at 1:22 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > Okay, I've attached an updated patch that does the above idea. Could
> > > you please do the performance tests again to see if the idea can help
> > > reduce the overhead, Shi yu?
> > >
> >
> > While reviewing the patch for HEAD, I have changed a few comments. See
> > attached, if you agree with these changes then include them in the
> > next version.
> >
>
> I have another comment on this patch:
> SnapBuildPurgeOlderTxn()
> {
> ...
> + if (surviving_xids > 0)
> + memmove(builder->catchange.xip, &(builder->catchange.xip[off]),
> + surviving_xids * sizeof(TransactionId))
> ...
>
> For this code to hit, we must have a situation where one or more of
> the xacts in this array must be still running. And, if that is true,
> we would not have started from the restart point where the
> corresponding snapshot (that contains the still running xacts) has
> been serialized because we advance the restart point to not before the
> oldest running xacts restart_decoding_lsn. This may not be easy to
> understand so let me take an example to explain. Say we have two
> transactions t1 and t2, and both have made catalog changes. We want a
> situation where one of those gets purged and the other remains in
> builder->catchange.xip array. I have tried variants of the below
> sequence to see if I can get into the required situation but am not
> able to make it.
>
> Session-1
> Checkpoint -1;
> T1
> DDL
>
> Session-2
> T2
> DDL
>
> Session-3
> Checkpoint-2;
> pg_logical_slot_get_changes()
>  -- Here when we serialize the snapshot corresponding to
> CHECKPOINT-2's running_xact record, we will serialize both t1 and t2
> as catalog-changing xacts.
>
> Session-1
> T1
> Commit;
>
> Checkpoint;
> pg_logical_slot_get_changes()
>  -- Here we will restore from Checkpoint-1's serialized snapshot and
> won't be able to move restart_point to Checkpoint-2 because T2 is
> still open.
>
> Now, as per my understanding, it is only possible to move
> restart_point to Checkpoint-2 if T2 gets committed/rolled-back in
> which case we will never have that in surviving_xids array after the
> purge.
>
> It is possible I am missing something here. Do let me know your thoughts.

Yeah, your description makes sense to me. I've also considered how to
hit this path but I guess it is never hit. Thinking of it in another
way, first of all, at least 2 catalog modifying transactions have to
be running while writing a xl_running_xacts. The serialized snapshot
that is written when we decode the first xl_running_xact has two
transactions. Then, one of them is committed before the second
xl_running_xacts. The second serialized snapshot has only one
transaction. Then, the transaction is also committed after that. Now,
in order to execute the path, we need to start decoding from the first
serialized snapshot. However, if we start from there, we cannot decode
the full contents of the transaction that was committed later.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Fri, Jul 29, 2022 at 5:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Thu, Jul 28, 2022 at 8:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Jul 28, 2022 at 3:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > I have another comment on this patch:
> > SnapBuildPurgeOlderTxn()
> > {
> > ...
> > + if (surviving_xids > 0)
> > + memmove(builder->catchange.xip, &(builder->catchange.xip[off]),
> > + surviving_xids * sizeof(TransactionId))
> > ...
> >
> > For this code to hit, we must have a situation where one or more of
> > the xacts in this array must be still running. And, if that is true,
> > we would not have started from the restart point where the
> > corresponding snapshot (that contains the still running xacts) has
> > been serialized because we advance the restart point to not before the
> > oldest running xacts restart_decoding_lsn. This may not be easy to
> > understand so let me take an example to explain. Say we have two
> > transactions t1 and t2, and both have made catalog changes. We want a
> > situation where one of those gets purged and the other remains in
> > builder->catchange.xip array. I have tried variants of the below
> > sequence to see if I can get into the required situation but am not
> > able to make it.
> >
> > Session-1
> > Checkpoint -1;
> > T1
> > DDL
> >
> > Session-2
> > T2
> > DDL
> >
> > Session-3
> > Checkpoint-2;
> > pg_logical_slot_get_changes()
> >  -- Here when we serialize the snapshot corresponding to
> > CHECKPOINT-2's running_xact record, we will serialize both t1 and t2
> > as catalog-changing xacts.
> >
> > Session-1
> > T1
> > Commit;
> >
> > Checkpoint;
> > pg_logical_slot_get_changes()
> >  -- Here we will restore from Checkpoint-1's serialized snapshot and
> > won't be able to move restart_point to Checkpoint-2 because T2 is
> > still open.
> >
> > Now, as per my understanding, it is only possible to move
> > restart_point to Checkpoint-2 if T2 gets committed/rolled-back in
> > which case we will never have that in surviving_xids array after the
> > purge.
> >
> > It is possible I am missing something here. Do let me know your thoughts.
>
> Yeah, your description makes sense to me. I've also considered how to
> hit this path but I guess it is never hit. Thinking of it in another
> way, first of all, at least 2 catalog modifying transactions have to
> be running while writing a xl_running_xacts. The serialized snapshot
> that is written when we decode the first xl_running_xact has two
> transactions. Then, one of them is committed before the second
> xl_running_xacts. The second serialized snapshot has only one
> transaction. Then, the transaction is also committed after that. Now,
> in order to execute the path, we need to start decoding from the first
> serialized snapshot. However, if we start from there, we cannot decode
> the full contents of the transaction that was committed later.
>

I think then we should change this code in the master branch patch
with an additional comment on the lines of: "Either all the xacts got
purged or none. It is only possible to partially remove the xids from
this array if one or more of the xids are still running but not all.
That can happen if we start decoding from a point (LSN where the
snapshot state became consistent) where all the xacts in this were
running and then at least one of those got committed and a few are
still running. We will never start from such a point because we won't
move the slot's restart_lsn past the point where the oldest running
transaction's restart_decoding_lsn is."
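
In code, the simplification would roughly amount to dropping the partial
memmove and acting on the all-or-none assumption, e.g. something like (a
sketch of the idea only, not the exact wording of the change):

    /* catchange.xip is sorted, so checking the newest entry is enough */
    if (builder->catchange.xcnt > 0 &&
        NormalTransactionIdPrecedes(builder->catchange.xip[builder->catchange.xcnt - 1],
                                    builder->xmin))
    {
        /* every tracked xid is old enough, so discard the whole array */
        pfree(builder->catchange.xip);
        builder->catchange.xip = NULL;
        builder->catchange.xcnt = 0;
    }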

I suggest keeping the back branches as they are w.r.t. this change, so
that if this logic proves to be faulty it won't affect the stable
branches. We can always back-patch this small change if required.

-- 
With Regards,
Amit Kapila.



On Fri, Jul 29, 2022 at 3:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jul 29, 2022 at 5:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Thu, Jul 28, 2022 at 8:57 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Jul 28, 2022 at 3:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > I have another comment on this patch:
> > > SnapBuildPurgeOlderTxn()
> > > {
> > > ...
> > > + if (surviving_xids > 0)
> > > + memmove(builder->catchange.xip, &(builder->catchange.xip[off]),
> > > + surviving_xids * sizeof(TransactionId))
> > > ...
> > >
> > > For this code to hit, we must have a situation where one or more of
> > > the xacts in this array must be still running. And, if that is true,
> > > we would not have started from the restart point where the
> > > corresponding snapshot (that contains the still running xacts) has
> > > been serialized because we advance the restart point to not before the
> > > oldest running xacts restart_decoding_lsn. This may not be easy to
> > > understand so let me take an example to explain. Say we have two
> > > transactions t1 and t2, and both have made catalog changes. We want a
> > > situation where one of those gets purged and the other remains in
> > > builder->catchange.xip array. I have tried variants of the below
> > > sequence to see if I can get into the required situation but am not
> > > able to make it.
> > >
> > > Session-1
> > > Checkpoint -1;
> > > T1
> > > DDL
> > >
> > > Session-2
> > > T2
> > > DDL
> > >
> > > Session-3
> > > Checkpoint-2;
> > > pg_logical_slot_get_changes()
> > >  -- Here when we serialize the snapshot corresponding to
> > > CHECKPOINT-2's running_xact record, we will serialize both t1 and t2
> > > as catalog-changing xacts.
> > >
> > > Session-1
> > > T1
> > > Commit;
> > >
> > > Checkpoint;
> > > pg_logical_slot_get_changes()
> > >  -- Here we will restore from Checkpoint-1's serialized snapshot and
> > > won't be able to move restart_point to Checkpoint-2 because T2 is
> > > still open.
> > >
> > > Now, as per my understanding, it is only possible to move
> > > restart_point to Checkpoint-2 if T2 gets committed/rolled-back in
> > > which case we will never have that in surviving_xids array after the
> > > purge.
> > >
> > > It is possible I am missing something here. Do let me know your thoughts.
> >
> > Yeah, your description makes sense to me. I've also considered how to
> > hit this path but I guess it is never hit. Thinking of it in another
> > way, first of all, at least 2 catalog modifying transactions have to
> > be running while writing a xl_running_xacts. The serialized snapshot
> > that is written when we decode the first xl_running_xact has two
> > transactions. Then, one of them is committed before the second
> > xl_running_xacts. The second serialized snapshot has only one
> > transaction. Then, the transaction is also committed after that. Now,
> > in order to execute the path, we need to start decoding from the first
> > serialized snapshot. However, if we start from there, we cannot decode
> > the full contents of the transaction that was committed later.
> >
>
> I think then we should change this code in the master branch patch
> with an additional comment on the lines of: "Either all the xacts got
> purged or none. It is only possible to partially remove the xids from
> this array if one or more of the xids are still running but not all.
> That can happen if we start decoding from a point (LSN where the
> snapshot state became consistent) where all the xacts in this were
> running and then at least one of those got committed and a few are
> still running. We will never start from such a point because we won't
> move the slot's restart_lsn past the point where the oldest running
> transaction's restart_decoding_lsn is."

Agreed.

>
> I suggest keeping the back branch as it is w.r.t this change as if
> this logic proves to be faulty it won't affect the stable branches. We
> can always back-patch this small change if required.

Yes, during the PG16 release cycle, we will have time to evaluate
whether the approach in the master branch is correct. We can always
back-patch that part.

I've attached updated patches for all branches. Please review them.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment
On Mon, Aug 1, 2022 at 7:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Fri, Jul 29, 2022 at 3:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
>
> I've attached updated patches for all branches. Please review them.
>

Thanks, the patches look mostly good to me. I have made minor edits by
removing 'likely' from a few places as those don't seem to be adding
much value, changed comments at a few places, and was getting
a compilation error in v11/10 (snapbuild.c:2111:3: error: ‘for’ loop
initial declarations are only allowed in C99 mode), which I have fixed.
See attached, unless there are major comments/suggestions, I am
planning to push this day after tomorrow (by Wednesday) after another
pass.
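
(The v11/10 fix is just the usual pre-C99 adjustment of declaring the loop
variable outside the loop, roughly (variable names here are illustrative):

    int     i;
    ...
    for (i = 0; i < nxacts; i++)
        ...

instead of "for (int i = 0; ...)".)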

--
With Regards,
Amit Kapila.

Attachment
At Mon, 1 Aug 2022 20:01:00 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in 
> On Mon, Aug 1, 2022 at 7:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Fri, Jul 29, 2022 at 3:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> >
> > I've attached updated patches for all branches. Please review them.
> >
> 
> Thanks, the patches look mostly good to me. I have made minor edits by
> removing 'likely' from a few places as those don't seem to be adding
> much value, changed comments at a few places, and was getting
> compilation in error in v11/10 (snapbuild.c:2111:3: error: ‘for’ loop
> initial declarations are only allowed in C99 mode) which I have fixed.
> See attached, unless there are major comments/suggestions, I am
> planning to push this day after tomorrow (by Wednesday) after another
> pass.

master:
+ * Read the contents of the serialized snapshot to the dest.

Do we need the "the" before the "dest"?

+    {
+        int            save_errno = errno;
+
+        CloseTransientFile(fd);
+
+        if (readBytes < 0)
+        {
+            errno = save_errno;
+            ereport(ERROR,

Do we need the CloseTransientFile(fd) there?  This call requires errno
to be remembered, but OpenTransientFile'd files are closed anyway at
transaction end.  Also, CloseTransientFile() is not called before
erroring out in other places.


+     * from the LSN-ordered list of toplevel TXNs. We remove TXN from the list

We remove "the" TXN"?

+    if (dlist_is_empty(&rb->catchange_txns))
+    {
+        Assert(rb->catchange_ntxns == 0);
+        return NULL;
+    }

It seems that the assert is far simpler than dlist_is_empty().  Why
don't we swap the conditions for if() and Assert() in the above?
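
That is, something like (just to illustrate the suggestion):

    if (rb->catchange_ntxns == 0)
    {
        Assert(dlist_is_empty(&rb->catchange_txns));
        return NULL;
    }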

+     * the oldest running transaction窶冱 restart_decoding_lsn is.

The line contains broken characters.


+     * Either all the xacts got purged or none. It is only possible to
+     * partially remove the xids from this array if one or more of the xids
+     * are still running but not all. That can happen if we start decoding

Assuming this, the comment below seems to be getting stale.

+     * catalog. We remove xids from this array when they become old enough to
+     * matter, and then it eventually becomes empty.

"We discard this array when the all containing xids are gone. See
SnapBuildPurgeOlderTxn for details." or something like?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



On Tue, Aug 2, 2022 at 12:00 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Mon, 1 Aug 2022 20:01:00 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> > On Mon, Aug 1, 2022 at 7:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Fri, Jul 29, 2022 at 3:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > >
> > > I've attached updated patches for all branches. Please review them.
> > >
> >
> > Thanks, the patches look mostly good to me. I have made minor edits by
> > removing 'likely' from a few places as those don't seem to be adding
> > much value, changed comments at a few places, and was getting
> > compilation in error in v11/10 (snapbuild.c:2111:3: error: ‘for’ loop
> > initial declarations are only allowed in C99 mode) which I have fixed.
> > See attached, unless there are major comments/suggestions, I am
> > planning to push this day after tomorrow (by Wednesday) after another
> > pass.
>
>
> +       {
> +               int                     save_errno = errno;
> +
> +               CloseTransientFile(fd);
> +
> +               if (readBytes < 0)
> +               {
> +                       errno = save_errno;
> +                       ereport(ERROR,
>
> Do we need the CloseTransientFile(fd) there?  This call requires errno
> to be remembered but anyway OpenTransientFile'd files are to be close
> at transaction end.  Actually CloseTransientFile() is not called
> before error'ing-out at error in other places.
>

But this part of the code is just a copy of the existing code. See:

- if (readBytes != sizeof(SnapBuild))
- {
- int save_errno = errno;
-
- CloseTransientFile(fd);
-
- if (readBytes < 0)
- {
- errno = save_errno;
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not read file \"%s\": %m", path)));
- }
- else
- ereport(ERROR,
- (errcode(ERRCODE_DATA_CORRUPTED),
- errmsg("could not read file \"%s\": read %d of %zu",
- path, readBytes, sizeof(SnapBuild))));
- }

We just moved it to a separate function as the same code is being
duplicated to multiple places.

--
With Regards,
Amit Kapila.



On Mon, Aug 1, 2022 10:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Mon, Aug 1, 2022 at 7:46 AM Masahiko Sawada
> <sawada.mshk@gmail.com> wrote:
> >
> > On Fri, Jul 29, 2022 at 3:45 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> > >
> >
> > I've attached updated patches for all branches. Please review them.
> >
> 
> Thanks, the patches look mostly good to me. I have made minor edits by
> removing 'likely' from a few places as those don't seem to be adding
> much value, changed comments at a few places, and was getting
> compilation in error in v11/10 (snapbuild.c:2111:3: error: ‘for’ loop
> initial declarations are only allowed in C99 mode) which I have fixed.
> See attached, unless there are major comments/suggestions, I am
> planning to push this day after tomorrow (by Wednesday) after another
> pass.
> 

Thanks for updating the patch.

Here are some minor comments:

1.
patches for REL10 ~ REL13:
+ * Mark the transaction as containing catalog changes. In addition, if the
+ * given xid is in the list of the initial running xacts, we mark the
+ * its subtransactions as well. See comments for NInitialRunningXacts and
+ * InitialRunningXacts for additional info.

"mark the its subtransactions"
->
"mark its subtransactions"

2.
patches for REL10 ~ REL15:
In the comment in catalog_change_snapshot.spec, maybe we can use "RUNNING_XACTS"
instead of "RUNNING_XACT" "XACT_RUNNING", same as the patch for master branch.

Regards,
Shi yu


On Tue, Aug 2, 2022 at 3:30 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Mon, 1 Aug 2022 20:01:00 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
> > On Mon, Aug 1, 2022 at 7:46 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > On Fri, Jul 29, 2022 at 3:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > >
> > > I've attached updated patches for all branches. Please review them.
> > >
> >
> > Thanks, the patches look mostly good to me. I have made minor edits by
> > removing 'likely' from a few places as those don't seem to be adding
> > much value, changed comments at a few places, and was getting
> > compilation in error in v11/10 (snapbuild.c:2111:3: error: ‘for’ loop
> > initial declarations are only allowed in C99 mode) which I have fixed.
> > See attached, unless there are major comments/suggestions, I am
> > planning to push this day after tomorrow (by Wednesday) after another
> > pass.
>
> master:
> + * Read the contents of the serialized snapshot to the dest.
>
> Do we need the "the" before the "dest"?

Fixed.

>
> +       {
> +               int                     save_errno = errno;
> +
> +               CloseTransientFile(fd);
> +
> +               if (readBytes < 0)
> +               {
> +                       errno = save_errno;
> +                       ereport(ERROR,
>
> Do we need the CloseTransientFile(fd) there?  This call requires errno
> to be remembered but anyway OpenTransientFile'd files are to be close
> at transaction end.  Actually CloseTransientFile() is not called
> before error'ing-out at error in other places.

As Amit mentioned, it's just moved from SnapBuildRestore(). Looking at
other code in snapbuild.c, we call CloseTransientFile before erroring
out. I think it's better to keep it consistent with the nearby code.

>
>
> +        * from the LSN-ordered list of toplevel TXNs. We remove TXN from the list
>
> We remove "the" TXN"?

Fixed.

>
> +       if (dlist_is_empty(&rb->catchange_txns))
> +       {
> +               Assert(rb->catchange_ntxns == 0);
> +               return NULL;
> +       }
>
> It seems that the assert is far simpler than dlist_is_empty().  Why
> don't we swap the conditions for if() and Assert() in the above?

Changed.

>
> +        * the oldest running transaction窶冱 restart_decoding_lsn is.
>
> The line contains a broken characters.

Fixed.

>
>
> +        * Either all the xacts got purged or none. It is only possible to
> +        * partially remove the xids from this array if one or more of the xids
> +        * are still running but not all. That can happen if we start decoding
>
> Assuming this, the commment below seems getting stale.
>
> +        * catalog. We remove xids from this array when they become old enough to
> +        * matter, and then it eventually becomes empty.
>
> "We discard this array when the all containing xids are gone. See
> SnapBuildPurgeOlderTxn for details." or something like?

Changed to:

We discard this array when all the xids in the list become old enough
to matter. See SnapBuildPurgeOlderTxn for details.

I've attached updated patches that incorporated the above comments as
well as the comments from Shi yu. Please review them.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment
On Tue, Aug 2, 2022 at 5:31 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
> On Mon, Aug 1, 2022 10:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Aug 1, 2022 at 7:46 AM Masahiko Sawada
> > <sawada.mshk@gmail.com> wrote:
> > >
> > > On Fri, Jul 29, 2022 at 3:45 PM Amit Kapila <amit.kapila16@gmail.com>
> > wrote:
> > > >
> > >
> > > I've attached updated patches for all branches. Please review them.
> > >
> >
> > Thanks, the patches look mostly good to me. I have made minor edits by
> > removing 'likely' from a few places as those don't seem to be adding
> > much value, changed comments at a few places, and was getting
> > compilation in error in v11/10 (snapbuild.c:2111:3: error: ‘for’ loop
> > initial declarations are only allowed in C99 mode) which I have fixed.
> > See attached, unless there are major comments/suggestions, I am
> > planning to push this day after tomorrow (by Wednesday) after another
> > pass.
> >
>
> Thanks for updating the patch.
>
> Here are some minor comments:
>
> 1.
> patches for REL10 ~ REL13:
> + * Mark the transaction as containing catalog changes. In addition, if the
> + * given xid is in the list of the initial running xacts, we mark the
> + * its subtransactions as well. See comments for NInitialRunningXacts and
> + * InitialRunningXacts for additional info.
>
> "mark the its subtransactions"
> ->
> "mark its subtransactions"
>
> 2.
> patches for REL10 ~ REL15:
> In the comment in catalog_change_snapshot.spec, maybe we can use "RUNNING_XACTS"
> instead of "RUNNING_XACT" "XACT_RUNNING", same as the patch for master branch.
>

Thank you for the comments! These have been incorporated in the latest
version v12 patch I just submitted.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Wed, Aug 3, 2022 12:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> 
> I've attached updated patches that incorporated the above comments as
> well as the comments from Shi yu. Please review them.
> 

Thanks for updating the patch.

I noticed that in SnapBuildXidSetCatalogChanges(), "i" is initialized in the if
branch in REL10 patch, which is different from REL11 patch. Maybe we can modify
REL11 patch to be consistent with REL10 patch.

The rest of the patch looks good to me.

Regards,
Shi yu

On Wed, Aug 3, 2022 at 3:52 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
> On Wed, Aug 3, 2022 12:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > I've attached updated patches that incorporated the above comments as
> > well as the comments from Shi yu. Please review them.
> >
>
> Thanks for updating the patch.
>
> I noticed that in SnapBuildXidSetCatalogChanges(), "i" is initialized in the if
> branch in REL10 patch, which is different from REL11 patch. Maybe we can modify
> REL11 patch to be consistent with REL10 patch.
>
> The rest of the patch looks good to me.

Oops, thanks for pointing it out. I've fixed it and attached updated
patches for all branches so as not to confuse the patch version. There
is no update from v12 patch on REL12 - master patches.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Attachment
On Wed, Aug 3, 2022 at 1:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Aug 3, 2022 at 3:52 PM shiy.fnst@fujitsu.com
> <shiy.fnst@fujitsu.com> wrote:
> >
> > On Wed, Aug 3, 2022 12:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > I've attached updated patches that incorporated the above comments as
> > > well as the comments from Shi yu. Please review them.
> > >
> >
> > Thanks for updating the patch.
> >
> > I noticed that in SnapBuildXidSetCatalogChanges(), "i" is initialized in the if
> > branch in REL10 patch, which is different from REL11 patch. Maybe we can modify
> > REL11 patch to be consistent with REL10 patch.
> >
> > The rest of the patch looks good to me.
>
> Oops, thanks for pointing it out. I've fixed it and attached updated
> patches for all branches so as not to confuse the patch version. There
> is no update from v12 patch on REL12 - master patches.
>

Thanks for the updated patches, the changes look good to me.
Horiguchi-San, and others, do you have any further comments on this or
do you want to spend time in review of it? If not, I would like to
push this after the current minor version release.

-- 
With Regards,
Amit Kapila.



On Mon, Aug 8, 2022 at 9:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Aug 3, 2022 at 1:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> >
> > Oops, thanks for pointing it out. I've fixed it and attached updated
> > patches for all branches so as not to confuse the patch version. There
> > is no update from v12 patch on REL12 - master patches.
> >
>
> Thanks for the updated patches, the changes look good to me.
> Horiguchi-San, and others, do you have any further comments on this or
> do you want to spend time in review of it? If not, I would like to
> push this after the current minor version release.
>

Pushed.

-- 
With Regards,
Amit Kapila.



On Thu, Aug 11, 2022 at 3:10 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Aug 8, 2022 at 9:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Aug 3, 2022 at 1:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > >
> > > Oops, thanks for pointing it out. I've fixed it and attached updated
> > > patches for all branches so as not to confuse the patch version. There
> > > is no update from v12 patch on REL12 - master patches.
> > >
> >
> > Thanks for the updated patches, the changes look good to me.
> > Horiguchi-San, and others, do you have any further comments on this or
> > do you want to spend time in review of it? If not, I would like to
> > push this after the current minor version release.
> >
>
> Pushed.

Thank you!

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



Hi,

On 8/11/22 8:10 AM, Amit Kapila wrote:
> On Mon, Aug 8, 2022 at 9:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Wed, Aug 3, 2022 at 1:20 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>>>
>>> Oops, thanks for pointing it out. I've fixed it and attached updated
>>> patches for all branches so as not to confuse the patch version. There
>>> is no update from v12 patch on REL12 - master patches.
>>>
>> Thanks for the updated patches, the changes look good to me.
>> Horiguchi-San, and others, do you have any further comments on this or
>> do you want to spend time in review of it? If not, I would like to
>> push this after the current minor version release.
>>
> Pushed.

Thank you!

I just marked the corresponding CF entry [1] as committed.

[1]: https://commitfest.postgresql.org/39/3041/

Regards,

-- 

Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com




On Fri, Jul 29, 2022 at 12:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> >
> > Yeah, your description makes sense to me. I've also considered how to
> > hit this path but I guess it is never hit. Thinking of it in another
> > way, first of all, at least 2 catalog modifying transactions have to
> > be running while writing a xl_running_xacts. The serialized snapshot
> > that is written when we decode the first xl_running_xact has two
> > transactions. Then, one of them is committed before the second
> > xl_running_xacts. The second serialized snapshot has only one
> > transaction. Then, the transaction is also committed after that. Now,
> > in order to execute the path, we need to start decoding from the first
> > serialized snapshot. However, if we start from there, we cannot decode
> > the full contents of the transaction that was committed later.
> >
>
> I think then we should change this code in the master branch patch
> with an additional comment on the lines of: "Either all the xacts got
> purged or none. It is only possible to partially remove the xids from
> this array if one or more of the xids are still running but not all.
> That can happen if we start decoding from a point (LSN where the
> snapshot state became consistent) where all the xacts in this were
> running and then at least one of those got committed and a few are
> still running. We will never start from such a point because we won't
> move the slot's restart_lsn past the point where the oldest running
> transaction's restart_decoding_lsn is."
>

Unfortunately, this theory doesn't turn out to be true. While
investigating the latest buildfarm failure [1], I see that it is
possible that only part of the xacts in the restored catalog modifying
xacts list needs to be purged. See the attached where I have
demonstrated it via a reproducible test. It seems the point we were
missing was that to start from a point where two or more catalog
modifying xacts were serialized, it requires another open transaction
before both get committed, and then we need a checkpoint or some other
way to force a running_xacts record in between the commits of the
initial two catalog modifying xacts. There could possibly be other ways as well
but the theory above wasn't correct.
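
To spell out one such sequence (a rough sketch following the description
above, not the exact test in the attached patch):

Session-1
Checkpoint-1;
T1
DDL          -- catalog modifying, kept open

Session-2
T2
DDL          -- catalog modifying, kept open

Session-3
T3           -- any transaction, kept open

Session-4
Checkpoint-2;
pg_logical_slot_get_changes()
 -- serializes a snapshot for Checkpoint-2's running_xacts record, listing
    T1 and T2 as catalog modifying xacts

Session-1
T1
Commit;

Session-4
Checkpoint-3;
pg_logical_slot_get_changes()
 -- another running_xacts record in between the two commits; T3 being open
    keeps the restart point back before Checkpoint-2

Session-2
T2
Commit;

Session-3
T3
Commit;

Session-4
pg_logical_slot_get_changes()
 -- decoding restarts before Checkpoint-2 and restores the serialized
    snapshot containing {T1, T2}; at Checkpoint-3's running_xacts only T1
    is old enough to be purged, so T2 has to survive the purge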

[1] - https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2022-08-25%2004%3A15%3A34


--
With Regards,
Amit Kapila.

Attachment
On Sat, Aug 27, 2022 at 3:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Jul 29, 2022 at 12:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > >
> > > Yeah, your description makes sense to me. I've also considered how to
> > > hit this path but I guess it is never hit. Thinking of it in another
> > > way, first of all, at least 2 catalog modifying transactions have to
> > > be running while writing a xl_running_xacts. The serialized snapshot
> > > that is written when we decode the first xl_running_xact has two
> > > transactions. Then, one of them is committed before the second
> > > xl_running_xacts. The second serialized snapshot has only one
> > > transaction. Then, the transaction is also committed after that. Now,
> > > in order to execute the path, we need to start decoding from the first
> > > serialized snapshot. However, if we start from there, we cannot decode
> > > the full contents of the transaction that was committed later.
> > >
> >
> > I think then we should change this code in the master branch patch
> > with an additional comment on the lines of: "Either all the xacts got
> > purged or none. It is only possible to partially remove the xids from
> > this array if one or more of the xids are still running but not all.
> > That can happen if we start decoding from a point (LSN where the
> > snapshot state became consistent) where all the xacts in this were
> > running and then at least one of those got committed and a few are
> > still running. We will never start from such a point because we won't
> > move the slot's restart_lsn past the point where the oldest running
> > transaction's restart_decoding_lsn is."
> >
>
> Unfortunately, this theory doesn't turn out to be true. While
> investigating the latest buildfarm failure [1], I see that it is
> possible that only part of the xacts in the restored catalog modifying
> xacts list needs to be purged. See the attached where I have
> demonstrated it via a reproducible test. It seems the point we were
> missing was that to start from a point where two or more catalog
> modifying were serialized, it requires another open transaction before
> both get committed, and then we need the checkpoint or other way to
> force running_xacts record in-between the commit of initial two
> catalog modifying xacts. There could possibly be other ways as well
> but the theory above wasn't correct.
>

Thank you for the analysis and the patch. I have the same conclusion.
Since we took this approach only on master, the back branches are not
affected.

The new test scenario makes sense to me and looks better than the one
I have. Regarding the fix, I think we should use
TransactionIdFollowsOrEquals() instead of
NormalTransactionIdPrecedes():

 +       for (off = 0; off < builder->catchange.xcnt; off++)
 +       {
 +           if (NormalTransactionIdPrecedes(builder->catchange.xip[off],
 +                                           builder->xmin))
 +               break;
 +       }
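
That is, roughly (a sketch of the suggested change):

    for (off = 0; off < builder->catchange.xcnt; off++)
    {
        if (TransactionIdFollowsOrEquals(builder->catchange.xip[off],
                                         builder->xmin))
            break;
    }

so that 'off' ends up at the first XID that is still new enough to keep,
and everything before it gets purged.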

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Sat, Aug 27, 2022 at 1:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, Aug 27, 2022 at 3:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Jul 29, 2022 at 12:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > >
> > > > Yeah, your description makes sense to me. I've also considered how to
> > > > hit this path but I guess it is never hit. Thinking of it in another
> > > > way, first of all, at least 2 catalog modifying transactions have to
> > > > be running while writing a xl_running_xacts. The serialized snapshot
> > > > that is written when we decode the first xl_running_xact has two
> > > > transactions. Then, one of them is committed before the second
> > > > xl_running_xacts. The second serialized snapshot has only one
> > > > transaction. Then, the transaction is also committed after that. Now,
> > > > in order to execute the path, we need to start decoding from the first
> > > > serialized snapshot. However, if we start from there, we cannot decode
> > > > the full contents of the transaction that was committed later.
> > > >
> > >
> > > I think then we should change this code in the master branch patch
> > > with an additional comment on the lines of: "Either all the xacts got
> > > purged or none. It is only possible to partially remove the xids from
> > > this array if one or more of the xids are still running but not all.
> > > That can happen if we start decoding from a point (LSN where the
> > > snapshot state became consistent) where all the xacts in this were
> > > running and then at least one of those got committed and a few are
> > > still running. We will never start from such a point because we won't
> > > move the slot's restart_lsn past the point where the oldest running
> > > transaction's restart_decoding_lsn is."
> > >
> >
> > Unfortunately, this theory doesn't turn out to be true. While
> > investigating the latest buildfarm failure [1], I see that it is
> > possible that only part of the xacts in the restored catalog modifying
> > xacts list needs to be purged. See the attached where I have
> > demonstrated it via a reproducible test. It seems the point we were
> > missing was that to start from a point where two or more catalog
> > modifying were serialized, it requires another open transaction before
> > both get committed, and then we need the checkpoint or other way to
> > force running_xacts record in-between the commit of initial two
> > catalog modifying xacts. There could possibly be other ways as well
> > but the theory above wasn't correct.
> >
>
> Thank you for the analysis and the patch. I have the same conclusion.
> Since we took this approach only on the master the back branches are
> not affected.
>
> The new test scenario makes sense to me and looks better than the one
> I have. Regarding the fix, I think we should use
> TransactionIdFollowsOrEquals() instead of
> NormalTransactionIdPrecedes():
>
>  +       for (off = 0; off < builder->catchange.xcnt; off++)
>  +       {
>  +           if (NormalTransactionIdPrecedes(builder->catchange.xip[off],
>  +                                           builder->xmin))
>  +               break;
>  +       }
>

Right, fixed.

-- 
With Regards,
Amit Kapila.

Attachment
On Sat, Aug 27, 2022 at 7:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Aug 27, 2022 at 1:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Sat, Aug 27, 2022 at 3:56 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Fri, Jul 29, 2022 at 12:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > >
> > > > > Yeah, your description makes sense to me. I've also considered how to
> > > > > hit this path but I guess it is never hit. Thinking of it in another
> > > > > way, first of all, at least 2 catalog modifying transactions have to
> > > > > be running while writing a xl_running_xacts. The serialized snapshot
> > > > > that is written when we decode the first xl_running_xact has two
> > > > > transactions. Then, one of them is committed before the second
> > > > > xl_running_xacts. The second serialized snapshot has only one
> > > > > transaction. Then, the transaction is also committed after that. Now,
> > > > > in order to execute the path, we need to start decoding from the first
> > > > > serialized snapshot. However, if we start from there, we cannot decode
> > > > > the full contents of the transaction that was committed later.
> > > > >
> > > >
> > > > I think then we should change this code in the master branch patch
> > > > with an additional comment on the lines of: "Either all the xacts got
> > > > purged or none. It is only possible to partially remove the xids from
> > > > this array if one or more of the xids are still running but not all.
> > > > That can happen if we start decoding from a point (LSN where the
> > > > snapshot state became consistent) where all the xacts in this were
> > > > running and then at least one of those got committed and a few are
> > > > still running. We will never start from such a point because we won't
> > > > move the slot's restart_lsn past the point where the oldest running
> > > > transaction's restart_decoding_lsn is."
> > > >
> > >
> > > Unfortunately, this theory doesn't turn out to be true. While
> > > investigating the latest buildfarm failure [1], I see that it is
> > > possible that only part of the xacts in the restored catalog modifying
> > > xacts list needs to be purged. See the attached where I have
> > > demonstrated it via a reproducible test. It seems the point we were
> > > missing was that to start from a point where two or more catalog
> > > modifying were serialized, it requires another open transaction before
> > > both get committed, and then we need the checkpoint or other way to
> > > force running_xacts record in-between the commit of initial two
> > > catalog modifying xacts. There could possibly be other ways as well
> > > but the theory above wasn't correct.
> > >
> >
> > Thank you for the analysis and the patch. I have the same conclusion.
> > Since we took this approach only on the master the back branches are
> > not affected.
> >
> > The new test scenario makes sense to me and looks better than the one
> > I have. Regarding the fix, I think we should use
> > TransactionIdFollowsOrEquals() instead of
> > NormalTransactionIdPrecedes():
> >
> >  +       for (off = 0; off < builder->catchange.xcnt; off++)
> >  +       {
> >  +           if (NormalTransactionIdPrecedes(builder->catchange.xip[off],
> >  +                                           builder->xmin))
> >  +               break;
> >  +       }
> >
>
> Right, fixed.

Thank you for updating the patch! It looks good to me.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



On Sat, Aug 27, 2022 at 7:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Sat, Aug 27, 2022 at 7:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sat, Aug 27, 2022 at 1:06 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > >
> > > > >
> > > > > I think then we should change this code in the master branch patch
> > > > > with an additional comment on the lines of: "Either all the xacts got
> > > > > purged or none. It is only possible to partially remove the xids from
> > > > > this array if one or more of the xids are still running but not all.
> > > > > That can happen if we start decoding from a point (LSN where the
> > > > > snapshot state became consistent) where all the xacts in this were
> > > > > running and then at least one of those got committed and a few are
> > > > > still running. We will never start from such a point because we won't
> > > > > move the slot's restart_lsn past the point where the oldest running
> > > > > transaction's restart_decoding_lsn is."
> > > > >
> > > >
> > > > Unfortunately, this theory doesn't turn out to be true. While
> > > > investigating the latest buildfarm failure [1], I see that it is
> > > > possible that only part of the xacts in the restored catalog modifying
> > > > xacts list needs to be purged. See the attached where I have
> > > > demonstrated it via a reproducible test. It seems the point we were
> > > > missing was that to start from a point where two or more catalog
> > > > modifying were serialized, it requires another open transaction before
> > > > both get committed, and then we need the checkpoint or other way to
> > > > force running_xacts record in-between the commit of initial two
> > > > catalog modifying xacts. There could possibly be other ways as well
> > > > but the theory above wasn't correct.
> > > >
> > >
> > > Thank you for the analysis and the patch. I have the same conclusion.
> > > Since we took this approach only on the master the back branches are
> > > not affected.
> > >
> > > The new test scenario makes sense to me and looks better than the one
> > > I have. Regarding the fix, I think we should use
> > > TransactionIdFollowsOrEquals() instead of
> > > NormalTransactionIdPrecedes():
> > >
> > >  +       for (off = 0; off < builder->catchange.xcnt; off++)
> > >  +       {
> > >  +           if (NormalTransactionIdPrecedes(builder->catchange.xip[off],
> > >  +                                           builder->xmin))
> > >  +               break;
> > >  +       }
> > >
> >
> > Right, fixed.
>
> Thank you for updating the patch! It looks good to me.
>

Pushed.

-- 
With Regards,
Amit Kapila.



Hi!

While working on the 64-bit XIDs patch set, I stumbled into problems with the contrib/test_decoding/catalog_change_snapshot test [0].

AFAICS, the problem is not related to the 64-bit XIDs patch set; the problem is in the InitialRunningXacts array, allocated in
builder->context. Do we really need it to be allocated that way?




--
Best regards,
Maxim Orlov.