Re: Parallel Apply - Mailing list pgsql-hackers

From shveta malik
Subject Re: Parallel Apply
Date
Msg-id CAJpy0uCmouRYrvQkQiqYE3z_X6Yg7Hep-eYmZxqT=O0faya1SQ@mail.gmail.com
Whole thread
In response to RE: Parallel Apply  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Responses Re: Parallel Apply
RE: Parallel Apply
List pgsql-hackers
On Mon, Apr 27, 2026 at 4:10 PM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Dear hackers,
>
> Here is new patch set. The Option 2 was chosen for the format of internal messages
> based on the discussion.
>

Thanks. Please find a few comments:


002:
1)
pa_wait_for_depended_transaction():

+ /*
+ * Quick exit if parallelized_txns has not been initialized yet. This can
+ * happen when 1) this function is called by the leader worker and 2) no
+ * parallel apply workers have never been launched yet since the leader
+ * worker started.
+ */

have never been --> have been

003:

2)
There are a lot of 'it' in last paragraph of commit meesage, making it
ambiguous. Can we change it to:

When the leader sends replication changes to parallel workers, it checks whether
other transactions have already used the RI associated with the
change. If such a transaction is found,
the leader treats the current transaction as dependent on that
transaction and notifies parallel workers to
wait for that transaction to finish via PA_MSG_XACT_DEPENDENCY.

3)
+ /*
+ * The parallel apply worker assigned for applying the transaction. It is
+ * NULL if the assigned worker has been reused for another transaction.
+ */
  ParallelApplyWorkerInfo *winfo;

Comment in pa_get_last_commit_end() was changed, but above was missed.
This too need to be changed.

4)
+XLogRecPtr
+pa_get_last_commit_end(TransactionId xid, bool *skipped_write)

We can add a comment saying:
'It removes the xid entry from ParallelApplyTxnHash after reading
local end LSN, provided the transaction is finished.'

5)
Now there are 3 types of hash-tables:

HTAB used for existing ParallelApplyTxnHash
dynamic-hash for parallelized_txns
simplehash for replica_identity_table

a)
What was the design rationale behind choosing different hash table
implementations for these different uses? Is there a comment, prior
discussion, or email thread where this reasoning has been explained?

b)
Regarding names, was it a conscious decision to name parallelized_txns
and replica_identity_table in snake_case case different from existing
'ParallelApplyTxnHash'? I know other code using DSA often follows
snake_case, but having ParallelApplyTxnHash
and parallelized_txns defined next to each other feels a bit
inconsistent. Also, I wonder whether hash table objects should perhaps
be named more like ParallelApplyTxnHash, so they stand out while
reading the code; otherwise they mostly look like ordinary local
variables, which they are not.

c)
The hash table entries are ParallelApplyWorkerEntry and
ParallelizedTxnEntry, but from the names alone it’s not immediately
obvious what the distinction is between the two tables and their
entries, or which entry maps to which table. Could we add comments
above each table and entry declaration indicating:

--which entry corresponds to which table, and
--what each table stores (i.e. what entry-type, it is composed of)?


6)
cleanup_committed_replica_identity_entries:

+ if (!skipped_write && !XLogRecPtrIsValid(pos->local_end))
+ continue;

We can use XLogRecPtrIsInvalid instead.

7)
+/*
+ * Clean up hash table entries associated with the given transaction IDs.
+ */
+static void
+cleanup_replica_identity_table(List *committed_xid)

It is an helper function for
'cleanup_committed_replica_identity_entries'. The name
'cleanup_replica_identity_table' does not look apt, shall we rename to
delete_replica_identity_entries_for_xids.  delete/remove/prune:
anything will do.

8)
cleanup_committed_replica_identity_entries():

+ if (!TransactionIdIsValid(pos->pa_remote_xid) ||
+ XLogRecPtrIsValid(pos->local_end))
+ continue;

pa_remote_xid is not assigned in this patch (003). I think it will be
in 004. But for the sake of above logic, I think we shall initialize
it to Invalid at-least when we allocate a new FlushPosition
store_flush_position().

~~

Reveiwing further

thanks
Shveta



pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: Support EXCEPT for TABLES IN SCHEMA publications
Next
From: Alexandra Wang
Date:
Subject: Re: Is there value in having optimizer stats for joins/foreignkeys?