Re: Parallel Apply - Mailing list pgsql-hackers
| From | shveta malik |
|---|---|
| Subject | Re: Parallel Apply |
| Date | |
| Msg-id | CAJpy0uAMRA67+NhMEn_TUm6hrEKZf7su1Ht1GxbgThr4O1gW1Q@mail.gmail.com Whole thread |
| In response to | Re: Parallel Apply (shveta malik <shveta.malik@gmail.com>) |
| Responses |
RE: Parallel Apply
|
| List | pgsql-hackers |
On Wed, Apr 29, 2026 at 11:20 AM shveta malik <shveta.malik@gmail.com> wrote: > > On Mon, Apr 27, 2026 at 4:10 PM Hayato Kuroda (Fujitsu) > <kuroda.hayato@fujitsu.com> wrote: > > > > Dear hackers, > > > > Here is new patch set. The Option 2 was chosen for the format of internal messages > > based on the discussion. > > > > Thanks. Please find a few comments: > > > 002: > 1) > pa_wait_for_depended_transaction(): > > + /* > + * Quick exit if parallelized_txns has not been initialized yet. This can > + * happen when 1) this function is called by the leader worker and 2) no > + * parallel apply workers have never been launched yet since the leader > + * worker started. > + */ > > have never been --> have been > > 003: > > 2) > There are a lot of 'it' in last paragraph of commit meesage, making it > ambiguous. Can we change it to: > > When the leader sends replication changes to parallel workers, it checks whether > other transactions have already used the RI associated with the > change. If such a transaction is found, > the leader treats the current transaction as dependent on that > transaction and notifies parallel workers to > wait for that transaction to finish via PA_MSG_XACT_DEPENDENCY. > > 3) > + /* > + * The parallel apply worker assigned for applying the transaction. It is > + * NULL if the assigned worker has been reused for another transaction. > + */ > ParallelApplyWorkerInfo *winfo; > > Comment in pa_get_last_commit_end() was changed, but above was missed. > This too need to be changed. > > 4) > +XLogRecPtr > +pa_get_last_commit_end(TransactionId xid, bool *skipped_write) > > We can add a comment saying: > 'It removes the xid entry from ParallelApplyTxnHash after reading > local end LSN, provided the transaction is finished.' > > 5) > Now there are 3 types of hash-tables: > > HTAB used for existing ParallelApplyTxnHash > dynamic-hash for parallelized_txns > simplehash for replica_identity_table > > a) > What was the design rationale behind choosing different hash table > implementations for these different uses? Is there a comment, prior > discussion, or email thread where this reasoning has been explained? > > b) > Regarding names, was it a conscious decision to name parallelized_txns > and replica_identity_table in snake_case case different from existing > 'ParallelApplyTxnHash'? I know other code using DSA often follows > snake_case, but having ParallelApplyTxnHash > and parallelized_txns defined next to each other feels a bit > inconsistent. Also, I wonder whether hash table objects should perhaps > be named more like ParallelApplyTxnHash, so they stand out while > reading the code; otherwise they mostly look like ordinary local > variables, which they are not. > > c) > The hash table entries are ParallelApplyWorkerEntry and > ParallelizedTxnEntry, but from the names alone it’s not immediately > obvious what the distinction is between the two tables and their > entries, or which entry maps to which table. Could we add comments > above each table and entry declaration indicating: > > --which entry corresponds to which table, and > --what each table stores (i.e. what entry-type, it is composed of)? > > > 6) > cleanup_committed_replica_identity_entries: > > + if (!skipped_write && !XLogRecPtrIsValid(pos->local_end)) > + continue; > > We can use XLogRecPtrIsInvalid instead. > > 7) > +/* > + * Clean up hash table entries associated with the given transaction IDs. > + */ > +static void > +cleanup_replica_identity_table(List *committed_xid) > > It is an helper function for > 'cleanup_committed_replica_identity_entries'. The name > 'cleanup_replica_identity_table' does not look apt, shall we rename to > delete_replica_identity_entries_for_xids. delete/remove/prune: > anything will do. > > 8) > cleanup_committed_replica_identity_entries(): > > + if (!TransactionIdIsValid(pos->pa_remote_xid) || > + XLogRecPtrIsValid(pos->local_end)) > + continue; > > pa_remote_xid is not assigned in this patch (003). I think it will be > in 004. But for the sake of above logic, I think we shall initialize > it to Invalid at-least when we allocate a new FlushPosition > store_flush_position(). > > ~~ > Few more comments on v17-003: 9) +/* + * Append a transaction dependency, excluding duplicates and committed + * transactions. + */ +static List * +check_and_append_xid_dependency(List *depends_on_xids, + TransactionId *depends_on_xid, + TransactionId current_xid) Very difficult to understand the arguments without looking deep into callers. May be some comments about these confusing arguements and return-value will help. 10) +/* + * Check for preceding transactions that involve insert, delete, or update + * operations on the specified table, and return them in 'depends_on_xids'. + */ +static void +find_all_dependencies_on_rel(LogicalRepRelId relid, TransactionId new_depended_xid, + List **depends_on_xids) We shall mention what it does with 'new_depended_xid'? 11) + *depends_on_xids = check_and_append_xid_dependency(*depends_on_xids, + &rientry->remote_xid, + new_depended_xid); We pass *depends_on_xids to check_and_append_xid_dependency() and collect the output in *depends_on_xids. Why don't we pass 'List **depends_on_xids' as argument to check_and_append_xid_dependency() and append it internally (with no return-value needed from this function), similar to how the parent find_all_dependencies_on_rel() accepts/manage 'List **depends_on_xids'? 12) Can we please make argument order of check_and_append_xid_dependency() somewhat simialr to its callers find_all_dependencies_on_rel() and check_dependency_on_replica_identity() to increase readibility. 13) +/* + * Clean up hash table entries associated with the given transaction IDs. + */ +static void +cleanup_replica_identity_table(List *committed_xid) Since argument is a List, we can keep the name as plural: committed_xids. Infact, this function need not to bother about 'committed' or not 'committed', we can simple change argument to 'xids'. 14) Since check_dependency_on_replica_identity() not only checks dependency but also records it, shall we rename it to check_and_record_dependency_on_replica_identity? Or is too long? Any better suggestions? 15) + * been applide yet. applide --> applied thanks Shveta
pgsql-hackers by date: