RE: Parallel Apply - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Parallel Apply
Date
Msg-id TYRPR01MB14195F002941AF23A1350F00D94352@TYRPR01MB14195.jpnprd01.prod.outlook.com
Whole thread
In response to Re: Parallel Apply  (shveta malik <shveta.malik@gmail.com>)
List pgsql-hackers
On Wednesday, April 29, 2026 2:51 PM shveta malik <shveta.malik@gmail.com> wrote:
> 
> On Mon, Apr 27, 2026 at 4:10 PM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > Dear hackers,
> >
> > Here is new patch set. The Option 2 was chosen for the format of
> > internal messages based on the discussion.
> >
> 
> Thanks. Please find a few comments:

Thanks for the comments.

> 5)
> Now there are 3 types of hash-tables:
> 
> HTAB used for existing ParallelApplyTxnHash dynamic-hash for
> parallelized_txns simplehash for replica_identity_table
> 
> a)
> What was the design rationale behind choosing different hash table
> implementations for these different uses? Is there a comment, prior discussion,
> or email thread where this reasoning has been explained?

For parallelized_txns, the hash table must reside in shared memory and needs to
grow dynamically based on the number of transactions. dshash is the only natural choice
here: simplehash does not seem to be designed for shared memory, and HTAB (which
does support shared memory) cannot expand its size on the fly.

For replica_identity_table, we need maximum efficiency since every change
requires a hash operation. simplehash is the better choice here. See the
comments atop simplehash.h for details.

For ParallelApplyTxnHash (used per large transaction with a simple XID key),
we prioritize ease of setup, so HTAB is sufficient.

I've added brief comments for new hash tables in the patch set.

> 
> b)
> Regarding names, was it a conscious decision to name parallelized_txns and
> replica_identity_table in snake_case case different from existing
> 'ParallelApplyTxnHash'? I know other code using DSA often follows snake_case,
> but having ParallelApplyTxnHash and parallelized_txns defined next to each
> other feels a bit inconsistent. Also, I wonder whether hash table objects should
> perhaps be named more like ParallelApplyTxnHash, so they stand out while
> reading the code; otherwise they mostly look like ordinary local variables,
> which they are not.

I don't have a strong opinion on the naming of the hash tables. I'll wait to
see if others have suggestions, as renaming is easy to address once the
patches become stable.

> 
> 6)
> cleanup_committed_replica_identity_entries:
> 
> + if (!skipped_write && !XLogRecPtrIsValid(pos->local_end))
> + continue;
> 
> We can use XLogRecPtrIsInvalid instead.

XLogRecPtrIsInvalid() is a deprecated macro, so we should avoid using it.


Other comments have been addressed in the latest patch set.

Best Regards,
Hou zj

pgsql-hackers by date:

Previous
From: "Zhijie Hou (Fujitsu)"
Date:
Subject: RE: Parallel Apply
Next
From: "Zhijie Hou (Fujitsu)"
Date:
Subject: RE: Parallel Apply