RE: Parallel Apply - Mailing list pgsql-hackers
| From | Zhijie Hou (Fujitsu) |
|---|---|
| Subject | RE: Parallel Apply |
| Date | |
| Msg-id | TYRPR01MB14195F002941AF23A1350F00D94352@TYRPR01MB14195.jpnprd01.prod.outlook.com Whole thread |
| In response to | Re: Parallel Apply (shveta malik <shveta.malik@gmail.com>) |
| List | pgsql-hackers |
On Wednesday, April 29, 2026 2:51 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Mon, Apr 27, 2026 at 4:10 PM Hayato Kuroda (Fujitsu) > <kuroda.hayato@fujitsu.com> wrote: > > > > Dear hackers, > > > > Here is new patch set. The Option 2 was chosen for the format of > > internal messages based on the discussion. > > > > Thanks. Please find a few comments: Thanks for the comments. > 5) > Now there are 3 types of hash-tables: > > HTAB used for existing ParallelApplyTxnHash dynamic-hash for > parallelized_txns simplehash for replica_identity_table > > a) > What was the design rationale behind choosing different hash table > implementations for these different uses? Is there a comment, prior discussion, > or email thread where this reasoning has been explained? For parallelized_txns, the hash table must reside in shared memory and needs to grow dynamically based on the number of transactions. dshash is the only natural choice here: simplehash does not seem to be designed for shared memory, and HTAB (which does support shared memory) cannot expand its size on the fly. For replica_identity_table, we need maximum efficiency since every change requires a hash operation. simplehash is the better choice here. See the comments atop simplehash.h for details. For ParallelApplyTxnHash (used per large transaction with a simple XID key), we prioritize ease of setup, so HTAB is sufficient. I've added brief comments for new hash tables in the patch set. > > b) > Regarding names, was it a conscious decision to name parallelized_txns and > replica_identity_table in snake_case case different from existing > 'ParallelApplyTxnHash'? I know other code using DSA often follows snake_case, > but having ParallelApplyTxnHash and parallelized_txns defined next to each > other feels a bit inconsistent. Also, I wonder whether hash table objects should > perhaps be named more like ParallelApplyTxnHash, so they stand out while > reading the code; otherwise they mostly look like ordinary local variables, > which they are not. I don't have a strong opinion on the naming of the hash tables. I'll wait to see if others have suggestions, as renaming is easy to address once the patches become stable. > > 6) > cleanup_committed_replica_identity_entries: > > + if (!skipped_write && !XLogRecPtrIsValid(pos->local_end)) > + continue; > > We can use XLogRecPtrIsInvalid instead. XLogRecPtrIsInvalid() is a deprecated macro, so we should avoid using it. Other comments have been addressed in the latest patch set. Best Regards, Hou zj
pgsql-hackers by date: