Re: Excessive number of replication slots for 12->14 logical replication - Mailing list pgsql-bugs
From | Peter Smith |
---|---|
Subject | Re: Excessive number of replication slots for 12->14 logical replication |
Date | |
Msg-id | CAHut+Pvt9G8Z8DDE0627=GZJFWC6jiF_uymmXX-RzDojQTw2hw@mail.gmail.com Whole thread Raw |
In response to | Re: Excessive number of replication slots for 12->14 logical replication (Ajin Cherian <itsajin@gmail.com>) |
Responses |
Re: Excessive number of replication slots for 12->14 logical replication
|
List | pgsql-bugs |
Here are some comments for v3-0001: (all cosmetic / comments) ====== 0. <apply> There were multiple whitespace warnings reported by git apply. [postgres@CentOS7-x64 oss_postgres_misc]$ git apply ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch:36: indent with spaces. /* ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch:37: indent with spaces. * Cleanup the tablesync slot and the origin tracking if exists. ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch:38: indent with spaces. * ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch:39: indent with spaces. * This has to be done after updating the state because otherwise if ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch:40: indent with spaces. * there is an error while doing the database operations we won't be warning: squelched 3 whitespace errors warning: 8 lines add whitespace errors. ====== 1. Commit message I think the Replication Origin tracking is maybe not strictly speaking a "slot", even though it does use the same GUC as true slots. With that in mind, perhaps the following text is more accurate. SUGGESTION The replication origin tracking uses the same GUC (max_replication_slots) as the tablesync slots for limiting resources. In the current (HEAD) code the replication origin tracking of the completed tablesync worker is dropped by the apply worker, but this means there can be a small lag between when the tablesync worker exited, and when its origin tracking is actually dropped. Meanwhile, new tablesync workers can be launched and will immediately try to acquire new slots and origin tracking. If this happens before the worker's origin tracking gets dropped then the available number of slots (max_replication_slots) can be exceeded, leading to the error as reported. To avoid this lag, the dropping of replicating origin tracking is moved to the tablesync worker where it exits. ====== 2. src/backend/replication/logical/tablesync.c - process_syncing_tables_for_sync @@ -315,13 +316,43 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn) */ walrcv_endstreaming(LogRepWorkerWalRcvConn, &tli); + /* + * Cleanup the tablesync slot and the origin tracking if exists. + * Consider reversing that comment's first sentence so that it describes what it will do in the same order as the code logic. SUGGESTION Cleanup the origin tracking and tablesync slot. ~~~ 3. + /* + * Reset the origin session before dropping. * - * This has to be done after updating the state because otherwise if - * there is an error while doing the database operations we won't be - * able to rollback dropped slot. + * This is required to reset the ownership of the slot + * and allow the slot to be dropped. */ "slot to be dropped" -> "origin to be dropped" (maybe?) ~~~ 4. src/backend/replication/logical/tablesync.c - process_syncing_tables_for_apply @@ -451,27 +480,9 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn) started_tx = true; } - /* - * Remove the tablesync origin tracking if exists. - * - * The normal case origin drop is done here instead of in the - * process_syncing_tables_for_sync function because we don't - * allow to drop the origin till the process owning the origin - * is alive. - * - * There is a chance that the user is concurrently performing - * refresh for the subscription where we remove the table - * state and its origin and by this time the origin might be - * already removed. So passing missing_ok = true. - */ - ReplicationOriginNameForTablesync(MyLogicalRepWorker->subid, - rstate->relid, - originname, - sizeof(originname)); - replorigin_drop_by_name(originname, true, false); /* This change results in a double blank line remaining instead of just a single blank line. ------ Kind Regards, Peter Smith Fujitsu Australia
pgsql-bugs by date: