Re: Excessive number of replication slots for 12->14 logical replication - Mailing list pgsql-bugs
From | Ajin Cherian |
---|---|
Subject | Re: Excessive number of replication slots for 12->14 logical replication |
Date | |
Msg-id | CAFPTHDYgyNfhaR7UkFEYrMZgjrtat0Ry=hezNc6M_rwEuKi4MQ@mail.gmail.com Whole thread Raw |
In response to | Re: Excessive number of replication slots for 12->14 logical replication (Peter Smith <smithpb2250@gmail.com>) |
Responses |
Re: Excessive number of replication slots for 12->14 logical replication
|
List | pgsql-bugs |
On Fri, Jul 29, 2022 at 7:37 PM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some comments for v3-0001: > > (all cosmetic / comments) > > ====== > > 0. <apply> > > There were multiple whitespace warnings reported by git apply. > > [postgres@CentOS7-x64 oss_postgres_misc]$ git apply > ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch > ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch:36: > indent with spaces. > /* > ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch:37: > indent with spaces. > * Cleanup the tablesync slot and the origin tracking if exists. > ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch:38: > indent with spaces. > * > ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch:39: > indent with spaces. > * This has to be done after updating the state because otherwise if > ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch:40: > indent with spaces. > * there is an error while doing the database operations we won't be > warning: squelched 3 whitespace errors > warning: 8 lines add whitespace errors. > > ====== > Fixed. > 1. Commit message > > I think the Replication Origin tracking is maybe not strictly speaking > a "slot", even though it does use the same GUC as true slots. With > that in mind, perhaps the following text is more accurate. > > SUGGESTION > The replication origin tracking uses the same GUC > (max_replication_slots) as the tablesync slots for limiting resources. > > In the current (HEAD) code the replication origin tracking of the > completed tablesync worker is dropped by the apply worker, but this > means there can be a small lag between when the tablesync worker > exited, and when its origin tracking is actually dropped. > > Meanwhile, new tablesync workers can be launched and will immediately > try to acquire new slots and origin tracking. If this happens before > the worker's origin tracking gets dropped then the available number of > slots (max_replication_slots) can be exceeded, leading to the error as > reported. > > To avoid this lag, the dropping of replicating origin tracking is > moved to the tablesync worker where it exits. > > ====== Updated as suggested. > > 2. src/backend/replication/logical/tablesync.c - process_syncing_tables_for_sync > > @@ -315,13 +316,43 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn) > */ > walrcv_endstreaming(LogRepWorkerWalRcvConn, &tli); > > + /* > + * Cleanup the tablesync slot and the origin tracking if exists. > + * > > Consider reversing that comment's first sentence so that it describes > what it will do in the same order as the code logic. > > SUGGESTION > Cleanup the origin tracking and tablesync slot. > > ~~~ Fixed. > > 3. > > + /* > + * Reset the origin session before dropping. > * > - * This has to be done after updating the state because otherwise if > - * there is an error while doing the database operations we won't be > - * able to rollback dropped slot. > + * This is required to reset the ownership of the slot > + * and allow the slot to be dropped. > */ > > "slot to be dropped" -> "origin to be dropped" (maybe?) > > ~~~ Fixed > > 4. src/backend/replication/logical/tablesync.c - > process_syncing_tables_for_apply > > @@ -451,27 +480,9 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn) > started_tx = true; > } > > - /* > - * Remove the tablesync origin tracking if exists. > - * > - * The normal case origin drop is done here instead of in the > - * process_syncing_tables_for_sync function because we don't > - * allow to drop the origin till the process owning the origin > - * is alive. > - * > - * There is a chance that the user is concurrently performing > - * refresh for the subscription where we remove the table > - * state and its origin and by this time the origin might be > - * already removed. So passing missing_ok = true. > - */ > - ReplicationOriginNameForTablesync(MyLogicalRepWorker->subid, > - rstate->relid, > - originname, > - sizeof(originname)); > - replorigin_drop_by_name(originname, true, false); > > /* > > > This change results in a double blank line remaining instead of just a > single blank line. > Fixed. regards, Ajin Cherian Fujitsu Australia
Attachment
pgsql-bugs by date: