Re: Excessive number of replication slots for 12->14 logical replication - Mailing list pgsql-bugs

From Ajin Cherian
Subject Re: Excessive number of replication slots for 12->14 logical replication
Date
Msg-id CAFPTHDYgyNfhaR7UkFEYrMZgjrtat0Ry=hezNc6M_rwEuKi4MQ@mail.gmail.com
Whole thread Raw
In response to Re: Excessive number of replication slots for 12->14 logical replication  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: Excessive number of replication slots for 12->14 logical replication
List pgsql-bugs
On Fri, Jul 29, 2022 at 7:37 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Here are some comments for v3-0001:
>
> (all cosmetic / comments)
>
> ======
>
> 0. <apply>
>
> There were multiple whitespace warnings reported by git apply.
>
> [postgres@CentOS7-x64 oss_postgres_misc]$ git apply
> ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch
> ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch:36:
> indent with spaces.
>         /*
> ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch:37:
> indent with spaces.
>          * Cleanup the tablesync slot and the origin tracking if exists.
> ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch:38:
> indent with spaces.
>          *
> ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch:39:
> indent with spaces.
>          * This has to be done after updating the state because otherwise if
> ../patches_misc/v3-0001-fix-excessive-replicating-origin-slots-issue.patch:40:
> indent with spaces.
>          * there is an error while doing the database operations we won't be
> warning: squelched 3 whitespace errors
> warning: 8 lines add whitespace errors.
>
> ======
>

Fixed.

> 1. Commit message
>
> I think the Replication Origin tracking is maybe not strictly speaking
> a "slot", even though it does use the same GUC as true slots. With
> that in mind, perhaps the following text is more accurate.
>
> SUGGESTION
> The replication origin tracking uses the same GUC
> (max_replication_slots) as the tablesync slots for limiting resources.
>
> In the current (HEAD) code the replication origin tracking of the
> completed tablesync worker is dropped by the apply worker, but this
> means there can be a small lag between when the tablesync worker
> exited, and when its origin tracking is actually dropped.
>
> Meanwhile, new tablesync workers can be launched and will immediately
> try to acquire new slots and origin tracking. If this happens before
> the worker's origin tracking gets dropped then the available number of
> slots (max_replication_slots) can be exceeded, leading to the error as
> reported.
>
> To avoid this lag, the dropping of replicating origin tracking is
> moved to the tablesync worker where it exits.
>
> ======

Updated as suggested.

>
> 2. src/backend/replication/logical/tablesync.c - process_syncing_tables_for_sync
>
> @@ -315,13 +316,43 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
>   */
>   walrcv_endstreaming(LogRepWorkerWalRcvConn, &tli);
>
> +        /*
> +         * Cleanup the tablesync slot and the origin tracking if exists.
> +         *
>
> Consider reversing that comment's first sentence so that it describes
> what it will do in the same order as the code logic.
>
> SUGGESTION
> Cleanup the origin tracking and tablesync slot.
>
> ~~~

Fixed.
>
> 3.
>
> + /*
> + * Reset the origin session before dropping.
>   *
> - * This has to be done after updating the state because otherwise if
> - * there is an error while doing the database operations we won't be
> - * able to rollback dropped slot.
> + * This is required to reset the ownership of the slot
> + * and allow the slot to be dropped.
>   */
>
> "slot to be dropped" -> "origin to be dropped" (maybe?)
>
> ~~~

Fixed

>
> 4. src/backend/replication/logical/tablesync.c -
> process_syncing_tables_for_apply
>
> @@ -451,27 +480,9 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
>   started_tx = true;
>   }
>
> - /*
> - * Remove the tablesync origin tracking if exists.
> - *
> - * The normal case origin drop is done here instead of in the
> - * process_syncing_tables_for_sync function because we don't
> - * allow to drop the origin till the process owning the origin
> - * is alive.
> - *
> - * There is a chance that the user is concurrently performing
> - * refresh for the subscription where we remove the table
> - * state and its origin and by this time the origin might be
> - * already removed. So passing missing_ok = true.
> - */
> - ReplicationOriginNameForTablesync(MyLogicalRepWorker->subid,
> -   rstate->relid,
> -   originname,
> -   sizeof(originname));
> - replorigin_drop_by_name(originname, true, false);
>
>   /*
>
>
> This change results in a double blank line remaining instead of just a
> single blank line.
>

Fixed.

regards,
Ajin Cherian
Fujitsu Australia

Attachment

pgsql-bugs by date:

Previous
From: Richard Guo
Date:
Subject: Re: BUG #17561: Server crashes on executing row() with very long argument list
Next
From: Tom Lane
Date:
Subject: Re: BUG #17561: Server crashes on executing row() with very long argument list