Re: wake up logical workers after ALTER SUBSCRIPTION - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: wake up logical workers after ALTER SUBSCRIPTION
Date
Msg-id CAA4eK1Jpc9dSYJ9N8nLoNCK8G2TRQTwTh9RsUzJmJJquQcqvQA@mail.gmail.com
Whole thread Raw
In response to Re: wake up logical workers after ALTER SUBSCRIPTION  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: wake up logical workers after ALTER SUBSCRIPTION
List pgsql-hackers
On Tue, Jan 3, 2023 at 11:40 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Tue, Jan 03, 2023 at 11:03:32AM +0530, Amit Kapila wrote:
> > On Thu, Dec 15, 2022 at 4:47 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> >> On Wed, Dec 14, 2022 at 02:02:58PM -0500, Tom Lane wrote:
> >> > Maybe we could have workers that are exiting for that reason set a
> >> > flag saying "please restart me without delay"?
> >>
> >> That helps a bit, but there are still delays when starting workers for new
> >> subscriptions.  I think we'd need to create a new array in shared memory
> >> for subscription OIDs that need their workers started immediately.
> >
> > That would be tricky because the list of subscription OIDs can be
> > longer than the workers. Can't we set a boolean variable
> > (check_immediate or something like that) in LogicalRepCtxStruct and
> > use that to traverse the subscriptions? So, when any worker will
> > restart because of a parameter change, we can set the variable and
> > send a signal to the launcher. The launcher can then check this
> > variable to decide whether to start the missing workers for enabled
> > subscriptions.
>
> My approach was to add a variable to LogicalRepWorker that indicated
> whether a worker needed to be restarted immediately.  While this is a
> little weird because the workers array is treated as slots, it worked
> nicely for ALTER SUBSCRIPTION.
>

So, are you planning to keep its in_use and subid flag as it is in
logicalrep_worker_cleanup()? Otherwise, without that it could be
reused for some other subscription.

>  However, this doesn't help at all for
> CREATE SUBSCRIPTION.
>

What if we maintain a hash table similar to 'last_start_times'
maintained in tablesync.c? It won't have entries for new
subscriptions, so for those we may not need to wait till
wal_retrieve_retry_interval.

> IIUC you are suggesting just one variable that would bypass
> wal_retrieve_retry_interval for all subscriptions, not just those newly
> altered or created.  This definitely seems like it would prevent delays,
> but it would also cause wal_retrieve_retry_interval to be incorrectly
> bypassed for the other workers in some cases.
>

Right, but I guess it would be rare in practical cases that someone
Altered/Created a subscription, and also some workers are restarted
due to errors/crashes as only in those cases launcher can restart the
worker when it shouldn't. However, in that case, also, it won't
restart the apply worker again and again unless there are concurrent
Create/Alter Subscription operations going on. IIUC, currently also it
can always first time restart the worker immediately after ERROR/CRASH
because we don't maintain last_start_time for each worker. I think
this is probably okay as we want to avoid repeated restarts after the
ERROR.

BTW, now users also have a subscription option 'disable_on_error'
which could also be used to avoid repeated restarts due to ERRORS.

>
  Is this acceptable?
>

To me, this sounds acceptable but if you and others don't think so
then we can try to develop some solution like per-worker-flag and a
hash table as discussed in the earlier part of the email.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: [PATCH] Improve ability to display optimizer analysis using OPTIMIZER_DEBUG
Next
From: Masahiko Sawada
Date:
Subject: Re: Perform streaming logical transactions by background workers and parallel apply