RE: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Perform streaming logical transactions by background workers and parallel apply
Date
Msg-id OS0PR01MB57165D850B0262D13141D63E94719@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Perform streaming logical transactions by background workers and parallel apply  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Perform streaming logical transactions by background workers and parallel apply
List pgsql-hackers
On Monday, May 8, 2023 11:08 AM Masahiko Sawada <sawada.mshk@gmail.com>

Hi,

> 
> On Tue, May 2, 2023 at 12:22 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> >
> > On Fri, Apr 28, 2023 at 11:48 AM Masahiko Sawada
> <sawada.mshk@gmail.com> wrote:
> > >
> > > While investigating this issue, I've reviewed the code around
> > > callbacks and worker termination etc and I found a problem.
> > >
> > > A parallel apply worker calls the before_shmem_exit callbacks in the
> > > following order:
> > >
> > > 1. ShutdownPostgres()
> > > 2. logicalrep_worker_onexit()
> > > 3. pa_shutdown()
> > >
> > > Since the worker is detached during logicalrep_worker_onexit(),
> > > MyLogicalReplication->leader_pid is an invalid when we call
> > > pa_shutdown():
> > >
> > > static void
> > > pa_shutdown(int code, Datum arg)
> > > {
> > >     Assert(MyLogicalRepWorker->leader_pid != InvalidPid);
> > >     SendProcSignal(MyLogicalRepWorker->leader_pid,
> > >                    PROCSIG_PARALLEL_APPLY_MESSAGE,
> > >                    InvalidBackendId);
> > >
> > > Also, if the parallel apply worker fails shm_toc_lookup() during the
> > > initialization, it raises an error (because of noError = false) but
> > > ends up a SEGV as MyLogicalRepWorker is still NULL.
> > >
> > > I think that we should not use MyLogicalRepWorker->leader_pid in
> > > pa_shutdown() but instead store the leader's pid to a static variable
> > > before registering pa_shutdown() callback.
> > >
> >
> > Why not simply move the registration of pa_shutdown() to someplace
> > after logicalrep_worker_attach()?
> 
> If we do that, the worker won't call dsm_detach() if it raises an
> ERROR in logicalrep_worker_attach(), is that okay? It seems that it's
> no practically problem since we call dsm_backend_shutdown() in
> shmem_exit(), but if so why do we need to call it in pa_shutdown()?

I think the dsm_detach in pa_shutdown was intended to fire on_dsm_detach
callbacks to give callback a chance to report stat before the stat system is
shutdown, following what we do in ParallelWorkerShutdown() (e.g.
sharedfileset.c callbacks cause fd.c to do ReportTemporaryFileUsage(), so we
need to fire that earlier).

But for parallel apply, we currently only have one on_dsm_detach
callback(shm_mq_detach_callback) which doesn't report extra stats. So the
dsm_detach in pa_shutdown is only used to make it a bit future-proof in case
we add some other on_dsm_detach callbacks in the future which need to report
stats.

Best regards,
Hou zj


pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: Re: [DOC] Update ALTER SUBSCRIPTION documentation
Next
From: Peter Smith
Date:
Subject: Re: PGDOCS - Replica Identity quotes