Re: shared-memory based stats collector - Mailing list pgsql-hackers

From Andres Freund
Subject Re: shared-memory based stats collector
Date
Msg-id 20200309184754.yvrgzqpzs3iynszq@alap3.anarazel.de
In response to Re: shared-memory based stats collector  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: shared-memory based stats collector  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Hi,

On 2020-03-09 15:37:05 -0300, Alvaro Herrera wrote:
> Tom Lane wrote:
> 
> In patch 0003,
> 
> >          /*
> > -         * Was it the archiver?  If so, just try to start a new one; no need
> > -         * to force reset of the rest of the system.  (If fail, we'll try
> > -         * again in future cycles of the main loop.).  Unless we were waiting
> > -         * for it to shut down; don't restart it in that case, and
> > -         * PostmasterStateMachine() will advance to the next shutdown step.
> > +         * Was it the archiver?  Normal exit can be ignored; we'll start a new
> > +         * one at the next iteration of the postmaster's main loop, if
> > +         * necessary. Any other exit condition is treated as a crash.
> >           */
> >          if (pid == PgArchPID)
> >          {
> >              PgArchPID = 0;
> >              if (!EXIT_STATUS_0(exitstatus))
> > -                LogChildExit(LOG, _("archiver process"),
> > -                             pid, exitstatus);
> > -            if (PgArchStartupAllowed())
> > -                PgArchPID = pgarch_start();
> > +                HandleChildCrash(pid, exitstatus,
> > +                                 _("archiver process"));
> >              continue;
> >          }
> 
> I'm worried that we're causing all processes to terminate when an
> archiver dies in some ugly way; but in the current coding, it's pretty
> harmless and we'd just start a new one.  I think this needs to be
> reconsidered.  As far as I know, pgarchiver remains unconnected to
> shared memory so a crash-restart cycle is not necessary.  We should
> continue to just log the error message and move on.

Why is it worth having the archiver be "robust" that way? The only
reason a restart is currently allowed for any exit code is that random
implementation details led to the archiver not being connected to
shared memory; beyond that, I don't see a need. It doesn't have exit
paths that could validly produce an exit code other than 0, as far as
I can see.
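
To spell out the difference between the two behaviors, here is a toy
sketch. It is not postmaster.c code: the enum and both function names
are made up for illustration, and the PgArchStartupAllowed() check in
the current coding is left out for brevity.

```c
#include <stdbool.h>

/* Hypothetical actions the postmaster could take when the archiver exits. */
typedef enum
{
    ARCHIVER_RESTART,           /* start a new archiver right away */
    ARCHIVER_LOG_AND_RESTART,   /* log the exit, then start a new one */
    ARCHIVER_IGNORE,            /* do nothing now; main loop restarts later */
    ARCHIVER_CRASH              /* treat as a crash of a regular child */
} ArchiverExitAction;

/* Mirrors the intent of EXIT_STATUS_0(): true only for a clean exit. */
bool
exit_status_is_zero(int exitstatus)
{
    return exitstatus == 0;
}

/* Current coding: log abnormal exits, but restart in every case. */
ArchiverExitAction
archiver_exit_current(int exitstatus)
{
    if (!exit_status_is_zero(exitstatus))
        return ARCHIVER_LOG_AND_RESTART;
    return ARCHIVER_RESTART;
}

/* Patch 0003: ignore a clean exit, treat anything else as a crash. */
ArchiverExitAction
archiver_exit_patched(int exitstatus)
{
    if (!exit_status_is_zero(exitstatus))
        return ARCHIVER_CRASH;
    return ARCHIVER_IGNORE;
}
```

The two versions only differ for a nonzero exit status, which is
exactly the case I don't think the archiver can validly produce.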

Greetings,

Andres Freund


