Re: shared-memory based stats collector - Mailing list pgsql-hackers

From Andres Freund
Subject Re: shared-memory based stats collector
Msg-id 20200310213242.bvkuykpswgqgjcpq@alap3.anarazel.de
In response to Re: shared-memory based stats collector  (Alvaro Herrera <alvherre@2ndquadrant.com>)
List pgsql-hackers
On 2020-03-10 09:48:07 -0300, Alvaro Herrera wrote:
> On 2020-Mar-10, Kyotaro Horiguchi wrote:
> 
> > At Mon, 9 Mar 2020 20:34:20 -0700, Andres Freund <andres@anarazel.de> wrote in 
> > > On 2020-03-10 12:27:25 +0900, Kyotaro Horiguchi wrote:
> > > > That's true, but I have the same concern as Tom. The archiver became
> > > > too tightly linked with other processes than actual relation.
> > > 
> > > What's the problem here? We have a number of helper processes
> > > (checkpointer, bgwriter) that are attached to shared memory, and it's
> > > not a problem.
> > 
> > That theoretically raises the chance of a server crash by a small
> > amount. But, yes, it's absurd to premise that the archiver process
> > crashes.
> 
> The case I'm worried about is a misconfigured archive_command that
> causes the archiver to misbehave (exit with a code other than 0); if
> that already doesn't happen, or we can make it not happen, then I'm okay
> with the changes to archiver.

Well, an exit(1) is also fine, afaict. No?

The archive command can just trigger either a FATAL or a LOG:

    rc = system(xlogarchcmd);
    if (rc != 0)
    {
        /*
         * If either the shell itself, or a called command, died on a signal,
         * abort the archiver.  We do this because system() ignores SIGINT and
         * SIGQUIT while waiting; so a signal is very likely something that
         * should have interrupted us too.  Also die if the shell got a hard
         * "command not found" type of error.  If we overreact it's no big
         * deal, the postmaster will just start the archiver again.
         */
        int            lev = wait_result_is_any_signal(rc, true) ? FATAL : LOG;

        if (WIFEXITED(rc))
        {
            ereport(lev,
                    (errmsg("archive command failed with exit code %d",
                            WEXITSTATUS(rc)),
                     errdetail("The failed archive command was: %s",
                               xlogarchcmd)));
        }
        else if (WIFSIGNALED(rc))
        {
#if defined(WIN32)
            ereport(lev,
                    (errmsg("archive command was terminated by exception 0x%X",
                            WTERMSIG(rc)),
                     errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
                     errdetail("The failed archive command was: %s",
                               xlogarchcmd)));
#else
            ereport(lev,
                    (errmsg("archive command was terminated by signal %d: %s",
                            WTERMSIG(rc), pg_strsignal(WTERMSIG(rc))),
                     errdetail("The failed archive command was: %s",
                               xlogarchcmd)));
#endif
        }
        else
        {
            ereport(lev,
                    (errmsg("archive command exited with unrecognized status %d",
                            rc),
                     errdetail("The failed archive command was: %s",
                               xlogarchcmd)));
        }

        snprintf(activitymsg, sizeof(activitymsg), "failed on %s", xlog);
        set_ps_display(activitymsg, false);

        return false;
    }

I.e. there are only normal ways to shut down the archiver due to a failing
archive command.
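
To make the LOG-vs-FATAL split concrete, here is a minimal standalone sketch
of how a system() result could be classified. The enum and function names are
hypothetical, invented for illustration, and looks_like_signal() is only my
approximation of what wait_result_is_any_signal(rc, true) checks (death by
signal, or a shell exit status above 125, which covers 126/127 "cannot
execute" / "command not found"); it is not the actual source.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical outcome names; the real archiver uses elog levels instead. */
typedef enum
{
    ARCHIVE_OK,                 /* command exited with status 0 */
    ARCHIVE_LOG,                /* report at LOG, return false, retry later */
    ARCHIVE_FATAL               /* report at FATAL; archiver exits, gets restarted */
} ArchiveOutcome;

/*
 * Assumed approximation of wait_result_is_any_signal(): true if the child
 * died on a signal, or if the shell reported an exit status that conveys a
 * signal (> 128) or, optionally, a hard "command not found" error (126/127).
 */
bool
looks_like_signal(int rc, bool include_command_not_found)
{
    if (WIFSIGNALED(rc))
        return true;
    if (WIFEXITED(rc) &&
        WEXITSTATUS(rc) > (include_command_not_found ? 125 : 128))
        return true;
    return false;
}

/* Map a raw system() return value onto one of the three outcomes. */
ArchiveOutcome
classify_archive_result(int rc)
{
    if (rc == 0)
        return ARCHIVE_OK;
    return looks_like_signal(rc, true) ? ARCHIVE_FATAL : ARCHIVE_LOG;
}
```

With a POSIX shell, classify_archive_result(system("exit 1")) gives
ARCHIVE_LOG, while system("exit 127") gives ARCHIVE_FATAL. Either way the
outcome is an ordinary error report or an ordinary process exit, never a
crash.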

Greetings,

Andres Freund


