Thread: Auxiliary Processes and MyAuxProc

Auxiliary Processes and MyAuxProc

From
Mike Palmiotto
Date:
Greetings,

I've been looking through the startup code for postmaster forks and
saw a couple of mechanisms used. Most forks seem to be using
StartChildProcess with a MyAuxProc emumerator, but then some
(autovacuum launcher/worker, syslogger, bgworker, archiver) are forked
through their own start functions.

I noticed some implication in the pgsql-hackers archives [1] that
non-AuxProcs are as such because they don't need access to shared
memory. Is this an accurate explanation?

For some context, I'm trying to come up with a patch to set the
process identifier (MyAuxProc/am_autovacuumworker/launcher,
am_archiver, etc.) immediately after forking, so we can provide
process context info to a hook in early startup. I wanted to make sure
to do things as cleanly as possible.

With the current startup architecture, we have to touch multiple
entrypoints to achieve the desired effect. Is there any particular
reason we couldn't fold all of the startup processes into the
StartChildProcess code and assign MyAuxProc for processes that don't
currently have one, or is this a non-starter?

Thank you for your time.

--
Mike Palmiotto
Software Engineer
Crunchy Data Solutions
https://crunchydata.com


[1] https://www.postgresql.org/message-id/20181127.175949.06807946.horiguchi.kyotaro%40lab.ntt.co.jp


Re: Auxiliary Processes and MyAuxProc

From
Tom Lane
Date:
Mike Palmiotto <mike.palmiotto@crunchydata.com> writes:
> I've been looking through the startup code for postmaster forks and
> saw a couple of mechanisms used. Most forks seem to be using
> StartChildProcess with a MyAuxProc emumerator, but then some
> (autovacuum launcher/worker, syslogger, bgworker, archiver) are forked
> through their own start functions.

> I noticed some implication in the pgsql-hackers archives [1] that
> non-AuxProcs are as such because they don't need access to shared
> memory. Is this an accurate explanation?

That was the original idea, but it hasn't stood the test of time very
well.  In particular, the AuxProcType enum only works for child
processes that there's supposed to be exactly one of, so we haven't
used it for autovac workers or bgworkers, although those certainly
have to be connected to shmem (hm, maybe that's not true for all
types of bgworker, not sure).  The autovac launcher is kind of a
weird special case --- I think it could have been included in AuxProcType,
but it wasn't, possibly to minimize the cosmetic difference between it
and autovac workers.

> For some context, I'm trying to come up with a patch to set the
> process identifier (MyAuxProc/am_autovacuumworker/launcher,
> am_archiver, etc.) immediately after forking,

Don't we do that already?

> With the current startup architecture, we have to touch multiple
> entrypoints to achieve the desired effect. Is there any particular
> reason we couldn't fold all of the startup processes into the
> StartChildProcess code and assign MyAuxProc for processes that don't
> currently have one, or is this a non-starter?

If memory serves, StartChildProcess already was an attempt to unify
the treatment of postmaster children.  It's possible that another
round of unification would be productive, but I think you'll find
that there are random small differences in requirements that'd
make it messy.

            regards, tom lane


Re: Auxiliary Processes and MyAuxProc

From
Mike Palmiotto
Date:
On Mon, Feb 25, 2019 at 1:29 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Mike Palmiotto <mike.palmiotto@crunchydata.com> writes:
> <snip>
>
> > For some context, I'm trying to come up with a patch to set the
> > process identifier (MyAuxProc/am_autovacuumworker/launcher,
> > am_archiver, etc.) immediately after forking,
>
> Don't we do that already?

Kind of. The indicators are set in the Main functions after
InitPostmasterChild and some other setup. In my little bit of digging
I found InitPostmasterChild to be the best place for a centralized
"early start-up hook." Is there a better place you can think of
off-hand?

> <snip>
>
> If memory serves, StartChildProcess already was an attempt to unify
> the treatment of postmaster children.  It's possible that another
> round of unification would be productive, but I think you'll find
> that there are random small differences in requirements that'd
> make it messy.

It kind of seemed like it, but I noticed the small differences in
requirements, which made me a bit hesitant. I'll go ahead and see what
I can do and submit the patch for consideration.

-- 
Mike Palmiotto
Software Engineer
Crunchy Data Solutions
https://crunchydata.com


Re: Auxiliary Processes and MyAuxProc

From
Mike Palmiotto
Date:
On Mon, Feb 25, 2019 at 1:41 PM Mike Palmiotto
<mike.palmiotto@crunchydata.com> wrote:
>
> <snip>
> >
> > If memory serves, StartChildProcess already was an attempt to unify
> > the treatment of postmaster children.  It's possible that another
> > round of unification would be productive, but I think you'll find
> > that there are random small differences in requirements that'd
> > make it messy.
>
> It kind of seemed like it, but I noticed the small differences in
> requirements, which made me a bit hesitant. I'll go ahead and see what
> I can do and submit the patch for consideration.

I'm considering changing StartChildProcess to take a struct with data
for forking/execing each different process. Each different backend
type would build up the struct and then pass it on to
StartChildProcess, which would handle each separately. This would
ensure that the fork type is set prior to InitPostmasterChild and
would provide us with the information necessary to do what we need in
the InitPostmasterChild_hook.

Attached is a patch to fork_process.h which shows roughly what I'm
thinking. Does this seem somewhat sane as a first step?




--
Mike Palmiotto
Software Engineer
Crunchy Data Solutions
https://crunchydata.com

Attachment

Re: Auxiliary Processes and MyAuxProc

From
Yuli Khodorkovskiy
Date:
On Mon, Feb 25, 2019 at 5:25 PM Mike Palmiotto
<mike.palmiotto@crunchydata.com> wrote:
>
> On Mon, Feb 25, 2019 at 1:41 PM Mike Palmiotto
> <mike.palmiotto@crunchydata.com> wrote:
> >
> > <snip>
> > >
> > > If memory serves, StartChildProcess already was an attempt to unify
> > > the treatment of postmaster children.  It's possible that another
> > > round of unification would be productive, but I think you'll find
> > > that there are random small differences in requirements that'd
> > > make it messy.
> >
> > It kind of seemed like it, but I noticed the small differences in
> > requirements, which made me a bit hesitant. I'll go ahead and see what
> > I can do and submit the patch for consideration.
>
> I'm considering changing StartChildProcess to take a struct with data
> for forking/execing each different process. Each different backend
> type would build up the struct and then pass it on to
> StartChildProcess, which would handle each separately. This would
> ensure that the fork type is set prior to InitPostmasterChild and
> would provide us with the information necessary to do what we need in
> the InitPostmasterChild_hook.
>
> Attached is a patch to fork_process.h which shows roughly what I'm
> thinking. Does this seem somewhat sane as a first step?
>

All,

Mike and I have written two patches to solve the issues
discussed in this thread.

The first patch centralizes the startup of workers and extends
worker identification that was introduced by AuxProcType. The worker
id can then be leveraged by extensions for identification of each
process.

The second patch adds a new hook that allows extensions to modify a worker
process' metadata in backends.

These patches should make future worker implementation more
straightforward as there is now one function to call that sets up
each worker. There is also code cleanup and removal of startup
redundancies.

Please let me know your thoughts. I look forward to your feedback.

Thanks,

Yuli

Attachment

Re: Auxiliary Processes and MyAuxProc

From
Alvaro Herrera
Date:
0002 seems way too large (and it doesn't currently apply).  Is there
something we can do to make it more manageable?

I think it would be better to put your 0001 in second place rather than
first, since your other patch doesn't use it AFAICS and it adds
functionality that has not yet received approval [or even been
discussed], while the other is necessary refactoring.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Auxiliary Processes and MyAuxProc

From
Mike Palmiotto
Date:
On Thu, Sep 26, 2019 at 9:49 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> 0002 seems way too large (and it doesn't currently apply).  Is there
> something we can do to make it more manageable?

Initially we were thinking of submitting one patch for the
centralization work and then separate patches per backend type. We
opted not to go that route, mainly because of the number of resulting
patches (there were somewhere around 13 total, as I remember). If it
makes sense, we can go ahead and split the patches up in that fashion
after rebasing.

>
> I think it would be better to put your 0001 in second place rather than
> first, since your other patch doesn't use it AFAICS and it adds
> functionality that has not yet received approval [or even been
> discussed], while the other is necessary refactoring.

Sounds good.

-- 
Mike Palmiotto
https://crunchydata.com



Re: Auxiliary Processes and MyAuxProc

From
Alvaro Herrera
Date:
On 2019-Sep-26, Mike Palmiotto wrote:

> On Thu, Sep 26, 2019 at 9:49 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> >
> > 0002 seems way too large (and it doesn't currently apply).  Is there
> > something we can do to make it more manageable?
> 
> Initially we were thinking of submitting one patch for the
> centralization work and then separate patches per backend type. We
> opted not to go that route, mainly because of the number of resulting
> patches (there were somewhere around 13 total, as I remember). If it
> makes sense, we can go ahead and split the patches up in that fashion
> after rebasing.

Well, I think it would be easier to manage as split patches, yeah.
I think it'd be infrastructure that needs to be carefully reviewed,
while the other ones are mostly boilerplate.  If I were the committer
for it, I would push that initial patch first immediately followed by
conversion of some process that's heavily exercised in buildfarm, wait
until lack of trouble is evident, followed by a trickle of pushes to
adapt the other processes.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Auxiliary Processes and MyAuxProc

From
Mike Palmiotto
Date:
On Thu, Sep 26, 2019 at 10:56 AM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
>
> On 2019-Sep-26, Mike Palmiotto wrote:
>
> > On Thu, Sep 26, 2019 at 9:49 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > >
> > > 0002 seems way too large (and it doesn't currently apply).  Is there
> > > something we can do to make it more manageable?
> >
> > Initially we were thinking of submitting one patch for the
> > centralization work and then separate patches per backend type. We
> > opted not to go that route, mainly because of the number of resulting
> > patches (there were somewhere around 13 total, as I remember). If it
> > makes sense, we can go ahead and split the patches up in that fashion
> > after rebasing.
>
> Well, I think it would be easier to manage as split patches, yeah.
> I think it'd be infrastructure that needs to be carefully reviewed,
> while the other ones are mostly boilerplate.  If I were the committer
> for it, I would push that initial patch first immediately followed by
> conversion of some process that's heavily exercised in buildfarm, wait
> until lack of trouble is evident, followed by a trickle of pushes to
> adapt the other processes.

Thanks for the feedback! I've rebased and tested on my F30 box with
and without EXEC_BACKEND. Just working on splitting out the patches
now and will post the new patchset as soon as that's done (hopefully
sometime tomorrow).

-- 
Mike Palmiotto
https://crunchydata.com



Re: Auxiliary Processes and MyAuxProc

From
Mike Palmiotto
Date:
On Thu, Sep 26, 2019 at 6:03 PM Mike Palmiotto
<mike.palmiotto@crunchydata.com> wrote:
>
> On Thu, Sep 26, 2019 at 10:56 AM Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> <snip>
> >
> > Well, I think it would be easier to manage as split patches, yeah.
> > I think it'd be infrastructure that needs to be carefully reviewed,
> > while the other ones are mostly boilerplate.  If I were the committer
> > for it, I would push that initial patch first immediately followed by
> > conversion of some process that's heavily exercised in buildfarm, wait
> > until lack of trouble is evident, followed by a trickle of pushes to
> > adapt the other processes.
>
> Thanks for the feedback! I've rebased and tested on my F30 box with
> and without EXEC_BACKEND. Just working on splitting out the patches
> now and will post the new patchset as soon as that's done (hopefully
> sometime tomorrow).

Attached is the reworked and rebased patch set. I put the hook on top
and a separate commit for each process type. Note that avworker and
avlauncher were intentionally left together. Let me know if you think
those should be split out as well.

Tested again with and without EXEC_BACKEND. Note that you'll need to
set randomize_va_space to 0 to disable ASLR in order to avoid
occasional failure in the EXEC_BACKEND case.

Please let me know if anything else jumps out to you.

Thanks,
-- 
Mike Palmiotto
https://crunchydata.com

Attachment

Re: Auxiliary Processes and MyAuxProc

From
Andres Freund
Date:
Hi,

On 2019-09-30 15:13:18 -0400, Mike Palmiotto wrote:
> Attached is the reworked and rebased patch set. I put the hook on top
> and a separate commit for each process type. Note that avworker and
> avlauncher were intentionally left together. Let me know if you think
> those should be split out as well.

I've not looked at this before, nor the thread. Just jumping here
because of the reference to this patch you made in another thread...

In short, I think this is far from ready, nor am I sure this is going in
the right direction. There's a bit more detail about where I think this
ought to go interspersed below (in particular search for PgSubProcess,
EXEC_BACKEND).



> From 205ea86dde898f7edac327d46b2b43b04721aadc Mon Sep 17 00:00:00 2001
> From: Mike Palmiotto <mike.palmiotto@crunchydata.com>
> Date: Mon, 30 Sep 2019 14:42:53 -0400
> Subject: [PATCH 7/8] Add backends to process centralization

.oO(out of order attached patches, how fun). 7, 8, 6, 5, 4, 2, 3, 1...



> From 2a3a35f2e80cb2badcb0efbce1bad2484e364b7b Mon Sep 17 00:00:00 2001
> From: Mike Palmiotto <mike.palmiotto@crunchydata.com>
> Date: Fri, 27 Sep 2019 12:28:19 -0400
> Subject: [PATCH 1/8] Add ForkProcType infrastructure

A better explanation would be nice. It's hard to review non-trivial
patches that have little explanation as to what they're
achieving. Especially when there's some unrelated seeming changes.


> diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
> index 9238fbe98d..9f3dad1c6d 100644
> --- a/src/backend/bootstrap/bootstrap.c
> +++ b/src/backend/bootstrap/bootstrap.c
> @@ -70,6 +70,7 @@ static void cleanup(void);
>   */
>
>  AuxProcType MyAuxProcType = NotAnAuxProcess;    /* declared in miscadmin.h */
> +ForkProcType MyForkProcType = NoForkProcess;    /* declared in fork_process.h */
>
>  Relation    boot_reldesc;        /* current relation descriptor
>  */

IMO, before this stuff gets massaged meaningfully, we ought to move code
like this (including AuxiliaryProcessMain etc) somewhere where it makes
sense. If we're going to rejigger things, it ought to be towards a more
understandable architecture, rather than keeping weird quirks around.

I'm inclined to think that this better also include properly
centralizing a good portion of the EXEC_BACKEND code. Building
infrastructure that's supposed to be long-lived with that being left as
is just seems like recipe for yet another odd framework.


> @@ -538,11 +539,11 @@ static void ShmemBackendArrayAdd(Backend *bn);
>  static void ShmemBackendArrayRemove(Backend *bn);
>  #endif                            /* EXEC_BACKEND */
>
> -#define StartupDataBase()        StartChildProcess(StartupProcess)
> -#define StartBackgroundWriter() StartChildProcess(BgWriterProcess)
> -#define StartCheckpointer()        StartChildProcess(CheckpointerProcess)
> -#define StartWalWriter()        StartChildProcess(WalWriterProcess)
> -#define StartWalReceiver()        StartChildProcess(WalReceiverProcess)
> +#define StartupDataBase()        StartChildProcess(StartupFork)
> +#define StartBackgroundWriter() StartChildProcess(BgWriterFork)
> +#define StartCheckpointer()        StartChildProcess(CheckpointerFork)
> +#define StartWalWriter()        StartChildProcess(WalWriterFork)
> +#define StartWalReceiver()        StartChildProcess(WalReceiverFork)

FWIW, I'd just rip this out, it adds exactly nothing but one more place
to change.


> +/*
> + * PrepAuxProcessFork
> + *
> + * Prepare a ForkProcType struct for the auxiliary process specified by

ForkProcData, not ForkProcType


> + * AuxProcType. This does all prep related to av parameters and error messages.
> + */
> +static void
> +PrepAuxProcessFork(ForkProcData *aux_fork)
> +{
> +    int            ac = 0;
> +
> +    /*
> +     * Set up command-line arguments for subprocess
> +     */
> +    aux_fork->av[ac++] = pstrdup("postgres");

There's no documentation as to what 'av' is actually supposed to mean. I
assume auxvector or such, but if so that's neither obvious, nor
necessarily good.


> +#ifdef EXEC_BACKEND
> +    aux_fork->av[ac++] = pstrdup("--forkboot");
> +    aux_fork->av[ac++] = NULL;            /* filled in by postmaster_forkexec */
> +#endif

What's the point of pstrdup()ing constant strings here? Afaict you're
not freeing them ever?


> +    aux_fork->av[ac++] = psprintf("-x%d", MyForkProcType);
> +
> +    aux_fork->av[ac] = NULL;
> +    Assert(ac < lengthof(*aux_fork->av));

Unless I miss something, the lengthof here doesn't do anythign
useful. ForkProcData->av is defined as:

> +typedef struct ForkProcData
> +{
> +   char                   **av;

which I think means you're computing sizeof(char*)/sizeof(char)?



> +    aux_fork->ac = ac;
> +    switch (MyForkProcType)
> +    {
> +        case StartupProcess:
> +            aux_fork->type_desc = pstrdup("startup");
> +            break;
> +        case BgWriterProcess:
> +            aux_fork->type_desc = pstrdup("background writer");
> +            break;
> +        case CheckpointerProcess:
> +            aux_fork->type_desc = pstrdup("checkpointer");
> +            break;
> +        case WalWriterProcess:
> +            aux_fork->type_desc = pstrdup("WAL writer");
> +            break;
> +        case WalReceiverProcess:
> +            aux_fork->type_desc = pstrdup("WAL receiver");
> +            break;
> +        default:
> +            aux_fork->type_desc = pstrdup("child");
> +            break;
> +    }
> +
> +    aux_fork->child_main = AuxiliaryProcessMain;

I'm really not in love in having details like the descriptors of
processes, the types of processes, the mapping from MyForkProcType to
AuxProcType in multiple places.  I feel this is adding even more
confusion to this, rather than reducing it.

I feel like we should define the properties of the different types of
processes in *one* place, rather multiple. Define one array of
structs that contains information about the different types of
processes, rather than distributing that knowledge all over.


> @@ -4477,11 +4530,11 @@ BackendRun(Port *port)
>  pid_t
>  postmaster_forkexec(int argc, char *argv[])
>  {
> -    Port        port;
> -
>      /* This entry point passes dummy values for the Port variables */
> -    memset(&port, 0, sizeof(port));
> -    return internal_forkexec(argc, argv, &port);
> +    if (!ConnProcPort)
> +        ConnProcPort = palloc0(sizeof(*ConnProcPort));
> +
> +    return internal_forkexec(argc, argv);
>  }

What's the point of creating a dummy ConnProcPort? And when is this
being pfreed? I mean previously this at least used a stack variable that
was automatically being released by the time postmaster_forkexec
returned, now it sure looks like a permanent leak to me.

Also, why is it a good idea to rely *more* on data being implicitly
passed by stuffing things into global variables? That strikes me as


>          /* in parent, fork failed */
> -        int            save_errno = errno;
> +        int save_errno = errno;
>
> -        errno = save_errno;
> -        switch (type)
> -        {
> -            case StartupProcess:
> -                ereport(LOG,
> -                        (errmsg("could not fork startup process: %m")));
> -                break;
> -            case BgWriterProcess:
> -                ereport(LOG,
> -                        (errmsg("could not fork background writer process: %m")));
> -                break;
> -            case CheckpointerProcess:
> -                ereport(LOG,
> -                        (errmsg("could not fork checkpointer process: %m")));
> -                break;
> -            case WalWriterProcess:
> -                ereport(LOG,
> -                        (errmsg("could not fork WAL writer process: %m")));
> -                break;
> -            case WalReceiverProcess:
> -                ereport(LOG,
> -                        (errmsg("could not fork WAL receiver process: %m")));
> -                break;
> -            default:
> -                ereport(LOG,
> -                        (errmsg("could not fork process: %m")));
> -                break;
> -        }
> +        errno =  save_errno;
> +        ereport(LOG,
> +            (errmsg("could not fork %s process: %m", fork_data->type_desc)));

Previously the process type was translatable, now it is not
anymore. You'd probably have to solve that somehow, perhaps by
translating the type_desc once, when setting it.


> @@ -1113,7 +1113,7 @@ process_startup_options(Port *port, bool am_superuser)
>          av = (char **) palloc(maxac * sizeof(char *));
>          ac = 0;
>
> -        av[ac++] = "postgres";
> +        av[ac++] = pstrdup("postgres");
>
>          pg_split_opts(av, &ac, port->cmdline_options);

Again, what's with all these conversions to dynamic memory allocations?



>  /* Flags to tell if we are in an autovacuum process */
> -static bool am_autovacuum_launcher = false;
> -static bool am_autovacuum_worker = false;
> +bool am_autovacuum_launcher = false;
> +bool am_autovacuum_worker = false;

I'm against exposing a multitude of am_* variables for each process
type, especially globally.  If we're building proper infrastructure
here, why do we need multiple variables, rather than exactly one
indicating the process type?



> @@ -347,88 +343,53 @@ static void avl_sigusr2_handler(SIGNAL_ARGS);
>  static void avl_sigterm_handler(SIGNAL_ARGS);
>  static void autovac_refresh_stats(void);
>
> -
> -
>  /********************************************************************
>   *                      AUTOVACUUM LAUNCHER CODE
>   ********************************************************************/

Spurious whitespace changes (in a few other places too).


> +void
> +PrepAutoVacProcessFork(ForkProcData *autovac_fork)
>  {

I'm really not in love with keeping "fork" in all these, given that
we're not actually going to fork in some cases. Why do we not just call
it subprocess and get rid of the "fork" and "forkexec" suffixes?




> +/*
> + * shmemSetup
> + *
> + * Helper function for a child to set up shmem before
> + * executing.
> + *
> + * aux_process - set to true if an auxiliary process.
> + */
> +static void
> +shmemSetup(bool aux_process)
> +{
> +    /* Restore basic shared memory pointers */
> +    InitShmemAccess(UsedShmemSegAddr);
> +
> +    /* Need a PGPROC to run CreateSharedMemoryAndSemaphores */
> +    if (aux_process)
> +        InitAuxiliaryProcess();
> +    else
> +        InitProcess();
> +
> +    /* Attach process to shared data structures */
> +    CreateSharedMemoryAndSemaphores();
> +}

Not quite convinced this is a useful abstraction.  I'm inclined to think
that this again should use something more like one array of
per-subprocess properties.


Imagine if we had an array like

typedef struct PgSubProcess
{
        const char *name;
        const char *translated_name;
        bool        needs_shmem;
        bool        needs_aux_proc;
        bool        needs_full_proc;
        SubProcessEntryPoint entrypoint;
} PgSubProcess;

PgSubProcess process_types[] = {
    {.name = "startup process", .needs_shmem = true, .needs_aux_proc = true, .entrypoint = StartupProcessMain},
    {.name = "background writer", .needs_shmem = true, .needs_aux_proc = true, .entrypoint = BackgroundWriterMain},
    {.name = "autovacuum launcher", .needs_shmem = true, .needs_aux_proc = true, .entrypoint = AutoVacLauncherMain},
    {.name = "backend", .needs_shmem = true, .needs_full_proc = true, , .entrypoint = PostgresMain},


> From efc6f0531b71847c6500062b5a0fe43331a6ea7e Mon Sep 17 00:00:00 2001
> From: Mike Palmiotto <mike.palmiotto@crunchydata.com>
> Date: Fri, 27 Sep 2019 19:13:53 -0400
> Subject: [PATCH 3/8] Add centralized stat collector
>
> ---
>  src/backend/postmaster/pgstat.c       | 88 ++++++---------------------
>  src/backend/postmaster/postmaster.c   |  5 +-
>  src/include/pgstat.h                  |  4 +-
>  src/include/postmaster/fork_process.h |  1 +
>  4 files changed, 26 insertions(+), 72 deletions(-)
>
> diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
> index 011076c3e3..73d953aedf 100644
> --- a/src/backend/postmaster/pgstat.c
> +++ b/src/backend/postmaster/pgstat.c
> @@ -280,10 +280,6 @@ static instr_time total_func_time;
>   * Local function forward declarations
>   * ----------
>   */
> -#ifdef EXEC_BACKEND
> -static pid_t pgstat_forkexec(void);
> -#endif
> -
>  NON_EXEC_STATIC void PgstatCollectorMain(int argc, char *argv[]) pg_attribute_noreturn();
>  static void pgstat_exit(SIGNAL_ARGS);
>  static void pgstat_beshutdown_hook(int code, Datum arg);
> @@ -689,53 +685,26 @@ pgstat_reset_all(void)
>      pgstat_reset_remove_files(PGSTAT_STAT_PERMANENT_DIRECTORY);
>  }
>
> -#ifdef EXEC_BACKEND
>
>  /*
> - * pgstat_forkexec() -
> - *
> - * Format up the arglist for, then fork and exec, statistics collector process
> - */
> -static pid_t
> -pgstat_forkexec(void)
> -{
> -    char       *av[10];
> -    int            ac = 0;
> -
> -    av[ac++] = "postgres";
> -    av[ac++] = "--forkcol";
> -    av[ac++] = NULL;            /* filled in by postmaster_forkexec */
> -
> -    av[ac] = NULL;
> -    Assert(ac < lengthof(av));
> -
> -    return postmaster_forkexec(ac, av);
> -}
> -#endif                            /* EXEC_BACKEND */
> -
> -
> -/*
> - * pgstat_start() -
> + * PrepPgstatCollectorFork
>   *
>   *    Called from postmaster at startup or after an existing collector
>   *    died.  Attempt to fire up a fresh statistics collector.
>   *
> - *    Returns PID of child process, or 0 if fail.
> - *
> - *    Note: if fail, we will be called again from the postmaster main loop.
>   */
> -int
> -pgstat_start(void)
> +void
> +PrepPgstatCollectorFork(ForkProcData *pgstat_fork)
>  {
> +    int                ac = 0;
>      time_t        curtime;
> -    pid_t        pgStatPid;
>
>      /*
>       * Check that the socket is there, else pgstat_init failed and we can do
>       * nothing useful.
>       */
>      if (pgStatSock == PGINVALID_SOCKET)
> -        return 0;
> +        return;
>
>      /*
>       * Do nothing if too soon since last collector start.  This is a safety
> @@ -746,45 +715,20 @@ pgstat_start(void)
>      curtime = time(NULL);
>      if ((unsigned int) (curtime - last_pgstat_start_time) <
>          (unsigned int) PGSTAT_RESTART_INTERVAL)
> -        return 0;
> +        return;
>      last_pgstat_start_time = curtime;
>
> -    /*
> -     * Okay, fork off the collector.
> -     */
>  #ifdef EXEC_BACKEND
> -    switch ((pgStatPid = pgstat_forkexec()))
> -#else
> -    switch ((pgStatPid = fork_process()))
> -#endif
> -    {
> -        case -1:
> -            ereport(LOG,
> -                    (errmsg("could not fork statistics collector: %m")));
> -            return 0;
> -
> -#ifndef EXEC_BACKEND
> -        case 0:
> -            /* in postmaster child ... */
> -            InitPostmasterChild();
> -
> -            /* Close the postmaster's sockets */
> -            ClosePostmasterPorts(false);
> -
> -            /* Drop our connection to postmaster's shared memory, as well */
> -            dsm_detach_all();
> -            PGSharedMemoryDetach();
> -
> -            PgstatCollectorMain(0, NULL);
> -            break;
> +    pgstat_fork->av[ac++] = pstrdup("postgres");
> +    pgstat_fork->av[ac++] = pstrdup("--forkcol");
> +    pgstat_fork->av[ac++] = NULL;            /* filled in by postmaster_forkexec */
>  #endif
>
> -        default:
> -            return (int) pgStatPid;
> -    }
> +    pgstat_fork->ac = ac;
> +    Assert(pgstat_fork->ac < lengthof(*pgstat_fork->av));
>
> -    /* shouldn't get here */
> -    return 0;
> +    pgstat_fork->type_desc = pstrdup("statistics collector");
> +    pgstat_fork->child_main = PgstatCollectorMain;
>  }
>
>  void
> @@ -4425,12 +4369,16 @@ pgstat_send_bgwriter(void)
>   * ----------
>   */
>  NON_EXEC_STATIC void
> -PgstatCollectorMain(int argc, char *argv[])
> +PgstatCollectorMain(pg_attribute_unused() int argc, pg_attribute_unused() char *argv[])
>  {
>      int            len;
>      PgStat_Msg    msg;
>      int            wr;
>
> +    /* Drop our connection to postmaster's shared memory, as well */
> +    dsm_detach_all();
> +    PGSharedMemoryDetach();
> +
>      /*
>       * Ignore all signals usually bound to some action in the postmaster,
>       * except SIGHUP and SIGQUIT.  Note we don't need a SIGUSR1 handler to
> diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
> index 35cd1479b9..37a36387a3 100644
> --- a/src/backend/postmaster/postmaster.c
> +++ b/src/backend/postmaster/postmaster.c
> @@ -549,6 +549,7 @@ static void ShmemBackendArrayRemove(Backend *bn);
>  #define StartWalReceiver()        StartChildProcess(WalReceiverFork)
>  #define StartAutoVacLauncher()        StartChildProcess(AutoVacLauncherFork)
>  #define StartAutoVacWorker()        StartChildProcess(AutoVacWorkerFork)
> +#define pgstat_start()            StartChildProcess(PgstatCollectorFork)
>
>  /* Macros to check exit status of a child process */
>  #define EXIT_STATUS_0(st)  ((st) == 0)
> @@ -5459,7 +5460,9 @@ StartChildProcess(ForkProcType type)
>          case AutoVacWorkerFork:
>              PrepAutoVacProcessFork(fork_data);
>              break;
> -
> +        case PgstatCollectorFork:
> +            PrepPgstatCollectorFork(fork_data);
> +            break;
>          default:
>              break;
>      }
> diff --git a/src/include/pgstat.h b/src/include/pgstat.h
> index fe076d823d..00e95caa60 100644
> --- a/src/include/pgstat.h
> +++ b/src/include/pgstat.h
> @@ -15,6 +15,7 @@
>  #include "libpq/pqcomm.h"
>  #include "port/atomics.h"
>  #include "portability/instr_time.h"
> +#include "postmaster/fork_process.h"
>  #include "postmaster/pgarch.h"
>  #include "storage/proc.h"
>  #include "utils/hsearch.h"
> @@ -1235,10 +1236,11 @@ extern Size BackendStatusShmemSize(void);
>  extern void CreateSharedBackendStatus(void);
>
>  extern void pgstat_init(void);
> -extern int    pgstat_start(void);
>  extern void pgstat_reset_all(void);
>  extern void allow_immediate_pgstat_restart(void);
>
> +extern void PrepPgstatCollectorFork(ForkProcData *pgstat_fork);
> +
>  #ifdef EXEC_BACKEND
>  extern void PgstatCollectorMain(int argc, char *argv[]) pg_attribute_noreturn();
>  #endif
> diff --git a/src/include/postmaster/fork_process.h b/src/include/postmaster/fork_process.h
> index 1f319fc98f..16ca8be968 100644
> --- a/src/include/postmaster/fork_process.h
> +++ b/src/include/postmaster/fork_process.h
> @@ -26,6 +26,7 @@ typedef enum
>     WalReceiverFork,    /* end of Auxiliary Process Forks */
>     AutoVacLauncherFork,
>     AutoVacWorkerFork,
> +   PgstatCollectorFork,
>
>     NUMFORKPROCTYPES            /* Must be last! */
>  } ForkProcType;




> ---
>  src/backend/postmaster/postmaster.c   | 309 +++++++++++++-------------
>  src/include/postmaster/fork_process.h |   5 +
>  src/include/postmaster/postmaster.h   |   1 -
>  3 files changed, 160 insertions(+), 155 deletions(-)
>
> diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
> index 8f862fcd64..b55cc4556d 100644
> --- a/src/backend/postmaster/postmaster.c
> +++ b/src/backend/postmaster/postmaster.c
> @@ -144,6 +144,7 @@ static Backend *ShmemBackendArray;
>
>  BackgroundWorker *MyBgworkerEntry = NULL;
>  RegisteredBgWorker *CurrentBgWorker = NULL;
> +static Backend       *MyBackend;
>  static int            child_errno;

This is another global variable that strikes me as a bad idea. Whereas
MyBgworkerEntry is only set in the bgworker itself, you appear to set
MyBackend *in postmaster*:

> +/*
> + * PrepBackendFork
> + *
> + * Prepare a ForkProcType struct for starting a Backend.
> + * This does all prep related to av parameters and error messages.
> +*/
> +static void
> +PrepBackendFork(ForkProcData *backend_fork)
> +{
> +    int            ac = 0;
> +
> +    /*
> +     * Create backend data structure.  Better before the fork() so we can
> +     * handle failure cleanly.
> +     */
> +    MyBackend = (Backend *) malloc(sizeof(Backend));
> +    if (!MyBackend)
> +    {
> +        ereport(LOG,
> +                (errcode(ERRCODE_OUT_OF_MEMORY),
> +                 errmsg("out of memory")));
> +    }

That's totally not ok, we'll constantly overwrite the variable set by
another process, and otherwise have now obsolete values in there. And
even worse, you leave it dangling in some cases:

> +    /*
> +     * Compute the cancel key that will be assigned to this backend. The
> +     * backend will have its own copy in the forked-off process' value of
> +     * MyCancelKey, so that it can transmit the key to the frontend.
> +     */
> +    if (!RandomCancelKey(&MyCancelKey))
> +    {
> +        free(MyBackend);
> +        ereport(LOG,
> +                (errcode(ERRCODE_INTERNAL_ERROR),
> +                 errmsg("could not generate random cancel key")));
> +    }


Nor do I see where you're actually freeing that memory in postmaster, in
the successful case.  Previously it was freed in postmaster in the
success case

> -#ifdef EXEC_BACKEND
> -    pid = backend_forkexec(port);
> -#else                            /* !EXEC_BACKEND */
> -    pid = fork_process();
> -    if (pid == 0)                /* child */
> -    {
> -        free(bn);
> -

but now it appears  it's only freed in the failure case:

> +/*
> + *     BackendFailFork
> + *
> + *     Backend cleanup in case a failure occurs forking a new Backend.
> + */
> +static void BackendFailFork(pg_attribute_unused() int argc, pg_attribute_unused() char *argv[])
> +{
> +    if (!MyBackend->dead_end)
> +        (void) ReleasePostmasterChildSlot(MyBackend->child_slot);
> +    free(MyBackend);
> +
> +    report_fork_failure_to_client(MyProcPort, child_errno);
> +}



> From 718c6598e9e02e1ce9ade99afb3c8948c67ef5ae Mon Sep 17 00:00:00 2001
> From: Mike Palmiotto <mike.palmiotto@crunchydata.com>
> Date: Tue, 19 Feb 2019 15:29:33 +0000
> Subject: [PATCH 8/8] Add a hook to allow extensions to set worker metadata
>
> This patch implements a hook that allows extensions to modify a worker
> process' metadata in backends.
>
> Additional work done by Yuli Khodorkovskiy <yuli@crunchydata.com>
> Original discussion here:
https://www.postgresql.org/message-id/flat/CAMN686FE0OdZKp9YPO%3DhtC6LnA6aW4r-%2Bjq%3D3Q5RAoFQgW8EtA%40mail.gmail.com
> ---
>  src/backend/postmaster/fork_process.c | 11 +++++++++++
>  src/include/postmaster/fork_process.h |  4 ++++
>  2 files changed, 15 insertions(+)
>
> diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
> index a9138f589e..964c04d846 100644
> --- a/src/backend/postmaster/fork_process.c
> +++ b/src/backend/postmaster/fork_process.c
> @@ -21,6 +21,14 @@
>  #include <openssl/rand.h>
>  #endif
>
> +/*
> + * This hook allows plugins to get control of the child process following
> + * a successful fork_process(). It can be used to perform some action in
> + * extensions to set metadata in backends (including special backends) upon
> + * setting of the session user.
> + */
> +ForkProcess_post_hook_type ForkProcess_post_hook = NULL;
> +
>  #ifndef WIN32
>  /*
>   * Wrapper for fork(). Return values are the same as those for fork():
> @@ -113,6 +121,9 @@ fork_process(void)
>  #ifdef USE_OPENSSL
>          RAND_cleanup();
>  #endif
> +
> +        if (ForkProcess_post_hook)
> +            (*ForkProcess_post_hook) ();
>      }


>      return result;
> diff --git a/src/include/postmaster/fork_process.h b/src/include/postmaster/fork_process.h
> index 631a2113b5..d756fcead7 100644
> --- a/src/include/postmaster/fork_process.h
> +++ b/src/include/postmaster/fork_process.h
> @@ -58,4 +58,8 @@ extern pid_t fork_process(void);
>  extern pid_t postmaster_forkexec(ForkProcData *fork_data);
>  #endif
>
> +/* Hook for plugins to get control after a successful fork_process() */
> +typedef void (*ForkProcess_post_hook_type) ();
> +extern PGDLLIMPORT ForkProcess_post_hook_type ForkProcess_post_hook;
> +
>  #endif                            /* FORK_PROCESS_H */
> --
> 2.23.0
>

Why do we want libraries to allow to hook into processes like the
startup process etc?


Greetings,

Andres Freund



Re: Auxiliary Processes and MyAuxProc

From
Mike Palmiotto
Date:
On Thu, Oct 3, 2019 at 1:31 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2019-09-30 15:13:18 -0400, Mike Palmiotto wrote:
> > Attached is the reworked and rebased patch set. I put the hook on top
> > and a separate commit for each process type. Note that avworker and
> > avlauncher were intentionally left together. Let me know if you think
> > those should be split out as well.
>
> I've not looked at this before, nor the thread. Just jumping here
> because of the reference to this patch you made in another thread...
>
> In short, I think this is far from ready, nor am I sure this is going in
> the right direction. There's a bit more detail about where I think this
> ought to go interspersed below (in particular search for PgSubProcess,
> EXEC_BACKEND).

Great, thanks for the review!

>
>
>
> > From 205ea86dde898f7edac327d46b2b43b04721aadc Mon Sep 17 00:00:00 2001
> > From: Mike Palmiotto <mike.palmiotto@crunchydata.com>
> > Date: Mon, 30 Sep 2019 14:42:53 -0400
> > Subject: [PATCH 7/8] Add backends to process centralization
>
> .oO(out of order attached patches, how fun). 7, 8, 6, 5, 4, 2, 3, 1...

Yeah, hm. Not really sure why/how that happened.

>
>
>
> > From 2a3a35f2e80cb2badcb0efbce1bad2484e364b7b Mon Sep 17 00:00:00 2001
> > From: Mike Palmiotto <mike.palmiotto@crunchydata.com>
> > Date: Fri, 27 Sep 2019 12:28:19 -0400
> > Subject: [PATCH 1/8] Add ForkProcType infrastructure
>
> A better explanation would be nice. It's hard to review non-trivial
> patches that have little explanation as to what they're
> achieving. Especially when there's some unrelated seeming changes.

I should have maintained the initial patch description from the last
cut -- there is a lot more detail there. I will pull that back in
after addressing the other concerns. :)

>
>
> > diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
> > index 9238fbe98d..9f3dad1c6d 100644
> > --- a/src/backend/bootstrap/bootstrap.c
> > +++ b/src/backend/bootstrap/bootstrap.c
> > @@ -70,6 +70,7 @@ static void cleanup(void);
> >   */
> >
> >  AuxProcType MyAuxProcType = NotAnAuxProcess; /* declared in miscadmin.h */
> > +ForkProcType MyForkProcType = NoForkProcess; /* declared in fork_process.h */
> >
> >  Relation     boot_reldesc;           /* current relation descriptor
> >  */
>
> IMO, before this stuff gets massaged meaningfully, we ought to move code
> like this (including AuxiliaryProcessMain etc) somewhere where it makes
> sense. If we're going to rejigger things, it ought to be towards a more
> understandable architecture, rather than keeping weird quirks around.

Sounds good. I'll address this in the next round.

>
> I'm inclined to think that this better also include properly
> centralizing a good portion of the EXEC_BACKEND code. Building
> infrastructure that's supposed to be long-lived with that being left as
> is just seems like recipe for yet another odd framework.

+1

>
>
> > @@ -538,11 +539,11 @@ static void ShmemBackendArrayAdd(Backend *bn);
> >  static void ShmemBackendArrayRemove(Backend *bn);
> >  #endif                                                       /* EXEC_BACKEND */
> >
> > -#define StartupDataBase()            StartChildProcess(StartupProcess)
> > -#define StartBackgroundWriter() StartChildProcess(BgWriterProcess)
> > -#define StartCheckpointer()          StartChildProcess(CheckpointerProcess)
> > -#define StartWalWriter()             StartChildProcess(WalWriterProcess)
> > -#define StartWalReceiver()           StartChildProcess(WalReceiverProcess)
> > +#define StartupDataBase()            StartChildProcess(StartupFork)
> > +#define StartBackgroundWriter() StartChildProcess(BgWriterFork)
> > +#define StartCheckpointer()          StartChildProcess(CheckpointerFork)
> > +#define StartWalWriter()             StartChildProcess(WalWriterFork)
> > +#define StartWalReceiver()           StartChildProcess(WalReceiverFork)
>
> FWIW, I'd just rip this out, it adds exactly nothing but one more place
> to change.

+1

>
>
> > +/*
> > + * PrepAuxProcessFork
> > + *
> > + * Prepare a ForkProcType struct for the auxiliary process specified by
>
> ForkProcData, not ForkProcType
>
>
> > + * AuxProcType. This does all prep related to av parameters and error messages.
> > + */
> > +static void
> > +PrepAuxProcessFork(ForkProcData *aux_fork)
> > +{
> > +     int                     ac = 0;
> > +
> > +     /*
> > +      * Set up command-line arguments for subprocess
> > +      */
> > +     aux_fork->av[ac++] = pstrdup("postgres");
>
> There's no documentation as to what 'av' is actually supposed to mean. I
> assume auxvector or such, but if so that's neither obvious, nor
> necessarily good.

Got it, better variable names here.

>
>
> > +#ifdef EXEC_BACKEND
> > +     aux_fork->av[ac++] = pstrdup("--forkboot");
> > +     aux_fork->av[ac++] = NULL;                      /* filled in by postmaster_forkexec */
> > +#endif
>
> What's the point of pstrdup()ing constant strings here? Afaict you're
> not freeing them ever?

The idea was supposed to be that the prep work lives up until the fork
happens and then is freed en masse with the rest of the MemoryContext.
Perhaps there was some oversight here. I'll revisit this and others in
the next pass.

>
>
> > +     aux_fork->av[ac++] = psprintf("-x%d", MyForkProcType);
> > +
> > +     aux_fork->av[ac] = NULL;
> > +     Assert(ac < lengthof(*aux_fork->av));
>
> Unless I miss something, the lengthof here doesn't do anythign
> useful. ForkProcData->av is defined as:
>
> > +typedef struct ForkProcData
> > +{
> > +   char                   **av;
>
> which I think means you're computing sizeof(char*)/sizeof(char)?

I think this might be a leftover assertion from the previous
infrastructure. Agreed that this is probably useless.

>
>
>
> > +     aux_fork->ac = ac;
> > +     switch (MyForkProcType)
> > +     {
> > +             case StartupProcess:
> > +                     aux_fork->type_desc = pstrdup("startup");
> > +                     break;
> > +             case BgWriterProcess:
> > +                     aux_fork->type_desc = pstrdup("background writer");
> > +                     break;
> > +             case CheckpointerProcess:
> > +                     aux_fork->type_desc = pstrdup("checkpointer");
> > +                     break;
> > +             case WalWriterProcess:
> > +                     aux_fork->type_desc = pstrdup("WAL writer");
> > +                     break;
> > +             case WalReceiverProcess:
> > +                     aux_fork->type_desc = pstrdup("WAL receiver");
> > +                     break;
> > +             default:
> > +                     aux_fork->type_desc = pstrdup("child");
> > +                     break;
> > +     }
> > +
> > +     aux_fork->child_main = AuxiliaryProcessMain;
>
> I'm really not in love in having details like the descriptors of
> processes, the types of processes, the mapping from MyForkProcType to
> AuxProcType in multiple places.  I feel this is adding even more
> confusion to this, rather than reducing it.
>
> I feel like we should define the properties of the different types of
> processes in *one* place, rather multiple. Define one array of
> structs that contains information about the different types of
> processes, rather than distributing that knowledge all over.

Agreed here. Thanks for the suggestion.

>
>
> > @@ -4477,11 +4530,11 @@ BackendRun(Port *port)
> >  pid_t
> >  postmaster_forkexec(int argc, char *argv[])
> >  {
> > -     Port            port;
> > -
> >       /* This entry point passes dummy values for the Port variables */
> > -     memset(&port, 0, sizeof(port));
> > -     return internal_forkexec(argc, argv, &port);
> > +     if (!ConnProcPort)
> > +             ConnProcPort = palloc0(sizeof(*ConnProcPort));
> > +
> > +     return internal_forkexec(argc, argv);
> >  }
>
> What's the point of creating a dummy ConnProcPort? And when is this
> being pfreed? I mean previously this at least used a stack variable that
> was automatically being released by the time postmaster_forkexec
> returned, now it sure looks like a permanent leak to me.

Will address this in the next pass.

>
> Also, why is it a good idea to rely *more* on data being implicitly
> passed by stuffing things into global variables? That strikes me as

It is certainly not. This will be addressed in the next pass, thank you!

>
>
> >               /* in parent, fork failed */
> > -             int                     save_errno = errno;
> > +             int save_errno = errno;
> >
> > -             errno = save_errno;
> > -             switch (type)
> > -             {
> > -                     case StartupProcess:
> > -                             ereport(LOG,
> > -                                             (errmsg("could not fork startup process: %m")));
> > -                             break;
> > -                     case BgWriterProcess:
> > -                             ereport(LOG,
> > -                                             (errmsg("could not fork background writer process: %m")));
> > -                             break;
> > -                     case CheckpointerProcess:
> > -                             ereport(LOG,
> > -                                             (errmsg("could not fork checkpointer process: %m")));
> > -                             break;
> > -                     case WalWriterProcess:
> > -                             ereport(LOG,
> > -                                             (errmsg("could not fork WAL writer process: %m")));
> > -                             break;
> > -                     case WalReceiverProcess:
> > -                             ereport(LOG,
> > -                                             (errmsg("could not fork WAL receiver process: %m")));
> > -                             break;
> > -                     default:
> > -                             ereport(LOG,
> > -                                             (errmsg("could not fork process: %m")));
> > -                             break;
> > -             }
> > +             errno =  save_errno;
> > +             ereport(LOG,
> > +                     (errmsg("could not fork %s process: %m", fork_data->type_desc)));
>
> Previously the process type was translatable, now it is not
> anymore. You'd probably have to solve that somehow, perhaps by
> translating the type_desc once, when setting it.

+1

>
>
> > @@ -1113,7 +1113,7 @@ process_startup_options(Port *port, bool am_superuser)
> >               av = (char **) palloc(maxac * sizeof(char *));
> >               ac = 0;
> >
> > -             av[ac++] = "postgres";
> > +             av[ac++] = pstrdup("postgres");
> >
> >               pg_split_opts(av, &ac, port->cmdline_options);
>
> Again, what's with all these conversions to dynamic memory allocations?
>
>
>
> >  /* Flags to tell if we are in an autovacuum process */
> > -static bool am_autovacuum_launcher = false;
> > -static bool am_autovacuum_worker = false;
> > +bool am_autovacuum_launcher = false;
> > +bool am_autovacuum_worker = false;
>
> I'm against exposing a multitude of am_* variables for each process
> type, especially globally.  If we're building proper infrastructure
> here, why do we need multiple variables, rather than exactly one
> indicating the process type?

Noted.

>
>
>
> > @@ -347,88 +343,53 @@ static void avl_sigusr2_handler(SIGNAL_ARGS);
> >  static void avl_sigterm_handler(SIGNAL_ARGS);
> >  static void autovac_refresh_stats(void);
> >
> > -
> > -
> >  /********************************************************************
> >   *                                     AUTOVACUUM LAUNCHER CODE
> >   ********************************************************************/
>
> Spurious whitespace changes (in a few other places too).

Noted.

>
>
> > +void
> > +PrepAutoVacProcessFork(ForkProcData *autovac_fork)
> >  {
>
> I'm really not in love with keeping "fork" in all these, given that
> we're not actually going to fork in some cases. Why do we not just call
> it subprocess and get rid of the "fork" and "forkexec" suffixes?

Sounds good to me.

>
>
>
>
> > +/*
> > + * shmemSetup
> > + *
> > + * Helper function for a child to set up shmem before
> > + * executing.
> > + *
> > + * aux_process - set to true if an auxiliary process.
> > + */
> > +static void
> > +shmemSetup(bool aux_process)
> > +{
> > +     /* Restore basic shared memory pointers */
> > +     InitShmemAccess(UsedShmemSegAddr);
> > +
> > +     /* Need a PGPROC to run CreateSharedMemoryAndSemaphores */
> > +     if (aux_process)
> > +             InitAuxiliaryProcess();
> > +     else
> > +             InitProcess();
> > +
> > +     /* Attach process to shared data structures */
> > +     CreateSharedMemoryAndSemaphores();
> > +}
>
> Not quite convinced this is a useful abstraction.  I'm inclined to think
> that this again should use something more like one array of
> per-subprocess properties.
>
>
> Imagine if we had an array like
>
> typedef struct PgSubProcess
> {
>         const char *name;
>         const char *translated_name;
>         bool            needs_shmem;
>         bool            needs_aux_proc;
>         bool            needs_full_proc;
>         SubProcessEntryPoint entrypoint;
> } PgSubProcess;
>
> PgSubProcess process_types[] = {
>     {.name = "startup process", .needs_shmem = true, .needs_aux_proc = true, .entrypoint = StartupProcessMain},
>     {.name = "background writer", .needs_shmem = true, .needs_aux_proc = true, .entrypoint = BackgroundWriterMain},
>     {.name = "autovacuum launcher", .needs_shmem = true, .needs_aux_proc = true, .entrypoint = AutoVacLauncherMain},
>     {.name = "backend", .needs_shmem = true, .needs_full_proc = true, , .entrypoint = PostgresMain},

Great idea, thanks!

>
>
> > From efc6f0531b71847c6500062b5a0fe43331a6ea7e Mon Sep 17 00:00:00 2001
> > From: Mike Palmiotto <mike.palmiotto@crunchydata.com>
> > Date: Fri, 27 Sep 2019 19:13:53 -0400
> > Subject: [PATCH 3/8] Add centralized stat collector
> >
> > ---
> >  src/backend/postmaster/pgstat.c       | 88 ++++++---------------------
> >  src/backend/postmaster/postmaster.c   |  5 +-
> >  src/include/pgstat.h                  |  4 +-
> >  src/include/postmaster/fork_process.h |  1 +
> >  4 files changed, 26 insertions(+), 72 deletions(-)
> >
> > diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
> > index 011076c3e3..73d953aedf 100644
> > --- a/src/backend/postmaster/pgstat.c
> > +++ b/src/backend/postmaster/pgstat.c
> > @@ -280,10 +280,6 @@ static instr_time total_func_time;
> >   * Local function forward declarations
> >   * ----------
> >   */
> > -#ifdef EXEC_BACKEND
> > -static pid_t pgstat_forkexec(void);
> > -#endif
> > -
> >  NON_EXEC_STATIC void PgstatCollectorMain(int argc, char *argv[]) pg_attribute_noreturn();
> >  static void pgstat_exit(SIGNAL_ARGS);
> >  static void pgstat_beshutdown_hook(int code, Datum arg);
> > @@ -689,53 +685,26 @@ pgstat_reset_all(void)
> >       pgstat_reset_remove_files(PGSTAT_STAT_PERMANENT_DIRECTORY);
> >  }
> >
> > -#ifdef EXEC_BACKEND
> >
> >  /*
> > - * pgstat_forkexec() -
> > - *
> > - * Format up the arglist for, then fork and exec, statistics collector process
> > - */
> > -static pid_t
> > -pgstat_forkexec(void)
> > -{
> > -     char       *av[10];
> > -     int                     ac = 0;
> > -
> > -     av[ac++] = "postgres";
> > -     av[ac++] = "--forkcol";
> > -     av[ac++] = NULL;                        /* filled in by postmaster_forkexec */
> > -
> > -     av[ac] = NULL;
> > -     Assert(ac < lengthof(av));
> > -
> > -     return postmaster_forkexec(ac, av);
> > -}
> > -#endif                                                       /* EXEC_BACKEND */
> > -
> > -
> > -/*
> > - * pgstat_start() -
> > + * PrepPgstatCollectorFork
> >   *
> >   *   Called from postmaster at startup or after an existing collector
> >   *   died.  Attempt to fire up a fresh statistics collector.
> >   *
> > - *   Returns PID of child process, or 0 if fail.
> > - *
> > - *   Note: if fail, we will be called again from the postmaster main loop.
> >   */
> > -int
> > -pgstat_start(void)
> > +void
> > +PrepPgstatCollectorFork(ForkProcData *pgstat_fork)
> >  {
> > +     int                             ac = 0;
> >       time_t          curtime;
> > -     pid_t           pgStatPid;
> >
> >       /*
> >        * Check that the socket is there, else pgstat_init failed and we can do
> >        * nothing useful.
> >        */
> >       if (pgStatSock == PGINVALID_SOCKET)
> > -             return 0;
> > +             return;
> >
> >       /*
> >        * Do nothing if too soon since last collector start.  This is a safety
> > @@ -746,45 +715,20 @@ pgstat_start(void)
> >       curtime = time(NULL);
> >       if ((unsigned int) (curtime - last_pgstat_start_time) <
> >               (unsigned int) PGSTAT_RESTART_INTERVAL)
> > -             return 0;
> > +             return;
> >       last_pgstat_start_time = curtime;
> >
> > -     /*
> > -      * Okay, fork off the collector.
> > -      */
> >  #ifdef EXEC_BACKEND
> > -     switch ((pgStatPid = pgstat_forkexec()))
> > -#else
> > -     switch ((pgStatPid = fork_process()))
> > -#endif
> > -     {
> > -             case -1:
> > -                     ereport(LOG,
> > -                                     (errmsg("could not fork statistics collector: %m")));
> > -                     return 0;
> > -
> > -#ifndef EXEC_BACKEND
> > -             case 0:
> > -                     /* in postmaster child ... */
> > -                     InitPostmasterChild();
> > -
> > -                     /* Close the postmaster's sockets */
> > -                     ClosePostmasterPorts(false);
> > -
> > -                     /* Drop our connection to postmaster's shared memory, as well */
> > -                     dsm_detach_all();
> > -                     PGSharedMemoryDetach();
> > -
> > -                     PgstatCollectorMain(0, NULL);
> > -                     break;
> > +     pgstat_fork->av[ac++] = pstrdup("postgres");
> > +     pgstat_fork->av[ac++] = pstrdup("--forkcol");
> > +     pgstat_fork->av[ac++] = NULL;                   /* filled in by postmaster_forkexec */
> >  #endif
> >
> > -             default:
> > -                     return (int) pgStatPid;
> > -     }
> > +     pgstat_fork->ac = ac;
> > +     Assert(pgstat_fork->ac < lengthof(*pgstat_fork->av));
> >
> > -     /* shouldn't get here */
> > -     return 0;
> > +     pgstat_fork->type_desc = pstrdup("statistics collector");
> > +     pgstat_fork->child_main = PgstatCollectorMain;
> >  }
> >
> >  void
> > @@ -4425,12 +4369,16 @@ pgstat_send_bgwriter(void)
> >   * ----------
> >   */
> >  NON_EXEC_STATIC void
> > -PgstatCollectorMain(int argc, char *argv[])
> > +PgstatCollectorMain(pg_attribute_unused() int argc, pg_attribute_unused() char *argv[])
> >  {
> >       int                     len;
> >       PgStat_Msg      msg;
> >       int                     wr;
> >
> > +     /* Drop our connection to postmaster's shared memory, as well */
> > +     dsm_detach_all();
> > +     PGSharedMemoryDetach();
> > +
> >       /*
> >        * Ignore all signals usually bound to some action in the postmaster,
> >        * except SIGHUP and SIGQUIT.  Note we don't need a SIGUSR1 handler to
> > diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
> > index 35cd1479b9..37a36387a3 100644
> > --- a/src/backend/postmaster/postmaster.c
> > +++ b/src/backend/postmaster/postmaster.c
> > @@ -549,6 +549,7 @@ static void ShmemBackendArrayRemove(Backend *bn);
> >  #define StartWalReceiver()           StartChildProcess(WalReceiverFork)
> >  #define StartAutoVacLauncher()               StartChildProcess(AutoVacLauncherFork)
> >  #define StartAutoVacWorker()         StartChildProcess(AutoVacWorkerFork)
> > +#define pgstat_start()                       StartChildProcess(PgstatCollectorFork)
> >
> >  /* Macros to check exit status of a child process */
> >  #define EXIT_STATUS_0(st)  ((st) == 0)
> > @@ -5459,7 +5460,9 @@ StartChildProcess(ForkProcType type)
> >               case AutoVacWorkerFork:
> >                       PrepAutoVacProcessFork(fork_data);
> >                       break;
> > -
> > +             case PgstatCollectorFork:
> > +                     PrepPgstatCollectorFork(fork_data);
> > +                     break;
> >               default:
> >                       break;
> >       }
> > diff --git a/src/include/pgstat.h b/src/include/pgstat.h
> > index fe076d823d..00e95caa60 100644
> > --- a/src/include/pgstat.h
> > +++ b/src/include/pgstat.h
> > @@ -15,6 +15,7 @@
> >  #include "libpq/pqcomm.h"
> >  #include "port/atomics.h"
> >  #include "portability/instr_time.h"
> > +#include "postmaster/fork_process.h"
> >  #include "postmaster/pgarch.h"
> >  #include "storage/proc.h"
> >  #include "utils/hsearch.h"
> > @@ -1235,10 +1236,11 @@ extern Size BackendStatusShmemSize(void);
> >  extern void CreateSharedBackendStatus(void);
> >
> >  extern void pgstat_init(void);
> > -extern int   pgstat_start(void);
> >  extern void pgstat_reset_all(void);
> >  extern void allow_immediate_pgstat_restart(void);
> >
> > +extern void PrepPgstatCollectorFork(ForkProcData *pgstat_fork);
> > +
> >  #ifdef EXEC_BACKEND
> >  extern void PgstatCollectorMain(int argc, char *argv[]) pg_attribute_noreturn();
> >  #endif
> > diff --git a/src/include/postmaster/fork_process.h b/src/include/postmaster/fork_process.h
> > index 1f319fc98f..16ca8be968 100644
> > --- a/src/include/postmaster/fork_process.h
> > +++ b/src/include/postmaster/fork_process.h
> > @@ -26,6 +26,7 @@ typedef enum
> >     WalReceiverFork,  /* end of Auxiliary Process Forks */
> >     AutoVacLauncherFork,
> >     AutoVacWorkerFork,
> > +   PgstatCollectorFork,
> >
> >     NUMFORKPROCTYPES                  /* Must be last! */
> >  } ForkProcType;
>
>
>
>
> > ---
> >  src/backend/postmaster/postmaster.c   | 309 +++++++++++++-------------
> >  src/include/postmaster/fork_process.h |   5 +
> >  src/include/postmaster/postmaster.h   |   1 -
> >  3 files changed, 160 insertions(+), 155 deletions(-)
> >
> > diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
> > index 8f862fcd64..b55cc4556d 100644
> > --- a/src/backend/postmaster/postmaster.c
> > +++ b/src/backend/postmaster/postmaster.c
> > @@ -144,6 +144,7 @@ static Backend *ShmemBackendArray;
> >
> >  BackgroundWorker *MyBgworkerEntry = NULL;
> >  RegisteredBgWorker *CurrentBgWorker = NULL;
> > +static Backend          *MyBackend;
> >  static int                   child_errno;
>
> This is another global variable that strikes me as a bad idea. Whereas
> MyBgworkerEntry is only set in the bgworker itself, you appear to set
> MyBackend *in postmaster*:

Okay, will address this in the next pass.

>
> > +/*
> > + * PrepBackendFork
> > + *
> > + * Prepare a ForkProcType struct for starting a Backend.
> > + * This does all prep related to av parameters and error messages.
> > +*/
> > +static void
> > +PrepBackendFork(ForkProcData *backend_fork)
> > +{
> > +     int                     ac = 0;
> > +
> > +     /*
> > +      * Create backend data structure.  Better before the fork() so we can
> > +      * handle failure cleanly.
> > +      */
> > +     MyBackend = (Backend *) malloc(sizeof(Backend));
> > +     if (!MyBackend)
> > +     {
> > +             ereport(LOG,
> > +                             (errcode(ERRCODE_OUT_OF_MEMORY),
> > +                              errmsg("out of memory")));
> > +     }
>
> That's totally not ok, we'll constantly overwrite the variable set by
> another process, and otherwise have now obsolete values in there. And
> even worse, you leave it dangling in some cases:

Got it. Will address in the next pass.

>
> > +     /*
> > +      * Compute the cancel key that will be assigned to this backend. The
> > +      * backend will have its own copy in the forked-off process' value of
> > +      * MyCancelKey, so that it can transmit the key to the frontend.
> > +      */
> > +     if (!RandomCancelKey(&MyCancelKey))
> > +     {
> > +             free(MyBackend);
> > +             ereport(LOG,
> > +                             (errcode(ERRCODE_INTERNAL_ERROR),
> > +                              errmsg("could not generate random cancel key")));
> > +     }
>
>
> Nor do I see where you're actually freeing that memory in postmaster, in
> the successful case.  Previously it was freed in postmaster in the
> success case
>
> > -#ifdef EXEC_BACKEND
> > -     pid = backend_forkexec(port);
> > -#else                                                        /* !EXEC_BACKEND */
> > -     pid = fork_process();
> > -     if (pid == 0)                           /* child */
> > -     {
> > -             free(bn);
> > -
>
> but now it appears  it's only freed in the failure case:

Good catch. Will fix.

>
> > +/*
> > + *    BackendFailFork
> > + *
> > + *    Backend cleanup in case a failure occurs forking a new Backend.
> > + */
> > +static void BackendFailFork(pg_attribute_unused() int argc, pg_attribute_unused() char *argv[])
> > +{
> > +     if (!MyBackend->dead_end)
> > +             (void) ReleasePostmasterChildSlot(MyBackend->child_slot);
> > +     free(MyBackend);
> > +
> > +     report_fork_failure_to_client(MyProcPort, child_errno);
> > +}
>
>
>
> > From 718c6598e9e02e1ce9ade99afb3c8948c67ef5ae Mon Sep 17 00:00:00 2001
> > From: Mike Palmiotto <mike.palmiotto@crunchydata.com>
> > Date: Tue, 19 Feb 2019 15:29:33 +0000
> > Subject: [PATCH 8/8] Add a hook to allow extensions to set worker metadata
> >
> > This patch implements a hook that allows extensions to modify a worker
> > process' metadata in backends.
> >
> > Additional work done by Yuli Khodorkovskiy <yuli@crunchydata.com>
> > Original discussion here:
https://www.postgresql.org/message-id/flat/CAMN686FE0OdZKp9YPO%3DhtC6LnA6aW4r-%2Bjq%3D3Q5RAoFQgW8EtA%40mail.gmail.com
> > ---
> >  src/backend/postmaster/fork_process.c | 11 +++++++++++
> >  src/include/postmaster/fork_process.h |  4 ++++
> >  2 files changed, 15 insertions(+)
> >
> > diff --git a/src/backend/postmaster/fork_process.c b/src/backend/postmaster/fork_process.c
> > index a9138f589e..964c04d846 100644
> > --- a/src/backend/postmaster/fork_process.c
> > +++ b/src/backend/postmaster/fork_process.c
> > @@ -21,6 +21,14 @@
> >  #include <openssl/rand.h>
> >  #endif
> >
> > +/*
> > + * This hook allows plugins to get control of the child process following
> > + * a successful fork_process(). It can be used to perform some action in
> > + * extensions to set metadata in backends (including special backends) upon
> > + * setting of the session user.
> > + */
> > +ForkProcess_post_hook_type ForkProcess_post_hook = NULL;
> > +
> >  #ifndef WIN32
> >  /*
> >   * Wrapper for fork(). Return values are the same as those for fork():
> > @@ -113,6 +121,9 @@ fork_process(void)
> >  #ifdef USE_OPENSSL
> >               RAND_cleanup();
> >  #endif
> > +
> > +             if (ForkProcess_post_hook)
> > +                     (*ForkProcess_post_hook) ();
> >       }
>
>
> >       return result;
> > diff --git a/src/include/postmaster/fork_process.h b/src/include/postmaster/fork_process.h
> > index 631a2113b5..d756fcead7 100644
> > --- a/src/include/postmaster/fork_process.h
> > +++ b/src/include/postmaster/fork_process.h
> > @@ -58,4 +58,8 @@ extern pid_t fork_process(void);
> >  extern pid_t postmaster_forkexec(ForkProcData *fork_data);
> >  #endif
> >
> > +/* Hook for plugins to get control after a successful fork_process() */
> > +typedef void (*ForkProcess_post_hook_type) ();
> > +extern PGDLLIMPORT ForkProcess_post_hook_type ForkProcess_post_hook;
> > +
> >  #endif                                                       /* FORK_PROCESS_H */
> > --
> > 2.23.0
> >
>
> Why do we want libraries to allow to hook into processes like the
> startup process etc?

There are a number of OS-level process manipulations that this could
afford you as an extension developer. For instance, you could roll
your own seccomp implementation (to limit syscalls per-process-type,
perhaps?), perform some setcap magic, or some other security-related
magic.

Thanks for your patience with this patchset. I understand it's still
very immature. Hopefully all of the concerns raised will be adjusted
in the follow-up patches.



-- 
Mike Palmiotto
https://crunchydata.com



Re: Auxiliary Processes and MyAuxProc

From
Andres Freund
Date:
Hi,

On 2019-10-03 14:33:26 -0400, Mike Palmiotto wrote:
> > > +#ifdef EXEC_BACKEND
> > > +     aux_fork->av[ac++] = pstrdup("--forkboot");
> > > +     aux_fork->av[ac++] = NULL;                      /* filled in by postmaster_forkexec */
> > > +#endif
> >
> > What's the point of pstrdup()ing constant strings here? Afaict you're
> > not freeing them ever?
>
> The idea was supposed to be that the prep work lives up until the fork
> happens and then is freed en masse with the rest of the MemoryContext.
> Perhaps there was some oversight here. I'll revisit this and others in
> the next pass.

Well, two things:
1) That'd *NEVER* be more efficient that just referencing constant
   strings. Unless you pfree() the variable, and there sometimes can be
   constant, and sometimes non-constant strings, there is simply no
   reason to ever pstrdup a constant string.
2) Which context are you talking about? Are you thinking of? A forked
   process doing MemoryContextDelete(PostmasterContext); doesn't free
   that memory in postmaster.


> > > @@ -58,4 +58,8 @@ extern pid_t fork_process(void);
> > >  extern pid_t postmaster_forkexec(ForkProcData *fork_data);
> > >  #endif
> > >
> > > +/* Hook for plugins to get control after a successful fork_process() */
> > > +typedef void (*ForkProcess_post_hook_type) ();
> > > +extern PGDLLIMPORT ForkProcess_post_hook_type ForkProcess_post_hook;
> > > +
> > >  #endif                                                       /* FORK_PROCESS_H */
> > > --
> > > 2.23.0
> > >
> >
> > Why do we want libraries to allow to hook into processes like the
> > startup process etc?
>
> There are a number of OS-level process manipulations that this could
> afford you as an extension developer. For instance, you could roll
> your own seccomp implementation (to limit syscalls per-process-type,
> perhaps?), perform some setcap magic, or some other security-related
> magic.

Color me unconvinced.

Greetings,

Andres Freund



Re: Auxiliary Processes and MyAuxProc

From
Michael Paquier
Date:
On Thu, Oct 03, 2019 at 11:39:37AM -0700, Andres Freund wrote:
> Color me unconvinced.

The latest comments of the thread have not been addressed yet. so I am
marking the patch as returned with feedback.  If you think that's not
correct, please feel free to update the status of the patch.  If you
do so, please provide at the same time a rebased version of the patch,
as it is failing to apply on HEAD.
--
Michael

Attachment

Re: Auxiliary Processes and MyAuxProc

From
Mike Palmiotto
Date:
On Sat, Nov 30, 2019 at 9:15 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Thu, Oct 03, 2019 at 11:39:37AM -0700, Andres Freund wrote:
> > Color me unconvinced.
>
> The latest comments of the thread have not been addressed yet. so I am
> marking the patch as returned with feedback.  If you think that's not
> correct, please feel free to update the status of the patch.  If you
> do so, please provide at the same time a rebased version of the patch,
> as it is failing to apply on HEAD.

I've finally had some time to update the patchset with an attempt at
addressing Andres' concerns. Note that the current spin has a bug
which does not allow syslogger to run properly. Additionally, it is
squashed into one patch again.  I intend to split the patch out into
separate functional patches and fix the syslogger issue, but wanted to
get this on the ticket for the open CommitFest before it closes. I'll
go ahead and re-add it and will be pushing updates as I have them.

I intend to give this patchset the time it deserves, so will be much
more responsive this go-around.

Thanks,

-- 
Mike Palmiotto
https://crunchydata.com

Attachment

Re: Auxiliary Processes and MyAuxProc

From
Mike Palmiotto
Date:
On Sat, Feb 29, 2020 at 9:51 PM Mike Palmiotto
<mike.palmiotto@crunchydata.com> wrote:
>
> On Sat, Nov 30, 2019 at 9:15 PM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Thu, Oct 03, 2019 at 11:39:37AM -0700, Andres Freund wrote:
> > > Color me unconvinced.
> >
> > The latest comments of the thread have not been addressed yet. so I am
> > marking the patch as returned with feedback.  If you think that's not
> > correct, please feel free to update the status of the patch.  If you
> > do so, please provide at the same time a rebased version of the patch,
> > as it is failing to apply on HEAD.
>
> I've finally had some time to update the patchset with an attempt at
> addressing Andres' concerns. Note that the current spin has a bug
> which does not allow syslogger to run properly. Additionally, it is
> squashed into one patch again.  I intend to split the patch out into
> separate functional patches and fix the syslogger issue, but wanted to
> get this on the ticket for the open CommitFest before it closes. I'll
> go ahead and re-add it and will be pushing updates as I have them.

Okay, here is an updated and rebased patch that passes all regression
tests with and without EXEC_BACKEND. This also treats more of the
issues raised by Andres.

I still need to do the following:
- split giant patch out into separate functional commits
- add translated process descriptions
- fix some variable names here and there (notably the PgSubprocess
struct ".progname" member -- that's a misnomer.
- address some valgrind findings

I should be able to split this out into smaller commits sometime today
and will continue iterating to scratch the other items off the list.

Thanks,
-- 
Mike Palmiotto
https://crunchydata.com

Attachment

Re: Auxiliary Processes and MyAuxProc

From
Mike Palmiotto
Date:
On Mon, Mar 2, 2020 at 9:53 AM Mike Palmiotto
<mike.palmiotto@crunchydata.com> wrote:
>
> On Sat, Feb 29, 2020 at 9:51 PM Mike Palmiotto
> <mike.palmiotto@crunchydata.com> wrote:
> >
<snip>
> Okay, here is an updated and rebased patch that passes all regression
> tests with and without EXEC_BACKEND. This also treats more of the
> issues raised by Andres.
>
> I still need to do the following:
> - split giant patch out into separate functional commits
> - add translated process descriptions
> - fix some variable names here and there (notably the PgSubprocess
> struct ".progname" member -- that's a misnomer.
> - address some valgrind findings
>
> I should be able to split this out into smaller commits sometime today
> and will continue iterating to scratch the other items off the list.

I've addressed all of the points above (except splitting the base
patch up) and rebased on master. I didn't squash before generating the
patches so it may be easier to read the progress here, or annoying
that there are so many attachments.

This patchset hits all of Andres' points with the exception of the
reshuffling of functions like AuxiliaryProcessMain, but it does get
rid of a lot of direct argv string comparison. I'll reorganize a bit
when I re-write the patchset and will likely move
AuxiliaryProcessMain, along with the Backend subprocess functions but
just wanted to get this up here in the meantime.

There is quite a lot to be gained from this patchset in the form of
re-organization and removal of redundant code-paths/globals. I see
this as a good first step and am willing to continue to push it
forward with any new suggestions.

Thanks,
-- 
Mike Palmiotto
https://crunchydata.com

Attachment

Re: Auxiliary Processes and MyAuxProc

From
Mike Palmiotto
Date:
On Tue, Mar 3, 2020 at 11:56 PM Mike Palmiotto
<mike.palmiotto@crunchydata.com> wrote:
>
> On Mon, Mar 2, 2020 at 9:53 AM Mike Palmiotto
> <mike.palmiotto@crunchydata.com> wrote:
> >
> > On Sat, Feb 29, 2020 at 9:51 PM Mike Palmiotto
> > <mike.palmiotto@crunchydata.com> wrote:
> > >
> <snip>
> > Okay, here is an updated and rebased patch that passes all regression
> > tests with and without EXEC_BACKEND. This also treats more of the
> > issues raised by Andres.
> >
> > I still need to do the following:
> > - split giant patch out into separate functional commits
> > - add translated process descriptions
> > - fix some variable names here and there (notably the PgSubprocess
> > struct ".progname" member -- that's a misnomer.
> > - address some valgrind findings
> >
> > I should be able to split this out into smaller commits sometime today
> > and will continue iterating to scratch the other items off the list.
>
> I've addressed all of the points above (except splitting the base
> patch up) and rebased on master. I didn't squash before generating the
> patches so it may be easier to read the progress here, or annoying
> that there are so many attachments.
>
> This patchset hits all of Andres' points with the exception of the
> reshuffling of functions like AuxiliaryProcessMain, but it does get
> rid of a lot of direct argv string comparison. I'll reorganize a bit
> when I re-write the patchset and will likely move
> AuxiliaryProcessMain, along with the Backend subprocess functions but
> just wanted to get this up here in the meantime.
>
> There is quite a lot to be gained from this patchset in the form of
> re-organization and removal of redundant code-paths/globals. I see
> this as a good first step and am willing to continue to push it
> forward with any new suggestions.

The patchset is now split out. I've just noticed that Peter Eisentraut
included some changes for a generic MyBackendType, which I should have
been aware of. I was unable to rebase due to these changes, but can
fold these patches into that framework if others think it's
worthwhile.

The resulting diffstat is as follows:

% git diff 24d85952a57b16090ca8ad9cf800fbdd9ddd104f --shortstat
 29 files changed, 1029 insertions(+), 1201 deletions(-)

-- 
Mike Palmiotto
https://crunchydata.com

Attachment

Re: Auxiliary Processes and MyAuxProc

From
Justin Pryzby
Date:
On Tue, Mar 17, 2020 at 02:50:19PM -0400, Mike Palmiotto wrote:
> The patchset is now split out. I've just noticed that Peter Eisentraut
> included some changes for a generic MyBackendType, which I should have
> been aware of. I was unable to rebase due to these changes, but can
> fold these patches into that framework if others think it's
> worthwhile.

I don't have many comments on the patch, but it's easy enough to rebase.

I think maybe you'll want to do something more with this new function:
GetBackendTypeDesc()

+       /* Don't panic. */

+1

-- 
Justin

Attachment

Re: Auxiliary Processes and MyAuxProc

From
Alvaro Herrera
Date:
On 2020-Mar-17, Justin Pryzby wrote:

> +static PgSubprocess process_types[] = {
> +    {
> +        .desc = "checker",
> +        .entrypoint = CheckerModeMain
> +    },
> +    {
> +        .desc = "bootstrap",
> +        .entrypoint = BootstrapModeMain
> +    },

Maybe this stuff can be resolved using a technique like rmgrlist.h or
cmdtaglist.h.  That way it's not in two separate files.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Auxiliary Processes and MyAuxProc

From
Mike Palmiotto
Date:
On Tue, Mar 17, 2020 at 9:04 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2020-Mar-17, Justin Pryzby wrote:
>
> > +static PgSubprocess process_types[] = {
> > +     {
> > +             .desc = "checker",
> > +             .entrypoint = CheckerModeMain
> > +     },
> > +     {
> > +             .desc = "bootstrap",
> > +             .entrypoint = BootstrapModeMain
> > +     },
>
> Maybe this stuff can be resolved using a technique like rmgrlist.h or
> cmdtaglist.h.  That way it's not in two separate files.

Great suggestion, thanks! I'll try this out in the next version.

-- 
Mike Palmiotto
https://crunchydata.com



Re: Auxiliary Processes and MyAuxProc

From
Justin Pryzby
Date:
On Wed, Mar 18, 2020 at 09:22:58AM -0400, Mike Palmiotto wrote:
> On Tue, Mar 17, 2020 at 9:04 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> >
> > On 2020-Mar-17, Justin Pryzby wrote:
> >
> > > +static PgSubprocess process_types[] = {
> > > +     {
> > > +             .desc = "checker",
> > > +             .entrypoint = CheckerModeMain
> > > +     },
> > > +     {
> > > +             .desc = "bootstrap",
> > > +             .entrypoint = BootstrapModeMain
> > > +     },
> >
> > Maybe this stuff can be resolved using a technique like rmgrlist.h or
> > cmdtaglist.h.  That way it's not in two separate files.
> 
> Great suggestion, thanks! I'll try this out in the next version.

Also, I saw this was failing tests both before and after my rebase.

http://cfbot.cputube.org/
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31535161
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31386446

-- 
Justin



Re: Auxiliary Processes and MyAuxProc

From
Mike Palmiotto
Date:
On Wed, Mar 18, 2020 at 10:17 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Wed, Mar 18, 2020 at 09:22:58AM -0400, Mike Palmiotto wrote:
> > On Tue, Mar 17, 2020 at 9:04 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > >
> > > On 2020-Mar-17, Justin Pryzby wrote:
> > >
> > > > +static PgSubprocess process_types[] = {
> > > > +     {
> > > > +             .desc = "checker",
> > > > +             .entrypoint = CheckerModeMain
> > > > +     },
> > > > +     {
> > > > +             .desc = "bootstrap",
> > > > +             .entrypoint = BootstrapModeMain
> > > > +     },
> > >
> > > Maybe this stuff can be resolved using a technique like rmgrlist.h or
> > > cmdtaglist.h.  That way it's not in two separate files.
> >
> > Great suggestion, thanks! I'll try this out in the next version.
>
> Also, I saw this was failing tests both before and after my rebase.
>
> http://cfbot.cputube.org/
> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31535161
> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31386446

Good catch, thanks. Will address this as well in the next round. Just
need to set up a Windows dev environment to see if I can
reproduce/fix.

-- 
Mike Palmiotto
https://crunchydata.com



Re: Auxiliary Processes and MyAuxProc

From
Mike Palmiotto
Date:
On Wed, Mar 18, 2020 at 11:26 AM Mike Palmiotto
<mike.palmiotto@crunchydata.com> wrote:
>
> On Wed, Mar 18, 2020 at 10:17 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > Also, I saw this was failing tests both before and after my rebase.
> >
> > http://cfbot.cputube.org/
> > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31535161
> > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31386446
>
> Good catch, thanks. Will address this as well in the next round. Just
> need to set up a Windows dev environment to see if I can
> reproduce/fix.

While I track this down, here is a rebased patchset, which drops
MySubprocessType and makes use of the MyBackendType.

Once I get the fix for Windows builds, I'll see about making use of
the rmgrlist/cmdtaglist-style technique that Alvaro suggested.

Thanks,
-- 
Mike Palmiotto
https://crunchydata.com

Attachment

Re: Auxiliary Processes and MyAuxProc

From
Alvaro Herrera
Date:
On 2020-Mar-18, Mike Palmiotto wrote:

> On Wed, Mar 18, 2020 at 11:26 AM Mike Palmiotto
> <mike.palmiotto@crunchydata.com> wrote:
> >
> > On Wed, Mar 18, 2020 at 10:17 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > Also, I saw this was failing tests both before and after my rebase.
> > >
> > > http://cfbot.cputube.org/
> > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31535161
> > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31386446
> >
> > Good catch, thanks. Will address this as well in the next round. Just
> > need to set up a Windows dev environment to see if I can
> > reproduce/fix.
> 
> While I track this down, here is a rebased patchset, which drops
> MySubprocessType and makes use of the MyBackendType.

Note that you can compile with -DEXEC_BACKEND to use in a *nix build the
same technique used to spawn processes in Windows, which might be an
easier way to discover some problems without a proper Windows build.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Auxiliary Processes and MyAuxProc

From
Mike Palmiotto
Date:
On Wed, Mar 18, 2020 at 12:54 PM Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
>
> On 2020-Mar-18, Mike Palmiotto wrote:
>
> > On Wed, Mar 18, 2020 at 11:26 AM Mike Palmiotto
> > <mike.palmiotto@crunchydata.com> wrote:
> > >
> > > On Wed, Mar 18, 2020 at 10:17 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > > Also, I saw this was failing tests both before and after my rebase.
> > > >
> > > > http://cfbot.cputube.org/
> > > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31535161
> > > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31386446
> > >
> > > Good catch, thanks. Will address this as well in the next round. Just
> > > need to set up a Windows dev environment to see if I can
> > > reproduce/fix.
> >
> > While I track this down, here is a rebased patchset, which drops
> > MySubprocessType and makes use of the MyBackendType.
>
> Note that you can compile with -DEXEC_BACKEND to use in a *nix build the
> same technique used to spawn processes in Windows, which might be an
> easier way to discover some problems without a proper Windows build.

Good suggestion. Unfortunately,that's how I've been testing all along.
I thought that would be sufficient, but seems like this might be more
specific to the Windows #ifdef's.

I have another version on my devbox which fixes the (embarrassing)
Travis failure for non-EXEC_BACKEND due to a dropped semicolon during
rebase. I'm setting up my own appveyor instance with Tomas's config
and will see if I can get to the bottom of this.

-- 
Mike Palmiotto
https://crunchydata.com



Re: Auxiliary Processes and MyAuxProc

From
Peter Eisentraut
Date:
On 2020-03-18 17:07, Mike Palmiotto wrote:
> On Wed, Mar 18, 2020 at 11:26 AM Mike Palmiotto
> <mike.palmiotto@crunchydata.com> wrote:
>>
>> On Wed, Mar 18, 2020 at 10:17 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>>> Also, I saw this was failing tests both before and after my rebase.
>>>
>>> http://cfbot.cputube.org/
>>> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31535161
>>> https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31386446
>>
>> Good catch, thanks. Will address this as well in the next round. Just
>> need to set up a Windows dev environment to see if I can
>> reproduce/fix.
> 
> While I track this down, here is a rebased patchset, which drops
> MySubprocessType and makes use of the MyBackendType.

While working on (My)BackendType, I was attempting to get rid of 
(My)AuxProcType altogether.  This would mostly work except that it's 
sort of wired into the pgstats subsystem (see NumBackendStatSlots). 
This can probably be reorganized, but I didn't pursue it further.

Now, I'm a sucker for refactoring, but I feel this proposal is going 
into a direction I don't understand.  I'd prefer that we focus around 
building out background workers as the preferred subprocess mechanism. 
Building out a second generic mechanism, again, I don't understand the 
direction.  Are we hoping to add more of these processes?  Make it 
extensible?  The net lines added/removed by this patch series seems 
pretty neutral.  What are we getting at the end of this?

More specifically, I don't agree with the wholesale renaming of 
auxiliary process to subprocess.  Besides the massive code churn, the 
old naming seemed pretty good to distinguish them from background 
workers, the new naming is more ambiguous.

Btw., if I had a lot of time I would attempt to rewrite walreceiver and 
perhaps the autovacuum system as background workers and thus further 
reduce the footprint of the aux process system.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Auxiliary Processes and MyAuxProc

From
Mike Palmiotto
Date:
On Thu, Mar 19, 2020 at 6:35 AM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> While working on (My)BackendType, I was attempting to get rid of
> (My)AuxProcType altogether.  This would mostly work except that it's
> sort of wired into the pgstats subsystem (see NumBackendStatSlots).
> This can probably be reorganized, but I didn't pursue it further.

This patchset has a different goal: to remove redundant startup code
and interspersed variants of fork/forkexec code so that we can
centralize the postmaster child startup.

The goal of centralizing postmaster startup stems from the desire to
be able to control the process security attributes immediately
before/after fork/exec. This is simply not possible with the existing
infrastructure, since processes are identified in Main functions,
which is too late (and again too scattered) to be able to do anything
useful.

By providing a mechanism to set child process metadata prior to
spawning the subprocess, we gain the ability to identify the process
type and thus set security attributes on that process.

In an earlier spin of the patchset, I included a fork_process hook,
which would be where an extension could set security attributes on a
process. I have since dropped the fork (as advised), but now realize
that it actually demonstrates the main motivation of the patchset.
Perhaps I should add that back in for the next version.

>
> Now, I'm a sucker for refactoring, but I feel this proposal is going
> into a direction I don't understand.  I'd prefer that we focus around
> building out background workers as the preferred subprocess mechanism.
> Building out a second generic mechanism, again, I don't understand the
> direction.  Are we hoping to add more of these processes?  Make it
> extensible?  The net lines added/removed by this patch series seems
> pretty neutral.  What are we getting at the end of this?

As stated above, the primary goal is to centralize the startup code.
One nice side-effect is the introduction of a mechanism that is now
both extensible and provides the ability to remove a lot of redundant
code. I see no reason to have 5 different variants of process forkexec
functions for the sole purpose of building up argv. This patchset
intends to get rid of such an architecture.

Note that this is not intended to be the complete product here -- it
is just a first step at swapping in and making use of a new
infrastructure. There will be follow-up work required to really get
the most out of this infrastructure. For instance, we could drop a
large portion of the SubPostmasterMain switch logic. There are a
number of other areas throughout the codebase (including the example
provided in the last commit, which changes the way we retrieve process
descriptions), that can utilize this new infrastructure to get rid of
code.

>
> More specifically, I don't agree with the wholesale renaming of
> auxiliary process to subprocess.  Besides the massive code churn, the

I'm not sure I understand the argument here. Where do you see
wholesale renaming of AuxProc to Subprocess? Subprocess is the name
for postmaster subprocesses, whereas Auxiliary Processes are a subset
of those processes.

> old naming seemed pretty good to distinguish them from background
> workers, the new naming is more ambiguous.

I do not see any conflation between Background Workers and Auxiliary
Processes in this patchset. Since Auxiliary Processes are included in
the full set of subprocesses and are delineated with a boolean:
needs_aux_proc, it seems fairly straightforward to me which
subprocesses are in fact Auxiliary Processes.

Thanks,
-- 
Mike Palmiotto
https://crunchydata.com



Re: Auxiliary Processes and MyAuxProc

From
Andres Freund
Date:
Hi,

On 2020-03-19 11:35:41 +0100, Peter Eisentraut wrote:
> On 2020-03-18 17:07, Mike Palmiotto wrote:
> > On Wed, Mar 18, 2020 at 11:26 AM Mike Palmiotto
> > <mike.palmiotto@crunchydata.com> wrote:
> > >
> > > On Wed, Mar 18, 2020 at 10:17 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > > Also, I saw this was failing tests both before and after my rebase.
> > > >
> > > > http://cfbot.cputube.org/
> > > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31535161
> > > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31386446
> > >
> > > Good catch, thanks. Will address this as well in the next round. Just
> > > need to set up a Windows dev environment to see if I can
> > > reproduce/fix.
> >
> > While I track this down, here is a rebased patchset, which drops
> > MySubprocessType and makes use of the MyBackendType.
>
> While working on (My)BackendType, I was attempting to get rid of
> (My)AuxProcType altogether.  This would mostly work except that it's sort of
> wired into the pgstats subsystem (see NumBackendStatSlots). This can
> probably be reorganized, but I didn't pursue it further.

Hm. Why does the number of stat slots prevent dropping AuxProcType?


> Now, I'm a sucker for refactoring, but I feel this proposal is going into a
> direction I don't understand.  I'd prefer that we focus around building out
> background workers as the preferred subprocess mechanism. Building out a
> second generic mechanism, again, I don't understand the direction.  Are we
> hoping to add more of these processes?  Make it extensible?  The net lines
> added/removed by this patch series seems pretty neutral.  What are we
> getting at the end of this?

I think background workers for internal processes are the *wrong*
direction to go. They were used as a shortcut for parallelism, and then
that was extended for logical replication. In my opinion that was done,
to a significant degree, because the aux proc stuff is/was too painful
to deal with, but that's something that should be fixed, instead of
building more and more parallel infrastructure.


Bgworkers are imo not actually a very good fit for internal
processes. We have to be able to guarantee that there's a free "slot" to
start internal processes, we should be able to efficiently reference
their pids (especially from postmaster, but also other processes), we
want to precisely know which shared PGPROC is being used, etc.

We now have somewhat different systems for at least: non-shmem
postmaster children, aux processes, autovacuum workers, internal
bgworkers, extension bgworkers. That's just insane.

We should merge those as much as possible. There's obviously going to be
some differences, but it needs to be less than now.  I think we're
mostly on the same page on that, I just don't see bgworkers getting us
there.


The worst part about the current situation imo is that there's way too
many places that one needs to modify / check to create / understand a
process. Moving towards having a single c file w/ associated header that
defines 95% of that seems like a good direction.  I've not looked at
recent versions of the patch, but there was some movement towards that
in earlier versions.

On a green field, I would say that there should be one or two C
arrays-of-structs defining subprocesses. And most behaviour should be
channeled through that.

struct PgProcessType
{
    const char* name;
    PgProcessBeforeShmem before_shmem;
    PgProcessEntry entrypoint;
    uint8:1 only_one_exists;
    uint8:1 uses_shared_memory;
    uint8:1 client_connected;
    uint8:1 database_connected;
    ...
};

PgProcessType ProcessTypes[] = {
 [StartupType] = {
     .name = "startup",
     .entrypoint = StartupProcessMain,
     .only_one_exists = true,
     .uses_shared_memory = true,
     .client_connected = false,
     .database_connected = false,
     ...
 },
 ...
 [UserBackendType] = {
     .name = "backend",
     .before_shmem = BackendInitialize,
     .entrypoint = BackendRun, // fixme
     .only_one_exists = false,
     .uses_shared_memory = true,
     .client_connected = true,
     .database_connected = true,
     ...
 }
 ...

Then there should be a single startup routine for all postmaster
children. Since most of the startup is actually shared between all the
different types of processes, we can declutter it a lot.


When starting a process postmaster would just specify the process type,
and if relevant, an argument (struct Port for backends, whatever
relevant for bgworkers etc) . Generic code should handle all the work
until the process type entry point - and likely we should move more work
from the individual process types into generic code.


If a process is 'only_one_exists' (to be renamed), the generic code
would also (in postmaster) register the pid as
subprocess_pids[type] = pid;
which would make it easy to only have per-type code in the few locations
that need to be aware, instead of many locations in
postmaster.c.   Perhaps also some shared memory location.


Coming back to earth, from green field imagination land: I think the
patchset does go in that direction (to be fair, I think I outlined
something like the above elsewhere in this thread in a discussion with
Mike). And that's good.


Greetings,

Andres Freund



Re: Auxiliary Processes and MyAuxProc

From
Mike Palmiotto
Date:
On Thu, Mar 19, 2020 at 4:57 PM Andres Freund <andres@anarazel.de> wrote:
> On 2020-03-19 11:35:41 +0100, Peter Eisentraut wrote:
> > On 2020-03-18 17:07, Mike Palmiotto wrote:
> > > On Wed, Mar 18, 2020 at 11:26 AM Mike Palmiotto
> > > <mike.palmiotto@crunchydata.com> wrote:
> > > >
> > > > On Wed, Mar 18, 2020 at 10:17 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > > > Also, I saw this was failing tests both before and after my rebase.
> > > > >
> > > > > http://cfbot.cputube.org/
> > > > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31535161
> > > > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31386446
> > > >
> > > > Good catch, thanks. Will address this as well in the next round. Just
> > > > need to set up a Windows dev environment to see if I can
> > > > reproduce/fix.

I'm still working on wiring up an AppVeyor instance, as seemingly
builds don't work on any of the default Azure/Visual Studio images. In
the meantime, I've fixed some spurious whitespace changes and the
compile error for non-EXEC_BACKEND. I'm posting a new version to keep
Travis happy at least while I keep working on that. Sorry for the
delay.

> On a green field, I would say that there should be one or two C
> arrays-of-structs defining subprocesses. And most behaviour should be
> channeled through that.
>
> struct PgProcessType
> {
>     const char* name;
>     PgProcessBeforeShmem before_shmem;
>     PgProcessEntry entrypoint;
>     uint8:1 only_one_exists;
>     uint8:1 uses_shared_memory;
>     uint8:1 client_connected;
>     uint8:1 database_connected;
>     ...
> };

Only some of these are currently included in the process_types struct,
but this demonstrates the extensibility of the architecture and room
for future centralization. Do you see any items in this set that
aren't currently included but are must-haves for this round?

> Then there should be a single startup routine for all postmaster
> children. Since most of the startup is actually shared between all the
> different types of processes, we can declutter it a lot.

Agreed.

> When starting a process postmaster would just specify the process type,
> and if relevant, an argument (struct Port for backends, whatever
> relevant for bgworkers etc) . Generic code should handle all the work
> until the process type entry point - and likely we should move more work
> from the individual process types into generic code.
>
> If a process is 'only_one_exists' (to be renamed), the generic code
> would also (in postmaster) register the pid as
> subprocess_pids[type] = pid;
> which would make it easy to only have per-type code in the few locations
> that need to be aware, instead of many locations in
> postmaster.c.   Perhaps also some shared memory location.

All of this sounds good.

> Coming back to earth, from green field imagination land: I think the
> patchset does go in that direction (to be fair, I think I outlined
> something like the above elsewhere in this thread in a discussion with
> Mike). And that's good.

Thanks for chiming in. Is there anything else you think needs to go in
this version to push things along -- besides fixing Windows builds, of
course?

Regards,
-- 
Mike Palmiotto
https://crunchydata.com

Attachment

Re: Auxiliary Processes and MyAuxProc

From
Peter Eisentraut
Date:
On 2020-03-19 14:29, Mike Palmiotto wrote:
>> More specifically, I don't agree with the wholesale renaming of
>> auxiliary process to subprocess.  Besides the massive code churn, the
> 
> I'm not sure I understand the argument here. Where do you see
> wholesale renaming of AuxProc to Subprocess? Subprocess is the name
> for postmaster subprocesses, whereas Auxiliary Processes are a subset
> of those processes.

Specifically renaming AuxProcType to SubprocessType and everything that 
follows from that, including the name of new files.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Auxiliary Processes and MyAuxProc

From
Mike Palmiotto
Date:
On Fri, Mar 20, 2020 at 6:17 PM Mike Palmiotto
<mike.palmiotto@crunchydata.com> wrote:
>
> On Thu, Mar 19, 2020 at 4:57 PM Andres Freund <andres@anarazel.de> wrote:
>
> I'm still working on wiring up an AppVeyor instance, as seemingly
> builds don't work on any of the default Azure/Visual Studio images. In
> the meantime, I've fixed some spurious whitespace changes and the
> compile error for non-EXEC_BACKEND. I'm posting a new version to keep
> Travis happy at least while I keep working on that. Sorry for the
> delay.

The attached patchset should be fixed. I rebased on master and tested
with TAP tests enabled (fork/EXEC_BACKEND/and Windows).

The AppVeyor configs that Peter posted in a separate thread were
extremely helpful. Thanks, Peter!

> > When starting a process postmaster would just specify the process type,
> > and if relevant, an argument (struct Port for backends, whatever
> > relevant for bgworkers etc) . Generic code should handle all the work
> > until the process type entry point - and likely we should move more work
> > from the individual process types into generic code.
> >
> > If a process is 'only_one_exists' (to be renamed), the generic code
> > would also (in postmaster) register the pid as
> > subprocess_pids[type] = pid;
> > which would make it easy to only have per-type code in the few locations
> > that need to be aware, instead of many locations in
> > postmaster.c.   Perhaps also some shared memory location.

I played around with this a bit last weekend and have a local/untested
patch to move all subprocess_pids into the array. I think
'is_singleton_process' would be a decent name for 'only_one_exists'.
We can also probably add a field to the subprocess array to tell which
signals each gets from postmaster.

Are these pieces required to make this patchset committable? Is there
anything else needed at this point to make it committable?

Thanks again for everyone's feedback and comments.

Regards,
--
Mike Palmiotto
https://crunchydata.com

Attachment

Re: Auxiliary Processes and MyAuxProc

From
Daniel Gustafsson
Date:
> On 27 Mar 2020, at 00:30, Mike Palmiotto <mike.palmiotto@crunchydata.com> wrote:

> Are these pieces required to make this patchset committable? Is there
> anything else needed at this point to make it committable?

The submitted version no longer applies, the 0009 patch has conflicts in
postmaster.c.  Can you please submit a new version of the patch?

cheers ./daniel


Re: Auxiliary Processes and MyAuxProc

From
Alvaro Herrera
Date:
On 2020-Jul-02, Daniel Gustafsson wrote:

> > On 27 Mar 2020, at 00:30, Mike Palmiotto <mike.palmiotto@crunchydata.com> wrote:
> 
> > Are these pieces required to make this patchset committable? Is there
> > anything else needed at this point to make it committable?
> 
> The submitted version no longer applies, the 0009 patch has conflicts in
> postmaster.c.  Can you please submit a new version of the patch?

If the first 8 patches still apply, please do keep it as needs-review
though, since we can still give advice on those first parts.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Auxiliary Processes and MyAuxProc

From
Daniel Gustafsson
Date:
> On 2 Jul 2020, at 17:03, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2020-Jul-02, Daniel Gustafsson wrote:
>
>>> On 27 Mar 2020, at 00:30, Mike Palmiotto <mike.palmiotto@crunchydata.com> wrote:
>>
>>> Are these pieces required to make this patchset committable? Is there
>>> anything else needed at this point to make it committable?
>>
>> The submitted version no longer applies, the 0009 patch has conflicts in
>> postmaster.c.  Can you please submit a new version of the patch?
>
> If the first 8 patches still apply, please do keep it as needs-review
> though, since we can still give advice on those first parts.

Done.

cheers ./daniel


Re: Auxiliary Processes and MyAuxProc

From
Alvaro Herrera
Date:
On 2020-Mar-26, Mike Palmiotto wrote:

Regarding 0001:

> diff --git a/src/backend/postmaster/subprocess.c b/src/backend/postmaster/subprocess.c
> new file mode 100644
> index 0000000000..3e7a45bf10
> --- /dev/null
> +++ b/src/backend/postmaster/subprocess.c
> @@ -0,0 +1,62 @@
> +/*-------------------------------------------------------------------------
> + *
> + * subprocess.c
> + *
> + * Copyright (c) 2004-2020, PostgreSQL Global Development Group
> + *
> + *
> + * IDENTIFICATION
> + *      src/backend/postmaster/syslogger.c

Wrong file identification.

> +static PgSubprocess process_types[] = {
> +    {
> +        .desc = "checker",
> +        .entrypoint = CheckerModeMain
> +    },
> +    {
...

This array has to match the order in subprocess.h:

> +typedef enum
> +{
> +    NoProcessType = -1,
> +    CheckerType = 0,
> +    BootstrapType,
> +    StartupType,
> +    BgWriterType,
> +    CheckpointerType,
> +    WalWriterType,
> +    WalReceiverType,    /* end of Auxiliary Process Forks */
> +
> +    NUMSUBPROCESSTYPES            /* Must be last! */
> +} SubprocessType;

This sort of thing is messy and unfriendly to maintain.  I suggest we
use the same trick as in cmdtaglist.h and rmgrlist.h; see commits
2f9661311b83 and 5a1cd89f8f4a for examples.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Auxiliary Processes and MyAuxProc

From
Mike Palmiotto
Date:
On Thu, Jul 2, 2020 at 11:11 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2020-Mar-26, Mike Palmiotto wrote:
>
> Regarding 0001:
>
> > diff --git a/src/backend/postmaster/subprocess.c b/src/backend/postmaster/subprocess.c
> > new file mode 100644
> > index 0000000000..3e7a45bf10
> > --- /dev/null
> > +++ b/src/backend/postmaster/subprocess.c
> > @@ -0,0 +1,62 @@
> > +/*-------------------------------------------------------------------------
> > + *
> > + * subprocess.c
> > + *
> > + * Copyright (c) 2004-2020, PostgreSQL Global Development Group
> > + *
> > + *
> > + * IDENTIFICATION
> > + *     src/backend/postmaster/syslogger.c
>
> Wrong file identification.

Thanks, I'll fix that.

<snip>

> > +     WalReceiverType,        /* end of Auxiliary Process Forks */
> > +
> > +     NUMSUBPROCESSTYPES                      /* Must be last! */
> > +} SubprocessType;
>
> This sort of thing is messy and unfriendly to maintain.  I suggest we
> use the same trick as in cmdtaglist.h and rmgrlist.h; see commits
> 2f9661311b83 and 5a1cd89f8f4a for examples.

Thanks for the reviews. I'm hoping to get to this next week (hopefully
sooner). It was on my TODO list to use this approach (from the last
round of reviews), so I'll make sure to do it first.

Other dangling items were the subprocess pids array and obviously the
rebase.  I've got a decent AppVeyor/tap test workflow now, so it
should be a little less painful this time around.

Regards,

-- 
Mike Palmiotto
https://crunchydata.com



Re: Auxiliary Processes and MyAuxProc

From
Michael Paquier
Date:
On Thu, Jul 09, 2020 at 04:17:59PM -0400, Mike Palmiotto wrote:
> Thanks for the reviews. I'm hoping to get to this next week (hopefully
> sooner). It was on my TODO list to use this approach (from the last
> round of reviews), so I'll make sure to do it first.

Nothing has happened here in two months, so I am marking this stuff as
returned with feedback.
--
Michael

Attachment