Thread: Refactoring backend fork+exec code

Refactoring backend fork+exec code

From

Heikki Linnakangas

Date:

18 June 2023, 11:22:33

I started to look at the code in postmaster.c related to launching child 
processes. I tried to reduce the difference between EXEC_BACKEND and 
!EXEC_BACKEND code paths, and put the code that needs to differ behind a 
better abstraction. I started doing this to help with implementing 
multi-threading, but it doesn't introduce anything thread-related yet 
and I think this improves readability anyway.

This is still work-in-progress, especially the last, big patch in the 
patch set. Mainly, I need to clean up the comments in the new 
launch_backend.c file. But the other patches are in pretty good shape, 
and if you ignore launch_backend.c, you can see the effect on the other 
source files.

With these patches, there is a new function for launching a postmaster 
child process:

pid_t postmaster_child_launch(PostmasterChildType child_type, char 
*startup_data, size_t startup_data_len, ClientSocket *client_sock);

This function hides the differences between EXEC_BACKEND and 
!EXEC_BACKEND cases.

In 'startup_data', the caller can pass a blob of data to the child 
process, with different meaning for different kinds of child processes. 
For a backend process, for example, it's used to pass the CAC_state, 
which indicates whether the backend accepts the connection or just sends 
"too many clients" error. And for background workers, it's used to pass 
the BackgroundWorker struct. The startup data is passed to the child 
process in the

ClientSocket is a new struct that holds a socket FD and the local and remote 
address info. Before this patch set, postmaster initialized a Port 
struct but only filled in those fields in it. With this patch set, we 
have a new ClientSocket struct just for those fields, which makes it 
clearer which fields are initialized where.
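
To make the startup_data idea concrete, here is a minimal sketch of the plain fork() case only. It is not the code from the patch set: launch_child_sketch, child_main_fn, demo_child_main and wait_for_exit_status are hypothetical names, and the EXEC_BACKEND transport is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical entry-point type: each child kind interprets the blob itself. */
typedef int (*child_main_fn) (const char *startup_data, size_t startup_data_len);

/*
 * Fork a child and hand it the startup blob. With plain fork() the child
 * inherits the parent's memory, so the pointer stays valid in the child.
 * An exec-based launcher would have to copy the bytes across instead.
 */
pid_t
launch_child_sketch(child_main_fn child_main,
                    const char *startup_data, size_t startup_data_len)
{
    pid_t       pid = fork();

    if (pid == 0)
        _exit(child_main(startup_data, startup_data_len));
    return pid;                 /* child's PID, or -1 if fork() failed */
}

/* Demo child: treats the blob as one status byte and exits with it. */
int
demo_child_main(const char *startup_data, size_t startup_data_len)
{
    assert(startup_data_len == 1);
    return (unsigned char) startup_data[0];
}

/* Wait for the child; return its exit status, or -1 on abnormal exit. */
int
wait_for_exit_status(pid_t pid)
{
    int         status;

    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

The caller only sees one launch function and an opaque blob; what the bytes mean is a private agreement between the caller and that child kind's main function.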

I haven't done much testing yet, and no testing at all on Windows, so 
that's probably still broken.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Focusing on this one patch in this series:

On 11/07/2023 01:50, Andres Freund wrote:
>> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>> Date: Mon, 12 Jun 2023 16:33:20 +0300
>> Subject: [PATCH 4/9] Use FD_CLOEXEC on ListenSockets
>>
>> We went through some effort to close them in the child process. Better to
>> not hand them down to the child process in the first place.
> 
> I think Thomas has a larger version of this patch:
> https://postgr.es/m/CA%2BhUKGKPNFcfBQduqof4-7C%3DavjcSfdkKBGvQoRuAvfocnvY0A%40mail.gmail.com

Hmm, no, that's a little different. Thomas added the FD_CLOEXEC option 
to the *accepted* socket in commit 1da569ca1f. That was part of that 
thread. This patch adds the option to the *listen* sockets. That was not 
discussed in that thread, but it's certainly in the same vein.
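
For readers unfamiliar with the flag, this is roughly what marking a descriptor close-on-exec looks like with POSIX fcntl(). It is a generic sketch, not the patch itself; set_cloexec and has_cloexec are hypothetical helper names.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Set FD_CLOEXEC on fd, preserving any other descriptor flags. 0 or -1. */
int
set_cloexec(int fd)
{
    int         flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0)
        return -1;
    return 0;
}

/* Report whether FD_CLOEXEC is set: 1 = set, 0 = clear, -1 = error. */
int
has_cloexec(int fd)
{
    int         flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}
```

Note that the flag only takes effect at exec(), so a child created by fork() alone still inherits the descriptor.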

Thomas: What do you think of the attached?

On 11/07/2023 00:07, Tristan Partin wrote:
>> @@ -831,7 +834,8 @@ StreamConnection(pgsocket server_fd, Port *port)
>>  void
>>  StreamClose(pgsocket sock)
>>  {
>> -       closesocket(sock);
>> +       if (closesocket(sock) != 0)
>> +               elog(LOG, "closesocket failed: %m");
>>  }
>>
>>  /*
> 
> Do you think WARNING would be a more appropriate log level?

No, WARNING is for messages that you expect the client to receive. This 
failure is unexpected at the system level, the message is for the 
administrator. The distinction isn't always very clear, but LOG seems 
more appropriate in this case.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Attachment

v2-0001-Use-FD_CLOEXEC-on-ListenSockets.patch

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

From

Thomas Munro

Date:

24 August 2023, 12:48:14

On Thu, Aug 24, 2023 at 11:41 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 11/07/2023 01:50, Andres Freund wrote:
> >> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> >> Date: Mon, 12 Jun 2023 16:33:20 +0300
> >> Subject: [PATCH 4/9] Use FD_CLOEXEC on ListenSockets
> >>
> >> We went through some effort to close them in the child process. Better to
> >> not hand them down to the child process in the first place.
> >
> > I think Thomas has a larger version of this patch:
> > https://postgr.es/m/CA%2BhUKGKPNFcfBQduqof4-7C%3DavjcSfdkKBGvQoRuAvfocnvY0A%40mail.gmail.com
>
> Hmm, no, that's a little different. Thomas added the FD_CLOEXEC option
> to the *accepted* socket in commit 1da569ca1f. That was part of that
> thread. This patch adds the option to the *listen* sockets. That was not
> discussed in that thread, but it's certainly in the same vein.
>
> Thomas: What do you think of the attached?

LGTM.  I vaguely recall thinking that it might be better to keep
EXEC_BACKEND and !EXEC_BACKEND working the same which might be why I
didn't try this one, but it looks fine with the comment to explain, as
you have it.  (It's a shame we can't use O_CLOFORK.)

There was some question in the other thread about whether doing that
to the server socket might affect accepted sockets too on some OS, but
I can at least confirm that your patch works fine on FreeBSD in an
EXEC_BACKEND build.  I think there were some historical disagreements
about which socket properties were inherited, but not that.

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

From

Heikki Linnakangas

Date:

24 August 2023, 14:05:46

On 24/08/2023 15:48, Thomas Munro wrote:
> LGTM.  I vaguely recall thinking that it might be better to keep
> EXEC_BACKEND and !EXEC_BACKEND working the same which might be why I
> didn't try this one, but it looks fine with the comment to explain, as
> you have it.  (It's a shame we can't use O_CLOFORK.)

Yeah, O_CLOFORK would be nice..

Committed, thanks!

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

From

Jeff Janes

Date:

28 August 2023, 15:55:52

On Thu, Aug 24, 2023 at 10:05 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> On 24/08/2023 15:48, Thomas Munro wrote:
> > LGTM. I vaguely recall thinking that it might be better to keep
> > EXEC_BACKEND and !EXEC_BACKEND working the same which might be why I
> > didn't try this one, but it looks fine with the comment to explain, as
> > you have it. (It's a shame we can't use O_CLOFORK.)
>
> Yeah, O_CLOFORK would be nice..
>
> Committed, thanks!

Since this commit, I'm getting a lot (63 per restart) of messages:

LOG: could not close client or listen socket: Bad file descriptor

All I have to do to get the message is turn logging_collector = on and restart.

The close failure condition existed before the commit, it just wasn't logged before. So, did the extra logging added here just uncover a pre-existing bug?

The LOG message is sent to the terminal, not to the log file.

Cheers,

Jeff

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

From

Heikki Linnakangas

Date:

28 August 2023, 20:52:15

On 28/08/2023 18:55, Jeff Janes wrote:
> Since this commit, I'm getting a lot (63 per restart) of messages:
> 
>   LOG:  could not close client or listen socket: Bad file descriptor
> All I have to do to get the message is turn logging_collector = on and 
> restart.
> 
> The close failure condition existed before the commit, it just wasn't 
> logged before.  So, did the extra logging added here just uncover a  
> pre-existing bug?

Yes, so it seems. Syslogger is started before the ListenSockets array is 
initialized, so it's still all zeros. When syslogger is forked, the child 
process tries to close all the listen sockets, which are all zeros. So 
syslogger calls close(0) MAXLISTEN (64) times. Attached patch moves the 
array initialization earlier.

The first close(0) actually does have an effect: it closes stdin, which 
is fd 0. That is surely accidental, but I wonder if we should indeed 
close stdin in child processes? The postmaster process doesn't do 
anything with stdin either, although I guess a library might try to read 
a passphrase from stdin before starting up, for example.
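
The failure mode above can be illustrated with a small stand-alone sketch. It assumes a fixed-size array and uses -1 as the "unused" marker (a stand-in for PostgreSQL's PGINVALID_SOCKET); the names listen_socks, init_listen_socks and close_listen_socks are hypothetical.

```c
#include <assert.h>
#include <unistd.h>

#define MAXLISTEN 64            /* array size mentioned in the thread */
#define INVALID_SOCK (-1)       /* stand-in for PGINVALID_SOCKET */

/* Static storage starts out as all zeros, and 0 is a valid descriptor. */
int         listen_socks[MAXLISTEN];

/* Mark every slot unused; must run before any child might close them. */
void
init_listen_socks(void)
{
    for (int i = 0; i < MAXLISTEN; i++)
        listen_socks[i] = INVALID_SOCK;
}

/* Close only the slots that hold a real descriptor; return how many. */
int
close_listen_socks(void)
{
    int         closed = 0;

    for (int i = 0; i < MAXLISTEN; i++)
    {
        if (listen_socks[i] != INVALID_SOCK)
        {
            close(listen_socks[i]);
            listen_socks[i] = INVALID_SOCK;
            closed++;
        }
    }
    return closed;
}
```

If close_listen_socks() ran before init_listen_socks(), every slot would still hold 0 and the loop would call close(0) once per slot, which matches the symptom reported above.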

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Attachment

fix-syslogger-closesocket-errors.patch

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

From

Michael Paquier

Date:

28 August 2023, 22:28:16

On Mon, Aug 28, 2023 at 11:52:15PM +0300, Heikki Linnakangas wrote:
> On 28/08/2023 18:55, Jeff Janes wrote:
>> Since this commit, I'm getting a lot (63 per restart) of messages:
>>
>>   LOG:  could not close client or listen socket: Bad file descriptor
>> All I have to do to get the message is turn logging_collector = on and
>> restart.
>>
>> The close failure condition existed before the commit, it just wasn't
>> logged before.  So, did the extra logging added here just uncover a
>> pre-existing bug?

In case you've not noticed:
https://www.postgresql.org/message-id/ZOvvuQe0rdj2slA9@paquier.xyz
But it does not really matter now ;)

> Yes, so it seems. Syslogger is started before the ListenSockets array is
> initialized, so its still all zeros. When syslogger is forked, the child
> process tries to close all the listen sockets, which are all zeros. So
> syslogger calls close(0) MAXLISTEN (64) times. Attached patch moves the
> array initialization earlier.

Yep, I've reached the same conclusion.  Wouldn't it be cleaner to move
the callback registration of CloseServerPorts() closer to the array
initialization, though?

> The first close(0) actually does have an effect: it closes stdin, which is
> fd 0. That is surely accidental, but I wonder if we should indeed close
> stdin in child processes? The postmaster process doesn't do anything with
> stdin either, although I guess a library might try to read a passphrase from
> stdin before starting up, for example.

We would have heard about that, wouldn't we?  I may be missing
something of course, but on HEAD, the array initialization is done
before starting any child processes, and the syslogger is the first
one forked.
--
Michael

Attachment

signature.asc

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

From

Heikki Linnakangas

Date:

29 August 2023, 06:21:32

On 29/08/2023 01:28, Michael Paquier wrote:
> 
> In case you've not noticed:
> https://www.postgresql.org/message-id/ZOvvuQe0rdj2slA9@paquier.xyz
> But it does not really matter now ;)

Ah sorry, missed that thread.

>> Yes, so it seems. Syslogger is started before the ListenSockets array is
>> initialized, so its still all zeros. When syslogger is forked, the child
>> process tries to close all the listen sockets, which are all zeros. So
>> syslogger calls close(0) MAXLISTEN (64) times. Attached patch moves the
>> array initialization earlier.
> 
> Yep, I've reached the same conclusion.  Wouldn't it be cleaner to move
> the callback registration of CloseServerPorts() closer to the array
> initialization, though?

Ok, pushed that way.

I checked the history of this: it goes back to commit 9a86f03b4e in 
version 13. The SysLogger_Start() call used to be later, after setting up 
ListenSockets, but that commit moved it. So I backpatched this to v13.

>> The first close(0) actually does have an effect: it closes stdin, which is
>> fd 0. That is surely accidental, but I wonder if we should indeed close
>> stdin in child processes? The postmaster process doesn't do anything with
>> stdin either, although I guess a library might try to read a passphrase from
>> stdin before starting up, for example.
> 
> We would have heard about that, wouldn't we?  I may be missing
> something of course, but on HEAD, the array initialization is done
> before starting any child processes, and the syslogger is the first
> one forked.

Yes, syslogger is the only process that closes stdin. After moving the 
initialization, it doesn't close it either.

Thinking about this some more, the ListenSockets array is a bit silly 
anyway. We fill the array starting from index 0, always append to the 
end, and never remove entries from it. It would seem more 
straightforward to keep track of the used size of the array. Currently 
we always loop through the unused parts too, and e.g. 
ConfigurePostmasterWaitSet() needs to walk the array to count how many 
elements are in use.
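
A sketch of the "track the used size" idea, under the assumption that entries are only ever appended. This is not the attached patch; ListenSocketSet and listen_set_add are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXLISTEN 64

/* Append-only set of listen sockets with an explicit element count. */
typedef struct ListenSocketSet
{
    int         socks[MAXLISTEN];
    int         nsocks;         /* number of slots in use, 0..MAXLISTEN */
} ListenSocketSet;

/* Append a socket; returns false if the set is already full. */
bool
listen_set_add(ListenSocketSet *set, int sock)
{
    if (set->nsocks >= MAXLISTEN)
        return false;
    set->socks[set->nsocks++] = sock;
    return true;
}
```

With the count stored, loops run over nsocks entries instead of scanning all MAXLISTEN slots for a sentinel, and code that needs the number of sockets can read it directly.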

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

From

Heikki Linnakangas

Date:

29 August 2023, 06:58:48

On 29/08/2023 09:21, Heikki Linnakangas wrote:
> Thinking about this some more, the ListenSockets array is a bit silly
> anyway. We fill the array starting from index 0, always append to the
> end, and never remove entries from it. It would seem more
> straightforward to keep track of the used size of the array. Currently
> we always loop through the unused parts too, and e.g.
> ConfigurePostmasterWaitSet() needs to walk the array to count how many
> elements are in use.

Like this.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Attachment

0001-Refactor-ListenSocket-array.patch

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

From

Heikki Linnakangas

Date:

05 October 2023, 12:08:37

On 29/08/2023 09:58, Heikki Linnakangas wrote:
> On 29/08/2023 09:21, Heikki Linnakangas wrote:
>> Thinking about this some more, the ListenSockets array is a bit silly
>> anyway. We fill the array starting from index 0, always append to the
>> end, and never remove entries from it. It would seem more
>> straightforward to keep track of the used size of the array. Currently
>> we always loop through the unused parts too, and e.g.
>> ConfigurePostmasterWaitSet() needs to walk the array to count how many
>> elements are in use.
> 
> Like this.

This seems pretty uncontroversial, and I heard no objections, so I went 
ahead and committed that.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

From

Michael Paquier

Date:

06 October 2023, 05:30:16

On Thu, Oct 05, 2023 at 03:08:37PM +0300, Heikki Linnakangas wrote:
> This seems pretty uncontroversial, and I heard no objections, so I went
> ahead and committed that.

It looks like e29c4643951 is causing issues here.  While doing
benchmarking on a cluster compiled with -O2, I got a crash:
LOG:  system logger process (PID 27924) was terminated by signal 11: Segmentation fault

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055ef3b9aed20 in pfree ()
(gdb) bt
#0  0x000055ef3b9aed20 in pfree ()
#1  0x000055ef3b7e0e41 in ClosePostmasterPorts ()
#2  0x000055ef3b7e6649 in SysLogger_Start ()
#3  0x000055ef3b7e4413 in PostmasterMain ()

Okay, the backtrace is not that useful.  I'll see if I can get
something better, still it seems like this has broken the way the
syslogger closes these ports.
--
Michael

Attachment

signature.asc

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

From

Michael Paquier

Date:

06 October 2023, 06:50:30

On Fri, Oct 06, 2023 at 02:30:16PM +0900, Michael Paquier wrote:
> Okay, the backtrace is not that useful.  I'll see if I can get
> something better, still it seems like this has broken the way the
> syslogger closes these ports.

And here you go:
Program terminated with signal SIGSEGV, Segmentation fault.
#0  GetMemoryChunkMethodID (pointer=0x0) at mcxt.c:196
196         header = *((const uint64 *) ((const char *) pointer - sizeof(uint64)));
(gdb) bt
#0  GetMemoryChunkMethodID (pointer=0x0) at mcxt.c:196
#1  0x0000557d04176d59 in pfree (pointer=0x0) at mcxt.c:1463
#2  0x0000557d03e8eab3 in ClosePostmasterPorts (am_syslogger=true) at postmaster.c:2571
#3  0x0000557d03e93ac2 in SysLogger_Start () at syslogger.c:686
#4  0x0000557d03e8c5b7 in PostmasterMain (argc=3, argv=0x557d0471ed00) at postmaster.c:1148
#5  0x0000557d03d48e34 in main (argc=3, argv=0x557d0471ed00) at main.c:198
(gdb) up 2
#2  0x0000557d03e8eab3 in ClosePostmasterPorts (am_syslogger=true) at postmaster.c:2571
2571        pfree(ListenSockets);
(gdb) p ListenSockets
$1 = (pgsocket *) 0x0
--
Michael

Attachment

signature.asc

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

From

Heikki Linnakangas

Date:

06 October 2023, 07:27:22

On 06/10/2023 09:50, Michael Paquier wrote:
> On Fri, Oct 06, 2023 at 02:30:16PM +0900, Michael Paquier wrote:
>> Okay, the backtrace is not that useful.  I'll see if I can get
>> something better, still it seems like this has broken the way the
>> syslogger closes these ports.
> 
> And here you go:
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  GetMemoryChunkMethodID (pointer=0x0) at mcxt.c:196
> 196         header = *((const uint64 *) ((const char *) pointer - sizeof(uint64)));
> (gdb) bt
> #0  GetMemoryChunkMethodID (pointer=0x0) at mcxt.c:196
> #1  0x0000557d04176d59 in pfree (pointer=0x0) at mcxt.c:1463
> #2  0x0000557d03e8eab3 in ClosePostmasterPorts (am_syslogger=true) at postmaster.c:2571
> #3  0x0000557d03e93ac2 in SysLogger_Start () at syslogger.c:686
> #4  0x0000557d03e8c5b7 in PostmasterMain (argc=3, argv=0x557d0471ed00) at postmaster.c:1148
> #5  0x0000557d03d48e34 in main (argc=3, argv=0x557d0471ed00) at main.c:198
> (gdb) up 2
> #2  0x0000557d03e8eab3 in ClosePostmasterPorts (am_syslogger=true) at postmaster.c:2571
> 2571        pfree(ListenSockets);
> (gdb) p ListenSockets
> $1 = (pgsocket *) 0x0

Fixed, thanks!

I did a quick test with syslogger enabled before committing, but didn't 
notice the segfault. I missed it because syslogger gets restarted and 
then it worked.
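
The thread does not show the committed fix. As an illustration only, one plausible guard for a lazily allocated array looks like the sketch below. It uses standard malloc()/free() as stand-ins; free() accepts NULL, whereas the backtrace above shows pfree() crashing on a NULL pointer, which is why the explicit check matters. All names here are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Not allocated until the listen sockets are actually set up. */
int        *listen_socks = NULL;
int         num_listen_socks = 0;

/*
 * Close and release the listen sockets, if any were ever set up.
 * Returns the number of descriptors closed.
 */
int
release_listen_socks(void)
{
    int         closed = 0;

    /* A child forked before the array was allocated has nothing to do. */
    if (listen_socks == NULL)
        return 0;

    for (int i = 0; i < num_listen_socks; i++)
    {
        close(listen_socks[i]);
        closed++;
    }
    free(listen_socks);
    listen_socks = NULL;
    num_listen_socks = 0;
    return closed;
}
```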

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)

From

Michael Paquier

Date:

06 October 2023, 08:02:50

On Fri, Oct 06, 2023 at 10:27:22AM +0300, Heikki Linnakangas wrote:
> I did a quick test with syslogger enabled before committing, but didn't
> notice the segfault. I missed it because syslogger gets restarted and then
> it worked.

Thanks, Heikki.
--
Michael

Attachment

signature.asc

Re: Refactoring backend fork+exec code

From

Heikki Linnakangas

Date:

11 October 2023, 11:12:47

I updated this patch set, addressing some of the straightforward 
comments from Tristan and Andres, and did some more cleanups, commenting 
etc. Works on Windows now.

Replies to some of the individual comments below:

On 11/07/2023 00:07, Tristan Partin wrote:
>> @@ -4498,15 +4510,19 @@ postmaster_forkexec(int argc, char *argv[])
>>   * returns the pid of the fork/exec'd process, or -1 on failure
>>   */
>>  static pid_t
>> -backend_forkexec(Port *port)
>> +backend_forkexec(Port *port, CAC_state cac)
>>  {
>> -       char       *av[4];
>> +       char       *av[5];
>>         int                     ac = 0;
>> +       char            cacbuf[10];
>>
>>         av[ac++] = "postgres";
>>         av[ac++] = "--forkbackend";
>>         av[ac++] = NULL;                        /* filled in by internal_forkexec */
>>
>> +       snprintf(cacbuf, sizeof(cacbuf), "%d", (int) cac);
>> +       av[ac++] = cacbuf;
> 
> Might be worth a sanity check that there wasn't any truncation into
> cacbuf, which is an impossibility as the code is written, but still
> useful for catching a future developer error.
> 
> Is it worth adding a command line option at all instead of having the
> naked positional argument? It would help anybody who might read the
> command line what the seemingly random integer stands for.

+1. This gets refactored away in the last patch though. In the last 
patch, I used a child process name instead of an integer precisely 
because it looks nicer in "ps".
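
For reference, the truncation check Tristan suggests can be written against snprintf()'s return value. This is a generic sketch, not code from the patch; format_int_arg is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Format value as decimal into buf. Returns true only if the whole
 * string, including the terminating NUL, fit into buflen bytes.
 */
bool
format_int_arg(char *buf, size_t buflen, int value)
{
    /* snprintf returns the length it wanted to write, excluding the NUL. */
    int         n = snprintf(buf, buflen, "%d", value);

    return n >= 0 && (size_t) n < buflen;
}
```

A caller that considers truncation impossible could wrap the result in an assertion, so a future change to the buffer size or value range fails loudly instead of passing a clipped argument.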

I wonder if we should add more command line arguments, just for 
informational purposes. Autovacuum worker process could display the 
database name it's connected to, for example. I don't know how important 
the command line is on Windows, is it displayed by tools that people 
care about?

On 11/07/2023 01:50, Andres Freund wrote:
> On 2023-06-18 14:22:33 +0300, Heikki Linnakangas wrote:
>>  From 0cb6f8d665980d30a5d2a29013000744f16bf813 Mon Sep 17 00:00:00 2001
>> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>> Date: Sun, 18 Jun 2023 11:00:21 +0300
>> Subject: [PATCH 3/9] Refactor CreateSharedMemoryAndSemaphores.
>>
>> Moves InitProcess calls a little later in EXEC_BACKEND case.
> 
> What's the reason for this part? 

The point is that with this commit, InitProcess() is called at the same 
place in EXEC_BACKEND mode and !EXEC_BACKEND. It feels more consistent 
that way.

> ISTM that we'd really want to get away from plastering duplicated
> InitProcess() etc everywhere.

Sure, we could do more to reduce the duplication. I think this is a step 
in the right direction, though.

>>  From 65384b9a6cfb3b9b589041526216e0f64d64bea5 Mon Sep 17 00:00:00 2001
>> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>> Date: Sun, 18 Jun 2023 13:56:44 +0300
>> Subject: [PATCH 8/9] Introduce ClientSocket, rename some funcs
>>
>> - Move more of the work on a client socket to the child process.
>>
>> - Reduce the amount of data that needs to be passed from postmaster to
>>    child. (Used to pass a full Port struct, although most of the fields were
>>    empty. Now we pass the much slimmer ClientSocket.)
> 
> I think there might be extensions accessing Port. Not sure if it's worth
> worrying about, but ...

That's OK. Port still exists, it's just created a little later. It will 
be initialized by the time extensions might look at it.

>> +const        PMChildEntry entry_kinds[] = {
>> +    {"backend", BackendMain, true},
>> +
>> +    {"autovacuum launcher", AutoVacLauncherMain, true},
>> +    {"autovacuum worker", AutoVacWorkerMain, true},
>> +    {"bgworker", BackgroundWorkerMain, true},
>> +    {"syslogger", SysLoggerMain, false},
>> +
>> +    {"startup", StartupProcessMain, true},
>> +    {"bgwriter", BackgroundWriterMain, true},
>> +    {"archiver", PgArchiverMain, true},
>> +    {"checkpointer", CheckpointerMain, true},
>> +    {"wal_writer", WalWriterMain, true},
>> +    {"wal_receiver", WalReceiverMain, true},
>> +};
> 
> I'd assign them with the PostmasterChildType as index, so there's no danger of
> getting out of order.
> 
> const                PMChildEntry entry_kinds = {
>    [PMC_AV_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true},
>    ...
> }
> 
> or such should work.

Nice, I didn't know about that syntax! Changed it that way.
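
For anyone else who has not seen it, this is the C99 designated array initializer syntax. The sketch below is simplified and is not the patch's table: the entry function pointer is omitted, the struct and field names are made up, and the meaning of the boolean is not stated in this thread, so it is called "flag" here.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative subset of child kinds; the enum values are the array indexes. */
typedef enum ChildKindSketch
{
    PMC_AV_LAUNCHER,
    PMC_SYSLOGGER,
    PMC_CHECKPOINTER
} ChildKindSketch;

typedef struct ChildEntrySketch
{
    const char *name;
    bool        flag;           /* placeholder for the entry's boolean */
} ChildEntrySketch;

/*
 * Each element names its own index, so the order in the source no longer
 * has to match the order of the enum. Entries are shuffled on purpose.
 */
const ChildEntrySketch entry_kinds[] = {
    [PMC_SYSLOGGER] = {"syslogger", false},
    [PMC_CHECKPOINTER] = {"checkpointer", true},
    [PMC_AV_LAUNCHER] = {"autovacuum launcher", true},
};
```

Any index left out of such an initializer is zero-filled, so a missing entry shows up as a NULL name rather than as a silently shifted table.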

> I'd also use designated initializers for the fields, it's otherwise hard to
> know what true means etc.

I think with one boolean and the struct declaration nearby, it's fine. 
If this becomes more complex in the future, with more fields, I agree.

> I think it might be good to put more into the array. If we e.g. knew whether a
> particular child type is backend-like, an aux process or syslogger, we
> could avoid the duplicated InitAuxiliaryProcess(),
> MemoryContextDelete(PostmasterContext) etc calls everywhere.

I agree we could do more refactoring here. I don't agree with adding 
more to this struct though. I'm trying to limit the code in 
launch_backend.c to hiding the differences between EXEC_BACKEND and 
!EXEC_BACKEND. In EXEC_BACKEND mode, it restores the child process to 
the same state as it is after fork() in !EXEC_BACKEND mode. Any other 
initialization steps belong elsewhere.

Some of the steps between InitPostmasterChild() and the *Main() 
functions could probably be moved around and refactored. I didn't think 
hard about that. I think that can be done separately as a follow-up patch.

>> +/* Save critical backend variables into the BackendParameters struct */
>> +#ifndef WIN32
>> +static bool
>> +save_backend_variables(BackendParameters *param, ClientSocket *client_sock)
>> +#else
> 
> There's so much of this kind of thing. Could we hide it in a struct or such
> instead of needing ifdefs everywhere?

A lot of #ifdefs you mean? I agree launch_backend.c has a lot of those. 
I haven't come up with any good ideas on reducing them, unfortunately.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Attachment

Re: Refactoring backend fork+exec code

From

Heikki Linnakangas

Date:

29 November 2023, 23:36:25

On 11/10/2023 14:12, Heikki Linnakangas wrote:
> On 11/07/2023 01:50, Andres Freund wrote:
>>> Subject: [PATCH 3/9] Refactor CreateSharedMemoryAndSemaphores.
>>>
>>> Moves InitProcess calls a little later in EXEC_BACKEND case.
>>
>> What's the reason for this part?
> 
> The point is that with this commit, InitProcess() is called at same
> place in EXEC_BACKEND mode and !EXEC_BACKEND. It feels more consistent
> that way.
> 
>> ISTM that we'd really want to get away from plastering duplicated
>> InitProcess() etc everywhere.
> 
> Sure, we could do more to reduce the duplication. I think this is a step
> in the right direction, though.

Here's another rebased patch set. Compared to the previous version, I did a 
little more refactoring around CreateSharedMemoryAndSemaphores and 
InitProcess:

- patch 1 splits CreateSharedMemoryAndSemaphores into two functions: 
CreateSharedMemoryAndSemaphores is now only called at postmaster 
startup, and a new function called AttachSharedMemoryStructs() is called 
in backends in EXEC_BACKEND mode. I extracted the common parts of those 
functions to a new static function. (Some of this refactoring used to be 
part of the 3rd patch in the series, but it seems useful on its own, so 
I split it out.)

- patch 3 moves the call to AttachSharedMemoryStructs() to 
InitProcess(), reducing the boilerplate code a little.


The patches are incrementally useful, so if you don't have time to 
review all of them, a review on a subset would be useful too.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

On 30/11/2023 20:44, Tristan Partin wrote:
> Patches 1-3 seem committable as-is.

Thanks for the review! I'm focusing on patches 1-3 now, and will come 
back to the rest after committing 1-3.

There was one test failure with EXEC_BACKEND from patch 2, in 
'test_shm_mq'. In restore_backend_variables() I checked if 'bgw_name' is 
empty to decide if the BackgroundWorker struct is filled in or not, but 
it turns out that 'test_shm_mq' doesn't fill in bgw_name. It probably 
should, I think that's an oversight in 'test_shm_mq', but that's a 
separate issue.

I did some more refactoring of patch 2, to fix that and to improve it in 
general. The BackgroundWorker struct is now passed through the 
fork-related functions similarly to the Port struct. That seems more 
consistent.
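
The underlying problem is using an in-band value (an empty name) to mean "not present". The thread does not show the reworked code; as a generic illustration, an explicit presence flag avoids the ambiguity. All types and functions below are hypothetical stand-ins, not PostgreSQL's BackgroundWorker or BackendParameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Stand-in for a worker description; an empty name is a legal value. */
typedef struct WorkerSketch
{
    char        name[32];
    int         restart_time;
} WorkerSketch;

/* Stand-in for the parameter block handed to the child. */
typedef struct ParamsSketch
{
    bool        has_worker;     /* explicit flag instead of testing name */
    WorkerSketch worker;
} ParamsSketch;

/* Save side: a NULL pointer means "no worker for this child". */
void
save_worker(ParamsSketch *params, const WorkerSketch *worker)
{
    params->has_worker = (worker != NULL);
    if (worker != NULL)
        params->worker = *worker;
    else
        memset(&params->worker, 0, sizeof(params->worker));
}

/* Restore side: returns NULL when no worker was saved. */
const WorkerSketch *
restore_worker(const ParamsSketch *params)
{
    return params->has_worker ? &params->worker : NULL;
}
```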

Attached is a new version of these patches. For easier review, I put the 
new refactorings in a separate commit, 0003. I will squash that 
before pushing, but this makes it easier to see what changed.

Barring any new feedback or issues, I will commit these.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

On 30/11/2023 22:26, Andres Freund wrote:
> On 2023-11-30 01:36:25 +0200, Heikki Linnakangas wrote:
>>   [...]
>>   33 files changed, 1787 insertions(+), 2002 deletions(-)
> 
> Well, that's not small...
> 
> I think it may be worth splitting some of the file renaming out into a
> separate commit, makes it harder to see what changed here.

Here you are (details at end of this email)

>> +    [PMC_AV_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true},
>> +    [PMC_AV_WORKER] = {"autovacuum worker", AutoVacWorkerMain, true},
>> +    [PMC_BGWORKER] = {"bgworker", BackgroundWorkerMain, true},
>> +    [PMC_SYSLOGGER] = {"syslogger", SysLoggerMain, false},
>> +
>> +    [PMC_STARTUP] = {"startup", StartupProcessMain, true},
>> +    [PMC_BGWRITER] = {"bgwriter", BackgroundWriterMain, true},
>> +    [PMC_ARCHIVER] = {"archiver", PgArchiverMain, true},
>> +    [PMC_CHECKPOINTER] = {"checkpointer", CheckpointerMain, true},
>> +    [PMC_WAL_WRITER] = {"wal_writer", WalWriterMain, true},
>> +    [PMC_WAL_RECEIVER] = {"wal_receiver", WalReceiverMain, true},
>> +};
> 
> 
> It feels like we have too many different ways of documenting the type of a
> process. This new PMC_ stuff, enum AuxProcType, enum BackendType.

Agreed. And "am_walsender" and such variables.

> Which then leads to code like this:
> 
>> -CheckpointerMain(void)
>> +CheckpointerMain(char *startup_data, size_t startup_data_len)
>>   {
>>       sigjmp_buf    local_sigjmp_buf;
>>       MemoryContext checkpointer_context;
>>
>> +    Assert(startup_data_len == 0);
>> +
>> +    MyAuxProcType = CheckpointerProcess;
>> +    MyBackendType = B_CHECKPOINTER;
>> +    AuxiliaryProcessInit();
>> +
> 
> For each type of child process. That seems a bit too redundant.  Can't we
> unify this at least somewhat? Can't we just reuse BackendType here? Sure,
> there'd be pointless entry for B_INVALID, but that doesn't seem like a
> problem, could even be useful, by pointing it to a function raising an error.

There are a few differences: B_INVALID (and B_STANDALONE_BACKEND) are 
pointless for this array as you noted. But also, we don't know if the 
backend is a regular backend or WAL sender until authentication, so for 
a WAL sender, we'd need to change MyBackendType from B_BACKEND to 
B_WAL_SENDER after forking. Maybe that's ok.

I didn't do anything about this yet, but I'll give it some more thought.

>> +    if (strncmp(argv[1], "--forkchild=", 12) != 0)
>> +        elog(FATAL, "invalid subpostmaster invocation (--forkchild argument missing)");
>> +    entry_name = argv[1] + 12;
>> +    found = false;
>> +    for (int idx = 0; idx < lengthof(entry_kinds); idx++)
>> +    {
>> +        if (strcmp(entry_kinds[idx].name, entry_name) == 0)
>> +        {
>> +            child_type = idx;
>> +            found = true;
>> +            break;
>> +        }
>> +    }
>> +    if (!found)
>> +        elog(ERROR, "unknown child kind %s", entry_name);
> 
> If we then have to search linearly, why don't we just pass the index into the
> array?

We could. I like the idea of a human-readable name on the command line, 
although I'm not sure if it's really visible anywhere.
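
A stand-alone version of the name lookup, to show the trade-off being discussed. It is a sketch with a made-up three-entry name table; parse_forkchild_arg and child_names are hypothetical, and only the "--forkchild=" prefix is taken from the quoted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative name table; the real table in the patch has more entries. */
const char *const child_names[] = {"backend", "syslogger", "checkpointer"};

/*
 * Parse "--forkchild=NAME" and return the index of NAME in child_names,
 * or -1 if the prefix is missing or the name is unknown.
 */
int
parse_forkchild_arg(const char *arg)
{
    static const char prefix[] = "--forkchild=";
    size_t      prefix_len = sizeof(prefix) - 1;    /* avoids a magic 12 */
    size_t      nkinds = sizeof(child_names) / sizeof(child_names[0]);

    if (strncmp(arg, prefix, prefix_len) != 0)
        return -1;
    for (size_t i = 0; i < nkinds; i++)
    {
        if (strcmp(child_names[i], arg + prefix_len) == 0)
            return (int) i;
    }
    return -1;
}
```

The linear scan costs a handful of string comparisons once per process start, so the choice between a name and a bare index is mostly about readability of the command line rather than speed.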

>> +void
>> +BackendMain(char *startup_data, size_t startup_data_len)
>> +{
> 
> Is there any remaining reason for this to live in postmaster.c? Given that
> other backend types don't, that seems oddly asymmetrical.

Gee, another yak to shave, thanks ;-). You're right, that makes a lot of 
sense. I added another patch that moves that to a new file, 
src/backend/tcop/backend_startup.c. ProcessStartupPacket() and friends 
go there too. It might make sense to do this before the other patches, 
but it's the last patch in the series now.

I kept processCancelRequest() in postmaster.c because it looks at 
BackendList/ShmemBackendArray, which are static in postmaster.c. Some 
more refactoring might be in order there, perhaps moving those to a 
different file too. But that can be done separately, this split is 
pretty OK as is.

On 30/11/2023 20:44, Tristan Partin wrote:
>>  From 8886db1ed6bae21bf6d77c9bb1230edbb55e24f9 Mon Sep 17 00:00:00 2001
>>  From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>>  Date: Thu, 30 Nov 2023 00:04:22 +0200
>>  Subject: [PATCH v3 4/7] Pass CAC as argument to backend process
> 
> For me, being new to the code, it would be nice to have more of an 
> explanation as to why this is "better." I don't doubt it; it would just 
> help me and future readers of this commit in the future. More of an 
> explanation in the commit message would suffice.

Updated the commit message. It's mainly to pave the way for the next 
patches, which move the initialization of Port to the backend process, 
after forking. And that in turn paves the way for the patches after 
that. But also, very subjectively, it feels more natural to me.

> My other comment on this commit is that we now seem to have lost the 
> context on what CAC stands for. Before we had the member variable to 
> explain it. A comment on the enum would be great or changing cac named 
> variables to canAcceptConnections. I did notice in patch 7 that there 
> are still some variables named canAcceptConnections around, so I'll 
> leave this comment up to you.

Good point. The last patch in this series - which is new compared to 
previous patch version - moves CAC_state to a different header file 
again. I added a comment there.

>>  +        if (fwrite(param, paramsz, 1, fp) != 1)
>>  +        {
>>  +                ereport(LOG,
>>  +                                (errcode_for_file_access(),
>>  +                                 errmsg("could not write to file \"%s\": %m", tmpfilename)));
>>  +                FreeFile(fp);
>>  +                return -1;
>>  +        }
>>  +
>>  +        /* Release file */
>>  +        if (FreeFile(fp))
>>  +        {
>>  +                ereport(LOG,
>>  +                                (errcode_for_file_access(),
>>  +                                 errmsg("could not write to file \"%s\": %m", tmpfilename)));
>>  +                return -1;
>>  +        }
> 
> Two pieces of feedback here. I generally find write(2) more useful than 
> fwrite(3) because write(2) will report a useful errno, whereas fwrite(3) 
> just uses ferror(3). The additional errno information might be valuable 
> context in the log message. Up to you if you think it is also valuable.

In general I agree. This patch just moves existing code though, so I 
left it as is.

> The log message if FreeFile() fails doesn't seem to make sense to me. 
> I didn't see any file writing in that code path, but it is possible that 
> I missed something.

FreeFile() calls fclose(), which flushes the buffer. If fclose() fails, 
it's most likely because the write() to flush the buffer failed, so 
"could not write" is usually appropriate. (It feels ugly to me too, 
error handling with the buffered i/o functions is a bit messy. As you 
said, plain open()/write() is more clear.)
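
To illustrate the fclose() point, here is a minimal standalone sketch. It 
uses plain stdio rather than PostgreSQL's AllocateFile()/FreeFile() 
wrappers, and write_blob() is a hypothetical name chosen for this 
example only:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not PostgreSQL code): write a blob to 'path'
 * using buffered stdio.  Returns 0 on success, -1 on failure.
 */
static int
write_blob(const char *path, const void *data, size_t len)
{
    FILE       *fp;

    assert(path != NULL);

    fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;

    /* fwrite() may only copy into the stdio buffer at this point */
    if (len > 0 && fwrite(data, len, 1, fp) != 1)
    {
        fclose(fp);
        return -1;
    }

    /*
     * fclose() flushes the buffer, so a failure of the underlying
     * write() can surface here rather than in fwrite().
     */
    if (fclose(fp) != 0)
        return -1;

    return 0;
}
```

Checking the result of fclose() is what makes a "could not write" 
message plausible on that path.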

>>  +        /*
>>  +         * Need to reinitialize the SSL library in the backend, since the context
>>  +         * structures contain function pointers and cannot be passed through the
>>  +         * parameter file.
>>  +         *
>>  +         * If for some reason reload fails (maybe the user installed broken key
>>  +         * files), soldier on without SSL; that's better than all connections
>>  +         * becoming impossible.
>>  +         *
>>  +         * XXX should we do this in all child processes?  For the moment it's
>>  +         * enough to do it in backend children.
>>  +         */
>>  +#ifdef USE_SSL
>>  +        if (EnableSSL)
>>  +        {
>>  +                if (secure_initialize(false) == 0)
>>  +                        LoadedSSL = true;
>>  +                else
>>  +                        ereport(LOG,
>>  +                                        (errmsg("SSL configuration could not be loaded in child process")));
>>  +        }
>>  +#endif
> 
> Do other child process types do any non-local communication?

No. Although in theory an extension-defined background worker could do 
whatever, including opening TLS connections. It's not clear if such a
background worker would want the same initialization that we do in 
secure_initialize(), or something else.


Here is a new patch set:

> v5-0001-Pass-CAC-as-argument-to-backend-process.patch
> v5-0002-Remove-ConnCreate-and-ConnFree-and-allocate-Port-.patch
> v5-0003-Move-initialization-of-Port-struct-to-child-proce.patch

These patches form a pretty well-contained unit. The gist is to move the 
initialization of the Port struct to after forking the backend process 
(in patch 3).

I plan to polish and commit these next, so any final reviews on these 
are welcome.

> v5-0004-Extract-registration-of-Win32-deadchild-callback-.patch
> v5-0005-Move-some-functions-from-postmaster.c-to-new-sour.patch
> v5-0006-Refactor-AuxProcess-startup.patch
> v5-0007-Refactor-postmaster-child-process-launching.patch

Patches 4-6 are refactorings that don't do much good on their own, but 
they help to make patch 7 much smaller and easier to review.

I left out some of the code-moving that I had in previous patch versions:

- Previously I moved fork_process() function from fork_process.c to the 
new launch_backend.c file. That might still make sense, there is nothing 
else in fork_process.c and the only caller is in launch_backend.c. But 
I'm not sure, and it can be done separately.

- Previously I moved InitPostmasterChild from miscinit.c to the new 
launch_backend.c file. That might also still make sense, but I'm not 
100% sure it's an improvement, and it can be done later if we want to.

> v5-0008-Move-code-for-backend-startup-to-separate-file.patch

This moves BackendMain() and friends from postmaster.c to a new file, 
per Andres's suggestion.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Refactoring backend fork+exec code

From

Heikki Linnakangas

Date:

10 January 2024, 12:35:52

On 08/12/2023 14:33, Heikki Linnakangas wrote:
>>> +    [PMC_AV_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true},
>>> +    [PMC_AV_WORKER] = {"autovacuum worker", AutoVacWorkerMain, true},
>>> +    [PMC_BGWORKER] = {"bgworker", BackgroundWorkerMain, true},
>>> +    [PMC_SYSLOGGER] = {"syslogger", SysLoggerMain, false},
>>> +
>>> +    [PMC_STARTUP] = {"startup", StartupProcessMain, true},
>>> +    [PMC_BGWRITER] = {"bgwriter", BackgroundWriterMain, true},
>>> +    [PMC_ARCHIVER] = {"archiver", PgArchiverMain, true},
>>> +    [PMC_CHECKPOINTER] = {"checkpointer", CheckpointerMain, true},
>>> +    [PMC_WAL_WRITER] = {"wal_writer", WalWriterMain, true},
>>> +    [PMC_WAL_RECEIVER] = {"wal_receiver", WalReceiverMain, true},
>>> +};
>>
>> It feels like we have too many different ways of documenting the type of a
>> process. This new PMC_ stuff, enum AuxProcType, enum BackendType.
> Agreed. And "am_walsender" and such variables.

Here's a patch that gets rid of AuxProcType. It's independent of the 
other patches in this thread; if this is committed, I'll rebase the rest 
of the patches over this and get rid of the new PMC_* enum.

Three patches, actually. The first one fixes an existing comment that I 
noticed to be incorrect while working on this. I'll push that soon, 
barring objections. The second one gets rid of AuxProcType, and the 
third one replaces IsBackgroundWorker, IsAutoVacuumLauncherProcess() and 
IsAutoVacuumWorkerProcess() with checks on MyBackendType. So 
MyBackendType is now the primary way to check what kind of a process the 
current process is.

'am_walsender' would also be fairly straightforward to remove and 
replace with AmWalSenderProcess(). I didn't do that because we also have 
am_db_walsender and am_cascading_walsender which cannot be directly 
replaced with MyBackendType. Given that, it might be best to keep 
am_walsender for symmetry.
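
A rough sketch of the idea, using made-up names (the Sketch*/SK_* 
identifiers are hypothetical and the enum is only a small subset, not 
the real BackendType): one process-type variable answers the "what am 
I" questions, while finer-grained walsender flags still need their own 
variables.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative subset only; not the real BackendType enum. */
typedef enum SketchBackendType
{
    SK_INVALID = 0,
    SK_BACKEND,
    SK_AUTOVAC_LAUNCHER,
    SK_AUTOVAC_WORKER,
    SK_BG_WORKER,
    SK_WAL_SENDER
} SketchBackendType;

static SketchBackendType SketchMyBackendType = SK_INVALID;

/* These cannot be derived from the process type alone. */
static bool sketch_am_db_walsender = false;
static bool sketch_am_cascading_walsender = false;

/* Assumed to be called once, early in child process startup. */
static void
sketch_set_backend_type(SketchBackendType type)
{
    assert(SketchMyBackendType == SK_INVALID);
    SketchMyBackendType = type;
}

static bool
sketch_am_autovacuum_worker(void)
{
    return SketchMyBackendType == SK_AUTOVAC_WORKER;
}

static bool
sketch_am_wal_sender(void)
{
    return SketchMyBackendType == SK_WAL_SENDER;
}
```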

-- 
Heikki Linnakangas
Neon (https://neon.tech)

On 23/01/2024 21:50, Andres Freund wrote:
> On 2024-01-23 21:07:08 +0200, Heikki Linnakangas wrote:
>> On 22/01/2024 23:07, Andres Freund wrote:
>>>> diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
>>>> index 1a1050c8da1..92f24db4e18 100644
>>>> --- a/src/backend/utils/activity/backend_status.c
>>>> +++ b/src/backend/utils/activity/backend_status.c
>>>> @@ -257,17 +257,16 @@ pgstat_beinit(void)
>>>>        else
>>>>        {
>>>>            /* Must be an auxiliary process */
>>>> -        Assert(MyAuxProcType != NotAnAuxProcess);
>>>> +        Assert(IsAuxProcess(MyBackendType));
>>>>            /*
>>>>             * Assign the MyBEEntry for an auxiliary process.  Since it doesn't
>>>>             * have a BackendId, the slot is statically allocated based on the
>>>> -         * auxiliary process type (MyAuxProcType).  Backends use slots indexed
>>>> -         * in the range from 0 to MaxBackends (exclusive), so we use
>>>> -         * MaxBackends + AuxProcType as the index of the slot for an auxiliary
>>>> -         * process.
>>>> +         * auxiliary process type.  Backends use slots indexed in the range
>>>> +         * from 0 to MaxBackends (exclusive), and aux processes use the slots
>>>> +         * after that.
>>>>             */
>>>> -        MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType];
>>>> +        MyBEEntry = &BackendStatusArray[MaxBackends + MyBackendType - FIRST_AUX_PROC];
>>>>        }
>>>
>>> Hm, this seems less than pretty. It's probably ok for now, but it seems like a
>>> better fix might be to just start assigning backend ids to aux procs or switch
>>> to indexing by pgprocno?
>>
>> Using pgprocno is a good idea. Come to think of it, why do we even have a
>> concept of backend ID that's separate from pgprocno? backend ID is used to
>> index the ProcState array, but AFAICS we could use pgprocno as the index to
>> that, too.
> 
> I think we should do that. There are a few processes not participating in
> sinval, but it doesn't make enough of a difference to make sinval slower. And
> I think there'd be far bigger efficiency improvements to sinvaladt than not
> having a handful more entries.

And here we go. BackendID is now a 1-based index directly into the 
PGPROC array.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Refactoring backend fork+exec code

From

reid.thompson@crunchydata.com

Date:

29 January 2024, 15:54:52

On Thu, 2024-01-25 at 01:51 +0200, Heikki Linnakangas wrote:
>
> And here we go. BackendID is now a 1-based index directly into the
> PGPROC array.
>

Would it be worthwhile to also note in this comment FIRST_AUX_PROC's
and IsAuxProcess()'s dependency on B_ARCHIVER and its location in the
enum table?

      /*
       * Auxiliary processes. These have PGPROC entries, but they are not
       * attached to any particular database. There can be only one of each of
       * these running at a time.
       *
       * If you modify these, make sure to update NUM_AUXILIARY_PROCS and the
       * glossary in the docs.
       */
      B_ARCHIVER,
      B_BG_WRITER,
      B_CHECKPOINTER,
      B_STARTUP,
      B_WAL_RECEIVER,
      B_WAL_SUMMARIZER,
      B_WAL_WRITER,
  } BackendType;

  #define BACKEND_NUM_TYPES (B_WAL_WRITER + 1)

  extern PGDLLIMPORT BackendType MyBackendType;

  #define FIRST_AUX_PROC B_ARCHIVER
  #define IsAuxProcess(type)          (MyBackendType >= FIRST_AUX_PROC)

Re: Refactoring backend fork+exec code

From

Heikki Linnakangas

Date:

30 January 2024, 00:08:36

On 29/01/2024 17:54, reid.thompson@crunchydata.com wrote:
> On Thu, 2024-01-25 at 01:51 +0200, Heikki Linnakangas wrote:
>>
>> And here we go. BackendID is now a 1-based index directly into the
>> PGPROC array.
> 
> Would it be worthwhile to also note in this comment FIRST_AUX_PROC's
> and IsAuxProcess()'s dependency on B_ARCHIVER and its location in the
> enum table?

Yeah, that might be in order. Although looking closer, it's only used in 
IsAuxProcess, which is only used in one sanity check in 
AuxProcessMain(). And even that gets refactored away by the later 
patches in this thread. So on second thoughts, I'll just remove it 
altogether.

I spent some more time on the 'lastBackend' optimization in sinvaladt.c. 
I realized that it became very useless with these patches, because aux 
processes are allocated pgprocno's after all the slots for regular 
backends. There are always aux processes running, so lastBackend would 
always have a value close to the max anyway. I replaced that with a 
dense 'pgprocnos' array that keeps track of the exact slots that are in 
use. I'm not 100% sure this is worth the effort; does any real world 
workload send shared invalidations so frequently that this matters? In 
any case, this should avoid the regression if such a workload exists.
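
A minimal sketch of what such a dense array can look like. The 
Sketch*/sketch_* names and the fixed SKETCH_MAX_PROCS size are 
assumptions for illustration; this is not the code in the patch:

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_PROCS 8      /* assumed size, for illustration only */

typedef struct SketchSlotSet
{
    int         numProcs;                       /* slots currently in use */
    int         pgprocnos[SKETCH_MAX_PROCS];    /* dense list of in-use slots */
} SketchSlotSet;

/* Register a slot as in use by appending it to the dense array. */
static void
sketch_slot_add(SketchSlotSet *set, int procno)
{
    assert(set->numProcs < SKETCH_MAX_PROCS);
    set->pgprocnos[set->numProcs++] = procno;
}

/*
 * Unregister a slot: find it, then move the last entry into its place
 * so the array stays dense.  Returns false if the slot was not in use.
 */
static bool
sketch_slot_remove(SketchSlotSet *set, int procno)
{
    for (int i = set->numProcs - 1; i >= 0; i--)
    {
        if (set->pgprocnos[i] == procno)
        {
            set->pgprocnos[i] = set->pgprocnos[--set->numProcs];
            return true;
        }
    }
    return false;
}
```

A scan over pgprocnos[0 .. numProcs - 1] then visits only the slots 
that are actually in use, no matter how high their numbers are.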

New patch set attached. I think these are ready to be committed, but 
would appreciate a final review.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Refactoring backend fork+exec code

From

Heikki Linnakangas

Date:

01 February 2024, 13:54:23

On 30/01/2024 02:08, Heikki Linnakangas wrote:
> On 29/01/2024 17:54, reid.thompson@crunchydata.com wrote:
>> On Thu, 2024-01-25 at 01:51 +0200, Heikki Linnakangas wrote:
>>>
>>> And here we go. BackendID is now a 1-based index directly into the
>>> PGPROC array.
>>
>> Would it be worthwhile to also note in this comment FIRST_AUX_PROC's
>> and IsAuxProcess()'s dependency on B_ARCHIVER and its location in the
>> enum table?
> 
> Yeah, that might be in order. Although looking closer, it's only used in
> IsAuxProcess, which is only used in one sanity check in
> AuxProcessMain(). And even that gets refactored away by the later
> patches in this thread. So on second thoughts, I'll just remove it
> altogether.
> 
> I spent some more time on the 'lastBackend' optimization in sinvaladt.c.
> I realized that it became very useless with these patches, because aux
> processes are allocated pgprocno's after all the slots for regular
> backends. There are always aux processes running, so lastBackend would
> always have a value close to the max anyway. I replaced that with a
> dense 'pgprocnos' array that keeps track of the exact slots that are in
> use. I'm not 100% sure this is worth the effort; does any real world
> workload send shared invalidations so frequently that this matters? In
> any case, this should avoid the regression if such a workload exists.
> 
> New patch set attached. I think these are ready to be committed, but
> would appreciate a final review.

contrib/amcheck 003_cic_2pc.pl test failures revealed a bug that 
required some reworking:

In a PGPROC entry for a prepared xact, the PGPROC's backendID needs to 
be the original backend's ID, because the prepared xact is holding the 
lock on the original virtual transaction id. When a transaction's 
ownership is moved from the original backend's PGPROC entry to the 
prepared xact PGPROC entry, the backendID needs to be copied over. My 
patch removed the field altogether, so it was not copied over, which 
made it look like the original VXID lock was released at prepare.

I fixed that by adding back the backendID field. For regular backends, 
it's always equal to pgprocno + 1, but for prepared xacts, it's the 
original backend's ID. To make that less confusing, I moved the 
backendID and lxid fields together under a 'vxid' struct. The two fields 
together form the virtual transaction ID, and that's the only context 
where the 'backendID' field should now be looked at.
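
Roughly, and with hypothetical names (SketchProc and the sketch_* 
functions are made up for this example, and the field types are 
simplified), the arrangement looks like this:

```c
#include <assert.h>

typedef struct SketchProc
{
    /* The two fields that together form the virtual transaction ID. */
    struct
    {
        int          backendID;     /* 1-based in this patch version */
        unsigned int lxid;          /* local transaction id */
    }           vxid;
} SketchProc;

/* Regular backend: the backend ID follows from the slot number. */
static void
sketch_init_backend(SketchProc *procs, int pgprocno, unsigned int lxid)
{
    procs[pgprocno].vxid.backendID = pgprocno + 1;
    procs[pgprocno].vxid.lxid = lxid;
}

/*
 * At PREPARE: the prepared xact's entry takes over the original
 * backend's VXID, so its backendID is NOT its own slot number + 1.
 */
static void
sketch_prepare(SketchProc *procs, int backend_procno, int prepared_procno)
{
    assert(backend_procno != prepared_procno);
    procs[prepared_procno].vxid = procs[backend_procno].vxid;
}
```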

I also squashed the 'lastBackend' changes in sinvaladt.c to the main patch.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Refactoring backend fork+exec code

From

Andres Freund

Date:

07 February 2024, 18:25:21

Hi,

On 2024-01-30 02:08:36 +0200, Heikki Linnakangas wrote:
> I spent some more time on the 'lastBackend' optimization in sinvaladt.c. I
> realized that it became very useless with these patches, because aux
> processes are allocated pgprocno's after all the slots for regular backends.
> There are always aux processes running, so lastBackend would always have a
> value close to the max anyway. I replaced that with a dense 'pgprocnos'
> array that keeps track of the exact slots that are in use. I'm not 100% sure
> this is worth the effort; does any real world workload send shared
> invalidations so frequently that this matters? In any case, this should
> avoid the regression if such a workload exists.
>
> New patch set attached. I think these are ready to be committed, but would
> appreciate a final review.


> From 54f22231bb2540fc5957c14005956161e6fc9dac Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Wed, 24 Jan 2024 23:15:55 +0200
> Subject: [PATCH v8 1/5] Remove superfluous 'pgprocno' field from PGPROC
>
> It was always just the index of the PGPROC entry from the beginning of
> the proc array. Introduce a macro to compute it from the pointer
> instead.

Hm. The pointer math here is a bit more expensive than in some other cases, as
the struct is fairly large and sizeof(PGPROC) isn't a power of two. Adding
more math into loops like in TransactionGroupUpdateXidStatus() might end up
showing up.

I've been thinking that we likely should pad PGPROC to some more convenient
boundary, but...


Is this really related to the rest of the series?


> From 4e0121e064804b73ef8a5dc10be27b85968ea1af Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Mon, 29 Jan 2024 23:50:34 +0200
> Subject: [PATCH v8 2/5] Redefine backend ID to be an index into the proc
>  array.
>
> Previously, backend ID was an index into the ProcState array, in the
> shared cache invalidation manager (sinvaladt.c). The entry in the
> ProcState array was reserved at backend startup by scanning the array
> for a free entry, and that was also when the backend got its backend
> ID. Things become slightly simpler if we redefine backend ID to be
> the index into the PGPROC array, and directly use it also as an index
> to the ProcState array. This uses a little more memory, as we reserve
> a few extra slots in the ProcState array for aux processes that don't
> need them, but the simplicity is worth it.

> Aux processes now also have a backend ID. This simplifies the
> reservation of BackendStatusArray and ProcSignal slots.
>
> You can now convert a backend ID into an index into the PGPROC array
> simply by subtracting 1. We still use 0-based "pgprocnos" in various
> places, for indexes into the PGPROC array, but the only difference now
> is that backend IDs start at 1 while pgprocnos start at 0.

Why aren't we using 0-based indexing for both? InvalidBackendId is -1, so
there'd not be a conflict, right?


> One potential downside of this patch is that the ProcState array might
> get less densely packed, as we don't try so hard to assign
> low-numbered backend ID anymore. If it's less densely packed,
> lastBackend will stay at a higher value, and SIInsertDataEntries() and
> SICleanupQueue() need to scan over more unused entries. I think that's
> fine. They are not performance critical enough to matter, and there was no
> guarantee on dense packing before either: If you launched a lot of
> backends concurrently, and kept the last one open, lastBackend would
> also stay at a high value.

It's perhaps worth noting here that there's a future patch that also addresses
this to some degree?


> @@ -457,7 +442,7 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
>      Assert(LWLockHeldByMeInMode(TwoPhaseStateLock, LW_EXCLUSIVE));
>
>      Assert(gxact != NULL);
> -    proc = &ProcGlobal->allProcs[gxact->pgprocno];
> +    proc = GetPGProcByNumber(gxact->pgprocno);
>
>      /* Initialize the PGPROC entry */
>      MemSet(proc, 0, sizeof(PGPROC));

This set of changes is independent of this commit, isn't it?


> diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
> index ab86e802f21..39171fea06b 100644
> --- a/src/backend/postmaster/auxprocess.c
> +++ b/src/backend/postmaster/auxprocess.c
> @@ -107,17 +107,7 @@ AuxiliaryProcessMain(AuxProcType auxtype)
>
>      BaseInit();
>
> -    /*
> -     * Assign the ProcSignalSlot for an auxiliary process.  Since it doesn't
> -     * have a BackendId, the slot is statically allocated based on the
> -     * auxiliary process type (MyAuxProcType).  Backends use slots indexed in
> -     * the range from 1 to MaxBackends (inclusive), so we use MaxBackends +
> -     * AuxProcType + 1 as the index of the slot for an auxiliary process.
> -     *
> -     * This will need rethinking if we ever want more than one of a particular
> -     * auxiliary process type.
> -     */
> -    ProcSignalInit(MaxBackends + MyAuxProcType + 1);
> +    ProcSignalInit();

Now that we don't need the offset here, we could move ProcSignalInit() into
BaseInit() I think?



> +/*
> + * BackendIdGetProc -- get a backend's PGPROC given its backend ID
> + *
> + * The result may be out of date arbitrarily quickly, so the caller
> + * must be careful about how this information is used.  NULL is
> + * returned if the backend is not active.
> + */
> +PGPROC *
> +BackendIdGetProc(int backendID)
> +{
> +    PGPROC       *result;
> +
> +    if (backendID < 1 || backendID > ProcGlobal->allProcCount)
> +        return NULL;

Hm, isn't calling BackendIdGetProc() with these values a bug? That's not
about being out of date or such.


> +/*
> + * BackendIdGetTransactionIds -- get a backend's transaction status
> + *
> + * Get the xid, xmin, nsubxid and overflow status of the backend.  The
> + * result may be out of date arbitrarily quickly, so the caller must be
> + * careful about how this information is used.
> + */
> +void
> +BackendIdGetTransactionIds(int backendID, TransactionId *xid,
> +                           TransactionId *xmin, int *nsubxid, bool *overflowed)
> +{
> +    PGPROC       *proc;
> +
> +    *xid = InvalidTransactionId;
> +    *xmin = InvalidTransactionId;
> +    *nsubxid = 0;
> +    *overflowed = false;
> +
> +    if (backendID < 1 || backendID > ProcGlobal->allProcCount)
> +        return;
> +    proc = GetPGProcByBackendId(backendID);
> +
> +    /* Need to lock out additions/removals of backends */
> +    LWLockAcquire(ProcArrayLock, LW_SHARED);
> +
> +    if (proc->pid != 0)
> +    {
> +        *xid = proc->xid;
> +        *xmin = proc->xmin;
> +        *nsubxid = proc->subxidStatus.count;
> +        *overflowed = proc->subxidStatus.overflowed;
> +    }
> +
> +    LWLockRelease(ProcArrayLock);
> +}

Hm, I'm not sure about the locking here. For one, previously we weren't
holding ProcArrayLock. For another, holding ProcArrayLock guarantees that the
backend doesn't end its transaction, but it can still assign xids etc.  And,
for that matter, the backendid could have been recycled between the caller
acquiring the backendId and calling BackendIdGetTransactionIds().


> --- a/src/backend/utils/error/elog.c
> +++ b/src/backend/utils/error/elog.c
> @@ -3074,18 +3074,18 @@ log_status_format(StringInfo buf, const char *format, ErrorData *edata)
>                  break;
>              case 'v':
>                  /* keep VXID format in sync with lockfuncs.c */
> -                if (MyProc != NULL && MyProc->backendId != InvalidBackendId)
> +                if (MyProc != NULL)

Doesn't this mean we'll include a vxid in more cases now, particularly
including aux processes? That might be ok, but I also suspect that it'll never
have meaningful values...


> From 94fd46c9ef30ba5e8ac1a8873fce577a4be425f4 Mon Sep 17 00:00:00 2001
> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date: Mon, 29 Jan 2024 22:57:49 +0200
> Subject: [PATCH v8 3/5] Replace 'lastBackend' with an array of in-use slots
>
> Now that the procState array is indexed by pgprocno, the 'lastBackend'
> optimization is useless, because aux processes are assigned PGPROC
> slots and hence have numbers higher than max_connections. So
> 'lastBackend' was always set to almost the end of the array.
>
> To replace that optimization, maintain a dense array of in-use
> indexes. This is redundant with ProcGlobal->procarray, but I was afraid
> of adding any more contention to ProcArrayLock, and this keeps the
> code isolated to sinvaladt.c too.

I think it'd be good to include that explanation and justification in the code
as well.

I suspect we'll need to split out "procarray membership" locking from
ProcArrayLock at some point in some form (vagueness alert). To reduce
contention we already have to hold both ProcArrayLock and XidGenLock when
changing membership, so that holding either of the locks prevents the set of
members from changing. This, kinda and differently, adds yet another lock to that.



> It's not clear if we need that optimization at all. I was able to
> write a test case that became slower without this: set max_connections
> to a very high number (> 5000), and create+truncate a table in the
> same transaction thousands of times to send invalidation messages,
> with fsync=off. That became about 20% slower on my laptop.  Arguably
> that's so unrealistic that it doesn't matter, but nevertheless, this
> commit restores the performance of that.

I think it's unfortunately not that uncommon to be bottlenecked by sinval
performance, so I think it's good that you're addressing it.

Greetings,

Andres Freund

Re: Refactoring backend fork+exec code

From

Heikki Linnakangas

Date:

08 February 2024, 11:19:53

On 07/02/2024 20:25, Andres Freund wrote:
> On 2024-01-30 02:08:36 +0200, Heikki Linnakangas wrote:
>>  From 54f22231bb2540fc5957c14005956161e6fc9dac Mon Sep 17 00:00:00 2001
>> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>> Date: Wed, 24 Jan 2024 23:15:55 +0200
>> Subject: [PATCH v8 1/5] Remove superfluous 'pgprocno' field from PGPROC
>>
>> It was always just the index of the PGPROC entry from the beginning of
>> the proc array. Introduce a macro to compute it from the pointer
>> instead.
> 
> Hm. The pointer math here is a bit more expensive than in some other cases, as
> the struct is fairly large and sizeof(PGPROC) isn't a power of two. Adding
> more math into loops like in TransactionGroupUpdateXidStatus() might end up
> showing up.

I added a MyProcNumber global variable that is set to 
GetNumberFromPGProc(MyProc). I'm not really concerned about the extra 
math, but with MyProcNumber it should definitely not be an issue. The 
few GetNumberFromPGProc() invocations that remain are in less 
performance-critical paths.
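
As a sketch of the difference, with hypothetical Sketch* names and a 
made-up struct layout standing in for PGPROC: the macro computes the 
index by pointer subtraction, which implies a division by the struct 
size, while the cached variable is just a load.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NPROCS 16        /* assumed array size for the example */

typedef struct SketchPGPROC
{
    int         pid;
    char        padding[100];   /* makes sizeof() not a power of two */
} SketchPGPROC;

static SketchPGPROC sketchAllProcs[SKETCH_NPROCS];

static SketchPGPROC *SketchMyProc = NULL;
static int  SketchMyProcNumber = -1;    /* cached index of SketchMyProc */

/* Index from pointer: pointer subtraction divides by sizeof(SketchPGPROC). */
#define SketchGetNumberFromPGProc(proc) \
    ((int) ((proc) - &sketchAllProcs[0]))

/* Pointer from index. */
#define SketchGetPGProcByNumber(n) (&sketchAllProcs[(n)])

/* Compute the index once at process start and cache it. */
static void
sketch_init_process(int slot)
{
    assert(slot >= 0 && slot < SKETCH_NPROCS);
    SketchMyProc = SketchGetPGProcByNumber(slot);
    SketchMyProcNumber = SketchGetNumberFromPGProc(SketchMyProc);
}
```

Hot loops can then read SketchMyProcNumber directly instead of redoing 
the subtraction each time.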

(In a later patch, I switch backend ids to 0-based indexing, which 
replaces MyProcNumber references with MyBackendId)

> Is this really related to the rest of the series?

It's not strictly necessary, but it felt prudent to remove it now, since 
I'm removing the backendID field too.

>> You can now convert a backend ID into an index into the PGPROC array
>> simply by subtracting 1. We still use 0-based "pgprocnos" in various
>> places, for indexes into the PGPROC array, but the only difference now
>> is that backend IDs start at 1 while pgprocnos start at 0.
> 
> Why aren't we using 0-based indexing for both? InvalidBackendId is -1, so
> there'd not be a conflict, right?

Correct. I was being conservative and didn't dare to change the old 
convention. The backend ids are visible in a few places like "pg_temp_0" 
schema names, and pg_stat_get_*() functions.

One alternative would be to reserve and waste allProcs[0]. Then pgprocno 
and backend ID could both be direct indexes to the array, but 0 would 
not be used.

If we switch to 0-based indexing, it raises the question: why don't we 
merge the concepts of "pgprocno" and "BackendId" completely and call it 
the same thing everywhere? It probably would be best in the long run. It 
feels like a lot of churn though.

Anyway, I switched to 0-based indexing in the attached new version, to 
see what it looks like.

>> @@ -457,7 +442,7 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid,
>>       Assert(LWLockHeldByMeInMode(TwoPhaseStateLock, LW_EXCLUSIVE));
>>
>>       Assert(gxact != NULL);
>> -    proc = &ProcGlobal->allProcs[gxact->pgprocno];
>> +    proc = GetPGProcByNumber(gxact->pgprocno);
>>
>>       /* Initialize the PGPROC entry */
>>       MemSet(proc, 0, sizeof(PGPROC));
> 
> This set of changes is independent of this commit, isn't it?

Yes. It's just for symmetry, now that we use GetNumberFromPGProc() to 
get the pgprocno.

>> diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c
>> index ab86e802f21..39171fea06b 100644
>> --- a/src/backend/postmaster/auxprocess.c
>> +++ b/src/backend/postmaster/auxprocess.c
>> @@ -107,17 +107,7 @@ AuxiliaryProcessMain(AuxProcType auxtype)
>>
>>       BaseInit();
>>
>> -    /*
>> -     * Assign the ProcSignalSlot for an auxiliary process.  Since it doesn't
>> -     * have a BackendId, the slot is statically allocated based on the
>> -     * auxiliary process type (MyAuxProcType).  Backends use slots indexed in
>> -     * the range from 1 to MaxBackends (inclusive), so we use MaxBackends +
>> -     * AuxProcType + 1 as the index of the slot for an auxiliary process.
>> -     *
>> -     * This will need rethinking if we ever want more than one of a particular
>> -     * auxiliary process type.
>> -     */
>> -    ProcSignalInit(MaxBackends + MyAuxProcType + 1);
>> +    ProcSignalInit();
> 
> Now that we don't need the offset here, we could move ProcSignalInit() into
> BaseInit() I think?

Hmm, doesn't feel right to me. BaseInit() is mostly concerned with 
setting up backend-private structures, and it's also called for a 
standalone backend.

I feel the process initialization codepaths could use some cleanup in 
general. Not sure what exactly.

>> +/*
>> + * BackendIdGetProc -- get a backend's PGPROC given its backend ID
>> + *
>> + * The result may be out of date arbitrarily quickly, so the caller
>> + * must be careful about how this information is used.  NULL is
>> + * returned if the backend is not active.
>> + */
>> +PGPROC *
>> +BackendIdGetProc(int backendID)
>> +{
>> +    PGPROC       *result;
>> +
>> +    if (backendID < 1 || backendID > ProcGlobal->allProcCount)
>> +        return NULL;
> 
> Hm, isn't calling BackendIdGetProc() with these values a bug? That's not
> about being out of date or such.

Perhaps. I just followed the example of the old implementation, which 
also returns NULL on bogus inputs.

>> +/*
>> + * BackendIdGetTransactionIds -- get a backend's transaction status
>> + *
>> + * Get the xid, xmin, nsubxid and overflow status of the backend.  The
>> + * result may be out of date arbitrarily quickly, so the caller must be
>> + * careful about how this information is used.
>> + */
>> +void
>> +BackendIdGetTransactionIds(int backendID, TransactionId *xid,
>> +                           TransactionId *xmin, int *nsubxid, bool *overflowed)
>> +{
>> +    PGPROC       *proc;
>> +
>> +    *xid = InvalidTransactionId;
>> +    *xmin = InvalidTransactionId;
>> +    *nsubxid = 0;
>> +    *overflowed = false;
>> +
>> +    if (backendID < 1 || backendID > ProcGlobal->allProcCount)
>> +        return;
>> +    proc = GetPGProcByBackendId(backendID);
>> +
>> +    /* Need to lock out additions/removals of backends */
>> +    LWLockAcquire(ProcArrayLock, LW_SHARED);
>> +
>> +    if (proc->pid != 0)
>> +    {
>> +        *xid = proc->xid;
>> +        *xmin = proc->xmin;
>> +        *nsubxid = proc->subxidStatus.count;
>> +        *overflowed = proc->subxidStatus.overflowed;
>> +    }
>> +
>> +    LWLockRelease(ProcArrayLock);
>> +}
> 
> Hm, I'm not sure about the locking here. For one, previously we weren't
> holding ProcArrayLock. For another, holding ProcArrayLock guarantees that the
> backend doesn't end its transaction, but it can still assign xids etc.  And,
> for that matter, the backendid could have been recycled between the caller
> acquiring the backendId and calling BackendIdGetTransactionIds().

Yeah, the returned values could be out-of-date and even inconsistent 
with each other. I just faithfully copied the old implementation.

Perhaps this should just skip the ProcArrayLock altogether.

>> --- a/src/backend/utils/error/elog.c
>> +++ b/src/backend/utils/error/elog.c
>> @@ -3074,18 +3074,18 @@ log_status_format(StringInfo buf, const char *format, ErrorData *edata)
>>                   break;
>>               case 'v':
>>                   /* keep VXID format in sync with lockfuncs.c */
>> -                if (MyProc != NULL && MyProc->backendId != InvalidBackendId)
>> +                if (MyProc != NULL)
> 
> Doesn't this mean we'll include a vxid in more cases now, particularly
> including aux processes? That might be ok, but I also suspect that it'll never
> have meaningful values...

Fixed. (I thought I changed that back already in the last patch version, 
but apparently I only did it in jsonlog.c)

>>  From 94fd46c9ef30ba5e8ac1a8873fce577a4be425f4 Mon Sep 17 00:00:00 2001
>> From: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>> Date: Mon, 29 Jan 2024 22:57:49 +0200
>> Subject: [PATCH v8 3/5] Replace 'lastBackend' with an array of in-use slots
>>
>> Now that the procState array is indexed by pgprocno, the 'lastBackend'
>> optimization is useless, because aux processes are assigned PGPROC
>> slots and hence have numbers higher than max_connection. So
>> 'lastBackend' was always set to almost the end of the array.
>>
>> To replace that optimization, mantain a dense array of in-use
>> indexes. This's redundant with ProgGlobal->procarray, but I was afraid
>> of adding any more contention to ProcArrayLock, and this keeps the
>> code isolated to sinvaladt.c too.
> 
> I think it'd be good to include that explanation and justification in the code
> as well.

Added a comment.


Attached is a new version of these BackendId changes. I kept it as three 
separate patches to highlight the changes from switching to 0-based 
indexing, but I think they should be squashed together before pushing.

I think the last remaining question here is about the 0- vs 1-based 
indexing of BackendIds. Is it a good idea to switch to 0-based indexing? 
And if we do it, should we reserve PGPROC 0? I'm on the fence on this one.

And if we switch to 0-based indexing, should we do a more comprehensive 
search & replace of "pgprocno" to "backendId", or something like that? 
My vote is no; the code churn doesn't feel worth it. And it can also be 
done separately later if we want to.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

On 15/02/2024 07:09, Robert Haas wrote:
> On Thu, Feb 15, 2024 at 3:07 AM Andres Freund <andres@anarazel.de> wrote:
>>> I think the last remaining question here is about the 0- vs 1-based indexing
>>> of BackendIds. Is it a good idea to switch to 0-based indexing? And if we do
>>> it, should we reserve PGPROC 0. I'm on the fence on this one.
>>
>> I lean towards it being a good idea. Having two internal indexing schemes was
>> bad enough so far, but at least one would fairly quickly notice if one used
>> the wrong one. If they're just offset by 1, it might end up taking longer,
>> because that'll often also be a valid id.
> 
> Yeah, I think making everything 0-based is probably the best way
> forward long term. It might require more cleanup work to get there,
> but it's just a lot simpler in the end, IMHO.

Here's another patch version that does that. Yeah, I agree it's nicer in 
the end.

I'm pretty happy with this now. I'll read through these patches myself 
again after sleeping on it and try to get this committed by the end of 
the week, but another pair of eyes wouldn't hurt.

On 14/02/2024 23:37, Andres Freund wrote:
>>   void
>> -ProcSignalInit(int pss_idx)
>> +ProcSignalInit(void)
>>   {
>>       ProcSignalSlot *slot;
>>       uint64        barrier_generation;
>>
>> -    Assert(pss_idx >= 1 && pss_idx <= NumProcSignalSlots);
>> -
>> -    slot = &ProcSignal->psh_slot[pss_idx - 1];
>> +    if (MyBackendId <= 0)
>> +        elog(ERROR, "MyBackendId not set");
>> +    if (MyBackendId > NumProcSignalSlots)
>> +        elog(ERROR, "unexpected MyBackendId %d in ProcSignalInit (max %d)", MyBackendId, NumProcSignalSlots);
>> +    slot = &ProcSignal->psh_slot[MyBackendId - 1];
>>
>>       /* sanity check */
>>       if (slot->pss_pid != 0)
>>           elog(LOG, "process %d taking over ProcSignal slot %d, but it's not empty",
>> -             MyProcPid, pss_idx);
>> +             MyProcPid, (int) (slot - ProcSignal->psh_slot));
> 
> Hm, why not use MyBackendId - 1 as above? Am I missing something?

You're right, fixed.

>>   /*
>> @@ -212,11 +211,7 @@ ProcSignalInit(int pss_idx)
>>   static void
>>   CleanupProcSignalState(int status, Datum arg)
>>   {
>> -    int            pss_idx = DatumGetInt32(arg);
>> -    ProcSignalSlot *slot;
>> -
>> -    slot = &ProcSignal->psh_slot[pss_idx - 1];
>> -    Assert(slot == MyProcSignalSlot);
>> +    ProcSignalSlot *slot = MyProcSignalSlot;
> 
> Maybe worth asserting that MyProcSignalSlot isn't NULL? Previously that was
> checked via the assertion above.

Added.

>> +            if (i != segP->numProcs - 1)
>> +                segP->pgprocnos[i] = segP->pgprocnos[segP->numProcs - 1];
>> +            break;
> 
> Hm. This means the list will be out-of-order more and more over time, leading
> to less cache efficient access patterns. Perhaps we should keep this sorted,
> like we do for ProcGlobal->xids etc?

Perhaps, although these are accessed much less frequently so I'm not 
convinced it's worth the trouble.

I haven't found a good performance test case where shared cache 
invalidation would show up. Would you happen to have one?
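
For readers following the trade-off discussed above, here is a minimal, 
self-contained sketch of removing an entry from a dense array by moving 
the last entry into the freed position. The type and function names 
(DenseSlotList, dense_add, dense_remove) and the capacity are made up 
for illustration and are not taken from the patch; only the 
"move the last element into the hole" step mirrors the quoted hunk.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLOTS 8				/* hypothetical capacity for this sketch */

typedef struct
{
	int			slots[MAX_SLOTS];	/* dense list of in-use slot numbers */
	int			nslots;				/* number of valid entries in slots[] */
} DenseSlotList;

/* Append a slot number; returns 0 on success, -1 if the list is full. */
int
dense_add(DenseSlotList *list, int slotno)
{
	if (list->nslots >= MAX_SLOTS)
		return -1;
	list->slots[list->nslots++] = slotno;
	return 0;
}

/*
 * Remove a slot number by moving the last entry into its place.
 * Finding the entry is a linear scan; the removal itself is O(1).
 * The relative order of the remaining entries is not preserved.
 * Returns 0 on success, -1 if the slot number was not in the list.
 */
int
dense_remove(DenseSlotList *list, int slotno)
{
	for (int i = 0; i < list->nslots; i++)
	{
		if (list->slots[i] == slotno)
		{
			if (i != list->nslots - 1)
				list->slots[i] = list->slots[list->nslots - 1];
			list->nslots--;
			return 0;
		}
	}
	return -1;
}
```

Keeping the array sorted instead would require shifting the tail down 
on every removal, which is the extra cost being weighed against the 
less cache-friendly access order.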

>> @@ -183,6 +175,8 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS)
>>           PG_RETURN_BOOL(false);
>>       }
>>
>> +    if (proc != NULL)
>> +        backendId = GetBackendIdFromPGProc(proc);
> 
> How can proc be NULL here?

Fixed.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Attachment

Re: Refactoring backend fork+exec code

From

Heikki Linnakangas

Date:

03 March 2024, 17:40:32

On 22/02/2024 02:37, Heikki Linnakangas wrote:
> On 15/02/2024 07:09, Robert Haas wrote:
>> On Thu, Feb 15, 2024 at 3:07 AM Andres Freund <andres@anarazel.de> wrote:
>>>> I think the last remaining question here is about the 0- vs 1-based indexing
>>>> of BackendIds. Is it a good idea to switch to 0-based indexing? And if we do
>>>> it, should we reserve PGPROC 0. I'm on the fence on this one.
>>>
>>> I lean towards it being a good idea. Having two internal indexing schemes was
>>> bad enough so far, but at least one would fairly quickly notice if one used
>>> the wrong one. If they're just offset by 1, it might end up taking longer,
>>> because that'll often also be a valid id.
>>
>> Yeah, I think making everything 0-based is probably the best way
>> forward long term. It might require more cleanup work to get there,
>> but it's just a lot simpler in the end, IMHO.
> 
> Here's another patch version that does that. Yeah, I agree it's nicer in
> the end.
> 
> I'm pretty happy with this now. I'll read through these patches myself
> again after sleeping over it and try to get this committed by the end of
> the week, but another pair of eyes wouldn't hurt.

And pushed. Thanks for the reviews!

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Refactoring backend fork+exec code

From

Heikki Linnakangas

Date:

04 March 2024, 09:05:08

I've now completed many of the side-quests, here are the patches that 
remain.

The first three patches form a logical unit. They move the 
initialization of the Port struct from postmaster to the backend 
process. Currently, that work is split between the postmaster and the 
backend process so that postmaster fills in the socket and some other 
fields, and the backend process fills the rest after reading the startup 
packet. With these patches, there is a new much smaller ClientSocket 
struct that is passed from the postmaster to the child process, which 
contains just the fields that postmaster initializes. The Port struct is 
allocated in the child process. That makes the backend startup easier to 
understand. I plan to commit those three patches next if there are no 
objections.

That leaves the rest of the patches. I think they're in pretty good 
shape too, and I've gotten some review on those earlier and have 
addressed the comments I got so far, but would still appreciate another 
round of review.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Attachment

Re: Refactoring backend fork+exec code

From

Richard Guo

Date:

05 March 2024, 09:44:37

On Mon, Mar 4, 2024 at 1:40 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:

> On 22/02/2024 02:37, Heikki Linnakangas wrote:
>> Here's another patch version that does that. Yeah, I agree it's nicer in
>> the end.
>>
>> I'm pretty happy with this now. I'll read through these patches myself
>> again after sleeping over it and try to get this committed by the end of
>> the week, but another pair of eyes wouldn't hurt.
>
> And pushed. Thanks for the reviews!

I noticed that there are still three places in backend_status.c where
pgstat_get_beentry_by_backend_id() is referenced. I think we should
replace them with pgstat_get_beentry_by_proc_number().

Thanks
Richard

Re: Refactoring backend fork+exec code

From

Heikki Linnakangas

Date:

05 March 2024, 16:31:31

On 05/03/2024 11:44, Richard Guo wrote:
> I noticed that there are still three places in backend_status.c where
> pgstat_get_beentry_by_backend_id() is referenced.  I think we should
> replace them with pgstat_get_beentry_by_proc_number().

Fixed, thanks!

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Refactoring backend fork+exec code

From

"Tristan Partin"

Date:

05 March 2024, 23:02:55

On Mon Mar 4, 2024 at 3:05 AM CST, Heikki Linnakangas wrote:
> I've now completed many of the side-quests, here are the patches that
> remain.
>
> The first three patches form a logical unit. They move the
> initialization of the Port struct from postmaster to the backend
> process. Currently, that work is split between the postmaster and the
> backend process so that postmaster fills in the socket and some other
> fields, and the backend process fills the rest after reading the startup
> packet. With these patches, there is a new much smaller ClientSocket
> struct that is passed from the postmaster to the child process, which
> contains just the fields that postmaster initializes. The Port struct is
> allocated in the child process. That makes the backend startup easier to
> understand. I plan to commit those three patches next if there are no
> objections.
>
> That leaves the rest of the patches. I think they're in pretty good
> shape too, and I've gotten some review on those earlier and have
> addressed the comments I got so far, but would still appreciate another
> round of review.

> -         * *MyProcPort, because ConnCreate() allocated that space with malloc()
> -         * ... else we'd need to copy the Port data first.  Also, subsidiary data
> -         * such as the username isn't lost either; see ProcessStartupPacket().
> +         * *MyProcPort, because that space is allocated in stack ... else we'd
> +         * need to copy the Port data first.  Also, subsidiary data such as the
> +         * username isn't lost either; see ProcessStartupPacket().

s/allocated in/allocated on the

The first 3 patches seem good to go, in my opinion.

> @@ -225,14 +331,13 @@ internal_forkexec(int argc, char *argv[], ClientSocket *client_sock, BackgroundW
>                  return -1;
>          }
>
> -        /* Make sure caller set up argv properly */
> -        Assert(argc >= 3);
> -        Assert(argv[argc] == NULL);
> -        Assert(strncmp(argv[1], "--fork", 6) == 0);
> -        Assert(argv[2] == NULL);
> -
> -        /* Insert temp file name after --fork argument */
> +        /* set up argv properly */
> +        argv[0] = "postgres";
> +        snprintf(forkav, MAXPGPATH, "--forkchild=%s", child_kind);
> +        argv[1] = forkav;
> +        /* Insert temp file name after --forkchild argument */
>          argv[2] = tmpfilename;
> +        argv[3] = NULL;

Should we use postgres_exec_path instead of the naked "postgres" here?

> +                /* in postmaster, fork failed ... */
> +                ereport(LOG,
> +                                (errmsg("could not fork worker process: %m")));
> +                /* undo what assign_backendlist_entry did */
> +                ReleasePostmasterChildSlot(rw->rw_child_slot);
> +                rw->rw_child_slot = 0;
> +                pfree(rw->rw_backend);
> +                rw->rw_backend = NULL;
> +                /* mark entry as crashed, so we'll try again later */
> +                rw->rw_crashed_at = GetCurrentTimestamp();
> +                return false;

I think the error message should include the word "background." It would
be more consistent with the log message above it.

> +typedef struct
> +{
> +        int                        syslogFile;
> +        int                        csvlogFile;
> +        int                        jsonlogFile;
> +} syslogger_startup_data;

It would be nice if all of these startup data structs were named
similarly. For instance, a previous one was BackendStartupInfo. It would
help with greppability.

I noticed there were a few XXX comments left that you created. I'll
highlight them here for more visibility.

> +/* XXX: where does this belong? */
> +extern bool LoadedSSL;

Perhaps near the My* variables or maybe in the Port struct?

> +#ifdef EXEC_BACKEND
> +
> +        /*
> +         * Need to reinitialize the SSL library in the backend, since the context
> +         * structures contain function pointers and cannot be passed through the
> +         * parameter file.
> +         *
> +         * If for some reason reload fails (maybe the user installed broken key
> +         * files), soldier on without SSL; that's better than all connections
> +         * becoming impossible.
> +         *
> +         * XXX should we do this in all child processes?  For the moment it's
> +         * enough to do it in backend children. XXX good question indeed
> +         */
> +#ifdef USE_SSL
> +        if (EnableSSL)
> +        {
> +                if (secure_initialize(false) == 0)
> +                        LoadedSSL = true;
> +                else
> +                        ereport(LOG,
> +                                        (errmsg("SSL configuration could not be loaded in child process")));
> +        }
> +#endif
> +#endif

Here you added the "good question indeed." I am not sure what the best
answer is either! :)

> +                /* XXX: translation? */
> +                ereport(LOG,
> +                                (errmsg("could not fork %s process: %m", PostmasterChildName(type))));

I assume you are referring to the child name here?

> XXX: We now have functions called AuxiliaryProcessInit() and
> InitAuxiliaryProcess(). Confusing.

Based on my analysis, the *Init() is called in the Main functions, while
Init*() is called before the Main functions. Maybe
AuxiliaryProcessInit() could be renamed to AuxiliaryProcessStartup(), and
InitAuxiliaryProcess() could then be renamed to AuxiliaryProcessInit()?

--
Tristan Partin
Neon (https://neon.tech)

Re: Refactoring backend fork+exec code

From

Heikki Linnakangas

Date:

13 March 2024, 07:30:27

On 06/03/2024 01:02, Tristan Partin wrote:
> The first 3 patches seem good to go, in my opinion.

Committed these first patches, with a few more changes. Notably, I 
realized that we should move the logic that I originally put in the new 
InitClientConnection function to the existing pq_init() function. It 
serves the same purpose: initialization of the socket in the child 
process. Thanks for the review!

>> @@ -225,14 +331,13 @@ internal_forkexec(int argc, char *argv[], ClientSocket *client_sock, BackgroundW
>>                   return -1;
>>           }
>>
>> -        /* Make sure caller set up argv properly */
>> -        Assert(argc >= 3);
>> -        Assert(argv[argc] == NULL);
>> -        Assert(strncmp(argv[1], "--fork", 6) == 0);
>> -        Assert(argv[2] == NULL);
>> -
>> -        /* Insert temp file name after --fork argument */
>> +        /* set up argv properly */
>> +        argv[0] = "postgres";
>> +        snprintf(forkav, MAXPGPATH, "--forkchild=%s", child_kind);
>> +        argv[1] = forkav;
>> +        /* Insert temp file name after --forkchild argument */
>>           argv[2] = tmpfilename;
>> +        argv[3] = NULL;
> 
> Should we use postgres_exec_path instead of the naked "postgres" here?

I don't know, but it's the same as on 'master' currently. The code just 
got moved around.
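
As a rough illustration of the argv setup in the quoted hunk, here is a 
standalone sketch. The helper name build_child_argv and its signature 
are hypothetical and exist only for this example; the real code fills 
argv inline in internal_forkexec(). The sketch also checks the 
snprintf() result for truncation, which the quoted hunk does not show.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Fill a four-element argv for a child process (hypothetical helper).
 * forkav is caller-provided storage for the "--forkchild=<kind>" string.
 * Returns 0 on success, -1 if the string would not fit in forkav.
 */
int
build_child_argv(const char *argv[4], char *forkav, size_t forkav_len,
				 const char *child_kind, const char *tmpfilename)
{
	int			n = snprintf(forkav, forkav_len, "--forkchild=%s", child_kind);

	if (n < 0 || (size_t) n >= forkav_len)
		return -1;				/* output error or truncation */

	argv[0] = "postgres";
	argv[1] = forkav;
	/* the temp file name goes right after the --forkchild argument */
	argv[2] = tmpfilename;
	argv[3] = NULL;
	return 0;
}
```

Note that argv[0] here is only the conventional program name seen by the 
child; which executable actually gets run is decided separately by the 
path passed to the exec call.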

>> +                /* in postmaster, fork failed ... */
>> +                ereport(LOG,
>> +                                (errmsg("could not fork worker process: %m")));
>> +                /* undo what assign_backendlist_entry did */
>> +                ReleasePostmasterChildSlot(rw->rw_child_slot);
>> +                rw->rw_child_slot = 0;
>> +                pfree(rw->rw_backend);
>> +                rw->rw_backend = NULL;
>> +                /* mark entry as crashed, so we'll try again later */
>> +                rw->rw_crashed_at = GetCurrentTimestamp();
>> +                return false;
> 
> I think the error message should include the word "background." It would
> be more consistent with the log message above it.

This is also a pre-existing message I just moved around. But yeah, I 
agree, so changed.

>> +typedef struct
>> +{
>> +        int                        syslogFile;
>> +        int                        csvlogFile;
>> +        int                        jsonlogFile;
>> +} syslogger_startup_data;
> 
> It would be nice if all of these startup data structs were named
> similarly. For instance, a previous one was BackendStartupInfo. It would
> help with greppability.

Renamed them to SysloggerStartupData and BackendStartupData. Background 
worker startup still passes a struct called BackgroundWorker, however. I 
left that as it is, because the struct is used for other purposes too.

> I noticed there were a few XXX comments left that you created. I'll
> highlight them here for more visibility.
> 
>> +/* XXX: where does this belong? */
>> +extern bool LoadedSSL;
> 
> Perhaps near the My* variables or maybe in the Port struct?

It is valid in the postmaster, too, though. The My* variables and Port 
struct only make sense in the child process.

I think this is the best place after all, so I just removed the XXX comment.

>> +#ifdef EXEC_BACKEND
>> +
>> +        /*
>> +         * Need to reinitialize the SSL library in the backend, since the context
>> +         * structures contain function pointers and cannot be passed through the
>> +         * parameter file.
>> +         *
>> +         * If for some reason reload fails (maybe the user installed broken key
>> +         * files), soldier on without SSL; that's better than all connections
>> +         * becoming impossible.
>> +         *
>> +         * XXX should we do this in all child processes?  For the moment it's
>> +         * enough to do it in backend children. XXX good question indeed
>> +         */
>> +#ifdef USE_SSL
>> +        if (EnableSSL)
>> +        {
>> +                if (secure_initialize(false) == 0)
>> +                        LoadedSSL = true;
>> +                else
>> +                        ereport(LOG,
>> +                                        (errmsg("SSL configuration could not be loaded in child process")));
>> +        }
>> +#endif
>> +#endif
> 
> Here you added the "good question indeed." I am not sure what the best
> answer is either! :)

I just removed the extra XXX comment. It's still a valid question, but 
this patch just moves it around, so we don't need to answer it here.

>> +                /* XXX: translation? */
>> +                ereport(LOG,
>> +                                (errmsg("could not fork %s process: %m", PostmasterChildName(type))));
> 
> I assume you are referring to the child name here?

Correct. Does the process name need to be translated? And this way of 
constructing sentences is not translation-friendly anyway. In some 
languages, the word 'process' might need to be inflected differently 
depending on the child name, for example.

I put the process name in quotes, and didn't mark the process name for 
translation.

>> XXX: We now have functions called AuxiliaryProcessInit() and
>> InitAuxiliaryProcess(). Confusing.
> 
> Based on my analysis, the *Init() is called in the Main functions, while
> Init*() is called before the Main functions. Maybe
> AuxiliaryProcessInit() could be renamed to AuxiliaryProcessStartup()?
> Rename the other to AuxiliaryProcessInit().

Hmm. There's also BackendStartup() function in postmaster.c, which is 
very different: it runs in the postmaster process and launches the 
backend process. So the Startup suffix is not great either.

I renamed AuxiliaryProcessInit() to AuxiliaryProcessMainCommon(). As in 
"the common parts of the main functions of all the aux processes".

(We should perhaps merge InitProcess() and InitAuxiliaryProcess() into 
one function. There's a lot of duplicated code between them. And the 
parts that differ should perhaps be refactored to be more similar 
anyway. I don't want to take on that refactoring right now though.)

Attached is a new version of the remaining patches.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

On Wed, 20 Mar 2024 at 08:16, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Yeah, it's not a very valuable assertion. Removed, thanks!

How about we add it as a static assert instead of removing it, like we
have for many other similar arrays?

Attachment

v1-0001-Add-child_process_kinds-static-assert.patch
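
The attached patch is not reproduced here. As a generic sketch of the 
technique being suggested, the following uses plain C11 _Static_assert 
to tie an array's length to an enum. The enum, array, and function names 
are invented for this example, and lengthof is defined locally rather 
than taken from PostgreSQL's headers.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enum; the last member counts the real kinds. */
typedef enum
{
	KIND_BACKEND,
	KIND_BGWORKER,
	KIND_SYSLOGGER,
	NUM_CHILD_KINDS
} ChildKind;

/* One name per kind, in enum order. */
static const char *const child_kind_names[] = {
	"backend",
	"bgworker",
	"syslogger",
};

#define lengthof(array) (sizeof(array) / sizeof((array)[0]))

/*
 * Compile-time check: adding an enum member without adding a name
 * (or the reverse) makes the build fail instead of reading past the
 * end of the array at run time.
 */
_Static_assert(lengthof(child_kind_names) == NUM_CHILD_KINDS,
			   "child_kind_names must have one entry per ChildKind");

const char *
child_kind_name(ChildKind kind)
{
	return child_kind_names[kind];
}
```

Unlike a run-time Assert, this costs nothing at execution time and fires 
in every build, not only in assert-enabled ones.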

Re: Refactoring backend fork+exec code

From

"Anton A. Melnikov"

Date:

27 April 2024, 08:27:01

Hello!

Maybe add PGDLLIMPORT to
extern bool LoadedSSL;
and
extern struct ClientSocket *MyClientSocket;
definitions in the src/include/postmaster/postmaster.h ?

With the best regards,

-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Refactoring backend fork+exec code

From

Heikki Linnakangas

Date:

28 April 2024, 19:36:33

On 27/04/2024 11:27, Anton A. Melnikov wrote:
> Hello!
> 
> Maybe add PGDLLIMPORT to
> extern bool LoadedSSL;
> and
> extern struct ClientSocket *MyClientSocket;
> definitions in the src/include/postmaster/postmaster.h ?

Peter E noticed and Michael fixed them in commit 768ceeeaa1 already.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Refactoring backend fork+exec code

From

"Anton A. Melnikov"

Date:

01 May 2024, 13:32:24

On 28.04.2024 22:36, Heikki Linnakangas wrote:
> Peter E noticed and Michael fixed them in commit 768ceeeaa1 already.

I didn't check that this was already fixed in current master. Sorry!
Thanks for pointing this out!

With the best wishes,

-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Refactoring backend fork+exec code

From

Thomas Munro

Date:

18 May 2024, 05:24:45

On Mon, Mar 18, 2024 at 10:41 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Committed, with some final cosmetic cleanups. Thanks everyone!

Nitpicking from UBSan with EXEC_BACKEND on Linux (line numbers may be
a bit off, from a branch of mine):

../src/backend/postmaster/launch_backend.c:772:2: runtime error: null
pointer passed as argument 2, which is declared to never be null
==13303==Using libbacktrace symbolizer.
    #0 0x5555564b0202 in save_backend_variables
../src/backend/postmaster/launch_backend.c:772
    #1 0x5555564b0242 in internal_forkexec
../src/backend/postmaster/launch_backend.c:311
    #2 0x5555564b0bdd in postmaster_child_launch
../src/backend/postmaster/launch_backend.c:244
    #3 0x5555564b3121 in StartChildProcess
../src/backend/postmaster/postmaster.c:3928
    #4 0x5555564b933a in PostmasterMain
../src/backend/postmaster/postmaster.c:1357
    #5 0x5555562de4ad in main ../src/backend/main/main.c:197
    #6 0x7ffff667ad09 in __libc_start_main
(/lib/x86_64-linux-gnu/libc.so.6+0x23d09)
    #7 0x555555e34279 in _start
(/tmp/cirrus-ci-build/build/tmp_install/usr/local/pgsql/bin/postgres+0x8e0279)

This silences it:

-       memcpy(param->startup_data, startup_data, startup_data_len);
+       if (startup_data_len > 0)
+               memcpy(param->startup_data, startup_data, startup_data_len);

(I found that out by testing EXEC_BACKEND on CI.  I also learned that
the Mac and FreeBSD tasks fail with EXEC_BACKEND because of SysV shmem
bleating.  We probably should go and crank up the relevant sysctls in
the .cirrus.tasks.yml...)
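
For context on why the guard is needed: the C standard requires the 
pointer arguments of memcpy() to be valid even when the length is zero, 
so passing NULL with a length of 0 is what UBSan flags. A standalone 
sketch of the guarded copy follows; ChildParams and make_child_params 
are hypothetical stand-ins for this example, not the structures used in 
launch_backend.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parameter block with a variable-length payload. */
typedef struct
{
	size_t		startup_data_len;
	char		startup_data[];	/* flexible array member */
} ChildParams;

/*
 * Allocate a parameter block and copy the caller's startup data into it.
 * startup_data may be NULL when startup_data_len is 0.
 * Returns NULL on allocation failure; the caller frees the result.
 */
ChildParams *
make_child_params(const void *startup_data, size_t startup_data_len)
{
	ChildParams *param = malloc(sizeof(ChildParams) + startup_data_len);

	if (param == NULL)
		return NULL;

	param->startup_data_len = startup_data_len;

	/* Skip the call entirely for empty data, so NULL never reaches memcpy */
	if (startup_data_len > 0)
		memcpy(param->startup_data, startup_data, startup_data_len);

	return param;
}
```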

Re: Refactoring backend fork+exec code

From

Nathan Bossart

Date:

17 June 2024, 18:36:00

While looking into [0], I noticed that main() still only checks for the
--fork prefix, but IIUC commit aafc05d removed all --fork* options except
for --forkchild.  I've attached a patch to strengthen the check in main().
This is definitely just a nitpick.

[0] https://postgr.es/m/CAKAnmmJkZtZAiSryho%3DgYpbvC7H-HNjEDAh16F3SoC9LPu8rqQ%40mail.gmail.com

-- 
nathan

Attachment

forkchild_check.patch
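
The attached patch is not reproduced here. As a hedged sketch of what a 
stricter check could look like, the function below matches the full 
"--forkchild" prefix instead of just "--fork". The function name 
is_forkchild_arg is an assumption made for this example and is not 
claimed to match the patch or main.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if the first argument starts with "--forkchild".
 * A looser check for "--fork" would also accept option names that
 * no longer exist.
 */
bool
is_forkchild_arg(int argc, char *argv[])
{
	static const char prefix[] = "--forkchild";

	/* sizeof(prefix) - 1 is the prefix length without the terminator */
	return argc > 1 && strncmp(argv[1], prefix, sizeof(prefix) - 1) == 0;
}
```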

Re: Refactoring backend fork+exec code

From

Heikki Linnakangas

Date:

03 July 2024, 13:25:18

On 18/05/2024 08:24, Thomas Munro wrote:
> Nitpicking from UBSan with EXEC_BACKEND on Linux (line numbers may be
> a bit off, from a branch of mine):
> 
> ../src/backend/postmaster/launch_backend.c:772:2: runtime error: null
> pointer passed as argument 2, which is declared to never be null
> ==13303==Using libbacktrace symbolizer.
>      #0 0x5555564b0202 in save_backend_variables
> ../src/backend/postmaster/launch_backend.c:772
>      #1 0x5555564b0242 in internal_forkexec
> ../src/backend/postmaster/launch_backend.c:311
>      #2 0x5555564b0bdd in postmaster_child_launch
> ../src/backend/postmaster/launch_backend.c:244
>      #3 0x5555564b3121 in StartChildProcess
> ../src/backend/postmaster/postmaster.c:3928
>      #4 0x5555564b933a in PostmasterMain
> ../src/backend/postmaster/postmaster.c:1357
>      #5 0x5555562de4ad in main ../src/backend/main/main.c:197
>      #6 0x7ffff667ad09 in __libc_start_main
> (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)
>      #7 0x555555e34279 in _start
> (/tmp/cirrus-ci-build/build/tmp_install/usr/local/pgsql/bin/postgres+0x8e0279)
> 
> This silences it:
> 
> -       memcpy(param->startup_data, startup_data, startup_data_len);
> +       if (startup_data_len > 0)
> +               memcpy(param->startup_data, startup_data, startup_data_len);

Fixed, thanks!

On 17/06/2024 21:36, Nathan Bossart wrote:
> While looking into [0], I noticed that main() still only checks for the
> --fork prefix, but IIUC commit aafc05d removed all --fork* options except
> for --forkchild.  I've attached a patch to strengthen the check in main().
> This is definitely just a nitpick.

Fixed this too, thanks!

-- 
Heikki Linnakangas
Neon (https://neon.tech)