Thread: Refactoring backend fork+exec code
I started to look at the code in postmaster.c related to launching child processes. I tried to reduce the difference between EXEC_BACKEND and !EXEC_BACKEND code paths, and put the code that needs to differ behind a better abstraction. I started doing this to help with implementing multi-threading, but it doesn't introduce anything thread-related yet and I think this improves readability anyway. This is still work-inprogress, especially the last, big, patch in the patch set. Mainly, I need to clean up the comments in the new launch_backend.c file. But the other patches are in pretty good shape, and if you ignore launch_backend.c, you can see the effect on the other source files. With these patches, there is a new function for launching a postmaster child process: pid_t postmaster_child_launch(PostmasterChildType child_type, char *startup_data, size_t startup_data_len, ClientSocket *client_sock); This function hides the differences between EXEC_BACKEND and !EXEC_BACKEND cases. In 'startup_data', the caller can pass a blob of data to the child process, with different meaning for different kinds of child processes. For a backend process, for example, it's used to pass the CAC_state, which indicates whether the backend accepts the connection or just sends "too many clients" error. And for background workers, it's used to pass the BackgroundWorker struct. The startup data is passed to the child process in the ClientSocket is a new struct holds a socket FD, and the local and remote address info. Before this patch set, postmaster initializes the Port structs but only fills in those fields in it. With this patch set, we have a new ClientSocket struct just for those fields, which makes it more clear which fields are initialized where. I haven't done much testing yet, and no testing at all on Windows, so that's probably still broken. -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
- 0001-Allocate-Backend-structs-in-PostmasterContext.patch
- 0002-Pass-background-worker-entry-in-the-parameter-file.patch
- 0003-Refactor-CreateSharedMemoryAndSemaphores.patch
- 0004-Use-FD_CLOEXEC-on-ListenSockets.patch
- 0005-Move-too-many-clients-already-et-al.-checks-from-Pro.patch
- 0006-Pass-CAC-as-argument-to-backend-process.patch
- 0007-Remove-ConnCreate-and-ConnFree-and-allocate-Port-in-.patch
- 0008-Introduce-ClientSocket-rename-some-funcs.patch
- 0009-Refactor-postmaster-child-process-launching.patch
> From 1d89eec53c7fefa7a4a8c011c9f19e3df64dc436 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Mon, 12 Jun 2023 16:33:20 +0300 > Subject: [PATCH 4/9] Use FD_CLOEXEC on ListenSockets > @@ -831,7 +834,8 @@ StreamConnection(pgsocket server_fd, Port *port) > void > StreamClose(pgsocket sock) > { > - closesocket(sock); > + if (closesocket(sock) != 0) > + elog(LOG, "closesocket failed: %m"); > } > > /* Do you think WARNING would be a more appropriate log level? > From 2f518be9e96cfed1a1a49b4af8f7cb4a837aa784 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Mon, 12 Jun 2023 18:07:54 +0300 > Subject: [PATCH 5/9] Move "too many clients already" et al. checks from > ProcessStartupPacket. This seems like a change you could push already (assuming another maintainer agrees with you), which makes reviews for this patchset even easier. > From c25b67c045018a2bf05e6ff53819d26e561fc83f Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Sun, 18 Jun 2023 14:11:16 +0300 > Subject: [PATCH 6/9] Pass CAC as argument to backend process. Could you expand a bit more on why it is better to pass it as a separate argument? Does it not fit well conceptually in struct Port? > @@ -4498,15 +4510,19 @@ postmaster_forkexec(int argc, char *argv[]) > * returns the pid of the fork/exec'd process, or -1 on failure > */ > static pid_t > -backend_forkexec(Port *port) > +backend_forkexec(Port *port, CAC_state cac) > { > - char *av[4]; > + char *av[5]; > int ac = 0; > + char cacbuf[10]; > > av[ac++] = "postgres"; > av[ac++] = "--forkbackend"; > av[ac++] = NULL; /* filled in by internal_forkexec */ > > + snprintf(cacbuf, sizeof(cacbuf), "%d", (int) cac); > + av[ac++] = cacbuf; Might be worth a sanity check that there wasn't any truncation into cacbuf, which is an impossibility as the code is written, but still useful for catching a future developer error. Is it worth adding a command line option at all instead of having the naked positional argument? It would help anybody who might read the command line what the seemingly random integer stands for. > @@ -4910,7 +4926,10 @@ SubPostmasterMain(int argc, char *argv[]) > /* Run backend or appropriate child */ > if (strcmp(argv[1], "--forkbackend") == 0) > { > - Assert(argc == 3); /* shouldn't be any more args */ > + CAC_state cac; > + > + Assert(argc == 4); > + cac = (CAC_state) atoi(argv[3]); Perhaps an assert or full error checking that atoi succeeds would be useful for similar reasons to my previous comment. > From 658cba5cdb2e5c45faff84566906d2fcaa8a3674 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Mon, 12 Jun 2023 18:03:03 +0300 > Subject: [PATCH 7/9] Remove ConnCreate and ConnFree, and allocate Port in > stack. Again, seems like another patch that could be pushed separately assuming others don't have any comments. > From 65384b9a6cfb3b9b589041526216e0f64d64bea5 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Sun, 18 Jun 2023 13:56:44 +0300 > Subject: [PATCH 8/9] Introduce ClientSocket, rename some funcs > @@ -1499,7 +1499,7 @@ CloseServerPorts(int status, Datum arg) > { > if (ListenSocket[i] != PGINVALID_SOCKET) > { > - StreamClose(ListenSocket[i]); > + closesocket(ListenSocket[i]); > ListenSocket[i] = PGINVALID_SOCKET; > } > } I see you have been adding log messages in the case of closesocket() failing. Do you think it is worth doing here as well? One strange part about this patch is that in patch 4, you edit StreamClose() to emit a log message in the case of closesocket() failure, but then this patch just completely removes it. > @@ -4407,11 +4420,11 @@ BackendInitialize(Port *port, CAC_state cac) > * Doesn't return at all. > */ > static void > -BackendRun(Port *port) > +BackendRun(void) > { > /* > - * Create a per-backend PGPROC struct in shared memory. We must do > - * this before we can use LWLocks (in AttachSharedMemoryAndSemaphores). > + * Create a per-backend PGPROC struct in shared memory. We must do this > + * before we can use LWLocks (in AttachSharedMemoryAndSemaphores). > */ > InitProcess(); This comment reflow probably fits better in the patch that added the AttachSharedMemoryAndSemaphores function. > From b33cfeb28a5419045acb659a01410b2b463bea3e Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Sun, 18 Jun 2023 13:59:48 +0300 > Subject: [PATCH 9/9] Refactor postmaster child process launching > - Move code related to launching backend processes to new source file, > process_start.c Since this seems pretty self-contained, my be easier to review if this was its own commit. > - Refactor the mechanism of passing informaton from the parent to > child process. Instead of using different command-line arguments > when launching the child process in EXEC_BACKEND mode, pass a > variable-length blob of data along with all the global > variables. The contents of that blob depends on the kind of child > process being launched. In !EXEC_BACKEND mode, we use the same blob, > but it's simply inherited from the parent to child process. Same with this. Perhaps others would disagree. > +const PMChildEntry entry_kinds[] = { > + {"backend", BackendMain, true}, > + > + {"autovacuum launcher", AutoVacLauncherMain, true}, > + {"autovacuum worker", AutoVacWorkerMain, true}, > + {"bgworker", BackgroundWorkerMain, true}, > + {"syslogger", SysLoggerMain, false}, > + > + {"startup", StartupProcessMain, true}, > + {"bgwriter", BackgroundWriterMain, true}, > + {"archiver", PgArchiverMain, true}, > + {"checkpointer", CheckpointerMain, true}, > + {"wal_writer", WalWriterMain, true}, > + {"wal_receiver", WalReceiverMain, true}, > +}; It seems like this could be made static. I didn't see it getting exposed in a header file anywhere, but I also admit that I can be blind at times. I need to spend more time looking at this last patch. Nice work so far! -- Tristan Partin Neon (https://neon.tech)
Hi, On 2023-06-18 14:22:33 +0300, Heikki Linnakangas wrote: > I started to look at the code in postmaster.c related to launching child > processes. I tried to reduce the difference between EXEC_BACKEND and > !EXEC_BACKEND code paths, and put the code that needs to differ behind a > better abstraction. I started doing this to help with implementing > multi-threading, but it doesn't introduce anything thread-related yet and I > think this improves readability anyway. Yes please! This code is absolutely awful. > From 0cb6f8d665980d30a5d2a29013000744f16bf813 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Sun, 18 Jun 2023 11:00:21 +0300 > Subject: [PATCH 3/9] Refactor CreateSharedMemoryAndSemaphores. > > Moves InitProcess calls a little later in EXEC_BACKEND case. What's the reason for this part? ISTM that we'd really want to get away from plastering duplicated InitProcess() etc everywhere. I think this might be easier to understand if you just changed did the CreateSharedMemoryAndSemaphores() -> AttachSharedMemoryAndSemaphores() piece in this commit, and the rest later. > +void > +AttachSharedMemoryAndSemaphores(void) > +{ > + /* InitProcess must've been called already */ Perhaps worth an assertion to make it easier to see that the order is wrong? > From 1d89eec53c7fefa7a4a8c011c9f19e3df64dc436 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Mon, 12 Jun 2023 16:33:20 +0300 > Subject: [PATCH 4/9] Use FD_CLOEXEC on ListenSockets > > We went through some effort to close them in the child process. Better to > not hand them down to the child process in the first place. I think Thomas has a larger version of this patch: https://postgr.es/m/CA%2BhUKGKPNFcfBQduqof4-7C%3DavjcSfdkKBGvQoRuAvfocnvY0A%40mail.gmail.com > From 65384b9a6cfb3b9b589041526216e0f64d64bea5 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Sun, 18 Jun 2023 13:56:44 +0300 > Subject: [PATCH 8/9] Introduce ClientSocket, rename some funcs > > - Move more of the work on a client socket to the child process. > > - Reduce the amount of data that needs to be passed from postmaster to > child. (Used to pass a full Port struct, although most of the fields were > empty. Now we pass the much slimmer ClientSocket.) I think there might be extensions accessing Port. Not sure if it's worth worrying about, but ... > --- a/src/backend/postmaster/autovacuum.c > +++ b/src/backend/postmaster/autovacuum.c > @@ -476,8 +476,8 @@ AutoVacLauncherMain(int argc, char *argv[]) > pqsignal(SIGCHLD, SIG_DFL); > > /* > - * Create a per-backend PGPROC struct in shared memory. We must do > - * this before we can use LWLocks. > + * Create a per-backend PGPROC struct in shared memory. We must do this > + * before we can use LWLocks. > */ > InitProcess(); > Don't think this was intentional? > From b33cfeb28a5419045acb659a01410b2b463bea3e Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Sun, 18 Jun 2023 13:59:48 +0300 > Subject: [PATCH 9/9] Refactor postmaster child process launching > > - Move code related to launching backend processes to new source file, > process_start.c I think you might have renamed this to launch_backend.c? > - Introduce new postmaster_child_launch() function that deals with the > differences between EXEC_BACKEND and fork mode. > > - Refactor the mechanism of passing informaton from the parent to > child process. Instead of using different command-line arguments > when launching the child process in EXEC_BACKEND mode, pass a > variable-length blob of data along with all the global > variables. The contents of that blob depends on the kind of child > process being launched. In !EXEC_BACKEND mode, we use the same blob, > but it's simply inherited from the parent to child process. > +const PMChildEntry entry_kinds[] = { > + {"backend", BackendMain, true}, > + > + {"autovacuum launcher", AutoVacLauncherMain, true}, > + {"autovacuum worker", AutoVacWorkerMain, true}, > + {"bgworker", BackgroundWorkerMain, true}, > + {"syslogger", SysLoggerMain, false}, > + > + {"startup", StartupProcessMain, true}, > + {"bgwriter", BackgroundWriterMain, true}, > + {"archiver", PgArchiverMain, true}, > + {"checkpointer", CheckpointerMain, true}, > + {"wal_writer", WalWriterMain, true}, > + {"wal_receiver", WalReceiverMain, true}, > +}; I'd assign them with the PostmasterChildType as index, so there's no danger of getting out of order. const PMChildEntry entry_kinds = { [PMC_AV_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true}, ... } or such should work. I'd also use designated initializers for the fields, it's otherwise hard to know what true means etc. I think it might be good to put more into array. If we e.g. knew whether a particular child type is a backend-like, and aux process or syslogger, we could avoid the duplicated InitAuxiliaryProcess(), MemoryContextDelete(PostmasterContext) etc calls everywhere. > +/* > + * SubPostmasterMain -- Get the fork/exec'd process into a state equivalent > + * to what it would be if we'd simply forked on Unix, and then > + * dispatch to the appropriate place. > + * > + * The first two command line arguments are expected to be "--forkFOO" > + * (where FOO indicates which postmaster child we are to become), and > + * the name of a variables file that we can read to load data that would > + * have been inherited by fork() on Unix. Remaining arguments go to the > + * subprocess FooMain() routine. XXX > + */ > +void > +SubPostmasterMain(int argc, char *argv[]) > +{ > + PostmasterChildType child_type; > + char *startup_data; > + size_t startup_data_len; > + > + /* In EXEC_BACKEND case we will not have inherited these settings */ > + IsPostmasterEnvironment = true; > + whereToSendOutput = DestNone; > + > + /* Setup essential subsystems (to ensure elog() behaves sanely) */ > + InitializeGUCOptions(); > + > + /* Check we got appropriate args */ > + if (argc < 3) > + elog(FATAL, "invalid subpostmaster invocation"); > + > + if (strncmp(argv[1], "--forkchild=", 12) == 0) > + { > + char *entry_name = argv[1] + 12; > + bool found = false; > + > + for (int idx = 0; idx < lengthof(entry_kinds); idx++) > + { > + if (strcmp(entry_kinds[idx].name, entry_name) == 0) > + { > + child_type = idx; > + found = true; > + break; > + } > + } > + if (!found) > + elog(ERROR, "unknown child kind %s", entry_name); > + } Hm, shouldn't we error out when called without --forkchild? > +/* Save critical backend variables into the BackendParameters struct */ > +#ifndef WIN32 > +static bool > +save_backend_variables(BackendParameters *param, ClientSocket *client_sock) > +#else There's so much of this kind of thing. Could we hide it in a struct or such instead of needing ifdefs everywhere? > --- a/src/backend/storage/ipc/shmem.c > +++ b/src/backend/storage/ipc/shmem.c > @@ -144,6 +144,8 @@ InitShmemAllocation(void) > /* > * Initialize ShmemVariableCache for transaction manager. (This doesn't > * really belong here, but not worth moving.) > + * > + * XXX: we really should move this > */ > ShmemVariableCache = (VariableCache) > ShmemAlloc(sizeof(*ShmemVariableCache)); Heh. Indeed. And probably just rename it to something less insane. Greetings, Andres Freund
Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)
From
Heikki Linnakangas
Date:
Focusing on this one patch in this series: On 11/07/2023 01:50, Andres Freund wrote: >> From: Heikki Linnakangas <heikki.linnakangas@iki.fi> >> Date: Mon, 12 Jun 2023 16:33:20 +0300 >> Subject: [PATCH 4/9] Use FD_CLOEXEC on ListenSockets >> >> We went through some effort to close them in the child process. Better to >> not hand them down to the child process in the first place. > > I think Thomas has a larger version of this patch: > https://postgr.es/m/CA%2BhUKGKPNFcfBQduqof4-7C%3DavjcSfdkKBGvQoRuAvfocnvY0A%40mail.gmail.com Hmm, no, that's a little different. Thomas added the FD_CLOEXEC option to the *accepted* socket in commit 1da569ca1f. That was part of that thread. This patch adds the option to the *listen* sockets. That was not discussed in that thread, but it's certainly in the same vein. Thomas: What do you think of the attached? On 11/07/2023 00:07, Tristan Partin wrote: >> @@ -831,7 +834,8 @@ StreamConnection(pgsocket server_fd, Port *port) >> void >> StreamClose(pgsocket sock) >> { >> - closesocket(sock); >> + if (closesocket(sock) != 0) >> + elog(LOG, "closesocket failed: %m"); >> } >> >> /* > > Do you think WARNING would be a more appropriate log level? No, WARNING is for messages that you expect the client to receive. This failure is unexpected at the system level, the message is for the administrator. The distinction isn't always very clear, but LOG seems more appropriate in this case. -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)
From
Thomas Munro
Date:
On Thu, Aug 24, 2023 at 11:41 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 11/07/2023 01:50, Andres Freund wrote: > >> From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > >> Date: Mon, 12 Jun 2023 16:33:20 +0300 > >> Subject: [PATCH 4/9] Use FD_CLOEXEC on ListenSockets > >> > >> We went through some effort to close them in the child process. Better to > >> not hand them down to the child process in the first place. > > > > I think Thomas has a larger version of this patch: > > https://postgr.es/m/CA%2BhUKGKPNFcfBQduqof4-7C%3DavjcSfdkKBGvQoRuAvfocnvY0A%40mail.gmail.com > > Hmm, no, that's a little different. Thomas added the FD_CLOEXEC option > to the *accepted* socket in commit 1da569ca1f. That was part of that > thread. This patch adds the option to the *listen* sockets. That was not > discussed in that thread, but it's certainly in the same vein. > > Thomas: What do you think of the attached? LGTM. I vaguely recall thinking that it might be better to keep EXEC_BACKEND and !EXEC_BACKEND working the same which might be why I didn't try this one, but it looks fine with the comment to explain, as you have it. (It's a shame we can't use O_CLOFORK.) There was some question in the other thread about whether doing that to the server socket might affect accepted sockets too on some OS, but I can at least confirm that your patch works fine on FreeBSD in an EXEC_BACKEND build. I think there were some historical disagreements about which socket properties were inherited, but not that.
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)
From
Heikki Linnakangas
Date:
On 24/08/2023 15:48, Thomas Munro wrote: > LGTM. I vaguely recall thinking that it might be better to keep > EXEC_BACKEND and !EXEC_BACKEND working the same which might be why I > didn't try this one, but it looks fine with the comment to explain, as > you have it. (It's a shame we can't use O_CLOFORK.) Yeah, O_CLOFORK would be nice.. Committed, thanks! -- Heikki Linnakangas Neon (https://neon.tech)
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)
From
Jeff Janes
Date:
On Thu, Aug 24, 2023 at 10:05 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 24/08/2023 15:48, Thomas Munro wrote:
> LGTM. I vaguely recall thinking that it might be better to keep
> EXEC_BACKEND and !EXEC_BACKEND working the same which might be why I
> didn't try this one, but it looks fine with the comment to explain, as
> you have it. (It's a shame we can't use O_CLOFORK.)
Yeah, O_CLOFORK would be nice..
Committed, thanks!
Since this commit, I'm getting a lot (63 per restart) of messages:
LOG: could not close client or listen socket: Bad file descriptor
All I have to do to get the message is turn logging_collector = on and restart.
The close failure condition existed before the commit, it just wasn't logged before. So, did the extra logging added here just uncover a pre-existing bug?
The LOG message is sent to the terminal, not to the log file.
Cheers,
Jeff
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)
From
Heikki Linnakangas
Date:
On 28/08/2023 18:55, Jeff Janes wrote: > Since this commit, I'm getting a lot (63 per restart) of messages: > > LOG: could not close client or listen socket: Bad file descriptor > All I have to do to get the message is turn logging_collector = on and > restart. > > The close failure condition existed before the commit, it just wasn't > logged before. So, did the extra logging added here just uncover a > pre-existing bug? Yes, so it seems. Syslogger is started before the ListenSockets array is initialized, so its still all zeros. When syslogger is forked, the child process tries to close all the listen sockets, which are all zeros. So syslogger calls close(0) MAXLISTEN (64) times. Attached patch moves the array initialization earlier. The first close(0) actually does have an effect: it closes stdin, which is fd 0. That is surely accidental, but I wonder if we should indeed close stdin in child processes? The postmaster process doesn't do anything with stdin either, although I guess a library might try to read a passphrase from stdin before starting up, for example. -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)
From
Michael Paquier
Date:
On Mon, Aug 28, 2023 at 11:52:15PM +0300, Heikki Linnakangas wrote: > On 28/08/2023 18:55, Jeff Janes wrote: >> Since this commit, I'm getting a lot (63 per restart) of messages: >> >> LOG: could not close client or listen socket: Bad file descriptor >> All I have to do to get the message is turn logging_collector = on and >> restart. >> >> The close failure condition existed before the commit, it just wasn't >> logged before. So, did the extra logging added here just uncover a >> pre-existing bug? In case you've not noticed: https://www.postgresql.org/message-id/ZOvvuQe0rdj2slA9@paquier.xyz But it does not really matter now ;) > Yes, so it seems. Syslogger is started before the ListenSockets array is > initialized, so its still all zeros. When syslogger is forked, the child > process tries to close all the listen sockets, which are all zeros. So > syslogger calls close(0) MAXLISTEN (64) times. Attached patch moves the > array initialization earlier. Yep, I've reached the same conclusion. Wouldn't it be cleaner to move the callback registration of CloseServerPorts() closer to the array initialization, though? > The first close(0) actually does have an effect: it closes stdin, which is > fd 0. That is surely accidental, but I wonder if we should indeed close > stdin in child processes? The postmaster process doesn't do anything with > stdin either, although I guess a library might try to read a passphrase from > stdin before starting up, for example. We would have heard about that, wouldn't we? I may be missing something of course, but on HEAD, the array initialization is done before starting any child processes, and the syslogger is the first one forked. -- Michael
Attachment
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)
From
Heikki Linnakangas
Date:
On 29/08/2023 01:28, Michael Paquier wrote: > > In case you've not noticed: > https://www.postgresql.org/message-id/ZOvvuQe0rdj2slA9@paquier.xyz > But it does not really matter now ;) Ah sorry, missed that thread. >> Yes, so it seems. Syslogger is started before the ListenSockets array is >> initialized, so its still all zeros. When syslogger is forked, the child >> process tries to close all the listen sockets, which are all zeros. So >> syslogger calls close(0) MAXLISTEN (64) times. Attached patch moves the >> array initialization earlier. > > Yep, I've reached the same conclusion. Wouldn't it be cleaner to move > the callback registration of CloseServerPorts() closer to the array > initialization, though? Ok, pushed that way. I checked the history of this: it goes back to commit 9a86f03b4e in version 13. The SysLogger_Start() call used to be later, after setting p ListenSockets, but that commit moved it. So I backpatched this to v13. >> The first close(0) actually does have an effect: it closes stdin, which is >> fd 0. That is surely accidental, but I wonder if we should indeed close >> stdin in child processes? The postmaster process doesn't do anything with >> stdin either, although I guess a library might try to read a passphrase from >> stdin before starting up, for example. > > We would have heard about that, wouldn't we? I may be missing > something of course, but on HEAD, the array initialization is done > before starting any child processes, and the syslogger is the first > one forked. Yes, syslogger is the only process that closes stdin. After moving the initialization, it doesn't close it either. Thinking about this some more, the ListenSockets array is a bit silly anyway. We fill the array starting from index 0, always append to the end, and never remove entries from it. It would seem more straightforward to keep track of the used size of the array. Currently we always loop through the unused parts too, and e.g. ConfigurePostmasterWaitSet() needs to walk the array to count how many elements are in use. -- Heikki Linnakangas Neon (https://neon.tech)
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)
From
Heikki Linnakangas
Date:
On 29/08/2023 09:21, Heikki Linnakangas wrote: > Thinking about this some more, the ListenSockets array is a bit silly > anyway. We fill the array starting from index 0, always append to the > end, and never remove entries from it. It would seem more > straightforward to keep track of the used size of the array. Currently > we always loop through the unused parts too, and e.g. > ConfigurePostmasterWaitSet() needs to walk the array to count how many > elements are in use. Like this. -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)
From
Heikki Linnakangas
Date:
On 29/08/2023 09:58, Heikki Linnakangas wrote: > On 29/08/2023 09:21, Heikki Linnakangas wrote: >> Thinking about this some more, the ListenSockets array is a bit silly >> anyway. We fill the array starting from index 0, always append to the >> end, and never remove entries from it. It would seem more >> straightforward to keep track of the used size of the array. Currently >> we always loop through the unused parts too, and e.g. >> ConfigurePostmasterWaitSet() needs to walk the array to count how many >> elements are in use. > > Like this. This seems pretty uncontroversial, and I heard no objections, so I went ahead and committed that. -- Heikki Linnakangas Neon (https://neon.tech)
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)
From
Michael Paquier
Date:
On Thu, Oct 05, 2023 at 03:08:37PM +0300, Heikki Linnakangas wrote: > This seems pretty uncontroversial, and I heard no objections, so I went > ahead and committed that. It looks like e29c4643951 is causing issues here. While doing benchmarking on a cluster compiled with -O2, I got a crash: LOG: system logger process (PID 27924) was terminated by signal 11: Segmentation fault Program terminated with signal SIGSEGV, Segmentation fault. #0 0x000055ef3b9aed20 in pfree () (gdb) bt #0 0x000055ef3b9aed20 in pfree () #1 0x000055ef3b7e0e41 in ClosePostmasterPorts () #2 0x000055ef3b7e6649 in SysLogger_Start () #3 0x000055ef3b7e4413 in PostmasterMain () Okay, the backtrace is not that useful. I'll see if I can get something better, still it seems like this has broken the way the syslogger closes these ports. -- Michael
Attachment
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)
From
Michael Paquier
Date:
On Fri, Oct 06, 2023 at 02:30:16PM +0900, Michael Paquier wrote: > Okay, the backtrace is not that useful. I'll see if I can get > something better, still it seems like this has broken the way the > syslogger closes these ports. And here you go: Program terminated with signal SIGSEGV, Segmentation fault. #0 GetMemoryChunkMethodID (pointer=0x0) at mcxt.c:196 196 header = *((const uint64 *) ((const char *) pointer - sizeof(uint64))); (gdb) bt #0 GetMemoryChunkMethodID (pointer=0x0) at mcxt.c:196 #1 0x0000557d04176d59 in pfree (pointer=0x0) at mcxt.c:1463 #2 0x0000557d03e8eab3 in ClosePostmasterPorts (am_syslogger=true) at postmaster.c:2571 #3 0x0000557d03e93ac2 in SysLogger_Start () at syslogger.c:686 #4 0x0000557d03e8c5b7 in PostmasterMain (argc=3, argv=0x557d0471ed00) at postmaster.c:1148 #5 0x0000557d03d48e34 in main (argc=3, argv=0x557d0471ed00) at main.c:198 (gdb) up 2 #2 0x0000557d03e8eab3 in ClosePostmasterPorts (am_syslogger=true) at postmaster.c:2571 2571 pfree(ListenSockets); (gdb) p ListenSockets $1 = (pgsocket *) 0x0 -- Michael
Attachment
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)
From
Heikki Linnakangas
Date:
On 06/10/2023 09:50, Michael Paquier wrote: > On Fri, Oct 06, 2023 at 02:30:16PM +0900, Michael Paquier wrote: >> Okay, the backtrace is not that useful. I'll see if I can get >> something better, still it seems like this has broken the way the >> syslogger closes these ports. > > And here you go: > Program terminated with signal SIGSEGV, Segmentation fault. > #0 GetMemoryChunkMethodID (pointer=0x0) at mcxt.c:196 196 header = > *((const uint64 *) ((const char *) pointer - sizeof(uint64))); > (gdb) bt > #0 GetMemoryChunkMethodID (pointer=0x0) at mcxt.c:196 > #1 0x0000557d04176d59 in pfree (pointer=0x0) at mcxt.c:1463 > #2 0x0000557d03e8eab3 in ClosePostmasterPorts (am_syslogger=true) at postmaster.c:2571 > #3 0x0000557d03e93ac2 in SysLogger_Start () at syslogger.c:686 > #4 0x0000557d03e8c5b7 in PostmasterMain (argc=3, argv=0x557d0471ed00) > at postmaster.c:1148 > #5 0x0000557d03d48e34 in main (argc=3, argv=0x557d0471ed00) at main.c:198 > (gdb) up 2 > #2 0x0000557d03e8eab3 in ClosePostmasterPorts (am_syslogger=true) at > postmaster.c:2571 > 2571 pfree(ListenSockets); > (gdb) p ListenSockets $1 = (pgsocket *) 0x0 Fixed, thanks! I did a quick test with syslogger enabled before committing, but didn't notice the segfault. I missed it because syslogger gets restarted and then it worked. -- Heikki Linnakangas Neon (https://neon.tech)
Re: Use FD_CLOEXEC on ListenSockets (was Re: Refactoring backend fork+exec code)
From
Michael Paquier
Date:
On Fri, Oct 06, 2023 at 10:27:22AM +0300, Heikki Linnakangas wrote: > I did a quick test with syslogger enabled before committing, but didn't > notice the segfault. I missed it because syslogger gets restarted and then > it worked. Thanks, Heikki. -- Michael
Attachment
I updated this patch set, addressing some of the straightforward comments from Tristan and Andres, and did some more cleanups, commenting etc. Works on Windows now. Replies to some of the individual comments below: On 11/07/2023 00:07, Tristan Partin wrote: >> @@ -4498,15 +4510,19 @@ postmaster_forkexec(int argc, char *argv[]) >> * returns the pid of the fork/exec'd process, or -1 on failure >> */ >> static pid_t >> -backend_forkexec(Port *port) >> +backend_forkexec(Port *port, CAC_state cac) >> { >> - char *av[4]; >> + char *av[5]; >> int ac = 0; >> + char cacbuf[10]; >> >> av[ac++] = "postgres"; >> av[ac++] = "--forkbackend"; >> av[ac++] = NULL; /* filled in by internal_forkexec */ >> >> + snprintf(cacbuf, sizeof(cacbuf), "%d", (int) cac); >> + av[ac++] = cacbuf; > > Might be worth a sanity check that there wasn't any truncation into > cacbuf, which is an impossibility as the code is written, but still > useful for catching a future developer error. > > Is it worth adding a command line option at all instead of having the > naked positional argument? It would help anybody who might read the > command line what the seemingly random integer stands for. +1. This gets refactored away in the last patch though. In the last patch, I used a child process name instead of an integer precisely because it looks nicer in "ps". I wonder if we should add more command line arguments, just for informational purposes. Autovacuum worker process could display the database name it's connected to, for example. I don't know how important the command line is on Windows, is it displayed by tools that people care about? On 11/07/2023 01:50, Andres Freund wrote: > On 2023-06-18 14:22:33 +0300, Heikki Linnakangas wrote: >> From 0cb6f8d665980d30a5d2a29013000744f16bf813 Mon Sep 17 00:00:00 2001 >> From: Heikki Linnakangas <heikki.linnakangas@iki.fi> >> Date: Sun, 18 Jun 2023 11:00:21 +0300 >> Subject: [PATCH 3/9] Refactor CreateSharedMemoryAndSemaphores. >> >> Moves InitProcess calls a little later in EXEC_BACKEND case. > > What's the reason for this part? The point is that with this commit, InitProcess() is called at same place in EXEC_BACKEND mode and !EXEC_BACKEND. It feels more consistent that way. > ISTM that we'd really want to get away from plastering duplicated > InitProcess() etc everywhere. Sure, we could do more to reduce the duplication. I think this is a step in the right direction, though. >> From 65384b9a6cfb3b9b589041526216e0f64d64bea5 Mon Sep 17 00:00:00 2001 >> From: Heikki Linnakangas <heikki.linnakangas@iki.fi> >> Date: Sun, 18 Jun 2023 13:56:44 +0300 >> Subject: [PATCH 8/9] Introduce ClientSocket, rename some funcs >> >> - Move more of the work on a client socket to the child process. >> >> - Reduce the amount of data that needs to be passed from postmaster to >> child. (Used to pass a full Port struct, although most of the fields were >> empty. Now we pass the much slimmer ClientSocket.) > > I think there might be extensions accessing Port. Not sure if it's worth > worrying about, but ... That's OK. Port still exists, it's just created a little later. It will be initialized by the time extensions might look at it. >> +const PMChildEntry entry_kinds[] = { >> + {"backend", BackendMain, true}, >> + >> + {"autovacuum launcher", AutoVacLauncherMain, true}, >> + {"autovacuum worker", AutoVacWorkerMain, true}, >> + {"bgworker", BackgroundWorkerMain, true}, >> + {"syslogger", SysLoggerMain, false}, >> + >> + {"startup", StartupProcessMain, true}, >> + {"bgwriter", BackgroundWriterMain, true}, >> + {"archiver", PgArchiverMain, true}, >> + {"checkpointer", CheckpointerMain, true}, >> + {"wal_writer", WalWriterMain, true}, >> + {"wal_receiver", WalReceiverMain, true}, >> +}; > > I'd assign them with the PostmasterChildType as index, so there's no danger of > getting out of order. > > const PMChildEntry entry_kinds = { > [PMC_AV_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true}, > ... > } > > or such should work. Nice, I didn't know about that syntax! Changed it that way. > I'd also use designated initializers for the fields, it's otherwise hard to > know what true means etc. I think with one boolean and the struct declaration nearby, it's fine. If this becomes more complex in the future, with more fields, I agree. > I think it might be good to put more into array. If we e.g. knew whether a > particular child type is a backend-like, and aux process or syslogger, we > could avoid the duplicated InitAuxiliaryProcess(), > MemoryContextDelete(PostmasterContext) etc calls everywhere. I agree we could do more refactoring here. I don't agree with adding more to this struct though. I'm trying to limit the code in launch_backend.c to hiding the differences between EXEC_BACKEND and !EXEC_BACKEND. In EXEC_BACKEND mode, it restores the child process to the same state as it is after fork() in !EXEC_BACKEND mode. Any other initialization steps belong elsewhere. Some of the steps between InitPostmasterChild() and the *Main() functions could probably be moved around and refactored. I didn't think hard about that. I think that can be done separately as follow-up patch. >> +/* Save critical backend variables into the BackendParameters struct */ >> +#ifndef WIN32 >> +static bool >> +save_backend_variables(BackendParameters *param, ClientSocket *client_sock) >> +#else > > There's so much of this kind of thing. Could we hide it in a struct or such > instead of needing ifdefs everywhere? A lot of #ifdefs you mean? I agree launch_backend.c has a lot of those. I haven't come up with any good ideas on reducing them, unfortunately. -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
- v2-0001-Pass-background-worker-entry-in-the-parameter-fil.patch
- v2-0002-Refactor-CreateSharedMemoryAndSemaphores.patch
- v2-0003-Pass-CAC-as-argument-to-backend-process.patch
- v2-0004-Remove-ConnCreate-and-ConnFree-and-allocate-Port-.patch
- v2-0005-Introduce-ClientSocket-rename-some-funcs.patch
- v2-0006-Refactor-postmaster-child-process-launching.patch
On 11/10/2023 14:12, Heikki Linnakangas wrote: > On 11/07/2023 01:50, Andres Freund wrote: >>> Subject: [PATCH 3/9] Refactor CreateSharedMemoryAndSemaphores. >>> >>> Moves InitProcess calls a little later in EXEC_BACKEND case. >> >> What's the reason for this part? > > The point is that with this commit, InitProcess() is called at same > place in EXEC_BACKEND mode and !EXEC_BACKEND. It feels more consistent > that way. > >> ISTM that we'd really want to get away from plastering duplicated >> InitProcess() etc everywhere. > > Sure, we could do more to reduce the duplication. I think this is a step > in the right direction, though. Here's another rebased patch set. Compared to previous version, I did a little more refactoring around CreateSharedMemoryAndSemaphores and InitProcess: - patch 1 splits CreateSharedMemoryAndSemaphores into two functions: CreateSharedMemoryAndSemaphores is now only called at postmaster startup, and a new function called AttachSharedMemoryStructs() is called in backends in EXEC_BACKEND mode. I extracted the common parts of those functions to a new static function. (Some of this refactoring used to be part of the 3rd patch in the series, but it seems useful on its own, so I split it out.) - patch 3 moves the call to AttachSharedMemoryStructs() to InitProcess(), reducing the boilerplate code a little. The patches are incrementally useful, so if you don't have time to review all of them, a review on a subset would be useful too. -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
- v3-0003-Refactor-how-InitProcess-is-called.patch
- v3-0001-Refactor-CreateSharedMemoryAndSemaphores.patch
- v3-0002-Pass-BackgroundWorker-entry-in-the-parameter-file.patch
- v3-0004-Pass-CAC-as-argument-to-backend-process.patch
- v3-0005-Remove-ConnCreate-and-ConnFree-and-allocate-Port-.patch
- v3-0006-Introduce-ClientSocket-rename-some-funcs.patch
- v3-0007-Refactor-postmaster-child-process-launching.patch
On Wed Nov 29, 2023 at 5:36 PM CST, Heikki Linnakangas wrote: > On 11/10/2023 14:12, Heikki Linnakangas wrote: > Here's another rebased patch set. Compared to previous version, I did a > little more refactoring around CreateSharedMemoryAndSemaphores and > InitProcess: > > - patch 1 splits CreateSharedMemoryAndSemaphores into two functions: > CreateSharedMemoryAndSemaphores is now only called at postmaster > startup, and a new function called AttachSharedMemoryStructs() is called > in backends in EXEC_BACKEND mode. I extracted the common parts of those > functions to a new static function. (Some of this refactoring used to be > part of the 3rd patch in the series, but it seems useful on its own, so > I split it out.) > > - patch 3 moves the call to AttachSharedMemoryStructs() to > InitProcess(), reducing the boilerplate code a little. > > > The patches are incrementally useful, so if you don't have time to > review all of them, a review on a subset would be useful too. > From 8886db1ed6bae21bf6d77c9bb1230edbb55e24f9 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Thu, 30 Nov 2023 00:04:22 +0200 > Subject: [PATCH v3 4/7] Pass CAC as argument to backend process For me, being new to the code, it would be nice to have more of an explanation as to why this is "better." I don't doubt it; it would just help me and future readers of this commit in the future. More of an explanation in the commit message would suffice. My other comment on this commit is that we now seem to have lost the context on what CAC stands for. Before we had the member variable to explain it. A comment on the enum would be great or changing cac named variables to canAcceptConnections. I did notice in patch 7 that there are still some variables named canAcceptConnections around, so I'll leave this comment up to you. > From 98f8397b32a0b36e221475b32697c9c5bbca86a0 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Wed, 11 Oct 2023 13:38:06 +0300 > Subject: [PATCH v3 5/7] Remove ConnCreate and ConnFree, and allocate Port in > stack I like it separate. > From 79aab42705a8cb0e16e61c33052fc56fdd4fca76 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Wed, 11 Oct 2023 13:38:10 +0300 > Subject: [PATCH v3 6/7] Introduce ClientSocket, rename some funcs > +static int BackendStartup(ClientSocket *port); s/port/client_sock > - port->remote_hostname = strdup(remote_host); > + port->remote_hostname = pstrdup(remote_host); > + MemoryContextSwitchTo(oldcontext); Something funky with the whitespace here, but my eyes might also be playing tricks on me. Mixing spaces in tabs like what do in this codebase makes it difficult to review :). > From ce51876f87f1e4317e25baf64184749448fcd033 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Thu, 30 Nov 2023 00:07:34 +0200 > Subject: [PATCH v3 7/7] Refactor postmaster child process launching > + entry_kinds[child_type].main_fn(startup_data, startup_data_len); > + Assert(false); Seems like you want the pg_unreachable() macro here instead of Assert(false). Similar comment at the end of SubPostmasterMain(). > + if (fwrite(param, paramsz, 1, fp) != 1) > + { > + ereport(LOG, > + (errcode_for_file_access(), > + errmsg("could not write to file \"%s\": %m", tmpfilename))); > + FreeFile(fp); > + return -1; > + } > + > + /* Release file */ > + if (FreeFile(fp)) > + { > + ereport(LOG, > + (errcode_for_file_access(), > + errmsg("could not write to file \"%s\": %m", tmpfilename))); > + return -1; > + } Two pieces of feedback here. I generally find write(2) more useful than fwrite(3) because write(2) will report a useful errno, whereas fwrite(2) just uses ferror(3). The additional errno information might be valuable context in the log message. Up to you if you think it is also valuable. The log message if FreeFile() fails doesn't seem to make sense to me. I didn't see any file writing in that code path, but it is possible that I missed something. > + /* > + * Set reference point for stack-depth checking. This might seem > + * redundant in !EXEC_BACKEND builds; but it's not because the postmaster > + * launches its children from signal handlers, so we might be running on > + * an alternative stack. XXX still true? > + */ > + (void) set_stack_base(); Looks like there is still this XXX left. Can't say I completely understand the second sentence either. > + /* > + * make sure stderr is in binary mode before anything can possibly be > + * written to it, in case it's actually the syslogger pipe, so the pipe > + * chunking protocol isn't disturbed. Non-logpipe data gets translated on > + * redirection (e.g. via pg_ctl -l) anyway. > + */ Nit: The 'm' in the first "make" should be capitalized. > + if (fread(¶m, sizeof(param), 1, fp) != 1) > + { > + write_stderr("could not read from backend variables file \"%s\": %s\n", > + id, strerror(errno)); > + exit(1); > + } > + > + /* read startup data */ > + *startup_data_len = param.startup_data_len; > + if (param.startup_data_len > 0) > + { > + *startup_data = palloc(*startup_data_len); > + if (fread(*startup_data, *startup_data_len, 1, fp) != 1) > + { > + write_stderr("could not read startup data from backend variables file \"%s\": %s\n", > + id, strerror(errno)); > + exit(1); > + } > + } fread(3) doesn't set errno. I would probably switch these to read(2) for the reason I wrote in a previous comment. > + /* > + * Need to reinitialize the SSL library in the backend, since the context > + * structures contain function pointers and cannot be passed through the > + * parameter file. > + * > + * If for some reason reload fails (maybe the user installed broken key > + * files), soldier on without SSL; that's better than all connections > + * becoming impossible. > + * > + * XXX should we do this in all child processes? For the moment it's > + * enough to do it in backend children. > + */ > +#ifdef USE_SSL > + if (EnableSSL) > + { > + if (secure_initialize(false) == 0) > + LoadedSSL = true; > + else > + ereport(LOG, > + (errmsg("SSL configuration could not be loaded in child process"))); > + } > +#endif Do other child process types do any non-local communication? > -typedef struct ClientSocket { > +struct ClientSocket > +{ > pgsocket sock; /* File descriptor */ > SockAddr laddr; /* local addr (postmaster) */ > SockAddr raddr; /* remote addr (client) */ > -} ClientSocket; > +}; > +typedef struct ClientSocket ClientSocket; Can't say I completely understand the reason for this change given it was added in your series. I didn't look too hard at the Windows-specific code, so maybe someone who knows Windows will have something to say, but it also might've just been copy-paste that I didn't realize. There were a few more XXXs that probably should be figured out before committing. Though perhaps some of them were already there. Patches 1-3 seem committable as-is. I only had minor comments on everything but 7, so after taking a look at those, they could be committed. Overall, this seems liked a marked improvement :). -- Tristan Partin Neon (https://neon.tech)
Hi, On 2023-11-30 01:36:25 +0200, Heikki Linnakangas wrote: > - patch 1 splits CreateSharedMemoryAndSemaphores into two functions: > CreateSharedMemoryAndSemaphores is now only called at postmaster startup, > and a new function called AttachSharedMemoryStructs() is called in backends > in EXEC_BACKEND mode. I extracted the common parts of those functions to a > new static function. (Some of this refactoring used to be part of the 3rd > patch in the series, but it seems useful on its own, so I split it out.) I like that idea. > From a96b6e92fdeaa947bf32774c425419b8f987b8e2 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Thu, 30 Nov 2023 00:01:25 +0200 > Subject: [PATCH v3 1/7] Refactor CreateSharedMemoryAndSemaphores > > For clarity, have separate functions for *creating* the shared memory > and semaphores at postmaster or single-user backend startup, and > for *attaching* to existing shared memory structures in EXEC_BACKEND > case. CreateSharedMemoryAndSemaphores() is now called only at > postmaster startup, and a new AttachSharedMemoryStructs() function is > called at backend startup in EXEC_BACKEND mode. I assume CreateSharedMemoryAndSemaphores() is still called during crash restart? I wonder if it shouldn't split three ways: 1) create 2) initialize 3) attach > From 3478cafcf74a5c8d649e0287e6c72669a29c0e70 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Thu, 30 Nov 2023 00:02:03 +0200 > Subject: [PATCH v3 2/7] Pass BackgroundWorker entry in the parameter file in > EXEC_BACKEND mode > > This makes it possible to move InitProcess later in SubPostmasterMain > (in next commit), as we no longer need to access shared memory to read > background worker entry. > static void read_backend_variables(char *id, Port *port); > @@ -4831,7 +4833,7 @@ SubPostmasterMain(int argc, char *argv[]) > strcmp(argv[1], "--forkavlauncher") == 0 || > strcmp(argv[1], "--forkavworker") == 0 || > strcmp(argv[1], "--forkaux") == 0 || > - strncmp(argv[1], "--forkbgworker=", 15) == 0) > + strncmp(argv[1], "--forkbgworker", 14) == 0) > PGSharedMemoryReAttach(); > else > PGSharedMemoryNoReAttach(); > @@ -4962,10 +4964,8 @@ SubPostmasterMain(int argc, char *argv[]) > > AutoVacWorkerMain(argc - 2, argv + 2); /* does not return */ > } > - if (strncmp(argv[1], "--forkbgworker=", 15) == 0) > + if (strncmp(argv[1], "--forkbgworker", 14) == 0) Now that we don't need to look at parameters anymore, these should probably be just a strcmp(), like the other cases? > From 0d071474e12a70ff8113c7b0731c5b97fec45007 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Wed, 29 Nov 2023 23:47:25 +0200 > Subject: [PATCH v3 3/7] Refactor how InitProcess is called > > The order of process initialization steps is now more consistent > between !EXEC_BACKEND and EXEC_BACKEND modes. InitProcess() is called > at the same place in either mode. We can now also move the > AttachSharedMemoryStructs() call into InitProcess() itself. This > reduces the number of "#ifdef EXEC_BACKEND" blocks. Yay. > diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c > index cdfdd6fbe1d..6c708777dde 100644 > --- a/src/backend/storage/lmgr/proc.c > +++ b/src/backend/storage/lmgr/proc.c > @@ -461,6 +461,12 @@ InitProcess(void) > */ > InitLWLockAccess(); > InitDeadLockChecking(); > + > +#ifdef EXEC_BACKEND > + /* Attach process to shared data structures */ > + if (IsUnderPostmaster) > + AttachSharedMemoryStructs(); > +#endif > } > > /* > @@ -614,6 +620,12 @@ InitAuxiliaryProcess(void) > * Arrange to clean up at process exit. > */ > on_shmem_exit(AuxiliaryProcKill, Int32GetDatum(proctype)); > + > +#ifdef EXEC_BACKEND > + /* Attach process to shared data structures */ > + if (IsUnderPostmaster) > + AttachSharedMemoryStructs(); > +#endif > } Aside: Somewhat odd that InitAuxiliaryProcess() doesn't call InitLWLockAccess(). I think a short comment explaining why we can attach to shmem structs after already accessing shared memory earlier in the function would be worthwhile. > From ce51876f87f1e4317e25baf64184749448fcd033 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Thu, 30 Nov 2023 00:07:34 +0200 > Subject: [PATCH v3 7/7] Refactor postmaster child process launching > > - Move code related to launching backend processes to new source file, > launch_backend.c > > - Introduce new postmaster_child_launch() function that deals with the > differences between EXEC_BACKEND and fork mode. > > - Refactor the mechanism of passing information from the parent to > child process. Instead of using different command-line arguments when > launching the child process in EXEC_BACKEND mode, pass a > variable-length blob of data along with all the global variables. The > contents of that blob depend on the kind of child process being > launched. In !EXEC_BACKEND mode, we use the same blob, but it's simply > inherited from the parent to child process. > > [...] > 33 files changed, 1787 insertions(+), 2002 deletions(-) Well, that's not small... I think it may be worth splitting some of the file renaming out into a separate commit, makes it harder to see what changed here. > +AutoVacLauncherMain(char *startup_data, size_t startup_data_len) > { > - pid_t AutoVacPID; > + sigjmp_buf local_sigjmp_buf; > > -#ifdef EXEC_BACKEND > - switch ((AutoVacPID = avlauncher_forkexec())) > -#else > - switch ((AutoVacPID = fork_process())) > -#endif > + /* Release postmaster's working memory context */ > + if (PostmasterContext) > { > - case -1: > - ereport(LOG, > - (errmsg("could not fork autovacuum launcher process: %m"))); > - return 0; > - > -#ifndef EXEC_BACKEND > - case 0: > - /* in postmaster child ... */ > - InitPostmasterChild(); > - > - /* Close the postmaster's sockets */ > - ClosePostmasterPorts(false); > - > - AutoVacLauncherMain(0, NULL); > - break; > -#endif > - default: > - return (int) AutoVacPID; > + MemoryContextDelete(PostmasterContext); > + PostmasterContext = NULL; > } > > - /* shouldn't get here */ > - return 0; > -} This if (PostmasterContext) ... else "shouldn't get here" business seems pretty silly, more likely to hide problems than to help. > +/* > + * Information needed to launch different kinds of child processes. > + */ > +static const struct > +{ > + const char *name; > + void (*main_fn) (char *startup_data, size_t startup_data_len); > + bool shmem_attach; > +} entry_kinds[] = { > + [PMC_BACKEND] = {"backend", BackendMain, true}, Personally I'd give the struct an actual name - makes the debugging experience a bit nicer than anonymous structs that you can't even reference by a typedef. > + [PMC_AV_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true}, > + [PMC_AV_WORKER] = {"autovacuum worker", AutoVacWorkerMain, true}, > + [PMC_BGWORKER] = {"bgworker", BackgroundWorkerMain, true}, > + [PMC_SYSLOGGER] = {"syslogger", SysLoggerMain, false}, > + > + [PMC_STARTUP] = {"startup", StartupProcessMain, true}, > + [PMC_BGWRITER] = {"bgwriter", BackgroundWriterMain, true}, > + [PMC_ARCHIVER] = {"archiver", PgArchiverMain, true}, > + [PMC_CHECKPOINTER] = {"checkpointer", CheckpointerMain, true}, > + [PMC_WAL_WRITER] = {"wal_writer", WalWriterMain, true}, > + [PMC_WAL_RECEIVER] = {"wal_receiver", WalReceiverMain, true}, > +}; It feels like we have too many different ways of documenting the type of a process. This new PMC_ stuff, enum AuxProcType, enum BackendType. Which then leads to code like this: > -CheckpointerMain(void) > +CheckpointerMain(char *startup_data, size_t startup_data_len) > { > sigjmp_buf local_sigjmp_buf; > MemoryContext checkpointer_context; > > + Assert(startup_data_len == 0); > + > + MyAuxProcType = CheckpointerProcess; > + MyBackendType = B_CHECKPOINTER; > + AuxiliaryProcessInit(); > + For each type of child process. That seems a bit too redundant. Can't we unify this at least somewhat? Can't we just reuse BackendType here? Sure, there'd be pointless entry for B_INVALID, but that doesn't seem like a problem, could even be useful, by pointing it to a function raising an error. At the very least this shouldn't deviate from the naming pattern of BackendType. > +/* > + * SubPostmasterMain -- Get the fork/exec'd process into a state equivalent > + * to what it would be if we'd simply forked on Unix, and then > + * dispatch to the appropriate place. > + * > + * The first two command line arguments are expected to be "--forkchild=<name>", > + * where <name> indicates which postmaster child we are to become, and > + * the name of a variables file that we can read to load data that would > + * have been inherited by fork() on Unix. > + */ > +void > +SubPostmasterMain(int argc, char *argv[]) > +{ > + PostmasterChildType child_type; > + char *startup_data; > + size_t startup_data_len; > + char *entry_name; > + bool found = false; > + > + /* In EXEC_BACKEND case we will not have inherited these settings */ > + IsPostmasterEnvironment = true; > + whereToSendOutput = DestNone; > + > + /* Setup essential subsystems (to ensure elog() behaves sanely) */ > + InitializeGUCOptions(); > + > + /* Check we got appropriate args */ > + if (argc != 3) > + elog(FATAL, "invalid subpostmaster invocation"); > + > + if (strncmp(argv[1], "--forkchild=", 12) != 0) > + elog(FATAL, "invalid subpostmaster invocation (--forkchild argument missing)"); > + entry_name = argv[1] + 12; > + found = false; > + for (int idx = 0; idx < lengthof(entry_kinds); idx++) > + { > + if (strcmp(entry_kinds[idx].name, entry_name) == 0) > + { > + child_type = idx; > + found = true; > + break; > + } > + } > + if (!found) > + elog(ERROR, "unknown child kind %s", entry_name); If we then have to search linearly, why don't we just pass the index into the array? > > -#define StartupDataBase() StartChildProcess(StartupProcess) > -#define StartArchiver() StartChildProcess(ArchiverProcess) > -#define StartBackgroundWriter() StartChildProcess(BgWriterProcess) > -#define StartCheckpointer() StartChildProcess(CheckpointerProcess) > -#define StartWalWriter() StartChildProcess(WalWriterProcess) > -#define StartWalReceiver() StartChildProcess(WalReceiverProcess) > +#define StartupDataBase() StartChildProcess(PMC_STARTUP) > +#define StartArchiver() StartChildProcess(PMC_ARCHIVER) > +#define StartBackgroundWriter() StartChildProcess(PMC_BGWRITER) > +#define StartCheckpointer() StartChildProcess(PMC_CHECKPOINTER) > +#define StartWalWriter() StartChildProcess(PMC_WAL_WRITER) > +#define StartWalReceiver() StartChildProcess(PMC_WAL_RECEIVER) > + > +#define StartAutoVacLauncher() StartChildProcess(PMC_AV_LAUNCHER); > +#define StartAutoVacWorker() StartChildProcess(PMC_AV_WORKER); Obviously not your fault, but these macros are so pointless... Making it harder to find where we start child processes, all to save a a few characters in one place, while adding considerably more in others. > +void > +BackendMain(char *startup_data, size_t startup_data_len) > +{ Is there any remaining reason for this to live in postmaster.c? Given that other backend types don't, that seems oddly assymmetrical. Greetings, Andres Freund
Hi, On 2023-11-30 12:44:33 -0600, Tristan Partin wrote: > > + /* > > + * Set reference point for stack-depth checking. This might seem > > + * redundant in !EXEC_BACKEND builds; but it's not because the postmaster > > + * launches its children from signal handlers, so we might be running on > > + * an alternative stack. XXX still true? > > + */ > > + (void) set_stack_base(); > > Looks like there is still this XXX left. Can't say I completely understand > the second sentence either. We used to start some child processes of postmaster in signal handlers. That was fixed in commit 7389aad6366 Author: Thomas Munro <tmunro@postgresql.org> Date: 2023-01-12 12:34:23 +1300 Use WaitEventSet API for postmaster's event loop. In some cases signal handlers run on a separate stack, which meant that the set_stack_base() we did in postmaster would yield a completely bogus stack depth estimation. So this comment should likely have been removed. Thomas? Greetings, Andres Freund
On Fri, Dec 1, 2023 at 9:31 AM Andres Freund <andres@anarazel.de> wrote: > On 2023-11-30 12:44:33 -0600, Tristan Partin wrote: > > > + /* > > > + * Set reference point for stack-depth checking. This might seem > > > + * redundant in !EXEC_BACKEND builds; but it's not because the postmaster > > > + * launches its children from signal handlers, so we might be running on > > > + * an alternative stack. XXX still true? > > > + */ > > > + (void) set_stack_base(); > > > > Looks like there is still this XXX left. Can't say I completely understand > > the second sentence either. > > We used to start some child processes of postmaster in signal handlers. That > was fixed in > > commit 7389aad6366 > Author: Thomas Munro <tmunro@postgresql.org> > Date: 2023-01-12 12:34:23 +1300 > > Use WaitEventSet API for postmaster's event loop. > > > In some cases signal handlers run on a separate stack, which meant that the > set_stack_base() we did in postmaster would yield a completely bogus stack > depth estimation. So this comment should likely have been removed. Thomas? Right, I should delete that comment in master and 16. While wondering what to write instead, my first thought is that it is better to leave the actual call there though, because otherwise there is a small difference in stack reference point between EXEC_BACKEND and !EXEC_BACKEND builds, consumed by a few postmaster stack frames. So the new comment would just say that. (I did idly wonder if there is a longjmp trick to zap those frames post-fork, not looked into and probably doesn't really improve anything we care about...)
On 30/11/2023 22:26, Andres Freund wrote: > Aside: Somewhat odd that InitAuxiliaryProcess() doesn't call > InitLWLockAccess(). Yeah that caught my eye too. It seems to have been an oversight in commit 1c6821be31f. Before that, in 9.4, the lwlock stats were printed for aux processes too, on shutdown. Committed a fix for that to master. -- Heikki Linnakangas Neon (https://neon.tech)
On 30/11/2023 22:26, Andres Freund wrote: > On 2023-11-30 01:36:25 +0200, Heikki Linnakangas wrote: >> From a96b6e92fdeaa947bf32774c425419b8f987b8e2 Mon Sep 17 00:00:00 2001 >> From: Heikki Linnakangas <heikki.linnakangas@iki.fi> >> Date: Thu, 30 Nov 2023 00:01:25 +0200 >> Subject: [PATCH v3 1/7] Refactor CreateSharedMemoryAndSemaphores >> >> For clarity, have separate functions for *creating* the shared memory >> and semaphores at postmaster or single-user backend startup, and >> for *attaching* to existing shared memory structures in EXEC_BACKEND >> case. CreateSharedMemoryAndSemaphores() is now called only at >> postmaster startup, and a new AttachSharedMemoryStructs() function is >> called at backend startup in EXEC_BACKEND mode. > > I assume CreateSharedMemoryAndSemaphores() is still called during crash > restart? Yes. > I wonder if it shouldn't split three ways: > 1) create > 2) initialize > 3) attach Why? What would be the difference between create and initialize phases? -- Heikki Linnakangas Neon (https://neon.tech)
Hi, On 2023-12-01 01:36:13 +0200, Heikki Linnakangas wrote: > On 30/11/2023 22:26, Andres Freund wrote: > > On 2023-11-30 01:36:25 +0200, Heikki Linnakangas wrote: > > > From a96b6e92fdeaa947bf32774c425419b8f987b8e2 Mon Sep 17 00:00:00 2001 > > > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > > > Date: Thu, 30 Nov 2023 00:01:25 +0200 > > > Subject: [PATCH v3 1/7] Refactor CreateSharedMemoryAndSemaphores > > > > > > For clarity, have separate functions for *creating* the shared memory > > > and semaphores at postmaster or single-user backend startup, and > > > for *attaching* to existing shared memory structures in EXEC_BACKEND > > > case. CreateSharedMemoryAndSemaphores() is now called only at > > > postmaster startup, and a new AttachSharedMemoryStructs() function is > > > called at backend startup in EXEC_BACKEND mode. > > > > I assume CreateSharedMemoryAndSemaphores() is still called during crash > > restart? > > Yes. > > > I wonder if it shouldn't split three ways: > > 1) create > > 2) initialize > > 3) attach > > Why? What would be the difference between create and initialize phases? Mainly because I somehow mis-remembered how we deal with the shared memory allocation when crashing. I somehow had remembered that we reused the same allocation across restarts, but reinitialized it from scratch. There's a kernel of truth to that, because we can end up re-attaching to an existing sysv shared memory segment. But not more. Perhaps I was confusing things with the listen sockets? Andres
On 30/11/2023 20:44, Tristan Partin wrote: > Patches 1-3 seem committable as-is. Thanks for the review! I'm focusing on patches 1-3 now, and will come back to the rest after committing 1-3. There was one test failure with EXEC_BACKEND from patch 2, in 'test_shm_mq'. In restore_backend_variables() I checked if 'bgw_name' is empty to decide if the BackgroundWorker struct is filled in or not, but it turns out that 'test_shm_mq' doesn't fill in bgw_name. It probably should, I think that's an oversight in 'test_shm_mq', but that's a separate issue. I did some more refactoring of patch 2, to fix that and to improve it in general. The BackgroundWorker struct is now passed through the fork-related functions similarly to the Port struct. That seems more consistent. Attached is new version of these patches. For easier review, I made the new refactorings compared in a new commit 0003. I will squash that before pushing, but this makes it easier to see what changed. Barring any new feedback or issues, I will commit these. -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
Hello Heikki, 01.12.2023 15:10, Heikki Linnakangas wrote: > Attached is new version of these patches. For easier review, I made the new refactorings compared in a new commit > 0003. I will squash that before pushing, but this makes it easier to see what changed. > > Barring any new feedback or issues, I will commit these. > Maybe you could look at issues with running `make check` under Valgrind when server built with CPPFLAGS="-DUSE_VALGRIND -DEXEC_BACKEND": # +++ regress check in src/test/regress +++ # initializing database system by copying initdb template # postmaster failed, examine ".../src/test/regress/log/postmaster.log" for the reason Bail out!make[1]: *** ... 2023-12-01 16:48:39.136 MSK postmaster[1307988] LOG: listening on Unix socket "/tmp/pg_regress-pPFNk0/.s.PGSQL.55312" ==00:00:00:01.692 1259396== Syscall param write(buf) points to uninitialised byte(s) ==00:00:00:01.692 1259396== at 0x5245A37: write (write.c:26) ==00:00:00:01.692 1259396== by 0x51BBF6C: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1180) ==00:00:00:01.692 1259396== by 0x51BC84F: new_do_write (fileops.c:448) ==00:00:00:01.692 1259396== by 0x51BC84F: _IO_new_file_xsputn (fileops.c:1254) ==00:00:00:01.692 1259396== by 0x51BC84F: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1196) ==00:00:00:01.692 1259396== by 0x51B1056: fwrite (iofwrite.c:39) ==00:00:00:01.692 1259396== by 0x552E21: internal_forkexec (postmaster.c:4518) ==00:00:00:01.692 1259396== by 0x5546A1: postmaster_forkexec (postmaster.c:4444) ==00:00:00:01.692 1259396== by 0x55471C: StartChildProcess (postmaster.c:5275) ==00:00:00:01.692 1259396== by 0x557B61: PostmasterMain (postmaster.c:1454) ==00:00:00:01.692 1259396== by 0x472136: main (main.c:198) ==00:00:00:01.692 1259396== Address 0x1ffeffdc11 is on thread 1's stack ==00:00:00:01.692 1259396== in frame #4, created by internal_forkexec (postmaster.c:4482) ==00:00:00:01.692 1259396== With memset(param, 0, sizeof(*param)); added at the beginning of save_backend_variables(), server starts successfully, but then `make check` fails with other Valgrind error. Best regards, Alexander
On Fri Dec 1, 2023 at 6:10 AM CST, Heikki Linnakangas wrote: > On 30/11/2023 20:44, Tristan Partin wrote: > > Patches 1-3 seem committable as-is. > > Thanks for the review! I'm focusing on patches 1-3 now, and will come > back to the rest after committing 1-3. > > There was one test failure with EXEC_BACKEND from patch 2, in > 'test_shm_mq'. In restore_backend_variables() I checked if 'bgw_name' is > empty to decide if the BackgroundWorker struct is filled in or not, but > it turns out that 'test_shm_mq' doesn't fill in bgw_name. It probably > should, I think that's an oversight in 'test_shm_mq', but that's a > separate issue. > > I did some more refactoring of patch 2, to fix that and to improve it in > general. The BackgroundWorker struct is now passed through the > fork-related functions similarly to the Port struct. That seems more > consistent. > > Attached is new version of these patches. For easier review, I made the > new refactorings compared in a new commit 0003. I will squash that > before pushing, but this makes it easier to see what changed. > > Barring any new feedback or issues, I will commit these. My only thought is that perhaps has_bg_worker is a better name than has_worker, but I agree that having a flag is better than checking bgw_name. -- Tristan Partin Neon (https://neon.tech)
On 01/12/2023 16:00, Alexander Lakhin wrote: > Maybe you could look at issues with running `make check` under Valgrind > when server built with CPPFLAGS="-DUSE_VALGRIND -DEXEC_BACKEND": > # +++ regress check in src/test/regress +++ > # initializing database system by copying initdb template > # postmaster failed, examine ".../src/test/regress/log/postmaster.log" for the reason > Bail out!make[1]: *** > > ... > 2023-12-01 16:48:39.136 MSK postmaster[1307988] LOG: listening on Unix socket "/tmp/pg_regress-pPFNk0/.s.PGSQL.55312" > ==00:00:00:01.692 1259396== Syscall param write(buf) points to uninitialised byte(s) > ==00:00:00:01.692 1259396== at 0x5245A37: write (write.c:26) > ==00:00:00:01.692 1259396== by 0x51BBF6C: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1180) > ==00:00:00:01.692 1259396== by 0x51BC84F: new_do_write (fileops.c:448) > ==00:00:00:01.692 1259396== by 0x51BC84F: _IO_new_file_xsputn (fileops.c:1254) > ==00:00:00:01.692 1259396== by 0x51BC84F: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1196) > ==00:00:00:01.692 1259396== by 0x51B1056: fwrite (iofwrite.c:39) > ==00:00:00:01.692 1259396== by 0x552E21: internal_forkexec (postmaster.c:4518) > ==00:00:00:01.692 1259396== by 0x5546A1: postmaster_forkexec (postmaster.c:4444) > ==00:00:00:01.692 1259396== by 0x55471C: StartChildProcess (postmaster.c:5275) > ==00:00:00:01.692 1259396== by 0x557B61: PostmasterMain (postmaster.c:1454) > ==00:00:00:01.692 1259396== by 0x472136: main (main.c:198) > ==00:00:00:01.692 1259396== Address 0x1ffeffdc11 is on thread 1's stack > ==00:00:00:01.692 1259396== in frame #4, created by internal_forkexec (postmaster.c:4482) > ==00:00:00:01.692 1259396== > > With memset(param, 0, sizeof(*param)); added at the beginning of > save_backend_variables(), server starts successfully, but then > `make check` fails with other Valgrind error. That's actually a pre-existing issue, I'm seeing that even on unpatched 'master'. In a nutshell, the problem is that the uninitialized padding bytes in BackendParameters are written to the file. When we read the file back, we don't access the padding bytes, so that's harmless. But Valgrind doesn't know that. On Windows, the file is created with CreateFileMapping(INVALID_HANDLE_VALUE, ...) and we write the variables directly to the mapping. If I understand the Windows API docs correctly, it is guaranteed to be initialized to zeros, so we don't have this problem on Windows, only on other platforms with EXEC_BACKEND. I think it makes sense to clear the memory on other platforms too, since that's what we do on Windows. Committed a fix with memset(). I'm not sure what our policy with backpatching this kind of issues is. This goes back to all supported versions, but given the lack of complaints, I chose to not backpatch. -- Heikki Linnakangas Neon (https://neon.tech)
On Fri Dec 1, 2023 at 2:44 PM CST, Heikki Linnakangas wrote: > On 01/12/2023 16:00, Alexander Lakhin wrote: > > Maybe you could look at issues with running `make check` under Valgrind > > when server built with CPPFLAGS="-DUSE_VALGRIND -DEXEC_BACKEND": > > # +++ regress check in src/test/regress +++ > > # initializing database system by copying initdb template > > # postmaster failed, examine ".../src/test/regress/log/postmaster.log" for the reason > > Bail out!make[1]: *** > > > > ... > > 2023-12-01 16:48:39.136 MSK postmaster[1307988] LOG: listening on Unix socket "/tmp/pg_regress-pPFNk0/.s.PGSQL.55312" > > ==00:00:00:01.692 1259396== Syscall param write(buf) points to uninitialised byte(s) > > ==00:00:00:01.692 1259396== at 0x5245A37: write (write.c:26) > > ==00:00:00:01.692 1259396== by 0x51BBF6C: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1180) > > ==00:00:00:01.692 1259396== by 0x51BC84F: new_do_write (fileops.c:448) > > ==00:00:00:01.692 1259396== by 0x51BC84F: _IO_new_file_xsputn (fileops.c:1254) > > ==00:00:00:01.692 1259396== by 0x51BC84F: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1196) > > ==00:00:00:01.692 1259396== by 0x51B1056: fwrite (iofwrite.c:39) > > ==00:00:00:01.692 1259396== by 0x552E21: internal_forkexec (postmaster.c:4518) > > ==00:00:00:01.692 1259396== by 0x5546A1: postmaster_forkexec (postmaster.c:4444) > > ==00:00:00:01.692 1259396== by 0x55471C: StartChildProcess (postmaster.c:5275) > > ==00:00:00:01.692 1259396== by 0x557B61: PostmasterMain (postmaster.c:1454) > > ==00:00:00:01.692 1259396== by 0x472136: main (main.c:198) > > ==00:00:00:01.692 1259396== Address 0x1ffeffdc11 is on thread 1's stack > > ==00:00:00:01.692 1259396== in frame #4, created by internal_forkexec (postmaster.c:4482) > > ==00:00:00:01.692 1259396== > > > > With memset(param, 0, sizeof(*param)); added at the beginning of > > save_backend_variables(), server starts successfully, but then > > `make check` fails with other Valgrind error. > > That's actually a pre-existing issue, I'm seeing that even on unpatched > 'master'. > > In a nutshell, the problem is that the uninitialized padding bytes in > BackendParameters are written to the file. When we read the file back, > we don't access the padding bytes, so that's harmless. But Valgrind > doesn't know that. > > On Windows, the file is created with > CreateFileMapping(INVALID_HANDLE_VALUE, ...) and we write the variables > directly to the mapping. If I understand the Windows API docs correctly, > it is guaranteed to be initialized to zeros, so we don't have this > problem on Windows, only on other platforms with EXEC_BACKEND. I think > it makes sense to clear the memory on other platforms too, since that's > what we do on Windows. > > Committed a fix with memset(). I'm not sure what our policy with > backpatching this kind of issues is. This goes back to all supported > versions, but given the lack of complaints, I chose to not backpatch. Seems like a harmless think to backpatch. It is conceivable that someone might want to run Valgrind on something other than HEAD. -- Tristan Partin Neon (https://neon.tech)
"Tristan Partin" <tristan@neon.tech> writes: > On Fri Dec 1, 2023 at 2:44 PM CST, Heikki Linnakangas wrote: >> Committed a fix with memset(). I'm not sure what our policy with >> backpatching this kind of issues is. This goes back to all supported >> versions, but given the lack of complaints, I chose to not backpatch. > Seems like a harmless think to backpatch. It is conceivable that someone > might want to run Valgrind on something other than HEAD. FWIW, I agree with Heikki's conclusion. EXEC_BACKEND on non-Windows is already a niche developer-only setup, and given the lack of complaints, nobody's that interested in running Valgrind with it. Fixing it on HEAD seems like plenty. regards, tom lane
Hello Heikki, 01.12.2023 23:44, Heikki Linnakangas wrote: > >> With memset(param, 0, sizeof(*param)); added at the beginning of >> save_backend_variables(), server starts successfully, but then >> `make check` fails with other Valgrind error. > > That's actually a pre-existing issue, I'm seeing that even on unpatched 'master'. Thank you for fixing that! Yes, I had discovered it before, but yesterday I decided to check whether your patches improve the situation... What bothered me additionally, is an error detected after server start. I couldn't see it without the patches applied. I mean, on HEAD I now see `make check` passing, but with the patches it fails: ... # parallel group (20 tests): interval date numerology polygon box macaddr8 macaddr multirangetypes line timestamp timetz timestamptz time circle strings lseg inet md5 path point not ok 22 + strings 1048 ms # (test process exited with exit code 2) not ok 23 + md5 1052 ms # (test process exited with exit code 2) ... src/test/regress/log/postmaster.log contains: ==00:00:00:30.730 1713480== Syscall param write(buf) points to uninitialised byte(s) ==00:00:00:30.730 1713480== at 0x5245A37: write (write.c:26) ==00:00:00:30.730 1713480== by 0x51BBF6C: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1180) ==00:00:00:30.730 1713480== by 0x51BC84F: new_do_write (fileops.c:448) ==00:00:00:30.730 1713480== by 0x51BC84F: _IO_new_file_xsputn (fileops.c:1254) ==00:00:00:30.730 1713480== by 0x51BC84F: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1196) ==00:00:00:30.730 1713480== by 0x51B1056: fwrite (iofwrite.c:39) ==00:00:00:30.730 1713480== by 0x5540CF: internal_forkexec (postmaster.c:4526) ==00:00:00:30.730 1713480== by 0x5543C0: bgworker_forkexec (postmaster.c:5624) ==00:00:00:30.730 1713480== by 0x555477: do_start_bgworker (postmaster.c:5665) ==00:00:00:30.730 1713480== by 0x555738: maybe_start_bgworkers (postmaster.c:5928) ==00:00:00:30.730 1713480== by 0x556072: process_pm_pmsignal (postmaster.c:5080) ==00:00:00:30.730 1713480== by 0x556610: ServerLoop (postmaster.c:1761) ==00:00:00:30.730 1713480== by 0x557BE2: PostmasterMain (postmaster.c:1469) ==00:00:00:30.730 1713480== by 0x47216B: main (main.c:198) ==00:00:00:30.730 1713480== Address 0x1ffeffd8c0 is on thread 1's stack ==00:00:00:30.730 1713480== in frame #4, created by internal_forkexec (postmaster.c:4482) ==00:00:00:30.730 1713480== ... 2023-12-02 05:14:30.751 MSK client backend[1713740] pg_regress/rangetypes FATAL: terminating connection due to unexpected postmaster exit 2023-12-02 05:14:31.033 MSK client backend[1713734] pg_regress/numeric FATAL: postmaster exited during a parallel transaction TRAP: failed Assert("!IsTransactionOrTransactionBlock()"), File: "pgstat.c", Line: 591, PID: 1713734 I haven't looked deeper yet, but it seems that we see two issues here (and Assert is not directly caused by the patches set.) Best regards, Alexander
On 02/12/2023 05:00, Alexander Lakhin wrote: > What bothered me additionally, is an error detected after server start. I > couldn't see it without the patches applied. I mean, on HEAD I now see > `make check` passing, but with the patches it fails: > ... > # parallel group (20 tests): interval date numerology polygon box macaddr8 macaddr multirangetypes line timestamp > timetz timestamptz time circle strings lseg inet md5 path point > not ok 22 + strings 1048 ms > # (test process exited with exit code 2) > not ok 23 + md5 1052 ms > # (test process exited with exit code 2) > ... > src/test/regress/log/postmaster.log contains: > ==00:00:00:30.730 1713480== Syscall param write(buf) points to uninitialised byte(s) > ==00:00:00:30.730 1713480== at 0x5245A37: write (write.c:26) > ==00:00:00:30.730 1713480== by 0x51BBF6C: _IO_file_write@@GLIBC_2.2.5 (fileops.c:1180) > ==00:00:00:30.730 1713480== by 0x51BC84F: new_do_write (fileops.c:448) > ==00:00:00:30.730 1713480== by 0x51BC84F: _IO_new_file_xsputn (fileops.c:1254) > ==00:00:00:30.730 1713480== by 0x51BC84F: _IO_file_xsputn@@GLIBC_2.2.5 (fileops.c:1196) > ==00:00:00:30.730 1713480== by 0x51B1056: fwrite (iofwrite.c:39) > ==00:00:00:30.730 1713480== by 0x5540CF: internal_forkexec (postmaster.c:4526) > ==00:00:00:30.730 1713480== by 0x5543C0: bgworker_forkexec (postmaster.c:5624) > ==00:00:00:30.730 1713480== by 0x555477: do_start_bgworker (postmaster.c:5665) > ==00:00:00:30.730 1713480== by 0x555738: maybe_start_bgworkers (postmaster.c:5928) > ==00:00:00:30.730 1713480== by 0x556072: process_pm_pmsignal (postmaster.c:5080) > ==00:00:00:30.730 1713480== by 0x556610: ServerLoop (postmaster.c:1761) > ==00:00:00:30.730 1713480== by 0x557BE2: PostmasterMain (postmaster.c:1469) > ==00:00:00:30.730 1713480== by 0x47216B: main (main.c:198) > ==00:00:00:30.730 1713480== Address 0x1ffeffd8c0 is on thread 1's stack > ==00:00:00:30.730 1713480== in frame #4, created by internal_forkexec (postmaster.c:4482) > ==00:00:00:30.730 1713480== > ... Ack, I see this too. I fixed it by adding MCXT_ALLOC_ZERO to the allocation of the BackendWorker struct. That's a little heavy-handed, like with the previous failures the uninitialized padding bytes are written to the file and read back, but not accessed after that. But it seems like the simplest fix. This isn't performance critical after all. I also renamed the 'has_worker' field to 'has_bgworker', per Tristan's suggestion. Pushed with those changes. Thanks for the reviews! > 2023-12-02 05:14:30.751 MSK client backend[1713740] pg_regress/rangetypes FATAL: terminating connection due to > unexpected postmaster exit > 2023-12-02 05:14:31.033 MSK client backend[1713734] pg_regress/numeric FATAL: postmaster exited during a parallel > transaction > TRAP: failed Assert("!IsTransactionOrTransactionBlock()"), File: "pgstat.c", Line: 591, PID: 1713734 > > I haven't looked deeper yet, but it seems that we see two issues here (and > Assert is not directly caused by the patches set.) I have not been able to reproduce this one. -- Heikki Linnakangas Neon (https://neon.tech)
On 30/11/2023 22:26, Andres Freund wrote: > On 2023-11-30 01:36:25 +0200, Heikki Linnakangas wrote: >> [...] >> 33 files changed, 1787 insertions(+), 2002 deletions(-) > > Well, that's not small... > > I think it may be worth splitting some of the file renaming out into a > separate commit, makes it harder to see what change here. Here you are (details at end of this email) >> + [PMC_AV_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true}, >> + [PMC_AV_WORKER] = {"autovacuum worker", AutoVacWorkerMain, true}, >> + [PMC_BGWORKER] = {"bgworker", BackgroundWorkerMain, true}, >> + [PMC_SYSLOGGER] = {"syslogger", SysLoggerMain, false}, >> + >> + [PMC_STARTUP] = {"startup", StartupProcessMain, true}, >> + [PMC_BGWRITER] = {"bgwriter", BackgroundWriterMain, true}, >> + [PMC_ARCHIVER] = {"archiver", PgArchiverMain, true}, >> + [PMC_CHECKPOINTER] = {"checkpointer", CheckpointerMain, true}, >> + [PMC_WAL_WRITER] = {"wal_writer", WalWriterMain, true}, >> + [PMC_WAL_RECEIVER] = {"wal_receiver", WalReceiverMain, true}, >> +}; > > > It feels like we have too many different ways of documenting the type of a > process. This new PMC_ stuff, enum AuxProcType, enum BackendType. Agreed. And "am_walsender" and such variables. > Which then leads to code like this: > >> -CheckpointerMain(void) >> +CheckpointerMain(char *startup_data, size_t startup_data_len) >> { >> sigjmp_buf local_sigjmp_buf; >> MemoryContext checkpointer_context; >> >> + Assert(startup_data_len == 0); >> + >> + MyAuxProcType = CheckpointerProcess; >> + MyBackendType = B_CHECKPOINTER; >> + AuxiliaryProcessInit(); >> + > > For each type of child process. That seems a bit too redundant. Can't we > unify this at least somewhat? Can't we just reuse BackendType here? Sure, > there'd be pointless entry for B_INVALID, but that doesn't seem like a > problem, could even be useful, by pointing it to a function raising an error. There are a few differences: B_INVALID (and B_STANDALONE_BACKEND) are pointless for this array as you noted. But also, we don't know if the backend is a regular backend or WAL sender until authentication, so for a WAL sender, we'd need to change MyBackendType from B_BACKEND to B_WAL_SENDER after forking. Maybe that's ok. I didn't do anything about this yet, but I'll give it some more thought. >> + if (strncmp(argv[1], "--forkchild=", 12) != 0) >> + elog(FATAL, "invalid subpostmaster invocation (--forkchild argument missing)"); >> + entry_name = argv[1] + 12; >> + found = false; >> + for (int idx = 0; idx < lengthof(entry_kinds); idx++) >> + { >> + if (strcmp(entry_kinds[idx].name, entry_name) == 0) >> + { >> + child_type = idx; >> + found = true; >> + break; >> + } >> + } >> + if (!found) >> + elog(ERROR, "unknown child kind %s", entry_name); > > If we then have to search linearly, why don't we just pass the index into the > array? We could. I like the idea of a human-readable name on the command line, although I'm not sure if it's really visible anywhere. >> +void >> +BackendMain(char *startup_data, size_t startup_data_len) >> +{ > > Is there any remaining reason for this to live in postmaster.c? Given that > other backend types don't, that seems oddly assymmetrical. Gee, another yak to shave, thanks ;-). You're right, that makes a lot of sense. I added another patch that moves that to a new file, src/backend/tcop/backend_startup.c. ProcessStartupPacket() and friends go there too. It might make sense to do this before the other patches, but it's the last patch in the series now. I kept processCancelRequest() in postmaster.c because it looks at BackendList/ShmemBackendArray, which are static in postmaster.c. Some more refactoring might be in order there, perhaps moving those to a different file too. But that can be done separately, this split is pretty OK as is. On 30/11/2023 20:44, Tristan Partin wrote: >> From 8886db1ed6bae21bf6d77c9bb1230edbb55e24f9 Mon Sep 17 00:00:00 2001 >> From: Heikki Linnakangas <heikki.linnakangas@iki.fi> >> Date: Thu, 30 Nov 2023 00:04:22 +0200 >> Subject: [PATCH v3 4/7] Pass CAC as argument to backend process > > For me, being new to the code, it would be nice to have more of an > explanation as to why this is "better." I don't doubt it; it would just > help me and future readers of this commit in the future. More of an > explanation in the commit message would suffice. Updated the commit message. It's mainly to pave the way for the next patches, which move the initialization of Port to the backend process, after forking. And that in turn paves the way for the patches after that. But also, very subjectively, it feels more natural to me. > My other comment on this commit is that we now seem to have lost the > context on what CAC stands for. Before we had the member variable to > explain it. A comment on the enum would be great or changing cac named > variables to canAcceptConnections. I did notice in patch 7 that there > are still some variables named canAcceptConnections around, so I'll > leave this comment up to you. Good point. The last patch in this series - which is new compared to previous patch version - moves CAC_state to a different header file again. I added a comment there. >> + if (fwrite(param, paramsz, 1, fp) != 1) >> + { >> + ereport(LOG, >> + (errcode_for_file_access(), >> + errmsg("could not write to file \"%s\": %m", tmpfilename))); >> + FreeFile(fp); >> + return -1; >> + } >> + >> + /* Release file */ >> + if (FreeFile(fp)) >> + { >> + ereport(LOG, >> + (errcode_for_file_access(), >> + errmsg("could not write to file \"%s\": %m", tmpfilename))); >> + return -1; >> + } > > Two pieces of feedback here. I generally find write(2) more useful than > fwrite(3) because write(2) will report a useful errno, whereas fwrite(2) > just uses ferror(3). The additional errno information might be valuable > context in the log message. Up to you if you think it is also valuable. In general I agree. This patch just moves existing code though, so I left it as is. > The log message if FreeFile() fails doesn't seem to make sense to me. > I didn't see any file writing in that code path, but it is possible that > I missed something. FreeFile() calls fclose(), which flushes the buffer. If fclose() fails, it's most likely because the write() to flush the buffer failed, so "could not write" is usually appropriate. (It feels ugly to me too, error handling with the buffered i/o functions is a bit messy. As you said, plain open()/write() is more clear.) >> + /* >> + * Need to reinitialize the SSL library in the backend, since the context >> + * structures contain function pointers and cannot be passed through the >> + * parameter file. >> + * >> + * If for some reason reload fails (maybe the user installed broken key >> + * files), soldier on without SSL; that's better than all connections >> + * becoming impossible. >> + * >> + * XXX should we do this in all child processes? For the moment it's >> + * enough to do it in backend children. >> + */ >> +#ifdef USE_SSL >> + if (EnableSSL) >> + { >> + if (secure_initialize(false) == 0) >> + LoadedSSL = true; >> + else >> + ereport(LOG, >> + (errmsg("SSL configuration could not be loaded in child process"))); >> + } >> +#endif > > Do other child process types do any non-local communication? No. Although in theory an extension-defined background worker could do whatever, including opening TLS connections. It's not clear if such a background worker would want the same initialization that we do in secure_initialize(), or something else. Here is a new patch set: > v5-0001-Pass-CAC-as-argument-to-backend-process.patch > v5-0002-Remove-ConnCreate-and-ConnFree-and-allocate-Port-.patch > v5-0003-Move-initialization-of-Port-struct-to-child-proce.patch These patches form a pretty well-contained unit. The gist is to move the initialization of the Port struct to after forking the backend process (in patch 3). I plan to polish and commit these next, so any final reviews on these are welcome. > v5-0004-Extract-registration-of-Win32-deadchild-callback-.patch > v5-0005-Move-some-functions-from-postmaster.c-to-new-sour.patch > v5-0006-Refactor-AuxProcess-startup.patch > v5-0007-Refactor-postmaster-child-process-launching.patch Patches 4-6 are refactorings that don't do much good on their own, but they help to make patch 7 much smaller and easier to review. I left out some of the code-moving that I had in previous patch versions: - Previously I moved fork_process() function from fork_process.c to the new launch_backend.c file. That might still make sense, there is nothing else in fork_process.c and the only caller is in launch_backend.c. But I'm not sure, and it can be done separately. - Previously I moved InitPostmasterChild from miscinit.c to the new launch_backend.c file. That might also still make sense, but I'm not 100% sure it's an improvement, and it can be done later if we want to. > v5-0008-Move-code-for-backend-startup-to-separate-file.patch This moves BackendMain() and friends from postmaster.c to a new file, per Andres's suggestion. -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
- v5-0001-Pass-CAC-as-argument-to-backend-process.patch
- v5-0002-Remove-ConnCreate-and-ConnFree-and-allocate-Port-.patch
- v5-0003-Move-initialization-of-Port-struct-to-child-proce.patch
- v5-0004-Extract-registration-of-Win32-deadchild-callback-.patch
- v5-0005-Move-some-functions-from-postmaster.c-to-new-sour.patch
- v5-0006-Refactor-AuxProcess-startup.patch
- v5-0007-Refactor-postmaster-child-process-launching.patch
- v5-0008-Move-code-for-backend-startup-to-separate-file.patch
On 08/12/2023 14:33, Heikki Linnakangas wrote: >>> + [PMC_AV_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true}, >>> + [PMC_AV_WORKER] = {"autovacuum worker", AutoVacWorkerMain, true}, >>> + [PMC_BGWORKER] = {"bgworker", BackgroundWorkerMain, true}, >>> + [PMC_SYSLOGGER] = {"syslogger", SysLoggerMain, false}, >>> + >>> + [PMC_STARTUP] = {"startup", StartupProcessMain, true}, >>> + [PMC_BGWRITER] = {"bgwriter", BackgroundWriterMain, true}, >>> + [PMC_ARCHIVER] = {"archiver", PgArchiverMain, true}, >>> + [PMC_CHECKPOINTER] = {"checkpointer", CheckpointerMain, true}, >>> + [PMC_WAL_WRITER] = {"wal_writer", WalWriterMain, true}, >>> + [PMC_WAL_RECEIVER] = {"wal_receiver", WalReceiverMain, true}, >>> +}; >> >> It feels like we have too many different ways of documenting the type of a >> process. This new PMC_ stuff, enum AuxProcType, enum BackendType. > Agreed. And "am_walsender" and such variables. Here's a patch that gets rid of AuxProcType. It's independent of the other patches in this thread; if this is committed, I'll rebase the rest of the patches over this and get rid of the new PMC_* enum. Three patches, actually. The first one fixes an existing comment that I noticed to be incorrect while working on this. I'll push that soon, barring objections. The second one gets rid of AuxProcType, and the third one replaces IsBackgroundWorker, IsAutoVacuumLauncherProcess() and IsAutoVacuumWorkerProcess() with checks on MyBackendType. So MyBackendType is now the primary way to check what kind of a process the current process is. 'am_walsender' would also be fairly straightforward to remove and replace with AmWalSenderProcess(). I didn't do that because we also have am_db_walsender and am_cascading_walsender which cannot be directly replaced with MyBackendType. Given that, it might be best to keep am_walsender for symmetry. -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
Hi, On 2024-01-10 14:35:52 +0200, Heikki Linnakangas wrote: > Here's a patch that gets rid of AuxProcType. It's independent of the other > patches in this thread; if this is committed, I'll rebase the rest of the > patches over this and get rid of the new PMC_* enum. > > Three patches, actually. The first one fixes an existing comment that I > noticed to be incorrect while working on this. I'll push that soon, barring > objections. The second one gets rid of AuxProcType, and the third one > replaces IsBackgroundWorker, IsAutoVacuumLauncherProcess() and > IsAutoVacuumWorkerProcess() with checks on MyBackendType. So MyBackendType > is now the primary way to check what kind of a process the current process > is. > > 'am_walsender' would also be fairly straightforward to remove and replace > with AmWalSenderProcess(). I didn't do that because we also have > am_db_walsender and am_cascading_walsender which cannot be directly replaced > with MyBackendType. Given that, it might be best to keep am_walsender for > symmetry. > @@ -561,13 +561,13 @@ static void ShmemBackendArrayAdd(Backend *bn); > static void ShmemBackendArrayRemove(Backend *bn); > #endif /* EXEC_BACKEND */ > > -#define StartupDataBase() StartChildProcess(StartupProcess) > -#define StartArchiver() StartChildProcess(ArchiverProcess) > -#define StartBackgroundWriter() StartChildProcess(BgWriterProcess) > -#define StartCheckpointer() StartChildProcess(CheckpointerProcess) > -#define StartWalWriter() StartChildProcess(WalWriterProcess) > -#define StartWalReceiver() StartChildProcess(WalReceiverProcess) > -#define StartWalSummarizer() StartChildProcess(WalSummarizerProcess) > +#define StartupDataBase() StartChildProcess(B_STARTUP) > +#define StartArchiver() StartChildProcess(B_ARCHIVER) > +#define StartBackgroundWriter() StartChildProcess(B_BG_WRITER) > +#define StartCheckpointer() StartChildProcess(B_CHECKPOINTER) > +#define StartWalWriter() StartChildProcess(B_WAL_WRITER) > +#define StartWalReceiver() StartChildProcess(B_WAL_RECEIVER) > +#define StartWalSummarizer() StartChildProcess(B_WAL_SUMMARIZER) Not for this commit, but we ought to rip out these macros - all they do is to make it harder to understand the code. > @@ -5344,31 +5344,31 @@ StartChildProcess(AuxProcType type) > errno = save_errno; > switch (type) > { > - case StartupProcess: > + case B_STARTUP: > ereport(LOG, > (errmsg("could not fork startup process: %m"))); > break; > - case ArchiverProcess: > + case B_ARCHIVER: > ereport(LOG, > (errmsg("could not fork archiver process: %m"))); > break; > - case BgWriterProcess: > + case B_BG_WRITER: > ereport(LOG, > (errmsg("could not fork background writer process: %m"))); > break; > - case CheckpointerProcess: > + case B_CHECKPOINTER: > ereport(LOG, > (errmsg("could not fork checkpointer process: %m"))); > break; > - case WalWriterProcess: > + case B_WAL_WRITER: > ereport(LOG, > (errmsg("could not fork WAL writer process: %m"))); > break; > - case WalReceiverProcess: > + case B_WAL_RECEIVER: > ereport(LOG, > (errmsg("could not fork WAL receiver process: %m"))); > break; > - case WalSummarizerProcess: > + case B_WAL_SUMMARIZER: > ereport(LOG, > (errmsg("could not fork WAL summarizer process: %m"))); > break; Seems we should replace this with something slightly more generic one of these days... > diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c > index 1a1050c8da1..92f24db4e18 100644 > --- a/src/backend/utils/activity/backend_status.c > +++ b/src/backend/utils/activity/backend_status.c > @@ -257,17 +257,16 @@ pgstat_beinit(void) > else > { > /* Must be an auxiliary process */ > - Assert(MyAuxProcType != NotAnAuxProcess); > + Assert(IsAuxProcess(MyBackendType)); > > /* > * Assign the MyBEEntry for an auxiliary process. Since it doesn't > * have a BackendId, the slot is statically allocated based on the > - * auxiliary process type (MyAuxProcType). Backends use slots indexed > - * in the range from 0 to MaxBackends (exclusive), so we use > - * MaxBackends + AuxProcType as the index of the slot for an auxiliary > - * process. > + * auxiliary process type. Backends use slots indexed in the range > + * from 0 to MaxBackends (exclusive), and aux processes use the slots > + * after that. > */ > - MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType]; > + MyBEEntry = &BackendStatusArray[MaxBackends + MyBackendType - FIRST_AUX_PROC]; > } Hm, this seems less than pretty. It's probably ok for now, but it seems like a better fix might be to just start assigning backend ids to aux procs or switch to indexing by pgprocno? > From 795929a5f5a5d6ea4fa8a46bb15c68d2ff46ad3d Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Wed, 10 Jan 2024 12:59:48 +0200 > Subject: [PATCH v6 3/3] Use MyBackendType in more places to check what process > this is > > Remove IsBackgroundWorker, IsAutoVacuumLauncherProcess() and > IsAutoVacuumWorkerProcess() in favor of new Am*Process() macros that > use MyBackendType. For consistency with the existing Am*Process() > macros. The Am*Process() macros aren't realy new, they are just implemented differently, right? I guess there are a few more of them now. Given that we are probably going to have more process types in the future, it seems like a better direction would be a AmProcessType(proctype) style macro/inline function. That we we don't have to mirror the list of process types in the enum and a set of macros. Greetings, Andres Freund
On 22/01/2024 23:07, Andres Freund wrote: > On 2024-01-10 14:35:52 +0200, Heikki Linnakangas wrote: >> @@ -5344,31 +5344,31 @@ StartChildProcess(AuxProcType type) >> errno = save_errno; >> switch (type) >> { >> - case StartupProcess: >> + case B_STARTUP: >> ereport(LOG, >> (errmsg("could not fork startup process: %m"))); >> break; >> - case ArchiverProcess: >> + case B_ARCHIVER: >> ereport(LOG, >> (errmsg("could not fork archiver process: %m"))); >> break; >> - case BgWriterProcess: >> + case B_BG_WRITER: >> ereport(LOG, >> (errmsg("could not fork background writer process: %m"))); >> break; >> - case CheckpointerProcess: >> + case B_CHECKPOINTER: >> ereport(LOG, >> (errmsg("could not fork checkpointer process: %m"))); >> break; >> - case WalWriterProcess: >> + case B_WAL_WRITER: >> ereport(LOG, >> (errmsg("could not fork WAL writer process: %m"))); >> break; >> - case WalReceiverProcess: >> + case B_WAL_RECEIVER: >> ereport(LOG, >> (errmsg("could not fork WAL receiver process: %m"))); >> break; >> - case WalSummarizerProcess: >> + case B_WAL_SUMMARIZER: >> ereport(LOG, >> (errmsg("could not fork WAL summarizer process: %m"))); >> break; > > Seems we should replace this with something slightly more generic one of these > days... The later patches in this thread will turn these into ereport(LOG, (errmsg("could not fork %s process: %m", PostmasterChildName(type)))); >> diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c >> index 1a1050c8da1..92f24db4e18 100644 >> --- a/src/backend/utils/activity/backend_status.c >> +++ b/src/backend/utils/activity/backend_status.c >> @@ -257,17 +257,16 @@ pgstat_beinit(void) >> else >> { >> /* Must be an auxiliary process */ >> - Assert(MyAuxProcType != NotAnAuxProcess); >> + Assert(IsAuxProcess(MyBackendType)); >> >> /* >> * Assign the MyBEEntry for an auxiliary process. Since it doesn't >> * have a BackendId, the slot is statically allocated based on the >> - * auxiliary process type (MyAuxProcType). Backends use slots indexed >> - * in the range from 0 to MaxBackends (exclusive), so we use >> - * MaxBackends + AuxProcType as the index of the slot for an auxiliary >> - * process. >> + * auxiliary process type. Backends use slots indexed in the range >> + * from 0 to MaxBackends (exclusive), and aux processes use the slots >> + * after that. >> */ >> - MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType]; >> + MyBEEntry = &BackendStatusArray[MaxBackends + MyBackendType - FIRST_AUX_PROC]; >> } > > Hm, this seems less than pretty. It's probably ok for now, but it seems like a > better fix might be to just start assigning backend ids to aux procs or switch > to indexing by pgprocno? Using pgprocno is a good idea. Come to think of it, why do we even have a concept of backend ID that's separate from pgprocno? backend ID is used to index the ProcState array, but AFAICS we could use pgprocno as the index to that, too. -- Heikki Linnakangas Neon (https://neon.tech)
Hi, On 2024-01-23 21:07:08 +0200, Heikki Linnakangas wrote: > On 22/01/2024 23:07, Andres Freund wrote: > > > diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c > > > index 1a1050c8da1..92f24db4e18 100644 > > > --- a/src/backend/utils/activity/backend_status.c > > > +++ b/src/backend/utils/activity/backend_status.c > > > @@ -257,17 +257,16 @@ pgstat_beinit(void) > > > else > > > { > > > /* Must be an auxiliary process */ > > > - Assert(MyAuxProcType != NotAnAuxProcess); > > > + Assert(IsAuxProcess(MyBackendType)); > > > /* > > > * Assign the MyBEEntry for an auxiliary process. Since it doesn't > > > * have a BackendId, the slot is statically allocated based on the > > > - * auxiliary process type (MyAuxProcType). Backends use slots indexed > > > - * in the range from 0 to MaxBackends (exclusive), so we use > > > - * MaxBackends + AuxProcType as the index of the slot for an auxiliary > > > - * process. > > > + * auxiliary process type. Backends use slots indexed in the range > > > + * from 0 to MaxBackends (exclusive), and aux processes use the slots > > > + * after that. > > > */ > > > - MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType]; > > > + MyBEEntry = &BackendStatusArray[MaxBackends + MyBackendType - FIRST_AUX_PROC]; > > > } > > > > Hm, this seems less than pretty. It's probably ok for now, but it seems like a > > better fix might be to just start assigning backend ids to aux procs or switch > > to indexing by pgprocno? > > Using pgprocno is a good idea. Come to think of it, why do we even have a > concept of backend ID that's separate from pgprocno? backend ID is used to > index the ProcState array, but AFAICS we could use pgprocno as the index to > that, too. I think we should do that. There are a few processes not participating in sinval, but it doesn't make enough of a difference to make sinval slower. And I think there'd be far bigger efficiency improvements to sinvaladt than not having a handful more entries. Greetings, Andres Freund
On 23/01/2024 21:50, Andres Freund wrote: > On 2024-01-23 21:07:08 +0200, Heikki Linnakangas wrote: >> On 22/01/2024 23:07, Andres Freund wrote: >>>> diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c >>>> index 1a1050c8da1..92f24db4e18 100644 >>>> --- a/src/backend/utils/activity/backend_status.c >>>> +++ b/src/backend/utils/activity/backend_status.c >>>> @@ -257,17 +257,16 @@ pgstat_beinit(void) >>>> else >>>> { >>>> /* Must be an auxiliary process */ >>>> - Assert(MyAuxProcType != NotAnAuxProcess); >>>> + Assert(IsAuxProcess(MyBackendType)); >>>> /* >>>> * Assign the MyBEEntry for an auxiliary process. Since it doesn't >>>> * have a BackendId, the slot is statically allocated based on the >>>> - * auxiliary process type (MyAuxProcType). Backends use slots indexed >>>> - * in the range from 0 to MaxBackends (exclusive), so we use >>>> - * MaxBackends + AuxProcType as the index of the slot for an auxiliary >>>> - * process. >>>> + * auxiliary process type. Backends use slots indexed in the range >>>> + * from 0 to MaxBackends (exclusive), and aux processes use the slots >>>> + * after that. >>>> */ >>>> - MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType]; >>>> + MyBEEntry = &BackendStatusArray[MaxBackends + MyBackendType - FIRST_AUX_PROC]; >>>> } >>> >>> Hm, this seems less than pretty. It's probably ok for now, but it seems like a >>> better fix might be to just start assigning backend ids to aux procs or switch >>> to indexing by pgprocno? >> >> Using pgprocno is a good idea. Come to think of it, why do we even have a >> concept of backend ID that's separate from pgprocno? backend ID is used to >> index the ProcState array, but AFAICS we could use pgprocno as the index to >> that, too. > > I think we should do that. There are a few processes not participating in > sinval, but it doesn't make enough of a difference to make sinval slower. And > I think there'd be far bigger efficiency improvements to sinvaladt than not > having a handful more entries. And here we go. BackendID is now a 1-based index directly into the PGPROC array. -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
On Thu, 2024-01-25 at 01:51 +0200, Heikki Linnakangas wrote: > > And here we go. BackendID is now a 1-based index directly into the > PGPROC array. > Would it be worthwhile to also note in this comment FIRST_AUX_PROC's and IsAuxProcess()'s dependency on B_ARCHIVER and it's location in the enum table? /* ¦* Auxiliary processes. These have PGPROC entries, but they are not ¦* attached to any particular database. There can be only one of each of ¦* these running at a time. ¦* ¦* If you modify these, make sure to update NUM_AUXILIARY_PROCS and the ¦* glossary in the docs. ¦*/ B_ARCHIVER, B_BG_WRITER, B_CHECKPOINTER, B_STARTUP, B_WAL_RECEIVER, B_WAL_SUMMARIZER, B_WAL_WRITER, } BackendType; #define BACKEND_NUM_TYPES (B_WAL_WRITER + 1) extern PGDLLIMPORT BackendType MyBackendType; #define FIRST_AUX_PROC B_ARCHIVER #define IsAuxProcess(type) (MyBackendType >= FIRST_AUX_PROC)
On 29/01/2024 17:54, reid.thompson@crunchydata.com wrote: > On Thu, 2024-01-25 at 01:51 +0200, Heikki Linnakangas wrote: >> >> And here we go. BackendID is now a 1-based index directly into the >> PGPROC array. > > Would it be worthwhile to also note in this comment FIRST_AUX_PROC's > and IsAuxProcess()'s dependency on B_ARCHIVER and it's location in the > enum table? Yeah, that might be in order. Although looking closer, it's only used in IsAuxProcess, which is only used in one sanity check in AuxProcessMain(). And even that gets refactored away by the later patches in this thread. So on second thoughts, I'll just remove it altogether. I spent some more time on the 'lastBackend' optimization in sinvaladt.c. I realized that it became very useless with these patches, because aux processes are allocated pgprocno's after all the slots for regular backends. There are always aux processes running, so lastBackend would always have a value close to the max anyway. I replaced that with a dense 'pgprocnos' array that keeps track of the exact slots that are in use. I'm not 100% sure this is worth the effort; does any real world workload send shared invalidations so frequently that this matters? In any case, this should avoid the regression if such a workload exists. New patch set attached. I think these are ready to be committed, but would appreciate a final review. -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
- v8-0001-Remove-superfluous-pgprocno-field-from-PGPROC.patch
- v8-0002-Redefine-backend-ID-to-be-an-index-into-the-proc-.patch
- v8-0003-Replace-lastBackend-with-an-array-of-in-use-slots.patch
- v8-0004-Remove-MyAuxProcType-use-MyBackendType-instead.patch
- v8-0005-Use-MyBackendType-in-more-places-to-check-what-pr.patch
On 30/01/2024 02:08, Heikki Linnakangas wrote: > On 29/01/2024 17:54, reid.thompson@crunchydata.com wrote: >> On Thu, 2024-01-25 at 01:51 +0200, Heikki Linnakangas wrote: >>> >>> And here we go. BackendID is now a 1-based index directly into the >>> PGPROC array. >> >> Would it be worthwhile to also note in this comment FIRST_AUX_PROC's >> and IsAuxProcess()'s dependency on B_ARCHIVER and it's location in the >> enum table? > > Yeah, that might be in order. Although looking closer, it's only used in > IsAuxProcess, which is only used in one sanity check in > AuxProcessMain(). And even that gets refactored away by the later > patches in this thread. So on second thoughts, I'll just remove it > altogether. > > I spent some more time on the 'lastBackend' optimization in sinvaladt.c. > I realized that it became very useless with these patches, because aux > processes are allocated pgprocno's after all the slots for regular > backends. There are always aux processes running, so lastBackend would > always have a value close to the max anyway. I replaced that with a > dense 'pgprocnos' array that keeps track of the exact slots that are in > use. I'm not 100% sure this is worth the effort; does any real world > workload send shared invalidations so frequently that this matters? In > any case, this should avoid the regression if such a workload exists. > > New patch set attached. I think these are ready to be committed, but > would appreciate a final review. contrib/amcheck 003_cic_2pc.pl test failures revealed a bug that required some reworking: In a PGPROC entry for a prepared xact, the PGPROC's backendID needs to be the original backend's ID, because the prepared xact is holding the lock on the original virtual transaction id. When a transaction's ownership is moved from the original backend's PGPROC entry to the prepared xact PGPROC entry, the backendID needs to be copied over. My patch removed the field altogether, so it was not copied over, which made it look like it the original VXID lock was released at prepare. I fixed that by adding back the backendID field. For regular backends, it's always equal to pgprocno + 1, but for prepared xacts, it's the original backend's ID. To make that less confusing, I moved the backendID and lxid fields together under a 'vxid' struct. The two fields together form the virtual transaction ID, and that's the only context where the 'backendID' field should now be looked at. I also squashed the 'lastBackend' changes in sinvaladt.c to the main patch. -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
Hi, On 2024-01-30 02:08:36 +0200, Heikki Linnakangas wrote: > I spent some more time on the 'lastBackend' optimization in sinvaladt.c. I > realized that it became very useless with these patches, because aux > processes are allocated pgprocno's after all the slots for regular backends. > There are always aux processes running, so lastBackend would always have a > value close to the max anyway. I replaced that with a dense 'pgprocnos' > array that keeps track of the exact slots that are in use. I'm not 100% sure > this is worth the effort; does any real world workload send shared > invalidations so frequently that this matters? In any case, this should > avoid the regression if such a workload exists. > > New patch set attached. I think these are ready to be committed, but would > appreciate a final review. > From 54f22231bb2540fc5957c14005956161e6fc9dac Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Wed, 24 Jan 2024 23:15:55 +0200 > Subject: [PATCH v8 1/5] Remove superfluous 'pgprocno' field from PGPROC > > It was always just the index of the PGPROC entry from the beginning of > the proc array. Introduce a macro to compute it from the pointer > instead. Hm. The pointer math here is bit more expensive than in some other cases, as the struct is fairly large and sizeof(PGPROC) isn't a power of two. Adding more math into loops like in TransactionGroupUpdateXidStatus() might end up showing up. I've been thinking that we likely should pad PGPROC to some more convenient boundary, but... Is this really related to the rest of the series? > From 4e0121e064804b73ef8a5dc10be27b85968ea1af Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Mon, 29 Jan 2024 23:50:34 +0200 > Subject: [PATCH v8 2/5] Redefine backend ID to be an index into the proc > array. > > Previously, backend ID was an index into the ProcState array, in the > shared cache invalidation manager (sinvaladt.c). The entry in the > ProcState array was reserved at backend startup by scanning the array > for a free entry, and that was also when the backend got its backend > ID. Things becomes slightly simpler if we redefine backend ID to be > the index into the PGPROC array, and directly use it also as an index > to the ProcState array. This uses a little more memory, as we reserve > a few extra slots in the ProcState array for aux processes that don't > need them, but the simplicity is worth it. > Aux processes now also have a backend ID. This simplifies the > reservation of BackendStatusArray and ProcSignal slots. > > You can now convert a backend ID into an index into the PGPROC array > simply by subtracting 1. We still use 0-based "pgprocnos" in various > places, for indexes into the PGPROC array, but the only difference now > is that backend IDs start at 1 while pgprocnos start at 0. Why aren't we using 0-based indexing for both? InvalidBackendId is -1, so there'd not be a conflict, right? > One potential downside of this patch is that the ProcState array might > get less densely packed, as we we don't try so hard to assign > low-numbered backend ID anymore. If it's less densely packed, > lastBackend will stay at a higher value, and SIInsertDataEntries() and > SICleanupQueue() need to scan over more unused entries. I think that's > fine. They are performance critical enough to matter, and there was no > guarantee on dense packing before either: If you launched a lot of > backends concurrently, and kept the last one open, lastBackend would > also stay at a high value. It's perhaps worth noting here that there's a future patch that also addresses this to some degree? > @@ -457,7 +442,7 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid, > Assert(LWLockHeldByMeInMode(TwoPhaseStateLock, LW_EXCLUSIVE)); > > Assert(gxact != NULL); > - proc = &ProcGlobal->allProcs[gxact->pgprocno]; > + proc = GetPGProcByNumber(gxact->pgprocno); > > /* Initialize the PGPROC entry */ > MemSet(proc, 0, sizeof(PGPROC)); This set of changes is independent of this commit, isn't it? > diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c > index ab86e802f21..39171fea06b 100644 > --- a/src/backend/postmaster/auxprocess.c > +++ b/src/backend/postmaster/auxprocess.c > @@ -107,17 +107,7 @@ AuxiliaryProcessMain(AuxProcType auxtype) > > BaseInit(); > > - /* > - * Assign the ProcSignalSlot for an auxiliary process. Since it doesn't > - * have a BackendId, the slot is statically allocated based on the > - * auxiliary process type (MyAuxProcType). Backends use slots indexed in > - * the range from 1 to MaxBackends (inclusive), so we use MaxBackends + > - * AuxProcType + 1 as the index of the slot for an auxiliary process. > - * > - * This will need rethinking if we ever want more than one of a particular > - * auxiliary process type. > - */ > - ProcSignalInit(MaxBackends + MyAuxProcType + 1); > + ProcSignalInit(); Now that we don't need the offset here, we could move ProcSignalInit() into BsaeInit() I think? > +/* > + * BackendIdGetProc -- get a backend's PGPROC given its backend ID > + * > + * The result may be out of date arbitrarily quickly, so the caller > + * must be careful about how this information is used. NULL is > + * returned if the backend is not active. > + */ > +PGPROC * > +BackendIdGetProc(int backendID) > +{ > + PGPROC *result; > + > + if (backendID < 1 || backendID > ProcGlobal->allProcCount) > + return NULL; Hm, doesn't calling BackendIdGetProc() with these values a bug? That's not about being out of date or such. > +/* > + * BackendIdGetTransactionIds -- get a backend's transaction status > + * > + * Get the xid, xmin, nsubxid and overflow status of the backend. The > + * result may be out of date arbitrarily quickly, so the caller must be > + * careful about how this information is used. > + */ > +void > +BackendIdGetTransactionIds(int backendID, TransactionId *xid, > + TransactionId *xmin, int *nsubxid, bool *overflowed) > +{ > + PGPROC *proc; > + > + *xid = InvalidTransactionId; > + *xmin = InvalidTransactionId; > + *nsubxid = 0; > + *overflowed = false; > + > + if (backendID < 1 || backendID > ProcGlobal->allProcCount) > + return; > + proc = GetPGProcByBackendId(backendID); > + > + /* Need to lock out additions/removals of backends */ > + LWLockAcquire(ProcArrayLock, LW_SHARED); > + > + if (proc->pid != 0) > + { > + *xid = proc->xid; > + *xmin = proc->xmin; > + *nsubxid = proc->subxidStatus.count; > + *overflowed = proc->subxidStatus.overflowed; > + } > + > + LWLockRelease(ProcArrayLock); > +} Hm, I'm not sure about the locking here. For one, previously we weren't holding ProcArrayLock. For another, holding ProcArrayLock guarantees that the backend doesn't end its transaction, but it can still assign xids etc. And, for that matter, the backendid could have been recycled between the caller acquiring the backendId and calling BackendIdGetTransactionIds(). > --- a/src/backend/utils/error/elog.c > +++ b/src/backend/utils/error/elog.c > @@ -3074,18 +3074,18 @@ log_status_format(StringInfo buf, const char *format, ErrorData *edata) > break; > case 'v': > /* keep VXID format in sync with lockfuncs.c */ > - if (MyProc != NULL && MyProc->backendId != InvalidBackendId) > + if (MyProc != NULL) Doesn't this mean we'll include a vxid in more cases now, particularly including aux processes? That might be ok, but I also suspect that it'll never have meaningful values... > From 94fd46c9ef30ba5e8ac1a8873fce577a4be425f4 Mon Sep 17 00:00:00 2001 > From: Heikki Linnakangas <heikki.linnakangas@iki.fi> > Date: Mon, 29 Jan 2024 22:57:49 +0200 > Subject: [PATCH v8 3/5] Replace 'lastBackend' with an array of in-use slots > > Now that the procState array is indexed by pgprocno, the 'lastBackend' > optimization is useless, because aux processes are assigned PGPROC > slots and hence have numbers higher than max_connection. So > 'lastBackend' was always set to almost the end of the array. > > To replace that optimization, mantain a dense array of in-use > indexes. This's redundant with ProgGlobal->procarray, but I was afraid > of adding any more contention to ProcArrayLock, and this keeps the > code isolated to sinvaladt.c too. I think it'd be good to include that explanation and justification in the code as well. I suspect we'll need to split out "procarray membership" locking from ProcArrayLock at some point in some form (vagueness alert). To reduce contention we already have to hold both ProcArrayLock and XidGenLock when changing membership, so that holding either of the locks prevents the set of members to change. This, kinda and differently, adds yet another lock to that. > It's not clear if we need that optimization at all. I was able to > write a test case that become slower without this: set max_connections > to a very high number (> 5000), and create+truncate a table in the > same transaction thousands of times to send invalidation messages, > with fsync=off. That became about 20% slower on my laptop. Arguably > that's so unrealistic that it doesn't matter, but nevertheless, this > commit restores the performance of that. I think it's unfortunately not that uncommon to be bottlenecked by sinval performance, so I think it's good that you're addressing it. Greetings, Andres Freund
On 07/02/2024 20:25, Andres Freund wrote: > On 2024-01-30 02:08:36 +0200, Heikki Linnakangas wrote: >> From 54f22231bb2540fc5957c14005956161e6fc9dac Mon Sep 17 00:00:00 2001 >> From: Heikki Linnakangas <heikki.linnakangas@iki.fi> >> Date: Wed, 24 Jan 2024 23:15:55 +0200 >> Subject: [PATCH v8 1/5] Remove superfluous 'pgprocno' field from PGPROC >> >> It was always just the index of the PGPROC entry from the beginning of >> the proc array. Introduce a macro to compute it from the pointer >> instead. > > Hm. The pointer math here is bit more expensive than in some other cases, as > the struct is fairly large and sizeof(PGPROC) isn't a power of two. Adding > more math into loops like in TransactionGroupUpdateXidStatus() might end up > showing up. I added a MyProcNumber global variable that is set to GetNumberFromPGProc(MyProc). I'm not really concerned about the extra math, but with MyProcNumber it should definitely not be an issue. The few GetNumberFromPGProc() invocations that remain are in less performance-critical paths. (In later patch, I switch backend ids to 0-based indexing, which replaces MyProcNumber references with MyBackendId) > Is this really related to the rest of the series? It's not strictly necessary, but it felt prudent to remove it now, since I'm removing the backendID field too. >> You can now convert a backend ID into an index into the PGPROC array >> simply by subtracting 1. We still use 0-based "pgprocnos" in various >> places, for indexes into the PGPROC array, but the only difference now >> is that backend IDs start at 1 while pgprocnos start at 0. > > Why aren't we using 0-based indexing for both? InvalidBackendId is -1, so > there'd not be a conflict, right? Correct. I was being conservative and didn't dare to change the old convention. The backend ids are visible in a few places like "pg_temp_0" schema names, and pg_stat_get_*() functions. One alternative would be to reserve and waste allProcs[0]. Then pgprocno and backend ID could both be direct indexes to the array, but 0 would not be used. If we switch to 0-based indexing, it begs the question: why don't we merge the concepts of "pgprocno" and "BackendId" completely and call it the same thing everywhere? It probably would be best in the long run. It feels like a lot of churn though. Anyway, I switched to 0-based indexing in the attached new version, to see what it looks like. >> @@ -457,7 +442,7 @@ MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid, >> Assert(LWLockHeldByMeInMode(TwoPhaseStateLock, LW_EXCLUSIVE)); >> >> Assert(gxact != NULL); >> - proc = &ProcGlobal->allProcs[gxact->pgprocno]; >> + proc = GetPGProcByNumber(gxact->pgprocno); >> >> /* Initialize the PGPROC entry */ >> MemSet(proc, 0, sizeof(PGPROC)); > > This set of changes is independent of this commit, isn't it? Yes. It's just for symmetry, now that we use GetNumberFromPGProc() to get the pgprocno. >> diff --git a/src/backend/postmaster/auxprocess.c b/src/backend/postmaster/auxprocess.c >> index ab86e802f21..39171fea06b 100644 >> --- a/src/backend/postmaster/auxprocess.c >> +++ b/src/backend/postmaster/auxprocess.c >> @@ -107,17 +107,7 @@ AuxiliaryProcessMain(AuxProcType auxtype) >> >> BaseInit(); >> >> - /* >> - * Assign the ProcSignalSlot for an auxiliary process. Since it doesn't >> - * have a BackendId, the slot is statically allocated based on the >> - * auxiliary process type (MyAuxProcType). Backends use slots indexed in >> - * the range from 1 to MaxBackends (inclusive), so we use MaxBackends + >> - * AuxProcType + 1 as the index of the slot for an auxiliary process. >> - * >> - * This will need rethinking if we ever want more than one of a particular >> - * auxiliary process type. >> - */ >> - ProcSignalInit(MaxBackends + MyAuxProcType + 1); >> + ProcSignalInit(); > > Now that we don't need the offset here, we could move ProcSignalInit() into > BsaeInit() I think? Hmm, doesn't feel right to me. BaseInit() is mostly concerned with setting up backend-private structures, and it's also called for a standalone backend. I feel the process initialization codepaths could use some cleanup in general. Not sure what exactly. >> +/* >> + * BackendIdGetProc -- get a backend's PGPROC given its backend ID >> + * >> + * The result may be out of date arbitrarily quickly, so the caller >> + * must be careful about how this information is used. NULL is >> + * returned if the backend is not active. >> + */ >> +PGPROC * >> +BackendIdGetProc(int backendID) >> +{ >> + PGPROC *result; >> + >> + if (backendID < 1 || backendID > ProcGlobal->allProcCount) >> + return NULL; > > Hm, doesn't calling BackendIdGetProc() with these values a bug? That's not > about being out of date or such. Perhaps. I just followed the example of the old implementation, which also returns NULL on bogus inputs. >> +/* >> + * BackendIdGetTransactionIds -- get a backend's transaction status >> + * >> + * Get the xid, xmin, nsubxid and overflow status of the backend. The >> + * result may be out of date arbitrarily quickly, so the caller must be >> + * careful about how this information is used. >> + */ >> +void >> +BackendIdGetTransactionIds(int backendID, TransactionId *xid, >> + TransactionId *xmin, int *nsubxid, bool *overflowed) >> +{ >> + PGPROC *proc; >> + >> + *xid = InvalidTransactionId; >> + *xmin = InvalidTransactionId; >> + *nsubxid = 0; >> + *overflowed = false; >> + >> + if (backendID < 1 || backendID > ProcGlobal->allProcCount) >> + return; >> + proc = GetPGProcByBackendId(backendID); >> + >> + /* Need to lock out additions/removals of backends */ >> + LWLockAcquire(ProcArrayLock, LW_SHARED); >> + >> + if (proc->pid != 0) >> + { >> + *xid = proc->xid; >> + *xmin = proc->xmin; >> + *nsubxid = proc->subxidStatus.count; >> + *overflowed = proc->subxidStatus.overflowed; >> + } >> + >> + LWLockRelease(ProcArrayLock); >> +} > > Hm, I'm not sure about the locking here. For one, previously we weren't > holding ProcArrayLock. For another, holding ProcArrayLock guarantees that the > backend doesn't end its transaction, but it can still assign xids etc. And, > for that matter, the backendid could have been recycled between the caller > acquiring the backendId and calling BackendIdGetTransactionIds(). Yeah, the returned values could be out-of-date and even inconsistent with each other. I just faithfully copied the old implementation. Perhaps this should just skip the ProcArrayLock altogether. >> --- a/src/backend/utils/error/elog.c >> +++ b/src/backend/utils/error/elog.c >> @@ -3074,18 +3074,18 @@ log_status_format(StringInfo buf, const char *format, ErrorData *edata) >> break; >> case 'v': >> /* keep VXID format in sync with lockfuncs.c */ >> - if (MyProc != NULL && MyProc->backendId != InvalidBackendId) >> + if (MyProc != NULL) > > Doesn't this mean we'll include a vxid in more cases now, particularly > including aux processes? That might be ok, but I also suspect that it'll never > have meaningful values... Fixed. (I thought I changed that back already in the last patch version, but apparently I only did it in jsonlog.c) >> From 94fd46c9ef30ba5e8ac1a8873fce577a4be425f4 Mon Sep 17 00:00:00 2001 >> From: Heikki Linnakangas <heikki.linnakangas@iki.fi> >> Date: Mon, 29 Jan 2024 22:57:49 +0200 >> Subject: [PATCH v8 3/5] Replace 'lastBackend' with an array of in-use slots >> >> Now that the procState array is indexed by pgprocno, the 'lastBackend' >> optimization is useless, because aux processes are assigned PGPROC >> slots and hence have numbers higher than max_connection. So >> 'lastBackend' was always set to almost the end of the array. >> >> To replace that optimization, mantain a dense array of in-use >> indexes. This's redundant with ProgGlobal->procarray, but I was afraid >> of adding any more contention to ProcArrayLock, and this keeps the >> code isolated to sinvaladt.c too. > > I think it'd be good to include that explanation and justification in the code > as well. Added a comment. Attached is a new version of these BackendId changes. I kept it as three separate patches to highlight the changes from switching to 0-based indexing, but I think they should be squashed together before pushing. I think the last remaining question here is about the 0- vs 1-based indexing of BackendIds. Is it a good idea to switch to 0-based indexing? And if we do it, should we reserve PGPROC 0. I'm on the fence on this one. And if we switch to 0-based indexing, should we do a more comprehensive search & replace of "pgprocno" to "backendId", or something like that. My vote is no, the code churn doesn't feel worth it. And it can also be done separately later if we want to. -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
Hi, On 2024-02-08 13:19:53 +0200, Heikki Linnakangas wrote: > > > - /* > > > - * Assign the ProcSignalSlot for an auxiliary process. Since it doesn't > > > - * have a BackendId, the slot is statically allocated based on the > > > - * auxiliary process type (MyAuxProcType). Backends use slots indexed in > > > - * the range from 1 to MaxBackends (inclusive), so we use MaxBackends + > > > - * AuxProcType + 1 as the index of the slot for an auxiliary process. > > > - * > > > - * This will need rethinking if we ever want more than one of a particular > > > - * auxiliary process type. > > > - */ > > > - ProcSignalInit(MaxBackends + MyAuxProcType + 1); > > > + ProcSignalInit(); > > > > Now that we don't need the offset here, we could move ProcSignalInit() into > > BsaeInit() I think? > > Hmm, doesn't feel right to me. BaseInit() is mostly concerned with setting > up backend-private structures, and it's also called for a standalone > backend. It already initializes a lot of shared subsystems (pgstat, replication slots and arguable things like the buffer pool, temporary file access and WAL). And note that it already requires that MyProc is already set (but it's not yet "added" to the procarray, i.e. doesn't do visibility stuff at that stage). I don't think that BaseInit() being called by standalone backends really poses a problem? So is InitPostgres(), which does call ProcSignalInit() in standalone processes. My mental model is that BaseInit() is for stuff that's shared between processes that do attach to databases and those that don't. Right now the initialization flow is something like this ascii diagram: standalone: \ /-> StartupXLOG() \ -> InitProcess() -\ /-> ProcArrayAdd() -> SharedInvalBackendInit() -> ProcSignalInit()- -> pgstat_beinit() -> attach to db -> pgstat_bestart() normal backend: / \ / -> BaseInit() - aux process: InitAuxiliaryProcess() -/ \-- -> ProcSignalInit() -> pgstat_beinit() -> pgstat_bestart() The only reason ProcSignalInit() happens kinda late is that historically we used BackendIds as the index, which were only assigned in SharedInvalBackendInit() for normal processes. But that doesn't make sense anymore after your changes. Similarly, we do pgstat_beinit() quite late, but that's again only because it uses MyBackendId, which today is only assigned during SharedInvalBackendInit(). I don't think we can do pgstat_bestart() earlier though, which is a shame, given the four calls to it inside InitPostgres(). > I feel the process initialization codepaths could use some cleanup in > general. Not sure what exactly. Very much agreed. > > > +/* > > > + * BackendIdGetProc -- get a backend's PGPROC given its backend ID > > > + * > > > + * The result may be out of date arbitrarily quickly, so the caller > > > + * must be careful about how this information is used. NULL is > > > + * returned if the backend is not active. > > > + */ > > > +PGPROC * > > > +BackendIdGetProc(int backendID) > > > +{ > > > + PGPROC *result; > > > + > > > + if (backendID < 1 || backendID > ProcGlobal->allProcCount) > > > + return NULL; > > > > Hm, doesn't calling BackendIdGetProc() with these values a bug? That's not > > about being out of date or such. > > Perhaps. I just followed the example of the old implementation, which also > returns NULL on bogus inputs. Fair enough. Makes it harder to not notice bugs, but that's not on this patchset to fix... > I think the last remaining question here is about the 0- vs 1-based indexing > of BackendIds. Is it a good idea to switch to 0-based indexing? And if we do > it, should we reserve PGPROC 0. I'm on the fence on this one. I lean towards it being a good idea. Having two internal indexing schemes was bad enough so far, but at least one would fairly quickly notice if one used the wrong one. If they're just offset by 1, it might end up taking longer, because that'll often also be a valid id. But I think you have the author's prerogative on this one. If we do so, I think it might be better to standardize on MyProcNumber instead of MyBackendId. That'll force looking at code where indexing shifts by 1 - and it also seems more descriptive, as inside postgres it's imo clearer what a "proc number" is than what a "backend id" is. Particularly because the latter is also used for things that aren't backends... The only exception are SQL level users, for those I think it might make sense to keep the current 1 based indexing, there's just a few functions where we'd need to translate. > @@ -791,6 +792,7 @@ ProcArrayEndTransactionInternal(PGPROC *proc, TransactionId latestXid) > static void > ProcArrayGroupClearXid(PGPROC *proc, TransactionId latestXid) > { > + int pgprocno = GetNumberFromPGProc(proc); > PROC_HDR *procglobal = ProcGlobal; > uint32 nextidx; > uint32 wakeidx; This one is the only one where I could see the additional math done in GetNumberFromPGProc() hurting. Which is somewhat silly, because the proc passed in is always MyProc. In the most unrealistic workload imaginable (many backends doing nothing but assigning xids and committing, server-side), it indeed seems to make a tiny difference. But not enough to worry about, I think. FWIW, if I use GetNumberFromPGProc(MyProc) instead of MyProcNumber in LWLockQueueSelf(), that does show up a bit more noticeable. > void > -ProcSignalInit(int pss_idx) > +ProcSignalInit(void) > { > ProcSignalSlot *slot; > uint64 barrier_generation; > > - Assert(pss_idx >= 1 && pss_idx <= NumProcSignalSlots); > - > - slot = &ProcSignal->psh_slot[pss_idx - 1]; > + if (MyBackendId <= 0) > + elog(ERROR, "MyBackendId not set"); > + if (MyBackendId > NumProcSignalSlots) > + elog(ERROR, "unexpected MyBackendId %d in ProcSignalInit (max %d)", MyBackendId, NumProcSignalSlots); > + slot = &ProcSignal->psh_slot[MyBackendId - 1]; > > /* sanity check */ > if (slot->pss_pid != 0) > elog(LOG, "process %d taking over ProcSignal slot %d, but it's not empty", > - MyProcPid, pss_idx); > + MyProcPid, (int) (slot - ProcSignal->psh_slot)); Hm, why not use MyBackendId - 1 as above? Am I missing something? > /* > @@ -212,11 +211,7 @@ ProcSignalInit(int pss_idx) > static void > CleanupProcSignalState(int status, Datum arg) > { > - int pss_idx = DatumGetInt32(arg); > - ProcSignalSlot *slot; > - > - slot = &ProcSignal->psh_slot[pss_idx - 1]; > - Assert(slot == MyProcSignalSlot); > + ProcSignalSlot *slot = MyProcSignalSlot; Maybe worth asserting that MyProcSignalSlot isn't NULL? Previously that was checked via the assertion above. > + if (i != segP->numProcs - 1) > + segP->pgprocnos[i] = segP->pgprocnos[segP->numProcs - 1]; > + break; Hm. This means the list will be out-of-order more and more over time, leading to less cache efficient access patterns. Perhaps we should keep this sorted, like we do for ProcGlobal->xids etc? > @@ -148,19 +148,11 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS) > PGPROC *proc; > BackendId backendId = InvalidBackendId; > > - proc = BackendPidGetProc(pid); > - > /* > * See if the process with given pid is a backend or an auxiliary process. > - * > - * If the given process is a backend, use its backend id in > - * SendProcSignal() later to speed up the operation. Otherwise, don't do > - * that because auxiliary processes (except the startup process) don't > - * have a valid backend id. > */ > - if (proc != NULL) > - backendId = proc->backendId; > - else > + proc = BackendPidGetProc(pid); > + if (proc == NULL) > proc = AuxiliaryPidGetProc(pid); > > /* > @@ -183,6 +175,8 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS) > PG_RETURN_BOOL(false); > } > > + if (proc != NULL) > + backendId = GetBackendIdFromPGProc(proc); How can proc be NULL here? Greetings, Andres Freund
On Thu, Feb 15, 2024 at 3:07 AM Andres Freund <andres@anarazel.de> wrote: > > I think the last remaining question here is about the 0- vs 1-based indexing > > of BackendIds. Is it a good idea to switch to 0-based indexing? And if we do > > it, should we reserve PGPROC 0. I'm on the fence on this one. > > I lean towards it being a good idea. Having two internal indexing schemes was > bad enough so far, but at least one would fairly quickly notice if one used > the wrong one. If they're just offset by 1, it might end up taking longer, > because that'll often also be a valid id. Yeah, I think making everything 0-based is probably the best way forward long term. It might require more cleanup work to get there, but it's just a lot simpler in the end, IMHO. -- Robert Haas EDB: http://www.enterprisedb.com
On 15/02/2024 07:09, Robert Haas wrote: > On Thu, Feb 15, 2024 at 3:07 AM Andres Freund <andres@anarazel.de> wrote: >>> I think the last remaining question here is about the 0- vs 1-based indexing >>> of BackendIds. Is it a good idea to switch to 0-based indexing? And if we do >>> it, should we reserve PGPROC 0. I'm on the fence on this one. >> >> I lean towards it being a good idea. Having two internal indexing schemes was >> bad enough so far, but at least one would fairly quickly notice if one used >> the wrong one. If they're just offset by 1, it might end up taking longer, >> because that'll often also be a valid id. > > Yeah, I think making everything 0-based is probably the best way > forward long term. It might require more cleanup work to get there, > but it's just a lot simpler in the end, IMHO. Here's another patch version that does that. Yeah, I agree it's nicer in the end. I'm pretty happy with this now. I'll read through these patches myself again after sleeping over it and try to get this committed by the end of the week, but another pair of eyes wouldn't hurt. On 14/02/2024 23:37, Andres Freund wrote: >> void >> -ProcSignalInit(int pss_idx) >> +ProcSignalInit(void) >> { >> ProcSignalSlot *slot; >> uint64 barrier_generation; >> >> - Assert(pss_idx >= 1 && pss_idx <= NumProcSignalSlots); >> - >> - slot = &ProcSignal->psh_slot[pss_idx - 1]; >> + if (MyBackendId <= 0) >> + elog(ERROR, "MyBackendId not set"); >> + if (MyBackendId > NumProcSignalSlots) >> + elog(ERROR, "unexpected MyBackendId %d in ProcSignalInit (max %d)", MyBackendId, NumProcSignalSlots); >> + slot = &ProcSignal->psh_slot[MyBackendId - 1]; >> >> /* sanity check */ >> if (slot->pss_pid != 0) >> elog(LOG, "process %d taking over ProcSignal slot %d, but it's not empty", >> - MyProcPid, pss_idx); >> + MyProcPid, (int) (slot - ProcSignal->psh_slot)); > > Hm, why not use MyBackendId - 1 as above? Am I missing something? You're right, fixed. >> /* >> @@ -212,11 +211,7 @@ ProcSignalInit(int pss_idx) >> static void >> CleanupProcSignalState(int status, Datum arg) >> { >> - int pss_idx = DatumGetInt32(arg); >> - ProcSignalSlot *slot; >> - >> - slot = &ProcSignal->psh_slot[pss_idx - 1]; >> - Assert(slot == MyProcSignalSlot); >> + ProcSignalSlot *slot = MyProcSignalSlot; > > Maybe worth asserting that MyProcSignalSlot isn't NULL? Previously that was > checked via the assertion above. Added. >> + if (i != segP->numProcs - 1) >> + segP->pgprocnos[i] = segP->pgprocnos[segP->numProcs - 1]; >> + break; > > Hm. This means the list will be out-of-order more and more over time, leading > to less cache efficient access patterns. Perhaps we should keep this sorted, > like we do for ProcGlobal->xids etc? Perhaps, although these are accessed much less frequently so I'm not convinced it's worth the trouble. I haven't found a good performance test case that where the shared cache invalidation would show up. Would you happen to have one? >> @@ -183,6 +175,8 @@ pg_log_backend_memory_contexts(PG_FUNCTION_ARGS) >> PG_RETURN_BOOL(false); >> } >> >> + if (proc != NULL) >> + backendId = GetBackendIdFromPGProc(proc); > > How can proc be NULL here? Fixed. -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
On 22/02/2024 02:37, Heikki Linnakangas wrote: > On 15/02/2024 07:09, Robert Haas wrote: >> On Thu, Feb 15, 2024 at 3:07 AM Andres Freund <andres@anarazel.de> wrote: >>>> I think the last remaining question here is about the 0- vs 1-based indexing >>>> of BackendIds. Is it a good idea to switch to 0-based indexing? And if we do >>>> it, should we reserve PGPROC 0. I'm on the fence on this one. >>> >>> I lean towards it being a good idea. Having two internal indexing schemes was >>> bad enough so far, but at least one would fairly quickly notice if one used >>> the wrong one. If they're just offset by 1, it might end up taking longer, >>> because that'll often also be a valid id. >> >> Yeah, I think making everything 0-based is probably the best way >> forward long term. It might require more cleanup work to get there, >> but it's just a lot simpler in the end, IMHO. > > Here's another patch version that does that. Yeah, I agree it's nicer in > the end. > > I'm pretty happy with this now. I'll read through these patches myself > again after sleeping over it and try to get this committed by the end of > the week, but another pair of eyes wouldn't hurt. And pushed. Thanks for the reviews! -- Heikki Linnakangas Neon (https://neon.tech)
I've now completed many of the side-quests, here are the patches that remain. The first three patches form a logical unit. They move the initialization of the Port struct from postmaster to the backend process. Currently, that work is split between the postmaster and the backend process so that postmaster fills in the socket and some other fields, and the backend process fills the rest after reading the startup packet. With these patches, there is a new much smaller ClientSocket struct that is passed from the postmaster to the child process, which contains just the fields that postmaster initializes. The Port struct is allocated in the child process. That makes the backend startup easier to understand. I plan to commit those three patches next if there are no objections. That leaves the rest of the patches. I think they're in pretty good shape too, and I've gotten some review on those earlier and have addressed the comments I got so far, but would still appreciate another round of review. -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
- v12-0001-Pass-CAC-as-argument-to-backend-process.patch
- v12-0002-Remove-ConnCreate-and-ConnFree-and-allocate-Port.patch
- v12-0003-Move-initialization-of-Port-struct-to-child-proc.patch
- v12-0004-Extract-registration-of-Win32-deadchild-callback.patch
- v12-0005-Move-some-functions-from-postmaster.c-to-new-sou.patch
- v12-0006-Refactor-AuxProcess-startup.patch
- v12-0007-Refactor-postmaster-child-process-launching.patch
- v12-0008-Move-code-for-backend-startup-to-separate-file.patch
On Mon, Mar 4, 2024 at 1:40 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
On 22/02/2024 02:37, Heikki Linnakangas wrote:
> Here's another patch version that does that. Yeah, I agree it's nicer in
> the end.
>
> I'm pretty happy with this now. I'll read through these patches myself
> again after sleeping over it and try to get this committed by the end of
> the week, but another pair of eyes wouldn't hurt.
And pushed. Thanks for the reviews!
I noticed that there are still three places in backend_status.c where
pgstat_get_beentry_by_backend_id() is referenced. I think we should
replace them with pgstat_get_beentry_by_proc_number().
Thanks
Richard
pgstat_get_beentry_by_backend_id() is referenced. I think we should
replace them with pgstat_get_beentry_by_proc_number().
Thanks
Richard
On 05/03/2024 11:44, Richard Guo wrote: > I noticed that there are still three places in backend_status.c where > pgstat_get_beentry_by_backend_id() is referenced. I think we should > replace them with pgstat_get_beentry_by_proc_number(). Fixed, thanks! -- Heikki Linnakangas Neon (https://neon.tech)
On Mon Mar 4, 2024 at 3:05 AM CST, Heikki Linnakangas wrote: > I've now completed many of the side-quests, here are the patches that > remain. > > The first three patches form a logical unit. They move the > initialization of the Port struct from postmaster to the backend > process. Currently, that work is split between the postmaster and the > backend process so that postmaster fills in the socket and some other > fields, and the backend process fills the rest after reading the startup > packet. With these patches, there is a new much smaller ClientSocket > struct that is passed from the postmaster to the child process, which > contains just the fields that postmaster initializes. The Port struct is > allocated in the child process. That makes the backend startup easier to > understand. I plan to commit those three patches next if there are no > objections. > > That leaves the rest of the patches. I think they're in pretty good > shape too, and I've gotten some review on those earlier and have > addressed the comments I got so far, but would still appreciate another > round of review. > - * *MyProcPort, because ConnCreate() allocated that space with malloc() > - * ... else we'd need to copy the Port data first. Also, subsidiary data > - * such as the username isn't lost either; see ProcessStartupPacket(). > + * *MyProcPort, because that space is allocated in stack ... else we'd > + * need to copy the Port data first. Also, subsidiary data such as the > + * username isn't lost either; see ProcessStartupPacket(). s/allocated in/allocated on the The first 3 patches seem good to go, in my opinion. > @@ -225,14 +331,13 @@ internal_forkexec(int argc, char *argv[], ClientSocket *client_sock, BackgroundW > return -1; > } > > - /* Make sure caller set up argv properly */ > - Assert(argc >= 3); > - Assert(argv[argc] == NULL); > - Assert(strncmp(argv[1], "--fork", 6) == 0); > - Assert(argv[2] == NULL); > - > - /* Insert temp file name after --fork argument */ > + /* set up argv properly */ > + argv[0] = "postgres"; > + snprintf(forkav, MAXPGPATH, "--forkchild=%s", child_kind); > + argv[1] = forkav; > + /* Insert temp file name after --forkchild argument */ > argv[2] = tmpfilename; > + argv[3] = NULL; Should we use postgres_exec_path instead of the naked "postgres" here? > + /* in postmaster, fork failed ... */ > + ereport(LOG, > + (errmsg("could not fork worker process: %m"))); > + /* undo what assign_backendlist_entry did */ > + ReleasePostmasterChildSlot(rw->rw_child_slot); > + rw->rw_child_slot = 0; > + pfree(rw->rw_backend); > + rw->rw_backend = NULL; > + /* mark entry as crashed, so we'll try again later */ > + rw->rw_crashed_at = GetCurrentTimestamp(); > + return false; I think the error message should include the word "background." It would be more consistent with the log message above it. > +typedef struct > +{ > + int syslogFile; > + int csvlogFile; > + int jsonlogFile; > +} syslogger_startup_data; It would be nice if all of these startup data structs were named similarly. For instance, a previous one was BackendStartupInfo. It would help with greppability. I noticed there were a few XXX comments left that you created. I'll highlight them here for more visibility. > +/* XXX: where does this belong? */ > +extern bool LoadedSSL; Perhaps near the My* variables or maybe in the Port struct? > +#ifdef EXEC_BACKEND > + > + /* > + * Need to reinitialize the SSL library in the backend, since the context > + * structures contain function pointers and cannot be passed through the > + * parameter file. > + * > + * If for some reason reload fails (maybe the user installed broken key > + * files), soldier on without SSL; that's better than all connections > + * becoming impossible. > + * > + * XXX should we do this in all child processes? For the moment it's > + * enough to do it in backend children. XXX good question indeed > + */ > +#ifdef USE_SSL > + if (EnableSSL) > + { > + if (secure_initialize(false) == 0) > + LoadedSSL = true; > + else > + ereport(LOG, > + (errmsg("SSL configuration could not be loaded in child process"))); > + } > +#endif > +#endif Here you added the "good question indeed." I am not sure what the best answer is either! :) > + /* XXX: translation? */ > + ereport(LOG, > + (errmsg("could not fork %s process: %m", PostmasterChildName(type)))); I assume you are referring to the child name here? > XXX: We now have functions called AuxiliaryProcessInit() and > InitAuxiliaryProcess(). Confusing. Based on my analysis, the *Init() is called in the Main functions, while Init*() is called before the Main functions. Maybe AuxiliaryProcessInit() could be renamed to AuxiliaryProcessStartup()? Rename the other to AuxiliaryProcessInit(). -- Tristan Partin Neon (https://neon.tech)
On 06/03/2024 01:02, Tristan Partin wrote: > The first 3 patches seem good to go, in my opinion. Committed these first patches, with a few more changes. Notably, I realized that we should move the logic that I originally put in the new InitClientConnection function to the existing pq_init() function. It servers the same purpose, initialization of the socket in the child process. Thanks for the review! >> @@ -225,14 +331,13 @@ internal_forkexec(int argc, char *argv[], ClientSocket *client_sock, BackgroundW >> return -1; >> } >> >> - /* Make sure caller set up argv properly */ >> - Assert(argc >= 3); >> - Assert(argv[argc] == NULL); >> - Assert(strncmp(argv[1], "--fork", 6) == 0); >> - Assert(argv[2] == NULL); >> - >> - /* Insert temp file name after --fork argument */ >> + /* set up argv properly */ >> + argv[0] = "postgres"; >> + snprintf(forkav, MAXPGPATH, "--forkchild=%s", child_kind); >> + argv[1] = forkav; >> + /* Insert temp file name after --forkchild argument */ >> argv[2] = tmpfilename; >> + argv[3] = NULL; > > Should we use postgres_exec_path instead of the naked "postgres" here? I don't know, but it's the same as on 'master' currently. The code just got moved around. >> + /* in postmaster, fork failed ... */ >> + ereport(LOG, >> + (errmsg("could not fork worker process: %m"))); >> + /* undo what assign_backendlist_entry did */ >> + ReleasePostmasterChildSlot(rw->rw_child_slot); >> + rw->rw_child_slot = 0; >> + pfree(rw->rw_backend); >> + rw->rw_backend = NULL; >> + /* mark entry as crashed, so we'll try again later */ >> + rw->rw_crashed_at = GetCurrentTimestamp(); >> + return false; > > I think the error message should include the word "background." It would > be more consistent with the log message above it. This is also a pre-existing message I just moved around. But yeah, I agree, so changed. >> +typedef struct >> +{ >> + int syslogFile; >> + int csvlogFile; >> + int jsonlogFile; >> +} syslogger_startup_data; > > It would be nice if all of these startup data structs were named > similarly. For instance, a previous one was BackendStartupInfo. It would > help with greppability. Renamed them to SysloggerStartupData and BackendStartupData. Background worker startup still passes a struct called BackgroundWorker, however. I left that as it is, because the struct is used for other purposes too. > I noticed there were a few XXX comments left that you created. I'll > highlight them here for more visibility. > >> +/* XXX: where does this belong? */ >> +extern bool LoadedSSL; > > Perhaps near the My* variables or maybe in the Port struct? It is valid in the postmaster, too, though. The My* variables and Port struct only make sense in the child process. I think this is the best place after all, so I just removed the XXX comment. >> +#ifdef EXEC_BACKEND >> + >> + /* >> + * Need to reinitialize the SSL library in the backend, since the context >> + * structures contain function pointers and cannot be passed through the >> + * parameter file. >> + * >> + * If for some reason reload fails (maybe the user installed broken key >> + * files), soldier on without SSL; that's better than all connections >> + * becoming impossible. >> + * >> + * XXX should we do this in all child processes? For the moment it's >> + * enough to do it in backend children. XXX good question indeed >> + */ >> +#ifdef USE_SSL >> + if (EnableSSL) >> + { >> + if (secure_initialize(false) == 0) >> + LoadedSSL = true; >> + else >> + ereport(LOG, >> + (errmsg("SSL configuration could not be loaded in child process"))); >> + } >> +#endif >> +#endif > > Here you added the "good question indeed." I am not sure what the best > answer is either! :) I just removed the extra XXX comment. It's still a valid question, but this patch just moves it around, we don't need to answer it here. >> + /* XXX: translation? */ >> + ereport(LOG, >> + (errmsg("could not fork %s process: %m", PostmasterChildName(type)))); > > I assume you are referring to the child name here? Correct. Does the process name need to be translated? And this way of constructing sentences is not translation-friendly anyway. In some languages, the word 'process' might need to be inflected differently depending on the child name, for example. I put the process name in quotes, and didn't mark the process name for translation. >> XXX: We now have functions called AuxiliaryProcessInit() and >> InitAuxiliaryProcess(). Confusing. > > Based on my analysis, the *Init() is called in the Main functions, while > Init*() is called before the Main functions. Maybe > AuxiliaryProcessInit() could be renamed to AuxiliaryProcessStartup()? > Rename the other to AuxiliaryProcessInit(). Hmm. There's also BackendStartup() function in postmaster.c, which is very different: it runs in the postmaster process and launches the backend process. So the Startup suffix is not great either. I renamed AuxiliaryProcessInit() to AuxiliaryProcessMainCommon(). As in "the common parts of the main functions of all the aux processes". (We should perhaps merge InitProcess() and InitAuxiliaryProcess() into one function. There's a lot of duplicated code between them. And the parts that differ should perhaps be refactored to be more similar anyway. I don't want to take on that refactoring right now though.) Attached is a new version of the remaining patches. -- Heikki Linnakangas Neon (https://neon.tech)
Attachment
- v13-0005-Refactor-postmaster-child-process-launching.patch
- v13-0001-Improve-log-messages-referring-to-background-wor.patch
- v13-0002-Extract-registration-of-Win32-deadchild-callback.patch
- v13-0003-Move-some-functions-from-postmaster.c-to-a-new-s.patch
- v13-0004-Refactor-AuxProcess-startup.patch
- v13-0006-Move-code-for-backend-startup-to-separate-file.patch
On 13/03/2024 09:30, Heikki Linnakangas wrote: > Attached is a new version of the remaining patches. Committed, with some final cosmetic cleanups. Thanks everyone! -- Heikki Linnakangas Neon (https://neon.tech)
Heikki Linnakangas <hlinnaka@iki.fi> writes: > Committed, with some final cosmetic cleanups. Thanks everyone! A couple of buildfarm animals don't like these tests: Assert(child_type >= 0 && child_type < lengthof(child_process_kinds)); for example ayu | 2024-03-19 13:08:05 | launch_backend.c:211:39: warning: comparison of constant 16 with expression of type'BackendType' (aka 'enum BackendType') is always true [-Wtautological-constant-out-of-range-compare] ayu | 2024-03-19 13:08:05 | launch_backend.c:233:39: warning: comparison of constant 16 with expression of type'BackendType' (aka 'enum BackendType') is always true [-Wtautological-constant-out-of-range-compare] I'm not real sure why it's moaning about the "<" check but not the ">= 0" check, which ought to be equally tautological given the assumption that the input is a valid member of the enum. But in any case, exactly how much value do these assertions carry? If you're intent on keeping them, perhaps casting child_type to int here would suppress the warnings. But personally I think I'd lose the Asserts. regards, tom lane
On 20/03/2024 07:37, Tom Lane wrote: > A couple of buildfarm animals don't like these tests: > > Assert(child_type >= 0 && child_type < lengthof(child_process_kinds)); > > for example > > ayu | 2024-03-19 13:08:05 | launch_backend.c:211:39: warning: comparison of constant 16 with expression oftype 'BackendType' (aka 'enum BackendType') is always true [-Wtautological-constant-out-of-range-compare] > ayu | 2024-03-19 13:08:05 | launch_backend.c:233:39: warning: comparison of constant 16 with expression oftype 'BackendType' (aka 'enum BackendType') is always true [-Wtautological-constant-out-of-range-compare] > > I'm not real sure why it's moaning about the "<" check but not the > ">= 0" check, which ought to be equally tautological given the > assumption that the input is a valid member of the enum. But > in any case, exactly how much value do these assertions carry? > If you're intent on keeping them, perhaps casting child_type to > int here would suppress the warnings. But personally I think > I'd lose the Asserts. Yeah, it's not a very valuable assertion. Removed, thanks! -- Heikki Linnakangas Neon (https://neon.tech)
On Wed, 20 Mar 2024 at 08:16, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > Yeah, it's not a very valuable assertion. Removed, thanks! How about we add it as a static assert instead of removing it, like we have for many other similar arrays.
Attachment
Hello! Maybe add PGDLLIMPORT to extern bool LoadedSSL; and extern struct ClientSocket *MyClientSocket; definitions in the src/include/postmaster/postmaster.h ? With the best regards, -- Anton A. Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On 27/04/2024 11:27, Anton A. Melnikov wrote: > Hello! > > Maybe add PGDLLIMPORT to > extern bool LoadedSSL; > and > extern struct ClientSocket *MyClientSocket; > definitions in the src/include/postmaster/postmaster.h ? Peter E noticed and Michael fixed them in commit 768ceeeaa1 already. -- Heikki Linnakangas Neon (https://neon.tech)
On 28.04.2024 22:36, Heikki Linnakangas wrote: > Peter E noticed and Michael fixed them in commit 768ceeeaa1 already. Didn't check that is already fixed in the current master. Sorry! Thanks for pointing this out! With the best wishes, -- Anton A. Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Mon, Mar 18, 2024 at 10:41 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > Committed, with some final cosmetic cleanups. Thanks everyone! Nitpicking from UBSan with EXEC_BACKEND on Linux (line numbers may be a bit off, from a branch of mine): ../src/backend/postmaster/launch_backend.c:772:2: runtime error: null pointer passed as argument 2, which is declared to never be null ==13303==Using libbacktrace symbolizer. #0 0x5555564b0202 in save_backend_variables ../src/backend/postmaster/launch_backend.c:772 #1 0x5555564b0242 in internal_forkexec ../src/backend/postmaster/launch_backend.c:311 #2 0x5555564b0bdd in postmaster_child_launch ../src/backend/postmaster/launch_backend.c:244 #3 0x5555564b3121 in StartChildProcess ../src/backend/postmaster/postmaster.c:3928 #4 0x5555564b933a in PostmasterMain ../src/backend/postmaster/postmaster.c:1357 #5 0x5555562de4ad in main ../src/backend/main/main.c:197 #6 0x7ffff667ad09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09) #7 0x555555e34279 in _start (/tmp/cirrus-ci-build/build/tmp_install/usr/local/pgsql/bin/postgres+0x8e0279) This silences it: - memcpy(param->startup_data, startup_data, startup_data_len); + if (startup_data_len > 0) + memcpy(param->startup_data, startup_data, startup_data_len); (I found that out by testing EXEC_BACKEND on CI. I also learned that the Mac and FreeBSD tasks fail with EXEC_BACKEND because of SysV shmem bleating. We probably should go and crank up the relevant sysctls in the .cirrus.tasks.yml...)
While looking into [0], I noticed that main() still only checks for the --fork prefix, but IIUC commit aafc05d removed all --fork* options except for --forkchild. I've attached a patch to strengthen the check in main(). This is definitely just a nitpick. [0] https://postgr.es/m/CAKAnmmJkZtZAiSryho%3DgYpbvC7H-HNjEDAh16F3SoC9LPu8rqQ%40mail.gmail.com -- nathan
Attachment
On 18/05/2024 08:24, Thomas Munro wrote: > Nitpicking from UBSan with EXEC_BACKEND on Linux (line numbers may be > a bit off, from a branch of mine): > > ../src/backend/postmaster/launch_backend.c:772:2: runtime error: null > pointer passed as argument 2, which is declared to never be null > ==13303==Using libbacktrace symbolizer. > #0 0x5555564b0202 in save_backend_variables > ../src/backend/postmaster/launch_backend.c:772 > #1 0x5555564b0242 in internal_forkexec > ../src/backend/postmaster/launch_backend.c:311 > #2 0x5555564b0bdd in postmaster_child_launch > ../src/backend/postmaster/launch_backend.c:244 > #3 0x5555564b3121 in StartChildProcess > ../src/backend/postmaster/postmaster.c:3928 > #4 0x5555564b933a in PostmasterMain > ../src/backend/postmaster/postmaster.c:1357 > #5 0x5555562de4ad in main ../src/backend/main/main.c:197 > #6 0x7ffff667ad09 in __libc_start_main > (/lib/x86_64-linux-gnu/libc.so.6+0x23d09) > #7 0x555555e34279 in _start > (/tmp/cirrus-ci-build/build/tmp_install/usr/local/pgsql/bin/postgres+0x8e0279) > > This silences it: > > - memcpy(param->startup_data, startup_data, startup_data_len); > + if (startup_data_len > 0) > + memcpy(param->startup_data, startup_data, startup_data_len); Fixed, thanks! On 17/06/2024 21:36, Nathan Bossart wrote: > While looking into [0], I noticed that main() still only checks for the > --fork prefix, but IIUC commit aafc05d removed all --fork* options except > for --forkchild. I've attached a patch to strengthen the check in main(). > This is definitely just a nitpick. Fixed this too, thanks! -- Heikki Linnakangas Neon (https://neon.tech)