Re: Auxiliary Processes and MyAuxProc - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Auxiliary Processes and MyAuxProc |
Date | |
Msg-id | 20200319205744.7ccomyhq5ivaftds@alap3.anarazel.de Whole thread Raw |
In response to | Re: Auxiliary Processes and MyAuxProc (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>) |
Responses |
Re: Auxiliary Processes and MyAuxProc
|
List | pgsql-hackers |
Hi, On 2020-03-19 11:35:41 +0100, Peter Eisentraut wrote: > On 2020-03-18 17:07, Mike Palmiotto wrote: > > On Wed, Mar 18, 2020 at 11:26 AM Mike Palmiotto > > <mike.palmiotto@crunchydata.com> wrote: > > > > > > On Wed, Mar 18, 2020 at 10:17 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > Also, I saw this was failing tests both before and after my rebase. > > > > > > > > http://cfbot.cputube.org/ > > > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31535161 > > > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/builds/31386446 > > > > > > Good catch, thanks. Will address this as well in the next round. Just > > > need to set up a Windows dev environment to see if I can > > > reproduce/fix. > > > > While I track this down, here is a rebased patchset, which drops > > MySubprocessType and makes use of the MyBackendType. > > While working on (My)BackendType, I was attempting to get rid of > (My)AuxProcType altogether. This would mostly work except that it's sort of > wired into the pgstats subsystem (see NumBackendStatSlots). This can > probably be reorganized, but I didn't pursue it further. Hm. Why does the number of stat slots prevent dropping AuxProcType? > Now, I'm a sucker for refactoring, but I feel this proposal is going into a > direction I don't understand. I'd prefer that we focus around building out > background workers as the preferred subprocess mechanism. Building out a > second generic mechanism, again, I don't understand the direction. Are we > hoping to add more of these processes? Make it extensible? The net lines > added/removed by this patch series seems pretty neutral. What are we > getting at the end of this? I think background workers for internal processes are the *wrong* direction to go. They were used as a shortcut for parallelism, and then that was extended for logical replication. In my opinion that was done, to a significant degree, because the aux proc stuff is/was too painful to deal with, but that's something that should be fixed, instead of building more and more parallel infrastructure. Bgworkers are imo not actually a very good fit for internal processes. We have to be able to guarantee that there's a free "slot" to start internal processes, we should be able to efficiently reference their pids (especially from postmaster, but also other processes), we want to precisely know which shared PGPROC is being used, etc. We now have somewhat different systems for at least: non-shmem postmaster children, aux processes, autovacuum workers, internal bgworkers, extension bgworkers. That's just insane. We should merge those as much as possible. There's obviously going to be some differences, but it needs to be less than now. I think we're mostly on the same page on that, I just don't see bgworkers getting us there. The worst part about the current situation imo is that there's way too many places that one needs to modify / check to create / understand a process. Moving towards having a single c file w/ associated header that defines 95% of that seems like a good direction. I've not looked at recent versions of the patch, but there was some movement towards that in earlier versions. On a green field, I would say that there should be one or two C arrays-of-structs defining subprocesses. And most behaviour should be channeled through that. struct PgProcessType { const char* name; PgProcessBeforeShmem before_shmem; PgProcessEntry entrypoint; uint8:1 only_one_exists; uint8:1 uses_shared_memory; uint8:1 client_connected; uint8:1 database_connected; ... }; PgProcessType ProcessTypes[] = { [StartupType] = { .name = "startup", .entrypoint = StartupProcessMain, .only_one_exists = true, .uses_shared_memory = true, .client_connected = false, .database_connected = false, ... }, ... [UserBackendType] = { .name = "backend", .before_shmem = BackendInitialize, .entrypoint = BackendRun, // fixme .only_one_exists = false, .uses_shared_memory = true, .client_connected = true, .database_connected = true, ... } ... Then there should be a single startup routine for all postmaster children. Since most of the startup is actually shared between all the different types of processes, we can declutter it a lot. When starting a process postmaster would just specify the process type, and if relevant, an argument (struct Port for backends, whatever relevant for bgworkers etc) . Generic code should handle all the work until the process type entry point - and likely we should move more work from the individual process types into generic code. If a process is 'only_one_exists' (to be renamed), the generic code would also (in postmaster) register the pid as subprocess_pids[type] = pid; which would make it easy to only have per-type code in the few locations that need to be aware, instead of many locations in postmaster.c. Perhaps also some shared memory location. Coming back to earth, from green field imagination land: I think the patchset does go in that direction (to be fair, I think I outlined something like the above elsewhere in this thread in a discussion with Mike). And that's good. Greetings, Andres Freund
pgsql-hackers by date: