Thread: [HACKERS] Process startup infrastructure is a mess
Hi, The way we currently start and initialize individual postgres (sub-) processes is pretty complicated and duplicative. I've a couple complaints: 1) It's completely non-obvious that bootstrap.c:AuxiliaryProcessMain() can get invoked both via postmaster for subprocesses(startup, wal writer, bgwriter, checkpointer, wal receiver, "checker"), as well as directly via main.c forthe bootstrap processes. Note the autovacuum launcher, archiver, logger are *not* started this way. 2) Most of the processes mentioned in 1) and some additional ones (autovac launcher and walsender / autovacuum workers tosome degree) duplicate a lot of logic for startup, error handling and main loop. Especially the error handling code isorder dependent, complex, and has been broken in various subprocesses in the past. 3) There exists yet *another* way of starting processes in the form of background workers. Initially that was "just" forextensions, but is now employed for builtin tasks too, like the logical rep launcher / workers. In the course of thatspecial case handling had to be sprinkled around, because the bgworker logic isn't quite able to be good enough forsomething builtin. 4) The naming of initialization functions is, uh, not particularly clear. How many people can differentiate, without checking, BaseInit(), BackendInitialize(), InitPostgres(), InitProcess(). I think the complexity here just grew incrementally with the addition of more and more subprocesses, without sufficient refactoring. I did some in 31c453165b5a6, but that's not even remotely enough. I think we should seriously consider doing a larger refactoring of this soon. I've some ideas about what to do, but I'd welcome some thoughts on whether others consider this a serious problem or not, and what they think we should do about this, first. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 14 September 2017 at 22:44, Andres Freund <andres@anarazel.de> wrote: > The way we currently start and initialize individual postgres (sub-) > processes is pretty complicated and duplicative. I've a couple > complaints: ... > I think we should seriously consider doing a larger refactoring of this > soon. I've some ideas about what to do, but I'd welcome some thoughts > on whether others consider this a serious problem or not, and what they > think we should do about this, first. Refactoring without a purpose is a negative for me. It takes time, introduces bugs and means the greater code churn over time introduces more bugs because fewer people have seen the code. That is arguable, but when we compare the priority of that against things people want and need there is little contest in my mind. If we add something to an area then its a good time to refactor it since we were going to get bugs anyway. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On 2017-09-15 01:06:54 +0100, Simon Riggs wrote: > On 14 September 2017 at 22:44, Andres Freund <andres@anarazel.de> wrote: > > > The way we currently start and initialize individual postgres (sub-) > > processes is pretty complicated and duplicative. I've a couple > > complaints: > ... > > I think we should seriously consider doing a larger refactoring of this > > soon. I've some ideas about what to do, but I'd welcome some thoughts > > on whether others consider this a serious problem or not, and what they > > think we should do about this, first. > > Refactoring without a purpose is a negative for me. It takes time, > introduces bugs and means the greater code churn over time introduces > more bugs because fewer people have seen the code. That is arguable, > but when we compare the priority of that against things people want > and need there is little contest in my mind. That's true for code that's not going to be touched anymore, largely bug free, and can be understood with reasonable effort. Neither is the case here. > If we add something to an area then its a good time to refactor it > since we were going to get bugs anyway. We've added something to the area on a regular basis. As in last in v10. The fundamental problem with your argument is that that makes it impossible for most contributors to get feature in that touch these parts of the code - many neither have the time nor the knowledge to do such a refactoring. By your argument we should have rejected logical replication for v10 because it complicated this further, without cleaning it up. Besides that, it also makes it harder to develop new features, not just to get them in. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Andres Freund > I think we should seriously consider doing a larger refactoring of this > soon. I've some ideas about what to do, but I'd welcome some thoughts on > whether others consider this a serious problem or not, and what they think > we should do about this, first. Thank you for raising this. I think so, too. Some other things that quickly come to mind are: * The signal handlers are similar in many processes and redundant. It's not easy to see how the processes respond to a particularsignal and what signal can be used for new things such as dumping the stack trace in the server log. Maybe wecan have an array of process attributes that specify the signal handlers, startup/shutdown order, etc. Then the commonprocess management code handles processes based on the information in the array. * postmaster.c is very large (more than 6,000 lines) and hard to read. * It may be easy to forget adding initialization code in the Windows-specific path (SubPostmasterMaain). * It's confusing to find out which process counts toward max_connections, MaxBackends, the count of auxiliary processes. As a user, I'm sometimes confused about the relationship between max_connections, autovacuum_max_workers, max_wal_senders,max_worker_processes. Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Andres, Simon, * Andres Freund (andres@anarazel.de) wrote: > On 2017-09-15 01:06:54 +0100, Simon Riggs wrote: > > If we add something to an area then its a good time to refactor it > > since we were going to get bugs anyway. > > We've added something to the area on a regular basis. As in last in > v10. The fundamental problem with your argument is that that makes it > impossible for most contributors to get feature in that touch these > parts of the code - many neither have the time nor the knowledge to do > such a refactoring. By your argument we should have rejected logical > replication for v10 because it complicated this further, without > cleaning it up. > > Besides that, it also makes it harder to develop new features, not just > to get them in. I'm definitely in agreement with Andres on this one. This isn't refactoring of little-to-never changed code, it's refactoring bits of the system which are changed with some regularity and looks likely to continue to need change as we add more features moving forward, and perhaps add greater controls over process startup. Thanks! Stephen
On Thu, Sep 14, 2017 at 9:30 PM, Stephen Frost <sfrost@snowman.net> wrote: > I'm definitely in agreement with Andres on this one. This isn't > refactoring of little-to-never changed code, it's refactoring bits of > the system which are changed with some regularity and looks likely to > continue to need change as we add more features moving forward, and > perhaps add greater controls over process startup. I agree to some extent. I think this is a mess and that cleaning it up would be better than not cleaning it up. I don't view it as a particularly urgent problem, though, and I'm not sure I see a need to attack it in a monolithic way. I think that unifying the error recovery paths in various processes would actually be more useful than unifying process startup, but that's not to say unifying process startup wouldn't be nice. Generally, I think the newer process startup methods are superior to the older ones. The new background worker mechanism provides a flexible way of adding new processes whose exact number doesn't need to be known in advance without having to tinker with so much code. In contrast, adding an auxiliary process is big nuisance. So if I were going to attack this problem, I'd try to make the background worker machinery sufficiently capable that we can do everything that way, and then gradually move more things over to that mechanism. However, I honestly probably wouldn't bother with this unless it were blocking something specific that I wanted to do. I wouldn't go as far as Simon did in his reply; I would not favor refusing a good patch from a highly qualified contributor that makes this all less ugly and fragile. On the other hand, I think he's right that we wouldn't necessarily get a lot of immediate benefit out of it apart from feeling good about having done it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers