Re: [CORE] postpone next week's release - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: [CORE] postpone next week's release |
Date | |
Msg-id | 557011A0.6070604@iki.fi Whole thread Raw |
In response to | Re: [CORE] postpone next week's release (Andres Freund <andres@anarazel.de>) |
Responses |
Re: [CORE] postpone next week's release
|
List | pgsql-hackers |
On 05/30/2015 11:47 PM, Andres Freund wrote: > I don't think it's primarily a problem of lack of review; although that > is a large problem. I think the biggest systematic problem is that the > compound complexity of postgres has increased dramatically over the > years. Features have added complexity little by little, each not > incrementally not looking that bad. But very little has been done to > manage complexity. Since 8.0 the codesize has roughly doubled, but > little has been done to manage the increased complexity. Few new > abstractions have been introduced and the structure of the code is > largely the same. > > As a somewhat extreme example, let's look at StartupXLOG(). In 8.0 it > was ~500 LOC, in master it's ~1500. The interactions in 8.0 were > complex, they have gotten much more complex since. It fullfills lots of > different roles, all in one function: > > (roughly in the order things happen, but simplified) > * Read the control file/determine whether we crashed > * recovery.conf handling > * backup label handling > * tablespace map handling (huh, I missed that this was added directly to > StartupXLOG. What a bad idea) > * Determine whether we're doing archive recovery, read the relevant > checkpoint if so > * relcache init file removal > * timeline switch handling > * Loading the checkpoint we're starting from > * Initialization of a lot of subsystems > * crash recovery/replay > * Including pgstat, unlogged table, exported snapshot handling > * iff hot standby, some more subsystems are initialized here > * hot standby state handling > * replay process intialization > * crash replay itself, including > * progress tracking > * recovery pause handling > * nextxid tracking > * timeline increase handling > * hot standby state handling > * unlogged relations handling > * archive recovery handling > * creation/initialization of the end of recovery checkpoint > * timeline increment if failover > * subsystem initialization iff !hot_standby > * end of recovery actions > > Yes. that's one routine. And, to make things even funnier, half of that > routine isn't exercised by our tests. > > You can argue that this is an outlier, but I don't think so. Heapam, the > planner, etc. have similar cases. > > And I think this, to some degree, explains a lot of the multixact > problems. While there were a few "simple bugs", most of them were > interactions between the various subsystems that are rather intricate. I think this explanation is wrong. I agree that there are many places that would be good to refactor - like StartupXLOG() - but the multixact code was not too bad in that regard. IIRC the patch included some refactoring, it added some new helper functions in heapam.c, for example. You can argue that it didn't do enough of it, but that was not the big issue. The big issue was at the architecture level. Basically, we liked vacuuming of XIDs and clog so much that we decided that it'd be nice if you had to vacuum multixids too, in order to not lose data. Many of the bugs and issues were not new - we had multixids before - but we upped the ante and turned minor locking bugs into data loss. And that had nothing to do with the code structure - we'd have similar issues if we had rewritten everything java, with the same design. So, I'm all for refactoring and adding abstractions where it makes sense, but it's not going to solve design problems. - Heikki
pgsql-hackers by date: