Thread: relscan_details.h
relscan.h is a bit of an unfortunate header because it requires a lot of other headers to compile, and is also required by execnodes.h. This quick patch removes the struct definitions from that file, moving them into a new relscan_details.h file; the reworked relscan.h does not need any other header, freeing execnodes.h from needlessly including lots of unnecessary stuff all over the place. Only the files that really need the struct definitions include the new file, and as seen in the patch, they are quite few.

execnodes.h is included by 147 backend source files; relscan_details.h by only 27, so the exposure area is reduced considerably. This patch also modifies a few other backend source files to add one or both of genam.h and heapam.h, which were previously included through execnodes.h.

This is probably missing tweaks for contrib/. The following modules would need to include relscan_details.h: sepgsql, dblink, pgstattuple, pgrowlocks. I expect the effect on third-party modules to be much less than that of the htup_details.h change.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Sep 16, 2013 at 11:02:28PM -0300, Alvaro Herrera wrote:
> relscan.h is a bit of an unfortunate header because it requires a lot of
> other headers to compile, and is also required by execnodes.h. This

Not in my copy of the source tree. execnodes.h uses the corresponding typedefs that appear in genam.h and heapam.h. Indeed, "find . -name '*.Po' | xargs grep -l relscan.h | wc -l" only locates 26 files that include relscan.h directly or indirectly.

> quick patch removes the struct definitions from that file, moving them
> into a new relscan_details.h file; the reworked relscan.h does not need
> any other header, freeing execnodes.h from needlessly including lots of
> unnecessary stuff all over the place. Only the files that really need
> the struct definition include the new file, and as seen in the patch,
> they are quite few.
>
> execnodes.h is included by 147 backend source files; relscan_details.h
> only 27. So the exposure area is reduced considerably. This patch also
> modifies a few other backend source files to add one or both of genam.h
> and heapam.h, which were previously included through execnodes.h.
>
> This is probably missing tweaks for contrib/. The following modules
> would need to include relscan_details.h: sepgsql dblink pgstattuple
> pgrowlocks. I expect the effect on third-party modules to be much less
> than by the htup_details.h change.

-1 on header restructuring in the absence of noteworthy compile-time benchmark improvements. Besides the obvious benchmark of full-build time, one could exercise the benefit of fewer header dependencies by modelling the series of compile times from running "git pull; make" at each commit for the past several months. The latter benchmark is perhaps more favorable.

nm

-- 
Noah Misch    EnterpriseDB    http://www.enterprisedb.com
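[Aside for readers: the .Po count above works because the compiler's dependency output lists every header an object file pulls in, directly or indirectly. A minimal sketch of the pipeline, with two fabricated dependency files standing in for a real build tree — in an actual tree you would run only the find/grep line from the top of the source directory:]

```shell
# Sketch of the .Po-dependency-file technique from the thread. The two
# sample files below are fabricated for illustration.
dir=$(mktemp -d)

# Each .Po file lists every header its object depends on, directly or
# indirectly, as emitted by the compiler's dependency-tracking option.
cat > "$dir/execScan.Po" <<'EOF'
execScan.o: execScan.c ../../src/include/access/relscan.h
EOF
cat > "$dir/numeric.Po" <<'EOF'
numeric.o: numeric.c ../../src/include/utils/numeric.h
EOF

# The pipeline from the thread: count .Po files that mention relscan.h.
count=$(find "$dir" -name '*.Po' | xargs grep -l relscan.h | wc -l | tr -d ' ')
echo "files depending on relscan.h: $count"
```

Here only execScan.Po mentions relscan.h, so the count is 1; on a real tree the same pipeline reported 26.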
Noah Misch wrote:
> On Mon, Sep 16, 2013 at 11:02:28PM -0300, Alvaro Herrera wrote:
> > relscan.h is a bit of an unfortunate header because it requires a lot of
> > other headers to compile, and is also required by execnodes.h. This
>
> Not in my copy of the source tree. execnodes.h uses the corresponding
> typedefs that appear in genam.h and heapam.h. Indeed, "find . -name '*.Po' |
> xargs grep -l relscan.h | wc -l" only locates 26 files that include relscan.h
> directly or indirectly.

Ah, I see now that I misspoke. This changeset has been dormant in my repo for a while, so I confused the details of what it is about. The real benefit of this change is to eliminate indirect inclusion of genam.h and heapam.h. The number of indirect inclusions of each of them is:

               HEAD    with patch
  heapam.h      219           118
  genam.h       226           121

> -1 on header restructuring in the absence of noteworthy compile-time benchmark
> improvements. Besides the obvious benchmark of full-build time, one could
> exercise the benefit of fewer header dependencies by modelling the series of
> compile times from running "git pull; make" at each commit for the past
> several months. The latter benchmark is perhaps more favorable.

I will have a go at measuring things this way.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Sep 17, 2013 at 2:30 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> -1 on header restructuring in the absence of noteworthy compile-time benchmark
>> improvements. Besides the obvious benchmark of full-build time, one could
>> exercise the benefit of fewer header dependencies by modelling the series of
>> compile times from running "git pull; make" at each commit for the past
>> several months. The latter benchmark is perhaps more favorable.
>
> I will have a go at measuring things this way.

Personally, I'm not particularly in favor of these kinds of changes. The changes we made last time broke a lot of extensions, including some proprietary EDB ones that I had to go fix. I think a lot of people spent a lot of time fixing broken builds, at EDB and elsewhere, as well as rebasing patches. And the only benefit we have to balance that out is that incremental recompiles are faster, and I'm not really sure how important that actually is. On my system, configure takes 25 seconds and make -j3 takes 65 seconds; so even a full recompile is pretty darn fast, and an incremental recompile is usually really fast. Now granted, this is a relatively new system, but still.

I don't want to be too dogmatic in opposing this; I accept that we should, from time to time, refactor things. If we don't, superfluous dependencies will probably proliferate over time. But personally, I'd rather do these changes in bulk every third release or so and keep them to a minimum in between, so that we're not forcing people to constantly decorate their extensions with new #if directives.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas escribió:
> Personally, I'm not particularly in favor of these kinds of changes.
> The changes we made last time broke a lot of extensions - including
> some proprietary EDB ones that I had to go fix. I think a lot of
> people spent a lot of time fixing broken builds, at EDB and elsewhere,
> as well as rebasing patches. And the only benefit we have to balance
> that out is that incremental recompiles are faster, and I'm not really
> sure how important that actually is. On my system, configure takes 25
> seconds and make -j3 takes 65 seconds; so even a full recompile is
> pretty darn fast, and an incremental recompile is usually really fast.
> Now granted, this is a relatively new system, but still.

Fortunately the machines I work on now are also reasonably fast. There was a time when my desktop was so slow that it paid off to tweak certain file timestamps to avoid spurious recompiles. Now I don't have to worry. But it still annoys me that a rebuild leaves enough time to context-switch to, say, the email client or web browser, from which I don't switch back so quickly; which means I waste five or ten minutes on a task that should have taken 20 seconds.

Now, htup_details.h was a bit different from the case at hand, because there's evidently lots of code that wants to deal with the guts of tuples; for scans, by contrast, you mainly want to start one, iterate, and finish, without caring much about the innards. So the cleanup work required is going to be orders of magnitude smaller. OTOH, back then we had the alternative of doing the split in such a way that third-party code wouldn't have been affected so heavily, but we failed to take that into consideration. I trust we have all learned that lesson and will keep it in mind for next time.

> I don't want to be too dogmatic in opposing this; I accept that we
> should, from time to time, refactor things. If we don't, superfluous
> dependencies will probably proliferate over time. But personally, I'd
> rather do these changes in bulk every third release or so and keep
> them to a minimum in between times, so that we're not forcing people
> to constantly decorate their extensions with new #if directives.

We can batch things, sure, but I don't think doing it only once every three years is the right tradeoff. There's not all that much of this refactoring, after all.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Sep 17, 2013 at 4:54 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Now, htup_details.h was a bit different than the case at hand because
> there's evidently lots of code that want to deal with the guts of
> tuples, but for scans you mainly want to start one, iterate and finish,
> but don't care much about the innards. So the cleanup work required is
> going to be orders of magnitude smaller.

I think the point is that we failed to predict the knock-on consequences of that refactoring accurately. If we make enough such changes, we will probably face such issues again. Sure, we can hope that our ability to predict which changes will be disruptive will improve with practice, but I doubt it's ever going to be perfect.

I certainly don't have the only vote here. I'm just telling you that from my point of view the last round of changes was a noticeable headache, and I don't really feel that I'm better off because of it, so I'm not in favor of continuing to make such changes on a regular basis.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Sep 17, 2013 at 05:54:04PM -0300, Alvaro Herrera wrote:
> Robert Haas escribió:
>
> > Personally, I'm not particularly in favor of these kinds of changes.
> > The changes we made last time broke a lot of extensions - including
> > some proprietary EDB ones that I had to go fix. I think a lot of
> > people spent a lot of time fixing broken builds, at EDB and elsewhere,
> > as well as rebasing patches. And the only benefit we have to balance
> > that out is that incremental recompiles are faster, and I'm not really
> > sure how important that actually is. On my system, configure takes 25
> > seconds and make -j3 takes 65 seconds; so even a full recompile is
> > pretty darn fast, and an incremental recompile is usually really fast.
> > Now granted, this is a relatively new system, but still.
>
> Fortunately the machines I work on now are also reasonably fast. There
> was a time when my desktop was so slow that it paid off to tweak certain
> file timestamps to avoid spurious recompiles. Now I don't have to
> worry. But it still annoys me that I have enough time to context-switch
> to, say, the email client or web browser, from where I don't switch back
> so quickly; which means I waste five or ten minutes for a task that
> should have taken 20 seconds.

Right. If we can speed up a representative sample of incremental recompiles by 20%, then I'm on board. At 3%, probably not. (Alas, even 20% doesn't move it out of the causes-a-context-switch category. For that, I think you need fundamentally smarter tools.)

> Now, htup_details.h was a bit different than the case at hand because
> there's evidently lots of code that want to deal with the guts of
> tuples, but for scans you mainly want to start one, iterate and finish,
> but don't care much about the innards. So the cleanup work required is
> going to be orders of magnitude smaller.

There will also be the folks who must add heapam.h and/or genam.h includes despite formerly getting it/them through execnodes.h. That's not ugly like "#if PG_VERSION_NUM ...", but it's still work for authors.

-- 
Noah Misch    EnterpriseDB    http://www.enterprisedb.com
On Tue, Sep 17, 2013 at 05:54:04PM -0300, Alvaro Herrera wrote:
> > I don't want to be too dogmatic in opposing this; I accept that we
> > should, from time to time, refactor things. If we don't, superfluous
> > dependencies will probably proliferate over time. But personally, I'd
> > rather do these changes in bulk every third release or so and keep
> > them to a minimum in between times, so that we're not forcing people
> > to constantly decorate their extensions with new #if directives.
>
> We can batch things, sure, but I don't think doing it only once every
> three years is the right tradeoff. There's not all that much of this
> refactoring, after all.

Agreed, we should go ahead and make the changes. People who use our code for plugins are already going to have a boatload of stuff to change in each major release; doing this isn't going to add a lot of burden, and it will give us cleaner code for the future.

If we had not made massive cleanup changes years ago, our code would not be as good as it is today. By avoiding cleanup to reduce the burden on people who use our code, we are positioning our code for a slow decline in clarity.

What I don't want to do is get into a mode where every code cleanup has to be verified not to excessively burden outside code users. Cleanup is hard enough, and adding another check to the process makes cleanup even less likely.

-- 
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ It's impossible for everything to be true. +
On Tue, Sep 17, 2013 at 12:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Personally, I'm not particularly in favor of these kinds of changes.

+1. Experience has shown this kind of effort to be a tarpit. It turns out that refactoring away compiler dependencies has this kind of fractal complexity - the more you look at it, the less sense it makes.

I would be in favor of this if there were compelling gains in either compile time (and like others, I have a pretty high bar for compelling here), or if the refactoring effort remedied a clear modularity violation. Though I think it has to happen every once in a while, I'd suggest about every 5 years as the right interval.

-- 
Peter Geoghegan
On Tue, Oct 01, 2013 at 10:12:05PM -0400, Bruce Momjian wrote:
> If we had not made massive cleanup changes years ago, our code would not
> be as good as it is today. By avoiding cleanup to reduce the burden on
> people who use our code, we are positioning our code for a slow decline
> in clarity.
>
> What I don't want to do is get into a mode where every code cleanup has
> to be verified that it isn't going to excessively burden outside code
> users. Cleanup is hard enough, and adding another check to that process
> makes cleanup even less likely.

I don't disagree with that, but I would describe the proposal as a mild dirty-up for the sake of build performance, not a cleanup.

-- 
Noah Misch    EnterpriseDB    http://www.enterprisedb.com
On Wed, Oct 2, 2013 at 08:59:42AM -0400, Noah Misch wrote:
> On Tue, Oct 01, 2013 at 10:12:05PM -0400, Bruce Momjian wrote:
> > If we had not made massive cleanup changes years ago, our code would not
> > be as good as it is today. By avoiding cleanup to reduce the burden on
> > people who use our code, we are positioning our code for a slow decline
> > in clarity.
> >
> > What I don't want to do is get into a mode where every code cleanup has
> > to be verified that it isn't going to excessively burden outside code
> > users. Cleanup is hard enough, and adding another check to that process
> > makes cleanup even less likely.
>
> I don't disagree with that, but I would describe the proposal as a mild
> dirty-up for the sake of build performance, not a cleanup.

OK, good enough. Thanks for the feedback.

-- 
Bruce Momjian  <bruce@momjian.us>        http://momjian.us
EnterpriseDB                             http://enterprisedb.com

+ It's impossible for everything to be true. +
Bruce Momjian escribió:
> On Wed, Oct 2, 2013 at 08:59:42AM -0400, Noah Misch wrote:
> > I don't disagree with that, but I would describe the proposal as a mild
> > dirty-up for the sake of build performance, not a cleanup.
>
> OK, good enough. Thanks for the feedback.

Agreed. I will keep this patch in my local repo; I might present it again later as part of more invasive surgery.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services