Thread: relscan_details.h

relscan_details.h

From
Alvaro Herrera
Date:
relscan.h is a bit of an unfortunate header because it requires a lot of
other headers to compile, and is also required by execnodes.h.  This
quick patch removes the struct definitions from that file, moving them
into a new relscan_details.h file; the reworked relscan.h does not need
any other header, freeing execnodes.h from needlessly including lots of
unnecessary stuff all over the place.  Only the files that really need
the struct definition include the new file, and as seen in the patch,
they are quite few.

execnodes.h is included by 147 backend source files; relscan_details.h
only 27.  So the exposure area is reduced considerably.  This patch also
modifies a few other backend source files to add one or both of genam.h
and heapam.h, which were previously included through execnodes.h.

This is probably missing tweaks for contrib/.  The following modules
would need to include relscan_details.h: sepgsql dblink pgstattuple
pgrowlocks.  I expect the effect on third-party modules to be much less
than by the htup_details.h change.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: relscan_details.h

From
Noah Misch
Date:
On Mon, Sep 16, 2013 at 11:02:28PM -0300, Alvaro Herrera wrote:
> relscan.h is a bit of an unfortunate header because it requires a lot of
> other headers to compile, and is also required by execnodes.h.  This

Not in my copy of the source tree.  execnodes.h uses the corresponding
typedefs that appear in genam.h and heapam.h.  Indeed, "find . -name '*.Po' |
xargs grep -l relscan.h | wc -l" only locates 26 files that include relscan.h
directly or indirectly.

> quick patch removes the struct definitions from that file, moving them
> into a new relscan_details.h file; the reworked relscan.h does not need
> any other header, freeing execnodes.h from needlessly including lots of
> unnecessary stuff all over the place.  Only the files that really need
> the struct definition include the new file, and as seen in the patch,
> they are quite few.
> 
> execnodes.h is included by 147 backend source files; relscan_details.h
> only 27.  So the exposure area is reduced considerably.  This patch also
> modifies a few other backend source files to add one or both of genam.h
> and heapam.h, which were previously included through execnodes.h.
> 
> This is probably missing tweaks for contrib/.  The following modules
> would need to include relscan_details.h: sepgsql dblink pgstattuple
> pgrowlocks.  I expect the effect on third-party modules to be much less
> than by the htup_details.h change.

-1 on header restructuring in the absence of noteworthy compile-time benchmark
improvements.  Besides the obvious benchmark of full-build time, one could
exercise the benefit of fewer header dependencies by modelling the series of
compile times from running "git pull; make" at each commit for the past
several months.  The latter benchmark is perhaps more favorable.

nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: relscan_details.h

From
Alvaro Herrera
Date:
Noah Misch wrote:
> On Mon, Sep 16, 2013 at 11:02:28PM -0300, Alvaro Herrera wrote:
> > relscan.h is a bit of an unfortunate header because it requires a lot of
> > other headers to compile, and is also required by execnodes.h.  This
> 
> Not in my copy of the source tree.  execnodes.h uses the corresponding
> typedefs that appear in genam.h and heapam.h.  Indeed, "find . -name '*.Po' |
> xargs grep -l relscan.h | wc -l" only locates 26 files that include relscan.h
> directly or indirectly.

Ah, I see now that I misspoke.  This changeset has been dormant on my
repo for a while, so I confused the detail of what it is about.  The
real benefit of this change is to eliminate indirect inclusion of
genam.h and heapam.h.  The number of indirect inclusions of each of them
is:    HEAD        with patch
heapam.h    219        118
genam.h        226        121

> -1 on header restructuring in the absence of noteworthy compile-time benchmark
> improvements.  Besides the obvious benchmark of full-build time, one could
> exercise the benefit of fewer header dependencies by modelling the series of
> compile times from running "git pull; make" at each commit for the past
> several months.  The latter benchmark is perhaps more favorable.

I will have a go at measuring things this way.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: relscan_details.h

From
Robert Haas
Date:
On Tue, Sep 17, 2013 at 2:30 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
>> -1 on header restructuring in the absence of noteworthy compile-time benchmark
>> improvements.  Besides the obvious benchmark of full-build time, one could
>> exercise the benefit of fewer header dependencies by modelling the series of
>> compile times from running "git pull; make" at each commit for the past
>> several months.  The latter benchmark is perhaps more favorable.
>
> I will have a go at measuring things this way.

Personally, I'm not particularly in favor of these kinds of changes.
The changes we made last time broke a lot of extensions - including
some proprietary EDB ones that I had to go fix.  I think a lot of
people spent a lot of time fixing broken builds, at EDB and elsewhere,
as well as rebasing patches.  And the only benefit we have to balance
that out is that incremental recompiles are faster, and I'm not really
sure how important that actually is.  On my system, configure takes 25
seconds and make -j3 takes 65 seconds; so, even a full recompile is
pretty darn fast, and an incremental recompile is usually really fast.Now granted this is a relatively new system, but
still.

I don't want to be too dogmatic in opposing this; I accept that we
should, from time to time, refactor things.  If we don't, superflouous
dependencies will probably proliferate over time.  But personally, I'd
rather do these changes in bulk every third release or so and keep
them to a minimum in between times, so that we're not forcing people
to constantly decorate their extensions with new #if directives.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: relscan_details.h

From
Alvaro Herrera
Date:
Robert Haas escribió:

> Personally, I'm not particularly in favor of these kinds of changes.
> The changes we made last time broke a lot of extensions - including
> some proprietary EDB ones that I had to go fix.  I think a lot of
> people spent a lot of time fixing broken builds, at EDB and elsewhere,
> as well as rebasing patches.  And the only benefit we have to balance
> that out is that incremental recompiles are faster, and I'm not really
> sure how important that actually is.  On my system, configure takes 25
> seconds and make -j3 takes 65 seconds; so, even a full recompile is
> pretty darn fast, and an incremental recompile is usually really fast.
>  Now granted this is a relatively new system, but still.

Fortunately the machines I work on now are also reasonably fast.  There
was a time when my desktop was so slow that it paid off to tweak certain
file timestamps to avoid spurious recompiles.  Now I don't have to
worry.  But it still annoys me that I have enough time to context-switch
to, say, the email client or web browser, from where I don't switch back
so quickly; which means I waste five or ten minutes for a task that
should have taken 20 seconds.

Now, htup_details.h was a bit different than the case at hand because
there's evidently lots of code that want to deal with the guts of
tuples, but for scans you mainly want to start one, iterate and finish,
but don't care much about the innards.  So the cleanup work required is
going to be orders of magnitude smaller.

OTOH, back then we had the alternative of doing the split in such a way
that third-party code wouldn't have been affected that heavily, but we
failed to take that into consideration.  I trust we have all learned
that lesson and will keep it in mind for next time.

> I don't want to be too dogmatic in opposing this; I accept that we
> should, from time to time, refactor things.  If we don't, superflouous
> dependencies will probably proliferate over time.  But personally, I'd
> rather do these changes in bulk every third release or so and keep
> them to a minimum in between times, so that we're not forcing people
> to constantly decorate their extensions with new #if directives.

We can batch things, sure, but I don't think doing it only once every
three years is the right tradeoff.  There's not all that much of this
refactoring, after all.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: relscan_details.h

From
Robert Haas
Date:
On Tue, Sep 17, 2013 at 4:54 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Now, htup_details.h was a bit different than the case at hand because
> there's evidently lots of code that want to deal with the guts of
> tuples, but for scans you mainly want to start one, iterate and finish,
> but don't care much about the innards.  So the cleanup work required is
> going to be orders of magnitude smaller.

I think the point is that we failed to predict the knock-on
consequences of that refactoring accurately.  If we make enough such
changes, we will probably face such issues again.  Sure, we can hope
that our ability to predict which changes will be disruptive will
improve with practice, but I doubt it's ever going to be perfect.

I certainly don't have the only vote here.  I'm just telling you that
from my point of view the last round of changes was a noticeable
headache and I don't really feel that I'm better off because of it, so
I'm in not in favor of continuing to make such changes on a regular
basis.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: relscan_details.h

From
Noah Misch
Date:
On Tue, Sep 17, 2013 at 05:54:04PM -0300, Alvaro Herrera wrote:
> Robert Haas escribi?:
> 
> > Personally, I'm not particularly in favor of these kinds of changes.
> > The changes we made last time broke a lot of extensions - including
> > some proprietary EDB ones that I had to go fix.  I think a lot of
> > people spent a lot of time fixing broken builds, at EDB and elsewhere,
> > as well as rebasing patches.  And the only benefit we have to balance
> > that out is that incremental recompiles are faster, and I'm not really
> > sure how important that actually is.  On my system, configure takes 25
> > seconds and make -j3 takes 65 seconds; so, even a full recompile is
> > pretty darn fast, and an incremental recompile is usually really fast.
> >  Now granted this is a relatively new system, but still.
> 
> Fortunately the machines I work on now are also reasonably fast.  There
> was a time when my desktop was so slow that it paid off to tweak certain
> file timestamps to avoid spurious recompiles.  Now I don't have to
> worry.  But it still annoys me that I have enough time to context-switch
> to, say, the email client or web browser, from where I don't switch back
> so quickly; which means I waste five or ten minutes for a task that
> should have taken 20 seconds.

Right.  If we can speed up a representative sample of incremental recompiles
by 20%, then I'm on board.  At 3%, probably not.  (Alas, even 20% doesn't move
it out of the causes-context-switch category.  For that, I think you need
fundamentally smarter tools.)

> Now, htup_details.h was a bit different than the case at hand because
> there's evidently lots of code that want to deal with the guts of
> tuples, but for scans you mainly want to start one, iterate and finish,
> but don't care much about the innards.  So the cleanup work required is
> going to be orders of magnitude smaller.

There will also be the folks who must add heapam.h and/or genam.h includes
despite formerly getting it/them through execnodes.h.  That's not ugly like
"#if PG_VERSION_NUM ...", but it's still work for authors.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: relscan_details.h

From
Bruce Momjian
Date:
On Tue, Sep 17, 2013 at 05:54:04PM -0300, Alvaro Herrera wrote:
> > I don't want to be too dogmatic in opposing this; I accept that we
> > should, from time to time, refactor things.  If we don't, superflouous
> > dependencies will probably proliferate over time.  But personally, I'd
> > rather do these changes in bulk every third release or so and keep
> > them to a minimum in between times, so that we're not forcing people
> > to constantly decorate their extensions with new #if directives.
> 
> We can batch things, sure, but I don't think doing it only once every
> three years is the right tradeoff.  There's not all that much of this
> refactoring, after all.

Agreed, we should go ahead and make the changes.  People who use our
code for plugins are already going to have a boat-load of stuff to
change in each major release, and doing this isn't going to add a lot of
burden, and it will give us cleaner code for the future.

If we had not made massive cleanup changes years ago, our code would not
be as good as it is today.  By avoiding cleanup to reduce the burden on
people who use our code, we are positioning our code on a slow decline
in clarity.

What I don't want to do is get into a mode where every code cleanup has
to be verified that it isn't going to excessively burden outside code
users.  Cleanup is hard enough, and adding another check to that process
makes cleanup even less likely.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: relscan_details.h

From
Peter Geoghegan
Date:
On Tue, Sep 17, 2013 at 12:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Personally, I'm not particularly in favor of these kinds of changes.

+1. Experience has shown this kind of effort to be a tarpit. It turns
out that refactoring away compiler dependencies has this kind of
fractal complexity - the more you look at it, the less sense it makes.

I would be in favor of this if there were compelling gains in either
compile time (and like others, I have a pretty high bar for compelling
here), or if the refactoring effort remedied a clear modularity
violation. Though I think it has to happen every once in a while, I'd
suggest about every 5 years as the right interval.

-- 
Peter Geoghegan



Re: relscan_details.h

From
Noah Misch
Date:
On Tue, Oct 01, 2013 at 10:12:05PM -0400, Bruce Momjian wrote:
> If we had not made massive cleanup changes years ago, our code would not
> be as good as it is today.  By avoiding cleanup to reduce the burden on
> people who use our code, we are positioning our code on a slow decline
> in clarity.
> 
> What I don't want to do is get into a mode where every code cleanup has
> to be verified that it isn't going to excessively burden outside code
> users.  Cleanup is hard enough, and adding another check to that process
> makes cleanup even less likely.

I don't disagree with that, but I would describe the proposal as a mild
dirty-up for the sake of build performance, not a cleanup.

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: relscan_details.h

From
Bruce Momjian
Date:
On Wed, Oct  2, 2013 at 08:59:42AM -0400, Noah Misch wrote:
> On Tue, Oct 01, 2013 at 10:12:05PM -0400, Bruce Momjian wrote:
> > If we had not made massive cleanup changes years ago, our code would not
> > be as good as it is today.  By avoiding cleanup to reduce the burden on
> > people who use our code, we are positioning our code on a slow decline
> > in clarity.
> > 
> > What I don't want to do is get into a mode where every code cleanup has
> > to be verified that it isn't going to excessively burden outside code
> > users.  Cleanup is hard enough, and adding another check to that process
> > makes cleanup even less likely.
> 
> I don't disagree with that, but I would describe the proposal as a mild
> dirty-up for the sake of build performance, not a cleanup.

OK, good enough.  Thanks for the feedback.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +



Re: relscan_details.h

From
Alvaro Herrera
Date:
Bruce Momjian escribió:
> On Wed, Oct  2, 2013 at 08:59:42AM -0400, Noah Misch wrote:
> > On Tue, Oct 01, 2013 at 10:12:05PM -0400, Bruce Momjian wrote:
> > > If we had not made massive cleanup changes years ago, our code would not
> > > be as good as it is today.  By avoiding cleanup to reduce the burden on
> > > people who use our code, we are positioning our code on a slow decline
> > > in clarity.
> > > 
> > > What I don't want to do is get into a mode where every code cleanup has
> > > to be verified that it isn't going to excessively burden outside code
> > > users.  Cleanup is hard enough, and adding another check to that process
> > > makes cleanup even less likely.
> > 
> > I don't disagree with that, but I would describe the proposal as a mild
> > dirty-up for the sake of build performance, not a cleanup.
> 
> OK, good enough.  Thanks for the feedback.

Agreed.  I will keep this patch in my local repo; I might present it
again later as part of more invasive surgery.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services