Thread: Reducing header interdependencies around heapam.h et al.

Reducing header interdependencies around heapam.h et al.

From
Andres Freund
Date:
Hi,

While working on pluggable storage (in particular, while cleaning it up
over the last few days), I grew concerned with widely heapam.h is
included in other headers.  E.g. the executor (via execnodes.h,
executor.h relying on heapam.h) shouldn't depend on heapam.h details -
particularly after pluggable storage, but also before. To me that's
unnecessary leakage across abstraction boundaries.

In the attached patch I excised all heapam.h includes from other
headers. There were basically two things required to do so:

* In a few places that use HeapScanDesc (which confusingly is a typedef
  in heapam.h, but the underlying struct is in relscan.h) etc, we can
  easily get by just using struct HeapScanDescData * instead.

* I moved the LockTupleMode enum to to lockoptions.h - previously
  executor.h tried to avoid relying on heapam.h, but it ended up
  including heapam.h indirectly, which lead to a couple people
  introducing new uses of the enum.  We could just convert those to
  ints like in other places, but I think moving to a smaller header
  seems more appropriate.  I don't think lockoptions.h is perfect, but
  it's also not bad?

This required adding heapam.h includes to a bunch of places, but that
doesn't seem too bad. It'll possibly break a few external users, but I
think that's acceptable cost - many of those will already/will further
be broken in 12 anyway.

I think we should do the same with genam.h, but that seems better done
separately.


I've a pending local set of patches that splits relation_open/close,
heap_open/close et al into a separate set of includes, that then allows
to downgrade the heapam.h include to that new file (given that a large
percentage of the files really just want heap_open/close and nothing
else from heapam.h), which I'll rebase ontop of this if we can agree
that this change is a good idea.


Alvaro, you'd introduced the split of HeapScanDesc and HeapScanDescData
being in different files (in a3540b0f657c6352) - what do you think about
this change?  I didn't revert that split, but I think we ought to
consider just relying on a forward declared struct in heapam.h instead,
this split is pretty confusing and seems to lead to more interdependence
in a lot of cases.

Greetings,

Andres Freund

Attachment

Re: Reducing header interdependencies around heapam.h et al.

From
Alvaro Herrera
Date:
On 2019-Jan-13, Andres Freund wrote:

> Alvaro, you'd introduced the split of HeapScanDesc and HeapScanDescData
> being in different files (in a3540b0f657c6352) - what do you think about
> this change?  I didn't revert that split, but I think we ought to
> consider just relying on a forward declared struct in heapam.h instead,
> this split is pretty confusing and seems to lead to more interdependence
> in a lot of cases.

I wasn't terribly happy with that split, so I'm not opposed to doing
things differently.  For your consideration, I've had this patch lying
around for a few years, which (IIRC) reduces the exposure of heapam.h by
splitting relscan.h in two.  This applies on top of dd778e9d8884 (and as
I recall it worked well there).

I'll try to have a look at your patch tomorrow.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Reducing header interdependencies around heapam.h et al.

From
Andres Freund
Date:
Hi,

On 2019-01-13 23:54:58 -0300, Alvaro Herrera wrote:
> On 2019-Jan-13, Andres Freund wrote:
> 
> > Alvaro, you'd introduced the split of HeapScanDesc and HeapScanDescData
> > being in different files (in a3540b0f657c6352) - what do you think about
> > this change?  I didn't revert that split, but I think we ought to
> > consider just relying on a forward declared struct in heapam.h instead,
> > this split is pretty confusing and seems to lead to more interdependence
> > in a lot of cases.
> 
> I wasn't terribly happy with that split, so I'm not opposed to doing
> things differently.  For your consideration, I've had this patch lying
> around for a few years, which (IIRC) reduces the exposure of heapam.h by
> splitting relscan.h in two.  This applies on top of dd778e9d8884 (and as
> I recall it worked well there).

You forgot to attach that patch... :).

I'm not sure I see a need to split relscan - note my patch makes it so
that it's not included by heapam.h anymore, and doing for the same for
genam.h would be fairly straightforward.  The most interesting bit there
would be whether we'd add the includes necessary for Snapshot (imo no),
Relation (?), ScanKey (imo no), or whether to add the necessary includes
directly.


> I'll try to have a look at your patch tomorrow.

Thanks!

Greetings,

Andres Freund


Re: Reducing header interdependencies around heapam.h et al.

From
Andres Freund
Date:
On 2019-01-13 19:05:03 -0800, Andres Freund wrote:
> Hi,
> 
> On 2019-01-13 23:54:58 -0300, Alvaro Herrera wrote:
> > On 2019-Jan-13, Andres Freund wrote:
> > 
> > > Alvaro, you'd introduced the split of HeapScanDesc and HeapScanDescData
> > > being in different files (in a3540b0f657c6352) - what do you think about
> > > this change?  I didn't revert that split, but I think we ought to
> > > consider just relying on a forward declared struct in heapam.h instead,
> > > this split is pretty confusing and seems to lead to more interdependence
> > > in a lot of cases.
> > 
> > I wasn't terribly happy with that split, so I'm not opposed to doing
> > things differently.  For your consideration, I've had this patch lying
> > around for a few years, which (IIRC) reduces the exposure of heapam.h by
> > splitting relscan.h in two.  This applies on top of dd778e9d8884 (and as
> > I recall it worked well there).
> 
> You forgot to attach that patch... :).
> 
> I'm not sure I see a need to split relscan

One split I am wondering about however is splitting out the sysstable_
stuff out of genam.h. It's imo a different component and shouldn't
really be in there. Would be quite a bit of rote work to add all the
necessary includes over the backend...

Greetings,

Andres Freund


Re: Reducing header interdependencies around heapam.h et al.

From
Andres Freund
Date:
On 2019-01-13 19:05:03 -0800, Andres Freund wrote:
> Hi,
> 
> On 2019-01-13 23:54:58 -0300, Alvaro Herrera wrote:
> > On 2019-Jan-13, Andres Freund wrote:
> > 
> > > Alvaro, you'd introduced the split of HeapScanDesc and HeapScanDescData
> > > being in different files (in a3540b0f657c6352) - what do you think about
> > > this change?  I didn't revert that split, but I think we ought to
> > > consider just relying on a forward declared struct in heapam.h instead,
> > > this split is pretty confusing and seems to lead to more interdependence
> > > in a lot of cases.
> > 
> > I wasn't terribly happy with that split, so I'm not opposed to doing
> > things differently.  For your consideration, I've had this patch lying
> > around for a few years, which (IIRC) reduces the exposure of heapam.h by
> > splitting relscan.h in two.  This applies on top of dd778e9d8884 (and as
> > I recall it worked well there).
> 
> You forgot to attach that patch... :).
> 
> I'm not sure I see a need to split relscan - note my patch makes it so
> that it's not included by heapam.h anymore, and doing for the same for
> genam.h would be fairly straightforward.  The most interesting bit there
> would be whether we'd add the includes necessary for Snapshot (imo no),
> Relation (?), ScanKey (imo no), or whether to add the necessary includes
> directly.

Here's a patch doing the same for genam as well.

Greetings,

Andres Freund

Attachment

Re: Reducing header interdependencies around heapam.h et al.

From
Alvaro Herrera
Date:
On 2019-Jan-13, Andres Freund wrote:

> On 2019-01-13 23:54:58 -0300, Alvaro Herrera wrote:
> > On 2019-Jan-13, Andres Freund wrote:
> > 
> > > Alvaro, you'd introduced the split of HeapScanDesc and HeapScanDescData
> > > being in different files (in a3540b0f657c6352) - what do you think about
> > > this change?  I didn't revert that split, but I think we ought to
> > > consider just relying on a forward declared struct in heapam.h instead,
> > > this split is pretty confusing and seems to lead to more interdependence
> > > in a lot of cases.
> > 
> > I wasn't terribly happy with that split, so I'm not opposed to doing
> > things differently.  For your consideration, I've had this patch lying
> > around for a few years, which (IIRC) reduces the exposure of heapam.h by
> > splitting relscan.h in two.  This applies on top of dd778e9d8884 (and as
> > I recall it worked well there).
> 
> You forgot to attach that patch... :).

Oops :-(  Here it is anyway.  Notmuch reminded me that I had posted this
before, to a pretty cold reception:
https://postgr.es/m/20130917020228.GB7139@eldon.alvh.no-ip.org
Needless to say, I disagree with the general sentiment in that thread
that header refactoring is pointless and unwelcome.

> I'm not sure I see a need to split relscan - note my patch makes it so
> that it's not included by heapam.h anymore, and doing for the same for
> genam.h would be fairly straightforward.  The most interesting bit there
> would be whether we'd add the includes necessary for Snapshot (imo no),
> Relation (?), ScanKey (imo no), or whether to add the necessary includes
> directly.

Ah, you managed to get heapam.h and genam.h out of execnodes.h, which I
think was my main motivation ... that seems good enough to me.  I agree
that splitting relscan.h may not be necessary after these changes.

As for struct Relation, note that for that one you only need relcache.h
which should be lean enough, so it doesn't sound too bad to include that
one wherever.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Reducing header interdependencies around heapam.h et al.

From
Alvaro Herrera
Date:
On 2019-Jan-13, Andres Freund wrote:

> One split I am wondering about however is splitting out the sysstable_
> stuff out of genam.h. It's imo a different component and shouldn't
> really be in there. Would be quite a bit of rote work to add all the
> necessary includes over the backend...

Yeah -- unless there's a demonstrable win from this split, I would leave
this part alone, unless you regularly carry a shield to PG conferences.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Reducing header interdependencies around heapam.h et al.

From
Alvaro Herrera
Date:
0001 -- looks reasonable.  One hunk in executor.h changes LockTupleMode
to "enum LockTupleMode", but there's no need for that.

AFAIK the only reason to have the struct FooBarData vs. FooBar (ptr)
split is so that it's possible to refer to structs without having
the full struct definition.  I think changing uses of FooBar to "struct
FooBarData *" defeats the whole purpose -- it becomes pointless noise,
confusing the reader for no gain.  I've long considered that the struct
definitions should appear in "internal" headers (such as
htup_details.h), separate from the pointer typedefs, so that it is the
forward struct declarations (and the pointer typedefs, where there are
any) that are part of the exposed API for each module, and not the
struct definitions.  

I think that would be much more invasive, though, and it's unlikely to
succeed as easily as this simpler approach is.

I think MissingPtr is a terrible name.  Can we change that while at this?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Reducing header interdependencies around heapam.h et al.

From
Andres Freund
Date:
Hi,

On 2019-01-14 15:36:14 -0300, Alvaro Herrera wrote:
> 0001 -- looks reasonable.  One hunk in executor.h changes LockTupleMode
> to "enum LockTupleMode", but there's no need for that.

Oh, that escaped from an earlier version where I briefly forgot that one
cannot portably forward-declare enums.


> AFAIK the only reason to have the struct FooBarData vs. FooBar (ptr)
> split is so that it's possible to refer to structs without having
> the full struct definition.  I think changing uses of FooBar to "struct
> FooBarData *" defeats the whole purpose -- it becomes pointless noise,
> confusing the reader for no gain.  I've long considered that the struct
> definitions should appear in "internal" headers (such as
> htup_details.h), separate from the pointer typedefs, so that it is the
> forward struct declarations (and the pointer typedefs, where there are
> any) that are part of the exposed API for each module, and not the
> struct definitions.

I think the whole pointer hiding game that we play is a really really
bad idea. We should just about *never* have typedefs that hide the fact
that something is a pointer. But it's hard to go away from that for
legacy reasons.

The problem with your approach is that it's *eminently* reasonable to
want to forward declare a struct in multiple places. Otherwise you end
up in issues where you include headers like heapam.h solely for a
typedef, which obviously doesn't make a ton of sense.

If we were in C11 we could just forward declare the pointer hiding
typedefs in multiple places, and be done with that. But before C11
redundant typedefs aren't allowed. With the C99 move I'm however not
sure if there's any surviving supported compiler that doesn't allow
redundant typedefs as an extension.

Given the fact that including headers just for a typedef is frequent
overkill, hiding the typedef in a separate header has basically no
gain. I also don't quite understand why using a forward declaration with
struct in the name is that confusing, especially when it only happens in
the header.


> I think MissingPtr is a terrible name.  Can we change that while at
> this?

Indeed. I'd just remove the typedef.

Greetings,

Andres Freund


Re: Reducing header interdependencies around heapam.h et al.

From
Alvaro Herrera
Date:
On 2019-Jan-14, Andres Freund wrote:

> I think the whole pointer hiding game that we play is a really really
> bad idea. We should just about *never* have typedefs that hide the fact
> that something is a pointer. But it's hard to go away from that for
> legacy reasons.
> 
> The problem with your approach is that it's *eminently* reasonable to
> want to forward declare a struct in multiple places. Otherwise you end
> up in issues where you include headers like heapam.h solely for a
> typedef, which obviously doesn't make a ton of sense.

Well, my point is that in an ideal world we would have a header where
the struct is declared once in a very lean header, which doesn't include
almost anything else, so you can include it into other headers
liberally.  Then the struct definitions are any another (heavy) header,
which *does* need to include lots of other stuff in order to be able to
define the structs fully, and would be #included very sparingly, only in
the few .c files that really needed it.

For example, I would split up execnodes.h so that *only* the
typedef/struct declarations are there, and *no* struct definition.  Then
that header can be #included in other headers that need those to declare
functions -- no problem.  Another header (say execstructs.h, whatever)
would contain the struct definition and would only be used by executor
.c files.  So when a struct changes, only the executor is recompiled;
the rest of the source doesn't care, because execnodes.h (the struct
decls) hasn't changed.

This may be too disruptive.  I'm not suggesting that you do things this
way, only describing my ideal world.

Your idea of "liberally forward-declaring structs in multiple places"
seems like you don't *have* the lean header at all (only the heavy one
with all the struct definitions), and instead you distribute bits and
pieces of the lean header randomly to the places that need it.  It's
repetitive.  It gets the job done, but it's not *clean*.

> Given the fact that including headers just for a typedef is frequent
> overkill, hiding the typedef in a separate header has basically no
> gain. I also don't quite understand why using a forward declaration with
> struct in the name is that confusing, especially when it only happens in
> the header.

Oh, that's not the confusing part -- that's just repetitive, nothing
more.  What's confusing (IMO) is having two names for the same struct
(one pointer and one full struct).  It'd be useful if it was used the
way I describe above.  But that's the settled project style, so I don't
have any hopes that it'll ever be changed.  

Anyway, I'm not objecting to your patch ... I just don't want it on
record that I'm in love with the current situation :-)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Reducing header interdependencies around heapam.h et al.

From
Andres Freund
Date:
Hi,

On 2019-01-14 17:55:44 -0300, Alvaro Herrera wrote:
> On 2019-Jan-14, Andres Freund wrote:
>
> > I think the whole pointer hiding game that we play is a really really
> > bad idea. We should just about *never* have typedefs that hide the fact
> > that something is a pointer. But it's hard to go away from that for
> > legacy reasons.
> >
> > The problem with your approach is that it's *eminently* reasonable to
> > want to forward declare a struct in multiple places. Otherwise you end
> > up in issues where you include headers like heapam.h solely for a
> > typedef, which obviously doesn't make a ton of sense.
>
> Well, my point is that in an ideal world we would have a header where
> the struct is declared once in a very lean header, which doesn't include
> almost anything else, so you can include it into other headers
> liberally.  Then the struct definitions are any another (heavy) header,
> which *does* need to include lots of other stuff in order to be able to
> define the structs fully, and would be #included very sparingly, only in
> the few .c files that really needed it.

> For example, I would split up execnodes.h so that *only* the
> typedef/struct declarations are there, and *no* struct definition.  Then
> that header can be #included in other headers that need those to declare
> functions -- no problem.  Another header (say execstructs.h, whatever)
> would contain the struct definition and would only be used by executor
> .c files.  So when a struct changes, only the executor is recompiled;
> the rest of the source doesn't care, because execnodes.h (the struct
> decls) hasn't changed.

It's surely better than the current state, but it still requires
recompiling everything in a more cases than necessary.


> This may be too disruptive.  I'm not suggesting that you do things this
> way, only describing my ideal world.

It'd probably doable by leaving execnodes.h as the heavyweight nodes,
and execnodetypes.h as the lightweight one, and including the latter
from the former. And then moving users of execnodes over to
execnodetypes.


> Your idea of "liberally forward-declaring structs in multiple places"
> seems like you don't *have* the lean header at all (only the heavy one
> with all the struct definitions), and instead you distribute bits and
> pieces of the lean header randomly to the places that need it.  It's
> repetitive.  It gets the job done, but it's not *clean*.

I'm not really buying the repetitiveness bit - it's really primarily
adding 'struct ' prefix, and sometimes adding a 'Data *' postfix. That's
not a lot of duplication. When used in structs there's no need to even
add an explicit 'struct <name>;' forward declaration, that's only needed
for function parameters.  And afterwards there's a lot less entanglement
- no need to recompile every file just because a new node type has been
added etc.


> > Given the fact that including headers just for a typedef is frequent
> > overkill, hiding the typedef in a separate header has basically no
> > gain. I also don't quite understand why using a forward declaration with
> > struct in the name is that confusing, especially when it only happens in
> > the header.
>
> Oh, that's not the confusing part -- that's just repetitive, nothing
> more.  What's confusing (IMO) is having two names for the same struct
> (one pointer and one full struct).  It'd be useful if it was used the
> way I describe above.  But that's the settled project style, so I don't
> have any hopes that it'll ever be changed.

Not within a few days, but we probably can do it over time...


> Anyway, I'm not objecting to your patch ... I just don't want it on
> record that I'm in love with the current situation :-)

Cool, I've pushed these now. I'll rebase my patch to split
(heap|reation)_(open|close)(rv)? patch out of heapam.[ch] now. Brr.

Greetings,

Andres Freund