Thread: Reducing header interdependencies around heapam.h et al.
Hi, While working on pluggable storage (in particular, while cleaning it up over the last few days), I grew concerned with widely heapam.h is included in other headers. E.g. the executor (via execnodes.h, executor.h relying on heapam.h) shouldn't depend on heapam.h details - particularly after pluggable storage, but also before. To me that's unnecessary leakage across abstraction boundaries. In the attached patch I excised all heapam.h includes from other headers. There were basically two things required to do so: * In a few places that use HeapScanDesc (which confusingly is a typedef in heapam.h, but the underlying struct is in relscan.h) etc, we can easily get by just using struct HeapScanDescData * instead. * I moved the LockTupleMode enum to to lockoptions.h - previously executor.h tried to avoid relying on heapam.h, but it ended up including heapam.h indirectly, which lead to a couple people introducing new uses of the enum. We could just convert those to ints like in other places, but I think moving to a smaller header seems more appropriate. I don't think lockoptions.h is perfect, but it's also not bad? This required adding heapam.h includes to a bunch of places, but that doesn't seem too bad. It'll possibly break a few external users, but I think that's acceptable cost - many of those will already/will further be broken in 12 anyway. I think we should do the same with genam.h, but that seems better done separately. I've a pending local set of patches that splits relation_open/close, heap_open/close et al into a separate set of includes, that then allows to downgrade the heapam.h include to that new file (given that a large percentage of the files really just want heap_open/close and nothing else from heapam.h), which I'll rebase ontop of this if we can agree that this change is a good idea. Alvaro, you'd introduced the split of HeapScanDesc and HeapScanDescData being in different files (in a3540b0f657c6352) - what do you think about this change? I didn't revert that split, but I think we ought to consider just relying on a forward declared struct in heapam.h instead, this split is pretty confusing and seems to lead to more interdependence in a lot of cases. Greetings, Andres Freund
Attachment
On 2019-Jan-13, Andres Freund wrote: > Alvaro, you'd introduced the split of HeapScanDesc and HeapScanDescData > being in different files (in a3540b0f657c6352) - what do you think about > this change? I didn't revert that split, but I think we ought to > consider just relying on a forward declared struct in heapam.h instead, > this split is pretty confusing and seems to lead to more interdependence > in a lot of cases. I wasn't terribly happy with that split, so I'm not opposed to doing things differently. For your consideration, I've had this patch lying around for a few years, which (IIRC) reduces the exposure of heapam.h by splitting relscan.h in two. This applies on top of dd778e9d8884 (and as I recall it worked well there). I'll try to have a look at your patch tomorrow. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2019-01-13 23:54:58 -0300, Alvaro Herrera wrote: > On 2019-Jan-13, Andres Freund wrote: > > > Alvaro, you'd introduced the split of HeapScanDesc and HeapScanDescData > > being in different files (in a3540b0f657c6352) - what do you think about > > this change? I didn't revert that split, but I think we ought to > > consider just relying on a forward declared struct in heapam.h instead, > > this split is pretty confusing and seems to lead to more interdependence > > in a lot of cases. > > I wasn't terribly happy with that split, so I'm not opposed to doing > things differently. For your consideration, I've had this patch lying > around for a few years, which (IIRC) reduces the exposure of heapam.h by > splitting relscan.h in two. This applies on top of dd778e9d8884 (and as > I recall it worked well there). You forgot to attach that patch... :). I'm not sure I see a need to split relscan - note my patch makes it so that it's not included by heapam.h anymore, and doing for the same for genam.h would be fairly straightforward. The most interesting bit there would be whether we'd add the includes necessary for Snapshot (imo no), Relation (?), ScanKey (imo no), or whether to add the necessary includes directly. > I'll try to have a look at your patch tomorrow. Thanks! Greetings, Andres Freund
On 2019-01-13 19:05:03 -0800, Andres Freund wrote: > Hi, > > On 2019-01-13 23:54:58 -0300, Alvaro Herrera wrote: > > On 2019-Jan-13, Andres Freund wrote: > > > > > Alvaro, you'd introduced the split of HeapScanDesc and HeapScanDescData > > > being in different files (in a3540b0f657c6352) - what do you think about > > > this change? I didn't revert that split, but I think we ought to > > > consider just relying on a forward declared struct in heapam.h instead, > > > this split is pretty confusing and seems to lead to more interdependence > > > in a lot of cases. > > > > I wasn't terribly happy with that split, so I'm not opposed to doing > > things differently. For your consideration, I've had this patch lying > > around for a few years, which (IIRC) reduces the exposure of heapam.h by > > splitting relscan.h in two. This applies on top of dd778e9d8884 (and as > > I recall it worked well there). > > You forgot to attach that patch... :). > > I'm not sure I see a need to split relscan One split I am wondering about however is splitting out the sysstable_ stuff out of genam.h. It's imo a different component and shouldn't really be in there. Would be quite a bit of rote work to add all the necessary includes over the backend... Greetings, Andres Freund
On 2019-01-13 19:05:03 -0800, Andres Freund wrote: > Hi, > > On 2019-01-13 23:54:58 -0300, Alvaro Herrera wrote: > > On 2019-Jan-13, Andres Freund wrote: > > > > > Alvaro, you'd introduced the split of HeapScanDesc and HeapScanDescData > > > being in different files (in a3540b0f657c6352) - what do you think about > > > this change? I didn't revert that split, but I think we ought to > > > consider just relying on a forward declared struct in heapam.h instead, > > > this split is pretty confusing and seems to lead to more interdependence > > > in a lot of cases. > > > > I wasn't terribly happy with that split, so I'm not opposed to doing > > things differently. For your consideration, I've had this patch lying > > around for a few years, which (IIRC) reduces the exposure of heapam.h by > > splitting relscan.h in two. This applies on top of dd778e9d8884 (and as > > I recall it worked well there). > > You forgot to attach that patch... :). > > I'm not sure I see a need to split relscan - note my patch makes it so > that it's not included by heapam.h anymore, and doing for the same for > genam.h would be fairly straightforward. The most interesting bit there > would be whether we'd add the includes necessary for Snapshot (imo no), > Relation (?), ScanKey (imo no), or whether to add the necessary includes > directly. Here's a patch doing the same for genam as well. Greetings, Andres Freund
Attachment
On 2019-Jan-13, Andres Freund wrote: > On 2019-01-13 23:54:58 -0300, Alvaro Herrera wrote: > > On 2019-Jan-13, Andres Freund wrote: > > > > > Alvaro, you'd introduced the split of HeapScanDesc and HeapScanDescData > > > being in different files (in a3540b0f657c6352) - what do you think about > > > this change? I didn't revert that split, but I think we ought to > > > consider just relying on a forward declared struct in heapam.h instead, > > > this split is pretty confusing and seems to lead to more interdependence > > > in a lot of cases. > > > > I wasn't terribly happy with that split, so I'm not opposed to doing > > things differently. For your consideration, I've had this patch lying > > around for a few years, which (IIRC) reduces the exposure of heapam.h by > > splitting relscan.h in two. This applies on top of dd778e9d8884 (and as > > I recall it worked well there). > > You forgot to attach that patch... :). Oops :-( Here it is anyway. Notmuch reminded me that I had posted this before, to a pretty cold reception: https://postgr.es/m/20130917020228.GB7139@eldon.alvh.no-ip.org Needless to say, I disagree with the general sentiment in that thread that header refactoring is pointless and unwelcome. > I'm not sure I see a need to split relscan - note my patch makes it so > that it's not included by heapam.h anymore, and doing for the same for > genam.h would be fairly straightforward. The most interesting bit there > would be whether we'd add the includes necessary for Snapshot (imo no), > Relation (?), ScanKey (imo no), or whether to add the necessary includes > directly. Ah, you managed to get heapam.h and genam.h out of execnodes.h, which I think was my main motivation ... that seems good enough to me. I agree that splitting relscan.h may not be necessary after these changes. As for struct Relation, note that for that one you only need relcache.h which should be lean enough, so it doesn't sound too bad to include that one wherever. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 2019-Jan-13, Andres Freund wrote: > One split I am wondering about however is splitting out the sysstable_ > stuff out of genam.h. It's imo a different component and shouldn't > really be in there. Would be quite a bit of rote work to add all the > necessary includes over the backend... Yeah -- unless there's a demonstrable win from this split, I would leave this part alone, unless you regularly carry a shield to PG conferences. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
0001 -- looks reasonable. One hunk in executor.h changes LockTupleMode to "enum LockTupleMode", but there's no need for that. AFAIK the only reason to have the struct FooBarData vs. FooBar (ptr) split is so that it's possible to refer to structs without having the full struct definition. I think changing uses of FooBar to "struct FooBarData *" defeats the whole purpose -- it becomes pointless noise, confusing the reader for no gain. I've long considered that the struct definitions should appear in "internal" headers (such as htup_details.h), separate from the pointer typedefs, so that it is the forward struct declarations (and the pointer typedefs, where there are any) that are part of the exposed API for each module, and not the struct definitions. I think that would be much more invasive, though, and it's unlikely to succeed as easily as this simpler approach is. I think MissingPtr is a terrible name. Can we change that while at this? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2019-01-14 15:36:14 -0300, Alvaro Herrera wrote: > 0001 -- looks reasonable. One hunk in executor.h changes LockTupleMode > to "enum LockTupleMode", but there's no need for that. Oh, that escaped from an earlier version where I briefly forgot that one cannot portably forward-declare enums. > AFAIK the only reason to have the struct FooBarData vs. FooBar (ptr) > split is so that it's possible to refer to structs without having > the full struct definition. I think changing uses of FooBar to "struct > FooBarData *" defeats the whole purpose -- it becomes pointless noise, > confusing the reader for no gain. I've long considered that the struct > definitions should appear in "internal" headers (such as > htup_details.h), separate from the pointer typedefs, so that it is the > forward struct declarations (and the pointer typedefs, where there are > any) that are part of the exposed API for each module, and not the > struct definitions. I think the whole pointer hiding game that we play is a really really bad idea. We should just about *never* have typedefs that hide the fact that something is a pointer. But it's hard to go away from that for legacy reasons. The problem with your approach is that it's *eminently* reasonable to want to forward declare a struct in multiple places. Otherwise you end up in issues where you include headers like heapam.h solely for a typedef, which obviously doesn't make a ton of sense. If we were in C11 we could just forward declare the pointer hiding typedefs in multiple places, and be done with that. But before C11 redundant typedefs aren't allowed. With the C99 move I'm however not sure if there's any surviving supported compiler that doesn't allow redundant typedefs as an extension. Given the fact that including headers just for a typedef is frequent overkill, hiding the typedef in a separate header has basically no gain. I also don't quite understand why using a forward declaration with struct in the name is that confusing, especially when it only happens in the header. > I think MissingPtr is a terrible name. Can we change that while at > this? Indeed. I'd just remove the typedef. Greetings, Andres Freund
On 2019-Jan-14, Andres Freund wrote: > I think the whole pointer hiding game that we play is a really really > bad idea. We should just about *never* have typedefs that hide the fact > that something is a pointer. But it's hard to go away from that for > legacy reasons. > > The problem with your approach is that it's *eminently* reasonable to > want to forward declare a struct in multiple places. Otherwise you end > up in issues where you include headers like heapam.h solely for a > typedef, which obviously doesn't make a ton of sense. Well, my point is that in an ideal world we would have a header where the struct is declared once in a very lean header, which doesn't include almost anything else, so you can include it into other headers liberally. Then the struct definitions are any another (heavy) header, which *does* need to include lots of other stuff in order to be able to define the structs fully, and would be #included very sparingly, only in the few .c files that really needed it. For example, I would split up execnodes.h so that *only* the typedef/struct declarations are there, and *no* struct definition. Then that header can be #included in other headers that need those to declare functions -- no problem. Another header (say execstructs.h, whatever) would contain the struct definition and would only be used by executor .c files. So when a struct changes, only the executor is recompiled; the rest of the source doesn't care, because execnodes.h (the struct decls) hasn't changed. This may be too disruptive. I'm not suggesting that you do things this way, only describing my ideal world. Your idea of "liberally forward-declaring structs in multiple places" seems like you don't *have* the lean header at all (only the heavy one with all the struct definitions), and instead you distribute bits and pieces of the lean header randomly to the places that need it. It's repetitive. It gets the job done, but it's not *clean*. > Given the fact that including headers just for a typedef is frequent > overkill, hiding the typedef in a separate header has basically no > gain. I also don't quite understand why using a forward declaration with > struct in the name is that confusing, especially when it only happens in > the header. Oh, that's not the confusing part -- that's just repetitive, nothing more. What's confusing (IMO) is having two names for the same struct (one pointer and one full struct). It'd be useful if it was used the way I describe above. But that's the settled project style, so I don't have any hopes that it'll ever be changed. Anyway, I'm not objecting to your patch ... I just don't want it on record that I'm in love with the current situation :-) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On 2019-01-14 17:55:44 -0300, Alvaro Herrera wrote: > On 2019-Jan-14, Andres Freund wrote: > > > I think the whole pointer hiding game that we play is a really really > > bad idea. We should just about *never* have typedefs that hide the fact > > that something is a pointer. But it's hard to go away from that for > > legacy reasons. > > > > The problem with your approach is that it's *eminently* reasonable to > > want to forward declare a struct in multiple places. Otherwise you end > > up in issues where you include headers like heapam.h solely for a > > typedef, which obviously doesn't make a ton of sense. > > Well, my point is that in an ideal world we would have a header where > the struct is declared once in a very lean header, which doesn't include > almost anything else, so you can include it into other headers > liberally. Then the struct definitions are any another (heavy) header, > which *does* need to include lots of other stuff in order to be able to > define the structs fully, and would be #included very sparingly, only in > the few .c files that really needed it. > For example, I would split up execnodes.h so that *only* the > typedef/struct declarations are there, and *no* struct definition. Then > that header can be #included in other headers that need those to declare > functions -- no problem. Another header (say execstructs.h, whatever) > would contain the struct definition and would only be used by executor > .c files. So when a struct changes, only the executor is recompiled; > the rest of the source doesn't care, because execnodes.h (the struct > decls) hasn't changed. It's surely better than the current state, but it still requires recompiling everything in a more cases than necessary. > This may be too disruptive. I'm not suggesting that you do things this > way, only describing my ideal world. It'd probably doable by leaving execnodes.h as the heavyweight nodes, and execnodetypes.h as the lightweight one, and including the latter from the former. And then moving users of execnodes over to execnodetypes. > Your idea of "liberally forward-declaring structs in multiple places" > seems like you don't *have* the lean header at all (only the heavy one > with all the struct definitions), and instead you distribute bits and > pieces of the lean header randomly to the places that need it. It's > repetitive. It gets the job done, but it's not *clean*. I'm not really buying the repetitiveness bit - it's really primarily adding 'struct ' prefix, and sometimes adding a 'Data *' postfix. That's not a lot of duplication. When used in structs there's no need to even add an explicit 'struct <name>;' forward declaration, that's only needed for function parameters. And afterwards there's a lot less entanglement - no need to recompile every file just because a new node type has been added etc. > > Given the fact that including headers just for a typedef is frequent > > overkill, hiding the typedef in a separate header has basically no > > gain. I also don't quite understand why using a forward declaration with > > struct in the name is that confusing, especially when it only happens in > > the header. > > Oh, that's not the confusing part -- that's just repetitive, nothing > more. What's confusing (IMO) is having two names for the same struct > (one pointer and one full struct). It'd be useful if it was used the > way I describe above. But that's the settled project style, so I don't > have any hopes that it'll ever be changed. Not within a few days, but we probably can do it over time... > Anyway, I'm not objecting to your patch ... I just don't want it on > record that I'm in love with the current situation :-) Cool, I've pushed these now. I'll rebase my patch to split (heap|reation)_(open|close)(rv)? patch out of heapam.[ch] now. Brr. Greetings, Andres Freund