Thread: splitting htup.h

splitting htup.h

From
Alvaro Herrera
Date:
Hi,

This patch splits htup.h in two pieces -- the first one (tupbasics.h;
not wedded to the name) does not include many other headers and is just
enough to have other parts of the code create tuples and pass them
around, to be used by most other headers.  The other one (which keeps
the name htup.h) contains internal tuple stuff (struct declarations
etc).

Before patch, htup.h is directly or indirectly included by 364 .c files
in src/backend; after patch, that's reduced to 299 files (that's 65
files less to compile if you modify the header).

--
Álvaro Herrera <alvherre@alvh.no-ip.org>

Attachment

Re: splitting htup.h

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> This patch splits htup.h in two pieces -- the first one (tupbasics.h;
> not wedded to the name) does not include many other headers and is just
> enough to have other parts of the code create tuples and pass them
> around, to be used by most other headers.  The other one (which keeps
> the name htup.h) contains internal tuple stuff (struct declarations
> etc).

> Before patch, htup.h is directly or indirectly included by 364 .c files
> in src/backend; after patch, that's reduced to 299 files (that's 65
> files less to compile if you modify the header).

That's kind of a disappointing result --- if we're going to split htup.h
into public and private parts, I would have hoped for a much smaller
inclusion footprint for the private part.  Maybe you could adjust the
boundary between public and private parts a bit more?  If we can't cut
the footprint I'm inclined to think this isn't worth the code churn.

(Or perhaps I'm missing the point.  Do you have a reason for doing this
other than cutting the inclusion footprint?)
        regards, tom lane


Re: splitting htup.h

From
Alvaro Herrera
Date:
Excerpts from Tom Lane's message of vie jun 15 21:06:21 -0400 2012:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > This patch splits htup.h in two pieces -- the first one (tupbasics.h;
> > not wedded to the name) does not include many other headers and is just
> > enough to have other parts of the code create tuples and pass them
> > around, to be used by most other headers.  The other one (which keeps
> > the name htup.h) contains internal tuple stuff (struct declarations
> > etc).
>
> > Before patch, htup.h is directly or indirectly included by 364 .c files
> > in src/backend; after patch, that's reduced to 299 files (that's 65
> > files less to compile if you modify the header).
>
> That's kind of a disappointing result --- if we're going to split htup.h
> into public and private parts, I would have hoped for a much smaller
> inclusion footprint for the private part.  Maybe you could adjust the
> boundary between public and private parts a bit more?  If we can't cut
> the footprint I'm inclined to think this isn't worth the code churn.

Yeah, I have another version of the patch (attached) that includes
HeapTupleData in the public part; this means catcache.h does not need
htup.h, which was the main propagation vector in the original patch.
With that, 162 .c files include htup.h -- about 130 of them do so
directly, and the rest do so through either relscan.h or tuptoaster.h.

Is this enough?  I think the next step would require moving
HeapTupleHeaderData to the public header as well; I haven't checked to
see how much better it gets.  It would require copying some of the
other includes in htup.h into the public header, though, so there is a
tradeoff.

(Another idea: maybe we could split in three parts rather than two).

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachment

Re: splitting htup.h

From
Alvaro Herrera
Date:
Here's a third version with which I'm much happier.

This patch is mainly doing four things:

1. take some typedefs and the HeapTupleData struct definition from
access/htup.h, and put them in access/tupbasics.h.  This new file is
used as #include in all headers instead of htup.h.

2. take out catcache.h from syscache.h; in its stead, we add a forward
struct declaration of struct catclist.  This was proposed by Andres
Freund and Peter Geogeghan previously, and it turns out to be convenient
as well as foreseen by older comments in syscache.h.  It limits
proliferation of other unnecessary headers in catcache.h as well.

3. split resowner.h creating resowner_private.h.  Files that just want
to create and use ResourceOwners can include the thinner resowner.h;
those that have stuff managed within a ResOwner use the other file.
This limits proliferation of lots of other header inclusion.

4. split the Xlog stuff out of heapam.h into heapam_xlog.h.

The number of src/backend files that depend on some src/include/ files:

access/heapam.h    216
access/heapam_xlog.h    12
access/htup.h    182
access/tupbasics.h    401
utils/resowner.h    74
utils/resowner_private.h    10
utils/catcache.h    23
utils/syscache.h    230

It seems pretty clear that all these the splits are useful.

I'm unsure about the "tupbasics.h" file name.  I'm open to better ideas.
The other two new files seem good enough that no bikeshedding seems
really necessary.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: splitting htup.h

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> This patch is mainly doing four things:

> 1. take some typedefs and the HeapTupleData struct definition from
> access/htup.h, and put them in access/tupbasics.h.  This new file is
> used as #include in all headers instead of htup.h.

> I'm unsure about the "tupbasics.h" file name.  I'm open to better ideas.

I'd be inclined to keep the name "htup.h" for the more widely used file,
and invent a new name for what we're splitting out of it.  This should
reduce the number of changes needed, not only in our code but third
party code.  Not sure if the new file could sanely be called
"htup_private.h"; it seems a bit widely used for that.  Maybe "heap.h"?

Also, is there any reason to consider just moving those defs into
heapam.h, instead of inventing a new header?  I'm not sure if there's
any principled distinction between heap.h and heapam.h, or any
significant differences between their sets of consumers.

The other changes all sound sane.
        regards, tom lane



Re: splitting htup.h

From
Tom Lane
Date:
... btw, the buildfarm says you forgot contrib/ while fixing the
collateral damage from these changes.
        regards, tom lane



Re: splitting htup.h

From
Alvaro Herrera
Date:
Excerpts from Tom Lane's message of mar ago 28 17:27:51 -0400 2012:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > This patch is mainly doing four things:
>
> > 1. take some typedefs and the HeapTupleData struct definition from
> > access/htup.h, and put them in access/tupbasics.h.  This new file is
> > used as #include in all headers instead of htup.h.
>
> > I'm unsure about the "tupbasics.h" file name.  I'm open to better ideas.
>
> I'd be inclined to keep the name "htup.h" for the more widely used file,
> and invent a new name for what we're splitting out of it.

Meh.  When seen in that light, it makes a lot of sense.  I was actually
thinking it the other way around.

> This should
> reduce the number of changes needed, not only in our code but third
> party code.  Not sure if the new file could sanely be called
> "htup_private.h"; it seems a bit widely used for that.  Maybe "heap.h"?

Not sure I really like heap.h, but let's put that aside for a moment.

> Also, is there any reason to consider just moving those defs into
> heapam.h, instead of inventing a new header?  I'm not sure if there's
> any principled distinction between heap.h and heapam.h, or any
> significant differences between their sets of consumers.

To test this idea, I created a separate "scandesc.h" file that contains
struct declarations for HeapScanDesc and IndexScanDesc, and included
that one into execnodes.h instead of heapam.h and genam.h.  After fixing
the fallout in the .c files, it turns out that heapam.h is used by 116
files (some of them through relscan.h or hio.h).  Per my previous count,
heap.h would be used by 182 files.

So here's a list that depends on heap.h but not heapam.h (the reverse
is obviously empty because heapam.h itself depends on heap.h).  So I'm
inclined to keep both files separate.

./access/common/heaptuple.c
./access/common/reloptions.c
./access/common/tupconvert.c
./access/common/tupdesc.c
./access/gin/gininsert.c
./access/gist/gistbuild.c
./access/heap/visibilitymap.c
./access/nbtree/nbtsort.c
./access/nbtree/nbtxlog.c
./access/spgist/spginsert.c
./access/transam/rmgr.c
./access/transam/twophase.c
./access/transam/xlog.c
./access/transam/xlogfuncs.c
./bootstrap/bootparse.c
./catalog/indexing.c
./catalog/namespace.c
./commands/variable.c
./executor/execAmi.c
./executor/execQual.c
./executor/execTuples.c
./executor/functions.c
./executor/nodeAgg.c
./executor/nodeHash.c
./executor/nodeHashjoin.c
./executor/nodeSetOp.c
./executor/nodeSubplan.c
./executor/nodeWindowAgg.c
./executor/spi.c
./executor/tstoreReceiver.c
./foreign/foreign.c
./nodes/tidbitmap.c
./optimizer/path/costsize.c
./optimizer/plan/planagg.c
./optimizer/plan/planner.c
./optimizer/plan/subselect.c
./optimizer/util/clauses.c
./parser/parse_coerce.c
./parser/parse_func.c
./parser/parse_oper.c
./parser/parse_type.c
./snowball/dict_snowball.c
./storage/freespace/freespace.c
./storage/lmgr/predicate.c
./storage/page/bufpage.c
./tcop/fastpath.c
./tcop/utility.c
./tsearch/ts_selfuncs.c
./tsearch/wparser.c
./utils/adt/acl.c
./utils/adt/arrayfuncs.c
./utils/adt/array_selfuncs.c
./utils/adt/array_typanalyze.c
./utils/adt/datetime.c
./utils/adt/format_type.c
./utils/adt/genfile.c
./utils/adt/json.c
./utils/adt/lockfuncs.c
./utils/adt/pg_locale.c
./utils/adt/pgstatfuncs.c
./utils/adt/rangetypes_selfuncs.c
./utils/adt/rowtypes.c
./utils/adt/trigfuncs.c
./utils/adt/tsgistidx.c
./utils/adt/varbit.c
./utils/adt/varchar.c
./utils/adt/varlena.c
./utils/cache/inval.c
./utils/cache/lsyscache.c
./utils/cache/syscache.c
./utils/fmgr/fmgr.c
./utils/init/miscinit.c
./utils/misc/superuser.c
./utils/sort/tuplesort.c
./utils/sort/tuplestore.c
./utils/time/combocid.c
./utils/time/tqual.c

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: splitting htup.h

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Excerpts from Tom Lane's message of mar ago 28 17:27:51 -0400 2012:
>> Also, is there any reason to consider just moving those defs into
>> heapam.h, instead of inventing a new header?  I'm not sure if there's
>> any principled distinction between heap.h and heapam.h, or any
>> significant differences between their sets of consumers.

> [ yeah, there's quite a few files that would need heap.h but not heapam.h ]

OK, scratch that thought then.  So we seem to be down to choosing a new
name for what we're going to take out of htup.h.  If you don't like
heap.h, maybe something like heap_tuple.h?  I'm not terribly excited
about it either way though.  Any other ideas out there?
        regards, tom lane



Re: splitting htup.h

From
Alvaro Herrera
Date:
Excerpts from Alvaro Herrera's message of mié ago 29 11:32:04 -0400 2012:
> Excerpts from Tom Lane's message of mar ago 28 17:27:51 -0400 2012:

> > This should
> > reduce the number of changes needed, not only in our code but third
> > party code.  Not sure if the new file could sanely be called
> > "htup_private.h"; it seems a bit widely used for that.  Maybe "heap.h"?
>
> Not sure I really like heap.h, but let's put that aside for a moment.

... we already have catalog/heap.h, so that would be one reason to avoid
that name.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: splitting htup.h

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> ... we already have catalog/heap.h, so that would be one reason to avoid
> that name.

Yeah, good point.  We do have some duplicate file names elsewhere, but
it's not a terribly good practice.
        regards, tom lane



Re: splitting htup.h

From
Andres Freund
Date:
On Wednesday, August 29, 2012 05:47:14 PM Tom Lane wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > Excerpts from Tom Lane's message of mar ago 28 17:27:51 -0400 2012:
> >> Also, is there any reason to consider just moving those defs into
> >> heapam.h, instead of inventing a new header?  I'm not sure if there's
> >> any principled distinction between heap.h and heapam.h, or any
> >> significant differences between their sets of consumers.
> > 
> > [ yeah, there's quite a few files that would need heap.h but not heapam.h
> > ]
> 
> OK, scratch that thought then.  So we seem to be down to choosing a new
> name for what we're going to take out of htup.h.  If you don't like
> heap.h, maybe something like heap_tuple.h?  I'm not terribly excited
> about it either way though.  Any other ideas out there?
htup_details.h? That doesn't have the sound of "fringe details" that 
htup_private.h has.

Greetings,

Andres
-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: splitting htup.h

From
Alvaro Herrera
Date:
Excerpts from Andres Freund's message of mié ago 29 12:10:17 -0400 2012:

> > OK, scratch that thought then.  So we seem to be down to choosing a new
> > name for what we're going to take out of htup.h.  If you don't like
> > heap.h, maybe something like heap_tuple.h?  I'm not terribly excited
> > about it either way though.  Any other ideas out there?
> htup_details.h? That doesn't have the sound of "fringe details" that
> htup_private.h has.

htup_details.h seems reasonable, thanks.

Here's the proposed patch (which uses tupheader.h instead of
htup_details.h but other than that it's pretty much what I would
commit).

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: splitting htup.h

From
Alvaro Herrera
Date:
Excerpts from Alvaro Herrera's message of mié ago 29 15:13:11 -0400 2012:
> Excerpts from Andres Freund's message of mié ago 29 12:10:17 -0400 2012:
>
> > > OK, scratch that thought then.  So we seem to be down to choosing a new
> > > name for what we're going to take out of htup.h.  If you don't like
> > > heap.h, maybe something like heap_tuple.h?  I'm not terribly excited
> > > about it either way though.  Any other ideas out there?
> > htup_details.h? That doesn't have the sound of "fringe details" that
> > htup_private.h has.
>
> htup_details.h seems reasonable, thanks.

Committed, with a slight adjustment that I had left out of this patch
but was present in a previous version of it: the function prototypes,
which in the last posted patch were to remain in htup.h, are now in
htup_details.h.  So I removed access/tupdesc.h from htup.h.  I had to
add #include "access/htup_details.h" to half a dozen more .c files but
that seemed a good tradeoff.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services