Re: Proposed refactoring of planner header files - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Proposed refactoring of planner header files
Date
Msg-id 30320.1548904075@sss.pgh.pa.us
Whole thread Raw
In response to Proposed refactoring of planner header files  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Proposed refactoring of planner header files  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
I wrote:
> ... I didn't work on tightening selfuncs.c's dependencies.
> While I don't have a big problem with considering selfuncs.c to be
> in bed with the planner, that's risky in that whatever dependencies
> selfuncs.c has may well apply to extensions' selectivity estimators too.
> What I'm thinking about doing there is trying to split selfuncs.c into
> two parts, one being infrastructure that can be tightly tied to the
> core planner (and, likely, get moved under backend/optimizer/) and the
> other being estimators that use a limited API and can serve as models
> for extension code.

I spent some time looking at this, wondering whether it'd be practical
to write selectivity-estimating code that hasn't #included pathnodes.h
(nee relation.h).  It seems not: even pretty high-level functions
such as eqjoinsel() need access to fields like RelOptInfo.rows and
SpecialJoinInfo.jointype.  Now, there are only a few such fields, so
conceivably we could provide accessor functions in optimizer.h for
the commonly-used fields and keep the struct types opaque.   I'm not
excited about that though; it's unlike how we do things elsewhere in
Postgres and the mere savings of one #include dependency doesn't seem
to justify it.  So I'm thinking that planner support functions that
want to do selectivity estimation will still end up pulling in
pathnodes.h via selfuncs.h, and we'll just live with that.

However ... there are three pretty clearly identifiable groups of
functions in selfuncs.c: operator-specific estimators, support
functions, and AM-specific indexscan cost estimation functions.
There's a case to be made for splitting them into three separate files.
One point is that selfuncs.c is just too large; at over 8K lines,
it's currently the 7th-largest hand-maintained C file in our tree.
Another point is that as things stand, there's a strong temptation
to bury useful functionality in static functions that can't be
gotten at by extension estimators.  Separating the support functions
might help keep us more honest on that point.  (Or not.)

I'm not sure whether those arguments are strong enough to justify
the added pain-in-the-neck factor from moving a bunch of code
around.  That complicates back-patching bug fixes and it makes it
harder to trace the git history of the code.  So I've got mixed
emotions about whether it's worth doing anything.

Thoughts?

            regards, tom lane


pgsql-hackers by date:

Previous
From: "Bossart, Nathan"
Date:
Subject: Re: A few new options for vacuumdb
Next
From: David Rowley
Date:
Subject: Re: Ordered Partitioned Table Scans