
From Tom Lane
Subject Re: WIP patch for parameterized inner paths
Msg-id 29697.1327508653@sss.pgh.pa.us
In response to WIP patch for parameterized inner paths  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: WIP patch for parameterized inner paths  (Robert Haas <robertmhaas@gmail.com>)
Re: WIP patch for parameterized inner paths  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
I wrote:
> Attached is a WIP patch for parameterized paths, along the
> lines we have discussed before: ...

I've made considerable progress on the TODO items I listed: indxpath.c
has been ripped apart and restructured, and I have it considering
parameterized paths for hash sub-joins.  I made a deliberate policy
decision not to work very hard on exploring parameterized mergejoin
plans, because that seems to inflate the number of paths considered way
more than the relative usefulness of merge over hash joins justifies.
Also, the attached patch undoes the lobotomization of IN/EXISTS pullup
that we had to install in 8.4 as a stopgap for lack of ability to
consider the type of plan that can now be handled.

After that I started doing some performance testing, and the initial
news was bad: the planner was about 3x slower than 9.1 on a moderately
large join problem.  I've spent the past several days hacking away at
that, and have got it down to about 35% slower by dint of the following
changes:

* micro-optimization of add_path(), in particular avoiding duplicate
calculations during cost comparisons.

* introducing a two-step mechanism for testing whether proposed join
paths are interesting.  The patch first calculates a quick and dirty
lower bound on the cost of the join path (mostly, from the costs of
its input paths) and looks through the joinrel's pathlist to see if
there is already a path that is clearly better.  If not, it proceeds
with the full cost calculation and add_path insertion.  In my testing
the precheck is able to eliminate 80% or so of the proposed paths,
more than repaying the extra pathlist search.
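The add_path() micro-optimization in the first point can be illustrated with a simplified sketch. This is not the patch's actual code: the Path struct is pared down to the two cost fields, and the real add_path() also weighs pathkeys, parameterization, and row counts. The idea is to classify two paths against each other in a single pass, with a fuzz factor so near-ties don't count as wins, instead of repeating separate startup-cost and total-cost comparisons:

```c
/* Hypothetical, stripped-down Path: real PostgreSQL paths carry much more. */
typedef struct Path
{
    double      startup_cost;
    double      total_cost;
} Path;

typedef enum
{
    COSTS_EQUAL,                /* neither path dominates */
    COSTS_BETTER1,              /* first path cheaper on some dimension */
    COSTS_BETTER2,              /* second path cheaper on some dimension */
    COSTS_DIFFERENT             /* each path wins on a different dimension */
} PathCostComparison;

/*
 * Compare both cost dimensions in one pass, applying a fuzz factor
 * (e.g. 1.01 means costs within 1% are treated as equal), so the caller
 * gets a single verdict instead of making repeated comparisons.
 */
static PathCostComparison
compare_path_costs_fuzzily(const Path *p1, const Path *p2, double fuzz)
{
    int         better1 = 0;
    int         better2 = 0;

    if (p1->total_cost > p2->total_cost * fuzz)
        better2++;
    else if (p2->total_cost > p1->total_cost * fuzz)
        better1++;

    if (p1->startup_cost > p2->startup_cost * fuzz)
        better2++;
    else if (p2->startup_cost > p1->startup_cost * fuzz)
        better1++;

    if (better1 && better2)
        return COSTS_DIFFERENT;
    if (better1)
        return COSTS_BETTER1;
    if (better2)
        return COSTS_BETTER2;
    return COSTS_EQUAL;
}
```

A single verdict like this lets the caller decide at once whether a proposed path is dominated, dominating, or incomparable to an existing one, avoiding duplicate arithmetic across comparison passes.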
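The two-step precheck in the second point can be sketched along these lines. This is a simplification, not the patch's code: assume a cheap lower bound on the proposed join path's startup and total costs has already been derived from its input paths, and that pathkey and parameterization considerations are omitted. The precheck then scans the joinrel's existing pathlist for a path that is clearly at least as good on both dimensions before any full costing is done:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, stripped-down Path in a singly linked pathlist. */
typedef struct Path
{
    double      startup_cost;
    double      total_cost;
    struct Path *next;
} Path;

/*
 * Quick-and-dirty precheck: given lower bounds on the proposed path's
 * costs, return false if some existing path is clearly at least as good
 * on both dimensions, so the caller can skip the expensive full cost
 * calculation and add_path insertion entirely.
 */
static bool
add_path_precheck(const Path *pathlist,
                  double startup_lower_bound,
                  double total_lower_bound)
{
    const Path *old;

    for (old = pathlist; old != NULL; old = old->next)
    {
        if (old->total_cost <= total_lower_bound &&
            old->startup_cost <= startup_lower_bound)
            return false;       /* dominated: don't bother building it */
    }
    return true;                /* worth doing the full cost calculation */
}
```

Since the lower bound comes almost entirely from the already-known input path costs, a rejection here is nearly free, which is how eliminating ~80% of proposed paths can more than repay the extra pathlist search.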

Now this is of course cheating with both hands, in that either of these
optimization techniques could have been applied before ... but they
weren't.  I think that 35% slower on large join problems is probably
acceptable, given that this is investigating a larger number of possible
solutions and hopefully often finding a better plan.  I think I can
knock some more off of that by refactoring the costsize.c APIs, anyway.
Right now the first-pass and second-pass cost calculations are
independent and so there's some repeated work, which I'd like to
eliminate both for speed reasons and to get rid of duplicate code
that'd have to be kept in sync if it's left as-is.
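The shape of that refactoring might look like the following sketch, with hypothetical names and a deliberately toy cost model: the first pass fills in a workspace with the cheap lower-bound sums, and the second pass finishes the estimate from the workspace rather than recomputing from scratch.

```c
/*
 * Hypothetical workspace carrying the first pass's partial results into
 * the second pass, so precheck and full costing share their arithmetic.
 * The cost model here is a toy; real join costing is far more involved.
 */
typedef struct JoinCostWorkspace
{
    double      startup_cost;   /* lower bound on startup cost */
    double      run_cost;       /* lower bound on run cost so far */
} JoinCostWorkspace;

/* First pass: cheap lower bound, suitable for feeding the precheck. */
static void
initial_cost_join(JoinCostWorkspace *ws,
                  double outer_total_cost,
                  double inner_total_cost)
{
    /* both inputs must be paid for before the join can finish */
    ws->startup_cost = outer_total_cost + inner_total_cost;
    ws->run_cost = 0.0;
}

/* Second pass: finish the estimate, reusing the workspace sums. */
static double
final_cost_join(const JoinCostWorkspace *ws,
                double cost_per_output_tuple,
                double output_tuples)
{
    return ws->startup_cost + ws->run_cost +
        cost_per_output_tuple * output_tuples;
}
```

Because the second pass starts from the workspace instead of the raw inputs, the duplicated work between the first-pass and second-pass calculations disappears, and there is only one copy of the shared logic to keep in sync.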

If there are not objections, I'd like to go ahead and commit this
after one more round of polishing.  There's still a lot left to do,
but what it mainly needs now is some testing on real-world planning
problems, and I don't think it's going to get that until it's been
merged in.

            regards, tom lane


