Idea for reducing planning time - Mailing list pgsql-hackers

From Tom Lane
Subject Idea for reducing planning time
Date
Msg-id 15124.976749296@sss.pgh.pa.us
Whole thread Raw
Responses Re: Idea for reducing planning time
Re: Idea for reducing planning time
List pgsql-hackers
I've been looking into Brian Hirt's complaint that 7.0.3 and 7.1 are
lots slower than 7.0.2 in planning big joins.  The direct cause is that
since we now deduce implied equality clauses, the system has more
potential join paths than it used to --- for example, given "WHERE
a.x = b.y AND b.y = c.z", the old system would not consider joining a to
c and then adding b, because it didn't have a joinqual relating a and c.
Now it does.  There's not a lot to be done about that, but I've been
looking to see if we can make any offsetting speedups.

While digging around, I've realized that the planner is still carrying
around a lot more paths than it really needs to.  The rule in add_path()
is that we will keep a path if it is cheaper OR differently sorted/
better sorted than any other path for the same rel.  But what is not
taken into account is whether the sort ordering of a path actually has
any potential usefulness.  Before saving a path on the grounds that it's
got an otherwise unobtainable sort ordering, we should check to see if
that sort ordering is really going to be useful for a later mergejoin
or for the final output ordering.  It turns out we already have code to
check that for basic indexscan paths --- see useful_for_mergejoin() and
useful_for_ordering() in indxpath.c.  But we fail to make the same sort
of test on paths for join relations, with the result that we carry along
a lot more paths than we could possibly need, and that costs huge
amounts of time.

An example of what's going on is that given a query withFROM a, b, ... other rels ...WHERE a.w = 1 AND a.x = 2 AND b.y
=3 AND b.z = 4 ...
 
if w,x,y,z all have indexes we will consider indexscans on all four
of those indexes.  Which is fine.  But we will then consider nestloops
and mergejoins of a with b that use these four indexscans as the outer
path, and therefore yield results that are sorted by w,x,y,z
respectively.  Those paths will be carried as possible paths for a+b
because they offer different sort orders, even if we have no further
use for those sort orderings.  And then we have a combinatorial blowup
in the number of paths considered at higher join levels.  We should
instead consider that these paths have no useful sort ordering, and
throw away all but the cheapest.

What I'm thinking of doing is truncating the recorded pathkeys of a path
at the first sortkey that's not useful for either a higher-level
mergejoin clause or the requested final output sort ordering.  Then the
logic inside add_path() wouldn't change, but it would only be considering
useful pathkeys and not useless ones.

I'm trying to resist the temptation to make this change right now :-).
It's not quite a bug fix --- well, maybe you could call it a performance
bug fix --- so I'm kind of thinking it shouldn't be done during beta.
OTOH I seem to have lost the argument that Vadim shouldn't commit VACUUM
performance improvements during beta, so maybe this should go in too.
What do you think?
        regards, tom lane


pgsql-hackers by date:

Previous
From: "Martin A. Marques"
Date:
Subject: Re: Why vacuum?
Next
From: Alfred Perlstein
Date:
Subject: Re: Why vacuum?