Idea for reducing planning time - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Idea for reducing planning time |
Date | |
Msg-id | 15124.976749296@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Idea for reducing planning time
Re: Idea for reducing planning time |
List | pgsql-hackers |
I've been looking into Brian Hirt's complaint that 7.0.3 and 7.1 are lots slower than 7.0.2 in planning big joins. The direct cause is that since we now deduce implied equality clauses, the system has more potential join paths than it used to --- for example, given "WHERE a.x = b.y AND b.y = c.z", the old system would not consider joining a to c and then adding b, because it didn't have a joinqual relating a and c. Now it does. There's not a lot to be done about that, but I've been looking to see if we can make any offsetting speedups. While digging around, I've realized that the planner is still carrying around a lot more paths than it really needs to. The rule in add_path() is that we will keep a path if it is cheaper OR differently sorted/ better sorted than any other path for the same rel. But what is not taken into account is whether the sort ordering of a path actually has any potential usefulness. Before saving a path on the grounds that it's got an otherwise unobtainable sort ordering, we should check to see if that sort ordering is really going to be useful for a later mergejoin or for the final output ordering. It turns out we already have code to check that for basic indexscan paths --- see useful_for_mergejoin() and useful_for_ordering() in indxpath.c. But we fail to make the same sort of test on paths for join relations, with the result that we carry along a lot more paths than we could possibly need, and that costs huge amounts of time. An example of what's going on is that given a query withFROM a, b, ... other rels ...WHERE a.w = 1 AND a.x = 2 AND b.y =3 AND b.z = 4 ... if w,x,y,z all have indexes we will consider indexscans on all four of those indexes. Which is fine. But we will then consider nestloops and mergejoins of a with b that use these four indexscans as the outer path, and therefore yield results that are sorted by w,x,y,z respectively. Those paths will be carried as possible paths for a+b because they offer different sort orders, even if we have no further use for those sort orderings. And then we have a combinatorial blowup in the number of paths considered at higher join levels. We should instead consider that these paths have no useful sort ordering, and throw away all but the cheapest. What I'm thinking of doing is truncating the recorded pathkeys of a path at the first sortkey that's not useful for either a higher-level mergejoin clause or the requested final output sort ordering. Then the logic inside add_path() wouldn't change, but it would only be considering useful pathkeys and not useless ones. I'm trying to resist the temptation to make this change right now :-). It's not quite a bug fix --- well, maybe you could call it a performance bug fix --- so I'm kind of thinking it shouldn't be done during beta. OTOH I seem to have lost the argument that Vadim shouldn't commit VACUUM performance improvements during beta, so maybe this should go in too. What do you think? regards, tom lane
pgsql-hackers by date: