Re: Join Filter vs. Index Cond (performance regression 9.1->9.2+/HEAD) - Mailing list pgsql-hackers

From Andrew Gierth
Subject Re: Join Filter vs. Index Cond (performance regression 9.1->9.2+/HEAD)
Date
Msg-id 87h9qriaoc.fsf@news-spur.riddles.org.uk
Whole thread Raw
In response to Re: Join Filter vs. Index Cond (performance regression 9.1->9.2+/HEAD)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Join Filter vs. Index Cond (performance regression 9.1->9.2+/HEAD)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
Tom> Once you're down to an estimate of one row retrieved, addingTom> additional index conditions simply increases the
cost(not byTom> much, but it increases) without delivering any visible benefit.
 

OK, but this is a serious problem because "estimate of one row" is a
very common estimation failure mode, and isn't always solvable in the
sense of arranging for better estimates (in the absence of hints, ugh).
Tom> I believe what probably happened in this case is that the plannerTom> considered both forms of the indexscan path
andconcluded thatTom> they were fuzzily the same cost and rowcount, yet the path usingTom> only t2.a and t3.b clearly
dominatedby requiring strictly fewerTom> outer relations for parameters.  So it threw away the path thatTom> also had
thec = t4.c comparison before it ever got to the joinTom> stage.  Even had it kept that path, the join cost
estimateTom>wouldn't have looked any better than the one for the join it didTom> pick, so there would have been no
certaintyof picking theTom> "correct" plan.
 
Tom> The real problem in your example is thus the incorrect rowcountTom> estimate; with better rowcount estimates the
twocases wouldn'tTom> have appeared to have the same output rowcount.
 
Tom> For the toy data in your example, this can probably be blamed onTom> the fact that eqjoinsel_inner doesn't have
anysmarts for the caseTom> of having an MCV list for only one side (though as noted in theTom> comments, it's not
obviouswhat it should do instead).  However,Tom> it's not very clear what was happening in the real-world case.
 

In the real-world case, t1 was something like an "overrides" table for
data otherwise obtained from the other tables, i.e. special-case
exceptions for general rules. As such it is highly skew, with many
possible (a,b) values having no row at all, but others having hundreds
of matches on (a,b) (but only one at most on (a,b,c) since this was the
pkey in the real data as well as the testcase).

Accordingly, there was no way that we could identify of getting any kind
of better estimate of rowcount.

-- 
Andrew (irc:RhodiumToad)



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Release notes committed
Next
From: Robert Haas
Date:
Subject: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1