On 07/15/2018 11:36 AM, Dean Rasheed wrote:
> On 13 July 2018 at 18:27, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> I'm not so sure. The issue is that a lot of the MCV deductions depend
>> on whether we can answer questions like "Is there a single match?" or
>> "If we got a match in MCV, do we need to look at the non-MCV part?" This
>> is not very different from the single-column estimates, except of course
>> here we need to look at multiple columns.
>>
>> The top-level clauses allow us to make such deductions; with deeper
>> clauses it's much more difficult (perhaps impossible). For example,
>> with (a=1 AND b=1) there can be just a single match, so if we find
>> it in MCV we're done. With clauses like ((a=1 OR a=2) AND (b=1 OR
>> b=2)) it's not that simple, because there may be multiple combinations
>> and so a match in MCV does not guarantee anything.
>
> Actually, it guarantees a lower bound on the overall selectivity, and
> maybe that's the best that we can do in the absence of any other
> stats.
>
Hmmm, is that actually true? Let's consider a simple example, with two
columns, each with just 2 values, and a "perfect" MCV list:
 a | b | frequency
---+---+-----------
 1 | 1 |       0.5
 2 | 2 |       0.5
And let's estimate sel(a=1 AND b=2). Your proposed algorithm does this:
1) sel(a=1) = 0.5
2) sel(b=2) = 0.5
3) total_sel = sel(a=1) * sel(b=2) = 0.25
4) mcv_sel = 0.0
5) total_sel = Max(total_sel, mcv_sel) = 0.25
How is that a lower bound? Or what is it lower than?
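To make the objection concrete, here is a minimal sketch of the steps
above in Python, using the two-row "perfect" MCV list from the example
(the dict-based representation is just for illustration, not how the
stats are actually stored):

```python
# "Perfect" MCV list: every (a, b) combination that exists, with its
# frequency. Rows with a=1 always have b=1, and vice versa.
mcv = {(1, 1): 0.5, (2, 2): 0.5}

# Steps 1-2: per-column selectivities for the clause (a=1 AND b=2)
sel_a = sum(f for (a, b), f in mcv.items() if a == 1)  # 0.5
sel_b = sum(f for (a, b), f in mcv.items() if b == 2)  # 0.5

# Step 3: combine under the independence assumption
total_sel = sel_a * sel_b  # 0.25

# Step 4: frequency of an MCV item matching (a=1, b=2) -- none exists
mcv_sel = mcv.get((1, 2), 0.0)  # 0.0

# Step 5: take the maximum of the two estimates
total_sel = max(total_sel, mcv_sel)  # 0.25

# The MCV list is perfect, so the true selectivity of (a=1 AND b=2)
# is 0.0 -- the 0.25 estimate is an overestimate, not a lower bound.
true_sel = mcv.get((1, 2), 0.0)
```

So the algorithm yields 0.25 while the actual selectivity is 0.0, which
is why it's unclear what the result is supposed to bound from below.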
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services