Re: [HACKERS] PATCH: multivariate histograms and MCV lists - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date
Msg-id f0625b09-bf50-5703-6241-51aea96a3f6a@2ndquadrant.com
Whole thread Raw
In response to Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Dean Rasheed <dean.a.rasheed@gmail.com>)
Responses Re: [HACKERS] PATCH: multivariate histograms and MCV lists
List pgsql-hackers

On 07/15/2018 11:36 AM, Dean Rasheed wrote:
> On 13 July 2018 at 18:27, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> I'm not so sure. The issue is that a lot of the MCV deductions depends
>> on whether we can answer questions like "Is there a single match?" or
>> "If we got a match in MCV, do we need to look at the non-MCV part?" This
>> is not very different from the single-column estimates, except of course
>> here we need to look at multiple columns.
>>
>> The top-level clauses allow us to make such deductions, with deeper
>> clauses it's much more difficult (perhaps impossible). Because for
>> example with (a=1 AND b=1) there can be just a single match, so if we
>> find it in MCV we're done. With clauses like ((a=1 OR a=2) AND (b=1 OR
>> b=2)) it's not that simple, because there may be multiple combinations
>> and so a match in MCV does not guarantee anything.
> 
> Actually, it guarantees a lower bound on the overall selectivity, and
> maybe that's the best that we can do in the absence of any other
> stats.
> 

Hmmm, is that actually true? Let's consider a simple example, with two 
columns, each with just 2 values, and a "perfect" MCV list:

     a | b | frequency
    -------------------
     1 | 1 | 0.5
     2 | 2 | 0.5

And let's estimate sel(a=1 & b=2). Your proposed algorithm does this:

1) sel(a=1) = 0.5
2) sel(b=2) = 0.5
3) total_sel = sel(a=1) * sel(b=2) = 0.25
4) mcv_sel = 0.0
5) total_sel = Max(total_sel, mcv_sel) = 0.25

How is that a lower bound? Or what is it lower than?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Next
From: Tomas Vondra
Date:
Subject: Re: Parallel queries in single transaction