Re: [HACKERS] PATCH: multivariate histograms and MCV lists - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date
Msg-id 681c6a78-2aef-769e-bd0f-290ac650e71d@2ndquadrant.com
Whole thread Raw
In response to Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Dean Rasheed <dean.a.rasheed@gmail.com>)
List pgsql-hackers

On 03/27/2018 07:34 PM, Tomas Vondra wrote:
> On 03/27/2018 07:03 PM, Dean Rasheed wrote:
>> On 27 March 2018 at 14:58, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>>> On 27 March 2018 at 01:36, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>>> 4) handling of NOT clauses in MCV lists (and in histograms)
>>>>
>>>> The query you posted does not fail anymore...
>>>>
>>> Ah, it turns out the previous query wasn't actually failing for the
>>> reason I thought it was -- it was failing because it had a
>>> ScalarArrayOpExpr that was being passed to
>>> mcv_clauselist_selectivity() because of the wrong list being passed to
>>> it. I could see from the code that a NOT clause would have tripped it
>>> up, but most NOT clauses actually get rewritten by negate_clause() so
>>> they end up not being NOT clauses.
>>>
>>
>> Thinking about that some, I think that the only NOT clauses this needs
>> to actually worry about are NOTs of boolean Vars. Anything else that
>> this code supports will have been transformed into something other
>> than a NOT before reaching this point. Thus it might be much simpler
>> to handle that as a special case in statext_is_compatible_clause() and
>> mcv_update_match_bitmap(), rather than trying to support general NOT
>> clauses, and going through a recursive call to
>> mcv_update_match_bitmap(), and then having to merge bitmaps. NOT of a
>> boolean Var could then be treated just like var=false, setting the
>> appropriate attribute match entry if it's found in the MCV list. This
>> would allow clauses like (a=1 and NOT b) to be supported, which I
>> don't think currently works, because fullmatch won't get set.
>>
> 
> Yes, I came to the same conclusion ;-) I'll send an updated patch later
> today.
> 

Attached is a patch fixing this. In the end I've decided to keep both
branches - one handling boolean Vars and one for NOT clauses. I think
you're right we can only see (NOT var) cases, but I'm not sure about that.

For example, what if an operator does not have a negator? Then we can't
transform NOT (a AND b) => (NOT a OR NOT b), I guess. So I kept this for
now, and we can remove this later.


I've added scalarneqsel, scalarlesel and scalargesel so that we
recognize those cases correctly. This fixes surprising behavior where
"obviously compatible" clauses like (a=1 AND b<1) became incompatible
when NOT was used, because

    NOT (a=1 AND b<1) = (a!=1 OR b>=1)

In my defense, the scalarlesel/scalargesel were introduced fairly
recently, I think.


I've also realized that the "fullmatch" flag is somewhat confused,
because some places interpreted it as "there is equality on each
attribute" but in fact it also required an actual MCV match. So when the
value was rare (not in MCV), it was always false.

There's a WIP part 0002, which should eventually be merged into 0001. It
should properly detect the case when each column has an equality, simply
by counting the top-level equality clauses (I'm not sure about the more
complex cases yet).

Another improvement done in this part is the ndistinct estimate. It
simply extracts Vars (from the top--level equality clauses, because it's
directly related to the fullmatch semantics), and uses that to compute
average frequency of non-MCV items. The function is just a simplified
version of the estimate_num_groups(), for this single-relation case.

BTW an unsquashed tag with those fixes is here:

    https://github.com/tvondra/postgres/tree/mvstats-20180328

it may be more convenient to quickly check the differences than
comparing the patches.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Haribabu Kommi
Date:
Subject: Re: Enhance pg_stat_wal_receiver view to display connected host
Next
From: Tomas Vondra
Date:
Subject: Re: PL/pgSQL nested CALL with transactions