Re: [HACKERS] PATCH: multivariate histograms and MCV lists - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date
Msg-id c2406825-6ebe-26e6-91b3-ba94026546a6@2ndquadrant.com
Whole thread Raw
In response to Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
List pgsql-hackers

On 3/14/19 12:56 PM, Kyotaro HORIGUCHI wrote:
> At Wed, 13 Mar 2019 19:37:45 +1300, David Rowley <david.rowley@2ndquadrant.com> wrote in
<CAKJS1f_6qDQj9m2H0jF4bRkZVLpfc7O9E+MxdXrq0wgv0z1NrQ@mail.gmail.com>
>> On Wed, 13 Mar 2019 at 17:20, Kyotaro HORIGUCHI
>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>> bms_member_index seems working differently than maybe expected.
>>>
>>>  bms_member_index((2, 4), 0) => 0, (I think) should be -1
>>>  bms_member_index((2, 4), 1) => 0, should be -1
>>>  bms_member_index((2, 4), 2) => 0, should be 0
>>>  bms_member_index((2, 4), 3) => 1, should be -1
>>>  bms_member_index((2, 4), 4) => 1, should be 1
>>>  bms_member_index((2, 4), 5) => 2, should be -1
>>>  bms_member_index((2, 4), 6) => 2, should be -1
>>> ...
>>>  bms_member_index((2, 4), 63) => 2, should be -1
>>>  bms_member_index((2, 4), 64) => -1, correct
>>>
>>> It works correctly only when x is a member - the way the function
>>> is maybe actually used in this patch -, or needs to change the
>>> specifiction (or the comment) of the function.
>>
>> Looks like:
>>
>> + if (wordnum >= a->nwords)
>> + return -1;
>>
>> should be:
>>
>> + if (wordnum >= a->nwords ||
>> + (a->word[wordnum] & ((bitmapword) 1 << bitnum)) == 0)
>> + return -1;
> 
> Yeah, seems right.
> 

Yep, that was broken. The attached patch fixes this by simply calling
bms_is_member, instead of copying the checks into bms_member_index.

I've also reworked the regression tests to use a function extracting the
cardinality estimates, as proposed by Dean and David. I have not reduced
the size of data sets yet, so the tests are not much faster, but we no
longer check the exact query plan. That's probably a good idea anyway.
Actually - the tests are a bit faster because it allows removing indexes
that were used for the query plans.

FWIW I've noticed an annoying thing when modifying type of column not
included in a statistics. Consider this:

create table t (a int, b int, c text);
insert into t select mod(i,10), mod(i,10), ''
  from generate_series(1,10000) s(i);
create statistics s (dependencies) on a,b from t;
analyze t;

explain analyze select * from t where a = 1 and b = 1;

                                QUERY PLAN
---------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..205.00 rows=1000 width=9)
                (actual time=0.014..1.910 rows=1000 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 9000
 Planning Time: 0.119 ms
 Execution Time: 2.234 ms
(5 rows)

alter table t alter c type varchar(61);

explain analyze select * from t where a = 1 and b = 1;

                              QUERY PLAN
---------------------------------------------------------------------
 Seq Scan on t  (cost=0.00..92.95 rows=253 width=148)
                (actual time=0.020..2.420 rows=1000 loops=1)
   Filter: ((a = 1) AND (b = 1))
   Rows Removed by Filter: 9000
 Planning Time: 0.128 ms
 Execution Time: 2.767 ms
(5 rows)

select stxdependencies from pg_statistic_ext;

             stxdependencies
------------------------------------------
 {"1 => 2": 1.000000, "2 => 1": 1.000000}
(1 row)

That is, we don't remove the statistics, but the estimate still changes.
But that's because the ALTER TABLE also resets reltuples/relpages:

select relpages, reltuples from pg_class where relname = 't';

 relpages | reltuples
----------+-----------
        0 |         0
(1 row)

That's a bit unfortunate, and it kinda makes the whole effort to not
drop the statistics unnecessarily kinda pointless :-(


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: propagating replica identity to partitions
Next
From: Shawn Debnath
Date:
Subject: Re: Introduce timeout capability for ConditionVariableSleep