Re: [HACKERS] PATCH: multivariate histograms and MCV lists - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Date
Msg-id e96fe815-c55f-14f1-06e1-078ac4babfd0@postgrespro.ru
Whole thread Raw
In response to Re: [HACKERS] PATCH: multivariate histograms and MCV lists  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: [HACKERS] PATCH: multivariate histograms and MCV lists
List pgsql-hackers

On 18.07.2018 02:58, Tomas Vondra wrote:
> On 07/18/2018 12:41 AM, Konstantin Knizhnik wrote:
>> ...
>>
>> Teodor Sigaev has proposed an alternative approach for calculating
>> selectivity of multicolumn join or compound index search.
>> Usually DBA creates compound indexes which can be used  by optimizer to
>> build efficient query execution plan based on index search.
>> We can stores statistic for compound keys of such indexes and (as it is
>> done now for functional indexes) and use it to estimate selectivity
>> of clauses. I have implemented this idea and will publish patch in
>> separate thread soon.
>> Now I just want to share some results for the Tomas examples.
>>
>> So for Vanilla Postges without extra statistic  estimated number of rows
>> is about 4 times smaller than real.
>>
> Can you please post plans with parallelism disabled, and perhaps without
> the aggregate? Both makes reading the plans unnecessarily difficult ...


Sorry, below are plans with disabled parallel execution on simpler 
query(a=1 and b=1):

explain analyze SELECT count(*) FROM foo WHERE a=1 and b=1;



Vanilla:

  Aggregate  (cost=11035.86..11035.87 rows=1 width=8) (actual 
time=22.746..22.746 rows=1 loops=1)
    ->  Bitmap Heap Scan on foo  (cost=291.35..11001.97 rows=13553 
width=0) (actual time=9.055..18.711 rows=50000 loops=1)
          Recheck Cond: ((a = 1) AND (b = 1))
          Heap Blocks: exact=222
          ->  Bitmap Index Scan on foo_a_b_idx  (cost=0.00..287.96 
rows=13553 width=0) (actual time=9.005..9.005 rows=50000 loops=1)
                Index Cond: ((a = 1) AND (b = 1))


----------------------------------------------------------------------

Vanilla + extra statistic (create statistics ab on a,b from foo):

  Aggregate  (cost=12693.35..12693.36 rows=1 width=8) (actual 
time=22.747..22.748 rows=1 loops=1)
    ->  Bitmap Heap Scan on foo  (cost=1490.08..12518.31 rows=70015 
width=0) (actual time=9.399..18.636 rows=50000 loops=1)
          Recheck Cond: ((a = 1) AND (b = 1))
          Heap Blocks: exact=222
          ->  Bitmap Index Scan on foo_a_b_idx (cost=0.00..1472.58 
rows=70015 width=0) (actual time=9.341..9.341 rows=50000 loops=1)
                Index Cond: ((a = 1) AND (b = 1))

----------------------------------------------------------------------

Multicolumn index statistic:

  Aggregate  (cost=11946.35..11946.36 rows=1 width=8) (actual 
time=25.117..25.117 rows=1 loops=1)
    ->  Bitmap Heap Scan on foo  (cost=1080.47..11819.51 rows=50736 
width=0) (actual time=11.568..21.362 rows=50000 loops=1)
          Recheck Cond: ((a = 1) AND (b = 1))
          Heap Blocks: exact=222
          ->  Bitmap Index Scan on foo_a_b_idx (cost=0.00..1067.79 
rows=50736 width=0) (actual time=11.300..11.300 rows=50000 loops=1)
                Index Cond: ((a = 1) AND (b = 1))



pgsql-hackers by date:

Previous
From: Kaye Ann Ignacio
Date:
Subject: Add constraint in a Materialized View
Next
From: "Tsunakawa, Takayuki"
Date:
Subject: RE: Temporary tables prevent autovacuum, leading to XID wraparound