Re: [HACKERS] PATCH: multivariate histograms and MCV lists - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: [HACKERS] PATCH: multivariate histograms and MCV lists |
Date | |
Msg-id | 3fcfd5e5-6849-34e6-22ab-1b62d191bedb@2ndquadrant.com Whole thread Raw |
In response to | Re: [HACKERS] PATCH: multivariate histograms and MCV lists (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>) |
List | pgsql-hackers |
Hi, On 07/18/2018 09:32 AM, Konstantin Knizhnik wrote: > > > On 18.07.2018 02:58, Tomas Vondra wrote: >> On 07/18/2018 12:41 AM, Konstantin Knizhnik wrote: >>> ... >>> >>> Teodor Sigaev has proposed an alternative approach for calculating >>> selectivity of multicolumn join or compound index search. >>> Usually DBA creates compound indexes which can be used by optimizer to >>> build efficient query execution plan based on index search. >>> We can stores statistic for compound keys of such indexes and (as it is >>> done now for functional indexes) and use it to estimate selectivity >>> of clauses. I have implemented this idea and will publish patch in >>> separate thread soon. >>> Now I just want to share some results for the Tomas examples. >>> >>> So for Vanilla Postges without extra statistic estimated number of rows >>> is about 4 times smaller than real. >>> >> Can you please post plans with parallelism disabled, and perhaps without >> the aggregate? Both makes reading the plans unnecessarily difficult ... > > > Sorry, below are plans with disabled parallel execution on simpler > query(a=1 and b=1): > > explain analyze SELECT count(*) FROM foo WHERE a=1 and b=1; > > > > Vanilla: > > Aggregate (cost=11035.86..11035.87 rows=1 width=8) (actual > time=22.746..22.746 rows=1 loops=1) > -> Bitmap Heap Scan on foo (cost=291.35..11001.97 rows=13553 > width=0) (actual time=9.055..18.711 rows=50000 loops=1) > Recheck Cond: ((a = 1) AND (b = 1)) > Heap Blocks: exact=222 > -> Bitmap Index Scan on foo_a_b_idx (cost=0.00..287.96 > rows=13553 width=0) (actual time=9.005..9.005 rows=50000 loops=1) > Index Cond: ((a = 1) AND (b = 1)) > > > ---------------------------------------------------------------------- > > Vanilla + extra statistic (create statistics ab on a,b from foo): > > Aggregate (cost=12693.35..12693.36 rows=1 width=8) (actual > time=22.747..22.748 rows=1 loops=1) > -> Bitmap Heap Scan on foo (cost=1490.08..12518.31 rows=70015 > width=0) (actual time=9.399..18.636 rows=50000 loops=1) > Recheck Cond: ((a = 1) AND (b = 1)) > Heap Blocks: exact=222 > -> Bitmap Index Scan on foo_a_b_idx (cost=0.00..1472.58 > rows=70015 width=0) (actual time=9.341..9.341 rows=50000 loops=1) > Index Cond: ((a = 1) AND (b = 1)) > > ---------------------------------------------------------------------- > > Multicolumn index statistic: > > Aggregate (cost=11946.35..11946.36 rows=1 width=8) (actual > time=25.117..25.117 rows=1 loops=1) > -> Bitmap Heap Scan on foo (cost=1080.47..11819.51 rows=50736 > width=0) (actual time=11.568..21.362 rows=50000 loops=1) > Recheck Cond: ((a = 1) AND (b = 1)) > Heap Blocks: exact=222 > -> Bitmap Index Scan on foo_a_b_idx (cost=0.00..1067.79 > rows=50736 width=0) (actual time=11.300..11.300 rows=50000 loops=1) > Index Cond: ((a = 1) AND (b = 1)) > I wonder what happened to this alternative approach, relying on stats from multicolumn indexes ... -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: