Re: [HACKERS] PATCH: multivariate histograms and MCV lists - Mailing list pgsql-hackers

From: Tomas Vondra
Subject: Re: [HACKERS] PATCH: multivariate histograms and MCV lists
Msg-id: abef0126-ff75-850f-0a25-063a8ffa07ad@2ndquadrant.com
In response to: Re: [HACKERS] PATCH: multivariate histograms and MCV lists (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses: Re: [HACKERS] PATCH: multivariate histograms and MCV lists (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List: pgsql-hackers
On 07/18/2018 12:41 AM, Konstantin Knizhnik wrote:
> ...
> 
> Teodor Sigaev has proposed an alternative approach for calculating the
> selectivity of a multicolumn join or compound index search.
> Usually a DBA creates compound indexes that the optimizer can use to
> build an efficient query execution plan based on index search.
> We can store statistics for the compound keys of such indexes (as is
> done now for functional indexes) and use them to estimate the
> selectivity of clauses. I have implemented this idea and will publish a
> patch in a separate thread soon.
> Now I just want to share some results for Tomas's examples.
> 
> So for vanilla Postgres without extra statistics, the estimated number
> of rows is about 4 times smaller than the real one.
> 

Can you please post plans with parallelism disabled, and perhaps without
the aggregate? Both make reading the plans unnecessarily difficult ...
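For reference, plans of that shape can be produced like this (a minimal sketch, not taken from the thread; `max_parallel_workers_per_gather` is the standard PostgreSQL GUC that disables parallel plans for the session, and dropping the count(*) leaves only the scan node whose estimate matters here):

```sql
-- Disable parallel plans for this session so EXPLAIN shows a plain scan.
SET max_parallel_workers_per_gather = 0;

-- No aggregate: the row estimate on the scan node is what we want to compare.
EXPLAIN ANALYZE SELECT * FROM foo WHERE a = 1 AND (b = 1 OR b = 2);
```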

>  postgres=# explain analyze SELECT count(*) FROM foo WHERE a=1 and (b=1 or b=2);
>                                                                  QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=10964.76..10964.77 rows=1 width=8) (actual time=49.251..49.251 rows=1 loops=1)
>    ->  Bitmap Heap Scan on foo  (cost=513.60..10906.48 rows=23310 width=0) (actual time=17.368..39.928 rows=90000 loops=1)

ok, 23k vs. 90k

> 
> If we add statistics for the a and b columns:
> 
>      create statistics ab on a,b from foo;
>      analyze foo;
> 
> then the expected result is about 30% larger than the real one: 120k vs 90k:
> 

Eh? The plan however says 9k vs. 30k ...

>                ->  Parallel Bitmap Heap Scan on foo  (cost=2561.33..13424.24 rows=9063 width=0) (actual time=15.551..26.057 rows=30000 loops=3)

...
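(If I read the parallel plan right: actual row counts in EXPLAIN ANALYZE are averaged per loop, i.e. per worker here, so the total actual row count is rows times loops. A quick arithmetic check, with the numbers taken from the plan above:)

```python
# EXPLAIN ANALYZE reports actual rows as an average per loop; for a
# parallel node, each worker is one loop.
est_rows_per_worker = 9063     # planner's per-worker estimate ("9k")
actual_rows_per_loop = 30000   # reported actual rows, averaged per loop ("30k")
loops = 3                      # leader + 2 workers

total_actual = actual_rows_per_loop * loops
print(total_actual)  # 90000 -- matches the 90k seen in the non-parallel plans
```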

> With compound index statistics the estimate is almost equal to the real value:
> 
>    ->  Bitmap Heap Scan on foo  (cost=1880.20..13411.66 rows=23310 width=0) (actual time=38.776..61.050 rows=90000 loops=1)

Well, this says 23k vs. 90k.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

