Re: Partial Aggregation / GROUP BY before JOIN - Mailing list pgsql-hackers

From Amit Langote
Subject Re: Partial Aggregation / GROUP BY before JOIN
Date
Msg-id 5608EDFF.4040101@lab.ntt.co.jp
Whole thread Raw
In response to Partial Aggregation / GROUP BY before JOIN  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: Partial Aggregation / GROUP BY before JOIN  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
On 2015/09/28 13:31, David Rowley wrote:
> I've been spending time working on allowing the planner to perform
> aggregation before the final join relation is created.
>
...

>
> The patch is however so far capable of giving us extremely nice performance
> improvements for some (likely artificial) queries.
> 
> Let's look at a quick example:
> 
> CREATE TABLE product (product_id INT NOT NULL,product_code VARCHAR(64) NOT
> NULL, PRIMARY KEY(product_id));
> CREATE UNIQUE INDEX product_product_code_uidx ON product (product_code);
> -- create small list of products
> INSERT INTO product SELECT g.id,'ABC' || CAST(g.id AS TEXT) FROM
> generate_series(1,100) g(id);
> 
> CREATE TABLE sale (sale_id INT NOT NULL, product_id INT NOT NULL, quantity
> INT NOT NULL);
> 
> INSERT INTO sale (sale_id, product_id,quantity) SELECT
> x.x,x.x%100+1,CAST(random() * 1000 AS INT) FROM
> generate_series(1,100000000) x(x);
> 
> ALTER TABLE sale ADD CONSTRAINT sale_pkey PRIMARY KEY(sale_id);
> 
> test=# SELECT count(sale.sale_id) FROM sale, product;
>     count
> -------------
>  10000000000
> (1 row)
> Time: 10323.053 ms
> 
> 
> And if I disable the optimisation:
> 
> test=# set enable_earlygrouping = off;
> SET
> Time: 0.302 ms
> test=# SELECT count(sale.sale_id) FROM sale, product;
>     count
> -------------
>  10000000000
> (1 row)
> Time: 775790.764 ms
> 
> So, in this probably rather unlikely query, we get something around a 7500%
> performance increase. Of course as the ratio of groups per underlying
> tuples increase, the performance increase will tail off.
> 
> The explain output from the optimised version is as follows:
> 
>                                      QUERY PLAN
> ------------------------------------------------------------------------------------
>  Finalize Aggregate  (cost=1790544.37..1790544.38 rows=1 width=4)
>    ->  Nested Loop  (cost=1790541.10..1790544.12 rows=100 width=4)
>          ->  Partial Aggregate  (cost=1790541.10..1790541.11 rows=1 width=4)
>                ->  Seq Scan on sale  (cost=0.00..1540541.08 rows=100000008
> width=4)
>          ->  Seq Scan on product  (cost=0.00..2.00 rows=100 width=0)
> 
> 

Did you perhaps attach a version of the patch you didn't intend to?

I get the following plan and hence a different result from what's shown above:

postgres=# EXPLAIN SELECT count(sale.sale_id) FROM sale, product;                                  QUERY PLAN

--------------------------------------------------------------------------------Aggregate  (cost=17909.27..17909.28
rows=1width=4)  ->  Nested Loop  (cost=17906.00..17909.02 rows=100 width=4)        ->  Aggregate
(cost=17906.00..17906.01rows=1 width=4)              ->  Seq Scan on sale  (cost=0.00..15406.00 rows=1000000
 
width=4)        ->  Seq Scan on product  (cost=0.00..2.00 rows=100 width=0)

postgres=# SELECT count(sale.sale_id) FROM sale, product;count
-------  100
(1 row)

postgres=# set enable_earlygrouping = off;
SET

postgres=# EXPLAIN SELECT count(sale.sale_id) FROM sale, product;                               QUERY PLAN
---------------------------------------------------------------------------Aggregate  (cost=1515408.25..1515408.26
rows=1width=4)  ->  Nested Loop  (cost=0.00..1265408.25 rows=100000000 width=4)        ->  Seq Scan on sale
(cost=0.00..15406.00rows=1000000 width=4)        ->  Materialize  (cost=0.00..2.50 rows=100 width=0)              ->
SeqScan on product  (cost=0.00..2.00 rows=100 width=0)
 
(5 rows)

postgres=# SELECT count(sale.sale_id) FROM sale, product;  count
-----------100000000
(1 row)

Am I missing something?

Thanks,
Amit




pgsql-hackers by date:

Previous
From: Kouhei Kaigai
Date:
Subject: Re: Foreign join pushdown vs EvalPlanQual
Next
From: Tatsuo Ishii
Date:
Subject: Re: Doubt in pgbench TPS number