Re: [HACKERS] PoC: Grouped base relation - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: [HACKERS] PoC: Grouped base relation
Date
Msg-id 4d60b453-7a3f-47f8-2c4a-53764d278c93@2ndquadrant.com
Whole thread Raw
In response to Re: [HACKERS] PoC: Grouped base relation  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
On 01/17/2017 06:39 AM, David Rowley wrote:
> On 17 January 2017 at 16:30, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> Let's instead use an example similar to what Antonin mentioned in the
>> initial post - two tables, with two columns each.
>>
>>     CREATE TABLE t1 (a INT, b INT);
>>     CREATE TABLE t2 (c INT, d INT);
>>
>> And let's assume each table has 100.000 rows, but only 100 groups in the
>> first column, with 1000 rows per group. Something like
>>
>>     INSERT INTO t1
>>     SELECT mod(i,100), i FROM generate_series(1, 1e5) s(i);
>>
>>     INSERT INTO t2
>>     SELECT mod(i,100), i FROM generate_series(1, 1e5) s(i);
>>
>> And let's assume this query
>>
>> SELECT t1.a, count(t2.d) FROM t1 JOIN t2 ON (t1.a = t2.c)
>>  GROUP BY t1.a;
>>
>> On master, EXPLAIN (COSTS OFF, TIMING OFF, ANALYZE) looks like this:
>>
>>                            QUERY PLAN
>> -----------------------------------------------------------------
>>  HashAggregate (actual rows=100 loops=1)
>>    Group Key: t1.a
>>    ->  Hash Join (actual rows=100000000 loops=1)
>>          Hash Cond: (t2.c = t1.a)
>>          ->  Seq Scan on t2 (actual rows=100000 loops=1)
>>          ->  Hash (actual rows=100000 loops=1)
>>                Buckets: 131072  Batches: 2  Memory Usage: 2716kB
>>                ->  Seq Scan on t1 (actual rows=100000 loops=1)
>>  Planning time: 0.167 ms
>>  Execution time: 17005.300 ms
>> (10 rows)
>>
>> while with the patch it looks like this
>>
>>                              QUERY PLAN
>> ---------------------------------------------------------------------
>>  Finalize HashAggregate (actual rows=100 loops=1)
>>    Group Key: t1.a
>>    ->  Hash Join (actual rows=100 loops=1)
>>          Hash Cond: (t1.a = t2.c)
>>          ->  Partial HashAggregate (actual rows=100 loops=1)
>>                Group Key: t1.a
>>                ->  Seq Scan on t1 (actual rows=100000 loops=1)
>>          ->  Hash (actual rows=100 loops=1)
>>                Buckets: 1024  Batches: 1  Memory Usage: 14kB
>>                ->  Partial HashAggregate (actual rows=100 loops=1)
>>                      Group Key: t2.c
>>                      ->  Seq Scan on t2 (actual rows=100000 loops=1)
>>  Planning time: 0.105 ms
>>  Execution time: 31.609 ms
>> (14 rows)
>>
>> This of course happens because with the patch we run the transition function
>> 200k-times (on each side of the join) and aggtransmultifn on the 100 rows
>> produced by the join, while on master the join produces 10.000.000 rows
>> (which already takes much more time), and then have to run the transition
>> function on all those rows.
>>
>> The performance difference is pretty obvious, I guess.
>
> An exceptional improvement.
>

I'm not sure if you're using "exceptional" in the "excellent" sense, or 
"rare to happen in practice". But I guess both meanings apply here, 
actually ;-)

> For the combine aggregate example of this query, since no patch exists
> yet, we could simply mock what the planner would do by rewriting the
> query. I'll use SUM() in-place of the combinefn for COUNT():
>
> explain analyze SELECT t1.a, sum(t2.d) FROM t1 join (SELECT c,count(d)
> d from t2 group by c) t2 on t1.a = t2.c group by t1.a;
>
                                   QUERY PLAN
-------------------------------------------------------------------------------- HashAggregate (actual rows=100
loops=1)  Group Key: t1.a   ->  Hash Join (actual rows=100000 loops=1)         Hash Cond: (t1.a = t2.c)         ->  Seq
Scanon t1 (actual rows=100000 loops=1)         ->  Hash (actual rows=100 loops=1)               Buckets: 1024  Batches:
1 Memory Usage: 13kB               ->  Subquery Scan on t2 (actual rows=100 loops=1)                     ->
HashAggregate(actual rows=100 loops=1)                           Group Key: t2_1.c                           ->  Seq
Scanon t2 t2_1 (actual rows=100000 
 
loops=1) Planning time: 0.271 ms Execution time: 60.226 ms
(13 rows)

> this seems to be 100,000 aggtransfn calls (for t2), then 100,000
> aggcombinefn calls (for t1) (total = 200,000), where as the patch
> would perform 100,000 aggtransfn calls (for t2), then 100,000
> aggtransfn calls (for t1), then 100 aggtransmultifn calls (total =
> 200,100)
>
> Is my maths ok?
>

Yes, I believe the math for agg function calls is correct.
>>
>> I don't quite see how the patch could use aggcombinefn without sacrificing a
>> lot of the benefits. Sure, it's possible to run the aggcombinefn in a loop
>> (with number of iterations determined by the group size on the other side of
>> the join), but that sounds pretty expensive and eliminates the reduction of
>> transition function calls. The join cardinality would still be reduced,
>> though.
>
> I'd be interested in seeing the run time of my example query above.
> I can't quite see a reason for it to be slower, but please let me
> know.
>

It's a bit slower of course, because the join needs to process more rows 
from t1. The patch reduces cardinalities on both sides of the join, if 
possible - the example schema is constructed to benefit from this, of 
course.
>>
>> I do have other question about the patch, however. It seems to rely on the
>> fact that the grouping and joins both reference the same columns. I wonder
>> how uncommon such queries are.
>>
>> To give a reasonable example, imagine the typical start schema, which is
>> pretty standard for large analytical databases. A dimension table is
>> "products" and the fact table is "sales", and the schema might look like
>> this:
>>
>> CREATE TABLE products (
>>     id            SERIAL PRIMARY KEY,
>>     name          TEXT,
>>     category_id   INT,
>>     producer_id   INT
>> );
>>
>> CREATE TABLE sales (
>>     product_id    REFERENCES products (id),
>>     nitems        INT,
>>     price         NUMERIC
>> );
>>
>> A typical query then looks like this:
>>
>>     SELECT category_id, SUM(nitems), SUM(price)
>>     FROM products p JOIN sales s ON (p.id = s.product_id)
>>     GROUP BY p.category_id;
>>
>> which obviously uses different columns for the grouping and join, and so the
>> patch won't help with that. Of course, a query grouping by product_id would
>> allow the patch to work
>>
>>     SELECT category_id, SUM(nitems), SUM(price)
>>     FROM products p JOIN sales s ON (p.id = s.product_id)
>>     GROUP BY p.product_id;
>>
>> Another thing is that in my experience most queries do joins on foreign keys
>> (so the PK side is unique by definition), so the benefit on practical
>> examples is likely much smaller.
>>
>> But I guess my main question is if there are actual examples of queries the
>> patch is trying to improve, or whether the general benefit is allowing
>> parallel plans for queries where it would not be possible otherwise.
>
> Using the combine function technique the planner could have performed
> this query the same as if the query had been written as:
>
> SELECT p.category_id, SUM(sum_nitems), SUM(sum_price) FROM products p
> JOIN (SELECT product_id,SUM(nitems) sum_nitems,SUM(price) sum_price
> FROM sales GROUP BY product_id) s ON p.product_id = s.product_id GROUP
> BY p.category_id;
>
> The outer SUM() would be the combine function for SUM() in the
> finalize aggregate node.
>
> Why's that less efficient?
>

I'm a bit confused. I wasn't talking about efficiency at all, but rather 
about which cases the patch currently optimizes, and whether it can be 
extended.

The patch currently does nothing for the "group by category_id" query, 
because it only changes the case when the grouping and join happen on 
the same columns. So my question is if this is inherent limitation, or 
if the patch can be extended to such queries. Perhaps it's just a 
limitation of the WIP patch version?

You're right the rewritten query performs better compared to master:

1) master
                              QUERY PLAN
---------------------------------------------------------------------- HashAggregate (actual rows=100 loops=1)   Group
Key:p.category_id   ->  Hash Join (actual rows=10000000 loops=1)         Hash Cond: (s.product_id = p.id)         ->
SeqScan on sales s (actual rows=10000000 loops=1)         ->  Hash (actual rows=10000 loops=1)               Buckets:
16384 Batches: 1  Memory Usage: 519kB               ->  Seq Scan on products p (actual rows=10000 loops=1) Planning
time:0.410 ms Execution time: 3577.070 ms
 
(10 rows)

2) rewritten
                              QUERY PLAN
---------------------------------------------------------------------- HashAggregate (actual rows=100 loops=1)   Group
Key:p.category_id   ->  Hash Join (actual rows=10000 loops=1)         Hash Cond: (sales.product_id = p.id)         ->
HashAggregate(actual rows=10000 loops=1)               Group Key: sales.product_id               ->  Seq Scan on sales
(actualrows=10000000 loops=1)         ->  Hash (actual rows=10000 loops=1)               Buckets: 16384  Batches: 1
MemoryUsage: 519kB               ->  Seq Scan on products p (actual rows=10000 loops=1) Planning time: 0.555 ms
Executiontime: 2585.287 ms
 
(12 rows)

I can't really compare it to the patch, because that simply does the 
same thing as master.
>
> I don't deny that there's cases that this multiple aggregate function
> won't be able to optimise better than the combine function, but I'm
> just not that convinced yet it'll be worth the trouble when combine
> functions, which are already in core could do most of what would be
> useful with a fraction of the code.
>

Sure, no problem with that. It's essentially a cost/benefit analysis, 
and we're still trying to understand what the patch does/can do.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Beena Emerson
Date:
Subject: Re: [HACKERS] increasing the default WAL segment size
Next
From: Kyotaro HORIGUCHI
Date:
Subject: Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?