Home > mailing lists
Re: POC: GROUP BY optimization - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: POC: GROUP BY optimization
Date	June 20, 2020 18:23:57
Msg-id	20200620182357.cspxophvqjgqfvys@development Whole thread Raw
In response to	Re: POC: GROUP BY optimization (Dmitry Dolgov <9erthalion6@gmail.com>)
List	pgsql-hackers
Tree view
On Sat, Jun 20, 2020 at 04:30:10PM +0200, Dmitry Dolgov wrote:
>> On Sat, May 16, 2020 at 04:56:09PM +0200, Tomas Vondra wrote:
>>
>> So I don't think there will be a single "interesting" grouping pathkeys
>> (i.e. root->group_pathkeys), but a collection of pathkeys. And we'll
>> need to build grouping paths for all of those, and leave the planner to
>> eventually pick the one giving us the cheapest plan ...
>>
>> A "brute-force" approach would be to generate all possible orderings of
>> group_pathkeys, but that's probably not going to fly - it might easily
>> cause an explosion in number of paths we track, etc. So we'll need to
>> pick/prioritize orderings that are more likely to give us efficient
>> plans, and ORDER BY seems like a good example because it means we won't
>> need an explicit sort.
>
>Yes, I see. I've already rebased the original version of patch and now
>working on your suggestions. In the meantime one question:
>
>> But there are also decisions that can be made only after we build the
>> grouping paths. For example, we may have both GROUP BY and ORDER BY, and
>> there is no "always correct" way to combine those. In some cases it may
>> be correct to use the same pathkeys, in other cases it's better to use
>> different ones (which will require an extra Sort, with additional cost).
>
>Do I understand correctly, your idea is that in some cases it's cheaper
>to use different order for GROUP BY than in ORDER BY even with an extra
>sorting? In the current patch implementation there is an assumption that
>says it's always better to match the order of pathkeys for both GROUP
>BY/ORDER BY (which means that the only degree of freedom there is to
>reorder the tail, which in turn makes both "unsorted input" and
>"partially sorted input" cases from your original email essentially the
>same). Can you show such an example when this assumption is invalid?
>

Yes, that is mostly the point - I don't think we can assume simply using
the ORDER BY pathkeys (if specified) for grouping is optimal. As a
trivial counter-example, consider this

CREATE TABLE t (
   a int,
   b int,
   c int);

INSERT INTO t SELECT 1000 * random(), 1000 * random(), i
   FROM generate_series(1,10000000) s(i);

CREATE INDEX ON t (a, b, c);

VACUUM ANALYZE t;


And now some simple queries (you may need to tweak the planning options
a bit, to get these plans - disable hashagg etc.).


test=# explain analyze select a, b, count(c) from t group by a, b;
                                                                    QUERY PLAN
                         
 

-------------------------------------------------------------------------------------------------------------------------------------------------
  GroupAggregate  (cost=0.43..389008.55 rows=1000018 width=16) (actual time=0.037..10509.505 rows=1001923 loops=1)
    Group Key: a, b
    ->  Index Only Scan using t_a_b_c_idx on t  (cost=0.43..304007.06 rows=10000175 width=12) (actual
time=0.024..5189.908rows=10000000 loops=1)
 
          Heap Fetches: 0
  Planning Time: 0.113 ms
  Execution Time: 10941.677 ms
(6 rows)


test=# explain analyze select a,b, count(c) from t group by a, b order by a, b;
                                                                    QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------
  GroupAggregate  (cost=0.43..389008.55 rows=1000018 width=16) (actual time=0.033..10606.925 rows=1001923 loops=1)
    Group Key: a, b
    ->  Index Only Scan using t_a_b_c_idx on t  (cost=0.43..304007.06 rows=10000175 width=12) (actual
time=0.020..5240.332rows=10000000 loops=1)
 
          Heap Fetches: 0
  Planning Time: 0.100 ms
  Execution Time: 11043.358 ms
(6 rows)


test=# explain analyze select a,b, count(c) from t group by a, b order by b, a;
                                                           QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
  GroupAggregate  (cost=1658556.19..1768558.12 rows=1000018 width=16) (actual time=14707.676..25533.053 rows=1001923
loops=1)
    Group Key: b, a
    ->  Sort  (cost=1658556.19..1683556.63 rows=10000175 width=12) (actual time=14707.659..20011.024 rows=10000000
loops=1)
          Sort Key: b, a
          Sort Method: external merge  Disk: 219200kB
          ->  Seq Scan on t  (cost=0.00..154056.75 rows=10000175 width=12) (actual time=0.028..4751.244 rows=10000000
loops=1)
  Planning Time: 0.103 ms
  Execution Time: 25989.412 ms
(8 rows)


test=# explain analyze select * from (select a,b, count(c) from t group by a, b offset 0) foo order by b,a;
                                                                       QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=515759.00..518259.04 rows=1000018 width=16) (actual time=11281.094..11783.920 rows=1001923 loops=1)
    Sort Key: t.b, t.a
    Sort Method: external merge  Disk: 34880kB
    ->  GroupAggregate  (cost=0.43..389008.55 rows=1000018 width=16) (actual time=0.064..10507.256 rows=1001923
loops=1)
          Group Key: t.a, t.b
          ->  Index Only Scan using t_a_b_c_idx on t  (cost=0.43..304007.06 rows=10000175 width=12) (actual
time=0.052..5209.939rows=10000000 loops=1)
 
                Heap Fetches: 0
  Planning Time: 0.111 ms
  Execution Time: 12218.370 ms
(9 rows)


To sum this up:

   grouping (a,b): 10941 ms
   grouping (a,b) + ordering (a,b): 11043
   grouping (b,a) + ordering (b,a): 25989
   grouping (a,b) + ordering (b,a): 12218

So, it's fast as long as we can use the IOS, even if we have to do an
extra Sort at the end. This obviously relies on the grouping to reduce
the number of rows (in this case from 10M to 1M), which makes the extra
cost much cheaper.

I'm pretty sure I could construct similar examples e.g. with incremental
sort, and so on.

Does this explain why I think we can't make the assumption that it's
always better to match the pathkeys?

FWIW, I think the change of plan for the third query (where we specify
"GROUP BY a,b" but the planner decides to just change that) is rather
annoying, and it clearly makes things worse :-(


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date:
From: Alexander Korotkov
Date: 20 June 2020, 15:15:46
Subject: Re: git.postgresql.org ok?
From: Peter Geoghegan
Date: 20 June 2020, 19:15:17
Subject: Re: Operator class parameters and sgml docs
Re: POC: GROUP BY optimization - Mailing list pgsql-hackers

Previous

Next