Home > mailing lists

Re: significant slowdown of HashAggregate between 9.6 and 10 - Mailing list pgsql-hackers

From	Pavel Stehule
Subject	Re: significant slowdown of HashAggregate between 9.6 and 10
Date	June 3, 2020 18:43:08
Msg-id	CAFj8pRA=xqDC2zq4=4dyK+AQ7ohHeZ674tJ+_4dqav8aVy=ahg@mail.gmail.com Whole thread Raw
In response to	significant slowdown of HashAggregate between 9.6 and 10 (Pavel Stehule <pavel.stehule@gmail.com>)
Responses	Re: significant slowdown of HashAggregate between 9.6 and 10 (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List	pgsql-hackers

Tree view

st 3. 6. 2020 v 17:32 odesílatel Pavel Stehule <pavel.stehule@gmail.com> napsal:

Hi

One czech Postgres user reported performance issue related to speed HashAggregate in nested loop.

The speed of 9.6

HashAggregate (cost=27586.10..27728.66 rows=14256 width=24)
(actual time=0.003..0.049 rows=39 loops=599203)

The speed of 10.7

HashAggregate (cost=27336.78..27552.78 rows=21600 width=24)
(actual time=0.011..0.156 rows=38 loops=597137)

So it looks so HashAgg is about 3x slower - with brutal nested loop it is a problem.

I wrote simple benchmark and really looks so our hash aggregate is slower and slower.

create table foo(a int, b int, c int, d int, e int, f int);
insert into foo select random()*1000, random()*4, random()*4, random()* 2, random()*100, random()*100 from generate_series(1,2000000);

analyze foo;

9.6.7
postgres=# explain (analyze, buffers) select i from generate_series(1,500000) g(i) where exists (select count(*) cx from foo group by b, c, d having count(*) = i);
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ Function Scan on generate_series g (cost=0.00..57739020.00 rows=500 width=4) (actual time=807.485..3364.515 rows=74 loops=1) │
│ Filter: (SubPlan 1) │
│ Rows Removed by Filter: 499926 │
│ Buffers: shared hit=12739, temp read=856 written=855 │
│ SubPlan 1 │
│ -> HashAggregate (cost=57739.00..57739.75 rows=75 width=20) (actual time=0.006..0.006 rows=0 loops=500000) │
│ Group Key: foo.b, foo.c, foo.d │
│ Filter: (count(*) = g.i) │
│ Rows Removed by Filter: 75 │
│ Buffers: shared hit=12739 │
│ -> Seq Scan on foo (cost=0.00..32739.00 rows=2000000 width=12) (actual time=0.015..139.736 rows=2000000 loops=1) │
│ Buffers: shared hit=12739 │
│ Planning time: 0.276 ms │
│ Execution time: 3365.758 ms │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(14 rows)

10.9

postgres=# explain (analyze, buffers) select i from generate_series(1,500000) g(i) where exists (select count(*) cx from foo group by b, c, d having count(*) = i);
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ QUERY PLAN │
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ Function Scan on generate_series g (cost=0.00..57739020.00 rows=500 width=4) (actual time=825.468..4919.063 rows=74 loops=1) │
│ Filter: (SubPlan 1) │
│ Rows Removed by Filter: 499926 │
│ Buffers: shared hit=12739, temp read=856 written=855 │
│ SubPlan 1 │
│ -> HashAggregate (cost=57739.00..57739.75 rows=75 width=20) (actual time=0.009..0.009 rows=0 loops=500000) │
│ Group Key: foo.b, foo.c, foo.d │
│ Filter: (count(*) = g.i) │
│ Rows Removed by Filter: 75 │
│ Buffers: shared hit=12739 │
│ -> Seq Scan on foo (cost=0.00..32739.00 rows=2000000 width=12) (actual time=0.025..157.887 rows=2000000 loops=1) │
│ Buffers: shared hit=12739 │
│ Planning time: 0.829 ms │
│ Execution time: 4920.800 ms │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(14 rows)

master

postgres=# explain (analyze, buffers) select i from generate_series(1,500000) g(i) where exists (select count(*) cx from foo group by b, c, d having count(*) = i);
┌─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
│ QUERY PLAN
╞═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
│ Function Scan on generate_series g (cost=0.00..28869973750.00 rows=250000 width=4) (actual time=901.639..6057.943 rows=74 loops=1)
│ Filter: (SubPlan 1)
│ Rows Removed by Filter: 499926
│ Buffers: shared hit=12739, temp read=855 written=855
│ SubPlan 1
│ -> HashAggregate (cost=57739.00..57739.94 rows=1 width=20) (actual time=0.012..0.012 rows=0 loops=500000)
│ Group Key: foo.b, foo.c, foo.d
│ Filter: (count(*) = g.i)
│ Peak Memory Usage: 37 kB
│ Rows Removed by Filter: 75
│ Buffers: shared hit=12739
│ -> Seq Scan on foo (cost=0.00..32739.00 rows=2000000 width=12) (actual time=0.017..262.497 rows=2000000 loops=1)
│ Buffers: shared hit=12739
│ Planning Time: 0.275 ms
│ Buffers: shared hit=1
│ Execution Time: 6059.266 ms
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
(16 rows)

Regards

I tried to run same query on half data size, and the performance is almost same. Probably the performance issue can be related to initialization or finalization of aggregation.

Pavel

Pavel

pgsql-hackers by date:

From: Euler Taveira
Date: 03 June 2020, 18:40:38
Subject: Re: REINDEX CONCURRENTLY and indisreplident

From: Robert Haas
Date: 03 June 2020, 19:13:14
Subject: Re: Parallel copy

Re: significant slowdown of HashAggregate between 9.6 and 10 - Mailing list pgsql-hackers

Previous

Next