Re: Re: parallel distinct union and aggregate support patch - Mailing list pgsql-hackers

From: bucoo@sohu.com
Subject: Re: Re: parallel distinct union and aggregate support patch
Msg-id: 2020102915232457574027@sohu.com
In response to: parallel distinct union and aggregate support patch  ("bucoo@sohu.com" <bucoo@sohu.com>)
Responses: Re: Re: parallel distinct union and aggregate support patch
List: pgsql-hackers
> 1) It's better to always include the whole patch series - including the
> parts that have not changed. Otherwise people have to scavenge the
> thread and search for all the pieces, which may be a source of issues.
> Also, it confuses the patch tester [1] which tries to apply patches from
> a single message, so it will fail for this one.
 Patches 3 and 4 do not rely on 1 and 2 at the code level.
 However, applying patches 3 and 4 directly will fail, because they were
 written on top of 1 and 2.
 I can generate a single combined patch if you need one.

> 2) I suggest you try to describe the goal of these patches, using some
> example queries, explain output etc. Right now the reviewers have to
> reverse engineer the patches and deduce what the intention was, which
> may be causing unnecessary confusion etc. If this was my patch, I'd try
> to create a couple examples (CREATE TABLE + SELECT + EXPLAIN) showing
> how the patch changes the query plan, showing speedup etc.
 I have added some example queries to the regression tests (regress), covering "unique",
 "union", "group by", and "group by grouping sets".
 Here are my own tests; they are not in the regression tests:
```sql
begin;
create table gtest(id integer, txt text);
insert into gtest select t1.id,'txt'||t1.id from (select generate_series(1,1000*1000) id) t1,(select generate_series(1,10) id) t2;
analyze gtest;
commit;
set jit = off;
\timing on
```
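For context, the setup above generates 10 million rows (1,000,000 distinct ids, each repeated 10 times), which matches the rows=10000000 in the serial plan and the roughly rows=3333333 per worker (loops=3) in the parallel plans below. A quick sanity check (my own addition, not part of the tests above):
```sql
-- sanity check of the generated data (not part of the original tests)
select count(*)            as total_rows,    -- expect 10,000,000
       count(distinct id)  as distinct_ids,  -- expect 1,000,000
       count(distinct txt) as distinct_txt   -- expect 1,000,000
from gtest;
```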
normal aggregate times
```
set enable_batch_hashagg = off;
explain (costs off,analyze,verbose)
select sum(id),txt from gtest group by txt;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate (actual time=6469.279..8947.024 rows=1000000 loops=1)
   Output: sum(id), txt
   Group Key: gtest.txt
   ->  Gather Merge (actual time=6469.245..8165.930 rows=1000058 loops=1)
         Output: txt, (PARTIAL sum(id))
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort (actual time=6356.471..7133.832 rows=333353 loops=3)
               Output: txt, (PARTIAL sum(id))
               Sort Key: gtest.txt
               Sort Method: external merge  Disk: 11608kB
               Worker 0:  actual time=6447.665..7349.431 rows=317512 loops=1
                 Sort Method: external merge  Disk: 10576kB
               Worker 1:  actual time=6302.882..7061.157 rows=333301 loops=1
                 Sort Method: external merge  Disk: 11112kB
               ->  Partial HashAggregate (actual time=2591.487..4430.437 rows=333353 loops=3)
                     Output: txt, PARTIAL sum(id)
                     Group Key: gtest.txt
                     Batches: 17  Memory Usage: 4241kB  Disk Usage: 113152kB
                     Worker 0:  actual time=2584.345..4486.407 rows=317512 loops=1
                       Batches: 17  Memory Usage: 4241kB  Disk Usage: 101392kB
                     Worker 1:  actual time=2584.369..4393.244 rows=333301 loops=1
                       Batches: 17  Memory Usage: 4241kB  Disk Usage: 112832kB
                     ->  Parallel Seq Scan on public.gtest (actual time=0.691..603.990 rows=3333333 loops=3)
                           Output: id, txt
                           Worker 0:  actual time=0.104..607.146 rows=3174970 loops=1
                           Worker 1:  actual time=0.100..603.951 rows=3332785 loops=1
 Planning Time: 0.226 ms
 Execution Time: 9021.058 ms
(29 rows)

Time: 9022.251 ms (00:09.022)

set enable_batch_hashagg = on;
explain (costs off,analyze,verbose)
select sum(id),txt from gtest group by txt;
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Gather (actual time=3116.666..5740.826 rows=1000000 loops=1)
   Output: (sum(id)), txt
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel BatchHashAggregate (actual time=3103.181..5464.948 rows=333333 loops=3)
         Output: sum(id), txt
         Group Key: gtest.txt
         Worker 0:  actual time=3094.550..5486.992 rows=326082 loops=1
         Worker 1:  actual time=3099.562..5480.111 rows=324729 loops=1
         ->  Parallel Seq Scan on public.gtest (actual time=0.791..656.601 rows=3333333 loops=3)
               Output: id, txt
               Worker 0:  actual time=0.080..646.053 rows=3057680 loops=1
               Worker 1:  actual time=0.070..662.754 rows=3034370 loops=1
 Planning Time: 0.243 ms
 Execution Time: 5788.981 ms
(15 rows)

Time: 5790.143 ms (00:05.790)
```

grouping sets times
```
set enable_batch_hashagg = off;
explain (costs off,analyze,verbose)
select sum(id),txt from gtest group by grouping sets(id,txt,());
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 GroupAggregate (actual time=9454.707..38921.885 rows=2000001 loops=1)
   Output: sum(id), txt, id
   Group Key: gtest.id
   Group Key: ()
   Sort Key: gtest.txt
     Group Key: gtest.txt
   ->  Sort (actual time=9454.679..11804.071 rows=10000000 loops=1)
         Output: txt, id
         Sort Key: gtest.id
         Sort Method: external merge  Disk: 254056kB
         ->  Seq Scan on public.gtest (actual time=2.250..2419.031 rows=10000000 loops=1)
               Output: txt, id
 Planning Time: 0.230 ms
 Execution Time: 39203.883 ms
(14 rows)

Time: 39205.339 ms (00:39.205)

set enable_batch_hashagg = on;
explain (costs off,analyze,verbose)
select sum(id),txt from gtest group by grouping sets(id,txt,());
                                           QUERY PLAN
-------------------------------------------------------------------------------------------------
 Gather (actual time=5931.776..14353.957 rows=2000001 loops=1)
   Output: (sum(id)), txt, id
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel BatchHashAggregate (actual time=5920.963..13897.852 rows=666667 loops=3)
         Output: sum(id), txt, id
         Group Key: gtest.id
         Group Key: ()
         Group Key: gtest.txt
         Worker 0:  actual time=5916.370..14062.461 rows=513810 loops=1
         Worker 1:  actual time=5916.037..13932.847 rows=775901 loops=1
         ->  Parallel Seq Scan on public.gtest (actual time=0.399..688.273 rows=3333333 loops=3)
               Output: id, txt
               Worker 0:  actual time=0.052..690.955 rows=3349990 loops=1
               Worker 1:  actual time=0.050..691.595 rows=3297070 loops=1
 Planning Time: 0.157 ms
 Execution Time: 14598.416 ms
(17 rows)

Time: 14599.437 ms (00:14.599)
```
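For reference, the "unique" and "union" cases mentioned above can be exercised with queries along these lines. This is only a sketch against the same gtest table, not taken from the regression tests, and whether the planner actually chooses the new parallel batch hash aggregate path depends on the applied patches and the parallel settings:
```sql
set enable_batch_hashagg = on;

-- distinct ("unique") case: deduplicate the 10x-repeated txt values
explain (costs off, analyze, verbose)
select distinct txt from gtest;

-- union case: duplicate elimination across the two branches
explain (costs off, analyze, verbose)
select id, txt from gtest
union
select id, txt from gtest;
```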
