Re: Parallel Aggregate - Mailing list pgsql-hackers

From	Haribabu Kommi
Subject	Re: Parallel Aggregate
Date	December 14, 2015 06:04:02
Msg-id	CAJrrPGfsF8ony1K1OFED+ZVS+_OnygrCR0z9vYBtvz_f6XMtfQ@mail.gmail.com
In response to	Re: Parallel Aggregate (Haribabu Kommi <kommi.haribabu@gmail.com>)
List	pgsql-hackers

On Fri, Dec 11, 2015 at 5:42 PM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> 3. Performance test to observe the effect of parallel aggregate.

I have attached the performance test report for parallel aggregate.
A summary of the results:
1. In the low-selectivity case, parallel aggregate gives no
improvement over parallel scan, or adds only a very small
overhead.

2. Parallel aggregate performs more than 60% better than
parallel scan. The data transfer overhead is much lower because
the hash aggregate operation reduces the number of tuples that
must be transferred from the workers to the backend.

The parallel aggregate plan depends on the parallel seq scan below it.
If the parallel seq scan plan is not generated because the tuple
transfer overhead cost is too high at higher selectivity, then
parallel aggregate is not possible either. But with parallel aggregate,
the number of records that must be transferred from worker to
backend may be lower than with parallel seq scan, so the overall
cost of parallel aggregate may still be better.
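
To make the cost argument concrete, here is a rough sketch in C. The
cost formulas, the function names, and the TRANSFER_COST_PER_TUPLE
constant are assumptions made up for illustration; they are not the
planner's actual cost model.

```c
#include <assert.h>

/* Hypothetical per-tuple cost of shipping a row from a worker to the
 * backend. Illustrative value only. */
#define TRANSFER_COST_PER_TUPLE 0.1

/* Parallel seq scan: the scan work is split across workers, but every
 * qualifying row must be transferred before the backend aggregates it. */
double parallel_scan_cost(double scan_cost, int workers, double matching_rows)
{
    assert(workers > 0);
    return scan_cost / workers + matching_rows * TRANSFER_COST_PER_TUPLE;
}

/* Parallel aggregate: each worker sends at most one partial result per
 * group, so the transfer term is bounded by workers * groups. */
double parallel_agg_cost(double scan_cost, int workers, double groups)
{
    assert(workers > 0);
    return scan_cost / workers + workers * groups * TRANSFER_COST_PER_TUPLE;
}
```

With many matching rows and few groups, the transfer term of the
aggregate variant stays small even when the transfer term of the plain
scan dominates its total cost.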

To handle this problem, how about the following way?

Add one more member to RelOptInfo, called cheapest_parallel_path,
to store the parallel path that is possible for that rel. Wherever a
parallel plan is possible, this member is set to that parallel plan.
If a parallel plan is not possible at a parent node, it is set to
NULL there; otherwise the parallel plan for that node is calculated
again, based on the parallel plan of the node below it.

Once all the paths are finalized, the grouping planner prepares a
plan for normal aggregate and a plan for parallel aggregate, compares
the two costs, and chooses the cheaper plan.
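
A minimal sketch of the idea in C follows. The Path and RelOptInfo
structs here are simplified stand-ins, not the real planner
definitions, and set_parent_parallel_path() and
choose_aggregate_path() are hypothetical helper names used only for
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the planner structures (assumption: the
 * real structs carry many more fields). */
typedef struct Path
{
    double total_cost;
} Path;

typedef struct RelOptInfo
{
    Path *cheapest_total_path;     /* best non-parallel path */
    Path *cheapest_parallel_path;  /* proposed member; NULL if none */
} RelOptInfo;

/* Hypothetical helper: a parent can keep a parallel path only when the
 * node below it has one; otherwise the member is set to NULL. */
void set_parent_parallel_path(RelOptInfo *parent, const RelOptInfo *child,
                              Path *parent_parallel_candidate)
{
    assert(parent != NULL && child != NULL);
    if (child->cheapest_parallel_path == NULL)
        parent->cheapest_parallel_path = NULL;
    else
        parent->cheapest_parallel_path = parent_parallel_candidate;
}

/* Hypothetical helper for the grouping step: cost both aggregate
 * variants and return the cheaper one. */
Path *choose_aggregate_path(Path *normal_agg, Path *parallel_agg)
{
    assert(normal_agg != NULL);
    if (parallel_agg != NULL &&
        parallel_agg->total_cost < normal_agg->total_cost)
        return parallel_agg;
    return normal_agg;
}
```

Passing NULL as parallel_agg models the case where no parallel path
survived below, so the normal aggregate is chosen without any cost
comparison.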

I haven't yet evaluated the feasibility of the above solution. Suggestions?

Regards,
Hari Babu
Fujitsu Australia

Attachment

performance_test_result.xlsx