Re: Parallel Aggregate - Mailing list pgsql-hackers

From: David Rowley
Subject: Re: Parallel Aggregate
Date:
Msg-id: CAKJS1f_xMRhzSkSaH4BzSyEpayZNb8-DL3Ji-ZWnxtkdfHYEqQ@mail.gmail.com
In response to: Re: Parallel Aggregate (James Sewell <james.sewell@lisasoft.com>)
Responses: Re: Parallel Aggregate
List: pgsql-hackers
On 14 March 2016 at 14:16, James Sewell <james.sewell@lisasoft.com> wrote:
I've done some testing with one of my data sets in an 8VPU virtual environment and this is looking really, really good.

My test query is:

SELECT pageview, sum(pageview_count)
FROM fact_agg_2015_12
GROUP BY date_trunc('DAY'::text, pageview);

The query returns 15 rows. The fact_agg table is 5398MB and holds around 25 million records.

Explain with a max_parallel_degree of 8 tells me that the query will only use 6 background workers. I have no indexes on the table currently.

Finalize HashAggregate  (cost=810142.42..810882.62 rows=59216 width=16)
   Group Key: (date_trunc('DAY'::text, pageview))
   ->  Gather  (cost=765878.46..808069.86 rows=414512 width=16)
         Number of Workers: 6
         ->  Partial HashAggregate  (cost=764878.46..765618.66 rows=59216 width=16)
               Group Key: date_trunc('DAY'::text, pageview)
               ->  Parallel Seq Scan on fact_agg_2015_12  (cost=0.00..743769.76 rows=4221741 width=12)
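[The six-worker figure, despite max_parallel_degree = 8, is consistent with the size-based worker heuristic the planner applies to a sequential scan: one worker once the table passes a page threshold, plus one more for each further tripling in size. A minimal sketch of that logic, assuming the 9.6-era behaviour; the function name and threshold constant here are illustrative, not actual PostgreSQL identifiers:]

```python
# Sketch of the size-based parallel-worker heuristic (assumption: one
# worker at the base threshold, one more per tripling of table size,
# capped at the configured maximum). Names are illustrative only.

def planned_workers(rel_pages, max_workers, parallel_threshold=1000):
    """Return the number of background workers the planner would pick
    for a table of rel_pages 8 kB pages."""
    if rel_pages < parallel_threshold:
        return 0                      # too small to bother parallelising
    workers = 1
    threshold = parallel_threshold * 3
    while rel_pages >= threshold and workers < max_workers:
        workers += 1                  # one extra worker per tripling
        threshold *= 3
    return workers

# 5398 MB at 8 kB per page is ~690,944 pages:
pages = 5398 * 1024 // 8
print(planned_workers(pages, max_workers=8))  # -> 6
```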

Great! Thanks for testing this.

If you run EXPLAIN ANALYZE on this with the 6 workers, does the actual number of Gather rows come out at 105? I'd just like to get an idea of whether my cost estimates for the Gather are going to be accurate for real world data sets.
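[The 105 figure follows from the plan shape above: each of the 6 workers plus the leader runs its own Partial HashAggregate, so each participant can emit one partial row per group before the Finalize step combines them. A back-of-envelope check, assuming all 15 day-groups appear in every participant's slice of the table:]

```python
# Expected row count at the Gather node: one partial aggregate row
# per group, per parallel participant (workers plus the leader).
groups = 15                  # distinct days the query returns
workers = 6                  # background workers from the plan
participants = workers + 1   # the leader also scans and aggregates
print(groups * participants)  # -> 105
```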

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
