Re: Parallel Aggregate - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Parallel Aggregate |
Date | |
Msg-id | CAKJS1f8EcEOfYTdbVG2_kxujd+ZnY1bFzDTaRh-BCQuJ21iBBw@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel Aggregate (James Sewell <james.sewell@lisasoft.com>) |
Responses |
Re: Parallel Aggregate
|
List | pgsql-hackers |
On 14 March 2016 at 16:39, James Sewell <james.sewell@lisasoft.com> wrote: > > I've been testing how this works with partitioning (which seems to be strange, but I'll post separately about that) andsomething odd seems to be going on now with the parallel triggering: > > postgres=# create table a as select * from base_p2015_11; > SELECT 20000000 > > postgres=# explain select sum(count) from a group by date_trunc('DAY',ts); > QUERY PLAN > ---------------------------------------------------------------------------------------------- > Finalize GroupAggregate (cost=218242.96..218254.46 rows=200 width=16) > Group Key: (date_trunc('DAY'::text, ts)) > -> Sort (cost=218242.96..218245.96 rows=1200 width=16) > Sort Key: (date_trunc('DAY'::text, ts)) > -> Gather (cost=218059.08..218181.58 rows=1200 width=16) > Number of Workers: 5 > -> Partial HashAggregate (cost=217059.08..217061.58 rows=200 width=16) > Group Key: date_trunc('DAY'::text, ts) > -> Parallel Seq Scan on a (cost=0.00..197059.06 rows=4000005 width=12) > (9 rows) > > postgres=# analyze a; > > postgres=# explain select sum(count) from a group by date_trunc('DAY',ts); > QUERY PLAN > -------------------------------------------------------------------------- > GroupAggregate (cost=3164211.55..3564212.03 rows=20000024 width=16) > Group Key: (date_trunc('DAY'::text, ts)) > -> Sort (cost=3164211.55..3214211.61 rows=20000024 width=12) > Sort Key: (date_trunc('DAY'::text, ts)) > -> Seq Scan on a (cost=0.00..397059.30 rows=20000024 width=12) > (5 rows) > > Unsure what's happening here. This just comes down to the fact that PostgreSQL is quite poor at estimating the number of groups that will be produced by the expression date_trunc('DAY',ts). Due to lack of stats when you run the query before ANALYZE, PostgreSQL just uses a hardcoded guess of 200, which it thinks will fit quite nicely in the HashAggregate node's hash table. After you run ANALYZE this estimate goes up to 20000024, and the grouping planner thinks that's a little to much be storing in a hash table, based on the size of your your work_mem setting, so it uses a Sort plan instead. Things to try: 1. alter table a add column ts_date date; update a set ts_date = date_trunc('DAY',ts); vacuum full analyze ts; 2. or, create index on a (date_trunc('DAY',ts)); analyze a; 3. or for testing, set the work_mem higher. So, basically, it's no fault of this patch. It's just there's no real good way for the planner to go estimating something like date_trunc('DAY',ts) without either adding a column which explicitly stores that value (1), or collecting stats on the expression (2), or teaching the planner about the internals of that function, which is likely just never going to happen. (3) is just going to make the outlook of a hash plan look a little brighter, although you'll likely need a work_mem of over 1GB to make the plan change. -- David Rowley http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: