Re: using custom scan nodes to prototype parallel sequential scan - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: using custom scan nodes to prototype parallel sequential scan |
Date | |
Msg-id | CAB7nPqSR0kAEEOAnOfa4Q4fwE1iFWsZpUYdQ3DodAu=WdheNng@mail.gmail.com |
In response to | Re: using custom scan nodes to prototype parallel sequential scan (Kouhei Kaigai <kaigai@ak.jp.nec.com>) |
List | pgsql-hackers |
On Wed, Dec 3, 2014 at 3:23 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> On Fri, Nov 14, 2014 at 02:51:32PM +1300, David Rowley wrote:
>> > Likely for most aggregates, like count, sum, max, min, bit_and and
>> > bit_or the merge function would be the same as the transition
>> > function, as the state type is just the same as the input type. It
>> > would only be aggregates like avg(), stddev*(), bool_and() and
>> > bool_or() that would need a new merge function made... These would be
>> > no more complex than the transition functions... Which are just a few
>> > lines of code anyway.
>> >
>> > We'd simply just not run parallel query if any aggregates used in the
>> > query didn't have a merge function.
>> >
>> > When I mentioned this, I didn't mean to appear to be placing a road
>> > block. I was just bringing to the table the information that COUNT(*) +
>> > COUNT(*) works ok for merging COUNT(*)'s "sub totals", but AVG(n) + AVG(n)
>> > does not.
>>
>> Sorry, late reply, but, FYI, I don't think our percentile functions can
>> be parallelized in the same way:
>>
>> test=> \daS *percent*
>>                                                       List of aggregate functions
>>    Schema   |      Name       |  Result data type  |             Argument data types              |             Description
>> ------------+-----------------+--------------------+----------------------------------------------+-------------------------------------
>>  pg_catalog | percent_rank    | double precision   | VARIADIC "any" ORDER BY VARIADIC "any"       | fractional rank of hypothetical row
>>  pg_catalog | percentile_cont | double precision   | double precision ORDER BY double precision   | continuous distribution percentile
>>  pg_catalog | percentile_cont | double precision[] | double precision[] ORDER BY double precision | multiple continuous percentiles
>>  pg_catalog | percentile_cont | interval           | double precision ORDER BY interval           | continuous distribution percentile
>>  pg_catalog | percentile_cont | interval[]         | double precision[] ORDER BY interval         | multiple continuous percentiles
>>  pg_catalog | percentile_disc | anyelement         | double precision ORDER BY anyelement         | discrete percentile
>>  pg_catalog | percentile_disc | anyarray           | double precision[] ORDER BY anyelement       | multiple discrete percentiles
>>
> Yep, it seems to me the type of aggregate function that is not obvious
> to split into multiple partitions.
> I think, it is valuable even if we can push-down a part of aggregate
> functions which is well known by the core planner.
> For example, we know count(*) = sum(nrows), we also know avg(X) can
> be rewritten to enhanced avg() that takes both of nrows and partial
> sum of X. We can utilize these knowledge to break-down aggregate
> functions.

Postgres-XC (Postgres-XL) implemented such parallel aggregate logic some
time ago, using a set of sub-functions and a finalization function to do
the work.

My 2c.
-- 
Michael
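
To make the decomposition discussed in this thread concrete, here is a minimal Python sketch. It is not PostgreSQL or Postgres-XC code, and every name in it (`AvgState`, `avg_transition`, `avg_merge`, `avg_final`, `partial_avg`, `parallel_count`, `parallel_avg`, `naive_avg_of_avgs`) is hypothetical, chosen only for illustration. It assumes each worker scans one partition and returns a small partial state: count(*) becomes the sum of per-partition row counts, and avg(X) is carried as an (nrows, sum) pair that is merged and then finalized.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class AvgState:
    """Hypothetical partial state for avg(X): row count plus running sum."""
    nrows: int = 0
    total: float = 0.0


def avg_transition(state: AvgState, value: float) -> AvgState:
    """Fold one input row into a partial state (runs inside a worker)."""
    return AvgState(state.nrows + 1, state.total + value)


def avg_merge(a: AvgState, b: AvgState) -> AvgState:
    """Combine two partial states; counts and sums simply add."""
    return AvgState(a.nrows + b.nrows, a.total + b.total)


def avg_final(state: AvgState) -> Optional[float]:
    """Turn the merged state into the result; None stands in for SQL NULL."""
    return state.total / state.nrows if state.nrows else None


def partial_avg(rows: Sequence[float]) -> AvgState:
    """What a single worker would compute over its own partition."""
    state = AvgState()
    for value in rows:
        state = avg_transition(state, value)
    return state


def parallel_count(partitions: List[Sequence[float]]) -> int:
    """count(*) over all partitions equals the sum of per-partition counts."""
    return sum(len(rows) for rows in partitions)


def parallel_avg(partitions: List[Sequence[float]]) -> Optional[float]:
    """Merge the per-partition (nrows, sum) states, then finalize once."""
    merged = AvgState()
    for rows in partitions:
        merged = avg_merge(merged, partial_avg(rows))
    return avg_final(merged)


def naive_avg_of_avgs(partitions: List[Sequence[float]]) -> float:
    """Incorrect in general: averages the averages, ignoring partition sizes."""
    per_partition = [sum(rows) / len(rows) for rows in partitions]
    return sum(per_partition) / len(per_partition)


if __name__ == "__main__":
    partitions = [[1.0, 2.0, 3.0], [10.0]]
    print(parallel_count(partitions))     # 4
    print(parallel_avg(partitions))       # 4.0
    print(naive_avg_of_avgs(partitions))  # 6.0, which is not the true average
```

The sketch only covers aggregates whose partial state has a fixed, small size. The percentile aggregates listed in the quoted psql output have no equally obvious partial state of that kind, which is the concern raised in the quoted text.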