Re: using custom scan nodes to prototype parallel sequential scan - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: using custom scan nodes to prototype parallel sequential scan
Date
Msg-id CAB7nPqSR0kAEEOAnOfa4Q4fwE1iFWsZpUYdQ3DodAu=WdheNng@mail.gmail.com
In response to Re: using custom scan nodes to prototype parallel sequential scan  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
List pgsql-hackers
On Wed, Dec 3, 2014 at 3:23 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>> On Fri, Nov 14, 2014 at 02:51:32PM +1300, David Rowley wrote:
>> > Likely for most aggregates, like count, sum, max, min, bit_and and
>> > bit_or the merge function would be the same as the transition
>> > function, as the state type is just the same as the input type. It
>> > would only be aggregates like avg(), stddev*(), bool_and() and
>> > bool_or() that would need a new merge function made... These would be
>> > no more complex than the transition functions... Which are just a few
>> lines of code anyway.
>> >
>> > We'd simply just not run parallel query if any aggregates used in the
>> > query didn't have a merge function.
>> >
>> > When I mentioned this, I didn't mean to appear to be placing a road
>> > block. I was just bringing to the table the information that COUNT(*) +
>> > COUNT(*) works ok for merging COUNT(*)'s "sub totals", but AVG(n) + AVG(n)
>> does not.
>>
>> Sorry, late reply, but, FYI, I don't think our percentile functions can
>> be parallelized in the same way:
>>
>>       test=> \daS *percent*
>>                                                    List of aggregate functions
>>          Schema   |      Name       |  Result data type  |             Argument data types              |             Description
>>       ------------+-----------------+--------------------+----------------------------------------------+--------------------------------------
>>        pg_catalog | percent_rank    | double precision   | VARIADIC "any" ORDER BY VARIADIC "any"       | fractional rank of hypothetical row
>>        pg_catalog | percentile_cont | double precision   | double precision ORDER BY double precision   | continuous distribution percentile
>>        pg_catalog | percentile_cont | double precision[] | double precision[] ORDER BY double precision | multiple continuous percentiles
>>        pg_catalog | percentile_cont | interval           | double precision ORDER BY interval           | continuous distribution percentile
>>        pg_catalog | percentile_cont | interval[]         | double precision[] ORDER BY interval         | multiple continuous percentiles
>>        pg_catalog | percentile_disc | anyelement         | double precision ORDER BY anyelement         | discrete percentile
>>        pg_catalog | percentile_disc | anyarray           | double precision[] ORDER BY anyelement       | multiple discrete percentiles
>>
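[Editor's note: a minimal sketch of the point quoted above, in Python rather than PostgreSQL code, with made-up sample partitions. It shows why COUNT(*) sub-totals merge by simple addition while averaging per-partition AVG(n) results gives the wrong answer unless the state carries (sum, count).]

```python
# Assumed data: two worker partitions of one table column.
part1 = [1, 2, 3]
part2 = [10, 20]

# COUNT: the merge function is just addition of the sub-totals.
count_total = len(part1) + len(part2)           # 5 -- correct

# AVG: averaging the per-partition averages is wrong...
avg1 = sum(part1) / len(part1)                  # 2.0
avg2 = sum(part2) / len(part2)                  # 15.0
naive = (avg1 + avg2) / 2                       # 8.5 -- wrong (true avg is 7.2)

# ...so the transition state must carry (sum, count), which a merge
# function can combine field-by-field before the final division.
state1 = (sum(part1), len(part1))
state2 = (sum(part2), len(part2))
merged = (state1[0] + state2[0], state1[1] + state2[1])
avg_total = merged[0] / merged[1]               # 36 / 5 = 7.2 -- correct
```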
> Yep, this seems to be the type of aggregate function that is not
> obvious to split across multiple partitions.
> Even so, I think it is valuable if we can push down the subset of
> aggregate functions that is well known to the core planner.
> For example, we know that count(*) = sum(nrows), and we also know that
> avg(X) can be rewritten as an enhanced avg() that takes both nrows and
> the partial sum of X. We can use this knowledge to break down
> aggregate functions.
Postgres-XC (Postgres-XL) implemented such parallel aggregate logic
some time ago, using a set of sub-functions plus a finalization
function to do the work.
My 2c.
-- 
Michael


