Re: using custom scan nodes to prototype parallel sequential scan - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: using custom scan nodes to prototype parallel sequential scan
Date
Msg-id 9A28C8860F777E439AA12E8AEA7694F80108C3A0@BPXM15GP.gisp.nec.co.jp
In response to Re: using custom scan nodes to prototype parallel sequential scan  (Bruce Momjian <bruce@momjian.us>)
Responses Re: using custom scan nodes to prototype parallel sequential scan  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
> On Fri, Nov 14, 2014 at 02:51:32PM +1300, David Rowley wrote:
> > Likely for most aggregates, like count, sum, max, min, bit_and and
> > bit_or the merge function would be the same as the transition
> > function, as the state type is just the same as the input type. It
> > would only be aggregates like avg(), stddev*(), bool_and() and
> > bool_or() that would need a new merge function made... These would be
> > no more complex than the transition functions... Which are just a few
> lines of code anyway.
> >
> > We'd simply just not run parallel query if any aggregates used in the
> > query didn't have a merge function.
> >
> > When I mentioned this, I didn't mean to appear to be placing a road
> > block. I was just bringing to the table the information that COUNT(*) +
> > COUNT(*) works ok for merging COUNT(*)'s "sub totals", but AVG(n) + AVG(n)
> does not.
>
> Sorry, late reply, but, FYI, I don't think our percentile functions can
> be parallelized in the same way:
>
>     test=> \daS *percent*
>                                                  List of aggregate functions
>        Schema   |      Name       |  Result data type  |             Argument data types              |             Description
>     ------------+-----------------+--------------------+----------------------------------------------+--------------------------------------
>      pg_catalog | percent_rank    | double precision   | VARIADIC "any" ORDER BY VARIADIC "any"       | fractional rank of hypothetical row
>      pg_catalog | percentile_cont | double precision   | double precision ORDER BY double precision   | continuous distribution percentile
>      pg_catalog | percentile_cont | double precision[] | double precision[] ORDER BY double precision | multiple continuous percentiles
>      pg_catalog | percentile_cont | interval           | double precision ORDER BY interval           | continuous distribution percentile
>      pg_catalog | percentile_cont | interval[]         | double precision[] ORDER BY interval         | multiple continuous percentiles
>      pg_catalog | percentile_disc | anyelement         | double precision ORDER BY anyelement         | discrete percentile
>      pg_catalog | percentile_disc | anyarray           | double precision[] ORDER BY anyelement       | multiple discrete percentiles
>
Yep, those seem to me to be the type of aggregate function for which
there is no obvious way to split the work across multiple partitions.
Even so, I think it is valuable if we can push down at least the subset
of aggregate functions that are well known to the core planner.
For example, we know count(*) = sum(nrows), and we also know avg(X) can
be rewritten as an enhanced avg() that takes both nrows and the partial
sum of X. We can use this knowledge to break down aggregate functions.
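
The break-down above can be sketched outside of PostgreSQL. Below is a
minimal Python illustration (the AvgState class and its method names are
hypothetical, not PostgreSQL internals): each worker accumulates a
(nrows, partial sum) state, and the coordinator merges states rather
than averaging the averages.

```python
from dataclasses import dataclass

@dataclass
class AvgState:
    """Partial-aggregate state for avg(): row count plus partial sum.

    Hypothetical sketch of the idea described above; not PostgreSQL code.
    """
    nrows: int = 0
    psum: float = 0.0

    def transition(self, x: float) -> None:
        # Per-worker transition: fold one input row into the state.
        self.nrows += 1
        self.psum += x

    def merge(self, other: "AvgState") -> None:
        # Coordinator-side merge: avg(X)+avg(X) is wrong, but summing
        # the row counts and partial sums is correct. Note that the
        # merged nrows is exactly count(*) = sum(nrows).
        self.nrows += other.nrows
        self.psum += other.psum

    def final(self) -> float:
        # Final function: compute the true average from the merged state.
        return self.psum / self.nrows

# Two "workers" scan disjoint halves of the data.
data = [1.0, 2.0, 3.0, 4.0]
w1, w2 = AvgState(), AvgState()
for x in data[:2]:
    w1.transition(x)
for x in data[2:]:
    w2.transition(x)

w1.merge(w2)
print(w1.final())  # 2.5, the same result as a single-process avg()
```

The same transition/merge/final split covers count, sum, min, and max
trivially (merge equals transition there, as David noted); avg is the
simplest case that genuinely needs a distinct merge function.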

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>



