Re: using custom scan nodes to prototype parallel sequential scan - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: using custom scan nodes to prototype parallel sequential scan
Date
Msg-id 9A28C8860F777E439AA12E8AEA7694F801077AB8@BPXM15GP.gisp.nec.co.jp
In response to Re: using custom scan nodes to prototype parallel sequential scan  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
> On Fri, Nov 14, 2014 at 2:12 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> 
> 
>     > On 12 November 2014 07:54, David Rowley <dgrowleyml@gmail.com>
>     > Take int4_avg_accum() for example, it does:
>     >
>     >
>     > transdata->count++;
>     > transdata->sum += newval;
>     >
>     > The merge function would need to perform something like:
>     >
>     > transdata->count += transdata2merge.count;
>     > transdata->sum += transdata2merge.sum;
>     >
>     > Then the final function could be called on the merged aggregate state.
>     >
>     More simply, we can describe a parallel-aware aggregate function.
>     Please assume an AVG(X) function that takes nrows and the sum of X.
>     Its transition function performs as you described above, then the
>     final function works as usual.
> 
>     The job the parallel seq scan needs to do is:
>       1. replace AVG(X) with AVG(nrows, sum(X))
>       2. generate count(*) on the partial relation being grouped.
>       3. generate sum(X) on the partial relation being grouped.
> 
>     It looks like the following query:
>       SELECT AVG(nrows, sum_X) FROM (
>         SELECT count(*) nrows, sum(X) sum_X FROM tbl
>           WHERE blkno between 0 and 999 GROUP BY cat
>         UNION ALL
>         SELECT count(*) nrows, sum(X) sum_X FROM tbl
>           WHERE blkno between 1000 and 1999 GROUP BY cat
>         UNION ALL
>         SELECT count(*) nrows, sum(X) sum_X FROM tbl
>           WHERE blkno between 2000 and 2999 GROUP BY cat
>         UNION ALL
>         SELECT count(*) nrows, sum(X) sum_X FROM tbl
>           WHERE blkno between 3000 and 3999 GROUP BY cat
>       );
> 
> 
> 
> 
> Well, this would require giving the planner some kind of knowledge of what
> AVG() is. It currently knows nothing about that. It currently calls the
> transition function for each row, and the final function at the end and
> does not care or need to care about what either of those functions actually
> does. The transformation above looks like it would need to care and the
> logic to add that would be way more complex than aggregate merge functions.
> 
It may make sense to have an extra catalog that indicates how a usual
aggregate function can be broken down (or that it cannot be).
For example, AVG(X) = {COUNT(X) + SUM(X)} x N-partitions.
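
To make the shape of such a break-down concrete, below is a minimal C sketch
of a transition/merge/final trio for an AVG over int4. All names here are
made up for illustration and are not the actual backend entry points (the
real int4_avg_accum() keeps its state in a different representation); the
point is only that the merge step has the same shape as the transition step
quoted above.

  #include <stdint.h>

  /* Hypothetical transition state: row count plus running sum. */
  typedef struct AvgTransData
  {
      int64_t count;          /* number of rows accumulated so far */
      int64_t sum;            /* running sum of the input values */
  } AvgTransData;

  /* Transition step: fold one new row value into a partial state. */
  static void
  avg_accum(AvgTransData *transdata, int32_t newval)
  {
      transdata->count++;
      transdata->sum += newval;
  }

  /* Merge step: fold another worker's partial state into this one. */
  static void
  avg_merge(AvgTransData *transdata, const AvgTransData *transdata2merge)
  {
      transdata->count += transdata2merge->count;
      transdata->sum += transdata2merge->sum;
  }

  /* Final step: compute the average from the (merged) state. */
  static double
  avg_final(const AvgTransData *transdata)
  {
      if (transdata->count == 0)
          return 0.0;         /* the real AVG() yields NULL here; 0.0 keeps the sketch short */
      return (double) transdata->sum / (double) transdata->count;
  }

Whether each worker ships back such a struct, or the leader simply sums the
count(*) / sum(X) columns as in the UNION ALL query above, the arithmetic is
the same.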

> Likely for most aggregates, like count, sum, max, min, bit_and and bit_or
> the merge function would be the same as the transition function, as the
> state type is just the same as the input type. It would only be aggregates
> like avg(), stddev*(), bool_and() and bool_or() that would need a new merge
> function made... These would be no more complex than the transition
> functions... Which are just a few lines of code anyway.
> 
> We'd simply just not run parallel query if any aggregates used in the query
> didn't have a merge function.
> 
> When I mentioned this, I didn't mean to appear to be placing a road block. I
> was just bringing to the table the information that COUNT(*) + COUNT(*)
> works ok for merging COUNT(*)'s "sub totals", but AVG(n) + AVG(n) does not.
> 
>  Merge functions should be a simple patch to write. If I thought there was
> going to be a use for them in this release, I'd put my hand up to put a
> patch together.
> 
The thing I'm uncertain about is how the caller of the aggregate function
distinguishes a context where it should call the usual transition function
from one where it should call the new merge function.

Let's go back to the example of: SELECT cat, COUNT(*), AVG(X) FROM t GROUP BY cat;

Its plan tree is probably as follows:

  HashAggregate
    Group Key: cat
    ->  Custom Scan (Parallel Scan)
          # of workers: 4

The caller of the transition/merge/final functions is the HashAggregate node.
On the other hand, it has to know whether the underlying plan returns every
record of the underlying table, or sub-totals produced by the parallel scan.
Please correct me if my assumption is wrong.

Once HashAggregate knows that the sub-plan returns sub-totals of the relation,
it can choose the merge function instead of the transition function.
However, what I want to clarify is how to inform the HashAggregate node that
its sub-plan intends to return sub-totals instead of individual rows.
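
Purely as an illustration of the question, one conceivable shape for that
signal is a flag the planner attaches when it places a sub-total-returning
scan under the aggregate node; the aggregate node then picks the transition
or the merge function per input. None of the identifiers below exist in
PostgreSQL, this is only a sketch of the idea:

  /* Hypothetical marker set by the planner on the aggregate's sub-plan. */
  typedef enum SubPlanOutputKind
  {
      SUBPLAN_RETURNS_ROWS,        /* sub-plan emits individual tuples */
      SUBPLAN_RETURNS_SUBTOTALS    /* sub-plan emits partial aggregate states */
  } SubPlanOutputKind;

  /* Hypothetical per-aggregate pair of support functions. */
  typedef struct AggFuncPair
  {
      void (*transfn)(void *state, const void *row_value);      /* usual transition function */
      void (*mergefn)(void *state, const void *partial_state);  /* new merge function */
  } AggFuncPair;

  /* Called once per value arriving from the sub-plan. */
  static void
  advance_aggregate(SubPlanOutputKind input_kind, const AggFuncPair *fns,
                    void *state, const void *input)
  {
      if (input_kind == SUBPLAN_RETURNS_SUBTOTALS)
          fns->mergefn(state, input);   /* input is a sub-total (partial state) */
      else
          fns->transfn(state, input);   /* input is an ordinary row value */
  }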

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
