Re: using custom scan nodes to prototype parallel sequential scan - Mailing list pgsql-hackers
From | Kouhei Kaigai |
---|---|
Subject | Re: using custom scan nodes to prototype parallel sequential scan |
Date | |
Msg-id | 9A28C8860F777E439AA12E8AEA7694F801077AB8@BPXM15GP.gisp.nec.co.jp Whole thread Raw |
In response to | Re: using custom scan nodes to prototype parallel sequential scan (David Rowley <dgrowleyml@gmail.com>) |
List | pgsql-hackers |
> On Fri, Nov 14, 2014 at 2:12 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > > > > On 12 November 2014 07:54, David Rowley > <dgrowleyml@gmail.com> > > Take int4_avg_accum() for example it does: > > > > > > transdata->count++; > > transdata->sum += newval; > > > > The merge function would need to perform something like: > > > > > transdata->count += transdata2merge.count; sum += > transdata2merge.sum; > > > > Then the final function could be called on the merged aggregate > state. > > > More simplify, we can describe parallel aware aggregate function. > Please assume AVG(X) function that takes nrows and sum of X. Its > transition > function performs as like you described above, then final function > works as > usual. > > The job of parallel seq scan needs to do is: > 1. replace AVG(X) by AVG(nrows, sum(X) > 2. generate count(*) on the partial relation being grouped. > 3. generate sum(X) on the partial relation being grouped. > > It looks like the following query: > SELECT AVG(nrows, sum_X) FROM ( > SELECT count(*) nrows, sum(X) sum_X FROM tbl WHERE blkno > between 0 and 999 GROUP BY cat > UNION > SELECT count(*) nrows, sum(X) sum_X FROM tbl WHERE blkno > between 1000 and 1999 GROUP BY cat > UNION > SELECT count(*) nrows, sum(X) sum_X FROM tbl WHERE blkno > between 2000 and 2999 GROUP BY cat > UNION > SELECT count(*) nrows, sum(X) sum_X FROM tbl WHERE blkno > between 3000 and 3999 GROUP BY cat > ); > > > > > Well, this would require giving the planner some kind of knowledge of what > AVG() is. It currently knows nothing about that. It currently calls the > transition function for each row, and the final function at the end and > does not care or need to care about what either of those functions actually > does. The transformation above looks like it would need to care and the > logic to add that would be way more complex than aggregate merge functions. > It may make sense to have an extra catalog to indicate how usual aggregate function can be broken down (or unavailable). Like, AVG(X) = {COUNT(X) + SUM(X)} x N-partitions > Likely for most aggregates, like count, sum, max, min, bit_and and bit_or > the merge function would be the same as the transition function, as the > state type is just the same as the input type. It would only be aggregates > like avg(), stddev*(), bool_and() and bool_or() that would need a new merge > function made... These would be no more complex than the transition > functions... Which are just a few lines of code anyway. > > We'd simply just not run parallel query if any aggregates used in the query > didn't have a merge function. > > When I mentioned this, I didn't mean to appear to be placing a road block.I > was just bringing to the table the information that COUNT(*) + COUNT(*) > works ok for merging COUNT(*)'s "sub totals", but AVG(n) + AVG(n) does not. > > Merge functions should be a simple patch to write. If I thought there was > going to be a use for them in this release, I'd put my hand up to put a > patch together. > Things I'm uncertain is, how caller of aggregate function distinguish a context to call usual translation function, or new merge function. Let's back an example of: SELECT cat, COUNT(*), AVG(X) FROM t GROUP BY cat; Its plan tree is probably as follows: HashAggregate Group Key: cat -> Custom Scan (Parallel Scan) # of workers:4 The caller of translation/merge/final function is HashAggregate node. On the other hand, it has to know whether the underlying plan returns every records of underlying table or sub-total by parallel scan. Please correct me, if my assumption is wrong. Once HashAggregate can know that sub-plan returns sub-total of the relation, it can chose merge function instead of translation function. However, what I want to clarify is how to inform HashAggregate node its sub- plan intends to return sub-total, instead of individual rows. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
pgsql-hackers by date: