
Re: parallel distinct union and aggregate support patch - Mailing list pgsql-hackers

From	Dilip Kumar
Subject	Re: parallel distinct union and aggregate support patch
Date	October 22, 2020 09:08:03
Msg-id	CAFiTN-s85CsefWxZnm=X7bh+unMdUng4XBOx7Zgpd1HFGd2fXA@mail.gmail.com
In response to	parallel distinct union and aggregate support patch ("bucoo@sohu.com" <bucoo@sohu.com>)
Responses	Re: parallel distinct union and aggregate support patch
List	pgsql-hackers


On Mon, Oct 19, 2020 at 8:19 PM bucoo@sohu.com <bucoo@sohu.com> wrote:
>
> Hi hackers,
> I wrote a patch to support parallel distinct, union and aggregate using batch sort.
> Steps:
>  1. Generate a hash value from the group clause values, and use the hash value's modulus to pick the batch the tuple is saved to.
>  2. At the end of the outer plan, wait for all other workers to finish writing to the batches.
>  3. Each worker gets a unique batch number and calls tuplesort_performsort() to finish sorting that batch.
>  4. Return the rows for this batch.
>  5. If not all batches are finished, go to step 3.
>
> The BatchSort plan makes sure that identical tuples (by group clause) are returned in the same range, so the Unique (or GroupAggregate) plan can work.
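
The quoted steps can be modelled with a small single-process sketch.
This is only a toy illustration, not code from the patch: the function
name batch_sort_distinct and the num_batches parameter are made up
here, Python lists stand in for the shared batches, and a plain loop
stands in for workers claiming batch numbers.

```python
from itertools import groupby


def batch_sort_distinct(worker_inputs, num_batches):
    """Toy model of the quoted steps; hypothetical, not the patch's code."""
    # Steps 1-2: each worker routes every row to the batch chosen by
    # hash(row) % num_batches. Equal rows always land in the same batch.
    batches = [[] for _ in range(num_batches)]
    for rows in worker_inputs:
        for row in rows:
            batches[hash(row) % num_batches].append(row)

    # Steps 3-5: batches are sorted one at a time. In the patch each
    # worker would claim a batch number; here a loop visits them in turn.
    result = []
    for batch in batches:
        batch.sort()
        # Stand-in for the Unique node: duplicates are adjacent after the
        # sort, so groupby() collapses them.
        result.extend(key for key, _ in groupby(batch))
    return result


if __name__ == "__main__":
    workers = [[3, 1, 2, 3], [2, 5, 1], [5, 5, 4]]
    print(batch_sort_distinct(workers, num_batches=3))
```

Because equal rows always hash to the same batch, removing duplicates
within each sorted batch is enough; no sort over the whole input is
needed.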

Interesting idea.  So IIUC, whenever a worker scans a tuple it puts it
directly into the respective batch (a shared tuple store), based on the
hash of the grouping columns, and once all the workers have finished
preparing the batches, each worker picks up those batches one by one,
performs the sort, and finishes the aggregation.  I think there is
scope for improvement: instead of putting each tuple directly into the
batch, what if the worker does partial aggregation first and then
places the partially aggregated rows in the shared tuple store based
on the hash value, after which the workers pick up the batches one by
one?  That way, we can avoid doing large sorts.  This approach could
also be used with hash aggregate, I mean the data partially aggregated
by the hash aggregate can be put into the respective batch.
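
A rough sketch of the partial-aggregation variant, under the same toy
assumptions (single process, Python lists in place of the shared tuple
store, a COUNT(*)-style aggregate, and the hypothetical names
partial_aggregate and batched_final_counts):

```python
from collections import Counter


def partial_aggregate(rows):
    """Per-worker partial aggregation: row count per group key."""
    return Counter(rows)


def batched_final_counts(worker_inputs, num_batches):
    """Hypothetical sketch: batch partial aggregates instead of raw rows."""
    batches = [[] for _ in range(num_batches)]
    for rows in worker_inputs:
        # Each worker writes at most one (key, partial_count) entry per
        # group, so a batch holds far fewer entries than raw tuples.
        for key, partial in partial_aggregate(rows).items():
            batches[hash(key) % num_batches].append((key, partial))

    final = {}
    for batch in batches:
        # The per-batch sort now covers only the partial rows.
        batch.sort()
        for key, partial in batch:
            # Combine step: partial counts are merged by summing them.
            final[key] = final.get(key, 0) + partial
    return final


if __name__ == "__main__":
    workers = [["a", "a", "b"], ["b", "c", "a"]]
    print(batched_final_counts(workers, num_batches=2))
```

With this arrangement each batch receives at most one entry per group
per worker, which is where the saving in sort size comes from.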

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
