
Re: Parallel Seq Scan - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: Parallel Seq Scan
Date	December 6, 2014 08:13:25
Msg-id	CAApHDvrZG5Q9rNxU4WOga8AgvAwQ83bF83CFvMbOQcCg8vk=Zw@mail.gmail.com
In response to	Parallel Seq Scan (Amit Kapila <amit.kapila16@gmail.com>)
Responses	Re: Parallel Seq Scan (Amit Kapila <amit.kapila16@gmail.com>) Re: Parallel Seq Scan (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers


On 4 December 2014 at 19:35, Amit Kapila <amit.kapila16@gmail.com> wrote:

> Attached patch is just to facilitate the discussion about the
> parallel seq scan and may be some other dependent tasks like
> sharing of various states like combocid, snapshot with parallel
> workers. It is by no means ready to do any complex test, ofcourse
> I will work towards making it more robust both in terms of adding
> more stuff and doing performance optimizations.
>
> Thoughts/Suggestions?

This is good news!

I've not gotten to look at the patch yet, but I thought you may be able to make use of the attached at some point.

It's bare-bones core support for merging one aggregate state into another. I would imagine that if a query such as:

SELECT MAX(value) FROM bigtable;

was run, then a series of parallel workers could each find the max value from their own portion of the table. Some other node type would then collect the intermediate results once the workers have finished, merge all of the aggregate states into one, and return that. Naturally, you'd first need to check that every aggregate used in the targetlist has a merge function.
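To make the idea concrete, here is a minimal sketch in Python rather than the C of the actual patch. Every name in it (max_transition, worker_partial_max, merge_max_states, split_into_chunks) is made up for illustration and does not come from the patch, and the "workers" are simulated by slicing a list. It only shows that, for MAX, the function that folds rows into a state can also fold partial states together.

```python
from functools import reduce

def max_transition(state, value):
    """Fold one value into a running MAX state; None means 'no input yet'."""
    if state is None:
        return value
    if value is None:
        return state
    return max(state, value)

def worker_partial_max(chunk):
    """What one (simulated) worker does: aggregate only its share of the rows."""
    return reduce(max_transition, chunk, None)

def merge_max_states(states):
    """What the collecting node does: merge partial states into a final one.
    For MAX the merge step can reuse the transition function unchanged."""
    return reduce(max_transition, states, None)

def split_into_chunks(rows, n_workers):
    """Hypothetical stand-in for dividing the table among workers."""
    return [rows[i::n_workers] for i in range(n_workers)]

rows = [17, 3, 42, 8, 23, 15, 4, 16]
partials = [worker_partial_max(c) for c in split_into_chunks(rows, 3)]
result = merge_max_states(partials)
print(partials, result)
```

The merged result is the same as running MAX over the whole list in one pass, which is the property a merge function has to guarantee.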

This is just a few hours of work, and I've not really tested the pg_dump support or anything yet. I've also not added any new functions to allow AVG() or COUNT() to work; I've really just re-used existing functions where I could, since aggregates like MAX() and BOOL_OR() can simply make use of their existing transition function. I thought that this might be enough for early tests.
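As a rough illustration of why COUNT() and AVG() need new functions, here is a sketch in Python, not the patch's C. All function names and the MERGE_FUNCTIONS registry are hypothetical, and NULL handling is ignored. The transition function for COUNT adds one per row, so reusing it to merge two partial counts would give the wrong answer; merging has to add the counts. AVG is assumed here to carry a (sum, count) pair as its state.

```python
from functools import reduce

def count_transition(state, _value):
    """Per-row step for COUNT: one more row seen."""
    return state + 1

def count_merge(a, b):
    """Merging two partial counts means adding them, not incrementing."""
    return a + b

def avg_transition(state, value):
    """Per-row step for AVG, assuming the state is a (sum, count) pair."""
    total, n = state
    return (total + value, n + 1)

def avg_merge(a, b):
    """Merge two (sum, count) states component-wise."""
    return (a[0] + b[0], a[1] + b[1])

def avg_final(state):
    """Final step for AVG: turn the state into the result."""
    total, n = state
    return total / n if n else None

# Hypothetical registry: an aggregate can be split across workers
# only if it has an entry here.
MERGE_FUNCTIONS = {
    "max": max,                        # transition function doubles as merge
    "bool_or": lambda a, b: a or b,    # likewise
    "count": count_merge,              # needs a dedicated merge function
    "avg": avg_merge,                  # needs a dedicated merge function
}

def can_merge_all(aggregates):
    """The check described above: every aggregate must have a merge function."""
    return all(name in MERGE_FUNCTIONS for name in aggregates)

chunks = [[1, 2, 3], [4, 5], [6]]      # rows as seen by three simulated workers
count_states = [reduce(count_transition, c, 0) for c in chunks]
total_count = reduce(count_merge, count_states, 0)
avg_states = [reduce(avg_transition, c, (0, 0)) for c in chunks]
avg_result = avg_final(reduce(avg_merge, avg_states, (0, 0)))
print(total_count, avg_result)
```

Note that averaging the per-worker averages would be wrong when the chunks differ in size, which is why the sketch merges the (sum, count) states and only divides at the end.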

I'd imagine such a workload, ignoring IO overhead, should scale pretty much linearly with the number of worker processes. Of course, if there was a GROUP BY clause then the merger code would have to perform more work.
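The extra work in the GROUP BY case can be sketched as follows, again in Python with made-up names (worker_grouped_max, merge_grouped) and simulated workers. Each worker now returns one state per group it happened to see, and the merge step has to match states up by group key, since the same group can turn up in several workers' results.

```python
def worker_grouped_max(rows):
    """One simulated worker: a MAX state per group key found in its rows."""
    states = {}
    for key, value in rows:
        states[key] = value if key not in states else max(states[key], value)
    return states

def merge_grouped(partials):
    """Merge per-worker results: states that share a group key are combined,
    and groups seen by only one worker are carried over as they are."""
    merged = {}
    for partial in partials:
        for key, state in partial.items():
            merged[key] = state if key not in merged else max(merged[key], state)
    return merged

# (group key, value) rows as seen by two simulated workers
worker_a = [("x", 1), ("y", 9), ("x", 5)]
worker_b = [("y", 2), ("z", 7), ("x", 3)]

partials = [worker_grouped_max(worker_a), worker_grouped_max(worker_b)]
merged = merge_grouped(partials)
print(merged)
```

The merge cost here grows with the number of distinct groups times the number of workers, rather than being a single merge per worker as in the ungrouped case.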

If you think you might be able to make use of this, then I'm willing to go off and write all the other merge functions required for the other aggregates.

Regards

David Rowley

Attachment

merge_aggregate_state_v1.patch
