Re: Parallel Seq Scan - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: Parallel Seq Scan
Date	January 15, 2015 00:26:00
Msg-id	CA+Tgmoaoj8kf6ft9O1E=T3+XCrRoKr4sWBVfoXdzFaDCH+=M+Q@mail.gmail.com
In response to	Re: Parallel Seq Scan (John Gorman <johngorman2@gmail.com>)
List	pgsql-hackers

On Tue, Jan 13, 2015 at 6:25 AM, John Gorman <johngorman2@gmail.com> wrote:
> One approach that has worked well for me is to break big jobs into much
> smaller bite-size tasks. Each task is small enough to complete quickly.
>
> We add the tasks to a task queue and spawn a generic worker pool which eats
> through the task queue items.
>
> This solves a lot of problems.
>
> - Small to medium jobs can be parallelized efficiently.
> - No need to split big jobs perfectly.
> - We don't get into a situation where we are waiting around for a worker to
> finish chugging through a huge task while the other workers sit idle.
> - Worker memory footprint is tiny so we can afford many of them.
> - Worker pool management is a well known problem.
> - Worker spawn time disappears as a cost factor.
> - The worker pool becomes a shared resource that can be managed and reported
> on and becomes considerably more predictable.
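
For illustration only, the task-queue idea quoted above might be sketched
roughly as follows. This is not PostgreSQL code: the function name
run_with_worker_pool, its parameters, and the use of Python threads are all
assumptions made for the example. It sums a list by cutting it into small
chunks, putting the chunks on a queue, and letting a fixed pool of generic
workers drain the queue.

```python
import queue
import threading


def run_with_worker_pool(items, chunk_size=4, num_workers=3):
    """Sum `items` using small tasks consumed by a fixed worker pool.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    # Break the big job into small tasks and enqueue all of them up front.
    tasks = queue.Queue()
    for start in range(0, len(items), chunk_size):
        tasks.put(items[start:start + chunk_size])

    partial_sums = []
    lock = threading.Lock()

    def worker():
        # Each worker keeps pulling tasks until the queue is empty, so a
        # worker that finishes early simply picks up more work instead of
        # sitting idle.
        while True:
            try:
                chunk = tasks.get_nowait()
            except queue.Empty:
                return
            subtotal = sum(chunk)
            with lock:
                partial_sums.append(subtotal)

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    # Combine the per-task results.
    return sum(partial_sums)
```

Because every task is queued before the workers start, an empty queue
reliably means the job is finished; a long-lived shared pool of the kind
described in the quoted message would need an explicit shutdown signal
instead.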

I think this is a good idea, but for now I would like to keep our
goals somewhat more modest: let's see if we can get parallel
sequential scan, and only parallel sequential scan, working and
committed.  Ultimately, I think we may need something like what you're
talking about, because if you have a query with three or six or twelve
different parallelizable operations in it, you want the available CPU
resources to switch between those as their respective needs may
dictate.  You certainly don't want to spawn a separate pool of workers
for each scan.

But I think getting that all working in the first version is probably
harder than what we should attempt.  We have a bunch of problems to
solve here just around parallel sequential scan and the parallel mode
infrastructure: heavyweight locking, prefetching, the cost model, and
so on.  Trying to add to that all of the problems that might attend on
a generic task queueing infrastructure fills me with no small amount
of fear.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
