Re: Support Parallel Query Execution in Executor - Mailing list pgsql-hackers

From Mike Rylander
Subject Re: Support Parallel Query Execution in Executor
Msg-id b918cf3d0604070559r5d139018h3693ec95ac42092c@mail.gmail.com
In response to Re: Support Parallel Query Execution in Executor  ("Qingqing Zhou" <zhouqq@cs.toronto.edu>)
Responses Re: Support Parallel Query Execution in Executor  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
On 4/6/06, Qingqing Zhou <zhouqq@cs.toronto.edu> wrote:
>
> "Jonah H. Harris" <jonah.harris@gmail.com> wrote
> >
> > Great work!  I had looked into this a little bit and came to the same
> > ideas/problems you did, but none of them seemed insurmountable at all.
> >  I'd be interested in working with you on this if you'd like.
> >

First, I want to second Jonah's enthusiasm.  This is very exciting!

>
> Yes, I am happy to work with anyone on the topic. The plan in mind is like
> this:
> (1) stabilize the master-slave seqscan: solve all the problems left;
> (2) parallelize the seqscan: AFAICS, this should not be very difficult based on
> 1, may only need some scan partition assignment;

This is really only a gut feeling for me (it can't be otherwise, since
we can't yet test), but I think parallelizing a single seqscan is
pretty much guaranteed to do nothing, because seqscans, especially on
large tables, are IO bound.

There was a plan some time ago (during 8.0 beta, I think) to allow
multiple seqscans from different queries to join each other, such that
scans that begin later start scanning the table at the point, or just
behind the point, that the first running scan is already at.  That
plan would reduce IO contention, and buffer and OS cache thrashing, by
having multiple readers pull from the same hose.
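
As a rough illustration of that idea (a sketch only, not PostgreSQL
code; the registry and function names below are hypothetical), a scan
that starts later can begin at the block another scan last reported
and wrap around, so it still visits every block exactly once:

```python
from typing import Dict, Iterator

# Hypothetical shared registry: table name -> block most recently read by any scan.
# It is never cleared here, so a later scan simply starts at the last reported block.
scan_positions: Dict[str, int] = {}


def synchronized_seqscan(table: str, nblocks: int) -> Iterator[int]:
    """Yield every block number of `table` exactly once, starting at the
    position last reported by another scan (block 0 if none has reported)."""
    if nblocks <= 0:
        return
    start = scan_positions.get(table, 0) % nblocks
    for i in range(nblocks):
        block = (start + i) % nblocks      # wrap around the end of the table
        scan_positions[table] = block      # report progress for later joiners
        yield block


if __name__ == "__main__":
    first = synchronized_seqscan("orders", 10)
    print([next(first) for _ in range(4)])            # first scan reads a few blocks
    print(list(synchronized_seqscan("orders", 10)))   # second scan joins at that point
```

The second scan starts at the block the first one last read, so both
readers want roughly the same region of the file at the same time.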

I can't see how asking for more than one stream from the same file
would do anything but increase both cache thrashing and IO bandwidth
contention.  Am I missing something here?
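
To make the concern concrete, here is a toy model (an assumption-laden
sketch: one disk head, no readahead, no OS cache, and made-up function
names) that counts how many requests are non-sequential when several
workers each scan a contiguous slice of the same file and their
requests reach the disk interleaved:

```python
from itertools import zip_longest
from typing import List


def count_seeks(requests: List[int]) -> int:
    """Count requests that do not target the block right after the previous one."""
    return sum(1 for prev, cur in zip(requests, requests[1:]) if cur != prev + 1)


def interleaved_streams(nblocks: int, nworkers: int) -> List[int]:
    """Split [0, nblocks) into contiguous ranges, one per worker, and
    interleave the workers' requests round-robin, as a single disk might see them."""
    chunk = -(-nblocks // nworkers)  # ceiling division
    ranges = [range(w * chunk, min((w + 1) * chunk, nblocks)) for w in range(nworkers)]
    order: List[int] = []
    for group in zip_longest(*ranges):
        order.extend(b for b in group if b is not None)
    return order


if __name__ == "__main__":
    for workers in (1, 2, 4):
        reqs = interleaved_streams(1000, workers)
        print(workers, "worker(s):", count_seeks(reqs), "non-sequential requests")
```

In this model a single stream is fully sequential, while any split
turns nearly every request into a jump. Real storage with readahead or
multiple spindles may behave quite differently, so this only
illustrates the worry; it does not measure anything.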

> (3) add an indexscan or one or two other node types to the master-slave
> solution: this is in order to make the framework extensible;
> (4) parallelize these nodes - this will be a big chunk of work;

Now that could be a _big_ win!  Especially if tablespaces are used to
balance commonly combined tables and indexes.

> (5) add a two-phase optimization to the server - we have to consider
> partitioned tables at this stage, yet another big chunk of work;
>

Same here.  This would be a place where parallel seqscans of different
tables (instead of a multi-headed scan of one table) could buy you a
lot, especially with proper tablespace use.

Thanks again, Qingqing, for the work on this.  I'm very excited about
where this could go. :)

> Regards,
> Qingqing
>


--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org

