
Re: Support Parallel Query Execution in Executor - Mailing list pgsql-hackers

From	Mike Rylander
Subject	Re: Support Parallel Query Execution in Executor
Date	April 7, 2006 14:49:18
Msg-id	b918cf3d0604070559r5d139018h3693ec95ac42092c@mail.gmail.com
In response to	Re: Support Parallel Query Execution in Executor ("Qingqing Zhou" <zhouqq@cs.toronto.edu>)
Responses	Re: Support Parallel Query Execution in Executor (Josh Berkus <josh@agliodbs.com>)
List	pgsql-hackers


On 4/6/06, Qingqing Zhou <zhouqq@cs.toronto.edu> wrote:
>
> "Jonah H. Harris" <jonah.harris@gmail.com> wrote
> >
> > Great work!  I had looked into this a little bit and came to the same
> > ideas/problems you did, but none of them seemed insurmountable at all.
> >  I'd be interested in working with you on this if you'd like.
> >

First, I want to second Jonah's enthusiasm.  This is very exciting!

>
> Yes, I am happy to work with anyone on the topic. The plan I have in mind
> is this:
> (1) stabilize the master-slave seqscan: solve all the remaining problems;
> (2) parallelize the seqscan: AFAICS, this should not be very difficult based
> on (1), and may only need some scan portion assignment;

This is really only a gut feeling for me (it can't be otherwise, since
we can't yet test), but I think parallelizing a single seqscan is
pretty much guaranteed to do nothing, because seqscans, especially on
large tables, are IO bound.
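
For concreteness, the "scan portion assignment" in step (2) could be as simple as handing each worker a contiguous range of heap blocks. This is a minimal sketch, assuming a table is modeled as nothing more than a block count; `assign_block_ranges` is a hypothetical helper, not anything in the executor:

```python
def assign_block_ranges(nblocks: int, nworkers: int) -> list[range]:
    """Hypothetical helper: split blocks 0..nblocks-1 into nworkers
    contiguous, near-equal ranges, one per worker."""
    base, extra = divmod(nblocks, nworkers)
    ranges = []
    start = 0
    for w in range(nworkers):
        # The first `extra` workers each take one additional block.
        size = base + (1 if w < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


print(assign_block_ranges(10, 3))  # [range(0, 4), range(4, 7), range(7, 10)]
```

Every range still comes off the same file, so if the scan is IO bound this only reorders the same reads across workers rather than reducing them.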

There was a plan some time ago (during 8.0 beta, I think) to allow
multiple seqscans from different queries to join each other, such that
scans that begin later start scanning the table at the point, or just
behind the point, that the first running scan is already at.  That
plan would reduce IO contention, and buffer and OS cache thrashing, by
having multiple readers pull from the same hose.
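
The joining idea can be sketched as a shared record of where each table's running scan currently is. A new scan starts at the block an existing scan last reported, then wraps around to block 0 to pick up the blocks it missed. This is a minimal sketch, assuming a hypothetical in-memory `ScanRegistry` that stands in for whatever shared-memory structure a real implementation would need:

```python
from dataclasses import dataclass, field


@dataclass
class ScanRegistry:
    """Hypothetical shared map: table name -> block most recently read by any scan."""
    positions: dict[str, int] = field(default_factory=dict)

    def start_block(self, table: str) -> int:
        # Join an in-progress scan at its current block; otherwise start at 0.
        return self.positions.get(table, 0)

    def report(self, table: str, block: int) -> None:
        # Publish progress so scans that start later can join behind this one.
        self.positions[table] = block


def synchronized_scan(registry: ScanRegistry, table: str, nblocks: int):
    """Yield every block of `table` exactly once, starting where another scan is."""
    start = registry.start_block(table)
    for i in range(nblocks):
        block = (start + i) % nblocks  # wrap around to cover the skipped blocks
        registry.report(table, block)
        yield block


reg = ScanRegistry()
first = synchronized_scan(reg, "big_table", 8)
for _ in range(5):
    next(first)  # the first scan has now read blocks 0..4
second = list(synchronized_scan(reg, "big_table", 8))
print(second)  # [4, 5, 6, 7, 0, 1, 2, 3]
```

The second scan still reads all eight blocks, but it starts right where the first scan already is. That way, the two can share the buffer and OS cache instead of competing for it.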

I can't see how asking for more than one stream from the same file
would do anything but increase both cache thrashing and IO bandwidth
contention.  Am I missing something here?

> (3) add an indexscan or one or two other node types to the master-slave
> solution: this is to make the framework extensible;
> (4) parallelize these nodes; this will be a big chunk of work;

Now that could be a _big_ win!  Especially if tablespaces are used to
balance commonly combined tables and indexes.

> (5) add a two-phase optimization to the server; we have to consider
> partitioned tables at this stage, yet another big chunk of work;
>

Same here.  This would be a place where parallel seqscans of different
tables (instead of multi-headed scan of one table) could buy you a
lot, especially with proper tablespace use.

Thanks again, Qingqing, for the work on this.  I'm very excited about
where this could go. :)

> Regards,
> Qingqing

--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org