Thread: Scaling up PostgreSQL in Multiple CPU / Dual Core Powered Servers

Scaling up PostgreSQL in Multiple CPU / Dual Core Powered Servers

From
"Jojo Paderes"
Date:
I'd like to know if the latest PostgreSQL release can scale up by
utilizing multiple CPUs or dual-core CPUs to speed up SQL execution.

I have already done some research in the PostgreSQL mailing list
archives and only found old threads dating back to 2000. A lot has
improved in PostgreSQL since then, and hopefully support for multiple
CPUs or dual cores is already provided.

--
http://jojopaderes.multiply.com
http://jojopaderes.wordpress.com

Re: Scaling up PostgreSQL in Multiple CPU / Dual Core Powered Servers

From
Christopher Browne
Date:
jojo.paderes@gmail.com ("Jojo Paderes") wrote:
> I'd like to know if the latest PostgreSQL release can scale up by
> utilizing multiple CPUs or dual-core CPUs to speed up SQL execution.
>
> I have already done some research in the PostgreSQL mailing list
> archives and only found old threads dating back to 2000. A lot has
> improved in PostgreSQL since then, and hopefully support for multiple
> CPUs or dual cores is already provided.

If you submit multiple concurrent queries, they can be concurrently
processed on separate CPUs; that has long been supported, and people
have been using SMP systems to this end for years.
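
To make that concrete, here is a minimal sketch, assuming libpq and a
scratch database reachable as "dbname=test" (both the connection
string and the query are placeholders): each forked client opens its
own connection, the server starts a separate backend process for each
one, and the OS is free to schedule those backends on different CPUs.

/* Sketch: N concurrent clients, each getting its own server backend.
 * Build with: cc clients.c -lpq */
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
#include <libpq-fe.h>

#define NCLIENTS 4

int main(void)
{
    for (int i = 0; i < NCLIENTS; i++) {
        if (fork() == 0) {
            /* One connection per child -> one backend per child. */
            PGconn *conn = PQconnectdb("dbname=test");
            if (PQstatus(conn) != CONNECTION_OK) {
                fprintf(stderr, "connect: %s", PQerrorMessage(conn));
                _exit(1);
            }
            /* A deliberately CPU-heavy query. */
            PGresult *res = PQexec(conn,
                "SELECT count(*) FROM generate_series(1, 5000000)");
            if (PQresultStatus(res) == PGRES_TUPLES_OK)
                printf("client %d: count = %s\n", i, PQgetvalue(res, 0, 0));
            PQclear(res);
            PQfinish(conn);
            _exit(0);
        }
    }
    for (int i = 0; i < NCLIENTS; i++)
        wait(NULL);
    return 0;
}

Run it on an SMP box while watching top(1) and you should see several
postgres backends each busy on a separate CPU.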
--
let name="cbbrowne" and tld="gmail.com" in String.concat "@" [name;tld];;
http://cbbrowne.com/info/spreadsheets.html
"In other words  -- and this is the rock solid  principle on which the
whole  of the Corporation's  Galaxy-wide success  is founded  -- their
fundamental design  flaws are  completely hidden by  their superficial
design flaws." -- HHGTG

Re: Scaling up PostgreSQL in Multiple CPU / Dual Core

From
Frank Wiles
Date:
On Thu, 23 Mar 2006 14:19:24 +0800
"Jojo Paderes" <jojo.paderes@gmail.com> wrote:

> I'd like to know if the latest PostgreSQL release can scale up by
> utilizing multiple CPUs or dual-core CPUs to speed up SQL execution.
>
> I have already done some research in the PostgreSQL mailing list
> archives and only found old threads dating back to 2000. A lot has
> improved in PostgreSQL since then, and hopefully support for multiple
> CPUs or dual cores is already provided.

  Yes, PostgreSQL can take advantage of multiple CPUs and cores, and
  has been able to for quite some time.

 ---------------------------------
   Frank Wiles <frank@wiles.org>
   http://www.wiles.org
 ---------------------------------


Re: Scaling up PostgreSQL in Multiple CPU / Dual Core

From
Scott Marlowe
Date:
On Thu, 2006-03-23 at 00:19, Jojo Paderes wrote:
> I'd like to know if the latest PostgreSQL release can scale up by
> utilizing multiple CPUs or dual-core CPUs to speed up SQL execution.
>
> I have already done some research in the PostgreSQL mailing list
> archives and only found old threads dating back to 2000. A lot has
> improved in PostgreSQL since then, and hopefully support for multiple
> CPUs or dual cores is already provided.

Can a single query be split up into parts and run on separate processors
at the same time?  No.

Can multiple incoming queries be run on different processors for better
performance?  Yes.

Has someone been working on the problem of splitting a query into pieces
and running it on multiple CPUs / multiple machines?  Yes.  Bizgress has
done that.

Re: Scaling up PostgreSQL in Multiple CPU / Dual Core

From
"Joshua D. Drake"
Date:
> Has someone been working on the problem of splitting a query into pieces
> and running it on multiple CPUs / multiple machines?  Yes.  Bizgress has
> done that.

I believe that is limited to Bizgress MPP yes?


Re: Scaling up PostgreSQL in Multiple CPU / Dual Core

From
Scott Marlowe
Date:
On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
> > Has someone been working on the problem of splitting a query into pieces
> > and running it on multiple CPUs / multiple machines?  Yes.  Bizgress has
> > done that.
>
> I believe that is limited to Bizgress MPP yes?

Yep.  I hope that someday it will be released to the postgresql global
dev group for inclusion.  Or at least parts of it.

Re: Scaling up PostgreSQL in Multiple CPU / Dual Core

From
Christopher Browne
Date:
Martha Stewart called it a Good Thing when smarlowe@g2switchworks.com (Scott Marlowe) wrote:
> On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
>> > Has someone been working on the problem of splitting a query into pieces
>> > and running it on multiple CPUs / multiple machines?  Yes.  Bizgress has
>> > done that.
>>
>> I believe that is limited to Bizgress MPP yes?
>
> Yep.  I hope that someday it will be released to the postgresql global
> dev group for inclusion.  Or at least parts of it.

Question: Does the Bizgress/MPP use threading for this concurrency?
Or forking?

If it does so via forking, that's more portable, and less dependent on
specific complexities of threading implementations (which amounts to
non-portability ;-)).

Most times Jan comes to town, we spend a few minutes musing about the
"splitting queries across threads" problem, and dismiss it again; if
there's the beginning of a "split across processes," that's decidedly
neat :-).
--
output = reverse("moc.liamg" "@" "enworbbc")
http://linuxfinances.info/info/internet.html
Why do we put suits in a garment bag, and put garments in a suitcase?

Re: Scaling up PostgreSQL in Multiple CPU / Dual Core

From
"Jim C. Nasby"
Date:
On Thu, Mar 23, 2006 at 09:22:34PM -0500, Christopher Browne wrote:
> Martha Stewart called it a Good Thing when smarlowe@g2switchworks.com (Scott Marlowe) wrote:
> > On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
> >> > Has someone been working on the problem of splitting a query into pieces
> >> > and running it on multiple CPUs / multiple machines?  Yes.  Bizgress has
> >> > done that.
> >>
> >> I believe that is limited to Bizgress MPP yes?
> >
> > Yep.  I hope that someday it will be released to the postgresql global
> > dev group for inclusion.  Or at least parts of it.
>
> Question: Does the Bizgress/MPP use threading for this concurrency?
> Or forking?
>
> If it does so via forking, that's more portable, and less dependent on
> specific complexities of threading implementations (which amounts to
> non-portability ;-)).
>
> Most times Jan comes to town, we spend a few minutes musing about the
> "splitting queries across threads" problem, and dismiss it again; if
> there's the beginning of a "split across processes," that's decidedly
> neat :-).

Correct me if I'm wrong, but there's no way to (reasonably) accomplish
that without having some dedicated extra processes laying around that
you can use to execute the queries, no? In other words, the cost of a
fork() during query execution would be too prohibitive...

FWIW, DB2 executes all queries in a dedicated set of processes. The
process handling the connection from the client will pass a query
request off to one of the executor processes. I can't remember which
process actually plans the query, but I know that the executor runs it.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Scaling up PostgreSQL in Multiple CPU / Dual Core

From
"Luke Lonergan"
Date:
Christopher,

On 3/23/06 6:22 PM, "Christopher Browne" <cbbrowne@acm.org> wrote:

> Question: Does the Bizgress/MPP use threading for this concurrency?
> Or forking?
>
> If it does so via forking, that's more portable, and less dependent on
> specific complexities of threading implementations (which amounts to
> non-portability ;-)).

OK - I'll byte:

It's process-based; we fork backends at slice points in the execution plan.

To take care of the startup latency problem, we persist sets of these
backends, called "gangs".  They appear, persist for connection scope for
reuse, then are disbanded.
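
To sketch the shape of that (an invented illustration only, not the
actual MPP code -- the names and the message protocol are made up):
fork the gang once up front, leave the workers blocked on pipes, and
hand them plan slices at query time, so no fork() sits on the
critical path.

/* Sketch: a persistent "gang" of worker processes fed over pipes. */
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define GANG_SIZE 4

int main(void)
{
    int to_worker[GANG_SIZE][2];

    /* Connection startup: pay the fork() cost once, here. */
    for (int w = 0; w < GANG_SIZE; w++) {
        pipe(to_worker[w]);
        if (fork() == 0) {
            /* Close write ends this worker doesn't own; keep its read end. */
            for (int k = 0; k <= w; k++)
                close(to_worker[k][1]);
            char task[128];
            ssize_t n;
            /* Worker loop: block until the coordinator sends a slice. */
            while ((n = read(to_worker[w][0], task, sizeof task - 1)) > 0) {
                task[n] = '\0';
                printf("worker %d (pid %d) runs: %s\n", w, (int)getpid(), task);
            }
            _exit(0);           /* write end closed: gang disbanded */
        }
        close(to_worker[w][0]);
    }

    /* Query time: dispatch slices to the already-running gang. */
    for (int q = 0; q < 2; q++)
        for (int w = 0; w < GANG_SIZE; w++) {
            char task[64];
            snprintf(task, sizeof task, "query %d, slice %d", q, w);
            write(to_worker[w][1], task, strlen(task));
            usleep(10000);      /* crude pacing; real code would frame messages */
        }

    for (int w = 0; w < GANG_SIZE; w++)
        close(to_worker[w][1]);
    while (wait(NULL) > 0)
        ;
    return 0;
}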

> Most times Jan comes to town, we spend a few minutes musing about the
> "splitting queries across threads" problem, and dismiss it again; if
> there's the beginning of a "split across processes," that's decidedly
> neat :-).

:-)

- Luke



Re: Scaling up PostgreSQL in Multiple CPU / Dual Core

From
Chris Browne
Date:
jnasby@pervasive.com ("Jim C. Nasby") writes:
> On Thu, Mar 23, 2006 at 09:22:34PM -0500, Christopher Browne wrote:
>> Martha Stewart called it a Good Thing when smarlowe@g2switchworks.com (Scott Marlowe) wrote:
>> > On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
>> >> > Has someone been working on the problem of splitting a query into pieces
>> >> > and running it on multiple CPUs / multiple machines?  Yes.  Bizgress has
>> >> > done that.
>> >>
>> >> I believe that is limited to Bizgress MPP yes?
>> >
>> > Yep.  I hope that someday it will be released to the postgresql global
>> > dev group for inclusion.  Or at least parts of it.
>>
>> Question: Does the Bizgress/MPP use threading for this concurrency?
>> Or forking?
>>
>> If it does so via forking, that's more portable, and less dependent on
>> specific complexities of threading implementations (which amounts to
>> non-portability ;-)).
>>
>> Most times Jan comes to town, we spend a few minutes musing about the
>> "splitting queries across threads" problem, and dismiss it again; if
>> there's the beginning of a "split across processes," that's decidedly
>> neat :-).
>
> Correct me if I'm wrong, but there's no way to (reasonably) accomplish
> that without having some dedicated extra processes laying around that
> you can use to execute the queries, no? In other words, the cost of a
> fork() during query execution would be too prohibitive...

Counterexample...

The sort of scenario we keep musing about is where you split off a
(thread|process) for each partition of a big table.  There is in fact
a natural such partitioning, in that tables get split at the 1GB mark,
by default.

Consider doing a join against 2 tables that are each 8GB in size
(e.g. - they consist of 8 data files).  Let's assume that the query
plan indicates doing seq scans on both.

You *know* you'll be reading through 16 files, each 1GB in size.
Spawning a process for each of those files doesn't strike me as
"prohibitively expensive."

A naive read on this is that you might start with one backend process,
which then spawns 16 more.  Each of those backends is scanning through
one of those 16 files; they then throw relevant tuples into shared
memory to be aggregated/joined by the central one.

That particular scenario is one where the fork()s would hardly be
noticeable.
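
A toy version of that scenario (invented throughout -- hypothetical
segment file names, and a per-segment counter standing in for real
tuple passing, which would need a locked shared queue):

/* Sketch: one forked scanner per 1GB segment, results in shared memory. */
#include <stdio.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define NSEGS 16

int main(void)
{
    /* One slot per segment, so no two children write the same word. */
    long *hits = mmap(NULL, NSEGS * sizeof(long),
                      PROT_READ | PROT_WRITE,
                      MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    for (int seg = 0; seg < NSEGS; seg++) {
        if (fork() == 0) {
            char path[64];
            snprintf(path, sizeof path, "tbl_a.%d", seg); /* hypothetical */
            int fd = open(path, O_RDONLY);
            if (fd >= 0) {
                char buf[8192];
                ssize_t n;
                while ((n = read(fd, buf, sizeof buf)) > 0)
                    for (ssize_t i = 0; i < n; i++)
                        if (buf[i] == 'x')      /* stand-in for a qual check */
                            hits[seg]++;
                close(fd);
            }
            _exit(0);
        }
    }

    while (wait(NULL) > 0)
        ;
    long total = 0;
    for (int seg = 0; seg < NSEGS; seg++)
        total += hits[seg];
    printf("matches across %d segments: %ld\n", NSEGS, total);
    return 0;
}

The 16 fork() calls cost microseconds against seconds of scanning 16GB.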

> FWIW, DB2 executes all queries in a dedicated set of processes. The
> process handling the connection from the client will pass a query
> request off to one of the executor processes. I can't remember which
> process actually plans the query, but I know that the executor runs
> it.

It seems to me that the kinds of cases where extra processes/threads
would be warranted are quite likely to be cases where fork()ing may be
an immaterial cost.
--
let name="cbbrowne" and tld="ntlug.org" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/languages.html
TECO Madness: a moment of convenience, a lifetime of regret.
-- Dave Moon

Re: Scaling up PostgreSQL in Multiple CPU / Dual Core

From
Chris Browne
Date:
llonergan@greenplum.com ("Luke Lonergan") writes:
> Christopher,
>
> On 3/23/06 6:22 PM, "Christopher Browne" <cbbrowne@acm.org> wrote:
>
>> Question: Does the Bizgress/MPP use threading for this concurrency?
>> Or forking?
>>
>> If it does so via forking, that's more portable, and less dependent on
>> specific complexities of threading implementations (which amounts to
>> non-portability ;-)).
>
> OK - I'll byte:
>
> It's process-based; we fork backends at slice points in the execution plan.

By "slice points", do you mean that you'd try to partition tables
(e.g. - if there's a Seq Scan on a table with 8 1GB segments, you
could spawn as many as 8 processes), or that two scans that are then
merge joined means a process for each scan, and a process for the
merge join?  Or perhaps both :-).  Or perhaps something else entirely ;-).

> To take care of the startup latency problem, we persist sets of
> these backends, called "gangs".  They appear, persist for connection
> scope for reuse, then are disbanded.

If only that could happen to more gangs...
--
output = ("cbbrowne" "@" "cbbrowne.com")
http://cbbrowne.com/info/multiplexor.html
"I'm sorry, the teleportation booth you have reached is not in service
at this  time.   Please  hand-reassemble  your molecules or   call  an
operator to help you...."

Re: Scaling up PostgreSQL in Multiple CPU / Dual Core

From
Michael Stone
Date:
On Fri, Mar 24, 2006 at 01:21:23PM -0500, Chris Browne wrote:
>A naive read on this is that you might start with one backend process,
>which then spawns 16 more.  Each of those backends is scanning through
>one of those 16 files; they then throw relevant tuples into shared
>memory to be aggregated/joined by the central one.

Of course, table scanning is going to be IO limited in most cases, and
having every query spawn 16 independent IO threads is likely to slow
things down in more cases than it speeds them up. It could work if you
have a bunch of storage devices, but at that point it's probably easier
and more direct to implement a clustered approach.

Mike Stone

Re: Scaling up PostgreSQL in Multiple CPU / Dual Core

From
"Jim C. Nasby"
Date:
On Fri, Mar 24, 2006 at 01:21:23PM -0500, Chris Browne wrote:
> > Correct me if I'm wrong, but there's no way to (reasonably) accomplish
> > that without having some dedicated extra processes laying around that
> > you can use to execute the queries, no? In other words, the cost of a
> > fork() during query execution would be too prohibitive...
>
> Counterexample...
>
> The sort of scenario we keep musing about is where you split off a
> (thread|process) for each partition of a big table.  There is in fact
> a natural such partitioning, in that tables get split at the 1GB mark,
> by default.
>
> Consider doing a join against 2 tables that are each 8GB in size
> (e.g. - they consist of 8 data files).  Let's assume that the query
> plan indicates doing seq scans on both.
>
> You *know* you'll be reading through 16 files, each 1GB in size.
> Spawning a process for each of those files doesn't strike me as
> "prohibitively expensive."

Have you ever tried reading from 2 large files on a disk at the same
time, let alone 16? The results ain't pretty.

What you're suggesting may make sense if the two tables are in
different tablespaces, provided you have some additional means to know
if those two tablespaces are on the same set of spindles. Though even
here the usefulness is somewhat suspect, because CPU is a hell of a lot
faster than disks are, unless you have a whole lot of disks. Of course,
this is exactly the target market for MPP.

Where parallel execution really makes sense is when you're doing things
like sorts or hash operations, because those are relatively
CPU-intensive.
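
A rough sketch of why a sort parallelizes so well (invented example,
with a deliberately naive merge): fork one worker per slice of a
shared buffer, let each qsort() its slice, and merge the sorted runs
in the parent. The work is pure CPU, so the speedup doesn't depend on
how many spindles you have.

/* Sketch: sorting a shared buffer with NWORKERS forked processes. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define NITEMS   (1 << 20)
#define NWORKERS 4

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

int main(void)
{
    /* Shared anonymous mapping: children sort in place, parent sees it. */
    int *data = mmap(NULL, NITEMS * sizeof(int),
                     PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    for (int i = 0; i < NITEMS; i++)
        data[i] = rand();

    size_t chunk = NITEMS / NWORKERS;
    for (int w = 0; w < NWORKERS; w++)
        if (fork() == 0) {          /* child: sort one run, then exit */
            qsort(data + w * chunk, chunk, sizeof(int), cmp_int);
            _exit(0);
        }
    while (wait(NULL) > 0)
        ;

    /* Parent: naive k-way merge of the NWORKERS sorted runs. */
    size_t pos[NWORKERS] = {0};
    long checksum = 0;
    for (size_t i = 0; i < NITEMS; i++) {
        int best = -1;
        for (int w = 0; w < NWORKERS; w++)
            if (pos[w] < chunk && (best < 0 ||
                data[w * chunk + pos[w]] < data[best * chunk + pos[best]]))
                best = w;
        checksum += data[best * chunk + pos[best]++];
    }
    printf("merged %d items, checksum %ld\n", NITEMS, checksum);
    return 0;
}
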
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Scaling up PostgreSQL in Multiple CPU / Dual Core

From
Chris Browne
Date:
mstone+postgres@mathom.us (Michael Stone) writes:

> On Fri, Mar 24, 2006 at 01:21:23PM -0500, Chris Browne wrote:
>>A naive read on this is that you might start with one backend process,
>>which then spawns 16 more.  Each of those backends is scanning through
>>one of those 16 files; they then throw relevant tuples into shared
>>memory to be aggregated/joined by the central one.
>
> Of course, table scanning is going to be IO limited in most cases, and
> having every query spawn 16 independent IO threads is likely to slow
> things down in more cases than it speeds them up. It could work if you
> have a bunch of storage devices, but at that point it's probably
> easier and more direct to implement a clustered approach.

All stipulated, yes.  It obviously wouldn't be terribly useful to scan
more aggressively than I/O bandwidth can support.  The point is that
this is one of the kinds of places where concurrent processing could
do some good...
--
let name="cbbrowne" and tld="acm.org" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/spiritual.html
Save the whales. Collect the whole set.

Re: Scaling up PostgreSQL in Multiple CPU / Dual Core

From
Bruce Momjian
Date:
Added to TODO:

    * Experiment with multi-threaded backend for better resource utilization
      This would allow a single query to make use of multiple CPUs or
      multiple I/O channels simultaneously.


---------------------------------------------------------------------------

Chris Browne wrote:
> mstone+postgres@mathom.us (Michael Stone) writes:
>
> > On Fri, Mar 24, 2006 at 01:21:23PM -0500, Chris Browne wrote:
> >>A naive read on this is that you might start with one backend process,
> >>which then spawns 16 more.  Each of those backends is scanning through
> >>one of those 16 files; they then throw relevant tuples into shared
> >>memory to be aggregated/joined by the central one.
> >
> > Of course, table scanning is going to be IO limited in most cases, and
> > having every query spawn 16 independent IO threads is likely to slow
> > things down in more cases than it speeds them up. It could work if you
> > have a bunch of storage devices, but at that point it's probably
> > easier and more direct to implement a clustered approach.
>
> All stipulated, yes.  It obviously wouldn't be terribly useful to scan
> more aggressively than I/O bandwidth can support.  The point is that
> this is one of the kinds of places where concurrent processing could
> do some good...
> --
> let name="cbbrowne" and tld="acm.org" in name ^ "@" ^ tld;;
> http://cbbrowne.com/info/spiritual.html
> Save the whales. Collect the whole set.

--
  Bruce Momjian   http://candle.pha.pa.us

  + If your life is a hard drive, Christ can be your backup. +