Thread: Scaling up PostgreSQL on Multiple-CPU / Dual-Core Servers
I'd like to know if the latest PostgreSQL release can scale up by
utilizing multiple CPUs or dual-core CPUs to speed up SQL execution.

I've already searched the PostgreSQL mailing list archives and only
found old threads dating back to 2000. A lot has improved in PostgreSQL
since then, and hopefully support for multiple CPUs or dual cores is
already provided.

--
http://jojopaderes.multiply.com
http://jojopaderes.wordpress.com
jojo.paderes@gmail.com ("Jojo Paderes") wrote:
> I'd like to know if the latest PostgreSQL release can scale up by
> utilizing multiple CPUs or dual-core CPUs to speed up SQL execution.
>
> I've already searched the PostgreSQL mailing list archives and only
> found old threads dating back to 2000. A lot has improved in PostgreSQL
> since then, and hopefully support for multiple CPUs or dual cores is
> already provided.

If you submit multiple concurrent queries, they can be processed
concurrently on separate CPUs; that has long been supported, and people
have been using SMP systems to this end for years.

--
let name="cbbrowne" and tld="gmail.com" in String.concat "@" [name;tld];;
http://cbbrowne.com/info/spreadsheets.html
"In other words -- and this is the rock solid principle on which the
whole of the Corporation's Galaxy-wide success is founded -- their
fundamental design flaws are completely hidden by their superficial
design flaws." -- HHGTG
On Thu, 23 Mar 2006 14:19:24 +0800
"Jojo Paderes" <jojo.paderes@gmail.com> wrote:
> I'd like to know if the latest PostgreSQL release can scale up by
> utilizing multiple CPUs or dual-core CPUs to speed up SQL execution.
>
> I've already searched the PostgreSQL mailing list archives and only
> found old threads dating back to 2000. A lot has improved in PostgreSQL
> since then, and hopefully support for multiple CPUs or dual cores is
> already provided.

Yes, PostgreSQL can take advantage of multiple CPUs and cores, and has
been able to for quite some time.

---------------------------------
  Frank Wiles <frank@wiles.org>
  http://www.wiles.org
---------------------------------
On Thu, 2006-03-23 at 00:19, Jojo Paderes wrote:
> I'd like to know if the latest PostgreSQL release can scale up by
> utilizing multiple CPUs or dual-core CPUs to speed up SQL execution.
>
> I've already searched the PostgreSQL mailing list archives and only
> found old threads dating back to 2000. A lot has improved in PostgreSQL
> since then, and hopefully support for multiple CPUs or dual cores is
> already provided.

Can a single query be split into parts and run on separate processors at
the same time?  No.

Can multiple incoming queries be run on different processors for better
performance?  Yes.

Has someone been working on the problem of splitting a query into pieces
and running it on multiple CPUs / multiple machines?  Yes.  Bizgres has
done that.
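[For readers wondering what "multiple incoming queries on different
processors" looks like from the client side, here is a minimal C sketch
using libpq. The table names and connection string are hypothetical.
Each fork()ed child opens its own connection, so each query runs in its
own server backend, which the OS is free to schedule on a different CPU:]

  /*
   * Minimal sketch: one fork()ed child per query, each with its own
   * connection -- so each query runs in its own PostgreSQL backend.
   * Table names and the connection string are hypothetical.
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/wait.h>
  #include <libpq-fe.h>

  int main(void)
  {
      const char *queries[] = {
          "SELECT count(*) FROM orders",
          "SELECT count(*) FROM customers",
          "SELECT count(*) FROM line_items",
          "SELECT count(*) FROM products"
      };
      int nqueries = sizeof queries / sizeof queries[0];
      int i;

      for (i = 0; i < nqueries; i++) {
          if (fork() == 0) {                  /* child: one connection */
              PGconn *conn = PQconnectdb("dbname=test");
              PGresult *res;

              if (PQstatus(conn) != CONNECTION_OK) {
                  fprintf(stderr, "%s", PQerrorMessage(conn));
                  exit(1);
              }
              res = PQexec(conn, queries[i]);
              if (PQresultStatus(res) == PGRES_TUPLES_OK)
                  printf("query %d -> %s rows\n", i, PQgetvalue(res, 0, 0));
              PQclear(res);
              PQfinish(conn);
              exit(0);
          }
      }
      while (wait(NULL) > 0)                  /* reap all children */
          ;
      return 0;
  }

[Something like cc demo.c -I`pg_config --includedir` -L`pg_config
--libdir` -lpq builds it; the point is simply that the four backends can
be scheduled across however many CPUs the box has.]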
> Has someone been working on the problem of splitting a query into
> pieces and running it on multiple CPUs / multiple machines?  Yes.
> Bizgres has done that.

I believe that is limited to Bizgres MPP, yes?
On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
> > Has someone been working on the problem of splitting a query into
> > pieces and running it on multiple CPUs / multiple machines?  Yes.
> > Bizgres has done that.
>
> I believe that is limited to Bizgres MPP, yes?

Yep.  I hope that someday it will be released to the PostgreSQL global
development group for inclusion, or at least parts of it.
Martha Stewart called it a Good Thing when smarlowe@g2switchworks.com
(Scott Marlowe) wrote:
> On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
>> I believe that is limited to Bizgres MPP, yes?
>
> Yep.  I hope that someday it will be released to the PostgreSQL global
> development group for inclusion, or at least parts of it.

Question: does Bizgres MPP use threading for this concurrency, or
forking?

If it does so via forking, that's more portable and less dependent on
the specific complexities of threading implementations (which amount to
non-portability ;-)).

Most times Jan comes to town, we spend a few minutes musing about the
"splitting queries across threads" problem, and dismiss it again; if
there's the beginning of a "split across processes," that's decidedly
neat :-).

--
output = reverse("moc.liamg" "@" "enworbbc")
http://linuxfinances.info/info/internet.html
Why do we put suits in a garment bag, and put garments in a suitcase?
On Thu, Mar 23, 2006 at 09:22:34PM -0500, Christopher Browne wrote:
> Question: does Bizgres MPP use threading for this concurrency, or
> forking?
>
> If it does so via forking, that's more portable and less dependent on
> the specific complexities of threading implementations (which amount
> to non-portability ;-)).
>
> Most times Jan comes to town, we spend a few minutes musing about the
> "splitting queries across threads" problem, and dismiss it again; if
> there's the beginning of a "split across processes," that's decidedly
> neat :-).

Correct me if I'm wrong, but there's no way to (reasonably) accomplish
that without having some dedicated extra processes lying around that
you can use to execute the queries, no?  In other words, the cost of a
fork() during query execution would be too prohibitive...

FWIW, DB2 executes all queries in a dedicated set of processes.  The
process handling the connection from the client will pass a query
request off to one of the executor processes.  I can't remember which
process actually plans the query, but I know that the executors run it.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461
Christopher,

On 3/23/06 6:22 PM, "Christopher Browne" <cbbrowne@acm.org> wrote:
> Question: does Bizgres MPP use threading for this concurrency, or
> forking?
>
> If it does so via forking, that's more portable and less dependent on
> the specific complexities of threading implementations (which amount
> to non-portability ;-)).

OK - I'll byte: it's process-based.  We fork backends at slice points in
the execution plan.  To take care of the startup-latency problem, we
persist sets of these backends, called "gangs".  They appear, persist
for the scope of the connection for reuse, then are disbanded.

> Most times Jan comes to town, we spend a few minutes musing about the
> "splitting queries across threads" problem, and dismiss it again; if
> there's the beginning of a "split across processes," that's decidedly
> neat :-).

:-)

- Luke
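[Bizgres MPP's internals aren't public in this thread, so the following
is only a generic sketch of the pre-forked "gang" pattern Luke
describes, not their code: workers are forked once, handed many plan
slices over the life of a connection, and disbanded when it ends. The
slice names and the one-pipe-per-worker protocol are invented for
illustration:]

  /*
   * Generic pre-fork "gang" sketch -- NOT Bizgres MPP source, just the
   * shape of the pattern: fork workers once, reuse them for many plan
   * slices, disband them when the connection's pipes close.
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/wait.h>

  #define GANG_SIZE 4

  int main(void)
  {
      int task[GANG_SIZE][2];         /* one task pipe per gang member */
      int i, t;

      for (i = 0; i < GANG_SIZE; i++) {
          pipe(task[i]);
          if (fork() == 0) {              /* persistent worker */
              char buf[64];
              ssize_t len;
              int j;

              for (j = 0; j < i; j++)     /* drop write ends inherited */
                  close(task[j][1]);      /* from earlier iterations   */
              close(task[i][1]);
              while ((len = read(task[i][0], buf, sizeof buf)) > 0)
                  printf("worker %d ran %.*s\n",
                         (int) getpid(), (int) len, buf);
              exit(0);                    /* pipe closed: disbanded */
          }
          close(task[i][0]);              /* parent keeps write ends */
      }

      for (t = 0; t < 8; t++) {           /* reuse the gang repeatedly */
          char msg[64];
          int len = snprintf(msg, sizeof msg, "plan-slice-%d", t);

          write(task[t % GANG_SIZE][1], msg, len);
      }

      for (i = 0; i < GANG_SIZE; i++)     /* connection ends: disband */
          close(task[i][1]);
      while (wait(NULL) > 0)
          ;
      return 0;
  }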
jnasby@pervasive.com ("Jim C. Nasby") writes:
> Correct me if I'm wrong, but there's no way to (reasonably) accomplish
> that without having some dedicated extra processes lying around that
> you can use to execute the queries, no?  In other words, the cost of a
> fork() during query execution would be too prohibitive...

Counterexample...

The sort of scenario we keep musing about is one where you split off a
(thread|process) for each partition of a big table.  There is in fact a
natural such partitioning, in that tables are split into segment files
at the 1GB mark, by default.

Consider doing a join against two tables that are each 8GB in size
(e.g., they each consist of eight data files).  Let's assume that the
query plan indicates doing seq scans on both.

You *know* you'll be reading through 16 files, each 1GB in size.
Spawning a process for each of those files doesn't strike me as
"prohibitively expensive."

A naive read on this is that you might start with one backend process,
which then spawns 16 more.  Each of those backends scans through one of
those 16 files; they then throw relevant tuples into shared memory to
be aggregated/joined by the central one.

That particular scenario is one where the fork()s would hardly be
noticeable.

> FWIW, DB2 executes all queries in a dedicated set of processes.  The
> process handling the connection from the client will pass a query
> request off to one of the executor processes.

It seems to me that the kinds of cases where extra processes/threads
would be warranted are quite likely to be cases where fork()ing may be
an immaterial cost.

--
let name="cbbrowne" and tld="ntlug.org" in String.concat "@" [name;tld];;
http://www.ntlug.org/~cbbrowne/languages.html
TECO Madness: a moment of convenience, a lifetime of regret.
-- Dave Moon
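[As a concrete rendering of the 16-file scenario above, here is a
hedged C sketch of fork()-per-segment scanning.  The segment file names
(seg.0 .. seg.15), the "needle" match standing in for a qual, and the
pipe-based result channel are all stand-ins; a real executor would pass
tuples through shared memory rather than a pipe:]

  /*
   * Sketch: one scanner process per (hypothetical) 1GB segment file,
   * per-segment counts funneled back over a pipe for the parent to
   * aggregate.  The strstr() match stands in for evaluating a qual.
   */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/wait.h>

  #define NSEGS 16

  int main(void)
  {
      int results[2];
      int i, seg;
      long m, total = 0;
      FILE *in;

      pipe(results);                       /* children -> parent */

      for (i = 0; i < NSEGS; i++) {
          if (fork() == 0) {               /* one scanner per segment */
              char path[64], line[256], out[64];
              long matches = 0;
              int len;
              FILE *f;

              close(results[0]);
              snprintf(path, sizeof path, "seg.%d", i);
              if ((f = fopen(path, "r")) == NULL)
                  exit(1);
              while (fgets(line, sizeof line, f))
                  if (strstr(line, "needle"))
                      matches++;
              fclose(f);
              len = snprintf(out, sizeof out, "%d %ld\n", i, matches);
              write(results[1], out, len); /* < PIPE_BUF, so atomic */
              exit(0);
          }
      }
      close(results[1]);

      in = fdopen(results[0], "r");        /* parent aggregates */
      while (fscanf(in, "%d %ld", &seg, &m) == 2)
          total += m;
      while (wait(NULL) > 0)
          ;
      printf("total matching rows: %ld\n", total);
      return 0;
  }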
llonergan@greenplum.com ("Luke Lonergan") writes:
> OK - I'll byte: it's process-based.  We fork backends at slice points
> in the execution plan.

By "slice points", do you mean that you'd try to partition tables
(e.g., if there's a seq scan on a table with eight 1GB segments, you
could spawn as many as eight processes), or that two scans that are
then merge-joined mean a process for each scan, plus a process for the
merge join?

Or perhaps both :-).  Or perhaps something else entirely ;-).

> To take care of the startup-latency problem, we persist sets of these
> backends, called "gangs".  They appear, persist for the scope of the
> connection for reuse, then are disbanded.

If only that could happen to more gangs...

--
output = ("cbbrowne" "@" "cbbrowne.com")
http://cbbrowne.com/info/multiplexor.html
"I'm sorry, the teleportation booth you have reached is not in service
at this time.  Please hand-reassemble your molecules or call an
operator to help you...."
On Fri, Mar 24, 2006 at 01:21:23PM -0500, Chris Browne wrote:
> A naive read on this is that you might start with one backend process,
> which then spawns 16 more.  Each of those backends scans through one
> of those 16 files; they then throw relevant tuples into shared memory
> to be aggregated/joined by the central one.

Of course, table scanning is going to be I/O-limited in most cases, and
having every query spawn 16 independent I/O streams is likely to slow
things down in more cases than it speeds them up.  It could work if you
have a bunch of storage devices, but at that point it's probably easier
and more direct to implement a clustered approach.

Mike Stone
On Fri, Mar 24, 2006 at 01:21:23PM -0500, Chris Browne wrote:
> You *know* you'll be reading through 16 files, each 1GB in size.
> Spawning a process for each of those files doesn't strike me as
> "prohibitively expensive."

Have you ever tried reading from two large files on a disk at the same
time, let alone 16?  The results ain't pretty.

What you're suggesting maybe makes sense if the two tables are in
different tablespaces, provided you have some additional means to know
whether those two tablespaces are on the same set of spindles.  Though
even here the usefulness is somewhat suspect, because CPUs are a hell
of a lot faster than disks, unless you have a whole lot of disks.  Of
course, this is exactly the target market for MPP.

Where parallel execution really makes sense is when you're doing things
like sorts or hash operations, because those are relatively
CPU-intensive.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461
mstone+postgres@mathom.us (Michael Stone) writes:
> Of course, table scanning is going to be I/O-limited in most cases,
> and having every query spawn 16 independent I/O streams is likely to
> slow things down in more cases than it speeds them up.  It could work
> if you have a bunch of storage devices, but at that point it's
> probably easier and more direct to implement a clustered approach.

All stipulated, yes.  It obviously wouldn't be terribly useful to scan
more aggressively than the I/O bandwidth can support.  The point is
that this is one of the kinds of places where concurrent processing
could do some good...

--
let name="cbbrowne" and tld="acm.org" in name ^ "@" ^ tld;;
http://cbbrowne.com/info/spiritual.html
Save the whales.  Collect the whole set.
Added to TODO:

* Experiment with a multi-threaded backend for better resource
  utilization

  This would allow a single query to make use of multiple CPUs or
  multiple I/O channels simultaneously.

---------------------------------------------------------------------------

Chris Browne wrote:
> All stipulated, yes.  It obviously wouldn't be terribly useful to scan
> more aggressively than the I/O bandwidth can support.  The point is
> that this is one of the kinds of places where concurrent processing
> could do some good...

--
Bruce Momjian   http://candle.pha.pa.us

+ If your life is a hard drive, Christ can be your backup. +