Thread: PostgreSQL process architecture question.
We know PostgreSQL uses one dedicated server process to serve each client connection. What we want to know is whether PostgreSQL uses multiple threads inside these agent processes to take advantage of multiple CPUs. Our site has only a few concurrent connections, so what happens inside each agent process is very important to us.
On Tue, Sep 9, 2008 at 9:35 AM, Amber <guxiaobo1982@hotmail.com> wrote:
> We know PostgreSQL uses one dedicated server process to serve one client
> connection, what we want to know is whether PostgreSQL use multiple threads
> inside agents processes to take advantage of multiple CPUs. In our site we
> have only a few concurrent connections, so what occurs inside agent process
> is very important to us.

No, it doesn't. One connection gets one process, which uses one CPU at a time.
On Tue, Sep 09, 2008 at 11:35:56PM +0800, Amber wrote:
> We know PostgreSQL uses one dedicated server process to serve one client
> connection, what we want to know is whether PostgreSQL use multiple threads
> inside agents processes to take advantage of multiple CPUs.

No.

Note that "threading" is not automatically necessary to get more than one processor to work on a single query. But at the moment, Postgres doesn't do that either.

A

--
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/
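Andrew's point that threads are not required to put several processors on one task can be illustrated with plain processes. The following is a minimal Python sketch, not PostgreSQL code: the hypothetical `parallel_sum` splits one job across forked worker processes, each of which reports a partial result back to the parent over its own pipe. It assumes a POSIX system (it relies on `os.fork`).

```python
import os
import struct


def parallel_sum(data, nworkers=4):
    """Sum `data` with `nworkers` forked child processes (POSIX only).

    Each child sums one contiguous slice and writes its partial result
    to its own pipe; the parent reads and adds the partials.
    No threads are involved at any point.
    """
    chunk = (len(data) + nworkers - 1) // nworkers
    workers = []
    for w in range(nworkers):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: sum one slice, send it as an 8-byte integer, exit.
            status = 1
            try:
                os.close(read_fd)
                partial = sum(data[w * chunk:(w + 1) * chunk])
                os.write(write_fd, struct.pack("<q", partial))
                status = 0
            finally:
                os._exit(status)  # never return into the parent's code
        os.close(write_fd)  # parent keeps only the read end
        workers.append((pid, read_fd))

    total = 0
    for pid, read_fd in workers:
        buf = os.read(read_fd, 8)
        os.close(read_fd)
        os.waitpid(pid, 0)
        if len(buf) != 8:
            raise RuntimeError(f"worker {pid} did not report a result")
        total += struct.unpack("<q", buf)[0]
    return total


if __name__ == "__main__":
    data = [i % 10 for i in range(1_000_000)]
    print(parallel_sum(data))  # 4500000
```

Each worker is a separate process with its own memory, so the parent only ever sees the small results the children choose to send back.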
On Tue, 09 Sep 2008 10:07:32 -0600, Scott Marlowe wrote:

> On Tue, Sep 9, 2008 at 9:35 AM, Amber <guxiaobo1982@hotmail.com> wrote:
>> We know PostgreSQL uses one dedicated server process to serve one client
>> connection, what we want to know is whether PostgreSQL use multiple threads
>> inside agents processes to take advantage of multiple CPUs. In our site we
>> have only a few concurrent connections, so what occurs inside agent process
>> is very important to us.
>
> No it doesn't. One connection gets one process which uses one CPU at a time.

I understand the history/technical reasons/motivation for this, yet want to ask: has anybody thought about using OpenMP for careful parallelization of per-process work sections? Scanning large (e.g. already locked) arrays, parallel sweeps or calculations might benefit from parallelization without requiring a full-out threaded design. Such an approach could retain the per-process isolation model yet still reap multicore benefits. To boot, OpenMP is pretty easy to use and comes with gcc.

Since I don't know much about PG's internals and their data dependencies etc., this might well be a dumb idea, but I figured asking couldn't hurt. :)

regards
Holger
That's exactly our situation: we have 4 CPUs with 4 cores each, 16 cores in total, but only 4 to 8 concurrent users, who regularly run complex queries. In such a situation we can't use all of our CPU resources to speed up response time.
> To: pgsql-general@postgresql.org
> From: holger@wizards.de
> Subject: Re: [GENERAL] PostgreSQL process architecture question.
> Date: Tue, 9 Sep 2008 18:30:17 +0200
>
> On Tue, 09 Sep 2008 10:07:32 -0600, Scott Marlowe wrote:
>
> > On Tue, Sep 9, 2008 at 9:35 AM, Amber <guxiaobo1982@hotmail.com> wrote:
> >> We know PostgreSQL uses one dedicated server process to serve one client
> >> connection, what we want to know is whether PostgreSQL use multiple threads
> >> inside agents processes to take advantage of multiple CPUs. In our site we
> >> have only a few concurrent connections, so what occurs inside agent process
> >> is very important to us.
> >
> > No it doesn't. One connection gets one process which uses one CPU at a time.
>
> I understand the history/technical reasons/motivation for this, yet want
> to ask if anybody has thought about using OpenMP for careful
> parallelization of per-process work sections? Scanning large (e.g. already
> locked) arrays, parallel sweeps or calculations might benefit from
> parallelization without requiring a full-out threaded design. Such an
> approach could retain the per-process isolation model yet still reap
> multicore benefits. To boot OpenMP is pretty easy to use and comes with
> gcc.
>
> Since I don't know much about PG's internals and their data dependencies
> etc. this might well be a dumb idea, but I figured asking couldn't hurt. :)
>
> regards
> Holger
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
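The ceiling described in this message follows directly from the one-process-per-connection model: if each session's backend uses at most one core at a time, then with S concurrent sessions on C cores at most min(S, C) cores can be busy. A worked version for the 16-core, 4-to-8-user setup (S, C and U_max are notation introduced here, not from the thread):

```latex
\begin{aligned}
C &= 4 \times 4 = 16 \text{ cores} \\
U_{\max} &= \frac{\min(S, C)}{C} \\
S = 4:\quad U_{\max} &= \tfrac{4}{16} = 25\% \\
S = 8:\quad U_{\max} &= \tfrac{8}{16} = 50\%
\end{aligned}
```

This is an upper bound on CPU use that assumes every session is running a query at the same moment.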
On Tue, Sep 9, 2008 at 11:17 PM, 小波 顾 <guxiaobo1982@hotmail.com> wrote:
> That's it, we have 4 CPUs, each of which has 4 cores, that is we have 16
> cores in total, but we have only 4 to 8 concurrent users, who regularly run
> complex queries. That is we can't use all our CPU resources in such a
> situation to speed up response time.

Unless you have either a small data set or a very powerful RAID array, most of the time you won't be CPU-bound anyway. But it would be nice to see some work come out to parallelize some of the work done in the back end.
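Whether a workload is CPU-bound or mostly waiting on I/O can be checked roughly by comparing CPU time with elapsed time. The following is a generic Python sketch under that assumption; the hypothetical `cpu_fraction` helper is not a PostgreSQL facility, and for a real backend one would instead watch that backend's process in an OS tool such as `top`.

```python
import time


def cpu_fraction(work):
    """Return (CPU time) / (wall-clock time) for one call of work().

    For single-threaded work, a value near 1.0 means the call kept one
    core busy the whole time (CPU-bound); a value near 0.0 means it
    mostly waited, e.g. on I/O. process_time() counts CPU time used by
    the whole current process.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return cpu / wall if wall > 0 else 0.0


if __name__ == "__main__":
    busy = cpu_fraction(lambda: sum(i * i for i in range(3_000_000)))
    idle = cpu_fraction(lambda: time.sleep(0.5))  # stands in for waiting on disk
    print(f"busy loop: {busy:.2f}, sleeping: {idle:.2f}")
```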
guxiaobo1982@hotmail.com ("Amber") writes:
> We know PostgreSQL uses one dedicated server process to serve one
> client connection, what we want to know is whether PostgreSQL use
> multiple threads inside agents processes to take advantage of
> multiple CPUs. In our site we have only a few concurrent
> connections, so what occurs inside agent process is very
> important to us.

No, PostgreSQL does not attempt to make any use of threading at this time.

The FAQ describes this quite nicely:

http://wiki.postgresql.org/wiki/Developer_FAQ#Why_don.27t_you_use_threads.2C_raw_devices.2C_async-I.2FO.2C_.3Cinsert_your_favorite_wizz-bang_feature_here.3E.3F

"Why don't you use threads, raw devices, async-I/O, <insert your favorite wizz-bang feature here>?

There is always a temptation to use the newest operating system features as soon as they arrive. We resist that temptation.

First, we support 15+ operating systems, so any new feature has to be well established before we will consider it. Second, most new wizz-bang features don't provide dramatic improvements. Third, they usually have some downside, such as decreased reliability or additional code required. Therefore, we don't rush to use new features but rather wait for the feature to be established, then ask for testing to show that a measurable improvement is possible.

As an example, threads are not currently used in the backend code because:

* Historically, threads were unsupported and buggy.
* An error in one backend can corrupt other backends.
* Speed improvements using threads are small compared to the remaining backend startup time.
* The backend code would be more complex.

So, we are not ignorant of new features. It is just that we are cautious about their adoption. The TODO list often contains links to discussions showing our reasoning in these areas."
-- 
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/oses.html
Given recent events in Florida, the tourism board in Texas has developed a new advertising campaign based on the slogan "Ya'll come to Texas, where we ain't shot a tourist in a car since November 1963."
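The FAQ's second bullet ("An error in one backend can corrupt other backends") is about address spaces. A toy Python sketch, assuming a POSIX system for `os.fork`; `buggy_backend` and the other names are hypothetical and nothing here is PostgreSQL code. The same bug that wipes a data structure in the caller when run in a thread leaves the caller's data intact when run in a forked process.

```python
import os
import threading

ORIGINAL = {"a": 1, "b": 2}


def buggy_backend(shared_state):
    """Stand-in for a backend with a bug that scribbles over memory."""
    shared_state.clear()
    shared_state["garbage"] = None


def corrupted_by_thread():
    """Threads share one address space: the bug damages the caller's data."""
    state = dict(ORIGINAL)
    t = threading.Thread(target=buggy_backend, args=(state,))
    t.start()
    t.join()
    return state != ORIGINAL


def corrupted_by_process():
    """A forked child works on its own copy-on-write copy of memory (POSIX)."""
    state = dict(ORIGINAL)
    pid = os.fork()
    if pid == 0:
        try:
            buggy_backend(state)
        finally:
            os._exit(0)  # never return into the parent's code
    os.waitpid(pid, 0)
    return state != ORIGINAL


if __name__ == "__main__":
    print("thread corrupted caller:", corrupted_by_thread())    # True
    print("process corrupted caller:", corrupted_by_process())  # False
```

The isolation shown here covers each process's private memory only. Memory that processes deliberately share (PostgreSQL backends do share a shared-memory area, for example for the buffer cache) is not protected this way.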
On Wed, 2008-09-10 at 00:02 -0600, Scott Marlowe wrote:
> Unless you have either a small data set or a very powerful RAID array,
> most of the time you won't be CPU bound anyway. But it would be nice to see
> some work come out to parallelize some of the work done in the back end.
I would have agreed with this several years ago, but many folks now buy enough RAM to reduce the impact of I/O. We're routinely CPU-bound on small queries, and even on some large ones, on a 32GB / 16-core Opteron box that serves a ~200GB database (on-disk tables + indexes).

Does anyone know of research/references on query optimizers that include parallelization as part of the cost estimate? I can envision how PostgreSQL might parallelize a query plan that was optimized with an assumption of one core. However, I wonder whether CPU and I/O costs are sufficient for efficient parallel query optimization; presumably contention for memory (for parallel sorts, say) becomes critical.
-Reece
-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0