Thread: PostgreSQL process architecture question.

From
"Amber"
Date:
We know that PostgreSQL uses one dedicated server process to serve each client connection. What we want to know is whether PostgreSQL uses multiple threads inside those agent processes to take advantage of multiple CPUs. Our site has only a few concurrent connections, so what happens inside each agent process is very important to us.

Re: PostgreSQL process architecture question.

From
"Scott Marlowe"
Date:
On Tue, Sep 9, 2008 at 9:35 AM, Amber <guxiaobo1982@hotmail.com> wrote:
> We know PostgreSQL uses one dedicated server process to serve one client
> connection, what we want to know is whether PostgreSQL use multiple threads
> inside agents processes to take advantage of multiple CPUs. In our site we
> have only a few concurrent connections, so what occurs inside agent process
> is very important to us.

No it doesn't.  One connection gets one process which uses one CPU at a time.

Re: PostgreSQL process architecture question.

From
Andrew Sullivan
Date:
On Tue, Sep 09, 2008 at 11:35:56PM +0800, Amber wrote:
> We know PostgreSQL uses one dedicated server process to serve one
> client connection, what we want to know is whether PostgreSQL use
> multiple threads inside agents processes to take advantage of multiple
> CPUs.

No.  Note that "threading" is not automatically necessary to get more
than one processor to work on a single query.  But at the moment,
Postgres doesn't do that either.
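
As a rough illustration of the first point (and not something PostgreSQL
does, per the above): several cooperating processes can share one
computation without any threads. The sketch below forks worker processes,
lets each one sum a slice of an array, and combines the partial sums in the
parent through a pipe. The name parallel_sum, the worker count, and the
in-memory array standing in for table data are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: sum data[0..n) using nworkers child processes.
 * Returns 0 and stores the total in *result on success, -1 on failure.
 * Note: it reaps children with wait(), so it assumes the caller has no
 * other child processes running.
 */
int parallel_sum(const int32_t *data, size_t n, int nworkers, int64_t *result)
{
    int fd[2];
    int started = 0;
    int ok = 1;
    int64_t total = 0;
    size_t chunk;

    assert(nworkers > 0);
    if (pipe(fd) != 0)
        return -1;

    chunk = n / (size_t)nworkers;
    for (int w = 0; w < nworkers; w++) {
        size_t lo = (size_t)w * chunk;
        /* The last worker also takes the remainder of the array. */
        size_t hi = (w == nworkers - 1) ? n : lo + chunk;
        pid_t pid = fork();

        if (pid < 0) {
            ok = 0;
            break;
        }
        if (pid == 0) {
            /* Child: sum one slice and report it through the pipe. */
            int64_t partial = 0;
            ssize_t written;

            close(fd[0]);
            for (size_t i = lo; i < hi; i++)
                partial += data[i];
            /* An 8-byte write is below PIPE_BUF, so it is atomic. */
            written = write(fd[1], &partial, sizeof partial);
            _exit(written == (ssize_t)sizeof partial ? 0 : 1);
        }
        started++;
    }
    close(fd[1]);

    /* Parent: collect one partial sum per worker that was started. */
    for (int w = 0; w < started; w++) {
        int64_t partial;

        if (read(fd[0], &partial, sizeof partial) == (ssize_t)sizeof partial)
            total += partial;
        else
            ok = 0;
    }
    close(fd[0]);

    /* Reap the workers and check that each exited cleanly. */
    for (int w = 0; w < started; w++) {
        int status;

        if (wait(&status) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
            ok = 0;
    }

    if (!ok)
        return -1;
    *result = total;
    return 0;
}
```

Each worker runs in its own address space, so a crash in one cannot
corrupt the others; the price is that results have to be passed back
explicitly, here through a pipe.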

A
--
Andrew Sullivan
ajs@commandprompt.com
+1 503 667 4564 x104
http://www.commandprompt.com/

Re: PostgreSQL process architecture question.

From
"Holger Hoffstaette"
Date:
On Tue, 09 Sep 2008 10:07:32 -0600, Scott Marlowe wrote:

> On Tue, Sep 9, 2008 at 9:35 AM, Amber <guxiaobo1982@hotmail.com> wrote:
>> We know PostgreSQL uses one dedicated server process to serve one client
>> connection, what we want to know is whether PostgreSQL use multiple threads
>> inside agents processes to take advantage of multiple CPUs. In our site we
>> have only a few concurrent connections, so what occurs inside agent process
>> is very important to us.
>
> No it doesn't.  One connection gets one process which uses one CPU at a time.

I understand the history and the technical reasons for this, yet I want
to ask whether anybody has thought about using OpenMP for careful
parallelization of per-process work sections. Scanning large (e.g. already
locked) arrays, parallel sweeps or calculations might benefit from
parallelization without requiring a full-out threaded design. Such an
approach could retain the per-process isolation model yet still reap
multicore benefits. To boot, OpenMP is pretty easy to use and comes with
gcc.

Since I don't know much about PG's internals and their data dependencies
etc. this might well be a dumb idea, but I figured asking couldn't hurt. :)
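
A minimal sketch of the kind of loop-level parallelization meant here,
applied to a plain array rather than to any real PostgreSQL data structure.
The function count_above and its parameters are hypothetical, and whether
backend code could safely run inside such a parallel region is exactly the
open question. With gcc the pragma takes effect when compiling with
-fopenmp; without that flag it is ignored and the loop runs serially with
the same result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical example: count the elements of values[0..n) that are
 * greater than threshold. The array is only read, so iterations are
 * independent and the per-thread counts can be combined by a reduction.
 */
size_t count_above(const int32_t *values, size_t n, int32_t threshold)
{
    long long count = 0;

    assert(values != NULL || n == 0);

    /* A signed loop index keeps this valid for older OpenMP versions. */
    #pragma omp parallel for reduction(+:count)
    for (long long i = 0; i < (long long)n; i++) {
        if (values[i] > threshold)
            count++;
    }
    return (size_t)count;
}
```

The process stays a single process from the outside; threads exist only
for the duration of the loop.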

regards
Holger


Re: PostgreSQL process architecture question.

From
小波 顾
Date:
That's exactly it. We have 4 CPUs with 4 cores each, 16 cores in total, but only 4 to 8 concurrent users, who regularly run complex queries. In that situation we can't use all of our CPU resources to speed up response time.




> To: pgsql-general@postgresql.org
> From: holger@wizards.de
> Subject: Re: [GENERAL] PostgreSQL process architecture question.
> Date: Tue, 9 Sep 2008 18:30:17 +0200
>
> On Tue, 09 Sep 2008 10:07:32 -0600, Scott Marlowe wrote:
>
> > On Tue, Sep 9, 2008 at 9:35 AM, Amber <guxiaobo1982@hotmail.com> wrote:
> >> We know PostgreSQL uses one dedicated server process to serve one client
> >> connection, what we want to know is whether PostgreSQL use multiple threads
> >> inside agents processes to take advantage of multiple CPUs. In our site we
> >> have only a few concurrent connections, so what occurs inside agent process
> >> is very important to us.
> >
> > No it doesn't. One connection gets one process which uses one CPU at a time.
>
> I understand the history/technical reasons/motivation for this, yet want
> to ask if anybody has thought about using OpenMP for careful
> parallelization of per-process work sections? Scanning large (e.g. already
> locked) arrays, parallel sweeps or calculations might benefit from
> parallelization without requiring a full-out threaded design. Such an
> approach could retain the per-process isolation model yet still reap
> multicore benefits. To boot OpenMP is pretty easy to use and comes with
> gcc.
>
> Since I don't know much about PG's internals and their data dependencies
> etc. this might well be a dumb idea, but I figured asking couldn't hurt. :)
>
> regards
> Holger



Re: PostgreSQL process architecture question.

From
"Scott Marlowe"
Date:
On Tue, Sep 9, 2008 at 11:17 PM, 小波 顾 <guxiaobo1982@hotmail.com> wrote:
> That's it, we have 4 CPUs, each of which has 4 cores, that is we have 16
> cores in total, but we have only 4  to 8 concurrent users, who regularly run
> complex queries. That is we can't use all our CPU resources in such a
> situation to speed up response time.

Unless you have either a small data set or a very powerful RAID array,
most of the time you won't be CPU bound anyway.  But it would be nice to
see some work come out to parallelize some of the work done in the
back end.

Re: PostgreSQL process architecture question.

From
Chris Browne
Date:
guxiaobo1982@hotmail.com ("Amber") writes:
>    We know PostgreSQL uses one dedicated server process to serve one
>  client connection, what we want to know is whether PostgreSQL use
>  multiple threads inside agents processes to take advantage of
>  multiple CPUs. In our site we have only a few concurrent
>  connections, so what occurs inside agent process is very
>  important to us.

No, PostgreSQL does not attempt to make any use of threading at this
time.  The FAQ describes this quite nicely:


http://wiki.postgresql.org/wiki/Developer_FAQ#Why_don.27t_you_use_threads.2C_raw_devices.2C_async-I.2FO.2C_.3Cinsert_your_favorite_wizz-bang_feature_here.3E.3F

"Why don't you use threads, raw devices, async-I/O, <insert your favorite wizz-bang feature here>?

There is always a temptation to use the newest operating system features as soon as they arrive. We resist that
temptation.

First, we support 15+ operating systems, so any new feature has to be
well established before we will consider it. Second, most new
wizz-bang features don't provide dramatic improvements. Third, they
usually have some downside, such as decreased reliability or
additional code required. Therefore, we don't rush to use new features
but rather wait for the feature to be established, then ask for
testing to show that a measurable improvement is possible.

As an example, threads are not currently used in the backend code because:

    * Historically, threads were unsupported and buggy.
    * An error in one backend can corrupt other backends.
    * Speed improvements using threads are small compared to the remaining backend startup time.
    * The backend code would be more complex.

So, we are not ignorant of new features. It is just that we are
cautious about their adoption. The TODO list often contains links to
discussions showing our reasoning in these areas."
--
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://cbbrowne.com/info/oses.html
Given  recent  events in  Florida,  the  tourism  board in  Texas  has
developed a new  advertising campaign based on the  slogan "Ya'll come
to Texas, where we ain't shot a tourist in a car since November 1963."

Re: PostgreSQL process architecture question.

From
Reece Hart
Date:
On Wed, 2008-09-10 at 00:02 -0600, Scott Marlowe wrote:
> Unless you have either a small data set or a very powerful RAID array,
> most of the time you won't be CPU bound anyway.  But it would be nice to
> see some work come out to parallelize some of the work done in the
> back end.

I would have agreed with this several years ago, but many folks now buy enough RAM to reduce the impact of I/O. We're routinely CPU-bound on small queries, and even on some large ones, on a 32GB / 16-core Opteron box that serves a ~200GB database (on-disk tables plus indexes).

Does anyone know of research or references on query optimizers that include parallelization as part of the cost estimate? I can envision how PostgreSQL might parallelize a query plan that was optimized with an assumption of one core. However, I wonder whether CPU and I/O costs alone are sufficient for efficient parallel query optimization; presumably contention for memory (for parallel sorts, say) becomes critical.

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0