Thread: Utilizing multiple cores for one query

Utilizing multiple cores for one query

From
henk de wit
Date:
I wonder whether the current versions of postgres (i.e. either 8.2 or 8.3) are able to utilize multiple cores for the execution of a single query?

This is one thing that systems like SQL Server and Oracle have been able to do for quite some time. I haven't seen much in the documentation that hints that this may be possible in PG, nor did I find much on the mailing lists about this. The only thing I found was a topic that discussed some patches that may eventually lead to a sequential scan being handled by multiple cores.

Could someone shed some light on the current or future abilities of PG for making use of multiple cores to execute a single query?

Thanks in advance




Re: Utilizing multiple cores for one query

From
"Jonah H. Harris"
Date:
On Dec 1, 2007 8:21 AM, henk de wit <henk53602@hotmail.com> wrote:
> I wonder whether the current versions of postgres (i.e. either 8.2 or 8.3)
> are able to utilize multiple cores for the execution of a single query?

Nope.

> This is one thing that systems like SQL Server and Oracle have been able to
> do for quite some time. I haven't seen much in the documentation that hints
> that this may be possible in PG, nor did I find much in the mailinglists
> about this. The only thing I found was a topic that discussed some patches
> that may eventually lead to a sequence scan being handled by multiple cores.

I believe the threads you're talking about were related to scanning,
not parallel query.  Though, when Qingqing and I were discussing
parallel query a little over a year ago, I do seem to recall several
uninformed opinions stating that sequential scans were the only thing
it could be useful for.

> Could someone shed some light on the current or future abilities of PG for
> making use of multiple cores to execute a single query?

Currently, the only way to parallelize a query in Postgres is to use pgpool-II.

http://pgpool.projects.postgresql.org/

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation                | fax: 732.331.1301
499 Thornall Street, 2nd Floor          | jonah.harris@enterprisedb.com
Edison, NJ 08837                        | http://www.enterprisedb.com/

Re: Utilizing multiple cores for one query

From
henk de wit
Date:
> > I wonder whether the current versions of postgres (i.e. either 8.2 or 8.3)
> > are able to utilize multiple cores for the execution of a single query?
> Nope.

I see, thanks for the clarification.

Btw, in this thread: http://archives.postgresql.org/pgsql-performance/2007-10/msg00159.php

the following is said:

>You can determine what runs in parallel based on the
>indentation of the output.
>Items at the same indentation level under the same
>"parent" line will run in parallel

Wouldn't this offer some opportunities for running things on multiple cores? Based on the above, many people already seem to think that PG is able to utilize multiple cores for one query. Of course, it can easily be "proved" that this does not happen by simply watching the CPU utilization graphs while executing a query. Nevertheless, those people may wonder why (some of) those items that already run in parallel do not actually run in parallel on multiple cores.

> Currently, the only way to parallelize a query in Postgres is to use pgpool-II.
>
> http://pgpool.projects.postgresql.org/

Yes, I noticed this project before. At the time it was not really clear how stable and/or how well supported it is. It indeed seems to support parallel queries automatically by rewriting standard queries. It does seem to need different DB nodes and is thus probably not able to use multiple cores of a single DBMS. Also, I could not really find out how well pgpool-II judges the level of parallelization it's going to use. E.g. when there are 16 nodes in the system with currently low utilization, a single query may be split into 16 pieces. On the other hand, when 8 of these nodes are heavily utilized, splitting into 8 pieces might be better.

Anyway, are there any plans for postgresql to support parallelizing queries natively?






Re: Utilizing multiple cores for one query

From
"Jonah H. Harris"
Date:
On Dec 1, 2007 9:42 AM, henk de wit <henk53602@hotmail.com> wrote:
> Wouldn't this offer some opportunities for running things on multiple cores?

No, it's not actually parallel in the same sense.
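
A rough illustration of the difference, assuming the executor pulls rows on demand through a tree of plan nodes inside a single backend process. This is a simplification: the generator names below are hypothetical and are not PostgreSQL code. The two scans under the join take turns, but only one of them is ever running at any moment, so a second core has nothing to do:

```python
def seq_scan(name, rows, log):
    """Hypothetical scan node: yields one row each time its parent asks."""
    for row in rows:
        log.append(f"{name} emits {row}")
        yield row


def nested_loop_join(outer, make_inner):
    """Hypothetical join node: rescans the inner child for every outer row."""
    for outer_row in outer:
        for inner_row in make_inner():
            if outer_row == inner_row:
                yield (outer_row, inner_row)


def run_plan():
    """Run the toy plan; return the joined rows and the order of scan activity."""
    log = []
    outer = seq_scan("outer", [1, 2], log)
    joined = nested_loop_join(outer, lambda: seq_scan("inner", [2, 3], log))
    return list(joined), log


if __name__ == "__main__":
    rows, log = run_plan()
    print(rows)
    for line in log:
        print(line)
```

The log shows the outer and inner scans interleaved row by row in one thread of control, which is the sense in which sibling nodes "run together" without being parallel across cores.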

> Yes, I noticed this project before. At the time it was not really clear how
> stable and/or how well supported this is. It indeed seems to support
> parallel queries automatically by being able to rewrite standard queries. It
> does seem it needs different DB nodes and is thus probably not able to use
> multiple cores of a single DBMS.

I've seen it actually set up to use multiple connections to the same
DBMS.  How well it would work is pretty much dependent on your
application and the amount of parallelization you could actually gain.


> Also, I could not really find how well
> pgpool-II is doing at making judgments of the level of parallelization it's
> going to use. E.g. when there are 16 nodes in the system with a currently
> low utilization, a single query may be split into 16 pieces. On the other
> hand, when 8 of these nodes are heavily utilized, splitting into 8 pieces
> might be better.

IIRC, it doesn't plan parallelization that way.  It looks at what is
partitioned (by default) on different nodes and parallelizes based on
that.  As I said earlier, you can partition a single node and put
pgpool-II on top of it to gain some parallelization.  Unfortunately,
it isn't capable of handling things like parallel index builds or
other useful maintenance features... but it can do fairly good query
result parallelization.
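
The general scatter/gather idea can be sketched as follows. This is only an illustration under stated assumptions, not pgpool-II's implementation or API: the partitioning function, the row layout, and the names `partition_rows` and `parallel_average` are all hypothetical. Each partition computes a partial aggregate independently, and the partial results are combined at the end:

```python
from concurrent.futures import ThreadPoolExecutor


def partition_rows(rows, key, n_partitions):
    """Hash-partition rows (dicts) into buckets by an integer key column."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[row[key] % n_partitions].append(row)
    return partitions


def partial_sum_count(partition, column):
    """Work done per partition: a partial SUM and COUNT for one column."""
    return sum(row[column] for row in partition), len(partition)


def parallel_average(partitions, column, executor):
    """Scatter the partial aggregate to every partition, then gather and combine."""
    futures = [executor.submit(partial_sum_count, p, column) for p in partitions]
    total = 0
    count = 0
    for future in futures:
        part_sum, part_count = future.result()
        total += part_sum
        count += part_count
    return total / count if count else None


if __name__ == "__main__":
    rows = [{"id": i, "amount": i * 2} for i in range(1, 101)]
    parts = partition_rows(rows, "id", 4)
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(parallel_average(parts, "amount", pool))
```

In standard CPython a thread pool will not spread this pure-Python work over several cores; passing a `concurrent.futures.ProcessPoolExecutor` instead is the usual way to do that. The gain depends on how evenly the data is partitioned, which matches the point above that the benefit is application dependent.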


Re: Utilizing multiple cores for one query

From
Jean-David Beyer
Date:

henk de wit wrote:
>> > I wonder whether the current versions of postgres (i.e. either 8.2 or 8.3)
>> > are able to utilize multiple cores for the execution of a single query?
>> Nope.
>
> I see, thanks for the clarification.
>
> Btw, in this thread:
> http://archives.postgresql.org/pgsql-performance/2007-10/msg00159.php
>
> the following is said:
>
>>You can determine what runs in parallel based on the
>>indentation of the output.
>>Items at the same indentation level under the same
>>"parent" line will run in parallel
>
> Wouldn't this offer some opportunities for running things on multiple
> cores? Based on the above, many people already seem to think that PG is
> able to utilize multiple cores for 1 query.

Of course, it depends on just what you mean. Since postgresql is a
client-server system, the client can run on one processor and the server on
another. And that _is_ parallelism in a way. For me in one application, my
client uses about 20% of a processor and the server uses around 80%. But in
more detail,

 VIRT   RES   SHR  SWAP %MEM %CPU     TIME+ P COMMAND
2019m   94m   93m  1.9g  1.2   79   2:29.97 3 postgres: jdbeyer stock [local] INSERT
2019m  813m  813m  1.2g 10.2    2  23:38.67 0 postgres: writer process
2018m   29m   29m  1.9g  0.4    0   4:07.59 3 /usr/bin/postmaster -p 5432 -D ...
 8624   652   264  7972  0.0    0   0:00.10 2 postgres: logger process
 9624  1596   204  8028  0.0    0   0:01.07 2 postgres: stats buffer process
 8892   840   280  8052  0.0    0   0:00.74 1 postgres: stats collector process
 6608  2320  1980  4288  0.0   22   1:56.27 0 /home/jdbeyer/bin/enter

The P column shows the processor the process last ran on. Although I might
get away with using one processor in this case, it is clearly using all four.
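
For anyone who wants the same information without top, here is a small sketch that assumes a Linux system with /proc mounted. It relies on field 39 ("processor") of /proc/<pid>/stat as described in the proc(5) manual page; the function names are my own.

```python
PROCESSOR_FIELD = 39  # per proc(5): CPU number the task last executed on


def last_cpu_from_stat(stat_line):
    """Extract the 'processor' field from the contents of /proc/<pid>/stat."""
    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or parentheses, so cut at the last ')' before splitting.
    _, _, tail = stat_line.rpartition(")")
    fields = tail.split()  # fields[0] is field 3 (process state)
    return int(fields[PROCESSOR_FIELD - 3])


def last_cpu_of(pid):
    """Read /proc/<pid>/stat (Linux only) and return the last CPU used."""
    with open(f"/proc/{pid}/stat") as stat_file:
        return last_cpu_from_stat(stat_file.read())
```

Calling `last_cpu_of(pid)` repeatedly for a backend's pid shows whether the scheduler moves it between cores. It still says nothing about one query using several cores at once.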

Now this is not processing a single query on multiple cores (in this case,
the "query" is running on core #3 only), but the ancillary stuff is running
on multiple cores and some of it should be charged to the query. And the OS
kernel takes time for IO and stuff as well.

> Of course, it can easily be
> "proved" that this does not happen by simply watching the CPU
> utilization graphs when executing a query. Nevertheless, those people
> may wonder why (some of) those items that already run in parallel do not
> actually run in parallel using multiple cores?
--
  .~.  Jean-David Beyer          Registered Linux User 85642.
  /V\  PGP-Key: 9A2FC99A         Registered Machine   241939.
 /( )\ Shrewsbury, New Jersey    http://counter.li.org
 ^^-^^ 11:40:01 up 1 day, 2:02, 5 users, load average: 4.15, 4.14, 4.15

Re: Utilizing multiple cores for one query

From
Matthew
Date:
On Sat, 1 Dec 2007, Jonah H. Harris wrote:
> I believe the threads you're talking about were related to scanning,
> not parallel query.  Though, when Qingqing and I were discussing
> parallel query a little over a year ago, I do seem to recall several
> uninformed opinions stating that sequential scans were the only thing
> it could be useful for.

I would imagine sorting a huge set of results would benefit from
multi-threading, because it can be split up into separate tasks. Heck,
Postgres *already* splits sorting up into multiple chunks when the results
to sort are too big to fit in memory.
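
A minimal sketch of what I mean, and emphatically not PostgreSQL's sort code: split the input into runs, sort each run as an independent task, then do a k-way merge of the sorted runs. The names `split_into_runs` and `parallel_sort` are made up for illustration.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def split_into_runs(values, n_runs):
    """Cut the input into at most n_runs contiguous chunks of similar size."""
    size = max(1, -(-len(values) // n_runs))  # ceiling division, at least 1
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sort(values, n_runs, executor):
    """Sort each run as a separate task, then k-way merge the sorted runs."""
    runs = split_into_runs(values, n_runs)
    sorted_runs = list(executor.map(sorted, runs))
    return list(heapq.merge(*sorted_runs))


if __name__ == "__main__":
    data = [9, 3, 7, 1, 8, 2, 6, 4, 5, 0]
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(parallel_sort(data, 4, pool))
```

The final merge is still a single sequential pass, so the speedup is limited to the run-sorting phase. In standard CPython the thread pool above will not actually occupy several cores for this work; a `concurrent.futures.ProcessPoolExecutor` can be passed in its place.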

This would benefit a lot of multi-table joins, because being able to sort
a table faster would enable merge joins to be used at lower cost. That's
particularly valuable when you're doing a large summary multi-table join
that uses most of the database contents.
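
To make the connection to sorting concrete, here is a toy inner merge join over two inputs that are already sorted on the join key. It is a sketch under my own assumptions (rows as tuples, key in the first position by default), not the planner's or executor's implementation:

```python
def merge_join(left, right, key=lambda row: row[0]):
    """Inner-join two lists that are already sorted by key."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = key(left[i]), key(right[j])
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and key(right[j_end]) == left_key:
                j_end += 1
            # Pair every left row with that key against the whole run.
            while i < len(left) and key(left[i]) == left_key:
                for right_row in right[j:j_end]:
                    result.append((left[i], right_row))
                i += 1
            j = j_end
    return result


if __name__ == "__main__":
    orders = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
    items = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]
    for pair in merge_join(orders, items):
        print(pair)
```

The join itself is one linear pass over both inputs, so nearly all of the cost sits in getting the inputs sorted. That is why a faster sort would make this join strategy cheaper.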

Matthew

--
Beware of bugs in the above code; I have only proved it correct, not
tried it.                                               --Donald Knuth

Re: Utilizing multiple cores for one query

From
"Marko Kreen"
Date:
On 12/1/07, Jonah H. Harris <jonah.harris@gmail.com> wrote:
> Currently, the only way to parallelize a query in Postgres is to use pgpool-II.

FYI: plproxy issues queries to several nodes in parallel too.

--
marko