Thread: Don't Thread On Me (PostgreSQL related)

Don't Thread On Me (PostgreSQL related)

From

Rodrigo E. De León Plicet

Date:

26 January 2012, 18:13:29

Quote:

======================================================================

This thread

http://postgresql.1045698.n5.nabble.com/Multithread-Query-Planner-td5143643.html

was mentioned in a performance sub-group posting. Give it a read.

Back? It means, so far as I can see, that PG is toast. It will fall
down to being the cheap and dirty alternative to MySql, which even
has, at least two, multi-threaded engines. DB2 switched it's *nix
engine to threads from processes with release 9.5. Oracle claims it
for releases going back to 7 (I haven't tried to determine which parts
or applications; Larry has bought so many tchochtkes over the
years...). SQL Server is threaded.

Given that cpu's are breeding threads faster than cores,
PG will fall into irrelevance.

======================================================================

Source:
http://drcoddwasright.blogspot.com/2012/01/dont-thread-on-me.html

Comments?

Re: Don't Thread On Me (PostgreSQL related)

From

Merlin Moncure

Date:

26 January 2012, 19:03:08

On Thu, Jan 26, 2012 at 3:52 PM, Rodrigo E. De León Plicet
<rdeleonp@gmail.com> wrote:
> Quote:
>
> ======================================================================
>
> This thread
>
> http://postgresql.1045698.n5.nabble.com/Multithread-Query-Planner-td5143643.html
>
> was mentioned in a performance sub-group posting. Give it a read.
>
> Back? It means, so far as I can see, that PG is toast. It will fall
> down to being the cheap and dirty alternative to MySql, which even
> has, at least two, multi-threaded engines. DB2 switched it's *nix
> engine to threads from processes with release 9.5. Oracle claims it
> for releases going back to 7 (I haven't tried to determine which parts
> or applications; Larry has bought so many tchochtkes over the
> years...). SQL Server is threaded.
>
> Given that cpu's are breeding threads faster than cores,
> PG will fall into irrelevance.

The author of that post apparently doesn't understand that even though
postgresql hasn't 'switched to threads', it can still do more than one
thing at once.  Each process is itself an execution thread.  A
multi-threaded query planner is perfectly possible in postgresql
architecture -- however each one must reside in its own process and
you have to use shared memory instead of pthreads and locking.
Big whoop.  The only thing at stake with a multi-threaded planner is
optimizing single user tasks which is, while important, a niche
optimization.  PostgreSQL is far more scalable than mysql for
multi-user loads and the gap is increasing.
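
To make the "processes plus shared memory" point concrete, here is a minimal sketch in Python. It is only an illustration of the general technique, not PostgreSQL code: the names worker and parallel_sum are hypothetical, and it assumes a Unix-like system where the "fork" start method is available. Several worker processes each sum one chunk of the input and write a partial result into a shared-memory array that the parent reads afterwards.

```python
import multiprocessing as mp


def worker(totals, slot, values):
    """Sum one chunk and store the partial result in this worker's slot."""
    totals[slot] = sum(values)


def parallel_sum(values, workers=4):
    # Assumption: a Unix-like OS where the "fork" start method is available.
    ctx = mp.get_context("fork")
    # Shared-memory array of signed 64-bit integers, one slot per worker.
    totals = ctx.Array("q", workers)
    # Deal the input out round-robin so each worker gets a similar share.
    chunks = [values[i::workers] for i in range(workers)]
    procs = [
        ctx.Process(target=worker, args=(totals, i, chunk))
        for i, chunk in enumerate(chunks)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # The parent reads the partial results straight out of shared memory.
    return sum(totals[:])


if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))
```

No threads are created here; each worker is a separate process, and the only thing they share is the explicitly allocated array.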

merlin

Re: Don't Thread On Me (PostgreSQL related)

From

Thomas Kellerer

Date:

26 January 2012, 19:19:03

Rodrigo E. De León Plicet wrote on 26.01.2012 22:52:
> Oracle claims it for releases going back to 7

Not true.

Quote from the Oracle concepts manual:

"Multiple-process Oracle (also called multiuser Oracle) uses several processes to run different parts of the Oracle
Database code and additional processes for the users—either one process for each connected user or one or more processes
shared by multiple users. Most databases are multiuser because a primary advantages of a database is managing data
needed by multiple users simultaneously."

[...]

"For each user connection, the application is run by a client process that is different from the dedicated server
process that runs the database code. Each client process is associated with its own server process"

Taken from: http://docs.oracle.com/cd/E11882_01/server.112/e25789/process.htm#i16977

So the Oracle architecture is very similar to the one that PostgreSQL uses - at least on Linux/Unix. On Windows this is
done using threads (I think this is because Windows is not as efficient in running multiple processes as Linux/Unix).

Re: Don't Thread On Me (PostgreSQL related)

From

Chris Travers

Date:

26 January 2012, 19:32:48

On Thu, Jan 26, 2012 at 3:02 PM, Merlin Moncure <mmoncure@gmail.com> wrote:

> On Thu, Jan 26, 2012 at 3:52 PM, Rodrigo E. De León Plicet
> <rdeleonp@gmail.com> wrote:
> > Quote:
> >
> > ======================================================================
> >
> > This thread
> >
> > http://postgresql.1045698.n5.nabble.com/Multithread-Query-Planner-td5143643.html
> >
> > was mentioned in a performance sub-group posting. Give it a read.
> >
> > Back? It means, so far as I can see, that PG is toast. It will fall
> > down to being the cheap and dirty alternative to MySql, which even
> > has, at least two, multi-threaded engines. DB2 switched it's *nix
> > engine to threads from processes with release 9.5. Oracle claims it
> > for releases going back to 7 (I haven't tried to determine which parts
> > or applications; Larry has bought so many tchochtkes over the
> > years...). SQL Server is threaded.
> >
> > Given that cpu's are breeding threads faster than cores,
> > PG will fall into irrelevance.
>
> The author of that post apparently doesn't understand that even though
> postgresql hasn't 'switched to threads', it can still do more than one
> thing at once. Each process is itself an execution thread. A
> multi-threaded query planner is perfectly possible in postgresql
> architecture -- however each one must reside in its own process and
> you have to use shared memory instead of pthreads and locking.
> Big whoop. The only thing at stake with a multi-threaded planner is
> optimizing single user tasks which is, while important, a niche
> optimization. PostgreSQL is far more scalable than mysql for
> multi-user loads and the gap is increasing.

There are cases where intraquery parallelism would be helpful. As far as I understand it, PostgreSQL is the only major, solid (i.e. excluding MySQL) RDBMS which does not offer some sort of intraquery parallelism, and when running queries across very large databases, it might be helpful to be able to, say, scan different partitions simultaneously using different threads. So I think it is wrong to simply dismiss the need out of hand. What I am not sure of, though, is whether the cases where this need really comes to the fore are typical of single-server instances, and this brings me to the bigger question.

The question in my mind though is a more basic one: How should intraquery parallelism be handled? Is it something PostgreSQL needs to do or is it something that should be the work of an external project like Postgres-XC? Down the road is there value in merging the codebases, perhaps making stand-alone/data/coordination node a compile time option?

Obviously such is not a question that needs to be addressed now. We can wait until someone has something that is production-ready and relatively feature-complete before discussing merging projects.

Best Wishes,
Chris Travers

Re: Don't Thread On Me (PostgreSQL related)

From

Chris Travers

Date:

26 January 2012, 19:36:28

On Thu, Jan 26, 2012 at 3:18 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:

> Rodrigo E. De León Plicet wrote on 26.01.2012 22:52:
>
> > Oracle claims it for releases going back to 7
>
> Not true.
>
> Quote from the Oracle concepts manual:
>
> "Multiple-process Oracle (also called multiuser Oracle) uses several processes to run different parts of the Oracle Database code and additional processes for the users—either one process for each connected user or one or more processes shared by multiple users. Most databases are multiuser because a primary advantages of a database is managing data needed by multiple users simultaneously."

Oracle offers intra-query parallelism. I am not entirely sure how they do it, but it is supported. I don't know if these subtasks are pthreads within the separate session process or if they are additional processes.

Best Wishes,

Chris Travers

Re: Don't Thread On Me (PostgreSQL related)

From

Magnus Hagander

Date:

27 January 2012, 05:09:41

On Fri, Jan 27, 2012 at 00:32, Chris Travers <chris.travers@gmail.com> wrote:
>
>
> On Thu, Jan 26, 2012 at 3:02 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>
>> On Thu, Jan 26, 2012 at 3:52 PM, Rodrigo E. De León Plicet
>> <rdeleonp@gmail.com> wrote:
>> > Quote:
>> >
>> > ======================================================================
>> >
>> > This thread
>> >
>> >
>> > http://postgresql.1045698.n5.nabble.com/Multithread-Query-Planner-td5143643.html
>> >
>> > was mentioned in a performance sub-group posting. Give it a read.
>> >
>> > Back? It means, so far as I can see, that PG is toast. It will fall
>> > down to being the cheap and dirty alternative to MySql, which even
>> > has, at least two, multi-threaded engines. DB2 switched it's *nix
>> > engine to threads from processes with release 9.5. Oracle claims it
>> > for releases going back to 7 (I haven't tried to determine which parts
>> > or applications; Larry has bought so many tchochtkes over the
>> > years...). SQL Server is threaded.
>> >
>> > Given that cpu's are breeding threads faster than cores,
>> > PG will fall into irrelevance.
>>
>> The author of that post apparently doesn't understand that even though
>> postgresql hasn't 'switched to threads', it can still do more than one
>> thing at once.  Each process is itself an execution thread.  A
>> multi-threaded query planner is perfectly possible in postgresql
>> architecture -- however each one must reside in it's own process and
>> you have to use shared memory instead instead of pthreads and locking.
>>  Big whoop.  The only thing at stake with a multi threaded planner is
>> optimizing single user tasks which is, while important, a niche
>> optimization.  PostgreSQL is for more scalable than mysql for
>> multi-user loads and the gap is increasing.
>>
>>
> There are cases where intraquery parallelism would be helpful.  As far as I
> understand it, PostgreSQL is the only major, solid (i.e. excluding MySQL)
> RDBMS which does not offer some sort of intraquery parallelism, and when
> running queries across very large databases, it might be helpful to be able
> to, say, scan different partitions simultaneously using different threads.
>  So I think it is wrong to simply dismiss the need out of hand.  The thing
> though is that I am not sure that where this need really comes to the fore,
> it is typical of single-server instances, and so this brings me to the
> bigger question.

Intraquery parallelism is certainly something PostgreSQL is in need
of, and it's going to get more and more obvious over the next couple
of years.

Whether it uses threads or not is an implementation detail, just like
processing of regular queries on threads or processes or pools is an
implementation detail.

So the lack of threads isn't a problem - the lack of intraquery parallelism is.
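
As a rough illustration of why the choice between threads and processes can be treated as an implementation detail, the sketch below counts matching rows across several partitions in parallel and lets the caller pick either backend. It is a toy model in Python, not anything from the PostgreSQL source: count_matches, _run and parallel_count are hypothetical names, the "partitions" are plain lists, and the process variant assumes a Unix-like system with the "fork" start method.

```python
import multiprocessing as mp
import queue
import threading


def count_matches(partition, predicate):
    """Scan one partition and count the rows that satisfy the predicate."""
    return sum(1 for row in partition if predicate(row))


def _run(partition_id, partition, predicate, out):
    # Each worker reports (partition id, partial count) on the result queue.
    out.put((partition_id, count_matches(partition, predicate)))


def parallel_count(partitions, predicate, use_processes):
    if use_processes:
        # Assumption: Unix-like OS; workers are forked child processes.
        ctx = mp.get_context("fork")
        out = ctx.Queue()
        make_worker = ctx.Process
    else:
        out = queue.Queue()
        make_worker = threading.Thread
    workers = [
        make_worker(target=_run, args=(i, part, predicate, out))
        for i, part in enumerate(partitions)
    ]
    for w in workers:
        w.start()
    # Drain the queue before joining so no worker blocks on a full pipe.
    results = [out.get() for _ in workers]
    for w in workers:
        w.join()
    return sum(count for _, count in results)


if __name__ == "__main__":
    parts = [[1, 2, 3], [4, 5, 6], [7, 8]]
    print(parallel_count(parts, lambda x: x % 2 == 0, use_processes=False))
    print(parallel_count(parts, lambda x: x % 2 == 0, use_processes=True))
```

The scan logic and the way results are combined are identical in both cases; only the worker type and the queue type change.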

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Don't Thread On Me (PostgreSQL related)

From

Eduardo Morras

Date:

27 January 2012, 05:28:22

At 00:32 27/01/2012, you wrote:

>There are cases where intraquery parallelism would be helpful.  As
>far as I understand it, PostgreSQL is the only major, solid (i.e.
>excluding MySQL) RDBMS which does not offer some sort of intraquery
>parallelism, and when running queries across very large databases,
>it might be helpful to be able to, say, scan different partitions
>simultaneously using different threads.  So I think it is wrong to
>simply dismiss the need out of hand.  The thing though is that I am
>not sure that where this need really comes to the fore, it is
>typical of single-server instances, and so this brings me to the
>bigger question.
>
>The question in my mind though is a more basic one:  How should
>intraquery parallelism be handled?  Is it something PostgreSQL needs
>to do or is it something that should be the work of an external
>project like Postgres-XC?  Down the road is there value in merging
>the codebases, perhaps making stand-alone/data/coordination node a
>compile time option?

I still don't think threads are the solution for this scenario. You
can do intraquery parallelism with multiple processes more easily and
safely than with multiple threads. You launch a process with the whole
query; it divides the work into chunks and assigns them to different
processes instead of threads. You can use shared resources for
communication between the processes. When all the work is done, they
pass their results to the original process, which joins them. The
main advantage of doing it with processes is that if one of the child
processes dies, it can be killed and relaunched without any damage to
the work of its siblings, but if you use threads, the whole process
and all the work done is lost.
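
The scheme described above can be sketched roughly as follows. This is a hedged toy example in Python, not a proposal for the PostgreSQL backend: scan_chunk, launch, collect and parallel_total are hypothetical names, the crash_chunk parameter exists only to simulate a dying worker, and the code assumes a Unix-like system with the "fork" start method. A coordinator splits the rows into chunks, starts one child process per chunk, collects the partial results over pipes, and relaunches a child once if it died without reporting.

```python
import multiprocessing as mp
import os


def scan_chunk(chunk, send_end, crash):
    """Child process: compute a partial result and send it to the parent."""
    if crash:
        os._exit(1)  # simulate the worker dying before it reports anything
    send_end.send(sum(chunk))
    send_end.close()


def launch(ctx, chunk, crash=False):
    recv_end, send_end = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=scan_chunk, args=(chunk, send_end, crash))
    proc.start()
    send_end.close()  # the parent keeps only the receiving end
    return proc, recv_end


def collect(proc, recv_end):
    """Return the child's result, or None if it died without sending one."""
    try:
        result = recv_end.recv()
    except EOFError:
        result = None
    proc.join()
    recv_end.close()
    return result


def parallel_total(rows, workers=4, crash_chunk=None):
    ctx = mp.get_context("fork")  # assumption: Unix-like OS
    chunks = [rows[i::workers] for i in range(workers)]
    running = [
        launch(ctx, chunk, crash=(i == crash_chunk))
        for i, chunk in enumerate(chunks)
    ]
    partials = []
    for chunk, (proc, recv_end) in zip(chunks, running):
        result = collect(proc, recv_end)
        if result is None:
            # Only this chunk is redone; the other children are unaffected.
            result = collect(*launch(ctx, chunk))
        if result is None:
            raise RuntimeError("worker failed twice on the same chunk")
        partials.append(result)
    # The original process joins the partial results.
    return sum(partials)


if __name__ == "__main__":
    print(parallel_total(list(range(100)), crash_chunk=2))
```

A real system would also have to decide whether a crashed worker might have corrupted shared state, which this sketch sidesteps by sharing nothing but pipes.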

That's not the only advantage of using processes over threads. Some
years ago, one of the problems on multi-socket servers was shared
memory and communication between the sockets. The inter-CPU speed
was too low and the latency too high. Now we have multiple CPUs
in one socket and faster inter-socket communication, and this is no
longer a problem. Even better, the speed and latency of communication
between two or more servers (not sockets or CPUs) is reaching levels
where PostgreSQL could have shared memory between them, for example
using HyperTransport cards or modern FC, and it's easier, a lot
easier, to launch a remote process than a remote thread.

>Obviously such is not a question that needs to be addressed now.  We
>can wait until someone has something that is production-ready and
>relatively feature-complete before discussing merging projects.
>
>Best Wishes,
>Chris Travers

Re: Don't Thread On Me (PostgreSQL related)

From

Chris Travers

Date:

27 January 2012, 07:04:12

On Fri, Jan 27, 2012 at 1:28 AM, Eduardo Morras <nec556@retena.com> wrote:

> At 00:32 27/01/2012, you wrote:
>
> > There are cases where intraquery parallelism would be helpful. As far as I understand it, PostgreSQL is the only major, solid (i.e. excluding MySQL) RDBMS which does not offer some sort of intraquery parallelism, and when running queries across very large databases, it might be helpful to be able to, say, scan different partitions simultaneously using different threads. So I think it is wrong to simply dismiss the need out of hand. The thing though is that I am not sure that where this need really comes to the fore, it is typical of single-server instances, and so this brings me to the bigger question.
> >
> > The question in my mind though is a more basic one: How should intraquery parallelism be handled? Is it something PostgreSQL needs to do or is it something that should be the work of an external project like Postgres-XC? Down the road is there value in merging the codebases, perhaps making stand-alone/data/coordination node a compile time option?
>
> I still don't think threads are the solution for this scenario. You can do intraquery parallelism with multiple processes more easily and safely than with multiple threads. You launch a process with the whole query; it divides the work into chunks and assigns them to different processes instead of threads. You can use shared resources for communication between the processes. When all the work is done, they pass their results to the original process, which joins them. The main advantage of doing it with processes is that if one of the child processes dies, it can be killed and relaunched without any damage to the work of its siblings, but if you use threads, the whole process and all the work done is lost.

Well, I am assuming that when anything regarding a query crashes, the work for that query should be lost so I don't see that as a big issue provided that you still have one process per session.

The larger issue would be rewriting the backend so that this is safe, and it would complicate QA. For this reason, I assume for now that this is not the way to go.

> That's not the only advantage of using processes over threads. Some years ago, one of the problems on multi-socket servers was shared memory and communication between the sockets. The inter-CPU speed was too low and the latency too high. Now we have multiple CPUs in one socket and faster inter-socket communication, and this is no longer a problem. Even better, the speed and latency of communication between two or more servers (not sockets or CPUs) is reaching levels where PostgreSQL could have shared memory between them, for example using HyperTransport cards or modern FC, and it's easier, a lot easier, to launch a remote process than a remote thread.

But this gets back to my question: are there significant use cases where intraquery parallelism makes sense where clustering across servers does not? The reason I ask is that if there are not, then the work that's going into Postgres-XC would get us there entirely, in a multi-process (single-threaded), two tiered, network transparent model that would potentially scale up well.

Best Wishes,

Chris Travers

Re: Don't Thread On Me (PostgreSQL related)

From

Rodrigo E. De León Plicet

Date:

07 February 2012, 00:43:57

On Jan 26, 4:52 pm, Rodrigo E. De León Plicet <rdele...@gmail.com>
wrote:
> Quote:
>
> ======================================================================
>
> This thread
>
> http://postgresql.1045698.n5.nabble.com/Multithread-Query-Planner-td5...
>
> was mentioned in a performance sub-group posting. Give it a read.
>
> Back? It means, so far as I can see, that PG is toast. It will fall
> down to being the cheap and dirty alternative to MySql, which even
> has, at least two, multi-threaded engines. DB2 switched it's *nix
> engine to threads from processes with release 9.5. Oracle claims it
> for releases going back to 7 (I haven't tried to determine which parts
> or applications; Larry has bought so many tchochtkes over the
> years...). SQL Server is threaded.
>
> Given that cpu's are breeding threads faster than cores,
> PG will fall into irrelevance.
>
> ======================================================================
>
> Source:http://drcoddwasright.blogspot.com/2012/01/dont-thread-on-me.html
>
> Comments?


Author's followup:

http://drcoddwasright.blogspot.com/2012/02/damn-you-damocles.html

Re: Don't Thread On Me (PostgreSQL related)

From

John R Pierce

Date:

07 February 2012, 01:05:38

On 02/03/12 5:53 PM, Rodrigo E. De León Plicet wrote:
> Author's followup:
>
> http://drcoddwasright.blogspot.com/2012/02/damn-you-damocles.html

his links hardly seem related to his proclamations.



--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast

Re: Don't Thread On Me (PostgreSQL related)

From

Chris Travers

Date:

07 February 2012, 01:56:05

My reply is at:
http://ledgersmbdev.blogspot.com/2012/02/robert-young-is-wrong-about-threads-and.html

On Mon, Feb 6, 2012 at 9:05 PM, John R Pierce <pierce@hogranch.com> wrote:

> On 02/03/12 5:53 PM, Rodrigo E. De León Plicet wrote:
> > Author's followup:
> >
> > http://drcoddwasright.blogspot.com/2012/02/damn-you-damocles.html
>
> his links hardly seem related to his proclamations.
>
> --
> john r pierce N 37, W 122
> santa cruz ca mid-left coast

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: Don't Thread On Me (PostgreSQL related)

From

Achilleas Mantzios

Date:

07 February 2012, 13:05:54

On Tue 07 Feb 2012 07:05:00 John R Pierce wrote:
> On 02/03/12 5:53 PM, Rodrigo E. De León Plicet wrote:
> > Author's followup:
> >
> > http://drcoddwasright.blogspot.com/2012/02/damn-you-damocles.html
>
> his links hardly seem related to his proclamations.

From the guy's blog:

"Shameless Plug
 I know a little something about SQL, DB2 (preferably LUW, z/OS if the money's
right), Oracle, SQL Server, Postgres, and database design in general. ...."

z/OS? It is apparent that this person has a mainframe background, and the rest
of the keywords he mentions are just buzzwords he has little idea about...

Believe me, I was an IBM MVS sysprog (10+ years ago) and struggled to get
back into the BSD/Linux world. One cannot think Unix and mainframe at the
same time; the concepts are so different that they are nearly mutually exclusive.

--
Achilleas Mantzios
IT DEPT

Re: Don't Thread On Me (PostgreSQL related)

From

Vincent Veyron

Date:

08 February 2012, 18:40:27

On Tuesday 07 February 2012 at 10:30 +0200, Achilleas Mantzios wrote:
> On Tue 07 Feb 2012 07:05:00 John R Pierce wrote:
> > On 02/03/12 5:53 PM, Rodrigo E. De León Plicet wrote:
> > > Author's followup:
> > >
> > > http://drcoddwasright.blogspot.com/2012/02/damn-you-damocles.html
> >
> > his links hardly seem related to his proclamations.
>
> From the guy's blog:

[...]

I am not sure one should waste too much time arguing with the guy?

He is the one who started this thread on June 23

http://archives.beccati.org/pgsql-general/message/18294

in much the same way as this one.

Please have a look at this post by the author/db expert referenced in
the article:

http://craigglendenning.blogspot.com/2009/03/i-fight-to-stay-focusedand-often-lose.html


--
Vincent Veyron
http://marica.fr/
Claims and litigation management software for legal departments