Thread: Performance comparison between Postgres and Greenplum

Performance comparison between Postgres and Greenplum

From
Suvankar Roy
Date:

Hi,

I have some 99,000 records in a table (OBSERVATION_ALL) in a Postgres DB as well as a Greenplum DB.

The Primary key is a composite one comprising of 2 columns (so_no, serial_no).

The execution of the following query takes 8214.016 ms in Greenplum but only 729.134 ms in Postgres.
select * from observation_all order by so_no, serial_no;

I believe that execution time in greenplum should be less compared to postgres. Can anybody throw some light, it would be of great help.


Regards,

Suvankar Roy

=====-----=====-----=====
Notice: The information contained in this e-mail
message and/or attachments to it may contain 
confidential or privileged information. If you are 
not the intended recipient, any dissemination, use, 
review, distribution, printing or copying of the 
information contained in this e-mail message 
and/or attachments to it are strictly prohibited. If 
you have received this communication in error, 
please notify us by reply e-mail or telephone and 
immediately and permanently delete the message 
and any attachments. Thank you


Re: Performance comparison between Postgres and Greenplum

From
Scott Marlowe
Date:
On Mon, Jul 13, 2009 at 5:23 AM, Suvankar Roy<suvankar.roy@tcs.com> wrote:
>
> Hi,
>
> I have some 99,000 records in a table (OBSERVATION_ALL) in a Postgres DB as
> well as a Greenplum DB.
>
> The Primary key is a composite one comprising of 2 columns (so_no,
> serial_no).
>
> The execution of the following query takes 8214.016 ms in Greenplum but only
> 729.134 ms in Postgres.
> select * from observation_all order by so_no, serial_no;
>
> I believe that execution time in greenplum should be less compared to
> postgres. Can anybody throw some light, it would be of great help.

What versions are you comparing?

Re: Performance comparison between Postgres and Greenplum

From
Scott Marlowe
Date:
On Tue, Jul 14, 2009 at 11:33 PM, Suvankar Roy<suvankar.roy@tcs.com> wrote:
>
> Hi Scott,
>
> This is what I have got -
> In Greenplum, version PostgreSQL 8.2.13 (Greenplum Database 3.3.0.1 build 4) on
> i686-pc-linux-gnu, compiled by GCC gcc (GCC)

> In Postgres, version PostgreSQL 8.3.7, compiled by Visual C++ build 1400
> (1 row)

I wouldn't expect 8.2.x to outrun 8.3.x

Re: Performance comparison between Postgres and Greenplum

From
Alex Goncharov
Date:
,--- You/Suvankar (Mon, 13 Jul 2009 16:53:41 +0530) ----*
| I have some 99,000 records in a table (OBSERVATION_ALL) in a Postgres DB
| as well as a Greenplum DB.
|
| The Primary key is a composite one comprising of 2 columns (so_no,
| serial_no).
|
| The execution of the following query takes 8214.016 ms in Greenplum but
| only 729.134 ms in Postgres.
| select * from observation_all order by so_no, serial_no;
|
| I believe that execution time in greenplum should be less compared to
| postgres. Can anybody throw some light, it would be of great help.

Why do you believe so?

Is your data distributed and served by separate segment hosts?  By how
many?  Is the network connectivity not a factor?  What happens with
the times if you don't sort your result set?

-- Alex -- alex-goncharov@comcast.net --


Re: Performance comparison between Postgres and Greenplum

From
Alex Goncharov
Date:
,--- You/Suvankar (Wed, 15 Jul 2009 18:32:12 +0530) ----*
| Yes, I have got 2 segments and a master host. So, in a way processing
| should be faster in Greenplum.

No, it should not: it all depends on your data, SQL statements and
setup.

In my own experiments, with small amounts of stored data, PostgreSQL
beats Greenplum, which doesn't surprise me a bit.

You need to know where most of the execution time goes -- maybe to
sorting?  And sorting in Greenplum, isn't it done on one machine, the
master host?  Why would that be faster than in PostgreSQL?
|
| For other queries though, results are satisfactory or at least comparable,
| like-
|
| select distinct so_no, serial_no from observation_all;
| in postgres it takes - 1404.238 ms
| in gp it takes - 1217.283 ms

No surprise here: the data is picked by multiple segment hosts and
never sorted on the master.

-- Alex -- alex-goncharov@comcast.net --


Re: Performance comparison between Postgres and Greenplum

From
Scott Mead
Date:

On Wed, Jul 15, 2009 at 9:18 AM, Alex Goncharov <alex-goncharov@comcast.net> wrote:
,--- You/Suvankar (Wed, 15 Jul 2009 18:32:12 +0530) ----*
| Yes, I have got 2 segments and a master host. So, in a way processing
| should be faster in Greenplum.

No, it should not: it all depends on your data, SQL statements and
setup.

In my own experiments, with small amounts of stored data, PostgreSQL
beats Greenplum, which doesn't surprise me a bit.

Agreed.  You're only operating on 99,000 rows.  That isn't really enough rows to exercise the architecture of shared-nothing clusters.  Now, I don't know greenplum very well, but I am familiar with another warehousing product with approximately the same architecture behind it.  From all the testing I've done, you need to get into the 50 million plus row range before the architecture starts to be really effective.  99,000 rows probably fits completely into memory on the machine that you're testing PG with, so your test really isn't fair.  On one PG box, you're just doing memory reads, and maybe some high-speed disk access, on the Greenplum setup, you've got network overhead on top of all that.  Bottom line: You need to do a test with a number of rows that won't fit into memory, and won't be very quickly scanned from disk into memory.  You need a LOT of data.

--Scott

Re: Performance comparison between Postgres and Greenplum

From
Greg Smith
Date:
On Mon, 13 Jul 2009, Suvankar Roy wrote:

> I believe that execution time in greenplum should be less compared to postgres.

Well, first off you don't even mention which PostgreSQL or Greenplum
version you're comparing, which leaves a lot of variables we can't account
for.  Second, you'd need to make sure that the two servers had as close to
identical server parameter configurations as possible to get a fair
comparison (the postgresql.conf file).  Next, you need to make sure the
data has been loaded and analyzed similarly on the two--try using "VACUUM
ANALYZE" on both systems before running your query, then "EXPLAIN ANALYZE"
on both setups to get an idea if they're using the same plan to pull data
from the disk, you may discover there's a radical different there.

...and even if you did all that, this still wouldn't be the right place to
ask about Greenplum's database product.  You'll end up with everyone mad
at you.  Nobody likes have benchmarks that show their product in a bad
light published, particularly if they aren't completely fair.  And this
list is dedicated to talking about the open-source PostgreSQL versions.
Your question would be more appropriate to throw in Greenplum's direction.
The list I gave above is by no means even comprehensive--there are plenty
of other ways you can end up doing an unfair comparison here (using
different paritions on the same disk which usually end up with different
speeds comes to mind).

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Performance comparison between Postgres and Greenplum

From
Greg Smith
Date:
On Wed, 15 Jul 2009, Scott Marlowe wrote:

> On Tue, Jul 14, 2009 at 11:33 PM, Suvankar Roy<suvankar.roy@tcs.com> wrote:
>>
>> Hi Scott,
>>
>> This is what I have got -
>> In Greenplum, version PostgreSQL 8.2.13 (Greenplum Database 3.3.0.1 build 4) on
>> i686-pc-linux-gnu, compiled by GCC gcc (GCC)
>
>> In Postgres, version PostgreSQL 8.3.7, compiled by Visual C++ build 1400
>> (1 row)
>
> I wouldn't expect 8.2.x to outrun 8.3.x

And you can't directly compare performance of a system running Linux with
one running Windows, even if they're the same hardware.  Theoretically,
Linux should have an advantage, but only if you're accounting for a whole
stack of other variables.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Performance comparison between Postgres and Greenplum

From
Suvankar Roy
Date:

Hi Scott,

This is what I have got -

In Greenplum, the following query returns:

test_db1=# select version();
                                                                                  version                                    
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.2.13 (Greenplum Database 3.3.0.1 build 4) on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44) compiled on Jun  4 2009 16:30:49
(1 row)


In Postgres, the same query returns:

postgres=# select version();
                       version
-----------------------------------------------------
 PostgreSQL 8.3.7, compiled by Visual C++ build 1400
(1 row)

Regards,

Suvankar Roy
Tata Consultancy Services
Ph:- +91 33 66367352
Cell:- +91 9434666898



Scott Marlowe <scott.marlowe@gmail.com>

07/15/2009 09:10 AM

To
Suvankar Roy <suvankar.roy@tcs.com>
cc
pgsql-performance@postgresql.org
Subject
Re: [PERFORM] Performance comparison between Postgres and Greenplum





On Mon, Jul 13, 2009 at 5:23 AM, Suvankar Roy<suvankar.roy@tcs.com> wrote:
>
> Hi,
>
> I have some 99,000 records in a table (OBSERVATION_ALL) in a Postgres DB as
> well as a Greenplum DB.
>
> The Primary key is a composite one comprising of 2 columns (so_no,
> serial_no).
>
> The execution of the following query takes 8214.016 ms in Greenplum but only
> 729.134 ms in Postgres.
> select * from observation_all order by so_no, serial_no;
>
> I believe that execution time in greenplum should be less compared to
> postgres. Can anybody throw some light, it would be of great help.

What versions are you comparing?

ForwardSourceID:NT00004AAE    
=====-----=====-----=====
Notice: The information contained in this e-mail
message and/or attachments to it may contain 
confidential or privileged information. If you are 
not the intended recipient, any dissemination, use, 
review, distribution, printing or copying of the 
information contained in this e-mail message 
and/or attachments to it are strictly prohibited. If 
you have received this communication in error, 
please notify us by reply e-mail or telephone and 
immediately and permanently delete the message 
and any attachments. Thank you


Re: Performance comparison between Postgres and Greenplum

From
Suvankar Roy
Date:

Hi Scott,

Thanks for your input Scott.

But, then being a Massively Parallel Processing Database, is Greenplum not expected to outperform versions of Postgres higher than on which it is based.

My notion was that GP 3.3 (based on PostgreSQL 8.2.13) would exceed PG 8.3.7.

It seems that I was wrong here.

Regards,

Suvankar Roy



Scott Marlowe <scott.marlowe@gmail.com>

07/15/2009 03:00 PM

To
Suvankar Roy <suvankar.roy@tcs.com>
cc
pgsql-performance@postgresql.org
Subject
Re: [PERFORM] Performance comparison between Postgres and Greenplum





On Tue, Jul 14, 2009 at 11:33 PM, Suvankar Roy<suvankar.roy@tcs.com> wrote:
>
> Hi Scott,
>
> This is what I have got -
> In Greenplum, version PostgreSQL 8.2.13 (Greenplum Database 3.3.0.1 build 4) on
> i686-pc-linux-gnu, compiled by GCC gcc (GCC)

> In Postgres, version PostgreSQL 8.3.7, compiled by Visual C++ build 1400
> (1 row)

I wouldn't expect 8.2.x to outrun 8.3.x

ForwardSourceID:NT00004AD2    
=====-----=====-----=====
Notice: The information contained in this e-mail
message and/or attachments to it may contain 
confidential or privileged information. If you are 
not the intended recipient, any dissemination, use, 
review, distribution, printing or copying of the 
information contained in this e-mail message 
and/or attachments to it are strictly prohibited. If 
you have received this communication in error, 
please notify us by reply e-mail or telephone and 
immediately and permanently delete the message 
and any attachments. Thank you


Re: Performance comparison between Postgres and Greenplum

From
Suvankar Roy
Date:

Hi Alex,

Yes, I have got 2 segments and a master host. So, in a way processing should be faster in Greenplum.

Actually this is only a sort of Proof of Concept trial that I am carrying out to notice differences between greenplum and postgres, if any.

For other queries though, results are satisfactory or at least comparable, like-

select distinct so_no, serial_no from observation_all;
in postgres it takes - 1404.238 ms
in gp it takes - 1217.283 ms


Regards,

Suvankar Roy


Alex Goncharov <alex-goncharov@comcast.net>

07/15/2009 06:07 PM

Please respond to
Alex Goncharov <alex-goncharov@comcast.net>

To
Suvankar Roy <suvankar.roy@tcs.com>
cc
pgsql-performance@postgresql.org
Subject
Re: [PERFORM] Performance comparison between Postgres and Greenplum





,--- You/Suvankar (Mon, 13 Jul 2009 16:53:41 +0530) ----*
| I have some 99,000 records in a table (OBSERVATION_ALL) in a Postgres DB
| as well as a Greenplum DB.
|
| The Primary key is a composite one comprising of 2 columns (so_no,
| serial_no).
|
| The execution of the following query takes 8214.016 ms in Greenplum but
| only 729.134 ms in Postgres.
| select * from observation_all order by so_no, serial_no;
|
| I believe that execution time in greenplum should be less compared to
| postgres. Can anybody throw some light, it would be of great help.

Why do you believe so?

Is your data distributed and served by separate segment hosts?  By how
many?  Is the network connectivity not a factor?  What happens with
the times if you don't sort your result set?

-- Alex -- alex-goncharov@comcast.net --


ForwardSourceID:NT00004AF2    
=====-----=====-----=====
Notice: The information contained in this e-mail
message and/or attachments to it may contain 
confidential or privileged information. If you are 
not the intended recipient, any dissemination, use, 
review, distribution, printing or copying of the 
information contained in this e-mail message 
and/or attachments to it are strictly prohibited. If 
you have received this communication in error, 
please notify us by reply e-mail or telephone and 
immediately and permanently delete the message 
and any attachments. Thank you


Re: Performance comparison between Postgres and Greenplum

From
Scott Marlowe
Date:
On Wed, Jul 15, 2009 at 7:02 AM, Suvankar Roy<suvankar.roy@tcs.com> wrote:
>
> Hi Alex,
>
> Yes, I have got 2 segments and a master host. So, in a way processing should
> be faster in Greenplum.
>
> Actually this is only a sort of Proof of Concept trial that I am carrying
> out to notice differences between greenplum and postgres, if any.

You're definitely gonna want more data to test with.  I run regular
vanilla pgsql for stats at work, and we average 0.8M to 2M rows of
stats every day.  We keep them for up to two years.  So, when we reach
our max of two years, we're talking somewhere in the range of a
billion rows to mess about with.

During a not so busy day, the 99,000th row entered into stats for
happens at about 3am.  Once they're loaded into memory it takes 435 ms
to access those 99k rows.

Start testing in the millions, at a minimum.  Hundreds of millions is
more likely to start showing a difference.