Thread: Performance comparison between Postgres and Greenplum
Hi,
I have some 99,000 records in a table (OBSERVATION_ALL) in a Postgres DB as well as a Greenplum DB.
The Primary key is a composite one comprising of 2 columns (so_no, serial_no).
The execution of the following query takes 8214.016 ms in Greenplum but only 729.134 ms in Postgres.
select * from observation_all order by so_no, serial_no;
I believe that execution time in greenplum should be less compared to postgres. Can anybody throw some light, it would be of great help.
Regards,
Suvankar Roy
=====-----=====-----===== Notice: The information contained in this e-mail message and/or attachments to it may contain confidential or privileged information. If you are not the intended recipient, any dissemination, use, review, distribution, printing or copying of the information contained in this e-mail message and/or attachments to it are strictly prohibited. If you have received this communication in error, please notify us by reply e-mail or telephone and immediately and permanently delete the message and any attachments. Thank you
On Mon, Jul 13, 2009 at 5:23 AM, Suvankar Roy<suvankar.roy@tcs.com> wrote: > > Hi, > > I have some 99,000 records in a table (OBSERVATION_ALL) in a Postgres DB as > well as a Greenplum DB. > > The Primary key is a composite one comprising of 2 columns (so_no, > serial_no). > > The execution of the following query takes 8214.016 ms in Greenplum but only > 729.134 ms in Postgres. > select * from observation_all order by so_no, serial_no; > > I believe that execution time in greenplum should be less compared to > postgres. Can anybody throw some light, it would be of great help. What versions are you comparing?
On Tue, Jul 14, 2009 at 11:33 PM, Suvankar Roy<suvankar.roy@tcs.com> wrote: > > Hi Scott, > > This is what I have got - > In Greenplum, version PostgreSQL 8.2.13 (Greenplum Database 3.3.0.1 build 4) on > i686-pc-linux-gnu, compiled by GCC gcc (GCC) > In Postgres, version PostgreSQL 8.3.7, compiled by Visual C++ build 1400 > (1 row) I wouldn't expect 8.2.x to outrun 8.3.x
,--- You/Suvankar (Mon, 13 Jul 2009 16:53:41 +0530) ----* | I have some 99,000 records in a table (OBSERVATION_ALL) in a Postgres DB | as well as a Greenplum DB. | | The Primary key is a composite one comprising of 2 columns (so_no, | serial_no). | | The execution of the following query takes 8214.016 ms in Greenplum but | only 729.134 ms in Postgres. | select * from observation_all order by so_no, serial_no; | | I believe that execution time in greenplum should be less compared to | postgres. Can anybody throw some light, it would be of great help. Why do you believe so? Is your data distributed and served by separate segment hosts? By how many? Is the network connectivity not a factor? What happens with the times if you don't sort your result set? -- Alex -- alex-goncharov@comcast.net --
,--- You/Suvankar (Wed, 15 Jul 2009 18:32:12 +0530) ----* | Yes, I have got 2 segments and a master host. So, in a way processing | should be faster in Greenplum. No, it should not: it all depends on your data, SQL statements and setup. In my own experiments, with small amounts of stored data, PostgreSQL beats Greenplum, which doesn't surprise me a bit. You need to know where most of the execution time goes -- maybe to sorting? And sorting in Greenplum, isn't it done on one machine, the master host? Why would that be faster than in PostgreSQL? | | For other queries though, results are satisfactory or at least comparable, | like- | | select distinct so_no, serial_no from observation_all; | in postgres it takes - 1404.238 ms | in gp it takes - 1217.283 ms No surprise here: the data is picked by multiple segment hosts and never sorted on the master. -- Alex -- alex-goncharov@comcast.net --
On Wed, Jul 15, 2009 at 9:18 AM, Alex Goncharov <alex-goncharov@comcast.net> wrote:
,--- You/Suvankar (Wed, 15 Jul 2009 18:32:12 +0530) ----*
| Yes, I have got 2 segments and a master host. So, in a way processing
| should be faster in Greenplum.
No, it should not: it all depends on your data, SQL statements and
setup.
In my own experiments, with small amounts of stored data, PostgreSQL
beats Greenplum, which doesn't surprise me a bit.
Agreed. You're only operating on 99,000 rows. That isn't really enough rows to exercise the architecture of shared-nothing clusters. Now, I don't know greenplum very well, but I am familiar with another warehousing product with approximately the same architecture behind it. From all the testing I've done, you need to get into the 50 million plus row range before the architecture starts to be really effective. 99,000 rows probably fits completely into memory on the machine that you're testing PG with, so your test really isn't fair. On one PG box, you're just doing memory reads, and maybe some high-speed disk access, on the Greenplum setup, you've got network overhead on top of all that. Bottom line: You need to do a test with a number of rows that won't fit into memory, and won't be very quickly scanned from disk into memory. You need a LOT of data.
--Scott
On Mon, 13 Jul 2009, Suvankar Roy wrote: > I believe that execution time in greenplum should be less compared to postgres. Well, first off you don't even mention which PostgreSQL or Greenplum version you're comparing, which leaves a lot of variables we can't account for. Second, you'd need to make sure that the two servers had as close to identical server parameter configurations as possible to get a fair comparison (the postgresql.conf file). Next, you need to make sure the data has been loaded and analyzed similarly on the two--try using "VACUUM ANALYZE" on both systems before running your query, then "EXPLAIN ANALYZE" on both setups to get an idea if they're using the same plan to pull data from the disk, you may discover there's a radical different there. ...and even if you did all that, this still wouldn't be the right place to ask about Greenplum's database product. You'll end up with everyone mad at you. Nobody likes have benchmarks that show their product in a bad light published, particularly if they aren't completely fair. And this list is dedicated to talking about the open-source PostgreSQL versions. Your question would be more appropriate to throw in Greenplum's direction. The list I gave above is by no means even comprehensive--there are plenty of other ways you can end up doing an unfair comparison here (using different paritions on the same disk which usually end up with different speeds comes to mind). -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Wed, 15 Jul 2009, Scott Marlowe wrote: > On Tue, Jul 14, 2009 at 11:33 PM, Suvankar Roy<suvankar.roy@tcs.com> wrote: >> >> Hi Scott, >> >> This is what I have got - >> In Greenplum, version PostgreSQL 8.2.13 (Greenplum Database 3.3.0.1 build 4) on >> i686-pc-linux-gnu, compiled by GCC gcc (GCC) > >> In Postgres, version PostgreSQL 8.3.7, compiled by Visual C++ build 1400 >> (1 row) > > I wouldn't expect 8.2.x to outrun 8.3.x And you can't directly compare performance of a system running Linux with one running Windows, even if they're the same hardware. Theoretically, Linux should have an advantage, but only if you're accounting for a whole stack of other variables. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
Hi Scott,
This is what I have got -
In Greenplum, the following query returns:
test_db1=# select version();
version
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 8.2.13 (Greenplum Database 3.3.0.1 build 4) on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44) compiled on Jun 4 2009 16:30:49
(1 row)
In Postgres, the same query returns:
postgres=# select version();
version
-----------------------------------------------------
PostgreSQL 8.3.7, compiled by Visual C++ build 1400
(1 row)
Regards,
Suvankar Roy
Tata Consultancy Services
Ph:- +91 33 66367352
Cell:- +91 9434666898
Scott Marlowe <scott.marlowe@gmail.com> 07/15/2009 09:10 AM |
|
On Mon, Jul 13, 2009 at 5:23 AM, Suvankar Roy<suvankar.roy@tcs.com> wrote:
>
> Hi,
>
> I have some 99,000 records in a table (OBSERVATION_ALL) in a Postgres DB as
> well as a Greenplum DB.
>
> The Primary key is a composite one comprising of 2 columns (so_no,
> serial_no).
>
> The execution of the following query takes 8214.016 ms in Greenplum but only
> 729.134 ms in Postgres.
> select * from observation_all order by so_no, serial_no;
>
> I believe that execution time in greenplum should be less compared to
> postgres. Can anybody throw some light, it would be of great help.
What versions are you comparing?
ForwardSourceID:NT00004AAE
=====-----=====-----===== Notice: The information contained in this e-mail message and/or attachments to it may contain confidential or privileged information. If you are not the intended recipient, any dissemination, use, review, distribution, printing or copying of the information contained in this e-mail message and/or attachments to it are strictly prohibited. If you have received this communication in error, please notify us by reply e-mail or telephone and immediately and permanently delete the message and any attachments. Thank you
Hi Scott,
Thanks for your input Scott.
But, then being a Massively Parallel Processing Database, is Greenplum not expected to outperform versions of Postgres higher than on which it is based.
My notion was that GP 3.3 (based on PostgreSQL 8.2.13) would exceed PG 8.3.7.
It seems that I was wrong here.
Regards,
Suvankar Roy
Scott Marlowe <scott.marlowe@gmail.com> 07/15/2009 03:00 PM |
|
On Tue, Jul 14, 2009 at 11:33 PM, Suvankar Roy<suvankar.roy@tcs.com> wrote:
>
> Hi Scott,
>
> This is what I have got -
> In Greenplum, version PostgreSQL 8.2.13 (Greenplum Database 3.3.0.1 build 4) on
> i686-pc-linux-gnu, compiled by GCC gcc (GCC)
> In Postgres, version PostgreSQL 8.3.7, compiled by Visual C++ build 1400
> (1 row)
I wouldn't expect 8.2.x to outrun 8.3.x
ForwardSourceID:NT00004AD2
=====-----=====-----===== Notice: The information contained in this e-mail message and/or attachments to it may contain confidential or privileged information. If you are not the intended recipient, any dissemination, use, review, distribution, printing or copying of the information contained in this e-mail message and/or attachments to it are strictly prohibited. If you have received this communication in error, please notify us by reply e-mail or telephone and immediately and permanently delete the message and any attachments. Thank you
Hi Alex,
Yes, I have got 2 segments and a master host. So, in a way processing should be faster in Greenplum.
Actually this is only a sort of Proof of Concept trial that I am carrying out to notice differences between greenplum and postgres, if any.
For other queries though, results are satisfactory or at least comparable, like-
select distinct so_no, serial_no from observation_all;
in postgres it takes - 1404.238 ms
in gp it takes - 1217.283 ms
Regards,
Suvankar Roy
Alex Goncharov <alex-goncharov@comcast.net> 07/15/2009 06:07 PM
|
|
,--- You/Suvankar (Mon, 13 Jul 2009 16:53:41 +0530) ----*
| I have some 99,000 records in a table (OBSERVATION_ALL) in a Postgres DB
| as well as a Greenplum DB.
|
| The Primary key is a composite one comprising of 2 columns (so_no,
| serial_no).
|
| The execution of the following query takes 8214.016 ms in Greenplum but
| only 729.134 ms in Postgres.
| select * from observation_all order by so_no, serial_no;
|
| I believe that execution time in greenplum should be less compared to
| postgres. Can anybody throw some light, it would be of great help.
Why do you believe so?
Is your data distributed and served by separate segment hosts? By how
many? Is the network connectivity not a factor? What happens with
the times if you don't sort your result set?
-- Alex -- alex-goncharov@comcast.net --
ForwardSourceID:NT00004AF2
=====-----=====-----===== Notice: The information contained in this e-mail message and/or attachments to it may contain confidential or privileged information. If you are not the intended recipient, any dissemination, use, review, distribution, printing or copying of the information contained in this e-mail message and/or attachments to it are strictly prohibited. If you have received this communication in error, please notify us by reply e-mail or telephone and immediately and permanently delete the message and any attachments. Thank you
On Wed, Jul 15, 2009 at 7:02 AM, Suvankar Roy<suvankar.roy@tcs.com> wrote: > > Hi Alex, > > Yes, I have got 2 segments and a master host. So, in a way processing should > be faster in Greenplum. > > Actually this is only a sort of Proof of Concept trial that I am carrying > out to notice differences between greenplum and postgres, if any. You're definitely gonna want more data to test with. I run regular vanilla pgsql for stats at work, and we average 0.8M to 2M rows of stats every day. We keep them for up to two years. So, when we reach our max of two years, we're talking somewhere in the range of a billion rows to mess about with. During a not so busy day, the 99,000th row entered into stats for happens at about 3am. Once they're loaded into memory it takes 435 ms to access those 99k rows. Start testing in the millions, at a minimum. Hundreds of millions is more likely to start showing a difference.