Thread: Re: [GENERAL] Benchmarks

Re: [GENERAL] Benchmarks

From
"Differentiated Software Solutions Pvt. Ltd."
Date:
Hi,

It was good that I chanced upon your mail.

I'm currently implementing a very high-performance, high-reliability
application using Postgres.
I'm risking lots of debate and criticism with what I have to say. This is
based on practical experience. I have decent experience using DB2 and
reasonable experience using Oracle and SQL Server.

My opinion is as follows.
I wouldn't look at most benchmarking numbers for the app you are
programming. We are currently facing some performance issues which I'm
sorting out. I was involved in a major benchmarking exercise about 3 years
back where we did exactly what you are looking for.... Results: Oracle and
Sybase and, of all surprises, even Ingres (which is now extinct) delivered
results which were quite indistinguishable. My experience shows that if you
want to extract the juice out of the platform, it all lies in the DB design
and coding!! It has been repeatedly proved with every DB I have worked on,
including Postgres.

Having said that... let me stick my neck out and list some of the problems
we have been facing with Postgres:
a) When faced with very heavy transaction loads, Postgres may crash!! With
great effort, about 3 weeks of full-time work on my part, and thanks to
the pg user community, I hope to have fixed it. I'm still awaiting
further testing before I write to this mailing list all the details of
what we went through.
b) There are quite a few things which you'd take for granted in other DBs
which Postgres does not have. Quite late in the day I was shocked to find
that Postgres does not have roll-forward transaction logging. It has
multi-version concurrency control and rollback support, but you can't restore
a backup and replay the logs!!
c) Another issue which rankles me a lot is that Postgres maintains different
tables in different files. I remember lots of debate on this issue between
1993 and '95, and most DBMSs have settled on keeping all tables within
their own file structure... presumably to take over buffering and other
mechanisms from the OS themselves. I have even read literature where
Oracle and DB2 allow you to create DBs on raw UNIX partitions. I'm not sure
whether those are still used.
d) The Postgres manual recommends a nightly vacuum. I read this also a bit
late. This is the equivalent of a database rebuild. While it is in progress,
all other clients wait for vacuum to release its locks. This is a real
handicap for a 24x7 app.

Also, please... consider DB2. It is vastly underrated. On the mainframe
it is just fantastic, and I would extrapolate from that experience. It runs
about six months behind Oracle in features... but I have found it to
be extremely reliable at a very competitive price. I just hope they have not
screwed it up on Linux.

At the same time... Postgres would be a great choice when you have
a) hardware resource constraints
b) less money to buy software
c) an app that is not mission-critical and 24x7.

Bye,

Murali

-----Original Message-----
From: Michael Cornelison <mcornel@magnify.com>
To: pgsql-general@hub.org <pgsql-general@hub.org>
Date: 05 January 2000 13:18
Subject: [GENERAL] Benchmarks


>I don't know if this is an already talked about issue. If it is I apologize
>in advance.
>
>I'm getting ready to gear up for a major database project.
>I am considering pgsql and Oracle on an Intel/Linux platform.
>I did see the comparison of the different features of all the databases, but
>that is not exactly what I need.
>I was really wondering if there are any existing benchmarks between the
>two, like the number of selected records/second for a narrow table (10-20
>columns) as well as for a wide table (150+ columns).
>I would also be interested in local vs. networked client benchmarks.
>I would also be interested in the speed of the inserts on similar tables.
>
>My project is going to have very heavy traffic (some web, some data entry,
>and some real time data) and performance is going to be key.
>
>I just wanted to see if anyone has done anything like this before I attempt
>it. It would help minimize some of my work and/or reinforce any of my
>findings.
>
>Thanks
>Michael Cornelison


Re: [GENERAL] Benchmarks

From
The Hermit Hacker
Date:
On Thu, 6 Jan 2000, Differentiated Software Solutions Pvt. Ltd. wrote:

> b) There are quite a few things which you'd take for granted in other DBs
> which postgres does not have. Quite late in the day I was shocked to find
> that postgres does not have roll-forward transaction logging. They have
> Multi-version concurrency control and rollback support but you can't restore
> a backup and play logs !!

Being worked on...as memory serves, it should be in v7.1, as it's part of the
WAL work that Vadim is doing...schedule: this summer

> c) Another issue which rankles me a lot is that postgres maintains different
> tables in different files. I remember lots of debate on this issue between
> 1993 and 95 and most DBMS's have settled down into having all tables within
> their own file structure... presumably to take over buffering and other
> mechanisms from the OS onto themselves. I have even read literature where
> Oracle and DB2 allow you to create DBs on raw UNIX partitions. I'm not sure
> whether they are still used.

There are several schools of thought on this one...I've read documentation
from Oracle that recommends against doing this, since the operating system
itself, in most cases, will do a much better job of disk I/O...there are
generally ways of tweaking your OS to increase various things like disk
buffering and what not...

> d) Postgres manual recommends a nightly vacuum. I read this also a bit
> late. This is the equivalent of a database rebuild. While this is in
> progress, all other clients wait for vacuum to release its locks. This is
> really a handicap for a 24x7 app.

rebuild?  first off, vacuum is a 'cleanup tool'...it goes through and
removes unwanted/junk/deleted records from the database...if you do a lot
of update/delete operations, then, yes, do a vacuum reasonably
often.  nightly though?  depends on how much your data changes.  a vacuum
gives you nothing really on a database that is purely insert data.

actually, there has been talk about, and work started on, getting rid of
the 'locking' issue...with the MVCC code, the lock *shouldn't* be required
anymore, but Tom Lane(?) did some preliminary work on removing this for
v7.0 and ran into a few roadblocks, especially where there was the risk of
simultaneous vacuums happening...wasn't a pretty sight, if memory
serves...

> At the same time.... postgres would be a great choice when you have
> a) hardware  resource constraints
> b) Less money to buy software
> c) your app is not mission critical and 24x7.

I love these two words "mission critical"...I run several applications
that clients consider to be mission critical, the least of which are two
accounting systems for dial-up ISPs...about the only time their databases
go down is when we upgrade or have to reboot the physical machine...


Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org


Re: [GENERAL] Benchmarks

From
Dustin Sallings
Date:
On Thu, 6 Jan 2000, The Hermit Hacker wrote:

# > d) Postgres manual recommends a nightly vacuum. I read this also a bit
# > late. This is the equivalent of a database rebuild. While this is in
# > progress, all other clients wait for vacuum to release its locks. This is
# > really a handicap for a 24x7 app.
#
# rebuild?  first off, vacuum is a 'cleanup tool'...it goes through and
# removes unwanted/junk/deleted records from the database...if you do
# a lot of update/delete operations, then, yes, do a vacuum reasonably
# often.  nightly though?  depends on how much your data changes.  a
# vacuum gives you nothing really on a database that is purely insert
# data.

    Untrue, vacuum is *extremely* important for updating statistics.
If you have a lot of data in a table and you have never vacuumed, you
might as well not have any indices.  It'd be nice if you could separate
the stats update from the storage reclaim.  Actually, it'd be nice if you
could reuse storage, so that an actual vacuum wouldn't be necessary unless
you just wanted to free up disk space you might end up using again anyway.

    The vacuum also doesn't seem to be very efficient.  In one of my
databases, a vacuum could take in excess of 24 hours, while I've written a
small SQL script that does a table rename and an insert-into-select
that will do the same job in about ten minutes.  This is a database that
cannot be locked for more than a few minutes.
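    The script itself isn't shown here; a minimal sketch of the
rename-and-copy idea (the table name "events" is hypothetical, and any
indexes would need recreating) might look like:

```sql
-- Move the bloated table aside, then copy the live rows into a fresh
-- table.  Dropping the old table reclaims all of its space at once,
-- instead of vacuum scanning and compacting it row by row.
BEGIN;
ALTER TABLE events RENAME TO events_old;
CREATE TABLE events AS SELECT * FROM events_old;
COMMIT;
-- CREATE TABLE AS does not copy indexes, constraints, or permissions;
-- recreate those on the new table, then reclaim the space:
DROP TABLE events_old;
```

    The table is only unavailable for the duration of the copy, which is
why this can beat a full vacuum on a heavily-updated table.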

# actually, there has been talk about, and work started on, getting rid
# of the 'locking' issue...with the MVCC code, the lock *shouldn't* be
# required anymore, but Tom Lane(?) did some preliminary work on
# removing this for v7.0 and ran into a few roadblocks, especially where
# there was the risk of simultaneous vacuums happening...wasn't a
# pretty sight, if memory serves...

    Couldn't you just keep the pg_vlock and only have it honored by
vacuum?  That would be neat if other stuff could happen during a vacuum.

# I love these two works "mission critical"...I run several applications
# that clients consider to be mission critical, the least of which are
# two accounting systems for dial-up ISPs...about the only time their
# databases go down is when we upgrade or have to reboot the physical
# machine...

    Well, he's right that it makes a difference how the app is
written, though.  I've got a mission critical app that does web
monitoring, hitting a bunch of web pages and recording everything it does.
When it fails a regex match on a web page, it also records the entire web
page so you can examine it later.  I move a lot of data through that, and
it's fine.

    Another app I've got is an event processing system that can
regularly cause postgres to stop functioning when it starts getting a lot
of events coming through.

    The only real difference between the two is that one is written in
Java and uses connection pooling, and the other is written in Perl and
makes a new database connection each time it has a batch of inserts.  I've
not seen the actual behavior when the event processing system breaks the
database, but it's usually followed by a ton of inserts trying to make
their way in.

--
dustin sallings                            The world is watching America,
http://2852210114/~dustin/                 and America is watching TV.


Re: [GENERAL] Benchmarks

From
The Hermit Hacker
Date:
On Thu, 6 Jan 2000, Dustin Sallings wrote:

>     Untrue, vacuum is *extremely* important for updating statistics.
> If you have a lot of data in a table, and you have never vacuumed, you
> might as well not have any indices.  It'd be nice if you could separate
> the stat update from the storage reclaim.  Actually, it'd be nice if you
> could reuse storage, so that an actual vacuum wouldn't be necessary unless
> you just wanted to free up disk space you might end up using again anyway.

Okay, my understanding is that a vacuum does a 'cleanup', while a vacuum
analyze does a cleanup *and* stats...
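In command terms (the table name here is hypothetical), that distinction is:

```sql
-- Cleanup only: reclaim the space left by deleted and updated rows.
VACUUM mytable;

-- Cleanup *and* statistics: also refresh the row counts and value
-- distributions the planner uses when deciding whether an index scan
-- is worthwhile.
VACUUM ANALYZE mytable;
```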



Re: [GENERAL] Benchmarks

From
Dustin Sallings
Date:
On Thu, 6 Jan 2000, The Hermit Hacker wrote:

# Okay, my understanding is that a vacuum does a 'cleanup', while a vacuum
# analyze does a cleanup *and* stats...

    That may be correct, but the stats without a cleanup would be a
nice option.  :)  That would give me the ability to update my stats every
day, or more frequently, but only clean up once a week or month or
whatever.

--
SA, beyond.com           My girlfriend asked me which one I like better.
pub  1024/3CAE01D5 1994/11/03 Dustin Sallings <dustin@spy.net>
|    Key fingerprint =  87 02 57 08 02 D0 DA D6  C8 0F 3E 65 51 98 D8 BE
L_______________________ I hope the answer won't upset her. ____________


Re: [GENERAL] Benchmarks

From
Bruce Momjian
Date:
>     Untrue, vacuum is *extremely* important for updating statistics.
> If you have a lot of data in a table, and you have never vacuumed, you
> might as well not have any indices.  It'd be nice if you could separate
> the stat update from the storage reclaim.  Actually, it'd be nice if you
> could reuse storage, so that an actual vacuum wouldn't be necessary unless
> you just wanted to free up disk space you might end up using again anyway.
>
>     The vacuum also doesn't seem to be very efficient.  In one of my
> databases, a vacuum could take in excess of 24 hours, while I've written a
> small SQL script that does a table rename and an insert-into-select
> that will do the same job in about ten minutes.  This is a database that
> cannot lock for more than a few minutes.

This is serious.  Why would an INSERT / RENAME be so much faster?  Are
we that bad with VACUUM?

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: [GENERAL] Benchmarks (Vacuum)

From
"Rudy Gireyev"
Date:
While on the subject of vacuum, I wonder if
Tom's time would be better spent figuring out how to
get rid of vacuum altogether rather than trying to fix
it. Simply have that functionality replaced with a more
modern way of data management and query optimization.
That command was nothing but trouble in Illustra and is
not getting any better.
Just thinking out loud.

Rudy

On 6 Jan 00, at 13:14, Bruce Momjian wrote:

> >     Untrue, vacuum is *extremely* important for updating statistics. If
> > you have a lot of data in a table, and you have never vacuumed, you
> > might as well not have any indices.  It'd be nice if you could
> > separate the stats update from the storage reclaim.  Actually, it'd
> > be nice if you could reuse storage, so that an actual vacuum
> > wouldn't be necessary unless you just wanted to free up disk space
> > you might end up using again anyway.
> >
> >     The vacuum also doesn't seem to be very efficient.  In one of my
> > databases, a vacuum could take in excess of 24 hours, while I've
> > written a small SQL script that does a select rename and a insert
> > into select from that will do the same job in about ten minutes.
> > This is a database that cannot lock for more than a few minutes.
>
> This is serious.  Why would an INSERT / RENAME be so much faster?  Are
> we that bad with VACUUM?
>



Re: [GENERAL] Benchmarks (Vacuum)

From
Bruce Momjian
Date:
> While on the subject of vacuum, I wonder if
> Tom's time would be better spent figuring out how to
> get rid of vacuum altogether rather than trying to fix
> it. Simply have that functionality replaced with a more
> modern way of data management and query optimization.
> That command was nothing but trouble in Illustra and is
> not getting any better.
> Just thinking out loud.

I have always hoped to allow space reuse without vacuum.  No one has
figured out how to do it.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026