Thread: looking for some real world performance numbers

looking for some real world performance numbers

From
snacktime
Date:
I'm working through the architecture design for a new product.  We
have a small group working on this.  It's a web app that will be using
ruby on rails.  The challenge I'm running into is that the latest
conventional wisdom seems to be that since obviously databases don't
scale on the web, you should just not use them at all.  I have a group
of otherwise very bright people trying to convince me that a rdbms is
not a good place to store relational data because eventually it won't
scale.  And of course we don't even have version 1 of our product out
of the door.  I'll admit we do have a very good chance of actually
getting tons of traffic, but my position is to use a rdbms for
relational data, and then if and when it won't scale any more, deal
with it then.

So what would really help me is some real world numbers on how
postgresql is doing in the wild under pressure.  If anyone cares to
throw some out I would really appreciate it.

Chris

Re: looking for some real world performance numbers

From
Bill Moran
Date:
snacktime <snacktime@gmail.com> wrote:
>
> I'm working through the architecture design for a new product.  We
> have a small group working on this.  It's a web app that will be using
> ruby on rails.  The challenge I'm running into is that the latest
> conventional wisdom seems to be that since obviously databases don't
> scale on the web, you should just not use them at all.

Who are the people saying this?  It doesn't sound very wise to me.

Where are they proposing to put the data, if not in a database?  That's
what I'd like to know.

> I have a group
> of otherwise very bright people trying to convince me that a rdbms is
> not a good place to store relational data because eventually it won't
> scale.

What is _their_ evidence?

> And of course we don't even have version 1 of our product out
> of the door.

E.S. Raymond's "The Art of UNIX Programming": Rule #15: Write a
prototype before you optimize.

Nothing is funnier than watching people try to performance optimize
software that hasn't even been written yet.  Very few people are
smart enough to know where the performance bottlenecks will be before
they've coded anything.  If they insist on doing it wrong, at least
you'll have a good laugh.

> I'll admit we do have a very good chance of actually
> getting tons of traffic, but my position is to use a rdbms for
> relational data, and then if and when it won't scale any more, deal
> with it then.

That's sane.

> So what would really help me is some real world numbers on how
> postgresql is doing in the wild under pressure.  If anyone cares to
> throw some out I would really appreciate it.

http://people.freebsd.org/~kris/scaling/

Lots of interesting graphs on that page ... most of them seem to indicate
that RDBMS scale rather nicely.

--
Bill Moran
http://www.potentialtech.com

Re: looking for some real world performance numbers

From
andy
Date:
I'll agree with Bill's response...  If they dont want a rdbms what do
they  want?  If they know of something that scales better and is faster,
I'll bet they can make a lot of money.  Lot of high traffic sites would
love to hear what they think.

 > conventional wisdom seems to be that since obviously databases don't
 > scale on the web,

Conventional?  No, I don't think so.  If you have 200 Gig of data,
what's going to search it faster than a rdbms?

If you have 200 Gig of data, with very intensive database queries, what
scales better than having one web server round-robin requests to 10
database servers?

I think the conventional wisdom is that non-database people cannot setup
a database to run quickly to save their life.  And then blame the database.

-Andy

snacktime wrote:
> I'm working through the architecture design for a new product.  We
> have a small group working on this.  It's a web app that will be using
> ruby on rails.  The challenge I'm running into is that the latest
> conventional wisdom seems to be that since obviously databases don't
> scale on the web, you should just not use them at all.  I have a group
> of otherwise very bright people trying to convince me that a rdbms is
> not a good place to store relational data because eventually it won't
> scale.  And of course we don't even have version 1 of our product out
> of the door.  I'll admit we do have a very good chance of actually
> getting tons of traffic, but my position is to use a rdbms for
> relational data, and then if and when it won't scale any more, deal
> with it then.
>
> So what would really help me is some real world numbers on how
> postgresql is doing in the wild under pressure.  If anyone cares to
> throw some out I would really appreciate it.
>
> Chris
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
>

Re: looking for some real world performance numbers

From
Dave Cramer
Date:
snacktime wrote:
> I'm working through the architecture design for a new product.  We
> have a small group working on this.  It's a web app that will be using
> ruby on rails.  The challenge I'm running into is that the latest
> conventional wisdom seems to be that since obviously databases don't
> scale on the web, you should just not use them at all.  I have a group
> of otherwise very bright people trying to convince me that a rdbms is
> not a good place to store relational data because eventually it won't
> scale.  And of course we don't even have version 1 of our product out
> of the door.  I'll admit we do have a very good chance of actually
> getting tons of traffic, but my position is to use a rdbms for
> relational data, and then if and when it won't scale any more, deal
> with it then.
>
> So what would really help me is some real world numbers on how
> postgresql is doing in the wild under pressure.  If anyone cares to
> throw some out I would really appreciate it.
>
>
I've got a client doing 18M page views/ day and postgresql isn't really
sweating.

Dave
> Chris
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
>
>



Re: looking for some real world performance numbers

From
paul rivers
Date:
snacktime wrote:
>> I'm working through the architecture design for a new product.  We
>> have a small group working on this.  It's a web app that will be using
>> ruby on rails.  The challenge I'm running into is that the latest
>> conventional wisdom seems to be that since obviously databases don't
>> scale on the web, you should just not use them at all.  I have a group
>> of otherwise very bright people trying to convince me that a rdbms is
>> not a good place to store relational data because eventually it won't
>> scale.  And of course we don't even have version 1 of our product out
>> of the door.  I'll admit we do have a very good chance of actually
>> getting tons of traffic, but my position is to use a rdbms for
>> relational data, and then if and when it won't scale any more, deal
>> with it then.
>>
>> So what would really help me is some real world numbers on how
>> postgresql is doing in the wild under pressure.  If anyone cares to
>> throw some out I would really appreciate it.

It sounds like the RoR people are talking about any relational database,
and not just Postgres.   Many very busy sites do use relational
databases successfully.  So it can work.  Many other have failed.  So it
can fail, if the situation is exceptionally unusual, or IMHO more
likely, it´s poorly implemented.

What the main argument of their ¨won´t scale¨ stance?  Why not setup a
test case to prove or disprove it?  I don´t think anything we can
suggest based on what we know of your project will help, unless someone
happens to throw out a nearly identical case.

I would be surprised if avoiding a database is a better solution.  But
regardless, I would be more worried about using a technology when most
of the core group doesn´t believe in it.  That often leads to bad
results, regardless of whether it should.

Paul




Re: looking for some real world performance numbers

From
Thomas Kellerer
Date:
snacktime wrote on 21.10.2007 08:11:
> I have a group
> of otherwise very bright people trying to convince me that a rdbms is
> not a good place to store relational data

Hmm.

Those bright people say that a /relational/ database management system is not a
good place to store /relational/ data?

I can't understand that reasoning...

Where else do they want to store relational data than in a RDBMS?


Thomas

Re: looking for some real world performance numbers

From
Tom Lane
Date:
Thomas Kellerer <spam_eater@gmx.net> writes:
> Where else do they want to store relational data than in a RDBMS?

Indeed.  It seems like we can hardly answer the OP's question without
asking "compared to what?"  If they're afraid an RDBMS won't scale,
what have they got in mind that they are so certain will scale?

            regards, tom lane

Re: looking for some real world performance numbers

From
Guy Rouillier
Date:
Dave Cramer wrote:
> snacktime wrote:
>> I'm working through the architecture design for a new product.  We
>> have a small group working on this.  It's a web app that will be using
>> ruby on rails.  The challenge I'm running into is that the latest
>> conventional wisdom seems to be that since obviously databases don't
>> scale on the web, you should just not use them at all.  I have a group
>> of otherwise very bright people trying to convince me that a rdbms is
>> not a good place to store relational data because eventually it won't
>> scale.  And of course we don't even have version 1 of our product out
>> of the door.  I'll admit we do have a very good chance of actually
>> getting tons of traffic, but my position is to use a rdbms for
>> relational data, and then if and when it won't scale any more, deal
>> with it then.
>>
>> So what would really help me is some real world numbers on how
>> postgresql is doing in the wild under pressure.  If anyone cares to
>> throw some out I would really appreciate it.

I missed the original post on this, so I'm replying to Dave's response.
  To the OP, I don't know where you obtain your conventional wisdom
from, but I'd look for another source.  Just about any site you might
visit that handles lots of data has a DBMS of some sort behind it; given
that IMS and Adabase have been out of favor for 20 years, most of  those
DBMSs are relational.  So if it can work for your bank, E*Trade and
eBay, chances are it can work for you.

As far as real world numbers, we have a data-intensive app (network data
collection for a telecom company) that is currently inserting about 16
million rows a day.  I benchmarked PG for that app and with some
tweaking, PG could handle it.  The current app uses stored procedures
for all inserts, and PG didn't do well with that approach; substituting
embedded inserts fixed that problem.  So PG can definitely "handle" very
large transaction volumes.  As with any DBMS and any application, you
may encounter challenges (like the one I point out with using stored
procs for high-volume inserts) that require you to address with some
thought.

--
Guy Rouillier

Re: looking for some real world performance numbers

From
Ow Mun Heng
Date:
On Sun, 2007-10-21 at 22:54 -0400, Guy Rouillier wrote:
>  The current app uses stored procedures
> for all inserts, and PG didn't do well with that approach; substituting
> embedded inserts fixed that problem.  So PG can definitely "handle" very


Can you explain what is embedded inserts?


Re: looking for some real world performance numbers

From
"Joshua D. Drake"
Date:
Dave Cramer wrote:
> snacktime wrote:
>> I'm working through the architecture design for a new product.  We
>> have a small group working on this.  It's a web app that will be using
>> ruby on rails.  The challenge I'm running into is that the latest
>> conventional wisdom seems to be that since obviously databases don't
>> scale on the web, you should just not use them at all.  I have a group
>> of otherwise very bright people trying to convince me that a rdbms is
>> not a good place to store relational data because eventually it won't
>> scale.  And of course we don't even have version 1 of our product out
>> of the door.  I'll admit we do have a very good chance of actually
>> getting tons of traffic, but my position is to use a rdbms for
>> relational data, and then if and when it won't scale any more, deal
>> with it then.
>>
>> So what would really help me is some real world numbers on how
>> postgresql is doing in the wild under pressure.  If anyone cares to
>> throw some out I would really appreciate it.
>>
>>
> I've got a client doing 18M page views/ day and postgresql isn't really
> sweating.

We have a client doing 15k/tps via a website. I would say these, "very
bright people" are "very bright but excessively ignorant".

Joshua D. Drake

>
> Dave
>> Chris
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 1: if posting/reading through Usenet, please send an appropriate
>>        subscribe-nomail command to majordomo@postgresql.org so that your
>>        message can get through to the mailing list cleanly
>>
>>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>


Re: looking for some real world performance numbers

From
"Harvey, Allan AC"
Date:
> As far as real world numbers, we have a data-intensive app 
> (network data 
> collection for a telecom company) that is currently inserting 
> about 16 
> million rows a day.  I benchmarked PG for that app and with some 
> tweaking, PG could handle it.
Me too, not telco though. 5.5 million per day across 2240 tables and 4 databases with 50 days on-line.
Tweaking to auto vaccuum to get it to keep up with the daily deletes and fsync off, slow disks not PG's fault but have
UPS.

Allan


The material contained in this email may be confidential, privileged or copyrighted. If you are not the intended
recipient,use, disclosure or copying of this information is prohibited. If you have received this document in error,
pleaseadvise the sender and delete the document. Neither OneSteel nor the sender accept responsibility for any viruses
containedin this email or any attachments.
 

Re: looking for some real world performance numbers

From
Guy Rouillier
Date:
Ow Mun Heng wrote:
> On Sun, 2007-10-21 at 22:54 -0400, Guy Rouillier wrote:
>>  The current app uses stored procedures
>> for all inserts, and PG didn't do well with that approach; substituting
>> embedded inserts fixed that problem.  So PG can definitely "handle" very
>
>
> Can you explain what is embedded inserts?

Insert via embedded SQL insert statements in our Java code, as opposed
to embedded SQL stored proc invocations, which in turn do the inserts.
The existing code base used the latter approach, which didn't work well
with PG.  I suspect it has to do with PG's stored proc overload
capability.  The short of it is that *any* DBMS you use will have its
own quirks that you become acquainted with and learn to work around.

--
Guy Rouillier

Re: looking for some real world performance numbers

From
Ow Mun Heng
Date:
On Mon, 2007-10-22 at 01:23 -0400, Guy Rouillier wrote:
> Ow Mun Heng wrote:
> > On Sun, 2007-10-21 at 22:54 -0400, Guy Rouillier wrote:
> >>  The current app uses stored procedures
> >> for all inserts, and PG didn't do well with that approach; substituting
> >> embedded inserts fixed that problem.  So PG can definitely "handle" very
> >
> >
> > Can you explain what is embedded inserts?
>
> Insert via embedded SQL insert statements in our Java code, as opposed
> to embedded SQL stored proc invocations, which in turn do the inserts.

AH.. so you sort of hard-code

insert into table values($x,$y,$z,$w)

instead of

execute sp_insert($x,$y,$z,$w)

> The existing code base used the latter approach, which didn't work well
> with PG.  I suspect it has to do with PG's stored proc overload
> capability.  The short of it is that *any* DBMS you use will have its
> own quirks that you become acquainted with and learn to work around.

Yeah..

I found out that PG doesn't like this statement.

where audit_key_dtime >= (select last_refreshed from denorm_log where tablename = 'zon')
and   audit_key_dtime <  (select last_refreshed + refresh_interval from denorm_log where tablename = 'zon')

Index Cond: ((audit_key_dtime >= $0) AND (audit_key_dtime < $1))

but prefers this

where audit_key_dtime >= '2007-08-08 18:00:00'
and   audit_key_dtime <  '2007-08-08 18:01:00'

Index Cond: ((audit_key_dtime >= '2007-08-08 18:00:00'::timestamp without time zone)
          AND (audit_key_dtime < '2007-08-08 18:01:00'::timestamp without time zone))

even though they are of the same 1 min interval


Re: looking for some real world performance numbers

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Thomas Kellerer <spam_eater@gmx.net> writes:
>> Where else do they want to store relational data than in a RDBMS?
>
> Indeed.  It seems like we can hardly answer the OP's question without
> asking "compared to what?"  If they're afraid an RDBMS won't scale,
> what have they got in mind that they are so certain will scale?

I suspect they're misapplying the lesson Google taught everyone. Namely that
domain-specific solutions can provide much better performance than
general-purpose software.

Google might not use an RDBMS to store their search index (which doesn't need
any of the ACID guarantees and needs all kinds of parallelism and lossy
alorithms which SQL and RDBMSes in general don't excel at), but on the other
hand I would be quite surprised if they stored their Adsense or other more
normal use data structures in anything but a bog-standard SQL database.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: looking for some real world performance numbers

From
Michael Fuhr
Date:
On Sat, Oct 20, 2007 at 11:11:32PM -0700, snacktime wrote:
> So what would really help me is some real world numbers on how
> postgresql is doing in the wild under pressure.  If anyone cares to
> throw some out I would really appreciate it.

One of my databases has about 70M rows inserted, 30M rows updated,
70M rows deleted, and 3G rows retrieved per day.  At peak times of
the day it sustains around 120K rows/minute inserted, 80K rows/minute
updated or deleted, and 3.5M rows/minute retrieved and it has room
to grow.  Usage patterns are different than for a web application,
however: transaction rates are low (a few hundred per minute) and
most logic is in database functions with statements that operate
on hundreds or thousands of rows at a time.  Still, this gives an
idea of what a PostgreSQL database on decent hardware can handle.

--
Michael Fuhr

Re: looking for some real world performance numbers

From
"vincent"
Date:
> On Sat, Oct 20, 2007 at 11:11:32PM -0700, snacktime wrote:
>> So what would really help me is some real world numbers on how
>> postgresql is doing in the wild under pressure.  If anyone cares to
>> throw some out I would really appreciate it.
>
> One of my databases has about 70M rows inserted, 30M rows updated,
> 70M rows deleted, and 3G rows retrieved per day.  At peak times of
> the day it sustains around 120K rows/minute inserted, 80K rows/minute
> updated or deleted, and 3.5M rows/minute retrieved and it has room
> to grow.  Usage patterns are different than for a web application,
> however: transaction rates are low (a few hundred per minute) and
> most logic is in database functions with statements that operate
> on hundreds or thousands of rows at a time.  Still, this gives an
> idea of what a PostgreSQL database on decent hardware can handle.
>
> --
> Michael Fuhr
>


What kind of hardware are you using for this system? Just to get an idea
of what 'decent hardware' is in this case.

regards,
vincent.


Re: looking for some real world performance numbers

From
Bill Moran
Date:
In response to Gregory Stark <stark@enterprisedb.com>:

> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>
> > Thomas Kellerer <spam_eater@gmx.net> writes:
> >> Where else do they want to store relational data than in a RDBMS?
> >
> > Indeed.  It seems like we can hardly answer the OP's question without
> > asking "compared to what?"  If they're afraid an RDBMS won't scale,
> > what have they got in mind that they are so certain will scale?
>
> I suspect they're misapplying the lesson Google taught everyone. Namely that
> domain-specific solutions can provide much better performance than
> general-purpose software.
>
> Google might not use an RDBMS to store their search index (which doesn't need
> any of the ACID guarantees and needs all kinds of parallelism and lossy
> alorithms which SQL and RDBMSes in general don't excel at), but on the other
> hand I would be quite surprised if they stored their Adsense or other more
> normal use data structures in anything but a bog-standard SQL database.

Google also has enough high-calibre people that they can probably
re-invent the concept of an RDBMS if they want to.  Yet they don't.
I know a particular Googleite who's a PostgreSQL buff and is bummed
that they use MySQL all over the place.

--
Bill Moran
http://www.potentialtech.com

Re: looking for some real world performance numbers

From
Erik Jones
Date:
On Oct 22, 2007, at 5:44 AM, Gregory Stark wrote:

> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>
>> Thomas Kellerer <spam_eater@gmx.net> writes:
>>> Where else do they want to store relational data than in a RDBMS?
>>
>> Indeed.  It seems like we can hardly answer the OP's question without
>> asking "compared to what?"  If they're afraid an RDBMS won't scale,
>> what have they got in mind that they are so certain will scale?
>
> I suspect they're misapplying the lesson Google taught everyone.
> Namely that
> domain-specific solutions can provide much better performance than
> general-purpose software.

That and they've probably been reading a lot of reddit which has had
a fairly large number of posts over the past few months along the
lines of "CouchDB is the future for everything!!!" or "RDBMS are old
technology solution that no longer applies to today's problems!!!".

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



Re: looking for some real world performance numbers

From
Greg Smith
Date:
On Sat, 20 Oct 2007, snacktime wrote:

> It's a web app that will be using ruby on rails.  The challenge I'm
> running into is that the latest conventional wisdom seems to be that
> since obviously databases don't scale on the web, you should just not
> use them at all.

Those who don't use a DBMS to store data are doomed to reinvent one,
poorly.

Ruby On Rails makes some design trade-offs that can make scaling a large
database driven installation challenging.  An article that greatly
popularized the issues involved is the interview at
http://www.radicalbehavior.com/5-question-interview-with-twitter-developer-alex-payne/

There are a variety of workarounds for this though.  Some people use
memcached http://www.danga.com/memcached/ in various ways to reduce the
work the database handles directly.  There are Ruby-based approaches like
http://magicmodels.rubyforge.org/magic_multi_connections/ that let you use
a database cluster to scale higher.

On the PostgreSQL side, programs like PgBouncer let you reduce
database-related overhead when it's the sheer number of physical
connections dragging performance down
https://developer.skype.com/SkypeGarage/DbProjects/PgBouncer

The thing to observe here is that even should the database itself turn
into a scaling bottleneck, there are several clever solutions that let you
keep the good things about a proper RDBMS while providing higher
throughput for the specific demands of the web application environment.
Accordingly, focusing on the raw performance of PostgreSQL is kind of
misleading.  You shouldn't have to hit the database for everything if you
put the right kind of layer on top.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: looking for some real world performance numbers

From
Michael Fuhr
Date:
On Mon, Oct 22, 2007 at 01:33:54PM +0200, vincent wrote:
> > One of my databases has about 70M rows inserted, 30M rows updated,
> > 70M rows deleted, and 3G rows retrieved per day.  At peak times of
> > the day it sustains around 120K rows/minute inserted, 80K rows/minute
> > updated or deleted, and 3.5M rows/minute retrieved and it has room
> > to grow.  Usage patterns are different than for a web application,
> > however: transaction rates are low (a few hundred per minute) and
> > most logic is in database functions with statements that operate
> > on hundreds or thousands of rows at a time.  Still, this gives an
> > idea of what a PostgreSQL database on decent hardware can handle.
>
> What kind of hardware are you using for this system? Just to get an idea
> of what 'decent hardware' is in this case.

I don't know the full specs because another group is responsible
for that.  I think the box has four Athlon 64 X2s with 32G RAM.  At
least some of the storage is SAN-attached.

--
Michael Fuhr