Thread: looking for some real world performance numbers
I'm working through the architecture design for a new product. We have a small group working on this. It's a web app that will be using ruby on rails. The challenge I'm running into is that the latest conventional wisdom seems to be that since obviously databases don't scale on the web, you should just not use them at all. I have a group of otherwise very bright people trying to convince me that a rdbms is not a good place to store relational data because eventually it won't scale. And of course we don't even have version 1 of our product out of the door. I'll admit we do have a very good chance of actually getting tons of traffic, but my position is to use a rdbms for relational data, and then if and when it won't scale any more, deal with it then. So what would really help me is some real world numbers on how postgresql is doing in the wild under pressure. If anyone cares to throw some out I would really appreciate it. Chris
snacktime <snacktime@gmail.com> wrote: > > I'm working through the architecture design for a new product. We > have a small group working on this. It's a web app that will be using > ruby on rails. The challenge I'm running into is that the latest > conventional wisdom seems to be that since obviously databases don't > scale on the web, you should just not use them at all. Who are the people saying this? It doesn't sound very wise to me. Where are they proposing to put the data, if not in a database? That's what I'd like to know. > I have a group > of otherwise very bright people trying to convince me that a rdbms is > not a good place to store relational data because eventually it won't > scale. What is _their_ evidence? > And of course we don't even have version 1 of our product out > of the door. E.S. Raymond's "The Art of UNIX Programming": Rule #15: Write a prototype before you optimize. Nothing is funnier than watching people try to performance optimize software that hasn't even been written yet. Very few people are smart enough to know where the performance bottlenecks will be before they've coded anything. If they insist on doing it wrong, at least you'll have a good laugh. > I'll admit we do have a very good chance of actually > getting tons of traffic, but my position is to use a rdbms for > relational data, and then if and when it won't scale any more, deal > with it then. That's sane. > So what would really help me is some real world numbers on how > postgresql is doing in the wild under pressure. If anyone cares to > throw some out I would really appreciate it. http://people.freebsd.org/~kris/scaling/ Lots of interesting graphs on that page ... most of them seem to indicate that RDBMS scale rather nicely. -- Bill Moran http://www.potentialtech.com
I'll agree with Bill's response... If they dont want a rdbms what do they want? If they know of something that scales better and is faster, I'll bet they can make a lot of money. Lot of high traffic sites would love to hear what they think. > conventional wisdom seems to be that since obviously databases don't > scale on the web, Conventional? No, I don't think so. If you have 200 Gig of data, what's going to search it faster than a rdbms? If you have 200 Gig of data, with very intensive database queries, what scales better than having one web server round-robin requests to 10 database servers? I think the conventional wisdom is that non-database people cannot setup a database to run quickly to save their life. And then blame the database. -Andy snacktime wrote: > I'm working through the architecture design for a new product. We > have a small group working on this. It's a web app that will be using > ruby on rails. The challenge I'm running into is that the latest > conventional wisdom seems to be that since obviously databases don't > scale on the web, you should just not use them at all. I have a group > of otherwise very bright people trying to convince me that a rdbms is > not a good place to store relational data because eventually it won't > scale. And of course we don't even have version 1 of our product out > of the door. I'll admit we do have a very good chance of actually > getting tons of traffic, but my position is to use a rdbms for > relational data, and then if and when it won't scale any more, deal > with it then. > > So what would really help me is some real world numbers on how > postgresql is doing in the wild under pressure. If anyone cares to > throw some out I would really appreciate it. > > Chris > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly >
snacktime wrote: > I'm working through the architecture design for a new product. We > have a small group working on this. It's a web app that will be using > ruby on rails. The challenge I'm running into is that the latest > conventional wisdom seems to be that since obviously databases don't > scale on the web, you should just not use them at all. I have a group > of otherwise very bright people trying to convince me that a rdbms is > not a good place to store relational data because eventually it won't > scale. And of course we don't even have version 1 of our product out > of the door. I'll admit we do have a very good chance of actually > getting tons of traffic, but my position is to use a rdbms for > relational data, and then if and when it won't scale any more, deal > with it then. > > So what would really help me is some real world numbers on how > postgresql is doing in the wild under pressure. If anyone cares to > throw some out I would really appreciate it. > > I've got a client doing 18M page views/ day and postgresql isn't really sweating. Dave > Chris > > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > >
snacktime wrote: >> I'm working through the architecture design for a new product. We >> have a small group working on this. It's a web app that will be using >> ruby on rails. The challenge I'm running into is that the latest >> conventional wisdom seems to be that since obviously databases don't >> scale on the web, you should just not use them at all. I have a group >> of otherwise very bright people trying to convince me that a rdbms is >> not a good place to store relational data because eventually it won't >> scale. And of course we don't even have version 1 of our product out >> of the door. I'll admit we do have a very good chance of actually >> getting tons of traffic, but my position is to use a rdbms for >> relational data, and then if and when it won't scale any more, deal >> with it then. >> >> So what would really help me is some real world numbers on how >> postgresql is doing in the wild under pressure. If anyone cares to >> throw some out I would really appreciate it. It sounds like the RoR people are talking about any relational database, and not just Postgres. Many very busy sites do use relational databases successfully. So it can work. Many other have failed. So it can fail, if the situation is exceptionally unusual, or IMHO more likely, it´s poorly implemented. What the main argument of their ¨won´t scale¨ stance? Why not setup a test case to prove or disprove it? I don´t think anything we can suggest based on what we know of your project will help, unless someone happens to throw out a nearly identical case. I would be surprised if avoiding a database is a better solution. But regardless, I would be more worried about using a technology when most of the core group doesn´t believe in it. That often leads to bad results, regardless of whether it should. Paul
snacktime wrote on 21.10.2007 08:11: > I have a group > of otherwise very bright people trying to convince me that a rdbms is > not a good place to store relational data Hmm. Those bright people say that a /relational/ database management system is not a good place to store /relational/ data? I can't understand that reasoning... Where else do they want to store relational data than in a RDBMS? Thomas
Thomas Kellerer <spam_eater@gmx.net> writes: > Where else do they want to store relational data than in a RDBMS? Indeed. It seems like we can hardly answer the OP's question without asking "compared to what?" If they're afraid an RDBMS won't scale, what have they got in mind that they are so certain will scale? regards, tom lane
Dave Cramer wrote: > snacktime wrote: >> I'm working through the architecture design for a new product. We >> have a small group working on this. It's a web app that will be using >> ruby on rails. The challenge I'm running into is that the latest >> conventional wisdom seems to be that since obviously databases don't >> scale on the web, you should just not use them at all. I have a group >> of otherwise very bright people trying to convince me that a rdbms is >> not a good place to store relational data because eventually it won't >> scale. And of course we don't even have version 1 of our product out >> of the door. I'll admit we do have a very good chance of actually >> getting tons of traffic, but my position is to use a rdbms for >> relational data, and then if and when it won't scale any more, deal >> with it then. >> >> So what would really help me is some real world numbers on how >> postgresql is doing in the wild under pressure. If anyone cares to >> throw some out I would really appreciate it. I missed the original post on this, so I'm replying to Dave's response. To the OP, I don't know where you obtain your conventional wisdom from, but I'd look for another source. Just about any site you might visit that handles lots of data has a DBMS of some sort behind it; given that IMS and Adabase have been out of favor for 20 years, most of those DBMSs are relational. So if it can work for your bank, E*Trade and eBay, chances are it can work for you. As far as real world numbers, we have a data-intensive app (network data collection for a telecom company) that is currently inserting about 16 million rows a day. I benchmarked PG for that app and with some tweaking, PG could handle it. The current app uses stored procedures for all inserts, and PG didn't do well with that approach; substituting embedded inserts fixed that problem. So PG can definitely "handle" very large transaction volumes. As with any DBMS and any application, you may encounter challenges (like the one I point out with using stored procs for high-volume inserts) that require you to address with some thought. -- Guy Rouillier
On Sun, 2007-10-21 at 22:54 -0400, Guy Rouillier wrote: > The current app uses stored procedures > for all inserts, and PG didn't do well with that approach; substituting > embedded inserts fixed that problem. So PG can definitely "handle" very Can you explain what is embedded inserts?
Dave Cramer wrote: > snacktime wrote: >> I'm working through the architecture design for a new product. We >> have a small group working on this. It's a web app that will be using >> ruby on rails. The challenge I'm running into is that the latest >> conventional wisdom seems to be that since obviously databases don't >> scale on the web, you should just not use them at all. I have a group >> of otherwise very bright people trying to convince me that a rdbms is >> not a good place to store relational data because eventually it won't >> scale. And of course we don't even have version 1 of our product out >> of the door. I'll admit we do have a very good chance of actually >> getting tons of traffic, but my position is to use a rdbms for >> relational data, and then if and when it won't scale any more, deal >> with it then. >> >> So what would really help me is some real world numbers on how >> postgresql is doing in the wild under pressure. If anyone cares to >> throw some out I would really appreciate it. >> >> > I've got a client doing 18M page views/ day and postgresql isn't really > sweating. We have a client doing 15k/tps via a website. I would say these, "very bright people" are "very bright but excessively ignorant". Joshua D. Drake > > Dave >> Chris >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 1: if posting/reading through Usenet, please send an appropriate >> subscribe-nomail command to majordomo@postgresql.org so that your >> message can get through to the mailing list cleanly >> >> > > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend >
> As far as real world numbers, we have a data-intensive app > (network data > collection for a telecom company) that is currently inserting > about 16 > million rows a day. I benchmarked PG for that app and with some > tweaking, PG could handle it. Me too, not telco though. 5.5 million per day across 2240 tables and 4 databases with 50 days on-line. Tweaking to auto vaccuum to get it to keep up with the daily deletes and fsync off, slow disks not PG's fault but have UPS. Allan The material contained in this email may be confidential, privileged or copyrighted. If you are not the intended recipient,use, disclosure or copying of this information is prohibited. If you have received this document in error, pleaseadvise the sender and delete the document. Neither OneSteel nor the sender accept responsibility for any viruses containedin this email or any attachments.
Ow Mun Heng wrote: > On Sun, 2007-10-21 at 22:54 -0400, Guy Rouillier wrote: >> The current app uses stored procedures >> for all inserts, and PG didn't do well with that approach; substituting >> embedded inserts fixed that problem. So PG can definitely "handle" very > > > Can you explain what is embedded inserts? Insert via embedded SQL insert statements in our Java code, as opposed to embedded SQL stored proc invocations, which in turn do the inserts. The existing code base used the latter approach, which didn't work well with PG. I suspect it has to do with PG's stored proc overload capability. The short of it is that *any* DBMS you use will have its own quirks that you become acquainted with and learn to work around. -- Guy Rouillier
On Mon, 2007-10-22 at 01:23 -0400, Guy Rouillier wrote: > Ow Mun Heng wrote: > > On Sun, 2007-10-21 at 22:54 -0400, Guy Rouillier wrote: > >> The current app uses stored procedures > >> for all inserts, and PG didn't do well with that approach; substituting > >> embedded inserts fixed that problem. So PG can definitely "handle" very > > > > > > Can you explain what is embedded inserts? > > Insert via embedded SQL insert statements in our Java code, as opposed > to embedded SQL stored proc invocations, which in turn do the inserts. AH.. so you sort of hard-code insert into table values($x,$y,$z,$w) instead of execute sp_insert($x,$y,$z,$w) > The existing code base used the latter approach, which didn't work well > with PG. I suspect it has to do with PG's stored proc overload > capability. The short of it is that *any* DBMS you use will have its > own quirks that you become acquainted with and learn to work around. Yeah.. I found out that PG doesn't like this statement. where audit_key_dtime >= (select last_refreshed from denorm_log where tablename = 'zon') and audit_key_dtime < (select last_refreshed + refresh_interval from denorm_log where tablename = 'zon') Index Cond: ((audit_key_dtime >= $0) AND (audit_key_dtime < $1)) but prefers this where audit_key_dtime >= '2007-08-08 18:00:00' and audit_key_dtime < '2007-08-08 18:01:00' Index Cond: ((audit_key_dtime >= '2007-08-08 18:00:00'::timestamp without time zone) AND (audit_key_dtime < '2007-08-08 18:01:00'::timestamp without time zone)) even though they are of the same 1 min interval
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Thomas Kellerer <spam_eater@gmx.net> writes: >> Where else do they want to store relational data than in a RDBMS? > > Indeed. It seems like we can hardly answer the OP's question without > asking "compared to what?" If they're afraid an RDBMS won't scale, > what have they got in mind that they are so certain will scale? I suspect they're misapplying the lesson Google taught everyone. Namely that domain-specific solutions can provide much better performance than general-purpose software. Google might not use an RDBMS to store their search index (which doesn't need any of the ACID guarantees and needs all kinds of parallelism and lossy alorithms which SQL and RDBMSes in general don't excel at), but on the other hand I would be quite surprised if they stored their Adsense or other more normal use data structures in anything but a bog-standard SQL database. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Sat, Oct 20, 2007 at 11:11:32PM -0700, snacktime wrote: > So what would really help me is some real world numbers on how > postgresql is doing in the wild under pressure. If anyone cares to > throw some out I would really appreciate it. One of my databases has about 70M rows inserted, 30M rows updated, 70M rows deleted, and 3G rows retrieved per day. At peak times of the day it sustains around 120K rows/minute inserted, 80K rows/minute updated or deleted, and 3.5M rows/minute retrieved and it has room to grow. Usage patterns are different than for a web application, however: transaction rates are low (a few hundred per minute) and most logic is in database functions with statements that operate on hundreds or thousands of rows at a time. Still, this gives an idea of what a PostgreSQL database on decent hardware can handle. -- Michael Fuhr
> On Sat, Oct 20, 2007 at 11:11:32PM -0700, snacktime wrote: >> So what would really help me is some real world numbers on how >> postgresql is doing in the wild under pressure. If anyone cares to >> throw some out I would really appreciate it. > > One of my databases has about 70M rows inserted, 30M rows updated, > 70M rows deleted, and 3G rows retrieved per day. At peak times of > the day it sustains around 120K rows/minute inserted, 80K rows/minute > updated or deleted, and 3.5M rows/minute retrieved and it has room > to grow. Usage patterns are different than for a web application, > however: transaction rates are low (a few hundred per minute) and > most logic is in database functions with statements that operate > on hundreds or thousands of rows at a time. Still, this gives an > idea of what a PostgreSQL database on decent hardware can handle. > > -- > Michael Fuhr > What kind of hardware are you using for this system? Just to get an idea of what 'decent hardware' is in this case. regards, vincent.
In response to Gregory Stark <stark@enterprisedb.com>: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: > > > Thomas Kellerer <spam_eater@gmx.net> writes: > >> Where else do they want to store relational data than in a RDBMS? > > > > Indeed. It seems like we can hardly answer the OP's question without > > asking "compared to what?" If they're afraid an RDBMS won't scale, > > what have they got in mind that they are so certain will scale? > > I suspect they're misapplying the lesson Google taught everyone. Namely that > domain-specific solutions can provide much better performance than > general-purpose software. > > Google might not use an RDBMS to store their search index (which doesn't need > any of the ACID guarantees and needs all kinds of parallelism and lossy > alorithms which SQL and RDBMSes in general don't excel at), but on the other > hand I would be quite surprised if they stored their Adsense or other more > normal use data structures in anything but a bog-standard SQL database. Google also has enough high-calibre people that they can probably re-invent the concept of an RDBMS if they want to. Yet they don't. I know a particular Googleite who's a PostgreSQL buff and is bummed that they use MySQL all over the place. -- Bill Moran http://www.potentialtech.com
On Oct 22, 2007, at 5:44 AM, Gregory Stark wrote: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: > >> Thomas Kellerer <spam_eater@gmx.net> writes: >>> Where else do they want to store relational data than in a RDBMS? >> >> Indeed. It seems like we can hardly answer the OP's question without >> asking "compared to what?" If they're afraid an RDBMS won't scale, >> what have they got in mind that they are so certain will scale? > > I suspect they're misapplying the lesson Google taught everyone. > Namely that > domain-specific solutions can provide much better performance than > general-purpose software. That and they've probably been reading a lot of reddit which has had a fairly large number of posts over the past few months along the lines of "CouchDB is the future for everything!!!" or "RDBMS are old technology solution that no longer applies to today's problems!!!". Erik Jones Software Developer | Emma® erik@myemma.com 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate & market in style. Visit us online at http://www.myemma.com
On Sat, 20 Oct 2007, snacktime wrote: > It's a web app that will be using ruby on rails. The challenge I'm > running into is that the latest conventional wisdom seems to be that > since obviously databases don't scale on the web, you should just not > use them at all. Those who don't use a DBMS to store data are doomed to reinvent one, poorly. Ruby On Rails makes some design trade-offs that can make scaling a large database driven installation challenging. An article that greatly popularized the issues involved is the interview at http://www.radicalbehavior.com/5-question-interview-with-twitter-developer-alex-payne/ There are a variety of workarounds for this though. Some people use memcached http://www.danga.com/memcached/ in various ways to reduce the work the database handles directly. There are Ruby-based approaches like http://magicmodels.rubyforge.org/magic_multi_connections/ that let you use a database cluster to scale higher. On the PostgreSQL side, programs like PgBouncer let you reduce database-related overhead when it's the sheer number of physical connections dragging performance down https://developer.skype.com/SkypeGarage/DbProjects/PgBouncer The thing to observe here is that even should the database itself turn into a scaling bottleneck, there are several clever solutions that let you keep the good things about a proper RDBMS while providing higher throughput for the specific demands of the web application environment. Accordingly, focusing on the raw performance of PostgreSQL is kind of misleading. You shouldn't have to hit the database for everything if you put the right kind of layer on top. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Mon, Oct 22, 2007 at 01:33:54PM +0200, vincent wrote: > > One of my databases has about 70M rows inserted, 30M rows updated, > > 70M rows deleted, and 3G rows retrieved per day. At peak times of > > the day it sustains around 120K rows/minute inserted, 80K rows/minute > > updated or deleted, and 3.5M rows/minute retrieved and it has room > > to grow. Usage patterns are different than for a web application, > > however: transaction rates are low (a few hundred per minute) and > > most logic is in database functions with statements that operate > > on hundreds or thousands of rows at a time. Still, this gives an > > idea of what a PostgreSQL database on decent hardware can handle. > > What kind of hardware are you using for this system? Just to get an idea > of what 'decent hardware' is in this case. I don't know the full specs because another group is responsible for that. I think the box has four Athlon 64 X2s with 32G RAM. At least some of the storage is SAN-attached. -- Michael Fuhr