Re: how good is PostgreSQL - Mailing list pgsql-general
From: Steve Wolfe
Subject: Re: how good is PostgreSQL
Date:
Msg-id: 003501c04379$78859ca0$50824e40@iboats.com
In response to: how good is PostgreSQL ("Arnold Gamboa" <arnold@php4us.com>)
List: pgsql-general
> Even after that, you have a long way to go before you will hit 1000
> transactions per second from any SQL database.

Since my last post probably wasn't too useful, here's some information that might be a little more help. It's a little long, I know, but hopefully it will be of use to someone.

As programmers, we naturally want to throw things into databases for three reasons. First, it's easy to get data in. Second, it's easy to get relevant data out. And third, it's "cool". We don't want to work with flat files, now do we? ; )

However, in some cases, using the database to get data out ends up costing us a lot of time and money. Sometimes we run the same nasty query so often that we end up purchasing bigger hardware to make the system perform reasonably. Why? Because it was easier for us to write a program that did:

  GetDataFromDatabase();
  PrepareData();
  PrintData();

Each time, the database server does the work. But it doesn't necessarily have to be that way. In our company, we've found two trends that have enabled us to save a LOT of processing power on our machines (read: increase the capacity of our servers by 30% or more, with fairly minor changes).

The first case is that of rarely-changing data. Some of our datasets see around 50,000 to 1,000,000 views (selects) for each update (insert/delete). Having the database repeat the query every time is a waste. So, we began writing our programs such that they grab the data from the database once and generate the HTML for every page, along with the indexes. Then, when an update is made to the database (via the administrative tools), the program simply rewrites *the relevant HTML files* and changes the indexes pointing to them. (There are also some other very large advantages to this sort of thing, but I'm not allowed to say them. ; ) ) There's a sketch of this regenerate-on-update idea below.

The second case is that of often-repeated queries. One of the offerings on our site is an online directory, which gets a pretty fair amount of traffic. Unfortunately, it uses a proprietary program that was purchased by management before they spoke with us. Grr.... It was the most utterly inefficient program I've ever seen: it would *not* let the database do joins; it would grab entire tables and then try to do the joins itself, in Perl. We rewrote the program to let PostgreSQL do the joins, and that sped it up. Then we realized that a very small number of queries (those for the first one or two levels of pages) accounted for a huge portion of the usage. So, we replaced the front page with a static HTML page (the front page doesn't change...), and saw another terrific drop in our system loads.
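To make the first case concrete, here's a rough Perl/DBI sketch of the regenerate-on-update approach. The table, columns, and paths are all invented for the example; the point is only that the SELECT runs once per update instead of once per page view:

  #!/usr/bin/perl -w
  # Rough sketch only: schema and paths are made up for illustration.
  # Our admin tools would call this after each insert/delete.
  use strict;
  use DBI;

  my $dbh = DBI->connect('dbi:Pg:dbname=example', 'user', 'pass',
                         { RaiseError => 1 });

  sub rewrite_category_page {
      my ($cat_id) = @_;

      # One select at update time, instead of one per page view.
      my $sth = $dbh->prepare(
          'SELECT name, price FROM products WHERE category_id = ?');
      $sth->execute($cat_id);

      my $html = "<html><body><ul>\n";
      while (my $row = $sth->fetchrow_hashref) {
          $html .= "<li>$row->{name}: \$$row->{price}</li>\n";
      }
      $html .= "</ul></body></html>\n";

      # Write to a temp file and rename, so the web server never
      # serves a half-written page.
      my $path = "/var/www/static/category-$cat_id.html";
      open(FH, "> $path.tmp") or die "open: $!";
      print FH $html;
      close(FH);
      rename("$path.tmp", $path) or die "rename: $!";
  }

  rewrite_category_page($ARGV[0]);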
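And for the second case, letting PostgreSQL do the join was essentially a one-statement change. The directory schema here is made up, but the shape of the fix was this sort of thing:

  use strict;
  use DBI;

  my $dbh = DBI->connect('dbi:Pg:dbname=example', 'user', 'pass',
                         { RaiseError => 1 });
  my $parent_id = $ARGV[0];

  # The old program did, in effect: read ALL of listings, read ALL of
  # categories, then match them up row by row in Perl.  One join in
  # SQL does the same work where it belongs, and sends back only the
  # rows we actually display:
  my $sth = $dbh->prepare(
      'SELECT l.title, l.url, c.name
         FROM listings l, categories c
        WHERE l.category_id = c.id
          AND c.parent_id = ?');
  $sth->execute($parent_id);

  while (my $row = $sth->fetchrow_hashref) {
      print qq{<a href="$row->{url}">$row->{title}</a> ($row->{name})\n};
  }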
Overall, by only modifying a couple of our more heavily-used programs, our server loads dropped by about 30%-40%. If we went to the trouble to modify some others, it would drop even more, but we're going to rewrite them completely for other reasons. : )

In any event, there are ways like this to save a LOT of CPU and disk I/O. Most web servers can serve out several hundred static pages with the resources that would otherwise deliver one dynamically-created, database-driven page. It also allows you to cluster the web servers with cheap commodity hardware, instead of using big iron on the database. And if you do have a big-iron machine running the back end, this can severely lighten the load on it, keeping you from dropping a few hundred grand on the next step up. ; )

(Incidentally, we've toyed around with developing a query-caching system that would sit between PostgreSQL and our DB libraries. However, it seems like it could be done *much* more efficiently in PostgreSQL itself, as it would be much easier to keep track of which tables have changed, etc. Anybody know if this sort of functionality is planned? It would be terrific to simply give the machine another 256 megs of RAM and tell it to use it as a DB cache.)
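In case it helps anyone picture the idea, here's the flavor of what we've toyed with, as a crude Perl layer. Every name in it is invented, and the regex that spots table names is deliberately naive; the difficulty of doing this bookkeeping outside the backend is exactly why it would work better inside PostgreSQL:

  use strict;
  use DBI;

  my $dbh = DBI->connect('dbi:Pg:dbname=example', 'user', 'pass',
                         { RaiseError => 1 });

  my %cache;       # "sql|args" => cached rows
  my %uses_table;  # table name => cache keys that touched it

  sub cached_select {
      my ($sql, @args) = @_;
      my $key = join('|', $sql, @args);
      return $cache{$key} if exists $cache{$key};

      my $sth = $dbh->prepare($sql);
      $sth->execute(@args);
      my @rows;
      while (my $row = $sth->fetchrow_hashref) { push @rows, $row; }
      $cache{$key} = \@rows;

      # Crude bookkeeping: remember which tables the query named, so
      # a write to one of them can flush the relevant entries.
      while ($sql =~ /\bFROM\s+(\w+)/gi) {
          push @{ $uses_table{lc $1} }, $key;
      }
      return \@rows;
  }

  # All writes go through here, so the cache stays honest.
  sub write_and_invalidate {
      my ($table, $sql, @args) = @_;
      $dbh->do($sql, undef, @args);
      delete @cache{ @{ $uses_table{lc $table} || [] } };
      delete $uses_table{lc $table};
  }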
steve