Thread: how good is PostgreSQL
Hi,

For users of large PostgreSQL installations and PostgreSQL builders, this is for you.

I'm having a terrible time deciding now. :(

We're about to build a "huge" website. I got tied up in signing the contract without really getting enough information about PgSQL, since this is what we plan to implement with PHP (normally we use MySQL, but I guess it does not fit huge databases like this).

Here's my problem: we're about to build a site like hitbox.com, where a large amount of database work is required. If, say, there are 100,000 users with 1000 page hits per day each, and everything is logged, you can imagine how huge this will be. I'm just so "nervous" (really, that's the term) that if we implement this we'll later experience a slowdown or, worse than that, a crash of the server.

My questions are:
1. What is the limit on the number of records in a table WITHOUT SUFFERING A SLOWDOWN?
2. ... limit on the number of tables per database?
3. ... limit on the number of databases?

Thanks for your comments. I would really appreciate every comment I receive regarding this.

Arnold
"Arnold Gamboa" <arnold@php4us.com> writes: > We're about to build a "huge" website now. I got tied up in signing the > contract without really getting enough information about PgSQL since this > what we plan to implement with PHP (normally we use mySQL but i guess it > does not fit for huge databases like that). Can you do connection pooling and client side caching of database queries in PHP ? From working with Java this is the spot we really improve speed. > > Here's my problem.. We're about to build a site like hitbox.com where there > is a large amount of database required.. If say there is 100,000 users with > 1000 page hits per day for each, and everything will be logged, you could > imagine how huge this will be. I'm just so "nervous" (really, that's the > term) if we implement this and later on experience a slow down or worse than > that, crash in the server. How many database queries do you have per page hit ? How many database inserts/updates do you have per page hit ? Are you using the database for httpd access logging, or is it some application level logging ? Anyhow you might want to look into an architecture where you have a dedicated box for the logging. But most important, test with real data. Populate your database and run stress tests. I'm was doing some testing on a portal my company has developed with PostgreSQL as the backend database. Running on my Linux laptop P466 with 128MB, Apache JServ, PostgreSQL 7.0.2. I managed to get about ~20 pageviews a second. Each pageview had on average 4 queries and 1 insert. But measure for yourself. Remember that you can gain a lot by tuning application, database and OS. regards, Gunnar
Arnold Gamboa wrote:

> Hi,
>
> For users of large PostgreSQL installations and PostgreSQL builders, this is for you.
>
> I'm having a terrible time deciding now. :(
>
> We're about to build a "huge" website. I got tied up in signing the contract without really getting enough information about PgSQL, since this is what we plan to implement with PHP (normally we use MySQL, but I guess it does not fit huge databases like this).
>
> Here's my problem: we're about to build a site like hitbox.com, where a large amount of database work is required. If, say, there are 100,000 users with 1000 page hits per day each, and everything is logged, you can imagine how huge this will be. I'm just so "nervous" (really, that's the term) that if we implement this we'll later experience a slowdown or, worse than that, a crash of the server.

That is a LOT of work for any system. That is over 1100 page views a second, or under 900us each. A standard Pentium III system serving static pages would have problems with that. If you look at search engines, to get that performance with read-only data, they usually cluster multiple systems and load balance across them. You may need to segment your data and have multiple SQL servers perform different functions.

Also, that 1100 page views per second assumes an even distribution of traffic, which does not happen on a web server. If you average that much, chances are there will be periods of twice that. Look into a "Local Director," an Alteon, or even LVS.

> My questions are:
> 1. What is the limit on the number of records in a table WITHOUT SUFFERING A SLOWDOWN?
> 2. ... limit on the number of tables per database?
> 3. ... limit on the number of databases?

There are a couple of factors involved; it is more complex than a simple answer can cover. Use multiple databases and put each on a separate disk, with its own controller. Better yet, have multiple load-balanced web boxes do a lot of the processing in PHP, offload much of the CPU-bound work onto those "cheap" web boxes, and have multiple SQL databases in the back handling various independent tasks.

On a web site I worked on, we had multiple front-end web servers, load balanced with an Alteon. Each web server had its own SQL database which provided SQL access to "static" data that was updated each week. We had an additional single SQL database backend which all the web servers accessed for synchronized dynamic data.

If you are serious about the load you expect to put on this system, you must be careful: Do not create any indexes you do not need. Do not use the "foreign key" constraint, as it forces a trigger for each insert. Make sure you index the keys by which you will access data. Avoid searching by strings; try to use keys. (A small sketch of these last two points follows this message.)

Even after that, you have a long way to go before you will hit 1000 transactions per second from any SQL database. If you are betting your business on this implementation, you have a lot of homework to do.

> Thanks for your comments. I would really appreciate every comment I receive regarding this.
>
> Arnold
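To make the last two pieces of advice above concrete, here is a hedged sketch with a hypothetical "hits" table and column names: create an index on the key you actually query by, and look rows up by that key rather than by matching strings.

<?php
// Sketch only: hypothetical logging table for the traffic Arnold describes.
$conn = pg_pconnect("dbname=site user=web");

// Done once at setup time, not per request: index the access key.
pg_exec($conn, "CREATE INDEX hits_user_id_idx ON hits (user_id)");

// Per request: a keyed lookup that can use the index ...
$res = pg_exec($conn, "SELECT count(*) FROM hits WHERE user_id = 12345");

// ... instead of something like the following, whose leading wildcard
// cannot use a btree index and forces a scan of the whole table:
//   SELECT count(*) FROM hits WHERE url LIKE '%/products/%'
?>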
> Even after that, you have a long way to go before you will hit 1000 transactions per second from any SQL database.

I guess they could always buy a few Sun E10000's on the backend, and a large room of rack-mountable PC's for web/CGI serving. Nothing like plopping down ten or twenty million dollars on hardware. : )

steve
Steve Wolfe wrote:

> > Even after that, you have a long way to go before you will hit 1000 transactions per second from any SQL database.
>
> I guess they could always buy a few Sun E10000's on the backend, and a large room of rack-mountable PC's for web/CGI serving. Nothing like plopping down ten or twenty million dollars on hardware. : )

Or they could buy a single IBM S/390, run Linux/390 and PostgreSQL on that. Probably would cost less, and be more reliable. And they can always load another Linux/390 VM -- an S/390 can run something like 41,000 virtual machines each running Linux/390 and Apache.

However, if you want to see the architecture of a _large_ database-backed website, see the story behind Digital City at www.aolserver.com. While they're using Sybase instead of PostgreSQL, the architecture is the same.

--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11
> Or they could buy a single IBM S/390, run Linux/390 and PostgreSQL on that. Probably would cost less, and be more reliable. And they can always load another Linux/390 VM -- an S/390 can run something like 41,000 virtual machines each running Linux/390 and Apache.

Yeah.... I'm very optimistic about IBM's new chips that are coming out next year. Each "processor module" will have 4 processors, but each processor will have 2 cores - so in effect, each "processor module" has 8 processors on it. All processors will have copper interconnects and, depending on the source, will debut at anywhere from 1.3 to 2 gigahertz. I think that will certainly help them get a larger share of the high-end market!

steve
> Even after that, you have a long way to go before you will hit 1000 transactions per second from any SQL database.

Since my last post probably wasn't too useful, here's some information that might be a little more help. It's a little long, I know, but hopefully it will be of use to someone.

As programmers, we naturally want to throw things into databases for three reasons. First, it's easy to get data in. Second, it's easy to get relevant data out. And third, it's "cool". We don't want to work with flat files, now do we? ; )

However, in some cases, using the database to get data out ends up costing us a lot of time and money. Sometimes we run the same nasty query so often that we end up purchasing bigger hardware to make the system perform reasonably. Why? Because it was easier for us to write a program that did:

GetDataFromDatabase();
PrepareData();
PrintData();

Each time, the database server does the work. But it doesn't necessarily have to be that way. In our company, we've found two trends that have enabled us to save a LOT of processing power on our machines (read: increase the capacity of our servers by 30% or more, with fairly minor changes).

The first case is that of rarely-changing data. Some of our datasets probably see around 50,000 to 1,000,000 views (selects) for each update (insert/delete). Having the database repeat the query every time is a waste. So, we began writing our programs such that they grab the data from the database once and generate the HTML for every page, plus the indexes. Then, when an update is made to the database (via the administrative tools), it simply rewrites *the relevant HTML files* and changes the indexes pointing to them. (There are also some other very large advantages to this sort of thing, but I'm not allowed to say them. ; ) )

The second case is that of often-repeated queries. One of the offerings on our site is an online directory, which gets a pretty fair amount of traffic. Unfortunately, it uses a proprietary program that was purchased by management before they spoke with us. Grr.... It was the most utterly inefficient program I've ever seen. It would *not* allow the database to do joins; it would grab entire tables and then try to do the joins itself, in Perl. We rewrote the program to let PostgreSQL do the joins, and that sped it up. Then we realized that a very small number of queries (those for the first one or two levels of pages) accounted for a huge portion of the usage. So, we replaced the front page with a static HTML page (the front page doesn't change...), and saw another terrific drop in our system loads.

Overall, by only modifying a couple of our more heavily-used programs, our server loads dropped by about 30%-40%. If we went to the trouble to modify some others, it would drop even more. But we're going to rewrite them completely for other reasons. : )

In any event, there are ways like this to save a LOT of CPU and disk I/O. Most web servers can serve out several hundred static pages with the resources that would otherwise deliver one dynamically-created, database-driven page. It also allows you to cluster the web servers with cheap commodity hardware, instead of using big iron on the database. And if you have a big-iron machine running the back end, this can severely lighten the load on it, keeping you from dropping a few hundred grand on the next step up. ; )

(Incidentally, we've toyed around with developing a query-caching system that would sit between PostgreSQL and our DB libraries. However, it seems like it could be done *much* more efficiently in PostgreSQL itself, as it would be much easier to keep track of which tables have changed, etc. Anybody know if this sort of functionality is planned? It would be terrific to simply give the machine another 256 megs of RAM and tell it to use it as a DB cache...)

steve
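A minimal sketch of the regenerate-on-update idea Steve describes, assuming PHP 4's pg_* API; the table, columns and output path are hypothetical, and the index rewriting he mentions is omitted.

<?php
// Sketch only: rebuild a static HTML page when the data changes, instead
// of running the same SELECT on every page view.
function rebuild_category_page($conn, $category_id) {
    $res = pg_exec($conn,
        "SELECT title, url FROM listings WHERE category_id = $category_id ORDER BY title");

    $html = "<html><body><ul>\n";
    for ($i = 0; $i < pg_numrows($res); $i++) {
        $row   = pg_fetch_array($res, $i);
        $html .= "<li><a href=\"" . $row["url"] . "\">" . $row["title"] . "</a></li>\n";
    }
    $html .= "</ul></body></html>\n";

    // The web servers serve this file directly; the database is not touched
    // again until the next change comes through the admin tools.
    $fp = fopen("/var/www/static/category_$category_id.html", "w");
    fwrite($fp, $html);
    fclose($fp);
}

// Called from the administrative tools after an insert/update/delete:
// rebuild_category_page($conn, 42);
?>

The admin tools call this after every change, so the tens of thousands of views in between are plain static file serves.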
markw wrote:

> Arnold Gamboa wrote:
>
> > Hi,
> >
> > For users of large PostgreSQL installations and PostgreSQL builders, this is for you.

..snip..

> Also, that 1100 page views per second assumes an even distribution of traffic, which does not happen on a web server. If you average that much, chances are there will be periods of twice that.

That's excessively optimistic. If your daily average is 1100 per second, you'll have 2200 average for many of the hours in that day, 5500 for a few hours, and some 10-minute periods with 11,000, certainly once in a while.

++ kevin

--
Kevin O'Gorman (805) 650-6274 mailto:kogorman@pacbell.net
Permanent e-mail forwarder: mailto:Kevin.O'Gorman.64@Alum.Dartmouth.org
At school: mailto:kogorman@cs.ucsb.edu
Web: http://www.cs.ucsb.edu/~kogorman/index.html
Web: http://trixie.kosman.via.ayuda.com/~kevin/index.html
"There is a freedom lying beyond circumstance, derived from the direct intuition that life can be grounded upon its absorption in what is changeless amid change" -- Alfred North Whitehead
> As programmers, we naturally want to throw things into databases for three reasons. First, it's easy to get data in. Second, it's easy to get relevant data out. And third, it's "cool". We don't want to work with flat files, now do we? ; )

Kiddin', eh? :) Actually, the third reason seems to dominate the younger developers' minds. People often tend to keep everything in the poor DBMS until it begins to kick back. And this has an impact on the customers. Does your system use a database? No, why should it? You mean you'll keep our dearly beloved banner ads as flat files? Yes, this is where they belong. Sorry, we'll seek someone more advanced. Good luck.

Of course, hardware vendors jump for joy :) Maybe I don't get it, but IMHO there's no reason to put into a DB something that can't be indexed and used in a WHERE clause.

> It would *not* allow the database to do joins; it would grab entire tables and then try to do the joins itself, in Perl.

Umh.... Yeah.... Well.... To keep compatibility with other Open Source Databases and ESR/RMS, you know :)

> (Incidentally, we've toyed around with developing a query-caching system that would sit between PostgreSQL and our DB libraries.

Sounds amazing, but requires some research, I guess. However, in many cases one would be more than happy with cached connections. Of course, cached query results can be naturally added to that, but just connections are OK to start with. Security....

--
contaminated fish and microchips huge supertankers on Arabian trips oily propaganda from the leaders' lips all about the future there's people over here, people over there everybody's looking for a little more air crossing all the borders just to take their share planning for the future Rainbow, Difficult to Cure
> > (Incidentally, we've toyed around with developing a query-caching system that would sit between PostgreSQL and our DB libraries.
>
> Sounds amazing, but requires some research, I guess. However, in many cases one would be more than happy with cached connections. Of course, cached query results can be naturally added to that, but just connections are OK to start with. Security....

To me, it doesn't sound like it would be that difficult of a project, at least not for the likes of the PostgreSQL developers. It also doesn't seem like it would really introduce any security problems, not if it were done inside of PostgreSQL. Long ago, I got sidetracked from my endeavors in C, and so I don't feel that I'm qualified to do it (otherwise, I would have done it already. : ) ). If you wanted it done in Perl or Object Pascal, I could help. : )

Here's a simple design that I was tossing back and forth. Please understand that I'm not saying this is the best way to do it, or even a good way to do it. Just a possible way to do it. I haven't been able to give it as much thought as I would like to. Here goes.

------------
Implementation

Upon starting, the PostgreSQL engine could allocate a chunk of memory, sized according to the administrator's desire. That chunk would be used solely for query caching.

When a query came in that was not cached (say, the first query), the database engine would process it as normal. It would then return it to the user, and add it to the cache. "Adding it to the cache" would mean that it would enter the query itself, the result set, and a list of which tables the query relied upon. The query that is stored could be either the query coming from the user, or the query after it goes through the optimizer. Each has pros and cons; I would probably favor using the query that comes from the user.

When another query comes along, the caching engine would quickly look in the hash table and see if it already had the cached results of the query. If so, it returns them, and wham. You've just avoided all of the work of optimizing, parsing, and executing, not to mention the disk I/O. A hash lookup seems extremely cheap compared to the work of actually processing a query.

When an update/delete/insert comes along, the engine would analyze which tables were affected, and clear the cache entries that relied upon those tables.

-----------------
Cache Clearing

Cache clearing would be achieved via an LRU-based algorithm, which would also take into account the amount of RAM used by each query in the cache.

-----------------
Performance Impact

The potential performance differences range from a minuscule decrease to a tremendous increase. And it's a lot cheaper to throw an extra half gig of RAM in a machine than to upgrade processors and disk subsystems!

------------------
Possible Changes

One potential drawback is that when a table is modified, the queries that rely upon it would be discarded. Where a table is updated frequently, that could greatly reduce the performance benefit. One possible alternative is to store the query cost with each query in the cache. When a table is updated, those queries are marked as "dirty". If the system load is below a certain amount, or the system has been idle, it could then re-execute those queries and update the cache. Which queries it re-executed would be determined by a factor of query cost and how frequently those cache entries were used.

-------------------

The reason I would prefer it done in the PostgreSQL engine (as opposed to in a middleware application) is that the caching engine needs to know (a) which tables a query relies upon, and (b) which tables get changed. It seems that it would significantly reduce overhead to do those inside of PostgreSQL (which is already doing the query parsing and analysis).

This could certainly give PostgreSQL a huge advantage over other database systems, too. It could save administrators a very large chunk of cash that they would otherwise have to spend on large systems. And it would just be cool. ; )

steve
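For readers who want the bookkeeping spelled out, here is a rough application-level sketch of the design above in PHP. It is not the in-backend implementation Steve is proposing: the caller has to pass in the list of tables each query touches, which is exactly the information the backend would already have from its parser. LRU and cost-based eviction are left out. All names are made up.

<?php
// Sketch of the cache bookkeeping described above, at the application level.
$query_cache  = array();  // query text => cached result rows
$dependencies = array();  // table name => queries that relied on that table

function cached_select($conn, $sql, $tables) {
    global $query_cache, $dependencies;
    if (isset($query_cache[$sql])) {
        return $query_cache[$sql];              // hash hit: no parse, plan or disk I/O
    }
    $res  = pg_exec($conn, $sql);
    $rows = array();
    for ($i = 0; $i < pg_numrows($res); $i++) {
        $rows[] = pg_fetch_array($res, $i);
    }
    $query_cache[$sql] = $rows;
    foreach ($tables as $t) {
        $dependencies[$t][] = $sql;             // remember which tables this query used
    }
    return $rows;
}

function write_query($conn, $sql, $tables) {
    global $query_cache, $dependencies;
    $res = pg_exec($conn, $sql);                // the insert/update/delete itself
    foreach ($tables as $t) {
        if (isset($dependencies[$t])) {
            foreach ($dependencies[$t] as $q) { // clear every cached query that
                unset($query_cache[$q]);        // relied on a modified table
            }
            $dependencies[$t] = array();
        }
    }
    return $res;
}
?>

Inside the backend, the table lists would come straight from the parser and the cache could live in shared memory, which is the overhead saving Steve points at above.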
> Whenever a query is executed (not found in cache, etc.), the caching system would simply store the query, the results, and a list of tables queried. When a new query came in, it would do a quick lookup in the query hash to see if it already had the results. If so, whammo. Whenever an insert/delete/update was sensed, it would look at the tables being affected, and the caching mechanism would clear out the entries depending on those tables.

It seems to me that tracking the list of cached queries and watching for queries that might invalidate them adds a lot of complexity to the back end, and the front end still has to establish the connection and wait for the data to transfer over the socket.

On a more practical level, a backend solution would require someone with fairly detailed knowledge of the internals of the backend. A front-end solution could more likely be implemented by someone not as knowledgeable.

One of the big advantages of your technique is that there is no code change at the application level. This means less database lock-in. Maybe that is a disadvantage too. ;-)
> It seems to me that tracking the list of cached queries and watching for queries that might invalidate them adds a lot of complexity to the back end, and the front end still has to establish the connection and wait for the data to transfer over the socket.

I really don't think that it would. Checking to see if you have a query (a hash lookup) is very, very cheap relative to normally processing a query, I would think. And invalidating cache entries would also be very, very cheap compared to the normal activity of the database. Assuming hashes are done correctly, it would probably be done much, much faster than any query could execute. If software caches can increase the performance of disk drives that have latencies in thousandths of seconds, I'm sure they could help with queries that take hundredths or tenths of seconds. ; )

> On a more practical level, a backend solution would require someone with fairly detailed knowledge of the internals of the backend. A front-end solution could more likely be implemented by someone not as knowledgeable.

Yeah. I was hoping that one of the developers would say "oooh... that would rock. We should do that." : )

> One of the big advantages of your technique is that there is no code change at the application level. This means less database lock-in. Maybe that is a disadvantage too. ;-)

I'm sure that someone with a better understanding of the theory associated with cache invalidation would design a better algorithm than I would, but it seems that even a fairly rudimentary implementation would seriously increase performance.

steve
> Here's a simple design that I was tossing back and forth. Please understand that I'm not saying this is the best way to do it, or even a good way to do it. Just a possible way to do it.

Sounds interesting. I certainly have reasons to play bad guy, but that's what I always do, so never mind :)

However, there's one major point where I disagree. Not that I have real reasons, or observation or analysis, to back up my position; just a feeling. And the feeling is that the connection/query cache should be separate from the DBMS server itself. Several things come to mind right off, like the possibility of caching connections to different sources, like PGSQL and Oracle, as well as a chance to run this cache on a separate box that will perform various additional functions, like load balancing. But that's right on the surface. Still in doubt....

--
contaminated fish and microchips huge supertankers on Arabian trips oily propaganda from the leaders' lips all about the future there's people over here, people over there everybody's looking for a little more air crossing all the borders just to take their share planning for the future Rainbow, Difficult to Cure
> Sounds interesting. I certainly have reasons to play bad guy, but that's what I always do, so never mind :)

That's OK. Somebody has to be a realist. : )

> However, there's one major point where I disagree. Not that I have real reasons, or observation or analysis, to back up my position; just a feeling. And the feeling is that the connection/query cache should be separate from the DBMS server itself. Several things come to mind right off, like the possibility of caching connections to different sources, like PGSQL and Oracle,

That would be a benefit if you're running multiple DBMSs in the house - and you're certainly welcome to do something like that as a standalone package. ; ) I think it would be terrific if PostgreSQL could have the feature added to it, which would (a) give it a big performance benefit, (b) let it take advantage of already-written code, and (c) make one less machine and service to administer.

> as well as a chance to run this cache on a separate box that will perform various additional functions, like load balancing. But that's right on the surface. Still in doubt....

Yes, load balancing would be another good factor. However, to my (very limited) knowledge, there aren't any truly good ways of splitting up database work. If you're doing nothing but selects, it would be easy. But when updates come around, it gets hairier - and when you try for dynamic continuity-checking and database rebuilding, it gets very ugly. If there are any systems that get around those without huge performance hits, I'd love to hear about them. (Of course, if you have lots of money, a Beowulf-style cluster with high-bandwidth, low-latency interconnects becomes desirable. But that's a different ballgame.)

However, there is one other possibility: with caching, your servers might see enough of a performance increase that you wouldn't need to load-balance them. : )

steve
KuroiNeko wrote:

> > Here's a simple design that I was tossing back and forth. Please understand that I'm not saying this is the best way to do it, or even a good way to do it. Just a possible way to do it.
>
> Sounds interesting. I certainly have reasons to play bad guy, but that's what I always do, so never mind :)

I think this feature deserves to be put on the TODO list under exotic features.

This feature would probably also be a threat to MySQL dominance in the web scripting area for websites with medium to high traffic ;)

Poul L. Christiansen
* Steve Wolfe <steve@iboats.com> [001031 13:47] wrote:

> > > (Incidentally, we've toyed around with developing a query-caching system that would sit between PostgreSQL and our DB libraries.
> >
> > Sounds amazing, but requires some research, I guess. However, in many cases one would be more than happy with cached connections. Of course, cached query results can be naturally added to that, but just connections are OK to start with. Security....
>
> To me, it doesn't sound like it would be that difficult of a project, at least not for the likes of the PostgreSQL developers. It also doesn't seem like it would really introduce any security problems, not if it were done inside of PostgreSQL. Long ago, I got sidetracked from my endeavors in C, and so I don't feel that I'm qualified to do it (otherwise, I would have done it already. : ) ). If you wanted it done in Perl or Object Pascal, I could help. : )
>
> Here's a simple design that I was tossing back and forth. Please understand that I'm not saying this is the best way to do it, or even a good way to do it. Just a possible way to do it. I haven't been able to give it as much thought as I would like to. Here goes.
>
> ------------
> Implementation
> [snip]

Karel Zak <zakkr@zf.jcu.cz> implemented stored procedures for PostgreSQL but still hasn't gotten a response about integrating them. You can find his second attempt to get a response from the developers here:

http://people.freebsd.org/~alfred/karel-pgsql.txt

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
> I think this feature deserves to be put on the TODO list under exotic > features. Well, it's kinda implemented already, I believe, with decades of being run unattended :) > This feature would probably also be a threat to MySQL dominance in the > web scripting area for websites with medium to high traffic ;) Dominance? Who needs it, anyway? -- contaminated fish and microchips huge supertankers on Arabian trips oily propaganda from the leaders' lips all about the future there's people over here, people over there everybody's looking for a little more air crossing all the borders just to take their share planning for the future Rainbow, Difficult to Cure
a) Don't log to a database. Log data should be sent to a process that collects any needed on-the-fly statistics and then writes to disk files (rotating hourly or daily depending on your needs). This model is becoming pretty standard with Apache now; look at rotatelogs in the Apache distribution for an example.

b) Number of records isn't really the issue. Query complexity and number of queries are more pertinent. Generally, for example, a single SELECT that pulls in multiple rows is much faster than a bunch of small SELECTs (a small sketch of this follows below).

c) For very high traffic, you are going to have multiple front-end servers. If you design the system carefully, you can have a single shared network disk used by all of your front ends, then just stack boxes in front of it. This doesn't give you endless scalability, though; at some point you'll saturate your network file server and/or database box.

d) PHP may not be a great choice. It doesn't provide a lot of hooks for effective caching of database connections and/or results. mod_perl or Java servlets may be better, depending on the details.

- Tim Kientzle
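A quick illustration of point (b), with made-up table and column names and PHP 4's pg_* API: one SELECT with an IN list replaces a burst of small SELECTs and their per-query round trips.

<?php
// Sketch only: fetch many rows in one query instead of one query per key.
$ids = array(11, 42, 97);

// The slow pattern: one round trip per id.
foreach ($ids as $id) {
    $res = pg_exec($conn, "SELECT name FROM users WHERE id = $id");
    $row = pg_fetch_array($res, 0);
    // ... use $row["name"] ...
}

// The faster pattern: a single round trip for all of them.
$res = pg_exec($conn,
    "SELECT id, name FROM users WHERE id IN (" . implode(",", $ids) . ")");
for ($i = 0; $i < pg_numrows($res); $i++) {
    $row = pg_fetch_array($res, $i);
    // ... use $row["id"] and $row["name"] ...
}
?>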
> d) PHP may not be a great choice. It doesn't provide a lot of hooks > for effective caching of database connections and/or results. > mod_perl or Java servlets may be better, depending on the details. One of our competitors spent a very, very large deal of money on high-end Sun equipment, so that they could write their CGI stuff in Java servlets. It still ran slow. We run Perl on machines that pale compared to theirs, and get far better performance. : ) steve
> We run Perl on machines that pale compared to theirs, and get far better performance. : )

Well, don't get me wrong, I'm not going to start a war here :) But CGI is so simple and straightforward that anything more than C is quite overkill (think assembly). Myself, I'm planning to port all my Perl stuff eventually. Yes, Perl is great for string handling, but when you spend a couple of weeks on BugTraq, you'll suddenly feel that it's still too much. When you only let `known good' values in, lex or regexp libs will do.

Sorry for the off-topic post; anyone interested is welcome to email me in private.

Ed

--
contaminated fish and microchips huge supertankers on Arabian trips oily propaganda from the leaders' lips all about the future there's people over here, people over there everybody's looking for a little more air crossing all the borders just to take their share planning for the future Rainbow, Difficult to Cure
"Steve Wolfe" <steve@iboats.com> writes: > One of our competitors spent a very, very large deal of money on high-end > Sun equipment, so that they could write their CGI stuff in Java servlets. > It still ran slow. We run Perl on machines that pale compared to theirs, > and get far better performance. : ) You can always do it slow if you don't design properly. A former customer saved a lot hardware and maintenance cost by migrating from a perl based publishing system to a Java based one. Less hardware, better performance and more functionality. ;-) The old perl system had been developed and maintained over a 4 year period - the initial development of the new Java based system took about 9 months. regards, Gunnar
i have a client which merged two companies, one running perl the other running java. what to do?

i convinced them to port both the perl and java code to INTERCAL, and run the whole system on an array of C-64's. works better than either of the perl or java stuff.

On Wed, Nov 01, 2000 at 01:58:55AM +0100, Gunnar R|nning wrote:

> "Steve Wolfe" <steve@iboats.com> writes:
>
> > One of our competitors spent a very, very large deal of money on high-end Sun equipment, so that they could write their CGI stuff in Java servlets. It still ran slow. We run Perl on machines that pale compared to theirs, and get far better performance. : )
>
> You can always do it slow if you don't design properly. A former customer saved a lot of hardware and maintenance cost by migrating from a Perl-based publishing system to a Java-based one. Less hardware, better performance and more functionality. ;-) The old Perl system had been developed and maintained over a 4-year period - the initial development of the new Java-based system took about 9 months.

--
[ Jim Mercer    jim@reptiles.org    +1 416 410-5633 ]
[ Reptilian Research -- Longer Life through Colder Blood ]
[ Don't be fooled by cheap Finnish imitations; BSD is the One True Code. ]
On Tue, 31 Oct 2000, Jim Mercer wrote: > i convinced them to port both the perl and java code to INTERCAL, and run > the whole system on an array of C-64's. But there are no bindings from Postgresql to intercal! ;) (I hope I didn't just give a bad idea to someone...;) -alex
On the topic of query cache (or maybe this is just tangential and I'm confused):

I've always heard that Oracle has the ability to essentially suck in as much of the database into RAM as you have memory to allow it, and can then just run its queries on that in-RAM database (or db subset) without doing disk I/O (which I would probably imagine is one of the more expensive parts of a given SQL command). I've looked for references as to Postgresql's ability to do something like this, but I've never been certain if it's possible. Can postgresql do this, please? And, if not, does it have to hit the disk for every SQL instruction (I would assume so)?

I would imagine that the actual query cache would be slightly orthogonal to this in-RAM database cache, in as much as it would actually store the results of specific queries, rather than the complete tuple set on which to run queries. However, I would imagine that both schemes would provide performance increases.

Also, as KuroiNeko writes below about placing the query cache outside the actual DBMS, don't some webservers (or at least specific custom coding implementations of them) just cache common query results themselves? (Not that it would necessarily be bad for the DBMS to do so, I wouldn't know enough about this to surmise.)

I'd appreciate any pointers to more information on specific performance tuning in this area (IMHO, it would probably be a boon to the postgresql database and its community, if there existed some reference like O'Reilly's _Oracle Performance Tuning_ that was focused on Postgresql.)

Thanks for any extra info,

Daniel

On Tue, 31 Oct 2000, KuroiNeko wrote:

> > Here's a simple design that I was tossing back and forth. Please understand that I'm not saying this is the best way to do it, or even a good way to do it. Just a possible way to do it.
>
> Sounds interesting. I certainly have reasons to play bad guy, but that's what I always do, so never mind :)
>
> However, there's one major point where I disagree. Not that I have real reasons, or observation or analysis, to back up my position; just a feeling. And the feeling is that the connection/query cache should be separate from the DBMS server itself. Several things come to mind right off, like the possibility of caching connections to different sources, like PGSQL and Oracle, as well as a chance to run this cache on a separate box that will perform various additional functions, like load balancing. But that's right on the surface. Still in doubt....
> I've looked for references as to Postgresql's ability to do something like this, but I've never been certain if it's possible. Can postgresql do this, please? And, if not, does it have to hit the disk for every SQL instruction (I would assume so)?

Doing so, as you might guess, is quite dangerous. E.g., RAM failures are extremely rare, with probability very close to 0, but nothing is absolutely reliable. From my, quite limited, experience, I can tell that PGSQL relies more on file caching (or whatever the term is) provided by the OS, rather than on slurping relations into RAM. See the recent discussion of [f]sync(); maybe it sheds more light.

> I would imagine that the actual query cache would be slightly orthogonal to this in-RAM database cache

Actually, there are several ways to keep the data in memory, each having its advantages, drawbacks and reasons. To name just a few: caching pages and files, mapping files, storing `internal' structures (like the tuples in your example) in shared memory areas. Apologists and enemies of each method come in all shapes, but real life is even worse. Often these methods interfere with each other, and an inaccurate combination (you cache the pages, but overlook the file caching performed by the OS) may easily become a bottleneck.

> I'd appreciate any pointers to more information on specific performance tuning in this area (IMHO, it would probably be a boon to the postgresql database and its community, if there existed some reference like O'Reilly's _Oracle Performance Tuning_ that was focused on Postgresql.)

As I see it, performance tuning with PGSQL should be concentrated on quality design of your DB and queries. I may be wrong, but there's not much to play with where PGSQL touches the system. Maybe it's bad, but I like it. General suggestions about fs performance apply to PGSQL and you don't have to re-invent the wheel. There are just files. Play with sync, install a RAID of SCSI drives, keep your swap on a separate controller. Nothing really special that would impact or, what's more important, interfere with other services running on the same box. Change must come from inside :) Here, inside is DB design.

Ed

--
contaminated fish and microchips huge supertankers on Arabian trips oily propaganda from the leaders' lips all about the future there's people over here, people over there everybody's looking for a little more air crossing all the borders just to take their share planning for the future Rainbow, Difficult to Cure
Daniel Freedman wrote:

> On the topic of query cache (or maybe this is just tangential and I'm confused):
>
> I've always heard that Oracle has the ability to essentially suck in as much of the database into RAM as you have memory to allow it, and can then just run its queries on that in-RAM database (or db subset) without doing disk I/O (which I would probably imagine is one of the more expensive parts of a given SQL command). I've looked for references as to Postgresql's ability to do something like this, but I've never been certain if it's possible. Can postgresql do this, please? And, if not, does it have to hit the disk for every SQL instruction (I would assume so)?

PostgreSQL hits the disk on UPDATE/DELETE/INSERT operations. SELECTs are cached, but the default cache is only ½MB of RAM. You can change this to whatever you want.

I'm using Cold Fusion and it can cache queries itself, so no database action is necessary. But I don't think PHP and others have this possibility. But Cold Fusion costs $1300. :(

Poul L. Christiansen
> PostgreSQL hits the disk on UPDATE/DELETE/INSERT operations. SELECT's > are cached, but the default cache is only ½MB of RAM. You can change > this to whatever you want. > > I'm using Cold Fusion and it can cache queries itself, so no database > action is necessary. But I don't think PHP and others have this > possibility. But Cold Fusion costs 1300$ :( No, PHP has this. -- Sincerely Yours, Denis Perchine ---------------------------------- E-Mail: dyp@perchine.com HomePage: http://www.perchine.com/dyp/ FidoNet: 2:5000/120.5 ----------------------------------
On Wed, Nov 01, 2000 at 10:16:58AM +0000, Poul L. Christiansen wrote:

> PostgreSQL hits the disk on UPDATE/DELETE/INSERT operations. SELECTs are cached, but the default cache is only ½MB of RAM. You can change this to whatever you want.

That sounds like a very cool thing to do, and the default seems awfully conservative, given the average server's RAM equipment nowadays. If you have a small Linux server with 128 MB of RAM, it would be interesting to see what happens, performance-wise, if you increase the cache for selects to, for instance, 64 MB. Has anyone tried to benchmark this? How would you benchmark it? Where do you change this cache size? How do you keep the cache from being swapped out to disk (which would presumably all but eradicate the benefits of such a measure)?

Cheers

Frank

--
frank joerdens

joerdens new media
urbanstr. 116
10967 berlin
germany

e: frank@joerdens.de
t: +49 (0)30 69597650
f: +49 (0)30 7864046
h: http://www.joerdens.de

pgp public key: http://www.joerdens.de/pgp/frank_joerdens.asc
Frank Joerdens wrote:

> That sounds like a very cool thing to do, and the default seems awfully conservative, given the average server's RAM equipment nowadays. If you have a small Linux server with 128 MB of RAM, it would be interesting to see what happens, performance-wise, if you increase the cache for selects to, for instance, 64 MB. Has anyone tried to benchmark this? How would you benchmark it? Where do you change this cache size? How do you keep the cache from being swapped out to disk (which would presumably all but eradicate the benefits of such a measure)?

I have a PostgreSQL server with 80MB of RAM running Red Hat Linux 7.0, and in my /etc/rc.d/init.d/postgresql start script I have these 2 lines that start the postmaster:

echo 67108864 > /proc/sys/kernel/shmmax
su -l postgres -c "/usr/bin/pg_ctl -D $PGDATA -p /usr/bin/postmaster -o '-i -B 4096 -o -F' start >/dev/null 2>&1" < /dev/null

The first line increases the maximum shared memory to 64MB. The "-B 4096" gives the postmaster 4096 * 8kB = 32MB of shared buffers.

I haven't benchmarked it, but I know it's MUCH faster.

Poul L. Christiansen
> How do you > keep the cache from being swapped out to disk (which would presumably > all but eradicate the benefits of such a measure)? You make sure that you have enough RAM that you aren't using swap. : ) Seriously, as cheap as RAM is today, if a machine uses swap more than occasionally, an upgrade is in order. steve
Performance depends on a lot of factors. Shelling out $$$ for Sun hardware doesn't guarantee good performance. They might have been better off buying a Tru64 system with Compaq's JDK.

Steve Wolfe wrote:

> > d) PHP may not be a great choice. It doesn't provide a lot of hooks for effective caching of database connections and/or results. mod_perl or Java servlets may be better, depending on the details.
>
> One of our competitors spent a very, very large deal of money on high-end Sun equipment, so that they could write their CGI stuff in Java servlets. It still ran slow. We run Perl on machines that pale compared to theirs, and get far better performance. : )
>
> steve

--
Joseph Shraibman
jks@selectacast.net
Increase signal to noise ratio. http://www.targabot.com
> Performance depends on a lot of factors. Shelling out $$$ for Sun > hardware doesn't garuntee good performance. They might have been better > off buying a Tru64 system with Compaq's jdk. Yeah, it could be. But comparing the $7,000 Intel machine I built against a $20,000 Alpha, I'm still very happy with Intel. Yes, the Alpha was faster on a per-processor basis. But it also cost more than twice as much on a dollar-for-transaction basis. ; ) steve