Thread: Postgre and Web Request
Hi,

I am a new user of PostgreSQL and I have some questions about its performance in a scenario with a high request rate.

Let's picture an imaginary scenario: in my system (Debian Linux), there are 200,000,000 records in the database and a total of 10,000 different users. (The manual states the following: there is a main process called the postmaster, and it starts a new process for each different request and each different user ... I don't understand this very well ... please correct me if I'm wrong.)

If all users try to access the database through the web at the same time, what happens:

1. With the OS? Will it crash?
2. Will the postmaster start up 10,000 different processes at the same time?
3. What about performance? Is there any performance downgrade?
4. What is the best solution for this problem?
5. How many simultaneous requests can the postmaster handle without decreasing performance?

Greetings,
Kilmer Cruz
> (The manual states the following: there is a main process called the postmaster,
> and it starts a new process for each different request and each different user ...
> I don't understand this very well ... please correct me if I'm wrong.)
> If all users try to access the database through the web at the same time, what happens:

Well, the likelihood of 10,000 users accessing the website at the EXACT same time is vanishingly small. Remember the web is stateless, so each request is processed and then the connection is terminated. This means that the connection to PostgreSQL will also be dropped (unless you use something like connection pooling or persistent connections).

> 1. With the OS? Will it crash?

See above. And if you really did have a situation where 10,000 users accessed the site at the exact same time, you would need a pretty large cluster to handle it ... regardless of PostgreSQL.

> 2. Will the postmaster start up 10,000 different processes at the same time?

See above.

> 3. What about performance? Is there any performance downgrade?

Depends on the machine, but that would be the case with any database.

> 4. What is the best solution for this problem?

You are asking for a solution to a problem that doesn't really exist at the database level. You need to spec your hardware requirements overall and get a firmer understanding of your application requirements. Like I said, it is highly unlikely that your demands are as high as you mention.

> 5. How many simultaneous requests can the postmaster handle without decreasing performance?

Depends on the machine ... RAM, number of hard disks (not space, quantity of disks), etc.

Sincerely,

Joshua D. Drake
Kilmer C. de Souza wrote:
> I am a new user of PostgreSQL and I have some questions about its performance
> in a scenario with a high request rate.
> Let's picture an imaginary scenario: in my system (Debian Linux), there are
> 200,000,000 records in the database and a total of 10,000 different users.
> (The manual states the following: there is a main process called the postmaster,
> and it starts a new process for each different request and each different user ...
> I don't understand this very well ... please correct me if I'm wrong.)

You're right. It will start a separate process for each connection.

> If all users try to access the database through the web at the same time, what happens:
> 1. With the OS? Will it crash?

No. At least, no self-respecting POSIX system will crash. None that I know of, anyway.

> 2. Will the postmaster start up 10,000 different processes at the same time?

No.

> 3. What about performance? Is there any performance downgrade?

Yes. At that load you'll almost definitely see a massive performance problem.

> 4. What is the best solution for this problem?

You're presenting an unrealistic scenario, and I'll explain why in a moment.

> 5. How many simultaneous requests can the postmaster handle without decreasing performance?

Depends on the hardware.

The fact is, the scenario of "10,000 users access at the same time" will almost never happen ... especially not through the web. That would be one tremendously popular website.

First off, any web server I've ever seen puts a cap on the number of simultaneous connections, usually around a few hundred. Let's say your web server has a cap of 200 simultaneous connections (not unusual) and you get 10,000 requests at exactly the same moment (unlikely in any case). Your web server will immediately start servicing 200 of the requests. The remaining 9,800 will be queued and handled as soon as one of the 200 completes. Since web requests generally finish fairly quickly, you'll actually see all 10,000 get serviced in short order, although not as quickly as the 9,999th surfer would like, I'm sure. However, it's likely that your operating system won't be able to queue that big a backlog, and quite a few of those attempts will return an error that the server is too busy.

On the flip side, let's do some numbers: if you're getting 10,000 requests per second, that's 864,000,000 hits per day ... are you actually expecting that amount of traffic? 10,000 per second doesn't even qualify as "at the same time".

Evaluating your needs is better accomplished by calculating the maximum load over a fixed period of time, determining how long an average request takes, and using that to figure out the number of processes that will need to be running.

For example: if I figure that between 9:00 and 11:00 am is the busiest it will get, and I'll get approximately 100,000 hits in that window, that's about 14 hits per second; and if each request takes about 3 seconds, I can figure that I'll have about 42 requests active during any one second of that time. Not too bad of a load. To be safe, I double that, set Apache's max processes to 100, and set Postgres' max processes to the same. Then I spec out hardware that can handle the load and I'm off and running.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com
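Bill's sizing arithmetic, written out as a minimal PHP sketch; the busy window, hit count, and per-request time are his example figures rather than measurements, and the variable names are made up for illustration:

<?php
// Back-of-the-envelope sizing, following the reasoning in the message above.
// All inputs are illustrative assumptions, not measurements.
$window_seconds  = 2 * 3600;  // busiest window: 9:00-11:00 am
$hits_in_window  = 100000;    // expected hits in that window
$request_seconds = 3;         // average time to service one request

$hits_per_second = $hits_in_window / $window_seconds;    // ~14
$active_requests = $hits_per_second * $request_seconds;  // ~42 concurrent (L = lambda * W)
$process_cap     = ceil($active_requests * 2);           // double for safety -> ~84; Bill rounds to 100

printf("%.1f hits/sec, ~%.0f active requests, cap processes around %d\n",
       $hits_per_second, $active_requests, $process_cap);

// Apache's MaxClients and PostgreSQL's max_connections would then be set
// to roughly this figure, and hardware specced for that load.
?>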
Kilmer C. de Souza said:
> I am a new user of PostgreSQL and I have some questions about its performance
> in a scenario with a high request rate.
> Let's picture an imaginary scenario: in my system (Debian Linux), there are
> 200,000,000 records in the database and a total of 10,000 different users.
> (The manual states the following: there is a main process called the postmaster,
> and it starts a new process for each different request and each different user ...
> I don't understand this very well ... please correct me if I'm wrong.)
> If all users try to access the database through the web at the same time, what happens:
> 1. With the OS? Will it crash?
> 2. Will the postmaster start up 10,000 different processes at the same time?
> 3. What about performance? Is there any performance downgrade?
> 4. What is the best solution for this problem?
> 5. How many simultaneous requests can the postmaster handle without decreasing performance?

Depending on your web development environment (Java, PHP, .NET, etc.), you should be able to use some mechanism that provides a pool of connections to the database. Each request does not open a new connection (and then release it), but instead gets a connection from the pool to use and returns it to the pool when done.

On a recent Java website I was involved with, which was serving on average 3 page requests per second, we only used a pool of 8 connections, and some page requests required multiple queries to generate all the data prior to rendering the page. I can't remember the number of concurrent users, but you get the idea that even a small pool can service lots of traffic (if your SQL queries are nicely optimized).

I'm afraid I cannot answer your specific questions on how many simultaneous active connections Postgres will support; I suspect that it is limited by memory and hardware. Perhaps someone else can help.

John Sidney-Woollett
Oww ... sorry man ... I made a mistake ... there are 10,000 users, and 1,000 of the 10,000 try to access the database at the same time. Can you help me again with this condition?

Greetings,
Kilmer Cruz
> I made a mistake ... there are 10,000 users, and 1,000 of the 10,000 try to
> access the database at the same time.

I have problems with your numbers. Even if you have 10,000 users who are ALL online at the same time, in any reasonable period of time (say 60 seconds), how many of them would initiate a request? In most online applications, 95% OR MORE of all time is spent waiting for the user to do something. Web-based applications seem to fit that rule fairly well, because nothing happens at the server end for any given user until a 'submit' button is pressed.

Consider, for example, a simple name-and-address entry form. A really fast typist can probably fill out 60-70 of them in an hour. That means each user submits a request every 50-60 seconds. Thus, even if there were 10,000 users doing this FULL TIME, they would generate something under 200 requests/second. In practice, I wouldn't expect to see more than 50-75 requests/second, and it shouldn't be too hard to design a hardware configuration capable of supporting that; disk speed and memory size are likely to be the major bottleneck points.

I don't know if anyone has ever set up a queuing-theory model for a PostgreSQL+Apache environment; there are probably too many individual tuning factors (not to mention application-specific factors) to make a generalizable model practical.

--
Mike Nolan
Kilmer:

Rhetorical question: I read in the documentation for Apache and IIS that they create a new process or thread for each user. Can your hardware handle 10,000 simultaneous threads? : ) Probably not, and there is a reason that it doesn't need to ...

If your web setup uses connection pooling, ask the person who is setting that up for a better explanation. If not, then I'll assume that you won't use connection pooling.

If you don't use connection pooling, a new process is created for each web hit, but that process ends as soon as the web hit is finished. Thus, the number of users you have isn't the number to look at. What matters is how many hits per second you get at your peak.

We get about 2 hits per second. Assuming that each hit takes 1/10 of a second, that means that (on average) for 2/10 of a second the database is doing work, and for 8/10 of a second it is idle, with NO processes running other than the postmaster. Even if all your users are online at the same time, the hit rate is what matters. Get out your web logs and see how much traffic you get.

As a rough rule, unless you get more than 10 hits per second or your web app does something unusual, you don't need to worry about any of this stuff at all.

Paul Tillotson
listas@miti.com.br ("Kilmer C. de Souza") writes:
> Oww ... sorry man ...
> I made a mistake ... there are 10,000 users, and 1,000 of the 10,000 try to
> access the database at the same time.
> Can you help me again with this condition?

The issues don't really change. Opening 1,000 concurrent connections means spawning 1K PostgreSQL processes, which will reserve a pile of memory and cause a pretty severe performance problem.

It is _vital_ that your application uses some form of "connection pooling" so that it can share 50-100 connections across the requests rather than opening 1K connections. How that is done will depend on your favored web application framework. Most frameworks have some notion of a "connection pool," so this certainly shouldn't need to be a crippling problem.

--
(format nil "~S@~S" "cbbrowne" "ntlug.org") http://cbbrowne.com/info/lsf.html
Rules of the Evil Overlord #75. "I will instruct my Legions of Terror to
attack the hero en masse, instead of standing around waiting while members
break off and attack one or two at a time." <http://www.eviloverlord.com/>
On Wednesday 28 April 2004 20:56, Joshua D. Drake wrote:
> > 1. With the OS? Will it crash?
>
> See above. And if you really did have a situation where 10,000 users accessed
> the site at the exact same time, you would need a pretty large cluster to
> handle it ... regardless of PostgreSQL.

Excuse my stupid question, Joshua, but how do you make a PostgreSQL cluster??? I'm really interested in this solution ...

Regards,
--
Bill Footcow
Hervé Piedvache wrote:
> On Wednesday 28 April 2004 20:56, Joshua D. Drake wrote:
>>> 1. With the OS? Will it crash?
>>
>> See above. And if you really did have a situation where 10,000 users accessed
>> the site at the exact same time, you would need a pretty large cluster to
>> handle it ... regardless of PostgreSQL.
>
> Excuse my stupid question, Joshua, but how do you make a PostgreSQL
> cluster??? I'm really interested in this solution ...

You seem to misunderstand. In PostgreSQL, a database installation is called a cluster. When you run initdb, it initializes a database area in which more than one database can be created. Since such an installation holds more than one database, it is called a database cluster.

It is not like a Beowulf cluster ..:-)

HTH

Shridhar
On Thursday 29 April 2004 08:56, Shridhar Daithankar wrote:
> Hervé Piedvache wrote:
> > On Wednesday 28 April 2004 20:56, Joshua D. Drake wrote:
> >>> 1. With the OS? Will it crash?
> >>
> >> See above. And if you really did have a situation where 10,000 users
> >> accessed the site at the exact same time, you would need a pretty large
> >> cluster to handle it ... regardless of PostgreSQL.
> >
> > Excuse my stupid question, Joshua, but how do you make a PostgreSQL
> > cluster??? I'm really interested in this solution ...
>
> You seem to misunderstand. In PostgreSQL, a database installation is called
> a cluster. When you run initdb, it initializes a database area in which more
> than one database can be created. Since such an installation holds more than
> one database, it is called a database cluster.
>
> It is not like a Beowulf cluster ..:-)

OK ... but sorry, I may be misunderstanding something more ... why is Joshua talking about a large cluster? Why the need for several databases for one application? And how can you maintain one application with several databases? (Replication? I'm not sure there is a realtime replication solution for PostgreSQL able to accept 10,000 inserts per second ... for the moment!?)

And a second point ... talking about Beowulf clusters ... is there any such solution for PostgreSQL? We hear many times about a MySQL solution for that ... why doesn't PostgreSQL have this kind of solution yet?

Regards,
--
Hervé
Hervé Piedvache wrote:
> On Thursday 29 April 2004 08:56, Shridhar Daithankar wrote:
>> You seem to misunderstand. In PostgreSQL, a database installation is called
>> a cluster. When you run initdb, it initializes a database area in which more
>> than one database can be created. Since such an installation holds more than
>> one database, it is called a database cluster.
>>
>> It is not like a Beowulf cluster ..:-)
>
> OK ... but sorry, I may be misunderstanding something more ... why is Joshua
> talking about a large cluster? Why the need for several databases for one
> application?

Yeah, right. After I sent the last reply, I realized Joshua might actually have meant a cluster as in several machines working together. But anyway, what I explained is also correct; I'm just not sure now whether it is the same context.

> And how can you maintain one application with several databases?

Well, you can use dblink to link databases on different machines into a central "hub", but that needs to be part of the database design from the start, and there are some limitations to it. Check the dblink module in contrib. It can be made to do some interesting work ..:-)

> (Replication? I'm not sure there is a realtime replication solution for
> PostgreSQL able to accept 10,000 inserts per second ... for the moment!?)

It can accept that many inserts per second given enough disk bandwidth and a few concurrent connections. I wouldn't be surprised if it reaches that rate with a fibre channel disk array and 100+ connections concurrently inserting data on a middle- to high-end machine. Disk bandwidth is very crucial for databases.

> And a second point ... talking about Beowulf clusters ... is there any such
> solution for PostgreSQL? We hear many times about a MySQL solution for
> that ... why doesn't PostgreSQL have this kind of solution yet?

Ohh, good question ... there are so many facets to this problem that it would take a book ...:-)

There are many replication solutions for PostgreSQL. Most of them do async replication quite nicely. Synchronous multimaster replication isn't there AFAIK, but I could be out of touch. Check gborg (http://gborg.postgresql.org) for replication projects; I can't seem to reach it right now.

Shridhar
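For anyone who hasn't used it, here is a minimal sketch of what a contrib/dblink query looks like from PHP. The host names, credentials, and the "orders" table are made up for illustration, and it assumes the dblink functions are installed in the local ("hub") database:

<?php
// Minimal sketch: querying a remote database through contrib/dblink.
$hub = pg_connect("host=localhost dbname=hub user=app")
       or die("could not connect to hub database");

// dblink(connstr, sql) returns SETOF RECORD, so the column list must be
// spelled out in the AS clause.
$sql = "SELECT *
          FROM dblink('host=remote1 dbname=sales user=app',
                      'SELECT id, total FROM orders WHERE total > 100')
            AS t(id integer, total numeric)";

$res = pg_query($hub, $sql) or die(pg_last_error($hub));
while ($row = pg_fetch_assoc($res)) {
    echo $row['id'], " ", $row['total'], "\n";
}
?>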
>> (Replication? I'm not sure there is a realtime replication solution for
>> PostgreSQL able to accept 10,000 inserts per second ... for the moment!?)

pgpool (announced on the general/announce lists) is a synchronous single-master replication solution, and it runs UPDATE/INSERT at about 2/3 the speed of a normal PostgreSQL. So if you are lucky enough to acquire hardware good enough to run a normal PostgreSQL at 15,000 INSERTs per second, you could obtain that speed with replication.

>> And a second point ... talking about Beowulf clusters ... is there any such
>> solution for PostgreSQL? We hear many times about a MySQL solution for
>> that ... why doesn't PostgreSQL have this kind of solution yet?
>
> Ohh, good question ... there are so many facets to this problem that it would
> take a book ...:-)
>
> There are many replication solutions for PostgreSQL. Most of them do async
> replication quite nicely. Synchronous multimaster replication isn't there
> AFAIK, but I could be out of touch.

PGCluster (http://www.csra.co.jp/~mitani/jpug/pgcluster/en/index.html) is a synchronous multimaster replication solution.
--
Tatsuo Ishii
Hello,

> Depending on your web development environment (Java, PHP, .NET, etc.), you
> should be able to use some mechanism that provides a pool of connections to
> the database. Each request does not open a new connection (and then release
> it), but instead gets a connection from the pool to use and returns it to
> the pool when done.

Where can I find some examples of connection pooling with PHP? Or must I just use persistent connections?

Greetings,
Stefan Sturm
On Wed, 2004-04-28 at 17:58, Chris Browne wrote:
> listas@miti.com.br ("Kilmer C. de Souza") writes:
> > Oww ... sorry man ...
> > I made a mistake ... there are 10,000 users, and 1,000 of the 10,000 try to
> > access the database at the same time.
> > Can you help me again with this condition?
>
> The issues don't really change. Opening 1,000 concurrent connections means
> spawning 1K PostgreSQL processes, which will reserve a pile of memory and
> cause a pretty severe performance problem.

I think you need some qualifiers on that statement, since opening the processes themselves should cause little to no problem at all given the right hardware. The main database I work on is currently set to handle up to 825 simultaneous connections during peak times, and that is with Perl DBI-style connection pooling. If it weren't for I/O issues, I'm pretty sure PostgreSQL would have no problem at all running that load, which really only means we need to get a faster disk system set up. (Currently the data and WAL live on a single 10,000 RPM SCSI drive.)

While I agree with everyone else in this thread that the OP is not likely to ever need such a high connection count, there's no reason that PostgreSQL can't support it, given that you have enough RAM and fast enough disks and you don't shoot yourself in the foot with FK/locking issues in the app.

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Stefan Sturm wrote:
>> Depending on your web development environment (Java, PHP, .NET, etc.), you
>> should be able to use some mechanism that provides a pool of connections to
>> the database. Each request does not open a new connection (and then release
>> it), but instead gets a connection from the pool to use and returns it to
>> the pool when done.
>
> Where can I find some examples of connection pooling with PHP? Or must I
> just use persistent connections?

PHP handles connection pooling more or less automatically ... as long as you use pg_pconnect() instead of pg_connect().

You have to be careful using connection pooling. For example, if you create a temporary table and forget to delete it when the script completes, the next time the script runs it's likely that the connection will be reused, and the script will cause an error because the temp table already exists.

Here are some docs at the PHP web site that may help:
http://us4.php.net/manual/en/features.persistent-connections.php

--
Bill Moran
Potential Technologies
http://www.potentialtech.com
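A minimal sketch of the persistent-connection pattern and the temp-table pitfall Bill describes; the connection parameters and the "scratch" table are made up for illustration:

<?php
// Persistent connection: PHP keeps it open across requests served by the
// same web server process, so session-local state (like temp tables) can
// survive from one script run into the next.
$conn = pg_pconnect("host=localhost dbname=mydb user=webapp")
        or die("connection failed");

// If an earlier request created this temp table on the same reused
// connection and never dropped it, this CREATE will fail.
pg_query($conn, "CREATE TEMP TABLE scratch (id integer)")
        or die(pg_last_error($conn));

pg_query($conn, "INSERT INTO scratch VALUES (1)");
$count = pg_fetch_result(pg_query($conn, "SELECT count(*) FROM scratch"), 0, 0);
echo "rows in scratch: $count\n";

// Drop it explicitly so the next request that reuses this connection
// does not trip over the leftover table.
pg_query($conn, "DROP TABLE scratch");
?>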
>> Depending on your web development environment (Java, PHP, .NET, etc.), you
>> should be able to use some mechanism that provides a pool of connections to
>> the database. Each request does not open a new connection (and then release
>> it), but instead gets a connection from the pool to use and returns it to
>> the pool when done.
>
> Where can I find some examples of connection pooling with PHP? Or must I
> just use persistent connections?

Use pgpool (ftp://ftp.sra.co.jp/pub/cmd/postgres/pgpool/pgpool-1.1.tar.gz).
--
Tatsuo Ishii
Bill Moran wrote:
> Stefan Sturm wrote:
>> Where can I find some examples of connection pooling with PHP?
>> Or must I just use persistent connections?
>
> PHP handles connection pooling more or less automatically ... as long as you
> use pg_pconnect() instead of pg_connect().
>
> You have to be careful using connection pooling. For example, if you create a
> temporary table and forget to delete it when the script completes, the next
> time the script runs it's likely that the connection will be reused, and the
> script will cause an error because the temp table already exists.

Gavin Sherry added the ON COMMIT DROP clause to CREATE TEMPORARY TABLE in 7.4, which, depending upon the expected life-cycle of the temporary table, may prove useful in the above scenario.

FWIW,

Mike Mascari
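A small sketch of that clause in use on a persistent connection (PostgreSQL 7.4 or later; the connection parameters and the "scratch" table are again made up for illustration):

<?php
// Same idea as before, but letting PostgreSQL clean up: with ON COMMIT DROP
// the temp table only lives for the enclosing transaction, so nothing leaks
// into the next request that reuses this persistent connection.
$conn = pg_pconnect("host=localhost dbname=mydb user=webapp")
        or die("connection failed");

pg_query($conn, "BEGIN");
pg_query($conn, "CREATE TEMP TABLE scratch (id integer) ON COMMIT DROP")
        or die(pg_last_error($conn));
pg_query($conn, "INSERT INTO scratch VALUES (1)");
// ... use the table here ...
pg_query($conn, "COMMIT");  // the temp table is dropped automatically at commit
?>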
On Thu, 29 Apr 2004, Tatsuo Ishii wrote:
> > Where can I find some examples of connection pooling with PHP? Or must I
> > just use persistent connections?
>
> Use pgpool
> (ftp://ftp.sra.co.jp/pub/cmd/postgres/pgpool/pgpool-1.1.tar.gz).

Tatsuo, I just tried pgpool (not replication yet) and noticed warnings appearing in pgsql.log (I just changed the port number in my Perl script to 9999):

Apr 29 19:19:59 mira postgres[363]: [4-1] WARNING: there is no transaction in progress

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
> Tatsuo, I just tried pgpool (not replication yet) and noticed warnings
> appearing in pgsql.log (I just changed the port number in my Perl script
> to 9999):
>
> Apr 29 19:19:59 mira postgres[363]: [4-1] WARNING: there is no transaction in progress

It's harmless. pgpool issues "ABORT" and "RESET ALL" to the backend each time a client connects to pgpool over a pooled connection. This is necessary since the previous client might have disconnected from pgpool without closing its transaction.
--
Tatsuo Ishii