Thread: max_connections limit violation not showing in pg_stat_activity
Hello

We are using PostgreSQL 9.3.10 on RedHat (probably 6.x). The database is hosted by an internal service provider and we have superuser access to it over a PG client, e.g. psql, but not to the OS. For that reason we only have access to the log files indirectly, using some of the built-in system functions like pg_ls_dir, etc.

Among other things the database is the backend for a web application that expects a load of some hundred users at a time (those are participants in online surveys that we use for computing economic indicators and access the system every month). The whole number of people expected is above 5000, but we don't expect very high concurrent access to the database. As mentioned, a few hundred at the beginning of the surveys.

To be sure that we won't have problems at peak times we created a load test using Gatling that ramps up to 1000 users in 5 minutes in bunches of 10. At the beginning we had problems with the web server response that we were able to correct. Now we face a problem with the max_connections limit of PostgreSQL. Currently it is set to the default of 100. We are going to look into it and either increase that limit or consider connection pooling.

What bothers me however is that running a query on pg_stat_activity with a watch of 1 second never shows more than 37 concurrent connections.

SELECT count(*) FROM pg_stat_activity;
\watch 1

Due to that fact it took us quite some time to figure out that the bottleneck had become the database. We discovered it after looking into the log files (as mentioned above this is not very straightforward, in particular because the logs tend to become quite huge). I assume that the peaks of requests violating the limit happen between two calls of the query.

Is there a better way to keep track of this kind of problem? It felt a bit weird not to be able to discover the issue sooner. And what would be a reasonable strategy to deal with the problem at hand? Increasing max_connections has repercussions on the configuration of work_mem (if I remember correctly) or, on the other hand, on the amount of physical memory that must be available on the system.

On Thursday we are going to have a meeting with our DB hosting provider to discuss which improvements need to be made to meet the requirements of our applications (the web application mentioned is not the only one using the database, but it is the only one where we expect such peaks). So I'd be very grateful for advice on this subject.

Thank you.
Regards
Charles

--
Swiss PostgreSQL Users Group
c/o Charles Clavadetscher
Treasurer
Motorenstrasse 18
CH – 8005 Zürich
http://www.swisspug.org
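For reference, a sketch of the same sampling approach broken down by session state (the state column of pg_stat_activity is available on 9.3; the 1-second interval is only an example, and this query has the same blind spot between samples that is discussed below):

SELECT state, count(*)
FROM pg_stat_activity
GROUP BY state
ORDER BY count(*) DESC;
\watch 1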
On Tue, Nov 22, 2016 at 12:48 PM, Charles Clavadetscher <clavadetscher@swisspug.org> wrote:

> We are using PostgreSQL 9.3.10 on RedHat (probably 6.x).

Is it possible to upgrade? You are missing over a year's worth of fixes for serious bugs and security vulnerabilities.

https://www.postgresql.org/support/versioning/

> Among other things the database is the backend for a web application that expects a load of some hundred users at a time (those are participants in online surveys that we use for computing economic indicators and access the system every month). The whole number of people expected is above 5000, but we don't expect very high concurrent access to the database. As mentioned, a few hundred at the beginning of the surveys.
>
> To be sure that we won't have problems at peak times we created a load test using Gatling that ramps up to 1000 users in 5 minutes in bunches of 10. At the beginning we had problems with the web server response that we were able to correct. Now we face a problem with the max_connections limit of PostgreSQL. Currently it is set to the default of 100. We are going to look into it and either increase that limit or consider connection pooling.

On a web site with about 3000 active users, I found (through adjusting the connection pool size on the production database and monitoring performance) that we got best performance with a pool of about 40 connections. This was on a machine with 16 cores (never count HT "threads" as cores), 512GB RAM, and a RAID with 40 drives of spinning rust.

http://tbeitr.blogspot.com/2015/11/for-better-service-please-take-number.html

> What bothers me however is that running a query on pg_stat_activity with a watch of 1 second never shows more than 37 concurrent connections.
>
> SELECT count(*) FROM pg_stat_activity;
> \watch 1

At the times when the resources are overloaded by more connections than they can efficiently service -- well, that's precisely the time that a sleeping "monitoring" process is least likely to be given a time slice to run. If you can manage to get pgbadger to run in your environment, and you turn on logging of connections and disconnections, you will be able to get far more accurate information.

> Increasing max_connections has repercussions on the configuration of work_mem (if I remember correctly)

Each connection can allocate one work_mem allocation per node which requires a sort, hash, CTE, etc.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
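A sketch of the kind of postgresql.conf settings this refers to: the parameter names are standard PostgreSQL options, but the prefix format and the slow-statement threshold are only suggestions and should be checked against the prefix pgbadger expects in your setup:

log_connections = on
log_disconnections = on
log_line_prefix = '%t [%p]: user=%u,db=%d,app=%a,client=%h '
log_min_duration_statement = 250   # in milliseconds; also captures slow statements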
Re: max_connections limit violation not showing in pg_stat_activity
From: "Charles Clavadetscher"
Hello Kevin

Thank you very much for your input. I appreciate it very much.

> -----Original Message-----
> From: Kevin Grittner [mailto:kgrittn@gmail.com]
> Sent: Tuesday, 22 November 2016 22:37
> To: Charles Clavadetscher <clavadetscher@swisspug.org>
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] max_connections limit violation not showing in pg_stat_activity
>
> On Tue, Nov 22, 2016 at 12:48 PM, Charles Clavadetscher <clavadetscher@swisspug.org> wrote:
>
>> We are using PostgreSQL 9.3.10 on RedHat (probably 6.x).
>
> Is it possible to upgrade? You are missing over a year's worth of fixes for serious bugs and security vulnerabilities.

Yes. Actually it is foreseen to move to 9.6 (and RedHat 7.x).

> https://www.postgresql.org/support/versioning/
>
>> Among other things the database is the backend for a web application that expects a load of some hundred users at a time (those are participants in online surveys that we use for computing economic indicators and access the system every month). The whole number of people expected is above 5000, but we don't expect very high concurrent access to the database. As mentioned, a few hundred at the beginning of the surveys.
>>
>> To be sure that we won't have problems at peak times we created a load test using Gatling that ramps up to 1000 users in 5 minutes in bunches of 10. At the beginning we had problems with the web server response that we were able to correct. Now we face a problem with the max_connections limit of PostgreSQL. Currently it is set to the default of 100. We are going to look into it and either increase that limit or consider connection pooling.
>
> On a web site with about 3000 active users, I found (through adjusting the connection pool size on the production database and monitoring performance) that we got best performance with a pool of about 40 connections. This was on a machine with 16 cores (never count HT "threads" as cores), 512GB RAM, and a RAID with 40 drives of spinning rust.

OK. I will have to check with our hosting people how many cores we have or can have on the new environment. I have seen that there are pgBouncer and pgPool. Would you recommend one of those? AFAICS both are installed on the client side, so we should be able to use them.

> http://tbeitr.blogspot.com/2015/11/for-better-service-please-take-number.html
>
>> What bothers me however is that running a query on pg_stat_activity with a watch of 1 second never shows more than 37 concurrent connections.
>>
>> SELECT count(*) FROM pg_stat_activity;
>> \watch 1
>
> At the times when the resources are overloaded by more connections than they can efficiently service -- well, that's precisely the time that a sleeping "monitoring" process is least likely to be given a time slice to run. If you can manage to get pgbadger to run in your environment, and you turn on logging of connections and disconnections, you will be able to get far more accurate information.

Yes, it sounds reasonable. I assumed that this kind of measurement has a higher priority or reserved slots, since those occasions are when it is most needed.

And thank you for the hint to pgbadger. I will take a look into it, but an installation on the server depends completely on our hosting service people. I am not sure this is feasible, but I can imagine an alternative scenario, using a local installation for tuning and then passing the parameters to the hosters for implementation. Might help.

Regards
Charles

>> Increasing max_connections has repercussions on the configuration of work_mem (if I remember correctly)
>
> Each connection can allocate one work_mem allocation per node which requires a sort, hash, CTE, etc.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
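To make that last point concrete with a rough, purely illustrative calculation (the figures are assumptions, not measurements from this system): with max_connections = 100, work_mem = 16MB, and queries that each happen to use 3 sort or hash nodes, the theoretical worst case is about

100 connections x 3 nodes x 16MB = 4800MB (roughly 4.7GB)

of work_mem allocations on top of shared_buffers and the OS cache, which is why raising max_connections and sizing work_mem have to be weighed against the physical memory actually available.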
On Wed, Nov 23, 2016 at 4:43 AM, Charles Clavadetscher <clavadetscher@swisspug.org> wrote:
> From: Kevin Grittner [mailto:kgrittn@gmail.com]
>
>> Is it possible to upgrade? You are missing over a year's worth of fixes for serious bugs and security vulnerabilities.
>
> Yes. Actually it is foreseen to move to 9.6 (and RedHat 7.x).
>
>> https://www.postgresql.org/support/versioning/

An upgrade from 9.3.x to 9.6.x is a major release upgrade, which gets you new features and usually gets you improvements in performance and scalability. The 9.3 major release will be supported for almost 2 more years, so I wasn't so concerned about that as about being on 9.3.10 when the latest bug fix version of 9.3 is 9.3.15. To avoid hitting bugs that others have already hit and reported, with fixes published, it is wise to try to upgrade to the latest minor release fairly quickly. If the minor release fixes a serious security vulnerability, I think it is a good idea to update within a day or two of release.

> I will have to check with our hosting people how many cores we have or can have on the new environment. I have seen that there are pgBouncer and pgPool. Would you recommend one of those?

pgBouncer is more lightweight, so if you don't need any of the features present only in pgPool, I would go with pgBouncer. Depending on your application software environment, it may be even better to use a pool built into the application development framework. There are several very good pools available to Java environments. I can personally attest to the quality of Apache dbcp, but others have mentioned that they like other Java connection pools even better. You might want to search the archives, or maybe someone will mention the others again on this thread.

>>> SELECT count(*) FROM pg_stat_activity;
>>> \watch 1
>>
>> At the times when the resources are overloaded by more connections than they can efficiently service -- well, that's precisely the time that a sleeping "monitoring" process is least likely to be given a time slice to run. If you can manage to get pgbadger to run in your environment, and you turn on logging of connections and disconnections, you will be able to get far more accurate information.
>
> Yes, it sounds reasonable. I assumed that this kind of measurement has a higher priority or reserved slots, since those occasions are when it is most needed.

There is no such feature in PostgreSQL. It might be worthwhile, although how that would be implemented is not obvious, short of a sysadmin looking for the monitoring backend process and running "nice" against it.

> And thank you for the hint to pgbadger. I will take a look into it, but an installation on the server depends completely on our hosting service people. I am not sure this is feasible, but I can imagine an alternative scenario, using a local installation for tuning and then passing the parameters to the hosters for implementation.

As long as you can control the PostgreSQL configuration (to set the right logging options) and can retrieve the log files, you should be able to use it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
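Since the logs here are only reachable through the built-in functions mentioned at the start of the thread, a minimal sketch of that route (this assumes the server logs into a pg_log directory under the data directory, the file name below is only an example, and both functions require superuser rights):

-- list the server log files
SELECT pg_ls_dir('pg_log');

-- read a chunk of one of them (offset 0, up to 10 MB)
SELECT pg_read_file('pg_log/postgresql-2016-11-23.log', 0, 10 * 1024 * 1024);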
Re: max_connections limit violation not showing in pg_stat_activity
From: "Charles Clavadetscher"
Hello Kevin

Getting back to this.

> -----Original Message-----
> From: Kevin Grittner [mailto:kgrittn@gmail.com]
> Sent: Wednesday, 23 November 2016 17:04
> To: Charles Clavadetscher <clavadetscher@swisspug.org>
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] max_connections limit violation not showing in pg_stat_activity
>
> On Wed, Nov 23, 2016 at 4:43 AM, Charles Clavadetscher <clavadetscher@swisspug.org> wrote:
>> From: Kevin Grittner [mailto:kgrittn@gmail.com]
>
>>> Is it possible to upgrade? You are missing over a year's worth of fixes for serious bugs and security vulnerabilities.
>>
>> Yes. Actually it is foreseen to move to 9.6 (and RedHat 7.x).
>>
>>> https://www.postgresql.org/support/versioning/
>
> An upgrade from 9.3.x to 9.6.x is a major release upgrade, which gets you new features and usually gets you improvements in performance and scalability. The 9.3 major release will be supported for almost 2 more years, so I wasn't so concerned about that as about being on 9.3.10 when the latest bug fix version of 9.3 is 9.3.15. To avoid hitting bugs that others have already hit and reported, with fixes published, it is wise to try to upgrade to the latest minor release fairly quickly. If the minor release fixes a serious security vulnerability, I think it is a good idea to update within a day or two of release.
>
>> I will have to check with our hosting people how many cores we have or can have on the new environment. I have seen that there are pgBouncer and pgPool. Would you recommend one of those?
>
> pgBouncer is more lightweight, so if you don't need any of the features present only in pgPool, I would go with pgBouncer. Depending on your application software environment, it may be even better to use a pool built into the application development framework. There are several very good pools available to Java environments. I can personally attest to the quality of Apache dbcp, but others have mentioned that they like other Java connection pools even better. You might want to search the archives, or maybe someone will mention the others again on this thread.

Finally I set up pgbouncer, and a simple first test with a somewhat heavy load (1000 users in 5 minutes, corresponding to a total of 12000 HTTP requests and 5000 DB requests) shows an incredible improvement in performance. Without the pooler roughly a fifth of the calls died in a timeout, and the remaining ones show an average response time of more than 8 seconds. With the pooler all requests went through without any error whatsoever and the mean response time dropped to 23 ms (the maximum being 193 ms). At first I thought that I had some error in my simulation somewhere, but all 5000 DB requests were performed correctly, as I could check in a log table that I prepared for that specific purpose. There is definitely no doubt about the beneficial effect of the connection pooler.

Now I will take some time to better understand the configuration settings. Maybe a question on that. I have the following configuration entries:

pool_mode = session
server_reset_query = DISCARD ALL
max_client_conn = 100
default_pool_size = 60
min_pool_size = 5
max_db_connections = 60
max_user_connections = 60

The pooler only serves a single application and only has a single connection string, with the same user for all calls (that being the application user defined for that specific web application). Our current DB has a max_connections of 100. I assumed that max_client_conn should match this entry. With max_db_connections and max_user_connections I try to make sure that calls from this application won't take up all resources (as happened without the pooler). This seems to work correctly. During the simulation, other applications that also use the database but don't go through the pooler work normally, i.e. there are no noticeable effects.

If you have any suggestions or see potential problems in these settings that I am not aware of so far, feel free to point me to them.

Thank you very much again for the excellent tips.

Bye
Charles

>>>> SELECT count(*) FROM pg_stat_activity;
>>>> \watch 1
>>>
>>> At the times when the resources are overloaded by more connections than they can efficiently service -- well, that's precisely the time that a sleeping "monitoring" process is least likely to be given a time slice to run. If you can manage to get pgbadger to run in your environment, and you turn on logging of connections and disconnections, you will be able to get far more accurate information.
>>
>> Yes, it sounds reasonable. I assumed that this kind of measurement has a higher priority or reserved slots, since those occasions are when it is most needed.
>
> There is no such feature in PostgreSQL. It might be worthwhile, although how that would be implemented is not obvious, short of a sysadmin looking for the monitoring backend process and running "nice" against it.
>
>> And thank you for the hint to pgbadger. I will take a look into it, but an installation on the server depends completely on our hosting service people. I am not sure this is feasible, but I can imagine an alternative scenario, using a local installation for tuning and then passing the parameters to the hosters for implementation.
>
> As long as you can control the PostgreSQL configuration (to set the right logging options) and can retrieve the log files, you should be able to use it.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
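One way to watch what the pooler itself is doing while such a test runs is pgBouncer's built-in admin console; a sketch, assuming the default listen port 6432 and a user listed in admin_users or stats_users in pgbouncer.ini:

-- connect to the virtual "pgbouncer" database, e.g.: psql -p 6432 -U admin pgbouncer
SHOW POOLS;   -- active and waiting clients, plus server connections, per pool
SHOW STATS;   -- per-database request counts and average query times

Unlike sampling pg_stat_activity, the waiting-client counts in SHOW POOLS make it directly visible when the pool (and the max_connections limit behind it) is the bottleneck.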