Thread: Tuning / performance questions
We have a system that, due to recent events, is getting a lot heavier use. The application makes extensive use of functions. These functions would typically run sub-second but can now take several seconds.
I'm wondering what general ways there are to monitor and improve performance? We look at pgAdmin's server status, but that only shows the function being run. Additionally, is there a good way to 'explain' a function? Or do you have to execute the function steps individually and explain those?
The server is running on SUSE 11.4 with 8 vCPUs and 32 GB RAM on a virtual machine.
Running pg_version returns 'PostgreSQL 9.1.0 on x86_64-unknown-linux-gnu, compiled by gcc (SUSE Linux) 4.5.1 20101208 [gcc-4_5-branch revision 167585], 64-bit' and select pg_size_pretty(pg_database_size('nrgdb')); returns 63 GB.
The server typically has up to 500 connections with a max of 750 connections.
Below are the non-default values of our configuration file.
Any thoughts on what we should look at?
Thanks,
Bryan
listen_addresses = '*' # what IP address(es) to listen on;
max_connections = 750 # (change requires restart)
superuser_reserved_connections = 9 # (change requires restart)
shared_buffers = 8192MB # min 128kB or max_connections*16kB
temp_buffers = 64MB # min 800kB
max_prepared_transactions = 250 # can be 0 or more
work_mem = 512MB # min 64kB
maintenance_work_mem = 1GB # min 1MB
fsync = off # turns forced synchronization on or off
full_page_writes = off # recover from partial page writes
wal_buffers = 16MB # min 32kB
commit_delay = 1000 # range 0-100000, in microseconds
commit_siblings = 5 # range 1-1000
checkpoint_segments = 50 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 5min # range 30s-1h
checkpoint_warning = 1min # 0 is off
effective_cache_size = 16GB
log_destination = 'stderr' # Valid values are combinations of
logging_collector = on
log_directory = '/var/log/postgres' # Directory where log files are written
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # Log file name pattern.
log_rotation_age = 1d # Automatic rotation of logfiles will
log_rotation_size = 20MB # Automatic rotation of logfiles will
log_min_messages = info # Values, in order of decreasing detail:
log_min_duration_statement = 10000 # -1 is disabled, 0 logs all statements
log_line_prefix = '%t %p %u@%h: ' # Special values for Pgsi
log_statement = 'none' # none, ddl, mod, all
log_duration = off
autovacuum = on # enable autovacuum subprocess?
datestyle = 'iso, mdy'
lc_messages = 'en_US.UTF-8' # locale for system error message
lc_monetary = 'en_US.UTF-8' # locale for monetary formatting
lc_numeric = 'en_US.UTF-8' # locale for number formatting
lc_time = 'en_US.UTF-8' # locale for time formatting
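One low-tech starting point for the monitoring question above is to watch pg_stat_activity for long-running statements; a sketch (column names are those of 9.1, where procpid and current_query were later renamed to pid and query):

  -- Longest-running non-idle statements, slowest first
  SELECT procpid,
         usename,
         now() - query_start AS runtime,
         waiting,
         current_query
  FROM pg_stat_activity
  WHERE current_query <> '<IDLE>'
  ORDER BY runtime DESC
  LIMIT 20;

The log_min_duration_statement = 10000 already in the configuration will log anything slower than ten seconds; lowering it temporarily can also catch the multi-second function calls described above.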
Bryan Montgomery wrote:

> We have a system that, due to recent events, is getting a lot heavier use. The application makes extensive use of functions. These functions would typically run sub-second but can now take several seconds.
> The server is running on SUSE 11.4 with 8 vCPUs and 32 GB RAM on a virtual machine.
>
> Running pg_version returns 'PostgreSQL 9.1.0 on x86_64-unknown-linux-gnu, compiled by gcc (SUSE Linux) 4.5.1 20101208 [gcc-4_5-branch revision 167585], 64-bit' and select pg_size_pretty(pg_database_size('nrgdb')); returns 63 GB.
> The server typically has up to 500 connections with a max of 750 connections.

> max_connections = 750 # (change requires restart)
> temp_buffers = 64MB # min 800kB
> work_mem = 512MB # min 64kB

750 * ((512 MB) + (64 MB)) = 421.875 GB

Once a connection uses memory for temp_buffers, it doesn't release it for as long as that connection runs. Each connection can allocate work_mem for each node of a query plan. Even though maximum usage can be more than one allocation per connection, the rule of thumb is to assume just one. You are up to needing 422 GB + shared memory + OS space (including some buffers and cache) + whatever else you run on this VM. You have 32 GB. You will almost certainly have problems at high load.

Try putting pgbouncer in front of the database, configured to use transaction mode and accept 750 user connections while only keeping 20 or so database connections open.

> max_prepared_transactions = 250 # can be 0 or more

Are you really using two-phase commit and a transaction manager? (Don't confuse this setting with something related to prepared *statements* -- prepared *transactions* are a separate issue.) Even if you are using prepared transactions, do you really expect your transaction manager to let 250 transactions pile up in the database between the first and second phase of commit? BTW, you should be monitoring pg_prepared_xacts for old prepared transactions to prevent problems with bloat; a query sketch follows this message.

> fsync = off
> full_page_writes = off

You didn't mention your backup scheme, but be prepared for the fact that with these settings, if the VM (or its underlying OS or hardware) fails, your database will be corrupted and you may have no choice but to use your backup.

> commit_delay = 1000 # range 0-100000, in microseconds
> commit_siblings = 5 # range 1-1000

These settings are notoriously hard to tune away from the defaults without actually making things worse. Be very sure you know what you are doing and have carefully benchmarked this against your real workload; otherwise it is probably better to put these back to the defaults.

There may be some other fine-tuning opportunities, but these issues should be fixed first, and it would be best to have an actual query that is performing poorly to try to tune some of the other settings.

-Kevin
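The query sketch mentioned above, against the standard pg_prepared_xacts view (the one-hour cutoff is only illustrative):

  -- Prepared transactions still waiting for their second phase after an hour;
  -- anything old here holds back VACUUM and leads to bloat.
  SELECT gid, prepared, owner, database
  FROM pg_prepared_xacts
  WHERE prepared < now() - interval '1 hour'
  ORDER BY prepared;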
Thanks for including your configuration and version; it makes things much easier. Reply follows inline.
On 11/06/2012 09:04 PM, Bryan Montgomery wrote:
> I'm wondering what general ways there are to monitor and improve performance? We look at pgAdmin's server status, but that only shows the function being run. Additionally, is there a good way to 'explain' a function? Or do you have to execute the function steps individually and explain those?
See the auto_explain contrib module. It can explain statements within functions, as well as the functions themselves.
http://www.postgresql.org/docs/current/static/auto-explain.html
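A quick way to try it on one slow call, as a sketch (run as a superuser; the plans go to the server log, and the function name here is just a placeholder):

  LOAD 'auto_explain';
  SET auto_explain.log_min_duration = 0;        -- explain every statement in this session
  SET auto_explain.log_nested_statements = on;  -- include statements executed inside functions
  SELECT my_slow_function(42);                  -- placeholder for the function being investigated

For permanent use it can instead be loaded via shared_preload_libraries in postgresql.conf.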
> The server typically has up to 500 connections with a max of 750 connections.

Get a connection pooler. Urgently. See http://wiki.postgresql.org/wiki/PgBouncer . It is extremely unlikely that your server is running efficiently with that many concurrent connections actively working. Reducing it to (say) 100 and using transaction-level connection pooling may boost performance significantly.
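A minimal pgbouncer.ini sketch along those lines (host, port, and auth details are placeholders to adapt):

  [databases]
  nrgdb = host=127.0.0.1 port=5432 dbname=nrgdb

  [pgbouncer]
  listen_addr = *
  listen_port = 6432
  auth_type = md5
  auth_file = /etc/pgbouncer/userlist.txt
  pool_mode = transaction      ; a server connection is held only for the duration of a transaction
  max_client_conn = 750        ; clients can still open up to 750 connections to the pooler
  default_pool_size = 20       ; but only ~20 real connections per database/user pair reach PostgreSQL

The application then connects to port 6432 instead of 5432. Note that transaction pooling breaks session-level state such as temporary tables, session advisory locks, and LISTEN/NOTIFY, so check the application for those first.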
> work_mem = 512MB # min 64kB

That's really dangerous with your connection count. If many connections actually use that, you'll run out of RAM in a hurry and enter a nasty paging storm. If possible, reduce it, then raise it selectively in transactions where you know a high work_mem is needed.
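For example, with a much smaller global work_mem, one heavy query can still be given more; a sketch (the 256MB figure is illustrative):

  BEGIN;
  SET LOCAL work_mem = '256MB';  -- reverts automatically at COMMIT or ROLLBACK
  -- run the one query that genuinely needs the large sort/hash work area here
  COMMIT;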
> fsync = off # turns forced synchronization on or off

So you don't value your data and don't mind if you lose all of it, permanently and unrecoverably, if your server loses power or the host OS hard crashes?
It's much safer to use `synchronous_commit = off` and a commit_delay. If that isn't enough, get fast-flushing storage like a good RAID controller with a battery-backed cache you can put in write-back mode, or some high-quality SSDs with power-protected write caches.
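A sketch of that safer combination in postgresql.conf (the commit_delay value is only illustrative and worth benchmarking, as noted elsewhere in this thread):

  fsync = on                   # keep crash safety
  full_page_writes = on        # keep protection against torn pages
  synchronous_commit = off     # a crash can lose the last few commits, but the cluster stays consistent
  commit_delay = 1000          # microseconds; only keep a non-default value if benchmarks justify it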
> full_page_writes = off # recover from partial page writes

As above: I hope your data isn't important to you.
--
Craig Ringer
> Thanks for the insight. I know why some of the settings were set, but there are others that have been highlighted that now don't make sense. I'll strongly recommend adjusting those now.

Good. If you've been running with fsync=off all this time, I hope you've also been keeping good backups.
BTW, I should've sent you a link to http://wiki.postgresql.org/wiki/Number_Of_Database_Connections in my prior email, where I recommended connection pooling.
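The sizing guidance on that page works from a rule of thumb along the lines of pool_size = (core_count * 2) + effective_spindle_count. As a rough worked example for this box, treating the virtualized storage as a couple of effective spindles (an assumption):

  (8 cores * 2) + 2 = 18 active connections

which is in the same ballpark as the pool of 20 or so suggested earlier in the thread.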
--
Craig Ringer