Thread: Tuning / performance questions

Tuning / performance questions

From
Bryan Montgomery
Date:
Hello experts (and others like me)!

We have a system that, due to recent events, is getting much heavier use. The application makes extensive use of functions. These functions typically ran in under a second but can now take several seconds.

What general approaches are there for monitoring and improving performance? We look at pgAdmin's server status, but that only shows the function call itself. Also, is there a good way to 'explain' a function, or do you have to execute the function's steps individually and explain each of those?

The server is a virtual machine running SUSE 11.4 with 8 vCPUs and 32 GB of RAM.

Running SELECT version() returns 'PostgreSQL 9.1.0 on x86_64-unknown-linux-gnu, compiled by gcc (SUSE Linux) 4.5.1 20101208 [gcc-4_5-branch revision 167585], 64-bit', and SELECT pg_size_pretty(pg_database_size('nrgdb')) returns 63 GB.

The server typically has up to 500 connections, with max_connections set to 750.

Below are the non-default values of our configuration file.

Any thoughts on what we should look at?

Thanks,

Bryan

listen_addresses = '*'                  # what IP address(es) to listen on;

max_connections = 750                   # (change requires restart)

superuser_reserved_connections = 9      # (change requires restart)

shared_buffers = 8192MB                 # min 128kB or max_connections*16kB

temp_buffers = 64MB                     # min 800kB

max_prepared_transactions = 250         # can be 0 or more

work_mem = 512MB                                # min 64kB

maintenance_work_mem = 1GB              # min 1MB

fsync = off                             # turns forced synchronization on or off

full_page_writes = off                  # recover from partial page writes

wal_buffers = 16MB                      # min 32kB

commit_delay = 1000                     # range 0-100000, in microseconds

commit_siblings = 5                     # range 1-1000

checkpoint_segments = 50                # in logfile segments, min 1, 16MB each

checkpoint_timeout = 5min               # range 30s-1h

checkpoint_warning = 1min               # 0 is off

effective_cache_size = 16GB

log_destination = 'stderr'              # Valid values are combinations of

logging_collector = on

log_directory = '/var/log/postgres'             # Directory where log files are written

log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log' # Log file name pattern.

log_rotation_age = 1d                   # Automatic rotation of logfiles will

log_rotation_size = 20MB                # Automatic rotation of logfiles will

log_min_messages = info         # Values, in order of decreasing detail:

log_min_duration_statement = 10000 # -1 is disabled, 0 logs all statements

log_line_prefix = '%t %p %u@%h: '       # Special values for Pgsi

log_statement = 'none'                  # none, ddl, mod, all

log_duration = off

autovacuum = on                         # enable autovacuum subprocess?

datestyle = 'iso, mdy'

lc_messages = 'en_US.UTF-8'                     # locale for system error message

lc_monetary = 'en_US.UTF-8'                     # locale for monetary formatting

lc_numeric = 'en_US.UTF-8'                      # locale for number formatting

lc_time = 'en_US.UTF-8'                         # locale for time formatting


Re: Tuning / performance questions

From
"Kevin Grittner"
Date:
Bryan Montgomery wrote:

> We have a system, that due to recent events is getting a lot
> heavier use. The application makes extensive use of functions.
> These functions would typically run sub-second but now can take
> several seconds.

> The server is running on suse 11.4 with 8 vcpu and 32Gb ram on a
> virtual machine.
>
> Running SELECT version() returns 'PostgreSQL 9.1.0 on
> x86_64-unknown-linux-gnu, compiled by gcc (SUSE Linux) 4.5.1
> 20101208 [gcc-4_5-branch revision 167585], 64-bit' and select
> pg_size_pretty(pg_database_size('nrgdb')); returns 63 GB.

> The server typically has up to 500 connections with a max of 750
> connections.

> max_connections = 750 # (change requires restart)

> temp_buffers = 64MB # min 800kB

> work_mem = 512MB # min 64kB

750 * (512 MB + 64 MB) = 421.875 GB

Once a connection uses memory for temp_buffers, it doesn't release it
for as long as that connection lives. Each connection can allocate
work_mem for each node of a query plan, so maximum usage can be more
than one allocation per connection, but the rule of thumb is to assume
just one. That puts you at 422 GB + shared memory + OS space
(including some buffers and cache) + whatever else you run on this VM.
You have 32 GB. You will almost certainly have problems at high load.

Try putting pgbouncer in front of the database, configured to use
transaction mode and to accept 750 client connections while keeping
only 20 or so database connections open.
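
Something along these lines (a minimal sketch -- the port, paths, and
pool size here are illustrative and need tuning for your workload):

    ; pgbouncer.ini
    [databases]
    nrgdb = host=127.0.0.1 port=5432 dbname=nrgdb

    [pgbouncer]
    listen_addr = *
    listen_port = 6432              ; applications connect here instead of 5432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    pool_mode = transaction         ; release the server connection at commit
    max_client_conn = 750           ; clients can still open 750 connections
    default_pool_size = 20          ; ...backed by ~20 real database connections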

> max_prepared_transactions = 250 # can be 0 or more

Are you really using two-phase commit and a transaction manager?
(Don't confuse this setting with anything related to prepared
*statements* -- prepared *transactions* are a separate issue.) Even
if you are using prepared transactions, do you really expect your
transaction manager to let 250 transactions pile up in the database
between the first and second phase of commit?

BTW, you should be monitoring the pg_prepared_xacts view for old
prepared transactions, to prevent problems with bloat.
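
For example (the 10-minute threshold is just an illustration):

    -- prepared transactions that have been pending suspiciously long
    SELECT gid, prepared, owner, database
    FROM pg_prepared_xacts
    WHERE prepared < now() - interval '10 minutes'
    ORDER BY prepared;

Anything that lingers here holds back vacuum, which is where the
bloat comes from.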

> fsync = off
> full_page_writes = off

You didn't mention your backup scheme, but be prepared for the fact
that with these settings, if the VM (or its underlying OS or
hardware) fails, your database will be corrupted and you may have no
choice but to restore from your backup.

> commit_delay = 1000 # range 0-100000, in microseconds
>
> commit_siblings = 5 # range 1-1000

These settings are notoriously hard to change from the defaults
without actually making things worse. Be very sure you know what you
are doing and have carefully benchmarked them against your real
workload; otherwise it is probably better to put them back to the
defaults.

There may be some other fine-tuning opportunities, but these issues
should be fixed first, and it would be best to have an actual query
that is performing poorly to try to tune some of the other settings.

-Kevin


Re: Tuning / performance questions

From
Craig Ringer
Date:
Thanks for including your configuration and version; it makes things much easier.

Reply follows inline.

On 11/06/2012 09:04 PM, Bryan Montgomery wrote:
> What general approaches are there for monitoring and improving performance? We look at pgAdmin's server status, but that only shows the function call itself. Also, is there a good way to 'explain' a function, or do you have to execute the function's steps individually and explain each of those?

See the auto_explain contrib module. It can explain statements within functions, as well as the functions themselves.

http://www.postgresql.org/docs/current/static/auto-explain.html
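
A minimal session-level sketch (the threshold and the function name
are illustrative):

    -- requires superuser; for permanent use, add 'auto_explain' to
    -- shared_preload_libraries in postgresql.conf instead
    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = '250ms';   -- log plans for anything slower
    SET auto_explain.log_nested_statements = on;   -- include statements run inside functions

    SELECT my_slow_function(42);   -- hypothetical; its internal plans now appear in the server log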

> The server typically has up to 500 connections, with max_connections set to 750.

Get a connection pooler. Urgently. See http://wiki.postgresql.org/wiki/PgBouncer . It is extremely unlikely that your server is running efficiently with that many concurrent connections actively working. Reducing it to (say) 100 and using transaction-level connection pooling may boost performance significantly.

> work_mem = 512MB # min 64kB

That's really dangerous with your connection count. If many connections actually use that much, you'll run out of RAM in a hurry and land in a nasty paging storm. If possible, reduce it globally, then raise it selectively in transactions where you know a high work_mem is needed.
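
For example (big_report_query is a hypothetical stand-in for whatever
actually needs the memory):

    BEGIN;
    SET LOCAL work_mem = '512MB';   -- reverts automatically at COMMIT/ROLLBACK
    SELECT * FROM big_report_query();
    COMMIT;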

> fsync = off # turns forced synchronization on or off

So you don't value your data and don't mind if you lose all of it, permanently and unrecoverably, if your server loses power or the host OS hard crashes?

It's much safer to use `synchronous_commit = off` and a commit_delay. If that isn't enough, get fast-flushing storage like a good RAID controller with a battery-backed cache you can put in write-back mode, or some high-quality SSDs with power-protected write caches.
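
As a rough sketch of the safer combination (illustrative; benchmark
before relying on it):

    fsync = on                    # keep crash safety
    full_page_writes = on         # keep torn-page protection
    synchronous_commit = off      # a crash may lose the last few commits,
                                  # but the database itself stays consistent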

> full_page_writes = off # recover from partial page writes

As above: I hope your data isn't important to you.

--
Craig Ringer

Re: Tuning / performance questions

From
Bryan Montgomery
Date:
Thanks for the insight. I know why some of the settings were set, but others that you've highlighted no longer make sense. I'll strongly recommend adjusting those now.

On Tue, Nov 6, 2012 at 8:54 PM, Craig Ringer <ringerc@ringerc.id.au> wrote:
> [snip -- full reply quoted above]


Re: Tuning / performance questions

From
Craig Ringer
Date:
On 11/07/2012 01:29 PM, Bryan Montgomery wrote:
> Thanks for the insight. I know why some of the settings were set, but others that you've highlighted no longer make sense. I'll strongly recommend adjusting those now.
Good. If you've been running with fsync=off all this time, I hope you've also been keeping good backups.

BTW, I should've sent you a link to http://wiki.postgresql.org/wiki/Number_Of_Database_Connections in my prior email, where I recommended connection pooling.

--
Craig Ringer