Thread: Migrated from 8.3 to 9.0 - need to update config (re-post)

Migrated from 8.3 to 9.0 - need to update config (re-post)

From
Carlo Stonebanks
Date:

Hello performance wizards!

 

(Sorry for the re-post if this appears twice - I see no evidence that e-mailing to pgsql-performance is working yet.)

My client has migrated his 8.3 hosted DB to new machines running PG 9.0. It’s time to look at the config settings.

Immediately below are the config settings.

The specifics of the DB and how it is used are below that, but in general let me say that this is a full-time ETL system, with only a handful of actual “users” and automated processes over 300 connections running “import” programs 24/7.

I appreciate the help,

Carlo

The host system:

Intel® Xeon® Processor X5560 (8M Cache, 2.80 GHz, 6.40 GT/s Intel® QPI) x 2, dual quad core

48 GB RAM

RAID 10, 6 x 600 GB 15k rpm SAS

LINUX Redhat/Centos 2.6.18-164.el5

Sys admin says that the battery-backed RAID controller and the consequent write-cache settings should have no impact on performance. Is this true?

Current config and my thoughts on what to do with it. If it isn’t mentioned here, the values are default values:

# ===========================================================

max_connections = 300

shared_buffers = 500MB # At 48GB of RAM, could we go to 2GB

# - what is the impact on Linux config?

effective_cache_size = 2457MB # Sys admin says assume 25% of 48GB

# is used by OS and other apps

work_mem = 512MB # Complex reads are called many times a second

# from each connection, so what should this be?

maintenance_work_mem = 256MB # Should this be bigger - 1GB at least?

checkpoint_segments = 128 # There is lots of write activity; this is high

# but could it be higher?

#checkpoint_completion_target not set;

# Recommendation appears to be .9 for our 128 checkpoint segments

default_statistics_target = 200 # Deprecated?

#autovacuum_freeze_max_age not set;

# recommendation is 1,000,000 for non-activity.

# What is the metric for wal_buffers setting?

wal_buffers = 4MB # Looks low, recommendation appears to be 16MB.

# Is it really "set it and forget it"?

#synchronous_commit not set;

# Recommendation is to turn this off and leave fsync on

#fsync not set;

# Recommendation is to leave this on

#wal_level not set;

# Is this only needed for replication?

#max_wal_senders not set;

# Is this only needed for replication?

# The issue of vacuum/analyze is a tricky one.

# Data imports are running 24/7. Once the DB is seeded, the vast majority

# of write activity is updates, and not to indexed columns.

# Deletions are very rare.

vacuum_cost_delay = 20ms

# The background writer has not been addressed at all.

# Can our particular setup benefit from changing

# the bgwriter values?

bgwriter_lru_maxpages = 100 # This is the default;

listen_addresses = '*'

port = 5432

log_destination = 'stderr'

logging_collector = on

log_directory = 'pg_log'

log_filename = 'postgresql-%a.log'

log_truncate_on_rotation = on

log_rotation_age = 1d

log_rotation_size = 0

log_line_prefix = '%t'

track_counts = on

# ===========================================================


The DB is pretty large, and organized by schema. The most active are:

1) One “Core” schema

a. 100 tables

b. Typical row counts in the low millions.

c. This represents the enterprise’s core data.

d. Equal read/write activity

2) Multiple “Import” schemas

a. Contain several thousand raw “flat file” tables

b. Ragged column structure, up to hundreds of columns

c. Erratic row counts, from dozens of rows to 1 million

d. Each table sequentially read once, only status fields are written back

3) One “Audit” schema

a. A new log table is created every month

b. Typical row count is 200 million

c. Log every write to the “Core”

d. Almost entirely write operations, but the few read operations that are done have to be fast owing to the size of the tables

e. Links the “Core” data to the “Import” data

There are next to no “users” on the system – each connection services a constantly running import process which takes the incoming “import” data, analyzes the “core” data and decides how to distil the import into the core.

Analytical Processes are not report-oriented:

a. The “Core” reads are mostly single-row results

b. The “Import” reads are 1,000-row pages

c. There is next to no use of aggregate queries

Transactional Processes are a steady stream of writes:

a. Not bursty or sporadic

b. Overwhelmingly inserts and updates, next to no deletes

c. Each transaction represents 10 – 50 writes to the “core” schema

Re: Migrated from 8.3 to 9.0 - need to update config (re-post)

From
"Kevin Grittner"
Date:
Carlo Stonebanks <stonec.register@sympatico.ca> wrote:

> this is a full-time ETL system, with only a handful of actual
> *users* and automated processes over 300 connections running
> *import* programs 24/7

> Intel* Xeon* Processor X5560 (8M Cache, 2.80 GHz, 6.40
> GT/s Intel* QPI) x 2, dual quad core 48 GB RAM
> RAID 10, 6 X 600 GB 15krpm SAS)

So, eight cores and six spindles.  You are probably going to see
*much* better throughput if you route those 300 workers through
about 22 connections.  Use a connection pooler which limits active
transactions to that and queues up requests to start a transaction.
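
For illustration only (the database name, file paths, and numbers below are placeholders, not taken from your setup), a minimal PgBouncer sketch along those lines:

[databases]
; "etl_db" is a placeholder; point it at the real database
etl_db = host=127.0.0.1 port=5432 dbname=etl_db

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; a server connection is lent to a client only for the duration of a transaction
pool_mode = transaction
; roughly the ~22 active backends suggested above
default_pool_size = 22
; all 300 import workers can stay connected to the pooler
max_client_conn = 300

The import programs would then connect to port 6432 instead of 5432 and need no other changes.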

> Sys admin says that battery-backup RAID controller and
> consequent write settings should have no impact on performance.

With only six drives, I assume your OS, WAL files, indexes, and heap files
are all in the same RAID?  If so, your sys admin is wrong -- you
want the controller configured for write-back (with automatic switch
to write-through on low or failed battery, if possible).

> max_connections = 300

Too high.  Both throughput and latency should improve with correct
use of a connection pooler.

> shared_buffers =
> 500MB        # At 48GB of RAM, could we go to 2GB

You might benefit from as much as 8GB, but only testing with your
actual load will show for sure.

> effective_cache_size =
> 2457MB # Sys admin says assume 25% of 48GB

Add together the shared_buffers setting and whatever the OS tells
you is used for cache under your normal load.  It's usually 75% of
RAM or higher.  (NOTE: This doesn't cause any allocation of RAM; it's
a hint to the cost calculations.)
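
As a rough worked example (the numbers are illustrative, not measured on your box): if the OS reports about 35GB sitting in its page cache under normal load and shared_buffers ends up at 8GB, you would set something like

effective_cache_size = 43GB    # ~35GB OS cache + 8GB shared_buffers, roughly 90% of 48GB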

> work_mem =
> 512MB              # Complex reads are called many times a second

Maybe, if you use the connection pooler as described above.  Each
connection can allocate this multiple times.  So with 300
connections you could very easily start using 150GB of RAM in
addition to your shared buffers; causing a swap storm followed by
OOM crashes.  If you stay with 300 connections this *must* be
reduced by at least an order of magnitude.
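
To make the arithmetic concrete: 300 connections x 512MB is roughly 150GB even if each backend runs only one sort or hash at a time, and a complex query can use several work_mem allocations at once. A sketch of a safer starting point at 300 connections (not a tuned recommendation):

work_mem = 16MB    # 300 x 16MB is under 5GB per concurrent sort/hash across all backends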

> # from each connection, so what should this be?
> maintenance_work_mem =
> 256MB  # Should this be bigger - 1GB at least?

I'd go to 1 or 2 GB.

> checkpoint_segments =
> 128     # There is lots of write activity; this is high

OK

> # but could it be higher?

IMO, there's unlikely to be much benefit beyond that.

> #checkpoint_completion_target not set;
> #  Recommendation appears to be .9 for our 128 checkpoint segments

0.9 is probably a good idea.
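
In postgresql.conf terms that would be roughly:

checkpoint_segments = 128
checkpoint_completion_target = 0.9    # spread checkpoint writes over ~90% of the checkpoint interval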

> default_statistics_target =
> 200  # Deprecated?

Depends on your data.  The default is 100.  You might want to leave
that in general and boost it for specific columns where you find it
is needed.  Higher values improve estimates and can lead to better
query plans, but boost ANALYZE times and query planning time.
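
For example (the table and column names here are invented, just to show the mechanism):

-- keep default_statistics_target at 100 globally, raise it only where estimates are poor
ALTER TABLE core_entity ALTER COLUMN entity_key SET STATISTICS 500;
ANALYZE core_entity;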

> # What is the metric for wal_buffers setting?
> wal_buffers =
> 4MB             # Looks low, recommendation appears to be 16MB.

16MB is good.

> # Is it really "set it and forget it"?

Yeah.

> #synchronous_commit not set;
>
> #  Recommendation is to turn this off and leave fsync on

If this is off, it makes lack of write-back on the controller a lot
less painful.  Even with write-back it can improve performance some.
It does mean that on a crash you can lose some committed
transactions (typically less than a second's worth), but you will
still have database integrity.
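
In postgresql.conf the combination being recommended is:

synchronous_commit = off    # commits return before the WAL flush; a crash can lose the last moments of commits
fsync = on                  # WAL flushes still reach disk, so the database stays consistent after a crash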

> #fsync not set;
>
> #  Recommendation is to leave this on

Unless you want to rebuild your database from scratch or restore
from backup on an OS crash, leave this on.

> #wal_level not set;
>
> #  Is this only needed for replication?

The lowest level just supports crash recovery.  The next level
supports archiving, for recovery from a PITR-style backup.  The
third level is needed to support hot standby (a replicated server on
which you can run queries as it is updated).
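
In 9.0 the corresponding settings look like this (defaults shown on the active lines):

wal_level = minimal         # crash recovery only; enough if you are not archiving or replicating
#wal_level = archive        # needed for WAL archiving / PITR backups
#wal_level = hot_standby    # needed for a hot-standby replica
#max_wal_senders = 0        # raise above 0 only if using streaming replication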

> # The issue of vacuum/analyze is a tricky one.
> #  Data imports are running 24/7. Once the DB is seeded, the vast
> #  majority of write activity is updates, and not to indexed
> #  columns.  Deletions are very rare.
> vacuum_cost_delay =
> 20ms

You could try that.  I would monitor for bloat and make things more
aggressive if needed.  If you are not vacuuming aggressively enough,
performance will slowly degrade.  If you let it go too far, recovery
can be a lot of work.
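
One simple way to watch for that (a sketch; it relies on track_counts = on, which you already have):

-- tables with the most dead tuples, and when they were last vacuumed
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 20;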

> # The background writer has not been addressed at all.
> #  Can our particular setup benefit from changing the bgwriter
> #  values?

Probably not.  If you find that your interactive users have periods
where queries seem to "freeze" for a few minutes at a time and then
return to normal levels of performance, you might need to make this
more aggressive.
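
If it ever does become necessary, these are the knobs (shown at their defaults; the parenthetical values are just examples of "more aggressive"):

bgwriter_delay = 200ms              # how often the background writer wakes (lower it, e.g. 100ms)
bgwriter_lru_maxpages = 100         # max buffers written per round (raise it, e.g. 500)
bgwriter_lru_multiplier = 2.0       # how far ahead of predicted demand to write (raise it, e.g. 4.0)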

-Kevin

Re: Migrated from 8.3 to 9.0 - need to update config (re-post)

From
Claudio Freire
Date:
On Fri, Sep 9, 2011 at 5:38 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> This is getting back to that issue of using only enough processes at
> one time to keep all the bottleneck resources fully utilized.  Some
> people tend to assume that if they throw a few more concurrent
> processes into the mix, it'll all get done sooner.  There are a
> great many benchmarks which show otherwise.

On the other hand, in order to benefit from synchro scans and stuff
like that, one has to increase concurrency beyond what is normally
considered optimal.

I have an application that really benefits from synchro scans, but
from time to time, when planets are aligned wrong, the extra
concurrency does hurt.

Re: Migrated from 8.3 to 9.0 - need to update config (re-post)

From
Claudio Freire
Date:
On Fri, Sep 9, 2011 at 3:16 PM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Add together the shared_buffers setting and whatever the OS tells
> you is used for cache under your normal load.  It's usually 75% of
> RAM or higher.  (NOTE: This doesn't cause any allocation of RAM; it's
> a hint to the cost calculations.)

In the manual[0] it says to take into account the number of concurrent
accesses to different indices and tables:

"
    Sets the planner's assumption about the effective size of the
disk cache that is available to a single query. This is factored into
estimates of the cost of using an index; a higher value makes it more
likely index scans will be used, a lower value makes it more likely
sequential scans will be used. When setting this parameter you should
consider both PostgreSQL's shared buffers and the portion of the
kernel's disk cache that will be used for PostgreSQL data files. Also,
take into account the expected number of concurrent queries on
different tables, since they will have to share the available space.
This parameter has no effect on the size of shared memory allocated by
PostgreSQL, nor does it reserve kernel disk cache; it is used only for
estimation purposes. The default is 128 megabytes (128MB).
"

However, every mail I've seen on the list, and every reference I've
read, seems to ignore that. Does PG consider it automatically now, so
that admins only have to input the amount of system memory (in which
case PG could autoconfigure itself by querying /proc)? Is the manual
wrong, or is the advice given everywhere just ignoring that bit?


[0] http://www.postgresql.org/docs/9.0/static/runtime-config-query.html

Re: Migrated from 8.3 to 9.0 - need to update config (re-post)

From
"Kevin Grittner"
Date:
Claudio Freire <klaussfreire@gmail.com> wrote:
> On Fri, Sep 9, 2011 at 3:16 PM, Kevin Grittner
> <Kevin.Grittner@wicourts.gov> wrote:
>> Add together the shared_buffers setting and whatever the OS tells
>> you is used for cache under your normal load.  It's usually 75%
>> of RAM or higher.  (NOTE: This doesn't cause any allocation of
>> RAM; it's a hint to the cost calculations.)
>
> In the manual[0] it says to take into account the number of
> concurrent accesses to different indices and tables:

Hmm.  I suspect that the manual is technically correct, except that
it probably only matters in terms of how many connections will
concurrently be executing long-running queries which might access
large swaths of large indexes.  In many environments, there are a
lot of maintenance and small query processes, and only occasional
queries where this setting would matter.  I've always had good
results (so far) on the effective assumption that only one such
query will run at a time.  (That is probably helped by the fact that
we normally submit jobs which run such queries to a job queue
manager which runs them one at a time...)

This is getting back to that issue of using only enough processes at
one time to keep all the bottleneck resources fully utilized.  Some
people tend to assume that if they throw a few more concurrent
processes into the mix, it'll all get done sooner.  There are a
great many benchmarks which show otherwise.

-Kevin

Re: Migrated from 8.3 to 9.0 - need to update config (re-post)

From
"Kevin Grittner"
Date:
Claudio Freire <klaussfreire@gmail.com> wrote:

> On the other hand, in order to benefit from synchro scans and
> stuff like that, one has to increase concurrency beyond what is
> normally considered optimal.

That's a good example of why any general configuration advice should
just be used as a starting point.  There's no substitute for
benchmarking your own real workload on your own hardware.

On the third hand, though, you have to be very careful about
interpreting these results -- if you used a configuration with a
small effective_cache_size so you could get a lot of benefit from
the synchro scans, you might have suppressed choice of an index
which would have allowed them to run faster with lower concurrency.
Or you might have had to cut your work_mem to a small enough size
(to avoid OOM errors) to force a totally different plan.  So to get
a meaningful comparison, you have to change a number of variables at
once.

Good benchmarking is really hard.

-Kevin

Re: Migrated from 8.3 to 9.0 - need to update config (re-post)

From
Carlo Stonebanks
Date:
Hi Kevin,

First, thanks for taking the time. I wish I could write back with quick,
terse questions to your detailed reply - but I'm sorry, this is still going
to be a wordy post.

>> max_connections = 300
>Too high.  Both throughput and latency should improve with correct use of
>a connection pooler.

Even for 300 stateful applications that can remain connected for up to a
week, continuously distilling data (imports)? The 300 is overkill; a sys
admin raised it from 100 when multiple large projects were loaded and the
server refused the additional connections. We can take large imports and
break them into multiple smaller ones which the operators are doing to try
and improve import performance. It does result in some improvement, but I
think they have gone over the top and the answer is to improve DB and OS
performance. Perhaps I don't understand how connection pooling will work
with stateful apps that are continuously reading and writing (the apps are
DB I/O bound).

> you want the controller configured for write-back (with automatic switch
> to write-through on low or failed battery, if possible).

For performance or safety reasons? Since the sys admin thinks there's no
performance benefit from this, I would like to be clear on why we should do
this.

>> Can our particular setup benefit from changing the bgwriter values?
> Probably not.  If you find that your interactive users have periods
> where queries seem to "freeze" for a few minutes at a time and then
> return to normal levels of performance, you might need to make this
> more aggressive.

We actually experience this. Once again, remember the overwhelming use of
the system is long-running import threads with continuous connections. Every
now and then the imports behave as if they are suddenly taking a deep
breath, slowing down. Sometimes so much so that we cancel the import and restart
(the imports pick up where they left off).

What would the bg_writer settings be in this case?

Thanks again for your time,

Carlo


Re: Migrated from 8.3 to 9.0 - need to update config (re-post)

From
Craig Ringer
Date:
On 09/14/2011 02:56 AM, Carlo Stonebanks wrote:

> Even for 300 stateful applications that can remain connected for up to a
> week, continuously distilling data (imports)?

If they're all doing active work all that time you can still benefit
from a pooler.

Say your server can service 50 connections at optimum speed, and any
more result in reduced overall throughput. You have 300 apps with
statements they want to run. Your pooler will basically queue them, so
at any one time 50 are doing work and 250 are waiting for database
access. This should _improve_ database throughput by reducing contention
if 50 worker connections is your sweet spot. However, it will also
increase the latency seen by individual workers, because they may have
to wait a while before their transaction runs, even though the
transaction itself will complete much faster once it does run.

You'd probably want to pool at the transaction level, so once a client
gets a connection it keeps it for the lifetime of that transaction and
the connection is handed back to the pool when the transaction commits
or rolls back.

>>  you want the controller configured for write-back (with automatic
>>  switch to write-through on low or failed battery, if possible).
>
> For performance or safety reasons? Since the sys admin thinks there's no
> performance benefit from this, I would like to be clear on why we should
> do this.

fsync!

If your workload is read-only, it won't help you much. If your workload
is write-heavy or fairly balanced it'll make a HUGE difference, because
fsync() on commit won't have to wait for disk I/O, only I/O to the RAID
controller's cache.

You can also play with commit_delay and synchronous_commit to trade
guarantees of data persistence off against performance. Don't mind
losing up to 5 mins of commits if you lose power? These options are for you.

Whatever you do, do NOT set fsync=off. It should be called "Eat my data
if anything goes even slightly wrong=on"; it does have legitimate uses,
but they're not yours.

>> > Can our particular setup benefit from changing the bgwriter values?
>>  Probably not. If you find that your interactive users have periods
>>  where queries seem to "freeze" for a few minutes at a time and then
>>  return to normal levels of performance, you might need to make this
>>  more aggressive.
>
> We actually experience this. Once again, remember the overwhelming use
> of the system is long-running import threads with continuous
> connections. Every now and then the imports behave as if they are
> suddenly taking a deep breath, slowing down. Sometimes, so much we
> cancel the import and restart (the imports pick up where they left off).

This could definitely be checkpointing issues. Enable checkpoint logging.
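
That is, in postgresql.conf:

log_checkpoints = on    # logs each checkpoint's start and completion, buffers written, and sync times

Then compare the timestamps of the "deep breath" episodes against the checkpoint entries in the log.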

> What would the bg_writer settings be in this case?

You need to tune it for your workload I'm afraid. See the manual and
mailing list discussions.

--
Craig Ringer

Re: Migrated from 8.3 to 9.0 - need to update config (re-post)

From
Carlo Stonebanks
Date:
Thanks guys,
 
So, would you say that transaction pooling has a load-balancing effect because of its granularity compared to session pooling?
 
I'm concerned about the side-effects of transaction pooling, like the session-level features we would always have to look out for. Wouldn't this require a code review? Just reading UDF Session State=No on this page got my attention: http://wiki.postgresql.org/wiki/PgBouncer
 
If we go with transaction pooling, will we get any sort of warnings or exceptions when apps and stored pgUDF's are violating transaction pooling features, or will things just quietly go wrong, with one session getting a side-effect from another session's state?
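
For what it's worth, here is a hypothetical illustration (object names invented; my understanding of the failure modes is in the comments) of the kind of thing I gather a code review would have to look for:

-- Session state that transaction pooling does not carry between transactions:
SET search_path TO import_batch;     -- quietly lost or leaked: the next transaction may run
                                     -- on a different server connection
CREATE TEMP TABLE scratch (id int);  -- temp table may be gone when the server connection changes
PREPARE find_core (int) AS
  SELECT 1 WHERE $1 = 1;             -- a later EXECUTE can fail with
                                     -- "prepared statement does not exist"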
 
Carlo