WAL Optimisation - configuration and usage - Mailing list pgsql-performance

Hi,

There's a lot here, so skip to the middle for my WAL settings if you like.

I'm currently investigating performance on a large database which
consumes email designated as SPAM, held for perusal by customers who
wish to check it. The system incorporates a number of subprocesses -
several delivery daemons, an expiry daemon and a UI which performs
large selects. A considerable number of UPDATEs, SELECTs and DELETEs
are performed continually.

Starting with a stock pg config, I've come to understand the importance
of increased shared memory, effective cache size and a low
random_page_cost, as detailed in
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html. After some
system analysis with vmstat and sar we've been able to determine that
the main problem is that we're IO bound, and IMO this is due to lots of
updates causing high drive contention - the array is a RAID0 mirror and
the dataset was originally 79GB. A lot of SPAM is being sent from our
mail scanners, and coupled with the UI this is creating an increasingly
lagging system.

Typically all our db servers have this sort of enhancement - 1GB RAM,
SMP boxen with SCSI-160 disks:
effective_cache_size = 95694
random_page_cost = 0.5
sort_mem=65536
max_connections = 128
shared_buffers = 15732
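
For reference, and assuming the default 8KB page size, those
page-denominated settings work out to roughly:

$ echo $(( 95694 * 8 / 1024 ))   # effective_cache_size -> ~747 MB
747
$ echo $(( 15732 * 8 / 1024 ))   # shared_buffers -> ~122 MB
122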

My focus today has been on WAL - I'd not looked at WAL before. By
increasing the settings thus:

wal_buffers = 64 # need to determine WAL usage
wal_files = 64 # range 0-64
wal_sync_method = fsync   # the default varies across platforms
wal_debug = 0             # range 0-16

# hopefully this should see fewer LogFlushes per LogInsert - use more WAL though.
commit_delay = 10000     # range 0-100000
commit_siblings = 2       # range 1-1000
checkpoint_segments = 16   # in logfile segments (16MB each), min 1
checkpoint_timeout = 300  # in seconds, range 30-3600
fsync = true
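
After editing postgresql.conf, the postmaster needs a restart for these
to take effect, and SHOW can confirm what the running server actually
picked up (I believe on 7.2 the value comes back as a NOTICE). The data
directory path below is an assumption - use your own:

$ pg_ctl -D /var/lib/pgsql/data restart
$ psql -d template1 -c "SHOW commit_delay"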

Great improvements have been seen. A vacuumdb -f -a -z went from
processing 1 table in 10 minutes to 10 tables in 1 minute. :) Under the
old settings I actually stopped it after 80 tables (48 hours of runtime)
because the projected end time would have been next week. Once I
restarted the postmaster with the above WAL settings, vacuumdb -f -a -z
completed all 650 tables by the following day.
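
If anyone wants to reproduce the comparison, a rough way to benchmark
the per-table improvement is to time a single-table vacuum before and
after the config change (the table and database names here are made up):

$ time vacuumdb -f -z -t spam_example_com -d spamdb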

My thinking is therefore to reduce disk contention as much as possible
within the current hardware limitations. I'm aiming to keep the
checkpoint subprocess satisfied that other backends are about to commit
- hence keeping commit_siblings low at 2 - and to create a sufficient
gap between internal commits (commit_delay is in microseconds, so 10000
gives a 10ms window) so that many commits can be flushed in a single
sync. From the above config, I believe I've gone some way towards
achieving this, and the performance I'm now seeing suggests as much.

But I think we can get more out of this, as the above settings were
picked from thin air. My concern is being able to determine actual WAL
file usage, and whether the system is caught out at the other extreme -
that we're not committing fast enough. I've read that the number of WAL
files shouldn't exceed 2*checkpoint_segments + 1, yet my pg_xlog
directory contains 74 files. This suggests I'm using more logfiles than
I should. Also, I'm not sure what wal_buffers really should be set to.
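
The arithmetic, plus a quick check (PGDATA here stands for your data
directory). I do wonder whether wal_files = 64 above, which asks for log
files to be created in advance, accounts for most of the 74 - that's a
guess on my part:

$ echo $(( 2 * 16 + 1 ))          # 2*checkpoint_segments+1 = 33 segments
33
$ echo $(( 33 * 16 ))             # at 16MB each, ~528MB of pg_xlog expected
528
$ ls -1 $PGDATA/pg_xlog | wc -l   # actual segment count
74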

Can I get any feedback on this? How to look into pg's WAL usage is
really what I'm after. BTW this is an old install I'm afraid - 7.2.2 -
it's been impossible to upgrade until now because the system has been
too slow. I have moved pg_xlog onto the root SCSI disk - it doesn't
appear to have made a huge difference, but that disk could be on the
same cable.
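
One idea I'm considering for measuring WAL usage: turn wal_debug on
briefly and count flush activity in the server log. Very much a sketch -
it's verbose, the exact message text may differ on 7.2.2 (so eyeball the
log first), and the log path is an assumption:

# in postgresql.conf, temporarily:
wal_debug = 1

# then, assuming the postmaster's stderr ends up in /var/log/postgresql.log:
$ grep -c "INSERT" /var/log/postgresql.log   # WAL insert records
$ grep -ci "flush" /var/log/postgresql.log   # WAL flushes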

Some additional information as background: I can supply sar output if
required. I'm currently running our expiry daemon, which scans all mail
for each domain (i.e. one table per domain), and this seems to take a
few hours to run over a 26GB archive. It's a lot faster than it ever
was. Load gets to about 8 as backends are all busy doing selects,
updates and deletes. This process was run recently, so it shouldn't be
doing too much deleting. It still seems IO bound, and I don't think I'm
going to solve that without a better disk arrangement, but exhausting
the other possibilities is essentially what I'm doing now.
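
To pin down which spindle is saturated, I'm also looking at per-device
stats - sar -d (if your kernel/sysstat combination supports it) or
iostat as an alternative:

$ sar -d 5 5     # per-device transfer rates, 5s intervals
$ iostat -d 5    # same idea if sar -d reports nothing useful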

$ sar -B -s 16:00:00

16:35:55     pgpgin/s pgpgout/s  activepg  inadtypg  inaclnpg  inatarpg
16:36:00      3601.60    754.40    143492     87791     10230     48302
16:36:05      5766.40    552.80    143947     88039     10170     48431
16:36:10      3663.20    715.20    144578     88354      9075     48401
16:36:15      3634.40    412.00    144335     88405      9427     48433
16:36:20      5578.40    447.20    143626     88545      9817     48397
16:36:25      4154.40    469.60    143640     88654     10388     48536
16:36:30      3504.00    635.20    143538     88763      9992     48458
16:36:35      3540.80    456.00    142515     88949     10444     48381
16:36:40      3334.40   1067.20    143268     89244      9832     48468

$ vmstat 5
   procs                      memory    swap          io     system         cpu
 r  b  w   swpd   free   buff  cache  si  so    bi    bo    in    cs  us  sy  id
 0  7  1  29588  10592  15700 809060   1   0    97    75     0   103  13   9  79
 3  8  0  29588  11680  15736 807620   0   0  3313   438  1838  3559  19  13  68
 2 13  1  29588  12808  15404 800328   0   0  4470   445  1515  1752   7   7  86
 0  9  1  29588  10992  15728 806476   0   0  2933   781  1246  2686  14  10  76
 2  5  1  29588  11336  15956 807884   0   0  3354   662  1773  5211  27  17  57
 4  5  0  29696  13072  16020 813872   0  24  4282   306  2632  7862  45  25  31
 4  6  1  29696  10400  16116 815084   0   0  5086   314  2668  7893  47  26  27
 9  2  1  29696  13060  16308 814232  27   0  3927   748  2586  7836  48  29  23
 3  8  1  29696  10444  16232 812816   3   0  4015   433  2443  7180  47  28  25
 8  4  0  29696  10904  16432 812488   0   0  4537   500  2616  8418  46  30  24
 4  6  2  29696  11048  16320 810276   0   0  6076   569  1893  3919  20  14  66
 0  5  0  29696  10480  16600 813788   0   0  4595   435  2400  6215  33  21  46
 3  6  0  29696  10536  16376 812248   0   0  3802   504  2417  7921  43  25  32
 1  6  1  29696  11236  16500 809636   0   0  3691   357  2171  5199  24  15  61
 0 14  1  29696  10228  16036 801368   0   0  4038   561  1566  3288  16  12  72

Sorry it's so long, but I thought some background info would be better
than none. Thanks for reading,

--

Rob Fielding
Development
Designer Servers Ltd

