Re: System Load analyze - Mailing list pgsql-general

From: Peter Bauer
Subject: Re: System Load analyze
Date:
Msg-id: 200711280819.22714.peter.bauer@apus.co.at
In response to: Re: System Load analyze ("Scott Marlowe" <scott.marlowe@gmail.com>)
List: pgsql-general
On Tuesday, 27 November 2007, Scott Marlowe wrote:
> On Nov 24, 2007 10:57 AM, Peter Bauer <peter.bauer@apus.co.at> wrote:
> > I have a system here with two 2.4GHz Xeon processors, 2GB RAM, ONE disk on
> > a battery-backed write-cache SCSI controller, and PostgreSQL 8.1.4
> > running with the data on a DRBD device for high availability. The
> > database is also replicated to two similar machines with slony1.
>
> Why are you running a version of PostgreSQL with known data-eating
> bugs?  If you care about your data, you will keep up to date on releases.
> 8.1.10 was released on 2007-09-17; 8.1.4 was released on 2006-05-23.
> That's 16 months of bug fixes you're missing.  Go here:
> http://www.postgresql.org/docs/8.1/static/release.html and read up on
> the fixes you're missing.  Then update.  Or just update.
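
Minor releases within the 8.1 series are binary-compatible, so the update does not require a dump and reload: install the new binaries and restart the server (and check the release notes for any extra steps). A rough sketch, assuming a Debian-style packaged install; the package and init script names are assumptions and vary by distribution:

    /etc/init.d/postgresql-8.1 stop    # init script name is an assumption
    apt-get install postgresql-8.1     # pulls in the latest 8.1.x minor release
    /etc/init.d/postgresql-8.1 start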
>
> OK, on to the issue at hand.
>
> > Since the load average is between 1 (most of the time) and 10 (peaks), I
> > am worried about the load, so I ran vmstat and iostat, which show that
> > 1000-6000 blocks are written per second. Please check the attached output
> > for further details.
> > top shows that the CPUs are at least 80% idle most of the time, so I
> > think there is an I/O bottleneck. I'm aware that this hardware setup is
> > probably not sufficient, but I would like to investigate how critical
> > the situation is.
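
For reference, these are the kinds of invocations that produce the output quoted below; the exact flags are illustrative, and iostat usually ships in the sysstat package:

    vmstat 1         # one line of memory/swap/io/system/cpu stats per second
    iostat -x 1      # extended per-device statistics, once per second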
>
> Yes.  A battery-backed cache can only do so much; it's not magic pixie
> dust.  Once it's full, the drive becomes the bottleneck.  Real db
> servers have more than one disk drive.  They usually have at least 4
> or so, and often dozens to hundreds.  Also, not all battery-backed
> caching RAID controllers are created equal.
>
> > procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
> >  r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
> >  0  0 398256  78328 140612 1063556    0    0     0  1472  2029  5081  4  3 92  0
> >  0  2 398256  78268 140612 1063576    0    0     0  2304  1928  4216  0  2 98  0
> >  1  2 398256  78100 140612 1063576    0    0     0  1464  1716  3994  1  3 96  0
> >  0  0 398256  78704 140612 1063592    0    0     0   916  1435  3570  5  3 91  0
> >  0  0 398256  77876 140612 1063616    0    0     0     0   305  1169  3  1 96  0
>
> See that dip in the last line above, where blocks out drop to 0,
> idle jumps to 96, and interrupts and context switches drop?
> That's most likely where PostgreSQL is checkpointing.  Checkpointing
> is where it writes out all the dirty buffers to disk.  If the bgwriter
> is not tuned aggressively enough, checkpoints happen and make the whole
> database slow down for a few seconds.  If it's tuned too aggressively,
> then the db spends too much CPU time tracking the dirty buffers and
> then writing them.  If tuned just right, it will write out the dirty
> buffers just fast enough that a checkpoint is never needed.
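
For context, a checkpoint in 8.1 is forced by whichever of two postgresql.conf settings trips first; the values below are the 8.1 defaults, quoted only as a reference point:

    checkpoint_segments = 3      # checkpoint after 3 x 16MB WAL segments fill up
    checkpoint_timeout = 300     # or after 300 seconds, whichever comes first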
>
> You tune the bgwriter to your machine and I/O subsystem.  If you're
> planning on getting more hard drives, do that first.  Then tune the
> bgwriter.
>
> BTW, if this is "vmstat 1" running, it's showing a checkpoint every 20
> or so seconds, I think.
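
One way to confirm that cadence without staring at vmstat, assuming you can edit postgresql.conf: 8.1's checkpoint_warning setting makes the server log a "checkpoints are occurring too frequently" message whenever checkpoints complete closer together than the given interval.

    checkpoint_warning = 30      # log a warning if successive checkpoints
                                 # are under 30 seconds apart (the default)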
>
> >  0  2 398256  79136 140612 1063964    0    0     0  1736  1959  4494  4  2 94  0
>
> checkpoint here:
> >  0  0 398256  79132 140612 1063964    0    0     0     4   260  1039  1  1 98  0
> >  0  0 398256  79052 140612 1063980    0    0     0  2444  3084  6955  6  5 89  0
> >  0  2 398256  79060 140612 1063988    0    0     0   948  1146  3616  3  1 96  0
> >  0  1 398256  78268 140612 1064056    0    0     0  1908  1809  4086  6  5 88  0
> >  0  1 398256  76728 140612 1064056    0    0     0  6256  6637 15472  5  5 90  0
> >  0  2 398256  77000 140612 1064064    0    0     0  4916  5840 12107  1  4 95  0
> >  0  2 398256  76956 140612 1064068    0    0     0  6468  7432 15211  1  3 96  0
> >  0  6 398256  77388 140612 1064072    0    0     0  8116  7826 18265  1  8 91  0
> >  0  2 398256  74312 140612 1064076    0    0     0  7032  6886 16136  2  7 91  0
> >  0  2 398256  74264 140612 1064076    0    0     0  5680  7143 13411  0  5 95  0
> >  0  2 398256  72980 140612 1064140    0    0     0  5396  6377 13251  6  6 88  0
> >  0  3 398256  76972 140612 1064148    0    0     0  5652  6793 14079  4  9 87  0
> >  0  2 398256  77836 140612 1064148    0    0     0  3968  5321 14187 10  8 82  0
> >  1  0 398256  77280 140612 1064148    0    0     0  1608  3188  8974 21 12 67  0
> >  1  0 398256  77832 140612 1064152    0    0     0   236   834  2625  7  5 87  0
> >  0  0 398256  77464 140612 1064152    0    0     0   244   505  1378  2  4 94  0
> >  1  0 398256  77828 140612 1064164    0    0     0   316   580  1954  7  2 91  0
> >  0  0 398256  77804 140612 1064180    0    0     0   740   673  2248  2  2 96  0
> >  0  0 398256  77000 140612 1064180    0    0     0   304   589  1739  1  3 96  0
>
> 20 rows later, checkpoint here:
> >  0  0 398256  77000 140612 1064184    0    0     0     0   216   886  0  1 99  0
> >  0  0 398256  75452 140612 1064184    0    0     0   432   755  2032  6  1 93  0
> >
> > max_fsm_pages = 40000                   # min max_fsm_relations*16, 6 bytes each, APUS
>
> This seems a little low for a busy server.
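
One way to check whether it is in fact too low: in 8.1, a database-wide VACUUM VERBOSE ends with a summary of free space map usage, and warns if more page slots are needed than max_fsm_pages allows. A minimal sketch, with the database name as a placeholder:

    # the INFO lines at the end report free space map usage vs. the fsm limits
    psql -d yourdb -c "VACUUM VERBOSE;" 2>&1 | tail -n 6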
>
> > # - Background writer -
> >
> > bgwriter_delay = 100                    # 10-10000 milliseconds between rounds, APUS
> > bgwriter_lru_percent = 2.0              # 0-100% of LRU buffers scanned/round, APUS
> > bgwriter_lru_maxpages = 10              # 0-1000 buffers max written/round, APUS
> > bgwriter_all_percent = 1                # 0-100% of all buffers scanned/round, APUS
> > bgwriter_all_maxpages = 10              # 0-1000 buffers max written/round, APUS
>
> So, the bgwriter wakes up 10 times a second, and each round it scans 2%
> of the least recently used buffers and writes up to 10 of
> those pages.  And it only scans 1% of the total pages and writes 10
> of those at the most.  This is not aggressive enough, and given how
> much spare CPU you have left over, you can be a fair bit more
> aggressive.  The main thing to increase is the maxes.  Try changing
> them to the 100 to 300 range, and maybe increase your percentages to
> 5% or so.  What we're shooting for is to see those checkpoints go
> away.
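
Put concretely, a sketch of those suggestions against the settings quoted above; the exact numbers are starting points to experiment with, not tested values:

    bgwriter_delay = 100                # unchanged: wake up 10 times per second
    bgwriter_lru_percent = 5.0          # scan more of the LRU buffers each round
    bgwriter_lru_maxpages = 200         # somewhere in the suggested 100-300 range
    bgwriter_all_percent = 5.0          # scan more of all buffers each round
    bgwriter_all_maxpages = 200         # somewhere in the suggested 100-300 range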
>
> Then, when running your benchmark, after a few minutes, run a
> checkpoint by hand and see if you get one of those slowdowns like we
> saw in vmstat above.  If your bgwriter is tuned properly, you should
> get an almost instant response from the checkpoint and no noticeable
> slowdown in the vmstat numbers for context switches per second.
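
A minimal way to run that test, assuming superuser access and a placeholder database name: keep vmstat running in one terminal and force the checkpoint from another.

    # terminal 1: watch blocks out and context switches once per second
    vmstat 1

    # terminal 2: force an immediate checkpoint (superuser only)
    psql -d yourdb -c "CHECKPOINT;"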
>
> Once you reach the point where the bgwriter is just keeping ahead of
> checkpoints, there's little to be gained from more aggressive tuning of
> the bgwriter, and you'll just be chewing up memory and CPU bandwidth if
> you get too aggressive with it.

Hi Scott,

Thank you for the great suggestions; I will keep the list informed.

br,
Peter

--
Peter Bauer
APUS Software G.m.b.H.
A-8074 Raaba, Bahnhofstrasse 1/1
Email: peter.bauer@apus.co.at
Tel: +43 316 401629 24
Fax: +43 316 401629 9
