
From John R Pierce
Subject Re: Slow Query / Check Point Segments
Date
Msg-id 4B5B7373.5020505@hogranch.com
In response to Re: Slow Query / Check Point Segments  (Greg Smith <greg@2ndquadrant.com>)
List pgsql-general
Greg Smith wrote:
> John R Pierce wrote:
>> I know the database has a lot of write volume overall, and its only
>> one of several databases running in different zones on the server.  I
>> know nothing about the SAN, I suspect its a EMC Symmetrix of some
>> sort.  Probably a generation or two behind latest.   The operations
>> people are used to running large oracle databases.
>
> One thing you might try is making the PostgreSQL install act more like
> an Oracle one in terms of how it does writes.  By default, PostgreSQL
> does its WAL writes by writing and then calling a sync method.  On
> Solaris, you should be able to safely change this in the
> postgresql.conf file to be:
>

iostat -x with a 5- or 15-second interval shows a bunch of LUNs at as much as
10% busy, but the service time stays under 50ms. This is one of the busiest of
the 15-second samples over a 10-minute period; I'm just showing a
representative sampling of the busiest LUNs out of 40. The physical storage is
all RAID 10 on a DMX-4.

                 extended device statistics
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd1       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
...
ssd10     0.0   33.2    0.0  546.6  0.0  1.0   28.9   0   7
ssd11     0.0   27.7    0.0  573.9  0.0  0.9   34.0   0   6
ssd12     0.0   56.2    0.0  576.6  0.0  1.5   26.5   0  10
ssd13     0.0   30.8    0.0  505.5  0.0  0.9   28.7   0   7
ssd14     0.0   42.9    0.0  498.4  0.0  1.4   32.9   0   9
ssd15     0.0   38.3    0.0  557.6  0.0  1.7   44.0   0   8
ssd16     0.0   41.1    0.0  520.5  0.0  1.3   32.0   0   9
ssd17     0.0   52.4    0.0  528.6  0.0  1.3   24.8   0  10
ssd18     0.0   29.0    0.0  503.9  0.0  1.2   41.4   0   7
...

Most of the other LUNs show near 0% busy and a tenth of that volume of writes or reads.

zpool iostat 15 also shows this I/O peaking for a minute or two every few
minutes, but it's nowhere near hardware capacity:

$ zpool iostat 15
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data-p1   268G   101G    109    386  1.16M  6.98M
data-p2   340M  33.2G      0      2    491  36.8K
----------  -----  -----  -----  -----  -----  -----
..............
----------  -----  -----  -----  -----  -----  -----
data-p1   268G   101G     40      6   527K   164K
data-p2   341M  33.2G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
data-p1   268G   101G     36  1.35K   306K  19.2M
data-p2   341M  33.2G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
data-p1   268G   101G     45    585   445K  2.01M
data-p2   341M  33.2G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
data-p1   268G   101G     47     38   410K  1.05M
data-p2   341M  33.2G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
data-p1   268G   101G     68    145   745K  4.11M
data-p2   341M  33.2G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
data-p1   268G   101G     38    168   311K  4.60M
data-p2   340M  33.2G      0      9      0   294K
----------  -----  -----  -----  -----  -----  -----
data-p1   268G   101G     55     65   504K  1.18M
data-p2   340M  33.2G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
data-p1   268G   101G      5  1.07K  43.1K  9.71M
data-p2   340M  33.2G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
data-p1   268G   101G     46      7   549K   179K
data-p2   340M  33.2G      0      0      0      0
----------  -----  -----  -----  -----  -----  -----
^C


> wal_sync_method=open_datasync
>
> Which I don't think is the default (you can confirm with "show
> wal_sync_method;" via psql on your database).  That will use O_DSYNC
> writes, which are more like how Oracle approaches this and therefore
> potentially a better tuned path for your install.

That's what it's set to now.
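
For anyone following along, checking this is just a couple of commands; a sketch, where the database name "mydb" and the /pgdata path are placeholders rather than our real layout:

# confirm the active setting from psql (reports open_datasync here)
$ psql -d mydb -c 'SHOW wal_sync_method;'
# and the postgresql.conf line that sets it
$ grep wal_sync_method /pgdata/postgresql.conf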

>
> More on this subject, including idea for further tweaking
>
> http://www.postgresql.org/docs/8.4/static/runtime-config-wal.html
> http://blogs.sun.com/jkshah/entry/postgresql_wal_sync_method_and
> http://www.westnet.com/~gsmith/content/postgresql/TuningPGWAL.htm
>
> From what you've shown and described, I'm not sure what other
> PostgreSQL tuning you might do to improve the specific symptoms you're
> seeing.  The particular issue you've got I'd normally start attacking
> on the filesystem and hardware side of things.  There's a reason why
> many people avoid SANs in this context, there's a lot of ways you can
> screw up in this particular aspect of their performance relative to
> what you get with direct attached storage, and it's hard to tell which
> you've run into when a problem does pop up.
>

The SAN is a DMX-4, and it seems to be damn fast in all the low-level tests I
can run.  This database is really getting hammered on: a couple of the tables
take 8-12 hours to vacuum.  These are tables with many millions of small rows
that are updated randomly at high speed, using a rather large primary key (the
PK is 6 or 8 short text fields; the 'data' is a couple of counters).
Performance got a lot better when we set these tables to a fillfactor of 50,
but the table size doubled (duh), which apparently produced its own problems,
so they are now trying fillfactor 70.  There are probably 100 connections.
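
For reference, the fillfactor change is along these lines; a sketch with a made-up table name, and note the new setting only affects pages written after the change unless the table gets rewritten:

# leave ~30% of each newly written heap page free so updates can stay on the same page
$ psql -d mydb -c "ALTER TABLE hot_counters SET (fillfactor = 70);"
# verify the storage option took
$ psql -d mydb -c "SELECT reloptions FROM pg_class WHERE relname = 'hot_counters';"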

CPU and VM usage for a couple of minutes:

$ vmstat 15
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr s1 sd sd sd   in   sy   cs us sy id
 2 0 0 30980016 1555616 20 260 40 0 0 0  0 -0  4  4  5 2231 21989 11168 25 4 72
 0 0 0 30765664 1227536 57 368 185 0 0 0 0  0  8 13  5 1366 64055 48764 34 7 59
 2 0 0 30757576 1218784 0 12 0  0  0  0  0  0  8  8  9 7137 82432 58548 53 12 36
 3 0 0 30750528 1211736 0 15 0  0  0  0  0  0 39 44 35 2959 85558 64763 40 10 51
 0 0 0 30745624 1206832 0 9  0  0  0  0  0  0  0  0  1 1253 65697 51775 32 7 62
 0 0 0 30739896 1201104 21 127 97 0 0 0  0  0  3  2  5 1271 67217 52826 34 7 59
 1 0 0 30729832 1191592 32 296 0 0 0  0  0  0  0  1  2 5030 82152 53824 45 12 43
 3 0 0 30722528 1184328 28 239 0 0 0  0  0  0 11 10 15 2201 87922 66675 38 10 52
 1 0 0 30720800 1181984 0 12 0  0  0  0  0  0 36 34 60 1970 68696 52730 38 8 54
 0 0 0 30716152 1177320 34 248 259 0 0 0 0  0  3  3 10 1139 71591 55853 31 8 61
 3 0 0 30708216 1169312 0 13 0  0  0  0  0  0  6  6  6 9114 90349 60455 56 15 29
 5 0 0 30700448 1161544 0 15 0  0  0  0  0  0 29 29 36 3196 61634 34290 47 11 41
 0 0 0 30694528 1155624 0 10 0  0  0  0  0  0 35 35 32 1343 62273 48045 33 7 60
 0 0 0 30686688 1148592 36 282 171 0 0 0 0  0  2  2  5 1068 75579 59492 30 8 62


I'm thinking it's ZFS tuning we need to do, not physical disk I/O, and that's
a topic for another list, I guess.  I know quite a lot has already been done:
using different ZFS record sizes for different tablespaces, putting indexes
and data in different tablespaces, and so forth.
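
Roughly the sort of thing that's been done, as a sketch; the dataset names, mountpoints, and table name below are made up, not the real layout:

# one dataset per tablespace, recordsize matched to the access pattern (8K = PostgreSQL block size)
$ zfs create -o recordsize=8K data-p1/pg_tbs_hot
$ zfs create -o recordsize=128K data-p1/pg_tbs_seq
$ chown postgres /data-p1/pg_tbs_hot
# point a tablespace at the dataset and move a table onto it (this rewrites the table, so it's not cheap)
$ psql -d mydb -c "CREATE TABLESPACE tbs_hot LOCATION '/data-p1/pg_tbs_hot';"
$ psql -d mydb -c "ALTER TABLE hot_counters SET TABLESPACE tbs_hot;"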



