Thread: PG performance issues related to storage I/O waits
Hello team,
We have serious performance issues with a production EDB 9.2AS installation.
The issue is mostly related to storage I/O bottlenecks during peak hours, and we are looking for tunables at any level that could reduce the I/O spikes on the SAN and improve overall DB performance.
Our storage array consists of 16 disks in a RAID-10 topology (device 253-2 in our OS configuration). We are also using RAID-5 for archive_log storage (also presented by the SAN to the same machine - device 253-3).
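(In case it helps, the 253-x numbers are device-mapper minors; they can be mapped back to the LVM volumes with something like the commands below - the volume names in the comment are made up, ours differ:)

    # device-mapper name <-> (major:minor) mapping, e.g. vg_pg-lv_pgdata (253:2)
    dmsetup ls
    lsblk -o NAME,MAJ:MIN,SIZE,TYPE,MOUNTPOINT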
We have set synchronous_commit to off, but since almost all of the application's queries use prepared statements we don't see any real benefit.
We are using VMware, VMFS and LVM, so we need your feedback on any kind of tunable that could remove load from storage during peak hours. (FYI, application peak hours are 13:00-23:00 UTC; during the night (04:00-06:00 UTC) there is some heavy reporting activity plus backups.)
Archive logs are rsync-ed to a remote backup server every 20 minutes.
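(The sync itself is a plain cron job on the DB host, roughly like the line below - the backup host name and paths are placeholders, not our real ones:)

    # crontab of the postgres OS user, runs every 20 minutes
    */20 * * * * rsync -a /pg_archives/ backuphost:/pg_archives/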
Please also advise on any postgresql.conf modification that could significantly affect storage load (WAL/checkpoint configuration etc.). (We have not tried to move pg_xlog to a separate LUN, since that is not an option - any other LUN would use the same storage pool as the rest of the /pgdata files.)
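(For reference, these are the WAL/checkpoint settings we can report right away - just a sketch of the query we would run:)

    psql -c "SELECT name, setting, unit
               FROM pg_settings
              WHERE name IN ('checkpoint_segments', 'checkpoint_timeout',
                             'checkpoint_completion_target', 'wal_buffers',
                             'synchronous_commit');"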
We had some issues in the past with the autovacuum daemon failing to work efficiently under high load, so we have already applied your instructions for a more aggressive autovacuum policy (the changes are already applied in postgresql.conf).
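(If useful, we can also pull per-table autovacuum activity with something like this - sketch only:)

    # tables with the most dead tuples, plus when autovacuum last touched them
    psql -c "SELECT relname, n_dead_tup, last_autovacuum, autovacuum_count
               FROM pg_stat_user_tables
              ORDER BY n_dead_tup DESC
              LIMIT 20;"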
Let me know if you want me to attach all the usual ticket info (OS, disks, PG conf, etc.) plus the sar output and server logs from the last 3 days (24, 25, 26 June).
Thanks,
Tasos
On 31 July 2013, 13:58, Tasos Petalas wrote:
> Hello team,
>
> We have serious performance issues with a production EDB 9.2AS
> installation.

Are you sure you've sent this to the right list? Because this is a
community mailing list for PostgreSQL, not for EDB. If you have a support
contract with EDB, it's probably better to ask them directly (e.g. they
might give you advice about custom features not available in vanilla
PostgreSQL).

> The issue is mostly related to storage I/O bottlenecks during peak hours
> and we are looking for tunables on any level that could reduce the I/O
> spikes on SAN and improve overall DB performance.

So is that a dedicated DWH machine, and is PostgreSQL responsible for most
of the I/O load? Which processes are doing that - backends handling
queries, or some background process (say, checkpoints)? Is it random or
sequential I/O, reads or writes, ...? How much I/O are we talking about?
Could it be that the SAN is overloaded by someone else (in case it's not
dedicated to the database)?

It might turn out that the most effective solution is tuning the queries
that are responsible for the I/O activity.

> Our storage array consists of 16 disks in RAID-10 topology (device 253-2
> on our OS configuration). We are also using RAID-5 for archive_log
> storage (also presented by SAN to the same machine - device 253-3).

I have no clue what device 253-3 is, but I assume you're using SAS disks.

> We have set synchronous_commit to off but since almost all of application
> queries are using prepared statements we don't get any real benefit.

Ummmm, how is this related? AFAIK those are rather orthogonal features,
i.e. prepared statements should benefit from synchronous_commit=off just
like any other queries.

> We are using VMware, VMFS and LVM so we need your feedback on any kind of
> tunable that could remove load from storage during peak hours (FYI
> application peak hours are 13:00-23:00 UTC, during night (04:00-06:00
> UTC) there is some heavy reporting activity + backups).
> Archive logs are rsync-ed to a remote backup server every 20 minutes.
>
> Also please advise on any postgresql.conf modification that could
> significantly affect storage load (WAL/checkpoint configuration etc.)
> (we have not tried to move pg_xlog to a separate LUN since this is not
> an option - any other LUN would be using the same storage pool as the
> rest of the /pgdata files).
> We had some issues in the past with the autovacuum daemon failing to
> work efficiently under high load so we have already applied your
> instructions for a more aggressive autovacuum policy (changes already
> applied in postgresql.conf).
>
> Let me know if you want me to attach all the usual ticket info (OS,
> disks, PG conf, etc.) plus the sar output and server logs from the last
> 3 days (24, 25, 26 June).

Well, we can't really help you unless you give us this, so yes - attach
this info. And please try to identify what is actually causing most I/O
activity (e.g. using "iotop").

Tomas
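PS: A sketch of the kind of iotop capture I mean (flags can differ a bit
between iotop versions, so double-check on your box):

    # -o = only processes actually doing I/O, -b = batch mode, -t = timestamps;
    # one sample every 5 seconds for ~10 minutes during the peak window
    iotop -o -b -t -d 5 -n 120 > iotop_peak.log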
Hi Tasos,

(2013/07/31 20:58), Tasos Petalas wrote:
> We have set synchronous_commit to off but since almost all of application
> queries are using prepared statements we don't get any real benefit.
> Also please advise on any postgresql.conf modification that could
> significantly affect storage load (WAL-checkpoint configuration etc.)

Please send us your postgresql.conf and details of your server (memory,
RAID cache size and OS version).

I suspect your server has a lot of memory. If so, you should set a small
dirty_background_ratio or dirty_background_bytes; with a big page cache,
PostgreSQL's checkpoints otherwise trigger very bursty disk writes.

By the way, I am working on a new checkpoint scheduler and write method at
the moment. If you could send more detailed information, such as response
times or load average, it would help me improve my patch.

Best regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
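P.S. The kernel knobs I mean are the vm.dirty_* sysctls; the values below
are only an example, please tune them to your RAM and RAID cache size:

    # current values
    sysctl vm.dirty_background_ratio vm.dirty_ratio
    # start background writeback much earlier on a big-memory box
    # (dirty_background_bytes overrides dirty_background_ratio when non-zero)
    sysctl -w vm.dirty_background_bytes=268435456    # 256MB
    sysctl -w vm.dirty_bytes=1073741824              # 1GB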
Hi,

On 1.8.2013 21:47, Tasos Petalas wrote:
> I attach postgresql.conf, sar output and server/os/disk info.
>
> Will update you with iotop outputs shortly.

So, have you found anything interesting using iotop?

> On Wed, Jul 31, 2013 at 3:40 PM, Tomas Vondra <tv@fuzzy.cz> wrote:
>
>> Are you sure you've sent this to the right list? Because this is a
>> community mailing list for PostgreSQL, not for EDB. If you have a
>> support contract with EDB, it's probably better to ask them directly
>> (e.g. they might give you advice about custom features not available
>> in vanilla PostgreSQL).
>
> We do have a support contract with EDB, but it has not provided any
> valuable feedback so far. As a person coming from the community world
> I believe I can get answers here.

You can certainly get help here, although probably only for plain
PostgreSQL, not for the features available only in EDB.

>> So is that a dedicated DWH machine, and is PostgreSQL responsible for
>> most of the I/O load? Which processes are doing that - backends
>> handling queries or some background processes (say, checkpoints)?
>> Is it random or sequential I/O, reads or writes, ...?
>
> The system is a heavy OLTP system. Bottlenecks are mostly related to
> I/O writes (many small chunks). The machine is only hosting postgres.

That however still doesn't say which processes are responsible for that.
Is it the background writer, backends running queries, or something else?
The iotop output should give you the answer (or at least a hint).

>> How much I/O are we talking about? Could it be that the SAN is
>> overloaded by someone else (in case it's not dedicated to the
>> database)?
>
> Check the sar results (pg_data resides on dev 253-2 (RAID-10),
> pg_archives on 253-3 (RAID-5)).

What is pg_archives? Also, RAID-5 is generally a poor choice for
write-intensive workloads.

Also, how are these volumes defined? Do they use distinct sets of disks?
How many disks are used for each volume?

>> It might turn out that the most effective solution is tuning the
>> queries that are responsible for the I/O activity.
>
> The SAN is almost dedicated to the host running postgres.

And? How does that contradict my suggestion to tune the queries? Every
piece of hardware has its bottlenecks, and there is always a query that
can hit them. In those cases the best solution often is tuning the query.

>> I have no clue what device 253-3 is, but I assume you're using SAS
>> disks.
>
> Yes, we are using 15K SAS disks in RAID 10. (The 253-2 dev refers to the
> sar output for the disks.)

OK, so the pg_archives is probably for the xlog archive, right?

>> Ummmm, how is this related? AFAIK those are rather orthogonal features,
>> i.e. prepared statements should benefit from synchronous_commit=off
>> just like any other queries.
>
> It is not prepared statements. It is distributed transactions (queries
> inside a PREPARE TRANSACTION block).
>
> "Certain utility commands, for instance DROP TABLE, are forced to commit
> synchronously regardless of the setting of synchronous_commit. This is
> to ensure consistency between the server's file system and the logical
> state of the database. The commands supporting two-phase commit, such as
> PREPARE TRANSACTION, are also always synchronous."
>
> Taken from http://www.postgresql.org/docs/9.2/static/wal-async-commit.html

Well, then please use the proper term - prepared transactions and prepared
statements are two very different things. But yes, if you're using
prepared transactions (for 2PC), then synchronous_commit=off is not going
to improve anything, as those commits simply have to be synchronous.

>>> Let me know if you want me to attach all the usual ticket info (OS,
>>> disks, PG conf, etc.) plus the sar output and server logs from the
>>> last 3 days (24, 25, 26 June).
>>
>> Well, we can't really help you unless you give us this, so yes - attach
>> this info. And please try to identify what is actually causing most I/O
>> activity (e.g. using "iotop").
>
> SAR outputs, postgresql.conf, and other os/system h/w info attached.

I've checked the conf, and I think you should really consider increasing
checkpoint_segments - it's set to 3 (= 64MB) but I think something like
32 (= 512MB) or even more would be more appropriate.

I see you've enabled log_checkpoints - can you check your logs to see how
often the checkpoints happen?

Also, can you check the pg_stat_bgwriter view? I'd bet the value in
checkpoints_timed is very low compared to checkpoints_req. Or even better,
get the values from this view before / after running the batch jobs.
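A minimal way to grab those numbers before and after the batch window
(just a sketch - adjust the database/user you connect as):

    # snapshot pg_stat_bgwriter before and after the batch jobs, then compare
    psql -At -c "SELECT now(), checkpoints_timed, checkpoints_req,
                        buffers_checkpoint, buffers_backend
                   FROM pg_stat_bgwriter;" >> bgwriter_snapshots.log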
Anyway, I don't think writes are the main problem here - see this:

             DEV        tps   rd_sec/s   wr_sec/s  await  %util
04:00:01  dev253-2   176.56     134.07    1378.08   3.70   3.53
04:10:01  dev253-2   895.22   11503.99    6735.08  16.63   8.24
04:20:01  dev253-2   455.35   25523.80    1362.37   2.38  16.81
04:30:01  dev253-2   967.29   95471.88    4193.50   6.70  37.44
04:40:01  dev253-2   643.31   80754.86    2456.40   3.35  29.70
04:50:01  dev253-2   526.35   84990.05    1323.28   2.07  29.41
05:00:01  dev253-2   652.68   73192.18    1297.20   1.89  28.51
05:10:01  dev253-2  1256.31   34786.32    5840.08   9.25  53.08
05:20:01  dev253-2   549.84   14530.45    3522.85   8.12   9.89
05:30:01  dev253-2  1363.27  170743.78    5490.38   7.53  59.75
05:40:01  dev253-2   978.88  180199.97    1796.90   2.54  74.08
05:50:01  dev253-2  1690.10  166467.91    8013.10  35.45  66.32
06:00:01  dev253-2  2441.94  111316.65   15245.05  34.90  41.78

It's a slightly modified sar output for the main data directory (on
253-2). It clearly shows you're doing ~50MB/s of (random) reads compared
to less than 5MB/s of writes (assuming a sector is 512B).

There's almost no activity on the pg_archives (253-3) device:

00:00:01     DEV        tps  rd_sec/s  wr_sec/s  await  %util
04:00:01  dev253-3    20.88      0.01    167.00  14.10   0.02
04:10:01  dev253-3   211.06      0.05   1688.43  43.61   0.12
04:20:01  dev253-3    14.30      0.00    114.40   9.95   0.01
04:30:01  dev253-3   112.78      0.45    901.75  17.81   0.06
04:40:01  dev253-3    14.11      0.00    112.92  10.66   0.01
04:50:01  dev253-3     7.39     56.94     56.85  10.91   0.04
05:00:01  dev253-3    14.21      0.00    113.67  10.92   0.01
05:10:01  dev253-3     7.05      0.26     56.15  17.03   0.02
05:20:01  dev253-3    28.38     18.20    208.87   8.68   0.29
05:30:01  dev253-3    41.71      0.03    333.63  14.70   0.03
05:40:01  dev253-3     6.95      0.00     55.62  10.39   0.00
05:50:01  dev253-3   105.36    386.44    830.83   9.62   0.19
06:00:01  dev253-3    13.92      0.01    111.34  10.41   0.01

In the afternoon it's a different story - for 253-2 it looks like this:

             DEV        tps  rd_sec/s  wr_sec/s  await   %util
15:50:01  dev253-2  4742.91  33828.98  29156.17  84.84  105.14
16:00:01  dev253-2  2781.05  12737.41  18878.52  19.24   80.53
16:10:01  dev253-2  3661.51  20950.64  23758.96  36.86   99.03
16:20:01  dev253-2  5011.45  32454.33  31895.05  72.75  102.38
16:30:01  dev253-2  2638.08  14661.23  17853.16  25.24   75.64
16:40:01  dev253-2  1988.95   5764.73  14190.12  45.05   58.80
16:50:01  dev253-2  2185.15  88296.73  11806.38   7.46   84.37
17:00:01  dev253-2  2031.19  12835.56  12997.34   8.90   82.62
17:10:01  dev253-2  4009.24  34288.71  23974.92  38.07  103.01
17:20:01  dev253-2  3605.86  26107.83  22457.41  45.76   90.90
17:30:01  dev253-2  2550.47   7496.85  18267.07  19.10   65.87

So this is about 50:50 reads and writes, and it's also the time with some
measurable activity on the 253-3 device:

00:00:01     DEV        tps  rd_sec/s  wr_sec/s  await  %util
15:50:01  dev253-3  1700.97   9739.48  13249.75  22.63   8.53
16:00:01  dev253-3   807.44    512.95   6439.17  15.21   0.82
16:10:01  dev253-3  1236.72     22.92   9892.26  28.74   0.95
16:20:01  dev253-3  1709.15      0.52  13672.70  40.89   1.69
16:30:01  dev253-3   919.26   8217.60   7051.60  20.40  11.74
16:40:01  dev253-3   601.66      0.37   4812.94  18.99   0.39
16:50:01  dev253-3   476.40      0.42   3810.95  10.02   0.28
17:00:01  dev253-3   636.03      0.15   5088.08  11.01   0.35
17:10:01  dev253-3  1259.55    165.64  10069.65  15.18   1.01
17:20:01  dev253-3  1194.10      0.29   9552.49  26.11   0.94
17:30:01  dev253-3   785.40   2000.52   6201.21  33.12   3.06

Still, this is pretty low write activity, and it's sequential.

What I think you could/should do:

* increase checkpoint_segments to 64 (or something)
* move pg_xlog to a separate device (not simply a volume on the SAN,
  sharing disks with the other volumes - that won't give you anything)

I'd expect these changes to improve the afternoon peak, as it's doing
about 50% writes. However, I would not expect this to improve the morning
peak, because that's doing a lot of reads (not writes).

The trick here is that your database is ~300GB - you're doing a lot of
seeks across the whole database, and that's way more than the RAM, so the
queries have to actually hit the disks. 15k disks can do ~200 IOPS each,
and putting them into a RAID-10 array generally gives you

    (n/2) * (IOPS of a single disk)

I don't know exactly how your volumes are defined on your SAN, but
assuming you have RAID-10 on 12 drives, that gives you ~1200 IOPS. This is
the bottleneck you're hitting, and there's not much you can do about it,
except:

* getting better storage (giving you more seeks) - say SSD-based storage
* improving the application/queries/schema to minimize the seeks

And so on.

Tomas
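PS: The arithmetic, applied to the 16-disk array you mentioned (very
rough, assuming ~200 random IOPS per 15k SAS drive):

    # RAID-10 random-IOPS ceiling is roughly (number of disks / 2) * per-disk IOPS
    echo $(( 16 / 2 * 200 ))    # ~1600 IOPS for a 16-disk array
    echo $(( 12 / 2 * 200 ))    # ~1200 IOPS for the 12-disk example above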
Hi,

On 5.8.2013 17:55, Tasos Petalas wrote:
> Seems most of the I/O is caused by SELECT backend processes (READ),
> whereas (WRITE) requests of the wal writer and checkpointer processes do
> not appear as top I/O processes (correct me if I am wrong).
>
> E.g. check the following heavy write process that reports 0% I/O ...!
>
> 14:09:40   769 be/4 enterpri  0.00 B/s  33.65 M/s  0.00 %  0.00 %
>            postgres: wal writer process

That's because the WAL writer does sequential I/O (writes), which is a
perfect match for SAS drives. OTOH the queries do a lot of random reads,
which is a terrible match for spinners.

>> That however still doesn't say which processes are responsible for
>> that. Is it the background writer, backends running queries, or
>> something else? The iotop output should give you the answer (or at
>> least a hint).
>
> It seems most of the I/O is reported from backends running heavy
> concurrent select queries (see the iotop attachment in the previous
> email).

Yes, that seems to be the case.

>> Also, how are these volumes defined? Do they use distinct sets of
>> disks? How many disks are used for each volume?
>
> These are LUNs from the SAN (we have 16 dedicated 2.5'' SAS disks in a
> RAID-10 topology in the storage).

I do understand these are LUNs from the SAN. I was asking whether there
are separate sets of disks for the data directory (which you mentioned to
be RAID-10) and pg_archives (which you mentioned to be RAID-5). Although I
doubt it'd be possible to use the same disk for two LUNs.

>>> Yes, we are using 15K SAS disks in RAID 10. (The 253-2 dev refers to
>>> the sar output for the disks.)
>>
>> OK, so the pg_archives is probably for the xlog archive, right?
>
> No. /pg_archives is the target mount point where we copy archive logs to
> (archive_command = 'test ! -f /pg_archives/%f && cp %p /pg_archives/%f').

... which is exactly what a WAL archive is. That's why the GUC is called
archive_command.

>> I've checked the conf, and I think you should really consider
>> increasing checkpoint_segments - it's set to 3 (= 64MB) but I think
>> something like 32 (= 512MB) or even more would be more appropriate.
>
> We use EDB dynatune. The actual settings can be found in the file
> (Ticket.Usual.Info.27.07.13.txt) of the initial e-mail --> check the
> "show all;" section.
> Current checkpoint_segments is set to 64.

OK, I'm not familiar with dynatune, and I got confused by the
postgresql.conf that you sent. 64 seems fine to me.

>> I see you've enabled log_checkpoints - can you check your logs to see
>> how often the checkpoints happen?
>
> This is the output of the checkpoints during peak hours (avg. every 2-5
> minutes):
>
> 2013-08-02 14:00:20 UTC [767]: [19752]: [0]LOG: checkpoint complete:
> wrote 55926 buffers (5.3%); 0 transaction log file(s) added, 0 removed,
> 41 recycled; write=220.619 s, sync=5.443 s, total=226.152 s;
> sync files=220, longest=1.433 s, average=0.024 s
> 2013-08-02 14:05:14 UTC [767]: [19754]: [0]LOG: checkpoint complete:
> wrote 109628 buffers (10.5%); 0 transaction log file(s) added, 0
> removed, 31 recycled; write=209.714 s, sync=9.513 s, total=219.252 s;
> sync files=222, longest=3.472 s, average=0.042 s

Meh, seems OK to me. My earlier concern was based on the incorrect number
of checkpoint segments ...

>> Also, can you check the pg_stat_bgwriter view? I'd bet the value in
>> checkpoints_timed is very low, compared to checkpoints_req. Or even
>> better, get the values from this view before / after running the
>> batch jobs.
>
> Results during load:
> checkpoints_timed : 12432 , checkpoints_req = 3058
> checkpoints_timed : 12580 , checkpoints_req = 3070

Again, seems fine.
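FWIW, a quick way to keep an eye on checkpoint frequency over a whole day
is to count the "checkpoint complete" lines per hour - just a sketch, use
your actual log directory and file name:

    grep "checkpoint complete" /pgdata/pg_log/postgresql-2013-08-02.log \
        | cut -c1-13 | sort | uniq -c    # completions per hour of the day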
>> In the afternoon it's a different story - for 253-2 it looks like this:
>>
>>              DEV        tps  rd_sec/s  wr_sec/s  await   %util
>> 15:50:01  dev253-2  4742.91  33828.98  29156.17  84.84  105.14
>> 16:00:01  dev253-2  2781.05  12737.41  18878.52  19.24   80.53
>> 16:10:01  dev253-2  3661.51  20950.64  23758.96  36.86   99.03
>> 16:20:01  dev253-2  5011.45  32454.33  31895.05  72.75  102.38
>> 16:30:01  dev253-2  2638.08  14661.23  17853.16  25.24   75.64
>> 16:40:01  dev253-2  1988.95   5764.73  14190.12  45.05   58.80
>> 16:50:01  dev253-2  2185.15  88296.73  11806.38   7.46   84.37
>> 17:00:01  dev253-2  2031.19  12835.56  12997.34   8.90   82.62
>> 17:10:01  dev253-2  4009.24  34288.71  23974.92  38.07  103.01
>> 17:20:01  dev253-2  3605.86  26107.83  22457.41  45.76   90.90
>> 17:30:01  dev253-2  2550.47   7496.85  18267.07  19.10   65.87
>
> This is when the actual problem arises.

Well, then I think it's mostly about the SELECT queries.

>> What I think you could/should do:
>>
>> * move pg_xlog to a separate device (not simply a volume on the SAN,
>>   sharing disks with the other volumes - that won't give you anything)
>
> Unfortunately we cannot do so at the moment (all available SAN resources
> are assigned to the pg_data directory of the server).
>
>> I'd expect these changes to improve the afternoon peak, as it's doing
>> about 50% writes. However I would not expect this to improve the
>> morning peak, because that's doing a lot of reads (not writes).
>
> The afternoon peak is what we need to troubleshoot (will check if we can
> assign pg_xlog to a different LUN - not an option currently).

OK, understood. It's difficult to predict the gain, and given the iotop
output it might even cause harm.

> Will SSD improve write performance? We are thinking of moving towards
> this direction.

It'll certainly improve the random I/O in general, which is the main issue
with the SELECT queries. The sequential read/write improvement probably
won't be that significant.

Tomas