Thread: Disk latency goes up during certain periods

Disk latency goes up during certain periods

From
German Becker
Date:
Hi list,

I am running Postgres 9.1 on Ubuntu 12.04. I have a dedicated disk for pg_xlog, using an ext4 filesystem with journaling in writeback mode.
During high-load times, the disk usage is around 40%. The I/O write time is constant at about 3 ms. On certain occasions, roughly once every 15 days, the I/O write time goes up to about 10 ms. This makes the disk usage go up to almost 100% (probably saturation), and INSERTs, DELETEs and UPDATEs run considerably slower than normal. This lasts for about 2 hours, and then the latency goes back to 3 ms and everything is normal again.
Has anyone seen this behavior? What could be causing the increase in latency?

Thanks!

Germán Becker

Re: Disk latency goes up during certain periods

From
Alvaro Herrera
Date:
German Becker wrote:
> Hi list,
>
> I am running Postgres 9.1 on Ubuntu 12.04. I have a dedicated disk for
> pg_xlog, using an ext4 filesystem with journaling in writeback mode.
> During high-load times, the disk usage is around 40%. The I/O write time is
> constant at about 3 ms. On certain occasions, roughly once every 15 days, the
> I/O write time goes up to about 10 ms. This makes the disk usage go up to
> almost 100% (probably saturation), and INSERTs, DELETEs and UPDATEs run
> considerably slower than normal. This lasts for about 2 hours, and then the
> latency goes back to 3 ms and everything is normal again.
> Has anyone seen this behavior? What could be causing the increase in
> latency?

Can you correlate these episodes with autovacuum activity?  Or perhaps
backups are being taken (maybe a new base backup is taken every 15
days)?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
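
One way to run the correlation suggested above is to check whether logged autovacuum or backup timestamps fall inside the observed latency windows. The following is only a minimal sketch: it assumes the timestamps have already been collected by hand, all dates are invented for illustration, and `events_in_window` is a hypothetical helper, not a PostgreSQL tool.

```python
from datetime import datetime, timedelta

def events_in_window(events, start, end, slack=timedelta(minutes=10)):
    """Return the events that fall inside [start - slack, end + slack]."""
    return [e for e in events if start - slack <= e <= end + slack]

# Invented example data: one two-hour latency episode and three log timestamps.
spike_start = datetime(2013, 7, 20, 14, 0)
spike_end = spike_start + timedelta(hours=2)
autovacuum_runs = [
    datetime(2013, 7, 20, 9, 30),
    datetime(2013, 7, 20, 13, 55),
    datetime(2013, 7, 20, 15, 10),
]

hits = events_in_window(autovacuum_runs, spike_start, spike_end)
print(len(hits), "of", len(autovacuum_runs), "runs overlap the episode")
```

The slack parameter is a design choice: an activity that starts shortly before the latency rises is still a plausible cause, so the window is widened a little on both sides.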


Re: Disk latency goes up during certain periods

From
German Becker
Date:
Alvaro,

Thanks for your reply. I believe the only possibility is autovacuum activity; I will check that. What puzzles me, though, is that the throughput does not increase as I would expect if there were heavy VACUUM activity; only the wait time, and thus the utilization, goes up. It is as if the disk/filesystem gets slower for an hour or so...
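
This observation fits simple queueing arithmetic: for a device that serves one request at a time, utilization is roughly the request rate multiplied by the time per request, so a latency increase alone can saturate the disk at constant throughput. Below is a rough sketch using the figures from the original post (about 40% busy at 3 ms, rising to 10 ms). The single-queue model, and treating the reported write time as service time, are assumptions.

```python
def utilization(requests_per_sec, service_time_ms):
    """Fraction of time the device is busy, capped at 1.0 (single-queue model)."""
    return min(1.0, requests_per_sec * service_time_ms / 1000.0)

# 40% utilization at 3 ms per write implies about 133 writes per second.
implied_rate = 0.40 / (3 / 1000.0)

normal = utilization(implied_rate, 3)    # same rate, 3 ms per write
slow = utilization(implied_rate, 10)     # same rate, 10 ms per write
print(f"rate={implied_rate:.0f}/s normal={normal:.0%} slow={slow:.0%}")
```

Under these assumptions the same write rate that keeps the disk 40% busy at 3 ms would need more than 100% of the device at 10 ms, which matches the reported saturation without any change in throughput.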


Re: Disk latency goes up during certain periods

From
Brett Stauner
Date:
"...like if the disk/filesystem gets slower during one hour or so"

What about a different scheduled task on the system, not necessarily Postgres related?


Re: Disk latency goes up during certain periods

From
German Becker
Date:

Brett,

Yes, I'm not implying it is Postgres-related; perhaps it is even normal behavior for ext3/ext4 filesystems. But this behavior is only noticeable when using Postgres, and in particular with the WAL files, since they are very disk-intensive, so maybe someone has seen this before. The server's only task is the database. It has 4 disks: one for the OS, one for the WAL and the other 2 for data. All disks show different access times. The WAL disk is the only one with constant latency, and on certain occasions it goes up.
Also, the occasions are absolutely random.

Re: Disk latency goes up during certain periods

From
Luis
Date:

Is your temp stats dir on the same disk?

Re: Disk latency goes up during certain periods

From
Brett Stauner
Date:
Okay, so it's not happening at the same time of day or anything.  What are your mount options for the WAL disk?


Re: Disk latency goes up during certain periods

From
German Becker
Date:
Luis, the disk only has the WAL (pg_xlog) directory. Brett, here are the mount options:

/dev/sdb1 on /storage/sdb1 type ext3 (rw,noatime,data=writeback,errors=remount-ro)

BTW, the original filesystem was ext4; I am now trying ext3, with exactly the same results. There were no noticeable changes using different journal modes.

I also tried disabling the journal altogether, which dramatically reduced the disk usage, but the latency spikes were still there.
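
When comparing journal modes across several test runs, it can help to record the mount options programmatically. The sketch below splits a line of `mount` output, in the format quoted above, into its fields. `parse_mount_line` is a hypothetical helper written for illustration, and the regular expression assumes the classic "device on path type fs (options)" layout.

```python
import re

MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")

def parse_mount_line(line):
    """Split one line of `mount` output into device, mount point, fs type and options."""
    m = MOUNT_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized mount line: {line!r}")
    device, mount_point, fs_type, opts = m.groups()
    options = {}
    for opt in opts.split(","):
        key, _, value = opt.partition("=")
        options[key] = value or True   # flags such as noatime map to True
    return {"device": device, "mount_point": mount_point,
            "fs_type": fs_type, "options": options}

info = parse_mount_line(
    "/dev/sdb1 on /storage/sdb1 type ext3 (rw,noatime,data=writeback,errors=remount-ro)"
)
print(info["fs_type"], info["options"].get("data"))
```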


Re: Disk latency goes up during certain periods

From
German Becker
Date:
BTW, I have all this data plotted using Munin; let me know if you are interested in looking at the graphs and I will send them.


Re: Disk latency goes up during certain periods

From
bricklen
Date:

I haven't been following this thread and might have missed it, but did you show your checkpoint_completion_target?

Re: Disk latency goes up during certain periods

From
German Becker
Date:
Here are the values:

# - Checkpoints -

checkpoint_segments = 256               # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 5min              # range 30s-1h
checkpoint_completion_target = 0.7      # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s               # 0 disables
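
To put these values in perspective, the WAL volume they imply can be worked out directly. The sketch below assumes the 16 MB segment size from the config comment and the rule of thumb from the PostgreSQL 9.x documentation that pg_xlog normally holds no more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 files; the function names are made up for illustration.

```python
SEGMENT_MB = 16  # WAL segment size noted in the config comment above

def wal_between_checkpoints_mb(checkpoint_segments):
    """WAL volume that triggers a segment-based checkpoint, in MB."""
    return checkpoint_segments * SEGMENT_MB

def typical_max_wal_files(checkpoint_segments, completion_target):
    """Upper estimate of files in pg_xlog: (2 + target) * segments + 1."""
    return int((2 + completion_target) * checkpoint_segments + 1)

trigger_mb = wal_between_checkpoints_mb(256)
max_files = typical_max_wal_files(256, 0.7)
print(trigger_mb, "MB between checkpoints;", max_files, "files,",
      max_files * SEGMENT_MB, "MB in pg_xlog")
```

With the settings shown, that is 4096 MB of WAL before a segment-triggered checkpoint and roughly 11 GB of segment files on the WAL disk at most.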





Re: Disk latency goes up during certain periods

From
bricklen
Date:

On Mon, Jul 29, 2013 at 1:28 PM, German Becker <german.becker@gmail.com> wrote:
> checkpoint_segments = 256               # in logfile segments, min 1, 16MB each

I'm curious about checkpoint_segments. 256 seems pretty high -- did testing show that that helps?

> checkpoint_completion_target = 0.7      # checkpoint target duration, 0.0 - 1.0

0.7 could be bumped up to 0.9, but I doubt that that will make a very noticeable difference for this particular issue.
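
As a rough illustration of what the change from 0.7 to 0.9 means: the checkpoint's writes are spread over approximately the completion target times the interval between checkpoints. The sketch below assumes checkpoints are triggered by the default checkpoint_timeout of 5 minutes (the commented-out value in the posted config); with segment-triggered checkpoints the interval would differ.

```python
def write_window_seconds(checkpoint_interval_s, completion_target):
    """Approximate time over which a checkpoint's writes are spread."""
    return checkpoint_interval_s * completion_target

interval = 5 * 60  # default checkpoint_timeout of 5min, assuming timed checkpoints
for target in (0.7, 0.9):
    print(target, write_window_seconds(interval, target), "seconds")
```

Under that assumption the write window grows from about three and a half minutes to about four and a half, which smooths data-disk writes but does not by itself change how WAL is written.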

Re: Disk latency goes up during certain periods

From
German Becker
Date:
256 was set some time ago, when we were testing a different issue. I read that the only drawback is the amount of time required for recovery, which we tested: it was about 10 seconds for the 256 segments. Higher values also mean less disk usage.
Anyway, all these parameters should affect the throughput to the data disks, not the WAL. Am I right?


Re: Disk latency goes up during certain periods

From
bricklen
Date:
checkpoint_completion_target is to help with "checkpoint smoothing", to reduce the spike in disk I/O when shared_buffers are written out. Depesz has a good article about that:  http://www.depesz.com/2010/11/03/checkpoint_completion_target/

Do your graphs show any correlation between number of WAL segments getting recycled, and disk I/O spikes? Are you logging checkpoints? If so, you could use the checkpoint times to compare against your I/O graphs. I am by no means an expert here, I'm just throwing out ideas (which might already have been suggested).
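
If log_checkpoints is enabled, the checkpoint times can be pulled out of the server log and lined up against the I/O graphs. The sketch below is only an illustration: it assumes a log_line_prefix that starts each line with a timestamp and time zone (for example '%t '), and the sample lines are invented, not taken from the server discussed here.

```python
import re
from datetime import datetime

CHECKPOINT = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ LOG:\s+checkpoint (starting|complete): (.*)$"
)

def checkpoint_events(log_lines):
    """Yield (timestamp, phase, detail) for each checkpoint line in the log."""
    for line in log_lines:
        m = CHECKPOINT.match(line)
        if m:
            when = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            yield when, m.group(2), m.group(3)

# Invented sample lines in the assumed format.
sample = [
    "2013-07-30 12:00:01 UTC LOG:  checkpoint starting: xlog",
    "2013-07-30 12:00:05 UTC LOG:  autovacuum launcher started",
    "2013-07-30 12:03:31 UTC LOG:  checkpoint complete: wrote 1200 buffers (0.9%)",
]
events = list(checkpoint_events(sample))
duration = events[1][0] - events[0][0]
print(len(events), "checkpoint lines; duration", duration)
```

The detail after "checkpoint starting:" names the trigger (such as xlog or time), which helps tell segment-driven checkpoints from timed ones when comparing against the latency episodes.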


Re: Disk latency goes up during certain periods

From
German Becker
Date:
To all who might be interested, I have an update on this.
I ran some tests on the old production DB, which was Postgres 8.3 (with only one disk for everything), using pgreplay to run the same queries as on the 9.1 server.
Here is the output of iostat for the 8.3 server:


Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   230.00   22.00  106.00   308.00  2692.00    23.44     0.33    2.58   2.34  30.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   166.00    9.50   65.50   160.00  1708.00    24.91     0.29    3.07   2.47  18.50

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   236.50    7.50  118.50   120.00  2984.00    24.63     0.39    3.61   1.55  19.50

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   310.50    7.50  168.50   112.00  3832.00    22.41     0.44    2.50   0.94  16.50

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   321.50   22.50  184.00   320.00  4048.00    21.15     0.88    4.24   1.74  36.00

Device:         rrqm/s   wrqm/s     r/s     w/s   rsec/s   wsec/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00   266.00    4.50  155.00    64.00  3356.00    21.44     0.29    1.72   0.88  14.00


Here is the output for 9.1:


Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00     0.00    0.00   85.00     0.00   352.00     8.28     0.29    3.46    0.00    3.46   3.46  29.40

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00     0.50    0.00   97.00     0.00   450.00     9.28     0.39    4.04    0.00    4.04   3.79  36.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00     0.00    0.00   87.00     0.00   376.00     8.64     0.29    3.29    0.00    3.29   3.29  28.60

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00     0.50    0.00   92.00     0.00   386.00     8.39     0.32    3.43    0.00    3.43   3.28  30.20

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00     0.00    0.00   89.50     0.00   388.00     8.67     0.33    3.66    0.00    3.66   3.66  32.80

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sdb               0.00     0.00    0.00  104.50     0.00   432.00     8.27     0.38    3.62    0.00    3.62   3.62  37.80


(The columns are not the same, probably because of the different Ubuntu versions.)
What is notable is the following:

8.3 shows much less disk utilization, even though the same disk is also used for everything else, for example the plain log.
I think the main difference is in the average request size and w/s. In 9.1 the request sizes seem to be roughly half of those of 8.3, so the w/s are roughly double.
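
The comparison can be checked against the samples above by averaging the relevant columns. The sketch below parses the two iostat captures, with repeated header lines dropped and whitespace collapsed, and computes per-column means. `parse_iostat` and `mean` are hypothetical helpers, and six samples per server is a small basis for any conclusion.

```python
def parse_iostat(text):
    """Parse `iostat -x` blocks into a list of {column: value} dicts."""
    rows, header = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Device:":
            header = fields[1:]          # column names differ between versions
        elif header is not None:
            rows.append(dict(zip(header, map(float, fields[1:]))))
    return rows

def mean(rows, column):
    """Arithmetic mean of one column over all parsed samples."""
    return sum(r[column] for r in rows) / len(rows)

PG83 = """
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 230.00 22.00 106.00 308.00 2692.00 23.44 0.33 2.58 2.34 30.00
sda 0.00 166.00 9.50 65.50 160.00 1708.00 24.91 0.29 3.07 2.47 18.50
sda 0.00 236.50 7.50 118.50 120.00 2984.00 24.63 0.39 3.61 1.55 19.50
sda 0.00 310.50 7.50 168.50 112.00 3832.00 22.41 0.44 2.50 0.94 16.50
sda 0.00 321.50 22.50 184.00 320.00 4048.00 21.15 0.88 4.24 1.74 36.00
sda 0.00 266.00 4.50 155.00 64.00 3356.00 21.44 0.29 1.72 0.88 14.00
"""

PG91 = """
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 0.00 0.00 0.00 85.00 0.00 352.00 8.28 0.29 3.46 0.00 3.46 3.46 29.40
sdb 0.00 0.50 0.00 97.00 0.00 450.00 9.28 0.39 4.04 0.00 4.04 3.79 36.80
sdb 0.00 0.00 0.00 87.00 0.00 376.00 8.64 0.29 3.29 0.00 3.29 3.29 28.60
sdb 0.00 0.50 0.00 92.00 0.00 386.00 8.39 0.32 3.43 0.00 3.43 3.28 30.20
sdb 0.00 0.00 0.00 89.50 0.00 388.00 8.67 0.33 3.66 0.00 3.66 3.66 32.80
sdb 0.00 0.00 0.00 104.50 0.00 432.00 8.27 0.38 3.62 0.00 3.62 3.62 37.80
"""

rows83, rows91 = parse_iostat(PG83), parse_iostat(PG91)
for column in ("w/s", "avgrq-sz", "%util"):
    print(f"{column:9s} 8.3: {mean(rows83, column):7.2f}   9.1: {mean(rows91, column):7.2f}")
```

On these samples the mean avgrq-sz comes out near 23 sectors for 8.3 against about 8.6 for 9.1, while the mean w/s is about 133 for 8.3 against 92.5 for 9.1. The 8.3 disk also carries reads and data-file writes, so the two captures are not directly comparable and a longer capture would be needed to settle the w/s comparison.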

I think this might be because of the different wal_level setting: in 8.3 we were using archive (hot_standby was not available yet) and in 9.1 we are using hot_standby. Here is what the documentation says about this:

"It is thought that there is little measurable difference in performance between using hot_standby and archive levels, so feedback is welcome if any production impacts are noticeable."

Although, of course, this might just be the difference between Postgres releases.

Any thoughts or feedback is appreciated.
Cheers,

Germán Becker


Re: Disk latency goes up during certain periods

From
bricklen
Date:
On Wed, Jul 31, 2013 at 9:25 AM, German Becker <german.becker@gmail.com> wrote:
To all who might be interested, I have an update on this.
I ran some tests on the old production DB, which was Postgres 8.3 (with only one disk for everything), using pgreplay to run the same queries as on the 9.1 server.
Here is the output of iostat for the 8.3 server:
...
Here is the output for 9.1:
...

What kernel are you running? Could it be related to a recent discussion about the 3.2 kernel?
http://markmail.org/message/qosngswoy5lqmxlr
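
For checking a set of servers, the kernel series can be read from the release string. This is a small sketch: `kernel_major_minor` is a hypothetical helper, and '3.2.0-23-generic' is just an example of the release format, not output from the server discussed here.

```python
import platform

def kernel_major_minor(release):
    """Extract (major, minor) from a release string such as '3.2.0-23-generic'."""
    major, minor = release.split(".")[:2]
    return int(major), int(minor.split("-")[0])

# Example string only; on a live Linux host pass platform.release() instead.
is_3_2_series = kernel_major_minor("3.2.0-23-generic") == (3, 2)

if __name__ == "__main__":
    print("running kernel:", platform.release(), "| example is 3.2:", is_3_2_series)
```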

Re: Disk latency goes up during certain periods

From
German Becker
Date:
I'm using kernel 3.2, so this probably applies. Many thanks! I'll upgrade the kernel and let you know.

