Thread: Disk latency goes up during certain periods

Disk latency goes up during certain periods

From

German Becker

Date:

26 July 2013, 22:21:29

Hi list,

I am running Postgres 9.1 on Ubuntu 12.04. I have a dedicated disk for pg_xlog, using an ext4 filesystem with journaling in writeback mode.

During high-load periods, disk utilization is around 40%. The I/O write time is constant at about 3 ms. On certain occasions, roughly once every 15 days, the I/O write time goes up to about 10 ms. This pushes disk utilization to almost 100% (probably saturation), and INSERTs, DELETEs, and UPDATEs run considerably slower than normal. This lasts for about 2 hours, and then the latency goes back to 3 ms and everything is normal again.
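
As a rough sanity check on these numbers, the sketch below treats the WAL disk as a single-queue device where utilization is approximately request rate times per-request service time, and assumes the reported write time is close to the service time. Both are simplifying assumptions, not measurements from this server. Under them, a constant request rate that gives 40% utilization at 3 ms would need more than 100% of the device at 10 ms, which is consistent with the saturation described.

```python
# Back-of-the-envelope model: utilization ~= requests/s * service time.
# Assumption: single-queue device, write time ~= service time.

def utilization(requests_per_s, service_time_s):
    """Fraction of time the device is busy, capped at 1.0 (saturated)."""
    return min(1.0, requests_per_s * service_time_s)

baseline_util = 0.40      # ~40% utilization reported under high load
baseline_service_s = 0.003  # ~3 ms write time
spike_service_s = 0.010     # ~10 ms write time during an episode

# Request rate implied by the baseline figures (about 133 requests/s).
implied_rate = baseline_util / baseline_service_s

# Demand on the device if the same request rate arrives at 10 ms each.
offered_load = implied_rate * spike_service_s

print(f"implied request rate: {implied_rate:.0f} req/s")
print(f"offered load at 10 ms: {offered_load:.0%}")
print(f"observed utilization would be: {utilization(implied_rate, spike_service_s):.0%}")
```

If this model is roughly right, the write load does not need to increase at all for the disk to saturate; slower individual writes are enough.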

Has anyone seen this behavior? What could be causing the increase in latency?

Thanks!

Germán Becker

Re: Disk latency goes up during certain periods

From

Alvaro Herrera

Date:

26 July 2013, 23:27:21

German Becker wrote:
> Hi list,
>
> I am running Postgres 9.1 on Ubuntu 12.04. I have a dedicated disk for
> pg_xlog, using an ext4 filesystem with journaling in writeback mode.
> During high-load periods, disk utilization is around 40%. The I/O write
> time is constant at about 3 ms. On certain occasions, roughly once every
> 15 days, the I/O write time goes up to about 10 ms. This pushes disk
> utilization to almost 100% (probably saturation), and INSERTs, DELETEs,
> and UPDATEs run considerably slower than normal. This lasts for about
> 2 hours, and then the latency goes back to 3 ms and everything is normal
> again.
> Has anyone seen this behavior? What could be causing the increase in
> latency?

Can you correlate these episodes with autovacuum activity?  Or perhaps
backups are being taken (maybe a new base backup is taken every 15
days)?

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Re: Disk latency goes up during certain periods

From

German Becker

Date:

29 July 2013, 17:17:16

Alvaro,

Thanks for your reply. I believe the only possibility is autovacuum activity; I will check that. Anyway, what puzzles me is that the throughput does not increase as I would expect if there were heavy VACUUM activity; only the wait time, and thus the utilization, goes up. It is as if the disk/filesystem gets slower for an hour or so...

Re: Disk latency goes up during certain periods

From

Brett Stauner

Date:

29 July 2013, 19:15:54

"...like if the disk/filesystem gets slower during one hour or so"

What about a different scheduled task on the system, not necessarily Postgres related?

Re: Disk latency goes up during certain periods

From

German Becker

Date:

29 July 2013, 20:58:11

Brett,

Yes, I'm not implying it is Postgres-related; perhaps it is even normal behavior for ext3/ext4 filesystems. But this behavior is only noticeable when using Postgres, and in particular with the WAL files, since that workload is very disk-intensive, so maybe someone has seen this before. The server's only task is the database. It has 4 disks: one for the OS, one for the WAL, and the other 2 for data. All disks show different access times. The WAL disk is the only one with constant latency, and on certain occasions it goes up.

Plus, the occasions are absolutely random.

Re: Disk latency goes up during certain periods

From

Luis

Date:

29 July 2013, 21:01:53

Is your temp stats dir on the same disk?

Re: Disk latency goes up during certain periods

From

Brett Stauner

Date:

29 July 2013, 21:02:00

Okay, so it's not happening at the same time of day or anything. What are your mount options for the WAL disk?

Re: Disk latency goes up during certain periods

From

German Becker

Date:

29 July 2013, 22:42:04

Luis, the disk only has the WAL (pg_xlog) directory. Brett, here are the mount options:

/dev/sdb1 on /storage/sdb1 type ext3 (rw,noatime,data=writeback,errors=remount-ro)

BTW, the original fs was ext4; now I am trying ext3, with the exact same results. No noticeable changes using different journal modes.

I also tried disabling the journal altogether, which dramatically reduced the disk usage, but the latency spikes were still there.

Re: Disk latency goes up during certain periods

From

German Becker

Date:

29 July 2013, 22:44:46

BTW, I have all this data plotted using Munin. Let me know if you are interested in looking at the graphs and I will send them.

Re: Disk latency goes up during certain periods

From

bricklen

Date:

29 July 2013, 23:07:22

On Mon, Jul 29, 2013 at 12:41 PM, German Becker <german.becker@gmail.com> wrote:
> Luis, the disk only has the WAL (pg_xlog) directory. Brett, here are the mount options:
>
> /dev/sdb1 on /storage/sdb1 type ext3 (rw,noatime,data=writeback,errors=remount-ro)
>
> BTW, the original fs was ext4; now I am trying ext3, with the exact same results. No noticeable changes using different journal modes.
>
> I also tried disabling the journal altogether, which dramatically reduced the disk usage, but the latency spikes were still there.

I haven't been following this thread and might have missed it, but did you show your checkpoint_completion_target?

Re: Disk latency goes up during certain periods

From

German Becker

Date:

29 July 2013, 23:28:52

Here are the values:

# - Checkpoints -
checkpoint_segments = 256               # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 5min              # range 30s-1h
checkpoint_completion_target = 0.7      # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s               # 0 disables
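
To put these settings in perspective, here is a small worked calculation of how much WAL they imply. It assumes the default 16 MB segment size and uses the rule of thumb from the PostgreSQL 9.1 documentation that pg_xlog normally holds no more than (2 + checkpoint_completion_target) * checkpoint_segments + 1 segment files; wal_keep_segments and archiving are ignored here, so treat the result as an estimate only.

```python
# Rough WAL sizing for the checkpoint settings listed above.
# Assumptions: default 16 MB segments, no wal_keep_segments, no archiving backlog.

SEGMENT_MB = 16
checkpoint_segments = 256
checkpoint_completion_target = 0.7

# WAL volume that triggers a checkpoint when checkpoint_timeout does not fire first.
wal_between_checkpoints_mb = checkpoint_segments * SEGMENT_MB

# Rule-of-thumb ceiling on the number of segment files kept in pg_xlog.
max_segment_files = int((2 + checkpoint_completion_target) * checkpoint_segments + 1)
max_pg_xlog_mb = max_segment_files * SEGMENT_MB

print(f"WAL between xlog-triggered checkpoints: {wal_between_checkpoints_mb} MB")
print(f"normal upper bound on pg_xlog: {max_segment_files} files, "
      f"about {max_pg_xlog_mb / 1024:.1f} GB")
```

With checkpoint_timeout left at its 5-minute default, a checkpoint starts after 5 minutes or after 256 segments of WAL, whichever comes first.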

Re: Disk latency goes up during certain periods

From

bricklen

Date:

30 July 2013, 00:08:01

On Mon, Jul 29, 2013 at 1:28 PM, German Becker <german.becker@gmail.com> wrote:
> checkpoint_segments = 256 # in logfile segments, min 1, 16MB each

I'm curious about checkpoint_segments. 256 seems pretty high -- did testing show that that helps?

> checkpoint_completion_target = 0.7 # checkpoint target duration, 0.0 - 1.0

0.7 could be bumped up to 0.9, but I doubt that that will make a very noticeable difference for this particular issue.

Re: Disk latency goes up during certain periods

From

German Becker

Date:

30 July 2013, 18:35:07

256 was set some time ago, when we were testing a different issue. I read that the only drawback is the amount of time required for recovery, which we tested: it took about 10 seconds for the 256 segments. Higher values also mean less disk usage.

Anyway, all these parameters should affect the throughput to the data disks, not the WAL. Am I right?

Re: Disk latency goes up during certain periods

From

bricklen

Date:

30 July 2013, 19:02:37

On Tue, Jul 30, 2013 at 8:35 AM, German Becker <german.becker@gmail.com> wrote:
> 256 was set some time ago, when we were testing a different issue. I read that the only drawback is the amount of time required for recovery, which we tested: it took about 10 seconds for the 256 segments. Higher values also mean less disk usage.
> Anyway, all these parameters should affect the throughput to the data disks, not the WAL. Am I right?

checkpoint_completion_target is to help with "checkpoint smoothing", to reduce the spike in disk I/O when shared_buffers are written out. Depesz has a good article about that: http://www.depesz.com/2010/11/03/checkpoint_completion_target/

Do your graphs show any correlation between number of WAL segments getting recycled, and disk I/O spikes? Are you logging checkpoints? If so, you could use the checkpoint times to compare against your I/O graphs. I am by no means an expert here, I'm just throwing out ideas (which might already have been suggested).
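
One way to do that comparison is sketched below, for illustration only. It assumes log_checkpoints = on and a log_line_prefix of '%t ', so that each line starts with a timestamp and a time-zone abbreviation; other prefixes would need a different pattern. The log lines and the spike window are made-up sample values, not data from this thread, and the helper names are hypothetical.

```python
import re
from datetime import datetime

# Assumed line shape (log_line_prefix = '%t '):
#   2013-07-29 10:00:05 ART LOG:  checkpoint starting: xlog
CHECKPOINT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ LOG:\s+checkpoint starting"
)

def checkpoint_starts(log_lines):
    """Return the timestamps of all 'checkpoint starting' log lines."""
    starts = []
    for line in log_lines:
        match = CHECKPOINT_RE.match(line)
        if match:
            starts.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return starts

def checkpoints_in_window(starts, window_start, window_end):
    """Return the checkpoint start times that fall inside an I/O spike window."""
    return [t for t in starts if window_start <= t <= window_end]

# Made-up sample data, for illustration only.
sample_log = [
    "2013-07-29 10:00:05 ART LOG:  checkpoint starting: xlog",
    "2013-07-29 10:03:40 ART LOG:  some unrelated message",
    "2013-07-29 10:05:05 ART LOG:  checkpoint starting: time",
    "2013-07-29 13:00:00 ART LOG:  checkpoint starting: time",
]
spike_start = datetime(2013, 7, 29, 10, 0, 0)  # hypothetical spike window
spike_end = datetime(2013, 7, 29, 12, 0, 0)

hits = checkpoints_in_window(checkpoint_starts(sample_log), spike_start, spike_end)
print(f"{len(hits)} checkpoint(s) started during the spike window")
```

If checkpoints are no more frequent inside the spike windows than outside them, that would point away from checkpoint activity as the cause.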

Re: Disk latency goes up during certain periods

From

German Becker

Date:

31 July 2013, 19:25:48

To all who might be interested, I have an update on this.

I ran some tests on the old production DB, which was Postgres 8.3 (with only one disk for everything), using pgreplay to run the same queries as on the 9.1 server.

Here is the output of iostat for the 8.3 server: