Thread: WALWriteLocks

WALWriteLocks

From
Don Seiler
Date:
Good afternoon,

We're running a few PostgreSQL 12 instances in VMs in the MS Azure cloud. We have one instance in particular that will seemingly lock up for a few seconds (and on rare occasions a few minutes). When looking at the metrics, it looks like most of the active sessions all begin waiting on WALWriteLocks. Shortly after we'll see a flurry of COMMITs in the PG log. For example, we would observe 75-90 sessions all waiting on WALWriteLock for 30 seconds and then suddenly they would all write the COMMIT action to the PG log (showing the ~30 second duration). When this wait goes on for minutes, then it is definitely noticed by the app and users.

We have moved the WAL files to separate Azure UltraSSD storage and are working with Azure support to try to identify the bottleneck here.

My understanding is that the WALWriteLock is solely due to the session's commit waiting to write to the WAL file, so our focus has been on the storage under the WAL files. I'm curious if there are any other events inside PostgreSQL that would be behind the WALWriteLock, to make sure we aren't chasing a red herring.
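
For reference, this is roughly how we sample those waits from pg_stat_activity (just a sketch; in PG 12 the event is reported as wait_event 'WALWrite' with wait_event_type 'LWLock'):

-- sessions currently waiting on the WAL write lock
SELECT pid, state, wait_event_type, wait_event,
       now() - query_start AS query_age,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE wait_event_type = 'LWLock'
  AND wait_event = 'WALWrite';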

Thanks,
Don.

--
Don Seiler
www.seiler.us

Re: WALWriteLocks

From
Laurenz Albe
Date:
On Wed, 2021-04-28 at 13:53 -0500, Don Seiler wrote:
> We're running a few PostgreSQL 12 instances in VMs in the MS Azure cloud. We have one
>  instance in particular that will seemingly lock up for a few seconds (and on rare
>  occasions a few minutes). When looking at the metrics, it looks like most of the
>  active sessions all begin waiting on WALWriteLocks. Shortly after we'll see a flurry
>  of COMMITs in the PG log. For example, we would observe 75-90 sessions all waiting
>  on WALWriteLock for 30 seconds and then suddenly they would all write the COMMIT
>  action to the PG log (showing the ~30 second duration). When this wait goes on for
>  minutes, then it is definitely noticed by the app and users.
> 
> We have moved the WAL files to separate Azure UltraSSD storage and are working with
>  Azure support to try to identify the bottleneck here.
> 
> My understanding is that the WALWriteLock is solely due to the session's commit
>  waiting to write to the WAL file, so our focus has been on the storage under the
>  WAL files. I'm curious if there are any other events inside PostgreSQL that would
>  be behind the WALWriteLock, to make sure we aren't chasing a red herring.

My guess is that you have too many active client connections, and you are suffering
from contention between the many backends that all want to write WAL.

In that case, use a connection pool to limit the number of active connections.
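
As a quick sanity check (a sketch), compare the number of client backends that are actually active with your pool size:

-- count client backends by state; a large "active" count relative to the
-- number of CPUs points to contention rather than pure storage latency
SELECT state, count(*)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY count(*) DESC;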

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: WALWriteLocks

From
Don Seiler
Date:
On Thu, Apr 29, 2021 at 1:38 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
My guess is that you have too many active client connections, and you are suffering
from contention between the many backends that all want to write WAL.

In that case, use a connection pool to limit the number of active connections.

We do have pgbouncer in place already.

Thanks for the replies so far.

What I really want to know in this case is if there is some other PG operation that accounts for a WALWriteLock wait, or is it always an I/O (write) to the WAL file storage, and we can focus our investigation there? 

Don.

--
Don Seiler
www.seiler.us

Re: WALWriteLocks

From
Vijaykumar Jain
Date:
I guess details on I/O waits, IOPS saturation, and similar metrics would need to be ruled out before further discussion.

Do the dashboards for those metrics look OK?

We had VMs with unlimited IOPS and no synchronous replication, and I do not recall this kind of lock pile-up, at least up to 1 TB databases on SSDs through PG 11.
Googling does mention some things about tuning wal_buffers etc., but I believe ruling out resource exhaustion is important.

On Thu, 29 Apr 2021 at 9:42 PM Don Seiler <don@seiler.us> wrote:
On Thu, Apr 29, 2021 at 1:38 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
My guess is that you have too many active client connections, and you are suffering
from contention between the many backends that all want to write WAL.

In that case, use a connection pool to limit the number of active connections.

We do have pgbouncer in place already.

Thanks for the replies so far.

What I really want to know in this case is if there is some other PG operation that accounts for a WALWriteLock wait, or is it always an I/O (write) to the WAL file storage, and we can focus our investigation there? 

Don.

--
Don Seiler
www.seiler.us

Re: WALWriteLocks

From
Vijaykumar Jain
Date:
OK, sorry. I tried the below, but I could not simulate WALWriteLock waits.

On Ubuntu: create a slow loopback device, put the WAL directory on it, and run commits to check for WALWriteLock.

  # as root
  dd if=/dev/zero of=/var/tmp/postgres bs=1024k count=100
  losetup --show --find /var/tmp/postgres   # gives /dev/loop8
  echo "0 `blockdev --getsz /dev/loop8` delay /dev/loop8 0 1000" | dmsetup create dm-slow   # 1000 ms delay
  ls -l /dev/mapper/dm-slow
  mkfs.ext4 /dev/mapper/dm-slow   # format
  mkdir -p /mnt/slow
  mount /dev/mapper/dm-slow /mnt/slow
  ls -l /mnt/slow
  mkdir /mnt/slow/postgres
  chown -R postgres /mnt/slow/postgres
  # then as postgres
  su - postgres
  initdb -D data -X /mnt/slow/postgres/data   # WAL directory on the slow device

This puts the WAL directory on the slow device at /mnt/slow/postgres.

Start PG:
pg_ctl -D data -l /tmp/logfile start

Monitor disk I/O to verify that WAL commits are slow due to I/O wait:
iostat -x 1 -p dm-0 -p loop8

create table foo(id int);

 for i in {1..10}; do  psql -c "begin transaction; insert into foo select 1 from generate_series(1, 1000); commit;" & done

The inserts themselves are fast, but the COMMIT has to write and flush WAL, so it should be slow.

Verified with iostat:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00  100.00    0.00    0.00

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
dm-0             0.00      0.00     0.00   0.00    0.00     0.00    1.00      0.00     0.00   0.00 2028.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    2.03 100.40
loop8            0.00      0.00     0.00   0.00    0.00     0.00    2.00     12.00     0.00   0.00    5.50     6.00    0.00      0.00     0.00   0.00    0.00     0.00    0.02   1.60

But I could not see any WALWriteLocks in pg_locks during that window.



For reference, the relevant comments from the PostgreSQL source (src/backend/access/transam/xlog.c):

 * To read XLogCtl->LogwrtResult, you must hold either info_lck or
 * WALWriteLock.  To update it, you need to hold both locks.  The point of
 * this arrangement is that the value can be examined by code that already
 * holds WALWriteLock without needing to grab info_lck as well.  In addition
 * to the shared variable, each backend has a private copy of LogwrtResult,
 * which is updated when convenient.
 *
 * The request bookkeeping is simpler: there is a shared XLogCtl->LogwrtRqst
 * (protected by info_lck), but we don't need to cache any copies of it.
 *
 * info_lck is only held long enough to read/update the protected variables,
 * so it's a plain spinlock.  The other locks are held longer (potentially
 * over I/O operations), so we use LWLocks for them.  These locks are:
 *
 * WALBufMappingLock: must be held to replace a page in the WAL buffer cache.
 * It is only held while initializing and changing the mapping.  If the
 * contents of the buffer being replaced haven't been written yet, the mapping
 * lock is released while the write is done, and reacquired afterwards.
 *
 * WALWriteLock: must be held to write WAL buffers to disk (XLogWrite or
 * XLogFlush).

/*
 * Wait for any WAL insertions < upto to finish.
 *
 * Returns the location of the oldest insertion that is still in-progress.
 * Any WAL prior to that point has been fully copied into WAL buffers, and
 * can be flushed out to disk. Because this waits for any insertions older
 * than 'upto' to finish, the return value is always >= 'upto'.
 *
 * Note: When you are about to write out WAL, you must call this function
 * *before* acquiring WALWriteLock, to avoid deadlocks. This function might
 * need to wait for an insertion to finish (or at least advance to next
 * uninitialized page), and the inserter might need to evict an old WAL buffer
 * to make room for a new one, which in turn requires WALWriteLock.
 */


So I may be wrong in what I suggested.
Sorry, I guess the experts will have to weigh in.

Apologies for the diversion.

Thanks,
Vijay







On Thu, 29 Apr 2021 at 22:35, Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
I guess details on I/O waits, IOPS saturation, and similar metrics would need to be ruled out before further discussion.

Do the dashboards for those metrics look OK?

We had VMs with unlimited IOPS and no synchronous replication, and I do not recall this kind of lock pile-up, at least up to 1 TB databases on SSDs through PG 11.
Googling does mention some things about tuning wal_buffers etc., but I believe ruling out resource exhaustion is important.

On Thu, 29 Apr 2021 at 9:42 PM Don Seiler <don@seiler.us> wrote:
On Thu, Apr 29, 2021 at 1:38 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
My guess is that you have too many active client connections, and you are suffering
from contention between the many backends that all want to write WAL.

In that case, use a connection pool to limit the number of active connections.

We do have pgbouncer in place already.

Thanks for the replies so far.

What I really want to know in this case is if there is some other PG operation that accounts for a WALWriteLock wait, or is it always an I/O (write) to the WAL file storage, and we can focus our investigation there? 

Don.

--
Don Seiler
www.seiler.us

Re: WALWriteLocks

From
Laurenz Albe
Date:
On Thu, 2021-04-29 at 11:11 -0500, Don Seiler wrote:
> On Thu, Apr 29, 2021 at 1:38 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > My guess is that you have too many active client connections, and you are suffering
> > from contention between the many backends that all want to write WAL.
> > 
> > In that case, use a connection pool to limit the number of active connections.
> 
> We do have pgbouncer in place already.

What is the maximum pool size?
How many connections in state "active" do you have?

> What I really want to know in this case is if there is some other PG operation that
>  accounts for a WALWriteLock wait, or is it always an I/O (write) to the WAL file
>  storage, and we can focus our investigation there? 

WALWriteLock: must be held to write WAL buffers to disk (XLogWrite or XLogFlush).

It might be I/O overload, but it could also be internal contention for the lock.
Check the "%iowait" column in "sar -u 1".

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: WALWriteLocks

From
Laurenz Albe
Date:
On Fri, 2021-04-30 at 00:13 +0530, Vijaykumar Jain wrote:
> I tried the below, but i could not simulate WALWriteLock waits.
> 
> create table foo(id int)
> 
>  for i in {1..10}; do  psql -c "begin transaction; insert into foo select 1 from generate_series(1, 1000); commit;" & done

Perhaps you should do simple inserts of one row and run that script
with pgbench and lots of clients.
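
For example (a sketch; the file name is made up), put a single-row insert in a custom script and run it with something like "pgbench -n -f insert_one.sql -c 50 -j 4 -T 60":

-- insert_one.sql: one tiny transaction per execution, so almost all of the
-- time goes into the commit-time WAL flush
INSERT INTO foo VALUES (1);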

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: WALWriteLocks

From
Vijaykumar Jain
Date:
OK, I have a small laptop with an even smaller VM :)

pgbench -i -s 3
pgbench -c 20 -j 5 -T 900 -M prepared

select wait_event_type, wait_event from pg_stat_activity;

 wait_event_type |     wait_event
-----------------+---------------------
 Activity        | LogicalLauncherMain
 Activity        | AutoVacuumMain
 LWLock          | WALWrite
 Lock            | transactionid
 Lock            | tuple
 LWLock          | WALWrite
 Lock            | transactionid
 Lock            | transactionid
 Lock            | tuple
 Lock            | transactionid
 Lock            | transactionid
 Lock            | transactionid
 Lock            | tuple
 IO              | WALSync
 Lock            | transactionid
                 |
 Activity        | BgWriterHibernate
 Activity        | CheckpointerMain
 LWLock          | WALWrite


I then destroyed the slow-WAL cluster:
rm -rf data

and built a new one without the extra latency on the WAL directory:

initdb -D data
pg_ctl -D data start

pgbench -i -s 3
pgbench -c 20 -j 5 -T 900 -M prepared

I see similar output as above, but no WALWriteLock.

old thread, just for reference. do not read into anything :) 



Thanks,
Vijay





On Fri, 30 Apr 2021 at 00:22, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2021-04-30 at 00:13 +0530, Vijaykumar Jain wrote:
> I tried the below, but i could not simulate WALWriteLock waits.
>
> create table foo(id int)
>
>  for i in {1..10}; do  psql -c "begin transaction; insert into foo select 1 from generate_series(1, 1000); commit;" & done

Perhaps you should do simple inserts of one row and run that script
with pgbench and lots of clients.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

Re: WALWriteLocks

From
Vijaykumar Jain
Date:
OK, my bad, it seems it was renamed:

WALWriteLock (in 9.6) is now reported as WALWrite :)

So yes, I do see WALWrite LWLocks.

Thanks,
Vijay

On Fri, 30 Apr 2021 at 01:30, Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
OK, I have a small laptop with an even smaller VM :)

pgbench -i -s 3
pgbench -c 20 -j 5 -T 900 -M prepared

select wait_event_type, wait_event from pg_stat_activity;

 wait_event_type |     wait_event
-----------------+---------------------
 Activity        | LogicalLauncherMain
 Activity        | AutoVacuumMain
 LWLock          | WALWrite
 Lock            | transactionid
 Lock            | tuple
 LWLock          | WALWrite
 Lock            | transactionid
 Lock            | transactionid
 Lock            | tuple
 Lock            | transactionid
 Lock            | transactionid
 Lock            | transactionid
 Lock            | tuple
 IO              | WALSync
 Lock            | transactionid
                 |
 Activity        | BgWriterHibernate
 Activity        | CheckpointerMain
 LWLock          | WALWrite


I then destroyed the slow-WAL cluster:
rm -rf data

and built a new one without the extra latency on the WAL directory:

initdb -D data
pg_ctl -D data start

pgbench -i -s 3
pgbench -c 20 -j 5 -T 900 -M prepared

I see similar output as above, but no WALWriteLock.

old thread, just for reference. do not read into anything :) 



Thanks,
Vijay





On Fri, 30 Apr 2021 at 00:22, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2021-04-30 at 00:13 +0530, Vijaykumar Jain wrote:
> I tried the below, but i could not simulate WALWriteLock waits.
>
> create table foo(id int)
>
>  for i in {1..10}; do  psql -c "begin transaction; insert into foo select 1 from generate_series(1, 1000); commit;" & done

Perhaps you should do simple inserts of one row and run that script
with pgbench and lots of clients.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

RE: [EXTERNAL] Re: WALWriteLocks

From
"Godfrin, Philippe E"
Date:

Have you thought about commit_delay and siblings?
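
For reference, a sketch of the knobs meant here (values are purely illustrative, not recommendations):

-- commit_delay makes a committing backend pause briefly before flushing WAL,
-- provided at least commit_siblings other transactions are active, so that
-- several commits can share one flush
ALTER SYSTEM SET commit_delay = 1000;    -- microseconds, default 0 (disabled)
ALTER SYSTEM SET commit_siblings = 5;    -- default 5
SELECT pg_reload_conf();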

pg

 

From: Don Seiler <don@seiler.us>
Sent: Thursday, April 29, 2021 11:12 AM
To: laurenz.albe@cybertec.at
Cc: pgsql-admin@postgresql.org
Subject: [EXTERNAL] Re: WALWriteLocks

 


 


On Thu, Apr 29, 2021 at 1:38 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

My guess is that you have too many active client connections, and you are suffering
from contention between the many backends that all want to write WAL.

In that case, use a connection pool to limit the number of active connections.

 

We do have pgbouncer in place already.

 

Thanks for the replies so far.

 

What I really want to know in this case is if there is some other PG operation that accounts for a WALWriteLock wait, or is it always an I/O (write) to the WAL file storage, and we can focus our investigation there? 

 

Don.

 

--

Don Seiler
www.seiler.us

Re: WALWriteLocks

From
Don Seiler
Date:
I appreciate everyone's feedback and help but we seem to be dancing around the actual question.

What I want to know is whether there are other events that would contribute to a WALWriteLock wait (e.g., writing to a data file or to the postgresql.log file), or is it 100% writing to the WAL file? That way, when we see a sustained spike in WALWriteLock wait events, we know where to focus our investigation. The effort into the WAL writes themselves is a separate issue; first I want to know if there is any other process/place that we also need to account for.

Don.

On Thu, Apr 29, 2021 at 3:04 PM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
OK, my bad, it seems it was renamed:

WALWriteLock (in 9.6) is now reported as WALWrite :)

So yes, I do see WALWrite LWLocks.

Thanks,
Vijay

On Fri, 30 Apr 2021 at 01:30, Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
OK, I have a small laptop with an even smaller VM :)

pgbench -i -s 3
pgbench -c 20 -j 5 -T 900 -M prepared

select wait_event_type, wait_event from pg_stat_activity;

 wait_event_type |     wait_event
-----------------+---------------------
 Activity        | LogicalLauncherMain
 Activity        | AutoVacuumMain
 LWLock          | WALWrite
 Lock            | transactionid
 Lock            | tuple
 LWLock          | WALWrite
 Lock            | transactionid
 Lock            | transactionid
 Lock            | tuple
 Lock            | transactionid
 Lock            | transactionid
 Lock            | transactionid
 Lock            | tuple
 IO              | WALSync
 Lock            | transactionid
                 |
 Activity        | BgWriterHibernate
 Activity        | CheckpointerMain
 LWLock          | WALWrite


I then destroyed the slow-WAL cluster:
rm -rf data

and built a new one without the extra latency on the WAL directory:

initdb -D data
pg_ctl -D data start

pgbench -i -s 3
pgbench -c 20 -j 5 -T 900 -M prepared

I see similar output as above, but no WALWriteLock.

old thread, just for reference. do not read into anything :) 



Thanks,
Vijay





On Fri, 30 Apr 2021 at 00:22, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2021-04-30 at 00:13 +0530, Vijaykumar Jain wrote:
> I tried the below, but i could not simulate WALWriteLock waits.
>
> create table foo(id int)
>
>  for i in {1..10}; do  psql -c "begin transaction; insert into foo select 1 from generate_series(1, 1000); commit;" & done

Perhaps you should do simple inserts of one row and run that script
with pgbench and lots of clients.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com



--
Don Seiler
www.seiler.us

Re: WALWriteLocks

From
Laurenz Albe
Date:
On Wed, 2021-05-05 at 11:25 -0500, Don Seiler wrote:
> What I want to know is if there are other events that would contribute
>  to a WALWriteLock (eg writing to a data file or the postgresql.log file)
>  or is it 100% writing to the WAL file.

You don't contribute to a lock...

The documentation says
(https://www.postgresql.org/docs/13/monitoring-stats.html#WAIT-EVENT-LOCK-TABLE):

> WALWrite     Waiting for WAL buffers to be written to disk.

So it is contention while writing or flushing WAL.
You can read the code for the details.
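
A cheap way to see which side of that you are on is to sample pg_stat_activity and compare the LWLock waiters with what the lock holder is doing (a sketch):

-- waiters show up as LWLock/WALWrite; the backend doing the work typically
-- shows IO/WALWrite (write) or IO/WALSync (fsync)
SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE wait_event IN ('WALWrite', 'WALSync')
GROUP BY wait_event_type, wait_event
ORDER BY count(*) DESC;
-- e.g. repeat it every second with  \watch 1  in psql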

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: WALWriteLocks

From
Vijaykumar Jain
Date:
Hey Don,

Was this resolved?
Were there any changes after moving WAL to SSD?

On Wed, May 5, 2021, 10:14 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Wed, 2021-05-05 at 11:25 -0500, Don Seiler wrote:
> What I want to know is if there are other events that would contribute
>  to a WALWriteLock (eg writing to a data file or the postgresql.log file)
>  or is it 100% writing to the WAL file.

You don't contribute to a lock...

The documentation says
(https://www.postgresql.org/docs/13/monitoring-stats.html#WAIT-EVENT-LOCK-TABLE):

> WALWrite      Waiting for WAL buffers to be written to disk.

So it is contention while writing or flushing WAL.
You can read the code for the details.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

Re: WALWriteLocks

From
Don Seiler
Date:
Not resolved. The WAL was already on SSD; we further moved it to a separate disk (Azure Ultra SSD), and that seemed to lessen the severity and frequency, but we still see occasional spikes over the course of any given day.

What is actually interesting, though, is that now that I've had iostat running, when we see the WALWriteLock spikes we see w_await and aqu-sz spike on the data directory disk, not the WAL disk. See below for an example (first two readings for baseline, then the spike). sdc is the PG data directory disk (Azure Premium SSD), and dm-0 is an LVM LUN wrapped around it; sdd is the WAL disk (Azure Ultra SSD). The data disk is set to ReadOnly host caching per Azure's recommendations; the WAL disk is set to None (per the recommendations for a log disk, plus Ultra SSD doesn't allow any host caching anyway).

05/12/2021 01:45:47 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          17.77    0.00    3.61    1.67    0.00   76.95

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
loop0            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sda              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb              0.00    3.20      0.00      0.07     0.00    14.60   0.00  82.02    0.00    1.69   0.00     0.00    22.25   3.25   1.04
sdc           1180.00  182.80     11.39      1.41     0.00     0.00   0.00   0.00    0.71    5.22   0.77     9.88     7.93   0.67  91.20
sdd              0.00 1290.00      0.00     19.82     0.00   250.00   0.00  16.23    0.00    0.46   0.01     0.00    15.74   0.76  98.56
dm-0          1180.40  182.80     11.39      1.41     0.00     0.00   0.00   0.00    0.74    5.30   1.84     9.88     7.93   0.67  91.28

05/12/2021 01:45:52 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18.71    0.00    4.02    1.54    0.00   75.74

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
loop0            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sda              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb              0.00    3.40      0.00      0.07     0.00    15.00   0.00  81.52    0.00    1.88   0.00     0.00    21.65   3.53   1.20
sdc           1035.20  118.60     10.55      1.05     0.00    26.80   0.00  18.43    0.67    5.27   0.50    10.44     9.07   0.81  93.44
sdd              0.00 1298.00      0.00     21.64     0.00   264.40   0.00  16.92    0.00    0.49   0.01     0.00    17.08   0.76  98.96
dm-0          1035.20  145.40     10.55      1.05     0.00     0.00   0.00   0.00    0.69    5.49   1.52    10.44     7.39   0.79  93.44

05/12/2021 01:45:57 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          17.85    0.00    4.34   10.93    0.00   66.88

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
loop0            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sda              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb              0.00    3.80      0.00      0.07     0.00    14.60   0.00  79.35    0.00    2.26   0.00     0.00    20.00   3.58   1.36
sdc            983.20 2831.80      8.98     36.91     0.00     0.00   0.00   0.00    6.61   38.56 108.97     9.35    13.35   0.25  94.80
sdd              0.00 1246.20      0.00     21.24     0.00   249.00   0.00  16.65    0.00    0.49   0.03     0.00    17.45   0.79  98.48
dm-0           982.80 2831.60      8.98     36.91     0.00     0.00   0.00   0.00    6.58   38.66 115.93     9.35    13.35   0.25  94.88

05/12/2021 01:46:02 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          19.40    0.00    4.48    2.53    0.00   73.59

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
loop0            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sda              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb              2.20    4.00      0.04      0.09     0.00    20.00   0.00  83.33    0.36    2.20   0.00    17.45    24.00   2.32   1.44
sdc            915.00 2592.60      8.31     23.10     0.00    44.20   0.00   1.68    1.20   16.35  37.69     9.30     9.13   0.27  94.64
sdd              0.00 1234.20      0.00     24.99     0.00   253.60   0.00  17.05    0.00    0.54   0.02     0.00    20.74   0.80  99.20
dm-0           915.60 2636.80      8.31     23.10     0.00     0.00   0.00   0.00    1.20   16.19  43.80     9.30     8.97   0.27  94.64
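
One thing we can check on the PostgreSQL side is whether those data-disk write bursts come from the checkpointer or directly from backends (a sketch against PG 12's pg_stat_bgwriter; the counters are cumulative, so sample before and after a spike and diff them):

-- buffers_checkpoint: written by the checkpointer
-- buffers_backend / buffers_backend_fsync: written or fsync'ed by client backends
SELECT buffers_checkpoint, buffers_clean,
       buffers_backend, buffers_backend_fsync,
       stats_reset
FROM pg_stat_bgwriter;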


On Mon, May 24, 2021 at 9:13 AM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
Hey Don,

Was this resolved?
Were there any changes after moving WAL to SSD?

On Wed, May 5, 2021, 10:14 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Wed, 2021-05-05 at 11:25 -0500, Don Seiler wrote:
> What I want to know is if there are other events that would contribute
>  to a WALWriteLock (eg writing to a data file or the postgresql.log file)
>  or is it 100% writing to the WAL file.

You don't contribute to a lock...

The documentation says
(https://www.postgresql.org/docs/13/monitoring-stats.html#WAIT-EVENT-LOCK-TABLE):

> WALWrite      Waiting for WAL buffers to be written to disk.

So it is contention while writing or flushing WAL.
You can read the code for the details.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com



--
Don Seiler
www.seiler.us

Re: WALWriteLocks

From
Vijaykumar Jain
Date:
Thanks.

For my ref: 

Sorry, I just want to use all the info; I have less experience :)
Do you generate a large volume of WAL?
If your checkpoint_completion_target is aggressive, the checkpoint writes will cause a lot of I/O at once.
If the checkpoint is spread over a larger time window, it may reduce the periodic I/O pressure.
But to see whether checkpoints are hogging the IOPS, log_checkpoints needs to be turned on.


I did use the above once to spread checkpoints and reduce I/O stress, but it meant that recovery after a crash would take longer.
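
A sketch of the settings meant above (values only illustrative):

-- spread each checkpoint over ~90% of the checkpoint interval and log every
-- checkpoint, including whether it was triggered by time or by WAL volume
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET log_checkpoints = on;
SELECT pg_reload_conf();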

But OK, I guess I am digressing. I'll see if I can simulate the exact messages.





On Mon, 24 May 2021 at 20:35, Don Seiler <don@seiler.us> wrote:
Not resolved. The WAL was already on SSD; we further moved it to a separate disk (Azure Ultra SSD), and that seemed to lessen the severity and frequency, but we still see occasional spikes over the course of any given day.

What is actually interesting, though, is that now that I've had iostat running, when we see the WALWriteLock spikes we see w_await and aqu-sz spike on the data directory disk, not the WAL disk. See below for an example (first two readings for baseline, then the spike). sdc is the PG data directory disk (Azure Premium SSD), and dm-0 is an LVM LUN wrapped around it; sdd is the WAL disk (Azure Ultra SSD). The data disk is set to ReadOnly host caching per Azure's recommendations; the WAL disk is set to None (per the recommendations for a log disk, plus Ultra SSD doesn't allow any host caching anyway).

05/12/2021 01:45:47 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          17.77    0.00    3.61    1.67    0.00   76.95

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
loop0            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sda              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb              0.00    3.20      0.00      0.07     0.00    14.60   0.00  82.02    0.00    1.69   0.00     0.00    22.25   3.25   1.04
sdc           1180.00  182.80     11.39      1.41     0.00     0.00   0.00   0.00    0.71    5.22   0.77     9.88     7.93   0.67  91.20
sdd              0.00 1290.00      0.00     19.82     0.00   250.00   0.00  16.23    0.00    0.46   0.01     0.00    15.74   0.76  98.56
dm-0          1180.40  182.80     11.39      1.41     0.00     0.00   0.00   0.00    0.74    5.30   1.84     9.88     7.93   0.67  91.28

05/12/2021 01:45:52 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18.71    0.00    4.02    1.54    0.00   75.74

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
loop0            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sda              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb              0.00    3.40      0.00      0.07     0.00    15.00   0.00  81.52    0.00    1.88   0.00     0.00    21.65   3.53   1.20
sdc           1035.20  118.60     10.55      1.05     0.00    26.80   0.00  18.43    0.67    5.27   0.50    10.44     9.07   0.81  93.44
sdd              0.00 1298.00      0.00     21.64     0.00   264.40   0.00  16.92    0.00    0.49   0.01     0.00    17.08   0.76  98.96
dm-0          1035.20  145.40     10.55      1.05     0.00     0.00   0.00   0.00    0.69    5.49   1.52    10.44     7.39   0.79  93.44

05/12/2021 01:45:57 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          17.85    0.00    4.34   10.93    0.00   66.88

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
loop0            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sda              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb              0.00    3.80      0.00      0.07     0.00    14.60   0.00  79.35    0.00    2.26   0.00     0.00    20.00   3.58   1.36
sdc            983.20 2831.80      8.98     36.91     0.00     0.00   0.00   0.00    6.61   38.56 108.97     9.35    13.35   0.25  94.80
sdd              0.00 1246.20      0.00     21.24     0.00   249.00   0.00  16.65    0.00    0.49   0.03     0.00    17.45   0.79  98.48
dm-0           982.80 2831.60      8.98     36.91     0.00     0.00   0.00   0.00    6.58   38.66 115.93     9.35    13.35   0.25  94.88

05/12/2021 01:46:02 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          19.40    0.00    4.48    2.53    0.00   73.59

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
loop0            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sda              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb              2.20    4.00      0.04      0.09     0.00    20.00   0.00  83.33    0.36    2.20   0.00    17.45    24.00   2.32   1.44
sdc            915.00 2592.60      8.31     23.10     0.00    44.20   0.00   1.68    1.20   16.35  37.69     9.30     9.13   0.27  94.64
sdd              0.00 1234.20      0.00     24.99     0.00   253.60   0.00  17.05    0.00    0.54   0.02     0.00    20.74   0.80  99.20
dm-0           915.60 2636.80      8.31     23.10     0.00     0.00   0.00   0.00    1.20   16.19  43.80     9.30     8.97   0.27  94.64


On Mon, May 24, 2021 at 9:13 AM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
Hey Don,

Was this resolved?
Were there any changes after moving WAL to SSD?

On Wed, May 5, 2021, 10:14 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Wed, 2021-05-05 at 11:25 -0500, Don Seiler wrote:
> What I want to know is if there are other events that would contribute
>  to a WALWriteLock (eg writing to a data file or the postgresql.log file)
>  or is it 100% writing to the WAL file.

You don't contribute to a lock...

The documentation says
(https://www.postgresql.org/docs/13/monitoring-stats.html#WAIT-EVENT-LOCK-TABLE):

> WALWrite      Waiting for WAL buffers to be written to disk.

So it is contention while writing or flushing WAL.
You can read the code for the details.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com



--
Don Seiler
www.seiler.us


--
Thanks,
Vijay
Mumbai, India

Re: WALWriteLocks

From
Don Seiler
Date:
On Mon, May 24, 2021 at 10:38 AM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:

We've definitely been over these docs, with and without their support team. We are well below the caps on both the disks and the VM size in use.
 
Sorry, I just want to use all the info; I have less experience :)
Do you generate a large volume of WAL?

Our WALs are standard 16MB in size.
 
If your checkpoint_completion_target is aggressive, the checkpoint writes will cause a lot of I/O at once.
If the checkpoint is spread over a larger time window, it may reduce the periodic I/O pressure.
But to see whether checkpoints are hogging the IOPS, log_checkpoints needs to be turned on.

checkpoint_timeout is 5min and max_wal_size is 5GB. The vast majority of checkpoints are time-initiated, not WAL-initiated; we've tuned this in the past. Over the past 16 hours there have been only 5 checkpoints started due to WAL pressure, and there was no WAL-initiated checkpoint corresponding to the most recent WALWriteLock spike either.
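
(For the record, this is roughly the split we look at; a sketch against PG 12's pg_stat_bgwriter:)

-- checkpoints_timed: started by checkpoint_timeout
-- checkpoints_req:   requested early, usually because max_wal_size was reached
SELECT checkpoints_timed, checkpoints_req, stats_reset
FROM pg_stat_bgwriter;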

Don.

--
Don Seiler
www.seiler.us

Re: WALWriteLocks

From
Vijaykumar Jain
Date:
OK, one final question.


Do you have synchronous_standby_names set, with at least one standby listed?
A slow network or disk on a standby can result in WALWrite contention on the primary, even when the primary itself has no issue.
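
A quick way to check both (a sketch):

-- empty synchronous_standby_names means commits never wait for a standby
SHOW synchronous_standby_names;

-- if it is set, look for lagging synchronous standbys
SELECT application_name, state, sync_state,
       write_lag, flush_lag, replay_lag
FROM pg_stat_replication;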


 



 


On Mon, 24 May 2021 at 21:21, Don Seiler <don@seiler.us> wrote:
On Mon, May 24, 2021 at 10:38 AM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:

We've definitely been over these docs, with and without their support team. We are well below the caps on both the disks and the VM size in use.
 
Sorry, I just want to use all the info; I have less experience :)
Do you generate a large volume of WAL?

Our WALs are standard 16MB in size.
 
If your checkpoint_completion_target is aggressive, the checkpoint writes will cause a lot of I/O at once.
If the checkpoint is spread over a larger time window, it may reduce the periodic I/O pressure.
But to see whether checkpoints are hogging the IOPS, log_checkpoints needs to be turned on.

checkpoint_timeout is 5min and max_wal_size is 5GB. The vast majority of checkpoints are time-initiated, not WAL-initiated; we've tuned this in the past. Over the past 16 hours there have been only 5 checkpoints started due to WAL pressure, and there was no WAL-initiated checkpoint corresponding to the most recent WALWriteLock spike either.

Don.

--
Don Seiler
www.seiler.us


--
Thanks,
Vijay
Mumbai, India

Re: WALWriteLocks

From
Don Seiler
Date:
On Mon, May 24, 2021 at 11:02 AM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:

Do you have synchronous_standby_names set, with at least one standby listed?
A slow network or disk on a standby can result in WALWrite contention on the primary, even when the primary itself has no issue.

We don't use synchronous standby commit. That parameter value is empty.

--
Don Seiler
www.seiler.us