Thread: WALWriteLocks

WALWriteLocks

From
Don Seiler
Date:
Good afternoon,

We're running a few PostgreSQL 12 instances in VMs in the MS Azure cloud. We have one instance in particular that will seemingly lock up for a few seconds (and on rare occasions a few minutes). When looking at the metrics, it looks like most of the active sessions all begin waiting on WALWriteLocks. Shortly after we'll see a flurry of COMMITs in the PG log. For example, we would observe 75-90 sessions all waiting on WALWriteLock for 30 seconds and then suddenly they would all write the COMMIT action to the PG log (showing the ~30 second duration). When this wait goes on for minutes, then it is definitely noticed by the app and users.

We have moved the WAL files to separate Azure UltraSSD storage and are working with Azure support to try to identify the bottleneck here.

My understanding is that the WALWriteLock is solely due to the session's commit waiting to write to the WAL file, so our focus has been on the storage under the WAL files. I'm curious if there are any other events inside PostgreSQL that would be behind the WALWriteLock, to make sure we aren't chasing a red herring.
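
For reference, this is roughly how we sample those waits from pg_stat_activity (just a sketch; in PG 12 the event is reported as wait_event 'WALWrite' with wait_event_type 'LWLock'):

-- sessions currently waiting on the WAL write lock
SELECT pid, state, wait_event_type, wait_event,
       now() - query_start AS query_age,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE wait_event_type = 'LWLock'
  AND wait_event = 'WALWrite';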

Thanks,
Don.

--
Don Seiler
www.seiler.us

Re: WALWriteLocks

From
Laurenz Albe
Date:
On Wed, 2021-04-28 at 13:53 -0500, Don Seiler wrote:
> We're running a few PostgreSQL 12 instances in VMs in the MS Azure cloud. We have one
>  instance in particular that will seemingly lock up for a few seconds (and on rare
>  occasions a few minutes). When looking at the metrics, it looks like most of the
>  active sessions all begin waiting on WALWriteLocks. Shortly after we'll see a flurry
>  of COMMITs in the PG log. For example, we would observe 75-90 sessions all waiting
>  on WALWriteLock for 30 seconds and then suddenly they would all write the COMMIT
>  action to the PG log (showing the ~30 second duration). When this wait goes on for
>  minutes, then it is definitely noticed by the app and users.
> 
> We have moved the WAL files to separate Azure UltraSSD storage and are working with
>  Azure support to try to identify the bottleneck here.
> 
> My understanding is that the WALWriteLock is solely due to the session's commit
>  waiting to write to the WAL file, so our focus has been on the storage under the
>  WAL files. I'm curious if there are any other events inside PostgreSQL that would
>  be behind the WALWriteLock, to make sure we aren't chasing a red herring.

My guess is that you have too many active client connections, and you are suffering
from contention between the many backends that all want to write WAL.

In that case, use a connection pool to limit the number of active connections.
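
As a quick sanity check (a sketch), compare the number of client backends that are actually active with your pool size:

-- count client backends by state; a large "active" count relative to the
-- number of CPUs points to contention rather than pure storage latency
SELECT state, count(*)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY count(*) DESC;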

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: WALWriteLocks

From
Don Seiler
Date:
On Thu, Apr 29, 2021 at 1:38 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
My guess is that you have too many active client connections, and you are suffering
from contention between the many backends that all want to write WAL.

In that case, use a connection pool to limit the number of active connections.

We do have pgbouncer in place already.

Thanks for the replies so far.

What I really want to know in this case is if there is some other PG operation that accounts for a WALWriteLock wait, or is it always an I/O (write) to the WAL file storage, and we can focus our investigation there? 

Don.

--
Don Seiler
www.seiler.us

Re: WALWriteLocks

From
Vijaykumar Jain
Date:
I guess details on I/O waits, IOPS saturation, and similar metrics would need to be ruled out before further discussion.

Do the dashboards for those metrics look OK?

We had VMs with unlimited IOPS and no synchronous replication, and I do not recall this kind of lock pile-up, at least up to 1 TB databases on SSDs through PG 11.
Googling does mention some things about tuning wal_buffers etc., but I believe ruling out resource exhaustion is important.

On Thu, 29 Apr 2021 at 9:42 PM Don Seiler <don@seiler.us> wrote:
On Thu, Apr 29, 2021 at 1:38 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
My guess is that you have too many active client connections, and you are suffering
from contention between the many backends that all want to write WAL.

In that case, use a connection pool to limit the number of active connections.

We do have pgbouncer in place already.

Thanks for the replies so far.

What I really want to know in this case is if there is some other PG operation that accounts for a WALWriteLock wait, or is it always an I/O (write) to the WAL file storage, and we can focus our investigation there? 

Don.

--
Don Seiler
www.seiler.us

Re: WALWriteLocks

From
Vijaykumar Jain
Date:
OK, sorry. I tried the below, but I could not simulate WALWriteLock waits.

On Ubuntu: create a slow loopback device, put the WAL directory on it, and run commits to check for WALWriteLock.

  # as root
  dd if=/dev/zero of=/var/tmp/postgres bs=1024k count=100
  losetup --show --find /var/tmp/postgres   # gives /dev/loop8
  echo "0 `blockdev --getsz /dev/loop8` delay /dev/loop8 0 1000" | dmsetup create dm-slow   # 1000 ms delay
  ls -l /dev/mapper/dm-slow
  mkfs.ext4 /dev/mapper/dm-slow   # format
  mkdir -p /mnt/slow
  mount /dev/mapper/dm-slow /mnt/slow
  ls -l /mnt/slow
  mkdir /mnt/slow/postgres
  chown -R postgres /mnt/slow/postgres
  # then as postgres
  su - postgres
  initdb -D data -X /mnt/slow/postgres/data   # WAL directory on the slow device

This puts the WAL directory on the slow device at /mnt/slow/postgres.

Start PG:
pg_ctl -D data -l /tmp/logfile start

Monitor disk I/O to verify that WAL commits are slow due to I/O wait:
iostat -x 1 -p dm-0 -p loop8

create table foo(id int);

 for i in {1..10}; do  psql -c "begin transaction; insert into foo select 1 from generate_series(1, 1000); commit;" & done

The inserts themselves are fast, but the COMMIT has to write and flush WAL, so it should be slow.

Verified with iostat:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00    0.00  100.00    0.00    0.00

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz  aqu-sz  %util
dm-0             0.00      0.00     0.00   0.00    0.00     0.00    1.00      0.00     0.00   0.00 2028.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    2.03 100.40
loop8            0.00      0.00     0.00   0.00    0.00     0.00    2.00     12.00     0.00   0.00    5.50     6.00    0.00      0.00     0.00   0.00    0.00     0.00    0.02   1.60

But I could not see any WALWriteLocks in pg_locks during that window.



For reference, the relevant comments from the PostgreSQL source (src/backend/access/transam/xlog.c):

 * To read XLogCtl->LogwrtResult, you must hold either info_lck or
 * WALWriteLock.  To update it, you need to hold both locks.  The point of
 * this arrangement is that the value can be examined by code that already
 * holds WALWriteLock without needing to grab info_lck as well.  In addition
 * to the shared variable, each backend has a private copy of LogwrtResult,
 * which is updated when convenient.
 *
 * The request bookkeeping is simpler: there is a shared XLogCtl->LogwrtRqst
 * (protected by info_lck), but we don't need to cache any copies of it.
 *
 * info_lck is only held long enough to read/update the protected variables,
 * so it's a plain spinlock.  The other locks are held longer (potentially
 * over I/O operations), so we use LWLocks for them.  These locks are:
 *
 * WALBufMappingLock: must be held to replace a page in the WAL buffer cache.
 * It is only held while initializing and changing the mapping.  If the
 * contents of the buffer being replaced haven't been written yet, the mapping
 * lock is released while the write is done, and reacquired afterwards.
 *
 * WALWriteLock: must be held to write WAL buffers to disk (XLogWrite or
 * XLogFlush).

/*
 * Wait for any WAL insertions < upto to finish.
 *
 * Returns the location of the oldest insertion that is still in-progress.
 * Any WAL prior to that point has been fully copied into WAL buffers, and
 * can be flushed out to disk. Because this waits for any insertions older
 * than 'upto' to finish, the return value is always >= 'upto'.
 *
 * Note: When you are about to write out WAL, you must call this function
 * *before* acquiring WALWriteLock, to avoid deadlocks. This function might
 * need to wait for an insertion to finish (or at least advance to next
 * uninitialized page), and the inserter might need to evict an old WAL buffer
 * to make room for a new one, which in turn requires WALWriteLock.
 */


So I may be wrong in what I suggested.
Sorry, I guess the experts will have to weigh in.

Apologies for the diversion.

Thanks,
Vijay







On Thu, 29 Apr 2021 at 22:35, Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
I guess details on I/O waits, IOPS saturation, and similar metrics would need to be ruled out before further discussion.

Do the dashboards for those metrics look OK?

We had VMs with unlimited IOPS and no synchronous replication, and I do not recall this kind of lock pile-up, at least up to 1 TB databases on SSDs through PG 11.
Googling does mention some things about tuning wal_buffers etc., but I believe ruling out resource exhaustion is important.

On Thu, 29 Apr 2021 at 9:42 PM Don Seiler <don@seiler.us> wrote:
On Thu, Apr 29, 2021 at 1:38 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
My guess is that you have too many active client connections, and you are suffering
from contention between the many backends that all want to write WAL.

In that case, use a connection pool to limit the number of active connections.

We do have pgbouncer in place already.

Thanks for the replies so far.

What I really want to know in this case is if there is some other PG operation that accounts for a WALWriteLock wait, or is it always an I/O (write) to the WAL file storage, and we can focus our investigation there? 

Don.

--
Don Seiler
www.seiler.us

Re: WALWriteLocks

From
Laurenz Albe
Date:
On Thu, 2021-04-29 at 11:11 -0500, Don Seiler wrote:
> On Thu, Apr 29, 2021 at 1:38 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > My guess is that you have too many active client connections, and you are suffering
> > from contention between the many backends that all want to write WAL.
> > 
> > In that case, use a connection pool to limit the number of active connections.
> 
> We do have pgbouncer in place already.

What is the maximum pool size?
How many connections in state "active" do you have?

> What I really want to know in this case is if there is some other PG operation that
>  accounts for a WALWriteLock wait, or is it always an I/O (write) to the WAL file
>  storage, and we can focus our investigation there? 

WALWriteLock: must be held to write WAL buffers to disk (XLogWrite or XLogFlush).

It might be I/O overload, but it could also be internal contention for the lock.
Check the "%iowait" column in "sar -u 1".

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: WALWriteLocks

From
Laurenz Albe
Date:
On Fri, 2021-04-30 at 00:13 +0530, Vijaykumar Jain wrote:
> I tried the below, but i could not simulate WALWriteLock waits.
> 
> create table foo(id int)
> 
>  for i in {1..10}; do  psql -c "begin transaction; insert into foo select 1 from generate_series(1, 1000); commit;" & done

Perhaps you should do simple inserts of one row and run that script
with pgbench and lots of clients.
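
For example (a sketch; the file name is made up), put a single-row insert in a custom script and run it with something like "pgbench -n -f insert_one.sql -c 50 -j 4 -T 60":

-- insert_one.sql: one tiny transaction per execution, so almost all of the
-- time goes into the commit-time WAL flush
INSERT INTO foo VALUES (1);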

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: WALWriteLocks

From
Vijaykumar Jain
Date:
OK, I have a small laptop with an even smaller VM :)

pgbench -i -s 3
pgbench -c 20 -j 5 -T 900 -M prepared

select wait_event_type, wait_event from pg_stat_activity;

 wait_event_type |     wait_event
-----------------+---------------------
 Activity        | LogicalLauncherMain
 Activity        | AutoVacuumMain
 LWLock          | WALWrite
 Lock            | transactionid
 Lock            | tuple
 LWLock          | WALWrite
 Lock            | transactionid
 Lock            | transactionid
 Lock            | tuple
 Lock            | transactionid
 Lock            | transactionid
 Lock            | transactionid
 Lock            | tuple
 IO              | WALSync
 Lock            | transactionid
                 |
 Activity        | BgWriterHibernate
 Activity        | CheckpointerMain
 LWLock          | WALWrite


I then destroyed the slow-WAL cluster:
rm -rf data

and built a new one without the extra latency on the WAL directory:

initdb -D data
pg_ctl -D data start

pgbench -i -s 3
pgbench -c 20 -j 5 -T 900 -M prepared

I see similar output as above, but no WALWriteLock.

old thread, just for reference. do not read into anything :) 



Thanks,
Vijay





On Fri, 30 Apr 2021 at 00:22, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2021-04-30 at 00:13 +0530, Vijaykumar Jain wrote:
> I tried the below, but i could not simulate WALWriteLock waits.
>
> create table foo(id int)
>
>  for i in {1..10}; do  psql -c "begin transaction; insert into foo select 1 from generate_series(1, 1000); commit;" & done

Perhaps you should do simple inserts of one row and run that script
with pgbench and lots of clients.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

Re: WALWriteLocks

From
Vijaykumar Jain
Date:
OK, my bad, it seems it was renamed:

WALWriteLock (in 9.6) is now reported as WALWrite :)

So yes, I do see WALWrite LWLocks.

Thanks,
Vijay

On Fri, 30 Apr 2021 at 01:30, Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
OK, I have a small laptop with an even smaller VM :)

pgbench -i -s 3
pgbench -c 20 -j 5 -T 900 -M prepared

select wait_event_type, wait_event from pg_stat_activity;

 wait_event_type |     wait_event
-----------------+---------------------
 Activity        | LogicalLauncherMain
 Activity        | AutoVacuumMain
 LWLock          | WALWrite
 Lock            | transactionid
 Lock            | tuple
 LWLock          | WALWrite
 Lock            | transactionid
 Lock            | transactionid
 Lock            | tuple
 Lock            | transactionid
 Lock            | transactionid
 Lock            | transactionid
 Lock            | tuple
 IO              | WALSync
 Lock            | transactionid
                 |
 Activity        | BgWriterHibernate
 Activity        | CheckpointerMain
 LWLock          | WALWrite


I then destroyed the slow-WAL cluster:
rm -rf data

and built a new one without the extra latency on the WAL directory:

initdb -D data
pg_ctl -D data start

pgbench -i -s 3
pgbench -c 20 -j 5 -T 900 -M prepared

I see similar output as above, but no WALWriteLock.

old thread, just for reference. do not read into anything :) 



Thanks,
Vijay





On Fri, 30 Apr 2021 at 00:22, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2021-04-30 at 00:13 +0530, Vijaykumar Jain wrote:
> I tried the below, but i could not simulate WALWriteLock waits.
>
> create table foo(id int)
>
>  for i in {1..10}; do  psql -c "begin transaction; insert into foo select 1 from generate_series(1, 1000); commit;" & done

Perhaps you should do simple inserts of one row and run that script
with pgbench and lots of clients.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

RE: [EXTERNAL] Re: WALWriteLocks

From
"Godfrin, Philippe E"
Date:

Have you thought about commit_delay and siblings?
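
For reference, a sketch of the knobs meant here (values are purely illustrative, not recommendations):

-- commit_delay makes a committing backend pause briefly before flushing WAL,
-- provided at least commit_siblings other transactions are active, so that
-- several commits can share one flush
ALTER SYSTEM SET commit_delay = 1000;    -- microseconds, default 0 (disabled)
ALTER SYSTEM SET commit_siblings = 5;    -- default 5
SELECT pg_reload_conf();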

pg

 

From: Don Seiler <don@seiler.us>
Sent: Thursday, April 29, 2021 11:12 AM
To: laurenz.albe@cybertec.at
Cc: pgsql-admin@postgresql.org
Subject: [EXTERNAL] Re: WALWriteLocks

 


 


On Thu, Apr 29, 2021 at 1:38 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:

My guess is that you have too many active client connections, and you are suffering
from contention between the many backends that all want to write WAL.

In that case, use a connection pool to limit the number of active connections.

 

We do have pgbouncer in place already.

 

Thanks for the replies so far.

 

What I really want to know in this case is if there is some other PG operation that accounts for a WALWriteLock wait, or is it always an I/O (write) to the WAL file storage, and we can focus our investigation there? 

 

Don.

 

--

Don Seiler
www.seiler.us

Re: WALWriteLocks

From
Don Seiler
Date:
I appreciate everyone's feedback and help but we seem to be dancing around the actual question.

What I want to know is whether there are other events that would contribute to a WALWriteLock wait (e.g., writing to a data file or to the postgresql.log file), or is it 100% writing to the WAL file? That way, when we see a sustained spike in WALWriteLock wait events, we know where to focus our investigation. The effort into the WAL writes themselves is a separate issue; first I want to know if there is any other process/place that we also need to account for.

Don.

On Thu, Apr 29, 2021 at 3:04 PM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
OK, my bad, it seems it was renamed:

WALWriteLock (in 9.6) is now reported as WALWrite :)

So yes, I do see WALWrite LWLocks.

Thanks,
Vijay

On Fri, 30 Apr 2021 at 01:30, Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
OK, I have a small laptop with an even smaller VM :)

pgbench -i -s 3
pgbench -c 20 -j 5 -T 900 -M prepared

select wait_event_type, wait_event from pg_stat_activity;

 wait_event_type |     wait_event
-----------------+---------------------
 Activity        | LogicalLauncherMain
 Activity        | AutoVacuumMain
 LWLock          | WALWrite
 Lock            | transactionid
 Lock            | tuple
 LWLock          | WALWrite
 Lock            | transactionid
 Lock            | transactionid
 Lock            | tuple
 Lock            | transactionid
 Lock            | transactionid
 Lock            | transactionid
 Lock            | tuple
 IO              | WALSync
 Lock            | transactionid
                 |
 Activity        | BgWriterHibernate
 Activity        | CheckpointerMain
 LWLock          | WALWrite


I then destroyed the slow-WAL cluster:
rm -rf data

and built a new one without the extra latency on the WAL directory:

initdb -D data
pg_ctl -D data start

pgbench -i -s 3
pgbench -c 20 -j 5 -T 900 -M prepared

I see similar output as above, but no WALWriteLock.

old thread, just for reference. do not read into anything :) 



Thanks,
Vijay





On Fri, 30 Apr 2021 at 00:22, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2021-04-30 at 00:13 +0530, Vijaykumar Jain wrote:
> I tried the below, but i could not simulate WALWriteLock waits.
>
> create table foo(id int)
>
>  for i in {1..10}; do  psql -c "begin transaction; insert into foo select 1 from generate_series(1, 1000); commit;" & done

Perhaps you should do simple inserts of one row and run that script
with pgbench and lots of clients.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com



--
Don Seiler
www.seiler.us

Re: WALWriteLocks

From
Laurenz Albe
Date:
On Wed, 2021-05-05 at 11:25 -0500, Don Seiler wrote:
> What I want to know is if there are other events that would contribute
>  to a WALWriteLock (eg writing to a data file or the postgresql.log file)
>  or is it 100% writing to the WAL file.

You don't contribute to a lock...

The documentation says
(https://www.postgresql.org/docs/13/monitoring-stats.html#WAIT-EVENT-LOCK-TABLE):

> WALWrite     Waiting for WAL buffers to be written to disk.

So it is contention while writing or flushing WAL.
You can read the code for the details.
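
A cheap way to see which side of that you are on is to sample pg_stat_activity and compare the LWLock waiters with what the lock holder is doing (a sketch):

-- waiters show up as LWLock/WALWrite; the backend doing the work typically
-- shows IO/WALWrite (write) or IO/WALSync (fsync)
SELECT wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE wait_event IN ('WALWrite', 'WALSync')
GROUP BY wait_event_type, wait_event
ORDER BY count(*) DESC;
-- e.g. repeat it every second with  \watch 1  in psql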

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com




Re: WALWriteLocks

From
Vijaykumar Jain
Date:
Hey Don,

Was this resolved?
Were there any changes after moving WAL to SSD?

On Wed, May 5, 2021, 10:14 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Wed, 2021-05-05 at 11:25 -0500, Don Seiler wrote:
> What I want to know is if there are other events that would contribute
>  to a WALWriteLock (eg writing to a data file or the postgresql.log file)
>  or is it 100% writing to the WAL file.

You don't contribute to a lock...

The documentation says
(https://www.postgresql.org/docs/13/monitoring-stats.html#WAIT-EVENT-LOCK-TABLE):

> WALWrite      Waiting for WAL buffers to be written to disk.

So it is contention while writing or flushing WAL.
You can read the code for the details.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

Re: WALWriteLocks

From
Don Seiler
Date:
Not resolved. The WAL was already on SSD; we further moved it to a separate disk (Azure Ultra SSD), and that seemed to lessen the severity and frequency, but we still see occasional spikes over the course of any given day.

What is actually interesting, though, is that now that I've had iostat running, when we see the WALWriteLock spikes we see w_await and aqu-sz spike on the data directory disk, not the WAL disk. See below for an example (first two readings for baseline, then the spike). sdc is the PG data directory disk (Azure Premium SSD), and dm-0 is an LVM LUN wrapped around it; sdd is the WAL disk (Azure Ultra SSD). The data disk is set to ReadOnly host caching per Azure's recommendations; the WAL disk is set to None (per the recommendations for a log disk, plus Ultra SSD doesn't allow any host caching anyway).

05/12/2021 01:45:47 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          17.77    0.00    3.61    1.67    0.00   76.95

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
loop0            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sda              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb              0.00    3.20      0.00      0.07     0.00    14.60   0.00  82.02    0.00    1.69   0.00     0.00    22.25   3.25   1.04
sdc           1180.00  182.80     11.39      1.41     0.00     0.00   0.00   0.00    0.71    5.22   0.77     9.88     7.93   0.67  91.20
sdd              0.00 1290.00      0.00     19.82     0.00   250.00   0.00  16.23    0.00    0.46   0.01     0.00    15.74   0.76  98.56
dm-0          1180.40  182.80     11.39      1.41     0.00     0.00   0.00   0.00    0.74    5.30   1.84     9.88     7.93   0.67  91.28

05/12/2021 01:45:52 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18.71    0.00    4.02    1.54    0.00   75.74

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
loop0            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sda              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb              0.00    3.40      0.00      0.07     0.00    15.00   0.00  81.52    0.00    1.88   0.00     0.00    21.65   3.53   1.20
sdc           1035.20  118.60     10.55      1.05     0.00    26.80   0.00  18.43    0.67    5.27   0.50    10.44     9.07   0.81  93.44
sdd              0.00 1298.00      0.00     21.64     0.00   264.40   0.00  16.92    0.00    0.49   0.01     0.00    17.08   0.76  98.96
dm-0          1035.20  145.40     10.55      1.05     0.00     0.00   0.00   0.00    0.69    5.49   1.52    10.44     7.39   0.79  93.44

05/12/2021 01:45:57 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          17.85    0.00    4.34   10.93    0.00   66.88

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
loop0            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sda              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb              0.00    3.80      0.00      0.07     0.00    14.60   0.00  79.35    0.00    2.26   0.00     0.00    20.00   3.58   1.36
sdc            983.20 2831.80      8.98     36.91     0.00     0.00   0.00   0.00    6.61   38.56 108.97     9.35    13.35   0.25  94.80
sdd              0.00 1246.20      0.00     21.24     0.00   249.00   0.00  16.65    0.00    0.49   0.03     0.00    17.45   0.79  98.48
dm-0           982.80 2831.60      8.98     36.91     0.00     0.00   0.00   0.00    6.58   38.66 115.93     9.35    13.35   0.25  94.88

05/12/2021 01:46:02 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          19.40    0.00    4.48    2.53    0.00   73.59

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
loop0            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sda              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb              2.20    4.00      0.04      0.09     0.00    20.00   0.00  83.33    0.36    2.20   0.00    17.45    24.00   2.32   1.44
sdc            915.00 2592.60      8.31     23.10     0.00    44.20   0.00   1.68    1.20   16.35  37.69     9.30     9.13   0.27  94.64
sdd              0.00 1234.20      0.00     24.99     0.00   253.60   0.00  17.05    0.00    0.54   0.02     0.00    20.74   0.80  99.20
dm-0           915.60 2636.80      8.31     23.10     0.00     0.00   0.00   0.00    1.20   16.19  43.80     9.30     8.97   0.27  94.64
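
One thing we can check on the PostgreSQL side is whether those data-disk write bursts come from the checkpointer or directly from backends (a sketch against PG 12's pg_stat_bgwriter; the counters are cumulative, so sample before and after a spike and diff them):

-- buffers_checkpoint: written by the checkpointer
-- buffers_backend / buffers_backend_fsync: written or fsync'ed by client backends
SELECT buffers_checkpoint, buffers_clean,
       buffers_backend, buffers_backend_fsync,
       stats_reset
FROM pg_stat_bgwriter;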


On Mon, May 24, 2021 at 9:13 AM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
Hey Don,

Was this resolved?
Were there any changes after moving WAL to SSD?

On Wed, May 5, 2021, 10:14 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Wed, 2021-05-05 at 11:25 -0500, Don Seiler wrote:
> What I want to know is if there are other events that would contribute
>  to a WALWriteLock (eg writing to a data file or the postgresql.log file)
>  or is it 100% writing to the WAL file.

You don't contribute to a lock...

The documentation says
(https://www.postgresql.org/docs/13/monitoring-stats.html#WAIT-EVENT-LOCK-TABLE):

> WALWrite      Waiting for WAL buffers to be written to disk.

So it is contention while writing or flushing WAL.
You can read the code for the details.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com



--
Don Seiler
www.seiler.us

Re: WALWriteLocks

From
Vijaykumar Jain
Date:
Thanks.

For my ref: 

Sorry, I just want to use all the info; I have less experience :)
Do you generate a large volume of WAL?
If your checkpoint_completion_target is aggressive, the checkpoint writes will cause a lot of I/O at once.
If the checkpoint is spread over a larger time window, it may reduce the periodic I/O pressure.
But to see whether checkpoints are hogging the IOPS, log_checkpoints needs to be turned on.


I did use the above once to spread checkpoints and reduce I/O stress, but it meant that recovery after a crash would take longer.
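
A sketch of the settings meant above (values only illustrative):

-- spread each checkpoint over ~90% of the checkpoint interval and log every
-- checkpoint, including whether it was triggered by time or by WAL volume
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET log_checkpoints = on;
SELECT pg_reload_conf();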

But OK, I guess I am digressing. I'll see if I can simulate the exact messages.





On Mon, 24 May 2021 at 20:35, Don Seiler <don@seiler.us> wrote:
Not resolved. The WAL was already on SSD; we further moved it to a separate disk (Azure Ultra SSD), and that seemed to lessen the severity and frequency, but we still see occasional spikes over the course of any given day.

What is actually interesting, though, is that now that I've had iostat running, when we see the WALWriteLock spikes we see w_await and aqu-sz spike on the data directory disk, not the WAL disk. See below for an example (first two readings for baseline, then the spike). sdc is the PG data directory disk (Azure Premium SSD), and dm-0 is an LVM LUN wrapped around it; sdd is the WAL disk (Azure Ultra SSD). The data disk is set to ReadOnly host caching per Azure's recommendations; the WAL disk is set to None (per the recommendations for a log disk, plus Ultra SSD doesn't allow any host caching anyway).

05/12/2021 01:45:47 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          17.77    0.00    3.61    1.67    0.00   76.95

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
loop0            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sda              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb              0.00    3.20      0.00      0.07     0.00    14.60   0.00  82.02    0.00    1.69   0.00     0.00    22.25   3.25   1.04
sdc           1180.00  182.80     11.39      1.41     0.00     0.00   0.00   0.00    0.71    5.22   0.77     9.88     7.93   0.67  91.20
sdd              0.00 1290.00      0.00     19.82     0.00   250.00   0.00  16.23    0.00    0.46   0.01     0.00    15.74   0.76  98.56
dm-0          1180.40  182.80     11.39      1.41     0.00     0.00   0.00   0.00    0.74    5.30   1.84     9.88     7.93   0.67  91.28

05/12/2021 01:45:52 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          18.71    0.00    4.02    1.54    0.00   75.74

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
loop0            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sda              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb              0.00    3.40      0.00      0.07     0.00    15.00   0.00  81.52    0.00    1.88   0.00     0.00    21.65   3.53   1.20
sdc           1035.20  118.60     10.55      1.05     0.00    26.80   0.00  18.43    0.67    5.27   0.50    10.44     9.07   0.81  93.44
sdd              0.00 1298.00      0.00     21.64     0.00   264.40   0.00  16.92    0.00    0.49   0.01     0.00    17.08   0.76  98.96
dm-0          1035.20  145.40     10.55      1.05     0.00     0.00   0.00   0.00    0.69    5.49   1.52    10.44     7.39   0.79  93.44

05/12/2021 01:45:57 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          17.85    0.00    4.34   10.93    0.00   66.88

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
loop0            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sda              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb              0.00    3.80      0.00      0.07     0.00    14.60   0.00  79.35    0.00    2.26   0.00     0.00    20.00   3.58   1.36
sdc            983.20 2831.80      8.98     36.91     0.00     0.00   0.00   0.00    6.61   38.56 108.97     9.35    13.35   0.25  94.80
sdd              0.00 1246.20      0.00     21.24     0.00   249.00   0.00  16.65    0.00    0.49   0.03     0.00    17.45   0.79  98.48
dm-0           982.80 2831.60      8.98     36.91     0.00     0.00   0.00   0.00    6.58   38.66 115.93     9.35    13.35   0.25  94.88

05/12/2021 01:46:02 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          19.40    0.00    4.48    2.53    0.00   73.59

Device            r/s     w/s     rMB/s     wMB/s   rrqm/s   wrqm/s  %rrqm  %wrqm r_await w_await aqu-sz rareq-sz wareq-sz  svctm  %util
loop0            0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sda              0.00    0.00      0.00      0.00     0.00     0.00   0.00   0.00    0.00    0.00   0.00     0.00     0.00   0.00   0.00
sdb              2.20    4.00      0.04      0.09     0.00    20.00   0.00  83.33    0.36    2.20   0.00    17.45    24.00   2.32   1.44
sdc            915.00 2592.60      8.31     23.10     0.00    44.20   0.00   1.68    1.20   16.35  37.69     9.30     9.13   0.27  94.64
sdd              0.00 1234.20      0.00     24.99     0.00   253.60   0.00  17.05    0.00    0.54   0.02     0.00    20.74   0.80  99.20
dm-0           915.60 2636.80      8.31     23.10     0.00     0.00   0.00   0.00    1.20   16.19  43.80     9.30     8.97   0.27  94.64


On Mon, May 24, 2021 at 9:13 AM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
Hey Don,

Was this resolved?
Were there any changes after moving WAL to SSD?

On Wed, May 5, 2021, 10:14 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Wed, 2021-05-05 at 11:25 -0500, Don Seiler wrote:
> What I want to know is if there are other events that would contribute
>  to a WALWriteLock (eg writing to a data file or the postgresql.log file)
>  or is it 100% writing to the WAL file.

You don't contribute to a lock...

The documentation says
(https://www.postgresql.org/docs/13/monitoring-stats.html#WAIT-EVENT-LOCK-TABLE):

> WALWrite      Waiting for WAL buffers to be written to disk.

So it is contention while writing or flushing WAL.
You can read the code for the details.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com



--
Don Seiler
www.seiler.us


--
Thanks,
Vijay
Mumbai, India

Re: WALWriteLocks

From
Don Seiler
Date:
On Mon, May 24, 2021 at 10:38 AM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:

We've definitely been over these docs, with and without their support team. We are well below the caps on both the disks and the VM size in use.
 
Sorry, I just want to use all the info; I have less experience :)
Do you generate a large volume of WAL?

Our WALs are standard 16MB in size.
 
If your checkpoint_completion_target is aggressive, the checkpoint writes will cause a lot of I/O at once.
If the checkpoint is spread over a larger time window, it may reduce the periodic I/O pressure.
But to see whether checkpoints are hogging the IOPS, log_checkpoints needs to be turned on.

checkpoint_timeout is 5min and max_wal_size is 5GB. The vast majority of checkpoints are time-initiated, not WAL-initiated; we've tuned this in the past. Over the past 16 hours there have been only 5 checkpoints started due to WAL pressure, and there was no WAL-initiated checkpoint corresponding to the most recent WALWriteLock spike either.
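
(For the record, this is roughly the split we look at; a sketch against PG 12's pg_stat_bgwriter:)

-- checkpoints_timed: started by checkpoint_timeout
-- checkpoints_req:   requested early, usually because max_wal_size was reached
SELECT checkpoints_timed, checkpoints_req, stats_reset
FROM pg_stat_bgwriter;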

Don.

--
Don Seiler
www.seiler.us

Re: WALWriteLocks

From
Vijaykumar Jain
Date:
OK, one final question.


Do you have synchronous_standby_names set, with at least one standby listed?
A slow network or disk on a standby can result in WALWrite contention on the primary, even when the primary itself has no issue.
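
A quick way to check both (a sketch):

-- empty synchronous_standby_names means commits never wait for a standby
SHOW synchronous_standby_names;

-- if it is set, look for lagging synchronous standbys
SELECT application_name, state, sync_state,
       write_lag, flush_lag, replay_lag
FROM pg_stat_replication;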


 



 


On Mon, 24 May 2021 at 21:21, Don Seiler <don@seiler.us> wrote:
On Mon, May 24, 2021 at 10:38 AM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:

We've definitely been over these docs, with and without their support team. We are well below the caps on both the disks and the VM size in use.
 
Sorry, I just want to use all the info; I have less experience :)
Do you generate a large volume of WAL?

Our WALs are standard 16MB in size.
 
If your checkpoint_completion_target is aggressive, the checkpoint writes will cause a lot of I/O at once.
If the checkpoint is spread over a larger time window, it may reduce the periodic I/O pressure.
But to see whether checkpoints are hogging the IOPS, log_checkpoints needs to be turned on.

checkpoint_timeout is 5min and max_wal_size is 5GB. The vast majority of checkpoints are time-initiated, not WAL-initiated; we've tuned this in the past. Over the past 16 hours there have been only 5 checkpoints started due to WAL pressure, and there was no WAL-initiated checkpoint corresponding to the most recent WALWriteLock spike either.

Don.

--
Don Seiler
www.seiler.us


--
Thanks,
Vijay
Mumbai, India

Re: WALWriteLocks

From
Don Seiler
Date:
On Mon, May 24, 2021 at 11:02 AM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:

Do you have synchronous_standby_names set, with at least one standby listed?
A slow network or disk on a standby can result in WALWrite contention on the primary, even when the primary itself has no issue.

We don't use synchronous standby commit. That parameter value is empty.

--
Don Seiler
www.seiler.us