Thread: WALWriteLocks
On Wed, 2021-04-28 at 13:53 -0500, Don Seiler wrote:
> We're running a few PostgreSQL 12 instances in VMs in the MS Azure cloud. We have one
> instance in particular that will seemingly lock up for a few seconds (and on rare
> occasions a few minutes). When looking at the metrics, it looks like most of the
> active sessions all begin waiting on WALWriteLocks. Shortly after we'll see a flurry
> of COMMITs in the PG log. For example, we would observe 75-90 sessions all waiting
> on WALWriteLock for 30 seconds and then suddenly they would all write the COMMIT
> action to the PG log (showing the ~30 second duration). When this wait goes on for
> minutes, then it is definitely noticed by the app and users.
>
> We have moved the WAL files to separate Azure UltraSSD storage and are working with
> Azure support to try to identify the bottleneck here.
>
> My understanding is that the WALWriteLock is solely due to the session's commit
> waiting to write to the WAL file, so our focus has been on the storage under the
> WAL files. I'm curious if there are any other events inside PostgreSQL that would
> be behind the WALWriteLock, to make sure we aren't chasing a red herring.

My guess is that you have too many active client connections, and you are suffering
from contention between the many backends that all want to write WAL.

In that case, use a connection pool to limit the number of active connections.

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
On Thu, Apr 29, 2021 at 1:38 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> My guess is that you have too many active client connections, and you are suffering
> from contention between the many backends that all want to write WAL.
>
> In that case, use a connection pool to limit the number of active connections.

We do have pgbouncer in place already.

Thanks for the replies so far. What I really want to know in this case is if there is some other PG operation that accounts for a WALWriteLock wait, or is it always an I/O (write) to the WAL file storage, and we can focus our investigation there?
I guess details on I/O waits, IOPS saturation and similar metrics would need to be ruled out for further discussion. Do the dashboards for those metrics look OK?

We had VMs with unlimited IOPS and no synchronous replication, so I do not recall this piling up of locks, at least up to 1 TB databases on SSDs through PG 11. Googling does mention some things about tuning wal_buffers etc., but I believe ruling out resource exhaustion is important.

On Thu, 29 Apr 2021 at 9:42 PM Don Seiler <don@seiler.us> wrote:
> On Thu, Apr 29, 2021 at 1:38 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > My guess is that you have too many active client connections, and you are suffering
> > from contention between the many backends that all want to write WAL.
> >
> > In that case, use a connection pool to limit the number of active connections.
>
> We do have pgbouncer in place already.
>
> Thanks for the replies so far. What I really want to know in this case is if there is some other PG operation that accounts for a WALWriteLock wait, or is it always an I/O (write) to the WAL file storage, and we can focus our investigation there?
On Thu, 2021-04-29 at 11:11 -0500, Don Seiler wrote:
> On Thu, Apr 29, 2021 at 1:38 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> > My guess is that you have too many active client connections, and you are suffering
> > from contention between the many backends that all want to write WAL.
> >
> > In that case, use a connection pool to limit the number of active connections.
>
> We do have pgbouncer in place already.

What is the maximum pool size? How many connections in state "active" do you have?

> What I really want to know in this case is if there is some other PG operation that
> accounts for a WALWriteLock wait, or is it always an I/O (write) to the WAL file
> storage, and we can focus our investigation there?

WALWriteLock: must be held to write WAL buffers to disk (XLogWrite or XLogFlush).
It might be I/O overload, but it could also be internal contention for the lock.

Check the "%iowait" column in "sar -u 1".
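To see how many backends are "active" and what they are waiting on, a rough pg_stat_activity sample like this is enough (just a sketch; adjust the filter as needed):

SELECT state, wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY 1, 2, 3
ORDER BY count(*) DESC;

The wait_event_type column also separates the LWLock wait from the actual WAL I/O waits (IO/WALWrite, IO/WALSync).

Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com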
On Fri, 2021-04-30 at 00:13 +0530, Vijaykumar Jain wrote:
> I tried the below, but i could not simulate WALWriteLock waits.
>
> create table foo(id int)
>
> for i in {1..10}; do psql -c "begin transaction; insert into foo select 1 from generate_series(1, 1000); commit;" & done
Perhaps you should do simple inserts of one row and run that script
with pgbench and lots of clients.
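An untested sketch, reusing the "foo" table from your test (the file name is just an example):

-- insert_one.sql: one single-row INSERT per transaction
INSERT INTO foo VALUES (1);

# then drive it with many clients, something like
pgbench -n -c 50 -j 4 -T 60 -f insert_one.sql

With one tiny insert per commit, almost all of the time goes into flushing WAL, which is what you want to stress.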
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
ok i have a small laptop with an even smaller vm :)

pgbench -i -s 3
pgbench -c 20 -j 5 -T 900 -M prepared

select wait_event_type, wait_event from pg_stat_activity;
 wait_event_type |     wait_event
-----------------+---------------------
 Activity        | LogicalLauncherMain
 Activity        | AutoVacuumMain
 LWLock          | WALWrite
 Lock            | transactionid
 Lock            | tuple
 LWLock          | WALWrite
 Lock            | transactionid
 Lock            | transactionid
 Lock            | tuple
 Lock            | transactionid
 Lock            | transactionid
 Lock            | transactionid
 Lock            | tuple
 IO              | WALSync
 Lock            | transactionid
                 |
 Activity        | BgWriterHibernate
 Activity        | CheckpointerMain
 LWLock          | WALWrite

i then destroyed the slow WAL db
rm -rf data
and built a new one without additional latency to the WAL dir.
initdb -D data
pg_ctl -D data start
pgbench -i -s 3
pgbench -c 20 -j 5 -T 900 -M prepared

i see similar output as above, but no WALWriteLock.

old thread, just for reference. do not read into anything :)

Thanks,
Vijay

On Fri, 30 Apr 2021 at 00:22, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2021-04-30 at 00:13 +0530, Vijaykumar Jain wrote:
> I tried the below, but i could not simulate WALWriteLock waits.
>
> create table foo(id int)
>
> for i in {1..10}; do psql -c "begin transaction; insert into foo select 1 from generate_series(1, 1000); commit;" & done
Perhaps you should do simple inserts of one row and run that script
with a pgbench and lots of clients.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Have you thought about commit_delay and siblings?
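For reference, a sketch of what that would look like in postgresql.conf (illustrative values only, not a recommendation):

# group commit settings -- illustrative values only
commit_delay = 1000      # microseconds to wait before flushing WAL; default 0 (off)
commit_siblings = 5      # only delay when at least this many other transactions are open; default 5

The idea is to let several concurrent commits share a single WAL flush instead of each paying for its own.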
pg
From: Don Seiler <don@seiler.us>
Sent: Thursday, April 29, 2021 11:12 AM
To: laurenz.albe@cybertec.at
Cc: pgsql-admin@postgresql.org
Subject: [EXTERNAL] Re: WALWriteLocks
On Thu, Apr 29, 2021 at 1:38 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
My guess is that you have too many active client connections, and you are suffering
from contention between the many backends that all want to write WAL.
In that case, use a connection pool to limit the number of active connections.
We do have pgbouncer in place already.
Thanks for the replies so far.
What I really want to know in this case is if there is some other PG operation that accounts for a WALWriteLock wait, or is it always an I/O (write) to the WAL file storage, and we can focus our investigation there?
Don.
--
Don Seiler
www.seiler.us
ok my bad, it seems it was renamed, WALWriteLock in 9.6 to WALWrite :)
so yes, I do see WALWrite LWLocks.

Thanks,
Vijay

On Fri, 30 Apr 2021 at 01:30, Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
ok i have a small laptop with an even smaller vm :)
[...]
On Fri, 30 Apr 2021 at 00:22, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Fri, 2021-04-30 at 00:13 +0530, Vijaykumar Jain wrote:
> I tried the below, but i could not simulate WALWriteLock waits.
>
> create table foo(id int)
>
> for i in {1..10}; do psql -c "begin transaction; insert into foo select 1 from generate_series(1, 1000); commit;" & done
Perhaps you should do simple inserts of one row and run that script
with a pgbench and lots of clients.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
On Wed, 2021-05-05 at 11:25 -0500, Don Seiler wrote:
> What I want to know is if there are other events that would contribute
> to a WALWriteLock (eg writing to a data file or the postgresql.log file)
> or is it 100% writing to the WAL file.
You don't contribute to a lock...
The documentation says
(https://www.postgresql.org/docs/13/monitoring-stats.html#WAIT-EVENT-LOCK-TABLE):
> WALWrite Waiting for WAL buffers to be written to disk.
So it is contention while writing or flushing WAL.
You can read the code for the details.
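If you want to catch it in the act, a rough sketch is to watch pg_stat_activity while a spike is happening (in psql):

SELECT pid, state, wait_event_type, wait_event, left(query, 60) AS query
FROM pg_stat_activity
WHERE wait_event IN ('WALWrite', 'WALSync');
\watch 1

The wait_event_type column separates the LWLock wait (queueing behind the lock) from the IO waits (the actual write/fsync of the WAL file), which gives a hint whether it is lock contention or the WAL disk itself.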
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Hey Don,

Was this resolved?
Were there any changes after moving WAL to SSD?

On Wed, May 5, 2021, 10:14 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Wed, 2021-05-05 at 11:25 -0500, Don Seiler wrote:
> What I want to know is if there are other events that would contribute
> to a WALWriteLock (eg writing to a data file or the postgresql.log file)
> or is it 100% writing to the WAL file.
You don't contribute to a lock...
The documentation says
(https://www.postgresql.org/docs/13/monitoring-stats.html#WAIT-EVENT-LOCK-TABLE):
> WALWrite Waiting for WAL buffers to be written to disk.
So it is contention while writing or flushing WAL.
You can read the code for the details.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Not resolved. The WAL was already on SSD; we further moved it to a separate SSD (Azure's Ultra SSD) and that seemed to lessen the severity/frequency, but we still see occasional spikes over the course of any given day.

What is actually interesting, though, is that now that I've had iostat running, when we see the WALWriteLock spikes, we see w_await and aqu-sz spike on the data directory disk, not the WAL disk. See below for an example (first two readings for baseline, then the spike). sdc is the PG data directory disk (Azure Premium SSD), and dm-0 is an LVM LUN wrapped around it. sdd is the WAL disk (Azure Ultra SSD). The data disk is set to ReadOnly host caching per Azure's recommendations; the WAL disk is set to None (per recommendations for a log disk, plus Ultra SSD doesn't allow any caching anyway).

05/12/2021 01:45:47 PM
avg-cpu: %user %nice %system %iowait %steal %idle
17.77 0.00 3.61 1.67 0.00 76.95
Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 3.20 0.00 0.07 0.00 14.60 0.00 82.02 0.00 1.69 0.00 0.00 22.25 3.25 1.04
sdc 1180.00 182.80 11.39 1.41 0.00 0.00 0.00 0.00 0.71 5.22 0.77 9.88 7.93 0.67 91.20
sdd 0.00 1290.00 0.00 19.82 0.00 250.00 0.00 16.23 0.00 0.46 0.01 0.00 15.74 0.76 98.56
dm-0 1180.40 182.80 11.39 1.41 0.00 0.00 0.00 0.00 0.74 5.30 1.84 9.88 7.93 0.67 91.28
05/12/2021 01:45:52 PM
avg-cpu: %user %nice %system %iowait %steal %idle
18.71 0.00 4.02 1.54 0.00 75.74
Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 3.40 0.00 0.07 0.00 15.00 0.00 81.52 0.00 1.88 0.00 0.00 21.65 3.53 1.20
sdc 1035.20 118.60 10.55 1.05 0.00 26.80 0.00 18.43 0.67 5.27 0.50 10.44 9.07 0.81 93.44
sdd 0.00 1298.00 0.00 21.64 0.00 264.40 0.00 16.92 0.00 0.49 0.01 0.00 17.08 0.76 98.96
dm-0 1035.20 145.40 10.55 1.05 0.00 0.00 0.00 0.00 0.69 5.49 1.52 10.44 7.39 0.79 93.44
05/12/2021 01:45:57 PM
avg-cpu: %user %nice %system %iowait %steal %idle
17.85 0.00 4.34 10.93 0.00 66.88
Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 3.80 0.00 0.07 0.00 14.60 0.00 79.35 0.00 2.26 0.00 0.00 20.00 3.58 1.36
sdc 983.20 2831.80 8.98 36.91 0.00 0.00 0.00 0.00 6.61 38.56 108.97 9.35 13.35 0.25 94.80
sdd 0.00 1246.20 0.00 21.24 0.00 249.00 0.00 16.65 0.00 0.49 0.03 0.00 17.45 0.79 98.48
dm-0 982.80 2831.60 8.98 36.91 0.00 0.00 0.00 0.00 6.58 38.66 115.93 9.35 13.35 0.25 94.88
05/12/2021 01:46:02 PM
avg-cpu: %user %nice %system %iowait %steal %idle
19.40 0.00 4.48 2.53 0.00 73.59
Device r/s w/s rMB/s wMB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
loop0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 2.20 4.00 0.04 0.09 0.00 20.00 0.00 83.33 0.36 2.20 0.00 17.45 24.00 2.32 1.44
sdc 915.00 2592.60 8.31 23.10 0.00 44.20 0.00 1.68 1.20 16.35 37.69 9.30 9.13 0.27 94.64
sdd 0.00 1234.20 0.00 24.99 0.00 253.60 0.00 17.05 0.00 0.54 0.02 0.00 20.74 0.80 99.20
dm-0 915.60 2636.80 8.31 23.10 0.00 0.00 0.00 0.00 1.20 16.19 43.80 9.30 8.97 0.27 94.64

On Mon, May 24, 2021 at 9:13 AM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:
Hey Don,

Was this resolved?
Were there any changes after moving WAL to SSD?

On Wed, May 5, 2021, 10:14 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Wed, 2021-05-05 at 11:25 -0500, Don Seiler wrote:
> What I want to know is if there are other events that would contribute
> to a WALWriteLock (eg writing to a data file or the postgresql.log file)
> or is it 100% writing to the WAL file.
You don't contribute to a lock...
The documentation says
(https://www.postgresql.org/docs/13/monitoring-stats.html#WAIT-EVENT-LOCK-TABLE):
> WALWrite Waiting for WAL buffers to be written to disk.
So it is contention while writing or flushing WAL.
You can read the code for the details.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com

--
Don Seiler
www.seiler.us
On Mon, May 24, 2021 at 10:38 AM Vijaykumar Jain <vijaykumarjain.github@gmail.com> wrote:

We've definitely been over these docs, with and without their support team. We are well below the caps on both the disks and the VM size in use.

> Sorry, i just want to use all the info, i have less exp :)
> do you generate large wals ?

Our WALs are standard 16MB in size.

> if your checkpoint_completion_target is aggressive, it will result in a lot of io.
> if the checkpoint is spread over a larger time window, maybe it will reduce frequent i/o pressure.
> but for that log_checkpoints needs to be turned on if they are hogging the iops.
checkpoint_timeout is 5min, max_wal_size is 5GB. The vast majority of checkpoints are time-initiated, not wal-initiated. We've tuned this in the past. Over the past 16 hours there have been only 5 checkpoints started due to wal pressure. There was no such wal-initiated checkpoint that corresponds to the most recent WALWriteLock spike either.
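For reference, the timed vs. requested split can be pulled with a rough query like this against pg_stat_bgwriter (counters are cumulative since stats_reset):

SELECT checkpoints_timed, checkpoints_req,
       checkpoint_write_time, checkpoint_sync_time,
       stats_reset
FROM pg_stat_bgwriter;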
Don.
--
Don Seiler
www.seiler.us
Do you have synchronous_standby_names enabled, with a non-zero number of nodes listed there?

A slow network or a slow disk on the standby may result in WALWrite contention on the primary, even when the primary itself has no issue.
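A quick way to check, as a rough sketch:

SHOW synchronous_standby_names;
SHOW synchronous_commit;

SELECT application_name, state, sync_state,
       write_lag, flush_lag, replay_lag
FROM pg_stat_replication;

If sync_state shows "sync" (or "quorum") standbys and the lag columns climb during the spikes, the commits are waiting on the standby acknowledging the WAL rather than on the local disks.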