Is it possible for a WAL file to be missing records? - Mailing list pgsql-bugs

From 와따가따
Subject Is it possible for a WAL file to be missing records?
Date
Msg-id CAAEzU5vAfo0gD2u=TWAjFoeQLjHdTTaypM85mx+rQKHE8ht1OA@mail.gmail.com
List pgsql-bugs
PostgreSQL version and HA extension in use:
- PostgreSQL 13.10
- pg_auto_failover 2.0

CPU usage and system load were rising under a heavy workload.

A failover was triggered while a large number of WALWrite events were occurring on the primary DB.

The secondary was not promoted automatically; I confirmed that this part was a pg_auto_failover issue.

I promoted the secondary manually.

I then tried to turn the old primary DB into a new secondary using the archived WAL files, but a WAL record appeared to be missing.

So I opened the WAL file with pg_waldump and confirmed that a record was missing.
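The check described above can be sketched as a small script. This is only an illustration, not the procedure actually used for this report: it assumes each pg_waldump record line contains `lsn: X/Y, prev X/Y`, and the function names and the sample lines are made up for the example.

```python
import re

# Assumed shape of a pg_waldump record line: "... lsn: X/Y, prev X/Y, desc: ..."
RECORD_RE = re.compile(
    r"lsn: ([0-9A-Fa-f]+/[0-9A-Fa-f]+), prev ([0-9A-Fa-f]+/[0-9A-Fa-f]+)"
)


def parse_lsn(text):
    """Convert an LSN written as 'HI/LO' (hex) into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def find_chain_breaks(lines):
    """Return (last_lsn, lsn, prev) triples where a record's prev pointer
    does not match the LSN of the record printed just before it."""
    breaks = []
    last = None
    for line in lines:
        match = RECORD_RE.search(line)
        if match is None:
            continue  # not a record line
        lsn, prev = match.group(1), match.group(2)
        if last is not None and parse_lsn(prev) != parse_lsn(last):
            breaks.append((last, lsn, prev))
        last = lsn
    return breaks


# Synthetic sample lines (not taken from the affected server).
SAMPLE = [
    "rmgr: Heap len (rec/tot): 54/54, tx: 100, lsn: 0/01000028, prev 0/01000000, desc: INSERT off 1",
    "rmgr: Heap len (rec/tot): 54/54, tx: 100, lsn: 0/01000060, prev 0/01000028, desc: INSERT off 2",
    "rmgr: Heap len (rec/tot): 54/54, tx: 100, lsn: 0/010000D0, prev 0/01000098, desc: INSERT off 4",
]

for last, lsn, prev in find_chain_breaks(SAMPLE):
    print(f"record {lsn} points back to {prev}, but the previous record is {last}")
```

To check across a segment boundary, the output of two consecutive segments would need to be passed in order, so that the first record of the later file is compared with the last record of the earlier one.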

The DB server did not crash.
Is it possible for records not to be written to the WAL file when a failover is performed under high load, even without a crash?

I'm wondering whether this should be considered a bug, or whether this is a situation in which WAL records can be lost.

Below is the information I confirmed from the DB log and pg_waldump.

I'll share some DB settings too.
hot_standby_feedback = on
hot_standby = on
synchronous_commit = on
wal_writer_flush_after = 1MB
wal_sync_method = fdatasync
wal_writer_delay = 200ms
wal_buffers = 16MB
wal_segment_size = 16MB

[When the first failover occurs]
- WAL apply DB log
image.png
- WAL records checked with pg_waldump
I verified that no LSNs are missing in 0000000300005015000000A6 and 0000000300005015000000A7.
However, the prev LSN shown in 0000000300005015000000A8 is not found in 0000000300005015000000A7.
    - The last LSN of 0000000300005015000000A7 is 5015/A6003778.
    - The prev LSN of the first record of 0000000300005015000000A8 is 5015/A7FFED78.
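To relate the LSNs above to segment file names, here is a minimal sketch of the usual mapping from an LSN to a WAL file name. It assumes the standard naming scheme (8 hex digits each for timeline, high part, and segment number) and the 16MB wal_segment_size listed in the settings; timeline 3 is read off the first 8 digits of the file names. The function names are illustrative.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # wal_segment_size = 16MB, as in the settings


def parse_lsn(text):
    """Convert an LSN written as 'HI/LO' (hex) into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(timeline, lsn_text, segment_size=WAL_SEGMENT_SIZE):
    """Name of the WAL segment file that contains the given LSN."""
    segno = parse_lsn(lsn_text) // segment_size
    segs_per_xlogid = 0x100000000 // segment_size  # segments per 4GB "xlog id"
    return "%08X%08X%08X" % (
        timeline,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )


print(wal_file_name(3, "5015/A7FFED78"))  # 0000000300005015000000A7
print(wal_file_name(3, "5015/A6003778"))  # 0000000300005015000000A6
```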
image.png

[When the second failover occurs]
- DB log
image.png

- WAL records checked with pg_waldump
The last LSN of 000000030000501E0000008E is 501E/8EFFCED8.
The prev LSN of the first record in WAL file 000000030000501E0000008F is 501E/8EFFEEC8.
Given the large difference between these two LSNs, records appear to have been lost.
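The difference between the LSNs quoted in this report can be computed directly, since an LSN is a byte position in the WAL stream. The sketch below is only arithmetic on the reported values; the helper names are illustrative.

```python
def parse_lsn(text):
    """Convert an LSN written as 'HI/LO' (hex) into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def lsn_distance(start, end):
    """Number of bytes between two LSNs (end minus start)."""
    return parse_lsn(end) - parse_lsn(start)


# Second failover: last LSN seen in ...8E vs. prev LSN of the first record in ...8F
print(lsn_distance("501E/8EFFCED8", "501E/8EFFEEC8"))  # 8176

# First failover: last LSN reported for ...A7 vs. prev LSN of the first record in ...A8
print(lsn_distance("5015/A6003778", "5015/A7FFED78"))
```

Note that this is the distance between record start positions. The last visible record has some length of its own, so the number of bytes actually unaccounted for is somewhat smaller than the printed value.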

image.png

