Thread: pg ignores wal files in pg_wal, and instead tries to load them from archive/primary

pg ignores wal files in pg_wal, and instead tries to load them from archive/primary

From
hubert depesz lubaczewski
Date:
Hi,
we have following situation:

1. primary on 14.5 that is *not* archiving (this is a temporary situation
   related to the ongoing process of upgrading from pg 12) - all on Ubuntu Focal.
2. on the new replica we run (via a wrapper, but this doesn't seem to be
   relevant):
   pg_basebackup -D /var/lib/postgresql/14/main -c fast -v -P -U some-user -h sourcedb.hostname
3. after it is done, if the datadir was large enough, pg on the replica
   doesn't replicate/catch up, because, from the logs:
2022-09-29 14:59:26.587 UTC,,,2355588,,6335b2ce.23f184,1,,2022-09-29 14:59:26 UTC,,0,LOG,00000,"started streaming WAL
from primary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0
 
2022-09-29 14:59:26.587 UTC,,,2355588,,6335b2ce.23f184,2,,2022-09-29 14:59:26 UTC,,0,FATAL,08P01,"could not receive
data from WAL stream: ERROR:  requested WAL segment 00000001000007E800000067 has already been
removed",,,,,,,,,"","walreceiver",,0
4. if a restore_command is configured, it tries to read data from the archive
   too, but the archive is non-existent.
5. the "missing" file is there, in pg_wal (I would assume that
   pg_basebackup copied it there):
   root@host# /bin/ls -c1 0* | wc -l
   1068
   root@host# /bin/ls -c1 0* | sort -V | head -n 1
   00000001000007E4000000A0
   root@host# /bin/ls -c1 0* | sort -V | tail -n 1
   00000001000007E800000092
   root@host# /bin/ls -c1 0* | sort -V | grep -n 00000001000007E800000067
   1043:00000001000007E800000067
   root@host# /bin/ls -c1 0* | sort -V | grep -n -C5 00000001000007E800000067
   1038-00000001000007E800000062
   1039-00000001000007E800000063
   1040-00000001000007E800000064
   1041-00000001000007E800000065
   1042-00000001000007E800000066
   1043:00000001000007E800000067
   1044-00000001000007E800000068
   1045-00000001000007E800000069
   1046-00000001000007E800000070
   1047-00000001000007E800000071
   1048-00000001000007E800000072
6. What's more - I straced the startup process (rough invocation below), and it does:
   a. opens the wal file (the problematic one)
   b. reads 8k from it
   c. closes it
   d. checks for the existence of the finish.recovery trigger file (it doesn't exist)
   e. starts the restore program (which fails).
   f. rinse and repeat
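
(The loop above was captured with something along these lines - the syscall
filter and the pgrep pattern here are illustrative, not the exact invocation we used:)

   root@host# strace -f -e trace=openat,read,close,access,execve \
       -p "$(pgrep -f 'postgres.*startup')"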

What am I missing? What is wrong? How can I fix it? The problem is not fixing
*this server*, because we are in the process of upgrading LOTS and LOTS of servers,
and I need to know what is broken/how to work around it.


Currently our go-to fix is:
1. increase wal_keep_size to ~200GB
2. stand up the replica
3. once it catches up, decrease wal_keep_size back to our standard 16GB

but it is not a really nice solution.
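
(For reference, steps 1 and 3 boil down to a reload-able setting change on the
primary - roughly this, no restart needed; prompts are illustrative:)

   postgres@primary$ psql -c "ALTER SYSTEM SET wal_keep_size = '200GB'"
   postgres@primary$ psql -c "SELECT pg_reload_conf()"
   ... rebuild the replica, wait for it to catch up ...
   postgres@primary$ psql -c "ALTER SYSTEM SET wal_keep_size = '16GB'"
   postgres@primary$ psql -c "SELECT pg_reload_conf()"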

Best regards,

depesz




Re: pg ignores wal files in pg_wal, and instead tries to load them from archive/primary

From
Michael Paquier
Date:
On Thu, Sep 29, 2022 at 05:51:02PM +0200, hubert depesz lubaczewski wrote:
> What am I missing? What is wrong? How can I fix it? The problem is not fixing
> *this server*, because we are in the process of upgrading LOTS and LOTS of servers,
> and I need to know what is broken/how to work around it.

So, your backup_label file points to 00000001000007E4000000A0 as being
the first segment to begin recovery from.  This means that the startup
process is able to correctly kick recovery using the contents of the
local pg_wal/ from 00000001000007E4000000A0 to 00000001000007E800000066.
Once 00000001000007E800000067 cannot be read, the startup process
decides to switch the source it gets WAL from to streaming.  The order
should be local pg_wal/, then the archive, and finally streaming, if
all are configured, as far as I recall.
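
(For reference, a standby with all three sources configured would have
something like this - illustrative values, not necessarily your setup:)

    # on the standby
    primary_conninfo = 'host=sourcedb.hostname user=some-user'
    restore_command  = 'cp /path/to/archive/%f %p'   # hypothetical archive location
    # plus an empty standby.signal file in the data directory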

What's the error you are seeing when reading 00000001000007E800000067?
Is pg_waldump able to understand its internals?  This could point
to an issue with xlogreader.c, for example.

Another thing that has changed in this area is related to continuation
records.  What are the WAL contents at the end of
00000001000007E800000066?  Do you spot an OVERWRITE_CONTRECORD when
using pg_waldump on it?
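
Something like this should be enough to check (illustrative; adjust the path
to wherever the segment actually lives):

    pg_waldump /var/lib/postgresql/14/main/pg_wal/00000001000007E800000066 | grep -c OVERWRITE_CONTRECORD
    pg_waldump /var/lib/postgresql/14/main/pg_wal/00000001000007E800000066 | tail -n 5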
--
Michael


Re: pg ignores wal files in pg_wal, and instead tries to load them from archive/primary

From
hubert depesz lubaczewski
Date:
On Fri, Sep 30, 2022 at 09:49:28AM +0900, Michael Paquier wrote:
> On Thu, Sep 29, 2022 at 05:51:02PM +0200, hubert depesz lubaczewski wrote:
> > What am I missing? What is wrong? How can I fix it? The problem is not fixing
> > *this server*, because we are in the process of upgrading LOTS and LOTS of servers,
> > and I need to know what is broken/how to work around it.
> So, your backup_label file points to 00000001000007E4000000A0 as being
> the first segment to begin recovery from.  This means that the startup

Well, *now* I don't have backup_label:

root@c273-pg2258:/var/lib/postgresql/14/main# ls -l
total 68520
-rw------- 1 postgres postgres        3 Sep 28 01:38 PG_VERSION
-rw------- 1 postgres postgres 61264030 Sep 30 04:42 autoprewarm.blocks
-rw------- 1 postgres postgres      231 Sep 28 01:38 backup_label.old
-rw------- 1 postgres postgres  8783060 Sep 28 01:38 backup_manifest
drwx------ 6 postgres postgres       74 Sep 28 01:38 base
-rw------- 1 postgres postgres       61 Sep 30 04:00 current_logfiles
drwx------ 2 postgres postgres     4096 Sep 29 14:25 global
drwx------ 2 postgres postgres        6 Sep 28 01:38 pg_commit_ts
drwx------ 2 postgres postgres        6 Sep 28 01:38 pg_dynshmem
drwx------ 4 postgres postgres       84 Sep 28 03:38 pg_logical
drwx------ 4 postgres postgres       48 Sep 28 01:38 pg_multixact
drwx------ 2 postgres postgres        6 Sep 28 01:38 pg_notify
drwx------ 2 postgres postgres        6 Sep 28 01:38 pg_replslot
drwx------ 2 postgres postgres        6 Sep 28 01:38 pg_serial
drwx------ 2 postgres postgres        6 Sep 28 01:38 pg_snapshots
drwx------ 2 postgres postgres        6 Sep 29 14:25 pg_stat
drwx------ 2 postgres postgres       43 Sep 29 14:25 pg_stat_tmp
drwx------ 2 postgres postgres      122 Sep 29 14:25 pg_subtrans
drwx------ 2 postgres postgres       27 Sep 28 01:38 pg_tblspc
drwx------ 2 postgres postgres        6 Sep 28 01:38 pg_twophase
drwx------ 3 postgres postgres   204800 Sep 28 03:38 pg_wal
drwx------ 2 postgres postgres     4096 Sep 28 01:38 pg_xact
-rw------- 1 postgres postgres      118 Sep 28 01:38 postgresql.auto.conf
-rw------- 1 postgres postgres      130 Sep 29 14:25 postmaster.opts
-rw------- 1 postgres postgres      103 Sep 29 14:25 postmaster.pid
-rw-r--r-- 1 postgres postgres        0 Sep 28 01:38 standby.signal

backup_label.old is:
root@c273-pg2258:/var/lib/postgresql/14/main# cat backup_label.old
START WAL LOCATION: 7D5/3B068338 (file 00000001000007D50000003B)
CHECKPOINT LOCATION: 7D5/3B076F78
BACKUP METHOD: streamed
BACKUP FROM: primary
START TIME: 2022-09-27 21:06:00 UTC
LABEL: pg_basebackup base backup
START TIMELINE: 1

> process is able to correctly kick recovery using the contents of the
> local pg_wal/ from 00000001000007E4000000A0 to 00000001000007E800000066.
> Once 00000001000007E800000067 cannot be read, the startup process
> decides to switch the source it gets WAL from to streaming.  The order
> should be local pg_wal/, then the archive, and finally streaming, if
> all are configured, as far as I recall.
> What's the error you are seeing when reading 00000001000007E800000067?

I don't see any errors. Let me show you some data from the log:
2022-09-30 04:44:43.833 UTC,"postgres","postgres",2716081,"[local]",6336743b.2971b1,3,"authentication",2022-09-30
04:44:43 UTC,4/14269,0,LOG,00000,"connection authorized: user=postgres database=postgres
application_name=psql",,,,,,,,,"","client backend",,1147616880456321454
2022-09-30 04:44:43.917 UTC,,,2716082,,6336743b.2971b2,1,,2022-09-30 04:44:43 UTC,,0,LOG,00000,"started streaming WAL
from primary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0
2022-09-30 04:44:43.918 UTC,,,2716082,,6336743b.2971b2,2,,2022-09-30 04:44:43 UTC,,0,FATAL,08P01,"could not receive
data from WAL stream: ERROR:  requested WAL segment 00000001000007E800000067 has already been
removed",,,,,,,,,"","walreceiver",,0
2022-09-30 04:44:48.922 UTC,,,2716187,,63367440.29721b,1,,2022-09-30 04:44:48 UTC,,0,LOG,00000,"started streaming WAL
from primary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0
2022-09-30 04:44:48.922 UTC,,,2716187,,63367440.29721b,2,,2022-09-30 04:44:48 UTC,,0,FATAL,08P01,"could not receive
data from WAL stream: ERROR:  requested WAL segment 00000001000007E800000067 has already been
removed",,,,,,,,,"","walreceiver",,0
2022-09-30 04:44:51.867 UTC,,,2716207,"[local]",63367443.29722f,1,"",2022-09-30 04:44:51 UTC,,0,LOG,00000,"connection
received: host=[local]",,,,,,,,,"","not initialized",,0
2022-09-30 04:44:51.868 UTC,"postgres","db",2716207,"[local]",63367443.29722f,2,"authentication",2022-09-30 04:44:51
UTC,4/14271,0,LOG,00000,"connection authenticated: identity=""postgres"" method=peer
(/etc/postgresql/14/main/pg_hba.conf:11)",,,,,,,,,"","client backend",,-659920678270393094
2022-09-30 04:44:51.868 UTC,"postgres","db",2716207,"[local]",63367443.29722f,3,"authentication",2022-09-30 04:44:51
UTC,4/14271,0,LOG,00000,"connection authorized: user=postgres database=db application_name=psql",,,,,,,,,"","client
backend",,-659920678270393094
2022-09-30 04:44:53.751 UTC,,,2716218,"127.0.0.1:39078",63367445.29723a,1,"",2022-09-30 04:44:53
UTC,,0,LOG,00000,"connection received: host=127.0.0.1 port=39078",,,,,,,,,"","not initialized",,0
2022-09-30 04:44:53.927 UTC,,,2716219,,63367445.29723b,1,,2022-09-30 04:44:53 UTC,,0,LOG,00000,"started streaming WAL
from primary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0
2022-09-30 04:44:53.927 UTC,,,2716219,,63367445.29723b,2,,2022-09-30 04:44:53 UTC,,0,FATAL,08P01,"could not receive
data from WAL stream: ERROR:  requested WAL segment 00000001000007E800000067 has already been
removed",,,,,,,,,"","walreceiver",,0
2022-09-30 04:44:58.931 UTC,,,2716255,,6336744a.29725f,1,,2022-09-30 04:44:58 UTC,,0,LOG,00000,"started streaming WAL
from primary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0
2022-09-30 04:44:58.931 UTC,,,2716255,,6336744a.29725f,2,,2022-09-30 04:44:58 UTC,,0,FATAL,08P01,"could not receive
data from WAL stream: ERROR:  requested WAL segment 00000001000007E800000067 has already been
removed",,,,,,,,,"","walreceiver",,0
2022-09-30 04:45:03.936 UTC,,,2716366,,6336744f.2972ce,1,,2022-09-30 04:45:03 UTC,,0,LOG,00000,"started streaming WAL
from primary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0
2022-09-30 04:45:03.936 UTC,,,2716366,,6336744f.2972ce,2,,2022-09-30 04:45:03 UTC,,0,FATAL,08P01,"could not receive
data from WAL stream: ERROR:  requested WAL segment 00000001000007E800000067 has already been
removed",,,,,,,,,"","walreceiver",,0
2022-09-30 04:45:08.751 UTC,,,2716401,"127.0.0.1:45926",63367454.2972f1,1,"",2022-09-30 04:45:08
UTC,,0,LOG,00000,"connection received: host=127.0.0.1 port=45926",,,,,,,,,"","not initialized",,0
2022-09-30 04:45:08.940 UTC,,,2716402,,63367454.2972f2,1,,2022-09-30 04:45:08 UTC,,0,LOG,00000,"started streaming WAL
from primary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0
2022-09-30 04:45:08.940 UTC,,,2716402,,63367454.2972f2,2,,2022-09-30 04:45:08 UTC,,0,FATAL,08P01,"could not receive
data from WAL stream: ERROR:  requested WAL segment 00000001000007E800000067 has already been
removed",,,,,,,,,"","walreceiver",,0
2022-09-30 04:45:12.960 UTC,,,2716425,"[local]",63367458.297309,1,"",2022-09-30 04:45:12 UTC,,0,LOG,00000,"connection
received: host=[local]",,,,,,,,,"","not initialized",,0
2022-09-30 04:45:12.961 UTC,"postgres","postgres",2716425,"[local]",63367458.297309,2,"authentication",2022-09-30
04:45:12 UTC,4/14277,0,LOG,00000,"connection authenticated: identity=""postgres"" method=peer
(/etc/postgresql/14/main/pg_hba.conf:11)",,,,,,,,,"","client backend",,1147616880456321454
2022-09-30 04:45:12.961 UTC,"postgres","postgres",2716425,"[local]",63367458.297309,3,"authentication",2022-09-30
04:45:12 UTC,4/14277,0,LOG,00000,"connection authorized: user=postgres database=postgres
application_name=psql",,,,,,,,,"","client backend",,1147616880456321454
2022-09-30 04:45:13.946 UTC,,,2716431,,63367459.29730f,1,,2022-09-30 04:45:13 UTC,,0,LOG,00000,"started streaming WAL
from primary at 7E8/67000000 on timeline 1",,,,,,,,,"","walreceiver",,0
2022-09-30 04:45:13.946 UTC,,,2716431,,63367459.29730f,2,,2022-09-30 04:45:13 UTC,,0,FATAL,08P01,"could not receive
data from WAL stream: ERROR:  requested WAL segment 00000001000007E800000067 has already been
removed",,,,,,,,,"","walreceiver",,0

> Is pg_waldump able to understand its internals?  This could point
> to an issue with xlogreader.c, for example.

So... it can, but it finds no records:

postgres@c273-pg2258:~/14/main$ /usr/lib/postgresql/14/bin/pg_waldump 00000001000007E800000067
pg_waldump: fatal: could not find a valid record after 7E8/67000000
postgres@c273-pg2258:~/14/main$ /usr/lib/postgresql/14/bin/pg_waldump 00000001000007E800000068
pg_waldump: fatal: could not find a valid record after 7E8/68000000

but the previous one works:
postgres@c273-pg2258:~/14/main$ /usr/lib/postgresql/14/bin/pg_waldump 00000001000007E800000066 | tail -n 20
rmgr: Transaction len (rec/tot):     65/    65, tx:  140005495, lsn: 7E8/66A0B010, prev 7E8/66A0AFC8, desc: COMMIT
2022-09-28 01:38:20.782352 UTC; origin: node 1, lsn CE64/A5086920, at 2022-09-28 01:38:20.780890 UTC
rmgr: Heap        len (rec/tot):     54/    54, tx:  140005496, lsn: 7E8/66A0B058, prev 7E8/66A0B010, desc: LOCK off
55: xid 140005496: flags 0x00 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 17963/16644/195315099 blk 83995
rmgr: Heap        len (rec/tot):    165/   165, tx:  140005496, lsn: 7E8/66A0B090, prev 7E8/66A0B058, desc: HOT_UPDATE
off 55 xmax 140005496 flags 0x10 KEYS_UPDATED ; new off 74 xmax 140005496, blkref #0: rel 17963/16644/195315099 blk
83995
rmgr: Transaction len (rec/tot):     65/    65, tx:  140005496, lsn: 7E8/66A0B138, prev 7E8/66A0B090, desc: COMMIT
2022-09-28 01:38:20.783036 UTC; origin: node 3, lsn CE64/A5087E00, at 2022-09-28 01:38:20.780992 UTC
rmgr: XLOG        len (rec/tot):     51/   980, tx:          0, lsn: 7E8/66A0B180, prev 7E8/66A0B138, desc:
FPI_FOR_HINT, blkref #0: rel 17963/16644/195326137 blk 271085 FPW 
rmgr: Heap        len (rec/tot):     54/    54, tx:  140005497, lsn: 7E8/66A0B558, prev 7E8/66A0B180, desc: LOCK off
13: xid 140005497: flags 0x00 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 17963/16644/195326137 blk 271085
rmgr: Heap        len (rec/tot):    261/   261, tx:  140005497, lsn: 7E8/66A0B590, prev 7E8/66A0B558, desc: HOT_UPDATE
off 13 xmax 140005497 flags 0x10 KEYS_UPDATED ; new off 15 xmax 140005497, blkref #0: rel 17963/16644/195326137 blk
271085
rmgr: Transaction len (rec/tot):     65/    65, tx:  140005497, lsn: 7E8/66A0B698, prev 7E8/66A0B590, desc: COMMIT
2022-09-28 01:38:20.784762 UTC; origin: node 1, lsn CE64/A508C7F0, at 2022-09-28 01:38:20.782858 UTC
rmgr: Heap        len (rec/tot):     54/    54, tx:  140005498, lsn: 7E8/66A0B6E0, prev 7E8/66A0B698, desc: LOCK off 8:
xid 140005498: flags 0x00 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 17963/16644/195328502 blk 3038500
rmgr: Heap        len (rec/tot):     54/    54, tx:  140005498, lsn: 7E8/66A0B718, prev 7E8/66A0B6E0, desc: LOCK off 8:
xid 140005498: flags 0x00 LOCK_ONLY EXCL_LOCK KEYS_UPDATED , blkref #0: rel 17963/16644/195328502 blk 3038500
rmgr: Heap        len (rec/tot):   1821/  1821, tx:  140005498, lsn: 7E8/66A0B750, prev 7E8/66A0B718, desc: UPDATE off
8 xmax 140005498 flags 0x10 KEYS_UPDATED ; new off 5 xmax 140005498, blkref #0: rel 17963/16644/195328502 blk 3038837,
blkref #1: rel 17963/16644/195328502 blk 3038500
rmgr: Btree       len (rec/tot):     64/    64, tx:  140005498, lsn: 7E8/66A0BE70, prev 7E8/66A0B750, desc: INSERT_LEAF
off 158, blkref #0: rel 17963/16644/195349680 blk 78396
rmgr: Btree       len (rec/tot):     72/    72, tx:  140005498, lsn: 7E8/66A0BEB0, prev 7E8/66A0BE70, desc: INSERT_LEAF
off 72, blkref #0: rel 17963/16644/419942198 blk 82241
rmgr: Btree       len (rec/tot):     64/    64, tx:  140005498, lsn: 7E8/66A0BEF8, prev 7E8/66A0BEB0, desc: INSERT_LEAF
off 268, blkref #0: rel 17963/16644/419946394 blk 18486
rmgr: Btree       len (rec/tot):     64/    64, tx:  140005498, lsn: 7E8/66A0BF38, prev 7E8/66A0BEF8, desc: INSERT_LEAF
off 69, blkref #0: rel 17963/16644/419946856 blk 56634
rmgr: Btree       len (rec/tot):     64/    64, tx:  140005498, lsn: 7E8/66A0BF78, prev 7E8/66A0BF38, desc: INSERT_LEAF
off 98, blkref #0: rel 17963/16644/419947709 blk 19044
rmgr: Btree       len (rec/tot):     64/    64, tx:  140005498, lsn: 7E8/66A0BFB8, prev 7E8/66A0BF78, desc: INSERT_LEAF
off 53, blkref #0: rel 17963/16644/419948176 blk 10715
rmgr: Transaction len (rec/tot):     65/    65, tx:  140005498, lsn: 7E8/66A0BFF8, prev 7E8/66A0BFB8, desc: COMMIT
2022-09-28 01:38:20.785822 UTC; origin: node 1, lsn CE64/A5091128, at 2022-09-28 01:38:20.784288 UTC
rmgr: XLOG        len (rec/tot):     34/    34, tx:          0, lsn: 7E8/66A0C058, prev 7E8/66A0BFF8, desc: BACKUP_END
7D5/3B068338
rmgr: XLOG        len (rec/tot):     24/    24, tx:          0, lsn: 7E8/66A0C080, prev 7E8/66A0C058, desc: SWITCH


> Another thing that has changed in this area is related to continuation
> records.  What are the WAL contents at the end of
> 00000001000007E800000066?  Do you spot an OVERWRITE_CONTRECORD when
> using pg_waldump on it?

No. There is no such record in the pg_waldump output for the *66 file.

If I understand correctly, this would suggest that the *66 file is the end of the
WAL taken by pg_basebackup. What are the *67 and following files then?

Best regards,

depesz





Re: pg ignores wal files in pg_wal, and instead tries to load them from archive/primary

From
Jeff Janes
Date:
> Currently our go-to fix is:
> 1. increase wal_keep_size to ~200GB
> 2. stand up the replica
> 3. once it catches up, decrease wal_keep_size back to our standard 16GB
>
> but it is not a really nice solution.

If you don't have an archive and aren't using a permanent slot, then you always have the risk of blowing past your available WAL.

Increasing wal_keep_size is an option to reduce this risk.  Yes, it is not so nice, but you have already declined to use the nicer options.
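
For example (a sketch - the slot name here is made up), pg_basebackup can create
and keep a physical replication slot for the standby while it is being built,
and write the matching primary_slot_name into the standby's config:

    pg_basebackup -D /var/lib/postgresql/14/main -c fast -v -P \
        -U some-user -h sourcedb.hostname \
        --create-slot --slot=standby_rebuild_slot --write-recovery-conf

Just remember to drop the slot once it is no longer needed, or it will pin WAL
on the primary forever.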

Where those extra unusable files came from is a curiosity, but nothing more than that.  I'm guessing they were created by the wrapper you said you use, as I can't reproduce their existence with just pg_basebackup.  They are probably "recycled ahead" files from the master that somehow got copied needlessly, and with a file name which does not match the file contents (since their contents are from a previous name the file had).  pg_waldump isn't as helpful as it could be in those cases.

Cheers,

Jeff