BUG #19032: In restore_command %f parameter does not support WAL partial files. - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #19032: In restore_command %f parameter does not support WAL partial files.
Date
Msg-id 19032-7904cffd24dd96a5@postgresql.org
Whole thread Raw
Responses Re: BUG #19032: In restore_command %f parameter does not support WAL partial files.
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      19032
Logged by:          Wojciech Szenajch
Email address:      wszenajch@gmail.com
PostgreSQL version: 17.6
Operating system:   Ubuntu 24.04.03 LTS
Description:

Using restore_command = 'cp /tmp/receivewal/%f %p' does not use WAL partial
file during backup restore process.
%p value is used by Postgres to copy WAL and history files and is filled by
Postgres internally without any Administrator influence.

Example:
1. Contents of /tmp/receivewal:
0000000300000001000000E6
0000000300000001000000E7
0000000300000001000000E8
0000000300000001000000E9
0000000300000001000000EA
...
000000030000000200000004
000000030000000200000005
000000030000000200000006.partial
00000003.history

2. Log from restore process using restore_command = 'cp /tmp/receivewal/%f
%p' :
2025-08-25 11:15:48.100 CEST [3326] LOG:  starting PostgreSQL 17.6 (Ubuntu
17.6-1.pgdg24.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
13.3.0-6ubuntu2~24.04) 13.3.0, 64-bit
2025-08-25 11:15:48.100 CEST [3326] LOG:  listening on IPv4 address
"0.0.0.0", port 5432
2025-08-25 11:15:48.101 CEST [3326] LOG:  listening on IPv6 address "::",
port 5432
2025-08-25 11:15:48.111 CEST [3326] LOG:  listening on Unix socket
"/var/run/postgresql/.s.PGSQL.5432"
2025-08-25 11:15:48.139 CEST [3330] LOG:  database system was interrupted;
last known up at 2025-08-25 10:28:23 CEST
cp: cannot stat '/tmp/receivewal/00000004.history': No such file or
directory
2025-08-25 11:15:51.197 CEST [3330] LOG:  starting backup recovery with redo
LSN 1/E7000028, checkpoint LSN 1/E7000080, on timeline ID 3
2025-08-25 11:15:51.213 CEST [3330] LOG:  restored log file
"00000003.history" from archive
2025-08-25 11:15:51.318 CEST [3330] LOG:  restored log file
"0000000300000001000000E7" from archive
2025-08-25 11:15:51.439 CEST [3330] LOG:  starting archive recovery
2025-08-25 11:15:51.463 CEST [3330] LOG:  redo starts at 1/E7000028
2025-08-25 11:15:51.526 CEST [3330] LOG:  restored log file
"0000000300000001000000E8" from archive
2025-08-25 11:15:51.647 CEST [3330] LOG:  completed backup recovery with
redo LSN 1/E7000028 and end LSN 1/E7000120
2025-08-25 11:15:51.647 CEST [3330] LOG:  consistent recovery state reached
at 1/E7000120
2025-08-25 11:15:51.647 CEST [3326] LOG:  database system is ready to accept
read-only connections
2025-08-25 11:15:51.860 CEST [3330] LOG:  restored log file
"0000000300000001000000E9" from archive
2025-08-25 11:15:52.111 CEST [3330] LOG:  restored log file
"0000000300000001000000EA" from archive
...
2025-08-25 11:16:04.450 CEST [3328] LOG:  recovery restart point at
1/FC09F7B0
2025-08-25 11:16:04.450 CEST [3328] DETAIL:  Last completed transaction was
at log time 2025-08-25 10:29:45.078323+02.
2025-08-25 11:16:04.450 CEST [3328] LOG:  restartpoint starting: wal
2025-08-25 11:16:04.634 CEST [3330] LOG:  restored log file
"000000030000000200000004" from archive
2025-08-25 11:16:04.821 CEST [3330] LOG:  restored log file
"000000030000000200000005" from archive
cp: cannot stat '/tmp/receivewal/000000030000000200000006': No such file or
directory
2025-08-25 11:16:05.218 CEST [3330] LOG:  redo done at 2/5F8CD38 system
usage: CPU: user: 2.09 s, system: 4.27 s, elapsed: 13.75 s
2025-08-25 11:16:05.218 CEST [3330] LOG:  last completed transaction was at
log time 2025-08-25 10:31:27.632146+02
2025-08-25 11:16:05.269 CEST [3330] LOG:  restored log file
"000000030000000200000005" from archive
cp: cannot stat '/tmp/receivewal/00000004.history': No such file or
directory
2025-08-25 11:16:05.437 CEST [3330] LOG:  selected new timeline ID: 4
2025-08-25 11:16:05.564 CEST [3330] LOG:  restored log file
"00000003.history" from archive
2025-08-25 11:16:05.583 CEST [3330] LOG:  archive recovery complete
2025-08-25 11:16:06.625 CEST [3328] LOG:  restartpoint complete: wrote 6392
buffers (39.0%); 1 WAL file(s) added, 2 removed, 0 recycled; write=1.443 s,
sync=0.568 s, total=2.175 s; sync files=13, longest=0.444 s, average=0.044
s; distance=49438 kB, estimate=88596 kB; lsn=2/20B7918, redo lsn=1/FF0E70A8
2025-08-25 11:16:06.626 CEST [3328] LOG:  recovery restart point at
1/FF0E70A8
2025-08-25 11:16:06.626 CEST [3328] DETAIL:  Last completed transaction was
at log time 2025-08-25 10:31:27.632146+02.
2025-08-25 11:16:06.626 CEST [3328] LOG:  checkpoints are occurring too
frequently (2 seconds apart)
2025-08-25 11:16:06.626 CEST [3328] HINT:  Consider increasing the
configuration parameter "max_wal_size".
2025-08-25 11:16:06.640 CEST [3328] LOG:  checkpoint starting:
end-of-recovery immediate wait wal
2025-08-25 11:16:07.033 CEST [3328] LOG:  checkpoint complete: wrote 4125
buffers (25.2%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.278 s,
sync=0.057 s, total=0.407 s; sync files=32, longest=0.013 s, average=0.002
s; distance=113534 kB, estimate=113534 kB; lsn=2/5FC6C38, redo lsn=2/5FC6C38
2025-08-25 11:16:07.068 CEST [3326] LOG:  database system is ready to accept
connections

The last used is "restored log file "000000030000000200000005" from
archive". Partial file is not used because is not copied: cp: cannot stat
'/tmp/receivewal/000000030000000200000006': No such file or directory.


When I used the following parameter value: restore_command = 'cp
/tmp/receivewal/%f %p || cp /tmp/receivewal/%f.partial %p' to include files
with partial extension, file 000000030000000200000006.partial was copied and
seemed to be processed during restore (maybe it had no any valuable data in
my experiment).
The end of the log:
...
2025-08-25 11:06:46.703 CEST [3120] LOG:  restored log file
"000000030000000200000005" from archive
cp: cannot stat '/tmp/receivewal/000000030000000200000006': No such file or
directory
2025-08-25 11:06:47.604 CEST [3120] LOG:  restored log file
"000000030000000200000006" from archive
2025-08-25 11:06:47.670 CEST [3118] LOG:  restartpoint complete: wrote 5058
buffers (30.9%); 0 WAL file(s) added, 3 removed, 0 recycled; write=1.791 s,
sync=0.506 s, total=2.469 s; sync files=5, longest=0.303 s, average=0.102 s;
distance=49438 kB, estimate=81497 kB; lsn=2/20B7918, redo lsn=1/FF0E70A8
2025-08-25 11:06:47.670 CEST [3118] LOG:  recovery restart point at
1/FF0E70A8
2025-08-25 11:06:47.670 CEST [3118] DETAIL:  Last completed transaction was
at log time 2025-08-25 10:31:27.632146+02.
2025-08-25 11:06:47.671 CEST [3118] LOG:  restartpoint starting: wal
2025-08-25 11:06:48.062 CEST [3120] LOG:  invalid magic number 0000 in WAL
segment 000000030000000200000006, LSN 2/686C000, offset 8830976
2025-08-25 11:06:48.063 CEST [3120] LOG:  redo done at 2/686BFB0 system
usage: CPU: user: 2.54 s, system: 6.48 s, elapsed: 17.16 s
2025-08-25 11:06:48.063 CEST [3120] LOG:  last completed transaction was at
log time 2025-08-25 10:31:45.384122+02
cp: cannot stat '/tmp/receivewal/000000030000000200000006': No such file or
directory
2025-08-25 11:06:48.143 CEST [3120] LOG:  restored log file
"000000030000000200000006" from archive
cp: cannot stat '/tmp/receivewal/00000004.history': No such file or
directory
cp: cannot stat '/tmp/receivewal/00000004.history.partial': No such file or
directory
2025-08-25 11:06:48.306 CEST [3120] LOG:  selected new timeline ID: 4
2025-08-25 11:06:48.461 CEST [3120] LOG:  restored log file
"00000003.history" from archive
2025-08-25 11:06:48.487 CEST [3120] LOG:  archive recovery complete
2025-08-25 11:06:49.671 CEST [3118] LOG:  restartpoint complete: wrote 6161
buffers (37.6%); 0 WAL file(s) added, 0 removed, 1 recycled; write=1.025 s,
sync=0.916 s, total=2.001 s; sync files=44, longest=0.645 s, average=0.021
s; distance=50765 kB, estimate=78424 kB; lsn=2/4E58E48, redo lsn=2/227A4F8
2025-08-25 11:06:49.671 CEST [3118] LOG:  recovery restart point at
2/227A4F8
2025-08-25 11:06:49.671 CEST [3118] DETAIL:  Last completed transaction was
at log time 2025-08-25 10:31:45.384122+02.
2025-08-25 11:06:49.682 CEST [3118] LOG:  checkpoint starting:
end-of-recovery immediate wait
2025-08-25 11:06:50.385 CEST [3118] LOG:  checkpoint complete: wrote 10155
buffers (62.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.600 s,
sync=0.062 s, total=0.714 s; sync files=21, longest=0.013 s, average=0.003
s; distance=71622 kB, estimate=77744 kB; lsn=2/686BFF0, redo lsn=2/686BFF0
2025-08-25 11:06:50.428 CEST [3116] LOG:  database system is ready to accept
connections

Command used for copying WAL and history files copied
000000030000000200000006.partial file and it was used in restore process but
giving false reference to /tmp/receivewal/00000004.history.partial.
Maybe better template for including partial files may be used.

But as an Administrator I expect this %f parameter to cover all files
required to restore for recovery_target_timeline = 'latest'. Especially that
restore_command = 'cp /tmp/receivewal/%f %p' definition example is used in
Postgres documentation as sufficient to get maximum possible restore of
data.

I found "restore_command = 'cp /tmp/receivewal/%f %p || cp
/tmp/receivewal/%f.partial %p'" on page:

https://www.cybertec-postgresql.com/en/never-lose-a-postgresql-transaction-with-pg_receivewal/#archive-recovery-and-partial-wal-segments
published 10.2019, so the problem probably also exists in older Postgres
versions.


pgsql-bugs by date:

Previous
From: Thadeus Anand
Date:
Subject: Re: [CAUTION: SUSPECT SENDER] RE: [CAUTION: SUSPECT SENDER] RE: BUG #19029: Replication Slot size keeps increasing while logical subscription works fine
Next
From: Tom Lane
Date:
Subject: Re: BUG #19031: pg_trgm infinite loop on certain cases