Re: pg_waldump: support decoding of WAL inside tarfile - Mailing list pgsql-hackers
| From | Thomas Munro |
|---|---|
| Subject | Re: pg_waldump: support decoding of WAL inside tarfile |
| Date | |
| Msg-id | CA+hUKGL2dppjO4o28ZY7n_LTWviKLAi-7KZ=tx5w2HGevCEYPA@mail.gmail.com |
| In response to | Re: pg_waldump: support decoding of WAL inside tarfile (Andres Freund <andres@anarazel.de>) |
| Responses | Re: pg_waldump: support decoding of WAL inside tarfile (2 replies) |
| List | pgsql-hackers |
On Thu, Mar 26, 2026 at 6:28 AM Andres Freund <andres@anarazel.de> wrote:
> On 2026-03-24 12:11:44 +0900, Michael Paquier wrote:
> > On Sun, Mar 22, 2026 at 11:02:20PM -0400, Tom Lane wrote:
> > > Proposed patch attached. There might be an argument for using some
> > > other size than 256K for the other two decompressors, but my
> > > inclination is to try to make all three use roughly the same block
> > > size. (See also 66ec01dc4.)
> >
> > The buildfarm has switched mostly to green, except on this one:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoatzin&dt=2026-03-23%2006%3A00%3A42
>
> I think there's a few more failures. Fairywren regularly fails, including in a
> run from today.

This fails 100% of the time on my machine, even after e9d72348 and
ff84efe4, eg:

# Running: pg_waldump --path /tmp/D8WG1Sv2HE/pg_wal.tar --start 0/017A2610 --end 0/02093848
[09:43:29.288](0.148s) not ok 104 - runs with path option and start and end locations: exit code 0
[09:43:29.289](0.001s) # Failed test 'runs with path option and start and end locations: exit code 0'
# at /home/tmunro/projects/postgresql/src/bin/pg_waldump/t/001_basic.pl line 402.
[09:43:29.290](0.001s) not ok 105 - runs with path option and start and end locations: no stderr
[09:43:29.291](0.001s) # Failed test 'runs with path option and start and end locations: no stderr'
# at /home/tmunro/projects/postgresql/src/bin/pg_waldump/t/001_basic.pl line 402.
[09:43:29.291](0.000s) # got: 'pg_waldump: error: could not find WAL "000000010000000000000002" in archive "pg_wal.tar"
# '

I can see that it is wrong about the contents of the tar file:

$ pg_waldump --path _tmp_H_1gv81G1L_pg_wal.tar --start 0/017A2610 --end 0/020934F8 2>&1 | tail -3
rmgr: Hash len (rec/tot): 72/ 72, tx: 720, lsn: 0/01FFC1B8, prev 0/01FFC178, desc: INSERT off 40, blkref #0: rel 1663/5/16397 blk 2, blkref #1: rel 1663/5/16397 blk 0
rmgr: Transaction len (rec/tot): 46/ 46, tx: 720, lsn: 0/01FFC200, prev 0/01FFC1B8, desc: COMMIT 2026-03-29 10:15:24.112967 NZDT
pg_waldump: error: could not find WAL "000000010000000000000002" in archive "_tmp_H_1gv81G1L_pg_wal.tar"

$ tar tvf _tmp_H_1gv81G1L_pg_wal.tar
drwx------ 0 tmunro tmunro 0 Mar 29 10:15 archive_status/
-rw------- 0 tmunro tmunro 0 Mar 29 10:15 archive_status/000000010000000000000002.ready
-rw------- 0 tmunro tmunro 0 Mar 29 10:15 archive_status/000000010000000000000001.ready
drwx------ 0 tmunro tmunro 0 Mar 29 10:08 summaries/
-rw------- 0 tmunro tmunro 16777216 Mar 29 10:15 000000010000000000000002
-rw------- 0 tmunro tmunro 16777216 Mar 29 10:15 000000010000000000000001
-rw------- 0 tmunro tmunro 16777216 Mar 29 10:15 000000010000000000000003

It seems like the place we'd be looking for the file is in
astreamer_tar_header(), so I added in some caveman debugging:

    /*
     * Parse key fields out of the header.
     */
    fprintf(stderr, "XXXX [%s] XXXX\n", &buffer[TAR_OFFSET_NAME]);
    strlcpy(member->pathname, &buffer[TAR_OFFSET_NAME], MAXPGPATH);
    if (member->pathname[0] == '\0')
        pg_fatal("tar member has empty name");

Now I see:

XXXX [archive_status/] XXXX
XXXX [archive_status/000000010000000000000002.ready] XXXX
XXXX [archive_status/000000010000000000000001.ready] XXXX
XXXX [summaries/] XXXX
XXXX [PaxHeader/000000010000000000000002] XXXX
XXXX [GNUSparseFile.0/000000010000000000000002] XXXX
XXXX [000000010000000000000001] XXXX
rmgr: XLOG len (rec/tot): 30/ 30, tx: 0, lsn: 0/017A2610, prev 0/017A25F0, desc: NEXTOID 24576
rmgr: Standby len (rec/tot): 42/ 42, tx: 692, lsn: 0/017A2630, prev 0/017A2610, desc: LOCK xid 692 db 5 rel 16384
rmgr: Storage len (rec/tot): 42/ 42, tx: 692, lsn: 0/017A2660, prev 0/017A2630, desc: CREATE base/5/16384
... lots more normal output ...
rmgr: Hash len (rec/tot): 72/ 72, tx: 720, lsn: 0/01FFBED8, prev 0/01FFBE98, desc: INSERT off 97, blkref #0: rel 1663/5/16397 blk 2, blkref #1: rel 1663/5/16397 blk 0
rmgr: Heap len (rec/tot): 575/ 575, tx: 720, lsn: 0/01FFBF20, prev 0/01FFBED8, desc: INSERT off: 12, flags: 0x08, blkref #0: rel 1663/5/16393 blk 52
rmgr: Btree len (rec/tot): 64/ 64, tx: 720, lsn:XXXX [PaxHeader/000000010000000000000003] XXXX
XXXX [GNUSparseFile.0/000000010000000000000003] XXXX
 0/01FFC178, prev 0/01FFBF20, desc: INSERT_LEAF off: 344, blkref #0: rel 1663/5/16396 blk 2
rmgr: Hash len (rec/tot): 72/ 72, tx: 720, lsn: 0/01FFC1B8, prev 0/01FFC178, desc: INSERT off 40, blkref #0: rel 1663/5/16397 blk 2, blkref #1: rel 1663/5/16397 blk 0
rmgr: Transaction len (rec/tot): 46/ 46, tx: 720, lsn: 0/01FFC200, prev 0/01FFC1B8, desc: COMMIT 2026-03-29 10:15:24.112967 NZDT
pg_waldump: error: could not find WAL "000000010000000000000002" in archive "_tmp_H_1gv81G1L_pg_wal.tar"

Seems like it already stepped over 000000010000000000000002 earlier?
Could it be a table-of-contents order dependency bug or something like
that?
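
The debug output above prints only the name field of each header, so it cannot show what kind of member "PaxHeader/..." or "GNUSparseFile.0/..." is. Names of that shape are typically written by tar implementations that emit pax extended headers and sparse-file entries, though whether that explains the failure here is only a guess. A minimal sketch of a more informative debug helper follows, assuming the standard ustar layout (name at offset 0, size at offset 124, type flag at offset 156). The names USTAR_OFF_*, tar_typeflag_name(), tar_parse_octal(), and debug_tar_header() are hypothetical and are not taken from the PostgreSQL tree.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Field offsets in a 512-byte ustar header block (standard ustar layout). */
#define USTAR_BLOCK_SIZE	512
#define USTAR_OFF_NAME		0	/* 100 bytes, NUL-padded */
#define USTAR_OFF_SIZE		124	/* 12 bytes, octal ASCII */
#define USTAR_OFF_TYPEFLAG	156	/* 1 byte */

/* Hypothetical helper: map a type flag byte to a readable description. */
static const char *
tar_typeflag_name(char typeflag)
{
	switch (typeflag)
	{
		case '\0':				/* old-style archives use NUL for files */
		case '0':
			return "regular file";
		case '5':
			return "directory";
		case 'x':
			return "pax extended header";
		case 'g':
			return "pax global header";
		case 'L':
			return "GNU long name";
		default:
			return "other";
	}
}

/*
 * Hypothetical helper: parse an octal ASCII field.  Leading spaces are
 * skipped and parsing stops at the first non-octal byte.  The GNU base-256
 * encoding used for very large sizes is deliberately not handled.
 */
static unsigned long long
tar_parse_octal(const char *field, size_t len)
{
	unsigned long long value = 0;
	size_t		i = 0;

	while (i < len && field[i] == ' ')
		i++;
	for (; i < len && field[i] >= '0' && field[i] <= '7'; i++)
		value = (value << 3) | (unsigned long long) (field[i] - '0');
	return value;
}

/* Print name, type flag, and size of one header block for debugging. */
static void
debug_tar_header(FILE *out, const char *block)
{
	char		name[101];
	char		typeflag;

	assert(out != NULL && block != NULL);

	/* The name field is not NUL-terminated when all 100 bytes are used. */
	memcpy(name, block + USTAR_OFF_NAME, 100);
	name[100] = '\0';

	typeflag = block[USTAR_OFF_TYPEFLAG];
	fprintf(out, "XXXX [%s] type '%c' (%s) size %llu XXXX\n",
			name,
			typeflag == '\0' ? '0' : typeflag,
			tar_typeflag_name(typeflag),
			tar_parse_octal(block + USTAR_OFF_SIZE, 12));
}
```

A call such as debug_tar_header(stderr, buffer) next to the existing fprintf would show whether the "PaxHeader/..." entries carry type flag 'x', assuming buffer points at the complete 512-byte header block at that point.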