Re: pg_waldump: support decoding of WAL inside tarfile - Mailing list pgsql-hackers
| From | Andrew Dunstan |
|---|---|
| Subject | Re: pg_waldump: support decoding of WAL inside tarfile |
| Date | |
| Msg-id | 401bf08a-c8f1-48e2-9a30-78deaa9fa7c5@dunslane.net |
| In response to | Re: pg_waldump: support decoding of WAL inside tarfile (Amul Sul <sulamul@gmail.com>) |
| List | pgsql-hackers |
On Sat, Mar 7, 2026 at 3:51 AM Andrew Dunstan <andrew@dunslane.net> wrote:
On 2026-03-04 We 4:50 PM, Andrew Dunstan wrote:
On 2026-03-04 We 7:52 AM, Amul Sul wrote:
On Wed, Mar 4, 2026 at 6:07 AM Andrew Dunstan <andrew@dunslane.net> wrote:
On 2026-03-02 Mo 8:00 AM, Amul Sul wrote:
On Wed, Feb 18, 2026 at 12:28 PM Amul Sul <sulamul@gmail.com> wrote:
On Tue, Feb 10, 2026 at 3:06 PM Amul Sul <sulamul@gmail.com> wrote:
On Wed, Feb 4, 2026 at 6:39 PM Amul Sul <sulamul@gmail.com> wrote:
On Wed, Jan 28, 2026 at 2:41 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Jan 27, 2026 at 7:07 AM Amul Sul <sulamul@gmail.com> wrote:

In the attached version, I am using the WAL segment name as the hash key, which is much more straightforward. I have rewritten read_archive_wal_page(), and it looks much cleaner than before. The logic to discard irrelevant WAL files is still within get_archive_wal_entry. I added an explanation for setting cur_wal to NULL, which is now handled in the separate function I mentioned previously. Kindly have a look at the attached version; let me know if you are still not happy with the current approach for filtering/discarding irrelevant WAL segments. It isn't much different from the previous version, but I have tried to keep it in a separate routine for better code readability, with comments to make it easier to understand. I also added a comment for ArchivedWALFile.

I feel like the division of labor between get_archive_wal_entry() and read_archive_wal_page() is odd. I noticed this in the last version, too, and it still seems to be the case. get_archive_wal_entry() first calls ArchivedWAL_lookup(). If that finds an entry, it just returns. If it doesn't, it loops until an entry for the requested file shows up and then returns it. Then control returns to read_archive_wal_page(), which loops some more until we have all the data we need for the requested file. But it seems odd to me to have two separate loops here.
I think that the first loop is going to call read_archive_file() until we find the beginning of the file that we care about, and then the second one is going to call read_archive_file() some more until we have read enough of it to satisfy the request. It feels odd to me to do it that way, as if we told somebody to first wait until 9 o'clock and then wait another 30 minutes, instead of just telling them to wait until 9:30. I realize it's not quite the same thing, because apart from calling read_archive_file(), the two loops do different things, but I still think it looks odd.

```
+   /*
+    * Ignore if the timeline is different or the current segment is not
+    * the desired one.
+    */
+   XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz);
+   if (privateInfo->timeline != curSegTimeline ||
+       privateInfo->startSegNo > curSegNo ||
+       privateInfo->endSegNo < curSegNo ||
+       segno > curSegNo)
+   {
+       free_archive_wal_entry(entry->fname, privateInfo);
+       continue;
+   }
```

The comment doesn't match the code. If it did, the test would be (privateInfo->timeline != curSegTimeline || segno != curSegNo). But instead the segno test is > rather than !=, and the checks against startSegNo and endSegNo aren't explained at all. I think I understand why the segno test uses > rather than !=, but it's the point of the comment to explain things like that, rather than leaving the reader to guess. And I don't know why we also need to test startSegNo and endSegNo. I also wonder what the point is of doing XLogFromFileName() on the fname provided by the caller and then again on entry->fname. Couldn't you just compare the strings?

Again, the division of labor is really odd here. It's the job of astreamer_waldump_content() to skip things that aren't WAL files at all, but it's the job of get_archive_wal_entry() to skip things that are WAL files but not the one we want.
I disagree with putting those checks in completely separate parts of the code.

Keeping the timeline and segment start-end range checks inside the archive streamer creates a circular dependency that cannot be resolved without a dirty hack: we must read the first available WAL file's first page to determine the wal_segment_size before we can calculate the target segment range. Moving the checks inside the streamer would make it impossible to process that initial file, since the necessary filtering parameters would still be unknown, so the check would somehow have to be skipped for the first read. And what if we later realized that the first WAL file, which was allowed to be streamed by skipping that check, is irrelevant and doesn't fall within the start-end segment range?

Please have a look at the attached version, specifically patch 0005. In astreamer_waldump_content(), I have moved the WAL file filtration check from get_archive_wal_entry(). This check is skipped during the initial read in init_archive_reader(), which instead performs it explicitly once it determines the WAL segment size and the start/end segments. To make the WAL segment size accessible inside astreamer_waldump_content(), I have moved the WAL-segment-size variable into the XLogDumpPrivate structure in the separate 0004 patch.

Attached is an updated version including the aforesaid changes. It includes a new refactoring patch (0001) that moves the logic for identifying tar archives and their compression types from pg_basebackup and pg_verifybackup into a separate, reusable function, per a suggestion from Euler [1]. Additionally, I have added a test for contrecord decoding to the main patch (now 0006).

[1] http://postgr.es/m/3c8e7b02-2152-495a-a0b6-e37cf9286a70@app.fastmail.com

Rebased against the latest master, fixed typos in code comments, and replaced palloc0 with palloc0_object.

Hi Amul.
I think this looks in pretty good shape.

Thank you very much for looking at the patch.

Attached are patches for a few things I think could be fixed. They are mostly self-explanatory. The TAP test fix is the only sane way I could come up with to stop the skip code you had from reporting a wildly inaccurate number of tests skipped. The sane way to do this from a Test::More perspective is a subtest, but unfortunately meson does not like subtest output, which is why we don't use it elsewhere, so the only way I could come up with was to split this out into a separate test. Of course, we might just say we don't care about the misreport, in which case we could just live with things as they are.

I agree that the reported skip number was incorrect, and I have corrected it in the attached patch. I haven't applied your patch for the TAP test improvements yet because I wanted to double-check it first with you; the patch as it stood created duplicate tests already present in 001_basic.pl. To avoid this duplication, I have added a loop that performs tests for both plain and tar WAL directory inputs, similar to the approach used in pg_verifybackup for different compression-type tests (e.g., 008_untar.pl, 010_client_untar.pl). I don't have any objection to doing so if you feel the duplication is acceptable, but I feel that using a loop for the tests in 001_basic.pl is a bit tidier. Let me know your thoughts.

I will take a look.

I'm OK with doing it this way. It's just a bit fragile: if we add a test, the number will be wrong. But maybe it's not worth worrying about. Everything else looks fairly good. The attached fixes a few relatively minor issues in v15. The main one is that it stops allocating/freeing a buffer every time we call read_archive_file() and instead adds a reusable buffer.
It also adds back wal-directory as an undocumented alias of wal-path, to avoid breaking legacy scripts unnecessarily, and adds constness to the fname argument of pg_tar_compress_algorithm, as well as fixing some indentation and grammar issues. All in all, I think we're in good shape.

Thanks for the review. I have incorporated your suggested changes, with one exception: I have skipped the buffer reallocation code in read_archive_file(). Since we only handle two specific read sizes, XLOG_BLCKSZ and READ_CHUNK_SIZE (128 KB, defined in archive_waldump.c), dynamic reallocation seems unnecessary. Instead, I moved the allocation to init_archive_reader(), which now initializes a buffer at READ_CHUNK_SIZE. I also added an assertion in read_archive_file() to ensure that no read request exceeds this allocated capacity. Kindly have a look at the attached version and let me know your thoughts.
Looks pretty good. I have squashed them into three patches I think are committable. Also attached is a diff showing what's changed - mainly this:
- --follow + tar archive rejected (pg_waldump.c): new validation prevents a confusing pg_fatal when combining --follow with a tar archive.
- Error messages split (archive_waldump.c): the single "could not read file" error is now two distinct messages, "WAL segment is too short" (truncated file) vs. "unexpected end of archive" (archive EOF). This fixes an issue raised in review.
- Hash table cleanup (archive_waldump.c): free_archive_reader now iterates over and frees all remaining hash entries and destroys the table.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com