Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL - Mailing list pgsql-hackers

From David Christensen
Subject Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
Date
Msg-id CAOxo6X+wXEu1QfTFTf-WiDs3jr8ERvgEOddHfUhAH_5eje-G2w@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses Re: [PATCH] Teach pg_waldump to extract FPIs from the WAL
List pgsql-hackers
On Fri, Nov 11, 2022 at 4:57 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> > Well if it's not the same output then I guess you're right and there's
> > not a use for the `--fixup` mode.  By the same token, I'd say
> > calculating/setting the checksum also wouldn't need to be done, we
> > should just include the page as included in the WAL stream.
>
> Let's hear from others, we may be missing something here. I recommend
> keeping the --fixup patch as 0002, in case if we decide to discard
> it's easier, however I'll leave that to you.

I've whacked in `git` for now; I can resurrect if people consider it useful.

> > > > > 5.
> > > > > +        if (!RestoreBlockImage(record, block_id, page))
> > > > > +            continue;
> > > > > +
> > > > > +        /* we have our extracted FPI, let's save it now */
> > > > > After extracting the page from the WAL record, do we need to perform a
> > > > > checksum on it?
> > >
> > > I think you just need to do the following, this will ensure the
> > > authenticity of the page that pg_waldump returns.
> > > if ((PageHeader) page)->pd_checksum != pg_checksum_page((char *) page, blk))
> > > {
> > >     pg_fatal("page checksum failed");
> > > }
> >
> > The WAL already has a checksum, so not certain this makes sense on its
> > own. Also I'm inclined to make it a warning if it doesn't match rather
> > than a fatal. (I'd also have to verify that the checksum is properly
> > set on the page prior to copying the FPI into WAL, which I'm pretty
> > sure it is but not certain.)
>
> How about having it as an Assert()?

Based on empirical testing, the checksums don't match, so
asserting/alerting on each block extracted seems next to useless, so
going to just remove that.

> > > 5.
> > > +        fsync(fileno(OPF));
> > > +        fclose(OPF);
> > > I think just the fsync() isn't enough, you still need fsync_fname()
> > > and/or fsync_parent_path(), perhaps after for (block_id = 0; block_id
> > > <= XLogRecMaxBlockId(record); block_id++) loop.
> >
> > I'm not sure I get the value of the fsyncs here; if you are using this
> > tool at this capacity you're by definition doing some sort of
> > transient investigative steps.  Since the WAL was fsync'd, you could
> > always rerun/recreate as needed in the unlikely event of an OS crash
> > in the middle of this investigation.  Since this is outside the
> > purview of the database operations proper (unlike, say, initdb) seems
> > like it's unnecessary (or definitely shouldn't need to be selectable).
> > My thoughts are that if we're going to fsync, just do the fsyncs
> > unconditionally rather than complicate the interface further.
>
> -1 for fysnc() per file created as it can create a lot of sync load on
> production servers impacting performance. How about just syncing the
> directory at the end assuming it doesn't cost as much as fsync() per
> FPI file created would?

I can fsync the dir if that's a useful compromise.

> > > 4.
> > > +$primary->append_conf('postgresql.conf', "wal_level='replica'");
> > > +$primary->append_conf('postgresql.conf', 'archive_mode=on');
> > > +$primary->append_conf('postgresql.conf', "archive_command='/bin/false'");
> > > Why do you need to set wal_level to replica, out of the box your
> > > cluster comes with replica only no?
> > > And why do you need archive_mode on and set the command to do nothing?
> > > Why archiving is needed for testing your feature firstly?
> >
> > I think it had shown "minimal" in my testing; I was purposefully
> > failing archives so the WAL would stick around.  Maybe a custom
> > archive command that just copied a single WAL file into a known
> > location so I could use that instead of the current approach would
> > work, though not sure how Windows support would work with that.  Open
> > to other ideas to more cleanly get a single WAL file that isn't the
> > last one.  (Earlier versions of this test were using /all/ of the
> > generated WAL files rather than a single one, so maybe I am
> > overcomplicating things for a single WAL file case.)
>
> Typically we create a physical replication slot at the beginning so
> that the server keeps the WAL required for you in pg_wal itself, for
> instance, please see pg_walinspect:
>
> -- Make sure checkpoints don't interfere with the test.
> SELECT 'init' FROM
> pg_create_physical_replication_slot('regress_pg_walinspect_slot',
> true, false);

Will see if I can get something like this to work; I'm currently
stopping the server before running the file-based tests, but I suppose
there's no reason to do so, so a temporary slot that holds it around
until the test is complete is probably fine.

David



pgsql-hackers by date:

Previous
From: samay sharma
Date:
Subject: Re: Documentation for building with meson
Next
From: Robert Haas
Date:
Subject: Re: Add sub-transaction overflow status in pg_stat_activity