Re: Show various offset arrays for heap WAL records - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Show various offset arrays for heap WAL records
Date
Msg-id CAH2-WzkJSmaywgUVr0PKSLLzm3ODPziDcxFRUTvp0AP3hQTXNg@mail.gmail.com
Whole thread Raw
In response to Re: Show various offset arrays for heap WAL records  (Melanie Plageman <melanieplageman@gmail.com>)
Responses Re: Show various offset arrays for heap WAL records
List pgsql-hackers
On Mon, Mar 27, 2023 at 2:29 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> I went to add dedup records and noticed that since the actual
> BTDedupInterval struct is what is put in the xlog, I would need access
> to that type from nbtdesc.c, however, including nbtree.h doesn't seem to
> work because it includes files that cannot be included in frontend code.

I suppose that the BTDedupInterval struct could have just as easily
gone in nbtxlog.h, next to xl_btree_dedup. There might have been a
moment where I thought about doing it that way, but I guess I found it
slightly preferable to use that symbol name (BTDedupInterval) rather
than (say) xl_btree_dedup_interval in places like the nearby
BTDedupStateData struct.

Actually, I suppose that it's hard to make that alternative work, at
least without
including nbtxlog.h in nbtree.h. Which sounds wrong.

> I, of course, could make some local struct in nbtdesc.c which has an
> OffsetNumber and a uint16, since the BTDedupInterval is pretty
> straightforward, but that seems a bit annoying.
> I'm probably missing something obvious, but is there a better way to do
> this?

It was probably just one of those cases where I settled on the
arrangement that looked least odd overall. Not a particularly
principled approach. But the approach that I'm going to take once more
here.  ;-)

All of the available alternatives are annoying in roughly the same
way, though perhaps to varying degrees. All except one: I'm okay with
just not adding coverage for deduplication records, for the time being
-- just seeing the number of intervals alone is relatively informative
with deduplication records, unlike (say) nbtree delete records. I'm
also okay with having coverage for dedup records if you feel it's
worth having. Your call.

If we're going to have coverage for deduplication records then it
seems to me that we have to have a struct in nbtxlog.h for your code
to work off of. It also seems likely that we'll want to use that same
struct within nbtxlog.c. What's less clear is what that means for the
BTDedupInterval struct. I don't think that we should include nbtxlog.h
in nbtree.h, nor should we do the converse.

I guess maybe two identical structs would be okay. BTDedupInterval,
and xl_btree_dedup_interval, with the former still used in nbtdedup.c,
and the latter used through a pointer at the point that nbtxlog.c
reads a dedup record. Then maybe at a sizeof() static assert beside
the existing btree_xlog_dedup() assertions that check that the dedup
state interval array matches the array taken from the WAL record.
That's still a bit weird, but I find it preferable to any alternative
that I can think of.

> On another note, I've thought about how to include some example output
> in docs, and, for example we could modify the example output in the
> pgwalinspect docs which includes a PRUNE record already for
> pg_get_wal_record_info() docs. We'd probably just want to keep it short.

Yeah. Perhaps a PRUNE record for one of the system catalogs whose
relfilenode is relatively recognizable. Say pg_class. It probably
doesn't matter that much, but there is perhaps some small value in
picking an example that is relatively easy to recreate later on (or to
approximately recreate). I'm certainly not insisting on that, though.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: Non-superuser subscription owners
Next
From: gkokolatos@pm.me
Date:
Subject: Re: Add LZ4 compression in pg_dump