BRIN FSM vacuuming questions - Mailing list pgsql-hackers

From Tom Lane
Subject BRIN FSM vacuuming questions
Date
Msg-id 5801.1522429460@sss.pgh.pa.us
Whole thread Raw
Responses Re: BRIN FSM vacuuming questions
List pgsql-hackers
I noticed that several places in brin_pageops.c do this sort of thing:

        if (extended)
        {
            Assert(BlockNumberIsValid(newblk));
            RecordPageWithFreeSpace(idxrel, newblk, freespace);
            FreeSpaceMapVacuum(idxrel);
        }

ie they put *one* page into the index's free space map and then invoke a
scan of the index's entire FSM to ensure that that info is bubbled up to
the top immediately.  I wonder how carefully this was thought through.
We don't generally think that it's worth doing an FSM vacuum for a single
page worth of free space.

We could make these operations somewhat cheaper, in the wake of 851a26e26,
by doing

-           FreeSpaceMapVacuum(idxrel);
+           FreeSpaceMapVacuumRange(idxrel, newblk, newblk + 1);

so that only the relevant subtree of the FSM gets scanned.  But it seems
at least as plausible to just drop the FreeSpaceMapVacuum calls entirely
and let the next VACUUM fix up the map, like the entire rest of the
system does.  This is particularly true because the code is inconsistent:
some places that do RecordPageWithFreeSpace follow it up with an immediate
FreeSpaceMapVacuum, and some don't.

Or, perhaps, there's actually some method behind this madness and there's
good reason to think that FreeSpaceMapVacuum calls in these specific
places are worth the cost?  If so, a comment documenting the reasoning
would be appropriate.

I'm also finding this code in brin_page_cleanup a bit implausible:

    /* Measure free space and record it */
    freespace = br_page_get_freespace(page);
    if (freespace > GetRecordedFreeSpace(idxrel, BufferGetBlockNumber(buf)))
    {
        RecordPageWithFreeSpace(idxrel, BufferGetBlockNumber(buf), freespace);
        return true;
    }

because of the fact that GetRecordedFreeSpace will report a rounded-down
result owing to the limited resolution of the FSM's numbers.  If we
assume that the result of br_page_get_freespace is always maxaligned,
then by my math there's a 75% chance (on maxalign-8 machines; more if
maxalign is 4) that this will think there's a discrepancy even when
there is none.  It seems questionable to me that it's worth checking
GetRecordedFreeSpace at all.  If it really is, we'd need to get the
recorded free space, invoke RecordPageWithFreeSpace, and then see if
the result of GetRecordedFreeSpace has changed in order to decide whether
anything really changed.  But that seems overly complicated, and again
unlike anything done elsewhere.  Also, the result value is used to
decide whether FreeSpaceMapVacuum is needed, but I'm unconvinced that
it's a good idea to skip said call just because the leaf FSM values did
not change; that loses the opportunity to clean up any upper-level FSM
errors that may exist.

In short, I'd suggest making this RecordPageWithFreeSpace call
unconditional, dropping the result value of brin_page_cleanup,
and doing FreeSpaceMapVacuum unconditionally too, back in
brin_vacuum_scan.  I don't see a reason for brin to be unlike
the rest of the system here.

Lastly, I notice that brin_vacuum_scan's loop is coded like

    for (blkno = 0; blkno < RelationGetNumberOfBlocks(idxrel); blkno++)
    {
        ...
    }

which means we're incurring an lseek kernel call for *each iteration*
of the loop.  Surely that's pretty inefficient.  Is there an intention
here to be sure we examine every page even in the face of concurrent
extensions?  If so, we could code it like this:

    nblocks = RelationGetNumberOfBlocks(idxrel);
    for (blkno = 0; blkno < nblocks; blkno++)
    {
        ...
        if (blkno == nblocks-1)
            nblocks = RelationGetNumberOfBlocks(idxrel);
    }

but I'd just as soon skip the recheck unless there's a really strong
reason for it.  Even with that, there's no guarantee somebody hasn't added
another page before we ever get out of the function, so I'm unclear
on what the point is.

            regards, tom lane


pgsql-hackers by date:

Previous
From: Jeremy Finzel
Date:
Subject: Re: Feature Request - DDL deployment with logical replication
Next
From: Konstantin Knizhnik
Date:
Subject: Re: JIT compiling with LLVM v12.2