VACUUM can finish an interrupted nbtree page split -- is that okay? - Mailing list pgsql-hackers

From Peter Geoghegan
Subject VACUUM can finish an interrupted nbtree page split -- is that okay?
Date
Msg-id CAH2-Wz=_Xvv8byzK_LvY4ci76OgsHCQzoKF7We8yG9waO7j6rA@mail.gmail.com
Whole thread Raw
Responses Re: VACUUM can finish an interrupted nbtree page split -- is that okay?
Re: VACUUM can finish an interrupted nbtree page split -- is that okay?
List pgsql-hackers
_bt_lock_branch_parent() is used by VACUUM during page deletion, and
calls _bt_getstackbuf(), which always finishes incomplete page splits
for the parent page that it exclusive locks and returns. ISTM that
this may be problematic, since it contradicts the general rule that
VACUUM isn't supposed to finish incomplete page splits. According to
the nbtree README:

"It would seem natural to add the missing downlinks in VACUUM, but since
inserting a downlink might require splitting a page, it might fail if you
run out of disk space.  That would be bad during VACUUM - the reason for
running VACUUM in the first place might be that you run out of disk space,
and now VACUUM won't finish because you're out of disk space.  In contrast,
an insertion can require enlarging the physical file anyway."

I'm inclined to note this as an exception in the nbtree README, and
leave it at that. Interrupted internal page splits are probably very
rare in practice, so the operational risk of running out of disk space
like this is minimal.

FWIW, I notice that the logic that appears after the
_bt_lock_branch_parent() call to _bt_getstackbuf() anticipates that it
must defend against interrupted splits in at least the
grandparent-of-leaf page, and maybe even the parent, so it's probably
not unworkable to not finish the split:

    /*
     * If the target is the rightmost child of its parent, then we can't
     * delete, unless it's also the only child.
     */
    if (poffset >= maxoff)
    {
        /* It's rightmost child... */
        if (poffset == P_FIRSTDATAKEY(opaque))
        {
            /*
             * It's only child, so safe if parent would itself be removable.
             * We have to check the parent itself, and then recurse to test
             * the conditions at the parent's parent.
             */
            if (P_RIGHTMOST(opaque) || P_ISROOT(opaque) ||
                P_INCOMPLETE_SPLIT(opaque))
            {
                _bt_relbuf(rel, pbuf);
                return false;
            }

Separately, I noticed another minor issue that appears a few lines
further down still:

            /*
             * Perform the same check on this internal level that
             * _bt_mark_page_halfdead performed on the leaf level.
             */
            if (_bt_is_page_halfdead(rel, *rightsib))
            {
                elog(DEBUG1, "could not delete page %u because its
right sibling %u is half-dead",
                     parent, *rightsib);
                return false;
            }

I thought that internal pages were never half-dead after Postgres 9.4.
If that happens, then the check within _bt_pagedel() will throw an
ERRCODE_INDEX_CORRUPTED error, and tell the DBA to REINDEX. Shouldn't
this internal level _bt_is_page_halfdead() check contain a "can't
happen" error, or even a simple assertion?

-- 
Peter Geoghegan


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Incomplete startup packet errors
Next
From: David Rowley
Date:
Subject: Re: NOT IN subquery optimization