Re: Race condition in b-tree page deletion - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Race condition in b-tree page deletion
Date
Msg-id 52E66E84.2050109@vmware.com
Whole thread Raw
In response to Re: Race condition in b-tree page deletion  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: Race condition in b-tree page deletion
List pgsql-hackers
Thanks for the review!

On 01/18/2014 09:45 PM, Kevin Grittner wrote:
> The patch still applies with a few offsets.  It compiles with these
> warnings (which were also reported by Peter Eisentraut and should
> be fixed):
>
> nbtpage.c: In function ‘_bt_pagedel’:
> nbtpage.c:1695:24: warning: ‘metad’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> nbtpage.c:1414:18: note: ‘metad’ was declared here
> nbtpage.c:1730:4: warning: ‘metapg’ may be used uninitialized in this function [-Wmaybe-uninitialized]
> nbtpage.c:1413:8: note: ‘metapg’ was declared here

Fixed.

> I'm not a fan of the new retry label introduced in _bt_pagedel()
> and the goto statement which references it.  I think a loop using
> while() or for() would be cleaner.  The whole idea of describing
> this logic recursively and then making a big deal about using tail
> recursion as an optimization seems pointless and confusing.  It
> could just as easily be described as an iterative approach and save
> the extra logical step in explaining the code.  I think that would
> be clearer and easier to understand.  It would also eliminate the
> last parameter, which is always passed as NULL and only set to
> something else internally.  I think it would be cleaner to make
> that a local variable initialized to NULL as part of eliminating
> the fiction of recursion here.

Ok, changed it to a loop, and moved the responsibility to construct the
'stack' into _bt_pagedel().

> There is a point in this loop where we return 0, but I think that
> is wrong; I think we should break so that the pages already deleted
> will be counted.

Fixed.

> On a related note. the caller doesn't actually
> use the count, but assumes that any non-zero value should be
> treated as 1.

Hmm. The caller does this:

>         ndel = _bt_pagedel(rel, buf);
>
>         /* count only this page, else may double-count parent */
>         if (ndel)
>             stats->pages_deleted++;

The problem here is that btvacuumpage() might visit the parent page
again, and count it again. The code is erring on the side of
undercounting; I'm not sure which is better.

> Similar issues with faux recursion existing the calling function,
> btvacuumpage().  In that case the unnecessary parameter which the
> caller must get right is orig_blkno, which must be the same as
> blkno on an actual call.  That could be a local variable
> initialized to blkno within the function.  I think this would be
> better described as iteration directly rather than claiming
> recursions and whining about how the compiler is too dumb to turn
> it into iteration for us.  It's not the job of this patch to fix
> that in the caller, but let's not emulate it.

Agreed.

> Other than that, I have not found any problems.

While fixing the above issues, and reviewing this again in general, I
noticed that btree_xlog_unlink_page() was a few bricks shy of a load.
When unlinking an internal page, it didn't update the pointer to the new
topmost page in the to-be-deleted chain. Fixed that.

Attached is a new version. I also re-worded the README changes; I hope
it reads better now.

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: Retain dynamic shared memory segments for postmaster lifetime
Next
From: Stephen Frost
Date:
Subject: Re: Custom Scan APIs (Re: Custom Plan node)