Re: B-tree parent pointer and checkpoints - Mailing list pgsql-hackers

From Tom Lane
Subject Re: B-tree parent pointer and checkpoints
Msg-id 12375.1289429390@sss.pgh.pa.us
In response to Re: B-tree parent pointer and checkpoints  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I wrote:
> What happens if you error out in between?  Or is it assumed that the
> *entire* sequence is a critical section?  If it has to be that way,
> one might wonder what's the point of trying to split it into multiple
> WAL records.

Or, to be more concrete: I'm wondering if this *entire* mechanism isn't
a bad idea that we should just rip out.

The question that ought to be asked here, I think, is whether we
shouldn't require that every inter-WAL-record state be a valid,
consistent state that needs no post-crash fixups.  If that
isn't the case, then a simple ERROR or FATAL exit out of the backend
that was creating the sequence originally will leave the system in
an unacceptable state.  We could prevent such an exit by wrapping the
whole sequence in a critical section, but if you have to do that then
it's not apparent why you shouldn't fold it into one WAL record.

IOW, forget this patch.  Take out the logic that tries to complete
pending splits during replay, instead.  I believe this is perfectly safe
for btree: loss of a parent record isn't fatal, as proven by the fact
that searches don't have to be locked out while a split proceeds.
(We might want to make _bt_pagedel not think that a missing parent
record is an error, but it shouldn't think that anyway, because of the
possibility of a non-crashing failure during the original split.)
This approach might not be safe for GIST or GIN; but if it isn't, they
need fixes anyway.
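
To illustrate why a missing parent record need not be fatal, here is a toy
Python model of a two-step page split.  It is a sketch under stated
assumptions, not PostgreSQL code: the names Leaf, Parent, split_leaf,
insert_downlink and search are all hypothetical, the tree has only two
levels, and the high key is treated as an exclusive upper bound for
simplicity.  Step 1 stands in for the first WAL record (split the page and
set the right-link); step 2 stands in for the second (add the parent's
downlink).  If the backend errors out between the two, searches still reach
every key by following the right-link.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Leaf:
    keys: List[int]
    high_key: Optional[int] = None   # exclusive upper bound; None = rightmost page
    right: Optional["Leaf"] = None   # right-link to the next sibling


@dataclass
class Parent:
    # Downlinks stored as (lower_bound, child), kept sorted by lower_bound.
    downlinks: List[Tuple[int, Leaf]] = field(default_factory=list)


def split_leaf(leaf: Leaf) -> Leaf:
    """Step 1: move the upper half of the keys to a new right sibling."""
    mid = len(leaf.keys) // 2
    new_right = Leaf(keys=leaf.keys[mid:], high_key=leaf.high_key, right=leaf.right)
    leaf.keys = leaf.keys[:mid]
    leaf.high_key = new_right.keys[0]
    leaf.right = new_right
    return new_right


def insert_downlink(parent: Parent, new_right: Leaf) -> None:
    """Step 2: add the parent's pointer to the new page."""
    parent.downlinks.append((new_right.keys[0], new_right))
    parent.downlinks.sort(key=lambda d: d[0])


def search(parent: Parent, key: int) -> bool:
    # Descend through the last downlink whose lower bound is <= key.
    leaf = parent.downlinks[0][1]
    for bound, child in parent.downlinks:
        if bound <= key:
            leaf = child
    # Move right while the key lies at or beyond this page's high key.
    while leaf.high_key is not None and key >= leaf.high_key:
        leaf = leaf.right
    return key in leaf.keys


leaf = Leaf(keys=[10, 20, 30, 40])
parent = Parent(downlinks=[(0, leaf)])

new_right = split_leaf(leaf)
# Suppose the backend hits an ERROR here, so step 2 never runs.
found_without_downlink = all(search(parent, k) for k in (10, 20, 30, 40))

# Completing step 2 later only shortens the search path.
insert_downlink(parent, new_right)
found_with_downlink = all(search(parent, k) for k in (10, 20, 30, 40))
```

In this model the state after step 1 alone is already searchable, which is
the property the argument above relies on: the downlink is an optimization
of the search path rather than something correctness depends on.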
        regards, tom lane

