On Fri, May 8, 2015 at 6:00 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Robert Haas wrote:
>
>> It seems to me that the most obvious places where
>> DetermineSafeOldestOffset() should be called are (1) at startup or
>> after recovery, to initialize the value; and (2) each time we truncate
>> the SLRU, to update the value. Other than that, this doesn't change.
>> The startup calls are there, in apparently reasonable places, but it's
>> not obvious to me how this gets called in the TruncateMultiXact path.
>> Instead it seems to get set via the SetMultiXactIdLimit path. Maybe
>> that's OK, but it would seem to imply that we're OK with overwriting
>> old members information if that information was slated for truncation
>> at the next checkpoint anyway, which seems scary.
>
> I considered this question in the previous commit: is it okay to
> overwrite a file that is no longer used (per the limits set by vacuum)
> but not yet removed (by checkpoint)? It seems to me that there is no
> data-loss issue with doing that -- which is why the advance-oldest code
> is called during vacuum and not during checkpoint.
Given three concurrent backends that are checkpointing, vacuuming and
creating multixacts, isn't there some ordering of events that could
cause SlruScanDirCbRemoveMembers to delete a segment file that it
judges to be the tail segment, but which has become the new head
segment just before the file gets unlinked? The comment in
SlrScanDirCbRemoveMembers says "... and we never increase rangeStart
during any one run", but I think that a concurrent vacuum could
advance the oldest multixact and therefore the oldest offset (which is
what range->rangeStart was derived from), and if you're extra unlucky,
a third backend that is creating multixacts could then advance
nextOffset so that it points to a page *past* range->rangeStart, and
we'd start blowing away the wrong data.
I may have the details wrong there, but in general I can't see how you
can delete just the right segment files while the head and tail
pointers of this circular buffer are both moving, unless you hold the
lock to stop that while you make the decision and unlink the file
(which I assume is out of the question), or say 'well if we keep the
head and tail pointers at least 20 pages apart, that's unlikely to
happen' (which I assume is also out of the question). Now I
understand the suggestion that the checkpoint code could be in charge
of advancing the oldest multixact + offset.
But if we did that, our new autovacuum code would get confused and
keep trying to vacuum stuff that it's already dealt with, until the
next checkpoint... so maybe we'd need to track both (1) the offset of
the oldest multixact: the one that we use to trigger autovacuums, even
though there might be even older segments still on disk, and (2)
offset_of_start_of_oldest_segment_on_disk (suitably compressed): the
one that we use to block wraparound in GetNewMultiXactId, so that we
never write into a file that checkpoint might decide to delete.
> Also, if we have extra calls in other places after Thomas' patch,
> perhaps this one is not necessary.
No extra calls added.
--
Thomas Munro
http://www.enterprisedb.com