Re: snapshot too old, configured by time - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: snapshot too old, configured by time
Msg-id CAM3SWZQdSPVsVL_jjjKVQ0kd8OskLPZFzXFrt-DmzSOgFuRtew@mail.gmail.com
In response to Re: snapshot too old, configured by time  (Kevin Grittner <kgrittn@gmail.com>)
Responses Re: snapshot too old, configured by time  (Kevin Grittner <kgrittn@gmail.com>)
List pgsql-hackers
On Wed, Mar 30, 2016 at 12:46 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> On Wed, Mar 30, 2016 at 2:29 PM, Peter Geoghegan <pg@heroku.com> wrote:
>
>> [Does the patch allow dangling page pointers?]
>
>> Again, I don't want to prejudice anyone against your patch, which I
>> haven't read.
>
> I don't believe that the way the patch does its business opens any
> new vulnerabilities of this type.  If you see such after looking at
> the patch, let me know.

Okay, let me be more concrete about this. The patch does this:

> --- a/src/backend/access/heap/pruneheap.c
> +++ b/src/backend/access/heap/pruneheap.c
> @@ -92,12 +92,21 @@ heap_page_prune_opt(Relation relation, Buffer buffer)
>      * need to use the horizon that includes slots, otherwise the data-only
>      * horizon can be used. Note that the toast relation of user defined
>      * relations are *not* considered catalog relations.
> +    *
> +    * It is OK to apply the old snapshot limit before acquiring the cleanup
> +    * lock because the worst that can happen is that we are not quite as
> +    * aggressive about the cleanup (by however many transaction IDs are
> +    * consumed between this point and acquiring the lock).  This allows us to
> +    * save significant overhead in the case where the page is found not to be
> +    * prunable.
>      */
>     if (IsCatalogRelation(relation) ||
>         RelationIsAccessibleInLogicalDecoding(relation))
>         OldestXmin = RecentGlobalXmin;
>     else
> -       OldestXmin = RecentGlobalDataXmin;
> +       OldestXmin =
> +               TransactionIdLimitedForOldSnapshots(RecentGlobalDataXmin,
> +                                                   relation);

This new intermediary function TransactionIdLimitedForOldSnapshots()
is called above to decide what OldestXmin actually ends up being,
based in part on the new GUC old_snapshot_threshold:

> +/*
> + * TransactionIdLimitedForOldSnapshots
> + *
> + * Apply old snapshot limit, if any.  This is intended to be called for page
> + * pruning and table vacuuming, to allow old_snapshot_threshold to override
> + * the normal global xmin value.  Actual testing for snapshot too old will be
> + * based on whether a snapshot timestamp is prior to the threshold timestamp
> + * set in this function.
> + */
> +TransactionId
> +TransactionIdLimitedForOldSnapshots(TransactionId recentXmin,
> +                                   Relation relation)

So OldestXmin may no longer be RecentGlobalDataXmin, as it usually is
today, which is exactly the point of this patch: VACUUM can be more
aggressive in cleaning up bloat, not unlike the non-catalog logical
decoding case, on the theory that we can reliably detect when that
causes failures for old snapshots, and just raise a "snapshot too
old" error. (RecentGlobalDataXmin is morally about the same as
RecentGlobalXmin, as far as this patch goes.)
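
Roughly, I'd expect the shape of that logic to be something like the
sketch below. To be clear, this is my guess at the shape, not the
patch's code: the helpers with a _sketch suffix are invented, and the
exact eligibility checks are assumptions on my part.

/*
 * Sketch only: if old_snapshot_threshold (a GUC, in minutes; -1
 * disables the feature) is enabled and the relation qualifies, look
 * up the newest xid that was current as of (now - threshold) and use
 * it if it is ahead of the horizon we were passed.  The threshold
 * timestamp also has to be remembered somewhere, so that later use
 * of an older snapshot can raise "snapshot too old".
 */
static TransactionId
TransactionIdLimitedForOldSnapshots_sketch(TransactionId recentXmin,
                                           Relation relation)
{
    if (old_snapshot_threshold >= 0 &&
        RelationNeedsWAL(relation) &&       /* assumed eligibility check */
        !IsCatalogRelation(relation))
    {
        TimestampTz threshold_ts = GetCurrentTimestamp() -
            old_snapshot_threshold * USECS_PER_MINUTE;

        /* hypothetical: map the timestamp back to an xid current then */
        TransactionId xlimit = XidAtTimestamp_sketch(threshold_ts);

        /* hypothetical: record the threshold for later "too old" tests */
        RememberThresholdTimestamp_sketch(threshold_ts, xlimit);

        if (TransactionIdIsNormal(xlimit) &&
            TransactionIdFollows(xlimit, recentXmin))
            return xlimit;      /* the more aggressive horizon */
    }
    return recentXmin;          /* the normal horizon */
}

However the timestamp-to-xid mapping is actually done, the important
property is that the returned horizon can only ever be ahead of, never
behind, the one passed in.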

So far, so good. It's okay that _bt_page_recyclable() never got the
memo about any of this:

/*
 *  _bt_page_recyclable() -- Is an existing page recyclable?
 *
 * This exists to make sure _bt_getbuf and btvacuumscan have the same
 * policy about whether a page is safe to re-use.
 */
bool
_bt_page_recyclable(Page page)
{
    BTPageOpaque opaque;

    ...

    /*
     * Otherwise, recycle if deleted and too old to have any processes
     * interested in it.
     */
    opaque = (BTPageOpaque) PageGetSpecialPointer(page);
    if (P_ISDELETED(opaque) &&
        TransactionIdPrecedes(opaque->btpo.xact, RecentGlobalXmin))
        return true;
    return false;
}

...because this patch does nothing to advance RecentGlobalXmin (or
RecentGlobalDataXmin) itself more aggressively. It does give
vacuum_set_xid_limits() a more aggressive cutoff point, but we don't
see that being passed back down to here by lazy vacuum; within
_bt_page_recyclable(), we rely on the more conservative
RecentGlobalXmin, which is not subject to any clever optimization in
the patch.

Fortunately, this seems correct, since index scans will always succeed
in finding a deleted page, per the nbtree README notes on
RecentGlobalXmin. Unfortunately, it also stops recycling from
happening early for B-Tree pages, even though that's probably safe in
principle. This is probably not so bad; it just needs to be
considered when reviewing this patch. (The same is true of logical
decoding's RecentGlobalDataXmin: it doesn't appear in
_bt_page_recyclable() either, and I guess that was never a problem.)
Index relations will not get smaller in some important cases, but they
will be made less bloated by VACUUM in a sense that's still probably
very useful. Maybe that explains some of what Jeff talked about.
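
Just to spell out why the conservative test remains safe, here is a
tiny standalone illustration. This is plain C with made-up numbers,
not Postgres code:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TransactionId;

/* Wraparound-aware comparison, the same idea as TransactionIdPrecedes() */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

int
main(void)
{
    TransactionId recent_global_xmin = 1000; /* unlimited, conservative horizon */
    TransactionId limited_horizon    = 1500; /* pushed forward by the threshold */
    TransactionId deleted_page_xact  = 1200; /* btpo.xact of a deleted page */

    /* Pruning with the limited horizon already treats xid 1200 as dead... */
    printf("dead to limited horizon:   %d\n",
           xid_precedes(deleted_page_xact, limited_horizon));    /* 1 */

    /* ...but _bt_page_recyclable()'s test still says "not yet". */
    printf("recyclable (conservative): %d\n",
           xid_precedes(deleted_page_xact, recent_global_xmin)); /* 0 */
    return 0;
}

Since the limited horizon can only ever be ahead of RecentGlobalXmin,
testing against RecentGlobalXmin is strictly more conservative:
correct, but recycling waits longer than it strictly needs to.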

I think another part of the problems that Jeff mentioned (with
pruning) could be this existing code within heap_hot_search_buffer():
        /*
         * If we can't see it, maybe no one else can either.  At caller
         * request, check whether all chain members are dead to all
         * transactions.
         */
        if (all_dead && *all_dead &&
            !HeapTupleIsSurelyDead(heapTuple, RecentGlobalXmin))
            *all_dead = false;

This is used within routines like btgettuple(), to do the LP_DEAD
thing to kill index tuples (not HOT chain pruning).
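
For anyone following along, the kill mechanism looks roughly like this
(paraphrased and heavily trimmed from nbtree, not verbatim):

/* In btgettuple(): the previously-returned tuple turned out to be
 * dead to everyone (all_dead came back true), so remember its
 * position on the current leaf page... */
if (scan->kill_prior_tuple)
{
    if (so->killedItems == NULL)
        so->killedItems = (int *)
            palloc(MaxIndexTuplesPerPage * sizeof(int));
    if (so->numKilled < MaxIndexTuplesPerPage)
        so->killedItems[so->numKilled++] = so->currPos.itemIndex;
}

/* ...and later, in _bt_killitems(), each remembered item that still
 * matches its heap TID is hinted dead on the leaf page: */
ItemIdMarkDead(iid);                   /* sets LP_DEAD on the line pointer */
opaque->btpo_flags |= BTP_HAS_GARBAGE; /* page has killable items */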

Aside: Not sure offhand why it might be okay, performance-wise, that
this code doesn't care about RecentGlobalDataXmin; pruning was a big
part of why RecentGlobalDataXmin was added for logical decoding, I
thought, although I guess the _bt_killitems() stuff doesn't count as
pruning.

-- 
Peter Geoghegan


