Re: [HACKERS] [PATCH] Fix minor race in commit_ts SLRU truncation vslookups - Mailing list pgsql-hackers

From Petr Jelinek
Subject Re: [HACKERS] [PATCH] Fix minor race in commit_ts SLRU truncation vslookups
Date
Msg-id 979ff13d-0b8e-4937-01e8-2925c0adc306@2ndquadrant.com
Whole thread Raw
In response to [HACKERS] [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: [HACKERS] [PATCH] Fix minor race in commit_ts SLRU truncation vs lookups  (Craig Ringer <craig@2ndquadrant.com>)
List pgsql-hackers
On 28/12/16 15:01, Craig Ringer wrote:
> Hi all
> 
> There's a minor race between commit_ts SLRU truncation and concurrent
> commit_ts lookups, where a lookup can check the lower valid bound xid
> without knowing it's already been truncated away. This would result in
> a SLRU lookup error.
> 
> It's pretty low-harm since it's hard to trigger and the only problem
> is an error being emitted when we should otherwise return null/zero.
> Most notably you have to pass an xid that used to be within the
> datrozenxid but due to a concurrent vacuum has just moved outside it.
> This can't happen if you're passing the xmin of a tuple that still
> exists so it only matters for callers passing arbitrary XIDs in via
> pg_xact_commit_timestamp(...).
> 
> The race window is bigger on standby because there we don't find out
> about the advance of the lower commit ts bound until the next
> checkpoint. But you still have to be looking at very old xids that
> don't exist on the heap anymore.
> 
> We might as well fix it in HEAD, but it's totally pointless to
> back-patch, and the only part of the race that can be realistically
> hit is on standby, where we can't backpatch a fix w/o changing the
> xlog format. Nope. We could narrow the scope by limiting commit_ts
> slru truncation to just before a checkpoint, but given how hard this
> is to hit... I don't care.
> 
> (This came up as part of the investigation I've been doing on the
> txid_status thread, where Robert pointed out a similar problem that
> can arise where txid_status races with clog truncation. I noticed the
> issue with standby while looking into that.)
> 

Hi,

I remember thinking this might affect committs when I was reading that
thread but didn't have time to investigate it yet myself. Thanks for
doing the all the work yourself.

About the patch, it looks good to me for master with the minor exception
that:
> +        appendStringInfo(buf, "pageno %d, xid %u",
> +            trunc.pageno, trunc.oldestXid);

This should probably say oldestXid instead of xid in the text description.

About back-patching, I wonder if standby could be modified to move
oldestXid based on pageno reverse calculation, but it's going to be
slightly ugly.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: [HACKERS] proposal: session server side variables
Next
From: Peter Eisentraut
Date:
Subject: [HACKERS] make more use of RoleSpec struct