Re: snapshot too old, configured by time - Mailing list pgsql-hackers

From Steve Singer
Subject Re: snapshot too old, configured by time
Date
Msg-id BLU437-SMTP193E7367B788D2EC2B2B81DC160@phx.gbl
Whole thread Raw
In response to Re: snapshot too old, configured by time  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: snapshot too old, configured by time  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers
On 10/15/2015 05:47 PM, Kevin Grittner wrote:
> All other issues raised by Álvaro and Steve have been addressed,
> except for this one, which I will argue against:

I've been looking through the updated patch


In snapmgr.c


+ * XXX: If we can trust a read of an int64 value to be atomic, we can
skip the
+ * spinlock here.
+ */
+int64
+GetOldSnapshotThresholdTimestamp(void)


Was your intent with the XXX for it to be a TODO to only aquire the lock
on platforms without the atomic 64bit operations?

On a more general note:

I've tried various manual tests of this feature and it sometimes works
as expected and sometimes doesn't.
I'm getting the feeling that how I expect it to work isn't quite in sync
with how it does work.

I'd expect the following to be sufficient to generate the test

T1: Obtains a snapshot that can see some rows
T2: Waits 60 seconds and performs an update on those rows
T2: Performs a vacuum
T1: Waits 60 seconds, tries to select from the table.  The snapshot
should be too old


For example it seems that in test 002 the select_ok on conn1 following
the vacuum but right before the final sleep is critical to the snapshot
too old error showing up (ie if I remove that select_ok but leave in the
sleep I don't get the error)

Is this intended and if so is there a better way we can explain how
things work?


Also is 0 intended to be an acceptable value for old_snapshot_threshold
and if so what should we expect to see then?




>> So if I understand correctly, every call to BufferGetPage needs to have
>> a TestForOldSnapshot immediately afterwards?  It seems easy to later
>> introduce places that fail to test for old snapshots.  What happens if
>> they do?  Does vacuum remove tuples anyway and then the query returns
>> wrong results?  That seems pretty dangerous.  Maybe the snapshot could
>> be an argument of BufferGetPage?
> There are 486 occurences of BufferGetPage in the source code, and
> this patch follows 36 of them with TestForOldSnapshot.  This only
> needs to be done when a page is going to be used for a scan which
> produces user-visible results.  That is, it is *not* needed for
> positioning within indexes to add or vacuum away entries, for heap
> inserts or deletions (assuming the row to be deleted has already
> been found).  It seems wrong to modify about 450 BufferGetPage
> references to add a NULL parameter; and if we do want to make that
> noop change as insurance, it seems like it should be a separate
> patch, since the substance of this patch would be buried under the
> volume of that.
>
> I will add this to the November CF.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>




pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: PATCH: 9.5 replication origins fix for logical decoding
Next
From: Michael Paquier
Date:
Subject: Re: eXtensible Transaction Manager API