Re: Question about behavior of snapshot too old feature - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Question about behavior of snapshot too old feature
Msg-id CAD21AoD2r6HRg5Go4NGeo3mcERRbW2b+_bL35dzqC5k05VLp8A@mail.gmail.com
In response to Re: Question about behavior of snapshot too old feature  (Kevin Grittner <kgrittn@gmail.com>)
List pgsql-hackers
On Mon, Oct 17, 2016 at 10:04 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> On Sun, Oct 16, 2016 at 9:26 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
>> When I set old_snapshot_threshold = 0 I got the error at step #3, which
>> means that the error occurred without table pruning.
>
> The "snapshot too old" error can happen without pruning, but only
> because there is no way to tell the difference between a page that
> has been pruned since the snapshot was taken and a page which has
> had some other kind of modification since the snapshot was taken.
>
> Ignoring false positives for a moment (where the page is updated by
> something other than pruning), what is required for early pruning
> is that the snapshot has expired (which due to "rounding" and
> avoidance of locking could easily take up to a minute or two more
> than the old_snapshot_threshold setting) and then there is page
> pruning due to a vacuum or just HOT pruning from a page read.  At
> some point after that, a read which is part of returning data to
> the user (e.g., not just positioning for index modification) can
> see that the snapshot is too old and that the LSN for the page is
> past the snapshot LSN.  That is when you get the error.
>
>> We have a regression test for this feature, but it sets
>> old_snapshot_threshold = 0, so I doubt we can test it properly.
>> Am I missing something?
>
> This is a hard feature to test properly, and certainly hard to test
> without the test running for a long time.  The zero setting is
> really not intended to be used in production, but only to allow
> some half-way decent testing that doesn't take extreme lengths of
> time.  If you add some delays of a few minutes each at key points
> in a test, you should be able to get a test that works with a
> setting of 1min.  It is not impossible that we might need to add a
> memory barrier to one or two places to get such tests to behave
> consistently, but I have not been able to spot where, if anywhere,
> that would be.


Thank you for the explanation! I understand now.
When old_snapshot_threshold = 0, the server skips allocating the shared
memory area for the xid array and also skips some of the logic in order
to avoid using that shared memory, so I was a little concerned about that.
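
To check my understanding of the conditions described in the quoted
explanation, here is a minimal sketch in C. It is only a simplified model,
not the actual PostgreSQL code: the type ModelSnapshot and the functions
snapshot_expired() and read_raises_snapshot_too_old() are hypothetical
names, time is plain seconds, and the "rounding" and locking avoidance
mentioned above are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simplified types; these are NOT PostgreSQL's real structs. */
typedef uint64_t Lsn;

typedef struct
{
    Lsn     lsn;           /* WAL position when the snapshot was taken */
    int64_t taken_at_sec;  /* time the snapshot was taken, in seconds */
} ModelSnapshot;

/*
 * A snapshot counts as expired once it is at least threshold_min minutes
 * old. A negative threshold models the feature being disabled. The real
 * feature may take a minute or two longer because of rounding; this
 * model assumes exact clocks.
 */
static bool
snapshot_expired(const ModelSnapshot *snap, int64_t now_sec, int threshold_min)
{
    if (threshold_min < 0)
        return false;
    return now_sec - snap->taken_at_sec >= (int64_t) threshold_min * 60;
}

/*
 * A user-visible read raises "snapshot too old" when the page LSN is past
 * the snapshot LSN and the snapshot has expired. The check cannot tell
 * pruning apart from any other page modification, which is where false
 * positives come from.
 */
static bool
read_raises_snapshot_too_old(const ModelSnapshot *snap, Lsn page_lsn,
                             int64_t now_sec, int threshold_min)
{
    if (page_lsn <= snap->lsn)
        return false;      /* page unchanged since the snapshot was taken */
    return snapshot_expired(snap, now_sec, threshold_min);
}
```

In this model a threshold of 0 makes every snapshot expire immediately, so
any later modification of the page is enough to raise the error on the next
read, even if nothing was pruned. That matches what I saw at step #3.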

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


