Re: [HACKERS] Re: pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold < - Mailing list pgsql-committers

From Kevin Grittner
Subject Re: [HACKERS] Re: pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
Date
Msg-id CACjxUsN7VNe_q0xGA7xfoTM_yqQXpqOTjhk4CQ479fQthhEFfg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Re: pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <  (Andres Freund <andres@anarazel.de>)
Responses Re: [HACKERS] Re: pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Re: [HACKERS] Re: pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <  (Andres Freund <andres@anarazel.de>)
Re: [HACKERS] Re: pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-committers
On Tue, May 24, 2016 at 12:00 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-05-24 11:24:44 -0500, Kevin Grittner wrote:
>> On Fri, May 6, 2016 at 8:28 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
>>> On Fri, May 6, 2016 at 7:48 PM, Andres Freund <andres@anarazel.de> wrote:
>>
>>>> That comment reminds me of a question I had: Did you consider the effect
>>>> of this patch on analyze? It uses a snapshot, and by memory you've not
>>>> built in a defense against analyze being cancelled.

The primary defense is not considering a cancellation except in 30
of the 500 places a page is used.  None of those 30 are, as far as
I can see (upon review in response to your question), used in the
analyze process.

>> With a 1min threshold, after loading a table "v" with a million
>> rows, beginning a repeatable read transaction on a different
>> connection and opening a cursor against that table, deleting almost
>> all rows on the original connection, and waiting a few minutes I
>> see this in the open transaction:
>>
>> test=# analyze verbose v;
>> INFO:  analyzing "public.v"
>> INFO:  "v": scanned 4425 of 4425 pages, containing 1999 live rows and
>> 0 dead rows; 1999 rows in sample, 1999 estimated total rows
>> ANALYZE
>> test=# select count(*) from v;
>> ERROR:  snapshot too old
>>
>> Meanwhile, no errors appeared in the log from autovacuum.
>
> I'd guess that that problem could only be reproduced if autoanalyze
> takes longer than the timeout, and there's rows pruned after it has
> started?  Analyze IIRC acquires a new snapshot when getting sample rows,
> so it'll not necessarily trigger in the above scenario, right?

Per Tom's recollection and my review of the code, the transaction
snapshot would be used in the test I show above, and the
combination of the verbose output the subsequent query show clearly
that if one of the page references capable of throwing the error
were encountered with this snapshot, the error would be thrown.  So
at least in this ANALYZE run, there is empirical support for what I
found in reviewing the code -- none of the places where we check
for an old snapshot are exercised during an ANALYZE.

> Is there anything preventing this from becoming a problem?

The fundamental approach that the error can only appear on
user-facing scans, not internal reads and positioning.

Unless there is some code path that uses a scan where the snapshot
age is checked, this cannot be a problem.  I don't see any such
path, but if you do, please let me know, and I'll fix things.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-committers by date:

Previous
From: Tom Lane
Date:
Subject: pgsql: Fetch XIDs atomically during vac_truncate_clog().
Next
From: Alvaro Herrera
Date:
Subject: Re: [HACKERS] Re: pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <