Re: [HACKERS] Re: pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold < - Mailing list pgsql-committers

From Kevin Grittner
Subject Re: [HACKERS] Re: pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
Date
Msg-id CACjxUsORHHKuuJ5XoQ+eapdbVD+Pt0yuddEosh44EBPfH2bqDg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Re: pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-committers
On Wed, Jun 15, 2016 at 1:29 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jun 15, 2016 at 1:44 PM, Kevin Grittner <kgrittn@gmail.com>
> wrote:>> So what happens in this scenario:
>>> 1. ANALYZE runs really slowly - maybe the user-defined function it's
>>> running for the expression index is extremely long-running.
>>> 2. Eventually, the snapshot for ANALYZE is older than the configured
>>> value of snapshot_too_old.
>>> 3. Then, ANALYZE selects a page with an LSN new enough that it might
>>> have been pruned.
>>>
>>> Presumably, the ANALYZE ought to error out in this scenario, just as
>>> it would in any other situation where an old snapshot sees a new page.
>>> No?
>>
>> The test I showed creates a situation which (to ANALYZE) is
>> identical to what you describe -- ANALYZE sees a page with an LSN
>> recent enough that it could have been (and actually has been)
>> pruned.  Why would it be better for the ANALYZE to fail than to
>> complete?
>
> As I understand it, the reason we need to sometimes give "ERROR:
> snapshot too old" after early pruning is because we might otherwise
> give the wrong answer.
>
> Maybe I'm confused.

In the scenario that you describe, ANALYZE is coming up with
statistics to use in query planning, and the numbers are not
expected to always be 100% accurate.  I can think of conditions
which might prevail when seeing the recent LSN.

(1)  The recent LSN is from a cause having nothing to do with the
STO feature, like DML.  As things stand, the behavior is the same
as without the patch -- the rows are counted just the same as
always.  If we did as you suggest, we instead would abort ANALYZE
and have stale statistics.

(2)  The recent LSN is related to STO pruning.  The dead rows are
gone forever, and will not be counted.  This seems more correct
than counting them, even if it were possible, and also superior to
aborting the ANALYZE and leaving stale statistics.

Of course, a subset of (1) is the case that the rows can be
early-pruned at the next opportunity.  In such a case ANALYZE is
still counting them according to the rules that we had before the
snapshot too old feature.  If someone wants to argue that those
rules are wrong, that seems like material for a separate patch.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-committers by date:

Previous
From: Robert Haas
Date:
Subject: pgsql: Fix lazy_scan_heap so that it won't mark pages all-frozen too so
Next
From: Robert Haas
Date:
Subject: pgsql: Add integrity-checking functions to pg_visibility.