Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold < - Mailing list pgsql-hackers

From Kevin Grittner
Subject Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
Date
Msg-id CACjxUsMAFcCC8gWttOfHF=-1xC98fXehThwykvsJT9Jh2=GULg@mail.gmail.com
Whole thread Raw
In response to Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Re: [COMMITTERS] pgsql: Avoid extra locks in GetSnapshotData if old_snapshot_threshold <
List pgsql-hackers
On Tue, May 24, 2016 at 4:10 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, May 24, 2016 at 3:48 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
>> On Tue, May 24, 2016 at 12:00 PM, Andres Freund <andres@anarazel.de> wrote:
>>> On 2016-05-24 11:24:44 -0500, Kevin Grittner wrote:
>>>> On Fri, May 6, 2016 at 8:28 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
>>>>> On Fri, May 6, 2016 at 7:48 PM, Andres Freund <andres@anarazel.de> wrote:
>>>>
>>>>>> That comment reminds me of a question I had: Did you consider the effect
>>>>>> of this patch on analyze? It uses a snapshot, and by memory you've not
>>>>>> built in a defense against analyze being cancelled.
>>
>> The primary defense is not considering a cancellation except in 30
>> of the 500 places a page is used.  None of those 30 are, as far as
>> I can see (upon review in response to your question), used in the
>> analyze process.
>
> It's not obvious to me how this is supposed to work.  If ANALYZE's
> snapshot is subject to being ignored for xmin purposes because of
> snapshot_too_old, then I would think it would need to consider
> cancelling itself if it reads a page with possibly-removed data, just
> like in any other case.  It seems that we might not actually need a
> snapshot set for ANALYZE in all cases, because the comments say:
>
>                     /* functions in indexes may want a snapshot set */
>                     PushActiveSnapshot(GetTransactionSnapshot());
>
> If we can get away with it, it would be a rather large win to only set
> a snapshot when the table has an expression index.  For purposes of
> "snapshot too old", though, it will be important that a function in an
> index which tries to read data from some other table which has been
> pruned cancels itself when necessary.

I have reviewed the code and run tests to try to find something
here which could be considered a bug, without finding any problem.
When reading pages for the random sample for ANALYZE (or
auto-analyze) there is not an age check; so ANALYZE completes
without error, keeping statistics up-to-date.

There really is no difference in behavior except in the case that:

(1)  old_snapshot_threshold >= 0 to enable the "snapshot too old"
       feature, and
(2)  there were tuples that were dead as the result of completed
       transactions, and
(3)  those tuples became older than the threshold, and
(4)  those tuples were pruned or vacuumed away, and
(5)  an ANALYZE process would have read those dead tuples had they
       not been removed.

In such a case the irrevocably dead, permanently removed tuples are
not counted in the statistics.  I have trouble seeing a better
outcome than that.  Among my tests, I specifically checked for an
ANALYZE of a table having an index on an expression, using an old
snapshot:

-- connection 1
drop table if exists t1;
create table t1 (c1 int not null);
drop table if exists t2;
create table t2 (c1 int not null);
insert into t1 select generate_series(1, 10000);
drop function mysq(i int);
create function mysq(i int)
  returns int
  language plpgsql
  immutable
as $mysq$
begin
  return (i * i);
end
$mysq$;
create index t1_c1sq on t1 ((mysq(c1)));
begin transaction isolation level repeatable read;
select 1;

-- connection 2
vacuum analyze verbose t1;
delete from t1 where c1 between 1000 and 1999;
delete from t1 where c1 = 8000;
insert into t2 values (1);
select pg_sleep_for('2min');
vacuum verbose t1;  -- repeat if necessary to see the dead rows
disappear

-- connection 1
analyze verbose t1;

This runs to completion, as I would want and expect.

I am closing this item on the "PostgreSQL 9.6 Open Items" page.  If
anyone feels that I've missed something, please provide a test to
show the problem, or a clear description of the problem and how you
feel behavior should be different.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Confusing recovery message when target not hit
Next
From: David Steele
Date:
Subject: Re: Confusing recovery message when target not hit