On Wed, Jun 15, 2016 at 10:46 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Jun 11, 2016 at 11:29 AM, Kevin Grittner <kgrittn@gmail.com> wrote:
>> I have reviewed the code and run tests to try to find something
>> here which could be considered a bug, without finding any problem.
>> When reading pages for the random sample for ANALYZE (or
>> auto-analyze) there is not an age check; so ANALYZE completes
>> without error, keeping statistics up-to-date.
>>
>> There really is no difference in behavior except in the case that:
>>
>> (1) old_snapshot_threshold >= 0 to enable the "snapshot too old"
>> feature, and
>> (2) there were tuples that were dead as the result of completed
>> transactions, and
>> (3) those tuples became older than the threshold, and
>> (4) those tuples were pruned or vacuumed away, and
>> (5) an ANALYZE process would have read those dead tuples had they
>> not been removed.
>>
>> In such a case the irrevocably dead, permanently removed tuples are
>> not counted in the statistics. I have trouble seeing a better
>> outcome than that. Among my tests, I specifically checked for an
>> ANALYZE of a table having an index on an expression, using an old
>> snapshot:
>>
>> -- connection 1
>> drop table if exists t1;
>> create table t1 (c1 int not null);
>> drop table if exists t2;
>> create table t2 (c1 int not null);
>> insert into t1 select generate_series(1, 10000);
>> drop function mysq(i int);
>> create function mysq(i int)
>> returns int
>> language plpgsql
>> immutable
>> as $mysq$
>> begin
>> return (i * i);
>> end
>> $mysq$;
>> create index t1_c1sq on t1 ((mysq(c1)));
>> begin transaction isolation level repeatable read;
>> select 1;
>>
>> -- connection 2
>> vacuum analyze verbose t1;
>> delete from t1 where c1 between 1000 and 1999;
>> delete from t1 where c1 = 8000;
>> insert into t2 values (1);
>> select pg_sleep_for('2min');
>> vacuum verbose t1; -- repeat if necessary to see the dead rows
>> disappear
>>
>> -- connection 1
>> analyze verbose t1;
>>
>> This runs to completion, as I would want and expect.
>>
>> I am closing this item on the "PostgreSQL 9.6 Open Items" page. If
>> anyone feels that I've missed something, please provide a test to
>> show the problem, or a clear description of the problem and how you
>> feel behavior should be different.
>
> So what happens in this scenario:
>
> 1. ANALYZE runs really slowly - maybe the user-defined function it's
> running for the expression index is extremely long-running.
> 2. Eventually, the snapshot for ANALYZE is older than the configured
> value of snapshot_too_old.
> 3. Then, ANALYZE selects a page with an LSN new enough that it might
> have been pruned.
>
> Presumably, the ANALYZE ought to error out in this scenario, just as
> it would in any other situation where an old snapshot sees a new page.
> No?
The test I showed creates a situation which (to ANALYZE) is
identical to what you describe -- ANALYZE sees a page with an LSN
recent enough that it could have been (and actually has been)
pruned. Why would it be better for the ANALYZE to fail than to
complete?
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company