Statistics visibility in SERIALIZABLE transactions

From: Michael Fuhr
Updates to planner statistics appear to be visible in SERIALIZABLE
transactions even though updated data is not.  Is this intentional?
Could that adversely affect query plans?

CREATE TABLE test (x integer);
INSERT INTO test (x) SELECT i % 2 FROM generate_series(1, 100) AS g(i);
ANALYZE test;

Transaction 1:
BEGIN ISOLATION LEVEL SERIALIZABLE;
EXPLAIN ANALYZE SELECT * FROM test WHERE x = 1;
                                           QUERY PLAN
------------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..2.25 rows=50 width=4) (actual time=0.201..0.787 rows=50 loops=1)
   Filter: (x = 1)
 Total runtime: 1.169 ms
(3 rows)

Transaction 2:
BEGIN;
DELETE FROM test WHERE x = 1;
ANALYZE test;
COMMIT;

Transaction 1:
EXPLAIN ANALYZE SELECT * FROM test WHERE x = 1;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Seq Scan on test  (cost=0.00..1.62 rows=1 width=4) (actual time=0.499..1.090 rows=50 loops=1)
   Filter: (x = 1)
 Total runtime: 1.476 ms
(3 rows)

In Transaction 1's second query the planner uses an updated row
count estimate even though the old rows are still visible.  I think
I understand why statistics like the total relation size and total
number of tuples would help the planner, but is there a reason for
distribution statistics to be visible for data that itself isn't
visible?
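
(For what it's worth, here is a sketch of how one might look at those
statistics directly -- pg_stats is the documented view over pg_statistic;
the column list below is just what seemed relevant:)

-- Run inside Transaction 1.  Note that an ordinary query like this is
-- a plain MVCC scan, so it follows the transaction's snapshot and may
-- not match what the planner itself is using.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename  = 'test'
  AND attname    = 'x';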

Thanks.

-- 
Michael Fuhr


Re: Statistics visibility in SERIALIZABLE transactions

From: Tom Lane
Michael Fuhr <mike@fuhr.org> writes:
> Updates to planner statistics appear to be visible in SERIALIZABLE
> transactions even though updated data is not.  Is this intentional?

This is because the planner uses the catcache to read pg_statistic, and
that runs on SnapshotNow rules.  I am not excited about changing it: as a general
rule you want to be using the latest stats available.  It is much more
likely that the stats existing at the start of your transaction are
out-of-date than that they are a better representation of the data in
your snapshot than stats committed a bit later.  It's an approximation
either way...
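
(To illustrate the distinction -- a sketch, not captured output.  An
ordinary SQL scan of the catalog does follow your snapshot; it is only
the catcache read that does not:)

-- Inside Transaction 1, after Transaction 2's ANALYZE has committed:

-- A plain MVCC read of pg_statistic (via the pg_stats view) still
-- shows the pre-ANALYZE statistics under SERIALIZABLE:
SELECT n_distinct FROM pg_stats
WHERE tablename = 'test' AND attname = 'x';

-- The planner reads pg_statistic through the catcache under
-- SnapshotNow rules, so its estimate already reflects the new stats:
EXPLAIN SELECT * FROM test WHERE x = 1;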

(Actually, in general catalog changes apply immediately even in
SERIALIZABLE mode.  Try looking at a table that was ALTERed since
your transaction began.)
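
(A sketch of that case; the session labels are illustrative:)

-- Session 1:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT 1;          -- any statement: this fixes the transaction snapshot

-- Session 2 (Session 1 must not yet hold a lock on test, or this blocks):
ALTER TABLE test ADD COLUMN y integer;

-- Session 1: the new column appears immediately, even though rows
-- committed after the snapshot would not:
SELECT * FROM test LIMIT 1;
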
        regards, tom lane


Re: Statistics visibility in SERIALIZABLE transactions

From: Josh Berkus
Tom,

>  Try looking at a table that was ALTERed since
> your transaction began.)

Shouldn't the transaction fail, then?

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: Statistics visibility in SERIALIZABLE transactions

From: Russell Smith
Josh Berkus wrote:
> Tom,
>
>> Try looking at a table that was ALTERed since
>> your transaction began.)
>
> Shouldn't the transaction fail, then?
Neil Conway and I opened this VERY BIG can of worms a couple of years 
ago on hackers.

I can't find the references to it in the archives; I think I need to go
back further than the start of 2005, and I can't remember how to do that.
It was in relation to CLUSTER on a table and how transactions that were
already open would not see the correct view of the table.

I recall it had something to do with reading the catalog before
transaction startup, and the problem of needing a transaction in order
to use the catalog.
My memory of it is fuzzy.  If Neil or Tom remember the discussion they 
may be able to post a link.

Sorry if this wasn't much help.

Russell Smith