Re: "page is not marked all-visible" warning in regression tests - Mailing list pgsql-hackers

From Andres Freund
Subject Re: "page is not marked all-visible" warning in regression tests
Date
Msg-id 201206061946.11827.andres@2ndquadrant.com
In response to Re: "page is not marked all-visible" warning in regression tests  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: "page is not marked all-visible" warning in regression tests  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Tuesday, June 05, 2012 04:18:44 PM Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > On Tuesday, June 05, 2012 03:32:08 PM Tom Lane wrote:
> >> I got this last night in a perfectly standard build of HEAD:
> >> + WARNING:  page is not marked all-visible but visibility map bit is set
> >> in relation "pg_db_role_setting" page 0 --
> > 
> > I have seen that twice just yesterday. Couldn't reproduce it so far.
> > Workload was (pretty exactly):
> > 
> > initdb
> > postgres -c fsync=off
> > pgbench -i -s 100
> > CREATE TABLE data(id serial primary key, data int);
> > ALTER SEQUENCE data_id_seq INCREMENT 2;
> > VACUUM FREEZE;
> > normal shutdown
> > postgres -c fsync=on
> > pgbench -c 20 -j 20 -T 100
> > WARNING: ... pg_depend ...
> > WARNING: ... can't remember ...
> 
> Hmm ... from memory, what I did was
> 
> configure/build/install from a fresh pull
> initdb
> start postmaster, fsync off
> make installcheck
> stop postmaster
> apply Hanada-san's json patch, replace postgres executable
> start postmaster, fsync off
> make installcheck
> 
> and it was the second of these runs that failed.  Could we be missing
> flushing some blocks out to disk at shutdown?  Maybe fsync off is a
> contributing factor?
On a cursory look it might just be a race condition in 
vacuumlazy.c:lazy_scan_heap. If scan_all is set, which it has to be for the 
warning to be visible, all_visible_according_to_vm is determined before we 
loop over all blocks. By the time one specific heap block is actually read 
and locked, that knowledge might be completely outdated because of any 
concurrent backend. Am I missing something?
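
To make the suspected window concrete, here is a toy model in C. It is only a
sketch of the check-then-act pattern described above, not PostgreSQL code:
ToyPage, toy_concurrent_update and toy_scan_block are hypothetical names, and
the assumption that a concurrent modification clears both the page-level flag
and the map bit is part of the model, not something verified here.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for illustration only; NOT PostgreSQL structures. */
typedef struct ToyPage
{
    bool page_all_visible;  /* models the page-level all-visible flag */
    bool vm_bit;            /* models the visibility map bit for the page */
} ToyPage;

/* Assumption: a concurrent backend modifying a tuple clears both flags. */
static void
toy_concurrent_update(ToyPage *page)
{
    page->page_all_visible = false;
    page->vm_bit = false;
}

/*
 * Models one iteration of the scan. Returns true if the
 * "not marked all-visible but visibility map bit is set" check would fire.
 */
bool
toy_scan_block(ToyPage *page, bool recheck_under_lock, bool concurrent_update)
{
    /* Map bit is sampled early, before the page is read and locked. */
    bool all_visible_according_to_vm = page->vm_bit;

    /* Window in which another backend may touch the page. */
    if (concurrent_update)
        toy_concurrent_update(page);

    /* From here on the page is considered locked. */
    if (recheck_under_lock)
        all_visible_according_to_vm = page->vm_bit;

    return all_visible_according_to_vm && !page->page_all_visible;
}
```

In this model the check fires only when the early sample is reused after a
concurrent update; re-reading the map bit once the page is locked makes it
go away. Whether that matches what actually happens in lazy_scan_heap is
exactly the open question.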

I have to say the whole visibilitymap correctness and crash-safety seems to be 
quite underdocumented, especially as it seems to be somewhat intricate (to 
me). For example, there is no note explaining why visibilitymap_test doesn't 
need locking. (I guess the theory is that a 1-byte read will always be 
consistent. But how does that ensure other backends see an up-to-date value?)
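
The distinction in that question can be sketched with C11 atomics. This is
not how visibilitymap.c works and the toy_map_* names are hypothetical; the
point is only that "a 1-byte read cannot be torn" and "a reader observes the
latest write in a defined order" are separate guarantees, and the second
needs explicit ordering (here release/acquire) or a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* One byte of a toy bitmap; static storage, so it starts out as zero. */
static atomic_uchar toy_map_byte;

/* Set one bit; release ordering publishes earlier writes by this thread. */
void
toy_map_set(unsigned bit)
{
    atomic_fetch_or_explicit(&toy_map_byte,
                             (unsigned char) (1u << bit),
                             memory_order_release);
}

/* Clear one bit, again with release ordering. */
void
toy_map_clear(unsigned bit)
{
    atomic_fetch_and_explicit(&toy_map_byte,
                              (unsigned char) ~(1u << bit),
                              memory_order_release);
}

/*
 * Test one bit. The acquire load pairs with the release operations above,
 * so a reader that sees the new bit also sees the writes that preceded it.
 */
bool
toy_map_test(unsigned bit)
{
    unsigned char byte = atomic_load_explicit(&toy_map_byte,
                                              memory_order_acquire);

    return ((byte >> bit) & 1u) != 0;
}
```

A plain unlocked byte read gives only the first guarantee, which is why the
missing comment on visibilitymap_test matters.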

Andres

-- 
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

