Thread: "page is not marked all-visible" warning in regression tests
I got this last night in a perfectly standard build of HEAD: *** /home/tgl/pgsql/src/test/regress/expected/sanity_check.out Thu Jan 12 14:06:14 2012 --- /home/tgl/pgsql/src/test/regress/results/sanity_check.out Mon Jun 4 20:28:39 2012 *************** *** 1,4 **** --- 1,5 ---- VACUUM; + WARNING: page is not marked all-visible but visibility map bit is set in relation "pg_db_role_setting" page 0 -- -- sanitycheck, if we don't have indices the test will take years to -- complete. But skip TOAST relations (since they willhave varying ====================================================================== I didn't manage to reproduce it in a few tries, and I don't recall having seen the like reported from the buildfarm either. So there's some low-probability bug in there ... regards, tom lane
On Tuesday, June 05, 2012 03:32:08 PM Tom Lane wrote: > I got this last night in a perfectly standard build of HEAD: > > *** /home/tgl/pgsql/src/test/regress/expected/sanity_check.out Thu Jan 12 > 14:06:14 2012 --- > /home/tgl/pgsql/src/test/regress/results/sanity_check.out Mon Jun 4 > 20:28:39 2012 *************** > *** 1,4 **** > --- 1,5 ---- > VACUUM; > + WARNING: page is not marked all-visible but visibility map bit is set in > relation "pg_db_role_setting" page 0 -- > -- sanity check, if we don't have indices the test will take years to > -- complete. But skip TOAST relations (since they will have varying > > ====================================================================== > > I didn't manage to reproduce it in a few tries, and I don't recall > having seen the like reported from the buildfarm either. So there's > some low-probability bug in there ... I have seen that twice just yesterday. Couldn't reproduce it so far. Workload was (pretty exactly): initdb postgres -c fsync=off pgbench -i -s 100 CREATE TABLE data(id serial primary key, data int); ALTER SEQUENCE data_id_seq INCREMENT 2; VACUUM FREEZE; normal shutdown postgres -c fsync=on pgbench -c 20 -j 20 -T 100 WARNING: ... pg_depend ... WARNING: ... can't remember ... Greetings, Andres
Andres Freund <andres@2ndquadrant.com> writes: > On Tuesday, June 05, 2012 03:32:08 PM Tom Lane wrote: >> I got this last night in a perfectly standard build of HEAD: >> + WARNING: page is not marked all-visible but visibility map bit is set in >> relation "pg_db_role_setting" page 0 -- > I have seen that twice just yesterday. Couldn't reproduce it so far. > Workload was (pretty exactly): > initdb > postgres -c fsync=off > pgbench -i -s 100 > CREATE TABLE data(id serial primary key, data int); > ALTER SEQUENCE data_id_seq INCREMENT 2; > VACUUM FREEZE; > normal shutdown > postgres -c fsync=on > pgbench -c 20 -j 20 -T 100 > WARNING: ... pg_depend ... > WARNING: ... can't remember ... Hmm ... from memory, what I did was configure/build/install from a fresh pull initdb start postmaster, fsync off make installcheck stop postmaster apply Hanada-san's json patch, replace postgres executable start postmaster, fsync off make installcheck and it was the second of these runs that failed. Could we be missing flushing some blocks out to disk at shutdown? Maybe fsync off is a contributing factor? regards, tom lane
On Tuesday, June 05, 2012 04:18:44 PM Tom Lane wrote: > Andres Freund <andres@2ndquadrant.com> writes: > > On Tuesday, June 05, 2012 03:32:08 PM Tom Lane wrote: > >> I got this last night in a perfectly standard build of HEAD: > >> + WARNING: page is not marked all-visible but visibility map bit is set > >> in relation "pg_db_role_setting" page 0 -- > > > > I have seen that twice just yesterday. Couldn't reproduce it so far. > > Workload was (pretty exactly): > > > > initdb > > postgres -c fsync=off > > pgbench -i -s 100 > > CREATE TABLE data(id serial primary key, data int); > > ALTER SEQUENCE data_id_seq INCREMENT 2; > > VACUUM FREEZE; > > normal shutdown > > postgres -c fsync=on > > pgbench -c 20 -j 20 -T 100 > > WARNING: ... pg_depend ... > > WARNING: ... can't remember ... > > Hmm ... from memory, what I did was > > configure/build/install from a fresh pull > initdb > start postmaster, fsync off > make installcheck > stop postmaster > apply Hanada-san's json patch, replace postgres executable > start postmaster, fsync off > make installcheck > > and it was the second of these runs that failed. Could we be missing > flushing some blocks out to disk at shutdown? Maybe fsync off is a > contributing factor? On a cursory lock it might just be a race condition in vacuumlazy.c:lazy_scan_heap. If scan_all is set, which it has to be for the warning to be visible, all_visible_according_to_vm is determined before we loop over all blocks. At the point where one specific heap block is actually read and locked that knowledge might be completely outdated by any concurrent backend. Am I missing something? I have to say the whole visibilitymap correctness and crash-safety seems to be quite under documented, especially as it seems to be somewhat intricate (to me). E.g. not having any note why visibilitymap_test doesn't need locking. (I guess the theory is that a 1 byte read will always be consistent. But how does that ensure other backends see an up2date value?). Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jun 6, 2012 at 1:46 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On a cursory lock it might just be a race condition in > vacuumlazy.c:lazy_scan_heap. If scan_all is set, which it has to be for the > warning to be visible, all_visible_according_to_vm is determined before we > loop over all blocks. At the point where one specific heap block is actually > read and locked that knowledge might be completely outdated by any concurrent > backend. Am I missing something? No, I think you're right. I think that warning is bogus. I added it in place of some older warning which no longer made sense, but I think this one doesn't make sense either. > I have to say the whole visibilitymap correctness and crash-safety seems to be > quite under documented, especially as it seems to be somewhat intricate (to > me). E.g. not having any note why visibilitymap_test doesn't need locking. (I > guess the theory is that a 1 byte read will always be consistent. But how does > that ensure other backends see an up2date value?). It's definitely intricate, and it's very possible that we should have some more documentation. I am not sure exactly what and where, but feel free to suggest something. visibilitymap_test() does have a comment saying that: /* * We don't need to lock the page, as we're only looking at a single bit. */ But that's a bit unsatisfying, because, as you say, it doesn't address the question of memory-ordering issues. I think that there's no situation in which it causes a problem to see the visibility map bit as unset when in reality it has just recently been set by some other back-end. It would be bad if someone did something like: if (visibilitymap_test(...)) visibilitymap_clear(); ...because then memory-ordering issues could cause us to accidentally fail to clear the bit. No one should be doing that, though; the relevant locking and conditional logic is built directly into visibilitymap_clear(). On the flip side, if someone sees the visibility map bit as set when it's actually just been cleared, that could cause a problem - most seriously, index-only scans could return wrong answers. For that to happen, someone would have to insert a heap tuple onto a previously all-visible page, clearing the visibility map bit, and then insert an index tuple; concurrently, some other backend would need to see the index tuple but not the fact that the visibility map bit had been cleared. I don't think that can happen: after inserting the heap tuple, the inserting backend would release buffer content lock, which acts as a full memory barrier; before reading the index tuple, the index-only-scanning backend would have to take the content lock on the index buffer, which also acts as a full memory barrier. So the inserter can't do the writes out of order, and the index-only-scanner can't do the reads out of order, so I think it's safe.... but we probably do need to explain that somewhere. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wednesday, June 06, 2012 08:19:15 PM Robert Haas wrote: > On Wed, Jun 6, 2012 at 1:46 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > On a cursory lock it might just be a race condition in > > vacuumlazy.c:lazy_scan_heap. If scan_all is set, which it has to be for > > the warning to be visible, all_visible_according_to_vm is determined > > before we loop over all blocks. At the point where one specific heap > > block is actually read and locked that knowledge might be completely > > outdated by any concurrent backend. Am I missing something? > > No, I think you're right. I think that warning is bogus. I added it > in place of some older warning which no longer made sense, but I think > this one doesn't make sense either. Agreed. It might be interesting to recheck the visibility and warn if its wrong. That should be infrequent enough to bearable and it does check for an actually dangerous case in a new code path. > > I have to say the whole visibilitymap correctness and crash-safety seems > > to be quite under documented, especially as it seems to be somewhat > > intricate (to me). E.g. not having any note why visibilitymap_test > > doesn't need locking. (I guess the theory is that a 1 byte read will > > always be consistent. But how does that ensure other backends see an > > up2date value?). > > It's definitely intricate, and it's very possible that we should have > some more documentation. I am not sure exactly what and where, but > feel free to suggest something. I think some addition to the LOCKING part of visibilitymap.c's header explaining some of what you wrote in your email might be a good start. I would also suggest explictly mentioning that its ok to have the visibilitymap and the page disagreeing about the visibility if its the visibility map that think that the page contains invisible data but not the other way round (I think that can currently happen). visibilitymap_test() should explain that its results can be outdated if youre not holding a buffer lock. > visibilitymap_test() does have a comment saying that: > > /* > * We don't need to lock the page, as we're only looking at a > single bit. > */ Oh. I conveniently skipped that comment in my brain ;) > But that's a bit unsatisfying, because, as you say, it doesn't address > the question of memory-ordering issues. I think that there's no > situation in which it causes a problem to see the visibility map bit > as unset when in reality it has just recently been set by some other > back-end. It would be bad if someone did something like: > > if (visibilitymap_test(...)) > visibilitymap_clear(); > > ...because then memory-ordering issues could cause us to accidentally > fail to clear the bit. No one should be doing that, though; the > relevant locking and conditional logic is built directly into > visibilitymap_clear(). Then _test should document that... I don't think its impossible that we will grow more uses of the visibilitymap logic. > On the flip side, if someone sees the visibility map bit as set when > it's actually just been cleared, that could cause a problem - most > seriously, index-only scans could return wrong answers. For that to > happen, someone would have to insert a heap tuple onto a previously > all-visible page, clearing the visibility map bit, and then insert an > index tuple; concurrently, some other backend would need to see the > index tuple but not the fact that the visibility map bit had been > cleared. I don't think that can happen: after inserting the heap > tuple, the inserting backend would release buffer content lock, which > acts as a full memory barrier; before reading the index tuple, the > index-only-scanning backend would have to take the content lock on the > index buffer, which also acts as a full memory barrier. So the > inserter can't do the writes out of order, and the index-only-scanner > can't do the reads out of order, so I think it's safe.... but we > probably do need to explain that somewhere. Hm. For a short while I thought there would be an issue with heap_delete and IOS because the deleting transaction can commit without any barriers happening on the IOS side. But that only seems to be possible with non MVCC snapshots which are currently not allowed with index only scans. Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jun 6, 2012 at 3:07 PM, Andres Freund <andres@2ndquadrant.com> wrote: > Hm. For a short while I thought there would be an issue with heap_delete and > IOS because the deleting transaction can commit without any barriers happening > on the IOS side. But that only seems to be possible with non MVCC snapshots > which are currently not allowed with index only scans. Well, one, commits are irrelevant; the page ceases to be all-visible as soon as the delete happens. And two, you can't do a delete or a commit without a memory barrier - every LWLockAcquire() or LWLockRelease() is a full barrier, so any operation that requires a buffer content lock is both preceded and followed by a full barrier. Proposed patch attached. This adds some more comments in various places, and implements your suggestion of retesting the visibility-map bit when we detect a possible mismatch with the page-level bit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Thursday, June 07, 2012 03:20:50 PM Robert Haas wrote: > On Wed, Jun 6, 2012 at 3:07 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > Hm. For a short while I thought there would be an issue with heap_delete > > and IOS because the deleting transaction can commit without any barriers > > happening on the IOS side. But that only seems to be possible with non > > MVCC snapshots which are currently not allowed with index only scans. > Well, one, commits are irrelevant; the page ceases to be all-visible > as soon as the delete happens. Its not irrelevant for the code as it stands if non-mvcc snapshots were allowed. Unless I miss something, even disregarding memory ordering issues, there is no interlocking providing protection against the index doing the visibilitymap_test() and some concurrent backend doing a heap_delete + commit directly after that. If a SnapshotNow were used this could result in different results between a IOS and a normal indexscan because the normal indexscan would lock the heap page before doing the visibility testing and thus would see the new visibility information. But then a SnapshotNow wouldn't be used if that were a problem... For normal snapshots its not a problem because with regards to the snapshot everything on the page is still visible as I think that can only happen for deletions and HOT updates because the index page would need to get locked otherwise. Deletions aren't visible yet and HOT is irrelevant for the IOS by definition. > And two, you can't do a delete or a commit without a memory barrier - every > LWLockAcquire() or LWLockRelease() is a full barrier, so any operation that > requires a buffer content lock is both preceded and followed by a full > barrier. The memory barrier on the deleting side is irrelevant if there is no memory barrier on the reading side. > Proposed patch attached. This adds some more comments in various > places, and implements your suggestion of retesting the visibility-map > bit when we detect a possible mismatch with the page-level bit. Thanks, will look at it in a bit. Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jun 7, 2012 at 9:41 AM, Andres Freund <andres@2ndquadrant.com> wrote: >> Well, one, commits are irrelevant; the page ceases to be all-visible >> as soon as the delete happens. > Its not irrelevant for the code as it stands if non-mvcc snapshots were > allowed. Unless I miss something, even disregarding memory ordering issues, > there is no interlocking providing protection against the index doing the > visibilitymap_test() and some concurrent backend doing a heap_delete + commit > directly after that. Hmm. Well, the backend that did the heap_delete would clear the visibility map bit when it did the delete, before releasing the lock on the heap buffer. So normally we'll see that buffer as not-all-visible as soon as the delete is performed, even if no commit has happened yet. But I guess there could be a memory-ordering issue there for a SnapshotNow scan, because as you say there's no barrier on the read side in that case. The delete + commit would have to happen after we finish fetching the index TID but before we test the visibility map bit. So I suppose that if we wanted to support index-only scans under SnapshotNow maybe we'd want to lock and unlock the visibility map page when testing each bit. But that almost seems like overkill anyway: for it to matter, someone would have to be relying on starting a scan, deleting a tuple in another transaction while the scan was in progress, and having the scan see the delete when it reached the deleted tuple. But of course if the scan had take a microsecond longer to reach the deleted tuple it wouldn't have seen it as deleted anyway, so it's a bit hard to see how any such code could be race-free anyhow. >> Proposed patch attached. This adds some more comments in various >> places, and implements your suggestion of retesting the visibility-map >> bit when we detect a possible mismatch with the page-level bit. > Thanks, will look at it in a bit. Thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thursday, June 07, 2012 04:27:32 PM Robert Haas wrote: > On Thu, Jun 7, 2012 at 9:41 AM, Andres Freund <andres@2ndquadrant.com> wrote: > >> Proposed patch attached. This adds some more comments in various > >> places, and implements your suggestion of retesting the visibility-map > >> bit when we detect a possible mismatch with the page-level bit. > > > > Thanks, will look at it in a bit. I wonder if /* mark page all-visible, if appropriate */ if (all_visible && !PageIsAllVisible(page)) { PageSetAllVisible(page); MarkBufferDirty(buf); visibilitymap_set(onerel, blkno, InvalidXLogRecPtr, vmbuffer, visibility_cutoff_xid); } shouldn't test if (all_visible && (!PageIsAllVisible(page) || !all_visible_according_to_vm) instead. Commentwise I am not totally content with the emphasis on memory ordering because some of the stuff is more locking than memory ordering. Except that I think its a pretty clear improvement. I can reformulate the places where I find that relevant but I have the feeling that wouldn't help the legibility. Its the big comment in vacuumlazy.c, the comment in nodeIndexonly.c and the one in the header of visibilitymap_test. Should be s/memory- ordering/concurrency/ except in nodeIndexonlyscan.c The visibilitymap_clear/PageClearAllVisible in heap_multi_insert should be moved into the critical section, shouldn't it? Regards, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jun 7, 2012 at 11:04 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On Thursday, June 07, 2012 04:27:32 PM Robert Haas wrote: >> On Thu, Jun 7, 2012 at 9:41 AM, Andres Freund <andres@2ndquadrant.com> > wrote: >> >> Proposed patch attached. This adds some more comments in various >> >> places, and implements your suggestion of retesting the visibility-map >> >> bit when we detect a possible mismatch with the page-level bit. >> > >> > Thanks, will look at it in a bit. > I wonder if > /* mark page all-visible, if appropriate */ > if (all_visible && !PageIsAllVisible(page)) > { > PageSetAllVisible(page); > MarkBufferDirty(buf); > visibilitymap_set(onerel, blkno, InvalidXLogRecPtr, vmbuffer, > visibility_cutoff_xid); > } > shouldn't test > if (all_visible && > (!PageIsAllVisible(page) || !all_visible_according_to_vm) > instead. Hmm, I think you're right. > Commentwise I am not totally content with the emphasis on memory ordering > because some of the stuff is more locking than memory ordering. Except that I > think its a pretty clear improvement. I can reformulate the places where I > find that relevant but I have the feeling that wouldn't help the legibility. > Its the big comment in vacuumlazy.c, the comment in nodeIndexonly.c and the > one in the header of visibilitymap_test. Should be s/memory- > ordering/concurrency/ except in nodeIndexonlyscan.c Hmm, I see your point. > The visibilitymap_clear/PageClearAllVisible in heap_multi_insert should be > moved into the critical section, shouldn't it? Yes, it should. I was thinking maybe we could go the other way and have heap_insert do it before starting the critical section, but that's no good: clearing the visibility map bit is part of the critical data change, and we can't do it and then forget to WAL-log it. Updated patch attached. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
On Thursday, June 07, 2012 06:08:34 PM Robert Haas wrote: > On Thu, Jun 7, 2012 at 11:04 AM, Andres Freund <andres@2ndquadrant.com> wrote: > > On Thursday, June 07, 2012 04:27:32 PM Robert Haas wrote: > >> On Thu, Jun 7, 2012 at 9:41 AM, Andres Freund <andres@2ndquadrant.com> > > > > wrote: > >> >> Proposed patch attached. This adds some more comments in various > >> >> places, and implements your suggestion of retesting the > >> >> visibility-map bit when we detect a possible mismatch with the > >> >> page-level bit. > >> > > >> > Thanks, will look at it in a bit. > > The visibilitymap_clear/PageClearAllVisible in heap_multi_insert should > > be moved into the critical section, shouldn't it? > > Yes, it should. I was thinking maybe we could go the other way and > have heap_insert do it before starting the critical section, but > that's no good: clearing the visibility map bit is part of the > critical data change, and we can't do it and then forget to WAL-log > it. You could do a visibilitymap_pin outside, but I don't really see the point as the page is already locked. There might be some slight benefit in doing so in multi_insert but that would be more complicated. And of doubtful benefit. > Updated patch attached. Looks good. Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Jun 7, 2012 at 12:19 PM, Andres Freund <andres@2ndquadrant.com> wrote: > You could do a visibilitymap_pin outside, but I don't really see the point as > the page is already locked. There might be some slight benefit in doing so in > multi_insert but that would be more complicated. And of doubtful benefit. Doesn't RelationGetBufferForTuple() already do exactly that? >> Updated patch attached. > Looks good. Thanks for the review. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thursday, June 07, 2012 06:23:58 PM Robert Haas wrote: > On Thu, Jun 7, 2012 at 12:19 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > You could do a visibilitymap_pin outside, but I don't really see the > > point as the page is already locked. There might be some slight benefit > > in doing so in multi_insert but that would be more complicated. And of > > doubtful benefit. > > Doesn't RelationGetBufferForTuple() already do exactly that? Forget it, sorry. I mis(read|remembered) how multi_insert works and concluded from that on how insert works... Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Robert Haas <robertmhaas@gmail.com> writes: > Updated patch attached. The comments need a pass of copy-editing, eg here and here: > + * so somebody else could be change the bit just after we look at it. In fact, ^^^^^^^^^^^^^^^ > + * got cleared after we checked it and before we got took the buffer ^^^^^^^^ Seems reasonable beyond that. regards, tom lane
On Thu, Jun 7, 2012 at 12:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Updated patch attached. > > The comments need a pass of copy-editing, eg here and here: > >> + * so somebody else could be change the bit just after we look at it. In fact, > ^^^^^^^^^^^^^^^ > >> + * got cleared after we checked it and before we got took the buffer > ^^^^^^^^ > > Seems reasonable beyond that. Thanks, committed with those fixes and a correction for one more typo I spotted elsewhere. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company