lazy_vacuum_page() does this:
1. START_CRIT_SECTION()
2. Remove dead tuples from the page, marking the itemid's unused.
3. MarkBufferDirty
4. if all remaining tuples on the page are visible to everyone, set the
all-visible flag on the page, and call visibilitymap_set() to set the VM
bit.
5 visibilitymap_set() writes a WAL record about setting the bit in the
visibility map.
6. write the WAL record of removing the dead tuples.
7. END_CRIT_SECTION().
See the problem? Setting the VM bit is WAL-logged first, before the
removal of the tuples. If you stop recovery between the two WAL records,
the page's VM bit in the VM map will be set, but the dead tuples are
still on the page.
This bug was introduced in Feb 2013, by commit
fdf9e21196a6f58c6021c967dc5776a16190f295, so it's present in 9.3 and master.
The fix seems quite straightforward, we just have to change the order of
those two records. I'll go commit that.
- Heikki