Indeed, this is a nasty bug that leads to data corruption. The following
sequence results in corruption of the visibility map, but I believe it
can be shown to cause problems for a btree or GIN index as well. So it's
recoverable if you do a VACUUM or a reindex.
drop table foo;
create table foo(i int);
create index foo_idx on foo(i);
insert into foo values(1);
vacuum foo;
checkpoint;
insert into foo values(2);
-- Send a SIGQUIT to the postmaster, and restart it.
-- Now the buffer for the VM is in shared memory without BM_PERMANENT.
-- VM block comes from backup block which still has VM bit set, but wal
-- record for insert (2) unsets it in memory.
delete from foo where i = 2;
-- This checkpoint will *not* write out the buffer because it's not
-- BM_PERMANENT. Therefore, it remains the same on disk, with the VM
-- bit set.
checkpoint;
-- Send a SIGQUIT to the postmaster and restart. WAL records prior to
-- that last checkpoint aren't replayed, so VM bit is still set.
set enable_seqscan=false;
select * from foo where i = 2;
vacuum foo;
WARNING: page is not marked all-visible but visibility map bit is
set in relation "foo" page 0
VACUUM
select * from foo;
i
---
1
(1 row)
This bug seems particularly troublesome because the right fix would be
to include the relpersistence in the WAL records that need it. But that
can't be backported (right?).
The problem might not be a live problem before
7e4911b2ae33acff7b85234b91372133ec6df9d4, because that bug was hiding
this one.
Regards,
Jeff Davis