Hi,
On 2022-02-18 08:56:37 -0800, Andres Freund wrote:
> Could you try to minimize the script? A 300 line reproducer is quite long. And
> it looks like it won't even work in non postgres-pro tree.
>
> One thing to do would be to modify pg_visibility to elog(PANIC, "something")
> when it encounters corruption. Then you would have a chance of inspecting the
> state of the tuple/page in that moment.
Oh, I have been able to reliably reproduce this on HEAD. I modified
record_corrupt_item() to PANIC and then:
psql regression:
BEGIN ;SELECT txid_current();
<leave open>
psql postgres
DROP TABLE IF EXISTS vacuum_test_0;
create table vacuum_test_0 as select 42 i;
vacuum (disable_page_skipping) vacuum_test_0;
select * from pg_check_visible('vacuum_test_0');
At which point there immediately is a crash.
This reproduces in earlier versions too, at least back to 10.
I *think* this is a false positive:
- PGPROC->xmin is computed without regard for the database in which the other
sessions are running. Due to the txid_current() session this includes an
older xid.
- During the VACUUM in vis.sql the only connection to the database pgbench
connects to is VACUUM and thus ignored when determining horizons (due to
PROC_IN_VACUUM). Therefore the horizon is computed to latestCompletedXid +
1.
- But during pg_check_visible(), the current session is *not* marked as
PROC_IN_VACUUM. So the horizon is the xid from the txid_current().
Boom.
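To make the three steps above concrete, here is a toy model of the two horizon computations. It is only a sketch under stated assumptions, not the actual procarray.c logic: the ToyProc struct, the toy_* functions, the database ids and the xid values 100/105 are all made up for illustration, and xids are treated as plain counters with no wraparound handling.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ToyXid;            /* plain counter, no wraparound handling */

#define TOY_INVALID_XID     0u
#define TOY_PROC_IN_VACUUM  0x01u   /* hypothetical stand-in for PROC_IN_VACUUM */

typedef struct ToyProc
{
    int      db;      /* database the session is connected to */
    ToyXid   xid;     /* assigned xid, or TOY_INVALID_XID */
    ToyXid   xmin;    /* advertised snapshot xmin, or TOY_INVALID_XID */
    unsigned flags;
} ToyProc;

static ToyXid
toy_min(ToyXid a, ToyXid b)
{
    return a < b ? a : b;
}

/* Snapshot xmin: oldest running xid in ANY database (assumption of the model). */
static ToyXid
toy_snapshot_xmin(const ToyProc *procs, int nprocs, ToyXid latest_completed)
{
    ToyXid result = latest_completed + 1;

    for (int i = 0; i < nprocs; i++)
        if (procs[i].xid != TOY_INVALID_XID)
            result = toy_min(result, procs[i].xid);
    return result;
}

/* Horizon for tables in "db": skips other databases and in-vacuum sessions. */
static ToyXid
toy_horizon(const ToyProc *procs, int nprocs, int db, ToyXid latest_completed)
{
    ToyXid result = latest_completed + 1;

    for (int i = 0; i < nprocs; i++)
    {
        if (procs[i].db != db || (procs[i].flags & TOY_PROC_IN_VACUUM))
            continue;
        if (procs[i].xid != TOY_INVALID_XID)
            result = toy_min(result, procs[i].xid);
        if (procs[i].xmin != TOY_INVALID_XID)
            result = toy_min(result, procs[i].xmin);
    }
    return result;
}

/* Replays the scenario; returns 1 if the check flags a page VACUUM set all-visible. */
static int
toy_false_positive(void)
{
    enum { DB_REGRESSION = 1, DB_POSTGRES = 2 };
    ToyXid  latest_completed = 105;     /* made up: CREATE TABLE AS committed as 105 */
    ToyXid  tuple_xmin = 105;
    ToyProc procs[2] = {
        { DB_REGRESSION, 100, TOY_INVALID_XID, 0 },            /* open txid_current() */
        { DB_POSTGRES, TOY_INVALID_XID, TOY_INVALID_XID, 0 }   /* our session */
    };

    /* VACUUM: our session has a snapshot but is flagged, so it is ignored. */
    procs[1].xmin = toy_snapshot_xmin(procs, 2, latest_completed);
    procs[1].flags = TOY_PROC_IN_VACUUM;
    ToyXid vacuum_horizon = toy_horizon(procs, 2, DB_POSTGRES, latest_completed);
    assert(vacuum_horizon == latest_completed + 1);
    int marked_all_visible = tuple_xmin < vacuum_horizon;

    /* pg_check_visible(): same session, no flag, so its own xmin counts. */
    procs[1].flags = 0;
    procs[1].xmin = toy_snapshot_xmin(procs, 2, latest_completed);
    ToyXid check_horizon = toy_horizon(procs, 2, DB_POSTGRES, latest_completed);
    assert(check_horizon == 100);
    int passes_check = tuple_xmin < check_horizon;

    return marked_all_visible && !passes_check;
}
```

In this model toy_false_positive() returns 1: the VACUUM-time horizon is latestCompletedXid + 1, so the tuple is marked all-visible, while the later check sees the older xid via the session's own xmin and reports the page.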
Greetings,
Andres Freund