Justin Pryzby <pryzby@telsasoft.com> writes:
> This commit seems to trigger elog(), not reproducible in the
> parent commit.
> 6e086fa2e77 Allow parallel workers to cope with a newly-created session user ID.
> postgres=# SET min_parallel_table_scan_size=0; CLUSTER pg_attribute USING pg_attribute_relid_attnum_index;
> ERROR: pg_attribute catalog is missing 26 attribute(s) for relation OID 70321

I've been poking at this all day, and I still have little idea what's
going on. I've added a bunch of throwaway instrumentation, and have
managed to convince myself that the problem is that parallel heap
scan is broken. The scans done to rebuild pg_attribute's indexes
seem to sometimes miss heap pages or visit pages twice (in different
workers). I have no idea why this is, and even less idea how
6e086fa2e is provoking it. As you say, the behavior isn't entirely
reproducible, but I couldn't make it happen at all after reverting
6e086fa2e's changes in transam/parallel.c, so apparently there is
some connection.
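
To make "miss heap pages or visit pages twice" concrete, here is a
minimal sketch of the kind of coverage check involved. It is not the
actual throwaway instrumentation: it assumes each scan participant logs
the heap block numbers it visited, and the function name and the log
layout are hypothetical.

```python
from collections import Counter


def check_scan_coverage(nblocks, visits_by_worker):
    """Return (missed, duplicated) block numbers for one parallel scan.

    nblocks: number of heap blocks the scan should cover (0..nblocks-1).
    visits_by_worker: hypothetical log, mapping a participant name to
    the list of block numbers that participant scanned.
    """
    counts = Counter()
    for blocks in visits_by_worker.values():
        counts.update(blocks)
    # A correct parallel scan hands out every block exactly once.
    missed = [b for b in range(nblocks) if counts[b] == 0]
    duplicated = sorted(b for b, n in counts.items() if n > 1)
    return missed, duplicated


# Made-up data: block 4 is never scanned, block 2 is scanned twice.
visits = {"leader": [0, 1, 2], "worker 1": [2, 3, 5]}
missed, duplicated = check_scan_coverage(6, visits)
print("missed:", missed, "duplicated:", duplicated)
```

A missed block would mean its rows never reach the rebuilt index, which
would fit the "missing attribute(s)" error quoted above.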
Another possibly useful data point is that for me it reproduces
fairly well (more than one time in two) on x86_64 Linux, but
I could not make it happen on macOS ARM64. If it's a race
condition, which smells plausible, that's perhaps not hugely
surprising.

regards, tom lane