I believe I've identified the reason why skink and some other buildfarm
members have been failing the pg_upgrade test recently. It is that
recent changes in sequence support have caused binary-upgrade restore
runs to do some sequence OID/relfilenode assignments without any heed
to the OIDs that pg_upgrade tried to impose on those sequences. Once
those sequences have relfilenodes other than the intended ones, they
are land mines for all subsequent pg_upgrade-controlled table OID
assignments.

I am not very sure why it's so hard to duplicate the misbehavior; perhaps,
in order to make the failure happen with the current regression tests,
it's necessary for a background auto-analyze to happen and consume some
OIDs (for pg_statistic TOAST entries) at just the wrong time. However,
I can definitely demonstrate that there are uncontrolled relfilenode
assignments happening during pg_upgrade's restore run. I stuck an
elog() call into GetNewObjectId(), along with generation of a stack
trace using backtrace(), and here is one example:

[593daad3.4863:2243] LOG: generated OID 16735
[593daad3.4863:2244] STATEMENT: -- For binary upgrade, must preserve pg_class oids
SELECT pg_catalog.binary_upgrade_set_next_heap_pg_class_oid('46851'::pg_catalog.oid);
-- For binary upgrade, must preserve pg_type oid
SELECT pg_catalog.binary_upgrade_set_next_pg_type_oid('46852'::pg_catalog.oid);
ALTER TABLE "itest10" ALTER COLUMN "a" ADD GENERATED BY DEFAULT AS IDENTITY (
    SEQUENCE NAME "itest10_a_seq" START WITH 1 INCREMENT BY 1
    NO MINVALUE NO MAXVALUE CACHE 1);
postgres: postgres regression [local] ALTER TABLE(GetNewObjectId+0xda) [0x50397a]
postgres: postgres regression [local] ALTER TABLE(GetNewRelFileNode+0xec) [0x52430c]
postgres: postgres regression [local] ALTER TABLE(RelationSetNewRelfilenode+0x79) [0x851d59]
postgres: postgres regression [local] ALTER TABLE(AlterSequence+0x1cd) [0x5d976d]
postgres: postgres regression [local] ALTER TABLE() [0x75d279]
postgres: postgres regression [local] ALTER TABLE(standard_ProcessUtility+0xb7) [0x75dec7]
postgres: postgres regression [local] ALTER TABLE() [0x75cb1d]
postgres: postgres regression [local] ALTER TABLE(standard_ProcessUtility+0xb7) [0x75dec7]
postgres: postgres regression [local] ALTER TABLE() [0x759f0b]
postgres: postgres regression [local] ALTER TABLE() [0x75ae91]
postgres: postgres regression [local] ALTER TABLE(PortalRun+0x250) [0x75b740]
postgres: postgres regression [local] ALTER TABLE() [0x757be7]
postgres: postgres regression [local] ALTER TABLE(PostgresMain+0xe08) [0x759968]
postgres: postgres regression [local] ALTER TABLE(PostmasterMain+0x1a99) [0x6e21a9]
postgres: postgres regression [local] ALTER TABLE(main+0x6b8) [0x65b958]
/lib64/libc.so.6(__libc_start_main+0xfd) [0x3f3bc1ed1d]
postgres: postgres regression [local] ALTER TABLE() [0x473899]
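
For anyone wanting to repeat the experiment, the instrumentation described above can be approximated roughly as follows. This is a standalone sketch, not the actual change made to GetNewObjectId(): get_new_object_id() and next_oid are hypothetical stand-ins, and inside the server the message would go through elog() rather than fprintf(). It assumes glibc's backtrace() and backtrace_symbols_fd() from <execinfo.h>.

```c
#include <assert.h>
#include <execinfo.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_FRAMES 32
#define FIRST_NORMAL_OID 16384u    /* same value as FirstNormalObjectId */

/* Hypothetical stand-in for the server's shared OID counter. */
static unsigned int next_oid = FIRST_NORMAL_OID;

/*
 * Stand-in for GetNewObjectId(): hand out the next OID, log it, and dump
 * the caller's stack so uncontrolled assignments can be traced to their
 * source.
 */
unsigned int
get_new_object_id(void)
{
    void        *frames[MAX_FRAMES];
    int          nframes;
    unsigned int oid;

    /* The toy counter never wraps, so it should stay in the normal range. */
    assert(next_oid >= FIRST_NORMAL_OID);
    oid = next_oid++;

    fprintf(stderr, "LOG: generated OID %u\n", oid);

    /* Collect return addresses and write one symbolized line per frame. */
    nframes = backtrace(frames, MAX_FRAMES);
    backtrace_symbols_fd(frames, nframes, STDERR_FILENO);

    return oid;
}
```

With glibc, a frame is printed with a function name only if the symbol is visible in the dynamic symbol table (linking with -rdynamic helps), which is probably why several frames in the trace above appear as bare addresses.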
Judging by when we started to see buildfarm failures, I think that
commit 3d79013b9 probably broke it, but the problem seems latent
in the whole concept of transactional sequence information.

Not sure what we want to do about it. One idea is to make
ALTER SEQUENCE not so transactional when in binary-upgrade mode.
(I'm also tempted to make GetNewRelFileNode complain if IsBinaryUpgrade
is true, but that's a separate matter.)
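
To make the parenthetical idea concrete, here is a toy model of such a complaint. It is only a sketch under stated assumptions and is not PostgreSQL source: is_binary_upgrade, preset_next_oid, binary_upgrade_set_next_oid() and assign_relfilenode() are all hypothetical names standing in for IsBinaryUpgrade, the binary_upgrade_set_next_* machinery, and GetNewRelFileNode respectively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

/* Hypothetical stand-ins for server state; not the real globals. */
static bool is_binary_upgrade = false;
static Oid  preset_next_oid = InvalidOid;  /* value imposed by the upgrade script */
static Oid  oid_counter = 16384;           /* ordinary OID generator */
static int  uncontrolled_assignments = 0;  /* how often the guard fired */

/* Model of a binary_upgrade_set_next_*-style call: impose the next OID. */
void
binary_upgrade_set_next_oid(Oid oid)
{
    assert(oid != InvalidOid);
    preset_next_oid = oid;
}

/*
 * Model of relfilenode assignment.  A preset value is consumed if present.
 * Otherwise the value comes from the counter, and in binary-upgrade mode
 * that is reported, since nothing ties it to the old cluster's layout.
 */
Oid
assign_relfilenode(const char *relname)
{
    if (preset_next_oid != InvalidOid)
    {
        Oid result = preset_next_oid;

        preset_next_oid = InvalidOid;   /* a preset is good for one use */
        return result;
    }

    if (is_binary_upgrade)
    {
        uncontrolled_assignments++;
        fprintf(stderr,
                "WARNING: uncontrolled relfilenode assignment for \"%s\"\n",
                relname);
    }
    return oid_counter++;
}
```

In this model, a table created right after a preset gets the imposed value, while a sequence that is given a new relfilenode later in the same statement falls through to the counter and triggers the warning. Whether the real check should be a warning or a hard error is left open here.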
In any case, this is a "must fix" problem IMO, so I'll go add it to the
open items list.

regards, tom lane