[HACKERS] Transactional sequence stuff breaks pg_upgrade - Mailing list pgsql-hackers

From Tom Lane
Subject [HACKERS] Transactional sequence stuff breaks pg_upgrade
Date
Msg-id 19785.1497215827@sss.pgh.pa.us
List pgsql-hackers
I believe I've identified the reason why skink and some other buildfarm
members have been failing the pg_upgrade test recently.  It is that
recent changes in sequence support have caused binary-upgrade restore
runs to do some sequence OID/relfilenode assignments without any heed
to the OIDs that pg_upgrade tried to impose on those sequences.  Once
those sequences have relfilenodes other than the intended ones, they
are land mines for all subsequent pg_upgrade-controlled table OID
assignments.

I am not very sure why it's so hard to duplicate the misbehavior; perhaps,
in order to make the failure happen with the current regression tests,
it's necessary for a background auto-analyze to happen and consume some
OIDs (for pg_statistic TOAST entries) at just the wrong time.  However,
I can definitely demonstrate that there are uncontrolled relfilenode
assignments happening during pg_upgrade's restore run.  I stuck an
elog() call into GetNewObjectId(), along with generation of a stack
trace using backtrace(), and here is one example:

[593daad3.4863:2243] LOG:  generated OID 16735
[593daad3.4863:2244] STATEMENT:
    -- For binary upgrade, must preserve pg_class oids
    SELECT pg_catalog.binary_upgrade_set_next_heap_pg_class_oid('46851'::pg_catalog.oid);

    -- For binary upgrade, must preserve pg_type oid
    SELECT pg_catalog.binary_upgrade_set_next_pg_type_oid('46852'::pg_catalog.oid);

    ALTER TABLE "itest10" ALTER COLUMN "a" ADD GENERATED BY DEFAULT AS IDENTITY (
        SEQUENCE NAME "itest10_a_seq"
        START WITH 1
        INCREMENT BY 1
        NO MINVALUE
        NO MAXVALUE
        CACHE 1
    );

postgres: postgres regression [local] ALTER TABLE(GetNewObjectId+0xda) [0x50397a]
postgres: postgres regression [local] ALTER TABLE(GetNewRelFileNode+0xec) [0x52430c]
postgres: postgres regression [local] ALTER TABLE(RelationSetNewRelfilenode+0x79) [0x851d59]
postgres: postgres regression [local] ALTER TABLE(AlterSequence+0x1cd) [0x5d976d]
postgres: postgres regression [local] ALTER TABLE() [0x75d279]
postgres: postgres regression [local] ALTER TABLE(standard_ProcessUtility+0xb7) [0x75dec7]
postgres: postgres regression [local] ALTER TABLE() [0x75cb1d]
postgres: postgres regression [local] ALTER TABLE(standard_ProcessUtility+0xb7) [0x75dec7]
postgres: postgres regression [local] ALTER TABLE() [0x759f0b]
postgres: postgres regression [local] ALTER TABLE() [0x75ae91]
postgres: postgres regression [local] ALTER TABLE(PortalRun+0x250) [0x75b740]
postgres: postgres regression [local] ALTER TABLE() [0x757be7]
postgres: postgres regression [local] ALTER TABLE(PostgresMain+0xe08) [0x759968]
postgres: postgres regression [local] ALTER TABLE(PostmasterMain+0x1a99) [0x6e21a9]
postgres: postgres regression [local] ALTER TABLE(main+0x6b8) [0x65b958]
/lib64/libc.so.6(__libc_start_main+0xfd) [0x3f3bc1ed1d]
postgres: postgres regression [local] ALTER TABLE() [0x473899]


Judging by when we started to see buildfarm failures, I think that
commit 3d79013b9 probably broke it, but the problem seems latent
in the whole concept of transactional sequence information.

Not sure what we want to do about it.  One idea is to make
ALTER SEQUENCE not so transactional when in binary-upgrade mode.

(I'm also tempted to make GetNewRelFileNode complain if IsBinaryUpgrade
is true, but that's a separate matter.)

In any case, this is a "must fix" problem IMO, so I'll go add it to the
open items list.
        regards, tom lane


