Re: pg_upgrade failed with ERROR: null relpartbound for relation 18159 error. - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: pg_upgrade failed with ERROR: null relpartbound for relation 18159 error.
Msg-id 20181005070518.GE14664@paquier.xyz
In response to Re: pg_upgrade failed with ERROR: null relpartbound for relation 18159 error. (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses Re: pg_upgrade failed with ERROR: null relpartbound for relation 18159 error. (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
List pgsql-hackers
On Fri, Oct 05, 2018 at 03:06:52PM +0900, Amit Langote wrote:
> To reproduce, the following works too (after creating the objects as
> described above):
>
> alter table partkey_t detach partition partkey_t_1;
> alter table partkey_t attach partition partkey_t_1 for values from (0) to
> (1000);
> ERROR:  null relpartbound for relation 16396
> CONTEXT:  SQL function "my_int4_sort"
>
> The stack at the time of the error:
>
> (gdb) bt
> #0  RelationBuildPartitionDesc
> #1  0x00000000009bf04e in RelationBuildDesc
> #2  0x00000000009c1784 in RelationClearRelation
> #3  0x00000000009c1cc5 in RelationFlushRelation
> #4  0x00000000009c1dd9 in RelationCacheInvalidateEntry
> #5  0x00000000009b9496 in LocalExecuteInvalidationMessage
> #6  0x00000000009b91ec in ProcessInvalidationMessages
> #7  0x00000000009b9cdb in CommandEndInvalidationMessages
> #8  0x00000000005346ef in AtCCI_LocalCache
> #9  0x0000000000534124 in CommandCounterIncrement
> #10 0x00000000006c579d in fmgr_sql
> #11 0x00000000009de7c2 in FunctionCall2Coll
> #12 0x000000000058ac9f in partition_rbound_cmp
> #13 0x0000000000588059 in check_new_partition_bound
> #14 0x000000000067f536 in ATExecAttachPartition
> <snip>
>
> So, the CommandCounterIncrement done in fmgr_sql causes partkey_t's
> PartitionDesc to be recomputed, which counts partkey_t_1 as its child
> because ATExecAttachPartition has already finished CreateInheritance which
> would've sent out an invalidation message for partkey_t.
>
> As of commit 2fbdf1b38bc [1], which has been applied in 11 and HEAD
> branches, RelationBuildPartitionDesc emits an error if we don't find
> relpartbound set for a child found by scanning pg_inherits, instead of
> skipping such children.  While that commit switched the order of creating
> pg_inherits entry and checking a new bound against existing bounds in
> DefineRelation in light of aforementioned change, it didn't in
> ATExecAttachPartition, hence this error.
>
> Attached patch fixes that.

Could you please add a minimal regression test to your patch?  That's
the second bug related to ATTACH PARTITION I am looking at today.

> I thought we'd need to apply this to 10, 11, HEAD, but I couldn't
> reproduce this in 10.  That's because the above commit wasn't applied to
> 10, so the child that causes this error is being skipped in 10's case.

Hmm.  Indeed, v10 does not complain but HEAD does.  (I ran the attached
SQL file, which is the complete test case both of you have compiled).
--
Michael
