Thread: BUG #17056: Segmentation fault on altering the type of the foreign table column with a default

The following bug has been logged on the website:

Bug reference:      17056
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 14beta1
Operating system:   Ubuntu 20.04
Description:

When executing the following query (based on excerpt from
foreign_data.sql):

CREATE FOREIGN DATA WRAPPER dummy;
CREATE SERVER s0 FOREIGN DATA WRAPPER dummy;
CREATE FOREIGN TABLE ft1 (c1 integer NOT NULL) SERVER s0;
ALTER FOREIGN TABLE ft1 ADD COLUMN c8 integer DEFAULT 0;
ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE char(10);

The server crashes with the stack trace:

Core was generated by `postgres: law regression [local] ALTER FOREIGN TABLE
                        '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  pg_detoast_datum (datum=0x0) at fmgr.c:1724
1724            if (VARATT_IS_EXTENDED(datum))
(gdb) bt
#0  pg_detoast_datum (datum=0x0) at fmgr.c:1724
#1  0x000055f03f919267 in construct_md_array
(elems=elems@entry=0x7ffc24b6c3f0, nulls=nulls@entry=0x0, 
    ndims=ndims@entry=1, dims=dims@entry=0x7ffc24b6c340,
lbs=lbs@entry=0x7ffc24b6c344, elmtype=elmtype@entry=1042, 
    elmlen=-1, elmbyval=false, elmalign=105 'i') at arrayfuncs.c:3397
#2  0x000055f03f91952f in construct_array (elems=elems@entry=0x7ffc24b6c3f0,
nelems=nelems@entry=1, 
    elmtype=elmtype@entry=1042, elmlen=<optimized out>, elmbyval=<optimized
out>, elmalign=<optimized out>)
    at arrayfuncs.c:3328
#3  0x000055f03f6f3db7 in ATExecAlterColumnType (tab=0x7ffc24b6c400,
tab@entry=0x55f03ff27a20, 
    rel=rel@entry=0x7f2035994618, cmd=<optimized out>,
lockmode=lockmode@entry=8) at tablecmds.c:12276
#4  0x000055f03f705f24 in ATExecCmd (wqueue=wqueue@entry=0x7ffc24b6c700,
tab=tab@entry=0x55f03ff27a20, 
    cmd=<optimized out>, lockmode=lockmode@entry=8,
cur_pass=cur_pass@entry=1, context=context@entry=0x7ffc24b6c810)
    at tablecmds.c:4985
#5  0x000055f03f7063bb in ATRewriteCatalogs
(wqueue=wqueue@entry=0x7ffc24b6c700, lockmode=lockmode@entry=8, 
    context=context@entry=0x7ffc24b6c810) at
../../../src/include/nodes/nodes.h:604
#6  0x000055f03f706618 in ATController
(parsetree=parsetree@entry=0x55f03fe163d8, rel=rel@entry=0x7f2035994618, 
    cmds=0x55f03fe163a0, recurse=true, lockmode=lockmode@entry=8,
context=context@entry=0x7ffc24b6c810)
    at tablecmds.c:4376
#7  0x000055f03f7066a2 in AlterTable (stmt=stmt@entry=0x55f03fe163d8,
lockmode=lockmode@entry=8, 
    context=context@entry=0x7ffc24b6c810) at tablecmds.c:4023
#8  0x000055f03f8f7d47 in ProcessUtilitySlow
(pstate=pstate@entry=0x55f03ff278b0, pstmt=pstmt@entry=0x55f03fe166e8, 
    queryString=queryString@entry=0x55f03fe15690 "ALTER FOREIGN TABLE ft1
ALTER COLUMN c8 TYPE char(10);", 
    context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0,
queryEnv=queryEnv@entry=0x0, 
    dest=0x55f03fe167b8, qc=0x7ffc24b6cd20) at utility.c:1284
#9  0x000055f03f8f77bf in standard_ProcessUtility (pstmt=0x55f03fe166e8, 
    queryString=0x55f03fe15690 "ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE
char(10);", 
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x55f03fe167b8, qc=0x7ffc24b6cd20)
    at utility.c:1034
#10 0x000055f03f8f789e in ProcessUtility (pstmt=pstmt@entry=0x55f03fe166e8,
queryString=<optimized out>, 
    context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=<optimized out>,
queryEnv=<optimized out>, 
    dest=dest@entry=0x55f03fe167b8, qc=0x7ffc24b6cd20) at utility.c:525
#11 0x000055f03f8f3c65 in PortalRunUtility
(portal=portal@entry=0x55f03fe790f0, pstmt=pstmt@entry=0x55f03fe166e8, 
    isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x55f03fe167b8,

    qc=qc@entry=0x7ffc24b6cd20) at pquery.c:1159
#12 0x000055f03f8f48c0 in PortalRunMulti
(portal=portal@entry=0x55f03fe790f0, isTopLevel=isTopLevel@entry=true, 
    setHoldSnapshot=setHoldSnapshot@entry=false,
dest=dest@entry=0x55f03fe167b8, altdest=altdest@entry=0x55f03fe167b8, 
    qc=qc@entry=0x7ffc24b6cd20) at pquery.c:1305
#13 0x000055f03f8f559b in PortalRun (portal=portal@entry=0x55f03fe790f0,
count=count@entry=9223372036854775807, 
    isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true,
dest=dest@entry=0x55f03fe167b8, 
    altdest=altdest@entry=0x55f03fe167b8, qc=0x7ffc24b6cd20) at
pquery.c:779
#14 0x000055f03f8f1825 in exec_simple_query (
    query_string=query_string@entry=0x55f03fe15690 "ALTER FOREIGN TABLE ft1
ALTER COLUMN c8 TYPE char(10);")
    at postgres.c:1214
#15 0x000055f03f8f37f7 in PostgresMain (argc=argc@entry=1,
argv=argv@entry=0x7ffc24b6cf10, dbname=<optimized out>, 
    username=<optimized out>) at postgres.c:4486
#16 0x000055f03f84ee79 in BackendRun (port=port@entry=0x55f03fe36d20) at
postmaster.c:4491
#17 0x000055f03f852008 in BackendStartup (port=port@entry=0x55f03fe36d20) at
postmaster.c:4213
#18 0x000055f03f85224f in ServerLoop () at postmaster.c:1745
#19 0x000055f03f85379c in PostmasterMain (argc=3, argv=<optimized out>) at
postmaster.c:1417
#20 0x000055f03f7949f9 in main (argc=3, argv=0x55f03fe0f950) at main.c:209


PG Bug reporting form <noreply@postgresql.org> writes:
> CREATE FOREIGN DATA WRAPPER dummy;
> CREATE SERVER s0 FOREIGN DATA WRAPPER dummy;
> CREATE FOREIGN TABLE ft1 (c1 integer NOT NULL) SERVER s0;
> ALTER FOREIGN TABLE ft1 ADD COLUMN c8 integer DEFAULT 0;
> ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE char(10);

Hmm.  The equivalent DDL on a plain table works fine, but this is
crashing in the code that manipulates attmissingval.  I suspect some
confusion about whether a foreign table column should even *have*
attmissingval.  Andrew, any thoughts?

            regards, tom lane



On 6/10/21 6:10 PM, Tom Lane wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
>> CREATE FOREIGN DATA WRAPPER dummy;
>> CREATE SERVER s0 FOREIGN DATA WRAPPER dummy;
>> CREATE FOREIGN TABLE ft1 (c1 integer NOT NULL) SERVER s0;
>> ALTER FOREIGN TABLE ft1 ADD COLUMN c8 integer DEFAULT 0;
>> ALTER FOREIGN TABLE ft1 ALTER COLUMN c8 TYPE char(10);
> Hmm.  The equivalent DDL on a plain table works fine, but this is
> crashing in the code that manipulates attmissingval.  I suspect some
> confusion about whether a foreign table column should even *have*
> attmissingval.  Andrew, any thoughts?
>
>             


My initial thought would be that it should not. If the foreign table has
rows with missing columns then it should be up to the foreign server to
supply them transparently. We have no notion what the foreign semantics
of missing columns are.


I can take a look at a fix tomorrow. My inclination would be simply to
skip setting attmissingval for foreign tables.


cheers


andrew



--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Andrew Dunstan <andrew@dunslane.net> writes:
> On 6/10/21 6:10 PM, Tom Lane wrote:
>> Hmm.  The equivalent DDL on a plain table works fine, but this is
>> crashing in the code that manipulates attmissingval.  I suspect some
>> confusion about whether a foreign table column should even *have*
>> attmissingval.  Andrew, any thoughts?

> My initial thought would be that it should not. If the foreign table has
> rows with missing columns then it should be up to the foreign server to
> supply them transparently. We have no notion what the foreign semantics
> of missing columns are.

Yeah, that was kind of what I thought.  Probably only RELKIND_RELATION
rels should ever have attmissingval; but certainly, anything without
local storage should not.

> I can take a look at a fix tomorrow. My inclination would be simply to
> skip setting attmissingval for foreign tables.

Seems like in addition to that, we'll need a defense in this specific
code to cope with the case where the foreign column already has an
attmissingval.  Or maybe, the logic to not store a new one will be enough
to keep us from reaching this crash; but we need to be sure it is enough.

            regards, tom lane



On 6/10/21 7:11 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 6/10/21 6:10 PM, Tom Lane wrote:
>>> Hmm.  The equivalent DDL on a plain table works fine, but this is
>>> crashing in the code that manipulates attmissingval.  I suspect some
>>> confusion about whether a foreign table column should even *have*
>>> attmissingval.  Andrew, any thoughts?
>> My initial thought would be that it should not. If the foreign table has
>> rows with missing columns then it should be up to the foreign server to
>> supply them transparently. We have no notion what the foreign semantics
>> of missing columns are.
> Yeah, that was kind of what I thought.  Probably only RELKIND_RELATION
> rels should ever have attmissingval; but certainly, anything without
> local storage should not.
>
>> I can take a look at a fix tomorrow. My inclination would be simply to
>> skip setting attmissingval for foreign tables.
> Seems like in addition to that, we'll need a defense in this specific
> code to cope with the case where the foreign column already has an
> attmissingval.  Or maybe, the logic to not store a new one will be enough
> to keep us from reaching this crash; but we need to be sure it is enough.


The first piece could be fairly simply accomplished by something like this

diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index afa830d924..ac89efefe8 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -2287,7 +2287,8 @@ StoreAttrDefault(Relation rel, AttrNumber attnum,
        valuesAtt[Anum_pg_attribute_atthasdef - 1] = true;
        replacesAtt[Anum_pg_attribute_atthasdef - 1] = true;
 
-       if (add_column_mode && !attgenerated)
+       if (rel->rd_rel->relkind == RELKIND_RELATION  && add_column_mode &&
+           !attgenerated)
        {
            expr2 = expression_planner(expr2);
            estate = CreateExecutorState();


I'm guessing we want to exclude materialized views and partitioned
tables as well as things without local storage.

How to ignore something that's got into the catalog that shouldn't be
there is less clear. At the point where we fetch missing values all we
have access to is a TupleDesc.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




On 6/11/21 5:59 PM, Andrew Dunstan wrote:
>
> How to ignore something that's got into the catalog that shouldn't be
> there is less clear. At the point where we fetch missing values all we
> have access to is a TupleDesc.
>
>

On further reflection I guess we'll need to make that check at the point
where we fill in the TupleDesc.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




On 6/10/21 7:11 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 6/10/21 6:10 PM, Tom Lane wrote:
>>> Hmm.  The equivalent DDL on a plain table works fine, but this is
>>> crashing in the code that manipulates attmissingval.  I suspect some
>>> confusion about whether a foreign table column should even *have*
>>> attmissingval.  Andrew, any thoughts?
>> My initial thought would be that it should not. If the foreign table has
>> rows with missing columns then it should be up to the foreign server to
>> supply them transparently. We have no notion what the foreign semantics
>> of missing columns are.
> Yeah, that was kind of what I thought.  Probably only RELKIND_RELATION
> rels should ever have attmissingval; but certainly, anything without
> local storage should not.
>
>> I can take a look at a fix tomorrow. My inclination would be simply to
>> skip setting attmissingval for foreign tables.
> Seems like in addition to that, we'll need a defense in this specific
> code to cope with the case where the foreign column already has an
> attmissingval.  Or maybe, the logic to not store a new one will be enough
> to keep us from reaching this crash; but we need to be sure it is enough.


Ok, I think the attached is the least we need to do. Given this I
haven't been able to induce a crash even when the catalog is hacked with
bogus missing values on a foreign table. But I'm not 100% convinced I
have fixed all the places that need to be fixed.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com


Attachment
On 6/12/21 5:50 PM, Andres Freund wrote:
> Hi,
>
> On 2021-06-12 17:40:38 -0400, Andrew Dunstan wrote:
>> Ok, I think the attached is the least we need to do. Given this I
>> haven't been able to induce a crash even when the catalog is hacked with
>> bogus missing values on a foreign table. But I'm not 100% convinced I
>> have fixed all the places that need to be fixed.
> Hm. There's a few places that look at atthasmissing and just assume that
> there's corresponding information about the missing field. And as far as
> I can see the proposed changes in RelationBuildTupleDesc() don't unset
> atthasmissing, they just prevent the constraint part of the tuple desc
> from being filled. Wouldn't this cause problems if we reach code like
>

Yes, you're right. This version should take care of things better.


Thanks for looking.


cheers


andrew


Attachment
On 2021-Jun-12, Andrew Dunstan wrote:

> +    /* Don't do anything unless it's a RELKIND type relation */
> +    if (tablerel->rd_rel->relkind != RELKIND_RELATION)
> +    {
> +        table_close(tablerel, AccessExclusiveLock);
> +        return;
> +    }

"RELKIND type relation" is the wrong phrase ... maybe "it's a plain
table" is good enough?  (Ditto in RelationBuildTupleDesc).

>      /*
>       * Here we go --- change the recorded column type and collation.  (Note
>       * heapTup is a copy of the syscache entry, so okay to scribble on.) First
> -     * fix up the missing value if any.
> +     * fix up the missing value if any. There shouldn't be any missing values
> +     * for anything except RELKIND_RELATION relations, but if there are, ignore
> +     * them.
>       */
> -    if (attTup->atthasmissing)
> +    if (rel->rd_rel->relkind == RELKIND_RELATION  && attTup->atthasmissing)

Would it be sensible to have a macro "AttributeHasMissingVal(rel,
attTup)", to use instead of reading atthasmissing directly?  The macro
would check the relkind, and also serve as documentation that said check
is necessary.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"Uno puede defenderse de los ataques; contra los elogios se esta indefenso"



On 6/14/21 3:13 PM, Alvaro Herrera wrote:
> On 2021-Jun-12, Andrew Dunstan wrote:
>
>> +    /* Don't do anything unless it's a RELKIND type relation */
>> +    if (tablerel->rd_rel->relkind != RELKIND_RELATION)
>> +    {
>> +        table_close(tablerel, AccessExclusiveLock);
>> +        return;
>> +    }
> "RELKIND type relation" is the wrong phrase ... maybe "it's a plain
> table" is good enough?  (Ditto in RelationBuildTupleDesc).



OK, will change.



>
>>      /*
>>       * Here we go --- change the recorded column type and collation.  (Note
>>       * heapTup is a copy of the syscache entry, so okay to scribble on.) First
>> -     * fix up the missing value if any.
>> +     * fix up the missing value if any. There shouldn't be any missing values
>> +     * for anything except RELKIND_RELATION relations, but if there are, ignore
>> +     * them.
>>       */
>> -    if (attTup->atthasmissing)
>> +    if (rel->rd_rel->relkind == RELKIND_RELATION  && attTup->atthasmissing)
> Would it be sensible to have a macro "AttributeHasMissingVal(rel,
> attTup)", to use instead of reading atthasmissing directly?  The macro
> would check the relkind, and also serve as documentation that said check
> is necessary.



Well AFAIK this is the only place we actually need this combination of
tests, and I'm not a huge fan of defining a macro to use in one spot.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com