Thread: Crash in record_type_typmod_compare
Hello,
In Citus, we have seen the following crash backtrace multiple times because of a NULL tupledesc, and we weren't sure whether it was related to Citus or to Postgres:
#0 equalTupleDescs (tupdesc1=0x0, tupdesc2=0x1b9f3f0) at tupdesc.c:417
417 tupdesc.c: No such file or directory.
#0 equalTupleDescs (tupdesc1=0x0, tupdesc2=0x1b9f3f0) at tupdesc.c:417
#1 0x000000000085b51f in record_type_typmod_compare (a=<optimized out>, b=<optimized out>, size=<optimized out>) at typcache.c:1761
#2 0x0000000000869c73 in hash_search_with_hash_value (hashp=0x1c10530, keyPtr=keyPtr@entry=0x7ffcfd3117b8, hashvalue=3194332168, action=action@entry=HASH_ENTER, foundPtr=foundPtr@entry=0x7ffcfd3117c0) at dynahash.c:987
#3 0x000000000086a3fd in hash_search (hashp=<optimized out>, keyPtr=keyPtr@entry=0x7ffcfd3117b8, action=action@entry=HASH_ENTER, foundPtr=foundPtr@entry=0x7ffcfd3117c0) at dynahash.c:911
#4 0x000000000085d0e1 in assign_record_type_typmod (tupDesc=<optimized out>, tupDesc@entry=0x1b9f3f0) at typcache.c:1801
#5 0x000000000061832b in BlessTupleDesc (tupdesc=0x1b9f3f0) at execTuples.c:2056
#6 TupleDescGetAttInMetadata (tupdesc=0x1b9f3f0) at execTuples.c:2081
#7 0x00007f2701878dee in CreateDistributedExecution (modLevel=ROW_MODIFY_READONLY, taskList=0x1c82398, hasReturning=<optimized out>, paramListInfo=0x1c3e5a0, tupleDescriptor=0x1b9f3f0, tupleStore=<optimized out>, targetPoolSize=16, xactProperties=0x7ffcfd311960, jobIdList=0x0) at executor/adaptive_executor.c:951
#8 0x00007f270187ba09 in AdaptiveExecutor (scanState=0x1b9eff0) at executor/adaptive_executor.c:676
#9 0x00007f270187c582 in CitusExecScan (node=0x1b9eff0) at executor/citus_custom_scan.c:182
#10 0x000000000060c9e2 in ExecProcNode (node=0x1b9eff0) at ../../../src/include/executor/executor.h:239
#11 ExecutePlan (execute_once=<optimized out>, dest=0x1abfc90, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x1b9eff0, estate=0x1b9ed50) at execMain.c:1646
#12 standard_ExecutorRun (queryDesc=0x1c3e660, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364
#13 0x00007f27018819bd in CitusExecutorRun (queryDesc=0x1c3e660, direction=ForwardScanDirection, count=0, execute_once=true) at executor/multi_executor.c:177
#14 0x00007f27000adfee in pgss_ExecutorRun (queryDesc=0x1c3e660, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at pg_stat_statements.c:891
#15 0x000000000074f97d in PortalRunSelect (portal=portal@entry=0x1b8ed00, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x1abfc90) at pquery.c:929
#16 0x0000000000750df0 in PortalRun (portal=portal@entry=0x1b8ed00, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=<optimized out>, dest=dest@entry=0x1abfc90, altdest=altdest@entry=0x1abfc90, completionTag=0x7ffcfd312090 "") at pquery.c:770
#17 0x000000000074e745 in exec_execute_message (max_rows=9223372036854775807, portal_name=0x1abf880 "") at postgres.c:2090
#18 PostgresMain (argc=<optimized out>, argv=argv@entry=0x1b4a0e8, dbname=<optimized out>, username=<optimized out>) at postgres.c:4308
#19 0x00000000006de9d8 in BackendRun (port=0x1b37230, port=0x1b37230) at postmaster.c:4437
#20 BackendStartup (port=0x1b37230) at postmaster.c:4128
#21 ServerLoop () at postmaster.c:1704
#22 0x00000000006df955 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1aba280) at postmaster.c:1377
#23 0x0000000000487a4e in main (argc=3, argv=0x1aba280) at main.c:228
I think this is related to Postgres because of the following sequence of events:
- In assign_record_type_typmod, the new hash entry's tupledesc is set to NULL when the descriptor is not already in the cache, and it is only set to an actual value in this line.
- Postgres can error out between those two lines, for example in find_or_make_matching_shared_tupledesc (possibly because of OOM), leaving a NULL tupledesc in the cache.
- With a NULL tupledesc in the hash table, the next comparison in record_type_typmod_compare crashes.
I manually added a line that errors out in find_or_make_matching_shared_tupledesc, and I was able to reproduce a similar crash with two subsequent simple SELECT queries. You can see the backtrace in the issue.
We should probably do HASH_ENTER only after we have a valid entry, so that we don't end up with a NULL entry in the cache even if an intermediate error happens. I will share a fix in this thread soon.
Best,
Talha.
>We should probably do HASH_ENTER only after we have a valid entry so that we don't end up with a NULL entry in the cache even if an intermediate error happens. I will share a fix in this thread soon.
I am attaching this patch.
From: Sait Talha Nisanci
Sent: Wednesday, March 31, 2021 11:50 AM
To: pgsql-hackers <pgsql-hackers@postgresql.org>
Cc: Metin Doslu <Metin.Doslu@microsoft.com>
Subject: Crash in record_type_typmod_compare
Sait Talha Nisanci <Sait.Nisanci@microsoft.com> writes:
>> We should probably do HASH_ENTER<https://github.com/postgres/postgres/blob/1509c6fc29c07d13c9a590fbd6f37c7576f58ba6/src/backend/utils/cache/typcache.c#L1974> only after we have a valid entry so that we don't end up with a NULL entry in the cache even if an intermediate error happens. I will share a fix in this thread soon.
> I am attaching this patch.

I see the hazard, but this seems like an expensive way to fix it, as it forces two hash searches for every insertion. Couldn't we just teach record_type_typmod_compare to say "not equal" if it sees a null tupdesc?

regards, tom lane
Hi,

On 2021-03-31 13:10:50 -0400, Tom Lane wrote:
> Sait Talha Nisanci <Sait.Nisanci@microsoft.com> writes:
> >> We should probably do HASH_ENTER<https://github.com/postgres/postgres/blob/1509c6fc29c07d13c9a590fbd6f37c7576f58ba6/src/backend/utils/cache/typcache.c#L1974> only after we have a valid entry so that we don't end up with a NULL entry in the cache even if an intermediate error happens. I will share a fix in this thread soon.
> > I am attaching this patch.
>
> I see the hazard, but this seems like an expensive way to fix it,
> as it forces two hash searches for every insertion.

Obviously not free - but at least it'd be overhead only in the insertion path. And the bucket should still be in L1 cache for the second insertion... I doubt that the cost of a separate HASH_ENTER is all that significant compared to find_or_make_matching_shared_tupledesc/CreateTupleDescCopy? We do the separate HASH_FIND/HASH_ENTER in plenty of other places that are much hotter than assign_record_type_typmod(), e.g. RelationIdGetRelation(). It could even be that the additional branches in the comparator would end up costing more in the aggregate...

> Couldn't we just
> teach record_type_typmod_compare to say "not equal" if it sees a
> null tupdesc?

Won't that lead to an accumulation of dead hash table entries over time? It also just seems quite wrong to have hash table entries that cannot ever be found via HASH_FIND/HASH_REMOVE, because record_type_typmod_compare() returns false once there's a NULL in there.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2021-03-31 13:10:50 -0400, Tom Lane wrote:
>> Couldn't we just
>> teach record_type_typmod_compare to say "not equal" if it sees a
>> null tupdesc?

> Won't that lead to an accumulation of dead hash table entries over time?

Yeah, if you have repeat failures there, which doesn't seem very likely. Still, I take your point that we're doing it the first way in other places, so maybe inventing a different way here isn't good.

regards, tom lane
Hi,

On 2021-03-31 13:26:34 +0000, Sait Talha Nisanci wrote:
> diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
> index 4915ef5934..4757e8fa80 100644
> --- a/src/backend/utils/cache/typcache.c
> +++ b/src/backend/utils/cache/typcache.c
> @@ -1970,18 +1970,16 @@ assign_record_type_typmod(TupleDesc tupDesc)
> CreateCacheMemoryContext();
> }
>
> - /* Find or create a hashtable entry for this tuple descriptor */
> + /* Find a hashtable entry for this tuple descriptor */
> recentry = (RecordCacheEntry *) hash_search(RecordCacheHash,
> (void *) &tupDesc,
> - HASH_ENTER, &found);
> + HASH_FIND, &found);
> if (found && recentry->tupdesc != NULL)
> {
> tupDesc->tdtypmod = recentry->tupdesc->tdtypmod;
> return;
> }
>
> - /* Not present, so need to manufacture an entry */
> - recentry->tupdesc = NULL;
> oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
>
> /* Look in the SharedRecordTypmodRegistry, if attached */
> @@ -1995,6 +1993,10 @@ assign_record_type_typmod(TupleDesc tupDesc)
> }
> ensure_record_cache_typmod_slot_exists(entDesc->tdtypmod);
> RecordCacheArray[entDesc->tdtypmod] = entDesc;
> + /* Not present, so need to manufacture an entry */
> + recentry = (RecordCacheEntry *) hash_search(RecordCacheHash,
> + (void *) &tupDesc,
> + HASH_ENTER, NULL);
> recentry->tupdesc = entDesc;

ISTM that the ensure_record_cache_typmod_slot_exists() should be moved to before find_or_make_matching_shared_tupledesc / CreateTupleDescCopy. Otherwise we can leak if the CreateTupleDescCopy() succeeds but ensure_record_cache_typmod_slot_exists() fails. Conversely, if ensure_record_cache_typmod_slot_exists() succeeds, but CreateTupleDescCopy() fails, we won't leak, since the former is just repalloc()ing allocations.

Greetings,

Andres Freund
Hi Andres,
Please see the updated patch, do you mean something like this? (there might be a simpler way for doing this)
Best,
Talha.
From: Andres Freund <andres@anarazel.de>
Sent: Wednesday, March 31, 2021 10:49 PM
To: Sait Talha Nisanci <Sait.Nisanci@microsoft.com>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; Metin Doslu <Metin.Doslu@microsoft.com>
Subject: [EXTERNAL] Re: Crash in record_type_typmod_compare
Hi,
On 2021-03-31 13:26:34 +0000, Sait Talha Nisanci wrote:
> diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
> index 4915ef5934..4757e8fa80 100644
> --- a/src/backend/utils/cache/typcache.c
> +++ b/src/backend/utils/cache/typcache.c
> @@ -1970,18 +1970,16 @@ assign_record_type_typmod(TupleDesc tupDesc)
> CreateCacheMemoryContext();
> }
>
> - /* Find or create a hashtable entry for this tuple descriptor */
> + /* Find a hashtable entry for this tuple descriptor */
> recentry = (RecordCacheEntry *) hash_search(RecordCacheHash,
> (void *) &tupDesc,
> - HASH_ENTER, &found);
> + HASH_FIND, &found);
> if (found && recentry->tupdesc != NULL)
> {
> tupDesc->tdtypmod = recentry->tupdesc->tdtypmod;
> return;
> }
>
> - /* Not present, so need to manufacture an entry */
> - recentry->tupdesc = NULL;
> oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
>
> /* Look in the SharedRecordTypmodRegistry, if attached */
> @@ -1995,6 +1993,10 @@ assign_record_type_typmod(TupleDesc tupDesc)
> }
> ensure_record_cache_typmod_slot_exists(entDesc->tdtypmod);
> RecordCacheArray[entDesc->tdtypmod] = entDesc;
> + /* Not present, so need to manufacture an entry */
> + recentry = (RecordCacheEntry *) hash_search(RecordCacheHash,
> + (void *) &tupDesc,
> + HASH_ENTER, NULL);
> recentry->tupdesc = entDesc;
ISTM that the ensure_record_cache_typmod_slot_exists() should be moved
to before find_or_make_matching_shared_tupledesc /
CreateTupleDescCopy. Otherwise we can leak if the CreateTupleDescCopy()
succeeds but ensure_record_cache_typmod_slot_exists() fails. Conversely,
if ensure_record_cache_typmod_slot_exists() succeeds, but
CreateTupleDescCopy() fails, we won't leak, since the former is just
repalloc()ing allocations.
Greetings,
Andres Freund
On Mon, 2021-04-05 at 12:07 +0000, Sait Talha Nisanci wrote:
> Hi Andres,
>
> Please see the updated patch, do you mean something like this? (there
> might be a simpler way for doing this)

Committed with minor revisions. My patch also avoids incrementing NextRecordTypmod until we've already called CreateTupleDescCopy(). Otherwise, an allocation failure could leave NextRecordTypmod unnecessarily incremented, which is a tiny leak.

Regards,

Jeff Davis