Thread: Crash in record_type_typmod_compare
Hello,
In Citus, we have seen the following crash backtrace multiple times because of a NULL tupledesc, and we weren't sure whether it was related to Citus or to Postgres:
#0 equalTupleDescs (tupdesc1=0x0, tupdesc2=0x1b9f3f0) at tupdesc.c:417
417 tupdesc.c: No such file or directory.
#0 equalTupleDescs (tupdesc1=0x0, tupdesc2=0x1b9f3f0) at tupdesc.c:417
#1 0x000000000085b51f in record_type_typmod_compare (a=<optimized out>, b=<optimized out>, size=<optimized out>) at typcache.c:1761
#2 0x0000000000869c73 in hash_search_with_hash_value (hashp=0x1c10530, keyPtr=keyPtr@entry=0x7ffcfd3117b8, hashvalue=3194332168, action=action@entry=HASH_ENTER, foundPtr=foundPtr@entry=0x7ffcfd3117c0) at dynahash.c:987
#3 0x000000000086a3fd in hash_search (hashp=<optimized out>, keyPtr=keyPtr@entry=0x7ffcfd3117b8, action=action@entry=HASH_ENTER, foundPtr=foundPtr@entry=0x7ffcfd3117c0) at dynahash.c:911
#4 0x000000000085d0e1 in assign_record_type_typmod (tupDesc=<optimized out>, tupDesc@entry=0x1b9f3f0) at typcache.c:1801
#5 0x000000000061832b in BlessTupleDesc (tupdesc=0x1b9f3f0) at execTuples.c:2056
#6 TupleDescGetAttInMetadata (tupdesc=0x1b9f3f0) at execTuples.c:2081
#7 0x00007f2701878dee in CreateDistributedExecution (modLevel=ROW_MODIFY_READONLY, taskList=0x1c82398, hasReturning=<optimized out>, paramListInfo=0x1c3e5a0, tupleDescriptor=0x1b9f3f0, tupleStore=<optimized out>, targetPoolSize=16, xactProperties=0x7ffcfd311960, jobIdList=0x0) at executor/adaptive_executor.c:951
#8 0x00007f270187ba09 in AdaptiveExecutor (scanState=0x1b9eff0) at executor/adaptive_executor.c:676
#9 0x00007f270187c582 in CitusExecScan (node=0x1b9eff0) at executor/citus_custom_scan.c:182
#10 0x000000000060c9e2 in ExecProcNode (node=0x1b9eff0) at ../../../src/include/executor/executor.h:239
#11 ExecutePlan (execute_once=<optimized out>, dest=0x1abfc90, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x1b9eff0, estate=0x1b9ed50) at execMain.c:1646
#12 standard_ExecutorRun (queryDesc=0x1c3e660, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364
#13 0x00007f27018819bd in CitusExecutorRun (queryDesc=0x1c3e660, direction=ForwardScanDirection, count=0, execute_once=true) at executor/multi_executor.c:177
#14 0x00007f27000adfee in pgss_ExecutorRun (queryDesc=0x1c3e660, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at pg_stat_statements.c:891
#15 0x000000000074f97d in PortalRunSelect (portal=portal@entry=0x1b8ed00, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x1abfc90) at pquery.c:929
#16 0x0000000000750df0 in PortalRun (portal=portal@entry=0x1b8ed00, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=<optimized out>, dest=dest@entry=0x1abfc90, altdest=altdest@entry=0x1abfc90, completionTag=0x7ffcfd312090 "") at pquery.c:770
#17 0x000000000074e745 in exec_execute_message (max_rows=9223372036854775807, portal_name=0x1abf880 "") at postgres.c:2090
#18 PostgresMain (argc=<optimized out>, argv=argv@entry=0x1b4a0e8, dbname=<optimized out>, username=<optimized out>) at postgres.c:4308
#19 0x00000000006de9d8 in BackendRun (port=0x1b37230, port=0x1b37230) at postmaster.c:4437
#20 BackendStartup (port=0x1b37230) at postmaster.c:4128
#21 ServerLoop () at postmaster.c:1704
#22 0x00000000006df955 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1aba280) at postmaster.c:1377
#23 0x0000000000487a4e in main (argc=3, argv=0x1aba280) at main.c:228
I think this is related to Postgres because of the following sequence of events:
- In assign_record_type_typmod, the new hash entry's tupledesc is set to NULL when the descriptor is not already in the cache, and it is only set to an actual value in this line.
- Postgres can error out between those two lines, for example in find_or_make_matching_shared_tupledesc (possibly because of OOM), leaving a NULL tupledesc in the cache.
- With a NULL tupledesc in the hash table, the next comparison in record_type_typmod_compare crashes.
I manually added a line that errors out in find_or_make_matching_shared_tupledesc, and I was able to reproduce a similar crash with two subsequent simple SELECT queries. You can see the backtrace in the issue.
We should probably do HASH_ENTER only after we have a valid entry, so that we don't end up with a NULL entry in the cache even if an intermediate error happens. I will share a fix in this thread soon.
Best,
Talha.
>We should probably do HASH_ENTER only after we have a valid entry so that we don't end up with a NULL entry in the cache even if an intermediate error happens. I will share a fix in this thread soon.
I am attaching this patch.
From: Sait Talha Nisanci
Sent: Wednesday, March 31, 2021 11:50 AM
To: pgsql-hackers <pgsql-hackers@postgresql.org>
Cc: Metin Doslu <Metin.Doslu@microsoft.com>
Subject: Crash in record_type_typmod_compare
Sait Talha Nisanci <Sait.Nisanci@microsoft.com> writes:
>> We should probably do HASH_ENTER<https://github.com/postgres/postgres/blob/1509c6fc29c07d13c9a590fbd6f37c7576f58ba6/src/backend/utils/cache/typcache.c#L1974> only after we have a valid entry so that we don't end up with a NULL entry in the cache even if an intermediate error happens. I will share a fix in this thread soon.
> I am attaching this patch.

I see the hazard, but this seems like an expensive way to fix it, as it forces two hash searches for every insertion. Couldn't we just teach record_type_typmod_compare to say "not equal" if it sees a null tupdesc?

regards, tom lane
Hi,

On 2021-03-31 13:10:50 -0400, Tom Lane wrote:
> Sait Talha Nisanci <Sait.Nisanci@microsoft.com> writes:
> >> We should probably do HASH_ENTER<https://github.com/postgres/postgres/blob/1509c6fc29c07d13c9a590fbd6f37c7576f58ba6/src/backend/utils/cache/typcache.c#L1974> only after we have a valid entry so that we don't end up with a NULL entry in the cache even if an intermediate error happens. I will share a fix in this thread soon.
> > I am attaching this patch.
>
> I see the hazard, but this seems like an expensive way to fix it,
> as it forces two hash searches for every insertion.

Obviously not free - but at least it'd be overhead only in the insertion path. And the bucket should still be in L1 cache for the second insertion... I doubt that the cost of a separate HASH_ENTER is all that significant compared to find_or_make_matching_shared_tupledesc/CreateTupleDescCopy? We do the separate HASH_FIND/HASH_ENTER in plenty of other places that are much hotter than assign_record_type_typmod(), e.g. RelationIdGetRelation(). It could even be that the additional branches in the comparator would end up costing more in the aggregate...

> Couldn't we just
> teach record_type_typmod_compare to say "not equal" if it sees a
> null tupdesc?

Won't that lead to an accumulation of dead hash table entries over time? It also just seems quite wrong to have hash table entries that cannot ever be found via HASH_FIND/HASH_REMOVE, because record_type_typmod_compare() returns false once there's a NULL in there.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> On 2021-03-31 13:10:50 -0400, Tom Lane wrote:
>> Couldn't we just
>> teach record_type_typmod_compare to say "not equal" if it sees a
>> null tupdesc?

> Won't that lead to an accumulation of dead hash table entries over time?

Yeah, if you have repeat failures there, which doesn't seem very likely. Still, I take your point that we're doing it the first way in other places, so maybe inventing a different way here isn't good.

regards, tom lane
Hi,

On 2021-03-31 13:26:34 +0000, Sait Talha Nisanci wrote:
> diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
> index 4915ef5934..4757e8fa80 100644
> --- a/src/backend/utils/cache/typcache.c
> +++ b/src/backend/utils/cache/typcache.c
> @@ -1970,18 +1970,16 @@ assign_record_type_typmod(TupleDesc tupDesc)
> CreateCacheMemoryContext();
> }
>
> - /* Find or create a hashtable entry for this tuple descriptor */
> + /* Find a hashtable entry for this tuple descriptor */
> recentry = (RecordCacheEntry *) hash_search(RecordCacheHash,
> (void *) &tupDesc,
> - HASH_ENTER, &found);
> + HASH_FIND, &found);
> if (found && recentry->tupdesc != NULL)
> {
> tupDesc->tdtypmod = recentry->tupdesc->tdtypmod;
> return;
> }
>
> - /* Not present, so need to manufacture an entry */
> - recentry->tupdesc = NULL;
> oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
>
> /* Look in the SharedRecordTypmodRegistry, if attached */
> @@ -1995,6 +1993,10 @@ assign_record_type_typmod(TupleDesc tupDesc)
> }
> ensure_record_cache_typmod_slot_exists(entDesc->tdtypmod);
> RecordCacheArray[entDesc->tdtypmod] = entDesc;
> + /* Not present, so need to manufacture an entry */
> + recentry = (RecordCacheEntry *) hash_search(RecordCacheHash,
> + (void *) &tupDesc,
> + HASH_ENTER, NULL);
> recentry->tupdesc = entDesc;

ISTM that the ensure_record_cache_typmod_slot_exists() should be moved to before find_or_make_matching_shared_tupledesc / CreateTupleDescCopy. Otherwise we can leak if the CreateTupleDescCopy() succeeds but ensure_record_cache_typmod_slot_exists() fails. Conversely, if ensure_record_cache_typmod_slot_exists() succeeds, but CreateTupleDescCopy() fails, we won't leak, since the former is just repalloc()ing allocations.

Greetings,

Andres Freund
Hi Andres,
Please see the updated patch, do you mean something like this? (there might be a simpler way for doing this)
Best,
Talha.
From: Andres Freund <andres@anarazel.de>
Sent: Wednesday, March 31, 2021 10:49 PM
To: Sait Talha Nisanci <Sait.Nisanci@microsoft.com>
Cc: pgsql-hackers <pgsql-hackers@postgresql.org>; Metin Doslu <Metin.Doslu@microsoft.com>
Subject: [EXTERNAL] Re: Crash in record_type_typmod_compare
Hi,
On 2021-03-31 13:26:34 +0000, Sait Talha Nisanci wrote:
> diff --git a/src/backend/utils/cache/typcache.c b/src/backend/utils/cache/typcache.c
> index 4915ef5934..4757e8fa80 100644
> --- a/src/backend/utils/cache/typcache.c
> +++ b/src/backend/utils/cache/typcache.c
> @@ -1970,18 +1970,16 @@ assign_record_type_typmod(TupleDesc tupDesc)
> CreateCacheMemoryContext();
> }
>
> - /* Find or create a hashtable entry for this tuple descriptor */
> + /* Find a hashtable entry for this tuple descriptor */
> recentry = (RecordCacheEntry *) hash_search(RecordCacheHash,
> (void *) &tupDesc,
> - HASH_ENTER, &found);
> + HASH_FIND, &found);
> if (found && recentry->tupdesc != NULL)
> {
> tupDesc->tdtypmod = recentry->tupdesc->tdtypmod;
> return;
> }
>
> - /* Not present, so need to manufacture an entry */
> - recentry->tupdesc = NULL;
> oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
>
> /* Look in the SharedRecordTypmodRegistry, if attached */
> @@ -1995,6 +1993,10 @@ assign_record_type_typmod(TupleDesc tupDesc)
> }
> ensure_record_cache_typmod_slot_exists(entDesc->tdtypmod);
> RecordCacheArray[entDesc->tdtypmod] = entDesc;
> + /* Not present, so need to manufacture an entry */
> + recentry = (RecordCacheEntry *) hash_search(RecordCacheHash,
> + (void *) &tupDesc,
> + HASH_ENTER, NULL);
> recentry->tupdesc = entDesc;
ISTM that the ensure_record_cache_typmod_slot_exists() should be moved
to before find_or_make_matching_shared_tupledesc /
CreateTupleDescCopy. Otherwise we can leak if the CreateTupleDescCopy()
succeeds but ensure_record_cache_typmod_slot_exists() fails. Conversely,
if ensure_record_cache_typmod_slot_exists() succeeds, but
CreateTupleDescCopy() fails, we won't leak, since the former is just
repalloc()ing allocations.
Greetings,
Andres Freund
On Mon, 2021-04-05 at 12:07 +0000, Sait Talha Nisanci wrote:
> Hi Andres,
>
> Please see the updated patch, do you mean something like this? (there
> might be a simpler way for doing this)

Committed with minor revisions. My patch also avoids incrementing NextRecordTypmod until we've already called CreateTupleDescCopy(). Otherwise, an allocation failure could leave NextRecordTypmod unnecessarily incremented, which is a tiny leak.

Regards,

Jeff Davis