Re: Crash in record_type_typmod_compare - Mailing list pgsql-hackers
From | Sait Talha Nisanci |
---|---|
Subject | Re: Crash in record_type_typmod_compare |
Date | |
Msg-id | HE1PR8303MB00909BA3B6BBCB825990E78B917C9@HE1PR8303MB0090.EURPRD83.prod.outlook.com Whole thread Raw |
In response to | Crash in record_type_typmod_compare (Sait Talha Nisanci <Sait.Nisanci@microsoft.com>) |
Responses |
Re: Crash in record_type_typmod_compare
Re: Crash in record_type_typmod_compare |
List | pgsql-hackers |
>We should probably do HASH_ENTER only after we have a valid entry so that we don't end up with a NULL entry in the cache even if an intermediate error happens. I will share a fix in this thread soon.
I am attaching this patch.
From: Sait Talha Nisanci
Sent: Wednesday, March 31, 2021 11:50 AM
To: pgsql-hackers <pgsql-hackers@postgresql.org>
Cc: Metin Doslu <Metin.Doslu@microsoft.com>
Subject: Crash in record_type_typmod_compare
Sent: Wednesday, March 31, 2021 11:50 AM
To: pgsql-hackers <pgsql-hackers@postgresql.org>
Cc: Metin Doslu <Metin.Doslu@microsoft.com>
Subject: Crash in record_type_typmod_compare
Hello,
In citus, we have seen the following crash backtraces because of a NULL tupledesc multiple times and we weren't sure if this was related to citus or postgres:
#0 equalTupleDescs (tupdesc1=0x0, tupdesc2=0x1b9f3f0) at tupdesc.c:417
417 tupdesc.c: No such file or directory.
#0 equalTupleDescs (tupdesc1=0x0, tupdesc2=0x1b9f3f0) at tupdesc.c:417
#1 0x000000000085b51f in record_type_typmod_compare (a=<optimized out>, b=<optimized out>, size=<optimized out>) at typcache.c:1761
#2 0x0000000000869c73 in hash_search_with_hash_value (hashp=0x1c10530, keyPtr=keyPtr@entry=0x7ffcfd3117b8, hashvalue=3194332168, action=action@entry=HASH_ENTER, foundPtr=foundPtr@entry=0x7ffcfd3117c0) at dynahash.c:987
#3 0x000000000086a3fd in hash_search (hashp=<optimized out>, keyPtr=keyPtr@entry=0x7ffcfd3117b8, action=action@entry=HASH_ENTER, foundPtr=foundPtr@entry=0x7ffcfd3117c0) at dynahash.c:911
#4 0x000000000085d0e1 in assign_record_type_typmod (tupDesc=<optimized out>, tupDesc@entry=0x1b9f3f0) at typcache.c:1801
#5 0x000000000061832b in BlessTupleDesc (tupdesc=0x1b9f3f0) at execTuples.c:2056
#6 TupleDescGetAttInMetadata (tupdesc=0x1b9f3f0) at execTuples.c:2081
#7 0x00007f2701878dee in CreateDistributedExecution (modLevel=ROW_MODIFY_READONLY, taskList=0x1c82398, hasReturning=<optimized out>, paramListInfo=0x1c3e5a0, tupleDescriptor=0x1b9f3f0, tupleStore=<optimized out>, targetPoolSize=16, xactProperties=0x7ffcfd311960, jobIdList=0x0) at executor/adaptive_executor.c:951
#8 0x00007f270187ba09 in AdaptiveExecutor (scanState=0x1b9eff0) at executor/adaptive_executor.c:676
#9 0x00007f270187c582 in CitusExecScan (node=0x1b9eff0) at executor/citus_custom_scan.c:182
#10 0x000000000060c9e2 in ExecProcNode (node=0x1b9eff0) at ../../../src/include/executor/executor.h:239
#11 ExecutePlan (execute_once=<optimized out>, dest=0x1abfc90, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x1b9eff0, estate=0x1b9ed50) at execMain.c:1646
#12 standard_ExecutorRun (queryDesc=0x1c3e660, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364
#13 0x00007f27018819bd in CitusExecutorRun (queryDesc=0x1c3e660, direction=ForwardScanDirection, count=0, execute_once=true) at executor/multi_executor.c:177
#14 0x00007f27000adfee in pgss_ExecutorRun (queryDesc=0x1c3e660, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at pg_stat_statements.c:891
#15 0x000000000074f97d in PortalRunSelect (portal=portal@entry=0x1b8ed00, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x1abfc90) at pquery.c:929
#16 0x0000000000750df0 in PortalRun (portal=portal@entry=0x1b8ed00, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=<optimized out>, dest=dest@entry=0x1abfc90, altdest=altdest@entry=0x1abfc90, completionTag=0x7ffcfd312090 "") at pquery.c:770
#17 0x000000000074e745 in exec_execute_message (max_rows=9223372036854775807, portal_name=0x1abf880 "") at postgres.c:2090
#18 PostgresMain (argc=<optimized out>, argv=argv@entry=0x1b4a0e8, dbname=<optimized out>, username=<optimized out>) at postgres.c:4308
#19 0x00000000006de9d8 in BackendRun (port=0x1b37230, port=0x1b37230) at postmaster.c:4437
#20 BackendStartup (port=0x1b37230) at postmaster.c:4128
#21 ServerLoop () at postmaster.c:1704
#22 0x00000000006df955 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1aba280) at postmaster.c:1377
#23 0x0000000000487a4e in main (argc=3, argv=0x1aba280) at main.c:228
I think this is related to postgres because of the following events:
- In assign_record_type_typmod tupledesc will be set to NULL if it is not in the cache and it will be set to an actual value in this line.
- It is possible that postgres will error in between these two lines, hence leaving a NULL tupledesc in the cache. For example in find_or_make_matching_shared_tupledesc. (Possibly because of OOM)
- Now there is a NULL tupledesc in the hash table, hence when doing a comparison in record_type_typmod_compare, it will crash.
I have manually added a line to error in "find_or_make_matching_shared_tupledesc" and I was able to get a similar crash with two subsequent simple SELECT queries. You can see the backtrace in the issue.
We should probably do HASH_ENTER only after we have a valid entry so that we don't end up with a NULL entry in the cache even if an intermediate error happens. I will share a fix in this thread soon.
Best,
Talha.
Attachment
pgsql-hackers by date: