Re: Crash in record_type_typmod_compare - Mailing list pgsql-hackers

From Sait Talha Nisanci
Subject Re: Crash in record_type_typmod_compare
Date
Msg-id HE1PR8303MB00909BA3B6BBCB825990E78B917C9@HE1PR8303MB0090.EURPRD83.prod.outlook.com
Whole thread Raw
In response to Crash in record_type_typmod_compare  (Sait Talha Nisanci <Sait.Nisanci@microsoft.com>)
Responses Re: Crash in record_type_typmod_compare  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Crash in record_type_typmod_compare  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
>We should probably do HASH_ENTER only after we have a valid entry so that we don't end up with a NULL entry in the cache even if an intermediate error happens. I will share a fix in this thread soon.

I am attaching this patch.







From: Sait Talha Nisanci
Sent: Wednesday, March 31, 2021 11:50 AM
To: pgsql-hackers <pgsql-hackers@postgresql.org>
Cc: Metin Doslu <Metin.Doslu@microsoft.com>
Subject: Crash in record_type_typmod_compare
 
Hello,

In citus, we have seen the following crash backtraces because of a NULL tupledesc multiple times and we weren't sure if this was related to citus or postgres:

#0  equalTupleDescs (tupdesc1=0x0, tupdesc2=0x1b9f3f0) at tupdesc.c:417
417	tupdesc.c: No such file or directory.
#0  equalTupleDescs (tupdesc1=0x0, tupdesc2=0x1b9f3f0) at tupdesc.c:417
#1  0x000000000085b51f in record_type_typmod_compare (a=<optimized out>, b=<optimized out>, size=<optimized out>) at typcache.c:1761
#2  0x0000000000869c73 in hash_search_with_hash_value (hashp=0x1c10530, keyPtr=keyPtr@entry=0x7ffcfd3117b8, hashvalue=3194332168, action=action@entry=HASH_ENTER, foundPtr=foundPtr@entry=0x7ffcfd3117c0) at dynahash.c:987
#3  0x000000000086a3fd in hash_search (hashp=<optimized out>, keyPtr=keyPtr@entry=0x7ffcfd3117b8, action=action@entry=HASH_ENTER, foundPtr=foundPtr@entry=0x7ffcfd3117c0) at dynahash.c:911
#4  0x000000000085d0e1 in assign_record_type_typmod (tupDesc=<optimized out>, tupDesc@entry=0x1b9f3f0) at typcache.c:1801
#5  0x000000000061832b in BlessTupleDesc (tupdesc=0x1b9f3f0) at execTuples.c:2056
#6  TupleDescGetAttInMetadata (tupdesc=0x1b9f3f0) at execTuples.c:2081
#7  0x00007f2701878dee in CreateDistributedExecution (modLevel=ROW_MODIFY_READONLY, taskList=0x1c82398, hasReturning=<optimized out>, paramListInfo=0x1c3e5a0, tupleDescriptor=0x1b9f3f0, tupleStore=<optimized out>, targetPoolSize=16, xactProperties=0x7ffcfd311960, jobIdList=0x0) at executor/adaptive_executor.c:951
#8  0x00007f270187ba09 in AdaptiveExecutor (scanState=0x1b9eff0) at executor/adaptive_executor.c:676
#9  0x00007f270187c582 in CitusExecScan (node=0x1b9eff0) at executor/citus_custom_scan.c:182
#10 0x000000000060c9e2 in ExecProcNode (node=0x1b9eff0) at ../../../src/include/executor/executor.h:239
#11 ExecutePlan (execute_once=<optimized out>, dest=0x1abfc90, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x1b9eff0, estate=0x1b9ed50) at execMain.c:1646
#12 standard_ExecutorRun (queryDesc=0x1c3e660, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:364
#13 0x00007f27018819bd in CitusExecutorRun (queryDesc=0x1c3e660, direction=ForwardScanDirection, count=0, execute_once=true) at executor/multi_executor.c:177
#14 0x00007f27000adfee in pgss_ExecutorRun (queryDesc=0x1c3e660, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at pg_stat_statements.c:891
#15 0x000000000074f97d in PortalRunSelect (portal=portal@entry=0x1b8ed00, forward=forward@entry=true, count=0, count@entry=9223372036854775807, dest=dest@entry=0x1abfc90) at pquery.c:929
#16 0x0000000000750df0 in PortalRun (portal=portal@entry=0x1b8ed00, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=<optimized out>, dest=dest@entry=0x1abfc90, altdest=altdest@entry=0x1abfc90, completionTag=0x7ffcfd312090 "") at pquery.c:770
#17 0x000000000074e745 in exec_execute_message (max_rows=9223372036854775807, portal_name=0x1abf880 "") at postgres.c:2090
#18 PostgresMain (argc=<optimized out>, argv=argv@entry=0x1b4a0e8, dbname=<optimized out>, username=<optimized out>) at postgres.c:4308
#19 0x00000000006de9d8 in BackendRun (port=0x1b37230, port=0x1b37230) at postmaster.c:4437
#20 BackendStartup (port=0x1b37230) at postmaster.c:4128
#21 ServerLoop () at postmaster.c:1704
#22 0x00000000006df955 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1aba280) at postmaster.c:1377
#23 0x0000000000487a4e in main (argc=3, argv=0x1aba280) at main.c:228

This is the issue: https://github.com/citusdata/citus/issues/3825


I think this is related to postgres because of the following events:

I have manually added a line to error in "find_or_make_matching_shared_tupledesc" and I was able to get a similar crash with two subsequent simple SELECT queries. You can see the backtrace in the issue.

We should probably do HASH_ENTER only after we have a valid entry so that we don't end up with a NULL entry in the cache even if an intermediate error happens. I will share a fix in this thread soon.

Best,
Talha.

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: [PATCH] add concurrent_abort callback for output plugin
Next
From: Denis Hirn
Date:
Subject: Re: [PATCH] Allow multiple recursive self-references