COPY FROM crash - Mailing list pgsql-hackers

From Zhang Mingli
Subject COPY FROM crash
Msg-id f722b8fb-1962-4015-8578-e2bd77818ac9@Spark
List pgsql-hackers
Hi, all

I hit a crash when running COPY FROM on partitioned tables with a large amount of data in Cloudberry DB [0] (based on Postgres 14.4 and Greenplum 7).

I tested on Postgres and it has a similar issue (a different place, but the same function).

However, it is a little hard to reproduce, because it only happens when the next tuple is inserted after a previous COPY multi-insert buffer has been flushed.

To reproduce it easily, change the macros to:

#define MAX_BUFFERED_TUPLES 1
#define MAX_PARTITION_BUFFERS 0

Then configure and run make install. Running initdb produces a core dump with the following backtrace:
```
#0 0x000055de617211b9 in CopyMultiInsertInfoNextFreeSlot (miinfo=0x7ffce496d360, rri=0x55de6368ba88)
 at copyfrom.c:592
#1 0x000055de61721ff1 in CopyFrom (cstate=0x55de63592ce8) at copyfrom.c:985
#2 0x000055de6171dd86 in DoCopy (pstate=0x55de63589e00, stmt=0x55de635347d8, stmt_location=0, stmt_len=195,
 processed=0x7ffce496d590) at copy.c:306
#3 0x000055de61ad7ce8 in standard_ProcessUtility (pstmt=0x55de635348a8,
 queryString=0x55de63533960 "COPY information_schema.sql_features (feature_id, feature_name, sub_feature_id, sub_feature_name, is_supported, comments) FROM E'/home/gpadmin/install/pg17/share/postgresql/sql_features.txt';\n",
 readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55de620b0ce0 <debugtupDR>,
 qc=0x7ffce496d910) at utility.c:735
#4 0x000055de61ad7614 in ProcessUtility (pstmt=0x55de635348a8,
 queryString=0x55de63533960 "COPY information_schema.sql_features (feature_id, feature_name, sub_feature_id, sub_feature_name, is_supported, comments) FROM E'/home/gpadmin/install/pg17/share/postgresql/sql_features.txt';\n",
 readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55de620b0ce0 <debugtupDR>,
 qc=0x7ffce496d910) at utility.c:523
#5 0x000055de61ad5e8f in PortalRunUtility (portal=0x55de633dd7a0, pstmt=0x55de635348a8, isTopLevel=true,
 setHoldSnapshot=false, dest=0x55de620b0ce0 <debugtupDR>, qc=0x7ffce496d910) at pquery.c:1158
#6 0x000055de61ad6106 in PortalRunMulti (portal=0x55de633dd7a0, isTopLevel=true, setHoldSnapshot=false,
 dest=0x55de620b0ce0 <debugtupDR>, altdest=0x55de620b0ce0 <debugtupDR>, qc=0x7ffce496d910) at pquery.c:1315
#7 0x000055de61ad5550 in PortalRun (portal=0x55de633dd7a0, count=9223372036854775807, isTopLevel=true,
 run_once=true, dest=0x55de620b0ce0 <debugtupDR>, altdest=0x55de620b0ce0 <debugtupDR>, qc=0x7ffce496d910)
 at pquery.c:791
```


The root cause: CopyMultiInsertInfoFlush() may be called to flush the buffers while COPY is still processing tuples. When the next tuple is then inserted, CopyMultiInsertInfoNextFreeSlot() crashes because the buffer pointer is NULL.
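
To make the sequence concrete, here is a small standalone model of the failure mode. It is only a sketch and not PostgreSQL code: every `Model*`/`model_*` name is hypothetical, and the assumption that a flush releases the buffer and clears the owner's pointer once the buffer count exceeds the cap is a simplification of what the backtrace suggests.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins mirroring the reproduction settings. */
#define MODEL_MAX_BUFFERED_TUPLES   1
#define MODEL_MAX_PARTITION_BUFFERS 0

typedef struct ModelBuffer { int nused; } ModelBuffer;

/* Plays the role of the result relation that owns a buffer pointer. */
typedef struct ModelRel { ModelBuffer *buffer; } ModelRel;

/*
 * Assumed behavior: a flush writes out the buffered tuples and, if more
 * buffers exist than the cap allows, releases the buffer and clears the
 * owner's pointer.
 */
static void model_flush(ModelRel *rel, int nbuffers)
{
    assert(rel != NULL);
    if (rel->buffer == NULL)
        return;
    rel->buffer->nused = 0;              /* tuples written out */
    if (nbuffers > MODEL_MAX_PARTITION_BUFFERS)
    {
        free(rel->buffer);
        rel->buffer = NULL;              /* owner now points at nothing */
    }
}

/* 1 if fetching the next free slot is safe, 0 if it would dereference NULL. */
static int model_next_slot_is_safe(const ModelRel *rel)
{
    return rel->buffer != NULL;
}

/* Set up once, buffer one tuple, flush, then check the next tuple. */
static int model_reproduce(void)
{
    ModelRel rel = { NULL };
    int safe;

    rel.buffer = calloc(1, sizeof(ModelBuffer)); /* set up once, by the caller */
    if (rel.buffer == NULL)
        abort();

    rel.buffer->nused++;                         /* first tuple buffered */
    if (rel.buffer->nused >= MODEL_MAX_BUFFERED_TUPLES)
        model_flush(&rel, 1);                    /* buffer is full: flush */

    safe = model_next_slot_is_safe(&rel);        /* state seen by the next tuple */
    free(rel.buffer);                            /* free(NULL) is a no-op */
    return safe;
}
```

In this model `model_reproduce()` returns 0: the second tuple finds no buffer, which corresponds to the NULL dereference in frame #0 above.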

To fix it, instead of calling CopyMultiInsertInfoSetupBuffer() from the caller, I moved the call into CopyMultiInsertInfoNextFreeSlot(), so the buffer is always set up before a slot is taken from it.
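
The idea of setting up the buffer lazily inside the slot accessor can be sketched with a simplified standalone model. This is not the attached patch: the `Sketch*`/`sketch_*` names, the `int` slots, and the release helper are all hypothetical and only illustrate the pattern.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_MAX_BUFFERED_TUPLES 4     /* hypothetical capacity */

typedef struct SketchBuffer
{
    int slots[SKETCH_MAX_BUFFERED_TUPLES];
    int nused;
} SketchBuffer;

/* Plays the role of the result relation that owns a buffer pointer. */
typedef struct SketchRel { SketchBuffer *buffer; } SketchRel;

/* Allocate an empty buffer and attach it to its owner. */
static SketchBuffer *sketch_setup_buffer(SketchRel *rel)
{
    SketchBuffer *buf = calloc(1, sizeof(*buf));

    if (buf == NULL)
        abort();
    rel->buffer = buf;
    return buf;
}

/* Stand-in for a flush that drops the buffer and clears the pointer. */
static void sketch_release_buffer(SketchRel *rel)
{
    free(rel->buffer);
    rel->buffer = NULL;
}

/* The accessor creates the buffer on demand instead of trusting the caller. */
static int *sketch_next_free_slot(SketchRel *rel)
{
    SketchBuffer *buf = rel->buffer;

    if (buf == NULL)
        buf = sketch_setup_buffer(rel);  /* lazy setup */
    assert(buf->nused < SKETCH_MAX_BUFFERED_TUPLES);
    return &buf->slots[buf->nused];
}

/* Take a slot, release the buffer, take a slot again; 1 if both succeed. */
static int sketch_demo(void)
{
    SketchRel rel = { NULL };
    int *slot;
    int ok;

    slot = sketch_next_free_slot(&rel);  /* buffer created on first use */
    *slot = 42;
    rel.buffer->nused++;

    sketch_release_buffer(&rel);         /* simulate the flush dropping it */

    slot = sketch_next_free_slot(&rel);  /* buffer recreated, no NULL deref */
    ok = (slot != NULL && rel.buffer != NULL && rel.buffer->nused == 0);

    sketch_release_buffer(&rel);
    return ok;
}
```

With the setup inside the accessor, a caller cannot reach the slot array through a NULL buffer, whichever code path released it.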

[0] https://github.com/cloudberrydb/cloudberrydb


Zhang Mingli
www.hashdata.xyz