Thread: COPY FROM crash
Hi, all
I got a crash when copy partition tables with mass data in Cloudberry DB[0](based on Postgres14.4, Greenplum 7).
I have a test on Postgres and it has the similar issue(different places but same function).
However it’s a little hard to reproduce because it happened when inserting next tuple after a previous copy multi insert buffer is flushed.
To reproduce easily, change the Macros to:
#define MAX_BUFFERED_TUPLES 1
#define MAX_PARTITION_BUFFERS 0
Config and make install, when initdb, a core dump will be as:
#0 0x000055de617211b9 in CopyMultiInsertInfoNextFreeSlot (miinfo=0x7ffce496d360, rri=0x55de6368ba88)
at copyfrom.c:592
#1 0x000055de61721ff1 in CopyFrom (cstate=0x55de63592ce8) at copyfrom.c:985
#2 0x000055de6171dd86 in DoCopy (pstate=0x55de63589e00, stmt=0x55de635347d8, stmt_location=0, stmt_len=195,
processed=0x7ffce496d590) at copy.c:306
#3 0x000055de61ad7ce8 in standard_ProcessUtility (pstmt=0x55de635348a8,
queryString=0x55de63533960 "COPY information_schema.sql_features (feature_id, feature_name, sub_feature_id, sub
_feature_name, is_supported, comments) FROM E'/home/gpadmin/install/pg17/share/postgresql/sql_features.txt';\n",
readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55de620b0ce0 <debugtupDR>,
qc=0x7ffce496d910) at utility.c:735
#4 0x000055de61ad7614 in ProcessUtility (pstmt=0x55de635348a8,
queryString=0x55de63533960 "COPY information_schema.sql_features (feature_id, feature_name, sub_feature_id, sub
_feature_name, is_supported, comments) FROM E'/home/gpadmin/install/pg17/share/postgresql/sql_features.txt';\n",
readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55de620b0ce0 <debugtupDR>,
qc=0x7ffce496d910) at utility.c:523
#5 0x000055de61ad5e8f in PortalRunUtility (portal=0x55de633dd7a0, pstmt=0x55de635348a8, isTopLevel=true,
setHoldSnapshot=false, dest=0x55de620b0ce0 <debugtupDR>, qc=0x7ffce496d910) at pquery.c:1158
#6 0x000055de61ad6106 in PortalRunMulti (portal=0x55de633dd7a0, isTopLevel=true, setHoldSnapshot=false,
dest=0x55de620b0ce0 <debugtupDR>, altdest=0x55de620b0ce0 <debugtupDR>, qc=0x7ffce496d910) at pquery.c:1315
#7 0x000055de61ad5550 in PortalRun (portal=0x55de633dd7a0, count=9223372036854775807, isTopLevel=true,
run_once=true, dest=0x55de620b0ce0 <debugtupDR>, altdest=0x55de620b0ce0 <debugtupDR>, qc=0x7ffce496d910)
at pquery.c:791```
The root cause is: we may call CopyMultiInsertInfoFlush() to flush buffer during COPY tuples, ex: insert from next tuple,
CopyMultiInsertInfoNextFreeSlot() will get a crash due to null pointer of buffer.
To fix it: instead of call CopyMultiInsertInfoSetupBuffer() outside, I put it into CopyMultiInsertInfoNextFreeSlot() to avoid such issues.
[0] https://github.com/cloudberrydb/cloudberrydb
I got a crash when copy partition tables with mass data in Cloudberry DB[0](based on Postgres14.4, Greenplum 7).
I have a test on Postgres and it has the similar issue(different places but same function).
However it’s a little hard to reproduce because it happened when inserting next tuple after a previous copy multi insert buffer is flushed.
To reproduce easily, change the Macros to:
#define MAX_BUFFERED_TUPLES 1
#define MAX_PARTITION_BUFFERS 0
Config and make install, when initdb, a core dump will be as:
#0 0x000055de617211b9 in CopyMultiInsertInfoNextFreeSlot (miinfo=0x7ffce496d360, rri=0x55de6368ba88)
at copyfrom.c:592
#1 0x000055de61721ff1 in CopyFrom (cstate=0x55de63592ce8) at copyfrom.c:985
#2 0x000055de6171dd86 in DoCopy (pstate=0x55de63589e00, stmt=0x55de635347d8, stmt_location=0, stmt_len=195,
processed=0x7ffce496d590) at copy.c:306
#3 0x000055de61ad7ce8 in standard_ProcessUtility (pstmt=0x55de635348a8,
queryString=0x55de63533960 "COPY information_schema.sql_features (feature_id, feature_name, sub_feature_id, sub
_feature_name, is_supported, comments) FROM E'/home/gpadmin/install/pg17/share/postgresql/sql_features.txt';\n",
readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55de620b0ce0 <debugtupDR>,
qc=0x7ffce496d910) at utility.c:735
#4 0x000055de61ad7614 in ProcessUtility (pstmt=0x55de635348a8,
queryString=0x55de63533960 "COPY information_schema.sql_features (feature_id, feature_name, sub_feature_id, sub
_feature_name, is_supported, comments) FROM E'/home/gpadmin/install/pg17/share/postgresql/sql_features.txt';\n",
readOnlyTree=false, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55de620b0ce0 <debugtupDR>,
qc=0x7ffce496d910) at utility.c:523
#5 0x000055de61ad5e8f in PortalRunUtility (portal=0x55de633dd7a0, pstmt=0x55de635348a8, isTopLevel=true,
setHoldSnapshot=false, dest=0x55de620b0ce0 <debugtupDR>, qc=0x7ffce496d910) at pquery.c:1158
#6 0x000055de61ad6106 in PortalRunMulti (portal=0x55de633dd7a0, isTopLevel=true, setHoldSnapshot=false,
dest=0x55de620b0ce0 <debugtupDR>, altdest=0x55de620b0ce0 <debugtupDR>, qc=0x7ffce496d910) at pquery.c:1315
#7 0x000055de61ad5550 in PortalRun (portal=0x55de633dd7a0, count=9223372036854775807, isTopLevel=true,
run_once=true, dest=0x55de620b0ce0 <debugtupDR>, altdest=0x55de620b0ce0 <debugtupDR>, qc=0x7ffce496d910)
at pquery.c:791```
The root cause is: we may call CopyMultiInsertInfoFlush() to flush buffer during COPY tuples, ex: insert from next tuple,
CopyMultiInsertInfoNextFreeSlot() will get a crash due to null pointer of buffer.
To fix it: instead of call CopyMultiInsertInfoSetupBuffer() outside, I put it into CopyMultiInsertInfoNextFreeSlot() to avoid such issues.
[0] https://github.com/cloudberrydb/cloudberrydb
www.hashdata.xyz
Attachment
On Tue, 30 Jul 2024 at 15:52, Zhang Mingli <zmlpostgres@gmail.com> wrote: > I have a test on Postgres and it has the similar issue(different places but same function). > > However it’s a little hard to reproduce because it happened when inserting next tuple after a previous copy multi insertbuffer is flushed. > > To reproduce easily, change the Macros to: > > #define MAX_BUFFERED_TUPLES 1 > #define MAX_PARTITION_BUFFERS 0 I think you're going to need to demonstrate to us there's an actual PostgreSQL bug here with a test case that causes a crash without changing the above definitions. It seems to me that it's not valid to set MAX_PARTITION_BUFFERS to anything less than 2 due to the code inside CopyMultiInsertInfoFlush(). If we find the CopyMultiInsertBuffer for 'curr_rri' then that code would misbehave if the list only contained a single CopyMultiInsertBuffer due to the expectation there's another item in the list after the list_delete_first(). If you're only able to get it to misbehave by setting MAX_PARTITION_BUFFERS to less than 2, then my suggested fix would be to add a comment to say that values less than to are not supported. David
Hi! On Tue, 30 Jul 2024 at 08:52, Zhang Mingli <zmlpostgres@gmail.com> wrote: > > Hi, all > > I got a crash when copy partition tables with mass data in Cloudberry DB[0](based on Postgres14.4, Greenplum 7). > > I have a test on Postgres and it has the similar issue(different places but same function). Just to be clear, you are facing this on HEAD, on on REL_14_STABLE? > However it’s a little hard to reproduce because it happened when inserting next tuple after a previous copy multi insertbuffer is flushed. > > To reproduce easily, change the Macros to: > > #define MAX_BUFFERED_TUPLES 1 > #define MAX_PARTITION_BUFFERS 0 This way it's harder to believe that the problem persists with the original settings. Are these values valid?
Hi,
www.hashdata.xyz
On Jul 30, 2024 at 13:35 +0800, David Rowley <dgrowleyml@gmail.com>, wrote:
If you're only able
to get it to misbehave by setting MAX_PARTITION_BUFFERS to less than
2, then my suggested fix would be to add a comment to say that values
less than to are not supported.
Right.
On Postgres this crash could only happen if it is set to zero.
I’ve updated the comments in patch v1 with an additional Assert.
On Postgres this crash could only happen if it is set to zero.
I’ve updated the comments in patch v1 with an additional Assert.
Attachment
Hi,
www.hashdata.xyz
On Jul 30, 2024 at 13:37 +0800, Kirill Reshke <reshkekirill@gmail.com>, wrote:
Just to be clear, you are facing this on HEAD, on on REL_14_STABLE?
Postgres HEAD.
On Tue, 30 Jul 2024 at 19:06, Zhang Mingli <zmlpostgres@gmail.com> wrote: > I’ve updated the comments in patch v1 with an additional Assert. Thanks. I adjusted a little and used a StaticAssert instead then pushed. StaticAssert seems better as invalid values will result in compilation failure. David