Re: Question on ThrowErrorData - Mailing list pgsql-hackers
From | Zhenghua Lyu |
---|---|
Subject | Re: Question on ThrowErrorData |
Date | |
Msg-id | CANerzAd5kaVkR_veQnWuQO2sp-_AJg5YGs_WJvikCwGCb9HjTw@mail.gmail.com Whole thread Raw |
In response to | Re: Question on ThrowErrorData (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
The comment seems not correct after the commit:
commit dccda847b709e38ce6f09ac5c47f2bdf30e93a2c
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu Jun 27 14:43:59 2024 -0400
Avoid crashing when a JIT-inlined backend function throws an error.
errfinish() assumes that the __FUNC__ and __FILE__ arguments it's
passed are compile-time constant strings that can just be pointed
to rather than physically copied. However, it's possible for LLVM
to generate code in which those pointers point into a dynamically
loaded code segment. If that segment gets unloaded before we're
done with the ErrorData struct, we have dangling pointers that
will lead to SIGSEGV. In simple cases that won't happen, because we
won't unload LLVM code before end of transaction. But it's possible
to happen if the error is thrown within end-of-transaction code run by
_SPI_commit or _SPI_rollback, because since commit 2e517818f those
functions clean up by ending the transaction and starting a new one.
Rather than fixing this by adding pstrdup() overhead to every
elog/ereport sequence, let's fix it by copying the risky pointers
in CopyErrorData(). That solves it for _SPI_commit/_SPI_rollback
because they use that function to preserve the error data across
the transaction end/restart sequence; and it seems likely that
any other code doing something similar would need to do that too.
I'm suspicious that this behavior amounts to an LLVM bug (or a
bug in our use of it?), because it implies that string constant
references that should be pointer-equal according to a naive
understanding of C semantics will sometimes not be equal.
However, even if it is a bug and someday gets fixed, we'll have
to cope with the current behavior for a long time to come.
Report and patch by me. Back-patch to all supported branches.
Discussion: https://postgr.es/m/1565654.1719425368@sss.pgh.pa.us
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 1df08e40218..d7f08e82ac1 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -1492,7 +1492,21 @@ CopyErrorData(void)
newedata = (ErrorData *) palloc(sizeof(ErrorData));
memcpy(newedata, edata, sizeof(ErrorData));
- /* Make copies of separately-allocated fields */
+ /*
+ * Make copies of separately-allocated strings. Note that we copy even
+ * theoretically-constant strings such as filename. This is because those
+ * could point into JIT-created code segments that might get unloaded at
+ * transaction cleanup. In some cases we need the copied ErrorData to
+ * survive transaction boundaries, so we'd better copy those strings too.
+ */
+ if (newedata->filename)
+ newedata->filename = pstrdup(newedata->filename);
+ if (newedata->funcname)
+ newedata->funcname = pstrdup(newedata->funcname);
+ if (newedata->domain)
+ newedata->domain = pstrdup(newedata->domain);
+ if (newedata->context_domain)
+ newedata->context_domain = pstrdup(newedata->context_domain);
if (newedata->message)
newedata->message = pstrdup(newedata->message);
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu Jun 27 14:43:59 2024 -0400
Avoid crashing when a JIT-inlined backend function throws an error.
errfinish() assumes that the __FUNC__ and __FILE__ arguments it's
passed are compile-time constant strings that can just be pointed
to rather than physically copied. However, it's possible for LLVM
to generate code in which those pointers point into a dynamically
loaded code segment. If that segment gets unloaded before we're
done with the ErrorData struct, we have dangling pointers that
will lead to SIGSEGV. In simple cases that won't happen, because we
won't unload LLVM code before end of transaction. But it's possible
to happen if the error is thrown within end-of-transaction code run by
_SPI_commit or _SPI_rollback, because since commit 2e517818f those
functions clean up by ending the transaction and starting a new one.
Rather than fixing this by adding pstrdup() overhead to every
elog/ereport sequence, let's fix it by copying the risky pointers
in CopyErrorData(). That solves it for _SPI_commit/_SPI_rollback
because they use that function to preserve the error data across
the transaction end/restart sequence; and it seems likely that
any other code doing something similar would need to do that too.
I'm suspicious that this behavior amounts to an LLVM bug (or a
bug in our use of it?), because it implies that string constant
references that should be pointer-equal according to a naive
understanding of C semantics will sometimes not be equal.
However, even if it is a bug and someday gets fixed, we'll have
to cope with the current behavior for a long time to come.
Report and patch by me. Back-patch to all supported branches.
Discussion: https://postgr.es/m/1565654.1719425368@sss.pgh.pa.us
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 1df08e40218..d7f08e82ac1 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -1492,7 +1492,21 @@ CopyErrorData(void)
newedata = (ErrorData *) palloc(sizeof(ErrorData));
memcpy(newedata, edata, sizeof(ErrorData));
- /* Make copies of separately-allocated fields */
+ /*
+ * Make copies of separately-allocated strings. Note that we copy even
+ * theoretically-constant strings such as filename. This is because those
+ * could point into JIT-created code segments that might get unloaded at
+ * transaction cleanup. In some cases we need the copied ErrorData to
+ * survive transaction boundaries, so we'd better copy those strings too.
+ */
+ if (newedata->filename)
+ newedata->filename = pstrdup(newedata->filename);
+ if (newedata->funcname)
+ newedata->funcname = pstrdup(newedata->funcname);
+ if (newedata->domain)
+ newedata->domain = pstrdup(newedata->domain);
+ if (newedata->context_domain)
+ newedata->context_domain = pstrdup(newedata->context_domain);
if (newedata->message)
newedata->message = pstrdup(newedata->message);
=============================================================
Below is a case shown via GDB on latest master, you can see the contract break.
(gdb) p errordata[errordata_stack_depth]
$1 = {elevel = 21, output_to_server = true, output_to_client = true, hide_stmt = false, hide_ctx = false, filename = 0x5630ba40b450 "postgres.c",
lineno = 3347, funcname = 0x5630ba40b470 "ProcessInterrupts", domain = 0x56308a93bf0a "postgres-19", context_domain = 0x56308a93bf0a "postgres-19",
sqlerrcode = 16908741, message = 0x5630ba2dfd10 "terminating background worker \"parallel worker\" due to administrator command", detail = 0x0,
detail_log = 0x0, hint = 0x0, context = 0x5630ba2dfda0 "parallel worker", backtrace = 0x0, message_id = 0x0, schema_name = 0x0, table_name = 0x0,
column_name = 0x0, datatype_name = 0x0, constraint_name = 0x0, cursorpos = 0, internalpos = 0, internalquery = 0x0, saved_errno = 4,
assoc_context = 0x5630ba2dfc10}
(gdb) p errordata[errordata_stack_depth].assoc_context [0]
$2 = {type = T_AllocSetContext, isReset = false, allowInCritSection = true, mem_allocated = 8192, methods = 0x56308aae3bf0 <mcxt_methods+240>,
parent = 0x5630ba2ddc00, firstchild = 0x0, prevchild = 0x5630ba2e53f0, nextchild = 0x0, name = 0x56308a953217 "ErrorContext", ident = 0x0,
reset_cbs = 0x0}
(gdb) p errordata[errordata_stack_depth].filename
$3 = 0x5630ba40b450 "postgres.c"
(gdb) p mcxt_methods[GetMemoryChunkMethodID(0x5630ba40b450)]
$4 = {alloc = 0x56308a6d4250 <AllocSetAlloc>, free_p = 0x56308a6d4480 <AllocSetFree>, realloc = 0x56308a6d4822 <AllocSetRealloc>,
reset = 0x56308a6d383f <AllocSetReset>, delete_context = 0x56308a6d3a56 <AllocSetDelete>, get_chunk_context = 0x56308a6d4cff <AllocSetGetChunkContext>,
get_chunk_space = 0x56308a6d4d9b <AllocSetGetChunkSpace>, is_empty = 0x56308a6d4e71 <AllocSetIsEmpty>, stats = 0x56308a6d4ecd <AllocSetStats>,
check = 0x56308a6d5218 <AllocSetCheck>}
(gdb) p mcxt_methods[GetMemoryChunkMethodID(0x5630ba40b450)].get_chunk_context(0x5630ba40b450)
$5 = (struct MemoryContextData *) 0x5630ba40aeb0
(gdb) p mcxt_methods[GetMemoryChunkMethodID(0x5630ba40b450)].get_chunk_context(0x5630ba40b450)[0]
$6 = {type = T_AllocSetContext, isReset = false, allowInCritSection = false, mem_allocated = 8192, methods = 0x56308aae3bf0 <mcxt_methods+240>,
parent = 0x5630ba2ddc00, firstchild = 0x0, prevchild = 0x0, nextchild = 0x5630ba3da1e0, name = 0x56308a753c4a "ProcessParallelMessages", ident = 0x0,
reset_cbs = 0x0}
(gdb)
$1 = {elevel = 21, output_to_server = true, output_to_client = true, hide_stmt = false, hide_ctx = false, filename = 0x5630ba40b450 "postgres.c",
lineno = 3347, funcname = 0x5630ba40b470 "ProcessInterrupts", domain = 0x56308a93bf0a "postgres-19", context_domain = 0x56308a93bf0a "postgres-19",
sqlerrcode = 16908741, message = 0x5630ba2dfd10 "terminating background worker \"parallel worker\" due to administrator command", detail = 0x0,
detail_log = 0x0, hint = 0x0, context = 0x5630ba2dfda0 "parallel worker", backtrace = 0x0, message_id = 0x0, schema_name = 0x0, table_name = 0x0,
column_name = 0x0, datatype_name = 0x0, constraint_name = 0x0, cursorpos = 0, internalpos = 0, internalquery = 0x0, saved_errno = 4,
assoc_context = 0x5630ba2dfc10}
(gdb) p errordata[errordata_stack_depth].assoc_context [0]
$2 = {type = T_AllocSetContext, isReset = false, allowInCritSection = true, mem_allocated = 8192, methods = 0x56308aae3bf0 <mcxt_methods+240>,
parent = 0x5630ba2ddc00, firstchild = 0x0, prevchild = 0x5630ba2e53f0, nextchild = 0x0, name = 0x56308a953217 "ErrorContext", ident = 0x0,
reset_cbs = 0x0}
(gdb) p errordata[errordata_stack_depth].filename
$3 = 0x5630ba40b450 "postgres.c"
(gdb) p mcxt_methods[GetMemoryChunkMethodID(0x5630ba40b450)]
$4 = {alloc = 0x56308a6d4250 <AllocSetAlloc>, free_p = 0x56308a6d4480 <AllocSetFree>, realloc = 0x56308a6d4822 <AllocSetRealloc>,
reset = 0x56308a6d383f <AllocSetReset>, delete_context = 0x56308a6d3a56 <AllocSetDelete>, get_chunk_context = 0x56308a6d4cff <AllocSetGetChunkContext>,
get_chunk_space = 0x56308a6d4d9b <AllocSetGetChunkSpace>, is_empty = 0x56308a6d4e71 <AllocSetIsEmpty>, stats = 0x56308a6d4ecd <AllocSetStats>,
check = 0x56308a6d5218 <AllocSetCheck>}
(gdb) p mcxt_methods[GetMemoryChunkMethodID(0x5630ba40b450)].get_chunk_context(0x5630ba40b450)
$5 = (struct MemoryContextData *) 0x5630ba40aeb0
(gdb) p mcxt_methods[GetMemoryChunkMethodID(0x5630ba40b450)].get_chunk_context(0x5630ba40b450)[0]
$6 = {type = T_AllocSetContext, isReset = false, allowInCritSection = false, mem_allocated = 8192, methods = 0x56308aae3bf0 <mcxt_methods+240>,
parent = 0x5630ba2ddc00, firstchild = 0x0, prevchild = 0x0, nextchild = 0x5630ba3da1e0, name = 0x56308a753c4a "ProcessParallelMessages", ident = 0x0,
reset_cbs = 0x0}
(gdb)
Tom Lane <tgl@sss.pgh.pa.us> 于2025年10月20日周一 20:05写道:
正华吕 <kainwen@gmail.com> writes:
> Inside errfinish, it just does a simple pointer assignment to set
> filename and funcname field (via function set_stack_entry_location()).
Please note the comment on struct ErrorData:
* ErrorData holds the data accumulated during any one ereport() cycle.
* Any non-NULL pointers must point to palloc'd data.
* (The const pointers are an exception; we assume they point at non-freeable
* constant strings.)
...
const char *filename; /* __FILE__ of ereport() call */
...
const char *funcname; /* __func__ of ereport() call */
In practice these are always pointing at compiler-generated
constant strings.
regards, tom lane
pgsql-hackers by date: