Re: Question on ThrowErrorData - Mailing list pgsql-hackers

From Zhenghua Lyu
Subject Re: Question on ThrowErrorData
Date
Msg-id CANerzAd5kaVkR_veQnWuQO2sp-_AJg5YGs_WJvikCwGCb9HjTw@mail.gmail.com
Whole thread Raw
In response to Re: Question on ThrowErrorData  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
The comment seems not correct after the commit:

commit dccda847b709e38ce6f09ac5c47f2bdf30e93a2c
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Thu Jun 27 14:43:59 2024 -0400

    Avoid crashing when a JIT-inlined backend function throws an error.

    errfinish() assumes that the __FUNC__ and __FILE__ arguments it's
    passed are compile-time constant strings that can just be pointed
    to rather than physically copied.  However, it's possible for LLVM
    to generate code in which those pointers point into a dynamically
    loaded code segment.  If that segment gets unloaded before we're
    done with the ErrorData struct, we have dangling pointers that
    will lead to SIGSEGV.  In simple cases that won't happen, because we
    won't unload LLVM code before end of transaction.  But it's possible
    to happen if the error is thrown within end-of-transaction code run by
    _SPI_commit or _SPI_rollback, because since commit 2e517818f those
    functions clean up by ending the transaction and starting a new one.

    Rather than fixing this by adding pstrdup() overhead to every
    elog/ereport sequence, let's fix it by copying the risky pointers
    in CopyErrorData().  That solves it for _SPI_commit/_SPI_rollback
    because they use that function to preserve the error data across
    the transaction end/restart sequence; and it seems likely that
    any other code doing something similar would need to do that too.

    I'm suspicious that this behavior amounts to an LLVM bug (or a
    bug in our use of it?), because it implies that string constant
    references that should be pointer-equal according to a naive
    understanding of C semantics will sometimes not be equal.
    However, even if it is a bug and someday gets fixed, we'll have
    to cope with the current behavior for a long time to come.

    Report and patch by me.  Back-patch to all supported branches.

    Discussion: https://postgr.es/m/1565654.1719425368@sss.pgh.pa.us

diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 1df08e40218..d7f08e82ac1 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -1492,7 +1492,21 @@ CopyErrorData(void)
        newedata = (ErrorData *) palloc(sizeof(ErrorData));
        memcpy(newedata, edata, sizeof(ErrorData));

-       /* Make copies of separately-allocated fields */
+       /*
+        * Make copies of separately-allocated strings.  Note that we copy even
+        * theoretically-constant strings such as filename.  This is because those
+        * could point into JIT-created code segments that might get unloaded at
+        * transaction cleanup.  In some cases we need the copied ErrorData to
+        * survive transaction boundaries, so we'd better copy those strings too.
+        */
+       if (newedata->filename)
+               newedata->filename = pstrdup(newedata->filename);
+       if (newedata->funcname)
+               newedata->funcname = pstrdup(newedata->funcname);
+       if (newedata->domain)
+               newedata->domain = pstrdup(newedata->domain);
+       if (newedata->context_domain)
+               newedata->context_domain = pstrdup(newedata->context_domain);
        if (newedata->message)
                newedata->message = pstrdup(newedata->message);


=============================================================

Below is a case shown via GDB on latest master, you can see the contract break.

(gdb) p errordata[errordata_stack_depth]
$1 = {elevel = 21, output_to_server = true, output_to_client = true, hide_stmt = false, hide_ctx = false, filename = 0x5630ba40b450 "postgres.c",
  lineno = 3347, funcname = 0x5630ba40b470 "ProcessInterrupts", domain = 0x56308a93bf0a "postgres-19", context_domain = 0x56308a93bf0a "postgres-19",
  sqlerrcode = 16908741, message = 0x5630ba2dfd10 "terminating background worker \"parallel worker\" due to administrator command", detail = 0x0,
  detail_log = 0x0, hint = 0x0, context = 0x5630ba2dfda0 "parallel worker", backtrace = 0x0, message_id = 0x0, schema_name = 0x0, table_name = 0x0,
  column_name = 0x0, datatype_name = 0x0, constraint_name = 0x0, cursorpos = 0, internalpos = 0, internalquery = 0x0, saved_errno = 4,
  assoc_context = 0x5630ba2dfc10}
(gdb) p errordata[errordata_stack_depth].assoc_context [0]
$2 = {type = T_AllocSetContext, isReset = false, allowInCritSection = true, mem_allocated = 8192, methods = 0x56308aae3bf0 <mcxt_methods+240>,
  parent = 0x5630ba2ddc00, firstchild = 0x0, prevchild = 0x5630ba2e53f0, nextchild = 0x0, name = 0x56308a953217 "ErrorContext", ident = 0x0,
  reset_cbs = 0x0}
(gdb) p errordata[errordata_stack_depth].filename
$3 = 0x5630ba40b450 "postgres.c"
(gdb) p mcxt_methods[GetMemoryChunkMethodID(0x5630ba40b450)]
$4 = {alloc = 0x56308a6d4250 <AllocSetAlloc>, free_p = 0x56308a6d4480 <AllocSetFree>, realloc = 0x56308a6d4822 <AllocSetRealloc>,
  reset = 0x56308a6d383f <AllocSetReset>, delete_context = 0x56308a6d3a56 <AllocSetDelete>, get_chunk_context = 0x56308a6d4cff <AllocSetGetChunkContext>,
  get_chunk_space = 0x56308a6d4d9b <AllocSetGetChunkSpace>, is_empty = 0x56308a6d4e71 <AllocSetIsEmpty>, stats = 0x56308a6d4ecd <AllocSetStats>,
  check = 0x56308a6d5218 <AllocSetCheck>}
(gdb) p mcxt_methods[GetMemoryChunkMethodID(0x5630ba40b450)].get_chunk_context(0x5630ba40b450)
$5 = (struct MemoryContextData *) 0x5630ba40aeb0
(gdb) p mcxt_methods[GetMemoryChunkMethodID(0x5630ba40b450)].get_chunk_context(0x5630ba40b450)[0]
$6 = {type = T_AllocSetContext, isReset = false, allowInCritSection = false, mem_allocated = 8192, methods = 0x56308aae3bf0 <mcxt_methods+240>,
  parent = 0x5630ba2ddc00, firstchild = 0x0, prevchild = 0x0, nextchild = 0x5630ba3da1e0, name = 0x56308a753c4a "ProcessParallelMessages", ident = 0x0,
  reset_cbs = 0x0}
(gdb)




Tom Lane <tgl@sss.pgh.pa.us> 于2025年10月20日周一 20:05写道:
正华吕 <kainwen@gmail.com> writes:
>      Inside errfinish, it just does a simple pointer assignment to set
>      filename and funcname field (via function set_stack_entry_location()).

Please note the comment on struct ErrorData:

 * ErrorData holds the data accumulated during any one ereport() cycle.
 * Any non-NULL pointers must point to palloc'd data.
 * (The const pointers are an exception; we assume they point at non-freeable
 * constant strings.)
...
    const char *filename;       /* __FILE__ of ereport() call */
...
    const char *funcname;       /* __func__ of ereport() call */

In practice these are always pointing at compiler-generated
constant strings.

                        regards, tom lane

pgsql-hackers by date:

Previous
From: "Matheus Alcantara"
Date:
Subject: Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Next
From: Chao Li
Date:
Subject: Re: formatting.c cleanup