Re: Fix some ubsan/asan related issues - Mailing list pgsql-hackers

From Tristan Partin
Subject Re: Fix some ubsan/asan related issues
Date
Msg-id CYXOX7BI77Z2.2897DGYULLJWO@neon.tech
Whole thread Raw
In response to Re: Fix some ubsan/asan related issues  (Andres Freund <andres@anarazel.de>)
Responses Re: Fix some ubsan/asan related issues
List pgsql-hackers
On Tue Jan 30, 2024 at 3:58 PM CST, Andres Freund wrote:
> Hi,
>
> On 2024-01-30 09:59:25 -0600, Tristan Partin wrote:
> > From 331cec1c9db6ff60dcc6d9ba62a9c8be4e5e95ed Mon Sep 17 00:00:00 2001
> > From: Tristan Partin <tristan@neon.tech>
> > Date: Mon, 29 Jan 2024 18:03:39 -0600
> > Subject: [PATCH v1 1/3] Refuse to register message in LogLogicalMessage if
> >  NULL
>
> > If this occurs, the memcpy of rdata_data in CopyXLogRecordToWAL breaks
> > the API contract of memcpy in glibc. The two pointer arguments are
> > marked as nonnull, even in the event the amount to copy is 0 bytes.
>
> It seems pretty odd to call LogLogicalMessage() with a NULL argument. Why is
> that something useful?

Dropped. Will change on the Neon side. Should we add an assert
somewhere for good measure? Near the memcpy in question?

> > From dc9488f3fdee69b981b52c985fb77106d7d301ff Mon Sep 17 00:00:00 2001
> > From: Tristan Partin <tristan@neon.tech>
> > Date: Wed, 24 Jan 2024 17:07:01 -0600
> > Subject: [PATCH v1 2/3] meson: Support compiling with -Db_sanitize=address
> >
> > The ecpg is parser is extremely leaky, so we need to silence leak
> > detection.
>
> This does stuff beyond epcg...

Dropped.

> > +if get_option('b_sanitize').contains('address')
> > +  cdata.set('USE_ADDRESS_SANITIZER', 1)
> > +endif
> >
> >  ###############################################################
> >  # NLS / Gettext
> > diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
> > index ac409b0006..e18e716d9c 100644
> > --- a/src/bin/initdb/initdb.c
> > +++ b/src/bin/initdb/initdb.c
> > @@ -338,6 +338,17 @@ do { \
> >          output_failed = true, output_errno = errno; \
> >  } while (0)
> >
> > +#ifdef USE_ADDRESS_SANITIZER
>
> When asan is used  __SANITIZE_ADDRESS__ is defined, so we don't need to
> implement this ourselves.

Thanks!

> > +const char *__asan_default_options(void);
> > +
> > +const char *__asan_default_options(void)
> > +{
> > +    return "detect_leaks=0";
> > +}
> > +
> > +#endif
>
> Wonder if we should move this into some static library and link it into all
> binaries that don't want leak detection? It doesn't seem great to have to
> adjust this in a bunch of files if we want to adjust the options...

See attached patches. Here is what I found to be necessary to get
a -Db_sanitize=address,undefined build to successfully make it through
tests. I do have a few concerns about the patch.

1. For whatever reason, __SANITIZE_LEAK__ is not defined when the leak
   sanitizer is enabled. So you will see me use this, to make some
   include directives work. I don't like this as a final solution
   because someone could use -fsanitize=leak.
2. I tracked down what seems to be a valid leak in adt/xml.c. Attached
   (test.sql) is a fairly minimal reproduction of what is needed to show
   the leak. I didn't spend too much time tracking it down. Might get to
   it later, who knows. Below you will find the backtrace, and whoever
   wants to try their hand at fixing it will need to comment out
   xmlNewNode in the leak.supp file.
3. I don't love the new library name. Maybe it should be name more lsan
   specific.
4. Should pg_attribute_no_asan be renamed to
   pg_attribute_no_sanitize_address? That would match
   pg_attribute_no_sanitize_alignment.

I will also attach a Meson test log for good measure. I didn't try
testing anything that requires PG_TEST_EXTRA, but I suspect that
everything will be fine.

Alexander, I haven't yet gotten to the things you pointed out in the
sibling thread.

==221848==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 120 byte(s) in 1 object(s) allocated from:
    #0 0x7fac4a6d92ef in malloc (/lib64/libasan.so.8+0xd92ef) (BuildId: 7fcb7759bc17ef47f9682414b6d99732d6a6ab0c)
    #1 0x7fac4a48427d in xmlNewNode (/lib64/libxml2.so.2+0x5d27d) (BuildId: 3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
    #2 0x22107a6 in xmltotext_with_options ../src/backend/utils/adt/xml.c:754
    #3 0xead047 in ExecEvalXmlExpr ../src/backend/executor/execExprInterp.c:4020
    #4 0xe8c119 in ExecInterpExpr ../src/backend/executor/execExprInterp.c:1537
    #5 0xe91f2c in ExecInterpExprStillValid ../src/backend/executor/execExprInterp.c:1881
    #6 0x109632d in ExecEvalExprSwitchContext ../src/include/executor/executor.h:355
    #7 0x109655d in ExecProject ../src/include/executor/executor.h:389
    #8 0x1097186 in ExecResult ../src/backend/executor/nodeResult.c:136
    #9 0xf0f90c in ExecProcNodeFirst ../src/backend/executor/execProcnode.c:464
    #10 0xec9bec in ExecProcNode ../src/include/executor/executor.h:273
    #11 0xed875c in ExecutePlan ../src/backend/executor/execMain.c:1670
    #12 0xecbee0 in standard_ExecutorRun ../src/backend/executor/execMain.c:365
    #13 0xecb529 in ExecutorRun ../src/backend/executor/execMain.c:309
    #14 0x1ae89f6 in PortalRunSelect ../src/backend/tcop/pquery.c:924
    #15 0x1ae7c06 in PortalRun ../src/backend/tcop/pquery.c:768
    #16 0x1ad1b43 in exec_simple_query ../src/backend/tcop/postgres.c:1273
    #17 0x1adf8de in PostgresMain ../src/backend/tcop/postgres.c:4653
    #18 0x170dbce in BackendRun ../src/backend/postmaster/postmaster.c:4464
    #19 0x170bf70 in BackendStartup ../src/backend/postmaster/postmaster.c:4140
    #20 0x170263f in ServerLoop ../src/backend/postmaster/postmaster.c:1776
    #21 0x1701052 in PostmasterMain ../src/backend/postmaster/postmaster.c:1475
    #22 0x11a5f67 in main ../src/backend/main/main.c:198
    #23 0x7fac48e46149 in __libc_start_call_main (/lib64/libc.so.6+0x28149) (BuildId:
7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
    #24 0x7fac48e4620a in __libc_start_main_impl (/lib64/libc.so.6+0x2820a) (BuildId:
7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
    #25 0x49c5d4 in _start
(/home/tristan957/Projects/work/postgresql/build/tmp_install/home/tristan957/.opt/postgresql/bin/postgres+0x49c5d4)
(BuildId:c8ca341e1303be0f9dc0b0271c55c4b9e42af89b) 

Indirect leak of 13 byte(s) in 1 object(s) allocated from:
    #0 0x7fac4a6d92ef in malloc (/lib64/libasan.so.8+0xd92ef) (BuildId: 7fcb7759bc17ef47f9682414b6d99732d6a6ab0c)
    #1 0x7fac4a4e106f in xmlStrndup (/lib64/libxml2.so.2+0xba06f) (BuildId: 3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
    #2 0x7fac4a4842c0 in xmlNewNode (/lib64/libxml2.so.2+0x5d2c0) (BuildId: 3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
    #3 0x22107a6 in xmltotext_with_options ../src/backend/utils/adt/xml.c:754
    #4 0xead047 in ExecEvalXmlExpr ../src/backend/executor/execExprInterp.c:4020
    #5 0xe8c119 in ExecInterpExpr ../src/backend/executor/execExprInterp.c:1537
    #6 0xe91f2c in ExecInterpExprStillValid ../src/backend/executor/execExprInterp.c:1881
    #7 0x109632d in ExecEvalExprSwitchContext ../src/include/executor/executor.h:355
    #8 0x109655d in ExecProject ../src/include/executor/executor.h:389
    #9 0x1097186 in ExecResult ../src/backend/executor/nodeResult.c:136
    #10 0xf0f90c in ExecProcNodeFirst ../src/backend/executor/execProcnode.c:464
    #11 0xec9bec in ExecProcNode ../src/include/executor/executor.h:273
    #12 0xed875c in ExecutePlan ../src/backend/executor/execMain.c:1670
    #13 0xecbee0 in standard_ExecutorRun ../src/backend/executor/execMain.c:365
    #14 0xecb529 in ExecutorRun ../src/backend/executor/execMain.c:309
    #15 0x1ae89f6 in PortalRunSelect ../src/backend/tcop/pquery.c:924
    #16 0x1ae7c06 in PortalRun ../src/backend/tcop/pquery.c:768
    #17 0x1ad1b43 in exec_simple_query ../src/backend/tcop/postgres.c:1273
    #18 0x1adf8de in PostgresMain ../src/backend/tcop/postgres.c:4653
    #19 0x170dbce in BackendRun ../src/backend/postmaster/postmaster.c:4464
    #20 0x170bf70 in BackendStartup ../src/backend/postmaster/postmaster.c:4140
    #21 0x170263f in ServerLoop ../src/backend/postmaster/postmaster.c:1776
    #22 0x1701052 in PostmasterMain ../src/backend/postmaster/postmaster.c:1475
    #23 0x11a5f67 in main ../src/backend/main/main.c:198
    #24 0x7fac48e46149 in __libc_start_call_main (/lib64/libc.so.6+0x28149) (BuildId:
7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
    #25 0x7fac48e4620a in __libc_start_main_impl (/lib64/libc.so.6+0x2820a) (BuildId:
7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
    #26 0x49c5d4 in _start
(/home/tristan957/Projects/work/postgresql/build/tmp_install/home/tristan957/.opt/postgresql/bin/postgres+0x49c5d4)
(BuildId:c8ca341e1303be0f9dc0b0271c55c4b9e42af89b) 

SUMMARY: AddressSanitizer: 133 byte(s) leaked in 2 allocation(s).

--
Tristan Partin
Neon (https://neon.tech)

Attachment

pgsql-hackers by date:

Previous
From: "Zhijie Hou (Fujitsu)"
Date:
Subject: RE: Synchronizing slots from primary to standby
Next
From: Peter Smith
Date:
Subject: Re: Synchronizing slots from primary to standby