Re: Fix some ubsan/asan related issues - Mailing list pgsql-hackers

From Junwang Zhao
Subject Re: Fix some ubsan/asan related issues
Date
Msg-id CAEG8a3LKmvZtT4jSRXma=eaDnEvdRD8cFdOVVRVCF50aNdMN2A@mail.gmail.com
Whole thread Raw
In response to Re: Fix some ubsan/asan related issues  ("Tristan Partin" <tristan@neon.tech>)
List pgsql-hackers
Hi Tristan,

On Tue, Feb 6, 2024 at 11:53 AM Tristan Partin <tristan@neon.tech> wrote:
>
> On Tue Jan 30, 2024 at 3:58 PM CST, Andres Freund wrote:
> > Hi,
> >
> > On 2024-01-30 09:59:25 -0600, Tristan Partin wrote:
> > > From 331cec1c9db6ff60dcc6d9ba62a9c8be4e5e95ed Mon Sep 17 00:00:00 2001
> > > From: Tristan Partin <tristan@neon.tech>
> > > Date: Mon, 29 Jan 2024 18:03:39 -0600
> > > Subject: [PATCH v1 1/3] Refuse to register message in LogLogicalMessage if
> > >  NULL
> >
> > > If this occurs, the memcpy of rdata_data in CopyXLogRecordToWAL breaks
> > > the API contract of memcpy in glibc. The two pointer arguments are
> > > marked as nonnull, even in the event the amount to copy is 0 bytes.
> >
> > It seems pretty odd to call LogLogicalMessage() with a NULL argument. Why is
> > that something useful?
>
> Dropped. Will change on the Neon side. Should we add an assert
> somewhere for good measure? Near the memcpy in question?
>
> > > From dc9488f3fdee69b981b52c985fb77106d7d301ff Mon Sep 17 00:00:00 2001
> > > From: Tristan Partin <tristan@neon.tech>
> > > Date: Wed, 24 Jan 2024 17:07:01 -0600
> > > Subject: [PATCH v1 2/3] meson: Support compiling with -Db_sanitize=address
> > >
> > > The ecpg is parser is extremely leaky, so we need to silence leak
> > > detection.
> >
> > This does stuff beyond epcg...
>
> Dropped.
>
> > > +if get_option('b_sanitize').contains('address')
> > > +  cdata.set('USE_ADDRESS_SANITIZER', 1)
> > > +endif
> > >
> > >  ###############################################################
> > >  # NLS / Gettext
> > > diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
> > > index ac409b0006..e18e716d9c 100644
> > > --- a/src/bin/initdb/initdb.c
> > > +++ b/src/bin/initdb/initdb.c
> > > @@ -338,6 +338,17 @@ do { \
> > >             output_failed = true, output_errno = errno; \
> > >  } while (0)
> > >
> > > +#ifdef USE_ADDRESS_SANITIZER
> >
> > When asan is used  __SANITIZE_ADDRESS__ is defined, so we don't need to
> > implement this ourselves.
>
> Thanks!
>
> > > +const char *__asan_default_options(void);
> > > +
> > > +const char *__asan_default_options(void)
> > > +{
> > > +   return "detect_leaks=0";
> > > +}
> > > +
> > > +#endif
> >
> > Wonder if we should move this into some static library and link it into all
> > binaries that don't want leak detection? It doesn't seem great to have to
> > adjust this in a bunch of files if we want to adjust the options...
>
> See attached patches. Here is what I found to be necessary to get
> a -Db_sanitize=address,undefined build to successfully make it through
> tests. I do have a few concerns about the patch.
>
> 1. For whatever reason, __SANITIZE_LEAK__ is not defined when the leak
>    sanitizer is enabled. So you will see me use this, to make some
>    include directives work. I don't like this as a final solution
>    because someone could use -fsanitize=leak.
> 2. I tracked down what seems to be a valid leak in adt/xml.c. Attached
>    (test.sql) is a fairly minimal reproduction of what is needed to show
>    the leak. I didn't spend too much time tracking it down. Might get to
>    it later, who knows. Below you will find the backtrace, and whoever
>    wants to try their hand at fixing it will need to comment out
>    xmlNewNode in the leak.supp file.
> 3. I don't love the new library name. Maybe it should be name more lsan
>    specific.
> 4. Should pg_attribute_no_asan be renamed to
>    pg_attribute_no_sanitize_address? That would match
>    pg_attribute_no_sanitize_alignment.
>
> I will also attach a Meson test log for good measure. I didn't try
> testing anything that requires PG_TEST_EXTRA, but I suspect that
> everything will be fine.
>
> Alexander, I haven't yet gotten to the things you pointed out in the
> sibling thread.
>
> ==221848==ERROR: LeakSanitizer: detected memory leaks
>
> Direct leak of 120 byte(s) in 1 object(s) allocated from:
>     #0 0x7fac4a6d92ef in malloc (/lib64/libasan.so.8+0xd92ef) (BuildId: 7fcb7759bc17ef47f9682414b6d99732d6a6ab0c)
>     #1 0x7fac4a48427d in xmlNewNode (/lib64/libxml2.so.2+0x5d27d) (BuildId: 3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
>     #2 0x22107a6 in xmltotext_with_options ../src/backend/utils/adt/xml.c:754
>     #3 0xead047 in ExecEvalXmlExpr ../src/backend/executor/execExprInterp.c:4020
>     #4 0xe8c119 in ExecInterpExpr ../src/backend/executor/execExprInterp.c:1537
>     #5 0xe91f2c in ExecInterpExprStillValid ../src/backend/executor/execExprInterp.c:1881
>     #6 0x109632d in ExecEvalExprSwitchContext ../src/include/executor/executor.h:355
>     #7 0x109655d in ExecProject ../src/include/executor/executor.h:389
>     #8 0x1097186 in ExecResult ../src/backend/executor/nodeResult.c:136
>     #9 0xf0f90c in ExecProcNodeFirst ../src/backend/executor/execProcnode.c:464
>     #10 0xec9bec in ExecProcNode ../src/include/executor/executor.h:273
>     #11 0xed875c in ExecutePlan ../src/backend/executor/execMain.c:1670
>     #12 0xecbee0 in standard_ExecutorRun ../src/backend/executor/execMain.c:365
>     #13 0xecb529 in ExecutorRun ../src/backend/executor/execMain.c:309
>     #14 0x1ae89f6 in PortalRunSelect ../src/backend/tcop/pquery.c:924
>     #15 0x1ae7c06 in PortalRun ../src/backend/tcop/pquery.c:768
>     #16 0x1ad1b43 in exec_simple_query ../src/backend/tcop/postgres.c:1273
>     #17 0x1adf8de in PostgresMain ../src/backend/tcop/postgres.c:4653
>     #18 0x170dbce in BackendRun ../src/backend/postmaster/postmaster.c:4464
>     #19 0x170bf70 in BackendStartup ../src/backend/postmaster/postmaster.c:4140
>     #20 0x170263f in ServerLoop ../src/backend/postmaster/postmaster.c:1776
>     #21 0x1701052 in PostmasterMain ../src/backend/postmaster/postmaster.c:1475
>     #22 0x11a5f67 in main ../src/backend/main/main.c:198
>     #23 0x7fac48e46149 in __libc_start_call_main (/lib64/libc.so.6+0x28149) (BuildId:
7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
>     #24 0x7fac48e4620a in __libc_start_main_impl (/lib64/libc.so.6+0x2820a) (BuildId:
7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
>     #25 0x49c5d4 in _start
(/home/tristan957/Projects/work/postgresql/build/tmp_install/home/tristan957/.opt/postgresql/bin/postgres+0x49c5d4)
(BuildId:c8ca341e1303be0f9dc0b0271c55c4b9e42af89b) 
>
> Indirect leak of 13 byte(s) in 1 object(s) allocated from:
>     #0 0x7fac4a6d92ef in malloc (/lib64/libasan.so.8+0xd92ef) (BuildId: 7fcb7759bc17ef47f9682414b6d99732d6a6ab0c)
>     #1 0x7fac4a4e106f in xmlStrndup (/lib64/libxml2.so.2+0xba06f) (BuildId: 3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
>     #2 0x7fac4a4842c0 in xmlNewNode (/lib64/libxml2.so.2+0x5d2c0) (BuildId: 3074681c8fa9b17e0cbed09bc61c25ada5c28e7c)
>     #3 0x22107a6 in xmltotext_with_options ../src/backend/utils/adt/xml.c:754
>     #4 0xead047 in ExecEvalXmlExpr ../src/backend/executor/execExprInterp.c:4020
>     #5 0xe8c119 in ExecInterpExpr ../src/backend/executor/execExprInterp.c:1537
>     #6 0xe91f2c in ExecInterpExprStillValid ../src/backend/executor/execExprInterp.c:1881
>     #7 0x109632d in ExecEvalExprSwitchContext ../src/include/executor/executor.h:355
>     #8 0x109655d in ExecProject ../src/include/executor/executor.h:389
>     #9 0x1097186 in ExecResult ../src/backend/executor/nodeResult.c:136
>     #10 0xf0f90c in ExecProcNodeFirst ../src/backend/executor/execProcnode.c:464
>     #11 0xec9bec in ExecProcNode ../src/include/executor/executor.h:273
>     #12 0xed875c in ExecutePlan ../src/backend/executor/execMain.c:1670
>     #13 0xecbee0 in standard_ExecutorRun ../src/backend/executor/execMain.c:365
>     #14 0xecb529 in ExecutorRun ../src/backend/executor/execMain.c:309
>     #15 0x1ae89f6 in PortalRunSelect ../src/backend/tcop/pquery.c:924
>     #16 0x1ae7c06 in PortalRun ../src/backend/tcop/pquery.c:768
>     #17 0x1ad1b43 in exec_simple_query ../src/backend/tcop/postgres.c:1273
>     #18 0x1adf8de in PostgresMain ../src/backend/tcop/postgres.c:4653
>     #19 0x170dbce in BackendRun ../src/backend/postmaster/postmaster.c:4464
>     #20 0x170bf70 in BackendStartup ../src/backend/postmaster/postmaster.c:4140
>     #21 0x170263f in ServerLoop ../src/backend/postmaster/postmaster.c:1776
>     #22 0x1701052 in PostmasterMain ../src/backend/postmaster/postmaster.c:1475
>     #23 0x11a5f67 in main ../src/backend/main/main.c:198
>     #24 0x7fac48e46149 in __libc_start_call_main (/lib64/libc.so.6+0x28149) (BuildId:
7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
>     #25 0x7fac48e4620a in __libc_start_main_impl (/lib64/libc.so.6+0x2820a) (BuildId:
7ea8d85df0e89b90c63ac7ed2b3578b2e7728756)
>     #26 0x49c5d4 in _start
(/home/tristan957/Projects/work/postgresql/build/tmp_install/home/tristan957/.opt/postgresql/bin/postgres+0x49c5d4)
(BuildId:c8ca341e1303be0f9dc0b0271c55c4b9e42af89b) 
>
> SUMMARY: AddressSanitizer: 133 byte(s) leaked in 2 allocation(s).
>
> --
> Tristan Partin
> Neon (https://neon.tech)

I tried your v1-0002, it works at compile phase but failed to run initdb
with the following leak detected:

=================================================================
==64983==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 248 byte(s) in 1 object(s) allocated from:
    #0 0x7fc7729df9cf in __interceptor_malloc
../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:69
    #1 0x55bff5480e8b in save_ps_display_args
../postgres/src/backend/utils/misc/ps_status.c:190
    #2 0x55bff4a5a298 in main ../postgres/src/backend/main/main.c:90
    #3 0x7fc771924249 in __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58

Indirect leak of 19 byte(s) in 1 object(s) allocated from:
    #0 0x7fc77299777b in __interceptor_strdup
../../../../src/libsanitizer/asan/asan_interceptors.cpp:439
    #1 0x55bff5480f41 in save_ps_display_args
../postgres/src/backend/utils/misc/ps_status.c:198
    #2 0x55bff4a5a298 in main ../postgres/src/backend/main/main.c:90
    #3 0x7fc771924249 in __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58

I worked around by moving *new_environ* as a global variable, I think this is
false positive report so maybe this deserves a patch?

I tried to apply your v2 patch set but v2-0004 seems out of date.

--
Regards
Junwang Zhao



pgsql-hackers by date:

Previous
From: Jakub Wartak
Date:
Subject: Re: scalability bottlenecks with (many) partitions (and more)
Next
From: Jim Jones
Date:
Subject: Re: Psql meta-command conninfo+