Thread: LLVM Address Sanitizer (ASAN) and valgrind support

LLVM Address Sanitizer (ASAN) and valgrind support

From
Greg Stark
Date:
I feel like I remember hearing about this before but I can't find any
mention of it in my mail archives. It seems pretty simple to add
support for LLVM's Address Sanitizer (asan) by using the hooks we
already have for valgrind.

In fact I think this would actually be sufficient. I'm not sure what
the MEMPOOL valgrind stuff is though. I don't think it's relevant to
address sanitizer which only tracks references to free'd or
unallocated pointers.

I don't even see any need offhand for a configure flag or autoconf
test. We could have a configure flag just to be consistent with
valgrind but it seems pointless. If you're compiling with asan I don't
see any reason to not use it. I'm building this to see if it works
now.

Incidentally there's another sanitizer called msan that looks even
more promising. It's more like valgrind in that it tracks undefined
memory. It's not working for me though and I haven't spent much time
trying to figure out why yet.


diff --git a/src/include/utils/memdebug.h b/src/include/utils/memdebug.h
index 608facc..7696986 100644
--- a/src/include/utils/memdebug.h
+++ b/src/include/utils/memdebug.h
@@ -19,6 +19,27 @@
#ifdef USE_VALGRIND#include <valgrind/memcheck.h>
+
+#elif __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
+
+#include <sanitizer/asan_interface.h>
+
+#define VALGRIND_MAKE_MEM_DEFINED(addr, size) \
+ ASAN_UNPOISON_MEMORY_REGION(addr, size)
+
+#define VALGRIND_MAKE_MEM_NOACCESS(addr, size) \
+ ASAN_POISON_MEMORY_REGION(addr, size)
+
+#define VALGRIND_MAKE_MEM_UNDEFINED(addr, size) \
+ ASAN_UNPOISON_MEMORY_REGION(addr, size)
+
+#define VALGRIND_CHECK_MEM_IS_DEFINED(addr, size) do {} while (0)
+#define VALGRIND_CREATE_MEMPOOL(context, redzones, zeroed) do {} while (0)
+#define VALGRIND_DESTROY_MEMPOOL(context) do {} while (0)
+#define VALGRIND_MEMPOOL_ALLOC(context, addr, size) do {} while (0)
+#define VALGRIND_MEMPOOL_FREE(context, addr) do {} while (0)
+#define VALGRIND_MEMPOOL_CHANGE(context, optr, nptr, size) do {} while (0)
+#else#define VALGRIND_CHECK_MEM_IS_DEFINED(addr, size) do {} while (0)#define VALGRIND_CREATE_MEMPOOL(context,
redzones,zeroed) do {} while (0)
 


-- 
greg



Re: LLVM Address Sanitizer (ASAN) and valgrind support

From
Noah Misch
Date:
On Mon, Sep 07, 2015 at 05:05:10PM +0100, Greg Stark wrote:
> I feel like I remember hearing about this before but I can't find any
> mention of it in my mail archives. It seems pretty simple to add
> support for LLVM's Address Sanitizer (asan) by using the hooks we
> already have for valgrind.

Nice.

> In fact I think this would actually be sufficient. I'm not sure what
> the MEMPOOL valgrind stuff is though. I don't think it's relevant to
> address sanitizer which only tracks references to free'd or
> unallocated pointers.

Documentation: http://valgrind.org/docs/manual/mc-manual.html#mc-manual.mempools

A Valgrind memory pool is the precise analog of a PostgreSQL memory context.
The design of the PostgreSQL Valgrind support gave mcxt.c primary
responsibility for validating and invalidating pointers, and mcxt.c uses
MEMPOOL client requests to do so.  While Valgrind could catch all the same
errors with just VALGRIND_MAKE_MEM_{DEFINED,NOACCESS,UNDEFINED}, the MEMPOOL
client requests improve the attribution of errors.  You get this:

==00:00:14:08.472 471537==  Address 0x8ecd848 is 24 bytes inside a block of size 25 client-defined
==00:00:14:08.472 471537==    at 0x7717FD: MemoryContextAlloc (mcxt.c:589)

Instead of this:

==00:00:02:29.587 18940==  Address 0x6a1b51b is 83,835 bytes inside a block of size 262,144 alloc'd
==00:00:02:29.587 18940==    at 0x4C2353F: malloc (vg_replace_malloc.c:236)
==00:00:02:29.587 18940==    by 0x7FD12F: AllocSetAlloc (aset.c:792)
==00:00:02:29.587 18940==    by 0x7FEE9D: MemoryContextAlloc (mcxt.c:524)

> I don't even see any need offhand for a configure flag or autoconf
> test. We could have a configure flag just to be consistent with
> valgrind but it seems pointless. If you're compiling with asan I don't
> see any reason to not use it. I'm building this to see if it works
> now.

I agree.  A flag guards Valgrind client requests, because we'd otherwise have
no idea whether the user plans to run the binary under Valgrind.  For ASAN,
all relevant decisions happen at build time.

> --- a/src/include/utils/memdebug.h
> +++ b/src/include/utils/memdebug.h
> @@ -19,6 +19,27 @@
> 
>  #ifdef USE_VALGRIND
>  #include <valgrind/memcheck.h>
> +
> +#elif __has_feature(address_sanitizer) || defined(__SANITIZE_ADDRESS__)
> +
> +#include <sanitizer/asan_interface.h>
> +
> +#define VALGRIND_MAKE_MEM_DEFINED(addr, size) \
> + ASAN_UNPOISON_MEMORY_REGION(addr, size)
> +
> +#define VALGRIND_MAKE_MEM_NOACCESS(addr, size) \
> + ASAN_POISON_MEMORY_REGION(addr, size)
> +
> +#define VALGRIND_MAKE_MEM_UNDEFINED(addr, size) \
> + ASAN_UNPOISON_MEMORY_REGION(addr, size)
> +
> +#define VALGRIND_CHECK_MEM_IS_DEFINED(addr, size) do {} while (0)
> +#define VALGRIND_CREATE_MEMPOOL(context, redzones, zeroed) do {} while (0)
> +#define VALGRIND_DESTROY_MEMPOOL(context) do {} while (0)
> +#define VALGRIND_MEMPOOL_ALLOC(context, addr, size) do {} while (0)
> +#define VALGRIND_MEMPOOL_FREE(context, addr) do {} while (0)
> +#define VALGRIND_MEMPOOL_CHANGE(context, optr, nptr, size) do {} while (0)

aset.c relies on the fact that VALGRIND_MEMPOOL_ALLOC() has an implicit
VALGRIND_MAKE_MEM_UNDEFINED() and VALGRIND_MEMPOOL_FREE() has an implicit
VALGRIND_MAKE_MEM_NOACCESS().  #define those two accordingly.  If ASAN has no
equivalent of MEMPOOL client requests, it's fine to stub out the rest.



Re: LLVM Address Sanitizer (ASAN) and valgrind support

From
Andres Freund
Date:
Hi,


On 2015-09-07 17:05:10 +0100, Greg Stark wrote:
> I feel like I remember hearing about this before but I can't find any
> mention of it in my mail archives. It seems pretty simple to add
> support for LLVM's Address Sanitizer (asan) by using the hooks we
> already have for valgrind.

Any plans to pick this up again?


> In fact I think this would actually be sufficient. I'm not sure what
> the MEMPOOL valgrind stuff is though. I don't think it's relevant to
> address sanitizer which only tracks references to free'd or
> unallocated pointers.

It'd be nice to add msan support, so uninitialized accesses are also
tracked. (oh, you suggest that below)


I vote for renaming the VALGRIND names etc. to something more
tool-neutral. I think it's going to be too confusing otherwise.


Andres



Re: LLVM Address Sanitizer (ASAN) and valgrind support

From
Greg Stark
Date:
On Tue, Sep 27, 2016 at 11:02 PM, Andres Freund <andres@anarazel.de> wrote:
> Any plans to pick this up again?

Yeah, I was just thinking I should pick this up again.

> I vote for renaming the VALGRIND names etc. to something more tool-neutral. I think it's going to be too confusing
otherwise.

Hm, the danger there is once I start refactoring things I could get
bogged down... I would love to remove all the #ifdef's and have the
macros just be no-ops if they're compiled out for example...

-- 
greg



Re: LLVM Address Sanitizer (ASAN) and valgrind support

From
Andres Freund
Date:
On 2016-09-28 00:23:11 +0100, Greg Stark wrote:
> On Tue, Sep 27, 2016 at 11:02 PM, Andres Freund <andres@anarazel.de> wrote:
> > Any plans to pick this up again?
> 
> Yeah, I was just thinking I should pick this up again.
> 
> > I vote for renaming the VALGRIND names etc. to something more tool-neutral. I think it's going to be too confusing
otherwise.
> 
> Hm, the danger there is once I start refactoring things I could get
> bogged down...

Meh, adding a neutral name seems pretty harmless.

> I would love to remove all the #ifdef's and have the
> macros just be no-ops if they're compiled out for example...

Don't we pretty much have that?

#ifdef USE_VALGRIND
#include <valgrind/memcheck.h>
#else
#define VALGRIND_CHECK_MEM_IS_DEFINED(addr, size)            do {} while (0)
#define VALGRIND_CREATE_MEMPOOL(context, redzones, zeroed)    do {} while (0)
#define VALGRIND_DESTROY_MEMPOOL(context)                    do {} while (0)
#define VALGRIND_MAKE_MEM_DEFINED(addr, size)                do {} while (0)
#define VALGRIND_MAKE_MEM_NOACCESS(addr, size)                do {} while (0)
#define VALGRIND_MAKE_MEM_UNDEFINED(addr, size)                do {} while (0)
#define VALGRIND_MEMPOOL_ALLOC(context, addr, size)            do {} while (0)
#define VALGRIND_MEMPOOL_FREE(context, addr)                do {} while (0)
#define VALGRIND_MEMPOOL_CHANGE(context, optr, nptr, size)    do {} while (0)
#endif

?



Re: LLVM Address Sanitizer (ASAN) and valgrind support

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2016-09-28 00:23:11 +0100, Greg Stark wrote:
>> I would love to remove all the #ifdef's and have the
>> macros just be no-ops if they're compiled out for example...

> Don't we pretty much have that?

I think "((void) 0)" is a more common spelling of a dummy statement.
Though there's little wrong with it as it is.
        regards, tom lane



Re: LLVM Address Sanitizer (ASAN) and valgrind support

From
Andres Freund
Date:
On 2016-09-27 19:31:57 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2016-09-28 00:23:11 +0100, Greg Stark wrote:
> >> I would love to remove all the #ifdef's and have the
> >> macros just be no-ops if they're compiled out for example...
> 
> > Don't we pretty much have that?
> 
> I think "((void) 0)" is a more common spelling of a dummy statement.
> Though there's little wrong with it as it is.

That's already committed code, I didn't propose to add it ;)



Re: LLVM Address Sanitizer (ASAN) and valgrind support

From
Piotr Stefaniak
Date:
On 2016-09-28 00:02, Andres Freund wrote:
> On 2015-09-07 17:05:10 +0100, Greg Stark wrote:
>> I feel like I remember hearing about this before but I can't find any
>> mention of it in my mail archives. It seems pretty simple to add
>> support for LLVM's Address Sanitizer (asan) by using the hooks we
>> already have for valgrind.
>
> Any plans to pick this up again?

Not remembering the context, I was initially confused about what exactly
supposedly needs to be done in order to have ASan support, especially
since I've been using it for a couple of years without any kind of
modifications. Having read the whole thread now, I assume the discussion
is now about getting MSan support, since apparently it's been already
concluded that not much is needed for getting ASan support:

>> I don't even see any need offhand for a configure flag or autoconf
>> test. We could have a configure flag just to be consistent with
>> valgrind but it seems pointless. If you're compiling with asan I don't
>> see any reason to not use it. I'm building this to see if it works
>> now.
>
> I agree.  A flag guards Valgrind client requests, because we'd otherwise have
> no idea whether the user plans to run the binary under Valgrind.  For ASAN,
> all relevant decisions happen at build time.

Please correct me if I'm wrong.




Re: LLVM Address Sanitizer (ASAN) and valgrind support

From
Greg Stark
Date:
On Wed, Sep 28, 2016 at 7:40 AM, Piotr Stefaniak
<postgres@piotr-stefaniak.me> wrote:
> Not remembering the context, I was initially confused about what exactly
> supposedly needs to be done in order to have ASan support, especially
> since I've been using it for a couple of years without any kind of
> modifications. Having read the whole thread now, I assume the discussion
> is now about getting MSan support, since apparently it's been already
> concluded that not much is needed for getting ASan support:


The differnce between msan and asan is only related to whether
uninitialized reads are tracked. All other memory errors such as
reading past the end of an allocation or reading after a free are
tracked by both.

Without asan support in Postgres's memdebug.h asan will just track
whether you're using memory that is outside the memory that malloc has
handed Postgres. It doesn't know anything about whether that memory
has been returned by palloc or has since been pfree'd. Even the bounds
checking is not great since you could be reading from palloc's header
or from the bytes in the next palloc block that happened to be
returned by the same malloc (or another malloc if you're unlucky).

The support I added to memdebug.h called macros which call llvm
intrinsics to mark the memory malloc'd by Postgres as unusuable until
it's returned by palloc. Once it's returned by palloc it's marked
usable except for a "guard" byte at the end. Then pfree marks the
memory unusable again. This basically mimics the behaviour you would
get from asan if you were using malloc directly.

I added support for msan as well which is basically just one more
macro to make the distinction between usable but uninitialized memory
and usable and initialized memory. But I was unable to test it because
msan didn't work for me at all. This seems to be the way of things
with llvm. It's great stuff but there's always 10% that is broken
because there's some cool new thing that's better.

-- 
greg



Re: LLVM Address Sanitizer (ASAN) and valgrind support

From
Greg Stark
Date:
On Sat, Feb 6, 2016 at 4:52 AM, Noah Misch <noah@leadboat.com> wrote:
> aset.c relies on the fact that VALGRIND_MEMPOOL_ALLOC() has an implicit
> VALGRIND_MAKE_MEM_UNDEFINED() and VALGRIND_MEMPOOL_FREE() has an implicit
> VALGRIND_MAKE_MEM_NOACCESS().  #define those two accordingly.  If ASAN has no

Actually this is confusing.

aset.c doesn't actually use the MEMPOOL_ALLOC macro at all, it just
calls UNDEFINED, DEFINED, and NOACCESS. mcxt.c does however do the
MEMPOOL_ALLOC/FREE. So both layers are calling these macros for
overlapping memory areas which I find very confusing and I'm not sure
what the net effect is.

The MEMPOOL_FREE doesn't take any size argument and mcxt.c doesn't
have convenient access to a size argument. It could call the
GetChunkSpace method but that will include the allocation overhead and
in any case isn't this memory already marked noaccess by aset.c?

-- 
greg



Re: LLVM Address Sanitizer (ASAN) and valgrind support

From
Noah Misch
Date:
On Wed, Oct 19, 2016 at 11:08:39AM +0100, Greg Stark wrote:
> On Sat, Feb 6, 2016 at 4:52 AM, Noah Misch <noah@leadboat.com> wrote:
> > aset.c relies on the fact that VALGRIND_MEMPOOL_ALLOC() has an implicit
> > VALGRIND_MAKE_MEM_UNDEFINED() and VALGRIND_MEMPOOL_FREE() has an implicit
> > VALGRIND_MAKE_MEM_NOACCESS().  #define those two accordingly.  If ASAN has no
> 
> Actually this is confusing.
> 
> aset.c doesn't actually use the MEMPOOL_ALLOC macro at all, it just
> calls UNDEFINED, DEFINED, and NOACCESS. mcxt.c does however do the
> MEMPOOL_ALLOC/FREE.

Correct.

> So both layers are calling these macros for
> overlapping memory areas which I find very confusing and I'm not sure
> what the net effect is.

The net effect looks like this, at the instant an mcxt.c function returns:

NOACCESS:
- requested_size field of AllocChunkData
- Trailing padding, if any, of AllocChunkData
- Any chunk bytes after the last byte of the most recent requested size

DEFINED:
- palloc0() return value, up to requested size

UNDEFINED:
- palloc() return value, up to requested size
- repalloc() new portion, after size increase (with MEMORY_CONTEXT_CHECKING disabled, memory unfortunately becomes
DEFINEDinstead)
 

> The MEMPOOL_FREE doesn't take any size argument and mcxt.c doesn't
> have convenient access to a size argument. It could call the
> GetChunkSpace method but that will include the allocation overhead and

That is indeed a problem for making VALGRIND_MEMPOOL_FREE() an alias of
VALGRIND_MAKE_MEM_NOACCESS() under ASAN as I suggested.  Calling
GetMemoryChunkSpace() in the macro would cause memdebug.h to embed an
assumption of mcxt.c, which is messy.  Including the allocation overhead is
fine, though.

> in any case isn't this memory already marked noaccess by aset.c?

Only sometimes, when AllocSetFree() happens to call free() or wipe_mem().



Re: LLVM Address Sanitizer (ASAN) and valgrind support

From
Greg Stark
Date:
<p dir="ltr"><p dir="ltr">On Oct 20, 2016 5:27 PM, "Noah Misch" <<a
href="mailto:noah@leadboat.com">noah@leadboat.com</a>>wrote:<br /> ><br /> > On Wed, Oct 19, 2016 at
11:08:39AM+0100, Greg Stark wrote:<br /> ><br /> > > The MEMPOOL_FREE doesn't take any size argument and
mcxt.cdoesn't<br /> > > have convenient access to a size argument. It could call the<br /> > >
GetChunkSpacemethod but that will include the allocation overhead and<br /> ><br /> > That is indeed a problem
formaking VALGRIND_MEMPOOL_FREE() an alias of<br /> > VALGRIND_MAKE_MEM_NOACCESS() under ASAN as I suggested. 
Calling<br/> > GetMemoryChunkSpace() in the macro would cause memdebug.h to embed an<br /> > assumption of
mcxt.c,which is messy.  Including the allocation overhead is<br /> > fine, though.<p dir="ltr">I think the way out
isto simply have aset.c make the memory undefined/noaccess even if it's redundant under valgrind. It's a bit
unfortunatethat the macros would have different semantics under the two. If it's too confusing or we're worried about
theperformance overhead we could make a MAKE_MEM_{UNDEFINED,NOACCESS}_IF_NO_MEMPOOL() but I don't think it's worth it
myself.<br/><br /><p dir="ltr">> > in any case isn't this memory already marked noaccess by aset.c?<br /> ><br
/>> Only sometimes, when AllocSetFree() happens to call free() or wipe_mem().<p dir="ltr">I think if I did further
surgeryhere I would rename wipe_mem and randomise_mem and make them responsible for making the memory undefined and
noaccessas well. They would always be defined so that would get rid of all the ifdefs except within those functions.<br
/><pdir="ltr">I have a patch working under asan on both gcc and clang. That handles noaccess but not undefined memory
reads.I need to try msan to make sure the macro definitions work well for that API too.<p dir="ltr">There are a couple
buildoddities both with gcc and clang before I can commit anything though. And I can't test valgrind to be sure the
redundantcalls aren't causing a problem. 

Re: LLVM Address Sanitizer (ASAN) and valgrind support

From
Noah Misch
Date:
On Thu, Oct 20, 2016 at 06:14:28PM +0100, Greg Stark wrote:
> On Oct 20, 2016 5:27 PM, "Noah Misch" <noah@leadboat.com> wrote:
> > On Wed, Oct 19, 2016 at 11:08:39AM +0100, Greg Stark wrote:
> > > The MEMPOOL_FREE doesn't take any size argument and mcxt.c doesn't
> > > have convenient access to a size argument. It could call the
> > > GetChunkSpace method but that will include the allocation overhead and
> >
> > That is indeed a problem for making VALGRIND_MEMPOOL_FREE() an alias of
> > VALGRIND_MAKE_MEM_NOACCESS() under ASAN as I suggested.  Calling
> > GetMemoryChunkSpace() in the macro would cause memdebug.h to embed an
> > assumption of mcxt.c, which is messy.  Including the allocation overhead is
> > fine, though.
> 
> I think the way out is to simply have aset.c make the memory
> undefined/noaccess even if it's redundant under valgrind. It's a bit
> unfortunate that the macros would have different semantics under the two.
> If it's too confusing or we're worried about the performance overhead we
> could make a MAKE_MEM_{UNDEFINED,NOACCESS}_IF_NO_MEMPOOL() but I don't
> think it's worth it myself.

I don't expect much performance overhead.  When I last benchmarked Valgrind
Memcheck of "make installcheck", a !USE_VALGRIND build (no client requests at
all) saved about 5% of runtime.  A single new client request should be cheap
enough.  (Marking otherwise-redundant calls may still be good, though.)

> There are a couple build oddities both with gcc and clang before I can
> commit anything though. And I can't test valgrind to be sure the redundant
> calls aren't causing a problem.

When you submit your patch to a CommitFest, mention that you're blocked on
having a reviewer who can test Valgrind.  Many reviewers can help.

Thanks,
nm