Thread: Fix some ubsan/asan related issues

Fix some ubsan/asan related issues

From

"Tristan Partin"

Date:

30 January 2024, 15:57:24

Patch 1:

Passing NULL as a second argument to memcpy breaks ubsan, and there
didn't seem to be anything preventing that in the LogLogicalMessage()
codepath. Here is a preventative measure in LogLogicalMessage() and an
Assert() in CopyXLogRecordToWAL().

Patch 2:

Support building with -Db_sanitize=address in Meson. Various executables
are leaky which can cause the builds to fail and tests to fail, when we
are fine leaking this memory.

Personally, I am a big stickler for always freeing my memory in
executables even if it gets released at program exit because it makes
the leak sanitizer much more effective. However (!), I am not here to
change the philosophy of memory management in one-off executables, so
I have just silenced memory leaks in various executables for now.

Patch 3:

THIS DOESN'T WORK. DO NOT COMMIT. PROPOSING FOR DISCUSSION!

In my effort to try to see if the test suite would pass with asan
enabled, I ran into a max_stack_depth issue. I tried maxing it out
(hence, the patch), but that still didn't remedy my issue. I tried to
look on the list for any relevant emails, but nothing turned up. Maybe
others have not attempted this before? Seems doubtful.

Not entirely sure how to fix this issue. I personally find asan
extremely effective, even more than valgrind, so it would be great if
I could run Postgres with asan enabled to catch various stupid C issues
I might create. On my system, ulimit -a returns 8192 kbytes, so Postgres
just lets me set 7680 kbytes as the max. Whatever asan is doing doesn't
seem to leave enough stack space for Postgres.

---

I would like to see patch 1 reviewed and committed. Patch 2 honestly
doesn't matter unless asan support can be fixed. I can also add a patch
that errors out the Meson build if asan support is requested. That way
others don't spend time heading down a dead end.

--
Tristan Partin
Neon (https://neon.tech)

Re: Fix some ubsan/asan related issues

From

Heikki Linnakangas

Date:

30 January 2024, 20:05:28

On 30/01/2024 17:57, Tristan Partin wrote:
> Patch 1:
> 
> Passing NULL as a second argument to memcpy breaks ubsan, and there
> didn't seem to be anything preventing that in the LogLogicalMessage()
> codepath. Here is a preventative measure in LogLogicalMessage() and an
> Assert() in CopyXLogRecordToWAL().

For the audience: We ran into this one with the neon extension. The 
solution is to call LogLogicalMessage("", 0, ...) instead of 
LogLogicalMessage(NULL, 0, ...). . But it's true that it's pointlessfor 
LogLogicalMessage to call XLogRegisterData() if the message is empty. If 
we do this, we should check for 'size == 0' rather than 'message == NULL'.

> Patch 2:
> 
> Support building with -Db_sanitize=address in Meson. Various executables
> are leaky which can cause the builds to fail and tests to fail, when we
> are fine leaking this memory.
> 
> Personally, I am a big stickler for always freeing my memory in
> executables even if it gets released at program exit because it makes
> the leak sanitizer much more effective. However (!), I am not here to
> change the philosophy of memory management in one-off executables, so
> I have just silenced memory leaks in various executables for now.
> 
> Patch 3:
> 
> THIS DOESN'T WORK. DO NOT COMMIT. PROPOSING FOR DISCUSSION!
> 
> In my effort to try to see if the test suite would pass with asan
> enabled, I ran into a max_stack_depth issue. I tried maxing it out
> (hence, the patch), but that still didn't remedy my issue. I tried to
> look on the list for any relevant emails, but nothing turned up. Maybe
> others have not attempted this before? Seems doubtful.
> 
> Not entirely sure how to fix this issue. I personally find asan
> extremely effective, even more than valgrind, so it would be great if
> I could run Postgres with asan enabled to catch various stupid C issues
> I might create. On my system, ulimit -a returns 8192 kbytes, so Postgres
> just lets me set 7680 kbytes as the max. Whatever asan is doing doesn't
> seem to leave enough stack space for Postgres.

I'm a bit confused by these. We already run with sanitizer in the cirrus 
CI. What does this enable that we're not already doing?

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Fix some ubsan/asan related issues

From

Andres Freund

Date:

30 January 2024, 21:23:04

Hi,

On 2024-01-30 22:05:28 +0200, Heikki Linnakangas wrote:
> On 30/01/2024 17:57, Tristan Partin wrote:
> > In my effort to try to see if the test suite would pass with asan
> > enabled, I ran into a max_stack_depth issue. I tried maxing it out
> > (hence, the patch), but that still didn't remedy my issue. I tried to
> > look on the list for any relevant emails, but nothing turned up. Maybe
> > others have not attempted this before? Seems doubtful.
> > 
> > Not entirely sure how to fix this issue. I personally find asan
> > extremely effective, even more than valgrind, so it would be great if
> > I could run Postgres with asan enabled to catch various stupid C issues
> > I might create. On my system, ulimit -a returns 8192 kbytes, so Postgres
> > just lets me set 7680 kbytes as the max. Whatever asan is doing doesn't
> > seem to leave enough stack space for Postgres.
> 
> I'm a bit confused by these. We already run with sanitizer in the cirrus CI.
> What does this enable that we're not already doing?

The reason asan fails is that it uses a "shadow stack" to track stack variable
lifetimes. These confuse our stack depth check. CI doesn't have the issue
because the compiler doesn't yet enable the feature, locally I get around it
by using ASAN_OPTIONS=detect_stack_use_after_return=0:...

The checks are actually quite useful, so making our stack depth check work
with asan would be worthwhile.

I discussed this in a bit more in
https://postgr.es/m/20231129193920.4vphw7dqxjzf5v5b%40awork3.anarazel.de

Greetings,

Andres Freund

Re: Fix some ubsan/asan related issues

From

Alexander Lakhin

Date:

31 January 2024, 04:00:00

Hello,

30.01.2024 18:57, Tristan Partin wrote:

Patch 1:

Passing NULL as a second argument to memcpy breaks ubsan, ...

Maybe you would like to fix also one more similar place, reached with:
create extension xml2;
select xslt_process('<x/>',
$$<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
</xsl:template>
</xsl:stylesheet>$$);

varlena.c:201:26: runtime error: null pointer passed as argument 2, which is declared to never be null

There is also an issue with pg_bsd_indent, I stumble upon when doing
`make check-world` with the sanitizers enabled:
https://www.postgresql.org/message-id/591971ce-25c1-90f3-0526-5f54e3ebb32e%40gmail.com

31.01.2024 00:23, Andres Freund wrote:

The reason asan fails is that it uses a "shadow stack" to track stack variable
lifetimes. These confuse our stack depth check. CI doesn't have the issue
because the compiler doesn't yet enable the feature, locally I get around it
by using ASAN_OPTIONS=detect_stack_use_after_return=0:...

Even with detect_stack_use_after_return=0, clang-18's asan makes the test
012_subtransactions.pl fail:
2024-01-31 03:24:25.691 UTC [4112455] 012_subtransactions.pl LOG: statement: SELECT hs_subxids(201);
2024-01-31 03:24:25.714 UTC [4112455] 012_subtransactions.pl ERROR: stack depth limit exceeded
2024-01-31 03:24:25.714 UTC [4112455] 012_subtransactions.pl HINT: Increase the configuration parameter max_stack_depth (currently 2048kB), after ensuring the platform's stack depth limit is adequate.

(All the other tests pass.)
Though the same test passes when I use clang-16.

Best regards,
Alexander

Re: Fix some ubsan/asan related issues

From

"Tristan Partin"

Date:

31 January 2024, 18:16:52

On Tue Jan 30, 2024 at 10:00 PM CST, Alexander Lakhin wrote:
> Hello,
>
> 30.01.2024 18:57, Tristan Partin wrote:
> > Patch 1:
> >
> > Passing NULL as a second argument to memcpy breaks ubsan, ...
>
> Maybe you would like to fix also one more similar place, reached with:
> create extension xml2;
> select xslt_process('<x/>',
> $$<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
> <xsl:template match="@*|node()">
> </xsl:template>
> </xsl:stylesheet>$$);
>
> varlena.c:201:26: runtime error: null pointer passed as argument 2, which is declared to never be null
>
> There is also an issue with pg_bsd_indent, I stumble upon when doing
> `make check-world` with the sanitizers enabled:
> https://www.postgresql.org/message-id/591971ce-25c1-90f3-0526-5f54e3ebb32e%40gmail.com
>
> 31.01.2024 00:23, Andres Freund wrote:
> > The reason asan fails is that it uses a "shadow stack" to track stack variable
> > lifetimes. These confuse our stack depth check. CI doesn't have the issue
> > because the compiler doesn't yet enable the feature, locally I get around it
> > by using ASAN_OPTIONS=detect_stack_use_after_return=0:...
>
> Even with detect_stack_use_after_return=0, clang-18's asan makes the test
> 012_subtransactions.pl fail:
> 2024-01-31 03:24:25.691 UTC [4112455] 012_subtransactions.pl LOG: statement: SELECT hs_subxids(201);
> 2024-01-31 03:24:25.714 UTC [4112455] 012_subtransactions.pl ERROR: stack depth limit exceeded
> 2024-01-31 03:24:25.714 UTC [4112455] 012_subtransactions.pl HINT: Increase the configuration parameter
max_stack_depth 
> (currently 2048kB), after ensuring the platform's stack depth limit is adequate.
>
> (All the other tests pass.)
> Though the same test passes when I use clang-16.

Thanks Alexander! I will try and take a look at these.

--
Tristan Partin
Neon (https://neon.tech)