Thread: Fix some ubsan/asan related issues
Patch 1: Passing NULL as a second argument to memcpy breaks ubsan, and there didn't seem to be anything preventing that in the LogLogicalMessage() codepath. Here is a preventative measure in LogLogicalMessage() and an Assert() in CopyXLogRecordToWAL(). Patch 2: Support building with -Db_sanitize=address in Meson. Various executables are leaky which can cause the builds to fail and tests to fail, when we are fine leaking this memory. Personally, I am a big stickler for always freeing my memory in executables even if it gets released at program exit because it makes the leak sanitizer much more effective. However (!), I am not here to change the philosophy of memory management in one-off executables, so I have just silenced memory leaks in various executables for now. Patch 3: THIS DOESN'T WORK. DO NOT COMMIT. PROPOSING FOR DISCUSSION! In my effort to try to see if the test suite would pass with asan enabled, I ran into a max_stack_depth issue. I tried maxing it out (hence, the patch), but that still didn't remedy my issue. I tried to look on the list for any relevant emails, but nothing turned up. Maybe others have not attempted this before? Seems doubtful. Not entirely sure how to fix this issue. I personally find asan extremely effective, even more than valgrind, so it would be great if I could run Postgres with asan enabled to catch various stupid C issues I might create. On my system, ulimit -a returns 8192 kbytes, so Postgres just lets me set 7680 kbytes as the max. Whatever asan is doing doesn't seem to leave enough stack space for Postgres. --- I would like to see patch 1 reviewed and committed. Patch 2 honestly doesn't matter unless asan support can be fixed. I can also add a patch that errors out the Meson build if asan support is requested. That way others don't spend time heading down a dead end. -- Tristan Partin Neon (https://neon.tech)
On 30/01/2024 17:57, Tristan Partin wrote: > Patch 1: > > Passing NULL as a second argument to memcpy breaks ubsan, and there > didn't seem to be anything preventing that in the LogLogicalMessage() > codepath. Here is a preventative measure in LogLogicalMessage() and an > Assert() in CopyXLogRecordToWAL(). For the audience: We ran into this one with the neon extension. The solution is to call LogLogicalMessage("", 0, ...) instead of LogLogicalMessage(NULL, 0, ...). . But it's true that it's pointlessfor LogLogicalMessage to call XLogRegisterData() if the message is empty. If we do this, we should check for 'size == 0' rather than 'message == NULL'. > Patch 2: > > Support building with -Db_sanitize=address in Meson. Various executables > are leaky which can cause the builds to fail and tests to fail, when we > are fine leaking this memory. > > Personally, I am a big stickler for always freeing my memory in > executables even if it gets released at program exit because it makes > the leak sanitizer much more effective. However (!), I am not here to > change the philosophy of memory management in one-off executables, so > I have just silenced memory leaks in various executables for now. > > Patch 3: > > THIS DOESN'T WORK. DO NOT COMMIT. PROPOSING FOR DISCUSSION! > > In my effort to try to see if the test suite would pass with asan > enabled, I ran into a max_stack_depth issue. I tried maxing it out > (hence, the patch), but that still didn't remedy my issue. I tried to > look on the list for any relevant emails, but nothing turned up. Maybe > others have not attempted this before? Seems doubtful. > > Not entirely sure how to fix this issue. I personally find asan > extremely effective, even more than valgrind, so it would be great if > I could run Postgres with asan enabled to catch various stupid C issues > I might create. On my system, ulimit -a returns 8192 kbytes, so Postgres > just lets me set 7680 kbytes as the max. Whatever asan is doing doesn't > seem to leave enough stack space for Postgres. I'm a bit confused by these. We already run with sanitizer in the cirrus CI. What does this enable that we're not already doing? -- Heikki Linnakangas Neon (https://neon.tech)
Hi, On 2024-01-30 22:05:28 +0200, Heikki Linnakangas wrote: > On 30/01/2024 17:57, Tristan Partin wrote: > > In my effort to try to see if the test suite would pass with asan > > enabled, I ran into a max_stack_depth issue. I tried maxing it out > > (hence, the patch), but that still didn't remedy my issue. I tried to > > look on the list for any relevant emails, but nothing turned up. Maybe > > others have not attempted this before? Seems doubtful. > > > > Not entirely sure how to fix this issue. I personally find asan > > extremely effective, even more than valgrind, so it would be great if > > I could run Postgres with asan enabled to catch various stupid C issues > > I might create. On my system, ulimit -a returns 8192 kbytes, so Postgres > > just lets me set 7680 kbytes as the max. Whatever asan is doing doesn't > > seem to leave enough stack space for Postgres. > > I'm a bit confused by these. We already run with sanitizer in the cirrus CI. > What does this enable that we're not already doing? The reason asan fails is that it uses a "shadow stack" to track stack variable lifetimes. These confuse our stack depth check. CI doesn't have the issue because the compiler doesn't yet enable the feature, locally I get around it by using ASAN_OPTIONS=detect_stack_use_after_return=0:... The checks are actually quite useful, so making our stack depth check work with asan would be worthwhile. I discussed this in a bit more in https://postgr.es/m/20231129193920.4vphw7dqxjzf5v5b%40awork3.anarazel.de Greetings, Andres Freund
Hello,
30.01.2024 18:57, Tristan Partin wrote:
30.01.2024 18:57, Tristan Partin wrote:
Patch 1:
Passing NULL as a second argument to memcpy breaks ubsan, ...
Maybe you would like to fix also one more similar place, reached with:
create extension xml2;
select xslt_process('<x/>',
$$<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
</xsl:template>
</xsl:stylesheet>$$);
varlena.c:201:26: runtime error: null pointer passed as argument 2, which is declared to never be null
There is also an issue with pg_bsd_indent, I stumble upon when doing
`make check-world` with the sanitizers enabled:
https://www.postgresql.org/message-id/591971ce-25c1-90f3-0526-5f54e3ebb32e%40gmail.com
31.01.2024 00:23, Andres Freund wrote:
The reason asan fails is that it uses a "shadow stack" to track stack variable lifetimes. These confuse our stack depth check. CI doesn't have the issue because the compiler doesn't yet enable the feature, locally I get around it by using ASAN_OPTIONS=detect_stack_use_after_return=0:...
Even with detect_stack_use_after_return=0, clang-18's asan makes the test
012_subtransactions.pl fail:
2024-01-31 03:24:25.691 UTC [4112455] 012_subtransactions.pl LOG: statement: SELECT hs_subxids(201);
2024-01-31 03:24:25.714 UTC [4112455] 012_subtransactions.pl ERROR: stack depth limit exceeded
2024-01-31 03:24:25.714 UTC [4112455] 012_subtransactions.pl HINT: Increase the configuration parameter max_stack_depth (currently 2048kB), after ensuring the platform's stack depth limit is adequate.
(All the other tests pass.)
Though the same test passes when I use clang-16.
Best regards,
Alexander
On Tue Jan 30, 2024 at 10:00 PM CST, Alexander Lakhin wrote: > Hello, > > 30.01.2024 18:57, Tristan Partin wrote: > > Patch 1: > > > > Passing NULL as a second argument to memcpy breaks ubsan, ... > > Maybe you would like to fix also one more similar place, reached with: > create extension xml2; > select xslt_process('<x/>', > $$<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> > <xsl:template match="@*|node()"> > </xsl:template> > </xsl:stylesheet>$$); > > varlena.c:201:26: runtime error: null pointer passed as argument 2, which is declared to never be null > > There is also an issue with pg_bsd_indent, I stumble upon when doing > `make check-world` with the sanitizers enabled: > https://www.postgresql.org/message-id/591971ce-25c1-90f3-0526-5f54e3ebb32e%40gmail.com > > 31.01.2024 00:23, Andres Freund wrote: > > The reason asan fails is that it uses a "shadow stack" to track stack variable > > lifetimes. These confuse our stack depth check. CI doesn't have the issue > > because the compiler doesn't yet enable the feature, locally I get around it > > by using ASAN_OPTIONS=detect_stack_use_after_return=0:... > > Even with detect_stack_use_after_return=0, clang-18's asan makes the test > 012_subtransactions.pl fail: > 2024-01-31 03:24:25.691 UTC [4112455] 012_subtransactions.pl LOG: statement: SELECT hs_subxids(201); > 2024-01-31 03:24:25.714 UTC [4112455] 012_subtransactions.pl ERROR: stack depth limit exceeded > 2024-01-31 03:24:25.714 UTC [4112455] 012_subtransactions.pl HINT: Increase the configuration parameter max_stack_depth > (currently 2048kB), after ensuring the platform's stack depth limit is adequate. > > (All the other tests pass.) > Though the same test passes when I use clang-16. Thanks Alexander! I will try and take a look at these. -- Tristan Partin Neon (https://neon.tech)