Re: seawasp failing, maybe in glibc allocator - Mailing list pgsql-hackers

From Andres Freund
Subject Re: seawasp failing, maybe in glibc allocator
Date
Msg-id 20210620105934.vkweficqaw4t6uw4@alap3.anarazel.de
Whole thread Raw
In response to Re: seawasp failing, maybe in glibc allocator  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: seawasp failing, maybe in glibc allocator
List pgsql-hackers
Hi,

On 2021-06-19 17:07:51 +1200, Thomas Munro wrote:
> On Sat, May 22, 2021 at 12:25 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2021-05-21 15:57:01 -0700, Andres Freund wrote:
> > > I found the LLVM commit to blame (c8fc5e3ba942057d6c4cdcd1faeae69a28e7b671).
> > > Contacting the author and reading the change to see if I can spit the
> > > issue myself.
> >
> > Hrmpf. It's a silent API breakage. The author intended to email us about
> > it, but apparently forgot. One now needs to increment a string-pool
> > refcount. The reason that didn't trigger a reliable crash is that
> > there's a path where the refcount of string-pool entries aren't asserted
> > to be above before decrementing the refcount... And that there
> > practically never are references to the pool entries after use.
> >
> > Continuing to discusss whether there's a better way to deal with this.
> 
> Any news?

Nothing really. I'd discussed it a bit with the relevant LLVM
maintainer, but we didn't come to a real resolution. He apologized for
not giving a proper heads up - he'd planned to send out an email, but
somehow lost track of that.

Given that any potential solution would be also end up being a versioned
ifdef, I think adding something like what you suggest here is the least
unreasonable solution.

> FWIW this change appears to fix the problem for my system (LLVM 13
> build from a couple of days ago).  No more weird results, valgrind
> errors gone.  I ran the leak checker to see if I now had the opposite
> problem, and although there are various leaks reported, I didn't see
> obvious intern pool related stacks.
> 
> diff --git a/src/backend/jit/llvm/llvmjit.c b/src/backend/jit/llvm/llvmjit.c
> index 71029a39a9..7b09e520f5 100644
> --- a/src/backend/jit/llvm/llvmjit.c
> +++ b/src/backend/jit/llvm/llvmjit.c
> @@ -1116,6 +1116,11 @@
> llvm_resolve_symbols(LLVMOrcDefinitionGeneratorRef GeneratorObj, void
> *Ctx,
>         if (error != LLVMErrorSuccess)
>                 LLVMOrcDisposeMaterializationUnit(mu);
> 
> +#if LLVM_VERSION_MAJOR > 12
> +       for (int i = 0; i < LookupSetSize; i++)
> +               LLVMOrcRetainSymbolStringPoolEntry(symbols[i].Name);
> +#endif
> +
>         pfree(symbols);

I think this should be part of the earlier loop? Once
LLVMOrcAbsoluteSymbols() is called that owns the reference, so there
doesn't seem to be a reason to increase the refcount only later?

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Fabien COELHO
Date:
Subject: Re: pgbench logging broken by time logic changes
Next
From: Andres Freund
Date:
Subject: Re: seawasp failing, maybe in glibc allocator