Re: gs_group_1 crashing on 13beta2/s390x - Mailing list pgsql-hackers

From Andres Freund
Subject Re: gs_group_1 crashing on 13beta2/s390x
Date
Msg-id 20201015083246.kie5726xerdt3ael@alap3.anarazel.de
In response to Re: gs_group_1 crashing on 13beta2/s390x  (Andres Freund <andres@anarazel.de>)
Responses Re: gs_group_1 crashing on 13beta2/s390x
List pgsql-hackers
Hi,

On 2020-10-14 17:56:16 -0700, Andres Freund wrote:
> Oh dear. It's not as simple as that. The issue indeed are relocations,
> but we don't hit those errors. The issue rather is that the systemz
> specific relative redirection code thought that the only relative
> symbols are functions. So it creates a stub function to redirect
> them. Which turns out to not work well with variables like
> CurrentMemoryContext...

That might be a problem - but the main problem causing the crash at hand
is likely something else. The prototypes we create for
ExecAggTransReparent() were missing the 'zeroext' attribute on the
'isnull' parameter, because the code for copying the attributes from
llvmjit_types.bc didn't go deep enough (i.e. I didn't quite grok the
pretty weird API). On s390x that led to the newValueIsNull argument in
ExecAggTransReparent() having a 0 lower byte, but set higher bytes -
which then *sometimes* fooled the if (!newValueIsNull) check, which
assumed that the higher bits were unset.
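
To make the failure mode concrete, here is a minimal sketch in plain C. It is
not PostgreSQL or LLVM code: reg_t, pass_bool_zeroext(), pass_bool_no_ext()
and callee_sees_null() are hypothetical names that only model a one-byte bool
travelling in a 64-bit argument register, assuming the callee relies on the
caller having zero-extended the value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit argument register carrying a 1-byte bool. */
typedef uint64_t reg_t;

/*
 * Caller that honors 'zeroext': the bool is widened to the full register,
 * so any stale upper bytes are cleared.
 */
static reg_t
pass_bool_zeroext(reg_t stale, bool v)
{
	(void) stale;				/* previous register contents are discarded */
	return (reg_t) (uint8_t) v;
}

/*
 * Caller whose prototype lacks 'zeroext': only the low byte is written,
 * whatever was in the upper bytes before stays there.
 */
static reg_t
pass_bool_no_ext(reg_t stale, bool v)
{
	return (stale & ~(reg_t) 0xFF) | (reg_t) (uint8_t) v;
}

/*
 * Callee that assumes the caller zero-extended the argument and therefore
 * tests the whole register rather than just the low byte.
 */
static bool
callee_sees_null(reg_t r)
{
	return r != 0;
}
```

With a stale register value such as 0xDEADBEEF00, passing false through
pass_bool_no_ext() leaves a zero low byte but non-zero upper bytes, so
callee_sees_null() reports true. With pass_bool_zeroext(), or when the
register happened to be clean beforehand, it reports false, which matches the
"sometimes" nature of the crash.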

I have a fix for this, but I've just stared at s390 assembly code for
~10h, never having done so before. So that'll have to wait for tomorrow.

It's quite possible that that fix would also help on other
architectures...

Greetings,

Andres Freund


