Re: longfin and tamandua aren't too happy but I'm not sure why - Mailing list pgsql-hackers

From: Tom Lane
Subject: Re: longfin and tamandua aren't too happy but I'm not sure why
Msg-id: 3825454.1664310917@sss.pgh.pa.us
In response to: Re: longfin and tamandua aren't too happy but I'm not sure why (Justin Pryzby <pryzby@telsasoft.com>)
List: pgsql-hackers
Justin Pryzby <pryzby@telsasoft.com> writes:
> On Tue, Sep 27, 2022 at 02:55:18PM -0400, Robert Haas wrote:
>> Both animals are running with -fsanitize=alignment and it's not
>> difficult to believe that the commit mentioned above could have
>> introduced an alignment problem where we didn't have one before, but
>> without a stack backtrace I don't know how to track it down. I tried
>> running those tests locally with -fsanitize=alignment and they passed.

> There's one here:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kestrel&dt=2022-09-27%2018%3A43%3A06

On longfin's host, the test_decoding run produces two core files.
One has a backtrace like this:

  * frame #0: 0x000000010a36af8c postgres`ParseCommitRecord(info='\x80', xlrec=0x00007fa0678a8090, parsed=0x00007ff7b5c50e78) at xactdesc.c:102:30
    frame #1: 0x000000010a765f9e postgres`xact_decode(ctx=0x00007fa0680d9118, buf=0x00007ff7b5c51000) at decode.c:201:5 [opt]
    frame #2: 0x000000010a765d17 postgres`LogicalDecodingProcessRecord(ctx=0x00007fa0680d9118, record=<unavailable>) at decode.c:119:3 [opt]
    frame #3: 0x000000010a76d890 postgres`pg_logical_slot_get_changes_guts(fcinfo=<unavailable>, confirm=true, binary=false) at logicalfuncs.c:271:5 [opt]
    frame #4: 0x000000010a76d320 postgres`pg_logical_slot_get_changes(fcinfo=<unavailable>) at logicalfuncs.c:338:9 [opt]
    frame #5: 0x000000010a5a521d postgres`ExecMakeTableFunctionResult(setexpr=<unavailable>, econtext=0x00007fa068098f50, argContext=<unavailable>, expectedDesc=0x00007fa06701ba38, randomAccess=<unavailable>) at execSRF.c:234:13 [opt]
    frame #6: 0x000000010a5c405b postgres`FunctionNext(node=0x00007fa068098d40) at nodeFunctionscan.c:95:5 [opt]
    frame #7: 0x000000010a5a61b9 postgres`ExecScan(node=0x00007fa068098d40, accessMtd=(postgres`FunctionNext at nodeFunctionscan.c:61), recheckMtd=(postgres`FunctionRecheck at nodeFunctionscan.c:251)) at execScan.c:199:10 [opt]
    frame #8: 0x000000010a596ee0 postgres`standard_ExecutorRun [inlined] ExecProcNode(node=0x00007fa068098d40) at executor.h:259:9 [opt]
    frame #9: 0x000000010a596eb8 postgres`standard_ExecutorRun [inlined] ExecutePlan(estate=<unavailable>, planstate=0x00007fa068098d40, use_parallel_mode=<unavailable>, operation=CMD_SELECT, sendTuples=<unavailable>, numberTuples=0, direction=1745456112, dest=0x00007fa067023848, execute_once=<unavailable>) at execMain.c:1636:10 [opt]
    frame #10: 0x000000010a596e2a postgres`standard_ExecutorRun(queryDesc=<unavailable>, direction=1745456112, count=0, execute_once=<unavailable>) at execMain.c:363:3 [opt]

and the other

  * frame #0: 0x000000010a36af8c postgres`ParseCommitRecord(info='\x80', xlrec=0x00007fa06783a090, parsed=0x00007ff7b5c50040) at xactdesc.c:102:30
    frame #1: 0x000000010a3cd24d postgres`xact_redo(record=0x00007fa0670096c8) at xact.c:6161:3
    frame #2: 0x000000010a41770d postgres`ApplyWalRecord(xlogreader=0x00007fa0670096c8, record=0x00007fa06783a060, replayTLI=0x00007ff7b5c507f0) at xlogrecovery.c:1897:2
    frame #3: 0x000000010a4154be postgres`PerformWalRecovery at xlogrecovery.c:1728:4
    frame #4: 0x000000010a3e0dc7 postgres`StartupXLOG at xlog.c:5473:3
    frame #5: 0x000000010a7498a0 postgres`StartupProcessMain at startup.c:267:2 [opt]
    frame #6: 0x000000010a73e2cb postgres`AuxiliaryProcessMain(auxtype=StartupProcess) at auxprocess.c:141:4 [opt]
    frame #7: 0x000000010a745b97 postgres`StartChildProcess(type=StartupProcess) at postmaster.c:5408:3 [opt]
    frame #8: 0x000000010a7487e2 postgres`PostmasterStateMachine at postmaster.c:4006:16 [opt]
    frame #9: 0x000000010a745804 postgres`reaper(postgres_signal_arg=<unavailable>) at postmaster.c:3256:2 [opt]
    frame #10: 0x00007ff815b16dfd libsystem_platform.dylib`_sigtramp + 29
    frame #11: 0x00007ff815accd5b libsystem_kernel.dylib`__select + 11
    frame #12: 0x000000010a74689c postgres`ServerLoop at postmaster.c:1768:13 [opt]
    frame #13: 0x000000010a743fbb postgres`PostmasterMain(argc=<unavailable>, argv=0x00006000006480a0) at postmaster.c:1476:11 [opt]
    frame #14: 0x000000010a61c775 postgres`main(argc=8, argv=<unavailable>) at main.c:197:3 [opt]

Looks like it might be the same bug, but perhaps not.

I recompiled access/transam and access/rmgrdesc at -O0 to get the accurate
line numbers shown for those files.  Let me know if you need any more
info; I can add -O0 in more places, or poke around in the cores.

            regards, tom lane


