Re: Non-reproducible AIO failure - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Non-reproducible AIO failure |
Date | |
Msg-id | ezv35wbw6sdgkkcfjwn4usf62fuqvhgln7uywrvzppwee7ul6r@w5k4apanh546 Whole thread Raw |
In response to | Re: Non-reproducible AIO failure (Konstantin Knizhnik <knizhnik@garret.ru>) |
Responses |
Re: Non-reproducible AIO failure
|
List | pgsql-hackers |
Hi, On 2025-06-06 14:03:12 +0300, Konstantin Knizhnik wrote: > There is really essential difference in code generated by clang 15 (working) > and 16 (not working). There also are code gen differences between upstream clang 17 and apple's clang, which is based on llvm 17 as well (I've updated the toolchain, it repros with that as well). The code generated for the bitfield access is so attrocious that it's worth changing to plain uint8 for that alone. It's annoying to have to add casts to all the switches, but clearly there isn't a better way for now. What specific compilers / versions were you comparing? I don't quite see similar assembly... I'm annotating the code below mostly for my understanding, I'm not fluid in armv8. > ``` > > pgaio_io_stage(PgAioHandle *ioh, PgAioOp op) > > { > > ... > > HOLD_INTERRUPTS(); > > ioh->op = op; > ioh->result = 0; > > pgaio_io_update_state(ioh, PGAIO_HS_DEFINED); > ... > > } > > ``` > > clang15: > > ``` > > postgres[0x10035d4ac] <+84>: add x22, x22, #0x5d4 ; > InterruptHoldoffCount > postgres[0x10035d4b0] <+88>: ldr w8, [x22] > postgres[0x10035d4b4] <+92>: add w8, w8, #0x1 > postgres[0x10035d4b8] <+96>: str w8, [x22] That's InterruptHoldoffCount++ > *postgres[0x10035d4bc] <+100>: strb w20, [x19, #0x2] That's ioh->op = op > *postgres[0x10035d4c0] <+104>: str wzr, [x19, #0x14] ioh->result = 0; > postgres[0x10035d4c4] <+108>: mov x0, x19 Saving the caller-saved x0 register. > postgres[0x10035d4c8] <+112>: mov w1, #0x2 > postgres[0x10035d4cc] <+116>: bl 0x10035cbdc ; > pgaio_io_update_state at aio.c:384 Calling pgaio_io_update_state() with new_state = PGAIO_HS_DEFINED > ``` > > clang16: > > ``` > > postgres[0x100699e8c] <+200>: add x9, x9, #0xd74 ; InterruptHoldoffCount > postgres[0x100699e90] <+204>: ldr w8, [x9] That's the start of InterruptHoldoffCount++ > postgres[0x100699e94] <+208>: mov w10, #0x1 ; =1 > postgres[0x100699e98] <+212>: stur w10, [x29, #-0x18] Isn't this setting w10 to 1 just to immediately overwrite it with another value? I assume this is restoring things from the stack? > postgres[0x100699e9c] <+216>: add w8, w8, #0x1 > postgres[0x100699ea0] <+220>: str w8, [x9] That's the end of InterruptHoldoffCount++ > postgres[0x100699ea4] <+224>: ldur w10, [x29, #-0xc] One more overwriting of w10, for good measure? > postgres[0x100699ea8] <+228>: ldur x9, [x29, #-0x8] I assume this is restoring the pointer to ioh from the stack? > postgres[0x100699eac] <+232>: ldrh w8, [x9] Loading state + target as one 16bit value. > postgres[0x100699eb0] <+236>: and w10, w10, #0xff Clearing the upper bits of w10 - not sure what w10 actually is, probably the op argument? > postgres[0x100699eb4] <+240>: orr w10, w8, w10, lsl #16 Or-ing {ioh->state, ioh->target} and (I think) the 'op' argument shifted left by 16 bit, presumably leaving us with {ioh->state, ioh->target, op}. > postgres[0x100699eb8] <+244>: lsr w8, w10, #16 So this is right shifting what I think is {ioh->state, ioh->target, op} by 16, which presumably leaves you just with op. WTF. > *postgres[0x100699ebc] <+248>: strh w10, [x9] Now we're writing 16 bits of {ioh->state, ioh->target, ioh->op} to ioh, thereby setting ioh->state and ioh->target, but not ioh->op. > postgres[0x100699ec0] <+252>: strb w8, [x9, #0x2] Setting ioh->op. > *postgres[0x100699ec4] <+256>: ldur x8, [x29, #-0x8] This was loaded into x9 above, why is this redone, x9 is not overwritten? Oh, I guess you removed the Assert related instructions, and that's why it's reloading it... > postgres[0x100699ec8] <+260>: str wzr, [x8, #0x14] ioh->return = 0. > postgres[0x100699ed0] <+268>: mov w1, #0x2 ; =2 This puts 2, presumably PGAIO_HS_DEFINED into w1 - presumably as an argument for the call to pgaio_io_update_state(). > postgres[0x100699ed4] <+272>: bl 0x100698efc ; pgaio_io_update_state > at aio.c:384 > ``` > As you can see pg16 just updates third byte of the structure (`op` field). > It is not affecting 'state' field at all. It's really weird code. I think the state store would be in the call to pgaio_io_update_state(). > While clang16 is also updating `state` field! I assume this was intending to refer to clang15? FWIW, I don't think it does update the state field in the assembly you shared. This is very odd code, but I don't immediately see a bug in it :( Greetings, Andres Freund
pgsql-hackers by date: