Re: fairywren failures - Mailing list pgsql-hackers

From Tom Lane
Subject Re: fairywren failures
Msg-id 29222.1570133585@sss.pgh.pa.us
In response to Re: fairywren failures  (Andres Freund <andres@anarazel.de>)
Responses Re: fairywren failures
List pgsql-hackers
Andres Freund <andres@anarazel.de> writes:
> * It's certainly curious that the failures so far only have happened as
>   part of pg_upgradeCheck, rather than the plain regression tests.

Isn't it though.  We spent a long time wondering why we saw parallel
plan instability mostly in pg_upgradeCheck, too [1].  We eventually
decided that the cause of that instability was chance timing collisions
with bgwriter/checkpointer, but nobody ever really explained why
pg_upgradeCheck should be more prone to hit those windows than the plain
tests are.  I feel like there's something still to be understood there.

Whether this is related, who's to say.  But given your thought about
stack alignment, I'm half thinking that the crash is seen when we get a
signal (e.g. SIGUSR1 from sinval processing) at the wrong time, allowing
the stack to become unaligned, and that the still-unexplained timing
difference in pg_upgradeCheck accounts for that test being more prone to
show it.

            regards, tom lane

[1] https://www.postgresql.org/message-id/20190605050037.GA33985@rfd.leadboat.com


