On 10/3/19 4:13 PM, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> * It's certainly curious that the failures so far only have happened as
>> part of pg_upgradeCheck, rather than the plain regression tests.
> Isn't it though. We spent a long time wondering why we saw parallel
> plan instability mostly in pg_upgradeCheck, too [1]. We eventually
> decided that the cause of that instability was chance timing collisions
> with bgwriter/checkpointer, but nobody ever really explained why
> pg_upgradeCheck should be more prone to hit those windows than the plain
> tests are. I feel like there's something still to be understood there.
>
> Whether this is related, who's to say. But given your thought about
> stack alignment, I'm half thinking that the crash is seen when we get a
> signal (e.g. SIGUSR1 from sinval processing) at the wrong time, allowing
> the stack to become unaligned, and that the still-unexplained timing
> difference in pg_upgradeCheck accounts for that test being more prone to
> show it.
>
> regards, tom lane
>
> [1] https://www.postgresql.org/message-id/20190605050037.GA33985@rfd.leadboat.com
Yes, that's very puzzling. But what do we actually do differently in the
pg_upgrade checks that might account for it? Nothing at all obvious
comes to mind.
Another data point: the new Visual Studio 2019 instance drongo running
on the same machine is not exhibiting these problems. Yes, it's not
running test.sh, but vcregress.pl does pretty much the same thing. So
that does seem to point to the toolset. I'll see if I can install the
same toolset that jacana is using and try that.
cheers
andrew
--
Andrew Dunstan https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services