a very significant fraction of the buildfarm is now pink - Mailing list pgsql-hackers

From Robert Haas
Subject a very significant fraction of the buildfarm is now pink
Date
Msg-id CA+TgmoZB8frH_n6tYAdFSYWUwy1wbVOzM3zHfVkHUKCj0+rHzA@mail.gmail.com
In response to Re: explain analyze rows=%.0f  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: a very significant fraction of the buildfarm is now pink
List pgsql-hackers
On Fri, Feb 21, 2025 at 7:04 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> A very significant fraction of the buildfarm is now pink.
> If you don't have a fix pretty nearly ready, please revert.

When we're going to do a release, you want no commits for at least 24
hours before the release so that we can make sure the buildfarm is
clean. But when I commit something and the buildfarm fails, you want
it reverted within a handful of hours before I can even be sure of
having seen all the failures the commit caused, let alone had time to
think about what the best fix might be. It doesn't make sense to me
that we need 24 hours to be sure that the buildfarm is passing, but in
3 hours I'm supposed to see all the failures -- including the ones
that only happen on buildfarm animals that run once a day, I guess? --
and analyze them -- and decide on a fix -- naturally without any sort
of discussion because there's no time for that -- and code the fix --
and push it. I have a really hard time seeing how that's a reasonable
expectation.

I understand that the buildfarm can't be red all the time or nobody
can distinguish the problems they caused from preexisting ones. At the
same time, it seems completely unreasonable for us to say that, on the
one hand, the buildfarm has to be green at absolutely all times, and
on the other hand, buildfarm owners are not required to provide any
sort of resources to test things before they are committed. IMHO, one
of those policies absolutely has to change. The current situation is
way too stressful for committers and it's burning people out and
making them unwilling to commit things -- or if they do commit things,
then they end up insta-reverting them, committing them again later,
maybe insta-reverting them a second time because they didn't actually
find all the problems the first time, and then maybe even round three,
four, or five. The commit log ends up with a bunch of garbage from the
repeated commits and reverts, and if it goes on long enough,
eventually somebody shows up to say "wow, this patch seems to be in
terrible shape, maybe it shouldn't ever be committed again" right when
the committer's stress level is already going through the ceiling. And
sometimes that is justified, but sometimes it isn't.

In the case of my commit today, the failures are the result of a
2-line regression diff with no functional impact that neither CI nor
any of the 11 reviewers noticed. That just shouldn't be the sort of
thing that results in somebody having to work evenings and weekends.
Perhaps if it DIDN'T result in a committer having to work evenings and
weekends, it wouldn't have taken 16 years for us to do something about
that problem.

--
Robert Haas
EDB: http://www.enterprisedb.com


