Buildfarm coverage planning (was: what's going on with lapwing?) - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Buildfarm coverage planning (was: what's going on with lapwing?)
Msg-id 07802ece-8718-40f0-8713-8fc1f3558e4e@dunslane.net
In response to Re: what's going on with lapwing?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


On 2025-03-07 Fr 8:52 AM, Robert Haas wrote:
On Thu, Mar 6, 2025 at 7:03 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
Honestly, it's been years of people complaining on one thing or another about
lapwing without ever asking for a change.  Was it really hard to ask "can you
remove the -Werror it's not useful anymore" the first time it caused extra
work?  Instead I have to guess what people want.  So after a few complaints I
removed that flag.  And now after a few more complaints I turned it off.  If
that's not what you want, well too bad but that's on you, not me.

This is actually much harder for me as a committer than you might
guess. How is an individual committer working on an individual issue
supposed to know that removing -Wall is the right thing vs. fixing the
warning in some way? As Melanie also mentions in her reply, committers
are not born knowing or understanding which machines are chronic
problems, and sometimes it's very difficult to figure that out. If you
read every message on the mailing list you're probably going to have
more idea than if you don't, but that's more than most people can do
these days, and you can still miss things.

But I think your complaint here is actually getting at another problem
with the way that we do the buildfarm as a project: it's completely
unplanned. People show up and run random buildfarm machines and nobody
knows why they are running those machines: was it because they care
about support for that platform, or was it just to be nice, or were
they testing some unusual configuration, or just because they set it
up a long time ago and have never turned it off? And then other
buildfarm members that maybe we ought to have are missing and nobody
knows if anyone else is working on that, or maybe they don't even know
about it. And then, to your point, nobody ever shows up and tells a
buildfarm member owner what we'd like them to do. There is no "what
we'd like them to do" -- we have no policy or preference or anything
as a group. Everybody's just guessing what other people want and care
about, and then sometimes we're all grumpy at each other.

But notice that with CI, it's the other way around. Some small group
of people decide what our CI setup should do and then they configure
it to do that thing. You can agree with those decisions or not, but
they are intentional. The platforms and OS versions that are being
tested are what somebody decided was best from among the available
options. Now I'm not saying that sort of centralized planning is
without flaws -- and the fact that only a handful of people seem to
understand how to keep this CI stuff working is definitely one of them
-- but it also has some strengths, namely that you remove a lot of
this guesswork around what the other people involved actually want.

I'm not sure exactly where I'm going with this line of thought, but I
do wonder if we ought to find a way to be more intentional about the
buildfarm instead of just letting things happen and then being sad
that we didn't get what we wanted.


I've renamed the thread because it's gone a long way past what happened with lapwing.

I agree that the coverage is unplanned, and that we should do better about it. 

However, we can't cover every possible combination of architecture, OS, compiler, build system, and build options. There are roughly 170 currently reporting buildfarm animals. Comprehensive coverage would mean a fleet of thousands, even if we ran multiple animals on a single machine. And I'm not sure that would be a win - the status page would become unmanageable and useless. So we need to be strategic about it.
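
To give a rough sense of the arithmetic behind "a fleet of thousands", here is a minimal sketch that counts the full cross product of the five dimensions named above. The values listed for each dimension are hypothetical, picked only for illustration; they are not an inventory of what the buildfarm tests, and a real count would also have to drop impossible pairings such as a Windows-only compiler on a BSD.

```python
from itertools import product
from math import prod

# Hypothetical example values for each dimension; assumptions for
# illustration only, not a list of what the buildfarm actually covers.
DIMENSIONS = {
    "architecture": ["x86_64", "aarch64", "ppc64le", "s390x", "riscv64"],
    "os": ["linux", "freebsd", "openbsd", "netbsd", "macos", "windows"],
    "compiler": ["gcc", "clang", "msvc"],
    "build_system": ["autoconf", "meson"],
    "build_options": [
        "default", "assert", "no-ssl", "with-llvm",
        "sanitizers", "no-icu", "32-bit", "valgrind",
    ],
}

# Approximate number of currently reporting animals, from the message above.
CURRENT_ANIMALS = 170


def full_matrix_size(dimensions):
    """Number of animals needed to cover every combination exactly once."""
    return prod(len(values) for values in dimensions.values())


def all_combinations(dimensions):
    """Yield each combination as a dict, e.g. for filtering invalid ones."""
    keys = list(dimensions)
    for combo in product(*dimensions.values()):
        yield dict(zip(keys, combo))


if __name__ == "__main__":
    total = full_matrix_size(DIMENSIONS)
    print(f"full matrix: {total} combinations")
    print(f"ratio to current fleet: {total / CURRENT_ANIMALS:.1f}x")
```

Even with these short, made-up lists the product comes to 1,440 combinations, several times the current fleet, and each extra option value multiplies the total rather than adding to it. That is the reason for picking combinations deliberately instead of aiming for the full matrix.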

So let's start with a couple of simple questions:

1. what are the perceived major gaps in coverage?

2. what animals (if any) should be turned off or updated?

I'll start with my nomination for 1: we need a Windows animal that builds in the same way and with the same options as the installer binaries. That's something we are actually working on at EDB.


cheers


andrew





--
Andrew Dunstan
EDB: https://www.enterprisedb.com
