Thread: what's going on with lapwing?

what's going on with lapwing?

From
Robert Haas
Date:
lapwing has been failing sepgsql-check for over a month, but there's
no log file:


https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=lapwing&dt=2025-03-03%2016%3A29%3A50&stg=contrib-sepgsql-check

I feel like something must be broken with logfile collection in the
buildfarm client, because I keep seeing failures like this where
there's no log to view. It's hard to know whether this is a
configuration issue with this BF member or whether something actually
needs fixing. In any case, we should try to figure out what's going on
here.

Apologies if this has been discussed elsewhere and I have missed it.
Pointers appreciated.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: what's going on with lapwing?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> lapwing has been failing sepgsql-check for over a month, but there's
> no log file:

I believe it hasn't been updated with the buildfarm client changes
needed to run sepgsql-check since aeb8ea361.

            regards, tom lane



Re: what's going on with lapwing?

From
Robert Haas
Date:
On Mon, Mar 3, 2025 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > lapwing has been failing sepgsql-check for over a month, but there's
> > no log file:
>
> I believe it hasn't been updated with the buildfarm client changes
> needed to run sepgsql-check since aeb8ea361.

OK, thanks. The owner is pgbuildfarm@rjuju.net, which I'm guessing
might be Julien Rouhaud, but not sure? It would be good to get that
updated, especially with the end of the release cycle hard upon us.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: what's going on with lapwing?

From
Julien Rouhaud
Date:
On Mon, Mar 03, 2025 at 03:52:30PM -0500, Robert Haas wrote:
> On Mon, Mar 3, 2025 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> > > lapwing has been failing sepgsql-check for over a month, but there's
> > > no log file:
> >
> > I believe it hasn't been updated with the buildfarm client changes
> > needed to run sepgsql-check since aeb8ea361.

Well, AFAIK the usual habit when something is broken and a buildfarm client
upgrade is needed is to warn the buildfarm owners.  There was an email
yesterday for installing libcurl which I did.  There was an email before last
release for possibly stuck tests which I checked.  There was no such email to
ask to update the client, so I'm not sure why you expected me to do so?

Apart from that, commit aeb8ea361 is from January 24, and the latest buildfarm
client release (18) is from November, did I miss something?

> 
> OK, thanks. The owner is pgbuildfarm@rjuju.net, which I'm guessing
> might be Julien Rouhaud, but not sure?

Yes it's me.  I'm not sure why it's such a mystery as I've been answering for
it for many years.  Having an alias is useful to me as I can redirect it to
some private mailbox with less traffic and therefore react promptly in case of
any problem.



Re: what's going on with lapwing?

From
Tom Lane
Date:
Julien Rouhaud <rjuju123@gmail.com> writes:
>> On Mon, Mar 3, 2025 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I believe it hasn't been updated with the buildfarm client changes
>>> needed to run sepgsql-check since aeb8ea361.

> Well, AFAIK the usual habit when something is broken and a buildfarm client
> upgrade is needed is to warn the buildfarm owners.  There was an email
> yesterday for installing libcurl which I did.  There was an email before last
> release for possibly stuck tests which I checked.  There was no such email to
> ask to update the client, so I'm not sure why you expected me to do so?

Yeah, I think a new buildfarm release is overdue.  We have this issue
affecting sepgsql-check, and we have the TestUpgradeXversion changes
that are necessary for that still to work, and it's not great to
expect owners to run hand-patched scripts.

            regards, tom lane



Re: what's going on with lapwing?

From
Robert Haas
Date:
On Mon, Mar 3, 2025 at 6:09 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> Well, AFAIK the usual habit when something is broken and a buildfarm client
> upgrade is needed is to warn the buildfarm owners.  There was an email
> yesterday for installing libcurl which I did.  There was an email before last
> release for possibly stuck tests which I checked.  There was no such email to
> ask to update the client, so I'm not sure why you expected me to do so?
>
> Apart from that, commit aeb8ea361 is from January 24, and the latest buildfarm
> client release (18) is from November, did I miss something?

Honestly, I just noticed that the buildfarm member in question had
been red for over a month and I figured it was a setup issue of some
kind. I guess I was wrong. It didn't cross my mind that a commit over
a month ago had broken things and there had been no subsequent revert
or buildfarm client release. I'm frankly quite astonished because Tom
recently told me that I needed to fix something RIGHT NOW or else
revert when the commit had been in the tree for two hours. Given that
context, maybe you can understand why I thought it was unlikely that
we were just chilling about this being broken after 38 days.

> Yes it's me.  I'm not sure why it's such a mystery as I've been answering for
> it for many years.  Having an alias is useful to me as I can redirect it to
> some private mailbox with less traffic and therefore react promptly in case of
> any problem.

Buildfarm members are labelled with an email address, not a name. I
don't magically know the names of all the people who correspond to
those email addresses. I've learned some of them over time, but I
didn't recognize rjuju.net.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: what's going on with lapwing?

From
Andrew Dunstan
Date:


On 2025-03-03 Mo 6:12 PM, Tom Lane wrote:
> Julien Rouhaud <rjuju123@gmail.com> writes:
>>> On Mon, Mar 3, 2025 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> I believe it hasn't been updated with the buildfarm client changes
>>>> needed to run sepgsql-check since aeb8ea361.
>
>> Well, AFAIK the usual habit when something is broken and a buildfarm client
>> upgrade is needed is to warn the buildfarm owners.  There was an email
>> yesterday for installing libcurl which I did.  There was an email before last
>> release for possibly stuck tests which I checked.  There was no such email to
>> ask to update the client, so I'm not sure why you expected me to do so?
>
> Yeah, I think a new buildfarm release is overdue.  We have this issue
> affecting sepgsql-check, and we have the TestUpgradeXversion changes
> that are necessary for that still to work, and it's not great to
> expect owners to run hand-patched scripts.


Yeah, I try to avoid making too many releases, but I agree it's time to push one.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: what's going on with lapwing?

From
Julien Rouhaud
Date:
On Tue, Mar 04, 2025 at 08:33:46AM -0500, Robert Haas wrote:
> On Mon, Mar 3, 2025 at 6:09 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > Well, AFAIK the usual habit when something is broken and a buildfarm client
> > upgrade is needed is to warn the buildfarm owners.  There was an email
> > yesterday for installing libcurl which I did.  There was an email before last
> > release for possibly stuck tests which I checked.  There was no such email to
> > ask to update the client, so I'm not sure why you expected me to do so?
> >
> > Apart from that, commit aeb8ea361 is from January 24, and the latest buildfarm
> > client release (18) is from November, did I miss something?
>
> Honestly, I just noticed that the buildfarm member in question had
> been red for over a month and I figured it was a setup issue of some
> kind. I guess I was wrong. It didn't cross my mind that a commit over
> a month ago had broken things and there had been no subsequent revert
> or buildfarm client release. I'm frankly quite astonished because Tom
> recently told me that I needed to fix something RIGHT NOW or else
> revert when the commit had been in the tree for two hours. Given that
> context, maybe you can understand why I thought it was unlikely that
> we were just chilling about this being broken after 38 days.

Yes, I do remember the 2 threads and I totally understand.  On my side I know
that I don't have as much time as I'd like to contribute, but I at least make
sure that I don't leave my animal broken because of something on my side.  So I
did check when I received the first failure notifications, saw that there
were no logs, found a few other animals with the same symptoms, and
concluded that it would eventually be resolved either with a fix or some
instructions on the buildfarm client side or something.  I did check again a
few days ago when I saw that nothing was pushed anymore on that branch only,
saw that it was the same on other animals, got confused and gave up.
>
> > Yes it's me.  I'm not sure why it's such a mystery as I've been answering for
> > it for many years.  Having an alias is useful to me as I can redirect it to
> > some private mailbox with less traffic and therefore react promptly in case of
> > any problem.
>
> Buildfarm members are labelled with an email address, not a name. I
> don't magically know the names of all the people who correspond to
> those email addresses. I've learned some of them over time, but I
> didn't recognize rjuju.net.

Ah sorry, I thought that the "rjuju" was a poor enough alias (which I more or
less use everywhere) that it would be recognisable.  In any case I do receive
the emails so I can reply, just not from that address.



Re: what's going on with lapwing?

From
Robert Haas
Date:
On Tue, Mar 4, 2025 at 9:26 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> Yes, I do remember the 2 threads and I totally understand.  On my side I know
> instructions on the buildfarm client side or something.  I did check again a
> few days ago when I saw that nothing was pushed anymore on that branch only,
> saw that it was the same on other animals, got confused and gave up.

Sure, makes sense.

> > Buildfarm members are labelled with an email address, not a name. I
> > don't magically know the names of all the people who correspond to
> > those email addresses. I've learned some of them over time, but I
> > didn't recognize rjuju.net.
>
> Ah sorry, I thought that the "rjuju" was a poor enough alias (which I more or
> less use everywhere) that it would be recognisable.  In any case I do receive
> the emails so I can reply, just not from that address.

I think honestly at some point in the past I did have that alias <=>
real name mapping in my brain, but somewhere along the way it fell
out. On the list, I tend to pay more attention to people's advertised
real names than to their nicknames/handles, because that's what goes
in release notes, what people's name tags will say at a conference,
etc.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: what's going on with lapwing?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Honestly, I just noticed that the buildfarm member in question had
> been red for over a month and I figured it was a setup issue of some
> kind. I guess I was wrong. It didn't cross my mind that a commit over
> a month ago had broken things and there had been no subsequent revert
> or buildfarm client release. I'm frankly quite astonished because Tom
> recently told me that I needed to fix something RIGHT NOW or else
> revert when the commit had been in the tree for two hours. Given that
> context, maybe you can understand why I thought it was unlikely that
> we were just chilling about this being broken after 38 days.

As far as that goes, there are degrees of buildfarm brokenness.

Right now we have three remaining animals that are affected by the
sepgsql issue (two of which seem to have been stopped awhile ago by
their owners), and the error is perfectly predictable/repeatable.
So it's easy to ignore.

At the point where I complained to you about that other problem,
it was looking like it might cause a quarter or a third of the
buildfarm to fail intermittently.  Maybe I overestimated the
frequency of the failure, but if that was accurate it would have
resulted in a lot of fruitless double-checking of failures to
see if there was anything real underneath the noise.  So I find
that sort of case much more painful.

But yeah, I thought we were overdue for a buildfarm release.
I'm pleased to see that Andrew just pushed one.

            regards, tom lane



Re: what's going on with lapwing?

From
Robert Haas
Date:
On Tue, Mar 4, 2025 at 10:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> At the point where I complained to you about that other problem,
> it was looking like it might cause a quarter or a third of the
> buildfarm to fail intermittently.  Maybe I overestimated the
> frequency of the failure, but if that was accurate it would have
> resulted in a lot of fruitless double-checking of failures to
> see if there was anything real underneath the noise.  So I find
> that sort of case much more painful.

I think that's actually totally fair. I was not upset that you wanted
it fixed, or even that you wanted it fixed relatively quickly. The
things that I was upset about were:

1. There's no real way for me to avoid this kind of pain. That's not
your fault, but it is something that I think we need to address as a
community. As Jelte said on the other thread, other projects have
infrastructure that allows them to avoid these kinds of problems by
being able to do pre-commit testing. Having modern infrastructure for
stuff like this is an important part of attracting and retaining
developers.

2. Two hours is just not enough time. Never mind that people like to
have evenings and weekends off -- this is supposed to be a community
that operates by consensus. It doesn't seem right to spend months or
years discussing the design before committing, and then after commit,
boom, you have to make a unilateral decision about what to change
within -- not even hours, but minutes. Because you also need time for
BF results to show up, and then you need time to code and test
whatever you decided. I would actually be quite sympathetic to the
time frame here if we were immediately before a feature or release
freeze when there is no tolerance for error, but not in this
situation.

3. You ignored the substantive questions that I asked, commenting
only on the procedural issue of fixing the BF, even though you were a
previous participant in the discussion on that patch.

Maybe my tolerance for reverts is just lower than yours. I think it's
bad when somebody has a problem like this, insta-reverts, then tries
it again later after changing the patch, then maybe the same thing
happens again, gets insta-reverted a second time, then maybe the third
time the commit actually sticks. I think that clutters up the commit
history with a bunch of junk, and that junk is permanent. The
buildfarm being red is bad, but after it's fixed it will be green
again and the time for which it was red will have little enduring
impact. But everybody who tries to find stuff in the commit log is
potentially inconvenienced by reverts, forever. Commands like 'git log
FILENAME' or 'git log -Gstring' will now return extra, spurious hits.
If I absolutely have to choose between the BF being red for a couple
of days and revert ping-pong, I would prefer the former.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: what's going on with lapwing?

From
Julien Rouhaud
Date:
On Tue, Mar 04, 2025 at 10:18:49AM -0500, Tom Lane wrote:
>
> But yeah, I thought we were overdue for a buildfarm release.
> I'm pleased to see that Andrew just pushed one.

FWIW I installed the client version 19.1 this morning and forced a run on HEAD
and lapwing is back to green.



Re: what's going on with lapwing?

From
Robert Haas
Date:
On Wed, Mar 5, 2025 at 9:49 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> FWIW I installed the client version 19.1 this morning and forced a run on HEAD
> and lapwing is back to green.

Thanks, appreciate it.

By the way, is there a particular reason why we're keeping Debian 7
coverage in the buildfarm? I don't want to be in a huge rush to kill
platforms people still care about, but it was pointed out to me
off-list that this is quite an old release -- it seems Debian 7 was
first released in 2013, last released in 2016, EOL in 2018. I assume
that not too many people are going to install a PostgreSQL release
that comes out in 2025 on an OS that's been EOL for 7 years (or 12
years if the BF page is correct that this is actually Debian 7.0).
Somewhat oddly, I see that we have coverage for Debian 9, 11, 12, and
13, but not 8 or 10. Is there a theory behind all of this or is the
current situation somewhat accidental?

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: what's going on with lapwing?

From
Andrew Dunstan
Date:
On 2025-03-06 Th 10:45 AM, Robert Haas wrote:
> On Wed, Mar 5, 2025 at 9:49 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>> FWIW I installed the client version 19.1 this morning and forced a run on HEAD
>> and lapwing is back to green.
> Thanks, appreciate it.
>
> By the way, is there a particular reason why we're keeping Debian 7
> coverage in the buildfarm? I don't want to be in a huge rush to kill
> platforms people still care about, but it was pointed out to me
> off-list that this is quite an old release -- it seems Debian 7 was
> first released in 2013, last released in 2016, EOL in 2018. I assume
> that not too many people are going to install a PostgreSQL release
> that comes out in 2025 on an OS that's been EOL for 7 years (or 12
> years if the BF page is correct that this is actually Debian 7.0).
> Somewhat oddly, I see that we have coverage for Debian 9, 11, 12, and
> 13, but not 8 or 10. Is there a theory behind all of this or is the
> current situation somewhat accidental?
>


Fairly accidental, I think.

We do have a project at EDB to fill in certain gaps in buildfarm
coverage, so maybe we can reduce the incidence of such accidents.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: what's going on with lapwing?

From
Andres Freund
Date:
Hi,

On 2025-03-06 11:57:05 -0500, Andrew Dunstan wrote:
> On 2025-03-06 Th 10:45 AM, Robert Haas wrote:
> > By the way, is there a particular reason why we're keeping Debian 7
> > coverage in the buildfarm? I don't want to be in a huge rush to kill
> > platforms people still care about, but it was pointed out to me
> > off-list that this is quite an old release -- it seems Debian 7 was
> > first released in 2013, last released in 2016, EOL in 2018. I assume
> > that not too many people are going to install a PostgreSQL release
> > that comes out in 2025 on an OS that's been EOL for 7 years (or 12
> > years if the BF page is correct that this is actually Debian 7.0).
> > Somewhat oddly, I see that we have coverage for Debian 9, 11, 12, and
> > 13, but not 8 or 10. Is there a theory behind all of this or is the
> > current situation somewhat accidental?
>
> Fairly accidental, I think.
> 
> We do have a project at EDB to fill in certain gaps in buildfarm coverage,
> so maybe we can reduce the incidence of such accidents.

I think the way to fix the gap is to drop the buildfarm animal running an OS
that has been unsupported for 7 years / without security fixes for 9 years,
not to add an animal running an OS that has been unsupported for 4 years /
without security fixes for 6 years (i.e. Debian 8).

Debian 9 has been out of support for 2 years / without security fixes for 4.

Debian 10 is also out of LTS support, albeit more recently (30 June 2024), and
has been out of security support for 2 1/2 years.

Keeping this old stuff around is a burden on everyone that commits stuff and
probably on some contributors too.

I'd not necessarily fight hard to drop a perfectly working Debian 10 animal,
but adding a new one at this point makes no sense whatsoever.

Greetings,

Andres



Re: what's going on with lapwing?

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 2025-03-06 Th 10:45 AM, Robert Haas wrote:
>> By the way, is there a particular reason why we're keeping Debian 7
>> coverage in the buildfarm? I don't want to be in a huge rush to kill
>> platforms people still care about, but it was pointed out to me
>> off-list that this is quite an old release -- it seems Debian 7 was
>> first released in 2013, last released in 2016, EOL in 2018. I assume
>> that not too many people are going to install a PostgreSQL release
>> that comes out in 2025 on an OS that's been EOL for 7 years (or 12
>> years if the BF page is correct that this is actually Debian 7.0).

I don't think that's the way to think about old buildfarm members.
Sure, nobody is very likely to be putting PG 18 on a Debian 7 box,
but the odds are much higher that they might have PG 13 on it and
wish to update to 13.latest.  So what you need to compare OS EOL
dates to is not current development but our oldest supported branch.

Having said that, PG 13 came out in 2020, so yeah it'd be reasonable
to retire Debian 7 buildfarm animals now.  But the gap isn't nearly
as large as you make it sound.
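Tom's comparison can be checked with simple year-level arithmetic. A minimal sketch using only the figures quoted in this thread (Debian 7 EOL in 2018, PG 13 released in 2020, PG 18 expected in 2025); the helper name is illustrative:

```python
# Year-level arithmetic only; the figures are the ones quoted in this thread.
DEBIAN7_EOL = 2018    # Debian 7 EOL, per the first message about Debian 7
PG13_RELEASE = 2020   # oldest supported branch, per Tom
PG18_RELEASE = 2025   # release currently in development

def years_past_eol(os_eol_year: int, pg_release_year: int) -> int:
    """How long the OS had already been EOL when a given PG branch came out."""
    return pg_release_year - os_eol_year

gap_vs_master = years_past_eol(DEBIAN7_EOL, PG18_RELEASE)
gap_vs_oldest = years_past_eol(DEBIAN7_EOL, PG13_RELEASE)
print(f"vs. master (PG 18): {gap_vs_master} years")         # 7 years
print(f"vs. oldest branch (PG 13): {gap_vs_oldest} years")  # 2 years
```

Measured against master the gap is 7 years; measured against the oldest supported branch it is 2.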

> We do have a project at EDB to fill in certain gaps in buildfarm 
> coverage, so maybe we can reduce the incidence of such accidents.

+1

            regards, tom lane



Re: what's going on with lapwing?

From
Robert Haas
Date:
On Thu, Mar 6, 2025 at 12:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I don't think that's the way to think about old buildfarm members.
> Sure, nobody is very likely to be putting PG 18 on a Debian 7 box,
> but the odds are much higher that they might have PG 13 on it and
> wish to update to 13.latest.  So what you need to compare OS EOL
> dates to is not current development but our oldest supported branch.

But the work it's creating is mostly because it's still testing
master. If it were only testing a gradually-decreasing set of older
branches, that wouldn't seem weird to me.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: what's going on with lapwing?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Mar 6, 2025 at 12:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I don't think that's the way to think about old buildfarm members.
>> Sure, nobody is very likely to be putting PG 18 on a Debian 7 box,
>> but the odds are much higher that they might have PG 13 on it and
>> wish to update to 13.latest.  So what you need to compare OS EOL
>> dates to is not current development but our oldest supported branch.

> But the work it's creating is mostly because it's still testing
> master. If it were only testing a gradually-decreasing set of older
> branches, that wouldn't seem weird to me.

I don't think it's reasonable to ask buildfarm owners to set up their
animals like that, because (AFAIK) it requires tedious, error-prone
configuration of moving parts that we don't supply, like cron scripts.

If there were some trivial way to do that, it'd be more acceptable.
Maybe invent a build-farm.conf option like "newest_branch_to_build"?
branches_to_build covers some adjacent territory, but its filtering
options go the wrong way (only branches newer than X, whereas what
we want here is only branches older than X); probably we could
also address this with more options there.

            regards, tom lane



Re: what's going on with lapwing?

From
Robert Haas
Date:
On Thu, Mar 6, 2025 at 1:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If there were some trivial way to do that, it'd be more acceptable.
> Maybe invent a build-farm.conf option like "newest_branch_to_build"?
> branches_to_build covers some adjacent territory, but its filtering
> options go the wrong way (only branches newer than X, whereas what
> we want here is only branches older than X); probably we could
> also address this with more options there.

Yes, that would be nice. I also think we should mandate the use of
that option for OS versions that are EOL for more than X years, for
some to-be-determined value of X, like maybe 3 or something. Right
now, in the absence of any policy, and in the absence also of any
agreement on what the policy should be, we have to have a huge mailing
list thread every time somebody wants to get rid of an OS. I think it
should just be automatic. We don't need to give up -- and IMHO
shouldn't give up -- on an OS the moment the vendor pulls the plug
either on a certain release or on the system in general, but we
shouldn't have to individually litigate every case, either. Right now
half of when we desupport an OS seems to boil down to when the hard
drive on the last remaining server anybody has access to finally dies,
but that leads to weird outcomes where some operating systems are not
tested even though they are still in active use and others continue to
get tested long after they are not. To me, it all just feels a bit too
random, and I think we would be well-served by being more intentional
about it.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: what's going on with lapwing?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Mar 6, 2025 at 1:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Maybe invent a build-farm.conf option like "newest_branch_to_build"?

> Yes, that would be nice. I also think we should mandate the use of
> that option for OS versions that are EOL for more than X years, for
> some to-be-determined value of X, like maybe 3 or something.

It's hard to "mandate" anything in a distributed project like this.
I don't really see a need to either, at least for cases where an
old animal isn't causing us extra work.  When it does, though,
it'd be nice to be able to decide "we're not gonna support that
OS version beyond PG nn", and then have a simple recipe to give
the BF owner that's less drastic than "shut it down".

            regards, tom lane



Re: what's going on with lapwing?

From
Andres Freund
Date:
Hi,

On 2025-03-06 14:13:40 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Thu, Mar 6, 2025 at 1:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Maybe invent a build-farm.conf option like "newest_branch_to_build"?
>
> > Yes, that would be nice. I also think we should mandate the use of
> > that option for OS versions that are EOL for more than X years, for
> > some to-be-determined value of X, like maybe 3 or something.
>
> It's hard to "mandate" anything in a distributed project like this.
> I don't really see a need to either, at least for cases where an
> old animal isn't causing us extra work.

Lapwing *has* caused extra work though, repeatedly.


> When it does, though, it'd be nice to be able to decide "we're not gonna
> support that OS version beyond PG nn", and then have a simple recipe to give
> the BF owner that's less drastic than "shut it down".

The BF is there to be useful for PG development. BF owners contribute them for
that purpose. I don't think we need to keep animals alive when they're past
their shelf life, just to make the animal's owners happy - I suspect most
won't want to keep an animal alive that we don't want.

That said, I'd be happy if the BF had a slightly easier way to configure "up
to this major version".

Greetings,

Andres



Re: what's going on with lapwing?

From
Andrew Dunstan
Date:
On 2025-03-06 Th 2:28 PM, Andres Freund wrote:
> Hi,
>
> On 2025-03-06 14:13:40 -0500, Tom Lane wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> On Thu, Mar 6, 2025 at 1:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> Maybe invent a build-farm.conf option like "newest_branch_to_build"?
>>> Yes, that would be nice. I also think we should mandate the use of
>>> that option for OS versions that are EOL for more than X years, for
>>> some to-be-determined value of X, like maybe 3 or something.
>> It's hard to "mandate" anything in a distributed project like this.
>> I don't really see a need to either, at least for cases where an
>> old animal isn't causing us extra work.
> Lapwing *has* caused extra work though, repeatedly.
>
>
>> When it does, though, it'd be nice to be able to decide "we're not gonna
>> support that OS version beyond PG nn", and then have a simple recipe to give
>> the BF owner that's less drastic than "shut it down".
> The BF is there to be useful for PG development. BF owners contribute them for
> that purpose. I don't think we need to keep animals alive when they're past
> their shelf life, just to make the animal's owners happy - I suspect most
> won't want to keep an animal alive that we don't want.
>
> That said, I'd be happy if the BF had a slightly easier way to configure "up
> to this major version".
>


The only reason it's not there is that nobody's ever asked for it ;-)

You can specify an actual list right now instead of a keyword for 
branches_to_build. e.g. "branches_to_build => [qw(REL_13_STABLE 
REL_14_STABLE REL_15_STABLE)]". The only difficulty about this is you 
need to prune the list when a branch goes EOL.

But we could also build in some keyword type settings too, that would 
interact more sensibly with the server list of branches.

So we could say something like  "branches_to_build => 
'UP_TO_REL_15_STABLE'". That would be pretty simple to code for.
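The advantage of such a keyword over a static list is that it is resolved against the server's list of active branches, so EOL branches drop out without anyone editing the animal's config. The buildfarm client itself is written in Perl; the following is only a Python sketch of the selection logic, assuming the `UP_TO_<branch>` format from Andrew's example. The function names and the branch list (the active branches at the time of this thread) are illustrative:

```python
import re

# Hypothetical illustration of an 'UP_TO_REL_15_STABLE'-style keyword for
# branches_to_build; not the buildfarm client's actual code.
_STABLE = re.compile(r"^REL_(\d+)_STABLE$")

def major_version(branch: str) -> float:
    """Map a branch name to a sortable number; HEAD sorts after every stable branch."""
    if branch == "HEAD":
        return float("inf")
    m = _STABLE.match(branch)
    if m is None:
        raise ValueError(f"unrecognized branch name: {branch}")
    return float(m.group(1))

def resolve_up_to(keyword: str, server_branches: list[str]) -> list[str]:
    """Return the server's active branches that are no newer than the cutoff."""
    prefix = "UP_TO_"
    if not keyword.startswith(prefix):
        raise ValueError(f"not an UP_TO_ keyword: {keyword}")
    cutoff = major_version(keyword[len(prefix):])
    return [b for b in server_branches if major_version(b) <= cutoff]

active = ["REL_13_STABLE", "REL_14_STABLE", "REL_15_STABLE",
          "REL_16_STABLE", "REL_17_STABLE", "HEAD"]
print(resolve_up_to("UP_TO_REL_15_STABLE", active))
# ['REL_13_STABLE', 'REL_14_STABLE', 'REL_15_STABLE']

# Once REL_13_STABLE drops off the server's list, it drops out here too.
print(resolve_up_to("UP_TO_REL_15_STABLE", active[1:]))
# ['REL_14_STABLE', 'REL_15_STABLE']
```

This is exactly the pruning that the explicit `[qw(REL_13_STABLE ...)]` list needs done by hand.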


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com




Re: what's going on with lapwing?

From
Robert Haas
Date:
On Thu, Mar 6, 2025 at 2:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It's hard to "mandate" anything in a distributed project like this.
> I don't really see a need to either, at least for cases where an
> old animal isn't causing us extra work.

I don't know, to me it feels like we have the argument about whether
StegosaurOS is actually dead or whether there might be survivors of
the Chicxulub impact hiding somewhere several times a year. The
participants in those discussions are pretty much always the
same, and their opinions are pretty much always the same, and the
threads are long and tiresome. Eventually we usually get consensus to
forcibly retire some OS that's been EOL for 5 or 10 years and then we
do it all over again the next time somebody's losing their mind. I
think all of that energy could be better spent.

What really makes this unfortunate is that we typically only have this
discussion after that machine has ALREADY caused a bunch of hassle for
a bunch of people. You commit something, the BF turns red, you fix it.
You do the same thing again. Then maybe a third time. By the time we
actually get consensus to remove things, they're generally not just
MOSTLY dead. They're at the point where all you can do is check their
pockets for loose change. They have kicked the bucket, shuffled off
this mortal coil, run down the curtain and gone to join the choir
invisible. They are ex-operating systems.

It just doesn't seem reasonable to me that you have to basically show
up and prove that you've already wasted 100 hours or whatever on a
defunct OS before we'll consider giving it the boot. It's a
predictable outcome. Once the OS is dead, you can't easily upgrade to
newer software versions, which is eventually going to break something
for somebody, not to mention that eventually no PG developer other
than the BF member owner will be able to get access to a copy of that
OS to fix anything that breaks. It's only a matter of time before that
becomes an inconvenience.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: what's going on with lapwing?

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> The only reason it's not there is that nobody's ever asked for it ;-)

> You can specify an actual list right now instead of a keyword for 
> branches_to_build. e.g. "branches_to_build => [qw(REL_13_STABLE 
> REL_14_STABLE REL_15_STABLE)]". The only difficulty about this is you 
> need to prune the list when a branch goes EOL.

Right; it's that need for manual maintenance that I'd like to
get out from under.

> But we could also build in some keyword type settings too, that would 
> interact more sensibly with the server list of branches.
> So we could say something like  "branches_to_build => 
> 'UP_TO_REL_15_STABLE'". That would be pretty simple to code for.

WFM.

            regards, tom lane
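The following is a minimal sketch of how a client might resolve the proposed `UP_TO_REL_nn_STABLE` keyword against the server's list of live branches. It is only an illustration. The real buildfarm client is written in Perl and this is not its code. `resolve_branches` and `branch_version` are hypothetical names, and so is the choice to also intersect explicit lists with the server list. That intersection is shown only because it would remove the manual pruning step mentioned above.

```python
import re

# Hypothetical keyword format from the proposal above; not an existing option.
_UP_TO = re.compile(r"^UP_TO_(REL_?\d+(?:_\d+)?_STABLE)$")
# Matches both old-style "REL9_6_STABLE" and newer "REL_13_STABLE" names.
_STABLE = re.compile(r"^REL_?(\d+)(?:_(\d+))?_STABLE$")


def branch_version(name):
    """Return a comparable version tuple for a stable branch, or None (e.g. HEAD)."""
    m = _STABLE.match(name)
    if m is None:
        return None
    major, minor = m.groups()
    return (int(major),) if minor is None else (int(major), int(minor))


def resolve_branches(spec, server_branches):
    """Turn a branches_to_build setting into the branches to actually build.

    spec: an explicit list of branch names, or an "UP_TO_<branch>" string.
    server_branches: branches the server currently reports as live (assumed input).
    """
    if isinstance(spec, (list, tuple)):
        # Explicit list: silently drop branches the server no longer lists,
        # so an EOL branch falls out without editing the config file.
        live = set(server_branches)
        return [b for b in spec if b in live]
    m = _UP_TO.match(spec)
    if m is None:
        raise ValueError(f"unrecognized branches_to_build keyword: {spec!r}")
    limit = branch_version(m.group(1))
    # Keep only stable branches at or below the limit; HEAD is never included.
    return [b for b in server_branches
            if branch_version(b) is not None and branch_version(b) <= limit]


server = ["REL_13_STABLE", "REL_14_STABLE", "REL_15_STABLE",
          "REL_16_STABLE", "REL_17_STABLE", "HEAD"]
print(resolve_branches("UP_TO_REL_15_STABLE", server))
# ['REL_13_STABLE', 'REL_14_STABLE', 'REL_15_STABLE']
```

Because both forms are filtered through the server's list, a branch that goes EOL drops out of the build set on its own. This is the part that would avoid the manual maintenance.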



Re: what's going on with lapwing?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Mar 6, 2025 at 2:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It's hard to "mandate" anything in a distributed project like this.
>> I don't really see a need to either, at least for cases where an
>> old animal isn't causing us extra work.

> I don't know, to me it feels like we have the argument about whether
> StegosaurOS is actually dead or whether there might be survivors of
> the Chicxulub impact hiding somewhere several times a year.

I think you misunderstood my drift.  I'm okay with setting a project
policy that we won't support OSes that are more than N years EOL,
as long as it's phrased to account for older PG branches properly.
My point was that we can implement such a policy in a laissez-faire
way: if an older BF animal isn't causing us trouble then why mess
with it?  Once we *do* recognize that it's causing us trouble,
we can apply the still-hypothetical policy and ask the owner to
turn it off for branches where it's out of support.

            regards, tom lane



Re: what's going on with lapwing?

From
Robert Haas
Date:
On Thu, Mar 6, 2025 at 4:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think you misunderstood my drift.  I'm okay with setting a project
> policy that we won't support OSes that are more than N years EOL,
> as long as it's phrased to account for older PG branches properly.

Yep, I misunderstood. That sounds awesome. Let's see if others agree.

> My point was that we can implement such a policy in a laissez-faire
> way: if an older BF animal isn't causing us trouble then why mess
> with it?  Once we *do* recognize that it's causing us trouble,
> we can apply the still-hypothetical policy and ask the owner to
> turn it off for branches where it's out of support.

Fair enough. This does have the disadvantage that people will still
commit things that turn the buildfarm red and have to go into panic
mode -- perhaps not even realizing which machines are running an older
OS -- and then reactively do this. However, it still sounds like
progress over where we are today, because it would (hopefully) remove
the need for an argument about each individual case.

Honestly, I'm kind of bummed by how fast operating systems and OS
versions die these days. It seems crazy to me that I probably couldn't
reasonably build a PG from today on the laptop I started at EDB with,
or a PG from then on my current laptop. But it doesn't seem like
there's much point in fighting the tide. We shouldn't be the only
people trying to keep stuff working long after the sell-by date.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: what's going on with lapwing?

From
Melanie Plageman
Date:
On Thu, Mar 6, 2025 at 4:38 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Mar 6, 2025 at 4:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > My point was that we can implement such a policy in a laissez-faire
> > way: if an older BF animal isn't causing us trouble then why mess
> > with it?  Once we *do* recognize that it's causing us trouble,
> > we can apply the still-hypothetical policy and ask the owner to
> > turn it off for branches where it's out of support.
>
> Fair enough. This does have the disadvantage that people will still
> commit things that turn the buildfarm red and have to go into panic
> mode -- perhaps not even realizing which machines are running an older
> OS -- and then reactively do this. However, it still sounds like
> progress over where we are today, because it would (hopefully) remove
> the need for an argument about each individual case.

One thing I've been wishing for recently is access to the discussion
and lore around individual buildfarm animals in a consolidated place.
As a new committer, I haven't been part of all of these discussions
over the last N years and so if an animal goes red, I can look through
the buildfarm status history to see what kind of failures it has had
for the last ~6 months, but it doesn't tell me if that machine is
known for failing because of an outdated OS or some other
platform-specific or hardware-specific issue.

For some animals searching for "skink" in the email archives might be
enough, but I haven't found that to be true so far.

- Melanie



Re: what's going on with lapwing?

From
Robert Haas
Date:
On Thu, Mar 6, 2025 at 4:54 PM Melanie Plageman
<melanieplageman@gmail.com> wrote:
> One thing I've been wishing for recently is access to the discussion
> and lore around individual buildfarm animals in a consolidated place.
> As a new committer, I haven't been part of all of these discussions
> over the last N years and so if an animal goes red, I can look through
> the buildfarm status history to see what kind of failures it has had
> for the last ~6 months, but it doesn't tell me if that machine is
> known for failing because of an outdated OS or some other
> platform-specific or hardware-specific issue.

This is a problem for me, too, so I would also be happy about this.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: what's going on with lapwing?

From
Julien Rouhaud
Date:
On Thu, Mar 06, 2025 at 02:28:20PM -0500, Andres Freund wrote:
> Hi,
>
> On 2025-03-06 14:13:40 -0500, Tom Lane wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> > > On Thu, Mar 6, 2025 at 1:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >> Maybe invent a build-farm.conf option like "newest_branch_to_build"?
> >
> > > Yes, that would be nice. I also think we should mandate the use of
> > > that option for OS versions that are EOL for more than X years, for
> > > some to-be-determined value of X, like maybe 3 or something.
> >
> > It's hard to "mandate" anything in a distributed project like this.
> > I don't really see a need to either, at least for cases where an
> > old animal isn't causing us extra work.
>
> Lapwing *has* caused extra work though, repeatedly.
>
>
> > When it does, though, it'd be nice to be able to decide "we're not gonna
> > support that OS version beyond PG nn", and then have a simple recipe to give
> > the BF owner that's less drastic than "shut it down".
>
> The BF is there to be useful for PG development. BF owners contribute them for
> that purpose. I don't think we need to keep animals alive when they're past
> their shelf life, just to make the animal's owners happy - I suspect most
> won't want to keep an animal alive that we don't want.

I indeed don't want to keep lapwing up unless there is any value.  Note that it
started to fail on 2 branches after the last buildfarm client update for
reasons I don't understand.  Since everyone is complaining about lapwing
already I will just turn it off.  If anyone is interested in keeping it up on
REL_13_STABLE until its EOL (so for the next ~6 months) let me know.



Re: what's going on with lapwing?

From
Robert Haas
Date:
On Thu, Mar 6, 2025 at 5:12 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> I indeed don't want to keep lapwing up unless there is any value.  Note that it
> started to fail on 2 branches after the last buildfarm client update for
> reasons I don't understand.  Since everyone is complaining about lapwing
> already I will just turn it off.  If anyone is interested in keeping it up on
> REL_13_STABLE until its EOL (so for the next ~6 months) let me know.

We're not complaining, we're just discussing. :-)

In all seriousness, I appreciate the willingness of you and others to
run BF members and of Andrew to maintain the BF client. It's kind of a
thankless job, I'm sure, but we will be much worse off as a project if
people just stop doing those things. Wanting to figure out how we can
do better or wondering whether a certain machine makes sense any more
doesn't translate into not appreciating the work.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: what's going on with lapwing?

From
Julien Rouhaud
Date:
On Thu, Mar 06, 2025 at 06:25:29PM -0500, Robert Haas wrote:
> On Thu, Mar 6, 2025 at 5:12 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > I indeed don't want to keep lapwing up unless there is any value.  Note that it
> > started to fail on 2 branches after the last buildfarm client update for
> > reasons I don't understand.  Since everyone is complaining about lapwing
> > already I will just turn it off.  If anyone is interested in keeping it up on
> > REL_13_STABLE until its EOL (so for the next ~6 months) let me know.
>
> We're not complaining, we're just discussing. :-)

Well, the quoted part of my answer was:

> Lapwing *has* caused extra work though, repeatedly.

In my book that's complaining.  Tom and Alvaro have also complained
recently.

>
> In all seriousness, I appreciate the willingness of you and others to
> run BF members and of Andrew to maintain the BF client. It's kind of a
> thankless job, I'm sure, but will be much worse off as a project if
> people just stop doing those things. Wanting to figure out how we can
> do better or wondering whether a certain machine makes sense any more
> doesn't translate into not appreciating the work.

I'm fine with the thankless job, I got used to it.  But being blamed for it is
a big no.

Honestly, it's been years of people complaining on one thing or another about
lapwing without ever asking for a change.  Was it really hard to ask "can you
remove the -Werror it's not useful anymore" the first time it caused extra
work?  Instead I have to guess what people want.  So after a few complaints I
removed that flag.  And now after a few more complaints I turned it off.  If
that's not what you want, well too bad but that's on you, not me.



Re: what's going on with lapwing?

From
Peter Eisentraut
Date:
On 06.03.25 22:54, Melanie Plageman wrote:
> One thing I've been wishing for recently is access to the discussion
> and lore around individual buildfarm animals in a consolidated place.
> As a new committer, I haven't been part of all of these discussions
> over the last N years and so if an animal goes red, I can look through
> the buildfarm status history to see what kind of failures it has had
> for the last ~6 months, but it doesn't tell me if that machine is
> known for failing because of an outdated OS or some other
> platform-specific or hardware-specific issue.

I'm not sure there is more than that to it.  There is at any given time 
necessarily a trailing edge system, which is where you tend to run into 
trouble if you try to use compiler or library features past some 
invisible boundary, and which also tends to have flaky hardware and 
perhaps limited resources in general, and so you might also tend to see 
some more test instability there.

That used to be an HP-UX 10 system, and then later it was an OpenBSD on 
Sparc system, and now it's seemingly this one, and perhaps there were a 
few in between, and there will be a new one soon.  But this keeps 
moving, and the lore from more than 12 months ago is probably not that 
useful here.




Re: what's going on with lapwing?

From
Robert Haas
Date:
On Thu, Mar 6, 2025 at 7:03 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> Honestly, it's been years of people complaining on one thing or another about
> lapwing without ever asking for a change.  Was it really hard to ask "can you
> remove the -Werror it's not useful anymore" the first time it caused extra
> work?  Instead I have to guess what people want.  So after a few complaints I
> removed that flag.  And now after a few more complaints I turned it off.  If
> that's not what you want, well too bad but that's on you, not me.

This is actually much harder for me as a committer than you might
guess. How is an individual committer working on an individual issue
supposed to know that removing -Wall is the right thing vs. fixing the
warning in some way? As Melanie also mentions in her reply, committers
are not born knowing or understanding which machines are chronic
problems, and sometimes it's very difficult to figure that out. If you
read every message on the mailing list you're probably going to have
more idea than if you don't, but that's more than most people can do
these days, and you can still miss things.

But I think your complaint here is actually getting at another problem
with the way that we do the buildfarm as a project: it's completely
unplanned. People show up and run random buildfarm machines and nobody
knows why they are running those machines: was it because they care
about support for that platform, or was it just to be nice, or were
they testing some unusual configuration, or just because they set it
up a long time ago and have never turned it off? And then other
buildfarm members that maybe we ought to have are missing and nobody
knows if anyone else is working on that, or maybe they don't even know
about it. And then, to your point, nobody ever shows up and tells a
buildfarm member owner what we'd like them to do. There is no "what
we'd like them to do" -- we have no policy or preference or anything
as a group. Everybody's just guessing what other people want and care
about, and then sometimes we're all grumpy at each other.

But notice that with CI, it's the other way around. Some small group
of people decide what our CI setup should do and then they configure
it to do that thing. You can agree with those decisions or not, but
they are intentional. The platforms and OS versions that are being
tested are what somebody decided was best from among the available
options. Now I'm not saying that sort of centralized planning is
without flaws -- and the fact that only a handful of people seem to
understand how to keep this CI stuff working is definitely one of them
-- but it also has some strengths, namely that you remove a lot of
this guesswork around what the other people involved actually want.

I'm not sure exactly where I'm going with this line of thought, but I
do wonder if we ought to find a way to be more intentional about the
buildfarm instead of just letting things happen and then being sad
that we didn't get what we wanted.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: what's going on with lapwing?

From
Julien Rouhaud
Date:
On Fri, Mar 07, 2025 at 08:52:26AM -0500, Robert Haas wrote:
> On Thu, Mar 6, 2025 at 7:03 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > Honestly, it's been years of people complaining on one thing or another about
> > lapwing without ever asking for a change.  Was it really hard to ask "can you
> > remove the -Werror it's not useful anymore" the first time it caused extra
> > work?  Instead I have to guess what people want.  So after a few complaints I
> > removed that flag.  And now after a few more complaints I turned it off.  If
> > that's not what you want, well too bad but that's on you, not me.
>
> This is actually much harder for me as a committer than you might
> guess. How is an individual committer working on an individual issue
> supposed to know that removing -Wall is the right thing vs. fixing the
> warning in some way?

That may be true in general, but I don't think it was the case here.

The "repeated extra work" here was replacing {0} with {{0}} or something like
that, due to a compiler bug.  It was outlined multiple times.

The question is simple: either doing that is so much pain that -Werror should
be removed, or the odds of that compiler and/or 32-bit architecture revealing
an actual bug are high enough that it's a small price to pay.

I was fine either way, and I didn't think it was my choice to make since I'm
not the one responsible for dealing with it.  But none of that happened and instead
people just chose the easy solution of complaining about it for the sake of
complaining without proposing an actual solution.

Anyway it seems that the only wanted use for this animal now would be to test
REL_13_STABLE which will be EOL in a few months now, and I don't see the point
of dealing with more complaining just for that given how much activity is going to
happen on that branch, so as I said I turned it off.



Re: what's going on with lapwing?

From
Greg Sabino Mullane
Date:


On Fri, Mar 7, 2025 at 8:52 AM Robert Haas <robertmhaas@gmail.com> wrote:
> There is no "what we'd like them to do" -- we have no policy or
> preference or anything as a group. Everybody's just guessing what other
> people want and care about, and then sometimes we're all grumpy at each
> other.

This is a great point. Many years ago I had some spare hardware and wanted to set up a build animal. I looked through the list to see what unique permutation I could provide, but it was a very manual process! It would be nice to at least have a list of "wanted" configurations. 

Perhaps a summary of the currently running configurations. So you could see, for example, which compiler versions were run in the last week, and perhaps how many animals were running it. Basically, some meta information about the current runs, which could help us drive the "wanted" list.
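A rough sketch of the kind of summary described here. It assumes that each buildfarm run can be reduced to an `(animal, compiler, run_time)` record. That record shape and the `compiler_summary` function are hypothetical. Neither reflects the buildfarm server's actual schema or API. The animal and compiler names in the example are placeholders.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def compiler_summary(runs, now, window=timedelta(days=7)):
    """Count distinct animals per compiler among runs inside the time window.

    runs: iterable of (animal, compiler, run_time) tuples (hypothetical shape).
    Returns (compiler, animal_count) pairs, most widely used first.
    """
    animals_by_compiler = defaultdict(set)
    cutoff = now - window
    for animal, compiler, run_time in runs:
        if run_time >= cutoff:
            # A set, so an animal that ran many times is counted once.
            animals_by_compiler[compiler].add(animal)
    return sorted(((c, len(a)) for c, a in animals_by_compiler.items()),
                  key=lambda item: (-item[1], item[0]))


now = datetime(2025, 3, 7)
runs = [
    ("animal_a", "gcc-12", now - timedelta(days=1)),
    ("animal_b", "gcc-12", now - timedelta(days=2)),
    ("animal_a", "gcc-12", now - timedelta(days=3)),
    ("animal_c", "clang-16", now - timedelta(days=1)),
    ("animal_d", "gcc-4.7", now - timedelta(days=30)),  # outside the window
]
print(compiler_summary(runs, now))
# [('gcc-12', 2), ('clang-16', 1)]
```

The same grouping could be applied per OS or per architecture. Entries that show up with zero or one animal would be natural candidates for the "wanted" list.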

--
Cheers,
Greg

--
Enterprise Postgres Software Products & Tech Support

Buildfarm coverage planning (was: what's going on with lapwing?)

From
Andrew Dunstan
Date:


On 2025-03-07 Fr 8:52 AM, Robert Haas wrote:
> On Thu, Mar 6, 2025 at 7:03 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>> Honestly, it's been years of people complaining on one thing or another about
>> lapwing without ever asking for a change.  Was it really hard to ask "can you
>> remove the -Werror it's not useful anymore" the first time it caused extra
>> work?  Instead I have to guess what people want.  So after a few complaints I
>> removed that flag.  And now after a few more complaints I turned it off.  If
>> that's not what you want, well too bad but that's on you, not me.
>
> This is actually much harder for me as a committer than you might
> guess. How is an individual committer working on an individual issue
> supposed to know that removing -Wall is the right thing vs. fixing the
> warning in some way? As Melanie also mentions in her reply, committers
> are not born knowing or understanding which machines are chronic
> problems, and sometimes it's very difficult to figure that out. If you
> read every message on the mailing list you're probably going to have
> more idea than if you don't, but that's more than most people can do
> these days, and you can still miss things.
>
> But I think your complaint here is actually getting at another problem
> with the way that we do the buildfarm as a project: it's completely
> unplanned. People show up and run random buildfarm machines and nobody
> knows why they are running those machines: was it because they care
> about support for that platform, or was it just to be nice, or were
> they testing some unusual configuration, or just because they set it
> up a long time ago and have never turned it off? And then other
> buildfarm members that maybe we ought to have are missing and nobody
> knows if anyone else is working on that, or maybe they don't even know
> about it. And then, to your point, nobody ever shows up and tells a
> buildfarm member owner what we'd like them to do. There is no "what
> we'd like them to do" -- we have no policy or preference or anything
> as a group. Everybody's just guessing what other people want and care
> about, and then sometimes we're all grumpy at each other.
>
> But notice that with CI, it's the other way around. Some small group
> of people decide what our CI setup should do and then they configure
> it to do that thing. You can agree with those decisions or not, but
> they are intentional. The platforms and OS versions that are being
> tested are what somebody decided was best from among the available
> options. Now I'm not saying that sort of centralized planning is
> without flaws -- and the fact that only a handful of people seem to
> understand how to keep this CI stuff working is definitely one of them
> -- but it also has some strengths, namely that you remove a lot of
> this guesswork around what the other people involved actually want.
>
> I'm not sure exactly where I'm going with this line of thought, but I
> do wonder if we ought to find a way to be more intentional about the
> buildfarm instead of just letting things happen and then being sad
> that we didn't get what we wanted.


I've renamed the thread because it's got a long way past what's happened with lapwing.

I agree that the coverage is unplanned, and that we should do better about it. 

However, we can't cover every possible combination of Architecture, OS, Compiler, Build System, Build Options. There are roughly 170 currently reporting buildfarm animals. Comprehensive coverage would mean a fleet of thousands, even if we ran multiple animals on a single machine. And I'm not sure that would be a win - the status page would become unmanageable and useless. So we need to be strategic about it.
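A back-of-the-envelope sketch of why the full matrix explodes. The counts per dimension below are made up and kept small on purpose. None of them come from this thread. The point is only that the size of the full cross product is the product of the dimension sizes, which quickly outgrows a fleet of roughly 170 animals.

```python
from math import prod


def matrix_size(dimension_sizes):
    """Number of distinct configurations in the full cross product."""
    return prod(dimension_sizes.values())


# Hypothetical, deliberately small counts; illustrative only.
hypothetical_dimensions = {
    "architecture": 5,
    "os": 6,
    "compiler": 4,
    "build_system": 2,  # e.g. meson and autoconf
    "option_set": 8,
}

size = matrix_size(hypothetical_dimensions)
print(size)         # 1920
print(size // 170)  # 11, i.e. about eleven times the current fleet
```

Even with these modest numbers, every added option set or OS version multiplies the total rather than adding to it. That is why coverage has to be chosen rather than enumerated.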

So let's start with a couple of simple questions:

1. what are the perceived major gaps in coverage?

2. what animals (if any) should be turned off or updated?

I'll start with my nomination for 1: we need a Windows animal that builds in the same way and with the same options as the installer binaries. That's something we are actually working on at EDB.


cheers


andrew





--
Andrew Dunstan
EDB: https://www.enterprisedb.com