Thread: what's going on with lapwing?
lapwing has been failing sepgsql-check for over a month, but there's no log file: https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=lapwing&dt=2025-03-03%2016%3A29%3A50&stg=contrib-sepgsql-check I feel like something must be broken with logfile collection in the buildfarm client, because I keep seeing failures like this where there's no log to view. It's hard to know whether this is a configuration issue with this BF member or whether something actually needs fixing. In any case, we should try to figure out what's going on here. Apologies if this has been discussed elsewhere and I have missed it. Pointers appreciated. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > lapwing has been failing sepgsql-check for over a month, but there's > no log file: I believe it hasn't been updated with the buildfarm client changes needed to run sepgsql-check since aeb8ea361. regards, tom lane
On Mon, Mar 3, 2025 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > lapwing has been failing sepgsql-check for over a month, but there's > > no log file: > > I believe it hasn't been updated with the buildfarm client changes > needed to run sepgsql-check since aeb8ea361. OK, thanks. The owner is pgbuildfarm@rjuju.net, which I'm guessing might be Julien Rouhaud, but not sure? It would be good to get that updated, especially with the end of the release cycle hard upon us. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Mar 03, 2025 at 03:52:30PM -0500, Robert Haas wrote: > On Mon, Mar 3, 2025 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > > > lapwing has been failing sepgsql-check for over a month, but there's > > > no log file: > > > > I believe it hasn't been updated with the buildfarm client changes > > needed to run sepgsql-check since aeb8ea361. Well, AFAIK the usual habit when something is broken and a buildfarm client upgrade is needed is to warn the buildfarm owners. There was an email yesterday for installing libcurl which I did. There was an email before last release for possibly stuck tests which I checked. There was no such email to ask to update the client, so I'm not sure why you expected me to do so? Apart from that, commit aeb8ea361 is from January 24, and the latest buildfarm client release (18) is from November, did I miss something? > > OK, thanks. The owner is pgbuildfarm@rjuju.net, which I'm guessing > might be Julien Rouhaud, but not sure? Yes it's me. I'm not sure why it's such a mystery as I've been answering for it for many years. Having an alias is useful to me as I can redirect it to some private mailbox with less traffic and therefore react promptly in case of any problem.
Julien Rouhaud <rjuju123@gmail.com> writes: >> On Mon, Mar 3, 2025 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I believe it hasn't been updated with the buildfarm client changes >>> needed to run sepgsql-check since aeb8ea361. > Well, AFAIK the usual habit when something is broken and a buildfarm client > upgrade is needed is to warn the buildfarm owners. There was an email > yesterday for installing libcurl which I did. There was an email before last > release for possibly stuck tests which I checked. There was no such email to > ask to update the client, so I'm not sure why you expected me to do so? Yeah, I think a new buildfarm release is overdue. We have this issue affecting sepgsql-check, and we have the TestUpgradeXversion changes that are necessary for that still to work, and it's not great to expect owners to run hand-patched scripts. regards, tom lane
On Mon, Mar 3, 2025 at 6:09 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > Well, AFAIK the usual habit when something is broken and a buildfarm client > upgrade is needed is to warn the buildfarm owners. There was an email > yesterday for installing libcurl which I did. There was an email before last > release for possibly stuck tests which I checked. There was no such email to > ask to update the client, so I'm not sure why you expected me to do so? > > Apart from that, commit aeb8ea361 is from January 24, and the latest buildfarm > client release (18) is from November, did I miss something? Honestly, I just noticed that the buildfarm member in question had been red for over a month and I figured it was a setup issue of some kind. I guess I was wrong. It didn't cross my mind that a commit over a month ago had broken things and there had been no subsequent revert or buildfarm client release. I'm frankly quite astonished because Tom recently told me that I needed to fix something RIGHT NOW or else revert when the commit had been in the tree for two hours. Given that context, maybe you can understand why I thought it was unlikely that we were just chilling about this being broken after 38 days. > Yes it's me. I'm not sure why it's such a mystery as I've been answering for > it for many years. Having an alias is useful to me as I can redirect it to > some private mailbox with less traffic and therefore react promptly in case of > any problem. Buildfarm members are labelled with an email address, not a name. I don't magically know the names of all the people who correspond to those email addresses. I've learned some of them over time, but I didn't recognize rjuju.net. -- Robert Haas EDB: http://www.enterprisedb.com
> Julien Rouhaud <rjuju123@gmail.com> writes: >>> On Mon, Mar 3, 2025 at 2:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> I believe it hasn't been updated with the buildfarm client changes >>>> needed to run sepgsql-check since aeb8ea361. >> Well, AFAIK the usual habit when something is broken and a buildfarm client >> upgrade is needed is to warn the buildfarm owners. There was an email >> yesterday for installing libcurl which I did. There was an email before last >> release for possibly stuck tests which I checked. There was no such email to >> ask to update the client, so I'm not sure why you expected me to do so? > Yeah, I think a new buildfarm release is overdue. We have this issue > affecting sepgsql-check, and we have the TestUpgradeXversion changes that > are necessary for that still to work, and it's not great to expect owners > to run hand-patched scripts.
Yeah, I try to avoid making too many releases, but I agree it's time to push one.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
On Tue, Mar 04, 2025 at 08:33:46AM -0500, Robert Haas wrote: > On Mon, Mar 3, 2025 at 6:09 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > Well, AFAIK the usual habit when something is broken and a buildfarm client > > upgrade is needed is to warn the buildfarm owners. There was an email > > yesterday for installing libcurl which I did. There was an email before last > > release for possibly stuck tests which I checked. There was no such email to > > ask to update the client, so I'm not sure why you expected me to do so? > > > > Apart from that, commit aeb8ea361 is from January 24, and the latest buildfarm > > client release (18) is from November, did I miss something? > > Honestly, I just noticed that the buildfarm member in question had > been red for over a month and I figured it was a setup issue of some > kind. I guess I was wrong. It didn't cross my mind that a commit over > a month ago had broken things and there had been no subsequent revert > or buildfarm client release. I'm frankly quite astonished because Tom > recently told me that I needed to fix something RIGHT NOW or else > revert when the commit had been in the tree for two hours. Given that > context, maybe you can understand why I thought it was unlikely that > we were just chilling about this being broken after 38 days. Yes, I do remember the 2 threads and I totally understand. On my side I know that I don't have as much time as I'd like to contribute but I at least make sure that I don't leave my animal broken because of something on my side. So I did try to check when I received the first failure notifications, also saw that there were no logs, found a few other animals with the same symptoms and concluded that it would eventually be resolved either with a fix or some instructions on the buildfarm client side or something. I did check again a few days ago when I saw that nothing was pushed anymore on that branch only, saw that it was the same on other animals, got confused and gave up.
> > > Yes it's me. I'm not sure why it's such a mystery as I've been answering for > > it for many years. Having an alias is useful to me as I can redirect it to > > some private mailbox with less traffic and therefore react promptly in case of > > any problem. > > Buildfarm members are labelled with an email address, not a name. I > don't magically know the names of all the people who correspond to > those email addresses. I've learned some of them over time, but I > didn't recognize rjuju.net. Ah sorry, I thought that "rjuju" was a distinctive enough alias (which I more or less use everywhere) that it would be recognisable. In any case I do receive the emails so I can reply, just not from that address.
On Tue, Mar 4, 2025 at 9:26 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > Yes, I do remember the 2 threads and I totally understand. On my side I know > instructions on the buildfarm client side or something. I did check again a > few days ago when I saw that nothing was pushed anymore on that branch only, > saw that it was the same on other animals, got confused and gave up. Sure, makes sense. > > Buildfarm members are labelled with an email address, not a name. I > > don't magically know the names of all the people who correspond to > > those email addresses. I've learned some of them over time, but I > > didn't recognize rjuju.net. > > Ah sorry, I thought that the "rjuju" was a poor enough alias (which I more or > less use everywhere) that it would be recognisable. In any case I do receive > the emails so I can reply, just not from that address. I think honestly at some point in the past I did have that alias <=> real name mapping in my brain, but somewhere along the way it fell out. On the list, I tend to pay more attention to people's advertised real names than to their nicknames/handles, because that's what goes in release notes, what people's name tag will say at a conference, etc. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > Honestly, I just noticed that the buildfarm member in question had > been red for over a month and I figured it was a setup issue of some > kind. I guess I was wrong. It didn't cross my mind that a commit over > a month ago had broken things and there had been no subsequent revert > or buildfarm client release. I'm frankly quite astonished because Tom > recently told me that I needed to fix something RIGHT NOW or else > revert when the commit had been in the tree for two hours. Given that > context, maybe you can understand why I thought it was unlikely that > we were just chilling about this being broken after 38 days. As far as that goes, there are degrees of buildfarm brokenness. Right now we have three remaining animals that are affected by the sepgsql issue (two of which seem to have been stopped a while ago by their owners), and the error is perfectly predictable/repeatable. So it's easy to ignore. At the point where I complained to you about that other problem, it was looking like it might cause a quarter or a third of the buildfarm to fail intermittently. Maybe I overestimated the frequency of the failure, but if that was accurate it would have resulted in a lot of fruitless double-checking of failures to see if there was anything real underneath the noise. So I find that sort of case much more painful. But yeah, I thought we were overdue for a buildfarm release. I'm pleased to see that Andrew just pushed one. regards, tom lane
On Tue, Mar 4, 2025 at 10:18 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > At the point where I complained to you about that other problem, > it was looking like it might cause a quarter or a third of the > buildfarm to fail intermittently. Maybe I overestimated the > frequency of the failure, but if that was accurate it would have > resulted in a lot of fruitless double-checking of failures to > see if there was anything real underneath the noise. So I find > that sort of case much more painful. I think that's actually totally fair. I was not upset that you wanted it fixed, or even that you wanted it fixed relatively quickly. The things that I was upset about were: 1. There's no real way for me to avoid this kind of pain. That's not your fault, but it is something that I think we need to address as a community. As Jelte said on the other thread, other projects have infrastructure that allows them to avoid these kinds of problems by being able to do pre-commit testing. Having modern infrastructure for stuff like this is an important part of attracting and retaining developers. 2. Two hours is just not enough time. Never mind that people like to have evenings and weekends off -- this is supposed to be a community that operates by consensus. It doesn't seem right to spend months or years discussing the design before committing, and then after commit, boom, you have to make a unilateral decision about what to change within -- not even hours, but minutes. Because you also need time for BF results to show up, and then you need time to code and test whatever you decided. I would actually be quite sympathetic to the time frame here if we were immediately before a feature or release freeze when there is no tolerance for error, but not in this situation. 3. You ignored the substantive questions that I asked and commented only on the procedural issue of fixing the BF, even though you were a previous participant in the discussion on that patch.
Maybe my tolerance for reverts is just lower than yours. I think it's bad when somebody has a problem like this, insta-reverts, then tries it again later after changing the patch, then maybe the same thing happens again, gets insta-reverted a second time, then maybe the third time the commit actually sticks. I think that clutters up the commit history with a bunch of junk, and that junk is permanent. The buildfarm being red is bad, but after it's fixed it will be green again and the time for which it was red will have little enduring impact. But everybody who tries to find stuff in the commit log is potentially inconvenienced by reverts, forever. Commands like 'git log FILENAME' or 'git log -Gstring' will now return extra, spurious hits. If I absolutely have to choose between the BF being red for a couple of days and revert ping-pong, I would prefer the former. -- Robert Haas EDB: http://www.enterprisedb.com
On Tue, Mar 04, 2025 at 10:18:49AM -0500, Tom Lane wrote: > > But yeah, I thought we were overdue for a buildfarm release. > I'm pleased to see that Andrew just pushed one. FWIW I installed the client version 19.1 this morning and forced a run on HEAD and lapwing is back to green.
On Wed, Mar 5, 2025 at 9:49 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > FWIW I installed the client version 19.1 this morning and forced a run on HEAD > and lapwing is back to green. Thanks, appreciate it. By the way, is there a particular reason why we're keeping Debian 7 coverage in the buildfarm? I don't want to be in a huge rush to kill platforms people still care about, but it was pointed out to me off-list that this is quite an old release -- it seems Debian 7 was first released in 2013, last released in 2016, EOL in 2018. I assume that not too many people are going to install a PostgreSQL release that comes out in 2025 on an OS that's been EOL for 7 years (or 12 years if the BF page is correct that this is actually Debian 7.0). Somewhat oddly, I see that we have coverage for Debian 9, 11, 12, and 13, but not 8 or 10. Is there a theory behind all of this or is the current situation somewhat accidental? -- Robert Haas EDB: http://www.enterprisedb.com
On 2025-03-06 Th 10:45 AM, Robert Haas wrote: > On Wed, Mar 5, 2025 at 9:49 PM Julien Rouhaud <rjuju123@gmail.com> wrote: >> FWIW I installed the client version 19.1 this morning and forced a run on HEAD >> and lapwing is back to green. > Thanks, appreciate it. > > By the way, is there a particular reason why we're keeping Debian 7 > coverage in the buildfarm? I don't want to be in a huge rush to kill > platforms people still care about, but it was pointed out to me > off-list that this is quite an old release -- it seems Debian 7 was > first released in 2013, last released in 2016, EOL in 2018. I assume > that not too many people are going to install a PostgreSQL release > that comes out in 2025 on an OS that's been EOL for 7 years (or 12 > years if the BF page is correct that this is actually Debian 7.0). > Somewhat oddly, I see that we have coverage for Debian 9, 11, 12, and > 13, but not 8 or 10. Is there a theory behind all of this or is the > current situation somewhat accidental? > Fairly accidental, I think. We do have a project at EDB to fill in certain gaps in buildfarm coverage, so maybe we can reduce the incidence of such accidents. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Hi, On 2025-03-06 11:57:05 -0500, Andrew Dunstan wrote: > On 2025-03-06 Th 10:45 AM, Robert Haas wrote: > > By the way, is there a particular reason why we're keeping Debian 7 > > coverage in the buildfarm? I don't want to be in a huge rush to kill > > platforms people still care about, but it was pointed out to me > > off-list that this is quite an old release -- it seems Debian 7 was > > first released in 2013, last released in 2016, EOL in 2018. I assume > > that not too many people are going to install a PostgreSQL release > > that comes out in 2025 on an OS that's been EOL for 7 years (or 12 > > years if the BF page is correct that this is actually Debian 7.0). > > Somewhat oddly, I see that we have coverage for Debian 9, 11, 12, and > > 13, but not 8 or 10. Is there a theory behind all of this or is the > > current situation somewhat accidental? > > Fairly accidental, I think. > > We do have a project at EDB to fill in certain gaps in buildfarm coverage, > so maybe we can reduce the incidence of such accidents. I think the way to fix the gap is to drop the buildfarm animal running an OS that has been unsupported for 7 years / without security fixes for 9 years, not to add an animal running an OS that has been unsupported for 4 years / without security fixes for 6 years (i.e. Debian 8). Debian 9 has been out of support for 2 years / without security fixes for 4. Debian 10 is also out of LTS support, albeit more recently, on 30 June 2024, and has been out of security support for 2 1/2 years. Keeping this old stuff around is a burden on everyone that commits stuff and probably on some contributors too. I'd not necessarily fight hard to drop a perfectly working Debian 10 animal, but adding a new one at this point makes no sense whatsoever. Greetings, Andres
Andrew Dunstan <andrew@dunslane.net> writes: > On 2025-03-06 Th 10:45 AM, Robert Haas wrote: >> By the way, is there a particular reason why we're keeping Debian 7 >> coverage in the buildfarm? I don't want to be in a huge rush to kill >> platforms people still care about, but it was pointed out to me >> off-list that this is quite an old release -- it seems Debian 7 was >> first released in 2013, last released in 2016, EOL in 2018. I assume >> that not too many people are going to install a PostgreSQL release >> that comes out in 2025 on an OS that's been EOL for 7 years (or 12 >> years if the BF page is correct that this is actually Debian 7.0). I don't think that's the way to think about old buildfarm members. Sure, nobody is very likely to be putting PG 18 on a Debian 7 box, but the odds are much higher that they might have PG 13 on it and wish to update to 13.latest. So what you need to compare OS EOL dates to is not current development but our oldest supported branch. Having said that, PG 13 came out in 2020, so yeah it'd be reasonable to retire Debian 7 buildfarm animals now. But the gap isn't nearly as large as you make it sound. > We do have a project at EDB at fill in certain gaps in buildfarm > coverage, so maybe we can reduce the incidence of such accidents. +1 regards, tom lane
On Thu, Mar 6, 2025 at 12:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I don't think that's the way to think about old buildfarm members. > Sure, nobody is very likely to be putting PG 18 on a Debian 7 box, > but the odds are much higher that they might have PG 13 on it and > wish to update to 13.latest. So what you need to compare OS EOL > dates to is not current development but our oldest supported branch. But the work it's creating is mostly because it's still testing master. If it were only testing a gradually-decreasing set of older branches, that wouldn't seem weird to me. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Mar 6, 2025 at 12:22 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I don't think that's the way to think about old buildfarm members. >> Sure, nobody is very likely to be putting PG 18 on a Debian 7 box, >> but the odds are much higher that they might have PG 13 on it and >> wish to update to 13.latest. So what you need to compare OS EOL >> dates to is not current development but our oldest supported branch. > But the work it's creating is mostly because it's still testing > master. If it were only testing a gradually-decreasing set of older > branches, that wouldn't seem weird to me. I don't think it's reasonable to ask buildfarm owners to set up their animals like that, because (AFAIK) it requires tedious, error-prone configuration of moving parts that we don't supply, like cron scripts. If there were some trivial way to do that, it'd be more acceptable. Maybe invent a build-farm.conf option like "newest_branch_to_build"? branches_to_build covers some adjacent territory, but its filtering options go the wrong way (only branches newer than X, whereas what we want here is only branches older than X); probably we could also address this with more options there. regards, tom lane
On Thu, Mar 6, 2025 at 1:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > If there were some trivial way to do that, it'd be more acceptable. > Maybe invent a build-farm.conf option like "newest_branch_to_build"? > branches_to_build covers some adjacent territory, but its filtering > options go the wrong way (only branches newer than X, whereas what > we want here is only branches older than X); probably we could > also address this with more options there. Yes, that would be nice. I also think we should mandate the use of that option for OS versions that are EOL for more than X years, for some to-be-determined value of X, like maybe 3 or something. Right now, in the absence of any policy, and in the absence also of any agreement on what the policy should be, we have to have a huge mailing list thread every time somebody wants to get rid of an OS. I think it should just be automatic. We don't need to give up -- and IMHO shouldn't give up -- on an OS the moment the vendor pulls the plug either on a certain release or on the system in general, but we shouldn't have to individually litigate every case, either. Right now, half the time, when we desupport an OS seems to boil down to when the hard drive on the last remaining server anybody has access to finally dies, but that leads to weird outcomes where some operating systems are not tested even though they are still in active use and others continue to get tested long after they are not. To me, it all just feels a bit too random, and I think we would be well-served by being more intentional about it. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Mar 6, 2025 at 1:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Maybe invent a build-farm.conf option like "newest_branch_to_build"? > Yes, that would be nice. I also think we should mandate the use of > that option for OS versions that are EOL for more than X years, for > some to-be-determined value of X, like maybe 3 or something. It's hard to "mandate" anything in a distributed project like this. I don't really see a need to either, at least for cases where an old animal isn't causing us extra work. When it does, though, it'd be nice to be able to decide "we're not gonna support that OS version beyond PG nn", and then have a simple recipe to give the BF owner that's less drastic than "shut it down". regards, tom lane
Hi, On 2025-03-06 14:13:40 -0500, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Thu, Mar 6, 2025 at 1:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> Maybe invent a build-farm.conf option like "newest_branch_to_build"? > > > Yes, that would be nice. I also think we should mandate the use of > > that option for OS versions that are EOL for more than X years, for > > some to-be-determined value of X, like maybe 3 or something. > > It's hard to "mandate" anything in a distributed project like this. > I don't really see a need to either, at least for cases where an > old animal isn't causing us extra work. Lapwing *has* caused extra work though, repeatedly. > When it does, though, it'd be nice to be able to decide "we're not gonna > support that OS version beyond PG nn", and then have a simple recipe to give > the BF owner that's less drastic than "shut it down". The BF is there to be useful for PG development. BF owners contribute them for that purpose. I don't think we need to keep animals alive when they're past their shelf life, just to make the animal's owners happy - I suspect most won't want to keep an animal alive that we don't want. That said, I'd be happy if the BF had a slightly easier way to configure "up to this major version". Greetings, Andres
On 2025-03-06 Th 2:28 PM, Andres Freund wrote: > Hi, > > On 2025-03-06 14:13:40 -0500, Tom Lane wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> On Thu, Mar 6, 2025 at 1:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> Maybe invent a build-farm.conf option like "newest_branch_to_build"? >>> Yes, that would be nice. I also think we should mandate the use of >>> that option for OS versions that are EOL for more than X years, for >>> some to-be-determined value of X, like maybe 3 or something. >> It's hard to "mandate" anything in a distributed project like this. >> I don't really see a need to either, at least for cases where an >> old animal isn't causing us extra work. > Lapwing *has* caused extra work though, repeatedly. > > >> When it does, though, it'd be nice to be able to decide "we're not gonna >> support that OS version beyond PG nn", and then have a simple recipe to give >> the BF owner that's less drastic than "shut it down". > The BF is there to be useful for PG development. BF owners contribute them for > that purpose. I don't think we need to keep animals alive when they're past > their shelf life, just to make the animal's owners happy - I suspect most > won't want to keep an animal alive that we don't want. > > That said, I'd be happy if the BF had a slightly easier way to configure "up > to this major version". > The only reason it's not there is that nobody's ever asked for it ;-) You can specify an actual list right now instead of a keyword for branches_to_build. e.g. "branches_to_build => [qw(REL_13_STABLE REL_14_STABLE REL_15_STABLE)]". The only difficulty about this is you need to prune the list when a branch goes EOL. But we could also build in some keyword type settings too, that would interact more sensibly with the server list of branches. So we could say something like "branches_to_build => 'UP_TO_REL_15_STABLE'". That would be pretty simple to code for. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
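Concretely, the two styles described above would look something like this in a build-farm.conf fragment. Note that the `UP_TO_REL_15_STABLE` keyword is only a proposal at this point, not an existing client option:

```perl
# build-farm.conf fragment (sketch, not a complete configuration)

# What works today: list the branches explicitly, and remember to
# prune the list by hand whenever one of them goes EOL.
branches_to_build => [qw(REL_13_STABLE REL_14_STABLE REL_15_STABLE)],

# Proposed keyword form: the client would expand this against the
# server's list of live branches, so no manual pruning is needed.
# branches_to_build => 'UP_TO_REL_15_STABLE',
```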
On Thu, Mar 6, 2025 at 2:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > It's hard to "mandate" anything in a distributed project like this. > I don't really see a need to either, at least for cases where an > old animal isn't causing us extra work. I don't know, to me it feels like we have the argument about whether StegosaurOS is actually dead or whether there might be survivors of the Chicxulub impact hiding somewhere several times a year. The participants in those discussions are pretty much always the same, and their opinions are pretty much always the same, and the threads are long and tiresome. Eventually we usually get consensus to forcibly retire some OS that's been EOL for 5 or 10 years and then we do it all over again the next time somebody's losing their mind. I think all of that energy could be better spent. What really makes this unfortunate is that we typically only have this discussion after that machine has ALREADY caused a bunch of hassle for a bunch of people. You commit something, the BF turns red, you fix it. You do the same thing again. Then maybe a third time. By the time we actually get consensus to remove things, they're generally not just MOSTLY dead. They're at the point where all you can do is check their pockets for loose change. They have kicked the bucket, shuffled off this mortal coil, run down the curtain and gone to join the choir invisible. They are ex-operating systems. It just doesn't seem reasonable to me that you have to basically show up and prove that you've already wasted 100 hours or whatever on a defunct OS before we'll consider giving it the boot. It's a predictable outcome. Once the OS is dead, you can't easily upgrade to newer software versions, which is eventually going to break something for somebody, not to mention that eventually no PG developer other than the BF member owner will be able to get access to a copy of that OS to fix anything that breaks.
It's only a matter of time before that becomes an inconvenience. -- Robert Haas EDB: http://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes: > The only reason it's not there is that nobody's ever asked for it ;-) > You can specify an actual list right now instead of a keyword for > branches_to_build. e.g. "branches_to_build => [qw(REL_13_STABLE > REL_14_STABLE REL_15_STABLE)]". The only difficulty about this is you > need to prune the list when a branch goes EOL. Right; it's that need for manual maintenance that I'd like to get out from under. > But we could also build in some keyword type settings too, that would > interact more sensibly with the server list of branches. > So we could say something like "branches_to_build => > 'UP_TO_REL_15_STABLE'". That would be pretty simple to code for. WFM. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Mar 6, 2025 at 2:13 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It's hard to "mandate" anything in a distributed project like this. >> I don't really see a need to either, at least for cases where an >> old animal isn't causing us extra work. > I don't know, to me it feels like we have the argument about whether > StegosaurOS is actually dead or whether there might be survivors of > the Chixulub impact hiding somewhere several times a year. I think you misunderstood my drift. I'm okay with setting a project policy that we won't support OSes that are more than N years EOL, as long as it's phrased to account for older PG branches properly. My point was that we can implement such a policy in a laissez-faire way: if an older BF animal isn't causing us trouble then why mess with it? Once we *do* recognize that it's causing us trouble, we can apply the still-hypothetical policy and ask the owner to turn it off for branches where it's out of support. regards, tom lane
On Thu, Mar 6, 2025 at 4:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think you misunderstood my drift. I'm okay with setting a project > policy that we won't support OSes that are more than N years EOL, > as long as it's phrased to account for older PG branches properly. Yep, I misunderstood. That sounds awesome. Let's see if others agree. > My point was that we can implement such a policy in a laissez-faire > way: if an older BF animal isn't causing us trouble then why mess > with it? Once we *do* recognize that it's causing us trouble, > we can apply the still-hypothetical policy and ask the owner to > turn it off for branches where it's out of support. Fair enough. This does have the disadvantage that people will still commit things that turn the buildfarm red and have to go into panic mode -- perhaps not even realizing which machines are running an older OS -- and then reactively do this. However, it still sounds like progress over where we are today, because it would (hopefully) remove the need for an argument about each individual case. Honestly, I'm kind of bummed by how fast operating systems and OS versions die these days. It seems crazy to me that I probably couldn't reasonably build a PG from today on the laptop I started at EDB with, or a PG from then on my current laptop. But it doesn't seem like there's much point in fighting the tide. We shouldn't be the only people trying to keep stuff working long after the sell-by date. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Mar 6, 2025 at 4:38 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Mar 6, 2025 at 4:27 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > My point was that we can implement such a policy in a laissez-faire > > way: if an older BF animal isn't causing us trouble then why mess > > with it? Once we *do* recognize that it's causing us trouble, > > we can apply the still-hypothetical policy and ask the owner to > > turn it off for branches where it's out of support. > > Fair enough. This does have the disadvantage that people will still > commit things that turn the buildfarm red and have to go into panic > mode -- perhaps not even realizing which machines are running an older > OS -- and then reactively do this. However, it still sounds like > progress over where we are today, because it would (hopefully) remove > the need for an argument about each individual case. One thing I've been wishing for recently is access to the discussion and lore around individual buildfarm animals in a consolidated place. As a new committer, I haven't been part of all of these discussions over the last N years and so if an animal goes red, I can look through the buildfarm status history to see what kind of failures it has had for the last ~6 months, but it doesn't tell me if that machine is known for failing because of an outdated OS or some other platform-specific or hardware-specific issue. For some animals searching for "skink" in the email archives might be enough, but I haven't found that to be true so far. - Melanie
On Thu, Mar 6, 2025 at 4:54 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > One thing I've been wishing for recently is access to the discussion > and lore around individual buildfarm animals in a consolidated place. > As a new committer, I haven't been part of all of these discussions > over the last N years and so if an animal goes red, I can look through > the buildfarm status history to see what kind of failures it has had > for the last ~6 months, but it doesn't tell me if that machine is > known for failing because of an outdated OS or some other > platform-specific or hardware-specific issue. This is a problem for me, too, so I would also be happy about this. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Mar 06, 2025 at 02:28:20PM -0500, Andres Freund wrote: > Hi, > > On 2025-03-06 14:13:40 -0500, Tom Lane wrote: > > Robert Haas <robertmhaas@gmail.com> writes: > > > On Thu, Mar 6, 2025 at 1:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > >> Maybe invent a build-farm.conf option like "newest_branch_to_build"? > > > > > Yes, that would be nice. I also think we should mandate the use of > > > that option for OS versions that are EOL for more than X years, for > > > some to-be-determined value of X, like maybe 3 or something. > > > > It's hard to "mandate" anything in a distributed project like this. > > I don't really see a need to either, at least for cases where an > > old animal isn't causing us extra work. > > Lapwing *has* caused extra work though, repeatedly. > > > > When it does, though, it'd be nice to be able to decide "we're not gonna > > support that OS version beyond PG nn", and then have a simple recipe to give > > the BF owner that's less drastic than "shut it down". > > The BF is there to be useful for PG development. BF owners contribute them for > that purpose. I don't think we need to keep animals alive when they're past > their shelf life, just to make the animal's owners happy - I suspect most > won't want to keep an animal alive that we don't want. I indeed don't want to keep lapwing up unless there is any value. Note that it started to fail on 2 branches after the last buildfarm client update for reasons I don't understand. Since everyone is complaining about lapwing already I will just turn it off. If anyone is interested in keeping it up on REL_13_STABLE until its EOL (so for the next ~6 months) let me know.
On Thu, Mar 6, 2025 at 5:12 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > I indeed don't want to keep lapwing up unless there is any value. Note that it > started to fail on 2 branches after the last buildfarm client update for > reasons I don't understand. Since everyone is complaining about lapwing > already I will just turn it off. If anyone is interested in keeping it up on > REL_13_STABLE until its EOL (so for the next ~6 months) let me know. We're not complaining, we're just discussing. :-) In all seriousness, I appreciate the willingness of you and others to run BF members and of Andrew to maintain the BF client. It's kind of a thankless job, I'm sure, but we will be much worse off as a project if people just stop doing those things. Wanting to figure out how we can do better or wondering whether a certain machine makes sense any more doesn't translate into not appreciating the work. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Mar 06, 2025 at 06:25:29PM -0500, Robert Haas wrote: > On Thu, Mar 6, 2025 at 5:12 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > I indeed don't want to keep lapwing up unless there is any value. Note that it > > started to fail on 2 branches after the last buildfarm client update for > > reasons I don't understand. Since everyone is complaining about lapwing > > already I will just turn it off. If anyone is interested in keeping it up on > > REL_13_STABLE until its EOL (so for the next ~6 months) let me know. > > We're not complaining, we're just discussing. :-) Well, in the quoted part of my answer there was: > Lapwing *has* caused extra work though, repeatedly. In my book that's complaining. At least Tom and Alvaro complained recently too. > > In all seriousness, I appreciate the willingness of you and others to > run BF members and of Andrew to maintain the BF client. It's kind of a > thankless job, I'm sure, but we will be much worse off as a project if > people just stop doing those things. Wanting to figure out how we can > do better or wondering whether a certain machine makes sense any more > doesn't translate into not appreciating the work. I'm fine with the thankless job, I got used to it. But being blamed for it is a big no. Honestly, it's been years of people complaining on one thing or another about lapwing without ever asking for a change. Was it really hard to ask "can you remove the -Werror, it's not useful anymore" the first time it caused extra work? Instead I have to guess what people want. So after a few complaints I removed that flag. And now after a few more complaints I turned it off. If that's not what you want, well, too bad, but that's on you, not me.
On 06.03.25 22:54, Melanie Plageman wrote: > One thing I've been wishing for recently is access to the discussion > and lore around individual buildfarm animals in a consolidated place. > As a new committer, I haven't been part of all of these discussions > over the last N years and so if an animal goes red, I can look through > the buildfarm status history to see what kind of failures it has had > for the last ~6 months, but it doesn't tell me if that machine is > known for failing because of an outdated OS or some other > platform-specific or hardware-specific issue. I'm not sure there is more than that to it. There is at any given time necessarily a trailing-edge system, which is where you tend to run into trouble if you try to use compiler or library features past some invisible boundary, and which also tends to have flaky hardware and perhaps limited resources in general, so you might also see some more test instability there. That used to be an HP-UX 10 system, and then later it was an OpenBSD on Sparc system, and now it's seemingly this one, and perhaps there were a few in between, and there will be a new one soon. But this keeps moving, and the lore from more than 12 months ago is probably not that useful here.
On Thu, Mar 6, 2025 at 7:03 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > Honestly, it's been years of people complaining on one thing or another about > lapwing without ever asking for a change. Was it really hard to ask "can you > remove the -Werror it's not useful anymore" the first time it caused extra > work? Instead I have to guess what people want. So after a few complaints I > removed that flag. And now after a few more complaints I turned it off. If > that's not what you want, well too bad but that's on you, not me. This is actually much harder for me as a committer than you might guess. How is an individual committer working on an individual issue supposed to know that removing -Werror is the right thing vs. fixing the warning in some way? As Melanie also mentions in her reply, committers are not born knowing or understanding which machines are chronic problems, and sometimes it's very difficult to figure that out. If you read every message on the mailing list you're probably going to have more idea than if you don't, but that's more than most people can do these days, and you can still miss things. But I think your complaint here is actually getting at another problem with the way that we do the buildfarm as a project: it's completely unplanned. People show up and run random buildfarm machines and nobody knows why they are running those machines: was it because they care about support for that platform, or was it just to be nice, or were they testing some unusual configuration, or did they just set it up a long time ago and never turn it off? And then other buildfarm members that maybe we ought to have are missing and nobody knows if anyone else is working on that, or maybe they don't even know about it. And then, to your point, nobody ever shows up and tells a buildfarm member owner what we'd like them to do. There is no "what we'd like them to do" -- we have no policy or preference or anything as a group.
Everybody's just guessing what other people want and care about, and then sometimes we're all grumpy at each other. But notice that with CI, it's the other way around. Some small group of people decide what our CI setup should do and then they configure it to do that thing. You can agree with those decisions or not, but they are intentional. The platforms and OS versions that are being tested are what somebody decided was best from among the available options. Now I'm not saying that sort of centralized planning is without flaws -- and the fact that only a handful of people seem to understand how to keep this CI stuff working is definitely one of them -- but it also has some strengths, namely that you remove a lot of this guesswork around what the other people involved actually want. I'm not sure exactly where I'm going with this line of thought, but I do wonder if we ought to find a way to be more intentional about the buildfarm instead of just letting things happen and then being sad that we didn't get what we wanted. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Mar 07, 2025 at 08:52:26AM -0500, Robert Haas wrote: > On Thu, Mar 6, 2025 at 7:03 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > Honestly, it's been years of people complaining on one thing or another about > > lapwing without ever asking for a change. Was it really hard to ask "can you > > remove the -Werror it's not useful anymore" the first time it caused extra > > work? Instead I have to guess what people want. So after a few complaints I > > removed that flag. And now after a few more complaints I turned it off. If > > that's not what you want, well too bad but that's on you, not me. > > This is actually much harder for me as a committer than you might > guess. How is an individual committer working on an individual issue > supposed to know that removing -Werror is the right thing vs. fixing the > warning in some way? That may be true in general, but I don't think it was the case here. The "repeated extra work" here was replacing {0} with {{0}} or something like that, due to a compiler bug. It was outlined multiple times. The choice is simple: either doing that is so much pain that the -Werror should be removed, or the odds of that compiler and/or 32-bit architecture revealing an actual bug are high enough that it's a small price to pay. I was fine either way, and I didn't think it was my choice to make since I'm not the one responsible for dealing with it. But none of that happened; instead people just chose the easy solution of complaining for the sake of complaining without offering an actual solution. Anyway, it seems that the only remaining use for this animal would be to test REL_13_STABLE, which will be EOL in a few months, and I don't see the point of dealing with more complaining just for that given how little activity is going to happen on that branch, so as I said I turned it off.
I've renamed the thread because it's gone a long way past what happened with lapwing.
I agree that the coverage is unplanned, and that we should do better about it.
However, we can't cover every possible combination of Architecture, OS, Compiler, Build System, Build Options. There are roughly 170 currently reporting buildfarm animals. Comprehensive coverage would mean a fleet of thousands, even if we ran multiple animals on a single machine. And I'm not sure that would be a win - the status page would become unmanageable and useless. So we need to be strategic about it.
So let's start with a couple of simple questions:
1. what are the perceived major gaps in coverage?
2. what animals (if any) should be turned off or updated?
I'll start with my nomination for 1: we need a Windows animal that builds in the same way and with the same options as the installer binaries. That's something we are actually working on at EDB.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com