Re: Changing the state of data checksums in a running cluster - Mailing list pgsql-hackers

From Daniel Gustafsson
Subject Re: Changing the state of data checksums in a running cluster
Date
Msg-id 7FF767F0-EBAC-43EF-B93A-E750015D1D31@yesql.se
In response to Re: Changing the state of data checksums in a running cluster  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
> On 5 Apr 2026, at 06:56, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2026-04-05 00:27:00 +0200, Daniel Gustafsson wrote:
>>> On 4 Apr 2026, at 02:35, Daniel Gustafsson <daniel@yesql.se> wrote:
>>>
>>>> On 4 Apr 2026, at 00:59, Daniel Gustafsson <daniel@yesql.se> wrote:
>>>>
>>>>> On 3 Apr 2026, at 23:46, Daniel Gustafsson <daniel@yesql.se> wrote:
>>>>>
>>>>> After many more runs on CI I ended up pushing this version, and I see BF
>>>>> members being angry due to the test not waiting for the launcher to exit.  I am
>>>>> working on a fix right now.
>>>>
>>>> 0036232ba8f seems to have made the failing animals slightly happier, I will
>>>> continue to monitor the buildfarm for other fallout.
>>>
>>> The intermittent failure on kestrel implies a timing issue similar to the one fixed in
>>> 0036232ba8fb28; a tentative fix is to make it part of waiting for an endstate
>>> (on or off) to make sure the cluster is always in the right state for new
>>> operations.  Right now kestrel is the one which has been flapping, I'm waiting
>>> a bit to see if more will follow and give further clues.
>>
>> mylodon had the same failure, and I believe the bug is in my injection point
>> test code.  I have a tentative fix in the attached refactoring which moves over
>> to using the injection_point extension module.  It's still fairly rare so I'm
>> holding off for a little bit before pushing it to see if I can collect a little
>> bit more evidence.
>
> There are a lot of checksum-related errors on CI:
>
> https://cirrus-ci.com/task/4848298592305152
> https://cirrus-ci.com/task/5338691381493760
> https://cirrus-ci.com/task/6271077241847808
> https://cirrus-ci.com/task/6150048418889728
>
> They probably are mostly the issues you know about.  It'd be nice to get them
> fixed soon-ish...

I am investigating them and have tentative fixes that I will apply tonight.

--
Daniel Gustafsson



