Re: buildfarm instance bichir stuck - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: buildfarm instance bichir stuck
Date
Msg-id CA+hUKGK2eubAdK3TjrbRMf=htWegcL-09EPNh5xJrp6ZGSgPTw@mail.gmail.com
In response to Re: buildfarm instance bichir stuck  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On Fri, Apr 9, 2021 at 6:11 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Wed, Apr 7, 2021 at 7:31 PM Robins Tharakan <tharakan@gmail.com> wrote:
> > Correct. This is easily reproducible on this test-instance, so let me know if you want me to test a patch.
>
> From your description it sounds like signals are not arriving at all,
> rather than some more complicated race.  Let's go back to basics...

I was looking into the portability of SIGURG and OOB socket data for
something totally different (hallway track discussion from PGCon,
could we use that for query cancel, like FTP does, instead of opening
another socket?), and lo and behold, someone has figured out a
workaround for this latch problem:

https://github.com/microsoft/WSL/issues/8619

I don't really want to add code to scrape uname() output to detect
different kernels at runtime as shown there, but it doesn't seem to
make a difference on Linux if we just always do what was suggested.  I
didn't look too hard into whether that is the right place to put the
call, or really understand *why* it works.  Since I am not a
Windows user and we don't have a WSL1 CI, I can't confirm that it
works, or explore whether some other ordering of operations
would be better but still work.  But if that does the trick then
maybe we should just do something like the attached.
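
For reference, the kind of runtime kernel detection being avoided here
might look roughly like the sketch below.  This is not the code from the
linked issue or from the attached patch.  The function names are
hypothetical, and the check rests on the assumption that WSL kernels
report a release string containing "Microsoft" or "microsoft".

```c
#include <stdbool.h>
#include <string.h>
#include <sys/utsname.h>

/*
 * Hypothetical helper: guess from a kernel release string whether we are
 * running under WSL.  Assumption: WSL kernels report a release containing
 * "Microsoft" (or "microsoft"); this is a heuristic, not a documented API.
 */
bool
release_looks_like_wsl(const char *release)
{
    return strstr(release, "Microsoft") != NULL ||
           strstr(release, "microsoft") != NULL;
}

/*
 * Hypothetical wrapper: ask the kernel for its release string via uname()
 * and apply the heuristic above.  Returns false if uname() fails.
 */
bool
running_on_wsl(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return false;
    return release_looks_like_wsl(u.release);
}
```

String matching like this is brittle, which is part of the appeal of
doing the workaround unconditionally instead.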

Thoughts?

Attachment
