Re: Refactoring postmaster's code to cleanup after child exit - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Refactoring postmaster's code to cleanup after child exit
Msg-id 217d43af-0287-4769-a825-cde4cfa00e6c@iki.fi
In response to Re: Refactoring postmaster's code to cleanup after child exit  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Refactoring postmaster's code to cleanup after child exit
List pgsql-hackers
On 05/10/2024 01:03, Thomas Munro wrote:
> On Sat, Oct 5, 2024 at 7:41 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> My test for dead-end backends opens 20 TCP (or unix domain) connections
>> to the server, in quick succession. That works fine on my system, and it
>> passed cirrus CI on other platforms, but on FreeBSD it failed
>> repeatedly. The behavior in that scenario is apparently
>> platform-dependent: it depends on the accept queue size, but what
>> happens when you reach the queue size also seems to depend on the
>> platform. On my Linux system, the connect() calls in the client are
>> blocked if the server doesn't call accept() fast enough, but
>> apparently you get an error on *BSD systems.
> 
> Right, we've analysed that difference in AF_UNIX implementation
> before[1], which shows up in the real world, where client sockets, i.e.
> libpq's, are usually non-blocking, as EAGAIN on Linux (which is not
> valid per POSIX) vs ECONNREFUSED on other OSes.  All fail to connect,
> but the error message is different.

Thanks for the pointer!
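
For anyone who wants to see the non-blocking difference described above on their own platform, here is a rough sketch in Python (not the Perl test from the patch). It listens on an AF_UNIX socket with a small backlog, never calls accept(), and keeps making non-blocking connect() calls until one fails. The function name probe_full_backlog, the backlog of 2, and the cap of 64 attempts are all made up for illustration. Based on the discussion above I would expect EAGAIN on Linux and ECONNREFUSED on the BSD-derived systems, but I have not checked every platform.

```python
import errno
import os
import socket
import tempfile


def probe_full_backlog(backlog=2, max_attempts=64):
    """Connect repeatedly to a listener that never calls accept().

    Returns (successful_connects, errno_name). errno_name is None if no
    connect() failed within max_attempts (hypothetical cap, to avoid
    looping forever on a platform with a very deep queue).
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "probe.sock")
        listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        clients = []  # keep clients open so queued connections stay queued
        try:
            listener.bind(path)
            listener.listen(backlog)
            for _ in range(max_attempts):
                client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
                client.setblocking(False)  # non-blocking, like libpq's sockets
                try:
                    client.connect(path)
                except OSError as exc:
                    client.close()
                    name = errno.errorcode.get(exc.errno, str(exc.errno))
                    return len(clients), name
                clients.append(client)
            return len(clients), None
        finally:
            for client in clients:
                client.close()
            listener.close()


if __name__ == "__main__":
    print(probe_full_backlog())
```

The number of connects that succeed before the first failure depends on how the kernel interprets the backlog, so the sketch only reports it rather than assuming a value.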

> For blocking AF_UNIX client sockets like in your test, Linux
> effectively has an infinite queue made from two layers.  The listen
> queue (a queue of connecting sockets) does respect the requested
> backlog size, but when it's full it has an extra trick: the connect()
> call waits (in a queue of threads) for space to become free in the
> listen queue, so it's effectively unlimited (but only for blocking
> sockets), while FreeBSD and I suspect any other implementation
> deriving from or reimplementing the BSD socket code gives you
> ECONNREFUSED.  macOS behaves just the same as FreeBSD AFAICT, so I
> don't know why you didn't see the same thing... I guess it's just
> racing against accept() draining the queue.

In fact I misremembered: the failure happened on macOS, *not* FreeBSD. 
It could be just luck I didn't see it on FreeBSD though.

> It's possible that Windows copied the Linux behaviour for AF_UNIX,
> given that it probably has something to do with the WSL project for
> emulating Linux, but IDK.

Sadly, IO::Socket::UNIX hasn't been implemented on Windows (or
at least not in the Perl distribution we're using in Cirrus CI):

Socket::pack_sockaddr_un not implemented on this architecture at 
C:/strawberry/5.26.3.1/perl/lib/Socket.pm line 872.

so I'll have to disable this test on Windows anyway.

-- 
Heikki Linnakangas
Neon (https://neon.tech)



