pgbench stopped supporting large number of client connections on Windows - Mailing list pgsql-hackers

From Marina Polyakova
Subject pgbench stopped supporting large number of client connections on Windows
Date
Msg-id 8225e78650dd69f69c8cff37ecce9a09@postgrespro.ru
Responses Re: pgbench stopped supporting large number of client connections on Windows  (Fabien COELHO <coelho@cri.ensmp.fr>)
List pgsql-hackers
Hello, hackers!

While trying to test a patch that adds a synchronization barrier in 
pgbench [1] on Windows, I found that since the commit "Use ppoll(2), if 
available, to wait for input in pgbench." [2] I cannot use a large 
number of client connections in pgbench on my Windows virtual machines 
(Windows Server 2008 R2 and Windows 2019), for example:

> bin\pgbench.exe -c 90 -S -T 3 postgres
starting vacuum...end.
too many client connections for select()

Almost the same thing happens with reindexdb and vacuumdb (built on 
commit [3]):

> bin\reindexdb.exe -j 95 postgres
reindexdb: fatal: too many jobs for this platform -- try 90

> bin\vacuumdb.exe -j 95 postgres
vacuumdb: vacuuming database "postgres"
vacuumdb: fatal: too many jobs for this platform -- try 90

IIUC the checks below are not correct on Windows, since on this system 
sockets can have values equal to or greater than FD_SETSIZE (see Windows 
documentation [4] and pgbench debug output in attached 
pgbench_debug.txt).

src/bin/pgbench/pgbench.c, the function add_socket_to_set:
if (fd < 0 || fd >= FD_SETSIZE)
{
    /*
     * Doing a hard exit here is a bit grotty, but it doesn't seem worth
     * complicating the API to make it less grotty.
     */
    pg_log_fatal("too many client connections for select()");
    exit(1);
}

src/bin/scripts/scripts_parallel.c, the function ParallelSlotsSetup:
/*
  * Fail and exit immediately if trying to use a socket in an
  * unsupported range.  POSIX requires open(2) to use the lowest
  * unused file descriptor and the hint given relies on that.
  */
if (PQsocket(conn) >= FD_SETSIZE)
{
    pg_log_fatal("too many jobs for this platform -- try %d", i);
    exit(1);
}
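
To illustrate the difference between the two platforms, here is a minimal 
sketch of what a platform-aware check might look like. It is not the 
attached patch and not code from the PostgreSQL tree: the helper name 
socket_fits_in_fd_set, the typedef sketch_socket and the idea of passing in 
the number of sockets already in the set are assumptions made purely for 
illustration. It relies on the fact that Winsock's fd_set is an array of 
socket handles (limited by count), while a POSIX fd_set is a bitmap indexed 
by descriptor value.

```c
#include <stdbool.h>

#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET sketch_socket;	/* hypothetical alias, for illustration only */
#else
#include <sys/select.h>
typedef int sketch_socket;		/* hypothetical alias, for illustration only */
#endif

/*
 * Hypothetical helper: report whether one more socket can be added to a
 * select() set that already holds sockets_already_in_set entries.
 */
static bool
socket_fits_in_fd_set(sketch_socket fd, int sockets_already_in_set)
{
#ifdef _WIN32
	/*
	 * Winsock: fd_set stores the handles themselves, so the handle value
	 * may exceed FD_SETSIZE; only the number of stored sockets is limited.
	 */
	return fd != INVALID_SOCKET && sockets_already_in_set < FD_SETSIZE;
#else
	/*
	 * POSIX: fd_set is a bitmap indexed by descriptor value, so the value
	 * itself must be below FD_SETSIZE.
	 */
	(void) sockets_already_in_set;
	return fd >= 0 && fd < FD_SETSIZE;
#endif
}
```

With something like this, the callers above would count the sockets they 
have added so far instead of comparing the socket value with FD_SETSIZE on 
Windows; whether that fits the existing APIs is a separate question.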

I tried to fix this, see the attached fix_max_client_conn_on_Windows.patch 
(based on commit [3]). I checked it for reindexdb and vacuumdb, and it 
works for simple databases (1025 jobs are rejected and 1024 jobs are 
accepted). Unfortunately, pgbench got connection errors when it tried 
to use 1000 jobs on my virtual machines, although there were no errors 
with fewer jobs (500) and the same number of clients (1000)...

Any suggestions are welcome!

[1] 
https://www.postgresql.org/message-id/flat/20200227180100.zyvjwzcpiokfsqm2%40alap3.anarazel.de
[2] 
https://github.com/postgres/postgres/commit/60e612b602999e670f2d57a01e52799eaa903ca9
[3] 
https://github.com/postgres/postgres/commit/48e1291342dd7771cf8c67aa1d7ec1f394b95dd8
[4] From 
https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select 
:
Internally, socket handles in an fd_set structure are not represented as 
bit flags as in Berkeley Unix. Their data representation is opaque.

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
