Thread: pgbench stopped supporting large number of client connections on Windows
pgbench stopped supporting large number of client connections on Windows
From
Marina Polyakova
Date:
Hello, hackers! While trying to test a patch that adds a synchronization barrier in pgbench [1] on Windows, I found that since the commit "Use ppoll(2), if available, to wait for input in pgbench." [2] I cannot use a large number of client connections in pgbench on my Windows virtual machines (Windows Server 2008 R2 and Windows 2019), for example: > bin\pgbench.exe -c 90 -S -T 3 postgres starting vacuum...end. too many client connections for select() The almost same thing happens with reindexdb and vacuumdb (build on commit [3]): > bin\reindexdb.exe -j 95 postgres reindexdb: fatal: too many jobs for this platform -- try 90 > bin\vacuumdb.exe -j 95 postgres vacuumdb: vacuuming database "postgres" vacuumdb: fatal: too many jobs for this platform -- try 90 IIUC the checks below are not correct on Windows, since on this system sockets can have values equal to or greater than FD_SETSIZE (see Windows documentation [4] and pgbench debug output in attached pgbench_debug.txt). src/bin/pgbench/pgbench.c, the function add_socket_to_set: if (fd < 0 || fd >= FD_SETSIZE) { /* * Doing a hard exit here is a bit grotty, but it doesn't seem worth * complicating the API to make it less grotty. */ pg_log_fatal("too many client connections for select()"); exit(1); } src/bin/scripts/scripts_parallel.c, the function ParallelSlotsSetup: /* * Fail and exit immediately if trying to use a socket in an * unsupported range. POSIX requires open(2) to use the lowest * unused file descriptor and the hint given relies on that. */ if (PQsocket(conn) >= FD_SETSIZE) { pg_log_fatal("too many jobs for this platform -- try %d", i); exit(1); } I tried to fix this, see attached fix_max_client_conn_on_Windows.patch (based on commit [3]). I checked it for reindexdb and vacuumdb, and it works for simple databases (1025 jobs are not allowed and 1024 jobs is ok). Unfortunately, pgbench was getting connection errors when it tried to use 1000 jobs on my virtual machines, although there were no errors for fewer jobs (500) and the same number of clients (1000)... Any suggestions are welcome! [1] https://www.postgresql.org/message-id/flat/20200227180100.zyvjwzcpiokfsqm2%40alap3.anarazel.de [2] https://github.com/postgres/postgres/commit/60e612b602999e670f2d57a01e52799eaa903ca9 [3] https://github.com/postgres/postgres/commit/48e1291342dd7771cf8c67aa1d7ec1f394b95dd8 [4] From https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-select : Internally, socket handles in an fd_set structure are not represented as bit flags as in Berkeley Unix. Their data representation is opaque. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
re: pgbench stopped supporting large number of client connections on Windows
From
Ranier Vilela
Date:
Hi Marina,
Nice catch.
>rc/bin/pgbench/pgbench.c, the function add_socket_to_set:
>if (fd < 0 || fd >= FD_SETSIZE)
>{
>/*
>* Doing a hard exit here is a bit grotty, but it doesn't seem worth
>* complicating the API to make it less grotty.
>*/
>pg_log_fatal("too many client connections for select()");
>exit(1);
>}
It seems to me that the limit is hardcode in, src/backend/port/win32/socket.c
FD_SETSIZE * 2
that would be 2048?
regards,
Ranier Vilela
Re: pgbench stopped supporting large number of client connections on Windows
From
Fabien COELHO
Date:
Hello Marina, > While trying to test a patch that adds a synchronization barrier in pgbench > [1] on Windows, Thanks for trying that, I do not have a windows setup for testing, and the sync code I wrote for Windows is basically blind coding:-( > I found that since the commit "Use ppoll(2), if available, to > wait for input in pgbench." [2] I cannot use a large number of client > connections in pgbench on my Windows virtual machines (Windows Server 2008 R2 > and Windows 2019), for example: > >> bin\pgbench.exe -c 90 -S -T 3 postgres > starting vacuum...end. ISTM that 1 thread with 90 clients is a bad idea, see below. > The almost same thing happens with reindexdb and vacuumdb (build on > commit [3]): Windows fd implementation is somehow buggy because it does not return the smallest number available, and then with the assumption that select uses a dense array indexed with them (true on linux, less so on Windows which probably uses a sparse array), so that the number gets over the limit, even if less are actually used, hence the catch, as you noted. Another point is windows has a hardcoded number of objects one thread can really wait for, typically 64, so that waiting for more requires actually forking threads to do the waiting. But if you are ready to fork threads just to wait, then probaly you could have started pgbench with more threads in the first place. Now it would probably not make the problem go away because fd numbers would be per process, not per thread, but it really suggests that one should not load a thread is more than 64 clients. > IIUC the checks below are not correct on Windows, since on this system > sockets can have values equal to or greater than FD_SETSIZE (see Windows > documentation [4] and pgbench debug output in attached pgbench_debug.txt). Okay. But then, how may one detect that there are too many fds in the set? I think that an earlier version of the code needed to make assumptions about the internal implementation of windows (there is a counter somewhere in windows fd_set struct), which was rejected because if was breaking the interface. Now your patch is basically resurrecting that. Why not if there is no other solution, but this is quite depressing, and because it breaks the interface it would be broken if windows changed its internals for some reason:-( Doesn't windows has "ppoll"? Should we implement the stuff above windows polling capabilities and coldly skip its failed posix portability attempts? This raises again the issue that you should not have more that 64 clients per thread anyway, because it is an intrinsic limit on windows. I think that at one point it was suggested to error or warn if nclients/nthreads is too great, but that was not kept in the end. > I tried to fix this, see attached fix_max_client_conn_on_Windows.patch (based > on commit [3]). I checked it for reindexdb and vacuumdb, and it works for > simple databases (1025 jobs are not allowed and 1024 jobs is ok). > Unfortunately, pgbench was getting connection errors when it tried to use > 1000 jobs on my virtual machines, although there were no errors for fewer > jobs (500) and the same number of clients (1000)... It seems that the max number of threads you can start depends on available memory, because each thread is given its own stack, so it would depend on your vm settings? > Any suggestions are welcome! Use ppoll, and start more threads but not too many? -- Fabien.
Fabien COELHO <coelho@cri.ensmp.fr> writes: >> Any suggestions are welcome! > Use ppoll, and start more threads but not too many? Does ppoll exist on Windows? There was a prior thread on this topic, which seems to have drifted off into the sunset: https://www.postgresql.org/message-id/flat/BL0PR1901MB1985F219C46C61EDE7036C34ED8E0%40BL0PR1901MB1985.namprd19.prod.outlook.com regards, tom lane
Re: pgbench stopped supporting large number of client connections on Windows
From
Marina Polyakova
Date:
On 2020-11-06 23:54, Ranier Vilela wrote: > Hi Marina, Hello! > Nice catch. Thank you! >> rc/bin/pgbench/pgbench.c, the function add_socket_to_set: >> if (fd < 0 || fd >= FD_SETSIZE) >> { >> /* >> * Doing a hard exit here is a bit grotty, but it doesn't seem worth >> * complicating the API to make it less grotty. >> */ >> pg_log_fatal("too many client connections for select()"); >> exit(1); >> } > > It seems to me that the limit is hardcode in, > src/backend/port/win32/socket.c > > FD_SETSIZE * 2 > > that would be 2048? 1) If you mean the function pgwin32_select in the file src/backend/port/win32/socket.c, IIUC it is only used in the backend, see src/include/port/win32_port.h: #ifndef FRONTEND <...> #define select(n, r, w, e, timeout) pgwin32_select(n, r, w, e, timeout) <...> #endif /* FRONTEND */ 2) It looks like FD_SETSIZE does not set a limit on the socket value on Windows, see https://docs.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2 : The maximum number of sockets that a Windows Sockets application can use is not affected by the manifest constant FD_SETSIZE. This value defined in the Winsock2.h header file is used in constructing the FD_SET structures used with select function. -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: pgbench stopped supporting large number of client connections on Windows
From
Ranier Vilela
Date:
Em sáb., 7 de nov. de 2020 às 14:55, Marina Polyakova <m.polyakova@postgrespro.ru> escreveu:
On 2020-11-06 23:54, Ranier Vilela wrote:
> Hi Marina,
Hello!
1) If you mean the function pgwin32_select in the file
src/backend/port/win32/socket.c, IIUC it is only used in the backend,
see src/include/port/win32_port.h:
#ifndef FRONTEND
<...>
#define select(n, r, w, e, timeout) pgwin32_select(n, r, w, e, timeout)
<...>
#endif /* FRONTEND */
Yes. My mistake, you right here.
2) It looks like FD_SETSIZE does not set a limit on the socket value on
Windows, see
https://docs.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2
:
The maximum number of sockets that a Windows Sockets application can use
is not affected by the manifest constant FD_SETSIZE. This value defined
in the Winsock2.h header file is used in constructing the FD_SET
structures used with select function.
Correct.
It seems that the limit will be defined by compilation, before the inclusion of Winsock2.h.
Have you tried to define -DFD_SETSIZE=2048
Have you tried to define -DFD_SETSIZE=2048
best regards,
Ranier Vilela
Re: pgbench stopped supporting large number of client connections on Windows
From
Fabien COELHO
Date:
Hello Tom, >> Use ppoll, and start more threads but not too many? > > Does ppoll exist on Windows? Some g*gling suggest that the answer is no. > There was a prior thread on this topic, which seems to have drifted off > into the sunset: Indeed. I may have contributed to this dwindling by not adding a CF entry for this thread, so that there was no reminder anywhere. > https://www.postgresql.org/message-id/flat/BL0PR1901MB1985F219C46C61EDE7036C34ED8E0%40BL0PR1901MB1985.namprd19.prod.outlook.com It seems that there is no simple good solution around windows wait event implementations: - timeout seems to be milliseconds on all things I saw - 64 is an intrinsic limit, probably because of the underlying n² implementations Maybe we could provide a specific windows implementation limited to 64 fd (which is not a bad thing bench-wise) but with a rounded-down timeout, so that it could end-up on an activate spinning wait in some cases, which is probably not a bug issue, all things considered. -- Fabien.
Re: pgbench stopped supporting large number of client connections on Windows
From
Marina Polyakova
Date:
On 2020-11-07 01:01, Fabien COELHO wrote: > Hello Marina, Hello, Fabien! Thank you for your comments! >> While trying to test a patch that adds a synchronization barrier in >> pgbench [1] on Windows, > > Thanks for trying that, I do not have a windows setup for testing, and > the sync code I wrote for Windows is basically blind coding:-( FYI: 1) It looks like pgbench will no longer support Windows XP due to the function DeleteSynchronizationBarrier. From https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-deletesynchronizationbarrier : Minimum supported client: Windows 8 [desktop apps only] Minimum supported server: Windows Server 2012 [desktop apps only] On Windows Server 2008 R2 (MSVC 2013) the 6-th version of the patch [1] has compiled without (new) warnings, but when running pgbench I got the following error: The procedure entry point DeleteSynchronizationBarrier could not be located in the dynamic link library KERNEL32.dll. 2) On Windows Server 2019 (MSVC 2019) the 6-th version of the patch [1] with fix_max_client_conn_on_Windows.patch has compiled without (new) warnings. I made a few runs (-M prepared -c 100 -j 10 -T 10 -P1 -S) with and without your patches. On Linux (-M prepared -c 1000 -j 500 -T 10 -P1 -S) your patches fix problems with progress reports as in [2], but on Windows I did not notice such changes, see attached pgbench_runs_linux_vs_windows.zip. >> The almost same thing happens with reindexdb and vacuumdb (build on >> commit [3]): > > Windows fd implementation is somehow buggy because it does not return > the smallest number available, and then with the assumption that > select uses a dense array indexed with them (true on linux, less so on > Windows which probably uses a sparse array), so that the number gets > over the limit, even if less are actually used, hence the catch, as > you noted. I agree with you. It looks like the structure fd_set just contains used sockets by this application on Windows, and the macro FD_SETSIZE is used only here. From https://docs.microsoft.com/en-us/windows/win32/api/winsock/ns-winsock-fd_set : typedef struct fd_set { u_int fd_count; SOCKET fd_array[FD_SETSIZE]; } fd_set, FD_SET, *PFD_SET, *LPFD_SET; From https://docs.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2 : The maximum number of sockets that a Windows Sockets application can use is not affected by the manifest constant FD_SETSIZE. This value defined in the Winsock2.h header file is used in constructing the FD_SET structures used with select function. >> IIUC the checks below are not correct on Windows, since on this system >> sockets can have values equal to or greater than FD_SETSIZE (see >> Windows documentation [4] and pgbench debug output in attached >> pgbench_debug.txt). > > Okay. > > But then, how may one detect that there are too many fds in the set? > > I think that an earlier version of the code needed to make assumptions > about the internal implementation of windows (there is a counter > somewhere in windows fd_set struct), which was rejected because if was > breaking the interface. Now your patch is basically resurrecting that. I tried to keep the behaviour "we check if the socket value can be used in select() at runtime", but now I will also read that thread... > Why not if there is no other solution, but this is quite depressing, > and because it breaks the interface it would be broken if windows > changed its internals for some reason:-( It looks like if the internals of the structure fd_set are changed, we will also have problems with the function pgwin32_select from src/backend/port/win32/socket.c, because it uses fd_set.fd_count too?.. (I'm writing responses to the rest of your comments but it takes time...) [1] https://www.postgresql.org/message-id/alpine.DEB.2.22.394.2011021726390.989361%40pseudo [2] https://www.postgresql.org/message-id/20200227185129.hikscyenomnlrord%40alap3.anarazel.de -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
Re: pgbench stopped supporting large number of client connections on Windows
From
Masahiko Sawada
Date:
Hi Marina, On Sun, Nov 8, 2020 at 11:59 PM Marina Polyakova <m.polyakova@postgrespro.ru> wrote: > > On 2020-11-07 01:01, Fabien COELHO wrote: > > Hello Marina, > > Hello, Fabien! > > Thank you for your comments! > > >> While trying to test a patch that adds a synchronization barrier in > >> pgbench [1] on Windows, > > > > Thanks for trying that, I do not have a windows setup for testing, and > > the sync code I wrote for Windows is basically blind coding:-( > > FYI: > > 1) It looks like pgbench will no longer support Windows XP due to the > function DeleteSynchronizationBarrier. From > https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-deletesynchronizationbarrier > : > > Minimum supported client: Windows 8 [desktop apps only] > Minimum supported server: Windows Server 2012 [desktop apps only] > > On Windows Server 2008 R2 (MSVC 2013) the 6-th version of the patch [1] > has compiled without (new) warnings, but when running pgbench I got the > following error: > > The procedure entry point DeleteSynchronizationBarrier could not be > located in the dynamic link library KERNEL32.dll. > > 2) On Windows Server 2019 (MSVC 2019) the 6-th version of the patch [1] > with fix_max_client_conn_on_Windows.patch has compiled without (new) > warnings. I made a few runs (-M prepared -c 100 -j 10 -T 10 -P1 -S) with > and without your patches. On Linux (-M prepared -c 1000 -j 500 -T 10 -P1 > -S) your patches fix problems with progress reports as in [2], but on > Windows I did not notice such changes, see attached > pgbench_runs_linux_vs_windows.zip. > > >> The almost same thing happens with reindexdb and vacuumdb (build on > >> commit [3]): > > > > Windows fd implementation is somehow buggy because it does not return > > the smallest number available, and then with the assumption that > > select uses a dense array indexed with them (true on linux, less so on > > Windows which probably uses a sparse array), so that the number gets > > over the limit, even if less are actually used, hence the catch, as > > you noted. > > I agree with you. It looks like the structure fd_set just contains used > sockets by this application on Windows, and the macro FD_SETSIZE is used > only here. > > From > https://docs.microsoft.com/en-us/windows/win32/api/winsock/ns-winsock-fd_set > : > > typedef struct fd_set { > u_int fd_count; > SOCKET fd_array[FD_SETSIZE]; > } fd_set, FD_SET, *PFD_SET, *LPFD_SET; > > From > https://docs.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2 > : > > The maximum number of sockets that a Windows Sockets application can use > is not affected by the manifest constant FD_SETSIZE. This value defined > in the Winsock2.h header file is used in constructing the FD_SET > structures used with select function. > > >> IIUC the checks below are not correct on Windows, since on this system > >> sockets can have values equal to or greater than FD_SETSIZE (see > >> Windows documentation [4] and pgbench debug output in attached > >> pgbench_debug.txt). > > > > Okay. > > > > But then, how may one detect that there are too many fds in the set? > > > > I think that an earlier version of the code needed to make assumptions > > about the internal implementation of windows (there is a counter > > somewhere in windows fd_set struct), which was rejected because if was > > breaking the interface. Now your patch is basically resurrecting that. > > I tried to keep the behaviour "we check if the socket value can be used > in select() at runtime", but now I will also read that thread... > > > Why not if there is no other solution, but this is quite depressing, > > and because it breaks the interface it would be broken if windows > > changed its internals for some reason:-( > > It looks like if the internals of the structure fd_set are changed, we > will also have problems with the function pgwin32_select from > src/backend/port/win32/socket.c, because it uses fd_set.fd_count too?.. > > (I'm writing responses to the rest of your comments but it takes > time...) > This patch on Commitfest has been "Waiting on Author" for almost 2 months. Could you share the current status? Are you updating the patch? Regards, -- Masahiko Sawada EnterpriseDB: https://www.enterprisedb.com/
Re: pgbench stopped supporting large number of client connections on Windows
From
Masahiko Sawada
Date:
Hi, On Thu, Jan 7, 2021 at 10:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Hi Marina, > > On Sun, Nov 8, 2020 at 11:59 PM Marina Polyakova > <m.polyakova@postgrespro.ru> wrote: > > > > On 2020-11-07 01:01, Fabien COELHO wrote: > > > Hello Marina, > > > > Hello, Fabien! > > > > Thank you for your comments! > > > > >> While trying to test a patch that adds a synchronization barrier in > > >> pgbench [1] on Windows, > > > > > > Thanks for trying that, I do not have a windows setup for testing, and > > > the sync code I wrote for Windows is basically blind coding:-( > > > > FYI: > > > > 1) It looks like pgbench will no longer support Windows XP due to the > > function DeleteSynchronizationBarrier. From > > https://docs.microsoft.com/en-us/windows/win32/api/synchapi/nf-synchapi-deletesynchronizationbarrier > > : > > > > Minimum supported client: Windows 8 [desktop apps only] > > Minimum supported server: Windows Server 2012 [desktop apps only] > > > > On Windows Server 2008 R2 (MSVC 2013) the 6-th version of the patch [1] > > has compiled without (new) warnings, but when running pgbench I got the > > following error: > > > > The procedure entry point DeleteSynchronizationBarrier could not be > > located in the dynamic link library KERNEL32.dll. > > > > 2) On Windows Server 2019 (MSVC 2019) the 6-th version of the patch [1] > > with fix_max_client_conn_on_Windows.patch has compiled without (new) > > warnings. I made a few runs (-M prepared -c 100 -j 10 -T 10 -P1 -S) with > > and without your patches. On Linux (-M prepared -c 1000 -j 500 -T 10 -P1 > > -S) your patches fix problems with progress reports as in [2], but on > > Windows I did not notice such changes, see attached > > pgbench_runs_linux_vs_windows.zip. > > > > >> The almost same thing happens with reindexdb and vacuumdb (build on > > >> commit [3]): > > > > > > Windows fd implementation is somehow buggy because it does not return > > > the smallest number available, and then with the assumption that > > > select uses a dense array indexed with them (true on linux, less so on > > > Windows which probably uses a sparse array), so that the number gets > > > over the limit, even if less are actually used, hence the catch, as > > > you noted. > > > > I agree with you. It looks like the structure fd_set just contains used > > sockets by this application on Windows, and the macro FD_SETSIZE is used > > only here. > > > > From > > https://docs.microsoft.com/en-us/windows/win32/api/winsock/ns-winsock-fd_set > > : > > > > typedef struct fd_set { > > u_int fd_count; > > SOCKET fd_array[FD_SETSIZE]; > > } fd_set, FD_SET, *PFD_SET, *LPFD_SET; > > > > From > > https://docs.microsoft.com/en-us/windows/win32/winsock/maximum-number-of-sockets-supported-2 > > : > > > > The maximum number of sockets that a Windows Sockets application can use > > is not affected by the manifest constant FD_SETSIZE. This value defined > > in the Winsock2.h header file is used in constructing the FD_SET > > structures used with select function. > > > > >> IIUC the checks below are not correct on Windows, since on this system > > >> sockets can have values equal to or greater than FD_SETSIZE (see > > >> Windows documentation [4] and pgbench debug output in attached > > >> pgbench_debug.txt). > > > > > > Okay. > > > > > > But then, how may one detect that there are too many fds in the set? > > > > > > I think that an earlier version of the code needed to make assumptions > > > about the internal implementation of windows (there is a counter > > > somewhere in windows fd_set struct), which was rejected because if was > > > breaking the interface. Now your patch is basically resurrecting that. > > > > I tried to keep the behaviour "we check if the socket value can be used > > in select() at runtime", but now I will also read that thread... > > > > > Why not if there is no other solution, but this is quite depressing, > > > and because it breaks the interface it would be broken if windows > > > changed its internals for some reason:-( > > > > It looks like if the internals of the structure fd_set are changed, we > > will also have problems with the function pgwin32_select from > > src/backend/port/win32/socket.c, because it uses fd_set.fd_count too?.. > > > > (I'm writing responses to the rest of your comments but it takes > > time...) > > > > This patch on Commitfest has been "Waiting on Author" for almost 2 > months. Could you share the current status? Are you updating the > patch? Status update for a commitfest entry. Since this is a bug fix, I've moved this patch to the next commitfest. I think if this patch is still inactive until the feature freeze we can remove this entry by setting it to "Returned with Feedback". Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: pgbench stopped supporting large number of client connections on Windows
From
David Steele
Date:
On 2/1/21 8:18 AM, Masahiko Sawada wrote: >> >> This patch on Commitfest has been "Waiting on Author" for almost 2 >> months. Could you share the current status? Are you updating the >> patch? > > Status update for a commitfest entry. > > Since this is a bug fix, I've moved this patch to the next commitfest. > I think if this patch is still inactive until the feature freeze we > can remove this entry by setting it to "Returned with Feedback". If no further work is planned on this patch then we should close it. I plan to do that on April 1 if there are no objections. It's listed as a bug but seems like a limitation that could just be documented. Regards, -- -David david@pgmasters.net