Thread: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).
[GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5 (EDB binary).
From
Nikolai Zhubr
Date:
Hello all, (Hopefully this is right place to post on the subject, otherwise please let me know.) I'm observing some strange inexplicable effect in 9.5.5 server running on x86 windows (32-bit windows xp sp3). That is, CPU usage in backend process for the session in question starts to grow, going from approx 0-1 to 8-15 percent and more, stays that high for several seconds, then goes back to 0-1. All the effect takes about 15-30 seconds, and repeats stably every 10-20 minutes (as long as the respective client continues to run the same queries). Apparently it is essential to consider the pattern of requests going to the server: in this case there is a continuous stream of very small cheap queries, but quite a lot of them per second. Trying to understand the reason, I've managed to craft a pure artifical test triggering very similar CPU load waves without the need for any specific database at all, easy to repeat: 1. "select localtimestamp" 40 times (As separate requests, one by one, but no delay inserted in between) 2. wait 1/2 second. 3. goto 1 That's it. Just let it run for > 20 minutes in one session. (These queries are so cheap that normally they consume approx zero resources) Screenshots: https://yadi.sk/i/J_yj_0t43BgdGw (I can also send as file if this link does not work) Other notes: - the server instance in question is EDB 9.5.5-1 win32 binaries. - the production machine is Core 2 duo 2600MHz, 2Gb ram, typical CPU load is rather low, like 0% to 3%, therefore the effect is easily noticable. - no antivirus or other weird or abusive software involved. - communication goes through libpq (tcp/ip only). - turning SSL on/off does not matter. - my test application issueing "select localtimestamp" in the endless loop is written in pascal, although this probably does not matter. - query execution time as seen by the client is not affected (stays low). - pausing the test in the client causes backend to drop CPU usage immediately to 0, resuming causes it to go back to where it was (unless pausing for too long), so excessive CPU load is tied to some normal activity of backend (i.e. no activity == no load). - the effect is NOT observed (yet?) when running test on a server machine directly (pointing it to 127.0.0.1) - the effect looks more substantial in SMP case (2 Cores), compared to UMP case (when testing in a VM, see below). - nothing appears in the log. To me it looks like some sort of wait/check/synchronize issue for a socket/lock/signal or similar. Probably some rare corner case, probably windows-specific. However, looking through backend/port/win32/socket.c and backend/port/win32_latch.c I cannot immediately see anything wrong yet (but WaitForMultipleObjects is a tricky thing IIRC). Luckily I've managed to construct a configuration very similar to production machine in a development VM (VirtualBox) and reproduced the effect there, so now I'm able to safely and comfortably test whatever ideas might appear (Although I'm somewhat reluctant to rebuild and hand-debug the server here myself, because preparing proper build environment on windows is quite a lot of work...) Any help/thoughts/instructions greatly appreciated. Should I also report this to bugtracker at postgresql.org? Thank you, Nikolai
Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5(EDB binary).
From
Michael Paquier
Date:
On Mon, Jan 30, 2017 at 6:16 PM, Nikolai Zhubr <n-a-zhubr@yandex.ru> wrote: > I'm observing some strange inexplicable effect in 9.5.5 server running on > x86 windows (32-bit windows xp sp3). Oh, well... You are aware that this is out of support by Microsoft, right? > That is, CPU usage in backend process > for the session in question starts to grow, going from approx 0-1 to 8-15 > percent and more, stays that high for several seconds, then goes back to > 0-1. All the effect takes about 15-30 seconds, and repeats stably every > 10-20 minutes (as long as the respective client continues to run the same > queries). Apparently it is essential to consider the pattern of requests > going to the server: in this case there is a continuous stream of very small > cheap queries, but quite a lot of them per second. Trying to understand the > reason, I've managed to craft a pure artifical test triggering very similar > CPU load waves without the need for any specific database at all, easy to > repeat: > > 1. "select localtimestamp" 40 times (As separate requests, one by one, but > no delay inserted in between) > 2. wait 1/2 second. > 3. goto 1 Craig, could this be a side-effect of 519b0757? That's new in 9.5, and that's directly related to the code paths discussed here. -- Michael
Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5(EDB binary).
From
Nikolai Zhubr
Date:
31.01.2017 6:43, Michael Paquier wrote: > On Mon, Jan 30, 2017 at 6:16 PM, Nikolai Zhubr<n-a-zhubr@yandex.ru> wrote: >> I'm observing some strange inexplicable effect in 9.5.5 server running on >> x86 windows (32-bit windows xp sp3). > > Oh, well... You are aware that this is out of support by Microsoft, right? Sure. The box is quite old. However, at the moment I'm not confident the issue is XP-specific. Not even really sure it is windows-specific. And, I can't see any indication that the behaviour of e.g. WaitForMultipleObjects() changed much from NT 3.51 to Windows 10 anyway. [...] >> 1. "select localtimestamp" 40 times (As separate requests, one by one, but >> no delay inserted in between) >> 2. wait 1/2 second. >> 3. goto 1 > > Craig, could this be a side-effect of 519b0757? That's new in 9.5, and > that's directly related to the code paths discussed here. Meanwhile I'll redo my tests using 9.4 instead of 9.5 and report back. Thank you. Nikolai
Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5(EDB binary).
From
Nikolai Zhubr
Date:
31.01.2017 10:37, I wrote: > [...] >>> 1. "select localtimestamp" 40 times (As separate requests, one by >>> one, but >>> no delay inserted in between) >>> 2. wait 1/2 second. >>> 3. goto 1 >> >> Craig, could this be a side-effect of 519b0757? That's new in 9.5, and >> that's directly related to the code paths discussed here. > > Meanwhile I'll redo my tests using 9.4 instead of 9.5 and report back. Exactly same trouble with 9.4 (Specifically EDB 9.4.10-1 win32) Would it make sense to check some even older ones? Thank you. Nikolai > > Thank you. > > Nikolai > >
Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5(EDB binary).
From
Andres Freund
Date:
On 2017-01-31 11:45:33 +0300, Nikolai Zhubr wrote: > 31.01.2017 10:37, I wrote: > > [...] > > > > 1. "select localtimestamp" 40 times (As separate requests, one by > > > > one, but > > > > no delay inserted in between) > > > > 2. wait 1/2 second. > > > > 3. goto 1 > > > > > > Craig, could this be a side-effect of 519b0757? That's new in 9.5, and > > > that's directly related to the code paths discussed here. > > > > Meanwhile I'll redo my tests using 9.4 instead of 9.5 and report back. > > Exactly same trouble with 9.4 (Specifically EDB 9.4.10-1 win32) That's good to know, less because of 519b0757, more because of the latch changes - but they also went in in 9.5... > Would it make sense to check some even older ones? Could you use process monitor or such to see what the process is doing while using a lot of CPU? Regards, Andres
Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5(EDB binary).
From
Nikolai Zhubr
Date:
31.01.2017 19:51, Andres Freund: [...] >> Exactly same trouble with 9.4 (Specifically EDB 9.4.10-1 win32) > > That's good to know, less because of 519b0757, more because of the latch > changes - but they also went in in 9.5... > >> Would it make sense to check some even older ones? > > Could you use process monitor or such to see what the process is doing > while using a lot of CPU? I'm not sure how to do this, especially considering that the process in question is running as a service? Now, some more input: * 9.5.2 server running on linux x86_64 - unaffected! (What a relief! We are moving to Centos soon anyway!) * 9.4.4 server running on win7 32-bit - affected, same thing as on XP. Thank you. Nikolai > > Regards, > > Andres >
Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5(EDB binary).
From
Nikolai Zhubr
Date:
01.02.2017 1:02, I wrote: [...] >> Could you use process monitor or such to see what the process is doing >> while using a lot of CPU? > > I'm not sure how to do this, especially considering that the process in > question is running as a service? > > Now, some more input: > > * 9.5.2 server running on linux x86_64 - unaffected! (What a relief! We > are moving to Centos soon anyway!) > > * 9.4.4 server running on win7 32-bit - affected, same thing as on XP. I've managed to create a "fix" (see diff below). It looks like the wait logic is somehow broken on windows currently, though I can not find the problem myself yet. It would be great if someone more familiar with the (windows-specific) code came up with ideas. I have a build environment ready so I could do more tests then. --- be-secure.c.orig 2017-02-01 22:37:37.228032608 +0300 +++ be-secure.c 2017-02-01 22:51:17.655751292 +0300 @@ -159,6 +159,7 @@ * socket to become ready again. */ } + Sleep(15); /* n.zhubr */ goto retry; } @@ -238,6 +239,7 @@ * socket to become ready again. */ } + Sleep(15); /* n.zhubr */ goto retry; } Thank you. Nikolai > > > Thank you. > > Nikolai > >> >> Regards, >> >> Andres >> > > >
Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5(EDB binary).
From
Nikolai Zhubr
Date:
02.02.2017 2:14, I wrote: > 01.02.2017 1:02, I wrote: > [...] >>> Could you use process monitor or such to see what the process is doing >>> while using a lot of CPU? >> >> I'm not sure how to do this, especially considering that the process in >> question is running as a service? >> >> Now, some more input: >> >> * 9.5.2 server running on linux x86_64 - unaffected! (What a relief! We >> are moving to Centos soon anyway!) >> >> * 9.4.4 server running on win7 32-bit - affected, same thing as on XP. > > I've managed to create a "fix" (see diff below). > It looks like the wait logic is somehow broken on windows currently, > though I can not find the problem myself yet. > It would be great if someone more familiar with the (windows-specific) > code came up with ideas. > I have a build environment ready so I could do more tests then. Some update. Adding this "Sleep(15)" before "goto retry" into secure_read() has apparently eliminated the effect at our production server too. That is, my load-bug-detector has been quiet for > 24hr or more. Now by adding more debigging stuff into secure_read() and secure_write() I've found that: * secure_write() is likely irrelevant, as "goto retry" there was never actually hit yet; * in secure_read(), during the intervals of excessive cpu load, WaitLatchOrSocket() was never observed to indicate latch event, and was never observed to (erroneously) indicate socket readiness more than once (with socket read attempt in between), which I was suspecting happening, so I can not blame secure_read() itself and this all makes me wonder even more... Note: I'm testing with SSL off now. As always, and hints greatly appreciated! Thank you. Nikolai > > --- be-secure.c.orig 2017-02-01 22:37:37.228032608 +0300 > +++ be-secure.c 2017-02-01 22:51:17.655751292 +0300 > @@ -159,6 +159,7 @@ > * socket to become ready again. > */ > } > + Sleep(15); /* n.zhubr */ > goto retry; > } > > @@ -238,6 +239,7 @@ > * socket to become ready again. > */ > } > + Sleep(15); /* n.zhubr */ > goto retry; > } > > > Thank you. > > Nikolai > >> >> >> Thank you. >> >> Nikolai >> >>> >>> Regards, >>> >>> Andres >>> >> >> >> > > >
Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5(EDB binary).
From
Nikolai Zhubr
Date:
03.02.2017 13:52, I wrote: [...] > Adding this "Sleep(15)" before "goto retry" into secure_read() has > apparently eliminated the effect at our production server too. That is, > my load-bug-detector has been quiet for > 24hr or more. > > Now by adding more debigging stuff into secure_read() and secure_write() > I've found that: > > * secure_write() is likely irrelevant, as "goto retry" there was never > actually hit yet; > > * in secure_read(), during the intervals of excessive cpu load, > WaitLatchOrSocket() was never observed to indicate latch event, and was > never observed to (erroneously) indicate socket readiness more than once > (with socket read attempt in between), which I was suspecting happening, > so I can not blame secure_read() itself and this all makes me wonder > even more... Ok, secure_read() is likely irrelevant too. I think what happened after I inserted "Sleep(15)" into secure_read() is that this "Sleep(15)" was essentially added into the main "for(;;)" loop of PostgresMain (through ReadCommand), introducing an artifical additional CPU relaxation step along with every incoming query and therefore just masking a real CPU eater. So probably I'll have to somehow profile this "for(;;)" in PostgresMain. Thank you. Nikolai > > Note: I'm testing with SSL off now. > > As always, and hints greatly appreciated! > > > Thank you. > Nikolai > >> >> --- be-secure.c.orig 2017-02-01 22:37:37.228032608 +0300 >> +++ be-secure.c 2017-02-01 22:51:17.655751292 +0300 >> @@ -159,6 +159,7 @@ >> * socket to become ready again. >> */ >> } >> + Sleep(15); /* n.zhubr */ >> goto retry; >> } >> >> @@ -238,6 +239,7 @@ >> * socket to become ready again. >> */ >> } >> + Sleep(15); /* n.zhubr */ >> goto retry; >> } >> >> >> Thank you. >> >> Nikolai >> >>> >>> >>> Thank you. >>> >>> Nikolai >>> >>>> >>>> Regards, >>>> >>>> Andres >>>> >>> >>> >>> >> >> >> > > >
Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5(EDB binary).
From
Andres Freund
Date:
On 2017-02-01 01:02:11 +0300, Nikolai Zhubr wrote: > 31.01.2017 19:51, Andres Freund: > [...] > > > Exactly same trouble with 9.4 (Specifically EDB 9.4.10-1 win32) > > > > That's good to know, less because of 519b0757, more because of the latch > > changes - but they also went in in 9.5... > > > > > Would it make sense to check some even older ones? > > > > Could you use process monitor or such to see what the process is doing > > while using a lot of CPU? > > I'm not sure how to do this, especially considering that the process in > question is running as a service? I don't think that stops you, if you have admin privileges.
Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5(EDB binary).
From
Andres Freund
Date:
On 2017-02-03 22:17:55 +0300, Nikolai Zhubr wrote: > 03.02.2017 20:29, Andres Freund: > [...] > > > > Could you use process monitor or such to see what the process is doing > > > > while using a lot of CPU? > > > > > > I'm not sure how to do this, especially considering that the process in > > > question is running as a service? > > > > I don't think that stops you, if you have admin privileges. > > Well, profiling postgres.exe is still beyond my capability at this time > anyway. Instead, I'll try to prepare a most simple client application > example for testing the behaviour in question so that anyone could run it > easily. (And while working on such an example, maybe I'll get some more > understanding of what is actually going on here.) I'm missing something. All you need to do is to start processexplorer.exe and filter out other applications? Andres
Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5(EDB binary).
From
Nikolai Zhubr
Date:
03.02.2017 20:29, Andres Freund: [...] >>> Could you use process monitor or such to see what the process is doing >>> while using a lot of CPU? >> >> I'm not sure how to do this, especially considering that the process in >> question is running as a service? > > I don't think that stops you, if you have admin privileges. Well, profiling postgres.exe is still beyond my capability at this time anyway. Instead, I'll try to prepare a most simple client application example for testing the behaviour in question so that anyone could run it easily. (And while working on such an example, maybe I'll get some more understanding of what is actually going on here.) Thank you. Nikolai >
Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5(EDB binary).
From
Nikolai Zhubr
Date:
03.02.2017 22:16, Andres Freund: [...] >> Well, profiling postgres.exe is still beyond my capability at this time >> anyway. Instead, I'll try to prepare a most simple client application >> example for testing the behaviour in question so that anyone could run it >> easily. (And while working on such an example, maybe I'll get some more >> understanding of what is actually going on here.) > > I'm missing something. All you need to do is to start > processexplorer.exe and filter out other applications? Ah, ok. I first thought about some kind of code-level debugging. And yes, running Process Explorer gave some new and unexpected input. During the period of this strange high load it claims 40% CPU is used by interrupts (normally 0.01%) and 3% used by backend postgres.exe (normally approx 0%). I'd guess this means some problem happening in the OS (which hosts this postgres.exe), probably related to network communication? (Because nothing else seems likely related to interrupts in such scenario?) The screenshot: https://yadi.sk/i/hC8FMZCE3CyzPs Thank you. Nikolai > > Andres >
Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5(EDB binary).
From
Nikolai Zhubr
Date:
05.02.2017 17:10, I wrote: [...] > And yes, running Process Explorer gave some new and unexpected input. > During the period of this strange high load it claims 40% CPU is used by > interrupts (normally 0.01%) and 3% used by backend postgres.exe > (normally approx 0%). I'd guess this means some problem happening in the > OS (which hosts this postgres.exe), probably related to network > communication? (Because nothing else seems likely related to interrupts > in such scenario?) Additionally, I've now got a kernrate viewer reports, one for high load, and one for normal load period, attached below. Here, tcpip shows some more kernel activity during the problematic period, but the difference is not so huge. Time 36348 hits, 25000 events per hit -------- Module Hits msec %Total Events/Sec intelppm 35048 45640 96 % 19198071 hal 1030 45640 2 % 564198 ntkrnlpa 204 45640 0 % 111744 tcpip 20 45640 0 % 10955 win32k 18 45640 0 % 9859 afd 6 45640 0 % 3286 ipnat 6 45640 0 % 3286 NDIS 4 45640 0 % 2191 ...... Time 37227 hits, 25000 events per hit -------- Module Hits msec %Total Events/Sec intelppm 35856 46828 96 % 19142393 hal 1089 46828 2 % 581382 ntkrnlpa 229 46828 0 % 122255 win32k 29 46828 0 % 15482 tcpip 9 46828 0 % 4804 NDIS 4 46828 0 % 2135 afd 3 46828 0 % 1601 psched 3 46828 0 % 1601 ipnat 2 46828 0 % 1067 Thank you. Nikolai > > The screenshot: https://yadi.sk/i/hC8FMZCE3CyzPs > > > Thank you. > > Nikolai > > >> >> Andres >> >
Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5(EDB binary).
From
Nikolai Zhubr
Date:
05.02.2017 22:05, I wrote: [...] >> And yes, running Process Explorer gave some new and unexpected input. >> During the period of this strange high load it claims 40% CPU is used by >> interrupts (normally 0.01%) and 3% used by backend postgres.exe >> (normally approx 0%). I'd guess this means some problem happening in the >> OS (which hosts this postgres.exe), probably related to network >> communication? (Because nothing else seems likely related to interrupts >> in such scenario?) Ok, I've got a working example as simple as the following: do { for (i=0; i<40; i++) { Sleep(15); res = PQexec(conn, "SELECT localtimestamp "); if (PQresultStatus(res) != PGRES_TUPLES_OK) return 1; PQclear(res); } fprintf(stdout, "*"); Sleep(350); } while(1); Here, Sleep(15) and Sleep(350) are very important, removing or changing them can cause the effect to disappear. Looks like some timing-sensitive issue in windows TCP/IP implementation? But then, how can it be that no-one have noticed it yet? Puzzling. Thank you. Regards, Nikolai > > Additionally, I've now got a kernrate viewer reports, one for high load, > and one for normal load period, attached below. Here, tcpip shows some > more kernel activity during the problematic period, but the difference > is not so huge. > > Time 36348 hits, 25000 events per hit -------- > Module Hits msec %Total Events/Sec > intelppm 35048 45640 96 % 19198071 > hal 1030 45640 2 % 564198 > ntkrnlpa 204 45640 0 % 111744 > tcpip 20 45640 0 % 10955 > win32k 18 45640 0 % 9859 > afd 6 45640 0 % 3286 > ipnat 6 45640 0 % 3286 > NDIS 4 45640 0 % 2191 > ...... > > Time 37227 hits, 25000 events per hit -------- > Module Hits msec %Total Events/Sec > intelppm 35856 46828 96 % 19142393 > hal 1089 46828 2 % 581382 > ntkrnlpa 229 46828 0 % 122255 > win32k 29 46828 0 % 15482 > tcpip 9 46828 0 % 4804 > NDIS 4 46828 0 % 2135 > afd 3 46828 0 % 1601 > psched 3 46828 0 % 1601 > ipnat 2 46828 0 % 1067 > > > Thank you. > > Nikolai > >> >> The screenshot: https://yadi.sk/i/hC8FMZCE3CyzPs >> >> >> Thank you. >> >> Nikolai >> >> >>> >>> Andres >>> >> > > >
Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5(EDB binary).
From
Nikolai Zhubr
Date:
Hello all, In order to locate the problem more precisely, I'd like to prepare a test, involving some ping-like communication between the server and a test client. That is, I'd like to repeatedly send something valid to the server and get some valid replies from it, but without any kind of real activity happening on the server. I've looked through the main loop in PostgresMain() but could not find any suitable candidates. Any thoughts? Thank you. Nikolai 03.02.2017 16:30, I wrote: [...] > Ok, secure_read() is likely irrelevant too. > > I think what happened after I inserted "Sleep(15)" into secure_read() is > that this "Sleep(15)" was essentially added into the main "for(;;)" loop > of PostgresMain (through ReadCommand), introducing an artifical > additional CPU relaxation step along with every incoming query and > therefore just masking a real CPU eater. > > So probably I'll have to somehow profile this "for(;;)" in PostgresMain. > > > Thank you. > > Nikolai >
Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5(EDB binary).
From
"Rader, David"
Date:
How about using pg_isready?
On Sun, Feb 12, 2017 at 12:23 PM, Nikolai Zhubr <n-a-zhubr@yandex.ru> wrote:
Hello all,
In order to locate the problem more precisely, I'd like to prepare a test, involving some ping-like communication between the server and a test client. That is, I'd like to repeatedly send something valid to the server and get some valid replies from it, but without any kind of real activity happening on the server. I've looked through the main loop in PostgresMain() but could not find any suitable candidates.
Any thoughts?
Thank you.
Nikolai
03.02.2017 16:30, I wrote:
[...]Ok, secure_read() is likely irrelevant too.
I think what happened after I inserted "Sleep(15)" into secure_read() is
that this "Sleep(15)" was essentially added into the main "for(;;)" loop
of PostgresMain (through ReadCommand), introducing an artifical
additional CPU relaxation step along with every incoming query and
therefore just masking a real CPU eater.
So probably I'll have to somehow profile this "for(;;)" in PostgresMain.
Thank you.
Nikolai--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5(EDB binary).
From
Nikolai Zhubr
Date:
13.02.2017 23:58, Rader, David: > How about using pg_isready? > https://www.postgresql.org/docs/current/static/app-pg-isready.html But it doesn't actually communicate with the server AFAIK, just checks if a connection could be established? Maybe I should have been more specific. What I need is debugging/profiling pure communication side of server operation, implying huge lots of requests and replies going over the wire to and from the server within some continued (valid) session, but so that the server is not actually doing anything above that (no sql, no locking, no synchronizing, zero usefull activity, just pumping network I/O) Regards, Nikolai > > -- > David Rader > davidr@openscg.com <mailto:davidr@openscg.com> > > On Sun, Feb 12, 2017 at 12:23 PM, Nikolai Zhubr <n-a-zhubr@yandex.ru > <mailto:n-a-zhubr@yandex.ru>> wrote: > > Hello all, > > In order to locate the problem more precisely, I'd like to prepare a > test, involving some ping-like communication between the server and > a test client. That is, I'd like to repeatedly send something valid > to the server and get some valid replies from it, but without any > kind of real activity happening on the server. I've looked through > the main loop in PostgresMain() but could not find any suitable > candidates. > > Any thoughts? > > Thank you. > > Nikolai > > > 03.02.2017 16:30, I wrote: > [...] > > Ok, secure_read() is likely irrelevant too. > > I think what happened after I inserted "Sleep(15)" into > secure_read() is > that this "Sleep(15)" was essentially added into the main > "for(;;)" loop > of PostgresMain (through ReadCommand), introducing an artifical > additional CPU relaxation step along with every incoming query and > therefore just masking a real CPU eater. > > So probably I'll have to somehow profile this "for(;;)" in > PostgresMain. > > > Thank you. > > Nikolai
[GENERAL] Re: Causeless CPU load waves in backend, on windows, 9.5.5 (EDBbinary).
From
Thomas Kellerer
Date:
Nikolai Zhubr schrieb am 13.02.2017 um 23:03: > Maybe I should have been more specific. > What I need is debugging/profiling pure communication side of server > operation, implying huge lots of requests and replies going over the > wire to and from the server within some continued (valid) session, > but so that the server is not actually doing anything above that (no > sql, no locking, no synchronizing, zero usefull activity, just > pumping network I/O) > If you are willing to drop the "no sql" requirement you could use something like select rpad('*', 100000000, '*'); this will send a lot of data over the wire, the SQL overhead should be fairly small. You can send more data if you combine that with e.g. generate_series()
Re: [GENERAL] Re: Causeless CPU load waves in backend, on windows,9.5.5 (EDB binary).
From
Scott Mead
Date:
On Mon, Feb 13, 2017 at 5:10 PM, Thomas Kellerer <spam_eater@gmx.net> wrote:
Nikolai Zhubr schrieb am 13.02.2017 um 23:03:Maybe I should have been more specific.
What I need is debugging/profiling pure communication side of server
operation, implying huge lots of requests and replies going over the
wire to and from the server within some continued (valid) session,
but so that the server is not actually doing anything above that (no
sql, no locking, no synchronizing, zero usefull activity, just
pumping network I/O)
My personal favorite is 'select 1'
I know you're not looking for SQL, but, you gotta start somewhere...
If you are willing to drop the "no sql" requirement you could use something like
select rpad('*', 100000000, '*');
this will send a lot of data over the wire, the SQL overhead should be fairly small.
You can send more data if you combine that with e.g. generate_series()
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Causeless CPU load waves in backend, on windows, 9.5.5(EDB binary).
From
Michael Paquier
Date:
On Tue, Feb 14, 2017 at 7:03 AM, Nikolai Zhubr <n-a-zhubr@yandex.ru> wrote: > 13.02.2017 23:58, Rader, David: >> >> How about using pg_isready? >> https://www.postgresql.org/docs/current/static/app-pg-isready.html > > But it doesn't actually communicate with the server AFAIK, just checks if a > connection could be established? Not exactly, it checks also if the server is able to accept connections or refusing them, after establishing the connection. The real deal is in that. For example it is possible to use pg_isready as well on a hot standby refusing connections because it did not reach yet a consistent point. -- Michael
Re: [GENERAL] Re: Causeless CPU load waves in backend, on windows,9.5.5 (EDB binary).
From
Nikolai Zhubr
Date:
14.02.2017 1:10, Thomas Kellerer: > Nikolai Zhubr schrieb am 13.02.2017 um 23:03: >> Maybe I should have been more specific. >> What I need is debugging/profiling pure communication side of server >> operation, implying huge lots of requests and replies going over the >> wire to and from the server within some continued (valid) session, >> but so that the server is not actually doing anything above that (no >> sql, no locking, no synchronizing, zero usefull activity, just >> pumping network I/O) >> > > If you are willing to drop the "no sql" requirement you could use > something like > > select rpad('*', 100000000, '*'); > > this will send a lot of data over the wire, the SQL overhead should be > fairly small. Well yes, but I've been there already. Now I'd like to locate a CPU eater more precisely - supposedly there is some issue with communication, that is why I don't want to mix in anything else. Anyway, I've now got pretty sure the standard protocol out of the box does not provide such short-circuite capability so I'll have to hack it a bit. Regards, Nikolai > > You can send more data if you combine that with e.g. generate_series() >
Re: [GENERAL] Re: Causeless CPU load waves in backend, on windows,9.5.5 (EDB binary).
From
John R Pierce
Date:
On 2/13/2017 11:03 PM, Nikolai Zhubr wrote: > Now I'd like to locate a CPU eater more precisely - supposedly there > is some issue with communication, that is why I don't want to mix in > anything else. use iperf to test the network transport layer, without any postgres in the loop? -- john r pierce, recycling bits in santa cruz
Re: [GENERAL] Re: Causeless CPU load waves in backend, on windows,9.5.5 (EDB binary).
From
Nikolai Zhubr
Date:
14.02.2017 12:47, John R Pierce: > On 2/13/2017 11:03 PM, Nikolai Zhubr wrote: >> Now I'd like to locate a CPU eater more precisely - supposedly there >> is some issue with communication, that is why I don't want to mix in >> anything else. > > use iperf to test the network transport layer, without any postgres in > the loop? No, the network performance in general is fine. However, some specific continuous communication pattern causes something to go very wrong on windows periodically. (And there is no such problem on linux btw) While the issue now seems rather related to windows itself and not postgres, but the pattern typical for some of my postgres uses is apparently a reliable trigger. Therefore, I'm thinking to first isolate the problem inside postgres, and if it still triggers then, second step, create a test "server" consisting of communication code of postgres to run without any other parts of postgres. Regards, Nikolai > >
Re: [GENERAL] Re: Causeless CPU load waves in backend, on windows,9.5.5 (EDB binary).
From
"Rader, David"
Date:
On Tue, Feb 14, 2017 at 5:28 AM, Nikolai Zhubr <n-a-zhubr@yandex.ru> wrote:
14.02.2017 12:47, John R Pierce:On 2/13/2017 11:03 PM, Nikolai Zhubr wrote:Now I'd like to locate a CPU eater more precisely - supposedly there
is some issue with communication, that is why I don't want to mix in
anything else.
use iperf to test the network transport layer, without any postgres in
the loop?
No, the network performance in general is fine. However, some specific continuous communication pattern causes something to go very wrong on windows periodically. (And there is no such problem on linux btw) While the issue now seems rather related to windows itself and not postgres, but the pattern typical for some of my postgres uses is apparently a reliable trigger. Therefore, I'm thinking to first isolate the problem inside postgres, and if it still triggers then, second step, create a test "server" consisting of communication code of postgres to run without any other parts of postgres.
Regards,
Nikolai
Try the libpq call from pg_isready. It does actually make a round trip to the postgres server and asks the server if it is ready to accept connections. So you are running the socket communication code of postgres and a small bit of "status" check but not any sql.
relevant call is PQping:
rv = PQpingParams(keywords, values, 1); |
Re: [GENERAL] Re: Causeless CPU load waves in backend, on windows,9.5.5 (EDB binary).
From
Nikolai Zhubr
Date:
14.02.2017 18:15, Rader, David: [...] > Try the libpq call from pg_isready. It does actually make a round trip > to the postgres server and asks the server if it is ready to accept > connections. So you are running the socket communication code of > postgres and a small bit of "status" check but not any sql. Indeed, such function is available. But essentially, this function is a (kind of) combined login+logout, therefore it would not work for my purpose. (Despite its name, it can not be used to perform some communication "ping" within an established session, such thing simply does not exist in the protocol, at least as of 9.5) Regards, Nikolai > > from mirror: > https://github.com/postgres/postgres/blob/master/src/bin/scripts/pg_isready.c > > relevant call is PQping: > > rv = PQpingParams(keywords, values, 1); > > >
Re: [GENERAL] Re: Causeless CPU load waves in backend, on windows,9.5.5 (EDB binary).
From
Nikolai Zhubr
Date:
15.02.2017 0:06, I wrote: [...] > Indeed, such function is available. But essentially, this function is a > (kind of) combined login+logout, therefore it would not work for my > purpose. (Despite its name, it can not be used to perform some > communication "ping" within an established session, such thing simply > does not exist in the protocol, at least as of 9.5) Ok, I've made some additions to the protocol to be able to do pings to a real server, some standalone tests mimicing postgresql communication and employing RDTSC, and run rediculously extensive experiments. And my findings to the moment is that the reason for CPU load issue is a wierd behaviour of windows' core functions, specifically WaitForMultipleObjects (and apparently the whole family of Wait functions) which is used in postgresql backend for waiting. So, what I've observed is that Wait* functions _usually_ go to sleep nicely when the state is not signalled, but _sometimes_, depending on unknown criteria, it can choose to instead do a busy-loop wait or something CPU-expensive. Maybe it tries to optimize the delay, or maybe it is a bug. The effect somewhat varies depending on windows version, CPU cores, selected system timer frequency, and Wait* call pattern (frequency). I can not currently see how it can be fixed in a generic and reliable way in postgres without sacrificing some performance. On the other hand, leaving it as-is is exposing the server to a substantial CPU abuse (even unintentional, as it initially happened in my case). Regards, Nikolai > > > Regards, > Nikolai > >> >> from mirror: >> https://github.com/postgres/postgres/blob/master/src/bin/scripts/pg_isready.c >> >> >> relevant call is PQping: >> >> rv = PQpingParams(keywords, values, 1); >> >> >> > > >
Re: [GENERAL] Re: Causeless CPU load waves in backend, on windows,9.5.5 (EDB binary).
From
Nikolai Zhubr
Date:
27.02.2017 10:08, I wrote: [...] > So, what I've observed is that Wait* functions _usually_ go to sleep > nicely when the state is not signalled, but _sometimes_, depending on > unknown criteria, it can choose to instead do a busy-loop wait or > something CPU-expensive. Maybe it tries to optimize the delay, or maybe > it is a bug. The effect somewhat varies depending on windows version, > CPU cores, selected system timer frequency, and Wait* call pattern > (frequency). > > I can not currently see how it can be fixed in a generic and reliable > way in postgres without sacrificing some performance. On the other hand, > leaving it as-is is exposing the server to a substantial CPU abuse (even > unintentional, as it initially happened in my case). Ok, I've finally found what happened. False alarm, in short. The server is perfectly fine. My mistake was that I was using most basic ways of monitoring CPU load, i.e. provided by GetProcessTimes() and taskmgr.exe and such. And because they only operate at the scheduler granularity (usually 16ms approx), some rounding has to happen in cpu consumption calculation. With this approach, this rounding can apparently distort the measurement dramatically (in some corner cases). Now, xperf employs a more fair approach to cpu consumption measurement, based on all the exact times of process switching-in and -out, which apparently gives a much better result. So, in my case the results of taskmgr and xperf are very different, and considering the above, taskmgr's report should be dismissed as bogus. And xperf indicates that my stress-tests actually do not cause postgres backend process to consume any noticable cpu percentage al all. That's good. Regards, Nikolai > > > Regards, > Nikolai