Thread: High CPU shoot during poll retry

High CPU shoot during poll retry

From

Gaurav Srivastava

Date:

12 December 2014, 14:35:57

Hi All,

After the change made in the ODBC library by the commit titled "Rip out broken retry/timeout logic in SOCK_wait_for_ready." (file socket.c), the code of SOCK_wait_for_ready() now looks like this:

        do {
#ifdef HAVE_POLL
                fds.fd = sock->socket;
                fds.events = output ? POLLOUT : POLLIN;
                fds.revents = 0;
                ret = poll(&fds, 1, nowait ? 0 : -1);
                mylog("!!! poll ret=%d revents=%x\n", ret, fds.revents);
#else
                FD_ZERO(&fds);
                FD_ZERO(&except_fds);
                FD_SET(sock->socket, &fds);
                FD_SET(sock->socket, &except_fds);
                if (nowait)
                {
                        tm.tv_sec = 0;
                        tm.tv_usec = 0;
                }
                ret = select((int) sock->socket + 1, output ? NULL : &fds,
                             output ? &fds : NULL, &except_fds, nowait ? &tm : NULL);
#endif /* HAVE_POLL */
                gerrno = SOCK_ERRNO;
        } while (ret < 0 && EINTR == gerrno);

So whenever no fd is ready to be read, it returns immediately. This solves the problem of queries hanging forever, but because of the immediate return the caller retries continuously, which drives CPU usage very high. This is one of the cases we are hitting after upgrading ODBC.

One way to solve this is to add a usleep() after every call of SOCK_wait_for_ready(), but I would like to ask whether a better patch is available to fix this issue.

Please suggest.

Thanks and Regards,
Gaurav Srivastava | Associate Consultant
GlobalLogic
P +91.120.4342000.2920 M +91.9953996631 S ta5ramn1
www.globallogic.com

http://www.globallogic.com/email_disclaimer.txt

Re: High CPU shoot during poll retry

From

Heikki Linnakangas

Date:

12 December 2014, 20:57:22

On 12/12/2014 12:31 PM, Gaurav Srivastava wrote:
> Hi All,
>
> After the change made in the ODBC library by the commit titled "Rip out
> broken retry/timeout logic in SOCK_wait_for_ready." (file socket.c), the
> code of SOCK_wait_for_ready() now looks like this:
>
>         do {
> #ifdef  HAVE_POLL
>                  fds.fd = sock->socket;
>                  fds.events = output ? POLLOUT : POLLIN;
>                  fds.revents = 0;
>                *  ret = poll(&fds, 1, nowait ? 0 : -1); *
> mylog("!!!  poll ret=%d revents=%x\n", ret, fds.revents);
> #else
>                  FD_ZERO(&fds);
>                  FD_ZERO(&except_fds);
>                  FD_SET(sock->socket, &fds);
>                  FD_SET(sock->socket, &except_fds);
>                  if (nowait)
>                  {
>                          tm.tv_sec = 0;
>                          tm.tv_usec = 0;
>                  }
>                  ret = select((int) sock->socket + 1, output ? NULL : &fds,
> output ? &fds : NULL, &except_fds, nowait ? &tm : NULL);
> #endif /* HAVE_POLL */
>                  gerrno = SOCK_ERRNO;
>          } *while (ret < 0 && EINTR == gerrno);*
>
>
>
> So whenever no fd is ready to be read, it returns immediately. This solves
> the problem of queries hanging forever, but because of the immediate return
> the caller retries continuously, which drives CPU usage very high. This is
> one of the cases we are hitting after upgrading ODBC.

Why do you think it will go into continuous retries?

- Heikki

Re: High CPU shoot during poll retry

From

Gaurav Srivastava

Date:

14 December 2014, 17:34:35

We pass either 0 or -1 as the poll() timeout, meaning zero timeout or infinite wait. In the zero-timeout case poll() returns immediately with a return value of zero and the loop exits. But the places that call SOCK_wait_for_ready() check whether the return value is 0 or more:

        if (SOCK_wait_for_ready(sock, FALSE, FALSE) >= 0)
                goto retry;

With a zero timeout, if the fd is not ready for reading, this goes into continuous retries, and under heavy load (say 200k registrations and 10k calls running) it drives CPU usage very high.

This is the actual problem statement. Please let me know whether I have made my point clear.

If so, is there a better solution than putting usleep() before every retry? I solved my issue using usleep().
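
A minimal sketch of one alternative to sleeping between zero-timeout polls, assuming a finite wait is acceptable: let poll() itself block for a bounded time, so the thread sleeps in the kernel until the socket is ready or the deadline passes. The names wait_for_ready_timeout, now_ms and demo_bounded_wait are hypothetical and are not part of psqlodbc.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <time.h>
#include <unistd.h>

/* Current monotonic time in milliseconds. */
static long long now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical helper (not a psqlodbc function): wait until fd is readable
 * (output == 0) or writable (output != 0), for at most timeout_ms.
 * Returns >0 if ready, 0 on timeout, -1 on error (errno set by poll).
 */
int wait_for_ready_timeout(int fd, int output, int timeout_ms)
{
    struct pollfd pfd;
    long long deadline;
    int ret;

    assert(timeout_ms >= 0);
    deadline = now_ms() + timeout_ms;

    for (;;)
    {
        long long remaining = deadline - now_ms();

        if (remaining < 0)
            remaining = 0;

        pfd.fd = fd;
        pfd.events = output ? POLLOUT : POLLIN;
        pfd.revents = 0;

        ret = poll(&pfd, 1, (int) remaining);
        if (ret >= 0 || errno != EINTR)
            return ret;
        /* Interrupted by a signal: give up if the deadline has passed,
         * otherwise retry with whatever time is left. */
        if (now_ms() >= deadline)
            return 0;
    }
}

/* Usage sketch on a pipe: returns 0 if the helper behaves as described. */
int demo_bounded_wait(void)
{
    int fds[2];
    int rc = 0;

    if (pipe(fds) != 0)
        return -1;
    if (wait_for_ready_timeout(fds[0], 0, 50) != 0)   /* empty pipe: times out */
        rc = -1;
    if (write(fds[1], "x", 1) != 1)
        rc = -1;
    if (wait_for_ready_timeout(fds[0], 0, 50) != 1)   /* data pending: ready */
        rc = -1;
    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

Unlike a usleep() between retries, the wait here ends as soon as data arrives, so it adds no latency when the socket is ready early.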

Thanks and Regards,

Gaurav Srivastava | Associate Consultant

Re: High CPU shoot during poll retry

From

Heikki Linnakangas

Date:

16 December 2014, 14:54:38

On 12/13/2014 04:48 AM, Gaurav Srivastava wrote:
> We pass either 0 or -1 as the poll() timeout, meaning zero timeout or
> infinite wait. In the zero-timeout case poll() returns immediately with a
> return value of zero and the loop exits. But the places that call
> SOCK_wait_for_ready() check whether the return value is 0 or more:
>
> if (SOCK_wait_for_ready(sock, FALSE, FALSE) >= 0)
>         goto retry;
>
> With a zero timeout, if the fd is not ready for reading, this goes into
> continuous retries, and under heavy load (say 200k registrations and 10k
> calls running) it drives CPU usage very high.

The above call passes nowait==FALSE. It will wait.

The callers that pass nowait==TRUE look like this:

>             if (!maybeEOF)
>             {
>                 int    nready = SOCK_wait_for_ready(self, FALSE, TRUE);
>                 if (nready > 0)
>                 {
>                     maybeEOF = TRUE;
>                     goto retry;
>                 }
>                 else if (0 == nready)
>                     maybeEOF = TRUE;
>             }
>             if (maybeEOF)
>                 SOCK_set_error(self, SOCKET_CLOSED, "Socket has been closed.");
>             else
>                 SOCK_set_error(self, SOCKET_READ_ERROR, "Error while reading from the socket.");
>             return 0;

On the first iteration, it calls SOCK_wait_for_ready() with zero
timeout, to check if the socket is immediately readable. If it is, it
retries once. Otherwise it throws an error.
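
The probe-once pattern described above can be sketched as follows. This is a simplified stand-in written for illustration, not the psqlodbc function: readable_now, read_with_one_retry and demo_single_probe are hypothetical names, and a non-blocking pipe takes the place of the socket.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Zero-timeout probe: >0 if fd is readable right now, 0 if not, <0 on error. */
int readable_now(int fd)
{
    struct pollfd pfd;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    return poll(&pfd, 1, 0);
}

/*
 * Try to read; if that yields nothing, probe the fd once with a zero timeout
 * and retry the read at most once. Returns the number of bytes read, or 0 when
 * the caller should report an error. *probes receives the number of probes made.
 * Simplification: the flag is set after the first probe whatever its result,
 * so the probe can never run a second time.
 */
ssize_t read_with_one_retry(int fd, void *buf, size_t len, int *probes)
{
    int maybe_eof = 0;
    ssize_t n;

    assert(probes != NULL);
    *probes = 0;
retry:
    n = read(fd, buf, len);
    if (n > 0)
        return n;
    if (!maybe_eof)
    {
        int nready;

        (*probes)++;
        nready = readable_now(fd);
        maybe_eof = 1;
        if (nready > 0)
            goto retry;
    }
    return 0;
}

/* Usage sketch: on an empty non-blocking pipe exactly one probe is made. */
int demo_single_probe(void)
{
    int fds[2];
    int probes = -1;
    char buf[8];
    ssize_t n;

    if (pipe(fds) != 0)
        return -1;
    if (fcntl(fds[0], F_SETFL, O_NONBLOCK) != 0)
        return -1;
    n = read_with_one_retry(fds[0], buf, sizeof buf, &probes);
    close(fds[0]);
    close(fds[1]);
    return (n == 0) ? probes : -1;
}
```

Because the flag is set before any retry, the zero-timeout probe is bounded to a single call per read attempt and cannot turn into a busy loop.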

> This is the actual problem statement. Please let me know whether I have
> made my point clear.
>
> If so, is there a better solution than putting usleep() before every
> retry? I solved my issue using usleep().

If you can create a test program to reproduce the problem you're seeing,
I can have a look. Or you could attach a debugger and trace through
where it's actually looping.

- Heikki

Re: High CPU shoot during poll retry

From

Heikki Linnakangas

Date:

17 December 2014, 13:47:47

On 12/17/2014 07:21 AM, Gaurav Srivastava wrote:
> Hi Heikki,
>
> Just to explain a bit more about the case "nowait==FALSE"
>
> We picked up the fix "Rip out broken retry/timeout logic in
> SOCK_wait_for_ready" because in a few scenarios our clients also got stuck
> in an infinite wait. Our design is such that none of our processes can
> wait longer than a few seconds; if that happens, preventive action is
> taken, which results in re-initialization of the process.
>
>
> To avoid that wait, we made a slight modification: we force "nowait" to
> TRUE everywhere in the SOCK_wait_for_ready() API. Together with your fix
> (thanks for that), this resolved our poll() wait issue, but afterwards
> this high CPU issue appeared.
>
> We resolved it with a micro sleep, but we are still looking for a better
> solution, as we don't have much understanding of ODBC.

Ok. Well, all I can say to that is "don't do that". There isn't anything
wrong with SOCK_wait_for_ready, as it's used in unmodified psqlodbc.

One idea is to have a separate watchdog thread or alarm signal handler,
and if an operation takes longer than you tolerate, have the watchdog
thread forcibly close() the socket. That will wake up the thread blocked
on the socket. You'll still need to hack psqlodbc to get the file
descriptor of the socket so that you can close() it, but it seems less
risky.
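
A rough sketch of the alarm-handler variant of this idea, under two stated assumptions that differ from the text above. First, it calls shutdown() rather than close(): on Linux, closing a descriptor is documented as having no effect on a select() or poll() that is already blocked on it, whereas shutdown() on a stream socket does wake the waiter. Second, it uses a SIGALRM timer in the same thread instead of a separate watchdog thread. All names here (watched_fd, on_alarm, wait_with_alarm, demo_alarm_watchdog) are hypothetical, and a socketpair stands in for the server connection.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Socket the alarm handler should tear down; -1 means "nothing to do". */
static volatile sig_atomic_t watched_fd = -1;

/* SIGALRM handler. shutdown() is on the POSIX async-signal-safe list. */
static void on_alarm(int signo)
{
    int saved_errno = errno;

    (void) signo;
    if (watched_fd >= 0)
        shutdown((int) watched_fd, SHUT_RDWR);
    errno = saved_errno;
}

/*
 * Wait for fd to become readable with an infinite poll() timeout, retrying on
 * EINTR, while a one-shot timer bounds the wait to timeout_ms.
 * Returns poll()'s result, or -1 if the handler could not be installed.
 */
int wait_with_alarm(int fd, int timeout_ms)
{
    struct sigaction sa, old_sa;
    struct itimerval it, off;
    struct pollfd pfd;
    int ret;

    assert(timeout_ms > 0);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old_sa) != 0)
        return -1;

    watched_fd = fd;
    memset(&it, 0, sizeof it);
    it.it_value.tv_sec = timeout_ms / 1000;
    it.it_value.tv_usec = (timeout_ms % 1000) * 1000;
    setitimer(ITIMER_REAL, &it, NULL);

    do {
        pfd.fd = fd;
        pfd.events = POLLIN;
        pfd.revents = 0;
        ret = poll(&pfd, 1, -1);
    } while (ret < 0 && errno == EINTR);

    /* Disarm the timer and restore the previous handler. */
    memset(&off, 0, sizeof off);
    setitimer(ITIMER_REAL, &off, NULL);
    watched_fd = -1;
    sigaction(SIGALRM, &old_sa, NULL);
    return ret;
}

/* Usage sketch: the peer never writes, so only the alarm ends the wait. */
int demo_alarm_watchdog(void)
{
    int sv[2];
    int ret;
    char c;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    ret = wait_with_alarm(sv[0], 100);
    if (ret > 0 && recv(sv[0], &c, 1, 0) != 0)
        ret = -1;   /* after shutdown() the read side should report EOF */
    close(sv[0]);
    close(sv[1]);
    return ret;
}
```

Installing a process-wide SIGALRM handler from inside a library can clash with the application's own signal use, which is one reason a dedicated watchdog thread may be preferable in practice.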

- Heikki

Re: High CPU shoot during poll retry

From

Heikki Linnakangas

Date:

17 December 2014, 15:02:13

On 12/17/2014 04:01 PM, Gaurav Srivastava wrote:
> Hi Heikki,
>
> Sincere thanks for your quick responses.
>
> Could similar functionality be added to psqlODBC? In the unmodified code,
> when nowait==FALSE it waits for an "infinite duration". If instead there
> were a configurable timed wait, anybody could configure it as per their
> requirements.

Yeah, that would be nice. The ODBC standard way to do it is to set the
SQL_ATTR_QUERY_TIMEOUT attribute with SQLSetStmtAttr. It's not supported
by the psqlodbc driver, however. It probably would be quite difficult to
implement, but patches are welcome.

You could also do "set statement_timeout='2s'" to set a timeout in the
server (a bare number is taken as milliseconds), but that wouldn't help
with network problems.

- Heikki

Re: High CPU shoot during poll retry

From

Gaurav Srivastava

Date:

18 December 2014, 19:45:31

Hi Heikki,

Just to explain a bit more about the case "nowait==FALSE"

We picked up the fix "Rip out broken retry/timeout logic in SOCK_wait_for_ready" because in a few scenarios our clients also got stuck in an infinite wait. Our design is such that none of our processes can wait longer than a few seconds; if that happens, preventive action is taken, which results in re-initialization of the process.

To avoid that wait, we made a slight modification: we force "nowait" to TRUE everywhere in the SOCK_wait_for_ready() API. Together with your fix (thanks for that), this resolved our poll() wait issue, but afterwards this high CPU issue appeared.

We resolved it with a micro sleep, but we are still looking for a better solution, as we don't have much understanding of ODBC.
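
To illustrate why forcing a zero timeout leads to high CPU, here is a small sketch that counts how many poll() calls are needed to cover the same time window with a zero timeout and with a blocking timeout. It uses an empty pipe in place of the socket; count_polls, now_ms and demo_spin are hypothetical names chosen for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <time.h>
#include <unistd.h>

/* Current monotonic time in milliseconds. */
static long long now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Poll fd for readability over and over until window_ms has elapsed, passing
 * per_call_timeout_ms to every poll() call. Returns how many calls timed out.
 * Stops early if the fd becomes ready or poll() fails.
 */
long count_polls(int fd, int per_call_timeout_ms, int window_ms)
{
    struct pollfd pfd;
    long calls = 0;
    long long end;

    assert(window_ms > 0);
    end = now_ms() + window_ms;
    while (now_ms() < end)
    {
        pfd.fd = fd;
        pfd.events = POLLIN;
        pfd.revents = 0;
        if (poll(&pfd, 1, per_call_timeout_ms) != 0)
            break;
        calls++;
    }
    return calls;
}

/* Usage sketch: compare a zero timeout with a 50 ms timeout over 50 ms. */
int demo_spin(long *spinning, long *blocking)
{
    int fds[2];

    if (pipe(fds) != 0)
        return -1;
    *spinning = count_polls(fds[0], 0, 50);    /* returns at once, loops hot */
    *blocking = count_polls(fds[0], 50, 50);   /* sleeps in the kernel */
    close(fds[0]);
    close(fds[1]);
    return 0;
}
```

With a zero timeout every call returns immediately, so the loop burns CPU for the whole window; with a blocking timeout the thread is asleep for almost all of it.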

Please let me know if I am not clear or you require more information.

Thanks and Regards,
Gaurav Srivastava | Associate Consultant

Re: High CPU shoot during poll retry

From

Gaurav Srivastava

Date:

18 December 2014, 19:47:28

Hi Heikki,

Sincere thanks for your quick responses.

Could similar functionality be added to psqlODBC? In the unmodified code, when nowait==FALSE it waits for an "infinite duration". If instead there were a configurable timed wait, anybody could configure it as per their requirements.

Waiting for an infinite time would still cause hung-query scenarios. Please let me know if I am wrong or have misunderstood.

Thanking again,

Gaurav Srivastava | Associate Consultant

Re: High CPU shoot during poll retry

From

Gaurav Srivastava

Date:

18 December 2014, 19:49:21

Hi Heikki,

Yes, I understand it would be difficult, but I would like to develop this patch, as it would add a lot of flexibility to psqlODBC and help its users a great deal.

I have read a lot about you as a passionate psqlODBC developer. Could you share any documentation or design details about psqlODBC, so that I can create a more effective patch?

Thanks and Regards,

Gaurav Srivastava | Associate Consultant