Thread: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Denis A Ustimenko
Date:
----- Forwarded message from Bruce Momjian <pgman@candle.pha.pa.us> -----

I don't know.  Would you ask hackers list, and perhaps CC the author of
that patch.

---------------------------------------------------------------------------

Denis A Ustimenko wrote:
> Bruce, why have all precise time calculations been droped out in 1.206? If there is no
> gettimeofday in win32?
> 
----- End forwarded message -----

-- 
Regards
Denis


Denis A Ustimenko wrote:
>>Bruce, why have all precise time calculations been droped out in 1.206? If there is no
>>gettimeofday in win32?

gettimeofday was not portable to win32 (at least not that I could find) and 
hence broke the win32 build of the clients.

Joe




Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Denis A Ustimenko
Date:
On Sun, Oct 13, 2002 at 09:02:55PM -0700, Joe Conway wrote:
> Denis A Ustimenko wrote:
> >>Bruce, why have all precise time calculations been droped out in 1.206? 
> >>If there is no
> >>gettimeofday in win32?
> 
> gettimeofday was not portable to win32 (at least not that I could find) and 
> hence broke the win32 build of the clients.
> 

GetSystemTimeAsFileTime should help.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/base/getsystemtimeasfiletime.asp

-- 
Regards
Denis


Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Bruce Momjian
Date:
Denis A Ustimenko wrote:
> On Sun, Oct 13, 2002 at 09:02:55PM -0700, Joe Conway wrote:
> > Denis A Ustimenko wrote:
> > >>Bruce, why have all precise time calculations been droped out in 1.206? 
> > >>If there is no
> > >>gettimeofday in win32?
> > 
> > gettimeofday was not portable to win32 (at least not that I could find) and 
> > hence broke the win32 build of the clients.
> > 
> 
> GetSystemTimeAsFileTime should help.
> 
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/base/getsystemtimeasfiletime.asp

It's not clear to me how we could get this into something we can deal
with like gettimeofday.

I looked at the Apache APR project, and they have a routine that returns
the microseconds since 1970 for Unix:/* NB NB NB NB This returns GMT!!!!!!!!!! */APR_DECLARE(apr_time_t)
apr_time_now(void){   struct timeval tv;    gettimeofday(&tv, NULL);    return tv.tv_sec * APR_USEC_PER_SEC +
tv.tv_usec;}

and for Win32:APR_DECLARE(apr_time_t) apr_time_now(void){    LONGLONG aprtime = 0;    FILETIME time;#ifndef _WIN32_WCE
 GetSystemTimeAsFileTime(&time);#else    SYSTEMTIME st;    GetSystemTime(&st);    SystemTimeToFileTime(&st,
&time);#endif   FileTimeToAprTime(&aprtime, &time);    return aprtime; }
 

and FileTimeToAprTime() is:/* Number of micro-seconds between the beginning of the Windows epoch * (Jan. 1, 1601) and
theUnix epoch (Jan. 1, 1970)  */#define APR_DELTA_EPOCH_IN_USEC   APR_TIME_C(11644473600000000);__inline void
FileTimeToAprTime(apr_time_t*result, FILETIME *input){    /* Convert FILETIME one 64 bit number so we can work with it.
*/   *result = input->dwHighDateTime;    *result = (*result) << 32;    *result |= input->dwLowDateTime;    *result /=
10;   /* Convert from 100 nano-sec periods to micro-seconds. */    *result -= APR_DELTA_EPOCH_IN_USEC;  /* Convert from
Windowsepoch to Unix epoch */    return;}
 

So, this is what needs to be dealt with to get it working.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Bruce Momjian wrote:
> So, this is what needs to be dealt with to get it working.
> 

More to the point, why is sub-second precision needed in this function? 
Connection timeout is given to us in whole seconds (1.205 code, i.e. prior to 
the patch in question):
     remains.tv_sec = atoi(conn->connect_timeout);     if (!remains.tv_sec)     {         conn->status =
CONNECTION_BAD;        return 0;     }     remains.tv_usec = 0;     rp = &remains;
 

So there is no way to bail out prior to one second. Once you accept that the 
timeout is >= 1 second, and in whole second increments, why does it need 
sub-second resolution?

Joe



Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Bruce Momjian
Date:
Joe Conway wrote:
> Bruce Momjian wrote:
> > So, this is what needs to be dealt with to get it working.
> > 
> 
> More to the point, why is sub-second precision needed in this function? 
> Connection timeout is given to us in whole seconds (1.205 code, i.e. prior to 
> the patch in question):
> 
>       remains.tv_sec = atoi(conn->connect_timeout);
>       if (!remains.tv_sec)
>       {
>           conn->status = CONNECTION_BAD;
>           return 0;
>       }
>       remains.tv_usec = 0;
>       rp = &remains;
> 
> So there is no way to bail out prior to one second. Once you accept that the 
> timeout is >= 1 second, and in whole second increments, why does it need 
> sub-second resolution?

It could be argued that our seconds are not as exact as they could be
with subsecond timing.  Not sure it is worth it, but I can see the
point.  I would like to remove the tv_usec test because it suggests we
are doing something with microseconds when we are not.  Also, should we
switch to a simple time() call, rather than gettimeofday()?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Bruce Momjian wrote:
> It could be argued that our seconds are not as exact as they could be
> with subsecond timing.  Not sure it is worth it, but I can see the
> point.

Well, if we were specifying the timeout in microseconds instead of seconds, it 
would make sense to have better resolution. But when you can only specify the 
timeout in seconds, the internal time comparison doesn't need to be any more 
accurate than seconds (IMHO anyway).

> are doing something with microseconds when we are not.  Also, should we
> switch to a simple time() call, rather than gettimeofday()?
> 

Already done -- that's what Denis is unhappy about.

Joe



Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Denis A Ustimenko
Date:
On Sun, Oct 13, 2002 at 10:59:40PM -0700, Joe Conway wrote:
> Well, if we were specifying the timeout in microseconds instead of seconds, 
> it would make sense to have better resolution. But when you can only 
> specify the timeout in seconds, the internal time comparison doesn't need 
> to be any more accurate than seconds (IMHO anyway).

Actually we have the state machine in connectDBComplete() and the timeout is
set for machine as the whole. Therefore if 1 second timeout is seted for the
connectDBComplete() the timeout of particualr iteration of loop can be less
then 1 second. 

-- 
Regards
Denis


Denis A Ustimenko <denis@oldham.ru> writes:
> On Sun, Oct 13, 2002 at 10:59:40PM -0700, Joe Conway wrote:
>> Well, if we were specifying the timeout in microseconds instead of seconds, 
>> it would make sense to have better resolution. But when you can only 
>> specify the timeout in seconds, the internal time comparison doesn't need 
>> to be any more accurate than seconds (IMHO anyway).

> Actually we have the state machine in connectDBComplete() and the timeout is
> set for machine as the whole. Therefore if 1 second timeout is seted for the
> connectDBComplete() the timeout of particualr iteration of loop can be less
> then 1 second. 

However, the code's been restructured so that we don't need to keep
track of the exact time spent in any one iteration.  The error is only
on the overall delay.  I agree with Joe that it's not worth the effort
needed (in the Win32 case) to make the timeout accurate to < 1 sec.
        regards, tom lane


Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Bruce Momjian
Date:
Joe Conway wrote:
> Bruce Momjian wrote:
> > It could be argued that our seconds are not as exact as they could be
> > with subsecond timing.  Not sure it is worth it, but I can see the
> > point.
> 
> Well, if we were specifying the timeout in microseconds instead of seconds, it 
> would make sense to have better resolution. But when you can only specify the 
> timeout in seconds, the internal time comparison doesn't need to be any more 
> accurate than seconds (IMHO anyway).
> 
> > are doing something with microseconds when we are not.  Also, should we
> > switch to a simple time() call, rather than gettimeofday()?
> > 
> 
> Already done -- that's what Denis is unhappy about.

OK, I see that, but now, we are stuffing everything into a timeval
struct.  Does that make sense?  Shouldn't we just use time_t?  I realize
we need the timeval struct for select() in pqWaitTimed, but we are
making a copy of the timeval we pass in anyway. Seems it would be easier
just making it a time_t.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Already done -- that's what Denis is unhappy about.

> OK, I see that, but now, we are stuffing everything into a timeval
> struct.  Does that make sense?  Shouldn't we just use time_t?

Yeah, the code could be simplified now.  I'm also still not happy about
the question of whether it's safe to assume tv_sec is signed.  I think
the running state should be just finish_time, and then inside the
loop when we are about to wait, we could do
current_time = time(NULL);if (current_time >= finish_time){    // failure exit}remains.tv_sec = finish_time -
current_time;remains.tv_usec= 0;// pass &remains to select...
 
        regards, tom lane


Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> Already done -- that's what Denis is unhappy about.
> 
> > OK, I see that, but now, we are stuffing everything into a timeval
> > struct.  Does that make sense?  Shouldn't we just use time_t?
> 
> Yeah, the code could be simplified now.  I'm also still not happy about
> the question of whether it's safe to assume tv_sec is signed.  I think
> the running state should be just finish_time, and then inside the
> loop when we are about to wait, we could do
> 
>     current_time = time(NULL);
>     if (current_time >= finish_time)
>     {
>         // failure exit
>     }
>     remains.tv_sec = finish_time - current_time;
>     remains.tv_usec = 0;
>     // pass &remains to select...

That whole remains structure should be a time_t variable, and then we
_know_ we can't assume it is signed.  The use of timeval should
happen only in pqWaitTimed because it has to use select().

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Bruce Momjian <pgman@candle.pha.pa.us> writes:
> That whole remains structure should be a time_t variable, and then we
> _know_ we can't assume it is signed.  The use of timeval should
> happen only in pqWaitTimed because it has to use select().

I think it's fine to use struct timeval as the parameter type for
pqWaitTimed.  This particular caller of pqWaitTimed has no need for
sub-second wait precision, but that doesn't mean we might not want it
for other purposes later.
        regards, tom lane


Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > That whole remains structure should be a time_t variable, and then we
> > _know_ we can't assume it is signed.  The use of timeval should
> > happen only in pqWaitTimed because it has to use select().
>
> I think it's fine to use struct timeval as the parameter type for
> pqWaitTimed.  This particular caller of pqWaitTimed has no need for
> sub-second wait precision, but that doesn't mean we might not want it
> for other purposes later.

That was a question:  whether pqWaitTimed() was something exported by
libpq and therefore something that has an API that shouldn't change.  I
see it in libpq-int.h, which I think means it isn't exported, but yes,
there could be later cases where we need subsecond stuff.

I have applied the following patch to get us a little closer to sanity.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: src/interfaces/libpq/fe-connect.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/interfaces/libpq/fe-connect.c,v
retrieving revision 1.208
diff -c -c -r1.208 fe-connect.c
*** src/interfaces/libpq/fe-connect.c    11 Oct 2002 04:41:59 -0000    1.208
--- src/interfaces/libpq/fe-connect.c    14 Oct 2002 17:10:19 -0000
***************
*** 1071,1085 ****
              conn->status = CONNECTION_BAD;
              return 0;
          }
!         remains.tv_usec = 0;
          rp = &remains;

          /* calculate the finish time based on start + timeout */
          finish_time = time((time_t *) NULL) + remains.tv_sec;
      }

!     while (rp == NULL || remains.tv_sec > 0 ||
!            (remains.tv_sec == 0 && remains.tv_usec > 0))
      {
          /*
           * Wait, if necessary.    Note that the initial state (just after
--- 1071,1084 ----
              conn->status = CONNECTION_BAD;
              return 0;
          }
!         remains.tv_usec = 0;    /* We don't use subsecond timing */
          rp = &remains;

          /* calculate the finish time based on start + timeout */
          finish_time = time((time_t *) NULL) + remains.tv_sec;
      }

!     while (rp == NULL || remains.tv_sec > 0)
      {
          /*
           * Wait, if necessary.    Note that the initial state (just after
***************
*** 1133,1139 ****
              }

              remains.tv_sec = finish_time - current_time;
-             remains.tv_usec = 0;
          }
      }
      conn->status = CONNECTION_BAD;
--- 1132,1137 ----
Index: src/interfaces/libpq/fe-misc.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/interfaces/libpq/fe-misc.c,v
retrieving revision 1.80
diff -c -c -r1.80 fe-misc.c
*** src/interfaces/libpq/fe-misc.c    3 Oct 2002 17:09:42 -0000    1.80
--- src/interfaces/libpq/fe-misc.c    14 Oct 2002 17:10:22 -0000
***************
*** 783,796 ****
  }

  int
! pqWaitTimed(int forRead, int forWrite, PGconn *conn, const struct timeval * timeout)
  {
      fd_set        input_mask;
      fd_set        output_mask;
      fd_set        except_mask;

      struct timeval tmp_timeout;
-     struct timeval *ptmp_timeout = NULL;

      if (conn->sock < 0)
      {
--- 783,795 ----
  }

  int
! pqWaitTimed(int forRead, int forWrite, PGconn *conn, const struct timeval *timeout)
  {
      fd_set        input_mask;
      fd_set        output_mask;
      fd_set        except_mask;

      struct timeval tmp_timeout;

      if (conn->sock < 0)
      {
***************
*** 823,836 ****
          if (NULL != timeout)
          {
              /*
!              * select may modify timeout argument on some platforms use
!              * copy
               */
              tmp_timeout = *timeout;
-             ptmp_timeout = &tmp_timeout;
          }
          if (select(conn->sock + 1, &input_mask, &output_mask,
!                    &except_mask, ptmp_timeout) < 0)
          {
              if (SOCK_ERRNO == EINTR)
                  goto retry5;
--- 822,834 ----
          if (NULL != timeout)
          {
              /*
!              *     select() may modify timeout argument on some platforms so
!              *    use copy
               */
              tmp_timeout = *timeout;
          }
          if (select(conn->sock + 1, &input_mask, &output_mask,
!                    &except_mask, &tmp_timeout) < 0)
          {
              if (SOCK_ERRNO == EINTR)
                  goto retry5;
Index: src/interfaces/libpq/libpq-int.h
===================================================================
RCS file: /cvsroot/pgsql-server/src/interfaces/libpq/libpq-int.h,v
retrieving revision 1.58
diff -c -c -r1.58 libpq-int.h
*** src/interfaces/libpq/libpq-int.h    3 Oct 2002 17:09:42 -0000    1.58
--- src/interfaces/libpq/libpq-int.h    14 Oct 2002 17:10:24 -0000
***************
*** 340,346 ****
  extern int    pqFlush(PGconn *conn);
  extern int    pqSendSome(PGconn *conn);
  extern int    pqWait(int forRead, int forWrite, PGconn *conn);
! extern int    pqWaitTimed(int forRead, int forWrite, PGconn *conn, const struct timeval * timeout);
  extern int    pqReadReady(PGconn *conn);
  extern int    pqWriteReady(PGconn *conn);

--- 340,346 ----
  extern int    pqFlush(PGconn *conn);
  extern int    pqSendSome(PGconn *conn);
  extern int    pqWait(int forRead, int forWrite, PGconn *conn);
! extern int    pqWaitTimed(int forRead, int forWrite, PGconn *conn, const struct timeval *timeout);
  extern int    pqReadReady(PGconn *conn);
  extern int    pqWriteReady(PGconn *conn);


Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Bruce Momjian
Date:
Oops, overoptimized a little. ptmp_timeout is needed in case no time is
passed;  ptmp_timeout restored.

---------------------------------------------------------------------------

pgman wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > That whole remains structure should be a time_t variable, and then we
> > > _know_ we can't assume it is signed.  The use of timeval should
> > > happen only in pqWaitTimed because it has to use select().
> > 
> > I think it's fine to use struct timeval as the parameter type for
> > pqWaitTimed.  This particular caller of pqWaitTimed has no need for
> > sub-second wait precision, but that doesn't mean we might not want it
> > for other purposes later.
> 
> That was a question:  whether pqWaitTimed() was something exported by
> libpq and therefore something that has an API that shouldn't change.  I
> see it in libpq-int.h, which I think means it isn't exported, but yes,
> there could be later cases where we need subsecond stuff.
> 
> I have applied the following patch to get us a little closer to sanity.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Bruce Momjian
Date:
I have applied the following comment patch.  The current code resets the
timer when select() is interruped.  On OS's that modify timeout to show
the remaining time, we should be using that value instead of resetting
the timer to its original value on select retry.

---------------------------------------------------------------------------

pgman wrote:
>
> Oops, overoptimized a little. ptmp_timeout is needed in case no time is
> passed;  ptmp_timeout restored.
>
> ---------------------------------------------------------------------------
>
> pgman wrote:
> > Tom Lane wrote:
> > > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > > That whole remains structure should be a time_t variable, and then we
> > > > _know_ we can't assume it is signed.  The use of timeval should
> > > > happen only in pqWaitTimed because it has to use select().
> > >
> > > I think it's fine to use struct timeval as the parameter type for
> > > pqWaitTimed.  This particular caller of pqWaitTimed has no need for
> > > sub-second wait precision, but that doesn't mean we might not want it
> > > for other purposes later.
> >
> > That was a question:  whether pqWaitTimed() was something exported by
> > libpq and therefore something that has an API that shouldn't change.  I
> > see it in libpq-int.h, which I think means it isn't exported, but yes,
> > there could be later cases where we need subsecond stuff.
> >
> > I have applied the following patch to get us a little closer to sanity.
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: src/interfaces/libpq/fe-misc.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/interfaces/libpq/fe-misc.c,v
retrieving revision 1.82
diff -c -c -r1.82 fe-misc.c
*** src/interfaces/libpq/fe-misc.c    14 Oct 2002 17:33:08 -0000    1.82
--- src/interfaces/libpq/fe-misc.c    14 Oct 2002 18:08:14 -0000
***************
*** 824,830 ****
          {
              /*
               *     select() may modify timeout argument on some platforms so
!              *    use copy
               */
              tmp_timeout = *timeout;
              ptmp_timeout = &tmp_timeout;
--- 824,835 ----
          {
              /*
               *     select() may modify timeout argument on some platforms so
!              *    use copy.
!              *    XXX Do we really want to do that?  If select() returns
!              *    the number of seconds remaining, we are resetting
!              *    the timeout to its original value.  This will yeild
!              *    incorrect timings when select() is interrupted.
!              *    bjm 2002-10-14
               */
              tmp_timeout = *timeout;
              ptmp_timeout = &tmp_timeout;

Bruce Momjian <pgman@candle.pha.pa.us> writes:
>               /*
>                *     select() may modify timeout argument on some platforms so
> !              *    use copy.
> !              *    XXX Do we really want to do that?  If select() returns
> !              *    the number of seconds remaining, we are resetting
> !              *    the timeout to its original value.  This will yeild
> !              *    incorrect timings when select() is interrupted.
> !              *    bjm 2002-10-14
>                */
>               tmp_timeout = *timeout;
>               ptmp_timeout = &tmp_timeout;


Actually, now that I look at this, the API for PQwaitTimed is wrong
after all.  The right way to implement this is for the caller to pass in
finish_time (or some indication that no timeout is wanted).  Evaluation
of the time left to wait should happen inside this retry loop.  That
way, you get the right behavior (plus or minus one second, anyway)
independently of whether the platform's select() reduces its timeout
argument or not.
        regards, tom lane


Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >               /*
> >                *     select() may modify timeout argument on some platforms so
> > !              *    use copy.
> > !              *    XXX Do we really want to do that?  If select() returns
> > !              *    the number of seconds remaining, we are resetting
> > !              *    the timeout to its original value.  This will yeild
> > !              *    incorrect timings when select() is interrupted.
> > !              *    bjm 2002-10-14
> >                */
> >               tmp_timeout = *timeout;
> >               ptmp_timeout = &tmp_timeout;
> 
> 
> Actually, now that I look at this, the API for PQwaitTimed is wrong
> after all.  The right way to implement this is for the caller to pass in
> finish_time (or some indication that no timeout is wanted).  Evaluation
> of the time left to wait should happen inside this retry loop.  That
> way, you get the right behavior (plus or minus one second, anyway)
> independently of whether the platform's select() reduces its timeout
> argument or not.

Yes, you are saying do the time() inside PQwaitTimed(), so we can
properly get new time() values on select() retry.  Yep.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Denis A Ustimenko
Date:
Tom, excuse me, I forget to copy previous posting to pgsql-hackers@postgresql.org.

On Mon, Oct 14, 2002 at 09:53:27AM -0400, Tom Lane wrote:
> Denis A Ustimenko <denis@oldham.ru> writes:
> > On Sun, Oct 13, 2002 at 10:59:40PM -0700, Joe Conway wrote:
> >> Well, if we were specifying the timeout in microseconds instead of seconds, 
> >> it would make sense to have better resolution. But when you can only 
> >> specify the timeout in seconds, the internal time comparison doesn't need 
> >> to be any more accurate than seconds (IMHO anyway).
> 
> > Actually we have the state machine in connectDBComplete() and the timeout is
> > set for machine as the whole. Therefore if 1 second timeout is seted for the
> > connectDBComplete() the timeout of particualr iteration of loop can be less
> > then 1 second. 
> 
> However, the code's been restructured so that we don't need to keep
> track of the exact time spent in any one iteration.  The error is only
> on the overall delay.  I agree with Joe that it's not worth the effort
> needed (in the Win32 case) to make the timeout accurate to < 1 sec.
> 

Beware of almost 1 second posiible error. For example: connect_timeout == 1,
we start at 0.999999 then finish_time == 1. If CPU is quite busy we will
do only one iteration. I don't know is it enough to make connection?
True timeout in this case == 0.000001

-- 
Regards
Denis


Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Bruce Momjian
Date:
Denis A Ustimenko wrote:
> Tom, excuse me, I forget to copy previous posting to pgsql-hackers@postgresql.org.
> 
> On Mon, Oct 14, 2002 at 09:53:27AM -0400, Tom Lane wrote:
> > Denis A Ustimenko <denis@oldham.ru> writes:
> > > On Sun, Oct 13, 2002 at 10:59:40PM -0700, Joe Conway wrote:
> > >> Well, if we were specifying the timeout in microseconds instead of seconds, 
> > >> it would make sense to have better resolution. But when you can only 
> > >> specify the timeout in seconds, the internal time comparison doesn't need 
> > >> to be any more accurate than seconds (IMHO anyway).
> > 
> > > Actually we have the state machine in connectDBComplete() and the timeout is
> > > set for machine as the whole. Therefore if 1 second timeout is seted for the
> > > connectDBComplete() the timeout of particualr iteration of loop can be less
> > > then 1 second. 
> > 
> > However, the code's been restructured so that we don't need to keep
> > track of the exact time spent in any one iteration.  The error is only
> > on the overall delay.  I agree with Joe that it's not worth the effort
> > needed (in the Win32 case) to make the timeout accurate to < 1 sec.
> > 
> 
> Beware of almost 1 second posiible error. For example: connect_timeout == 1,
> we start at 0.999999 then finish_time == 1. If CPU is quite busy we will
> do only one iteration. I don't know is it enough to make connection?
> True timeout in this case == 0.000001

Good question.  What is going to happen is that select() is going to be
passed tv_sec = 1, and it is going to sleep for one second.  Now, if
select is interrupted, another time() call is going to be made.  If the
second ticked just before we run time(), we are going to think one
second has elapsed and return, so in the non-interrupt case, we are
going to be fine, but with select() interrupted, we are going to have
this problem.  Now, if we used gettimeofday(), we would be fine, but we
don't have that on Win32 and the ported version of that looks pretty
complicated.  Maybe we will go with what we have and see if anyone
complains.

One idea I am looking at is to pass a time_t into pqWaitTimeout, and do
that clock checking in there, and maybe I can do a gettimeofday() for
non-Win32 and a time() on Win32.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Denis A Ustimenko wrote:
>> Beware of almost 1 second posiible error. For example: connect_timeout == 1,
>> we start at 0.999999 then finish_time == 1. If CPU is quite busy we will
>> do only one iteration. I don't know is it enough to make connection?
>> True timeout in this case == 0.000001

> Good question.  What is going to happen is that select() is going to be
> passed tv_sec = 1, and it is going to sleep for one second.  Now, if
> select is interrupted, another time() call is going to be made.

There is a very simple answer to this, which I think I suggested to Joe
originally, but it's not in the code now: the initial calculation of
finish_time = now() + timeout must add one.  This ensures that any
roundoff error is in the conservative direction of timing out later,
rather than sooner.
        regards, tom lane


Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Denis A Ustimenko wrote:
> >> Beware of almost 1 second posiible error. For example: connect_timeout == 1,
> >> we start at 0.999999 then finish_time == 1. If CPU is quite busy we will
> >> do only one iteration. I don't know is it enough to make connection?
> >> True timeout in this case == 0.000001
> 
> > Good question.  What is going to happen is that select() is going to be
> > passed tv_sec = 1, and it is going to sleep for one second.  Now, if
> > select is interrupted, another time() call is going to be made.
> 
> There is a very simple answer to this, which I think I suggested to Joe
> originally, but it's not in the code now: the initial calculation of
> finish_time = now() + timeout must add one.  This ensures that any
> roundoff error is in the conservative direction of timing out later,
> rather than sooner.

Yes, I saw that and will try to add it back into the code.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
>>Good question.  What is going to happen is that select() is going to be
>>passed tv_sec = 1, and it is going to sleep for one second.  Now, if
>>select is interrupted, another time() call is going to be made.
> 
> There is a very simple answer to this, which I think I suggested to Joe
> originally, but it's not in the code now: the initial calculation of
> finish_time = now() + timeout must add one.  This ensures that any
> roundoff error is in the conservative direction of timing out later,
> rather than sooner.

Yes, my bad, I guess.

The thing was that with the extra +1, I was repeatedly getting a wall-clock 
time of 2 seconds with a timeout set to 1 second. It seemed odd to have my 1 
second timeout automatically turned into 2 seconds every time. With the 
current code, I tried a timeout of 1 second at least a 100 times and it always 
took about 1 full wall-clock second. But I guess if there is some corner case 
that needs it...

Joe



Joe Conway <mail@joeconway.com> writes:
> The thing was that with the extra +1, I was repeatedly getting a wall-clock 
> time of 2 seconds with a timeout set to 1 second. It seemed odd to have my 1 
> second timeout automatically turned into 2 seconds every time.

That is odd; seems like you should get between 1 and 2 seconds.  How
were you measuring the delay, exactly?
        regards, tom lane


Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Bruce Momjian
Date:
Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
> > The thing was that with the extra +1, I was repeatedly getting a wall-clock 
> > time of 2 seconds with a timeout set to 1 second. It seemed odd to have my 1 
> > second timeout automatically turned into 2 seconds every time.
> 
> That is odd; seems like you should get between 1 and 2 seconds.  How
> were you measuring the delay, exactly?

Remember, that if you add 1, the select() is going to get tv_sec = 2, so
yes, it will be two seconds.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> That is odd; seems like you should get between 1 and 2 seconds.  How
>> were you measuring the delay, exactly?

> Remember, that if you add 1, the select() is going to get tv_sec = 2, so
> yes, it will be two seconds.

Yeah, but only if the value isn't recalculated shortly later.  Consider
caller computes finish_time = time() + timeout;
...
inside select-wait loop, compute max_delay = finish_time - time();

If the time() value has incremented by 1 second between these two lines
of code, you have a problem with a 1-second timeout...
        regards, tom lane


Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> That is odd; seems like you should get between 1 and 2 seconds.  How
> >> were you measuring the delay, exactly?
> 
> > Remember, that if you add 1, the select() is going to get tv_sec = 2, so
> > yes, it will be two seconds.
> 
> Yeah, but only if the value isn't recalculated shortly later.  Consider
> 
>     caller computes finish_time = time() + timeout;
> 
>     ...
> 
>     inside select-wait loop, compute max_delay = finish_time - time();
> 
> If the time() value has incremented by 1 second between these two lines
> of code, you have a problem with a 1-second timeout...

Yep.  If you track finish time, you get that 1 second rounding problem,
and if you track just duration/timeout, you get into the problem of not
knowing when the timeout has ended.  I don't think these can be fixed
except by overestimating (+1) or by tracking subseconds along with
seconds so you really know when one second has elapsed.

Perhaps we need to modify a timeout of 1 to be 2 and leave other values
alone.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Tom Lane wrote:
 > Joe Conway <mail@joeconway.com> writes:
 >
 >> The thing was that with the extra +1, I was repeatedly getting a
 >> wall-clock time of 2 seconds with a timeout set to 1 second. It seemed
 >> odd to have my 1 second timeout automatically turned into 2 seconds every
 >> time.
 >
 > That is odd; seems like you should get between 1 and 2 seconds.  How were
 > you measuring the delay, exactly?

OK. I got a little more scientific about my testing. I used a php script,
running on the same machine, to connect/disconnect in a tight loop and timed
successful and unsuccessful connection attempts using microtime().

Here are the results. First with current cvs code:

current cvs libpq code
-----------------------
good connect info, using unix socket, timeout = 1 second:
=========================================================
unsuccessful 69 times: sum 0.41736388206482: avg 0.0060487519139829
successful 9931 times: sum 68.798981308937: avg 0.0069276992557584

good connect info, using hostaddr, timeout = 1 second
=====================================================
unsuccessful 72 times: sum 0.37020063400269: avg 0.0051416754722595
successful 9928 times: sum 75.047878861427: avg 0.0075592142285886

current cvs libpq code - bad hostaadr, using hostaddr, timeout = 1 second
=========================================================================
unsuccessful 100 times: sum 99.975758910179: avg 0.99975758910179
successful 0 times: sum 0: avg n/a


Clearly not good. The timeout code is causing connection failures about 0.7%
of the time. Next are the results using the attached patch. Per Bruce's
suggestion, it only adds 1 if the timeout is set to 1.


with patch libpq code
---------------------
good connect info, using unix socket, timeout = 1 second
========================================================
unsuccessful 0 times: sum 0: avg n/a
successful 10000 times: sum 68.95981669426: avg 0.006895981669426

with patch libpq code - good connect info, using hostaddr, timeout = 1 second
=============================================================================
unsuccessful 0 times: sum 0: avg n/a
successful 10000 times: sum 73.500863552094: avg 0.0073500863552094

with patch libpq code - good connect info, using hostaddr, timeout = 2 seconds
==============================================================================
unsuccessful 0 times: sum 0: avg n/a
successful 10000 times: sum 73.354710936546: avg 0.0073354710936546

with patch libpq code - bad hostaadr, using hostaddr, timeout = 1 second
========================================================================
unsuccessful 100 times: sum 149.98181843758: avg 1.4998181843758
successful 0 times: sum 0: avg n/a

with patch libpq code - bad hostaadr, using hostaddr, timeout = 2 seconds
=========================================================================
unsuccessful 100 times: sum 149.98445630074: avg 1.4998445630074
successful 0 times: sum 0: avg n/a

with patch libpq code - bad hostaadr, using hostaddr, timeout = 3 seconds
=========================================================================
unsuccessful 20 times: sum 59.842629671097: avg 2.9921314835548
successful 0 times: sum 0: avg n/a


With the patch there were 0 failures on 30000 attempts using good connect
information.

If there are no objections, please apply the attached. Otherwise let me know
if you'd like different tests or would like to try other approaches.

Thanks,

Joe


Index: src/interfaces/libpq/fe-connect.c
===================================================================
RCS file: /opt/src/cvs/pgsql-server/src/interfaces/libpq/fe-connect.c,v
retrieving revision 1.210
diff -c -r1.210 fe-connect.c
*** src/interfaces/libpq/fe-connect.c    15 Oct 2002 01:48:25 -0000    1.210
--- src/interfaces/libpq/fe-connect.c    15 Oct 2002 22:36:53 -0000
***************
*** 1066,1071 ****
--- 1066,1073 ----
      if (conn->connect_timeout != NULL)
      {
          remains.tv_sec = atoi(conn->connect_timeout);
+         if (remains.tv_sec == 1)
+             remains.tv_sec += 1;
          if (!remains.tv_sec)
          {
              conn->status = CONNECTION_BAD;

Joe Conway <mail@joeconway.com> writes:
> [ some convincing test cases that timeout=1 is not good ]

>           remains.tv_sec = atoi(conn->connect_timeout);
> +         if (remains.tv_sec == 1)
> +             remains.tv_sec += 1;
>           if (!remains.tv_sec)
>           {
>               conn->status = CONNECTION_BAD;

On pure-paranoia grounds, I'd suggest the logic

+        /* force a sane minimum delay */
+         if (remains.tv_sec < 2)
+             remains.tv_sec = 2;

whereupon you could remove the failure check just below.
        regards, tom lane


Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Bruce Momjian
Date:
Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
> > [ some convincing test cases that timeout=1 is not good ]
>
> >           remains.tv_sec = atoi(conn->connect_timeout);
> > +         if (remains.tv_sec == 1)
> > +             remains.tv_sec += 1;
> >           if (!remains.tv_sec)
> >           {
> >               conn->status = CONNECTION_BAD;
>
> On pure-paranoia grounds, I'd suggest the logic
>
> +        /* force a sane minimum delay */
> +         if (remains.tv_sec < 2)
> +             remains.tv_sec = 2;
>
> whereupon you could remove the failure check just below.

I think we should fail if they set the timeout to zero, rather than
cover it up by setting the delay to two.

Attached is a patch that implements most of what we have discussed:

    use time()
    get rid of timeval where not needed
    allow restart of select() to properly compute remaining time
    add 1 to timeout == 1
    pass finish time to pqWaitTimed

Patch applied.  I am applying it so it is in CVS and everyone can see
it.  I will keep modifying it until everyone likes it.  It is just
easier to do it that way when multiple people are reviewing it.  They
can jump in and make changes too.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Index: src/interfaces/libpq/fe-connect.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/interfaces/libpq/fe-connect.c,v
retrieving revision 1.210
diff -c -c -r1.210 fe-connect.c
*** src/interfaces/libpq/fe-connect.c    15 Oct 2002 01:48:25 -0000    1.210
--- src/interfaces/libpq/fe-connect.c    16 Oct 2002 02:48:07 -0000
***************
*** 1052,1061 ****
  {
      PostgresPollingStatusType flag = PGRES_POLLING_WRITING;

!     time_t            finish_time = 0,
!                     current_time;
!     struct timeval    remains,
!                    *rp = NULL;

      if (conn == NULL || conn->status == CONNECTION_BAD)
          return 0;
--- 1052,1058 ----
  {
      PostgresPollingStatusType flag = PGRES_POLLING_WRITING;

!     time_t            finish_time = -1;

      if (conn == NULL || conn->status == CONNECTION_BAD)
          return 0;
***************
*** 1065,1084 ****
       */
      if (conn->connect_timeout != NULL)
      {
!         remains.tv_sec = atoi(conn->connect_timeout);
!         if (!remains.tv_sec)
          {
              conn->status = CONNECTION_BAD;
              return 0;
          }
!         remains.tv_usec = 0;    /* We don't use subsecond timing */
!         rp = &remains;
!
          /* calculate the finish time based on start + timeout */
!         finish_time = time((time_t *) NULL) + remains.tv_sec;
      }

!     while (rp == NULL || remains.tv_sec > 0)
      {
          /*
           * Wait, if necessary.    Note that the initial state (just after
--- 1062,1082 ----
       */
      if (conn->connect_timeout != NULL)
      {
!         int timeout = atoi(conn->connect_timeout);
!
!         if (timeout == 0)
          {
              conn->status = CONNECTION_BAD;
              return 0;
          }
!         /* Rounding could cause connection to fail;we need at least 2 secs */
!         if (timeout == 1)
!             timeout++;
          /* calculate the finish time based on start + timeout */
!         finish_time = time(NULL) + timeout;
      }

!     while (finish_time == -1 || time(NULL) >= finish_time)
      {
          /*
           * Wait, if necessary.    Note that the initial state (just after
***************
*** 1094,1100 ****
                  return 1;        /* success! */

              case PGRES_POLLING_READING:
!                 if (pqWaitTimed(1, 0, conn, rp))
                  {
                      conn->status = CONNECTION_BAD;
                      return 0;
--- 1092,1098 ----
                  return 1;        /* success! */

              case PGRES_POLLING_READING:
!                 if (pqWaitTimed(1, 0, conn, finish_time))
                  {
                      conn->status = CONNECTION_BAD;
                      return 0;
***************
*** 1102,1108 ****
                  break;

              case PGRES_POLLING_WRITING:
!                 if (pqWaitTimed(0, 1, conn, rp))
                  {
                      conn->status = CONNECTION_BAD;
                      return 0;
--- 1100,1106 ----
                  break;

              case PGRES_POLLING_WRITING:
!                 if (pqWaitTimed(0, 1, conn, finish_time))
                  {
                      conn->status = CONNECTION_BAD;
                      return 0;
***************
*** 1119,1138 ****
           * Now try to advance the state machine.
           */
          flag = PQconnectPoll(conn);
-
-         /*
-          * If connecting timeout is set, calculate remaining time.
-          */
-         if (rp != NULL)
-         {
-             if (time(¤t_time) == -1)
-             {
-                 conn->status = CONNECTION_BAD;
-                 return 0;
-             }
-
-             remains.tv_sec = finish_time - current_time;
-         }
      }
      conn->status = CONNECTION_BAD;
      return 0;
--- 1117,1122 ----
Index: src/interfaces/libpq/fe-misc.c
===================================================================
RCS file: /cvsroot/pgsql-server/src/interfaces/libpq/fe-misc.c,v
retrieving revision 1.83
diff -c -c -r1.83 fe-misc.c
*** src/interfaces/libpq/fe-misc.c    14 Oct 2002 18:11:17 -0000    1.83
--- src/interfaces/libpq/fe-misc.c    16 Oct 2002 02:48:10 -0000
***************
*** 779,789 ****
  int
  pqWait(int forRead, int forWrite, PGconn *conn)
  {
!     return pqWaitTimed(forRead, forWrite, conn, (const struct timeval *) NULL);
  }

  int
! pqWaitTimed(int forRead, int forWrite, PGconn *conn, const struct timeval *timeout)
  {
      fd_set        input_mask;
      fd_set        output_mask;
--- 779,789 ----
  int
  pqWait(int forRead, int forWrite, PGconn *conn)
  {
!     return pqWaitTimed(forRead, forWrite, conn, -1);
  }

  int
! pqWaitTimed(int forRead, int forWrite, PGconn *conn, time_t finish_time)
  {
      fd_set        input_mask;
      fd_set        output_mask;
***************
*** 820,826 ****
              FD_SET(conn->sock, &output_mask);
          FD_SET(conn->sock, &except_mask);

!         if (NULL != timeout)
          {
              /*
               *     select() may modify timeout argument on some platforms so
--- 820,826 ----
              FD_SET(conn->sock, &output_mask);
          FD_SET(conn->sock, &except_mask);

!         if (finish_time != -1)
          {
              /*
               *     select() may modify timeout argument on some platforms so
***************
*** 831,837 ****
               *    incorrect timings when select() is interrupted.
               *    bjm 2002-10-14
               */
!             tmp_timeout = *timeout;
              ptmp_timeout = &tmp_timeout;
          }
          if (select(conn->sock + 1, &input_mask, &output_mask,
--- 831,839 ----
               *    incorrect timings when select() is interrupted.
               *    bjm 2002-10-14
               */
!             if ((tmp_timeout.tv_sec = finish_time - time(NULL)) < 0)
!                 tmp_timeout.tv_sec = 0;  /* possible? */
!             tmp_timeout.tv_usec = 0;
              ptmp_timeout = &tmp_timeout;
          }
          if (select(conn->sock + 1, &input_mask, &output_mask,
Index: src/interfaces/libpq/libpq-int.h
===================================================================
RCS file: /cvsroot/pgsql-server/src/interfaces/libpq/libpq-int.h,v
retrieving revision 1.59
diff -c -c -r1.59 libpq-int.h
*** src/interfaces/libpq/libpq-int.h    14 Oct 2002 17:15:11 -0000    1.59
--- src/interfaces/libpq/libpq-int.h    16 Oct 2002 02:48:12 -0000
***************
*** 340,346 ****
  extern int    pqFlush(PGconn *conn);
  extern int    pqSendSome(PGconn *conn);
  extern int    pqWait(int forRead, int forWrite, PGconn *conn);
! extern int    pqWaitTimed(int forRead, int forWrite, PGconn *conn, const struct timeval *timeout);
  extern int    pqReadReady(PGconn *conn);
  extern int    pqWriteReady(PGconn *conn);

--- 340,347 ----
  extern int    pqFlush(PGconn *conn);
  extern int    pqSendSome(PGconn *conn);
  extern int    pqWait(int forRead, int forWrite, PGconn *conn);
! extern int    pqWaitTimed(int forRead, int forWrite, PGconn *conn,
!                         time_t finish_time);
  extern int    pqReadReady(PGconn *conn);
  extern int    pqWriteReady(PGconn *conn);


Bruce Momjian wrote:
> Patch applied.  I am applying it so it is in CVS and everyone can see
> it.  I will keep modifying it until everyone likes it.  It is just
> easier to do it that way when multiple people are reviewing it.  They
> can jump in and make changes too.

I ran the same test as before, with the following results:


current cvs
-----------
good connect info, using hostaddr, timeout = 1 second
=====================================================
unsuccessful 0 times: avg n/a
successful 50000 times: avg 0.0087

bad connect info, using hostaddr, timeout = 1 second
====================================================
unsuccessful 100 times: avg 1.4998
successful 0 times: sum 0: avg n/a


Seems to work well. But one slight concern:

with previous 2 line patch
--------------------------
good connect info, using hostaddr, timeout = 1 || 2 second(s)
=============================================================
unsuccessful 0 times: avg n/a
successful 20000 times: avg 0.0074

These tests were on the same, otherwise unloaded development box. Not sure if 
it is an artifact or not, but the average connection time has gone from 0.0074 
to 0.0087, an increase of about 17%. Also worth noting is that there was very 
little deviation from connect-to-connect on both of the tests (eye-balled the 
total range at about 0.0003). I did not bother calculating standard deviation 
of the connect times, but I'm certain it would not be enough to account for 
the difference. Could anything in Bruce's patch account for this, or do you 
think it is normal variation due to something on my dev box?

Joe



Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Bruce Momjian
Date:
Joe Conway wrote:
> Seems to work well. But one slight concern:
> 
> with previous 2 line patch
> --------------------------
> good connect info, using hostaddr, timeout = 1 || 2 second(s)
> =============================================================
> unsuccessful 0 times: avg n/a
> successful 20000 times: avg 0.0074
> 
> These tests were on the same, otherwise unloaded development box. Not sure if 
> it is an artifact or not, but the average connection time has gone from 0.0074 
> to 0.0087, an increase of about 17%. Also worth noting is that there was very 
> little deviation from connect-to-connect on both of the tests (eye-balled the 
> total range at about 0.0003). I did not bother calculating standard deviation 
> of the connect times, but I'm certain it would not be enough to account for 
> the difference. Could anything in Bruce's patch account for this, or do you 
> think it is normal variation due to something on my dev box?

Yes, the new code has _three_ time() calls, rather than the old code
that I think only had two.  I was going to mention it but I figured
time() was a pretty light system call, sort of like getpid().

I needed the additional time() calls so the computation of remaining
time was more accurate, i.e. we are not resetting the timer on a
select() EINTR anymore.

Should I try to rework it?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Bruce Momjian wrote:
> Yes, the new code has _three_ time() calls, rather than the old code
> that I think only had two.  I was going to mention it but I figured
> time() was a pretty light system call, sort of like getpid().
> 
> I needed the additional time() calls so the computation of remaining
> time was more accurate, i.e. we are not resetting the timer on a
> select() EINTR anymore.
> 
> Should I try to rework it?
> 

I tried two more runs of 10000, and the average is pretty steady at 0.0087. 
However the total range is a fair bit wider than I originally reported.

I added a forth time() call to see what the effect would be. It increased the 
average to 0.0089 (two runs of 10000 connects each), so I don't think the 
time() call explains the entire difference.

Not sure this is worth worrying about or not. I'd guess anyone serious about 
keeping connect time to a minimum uses some kind of connection pool or 
persistent connection anyway.

Joe




Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Bruce Momjian
Date:
Joe Conway wrote:
> Bruce Momjian wrote:
> > Yes, the new code has _three_ time() calls, rather than the old code
> > that I think only had two.  I was going to mention it but I figured
> > time() was a pretty light system call, sort of like getpid().
> > 
> > I needed the additional time() calls so the computation of remaining
> > time was more accurate, i.e. we are not resetting the timer on a
> > select() EINTR anymore.
> > 
> > Should I try to rework it?
> > 
> 
> I tried two more runs of 10000, and the average is pretty steady at 0.0087. 
> However the total range is a fair bit wider than I originally reported.
> 
> I added a forth time() call to see what the effect would be. It increased the 
> average to 0.0089 (two runs of 10000 connects each), so I don't think the 
> time() call explains the entire difference.
> 
> Not sure this is worth worrying about or not. I'd guess anyone serious about 
> keeping connect time to a minimum uses some kind of connection pool or 
> persistent connection anyway.

Well, the fact you see a change of 0.0002 is significant.  Let me add
that in the old code there was only one time() call _in_ the loop, while
now, there are two, so I can easily see there are several additional
time() calls.  Did you put your calls in the while loop?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Bruce Momjian wrote:
> Well, the fact you see a change of 0.0002 is significant.  Let me add
> that in the old code there was only one time() call _in_ the loop, while
> now, there are two, so I can easily see there are several additional
> time() calls.  Did you put your calls in the while loop?
> 

Not the first time, but I moved it around (into the loop in connectDBComplete, 
and just before select() in pqWaitTimed) and it didn't seem to make any 
difference. Then I removed it entirely again, and still got 0.0089 seconds 
average. So, at the least, 0.0002 worth of variation seems to be related to 
the development machine itself.

Joe

p.s. The good news is that with tens of thousands more tests at 1 second 
timeout, still zero connection failures!




Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Denis A Ustimenko
Date:
On Mon, Oct 14, 2002 at 01:00:07AM -0400, Bruce Momjian wrote:
> Denis A Ustimenko wrote:
> > On Sun, Oct 13, 2002 at 09:02:55PM -0700, Joe Conway wrote:
> > > Denis A Ustimenko wrote:
> > > >>Bruce, why have all precise time calculations been droped out in 1.206? 
> > > >>If there is no
> > > >>gettimeofday in win32?
> > > 
> > > gettimeofday was not portable to win32 (at least not that I could find) and 
> > > hence broke the win32 build of the clients.
> > > 
> > 
> > GetSystemTimeAsFileTime should help.
> > 
> > http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/base/getsystemtimeasfiletime.asp
> 
> It's not clear to me how we could get this into something we can deal
> with like gettimeofday.
> 
> I looked at the Apache APR project, and they have a routine that returns
> the microseconds since 1970 for Unix:
>     

Here is my version of gettimeofday for win32. It was tested with Watcom C 11.0c. I think it can be used. I still belive
thatfine time calculation is the right way.
 

#include<stdio.h>
#ifdef _WIN32
#include<winsock.h>
#else
#include<sys/time.h>
#endif

main()
{   struct timeval t;   if (gettimeofday(&t,NULL)) {       printf("error\n\r");   } else {       printf("the time is
%ld.%ld\n\r",t.tv_sec, t.tv_usec);   }   fflush(stdout);
 
}

#ifdef _WIN32
int gettimeofday(struct timeval *tp, void *tzp)
{   FILETIME time;   __int64 tmp;
   if ( NULL == tp) return -1;
   GetSystemTimeAsFileTime(&time);
   tmp = time.dwHighDateTime;   tmp <<= 32;   tmp |= time.dwLowDateTime;   tmp /= 10; // it was in 100 nanosecond
periods  tp->tv_sec = tmp / 1000000 - 11644473600L; // Windows Epoch begins at 12:00 AM 01.01.1601   tp->tv_usec = tmp
%1000000;   return 0;
 
}
#endif



-- 
Regards
Denis


Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Yes, the new code has _three_ time() calls, rather than the old code
> that I think only had two.  I was going to mention it but I figured
> time() was a pretty light system call, sort of like getpid().
> I needed the additional time() calls so the computation of remaining
> time was more accurate, i.e. we are not resetting the timer on a
> select() EINTR anymore.

As long as the time() calls aren't invoked in the default no-timeout
case, I doubt that the small additional slowdown matters too much.
Still, one could ask why we are expending extra cycles to make the
timeout more accurate.  Who the heck needs an accurate timeout on
connect?  Can you really give a use-case where the user won't have
picked a number out of the air anyway?
        regards, tom lane


Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Yes, the new code has _three_ time() calls, rather than the old code
> > that I think only had two.  I was going to mention it but I figured
> > time() was a pretty light system call, sort of like getpid().
> > I needed the additional time() calls so the computation of remaining
> > time was more accurate, i.e. we are not resetting the timer on a
> > select() EINTR anymore.
> 
> As long as the time() calls aren't invoked in the default no-timeout
> case, I doubt that the small additional slowdown matters too much.
> Still, one could ask why we are expending extra cycles to make the
> timeout more accurate.  Who the heck needs an accurate timeout on
> connect?  Can you really give a use-case where the user won't have
> picked a number out of the air anyway?

I think we do need to properly compute the timeout on an EINTR of
select() because if we don't, a 30 second timeout could become 90
seconds if select() is interrupted.  The other time() calls are needed,
one above the loop, and one inside the loop.

The only thing I can do is to pass in the end time _and_ the duration to
pqWaitTimeout and do the time() recomputation only on EINTR.  This would
compute duration in the loop where I call time() and therefore elminate
a time() call in the normal, non-EINTR case. Of course, it makes the
code more complicated, and that is why I avoided it.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Yes, the new code has _three_ time() calls, rather than the old code
> > that I think only had two.  I was going to mention it but I figured
> > time() was a pretty light system call, sort of like getpid().
> > I needed the additional time() calls so the computation of remaining
> > time was more accurate, i.e. we are not resetting the timer on a
> > select() EINTR anymore.
> 
> As long as the time() calls aren't invoked in the default no-timeout
> case, I doubt that the small additional slowdown matters too much.
> Still, one could ask why we are expending extra cycles to make the
> timeout more accurate.  Who the heck needs an accurate timeout on
> connect?  Can you really give a use-case where the user won't have
> picked a number out of the air anyway?

Yes, the default no-timeout case makes no time() calls.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Tom Lane wrote:
>> Still, one could ask why we are expending extra cycles to make the
>> timeout more accurate.  Who the heck needs an accurate timeout on
>> connect?  Can you really give a use-case where the user won't have
>> picked a number out of the air anyway?

> I think we do need to properly compute the timeout on an EINTR of
> select() because if we don't, a 30 second timeout could become 90
> seconds if select() is interrupted.  The other time() calls are needed,
> one above the loop, and one inside the loop.

AFAICS we need one time() call at the start, and then one inside the
select loop.  I haven't looked at your recent patches, but you said
something about putting two calls in the loop; that seems like overkill.
        regards, tom lane


Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Bruce Momjian
Date:
I will keep this in case we need it later.  I think we worked around
this problem by having timeout == 1 set equal to 2 so we get at least
one second for the connection.

---------------------------------------------------------------------------

Denis A Ustimenko wrote:
> On Mon, Oct 14, 2002 at 01:00:07AM -0400, Bruce Momjian wrote:
> > Denis A Ustimenko wrote:
> > > On Sun, Oct 13, 2002 at 09:02:55PM -0700, Joe Conway wrote:
> > > > Denis A Ustimenko wrote:
> > > > >>Bruce, why have all precise time calculations been droped out in 1.206? 
> > > > >>If there is no
> > > > >>gettimeofday in win32?
> > > > 
> > > > gettimeofday was not portable to win32 (at least not that I could find) and 
> > > > hence broke the win32 build of the clients.
> > > > 
> > > 
> > > GetSystemTimeAsFileTime should help.
> > > 
> > > http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/base/getsystemtimeasfiletime.asp
> > 
> > It's not clear to me how we could get this into something we can deal
> > with like gettimeofday.
> > 
> > I looked at the Apache APR project, and they have a routine that returns
> > the microseconds since 1970 for Unix:
> >     
> 
> Here is my version of gettimeofday for win32. It was tested with Watcom C 11.0c. I think it can be used. I still
belivethat fine time calculation is the right way.
 
> 
> #include<stdio.h>
> #ifdef _WIN32
> #include<winsock.h>
> #else
> #include<sys/time.h>
> #endif
> 
> main()
> {
>     struct timeval t;
>     if (gettimeofday(&t,NULL)) {
>         printf("error\n\r");
>     } else {
>         printf("the time is %ld.%ld\n\r", t.tv_sec, t.tv_usec);
>     }
>     fflush(stdout);
> }
> 
> #ifdef _WIN32
> int gettimeofday(struct timeval *tp, void *tzp)
> {
>     FILETIME time;
>     __int64 tmp;
> 
>     if ( NULL == tp) return -1;
> 
>     GetSystemTimeAsFileTime(&time);
> 
>     tmp = time.dwHighDateTime;
>     tmp <<= 32;
>     tmp |= time.dwLowDateTime;
>     tmp /= 10; // it was in 100 nanosecond periods
>     tp->tv_sec = tmp / 1000000 - 11644473600L; // Windows Epoch begins at 12:00 AM 01.01.1601
>     tp->tv_usec = tmp % 1000000;
>     return 0;
> }
> #endif
> 
> 
> 
> -- 
> Regards
> Denis
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: droped out precise time calculations in src/interfaces/libpq/fe-connect.c

From
Bruce Momjian
Date:
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > Tom Lane wrote:
> >> Still, one could ask why we are expending extra cycles to make the
> >> timeout more accurate.  Who the heck needs an accurate timeout on
> >> connect?  Can you really give a use-case where the user won't have
> >> picked a number out of the air anyway?
> 
> > I think we do need to properly compute the timeout on an EINTR of
> > select() because if we don't, a 30 second timeout could become 90
> > seconds if select() is interrupted.  The other time() calls are needed,
> > one above the loop, and one inside the loop.
> 
> AFAICS we need one time() call at the start, and then one inside the
> select loop.  I haven't looked at your recent patches, but you said
> something about putting two calls in the loop; that seems like overkill.

Yes, one call at the start, and one in the loop.  We need another in
pqWaitTimeout, but only if we hit EINTR.  As the code stands now we do
time() unconditionally in pqWaitTimeout too because we only have to pass
in the funish time.  If we want to pass in both finish and duration (for
use as select() timeout param), then we can eliminate the time() call in
there for the non-EINTR case.  Is it worth the added code complexity? 
That is what I am not sure about.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073