Thread: Millisecond-precision connect_timeout for libpq

Millisecond-precision connect_timeout for libpq

From

ivan babrou

Date:

05 July 2013, 18:07:32

Hi, guys! I made a quick patch to support floating number in
connect_timeout param for libpq. This will treat floating number as
seconds so this is backwards-compatible. I don't usually write in C,
so there may be mistakes. Could you review it and give me some
feedback?

--
Regards, Ian Babrou
http://bobrik.name http://twitter.com/ibobrik skype:i.babrou

Attachment

connect_timeout_in_ms.patch

Re: Millisecond-precision connect_timeout for libpq

From

Tom Lane

Date:

05 July 2013, 18:20:58

ivan babrou <ibobrik@gmail.com> writes:
> Hi, guys! I made a quick patch to support floating number in
> connect_timeout param for libpq.

What exactly is the use case for that?  It seems like extra complication
for something with little if any real-world usefulness.
        regards, tom lane

Re: Millisecond-precision connect_timeout for libpq

From

ivan babrou

Date:

05 July 2013, 19:09:48

If you can figure out that postgresql is overloaded then you may
decide what to do faster. In our app we have very strict limit for
connect time to mysql, redis and other services, but postgresql has
minimum of 2 seconds. When processing time for request is under 100ms
on average sub-second timeouts matter.

On 5 July 2013 22:20, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ivan babrou <ibobrik@gmail.com> writes:
>> Hi, guys! I made a quick patch to support floating number in
>> connect_timeout param for libpq.
>
> What exactly is the use case for that?  It seems like extra complication
> for something with little if any real-world usefulness.
>
>                         regards, tom lane

-- 
Regards, Ian Babrou
http://bobrik.name http://twitter.com/ibobrik skype:i.babrou

Re: Millisecond-precision connect_timeout for libpq

From

Tom Lane

Date:

05 July 2013, 19:26:36

ivan babrou <ibobrik@gmail.com> writes:
> If you can figure out that postgresql is overloaded then you may
> decide what to do faster. In our app we have very strict limit for
> connect time to mysql, redis and other services, but postgresql has
> minimum of 2 seconds. When processing time for request is under 100ms
> on average sub-second timeouts matter.

If you are issuing a fresh connection for each sub-100ms query, you're
doing it wrong anyway ...
        regards, tom lane

Re: Millisecond-precision connect_timeout for libpq

From

Josh Berkus

Date:

05 July 2013, 20:02:09

On 07/05/2013 12:26 PM, Tom Lane wrote:
> ivan babrou <ibobrik@gmail.com> writes:
>> If you can figure out that postgresql is overloaded then you may
>> decide what to do faster. In our app we have very strict limit for
>> connect time to mysql, redis and other services, but postgresql has
>> minimum of 2 seconds. When processing time for request is under 100ms
>> on average sub-second timeouts matter.
> 
> If you are issuing a fresh connection for each sub-100ms query, you're
> doing it wrong anyway ...

It's fairly common with certain kinds of apps, including Rails and PHP.This is one of the reasons why we've discussed
havinga kind of

stripped-down version of pgbouncer built into Postgres as a connection
manager.  If it weren't valuable to be able to relocate pgbouncer to
other hosts, I'd still say that was a good idea.

Ivan would really strongly benefit from running pgbouncer on his
appservers instead of connecting directly to Postgres.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Millisecond-precision connect_timeout for libpq

From

"Joshua D. Drake"

Date:

05 July 2013, 20:10:03

On 7/5/2013 1:01 PM, Josh Berkus wrote:
> If you are issuing a fresh connection for each sub-100ms query, you're
> doing it wrong anyway ...
> It's fairly common with certain kinds of apps, including Rails and PHP.
>   This is one of the reasons why we've discussed having a kind of
> stripped-down version of pgbouncer built into Postgres as a connection
> manager.  If it weren't valuable to be able to relocate pgbouncer to
> other hosts, I'd still say that was a good idea.

No kidding. I think a lot of -hackers forget that the web rules here and 
the web is stateless, which means a huge performance loss for postgresql 
unless we add yet another piece of software. Pre-forking here would 
really help us.

JD

Re: Millisecond-precision connect_timeout for libpq

From

Tom Lane

Date:

05 July 2013, 20:25:06

"Joshua D. Drake" <jd@commandprompt.com> writes:
> On 7/5/2013 1:01 PM, Josh Berkus wrote:
>> This is one of the reasons why we've discussed having a kind of
>> stripped-down version of pgbouncer built into Postgres as a connection
>> manager.  If it weren't valuable to be able to relocate pgbouncer to
>> other hosts, I'd still say that was a good idea.

> No kidding. I think a lot of -hackers forget that the web rules here and 
> the web is stateless, which means a huge performance loss for postgresql 
> unless we add yet another piece of software. Pre-forking here would 
> really help us.

Pre-forking, per se, wouldn't be that much help IMO.  You really want to
connect to a backend that's already loaded its catalog caches etc.  So a
connection pooler is the right solution, not least because you can get
it today.  Whether we should integrate a pooler into core is more of a
project management issue than a technical one, I think.
        regards, tom lane

Re: Millisecond-precision connect_timeout for libpq

From

ivan babrou

Date:

05 July 2013, 22:21:07

On 5 July 2013 23:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ivan babrou <ibobrik@gmail.com> writes:
>> If you can figure out that postgresql is overloaded then you may
>> decide what to do faster. In our app we have very strict limit for
>> connect time to mysql, redis and other services, but postgresql has
>> minimum of 2 seconds. When processing time for request is under 100ms
>> on average sub-second timeouts matter.
>
> If you are issuing a fresh connection for each sub-100ms query, you're
> doing it wrong anyway ...
>
>                         regards, tom lane

In php you cannot persist connection between requests without worrying
about transaction state. We don't use postgresql for every sub-100ms
query because it can block the whole request for 2 seconds. Usually it
takes 1.5ms to connect, btw.

Can you tell me why having ability to specify more accurate connect
timeout is a bad idea?

--
Regards, Ian Babrou
http://bobrik.name http://twitter.com/ibobrik skype:i.babrou

Re: Millisecond-precision connect_timeout for libpq

From

ivan babrou

Date:

08 July 2013, 16:11:31

On 5 July 2013 23:47, ivan babrou <ibobrik@gmail.com> wrote:
> On 5 July 2013 23:26, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> ivan babrou <ibobrik@gmail.com> writes:
>>> If you can figure out that postgresql is overloaded then you may
>>> decide what to do faster. In our app we have very strict limit for
>>> connect time to mysql, redis and other services, but postgresql has
>>> minimum of 2 seconds. When processing time for request is under 100ms
>>> on average sub-second timeouts matter.
>>
>> If you are issuing a fresh connection for each sub-100ms query, you're
>> doing it wrong anyway ...
>>
>>                         regards, tom lane
>
> In php you cannot persist connection between requests without worrying
> about transaction state. We don't use postgresql for every sub-100ms
> query because it can block the whole request for 2 seconds. Usually it
> takes 1.5ms to connect, btw.
>
> Can you tell me why having ability to specify more accurate connect
> timeout is a bad idea?
>
> --
> Regards, Ian Babrou
> http://bobrik.name http://twitter.com/ibobrik skype:i.babrou

Nobody answered my question yet.

--
Regards, Ian Babrou
http://bobrik.name http://twitter.com/ibobrik skype:i.babrou

Re: Millisecond-precision connect_timeout for libpq

From

"David E. Wheeler"

Date:

08 July 2013, 16:40:53

On Jul 8, 2013, at 7:44 AM, ivan babrou <ibobrik@gmail.com> wrote:

>> Can you tell me why having ability to specify more accurate connect
>> timeout is a bad idea?
>
> Nobody answered my question yet.

From an earlier post by Tom:

> What exactly is the use case for that?  It seems like extra complication
> for something with little if any real-world usefulness.

So the answer is: extra complication.

Best,

David

Re: Millisecond-precision connect_timeout for libpq

From

Cédric Villemain

Date:

08 July 2013, 17:13:51

Le lundi 8 juillet 2013 18:40:29, David E. Wheeler a écrit :
> On Jul 8, 2013, at 7:44 AM, ivan babrou <ibobrik@gmail.com> wrote:
> >> Can you tell me why having ability to specify more accurate connect
> >> timeout is a bad idea?
> >
> > Nobody answered my question yet.
>
> From an earlier post by Tom:
> > What exactly is the use case for that?  It seems like extra complication
> > for something with little if any real-world usefulness.
>
> So the answer is: extra complication.

for something that must go through a pooler anyway.
You can have a look at pgbouncer: query_wait_timeout parameter for example.

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

Re: Millisecond-precision connect_timeout for libpq

From

ivan babrou

Date:

08 July 2013, 19:34:05

On 8 July 2013 20:40, David E. Wheeler <david@justatheory.com> wrote:
> On Jul 8, 2013, at 7:44 AM, ivan babrou <ibobrik@gmail.com> wrote:
>
>>> Can you tell me why having ability to specify more accurate connect
>>> timeout is a bad idea?
>>
>> Nobody answered my question yet.
>
> From an earlier post by Tom:
>
>> What exactly is the use case for that?  It seems like extra complication
>> for something with little if any real-world usefulness.
>
> So the answer is: extra complication.
>
> Best,
>
> David
>

I don't see any extra complication in backwards-compatible patch that
removes more lines that adds. Can you tell me, what exactly is extra
complicated?

About pooling connections: we have 150 applications servers and 10
postgresql servers. Each app connects to each server -> 150
connections per server if I run pooler on each application server.
That's more than default setting and now we usually have not more than
10 connections per server. What would happen if we have 300 app
servers? I thought connections consume some memory. Running pooler not
on every app server gives no advantage — I still may get network
blackhole and 2 seconds delay. Moreover, now I can guess that
postgresql is overloaded if it does not accept connections, with
pooler I can simply blow up disks with heavy io.

Seriously, I don't get why running 150 poolers is easier. And my
problem is still here: server (pooler is this case) is down — 2
seconds delay. 2000% slower.

Where am I wrong?

--
Regards, Ian Babrou
http://bobrik.name http://twitter.com/ibobrik skype:i.babrou

Re: Millisecond-precision connect_timeout for libpq

From

Robert Haas

Date:

09 July 2013, 02:44:25

On Jul 8, 2013, at 1:31 PM, ivan babrou <ibobrik@gmail.com> wrote:
> On 8 July 2013 20:40, David E. Wheeler <david@justatheory.com> wrote:
>> On Jul 8, 2013, at 7:44 AM, ivan babrou <ibobrik@gmail.com> wrote:
>>
>>>> Can you tell me why having ability to specify more accurate connect
>>>> timeout is a bad idea?
>>>
>>> Nobody answered my question yet.
>>
>> From an earlier post by Tom:
>>
>>> What exactly is the use case for that?  It seems like extra complication
>>> for something with little if any real-world usefulness.
>>
>> So the answer is: extra complication.
>>
>> Best,
>>
>> David
>
> I don't see any extra complication in backwards-compatible patch that
> removes more lines that adds. Can you tell me, what exactly is extra
> complicated?
>
> About pooling connections: we have 150 applications servers and 10
> postgresql servers. Each app connects to each server -> 150
> connections per server if I run pooler on each application server.
> That's more than default setting and now we usually have not more than
> 10 connections per server. What would happen if we have 300 app
> servers? I thought connections consume some memory. Running pooler not
> on every app server gives no advantage — I still may get network
> blackhole and 2 seconds delay. Moreover, now I can guess that
> postgresql is overloaded if it does not accept connections, with
> pooler I can simply blow up disks with heavy io.
>
> Seriously, I don't get why running 150 poolers is easier. And my
> problem is still here: server (pooler is this case) is down — 2
> seconds delay. 2000% slower.
>
> Where am I wrong?

I agree with you.

...Robert

Re: Millisecond-precision connect_timeout for libpq

From

Markus Wanner

Date:

09 July 2013, 07:07:40

On 07/08/2013 08:31 PM, ivan babrou wrote:
> Seriously, I don't get why running 150 poolers is easier.

Did you consider running pgbouncer on the database servers?

Regards

Markus Wanner

Re: Millisecond-precision connect_timeout for libpq

From

Markus Wanner

Date:

09 July 2013, 08:22:28

On 07/09/2013 09:15 AM, ivan babrou wrote:
> Database server lost network — boom, 2 seconds delay. What's the point then?

Oh, I see. Good point. It could still improve connection time during
normal operation, though.

None the less, I now agree with you: we recommend a pooler, which may be
capable of millisecond timeouts, but arguably is vastly more complex
than the proposed patch. And it even brings its own set of gotchas (lots
of connections). I guess I don't quite buy the complexity argument, yet.

Sure, gettimeofday() is subject to clock adjustments. But so is time().
And if you're setting timeouts that low, you probably know what you're
doing (or at least care about latency a lot). Or is gettimeofday() still
considerably slower on certain architectures or in certain scenarios?
Where's the complexity?

Regards

Markus Wanner

Re: Millisecond-precision connect_timeout for libpq

From

Merlin Moncure

Date:

09 July 2013, 13:53:17

On Fri, Jul 5, 2013 at 12:28 PM, ivan babrou <ibobrik@gmail.com> wrote:
> Hi, guys! I made a quick patch to support floating number in
> connect_timeout param for libpq. This will treat floating number as
> seconds so this is backwards-compatible. I don't usually write in C,
> so there may be mistakes. Could you review it and give me some
> feedback?

First thing that jumps into my head: why not use asynchronous
connection (PQconnectStart, etc) and code the timeout on top of that?

merlin

Re: Millisecond-precision connect_timeout for libpq

From

Markus Wanner

Date:

09 July 2013, 14:01:26

Ian,

On 07/05/2013 07:28 PM, ivan babrou wrote:
> -            /*
> -             * Rounding could cause connection to fail; need at least 2 secs
> -             */

You removed this above comment... please check why it's there. The
relevant revision seems to be:

###
commit 2908a838ac2cf8cdccaa115249f8399eef8a731e
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Thu Oct 24 23:35:55 2002 +0000

Code review for connection timeout patch.  Avoid unportable assumption
that tv_sec is signed; return a useful error message on timeout failure;
honor PGCONNECT_TIMEOUT environment variable in PQsetdbLogin; make code
obey documentation statement that timeout=0 means no timeout.
###

> -            if (timeout < 2)
> -                timeout = 2;
> -            /* calculate the finish time based on start + timeout */
> -            finish_time = time(NULL) + timeout;
> +            gettimeofday(&finish_time, NULL);
> +            finish_time.tv_usec += (int) timeout_usec;

I vaguely recall tv_usec only being required to hold values up to
1000000 by some standard. A signed 32 bit value would qualify, but only
hold up to a good half hour worth of microseconds. That doesn't quite
seem enough to calculate finish_time the way you are proposing to do it.

> +            finish_time.tv_sec  += finish_time.tv_usec / 1000000;
> +            finish_time.tv_usec  = finish_time.tv_usec % 1000000;
>          }
>      }
>  
> @@ -1073,15 +1074,15 @@ pqSocketPoll(int sock, int forRead, int forWrite, time_t end_time)
>          input_fd.events |= POLLOUT;
>  
>      /* Compute appropriate timeout interval */
> -    if (end_time == ((time_t) -1))
> +    if (end_time == NULL)
>          timeout_ms = -1;
>      else
>      {
> -        time_t        now = time(NULL);
> +        struct timeval now;
> +        gettimeofday(&now, NULL);
>  
> -        if (end_time > now)
> -            timeout_ms = (end_time - now) * 1000;
> -        else
> +        timeout_ms = (end_time->tv_sec - now.tv_sec) * 1000 + (end_time->tv_usec - now.tv_usec) / 1000;

I think that's incorrect on a platform where tv_sec and/or tv_usec is
unsigned. (And the cited commit above indicates there are such platforms.)

On 07/09/2013 02:25 PM, ivan babrou wrote:
> There's no complexity here :)

Not so fast, cowboy...  :-)

Regards

Markus Wanner

Re: Millisecond-precision connect_timeout for libpq

From

Andres Freund

Date:

09 July 2013, 14:43:25

On 2013-07-05 21:28:59 +0400, ivan babrou wrote:
> Hi, guys! I made a quick patch to support floating number in
> connect_timeout param for libpq. This will treat floating number as
> seconds so this is backwards-compatible. I don't usually write in C,
> so there may be mistakes. Could you review it and give me some
> feedback?
> 
> --
> Regards, Ian Babrou
> http://bobrik.name http://twitter.com/ibobrik skype:i.babrou

> diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
> index 18fcb0c..58c1a35 100644
> --- a/src/interfaces/libpq/fe-connect.c
> +++ b/src/interfaces/libpq/fe-connect.c
> @@ -1452,7 +1452,7 @@ static int
>  connectDBComplete(PGconn *conn)
>  {
>      PostgresPollingStatusType flag = PGRES_POLLING_WRITING;
> -    time_t        finish_time = ((time_t) -1);
> +    struct timeval        finish_time;
>  
>      if (conn == NULL || conn->status == CONNECTION_BAD)
>          return 0;
> @@ -1462,17 +1462,14 @@ connectDBComplete(PGconn *conn)
>       */
>      if (conn->connect_timeout != NULL)
>      {
> -        int            timeout = atoi(conn->connect_timeout);
> +        int            timeout_usec = (int) (atof(conn->connect_timeout) * 1000000);

I'd rather not use a plain int for storing usecs. An overflow is rather
unlikely, but still. Also, I'd rather use something like USECS_PER_SEC
instead of a plain 1000000 in multiple places.

>  
> -        if (timeout > 0)
> +        if (timeout_usec > 0)
>          {
> -            /*
> -             * Rounding could cause connection to fail; need at least 2 secs
> -             */
> -            if (timeout < 2)
> -                timeout = 2;
> -            /* calculate the finish time based on start + timeout */
> -            finish_time = time(NULL) + timeout;
> +            gettimeofday(&finish_time, NULL);
> +            finish_time.tv_usec += (int) timeout_usec;

Accordingly adjust this.

Looks like a sensible thing to me.

*Independent* from this patch, you might want look into server-side
connection pooling using transaction mode. If that's applicable for
your application it might reduce latency noticeably.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Millisecond-precision connect_timeout for libpq

From

Dmitriy Igrishin

Date:

09 July 2013, 15:17:44

2013/7/9 Merlin Moncure <mmoncure@gmail.com>

On Fri, Jul 5, 2013 at 12:28 PM, ivan babrou <ibobrik@gmail.com> wrote:
> Hi, guys! I made a quick patch to support floating number in
> connect_timeout param for libpq. This will treat floating number as
> seconds so this is backwards-compatible. I don't usually write in C,
> so there may be mistakes. Could you review it and give me some
> feedback?

First thing that jumps into my head: why not use asynchronous
connection (PQconnectStart, etc) and code the timeout on top of that?

+1.

merlin

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

--
// Dmitriy.

Re: Millisecond-precision connect_timeout for libpq

From

ivan babrou

Date:

09 July 2013, 16:22:03

On 9 July 2013 11:05, Markus Wanner <markus@bluegap.ch> wrote:
> On 07/08/2013 08:31 PM, ivan babrou wrote:
>> Seriously, I don't get why running 150 poolers is easier.
>
> Did you consider running pgbouncer on the database servers?
>
> Regards
>
> Markus Wanner

Database server lost network — boom, 2 seconds delay. What's the point then?

--
Regards, Ian Babrou
http://bobrik.name http://twitter.com/ibobrik skype:i.babrou

Re: Millisecond-precision connect_timeout for libpq

From

ivan babrou

Date:

09 July 2013, 16:22:24

On 9 July 2013 12:20, Markus Wanner <markus@bluegap.ch> wrote:
> On 07/09/2013 09:15 AM, ivan babrou wrote:
>> Database server lost network — boom, 2 seconds delay. What's the point then?
>
> Oh, I see. Good point. It could still improve connection time during
> normal operation, though.

Connection time during normal operation is 1.5ms which is fast enough for now.

> None the less, I now agree with you: we recommend a pooler, which may be
> capable of millisecond timeouts, but arguably is vastly more complex
> than the proposed patch. And it even brings its own set of gotchas (lots
> of connections). I guess I don't quite buy the complexity argument, yet.

Pooler isn't capable of millisecond timeouts. At least I don't see how
could I understand that pooler is dead in 50ms.

> Sure, gettimeofday() is subject to clock adjustments. But so is time().
> And if you're setting timeouts that low, you probably know what you're
> doing (or at least care about latency a lot). Or is gettimeofday() still
> considerably slower on certain architectures or in certain scenarios?
> Where's the complexity?

There's no complexity here :)

> Regards
>
> Markus Wanner

--
Regards, Ian Babrou
http://bobrik.name http://twitter.com/ibobrik skype:i.babrou

Re: Millisecond-precision connect_timeout for libpq

From

ivan babrou

Date:

09 July 2013, 16:52:14

On 9 July 2013 17:59, Markus Wanner <markus@bluegap.ch> wrote:
> Ian,
>
> On 07/05/2013 07:28 PM, ivan babrou wrote:
>> -                     /*
>> -                      * Rounding could cause connection to fail; need at least 2 secs
>> -                      */
>
> You removed this above comment... please check why it's there. The
> relevant revision seems to be:
>
> ###
> commit 2908a838ac2cf8cdccaa115249f8399eef8a731e
> Author: Tom Lane <tgl@sss.pgh.pa.us>
> Date:   Thu Oct 24 23:35:55 2002 +0000

That's not correct, facb72007 is the relevant revision. It seems that
it's only applicable for small timeouts in seconds, but it you request
connect timeout in 1 ms you should be ready to fail. I may be wrong
about this, Bruce Momjian introduced that change in 2002.

> Code review for connection timeout patch.  Avoid unportable assumption
> that tv_sec is signed; return a useful error message on timeout failure;
> honor PGCONNECT_TIMEOUT environment variable in PQsetdbLogin; make code
> obey documentation statement that timeout=0 means no timeout.
> ###
>
>> -                     if (timeout < 2)
>> -                             timeout = 2;
>> -                     /* calculate the finish time based on start + timeout */
>> -                     finish_time = time(NULL) + timeout;
>> +                     gettimeofday(&finish_time, NULL);
>> +                     finish_time.tv_usec += (int) timeout_usec;
>
> I vaguely recall tv_usec only being required to hold values up to
> 1000000 by some standard. A signed 32 bit value would qualify, but only
> hold up to a good half hour worth of microseconds. That doesn't quite
> seem enough to calculate finish_time the way you are proposing to do it.

Agree, this should be fixed.

>> +                     finish_time.tv_sec  += finish_time.tv_usec / 1000000;
>> +                     finish_time.tv_usec  = finish_time.tv_usec % 1000000;
>>               }
>>       }
>>
>> @@ -1073,15 +1074,15 @@ pqSocketPoll(int sock, int forRead, int forWrite, time_t end_time)
>>               input_fd.events |= POLLOUT;
>>
>>       /* Compute appropriate timeout interval */
>> -     if (end_time == ((time_t) -1))
>> +     if (end_time == NULL)
>>               timeout_ms = -1;
>>       else
>>       {
>> -             time_t          now = time(NULL);
>> +             struct timeval now;
>> +             gettimeofday(&now, NULL);
>>
>> -             if (end_time > now)
>> -                     timeout_ms = (end_time - now) * 1000;
>> -             else
>> +             timeout_ms = (end_time->tv_sec - now.tv_sec) * 1000 + (end_time->tv_usec - now.tv_usec) / 1000;
>
> I think that's incorrect on a platform where tv_sec and/or tv_usec is
> unsigned. (And the cited commit above indicates there are such platforms.)

I don't get it. timeout_ms is signed, and can hold unsigned -
unsigned. Is it about anything else?

> On 07/09/2013 02:25 PM, ivan babrou wrote:
>> There's no complexity here :)
>
> Not so fast, cowboy...  :-)
>
> Regards
>
> Markus Wanner

Is there anything else I should fix?

--
Regards, Ian Babrou
http://bobrik.name http://twitter.com/ibobrik skype:i.babrou

Re: Millisecond-precision connect_timeout for libpq

From

Merlin Moncure

Date:

09 July 2013, 20:18:21

On Fri, Jul 5, 2013 at 3:01 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 07/05/2013 12:26 PM, Tom Lane wrote:
>> ivan babrou <ibobrik@gmail.com> writes:
>>> If you can figure out that postgresql is overloaded then you may
>>> decide what to do faster. In our app we have very strict limit for
>>> connect time to mysql, redis and other services, but postgresql has
>>> minimum of 2 seconds. When processing time for request is under 100ms
>>> on average sub-second timeouts matter.
>>
>> If you are issuing a fresh connection for each sub-100ms query, you're
>> doing it wrong anyway ...
>
> It's fairly common with certain kinds of apps, including Rails and PHP.
>  This is one of the reasons why we've discussed having a kind of
> stripped-down version of pgbouncer built into Postgres as a connection
> manager.  If it weren't valuable to be able to relocate pgbouncer to
> other hosts, I'd still say that was a good idea.

for the record, I think this is a great idea.

merlin

Re: Millisecond-precision connect_timeout for libpq

From

Merlin Moncure

Date:

10 July 2013, 14:56:06

On Wed, Jul 10, 2013 at 2:54 AM, ivan babrou <ibobrik@gmail.com> wrote:
> On 9 July 2013 19:17, Dmitriy Igrishin <dmitigr@gmail.com> wrote:
>>
>>
>>
>> 2013/7/9 Merlin Moncure <mmoncure@gmail.com>
>>>
>>> On Fri, Jul 5, 2013 at 12:28 PM, ivan babrou <ibobrik@gmail.com> wrote:
>>> > Hi, guys! I made a quick patch to support floating number in
>>> > connect_timeout param for libpq. This will treat floating number as
>>> > seconds so this is backwards-compatible. I don't usually write in C,
>>> > so there may be mistakes. Could you review it and give me some
>>> > feedback?
>>>
>>> First thing that jumps into my head: why not use asynchronous
>>> connection (PQconnectStart, etc) and code the timeout on top of that?
>>
>> +1.
>>>
>>>
>>> merlin
>>>
>>>
>>> --
>>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>>> To make changes to your subscription:
>>> http://www.postgresql.org/mailpref/pgsql-hackers
>>
>>
>>
>>
>> --
>> // Dmitriy.
>>
>
> Doesn't look like straightforward solution for me. In my case existing
> drivers will benefit from my patch, in async case they should be
> rewritten. We don't use  libpq directly, we use native pgsql module
> from php.
>
> Even with that, kernel can wait for milliseconds — why should we limit
> precision 1000x times and reinvent milliseconds again in userspace?

Fair point.  Although I still agree with Tom in the sense that if I
were in your shoes I would be reconsidering certain parts of the
connection stack since you have such demanding requirements; even with
this solved I think other issues are lurking right around the corner.
That said, I did scan your patch briefly and noted it only changed
internal API functions and seems pretty straightforward. I withdraw my
objection.

merlin

Re: Millisecond-precision connect_timeout for libpq

From

ivan babrou

Date:

10 July 2013, 16:13:53

On 9 July 2013 18:43, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-07-05 21:28:59 +0400, ivan babrou wrote:
>> Hi, guys! I made a quick patch to support floating number in
>> connect_timeout param for libpq. This will treat floating number as
>> seconds so this is backwards-compatible. I don't usually write in C,
>> so there may be mistakes. Could you review it and give me some
>> feedback?
>>
>> --
>> Regards, Ian Babrou
>> http://bobrik.name http://twitter.com/ibobrik skype:i.babrou
>
>> diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
>> index 18fcb0c..58c1a35 100644
>> --- a/src/interfaces/libpq/fe-connect.c
>> +++ b/src/interfaces/libpq/fe-connect.c
>> @@ -1452,7 +1452,7 @@ static int
>>  connectDBComplete(PGconn *conn)
>>  {
>>       PostgresPollingStatusType flag = PGRES_POLLING_WRITING;
>> -     time_t          finish_time = ((time_t) -1);
>> +     struct timeval          finish_time;
>>
>>       if (conn == NULL || conn->status == CONNECTION_BAD)
>>               return 0;
>> @@ -1462,17 +1462,14 @@ connectDBComplete(PGconn *conn)
>>        */
>>       if (conn->connect_timeout != NULL)
>>       {
>> -             int                     timeout = atoi(conn->connect_timeout);
>> +             int                     timeout_usec = (int) (atof(conn->connect_timeout) * 1000000);
>
> I'd rather not use a plain int for storing usecs. An overflow is rather
> unlikely, but still. Also, I'd rather use something like USECS_PER_SEC
> instead of a plain 1000000 in multiple places.
>
>>
>> -             if (timeout > 0)
>> +             if (timeout_usec > 0)
>>               {
>> -                     /*
>> -                      * Rounding could cause connection to fail; need at least 2 secs
>> -                      */
>> -                     if (timeout < 2)
>> -                             timeout = 2;
>> -                     /* calculate the finish time based on start + timeout */
>> -                     finish_time = time(NULL) + timeout;
>> +                     gettimeofday(&finish_time, NULL);
>> +                     finish_time.tv_usec += (int) timeout_usec;
>
> Accordingly adjust this.
>
> Looks like a sensible thing to me.
>
> *Independent* from this patch, you might want look into server-side
> connection pooling using transaction mode. If that's applicable for
> your application it might reduce latency noticeably.
>
> Greetings,
>
> Andres Freund
>
> --
>  Andres Freund                     http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services

I tried to make it more safe. Still not sure about constants, I
haven't found any good examples in libpq.

--
Regards, Ian Babrou
http://bobrik.name http://twitter.com/ibobrik skype:i.babrou

Attachment

connect_timeout_in_ms.patch

Re: Millisecond-precision connect_timeout for libpq

From

ivan babrou

Date:

10 July 2013, 16:13:54

On 9 July 2013 19:17, Dmitriy Igrishin <dmitigr@gmail.com> wrote:
>
>
>
> 2013/7/9 Merlin Moncure <mmoncure@gmail.com>
>>
>> On Fri, Jul 5, 2013 at 12:28 PM, ivan babrou <ibobrik@gmail.com> wrote:
>> > Hi, guys! I made a quick patch to support floating number in
>> > connect_timeout param for libpq. This will treat floating number as
>> > seconds so this is backwards-compatible. I don't usually write in C,
>> > so there may be mistakes. Could you review it and give me some
>> > feedback?
>>
>> First thing that jumps into my head: why not use asynchronous
>> connection (PQconnectStart, etc) and code the timeout on top of that?
>
> +1.
>>
>>
>> merlin
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>
>
>
>
> --
> // Dmitriy.
>

Doesn't look like straightforward solution for me. In my case existing
drivers will benefit from my patch, in async case they should be
rewritten. We don't use  libpq directly, we use native pgsql module
from php.

Even with that, kernel can wait for milliseconds — why should we limit
precision 1000x times and reinvent milliseconds again in userspace?

--
Regards, Ian Babrou
http://bobrik.name http://twitter.com/ibobrik skype:i.babrou

Re: Millisecond-precision connect_timeout for libpq

From

Josh Berkus

Date:

10 July 2013, 17:47:04

On 07/09/2013 01:18 PM, Merlin Moncure wrote:
> On Fri, Jul 5, 2013 at 3:01 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> It's fairly common with certain kinds of apps, including Rails and PHP.
>>  This is one of the reasons why we've discussed having a kind of
>> stripped-down version of pgbouncer built into Postgres as a connection
>> manager.  If it weren't valuable to be able to relocate pgbouncer to
>> other hosts, I'd still say that was a good idea.
> 
> for the record, I think this is a great idea.

Well, we discussed this a bit last year.  I'd personally love to have an
event-based connection manager tied to local Postgres, which could use
all of PostgreSQL's various authentication methods.  For the simple
case, where a user has a rails/mod_python/PHP app which just spawns lots
of connections, this could make the whole "too many connections"
headache go away as a concern for first-time users.

We'd still need pgbouncer for the serious scalability cases (so that we
could put it on other machines), but I am a bit tired of explaining how
to manage max_connections to people.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Millisecond-precision connect_timeout for libpq

From

Andres Freund

Date:

15 July 2013, 12:41:07

On 2013-07-15 16:34:07 +0400, ivan babrou wrote:
> Is there any hope to see it in libpq? If so, can anyone review latest
> version of my patch?

A littlebit of patience please. There are heaps of other patches still
being reviewed that have been posted earlier than yours ;). I know it
can be frustrating, but that doesn't make reviewer time grow on trees...

http://wiki.postgresql.org/wiki/CommitFest

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services

Re: Millisecond-precision connect_timeout for libpq

From

ivan babrou

Date:

15 July 2013, 16:26:46

Is there any hope to see it in libpq? If so, can anyone review latest
version of my patch?

On 10 July 2013 11:49, ivan babrou <ibobrik@gmail.com> wrote:
> On 9 July 2013 18:43, Andres Freund <andres@2ndquadrant.com> wrote:
>> On 2013-07-05 21:28:59 +0400, ivan babrou wrote:
>>> Hi, guys! I made a quick patch to support floating number in
>>> connect_timeout param for libpq. This will treat floating number as
>>> seconds so this is backwards-compatible. I don't usually write in C,
>>> so there may be mistakes. Could you review it and give me some
>>> feedback?
>>>
>>> --
>>> Regards, Ian Babrou
>>> http://bobrik.name http://twitter.com/ibobrik skype:i.babrou
>>
>>> diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
>>> index 18fcb0c..58c1a35 100644
>>> --- a/src/interfaces/libpq/fe-connect.c
>>> +++ b/src/interfaces/libpq/fe-connect.c
>>> @@ -1452,7 +1452,7 @@ static int
>>>  connectDBComplete(PGconn *conn)
>>>  {
>>>       PostgresPollingStatusType flag = PGRES_POLLING_WRITING;
>>> -     time_t          finish_time = ((time_t) -1);
>>> +     struct timeval          finish_time;
>>>
>>>       if (conn == NULL || conn->status == CONNECTION_BAD)
>>>               return 0;
>>> @@ -1462,17 +1462,14 @@ connectDBComplete(PGconn *conn)
>>>        */
>>>       if (conn->connect_timeout != NULL)
>>>       {
>>> -             int                     timeout = atoi(conn->connect_timeout);
>>> +             int                     timeout_usec = (int) (atof(conn->connect_timeout) * 1000000);
>>
>> I'd rather not use a plain int for storing usecs. An overflow is rather
>> unlikely, but still. Also, I'd rather use something like USECS_PER_SEC
>> instead of a plain 1000000 in multiple places.
>>
>>>
>>> -             if (timeout > 0)
>>> +             if (timeout_usec > 0)
>>>               {
>>> -                     /*
>>> -                      * Rounding could cause connection to fail; need at least 2 secs
>>> -                      */
>>> -                     if (timeout < 2)
>>> -                             timeout = 2;
>>> -                     /* calculate the finish time based on start + timeout */
>>> -                     finish_time = time(NULL) + timeout;
>>> +                     gettimeofday(&finish_time, NULL);
>>> +                     finish_time.tv_usec += (int) timeout_usec;
>>
>> Accordingly adjust this.
>>
>> Looks like a sensible thing to me.
>>
>> *Independent* from this patch, you might want look into server-side
>> connection pooling using transaction mode. If that's applicable for
>> your application it might reduce latency noticeably.
>>
>> Greetings,
>>
>> Andres Freund
>>
>> --
>>  Andres Freund                     http://www.2ndQuadrant.com/
>>  PostgreSQL Development, 24x7 Support, Training & Services
>
> I tried to make it more safe. Still not sure about constants, I
> haven't found any good examples in libpq.
>
> --
> Regards, Ian Babrou
> http://bobrik.name http://twitter.com/ibobrik skype:i.babrou



-- 
Regards, Ian Babrou
http://bobrik.name http://twitter.com/ibobrik skype:i.babrou