Re: Built-in connection pooler - Mailing list pgsql-hackers

From Ryan Lambert
Subject Re: Built-in connection pooler
Date
Msg-id CAN-V+g99RvCr5GCRkn_H6k2Mom76HcAqFrfzQxuFNEa-Hip5Mw@mail.gmail.com
In response to Re: Built-in connection pooler  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses Re: Built-in connection pooler  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List pgsql-hackers
Hello Konstantin,

Thank you for your work on this patch.  I took some good time to really explore the configuration and do some testing with pgbench.  This round of testing was done against patch 10 [1] and master branch commit a0555ddab9 from 7/22.

> Concerning testing: I do not think that connection pooler needs some kind of special tests.
> The idea of built-in connection pooler is that it should be able to handle all requests normal postgres can do.
> I have added to regression tests extra path with enabled connection proxies.
> Unfortunately, pg_regress is altering some session variables, so the backend becomes tainted and
> pooling is not actually used (but communication through proxy is tested).

Thank you for explaining, I wasn't sure.
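
If I follow, that means even a simple session-level SET in a test script is enough to taint the backend so it stays dedicated to that client, something like the example below.  Please correct me if I've misread that.

SET work_mem = '64MB';  -- session variable altered, so this backend is now tainted and no longer shared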


make installcheck-world: tested, passed
Implements feature:  tested, passed
Documentation:  I need to review it again; I saw typos while testing but didn't note the details.

Applying the patch [1] has improved since v9, but I am still getting these warnings:

git apply -p1 < builtin_connection_proxy-10.patch
<stdin>:1536: indent with spaces.
            /* Has pending clients: serve one of them */
<stdin>:1936: indent with spaces.
                        /* If we attach new client to the existed backend, then we need to send handshake response to the client */
<stdin>:2208: indent with spaces.
                            if (port->sock == PGINVALID_SOCKET)
<stdin>:2416: indent with spaces.
    FuncCallContext* srf_ctx;
<stdin>:2429: indent with spaces.
        ps_ctx = (PoolerStateContext*)palloc(sizeof(PoolerStateContext));
warning: squelched 5 whitespace errors
warning: 10 lines add whitespace errors.


I used a DigitalOcean droplet with 2 CPUs, 2 GB RAM and an SSD for this testing, running Ubuntu 18.04.  I chose the smaller server size based on the availability of similar and recent connection pooling results [2] that used an AWS EC2 m4.large instance (2 cores, 8 GB RAM) and pgbouncer.  Your prior pgbench tests [3] also focused on larger servers, so I wanted to see how this works on smaller hardware.

Considering this from connpool.sgml:
"<varname>connection_proxies</varname> specifies number of connection proxy processes which will be spawned. Default value is zero, so connection pooling is disabled by default."

That hints to me that connection_proxies is the main setting to start with, so it was the only configuration for this feature that I changed from its default.  I adjusted shared_buffers to 500MB (25% of total RAM) and max_connections to 1000.  Having only one proxy gives subpar performance across the board, as did setting the value to 10.  My hunch is this value should roughly follow the number of CPUs available, but that's just a hunch.
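
To make that concrete, the non-default postgresql.conf settings for these runs boiled down to the following, with connection_proxies varied per test run:

shared_buffers = 500MB
max_connections = 1000
connection_proxies = 2    # also tested with 1 and 10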

I tested with 25, 75, 150, 300 and 600 connections, initialized with a scale of 1000, and ran read-only tests.  Basic pgbench commands look like the following; I have the full commands and results from 18 tests in the attached MD file.  Postgres was restarted between each test.

pgbench -i -s 1000 bench_test
pgbench -p 6543 -c 300 -j 2 -T 600 -P 60 -S bench_test
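
The full matrix was essentially the following loop, repeated for each connection_proxies setting (the restart command is only illustrative here; adjust it for your environment):

for c in 25 75 150 300 600; do
    pg_ctl -D "$PGDATA" restart -w
    pgbench -p 6543 -c $c -j 2 -T 600 -P 60 -S bench_test
done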

Tests were all run from the same server.  I intend to do further testing with external connections using SSL.
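
The plan there is simply to run the same workload from a second droplet with SSL required, along these lines (the host name is a placeholder):

PGSSLMODE=require pgbench -h db.example.com -p 6543 -c 300 -j 2 -T 600 -P 60 -S bench_test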

General observations:
For each value of connection_proxies, the TPS observed at 25 connections held up reliably through 600 connections.  For this server, connection_proxies = 2 was the fastest, but setting it to 1 or 10 still provided **predictable** throughput.  That seems to be a good attribute for this feature.

Also predictable was the latency increase: doubling the connections roughly doubles the latency.  This was true with or without connection pooling.
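
That lines up with what I'd expect from basic queueing math: once throughput is pinned by the pool (or by the server itself), average latency is roughly the number of open connections divided by TPS, so doubling the clients at a constant TPS doubles the latency.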

Focusing on disabled connection pooling vs. the feature with two proxies, the results are promising; the breakpoint seems to be around 150 connections.

Low connections (25):  -15% TPS; +45% latency
Medium connections (300):  +21% TPS; +38% latency
High connections (600):  Couldn't run without pooling... aka: Win for pooling!

The two attached charts show the TPS and average latency of these two scenarios.  This feature does a great job of maintaining a consistent TPS as connection numbers increase.  That comes with the tradeoffs of lower throughput below 150 connections and higher latency across the board.
The increase in latency seems reasonable to me based on the testing I have done so far.  Compared to the results from [2], latency seems to affect this feature a bit more than it does pgbouncer, yet not unreasonably so given the benefit of having the feature built in and the reduced complexity.

I don't understand yet how max_sessions ties in.
Also, having both session_pool_size and connection_proxies seemed confusing at first, and I still haven't figured out exactly how they relate in the overall operation or what their impact on performance is.  The new view helped: I get **what** it is doing (connection_proxies = more rows, session_pool_size = n_backends for each row); it's more a lack of understanding the **why** of how it will operate.


postgres=# select * from pg_pooler_state();
 pid  | n_clients | n_ssl_clients | n_pools | n_backends | n_dedicated_backends | tx_bytes  | rx_bytes  | n_transactions
------+-----------+---------------+---------+------------+----------------------+-----------+-----------+----------------
 1682 |        75 |             0 |       1 |         10 |                    0 | 366810458 | 353181393 |        5557109
 1683 |        75 |             0 |       1 |         10 |                    0 | 368464689 | 354778709 |        5582174
(2 rows)
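
If I'm reading that correctly, it's the connection_proxies = 2 run at 150 clients: each proxy row shows 75 clients multiplexed onto 10 backends (the default session_pool_size), so the 150 client connections are being served by only 2 x 10 = 20 pooled backends in total.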



I am not sure how I feel about this:
"Non-tainted backends are not terminated even if there are no more connected sessions."

Would it be possible (eventually) to monitor connection rates and free up non-tainted backends after a time?  The way I'd like to think of that working would be:

If 50% of backends are unused for more than 1 hour, release 10% of established backends.

The two percentages and time frame would ideally be configurable, but set up in a way that doesn't let go of connections too quickly and cause the unnecessary expense of re-establishing them.  My thought is: if there's one big surge of connections followed by a long period of lower activity, does it make sense to keep those extra backends established?
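
Purely to illustrate the shape of what I mean (these GUC names are invented for illustration; nothing like them exists in the patch):

# hypothetical settings, names made up for illustration only
pool_idle_backend_fraction = 0.5   # shrink once at least this fraction of backends are idle...
pool_idle_backend_timeout = 1h     # ...for at least this long
pool_release_fraction = 0.1        # then release this fraction of established backends per pass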


I'll give the documentation another pass soon.  Thanks for all your work on this, I like what I'm seeing so far!

[1] https://www.postgresql.org/message-id/attachment/102732/builtin_connection_proxy-10.patch
[2] http://richyen.com/postgres/2019/06/25/pools_arent_just_for_cars.html
[3] https://www.postgresql.org/message-id/ede4470a-055b-1389-0bbd-840f0594b758%40postgrespro.ru


Thanks,
Ryan Lambert



On Fri, Jul 19, 2019 at 3:10 PM Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:


On 19.07.2019 6:36, Ryan Lambert wrote:
Here's what I found tonight in your latest patch (9).  The output from git apply is better, fewer whitespace errors, but still getting the following.  Ubuntu 18.04 if that helps.

git apply -p1 < builtin_connection_proxy-9.patch
<stdin>:79: tab in indent.
                  Each proxy launches its own subset of backends.
<stdin>:634: indent with spaces.
    union {
<stdin>:635: indent with spaces.
       struct sockaddr_in inaddr;
<stdin>:636: indent with spaces.
       struct sockaddr addr;
<stdin>:637: indent with spaces.
    } a;
warning: squelched 54 whitespace errors
warning: 59 lines add whitespace errors.


A few more minor edits.  In config.sgml:

"If the <varname>max_sessions</varname> limit is reached new connection are not accepted."
Should be "connections".

"The default value is 10, so up to 10 backends will server each database,"
"sever" should be "serve" and the sentence should end with a period instead of a comma.


In postmaster.c:

/* The socket number we are listening for poolled connections on */
"poolled" --> "pooled"


"(errmsg("could not create listen socket for locahost")));"

"locahost" -> "localhost".


" * so to support order balancing we should do dome smart work here."

"dome" should be "some"?

I don't see any tests covering this new functionality.  It seems that this is significant enough functionality to warrant some sort of tests, but I don't know exactly what those would/should be.


Thank you once again for these fixes.
Updated patch is attached.

Concerning testing: I do not think that connection pooler needs some kind of special tests.
The idea of built-in connection pooler is that it should be able to handle all requests normal postgres can do.
I have added to regression tests extra path with enabled connection proxies.
Unfortunately, pg_regress is altering some session variables, so the backend becomes tainted and
pooling is not actually used (but communication through proxy is tested).

It is also possible to run pgbench with different numbers of connections through the connection pooler.


Thanks,
Ryan
