Re: Built-in connection pooler - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: Built-in connection pooler
Date
Msg-id 231d5efd-f176-f50e-d103-5cdd3a32ce56@postgrespro.ru
In response to Re: Built-in connection pooler  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers

On 15.07.2019 1:48, Thomas Munro wrote:
> On Mon, Jul 15, 2019 at 7:56 AM Konstantin Knizhnik
> <k.knizhnik@postgrespro.ru> wrote:
>> Can you please explain me more precisely how to reproduce the problem
>> (how to configure postgres and how to connect to it)?
> Maybe it's just that postmaster.c's ConnCreate() always allocates a
> struct for port->gss if the feature is enabled, but the equivalent
> code in proxy.c's proxy_loop() doesn't?
>
> ./configure
>    --prefix=$HOME/install/postgres \
>    --enable-cassert \
>    --enable-debug \
>    --enable-depend \
>    --with-gssapi \
>    --with-icu \
>    --with-pam \
>    --with-ldap \
>    --with-openssl \
>    --enable-tap-tests \
>    --with-includes="/opt/local/include" \
>    --with-libraries="/opt/local/lib" \
>    CC="ccache cc" CFLAGS="-O0"
>
> ~/install/postgres/bin/psql postgres -h localhost -p 6543
>
> 2019-07-15 08:37:25.348 NZST [97972] LOG:  server process (PID 97974)
> was terminated by signal 11: Segmentation fault: 11
>
> (lldb) bt
> * thread #1, stop reason = signal SIGSTOP
>    * frame #0: 0x0000000104163e7e
> postgres`secure_read(port=0x00007fda9ef001d0, ptr=0x00000001047ab690,
> len=8192) at be-secure.c:164:6
>      frame #1: 0x000000010417760d postgres`pq_recvbuf at pqcomm.c:969:7
>      frame #2: 0x00000001041779ed postgres`pq_getbytes(s="??\x04~?,
> len=1) at pqcomm.c:1110:8
>      frame #3: 0x0000000104284147
> postgres`ProcessStartupPacket(port=0x00007fda9ef001d0,
> secure_done=true) at postmaster.c:2064:6
> ...
> (lldb) f 0
> frame #0: 0x0000000104163e7e
> postgres`secure_read(port=0x00007fda9ef001d0, ptr=0x00000001047ab690,
> len=8192) at be-secure.c:164:6
>     161         else
>     162     #endif
>     163     #ifdef ENABLE_GSS
> -> 164         if (port->gss->enc)
>     165         {
>     166             n = be_gssapi_read(port, ptr, len);
>     167             waitfor = WL_SOCKET_READABLE;
> (lldb) print port->gss
> (pg_gssinfo *) $0 = 0x0000000000000000

Thank you, fixed (pushed to the
https://github.com/postgrespro/postgresql.builtin_pool.git repository).
I am not sure that GSS authentication works as intended, but at least

psql "sslmode=require dbname=postgres port=6543 krbsrvname=POSTGRES"

now connects normally.
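
The crash was caused by the missing port->gss allocation Thomas pointed
at. Just to illustrate the kind of fix (a sketch only, mirroring what
ConnCreate() does in postmaster.c; the actual change is in the conn_proxy
branch and may differ in details), the proxy now has to do something like
this when it sets up the client's Port:

#ifdef ENABLE_GSS
    /* secure_read() dereferences port->gss->enc, so it must not be NULL */
    port->gss = (pg_gssinfo *) calloc(1, sizeof(pg_gssinfo));
    if (port->gss == NULL)
    {
        ereport(LOG,
                (errcode(ERRCODE_OUT_OF_MEMORY),
                 errmsg("out of memory")));
        return false;       /* reject this client connection */
    }
#endif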


>>> Obviously your concept of tainted backends (= backends that can't be
>>> reused by other connections because they contain non-default session
>>> state) is quite simplistic and would help only the very simplest use
>>> cases.  Obviously the problems that need to be solved first to do
>>> better than that are quite large.  Personally I think we should move
>>> all GUCs into the Session struct, put the Session struct into shared
>>> memory, and then figure out how to put things like prepared plans into
>>> something like Ideriha-san's experimental shared memory context so
>>> that they also can be accessed by any process, and then we'll mostly
>>> be tackling problems that we'll have to tackle for threads too.  But I
>>> think you made the right choice to experiment with just reusing the
>>> backends that have no state like that.
>> That was not actually my idea: it was proposed by Dimitri Fontaine.
>> In PgPRO-EE we have another version of builtin connection pooler
>> which maintains session context and allows to use GUCs, prepared statements
>> and temporary tables in pooled sessions.
> Interesting.  Do you serialise/deserialise plans and GUCs using
> machinery similar to parallel query, and did you changed temporary
> tables to use shared buffers?  People have suggested that we do that
> to allow temporary tables in parallel queries too.  FWIW I have also
> wondered about per (database, user) pools for reusable parallel
> workers.

No. The main difference between the PG-Pro (conn_pool) and vanilla
(conn_proxy) versions of the connection pooler
is that the first one binds sessions to pool workers, while the latter
uses a proxy to scatter requests between workers.

In conn_pool the postmaster accepts a client connection and schedules it
(using one of the provided scheduling policies, e.g. round robin, random,
load balancing, ...) to one of the worker backends. The client socket is
then transferred to this backend, and the client and backend communicate directly.
A session is never rescheduled, i.e. it is bound to its backend, and one
backend serves multiple sessions. Session GUCs and some static variables
are saved in a session context. Each session has its own temporary
namespace, so temporary tables of different sessions do not interfere.
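
Just to sketch the scheduling step (illustrative only, with made-up names
rather than the actual conn_pool code): the postmaster picks a worker for
the new client, here in round-robin fashion, and hands the accepted
socket over to it; from that point on the client talks to that backend
directly.

typedef struct SessionPool
{
    int         n_workers;      /* backends spawned for this db/user pair */
    int         rr_index;       /* round-robin cursor */
    pgsocket    worker_pipes[MAX_POOL_WORKERS];     /* hypothetical limit */
} SessionPool;

static void
schedule_session(SessionPool *pool, pgsocket client_sock)
{
    int         worker = pool->rr_index++ % pool->n_workers;

    /*
     * Hypothetical helper: on Unix this boils down to sendmsg() with
     * SCM_RIGHTS to pass the descriptor to the chosen worker process.
     */
    send_socket_to_worker(pool->worker_pipes[worker], client_sock);
}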

Connecting the client and backend directly eliminates the proxy
overhead. But at the same time, binding sessions to backends is the
main drawback of this approach: a long-living transaction can block all
other sessions scheduled to the same backend.
I have thought a lot about the possibility of rescheduling sessions. The main
show stopper is actually temporary tables.
The implementation of temporary tables is one of the "bad smelling" places
in Postgres, causing multiple problems:
parallel queries, using temporary tables on replicas, catalog bloat,
connection pooling...
This is why I have tried to implement "global" temporary tables (as in
Oracle).
I am going to publish this patch soon: in this case the table definition is
global, but the data is local to each session.
Also, global temporary tables are accessed through shared buffers, which
makes it possible to use them in parallel queries.
But that is a separate story, not currently related to connection pooling.

The proxy approach doesn't suffer from this problem.



>> But the main idea of this patch was to make connection pooler less
>> invasive and
>> minimize changes in Postgres core. This is why I have implemented proxy.
> How do you do it without a proxy?

I hope I have explained it above.
Actually, this version of the connection pooler is also available in the github
https://github.com/postgrespro/postgresql.builtin_pool.git repository
(branch conn_pool).

> Ok, time for a little bit more testing.  I was curious about the user
> experience when there are no free backends.
>
> 1.  I tested with connection_proxies=1, max_connections=4, and I began
> 3 transactions.  Then I tried to make a 4th connection, and it blocked
> while trying to connect and the log shows a 'sorry, too many clients
> already' message.  Then I committed one of my transactions and the 4th
> connection was allowed to proceed.
>
> 2.  I tried again, this time with 4 already existing connections and
> no transactions.  I began 3 concurrent transactions, and then when I
> tried to begin a 4th transaction the BEGIN command blocked until one
> of the other transactions committed.
>
> Some thoughts:   Why should case 1 block?  Shouldn't I be allowed to
> connect, even though I can't begin a transaction without waiting yet?
>   Why can I run only 3 transactions when I said max_connection=4?  Ah,
> that's probably because the proxy itself is eating one slot, and
> indeed if I set connection_proxies to 2 I can now run only two
> concurrent transactions.  And yet when there were no transactions
> running I could still open 4 connections.  Hmm.

max_connections is not the right switch to control the behavior of the
connection pooler.
You should adjust the session_pool_size, connection_proxies and max_sessions
parameters instead.

What happens in case 1? The default value of session_pool_size is 10, which
means that Postgres will spawn up to 10 backends for each database/user
combination. But the max_connections limit will be exhausted earlier. Maybe it
would be better to prohibit setting session_pool_size larger than max_connections,
or to adjust it automatically according to max_connections. But that is also
not so easy to enforce, because a separate set of pool workers is created for
each database/user combination.

So I agree that the observed behavior is confusing, but I do not have a good
idea how to improve it.
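
For example, to keep the limits consistent one can size things explicitly
(illustrative values only):

# postgresql.conf (conn_proxy version)
connection_proxies = 2      # proxy processes; each occupies a connection slot
session_pool_size = 4       # backends per database/user combination
max_sessions = 1000         # limit on client sessions (per proxy)
# max_connections still has to leave room for the proxies plus
# session_pool_size backends for each database/user pair actually used.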

>
> The general behaviour of waiting instead of raising an error seems
> sensible, and that's how client-side connection pools usually work.
> Perhaps some of the people who have wanted admission control were
> thinking of doing it at the level of queries rather than whole
> transactions though, I don't know.  I suppose in extreme cases it's
> possible to create invisible deadlocks, if a client is trying to
> control more than one transaction concurrently, but that doesn't seem
> like a real problem.
>
>>> On my FreeBSD box (which doesn't have epoll(), so it's latch.c's old
>>> school poll() for now), I see the connection proxy process eating a
>>> lot of CPU and the temperature rising.  I see with truss that it's
>>> doing this as fast as it can:
>>>
>>> poll({ 13/POLLIN 17/POLLIN|POLLOUT },2,1000)     = 1 (0x1)
>>>
>>> Ouch.  I admit that I had the idea to test on FreeBSD because I
>>> noticed the patch introduces EPOLLET and I figured this might have
>>> been tested only on Linux.  FWIW the same happens on a Mac.
>> Yehh.
>> This is really the problem. In addition to FreeBSD and MacOS, it also
>> takes place at Win32.
>> I have to think more how to solve it. We should somehow emulate
>> "edge-triggered" more for this system.
>> The source of the problem is that proxy is reading data from one socket
>> and writing it in another socket.
>> If write socket is busy, we have to wait until is is available. But at
>> the same time there can be data available for input,
>> so poll will return immediately unless we remove read event for this
>> socket. Not here how to better do it without changing
>> WaitEvenSet API.
> Can't you do this by removing events you're not interested in right
> now, using ModifyWaitEvent() to change between WL_SOCKET_WRITEABLE and
> WL_SOCKET_READABLE as appropriate?  Perhaps the problem you're worried
> about is that this generates extra system calls in the epoll()
> implementation?  I think that's not a problem for poll(), and could be
> fixed for the kqueue() implementation I plan to commit eventually.  I
> have no clue for Windows.
The Windows implementation is similar to poll().
Yes, it can definitely be done by removing the read event from the wait
event set after each pending write operation.
But that seems to be very inefficient in case of the epoll() implementation
(where changing the event set requires a separate syscall).
So we either have to add some extra function, e.g. WaitEventEdgeTrigger,
which would do nothing for epoll(),
or somehow implement edge triggering inside the WaitEvent*
implementation itself (it is not clear how to do that, since read/write
operations are
not performed through this API).
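
For reference, the workaround Thomas suggests would look roughly like
this (a sketch only, with a hypothetical Channel struct; not the actual
proxy.c code):

static void
channel_update_interest(WaitEventSet *wes, Channel *ch)
{
    uint32      events = 0;

    /* wait for writability only while data is queued for this socket */
    if (ch->pending_write_bytes > 0)
        events |= WL_SOCKET_WRITEABLE;

    /* read more from it only if we are not stalled writing to the peer */
    if (ch->peer->pending_write_bytes == 0)
        events |= WL_SOCKET_READABLE;

    /*
     * With epoll() every such change is an extra epoll_ctl() syscall,
     * which is the overhead I am worried about; with poll() it only
     * updates the pollfd array.  And the current API does not really
     * allow an empty interest mask for a socket, which is one more
     * wrinkle here.
     */
    ModifyWaitEvent(wes, ch->event_pos, events, NULL);
}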



>> Looks like proxy.c has to be moved somewhere outside core?
>> May be make it an extension? But it may be not so easy to implement because
>> proxy has to be tightly integrated with postmaster.
> I'm not sure.  Seems like it should be solvable with the code in core.
>
I think so too. It now works on Unix, and I hope I can also fix the
Win32 build.


