Re: Built-in connection pooling - Mailing list pgsql-hackers

From: Tomas Vondra
Subject: Re: Built-in connection pooling
Msg-id: 7bc1e831-54e6-b2db-f083-2d26e9ae0dd7@2ndquadrant.com
In response to: Re: Built-in connection pooling (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List: pgsql-hackers
On 01/19/2018 05:17 PM, Konstantin Knizhnik wrote:
> 
> 
> On 19.01.2018 18:53, Tomas Vondra wrote:
>>
>> ...
>>
>> The questions I'm asking myself are mostly these:
>>
>> 1) When assigning a backend, we first try to get one from a pool, which
>> happens right at the beginning of BackendStartup. If we find a usable
>> backend, we send the info to the backend (pg_send_sock/pg_recv_sock).
>>
>> But AFAICS this only happens at connection time, right? Yet in your
>> initial message you say "Rescheduling is done at transaction level,"
>> which in my understanding means "transaction pooling". So, how does
>> that part work?
> 
> Here it is:
> 
>               ChooseSession:
>                 ...
> 

OK, thanks.
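
For the record, the descriptor hand-off itself is a well-known trick: on
POSIX systems pg_send_sock()/pg_recv_sock() presumably boil down to
SCM_RIGHTS over a Unix-domain socket, i.e. something like this standalone
sketch (not code from your patch, the names are mine):

/*
 * Sketch: pass an open socket between processes using SCM_RIGHTS.
 * Parent ("postmaster") sends a descriptor, child ("backend") uses it.
 */
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send file descriptor 'fd' over the Unix-domain socket 'chan'. */
static int
send_fd(int chan, int fd)
{
    struct msghdr msg;
    struct cmsghdr *cmsg;
    struct iovec iov;
    char ctrl[CMSG_SPACE(sizeof(int))];
    char dummy = 'x';

    memset(&msg, 0, sizeof(msg));
    iov.iov_base = &dummy;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof(ctrl);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return (sendmsg(chan, &msg, 0) < 0) ? -1 : 0;
}

/* Receive a file descriptor from 'chan'; returns -1 on failure. */
static int
recv_fd(int chan)
{
    struct msghdr msg;
    struct cmsghdr *cmsg;
    struct iovec iov;
    char ctrl[CMSG_SPACE(sizeof(int))];
    char dummy;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    iov.iov_base = &dummy;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl;
    msg.msg_controllen = sizeof(ctrl);

    if (recvmsg(chan, &msg, 0) < 0)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS)
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

int
main(void)
{
    int chan[2];
    const char hello[] = "hello from the pooled backend\n";

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, chan) < 0)
        return 1;

    if (fork() == 0)
    {
        /* "backend": receive the client descriptor and answer on it */
        int client = recv_fd(chan[1]);

        if (client >= 0)
            write(client, hello, sizeof(hello) - 1);
        return 0;
    }

    /* "postmaster": hand stdout off as if it were the client socket */
    send_fd(chan[0], STDOUT_FILENO);
    wait(NULL);
    return 0;
}

(The Windows side would need WSADuplicateSocket or similar instead, which
is presumably why the patch wraps this in pg_send_sock/pg_recv_sock.)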

>>
>> 2) How does this deal with backends for different databases? I
>> don't see any checks that the requested database matches the
>> backend database (nor any code switching the backend from one db to
>> another - which would be very tricky, I think).
> As I wrote in the initial mail, this problem is not handled now.
> It is expected that all clients are connected to the same database using
> the same user.
> I only check and report an error if this assumption is violated.
> Definitely it should be fixed, and it is one of the main challenges with
> this approach! I would like to receive some advice from the community
> about the best way of solving it.
> The problem is that we get the information about database/user in the
> ProcessStartupPacket function in the backend, when the session is
> already assigned to a particular backend.
> We either have to somehow redirect the session to some other backend
> (somehow notify the postmaster that we are not able to handle it), or
> obtain the database/user name in the postmaster. But that means
> ProcessStartupPacket would have to be called in the postmaster, and the
> postmaster would have to read from the client's socket.
> I am afraid the postmaster could become a bottleneck in this case.
> 

Hmmm, that's unfortunate. I guess you'll have to process the startup
packet in the main process, before the backend gets forked. At least
partially.
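
The good news is the postmaster would only need the leading part of the
v3 startup packet - the protocol version followed by name/value pairs -
to learn the user and database. A rough standalone sketch of that parsing
(not the actual ProcessStartupPacket code; the sample packet is made up):

/*
 * Sketch: extract "user" and "database" from a v3 StartupMessage body
 * (protocol version followed by NUL-terminated name/value pairs).
 */
#include <stdio.h>
#include <string.h>

static void
extract_user_db(const char *buf, size_t len,
                const char **user, const char **database)
{
    size_t pos = 4;             /* skip the 4-byte protocol version */

    *user = *database = NULL;
    while (pos < len && buf[pos] != '\0')
    {
        const char *name = buf + pos;
        const char *value;

        pos += strlen(name) + 1;
        if (pos >= len)
            break;              /* real code must reject malformed packets */
        value = buf + pos;
        pos += strlen(value) + 1;

        if (strcmp(name, "user") == 0)
            *user = value;
        else if (strcmp(name, "database") == 0)
            *database = value;
    }
}

int
main(void)
{
    /* hand-built payload: protocol 3.0, user=alice, database=appdb */
    const char packet[] = {
        0, 3, 0, 0,
        'u', 's', 'e', 'r', 0, 'a', 'l', 'i', 'c', 'e', 0,
        'd', 'a', 't', 'a', 'b', 'a', 's', 'e', 0,
        'a', 'p', 'p', 'd', 'b', 0,
        0                       /* terminator */
    };
    const char *user;
    const char *database;

    extract_user_db(packet, sizeof(packet), &user, &database);
    printf("user=%s database=%s\n", user, database);
    return 0;
}

Of course the real code also has to handle SSL and cancel requests and
validate lengths, but the postmaster would not need to read anything past
the parameter list before handing the socket to a backend.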

> The problem could be solved much more easily with the pthread version
> of Postgres. In that case reassigning a session to another executor
> (thread) can be done much more easily,
> and there is no need to use the unportable trick of passing a file
> descriptor to another process.
> In the future I am going to combine them. The problem is that the
> pthread version of Postgres is still in a very raw state.
> 

Yeah. Unfortunately, we're using processes now, and switching to threads
will take time (assuming it happens at all).

>> 3) Is there any sort of shrinking the pools? I mean, if the backend is
>> idle for a certain period of time (or when we need backends for other
>> databases), does it get closed automatically?
> 
> When a client disconnects, the client session is closed, but the backend
> is not terminated even if there are no more sessions on this backend.
> This was done intentionally, to avoid constantly spawning new processes
> when there are one or a few clients which frequently connect/disconnect
> to the database.

Sure, but it means a short peak will leave the backend pool at its
maximum size indefinitely. That's acceptable for a PoC, but I think it
needs to be fixed eventually.
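
I'm thinking of something as simple as an idle timeout in the pooled
backend's main loop - wait a bounded time for the next session, and exit
when nothing shows up. A standalone sketch of the idea (the
pool_backend_idle_timeout GUC is hypothetical, stdin is just a stand-in
for the scheduling socket):

/*
 * Sketch: let a pooled backend exit after being idle for too long,
 * so the pool can shrink back after a load peak.
 */
#include <stdbool.h>
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

static int pool_backend_idle_timeout = 60;  /* seconds; hypothetical GUC */

/* Wait for the next session on 'sock'; false means "idle too long". */
static bool
wait_for_session(int sock)
{
    fd_set readfds;
    struct timeval tv;

    FD_ZERO(&readfds);
    FD_SET(sock, &readfds);
    tv.tv_sec = pool_backend_idle_timeout;
    tv.tv_usec = 0;

    /* select() returns 0 on timeout, > 0 when a session is waiting */
    return select(sock + 1, &readfds, NULL, NULL, &tv) > 0;
}

int
main(void)
{
    int sock = STDIN_FILENO;    /* stand-in for the scheduling socket */

    while (wait_for_session(sock))
    {
        /* serve the rescheduled session (here: just echo what we read) */
        char buf[1024];
        ssize_t n = read(sock, buf, sizeof(buf));

        if (n <= 0)
            return 0;           /* channel closed, just exit */
        write(STDOUT_FILENO, buf, n);
    }

    /* No work arrived for pool_backend_idle_timeout seconds: terminate,
     * so a short connection peak does not pin backends forever. */
    fprintf(stderr, "idle for %d s, exiting\n", pool_backend_idle_timeout);
    return 0;
}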

>>
>> Furthermore, I'm rather confused about the meaning of session_pool_size.
>> I mean, that GUC determines the number of backends in the pool; it has
>> nothing to do with sessions per se, right? Which would mean it's a bit
>> misleading to name it "session_..." (particularly if the pooling happens
>> at transaction level, not session level - which is question #1).
>>
> Yeah, it is not the right name. It means the maximal number of backends
> which should be used to serve client sessions.
> But "max backends" is already used and has a completely different meaning.
> 
>> When I've been thinking about adding a built-in connection pool, my
>> rough plan was mostly "bgworker doing something like pgbouncer" (that
>> is, listening on a separate port and proxying everything to regular
>> backends). Obviously, that has pros and cons, and probably would not
>> serve the threading use case well.
> 
> And we will get the same problem as with pgbouncer: one process will not
> be able to handle all connections...
> Certainly it is possible to start several such scheduling bgworkers...
> But in any case it is more efficient to multiplex sessions in the
> backends themselves.
> 

Well, I haven't said it has to be single-threaded like pgbouncer. I
don't see why the bgworker could not use multiple threads internally (of
course, it'd need to be careful not to mess with the stuff that is not
thread-safe).
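
To illustrate what I mean, the proxy would be a regular extension
registering a worker at load time, roughly like this (a sketch only; the
pool_proxy / pool_proxy_main names are made up, and the threading would
live inside pool_proxy_main):

/*
 * Sketch: an extension that registers a "pooling proxy" background worker
 * at load time.  pool_proxy_main() would listen on its own port and relay
 * client traffic to regular backends (possibly using several threads).
 */
#include "postgres.h"
#include "fmgr.h"
#include "postmaster/bgworker.h"

PG_MODULE_MAGIC;

void        _PG_init(void);
extern void pool_proxy_main(Datum main_arg);    /* defined elsewhere in
                                                 * this library */

void
_PG_init(void)
{
    BackgroundWorker worker;

    memset(&worker, 0, sizeof(worker));
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
    worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
    worker.bgw_restart_time = 5;        /* restart 5 s after a crash */
    snprintf(worker.bgw_name, BGW_MAXLEN, "connection pool proxy");
    snprintf(worker.bgw_library_name, BGW_MAXLEN, "pool_proxy");
    snprintf(worker.bgw_function_name, BGW_MAXLEN, "pool_proxy_main");

    RegisterBackgroundWorker(&worker);
}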


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

