Re: Built-in connection pooler - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Built-in connection pooler
Date
Msg-id 20190730131246.ihphwoy4sywfdhib@development
In response to Re: Built-in connection pooler  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List pgsql-hackers
On Tue, Jul 30, 2019 at 01:01:48PM +0300, Konstantin Knizhnik wrote:
>
>
>On 30.07.2019 4:02, Tomas Vondra wrote:
>>
>>My idea (sorry if it wasn't too clear) was that we might handle some
>>cases more gracefully.
>>
>>For example, if we only switch between transactions, we don't quite care
>>about 'SET LOCAL' (but the current patch does set the tainted flag). The
>>same thing applies to GUCs set for a function.
>>For prepared statements, we might count the number of statements we
>>prepared and deallocated, and treat the session as 'not tainted' when
>>there are no statements left. Maybe there's some risk I can't think of.
>>
>>The same thing applies to temporary tables - if you create and drop a
>>temporary table, is there a reason to still treat the session as tainted?
>>
>>
>
>I already handle temporary tables with transaction scope (created
>using the "create temp table ... on commit drop" command) - the backend
>is not marked as tainted in this case.
>Thank you for pointing out the "set local" command - the attached
>patch also handles such GUCs.
>

Thanks.
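To illustrate the counting idea from above: instead of a sticky boolean
taint flag, prepared statements (and similar per-session objects) could be
counted, and the backend considered clean again once the count drops back
to zero. A minimal sketch with entirely hypothetical names - this is not
code from the actual patch:

```c
/* Hypothetical sketch of "count instead of flag" taint tracking.
 * None of these names exist in the patch under discussion. */
#include <assert.h>
#include <stdbool.h>

typedef struct SessionPoolState
{
    int  nprepared;      /* prepared statements currently defined */
    int  ntemp_tables;   /* session-scope temp tables currently alive */
    bool tainted_other;  /* anything we cannot undo (e.g. session GUCs) */
} SessionPoolState;

/* SET LOCAL and ON COMMIT DROP temp tables would never call these,
 * since their effects end with the transaction anyway. */
static void session_prepare(SessionPoolState *s)    { s->nprepared++; }
static void session_deallocate(SessionPoolState *s) { s->nprepared--; }

static bool
session_is_tainted(const SessionPoolState *s)
{
    /* Clean again as soon as all countable objects are gone. */
    return s->tainted_other || s->nprepared > 0 || s->ntemp_tables > 0;
}
```

The point is only that DEALLOCATE (or DISCARD ALL) can return a backend to
the reusable pool, rather than tainting it permanently.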

>
>>>To implement prepared statements we need to store them in the session
>>>context, or at least add some session-specific prefix to prepared
>>>statement names.
>>>Temporary tables also require a per-session temporary tablespace.
>>>With GUCs the situation is even more complicated - actually, most of
>>>the time on my PgPro-EE pooler version
>>>was spent fighting with GUCs (default values, reloading
>>>configuration, memory allocation/deallocation, ...).
>>>But the show-stopper is temporary tables: if they are accessed
>>>through local (non-shared) buffers, then you cannot reschedule a
>>>session to another backend.
>>>This is why I now have a patch implementing global
>>>temporary tables (a la Oracle), which have global metadata and are
>>>accessed through shared buffers (which also allows using them
>>>in parallel queries).
>>>
>>
>>Yeah, temporary tables are messy. Global temporary tables would be nice,
>>not just because of this, but also because of catalog bloat.
>>
>
>Global temp tables solve two problems:
>1. catalog bloat
>2. parallel query execution
>
>They do not solve the problem of using temporary tables on a replica.
>Maybe that problem can be solved by implementing a special table
>access method for temporary tables.
>But I am still not sure how useful such a special table access method
>for temporary tables would be.
>Obviously it requires much more effort (a lot of heapam stuff needs to
>be reimplemented).
>But it would allow eliminating MVCC overhead for temporary tuples, and
>maybe also save space by reducing the size of the tuple header.
>

Sure. Temporary tables are a hard issue (another place where they cause
trouble is 2PC transactions, IIRC), so I think it's perfectly sensible to
accept the limitation, handle the cases we can handle, and see if we can
improve the remaining cases later.

>
>
>>
>>>If a Postgres backend is able to work with only one database, then
>>>you will have to start at least as many backends as you have
>>>databases.
>>>The situation with users is less clear - it may be possible to
>>>implement multi-user access to the same backend (as can be done
>>>now using "set role").
>>>

Yes, that's a direct consequence of the PostgreSQL process model.

>>
>>I don't think I've said we need anything like that. The way I'd expect
>>it to work is that when we run out of backend connections, we terminate
>>some existing ones (and then fork new backends).
>
>I am afraid it may eliminate most of the positive effect of session
>pooling if we terminate and launch new backends without any attempt
>to bind backends to a database and reuse them.
>

I'm not sure I understand. Surely we'd still reuse connections as much as
possible - we'd still keep the per-db/user connection pools, but after
running out of backend connections we'd pick a victim in one of the pools,
close it and open a new connection.

We'd need some logic for picking the 'victim', but that does not seem
particularly hard - idle connections first, then connections from
"oversized" pools (this is one of the reasons why pgbouncer has
min_pool_size).
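That ordering could be sketched roughly like this; all structures and
names here are hypothetical, not part of the patch:

```c
/* Hypothetical sketch of the victim-selection policy described above:
 * prefer idle backends, then backends from pools above their minimum. */
#include <assert.h>
#include <stdbool.h>

typedef struct PoolBackend
{
    bool idle;       /* no transaction currently in progress */
    int  pool_size;  /* current number of backends in this backend's pool */
    int  pool_min;   /* configured minimum size for that pool */
} PoolBackend;

/* Returns the index of the backend to close, or -1 if none qualifies. */
static int
pick_victim(const PoolBackend *b, int n)
{
    int oversized = -1;

    for (int i = 0; i < n; i++)
    {
        if (b[i].idle)
            return i;           /* idle backends go first */
        if (oversized < 0 && b[i].pool_size > b[i].pool_min)
            oversized = i;      /* otherwise, shrink an oversized pool */
    }
    return oversized;
}
```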

>>
>>>So I am not sure that implementing a sophisticated configurator,
>>>which allows specifying in some configuration file the
>>>maximal/optimal number of workers for each database/role pair,
>>>would completely eliminate the problem with multiple session pools.
>>>
>>
>>Why would we need to invent any sophisticated configurator? Why couldn't
>>we use some version of what pgbouncer already does, or maybe integrate
>>it somehow into pg_hba.conf?
>
>I didn't think about such a possibility.
>But I suspect there would be many problems with reusing pgbouncer
>code and moving it into the Postgres core.
>

To be clear - I wasn't suggesting copying any code from pgbouncer. It's
far too different (style, ...) from the core code. I'm suggesting adopting
roughly the same configuration approach, i.e. which parameters are allowed
for each pool, global limits, etc.

I don't know whether we should have a separate configuration file, make it
part of pg_hba.conf somehow, or store the configuration in a system
catalog. But I'm pretty sure we don't need a "sophisticated configurator".
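For illustration only, a pgbouncer-style per-database section might look
something like this; the syntax and setting names are entirely
hypothetical, and none of them exist in the patch:

```ini
; Hypothetical per-pool settings, loosely modeled on pgbouncer.ini.
[database mydb]
pool_mode = transaction
pool_size = 20        ; optimal number of backends for this db/user pair
min_pool_size = 5     ; never shrink below this when picking victims
```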

>>>I also agree that more monitoring facilities are needed.
>>>I just want to get a better understanding of what kind of information
>>>we need to monitor.
>>>Since pooling is done at the transaction level, all non-active
>>>sessions are in idle state,
>>>and the state of active sessions can be inspected using pg_stat_activity.
>>>
>>
>>Except when sessions are tainted, for example. And when the transactions
>>are long-running, it's still useful to list the connections.
>>
>Tainted backends are very similar to normal postgres backends.
>The only difference is that they are still connected to the client
>through the proxy.
>What I wanted to say is that pg_stat_activity will show you
>information about all active transactions
>even in the case of connection pooling. You will not get information
>about pending sessions waiting for
>idle backends. But such sessions do not have any state (the
>transaction has not started yet), so there is not much useful
>information we can show about them except the number of such pending
>sessions.
>

I suggest you take a look at the metrics used for pgbouncer monitoring.
Even when you have pending connections, you can still get useful data
about them (e.g. average wait time to get a backend, maximum wait time,
...).
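That kind of wait-time accounting only needs a couple of counters per
pool. A hypothetical sketch (not pgbouncer's actual code, and not the
patch's):

```c
/* Hypothetical per-pool wait statistics, in the spirit of what
 * pgbouncer exposes via its SHOW commands. */
#include <assert.h>

typedef struct PoolWaitStats
{
    long total_wait_us;  /* sum of all waits, for computing the average */
    long max_wait_us;    /* worst single wait observed */
    long nwaits;         /* pending sessions served so far */
} PoolWaitStats;

/* Called when a pending session finally gets a backend. */
static void
record_wait(PoolWaitStats *st, long wait_us)
{
    st->total_wait_us += wait_us;
    st->nwaits++;
    if (wait_us > st->max_wait_us)
        st->max_wait_us = wait_us;
}

static long
avg_wait_us(const PoolWaitStats *st)
{
    return st->nwaits > 0 ? st->total_wait_us / st->nwaits : 0;
}
```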

Furthermore, how will you know from pg_stat_activity whether a connection
is coming through a connection pool or not? Or that it's (not) tainted? Or
how many backends are used by all connection pools combined?

Because those are questions people will be asking when investigating
issues, and so on.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



