Re: Built-in connection pooling - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: Built-in connection pooling
Msg-id 40404619-0ebf-a642-b010-0805e66609c7@postgrespro.ru
In response to Re: Built-in connection pooling  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Built-in connection pooling
List pgsql-hackers

On 23.04.2018 23:14, Robert Haas wrote:
> On Wed, Apr 18, 2018 at 9:41 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> Well, maybe I missed something, but I do not know how to efficiently
>>> support
>>> 1. Temporary tables
>>> 2. Prepared statements
>>> 3. Session GUCs
>>> with any external connection pooler (with pooling level other than
>>> session).
>> Me neither. What makes it easier to do these things in an internal
>> connection pooler? What could the backend do differently, to make these
>> easier to implement in an external pooler?
> I think you and Konstantin are possibly failing to see the big picture
> here.  Temporary tables, prepared statements, and GUC settings are
> examples of session state that users expect will be preserved for the
> lifetime of a connection and not beyond; all session state, of
> whatever kind, has the same set of problems.  A transparent connection
> pooling experience means guaranteeing that no such state vanishes
> before the user ends the current session, and also that no such state
> established by some other session becomes visible in the current
> session.  And we really need to account for *all* such state, not just
> really big things like temporary tables and prepared statements and
> GUCs but also much subtler things such as the state of the PRNG
> established by srandom().

It is not quite true that I have not realized these issues.
In addition to connection pooling, I have also implemented a pthread
version of Postgres, in which static variables are replaced with
thread-local variables, so that each thread uses its own set of variables.
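A minimal sketch of the thread-local approach, assuming C11 _Thread_local and POSIX threads (glibc 2.34+, or linking with -pthread). The names queries_executed, backend_main() and run_two_backends() are hypothetical, not code from the pthread version of Postgres. Each simulated backend thread increments and reports only its own copy of the global.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical backend-private global. With _Thread_local every thread
 * gets its own copy instead of sharing one process-wide variable. */
static _Thread_local long queries_executed = 0;

/* Simulated backend thread: executes *arg "queries", then writes back
 * the value of its own copy of the global. */
static void *backend_main(void *arg)
{
    long *n = arg;
    for (long i = 0; i < *n; i++)
        queries_executed++;        /* touches only this thread's copy */
    *n = queries_executed;         /* report the per-thread value */
    return NULL;
}

/* Runs two simulated backends concurrently. Returns the calling thread's
 * copy of the global, which neither backend thread touched. */
long run_two_backends(long n1, long n2, long *seen1, long *seen2)
{
    pthread_t t1, t2;
    *seen1 = n1;
    *seen2 = n2;
    int rc = pthread_create(&t1, NULL, backend_main, seen1);
    assert(rc == 0);
    rc = pthread_create(&t2, NULL, backend_main, seen2);
    assert(rc == 0);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return queries_executed;
}
```

For example, run_two_backends(1000, 5, &s1, &s2) leaves s1 == 1000 and s2 == 5 and returns 0. In a pooled backend, by contrast, several sessions are served by the same process and thread, so a thread-local copy would still be shared among them.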

Unfortunately, this approach cannot be used for connection pooling.
But I think that scheduling at the transaction level will
eliminate the problem with static variables in most cases.
I expect that very few of them have session-level lifetime.
Unfortunately, it is not so easy to locate all such places. Once such
variables are located, they can be saved in the session context and
restored on reschedule.
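A minimal sketch of saving such variables in a session context and restoring them on reschedule. The globals cur_date_style and cur_app_name, the SessionContext struct and reschedule() are hypothetical illustrations, not actual PostgreSQL variables or functions; the sketch does not address the harder part, finding every such variable.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical session-lifetime globals of a backend. */
static int  cur_date_style = 0;
static char cur_app_name[64] = "";

/* Per-session copy of those globals, kept while the session is not
 * attached to the backend. */
typedef struct SessionContext
{
    int  date_style;
    char app_name[64];
} SessionContext;

static void save_session_state(SessionContext *ctx)
{
    ctx->date_style = cur_date_style;
    memcpy(ctx->app_name, cur_app_name, sizeof(cur_app_name));
}

static void restore_session_state(const SessionContext *ctx)
{
    cur_date_style = ctx->date_style;
    memcpy(cur_app_name, ctx->app_name, sizeof(cur_app_name));
}

/* Called at a transaction boundary when the backend switches from one
 * client session to another: stash the outgoing session's globals and
 * load the incoming session's saved values. */
void reschedule(SessionContext *from, const SessionContext *to)
{
    save_session_state(from);
    restore_session_state(to);
}
```

If session A sets cur_date_style = 1 and the backend then calls reschedule(&a, &b), session B sees its own saved value; a later reschedule(&b, &a) brings back 1 for session A.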

A more challenging problem is static variables inside system libraries,
which cannot be easily saved/restored. Your example with srandom() is
exactly such a case.
Right now I do not know any efficient way to suspend/resume a
pseudo-random sequence.
But frankly speaking, I do not think that such behaviour of random() is
completely unacceptable or that it makes the built-in session pool unusable.
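For the specific case of random(), POSIX initstate()/setstate() let a process keep several state arrays and switch between them, with setstate() preserving the position of the array being switched away from. A minimal sketch, assuming a pooled backend could give each session its own state buffer and activate it on reschedule; SessionPrng, session_srandom() and session_activate() are hypothetical names. It does not cover other libraries with hidden state, and whether it is cheap enough for a pooler is an open question.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-session random() state (256 bytes, int-aligned). */
typedef struct SessionPrng
{
    uint32_t words[64];
} SessionPrng;

/* Per-session equivalent of srandom(seed). initstate() seeds the buffer
 * and makes it the active state, so the previously active state is
 * switched back in right away. */
void session_srandom(SessionPrng *s, unsigned seed)
{
    char *prev = initstate(seed, (char *) s->words, sizeof s->words);
    assert(prev != NULL);
    setstate(prev);
}

/* Make this session's random() sequence current before running one of
 * its transactions; the outgoing state's position is saved by setstate(). */
void session_activate(SessionPrng *s)
{
    setstate((char *) s->words);
}
```

Two sessions seeded with the same value and interleaved this way each continue their own sequence, identical to an uninterrupted one.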



> This is really very similar to the problem that parallel query has
> when spinning up new worker backends.  As far as possible, we want the
> worker backends to have the same state as the original backend.
> However, there's no systematic way of being sure that every relevant
> backend-private global, including perhaps globals added by loadable
> modules, is in exactly the same state.  For parallel query, we solved
> that problem by copying a bunch of things that we knew were
> commonly-used (cf. parallel.c) and by requiring functions to be
> labeled as parallel-restricted if they rely on any other state.
> The problem for connection pooling is much harder.  If you only ever
> ran parallel-safe functions throughout the lifetime of a session, then
> you would know that the session has no "hidden state" other than what
> parallel.c already knows about (except for any functions that are
> mislabeled, but we can say that's the user's fault for mislabeling
> them).  But as soon as you run even one parallel-restricted or
> parallel-unsafe function, there might be a global variable someplace
> that holds arbitrary state which the core system won't know anything
> about.  If you want to have some other process take over that session,
> you need to copy that state to the new process; if you want to reuse
> the current process for a new session, you need to clear that state.
> Since you don't know it exists or where to find it, and since the code
> to copy and/or clear it might not even exist, you can't.
>
> In other words, transparent connection pooling is going to require
> some new mechanism, which third-party code will have to know about,
> for tracking every last bit of session state that might need to be
> preserved or cleared.  That's going to be a big project.  Maybe some
> of that can piggyback on existing infrastructure like
> InvalidateSystemCaches(), but there's probably still a ton of ad-hoc
> state to deal with.  And no out-of-core pooler has a chance of
> handling all that stuff correctly; an in-core pooler will be able to
> do so only with a lot of work.
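The kind of mechanism Robert describes can be made concrete with a minimal sketch. Everything in it (SessionStateHooks, RegisterSessionState() and the other functions, plus the example module_counter) is hypothetical, not an existing PostgreSQL API: core code and extensions would register every session-private global so that a pooler could save, restore or reset it. Its weakness is the one Robert points out: state that nobody registers stays invisible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hooks describing one piece of session-private state. */
typedef struct SessionStateHooks
{
    const char *name;
    size_t      size;                       /* bytes of saved state */
    void      (*save)(void *dst);           /* globals -> dst */
    void      (*restore)(const void *src);  /* src -> globals */
    void      (*reset)(void);               /* fresh-session defaults */
} SessionStateHooks;

#define MAX_SESSION_STATE_HOOKS 32
static SessionStateHooks hooks[MAX_SESSION_STATE_HOOKS];
static int n_hooks = 0;

int RegisterSessionState(const SessionStateHooks *h)
{
    if (n_hooks == MAX_SESSION_STATE_HOOKS)
        return -1;
    hooks[n_hooks++] = *h;
    return 0;
}

/* Bytes needed to snapshot all registered state. */
size_t SessionStateSize(void)
{
    size_t total = 0;
    for (int i = 0; i < n_hooks; i++)
        total += hooks[i].size;
    return total;
}

/* Serialize all registered state, in registration order, into buf. */
void SaveSessionState(char *buf)
{
    for (int i = 0; i < n_hooks; i++)
    {
        hooks[i].save(buf);
        buf += hooks[i].size;
    }
}

void RestoreSessionState(const char *buf)
{
    for (int i = 0; i < n_hooks; i++)
    {
        hooks[i].restore(buf);
        buf += hooks[i].size;
    }
}

/* Clear all registered state before reusing the process for another session. */
void ResetSessionState(void)
{
    for (int i = 0; i < n_hooks; i++)
        hooks[i].reset();
}

/* Example: a loadable module with one hidden global registers these. */
static int module_counter = 0;
static void counter_save(void *dst) { memcpy(dst, &module_counter, sizeof module_counter); }
static void counter_restore(const void *src) { memcpy(&module_counter, src, sizeof module_counter); }
static void counter_reset(void) { module_counter = 0; }
```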

I think that the situation with parallel executors is slightly different:
in that case several backends execute the same query,
so they really need to somehow share/synchronize the state of static variables.
But with connection pooling, a backend executes only one transaction at
any moment in time, so there should be no problems with static variables
unless they cross a transaction boundary. And I do not think that there
are many such variables.
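A minimal sketch of this argument, assuming a backend that switches sessions only between transactions; xact_scratch, sticky_setting, end_transaction() and run_transaction() are hypothetical names. Transaction-scoped state is cleared at every transaction end (similar in spirit to PostgreSQL's AtEOXact_* cleanup), so the next session never sees it; a variable whose value crosses the transaction boundary is exactly what leaks and would need to be saved in the session context.

```c
#include <assert.h>

/* Hypothetical backend globals with different lifetimes. */
static int xact_scratch = 0;    /* transaction-scoped: reset at end */
static int sticky_setting = 0;  /* survives commit: session lifetime */

/* Simulated end-of-transaction cleanup of transaction-scoped state. */
static void end_transaction(void)
{
    xact_scratch = 0;
}

/* Runs one transaction of some session on this backend. The values found
 * on entry are reported through the out-parameters so that leaks from the
 * previous session can be observed; the transaction then sets both. */
void run_transaction(int value, int *seen_scratch, int *seen_sticky)
{
    *seen_scratch = xact_scratch;
    *seen_sticky = sticky_setting;
    xact_scratch = value;
    sticky_setting = value;
    end_transaction();
}
```

Running run_transaction(5, ...) for session A and then run_transaction(9, ...) for session B on the same backend shows B a scratch value of 0 but a sticky_setting of 5 left over from A.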

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


