Re: Built-in connection pooling - Mailing list pgsql-hackers
From | Konstantin Knizhnik |
---|---|
Subject | Re: Built-in connection pooling |
Date | |
Msg-id | 40404619-0ebf-a642-b010-0805e66609c7@postgrespro.ru |
In response to | Re: Built-in connection pooling (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Built-in connection pooling |
List | pgsql-hackers |
On 23.04.2018 23:14, Robert Haas wrote:
> On Wed, Apr 18, 2018 at 9:41 AM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>> Well, maybe I missed something, but I do not know how to efficiently
>>> support
>>> 1. Temporary tables
>>> 2. Prepared statements
>>> 3. Session GUCs
>>> with any external connection pooler (with pooling level other than
>>> session).
>> Me neither. What makes it easier to do these things in an internal
>> connection pooler? What could the backend do differently, to make these
>> easier to implement in an external pooler?
> I think you and Konstantin are possibly failing to see the big picture
> here. Temporary tables, prepared statements, and GUC settings are
> examples of session state that users expect will be preserved for the
> lifetime of a connection and not beyond; all session state, of
> whatever kind, has the same set of problems. A transparent connection
> pooling experience means guaranteeing that no such state vanishes
> before the user ends the current session, and also that no such state
> established by some other session becomes visible in the current
> session. And we really need to account for *all* such state, not just
> really big things like temporary tables and prepared statements and
> GUCs but also much subtler things such as the state of the PRNG
> established by srandom().

It is not quite true that I have not realized these issues. In addition to connection pooling, I have also implemented a pthread version of Postgres, where static variables are replaced with thread-local variables, which lets each thread use its own set of variables. Unfortunately, this approach cannot be used in connection pooling. But I think that performing scheduling at the transaction level will eliminate the problem with static variables in most cases. My expectation is that very few of them have session-level lifetime. Unfortunately, it is not so easy to locate all such places.
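To illustrate what a static variable with session-level lifetime means for a pooled backend, here is a minimal sketch in C of keeping a per-session copy of such variables and swapping it in when the backend switches sessions at a transaction boundary. This is not code from the patch: `SessionContext`, `session_save`, `session_restore`, `session_reschedule` and the two example globals are hypothetical names chosen only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical backend-private globals whose values must survive
 * across transactions of the same session. */
int  session_counter = 0;
char session_label[32] = "";

/* Hypothetical per-session context holding saved copies of them. */
typedef struct SessionContext
{
    int  saved_counter;
    char saved_label[32];
} SessionContext;

/* Copy the current values of the globals into the session context. */
void
session_save(SessionContext *ctx)
{
    assert(ctx != NULL);
    ctx->saved_counter = session_counter;
    memcpy(ctx->saved_label, session_label, sizeof(session_label));
}

/* Load the globals from a previously saved session context. */
void
session_restore(const SessionContext *ctx)
{
    assert(ctx != NULL);
    session_counter = ctx->saved_counter;
    memcpy(session_label, ctx->saved_label, sizeof(session_label));
}

/* Switch the backend from one session to another.
 * Assumed to be called only between transactions. */
void
session_reschedule(SessionContext *from, const SessionContext *to)
{
    session_save(from);
    session_restore(to);
}
```

The sketch only works for variables that have been explicitly listed in the context structure; any static variable that is not listed there is silently shared between sessions.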
Once such variables are located, they can be saved in the session context and restored on reschedule. A more challenging problem is handling system static variables, which cannot be easily saved and restored. Your example with srandom is exactly such a case. Right now I do not know any efficient way to suspend/resume a pseudo-random sequence. But frankly speaking, I do not think that such behaviour of random is completely unacceptable or makes the built-in session pool unusable.

> This is really very similar to the problem that parallel query has
> when spinning up new worker backends. As far as possible, we want the
> worker backends to have the same state as the original backend.
> However, there's no systematic way of being sure that every relevant
> backend-private global, including perhaps globals added by loadable
> modules, is in exactly the same state. For parallel query, we solved
> that problem by copying a bunch of things that we knew were
> commonly-used (cf. parallel.c) and by requiring functions to be
> labeled as parallel-restricted if they rely on any other state.
> The problem for connection pooling is much harder. If you only ever
> ran parallel-safe functions throughout the lifetime of a session, then
> you would know that the session has no "hidden state" other than what
> parallel.c already knows about (except for any functions that are
> mislabeled, but we can say that's the user's fault for mislabeling
> them). But as soon as you run even one parallel-restricted or
> parallel-unsafe function, there might be a global variable someplace
> that holds arbitrary state which the core system won't know anything
> about. If you want to have some other process take over that session,
> you need to copy that state to the new process; if you want to reuse
> the current process for a new session, you need to clear that state.
> Since you don't know it exists or where to find it, and since the code
> to copy and/or clear it might not even exist, you can't.
>
> In other words, transparent connection pooling is going to require
> some new mechanism, which third-party code will have to know about,
> for tracking every last bit of session state that might need to be
> preserved or cleared. That's going to be a big project. Maybe some
> of that can piggyback on existing infrastructure like
> InvalidateSystemCaches(), but there's probably still a ton of ad-hoc
> state to deal with. And no out-of-core pooler has a chance of
> handling all that stuff correctly; an in-core pooler will be able to
> do so only with a lot of work.

I think that the situation with parallel executors is slightly different: in that case several backends execute the same query, so they really need to somehow share/synchronize the state of static variables. But in the case of connection pooling, only one transaction is executed by a backend at any moment of time. And there should be no problems with static variables unless they cross a transaction boundary. But I do not think that there are many such variables.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company