Thread: changing MyDatabaseId

changing MyDatabaseId

From: Robert Haas

I've spent a few hours poring over the source code with a
coarse-toothed comb, trying to figure out just exactly what might
break if we changed MyDatabaseId after backend startup time, or in
other words, allowed a backend to unbind from the database to which it
was originally bound and rebind to a new one.  This is related to the
periodic conversations we've had about a built-in connection pooler,
and/or maintaining a pool of worker threads that could be used to
service parallel query, replication sets, etc.  What follows is not
meant to be a concrete design proposal; it's basic research that may
lead to a proposal at some time in the future.  Still, comments are
welcome.

For the use cases that seem interesting, it is sufficient to think
about changing MyDatabaseId when there is no transaction in progress,
or perhaps, when there is a transaction in progress which does that
and nothing else.  This means that code that simply uses MyDatabaseId
in the course of going about its duties is not going to blow up.
However, anything that stores the current database ID someplace is a
hazard, as is any code that stores information which was obtained
using the current database ID.  Looking through the code, it appears
to me that we'd need to do the following (not necessarily in this
order):

1. Blow away all of our catalog caches as if we've received a sinval
reset.  Since most of our caches are hooked into the catalog cache via
invalidation hooks, this should trigger cascading invalidations of
other caches that need it, at least in most cases.  (Further looking
is probably needed to identify any cases where a
sinval-reset-equivalent is insufficient.)
2. Blow away absolutely everything from the relcache.  It appears to
me that we'd need to repeat the phase 2 and 3 initialization steps.
We can't even keep the entries for shared catalogs, because the
descriptors that formrdesc() coughs up may not exactly match what we
read from the pg_class entry.  It's tempting to propose adding enough
additional shared catalogs to make the relcache entries for shared
catalogs independent of any particular database's contents, but right
now they are not.
3. Reinitialize the relmapper.
4. Release all locks on objects in the old database; and our shared
lock on the database itself.  Take a new shared lock on the new
database.
5. Flush out any backend-local statistics that have been gathered but
not yet sent to the statistics collector.
6. Update our entry in the PgBackendStatus array so pg_stat_activity
sees the new database assignment.
7. Change MyProc->databaseId while holding ProcArrayLock in LW_EXCLUSIVE mode.
8. Update MyDatabaseTableSpace.
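
To make the ordering concrete, here is a rough, untested sketch of what
such a rebind routine might boil down to.  The function name is
hypothetical and steps 2-3 and 5-6 are only hand-waved in comments; the
individual calls shown do exist today, but the sequence is purely an
illustration of the list above, not a proposal:

#include "postgres.h"

#include "access/xact.h"
#include "catalog/pg_database.h"
#include "miscadmin.h"
#include "storage/lmgr.h"
#include "storage/lwlock.h"
#include "storage/proc.h"
#include "utils/inval.h"

/* Hypothetical sketch of steps 1-8 above -- not working code. */
static void
RebindToDatabase(Oid newDbId, Oid newTableSpace)
{
	Assert(!IsTransactionState());

	/* 1. sinval-reset-equivalent: drop catcache (and dependent) entries */
	InvalidateSystemCaches();

	/*
	 * 2-3. The relcache would need a full rebuild here (including the
	 * formrdesc'd shared catalogs), and the relmapper would have to be
	 * reloaded for the new database -- both omitted in this sketch.
	 */

	/* 4. swap the per-database lock held since backend startup */
	UnlockSharedObject(DatabaseRelationId, MyDatabaseId, 0, RowExclusiveLock);
	LockSharedObject(DatabaseRelationId, newDbId, 0, RowExclusiveLock);

	/* 5-6. flush pending backend stats, update PgBackendStatus (omitted) */

	/* 7. advertise the new database in shared memory */
	LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
	MyProc->databaseId = newDbId;
	LWLockRelease(ProcArrayLock);

	/* 8. finally, the backend-local globals */
	MyDatabaseId = newDbId;
	MyDatabaseTableSpace = newTableSpace;
}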

I haven't benchmarked this (so perhaps I should STFU) but #2 and maybe
#1 and #3 sound like the most painful parts of this activity.  Still,
there's not much help for it that I can see.  The best you can hope
for is to keep around the entries for the shared catalogs, and that's not
going to get you terribly far, and at least ATM even that is unsafe.
I think maybe the thing to do is try to quantify how much time is
being spent in each part of the backend startup process - like connect
to the database lots and lots of times with individual connections,
run a trivial SELECT against a table, and then disconnect and repeat.
The problem is that (unless someone knows how to do magic tricks with
oprofile or dtrace) you're only going to find out the amount of time
actually spent executing each portion of the code.  It won't tell you
how much overhead you have from fork() itself, let alone how much
faster the initialization steps would have run if they had been
running on a warmed-up process address space rather than one that
might well need to fault in a lot of page-table entries in its first
few moments of life.  But it might still provide some useful data.
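
For what it's worth, the timing half of that experiment is easy to
script; something like the following untested libpq sketch (it assumes a
one-row table to query against, here called dual) would at least give
wall-clock numbers per connect/query/disconnect cycle, which oprofile
could then break down further:

/* Untested sketch: time N connect/query/disconnect cycles via libpq. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <libpq-fe.h>

int
main(int argc, char **argv)
{
	const char *conninfo = (argc > 1) ? argv[1] : "dbname=postgres";
	int			iters = (argc > 2) ? atoi(argv[2]) : 1000;
	struct timeval start, stop;
	int			i;

	gettimeofday(&start, NULL);
	for (i = 0; i < iters; i++)
	{
		PGconn	   *conn = PQconnectdb(conninfo);
		PGresult   *res;

		if (PQstatus(conn) != CONNECTION_OK)
		{
			fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
			PQfinish(conn);
			return 1;
		}
		res = PQexec(conn, "SELECT * FROM dual");	/* trivial one-row query */
		PQclear(res);
		PQfinish(conn);
	}
	gettimeofday(&stop, NULL);

	printf("%d cycles, %.3f ms per cycle\n", iters,
		   ((stop.tv_sec - start.tv_sec) * 1000.0 +
			(stop.tv_usec - start.tv_usec) / 1000.0) / iters);
	return 0;
}

(Something like cc -o connbench connbench.c -I`pg_config --includedir`
-L`pg_config --libdir` -lpq should build it.)  Of course that only
measures client-visible latency, not where the backend spends its time.
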
Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: changing MyDatabaseId

From: Tom Lane

Robert Haas <robertmhaas@gmail.com> writes:
> Looking through the code, it appears to me that we'd need to do the
> following (not necessarily in this order):

Don't forget

9. Unload loadable modules that do not exist according to the new
database's catalogs; eg we don't want postgis trying to run when
its supporting tables don't exist in the selected database.
10. Somehow persuade remaining loadable modules to discard whatever
state they have that might be database-local.

We don't have workable APIs for either of those operations ATM.

I believe also that there are probably race conditions in several of
the steps you listed; in particular there is certainly a risk involved
in changing the database-we-advertise-being-connected-to versus a
concurrent DROP DATABASE.  Maybe that's easily soluble, but I'm not sure
about it.
        regards, tom lane


Re: changing MyDatabaseId

From: Alvaro Herrera

Excerpts from Tom Lane's message of Mon Nov 15 02:41:40 -0300 2010:

> I believe also that there are probably race conditions in several of
> the steps you listed; in particular there is certainly a risk involved
> in changing the database-we-advertise-being-connected-to versus a
> concurrent DROP DATABASE.  Maybe that's easily soluble, but I'm not sure
> about it.

Perhaps it would help to first disconnect from the original database and
not reconnect to any one in particular, i.e. stay in a state like the
autovac launcher is in.  Then cleanup as per Robert's list, then switch
to a valid database ID.
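
In terms of shared memory, I imagine that would look roughly like the
following untested sketch (newDbId is just a placeholder; the cleanup
happens while we advertise no database at all):

/* Hypothetical sketch: detach first, clean up, then attach. */
LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
MyProc->databaseId = InvalidOid;	/* "connected to no database", as the
									 * autovacuum launcher advertises */
LWLockRelease(ProcArrayLock);

/* ... release locks, flush caches, reset relmapper, etc. ... */

LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE);
MyProc->databaseId = newDbId;		/* now visible to concurrent DROP DATABASE */
LWLockRelease(ProcArrayLock);
MyDatabaseId = newDbId;

/*
 * We would still need to recheck that newDbId exists (and take the
 * per-database lock) before trusting it, to close the race Tom mentions.
 */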

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: changing MyDatabaseId

From: Robert Haas

On Mon, Nov 15, 2010 at 12:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Looking through the code, it appears to me that we'd need to do the
>> following (not necessarily in this order):
>
> Don't forget
>
> 9. Unload loadable modules that do not exist according to the new
> database's catalogs; eg we don't want postgis trying to run when
> its supporting tables don't exist in the selected database.

I might be missing something here, but I wasn't aware that the
catalogs had any say in what loadable modules exist.  Obviously their
contents will determine whether such loadable modules ever get invoked
as a result of actions at the SQL level, but if they don't it may not
matter much that they're loaded but idle.

> 10. Somehow persuade remaining loadable modules to discard whatever
> state they have that might be database-local.

Ouch.

> We don't have workable APIs for either of those operations ATM.

Somewhat independently of this problem, it would be nice to be able to
unload a loadable module, and I think that we currently don't support
that at all.  One thing (the only thing?) that makes that a
show-stopper is that the way we manage hook function chaining makes it
impossible for a module to reliably unload itself.  If you load both
auto_explain and pg_stat_statements, for example, you end up with a
tangle of pointers that can't be easily unwound.  I don't have a real
clear idea how to fix it, but the status quo certainly doesn't seem
optimal.
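
To illustrate, the usual chaining pattern (along the lines of what
auto_explain and pg_stat_statements do today) looks roughly like this
sketch:

#include "postgres.h"
#include "executor/executor.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

void		_PG_init(void);

static ExecutorStart_hook_type prev_ExecutorStart = NULL;

static void
my_ExecutorStart(QueryDesc *queryDesc, int eflags)
{
	/* ... module-specific work here ... */
	if (prev_ExecutorStart)
		prev_ExecutorStart(queryDesc, eflags);
	else
		standard_ExecutorStart(queryDesc, eflags);
}

void
_PG_init(void)
{
	prev_ExecutorStart = ExecutorStart_hook;
	ExecutorStart_hook = my_ExecutorStart;
}

/*
 * If module A installs its hook and then module B does the same, the chain
 * is B -> A -> standard.  A cannot safely unload: B still holds a pointer
 * to A's function, and A has no way to splice itself out of B's saved
 * "previous" pointer.
 */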

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: changing MyDatabaseId

From: Markus Wanner

Robert,

On 11/15/2010 05:39 AM, Robert Haas wrote:
> I've spent a few hours poring over the source code with a
> coarse-toothed comb, trying to figure out just exactly what might
> break if we changed MyDatabaseId after backend startup time, or in
> other words, allowed a backend to unbind from the database to which it
> was originally bound and rebind to a new one.  This is related to the
> periodic conversations we've had about a built-in connection pooler,
> and/or maintaining a pool of worker threads that could be used to
> service parallel query, replication sets, etc.  What follows is not
> meant to be a concrete design proposal; it's basic research that may
> lead to a proposal at some time in the future.  Still, comments are
> welcome.

Thanks a lot for doing that, saved me a couple hours (presumably more
than it cost you :-)

> Thoughts?

The question obviously is whether or not this is faster than just
terminating one backend and starting a new one. Which basically costs an
additional termination and re-creation of a process (i.e. fork())
AFAICS. Or what other savings do you envision?

If that's it, it certainly seems like a huge amount of work for very
little benefit. Or does this feature enable something that's impossible
to do otherwise?

Regards

Markus Wanner


Re: changing MyDatabaseId

From: Andres Freund

On Wednesday 17 November 2010 11:04:04 Markus Wanner wrote:
> Robert,
> 
> On 11/15/2010 05:39 AM, Robert Haas wrote:
> > I've spent a few hours poring over the source code with a
> > coarse-toothed comb, trying to figure out just exactly what might
> > break if we changed MyDatabaseId after backend startup time, or in
> > other words, allowed a backend to unbind from the database to which it
> > was originally bound and rebind to a new one.  This is related to the
> > periodic conversations we've had about a built-in connection pooler,
> > and/or maintaining a pool of worker threads that could be used to
> > service parallel query, replication sets, etc.  What follows is not
> > meant to be a concrete design proposal; it's basic research that may
> > lead to a proposal at some time in the future.  Still, comments are
> > welcome.
> 
> Thanks a lot for doing that, saved me a couple hours (presumably more
> than it cost you :-)
> 
> > Thoughts?
> 
> The question obviously is whether or not this is faster than just
> terminating one backend and starting a new one. Which basically costs an
> additional termination and re-creation of a process (i.e. fork())
> AFAICS. Or what other savings do you envision?
Well, one could optimize most of the resetting away if the old
MyDatabaseId and the new one are the same - an optimization which is hardly
possible with forking new backends.

Also I think it could lower the impact due to locking the procarray and
related areas.
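
i.e., loosely (hypothetical sketch, with newDbId standing in for the
pooler's target database):

/* Hypothetical: skip the whole reset when rebinding to the same database. */
if (newDbId == MyDatabaseId)
	return;					/* caches, locks and relmapper all stay valid */

/* otherwise fall through to the full flush-and-rebind sequence */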

Andres


Re: changing MyDatabaseId

From: Markus Wanner

Andres,

On 11/17/2010 11:38 AM, Andres Freund wrote:
> Well, one could optimize most of the resetting away if the old
> MyDatabaseId and the new one are the same - an optimization which is hardly
> possible with forking new backends.

Uh? Why not simply re-use the same backend, then? Or do you think of
re-connecting to the same database as a way to reset your connection?

> Also I think it could lower the impact due to locking the procarray and
> related areas.

That may be, yes. But as pointed out by Tom and Alvaro, you'd have to
adhere to a compatible sequence of changes to shared memory to avoid
race conditions. That possibly requires a locking sequence very similar
to the one a detaching and a newly starting backend would use in
combination.

Regards

Markus Wanner


Re: changing MyDatabaseId

From: Andres Freund

On Wednesday 17 November 2010 11:58:33 Markus Wanner wrote:
> Andres,
> 
> On 11/17/2010 11:38 AM, Andres Freund wrote:
> > Well, one could optimize most of the resetting away if the the old
> > MyDatabaseId and the new one are the same  - an optimization which is
> > hardly possible with forking new backends.
> Uh? Why not simply re-use the same backend, then? Or do you think of
> re-connecting to the same database as a way to reset your connection?
I am thinking of a connection-pooler-like setup. Quite often your main load
goes towards a single database - in that situation you don't have to reset the
database id most of the time.

Andres


Re: changing MyDatabaseId

From: Markus Wanner

On 11/17/2010 12:09 PM, Andres Freund wrote:
> I am thinking of a connection-pooler-like setup. Quite often your main load
> goes towards a single database - in that situation you don't have to reset the
> database id most of the time.

Okay, so that's what I'd call a connection reset or something. Or
perhaps a re-authentication feature which allows switching users (and
thus permissions on that same database). Could make sense for a
connection pooler, yes. Not having to flush caches in that case could
even turn out to be a good win. And I think it's a lot simpler to just
switch the user than to switch the database. Such a feature looks more
promising to me. (Do we have other connection-related and unchangeable
state?)

As a side note: for the bgworkers in Postgres-R, I'm currently re-using
existing backends. As they only do change set application, the amount of
connection resetting required is minimal (well, there isn't a client
connection for these kinds of backends in the first place, but that's
another story). Plus they are always acting as superuser, so no
authentication or user switching is required in that case.

For parallel querying as well as async transactions, it might make sense
to be able to switch users but remain connected to the same database.

Regards

Markus Wanner


Re: changing MyDatabaseId

From: Alvaro Herrera

Excerpts from Markus Wanner's message of Wed Nov 17 07:04:04 -0300 2010:

> > Thoughts?
> 
> The question obviously is whether or not this is faster than just
> terminating one backend and starting a new one. Which basically costs an
> additional termination and re-creation of a process (i.e. fork())
> AFAICS. Or what other savings do you envision?

I don't think it's a speed thing only.  It would be a great thing to
have in autovacuum, for example, where we have constant problem reports
because the system failed to fork a new backend.  If we could simply
reuse an already existing one, it would be a lot more robust.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: changing MyDatabaseId

From: Markus Wanner

On 11/17/2010 01:27 PM, Alvaro Herrera wrote:
> I don't think it's a speed thing only.  It would be a great thing to
> have in autovacuum, for example, where we have constant problem reports
> because the system failed to fork a new backend.  If we could simply
> reuse an already existing one, it would be a lot more robust.

Hm, that's an interesting point.

To actually increase robustness, it would have to be a failure scenario
that (temporarily) prevents forking, but allows an existing backend to
continue to do work (e.g. the ability to allocate memory or open files
comes to mind).

Any idea about what's usually causing these fork() failures? I'm asking
because I'm afraid that, in case of an out-of-memory condition for
example, we'd just hit an OOM error later on without being able to
perform the VACUUM job either.

Regards

Markus Wanner


Re: changing MyDatabaseId

From: Alvaro Herrera

Excerpts from Markus Wanner's message of Wed Nov 17 09:57:18 -0300 2010:
> On 11/17/2010 01:27 PM, Alvaro Herrera wrote:
> > I don't think it's a speed thing only.  It would be a great thing to
> > have in autovacuum, for example, where we have constant problem reports
> > because the system failed to fork a new backend.  If we could simply
> > reuse an already existing one, it would be a lot more robust.
> 
> Hm, that's an interesting point.
> 
> To actually increase robustness, it would have to be a failure scenario
> that (temporarily) prevents forking, but allows an existing backend to
> continue to do work (e.g. the ability to allocate memory or open files
> comes to mind).

Well, the autovacuum mechanism involves a lot of back-and-forth between
launcher and postmaster, which includes some signals, a fork() and
backend initialization.  The failure possibilities are endless.

Fork failure communication is similarly brittle.

> Any idea about what's usually causing these fork() failures? I'm asking
> because I'm afraid that, in case of an out-of-memory condition for
> example, we'd just hit an OOM error later on without being able to
> perform the VACUUM job either.

To be honest I have no idea.  Sometimes the server is just too loaded.
Right now we have this "delay": if the process is not up and running in
60 seconds, then we have to assume that "something" happened, and we no
longer wait for it.  If we knew the process was already there, we could
leave it alone; we'd know it would get to its duty eventually.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: changing MyDatabaseId

From: Markus Wanner

On 11/17/2010 02:19 PM, Alvaro Herrera wrote:
> Well, the autovacuum mechanism involves a lot of back-and-forth between
> launcher and postmaster, which includes some signals, a fork() and
> backend initialization.  The failure possibilities are endless.
> 
> Fork failure communication is similarly brittle.

I certainly agree to that. However, a re-connecting mechanism wouldn't
allow us to get rid of the existing avworker startup infrastructure
entirely.

And for increased robustness, we'd require a less brittle re-connecting
mechanism. Given Robert's list, that doesn't seem trivial, either. (But
still doable, yes).

> Right now we have this "delay": if the process is not up and running in
> 60 seconds, then we have to assume that "something" happened, and we no
> longer wait for it.  If we knew the process was already there, we could
> leave it alone; we'd know it would get to its duty eventually.

You are assuming the presence of a pool here. Which is fine; it's just not
something that a re-connecting feature would solve per se. (Says he who
coded the bgworkers pool thingie.)

Regards

Markus Wanner


Re: changing MyDatabaseId

From: Tom Lane

Markus Wanner <markus@bluegap.ch> writes:
> On 11/17/2010 02:19 PM, Alvaro Herrera wrote:
>> Well, the autovacuum mechanism involves a lot of back-and-forth between
>> launcher and postmaster, which includes some signals, a fork() and
>> backend initialization.  The failure possibilities are endless.
>> 
>> Fork failure communication is similarly brittle.

> I certainly agree to that. However, a re-connecting mechanism wouldn't
> allow us to get rid of the existing avworker startup infrastructure
> entirely.

I'm afraid that any such change would trade a visible, safe failure
mechanism (no avworker) for invisible, impossible-to-debug data
corruption scenarios (due to failure to reset some bit of cached state).
It certainly won't give me any warm fuzzy feeling that I can trust
autovacuum.
        regards, tom lane


Re: changing MyDatabaseId

From: Markus Wanner

On 11/17/2010 04:25 PM, Tom Lane wrote:
> I'm afraid that any such change would trade a visible, safe failure
> mechanism (no avworker) for invisible, impossible-to-debug data
> corruption scenarios (due to failure to reset some bit of cached state).
> It certainly won't give me any warm fuzzy feeling that I can trust
> autovacuum.

Well, Alvaro doesn't quite seem have a warm fuzzy feeling with the
status quo, either. And I can certainly understand his concerns.

But yes, the OS-level process-separation and cache-state-reset guarantee
that an exit() / fork() pair provides is hard to match in user space.

So, Alvaro's argument for robustness only stands under the assumption
that we can achieve a perfect cache-state reset mechanism. Now, how
feasible is that? Are there any tools that could help us check?

Regards

Markus Wanner


Re: changing MyDatabaseId

From: Robert Haas

On Wed, Nov 17, 2010 at 5:04 AM, Markus Wanner <markus@bluegap.ch> wrote:
> The question obviously is whether or not this is faster than just
> terminating one backend and starting a new one.

I agree.

> Which basically costs an
> additional termination and re-creation of a process (i.e. fork())
> AFAICS. Or what other savings do you envision?

I think the thing I'd like to do (or see somebody else do) is write a
test harness that connects to the database, runs a single SELECT *
FROM dual (one row, one column table), and then disconnects; and then
oprofile the daylights out of the backend processes.  In other words,
I'd like to measure as exactly as we can the overhead of each part of
the startup process.  I think that would give us a clearer picture of
where the overhead is, and then we could look more directly at which
line items might be avoidable for rebinding to a new database.

However, that test doesn't capture everything.  For example, imagine a
connection pooler sitting in front of PG.  Rebinding to a new database
means disconnecting a TCP connection and establishing a new one.
Switching databases might save some latency there even if we don't
actually save much in terms of CPU instructions.  Maybe that's not
important, though.  I don't know.  I don't want to let my theorizing
get too far ahead of the data.

It also occurs to me to wonder whether there's some way that we can
speed up backend startup, period.  One of the frequent complaints
levied against PostgreSQL is that our connections are too expensive.
AFAICT, this is partly because backend startup costs are high, and
partly because of internal contention, especially around
ProcArrayLock.  Squeezing on the startup overhead will be valuable no
matter what we decide to do about database switching.  A variant on
this theme would be to try to adjust the startup sequence in some way
so that less of it needs to be redone if we switch databases, which
might be possible even if a more general solution isn't.
Unfortunately, I'm not sure how feasible this is, but maybe there's a
way...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: changing MyDatabaseId

From: Greg Stark

On Wed, Nov 17, 2010 at 4:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> However, that test doesn't capture everything.  For example, imagine a
> connection pooler sitting in front of PG.  Rebinding to a new database
> means disconnecting a TCP connection and establishing a new one.
> Switching databases might save some latency there even if we don't
> actually save much in terms of CPU instructions.  Maybe that's not
> important, though.  I don't know.  I don't want to let my theorizing
> get too far ahead of the data.

Everything you said is true but there's more. A freshly created
backend needs to build relcache entries for every relation in your
query. A reused connection eventually warms up the relcache and
syscaches and can plan new queries using them without doing any
syscalls. And of course if it's a query that's already been planned, it
might be able to reuse the entire plan structure without replanning
it.


--
greg


Re: changing MyDatabaseId

From: Robert Haas

On Wed, Nov 17, 2010 at 12:42 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Wed, Nov 17, 2010 at 4:52 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> However, that test doesn't capture everything.  For example, imagine a
>> connection pooler sitting in front of PG.  Rebinding to a new database
>> means disconnecting a TCP connection and establishing a new one.
>> Switching databases might save some latency there even if we don't
>> actually save much in terms of CPU instructions.  Maybe that's not
>> important, though.  I don't know.  I don't want to let my theorizing
>> get too far ahead of the data.
>
> Everything you said is true but there's more. A freshly created
> backend needs to build relcache entries for every relation in your
> query. A reused connection eventually warms up the relcache and
> syscaches and can plan new queries using them without doing any
> syscalls. And of course if it's a query that's already been planned, it
> might be able to reuse the entire plan structure without replanning
> it.

I think you're missing the point.  If we switch databases, all cached
relations and plans have to be flushed anyway.  We're talking about
what might NOT need to be flushed on switching databases.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: changing MyDatabaseId

From: Greg Stark

On Wed, Nov 17, 2010 at 6:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think you're missing the point.  If we switch databases, all cached
> relations and plans have to be flushed anyway.  We're talking about
> what might NOT need to be flushed on switching databases.

Oh sorry, yes, I missed that point.

I will mention that your point about TCP connection establishment
latency is real. TCP connection establishment adds several
milliseconds of latency (for low latency connections -- obviously it's
much much worse for long-haul connections) and then the congestion
control slow start adds more if there's a significant amount of data
to transfer.

--
greg


Re: changing MyDatabaseId

From: Bruce Momjian

Alvaro Herrera wrote:
> Excerpts from Tom Lane's message of Mon Nov 15 02:41:40 -0300 2010:
> 
> > I believe also that there are probably race conditions in several of
> > the steps you listed; in particular there is certainly a risk involved
> > in changing the database-we-advertise-being-connected-to versus a
> > concurrent DROP DATABASE.  Maybe that's easily soluble, but I'm not sure
> > about it.
> 
> Perhaps it would help to first disconnect from the original database and
> not reconnect to any one in particular, i.e. stay in a state like the
> autovac launcher is in.  Then cleanup as per Robert's list, then switch
> to a valid database ID.

One question is whether we keep our caches in the hope that we will
continue with the same database, or clear our caches now so we don't have
to incur the overhead of clearing them when a new session connects.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
  + It's impossible for everything to be true. +