Rethinking locking for database create/drop vs connection startup

From: Tom Lane
This is motivated by Jim Buttafuoco's recent gripe about not being
able to connect while a DROP DATABASE is in progress:
http://archives.postgresql.org/pgsql-hackers/2006-05/msg00074.php
The whole area is really pretty grotty anyway --- the existing interlock
does not prevent an incoming connection from trying to connect to the
victim database; it only makes sure that we detect it later.  This is not
very good, for two reasons.  One is that you'll most likely get a very
unfriendly error message due to attempts to access already-missing
system catalogs; when I experimented just now I got

psql: FATAL:  could not open relation 1663/104854/1259: No such file or directory

which is really not the way I'd like to report "database foo just got
deleted".  The other problem is that I'm not entirely convinced that a
backend trying to do this won't leave any permanent problems behind,
most likely in the form of dirty shared buffers for subsequently-deleted
system catalogs in the victim database.  ReverifyMyDatabase tries to
clean that up by doing DropDatabaseBuffers, but that only helps if we
get as far as ReverifyMyDatabase.

It strikes me that we now have a decent tool for solving the problem,
which is LockSharedObject() --- that is, there exists a locktag
convention whereby we can "take a lock" on a database as such, rather
than having to use table-level locks on pg_database as proxy.  The
locktag would be in the form of an OID so it would identify a DB by
OID.  If dropdb() takes such a lock before it checks for active
backends, then the connection sequence can look like this:
    1. read pg_database flat file to find out OID of target DB
    2. initialize far enough to be able to start a transaction,
       and do so
    3. take a shared lock on the target DB by OID
    4. re-read pg_database flat file and verify DB still exists

If step 4 fails to find the DB in the flat file, then we can bomb
out before we've made any attempt to touch catalogs of the target
DB.  This ensures both a reasonable error message, and no pollution
of shared buffers.  If we get past step 4 then we don't have to worry
about concurrent dropdb() anymore.  (The shared lock will only last
until we commit the startup transaction, but that's OK --- once we
are listed in the PGPROC array we don't need the lock anymore.)

It's slightly annoying to have to read the flat file twice, but
for reasonable numbers of databases per installation I don't think
this will pose any material performance penalty.  The file will
certainly still be sitting in kernel disk cache.

It's still necessary to serialize CREATE/DROP DATABASE commands against
each other, to ensure that only one backend tries to write the flat file
at a time, but with this scheme they'd not need to block connections
being made to unrelated databases.

Thoughts, better ideas?
        regards, tom lane


Re: Rethinking locking for database create/drop vs connection startup

From: Christopher Kings-Lynne
> It's slightly annoying to have to read the flat file twice, but
> for reasonable numbers of databases per installation I don't think
> this will pose any material performance penalty.  The file will
> certainly still be sitting in kernel disk cache.

Dropping a DB isn't exactly a "common" occurrence anyway, no?



Re: Rethinking locking for database create/drop vs connection startup

From: Simon Riggs
On Wed, 2006-05-03 at 16:15 -0400, Tom Lane wrote:
> This is motivated by Jim Buttafuoco's recent gripe about not being
> able to connect while a DROP DATABASE is in progress:
> http://archives.postgresql.org/pgsql-hackers/2006-05/msg00074.php

...

>  If dropdb() takes such a lock before it checks for active
> backends, then the connection sequence can look like this:
> 
>     1. read pg_database flat file to find out OID of target DB
>     2. initialize far enough to be able to start a transaction,
>        and do so
>     3. take a shared lock on the target DB by OID
>     4. re-read pg_database flat file and verify DB still exists

Many people never CREATE or DROP databases. They just do everything in
the default database (name is release dependent) - at least on their
main system(s). It would be valid to optimize for that case.

-- 
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com



Re: Rethinking locking for database create/drop vs connection startup

From: Tom Lane
Simon Riggs <simon@2ndquadrant.com> writes:
> On Wed, 2006-05-03 at 16:15 -0400, Tom Lane wrote:
>> If dropdb() takes such a lock before it checks for active
>> backends, then the connection sequence can look like this:

> Many people never CREATE or DROP databases. They just do everything in
> the default database (name is release dependent) - at least on their
> main system(s). It would be valid to optimize for that case.

I'm not particularly concerned about people with only a couple of
databases --- reading the flat file isn't going to take any meaningful
amount of time for them anyway.  It's the folks with hundreds of
databases who might have a beef.  But those are exactly the people
who need create/drop database to be bulletproof.

As I've been working on this patch I've found that it will clean up a
whole lot of related issues, so I'm getting more and more convinced
it's the Right Thing.  Some points:

* Connecting will actually take RowExclusiveLock (ordinary writer's
lock), while CREATE DATABASE takes ShareLock on the template DB, and
of course DROP/RENAME DATABASE take AccessExclusiveLock.  This provides
for the first time an absolute guarantee that CREATE DATABASE gets a
consistent copy of the template: before we could never ensure that
someone didn't connect to the template and change it while the copy was
in progress.  At the same time, two CREATE DATABASEs can safely use the
same template, and of course two concurrent connections don't block
each other.

* Since we're trying not to take any table-level exclusive locks on
pg_database anymore, we need a different solution in flatfiles.c to
ensure only one transaction writes the flat file at a time.  To do this
I'm going to have a dedicated lock, used only in the flatfile code, that
is taken just before trying to write the file and held till commit
(which is immediately after).  This eliminates the former risk of
deadlock associated with manual updates to pg_database, and as a bonus
holds the exclusive lock for a much shorter period of time.
        regards, tom lane