Thread: Rethinking locking for database create/drop vs connection startup
This is motivated by Jim Buttafuoco's recent gripe about not being able to connect while a DROP DATABASE is in progress:
http://archives.postgresql.org/pgsql-hackers/2006-05/msg00074.php

The whole area is really pretty grotty anyway --- the existing interlock does not prevent an incoming connection from trying to connect to the victim database; it only makes sure that we detect it later. This is not very good, for two reasons. One is that you'll most likely get a very unfriendly error message due to attempts to access already-missing system catalogs; when I experimented just now I got

    psql: FATAL: could not open relation 1663/104854/1259: No such file or directory

which is really not the way I'd like to report "database foo just got deleted". The other problem is that I'm not entirely convinced that a backend trying to do this won't leave any permanent problems behind, most likely in the form of dirty shared buffers for subsequently-deleted system catalogs in the victim database. ReverifyMyDatabase tries to clean that up by doing DropDatabaseBuffers, but that only helps if we get as far as ReverifyMyDatabase.

It strikes me that we now have a decent tool for solving the problem, namely LockSharedObject() --- that is, there now exists a locktag convention whereby we can "take a lock" on a database as such, rather than having to use table-level locks on pg_database as a proxy. The locktag would be in the form of an OID, so it would identify a DB by OID. If dropdb() takes such a lock before it checks for active backends, then the connection sequence can look like this:

1. read the pg_database flat file to find out the OID of the target DB
2. initialize far enough to be able to start a transaction, and do so
3. take a shared lock on the target DB by OID
4. re-read the pg_database flat file and verify the DB still exists

If step 4 fails to find the DB in the flat file, then we can bomb out before we've made any attempt to touch the catalogs of the target DB.
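[Editorial sketch] The four-step sequence can be simulated in miniature. This is not PostgreSQL code --- the flat file is a dict, the per-database locks are plain mutexes (the real proposal uses a *shared* lock for connections so they don't block each other), and the `interleave` hook is an invented stand-in for a concurrent DROP committing between steps 1 and 3:

```python
import threading

flat_file = {"template1": 1, "foo": 104854}   # name -> OID, like the pg_database flat file
_db_locks = {}                                # OID -> lock on that database "as such"
_db_locks_mutex = threading.Lock()

def lock_for(oid):
    """Return the lock object for a database, keyed by OID (the locktag idea)."""
    with _db_locks_mutex:
        return _db_locks.setdefault(oid, threading.Lock())

def drop_database(name):
    oid = flat_file[name]
    with lock_for(oid):          # dropdb() takes the DB lock before proceeding
        del flat_file[name]      # rewrite the flat file without the victim

def connect(name, interleave=None):
    # step 1: read the flat file to find the target DB's OID
    oid = flat_file.get(name)
    if oid is None:
        return 'FATAL: database "%s" does not exist' % name
    if interleave:
        interleave()             # window in which a concurrent DROP may commit
    # step 2: initialize far enough to start a transaction (elided here)
    with lock_for(oid):          # step 3: take a lock on the target DB by OID
        # step 4: re-read the flat file and verify the DB still exists
        if flat_file.get(name) != oid:
            return 'FATAL: database "%s" does not exist' % name
        return "connected to %s" % name
```

A connection that races with the DROP now fails cleanly at step 4, before touching any catalog of the victim database, instead of tripping over missing catalog files.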
This ensures both a reasonable error message and no pollution of shared buffers. If we get past step 4 then we don't have to worry about concurrent dropdb() anymore. (The shared lock will only last until we commit the startup transaction, but that's OK --- once we are listed in the PGPROC array we don't need the lock anymore.)

It's slightly annoying to have to read the flat file twice, but for reasonable numbers of databases per installation I don't think this will pose any material performance penalty. The file will certainly still be sitting in kernel disk cache.

It's still necessary to serialize CREATE/DROP DATABASE commands against each other, to ensure that only one backend tries to write the flat file at a time, but with this scheme they'd not need to block connections being made to unrelated databases.

Thoughts, better ideas?

			regards, tom lane
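[Editorial sketch] The "only one flat-file writer at a time, lock held till commit" discipline can be illustrated with a toy transaction object. The names here (`flatfile_write_lock`, `Transaction`) are invented for the sketch; PostgreSQL would use its own lock manager, not a Python mutex:

```python
import threading

# Hypothetical stand-in for the dedicated flat-file lock.
flatfile_write_lock = threading.Lock()

class Transaction:
    """Toy transaction that may rewrite the flat file at most once."""
    def __init__(self):
        self._holds_flatfile_lock = False

    def write_flat_file(self, contents):
        # The lock is taken just before writing, so only one
        # backend rewrites the flat file at a time...
        flatfile_write_lock.acquire()
        self._holds_flatfile_lock = True
        # ... the actual file rewrite would happen here ...

    def commit(self):
        # ... and it is held until commit, which comes immediately
        # after the write, so the exclusive hold is brief.
        if self._holds_flatfile_lock:
            flatfile_write_lock.release()
            self._holds_flatfile_lock = False
```

Note the key property: connections never touch this lock at all, so CREATE/DROP DATABASE serialize against each other without blocking connections to unrelated databases.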
> It's slightly annoying to have to read the flat file twice, but
> for reasonable numbers of databases per installation I don't think
> this will pose any material performance penalty.  The file will
> certainly still be sitting in kernel disk cache.

Dropping a DB isn't exactly a "common" occurrence anyway, no?
On Wed, 2006-05-03 at 16:15 -0400, Tom Lane wrote:
> This is motivated by Jim Buttafuoco's recent gripe about not being
> able to connect while a DROP DATABASE is in progress:
> http://archives.postgresql.org/pgsql-hackers/2006-05/msg00074.php
...
> If dropdb() takes such a lock before it checks for active
> backends, then the connection sequence can look like this:
>
> 1. read pg_database flat file to find out OID of target DB
> 2. initialize far enough to be able to start a transaction,
>    and do so
> 3. take a shared lock on the target DB by OID
> 4. re-read pg_database flat file and verify DB still exists

Many people never CREATE or DROP databases. They just do everything in the default database (whose name is release-dependent) --- at least on their main system(s). It would be valid to optimize for that case.

-- 
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
Simon Riggs <simon@2ndquadrant.com> writes:
> On Wed, 2006-05-03 at 16:15 -0400, Tom Lane wrote:
>> If dropdb() takes such a lock before it checks for active
>> backends, then the connection sequence can look like this:

> Many people never CREATE or DROP databases. They just do everything in
> the default database (name is release dependent) - at least on their
> main system(s). It would be valid to optimize for that case.

I'm not particularly concerned about people with only a couple of databases --- reading the flat file isn't going to take any meaningful amount of time for them anyway. It's the folks with hundreds of databases who might have a beef. But those are exactly the people who need create/drop database to be bulletproof.

As I've been working on this patch I've found that it will clean up a whole lot of related issues, so I'm getting more and more convinced it's the Right Thing. Some points:

* Connecting will actually take RowExclusiveLock (an ordinary writer's lock), while CREATE DATABASE takes ShareLock on the template DB, and of course DROP/RENAME DATABASE take AccessExclusiveLock. This provides for the first time an absolute guarantee that CREATE DATABASE gets a consistent copy of the template: before, we could never ensure that someone didn't connect to the template and change it while the copy was in progress. At the same time, two CREATE DATABASEs can safely use the same template, and of course two concurrent connections don't block each other.

* Since we're trying not to take any table-level exclusive locks on pg_database anymore, we need a different solution in flatfiles.c to ensure that only one transaction writes the flat file at a time. To do this I'm going to have a dedicated lock, used only in the flatfile code, that is taken just before trying to write the file and held till commit (which is immediately after).
This eliminates the former risk of deadlock associated with manual updates to pg_database, and as a bonus holds the exclusive lock for a much shorter period of time.

			regards, tom lane
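[Editorial sketch] The lock-mode choices above can be checked against a miniature conflict table. The pairwise conflicts below follow PostgreSQL's standard table-level lock conflict table, restricted to the three modes named in the message; the Python encoding itself is just for illustration:

```python
# Lock modes as used in the proposal:
ROW_EXCLUSIVE = "RowExclusiveLock"        # taken when connecting to a DB
SHARE = "ShareLock"                       # taken by CREATE DATABASE on its template
ACCESS_EXCLUSIVE = "AccessExclusiveLock"  # taken by DROP/RENAME DATABASE

# Which held modes a requested mode conflicts with (symmetric).
CONFLICTS = {
    ROW_EXCLUSIVE:    {SHARE, ACCESS_EXCLUSIVE},
    SHARE:            {ROW_EXCLUSIVE, ACCESS_EXCLUSIVE},
    ACCESS_EXCLUSIVE: {ROW_EXCLUSIVE, SHARE, ACCESS_EXCLUSIVE},
}

def conflicts(held, requested):
    """True if a request in mode `requested` must wait for a holder in `held`."""
    return requested in CONFLICTS[held]
```

This matrix yields exactly the behaviors claimed: two connections don't block each other (RowExclusive is self-compatible), two CREATE DATABASEs can share a template (ShareLock is self-compatible), nobody can connect to and modify the template mid-copy (Share conflicts with RowExclusive), and DROP/RENAME exclude everything (AccessExclusive conflicts with all modes).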