Re: Race condition with libpq - Mailing list pgsql-interfaces

From Dietmar May
Subject Re: Race condition with libpq
Date
Msg-id 008401c3239e$1cd6ccf0$fb02a8c0@muskrat
Whole thread Raw
In response to Re: Race condition with libpq  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-interfaces
Hi Tom,

Thanks for your response. My post (non-member) was delayed for quite some
time; and I'm grateful to actually get a response from someone.

> "Dietmar May" <dcmay@dmis.com> writes:
> > ... If just executing without any time lag between calls (i.e. no
> > human interaction), then I consistently and repeatably receive the error
> > "Drop database failed : ERROR:  DROP DATABASE: database
> \"somedb\" is being
> > accessed by other users\n".
>
> > The "other user" is the PGconn that was just released using a PGfinish!
>
> Yeah, we have seen this behavior reported before; though I've not found
> it to be easily reproducible myself.  It's pretty hard to believe that a
> backend shutdown would take longer than a backend startup; so my take on
> it is that for some reason, the kernel is not choosing to promptly
> report closure of the first connection to the first backend.  Normally
> it doesn't matter much if a backend is slow to exit ... but it does if
> you wanna drop its database.

FYI, I've "solved" the problem by adding a 1ms sleep before issuing the DROP
DATABASE command. Yuck!

It is obviously a race condition, though.

Would I be right in guessing (from your comments) that if PQexec issues a
command that does not return a result set, that it executes it
asynchronously? In other words, after verifying that there are no error
conditions (like dropping a table that doesn't exist), it starts off a
back-end thread (or perhaps connects to a thread from a pool) that actually
performs the operation.

That's the only way that I could see this could fail -- because my code
can't issue a CREATE DATABASE until after the PQexec("DROP DATABASE") has
returned.

If this is indeed the case, I would expect the problem to pop up again if I
were dropping a large table with a lot of data; because that would likely
take longer than my "sleep" time.

> My recommendation would be not to connect to the victim database at all.
> Connect to template1, then try to issue a DROP DATABASE.  You don't much
> care if that fails.  Then issue CREATE DATABASE.

Unfortunately, since I'm working on a generic C++ wrapper for PGsql, I do
care if it fails. I have no way to distinguish between a DROP DATABASE
failing because the table doesn't exist vs. failing because of some valid
error condition, like another process using the table (other than perhaps
the very undesirable approach of parsing out the text of PQerrorMessage).

> > I consider it a reasonable expectation to connect to the PGsql server
> > without opening any database, and being able to perform some basic
> > operations that do not rely on having an open database (CREATE
> USER, ALTER
> > USER, DROP USER, CREATE DATABASE, DROP DATABASE, etc).
>
> There is no such thing as a "connection that doesn't open any database",
> and I do not think there could be.  The convention is to connect to
> template1, which is supposed to be there.  We could perhaps put stronger
> restrictions on what you could do to template1, to positively ensure
> that template1 is always available to connect to --- but that wouldn't
> really solve the problem you discuss above.

The admin guide 4.1.1 states, "for example, one could drop template1 and
recreate it from template0 without any ill effects". [4.1, on the other
hand, states "this database is called template1 and cannot be deleted".]

I guess I had misread the first statement as meaning "you can drop the
template1 database and have a cluster that doesn't contain a database called
template1"; but this is apparently not a valid interpretation, since PGsql
seems to use template1 for many internal operations.

I don't understand the internals of PGsql - not sure I understand your
comment that "I don't think there could be a connection that doesn't open
any database". Perhaps that is because PGsql internally manages information
about users and databases within some internal database tables.

Nonetheless, from an application perspective (that is, regardless of the
internal operation of PGsql), creating users and databases have nothing to
do with a specific database. A user is at the connection level, not the
database level. Please correct me if I'm wrong, but I don't think it's
possible to create a user that applies only to one database in the system.
So, as a programmer using the system, I think it's undesirable to deal with
template1 for these kinds of operations, even if that's what PGsql does
internally.

Thanks again for your response,
Dietmar



pgsql-interfaces by date:

Previous
From: Tom Lane
Date:
Subject: Re: Race condition with libpq
Next
From: "Dietmar May"
Date:
Subject: Re: Race condition with libpq