Re: plpgsql; execute query inside exists - Mailing list pgsql-general

From Alban Hertroys
Subject Re: plpgsql; execute query inside exists
Date
Msg-id CAF-3MvPcDSYMerp6Zo1=gfu2i3x8gR-=f9aarFSe9ymHVzj66Q@mail.gmail.com
Whole thread Raw
In response to Re: plpgsql; execute query inside exists  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: plpgsql; execute query inside exists  (Merlin Moncure <mmoncure@gmail.com>)
List pgsql-general
On 17 October 2011 16:24, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Mon, Oct 17, 2011 at 8:44 AM, Alban Hertroys <haramrae@gmail.com> wrote:
>> On 17 October 2011 15:20, Merlin Moncure <mmoncure@gmail.com> wrote:
>>> A better way to do this is to query information_schema:
>>>
>>> PERFORM 1 FROM information_schema.tables where schema_name = x and
>>> table_name = y;
>>>
>>> IF FOUND THEN
>>>  CREATE TABLE ...
>>> END IF;
>>>
>>> (there is a race condition in the above code -- do you see it? if
>>> concurrent access to this function is an issue, you have to LOCK an
>>> object before running the PERFORM or perhaps use an advisory lock).
>>
>> Is there? You'd think that with transactional DDL and the code running
>> in a single transaction (namely inside a stored function) it would be
>> concurrency-safe.
>
> Transactional DDL does not protect you from race conditions any more
> than MVCC protects you from race conditions in regular DML.  What
> transactional DDL does is roll back the changes in the event of an
> error so you don't have half written schema changes in your database.
> MVCC gives a rigorous definition of visibility rules and transactions
> guarantee only a complete unit of work getting committed to the
> database.  You still have to code defensively against multi-user
> access however.  The good news is that multi user coding is about an
> order of magnitude easier in sql (especially postgres variant) than in
> any other development platform that I'm aware of.
>
> The race condition here is basically the same problem that affects
> 'upsert' patterns:
>
> test record(s) if found update if not found insert;
>
> The problem comes that in between the test and the insert case someone
> else can also test and get the insert in before you do.  You have two
> general strategies to get around this: locking and retry.  I greatly
> advise going the locking route unless your concurrency requirements
> are very high.  It's much simpler, and since you're not invoking a
> subtransaction, faster in the uncontested case.

So what would happen if you don't lock? I think it's this:

 Session A        | Session B
----------------+-----------------
 SELECT x        | SELECT x
 NOT FOUND        | NOT FOUND
   ...            | CREATE TABLE
 CREATE TABLE    | <commit>
 <error>        |
 <rollback>        |

If I understand correctly, if you don't mind the error and the
subsequent rollback in Session A, than there's not much need to lock,
or is there? It is important to be aware of the possible rollback of
such a transaction, of course.

And what would you lock? A record in information_schema.tables? That's
a read-only view. A table that doesn't exist yet? Can't do. A record
in the pg_ schema? Rather not...

I suppose you could work around that problem by keeping track of your
own tables that were generated using aforementioned plpgsql function.
Then you have a table (that you own) with records to lock.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

pgsql-general by date:

Previous
From: Evan Walter
Date:
Subject: Re: [ADMIN] Trying to use binary replication - from tutorial
Next
From: Alban Hertroys
Date:
Subject: Re: 9.1 got really fast ;)