Re: SELECT FOR UPDATE differs inside and outside a pl/pgsql - Mailing list pgsql-bugs

From Mark Shewmaker
Subject Re: SELECT FOR UPDATE differs inside and outside a pl/pgsql
Date
Msg-id 1071703211.2082.225.camel@k9
Whole thread Raw
In response to Re: SELECT FOR UPDATE differs inside and outside a pl/pgsql function (7.4)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: SELECT FOR UPDATE differs inside and outside a pl/pgsql
List pgsql-bugs
On Wed, 2003-12-17 at 14:02, Tom Lane wrote:
> Mark Shewmaker <mark@primefactor.com> writes:
> > In other words:  Is this a bug or a user misunderstanding:
>
> You've got the function doing
>
> >     LOOP
> >       select * into myrow from mytable limit 1 for update;
> >       if found then exit;
> >       end if;
> >     END LOOP;
>
> which means it will loop infinitely if the SELECT finds no row.
>
> Because "LIMIT 1" executes before "FOR UPDATE" does, it is possible that
> the first row returned by the query is rejected by FOR UPDATE (in this
> case because it was deleted by an already-committed transaction), and
> then the query will not examine any more rows because of the LIMIT.
>
> I think you could get the behavior you want with
>
>       select * into myrow from mytable for update;
>
> that is, just rely on the implicit LIMIT 1 associated with SELECT INTO
> rather than writing one in the query.  Then you will get the first row
> that passes the FOR UPDATE restriction, which I think is what you're
> after.

You're right:  Changing the loop in myfunction from the above to use
"select * into myrow from mytable for update;" results in the behavior I
was after.

What's more, the the whole loop can be replaced with that single "select
* into myrow from mytable for update;", and it works just perfectly!

(I think you may have been implying that.)

If the first transaction holds the first available row locked, a second
transaction running this function just waits.  When the first
transaction deletes the row and then commits, the second transaction
suddenly selects the next available row and goes on its merry way.

All very excellent!

> There has been some argument about whether FOR UPDATE shouldn't execute
> before LIMIT, but IIRC there are nasty corner cases in that case too.
> So it's hard to say if this should be considered a bug or not.

Okay.  Well, I wouldn't want to suggest anything that would cause more
problems, especially given that there's a simple and effective
solution/workaround.  ("Workaround" being a bit strong if this isn't
even clearly a bug.)

However, your answer does leave some other questions open as well as
opening up more:

1.  There's still the original outstanding oddity that there was a
    *difference* in the first place between the results from a "select
    limit 1 for update" command run directly within pgsql and a
    "select into variable limit 1 for update" run within a plpgsql
    function that's run directly within pgsql.

    The original function never could lock a row, whereas the command
    run directly could obtain a lock on the second try:

        Infinite loop in function:  Never locks a row.
        First attempt run directly:  No row locked.
        Second attempt run directly:  One row successfully locked.

    If a "FOR UPDATE executes before LIMIT" rule stopped the function
    from ever locking a row, it's still curious why didn't it stop the
    direct command from ever locking a row as well.  That still looks
    troublingly inconsistent.  (Even though it's easy to get the desired
    result in either case admittedly.)

2.  There's now a difference between the suggested "select * into
    myrow from mytable for update;" run within a function, with its
    JUST GREAT behavior, and the original "select * from mytable
    limit 1 for update;":

       New command run from function, first try:  One row successfully
                                                  locked.
       Command run directly, first try:  No row locked.
       Command run directly, second try:  One row locked successfully.

    I suppose my main question here is:  Can I rely on this unexpected
    (by me) and delightful feature of the suggested "select * into
    myrow from mytable for update;" to not require multiple runs in
    this sort of situation?

3.  If there are nasty deadlocks in FOR UPDATE running before LIMIT,
    and there are these slight inconsistencies in how select-for-update
    works depending on where the command is run, I suppose I wonder
    if some of the inconsistencies can trigger some of these problems.

    (Very vague, I know.  I don't have any problems to report or ask
    about related to this, but.. figured I should bring it up in case
    it causes someone to suddenly realize or think of where a real
    deadlock might be.)

4.  As an aside, since another way to directly solve the problem would
    be a way to only select rows that aren't locked, are there any
    thoughts on having that sort of functionality in a future revision?

    (From looking things up online, Oracle 8.1 and onward has something
    like this called "Skip Locked" (with "NOWAIT" being something close
    to the opposite), and microsoft sql-server has something like it
    referred to as "read past".)

Thanks,

 -Mark Shewmaker
  mark at primefactor dot com

pgsql-bugs by date:

Previous
From: "PostgreSQL Bugs List"
Date:
Subject: BUG #1012: missing server/*.h on suse devel rpm package
Next
From: Tom Lane
Date:
Subject: Re: SELECT FOR UPDATE differs inside and outside a pl/pgsql