Re: SELECT ... FOR UPDATE [WAIT integer | NOWAIT] for 8.5 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: SELECT ... FOR UPDATE [WAIT integer | NOWAIT] for 8.5
Date
Msg-id 603c8f070909271031m4e838366nb8a48689a3d8c3df@mail.gmail.com
Whole thread Raw
In response to Re: SELECT ... FOR UPDATE [WAIT integer | NOWAIT] for 8.5  (Boszormenyi Zoltan <zb@cybertec.at>)
Responses Re: SELECT ... FOR UPDATE [WAIT integer | NOWAIT] for 8.5
Re: SELECT ... FOR UPDATE [WAIT integer | NOWAIT] for 8.5
List pgsql-hackers
On Mon, Sep 21, 2009 at 6:07 AM, Boszormenyi Zoltan <zb@cybertec.at> wrote:
> Jeff Janes írta:
>> On Thu, Sep 3, 2009 at 6:47 AM, Boszormenyi Zoltan <zb@cybertec.at
>> <mailto:zb@cybertec.at>> wrote:
>>
>>     Boszormenyi Zoltan írta:
>>     > Alvaro Herrera írta:
>>     >
>>     >> Boszormenyi Zoltan wrote:
>>     >>
>>     >>
>>     >>
>>     >>> The vague consensus for syntax options was that the GUC
>>     >>> 'lock_timeout' and WAIT [N] extension (wherever NOWAIT
>>     >>> is allowed) both should be implemented.
>>     >>>
>>     >>> Behaviour would be that N seconds timeout should be
>>     >>> applied to every lock that the statement would take.
>>     >>>
>>     >>>
>>     >> In
>>     http://archives.postgresql.org/message-id/291.1242053201@sss.pgh.pa.us
>>     >> Tom argues that lock_timeout should be sufficient.  I'm not
>>     sure what
>>     >> does WAIT [N] buy
>>
>>
>> I disagree with Tom on this point.  *If* I was trying to implement  a
>> server policy, then sure, it should not be done by embedding the
>> timeout in the SQL statement.  But I don't think they want this to
>> implement a server policy.  (And if we do, why would we thump the poor
>> victims that are waiting on the lock, rather than the rogue who
>> decided to take a lock and then camp out on it?)  The use case for
>> WAIT [N] is not a server policy, but a UI policy.  I have two ways to
>> do this task.  The preferred way needs to lock a row, but waiting for
>> it may take too long.  So if I can't get the lock within a reasonable
>> time, I fall back on a less-preferred but still acceptable way of
>> doing the task, one that doesn't need the lock.  If we move to a new
>> server, the appropriate value for the time out does not change,
>> because the appropriate level is the concern of the UI and the end
>> users, not the database server.  This wouldn't be scattered all over
>> the application, either.  In my experience, if you have an application
>> that could benefit from this, you might have 1 or 2 uses for WAIT [N]
>> out of 1,000+ statements in the application.  (From my perspective, if
>> there were to be a WAIT [N] option, it could plug into the
>> statement_timeout mechanism rather than the proposed lock_timeout
>> mechanism.)
>>
>> I think that if the use case for a GUC is to set it, run a single very
>> specific statement, and then unset it, that is pretty clear evidence
>> that this should not be a GUC in the first place.
>>
>> Maybe I am biased in this because I am primarily thinking about how I
>> would use such a feature, rather than how Hans-Juergen intends to use
>> it, and maybe those uses differ.  Hans-Juergen, could you describe
>> your use case a little bit more?   Who do is going to be getting these
>> time-out errors, the queries run by the web-app, or longer running
>> back-office queries?  And when they do get an error, what will they do
>> about it?
>
> Our use case is to port a huge set of Informix apps,
> that use SET LOCK MODE TO WAIT N;
> Apparently Tom Lane was on the opinion that
> PostgreSQL won't need anything more in that regard.
>
> In case the app gets an error, the query (transaction)
> can be retried, the "when" can be user controlled.
>
> I tried to argue on the SELECT ... WAIT N part as well,
> but for our purposes currently the GUC is enough.
>
>>     > Okay, we implemented only the lock_timeout GUC.
>>     > Patch attached, hopefully in an acceptable form.
>>     > Documentation included in the patch, lock_timeout
>>     > works the same way as statement_timeout, takes
>>     > value in milliseconds and 0 disables the timeout.
>>     >
>>     > Best regards,
>>     > Zoltán Böszörményi
>>     >
>>
>>     New patch attached. It's only regenerated for current CVS
>>     so it should apply cleanly.
>>
>>
>>
>> In addition to the previously mentioned seg-fault issues when
>> attempting to use this feature (confirmed in another machine, linux,
>> 64 bit, and --enable-cassert does not offer any help), I have some
>> more concerns about the patch.  From the docs:
>>
>> doc/src/sgml/config.sgml
>>
>>         Abort any statement that tries to lock any rows or tables and
>> the lock
>>         has to wait more than the specified number of milliseconds,
>> starting
>>         from the time the command arrives at the server from the client.
>>         If <varname>log_min_error_statement</> is set to
>> <literal>ERROR</> or
>>         lower, the statement that timed out will also be logged.
>>         A value of zero (the default) turns off the limitation.
>>
>> This suggests that all row locks will have this behavior.  However, my
>> experiments show that row locks attempted to be taken for ordinary
>> UPDATE commands do not time out.  If this is only intended to apply to
>> SELECT .... FOR UPDATE, that should be documented here.  It is
>> documented elsewhere that this applies to SELECT...FOR UPDATE, but it
>> is not documented that this the only row-locks it applies to.
>>
>> "from the time the command arrives at the server".  I am pretty sure
>> this is not the desired behavior, otherwise how does it differ from
>> statement_timeout?  I think it must be a copy and paste error for the doc.
>>
>>
>> For the implementation, I think the patch touches too much code.  In
>> particular, lwlock.c.  Is the time spent waiting on ProcArrayLock
>> significant enough that it needs all of that code to support timing it
>> out?  I don't think it should ever take more than a few microseconds
>> to obtain that light-weight lock.  And if we do want to time all of
>> the light weight access, shouldn't those times be summed up, rather
>> than timing out only if any single one of them exceeds the threshold
>> in isolation?  (That is my interpretation of how the code works
>> currently, I could be wrong on that.)
>
> You seem to be right, it may not be needed.
> The only callsite is ProcSleep() in storage/lmgr/proc.c
> and PGSemaphoreTimedLock() was already waited on.
> Thanks for the review.
>
>>
>> If the seg-faults are fixed, I am still skeptical that this patch is
>> acceptable, because the problem it solves seems to be poorly or
>> incompletely specified.

So there are a couple of problems with this patch:

1. Do we want it at all?
2. Do we want it as a GUC or dedicated syntax?
3. Seg faults are bad.

As to #1, personally, I think it's quite useful.  The arguments that
have been made that lock_timeout is redundant with statement_timeout
don't seem to me to have much merit.  If I have a low-priority
maintenance operation that runs in the background, it's perfectly
reasonable for me to want it to die if it spends too long waiting on a
lock.  But to simulate that behavior with statement timeout, I have to
benchmark every statement and then set the statement timeout for that
statement individually, and it's still not really going to do what I
want.  The suggestion that these two are the same strikes me as akin
to telling someone they don't need a scalpel because they already have
a perfectly good hammer.

In futher support of this position, I note that Microsoft SQL Server,
Oracle, and DB2 all have this feature.  AFAICT from a quick Google
Search, MySQL does not.

As to #2, I was initially thinking dedicated syntax would be better
because I hate "SET guc = value; do thing; SET guc = previous_value;".But now I'm realizing that there's every reason
tosuppose that 
SELECT FOR UPDATE will not be the only case where we want to do this -
so I think a GUC is the only reasonable choice.  But that having been
said, I think some kind of syntax to set a GUC for just one statement
would be way useful, per discussions downthread.  However, that seems
like it can and should be a separate pach.

As to #3, that's obviously gotta be fixed.  If we're to further
consider this patch for this CommitFest, that fixing needs to happen
pretty soon.

...Robert


pgsql-hackers by date:

Previous
From: Petr Jelinek
Date:
Subject: Re: GRANT ON ALL IN schema
Next
From: Robert Haas
Date:
Subject: Re: Issues for named/mixed function notation patch