Re: Strange Windows problem, lock_timeout test request - Mailing list pgsql-hackers

From Boszormenyi Zoltan
Subject Re: Strange Windows problem, lock_timeout test request
Date
Msg-id 514496DF.6070706@cybertec.at
Whole thread Raw
In response to Re: Strange Windows problem, lock_timeout test request  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Strange Windows problem, lock_timeout test request  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
2013-03-15 18:53 keltezéssel, Tom Lane írta:
> Boszormenyi Zoltan<zb@cybertec.at>  writes:
>> [ 2-lock_timeout-v33.patch ]
> I looked at this patch a bit.  I don't understand why you've chosen to
> alter the API of the enable_timeout variants to have a bool result that
> says "I didn't bother to process the timeout because it would have fired
> immediately".

I don't know either...

>    That is not an advantage for any call site that I can
> see: it just means that the caller needs an extra, very difficult to
> test code path to handle what would normally be handled by a timeout
> interrupt.  Even if it were a good API choice, you've broken a number of
> existing call sites that the patch fails to touch (eg in postmaster.c
> and postgres.c).  It's not acceptable to define a failure return from
> enable_timeout_after and then let callers assume that the failure can't
> happen.
>
> Please undo that.

Done.

> Also, I'm not really enamored of the choice to use List* infrastructure
> for enable_timeouts().  That doesn't appear to be especially convenient
> for any caller, and what it does do is create memory-leak risks for most
> of them (if they forget to free the list cells, perhaps as a result of
> an error exit).  I think a local-variable array of TimeoutParams[]
> structs would serve better for most use-cases.

Changed. However, the first member of the structure is
"TimeoutId id" and a sensible end-of-array value can be -1.
Some versions of GCC (maybe other compilers, too) complain
if a constant is assigned to an enum which is outside of the
declared values of the enum. So I added a "NO_TIMEOUT = -1"
to enum TimeoutId. Comments?

> Another thought is that the PGSemaphoreTimedLock implementations all
> share the same bug, which is that if the "condition" callback returns
> true immediately after we acquire the semaphore, they will all wrongly
> return false indicating that the semaphore wasn't acquired.

You are right. Fixed.

>    (BTW,
> I do not like either the variable name "condition" or the typedef name
> "TimeoutCondition" for something that's actually a callback function
> pointer.)

How about "TimeoutCallback" and "callback_fn"?

> In the same vein, this comment change:
>
>         * NOTE: Think not to put any shared-state cleanup after the call to
>         * ProcSleep, in either the normal or failure path.  The lock state must
> !      * be fully set by the lock grantor, or by CheckDeadLock if we give up
> !      * waiting for the lock.  This is necessary because of the possibility
> !      * that a cancel/die interrupt will interrupt ProcSleep after someone else
> !      * grants us the lock, but before we've noticed it. Hence, after granting,
> !      * the locktable state must fully reflect the fact that we own the lock;
> !      * we can't do additional work on return.
>         *
>         * We can and do use a PG_TRY block to try to clean up after failure, but
>         * this still has a major limitation: elog(FATAL) can occur while waiting
> --- 1594,1606 ----
>        /*
>         * NOTE: Think not to put any shared-state cleanup after the call to
>         * ProcSleep, in either the normal or failure path.  The lock state must
> !      * be fully set by the lock grantor, or by CheckDeadLock or by ProcSleep
> !      * itself in case a timeout is detected or if we give up waiting for the lock.
> !      * This is necessary because of the possibility that a cancel/die interrupt
> !      * will interrupt ProcSleep after someone else grants us the lock, but
> !      * before we've noticed it. Hence, after granting, the locktable state must
> !      * fully reflect the fact that we own the lock; we can't do additional work
> !      * on return.
>         *
>
> suggests that you've failed to grasp the fundamental race-condition
> problem here.  ProcSleep can't do cleanup after the fact any more than
> its callers can, because otherwise it has a race condition with some
> other process deciding to grant it the lock at about the same time its
> timeout goes off.  I think the changes in ProcSleep that alter the
> state-cleanup behavior are just plain wrong, and what you need to do
> is make that look more like the existing mechanisms that clean up when
> statement timeout occurs.

This was the most enlightening comment up to now.
It seems no one else understood the timer code but you...
Thanks very much.

I hope the way I did it is right. I factored out the core of
StatementCancelHandler() into a common function that can
be used by the lock_timeout interrupt as its timeout callback
function. Now the code doesn't need PGSemaphoreTimedLock().

While I was thinking about how this thing works, I recognized
that I also need to set the timeout indicator to false after checking
it in ProcessInterrupts().

The reason is that it's needed is this scenario:
1. lock_timeout is set and the transaction throws its error
2. lock_timeout is unset before the next transaction
3. the user presses Ctrl-C during the next transaction
In  this case, the second transaction would report the
lock timeout error, since the indicator was still set.

The other way would be to call disable_all_timeouts(false)
unconditionally in ProcessInterrupts() but setting only the indicator
is faster and it doesn't interfere with unknown registered timeout.
Also, the reason is that it's disable_timeout_indicator() instead of
a generic set_timeout_indicator() is that the generic one may be
abused to produce false positives.

Best regards,
Zoltán Böszörményi

>             regards, tom lane
>
>


--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web:http://www.postgresql-support.de
      http://www.postgresql.at/


Attachment

pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Support for REINDEX CONCURRENTLY
Next
From: Tom Lane
Date:
Subject: Re: Strange Windows problem, lock_timeout test request