Re: [PATCH] lock_timeout and common SIGALRM framework - Mailing list pgsql-hackers

From Boszormenyi Zoltan
Subject Re: [PATCH] lock_timeout and common SIGALRM framework
Date
Msg-id 5060D101.8020908@cybertec.at
Whole thread Raw
In response to Re: [PATCH] lock_timeout and common SIGALRM framework  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [PATCH] lock_timeout and common SIGALRM framework  (Boszormenyi Zoltan <zb@cybertec.at>)
List pgsql-hackers
Hi,

2012-09-22 20:49 keltezéssel, Tom Lane írta:
> Boszormenyi Zoltan <zb@cybertec.at> writes:
>> new version with a lot more cleanup is attached.
> I looked at this patch, and frankly I'm rather dismayed.  It's a mess.

I hope you won't find this one a mess. I tried to address all your complaints.

> [rather long diatribe on modifying PGSemaphoreLock improperly]

I have returned to the previous version that used PGSemaphoreTimedLock
and this time I save and restore errno around calling the timeout condition
checker.

> The whole lmgrtimeout module seems to me to be far more mechanism than is
> warranted, for too little added functionality.  [...]

lmgrtimeout is no more, back to hardcoding things.

>    The minimum thing I would want it to do is avoid calling
> timeout.c multiple times, which is what would happen right now (leading
> to four extra syscalls per lock acquisition, which is enough new overhead
> to constitute a strong objection to committing this patch at all).

I have added enable_multiple_timeouts() and disable_multiple_timeouts()
that minimize the number setitimer() calls.

> There's considerable lack of attention to updating comments, too.
> For instance in WaitOnLock you only bothered to update the comment
> immediately adjacent to the changed code, and not the two comment
> blocks above that, which both have specific references to deadlocks
> being the reason for failure.

I modified the comment in question. I hope the wording is right.

> Also, the "per statement" mode for lock timeout doesn't seem to be
> any such thing, because it's implemented like this:
>
> +            case LOCK_TIMEOUT_PER_STMT:
> +                enable_timeout_at(LOCK_TIMEOUT,
> +                            TimestampTzPlusMilliseconds(
> +                                    GetCurrentStatementStartTimestamp(),
> +                                    LockTimeout));
> +                break;
>
> That doesn't provide anything like "you can spend at most N milliseconds
> waiting for locks during a statement".  What it is is "if you happen to be
> waiting for a lock N milliseconds after the statement starts, or if you
> attempt to acquire any lock more than N milliseconds after the statement
> starts, you lose instantly".  I don't think that definition actually adds
> any useful functionality compared to setting statement_timeout to N
> milliseconds, and it's certainly wrongly documented.  To do what the
> documentation implies would require tracking and adding up the time spent
> waiting for locks during a statement.  Which might be a good thing to do,
> especially if the required gettimeofday() calls could be shared with what
> timeout.c probably has to do anyway at start and stop of a lock wait.
> But this code doesn't do it.

The code now properly accumulates the time spent in waiting for
LOCK_TIMEOUT. This means that if STATEMENT_TIMEOUT and
LOCK_TIMEOUT are set to the same value, STATEMENT_TIMEOUT
will trigger because it considers the time as one contiguous unit,
LOCK_TIMEOUT only accounts the time spent in waiting, not the time
spent with useful work. This means that LOCK_TIMEOUT doesn't
need any special code in its handler function, it's a NOP. The
relation between timeouts is only handled by the timeout.c module.

> Lastly, I'm not sure where is the best place to be adding the control
> logic for this, but I'm pretty sure postinit.c is not it.  It oughta be
> somewhere under storage/lmgr/, no?

The above change means that there is no control logic outside of
storage/lmgr now.

I temporarily abandoned the idea of detailed error reporting on
the object type and name/ID. WaitOnLock() reports "canceling statement
due to lock timeout" and LockAcquire() kept its previous semantics.
This can be quickly revived in case of demand, it would be another ~15K patch.

I hope you can find another time slot in this CF to review this one.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/


Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: External Replication
Next
From: Simon Riggs
Date:
Subject: Re: [PoC] load balancing in libpq