Re: [PATCH] lock_timeout and common SIGALRM framework - Mailing list pgsql-hackers

From Boszormenyi Zoltan
Subject Re: [PATCH] lock_timeout and common SIGALRM framework
Date
Msg-id 4F964F94.7070407@cybertec.at
Whole thread Raw
In response to Re: [PATCH] lock_timeout and common SIGALRM framework  (Marc Cousin <cousinmarc@gmail.com>)
Responses Re: [PATCH] lock_timeout and common SIGALRM framework
List pgsql-hackers
2012-04-23 15:08 keltezéssel, Marc Cousin írta:
> On Mon, 2012-04-23 at 10:53 +0200, Boszormenyi Zoltan wrote:
>> 2012-04-10 09:02 keltezéssel, Boszormenyi Zoltan írta:
>>> 2012-04-06 14:47 keltezéssel, Cousin Marc írta:
>>>> On 05/04/12 08:02, Boszormenyi Zoltan wrote:
>>>>> 2012-04-04 21:30 keltezéssel, Alvaro Herrera írta:
>>>>>> I think this patch is doing two things: first touching infrastructure
>>>>>> stuff and then adding lock_timeout on top of that.  Would it work to
>>>>>> split the patch in two pieces?
>>>>>>
>>>>> Sure. Attached is the split version.
>>>>>
>>>>> Best regards,
>>>>> Zoltán Böszörményi
>>>>>
>>>> Hi,
>>>>
>>>> I've started looking at and testing both patches.
>>>>
>>>> Technically speaking, I think the source looks much better than the
>>>> first version of lock timeout, and may help adding other timeouts in the
>>>> future. I haven't tested it in depth though, because I encountered the
>>>> following problem:
>>>>
>>>> While testing the patch, I found a way to crash PG. But what's weird is
>>>> that it crashes also with an unpatched git version.
>>>>
>>>> Here is the way to reproduce it (I have done it with a pgbench schema):
>>>>
>>>> - Set a small statement_timeout (just to save time during the tests)
>>>>
>>>> Session1:
>>>> =#BEGIN;
>>>> =#lock TABLE pgbench_accounts ;
>>>>
>>>> Session 2:
>>>> =#BEGIN;
>>>> =#lock TABLE pgbench_accounts ;
>>>> ERROR:  canceling statement due to statement timeout
>>>> =# lock TABLE pgbench_accounts ;
>>>>
>>>> I'm using \set ON_ERROR_ROLLBACK INTERACTIVE by the way. It can also be
>>>> done with a rollback to savepoint of course.
>>>>
>>>> Session 2 crashes with this : TRAP : FailedAssertion(«
>>>> !(locallock->holdsStrongLockCount == 0) », fichier : « lock.c », ligne :
>>>> 749).
>>>>
>>>> It can also be done without a statement_timeout, and a control-C on the
>>>> second lock table.
>>>>
>>>> I didn't touch anything but this. It occurs everytime, when asserts are
>>>> activated.
>>>>
>>>> I tried it on 9.1.3, and I couldn't make it crash with the same sequence
>>>> of events. So maybe it's something introduced since ? Or is the assert
>>>> still valid ?
>>>>
>>>> Cheers
>>>>
>>> Attached are the new patches. I rebased them to current GIT and
>>> they are expected to be applied after Robert Haas' patch in the
>>> "bug in fast-path locking" thread.
>>>
>>> Now it survives the above scenario.
>>>
>>> Best regards,
>>> Zoltán Böszörményi
>> New patch attached, rebased to today's GIT.
>>
>> Best regards,
>> Zoltán Böszörményi
>>
> Ok, I've done what was missing from the review (from when I had a bug in
> locking the other day), so here is the full review. By the way, this
> patch doesn't belong to current commitfest, but to the next one.

It was added to 2012-Next when I posted it, 2012-01 was already
closed for new additions.

> Is the patch in context diff format?
> Yes
>
> Does it apply cleanly to the current git master?
> Yes
>
> Does it include reasonable tests, necessary doc patches, etc?
> The new lock_timeout GUC is documented. There are regression tests.
>
> Read what the patch is supposed to do, and consider:
> Does the patch actually implement that?
> Yes
>
> Do we want that?
> I do. Mostly for administrative jobs which could lock the whole
> application. It would be much easier to run reindexes, vacuum full, etc…
> without worrying about bringing application down because of lock
> contention.
>
> Do we already have it?
> No.
>
> Does it follow SQL spec, or the community-agreed behavior?
> I don't know if there is a consensus on this new GUC. statement_timeout
> is obviously not in the SQL spec.
>
> Does it include pg_dump support (if applicable)?
> Not applicable
>
> Are there dangers?
> Yes, as it rewrites all the timeout code. I feel it is much cleaner this
> way, as there is a generic set of function managing all sigalarm code,
> but it heavily touches this part.
>
> Have all the bases been covered?
> I tried all sql statements I could think of (select, insert, update,
> delete, truncate, drop, create index, adding constraint, lock.
>
> I tried having statement_timeout, lock_timeout and deadlock_timeout at
> very short and close or equal values. It worked too.
>
> Rollback to savepoint while holding locks dont crash PostgreSQL anymore.
>
> Other timeouts such as archive_timeout and checkpoint_timeout still
> work.
>
> Does the feature work as advertised?
> Yes
>
> Are there corner cases the author has failed to consider?
> I didn't find any.
>
> Are there any assertion failures or crashes?
> No.
>
> Does the patch slow down simple tests?
> No
>
> If it claims to improve performance, does it?
> Not applicable
>
> Does it slow down other things?
> No
>
> Does it follow the project coding guidelines?
> I think so
>
> Are there portability issues?
> No, all the portable code (acquiring locks and manipulating sigalarm) is
> the same as before.
>
> Will it work on Windows/BSD etc?
> It should. I couldn't test it though.
>
> Are the comments sufficient and accurate?
> Yes
>
> Does it do what it says, correctly?
> Yes
>
> Does it produce compiler warnings?
> No
>
> Can you make it crash?
> Not anymore
>
> Is everything done in a way that fits together coherently with other
> features/modules?
> Yes, I think so. The new way of handling sigalarm seems more robust to
> me.
>
> Are there interdependencies that can cause problems?
> I don't see any.
>
> Regards,
>
> Marc

Thanks for the review.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig&  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de     http://www.postgresql.at/



pgsql-hackers by date:

Previous
From: Jaime Casanova
Date:
Subject: Re: DOMAINs and CASTs
Next
From: Nikhil Sontakke
Date:
Subject: Re: B-tree page deletion boundary cases