Re: [PATCH] lock_timeout and common SIGALRM framework - Mailing list pgsql-hackers
From | Marc Cousin |
---|---|
Subject | Re: [PATCH] lock_timeout and common SIGALRM framework |
Date | |
Msg-id | 1335186512.32450.17.camel@marco-dalibo Whole thread Raw |
In response to | Re: [PATCH] lock_timeout and common SIGALRM framework (Boszormenyi Zoltan <zb@cybertec.at>) |
Responses |
Re: [PATCH] lock_timeout and common SIGALRM framework
|
List | pgsql-hackers |
On Mon, 2012-04-23 at 10:53 +0200, Boszormenyi Zoltan wrote: > 2012-04-10 09:02 keltezéssel, Boszormenyi Zoltan írta: > > 2012-04-06 14:47 keltezéssel, Cousin Marc írta: > >> On 05/04/12 08:02, Boszormenyi Zoltan wrote: > >>> 2012-04-04 21:30 keltezéssel, Alvaro Herrera írta: > >>>> I think this patch is doing two things: first touching infrastructure > >>>> stuff and then adding lock_timeout on top of that. Would it work to > >>>> split the patch in two pieces? > >>>> > >>> Sure. Attached is the split version. > >>> > >>> Best regards, > >>> Zoltán Böszörményi > >>> > >> Hi, > >> > >> I've started looking at and testing both patches. > >> > >> Technically speaking, I think the source looks much better than the > >> first version of lock timeout, and may help adding other timeouts in the > >> future. I haven't tested it in depth though, because I encountered the > >> following problem: > >> > >> While testing the patch, I found a way to crash PG. But what's weird is > >> that it crashes also with an unpatched git version. > >> > >> Here is the way to reproduce it (I have done it with a pgbench schema): > >> > >> - Set a small statement_timeout (just to save time during the tests) > >> > >> Session1: > >> =#BEGIN; > >> =#lock TABLE pgbench_accounts ; > >> > >> Session 2: > >> =#BEGIN; > >> =#lock TABLE pgbench_accounts ; > >> ERROR: canceling statement due to statement timeout > >> =# lock TABLE pgbench_accounts ; > >> > >> I'm using \set ON_ERROR_ROLLBACK INTERACTIVE by the way. It can also be > >> done with a rollback to savepoint of course. > >> > >> Session 2 crashes with this : TRAP : FailedAssertion(« > >> !(locallock->holdsStrongLockCount == 0) », fichier : « lock.c », ligne : > >> 749). > >> > >> It can also be done without a statement_timeout, and a control-C on the > >> second lock table. > >> > >> I didn't touch anything but this. It occurs everytime, when asserts are > >> activated. > >> > >> I tried it on 9.1.3, and I couldn't make it crash with the same sequence > >> of events. So maybe it's something introduced since ? Or is the assert > >> still valid ? > >> > >> Cheers > >> > > > > Attached are the new patches. I rebased them to current GIT and > > they are expected to be applied after Robert Haas' patch in the > > "bug in fast-path locking" thread. > > > > Now it survives the above scenario. > > > > Best regards, > > Zoltán Böszörményi > > New patch attached, rebased to today's GIT. > > Best regards, > Zoltán Böszörményi > Ok, I've done what was missing from the review (from when I had a bug in locking the other day), so here is the full review. By the way, this patch doesn't belong to current commitfest, but to the next one. Is the patch in context diff format? Yes Does it apply cleanly to the current git master? Yes Does it include reasonable tests, necessary doc patches, etc? The new lock_timeout GUC is documented. There are regression tests. Read what the patch is supposed to do, and consider: Does the patch actually implement that? Yes Do we want that? I do. Mostly for administrative jobs which could lock the whole application. It would be much easier to run reindexes, vacuum full, etc… without worrying about bringing application down because of lock contention. Do we already have it? No. Does it follow SQL spec, or the community-agreed behavior? I don't know if there is a consensus on this new GUC. statement_timeout is obviously not in the SQL spec. Does it include pg_dump support (if applicable)? Not applicable Are there dangers? Yes, as it rewrites all the timeout code. I feel it is much cleaner this way, as there is a generic set of function managing all sigalarm code, but it heavily touches this part. Have all the bases been covered? I tried all sql statements I could think of (select, insert, update, delete, truncate, drop, create index, adding constraint, lock. I tried having statement_timeout, lock_timeout and deadlock_timeout at very short and close or equal values. It worked too. Rollback to savepoint while holding locks dont crash PostgreSQL anymore. Other timeouts such as archive_timeout and checkpoint_timeout still work. Does the feature work as advertised? Yes Are there corner cases the author has failed to consider? I didn't find any. Are there any assertion failures or crashes? No. Does the patch slow down simple tests? No If it claims to improve performance, does it? Not applicable Does it slow down other things? No Does it follow the project coding guidelines? I think so Are there portability issues? No, all the portable code (acquiring locks and manipulating sigalarm) is the same as before. Will it work on Windows/BSD etc? It should. I couldn't test it though. Are the comments sufficient and accurate? Yes Does it do what it says, correctly? Yes Does it produce compiler warnings? No Can you make it crash? Not anymore Is everything done in a way that fits together coherently with other features/modules? Yes, I think so. The new way of handling sigalarm seems more robust to me. Are there interdependencies that can cause problems? I don't see any. Regards, Marc
pgsql-hackers by date: