Re: [PATCH] lock_timeout and common SIGALRM framework - Mailing list pgsql-hackers

From Marc Cousin
Subject Re: [PATCH] lock_timeout and common SIGALRM framework
Date
Msg-id 1335186512.32450.17.camel@marco-dalibo
Whole thread Raw
In response to Re: [PATCH] lock_timeout and common SIGALRM framework  (Boszormenyi Zoltan <zb@cybertec.at>)
Responses Re: [PATCH] lock_timeout and common SIGALRM framework
List pgsql-hackers
On Mon, 2012-04-23 at 10:53 +0200, Boszormenyi Zoltan wrote:
> 2012-04-10 09:02 keltezéssel, Boszormenyi Zoltan írta:
> > 2012-04-06 14:47 keltezéssel, Cousin Marc írta:
> >> On 05/04/12 08:02, Boszormenyi Zoltan wrote:
> >>> 2012-04-04 21:30 keltezéssel, Alvaro Herrera írta:
> >>>> I think this patch is doing two things: first touching infrastructure
> >>>> stuff and then adding lock_timeout on top of that.  Would it work to
> >>>> split the patch in two pieces?
> >>>>
> >>> Sure. Attached is the split version.
> >>>
> >>> Best regards,
> >>> Zoltán Böszörményi
> >>>
> >> Hi,
> >>
> >> I've started looking at and testing both patches.
> >>
> >> Technically speaking, I think the source looks much better than the
> >> first version of lock timeout, and may help adding other timeouts in the
> >> future. I haven't tested it in depth though, because I encountered the
> >> following problem:
> >>
> >> While testing the patch, I found a way to crash PG. But what's weird is
> >> that it crashes also with an unpatched git version.
> >>
> >> Here is the way to reproduce it (I have done it with a pgbench schema):
> >>
> >> - Set a small statement_timeout (just to save time during the tests)
> >>
> >> Session1:
> >> =#BEGIN;
> >> =#lock TABLE pgbench_accounts ;
> >>
> >> Session 2:
> >> =#BEGIN;
> >> =#lock TABLE pgbench_accounts ;
> >> ERROR:  canceling statement due to statement timeout
> >> =# lock TABLE pgbench_accounts ;
> >>
> >> I'm using \set ON_ERROR_ROLLBACK INTERACTIVE by the way. It can also be
> >> done with a rollback to savepoint of course.
> >>
> >> Session 2 crashes with this : TRAP : FailedAssertion(«
> >> !(locallock->holdsStrongLockCount == 0) », fichier : « lock.c », ligne :
> >> 749).
> >>
> >> It can also be done without a statement_timeout, and a control-C on the
> >> second lock table.
> >>
> >> I didn't touch anything but this. It occurs everytime, when asserts are
> >> activated.
> >>
> >> I tried it on 9.1.3, and I couldn't make it crash with the same sequence
> >> of events. So maybe it's something introduced since ? Or is the assert
> >> still valid ?
> >>
> >> Cheers
> >>
> >
> > Attached are the new patches. I rebased them to current GIT and
> > they are expected to be applied after Robert Haas' patch in the
> > "bug in fast-path locking" thread.
> >
> > Now it survives the above scenario.
> >
> > Best regards,
> > Zoltán Böszörményi
> 
> New patch attached, rebased to today's GIT.
> 
> Best regards,
> Zoltán Böszörményi
> 

Ok, I've done what was missing from the review (from when I had a bug in
locking the other day), so here is the full review. By the way, this
patch doesn't belong to current commitfest, but to the next one.



Is the patch in context diff format?
Yes

Does it apply cleanly to the current git master?
Yes

Does it include reasonable tests, necessary doc patches, etc?
The new lock_timeout GUC is documented. There are regression tests.

Read what the patch is supposed to do, and consider:
Does the patch actually implement that?
Yes

Do we want that?
I do. Mostly for administrative jobs which could lock the whole
application. It would be much easier to run reindexes, vacuum full, etc…
without worrying about bringing application down because of lock
contention.

Do we already have it?
No.

Does it follow SQL spec, or the community-agreed behavior?
I don't know if there is a consensus on this new GUC. statement_timeout
is obviously not in the SQL spec.

Does it include pg_dump support (if applicable)?
Not applicable

Are there dangers?
Yes, as it rewrites all the timeout code. I feel it is much cleaner this
way, as there is a generic set of function managing all sigalarm code,
but it heavily touches this part.

Have all the bases been covered?
I tried all sql statements I could think of (select, insert, update,
delete, truncate, drop, create index, adding constraint, lock.

I tried having statement_timeout, lock_timeout and deadlock_timeout at
very short and close or equal values. It worked too.

Rollback to savepoint while holding locks dont crash PostgreSQL anymore.

Other timeouts such as archive_timeout and checkpoint_timeout still
work.

Does the feature work as advertised?
Yes

Are there corner cases the author has failed to consider?
I didn't find any.

Are there any assertion failures or crashes?
No.

Does the patch slow down simple tests?
No

If it claims to improve performance, does it?
Not applicable

Does it slow down other things?
No

Does it follow the project coding guidelines?
I think so

Are there portability issues?
No, all the portable code (acquiring locks and manipulating sigalarm) is
the same as before.

Will it work on Windows/BSD etc?
It should. I couldn't test it though.

Are the comments sufficient and accurate?
Yes

Does it do what it says, correctly?
Yes

Does it produce compiler warnings?
No

Can you make it crash?
Not anymore

Is everything done in a way that fits together coherently with other
features/modules?
Yes, I think so. The new way of handling sigalarm seems more robust to
me.

Are there interdependencies that can cause problems?
I don't see any.

Regards,

Marc





pgsql-hackers by date:

Previous
From: Boszormenyi Zoltan
Date:
Subject: Re: [PATCH] lock_timeout and common SIGALRM framework
Next
From: Sandro Santilli
Date:
Subject: Re: Gsoc2012 idea, tablesample