Thread: SELECT ... FOR UPDATE [WAIT integer | NOWAIT] for 8.5
hello everybody,

i would like to propose an extension to our SELECT FOR UPDATE mechanism. especially in web applications it can be extremely useful to have the chance to terminate a lock after a given timeframe. i would like to add this functionality to PostgreSQL 8.5.

the oracle syntax is quite clear and easy to use here:

http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_10002.htm#i2126016

informix should behave pretty much the same way. are there any arguments from hackers' side against this feature?

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Professional PostgreSQL Consulting, Support, Training
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: www.postgresql-support.de
Can't you do this today with statement_timeout? Surely you do want to roll back the whole transaction, or at least the subtransaction if you have error handling.

-- Greg

On 11 May 2009, at 10:26, Hans-Juergen Schoenig <postgres@cybertec.at> wrote:
> hello everybody,
>
> i would like to propose an extension to our SELECT FOR UPDATE
> mechanism.
> especially in web applications it can be extremely useful to have
> the chance to terminate a lock after a given timeframe.
> i would like to add this functionality to PostgreSQL 8.5.
>
> the oracle syntax is quite clear and easy to use here:
>
> http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/statements_10002.htm#i2126016
>
> informix should behave pretty much the same way.
> are there any arguments from hackers' side against this feature?
>
> many thanks,
>
> hans
hello greg,

the thing with statement_timeout is a little bit of an issue. you could do:

SET statement_timeout TO ...;
SELECT FOR UPDATE ...
SET statement_timeout TO default;

this practically means 3 commands. the killer argument, however, is that the lock might very well happen way after the statement has started. imagine something like this (theoretical example):

SELECT ...
FROM
WHERE x > ( SELECT some_very_long_thing)
FOR UPDATE ...;

some operation could run for ages without ever taking a single, relevant lock here. so, you don't really get the same thing with statement_timeout.

regards,

hans

Greg Stark wrote:
> Can't you do this today with statement_timeout? Surely you do want to
> roll back the whole transaction or at least the subtransaction if you
> have error handling.
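To make the distinction above concrete, here is a toy model in Python (not SQL, and not PostgreSQL code). The function names and all the millisecond figures are assumptions invented for illustration; the only point is that a statement-wide limit counts execution time plus lock wait, while a lock-wait limit counts the wait alone.

```python
def statement_timeout_fires(work_ms, lock_wait_ms, limit_ms):
    """A statement-wide limit counts everything: execution plus lock wait."""
    return work_ms + lock_wait_ms > limit_ms


def lock_wait_timeout_fires(work_ms, lock_wait_ms, limit_ms):
    """A lock-wait limit ignores execution time; only the wait counts."""
    return lock_wait_ms > limit_ms


# Hypothetical statement: a 60 s subquery followed by a 10 ms lock wait,
# checked against a 5 s limit.
work_ms, lock_wait_ms, limit_ms = 60_000, 10, 5_000

print(statement_timeout_fires(work_ms, lock_wait_ms, limit_ms))  # True
print(lock_wait_timeout_fires(work_ms, lock_wait_ms, limit_ms))  # False
```

With these made-up numbers the statement-wide limit trips because of the long subquery even though the lock wait was only 10 ms, which is the case the message above objects to.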
2009/5/11 Hans-Juergen Schoenig <postgres@cybertec.at>:
> the thing with statement_timeout is a little bit of an issue.
> you could do:
> SET statement_timeout TO ...;
> SELECT FOR UPDATE ...
> SET statement_timeout TO default;

Why not extend the "SET" instruction to allow configuration parameters to be set only for the duration of the transaction or the next "n" commands?

--
Lucas Brito
On 11 May 2009 06:38:44 -0300, Lucas Brito <lucas75@gmail.com> wrote:
> Why not extend the "SET" instruction to allow configuration parameters to
> be set only for the duration of the transaction or the next "n" commands?

It's already there: see SET LOCAL.

--
Thanks
Bernd
On 11 May 2009, at 11:18, Hans-Juergen Schoenig <postgres@cybertec.at> wrote:
> hello greg,
>
> the thing with statement_timeout is a little bit of an issue.
> you could do:
> SET statement_timeout TO ...;
> SELECT FOR UPDATE ...
> SET statement_timeout TO default;
>
> this practically means 3 commands.

I tend to think there should be protocol-level support for options like this, but that would require buy-in from the interface writers.

> the killer argument, however, is that the lock might very well
> happen way after the statement has started.

Sure. But isn't the statement_timeout behaviour what an application writer would actually want? Why would he care how long some sub-part of the statement took? Isn't an application (you used the example of a web app) really concerned with its response time?

> imagine something like that (theoretical example):
>
> SELECT ...
> FROM
> WHERE x > ( SELECT some_very_long_thing)
> FOR UPDATE ...;
>
> some operation could run for ages without ever taking a single,
> relevant lock here.
> so, you don't really get the same thing with statement_timeout.

-- Greg
> I tend to think there should be protocol level support for options
> like this but that would require buy-in from the interface writers.

how would you do it? if you support it on the protocol level, you still need a way to allow the user to tell you how ... i would see WAIT for DELETE, UPDATE and SELECT FOR UPDATE. did you have more in mind?

>> the killer argument, however, is that the lock might very well happen
>> way after the statement has started.
>
> Sure. But Isn't the statement_timeout behaviour what an application
> writer would actually want? Why would he care how long some sub-part
> of the statement took? Isn't an application -you used the example of a
> web app - really concerned with its response time?

no, for a simple reason: in this case you would depend way too much on other tasks. some other reads which just pump up the load, or some nightly cronjobs, would give you timeouts which are not necessarily related to locking. we really want to protect ourselves against some "LOCK TABLE IN ACCESS EXCLUSIVE MODE". i am not looking for a solution which kills queries after some time (we have that already); i want to protect myself against locking issues.

this feature is basically supported by most big vendors (informix, oracle, just to name a few). i am proposing this because i have needed it for a long time already, and in this case it is also needed for a migration project.

hans
Hans-Juergen Schoenig <postgres@cybertec.at> writes:
> i would like to propose an extension to our SELECT FOR UPDATE mechanism.
> especially in web applications it can be extremely useful to have the
> chance to terminate a lock after a given timeframe.

I guess my immediate reactions to this are:

1. Why SELECT FOR UPDATE in particular, and not other sorts of locks?

2. That "clear and easy to use" oracle syntax sucks. You do not want to be embedding lock timeout constants in your application queries. When you move to a new server and the appropriate timeout changes, do you want to be trying to update your clients for that?

What I think has been proposed previously is a GUC variable named something like "lock_timeout", which would cause a wait for *any* heavyweight lock to abort after such-and-such an interval. This would address your point about not wanting to use an overall statement_timeout, and it would be more general than a feature that only works for SELECT FOR UPDATE row locks, and it would allow decoupling the exact length of the timeout from application query logic.

regards, tom lane
Hi,

Tom Lane wrote:
> What I think has been proposed previously is a GUC variable named
> something like "lock_timeout", which would cause a wait for *any*
> heavyweight lock to abort after such-and-such an interval. This
> would address your point about not wanting to use an overall
> statement_timeout, and it would be more general than a feature
> that only works for SELECT FOR UPDATE row locks, and it would allow
> decoupling the exact length of the timeout from application query
> logic.

Would the "lock_timeout" apply to each lock to be acquired individually, or to all of them combined for the statement? Applying the timeout to every lock individually wouldn't be too nice. E.g. the behaviour of SELECT ... FOR ... WAIT N (N in seconds) in the scenario below is not what the application writer would expect:

xact 1: SELECT ... FOR UPDATE (record 1)
xact 2: SELECT ... FOR UPDATE (record 2)
xact 3: SELECT ... FOR UPDATE WAIT 10 (record 1 and 2, waits for both records sequentially)
xact 1: COMMIT/ROLLBACK almost 10 seconds later
xact 3 acquires lock for record 1, waits for lock on record 2
xact 2: COMMIT/ROLLBACK almost 10 seconds later
xact 3 acquires lock for record 2

The 3rd transaction has to wait for almost 2 times the specified time. E.g. in Informix, SET LOCK MODE TO WAIT N works for all to-be-acquired locks combined. If lock_timeout and/or ... "FOR <lockmode> WAIT N" ever gets implemented, it should behave that way.

Best regards,
Zoltán Böszörményi

--
Bible has answers for everything. Proof:
"But let your communication be, Yea, yea; Nay, nay: for whatsoever is more
than these cometh of evil." (Matthew 5:37) - basics of digital technology.
"May your kingdom come" - superficial description of plate tectonics
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/
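The arithmetic of the scenario above can be sketched in Python. This is a simplified model, not PostgreSQL or Informix code: the function names are hypothetical, locks are taken strictly one after another, and "almost 10 seconds" is taken to be 9.5 seconds.

```python
def wait_per_lock(release_times, timeout):
    """Each lock wait gets its own fresh timeout.

    release_times[i] is the absolute time (seconds) at which lock i becomes
    free. Returns (acquired_all, time_when_done_or_failed).
    """
    now = 0.0
    for release_at in release_times:
        wait = max(0.0, release_at - now)
        if wait > timeout:
            return False, now + timeout
        now += wait
    return True, now


def wait_shared_budget(release_times, timeout):
    """All lock waits of one statement share a single time budget."""
    now = 0.0
    for release_at in release_times:
        wait = max(0.0, release_at - now)
        remaining = timeout - now
        if wait > remaining:
            return False, float(timeout)
        now += wait
    return True, now


# Record 1 is released at t=9.5 s, record 2 another 9.5 s later; WAIT 10.
releases = [9.5, 19.0]
print(wait_per_lock(releases, 10))       # (True, 19.0)
print(wait_shared_budget(releases, 10))  # (False, 10.0)
```

Under the per-lock reading, xact 3 succeeds after 19 seconds, almost twice the requested limit; under the combined reading it fails at the 10-second mark.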
Boszormenyi Zoltan <zb@cybertec.at> writes:
> Would the "lock_timeout" work for all to be acquired locks individually,
> or all of them combined for the statement? The individual application
> of the timeout for every locks individually wouldn't be too nice.

I think the way you're describing would be both harder to implement and full of its own strange traps.

regards, tom lane
Tom Lane wrote:
> Boszormenyi Zoltan <zb@cybertec.at> writes:
>> Would the "lock_timeout" work for all to be acquired locks individually,
>> or all of them combined for the statement? The individual application
>> of the timeout for every locks individually wouldn't be too nice.
>
> I think the way you're describing would be both harder to implement
> and full of its own strange traps.

Why?

PGSemaphoreTimedLock(..., struct timespec *timeout)
{
    ...
    gettimeofday(&tv1, NULL);
    semtimedop(..., timeout);
    gettimeofday(&tv2, NULL);
    <decrease *timeout with the difference of tv1 and tv2>
}

The next call will use the decreased value. Either all locks are acquired in the given time, or the next try will time out (error), or there are still locks to take and the timeout went down to or below zero (error). Why is it hard?

--
Zoltán Böszörményi
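A runnable sketch of the shrinking-timeout idea in the pseudocode above, using Python's threading.Lock as a stand-in for the semaphore. The helper name acquire_all is hypothetical, and instead of subtracting the elapsed time after each wait it computes one deadline up front with a monotonic clock, which has the same effect. It says nothing about what should happen to the budget when a subtransaction rolls back.

```python
import threading
import time


def acquire_all(locks, budget_seconds):
    """Acquire every lock in order, sharing one time budget between them.

    Returns True if all locks were acquired. Otherwise releases whatever
    was already taken and returns False.
    """
    deadline = time.monotonic() + budget_seconds
    taken = []
    for lock in locks:
        remaining = deadline - time.monotonic()
        # A non-positive remainder means the budget is already used up.
        if remaining <= 0 or not lock.acquire(timeout=remaining):
            for held in reversed(taken):
                held.release()
            return False
        taken.append(lock)
    return True


# Two free locks fit easily into the budget.
print(acquire_all([threading.Lock(), threading.Lock()], 0.5))  # True

# A lock that nobody releases exhausts the budget.
stuck = threading.Lock()
stuck.acquire()
print(acquire_all([threading.Lock(), stuck], 0.05))  # False
```

Each wait is handed only what is left of the budget, so the total wait can never exceed budget_seconds by more than scheduling jitter.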
Boszormenyi Zoltan <zb@cybertec.at> writes:
> Tom Lane wrote:
>> I think the way you're describing would be both harder to implement
>> and full of its own strange traps.

> Why?

Well, for one thing: if I roll back a subtransaction, should the lock wait time it used now no longer count against the total? If not, once a timeout failure has occurred it'll no longer be possible for the total transaction to do anything, even if it rolls back a failed subtransaction.

But more generally, what you are proposing seems largely duplicative with statement_timeout. The only reason I can see for a lock-wait-specific timeout is that you have a need to control the length of a specific wait and *not* the overall time spent. Hans already argued upthread why he wants a feature that doesn't act like statement_timeout.

regards, tom lane
Tom Lane wrote:
> Well, for one thing: if I roll back a subtransaction, should the lock
> wait time it used now no longer count against the total?

Does statement_timeout count against subtransactions as well? No. If a statement finishes before statement_timeout, does it also decrease the possible runtime for the next statement? No. I was talking about locks acquired during one statement.

> If not,
> once a timeout failure has occurred it'll no longer be possible for
> the total transaction to do anything, even if it rolls back a failed
> subtransaction.
>
> But more generally, what you are proposing seems largely duplicative
> with statement_timeout. The only reason I can see for a
> lock-wait-specific timeout is that you have a need to control the
> length of a specific wait and *not* the overall time spent. Hans
> already argued upthread why he wants a feature that doesn't act like
> statement_timeout.

He argued that he wants a timeout *independent* from statement_timeout for locks only inside the same statement, IIRC.

--
Zoltán Böszörményi
2009/5/11 Boszormenyi Zoltan <zb@cybertec.at>:
> Does statement_timeout counts against subtransactions as well? No.
> If a statement finishes before statement_timeout, does it also decrease
> the possible runtime for the next statement? No. I was talking about
> locks acquired during one statement.

With respect, I can't figure out what you're trying to say here.

> He argued about he wants a timeout *independent* from statement_timeout
> for locks only inside the same statement IIRC.

I think what you're saying is that you think he only wanted to distinguish total time spent waiting for locks from total time spent executing, including such things as I/O wait time. That's possible; Hans-Juergen wasn't very clear on what "locking issues" he was concerned about.

I can think of a few categories of "locking issues" that might be problems though:

1) A web application wants to ensure that a slow batch job which locks records doesn't impact responsiveness. I think statement_timeout handles this better though.

2) A batch job might want to ensure it's still "making progress" even if slowly, but some other jobs might block indefinitely while holding locks (for example an email generating script might be stuck waiting for remote sites to respond). statement_timeout is better for ensuring overall execution speed, but it won't fire until the entire time allotment is used up, whereas something which detects being stuck on an individual lock would detect the problem much earlier (and perhaps the rest of the job could still be completed).

3) Applications which have hidden deadlocks because they block each other outside the database while holding locks in the database. This can be dealt with by using userlocks to represent the external resources, but that depends on all of those external resources being identified correctly. A lock timeout would be an imprecise way to detect possible deadlocks, even though it's always possible it just didn't wait long enough.

Hans-Juergen, are any of these use cases good descriptions of your intended use? Or do you have a different case?

-- greg
hello tom ...

the reason for SELECT FOR UPDATE is very simple: this is the typical lock obtained by basically every business application if written properly (updating a product, whatever). the problem with NOWAIT basically is that if a small transaction holds a lock for a fraction of a second, you will already lose your transaction because it does not wait at all (which is exactly what you want in some cases). however, in many cases you want a compromise between wait forever and die instantly.

depending on the code path we could decide how long to wait for which operation. this makes sense as we would only fire 1 statement instead of 3 (set, run, set back).

i agree that a GUC is definitely an option. however, i would say that adding an extension to SELECT FOR UPDATE, UPDATE and DELETE would make more sense from a usability point of view (just my 0.02 cents). if hackers decides to go for a GUC, we are fine as well and we will add it to 8.5.

many thanks,

hans

On May 11, 2009, at 4:46 PM, Tom Lane wrote:
> I guess my immediate reactions to this are:
>
> 1. Why SELECT FOR UPDATE in particular, and not other sorts of locks?
>
> 2. That "clear and easy to use" oracle syntax sucks. You do not want
> to be embedding lock timeout constants in your application queries.
> When you move to a new server and the appropriate timeout changes,
> do you want to be trying to update your clients for that?
>
> What I think has been proposed previously is a GUC variable named
> something like "lock_timeout", which would cause a wait for *any*
> heavyweight lock to abort after such-and-such an interval. This
> would address your point about not wanting to use an overall
> statement_timeout, and it would be more general than a feature
> that only works for SELECT FOR UPDATE row locks, and it would allow
> decoupling the exact length of the timeout from application query
> logic.
>
> regards, tom lane
Greg Stark wrote:
> 2009/5/11 Boszormenyi Zoltan <zb@cybertec.at>:
>> Does statement_timeout counts against subtransactions as well? No.
>> If a statement finishes before statement_timeout, does it also decrease
>> the possible runtime for the next statement? No. I was talking about
>> locks acquired during one statement.
>
> With respect I can't figure out what you're trying to say here.

Sorry, bad rhetoric. The point, correctly made, is below.

>> He argued about he wants a timeout *independent* from statement_timeout
>> for locks only inside the same statement IIRC.
2009/5/11 Hans-Jürgen Schönig <postgres@cybertec.at>:
> i agree that a GUC is definitely an option.
> however, i would say that adding an extension to SELECT FOR UPDATE, UPDATE
> and DELETE would make more sense from a usability point of view (just my
> 0.02 cents).

I kinda agree with this. I believe Tom was arguing upthread that any change of this sort should touch all of the places where NOWAIT is accepted now, and I agree with that. But having to issue SET as a separate statement and then maybe do another SET afterward to get the old value back doesn't seem like it provides any real advantage. GUCs are good for properties that you want to set and leave set, not so good for things that are associated with particular statements.

It also seems to me that there's no reason for NOWAIT to be part of the syntax, but WAIT n to be a GUC.

...Robert
> But more generally, what you are proposing seems largely duplicative
> with statement_timeout. The only reason I can see for a
> lock-wait-specific timeout is that you have a need to control the
> length of a specific wait and *not* the overall time spent. Hans
> already argued upthread why he wants a feature that doesn't act like
> statement_timeout.

I agree with Tom here; I want to wait for a specific amount of time for a specific lock request.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
Josh Berkus wrote:
>> But more generally, what you are proposing seems largely duplicative
>> with statement_timeout. The only reason I can see for a
>> lock-wait-specific timeout is that you have a need to control the
>> length of a specific wait and *not* the overall time spent. Hans
>> already argued upthread why he wants a feature that doesn't act like
>> statement_timeout.
>
> I agree with Tom here; I want to wait for a specific amount of time
> for a specific lock request.

Well, thinking about it a bit more, I think we can live with that. The use case would be mostly 1 record per SELECT FOR UPDATE WAIT N query, so for this the two semantics are equal. We would obviously differ from Informix when one SELECT fetches more than one record.

We can have both the GUC and the SQL extension for a temporary setting.

SET lock_timeout = N; -- 0 means infinite?

or:

SET lock_timeout = infinite;

NOWAIT
| WAIT (or no keyword as of now) for infinite waiting
| WAIT DEFAULT
| WAIT N (N seconds timeout)

Comments?

--
Zoltán Böszörményi
Robert Haas <robertmhaas@gmail.com> writes:
> I kinda agree with this. I believe Tom was arguing upthread that any
> change of this sort should touch all of the places where NOWAIT is
> accepted now, and I agree with that. But having to issue SET as a
> separate statement and then maybe do another SET afterward to get the
> old value back doesn't seem like it provides any real advantage. GUCs
> are good for properties that you want to set and leave set, not so
> good for things that are associated with particular statements.

My point is that I don't believe the scenario where you say that you know exactly how long each different statement in your application should wait and they should all be different. What I do find credible is that you want to set a "policy" for all the lock timeouts. Now think about what happens when it's time to change the policy. A GUC is gonna be a lot easier to manage than timeouts that are embedded in all your individual queries.

> It also seems to me that there's no reason for NOWAIT to be part of
> the syntax, but WAIT n to be a GUC.

I wasn't happy about NOWAIT in the syntax, either ;-) ... but at least that's a boolean and not a parameter whose specific value was plucked out of thin air, which is what it's pretty much always going to be.

regards, tom lane
Tom,

> My point is that I don't believe the scenario where you say that you
> know exactly how long each different statement in your application
> should wait and they should all be different. What I do find credible
> is that you want to set a "policy" for all the lock timeouts. Now
> think about what happens when it's time to change the policy. A GUC
> is gonna be a lot easier to manage than timeouts that are embedded in
> all your individual queries.

For production applications, it's credible that you're going to desire three different behaviors for different locks: you'll want to not wait at all for some locks, wait a limited time for others, and for a few wait forever. I agree that the time for the 2nd case wouldn't vary per lock in any reasonable case.

I can see Zoltan's argument: for web applications, it's important to keep the *total* wait time under 50 seconds for most users (default browser timeout for most is 60 seconds). So it would certainly be nice if we could somehow set total wait time instead of individual operation wait time. It's also completely and totally unworkable on the database layer for multiple reasons, so I'm not going to bother pushing any idea which implements this.

So, I can see having a session-based lock_timeout GUC, and also a NOWAIT statement. It would mean that users would need to set lock_timeout=-1 if they didn't want the lock to time out, but that's consistent with how other timeouts behave.

--
Josh Berkus
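The three waiting behaviours described above map directly onto a generic lock API. The sketch below uses Python's threading.Lock purely as an analogy; the policy names and the try_lock helper are hypothetical, and the -1 sentinel here is the threading module's own convention, not a statement about how a PostgreSQL setting would behave.

```python
import threading


def try_lock(lock, policy, timeout_seconds=1.0):
    """Acquire `lock` under one of three waiting policies.

    policy is "nowait", "wait_limited" or "wait_forever" (hypothetical
    names chosen for this sketch). Returns True if the lock was acquired.
    """
    if policy == "nowait":
        # Give up immediately if the lock is held by someone else.
        return lock.acquire(blocking=False)
    if policy == "wait_limited":
        # Wait at most timeout_seconds for this one lock.
        return lock.acquire(timeout=timeout_seconds)
    if policy == "wait_forever":
        # threading uses -1 to mean an unbounded wait.
        return lock.acquire(timeout=-1)
    raise ValueError(f"unknown policy: {policy}")


busy = threading.Lock()
busy.acquire()

print(try_lock(busy, "nowait"))                    # False
print(try_lock(busy, "wait_limited", 0.05))        # False
print(try_lock(threading.Lock(), "wait_forever"))  # True
```

Only the middle policy needs a number, which is the one value a session-wide setting would have to supply.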
Josh Berkus <josh@agliodbs.com> writes:
> I can see Zoltan's argument: for web applications, it's important to
> keep the *total* wait time under 50 seconds for most users (default
> browser timeout for most is 60 seconds).

And why is that only about lock wait time and not about total execution time? I still think statement_timeout covers the need, or at least is close enough that it isn't justified to make lock_timeout act like that (thus making it not serve the other class of requirement).

regards, tom lane
On 5/11/09 4:25 PM, Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> I can see Zoltan's argument: for web applications, it's important to
>> keep the *total* wait time under 50 seconds for most users (default
>> browser timeout for most is 60 seconds).
>
> And why is that only about lock wait time and not about total execution
> time? I still think statement_timeout covers the need, or at least is
> close enough that it isn't justified to make lock_timeout act like that
> (thus making it not serve the other class of requirement).

That was one of the reasons it's "completely and totally unworkable", as I mentioned, if you read the next sentence. The only real answer to the response time issue is to measure total response time in the middleware.

--
Josh Berkus
hello everybody,

from my side the goal of this discussion is to extract a consensus so that we can go ahead and implement this issue for 8.5. our customer here needs a solution to this problem and we have to come up with something which can then make it into PostgreSQL core. how shall we proceed with the decision finding process here? i am fine with a GUC and with a grammar extension - i just need a decision which stays unchanged.

comments and votes are welcome.

many thanks,

hans
Hans-Juergen Schoenig wrote:
> hello everybody,
>
> from my side the goal of this discussion is to extract a consensus so
> that we can go ahead and implement this issue for 8.5.
> our customer here needs a solution to this problem and we have to come
> up with something which can then make it into PostgreSQL core.
> how shall we proceed with the decision finding process here?
> i am fine with a GUC and with an grammar extension - i just need a
> decision which stays unchanged.

Do we have an answer for Hans-Juergen here?

I have added a vague TODO:

Consider a lock timeout parameter

* http://archives.postgresql.org/pgsql-hackers/2009-05/msg00485.php

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
> Do we have answer for Hans-Juergen here?

Do we?

The vague consensus for syntax options was that the GUC 'lock_timeout' and the WAIT [N] extension (wherever NOWAIT is allowed) should both be implemented.

The behaviour would be that a timeout of N seconds is applied to every lock that the statement takes.

Can we go ahead implementing it?

> I have added a vague TODO:
>
> Consider a lock timeout parameter
>
> * http://archives.postgresql.org/pgsql-hackers/2009-05/msg00485.php

--
Zoltán Böszörményi
Boszormenyi Zoltan wrote:
> The vague consensus for syntax options was that the GUC
> 'lock_timeout' and WAIT [N] extension (wherever NOWAIT
> is allowed) both should be implemented.
>
> Behaviour would be that N seconds timeout should be
> applied to every lock that the statement would take.

In http://archives.postgresql.org/message-id/291.1242053201@sss.pgh.pa.us Tom argues that lock_timeout should be sufficient. I'm not sure what WAIT [N] buys.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote:
> In http://archives.postgresql.org/message-id/291.1242053201@sss.pgh.pa.us
> Tom argues that lock_timeout should be sufficient. I'm not sure what
> does WAIT [N] buy.

Syntax consistency with NOWAIT?

--
Zoltán Böszörményi
Boszormenyi Zoltan wrote:
> Alvaro Herrera wrote:
> > In http://archives.postgresql.org/message-id/291.1242053201@sss.pgh.pa.us
> > Tom argues that lock_timeout should be sufficient. I'm not sure what
> > does WAIT [N] buy.
>
> Syntax consistency with NOWAIT?

Consistency could also be achieved by removing NOWAIT, but I don't see you proposing that.

--
Alvaro Herrera
Alvaro Herrera wrote:
> Boszormenyi Zoltan wrote:
>> Alvaro Herrera wrote:
>>> In http://archives.postgresql.org/message-id/291.1242053201@sss.pgh.pa.us
>>> Tom argues that lock_timeout should be sufficient. I'm not sure what
>>> does WAIT [N] buy.
>>
>> Syntax consistency with NOWAIT?

And ease of use in diverging from the default lock_timeout?

> Consistency could also be achieved by removing NOWAIT, but I don't see
> you proposing that.

And you won't see me proposing any other feature removal either :-)

--
Zoltán Böszörményi
Alvaro Herrera wrote: > Boszormenyi Zoltan wrote: > > >> The vague consensus for syntax options was that the GUC >> 'lock_timeout' and WAIT [N] extension (wherever NOWAIT >> is allowed) both should be implemented. >> >> Behaviour would be that N seconds timeout should be >> applied to every lock that the statement would take. >> > > In http://archives.postgresql.org/message-id/291.1242053201@sss.pgh.pa.us > Tom argues that lock_timeout should be sufficient. I'm not sure what > WAIT [N] buys. Okay, we implemented only the lock_timeout GUC. Patch attached, hopefully in an acceptable form. Documentation is included in the patch. lock_timeout works the same way as statement_timeout: it takes a value in milliseconds, and 0 disables the timeout. Best regards, Zoltán Böszörményi diff -dcrpN pgsql.orig/doc/src/sgml/config.sgml pgsql/doc/src/sgml/config.sgml *** pgsql.orig/doc/src/sgml/config.sgml 2009-07-17 07:50:48.000000000 +0200 --- pgsql/doc/src/sgml/config.sgml 2009-07-30 13:12:07.000000000 +0200 *************** COPY postgres_log FROM '/full/path/to/lo *** 4018,4023 **** --- 4018,4046 ---- </listitem> </varlistentry> + <varlistentry id="guc-lock-timeout" xreflabel="lock_timeout"> + <term><varname>lock_timeout</varname> (<type>integer</type>)</term> + <indexterm> + <primary><varname>lock_timeout</> configuration parameter</primary> + </indexterm> + <listitem> + <para> + Abort any statement that tries to lock any rows or tables and the lock + has to wait more than the specified number of milliseconds, starting + from the time the command arrives at the server from the client. 
+ If <varname>log_min_error_statement</> is set to <literal>ERROR</> or + lower, the statement that timed out will also be logged. + A value of zero (the default) turns off the limitation. + </para> + + <para> + Setting <varname>lock_timeout</> in + <filename>postgresql.conf</> is not recommended because it + affects all sessions. + </para> + </listitem> + </varlistentry> + <varlistentry id="guc-vacuum-freeze-table-age" xreflabel="vacuum_freeze_table_age"> <term><varname>vacuum_freeze_table_age</varname> (<type>integer</type>)</term> <indexterm> diff -dcrpN pgsql.orig/doc/src/sgml/ref/lock.sgml pgsql/doc/src/sgml/ref/lock.sgml *** pgsql.orig/doc/src/sgml/ref/lock.sgml 2009-01-16 11:44:56.000000000 +0100 --- pgsql/doc/src/sgml/ref/lock.sgml 2009-07-30 13:29:07.000000000 +0200 *************** where <replaceable class="PARAMETER">loc *** 39,46 **** <literal>NOWAIT</literal> is specified, <command>LOCK TABLE</command> does not wait to acquire the desired lock: if it cannot be acquired immediately, the command is aborted and an ! error is emitted. Once obtained, the lock is held for the ! remainder of the current transaction. (There is no <command>UNLOCK TABLE</command> command; locks are always released at transaction end.) </para> --- 39,49 ---- <literal>NOWAIT</literal> is specified, <command>LOCK TABLE</command> does not wait to acquire the desired lock: if it cannot be acquired immediately, the command is aborted and an ! error is emitted. If <varname>lock_timeout</varname> is set to a value ! higher than 0, and the lock cannot be acquired under the specified ! timeout value in milliseconds, the command is aborted and an error ! is emitted. Once obtained, the lock is held for the remainder of ! the current transaction. (There is no <command>UNLOCK TABLE</command> command; locks are always released at transaction end.) 
</para> diff -dcrpN pgsql.orig/doc/src/sgml/ref/select.sgml pgsql/doc/src/sgml/ref/select.sgml *** pgsql.orig/doc/src/sgml/ref/select.sgml 2009-05-04 11:00:49.000000000 +0200 --- pgsql/doc/src/sgml/ref/select.sgml 2009-07-30 13:36:57.000000000 +0200 *************** FOR SHARE [ OF <replaceable class="param *** 1101,1106 **** --- 1101,1114 ---- </para> <para> + If <literal>NOWAIT</> option is not specified and <varname>lock_timeout</varname> + is set to a value higher than 0, and the lock needs to wait more than + the specified value in milliseconds, the command reports an error after + timing out, rather than waiting indefinitely. The note in the previous + paragraph applies to the <varname>lock_timeout</varname>, too. + </para> + + <para> <literal>FOR SHARE</literal> behaves similarly, except that it acquires a shared rather than exclusive lock on each retrieved row. A shared lock blocks other transactions from performing diff -dcrpN pgsql.orig/src/backend/access/heap/heapam.c pgsql/src/backend/access/heap/heapam.c *** pgsql.orig/src/backend/access/heap/heapam.c 2009-06-13 18:24:46.000000000 +0200 --- pgsql/src/backend/access/heap/heapam.c 2009-07-30 12:29:17.000000000 +0200 *************** l3: *** 3142,3157 **** */ if (!have_tuple_lock) { if (nowait) ! { ! if (!ConditionalLockTuple(relation, tid, tuple_lock_type)) ! ereport(ERROR, (errcode(ERRCODE_LOCK_NOT_AVAILABLE), errmsg("could not obtain lock on row in relation \"%s\"", RelationGetRelationName(relation)))); ! } ! else ! LockTuple(relation, tid, tuple_lock_type); have_tuple_lock = true; } --- 3142,3160 ---- */ if (!have_tuple_lock) { + bool result; + if (nowait) ! result = ConditionalLockTuple(relation, tid, tuple_lock_type); ! else ! result = TimedLockTuple(relation, tid, tuple_lock_type); ! ! if (!result) ! ereport(ERROR, (errcode(ERRCODE_LOCK_NOT_AVAILABLE), errmsg("could not obtain lock on row in relation \"%s\"", RelationGetRelationName(relation)))); ! 
have_tuple_lock = true; } *************** l3: *** 3172,3188 **** } else if (infomask & HEAP_XMAX_IS_MULTI) { /* wait for multixact to end */ if (nowait) ! { ! if (!ConditionalMultiXactIdWait((MultiXactId) xwait)) ! ereport(ERROR, (errcode(ERRCODE_LOCK_NOT_AVAILABLE), errmsg("could not obtain lock on row in relation \"%s\"", RelationGetRelationName(relation)))); - } - else - MultiXactIdWait((MultiXactId) xwait); LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE); --- 3175,3191 ---- } else if (infomask & HEAP_XMAX_IS_MULTI) { + bool result; /* wait for multixact to end */ if (nowait) ! result = ConditionalMultiXactIdWait((MultiXactId) xwait); ! else ! result = TimedMultiXactIdWait((MultiXactId) xwait); ! if (!result) ! ereport(ERROR, (errcode(ERRCODE_LOCK_NOT_AVAILABLE), errmsg("could not obtain lock on row in relation \"%s\"", RelationGetRelationName(relation)))); LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE); *************** l3: *** 3207,3223 **** } else { /* wait for regular transaction to end */ if (nowait) ! { ! if (!ConditionalXactLockTableWait(xwait)) ! ereport(ERROR, (errcode(ERRCODE_LOCK_NOT_AVAILABLE), errmsg("could not obtain lock on row in relation \"%s\"", RelationGetRelationName(relation)))); - } - else - XactLockTableWait(xwait); LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE); --- 3210,3226 ---- } else { + bool result; /* wait for regular transaction to end */ if (nowait) ! result = ConditionalXactLockTableWait(xwait); ! else ! result = TimedXactLockTableWait(xwait); ! if (!result) ! 
ereport(ERROR, (errcode(ERRCODE_LOCK_NOT_AVAILABLE), errmsg("could not obtain lock on row in relation \"%s\"", RelationGetRelationName(relation)))); LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE); diff -dcrpN pgsql.orig/src/backend/access/transam/multixact.c pgsql/src/backend/access/transam/multixact.c *** pgsql.orig/src/backend/access/transam/multixact.c 2009-07-13 11:16:21.000000000 +0200 --- pgsql/src/backend/access/transam/multixact.c 2009-07-30 12:31:18.000000000 +0200 *************** ConditionalMultiXactIdWait(MultiXactId m *** 636,641 **** --- 636,678 ---- } /* + * TimedMultiXactIdWait + * As above, but only lock if we can get the lock under lock_timeout. + */ + bool + TimedMultiXactIdWait(MultiXactId multi) + { + bool result = true; + TransactionId *members; + int nmembers; + + nmembers = GetMultiXactIdMembers(multi, &members); + + if (nmembers >= 0) + { + int i; + + for (i = 0; i < nmembers; i++) + { + TransactionId member = members[i]; + + debug_elog4(DEBUG2, "TimedMultiXactIdWait: trying %d (%u)", + i, member); + if (!TransactionIdIsCurrentTransactionId(member)) + { + result = TimedXactLockTableWait(member); + if (!result) + break; + } + } + + pfree(members); + } + + return result; + } + + /* * CreateMultiXactId * Make a new MultiXactId * diff -dcrpN pgsql.orig/src/backend/commands/lockcmds.c pgsql/src/backend/commands/lockcmds.c *** pgsql.orig/src/backend/commands/lockcmds.c 2009-06-13 18:24:48.000000000 +0200 --- pgsql/src/backend/commands/lockcmds.c 2009-07-30 11:09:49.000000000 +0200 *************** LockTableRecurse(Oid reloid, RangeVar *r *** 65,70 **** --- 65,71 ---- { Relation rel; AclResult aclresult; + bool result; /* * Acquire the lock. We must do this first to protect against concurrent *************** LockTableRecurse(Oid reloid, RangeVar *r *** 72,97 **** * won't fail. */ if (nowait) { ! if (!ConditionalLockRelationOid(reloid, lockmode)) ! { ! /* try to throw error by name; relation could be deleted... */ ! char *relname = rv ? 
rv->relname : get_rel_name(reloid); ! if (relname) ! ereport(ERROR, (errcode(ERRCODE_LOCK_NOT_AVAILABLE), errmsg("could not obtain lock on relation \"%s\"", relname))); ! else ! ereport(ERROR, (errcode(ERRCODE_LOCK_NOT_AVAILABLE), errmsg("could not obtain lock on relation with OID %u", reloid))); - } } - else - LockRelationOid(reloid, lockmode); /* * Now that we have the lock, check to see if the relation really exists --- 73,97 ---- * won't fail. */ if (nowait) + result = ConditionalLockRelationOid(reloid, lockmode); + else + result = TimedLockRelationOid(reloid, lockmode); + if (!result) { ! /* try to throw error by name; relation could be deleted... */ ! char *relname = rv ? rv->relname : get_rel_name(reloid); ! if (relname) ! ereport(ERROR, (errcode(ERRCODE_LOCK_NOT_AVAILABLE), errmsg("could not obtain lock on relation \"%s\"", relname))); ! else ! ereport(ERROR, (errcode(ERRCODE_LOCK_NOT_AVAILABLE), errmsg("could not obtain lock on relation with OID %u", reloid))); } /* * Now that we have the lock, check to see if the relation really exists diff -dcrpN pgsql.orig/src/backend/port/posix_sema.c pgsql/src/backend/port/posix_sema.c *** pgsql.orig/src/backend/port/posix_sema.c 2009-06-13 18:24:55.000000000 +0200 --- pgsql/src/backend/port/posix_sema.c 2009-07-30 10:37:20.000000000 +0200 *************** *** 24,29 **** --- 24,30 ---- #include "miscadmin.h" #include "storage/ipc.h" #include "storage/pg_sema.h" + #include "storage/proc.h" /* for LockTimeout */ #ifdef USE_NAMED_POSIX_SEMAPHORES *************** PGSemaphoreTryLock(PGSemaphore sema) *** 313,315 **** --- 314,359 ---- return true; } + + /* + * PGSemaphoreTimedLock + * + * Lock a semaphore only if able to do so under the lock_timeout + */ + bool + PGSemaphoreTimedLock(PGSemaphore sema, bool interruptOK) + { + int errStatus; + struct timespec timeout; + + + /* + * See notes in sysv_sema.c's implementation of PGSemaphoreLock. 
Just as + * that code does for semop(), we handle both the case where sem_wait() + * returns errno == EINTR after a signal, and the case where it just keeps + * waiting. + */ + do + { + ImmediateInterruptOK = interruptOK; + CHECK_FOR_INTERRUPTS(); + if (LockTimeout) + { + timeout.tv_sec = LockTimeout / 1000; + timeout.tv_nsec = (LockTimeout % 1000) * 1000000; + errStatus = sem_timedwait(PG_SEM_REF(sema), &timeout); + } + else + errStatus = sem_wait(PG_SEM_REF(sema)); + ImmediateInterruptOK = false; + } while (errStatus < 0 && errno == EINTR); + + if (errStatus < 0) + { + if (errno == ETIMEDOUT) + return false; /* failed to lock it */ + /* Otherwise we got trouble */ + elog(FATAL, "sem_wait failed: %m"); + } + return true; + } diff -dcrpN pgsql.orig/src/backend/port/sysv_sema.c pgsql/src/backend/port/sysv_sema.c *** pgsql.orig/src/backend/port/sysv_sema.c 2009-06-13 18:24:55.000000000 +0200 --- pgsql/src/backend/port/sysv_sema.c 2009-07-30 10:37:37.000000000 +0200 *************** *** 30,35 **** --- 30,36 ---- #include "miscadmin.h" #include "storage/ipc.h" #include "storage/pg_sema.h" + #include "storage/proc.h" /* for LockTimeout */ #ifndef HAVE_UNION_SEMUN *************** PGSemaphoreTryLock(PGSemaphore sema) *** 497,499 **** --- 498,590 ---- return true; } + + /* + * PGSemaphoreTimedLock + * + * Lock a semaphore only if able to do so under the lock_timeout + */ + bool + PGSemaphoreTimedLock(PGSemaphore sema, bool interruptOK) + { + int errStatus; + struct sembuf sops; + struct timespec timeout; + + sops.sem_op = -1; /* decrement */ + sops.sem_flg = 0; + sops.sem_num = sema->semNum; + + /* + * Note: if errStatus is -1 and errno == EINTR then it means we returned + * from the operation prematurely because we were sent a signal. So we + * try and lock the semaphore again. + * + * Each time around the loop, we check for a cancel/die interrupt. 
On + * some platforms, if such an interrupt comes in while we are waiting, it + * will cause the semop() call to exit with errno == EINTR, allowing us to + * service the interrupt (if not in a critical section already) during the + * next loop iteration. + * + * Once we acquire the lock, we do NOT check for an interrupt before + * returning. The caller needs to be able to record ownership of the lock + * before any interrupt can be accepted. + * + * There is a window of a few instructions between CHECK_FOR_INTERRUPTS + * and entering the semop() call. If a cancel/die interrupt occurs in + * that window, we would fail to notice it until after we acquire the lock + * (or get another interrupt to escape the semop()). We can avoid this + * problem by temporarily setting ImmediateInterruptOK to true before we + * do CHECK_FOR_INTERRUPTS; then, a die() interrupt in this interval will + * execute directly. However, there is a huge pitfall: there is another + * window of a few instructions after the semop() before we are able to + * reset ImmediateInterruptOK. If an interrupt occurs then, we'll lose + * control, which means that the lock has been acquired but our caller did + * not get a chance to record the fact. Therefore, we only set + * ImmediateInterruptOK if the caller tells us it's OK to do so, ie, the + * caller does not need to record acquiring the lock. (This is currently + * true for lockmanager locks, since the process that granted us the lock + * did all the necessary state updates. It's not true for SysV semaphores + * used to implement LW locks or emulate spinlocks --- but the wait time + * for such locks should not be very long, anyway.) + * + * On some platforms, signals marked SA_RESTART (which is most, for us) + * will not interrupt the semop(); it will just keep waiting. Therefore + * it's necessary for cancel/die interrupts to be serviced directly by the + * signal handler. 
On these platforms the behavior is really the same + * whether the signal arrives just before the semop() begins, or while it + * is waiting. The loop on EINTR is thus important only for other types + * of interrupts. + */ + do + { + ImmediateInterruptOK = interruptOK; + CHECK_FOR_INTERRUPTS(); + if (LockTimeout) + { + timeout.tv_sec = LockTimeout / 1000; + timeout.tv_nsec = (LockTimeout % 1000) * 1000000; + errStatus = semtimedop(sema->semId, &sops, 1, &timeout); + } + else + errStatus = semop(sema->semId, &sops, 1); + ImmediateInterruptOK = false; + } while (errStatus < 0 && errno == EINTR); + + if (errStatus < 0) + { + /* Expect EAGAIN or EWOULDBLOCK (platform-dependent) */ + #ifdef EAGAIN + if (errno == EAGAIN) + return false; /* failed to lock it */ + #endif + #if defined(EWOULDBLOCK) && (!defined(EAGAIN) || (EWOULDBLOCK != EAGAIN)) + if (errno == EWOULDBLOCK) + return false; /* failed to lock it */ + #endif + /* Otherwise we got trouble */ + elog(FATAL, "semop(id=%d) failed: %m", sema->semId); + } + return true; + } + diff -dcrpN pgsql.orig/src/backend/port/win32_sema.c pgsql/src/backend/port/win32_sema.c *** pgsql.orig/src/backend/port/win32_sema.c 2009-06-13 18:24:55.000000000 +0200 --- pgsql/src/backend/port/win32_sema.c 2009-07-30 10:37:57.000000000 +0200 *************** *** 16,21 **** --- 16,22 ---- #include "miscadmin.h" #include "storage/ipc.h" #include "storage/pg_sema.h" + #include "storage/proc.h" /* for LockTimeout */ static HANDLE *mySemSet; /* IDs of sema sets acquired so far */ static int numSems; /* number of sema sets acquired so far */ *************** PGSemaphoreTryLock(PGSemaphore sema) *** 205,207 **** --- 206,267 ---- /* keep compiler quiet */ return false; } + + /* + * PGSemaphoreTimedLock + * + * Lock a semaphore only if able to do so under the lock_timeout + * Serve the interrupt if interruptOK is true. 
+ */ + bool + PGSemaphoreTimedLock(PGSemaphore sema, bool interruptOK) + { + DWORD ret; + HANDLE wh[2]; + + wh[0] = *sema; + wh[1] = pgwin32_signal_event; + + /* + * As in other implementations of PGSemaphoreLock, we need to check for + * cancel/die interrupts each time through the loop. But here, there is + * no hidden magic about whether the syscall will internally service a + * signal --- we do that ourselves. + */ + do + { + ImmediateInterruptOK = interruptOK; + CHECK_FOR_INTERRUPTS(); + + errno = 0; + ret = WaitForMultipleObjectsEx(2, wh, FALSE, LockTimeout ? LockTimeout : INFINITE, TRUE); + + if (ret == WAIT_OBJECT_0) + { + /* We got it! */ + return true; + } + else if (ret == WAIT_TIMEOUT) + { + /* Can't get it */ + errno = EAGAIN; + return false; + } + else if (ret == WAIT_OBJECT_0 + 1) + { + /* Signal event is set - we have a signal to deliver */ + pgwin32_dispatch_queued_signals(); + errno = EINTR; + } + else + /* Otherwise we are in trouble */ + errno = EIDRM; + + ImmediateInterruptOK = false; + } while (errno == EINTR); + + if (errno != 0) + ereport(FATAL, + (errmsg("could not lock semaphore: error code %d", (int) GetLastError()))); + } + diff -dcrpN pgsql.orig/src/backend/storage/lmgr/lmgr.c pgsql/src/backend/storage/lmgr/lmgr.c *** pgsql.orig/src/backend/storage/lmgr/lmgr.c 2009-01-02 17:15:28.000000000 +0100 --- pgsql/src/backend/storage/lmgr/lmgr.c 2009-07-30 12:28:44.000000000 +0200 *************** *** 21,26 **** --- 21,27 ---- #include "catalog/catalog.h" #include "miscadmin.h" #include "storage/lmgr.h" + #include "storage/proc.h" #include "storage/procarray.h" #include "utils/inval.h" *************** LockRelationOid(Oid relid, LOCKMODE lock *** 76,82 **** SetLocktagRelationOid(&tag, relid); ! res = LockAcquire(&tag, lockmode, false, false); /* * Now that we have the lock, check for invalidation messages, so that we --- 77,83 ---- SetLocktagRelationOid(&tag, relid); ! 
res = LockAcquire(&tag, lockmode, false, false, INFINITE_TIMEOUT); /* * Now that we have the lock, check for invalidation messages, so that we *************** ConditionalLockRelationOid(Oid relid, LO *** 108,114 **** SetLocktagRelationOid(&tag, relid); ! res = LockAcquire(&tag, lockmode, false, true); if (res == LOCKACQUIRE_NOT_AVAIL) return false; --- 109,144 ---- SetLocktagRelationOid(&tag, relid); ! res = LockAcquire(&tag, lockmode, false, true, INFINITE_TIMEOUT); ! ! if (res == LOCKACQUIRE_NOT_AVAIL) ! return false; ! ! /* ! * Now that we have the lock, check for invalidation messages; see notes ! * in LockRelationOid. ! */ ! if (res != LOCKACQUIRE_ALREADY_HELD) ! AcceptInvalidationMessages(); ! ! return true; ! } ! ! /* ! * TimedLockRelationOid ! * ! * As LockRelationOid, but only lock if we can under lock_timeout. ! * Returns TRUE iff the lock was acquired. ! */ ! bool ! TimedLockRelationOid(Oid relid, LOCKMODE lockmode) ! { ! LOCKTAG tag; ! LockAcquireResult res; ! ! SetLocktagRelationOid(&tag, relid); ! ! res = LockAcquire(&tag, lockmode, false, false, LockTimeout); if (res == LOCKACQUIRE_NOT_AVAIL) return false; *************** LockRelation(Relation relation, LOCKMODE *** 171,177 **** relation->rd_lockInfo.lockRelId.dbId, relation->rd_lockInfo.lockRelId.relId); ! res = LockAcquire(&tag, lockmode, false, false); /* * Now that we have the lock, check for invalidation messages; see notes --- 201,207 ---- relation->rd_lockInfo.lockRelId.dbId, relation->rd_lockInfo.lockRelId.relId); ! res = LockAcquire(&tag, lockmode, false, false, INFINITE_TIMEOUT); /* * Now that we have the lock, check for invalidation messages; see notes *************** ConditionalLockRelation(Relation relatio *** 198,204 **** relation->rd_lockInfo.lockRelId.dbId, relation->rd_lockInfo.lockRelId.relId); ! 
res = LockAcquire(&tag, lockmode, false, true); if (res == LOCKACQUIRE_NOT_AVAIL) return false; --- 228,234 ---- relation->rd_lockInfo.lockRelId.dbId, relation->rd_lockInfo.lockRelId.relId); ! res = LockAcquire(&tag, lockmode, false, true, INFINITE_TIMEOUT); if (res == LOCKACQUIRE_NOT_AVAIL) return false; *************** LockRelationIdForSession(LockRelId *reli *** 250,256 **** SET_LOCKTAG_RELATION(tag, relid->dbId, relid->relId); ! (void) LockAcquire(&tag, lockmode, true, false); } /* --- 280,286 ---- SET_LOCKTAG_RELATION(tag, relid->dbId, relid->relId); ! (void) LockAcquire(&tag, lockmode, true, false, INFINITE_TIMEOUT); } /* *************** LockRelationForExtension(Relation relati *** 285,291 **** relation->rd_lockInfo.lockRelId.dbId, relation->rd_lockInfo.lockRelId.relId); ! (void) LockAcquire(&tag, lockmode, false, false); } /* --- 315,321 ---- relation->rd_lockInfo.lockRelId.dbId, relation->rd_lockInfo.lockRelId.relId); ! (void) LockAcquire(&tag, lockmode, false, false, INFINITE_TIMEOUT); } /* *************** LockPage(Relation relation, BlockNumber *** 319,325 **** relation->rd_lockInfo.lockRelId.relId, blkno); ! (void) LockAcquire(&tag, lockmode, false, false); } /* --- 349,355 ---- relation->rd_lockInfo.lockRelId.relId, blkno); ! (void) LockAcquire(&tag, lockmode, false, false, INFINITE_TIMEOUT); } /* *************** ConditionalLockPage(Relation relation, B *** 338,344 **** relation->rd_lockInfo.lockRelId.relId, blkno); ! return (LockAcquire(&tag, lockmode, false, true) != LOCKACQUIRE_NOT_AVAIL); } /* --- 368,374 ---- relation->rd_lockInfo.lockRelId.relId, blkno); ! return (LockAcquire(&tag, lockmode, false, true, INFINITE_TIMEOUT) != LOCKACQUIRE_NOT_AVAIL); } /* *************** LockTuple(Relation relation, ItemPointer *** 375,381 **** ItemPointerGetBlockNumber(tid), ItemPointerGetOffsetNumber(tid)); ! (void) LockAcquire(&tag, lockmode, false, false); } /* --- 405,411 ---- ItemPointerGetBlockNumber(tid), ItemPointerGetOffsetNumber(tid)); ! 
(void) LockAcquire(&tag, lockmode, false, false, INFINITE_TIMEOUT); } /* *************** ConditionalLockTuple(Relation relation, *** 395,401 **** ItemPointerGetBlockNumber(tid), ItemPointerGetOffsetNumber(tid)); ! return (LockAcquire(&tag, lockmode, false, true) != LOCKACQUIRE_NOT_AVAIL); } /* --- 425,451 ---- ItemPointerGetBlockNumber(tid), ItemPointerGetOffsetNumber(tid)); ! return (LockAcquire(&tag, lockmode, false, true, INFINITE_TIMEOUT) != LOCKACQUIRE_NOT_AVAIL); ! } ! ! /* ! * TimedLockTuple ! * ! * As above, but only lock if we can get the lock under lock_timeout. ! * Returns TRUE iff the lock was acquired. ! */ ! bool ! TimedLockTuple(Relation relation, ItemPointer tid, LOCKMODE lockmode) ! { ! LOCKTAG tag; ! ! SET_LOCKTAG_TUPLE(tag, ! relation->rd_lockInfo.lockRelId.dbId, ! relation->rd_lockInfo.lockRelId.relId, ! ItemPointerGetBlockNumber(tid), ! ItemPointerGetOffsetNumber(tid)); ! ! return (LockAcquire(&tag, lockmode, false, false, LockTimeout) != LOCKACQUIRE_NOT_AVAIL); } /* *************** XactLockTableInsert(TransactionId xid) *** 429,435 **** SET_LOCKTAG_TRANSACTION(tag, xid); ! (void) LockAcquire(&tag, ExclusiveLock, false, false); } /* --- 479,485 ---- SET_LOCKTAG_TRANSACTION(tag, xid); ! (void) LockAcquire(&tag, ExclusiveLock, false, false, INFINITE_TIMEOUT); } /* *************** XactLockTableWait(TransactionId xid) *** 473,479 **** SET_LOCKTAG_TRANSACTION(tag, xid); ! (void) LockAcquire(&tag, ShareLock, false, false); LockRelease(&tag, ShareLock, false); --- 523,529 ---- SET_LOCKTAG_TRANSACTION(tag, xid); ! (void) LockAcquire(&tag, ShareLock, false, false, INFINITE_TIMEOUT); LockRelease(&tag, ShareLock, false); *************** ConditionalXactLockTableWait(Transaction *** 501,507 **** SET_LOCKTAG_TRANSACTION(tag, xid); ! if (LockAcquire(&tag, ShareLock, false, true) == LOCKACQUIRE_NOT_AVAIL) return false; LockRelease(&tag, ShareLock, false); --- 551,589 ---- SET_LOCKTAG_TRANSACTION(tag, xid); ! 
if (LockAcquire(&tag, ShareLock, false, true, INFINITE_TIMEOUT) == LOCKACQUIRE_NOT_AVAIL) ! return false; ! ! LockRelease(&tag, ShareLock, false); ! ! if (!TransactionIdIsInProgress(xid)) ! break; ! xid = SubTransGetParent(xid); ! } ! ! return true; ! } ! ! ! /* ! * TimedXactLockTableWait ! * ! * As above, but only lock if we can get the lock under lock_timeout. ! * Returns TRUE if the lock was acquired. ! */ ! bool ! TimedXactLockTableWait(TransactionId xid) ! { ! LOCKTAG tag; ! ! for (;;) ! { ! Assert(TransactionIdIsValid(xid)); ! Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny())); ! ! SET_LOCKTAG_TRANSACTION(tag, xid); ! ! if (LockAcquire(&tag, ShareLock, false, false, LockTimeout) == LOCKACQUIRE_NOT_AVAIL) return false; LockRelease(&tag, ShareLock, false); *************** VirtualXactLockTableInsert(VirtualTransa *** 531,537 **** SET_LOCKTAG_VIRTUALTRANSACTION(tag, vxid); ! (void) LockAcquire(&tag, ExclusiveLock, false, false); } /* --- 613,619 ---- SET_LOCKTAG_VIRTUALTRANSACTION(tag, vxid); ! (void) LockAcquire(&tag, ExclusiveLock, false, false, INFINITE_TIMEOUT); } /* *************** VirtualXactLockTableWait(VirtualTransact *** 549,555 **** SET_LOCKTAG_VIRTUALTRANSACTION(tag, vxid); ! (void) LockAcquire(&tag, ShareLock, false, false); LockRelease(&tag, ShareLock, false); } --- 631,637 ---- SET_LOCKTAG_VIRTUALTRANSACTION(tag, vxid); ! (void) LockAcquire(&tag, ShareLock, false, false, INFINITE_TIMEOUT); LockRelease(&tag, ShareLock, false); } *************** ConditionalVirtualXactLockTableWait(Virt *** 569,575 **** SET_LOCKTAG_VIRTUALTRANSACTION(tag, vxid); ! if (LockAcquire(&tag, ShareLock, false, true) == LOCKACQUIRE_NOT_AVAIL) return false; LockRelease(&tag, ShareLock, false); --- 651,657 ---- SET_LOCKTAG_VIRTUALTRANSACTION(tag, vxid); ! 
if (LockAcquire(&tag, ShareLock, false, true, INFINITE_TIMEOUT) == LOCKACQUIRE_NOT_AVAIL) return false; LockRelease(&tag, ShareLock, false); *************** LockDatabaseObject(Oid classid, Oid obji *** 598,604 **** objid, objsubid); ! (void) LockAcquire(&tag, lockmode, false, false); } /* --- 680,686 ---- objid, objsubid); ! (void) LockAcquire(&tag, lockmode, false, false, INFINITE_TIMEOUT); } /* *************** LockSharedObject(Oid classid, Oid objid, *** 636,642 **** objid, objsubid); ! (void) LockAcquire(&tag, lockmode, false, false); /* Make sure syscaches are up-to-date with any changes we waited for */ AcceptInvalidationMessages(); --- 718,724 ---- objid, objsubid); ! (void) LockAcquire(&tag, lockmode, false, false, INFINITE_TIMEOUT); /* Make sure syscaches are up-to-date with any changes we waited for */ AcceptInvalidationMessages(); *************** LockSharedObjectForSession(Oid classid, *** 678,684 **** objid, objsubid); ! (void) LockAcquire(&tag, lockmode, true, false); } /* --- 760,766 ---- objid, objsubid); ! (void) LockAcquire(&tag, lockmode, true, false, INFINITE_TIMEOUT); } /* diff -dcrpN pgsql.orig/src/backend/storage/lmgr/lock.c pgsql/src/backend/storage/lmgr/lock.c *** pgsql.orig/src/backend/storage/lmgr/lock.c 2009-06-13 18:24:57.000000000 +0200 --- pgsql/src/backend/storage/lmgr/lock.c 2009-07-30 14:32:24.000000000 +0200 *************** PROCLOCK_PRINT(const char *where, const *** 254,260 **** static uint32 proclock_hash(const void *key, Size keysize); static void RemoveLocalLock(LOCALLOCK *locallock); static void GrantLockLocal(LOCALLOCK *locallock, ResourceOwner owner); ! 
static void WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner); static bool UnGrantLock(LOCK *lock, LOCKMODE lockmode, PROCLOCK *proclock, LockMethod lockMethodTable); static void CleanUpLock(LOCK *lock, PROCLOCK *proclock, --- 254,261 ---- static uint32 proclock_hash(const void *key, Size keysize); static void RemoveLocalLock(LOCALLOCK *locallock); static void GrantLockLocal(LOCALLOCK *locallock, ResourceOwner owner); ! static int WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, ! int lock_timeout); static bool UnGrantLock(LOCK *lock, LOCKMODE lockmode, PROCLOCK *proclock, LockMethod lockMethodTable); static void CleanUpLock(LOCK *lock, PROCLOCK *proclock, *************** LockAcquireResult *** 467,473 **** LockAcquire(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock, ! bool dontWait) { LOCKMETHODID lockmethodid = locktag->locktag_lockmethodid; LockMethod lockMethodTable; --- 468,475 ---- LockAcquire(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock, ! bool dontWait, ! int lock_timeout) { LOCKMETHODID lockmethodid = locktag->locktag_lockmethodid; LockMethod lockMethodTable; *************** LockAcquire(const LOCKTAG *locktag, *** 745,750 **** --- 747,754 ---- } else { + int wait_result; + Assert(status == STATUS_FOUND); /* *************** LockAcquire(const LOCKTAG *locktag, *** 794,800 **** locktag->locktag_type, lockmode); ! WaitOnLock(locallock, owner); TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field1, locktag->locktag_field2, --- 798,804 ---- locktag->locktag_type, lockmode); ! wait_result = WaitOnLock(locallock, owner, lock_timeout); TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field1, locktag->locktag_field2, *************** LockAcquire(const LOCKTAG *locktag, *** 813,818 **** --- 817,847 ---- * Check the proclock entry status, in case something in the ipc * communication doesn't work correctly. 
*/ + if (wait_result == STATUS_WAITING) + { + if (proclock->holdMask == 0) + { + SHMQueueDelete(&proclock->lockLink); + SHMQueueDelete(&proclock->procLink); + if (!hash_search_with_hash_value(LockMethodProcLockHash, + (void *) &(proclock->tag), + proclock_hashcode, + HASH_REMOVE, + NULL)) + elog(PANIC, "proclock table corrupted"); + } + else + PROCLOCK_PRINT("LockAcquire: TIMED OUT", proclock); + lock->nRequested--; + lock->requested[lockmode]--; + LOCK_PRINT("LockAcquire: TIMED OUT", lock, lockmode); + Assert((lock->nRequested > 0) && (lock->requested[lockmode] >= 0)); + Assert(lock->nGranted <= lock->nRequested); + LWLockRelease(partitionLock); + if (locallock->nLocks == 0) + RemoveLocalLock(locallock); + return LOCKACQUIRE_NOT_AVAIL; + } if (!(proclock->holdMask & LOCKBIT_ON(lockmode))) { PROCLOCK_PRINT("LockAcquire: INCONSISTENT", proclock); *************** GrantAwaitedLock(void) *** 1105,1118 **** * Caller must have set MyProc->heldLocks to reflect locks already held * on the lockable object by this process. * * The appropriate partition lock must be held at entry. */ ! static void ! WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner) { LOCKMETHODID lockmethodid = LOCALLOCK_LOCKMETHOD(*locallock); LockMethod lockMethodTable = LockMethods[lockmethodid]; char *volatile new_status = NULL; LOCK_PRINT("WaitOnLock: sleeping on lock", locallock->lock, locallock->tag.mode); --- 1134,1153 ---- * Caller must have set MyProc->heldLocks to reflect locks already held * on the lockable object by this process. * + * Result: (returns value of ProcSleep()) + * STATUS_OK if we acquired the lock + * STATUS_ERROR if not (deadlock) + * STATUS_WAITING if not (timeout) + * * The appropriate partition lock must be held at entry. */ ! static int ! 
WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, int lock_timeout) { LOCKMETHODID lockmethodid = LOCALLOCK_LOCKMETHOD(*locallock); LockMethod lockMethodTable = LockMethods[lockmethodid]; char *volatile new_status = NULL; + int wait_status; LOCK_PRINT("WaitOnLock: sleeping on lock", locallock->lock, locallock->tag.mode); *************** WaitOnLock(LOCALLOCK *locallock, Resourc *** 1154,1161 **** */ PG_TRY(); { ! if (ProcSleep(locallock, lockMethodTable) != STATUS_OK) { /* * We failed as a result of a deadlock, see CheckDeadLock(). Quit * now. --- 1189,1208 ---- */ PG_TRY(); { ! wait_status = ProcSleep(locallock, lockMethodTable, lock_timeout); ! switch (wait_status) { + case STATUS_OK: + /* Got lock */ + break; + case STATUS_WAITING: + /* + * We failed as a result of a timeout. Quit now. + */ + LOCK_PRINT("WaitOnLock: timeout on lock", + locallock->lock, locallock->tag.mode); + break; + default: /* * We failed as a result of a deadlock, see CheckDeadLock(). Quit * now. *************** WaitOnLock(LOCALLOCK *locallock, Resourc *** 1202,1207 **** --- 1249,1256 ---- LOCK_PRINT("WaitOnLock: wakeup on lock", locallock->lock, locallock->tag.mode); + + return wait_status; } /* diff -dcrpN pgsql.orig/src/backend/storage/lmgr/lwlock.c pgsql/src/backend/storage/lmgr/lwlock.c *** pgsql.orig/src/backend/storage/lmgr/lwlock.c 2009-01-02 17:15:28.000000000 +0100 --- pgsql/src/backend/storage/lmgr/lwlock.c 2009-07-30 10:45:39.000000000 +0200 *************** LWLockConditionalAcquire(LWLockId lockid *** 554,559 **** --- 554,756 ---- } /* + * LWLockTimedAcquire - acquire a lightweight lock in the specified mode + * + * If the lock is not available, sleep until it is or until lock_timeout + * whichever is sooner + * + * Side effect: cancel/die interrupts are held off until lock release. 
+ */ + bool + LWLockTimedAcquire(LWLockId lockid, LWLockMode mode) + { + volatile LWLock *lock = &(LWLockArray[lockid].lock); + PGPROC *proc = MyProc; + bool retry = false; + int extraWaits = 0; + bool timeout; + + PRINT_LWDEBUG("LWLockAcquire", lockid, lock); + + #ifdef LWLOCK_STATS + /* Set up local count state first time through in a given process */ + if (counts_for_pid != MyProcPid) + { + int *LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int)); + int numLocks = LWLockCounter[1]; + + sh_acquire_counts = calloc(numLocks, sizeof(int)); + ex_acquire_counts = calloc(numLocks, sizeof(int)); + block_counts = calloc(numLocks, sizeof(int)); + counts_for_pid = MyProcPid; + on_shmem_exit(print_lwlock_stats, 0); + } + /* Count lock acquisition attempts */ + if (mode == LW_EXCLUSIVE) + ex_acquire_counts[lockid]++; + else + sh_acquire_counts[lockid]++; + #endif /* LWLOCK_STATS */ + + /* + * We can't wait if we haven't got a PGPROC. This should only occur + * during bootstrap or shared memory initialization. Put an Assert here + * to catch unsafe coding practices. + */ + Assert(!(proc == NULL && IsUnderPostmaster)); + + /* Ensure we will have room to remember the lock */ + if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS) + elog(ERROR, "too many LWLocks taken"); + + /* + * Lock out cancel/die interrupts until we exit the code section protected + * by the LWLock. This ensures that interrupts will not interfere with + * manipulations of data structures in shared memory. + */ + HOLD_INTERRUPTS(); + + /* + * Loop here to try to acquire lock after each time we are signaled by + * LWLockRelease. + * + * NOTE: it might seem better to have LWLockRelease actually grant us the + * lock, rather than retrying and possibly having to go back to sleep. But + * in practice that is no good because it means a process swap for every + * lock acquisition when two or more processes are contending for the same + * lock. 
Since LWLocks are normally used to protect not-very-long + * sections of computation, a process needs to be able to acquire and + * release the same lock many times during a single CPU time slice, even + * in the presence of contention. The efficiency of being able to do that + * outweighs the inefficiency of sometimes wasting a process dispatch + * cycle because the lock is not free when a released waiter finally gets + * to run. See pgsql-hackers archives for 29-Dec-01. + */ + for (;;) + { + bool mustwait; + + /* Acquire mutex. Time spent holding mutex should be short! */ + SpinLockAcquire(&lock->mutex); + + /* If retrying, allow LWLockRelease to release waiters again */ + if (retry) + lock->releaseOK = true; + + /* If I can get the lock, do so quickly. */ + if (mode == LW_EXCLUSIVE) + { + if (lock->exclusive == 0 && lock->shared == 0) + { + lock->exclusive++; + mustwait = false; + } + else + mustwait = true; + } + else + { + if (lock->exclusive == 0) + { + lock->shared++; + mustwait = false; + } + else + mustwait = true; + } + + if (!mustwait) + break; /* got the lock */ + + /* + * Add myself to wait queue. + * + * If we don't have a PGPROC structure, there's no way to wait. This + * should never occur, since MyProc should only be null during shared + * memory initialization. + */ + if (proc == NULL) + elog(PANIC, "cannot wait without a PGPROC structure"); + + proc->lwWaiting = true; + proc->lwExclusive = (mode == LW_EXCLUSIVE); + proc->lwWaitLink = NULL; + if (lock->head == NULL) + lock->head = proc; + else + lock->tail->lwWaitLink = proc; + lock->tail = proc; + + /* Can release the mutex now */ + SpinLockRelease(&lock->mutex); + + /* + * Wait until awakened. + * + * Since we share the process wait semaphore with the regular lock + * manager and ProcWaitForSignal, and we may need to acquire an LWLock + * while one of those is pending, it is possible that we get awakened + * for a reason other than being signaled by LWLockRelease. 
If so, + * loop back and wait again. Once we've gotten the LWLock, + * re-increment the sema by the number of additional signals received, + * so that the lock manager or signal manager will see the received + * signal when it next waits. + */ + LOG_LWDEBUG("LWLockAcquire", lockid, "waiting"); + + #ifdef LWLOCK_STATS + block_counts[lockid]++; + #endif + + TRACE_POSTGRESQL_LWLOCK_WAIT_START(lockid, mode); + + for (;;) + { + /* "false" means cannot accept cancel/die interrupt here. */ + timeout = !PGSemaphoreTimedLock(&proc->sem, false); + if (timeout) + break; + if (!proc->lwWaiting) + break; + extraWaits++; + } + + TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(lockid, mode); + + if (timeout) + { + LOG_LWDEBUG("LWLockTimedAcquire", lockid, "timed out"); + break; + } + + LOG_LWDEBUG("LWLockTimedAcquire", lockid, "awakened"); + + /* Now loop back and try to acquire lock again. */ + retry = true; + } + + /* We are done updating shared state of the lock itself. */ + SpinLockRelease(&lock->mutex); + + if (timeout) + goto out; + + TRACE_POSTGRESQL_LWLOCK_ACQUIRE(lockid, mode); + + /* Add lock to list of locks held by this backend */ + held_lwlocks[num_held_lwlocks++] = lockid; + + out: + /* + * Fix the process wait semaphore's count for any absorbed wakeups. 
+ */ + while (extraWaits-- > 0) + PGSemaphoreUnlock(&proc->sem); + + return !timeout; + } + + /* * LWLockRelease - release a previously acquired lock */ void diff -dcrpN pgsql.orig/src/backend/storage/lmgr/proc.c pgsql/src/backend/storage/lmgr/proc.c *** pgsql.orig/src/backend/storage/lmgr/proc.c 2009-06-13 18:24:57.000000000 +0200 --- pgsql/src/backend/storage/lmgr/proc.c 2009-07-30 11:35:54.000000000 +0200 *************** *** 50,55 **** --- 50,56 ---- /* GUC variables */ int DeadlockTimeout = 1000; int StatementTimeout = 0; + int LockTimeout = 0; bool log_lock_waits = false; /* Pointer to this process's PGPROC struct, if any */ *************** ProcQueueInit(PROC_QUEUE *queue) *** 717,723 **** * The lock table's partition lock must be held at entry, and will be held * at exit. * ! * Result: STATUS_OK if we acquired the lock, STATUS_ERROR if not (deadlock). * * ASSUME: that no one will fiddle with the queue until after * we release the partition lock. --- 718,727 ---- * The lock table's partition lock must be held at entry, and will be held * at exit. * ! * Result: ! * STATUS_OK if we acquired the lock ! * STATUS_ERROR if not (deadlock) ! * STATUS_WAITING if not (timeout) * * ASSUME: that no one will fiddle with the queue until after * we release the partition lock. *************** ProcQueueInit(PROC_QUEUE *queue) *** 728,734 **** * semaphore is normally zero, so when we try to acquire it, we sleep. */ int ! ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable) { LOCKMODE lockmode = locallock->tag.mode; LOCK *lock = locallock->lock; --- 732,738 ---- * semaphore is normally zero, so when we try to acquire it, we sleep. */ int ! ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, int lock_timeout) { LOCKMODE lockmode = locallock->tag.mode; LOCK *lock = locallock->lock; *************** ProcSleep(LOCALLOCK *locallock, LockMeth *** 889,895 **** */ do { ! 
PGSemaphoreLock(&MyProc->sem, true); /* * waitStatus could change from STATUS_WAITING to something else --- 893,902 ---- */ do { ! if (lock_timeout == INFINITE_TIMEOUT) ! PGSemaphoreLock(&MyProc->sem, true); ! else if (!PGSemaphoreTimedLock(&MyProc->sem, true)) ! break; /* * waitStatus could change from STATUS_WAITING to something else *************** ProcSleep(LOCALLOCK *locallock, LockMeth *** 906,912 **** { PGPROC *autovac = GetBlockingAutoVacuumPgproc(); ! LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); /* * Only do it if the worker is not working to protect against Xid --- 913,922 ---- { PGPROC *autovac = GetBlockingAutoVacuumPgproc(); ! if (lock_timeout == INFINITE_TIMEOUT) ! LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); ! else if (!LWLockTimedAcquire(ProcArrayLock, LW_EXCLUSIVE)) ! break; /* * Only do it if the worker is not working to protect against Xid diff -dcrpN pgsql.orig/src/backend/utils/adt/lockfuncs.c pgsql/src/backend/utils/adt/lockfuncs.c *** pgsql.orig/src/backend/utils/adt/lockfuncs.c 2009-01-02 17:15:30.000000000 +0100 --- pgsql/src/backend/utils/adt/lockfuncs.c 2009-07-30 11:40:19.000000000 +0200 *************** pg_advisory_lock_int8(PG_FUNCTION_ARGS) *** 337,343 **** SET_LOCKTAG_INT64(tag, key); ! (void) LockAcquire(&tag, ExclusiveLock, true, false); PG_RETURN_VOID(); } --- 337,343 ---- SET_LOCKTAG_INT64(tag, key); ! (void) LockAcquire(&tag, ExclusiveLock, true, false, INFINITE_TIMEOUT); PG_RETURN_VOID(); } *************** pg_advisory_lock_shared_int8(PG_FUNCTION *** 353,359 **** SET_LOCKTAG_INT64(tag, key); ! (void) LockAcquire(&tag, ShareLock, true, false); PG_RETURN_VOID(); } --- 353,359 ---- SET_LOCKTAG_INT64(tag, key); ! (void) LockAcquire(&tag, ShareLock, true, false, INFINITE_TIMEOUT); PG_RETURN_VOID(); } *************** pg_try_advisory_lock_int8(PG_FUNCTION_AR *** 372,378 **** SET_LOCKTAG_INT64(tag, key); ! 
res = LockAcquire(&tag, ExclusiveLock, true, true); PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL); } --- 372,378 ---- SET_LOCKTAG_INT64(tag, key); ! res = LockAcquire(&tag, ExclusiveLock, true, true, INFINITE_TIMEOUT); PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL); } *************** pg_try_advisory_lock_shared_int8(PG_FUNC *** 391,397 **** SET_LOCKTAG_INT64(tag, key); ! res = LockAcquire(&tag, ShareLock, true, true); PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL); } --- 391,397 ---- SET_LOCKTAG_INT64(tag, key); ! res = LockAcquire(&tag, ShareLock, true, true, INFINITE_TIMEOUT); PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL); } *************** pg_advisory_lock_int4(PG_FUNCTION_ARGS) *** 446,452 **** SET_LOCKTAG_INT32(tag, key1, key2); ! (void) LockAcquire(&tag, ExclusiveLock, true, false); PG_RETURN_VOID(); } --- 446,452 ---- SET_LOCKTAG_INT32(tag, key1, key2); ! (void) LockAcquire(&tag, ExclusiveLock, true, false, INFINITE_TIMEOUT); PG_RETURN_VOID(); } *************** pg_advisory_lock_shared_int4(PG_FUNCTION *** 463,469 **** SET_LOCKTAG_INT32(tag, key1, key2); ! (void) LockAcquire(&tag, ShareLock, true, false); PG_RETURN_VOID(); } --- 463,469 ---- SET_LOCKTAG_INT32(tag, key1, key2); ! (void) LockAcquire(&tag, ShareLock, true, false, INFINITE_TIMEOUT); PG_RETURN_VOID(); } *************** pg_try_advisory_lock_int4(PG_FUNCTION_AR *** 483,489 **** SET_LOCKTAG_INT32(tag, key1, key2); ! res = LockAcquire(&tag, ExclusiveLock, true, true); PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL); } --- 483,489 ---- SET_LOCKTAG_INT32(tag, key1, key2); ! res = LockAcquire(&tag, ExclusiveLock, true, true, INFINITE_TIMEOUT); PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL); } *************** pg_try_advisory_lock_shared_int4(PG_FUNC *** 503,509 **** SET_LOCKTAG_INT32(tag, key1, key2); ! res = LockAcquire(&tag, ShareLock, true, true); PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL); } --- 503,509 ---- SET_LOCKTAG_INT32(tag, key1, key2); ! 
res = LockAcquire(&tag, ShareLock, true, true, INFINITE_TIMEOUT); PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL); } diff -dcrpN pgsql.orig/src/backend/utils/misc/guc.c pgsql/src/backend/utils/misc/guc.c *** pgsql.orig/src/backend/utils/misc/guc.c 2009-07-23 14:40:12.000000000 +0200 --- pgsql/src/backend/utils/misc/guc.c 2009-07-30 10:08:25.000000000 +0200 *************** static struct config_int ConfigureNamesI *** 1539,1544 **** --- 1539,1554 ---- }, { + {"lock_timeout", PGC_USERSET, CLIENT_CONN_STATEMENT, + gettext_noop("Sets the maximum allowed timeout for any lock taken by a statement."), + gettext_noop("A value of 0 turns off the timeout."), + GUC_UNIT_MS + }, + &LockTimeout, + 0, 0, INT_MAX, NULL, NULL + }, + + { {"vacuum_freeze_min_age", PGC_USERSET, CLIENT_CONN_STATEMENT, gettext_noop("Minimum age at which VACUUM should freeze a table row."), NULL diff -dcrpN pgsql.orig/src/include/access/multixact.h pgsql/src/include/access/multixact.h *** pgsql.orig/src/include/access/multixact.h 2009-01-02 17:15:37.000000000 +0100 --- pgsql/src/include/access/multixact.h 2009-07-30 12:25:55.000000000 +0200 *************** extern bool MultiXactIdIsRunning(MultiXa *** 48,53 **** --- 48,54 ---- extern bool MultiXactIdIsCurrent(MultiXactId multi); extern void MultiXactIdWait(MultiXactId multi); extern bool ConditionalMultiXactIdWait(MultiXactId multi); + extern bool TimedMultiXactIdWait(MultiXactId multi); extern void MultiXactIdSetOldestMember(void); extern int GetMultiXactIdMembers(MultiXactId multi, TransactionId **xids); diff -dcrpN pgsql.orig/src/include/storage/lmgr.h pgsql/src/include/storage/lmgr.h *** pgsql.orig/src/include/storage/lmgr.h 2009-06-13 18:25:05.000000000 +0200 --- pgsql/src/include/storage/lmgr.h 2009-07-30 12:25:19.000000000 +0200 *************** extern void RelationInitLockInfo(Relatio *** 25,30 **** --- 25,31 ---- /* Lock a relation */ extern void LockRelationOid(Oid relid, LOCKMODE lockmode); extern bool ConditionalLockRelationOid(Oid relid, LOCKMODE 
lockmode); + extern bool TimedLockRelationOid(Oid relid, LOCKMODE lockmode); extern void UnlockRelationId(LockRelId *relid, LOCKMODE lockmode); extern void UnlockRelationOid(Oid relid, LOCKMODE lockmode); *************** extern void UnlockPage(Relation relation *** 48,53 **** --- 49,56 ---- extern void LockTuple(Relation relation, ItemPointer tid, LOCKMODE lockmode); extern bool ConditionalLockTuple(Relation relation, ItemPointer tid, LOCKMODE lockmode); + extern bool TimedLockTuple(Relation relation, ItemPointer tid, + LOCKMODE lockmode); extern void UnlockTuple(Relation relation, ItemPointer tid, LOCKMODE lockmode); /* Lock an XID (used to wait for a transaction to finish) */ *************** extern void XactLockTableInsert(Transact *** 55,60 **** --- 58,64 ---- extern void XactLockTableDelete(TransactionId xid); extern void XactLockTableWait(TransactionId xid); extern bool ConditionalXactLockTableWait(TransactionId xid); + extern bool TimedXactLockTableWait(TransactionId xid); /* Lock a VXID (used to wait for a transaction to finish) */ extern void VirtualXactLockTableInsert(VirtualTransactionId vxid); diff -dcrpN pgsql.orig/src/include/storage/lock.h pgsql/src/include/storage/lock.h *** pgsql.orig/src/include/storage/lock.h 2009-04-14 10:28:46.000000000 +0200 --- pgsql/src/include/storage/lock.h 2009-07-30 11:28:44.000000000 +0200 *************** extern uint32 LockTagHashCode(const LOCK *** 476,482 **** extern LockAcquireResult LockAcquire(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock, ! bool dontWait); extern bool LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock); extern void LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks); --- 476,483 ---- extern LockAcquireResult LockAcquire(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock, ! bool dontWait, ! 
int lock_timeout); extern bool LockRelease(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock); extern void LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks); diff -dcrpN pgsql.orig/src/include/storage/lwlock.h pgsql/src/include/storage/lwlock.h *** pgsql.orig/src/include/storage/lwlock.h 2009-03-04 10:27:30.000000000 +0100 --- pgsql/src/include/storage/lwlock.h 2009-07-30 11:36:44.000000000 +0200 *************** extern bool Trace_lwlocks; *** 92,97 **** --- 92,98 ---- extern LWLockId LWLockAssign(void); extern void LWLockAcquire(LWLockId lockid, LWLockMode mode); extern bool LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode); + extern bool LWLockTimedAcquire(LWLockId lockid, LWLockMode mode); extern void LWLockRelease(LWLockId lockid); extern void LWLockReleaseAll(void); extern bool LWLockHeldByMe(LWLockId lockid); diff -dcrpN pgsql.orig/src/include/storage/pg_sema.h pgsql/src/include/storage/pg_sema.h *** pgsql.orig/src/include/storage/pg_sema.h 2009-01-02 17:15:39.000000000 +0100 --- pgsql/src/include/storage/pg_sema.h 2009-07-30 10:36:19.000000000 +0200 *************** extern void PGSemaphoreUnlock(PGSemaphor *** 80,83 **** --- 80,86 ---- /* Lock a semaphore only if able to do so without blocking */ extern bool PGSemaphoreTryLock(PGSemaphore sema); + /* Lock a semaphore only if able to do so under the lock_timeout */ + extern bool PGSemaphoreTimedLock(PGSemaphore sema, bool interruptOK); + #endif /* PG_SEMA_H */ diff -dcrpN pgsql.orig/src/include/storage/proc.h pgsql/src/include/storage/proc.h *** pgsql.orig/src/include/storage/proc.h 2009-02-26 12:23:28.000000000 +0100 --- pgsql/src/include/storage/proc.h 2009-07-30 11:15:01.000000000 +0200 *************** typedef struct PROC_HDR *** 146,155 **** --- 146,158 ---- */ #define NUM_AUXILIARY_PROCS 3 + /* For checking LockTimeout */ + #define INFINITE_TIMEOUT 0 /* configurable options */ extern int DeadlockTimeout; extern int StatementTimeout; + extern int LockTimeout; extern bool 
log_lock_waits; extern volatile bool cancel_from_timeout; *************** extern bool HaveNFreeProcs(int n); *** 168,174 **** extern void ProcReleaseLocks(bool isCommit); extern void ProcQueueInit(PROC_QUEUE *queue); ! extern int ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable); extern PGPROC *ProcWakeup(PGPROC *proc, int waitStatus); extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock); extern void LockWaitCancel(void); --- 171,177 ---- extern void ProcReleaseLocks(bool isCommit); extern void ProcQueueInit(PROC_QUEUE *queue); ! extern int ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, int lock_timeout); extern PGPROC *ProcWakeup(PGPROC *proc, int waitStatus); extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock); extern void LockWaitCancel(void);
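A note on the PGSemaphoreTimedLock() declaration added to pg_sema.h above: on POSIX platforms the natural primitive for this is sem_timedwait(), which takes an absolute CLOCK_REALTIME deadline rather than a relative interval, so a millisecond lock_timeout has to be added to the current time first. The following is only a minimal sketch under that assumption, not code from the patch. deadline_after_ms() and timed_sem_lock() are hypothetical names, and the "0 means no timeout" convention simply mirrors INFINITE_TIMEOUT.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdbool.h>
#include <time.h>

/* Same convention as the patch: a timeout of 0 means "wait forever". */
#define INFINITE_TIMEOUT 0

/*
 * Hypothetical helper: turn "now + timeout_ms" into an absolute deadline,
 * carrying nanosecond overflow into the seconds field.
 */
struct timespec
deadline_after_ms(struct timespec now, int timeout_ms)
{
	struct timespec deadline;

	assert(timeout_ms > 0);

	deadline.tv_sec = now.tv_sec + timeout_ms / 1000;
	deadline.tv_nsec = now.tv_nsec + (long) (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L)
	{
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}
	return deadline;
}

/*
 * Hypothetical wrapper: returns true if the semaphore was acquired, false
 * otherwise.  For brevity, a timeout (ETIMEDOUT) and any other failure are
 * both reported as false; real code would distinguish them.
 */
bool
timed_sem_lock(sem_t *sem, int timeout_ms)
{
	struct timespec now;
	struct timespec deadline;
	int			rc;

	if (timeout_ms == INFINITE_TIMEOUT)
	{
		/* No timeout configured: plain blocking wait, retried on signals. */
		do
		{
			rc = sem_wait(sem);
		} while (rc < 0 && errno == EINTR);
		return rc == 0;
	}

	/* The deadline is computed once, before the retry loop. */
	if (clock_gettime(CLOCK_REALTIME, &now) != 0)
		return false;
	deadline = deadline_after_ms(now, timeout_ms);

	do
	{
		rc = sem_timedwait(sem, &deadline);
	} while (rc < 0 && errno == EINTR);

	return rc == 0;
}
```

Computing the deadline once, outside the EINTR retry loop, keeps the total wait bounded by the timeout even if signals interrupt the sleep repeatedly.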
Boszormenyi Zoltan wrote:
> Alvaro Herrera wrote:
>> Boszormenyi Zoltan wrote:
>>
>>> The vague consensus for syntax options was that the GUC
>>> 'lock_timeout' and WAIT [N] extension (wherever NOWAIT
>>> is allowed) both should be implemented.
>>>
>>> Behaviour would be that N seconds timeout should be
>>> applied to every lock that the statement would take.
>>
>> In http://archives.postgresql.org/message-id/291.1242053201@sss.pgh.pa.us
>> Tom argues that lock_timeout should be sufficient. I'm not sure what
>> does WAIT [N] buy
>
> Okay, we implemented only the lock_timeout GUC.
> Patch attached, hopefully in an acceptable form.
> Documentation included in the patch, lock_timeout
> works the same way as statement_timeout, takes
> value in milliseconds and 0 disables the timeout.
>
> Best regards,
> Zoltán Böszörményi

New patch attached. It has only been regenerated against current CVS, so it should apply cleanly.

--
Bible has answers for everything. Proof: "But let your communication be, Yea, yea; Nay, nay: for whatsoever is more than these cometh of evil." (Matthew 5:37) - basics of digital technology.
"May your kingdom come" - superficial description of plate tectonics ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH http://www.postgresql.at/ diff -dcrpN pgsql.ooscur/doc/src/sgml/config.sgml pgsql.locktimeout/doc/src/sgml/config.sgml *** pgsql.ooscur/doc/src/sgml/config.sgml 2009-08-26 10:19:48.000000000 +0200 --- pgsql.locktimeout/doc/src/sgml/config.sgml 2009-09-03 15:41:34.000000000 +0200 *************** COPY postgres_log FROM '/full/path/to/lo *** 4028,4033 **** --- 4028,4056 ---- </listitem> </varlistentry> + <varlistentry id="guc-lock-timeout" xreflabel="lock_timeout"> + <term><varname>lock_timeout</varname> (<type>integer</type>)</term> + <indexterm> + <primary><varname>lock_timeout</> configuration parameter</primary> + </indexterm> + <listitem> + <para> + Abort any statement that tries to lock any rows or tables and the lock + has to wait more than the specified number of milliseconds, starting + from the time the command arrives at the server from the client. + If <varname>log_min_error_statement</> is set to <literal>ERROR</> or + lower, the statement that timed out will also be logged. + A value of zero (the default) turns off the limitation. + </para> + + <para> + Setting <varname>lock_timeout</> in + <filename>postgresql.conf</> is not recommended because it + affects all sessions. 
+ </para> + </listitem> + </varlistentry> + <varlistentry id="guc-vacuum-freeze-table-age" xreflabel="vacuum_freeze_table_age"> <term><varname>vacuum_freeze_table_age</varname> (<type>integer</type>)</term> <indexterm> diff -dcrpN pgsql.ooscur/doc/src/sgml/ref/lock.sgml pgsql.locktimeout/doc/src/sgml/ref/lock.sgml *** pgsql.ooscur/doc/src/sgml/ref/lock.sgml 2009-01-16 11:44:56.000000000 +0100 --- pgsql.locktimeout/doc/src/sgml/ref/lock.sgml 2009-09-03 15:41:34.000000000 +0200 *************** where <replaceable class="PARAMETER">loc *** 39,46 **** <literal>NOWAIT</literal> is specified, <command>LOCK TABLE</command> does not wait to acquire the desired lock: if it cannot be acquired immediately, the command is aborted and an ! error is emitted. Once obtained, the lock is held for the ! remainder of the current transaction. (There is no <command>UNLOCK TABLE</command> command; locks are always released at transaction end.) </para> --- 39,49 ---- <literal>NOWAIT</literal> is specified, <command>LOCK TABLE</command> does not wait to acquire the desired lock: if it cannot be acquired immediately, the command is aborted and an ! error is emitted. If <varname>lock_timeout</varname> is set to a value ! higher than 0, and the lock cannot be acquired under the specified ! timeout value in milliseconds, the command is aborted and an error ! is emitted. Once obtained, the lock is held for the remainder of ! the current transaction. (There is no <command>UNLOCK TABLE</command> command; locks are always released at transaction end.) 
</para> diff -dcrpN pgsql.ooscur/doc/src/sgml/ref/select.sgml pgsql.locktimeout/doc/src/sgml/ref/select.sgml *** pgsql.ooscur/doc/src/sgml/ref/select.sgml 2009-08-31 12:55:43.000000000 +0200 --- pgsql.locktimeout/doc/src/sgml/ref/select.sgml 2009-09-03 15:41:34.000000000 +0200 *************** FOR SHARE [ OF <replaceable class="param *** 1109,1114 **** --- 1109,1122 ---- </para> <para> + If <literal>NOWAIT</> option is not specified and <varname>lock_timeout</varname> + is set to a value higher than 0, and the lock needs to wait more than + the specified value in milliseconds, the command reports an error after + timing out, rather than waiting indefinitely. The note in the previous + paragraph applies to the <varname>lock_timeout</varname>, too. + </para> + + <para> <literal>FOR SHARE</literal> behaves similarly, except that it acquires a shared rather than exclusive lock on each retrieved row. A shared lock blocks other transactions from performing diff -dcrpN pgsql.ooscur/src/backend/access/heap/heapam.c pgsql.locktimeout/src/backend/access/heap/heapam.c *** pgsql.ooscur/src/backend/access/heap/heapam.c 2009-08-24 15:58:20.000000000 +0200 --- pgsql.locktimeout/src/backend/access/heap/heapam.c 2009-09-03 15:41:34.000000000 +0200 *************** l3: *** 3139,3154 **** */ if (!have_tuple_lock) { if (nowait) ! { ! if (!ConditionalLockTuple(relation, tid, tuple_lock_type)) ! ereport(ERROR, (errcode(ERRCODE_LOCK_NOT_AVAILABLE), errmsg("could not obtain lock on row in relation \"%s\"", RelationGetRelationName(relation)))); ! } ! else ! LockTuple(relation, tid, tuple_lock_type); have_tuple_lock = true; } --- 3139,3157 ---- */ if (!have_tuple_lock) { + bool result; + if (nowait) ! result = ConditionalLockTuple(relation, tid, tuple_lock_type); ! else ! result = TimedLockTuple(relation, tid, tuple_lock_type); ! ! if (!result) ! 
ereport(ERROR, (errcode(ERRCODE_LOCK_NOT_AVAILABLE), errmsg("could not obtain lock on row in relation \"%s\"", RelationGetRelationName(relation)))); ! have_tuple_lock = true; } *************** l3: *** 3169,3185 **** } else if (infomask & HEAP_XMAX_IS_MULTI) { /* wait for multixact to end */ if (nowait) ! { ! if (!ConditionalMultiXactIdWait((MultiXactId) xwait)) ! ereport(ERROR, (errcode(ERRCODE_LOCK_NOT_AVAILABLE), errmsg("could not obtain lock on row in relation \"%s\"", RelationGetRelationName(relation)))); - } - else - MultiXactIdWait((MultiXactId) xwait); LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE); --- 3172,3188 ---- } else if (infomask & HEAP_XMAX_IS_MULTI) { + bool result; /* wait for multixact to end */ if (nowait) ! result = ConditionalMultiXactIdWait((MultiXactId) xwait); ! else ! result = TimedMultiXactIdWait((MultiXactId) xwait); ! if (!result) ! ereport(ERROR, (errcode(ERRCODE_LOCK_NOT_AVAILABLE), errmsg("could not obtain lock on row in relation \"%s\"", RelationGetRelationName(relation)))); LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE); *************** l3: *** 3204,3220 **** } else { /* wait for regular transaction to end */ if (nowait) ! { ! if (!ConditionalXactLockTableWait(xwait)) ! ereport(ERROR, (errcode(ERRCODE_LOCK_NOT_AVAILABLE), errmsg("could not obtain lock on row in relation \"%s\"", RelationGetRelationName(relation)))); - } - else - XactLockTableWait(xwait); LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE); --- 3207,3223 ---- } else { + bool result; /* wait for regular transaction to end */ if (nowait) ! result = ConditionalXactLockTableWait(xwait); ! else ! result = TimedXactLockTableWait(xwait); ! if (!result) ! 
ereport(ERROR, (errcode(ERRCODE_LOCK_NOT_AVAILABLE), errmsg("could not obtain lock on row in relation \"%s\"", RelationGetRelationName(relation)))); LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE); diff -dcrpN pgsql.ooscur/src/backend/access/transam/multixact.c pgsql.locktimeout/src/backend/access/transam/multixact.c *** pgsql.ooscur/src/backend/access/transam/multixact.c 2009-07-13 11:16:21.000000000 +0200 --- pgsql.locktimeout/src/backend/access/transam/multixact.c 2009-09-03 15:41:34.000000000 +0200 *************** ConditionalMultiXactIdWait(MultiXactId m *** 636,641 **** --- 636,678 ---- } /* + * TimedMultiXactIdWait + * As above, but only lock if we can get the lock under lock_timeout. + */ + bool + TimedMultiXactIdWait(MultiXactId multi) + { + bool result = true; + TransactionId *members; + int nmembers; + + nmembers = GetMultiXactIdMembers(multi, &members); + + if (nmembers >= 0) + { + int i; + + for (i = 0; i < nmembers; i++) + { + TransactionId member = members[i]; + + debug_elog4(DEBUG2, "ConditionalMultiXactIdWait: trying %d (%u)", + i, member); + if (!TransactionIdIsCurrentTransactionId(member)) + { + result = TimedXactLockTableWait(member); + if (!result) + break; + } + } + + pfree(members); + } + + return result; + } + + /* * CreateMultiXactId * Make a new MultiXactId * diff -dcrpN pgsql.ooscur/src/backend/commands/lockcmds.c pgsql.locktimeout/src/backend/commands/lockcmds.c *** pgsql.ooscur/src/backend/commands/lockcmds.c 2009-06-13 18:24:48.000000000 +0200 --- pgsql.locktimeout/src/backend/commands/lockcmds.c 2009-09-03 15:41:34.000000000 +0200 *************** LockTableRecurse(Oid reloid, RangeVar *r *** 65,70 **** --- 65,71 ---- { Relation rel; AclResult aclresult; + bool result; /* * Acquire the lock. We must do this first to protect against concurrent *************** LockTableRecurse(Oid reloid, RangeVar *r *** 72,97 **** * won't fail. */ if (nowait) { ! if (!ConditionalLockRelationOid(reloid, lockmode)) ! { ! 
/* try to throw error by name; relation could be deleted... */ ! char *relname = rv ? rv->relname : get_rel_name(reloid); ! if (relname) ! ereport(ERROR, (errcode(ERRCODE_LOCK_NOT_AVAILABLE), errmsg("could not obtain lock on relation \"%s\"", relname))); ! else ! ereport(ERROR, (errcode(ERRCODE_LOCK_NOT_AVAILABLE), errmsg("could not obtain lock on relation with OID %u", reloid))); - } } - else - LockRelationOid(reloid, lockmode); /* * Now that we have the lock, check to see if the relation really exists --- 73,97 ---- * won't fail. */ if (nowait) + result = ConditionalLockRelationOid(reloid, lockmode); + else + result = TimedLockRelationOid(reloid, lockmode); + if (!result) { ! /* try to throw error by name; relation could be deleted... */ ! char *relname = rv ? rv->relname : get_rel_name(reloid); ! if (relname) ! ereport(ERROR, (errcode(ERRCODE_LOCK_NOT_AVAILABLE), errmsg("could not obtain lock on relation \"%s\"", relname))); ! else ! ereport(ERROR, (errcode(ERRCODE_LOCK_NOT_AVAILABLE), errmsg("could not obtain lock on relation with OID %u", reloid))); } /* * Now that we have the lock, check to see if the relation really exists diff -dcrpN pgsql.ooscur/src/backend/port/posix_sema.c pgsql.locktimeout/src/backend/port/posix_sema.c *** pgsql.ooscur/src/backend/port/posix_sema.c 2009-06-13 18:24:55.000000000 +0200 --- pgsql.locktimeout/src/backend/port/posix_sema.c 2009-09-03 15:41:34.000000000 +0200 *************** *** 24,29 **** --- 24,30 ---- #include "miscadmin.h" #include "storage/ipc.h" #include "storage/pg_sema.h" + #include "storage/proc.h" /* for LockTimeout */ #ifdef USE_NAMED_POSIX_SEMAPHORES *************** PGSemaphoreTryLock(PGSemaphore sema) *** 313,315 **** --- 314,359 ---- return true; } + + /* + * PGSemaphoreTimedLock + * + * Lock a semaphore only if able to do so under the lock_timeout + */ + bool + PGSemaphoreTimedLock(PGSemaphore sema, bool interruptOK) + { + int errStatus; + struct timespec timeout; + + + /* + * See notes in sysv_sema.c's 
implementation of PGSemaphoreLock. Just as + * that code does for semop(), we handle both the case where sem_wait() + * returns errno == EINTR after a signal, and the case where it just keeps + * waiting. + */ + do + { + ImmediateInterruptOK = interruptOK; + CHECK_FOR_INTERRUPTS(); + if (LockTimeout) + { + timeout.tv_sec = LockTimeout / 1000; + timeout.tv_nsec = (LockTimeout % 1000) * 1000000; + errStatus = sem_timedwait(PG_SEM_REF(sema), &timeout); + } + else + errStatus = sem_wait(PG_SEM_REF(sema)); + ImmediateInterruptOK = false; + } while (errStatus < 0 && errno == EINTR); + + if (errStatus < 0) + { + if (errno == ETIMEDOUT) + return false; /* failed to lock it */ + /* Otherwise we got trouble */ + elog(FATAL, "sem_wait failed: %m"); + } + return true; + } diff -dcrpN pgsql.ooscur/src/backend/port/sysv_sema.c pgsql.locktimeout/src/backend/port/sysv_sema.c *** pgsql.ooscur/src/backend/port/sysv_sema.c 2009-06-13 18:24:55.000000000 +0200 --- pgsql.locktimeout/src/backend/port/sysv_sema.c 2009-09-03 15:41:34.000000000 +0200 *************** *** 30,35 **** --- 30,36 ---- #include "miscadmin.h" #include "storage/ipc.h" #include "storage/pg_sema.h" + #include "storage/proc.h" /* for LockTimeout */ #ifndef HAVE_UNION_SEMUN *************** PGSemaphoreTryLock(PGSemaphore sema) *** 497,499 **** --- 498,590 ---- return true; } + + /* + * PGSemaphoreTimedLock + * + * Lock a semaphore only if able to do so under the lock_timeout + */ + bool + PGSemaphoreTimedLock(PGSemaphore sema, bool interruptOK) + { + int errStatus; + struct sembuf sops; + struct timespec timeout; + + sops.sem_op = -1; /* decrement */ + sops.sem_flg = 0; + sops.sem_num = sema->semNum; + + /* + * Note: if errStatus is -1 and errno == EINTR then it means we returned + * from the operation prematurely because we were sent a signal. So we + * try and lock the semaphore again. + * + * Each time around the loop, we check for a cancel/die interrupt. 
On + * some platforms, if such an interrupt comes in while we are waiting, it + * will cause the semop() call to exit with errno == EINTR, allowing us to + * service the interrupt (if not in a critical section already) during the + * next loop iteration. + * + * Once we acquire the lock, we do NOT check for an interrupt before + * returning. The caller needs to be able to record ownership of the lock + * before any interrupt can be accepted. + * + * There is a window of a few instructions between CHECK_FOR_INTERRUPTS + * and entering the semop() call. If a cancel/die interrupt occurs in + * that window, we would fail to notice it until after we acquire the lock + * (or get another interrupt to escape the semop()). We can avoid this + * problem by temporarily setting ImmediateInterruptOK to true before we + * do CHECK_FOR_INTERRUPTS; then, a die() interrupt in this interval will + * execute directly. However, there is a huge pitfall: there is another + * window of a few instructions after the semop() before we are able to + * reset ImmediateInterruptOK. If an interrupt occurs then, we'll lose + * control, which means that the lock has been acquired but our caller did + * not get a chance to record the fact. Therefore, we only set + * ImmediateInterruptOK if the caller tells us it's OK to do so, ie, the + * caller does not need to record acquiring the lock. (This is currently + * true for lockmanager locks, since the process that granted us the lock + * did all the necessary state updates. It's not true for SysV semaphores + * used to implement LW locks or emulate spinlocks --- but the wait time + * for such locks should not be very long, anyway.) + * + * On some platforms, signals marked SA_RESTART (which is most, for us) + * will not interrupt the semop(); it will just keep waiting. Therefore + * it's necessary for cancel/die interrupts to be serviced directly by the + * signal handler. 
On these platforms the behavior is really the same + * whether the signal arrives just before the semop() begins, or while it + * is waiting. The loop on EINTR is thus important only for other types + * of interrupts. + */ + do + { + ImmediateInterruptOK = interruptOK; + CHECK_FOR_INTERRUPTS(); + if (LockTimeout) + { + timeout.tv_sec = LockTimeout / 1000; + timeout.tv_nsec = (LockTimeout % 1000) * 1000000; + errStatus = semtimedop(sema->semId, &sops, 1, &timeout); + } + else + errStatus = semop(sema->semId, &sops, 1); + ImmediateInterruptOK = false; + } while (errStatus < 0 && errno == EINTR); + + if (errStatus < 0) + { + /* Expect EAGAIN or EWOULDBLOCK (platform-dependent) */ + #ifdef EAGAIN + if (errno == EAGAIN) + return false; /* failed to lock it */ + #endif + #if defined(EWOULDBLOCK) && (!defined(EAGAIN) || (EWOULDBLOCK != EAGAIN)) + if (errno == EWOULDBLOCK) + return false; /* failed to lock it */ + #endif + /* Otherwise we got trouble */ + elog(FATAL, "semop(id=%d) failed: %m", sema->semId); + } + return true; + } + diff -dcrpN pgsql.ooscur/src/backend/port/win32_sema.c pgsql.locktimeout/src/backend/port/win32_sema.c *** pgsql.ooscur/src/backend/port/win32_sema.c 2009-06-13 18:24:55.000000000 +0200 --- pgsql.locktimeout/src/backend/port/win32_sema.c 2009-09-03 15:41:34.000000000 +0200 *************** *** 16,21 **** --- 16,22 ---- #include "miscadmin.h" #include "storage/ipc.h" #include "storage/pg_sema.h" + #include "storage/proc.h" /* for LockTimeout */ static HANDLE *mySemSet; /* IDs of sema sets acquired so far */ static int numSems; /* number of sema sets acquired so far */ *************** PGSemaphoreTryLock(PGSemaphore sema) *** 205,207 **** --- 206,267 ---- /* keep compiler quiet */ return false; } + + /* + * PGSemaphoreTimedLock + * + * Lock a semaphore only if able to do so under the lock_timeout + * Serve the interrupt if interruptOK is true. 
+ */ + bool + PGSemaphoreTimedLock(PGSemaphore sema, bool interruptOK) + { + DWORD ret; + HANDLE wh[2]; + + wh[0] = *sema; + wh[1] = pgwin32_signal_event; + + /* + * As in other implementations of PGSemaphoreLock, we need to check for + * cancel/die interrupts each time through the loop. But here, there is + * no hidden magic about whether the syscall will internally service a + * signal --- we do that ourselves. + */ + do + { + ImmediateInterruptOK = interruptOK; + CHECK_FOR_INTERRUPTS(); + + errno = 0; + ret = WaitForMultipleObjectsEx(2, wh, FALSE, LockTimeout ? LockTimeout : INFINITE, TRUE); + + if (ret == WAIT_OBJECT_0) + { + /* We got it! */ + return true; + } + else if (ret == WAIT_TIMEOUT) + { + /* Can't get it */ + errno = EAGAIN; + return false; + } + else if (ret == WAIT_OBJECT_0 + 1) + { + /* Signal event is set - we have a signal to deliver */ + pgwin32_dispatch_queued_signals(); + errno = EINTR; + } + else + /* Otherwise we are in trouble */ + errno = EIDRM; + + ImmediateInterruptOK = false; + } while (errno == EINTR); + + if (errno != 0) + ereport(FATAL, + (errmsg("could not lock semaphore: error code %d", (int) GetLastError()))); + } + diff -dcrpN pgsql.ooscur/src/backend/storage/lmgr/lmgr.c pgsql.locktimeout/src/backend/storage/lmgr/lmgr.c *** pgsql.ooscur/src/backend/storage/lmgr/lmgr.c 2009-01-02 17:15:28.000000000 +0100 --- pgsql.locktimeout/src/backend/storage/lmgr/lmgr.c 2009-09-03 15:41:34.000000000 +0200 *************** *** 21,26 **** --- 21,27 ---- #include "catalog/catalog.h" #include "miscadmin.h" #include "storage/lmgr.h" + #include "storage/proc.h" #include "storage/procarray.h" #include "utils/inval.h" *************** LockRelationOid(Oid relid, LOCKMODE lock *** 76,82 **** SetLocktagRelationOid(&tag, relid); ! res = LockAcquire(&tag, lockmode, false, false); /* * Now that we have the lock, check for invalidation messages, so that we --- 77,83 ---- SetLocktagRelationOid(&tag, relid); ! 
res = LockAcquire(&tag, lockmode, false, false, INFINITE_TIMEOUT); /* * Now that we have the lock, check for invalidation messages, so that we *************** ConditionalLockRelationOid(Oid relid, LO *** 108,114 **** SetLocktagRelationOid(&tag, relid); ! res = LockAcquire(&tag, lockmode, false, true); if (res == LOCKACQUIRE_NOT_AVAIL) return false; --- 109,144 ---- SetLocktagRelationOid(&tag, relid); ! res = LockAcquire(&tag, lockmode, false, true, INFINITE_TIMEOUT); ! ! if (res == LOCKACQUIRE_NOT_AVAIL) ! return false; ! ! /* ! * Now that we have the lock, check for invalidation messages; see notes ! * in LockRelationOid. ! */ ! if (res != LOCKACQUIRE_ALREADY_HELD) ! AcceptInvalidationMessages(); ! ! return true; ! } ! ! /* ! * TimedLockRelationOid ! * ! * As LockRelationOid, but only lock if we can get the lock under lock_timeout. ! * Returns TRUE iff the lock was acquired. ! */ ! bool ! TimedLockRelationOid(Oid relid, LOCKMODE lockmode) ! { ! LOCKTAG tag; ! LockAcquireResult res; ! ! SetLocktagRelationOid(&tag, relid); ! ! res = LockAcquire(&tag, lockmode, false, false, LockTimeout); if (res == LOCKACQUIRE_NOT_AVAIL) return false; *************** LockRelation(Relation relation, LOCKMODE *** 171,177 **** relation->rd_lockInfo.lockRelId.dbId, relation->rd_lockInfo.lockRelId.relId); ! res = LockAcquire(&tag, lockmode, false, false); /* * Now that we have the lock, check for invalidation messages; see notes --- 201,207 ---- relation->rd_lockInfo.lockRelId.dbId, relation->rd_lockInfo.lockRelId.relId); ! res = LockAcquire(&tag, lockmode, false, false, INFINITE_TIMEOUT); /* * Now that we have the lock, check for invalidation messages; see notes *************** ConditionalLockRelation(Relation relatio *** 198,204 **** relation->rd_lockInfo.lockRelId.dbId, relation->rd_lockInfo.lockRelId.relId); !
res = LockAcquire(&tag, lockmode, false, true); if (res == LOCKACQUIRE_NOT_AVAIL) return false; --- 228,234 ---- relation->rd_lockInfo.lockRelId.dbId, relation->rd_lockInfo.lockRelId.relId); ! res = LockAcquire(&tag, lockmode, false, true, INFINITE_TIMEOUT); if (res == LOCKACQUIRE_NOT_AVAIL) return false; *************** LockRelationIdForSession(LockRelId *reli *** 250,256 **** SET_LOCKTAG_RELATION(tag, relid->dbId, relid->relId); ! (void) LockAcquire(&tag, lockmode, true, false); } /* --- 280,286 ---- SET_LOCKTAG_RELATION(tag, relid->dbId, relid->relId); ! (void) LockAcquire(&tag, lockmode, true, false, INFINITE_TIMEOUT); } /* *************** LockRelationForExtension(Relation relati *** 285,291 **** relation->rd_lockInfo.lockRelId.dbId, relation->rd_lockInfo.lockRelId.relId); ! (void) LockAcquire(&tag, lockmode, false, false); } /* --- 315,321 ---- relation->rd_lockInfo.lockRelId.dbId, relation->rd_lockInfo.lockRelId.relId); ! (void) LockAcquire(&tag, lockmode, false, false, INFINITE_TIMEOUT); } /* *************** LockPage(Relation relation, BlockNumber *** 319,325 **** relation->rd_lockInfo.lockRelId.relId, blkno); ! (void) LockAcquire(&tag, lockmode, false, false); } /* --- 349,355 ---- relation->rd_lockInfo.lockRelId.relId, blkno); ! (void) LockAcquire(&tag, lockmode, false, false, INFINITE_TIMEOUT); } /* *************** ConditionalLockPage(Relation relation, B *** 338,344 **** relation->rd_lockInfo.lockRelId.relId, blkno); ! return (LockAcquire(&tag, lockmode, false, true) != LOCKACQUIRE_NOT_AVAIL); } /* --- 368,374 ---- relation->rd_lockInfo.lockRelId.relId, blkno); ! return (LockAcquire(&tag, lockmode, false, true, INFINITE_TIMEOUT) != LOCKACQUIRE_NOT_AVAIL); } /* *************** LockTuple(Relation relation, ItemPointer *** 375,381 **** ItemPointerGetBlockNumber(tid), ItemPointerGetOffsetNumber(tid)); ! (void) LockAcquire(&tag, lockmode, false, false); } /* --- 405,411 ---- ItemPointerGetBlockNumber(tid), ItemPointerGetOffsetNumber(tid)); ! 
(void) LockAcquire(&tag, lockmode, false, false, INFINITE_TIMEOUT); } /* *************** ConditionalLockTuple(Relation relation, *** 395,401 **** ItemPointerGetBlockNumber(tid), ItemPointerGetOffsetNumber(tid)); ! return (LockAcquire(&tag, lockmode, false, true) != LOCKACQUIRE_NOT_AVAIL); } /* --- 425,451 ---- ItemPointerGetBlockNumber(tid), ItemPointerGetOffsetNumber(tid)); ! return (LockAcquire(&tag, lockmode, false, true, INFINITE_TIMEOUT) != LOCKACQUIRE_NOT_AVAIL); ! } ! ! /* ! * TimedLockTuple ! * ! * As above, but only lock if we can get the lock under lock_timeout. ! * Returns TRUE iff the lock was acquired. ! */ ! bool ! TimedLockTuple(Relation relation, ItemPointer tid, LOCKMODE lockmode) ! { ! LOCKTAG tag; ! ! SET_LOCKTAG_TUPLE(tag, ! relation->rd_lockInfo.lockRelId.dbId, ! relation->rd_lockInfo.lockRelId.relId, ! ItemPointerGetBlockNumber(tid), ! ItemPointerGetOffsetNumber(tid)); ! ! return (LockAcquire(&tag, lockmode, false, false, LockTimeout) != LOCKACQUIRE_NOT_AVAIL); } /* *************** XactLockTableInsert(TransactionId xid) *** 429,435 **** SET_LOCKTAG_TRANSACTION(tag, xid); ! (void) LockAcquire(&tag, ExclusiveLock, false, false); } /* --- 479,485 ---- SET_LOCKTAG_TRANSACTION(tag, xid); ! (void) LockAcquire(&tag, ExclusiveLock, false, false, INFINITE_TIMEOUT); } /* *************** XactLockTableWait(TransactionId xid) *** 473,479 **** SET_LOCKTAG_TRANSACTION(tag, xid); ! (void) LockAcquire(&tag, ShareLock, false, false); LockRelease(&tag, ShareLock, false); --- 523,529 ---- SET_LOCKTAG_TRANSACTION(tag, xid); ! (void) LockAcquire(&tag, ShareLock, false, false, INFINITE_TIMEOUT); LockRelease(&tag, ShareLock, false); *************** ConditionalXactLockTableWait(Transaction *** 501,507 **** SET_LOCKTAG_TRANSACTION(tag, xid); ! if (LockAcquire(&tag, ShareLock, false, true) == LOCKACQUIRE_NOT_AVAIL) return false; LockRelease(&tag, ShareLock, false); --- 551,589 ---- SET_LOCKTAG_TRANSACTION(tag, xid); ! 
if (LockAcquire(&tag, ShareLock, false, true, INFINITE_TIMEOUT) == LOCKACQUIRE_NOT_AVAIL) ! return false; ! ! LockRelease(&tag, ShareLock, false); ! ! if (!TransactionIdIsInProgress(xid)) ! break; ! xid = SubTransGetParent(xid); ! } ! ! return true; ! } ! ! ! /* ! * TimedXactLockTableWait ! * ! * As above, but only lock if we can get the lock under lock_timeout. ! * Returns TRUE if the lock was acquired. ! */ ! bool ! TimedXactLockTableWait(TransactionId xid) ! { ! LOCKTAG tag; ! ! for (;;) ! { ! Assert(TransactionIdIsValid(xid)); ! Assert(!TransactionIdEquals(xid, GetTopTransactionIdIfAny())); ! ! SET_LOCKTAG_TRANSACTION(tag, xid); ! ! if (LockAcquire(&tag, ShareLock, false, false, LockTimeout) == LOCKACQUIRE_NOT_AVAIL) return false; LockRelease(&tag, ShareLock, false); *************** VirtualXactLockTableInsert(VirtualTransa *** 531,537 **** SET_LOCKTAG_VIRTUALTRANSACTION(tag, vxid); ! (void) LockAcquire(&tag, ExclusiveLock, false, false); } /* --- 613,619 ---- SET_LOCKTAG_VIRTUALTRANSACTION(tag, vxid); ! (void) LockAcquire(&tag, ExclusiveLock, false, false, INFINITE_TIMEOUT); } /* *************** VirtualXactLockTableWait(VirtualTransact *** 549,555 **** SET_LOCKTAG_VIRTUALTRANSACTION(tag, vxid); ! (void) LockAcquire(&tag, ShareLock, false, false); LockRelease(&tag, ShareLock, false); } --- 631,637 ---- SET_LOCKTAG_VIRTUALTRANSACTION(tag, vxid); ! (void) LockAcquire(&tag, ShareLock, false, false, INFINITE_TIMEOUT); LockRelease(&tag, ShareLock, false); } *************** ConditionalVirtualXactLockTableWait(Virt *** 569,575 **** SET_LOCKTAG_VIRTUALTRANSACTION(tag, vxid); ! if (LockAcquire(&tag, ShareLock, false, true) == LOCKACQUIRE_NOT_AVAIL) return false; LockRelease(&tag, ShareLock, false); --- 651,657 ---- SET_LOCKTAG_VIRTUALTRANSACTION(tag, vxid); ! 
if (LockAcquire(&tag, ShareLock, false, true, INFINITE_TIMEOUT) == LOCKACQUIRE_NOT_AVAIL) return false; LockRelease(&tag, ShareLock, false); *************** LockDatabaseObject(Oid classid, Oid obji *** 598,604 **** objid, objsubid); ! (void) LockAcquire(&tag, lockmode, false, false); } /* --- 680,686 ---- objid, objsubid); ! (void) LockAcquire(&tag, lockmode, false, false, INFINITE_TIMEOUT); } /* *************** LockSharedObject(Oid classid, Oid objid, *** 636,642 **** objid, objsubid); ! (void) LockAcquire(&tag, lockmode, false, false); /* Make sure syscaches are up-to-date with any changes we waited for */ AcceptInvalidationMessages(); --- 718,724 ---- objid, objsubid); ! (void) LockAcquire(&tag, lockmode, false, false, INFINITE_TIMEOUT); /* Make sure syscaches are up-to-date with any changes we waited for */ AcceptInvalidationMessages(); *************** LockSharedObjectForSession(Oid classid, *** 678,684 **** objid, objsubid); ! (void) LockAcquire(&tag, lockmode, true, false); } /* --- 760,766 ---- objid, objsubid); ! (void) LockAcquire(&tag, lockmode, true, false, INFINITE_TIMEOUT); } /* diff -dcrpN pgsql.ooscur/src/backend/storage/lmgr/lock.c pgsql.locktimeout/src/backend/storage/lmgr/lock.c *** pgsql.ooscur/src/backend/storage/lmgr/lock.c 2009-06-13 18:24:57.000000000 +0200 --- pgsql.locktimeout/src/backend/storage/lmgr/lock.c 2009-09-03 15:41:34.000000000 +0200 *************** PROCLOCK_PRINT(const char *where, const *** 254,260 **** static uint32 proclock_hash(const void *key, Size keysize); static void RemoveLocalLock(LOCALLOCK *locallock); static void GrantLockLocal(LOCALLOCK *locallock, ResourceOwner owner); ! 
static void WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner); static bool UnGrantLock(LOCK *lock, LOCKMODE lockmode, PROCLOCK *proclock, LockMethod lockMethodTable); static void CleanUpLock(LOCK *lock, PROCLOCK *proclock, --- 254,261 ---- static uint32 proclock_hash(const void *key, Size keysize); static void RemoveLocalLock(LOCALLOCK *locallock); static void GrantLockLocal(LOCALLOCK *locallock, ResourceOwner owner); ! static int WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, ! int lock_timeout); static bool UnGrantLock(LOCK *lock, LOCKMODE lockmode, PROCLOCK *proclock, LockMethod lockMethodTable); static void CleanUpLock(LOCK *lock, PROCLOCK *proclock, *************** LockAcquireResult *** 467,473 **** LockAcquire(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock, ! bool dontWait) { LOCKMETHODID lockmethodid = locktag->locktag_lockmethodid; LockMethod lockMethodTable; --- 468,475 ---- LockAcquire(const LOCKTAG *locktag, LOCKMODE lockmode, bool sessionLock, ! bool dontWait, ! int lock_timeout) { LOCKMETHODID lockmethodid = locktag->locktag_lockmethodid; LockMethod lockMethodTable; *************** LockAcquire(const LOCKTAG *locktag, *** 745,750 **** --- 747,754 ---- } else { + int wait_result; + Assert(status == STATUS_FOUND); /* *************** LockAcquire(const LOCKTAG *locktag, *** 794,800 **** locktag->locktag_type, lockmode); ! WaitOnLock(locallock, owner); TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field1, locktag->locktag_field2, --- 798,804 ---- locktag->locktag_type, lockmode); ! wait_result = WaitOnLock(locallock, owner, lock_timeout); TRACE_POSTGRESQL_LOCK_WAIT_DONE(locktag->locktag_field1, locktag->locktag_field2, *************** LockAcquire(const LOCKTAG *locktag, *** 813,818 **** --- 817,847 ---- * Check the proclock entry status, in case something in the ipc * communication doesn't work correctly. 
*/ + if (wait_result == STATUS_WAITING) + { + if (proclock->holdMask == 0) + { + SHMQueueDelete(&proclock->lockLink); + SHMQueueDelete(&proclock->procLink); + if (!hash_search_with_hash_value(LockMethodProcLockHash, + (void *) &(proclock->tag), + proclock_hashcode, + HASH_REMOVE, + NULL)) + elog(PANIC, "proclock table corrupted"); + } + else + PROCLOCK_PRINT("LockAcquire: TIMED OUT", proclock); + lock->nRequested--; + lock->requested[lockmode]--; + LOCK_PRINT("LockAcquire: TIMED OUT", lock, lockmode); + Assert((lock->nRequested > 0) && (lock->requested[lockmode] >= 0)); + Assert(lock->nGranted <= lock->nRequested); + LWLockRelease(partitionLock); + if (locallock->nLocks == 0) + RemoveLocalLock(locallock); + return LOCKACQUIRE_NOT_AVAIL; + } if (!(proclock->holdMask & LOCKBIT_ON(lockmode))) { PROCLOCK_PRINT("LockAcquire: INCONSISTENT", proclock); *************** GrantAwaitedLock(void) *** 1105,1118 **** * Caller must have set MyProc->heldLocks to reflect locks already held * on the lockable object by this process. * * The appropriate partition lock must be held at entry. */ ! static void ! WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner) { LOCKMETHODID lockmethodid = LOCALLOCK_LOCKMETHOD(*locallock); LockMethod lockMethodTable = LockMethods[lockmethodid]; char *volatile new_status = NULL; LOCK_PRINT("WaitOnLock: sleeping on lock", locallock->lock, locallock->tag.mode); --- 1134,1153 ---- * Caller must have set MyProc->heldLocks to reflect locks already held * on the lockable object by this process. * + * Result: (returns value of ProcSleep()) + * STATUS_OK if we acquired the lock + * STATUS_ERROR if not (deadlock) + * STATUS_WAITING if not (timeout) + * * The appropriate partition lock must be held at entry. */ ! static int ! 
WaitOnLock(LOCALLOCK *locallock, ResourceOwner owner, int lock_timeout) { LOCKMETHODID lockmethodid = LOCALLOCK_LOCKMETHOD(*locallock); LockMethod lockMethodTable = LockMethods[lockmethodid]; char *volatile new_status = NULL; + int wait_status; LOCK_PRINT("WaitOnLock: sleeping on lock", locallock->lock, locallock->tag.mode); *************** WaitOnLock(LOCALLOCK *locallock, Resourc *** 1154,1161 **** */ PG_TRY(); { ! if (ProcSleep(locallock, lockMethodTable) != STATUS_OK) { /* * We failed as a result of a deadlock, see CheckDeadLock(). Quit * now. --- 1189,1208 ---- */ PG_TRY(); { ! wait_status = ProcSleep(locallock, lockMethodTable, lock_timeout); ! switch (wait_status) { + case STATUS_OK: + /* Got lock */ + break; + case STATUS_WAITING: + /* + * We failed as a result of a timeout. Quit now. + */ + LOCK_PRINT("WaitOnLock: timeout on lock", + locallock->lock, locallock->tag.mode); + break; + default: /* * We failed as a result of a deadlock, see CheckDeadLock(). Quit * now. *************** WaitOnLock(LOCALLOCK *locallock, Resourc *** 1202,1207 **** --- 1249,1256 ---- LOCK_PRINT("WaitOnLock: wakeup on lock", locallock->lock, locallock->tag.mode); + + return wait_status; } /* diff -dcrpN pgsql.ooscur/src/backend/storage/lmgr/lwlock.c pgsql.locktimeout/src/backend/storage/lmgr/lwlock.c *** pgsql.ooscur/src/backend/storage/lmgr/lwlock.c 2009-01-02 17:15:28.000000000 +0100 --- pgsql.locktimeout/src/backend/storage/lmgr/lwlock.c 2009-09-03 15:41:34.000000000 +0200 *************** LWLockConditionalAcquire(LWLockId lockid *** 554,559 **** --- 554,756 ---- } /* + * LWLockTimedAcquire - acquire a lightweight lock in the specified mode + * + * If the lock is not available, sleep until it is or until lock_timeout + * whichever is sooner + * + * Side effect: cancel/die interrupts are held off until lock release. 
+ */ + bool + LWLockTimedAcquire(LWLockId lockid, LWLockMode mode) + { + volatile LWLock *lock = &(LWLockArray[lockid].lock); + PGPROC *proc = MyProc; + bool retry = false; + int extraWaits = 0; + bool timeout = false; + + PRINT_LWDEBUG("LWLockAcquire", lockid, lock); + + #ifdef LWLOCK_STATS + /* Set up local count state first time through in a given process */ + if (counts_for_pid != MyProcPid) + { + int *LWLockCounter = (int *) ((char *) LWLockArray - 2 * sizeof(int)); + int numLocks = LWLockCounter[1]; + + sh_acquire_counts = calloc(numLocks, sizeof(int)); + ex_acquire_counts = calloc(numLocks, sizeof(int)); + block_counts = calloc(numLocks, sizeof(int)); + counts_for_pid = MyProcPid; + on_shmem_exit(print_lwlock_stats, 0); + } + /* Count lock acquisition attempts */ + if (mode == LW_EXCLUSIVE) + ex_acquire_counts[lockid]++; + else + sh_acquire_counts[lockid]++; + #endif /* LWLOCK_STATS */ + + /* + * We can't wait if we haven't got a PGPROC. This should only occur + * during bootstrap or shared memory initialization. Put an Assert here + * to catch unsafe coding practices. + */ + Assert(!(proc == NULL && IsUnderPostmaster)); + + /* Ensure we will have room to remember the lock */ + if (num_held_lwlocks >= MAX_SIMUL_LWLOCKS) + elog(ERROR, "too many LWLocks taken"); + + /* + * Lock out cancel/die interrupts until we exit the code section protected + * by the LWLock. This ensures that interrupts will not interfere with + * manipulations of data structures in shared memory. + */ + HOLD_INTERRUPTS(); + + /* + * Loop here to try to acquire lock after each time we are signaled by + * LWLockRelease. + * + * NOTE: it might seem better to have LWLockRelease actually grant us the + * lock, rather than retrying and possibly having to go back to sleep. But + * in practice that is no good because it means a process swap for every + * lock acquisition when two or more processes are contending for the same + * lock.
Since LWLocks are normally used to protect not-very-long + * sections of computation, a process needs to be able to acquire and + * release the same lock many times during a single CPU time slice, even + * in the presence of contention. The efficiency of being able to do that + * outweighs the inefficiency of sometimes wasting a process dispatch + * cycle because the lock is not free when a released waiter finally gets + * to run. See pgsql-hackers archives for 29-Dec-01. + */ + for (;;) + { + bool mustwait; + + /* Acquire mutex. Time spent holding mutex should be short! */ + SpinLockAcquire(&lock->mutex); + + /* If retrying, allow LWLockRelease to release waiters again */ + if (retry) + lock->releaseOK = true; + + /* If I can get the lock, do so quickly. */ + if (mode == LW_EXCLUSIVE) + { + if (lock->exclusive == 0 && lock->shared == 0) + { + lock->exclusive++; + mustwait = false; + } + else + mustwait = true; + } + else + { + if (lock->exclusive == 0) + { + lock->shared++; + mustwait = false; + } + else + mustwait = true; + } + + if (!mustwait) + break; /* got the lock */ + + /* + * Add myself to wait queue. + * + * If we don't have a PGPROC structure, there's no way to wait. This + * should never occur, since MyProc should only be null during shared + * memory initialization. + */ + if (proc == NULL) + elog(PANIC, "cannot wait without a PGPROC structure"); + + proc->lwWaiting = true; + proc->lwExclusive = (mode == LW_EXCLUSIVE); + proc->lwWaitLink = NULL; + if (lock->head == NULL) + lock->head = proc; + else + lock->tail->lwWaitLink = proc; + lock->tail = proc; + + /* Can release the mutex now */ + SpinLockRelease(&lock->mutex); + + /* + * Wait until awakened. + * + * Since we share the process wait semaphore with the regular lock + * manager and ProcWaitForSignal, and we may need to acquire an LWLock + * while one of those is pending, it is possible that we get awakened + * for a reason other than being signaled by LWLockRelease. 
If so, + * loop back and wait again. Once we've gotten the LWLock, + * re-increment the sema by the number of additional signals received, + * so that the lock manager or signal manager will see the received + * signal when it next waits. + */ + LOG_LWDEBUG("LWLockAcquire", lockid, "waiting"); + + #ifdef LWLOCK_STATS + block_counts[lockid]++; + #endif + + TRACE_POSTGRESQL_LWLOCK_WAIT_START(lockid, mode); + + for (;;) + { + /* "false" means cannot accept cancel/die interrupt here. */ + timeout = !PGSemaphoreTimedLock(&proc->sem, false); + if (timeout) + break; + if (!proc->lwWaiting) + break; + extraWaits++; + } + + TRACE_POSTGRESQL_LWLOCK_WAIT_DONE(lockid, mode); + + if (timeout) + { + LOG_LWDEBUG("LWLockTimedAcquire", lockid, "timed out"); + break; + } + + LOG_LWDEBUG("LWLockTimedAcquire", lockid, "awakened"); + + /* Now loop back and try to acquire lock again. */ + retry = true; + } + + /* We are done updating shared state of the lock itself. */ + SpinLockRelease(&lock->mutex); + + if (timeout) + goto out; + + TRACE_POSTGRESQL_LWLOCK_ACQUIRE(lockid, mode); + + /* Add lock to list of locks held by this backend */ + held_lwlocks[num_held_lwlocks++] = lockid; + + out: + /* + * Fix the process wait semaphore's count for any absorbed wakeups. 
+ */ + while (extraWaits-- > 0) + PGSemaphoreUnlock(&proc->sem); + + return !timeout; + } + + /* * LWLockRelease - release a previously acquired lock */ void diff -dcrpN pgsql.ooscur/src/backend/storage/lmgr/proc.c pgsql.locktimeout/src/backend/storage/lmgr/proc.c *** pgsql.ooscur/src/backend/storage/lmgr/proc.c 2009-09-02 17:57:06.000000000 +0200 --- pgsql.locktimeout/src/backend/storage/lmgr/proc.c 2009-09-03 15:41:34.000000000 +0200 *************** *** 50,55 **** --- 50,56 ---- /* GUC variables */ int DeadlockTimeout = 1000; int StatementTimeout = 0; + int LockTimeout = 0; bool log_lock_waits = false; /* Pointer to this process's PGPROC struct, if any */ *************** ProcQueueInit(PROC_QUEUE *queue) *** 719,725 **** * The lock table's partition lock must be held at entry, and will be held * at exit. * ! * Result: STATUS_OK if we acquired the lock, STATUS_ERROR if not (deadlock). * * ASSUME: that no one will fiddle with the queue until after * we release the partition lock. --- 720,729 ---- * The lock table's partition lock must be held at entry, and will be held * at exit. * ! * Result: ! * STATUS_OK if we acquired the lock ! * STATUS_ERROR if not (deadlock) ! * STATUS_WAITING if not (timeout) * * ASSUME: that no one will fiddle with the queue until after * we release the partition lock. *************** ProcQueueInit(PROC_QUEUE *queue) *** 730,736 **** * semaphore is normally zero, so when we try to acquire it, we sleep. */ int ! ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable) { LOCKMODE lockmode = locallock->tag.mode; LOCK *lock = locallock->lock; --- 734,740 ---- * semaphore is normally zero, so when we try to acquire it, we sleep. */ int ! ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, int lock_timeout) { LOCKMODE lockmode = locallock->tag.mode; LOCK *lock = locallock->lock; *************** ProcSleep(LOCALLOCK *locallock, LockMeth *** 891,897 **** */ do { ! 
PGSemaphoreLock(&MyProc->sem, true); /* * waitStatus could change from STATUS_WAITING to something else --- 895,904 ---- */ do { ! if (lock_timeout == INFINITE_TIMEOUT) ! PGSemaphoreLock(&MyProc->sem, true); ! else if (!PGSemaphoreTimedLock(&MyProc->sem, true)) ! break; /* * waitStatus could change from STATUS_WAITING to something else *************** ProcSleep(LOCALLOCK *locallock, LockMeth *** 908,914 **** { PGPROC *autovac = GetBlockingAutoVacuumPgproc(); ! LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); /* * Only do it if the worker is not working to protect against Xid --- 915,924 ---- { PGPROC *autovac = GetBlockingAutoVacuumPgproc(); ! if (lock_timeout == INFINITE_TIMEOUT) ! LWLockAcquire(ProcArrayLock, LW_EXCLUSIVE); ! else if (!LWLockTimedAcquire(ProcArrayLock, LW_EXCLUSIVE)) ! break; /* * Only do it if the worker is not working to protect against Xid diff -dcrpN pgsql.ooscur/src/backend/utils/adt/lockfuncs.c pgsql.locktimeout/src/backend/utils/adt/lockfuncs.c *** pgsql.ooscur/src/backend/utils/adt/lockfuncs.c 2009-01-02 17:15:30.000000000 +0100 --- pgsql.locktimeout/src/backend/utils/adt/lockfuncs.c 2009-09-03 15:41:34.000000000 +0200 *************** pg_advisory_lock_int8(PG_FUNCTION_ARGS) *** 337,343 **** SET_LOCKTAG_INT64(tag, key); ! (void) LockAcquire(&tag, ExclusiveLock, true, false); PG_RETURN_VOID(); } --- 337,343 ---- SET_LOCKTAG_INT64(tag, key); ! (void) LockAcquire(&tag, ExclusiveLock, true, false, INFINITE_TIMEOUT); PG_RETURN_VOID(); } *************** pg_advisory_lock_shared_int8(PG_FUNCTION *** 353,359 **** SET_LOCKTAG_INT64(tag, key); ! (void) LockAcquire(&tag, ShareLock, true, false); PG_RETURN_VOID(); } --- 353,359 ---- SET_LOCKTAG_INT64(tag, key); ! (void) LockAcquire(&tag, ShareLock, true, false, INFINITE_TIMEOUT); PG_RETURN_VOID(); } *************** pg_try_advisory_lock_int8(PG_FUNCTION_AR *** 372,378 **** SET_LOCKTAG_INT64(tag, key); ! 
res = LockAcquire(&tag, ExclusiveLock, true, true); PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL); } --- 372,378 ---- SET_LOCKTAG_INT64(tag, key); ! res = LockAcquire(&tag, ExclusiveLock, true, true, INFINITE_TIMEOUT); PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL); } *************** pg_try_advisory_lock_shared_int8(PG_FUNC *** 391,397 **** SET_LOCKTAG_INT64(tag, key); ! res = LockAcquire(&tag, ShareLock, true, true); PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL); } --- 391,397 ---- SET_LOCKTAG_INT64(tag, key); ! res = LockAcquire(&tag, ShareLock, true, true, INFINITE_TIMEOUT); PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL); } *************** pg_advisory_lock_int4(PG_FUNCTION_ARGS) *** 446,452 **** SET_LOCKTAG_INT32(tag, key1, key2); ! (void) LockAcquire(&tag, ExclusiveLock, true, false); PG_RETURN_VOID(); } --- 446,452 ---- SET_LOCKTAG_INT32(tag, key1, key2); ! (void) LockAcquire(&tag, ExclusiveLock, true, false, INFINITE_TIMEOUT); PG_RETURN_VOID(); } *************** pg_advisory_lock_shared_int4(PG_FUNCTION *** 463,469 **** SET_LOCKTAG_INT32(tag, key1, key2); ! (void) LockAcquire(&tag, ShareLock, true, false); PG_RETURN_VOID(); } --- 463,469 ---- SET_LOCKTAG_INT32(tag, key1, key2); ! (void) LockAcquire(&tag, ShareLock, true, false, INFINITE_TIMEOUT); PG_RETURN_VOID(); } *************** pg_try_advisory_lock_int4(PG_FUNCTION_AR *** 483,489 **** SET_LOCKTAG_INT32(tag, key1, key2); ! res = LockAcquire(&tag, ExclusiveLock, true, true); PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL); } --- 483,489 ---- SET_LOCKTAG_INT32(tag, key1, key2); ! res = LockAcquire(&tag, ExclusiveLock, true, true, INFINITE_TIMEOUT); PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL); } *************** pg_try_advisory_lock_shared_int4(PG_FUNC *** 503,509 **** SET_LOCKTAG_INT32(tag, key1, key2); ! res = LockAcquire(&tag, ShareLock, true, true); PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL); } --- 503,509 ---- SET_LOCKTAG_INT32(tag, key1, key2); ! 
res = LockAcquire(&tag, ShareLock, true, true, INFINITE_TIMEOUT);
      PG_RETURN_BOOL(res != LOCKACQUIRE_NOT_AVAIL);
  }
diff -dcrpN pgsql.ooscur/src/backend/utils/misc/guc.c pgsql.locktimeout/src/backend/utils/misc/guc.c
*** pgsql.ooscur/src/backend/utils/misc/guc.c  2009-09-02 17:57:07.000000000 +0200
--- pgsql.locktimeout/src/backend/utils/misc/guc.c  2009-09-03 15:41:34.000000000 +0200
*************** static struct config_int ConfigureNamesI
*** 1546,1551 ****
--- 1546,1561 ----
      },
      {
+         {"lock_timeout", PGC_USERSET, CLIENT_CONN_STATEMENT,
+             gettext_noop("Sets the maximum allowed timeout for any lock taken by a statement."),
+             gettext_noop("A value of 0 turns off the timeout."),
+             GUC_UNIT_MS
+         },
+         &LockTimeout,
+         0, 0, INT_MAX, NULL, NULL
+     },
+
+     {
          {"vacuum_freeze_min_age", PGC_USERSET, CLIENT_CONN_STATEMENT,
              gettext_noop("Minimum age at which VACUUM should freeze a table row."),
              NULL
diff -dcrpN pgsql.ooscur/src/include/access/multixact.h pgsql.locktimeout/src/include/access/multixact.h
*** pgsql.ooscur/src/include/access/multixact.h  2009-01-02 17:15:37.000000000 +0100
--- pgsql.locktimeout/src/include/access/multixact.h  2009-09-03 15:41:34.000000000 +0200
*************** extern bool MultiXactIdIsRunning(MultiXa
*** 48,53 ****
--- 48,54 ----
  extern bool MultiXactIdIsCurrent(MultiXactId multi);
  extern void MultiXactIdWait(MultiXactId multi);
  extern bool ConditionalMultiXactIdWait(MultiXactId multi);
+ extern bool TimedMultiXactIdWait(MultiXactId multi);
  extern void MultiXactIdSetOldestMember(void);
  extern int GetMultiXactIdMembers(MultiXactId multi, TransactionId **xids);
diff -dcrpN pgsql.ooscur/src/include/storage/lmgr.h pgsql.locktimeout/src/include/storage/lmgr.h
*** pgsql.ooscur/src/include/storage/lmgr.h  2009-06-13 18:25:05.000000000 +0200
--- pgsql.locktimeout/src/include/storage/lmgr.h  2009-09-03 15:41:34.000000000 +0200
*************** extern void RelationInitLockInfo(Relatio
*** 25,30 ****
--- 25,31 ----
  /* Lock a relation */
  extern void LockRelationOid(Oid relid, LOCKMODE lockmode);
  extern bool ConditionalLockRelationOid(Oid relid, LOCKMODE lockmode);
+ extern bool TimedLockRelationOid(Oid relid, LOCKMODE lockmode);
  extern void UnlockRelationId(LockRelId *relid, LOCKMODE lockmode);
  extern void UnlockRelationOid(Oid relid, LOCKMODE lockmode);
*************** extern void UnlockPage(Relation relation
*** 48,53 ****
--- 49,56 ----
  extern void LockTuple(Relation relation, ItemPointer tid, LOCKMODE lockmode);
  extern bool ConditionalLockTuple(Relation relation, ItemPointer tid, LOCKMODE lockmode);
+ extern bool TimedLockTuple(Relation relation, ItemPointer tid,
+                      LOCKMODE lockmode);
  extern void UnlockTuple(Relation relation, ItemPointer tid, LOCKMODE lockmode);
  /* Lock an XID (used to wait for a transaction to finish) */
*************** extern void XactLockTableInsert(Transact
*** 55,60 ****
--- 58,64 ----
  extern void XactLockTableDelete(TransactionId xid);
  extern void XactLockTableWait(TransactionId xid);
  extern bool ConditionalXactLockTableWait(TransactionId xid);
+ extern bool TimedXactLockTableWait(TransactionId xid);
  /* Lock a VXID (used to wait for a transaction to finish) */
  extern void VirtualXactLockTableInsert(VirtualTransactionId vxid);
diff -dcrpN pgsql.ooscur/src/include/storage/lock.h pgsql.locktimeout/src/include/storage/lock.h
*** pgsql.ooscur/src/include/storage/lock.h  2009-04-14 10:28:46.000000000 +0200
--- pgsql.locktimeout/src/include/storage/lock.h  2009-09-03 15:41:34.000000000 +0200
*************** extern uint32 LockTagHashCode(const LOCK
*** 476,482 ****
  extern LockAcquireResult LockAcquire(const LOCKTAG *locktag,
              LOCKMODE lockmode,
              bool sessionLock,
!             bool dontWait);
  extern bool LockRelease(const LOCKTAG *locktag,
              LOCKMODE lockmode, bool sessionLock);
  extern void LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks);
--- 476,483 ----
  extern LockAcquireResult LockAcquire(const LOCKTAG *locktag,
              LOCKMODE lockmode,
              bool sessionLock,
!             bool dontWait,
!             int lock_timeout);
  extern bool LockRelease(const LOCKTAG *locktag,
              LOCKMODE lockmode, bool sessionLock);
  extern void LockReleaseAll(LOCKMETHODID lockmethodid, bool allLocks);
diff -dcrpN pgsql.ooscur/src/include/storage/lwlock.h pgsql.locktimeout/src/include/storage/lwlock.h
*** pgsql.ooscur/src/include/storage/lwlock.h  2009-03-04 10:27:30.000000000 +0100
--- pgsql.locktimeout/src/include/storage/lwlock.h  2009-09-03 15:41:34.000000000 +0200
*************** extern bool Trace_lwlocks;
*** 92,97 ****
--- 92,98 ----
  extern LWLockId LWLockAssign(void);
  extern void LWLockAcquire(LWLockId lockid, LWLockMode mode);
  extern bool LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode);
+ extern bool LWLockTimedAcquire(LWLockId lockid, LWLockMode mode);
  extern void LWLockRelease(LWLockId lockid);
  extern void LWLockReleaseAll(void);
  extern bool LWLockHeldByMe(LWLockId lockid);
diff -dcrpN pgsql.ooscur/src/include/storage/pg_sema.h pgsql.locktimeout/src/include/storage/pg_sema.h
*** pgsql.ooscur/src/include/storage/pg_sema.h  2009-01-02 17:15:39.000000000 +0100
--- pgsql.locktimeout/src/include/storage/pg_sema.h  2009-09-03 15:41:34.000000000 +0200
*************** extern void PGSemaphoreUnlock(PGSemaphor
*** 80,83 ****
--- 80,86 ----
  /* Lock a semaphore only if able to do so without blocking */
  extern bool PGSemaphoreTryLock(PGSemaphore sema);

+ /* Lock a semaphore only if able to do so under the lock_timeout */
+ extern bool PGSemaphoreTimedLock(PGSemaphore sema, bool interruptOK);
+
  #endif   /* PG_SEMA_H */
diff -dcrpN pgsql.ooscur/src/include/storage/proc.h pgsql.locktimeout/src/include/storage/proc.h
*** pgsql.ooscur/src/include/storage/proc.h  2009-09-02 17:57:08.000000000 +0200
--- pgsql.locktimeout/src/include/storage/proc.h  2009-09-03 15:41:34.000000000 +0200
*************** typedef struct PROC_HDR
*** 147,156 ****
--- 147,159 ----
   */
  #define NUM_AUXILIARY_PROCS 2
+ /* For checking LockTimeout */
+ #define INFINITE_TIMEOUT 0
  /* configurable options */
  extern int DeadlockTimeout;
  extern int StatementTimeout;
+ extern int LockTimeout;
  extern bool log_lock_waits;
  extern volatile bool cancel_from_timeout;
*************** extern bool HaveNFreeProcs(int n);
*** 169,175 ****
  extern void ProcReleaseLocks(bool isCommit);
  extern void ProcQueueInit(PROC_QUEUE *queue);
! extern int ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable);
  extern PGPROC *ProcWakeup(PGPROC *proc, int waitStatus);
  extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock);
  extern void LockWaitCancel(void);
--- 172,178 ----
  extern void ProcReleaseLocks(bool isCommit);
  extern void ProcQueueInit(PROC_QUEUE *queue);
! extern int ProcSleep(LOCALLOCK *locallock, LockMethod lockMethodTable, int lock_timeout);
  extern PGPROC *ProcWakeup(PGPROC *proc, int waitStatus);
  extern void ProcLockWakeup(LockMethod lockMethodTable, LOCK *lock);
  extern void LockWaitCancel(void);
On Thu, Sep 3, 2009 at 6:47 AM, Boszormenyi Zoltan <zb@cybertec.at> wrote:
I'm getting segfaults, built in 32 bit linux with gcc
bin/pg_ctl -D data start -l logfile -o "--lock_timeout=5"
Session 1:
jjanes=# begin;
BEGIN
jjanes=# select * from pgbench_branches where bid=3 for update;
 bid | bbalance | filler
-----+----------+--------
   3 | -3108950 |
(1 row)
Session 2:
jjanes=# select * from pgbench_branches where bid=3 for update;
ERROR: could not obtain lock on row in relation "pgbench_branches"
jjanes=# select * from pgbench_branches where bid=3 for update;
ERROR: could not obtain lock on row in relation "pgbench_branches"
jjanes=# select * from pgbench_branches where bid=3 for update;
ERROR: could not obtain lock on row in relation "pgbench_branches"
jjanes=# set lock_timeout = 0 ;
SET
jjanes=# select * from pgbench_branches where bid=3 for update;
<Session 2 is now blocked>
Session1:
jjanes=# commit;
<long pause>
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
I just realized I should have built with asserts turned on. I'll do that tomorrow, but don't want to delay this info until then, so I am sending it now.
Cheers,
Jeff
Boszormenyi Zoltan wrote:
>
> Okay, we implemented only the lock_timeout GUC.
> Patch attached, hopefully in an acceptable form.
> Documentation included in the patch, lock_timeout
> works the same way as statement_timeout, takes
> value in milliseconds and 0 disables the timeout.
>
> Best regards,
> Zoltán Böszörményi
>
New patch attached. It's only regenerated for current CVS
so it should apply cleanly.
On Thu, Sep 3, 2009 at 6:47 AM, Boszormenyi Zoltan <zb@cybertec.at> wrote:
I disagree with Tom on this point. *If* I was trying to implement a server policy, then sure, it should not be done by embedding the timeout in the SQL statement. But I don't think they want this to implement a server policy. (And if we do, why would we thump the poor victims that are waiting on the lock, rather than the rogue who decided to take a lock and then camp out on it?) The use case for WAIT [N] is not a server policy, but a UI policy. I have two ways to do this task. The preferred way needs to lock a row, but waiting for it may take too long. So if I can't get the lock within a reasonable time, I fall back on a less-preferred but still acceptable way of doing the task, one that doesn't need the lock. If we move to a new server, the appropriate value for the time out does not change, because the appropriate level is the concern of the UI and the end users, not the database server. This wouldn't be scattered all over the application, either. In my experience, if you have an application that could benefit from this, you might have 1 or 2 uses for WAIT [N] out of 1,000+ statements in the application. (From my perspective, if there were to be a WAIT [N] option, it could plug into the statement_timeout mechanism rather than the proposed lock_timeout mechanism.)
I think that if the use case for a GUC is to set it, run a single very specific statement, and then unset it, that is pretty clear evidence that this should not be a GUC in the first place.
Maybe I am biased in this because I am primarily thinking about how I would use such a feature, rather than how Hans-Juergen intends to use it, and maybe those uses differ. Hans-Juergen, could you describe your use case a little bit more? Who is going to be getting these time-out errors, the queries run by the web-app, or longer running back-office queries? And when they do get an error, what will they do about it?
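The fallback described above (try the preferred, lock-taking path for a bounded time, otherwise take the lock-free path) can be sketched in C roughly as follows. This is only an illustration and not code from the patch: `try_lock_fn`, `choose_path` and `stub_try_lock` are hypothetical names, and the stub merely simulates a lock that becomes free after a given number of milliseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical callback: tries to take the row lock, waiting at most
 * timeout_ms milliseconds; returns true if the lock was obtained.
 */
typedef bool (*try_lock_fn) (int timeout_ms, void *ctx);

typedef enum
{
    TASK_PREFERRED,             /* path that needs the row lock */
    TASK_FALLBACK               /* acceptable path that needs no lock */
} task_path;

/* Use the preferred path if the lock arrives in time, else fall back. */
task_path
choose_path(try_lock_fn try_lock, void *ctx, int ui_timeout_ms)
{
    assert(try_lock != NULL);
    assert(ui_timeout_ms >= 0);

    if (try_lock(ui_timeout_ms, ctx))
        return TASK_PREFERRED;
    return TASK_FALLBACK;
}

/* Stub for illustration: the lock becomes free after *(int *) ctx ms. */
bool
stub_try_lock(int timeout_ms, void *ctx)
{
    int         free_after_ms = *(int *) ctx;

    return free_after_ms <= timeout_ms;
}
```

The timeout here is a property of the call site (the UI's patience), which is the point being argued: it would not change if the application moved to a different server.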
In addition to the previously mentioned seg-fault issues when attempting to use this feature (confirmed in another machine, linux, 64 bit, and --enable-cassert does not offer any help), I have some more concerns about the patch. From the docs:
doc/src/sgml/config.sgml
Abort any statement that tries to lock any rows or tables and the lock
has to wait more than the specified number of milliseconds, starting
from the time the command arrives at the server from the client.
If <varname>log_min_error_statement</> is set to <literal>ERROR</> or
lower, the statement that timed out will also be logged.
A value of zero (the default) turns off the limitation.
This suggests that all row locks will have this behavior. However, my experiments show that row locks attempted to be taken for ordinary UPDATE commands do not time out. If this is only intended to apply to SELECT ... FOR UPDATE, that should be documented here. It is documented elsewhere that this applies to SELECT ... FOR UPDATE, but it is not documented that these are the only row locks it applies to.
"from the time the command arrives at the server". I am pretty sure this is not the desired behavior, otherwise how does it differ from statement_timeout? I think it must be a copy and paste error for the doc.
For the implementation, I think the patch touches too much code. In particular, lwlock.c. Is the time spent waiting on ProcArrayLock significant enough that it needs all of that code to support timing it out? I don't think it should ever take more than a few microseconds to obtain that light-weight lock. And if we do want to time all of the light weight access, shouldn't those times be summed up, rather than timing out only if any single one of them exceeds the threshold in isolation? (That is my interpretation of how the code works currently, I could be wrong on that.)
If the seg-faults are fixed, I am still skeptical that this patch is acceptable, because the problem it solves seems to be poorly or incompletely specified.
Cheers,
Jeff
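The distinction raised above, between timing each wait in isolation and summing the waits, can be made concrete with a small C sketch. It assumes the reading given in the message (each wait is checked separately against the threshold); the function names are hypothetical and this is not the patch's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if any single wait, taken in isolation, exceeds the threshold. */
bool
times_out_per_wait(const int *waits_ms, size_t n, int timeout_ms)
{
    assert(timeout_ms > 0);

    for (size_t i = 0; i < n; i++)
    {
        if (waits_ms[i] > timeout_ms)
            return true;
    }
    return false;
}

/* True if the running total of all waits exceeds the threshold. */
bool
times_out_cumulative(const int *waits_ms, size_t n, int timeout_ms)
{
    long        total_ms = 0;

    assert(timeout_ms > 0);

    for (size_t i = 0; i < n; i++)
    {
        total_ms += waits_ms[i];
        if (total_ms > timeout_ms)
            return true;
    }
    return false;
}
```

With waits of 3, 4 and 2 ms and a 5 ms threshold, the per-wait check never fires, while the cumulative check fires at the second wait (3 + 4 = 7 ms).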
Boszormenyi Zoltan wrote:
> Alvaro Herrera wrote:
>
>> Boszormenyi Zoltan wrote:
>>
>>
>>
>>> The vague consensus for syntax options was that the GUC
>>> 'lock_timeout' and WAIT [N] extension (wherever NOWAIT
>>> is allowed) both should be implemented.
>>>
>>> Behaviour would be that N seconds timeout should be
>>> applied to every lock that the statement would take.
>>>
>>>
>> In http://archives.postgresql.org/message-id/291.1242053201@sss.pgh.pa.us
>> Tom argues that lock_timeout should be sufficient. I'm not sure what
>> does WAIT [N] buy
>>
>
> Okay, we implemented only the lock_timeout GUC.
> Patch attached, hopefully in an acceptable form.
> Documentation included in the patch, lock_timeout
> works the same way as statement_timeout, takes
> value in milliseconds and 0 disables the timeout.
>
> Best regards,
> Zoltán Böszörményi
>
New patch attached. It's only regenerated for current CVS
so it should apply cleanly.
On Sat, Sep 19, 2009 at 4:17 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Thu, Sep 3, 2009 at 6:47 AM, Boszormenyi Zoltan <zb@cybertec.at> wrote:
>> Boszormenyi Zoltan wrote:
>> > Alvaro Herrera wrote:
>> >> Boszormenyi Zoltan wrote:
>> >>> The vague consensus for syntax options was that the GUC
>> >>> 'lock_timeout' and WAIT [N] extension (wherever NOWAIT
>> >>> is allowed) both should be implemented.
>> >>>
>> >>> Behaviour would be that N seconds timeout should be
>> >>> applied to every lock that the statement would take.
>> >>>
>> >> In
>> >> http://archives.postgresql.org/message-id/291.1242053201@sss.pgh.pa.us
>> >> Tom argues that lock_timeout should be sufficient. I'm not sure what
>> >> does WAIT [N] buy
>
> I disagree with Tom on this point. *If* I was trying to implement a server
> policy, then sure, it should not be done by embedding the timeout in the SQL
> statement. But I don't think they want this to implement a server policy.
> (And if we do, why would we thump the poor victims that are waiting on the
> lock, rather than the rogue who decided to take a lock and then camp out on
> it?) The use case for WAIT [N] is not a server policy, but a UI policy. I
> have two ways to do this task. The preferred way needs to lock a row, but
> waiting for it may take too long. So if I can't get the lock within a
> reasonable time, I fall back on a less-preferred but still acceptable way of
> doing the task, one that doesn't need the lock. If we move to a new server,
> the appropriate value for the time out does not change, because the
> appropriate level is the concern of the UI and the end users, not the
> database server. This wouldn't be scattered all over the application,
> either. In my experience, if you have an application that could benefit
> from this, you might have 1 or 2 uses for WAIT [N] out of 1,000+ statements
> in the application. (From my perspective, if there were to be a WAIT [N]
> option, it could plug into the statement_timeout mechanism rather than the
> proposed lock_timeout mechanism.)
>
> I think that if the use case for a GUC is to set it, run a single very
> specific statement, and then unset it, that is pretty clear evidence that
> this should not be a GUC in the first place.

+1 to all of the above.

...Robert
Jeff Janes wrote:
> On Thu, Sep 3, 2009 at 6:47 AM, Boszormenyi Zoltan <zb@cybertec.at> wrote:
>> Boszormenyi Zoltan wrote:
>>>
>>> Okay, we implemented only the lock_timeout GUC.
>>> Patch attached, hopefully in an acceptable form.
>>> Documentation included in the patch, lock_timeout
>>> works the same way as statement_timeout, takes
>>> value in milliseconds and 0 disables the timeout.
>>>
>>> Best regards,
>>> Zoltán Böszörményi
>>>
>> New patch attached. It's only regenerated for current CVS
>> so it should apply cleanly.
>
> I'm getting segfaults, built in 32 bit linux with gcc
>
> bin/pg_ctl -D data start -l logfile -o "--lock_timeout=5"
>
> Session 1:
> jjanes=# begin;
> BEGIN
> jjanes=# select * from pgbench_branches where bid=3 for update;
>  bid | bbalance | filler
> -----+----------+--------
>    3 | -3108950 |
> (1 row)
>
> Session 2:
> jjanes=# select * from pgbench_branches where bid=3 for update;
> ERROR: could not obtain lock on row in relation "pgbench_branches"
> jjanes=# select * from pgbench_branches where bid=3 for update;
> ERROR: could not obtain lock on row in relation "pgbench_branches"
> jjanes=# select * from pgbench_branches where bid=3 for update;
> ERROR: could not obtain lock on row in relation "pgbench_branches"
> jjanes=# set lock_timeout = 0 ;
> SET
> jjanes=# select * from pgbench_branches where bid=3 for update;
>
> <Session 2 is now blocked>
>
> Session1:
> jjanes=# commit;
> <long pause>
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
> I just realized I should have built with asserts turned on. I'll do
> that tomorrow, but don't want to delay this info until then, so I am
> sending it now.
>
> Cheers,
>
> Jeff

Thanks for the test. The same test worked perfectly at the time
I posted it and it also works perfectly on 8.4.1 *now*.
So something has changed between then and the current CVS,
because I was able to reproduce the segfault with the current
CVS HEAD. We'll have to update the patch obviously...

Best regards,
Zoltán Böszörményi

--
Bible has answers for everything. Proof:
"But let your communication be, Yea, yea; Nay, nay: for whatsoever is more
than these cometh of evil." (Matthew 5:37) - basics of digital technology.
"May your kingdom come" - superficial description of plate tectonics
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/
Jeff Janes wrote:
> On Thu, Sep 3, 2009 at 6:47 AM, Boszormenyi Zoltan <zb@cybertec.at> wrote:
>> Boszormenyi Zoltan wrote:
>>> Alvaro Herrera wrote:
>>>> Boszormenyi Zoltan wrote:
>>>>> The vague consensus for syntax options was that the GUC
>>>>> 'lock_timeout' and WAIT [N] extension (wherever NOWAIT
>>>>> is allowed) both should be implemented.
>>>>>
>>>>> Behaviour would be that N seconds timeout should be
>>>>> applied to every lock that the statement would take.
>>>>>
>>>> In http://archives.postgresql.org/message-id/291.1242053201@sss.pgh.pa.us
>>>> Tom argues that lock_timeout should be sufficient. I'm not sure what
>>>> does WAIT [N] buy
>
> I disagree with Tom on this point. *If* I was trying to implement a
> server policy, then sure, it should not be done by embedding the
> timeout in the SQL statement. But I don't think they want this to
> implement a server policy. (And if we do, why would we thump the poor
> victims that are waiting on the lock, rather than the rogue who
> decided to take a lock and then camp out on it?) The use case for
> WAIT [N] is not a server policy, but a UI policy. I have two ways to
> do this task. The preferred way needs to lock a row, but waiting for
> it may take too long. So if I can't get the lock within a reasonable
> time, I fall back on a less-preferred but still acceptable way of
> doing the task, one that doesn't need the lock. If we move to a new
> server, the appropriate value for the time out does not change,
> because the appropriate level is the concern of the UI and the end
> users, not the database server. This wouldn't be scattered all over
> the application, either. In my experience, if you have an application
> that could benefit from this, you might have 1 or 2 uses for WAIT [N]
> out of 1,000+ statements in the application. (From my perspective, if
> there were to be a WAIT [N] option, it could plug into the
> statement_timeout mechanism rather than the proposed lock_timeout
> mechanism.)
>
> I think that if the use case for a GUC is to set it, run a single very
> specific statement, and then unset it, that is pretty clear evidence
> that this should not be a GUC in the first place.
>
> Maybe I am biased in this because I am primarily thinking about how I
> would use such a feature, rather than how Hans-Juergen intends to use
> it, and maybe those uses differ. Hans-Juergen, could you describe
> your use case a little bit more? Who is going to be getting these
> time-out errors, the queries run by the web-app, or longer running
> back-office queries? And when they do get an error, what will they do
> about it?

Our use case is to port a huge set of Informix apps
that use SET LOCK MODE TO WAIT N;
Apparently Tom Lane was of the opinion that
PostgreSQL won't need anything more in that regard.

In case the app gets an error, the query (transaction)
can be retried, the "when" can be user controlled.

I tried to argue on the SELECT ... WAIT N part as well,
but for our purposes currently the GUC is enough.

>>> Okay, we implemented only the lock_timeout GUC.
>>> Patch attached, hopefully in an acceptable form.
>>> Documentation included in the patch, lock_timeout
>>> works the same way as statement_timeout, takes
>>> value in milliseconds and 0 disables the timeout.
>>>
>>> Best regards,
>>> Zoltán Böszörményi
>>>
>> New patch attached. It's only regenerated for current CVS
>> so it should apply cleanly.
>
> In addition to the previously mentioned seg-fault issues when
> attempting to use this feature (confirmed in another machine, linux,
> 64 bit, and --enable-cassert does not offer any help), I have some
> more concerns about the patch. From the docs:
>
> doc/src/sgml/config.sgml
>
>     Abort any statement that tries to lock any rows or tables and the lock
>     has to wait more than the specified number of milliseconds, starting
>     from the time the command arrives at the server from the client.
>     If <varname>log_min_error_statement</> is set to <literal>ERROR</> or
>     lower, the statement that timed out will also be logged.
>     A value of zero (the default) turns off the limitation.
>
> This suggests that all row locks will have this behavior. However, my
> experiments show that row locks attempted to be taken for ordinary
> UPDATE commands do not time out. If this is only intended to apply to
> SELECT ... FOR UPDATE, that should be documented here. It is
> documented elsewhere that this applies to SELECT ... FOR UPDATE, but it
> is not documented that these are the only row locks it applies to.
>
> "from the time the command arrives at the server". I am pretty sure
> this is not the desired behavior, otherwise how does it differ from
> statement_timeout? I think it must be a copy and paste error for the doc.
>
> For the implementation, I think the patch touches too much code. In
> particular, lwlock.c. Is the time spent waiting on ProcArrayLock
> significant enough that it needs all of that code to support timing it
> out? I don't think it should ever take more than a few microseconds
> to obtain that light-weight lock. And if we do want to time all of
> the light weight access, shouldn't those times be summed up, rather
> than timing out only if any single one of them exceeds the threshold
> in isolation? (That is my interpretation of how the code works
> currently, I could be wrong on that.)

You seem to be right, it may not be needed.
The only callsite is ProcSleep() in storage/lmgr/proc.c
and PGSemaphoreTimedLock() was already waited on.

Thanks for the review.

> If the seg-faults are fixed, I am still skeptical that this patch is
> acceptable, because the problem it solves seems to be poorly or
> incompletely specified.
>
> Cheers,
>
> Jeff

--
Bible has answers for everything. Proof:
"But let your communication be, Yea, yea; Nay, nay: for whatsoever is more
than these cometh of evil." (Matthew 5:37) - basics of digital technology.
"May your kingdom come" - superficial description of plate tectonics
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/
>> I think that if the use case for a GUC is to set it, run a single very
>> specific statement, and then unset it, that is pretty clear evidence that
>> this should not be a GUC in the first place.

+1

Plus, do we really want another GUC?

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
On Mon, Sep 21, 2009 at 1:32 PM, Josh Berkus <josh@agliodbs.com> wrote:
>
>>> I think that if the use case for a GUC is to set it, run a single very
>>> specific statement, and then unset it, that is pretty clear evidence that
>>> this should not be a GUC in the first place.
>
> +1
>
> Plus, do we really want another GUC?

Well, I don't share the seemingly-popular sentiment that more GUCs are a bad thing. GUCs let you change important parameters of the application without compiling, which is very useful. Of course, I don't want:

- GUCs that I'm going to set, execute one statement, and then unset (and this likely falls into that category).
- GUCs that are poorly designed so that it's not clear, even to an experienced user, what value to set.
- GUCs that exist only to work around the inability of the database to figure out the appropriate value without user input.

On the flip side, rereading the thread, one major advantage of the GUC is that it can be used for statements other than SELECT, which hard-coded syntax can't. That might be enough to make me change my vote.

...Robert
Robert Haas wrote:
> Of course, I don't want:
>
> - GUCs that I'm going to set, execute one statement, and then unset
> (and this likely falls into that category).
> - GUCs that are poorly designed so that it's not clear, even to an
> experienced user, what value to set.
> - GUCs that exist only to work around the inability of the database to
> figure out the appropriate value without user input.
>
> On the flip side, rereading the thread, one major advantage of the GUC
> is that it can be used for statements other than SELECT, which
> hard-coded syntax can't. That might be enough to make me change my
> vote.

Perhaps we'd benefit from a way to set a variable for a single query; something like

WITH ( SET query_lock_timeout = 5s ) SELECT ...

Of course, this particular syntax doesn't work because WITH is already taken.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Mon, Sep 21, 2009 at 3:14 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Robert Haas wrote:
>
>> Of course, I don't want:
>>
>> - GUCs that I'm going to set, execute one statement, and then unset
>> (and this likely falls into that category).
>> - GUCs that are poorly designed so that it's not clear, even to an
>> experienced user, what value to set.
>> - GUCs that exist only to work around the inability of the database to
>> figure out the appropriate value without user input.
>>
>> On the flip side, rereading the thread, one major advantage of the GUC
>> is that it can be used for statements other than SELECT, which
>> hard-coded syntax can't. That might be enough to make me change my
>> vote.
>
> Perhaps we'd benefit from a way to set a variable for a single query;
> something like
>
> WITH ( SET query_lock_timeout = 5s ) SELECT ...
>
> Of course, this particular syntax doesn't work because WITH is already
> taken.

Yeah, I thought about that. I think that would be sweet. Maybe

LET (query_lock_timeout = 5 s) IN SELECT ...

...Robert
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Perhaps we'd benefit from a way to set a variable for a single query;

Yeah, particularly if it allows us to fend off requests for random one-off features to accomplish the same thing ...

> WITH ( SET query_lock_timeout = 5s ) SELECT ...

> Of course, this particular syntax doesn't work because WITH is already
> taken.

I think you could make it work if you really wanted, but perhaps a different keyword would be better.

regards, tom lane
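The idea of a setting that applies to exactly one query (set, run, restore) can be illustrated with a minimal C sketch. Everything here is hypothetical: `session_lock_timeout_ms` stands in for a session-level setting, and `run_with_lock_timeout` is an invented helper, not a PostgreSQL function or the proposed syntax.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a session setting such as lock_timeout (ms). */
int         session_lock_timeout_ms = 0;

/* Hypothetical "statement": any function run under the temporary value. */
typedef int (*statement_fn) (void *arg);

/* Run one statement with a temporary value, then restore the old one. */
int
run_with_lock_timeout(int tmp_ms, statement_fn stmt, void *arg)
{
    int         saved_ms = session_lock_timeout_ms;
    int         result;

    assert(stmt != NULL);
    assert(tmp_ms >= 0);

    session_lock_timeout_ms = tmp_ms;
    result = stmt(arg);
    session_lock_timeout_ms = saved_ms; /* the setting never leaks out */
    return result;
}

/* Example statement: reports the timeout in effect while it runs. */
int
report_timeout(void *arg)
{
    (void) arg;
    return session_lock_timeout_ms;
}
```

A real implementation would also have to restore the old value when the statement fails partway through, which this sketch does not model.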
On Mon, Sep 21, 2009 at 3:07 AM, Boszormenyi Zoltan <zb@cybertec.at> wrote:
> Jeff Janes wrote:
>>
>> Maybe I am biased in this because I am primarily thinking about how I
>> would use such a feature, rather than how Hans-Juergen intends to use
>> it, and maybe those uses differ. Hans-Juergen, could you describe
>> your use case a little bit more? Who is going to be getting these
>> time-out errors, the queries run by the web-app, or longer running
>> back-office queries? And when they do get an error, what will they do
>> about it?
>
> Our use case is to port a huge set of Informix apps,
> that use SET LOCK MODE TO WAIT N;
> Apparently Tom Lane was of the opinion that
> PostgreSQL won't need anything more in that regard.

Will statement_timeout not suffice for that use case?

I understand that they will do different things, but do not understand why those differences are important. Are there "invisible" deadlocks that need to be timed out, while long-running but not deadlocking queries need to not be timed out? I guess re-running a long-running query is never going to succeed unless the execution plan is improved, while rerunning a long-blocking query is expected to succeed eventually?

Cheers,

Jeff
Jeff,

> Will statement_timeout not suffice for that use case?

Well, currently statement_timeout doesn't affect waiting for locks.

And as a DBA, I don't think I'd want the same timeout for executing queries as for waiting for a lock.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
Josh Berkus <josh@agliodbs.com> writes:
> Jeff,
>> Will statement_timeout not suffice for that use case?

> Well, currently statement_timeout doesn't affect waiting for locks.

Sure it does.

> And as a DBA, I don't think I'd want the same timeout for executing
> queries as for waiting for a lock.

Well, that's exactly what Jeff is questioning. How big is the use-case for that exactly?

regards, tom lane
Tom,

> Well, that's exactly what Jeff is questioning. How big is the use-case
> for that exactly?

I think that it's not necessary to have a 2nd GUC, but for a different reason than argued.

For the applications I work on, I tend to set statement_timeout to something high designed just to catch runaway queries, like 2min or 5min (or 1 hour on data warehouses). Partly this is because statement_timeout is so indiscriminate, and I don't want to terminate queries I actually wanted to complete. If the lock time is included in the statement_timeout counter, even more so.

This would mean that I'd want a lock_timeout which was much shorter than the statement_timeout. However, I also stand by my statement that I don't think that a blanket per-server lock_timeout is that useful; you want the lock timeout to be based on how many locks you're waiting for, what the particular operation is, what the user is expecting, etc. And you need to send them a custom error message which explains the lock wait.

So, while some people have asserted that a lock_timeout GUC would allow users to retrofit older applications to time out on locks, I just don't see that being the case. You'd have to refactor regardless, and if you're going to, just add the WAIT statement to the lock request.

So, -1 from me on having a lock_timeout GUC for now.

However, I think this is another one worth taking an informal blog poll to reach users other than hackers, yes?

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
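The difference between the two timers discussed in this exchange can be sketched in C as a small timeline model. The sketch assumes that statement_timeout counts from the start of the statement and includes lock waits, and that a lock timeout would count only from the moment the lock wait begins (the behaviour the thread suspects is intended, not what the patch's documentation text says). The function and enum names are hypothetical, and the model stops once the lock is granted.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
    FIRED_NONE,
    FIRED_STATEMENT_TIMEOUT,
    FIRED_LOCK_TIMEOUT
} fired_timeout;

/*
 * All times are in milliseconds; a timeout of 0 means "disabled".
 * The statement runs for work_before_lock_ms, then would have to wait
 * lock_wait_ms for the lock.  Returns which timer fires first, if any.
 */
fired_timeout
first_timeout(int work_before_lock_ms, int lock_wait_ms,
              int statement_timeout_ms, int lock_timeout_ms)
{
    long long   lock_requested = work_before_lock_ms;
    long long   lock_granted = lock_requested + lock_wait_ms;
    long long   stmt_deadline = statement_timeout_ms;
    long long   lock_deadline = lock_requested + lock_timeout_ms;
    bool        stmt_fires;
    bool        lock_fires;

    assert(work_before_lock_ms >= 0 && lock_wait_ms >= 0);
    assert(statement_timeout_ms >= 0 && lock_timeout_ms >= 0);

    /* A timer fires only if enabled and due before the lock is granted. */
    stmt_fires = statement_timeout_ms > 0 && stmt_deadline < lock_granted;
    lock_fires = lock_timeout_ms > 0 && lock_deadline < lock_granted;

    if (stmt_fires && (!lock_fires || stmt_deadline <= lock_deadline))
        return FIRED_STATEMENT_TIMEOUT;
    if (lock_fires)
        return FIRED_LOCK_TIMEOUT;
    return FIRED_NONE;
}
```

Under these assumptions, a statement that computes for ten minutes and then waits two seconds for a lock is untouched by a 5 s lock timeout but is killed by a 2 min statement_timeout, while a statement that blocks on a lock almost immediately is caught by the lock timeout long before a generous statement_timeout would notice.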
Re: SELECT ... FOR UPDATE [WAIT integer | NOWAIT] for 8.5
From: Hans-Juergen Schoenig -- PostgreSQL
Tom Lane wrote:
> Josh Berkus <josh@agliodbs.com> writes:
>> Jeff,
>>> Will statement_timeout not suffice for that use case?
>> Well, currently statement_timeout doesn't affect waiting for locks.
> Sure it does.
>> And as a DBA, I don't think I'd want the same timeout for executing
>> queries as for waiting for a lock.

this is exactly the point: it is simply an additional use case.
while statement_timeout is perfect to kick out queries which take too long, a lock_timeout serves a totally different purpose because you will get a totally different error message. imagine some old 4GL terminal application: in this case you will hardly reach a statement_timeout because you will simply want to wait until things appear on your screen. however, you definitely don't want to wait forever if somebody keeps working on some product which is in stock and never finishes.

btw, this old terminal application i was talking about is exactly the use case we had - this is why this patch has been made. we are porting roughly 2500 terminal applications from informix to postgresql. we are talking about entire factory production lines and so on here (the ECPG patches posted recently are for the same project, btw.).
there are countless use cases where you want to know whether you are locked out or whether you are just taking too long - the message is totally different. the goal of the patch is to have a mechanism to make sure that you don't starve to death.

as far as syntax is concerned: there are good reasons for WAIT and good reasons for a GUC.
while the WAIT syntax is clearly a precise instruction for a specific place in a program, a GUC is more of an overall policy. i don't see a reason why we should not have both anyway.
a GUC has the charm that it can be assigned to roles, procedures, etc. nicely. a WAIT clause has the charm of being incredibly precise. i can see good arguments for both.
the code itself is pretty simplistic - it needs no effort to be kept up to date and it does not harm anything else - it is pretty isolated.

many thanks,

hans

-- 
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de
Re: SELECT ... FOR UPDATE [WAIT integer | NOWAIT] for 8.5
From: Hans-Juergen Schoenig -- PostgreSQL
Jeff Janes wrote: > Will statement_timeout not suffice for that use case? we tried to get around it without actually touching the core but we really need this functionality. patching the core here is not the primary desire we have. it is all about modeling some functionality which was truly missing. many thanks, hans -- Cybertec Schoenig & Schoenig GmbH Reyergasse 9 / 2 A-2700 Wiener Neustadt Web: www.postgresql-support.de
On Wed, 2009-09-23 at 10:58 -0700, Josh Berkus wrote: > So, while some people have asserted that a lock_timeout GUC would > allow > users to retrofit older applications to time out on locks, I just > don't > see that being the case. You'd have to refactor regardless, and if > you're going to, just add the WAIT statement to the lock request. But note that almost every statement contains a lock request of some kind. So you'd need to add a WAIT clause to every single statement type in PostgreSQL.
On Mon, Sep 21, 2009 at 6:07 AM, Boszormenyi Zoltan <zb@cybertec.at> wrote: > Jeff Janes írta: >> On Thu, Sep 3, 2009 at 6:47 AM, Boszormenyi Zoltan <zb@cybertec.at >> <mailto:zb@cybertec.at>> wrote: >> >> Boszormenyi Zoltan írta: >> > Alvaro Herrera írta: >> > >> >> Boszormenyi Zoltan wrote: >> >> >> >> >> >> >> >>> The vague consensus for syntax options was that the GUC >> >>> 'lock_timeout' and WAIT [N] extension (wherever NOWAIT >> >>> is allowed) both should be implemented. >> >>> >> >>> Behaviour would be that N seconds timeout should be >> >>> applied to every lock that the statement would take. >> >>> >> >>> >> >> In >> http://archives.postgresql.org/message-id/291.1242053201@sss.pgh.pa.us >> >> Tom argues that lock_timeout should be sufficient. I'm not >> sure what >> >> does WAIT [N] buy >> >> >> I disagree with Tom on this point. *If* I was trying to implement a >> server policy, then sure, it should not be done by embedding the >> timeout in the SQL statement. But I don't think they want this to >> implement a server policy. (And if we do, why would we thump the poor >> victims that are waiting on the lock, rather than the rogue who >> decided to take a lock and then camp out on it?) The use case for >> WAIT [N] is not a server policy, but a UI policy. I have two ways to >> do this task. The preferred way needs to lock a row, but waiting for >> it may take too long. So if I can't get the lock within a reasonable >> time, I fall back on a less-preferred but still acceptable way of >> doing the task, one that doesn't need the lock. If we move to a new >> server, the appropriate value for the time out does not change, >> because the appropriate level is the concern of the UI and the end >> users, not the database server. This wouldn't be scattered all over >> the application, either. In my experience, if you have an application >> that could benefit from this, you might have 1 or 2 uses for WAIT [N] >> out of 1,000+ statements in the application. 
(From my perspective, if >> there were to be a WAIT [N] option, it could plug into the >> statement_timeout mechanism rather than the proposed lock_timeout >> mechanism.) >> >> I think that if the use case for a GUC is to set it, run a single very >> specific statement, and then unset it, that is pretty clear evidence >> that this should not be a GUC in the first place. >> >> Maybe I am biased in this because I am primarily thinking about how I >> would use such a feature, rather than how Hans-Juergen intends to use >> it, and maybe those uses differ. Hans-Juergen, could you describe >> your use case a little bit more? Who do is going to be getting these >> time-out errors, the queries run by the web-app, or longer running >> back-office queries? And when they do get an error, what will they do >> about it? > > Our use case is to port a huge set of Informix apps, > that use SET LOCK MODE TO WAIT N; > Apparently Tom Lane was on the opinion that > PostgreSQL won't need anything more in that regard. > > In case the app gets an error, the query (transaction) > can be retried, the "when" can be user controlled. > > I tried to argue on the SELECT ... WAIT N part as well, > but for our purposes currently the GUC is enough. > >> > Okay, we implemented only the lock_timeout GUC. >> > Patch attached, hopefully in an acceptable form. >> > Documentation included in the patch, lock_timeout >> > works the same way as statement_timeout, takes >> > value in milliseconds and 0 disables the timeout. >> > >> > Best regards, >> > Zoltán Böszörményi >> > >> >> New patch attached. It's only regenerated for current CVS >> so it should apply cleanly. >> >> >> >> In addition to the previously mentioned seg-fault issues when >> attempting to use this feature (confirmed in another machine, linux, >> 64 bit, and --enable-cassert does not offer any help), I have some >> more concerns about the patch. 
From the docs: >> >> doc/src/sgml/config.sgml >> >> Abort any statement that tries to lock any rows or tables and >> the lock >> has to wait more than the specified number of milliseconds, >> starting >> from the time the command arrives at the server from the client. >> If <varname>log_min_error_statement</> is set to >> <literal>ERROR</> or >> lower, the statement that timed out will also be logged. >> A value of zero (the default) turns off the limitation. >> >> This suggests that all row locks will have this behavior. However, my >> experiments show that row locks attempted to be taken for ordinary >> UPDATE commands do not time out. If this is only intended to apply to >> SELECT .... FOR UPDATE, that should be documented here. It is >> documented elsewhere that this applies to SELECT...FOR UPDATE, but it >> is not documented that this the only row-locks it applies to. >> >> "from the time the command arrives at the server". I am pretty sure >> this is not the desired behavior, otherwise how does it differ from >> statement_timeout? I think it must be a copy and paste error for the doc. >> >> >> For the implementation, I think the patch touches too much code. In >> particular, lwlock.c. Is the time spent waiting on ProcArrayLock >> significant enough that it needs all of that code to support timing it >> out? I don't think it should ever take more than a few microseconds >> to obtain that light-weight lock. And if we do want to time all of >> the light weight access, shouldn't those times be summed up, rather >> than timing out only if any single one of them exceeds the threshold >> in isolation? (That is my interpretation of how the code works >> currently, I could be wrong on that.) > > You seem to be right, it may not be needed. > The only callsite is ProcSleep() in storage/lmgr/proc.c > and PGSemaphoreTimedLock() was already waited on. > Thanks for the review. 
> >>
>> If the seg-faults are fixed, I am still skeptical that this patch is
>> acceptable, because the problem it solves seems to be poorly or
>> incompletely specified.

So there are a couple of problems with this patch:

1. Do we want it at all?
2. Do we want it as a GUC or dedicated syntax?
3. Seg faults are bad.

As to #1, personally, I think it's quite useful. The arguments that have been made that lock_timeout is redundant with statement_timeout don't seem to me to have much merit. If I have a low-priority maintenance operation that runs in the background, it's perfectly reasonable for me to want it to die if it spends too long waiting on a lock. But to simulate that behavior with statement timeout, I have to benchmark every statement and then set the statement timeout for that statement individually, and it's still not really going to do what I want. The suggestion that these two are the same strikes me as akin to telling someone they don't need a scalpel because they already have a perfectly good hammer. In further support of this position, I note that Microsoft SQL Server, Oracle, and DB2 all have this feature. AFAICT from a quick Google search, MySQL does not.

As to #2, I was initially thinking dedicated syntax would be better because I hate "SET guc = value; do thing; SET guc = previous_value;". But now I'm realizing that there's every reason to suppose that SELECT FOR UPDATE will not be the only case where we want to do this - so I think a GUC is the only reasonable choice. But that having been said, I think some kind of syntax to set a GUC for just one statement would be way useful, per discussions downthread. However, that seems like it can and should be a separate patch.

As to #3, that's obviously gotta be fixed. If we're to further consider this patch for this CommitFest, that fixing needs to happen pretty soon.

...Robert
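The review quoted in this message asks whether waits should time out only when a single wait exceeds the threshold, or when the waits summed together do. The difference between the two policies can be sketched in plain C as below. This is not PostgreSQL code: the function names and the array of wait durations are made up for illustration, and a timeout of 0 is assumed to mean "disabled", as the patch's documentation describes for the GUC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Per-wait policy: report a timeout only if one individual wait
 * is longer than timeout_ms. A timeout of 0 disables the check.
 */
bool per_wait_timeout_hit(const long *waits_ms, size_t n, long timeout_ms)
{
    assert(timeout_ms >= 0);
    if (timeout_ms == 0)
        return false;
    for (size_t i = 0; i < n; i++)
    {
        if (waits_ms[i] > timeout_ms)
            return true;
    }
    return false;
}

/*
 * Cumulative policy: report a timeout as soon as the running total
 * of all waits is longer than timeout_ms. A timeout of 0 disables it.
 */
bool cumulative_timeout_hit(const long *waits_ms, size_t n, long timeout_ms)
{
    long total = 0;

    assert(timeout_ms >= 0);
    if (timeout_ms == 0)
        return false;
    for (size_t i = 0; i < n; i++)
    {
        total += waits_ms[i];
        if (total > timeout_ms)
            return true;
    }
    return false;
}
```

With three waits of 400 ms each and a 1000 ms limit, the per-wait policy never fires while the cumulative one does, which is the gap the reviewer is pointing at.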
Robert Haas <robertmhaas@gmail.com> writes: > As to #1, personally, I think it's quite useful. The arguments that > have been made that lock_timeout is redundant with statement_timeout > don't seem to me to have much merit. > ... > As to #2, I was initially thinking dedicated syntax would be better > because I hate "SET guc = value; do thing; SET guc = previous_value;". > But now I'm realizing that there's every reason to suppose that > SELECT FOR UPDATE will not be the only case where we want to do this - > so I think a GUC is the only reasonable choice. Yeah. I believe that a reasonable argument can be made for being able to limit lock waits separately from total execution time, but it is *not* clear to me why SELECT FOR UPDATE per-tuple waits should be the one single solitary place where that is useful. IIRC I was against the SELECT FOR UPDATE NOWAIT syntax to begin with, because of exactly this same reasoning. > But that having been > said, I think some kind of syntax to set a GUC for just one statement > would be way useful, per discussions downthread. However, that seems > like it can and should be a separate pach. Worth looking at. We do already have SET LOCAL, and the per-function GUC settings, but that may not be sufficient. regards, tom lane
On Sun, Sep 27, 2009 at 1:31 PM, Robert Haas <robertmhaas@gmail.com> wrote: > As to #3, that's obviously gotta be fixed. If we're to further > consider this patch for this CommitFest, that fixing needs to happen > pretty soon. Since it has been 6 days since I posted this and more than 2 weeks since the problem was found, I am moving this patch to returned with feedback. If it is resubmitted for the next CommitFest, please change the subject line to something like "lock_timeout GUC" so that it will match what the patch actually does. I think we have consensus that a GUC is the way to go here, and the feature seems to have enough support. Investigating a set-GUC-for-this-statement-only feature also seems to have some support, but that would be a separate patch and not necessary to satisfy the OP's use case. ...Robert
Hi,

and I am happy to present the newest patch. Current state is that it doesn't segfault and seems to work as expected. The combinations below were tested.

The lock type on the left side of the arrow is taken by transaction 1, then transaction 2 tries to take the lock on the right side of the arrow.

LOCK TABLE -> LOCK TABLE = xact 2 times out
LOCK TABLE -> SELECT FOR UPDATE = xact 2 times out
LOCK TABLE -> SELECT FOR SHARE = xact 2 times out
LOCK TABLE -> SELECT (no lock) = xact 2 times out
SELECT FOR UPDATE -> LOCK TABLE = xact 2 times out
SELECT FOR UPDATE -> SELECT FOR UPDATE = xact 2 times out
SELECT FOR UPDATE -> SELECT FOR SHARE = xact 2 times out
SELECT FOR UPDATE -> SELECT (no lock) = xact 2 returns record
SELECT FOR SHARE -> LOCK TABLE = xact 2 times out
SELECT FOR SHARE -> SELECT FOR UPDATE = xact 2 times out
SELECT FOR SHARE -> SELECT FOR SHARE = xact 2 returns record (+ UPDATE on xact 1 times out)
SELECT FOR SHARE -> SELECT (no lock) = xact 2 returns record
SELECT (no lock) -> LOCK TABLE = xact 2 times out
SELECT (no lock) -> SELECT FOR UPDATE = xact 2 returns record
SELECT (no lock) -> SELECT FOR SHARE = xact 2 returns record
SELECT (no lock) -> SELECT (no lock) = xact 2 returns record

Comments?

Best regards,
Zoltán Böszörményi

-- 
Bible has answers for everything. Proof:
"But let your communication be, Yea, yea; Nay, nay: for whatsoever is more than these cometh of evil." (Matthew 5:37) - basics of digital technology.
"May your kingdom come" - superficial description of plate tectonics
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/
On Mon, Jan 11, 2010 at 4:35 PM, Boszormenyi Zoltan <zb@cybertec.at> wrote:
> Hi,
>
> and I am happy to present the newest patch. Current state is
> that it doesn't segfault and seems to work as expected.
> These combinations below were tested.

one hunk failed when trying to apply it; i guess it's because of Tom's refactor of relcache.c

it's a simple fix so i will not bother anyone, patch attached

-- 
Sincerely,
Jaime Casanova
PostgreSQL support and training
Systems consulting and development
Guayaquil - Ecuador
Cel. +59387171157
Jaime Casanova <jcasanov@systemguards.com.ec> writes: > it has a hunk failed when trying to apply i guess it's because of > Tom's refactor of relcache.c If this patch is touching those parts of relcache.c, it probably needs rethinking. regards, tom lane
Tom Lane wrote:
> Jaime Casanova <jcasanov@systemguards.com.ec> writes:
>> it has a hunk failed when trying to apply i guess it's because of
>> Tom's refactor of relcache.c
>
> If this patch is touching those parts of relcache.c, it probably needs
> rethinking.
>
> regards, tom lane

The reject in my patch is because of this chunk in your change:

*************** load_critical_index(Oid indexoid)
*** 2836,2842 ****
      Relation    ird;

      LockRelationOid(indexoid, AccessShareLock);
!     ird = RelationBuildDesc(indexoid, NULL);
      if (ird == NULL)
          elog(PANIC, "could not open critical system index %u", indexoid);
      ird->rd_isnailed = true;
--- 2893,2899 ----
      Relation    ird;

      LockRelationOid(indexoid, AccessShareLock);
!     ird = RelationBuildDesc(indexoid, true);
      if (ird == NULL)
          elog(PANIC, "could not open critical system index %u", indexoid);
      ird->rd_isnailed = true;

What I did there is to check the return value of LockRelationOid() and also elog(PANIC) if the lock wasn't available. Does it need rethinking?

-- 
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/
2010/1/13 Boszormenyi Zoltan <zb@cybertec.at>:
> Tom Lane wrote:
>> If this patch is touching those parts of relcache.c, it probably needs
>> rethinking.
>
> What I did there is to check the return value of LockRelationOid()

the hunk failed because of a difference in position (i guess patch accepts a hunk of reasonable size, assuming there is something like a reasonable size for that), and it is not touching the same code as your refactor (sorry if i explain myself badly)

> and also elog(PANIC) if the lock wasn't available.
> Does it need rethinking?

well, i actually think that PANIC is too high for this...

-- 
Jaime Casanova
Jaime Casanova wrote:
> 2010/1/13 Boszormenyi Zoltan <zb@cybertec.at>:
>> Tom Lane wrote:
>>> If this patch is touching those parts of relcache.c, it probably needs
>>> rethinking.
>> What I did there is to check the return value of LockRelationOid()
>
> the hunk was because a diference in the position (i guess patch accept
> a hunk of reasonable size, assuming there is something like a
> reasonable size for that)
>
> and is not touching the same as your refactor (sorry if i explain myself bad)
>
>> and also elog(PANIC) if the lock wasn't available.
>> Does it need rethinking?
>
> well, i actually think that PANIC is too high for this...

Well, it tries to lock and then open a critical system index. Failure to open it raises PANIC, so it seemed appropriate to use the same error level in the lock failure case.

Best regards,
Zoltán Böszörményi
Jaime Casanova wrote:
> 2010/1/13 Boszormenyi Zoltan <zb@cybertec.at>:
>> Tom Lane wrote:
>>> If this patch is touching those parts of relcache.c, it probably needs
>>> rethinking.
>> What I did there is to check the return value of LockRelationOid()
>
> the hunk was because a diference in the position (i guess patch accept
> a hunk of reasonable size, assuming there is something like a
> reasonable size for that)

Actually the reject was not because of the position difference, Tom's refactor changed one line in load_critical_index():

- ird = RelationBuildDesc(indexoid, NULL);
+ ird = RelationBuildDesc(indexoid, true);

-- 
Zoltán Böszörményi
2010/1/13 Boszormenyi Zoltan <zb@cybertec.at>:
>> well, i actually think that PANIC is too high for this...
>
> Well, it tries to lock and then open a critical system index.
> Failure to open it has PANIC, it seemed appropriate to use
> the same error level if the lock failure case.

if you try to open a critical system index and it doesn't exist, that is clearly a sign of corruption; if you can't lock it, it's just a concurrency issue... i don't see why they both should have the same message level

-- 
Jaime Casanova
Boszormenyi Zoltan <zb@cybertec.at> writes:
> Tom Lane wrote:
>> If this patch is touching those parts of relcache.c, it probably needs
>> rethinking.

> What I did there is to check the return value of LockRelationOid()
> and also elog(PANIC) if the lock wasn't available.
> Does it need rethinking?

Yes. What you have done is to change all the LockSomething primitives from return void to return bool and thereby require all call sites to check their results. This is a bad idea. There is no way that you can ensure that all third-party modules will make the same change, meaning that accepting this patch will certainly introduce nasty, hard to reproduce bugs. And what's the advantage? The callers are all going to throw errors anyway, so you might as well do that within the Lock function and avoid the system-wide API change.

I think this is a big patch with a small patch struggling to get out.

regards, tom lane
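The design point in this message (keep the lock primitives returning void and raise the error inside them, so no call site has to change) can be illustrated with a small standalone C sketch. It is only an analogy, not backend code: setjmp/longjmp stands in for the server's error-reporting machinery, and every name here (lock_something, try_lock_with_timeout, raise_error, run_statement) is hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

static jmp_buf error_context;      /* where raise_error() jumps back to */
static int failed_lock_id = -1;    /* lock that caused the last error */

/* Hypothetical stand-in for a timed lock attempt: odd ids "time out". */
static bool try_lock_with_timeout(int lock_id, int timeout_ms)
{
    (void) timeout_ms;
    return lock_id % 2 == 0;
}

/* Record which lock failed and unwind to the statement-level handler. */
static void raise_error(int lock_id)
{
    failed_lock_id = lock_id;
    longjmp(error_context, 1);
}

/*
 * The primitive keeps its void signature. On timeout it raises the
 * error itself, so existing callers need no result check.
 */
void lock_something(int lock_id, int timeout_ms)
{
    assert(timeout_ms >= 0);
    if (!try_lock_with_timeout(lock_id, timeout_ms))
        raise_error(lock_id);
}

/* Caller written as if the lock can never fail; returns -1 on error. */
int run_statement(int lock_id)
{
    if (setjmp(error_context) != 0)
        return -1;                 /* reached via raise_error() */
    lock_something(lock_id, 1000);
    return 0;
}
```

The alternative criticized above would turn lock_something() into a bool-returning function, and any caller that forgot to test the result would silently proceed without the lock.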
Tom Lane wrote:
> Boszormenyi Zoltan <zb@cybertec.at> writes:
>> Tom Lane wrote:
>>> If this patch is touching those parts of relcache.c, it probably needs
>>> rethinking.
>
>> What I did there is to check the return value of LockRelationOid()
>> and also elog(PANIC) if the lock wasn't available.
>> Does it need rethinking?
>
> Yes. What you have done is to change all the LockSomething primitives
> from return void to return bool and thereby require all call sites to
> check their results. This is a bad idea.

Okay, can you tell me how I can get the relation name out of the xid in XactLockTableWait()? There are several call sites of this function, and your idea about putting the error code into the LockSomething() functions to preserve the API results in strange error messages, like

ERROR: could not obtain lock on transaction with ID 658

when I want to UPDATE a tuple in a session when this and another session have a FOR SHARE lock on said tuple.

> There is no way that you can
> ensure that all third-party modules will make the same change, meaning
> that accepting this patch will certainly introduce nasty, hard to
> reproduce bugs. And what's the advantage? The callers are all going
> to throw errors anyway, so you might as well do that within the Lock
> function and avoid the system-wide API change.
>
> I think this is a big patch with a small patch struggling to get out.

Your smaller patch is attached, with the above strangeness. :-)

Best regards,
Zoltán Böszörményi
Boszormenyi Zoltan wrote:
> Tom Lane wrote:
>> Boszormenyi Zoltan <zb@cybertec.at> writes:
>>> Tom Lane wrote:
>>>> If this patch is touching those parts of relcache.c, it probably needs
>>>> rethinking.
>>> What I did there is to check the return value of LockRelationOid()
>>> and also elog(PANIC) if the lock wasn't available.
>>> Does it need rethinking?
>> Yes. What you have done is to change all the LockSomething primitives
>> from return void to return bool and thereby require all call sites to
>> check their results. This is a bad idea.
>
> Okay, can you tell me how can I get the relation name
> out of the xid in XactLockTableWait()? There are several
> call site of this function, and your idea about putting the error
> code into the LockSomething() functions to preserve the API
> results strange error messages, like
>
> ERROR: could not obtain lock on transaction with ID 658
>
> when I want to UPDATE a tuple in a session when
> this and another session have a FOR SHARE lock
> on said tuple.
>
>> There is no way that you can
>> ensure that all third-party modules will make the same change, meaning
>> that accepting this patch will certainly introduce nasty, hard to
>> reproduce bugs. And what's the advantage? The callers are all going
>> to throw errors anyway, so you might as well do that within the Lock
>> function and avoid the system-wide API change.

May I change the interface of XactLockTableWait() and MultiXactIdWait()? Not the return value, only the number of parameters, e.g. by adding the relation name, like in the attached patch. This solves the problem of bad error messages... What do you think?

>> I think this is a big patch with a small patch struggling to get out.
>
> Your smaller patch is attached, with the above strangeness. :-)
>
> Best regards,
> Zoltán Böszörményi

-- 
Zoltán Böszörményi
2010/1/13 Boszormenyi Zoltan <zb@cybertec.at>:
>> Your smaller patch is attached, with the above strangeness. :-)

you still have to add this parameter to postgresql.conf.sample, in the section about lock management

-- 
Jaime Casanova
Jaime Casanova wrote:
> 2010/1/13 Boszormenyi Zoltan <zb@cybertec.at>:
>>> Your smaller patch is attached, with the above strangeness. :-)
>
> you still had to add this parameter to the postgresql.conf.sample in
> the section about lock management

Attached with the required change.

Thanks,
Zoltán Böszörményi
2010/1/15 Boszormenyi Zoltan <zb@cybertec.at>:
> Jaime Casanova wrote:
>> 2010/1/13 Boszormenyi Zoltan <zb@cybertec.at>:
>>>> Your smaller patch is attached, with the above strangeness. :-)

ok, the patch is simpler than before and seems to be doing things right...
it passes the regression tests and my own tests...

i think it is ready for a committer to look at

-- 
Jaime Casanova
Jaime Casanova wrote:
> 2010/1/15 Boszormenyi Zoltan <zb@cybertec.at>:
>> Jaime Casanova wrote:
>>> 2010/1/13 Boszormenyi Zoltan <zb@cybertec.at>:
>>>>> Your smaller patch is attached, with the above strangeness. :-)
>
> ok, the patch is more simpler than before and seems to be doing things right...
> it passes regression tests and my own tests...
>
> i think is ready for a commiter to look at it

Thanks very much for your review. :)

-- 
Zoltán Böszörményi
Boszormenyi Zoltan wrote:
> May I change the interface of XactLockTableWait()
> and MultiXactIdWait()? Not the return value, only the number
> of parameters. E.g. with the relation name, like in the attached
> patch. This solves the problem of bad error messages...
> What do you think?

We already present such locks as being on transaction id such-and-such, not on relations. IMHO the original wording (waiting on transaction NNN) is okay; you don't need to fool around with passing around a relation name (which is misleading anyway).

If you want to provide a friendlier way to display tuple locks, that's okay but it's a separate patch.

-- 
Alvaro Herrera
http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Boszormenyi Zoltan <zb@cybertec.at> writes: > [ 5-pg85-locktimeout-14-ctxdiff.patch ] I took a quick look at this. I am not qualified to review the Win32 implementation of PGSemaphoreTimedLock, but I am afraid that both of the other ones are nonstarters on portability grounds. sem_timedwait() and semtimedop() do not appear in the Single Unix Spec, which is our usual reference for what is portable. In particular I don't see either of them on OS X or HPUX. I suspect that applying this patch would immediately break every platform except Linux. I also concur with Alvaro's feeling that the changes to XactLockTableWait() and MultiXactIdWait() are inappropriate. There is no reason to assume that there is always a relevant relation for waits performed with those functions. (In the same line, not all of the added error reports are careful about what happens if get_rel_name fails.) A larger question, which I think has been raised before but I have not seen a satisfactory answer for, is whether the system will behave sanely at all with this type of patch in place. I don't really think that a single lock timeout applicable to every possible reason to wait is going to be nice to use; and I'm afraid in some contexts it could render things completely nonfunctional. (In particular I think that Hot Standby is fragile enough already without this.) It seems particularly imprudent to make such a thing USERSET, implying that any clueless or malicious user could set it in a way that would cause problems, if there are any to cause. regards, tom lane
On Tue, Jan 19, 2010 at 6:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > A larger question, which I think has been raised before but I have not > seen a satisfactory answer for, is whether the system will behave sanely > at all with this type of patch in place. I don't really think that a > single lock timeout applicable to every possible reason to wait is going > to be nice to use; and I'm afraid in some contexts it could render > things completely nonfunctional. (In particular I think that Hot > Standby is fragile enough already without this.) It seems particularly > imprudent to make such a thing USERSET, implying that any clueless or > malicious user could set it in a way that would cause problems, if there > are any to cause. The obvious alternative is to have specific syntax to allow for waits on specific types of statements; however, based on the previous round of conversation, I thought we had concluded that the present design was the least of evils. http://archives.postgresql.org/pgsql-hackers/2009-09/msg01730.php I am not too sure what you think this might break? ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Tue, Jan 19, 2010 at 6:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> A larger question, which I think has been raised before but I have not >> seen a satisfactory answer for, is whether the system will behave sanely >> at all with this type of patch in place. > I am not too sure what you think this might break? I'm not sure either. If we weren't at the tail end of a devel cycle, with a large/destabilizing patch already in there that has a great deal of exposure to details of locking behavior, I'd not be so worried. Maybe the right thing is to bounce this back to be reconsidered in the first fest of the next cycle. It's not ready to commit anyway because of the portability problems, so ... regards, tom lane
On Tue, Jan 19, 2010 at 7:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Tue, Jan 19, 2010 at 6:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> A larger question, which I think has been raised before but I have not >>> seen a satisfactory answer for, is whether the system will behave sanely >>> at all with this type of patch in place. > >> I am not too sure what you think this might break? > > I'm not sure either. If we weren't at the tail end of a devel cycle, > with a large/destabilizing patch already in there that has a great deal > of exposure to details of locking behavior, I'd not be so worried. > > Maybe the right thing is to bounce this back to be reconsidered in the > first fest of the next cycle. It's not ready to commit anyway because > of the portability problems, so ... That seems reasonable to me. I'd like to have the functionality, but pushing it off a release sounds reasonable, if we're worried that it will be destabilizing. ...Robert
we already have statement timeout it seems the natural easy to implement this is with more hairy logic to calculate the timeout until the next of the three timeouts should fire and set sigalarm. I sympathize with whoever tries to work that through though, the logic is hairy enough with just the two variables...but at least we know that sigalarm works or at least it had better... greg On 20 Jan 2010 00:27, "Robert Haas" <robertmhaas@gmail.com> wrote: > On Tue, Jan 19, 2010 at 7:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmai... > That seems reasonable to me. I'd like to have the functionality, but > pushing it off a release sounds reasonable, if we're worried that it > will be destabilizing. > > ...Robert
Greg Stark <stark@mit.edu> writes: > we already have statement timeout it seems the natural easy to implement > this is with more hairy logic to calculate the timeout until the next of the > three timeouts should fire and set sigalarm. I sympathize with whoever tries > to work that through though, the logic is hairy enough with just the two > variables...but at least we know that sigalarm works or at least it had > better... Yeah, that code is ugly as sin already. Maybe there is a way to refactor it so it can scale better? I can't help thinking of Polya's inventor's paradox ("the more general problem may be easier to solve"). If we want to do it without any new system-call dependencies I think that's probably the only way. I'm not necessarily against new dependencies, if they're portable --- but it seems these aren't. regards, tom lane
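A minimal sketch of the "more general problem" idea discussed above: instead of hand-written logic for two or three specific timeouts, keep a small table of pending deadlines and always program the single alarm for the earliest one. The `TimeoutSlot` type and both function names are hypothetical and do not come from the PostgreSQL source; this only illustrates the selection step, not signal handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical timeout slot; none of these names exist in PostgreSQL. */
typedef struct TimeoutSlot
{
    const char *name;       /* e.g. "statement", "deadlock", "lock" */
    int         active;     /* nonzero if this timeout is currently pending */
    long        fire_at_ms; /* absolute deadline in milliseconds */
} TimeoutSlot;

/*
 * Return the index of the active slot with the earliest deadline,
 * or -1 if no slot is active. Ties go to the lowest index.
 */
static int
next_timeout(const TimeoutSlot *slots, size_t nslots)
{
    int     best = -1;
    size_t  i;

    assert(slots != NULL || nslots == 0);
    for (i = 0; i < nslots; i++)
    {
        if (!slots[i].active)
            continue;
        if (best < 0 || slots[i].fire_at_ms < slots[best].fire_at_ms)
            best = (int) i;
    }
    return best;
}

/*
 * Delay in milliseconds to program into the one interval timer.
 * Returns 0 when nothing is pending; a deadline that has already
 * passed is clamped to 1 ms so that the timer still fires.
 */
static long
timer_delay_ms(const TimeoutSlot *slots, size_t nslots, long now_ms)
{
    int     idx = next_timeout(slots, nslots);
    long    delay;

    if (idx < 0)
        return 0;
    delay = slots[idx].fire_at_ms - now_ms;
    return delay > 0 ? delay : 1;
}
```

With this shape, adding a fourth timeout means adding a slot rather than another branch in the alarm-setting code.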
Tom Lane wrote: > Boszormenyi Zoltan <zb@cybertec.at> writes: > >> [ 5-pg85-locktimeout-14-ctxdiff.patch ] >> > > I took a quick look at this. I am not qualified to review the Win32 > implementation of PGSemaphoreTimedLock, but I am afraid that both of > the other ones are nonstarters on portability grounds. sem_timedwait() > and semtimedop() do not appear in the Single Unix Spec, which is our > usual reference for what is portable. In particular I don't see either > of them on OS X or HPUX. We're lucky in that regard, we have developed and tested this patch under Linux and: # uname -a HP-UX uxhv1f17 B.11.31 U ia64 4099171317 unlimited-user license The links under src/backend/port show that it uses sysv_sema.c and semtimedop() compiles and works nicely there. Hans will test it under OS X. > I suspect that applying this patch would > immediately break every platform except Linux. > Fortunately suspicion does not mean guilt, let's wait for Hans' test. > I also concur with Alvaro's feeling that the changes to XactLockTableWait() > and MultiXactIdWait() are inappropriate. There is no reason to assume > that there is always a relevant relation for waits performed with those > functions. (In the same line, not all of the added error reports are > careful about what happens if get_rel_name fails.) > Okay, I don't have strong feelings about the exact error message, I will post the older version with the Lock* APIs intact, add the chunk that adds the GUC to postgresql.conf.sample and also look at your comment. But IIRC some of the missing checks come from the callers' logic, they (all or only some of them? have to check) already opened the Relation they try to lock hence the same get_rel_name() MUST succeed or else it's an internal error already. > A larger question, which I think has been raised before but I have not > seen a satisfactory answer for, is whether the system will behave sanely > at all with this type of patch in place.
I don't really think that a > single lock timeout applicable to every possible reason to wait is going > to be nice to use; IIRC you were the one who raised the issue but in the exact opposite way to conclude that we won't need SELECT ... WAIT N to complement NOWAIT. Stick to one opinion please. :-) > and I'm afraid in some contexts it could render > things completely nonfunctional. (In particular I think that Hot > Standby is fragile enough already without this.) It seems particularly > imprudent to make such a thing USERSET, implying that any clueless or > malicious user could set it in a way that would cause problems, if there > are any to cause. > Is there a flag that causes the setting to be rejected from postgresql.conf but makes it settable from the session? This would ensure correct operation, as the default 0 behaves the same as before. Best regards, Zoltán Böszörményi -- Bible has answers for everything. Proof: "But let your communication be, Yea, yea; Nay, nay: for whatsoever is more than these cometh of evil." (Matthew 5:37) - basics of digital technology. "May your kingdom come" - superficial description of plate tectonics ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH http://www.postgresql.at/
Tom Lane wrote: > Greg Stark <stark@mit.edu> writes: > >> we already have statement timeout it seems the natural easy to implement >> this is with more hairy logic to calculate the timeout until the next of the >> three timeouts should fire and set sigalarm. I sympathize with whoever tries >> to work that through though, the logic is hairy enough with just the two >> variables...but at least we know that sigalarm works or at least it had >> better... >> > > Yeah, that code is ugly as sin already. Maybe there is a way to > refactor it so it can scale better? I can't help thinking of Polya's > inventor's paradox ("the more general problem may be easier to solve"). > > If we want to do it without any new system-call dependencies I think > that's probably the only way. I'm not necessarily against new > dependencies, if they're portable --- but it seems these aren't. > Okay, after reading google it seems you're right that OS X lacks sem_timedwait(). How about adding a configure check for semtimedop() and sem_timedwait() and if they don't exist set a compile time flag (HAVE_XXX) and in this case PGSemaphoreTimedLock() would behave the same as PGSemaphoreLock() and have an assign_*() function that tells the user that the timeout functionality is missing? We have precedent for the missing functionality with e.g. effective_io_concurrency and ereport() is also allowed in such functions, see assign_transaction_read_only(). Best regards, Zoltán Böszörményi -- Bible has answers for everything. Proof: "But let your communication be, Yea, yea; Nay, nay: for whatsoever is more than these cometh of evil." (Matthew 5:37) - basics of digital technology. "May your kingdom come" - superficial description of plate tectonics ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH http://www.postgresql.at/
Boszormenyi Zoltan wrote: > Tom Lane wrote: > >> Greg Stark <stark@mit.edu> writes: >> >> >>> we already have statement timeout it seems the natural easy to implement >>> this is with more hairy logic to calculate the timeout until the next of the >>> three timeouts should fire and set sigalarm. I sympathize with whoever tries >>> to work that through though, the logic is hairy enough with just the two >>> variables...but at least we know that sigalarm works or at least it had >>> better... >>> >>> >> Yeah, that code is ugly as sin already. Maybe there is a way to >> refactor it so it can scale better? I can't help thinking of Polya's >> inventor's paradox ("the more general problem may be easier to solve"). >> >> If we want to do it without any new system-call dependencies I think >> that's probably the only way. I'm not necessarily against new >> dependencies, if they're portable --- but it seems these aren't. >> >> > > Okay, after reading google it seems you're right that OS X lacks > sem_timedwait(). How about adding a configure check for semtimedop() > and sem_timedwait() and if they don't exist set a compile time flag > (HAVE_XXX) and in this case PGSemaphoreTimedLock() would > behave the same as PGSemaphoreLock() and have an assign_*() > function that tells the user that the timeout functionality is missing? > We have precedent for the missing functionality with e.g. > effective_io_concurrency and ereport() is also allowed in such > functions, see assign_transaction_read_only(). > Attached is the patch with the proposed modification to lift the portability concerns. Fixed the missing check for get_rel_name() and one typo ("transation"). Introduced checks for semtimedop() and sem_timedwait() in configure.in and USE_LOCK_TIMEOUT in port.h depending on HAVE_DECL_SEMTIMEDOP || HAVE_DECL_SEM_TIMEDWAIT || WIN32. Introduced assign_lock_timeout() GUC validator function that allows setting the value only from the wired-in-default (0) or from SET statements. Comments?
Best regards, Zoltán Böszörményi -- Bible has answers for everything. Proof: "But let your communication be, Yea, yea; Nay, nay: for whatsoever is more than these cometh of evil." (Matthew 5:37) - basics of digital technology. "May your kingdom come" - superficial description of plate tectonics ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH http://www.postgresql.at/
Attachment
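A rough sketch of the compile-time fallback proposed in the message above, assuming the macro names given there (`HAVE_DECL_SEMTIMEDOP`, `HAVE_DECL_SEM_TIMEDWAIT`, `USE_LOCK_TIMEOUT`). The two functions are hypothetical stand-ins, not the patch's actual `assign_lock_timeout()`, and rejecting a nonzero value on unsupported builds is an assumption modeled on the `effective_io_concurrency` precedent the message cites.

```c
#include <stdbool.h>

/*
 * In a real build the HAVE_DECL_* macros would be set by configure.
 * Here they default to 0 when nothing defines them, so the fallback
 * branch is what gets compiled.
 */
#ifndef HAVE_DECL_SEMTIMEDOP
#define HAVE_DECL_SEMTIMEDOP 0
#endif
#ifndef HAVE_DECL_SEM_TIMEDWAIT
#define HAVE_DECL_SEM_TIMEDWAIT 0
#endif

#if HAVE_DECL_SEMTIMEDOP || HAVE_DECL_SEM_TIMEDWAIT || defined(WIN32)
#define USE_LOCK_TIMEOUT 1
#endif

/* Hypothetical helper: can this build honor a lock timeout at all? */
static bool
lock_timeout_supported(void)
{
#ifdef USE_LOCK_TIMEOUT
    return true;
#else
    return false;
#endif
}

/*
 * Hypothetical stand-in for the validator: 0 (wait forever) is always
 * accepted, negative values never are, and a positive timeout is only
 * accepted where a timed semaphore wait is available.
 */
static bool
check_lock_timeout(int newval)
{
    if (newval < 0)
        return false;
    if (newval > 0 && !lock_timeout_supported())
        return false;
    return true;
}
```

The drawback is visible in the sketch itself: the same setting behaves differently depending on how the server was built.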
If that's the case then other timeouts should be failing on OS X, no? But I have never heard of that. 2010/1/20, Boszormenyi Zoltan <zb@cybertec.at>: > Boszormenyi Zoltan wrote: >> Tom Lane wrote: >> >>> Greg Stark <stark@mit.edu> writes: >>> >>> >>>> we already have statement timeout it seems the natural easy to implement >>>> this is with more hairy logic to calculate the timeout until the next of >>>> the >>>> three timeouts should fire and set sigalarm. I sympathize with whoever >>>> tries >>>> to work that through though, the logic is hairy enough with just the two >>>> variables...but at least we know that sigalarm works or at least it had >>>> better... >>>> >>>> >>> Yeah, that code is ugly as sin already. Maybe there is a way to >>> refactor it so it can scale better? I can't help thinking of Polya's >>> inventor's paradox ("the more general problem may be easier to solve"). >>> >>> If we want to do it without any new system-call dependencies I think >>> that's probably the only way. I'm not necessarily against new >>> dependencies, if they're portable --- but it seems these aren't. >>> >>> >> >> Okay, after reading google it seems you're right that OS X lacks >> sem_timedwait(). How about adding a configure check for semtimedop() >> and sem_timedwait() and if they don't exist set a compile time flag >> (HAVE_XXX) and in this case PGSemaphoreTimedLock() would >> behave the same as PGSemaphoreLock() and have an assign_*() >> function that tells the user that the timeout functionality is missing? >> We have precedent for the missing functionality with e.g. >> effective_io_concurrency and ereport() is also allowed in such >> functions, see assign_transaction_read_only(). >> > > Attached with the proposed modification to lift the portability concerns.
> Fixed the missing check for get_rel_name() and one typo ("transation") > Introduced checks for semtimedop() and sem_timedwait() in configure.in > and USE_LOCK_TIMEOUT in port.h depending on > HAVE_DECL_SEMTIMEDOP || HAVE_DECL_SEM_TIMEDWAIT || WIN32 > Introduced assign_lock_timeout() GUC validator function that allows > setting the value only from the wired-in-default (0) or from SET statements. > > Comments? > > Best regards, > Zoltán Böszörményi > > -- > Bible has answers for everything. Proof: > "But let your communication be, Yea, yea; Nay, nay: for whatsoever is more > than these cometh of evil." (Matthew 5:37) - basics of digital technology. > "May your kingdom come" - superficial description of plate tectonics > > ---------------------------------- > Zoltán Böszörményi > Cybertec Schönig & Schönig GmbH > http://www.postgresql.at/ > > -- Sent from my mobile device Sincerely, Jaime Casanova PostgreSQL support and training Systems consulting and development Guayaquil - Ecuador Cell +59387171157
Hi, I wrote: > Okay, after reading google it seems you're right that OS X lacks > sem_timedwait(). Jaime Casanova wrote: > If that's the case then others timeouts should be failing on os x, no? > But i have never hear that > Among others, I found this reference on the missing sem_timedwait() function: http://bugs.freepascal.org/view.php?id=13148 Best regards, Zoltán Böszörményi -- Bible has answers for everything. Proof: "But let your communication be, Yea, yea; Nay, nay: for whatsoever is more than these cometh of evil." (Matthew 5:37) - basics of digital technology. "May your kingdom come" - superficial description of plate tectonics ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH http://www.postgresql.at/
2010/1/20 Boszormenyi Zoltan <zb@cybertec.at>: > Attached with the proposed modification to lift the portability concerns. > Fixed the missing check for get_rel_name() and one typo ("transation") > Introduced checks for semtimedop() and sem_timedwait() in configure.in > and USE_LOCK_TIMEOUT in port.h depending on > HAVE_DECL_SEMTIMEDOP || HAVE_DECL_SEM_TIMEDWAIT || WIN32 > Introduced assign_lock_timeout() GUC validator function that allows > setting the value only from the wired-in-default (0) or from SET statements. > > Comments? I think that it is a very bad idea to implement this feature in a way that is not 100% portable. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > 2010/1/20 Boszormenyi Zoltan <zb@cybertec.at>: >> Attached with the proposed modification to lift the portability concerns. > I think that it is a very bad idea to implement this feature in a way > that is not 100% portable. Agreed, this is not acceptable. If there were no possible way to implement the feature portably, we *might* consider doing it like this. But I think more likely it'd get rejected anyway. When there is a clear path to a portable solution, it's definitely not going to fly to submit a nonportable one. regards, tom lane
Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > >> 2010/1/20 Boszormenyi Zoltan <zb@cybertec.at>: >> >>> Attached with the proposed modification to lift the portability concerns. >>> > > >> I think that it is a very bad idea to implement this feature in a way >> that is not 100% portable. >> > > Agreed, this is not acceptable. If there were no possible way to > implement the feature portably, we *might* consider doing it like this. > But I think more likely it'd get rejected anyway. When there is a > clear path to a portable solution, it's definitely not going to fly > to submit a nonportable one. > > regards, tom lane > OK, I will implement it using setitimer(). It may not reach 8.5 though, when will this last Commitfest end? Thanks, Zoltán Böszörményi -- Bible has answers for everything. Proof: "But let your communication be, Yea, yea; Nay, nay: for whatsoever is more than these cometh of evil." (Matthew 5:37) - basics of digital technology. "May your kingdom come" - superficial description of plate tectonics ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH http://www.postgresql.at/
2010/1/21 Boszormenyi Zoltan <zb@cybertec.at>: > Tom Lane wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> I think that it is a very bad idea to implement this feature in a way >>> that is not 100% portable. >> >> Agreed, this is not acceptable. If there were no possible way to >> implement the feature portably, we *might* consider doing it like this. >> But I think more likely it'd get rejected anyway. When there is a >> clear path to a portable solution, it's definitely not going to fly >> to submit a nonportable one. > > OK, I will implement it using setitimer(). > It may not reach 8.5 though, when will this last Commitfest end? The CommitFest ends 2/15, but that's not really the relevant metric. Patches will be marked Returned with Feedback if they are not updated within 4-5 days of the time they were last reviewed, or more aggressively as we get towards the end. Also, if a patch needs a major rewrite, it should be marked Returned with Feedback and resubmitted for this CommitFest. It sounds like this patch meets that criterion; in addition, Tom has expressed concerns that this might be something that should be committed early in the release cycle rather than at the very end. ...Robert
Hi, Robert Haas wrote: > 2010/1/21 Boszormenyi Zoltan <zb@cybertec.at>: > >> Tom Lane wrote: >> >>> Robert Haas <robertmhaas@gmail.com> writes: >>> >>>> I think that it is a very bad idea to implement this feature in a way >>>> that is not 100% portable. >>>> >>> Agreed, this is not acceptable. If there were no possible way to >>> implement the feature portably, we *might* consider doing it like this. >>> But I think more likely it'd get rejected anyway. When there is a >>> clear path to a portable solution, it's definitely not going to fly >>> to submit a nonportable one. >>> >> OK, I will implement it using setitimer(). >> It may not reach 8.5 though, when will this last Commitfest end? >> > > The CommitFest ends 2/15, but that's not really the relevant metric. > Patches will be marked Returned with Feedback if they are not updated > within 4-5 days of the time they were last reviewed, or more > aggressively as we get towards the end. Also, if a patch needs a > major rewrite, it should be marked Returned with Feedback and > resubmitted for this CommitFest. It sounds like this patch meets that > criterion; in addition, Tom has expressed concerns that this might be > something that should be committed early in the release cycle rather > than at the very end. > > ...Robert > Thanks. So it means that this patch will be considered for 9.1. I would like a mini-review on the change I made in the latest patch by introducing the validator function. Is it enough to check for (source == PGC_S_DEFAULT || source == PGC_S_SESSION) to ensure only interactive sessions can get lock timeouts? This way autovacuum, replication and any other internal processes get proper behaviour, i.e. the setting from postgresql.conf is ignored and locks don't time out for them. Which other PGC_S_* settings can or must be enabled? Best regards, Zoltán Böszörményi -- Bible has answers for everything.
Proof: "But let your communication be, Yea, yea; Nay, nay: for whatsoever is more than these cometh of evil." (Matthew 5:37) - basics of digital technology. "May your kingdom come" - superficial description of plate tectonics ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH http://www.postgresql.at/
On Thu, Jan 21, 2010 at 9:41 AM, Boszormenyi Zoltan <zb@cybertec.at> wrote: > Thanks. So it means that this patch will considered for 9.1. Yeah, I think that's best. > I would like a mini-review on the change I made in the latest > patch by introducing the validator function. Is it enough > to check for > (source == PGC_S_DEFAULT || source == PGC_S_SESSION) > to ensure only interactive sessions can get lock timeouts? > This way autovacuum, replication and any other internal > processes get proper behaviour, i.e. the setting from > postgresql.conf is ignored and locks don't timeout for them. > Which other PGC_S_* settings can or must be enabled? I'm not sure that I know how this should work, but that approach seems a little strange to me. Why would we not allow PGC_S_USER, for example? Also, does this mean that if the setting is present in postgresql.conf, autovacuum will fail to start? It seems to me that rather than trying to restrict the PGC_S_* types for which this can be set, we should be trying to make the "internal processes" ignore the GUC altogether. I'm not sure if there's a clean way to do that, though. ...Robert
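To make the question above concrete, here is a small sketch of the source check under discussion. The enum is a reduced, hypothetical stand-in that borrows the `PGC_S_*` names mentioned in the thread plus one entry for a configuration-file source; the real enum in the server has more members, and `lock_timeout_source_ok()` is an illustrative name, not the function from the patch.

```c
#include <stdbool.h>

/*
 * Reduced stand-in for the setting-source enum. Only the members
 * needed for the illustration are listed; ordering is arbitrary here.
 */
typedef enum GucSourceSketch
{
    PGC_S_DEFAULT,  /* wired-in default value */
    PGC_S_FILE,     /* assumed: value read from postgresql.conf */
    PGC_S_USER,     /* per-user setting */
    PGC_S_SESSION   /* SET command issued in the session */
} GucSourceSketch;

/*
 * The proposed rule: accept the value only when it is the built-in
 * default or comes from an explicit SET in the session.
 */
static bool
lock_timeout_source_ok(GucSourceSketch source)
{
    return source == PGC_S_DEFAULT || source == PGC_S_SESSION;
}
```

Under this rule a per-user setting is rejected just like a configuration-file setting, which is the oddity the reply points at.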
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Jan 21, 2010 at 9:41 AM, Boszormenyi Zoltan <zb@cybertec.at> wrote: >> I would like a mini-review on the change I made in the latest >> patch by introducing the validator function. Is it enough >> to check for >> (source == PGC_S_DEFAULT || source == PGC_S_SESSION) >> to ensure only interactive sessions can get lock timeouts? > I'm not sure that I know how this should work, but that approach seems > a little strange to me. Why would we not allow PGC_S_USER, for > example? Why is this a good idea at all? I can easily see somebody feeling that he'd like autovacuums to fail rather than block on locks for a long time, for example. regards, tom lane
On Thu, Jan 21, 2010 at 10:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Jan 21, 2010 at 9:41 AM, Boszormenyi Zoltan <zb@cybertec.at> wrote: >>> I would like a mini-review on the change I made in the latest >>> patch by introducing the validator function. Is it enough >>> to check for >>> (source == PGC_S_DEFAULT || source == PGC_S_SESSION) >>> to ensure only interactive sessions can get lock timeouts? > >> I'm not sure that I know how this should work, but that approach seems >> a little strange to me. Why would we not allow PGC_S_USER, for >> example? > > Why is this a good idea at all? I can easily see somebody feeling that > he'd like autovacuums to fail rather than block on locks for a long > time, for example. What I can see happening is someone setting this GUC in postgresql.conf and then being surprised that it applied to things like walreceiver and autovacuum, in addition to user queries. Are we even sure that that code would all behave sanely with this behavior? ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Jan 21, 2010 at 10:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Why is this a good idea at all? I can easily see somebody feeling that >> he'd like autovacuums to fail rather than block on locks for a long >> time, for example. > What I can see happening is someone setting this GUC in > postgresql.conf and then being surprised that it applied to things > like walreceiver and autovacuum, in addition to user queries. Are we > even sure that that code would all behave sanely with this behavior? No, I'm not sure, as I said before ;-). But a 100%-arbitrary restriction like "it doesn't apply to background processes" will not make it noticeably safer. There is very damn little code that only executes in background and never anywhere else. regards, tom lane
Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > >> On Thu, Jan 21, 2010 at 9:41 AM, Boszormenyi Zoltan <zb@cybertec.at> wrote: >> >>> I would like a mini-review on the change I made in the latest >>> patch by introducing the validator function. Is it enough >>> to check for >>> (source == PGC_S_DEFAULT || source == PGC_S_SESSION) >>> to ensure only interactive sessions can get lock timeouts? >>> > > >> I'm not sure that I know how this should work, but that approach seems >> a little strange to me. Why would we not allow PGC_S_USER, for >> example? >> > > Why is this a good idea at all? I can easily see somebody feeling that > he'd like autovacuums to fail rather than block on locks for a long > time, for example. > You expressed stability concerns coming from this patch. Were these concerns because of locks timing out making things fragile or because of general feelings about introducing such a patch at the end of the release cycle? I was thinking about the former, hence this modification. Best regards, Zoltán Böszörményi -- Bible has answers for everything. Proof: "But let your communication be, Yea, yea; Nay, nay: for whatsoever is more than these cometh of evil." (Matthew 5:37) - basics of digital technology. "May your kingdom come" - superficial description of plate tectonics ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH http://www.postgresql.at/
Boszormenyi Zoltan <zb@cybertec.at> writes: > You expressed stability concerns coming from this patch. > Were these concerns because of locks timing out making > things fragile or because of general feelings about introducing > such a patch at the end of the release cycle? I was thinking > about the former, hence this modification. Indeed, I am *very* concerned about the stability implications of this patch. I just don't believe that arbitrarily restricting which processes the GUC applies to will make it any safer. regards, tom lane
Hi, Tom Lane wrote: > Boszormenyi Zoltan <zb@cybertec.at> writes: > >> You expressed stability concerns coming from this patch. >> Were these concerns because of locks timing out making >> things fragile or because of general feelings about introducing >> such a patch at the end of the release cycle? I was thinking >> about the former, hence this modification. >> > > Indeed, I am *very* concerned about the stability implications of this > patch. I just don't believe that arbitrarily restricting which > processes the GUC applies to will make it any safer. > > regards, tom lane > Okay, here is the rewritten lock_timeout GUC patch that uses setitimer() to implement the lock timeout. I removed the GUC assignment/validation function. I left the current statement timeout vs deadlock timeout logic mostly intact in enable_sig_alarm(), because it's used by a few places. The only change is that statement_fin_time is always computed there because the newly introduced function (enable_sig_alarm_for_lock_timeout()) checks it to see whether the lock timeout triggers earlier than the deadlock timeout. As discussed before, this is 9.1 material. Best regards, Zoltán Böszörményi -- Bible has answers for everything. Proof: "But let your communication be, Yea, yea; Nay, nay: for whatsoever is more than these cometh of evil." (Matthew 5:37) - basics of digital technology. "May your kingdom come" - superficial description of plate tectonics ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH http://www.postgresql.at/
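A minimal sketch of the setitimer() approach described above, assuming a POSIX system. It is not taken from the patch: `one_shot_ms()`, `first_wait_ms()` and `arm_timer_ms()` are hypothetical names, and the rule "use the lock timeout only if it is enabled and shorter than the deadlock timeout" is one reading of the description, not confirmed code.

```c
/* setitimer() is an XSI interface; request it if nothing else has. */
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700
#endif

#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Convert a millisecond delay into a one-shot (non-repeating) itimerval. */
static struct itimerval
one_shot_ms(long delay_ms)
{
    struct itimerval tv;

    assert(delay_ms >= 0);
    tv.it_interval.tv_sec = 0;      /* no automatic re-arm */
    tv.it_interval.tv_usec = 0;
    tv.it_value.tv_sec = delay_ms / 1000;
    tv.it_value.tv_usec = (delay_ms % 1000) * 1000;
    return tv;
}

/*
 * Decide how long to sleep before the first wakeup while waiting for a
 * lock. A lock timeout of 0 means "disabled"; otherwise it wins only
 * when it would fire before the deadlock check.
 */
static long
first_wait_ms(long deadlock_timeout_ms, long lock_timeout_ms)
{
    if (lock_timeout_ms > 0 && lock_timeout_ms < deadlock_timeout_ms)
        return lock_timeout_ms;
    return deadlock_timeout_ms;
}

/*
 * Arm the real-time interval timer; a delay of 0 disarms it.
 * A SIGALRM handler must be installed before arming a nonzero delay,
 * because the default action for SIGALRM terminates the process.
 * Returns the result of setitimer() (0 on success, -1 on error).
 */
static int
arm_timer_ms(long delay_ms)
{
    struct itimerval tv = one_shot_ms(delay_ms);

    return setitimer(ITIMER_REAL, &tv, NULL);
}
```

Because only `setitimer()` and `SIGALRM` are involved, this avoids the `sem_timedwait()` and `semtimedop()` dependencies that were objected to earlier in the thread.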