Thread: Debugging deadlocks
I'm getting the following in the server log:

2005-03-27 06:04:21 GMT estat DETAIL:  Process 20928 waits for ShareLock
on transaction 7751823; blocked by process 20929.
        Process 20929 waits for ShareLock on transaction 7768115;
        blocked by process 20928.
2005-03-27 06:04:21 GMT estat CONTEXT:  SQL statement "SELECT 1 FROM
ONLY "rumba"."service_plane" x WHERE "service_plane_id" = $1 FOR UPDATE
OF x"
        SQL statement "INSERT INTO FIVE_MIN_STATS_200503 (traffic_id,
        service_plane_id, datestamp, sample_bucket_no, service_id,
        data_collector_device_id, bit_delta, packet_delta, bit_rate,
        packet_rate, bit_drop_delta, packet_drop_delta, bit_drop_rate,
        packet_drop_rate, updated) VALUES ('1', '4', '2005-03-21', '1',
        'MV008816', '3', 0, 0, 0, 0, 0, 0, 0, 0, 'N')"
        PL/pgSQL function "initialize_five_minute_samples" line 34 at
        execute statement
        SQL statement "SELECT INITIALIZE_FIVE_MINUTE_SAMPLES($1, $2, $3,
        $4, $5, 1, 288)"
        PL/pgSQL function "ins_updt_five_min_sample" line 28 at perform

FIVE_MIN_STATS_200503 has a foreign key into "rumba"."service_plane".
The service_plane table is a reference table, i.e., a fixed set of
values used only to validate foreign keys.  So the code doesn't have
any update statements on that table.  I'm assuming PostgreSQL is
generating that SQL to validate the foreign key.  But why is it
selecting for update?

--
Guy Rouillier
On Sun, Mar 27, 2005 at 12:54:28AM -0600, Guy Rouillier wrote:
> I'm getting the following in the server log:
>
> 2005-03-27 06:04:21 GMT estat DETAIL:  Process 20928 waits for ShareLock
> on transaction 7751823; blocked by process 20929.
>         Process 20929 waits for ShareLock on transaction 7768115;
>         blocked by process 20928.
> 2005-03-27 06:04:21 GMT estat CONTEXT:  SQL statement "SELECT 1 FROM
> ONLY "rumba"."service_plane" x WHERE "service_plane_id" = $1 FOR UPDATE
> OF x"
...
> The service_plane table is a reference table, i.e., a fixed set of
> values used only to validate foreign keys.  So the code doesn't have
> any update statements on that table.  I'm assuming PostgreSQL is
> generating that SQL to validate the foreign key.  But why is it
> selecting for update?

To make sure the referenced key can't change until the transaction
completes and the referencing row becomes visible to other transactions
(or is rolled back) -- otherwise other transactions could change or
delete the referenced key and not know they'd be breaking your
referential integrity.  The current implementation supports only
exclusive row-level locks (SELECT FOR UPDATE), but I think Alvaro might
be working on shared row-level locks for a future release.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
"Michael Fuhr" <mike@fuhr.org> writes > To make sure the referenced key can't change until the transaction > completes and the referencing row becomes visible to other transactions > (or is rolled back) -- otherwise other transactions could change > or delete the referenced key and not know they'd be breaking your > referential integrity. The current implementation supports only > exclusive row-level locks (SELECT FOR UPDATE), but I think Alvaro > might be working on shared row-level locks for a future release. > The code to process RI could be optimized a little bit. See the following case: user -1- test=# begin; BEGIN test=# delete from a where i = 2; DELETE 1 user -2- test=# update a set i = 3 where i = 1; ERROR: update or delete on "a" violates foreign key constraint "b_i_fkey" on "b" DETAIL: Key (i)=(1) is still referenced from table "b". test=# update a set i = 2 where i = 1; /* User 2 will be blocked here */ Blocking user 2 is strange and not necessary? Since the sever should first check the where-clause (this is true in ordinary UPDATE). If not, just report an error as the fomer statement. Regards, Qingqing
On Sun, 27 Mar 2005, Qingqing Zhou wrote:

> The code that processes RI could be optimized a little bit.  Consider
> the following case:
> ...
> test=# update a set i = 2 where i = 1;
> /* User 2 will be blocked here */
>
> Isn't blocking user 2 strange and unnecessary?

Well, that's not necessarily the foreign key.  I don't have a machine
to test on at the moment (machine currently dead), but I think the same
thing happens without a foreign key constraint, due to the
unique/primary key constraint on a.i.
Michael Fuhr wrote:
> On Sun, Mar 27, 2005 at 12:54:28AM -0600, Guy Rouillier wrote:
>> The service_plane table is a reference table, i.e., a fixed set of
>> values used only to validate foreign keys.  So the code doesn't have
>> any update statements on that table.  I'm assuming PostgreSQL is
>> generating that SQL to validate the foreign key.  But why is it
>> selecting for update?
>
> To make sure the referenced key can't change until the transaction
> completes and the referencing row becomes visible to other
> transactions (or is rolled back) -- otherwise other transactions
> could change or delete the referenced key and not know they'd be
> breaking your referential integrity.  The current implementation
> supports only exclusive row-level locks (SELECT FOR UPDATE), but I
> think Alvaro might be working on shared row-level locks for a future
> release.

Michael, thanks for the reply.  The current approach seems pretty...
fragile.  I encountered this deadlock because I have two tables (call
them T1 and T2), each with a foreign key to the same reference table.
I'm inserting about 2 million rows a day into each of these two tables.
Rows arrive in logical batches, so to help out performance, I only
commit after inserting every N rows.  The deadlock arises because
thread1 inserts some set of rows into T1 while thread2 inserts a set of
rows into T2.  The cardinality of the reference table is very small
(only about 10 rows), so the inserts into T1 and T2 are virtually
guaranteed to both reference the same values.

I understand this is easy for me to say since I'm not writing the PG
code, but multiple clients should be able to read reference tables
simultaneously.  It seems like a "prevent write" lock should be placed
on the rows in the reference table rather than an "exclusive" lock.
That way as many clients as necessary can read the values, but if
someone comes along and tries to change a row, just that one client
gets an error.

With the current implementation, it appears I need to either (1) always
commit after every inserted row, or (2) single-thread my entire insert
logic.  Neither of these two alternatives is very desirable.  And each
is only a partial fix (which will work in my case since I'm the only
one updating this database).  In the general case, where other
programmers may be writing code that I may not even know about to
update parts of this database, avoiding this type of deadlock becomes
very difficult.  It pretty much requires that everyone know what
everyone else is doing.

--
Guy Rouillier
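[For concreteness, here is a minimal two-session sketch of the
interleaving Guy describes.  The table and column names are invented
for illustration, and it reproduces only on releases of that era,
where the RI check takes an exclusive row lock on the parent:

CREATE TABLE ref (ref_id integer PRIMARY KEY);
CREATE TABLE t1 (ref_id integer NOT NULL REFERENCES ref);
CREATE TABLE t2 (ref_id integer NOT NULL REFERENCES ref);
INSERT INTO ref VALUES (1);
INSERT INTO ref VALUES (2);

-- Session 1 (thread1, batching inserts into t1):
BEGIN;
INSERT INTO t1 VALUES (1);  -- RI check locks ref row 1 FOR UPDATE

-- Session 2 (thread2, batching inserts into t2):
BEGIN;
INSERT INTO t2 VALUES (2);  -- RI check locks ref row 2 FOR UPDATE
INSERT INTO t2 VALUES (1);  -- blocks: session 1 holds ref row 1

-- Session 1 again:
INSERT INTO t1 VALUES (2);  -- blocks: session 2 holds ref row 2,
                            -- so the deadlock detector aborts one

Because neither batch commits until N rows are in, and the parent
table has only ~10 rows, two concurrent batches almost always end up
waiting on each other's parent-row locks in a cycle.]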
On Sun, Mar 27, 2005 at 06:02:25PM -0600, Guy Rouillier wrote:
> With the current implementation, it appears I need to either (1) always
> commit after every inserted row, or (2) single-thread my entire insert
> logic.  Neither of these two alternatives is very desirable.

I think a usual workaround is to declare the constraints INITIALLY
DEFERRED.  This will delay the check until commit time, so the time
window in which a deadlock can occur is smaller.  There still is a
possibility, though, so you need to take it into account.

It occurs to me that if you control all insertion threads, you could
try to serialize access to COMMIT in order to make the chance of
deadlock even smaller.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Industry suffers from the managerial dogma that for the sake of
stability and continuity, the company should be independent of the
competence of individual employees."  (E. Dijkstra)
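[The change Alvaro suggests looks roughly like this, using the made-up
names from the sketch above; deferrability could not be altered in
place in releases of that era, so the constraint is dropped and
re-created:

ALTER TABLE t1 DROP CONSTRAINT t1_ref_id_fkey;
ALTER TABLE t1
    ADD CONSTRAINT t1_ref_id_fkey
    FOREIGN KEY (ref_id) REFERENCES ref (ref_id)
    DEFERRABLE INITIALLY DEFERRED;

-- The RI check -- and its FOR UPDATE lock on the parent row -- now
-- runs at COMMIT time instead of at each INSERT.

A transaction can also request this per-session with SET CONSTRAINTS
ALL DEFERRED, provided the constraint was declared DEFERRABLE.]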
"Stephan Szabo" <sszabo@megazone.bigpanda.com> writes > > Well, that's not the foreign key necessarily. I don't have a machine to > test on at the moment (machine currently dead), but I think the same > happens without a foreign key constraint due to the unique/primary key > constraint on a.i. I see. That's more reasonable - the executor will first wait to see if the constraint on itself satifies, then do the RI check. Thanks, Qingqing
On Sun, Mar 27, 2005 at 06:02:25PM -0600, Guy Rouillier wrote:
> With the current implementation, it appears I need to either (1) always
> commit after every inserted row, or (2) single-thread my entire insert
> logic.  Neither of these two alternatives is very desirable.  And each
> is only a partial fix (which will work in my case since I'm the only
> one updating this database).  In the general case, where other
> programmers may be writing code that I may not even know about to
> update parts of this database, avoiding this type of deadlock becomes
> very difficult.  It pretty much requires that everyone know what
> everyone else is doing.

The other possibility is to sort the rows on the foreign key as they
are inserted.  Then the locks will always be taken in the same order
and hence can never deadlock.

Hope this helps,

--
Martijn van Oosterhout <kleptog@svana.org>  http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.
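[One way to get that ordering in plain SQL, assuming each batch is
first staged in a work table (the names here are hypothetical), is to
let an ORDER BY drive the insert order:

-- Stage the batch, then insert in foreign-key order, so the RI locks
-- on the reference table are always acquired lowest-key-first:
INSERT INTO t1 (ref_id, payload)
SELECT ref_id, payload
FROM staging_batch
ORDER BY ref_id;

With every writer locking the parent rows in the same order, two
transactions can still make each other wait briefly, but they can no
longer wait on each other in a cycle.]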
Alvaro Herrera wrote:
> On Sun, Mar 27, 2005 at 06:02:25PM -0600, Guy Rouillier wrote:
>
>> With the current implementation, it appears I need to either (1)
>> always commit after every inserted row, or (2) single-thread my
>> entire insert logic.  Neither of these two alternatives is very
>> desirable.
>
> I think a usual workaround is to declare the constraints INITIALLY
> DEFERRED.  This will delay the check until commit time, so the time
> window in which a deadlock can occur is smaller.  There still is a
> possibility, though, so you need to take it into account.  It occurs
> to me that if you control all insertion threads, you could try to
> serialize access to COMMIT in order to make the chance of deadlock
> even smaller.

I changed the definition of the two tables with the common foreign key
to use DEFERRABLE INITIALLY DEFERRED.  I'm no longer getting a
deadlock, but I'm getting another error that may just be masking the
same problem.  I'd like to understand your caveat about deadlocks still
being possible.  To help me understand, I'll draw a simplistic
timeline.  A and B are transactions on the two different tables:

A  !--------------------!
B        !------------------!

At the time that A commits, what is the effect on B?  B has not yet
attempted to lock any rows in the ref table (the one with the primary
key referenced by the foreign keys), so as I understand it, B should
be unaffected by A's commit.  Correct?  Does the deadlock possibility
arise if A and B should try to commit at the same instant?

--
Guy Rouillier
On Sun, Mar 27, 2005 at 01:37:44AM -0700, Michael Fuhr wrote:
[]
> The current implementation supports only
> exclusive row-level locks (SELECT FOR UPDATE), but I think Alvaro
> might be working on shared row-level locks for a future release.

Hmm ... are you saying that SELECT FOR UPDATE acquires an exclusive
lock on the row in question, in the sense that it conflicts with other
*readers* trying to access that row?  The documentation would appear to
say otherwise:

-- snip --
Row-level locks do not affect data querying; they block writers to the
same row only.  To acquire a row-level lock on a row without actually
modifying the row, select the row with SELECT FOR UPDATE.
-- snap --

(from: http://www.vitavoom.com/postgresql-docs/explicit-locking.html)

and

-- snip --
A row-level lock on a specific row is automatically acquired when the
row is updated (or deleted or marked for update).  The lock is held
until the transaction commits or rolls back.  Row-level locks do not
affect data querying; they block writers to the same row only.
-- snap --

(from: http://www.postgresql.org/docs/7.4/static/explicit-locking.html)

But then I don't understand the original poster's problem at all, since
his queries are only *reading* the referenced tables??

Regards,
Frank
On Wed, Mar 30, 2005 at 10:59:52PM +0200, frank@joerdens.de wrote:
> On Sun, Mar 27, 2005 at 01:37:44AM -0700, Michael Fuhr wrote:
> > The current implementation supports only
> > exclusive row-level locks (SELECT FOR UPDATE), but I think Alvaro
> > might be working on shared row-level locks for a future release.
>
> Hmm ... are you saying that SELECT FOR UPDATE acquires an exclusive
> lock on the row in question, in the sense that it conflicts with other
> *readers* trying to access that row?  The documentation would appear
> to say otherwise:

I'm saying that foreign key checks use SELECT FOR UPDATE to ensure
that the referenced key doesn't change while the transaction is
pending, and that SELECT FOR UPDATE conflicts with other SELECT FOR
UPDATE queries.  Therefore, if concurrent transactions insert records
into a table that has a non-deferred foreign key constraint, and if
the foreign key values are the same, then one of the transactions
will block.  Example:

CREATE TABLE foo (
    fooid  integer PRIMARY KEY
);

CREATE TABLE bar (
    barid  serial PRIMARY KEY,
    fooid  integer NOT NULL REFERENCES foo
);

INSERT INTO foo (fooid) VALUES (1);

If we now have two transactions that both insert records into bar with
the same value for fooid, then one of the transactions will block:

T1: BEGIN;
T2: BEGIN;
T1: INSERT INTO bar (fooid) VALUES (1);
T2: INSERT INTO bar (fooid) VALUES (1);  -- blocks

Transaction T2 blocks because both transactions have done something
like "SELECT 1 FROM foo WHERE fooid = 1 FOR UPDATE" to ensure that the
referenced key can't change until the transaction completes.  So even
though both transactions are only reading the referenced table (foo),
one has blocked the other.  Note that SELECT queries without FOR
UPDATE won't block -- it's other FOR UPDATE queries that block, even
though they're only reading.

I think Alvaro is working on a new locking mechanism that will allow
transactions to prevent a record from being modified without blocking
other transactions doing the same.  Alvaro (or somebody else), please
correct me if I'm mistaken, but that's what I've understood from
discussions elsewhere.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
On Wed, Mar 30, 2005 at 02:30:58PM -0700, Michael Fuhr wrote:
> I think Alvaro is working on a new locking mechanism that will allow
> transactions to prevent a record from being modified without blocking
> other transactions doing the same.

Yeah, and it does work.  (I posted the patch two days ago.)

alvherre=# create table a (a serial primary key);
NOTICE:  CREATE TABLE will create implicit sequence "a_a_seq" for
serial column "a.a"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"a_pkey" for table "a"
CREATE TABLE
alvherre=# create table b (a int references a);
CREATE TABLE
alvherre=# insert into a default values;
INSERT 0 1
alvherre=# insert into a default values;
INSERT 0 1
alvherre=# begin;
BEGIN
alvherre=# insert into b values (1);
INSERT 0 1

Session 2:

alvherre=# begin;
BEGIN
alvherre=# insert into b values (2);
INSERT 0 1
alvherre=# insert into b values (1);
INSERT 0 1

Session 1:

alvherre=# insert into b values (2);
INSERT 0 1
alvherre=# commit;
COMMIT

Session 2:

alvherre=# commit;
COMMIT

You'll notice that if you do that on any released version it will
deadlock ...

Now this can't be applied right away because it's easy to run "out of
memory" (shared memory for the lock table).  Say, a delete or update
that touches 10000 tuples does not work.  I'm currently working on a
proposal to allow the lock table to spill to disk ...

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"The person who did not want to sin / was obliged to sit on hard,
steep chairs / devoid, certainly, of softening comforts"
(Patricio Vogel)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:

> Now this can't be applied right away because it's easy to run "out of
> memory" (shared memory for the lock table).  Say, a delete or update
> that touches 10000 tuples does not work.  I'm currently working on a
> proposal to allow the lock table to spill to disk ...

Is that true even if I'm updating/deleting 1,000 tuples that all
reference the same foreign key?  It seems like that should only need a
single lock per (sub)transaction_id per referenced foreign key.

How is this handled currently?  Is your patch any worse than the
current behaviour?

--
greg
On Wed, Mar 30, 2005 at 05:41:04PM -0500, Greg Stark wrote:
>
> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
>
> > Now this can't be applied right away because it's easy to run "out of
> > memory" (shared memory for the lock table).  Say, a delete or update
> > that touches 10000 tuples does not work.  I'm currently working on a
> > proposal to allow the lock table to spill to disk ...
>
> Is that true even if I'm updating/deleting 1,000 tuples that all
> reference the same foreign key?  It seems like that should only need a
> single lock per (sub)transaction_id per referenced foreign key.

Well, in that case you need 1000 PROCLOCK objects, all pointing to the
same LOCK object.  But it still uses shared memory.

> How is this handled currently?  Is your patch any worse than the
> current behaviour?

With my patch it's useless without a provision to spill the lock table.

The current situation is that we don't use the lock table to lock
tuples; instead we mark them on disk, in the tuple itself.  So we can't
really mark a tuple more than once (because we have only one bit to
mark); that's why we limit tuple locking to exclusive locking (there's
no way to mark a tuple with more than one shared lock).

With my patch we need a lot of memory for each tuple locked.  This
needs to be shared memory.  Since shared memory is limited, we can't
grab an arbitrary number of locks simultaneously.  Thus, deleting a
whole table can fail.  You haven't ever seen Postgres failing in a
DELETE FROM table, have you?

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Java is clearly an example of a money oriented programming"
(A. Stepanov)
On Mar 30, 2005, at 4:47 PM, Alvaro Herrera wrote:

> Now this can't be applied right away because it's easy to run "out of
> memory" (shared memory for the lock table).  Say, a delete or update
> that touches 10000 tuples does not work.  I'm currently working on a
> proposal to allow the lock table to spill to disk ...

That would be most excellent, as I have deadlocks from transactions
that do upwards of 100k inserts selecting data from other tables.
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:

> On Wed, Mar 30, 2005 at 05:41:04PM -0500, Greg Stark wrote:
> >
> > Is that true even if I'm updating/deleting 1,000 tuples that all
> > reference the same foreign key?  It seems like that should only need
> > a single lock per (sub)transaction_id per referenced foreign key.
>
> Well, in that case you need 1000 PROCLOCK objects, all pointing to the
> same LOCK object.  But it still uses shared memory.

Why?  What do you need these PROCLOCK objects for?  You're never going
to do anything to these locks but release them all together.

> The current situation is that we don't use the lock table to lock
> tuples; instead we mark them on disk, in the tuple itself.  So we
> can't really mark a tuple more than once (because we have only one
> bit to mark); that's why we limit tuple locking to exclusive locking
> (there's no way to mark a tuple with more than one shared lock).

For reference, the way Oracle does it (as I understand it) is to lock
them on disk by reserving a fixed amount of space in each block to
record locks.  If that space is exhausted, then it has some sort of
failover mechanism.  I think in practice you rarely need more than 1 or
2 lockers, so this works quite well most of the time.  But then you're
paying the cost in lower I/O throughput all the time.

> With my patch we need a lot of memory for each tuple locked.  This
> needs to be shared memory.  Since shared memory is limited, we can't
> grab an arbitrary number of locks simultaneously.  Thus, deleting a
> whole table can fail.  You haven't ever seen Postgres failing in a
> DELETE FROM table, have you?

--
greg
Alvaro Herrera wrote:
>
> Now this can't be applied right away because it's easy to run "out of
> memory" (shared memory for the lock table).  Say, a delete or update
> that touches 10000 tuples does not work.  I'm currently working on a
> proposal to allow the lock table to spill to disk ...

While not always true, in many cases the cardinality of the referenced
(parent) table is small compared to that of the referencing (child)
table.  Does locking require a separate lock record for each tuple in
the child table, or just one for each tuple in the parent table with a
reference count?  For example, the scenario I started this thread with
had two child tables referencing rows in a common parent table.  For a
given parent tuple, a single "prevent write" lock with a reference
count would seem sufficient.

--
Guy Rouillier
On Thu, Mar 31, 2005 at 06:54:31PM -0600, Guy Rouillier wrote:
> Alvaro Herrera wrote:
> >
> > Now this can't be applied right away because it's easy to run "out of
> > memory" (shared memory for the lock table).  Say, a delete or update
> > that touches 10000 tuples does not work.  I'm currently working on a
> > proposal to allow the lock table to spill to disk ...
>
> While not always true, in many cases the cardinality of the referenced
> (parent) table is small compared to that of the referencing (child)
> table.  Does locking require a separate lock record for each tuple in
> the child table, or just one for each tuple in the parent table with a
> reference count?

Just one.  (LOCALLOCK, which is private to each backend, stores how
many times we hold a lock.)

I just realized we not only need to be able to spill the LOCK struct to
disk, but also PROCLOCK ... am I right?

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"The web brings people together because no matter what kind of sexual
mutant you are, you've got millions of potential partners.  Type in
'find people who have sex with burning deer' and the computer will say
'specify type of deer'"  (Jason Alexander)
On Sun, Mar 27, 2005 at 12:54:28AM -0600, Guy Rouillier wrote:
[]
> The service_plane table is a reference table, i.e., a fixed set of
> values used only to validate foreign keys.  So the code doesn't have
> any update statements on that table.

An idea that just came up around here, and that sounds like a pretty
neat workaround we're going to try, is to drop the foreign key
constraints and just use a check constraint for the allowed values.
If the cardinality of the reference table is small, this is much faster
than using foreign keys and solves your problem.  The drawback is that
if you update the reference table (if you keep it at all), you mustn't
forget to also update the check constraints in more than one place.

Rgds,

Frank
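[In terms of the hypothetical names used earlier, the switch looks
something like this; the value list is invented here, and in practice
would be copied from the reference table:

-- Drop the FK, which is what takes the row lock on the parent,
-- and replace it with a value-list check:
ALTER TABLE t1 DROP CONSTRAINT t1_ref_id_fkey;
ALTER TABLE t1 ADD CONSTRAINT t1_ref_id_check
    CHECK (ref_id IN (1, 2, 3, 4));

-- Inserts into t1 now validate ref_id without touching, or locking,
-- any row in the reference table.]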
On Fri, Apr 01, 2005 at 10:37:11 +0200, frank@joerdens.de wrote:
>
> An idea that just came up around here ... is to drop the foreign key
> constraints and just use a check constraint for the allowed values.
> If the cardinality of the reference table is small, this is much
> faster than using foreign keys and solves your problem.  The drawback
> is that if you update the reference table (if you keep it at all),
> you mustn't forget to also update the check constraints in more than
> one place.

Using domains is a good way to keep column constraints in just one
place.
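[A sketch of Bruno's suggestion, again with hypothetical names: the
allowed values live in one domain definition, and every child table
that uses the domain picks them up automatically:

CREATE DOMAIN valid_ref_id AS integer
    CONSTRAINT valid_ref_id_check CHECK (VALUE IN (1, 2, 3, 4));

CREATE TABLE t1 (ref_id valid_ref_id NOT NULL);
CREATE TABLE t2 (ref_id valid_ref_id NOT NULL);

-- Changing the allowed set later means altering one constraint,
-- not one constraint per table:
ALTER DOMAIN valid_ref_id DROP CONSTRAINT valid_ref_id_check;
ALTER DOMAIN valid_ref_id ADD CONSTRAINT valid_ref_id_check
    CHECK (VALUE IN (1, 2, 3, 4, 5));]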
Alvaro,

I suppose there must be reasons not to do this, but have you considered
using the "slack" space (empty space) in an ordinary table "heap" page
to store share-locks on the tuples in that page?  (If not enough space
is available, then you would still need to use the spilled-to-disk
btree structure.)

Maybe there could also be something you could put in the disk page that
means "transaction X has all the tuples in this page share-locked,"
since I imagine it will usually be the case that if very many of them
are locked, then they are all locked.

With this method, checking for a lock would be slightly more
complicated, since you would need to check the disk page in which the
tuple resides first, and the spill-to-disk structure afterwards.

It seems that if this worked, you would be able to share-lock every
tuple in a table (or a large fraction of them) without a lot of extra
IO, provided that the pages were not 100% full.  (Which is almost never
the case in postgresql anyway.)

Paul Tillotson

Alvaro Herrera wrote:

> On Thu, Mar 31, 2005 at 06:54:31PM -0600, Guy Rouillier wrote:
> > Does locking require a separate lock record for each tuple in the
> > child table, or just one for each tuple in the parent table with a
> > reference count?
>
> Just one.  (LOCALLOCK, which is private to each backend, stores how
> many times we hold a lock.)
>
> I just realized we not only need to be able to spill the LOCK struct
> to disk, but also PROCLOCK ... am I right?
On Fri, Apr 01, 2005 at 07:41:54PM -0500, Paul Tillotson wrote:

Hi,

> I suppose there must be reasons not to do this, but have you considered
> using the "slack" space (empty space) in an ordinary table "heap" page
> to store share-locks on the tuples in that page?  (If not enough space
> is available, then you would still need to use the spilled-to-disk
> btree structure.)

No, I hadn't considered it.  However, this seems hard to do, because
the "slack space" does not have a start point; on each page, elements
(tuples) grow from the back to the front, while element pointers grow
in the reverse direction.  So I don't see how this would be done.

I've been discussing this with Bruce, who has an idea completely
different from mine (basically to extend the current tuple locking
system instead of extending the lock manager).  I'm still unsure about
the whole thing.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Not even the greatest genius would get very far if he had to produce
everything out of his own inner self"  (Goethe)
> > I suppose there must be reasons not to do this, but have you
> > considered using the "slack" space (empty space) in an ordinary
> > table "heap" page to store share-locks on the tuples in that page?
>
> No, I hadn't considered it.  However, this seems hard to do, because
> the "slack space" does not have a start point; on each page, elements
> (tuples) grow from the back to the front, while element pointers grow
> in the reverse direction.  So I don't see how this would be done.

Would it work for an updater, who finds that the locks list (currently
located in the middle of the empty space) is "in the way" of a new
tuple that he wants to insert, to take some kind of lock, move the
whole list up or down (spilling some of those locks to disk if no more
space is available), and release it again?

Could the lock(s) on a particular tuple be tacked onto the end of that
tuple?  (I think tuples are stored variable-width anyway, aren't they?)
I suppose this is conceptually equivalent to adding a variable-width
system column which gets TOASTed when it gets too wide.  I suppose this
would make taking a lock as expensive as doing an update, though,
right?

Paul Tillotson
On Fri, Apr 01, 2005 at 10:14:07PM -0500, Paul Tillotson wrote:

> Would it work for an updater, who finds that the locks list (currently
> located in the middle of the empty space) is "in the way" of a new
> tuple that he wants to insert, to take some kind of lock, move the
> whole list up or down (spilling some of those locks to disk if no
> more space is available), and release it again?

Well, at that point you need to take a lock in order to be able to
manage locks.  Managing not to step on your own feet in that scenario
is complex, to say the least, if not downright impossible.

Another problem with this approach is that it would be practically
impossible for a process to release all its locks when it finishes.

No, we have to build this infrastructure purely using LWLocks, which
means no normal relations involved.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"The Postgresql hackers have what I call a 'NASA space shot' mentality.
Quite refreshing in a world of 'weekend drag racer' developers."
(Scott Marlowe)
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> On Fri, Apr 01, 2005 at 10:14:07PM -0500, Paul Tillotson wrote:
>> ...
> Well, at that point you need to take a lock in order to be able to
> manage locks.  Managing not to step on your own feet in that scenario
> is complex, to say the least, if not downright impossible.

I looked at Paul's first message and thought "nah, that won't work
because ... because ... hmm ... hmmm ..."

We currently store tuple locks on the same page as the tuples (ie, in
the tuple headers) and need no extra locks to do so.  Certainly it
still has to have a spill mechanism, but the thought that is attractive
to me is that until you are forced to spill, you do not have to take
any system-wide lock, only a page-level lock.  So it could have very
good average performance.

> Another problem with this approach is that it would be practically
> impossible for a process to release all its locks when it finishes.

There is no explicit release of tuple-level locks in the current setup.
Need it be different in Paul's idea?

			regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> > Well, at that point you need to take a lock in order to be able to
> > manage locks.  Managing not to step on your own feet in that
> > scenario is complex, to say the least, if not downright impossible.
>
> I looked at Paul's first message and thought "nah, that won't work
> because ... because ... hmm ... hmmm ..."

For what it's worth, this would be very similar to how Oracle handles
such locks.  In that case there's a fixed amount of space per page
reserved in advance for storing locks.

Actually I think it's a bit more complicated.  But that's the idea.

--
greg
Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> I looked at Paul's first message and thought "nah, that won't work
>> because ... because ... hmm ... hmmm ..."

> For what it's worth, this would be very similar to how Oracle handles
> such locks.

[ slightly alarmed ]  Do they have a patent on the way they do it?

> In that case there's a fixed amount of space per page reserved in
> advance for storing locks.

Paul's idea involved using a variable amount of space (ie, whatever's
free) ... which might be sufficient to escape whatever hypothetical
patent Oracle or someone else might hypothetically have on this idea.
Or not.  Pardon me for feeling a bit skittish at the moment.

			regards, tom lane
Greg Stark wrote:

> Tom Lane <tgl@sss.pgh.pa.us> writes:
>
>> I looked at Paul's first message and thought "nah, that won't work
>> because ... because ... hmm ... hmmm ..."
>
> For what it's worth, this would be very similar to how Oracle handles
> such locks.  In that case there's a fixed amount of space per page
> reserved in advance for storing locks.
>
> Actually I think it's a bit more complicated.  But that's the idea.

While we're at it, does anyone know how DB2 and MSSQL Server handle
row-level locks?

Paul
On Fri, Apr 01, 2005 at 11:02:36PM -0500, Tom Lane wrote:

[Cc: to -hackers]

> We currently store tuple locks on the same page as the tuples (ie, in
> the tuple headers) and need no extra locks to do so.  Certainly it
> still has to have a spill mechanism, but the thought that is
> attractive to me is that until you are forced to spill, you do not
> have to take any system-wide lock, only a page-level lock.  So it
> could have very good average performance.

If we go this way, maybe we should abandon the idea of using the
standard lock manager to lock tuples, which is what can be found in the
patch I posted.  Or maybe not, and just have the lock manager store the
locks in the page itself -- but it would have to know about the buffer,
so that is in some sense a break in opacity (of the API between the
two).

One possible way to do it would be to have an OffsetNumber stored in
the page header; if it's not InvalidOffsetNumber, then it points to a
"tuple" that holds

struct
{
    OffsetNumber    nextLock;
    LOCK            lock;
};

So a locker would check the chain of locks and stop when it sees
InvalidOffsetNumber.  If there is no free space on the page, what
should we do?  Store the lock in the main hash table?

Another problem would be the XLog.  On heap operations, do we register
exactly where (position in the page) a tuple was stored, or just the
fact that it was stored?  If the latter, then there's no problem.  If
the former, then on the next REDO the records wouldn't match (==>
PANIC) -- unless we logged the lockings as well.  Reading the code I
see we do log the offset numbers, so that's a problem :-( ... maybe we
could work around that by moving pd_lower without using line pointers;
not sure if that's doable.

--
Alvaro Herrera (<alvherre[@]dcc.uchile.cl>)
"Because Kim did nothing, but, mind you, with extraordinary success"
("Kim", Kipling)
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > For what it's worth, this would be very similar to how Oracle
> > handles such locks.
>
> [ slightly alarmed ]  Do they have a patent on the way they do it?

Do we really want to know? ...

A few minutes searching on a patent search site doesn't turn up
anything relevant though.  It might have been part of Oracle's design
from such an early stage that they never thought it was patentable.
It's not clear to me that it is, for that matter.  The general idea of
storing locks with the objects being locked isn't really anything
novel.

--
greg
bruno@wolff.to (Bruno Wolff III) writes:

> Using domains is a good way to keep column constraints in just one
> place.

Speaking of domains, how do you find out what the range of a domain is?
For example:

test=# create domain fruit as text
           check( value in ('apple', 'orange', 'banana', 'pear'));
CREATE DOMAIN
test=# \dD fruit
             List of domains
 Schema | Name  | Type | Modifier
--------+-------+------+----------
 public | fruit | text |
(1 row)

test=# \dD+ fruit
             List of domains
 Schema | Name  | Type | Modifier
--------+-------+------+----------
 public | fruit | text |
(1 row)

A quick look through pg_catalog doesn't suggest anything that would
return the check conditions.  Is there any way to do this?

--
Remove -42 for email
On Fri, Apr 01, 2005 at 09:27:40AM -0700, Edmund Bacon wrote:
>
> Speaking of domains, how do you find out what the range of a domain
> is?

I'm not sure if there's a better way, but this appears to work:

SELECT pg_get_constraintdef(oid, TRUE)
FROM pg_constraint
WHERE contypid = 'fruit'::regtype;

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
frank@joerdens.de wrote:
> On Sun, Mar 27, 2005 at 12:54:28AM -0600, Guy Rouillier wrote:
> []
>> The service_plane table is a reference table, i.e., a fixed set of
>> values used only to validate foreign keys.  So the code doesn't have
>> any update statements on that table.
>
> An idea that just came up around here, and that sounds like a pretty
> neat workaround we're going to try, is to drop the foreign key
> constraints and just use a check constraint for the allowed values.
> If the cardinality of the reference table is small, this is much
> faster than using foreign keys and solves your problem.  The drawback
> is that if you update the reference table (if you keep it at all),
> you mustn't forget to also update the check constraints in more than
> one place.

Frank and Bruno, thanks for the idea.  I'm going to give this a try.
I tried INITIALLY DEFERRED and I'm getting some strange duplicate key
errors.  This sounds like a better idea in this scenario, as our
reference table only has a dozen unique values.

--
Guy Rouillier