Thread: this is in plain text (row level locks)
Iam trying to acquire rowlevel locks in postgresql. I try doing this: 'select * from students where name='Larry' for update; But by looking at the holding array of proclock , I've noticed that by doing this only AccessShareLock gets acquired which is a table level lock. How do I acquire rowlevelock and what fields of Lock or Proclock datastructures indicate it. Thanks Jenny _________________________________________________________________ Protect your PC - get McAfee.com VirusScan Online http://clinic.mcafee.com/clinic/ibuy/campaign.asp?cid=3963
"Jenny -" <nat_lazy@hotmail.com> writes: > Iam trying to acquire rowlevel locks in postgresql. I try doing this: > 'select * from students where name='Larry' for update; > But by looking at the holding array of proclock , I've noticed that by doing > this only > AccessShareLock gets acquired which is a table level lock. Row-level locks are not recorded in proclock --- they are implemented by marking the individual tuple on-disk. If we tried to record them in shared memory, it'd be very easy to run out of shared memory, because you could be holding row locks on a large number of tuples. regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> "Jenny -" <nat_lazy@hotmail.com> writes: >> Iam trying to acquire rowlevel locks in postgresql. I try doing >>this: 'select * from students where name='Larry' for update; >> But by looking at the holding array of proclock , I'venoticed >> that by doing this only AccessShareLock gets acquired which is >> a table level lock. Tom> Row-level locks are not recorded in proclock --- they are Tom> implemented by marking the individual tuple on-disk. If we Tom> tried to record them in shared memory, it'd be very easy to Tom> run out of shared memory, becauseyou could be holding row Tom> locks on a large number of tuples. Of course, other database systems do this without too much hassle .. including relying on lock escalation (move up to page/table level locks) when the number of locks grow too large. Does pgsql only record X locks on the individual tuples on-disk or does it do so for S locks as well ? Not that I dislike the idea - Toby Lehman suggested this in his Ph.D. thesis in the mid-eighties for main-memory databases (where you don't take the write penalty). -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
Sailesh Krishnamurthy wrote: > >>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > > Tom> "Jenny -" <nat_lazy@hotmail.com> writes: > >> Iam trying to acquire rowlevel locks in postgresql. I try doing > >> this: 'select * from students where name='Larry' for update; > >> But by looking at the holding array of proclock , I've noticed > >> that by doing this only AccessShareLock gets acquired which is > >> a table level lock. > > Tom> Row-level locks are not recorded in proclock --- they are > Tom> implemented by marking the individual tuple on-disk. If we > Tom> tried to record them in shared memory, it'd be very easy to > Tom> run out of shared memory, because you could be holding row > Tom> locks on a large number of tuples. > > Of course, other database systems do this without too much hassle > .. including relying on lock escalation (move up to page/table level > locks) when the number of locks grow too large. I wouldn't say they do it with little hassle --- it is quite a pain, in fact, at least for users. > Does pgsql only record X locks on the individual tuples on-disk or > does it do so for S locks as well ? We don't need to shared lock individual rows because of MVCC --- well, we sort of do by recording our xid in our proc structure, so folks don't change things underneath us. We prevent expired rows from disappearing from the disk by others looking at our proc start xid. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Sailesh Krishnamurthy wrote: >> Does pgsql only record X locks on the individual tuples on-disk or >> does it do so for S locks as well ? > We don't need to shared lock individual rows because of MVCC --- well, > we sort of do by recording our xid in our proc structure, so folks don't > change things underneath us. We prevent expired rows from disappearing > from the disk by others looking at our proc start xid. This is actually an issue though. Row-level shared locks would be really nice to have for foreign-key handling. Right now we have to use X locks for those, and that leads to deadlocking problems for applications. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > >>Sailesh Krishnamurthy wrote: >> >>>Does pgsql only record X locks on the individual tuples on-disk or >>>does it do so for S locks as well ? > > >>We don't need to shared lock individual rows because of MVCC --- well, >>we sort of do by recording our xid in our proc structure, so folks don't >>change things underneath us. We prevent expired rows from disappearing >>from the disk by others looking at our proc start xid. > > > This is actually an issue though. Row-level shared locks would be > really nice to have for foreign-key handling. Right now we have to > use X locks for those, and that leads to deadlocking problems for > applications. Yes! Yes! It's the last big hurdle for an otherwise excellent RI implementation... Just wanted "Joe-user's" enthusiasm for row-level S locks registered somewhere... :-) Mike Mascari mascarm@mascari.com
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Sailesh Krishnamurthy wrote: > >> Does pgsql only record X locks on the individual tuples on-disk or > >> does it do so for S locks as well ? > > > We don't need to shared lock individual rows because of MVCC --- well, > > we sort of do by recording our xid in our proc structure, so folks don't > > change things underneath us. We prevent expired rows from disappearing > > from the disk by others looking at our proc start xid. > > This is actually an issue though. Row-level shared locks would be > really nice to have for foreign-key handling. Right now we have to > use X locks for those, and that leads to deadlocking problems for > applications. Is the plan to allow one backend to shared lock the row while others can read it but not modify it, or is the idea to actually allow multiple backends to record their shared status on the row? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Tom Lane wrote: >> This is actually an issue though. Row-level shared locks would be >> really nice to have for foreign-key handling. Right now we have to >> use X locks for those, and that leads to deadlocking problems for >> applications. > Is the plan to allow one backend to shared lock the row while others can > read it but not modify it, or is the idea to actually allow multiple > backends to record their shared status on the row? Plan? We have no plan to fix this :-(. But clearly there has to be some way to tell which backends hold read locks on a shared-locked row, else you can't tell if they've all dropped the lock or not. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > Tom Lane wrote: > >> This is actually an issue though. Row-level shared locks would be > >> really nice to have for foreign-key handling. Right now we have to > >> use X locks for those, and that leads to deadlocking problems for > >> applications. > > > Is the plan to allow one backend to shared lock the row while others can > > read it but not modify it, or is the idea to actually allow multiple > > backends to record their shared status on the row? > > Plan? We have no plan to fix this :-(. But clearly there has to be > some way to tell which backends hold read locks on a shared-locked row, > else you can't tell if they've all dropped the lock or not. I suppose we could allow one backend to mark the page with a shared lock for primary key purposes while others read it. Does that buy us anything? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > I suppose we could allow one backend to mark the page with a shared lock > for primary key purposes while others read it. Does that buy us > anything? That doesn't work, unless you insist that the first backend can't exit its transaction until all the other ones are done. Which introduces its own possibilities for deadlock --- but even worse, how does the first backend *know* that the other ones are done? You're right back where you started: it has to be possible to tell which backends have share-locked a particular row. regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> That doesn't work, unless you insist that the first backend Tom> can't exit its transaction until all the otherones are done. Tom> Which introduces its own possibilities for deadlock --- but Tom> even worse, how does the firstbackend *know* that the other Tom> ones are done? You're right back where you started: it has Tom> to be possibleto tell which backends have share-locked a Tom> particular row. Is a count a solution ? The first backend gets the S lock on the row - I'm assuming you plan to do it by recording it on the tuple and not in a shared memory lock table, which means that you might have to unnecessarily write an unmodified page if its buffer pool frame is stolen. The problem is that on commit time, you must carefully decrement the count value of shared locks on any tuple that you own. This can be accomplished by having each backend keep track of the list of files and TIDs for any rows for which it acquired S locks. Is this the same way that pgsql releases the X locks ? Bruce, I don't disagree that MVCC has the very nice property that writers don't block readers. However, I don't buy that 2-phase locking, with lock escalation is either unworkable because of too many locks, or causes any extra pain for the user application (apart from the fact that writers not blocking readers gives you more concurrency at some very minor overhead of not being strictly serializable). -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
On Thu, 2003-07-24 at 02:45, Sailesh Krishnamurthy wrote: > >>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > > Tom> That doesn't work, unless you insist that the first backend > Tom> can't exit its transaction until all the other ones are done. > Tom> Which introduces its own possibilities for deadlock --- but > Tom> even worse, how does the first backend *know* that the other > Tom> ones are done? You're right back where you started: it has > Tom> to be possible to tell which backends have share-locked a > Tom> particular row. > > Is a count a solution ? Almost. Problem with a count is the difficulty decrementing the count after a crash. You would also need to store a log of some kind so you know what to reset while starting up in error recovery mode which would add a large amount of overhead. It may be best to have a locking manager run as a separate process. That way it could store locks in ram or spill over to disk. Now the problem is communication overhead. It probably isn't any worse than writing them to disk for long queries (actual disk activity resulting), but short queries are probably going to notice.
Rod Taylor <rbt@rbt.ca> writes: > It may be best to have a locking manager run as a separate process. > That way it could store locks in ram or spill over to disk. Hmm, that might be workable. We could imagine that in place of the HEAP_MARKED_FOR_UPDATE status bit, we have a "this row is possibly locked" hint bit. Only if you see the bit set do you need to query the lock manager. If the answer comes back that no lock is held, you can clear the bit --- so no need for any painful "undo" stuff after a crash, and no communication overhead in the normal case. regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> Rod Taylor <rbt@rbt.ca> writes: >> It may be best to have a locking manager run as a separate >> process. Thatway it could store locks in ram or spill over to >> disk. Tom> Hmm, that might be workable. We could imagine that in place Tom> of the HEAP_MARKED_FOR_UPDATE status bit, wehave a "this row Tom> is possibly locked" hint bit. Only if you see the bit set do Tom> you need to query the lockmanager. If the answer comes back Why do you want to query the lock manager as a separate process ? Why not have the traditional approach of a lock table in shared memory, growing and shrinking as appropriate, and have each individual process update it (need to protect it with a latch of course). -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
Sailesh Krishnamurthy <sailesh@cs.berkeley.edu> writes: > Why not have the traditional approach of a lock table in shared > memory, growing and shrinking as appropriate, Because we can't grow shared memory. Whatever size we get at startup is what we're stuck with. (I suppose we could try asking the kernel for additional segments, but there's every likelihood that that will fail.) regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> Sailesh Krishnamurthy <sailesh@cs.berkeley.edu> writes: >> Why not have the traditional approach of a lock tablein shared >> memory, growing and shrinking as appropriate, Tom> Because we can't grow shared memory. Whatever size we get at Tom> startup is what we're stuck with. (I supposewe could try Tom> asking the kernel for additional segments, but there's every Tom> likelihood that that willfail.) We implemented a Shared Memory MemoryContext using OSSP libmm (used in Apache) for TelegraphCQ. If you think it's useful I can submit it as a patch. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
Sailesh Krishnamurthy wrote: > >>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > > Tom> Sailesh Krishnamurthy <sailesh@cs.berkeley.edu> writes: > >> Why not have the traditional approach of a lock table in shared > >> memory, growing and shrinking as appropriate, > > Tom> Because we can't grow shared memory. Whatever size we get at > Tom> startup is what we're stuck with. (I suppose we could try > Tom> asking the kernel for additional segments, but there's every > Tom> likelihood that that will fail.) > > We implemented a Shared Memory MemoryContext using OSSP libmm (used in > Apache) for TelegraphCQ. > > If you think it's useful I can submit it as a patch. > > -- > Pip-pip > Sailesh > http://www.cs.berkeley.edu/~sailesh ^^^^^^^^^^^^^^^ Watch out, that code from Berkeley usually is a mess. :-) Seriously, though, it would be good to see what you guys are up to. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Sailesh Krishnamurthy <sailesh@cs.berkeley.edu> writes: > We implemented a Shared Memory MemoryContext using OSSP libmm (used in > Apache) for TelegraphCQ. How portable is that? What kind of real-world performance do you get? regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> Sailesh Krishnamurthy <sailesh@cs.berkeley.edu> writes: >> We implemented a Shared Memory MemoryContext using OSSPlibmm >> (used in Apache) for TelegraphCQ. Tom> How portable is that? What kind of real-world performance do Tom> you get? As portable as libmm I guess. We've only tried it on Linux and MacOS X. I understand that the Apache guys use it - so it should be fairly portable. http://www.ossp.org/pkg/lib/mm/ Actually I've also tried it successfully on SPARC Solaris. I believe that a user of our first TelegraphCQ release (that we did not announce) tried it on FreeBSD. Sean Chittenden has tried it on FreeBSD but I'm not sure if he built it and got it to work properly. As for our SHMemoryContext code itself, it's pretty much a straight copy of the code in aset.c and mctx.c, with appropriate changes where we call the mm_alloc functions and also use mm_lock and mm_unlock for protection. As for performance, I'm in the midst of a performance study, but it's not a micro-benchmark of the shmem allocator. After I get some numbers I'll start profiling our code. One problem is that on some platforms the locking implementation uses the file-system. This is not in our control and we just trust mm to do the best possible. Given that we rely on shared query execution (we run multiple queries in a single adaptive query plan) we don't have much choice - we must rely on shared memory. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
>>>>> "Bruce" == Bruce Momjian <pgman@candle.pha.pa.us> writes: >> -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh Bruce> ^^^^^^^^^^^^^^^ Watch out, that code from Bruce> Berkeley usually is a mess. :-) LOL ! That's why we release the code - in the hope of getting real hackers playing with it :-) Bruce> Seriously, though, it would be good to see what you guys Bruce> are up to. Our code is based on 7.3.2 pgsql and available at: http://telegraph.cs.berkeley.edu/telegraphcq If the specific shm stuff is interesting, I can easily take it out and submit it. If you prefer to just look at the code in the tarball that's fine too. Or you can browse our cvs repository at: http://triplerock.cs.berkeley.edu:8080/viewcvs/viewcvs.cgi/ Look for shmctx.c and shmset.c in src/backend/utils/mmgr -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
I was thinking of adding to TODO:* Allow shared row locks for referential integrity but how is that different from: * Implement dirty reads and use them in RI triggers --------------------------------------------------------------------------- Tom Lane wrote: > Rod Taylor <rbt@rbt.ca> writes: > > It may be best to have a locking manager run as a separate process. > > That way it could store locks in ram or spill over to disk. > > Hmm, that might be workable. We could imagine that in place of the > HEAP_MARKED_FOR_UPDATE status bit, we have a "this row is possibly > locked" hint bit. Only if you see the bit set do you need to query > the lock manager. If the answer comes back that no lock is held, > you can clear the bit --- so no need for any painful "undo" stuff > after a crash, and no communication overhead in the normal case. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> writes: > I was thinking of adding to TODO: > * Allow shared row locks for referential integrity > but how is that different from: > * Implement dirty reads and use them in RI triggers It'd be a completely different approach to solving the FK locking problem. I wouldn't think we'd do both. Personally I'd feel more comfortable with a shared-lock approach, if we could work out the scalability issues. Dirty reads seem ... well ... dirty. regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > I was thinking of adding to TODO: > > * Allow shared row locks for referential integrity > > but how is that different from: > > * Implement dirty reads and use them in RI triggers > > It'd be a completely different approach to solving the FK locking > problem. I wouldn't think we'd do both. > > Personally I'd feel more comfortable with a shared-lock approach, if we > could work out the scalability issues. Dirty reads seem ... well ... > dirty. TODO updated: * Implement dirty reads or shared locks and use them in RI triggers -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> Personally I'd feel more comfortable with a shared-lock Tom> approach, if we could work out the scalability issues. Dirty Tom> reads seem ... well ... dirty. Tom I was going to do some experiments to measure the costs of our shared memory MemoryContext. Is there something specific you'd be interested in ? My first goal is mainly to measure the amount of time it takes to allocate and deallocate shared memory in the contested and uncontested cases. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
>From: Tom Lane <tgl@sss.pgh.pa.us> >To: "Jenny -" <nat_lazy@hotmail.com> >Subject: Re: [HACKERS] this is in plain text (row level locks) Date: Sat, >02 Aug 2003 23:28:30 -0400 > > > if row-level locks are not recorded in proclock or any other shared >memory > > datastructuers, then why does lockmode (array or ints) of proclock >indicate > > that an AccessShareLock is acquired when a row is locked by >application.? > >That's a table lock --- it's independent of row locks. It's there >mostly to ensure someone doesn't delete the whole table out from under >you. > > regards, tom lane so even though the application locks a row in a table, table-level locks are automatically taken by postgesql ? why is that? thanks _________________________________________________________________ The new MSN 8: advanced junk mail protection and 2 months FREE* http://join.msn.com/?page=features/junkmail
"Jenny -" <nat_lazy@hotmail.com> writes: > so even though the application locks a row in a table, table-level locks are > automatically taken by postgesql ? why is that? So that the table doesn't disappear while you're trying to scan it. (Or afterwards --- a row-level lock wouldn't be noticed by DROP TABLE.) Note that AccessShareLock is a pretty weak kind of lock, and holding it does not prevent most other operations. regards, tom lane