Thread: Obscure: correctness of lock manager???
Hi, I'm doing research on the locking patterns of applications. I chose PostgreSQL 7.3.3 as an example application because its source code is available. I instrumented the file backend/storage/lmgr/lock.c with printf() statements in order to find out the locking behaviour of typical applications that use locking. On top of PostgreSQL I ran the DBT3 benchmark from OSDL, which should be very similar to the well-known TPC-H benchmark, with small scale factors (0.05, 0.1) and 8 or 16 backend processes.

When I later analyzed the logfiles produced that way, I found very strange behaviour: exclusive locks were granted in parallel, even to different processes running concurrently. To check this, I added additional traces and found that the lock manager NEVER blocked; it ALWAYS ran straight through! I don't understand the rationale behind the lock manager deeply, but if my observations were true, it would mean that you could eliminate the lock manager altogether without producing different behaviour. To my understanding, the only side effect of a lock manager is to delay processes in case of conflicts; if no delay ever occurs, the lock manager has no effect. A database without any lock manager probably does not satisfy correctness criteria such as the ACID properties. However, the source code is written very cleanly and carefully, and it looks as if the authors had reasons for implementing it that way.

I found that LockCheckConflicts() took the first shortcut return almost every time, leading to that presumably wrong behaviour. However, I found other cases where the code after the comment "Rats." was entered, also sometimes leading to bad grants of conflicting locks. Interestingly, I found not a single case where LockCountMyLocks() was ever called. I don't understand the lock manager deeply enough to trace down that presumable bug.

What I also cannot understand: why are locks always granted if they conflict with other locks held by the _same_ process?
In my opinion, this should be incorrect. When a process holds locks for different transactions, the conflicts should be checked at the _transaction_ level, not at the process level. Otherwise ACID could be violated, e.g. when SQL statements of multiple txns are executed by one process in an interleaved way, there would be NO locking at all.

Can anybody explain to me the rationale behind the lock manager implementation? Have I overlooked something?

Cheers, Thomas

[Sorry if similar postings appeared several times in the mailing list, I had some problems with majordomo]
Thomas Schoebel-Theuer <schoebel@informatik.uni-stuttgart.de> writes:
> I instrumented the file backend/storage/lmgr/lock.c with printf() statements
> in order to find out the locking behaviour of typical applications using
> locking.
> When I later analyzed the logfiles produced that way, I found a very strange
> behaviour: exclusive locks were granted in parallel, even to different
> processes running in parallel.

If that were actually true, we'd have noticed it before now ;-). Either you broke something while adding instrumentation, or you are misinterpreting your data.

> I found no single case where LockCountMyLocks() was ever called.

I'm leaning to the "you broke something" theory. LockCountMyLocks() is in the main path of LockAcquire, and could only be missed if the early-out path is taken due to already holding a lock of the required type.

> What I also cannot understand: why are locks always granted if they
> conflict with other locks held by the _same_ process? In my opinion,
> this should be incorrect.

It is convenient for our purposes. Consider for example

	BEGIN;
	LOCK TABLE foo;
	INSERT INTO foo ...

If we did things as you suggest, this would fail, because the write lock needed by the INSERT would conflict with the prior exclusive LOCK. Working around that would require re-implementing essentially the same we-don't-conflict-with-our-own-locks semantics outside the lock manager.

> When a process holds locks for different transactions,

We never hold locks for different transactions, because a backend does only one thing at a time.

			regards, tom lane
Hi Tom,

the problem persists, even when starting from scratch. I did the following:

# wget ftp://ftp.de.postgresql.org/mirror/postgresql/source/v7.3.4/postgresql-7.3.4.tar.gz
# tar xzf postgresql-7.3.4.tar.gz
# cd postgresql-7.3.4/
# cat ../mypatch
--- src/backend/storage/lmgr/lock.c~	2002-11-01 01:40:23.000000000 +0100
+++ src/backend/storage/lmgr/lock.c	2003-08-29 11:23:02.000000000 +0200
@@ -467,6 +467,8 @@
 	LWLockAcquire(masterLock, LW_EXCLUSIVE);
 
+	printf("lock\n"); fflush(stdout);
+
 	/*
 	 * Find or create a lock with this tag
 	 */
@@ -682,8 +684,13 @@
 	/*
 	 * Sleep till someone wakes me up.
 	 */
+
+	printf("before wait\n"); fflush(stdout);
+
 	status = WaitOnLock(lockmethod, lockmode, lock, holder);
+	printf("after wait\n"); fflush(stdout);
+
 	/*
 	 * NOTE: do not do any material change of state between here and
 	 * return. All required changes in locktable state must have been
# patch -p0 < ../mypatch
# gmake
# gmake install

After running DBT3 with scale factor 0.025 and 8 concurrent processes:

$ wc -l run/dbt3_logfile
51941 run/dbt3_logfile
$ grep lock run/dbt3_logfile | wc -l
51941
$ grep wait run/dbt3_logfile | wc -l
0

Well, I just added three printf() statements. I cannot imagine how that could break PostgreSQL. I repeated the test with the following additional modifications:

# cat ../mypatch2
--- src/backend/storage/lmgr/lock.c~	2003-08-29 11:26:37.000000000 +0200
+++ src/backend/storage/lmgr/lock.c	2003-08-29 11:57:26.000000000 +0200
@@ -39,6 +39,7 @@
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
+#include <sched.h>
 
 /* This configuration variable is used to set the lock table size */
 int		max_locks_per_xact;		/* set by guc.c */
@@ -1160,6 +1161,7 @@
 	ProcLockWakeup(lockMethodTable, lock);
 
 	LWLockRelease(masterLock);
+	sched_yield();
 
 	return TRUE;
 }
@@ -1337,6 +1339,8 @@
 	elog(LOG, "LockReleaseAll: done");
 #endif
 
+	sched_yield();
+
 	return TRUE;
 }

This should lead to very heavy rescheduling, such that the processes are better interleaved. After running DBT3: same result.
With my other patch producing thorough log output, the sched_yield() leads to a higher probability of observing badly granted locks. So it is very unlikely that my printf()s and the postprocessing of the logfile cause the problem. I have even observed cases where the error occurs within the first 10 locks, so that I can compute the lock state by hand and verify by hand that there really exist locks of mode 7 which are granted in parallel to different processes.

Although I cannot be sure that my environment (kernel, libc, compiler, ...) produces that behaviour, I think there remains some probability of a bug in the lock manager. I have repeated the tests on two different machines, one a dual-processor Athlon MP-1900+, the other a single-processor Athlon 3000+. Both systems are running Redhat 9, though, so there remains some chance that something very obscure happens at the OS level which is reproducible on both systems. In order to rule out possible OS effects, the above tests should be repeated by other people on other platforms. Please, if anyone could kindly do that, report the results here.

Tom, it sounds really strange, and I too can hardly believe it, but I could imagine why this problem (if it really exists) was not detected before. The following is no claim, just an idea of how it could have happened. Please don't take it personally; I just want to explain that it _could_ be possible that a non-working lock manager has not led to any noticeable problems. Also, I don't want to start a discussion about whether the following is right or not. It could be wrong.

(1) Most of the locks are non-conflicting by nature.

(2) If I understand it right, read-only txns use time-domain addressing and thus never conflict with any other txns. Only read-write txns can ever produce races on data.

(3) Critical regions are often only a small percentage of the overall running time of a process.
(4) Rescheduling by the OS occurs not when processes are woken up, but rather only when a process blocks of its own accord or when a timer interrupt occurs.

(5) Current processors are faster than timer interrupts (typically 100/s) by a factor of 10 million. When a process does not block of its own accord, it will be interrupted only after 10 million instructions on average. Thus the probability of hitting a critical region in exactly that rare moment is extremely low.

(6) I ran my tests on extremely small databases which fit in the buffer cache of the OS. Real-world apps do much more physical disk IO. At disk IO, rescheduling _always_ occurs at the same place. When processes run for less than 10 ms until the next timer interrupt, there will never be interruptions at unforeseeable places.

In summary, if this theory is right, it _could_ be _possible_ that "unpredictable" behaviour has never been noticed, because it occurs only with extremely low probability. I don't want to claim that this is the reality, just to provide some idea of how it _could_ have happened if the problem really exists.

Tom, please dig into the problem. If the lock manager is really wrong, all my measurements are at least questionable, if not void. I have written a paper relying on those measurements and want to submit it to a conference in 2 weeks. I hope that fixing the problem (if it exists) will not lead to totally different behaviour and render my whole work void. Please help me by investigating the problem, finding out what happens, and fixing it if it should turn out to be a bug.

Cheers, Thomas
Thomas Schoebel-Theuer <schoebel@informatik.uni-stuttgart.de> writes:
> the problem persists, even when starting from scratch. I did the following:
> +	printf("lock\n"); fflush(stdout);
> $ grep lock run/dbt3_logfile | wc -l

I'd bet that your logfile is not accumulating postmaster stdout, but only stderr. Or maybe not even stderr --- where are you getting it from exactly? Perhaps you're logging client-side output, rather than that of the backends.

> Tom, it sounds really strange, and I also cannot nearly believe it,
> but I could imagine why that problem (if it really exists) was
> not detected before.

If I actually believed your experiment, it would prove that LockAcquire wasn't being called at all. It is trivial to demonstrate that this is not so, eg, attach to a backend with gdb and set a breakpoint at LockAcquire. Or, if you'd like some more macroscopic proof that the lock manager is doing something, try this:

psql session 1:
	create table a(f1 int);
	create table b(f1 int);
	begin;
	lock table a;

psql session 2:
	begin;
	lock table b;
	lock table a;	-- note that it blocks waiting for session 1's lock

back in session 1:
	lock table b;	-- note deadlock failure report

			regards, tom lane
Tom, I just realized that I probably misinterpreted the locktag information. That could have produced the spurious conflicts in my postprocessing. Apologies if that was the reason. I'm running further checks to resolve the question.

However, what remains strange is that the lock manager never blocks, at least when running DBT3. The short patch shows with high confidence that there is no blocking at all. I'm pretty sure that I got the output of the backend processes, because I included getpid() in the full version of the instrumentation. The stripped-down version showed the "lock\n" lines, so I am sure that I got the backend output. I'll include getpid() even there to be really sure.

If the lock manager is correct and it is simply the case that no blocking ever occurs in DBT3, then I have another problem: the experiment would probably be meaningless for my research. In that case, do you know any benchmark / application which leads to heavy conflicts in the lock manager, such that blocking occurs? Do you have any clue whether it really could happen that NEVER any blocking occurs when running the DBT3 benchmark with 8 or 16 parallel backend processes???

Thanks for your patience, Thomas
Thomas Schoebel-Theuer <schoebel@informatik.uni-stuttgart.de> writes:
> Do you have any clue whether it really could happen that
> NEVER any blocking occurs when running the DBT3 benchmark with 8
> or 16 parallel backend processes???

I haven't looked at dbt3 at all ... what does it do? We do make a point of trying to prevent concurrent readers and writers from blocking each other, so depending on how the benchmark is set up, a low probability of lock conflict is certainly possible.

			regards, tom lane
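[Editor's aside: the readers-don't-block-writers point can be seen in a minimal two-session transcript in the style of Tom's earlier example. The table name is illustrative; this sketch assumes PostgreSQL's default MVCC behaviour, under which a plain SELECT takes no lock that conflicts with a concurrent row update.]

psql session 1:
	create table t(f1 int);
	insert into t values (1);
	begin;
	update t set f1 = 2;	-- takes a row-level write lock, txn left open

psql session 2:
	select * from t;	-- does not block; sees the old row (f1 = 1)

If most of a benchmark's statements pair off like this, the lock manager's wait path may simply never be exercised, which would be consistent with the DBT3 traces showing no blocking.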