Re: Obscure: correctness of lock manager??? - Mailing list pgsql-hackers

From Thomas Schoebel-Theuer
Subject Re: Obscure: correctness of lock manager???
Date
Msg-id 200308291054.h7TAsccH027147@eiche.informatik.uni-stuttgart.de
In response to Re: Obscure: correctness of lock manager???  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Obscure: correctness of lock manager???
List pgsql-hackers
Hi Tom,

the problem persists, even when starting from scratch. I did the following:

# wget ftp://ftp.de.postgresql.org/mirror/postgresql/source/v7.3.4/postgresql-7.3.4.tar.gz
# tar xzf postgresql-7.3.4.tar.gz
# cd postgresql-7.3.4/
# cat ../mypatch
--- src/backend/storage/lmgr/lock.c~    2002-11-01 01:40:23.000000000 +0100
+++ src/backend/storage/lmgr/lock.c     2003-08-29 11:23:02.000000000 +0200
@@ -467,6 +467,8 @@
        LWLockAcquire(masterLock, LW_EXCLUSIVE);
 
+       printf("lock\n"); fflush(stdout);
+
        /*
         * Find or create a lock with this tag
         */
@@ -682,8 +684,13 @@
                /*
                 * Sleep till someone wakes me up.
                 */
+
+               printf("before wait\n"); fflush(stdout);
+
                status = WaitOnLock(lockmethod, lockmode, lock, holder);
 
+               printf("after wait\n"); fflush(stdout);
+
                /*
                 * NOTE: do not do any material change of state between here and
                 * return.     All required changes in locktable state must have been
 
# patch -p0 < ../mypatch
# gmake
# gmake install

After running DBT3 with scale factor 0.025 and 8 concurrent processes:

$ wc -l run/dbt3_logfile
51941 run/dbt3_logfile
$ grep lock run/dbt3_logfile | wc -l
51941
$ grep wait run/dbt3_logfile | wc -l
0

Well, I just added three printf() statements. I cannot imagine
how that could break postgresql.

I repeated the test with the following additional modifications:

# cat ../mypatch2
--- src/backend/storage/lmgr/lock.c~    2003-08-29 11:26:37.000000000 +0200
+++ src/backend/storage/lmgr/lock.c     2003-08-29 11:57:26.000000000 +0200
@@ -39,6 +39,7 @@
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
 
+#include <sched.h>
 
 /* This configuration variable is used to set the lock table size */
 int                    max_locks_per_xact; /* set by guc.c */
 
@@ -1160,6 +1161,7 @@
                ProcLockWakeup(lockMethodTable, lock);
 
        LWLockRelease(masterLock);
+       sched_yield();
        return TRUE;
 }
 
@@ -1337,6 +1339,8 @@
                elog(LOG, "LockReleaseAll: done");
 #endif
 
+       sched_yield();
+
        return TRUE;
 }

This should lead to very heavy scheduling, such that processes are
better interleaved. After running DBT3: same result.

With my other patch producing thorough log output, the sched_yield()
leads to a higher probability of observing badly granted locks.

So it is very unlikely that my printf()s and postprocessing of the
logfile cause that problem. I have even observed cases where
the error occurs within the first 10 locks, such that I can
compute the lock state by hand and verify by hand that there really
exist locks of mode 7 which are granted in parallel to different
processes.
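
The by-hand check described above can also be sketched as a small replay
routine. This is only an illustration under stated assumptions: the
lock_event record, its fields, and first_conflicting_grant() are
hypothetical and reflect neither the actual log format nor any PostgreSQL
API, and the sketch assumes that the mode being checked (mode 7 here)
conflicts with itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical log record: one grant or release of a lock. */
enum event_kind { EV_GRANT, EV_RELEASE };

struct lock_event {
    enum event_kind kind;
    int pid;      /* backend process id */
    int lock_id;  /* stand-in for the lock tag */
    int mode;     /* lock mode number, e.g. 7 */
};

#define MAX_HELD 1024

/*
 * Replay the events in order. Return the index of the first grant of
 * exclusive_mode on a lock that a different process already holds in
 * that same mode, or -1 if no such overlap occurs.
 * Assumption: exclusive_mode conflicts with itself.
 */
int first_conflicting_grant(const struct lock_event *ev, size_t n,
                            int exclusive_mode)
{
    struct { int pid; int lock_id; } held[MAX_HELD];
    size_t nheld = 0;

    for (size_t i = 0; i < n; i++) {
        if (ev[i].mode != exclusive_mode)
            continue;               /* other modes are ignored here */

        if (ev[i].kind == EV_GRANT) {
            /* Any other pid holding the same lock is a violation. */
            for (size_t j = 0; j < nheld; j++)
                if (held[j].lock_id == ev[i].lock_id &&
                    held[j].pid != ev[i].pid)
                    return (int) i;

            assert(nheld < MAX_HELD);
            held[nheld].pid = ev[i].pid;
            held[nheld].lock_id = ev[i].lock_id;
            nheld++;
        } else {
            /* Drop one matching holder entry, if present. */
            for (size_t j = 0; j < nheld; j++)
                if (held[j].lock_id == ev[i].lock_id &&
                    held[j].pid == ev[i].pid) {
                    held[j] = held[nheld - 1];
                    nheld--;
                    break;
                }
        }
    }
    return -1;
}
```

A return value other than -1 points at the log entry where two processes
would hold the same self-conflicting lock at once, which is the situation
described above.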

Although I cannot be sure that my environment (kernel, libc,
compiler, ...) produces that behaviour, I think that there
remains some probability for a bug in the lock manager. I have
repeated the tests on two different machines, one of them a
dual-processor Athlon MP-1900+, the other a single processor
Athlon 3000+. OK, both systems are running Redhat 9, so there
remain some chances that something very obscure happens on the
OS level which is reproducible on both systems.

In order to find out possible OS effects, the above tests should
be repeated by other people on other platforms. Please, if anyone
could kindly do that, report the results here.

Tom, it sounds really strange, and I can hardly believe it myself,
but I could imagine why that problem (if it really exists) was
not detected before. The following is no claim, it is just an idea
how it could have happened. Please don't take it as a personal
attack, I just want to explain that it _could_ be possible that
a non-working lock manager has not led to any noticeable problems.
Also, I don't want to stimulate a discussion whether the following
is right or not. It could be wrong.

(1) Most of the locks are non-conflicting by nature.
(2) If I understand it right, read-only txns use time-domain addressing
    and thus never conflict with any other txns. Only read-write
    txns can ever produce races on data.
(3) Critical regions are often only a small percentage of the overall
    running time of a process.
(4) Rescheduling by the OS occurs not when processes are woken up,
    but rather only when a process blocks by itself or when a
    timer interrupt occurs.
(5) Current processors are faster than timer interrupts (typically
    100/s) by a factor of 10 million. When a process does not
    block by itself, it will be interrupted only after 10 million
    instructions on average. Thus the probability of hitting a critical
    region just in that rare moment is extremely low.
(6) I ran my tests on extremely small databases which fit in the
    buffer cache of the OS. Real-world apps are doing much more
    physical disk IO. At disk IO, rescheduling _always_ occurs at
    the same place. When processes run for less than 10ms until
    the next timer interrupt, there will never be interruptions
    at unforeseeable places.
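
The arithmetic behind point (5) can be written down as a rough sketch.
The figures of 100 timer interrupts per second and 10 million instructions
per interrupt come from the text above (together they imply about 1e9
instructions per second). The function names and the 100-instruction
critical region used in the usage note are hypothetical, and the model
simply assumes that a timer interrupt lands uniformly at random within the
instructions executed between two interrupts.

```c
#include <assert.h>

/* Instructions executed between two consecutive timer interrupts. */
double instructions_per_tick(double instr_per_sec, double ticks_per_sec)
{
    assert(ticks_per_sec > 0.0);
    return instr_per_sec / ticks_per_sec;
}

/*
 * Probability that a single timer interrupt falls inside a critical
 * region of critical_instr instructions, under the uniform model
 * described above. The result is capped at 1.0.
 */
double preempt_probability(double critical_instr, double instr_per_sec,
                           double ticks_per_sec)
{
    double per_tick = instructions_per_tick(instr_per_sec, ticks_per_sec);
    double p = critical_instr / per_tick;

    return p > 1.0 ? 1.0 : p;
}
```

Under these assumptions, preempt_probability(100.0, 1e9, 100.0) is about
1e-5 per timer tick for a critical region of 100 instructions, which
illustrates why such an interleaving could go unobserved for a long time.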
 

In summary, if this theory is right, it _could_ be _possible_ that
"unpredictable" behaviour has never been noticed, because it occurs
only with extremely low probability.

I don't want to claim that this is the reality, I just want to provide
some idea how it _could_ have happened if the problem really exists.

Tom, please dig into the problem. If the lock manager is really
wrong, all my measurements are at least questionable, if not void.
I have written a paper relying on those measurements and want
to submit it to a conference in 2 weeks. I hope that fixing that
problem (if it exists) will not lead to totally different behaviour
and render my whole work void. Please, help me by investigating
the problem and finding out what happens, and fixing it if it should
turn out to be a bug.

Cheers,

Thomas

