Thread: Obscure: correctness of lock manager???
Hi, I'm doing research on the locking patterns of applications. I chose PostgreSQL 7.3.3 as an example application because its source code is available. I instrumented the file backend/storage/lmgr/lock.c with printf() statements in order to find out the locking behaviour of typical applications that use locking. On top of PostgreSQL I ran the DBT3 benchmark from OSDL, which should be very similar to the well-known TPC-H benchmark, with small scale factors (0.05, 0.1) and 8 or 16 backend processes.

When I later analyzed the logfiles produced that way, I found very strange behaviour: exclusive locks were granted in parallel, even to different processes running concurrently. To check this, I added additional traces and found that the lock manager NEVER blocked; it ALWAYS ran straight through! I don't understand the rationale behind the lock manager deeply, but if my observations were true, it would mean that you could eliminate the lock manager altogether without producing different behaviour. To my understanding, the only side effect of a lock manager is to delay processes in case of conflicts; if no delay ever occurs, the lock manager has no effect. A database without any lock manager probably does not satisfy correctness criteria such as the ACID properties. However, the source code is written very cleanly and carefully, and it looks as if the authors had reasons for implementing it that way.

I found that LockCheckConflicts() took the first shortcut return almost every time, leading to that presumably wrong behaviour. However, I found other cases where the code after the comment "Rats." was entered, also sometimes leading to bad grants of conflicting locks. Interestingly, I found not a single case where LockCountMyLocks() was ever called. I don't understand the lock manager deeply enough to trace down that presumable bug.

What I also cannot understand: why are locks always granted if they conflict with other locks held by the _same_ process?
In my opinion, this should be incorrect. When a process holds locks for different transactions, the conflicts should be checked at the _transaction_ level, not at the process level. Otherwise ACID could be violated, e.g. when SQL statements of multiple txns are executed by one process in an interleaved way, there would be NO locking at all.

Can anybody explain to me the rationale behind the lock manager implementation? Have I overlooked something?

Cheers, Thomas

[Sorry if similar postings appeared several times in the mailing list, I had some problems with majordomo]
Thomas Schoebel-Theuer <schoebel@informatik.uni-stuttgart.de> writes:
> I instrumented the file backend/storage/lmgr/lock.c with printf() statements
> in order to find out the locking behaviour of typical applications using
> locking.
> When I later analyzed the logfiles produced that way, I found a very strange
> behaviour: exclusive locks were granted in parallel, even to different
> processes running in parallel.

If that were actually true, we'd have noticed it before now ;-). Either you broke something while adding instrumentation, or you are misinterpreting your data.

> I found no single case where LockCountMyLocks() was ever called.

I'm leaning to the "you broke something" theory. LockCountMyLocks() is in the main path of LockAcquire, and could only be missed if the early-out path is taken due to already holding a lock of the required type.

> What I also cannot understand: why are locks always granted if they
> conflict with other locks held by the _same_ process? In my opinion,
> this should be incorrect.

It is convenient for our purposes. Consider for example

	BEGIN;
	LOCK TABLE foo;
	INSERT INTO foo ...

If we did things as you suggest, this would fail, because the write lock needed by the INSERT would conflict with the prior exclusive LOCK. Working around that would require re-implementing essentially the same we-don't-conflict-with-our-own-locks semantics outside the lock manager.

> When a process holds locks for different transactions,

We never hold locks for different transactions, because a backend does only one thing at a time.

			regards, tom lane
Hi Tom,

the problem persists, even when starting from scratch. I did the following:

# wget ftp://ftp.de.postgresql.org/mirror/postgresql/source/v7.3.4/postgresql-7.3.4.tar.gz
# tar xzf postgresql-7.3.4.tar.gz
# cd postgresql-7.3.4/
# cat ../mypatch
--- src/backend/storage/lmgr/lock.c~	2002-11-01 01:40:23.000000000 +0100
+++ src/backend/storage/lmgr/lock.c	2003-08-29 11:23:02.000000000 +0200
@@ -467,6 +467,8 @@
 	LWLockAcquire(masterLock, LW_EXCLUSIVE);
 
+	printf("lock\n"); fflush(stdout);
+
 	/*
 	 * Find or create a lock with this tag
 	 */
@@ -682,8 +684,13 @@
 	/*
 	 * Sleep till someone wakes me up.
 	 */
+
+	printf("before wait\n"); fflush(stdout);
+
 	status = WaitOnLock(lockmethod, lockmode, lock, holder);
+	printf("after wait\n"); fflush(stdout);
+
 	/*
 	 * NOTE: do not do any material change of state between here and
 	 * return. All required changes in locktable state must have been
# patch -p0 < ../mypatch
# gmake
# gmake install

After running DBT3 with scale factor 0.025 and 8 concurrent processes:

$ wc -l run/dbt3_logfile
51941 run/dbt3_logfile
$ grep lock run/dbt3_logfile | wc -l
51941
$ grep wait run/dbt3_logfile | wc -l
0

Well, I just added three printf() statements. I cannot imagine how that could break PostgreSQL. I repeated the test with the following additional modifications:

# cat ../mypatch2
--- src/backend/storage/lmgr/lock.c~	2003-08-29 11:26:37.000000000 +0200
+++ src/backend/storage/lmgr/lock.c	2003-08-29 11:57:26.000000000 +0200
@@ -39,6 +39,7 @@
 #include "utils/memutils.h"
 #include "utils/ps_status.h"
+#include <sched.h>
 
 /* This configuration variable is used to set the lock table size */
 int		max_locks_per_xact;		/* set by guc.c */
@@ -1160,6 +1161,7 @@
 	ProcLockWakeup(lockMethodTable, lock);
 
 	LWLockRelease(masterLock);
+	sched_yield();
 
 	return TRUE;
 }
@@ -1337,6 +1339,8 @@
 	elog(LOG, "LockReleaseAll: done");
 #endif
 
+	sched_yield();
+
 	return TRUE;
 }

This should lead to very heavy rescheduling, such that the processes are better interleaved. After running DBT3: same result.
With my other patch producing thorough log output, the sched_yield() leads to a higher probability of observing badly granted locks. So it is very unlikely that my printf()s and the postprocessing of the logfile cause the problem. I have even observed cases where the error occurs within the first 10 locks, so that I can compute the lock state by hand and verify by hand that there really exist locks of mode 7 which are granted in parallel to different processes.

Although I cannot be sure that my environment (kernel, libc, compiler, ...) produces that behaviour, I think there remains some probability of a bug in the lock manager. I have repeated the tests on two different machines, one a dual-processor Athlon MP-1900+, the other a single-processor Athlon 3000+. Both systems are running Redhat 9, though, so there remains some chance that something very obscure happens at the OS level which is reproducible on both systems. In order to rule out possible OS effects, the above tests should be repeated by other people on other platforms. Please, if anyone could kindly do that, report the results here.

Tom, it sounds really strange, and I too can hardly believe it, but I could imagine why this problem (if it really exists) was not detected before. The following is no claim, just an idea of how it could have happened. Please don't take it personally; I just want to explain that it _could_ be possible that a non-working lock manager has not led to any noticeable problems. Also, I don't want to start a discussion about whether the following is right or not. It could be wrong.

(1) Most of the locks are non-conflicting by nature.

(2) If I understand it right, read-only txns use time-domain addressing and thus never conflict with any other txns. Only read-write txns can ever produce races on data.

(3) Critical regions are often only a small percentage of the overall running time of a process.
(4) Rescheduling by the OS occurs not when processes are woken up, but rather only when a process blocks of its own accord or when a timer interrupt occurs.

(5) Current processors are faster than timer interrupts (typically 100/s) by a factor of 10 million. When a process does not block of its own accord, it will be interrupted only after 10 million instructions on average. Thus the probability of hitting a critical region in exactly that rare moment is extremely low.

(6) I ran my tests on extremely small databases which fit in the buffer cache of the OS. Real-world apps do much more physical disk IO. At disk IO, rescheduling _always_ occurs at the same place. When processes run for less than 10 ms until the next timer interrupt, there will never be interruptions at unforeseeable places.

In summary, if this theory is right, it _could_ be _possible_ that "unpredictable" behaviour has never been noticed, because it occurs only with extremely low probability. I don't want to claim that this is the reality, just to provide some idea of how it _could_ have happened if the problem really exists.

Tom, please dig into the problem. If the lock manager is really wrong, all my measurements are at least questionable, if not void. I have written a paper relying on those measurements and want to submit it to a conference in 2 weeks. I hope that fixing the problem (if it exists) will not lead to totally different behaviour and render my whole work void. Please help me by investigating the problem, finding out what happens, and fixing it if it should turn out to be a bug.

Cheers, Thomas
Thomas Schoebel-Theuer <schoebel@informatik.uni-stuttgart.de> writes:
> the problem persists, even when starting from scratch. I did the following:
> +	printf("lock\n"); fflush(stdout);
> $ grep lock run/dbt3_logfile | wc -l

I'd bet that your logfile is not accumulating postmaster stdout, but only stderr. Or maybe not even stderr --- where are you getting it from exactly? Perhaps you're logging client-side output, rather than that of the backends.

> Tom, it sounds really strange, and I also cannot nearly believe it,
> but I could imagine why that problem (if it really exists) was
> not detected before.

If I actually believed your experiment, it would prove that LockAcquire wasn't being called at all. It is trivial to demonstrate that this is not so, eg, attach to a backend with gdb and set a breakpoint at LockAcquire. Or, if you'd like some more macroscopic proof that the lock manager is doing something, try this:

psql session 1:
	create table a(f1 int);
	create table b(f1 int);
	begin;
	lock table a;

psql session 2:
	begin;
	lock table b;
	lock table a;	-- note that it blocks waiting for session 1's lock

back in session 1:
	lock table b;	-- note deadlock failure report

			regards, tom lane
Tom, I just realized that I probably misinterpreted the locktag information. That could have produced the spurious conflicts in my postprocessing. Apologies if that was the reason. I'm running further checks to resolve the question.

However, what remains strange is that the lock manager never blocks, at least when running DBT3. The short patch shows with high confidence that there is no blocking at all. I'm pretty sure that I got the output of the backend processes, because I included getpid() in the full version of the instrumentation. The stripped-down version showed the "lock\n" lines, so I am sure that I got the backend output. I'll include getpid() even there to be really sure.

If the lock manager is correct and it is simply the case that no blocking ever occurs in DBT3, then I have another problem: the experiment would probably be meaningless for my research. In that case, do you know any benchmark / application which leads to heavy conflicts in the lock manager, such that blocking occurs? Do you have any clue whether it really could happen that NEVER any blocking occurs when running the DBT3 benchmark with 8 or 16 parallel backend processes???

Thanks for your patience, Thomas
Thomas Schoebel-Theuer <schoebel@informatik.uni-stuttgart.de> writes:
> Do you have any clue whether it really could happen that
> NEVER any blocking occurs when running the DBT3 benchmark with 8
> or 16 parallel backend processes???

I haven't looked at dbt3 at all ... what does it do? We do make a point of trying to prevent concurrent readers and writers from blocking each other, so depending on how the benchmark is set up, a low probability of lock conflict is certainly possible.

			regards, tom lane
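[Editor's aside: the readers-don't-block-writers point can be seen in a minimal two-session transcript in the style of Tom's earlier example. The table name is illustrative; this sketch assumes PostgreSQL's default MVCC behaviour, under which a plain SELECT takes no lock that conflicts with a concurrent row update.]

psql session 1:
	create table t(f1 int);
	insert into t values (1);
	begin;
	update t set f1 = 2;	-- takes a row-level write lock, txn left open

psql session 2:
	select * from t;	-- does not block; sees the old row (f1 = 1)

If most of a benchmark's statements pair off like this, the lock manager's wait path may simply never be exercised, which would be consistent with the DBT3 traces showing no blocking.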