Thread: pg locking problem

pg locking problem

From
czl@iname.com (charles)
Date:
When running a test with multiple (>=2) users pg 'seizes up' after a
few transactions. Cpu goes 100% and disk goes to 0%. This lasts
'forever' (overnight)On the same test all other tested databases don't
have this problem.

The error occurs with higher tx rate, when transactions bump into each
other more frequently. Some 'deadlock detected' messages appear around
the hang up time, but _not_ always.

Occasionally - but rarely - the seizure looks differently. CPU goes to
0%, disk goes to 0% and, after about one minute, the processing is
resumed.

The current suspicion is that it is due to difference in lock and
deadlock handling between pg and most other dbs. In general I found
that I can't set transaction isolation level to READ_UNCOMMITTED (as
for other databases), the lowest level is READ_COMMITEED.

Any ideas how to reduce this problem? I really want to prove pg perf
with 20-100 users and have a problem running 2...

It wouldn't be fair (to other db's) to rewrite the test with some
pg-only LOCK command etc. I suspect I'm missing something simple, like
one of .conf parameters.


P.S. Using WinNT/Win2K system, pg 7.1.3 (current cygwin), jdbc driver
is jdbc7.1-1.3, cygipc is 1.10-1, java is 1.3.1_01a (current jdk).
Default pg installation, except for bumped up memory and 8 wal files.


Re: pg locking problem

From
Tom Lane
Date:
czl@iname.com (charles) writes:
> When running a test with multiple (>=2) users pg 'seizes up' after a
> few transactions.

Since you haven't told us a thing about what this test is, it's hard
to see how you expect to get any useful help ...
        regards, tom lane


Re: pg locking problem

From
Tatsuo Ishii
Date:
> When running a test with multiple (>=2) users pg 'seizes up' after a
> few transactions. Cpu goes 100% and disk goes to 0%. This lasts
> 'forever' (overnight)On the same test all other tested databases don't
> have this problem.
> 
> The error occurs with higher tx rate, when transactions bump into each
> other more frequently. Some 'deadlock detected' messages appear around
> the hang up time, but _not_ always.
> 
> Occasionally - but rarely - the seizure looks differently. CPU goes to
> 0%, disk goes to 0% and, after about one minute, the processing is
> resumed.
> 
> The current suspicion is that it is due to difference in lock and
> deadlock handling between pg and most other dbs. In general I found
> that I can't set transaction isolation level to READ_UNCOMMITTED (as
> for other databases), the lowest level is READ_COMMITEED.
> 
> Any ideas how to reduce this problem? I really want to prove pg perf
> with 20-100 users and have a problem running 2...

Have you ever tried the test on UNIX (or UNIX like systems)? I have
never seen such a problem with PostgreSQL running on UNIX. Also I
think PostgreSQL on Win is not ready for practical use...

> It wouldn't be fair (to other db's) to rewrite the test with some
> pg-only LOCK command etc. I suspect I'm missing something simple, like
> one of .conf parameters.
> 
> 
> P.S. Using WinNT/Win2K system, pg 7.1.3 (current cygwin), jdbc driver
> is jdbc7.1-1.3, cygipc is 1.10-1, java is 1.3.1_01a (current jdk).
> Default pg installation, except for bumped up memory and 8 wal files.
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: Have you searched our list archives?
> 
> http://archives.postgresql.org
> 


Re: pg locking problem

From
czl@iname.com (charles)
Date:
For various reasons, not wholly dependent on me, the test should show
good perf on Windows. Otherwise Sorry about that. Can't change the
platform for this one. I did scan the archives, without finding
anything similar - though maybe my search was not thorough enough.

I managed to isolate the bug further. 

1. Running with read-only transactions the bug does not occur. This
means that the bug is not directly related to the number of users (as
long as there's more than one).

2. Running with read-only transactions _and_ just _one_ type of a
read-write transaction the bug occurs. This means that the bug is not
caused by a deadlock - single transaction type always requests the
tables in the same order. (Am I right here? i'm sleepy so my thinking
is not up to scratch). Anyway, regardless which one of read-write tx
types is chosen, the problem occurs.

3. Overall this suggests that, in crude terms, the problem is
triggered when reading updated but uncommitted records. Possibly even
by one user reading updated uncommitted records of another (since this
happens with only two users)

4. The seizure problem manifests itself in high (100%) cpu
utilization. Also, about 80% of that cpu utilization is system state.
All pg processes (for all users) use about the same amount of cpu time
- that is the situation is not caused by one process/user getting out
of whack.


t-ishii@sra.co.jp (Tatsuo Ishii) wrote in message news:<20011114100112O.t-ishii@sra.co.jp>...
> > When running a test with multiple (>=2) users pg 'seizes up' after a
> > few transactions. Cpu goes 100% and disk goes to 0%. This lasts
> > 'forever' (overnight)On the same test all other tested databases don't
> > have this problem.
> > 
> > The error occurs with higher tx rate, when transactions bump into each
> > other more frequently. Some 'deadlock detected' messages appear around
> > the hang up time, but _not_ always.
> > 
> > Occasionally - but rarely - the seizure looks differently. CPU goes to
> > 0%, disk goes to 0% and, after about one minute, the processing is
> > resumed.
> > 
> > The current suspicion is that it is due to difference in lock and
> > deadlock handling between pg and most other dbs. In general I found
> > that I can't set transaction isolation level to READ_UNCOMMITTED (as
> > for other databases), the lowest level is READ_COMMITEED.
> > 
> > Any ideas how to reduce this problem? I really want to prove pg perf
> > with 20-100 users and have a problem running 2...
> 
> Have you ever tried the test on UNIX (or UNIX like systems)? I have
> never seen such a problem with PostgreSQL running on UNIX. Also I
> think PostgreSQL on Win is not ready for practical use...
> 
> > It wouldn't be fair (to other db's) to rewrite the test with some
> > pg-only LOCK command etc. I suspect I'm missing something simple, like
> > one of .conf parameters.
> > 
> > 
> > P.S. Using WinNT/Win2K system, pg 7.1.3 (current cygwin), jdbc driver
> > is jdbc7.1-1.3, cygipc is 1.10-1, java is 1.3.1_01a (current jdk).
> > Default pg installation, except for bumped up memory and 8 wal files.
> > 
> > ---------------------------(end of broadcast)---------------------------
> > TIP 6: Have you searched our list archives?
> > 
> > http://archives.postgresql.org
> > 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)


Re: pg locking problem

From
"Ross J. Reedstrom"
Date:
On Wed, Nov 14, 2001 at 02:14:33PM -0800, charles wrote:
> For various reasons, not wholly dependent on me, the test should show
> good perf on Windows. Otherwise Sorry about that. Can't change the
> platform for this one. I did scan the archives, without finding
> anything similar - though maybe my search was not thorough enough.
> 
> I managed to isolate the bug further. 

<minor rant mode>
Based on your descriptions, I don't think this can really qualify as a
bug. Asking PostgreSQL to adapt to the locking scheme optimized for a
different RDBMS, when it's underlying locking mechanisms are not only
different, but fundamentally _better_ in many contexts, is just silly,
not to mention not "fair" to PostgreSQL. I think you would find, (I am,
of course, guessing, since you've given few actual details of the tests)
that the PG specific rewriting would be to _remove_ LOCK calls.
</minor rant mode>

Given all that, it is still probably bad behavior for PG to use so
much CPU. We're interested in fixing that, but not just in solving your
problem.

> 1. Running with read-only transactions the bug does not occur. This
> means that the bug is not directly related to the number of users (as
> long as there's more than one).
> 
> 2. Running with read-only transactions _and_ just _one_ type of a
> read-write transaction the bug occurs. This means that the bug is not
> caused by a deadlock - single transaction type always requests the
> tables in the same order. (Am I right here? i'm sleepy so my thinking
> is not up to scratch). Anyway, regardless which one of read-write tx
> types is chosen, the problem occurs.
> 
> 3. Overall this suggests that, in crude terms, the problem is
> triggered when reading updated but uncommitted records. Possibly even
> by one user reading updated uncommitted records of another (since this
> happens with only two users)
> 
> 4. The seizure problem manifests itself in high (100%) cpu
> utilization. Also, about 80% of that cpu utilization is system state.
> All pg processes (for all users) use about the same amount of cpu time
> - that is the situation is not caused by one process/user getting out
> of whack.
> 

Several people have suggested tests you could run to help isolate the
problem: running the identical code against an identical PG database
hosted on Unix would tell us if it's a problem with the NT compatibility
layer: i.e. cygwin. As Tatsuo Ishii pointed out, this is a likely cause,
since _many_ people use PG for heavy duty service under Unix, and NT
isn't a primary platform for any of the core developers. But, if your
application can trigger the same behavior on a Unix hosted server,
you will get a _lot_ of attention, trust me.

> > > P.S. Using WinNT/Win2K system, pg 7.1.3 (current cygwin), jdbc driver
> > > is jdbc7.1-1.3, cygipc is 1.10-1, java is 1.3.1_01a (current jdk).
> > > Default pg installation, except for bumped up memory and 8 wal files.

Even if your application _does_ have the same behavior under Unix, the
next thing you'd be asked to do is try the latest version, 7.2b2, which
would be a good idea anyway, though I don't know if whoever builds the
NT binaries has built one yet (hint hint): there's been a lot of bug
fixes and code rework since 7.1.


Ross




Re: pg locking problem

From
czl@iname.com (charles)
Date:
All,

Thanks for all your comments. After looking more at the problem and
reading your mail I suspect that it may be somewhere at the boundary
between cyg-ipc and postgres. Firstly, as you point out, pg is too
heavily used on UNIX, hence a bug like that couldn't have gone
unnoticed. Secondly, nothing in the testing done so far contradicts
that hypothesis.

Unfortunately, end of the year is near so I can't put much more into
this pg bug - hopefully for the time being. And testing it on UNIX is
beyond my time, money and logistics budget on this one, unfortunately.

with many thanks and hope to revisit it soon (got some promising
numbers on single user test)
                    charles


reedstrm@rice.edu ("Ross J. Reedstrom") wrote in message news:<20011115134442.A3811@rice.edu>...
> On Wed, Nov 14, 2001 at 02:14:33PM -0800, charles wrote:
> > For various reasons, not wholly dependent on me, the test should show
> > good perf on Windows. Otherwise Sorry about that. Can't change the
> > platform for this one. I did scan the archives, without finding
> > anything similar - though maybe my search was not thorough enough.
> > 
> > I managed to isolate the bug further. 
> 
> <minor rant mode>
> Based on your descriptions, I don't think this can really qualify as a
> bug. Asking PostgreSQL to adapt to the locking scheme optimized for a
> different RDBMS, when it's underlying locking mechanisms are not only
> different, but fundamentally _better_ in many contexts, is just silly,
> not to mention not "fair" to PostgreSQL. I think you would find, (I am,
> of course, guessing, since you've given few actual details of the tests)
> that the PG specific rewriting would be to _remove_ LOCK calls.
> </minor rant mode>
> 
> Given all that, it is still probably bad behavior for PG to use so
> much CPU. We're interested in fixing that, but not just in solving your
> problem.
> 
> > 1. Running with read-only transactions the bug does not occur. This
> > means that the bug is not directly related to the number of users (as
> > long as there's more than one).
> > 
> > 2. Running with read-only transactions _and_ just _one_ type of a
> > read-write transaction the bug occurs. This means that the bug is not
> > caused by a deadlock - single transaction type always requests the
> > tables in the same order. (Am I right here? i'm sleepy so my thinking
> > is not up to scratch). Anyway, regardless which one of read-write tx
> > types is chosen, the problem occurs.
> > 
> > 3. Overall this suggests that, in crude terms, the problem is
> > triggered when reading updated but uncommitted records. Possibly even
> > by one user reading updated uncommitted records of another (since this
> > happens with only two users)
> > 
> > 4. The seizure problem manifests itself in high (100%) cpu
> > utilization. Also, about 80% of that cpu utilization is system state.
> > All pg processes (for all users) use about the same amount of cpu time
> > - that is the situation is not caused by one process/user getting out
> > of whack.
> > 
> 
> Several people have suggested tests you could run to help isolate the
> problem: running the identical code against an identical PG database
> hosted on Unix would tell us if it's a problem with the NT compatibility
> layer: i.e. cygwin. As Tatsuo Ishii pointed out, this is a likely cause,
> since _many_ people use PG for heavy duty service under Unix, and NT
> isn't a primary platform for any of the core developers. But, if your
> application can trigger the same behavior on a Unix hosted server,
> you will get a _lot_ of attention, trust me.
> 
> > > > P.S. Using WinNT/Win2K system, pg 7.1.3 (current cygwin), jdbc driver
> > > > is jdbc7.1-1.3, cygipc is 1.10-1, java is 1.3.1_01a (current jdk).
> > > > Default pg installation, except for bumped up memory and 8 wal files.
> 
> Even if your application _does_ have the same behavior under Unix, the
> next thing you'd be asked to do is try the latest version, 7.2b2, which
> would be a good idea anyway, though I don't know if whoever builds the
> NT binaries has built one yet (hint hint): there's been a lot of bug
> fixes and code rework since 7.1.
> 
> 
> Ross
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
> 
> http://www.postgresql.org/users-lounge/docs/faq.html