Thread: BUG #2631: database locking problem

BUG #2631: database locking problem

From
"Ross Elliott"
Date:
The following bug has been logged online:

Bug reference:      2631
Logged by:          Ross Elliott
Email address:      ross.elliott@infoterra-global.com
PostgreSQL version: 8.1.4
Operating system:   Red Hat  AS 4
Description:        database locking problem
Details:

I have a dataset of aprox 650 million rows that I load into 5 spatial
(PostGIS) tables using 9 client machines (via JDBC). On 8.1.3 this all works
fine but on 8.1.4 it will fail with just three clients loading the data. The
database will lock up, the process list shows the clients  performing an
INSERT, the postgres locks show each client having an exclusive row lock and
these locks seem to move about, that is it would seem to be working, except
that nothing is being inserted, I've left the system in this state for over
a day just to make sure. So, what happened between 8.1.3 and 8.1.4 that may
have affected locking?

Re: BUG #2631: database locking problem

From
Tom Lane
Date:
"Ross Elliott" <ross.elliott@infoterra-global.com> writes:
> So, what happened between 8.1.3 and 8.1.4 that may
> have affected locking?

Nothing that I know of.  Please provide a self-contained test case
(but are you sure this is not a PostGIS bug?) ...

            regards, tom lane

Re: BUG #2631: database locking problem

From
Michael Fuhr
Date:
On Sun, Sep 17, 2006 at 01:29:33AM -0400, Tom Lane wrote:
> "Ross Elliott" <ross.elliott@infoterra-global.com> writes:
> > So, what happened between 8.1.3 and 8.1.4 that may
> > have affected locking?
>
> Nothing that I know of.  Please provide a self-contained test case
> (but are you sure this is not a PostGIS bug?) ...

What does "SELECT postgis_full_version()" show?  If you still have
the 8.1.3 system then please post the output from both 8.1.3 and
8.1.4.

--
Michael Fuhr

Re: BUG #2631: database locking problem

From
Tom Lane
Date:
"Ross Elliott" <ross.elliott@infoterra-global.com> writes:
> I've been playing with this a bit more by switching to 8.1.3 with
> postgis 1.1.3 and still get the locking problem.

Well, that's dang odd.  PID 9403 seems to be the problem, because it's
got extend lock on topoarea_idx2, which is a lock that should be held
for only a *very* short time.  Apparently it's blocked on an LWLock
inside either ReadBuffer or LockBuffer --- but the LockBuffer should
certainly not block because ReadBuffer should have returned a page that
never existed until just now, and hence couldn't be locked by anyone
else.  Even more interesting, topoarea_idx2 is a plain btree, which
shouldn't be affected at all by postgis.  And none of this code has
changed much recently.

Can you attach to 9403 with a debugger and get a stack trace to confirm
exactly where it's blocked?

            regards, tom lane

Re: BUG #2631: database locking problem

From
"Ross Elliott"
Date:
The 8.1.4 shows POSTGIS="1.1.3" GEOS="2.2.1-CAPI-1.0.1" PROJ="Rel.
4.4.7, 31 March 2003" USE_STATS and the 8.1.3 shows POSTGIS="1.1.2"
GEOS="2.2.1-CAPI-1.0.1" PROJ="Rel. 4.4.7, 31 March 2003" USE_STATS

I've been playing with this a bit more by switching to 8.1.3 with
postgis 1.1.3 and still get the locking problem. I've attached a file
that shows the output of 'select pg_class.relname,pg_locks.* from
pg_class,pg_locks where pg_class.relfilenode=pg_locks.relation' and the
process list when it locks. I've also attached the table descriptions.

I'd really like to nail this as I have a production system going live
very soon (hopefully).




--
Ross Elliott
Senior Software Engineer
Infoterra Ltd
T +44 (0)1252 362095
www.infoterra.co.uk


-----Original Message-----
From: Michael Fuhr [mailto:mike@fuhr.org]
Sent: 17 September 2006 14:41
To: Tom Lane
Cc: Ross Elliott; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #2631: database locking problem

On Sun, Sep 17, 2006 at 01:29:33AM -0400, Tom Lane wrote:
> "Ross Elliott" <ross.elliott@infoterra-global.com> writes:
> > So, what happened between 8.1.3 and 8.1.4 that may have affected
> > locking?
>
> Nothing that I know of.  Please provide a self-contained test case
> (but are you sure this is not a PostGIS bug?) ...

What does "SELECT postgis_full_version()" show?  If you still have the
8.1.3 system then please post the output from both 8.1.3 and 8.1.4.

--
Michael Fuhr

Attachment

Re: BUG #2631: database locking problem

From
"Ross Elliott"
Date:
I'll see what I can get done, I'll need to get it to lockup again as
I've killed everything to try other options. I have a feeling that the
only reason it worked on 8.1.3 before was because it was on slower
servers which were also running other stuff. My new database server is
very fast and only runs the database and if I log all the sql then it
doesn't seem to screw up (at least not yet). One other thing, the
tablespace is not local storage, I have a 7TB SAN partition that runs a
SAN filesystem on it, but this is the same as my older 8.1.3 system.=20


--=20
Ross Elliott
Senior Software Engineer
Infoterra Ltd
T +44 (0)1252 362095
www.infoterra.co.uk
=20

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]=20
Sent: 17 September 2006 22:45
To: Ross Elliott
Cc: Michael Fuhr; pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #2631: database locking problem=20

"Ross Elliott" <ross.elliott@infoterra-global.com> writes:
> I've been playing with this a bit more by switching to 8.1.3 with=20
> postgis 1.1.3 and still get the locking problem.

Well, that's dang odd.  PID 9403 seems to be the problem, because it's
got extend lock on topoarea_idx2, which is a lock that should be held
for only a *very* short time.  Apparently it's blocked on an LWLock
inside either ReadBuffer or LockBuffer --- but the LockBuffer should
certainly not block because ReadBuffer should have returned a page that
never existed until just now, and hence couldn't be locked by anyone
else.  Even more interesting, topoarea_idx2 is a plain btree, which
shouldn't be affected at all by postgis.  And none of this code has
changed much recently.

Can you attach to 9403 with a debugger and get a stack trace to confirm
exactly where it's blocked?

            regards, tom lane