Thread: BUG #2631: database locking problem
The following bug has been logged online: Bug reference: 2631 Logged by: Ross Elliott Email address: ross.elliott@infoterra-global.com PostgreSQL version: 8.1.4 Operating system: Red Hat AS 4 Description: database locking problem Details: I have a dataset of aprox 650 million rows that I load into 5 spatial (PostGIS) tables using 9 client machines (via JDBC). On 8.1.3 this all works fine but on 8.1.4 it will fail with just three clients loading the data. The database will lock up, the process list shows the clients performing an INSERT, the postgres locks show each client having an exclusive row lock and these locks seem to move about, that is it would seem to be working, except that nothing is being inserted, I've left the system in this state for over a day just to make sure. So, what happened between 8.1.3 and 8.1.4 that may have affected locking?
"Ross Elliott" <ross.elliott@infoterra-global.com> writes: > So, what happened between 8.1.3 and 8.1.4 that may > have affected locking? Nothing that I know of. Please provide a self-contained test case (but are you sure this is not a PostGIS bug?) ... regards, tom lane
On Sun, Sep 17, 2006 at 01:29:33AM -0400, Tom Lane wrote: > "Ross Elliott" <ross.elliott@infoterra-global.com> writes: > > So, what happened between 8.1.3 and 8.1.4 that may > > have affected locking? > > Nothing that I know of. Please provide a self-contained test case > (but are you sure this is not a PostGIS bug?) ... What does "SELECT postgis_full_version()" show? If you still have the 8.1.3 system then please post the output from both 8.1.3 and 8.1.4. -- Michael Fuhr
"Ross Elliott" <ross.elliott@infoterra-global.com> writes: > I've been playing with this a bit more by switching to 8.1.3 with > postgis 1.1.3 and still get the locking problem. Well, that's dang odd. PID 9403 seems to be the problem, because it's got extend lock on topoarea_idx2, which is a lock that should be held for only a *very* short time. Apparently it's blocked on an LWLock inside either ReadBuffer or LockBuffer --- but the LockBuffer should certainly not block because ReadBuffer should have returned a page that never existed until just now, and hence couldn't be locked by anyone else. Even more interesting, topoarea_idx2 is a plain btree, which shouldn't be affected at all by postgis. And none of this code has changed much recently. Can you attach to 9403 with a debugger and get a stack trace to confirm exactly where it's blocked? regards, tom lane
The 8.1.4 shows POSTGIS="1.1.3" GEOS="2.2.1-CAPI-1.0.1" PROJ="Rel. 4.4.7, 31 March 2003" USE_STATS and the 8.1.3 shows POSTGIS="1.1.2" GEOS="2.2.1-CAPI-1.0.1" PROJ="Rel. 4.4.7, 31 March 2003" USE_STATS I've been playing with this a bit more by switching to 8.1.3 with postgis 1.1.3 and still get the locking problem. I've attached a file that shows the output of 'select pg_class.relname,pg_locks.* from pg_class,pg_locks where pg_class.relfilenode=pg_locks.relation' and the process list when it locks. I've also attached the table descriptions. I'd really like to nail this as I have a production system going live very soon (hopefully). -- Ross Elliott Senior Software Engineer Infoterra Ltd T +44 (0)1252 362095 www.infoterra.co.uk -----Original Message----- From: Michael Fuhr [mailto:mike@fuhr.org] Sent: 17 September 2006 14:41 To: Tom Lane Cc: Ross Elliott; pgsql-bugs@postgresql.org Subject: Re: [BUGS] BUG #2631: database locking problem On Sun, Sep 17, 2006 at 01:29:33AM -0400, Tom Lane wrote: > "Ross Elliott" <ross.elliott@infoterra-global.com> writes: > > So, what happened between 8.1.3 and 8.1.4 that may have affected > > locking? > > Nothing that I know of. Please provide a self-contained test case > (but are you sure this is not a PostGIS bug?) ... What does "SELECT postgis_full_version()" show? If you still have the 8.1.3 system then please post the output from both 8.1.3 and 8.1.4. -- Michael Fuhr
Attachment
I'll see what I can get done, I'll need to get it to lockup again as I've killed everything to try other options. I have a feeling that the only reason it worked on 8.1.3 before was because it was on slower servers which were also running other stuff. My new database server is very fast and only runs the database and if I log all the sql then it doesn't seem to screw up (at least not yet). One other thing, the tablespace is not local storage, I have a 7TB SAN partition that runs a SAN filesystem on it, but this is the same as my older 8.1.3 system.=20 --=20 Ross Elliott Senior Software Engineer Infoterra Ltd T +44 (0)1252 362095 www.infoterra.co.uk =20 -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us]=20 Sent: 17 September 2006 22:45 To: Ross Elliott Cc: Michael Fuhr; pgsql-bugs@postgresql.org Subject: Re: [BUGS] BUG #2631: database locking problem=20 "Ross Elliott" <ross.elliott@infoterra-global.com> writes: > I've been playing with this a bit more by switching to 8.1.3 with=20 > postgis 1.1.3 and still get the locking problem. Well, that's dang odd. PID 9403 seems to be the problem, because it's got extend lock on topoarea_idx2, which is a lock that should be held for only a *very* short time. Apparently it's blocked on an LWLock inside either ReadBuffer or LockBuffer --- but the LockBuffer should certainly not block because ReadBuffer should have returned a page that never existed until just now, and hence couldn't be locked by anyone else. Even more interesting, topoarea_idx2 is a plain btree, which shouldn't be affected at all by postgis. And none of this code has changed much recently. Can you attach to 9403 with a debugger and get a stack trace to confirm exactly where it's blocked? regards, tom lane