Thread: ANALYZE locks pg_listener in EXCLUSIVE for long time?
We are seeing occasional long lockouts from out DB. When I do a 'select * from pg_locks', I find that everybody is waiting for pg_listener, and that the lock on pg_listener is currently held by a long-running ANALYZE VERBOSE. I saw the following in the change logs (not sure if it's relevant): should have gotten this notify. But to do that, we'd have to wait to see if he commits or not, or make UNLISTENhold exclusive lock on pg_listener until commit. Either of these answers is deadlock-prone, not tomention horrible for interactive performance. Do it this way for now. (What happened to that project todo LISTEN/NOTIFY in memory with no table, anyway?) Does this mean that ANALYZE will take an exclusive lock on pg_listener until the ANALYZE finishes? Or is there some other cause? Any help or suggestions would be appreciated... ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > Does this mean that ANALYZE will take an exclusive lock on pg_listener > until the ANALYZE finishes? Or is there some other cause? ANALYZE does not take an exclusive lock on anything. However, the async.c functions want AccessExclusiveLock on pg_listener, so they quite possibly would get blocked by ANALYZE's not-so-exclusive lock. Possibly we could reduce the strength of the lock taken by the async.c functions ... I haven't thought hard about it. The long-term answer is certainly a wholesale rewrite of the listen/notify mechanism. regards, tom lane
Tom Lane wrote: > Philip Warner <pjw@rhyme.com.au> writes: > > Does this mean that ANALYZE will take an exclusive lock on pg_listener > > until the ANALYZE finishes? Or is there some other cause? > > ANALYZE does not take an exclusive lock on anything. However, the > async.c functions want AccessExclusiveLock on pg_listener, so they > quite possibly would get blocked by ANALYZE's not-so-exclusive lock. > > Possibly we could reduce the strength of the lock taken by the async.c > functions ... I haven't thought hard about it. The long-term answer is > certainly a wholesale rewrite of the listen/notify mechanism. Gavin was working on it a while ago but I am not sure how far he got. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
At 12:45 PM 3/05/2004, Tom Lane wrote: >Possibly we could reduce the strength of the lock taken by the async.c >functions If possible, this seems like a great option. We currently have a large database with several hundred users who get locked out for as much as half an hour while ANALYZE runs. The data in the database is extremely dynamic, so the analyze needs to be run regularly; we could run less often but this will just mean the problem happens once per week instead of once per day. Would ACCESS SHARE be OK? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > If possible, this seems like a great option. We currently have a large > database with several hundred users who get locked out for as much as half > an hour while ANALYZE runs. If it takes half an hour to ANALYZE pg_listener, I think that ANALYZE is not your real problem :-(. You need a much more aggressive vacuuming policy on that table. Maybe a cron job issuing "vacuum pg_listener" once a minute would do? And get the size of the table knocked down to something less stratospheric to begin with --- perhaps stop all the listeners while you TRUNCATE the table. The existing listen/notify infrastructure isn't really designed for notification rates exceeding a few events per minute ... regards, tom lane
Philip Warner <pjw@rhyme.com.au> writes: > Would ACCESS SHARE be OK? Certainly not, since the point of the locks in async.c is that only one backend should execute those routines at a time. ExclusiveLock might work okay ... but I still haven't thought hard about it ... regards, tom lane
At 01:46 PM 3/05/2004, Tom Lane wrote: >If it takes half an hour to ANALYZE pg_listener, I think that ANALYZE is >not your real problem :-(. You need a much more aggressive vacuuming >policy on that table. Maybe a cron job issuing "vacuum pg_listener" >once a minute would do? And get the size of the table knocked down to >something less stratospheric to begin with --- perhaps stop all the >listeners while you TRUNCATE the table. It's a general ANALYZE command for the entire DB. It's about 6GB in size, and is vacuumed as frequently as possible; there is certainly unreclaimed space, but it does not substantially outweigh used space. My *guess* is that the largest table is being ANALYZEd at the time (it uses most of the 6GB), and for some reason pg_listeners is being locked in ACCESS SHARE the entire time. Just vacuuming pg_listener produces: vacuum verbose pg_listener; INFO: vacuuming "pg_catalog.pg_listener" INFO: "pg_listener": found 0 removable, 0 nonremovable row versions in 0 pages VACUUM ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp.mit.edu:11371 |/
On Sun, 2 May 2004, Bruce Momjian wrote: > Tom Lane wrote: > > Philip Warner <pjw@rhyme.com.au> writes: > > > Does this mean that ANALYZE will take an exclusive lock on pg_listener > > > until the ANALYZE finishes? Or is there some other cause? > > > > ANALYZE does not take an exclusive lock on anything. However, the > > async.c functions want AccessExclusiveLock on pg_listener, so they > > quite possibly would get blocked by ANALYZE's not-so-exclusive lock. > > > > Possibly we could reduce the strength of the lock taken by the async.c > > functions ... I haven't thought hard about it. The long-term answer is > > certainly a wholesale rewrite of the listen/notify mechanism. > > Gavin was working on it a while ago but I am not sure how far he got. Its basically written. It is implemented using shared memory. I got stuck when I considered the situation where we rung out of shared memory. Some emails in the archive suggested we just fire all listeners but I didn't like that. What I was considering was that when someone issues a NOTIFY, we reserve a slot for the NOTIFY (plus a message, which is why I originally looked at the problem) in shared memory. At the end of the transaction, we update a flag to say that the transaction successed or we remove it if we've aborted. Does anyone else have any thoughts about it? Gavin
Philip Warner <pjw@rhyme.com.au> writes: > ... for some reason pg_listeners is being locked in ACCESS SHARE the > entire time. > Just vacuuming pg_listener produces: > vacuum verbose pg_listener; > INFO: vacuuming "pg_catalog.pg_listener" > INFO: "pg_listener": found 0 removable, 0 nonremovable row versions in 0 pages > VACUUM [blinks...] There's something pretty strange about that. Are you using LISTEN/NOTIFY at all? regards, tom lane
At 02:21 PM 3/05/2004, Tom Lane wrote: >[blinks...] There's something pretty strange about that. Are you using >LISTEN/NOTIFY at all? Nope. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > At 02:21 PM 3/05/2004, Tom Lane wrote: >> [blinks...] There's something pretty strange about that. Are you using >> LISTEN/NOTIFY at all? > Nope. In that case there's no reason for anything to be taking any particular locks on pg_listener; and it's simply not possible for ANALYZE to spend half an hour on a zero-page table if it's not blocked by a lock. Could you dig a little deeper and see where the problem really is? regards, tom lane
At 02:33 PM 3/05/2004, Tom Lane wrote: >Could >you dig a little deeper and see where the problem really is? I thought I had 8-(. The result of a 'select * from pg_locks where not granted' was a bunch of locks on the pg_listener relation, and no others. Only one process had a lock on that relation, and it was an ANALYZE command. If there is something else to look for next time it happens, please let me know. > ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp.mit.edu:11371 |/
Philip Warner <pjw@rhyme.com.au> writes: > At 02:33 PM 3/05/2004, Tom Lane wrote: >> Could you dig a little deeper and see where the problem really is? > I thought I had 8-(. > The result of a 'select * from pg_locks where not granted' was a bunch of > locks on the pg_listener relation, and no others. Only one process had a > lock on that relation, and it was an ANALYZE command. I don't believe any of this. In the first place, if you're not using LISTEN/NOTIFY then there's no reason for any backend to try to take out an AccessExclusive lock on pg_listener (which is the only kind that would be blocked by ANALYZE's measly AccessShareLock). In the second place, an ANALYZE on a zero-page relation cannot conceivably take half an hour, unless it's in turn being blocked by something else. Please dig deeper. regards, tom lane
At 02:54 PM 3/05/2004, Tom Lane wrote: >I don't believe any of this. mail=# select * from pg_locks where not granted; relation | database | transaction | pid | mode | granted ----------+----------+-------------+-------+---------------------+--------- 16414 | 17149 | | 7847 | AccessExclusiveLock| f 16414 | 17149 | | 51015 | AccessExclusiveLock | f 16414 | 17149 | | 51149 | AccessExclusiveLock | f 16414 | 17149 | | 51024 | AccessExclusiveLock | f 16414 | 17149| | 48838 | AccessExclusiveLock | f 16414 | 17149 | | 57670 | AccessExclusiveLock | f 16414 | 17149 | | 54535 | AccessExclusiveLock | f 16414 | 17149 | | 57756 | AccessExclusiveLock| f 16414 | 17149 | | 25894 | AccessExclusiveLock | f 16414 | 17149 | | 55813 | AccessExclusiveLock | f 16414 | 17149 | | 18181 | AccessExclusiveLock | f 16414 | 17149| | 57846 | AccessExclusiveLock | f 16414 | 17149 | | 44827 | AccessExclusiveLock | f 16414 | 17149 | | 52352 | AccessExclusiveLock | f 16414 | 17149 | | 51239 | AccessExclusiveLock| f 16414 | 17149 | | 14610 | AccessExclusiveLock | f 16414 | 17149 | | 88549 | AccessExclusiveLock | f 16414 | 17149 | | 36939 | AccessExclusiveLock | f 16414 | 17149| | 57120 | AccessExclusiveLock | f 16414 | 17149 | | 89107 | AccessExclusiveLock | f 16414 | 17149 | | 48915 | AccessExclusiveLock | f 16414 | 17149 | | 74102 | AccessExclusiveLock| f 16414 | 17149 | | 51029 | AccessExclusiveLock | f 16414 | 17149 | | 51100 | AccessExclusiveLock | f 16414 | 17149 | | 57865 | AccessExclusiveLock | f 16414 | 17149| | 45777 | AccessExclusiveLock | f 16414 | 17149 | | 22288 | AccessExclusiveLock | f 16414 | 17149 | | 52006 | AccessExclusiveLock | f 16414 | 17149 | | 44829 | AccessExclusiveLock| f 16414 | 17149 | | 49073 | AccessExclusiveLock | f 16414 | 17149 | | 52827 | AccessExclusiveLock | f 16414 | 17149 | | 21003 | AccessExclusiveLock | f 16414 | 17149| | 50204 | AccessExclusiveLock | f mail=# select * from pg_class where oid=16414; relname | relnamespace | reltype | relowner | relam | relfilenode | relpages | reltuples | reltoastrelid | reltoastidxid | relh asindex | relisshared | relkind | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhasoids | relhaspkey | rel hasrules | relhassubclass | relacl -------------+--------------+---------+----------+-------+-------------+----------+-----------+---------------+---------------+----- --------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+------------+---- ---------+----------------+------------ pg_listener | 11 | 16415 | 1 | 0 | 16414 | 0 | 0 | 0 | 0 | f | f | r | 3 | 0 | 0 | 0 | 0 | 0 | f | f | f | f | {=r/pgsql} (1 row) mail=# select * from pg_locks where granted and relation=16414; relation | database | transaction | pid | mode | granted ----------+----------+-------------+-------+-----------------+--------- 16414 | 17149 | | 56399 | AccessShareLock| t (1 row) mail=# \q [~] root@beast>ps ax | grep 56399; 56399 ?? D 0:03.41 postmaster: pgsql mail 127.0.0.1 ANALYZE (postgres) [~] root@beast>kill -TERM 56399 [~] root@beast>ps ax | grep 56399; [~] root@beast>psql -p5443 -U pgsql mail Welcome to psql 7.3.3, the PostgreSQL interactive terminal. Type: \copyright for distribution terms \h for help with SQL commands \? for help on internal slash commands \g or terminate with semicolon to execute query \q to quit mail=# select * from pg_locks where granted and relation=16414; relation | database | transaction | pid | mode | granted ----------+----------+-------------+-----+------+--------- (0 rows) mail=# select * from pg_locks where not granted; relation | database | transaction | pid | mode | granted ----------+----------+-------------+-------+-----------+--------- | | 65738548 | 72279 | ShareLock |f | | 65738548 | 40479 | ShareLock | f | | 65738548 | 46830 | ShareLock | f | | 65738478 | 20762 | ShareLock | f (4 rows) mail=# select * from pg_locks where not granted; relation | database | transaction | pid | mode | granted ----------+----------+-------------+-----+------+--------- (0 rows) ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp.mit.edu:11371 |/
At 02:54 PM 3/05/2004, Tom Lane wrote: >Please dig deeper. I will log everything I check next time; unfortunately, when it happens, the priority is on unlocking everything so I have a limited time to play. So far, killing the ANALYZE has fixed the problem each time. ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 03 5330 3172 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp.mit.edu:11371 |/
On Mon, May 03, 2004 at 02:14:18PM +1000, Gavin Sherry wrote: > It is implemented using shared memory. I got stuck when I considered the > situation where we rung out of shared memory. Some emails in the archive > suggested we just fire all listeners but I didn't like that. Can this be kept in backend local memory and then sent to the other backends at transaction commit? If you run out of local memory you can just spill to disk. (With shared memory this seems pretty hard to do.) I'm not sure how would one "send to the other backends." Maybe write another file on disk, one for each remote backend? Surely this can be done somehow. I've heard that on linux-2.6 they are implementing "POSIX message queues" (not sure what those are anyway); maybe we can do that on platforms that support it, for performance. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "In a specialized industrial society, it would be a disaster to have kids running around loose." (Paul Graham)
> I'm not sure how would one "send to the other backends." > Maybe write another file on disk, one for each remote > backend? Surely this can be done somehow. I've heard that > on linux-2.6 they are implementing "POSIX message queues" > (not sure what those are anyway); maybe we can do that on > platforms that support it, for performance. Dunno if this is relevant, but if you want to go with message queues, there is also SystemV message queues. Since postgresql already uses sysv semaphores and shared memory, this would perhaps be portable to more systems that pg supports today (though you'd still need some kind of abstraction layer, since e.g. win32 does not have it). (man msgsnd, msgrcv, msgctl, msgget //Magnus
On Mon, 3 May 2004, Alvaro Herrera wrote: > On Mon, May 03, 2004 at 02:14:18PM +1000, Gavin Sherry wrote: > > > It is implemented using shared memory. I got stuck when I considered the > > situation where we rung out of shared memory. Some emails in the archive > > suggested we just fire all listeners but I didn't like that. > > Can this be kept in backend local memory and then sent to the other > backends at transaction commit? If you run out of local memory you can > just spill to disk. (With shared memory this seems pretty hard to do.) > > I'm not sure how would one "send to the other backends." Maybe write > another file on disk, one for each remote backend? Surely this can be > done somehow. I've heard that on linux-2.6 they are implementing "POSIX > message queues" (not sure what those are anyway); maybe we can do that > on platforms that support it, for performance. What happens in the (unlikely) event that we never find space in shared memory? That's the problem that I am currently trying to solve. We currently just fire all the triggers but is that a great idea? Particularly if we support the passing of a message with a notify. Gavin
> * Is it really a good idea for database-wide ANALYZE to run as a single > transaction? Holding all those locks is a recipe for deadlocks, even > if they're as inoffensive as AccessShareLocks normally are. Wasn't one idea behind that change also to not make the planner create a plan from mixed old and new statistics ? I guess that could later be accomplished with "begin work; analyze; commit work;" (with subtransactions) though. Andreas
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes: >> * Is it really a good idea for database-wide ANALYZE to run as a single >> transaction? Holding all those locks is a recipe for deadlocks, even >> if they're as inoffensive as AccessShareLocks normally are. > Wasn't one idea behind that change also to not make the planner create a plan > from mixed old and new statistics ? I don't recall that that was part of the discussion. IIRC all we were after was to let someone invoke ANALYZE from inside a BEGIN/COMMIT block. A possible compromise is to hack ANALYZE so that if it is invoked when *not* within a BEGIN block, it runs a separate transaction for each table. This seems pretty crufty but might satisfy all the requirements. > I guess that could later be accomplished with > "begin work; analyze; commit work;" (with subtransactions) though. AFAICS, locks taken by a (committed) subtransaction can't be released until top-level commit anyhow. Otherwise they fail to perform one of the essential functions of locking in an MVCC environment: to delay another process until the changes you've made are visible to him. regards, tom lane