Thread: slow queue-like empty table
I have a query which really should be lightning fast (LIMIT 1 from an
index), but it isn't. I checked the pg_locks view and found nothing
locking the table. The database is not under heavy load at the moment,
but the query still seems to draw CPU power. It's a queue-like table:
lots of rows being added to and removed from the queue, and the queue is
currently empty. Have a look:

NBET=> vacuum verbose analyze my_queue;
INFO:  vacuuming "public.my_queue"
INFO:  index "my_queue_pkey" now contains 34058 row versions in 390 pages
DETAIL:  288 index pages have been deleted, 285 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "my_queue": found 0 removable, 34058 nonremovable row versions in 185 pages
DETAIL:  34058 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.my_queue"
INFO:  "my_queue": scanned 185 of 185 pages, containing 0 live rows and 34058 dead rows; 0 rows in sample, 0 estimated total rows
VACUUM

NBET=> explain analyze select bet_id from my_queue order by bet_id limit 1;
                                                                   QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.04 rows=1 width=4) (actual time=402.525..402.525 rows=0 loops=1)
   ->  Index Scan using my_queue_pkey on stats_bet_queue  (cost=0.00..1314.71 rows=34058 width=4) (actual time=402.518..402.518 rows=0 loops=1)
 Total runtime: 402.560 ms
(3 rows)

NBET=> select count(*) from my_queue;
 count
-------
     0
(1 row)

It really seems like some transaction is still viewing the queue, since
vacuum found 34k non-removable rows ... but how do I find the pid of the
transaction viewing the queue? As said, pg_locks didn't give me any
hints ...
[Tobias Brox - Thu at 08:56:31AM +0200]
> It really seems like some transaction is still viewing the queue, since
> vacuum found 34k non-removable rows ... but how do I find the pid of the
> transaction viewing the queue? As said, pg_locks didn't give me any
> hints ...

Dropping the table and recreating it solved the immediate problem, but
surely there must be a better solution than that? :-)
On Thu, 2006-09-28 at 09:36, Tobias Brox wrote:
> [Tobias Brox - Thu at 08:56:31AM +0200]
> > It really seems like some transaction is still viewing the queue, since
> > vacuum found 34k non-removable rows ... but how do I find the pid of the
> > transaction viewing the queue? As said, pg_locks didn't give me any
> > hints ...

The open transaction doesn't have to hold any locks on your queue table
to prevent dead rows from being vacuumed. Its mere existence is
enough... under MVCC, a still-running transaction might still see those
dead rows, so VACUUM can't remove them until there's no transaction left
which started before they were deleted. Long-running transactions are
the enemy of high insert/delete-rate queue tables.

So you should check for "idle in transaction" sessions, those are bad...
or any other long-running transaction.

> Dropping the table and recreating it solved the immediate problem, but
> surely there must be a better solution than that? :-)

If you must have long-running transactions on your system (vacuuming
another big table also qualifies as a long-running transaction, though
this is fixed in 8.2), you could use CLUSTER (see the docs). CLUSTER is
currently not MVCC-conforming: it removes all the dead space regardless
of whether any other running transaction can still see it. That is only
acceptable if your application handles the queue table independently,
not mixed into complex transactions. Note also that CLUSTER takes an
exclusive lock on the table, so it won't work during e.g. a pg_dump: it
would keep the queue table locked for the whole duration of the dump.
(It wouldn't actually be able to get the lock, but a pending exclusive
lock request blocks any new shared lock requests, so it would still
prevent all other activity on the table.)

HTH,

Csaba.
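To actually find those sessions, pg_stat_activity is the place to look.
A sketch against the 8.1-era catalogs used in this thread (it assumes
stats_command_string is on so that current_query is populated; in 9.2
and later the columns changed: procpid became pid, and the "<IDLE> in
transaction" marker string was replaced by a state column):

```sql
-- Backends sitting idle inside an open transaction (8.1-era columns).
SELECT procpid, usename, query_start, current_query
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
ORDER BY query_start;

-- On 9.2 and later the equivalent is roughly:
--   SELECT pid, usename, xact_start, state
--   FROM pg_stat_activity
--   WHERE state = 'idle in transaction'
--   ORDER BY xact_start;
```

And if you go the CLUSTER route, the pre-8.3 syntax names the index
first:

```sql
-- Rewrites the table in index order and reclaims all dead space;
-- takes an exclusive lock on the table for the duration.
CLUSTER my_queue_pkey ON my_queue;
```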
On Thu, Sep 28, 2006 at 08:56:31AM +0200, Tobias Brox wrote:
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO:  "my_queue": found 0 removable, 34058 nonremovable row versions in 185 pages
                                              ^^^^^^^^^^^^

You have a lot of dead rows that can't be removed, which means you must
have a lot of other transactions in progress. Note that nobody needs to
be _looking_ at those rows for them to be unremovable; the transactions
just have to be old enough.

>  Limit  (cost=0.00..0.04 rows=1 width=4) (actual time=402.525..402.525 rows=0 loops=1)
>    ->  Index Scan using my_queue_pkey on stats_bet_queue  (cost=0.00..1314.71 rows=34058 width=4) (actual time=402.518..402.518 rows=0 loops=1)

I'm amazed this does an index scan on an empty table. If this table is
"hot", my bet is that you have tried to optimise in an area that isn't
actually an optimisation under PostgreSQL. That is, if you're putting
data in there, a daemon is constantly deleting from it, and all your
other transactions depend on knowing the state of the "unprocessed
queue", the design just doesn't work under PostgreSQL: it turns out to
be impossible to keep the table vacuumed well enough for high
performance.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism.
                                                        --Brad Holland
[Csaba Nagy - Thu at 10:45:35AM +0200]
> So you should check for "idle in transaction" sessions, those are bad...
> or any other long-running transaction.

Thank you (and others) for pointing this out, you certainly set us on
the right track. We did have a few unclosed transactions, ones never
ended by "rollback" or "commit". We've been fixing this, beating up the
programmers responsible, and continuing to monitor.

I don't think it's only due to those queue-like tables; we've seen a
really significant improvement on the graphs showing load and CPU usage
on the database server since we killed all the "idle in transaction"
sessions. I can safely relax for some weeks yet before I need to do
more optimization work :-)

(oh, btw, we didn't really beat up the programmers ... too big
geographical distances ;-)
On Oct 4, 2006, at 5:59 AM, Tobias Brox wrote:
> [Csaba Nagy - Thu at 10:45:35AM +0200]
> > So you should check for "idle in transaction" sessions, those are bad...
> > or any other long-running transaction.
>
> Thank you (and others) for pointing this out, you certainly set us on
> the right track. We did have a few unclosed transactions, ones never
> ended by "rollback" or "commit". We've been fixing this, beating up
> the programmers responsible, and continuing to monitor.
>
> I don't think it's only due to those queue-like tables; we've seen a
> really significant improvement on the graphs showing load and CPU
> usage on the database server since we killed all the "idle in
> transaction" sessions. I can safely relax for some weeks yet before I
> need to do more optimization work :-)

Leaving transactions open for a long time is murder on pretty much any
database. It's one of the worst programming mistakes you can make, from
a performance standpoint. Further, mishandling transaction close is a
great way to lose data:

    BEGIN;
    ...useful work
    -- COMMIT should have happened here
    ...more work
    ...ERROR!
    ROLLBACK;

You just lost that useful work.

> (oh, btw, we didn't really beat up the programmers ... too big
> geographical distances ;-)

This warrants a plane ticket. Seriously. If your app programmers aren't
versed in transaction management, you should probably be defining a
database API that allows the use of autocommit.

--
Jim Nasby                                    jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
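In SQL terms, the fix for the pattern above is simply to commit each
unit of work as soon as it completes, so that a later failure can only
roll back the work that came after it; a sketch:

```sql
BEGIN;
-- ...useful work...
COMMIT;    -- the useful work is now durable

BEGIN;
-- ...more work...
-- ...ERROR!
ROLLBACK;  -- only the second unit of work is lost
```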