Thread: slow queue-like empty table


From: Tobias Brox
I have a query which really should be lightning fast (LIMIT 1 via an
index), but which isn't.  I checked the pg_locks view and found nothing
locking the table.  The database is not under heavy load at the moment,
but the query still burns CPU.  It's a queue-like table, with lots of
rows being added to and removed from the queue, and the queue is
currently empty.  Have a look:

NBET=> vacuum verbose analyze my_queue;
INFO:  vacuuming "public.my_queue"
INFO:  index "my_queue_pkey" now contains 34058 row versions in 390 pages
DETAIL:  288 index pages have been deleted, 285 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "my_queue": found 0 removable, 34058 nonremovable row versions in 185 pages
DETAIL:  34058 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.my_queue"
INFO:  "my_queue": scanned 185 of 185 pages, containing 0 live rows and 34058 dead rows; 0 rows in sample, 0 estimated totalrows
VACUUM
NBET=> explain analyze select bet_id from my_queue order by bet_id limit 1;
                                                                      QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.04 rows=1 width=4) (actual time=402.525..402.525 rows=0 loops=1)
   ->  Index Scan using my_queue_pkey on stats_bet_queue  (cost=0.00..1314.71 rows=34058 width=4) (actual time=402.518..402.518 rows=0 loops=1)
 Total runtime: 402.560 ms
(3 rows)

NBET=> select count(*) from my_queue;
 count
-------
     0
(1 row)

It really seems like some transaction is still viewing the queue, since
vacuum found 34k non-removable rows ... but how do I find the pid of the
transaction viewing the queue?  As I said, pg_locks didn't give me any
hints ...


Re: slow queue-like empty table

From: Tobias Brox
[Tobias Brox - Thu at 08:56:31AM +0200]
> It really seems like some transaction is still viewing the queue, since
> vacuum found 34k non-removable rows ... but how do I find the pid of the
> transaction viewing the queue?  As I said, pg_locks didn't give me any
> hints ...

Dropping the table and recreating it solved the immediate problem, but
there must be some better solution than that? :-)

Re: slow queue-like empty table

From: Csaba Nagy
On Thu, 2006-09-28 at 09:36, Tobias Brox wrote:
> [Tobias Brox - Thu at 08:56:31AM +0200]
> > It really seems like some transaction is still viewing the queue, since
> > vacuum found 34k non-removable rows ... but how do I find the pid of the
> > transaction viewing the queue?  As I said, pg_locks didn't give me any
> > hints ...

The open transaction doesn't have to hold any locks on your queue table
to prevent vacuuming of the dead rows.  Its mere existence is enough:
under MVCC, a still-running transaction could still see those dead rows,
so VACUUM can't remove them until there is no transaction left which
started before they were deleted.

So long-running transactions are your enemy when it comes to queue
tables with a high insert/delete rate.

You should check for "idle in transaction" sessions (those are bad), or
for any other long-running transaction.
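(A sketch of such a check on a modern server, 9.2 or later, where pg_stat_activity exposes "state" and "xact_start"; on the 8.x releases current when this thread was written the columns were "procpid" and "current_query", and the offending sessions showed up as '<IDLE> in transaction'):

```sql
-- List sessions sitting idle inside an open transaction, oldest first.
SELECT pid,
       usename,
       state,
       now() - xact_start AS xact_age
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;
```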


> Dropping the table and recreating it solved the immediate problem, but
> there must be some better solution than that? :-)

If you must have long-running transactions on your system (vacuuming
another big table also qualifies as a long-running transaction, though
this is fixed in 8.2), then you could use CLUSTER (see the docs), which
is currently not MVCC-conforming and deletes all the dead space
regardless of whether any other running transaction can see it.  This is
only acceptable if your application handles the queue table
independently, not mixed into complex transactions.  Note also that
CLUSTER takes an exclusive lock on the table, so it won't work during
e.g. a pg_dump: it would keep the queue table locked exclusively for the
whole duration of the dump (it won't actually be able to get the lock,
but a pending exclusive lock request blocks any new shared lock request,
so all other activity on the table stops too).
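(For concreteness, a sketch using the syntax of later releases; on the 8.1-era servers discussed here the form was "CLUSTER indexname ON tablename".  CLUSTER rewrites the table in index order and, on these releases, discards the dead rows regardless of other snapshots):

```sql
-- Rewrite my_queue in primary-key order, reclaiming all dead space.
-- Takes an ACCESS EXCLUSIVE lock on the table for the duration.
CLUSTER my_queue USING my_queue_pkey;
```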

HTH,
Csaba.



Re: slow queue-like empty table

From: Andrew Sullivan
On Thu, Sep 28, 2006 at 08:56:31AM +0200, Tobias Brox wrote:
> CPU 0.00s/0.00u sec elapsed 0.00 sec.
> INFO:  "my_queue": found 0 removable, 34058 nonremovable row versions in 185 pages
                                        ^^^^^^^

You have a lot of dead rows that can't be removed.  You must have a
lot of other transactions in process.  Note that nobody needs to be
_looking_ at those rows for them to be unremovable.  The transactions
just have to be old enough.
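(To see just how old those open transactions are, something along these lines works on 9.2 and later; column names differ on older releases):

```sql
-- Show the oldest open transactions; any transaction that started
-- before the queue rows were deleted pins those dead rows in place.
SELECT pid, usename, now() - xact_start AS xact_age
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 5;
```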


> -------------------------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..0.04 rows=1 width=4) (actual time=402.525..402.525 rows=0 loops=1)
>    ->  Index Scan using my_queue_pkey on stats_bet_queue  (cost=0.00..1314.71 rows=34058 width=4) (actual time=402.518..402.518 rows=0 loops=1)

I'm amazed this does an indexscan on an empty table.

If this table is "hot", my bet is that you have attempted an
optimisation that actually isn't an optimisation under PostgreSQL.
That is, if you're putting data in there, a daemon is constantly
deleting from it, and all your other transactions depend on knowing
the state of the "unprocessed queue", then the design just doesn't work
under PostgreSQL: it turns out to be impossible to keep the table
vacuumed well enough for high performance.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism.
                --Brad Holland

Re: slow queue-like empty table

From: Tobias Brox
[Csaba Nagy - Thu at 10:45:35AM +0200]
> So you should check for "idle in transaction" sessions, those are bad...
> or any other long running transaction.

Thank you (and others) for pointing this out; you certainly set us on
the right track.  We did have a few unclosed transactions, i.e.
transactions never ended by "rollback" or "commit".  We've been fixing
this, beating up the programmers responsible, and continuing to
monitor.

I don't think it's only due to those queue-like tables; we've seen a
really significant improvement on the graphs showing load and CPU usage
on the database server after we killed all the "idle in transaction"
sessions.  I can safely relax for some weeks yet before I need to do
more optimization work :-)

(oh, btw, we didn't really beat up the programmers ... too big
geographical distances ;-)
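(For the record, on later releases the killing can be done from SQL rather than the shell: pg_terminate_backend is available from 8.4 on, and the "state" column from 9.2 on.  The ten-minute threshold here is just an illustrative choice):

```sql
-- Terminate sessions that have sat idle in a transaction for too long.
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - state_change > interval '10 minutes';
```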

Re: slow queue-like empty table

From: Jim Nasby
On Oct 4, 2006, at 5:59 AM, Tobias Brox wrote:
> [Csaba Nagy - Thu at 10:45:35AM +0200]
>> So you should check for "idle in transaction" sessions, those are
>> bad...
>> or any other long running transaction.
>
> Thank you (and others) for pointing this out, you certainly set us on
> the right track.  We did have some few unclosed transactions;
> transactions not being ended by "rollback" or "commit".  We've been
> fixing this, beating up the programmers responsible and continued
> monitoring.
>
> I don't think it's only due to those queue-like tables, we've really
> seen a significant improvement on the graphs showing load and cpu
> usage
> on the database server after we killed all the "idle in
> transaction".  I
> can safely relax still some weeks before I need to do more
> optimization
> work :-)

Leaving transactions open for a long time is murder on pretty much
any database.  It's one of the worst programming mistakes you can make
(from a performance standpoint).  Further, mishandling the close of a
transaction is a great way to lose data:

BEGIN;
...useful work
--COMMIT should have happened here
...more work
...ERROR!
ROLLBACK;

You just lost that useful work.
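(One hedge against this pattern, sketched here, is a savepoint: a later
failure then rolls back only to the savepoint, and the earlier useful
work still commits.  Whether that is the right fix depends on the
application, of course):

```sql
BEGIN;
-- ...useful work...
SAVEPOINT after_useful_work;
-- ...more work, which then fails...
ROLLBACK TO SAVEPOINT after_useful_work;  -- undoes only the failed part
COMMIT;                                   -- the useful work is kept
```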

> (oh, btw, we didn't really beat up the programmers ... too big
> geographical distances ;-)

This warrants a plane ticket. Seriously. If your app programmers
aren't versed in transaction management, you should probably be
defining a database API that allows the use of autocommit.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)