Thread: Poor select count(*) performance
Hi there,
I'm sorry for a stupid question but I'm really stuck.
A query:
SELECT COUNT(*) FROM "lingq_card" WHERE "lingq_card"."context_id" = ...;
An hour ago it took 8 seconds; one minute ago the same query took just 7 milliseconds.
Any ideas why the execution time varies so wildly?
Explain Analyze gives:
Aggregate  (cost=2000.08..2000.09 rows=1 width=0) (actual time=6.962..6.963 rows=1 loops=1)
  ->  Index Scan using lingq_card_context_id on lingq_card  (cost=0.00..1998.68 rows=561 width=0) (actual time=0.025..5.045 rows=2830 loops=1)
        Index Cond: (context_id = 68672)
Total runtime: 7.011 ms
The lingq_card table contains about 1.4 million rows.
Thanks,
Mike
On Mon, Feb 23, 2009 at 6:44 PM, Mike Ivanov <mike@thelinguist.com> wrote:
> Hi there,
>
> I'm sorry for a stupid question but I'm really stuck.
>
> A query:
>
> SELECT COUNT(*) FROM "lingq_card" WHERE "lingq_card"."context_id" = ...;
>
> An hour ago it took 8 seconds; one minute ago the same query took just
> 7 milliseconds.

The two common causes are caching and changing query plans. How many rows did it have to hit, did it use an index, which index did it use, and were the rows it needed already in the pg shared_buffers or the OS/kernel file system cache when retrieved?
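One quick way to separate the two causes (a sketch, reusing the table, column, and context_id value from the thread) is to run the same EXPLAIN ANALYZE twice in a row and compare:

```sql
-- First run: may have to read pages from disk (slow if nothing is cached).
EXPLAIN ANALYZE
SELECT COUNT(*) FROM "lingq_card" WHERE "lingq_card"."context_id" = 68672;

-- Second run: if the plan is identical but the time drops sharply, the
-- difference was caching; if the plan itself differs (e.g. seq scan vs
-- index scan), the planner changed its mind, typically after an ANALYZE
-- updated the table's statistics.
EXPLAIN ANALYZE
SELECT COUNT(*) FROM "lingq_card" WHERE "lingq_card"."context_id" = 68672;
```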
On Mon, Feb 23, 2009 at 05:44:05PM -0800, Mike Ivanov wrote:
> An hour ago it took 8 seconds; one minute ago the same query took just
> 7 milliseconds.
>
> Any ideas why the execution time varies so wildly?
>
> Explain Analyze gives:
>
> Aggregate  (cost=2000.08..2000.09 rows=1 width=0) (actual time=6.962..6.963 rows=1 loops=1)
>   ->  Index Scan using lingq_card_context_id on lingq_card  (cost=0.00..1998.68 rows=561 width=0) (actual time=0.025..5.045 rows=2830 loops=1)
>         Index Cond: (context_id = 68672)
> Total runtime: 7.011 ms

If you're unlucky, none of those 2830 rows will be in memory and you'll have to wait for the disk to bring them all back. Depending on where they are on disk and how fast your disks are, this could take up to 30 seconds.

If you want this sort of thing to go quicker you could try CLUSTERing the table on this index, but that will slow down other queries that want data to come off the disk in a different order. It's a balancing act!

--
Sam  http://samason.me.uk/
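A minimal sketch of the CLUSTER suggestion, using the index name from the EXPLAIN output above. Note that CLUSTER takes an exclusive lock and rewrites the whole table, and the physical ordering decays as new rows arrive, so it may need re-running periodically:

```sql
-- Physically reorder the table to match the index on context_id,
-- so all rows for one context end up on adjacent disk pages.
CLUSTER lingq_card USING lingq_card_context_id;

-- Refresh planner statistics after the rewrite.
ANALYZE lingq_card;
```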
On Mon, Feb 23, 2009 at 6:54 PM, Sam Mason <sam@samason.me.uk> wrote:
> Depending on where these are on disk and how fast your disks are this
> could take up to 30 seconds.

This does not sound very inspiring :-)

Would throwing more hardware (memory, faster CPU) at the server improve the situation?

> If you want this sort of thing to go quicker you could try CLUSTERing
> the table on this index

This could help, because all the other queries against this table are always filtered or grouped by the context_id field. I will try that.

Thanks a lot, Sam!
On Mon, Feb 23, 2009 at 11:21:16PM -0800, Mike Ivanov wrote:
> On Mon, Feb 23, 2009 at 6:54 PM, Sam Mason <sam@samason.me.uk> wrote:
> > Depending on where these are on disk and how fast your disks are this
> > could take up to 30 seconds.
>
> This does not sound very inspiring :-)

It was calculated with a pessimistic seek time of 10ms * 3000 seeks. The real worst case would be even worse, as you'd have to factor in potential misses of the index as well, but that's unlikely. In practice, a table is likely to be stored close together on the disk, so assuming the full average seek time for every row is pessimistic. If it's having to go off and read the index as well, then you may lose this spatial clustering and performance will suffer.

> Would throwing more hardware (memory, faster CPU) at the server improve
> the situation?

You're IO bound, not CPU bound; faster disks would help, or, if your dataset is small enough, more memory.

--
Sam  http://samason.me.uk/
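The back-of-the-envelope figure above works out as follows, assuming roughly one random seek per row at a pessimistic 10 ms each:

```sql
-- ~3000 random seeks at 10 ms each gives the quoted worst case:
SELECT 3000 * 0.010 AS worst_case_seconds;  -- 30 seconds

-- The observed fully-cached run needed no seeks at all, hence ~7 ms.
```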