Thread: Poor select count(*) performance
Hi there,
I'm sorry for a stupid question but I'm really stuck.
A query:
SELECT COUNT(*) FROM "lingq_card" WHERE "lingq_card"."context_id" = ...;
An hour ago it took 8 seconds; one minute ago the same query took just 7 milliseconds.
Any ideas why the execution time varies so wildly?
Explain Analyze gives:
Aggregate  (cost=2000.08..2000.09 rows=1 width=0) (actual time=6.962..6.963 rows=1 loops=1)
  ->  Index Scan using lingq_card_context_id on lingq_card  (cost=0.00..1998.68 rows=561 width=0) (actual time=0.025..5.045 rows=2830 loops=1)
        Index Cond: (context_id = 68672)
Total runtime: 7.011 ms
The lingq_card table contains about 1.4 million rows.
Thanks,
Mike
On Mon, Feb 23, 2009 at 6:44 PM, Mike Ivanov <mike@thelinguist.com> wrote:
> Hi there,
>
> I'm sorry for a stupid question but I'm really stuck.
>
> A query:
>
> SELECT COUNT(*) FROM "lingq_card" WHERE "lingq_card"."context_id" = ...;
>
> An hour ago it took 8 seconds; one minute ago the same query took just
> 7 milliseconds.

The two common causes are caching and changing query plans. How many rows did it have to hit, did it use an index, which index did it use, and were the rows it needed already in the pg shared_buffers or the OS/kernel file system cache when retrieved?
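One quick way to separate the two causes (a sketch, reusing the table, column, and context_id value from the thread) is to run the same EXPLAIN ANALYZE twice in a row and compare:

```sql
-- First run: may have to read pages from disk (slow if nothing is cached).
EXPLAIN ANALYZE
SELECT COUNT(*) FROM "lingq_card" WHERE "lingq_card"."context_id" = 68672;

-- Second run: if the plan is identical but the time drops sharply, the
-- difference was caching; if the plan itself differs (e.g. seq scan vs
-- index scan), the planner changed its mind, typically after an ANALYZE
-- updated the table's statistics.
EXPLAIN ANALYZE
SELECT COUNT(*) FROM "lingq_card" WHERE "lingq_card"."context_id" = 68672;
```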
On Mon, Feb 23, 2009 at 05:44:05PM -0800, Mike Ivanov wrote:
> An hour ago it took 8 seconds; one minute ago the same query took just
> 7 milliseconds.
>
> Any ideas why the execution time varies so wildly?
>
> Explain Analyze gives:
>
> Aggregate  (cost=2000.08..2000.09 rows=1 width=0) (actual time=6.962..6.963 rows=1 loops=1)
>   ->  Index Scan using lingq_card_context_id on lingq_card  (cost=0.00..1998.68 rows=561 width=0) (actual time=0.025..5.045 rows=2830 loops=1)
>         Index Cond: (context_id = 68672)
> Total runtime: 7.011 ms

If you're unlucky, none of those 2830 rows will be in memory and you'll have to wait for the disk to bring them all back. Depending on where they are on disk and how fast your disks are, this could take up to 30 seconds.

If you want this sort of thing to go quicker you could try CLUSTERing the table on this index, but that will slow down other queries that want data to come off the disk in a different order. It's a balancing act!

--
Sam  http://samason.me.uk/
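A minimal sketch of the CLUSTER suggestion, using the index name from the EXPLAIN output above. Note that CLUSTER takes an exclusive lock and rewrites the whole table, and the physical ordering decays as new rows arrive, so it may need re-running periodically:

```sql
-- Physically reorder the table to match the index on context_id,
-- so all rows for one context end up on adjacent disk pages.
CLUSTER lingq_card USING lingq_card_context_id;

-- Refresh planner statistics after the rewrite.
ANALYZE lingq_card;
```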
On Mon, Feb 23, 2009 at 6:54 PM, Sam Mason <sam@samason.me.uk> wrote:
> Depending on where these are on disk and how fast your disks are this
> could take up to 30 seconds.

This does not sound very inspiring :-)

Would throwing more hardware (memory, faster CPU) at the server improve the situation?

> If you want this sort of thing to go quicker you could try CLUSTERing
> the table on this index

This could help, because all the other queries against this table are always filtered or grouped by the context_id field. I will try that.

Thanks a lot, Sam!
On Mon, Feb 23, 2009 at 11:21:16PM -0800, Mike Ivanov wrote:
> On Mon, Feb 23, 2009 at 6:54 PM, Sam Mason <sam@samason.me.uk> wrote:
> > Depending on where these are on disk and how fast your disks are this
> > could take up to 30 seconds.
>
> This does not sound very inspiring :-)

It was calculated with a pessimistic seek time of 10ms * 3000 seeks. The real worst case would be even worse, as you'd have to factor in potential misses of the index as well, but that's unlikely. In practice, a table is likely to be stored close together on the disk, so assuming the full average seek time for every row is pessimistic. If it's having to go off and read the index as well, then you may lose this spatial clustering and performance will suffer.

> Would throwing more hardware (memory, faster CPU) at the server improve
> the situation?

You're IO bound, not CPU bound; faster disks would help, or, if your dataset is small enough, more memory.

--
Sam  http://samason.me.uk/
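The back-of-the-envelope figure above works out as follows, assuming roughly one random seek per row at a pessimistic 10 ms each:

```sql
-- ~3000 random seeks at 10 ms each gives the quoted worst case:
SELECT 3000 * 0.010 AS worst_case_seconds;  -- 30 seconds

-- The observed fully-cached run needed no seeks at all, hence ~7 ms.
```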