Thread: Database Select Slow

Database Select Slow

From
"carter ck"
Date:
Hi all,

I am facing a performance issue here. Whenever I do a count(*) on a table
that contains about 300K records, it takes a few minutes to complete, whereas
my other application, which counts more than 500K records, takes less than
10 seconds.

I have indexed all the essential columns, but it still does not improve the
speed.

Any help and advice is appreciated.

Thanks.



Re: Database Select Slow

From
"A. Kretschmer"
Date:
On Fri, 10.08.2007 at 17:46:11 +0800, carter ck wrote the following:
> Hi all,
>
> I am facing a performance issue here. Whenever I do a count(*) on a table
> that contains about 300K records, it takes a few minutes to complete,
> whereas my other application, which counts more than 500K records, takes
> less than 10 seconds.
>
> I have indexed all the essential columns, but it still does not improve
> the speed.

Indexes don't help in this case; a 'select count(*)' forces a sequential scan.
Do you really need this information? You can find an estimate of the number of
rows in the system catalog (reltuples in pg_class, see
http://www.postgresql.org/docs/current/interactive/catalog-pg-class.html).
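
For example (the table name 'foo' is just a placeholder), something along
these lines returns the planner's estimate almost instantly:

SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'foo';

Keep in mind that reltuples is only as fresh as the last VACUUM or ANALYZE
on the table.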


Andreas
--
Andreas Kretschmer
Contact:  Heynitz: 035242/47150,   D1: 0160/7141639 (more: -> header)
GnuPG-ID:   0x3FFF606C, private 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: Database Select Slow

From
".ep"
Date:
On Aug 10, 9:42 pm, andreas.kretsch...@schollglas.com ("A.
Kretschmer") wrote:
> On Fri, 10.08.2007 at 17:46:11 +0800, carter ck wrote the following:
>
> > Hi all,
>
> > I am facing a performance issue here. Whenever I do a count(*) on a table
> > that contains about 300K records, it takes a few minutes to complete,
> > whereas my other application, which counts more than 500K records, takes
> > less than 10 seconds.
>
> > I have indexed all the essential columns, but it still does not improve
> > the speed.
>
> Indexes don't help in this case; a 'select count(*)' forces a sequential scan.
> Do you really need this information? You can find an estimate of the number
> of rows in the system catalog (reltuples in pg_class, see
> http://www.postgresql.org/docs/current/interactive/catalog-pg-class.html).



Hi, what if I need to do a count with a WHERE condition? E.g.,

SELECT count(*) from customers where cust_id = 'georgebush' and
created_on > current_date - interval '1 week' ;

Can I get the info about this from somewhere in the pg system tables
as well? Queries like these are very common in most applications, so
I'm hoping I can avoid the sequential scans!

Many thanks for any tips.


Re: Database Select Slow

From
Bill Moran
Date:
In response to ".ep" <erick.papa@gmail.com>:

> On Aug 10, 9:42 pm, andreas.kretsch...@schollglas.com ("A.
> Kretschmer") wrote:
> > On Fri, 10.08.2007 at 17:46:11 +0800, carter ck wrote the following:
> >
> > > Hi all,
> >
> > > I am facing a performance issue here. Whenever I do a count(*) on a table
> > > that contains about 300K records, it takes a few minutes to complete,
> > > whereas my other application, which counts more than 500K records, takes
> > > less than 10 seconds.
> >
> > > I have indexed all the essential columns, but it still does not improve
> > > the speed.
> >
> > Indexes don't help in this case; a 'select count(*)' forces a sequential scan.
> > Do you really need this information? You can find an estimate of the number
> > of rows in the system catalog (reltuples in pg_class, see
> > http://www.postgresql.org/docs/current/interactive/catalog-pg-class.html).
>
>
>
> Hi, what if I need to do a count with a WHERE condition? E.g.,
>
> SELECT count(*) from customers where cust_id = 'georgebush' and
> created_on > current_date - interval '1 week' ;
>
> Can I get the info about this from somewhere in the pg system tables
> as well? Queries like these are very common in most applications, so
> I'm hoping I can avoid the sequential scans!
>
> Many thanks for any tips.

If you only need an estimate, you can do an "explain" of the query, and
grep out the row count.  The accuracy of this will vary depending on the
statistics, but it's very fast and works with a query of any complexity.
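
For instance, a rough sketch from the shell (the database name 'mydb' is just
a placeholder; the query is the one from the earlier message):

psql -d mydb -Atc "EXPLAIN SELECT * FROM customers WHERE cust_id = 'georgebush'" \
    | head -1 | sed 's/.*rows=\([0-9]*\).*/\1/'

This prints the planner's row estimate for the top plan node without actually
running the query.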

If you need fast, accurate counts, your best bet is to set up triggers on
your tables to maintain counts in a separate table.  This can be rather
complex to set up, and you take a performance hit during inserts and updates,
but I don't know of any other way to do it.
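
A rough sketch of that approach (table and column names here are only
illustrative, and it only handles inserts and deletes):

CREATE TABLE customers_rowcount (n bigint NOT NULL);
INSERT INTO customers_rowcount VALUES (0);

CREATE OR REPLACE FUNCTION customers_count_trig() RETURNS trigger AS $$
BEGIN
    -- keep the single counter row in sync with row-level changes
    IF TG_OP = 'INSERT' THEN
        UPDATE customers_rowcount SET n = n + 1;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE customers_rowcount SET n = n - 1;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER customers_count
    AFTER INSERT OR DELETE ON customers
    FOR EACH ROW EXECUTE PROCEDURE customers_count_trig();

"SELECT n FROM customers_rowcount" is then nearly instant, at the price of
extra work on every insert and delete, plus possible contention on the single
counter row under heavy concurrent writes.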

--
Bill Moran
http://www.potentialtech.com

Re: Database Select Slow

From
Guido Neitzer
Date:
On 10.08.2007, at 06:58, .ep wrote:

> Hi, what if I need to do a count with a WHERE condition? E.g.,
>
> SELECT count(*) from customers where cust_id = 'georgebush' and
> created_on > current_date - interval '1 week' ;
>
> Can I get the info about this from somewhere in the pg system tables
> as well? Queries like these are very common in most applications, so
> I'm hoping I can avoid the sequential scans!

If you have a qualified count(*), it goes to the index first, then
checks whether the rows are live for your transaction. The problem is
only the unqualified count with

select count(*) from table_name;

without any qualification. Or, of course, if your qualifier is not
selective enough and you get a couple of million rows back from a
slow I/O system ...

I try to do counts only if I know that the selectivity is good enough
not to kill the performance. Or I use "please wait" pages in my
application to tell the user that his request is being processed and
not hung.

cug

Re: Database Select Slow

From
"Scott Marlowe"
Date:
On 8/10/07, carter ck <carterck32@hotmail.com> wrote:
> Hi all,
>
> I am facing a performance issue here. Whenever I do a count(*) on a table
> that contains about 300K records, it takes a few minutes to complete,
> whereas my other application, which counts more than 500K records, takes
> less than 10 seconds.
>
> I have indexed all the essential columns, but it still does not improve
> the speed.

As previously mentioned, indexes won't help with a count(*) with no
where clause.

They might help with a where clause, if it's quite selective, but if
you're grabbing a noticeable percentage of a table, pgsql will rightly
switch to a seq scan.

Here are some examples from my goodly sized stats db here at work:

\timing
explain select * from businessrequestsummary;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Seq Scan on businessrequestsummary  (cost=0.00..3280188.63 rows=67165363 width=262)
Time: 0.441 ms

gives me an approximate value of 67,165,363 rows.

explain select * from businessrequestsummary where lastflushtime >
now() - interval '1 day';
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Index Scan using businessrequestsummary_lastflushtime_dx on businessrequestsummary  (cost=0.00..466.65 rows=6661 width=262)
   Index Cond: (lastflushtime > (now() - '1 day'::interval))
That says 6661 rows, takes 0.9 ms, and would use the index.

To run the real queries I get much slower times. :)

Now, to run the real count(*) queries:

 select count(*) from businessrequestsummary where lastflushtime >
now() - interval '1 day';
 count
--------
 274192
(1 row)

Time: 546.528 ms

(data in the buffers makes it fast)

select count(*) from businessrequestsummary where lastflushtime >
now() - interval '1 week';
  count
---------
 1700050
(1 row)

Time: 26291.155 ms

second run (data now in buffer)

select count(*) from businessrequestsummary where lastflushtime >
now() - interval '1 week';
  count
---------
 1699689
(1 row)

Time: 2592.573 ms

Note the number changed, because this db is constantly being updated
in real time with production statistics.

I'm not going to run a select count(*) on that db, because it would
take about 30 minutes to run.  It's got about 67 million rows in it.

Re: Database Select Slow

From
"carter ck"
Date:
Hi,

Thanks for the clarification. It helped resolve the problem. Now the page
loads fully within 2 seconds.

Thanks.

