Thread: Slow count(*)
Hi, We are looking for a patch that will help us count using the indexes. Our product is about 20 times slower on Postgres compared to MS SQL Server. Any ideas? Danny Abraham BMC Software CTM&D Business Unit 972-52-4286-513 danny_abraham@bmc.com
On 02/01/2008, Abraham, Danny <danny_abraham@bmc.com> wrote: > Hi, > > We are looking for a patch that will help us count using the indexes. > > Our product is about 20 times slower on Postgres compared to MS SQL > Server. > > Any ideas? There isn't any similar patch and will not be. Use materialized views or similar techniques. Are you sure, so all your problems are only in SELECT COUNT(*)? Check, please, all slow queries. Regards Pavel Stehule > > Danny Abraham > BMC Software > CTM&D Business Unit > 972-52-4286-513 > danny_abraham@bmc.com > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings >
am Wed, dem 02.01.2008, um 9:29:24 -0600 mailte Abraham, Danny folgendes: > Hi, > > We are looking for a patch that will help us count using the indexes. > > Our product is about 20 times slower on Postgres compared to MS SQL > Server. > > Any ideas? Please show us your SQL and the execution plan (EXPLAIN or, better, EXPLAIN ANALYSE) and read our FAQ. Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47150, D1: 0160/7141639 (mehr: -> Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
On Wed, Jan 02, 2008 at 09:29:24AM -0600, Abraham, Danny wrote: > We are looking for a patch that will help us count using the indexes. Is this for SELECT count(*) FROM table; or SELECT count(1) FROM table WHERE. . . The latter _will_ use an index, if the index is correct, the statistics are right, and the index selectivity is worth the cost of reading the index. The former will not use an index at all, because the answer depends on visibility, and you can't know that without reading the table. If you're counting how many rows are in the table (for, for instance, display purposes), you probably need to do something else. > Our product is about 20 times slower on Postgres compared to MS SQL > Server. > > Any ideas? Not without the queries, the EXPLAIN ANALYZE plans, and some information about the database. A
On Wed, 2008-01-02 at 09:29 -0600, Abraham, Danny wrote: > Our product is about 20 times slower on Postgres compared to MS SQL > Server. If you want to have a cross-platform product then you must consider how to access multiple systems both accurately and quickly. Not much point catering for the different SQL dialects and then ignoring the performance differences. All products are not the same; you will find many advantages with Postgres. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
>>> On Wed, Jan 2, 2008 at 9:29 AM, in message <BE67D1149BBD5746984545ED91F702E04DCC7A@hou-ex-02.adprod.bmc.com>, "Abraham, Danny" <danny_abraham@bmc.com> wrote: > We are looking for a patch that will help us count using the indexes. As others have mentioned, that's not currently possible for a count of all rows in a table, because there can be many versions of a row under PostgreSQL's MVCC techniques, and the row must currently be visited to determine whether it is visible in the context of your database transaction. > Our product is about 20 times slower on Postgres compared to MS SQL > Server. > > Any ideas? Again, it is best to show a particular example of a problem, because you might be making a bad assumption about the cause of your slowness. If you don't understand MVCC and the need for maintenance, you might have table bloat which could be the issue. Also, always give the exact version of PostgreSQL, the OS, and a description of the hardware. If you really are doing proper maintenance, and you don't need exact counts, you might be able to use the approximation stored in the system tables: cc=> \timing Timing is on. cc=> select count(*) from "Party";count --------135093 (1 row) Time: 48.626 ms cc=> select reltuples from pg_class where relname = 'Party';reltuples ----------- 135091 (1 row) Time: 9.799 ms -Kevin
Kevin Grittner wrote: > >If you really are doing proper maintenance, and you don't need exact >counts, you might be able to use the approximation stored in the >system tables: > > Also, if you're using count(*) as an existance test (common in Mysql code), it's better to use exists instead. Using a table in my system, I see: > proddb=> explain analyze select count(*) from instrument_listings > where update_date is null and delist_date is null; > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=18385.45..18385.46 rows=1 width=0) (actual > time=897.799..897.801 rows=1 loops=1) > -> Seq Scan on instrument_listings (cost=0.00..17973.43 > rows=164807 width=0) (actual time=0.018..634.197 rows=146122 loops=1) > Filter: ((update_date IS NULL) AND (delist_date IS NULL)) > Total runtime: 897.846 ms > (4 rows) > > Time: 898.478 ms > proddb=> explain analyze select true where exists(select 1 from > instrument_listings where update_date is null and delist_date is null > limit 1); > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------ > Result (cost=0.11..0.12 rows=1 width=0) (actual time=0.026..0.028 > rows=1 loops=1) > One-Time Filter: $0 > InitPlan > -> Limit (cost=0.00..0.11 rows=1 width=0) (actual > time=0.022..0.022 rows=1 loops=1) > -> Seq Scan on instrument_listings (cost=0.00..17973.43 > rows=164807 width=0) (actual time=0.017..0.017 rows=1 loops=1) > Filter: ((update_date IS NULL) AND (delist_date IS NULL)) > Total runtime: 0.063 ms > (7 rows) > > Time: 0.768 ms > proddb=> The exists version is over 1000x faster (and the only reason it's not more studly is that I'm working on the table as we speak, so it's all in memory). As a general rule in postgres, don't do count(*) unless you really mean it. Brian