Re: Critical performance problems on large databases - Mailing list pgsql-general
From:           Nigel J. Andrews
Subject:        Re: Critical performance problems on large databases
Date:
Msg-id:         Pine.LNX.4.21.0204110040270.2690-100000@ponder.fairway2k.co.uk
In response to: Re: Critical performance problems on large databases (Tom Lane <tgl@sss.pgh.pa.us>)
Responses:      Re: Critical performance problems on large databases
                Re: Critical performance problems on large databases
List:           pgsql-general
On Wed, 10 Apr 2002, Tom Lane wrote:

> Gunther Schadow <gunther@aurora.regenstrief.org> writes:
> > We also noted that a
> >     SELECT COUNT(*) FROM BigQuery;
> > can take quite a long time and again use a lot of resources, whereas
> >     SELECT COUNT(smallcolumn) FROM BigQuery;
> > may be faster and less resource consuming.
>
> This is complete nonsense... if anything, the second one will take
> more cycles, since it has to actually examine a column.

OK, I hate to do this, partly because I seem to remember mention of how this
sort of thing has to happen like this because the system isn't clever enough
to do the optimisation, but...

Shouldn't SELECT count(*) FROM chat_post give an immediate answer, especially
when there's an index it can use?

An example. It may look like overkill, but the primary key is a two-column
affair, whereas the 'time' column counted in the second select has its own
index:

avid_chat_archive=> SET ENABLE_SEQSCAN TO FALSE;
SET VARIABLE
avid_chat_archive=> EXPLAIN ANALYZE SELECT COUNT(*) FROM chat_post;
NOTICE:  QUERY PLAN:

Aggregate  (cost=100020853.10..100020853.10 rows=1 width=0) (actual time=48557.48..48557.49 rows=1 loops=1)
  ->  Seq Scan on chat_post  (cost=100000000.00..100018291.08 rows=1024808 width=0) (actual time=6.68..32598.56 rows=1024808 loops=1)
Total runtime: 48557.93 msec

EXPLAIN
avid_chat_archive=> SELECT COUNT(*) FROM chat_post;
  count
---------
 1024808
(1 row)

avid_chat_archive=> EXPLAIN ANALYZE SELECT COUNT(time) FROM chat_post;
NOTICE:  QUERY PLAN:

Aggregate  (cost=100020853.10..100020853.10 rows=1 width=8) (actual time=51314.54..51314.55 rows=1 loops=1)
  ->  Seq Scan on chat_post  (cost=100000000.00..100018291.08 rows=1024808 width=8) (actual time=6.78..35012.81 rows=1024808 loops=1)
Total runtime: 51314.97 msec

EXPLAIN
avid_chat_archive=> SELECT COUNT(time) FROM chat_post;
  count
---------
 1024808
(1 row)

avid_chat_archive=> \d chat_post
              Table "chat_post"
   Column    |           Type           | Modifiers
-------------+--------------------------+-----------
 session_id  | smallint                 | not null
 poster_name | character varying(32)    | not null
 time        | timestamp with time zone | not null
 post_number | smallint                 | not null
Indexes: chat_post_time_key,
         chat_post_user_key
Primary key: chat_post_pkey
Triggers: RI_ConstraintTrigger_4369014,
          RI_ConstraintTrigger_4369012,
          RI_ConstraintTrigger_4369010,
          RI_ConstraintTrigger_4369008,
          RI_ConstraintTrigger_4369006

avid_chat_archive=>

Having muttered about the primary key using two columns, I see the planner
can get the table size without having to resort to an index, which makes
sense if only I'd turned my brain on first.

Anyway, the question still stands: why does postgres run this query this way?
It does a full sequential scan, i.e. fetches the tuples from disk, when that
data is not needed for the query result. Is it to do with the calling
requirements of count(), of other aggregate functions, and/or of functions in
general when used in the select list? Or does determining such optimisations
simply require more intelligence than the system has?

And I can't see this specific item addressed in the FAQ. I'm sure I'm not the
only one to have brought this up (oh wait, I'm replying to a related message,
so obviously not). Shouldn't it be there?

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants
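A note for anyone reaching this thread with the same question: under
PostgreSQL's MVCC design, tuple visibility information lives in the heap, not
in the indexes, so count(*) genuinely has to visit the table (at least in
releases of this vintage). When an approximate answer is good enough, the
usual workaround is to read the planner's row estimate from the system
catalog instead of scanning. A minimal sketch, assuming the table has been
VACUUMed or ANALYZEd recently enough for the estimate to be useful:

    -- reltuples is the planner's row-count estimate for the table,
    -- refreshed by VACUUM and ANALYZE; it lags the true count between runs.
    SELECT relname, reltuples
    FROM pg_class
    WHERE relname = 'chat_post';

This reads a single catalog row rather than the million-plus heap tuples the
sequential scans above have to touch, trading exactness for speed.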