Re: Critical performance problems on large databases - Mailing list pgsql-general

From Nigel J. Andrews
Subject Re: Critical performance problems on large databases
Date
Msg-id Pine.LNX.4.21.0204110040270.2690-100000@ponder.fairway2k.co.uk
Whole thread Raw
In response to Re: Critical performance problems on large databases  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Critical performance problems on large databases  ("Nigel J. Andrews" <nandrews@investsystems.co.uk>)
Re: Critical performance problems on large databases  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
List pgsql-general
On Wed, 10 Apr 2002, Tom Lane wrote:

> Gunther Schadow <gunther@aurora.regenstrief.org> writes:
> > We also noted that a
> > SELECT COUNT(*) FROM BigQuery;
> > can take quite a long time and again use a lot of resources,
> > whereas
> > SELECT COUNT(smallcolumn) FROM BigQuery;
> > may be faster and less resource consuming.
>
> This is complete nonsense... if anything, the second one will take
> more cycles, since it has to actually examine a column.
>

OK, I hate to do this, partly because I seem to remember mention of how this
sort of thing has to happen like this because the system isn't clever enough to
do the optimisation, but...

Shouldn't
     SELECT count(*) FROM chat_post
give an immediate answer, especially when there's an index it can use?

Example, with what appears to be an overkill but the the primary key is a two
column affair where as the 'time' column counted in the second select has it's
own index:


avid_chat_archive=> SET ENABLE_SEQSCAN TO FALSE;
SET VARIABLE
avid_chat_archive=> EXPLAIN ANALYZE SELECT COUNT(*) FROM chat_post;
NOTICE:  QUERY PLAN:

Aggregate  (cost=100020853.10..100020853.10 rows=1 width=0) (actual time=48557.4
8..48557.49 rows=1 loops=1)
  ->  Seq Scan on chat_post  (cost=100000000.00..100018291.08 rows=1024808 width
=0) (actual time=6.68..32598.56 rows=1024808 loops=1)
Total runtime: 48557.93 msec

EXPLAIN
avid_chat_archive=> SELECT COUNT(*) FROM chat_post;
  count
---------
 1024808
(1 row)

avid_chat_archive=> EXPLAIN ANALYZE SELECT COUNT(time) FROM chat_post;
NOTICE:  QUERY PLAN:

Aggregate  (cost=100020853.10..100020853.10 rows=1 width=8) (actual time=51314.5
4..51314.55 rows=1 loops=1)
  ->  Seq Scan on chat_post  (cost=100000000.00..100018291.08 rows=1024808 width
=8) (actual time=6.78..35012.81 rows=1024808 loops=1)
Total runtime: 51314.97 msec

EXPLAIN
avid_chat_archive=> SELECT COUNT(time) FROM chat_post;
  count
---------
 1024808
(1 row)

avid_chat_archive=> \d chat_post

                 Table "chat_post"
   Column    |           Type           | Modifiers
-------------+--------------------------+-----------
 session_id  | smallint                 | not null
 poster_name | character varying(32)    | not null
 time        | timestamp with time zone | not null
 post_number | smallint                 | not null
Indexes: chat_post_time_key,
         chat_post_user_key
Primary key: chat_post_pkey
Triggers: RI_ConstraintTrigger_4369014,
          RI_ConstraintTrigger_4369012,
          RI_ConstraintTrigger_4369010,
          RI_ConstraintTrigger_4369008,
          RI_ConstraintTrigger_4369006

avid_chat_archive=>


Having muttered about the primary key using two columns I see the planner can
see the table size without having to revert to an index. Which makes sense if
only I'd turned my brain on first.

Anyway, the question still stands, why does postgres do this query this
way? It is doing the full sequential scan, i.e. fetching the tuples from
disk, when this data is not necessary for the query result. Is it to do with
calling requirement of count(), other aggregate functions and/or functions in
general when used in the return list and/or that it requires too much
intelligence for the system to determine such optimisations?

And, I can't see this specific item addressed in the FAQ. I'm sure I'm not the
only to have brought this up (oh wait, I'm reply to a related message so
obviously not). Shouldn't it be there?


--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants


pgsql-general by date:

Previous
From: "Ronan Lucio"
Date:
Subject: Database permissions
Next
From: "Nigel J. Andrews"
Date:
Subject: Re: Critical performance problems on large databases