Re: Simple query not using index: why? - Mailing list pgsql-general

From William Garrison
Subject Re: Simple query not using index: why?
Date
Msg-id 48BEEBA5.3030108@mobydisk.com
In response to Re: Simple query not using index: why?  (aklaver@comcast.net (Adrian Klaver))
Responses Re: Simple query not using index: why?  (Joshua Drake <jd@commandprompt.com>)
List pgsql-general
Can't it just scan the index to get that?  I assumed the index had an entry for every fileid in the table, each pointing back to its row.  In my over-simplified imagination, the table looks like this:

ctid|fileid|column|column|column|column
ctid|fileid|column|column|column|column
ctid|fileid|column|column|column|column
ctid|fileid|column|column|column|column
etc.

While the index looks like
fileid|ctid
fileid|ctid
fileid|ctid
fileid|ctid
...

So I expected that scanning the index would be faster, and that the index would still have everything needed to do the count.  Or perhaps it is because I said COUNT(*), so it needs to look at the other columns in the table?  I really just wanted the number of "hits", not the number of records with distinct values or anything like that.  My understanding was that COUNT(*) does exactly that, and doesn't really look at the columns themselves.
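
To make the COUNT(*) point concrete, here is a minimal sketch of what I mean by counting "hits". It runs against an in-memory SQLite database through Python's sqlite3 module purely as a stand-in, so it shows only what the query returns and says nothing about which plan PostgreSQL chooses. The "name" column, the index name, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical miniature of the "file" table. SQLite is only a stand-in here;
# this demonstrates query results, not PostgreSQL's planner behaviour.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE file (fileid INTEGER, name TEXT)")
conn.execute("CREATE INDEX file_fileid_idx ON file (fileid)")  # non-unique index
conn.executemany(
    "INSERT INTO file (fileid, name) VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, None), (3, "c"), (3, "c"), (3, None)],
)

# The duplicate-key query from the thread: one row per fileid seen more than once.
duplicates = conn.execute(
    "SELECT fileid, COUNT(*) FROM file "
    "GROUP BY fileid HAVING COUNT(*) > 1 ORDER BY fileid"
).fetchall()

# COUNT(*) counts rows, COUNT(name) skips NULLs, COUNT(DISTINCT name) also
# collapses repeated values.
counts = conn.execute(
    "SELECT fileid, COUNT(*), COUNT(name), COUNT(DISTINCT name) "
    "FROM file GROUP BY fileid ORDER BY fileid"
).fetchall()

print(duplicates)  # [(2, 2), (3, 3)]
print(counts)      # [(1, 1, 1, 1), (2, 2, 1, 1), (3, 3, 2, 1)]
```

In the second result only the COUNT(*) column is independent of what the other columns contain, which is why I expected the fileid values alone to be enough for the count.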


Adrian Klaver wrote:
 -------------- Original message ----------------------
From: William Garrison <postgres@mobydisk.com> 
I am looking for records with duplicate keys, so I am running this query:

SELECT   fileid, COUNT(*)
FROM   file
GROUP BY   fileid
HAVING   COUNT(*)>1

The table has an index on fileid (non-unique index) so I am surprised 
that postgres is doing a table scan.  This database is >15GB, and there 
are a number of fairly large string columns in the table.  I am very 
surprised that scanning the index is not faster than scanning the 
table.  Any thoughts on that?  Is scanning the table faster than 
scanning the index?  Is there a reason that it needs anything other than 
the index?
   
I may be missing something, but it would have to scan the entire table to get all the occurrences of each fileid in order to do the count(*).



--
Adrian Klaver
aklaver@comcast.net

 
