Re: Trying to get postgres to use an index - Mailing list pgsql-general

From Mike Wertheim
Subject Re: Trying to get postgres to use an index
Date
Msg-id 000f01c4c6c0$e2a79330$dc0a000a@mikespc
Whole thread Raw
In response to Re: Trying to get postgres to use an index  (Pierre-Frédéric Caillaud<lists@boutiquenumerique.com>)
Responses Re: Trying to get postgres to use an index
List pgsql-general
I have some more info on my indexing situation, and a new question.

In my previous email, I told about 2 tables: Notification and Item,
which join on a field called ItemID.  The joining query didn't execute
as quickly as I thought it should. I now notice that I have another
table, Folder, which joins with Item in a similar way, and the
performance of that join is excellent.

So my new questions is...  What makes the Folder join faster than the
Notification join?


Here is some info on the tables, queries, and "explain analyze"
output...

Item's primary key is ItemID (int4).
Folder's primary key is ItemID (int4).  Folder also contains 4 varchar
columns, 2 text columns, 6 bool columns, 7 datetime columns and 1 int4
column.
Notification has an index on its ItemID (int4) field.  Notification also
contains 7 text columns (1 of them being the primary key), 3 timestamp
columns and 4 int4 columns.

Folder and Notification have a similar number of rows.  "select count(*)
from folder" returns 193043.  "select count(*) from notification"
returns 223689.


The first query is: "select count(*) from FOLDER f, ITEM i where
f.itemID = i.itemID and i.projectid=7720".  This query returns the
result "5" and executes in less than 1 second.

The second query is: "select count(*) from NOTIFICATION n, ITEM i where
n.itemID = i.itemID and i.projectid=7720".  This query returns the
result "2" and executes in about 40 seconds.


Here's the "explain analyze" output...

The Folder query uses the indexes:
explain analyze select count(*) from FOLDER f, ITEM i where f.itemID =
i.itemID and i.projectid=7720;   Aggregate  (cost=6371.88..6371.88
rows=1 width=0) (actual time=83.557..83.558 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..6371.31 rows=227 width=0) (actual
time=17.929..83.502 rows=5 loops=1)
         ->  Index Scan using item_ix_item_4_idx on item i
(cost=0.00..2105.51 rows=869 width=4) (actual time=0.098..19.409 rows=51
loops=1)
               Index Cond: (projectid = 7720)
         ->  Index Scan using folder_pkey on folder f  (cost=0.00..4.90
rows=1 width=4) (actual time=1.255..1.255 rows=0 loops=51)
               Index Cond: (f.itemid = "outer".itemid)
 Total runtime: 92.185 ms


The Notification query does a sequential scan on Notification:
explain analyze select count(*) from NOTIFICATION n, ITEM i where
n.itemID = i.itemID and i.projectid=7720;
 Aggregate  (cost=38732.31..38732.31 rows=1 width=0) (actual
time=40380.497..40380.498 rows=1 loops=1)
   ->  Hash Join  (cost=2107.69..38731.65 rows=263 width=0) (actual
time=36341.174..40380.447 rows=2 loops=1)
         Hash Cond: ("outer".itemid = "inner".itemid)
         ->  Seq Scan on notification n  (cost=0.00..35502.89
rows=223689 width=4) (actual time=8289.236..40255.341 rows=223689
loops=1)
         ->  Hash  (cost=2105.51..2105.51 rows=869 width=4) (actual
time=0.177..0.177 rows=0 loops=1)
               ->  Index Scan using item_ix_item_4_idx on item i
(cost=0.00..2105.51 rows=869 width=4) (actual time=0.025..0.127 rows=51
loops=1)
                     Index Cond: (projectid = 7720)
 Total runtime: 40380.657 ms


So my question is...  What difference do you see between the Folder and
Notification tables that would account for such a big difference in
query performance?  And how can I make the Notification query run about
as fast as the Folder query?



pgsql-general by date:

Previous
From: Mike Cox
Date:
Subject: Re: I'm about to release the next postgresql RFD. Comments
Next
From: "A. Mous"
Date:
Subject: Transaction rollback - newbie