Hi all,
I'm making my first attempt at optimizing the way queries are run on one of my
databases, and I've run into a few problems/questions.
Here is the picture (kept simple): I have a table that is at the center of a
many-to-many relationship with about 300K records. Here is its definition:
blocop2=# \d protocolitems
                              Table "public.protocolitems"
         Column         |   Type   |                       Modifiers
------------------------+----------+--------------------------------------------------------
 protocolitemID         | bigint   | not null default nextval('"protocolitems_seq"'::text)
 protocolitemProtocolID | bigint   | not null
 protocolitemItemID     | bigint   | not null
 protocolitemQuantity   | smallint | not null default 0
Indexes: protocolitems_pkey primary key btree ("protocolitemID"),
         protocol_item_pair unique btree ("protocolitemProtocolID", "protocolitemItemID"),
         item_index btree ("protocolitemItemID"),
         protocol_index btree ("protocolitemProtocolID")
Foreign Key constraints:
    protocol_used FOREIGN KEY ("protocolitemProtocolID")
        REFERENCES protocols("protocolID") ON UPDATE NO ACTION ON DELETE CASCADE,
    article_referenced FOREIGN KEY ("protocolitemItemID")
        REFERENCES items("itemID") ON UPDATE NO ACTION ON DELETE CASCADE
So it's got a primary key and two foreign keys. I indexed the two foreign keys
so that they can be queried more quickly, and I also defined a unique
constraint because I cannot have duplicate pairs of foreign keys.
That's actually my first question: since the unique constraint already
generates an index on the two foreign key columns, do I really need the two
single-column indexes?
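In case it's useful, here is the kind of test I was planning in order to answer
that myself (index names as in the definition above; I haven't drawn any
conclusions from it yet):

  -- Can the composite unique index cover lookups on its leading column?
  drop index protocol_index;
  explain analyze select * from protocolitems
   where "protocolitemProtocolID" = 1;
  -- Recreate it if the plan gets worse:
  create index protocol_index on protocolitems ("protocolitemProtocolID");

My understanding is that item_index is a different case, since
"protocolitemItemID" is only the second column of protocol_item_pair, but I may
be wrong about that.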
Now the problem I have is that, although this table is not huge, it's still
quite large, and operations on a couple of thousand records take quite some
time. Roughly speaking, an update/delete of a single record takes about 1 to
1.5 seconds. Sometimes I have to operate on 4000 records in one go, which means
between 4000 and 6000 seconds. Since the application has a web interface, you
can imagine I have a timeout problem on my hands. I can access the server
configuration and raise the timeout limits, but raising them to that extent
(1.5 hours!) doesn't seem like a good solution.
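As a stopgap I've been wondering whether grouping the operations into fewer
statements would help, e.g. one delete for a whole batch instead of 4000
single-record ones (the ID values here are just made up for illustration):

  begin;
  delete from protocolitems
   where "protocolitemProtocolID" = 1
     and "protocolitemItemID" in (101, 102, 103);  -- ...etc. for the batch
  commit;

But even so, the per-record cost above seems far too high, which is why I'm
digging into the plans below.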
I tried playing around with the indexes and got surprising results from the
query planner. Here goes. First, an extract of the table:
blocop2=# select * from protocolitems where "protocolitemItemID" = 101;
 protocolitemID | protocolitemProtocolID | protocolitemItemID | protocolitemQuantity
----------------+------------------------+--------------------+----------------------
         247501 |                      1 |                101 |                    0
         247502 |                      2 |                101 |                    0
         247503 |                      3 |                101 |                    0
         247504 |                      4 |                101 |                    0
         247505 |                      5 |                101 |                    0
 ...
(30 rows)
Then I ran explain analyze on a delete of the first record in the extract
above, once via the primary key and once via the foreign key pair:
blocop2=# explain analyze delete from protocolitems where "protocolitemID" = 247501;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on protocolitems  (cost=0.00..4971.75 rows=1 width=6) (actual time=1038.13..1055.69 rows=1 loops=1)
   Filter: ("protocolitemID" = 247501)
 Total runtime: 1056.07 msec
(3 rows)

blocop2=# explain analyze delete from protocolitems where "protocolitemProtocolID" = 1 and "protocolitemItemID" = 101;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Seq Scan on protocolitems  (cost=0.00..5595.50 rows=3 width=6) (actual time=1048.63..1048.63 rows=0 loops=1)
   Filter: (("protocolitemProtocolID" = 1) AND ("protocolitemItemID" = 101))
 Total runtime: 1048.84 msec
(3 rows)
My first surprise is that the cost and time are about the same for both
queries; I would have thought that filtering on two columns would be more
costly than filtering on one. Second surprise: why isn't the index used?
Third question: does the query planner take into account the cost of updating
the indexes after the delete?
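One thing I'm not certain of is whether my planner statistics are up to date;
before testing further I intend to refresh them with:

  vacuum analyze protocolitems;

in case stale statistics are what's pushing the planner towards sequential
scans.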
I ran the same two operations again while dropping the indexes one after the
other. I'll spare you the details: since the indexes weren't being used anyway,
the results didn't change.
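For reference, the drops were along these lines, recreating each index before
dropping the next and re-running both explain analyze statements each time:

  drop index item_index;
  -- re-run the two deletes above, then restore:
  create index item_index on protocolitems ("protocolitemItemID");
  drop index protocol_index;
  create index protocol_index on protocolitems ("protocolitemProtocolID");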
I then recreated the indexes and ran the queries once more after setting
enable_seqscan to off:
blocop2=# set enable_seqscan to off;
SET
blocop2=# explain analyze delete from protocolitems where "protocolitemID" = 247501;
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on protocolitems  (cost=100000000.00..100004953.74 rows=1 width=6) (actual time=1210.40..1210.40 rows=0 loops=1)
   Filter: ("protocolitemID" = 247501)
 Total runtime: 1210.71 msec
(3 rows)

blocop2=# explain analyze delete from protocolitems where "protocolitemProtocolID" = 1 and "protocolitemItemID" = 101;
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Seq Scan on protocolitems  (cost=100000000.00..100005577.48 rows=1 width=6) (actual time=1272.72..1272.72 rows=0 loops=1)
   Filter: (("protocolitemProtocolID" = 1) AND ("protocolitemItemID" = 101))
 Total runtime: 1273.00 msec
(3 rows)
Now, although the estimated cost does change (the planner adds a huge penalty
to the sequential scan), it still isn't using the indexes. Why?
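One more variation I still mean to try, in case the literal's type is the
problem: I've seen it suggested that comparing a bigint column against a bare
integer literal can prevent index use on some versions, and that casting or
quoting the literal is the workaround:

  explain analyze delete from protocolitems
   where "protocolitemID" = 247501::bigint;
  -- or, equivalently:
  explain analyze delete from protocolitems
   where "protocolitemID" = '247501';

I don't know yet whether that applies here.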
Any help would be appreciated.
Cheers.
--------
Francois
Home page: http://www.monpetitcoin.com/
"We waste our time spending money we don't have to buy things we don't need
to impress people we don't like"