Suggestions on finetuning this search? - Mailing list pgsql-sql

From Justin Long
Subject Suggestions on finetuning this search?
Date
Msg-id PBEMJIMKFNACLPCNPDEBIEODFLAA.justinlong@strategicnetwork.org
Whole thread Raw
List pgsql-sql
I would welcome any suggestions for fine-tuning this search to run faster.


Here is the SQL. Basically what we're allowing people to do is to specify
words to search our article index.

THE TABLES:
knowledge = the knowledge base of articles
kb_categories = the category that each article is assigned to
kbwords = an index of every word in the knowledge base (kbid, wordid)
wordindex = an index of every word in the knowledge base (wordid, word)

AMPLIFICATION:
kbwords = a list of all the words that appear in a specific article (might
return a list of 5 articles where the word 'monk' appears)
wordindex = a unique list of all the words that appear in all the articles
(would only return a single entry for the word 'monk')

THE SQL:
select * from knowledge k, kb_categories c , kbwords w0 , kbwords w1 WHERE
k.catid=c.catid AND ((k.kbid=w0.kbid and w0.wordid=(SELECT wordid from
wordindex where word='BUDDIST')) AND  (k.kbid=w1.kbid and w1.wordid=(SELECT
wordid from wordindex where word='MONK'))) ORDER BY k.regionid , k.ctryid ,
k.catid , k.title ;

WHAT "EXPLAIN" RETURNS:
Sort  (cost=2796577.40..2796577.40 rows=2878549 width=332) InitPlan   ->  Index Scan using wordindex_word on wordindex
(cost=0.00..247.71
rows=376 width=4)   ->  Index Scan using wordindex_word on wordindex  (cost=0.00..247.71
rows=376 width=4) ->  Merge Join  (cost=21187.45..21993.59 rows=2878549 width=332)       ->  Merge Join
(cost=11060.50..11140.94rows=52582 width=324)             ->  Sort  (cost=933.56..933.56 rows=960 width=316)
      ->  Hash Join  (cost=1.43..885.97 rows=960 width=316)                         ->  Seq Scan on knowledge k
(cost=0.00..559.25
rows=2825 width=284)                         ->  Hash  (cost=1.34..1.34 rows=34 width=32)
-> Seq Scan on kb_categories c
 
(cost=0.00..1.34 rows=34 width=32)             ->  Sort  (cost=10126.95..10126.95 rows=5474 width=8)
-> Seq Scan on kbwords w0  (cost=0.00..9787.02
 
rows=5474 width=8)       ->  Sort  (cost=10126.95..10126.95 rows=5474 width=8)             ->  Seq Scan on kbwords w1
(cost=0.00..9787.02rows=5474
 
width=8)

Please e-mail suggestions to justinlong@strategicnetwork.org. Thanks!

To see the code in action, visit
http://www.strategicnetwork.org/index.asp?loc=kb



pgsql-sql by date:

Previous
From: Josh Berkus
Date:
Subject: Re: Data Types
Next
From: Mark Byerley
Date:
Subject: Datetime Query