Greetings,
I am trying to test out the performance of the contrib/fulltextindex
package and I am getting horrid performance results.
The Setup:
I created a simple table, ST (id SERIAL, body varchar(1024), which is to be
searched. I created the ST_FTI table, trigger and indices as per
instructions in the FTI readme and C file. To populate the table, I took a
flat text version of 'War and Peace' I found on the net, broke it up into
sentences and inserted each sentence into ST as a row. So I have about
38,000 sentences and my ST_FTI table is about 2 million rows.
The Test:
There is exactly one sentence (row) that has the strings 'Newton' and
'Kepler' in it. That is my target. For a straight select on ST:
select * from st where body ~* 'newton' and body ~* 'kepler';
the cost is 1100.41
BUT for an query using the FTI indices:
select s.* from st s, st_fti f1, st_fti f2 where f1.string
~ '^kepler' and f2.string ~ '^newton' and s.oid = f1.id
and s.oid = f2.id;
the cost becomes a staggering 80628.92!!! The plans are pasted at the end
of this message.
Now, I have all the indices created (on id of st_fti, on string of st_fti
and on oid of st). I cannot figure out why this is so much worse than the
straight query. Indeed, the cost to look up a single string in the st_fti
table is way high:
select * from st_fti where string ~ '^kepler';
costs 36703.40, AND its doing a Seq Scan on st_fti, even though an index
exists.
What am I doing wrong? Is it the sheer size of the st_fti table that is
causing problems? Any help would be greatly appreciated.
Thanks,
Paul C.
FTI search
NOTICE: QUERY PLAN:
Merge Join (cost=80046.91..80628.92 rows=110 width=28)
-> Sort (cost=41827.54..41827.54 rows=19400 width=24)
-> Hash Join (cost=1992.80..40216.39 rows=19400 width=24)
-> Seq Scan on st_fti f2 (cost=0.00..36703.40 rows=19400
width=4)
-> Hash (cost=929.94..929.94 rows=34094 width=20)
-> Seq Scan on st s (cost=0.00..929.94 rows=34094
width=20)
-> Sort (cost=38219.37..38219.37 rows=19400 width=4)
-> Seq Scan on st_fti f1 (cost=0.00..36703.40 rows=19400 width=4)
EXPLAIN
Plain search:
NOTICE: QUERY PLAN:
Seq Scan on st (cost=0.00..1100.41 rows=1 width=16)
EXPLAIN
_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp