FTI is really really slow; what am I doing wrong? - Mailing list pgsql-general

From Paul C.
Subject FTI is really really slow; what am I doing wrong?
Date
Msg-id F104VWDKpzxDPHprp5y00014a94@hotmail.com
Whole thread Raw
Responses Re: FTI is really really slow; what am I doing wrong?  (newsreader@mediaone.net)
Re: FTI is really really slow; what am I doing wrong?  (Bruno Wolff III <bruno@wolff.to>)
Re: FTI is really really slow; what am I doing wrong?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Greetings,
  I am trying to test out the performance of the contrib/fulltextindex
package and I am getting horrid performance results.
The Setup:
I created a simple table, ST (id SERIAL, body varchar(1024), which is to be
searched.  I created the ST_FTI table, trigger and indices as per
instructions in the FTI readme and C file.  To populate the table, I took a
flat text version of 'War and Peace' I found on the net, broke it up into
sentences and inserted each sentence into ST as a row.  So I have about
38,000 sentences and my ST_FTI table is about 2 million rows.
The Test:
There is exactly one sentence (row) that has the strings 'Newton' and
'Kepler' in it.  That is my target.  For a straight select on ST:
  select * from st where body ~* 'newton' and body ~* 'kepler';
the cost is 1100.41
BUT for an query using the FTI indices:
  select s.* from st s, st_fti f1, st_fti f2 where f1.string
    ~ '^kepler' and f2.string ~ '^newton' and s.oid = f1.id
    and s.oid = f2.id;
the cost becomes a staggering 80628.92!!!  The plans are pasted at the end
of this message.
Now, I have all the indices created (on id of st_fti, on string of st_fti
and on oid of st).  I cannot figure out why this is so much worse than the
straight query.  Indeed, the cost to look up a single string in the st_fti
table is way high:
  select * from st_fti where string ~ '^kepler';
costs 36703.40, AND its doing a Seq Scan on st_fti, even though an index
exists.
What am I doing wrong?  Is it the sheer size of the st_fti table that is
causing problems?  Any help would be greatly appreciated.
Thanks,
Paul C.

FTI search
NOTICE:  QUERY PLAN:
Merge Join  (cost=80046.91..80628.92 rows=110 width=28)
  ->  Sort  (cost=41827.54..41827.54 rows=19400 width=24)
        ->  Hash Join  (cost=1992.80..40216.39 rows=19400 width=24)
              ->  Seq Scan on st_fti f2  (cost=0.00..36703.40 rows=19400
width=4)
              ->  Hash  (cost=929.94..929.94 rows=34094 width=20)
                    ->  Seq Scan on st s  (cost=0.00..929.94 rows=34094
width=20)
  ->  Sort  (cost=38219.37..38219.37 rows=19400 width=4)
        ->  Seq Scan on st_fti f1  (cost=0.00..36703.40 rows=19400 width=4)
EXPLAIN

Plain search:
NOTICE:  QUERY PLAN:
Seq Scan on st  (cost=0.00..1100.41 rows=1 width=16)
EXPLAIN


_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp


pgsql-general by date:

Previous
From: Andrew Gould
Date:
Subject: During dump: function not found
Next
From: Peter Eisentraut
Date:
Subject: Re: add, subtract bool type