Thread: FTI is really really slow; what am I doing wrong?

FTI is really really slow; what am I doing wrong?

From
"Paul C."
Date:
Greetings,
  I am trying to test out the performance of the contrib/fulltextindex
package and I am getting horrid performance results.
The Setup:
I created a simple table, ST (id SERIAL, body varchar(1024), which is to be
searched.  I created the ST_FTI table, trigger and indices as per
instructions in the FTI readme and C file.  To populate the table, I took a
flat text version of 'War and Peace' I found on the net, broke it up into
sentences and inserted each sentence into ST as a row.  So I have about
38,000 sentences and my ST_FTI table is about 2 million rows.
The Test:
There is exactly one sentence (row) that has the strings 'Newton' and
'Kepler' in it.  That is my target.  For a straight select on ST:
  select * from st where body ~* 'newton' and body ~* 'kepler';
the cost is 1100.41
BUT for an query using the FTI indices:
  select s.* from st s, st_fti f1, st_fti f2 where f1.string
    ~ '^kepler' and f2.string ~ '^newton' and s.oid = f1.id
    and s.oid = f2.id;
the cost becomes a staggering 80628.92!!!  The plans are pasted at the end
of this message.
Now, I have all the indices created (on id of st_fti, on string of st_fti
and on oid of st).  I cannot figure out why this is so much worse than the
straight query.  Indeed, the cost to look up a single string in the st_fti
table is way high:
  select * from st_fti where string ~ '^kepler';
costs 36703.40, AND its doing a Seq Scan on st_fti, even though an index
exists.
What am I doing wrong?  Is it the sheer size of the st_fti table that is
causing problems?  Any help would be greatly appreciated.
Thanks,
Paul C.

FTI search
NOTICE:  QUERY PLAN:
Merge Join  (cost=80046.91..80628.92 rows=110 width=28)
  ->  Sort  (cost=41827.54..41827.54 rows=19400 width=24)
        ->  Hash Join  (cost=1992.80..40216.39 rows=19400 width=24)
              ->  Seq Scan on st_fti f2  (cost=0.00..36703.40 rows=19400
width=4)
              ->  Hash  (cost=929.94..929.94 rows=34094 width=20)
                    ->  Seq Scan on st s  (cost=0.00..929.94 rows=34094
width=20)
  ->  Sort  (cost=38219.37..38219.37 rows=19400 width=4)
        ->  Seq Scan on st_fti f1  (cost=0.00..36703.40 rows=19400 width=4)
EXPLAIN

Plain search:
NOTICE:  QUERY PLAN:
Seq Scan on st  (cost=0.00..1100.41 rows=1 width=16)
EXPLAIN


_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp


Re: FTI is really really slow; what am I doing wrong?

From
"Mitch Vincent"
Date:
You've vacuum analyze 'd the database, haven't you?

-Mitch

> There is exactly one sentence (row) that has the strings 'Newton' and
> 'Kepler' in it.  That is my target.  For a straight select on ST:
>   select * from st where body ~* 'newton' and body ~* 'kepler';
> the cost is 1100.41
> BUT for an query using the FTI indices:
>   select s.* from st s, st_fti f1, st_fti f2 where f1.string
>     ~ '^kepler' and f2.string ~ '^newton' and s.oid = f1.id
>     and s.oid = f2.id;
> the cost becomes a staggering 80628.92!!!  The plans are pasted at the end
> of this message.
> Now, I have all the indices created (on id of st_fti, on string of st_fti
> and on oid of st).  I cannot figure out why this is so much worse than the
> straight query.  Indeed, the cost to look up a single string in the st_fti
> table is way high:
>   select * from st_fti where string ~ '^kepler';
> costs 36703.40, AND its doing a Seq Scan on st_fti, even though an index
> exists.
> What am I doing wrong?  Is it the sheer size of the st_fti table that is
> causing problems?  Any help would be greatly appreciated.
> Thanks,



Re: FTI is really really slow; what am I doing wrong?

From
newsreader@mediaone.net
Date:
Did you vacuum after
populating the tables?
If not you should do it


On Wed, Aug 22, 2001 at 11:08:55AM -0400, Paul C. wrote:
> Greetings,
>   I am trying to test out the performance of the contrib/fulltextindex
> package and I am getting horrid performance results.
> The Setup:
> I created a simple table, ST (id SERIAL, body varchar(1024), which is to be
> searched.  I created the ST_FTI table, trigger and indices as per
> instructions in the FTI readme and C file.  To populate the table, I took a
> flat text version of 'War and Peace' I found on the net, broke it up into
> sentences and inserted each sentence into ST as a row.  So I have about
> 38,000 sentences and my ST_FTI table is about 2 million rows.
> The Test:
> There is exactly one sentence (row) that has the strings 'Newton' and
> 'Kepler' in it.  That is my target.  For a straight select on ST:
>   select * from st where body ~* 'newton' and body ~* 'kepler';
> the cost is 1100.41
> BUT for an query using the FTI indices:
>   select s.* from st s, st_fti f1, st_fti f2 where f1.string
>     ~ '^kepler' and f2.string ~ '^newton' and s.oid = f1.id
>     and s.oid = f2.id;
> the cost becomes a staggering 80628.92!!!  The plans are pasted at the end
> of this message.
> Now, I have all the indices created (on id of st_fti, on string of st_fti
> and on oid of st).  I cannot figure out why this is so much worse than the
> straight query.  Indeed, the cost to look up a single string in the st_fti
> table is way high:
>   select * from st_fti where string ~ '^kepler';
> costs 36703.40, AND its doing a Seq Scan on st_fti, even though an index
> exists.
> What am I doing wrong?  Is it the sheer size of the st_fti table that is
> causing problems?  Any help would be greatly appreciated.
> Thanks,
> Paul C.
>
> FTI search
> NOTICE:  QUERY PLAN:
> Merge Join  (cost=80046.91..80628.92 rows=110 width=28)
>   ->  Sort  (cost=41827.54..41827.54 rows=19400 width=24)
>         ->  Hash Join  (cost=1992.80..40216.39 rows=19400 width=24)
>               ->  Seq Scan on st_fti f2  (cost=0.00..36703.40 rows=19400
> width=4)
>               ->  Hash  (cost=929.94..929.94 rows=34094 width=20)
>                     ->  Seq Scan on st s  (cost=0.00..929.94 rows=34094
> width=20)
>   ->  Sort  (cost=38219.37..38219.37 rows=19400 width=4)
>         ->  Seq Scan on st_fti f1  (cost=0.00..36703.40 rows=19400 width=4)
> EXPLAIN
>
> Plain search:
> NOTICE:  QUERY PLAN:
> Seq Scan on st  (cost=0.00..1100.41 rows=1 width=16)
> EXPLAIN
>
>
> _________________________________________________________________
> Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Re: FTI is really really slow; what am I doing wrong?

From
"Paul C."
Date:
Meant to send this to the whole list...

I had vaccum'ed it, but not 'vacuum analyze' -ed it.  That did the trick.
Cost down to 12.09 from 80628.92.

Thank you!


>From: "Mitch Vincent" <mvincent@cablespeed.com>
>To: "Paul C." <ulive1x@hotmail.com>, <pgsql-general@postgresql.org>
>Subject: Re: [GENERAL] FTI is really really slow; what am I doing wrong?
>Date: Wed, 22 Aug 2001 11:28:04 -0400
>
>You've vacuum analyze 'd the database, haven't you?
>
>-Mitch
>
> > There is exactly one sentence (row) that has the strings 'Newton' and
> > 'Kepler' in it.  That is my target.  For a straight select on ST:
> >   select * from st where body ~* 'newton' and body ~* 'kepler';
> > the cost is 1100.41
> > BUT for an query using the FTI indices:
> >   select s.* from st s, st_fti f1, st_fti f2 where f1.string
> >     ~ '^kepler' and f2.string ~ '^newton' and s.oid = f1.id
> >     and s.oid = f2.id;
> > the cost becomes a staggering 80628.92!!!  The plans are pasted at the
>end
> > of this message.
> > Now, I have all the indices created (on id of st_fti, on string of
>st_fti
> > and on oid of st).  I cannot figure out why this is so much worse than
>the
> > straight query.  Indeed, the cost to look up a single string in the
>st_fti
> > table is way high:
> >   select * from st_fti where string ~ '^kepler';
> > costs 36703.40, AND its doing a Seq Scan on st_fti, even though an index
> > exists.
> > What am I doing wrong?  Is it the sheer size of the st_fti table that is
> > causing problems?  Any help would be greatly appreciated.
> > Thanks,
>
>


_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com/intl.asp


Re: FTI is really really slow; what am I doing wrong?

From
Bruno Wolff III
Date:
I think the problem is that the indexes on the words are not usable for
regular expression matching. If the words are folded to lower case
when the index is built, then using an exact match (or even like) should
be much faster.

Re: FTI is really really slow; what am I doing wrong?

From
Tom Lane
Date:
"Paul C." <ulive1x@hotmail.com> writes:
> Indeed, the cost to look up a single string in the st_fti
> table is way high:
>   select * from st_fti where string ~ '^kepler';
> costs 36703.40, AND its doing a Seq Scan on st_fti, even though an index
> exists.

Have you done a VACUUM ANALYZE on st_fti?  Are you running the database
in plain C locale?

            regards, tom lane