Re: tsearch2 poor performance - Mailing list pgsql-admin

From Oleg Bartunov
Subject Re: tsearch2 poor performance
Date
Msg-id Pine.GSO.4.58.0409291600370.14980@ra.sai.msu.su
Whole thread Raw
In response to Re: tsearch2 poor performance  (Kris Kiger <kris@musicrebellion.com>)
List pgsql-admin
Kris,

we're working on prototype of tsearchd - full text search daemon, which
maintain static inverted index outside of postgresql using the same
parser, dictionary tsearch2 does.  This approach could scale up
fts capability preserving access to metadata, so yo may have
"archive" part of your collection (tsearchd) and "online", which could be
searchable with tsearch2.

Here is what we have right now:

pages ( tid integer, fts_index  tsvector)

1. Create index
select count(tdindex(tid,fts_index)) from pages;
2. Flush index
select tdflush();
3. Search
select pages.tid, rank(fts_index,to_tsquery('supernovae & magellan')) as rank
from pages, tdsearch(to_tsquery('supernovae & magellan')) as idx where
tid=idx order by rank desc;

If it's possible, you could share your data, so we could test our
prototype on real data.


    Oleg

On Mon, 27 Sep 2004, Kris Kiger wrote:

> Yes, it is much better than no index of sequential scan.  We may just be
> looking at the best performance tsearch2 can offer on my machine.
>
> search_test=#  explain analyze SELECT count(q) FROM product,
> to_tsquery('oil') AS q  WHERE vector @@ q;
>                                                             QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=67847264.50..67847264.50 rows=1 width=32) (actual
> time=83311.552..83311.555 rows=1 loops=1)
>    ->  Nested Loop  (cost=12.50..67839764.50 rows=3000001 width=32)
> (actual time=0.204..81960.198 rows=226357 loops=1)
>          Join Filter: ("outer".vector @@ "inner".q)
>          ->  Seq Scan on product  (cost=0.00..339752.00 rows=3000000
> width=32) (actual time=0.100..27415.795 rows=3000000 loops=1)
>          ->  Materialize  (cost=12.50..22.50 rows=1000 width=32) (actual
> time=0.003..0.006 rows=1 loops=3000000)
>                ->  Function Scan on q  (cost=0.00..12.50 rows=1000
> width=32) (actual time=0.020..0.024 rows=1 loops=1)
>  Total runtime: 83311.735 ms
> (7 rows)
>
> search_test=# explain analyze select count(*) from product where
> description like '% oil %';
>                                                        QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=347264.01..347264.01 rows=1 width=0) (actual
> time=39858.350..39858.353 rows=1 loops=1)
>    ->  Seq Scan on product  (cost=0.00..347252.00 rows=4801 width=0)
> (actual time=0.100..38320.293 rows=226357 loops=1)
>          Filter: (description ~~ '% oil %'::text)
>  Total runtime: 39858.491 ms
>
>
> >>Oleg,
> >>
> >>    Thanks for the help on this.
> >>
> >>    The query I used to return the 508 number is:
> >>         SELECT * FROM stat('SELECT vector FROM product') ORDER BY ndoc
> >>desc, word ;
> >>
> >>    Testing says, the more words I use, the faster the query is.  My
> >>original search word, 'oil', appears in 226,357 documents 233,266 times.
> >> As far as distinct words go, 'oil' is middle of the road for
> >>occurences.  As it is set up now, the best search time I am getting on
> >>this single word is roughly 22 seconds.
> >>
> >>
> >
> >Does this time (22 seconds) is still better than seq. scan (no index)
> >or standard 'LIKE' ?
> >
> >
> >
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

pgsql-admin by date:

Previous
From: "Magnus Hagander"
Date:
Subject: Re: How to install plperl on Windows?
Next
From: Kris Kiger
Date:
Subject: Re: tsearch2 poor performance