Home > mailing lists

Re: tsearch2 poor performance - Mailing list pgsql-admin

From	Oleg Bartunov
Subject	Re: tsearch2 poor performance
Date	September 29, 2004 13:12:24
Msg-id	Pine.GSO.4.58.0409291600370.14980@ra.sai.msu.su Whole thread Raw
In response to	Re: tsearch2 poor performance (Kris Kiger <kris@musicrebellion.com>)
List	pgsql-admin

Tree view

Kris,

we're working on prototype of tsearchd - full text search daemon, which
maintain static inverted index outside of postgresql using the same
parser, dictionary tsearch2 does.  This approach could scale up
fts capability preserving access to metadata, so yo may have
"archive" part of your collection (tsearchd) and "online", which could be
searchable with tsearch2.

Here is what we have right now:

pages ( tid integer, fts_index  tsvector)

1. Create index
select count(tdindex(tid,fts_index)) from pages;
2. Flush index
select tdflush();
3. Search
select pages.tid, rank(fts_index,to_tsquery('supernovae & magellan')) as rank
from pages, tdsearch(to_tsquery('supernovae & magellan')) as idx where
tid=idx order by rank desc;

If it's possible, you could share your data, so we could test our
prototype on real data.


    Oleg

On Mon, 27 Sep 2004, Kris Kiger wrote:

> Yes, it is much better than no index of sequential scan.  We may just be
> looking at the best performance tsearch2 can offer on my machine.
>
> search_test=#  explain analyze SELECT count(q) FROM product,
> to_tsquery('oil') AS q  WHERE vector @@ q;
>                                                             QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=67847264.50..67847264.50 rows=1 width=32) (actual
> time=83311.552..83311.555 rows=1 loops=1)
>    ->  Nested Loop  (cost=12.50..67839764.50 rows=3000001 width=32)
> (actual time=0.204..81960.198 rows=226357 loops=1)
>          Join Filter: ("outer".vector @@ "inner".q)
>          ->  Seq Scan on product  (cost=0.00..339752.00 rows=3000000
> width=32) (actual time=0.100..27415.795 rows=3000000 loops=1)
>          ->  Materialize  (cost=12.50..22.50 rows=1000 width=32) (actual
> time=0.003..0.006 rows=1 loops=3000000)
>                ->  Function Scan on q  (cost=0.00..12.50 rows=1000
> width=32) (actual time=0.020..0.024 rows=1 loops=1)
>  Total runtime: 83311.735 ms
> (7 rows)
>
> search_test=# explain analyze select count(*) from product where
> description like '% oil %';
>                                                        QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=347264.01..347264.01 rows=1 width=0) (actual
> time=39858.350..39858.353 rows=1 loops=1)
>    ->  Seq Scan on product  (cost=0.00..347252.00 rows=4801 width=0)
> (actual time=0.100..38320.293 rows=226357 loops=1)
>          Filter: (description ~~ '% oil %'::text)
>  Total runtime: 39858.491 ms
>
>
> >>Oleg,
> >>
> >>    Thanks for the help on this.
> >>
> >>    The query I used to return the 508 number is:
> >>         SELECT * FROM stat('SELECT vector FROM product') ORDER BY ndoc
> >>desc, word ;
> >>
> >>    Testing says, the more words I use, the faster the query is.  My
> >>original search word, 'oil', appears in 226,357 documents 233,266 times.
> >> As far as distinct words go, 'oil' is middle of the road for
> >>occurences.  As it is set up now, the best search time I am getting on
> >>this single word is roughly 22 seconds.
> >>
> >>
> >
> >Does this time (22 seconds) is still better than seq. scan (no index)
> >or standard 'LIKE' ?
> >
> >
> >
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

pgsql-admin by date:

From: "Magnus Hagander"
Date: 29 September 2004, 09:24:56
Subject: Re: How to install plperl on Windows?

From: Kris Kiger
Date: 29 September 2004, 15:34:07
Subject: Re: tsearch2 poor performance

Re: tsearch2 poor performance - Mailing list pgsql-admin

Previous

Next