Thread: Re: [HACKERS] slow count() was: tsearch2 poor performance

Re: [HACKERS] slow count() was: tsearch2 poor performance

From

"Magnus Hagander"

Date:

04 October 2004, 00:44:10

>> Hey all, its me again.  If I do not do a count(product_id) on my
>> tsearch2 queries, its actually really fast, for example;
>>
>
>Hmm, I also really want to know  what's the difference ?
>Postgresql 8.0beta3 on Linux 2.4.25
>
>tsearchd=# explain analyze select body from txt where
>fts_index @@ to_tsquery('oil') limit 1000;
>                                                           QUERY PLAN
>---------------------------------------------------------------
>-----------------------------------------------------------------
> Limit  (cost=0.00..4027.67 rows=1000 width=315) (actual
>time=0.053..14.662 rows=1000 loops=1)
>   ->  Index Scan using fts_idx on txt  (cost=0.00..12083.02
>rows=3000 width=315) (actual time=0.049..12.552 rows=1000 loops=1)
>         Index Cond: (fts_index @@ '\'oil\''::tsquery)
> Total runtime: 15.848 ms
>(4 rows)
>
>tsearchd=# explain analyze select count(body) from txt where
>fts_index @@ to_tsquery('oil') limit 1000;
>
>Didn't get result after 10 minutes :(
>

I think you're missing what LIMIT does.

In the first query, it LIMITs the return from the index scan to 1000
entries.
In the second query, it LIMITs the return from the aggregate to 1000
entries. The indexscan will include all matches, send then to count(),
which returns 1 row only, which LIMIT is then applied to.

You could probably reach the same result with a subselect:
select count(*) FROM (select body from txt where fts_index @@
to_tsquery('oil') limit 1000)


At least that's how I think LIMIT works. That would certainly explain
the major time difference.

//Magnus

Re: [HACKERS] slow count() was: tsearch2 poor performance

From

Oleg Bartunov

Date:

04 October 2004, 00:52:18

Magnus


On Sun, 3 Oct 2004, Magnus Hagander wrote:

> >> Hey all, its me again.  If I do not do a count(product_id) on my
> >> tsearch2 queries, its actually really fast, for example;
> >>
> >
> >Hmm, I also really want to know  what's the difference ?
> >Postgresql 8.0beta3 on Linux 2.4.25
> >
> >tsearchd=# explain analyze select body from txt where
> >fts_index @@ to_tsquery('oil') limit 1000;
> >                                                           QUERY PLAN
> >---------------------------------------------------------------
> >-----------------------------------------------------------------
> > Limit  (cost=0.00..4027.67 rows=1000 width=315) (actual
> >time=0.053..14.662 rows=1000 loops=1)
> >   ->  Index Scan using fts_idx on txt  (cost=0.00..12083.02
> >rows=3000 width=315) (actual time=0.049..12.552 rows=1000 loops=1)
> >         Index Cond: (fts_index @@ '\'oil\''::tsquery)
> > Total runtime: 15.848 ms
> >(4 rows)
> >
> >tsearchd=# explain analyze select count(body) from txt where
> >fts_index @@ to_tsquery('oil') limit 1000;
> >
> >Didn't get result after 10 minutes :(
> >
>
> I think you're missing what LIMIT does.
>
> In the first query, it LIMITs the return from the index scan to 1000
> entries.
> In the second query, it LIMITs the return from the aggregate to 1000
> entries. The indexscan will include all matches, send then to count(),
> which returns 1 row only, which LIMIT is then applied to.
>

Thanks for explanation. I suspect this, so LIMIT is still hack


> You could probably reach the same result with a subselect:
> select count(*) FROM (select body from txt where fts_index @@
> to_tsquery('oil') limit 1000)
>

yes, timings now are equal !

>
> At least that's how I think LIMIT works. That would certainly explain
> the major time difference.
>
> //Magnus
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83