Re: [PERFORM] Slow TSearch2 performance for table with 1 million documents. - Mailing list pgsql-general

From Benjamin Arai
Subject Re: [PERFORM] Slow TSearch2 performance for table with 1 million documents.
Date
Msg-id 3E8290A0-05DA-4F0F-A783-859124278D27@benjaminarai.com
Whole thread Raw
In response to Re: [PERFORM] Slow TSearch2 performance for table with 1 million documents.  (Benjamin Arai <benjamin@araisoft.com>)
Responses Re: [PERFORM] Slow TSearch2 performance for table with 1 million documents.
List pgsql-general
It appears that the ORDER BY rank operation is the slowing factor.
If I remove it then the query is pretty fast.  Is there another way
to perform ORDER BY such that it does not do a sort?

Benjamin

On Oct 5, 2007, at 3:57 PM, Benjamin Arai wrote:

>
> On Oct 5, 2007, at 8:32 AM, Oleg Bartunov wrote:
>
>> On Fri, 5 Oct 2007, Tom Lane wrote:
>>
>>> Benjamin Arai <benjamin@araisoft.com> writes:
>>>> # explain analyze select * FROM fulltext_article, to_tsquery
>>>> ('simple','dog') AS q  WHERE idxfti @@ q ORDER BY rank(idxfti,
>>>> q) DESC;
>>>
>>>>    QUERY PLAN
>>>> -------------------------------------------------------------------
>>>> -----
>>>> -------------------------------------------------------------------
>>>> -----
>>>> ------------
>>>> Sort  (cost=6576.74..6579.07 rows=933 width=774) (actual
>>>> time=12969.237..12970.490 rows=5119 loops=1)
>>>>     Sort Key: rank(fulltext_article.idxfti, q.q)
>>>>     ->  Nested Loop  (cost=3069.79..6530.71 rows=933 width=774)
>>>> (actual time=209.513..12955.498 rows=5119 loops=1)
>>>>           ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=32)
>>>> (actual time=0.005..0.006 rows=1 loops=1)
>>>>           ->  Bitmap Heap Scan on fulltext_article
>>>> (cost=3069.79..6516.70 rows=933 width=742) (actual
>>>> time=209.322..234.390 rows=5119 loops=1)
>>>>                 Recheck Cond: (fulltext_article.idxfti @@ q.q)
>>>>                 ->  Bitmap Index Scan on
>>>> fulltext_article_idxfti_idx
>>>> (cost=0.00..3069.56 rows=933 width=0) (actual time=208.373..208.373
>>>> rows=5119 loops=1)
>>>>                       Index Cond: (fulltext_article.idxfti @@ q.q)
>>>> Total runtime: 12973.035 ms
>>>> (9 rows)
>>>
>>> The time seems all spent at the join step, which is odd because it
>>> really hasn't got much to do.  AFAICS all it has to do is compute
>>> the
>>> rank() values that the sort step will use.  Is it possible that
>>> rank() is really slow?
>>
>> can you try rank_cd() instead ?
>>
> Using Rank:
>
> -# ('simple','dog') AS q  WHERE idxfti @@ q ORDER BY rank(idxfti,
> q) DESC;
>
>    QUERY PLAN
> ----------------------------------------------------------------------
> ----------------------------------------------------------------------
> ----------------
> Sort  (cost=6576.74..6579.07 rows=933 width=774) (actual
> time=98083.081..98084.351 rows=5119 loops=1)
>    Sort Key: rank(fulltext_article.idxfti, q.q)
>    ->  Nested Loop  (cost=3069.79..6530.71 rows=933 width=774)
> (actual time=479.122..98067.594 rows=5119 loops=1)
>          ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=32)
> (actual time=0.003..0.004 rows=1 loops=1)
>          ->  Bitmap Heap Scan on fulltext_article
> (cost=3069.79..6516.70 rows=933 width=742) (actual
> time=341.739..37112.110 rows=5119 loops=1)
>                Recheck Cond: (fulltext_article.idxfti @@ q.q)
>                ->  Bitmap Index Scan on
> fulltext_article_idxfti_idx  (cost=0.00..3069.56 rows=933 width=0)
> (actual time=321.443..321.443 rows=5119 loops=1)
>                      Index Cond: (fulltext_article.idxfti @@ q.q)
> Total runtime: 98087.575 ms
> (9 rows)
>
> Using Rank_cd:
>
> # explain analyze select * FROM fulltext_article, to_tsquery
> ('simple','cat') AS q  WHERE idxfti @@ q ORDER BY rank_cd(idxfti,
> q) DESC;
>
>    QUERY PLAN
> ----------------------------------------------------------------------
> ----------------------------------------------------------------------
> -----------------
> Sort  (cost=6576.74..6579.07 rows=933 width=774) (actual
> time=199316.648..199324.631 rows=26054 loops=1)
>    Sort Key: rank_cd(fulltext_article.idxfti, q.q)
>    ->  Nested Loop  (cost=3069.79..6530.71 rows=933 width=774)
> (actual time=871.428..199244.330 rows=26054 loops=1)
>          ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=32)
> (actual time=0.006..0.007 rows=1 loops=1)
>          ->  Bitmap Heap Scan on fulltext_article
> (cost=3069.79..6516.70 rows=933 width=742) (actual
> time=850.674..50146.477 rows=26054 loops=1)
>                Recheck Cond: (fulltext_article.idxfti @@ q.q)
>                ->  Bitmap Index Scan on
> fulltext_article_idxfti_idx  (cost=0.00..3069.56 rows=933 width=0)
> (actual time=838.120..838.120 rows=26054 loops=1)
>                      Index Cond: (fulltext_article.idxfti @@ q.q)
> Total runtime: 199338.297 ms
> (9 rows)
>
>>
>>>
>>>             regards, tom lane
>>>
>>> ---------------------------(end of
>>> broadcast)---------------------------
>>> TIP 5: don't forget to increase your free space map settings
>>>
>>
>>     Regards,
>>         Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet
>> (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>>
>


pgsql-general by date:

Previous
From: Karsten Hilbert
Date:
Subject: Re: XMIN semantic at peril ?
Next
From: "Filip Rembiałkowski"
Date:
Subject: Re: XMIN semantic at peril ?