Re: Tsearch2 really slower than ilike ? - Mailing list pgsql-performance

From Oleg Bartunov
Subject Re: Tsearch2 really slower than ilike ?
Date
Msg-id Pine.GSO.4.61.0411172017120.18871@ra.sai.msu.su
Whole thread Raw
In response to Re: Tsearch2 really slower than ilike ?  (Hervé Piedvache <herve@elma.fr>)
Responses Re: Tsearch2 really slower than ilike ?
List pgsql-performance
  This message is in MIME format.  The first part should be readable text,
  while the remaining parts are likely unreadable without MIME-aware tools.

---559023410-1817792895-1100712215=:18871
Content-Type: TEXT/PLAIN; charset=koi8-r; format=flowed
Content-Transfer-Encoding: 8BIT

1;2c1;2c1;2cOn Wed, 17 Nov 2004, [iso-8859-15] Herv? Piedvache wrote:

> Oleg,
>
> Sorry but when I do your request I get :
> # select id_site from site where idx_site_name @@  'livejourn';
> ERROR:  type " " does not exist
>

no idea :) btw, what version of postgresql and OS you're running.
Could you try minimal test - check sql commands from tsearch2 sources,
some basic queries from tsearch2 documentation, tutorials.

btw, your query should looks like
select id_site from site_rss where idx_site_name @@ 'livejourn';
                     ^^^^^^^^

How did you run your queries at all ? I mean your first message about
poor tsearch2 performance.

1;2c1;2c1;2c

> What is this ?
>
> (private: I don't know what happend with my mail, but I do nothing special to
> disturb the contains when I'm writting to you ...)
>
> Le Mardi 16 Novembre 2004 22:13, Oleg Bartunov a ?crit :
>> ok, I downloaded dump of table and here is what I found:
>>
>> zz=# select count(*) from tt;
>>   count
>> --------
>>   183956
>> (1 row)
>>
>> zz=# select * from stat('select tt from tt') order by ndoc desc, nentry
>> desc,wo
>> rd limit 10;
>>       word     | ndoc  | nentry
>> --------------+-------+--------
>>   blog         | 12710 |  12835
>>   weblog       |  4857 |   4859
>>   news         |  4402 |   4594
>>   life         |  4136 |   4160
>>   world        |  1980 |   1986
>>   journal      |  1882 |   1883
>>   livejourn    |  1737 |   1737
>>   thought      |  1669 |   1677
>>   web          |  1154 |   1161
>>   scotsman.com |  1138 |   1138
>> (10 rows)
>>
>> zz=# explain analyze select tt from tt where tt @@  'blog';
>>                                                        QUERY PLAN
>> ---------------------------------------------------------------------------
>> ------------------------------------------- Index Scan using tt_idx on tt
>> (cost=0.00..728.83 rows=184 width=32) (actual time=0.047..141.110
>> rows=12710 loops=1) Index Cond: (tt @@ '\'blog\''::tsquery)
>>     Filter: (tt @@ '\'blog\''::tsquery)
>>   Total runtime: 154.105 ms
>> (4 rows)
>>
>> It's really fast ! So, I don't understand your problem.
>> I run query on my desktop machine, nothing special.
>>
>>
>>      Oleg
>>
>> On Tue, 16 Nov 2004, [iso-8859-15] Herv? Piedvache wrote:
>>> Hi,
>>>
>>> I'm completly dispointed with Tsearch2 ...
>>>
>>> I have a table like this :
>>>                                          Table "public.site"
>>>    Column     |            Type             |
>>> Modifiers
>>> ---------------+-----------------------------+---------------------------
>>> ------------------------------------ id_site       | integer
>>>       | not null default
>>> nextval('public.site_id_site_seq'::text)
>>> site_name     | text                        |
>>> site_url      | text                        |
>>> url            | text                        |
>>> language      | text                        |
>>> datecrea      | date                        | default now()
>>> id_category   | integer                     |
>>> time_refresh  | integer                     |
>>> active        | integer                     |
>>> error         | integer                     |
>>> description   | text                        |
>>> version       | text                        |
>>> idx_site_name | tsvector                    |
>>> lastcheck     | date                        |
>>> lastupdate    | timestamp without time zone |
>>> Indexes:
>>>    "site_id_site_key" unique, btree (id_site)
>>>    "ix_idx_site_name" gist (idx_site_name)
>>> Triggers:
>>>    tsvectorupdate_site_name BEFORE INSERT OR UPDATE ON site FOR EACH ROW
>>> EXECUTE PROCEDURE tsearch2('idx_site_name', 'site_name')
>>>
>>> I have 183 956 records in the database ...
>>>
>>> SELECT s.site_name, s.id_site, s.description, s.site_url,
>>>                case when exists (select id_user
>>>                                                    from user_choice u
>>>                                                 where u.id_site=s.id_site
>>>                                                    and u.id_user = 1)
>>> then 1 else 0 end as bookmarked
>>>   FROM site s
>>> WHERE s.idx_site_name @@ to_tsquery('atari');
>>>
>>> Explain Analyze :
>>>                                                               QUERY PLAN
>>> -------------------------------------------------------------------------
>>> ----------------------------------------------------------------- Index
>>> Scan using ix_idx_site_name on site s  (cost=0.00..1202.12 rows=184
>>> width=158) (actual time=4687.674..4698.422 rows=1 loops=1)
>>>   Index Cond: (idx_site_name @@ '\'atari\''::tsquery)
>>>   Filter: (idx_site_name @@ '\'atari\''::tsquery)
>>>   SubPlan
>>>     ->  Seq Scan on user_choice u  (cost=0.00..3.46 rows=1 width=4)
>>> (actual time=0.232..0.232 rows=0 loops=1)
>>>           Filter: ((id_site = $0) AND (id_user = 1))
>>> Total runtime: 4698.608 ms
>>>
>>> First time I run the request I have a result in about 28 seconds.
>>>
>>> SELECT s.site_name, s.id_site, s.description, s.site_url,
>>>                case when exists (select id_user
>>>                                                    from user_choice u
>>>                                                 where u.id_site=s.id_site
>>>                                                    and u.id_user = 1)
>>> then 1 else 0 end as bookmarked
>>>   FROM site_rss s
>>> WHERE s.site_name ilike '%atari%'
>>>
>>>                                                   QUERY PLAN
>>> -------------------------------------------------------------------------
>>> --------------------------------------- Seq Scan on site_rss s
>>> (cost=0.00..11863.16 rows=295 width=158) (actual time=17.414..791.937
>>> rows=12 loops=1)
>>>   Filter: (site_name ~~* '%atari%'::text)
>>>   SubPlan
>>>     ->  Seq Scan on user_choice u  (cost=0.00..3.46 rows=1 width=4)
>>> (actual time=0.222..0.222 rows=0 loops=12)
>>>           Filter: ((id_site = $0) AND (id_user = 1))
>>> Total runtime: 792.099 ms
>>>
>>> First time I run the request I have a result in about 789 miliseconds
>>> !!???
>>>
>>> I'm using PostgreSQL v7.4.6 with a Bi-Penitum III 933 Mhz and 1 Gb of
>>> RAM.
>>>
>>> Any idea ... ? For the moment I'm going back to use the ilike solution
>>> ... but I was really thinking that Tsearch2 could be a better solution
>>> ...
>>>
>>> Regards,
>>
>>      Regards,
>>          Oleg
>> _____________________________________________________________
>> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
>> Sternberg Astronomical Institute, Moscow University (Russia)
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(095)939-16-83, +007(095)939-23-83
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 9: the planner will ignore your desire to choose an index scan if your
>>       joining column's datatypes do not match
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
---559023410-1817792895-1100712215=:18871--

pgsql-performance by date:

Previous
From: Hervé Piedvache
Date:
Subject: Re: Tsearch2 really slower than ilike ?
Next
From: "Matthew T. O'Connor"
Date:
Subject: Re: query plan question