Re: tsearch2 seem very slow - Mailing list pgsql-performance

From Oleg Bartunov
Subject Re: tsearch2 seem very slow
Msg-id Pine.GSO.4.63.0509260007010.27150@ra.sai.msu.su
In response to Re: tsearch2 seem very slow  ("Ahmad Fajar" <gendowo@konphalindo.or.id>)
Responses Re: tsearch2 seem very slow  ("Ahmad Fajar" <gendowo@konphalindo.or.id>)
List pgsql-performance
Ahmad,

On Mon, 26 Sep 2005, Ahmad Fajar wrote:

> Hi Oleg,
>
>> what kind of garbage? Probably you are indexing token types you don't
>> need, for example email addresses, file names...
>
>> do you need proximity? If not, use the strip(tsvector) function to remove
>> coordinate information from the tsvector.
>
> I need proximity. Sometimes I have to rank my articles and make a chart
> from that.
>
>> don't use the default configuration; index only the tokens you need. For
>> example, to index only 3 types of tokens, first create a 'qq' configuration
>> and specify the tokens to index.
>
>> insert into pg_ts_cfg values('qq','default','en_US');
>> -- tokens to index
>> insert into pg_ts_cfgmap values('qq','lhword','{en_ispell,en_stem}');
>> insert into pg_ts_cfgmap values('qq','lword','{en_ispell,en_stem}');
>> insert into pg_ts_cfgmap values('qq','lpart_hword','{en_ispell,en_stem}');
>
> I still don't understand the tsearch2 configuration, so until now I have
> just used the default configuration. I will try your suggestion. But how
> can I get en_ispell? Will my system know it if I use:
> ....,'{en_ispell,en_stem}';
> In the default configuration I only see: ..., '{en_stem}';

I think you should read the documentation. I can't explain things that are
already written there.
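
For the archives, a minimal sketch of what the documentation describes,
assuming the tsearch2 contrib module is installed. en_ispell is not present
in a default install; it has to be registered from ispell_template before
the pg_ts_cfgmap rows quoted above can refer to it. The three file paths
below are placeholders, not real locations, and the sample words are only
illustrations.

```sql
-- List the dictionaries tsearch2 currently knows about.
SELECT dict_name FROM pg_ts_dict;

-- Register en_ispell from the ispell template.
-- ASSUMPTION: the /path/to/... files are placeholders for your own
-- ispell dictionary, affix, and stop-word files.
INSERT INTO pg_ts_dict
    (SELECT 'en_ispell',
            dict_init,
            'DictFile="/path/to/english.med",AffFile="/path/to/english.aff",StopFile="/path/to/english.stop"',
            dict_lexize
     FROM pg_ts_dict
     WHERE dict_name = 'ispell_template');

-- Check what a single dictionary does with a word.
SELECT lexize('en_ispell', 'conditionally');

-- Check what the 'qq' configuration produces for a piece of text.
SELECT to_tsvector('qq', 'some sample text to index');
```

If lexize() returns NULL for ordinary words, the dictionary files were most
likely not found or not readable by the server.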

>
>> Beside that, I still have a problem. If I do a simple query like:
>> Select ids, keywords from dict where keywords='blabla' ('blabla' is a
>> single word); The table has 200 million rows, and I have indexed the
>> keywords field. The first time, my query is slow, taking about 15-60 sec
>> to return the result. I use the latest pgAdmin3 to test all queries. But
>> if I repeat the query I get a fast result. My question is why the query
>> is so slow the first time.
>
>> because index pages have to be read from disk into shared buffers, so the
>> next query benefits from that. You need enough shared memory to get a
>> real benefit. You can get the postgresql stats and look at the cache hit
>> ratio.
>
>> btw, how does your query (keywords='blabla') relate to tsearch2?
>
> (Keywords='blabla') isn't related to tsearch2. I just got an idea from
> tsearch2 and tried a different approach. But I am stuck on the query speed:
> the first query is very slow to return a result.
> And how do I see the postgresql stats and look at the cache hit ratio? I
> still don't know how to get it.
>

learn from http://www.postgresql.org/docs/8.0/static/monitoring-stats.html
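
A minimal sketch of the kind of query that page leads to, assuming the
statistics collector is running with block-level stats enabled
(stats_start_collector and stats_block_level in postgresql.conf on 8.0).
The table name 'dict' is taken from the query quoted above; the hit_ratio
column alias is just an illustration.

```sql
-- Cache hit ratio for the whole current database.
SELECT datname,
       blks_read,
       blks_hit,
       blks_hit::float8 / NULLIF(blks_hit + blks_read, 0) AS hit_ratio
FROM pg_stat_database
WHERE datname = current_database();

-- Cache hit ratio per index on the 'dict' table.
SELECT indexrelname,
       idx_blks_read,
       idx_blks_hit,
       idx_blks_hit::float8 / NULLIF(idx_blks_hit + idx_blks_read, 0) AS hit_ratio
FROM pg_statio_user_indexes
WHERE relname = 'dict';
```

A ratio close to 1 means most index pages were already in shared buffers.
These counters only cover PostgreSQL's own buffers; a block counted as
"read" may still have come from the operating system cache rather than disk.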

>> I tried to cluster the table based on the keyword index, but after
>> waiting 15 hours it still hadn't finished, so I stopped the clustering.
>
>> don't use cluster for big tables! A simple
>>  select *  into clustered_foo from foo order by indexed_field
>> would be faster and does the same job.
>
> What is the use of the clustered_foo table? And how do I use it?
> I think it will not remove duplicate rows. And the clustered_foo table
> still has no index, so if I query this table, I think the query will
> be very slow to return a result.

Oh, you certainly need to read the documentation:
http://www.postgresql.org/docs/8.0/static/sql-cluster.html
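
A minimal sketch of the whole procedure, under stated assumptions: foo,
clustered_foo and indexed_field are the names from the example quoted
above, while the index name and foo_old are hypothetical. The new table is
a physically ordered copy of the old one, which is what CLUSTER produces.
It keeps duplicate rows, and it has no indexes, constraints or permissions
until you create them again.

```sql
BEGIN;

-- Copy the rows in index order; this is the "manual cluster" step.
SELECT * INTO clustered_foo FROM foo ORDER BY indexed_field;

-- The copy has no indexes, so build the one the queries need.
CREATE INDEX clustered_foo_indexed_field_idx
    ON clustered_foo (indexed_field);

-- Refresh planner statistics for the new table.
ANALYZE clustered_foo;

-- Swap the tables; foo_old can be dropped once the result is verified.
ALTER TABLE foo RENAME TO foo_old;
ALTER TABLE clustered_foo RENAME TO foo;

COMMIT;
```

Writes to foo that arrive while the copy is running are not carried over,
so this is best done when nothing is modifying the table.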


>
> Regards,
> ahmad fajar
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
