Re: tsearch2 seem very slow - Mailing list pgsql-performance

From Ahmad Fajar
Subject Re: tsearch2 seem very slow
Date
Msg-id SVONE8rBKXGA7zyWR4A0000023b@ki-communication.com
In response to Re: tsearch2 seem very slow  (Oleg Bartunov <oleg@sai.msu.su>)
Responses Re: tsearch2 seem very slow
List pgsql-performance
Hi Oleg,

> what kind of garbage? You are probably indexing token types you don't
> need, for example, email addresses, file names....

> do you need proximity? If not, use the strip(tsvector) function to remove
> coordinate information from the tsvector.

I need proximity. Sometimes I have to rank my articles and make a chart from
the ranking.

> don't index with the default configuration; index only the tokens you
> need. For example, to index only 3 types of tokens, first create a 'qq'
> configuration and specify the tokens to index.

> insert into pg_ts_cfg values('qq','default','en_US');
> -- tokens to index
> insert into pg_ts_cfgmap values('qq','lhword','{en_ispell,en_stem}');
> insert into pg_ts_cfgmap values('qq','lword','{en_ispell,en_stem}');
> insert into pg_ts_cfgmap values('qq','lpart_hword','{en_ispell,en_stem}');

I still don't understand tsearch2 configuration, so until now I have just
used the default configuration. I will try your suggestion. But how can I get
en_ispell? Will my system recognize it if I use: ....,'{en_ispell,en_stem}';
In the default configuration I only see: ..., '{en_stem}';
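
A minimal sketch of how an en_ispell dictionary is usually made known to tsearch2, assuming the contrib module's pg_ts_dict table and its ispell_template entry are installed. The three file paths are hypothetical placeholders; they must point at ispell dictionary, affix, and stop-word files that actually exist on the server.

```sql
-- List the dictionaries tsearch2 currently knows about.
-- A default install typically shows en_stem but no en_ispell.
SELECT dict_name FROM pg_ts_dict;

-- Register en_ispell by copying the init/lexize functions from the
-- ispell template. The paths below are hypothetical; adjust them.
INSERT INTO pg_ts_dict
    (SELECT 'en_ispell',
            dict_init,
            'DictFile="/path/to/english.dict",'
            'AffFile="/path/to/english.aff",'
            'StopFile="/path/to/english.stop"',
            dict_lexize
     FROM pg_ts_dict
     WHERE dict_name = 'ispell_template');

-- Check that the new dictionary responds before mapping tokens to it.
SELECT lexize('en_ispell', 'stories');
```

Until such a row exists, a mapping like '{en_ispell,en_stem}' refers to a dictionary the system does not know.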

> Besides that, I still have a problem. If I run a simple query like:
> Select ids, keywords from dict where keywords='blabla' ('blabla' is a
> single word); the table has 200 million rows and I have indexed the
> keywords field. The first time I run it, the query is slow, taking about
> 15-60 sec to return the result. I use the latest pgAdmin3 to test all
> queries. But if I repeat the query I get a fast result. My question is why
> the query is slow the first time.

> because index pages have to be read from disk into shared buffers, so the
> next query will benefit from that. You need enough shared memory to get
> real benefit. You can get the postgresql stats and look at the cache hit
> ratio.

> btw, how does your query (keywords='blabla') relate to tsearch2?

(Keywords='blabla') isn't related to tsearch2; I just got an idea from
tsearch2 and tried a different approach. But I am stuck on the query speed:
the first run of the query is very slow to return a result.
And how do I see the postgresql stats and look at the cache hit ratio? I
still don't know how to get it.
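
A minimal sketch of one way to read the cache hit ratio from the standard statistics views, assuming the statistics collector is running (on older releases block-level statistics may need to be enabled in postgresql.conf). The table name dict is taken from the query above; the hit_pct alias is only illustrative.

```sql
-- Database-wide buffer cache hit ratio.
-- blks_hit  = blocks found in shared buffers
-- blks_read = blocks that had to be requested from outside shared buffers
SELECT datname,
       blks_hit,
       blks_read,
       round(100.0 * blks_hit / nullif(blks_hit + blks_read, 0), 2) AS hit_pct
FROM pg_stat_database
WHERE datname = current_database();

-- The same counters per index on the dict table.
SELECT indexrelname,
       idx_blks_hit,
       idx_blks_read
FROM pg_statio_user_indexes
WHERE relname = 'dict';
```

A low hit percentage right after a restart and a high one on repeated runs would match the slow-first-query behaviour described here.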

> I tried to cluster the table based on the keyword index, but after waiting
> 15 hours without it finishing I stopped the clustering.

> don't use cluster for big tables! A simple
>  select *  into clustered_foo from foo order by indexed_field
> would be faster and does the same job.

What is the use of the clustered_foo table? And how do I use it?
I think it will not remove duplicate rows. And the clustered_foo table
still has no index, so I think queries against it will be very slow to
return a result.
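
A minimal sketch of how the sorted-copy approach is usually completed, assuming the dict table and keywords column from the query earlier in this message. The names dict_sorted, dict_sorted_keywords_idx, and dict_old are hypothetical. SELECT ... INTO copies every row, duplicates included, and the new table starts without indexes, so the index has to be created again.

```sql
-- 1. Write a copy of the table with rows physically ordered by keywords.
SELECT * INTO dict_sorted FROM dict ORDER BY keywords;

-- 2. The copy has no indexes yet; build the one the query needs.
CREATE INDEX dict_sorted_keywords_idx ON dict_sorted (keywords);

-- 3. Refresh planner statistics for the new table.
ANALYZE dict_sorted;

-- 4. Swap the tables so existing queries see the sorted copy.
BEGIN;
ALTER TABLE dict RENAME TO dict_old;
ALTER TABLE dict_sorted RENAME TO dict;
COMMIT;
```

If duplicates should be dropped during the copy, SELECT DISTINCT in step 1 is one option, at the cost of extra sorting work.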

Regards,
ahmad fajar

