Re: tsearch2 seem very slow - Mailing list pgsql-performance

From Oleg Bartunov
Subject Re: tsearch2 seem very slow
Date
Msg-id Pine.GSO.4.63.0509241004350.27150@ra.sai.msu.su
Whole thread Raw
In response to Re: tsearch2 seem very slow  ("Ahmad Fajar" <gendowo@konphalindo.or.id>)
Responses Re: tsearch2 seem very slow
List pgsql-performance
Ahmad,

what's about the number of unique words ? I mean stat() function.
Sometimes, it helps to identify garbage words.
How big is your articles (average length) ?

please, cut'n paste queries and output from psql ! How fast are
next queries ?

     Oleg
On Fri, 23 Sep 2005, Ahmad Fajar wrote:

> Hi Oleg,
>
> For single index I try this query:
> explain analyze
> select articleid, title, datee from articles
> where fti @@ to_tsquery('bank&indonesia');
>
> analyze result:
> ----------------
> "Index Scan using fti_idx on articles  (cost=0.00..862.97 rows=420 width=51)
> (actual time=0.067..183761.324 rows=46186 loops=1)"
> "  Index Cond: (fti @@ '\'bank\' & \'indonesia\''::tsquery)"
> "Total runtime: 183837.826 ms"
>
> And for multicolumn index I try this query:
> explain analyze
> select articleid, title, datee from articles
> where fti @@ to_tsquery('bank&mega');
>
> analyze result:
> ----------------
> "Index Scan using articles_x1 on articles  (cost=0.00..848.01 rows=410
> width=51) (actual time=52.204..37914.135 rows=1841 loops=1)"
> "  Index Cond: ((datee >= '2002-01-01'::date) AND (datee <=
> ('now'::text)::date) AND (fti @@ '\'bank\' & \'mega\''::tsquery))"
> "Total runtime: 37933.757 ms"
>
> The table structure is as mention on the first talk. If you wanna know how
> much table in my database, it's about 100 tables or maybe more. Now I
> develop the version 2 of my web application, you can take a look at:
> http://www.mediatrac.net, so it will hold many datas. But the biggest table
> is article's table. On develop this version 2 I just use half data of the
> article's table (about 419804 rows). May be if I import all of the article's
> table data it will have 1 million rows. The article's table grows rapidly,
> about 100000 rows per-week. My developing database size is 28 GB (not real
> database, coz I still develop the version 2 and I use half of the data for
> play around). I just wanna to perform quick search (fulltext search) on my
> article's table not other table. On version 1, the current running version I
> use same hardware spesification as mention below, but there is no fulltext
> search. So I develop the new version with new features, new interface and
> include the fulltext search.
>
> I do know, if the application finish, I must use powerfull hardware. But how
> can I guarantee the application will run smooth, if I do fulltext search on
> 419804 rows in a table it took a long time to get the result.
>
> Could you or friends in this maling-list help me....plz..plzz
>
> Tsearch2 configuration:
> -------------------------
> I use default configuration, english stop word file as tsearch2 provide,
> stem dictionary as default (coz I don't know how to configure and add new
> data to stem dictionary) and I add some words to the english stop word file.
>
> Postgresql configuration
> -------------------------
> max_connections = 32
> shared_buffers = 32768
> sort_mem = 8192
> vacuum_mem = 65536
> work_mem = 16384
> maintenance_work_mem = 65536
> max_fsm_pages = 30000
> max_fsm_relations = 1000
> max_files_per_process = 100000
> checkpoint_segments = 15
> effective_cache_size = 192000
> random_page_cost = 2
> geqo = true
> geqo_threshold = 50
> geqo_effort = 5
> geqo_pool_size = 0
> geqo_generations = 0
> geqo_selection_bias = 2.0
> from_collapse_limit = 10
> join_collapse_limit = 15
>
> OS configuration:
> ------------------
> I use Redhat 4 AS, kernel 2.6.9-11
> kernel.shmmax=1073741824
> kernel.sem=250 32000 100 128
> fs.aio-max-nr=5242880
> the server I configure just only for postgresql, no other service is running
> like: www, samba, ftp, email, firewall
>
> hardware configuration:
> ------------------------
> Motherboard ASUS P5GD1
> Processor P4 3,2 GHz
> Memory 2 GB DDR 400,
> 2x200 GB Serial ATA 7200 RPM UltraATA/133, configure as RAID0 for postgresql
> data and the partition is EXT3
> 1x80 GB EIDE 7200 RPM configure for system and home directory and the
> partiton is EXT3
>
> Did I miss something?
>
> Regards,
> ahmad fajar
>
>
> -----Original Message-----
> From: Oleg Bartunov [mailto:oleg@sai.msu.su]
> Sent: Jumat, 23 September 2005 18:26
> To: Ahmad Fajar
> Cc: pgsql-performance@postgresql.org
> Subject: RE: [PERFORM] tsearch2 seem very slow
>
> On Fri, 23 Sep 2005, Ahmad Fajar wrote:
>
>> Hi Oleg,
>>
>> I didn't deny on the third repeat or more, it can reach < 600 msec. It is
>> only because the result still in postgres cache, but how about in the
> first
>> run? I didn't dare, the values is un-acceptable. Because my table will
> grows
>> rapidly, it's about 100000 rows per-week. And the visitor will search
>> anything that I don't know, whether it's the repeated search or new
> search,
>> or whether it's in postgres cache or not.
>
> if you have enoush shared memory postgresql will keep index pages there.
>
>
>>
>> I just compare with http://www.postgresql.org, the search is quite fast,
> and
>> I don't know whether the site uses tsearch2 or something else. But as fas
> as
>> I know, if the rows reach >100 milion (I have try for 200 milion rows and
> it
>> seem very slow), even if don't use tsearch2, only use simple query like:
>> select f1, f2 from table1 where f2='blabla',
>> and f2 is indexes, my postgres still slow on the first time, about >10
> sec.
>> because of this I tried something brand new to fullfill my needs. I have
>> used fti, and tsearch2 but still slow.
>>
>> I don't know what's going wrong with my postgres, what configuration must
> I
>> do to perform the query get fast result. Or must I use enterprisedb 2005
> or
>> pervasive postgres (both uses postgres), I don't know very much about
> these
>> two products.
>
> you didn't show us your configuration (hardware,postgresql and tsearch2),
> explain analyze of your queries, so we can't help you.
> How big is your database, tsearch2 index size ?
>
>
>>
>> Regards,
>> ahmad fajar
>>
>>
>> -----Original Message-----
>> From: Oleg Bartunov [mailto:oleg@sai.msu.su]
>> Sent: Jumat, 23 September 2005 14:36
>> To: Ahmad Fajar
>> Cc: pgsql-performance@postgresql.org
>> Subject: Re: [PERFORM] tsearch2 seem very slow
>>
>> Ahmad,
>>
>> how fast is repeated runs ? First time system could be very slow.
>> Also, have you checked my page
>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes
>> and some info about tsearch2 internals
>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals
>>
>>     Oleg
>> On Thu, 22 Sep 2005, Ahmad Fajar wrote:
>>
>>> I have about 419804 rows in my article table. I have installed tsearch2
>> and
>>> its gist index correctly.
>>>
>>> My table structure is:
>>>
>>> CREATE TABLE tbarticles
>>>
>>> (
>>>
>>>  articleid int4 NOT NULL,
>>>
>>>  title varchar(250),
>>>
>>>  mediaid int4,
>>>
>>>  datee date,
>>>
>>>  content text,
>>>
>>>  contentvar text,
>>>
>>>  mmcol float4 NOT NULL,
>>>
>>>  sirkulasi float4,
>>>
>>>  page varchar(10),
>>>
>>>  tglisidata date,
>>>
>>>  namapc varchar(12),
>>>
>>>  usere varchar(12),
>>>
>>>  file_pdf varchar(255),
>>>
>>>  file_pdf2 varchar(50),
>>>
>>>  kolom int4,
>>>
>>>  size_jpeg int4,
>>>
>>>  journalist varchar(120),
>>>
>>>  ratebw float4,
>>>
>>>  ratefc float4,
>>>
>>>  fti tsvector,
>>>
>>>  CONSTRAINT pk_tbarticles PRIMARY KEY (articleid)
>>>
>>> ) WITHOUT OIDS;
>>>
>>> Create index fti_idx1 on tbarticles using gist (fti);
>>>
>>> Create index fti_idx2 on tbarticles using gist (datee, fti);
>>>
>>>
>>>
>>> But when I search something like:
>>>
>>> Select articleid, title, datee from tbarticles where fti @@
>>> to_tsquery('susilo&bambang&yudhoyono&jusuf&kalla');
>>>
>>> It takes about 30 sec. I run explain analyze and the index is used
>>> correctly.
>>>
>>>
>>>
>>> Then I try multi column index to filter by date, and my query something
>>> like:
>>>
>>> Select articleid, title, datee from tbarticles where fti @@
>>> to_tsquery('susilo&bambang&yudhoyono&jusuf&kalla') and datee >=
>> '2002-01-01'
>>> and datee <= current_date
>>>
>>> An it still run about 25 sec. I do run explain analyze and my multicolumn
>>> index is used correctly.
>>>
>>> This is not acceptable if want to publish my website if the search took
>> very
>>> longer.
>>>
>>>
>>>
>>> I have run vacuum full analyze before doing such query. What going wrong
>>> with my query?? Is there any way to make this faster?
>>>
>>> I have try to tune my postgres configuration, but it seem helpless. My
>> linux
>>> box is Redhat 4 AS, and
>>>
>>> the hardware: 2 GB RAM DDR 400, 2x200 GB Serial ATA 7200RPM and configure
>> as
>>> RAID0 (just for postgres data), my sistem run at EIDE 80GB 7200 RPM.
>>>
>>>
>>>
>>> Please.help.help.
>>>
>>>
>>
>>     Regards,
>>         Oleg
>> _____________________________________________________________
>> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
>> Sternberg Astronomical Institute, Moscow University (Russia)
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(095)939-16-83, +007(095)939-23-83
>>
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

pgsql-performance by date:

Previous
From: Mark Kirkwood
Date:
Subject: Re: SELECT LIMIT 1 VIEW Performance Issue
Next
From: PFC
Date:
Subject: Advice on RAID card