Re: tsearch2 seem very slow - Mailing list pgsql-performance
From | Oleg Bartunov |
---|---|
Subject | Re: tsearch2 seem very slow |
Date | |
Msg-id | Pine.GSO.4.63.0509241004350.27150@ra.sai.msu.su Whole thread Raw |
In response to | Re: tsearch2 seem very slow ("Ahmad Fajar" <gendowo@konphalindo.or.id>) |
Responses |
Re: tsearch2 seem very slow
|
List | pgsql-performance |
Ahmad, what's about the number of unique words ? I mean stat() function. Sometimes, it helps to identify garbage words. How big is your articles (average length) ? please, cut'n paste queries and output from psql ! How fast are next queries ? Oleg On Fri, 23 Sep 2005, Ahmad Fajar wrote: > Hi Oleg, > > For single index I try this query: > explain analyze > select articleid, title, datee from articles > where fti @@ to_tsquery('bank&indonesia'); > > analyze result: > ---------------- > "Index Scan using fti_idx on articles (cost=0.00..862.97 rows=420 width=51) > (actual time=0.067..183761.324 rows=46186 loops=1)" > " Index Cond: (fti @@ '\'bank\' & \'indonesia\''::tsquery)" > "Total runtime: 183837.826 ms" > > And for multicolumn index I try this query: > explain analyze > select articleid, title, datee from articles > where fti @@ to_tsquery('bank&mega'); > > analyze result: > ---------------- > "Index Scan using articles_x1 on articles (cost=0.00..848.01 rows=410 > width=51) (actual time=52.204..37914.135 rows=1841 loops=1)" > " Index Cond: ((datee >= '2002-01-01'::date) AND (datee <= > ('now'::text)::date) AND (fti @@ '\'bank\' & \'mega\''::tsquery))" > "Total runtime: 37933.757 ms" > > The table structure is as mention on the first talk. If you wanna know how > much table in my database, it's about 100 tables or maybe more. Now I > develop the version 2 of my web application, you can take a look at: > http://www.mediatrac.net, so it will hold many datas. But the biggest table > is article's table. On develop this version 2 I just use half data of the > article's table (about 419804 rows). May be if I import all of the article's > table data it will have 1 million rows. The article's table grows rapidly, > about 100000 rows per-week. My developing database size is 28 GB (not real > database, coz I still develop the version 2 and I use half of the data for > play around). I just wanna to perform quick search (fulltext search) on my > article's table not other table. On version 1, the current running version I > use same hardware spesification as mention below, but there is no fulltext > search. So I develop the new version with new features, new interface and > include the fulltext search. > > I do know, if the application finish, I must use powerfull hardware. But how > can I guarantee the application will run smooth, if I do fulltext search on > 419804 rows in a table it took a long time to get the result. > > Could you or friends in this maling-list help me....plz..plzz > > Tsearch2 configuration: > ------------------------- > I use default configuration, english stop word file as tsearch2 provide, > stem dictionary as default (coz I don't know how to configure and add new > data to stem dictionary) and I add some words to the english stop word file. > > Postgresql configuration > ------------------------- > max_connections = 32 > shared_buffers = 32768 > sort_mem = 8192 > vacuum_mem = 65536 > work_mem = 16384 > maintenance_work_mem = 65536 > max_fsm_pages = 30000 > max_fsm_relations = 1000 > max_files_per_process = 100000 > checkpoint_segments = 15 > effective_cache_size = 192000 > random_page_cost = 2 > geqo = true > geqo_threshold = 50 > geqo_effort = 5 > geqo_pool_size = 0 > geqo_generations = 0 > geqo_selection_bias = 2.0 > from_collapse_limit = 10 > join_collapse_limit = 15 > > OS configuration: > ------------------ > I use Redhat 4 AS, kernel 2.6.9-11 > kernel.shmmax=1073741824 > kernel.sem=250 32000 100 128 > fs.aio-max-nr=5242880 > the server I configure just only for postgresql, no other service is running > like: www, samba, ftp, email, firewall > > hardware configuration: > ------------------------ > Motherboard ASUS P5GD1 > Processor P4 3,2 GHz > Memory 2 GB DDR 400, > 2x200 GB Serial ATA 7200 RPM UltraATA/133, configure as RAID0 for postgresql > data and the partition is EXT3 > 1x80 GB EIDE 7200 RPM configure for system and home directory and the > partiton is EXT3 > > Did I miss something? > > Regards, > ahmad fajar > > > -----Original Message----- > From: Oleg Bartunov [mailto:oleg@sai.msu.su] > Sent: Jumat, 23 September 2005 18:26 > To: Ahmad Fajar > Cc: pgsql-performance@postgresql.org > Subject: RE: [PERFORM] tsearch2 seem very slow > > On Fri, 23 Sep 2005, Ahmad Fajar wrote: > >> Hi Oleg, >> >> I didn't deny on the third repeat or more, it can reach < 600 msec. It is >> only because the result still in postgres cache, but how about in the > first >> run? I didn't dare, the values is un-acceptable. Because my table will > grows >> rapidly, it's about 100000 rows per-week. And the visitor will search >> anything that I don't know, whether it's the repeated search or new > search, >> or whether it's in postgres cache or not. > > if you have enoush shared memory postgresql will keep index pages there. > > >> >> I just compare with http://www.postgresql.org, the search is quite fast, > and >> I don't know whether the site uses tsearch2 or something else. But as fas > as >> I know, if the rows reach >100 milion (I have try for 200 milion rows and > it >> seem very slow), even if don't use tsearch2, only use simple query like: >> select f1, f2 from table1 where f2='blabla', >> and f2 is indexes, my postgres still slow on the first time, about >10 > sec. >> because of this I tried something brand new to fullfill my needs. I have >> used fti, and tsearch2 but still slow. >> >> I don't know what's going wrong with my postgres, what configuration must > I >> do to perform the query get fast result. Or must I use enterprisedb 2005 > or >> pervasive postgres (both uses postgres), I don't know very much about > these >> two products. > > you didn't show us your configuration (hardware,postgresql and tsearch2), > explain analyze of your queries, so we can't help you. > How big is your database, tsearch2 index size ? > > >> >> Regards, >> ahmad fajar >> >> >> -----Original Message----- >> From: Oleg Bartunov [mailto:oleg@sai.msu.su] >> Sent: Jumat, 23 September 2005 14:36 >> To: Ahmad Fajar >> Cc: pgsql-performance@postgresql.org >> Subject: Re: [PERFORM] tsearch2 seem very slow >> >> Ahmad, >> >> how fast is repeated runs ? First time system could be very slow. >> Also, have you checked my page >> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes >> and some info about tsearch2 internals >> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals >> >> Oleg >> On Thu, 22 Sep 2005, Ahmad Fajar wrote: >> >>> I have about 419804 rows in my article table. I have installed tsearch2 >> and >>> its gist index correctly. >>> >>> My table structure is: >>> >>> CREATE TABLE tbarticles >>> >>> ( >>> >>> articleid int4 NOT NULL, >>> >>> title varchar(250), >>> >>> mediaid int4, >>> >>> datee date, >>> >>> content text, >>> >>> contentvar text, >>> >>> mmcol float4 NOT NULL, >>> >>> sirkulasi float4, >>> >>> page varchar(10), >>> >>> tglisidata date, >>> >>> namapc varchar(12), >>> >>> usere varchar(12), >>> >>> file_pdf varchar(255), >>> >>> file_pdf2 varchar(50), >>> >>> kolom int4, >>> >>> size_jpeg int4, >>> >>> journalist varchar(120), >>> >>> ratebw float4, >>> >>> ratefc float4, >>> >>> fti tsvector, >>> >>> CONSTRAINT pk_tbarticles PRIMARY KEY (articleid) >>> >>> ) WITHOUT OIDS; >>> >>> Create index fti_idx1 on tbarticles using gist (fti); >>> >>> Create index fti_idx2 on tbarticles using gist (datee, fti); >>> >>> >>> >>> But when I search something like: >>> >>> Select articleid, title, datee from tbarticles where fti @@ >>> to_tsquery('susilo&bambang&yudhoyono&jusuf&kalla'); >>> >>> It takes about 30 sec. I run explain analyze and the index is used >>> correctly. >>> >>> >>> >>> Then I try multi column index to filter by date, and my query something >>> like: >>> >>> Select articleid, title, datee from tbarticles where fti @@ >>> to_tsquery('susilo&bambang&yudhoyono&jusuf&kalla') and datee >= >> '2002-01-01' >>> and datee <= current_date >>> >>> An it still run about 25 sec. I do run explain analyze and my multicolumn >>> index is used correctly. >>> >>> This is not acceptable if want to publish my website if the search took >> very >>> longer. >>> >>> >>> >>> I have run vacuum full analyze before doing such query. What going wrong >>> with my query?? Is there any way to make this faster? >>> >>> I have try to tune my postgres configuration, but it seem helpless. My >> linux >>> box is Redhat 4 AS, and >>> >>> the hardware: 2 GB RAM DDR 400, 2x200 GB Serial ATA 7200RPM and configure >> as >>> RAID0 (just for postgres data), my sistem run at EIDE 80GB 7200 RPM. >>> >>> >>> >>> Please.help.help. >>> >>> >> >> Regards, >> Oleg >> _____________________________________________________________ >> Oleg Bartunov, sci.researcher, hostmaster of AstroNet, >> Sternberg Astronomical Institute, Moscow University (Russia) >> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ >> phone: +007(095)939-16-83, +007(095)939-23-83 >> > > Regards, > Oleg > _____________________________________________________________ > Oleg Bartunov, sci.researcher, hostmaster of AstroNet, > Sternberg Astronomical Institute, Moscow University (Russia) > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > phone: +007(095)939-16-83, +007(095)939-23-83 > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
pgsql-performance by date: