Thread: tsearch2 seem very slow

tsearch2 seem very slow

From
"Ahmad Fajar"
Date:

I have about 419804 rows in my article table. I have installed tsearch2 and its gist index correctly.

My table structure is:

CREATE TABLE tbarticles

(

  articleid int4 NOT NULL,

  title varchar(250),

  mediaid int4,

  datee date,

  content text,

  contentvar text,

  mmcol float4 NOT NULL,

  sirkulasi float4,

  page varchar(10),

  tglisidata date,

  namapc varchar(12),

  usere varchar(12),

  file_pdf varchar(255),

  file_pdf2 varchar(50),

  kolom int4,

  size_jpeg int4,

  journalist varchar(120),

  ratebw float4,

  ratefc float4,

  fti tsvector,

  CONSTRAINT pk_tbarticles PRIMARY KEY (articleid)

) WITHOUT OIDS;

Create index fti_idx1 on tbarticles using gist (fti);

Create index fti_idx2 on tbarticles using gist (datee, fti);

 

But when I search something like:

Select articleid, title, datee from tbarticles where fti @@ to_tsquery(‘susilo&bambang&yudhoyono&jusuf&kalla’);

It takes about 30 sec. I run explain analyze and the index is used correctly.

 

Then I try multi column index to filter by date, and my query something like:

Select articleid, title, datee from tbarticles where fti @@ to_tsquery(‘susilo&bambang&yudhoyono&jusuf&kalla’) and datee >= '2002-01-01' and datee <= current_date

An it still run about 25 sec. I do run explain analyze and my multicolumn index is used correctly.

This is not acceptable if want to publish my website if the search took very longer.

 

I have run vacuum full analyze before doing such query. What going wrong with my query?? Is there any way to make this faster?

I have try to tune my postgres configuration, but it seem helpless. My linux box is Redhat 4 AS, and

the hardware: 2 GB RAM DDR 400, 2x200 GB Serial ATA 7200RPM and configure as RAID0 (just for postgres data), my sistem run at EIDE 80GB 7200 RPM.

 

Please…help…help…

Re: tsearch2 seem very slow

From
Oleg Bartunov
Date:
Ahmad,

how fast is repeated runs ? First time system could be very slow.
Also, have you checked my page
http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes
and some info about tsearch2 internals
http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals

     Oleg
On Thu, 22 Sep 2005, Ahmad Fajar wrote:

> I have about 419804 rows in my article table. I have installed tsearch2 and
> its gist index correctly.
>
> My table structure is:
>
> CREATE TABLE tbarticles
>
> (
>
>  articleid int4 NOT NULL,
>
>  title varchar(250),
>
>  mediaid int4,
>
>  datee date,
>
>  content text,
>
>  contentvar text,
>
>  mmcol float4 NOT NULL,
>
>  sirkulasi float4,
>
>  page varchar(10),
>
>  tglisidata date,
>
>  namapc varchar(12),
>
>  usere varchar(12),
>
>  file_pdf varchar(255),
>
>  file_pdf2 varchar(50),
>
>  kolom int4,
>
>  size_jpeg int4,
>
>  journalist varchar(120),
>
>  ratebw float4,
>
>  ratefc float4,
>
>  fti tsvector,
>
>  CONSTRAINT pk_tbarticles PRIMARY KEY (articleid)
>
> ) WITHOUT OIDS;
>
> Create index fti_idx1 on tbarticles using gist (fti);
>
> Create index fti_idx2 on tbarticles using gist (datee, fti);
>
>
>
> But when I search something like:
>
> Select articleid, title, datee from tbarticles where fti @@
> to_tsquery('susilo&bambang&yudhoyono&jusuf&kalla');
>
> It takes about 30 sec. I run explain analyze and the index is used
> correctly.
>
>
>
> Then I try multi column index to filter by date, and my query something
> like:
>
> Select articleid, title, datee from tbarticles where fti @@
> to_tsquery('susilo&bambang&yudhoyono&jusuf&kalla') and datee >= '2002-01-01'
> and datee <= current_date
>
> An it still run about 25 sec. I do run explain analyze and my multicolumn
> index is used correctly.
>
> This is not acceptable if want to publish my website if the search took very
> longer.
>
>
>
> I have run vacuum full analyze before doing such query. What going wrong
> with my query?? Is there any way to make this faster?
>
> I have try to tune my postgres configuration, but it seem helpless. My linux
> box is Redhat 4 AS, and
>
> the hardware: 2 GB RAM DDR 400, 2x200 GB Serial ATA 7200RPM and configure as
> RAID0 (just for postgres data), my sistem run at EIDE 80GB 7200 RPM.
>
>
>
> Please.help.help.
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: tsearch2 seem very slow

From
Oleg Bartunov
Date:
On Fri, 23 Sep 2005, Ahmad Fajar wrote:

> Hi Oleg,
>
> I didn't deny on the third repeat or more, it can reach < 600 msec. It is
> only because the result still in postgres cache, but how about in the first
> run? I didn't dare, the values is un-acceptable. Because my table will grows
> rapidly, it's about 100000 rows per-week. And the visitor will search
> anything that I don't know, whether it's the repeated search or new search,
> or whether it's in postgres cache or not.

if you have enoush shared memory postgresql will keep index pages there.


>
> I just compare with http://www.postgresql.org, the search is quite fast, and
> I don't know whether the site uses tsearch2 or something else. But as fas as
> I know, if the rows reach >100 milion (I have try for 200 milion rows and it
> seem very slow), even if don't use tsearch2, only use simple query like:
> select f1, f2 from table1 where f2='blabla',
> and f2 is indexes, my postgres still slow on the first time, about >10 sec.
> because of this I tried something brand new to fullfill my needs. I have
> used fti, and tsearch2 but still slow.
>
> I don't know what's going wrong with my postgres, what configuration must I
> do to perform the query get fast result. Or must I use enterprisedb 2005 or
> pervasive postgres (both uses postgres), I don't know very much about these
> two products.

you didn't show us your configuration (hardware,postgresql and tsearch2),
explain analyze of your queries, so we can't help you.
How big is your database, tsearch2 index size ?


>
> Regards,
> ahmad fajar
>
>
> -----Original Message-----
> From: Oleg Bartunov [mailto:oleg@sai.msu.su]
> Sent: Jumat, 23 September 2005 14:36
> To: Ahmad Fajar
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] tsearch2 seem very slow
>
> Ahmad,
>
> how fast is repeated runs ? First time system could be very slow.
> Also, have you checked my page
> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes
> and some info about tsearch2 internals
> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals
>
>     Oleg
> On Thu, 22 Sep 2005, Ahmad Fajar wrote:
>
>> I have about 419804 rows in my article table. I have installed tsearch2
> and
>> its gist index correctly.
>>
>> My table structure is:
>>
>> CREATE TABLE tbarticles
>>
>> (
>>
>>  articleid int4 NOT NULL,
>>
>>  title varchar(250),
>>
>>  mediaid int4,
>>
>>  datee date,
>>
>>  content text,
>>
>>  contentvar text,
>>
>>  mmcol float4 NOT NULL,
>>
>>  sirkulasi float4,
>>
>>  page varchar(10),
>>
>>  tglisidata date,
>>
>>  namapc varchar(12),
>>
>>  usere varchar(12),
>>
>>  file_pdf varchar(255),
>>
>>  file_pdf2 varchar(50),
>>
>>  kolom int4,
>>
>>  size_jpeg int4,
>>
>>  journalist varchar(120),
>>
>>  ratebw float4,
>>
>>  ratefc float4,
>>
>>  fti tsvector,
>>
>>  CONSTRAINT pk_tbarticles PRIMARY KEY (articleid)
>>
>> ) WITHOUT OIDS;
>>
>> Create index fti_idx1 on tbarticles using gist (fti);
>>
>> Create index fti_idx2 on tbarticles using gist (datee, fti);
>>
>>
>>
>> But when I search something like:
>>
>> Select articleid, title, datee from tbarticles where fti @@
>> to_tsquery('susilo&bambang&yudhoyono&jusuf&kalla');
>>
>> It takes about 30 sec. I run explain analyze and the index is used
>> correctly.
>>
>>
>>
>> Then I try multi column index to filter by date, and my query something
>> like:
>>
>> Select articleid, title, datee from tbarticles where fti @@
>> to_tsquery('susilo&bambang&yudhoyono&jusuf&kalla') and datee >=
> '2002-01-01'
>> and datee <= current_date
>>
>> An it still run about 25 sec. I do run explain analyze and my multicolumn
>> index is used correctly.
>>
>> This is not acceptable if want to publish my website if the search took
> very
>> longer.
>>
>>
>>
>> I have run vacuum full analyze before doing such query. What going wrong
>> with my query?? Is there any way to make this faster?
>>
>> I have try to tune my postgres configuration, but it seem helpless. My
> linux
>> box is Redhat 4 AS, and
>>
>> the hardware: 2 GB RAM DDR 400, 2x200 GB Serial ATA 7200RPM and configure
> as
>> RAID0 (just for postgres data), my sistem run at EIDE 80GB 7200 RPM.
>>
>>
>>
>> Please.help.help.
>>
>>
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: tsearch2 seem very slow

From
"Ahmad Fajar"
Date:
Hi Oleg,

For single index I try this query:
explain analyze
select articleid, title, datee from articles
where fti @@ to_tsquery('bank&indonesia');

analyze result:
----------------
"Index Scan using fti_idx on articles  (cost=0.00..862.97 rows=420 width=51)
(actual time=0.067..183761.324 rows=46186 loops=1)"
"  Index Cond: (fti @@ '\'bank\' & \'indonesia\''::tsquery)"
"Total runtime: 183837.826 ms"

And for multicolumn index I try this query:
explain analyze
select articleid, title, datee from articles
where fti @@ to_tsquery('bank&mega');

analyze result:
----------------
"Index Scan using articles_x1 on articles  (cost=0.00..848.01 rows=410
width=51) (actual time=52.204..37914.135 rows=1841 loops=1)"
"  Index Cond: ((datee >= '2002-01-01'::date) AND (datee <=
('now'::text)::date) AND (fti @@ '\'bank\' & \'mega\''::tsquery))"
"Total runtime: 37933.757 ms"

The table structure is as mention on the first talk. If you wanna know how
much table in my database, it's about 100 tables or maybe more. Now I
develop the version 2 of my web application, you can take a look at:
http://www.mediatrac.net, so it will hold many datas. But the biggest table
is article's table. On develop this version 2 I just use half data of the
article's table (about 419804 rows). May be if I import all of the article's
table data it will have 1 million rows. The article's table grows rapidly,
about 100000 rows per-week. My developing database size is 28 GB (not real
database, coz I still develop the version 2 and I use half of the data for
play around). I just wanna to perform quick search (fulltext search) on my
article's table not other table. On version 1, the current running version I
use same hardware spesification as mention below, but there is no fulltext
search. So I develop the new version with new features, new interface and
include the fulltext search.

I do know, if the application finish, I must use powerfull hardware. But how
can I guarantee the application will run smooth, if I do fulltext search on
419804 rows in a table it took a long time to get the result.

Could you or friends in this maling-list help me....plz..plzz

Tsearch2 configuration:
-------------------------
I use default configuration, english stop word file as tsearch2 provide,
stem dictionary as default (coz I don't know how to configure and add new
data to stem dictionary) and I add some words to the english stop word file.

Postgresql configuration
-------------------------
max_connections = 32
shared_buffers = 32768
sort_mem = 8192
vacuum_mem = 65536
work_mem = 16384
maintenance_work_mem = 65536
max_fsm_pages = 30000
max_fsm_relations = 1000
max_files_per_process = 100000
checkpoint_segments = 15
effective_cache_size = 192000
random_page_cost = 2
geqo = true
geqo_threshold = 50
geqo_effort = 5
geqo_pool_size = 0
geqo_generations = 0
geqo_selection_bias = 2.0
from_collapse_limit = 10
join_collapse_limit = 15

OS configuration:
------------------
I use Redhat 4 AS, kernel 2.6.9-11
kernel.shmmax=1073741824
kernel.sem=250 32000 100 128
fs.aio-max-nr=5242880
the server I configure just only for postgresql, no other service is running
like: www, samba, ftp, email, firewall

hardware configuration:
------------------------
Motherboard ASUS P5GD1
Processor P4 3,2 GHz
Memory 2 GB DDR 400,
2x200 GB Serial ATA 7200 RPM UltraATA/133, configure as RAID0 for postgresql
data and the partition is EXT3
1x80 GB EIDE 7200 RPM configure for system and home directory and the
partiton is EXT3

Did I miss something?

Regards,
ahmad fajar


-----Original Message-----
From: Oleg Bartunov [mailto:oleg@sai.msu.su]
Sent: Jumat, 23 September 2005 18:26
To: Ahmad Fajar
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] tsearch2 seem very slow

On Fri, 23 Sep 2005, Ahmad Fajar wrote:

> Hi Oleg,
>
> I didn't deny on the third repeat or more, it can reach < 600 msec. It is
> only because the result still in postgres cache, but how about in the
first
> run? I didn't dare, the values is un-acceptable. Because my table will
grows
> rapidly, it's about 100000 rows per-week. And the visitor will search
> anything that I don't know, whether it's the repeated search or new
search,
> or whether it's in postgres cache or not.

if you have enoush shared memory postgresql will keep index pages there.


>
> I just compare with http://www.postgresql.org, the search is quite fast,
and
> I don't know whether the site uses tsearch2 or something else. But as fas
as
> I know, if the rows reach >100 milion (I have try for 200 milion rows and
it
> seem very slow), even if don't use tsearch2, only use simple query like:
> select f1, f2 from table1 where f2='blabla',
> and f2 is indexes, my postgres still slow on the first time, about >10
sec.
> because of this I tried something brand new to fullfill my needs. I have
> used fti, and tsearch2 but still slow.
>
> I don't know what's going wrong with my postgres, what configuration must
I
> do to perform the query get fast result. Or must I use enterprisedb 2005
or
> pervasive postgres (both uses postgres), I don't know very much about
these
> two products.

you didn't show us your configuration (hardware,postgresql and tsearch2),
explain analyze of your queries, so we can't help you.
How big is your database, tsearch2 index size ?


>
> Regards,
> ahmad fajar
>
>
> -----Original Message-----
> From: Oleg Bartunov [mailto:oleg@sai.msu.su]
> Sent: Jumat, 23 September 2005 14:36
> To: Ahmad Fajar
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] tsearch2 seem very slow
>
> Ahmad,
>
> how fast is repeated runs ? First time system could be very slow.
> Also, have you checked my page
> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes
> and some info about tsearch2 internals
> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals
>
>     Oleg
> On Thu, 22 Sep 2005, Ahmad Fajar wrote:
>
>> I have about 419804 rows in my article table. I have installed tsearch2
> and
>> its gist index correctly.
>>
>> My table structure is:
>>
>> CREATE TABLE tbarticles
>>
>> (
>>
>>  articleid int4 NOT NULL,
>>
>>  title varchar(250),
>>
>>  mediaid int4,
>>
>>  datee date,
>>
>>  content text,
>>
>>  contentvar text,
>>
>>  mmcol float4 NOT NULL,
>>
>>  sirkulasi float4,
>>
>>  page varchar(10),
>>
>>  tglisidata date,
>>
>>  namapc varchar(12),
>>
>>  usere varchar(12),
>>
>>  file_pdf varchar(255),
>>
>>  file_pdf2 varchar(50),
>>
>>  kolom int4,
>>
>>  size_jpeg int4,
>>
>>  journalist varchar(120),
>>
>>  ratebw float4,
>>
>>  ratefc float4,
>>
>>  fti tsvector,
>>
>>  CONSTRAINT pk_tbarticles PRIMARY KEY (articleid)
>>
>> ) WITHOUT OIDS;
>>
>> Create index fti_idx1 on tbarticles using gist (fti);
>>
>> Create index fti_idx2 on tbarticles using gist (datee, fti);
>>
>>
>>
>> But when I search something like:
>>
>> Select articleid, title, datee from tbarticles where fti @@
>> to_tsquery('susilo&bambang&yudhoyono&jusuf&kalla');
>>
>> It takes about 30 sec. I run explain analyze and the index is used
>> correctly.
>>
>>
>>
>> Then I try multi column index to filter by date, and my query something
>> like:
>>
>> Select articleid, title, datee from tbarticles where fti @@
>> to_tsquery('susilo&bambang&yudhoyono&jusuf&kalla') and datee >=
> '2002-01-01'
>> and datee <= current_date
>>
>> An it still run about 25 sec. I do run explain analyze and my multicolumn
>> index is used correctly.
>>
>> This is not acceptable if want to publish my website if the search took
> very
>> longer.
>>
>>
>>
>> I have run vacuum full analyze before doing such query. What going wrong
>> with my query?? Is there any way to make this faster?
>>
>> I have try to tune my postgres configuration, but it seem helpless. My
> linux
>> box is Redhat 4 AS, and
>>
>> the hardware: 2 GB RAM DDR 400, 2x200 GB Serial ATA 7200RPM and configure
> as
>> RAID0 (just for postgres data), my sistem run at EIDE 80GB 7200 RPM.
>>
>>
>>
>> Please.help.help.
>>
>>
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


Re: tsearch2 seem very slow

From
Oleg Bartunov
Date:
Ahmad,

what's about the number of unique words ? I mean stat() function.
Sometimes, it helps to identify garbage words.
How big is your articles (average length) ?

please, cut'n paste queries and output from psql ! How fast are
next queries ?

     Oleg
On Fri, 23 Sep 2005, Ahmad Fajar wrote:

> Hi Oleg,
>
> For single index I try this query:
> explain analyze
> select articleid, title, datee from articles
> where fti @@ to_tsquery('bank&indonesia');
>
> analyze result:
> ----------------
> "Index Scan using fti_idx on articles  (cost=0.00..862.97 rows=420 width=51)
> (actual time=0.067..183761.324 rows=46186 loops=1)"
> "  Index Cond: (fti @@ '\'bank\' & \'indonesia\''::tsquery)"
> "Total runtime: 183837.826 ms"
>
> And for multicolumn index I try this query:
> explain analyze
> select articleid, title, datee from articles
> where fti @@ to_tsquery('bank&mega');
>
> analyze result:
> ----------------
> "Index Scan using articles_x1 on articles  (cost=0.00..848.01 rows=410
> width=51) (actual time=52.204..37914.135 rows=1841 loops=1)"
> "  Index Cond: ((datee >= '2002-01-01'::date) AND (datee <=
> ('now'::text)::date) AND (fti @@ '\'bank\' & \'mega\''::tsquery))"
> "Total runtime: 37933.757 ms"
>
> The table structure is as mention on the first talk. If you wanna know how
> much table in my database, it's about 100 tables or maybe more. Now I
> develop the version 2 of my web application, you can take a look at:
> http://www.mediatrac.net, so it will hold many datas. But the biggest table
> is article's table. On develop this version 2 I just use half data of the
> article's table (about 419804 rows). May be if I import all of the article's
> table data it will have 1 million rows. The article's table grows rapidly,
> about 100000 rows per-week. My developing database size is 28 GB (not real
> database, coz I still develop the version 2 and I use half of the data for
> play around). I just wanna to perform quick search (fulltext search) on my
> article's table not other table. On version 1, the current running version I
> use same hardware spesification as mention below, but there is no fulltext
> search. So I develop the new version with new features, new interface and
> include the fulltext search.
>
> I do know, if the application finish, I must use powerfull hardware. But how
> can I guarantee the application will run smooth, if I do fulltext search on
> 419804 rows in a table it took a long time to get the result.
>
> Could you or friends in this maling-list help me....plz..plzz
>
> Tsearch2 configuration:
> -------------------------
> I use default configuration, english stop word file as tsearch2 provide,
> stem dictionary as default (coz I don't know how to configure and add new
> data to stem dictionary) and I add some words to the english stop word file.
>
> Postgresql configuration
> -------------------------
> max_connections = 32
> shared_buffers = 32768
> sort_mem = 8192
> vacuum_mem = 65536
> work_mem = 16384
> maintenance_work_mem = 65536
> max_fsm_pages = 30000
> max_fsm_relations = 1000
> max_files_per_process = 100000
> checkpoint_segments = 15
> effective_cache_size = 192000
> random_page_cost = 2
> geqo = true
> geqo_threshold = 50
> geqo_effort = 5
> geqo_pool_size = 0
> geqo_generations = 0
> geqo_selection_bias = 2.0
> from_collapse_limit = 10
> join_collapse_limit = 15
>
> OS configuration:
> ------------------
> I use Redhat 4 AS, kernel 2.6.9-11
> kernel.shmmax=1073741824
> kernel.sem=250 32000 100 128
> fs.aio-max-nr=5242880
> the server I configure just only for postgresql, no other service is running
> like: www, samba, ftp, email, firewall
>
> hardware configuration:
> ------------------------
> Motherboard ASUS P5GD1
> Processor P4 3,2 GHz
> Memory 2 GB DDR 400,
> 2x200 GB Serial ATA 7200 RPM UltraATA/133, configure as RAID0 for postgresql
> data and the partition is EXT3
> 1x80 GB EIDE 7200 RPM configure for system and home directory and the
> partiton is EXT3
>
> Did I miss something?
>
> Regards,
> ahmad fajar
>
>
> -----Original Message-----
> From: Oleg Bartunov [mailto:oleg@sai.msu.su]
> Sent: Jumat, 23 September 2005 18:26
> To: Ahmad Fajar
> Cc: pgsql-performance@postgresql.org
> Subject: RE: [PERFORM] tsearch2 seem very slow
>
> On Fri, 23 Sep 2005, Ahmad Fajar wrote:
>
>> Hi Oleg,
>>
>> I didn't deny on the third repeat or more, it can reach < 600 msec. It is
>> only because the result still in postgres cache, but how about in the
> first
>> run? I didn't dare, the values is un-acceptable. Because my table will
> grows
>> rapidly, it's about 100000 rows per-week. And the visitor will search
>> anything that I don't know, whether it's the repeated search or new
> search,
>> or whether it's in postgres cache or not.
>
> if you have enoush shared memory postgresql will keep index pages there.
>
>
>>
>> I just compare with http://www.postgresql.org, the search is quite fast,
> and
>> I don't know whether the site uses tsearch2 or something else. But as fas
> as
>> I know, if the rows reach >100 milion (I have try for 200 milion rows and
> it
>> seem very slow), even if don't use tsearch2, only use simple query like:
>> select f1, f2 from table1 where f2='blabla',
>> and f2 is indexes, my postgres still slow on the first time, about >10
> sec.
>> because of this I tried something brand new to fullfill my needs. I have
>> used fti, and tsearch2 but still slow.
>>
>> I don't know what's going wrong with my postgres, what configuration must
> I
>> do to perform the query get fast result. Or must I use enterprisedb 2005
> or
>> pervasive postgres (both uses postgres), I don't know very much about
> these
>> two products.
>
> you didn't show us your configuration (hardware,postgresql and tsearch2),
> explain analyze of your queries, so we can't help you.
> How big is your database, tsearch2 index size ?
>
>
>>
>> Regards,
>> ahmad fajar
>>
>>
>> -----Original Message-----
>> From: Oleg Bartunov [mailto:oleg@sai.msu.su]
>> Sent: Jumat, 23 September 2005 14:36
>> To: Ahmad Fajar
>> Cc: pgsql-performance@postgresql.org
>> Subject: Re: [PERFORM] tsearch2 seem very slow
>>
>> Ahmad,
>>
>> how fast is repeated runs ? First time system could be very slow.
>> Also, have you checked my page
>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes
>> and some info about tsearch2 internals
>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals
>>
>>     Oleg
>> On Thu, 22 Sep 2005, Ahmad Fajar wrote:
>>
>>> I have about 419804 rows in my article table. I have installed tsearch2
>> and
>>> its gist index correctly.
>>>
>>> My table structure is:
>>>
>>> CREATE TABLE tbarticles
>>>
>>> (
>>>
>>>  articleid int4 NOT NULL,
>>>
>>>  title varchar(250),
>>>
>>>  mediaid int4,
>>>
>>>  datee date,
>>>
>>>  content text,
>>>
>>>  contentvar text,
>>>
>>>  mmcol float4 NOT NULL,
>>>
>>>  sirkulasi float4,
>>>
>>>  page varchar(10),
>>>
>>>  tglisidata date,
>>>
>>>  namapc varchar(12),
>>>
>>>  usere varchar(12),
>>>
>>>  file_pdf varchar(255),
>>>
>>>  file_pdf2 varchar(50),
>>>
>>>  kolom int4,
>>>
>>>  size_jpeg int4,
>>>
>>>  journalist varchar(120),
>>>
>>>  ratebw float4,
>>>
>>>  ratefc float4,
>>>
>>>  fti tsvector,
>>>
>>>  CONSTRAINT pk_tbarticles PRIMARY KEY (articleid)
>>>
>>> ) WITHOUT OIDS;
>>>
>>> Create index fti_idx1 on tbarticles using gist (fti);
>>>
>>> Create index fti_idx2 on tbarticles using gist (datee, fti);
>>>
>>>
>>>
>>> But when I search something like:
>>>
>>> Select articleid, title, datee from tbarticles where fti @@
>>> to_tsquery('susilo&bambang&yudhoyono&jusuf&kalla');
>>>
>>> It takes about 30 sec. I run explain analyze and the index is used
>>> correctly.
>>>
>>>
>>>
>>> Then I try multi column index to filter by date, and my query something
>>> like:
>>>
>>> Select articleid, title, datee from tbarticles where fti @@
>>> to_tsquery('susilo&bambang&yudhoyono&jusuf&kalla') and datee >=
>> '2002-01-01'
>>> and datee <= current_date
>>>
>>> An it still run about 25 sec. I do run explain analyze and my multicolumn
>>> index is used correctly.
>>>
>>> This is not acceptable if want to publish my website if the search took
>> very
>>> longer.
>>>
>>>
>>>
>>> I have run vacuum full analyze before doing such query. What going wrong
>>> with my query?? Is there any way to make this faster?
>>>
>>> I have try to tune my postgres configuration, but it seem helpless. My
>> linux
>>> box is Redhat 4 AS, and
>>>
>>> the hardware: 2 GB RAM DDR 400, 2x200 GB Serial ATA 7200RPM and configure
>> as
>>> RAID0 (just for postgres data), my sistem run at EIDE 80GB 7200 RPM.
>>>
>>>
>>>
>>> Please.help.help.
>>>
>>>
>>
>>     Regards,
>>         Oleg
>> _____________________________________________________________
>> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
>> Sternberg Astronomical Institute, Moscow University (Russia)
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(095)939-16-83, +007(095)939-23-83
>>
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: tsearch2 seem very slow

From
"Ahmad Fajar"
Date:
Hi Oleg,

Sorry for my late. From the stat() function I got 1,5 million rows, although
I've added garbage words to the stop word file, there seem still have
garbage words. So I ask for my team to identify the garbage words and add to
stop words and I will update the articles after that. And about my articles,
it is quite big enough, the average length is about 2900 characters. And I
think, I have to tune tsearch2 and concentrate to the garbage words. The
most articles are indonesian language. What others way to tune the tsearch2
beside the garbage words?

Beside that, I still have problem, if I do a simple query like:
Select ids, keywords from dict where keywords='blabla' ('blabla' is a single
word); The table have 200 million rows, I have index the keywords field. On
the first time my query seem to slow to get the result, about 15-60 sec to
get the result. I use latest pgAdmin3 to test all queries. But if I repeat
the query I will get fast result. My question is why on the first time the
query seem to slow.

I try to cluster the table base on keyword index, but after 15 hours waiting
and it doesn't finish I stop clustering. Now I think I have to change the
file system for postgresql data. Do you have any idea what best for
postgresql, JFS or XFS? I will not try reiserfs, because there are some
rumors about reiserfs stability, although reiserfs is fast enough for
postgresql. And must I down grade my postgresql from version 8.0.3 to 7.4.8?



Regards,
ahmad fajar

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Oleg Bartunov
Sent: Saturday, September 24, 2005 1:08 PM
To: Ahmad Fajar
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] tsearch2 seem very slow

Ahmad,

what's about the number of unique words ? I mean stat() function.
Sometimes, it helps to identify garbage words.
How big is your articles (average length) ?

please, cut'n paste queries and output from psql ! How fast are
next queries ?

     Oleg
On Fri, 23 Sep 2005, Ahmad Fajar wrote:

> Hi Oleg,
>
> For single index I try this query:
> explain analyze
> select articleid, title, datee from articles
> where fti @@ to_tsquery('bank&indonesia');
>
> analyze result:
> ----------------
> "Index Scan using fti_idx on articles  (cost=0.00..862.97 rows=420
width=51)
> (actual time=0.067..183761.324 rows=46186 loops=1)"
> "  Index Cond: (fti @@ '\'bank\' & \'indonesia\''::tsquery)"
> "Total runtime: 183837.826 ms"
>
> And for multicolumn index I try this query:
> explain analyze
> select articleid, title, datee from articles
> where fti @@ to_tsquery('bank&mega');
>
> analyze result:
> ----------------
> "Index Scan using articles_x1 on articles  (cost=0.00..848.01 rows=410
> width=51) (actual time=52.204..37914.135 rows=1841 loops=1)"
> "  Index Cond: ((datee >= '2002-01-01'::date) AND (datee <=
> ('now'::text)::date) AND (fti @@ '\'bank\' & \'mega\''::tsquery))"
> "Total runtime: 37933.757 ms"
>
> The table structure is as mention on the first talk. If you wanna know how
> much table in my database, it's about 100 tables or maybe more. Now I
> develop the version 2 of my web application, you can take a look at:
> http://www.mediatrac.net, so it will hold many datas. But the biggest
table
> is article's table. On develop this version 2 I just use half data of the
> article's table (about 419804 rows). May be if I import all of the
article's
> table data it will have 1 million rows. The article's table grows rapidly,
> about 100000 rows per-week. My developing database size is 28 GB (not real
> database, coz I still develop the version 2 and I use half of the data for
> play around). I just wanna to perform quick search (fulltext search) on my
> article's table not other table. On version 1, the current running version
I
> use same hardware spesification as mention below, but there is no fulltext
> search. So I develop the new version with new features, new interface and
> include the fulltext search.
>
> I do know, if the application finish, I must use powerfull hardware. But
how
> can I guarantee the application will run smooth, if I do fulltext search
on
> 419804 rows in a table it took a long time to get the result.
>
> Could you or friends in this maling-list help me....plz..plzz
>
> Tsearch2 configuration:
> -------------------------
> I use default configuration, english stop word file as tsearch2 provide,
> stem dictionary as default (coz I don't know how to configure and add new
> data to stem dictionary) and I add some words to the english stop word
file.
>
> Postgresql configuration
> -------------------------
> max_connections = 32
> shared_buffers = 32768
> sort_mem = 8192
> vacuum_mem = 65536
> work_mem = 16384
> maintenance_work_mem = 65536
> max_fsm_pages = 30000
> max_fsm_relations = 1000
> max_files_per_process = 100000
> checkpoint_segments = 15
> effective_cache_size = 192000
> random_page_cost = 2
> geqo = true
> geqo_threshold = 50
> geqo_effort = 5
> geqo_pool_size = 0
> geqo_generations = 0
> geqo_selection_bias = 2.0
> from_collapse_limit = 10
> join_collapse_limit = 15
>
> OS configuration:
> ------------------
> I use Redhat 4 AS, kernel 2.6.9-11
> kernel.shmmax=1073741824
> kernel.sem=250 32000 100 128
> fs.aio-max-nr=5242880
> the server I configure just only for postgresql, no other service is
running
> like: www, samba, ftp, email, firewall
>
> hardware configuration:
> ------------------------
> Motherboard ASUS P5GD1
> Processor P4 3,2 GHz
> Memory 2 GB DDR 400,
> 2x200 GB Serial ATA 7200 RPM UltraATA/133, configure as RAID0 for
postgresql
> data and the partition is EXT3
> 1x80 GB EIDE 7200 RPM configure for system and home directory and the
> partiton is EXT3
>
> Did I miss something?
>
> Regards,
> ahmad fajar
>
>
> -----Original Message-----
> From: Oleg Bartunov [mailto:oleg@sai.msu.su]
> Sent: Jumat, 23 September 2005 18:26
> To: Ahmad Fajar
> Cc: pgsql-performance@postgresql.org
> Subject: RE: [PERFORM] tsearch2 seem very slow
>
> On Fri, 23 Sep 2005, Ahmad Fajar wrote:
>
>> Hi Oleg,
>>
>> I didn't deny on the third repeat or more, it can reach < 600 msec. It is
>> only because the result still in postgres cache, but how about in the
> first
>> run? I didn't dare, the values is un-acceptable. Because my table will
> grows
>> rapidly, it's about 100000 rows per-week. And the visitor will search
>> anything that I don't know, whether it's the repeated search or new
> search,
>> or whether it's in postgres cache or not.
>
> if you have enoush shared memory postgresql will keep index pages there.
>
>
>>
>> I just compare with http://www.postgresql.org, the search is quite fast,
> and
>> I don't know whether the site uses tsearch2 or something else. But as fas
> as
>> I know, if the rows reach >100 milion (I have try for 200 milion rows and
> it
>> seem very slow), even if don't use tsearch2, only use simple query like:
>> select f1, f2 from table1 where f2='blabla',
>> and f2 is indexes, my postgres still slow on the first time, about >10
> sec.
>> because of this I tried something brand new to fullfill my needs. I have
>> used fti, and tsearch2 but still slow.
>>
>> I don't know what's going wrong with my postgres, what configuration must
> I
>> do to perform the query get fast result. Or must I use enterprisedb 2005
> or
>> pervasive postgres (both uses postgres), I don't know very much about
> these
>> two products.
>
> you didn't show us your configuration (hardware,postgresql and tsearch2),
> explain analyze of your queries, so we can't help you.
> How big is your database, tsearch2 index size ?
>
>
>>
>> Regards,
>> ahmad fajar
>>
>>
>> -----Original Message-----
>> From: Oleg Bartunov [mailto:oleg@sai.msu.su]
>> Sent: Jumat, 23 September 2005 14:36
>> To: Ahmad Fajar
>> Cc: pgsql-performance@postgresql.org
>> Subject: Re: [PERFORM] tsearch2 seem very slow
>>
>> Ahmad,
>>
>> how fast is repeated runs ? First time system could be very slow.
>> Also, have you checked my page
>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes
>> and some info about tsearch2 internals
>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals
>>
>>     Oleg
>> On Thu, 22 Sep 2005, Ahmad Fajar wrote:
>>
>>> I have about 419804 rows in my article table. I have installed tsearch2
>> and
>>> its gist index correctly.
>>>
>>> My table structure is:
>>>
>>> CREATE TABLE tbarticles
>>>
>>> (
>>>
>>>  articleid int4 NOT NULL,
>>>
>>>  title varchar(250),
>>>
>>>  mediaid int4,
>>>
>>>  datee date,
>>>
>>>  content text,
>>>
>>>  contentvar text,
>>>
>>>  mmcol float4 NOT NULL,
>>>
>>>  sirkulasi float4,
>>>
>>>  page varchar(10),
>>>
>>>  tglisidata date,
>>>
>>>  namapc varchar(12),
>>>
>>>  usere varchar(12),
>>>
>>>  file_pdf varchar(255),
>>>
>>>  file_pdf2 varchar(50),
>>>
>>>  kolom int4,
>>>
>>>  size_jpeg int4,
>>>
>>>  journalist varchar(120),
>>>
>>>  ratebw float4,
>>>
>>>  ratefc float4,
>>>
>>>  fti tsvector,
>>>
>>>  CONSTRAINT pk_tbarticles PRIMARY KEY (articleid)
>>>
>>> ) WITHOUT OIDS;
>>>
>>> Create index fti_idx1 on tbarticles using gist (fti);
>>>
>>> Create index fti_idx2 on tbarticles using gist (datee, fti);
>>>
>>>
>>>
>>> But when I search something like:
>>>
>>> Select articleid, title, datee from tbarticles where fti @@
>>> to_tsquery('susilo&bambang&yudhoyono&jusuf&kalla');
>>>
>>> It takes about 30 sec. I run explain analyze and the index is used
>>> correctly.
>>>
>>>
>>>
>>> Then I try multi column index to filter by date, and my query something
>>> like:
>>>
>>> Select articleid, title, datee from tbarticles where fti @@
>>> to_tsquery('susilo&bambang&yudhoyono&jusuf&kalla') and datee >=
>> '2002-01-01'
>>> and datee <= current_date
>>>
>>> An it still run about 25 sec. I do run explain analyze and my
multicolumn
>>> index is used correctly.
>>>
>>> This is not acceptable if want to publish my website if the search took
>> very
>>> longer.
>>>
>>>
>>>
>>> I have run vacuum full analyze before doing such query. What going wrong
>>> with my query?? Is there any way to make this faster?
>>>
>>> I have try to tune my postgres configuration, but it seem helpless. My
>> linux
>>> box is Redhat 4 AS, and
>>>
>>> the hardware: 2 GB RAM DDR 400, 2x200 GB Serial ATA 7200RPM and
configure
>> as
>>> RAID0 (just for postgres data), my sistem run at EIDE 80GB 7200 RPM.
>>>
>>>
>>>
>>> Please.help.help.
>>>
>>>
>>
>>     Regards,
>>         Oleg
>> _____________________________________________________________
>> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
>> Sternberg Astronomical Institute, Moscow University (Russia)
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(095)939-16-83, +007(095)939-23-83
>>
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster


Re: tsearch2 seem very slow

From
Oleg Bartunov
Date:
On Mon, 26 Sep 2005, Ahmad Fajar wrote:

> Hi Oleg,
>
> Sorry for my late. From the stat() function I got 1,5 million rows, although
> I've added garbage words to the stop word file, there seem still have
> garbage words. So I ask for my team to identify the garbage words and add to

what king of garbage ? Probably you index not needed token types, for
example, email address, file names....

> stop words and I will update the articles after that. And about my articles,
> it is quite big enough, the average length is about 2900 characters. And I
> think, I have to tune tsearch2 and concentrate to the garbage words. The
> most articles are indonesian language. What others way to tune the tsearch2
> beside the garbage words?

do you need proximity ? If no, use strip(tsvector) function to remove
coordinate information from tsvector.

don't index default configuration and index only needed tokens, for
example, to index only 3 type of tokens, first create 'qq' configuration
and specify tokens to index.

insert into pg_ts_cfg values('qq','default','en_US');
-- tokens to index
insert into pg_ts_cfgmap values('qq','lhword','{en_ispell,en_stem}');
insert into pg_ts_cfgmap values('qq','lword','{en_ispell,en_stem}');
insert into pg_ts_cfgmap values('qq','lpart_hword','{en_ispell,en_stem}');


>
> Beside that, I still have problem, if I do a simple query like:
> Select ids, keywords from dict where keywords='blabla' ('blabla' is a single
> word); The table have 200 million rows, I have index the keywords field. On
> the first time my query seem to slow to get the result, about 15-60 sec to
> get the result. I use latest pgAdmin3 to test all queries. But if I repeat
> the query I will get fast result. My question is why on the first time the
> query seem to slow.

because index pages should be readed from disk into shared buffers, so next
query will benefit from that. You need enough shared memory to get real
benefit. You may get postgresql stats and look on cache hit ration.

btw, how does your query ( keywords='blabla') relates to tsearch2 ?

>
> I try to cluster the table base on keyword index, but after 15 hours waiting
> and it doesn't finish I stop clustering. Now I think I have to change the

don't use cluster for big tables ! simple
  select *  into clustered_foo from foo order by indexed_field
would be faster and does the same job.

> file system for postgresql data. Do you have any idea what best for
> postgresql, JFS or XFS? I will not try reiserfs, because there are some
> rumors about reiserfs stability, although reiserfs is fast enough for
> postgresql. And must I down grade my postgresql from version 8.0.3 to 7.4.8?
>

I'm not experienced with filesystems :)


>
>
> Regards,
> ahmad fajar
>
> -----Original Message-----
> From: pgsql-performance-owner@postgresql.org
> [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Oleg Bartunov
> Sent: Saturday, September 24, 2005 1:08 PM
> To: Ahmad Fajar
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] tsearch2 seem very slow
>
> Ahmad,
>
> what's about the number of unique words ? I mean stat() function.
> Sometimes, it helps to identify garbage words.
> How big is your articles (average length) ?
>
> please, cut'n paste queries and output from psql ! How fast are
> next queries ?
>
>     Oleg
> On Fri, 23 Sep 2005, Ahmad Fajar wrote:
>
>> Hi Oleg,
>>
>> For single index I try this query:
>> explain analyze
>> select articleid, title, datee from articles
>> where fti @@ to_tsquery('bank&indonesia');
>>
>> analyze result:
>> ----------------
>> "Index Scan using fti_idx on articles  (cost=0.00..862.97 rows=420
> width=51)
>> (actual time=0.067..183761.324 rows=46186 loops=1)"
>> "  Index Cond: (fti @@ '\'bank\' & \'indonesia\''::tsquery)"
>> "Total runtime: 183837.826 ms"
>>
>> And for multicolumn index I try this query:
>> explain analyze
>> select articleid, title, datee from articles
>> where fti @@ to_tsquery('bank&mega');
>>
>> analyze result:
>> ----------------
>> "Index Scan using articles_x1 on articles  (cost=0.00..848.01 rows=410
>> width=51) (actual time=52.204..37914.135 rows=1841 loops=1)"
>> "  Index Cond: ((datee >= '2002-01-01'::date) AND (datee <=
>> ('now'::text)::date) AND (fti @@ '\'bank\' & \'mega\''::tsquery))"
>> "Total runtime: 37933.757 ms"
>>
>> The table structure is as mention on the first talk. If you wanna know how
>> much table in my database, it's about 100 tables or maybe more. Now I
>> develop the version 2 of my web application, you can take a look at:
>> http://www.mediatrac.net, so it will hold many datas. But the biggest
> table
>> is article's table. On develop this version 2 I just use half data of the
>> article's table (about 419804 rows). May be if I import all of the
> article's
>> table data it will have 1 million rows. The article's table grows rapidly,
>> about 100000 rows per-week. My developing database size is 28 GB (not real
>> database, coz I still develop the version 2 and I use half of the data for
>> play around). I just wanna to perform quick search (fulltext search) on my
>> article's table not other table. On version 1, the current running version
> I
>> use same hardware spesification as mention below, but there is no fulltext
>> search. So I develop the new version with new features, new interface and
>> include the fulltext search.
>>
>> I do know, if the application finish, I must use powerfull hardware. But
> how
>> can I guarantee the application will run smooth, if I do fulltext search
> on
>> 419804 rows in a table it took a long time to get the result.
>>
>> Could you or friends in this maling-list help me....plz..plzz
>>
>> Tsearch2 configuration:
>> -------------------------
>> I use default configuration, english stop word file as tsearch2 provide,
>> stem dictionary as default (coz I don't know how to configure and add new
>> data to stem dictionary) and I add some words to the english stop word
> file.
>>
>> Postgresql configuration
>> -------------------------
>> max_connections = 32
>> shared_buffers = 32768
>> sort_mem = 8192
>> vacuum_mem = 65536
>> work_mem = 16384
>> maintenance_work_mem = 65536
>> max_fsm_pages = 30000
>> max_fsm_relations = 1000
>> max_files_per_process = 100000
>> checkpoint_segments = 15
>> effective_cache_size = 192000
>> random_page_cost = 2
>> geqo = true
>> geqo_threshold = 50
>> geqo_effort = 5
>> geqo_pool_size = 0
>> geqo_generations = 0
>> geqo_selection_bias = 2.0
>> from_collapse_limit = 10
>> join_collapse_limit = 15
>>
>> OS configuration:
>> ------------------
>> I use Redhat 4 AS, kernel 2.6.9-11
>> kernel.shmmax=1073741824
>> kernel.sem=250 32000 100 128
>> fs.aio-max-nr=5242880
>> the server I configure just only for postgresql, no other service is
> running
>> like: www, samba, ftp, email, firewall
>>
>> hardware configuration:
>> ------------------------
>> Motherboard ASUS P5GD1
>> Processor P4 3,2 GHz
>> Memory 2 GB DDR 400,
>> 2x200 GB Serial ATA 7200 RPM UltraATA/133, configure as RAID0 for
> postgresql
>> data and the partition is EXT3
>> 1x80 GB EIDE 7200 RPM configure for system and home directory and the
>> partiton is EXT3
>>
>> Did I miss something?
>>
>> Regards,
>> ahmad fajar
>>
>>
>> -----Original Message-----
>> From: Oleg Bartunov [mailto:oleg@sai.msu.su]
>> Sent: Jumat, 23 September 2005 18:26
>> To: Ahmad Fajar
>> Cc: pgsql-performance@postgresql.org
>> Subject: RE: [PERFORM] tsearch2 seem very slow
>>
>> On Fri, 23 Sep 2005, Ahmad Fajar wrote:
>>
>>> Hi Oleg,
>>>
>>> I didn't deny on the third repeat or more, it can reach < 600 msec. It is
>>> only because the result still in postgres cache, but how about in the
>> first
>>> run? I didn't dare, the values is un-acceptable. Because my table will
>> grows
>>> rapidly, it's about 100000 rows per-week. And the visitor will search
>>> anything that I don't know, whether it's the repeated search or new
>> search,
>>> or whether it's in postgres cache or not.
>>
>> if you have enoush shared memory postgresql will keep index pages there.
>>
>>
>>>
>>> I just compare with http://www.postgresql.org, the search is quite fast,
>> and
>>> I don't know whether the site uses tsearch2 or something else. But as fas
>> as
>>> I know, if the rows reach >100 milion (I have try for 200 milion rows and
>> it
>>> seem very slow), even if don't use tsearch2, only use simple query like:
>>> select f1, f2 from table1 where f2='blabla',
>>> and f2 is indexes, my postgres still slow on the first time, about >10
>> sec.
>>> because of this I tried something brand new to fullfill my needs. I have
>>> used fti, and tsearch2 but still slow.
>>>
>>> I don't know what's going wrong with my postgres, what configuration must
>> I
>>> do to perform the query get fast result. Or must I use enterprisedb 2005
>> or
>>> pervasive postgres (both uses postgres), I don't know very much about
>> these
>>> two products.
>>
>> you didn't show us your configuration (hardware,postgresql and tsearch2),
>> explain analyze of your queries, so we can't help you.
>> How big is your database, tsearch2 index size ?
>>
>>
>>>
>>> Regards,
>>> ahmad fajar
>>>
>>>
>>> -----Original Message-----
>>> From: Oleg Bartunov [mailto:oleg@sai.msu.su]
>>> Sent: Jumat, 23 September 2005 14:36
>>> To: Ahmad Fajar
>>> Cc: pgsql-performance@postgresql.org
>>> Subject: Re: [PERFORM] tsearch2 seem very slow
>>>
>>> Ahmad,
>>>
>>> how fast is repeated runs ? First time system could be very slow.
>>> Also, have you checked my page
>>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes
>>> and some info about tsearch2 internals
>>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals
>>>
>>>     Oleg
>>> On Thu, 22 Sep 2005, Ahmad Fajar wrote:
>>>
>>>> I have about 419804 rows in my article table. I have installed tsearch2
>>> and
>>>> its gist index correctly.
>>>>
>>>> My table structure is:
>>>>
>>>> CREATE TABLE tbarticles
>>>>
>>>> (
>>>>
>>>>  articleid int4 NOT NULL,
>>>>
>>>>  title varchar(250),
>>>>
>>>>  mediaid int4,
>>>>
>>>>  datee date,
>>>>
>>>>  content text,
>>>>
>>>>  contentvar text,
>>>>
>>>>  mmcol float4 NOT NULL,
>>>>
>>>>  sirkulasi float4,
>>>>
>>>>  page varchar(10),
>>>>
>>>>  tglisidata date,
>>>>
>>>>  namapc varchar(12),
>>>>
>>>>  usere varchar(12),
>>>>
>>>>  file_pdf varchar(255),
>>>>
>>>>  file_pdf2 varchar(50),
>>>>
>>>>  kolom int4,
>>>>
>>>>  size_jpeg int4,
>>>>
>>>>  journalist varchar(120),
>>>>
>>>>  ratebw float4,
>>>>
>>>>  ratefc float4,
>>>>
>>>>  fti tsvector,
>>>>
>>>>  CONSTRAINT pk_tbarticles PRIMARY KEY (articleid)
>>>>
>>>> ) WITHOUT OIDS;
>>>>
>>>> Create index fti_idx1 on tbarticles using gist (fti);
>>>>
>>>> Create index fti_idx2 on tbarticles using gist (datee, fti);
>>>>
>>>>
>>>>
>>>> But when I search something like:
>>>>
>>>> Select articleid, title, datee from tbarticles where fti @@
>>>> to_tsquery('susilo&bambang&yudhoyono&jusuf&kalla');
>>>>
>>>> It takes about 30 sec. I run explain analyze and the index is used
>>>> correctly.
>>>>
>>>>
>>>>
>>>> Then I try multi column index to filter by date, and my query something
>>>> like:
>>>>
>>>> Select articleid, title, datee from tbarticles where fti @@
>>>> to_tsquery('susilo&bambang&yudhoyono&jusuf&kalla') and datee >=
>>> '2002-01-01'
>>>> and datee <= current_date
>>>>
>>>> An it still run about 25 sec. I do run explain analyze and my
> multicolumn
>>>> index is used correctly.
>>>>
>>>> This is not acceptable if want to publish my website if the search took
>>> very
>>>> longer.
>>>>
>>>>
>>>>
>>>> I have run vacuum full analyze before doing such query. What going wrong
>>>> with my query?? Is there any way to make this faster?
>>>>
>>>> I have try to tune my postgres configuration, but it seem helpless. My
>>> linux
>>>> box is Redhat 4 AS, and
>>>>
>>>> the hardware: 2 GB RAM DDR 400, 2x200 GB Serial ATA 7200RPM and
> configure
>>> as
>>>> RAID0 (just for postgres data), my sistem run at EIDE 80GB 7200 RPM.
>>>>
>>>>
>>>>
>>>> Please.help.help.
>>>>
>>>>
>>>
>>>     Regards,
>>>         Oleg
>>> _____________________________________________________________
>>> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
>>> Sternberg Astronomical Institute, Moscow University (Russia)
>>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>>> phone: +007(095)939-16-83, +007(095)939-23-83
>>>
>>
>>     Regards,
>>         Oleg
>> _____________________________________________________________
>> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
>> Sternberg Astronomical Institute, Moscow University (Russia)
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(095)939-16-83, +007(095)939-23-83
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 2: Don't 'kill -9' the postmaster
>>
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: tsearch2 seem very slow

From
"Ahmad Fajar"
Date:
Hi Oleg,

> what king of garbage ? Probably you index not needed token types, for
> example, email address, file names....

> do you need proximity ? If no, use strip(tsvector) function to remove
> coordinate information from tsvector.

I need proximity. Some time I have to rank my article and make a chart for
that.

> don't index default configuration and index only needed tokens, for
> example, to index only 3 type of tokens, first create 'qq' configuration
> and specify tokens to index.

> insert into pg_ts_cfg values('qq','default','en_US');
-- tokens to index
> insert into pg_ts_cfgmap values('qq','lhword','{en_ispell,en_stem}');
> insert into pg_ts_cfgmap values('qq','lword','{en_ispell,en_stem}');
> insert into pg_ts_cfgmap values('qq','lpart_hword','{en_ispell,en_stem}');

I still don't understand about tsearch2 configuration, so until now I just
use default configuration. I will try your suggestion. But how can I get the
en_ispell? Does my system will know if I use: ....,'{en_ispell,en_stem}';
From default configuration I only see: ..., '{en_stem}';

> Beside that, I still have problem, if I do a simple query like:
> Select ids, keywords from dict where keywords='blabla' ('blabla' is a
single
> word); The table have 200 million rows, I have index the keywords field.
On
> the first time my query seem to slow to get the result, about 15-60 sec to
> get the result. I use latest pgAdmin3 to test all queries. But if I repeat
> the query I will get fast result. My question is why on the first time the
> query seem to slow.

> because index pages should be readed from disk into shared buffers, so
> next query will benefit from that. You need enough shared memory to get
> real benefit. You may get postgresql stats and look on cache hit ration.

> btw, how does your query ( keywords='blabla') relates to tsearch2 ?

(Keywords='blabla') isn't related to tsearch2, I just got an idea from
tsearch2 and try different approach. But I stuck on the query result speed.
Very slow to get result on the first query.
And how to see postgresql stats and look on cache hit ratio? I still don't
know how to get it.

> I try to cluster the table base on keyword index, but after 15 hours
> waiting and it doesn't finish I stop clustering.

> don't use cluster for big tables ! simple
>  select *  into clustered_foo from foo order by indexed_field
> would be faster and does the same job.

What the use of clustered_foo table? And how to use it?
I think it will not distinct duplicate rows. And the clustered_foo table
still not have an index, so if query to this table, I think the query will
be very slow to get a result.

Regards,
ahmad fajar


Re: tsearch2 seem very slow

From
Oleg Bartunov
Date:
Ahmad,

On Mon, 26 Sep 2005, Ahmad Fajar wrote:

> Hi Oleg,
>
>> what king of garbage ? Probably you index not needed token types, for
>> example, email address, file names....
>
>> do you need proximity ? If no, use strip(tsvector) function to remove
>> coordinate information from tsvector.
>
> I need proximity. Some time I have to rank my article and make a chart for
> that.
>
>> don't index default configuration and index only needed tokens, for
>> example, to index only 3 type of tokens, first create 'qq' configuration
>> and specify tokens to index.
>
>> insert into pg_ts_cfg values('qq','default','en_US');
> -- tokens to index
>> insert into pg_ts_cfgmap values('qq','lhword','{en_ispell,en_stem}');
>> insert into pg_ts_cfgmap values('qq','lword','{en_ispell,en_stem}');
>> insert into pg_ts_cfgmap values('qq','lpart_hword','{en_ispell,en_stem}');
>
> I still don't understand about tsearch2 configuration, so until now I just
> use default configuration. I will try your suggestion. But how can I get the
> en_ispell? Does my system will know if I use: ....,'{en_ispell,en_stem}';
>> From default configuration I only see: ..., '{en_stem}';

I think you should read documentation. I couldn't explain you things already
written.

>
>> Beside that, I still have problem, if I do a simple query like:
>> Select ids, keywords from dict where keywords='blabla' ('blabla' is a
> single
>> word); The table have 200 million rows, I have index the keywords field.
> On
>> the first time my query seem to slow to get the result, about 15-60 sec to
>> get the result. I use latest pgAdmin3 to test all queries. But if I repeat
>> the query I will get fast result. My question is why on the first time the
>> query seem to slow.
>
>> because index pages should be readed from disk into shared buffers, so
>> next query will benefit from that. You need enough shared memory to get
>> real benefit. You may get postgresql stats and look on cache hit ration.
>
>> btw, how does your query ( keywords='blabla') relates to tsearch2 ?
>
> (Keywords='blabla') isn't related to tsearch2, I just got an idea from
> tsearch2 and try different approach. But I stuck on the query result speed.
> Very slow to get result on the first query.
> And how to see postgresql stats and look on cache hit ratio? I still don't
> know how to get it.
>

learn from http://www.postgresql.org/docs/8.0/static/monitoring-stats.html

>> I try to cluster the table base on keyword index, but after 15 hours
>> waiting and it doesn't finish I stop clustering.
>
>> don't use cluster for big tables ! simple
>>  select *  into clustered_foo from foo order by indexed_field
>> would be faster and does the same job.
>
> What the use of clustered_foo table? And how to use it?
> I think it will not distinct duplicate rows. And the clustered_foo table
> still not have an index, so if query to this table, I think the query will
> be very slow to get a result.

oh guy, you certainly need to read documentation
http://www.postgresql.org/docs/8.0/static/sql-cluster.html


>
> Regards,
> ahmad fajar
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: tsearch2 seem very slow

From
"Ahmad Fajar"
Date:
Hi Oleg,

Thanks, I will read your documentation again, and try to understand what I
miss. And about pgmanual, it is very help me. I'll take attention on that.

Regards,
ahmad fajar

-----Original Message-----
From: Oleg Bartunov [mailto:oleg@sai.msu.su]
Sent: Monday, September 26, 2005 3:12 AM
To: Ahmad Fajar
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] tsearch2 seem very slow

Ahmad,

On Mon, 26 Sep 2005, Ahmad Fajar wrote:

> Hi Oleg,
>
>> what king of garbage ? Probably you index not needed token types, for
>> example, email address, file names....
>
>> do you need proximity ? If no, use strip(tsvector) function to remove
>> coordinate information from tsvector.
>
> I need proximity. Some time I have to rank my article and make a chart for
> that.
>
>> don't index default configuration and index only needed tokens, for
>> example, to index only 3 type of tokens, first create 'qq' configuration
>> and specify tokens to index.
>
>> insert into pg_ts_cfg values('qq','default','en_US');
> -- tokens to index
>> insert into pg_ts_cfgmap values('qq','lhword','{en_ispell,en_stem}');
>> insert into pg_ts_cfgmap values('qq','lword','{en_ispell,en_stem}');
>> insert into pg_ts_cfgmap
values('qq','lpart_hword','{en_ispell,en_stem}');
>
> I still don't understand about tsearch2 configuration, so until now I just
> use default configuration. I will try your suggestion. But how can I get
the
> en_ispell? Does my system will know if I use: ....,'{en_ispell,en_stem}';
>> From default configuration I only see: ..., '{en_stem}';

I think you should read documentation. I couldn't explain you things already
written.

>
>> Beside that, I still have problem, if I do a simple query like:
>> Select ids, keywords from dict where keywords='blabla' ('blabla' is a
> single
>> word); The table have 200 million rows, I have index the keywords field.
> On
>> the first time my query seem to slow to get the result, about 15-60 sec
to
>> get the result. I use latest pgAdmin3 to test all queries. But if I
repeat
>> the query I will get fast result. My question is why on the first time
the
>> query seem to slow.
>
>> because index pages should be readed from disk into shared buffers, so
>> next query will benefit from that. You need enough shared memory to get
>> real benefit. You may get postgresql stats and look on cache hit ration.
>
>> btw, how does your query ( keywords='blabla') relates to tsearch2 ?
>
> (Keywords='blabla') isn't related to tsearch2, I just got an idea from
> tsearch2 and try different approach. But I stuck on the query result
speed.
> Very slow to get result on the first query.
> And how to see postgresql stats and look on cache hit ratio? I still don't
> know how to get it.
>

learn from http://www.postgresql.org/docs/8.0/static/monitoring-stats.html

>> I try to cluster the table base on keyword index, but after 15 hours
>> waiting and it doesn't finish I stop clustering.
>
>> don't use cluster for big tables ! simple
>>  select *  into clustered_foo from foo order by indexed_field
>> would be faster and does the same job.
>
> What the use of clustered_foo table? And how to use it?
> I think it will not distinct duplicate rows. And the clustered_foo table
> still not have an index, so if query to this table, I think the query will
> be very slow to get a result.

oh guy, you certainly need to read documentation
http://www.postgresql.org/docs/8.0/static/sql-cluster.html


>
> Regards,
> ahmad fajar
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83