Re: tesearch2 question - Mailing list pgsql-sql

From Sumeet
Subject Re: tesearch2 question
Date
Msg-id 7539aebb0703071240l6c9c091eq163b3df16040a06f@mail.gmail.com
Whole thread Raw
In response to Re: tesearch2 question  (Oleg Bartunov <oleg@sai.msu.su>)
Responses Re: tesearch2 question  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-sql
Thanks Oleg,

My String message are Abstracts of papers, I did a

$ select avg(len) from (select length(abstract) as len from master_table limit 500) E;

          avg         
-----------------------
 1355.5907859078590786
(1 row)

so length is approx 1400.


I couldn't find any appropriate way to analyze the time for update queries, but what i did was a explain analyze

$ explain analyze select to_tsvector(article_title) from master_table limit 1000;

The total runtime was approx 500ms.

The server is Sun OS 5.10, with around 8gigs of RAM and 6cpus.

Thanks,
Sumeet.



On 3/7/07, Oleg Bartunov <oleg@sai.msu.su> wrote:
On Wed, 7 Mar 2007, Sumeet wrote:

> Hi All,
>
> I'm trying to udpate a table containing  13149741 records. And its taking
> forever to complete this process.
>
> The update query i'm trying to run is for full text indexing similiar to
>
> UPDATE tblMessages SET idxFTI=to_tsvector(strMessage);
>

How big are your strMessage ? and what's your tsearch2 configuration ?
Can you estimate how long takes updating, for example, 1000 rows ?
It looks like your system is IO bound. What's your hardware ?

>
> Below are some of the stats which might be helpful for analyzing this
>
> $top
>
>  PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
> 3091 postgres   1  43    0   46M   38M cpu/1  200:06  3.20% postgres
> 5052 postgres   1  60    0  149M  134M sleep    0:17  3.12% postgres
>
> <<<here are the top 2 processes, out of which the first process i have been
> running almost for a day and a half and it is still running,
>
> This table which i'm trying to  update has 10 indexes
>
> =========================================================
> "a_article_pk" PRIMARY KEY, btree (id)
>   "a_article_uk_pmid" UNIQUE, btree (pmid)
>   "a_article_idx_abstract" btree ("substring"(abstract::text, 0, 255))
>   "a_article_idx_date_cr_year" btree (date_cr_year)
>   "a_article_idx_ml_journal_info_medline_ta" btree (ml_journal_info_a_ta)
>   "a_article_idx_owner" btree ("owner")
>   "a_article_idx_pmid" btree (pmid)
>   "a_article_idx_status" btree (status)
>   "a_article_idx_title" btree (article_title)
>   "a_master_t_idx_year_published" btree (published_year)
> ========================================================
> But no indexes on the field i'm trying to update. The field i'm trying to
> add is a new field.
> Can anyone help me out to figure out why is it taking so much time to update
> the table.
>
> Also as u see in the above indexes, I have some indexes on some varchar
> column which i feel are totally useless unless u so a exact string match.
> But does that help in any sense for improving the speed of retreiving the
> string just normally without any search on it?
>
> Thanks,
> Sumeet.
>

        Regards,
                Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet ( www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



--
Thanks,
Sumeet Ambre
Master of Information Science Candidate,
Indiana University.

pgsql-sql by date:

Previous
From: "Andrej Ricnik-Bay"
Date:
Subject: Re: SHA-1 vs MD5
Next
From: Oleg Bartunov
Date:
Subject: Re: tesearch2 question