Re: very long update gin index troubles back? - Mailing list pgsql-general

From Ivan Sergio Borgonovo
Subject Re: very long update gin index troubles back?
Date
Msg-id 20090129103459.50af0ae1@dawn.webthatworks.it
Whole thread Raw
In response to Re: very long update gin index troubles back?  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-general
On Wed, 28 Jan 2009 23:38:47 +0300 (MSK)
Oleg Bartunov <oleg@sai.msu.su> wrote:

> On Wed, 28 Jan 2009, Ivan Sergio Borgonovo wrote:

> > On Tue, 27 Jan 2009 20:45:53 +0300
> > Teodor Sigaev <teodor@sigaev.ru> wrote:

> >>> No matter if I drop the trigger that update agg content and the
> >>> fact that I'm just updating d, postgresql will update the
> >>> index?
> >> Yes, due to MVCC. Update of row could produce new version
> >> (tuple) and new version should be index as old one.

> > Does that mean that it could be a good choice to place the
> > tsvector in another table?

> this is a trade-off - additional join

gin index creation/update seems to put a lot of pressure on the box.

While the total amount of time saved for every search may exceed the
time saved avoiding to reindex when not strictly necessary, when the
box is reindexing a large set of records it is on its knees.

Consider that, excluding the update that cause the reindex, the
table could be considered "readonly" and during tests it was
actually "readonly" while in production I may expect no more than 10
clients waiting to write on the table during an index update.
Still I'm wondering how the index updated when I eg. do something
like:

create table1 (
 x int,
 y int,
 fti tsvector
);

update table1 set x=10 where y=11; and more than one record is
involved.
Will the whole table be locked till the whole statement end or the
table will be locked just when every single row is updated till that
row is reindexed?

Is there any way to control "reindexing" priority (a sort of nice)?

CONCURRENTLY doesn't look as what I'm looking for.

A solution that could be just fired and forgotten could be OK.
Once you define an index CONCURRENTLY I can't understand how to
check for "invalidity" and when.

The problem of scheduling the update of the tsvector is that a
search may return a wrong result till the tsvector is updated.
It would be nice if when an index entry is stale postgresql could
look at "real data"... but well that's not a field I can speculate
on.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


pgsql-general by date:

Previous
From: Alban Hertroys
Date:
Subject: Re: Slow first query despite LIMIT and OFFSET clause
Next
From: sanjeev kumar
Date:
Subject: getting no days problem