Thread: very long update gin index troubles back?

very long update gin index troubles back?

From
Ivan Sergio Borgonovo
Date:
I've a table that contain a tsvector that is indexed (gin) and
triggers to update the tsvector that should then update the index.

This gin index has always been problematic. Recreation and updates
were very slow.

Now I had to update 1M rows of that table but for columns
that doesn't involve the tsvector
I dropped the trigger to update the tsvector so that when rows get
updated the trigger won't be called so things should be faster...
but still it is taking forever.

begin;
        set constraints all deferred;

        select * from FT1IDX_trigger_drop();
        update catalog_items set
        APrice=p.PrezzoA,
        BPrice=p.PrezzoB
        from import.catalog_prices p where
        catalog_items.ItemID=p.id;
    select * from FT1IDX_trigger_create();
commit;

function are used since I've 2 triggers actually that I drop and
create.

Is there anything wrong in the above to make this update so slow on
a 2x Xeon 3.2GHz 4GbRAM and a RAID1 [sic] I know it is slow on write.


--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: very long update gin index troubles back?

From
Teodor Sigaev
Date:
Ответишь ему что-нибудь? Он так мутно излагает, что я ни хрена не понял.

Ivan Sergio Borgonovo wrote:
> I've a table that contain a tsvector that is indexed (gin) and
> triggers to update the tsvector that should then update the index.
>
> This gin index has always been problematic. Recreation and updates
> were very slow.
>
> Now I had to update 1M rows of that table but for columns
> that doesn't involve the tsvector
> I dropped the trigger to update the tsvector so that when rows get
> updated the trigger won't be called so things should be faster...
> but still it is taking forever.
>
> begin;
>         set constraints all deferred;
>
>         select * from FT1IDX_trigger_drop();
>         update catalog_items set
>         APrice=p.PrezzoA,
>         BPrice=p.PrezzoB
>         from import.catalog_prices p where
>         catalog_items.ItemID=p.id;
>     select * from FT1IDX_trigger_create();
> commit;
>
> function are used since I've 2 triggers actually that I drop and
> create.
>
> Is there anything wrong in the above to make this update so slow on
> a 2x Xeon 3.2GHz 4GbRAM and a RAID1 [sic] I know it is slow on write.
>
>

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Re: very long update gin index troubles back?

From
Oleg Bartunov
Date:
On Fri, 23 Jan 2009, Ivan Sergio Borgonovo wrote:

> I've a table that contain a tsvector that is indexed (gin) and
> triggers to update the tsvector that should then update the index.
>
> This gin index has always been problematic. Recreation and updates
> were very slow.
>
> Now I had to update 1M rows of that table but for columns
> that doesn't involve the tsvector
> I dropped the trigger to update the tsvector so that when rows get
> updated the trigger won't be called so things should be faster...
> but still it is taking forever.

all indexes will be updated, so if you might want to drop indexes and
rebuild from the scratch.

>
> begin;
>        set constraints all deferred;
>
>        select * from FT1IDX_trigger_drop();
>        update catalog_items set
>         APrice=p.PrezzoA,
>         BPrice=p.PrezzoB
>         from import.catalog_prices p where
>         catalog_items.ItemID=p.id;
>     select * from FT1IDX_trigger_create();
> commit;
>
> function are used since I've 2 triggers actually that I drop and
> create.
>
> Is there anything wrong in the above to make this update so slow on
> a 2x Xeon 3.2GHz 4GbRAM and a RAID1 [sic] I know it is slow on write.
>
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: very long update gin index troubles back?

From
Teodor Sigaev
Date:
GIN index is slow for update by its construction. When you update the rows with
or without columns indexed by GIN, postgres (in most cases) will insert new
records, so index insertion will occur. So, for large updates it's much cheaper
to drop and create index.

That was a one of reasons to develop fast_insert_gin patch which now in review
process.

Ivan Sergio Borgonovo wrote:
> I've a table that contain a tsvector that is indexed (gin) and
> triggers to update the tsvector that should then update the index.


--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Re: very long update gin index troubles back?

From
Ivan Sergio Borgonovo
Date:
On Sat, 24 Jan 2009 15:54:37 +0300
Teodor Sigaev <teodor@sigaev.ru> wrote:

> GIN index is slow for update by its construction. When you update
> the rows with or without columns indexed by GIN, postgres (in most
> cases) will insert new records, so index insertion will occur. So,
> for large updates it's much cheaper to drop and create index.
>
> That was a one of reasons to develop fast_insert_gin patch which
> now in review process.

Somehow the update succeeded... but for better preparing to the
future...

I'm not sure I understood, so I'll provide more details.

I've something like:

create table t1(
 agg tsvector,
 a varchar(10),
 b varchar(10),
 c varchar(10),
 d int
);

then 2 triggers that for every update to t1 fill up agg.

agg:=tsvactor(coalesce(a,'')) || tsvactor(coalesce(b,'')) ||
  tsvactor(coalesce(c,''));

and I have a gin index on agg.

No matter if I drop the trigger that update agg content and the fact
that I'm just updating d, postgresql will update the index?

Right?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: very long update gin index troubles back?

From
Teodor Sigaev
Date:
> No matter if I drop the trigger that update agg content and the fact
> that I'm just updating d, postgresql will update the index?
Yes, due to MVCC. Update of row could produce new version (tuple) and new
version should be index as old one.
--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

Re: very long update gin index troubles back?

From
Ivan Sergio Borgonovo
Date:
On Tue, 27 Jan 2009 20:45:53 +0300
Teodor Sigaev <teodor@sigaev.ru> wrote:

> > No matter if I drop the trigger that update agg content and the
> > fact that I'm just updating d, postgresql will update the index?
> Yes, due to MVCC. Update of row could produce new version (tuple)
> and new version should be index as old one.

Does that mean that it could be a good choice to place the tsvector
in another table?

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: very long update gin index troubles back?

From
Oleg Bartunov
Date:
On Wed, 28 Jan 2009, Ivan Sergio Borgonovo wrote:

> On Tue, 27 Jan 2009 20:45:53 +0300
> Teodor Sigaev <teodor@sigaev.ru> wrote:
>
>>> No matter if I drop the trigger that update agg content and the
>>> fact that I'm just updating d, postgresql will update the index?
>> Yes, due to MVCC. Update of row could produce new version (tuple)
>> and new version should be index as old one.
>
> Does that mean that it could be a good choice to place the tsvector
> in another table?

this is a trade-off - additional join

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: very long update gin index troubles back?

From
Ivan Sergio Borgonovo
Date:
On Wed, 28 Jan 2009 23:38:47 +0300 (MSK)
Oleg Bartunov <oleg@sai.msu.su> wrote:

> On Wed, 28 Jan 2009, Ivan Sergio Borgonovo wrote:

> > On Tue, 27 Jan 2009 20:45:53 +0300
> > Teodor Sigaev <teodor@sigaev.ru> wrote:

> >>> No matter if I drop the trigger that update agg content and the
> >>> fact that I'm just updating d, postgresql will update the
> >>> index?
> >> Yes, due to MVCC. Update of row could produce new version
> >> (tuple) and new version should be index as old one.

> > Does that mean that it could be a good choice to place the
> > tsvector in another table?

> this is a trade-off - additional join

gin index creation/update seems to put a lot of pressure on the box.

While the total amount of time saved for every search may exceed the
time saved avoiding to reindex when not strictly necessary, when the
box is reindexing a large set of records it is on its knees.

Consider that, excluding the update that cause the reindex, the
table could be considered "readonly" and during tests it was
actually "readonly" while in production I may expect no more than 10
clients waiting to write on the table during an index update.
Still I'm wondering how the index updated when I eg. do something
like:

create table1 (
 x int,
 y int,
 fti tsvector
);

update table1 set x=10 where y=11; and more than one record is
involved.
Will the whole table be locked till the whole statement end or the
table will be locked just when every single row is updated till that
row is reindexed?

Is there any way to control "reindexing" priority (a sort of nice)?

CONCURRENTLY doesn't look as what I'm looking for.

A solution that could be just fired and forgotten could be OK.
Once you define an index CONCURRENTLY I can't understand how to
check for "invalidity" and when.

The problem of scheduling the update of the tsvector is that a
search may return a wrong result till the tsvector is updated.
It would be nice if when an index entry is stale postgresql could
look at "real data"... but well that's not a field I can speculate
on.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it