Thread: very long update gin index troubles back?
I've a table that contain a tsvector that is indexed (gin) and triggers to update the tsvector that should then update the index. This gin index has always been problematic. Recreation and updates were very slow. Now I had to update 1M rows of that table but for columns that doesn't involve the tsvector I dropped the trigger to update the tsvector so that when rows get updated the trigger won't be called so things should be faster... but still it is taking forever. begin; set constraints all deferred; select * from FT1IDX_trigger_drop(); update catalog_items set APrice=p.PrezzoA, BPrice=p.PrezzoB from import.catalog_prices p where catalog_items.ItemID=p.id; select * from FT1IDX_trigger_create(); commit; function are used since I've 2 triggers actually that I drop and create. Is there anything wrong in the above to make this update so slow on a 2x Xeon 3.2GHz 4GbRAM and a RAID1 [sic] I know it is slow on write. -- Ivan Sergio Borgonovo http://www.webthatworks.it
Ответишь ему что-нибудь? Он так мутно излагает, что я ни хрена не понял. Ivan Sergio Borgonovo wrote: > I've a table that contain a tsvector that is indexed (gin) and > triggers to update the tsvector that should then update the index. > > This gin index has always been problematic. Recreation and updates > were very slow. > > Now I had to update 1M rows of that table but for columns > that doesn't involve the tsvector > I dropped the trigger to update the tsvector so that when rows get > updated the trigger won't be called so things should be faster... > but still it is taking forever. > > begin; > set constraints all deferred; > > select * from FT1IDX_trigger_drop(); > update catalog_items set > APrice=p.PrezzoA, > BPrice=p.PrezzoB > from import.catalog_prices p where > catalog_items.ItemID=p.id; > select * from FT1IDX_trigger_create(); > commit; > > function are used since I've 2 triggers actually that I drop and > create. > > Is there anything wrong in the above to make this update so slow on > a 2x Xeon 3.2GHz 4GbRAM and a RAID1 [sic] I know it is slow on write. > > -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On Fri, 23 Jan 2009, Ivan Sergio Borgonovo wrote: > I've a table that contain a tsvector that is indexed (gin) and > triggers to update the tsvector that should then update the index. > > This gin index has always been problematic. Recreation and updates > were very slow. > > Now I had to update 1M rows of that table but for columns > that doesn't involve the tsvector > I dropped the trigger to update the tsvector so that when rows get > updated the trigger won't be called so things should be faster... > but still it is taking forever. all indexes will be updated, so if you might want to drop indexes and rebuild from the scratch. > > begin; > set constraints all deferred; > > select * from FT1IDX_trigger_drop(); > update catalog_items set > APrice=p.PrezzoA, > BPrice=p.PrezzoB > from import.catalog_prices p where > catalog_items.ItemID=p.id; > select * from FT1IDX_trigger_create(); > commit; > > function are used since I've 2 triggers actually that I drop and > create. > > Is there anything wrong in the above to make this update so slow on > a 2x Xeon 3.2GHz 4GbRAM and a RAID1 [sic] I know it is slow on write. > > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
GIN index is slow for update by its construction. When you update the rows with or without columns indexed by GIN, postgres (in most cases) will insert new records, so index insertion will occur. So, for large updates it's much cheaper to drop and create index. That was a one of reasons to develop fast_insert_gin patch which now in review process. Ivan Sergio Borgonovo wrote: > I've a table that contain a tsvector that is indexed (gin) and > triggers to update the tsvector that should then update the index. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On Sat, 24 Jan 2009 15:54:37 +0300 Teodor Sigaev <teodor@sigaev.ru> wrote: > GIN index is slow for update by its construction. When you update > the rows with or without columns indexed by GIN, postgres (in most > cases) will insert new records, so index insertion will occur. So, > for large updates it's much cheaper to drop and create index. > > That was a one of reasons to develop fast_insert_gin patch which > now in review process. Somehow the update succeeded... but for better preparing to the future... I'm not sure I understood, so I'll provide more details. I've something like: create table t1( agg tsvector, a varchar(10), b varchar(10), c varchar(10), d int ); then 2 triggers that for every update to t1 fill up agg. agg:=tsvactor(coalesce(a,'')) || tsvactor(coalesce(b,'')) || tsvactor(coalesce(c,'')); and I have a gin index on agg. No matter if I drop the trigger that update agg content and the fact that I'm just updating d, postgresql will update the index? Right? -- Ivan Sergio Borgonovo http://www.webthatworks.it
> No matter if I drop the trigger that update agg content and the fact > that I'm just updating d, postgresql will update the index? Yes, due to MVCC. Update of row could produce new version (tuple) and new version should be index as old one. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On Tue, 27 Jan 2009 20:45:53 +0300 Teodor Sigaev <teodor@sigaev.ru> wrote: > > No matter if I drop the trigger that update agg content and the > > fact that I'm just updating d, postgresql will update the index? > Yes, due to MVCC. Update of row could produce new version (tuple) > and new version should be index as old one. Does that mean that it could be a good choice to place the tsvector in another table? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it
On Wed, 28 Jan 2009, Ivan Sergio Borgonovo wrote: > On Tue, 27 Jan 2009 20:45:53 +0300 > Teodor Sigaev <teodor@sigaev.ru> wrote: > >>> No matter if I drop the trigger that update agg content and the >>> fact that I'm just updating d, postgresql will update the index? >> Yes, due to MVCC. Update of row could produce new version (tuple) >> and new version should be index as old one. > > Does that mean that it could be a good choice to place the tsvector > in another table? this is a trade-off - additional join Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
On Wed, 28 Jan 2009 23:38:47 +0300 (MSK) Oleg Bartunov <oleg@sai.msu.su> wrote: > On Wed, 28 Jan 2009, Ivan Sergio Borgonovo wrote: > > On Tue, 27 Jan 2009 20:45:53 +0300 > > Teodor Sigaev <teodor@sigaev.ru> wrote: > >>> No matter if I drop the trigger that update agg content and the > >>> fact that I'm just updating d, postgresql will update the > >>> index? > >> Yes, due to MVCC. Update of row could produce new version > >> (tuple) and new version should be index as old one. > > Does that mean that it could be a good choice to place the > > tsvector in another table? > this is a trade-off - additional join gin index creation/update seems to put a lot of pressure on the box. While the total amount of time saved for every search may exceed the time saved avoiding to reindex when not strictly necessary, when the box is reindexing a large set of records it is on its knees. Consider that, excluding the update that cause the reindex, the table could be considered "readonly" and during tests it was actually "readonly" while in production I may expect no more than 10 clients waiting to write on the table during an index update. Still I'm wondering how the index updated when I eg. do something like: create table1 ( x int, y int, fti tsvector ); update table1 set x=10 where y=11; and more than one record is involved. Will the whole table be locked till the whole statement end or the table will be locked just when every single row is updated till that row is reindexed? Is there any way to control "reindexing" priority (a sort of nice)? CONCURRENTLY doesn't look as what I'm looking for. A solution that could be just fired and forgotten could be OK. Once you define an index CONCURRENTLY I can't understand how to check for "invalidity" and when. The problem of scheduling the update of the tsvector is that a search may return a wrong result till the tsvector is updated. It would be nice if when an index entry is stale postgresql could look at "real data"... but well that's not a field I can speculate on. thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it