Re: Gin index insert performance issue - Mailing list pgsql-performance

From Jesper Krogh
Subject Re: Gin index insert performance issue
Date
Msg-id 4F5EE965.8010706@krogh.cc
Whole thread Raw
In response to Gin index insert performance issue  (Rural Hunter <ruralhunter@gmail.com>)
Responses Re: Gin index insert performance issue  (Rural Hunter <ruralhunter@gmail.com>)
Re: Gin index insert performance issue  (Rural Hunter <ruralhunter@gmail.com>)
List pgsql-performance
On 13/03/12 06:43, Rural Hunter wrote:
> I tried to increase work_mem but the inserts hang more time each time
> with less frequency. So it makes almost no difference for the total
> hanging time. Frequent vacuum is not a choice since the hang happens
> very 3-5 mins. is there any improvement I can make with pg for such
> data volumn(still increasing) or it's time to turn to other full text
> search solution such as lucene etc?


We're using gin for fts-search, current index-size is up to 149GB and yes
the update process is quite tough on the disk-io-subsystem.

What you're experiencing is filling of the fastupdate queue, thats being
flushed. Setting wok_mem higher causes the system to stall for longer
period less frequent and has a side cost on queries that need to go through
the pending list (that is bigger) in addition to the index-search. To me
it seems like all other writing/updating processes are being stalled when
the pending list is flushed, but I am not sure about the specifice here.

Our solution is to turn "fastupdate" off for our gin-indices.
http://www.postgresql.org/docs/9.0/static/sql-createindex.html
Can also be set with ALTER TABLE ALTER INDEX

I would have preferred a "backend local" batch-update process so it
could batch up everything from its own transaction instead of interferring
with other transactions.

I would say, that we came from Xapian and the PG-index is a way better
fit for our application. The benefits of having the fts index next to all the
other data saves a significant amount of development time in the application
both in terms of development and maintaince. (simpler, easier and more manageble).

--
Jesper

pgsql-performance by date:

Previous
From: Rural Hunter
Date:
Subject: Gin index insert performance issue
Next
From: Rural Hunter
Date:
Subject: Re: Gin index insert performance issue