Thread: Gin index insert performance issue

Gin index insert performance issue

From
Rural Hunter
Date:
I'm using a gin index for my full text search engine in production. These
days the gin index has grown to 20-30 GB and the system has started to
suffer from periodic insert hangs. This is the same issue described in
these two posts:
http://postgresql.1045698.n5.nabble.com/Random-penalties-on-GIN-index-updates-td2073848.html
http://postgresql.1045698.n5.nabble.com/Periodically-slow-inserts-td3230434.html

The gin index is on a dedicated raid 10 SAS disk and the performance
should be enough for normal db operation. But I always see almost 100%
disk utilization on that disk when the inserts hang. The utilization for
other data (such as the full text table data) on another disk (same setup
as the gin index disk: SAS raid 10) is quite low compared with the gin
index disk. From my observation, it looks like too much data is written
to the disk when the pending list of the gin index is flushed. Below is
the output of 'iostat -xm 3' on the disk when the inserts hang:
Device:  rrqm/s  wrqm/s    r/s      w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sde        0.00    0.00   0.67  2614.00   0.08  22.94     18.03     32.94  12.61   0.38 100.00
sde        0.00    0.00   1.67  2377.33   0.17  20.43     17.73     32.00  13.44   0.42 100.00
sde        0.00    0.00  15.67  2320.33   0.23  20.13     17.85     31.99  13.73   0.43 100.00
sde        0.00    0.00   7.33  1525.00   0.12  14.02     18.90     32.00  20.83   0.65 100.00
sde        0.00    0.00  14.33  1664.67   0.12  15.54     19.10     32.00  19.06   0.60 100.00
sde        0.00    0.00   5.33  1654.33   0.04  12.07     14.94     32.00  19.22   0.60 100.00

I tried increasing work_mem, but then the inserts hang for longer each
time, just less frequently, so it makes almost no difference to the total
hang time. Frequent vacuuming is not an option since the hang happens
every 3-5 minutes. Is there any improvement I can make within pg for such
a data volume (still increasing), or is it time to turn to another full
text search solution such as Lucene?

Re: Gin index insert performance issue

From
Jesper Krogh
Date:
On 13/03/12 06:43, Rural Hunter wrote:
> I tried increasing work_mem, but then the inserts hang for longer each
> time, just less frequently, so it makes almost no difference to the
> total hang time. Frequent vacuuming is not an option since the hang
> happens every 3-5 minutes. Is there any improvement I can make within
> pg for such a data volume (still increasing), or is it time to turn to
> another full text search solution such as Lucene?


We're using gin for fts-search; our current index size is up to 149 GB and
yes, the update process is quite tough on the disk I/O subsystem.

What you're experiencing is the fastupdate queue filling up and then being
flushed. Setting work_mem higher causes the system to stall for longer
periods, less frequently, and it has a side cost on queries, which need to
scan the (now bigger) pending list in addition to the index search. To me
it seems like all other writing/updating processes are stalled while the
pending list is flushed, but I'm not sure about the specifics here.

Our solution is to turn "fastupdate" off for our gin-indices.
http://www.postgresql.org/docs/9.0/static/sql-createindex.html
It can also be set on an existing index with ALTER INDEX ... SET.
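
(A minimal sketch of both forms, using a hypothetical index idx_docs_fts
on documents.fts_vector; the fastupdate option is real, the object names
are placeholders:)

-- Disable fastupdate on an existing GIN index:
ALTER INDEX idx_docs_fts SET (fastupdate = off);
-- Or create the index without fastupdate in the first place:
CREATE INDEX idx_docs_fts ON documents USING gin (fts_vector)
    WITH (fastupdate = off);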

I would have preferred a "backend local" batch-update process, so it
could batch up everything from its own transaction instead of interfering
with other transactions.

I would say that we came from Xapian and the PG index is a much better
fit for our application. Having the fts index next to all the other data
saves a significant amount of time in the application, both in development
and maintenance (simpler, easier and more manageable).

--
Jesper

Re: Gin index insert performance issue

From
Rural Hunter
Date:
Thanks for the reply. Your index is much larger than mine... so I see
some light. :)

On 2012/3/13 14:29, Jesper Krogh wrote:

> Our solution is to turn "fastupdate" off for our gin-indices.
> http://www.postgresql.org/docs/9.0/static/sql-createindex.html
> It can also be set on an existing index with ALTER INDEX ... SET.
I will check and try that.
>
> I would have preferred a "backend local" batch-update process, so it
> could batch up everything from its own transaction instead of
> interfering with other transactions.
Have you tested whether there is any performance boost for backend batch
updates compared with real-time updates?
>
> I would say that we came from Xapian and the PG index is a much better
> fit for our application. Having the fts index next to all the other
> data saves a significant amount of time in the application, both in
> development and maintenance (simpler, easier and more manageable).
Yes, that's why I'm still looking for improvements inside pg. This is
really a big dev/maint saver.
>
> --
> Jesper
>


Re: Gin index insert performance issue

From
Rural Hunter
Date:
I disabled fastupdate on the gin index. It looks like that solved my
problem, at least for now. Thanks a lot for your help, Jesper!
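
(One follow-up note, as a sketch: turning fastupdate off only prevents
future insertions from going into the pending list; entries already in
the list stay there until the table is vacuumed. "documents" below is a
placeholder table name.)

-- Flush any entries still sitting in the GIN pending list:
VACUUM documents;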