Random penalties on GIN index updates?

From: jesper@krogh.cc
Date:
Hi (running PG 8.4.1)

As far as I have gotten in my test of PG Full Text Search: I have over 6m
documents indexed so far, and the index has grown to 37GB. The system didn't
do any autovacuums in the process, but I manually vacuumed a few times, and
that stopped the growth for a short period of time.

 table_name |   index_name    | times_used | table_size | index_size | num_writes |                              definition
------------+-----------------+------------+------------+------------+------------+-----------------------------------------------------------------------
 ftstest    | body_tfs_idx    |        171 | 5071 MB    | 37 GB      |    6122086 | CREATE INDEX ftstest_tfs_idx ON ftstest USING gin (ftstest_body_fts)
(1 row)

This is sort of what I'd expect; it is not more scary than the Xapian index
it is being compared with. Search speed seems excellent. But I feel I'm
getting a significant drop-off in indexing speed as time goes by, though I
don't have numbers to confirm this.

If I understand the technicalities correctly, INSERTs/UPDATEs to the
index are accumulated in "maintenance_work_mem", and the "user" who is
unlucky enough to fill it up pays the penalty of merging all the
changes into the index?

I currently have "maintenance_work_mem" set to 128MB, and according to
"pg_stat_activity" I currently have an INSERT that has been sitting for over
an hour. If I strace the postgres process id, it is reading and writing a
lot on the filesystem and imposing an I/O-wait load of one CPU.
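
(A minimal sketch of how such a stuck statement can be spotted in
pg_stat_activity on 8.4; the column names changed in later releases, and this
is untested here:)

    SELECT procpid, query_start, current_query
      FROM pg_stat_activity
     WHERE current_query ILIKE 'insert%'
     ORDER BY query_start;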

Can I do something to prevent this from happening? Is it "by design"?

--
Jesper


Re: Random penalties on GIN index updates?

From: Tom Lane
Date:
jesper@krogh.cc writes:
> If I understand the technicalities correctly, INSERTs/UPDATEs to the
> index are accumulated in "maintenance_work_mem", and the "user" who is
> unlucky enough to fill it up pays the penalty of merging all the
> changes into the index?

You can turn off the "fastupdate" index parameter to disable that,
but I think there may be a penalty in index bloat as well as insertion
speed.  It would be better to use a more conservative work_mem
(work_mem, not maintenance_work_mem, is what limits the amount of stuff
accumulated during normal inserts).
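
(For what it's worth, a minimal sketch of both options, using the index name
from upthread; note that entries already queued in the pending list stay
there until a vacuum flushes them:)

    -- disable the GIN pending-list ("fast update") mechanism for this index
    ALTER INDEX ftstest_tfs_idx SET (fastupdate = off);

    -- or keep fastupdate and just cap the pending list by lowering work_mem
    -- in the inserting sessions
    SET work_mem = '16MB';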

            regards, tom lane

Re: Random penalties on GIN index updates?

From: Jesper Krogh
Date:
Tom Lane wrote:
> jesper@krogh.cc writes:
>> If I understand the technicalities correctly, INSERTs/UPDATEs to the
>> index are accumulated in "maintenance_work_mem", and the "user" who is
>> unlucky enough to fill it up pays the penalty of merging all the
>> changes into the index?
>
> You can turn off the "fastupdate" index parameter to disable that,
> but I think there may be a penalty in index bloat as well as insertion
> speed.  It would be better to use a more conservative work_mem
> (work_mem, not maintenance_work_mem, is what limits the amount of stuff
> accumulated during normal inserts).

Ok, I read the manual about that. It seems worth testing. What I'm seeing is
stuff like this:

2009-10-21T16:32:21
2009-10-21T16:32:25
2009-10-21T16:32:30
2009-10-21T16:32:35
2009-10-21T17:10:50
2009-10-21T17:10:59
2009-10-21T17:11:09
... then it went on steadily for another 180,000 documents.

Each row is a printout from the application doing the INSERTs; it prints the
time after each 1,000 rows it gets through. It is the 38 minutes in the
middle that I'm a bit worried about.

work_mem is set to 512MB; might that translate into roughly 180,000 documents
on my system?
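
(Back-of-the-envelope: if the pending list really does flush at around 512MB,
then 512MB / 180,000 documents is roughly 3KB of pending-list entries per
document, which at least seems plausible for tsvectors of this kind.)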

What I seem to be missing is a way to make sure some "background" application
is the one that pays the penalty, so a random user doing a single insert
won't get stuck. Is that doable?

It also seems to lock out other inserts while it is in this state.

--
Jesper

Re: Random penalties on GIN index updates?

From: Tom Lane
Date:
Jesper Krogh <jesper@krogh.cc> writes:
> What I seem to be missing is a way to make sure some "background"
> application is the one that pays the penalty, so a random user doing a
> single insert won't get stuck. Is that doable?

You could force a vacuum every so often, but I don't think that will
help the locking situation.  You really need to back off work_mem ---
512MB is probably not a sane global value for that anyway.
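
(A minimal sketch of the "force a vacuum every so often" idea; a plain VACUUM
on the table merges the GIN pending list into the main index structure, so
something like this, run periodically from cron against the table upthread,
would do it, locking caveats aside:)

    -- merge whatever has accumulated in the GIN pending list
    VACUUM ftstest;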

            regards, tom lane

Re: Random penalties on GIN index updates?

From: Robert Haas
Date:
On Wed, Oct 21, 2009 at 2:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jesper Krogh <jesper@krogh.cc> writes:
>> What I seem to be missing is a way to make sure some "background"
>> application is the one that pays the penalty, so a random user doing a
>> single insert won't get stuck. Is that doable?
>
> You could force a vacuum every so often, but I don't think that will
> help the locking situation.  You really need to back off work_mem ---
> 512MB is probably not a sane global value for that anyway.

Yeah, it's hard to imagine a system where that doesn't threaten all
kinds of other bad results.  I bet setting this to 4MB will make this
problem largely go away.

Arguably we shouldn't be using work_mem to control this particular
behavior, but...
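
(Concretely, that is a one-line change which can also be tried per session
before touching the global config; a sketch:)

    -- try it in the loading session first
    SET work_mem = '4MB';

    -- then, if it helps, make it the global default:
    --   postgresql.conf:  work_mem = 4MB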

...Robert

Re: Random penalties on GIN index updates?

From: Jesper Krogh
Date:
Robert Haas wrote:
> On Wed, Oct 21, 2009 at 2:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Jesper Krogh <jesper@krogh.cc> writes:
>>> What I seem to be missing is a way to make sure some "background"
>>> application is the one that pays the penalty, so a random user doing a
>>> single insert won't get stuck. Is that doable?
>> You could force a vacuum every so often, but I don't think that will
>> help the locking situation.  You really need to back off work_mem ---
>> 512MB is probably not a sane global value for that anyway.
>
> Yeah, it's hard to imagine a system where that doesn't threaten all
> kinds of other bad results.  I bet setting this to 4MB will make this
> problem largely go away.
>
> Arguably we shouldn't be using work_mem to control this particular
> behavior, but...

I came from Xapian, where you can only have one writer process, but where
batching up several GB's worth improved indexing performance dramatically.
Lowering work_mem to 16MB gives "batches" of 11,000 documents and stalls of
between 45 and 90s. ~33 docs/s.

--
Jesper