Thread: performance problems inserting random vals in index
Hi,

I have a simple table with indexes on 2 integer columns. Data is inserted
very often (no updates, no deletes, just inserts): at least 4000-5000 rows
per second. The input for the 2 indexed columns is very random.

Everything is "fine" for the first 10-20M rows; after that, performance
gets worse and worse, and by 50M rows I can't insert more than 1500 rows
per second. How can I improve it?

1) The table is already partitioned; at the moment it's based on 30
partitions. What kind of "select" performance impact would I hit going to
3000 partitions? I know in the 3000-partition case there will be fewer rows
per partition, but index seek times are O(log N), right? So a select that
has to probe every partition would cost 3000*log(N/3000), which is much
bigger than 30*log(N/30).

2) How much improvement can I expect going to RAID 1+0 on 15k rpm disks
instead of RAID 5 on 7200 rpm disks? I know, I know, I shouldn't even be
testing such a huge workload with such a setup, but I can't try anything
else at the moment. The disk array is pretty fast anyway: 190 MB/s
sequential output, 320 MB/s sequential input, 1600 random seeks/s
(bonnie++).

3) Is there anything else I can try to "help" postgres update those indexes
faster?
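For reference, a minimal sketch of the kind of setup being described. All
table, column, and partition names here are my assumptions (the thread never
shows the schema), and the partitioning uses the inheritance + CHECK
constraint scheme that PostgreSQL 8.x requires:

```sql
-- Hypothetical parent table with the two very random integer columns.
CREATE TABLE events (
    a integer NOT NULL,
    b integer NOT NULL
);

-- One of the 30 partitions, done the pre-9.x way: a child table that
-- inherits the parent, with a CHECK constraint for constraint exclusion.
CREATE TABLE events_p00 (
    CHECK (a % 30 = 0)       -- assumed partitioning rule
) INHERITS (events);

-- The two per-partition indexes whose random-key maintenance is what
-- dominates the insert cost as the table grows.
CREATE INDEX events_p00_a_idx ON events_p00 (a);
CREATE INDEX events_p00_b_idx ON events_p00 (b);
```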
Leonardo F <m_lists@yahoo.it> writes:
> I have a simple table that has indexes on 2 integer columns.
> Data is inserted very often (no updates, no deletes, just inserts):
> at least 4000/5000 rows per second.
> The input for the 2 indexed columns is very random.

> Everything is "fine" for the first 10-20M rows; after that, performance
> gets worse and worse, and by 50M rows I can't insert more than
> 1500 rows per second.

> How can I improve it?

Does it help to reindex the index at that point?

> 1) the table is already partitioned; at the moment it's based on
> 30 partitions. What kind of "select" performance impact would I
> hit going to 3000 partitions?

Bad. The partitioning code isn't designed to scale beyond a few dozen
partitions.

            regards, tom lane
On Mon, Apr 19, 2010 at 11:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Leonardo F <m_lists@yahoo.it> writes:
>> I have a simple table that has indexes on 2 integer columns.
>> Data is inserted very often (no updates, no deletes, just inserts):
>> at least 4000/5000 rows per second.
>> The input for the 2 indexed columns is very random.
>
>> Everything is "fine" for the first 10-20M rows; after that, performance
>> gets worse and worse, and by 50M rows I can't insert more than
>> 1500 rows per second.
>
>> How can I improve it?
>
> Does it help to reindex the index at that point?
>
>> 1) the table is already partitioned; at the moment it's based on
>> 30 partitions. What kind of "select" performance impact would I
>> hit going to 3000 partitions?
>
> Bad. The partitioning code isn't designed to scale beyond a few dozen
> partitions.

On a few very narrow applications I've gotten good performance in the
low hundreds. After that things fall apart quickly.
Leonardo F wrote:
> Is there anything else I can try to "help" postgres update those
> index faster?

It sounds like your problem is that read/write activity on the indexes is
becoming increasingly large and random as more data is inserted. There are
two standard ways to improve on that:

1) Periodically rebuild the indexes and hope that the new version is laid
out better than what you've ended up with after all the random key
insertions.

2) Optimize your database server configuration to perform better under this
particular situation. The usual set of tricks is to increase
shared_buffers, checkpoint_segments, and checkpoint_timeout to cut down on
the amount of time the database has to spend writing to the index blocks,
and to improve the odds that the ones it needs to read are already in its
cache.

It's hard to say whether increasing raw disk speed will help you as much as
you'd like. Index-related performance is often driven by whether the
working set needed to operate on the indexes efficiently fits in RAM. Once
you've exceeded that, performance drops really fast, and a linear increase
in disk speed may not recover very much of it. You can look at the size of
all the active indexes using something like the first query at
http://wiki.postgresql.org/wiki/Disk_Usage to get an idea how big they are
relative to RAM. Sometimes having more memory is the only good way to scale
upwards in this situation while retaining something close to the original
performance.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
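As a rough illustration of the tuning direction described above, the knobs
in question live in postgresql.conf. The values below are my illustrative
assumptions for a machine with plenty of RAM, not recommendations from the
thread; the right numbers depend on available memory and how much crash
recovery time you can tolerate:

```
# postgresql.conf -- sketch of the settings mentioned above
# (values are assumptions, tune to your hardware)
shared_buffers = 8GB                 # keep hot index blocks in cache
checkpoint_segments = 64             # fewer, larger checkpoints (pre-9.5 setting)
checkpoint_timeout = 15min           # spread checkpoint I/O over longer intervals
checkpoint_completion_target = 0.9   # smooth checkpoint writes across the interval
```

Larger, less frequent checkpoints mean a dirty index block can absorb many
random insertions in shared_buffers before it has to be written out once,
instead of being flushed repeatedly.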
> Does it help to reindex the index at that point?

Didn't try; I'd guess a reindex of such a big table would block inserts for
a long time... but I'll try it.

> Bad. The partitioning code isn't designed to scale
> beyond a few dozen partitions.

What kind of problems would I experience? Is it just N separate index
accesses, or something worse? Is it that it takes a linear amount of time
to "pick" the right partition?
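One way to sidestep the blocking concern with a plain REINDEX (which takes
an exclusive lock on writes) is to build a replacement index with CREATE
INDEX CONCURRENTLY, available since PostgreSQL 8.2, and then swap it in.
A sketch, with index and table names assumed:

```sql
-- Build a fresh, compact copy of the bloated index without blocking
-- inserts (slower than a plain CREATE INDEX, but writes keep flowing).
CREATE INDEX CONCURRENTLY events_a_idx_new ON events (a);

-- Once the build finishes, swap it in for the old index.
DROP INDEX events_a_idx;
ALTER INDEX events_a_idx_new RENAME TO events_a_idx;
```

With a partitioned setup this has to be repeated per partition, which also
means the rebuild can be staggered rather than done in one long pass.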
> On a few very narrow applications I've gotten good
> performance in the low hundreds. After that things fall
> apart quickly.

Ehm... what exactly does "fall apart quickly" mean? I can trade some
"select" speed for "insert" speed... I don't have any experience with
partitioning, so if some of you already know what's going to happen, I
could avoid testing, or at least avoid testing the wrong setup...

Thank you
> The usual set of tricks is to increase shared_buffers,
> checkpoint_segments, and checkpoint_timeout to cut down

Uh, didn't know shared_buffers played a role in index insertion as well...
got to try that. Thank you.
> The usual set of tricks is to increase shared_buffers,
> checkpoint_segments, and checkpoint_timeout to cut down

That did it: setting a much higher shared_buffers helped quite a lot.
Thank you everybody for your suggestions.
I'm sorry I have to come back to this, but the solution the list gave
helped without fully solving my problem...

To sum up: I have a simple table with indexes on 2 integer columns. Data is
inserted very often (no updates, no deletes, just inserts): at least
4000-5000 rows per second. The input for the 2 indexed columns is very
random.

I was experiencing bad insertion times past 20M rows. Some of you suggested
it might be the indexes growing too big to fit in shared_buffers, so I
raised it and, in fact, that solved the issue... but only for another 50M
rows: at roughly 70M rows I'm back to poor insertion times (disk always
100% busy). The indexes at that point are 1.7GB each, and my shared_buffers
is 10GB (the machine has 32GB). So I'd expect the indexes to fit entirely
in RAM, yet the disk array is 100% busy... What's going on? What can I try
to get back to good insertion performance?
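When sizing against shared_buffers, it's worth summing the on-disk size of
every index rather than looking at one: with partitions, it's the combined
footprint of all per-partition indexes (plus the heap blocks being written)
that competes for cache. A sketch using the standard pg_relation_size and
pg_size_pretty functions (available in 8.x):

```sql
-- List every index in the database by on-disk size, biggest first,
-- then compare the total against shared_buffers.
SELECT c.relname,
       pg_size_pretty(pg_relation_size(c.oid)) AS index_size
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
ORDER BY pg_relation_size(c.oid) DESC;
```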