Thread: strange index performance?

strange index performance?

From
Thomas Finneid
Date:
Hi

I just experienced a performance loss on inserts when redefining my
index on a table.

I have a database with the following table

table:
    id1    int
    id2    int
    id3    int
    id4    int
    val1    float
    ....
    tablespace dbspace

the data id distribution is hierarchical and even, well fitting to a
btree, there is about 20000 rows per insert (actually a jdbc COPY)

originally I had an index on all id felds

index:
    btree on (id1, id2, id3, id4) tablespace indexspace

that gave me an insert performance of 230 ms

because my query does not need id2 I changed the index and removed id2
from the criteria:

index:
    btree on (id1, id3, id4) tablespace indexspace

now an insert takes approx 330-430 ms

Anybody have any ideas why that is? I was expecting it to take
approximately the same amount of time or less, since there is an element
  less in the criteria.

regards

thomas



Re: strange index performance?

From
Scott Marlowe
Date:
On Sat, Jan 24, 2009 at 3:57 AM, Thomas Finneid <tfinneid@fcon.no> wrote:
> Hi
>
> I just experienced a performance loss on inserts when redefining my index on
> a table.
>
> I have a database with the following table
>
> table:
>   id1  int
>   id2  int
>   id3  int
>   id4  int
>   val1 float
>   ....
>   tablespace dbspace
>
> the data id distribution is hierarchical and even, well fitting to a btree,
> there is about 20000 rows per insert (actually a jdbc COPY)

Is this table constantly growing, or is it at a stable plateu?  I'd
assume a constantly growing table, or one with bloat problems would
get slower over time.  About how many rows does this table have?

Re: strange index performance?

From
Scott Marlowe
Date:
Also, what other kind of usage patterns are going on.  I wrote a
simple test case for this and on a table with 100,000 entries already
in it, then inserting 10,000 in a transaction and 10,000 outside of a
transaction, I get insert rates of 0.1 ms and 0.5 ms respectively.
With a table with 1,000,000 rows already in place, the insert times
with all the fields in an index was 1.5ms and 4.3ms respectively.

With only i1, i3, i4, val1 in the index, the numbers for a table with
100,000 entries to start with was 0.1ms and 0.5 ms, just like the
above with the larger index.  With a 1,000,000 initial table, inserts
take 2.1 and 3.0 ms respectively.

So I don't think you've found the cause of your problem with the smaller index.

Re: strange index performance?

From
Thomas Finneid
Date:
Scott Marlowe wrote:
> On Sat, Jan 24, 2009 at 3:57 AM, Thomas Finneid <tfinneid@fcon.no> wrote:
 >
> Is this table constantly growing, or is it at a stable plateu?  I'd
> assume a constantly growing table, or one with bloat problems would
> get slower over time.  About how many rows does this table have?

The table is constantly growing, by 20000 rows per second.
I did a test with an index for all id fields, and the insert time is
constant for every insert up to 1.2 billion rows. Then I stopped the
test because I didn't have more disk space allocated.

regards

thomas

Re: strange index performance?

From
Thomas Finneid
Date:
Scott Marlowe wrote:
> Also, what other kind of usage patterns are going on.

For this test there was nothing else going on, it was just that one
writer. The complete usage pattern is that there is one writer that
writes this data, about 20000 rows per second, and then a small number
of readers that query for some data based on id1,3,4.
(To help you visualize it, think of a query where you want to know the
names all residents on the 4th floor in that particular street
independent of house number. So id1 would be the street, id2 would be
the house number, id3 would be the floor number and id4 would be the
apartment number. Such a query would only use id1,3,4)

> I wrote a
> simple test case for this and on a table with 100,000 entries already
> in it, then inserting 10,000 in a transaction and 10,000 outside of a
> transaction, I get insert rates of 0.1 ms and 0.5 ms respectively.
> With a table with 1,000,000 rows already in place, the insert times
> with all the fields in an index was 1.5ms and 4.3ms respectively.
>
> With only i1, i3, i4, val1 in the index, the numbers for a table with
> 100,000 entries to start with was 0.1ms and 0.5 ms, just like the
> above with the larger index.  With a 1,000,000 initial table, inserts
> take 2.1 and 3.0 ms respectively.

How do you run that setup, because those times are amazing, my inserts
take about 220ms, constantly from the first row in the table to the 1.2
billionth row. The client I was using for the inserts is a bare-bone use
case simulation tool I have written in java, to test different insert
and query strategies for this application. Its using JDBC copy to do the
inserts.

There is one thing you should know though, and that is that the real
table I am using has 20 value fields where the 6 first fields contains a
value, but that does not affect the difference int eh the execution time
of the two indexes.

> So I don't think you've found the cause of your problem with the smaller index.

I dont quite understand what you are saying here, but I assume you are
saying that the smaller index is not the cause of the increased insert time?

If so, I did the test with both indexes on exactly the same db and
setup. And when the index uses all four ids the insert time is larger
than if I only use id1,3,4.

What concerns me about your test, is that you dont seem to get constant
insert times, so there is a difference between the two tests, which
miuch be why you dont see the problem I am seeing with my index.

regards

thomas

Re: strange index performance?

From
Scott Marlowe
Date:
On Sun, Jan 25, 2009 at 1:14 AM, Thomas Finneid <tfinneid@fcon.no> wrote:
> Scott Marlowe wrote:
>>
>> I wrote a
>> simple test case for this and on a table with 100,000 entries already
>> in it, then inserting 10,000 in a transaction and 10,000 outside of a
>> transaction, I get insert rates of 0.1 ms and 0.5 ms respectively.
>> With a table with 1,000,000 rows already in place, the insert times
>> with all the fields in an index was 1.5ms and 4.3ms respectively.
>>
>> With only i1, i3, i4, val1 in the index, the numbers for a table with
>> 100,000 entries to start with was 0.1ms and 0.5 ms, just like the
>> above with the larger index.  With a 1,000,000 initial table, inserts
>> take 2.1 and 3.0 ms respectively.
>
> How do you run that setup, because those times are amazing, my inserts take
> about 220ms, constantly from the first row in the table to the 1.2 billionth
> row. The client I was using for the inserts is a bare-bone use case
> simulation tool I have written in java, to test different insert and query
> strategies for this application. Its using JDBC copy to do the inserts.

The setup was a simple PHP script.  I've attached it to this email.

> There is one thing you should know though, and that is that the real table I
> am using has 20 value fields where the 6 first fields contains a value, but
> that does not affect the difference int eh the execution time of the two
> indexes.

Yes, but it will increase the insert time to the table depending very
much on the size of those other fields.

>> So I don't think you've found the cause of your problem with the smaller
>> index.
>
> I dont quite understand what you are saying here, but I assume you are
> saying that the smaller index is not the cause of the increased insert time?

Yes, that's what I was saying.

> If so, I did the test with both indexes on exactly the same db and setup.
> And when the index uses all four ids the insert time is larger than if I
> only use id1,3,4.

I thought it was the other way around for you, that the smaller index
was slower.

> What concerns me about your test, is that you dont seem to get constant
> insert times, so there is a difference between the two tests, which miuch be
> why you dont see the problem I am seeing with my index.

Yes, you need to look at things like increasing the number of wal
segments and checkpointing.  If a checkpoint is kicking in it's going
to slow everything down.

What version pgsql are you running?  My tests were on 8.3.x on a
core2duo laptop with a stock slow 80Gig hard drive, but most likely
it's lying about fsync, so that could explain some of the numbers.

I just ran it on my real server, since it's late at night, there's not
much else going on.  With 1M rows created ahead of time I got similar
numbers:

0.12 ms per insert with all 10,000 inserted in a transaction
0.24 ms per insert with each insert being individual transactions
(i.e. no begin; commt; wrapped around them all)  This is on a machine
with a 12 disk RAID-10 array under an Areca 1680ix controller with
512M battery backed cache.  Note that the table had no other columns
in it like yours does.

Attachment

Re: strange index performance?

From
Thomas Finneid
Date:
Scott Marlowe wrote:
> On Sun, Jan 25, 2009 at 1:14 AM, Thomas Finneid <tfinneid@fcon.no> wrote:
>> Scott Marlowe wrote:
>>> So I don't think you've found the cause of your problem with the smaller
>>> index.

Ok I understand, but why dont you think the index is the problem?

>> If so, I did the test with both indexes on exactly the same db and setup.
>> And when the index uses all four ids the insert time is larger than if I
>> only use id1,3,4.
>
> I thought it was the other way around for you, that the smaller index
> was slower.

Sorry for the mistake, I meant to say the smaller index causes the
slowest insert time.

> What version pgsql are you running?  My tests were on 8.3.x on a

I am running on pg 8.2.x (kubuntu x64) with 8GB ram, 8 opteron cores and
8 disks on a Areca Raid controller

> 0.12 ms per insert with all 10,000 inserted in a transaction
> 0.24 ms per insert with each insert being individual transactions
> (i.e. no begin; commt; wrapped around them all)  This is on a machine
> with a 12 disk RAID-10 array under an Areca 1680ix controller with
> 512M battery backed cache.  Note that the table had no other columns
> in it like yours does.

Are you sure you mean to say 0.12 ms and not 0.12 seconds? My server
also uses an Areca RAID controller (8 disk controller model), but it
does not matter how many disks are in the stripe, its still the same
performance. So, if you get that performance then I have must have set
up postgres, the OS or the RAID controller wrong.  What are the most
important configurations you did to get that performance?



Re: strange index performance?

From
Scott Marlowe
Date:
On Sun, Jan 25, 2009 at 2:54 PM, Thomas Finneid <tfinneid@fcon.no> wrote:
> Scott Marlowe wrote:
>>
>> On Sun, Jan 25, 2009 at 1:14 AM, Thomas Finneid <tfinneid@fcon.no> wrote:
>>>
>>> Scott Marlowe wrote:
>>>>
>>>> So I don't think you've found the cause of your problem with the smaller
>>>> index.
>
> Ok I understand, but why dont you think the index is the problem?

Because on any decent hardware you should be able to insert a single
row in way under 200ms.  Whether it's got an index on it or not.

>> I thought it was the other way around for you, that the smaller index
>> was slower.
>
> Sorry for the mistake, I meant to say the smaller index causes the slowest
> insert time.

I'm guessing that you just had more data in the table or something by
the time you tested that, or some cron job was running in the
background, or some other issue, not the index.

>> What version pgsql are you running?  My tests were on 8.3.x on a
>
> I am running on pg 8.2.x (kubuntu x64) with 8GB ram, 8 opteron cores and 8
> disks on a Areca Raid controller

Quite a similar machine.  write back cache with battery backed
controller on the controller?  A really old Areca like an 11xx series
or a newer one 12xx, 16xx?

>> 0.12 ms per insert with all 10,000 inserted in a transaction
>> 0.24 ms per insert with each insert being individual transactions
>> (i.e. no begin; commt; wrapped around them all)  This is on a machine
>> with a 12 disk RAID-10 array under an Areca 1680ix controller with
>> 512M battery backed cache.  Note that the table had no other columns
>> in it like yours does.
>
> Are you sure you mean to say 0.12 ms and not 0.12 seconds? My server also

0.12 seconds per insert is pretty slow.  10 inserts would take a
second.  I'm inserting 10,000 rows in about 2 seconds.  Each insert is
definitely in the 0.12 millisecond range.

> uses an Areca RAID controller (8 disk controller model), but it does not
> matter how many disks are in the stripe, its still the same performance. So,
> if you get that performance then I have must have set up postgres, the OS or
> the RAID controller wrong.  What are the most important configurations you
> did to get that performance?

Hard to say.  What does bonnie++ have to say about the performance of
your RAID array?

Re: strange index performance?

From
Thomas Finneid
Date:
Scott Marlowe wrote:

> I'm guessing that you just had more data in the table or something by
> the time you tested that, or some cron job was running in the
> background, or some other issue, not the index.

It starts from scratch and builds up. Every insert has constant time
from the first to the last row, ie. row 1 to row 1.2 billion.
There is no background jobs or other disturbances.

> Quite a similar machine.  write back cache with battery backed
> controller on the controller?  A really old Areca like an 11xx series
> or a newer one 12xx, 16xx?

Its an Areca 1220. write back is enabled but it does not have a BBU,
because its an development machine and not a production machine.

> 0.12 seconds per insert is pretty slow.  10 inserts would take a
> second.  I'm inserting 10,000 rows in about 2 seconds.  Each insert is
> definitely in the 0.12 millisecond range.

I see the confusion. I use COPY(JDBC) not INSERT, so one transaction
contains 20000 rows, which is copy inserted in 300 ms, so that gives a
per row insert time of 0.015ms. So I actually have pretty decent write
performance. If I remove the index, the copy insert only takes about
125ms. So the index update time amounts to half the total update time.

This still leaves me with the question of why the smaller index
(id1,3,4) take longer to update than the larger index (id1,2,3,4)?
Updating an index like id1,2,3 should take shorter time, I have to test
it first to verify, so a similar index, id1,3,4 should take
approximately the same time.

Could it have something to do with the smaller index is more complicated
to fill in? Could the placing of the id2 filed in the table have
anything to say about it?

> Hard to say.  What does bonnie++ have to say about the performance of
> your RAID array?

Dont know, havent heard about it before now. But I will have a look at
it and see if the controller and the os is set up correctly.

regards

thomas


Re: strange index performance?

From
Kenneth Marshall
Date:
On Mon, Jan 26, 2009 at 10:10:13AM +0100, Thomas Finneid wrote:
> Scott Marlowe wrote:
>
>> I'm guessing that you just had more data in the table or something by
>> the time you tested that, or some cron job was running in the
>> background, or some other issue, not the index.
>
> It starts from scratch and builds up. Every insert has constant time from
> the first to the last row, ie. row 1 to row 1.2 billion.
> There is no background jobs or other disturbances.
>
>> Quite a similar machine.  write back cache with battery backed
>> controller on the controller?  A really old Areca like an 11xx series
>> or a newer one 12xx, 16xx?
>
> Its an Areca 1220. write back is enabled but it does not have a BBU,
> because its an development machine and not a production machine.
>
>> 0.12 seconds per insert is pretty slow.  10 inserts would take a
>> second.  I'm inserting 10,000 rows in about 2 seconds.  Each insert is
>> definitely in the 0.12 millisecond range.
>
> I see the confusion. I use COPY(JDBC) not INSERT, so one transaction
> contains 20000 rows, which is copy inserted in 300 ms, so that gives a per
> row insert time of 0.015ms. So I actually have pretty decent write
> performance. If I remove the index, the copy insert only takes about 125ms.
> So the index update time amounts to half the total update time.
>
> This still leaves me with the question of why the smaller index (id1,3,4)
> take longer to update than the larger index (id1,2,3,4)?
> Updating an index like id1,2,3 should take shorter time, I have to test it
> first to verify, so a similar index, id1,3,4 should take approximately the
> same time.
>
> Could it have something to do with the smaller index is more complicated to
> fill in? Could the placing of the id2 filed in the table have anything to
> say about it?
>

It may be that the smaller index has update contention for the same
blocks that the larger index does not.

Cheers,
Ken


Re: strange index performance?

From
Thomas Finneid
Date:
Kenneth Marshall wrote:
> It may be that the smaller index has update contention for the same
> blocks that the larger index does not.

Is that an assumption based on both indexes existing? if so I might
agree, but if you are talking about only one index existing at a time
then could you explain what the basis for you conclusion is?

regards

thomas


Re: strange index performance?

From
Kenneth Marshall
Date:
On Mon, Jan 26, 2009 at 03:49:00PM +0100, Thomas Finneid wrote:
> Kenneth Marshall wrote:
>> It may be that the smaller index has update contention for the same
>> blocks that the larger index does not.
>
> Is that an assumption based on both indexes existing? if so I might agree,
> but if you are talking about only one index existing at a time then could
> you explain what the basis for you conclusion is?
>
> regards
>
> thomas
>

The small index blocks would look like:

|abcd|efgh|ijkl|...

and the large index:

|axxx|...|bxxx|...|cxxx|... and so on.

Now, if you try to update a-k, the small index will be trying to
update and possibly rearrange/split/... items on the same disk
blocks while the larger index would be updating without contention.
It may not even be block level contention, the same argument applies
to cachelines with in a block.

Cheers,
Ken