Thread: Partial indexes Vs standard indexes : Insert performance

Partial indexes Vs standard indexes : Insert performance

From
MaXX
Date:
Hi,

I just want to verify if I'm understanding this correctly:

I have a table in which I store log from my firewall.
For the protocol column (3 distinct values: TCP ~82%, UDP ~17%, ICMP
~1%, the table contains 1.7M rows), I use a partial index to find ICMP
packets faster.

In my understanding, a partial index is only touched when a matching row
is inserted/updated/deleted (index constraint is true), so if I create a
partial index for each protocol, I will slow down my machine as if I had
created a single "normal" index, but it will find rows faster (the
distribution is not uniform)...

Is this correct?

Thanks a lot,
--
MaXX

Re: Partial indexes Vs standard indexes : Insert

From
Jeff Davis
Date:
On Tue, 2006-08-15 at 13:13 +0200, MaXX wrote:
> Hi,
>
> I just want to verify if I'm understanding this correctly:
>
> I have a table in which I store log from my firewall.
> For the protocol column (3 distinct values: TCP ~82%, UDP ~17%, ICMP
> ~1%, the table contains 1.7M rows), I use a partial index to find ICMP
> packets faster.
>
> In my understanding, a partial index is only touched when a matching row
> is inserted/updated/deleted (index constraint is true), so if I create a
> partial index for each protocol, I will slow down my machine as if I had
> created a single "normal" index, but it will find rows faster (the
> distribution is not uniform)...
>
> Is this correct?

That should work. Keep in mind that the main idea of an index is to
reduce the number of pages that have to be fetched from disk. If the
record size is small, you may have at least one ICMP packet on 50% (or
more) of the disk pages even if ICMP packets only make up 1% of the
total records. Even if they aren't inserted randomly, updates/deletes
may randomize the distribution somewhat. If you have an ICMP packet on
every other page, you might not be impressed with the performance versus
a sequential scan. However, it could be a big win if you have other
WHERE conditions aside from just the packet type.

The planner tries to take all of these things into consideration to some
degree. The best test is to try EXPLAIN or EXPLAIN ANALYZE to see what
plan it makes. Also, try forcing different types of plans to see if the
planner is making the right choice.

Regards,
    Jeff Davis


Re: Partial indexes Vs standard indexes : Insert performance

From
Gregory Stark
Date:
MaXX <bs139412@skynet.be> writes:

> In my understanding, a partial index is only touched when a matching row is
> inserted/updated/deleted (index constraint is true), so if I create a partial
> index for each protocol, I will slow down my machine as if I had created a
> single "normal" index, but it will find rows faster (the distribution is not
> uniform)...
>
> Is this correct?

Everything up to the "find rows faster" is pretty much true.

"find rows faster" depends on exactly how you define your indexes, what your
queries look like, and what the distribution of both the queries and the data
look like.

Where it really helps is when you're processing a whole bunch of records and
using the partial index expression in addition the key column effectively lets
you combine two constraints on your query. To get the same effect without the
partial index you would either need a compound key which would take a lot more
space and cause more i/o or you would need two separate indexes that postgres
would combine with a bitmap index scan but that wouldn't be as effective.

So for example if there are a million packets to a given host but only 100k
that were TCP then a partial index on <host where proto = TCP> would let you
scan only the 100k instead of having to scan the million and look at each one
to discard it. And it would let you do that without having to create a much
larger index on <proto,host> or combine two indexes one on <proto> and one on
<host> either of which would be much slower and take more space.

But if you're just looking up a single record I wouldn't expect it to be much
faster to look it up in the smaller partial index than in the larger index.
Indexes find records in log(n) time and log() grows awfully slowly. At best
you're basically skipping a single tree level in favour of earlier query
planning which is probably not going to be noticeable.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: Partial indexes Vs standard indexes : Insert performance

From
Tom Lane
Date:
Gregory Stark <gsstark@mit.edu> writes:
> But if you're just looking up a single record I wouldn't expect it to be much
> faster to look it up in the smaller partial index than in the larger index.
> Indexes find records in log(n) time and log() grows awfully slowly.

Yeah.  Given the proportions mentioned in the original message, I think
one index on the whole table and one on just the ICMP records is
probably the best solution.  A partial index covering most of a table is
not going to win enough to justify its maintenance overhead.

            regards, tom lane

Re: Partial indexes Vs standard indexes : Insert performance

From
MaXX
Date:
Gregory Stark wrote:
> MaXX <bs139412@skynet.be> writes:
>> In my understanding, a partial index is only touched when a matching row is
>> inserted/updated/deleted (index constraint is true), so if I create a partial
>> index for each protocol, I will slow down my machine as if I had created a
>> single "normal" index, but it will find rows faster (the distribution is not
>> uniform)...
>> Is this correct?
[snip]
> So for example if there are a million packets to a given host but only 100k
> that were TCP then a partial index on <host where proto = TCP> would let you
> scan only the 100k instead of having to scan the million and look at each one
> to discard it. And it would let you do that without having to create a much
> larger index on <proto,host> or combine two indexes one on <proto> and one on
> <host> either of which would be much slower and take more space.
OK. I made some test with the queries actually run by my app and I found
a new usefull indexes to replace another.
I can see a real improvement from 112ms to 4ms in the query to find ICMP
pkts.

> But if you're just looking up a single record I wouldn't expect it to be much
> faster to look it up in the smaller partial index than in the larger index.
> Indexes find records in log(n) time and log() grows awfully slowly. At best
> you're basically skipping a single tree level in favour of earlier query
> planning which is probably not going to be noticeable.

I'm taking good note of this.

Thanks a lot,
--
MaXX

Re: Partial indexes Vs standard indexes : Insert

From
MaXX
Date:
Jeff Davis wrote:
> On Tue, 2006-08-15 at 13:13 +0200, MaXX wrote:
[snip]
>> I have a table in which I store log from my firewall.
>> For the protocol column (3 distinct values: TCP ~82%, UDP ~17%, ICMP
>> ~1%, the table contains 1.7M rows), I use a partial index to find ICMP
>> packets faster.
It's ICMP ~0.1%
>> In my understanding, a partial index is only touched when a matching row
>> is inserted/updated/deleted (index constraint is true), so if I create a
>> partial index for each protocol, I will slow down my machine as if I had
>> created a single "normal" index, but it will find rows faster (the
>> distribution is not uniform)...
>> Is this correct?
> That should work. Keep in mind that the main idea of an index is to
> reduce the number of pages that have to be fetched from disk. If the
> record size is small, you may have at least one ICMP packet on 50% (or
> more) of the disk pages even if ICMP packets only make up 1% of the
> total records. Even if they aren't inserted randomly, updates/deletes
> may randomize the distribution somewhat. If you have an ICMP packet on
> every other page, you might not be impressed with the performance versus
> a sequential scan. However, it could be a big win if you have other
> WHERE conditions aside from just the packet type.
OK, so that works well for queries where there is a very few rows in the
index in regard of the table size, and as long as this still true.

> The planner tries to take all of these things into consideration to some
> degree. The best test is to try EXPLAIN or EXPLAIN ANALYZE to see what
> plan it makes. Also, try forcing different types of plans to see if the
> planner is making the right choice.
I did some test and with both your reply and the one of Gregory Stark, I
was able identify what are good indexes and speed up the thing...

Thanks a lot,
--
MaXX

Re: Partial indexes Vs standard indexes : Insert

From
Jeff Davis
Date:
On Wed, 2006-08-16 at 12:15 +0200, MaXX wrote:
> > That should work. Keep in mind that the main idea of an index is to
> > reduce the number of pages that have to be fetched from disk. If the
> > record size is small, you may have at least one ICMP packet on 50% (or
> > more) of the disk pages even if ICMP packets only make up 1% of the
> > total records. Even if they aren't inserted randomly, updates/deletes
> > may randomize the distribution somewhat. If you have an ICMP packet on
> > every other page, you might not be impressed with the performance versus
> > a sequential scan. However, it could be a big win if you have other
> > WHERE conditions aside from just the packet type.
> OK, so that works well for queries where there is a very few rows in the
> index in regard of the table size, and as long as this still true.
>

Indexes work well when you have a WHERE clause that's highly restrictive
and reduces the number of pages needed from disk substantially. Partial
indexes work well when you're concerned about the index growing too
large (and requiring more maintenance), especially with keys you don't
need.

> > The planner tries to take all of these things into consideration to some
> > degree. The best test is to try EXPLAIN or EXPLAIN ANALYZE to see what
> > plan it makes. Also, try forcing different types of plans to see if the
> > planner is making the right choice.
> I did some test and with both your reply and the one of Gregory Stark, I
> was able identify what are good indexes and speed up the thing...
>

Excellent! Results are what count :)

Regards,
    Jeff Davis