Thread: Partial indexes Vs standard indexes : Insert performance
Hi, I just want to verify if I'm understanding this correctly: I have a table in which I store log from my firewall. For the protocol column (3 distinct values: TCP ~82%, UDP ~17%, ICMP ~1%, the table contains 1.7M rows), I use a partial index to find ICMP packets faster. In my understanding, a partial index is only touched when a matching row is inserted/updated/deleted (index constraint is true), so if I create a partial index for each protocol, I will slow down my machine as if I had created a single "normal" index, but it will find rows faster (the distribution is not uniform)... Is this correct? Thanks a lot, -- MaXX
On Tue, 2006-08-15 at 13:13 +0200, MaXX wrote: > Hi, > > I just want to verify if I'm understanding this correctly: > > I have a table in which I store log from my firewall. > For the protocol column (3 distinct values: TCP ~82%, UDP ~17%, ICMP > ~1%, the table contains 1.7M rows), I use a partial index to find ICMP > packets faster. > > In my understanding, a partial index is only touched when a matching row > is inserted/updated/deleted (index constraint is true), so if I create a > partial index for each protocol, I will slow down my machine as if I had > created a single "normal" index, but it will find rows faster (the > distribution is not uniform)... > > Is this correct? That should work. Keep in mind that the main idea of an index is to reduce the number of pages that have to be fetched from disk. If the record size is small, you may have at least one ICMP packet on 50% (or more) of the disk pages even if ICMP packets only make up 1% of the total records. Even if they aren't inserted randomly, updates/deletes may randomize the distribution somewhat. If you have an ICMP packet on every other page, you might not be impressed with the performance versus a sequential scan. However, it could be a big win if you have other WHERE conditions aside from just the packet type. The planner tries to take all of these things into consideration to some degree. The best test is to try EXPLAIN or EXPLAIN ANALYZE to see what plan it makes. Also, try forcing different types of plans to see if the planner is making the right choice. Regards, Jeff Davis
MaXX <bs139412@skynet.be> writes: > In my understanding, a partial index is only touched when a matching row is > inserted/updated/deleted (index constraint is true), so if I create a partial > index for each protocol, I will slow down my machine as if I had created a > single "normal" index, but it will find rows faster (the distribution is not > uniform)... > > Is this correct? Everything up to the "find rows faster" is pretty much true. "find rows faster" depends on exactly how you define your indexes, what your queries look like, and what the distribution of both the queries and the data look like. Where it really helps is when you're processing a whole bunch of records and using the partial index expression in addition the key column effectively lets you combine two constraints on your query. To get the same effect without the partial index you would either need a compound key which would take a lot more space and cause more i/o or you would need two separate indexes that postgres would combine with a bitmap index scan but that wouldn't be as effective. So for example if there are a million packets to a given host but only 100k that were TCP then a partial index on <host where proto = TCP> would let you scan only the 100k instead of having to scan the million and look at each one to discard it. And it would let you do that without having to create a much larger index on <proto,host> or combine two indexes one on <proto> and one on <host> either of which would be much slower and take more space. But if you're just looking up a single record I wouldn't expect it to be much faster to look it up in the smaller partial index than in the larger index. Indexes find records in log(n) time and log() grows awfully slowly. At best you're basically skipping a single tree level in favour of earlier query planning which is probably not going to be noticeable. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark <gsstark@mit.edu> writes: > But if you're just looking up a single record I wouldn't expect it to be much > faster to look it up in the smaller partial index than in the larger index. > Indexes find records in log(n) time and log() grows awfully slowly. Yeah. Given the proportions mentioned in the original message, I think one index on the whole table and one on just the ICMP records is probably the best solution. A partial index covering most of a table is not going to win enough to justify its maintenance overhead. regards, tom lane
Gregory Stark wrote: > MaXX <bs139412@skynet.be> writes: >> In my understanding, a partial index is only touched when a matching row is >> inserted/updated/deleted (index constraint is true), so if I create a partial >> index for each protocol, I will slow down my machine as if I had created a >> single "normal" index, but it will find rows faster (the distribution is not >> uniform)... >> Is this correct? [snip] > So for example if there are a million packets to a given host but only 100k > that were TCP then a partial index on <host where proto = TCP> would let you > scan only the 100k instead of having to scan the million and look at each one > to discard it. And it would let you do that without having to create a much > larger index on <proto,host> or combine two indexes one on <proto> and one on > <host> either of which would be much slower and take more space. OK. I made some test with the queries actually run by my app and I found a new usefull indexes to replace another. I can see a real improvement from 112ms to 4ms in the query to find ICMP pkts. > But if you're just looking up a single record I wouldn't expect it to be much > faster to look it up in the smaller partial index than in the larger index. > Indexes find records in log(n) time and log() grows awfully slowly. At best > you're basically skipping a single tree level in favour of earlier query > planning which is probably not going to be noticeable. I'm taking good note of this. Thanks a lot, -- MaXX
Jeff Davis wrote: > On Tue, 2006-08-15 at 13:13 +0200, MaXX wrote: [snip] >> I have a table in which I store log from my firewall. >> For the protocol column (3 distinct values: TCP ~82%, UDP ~17%, ICMP >> ~1%, the table contains 1.7M rows), I use a partial index to find ICMP >> packets faster. It's ICMP ~0.1% >> In my understanding, a partial index is only touched when a matching row >> is inserted/updated/deleted (index constraint is true), so if I create a >> partial index for each protocol, I will slow down my machine as if I had >> created a single "normal" index, but it will find rows faster (the >> distribution is not uniform)... >> Is this correct? > That should work. Keep in mind that the main idea of an index is to > reduce the number of pages that have to be fetched from disk. If the > record size is small, you may have at least one ICMP packet on 50% (or > more) of the disk pages even if ICMP packets only make up 1% of the > total records. Even if they aren't inserted randomly, updates/deletes > may randomize the distribution somewhat. If you have an ICMP packet on > every other page, you might not be impressed with the performance versus > a sequential scan. However, it could be a big win if you have other > WHERE conditions aside from just the packet type. OK, so that works well for queries where there is a very few rows in the index in regard of the table size, and as long as this still true. > The planner tries to take all of these things into consideration to some > degree. The best test is to try EXPLAIN or EXPLAIN ANALYZE to see what > plan it makes. Also, try forcing different types of plans to see if the > planner is making the right choice. I did some test and with both your reply and the one of Gregory Stark, I was able identify what are good indexes and speed up the thing... Thanks a lot, -- MaXX
On Wed, 2006-08-16 at 12:15 +0200, MaXX wrote: > > That should work. Keep in mind that the main idea of an index is to > > reduce the number of pages that have to be fetched from disk. If the > > record size is small, you may have at least one ICMP packet on 50% (or > > more) of the disk pages even if ICMP packets only make up 1% of the > > total records. Even if they aren't inserted randomly, updates/deletes > > may randomize the distribution somewhat. If you have an ICMP packet on > > every other page, you might not be impressed with the performance versus > > a sequential scan. However, it could be a big win if you have other > > WHERE conditions aside from just the packet type. > OK, so that works well for queries where there is a very few rows in the > index in regard of the table size, and as long as this still true. > Indexes work well when you have a WHERE clause that's highly restrictive and reduces the number of pages needed from disk substantially. Partial indexes work well when you're concerned about the index growing too large (and requiring more maintenance), especially with keys you don't need. > > The planner tries to take all of these things into consideration to some > > degree. The best test is to try EXPLAIN or EXPLAIN ANALYZE to see what > > plan it makes. Also, try forcing different types of plans to see if the > > planner is making the right choice. > I did some test and with both your reply and the one of Gregory Stark, I > was able identify what are good indexes and speed up the thing... > Excellent! Results are what count :) Regards, Jeff Davis