Thread: Efficiently searching for CIDRs containing an IP address
Hi, I have a table like this: CREATE TABLE networks ( iprange CIDR, datum INTEGER ); and I want to efficiently support queries like this: SELECT * FROM networks WHERE '128.3.4.5' <<= iprange; There doesn't seem to be any indexing mechanism in core PostgresSQL that supports this; it always does a sequential scan. I've looked at two possibilities so far: 1) ip4r. This is a non-core module and also only handles IPv4. I'd prefer to stick with the native PostgreSQL data types. 2) For our application, we can limit iprange to a /8 at biggest, so another option is to expand the query like this: SELECT * FROM networks WHERE iprange IN ( '128.3.4.5/32','128.3.4.4/31','128.3.4.4/30','128.3.4.0/29','128.3.4.0/28', '128.3.4.0/27','128.3.4.0/26','128.3.4.0/25','128.3.4.0/24','128.3.4.0/23', '128.3.4.0/22','128.3.0.0/21','128.3.0.0/20','128.3.0.0/19','128.3.0.0/18', '128.3.0.0/17','128.3.0.0/16','128.2.0.0/15','128.0.0.0/14','128.0.0.0/13', '128.0.0.0/12','128.0.0.0/11','128.0.0.0/10','128.0.0.0/9', '128.0.0.0/8'); which lets you use a btree index, but seems grotesque to me. This kind of query seems like a common thing to want to do... anyone have any good ideas how to do it efficiently? Regards, David.
"David F. Skoll" <dfs@roaringpenguin.com> writes: > I want to efficiently support queries like this: > SELECT * FROM networks WHERE '128.3.4.5' <<= iprange; > There doesn't seem to be any indexing mechanism in core PostgresSQL that > supports this; it always does a sequential scan. Yeah. Something that's been on the TODO list for a long time is to put together a GIST index opclass that handles this sort of thing. I have not looked at ip4r in detail, but it seems like whatever that's doing for index support could be transposed to the built-in types. Or you could base it on the contrib/seg indexing code (but beware that we've recently found serious performance bugs in the latter). regards, tom lane
I've done some experiments; here are my results for posterity and Google: I installed the ip4r exension and created the following database: CREATE TABLE ip4r_networks ( iprange ip4r, satellite integer ); CREATE INDEX foo2 ON ip4r_networks USING gist (iprange); CREATE TABLE networks ( iprange cidr, satellite integer ); CREATE INDEX foo ON networks USING btree (iprange); I then populated ip4r_networks and networks with 1.78 million rows of randomly-generated CIDRs, ranging randomly from /16 networks to /32. Typical queries (recall that for our application, we don't allow networks "larger" than /8) ============================================================================= -- A hack expanding an IP address into 25 nested CIDRs... EXPLAIN ANALYZE SELECT * FROM networks WHERE iprangeIN('43.45.67.89/32', ..., '43.0.0.0/8'); -- Cleaned-up EXPLAIN results: Bitmap Heap Scan on networks (cost=106.67..203.68 rows=25 width=11) (actual time=0.317..0.323 rows=2 loops=1) Recheck Cond: ((iprange)::inet = ANY (('{...}'::cidr[])::inet[])) -> Bitmap Index Scan on foo (cost=0.00..106.66 rows=25 width=0) (actual time=0.304..0.304 rows=2 loops=1) Index Cond: ((iprange)::inet = ANY (('{...}'::cidr[])::inet[])) Total runtime: 0.387 ms ============================================================================= -- Using ip4r EXPLAIN ANALYZE SELECT * FROM ip4r_networks WHERE '43.45.67.89' <<= iprange; Bitmap Heap Scan on ip4r_networks (cost=50.37..4450.33 rows=1780 width=12) (actual time=0.123..0.129 rows=2 loops=1) Recheck Cond: ('43.45.67.89'::ip4r <<= iprange) -> Bitmap Index Scan on foo2 (cost=0.00..49.93 rows=1780 width=0) (actual time=0.109..0.109 rows=2 loops=1) Index Cond: ('43.45.67.89'::ip4r <<= iprange) Total runtime: 0.203 ms ============================================================================= My results show that ip4r is consistently about twice as fast as the hack that uses 25 nested CIDRs in an IN clause. However, creating the GiST index is much, much slower than creating the btree index. And the hack has acceptable performance, so we'll probably go with the hack. Note that the hack only works for the specific case of CIDRs. It obviously won't work for general IP ranges that might not be on CIDR boundaries. Regards, David.
Hmmm, I've never done this quite that way, but IPs - especially CIDRs - are far easier to work with in binary format than in human-readable format. At my old workplace about 5 years ago I wrote an IP management system (PHP/MySQL) that stored the IP in binary and human-readable formats, but all of the computations and comparisons and other such stuff always took place with the binary values (binary stored as a string of 0s and 1s as I recall). So is this not simply easier to implement with a library of functions to convert a string to binary and back? I recall in my implementation I had only 4 or 5 functions including converting back-and-forth from binary to human-readable, binary-AND, binary-OR and maybe one or two others. -- “Mother Nature doesn’t do bailouts.” - Glenn Prickett
Alan McKay wrote: > So is this not simply easier to implement with a library of functions > to convert a string to binary and back? The representation format has nothing to do with making a range search efficiently use an index, though. Regards, David.