Thread: Efficiently searching for CIDRs containing an IP address

Efficiently searching for CIDRs containing an IP address

From
"David F. Skoll"
Date:
Hi,

I have a table like this:

CREATE TABLE networks (
       iprange CIDR,
       datum   INTEGER
);

and I want to efficiently support queries like this:

SELECT * FROM networks WHERE '128.3.4.5' <<= iprange;

There doesn't seem to be any indexing mechanism in core PostgresSQL that
supports this; it always does a sequential scan.

I've looked at two possibilities so far:

1) ip4r.  This is a non-core module and also only handles IPv4.  I'd prefer
to stick with the native PostgreSQL data types.

2) For our application, we can limit iprange to a /8 at biggest, so
another option is to expand the query like this:

SELECT * FROM networks WHERE iprange IN (
  '128.3.4.5/32','128.3.4.4/31','128.3.4.4/30','128.3.4.0/29','128.3.4.0/28',
  '128.3.4.0/27','128.3.4.0/26','128.3.4.0/25','128.3.4.0/24','128.3.4.0/23',
  '128.3.4.0/22','128.3.0.0/21','128.3.0.0/20','128.3.0.0/19','128.3.0.0/18',
  '128.3.0.0/17','128.3.0.0/16','128.2.0.0/15','128.0.0.0/14','128.0.0.0/13',
  '128.0.0.0/12','128.0.0.0/11','128.0.0.0/10','128.0.0.0/9', '128.0.0.0/8');

which lets you use a btree index, but seems grotesque to me.  This kind of
query seems like a common thing to want to do... anyone have any good ideas
how to do it efficiently?

Regards,

David.


Re: Efficiently searching for CIDRs containing an IP address

From
Tom Lane
Date:
"David F. Skoll" <dfs@roaringpenguin.com> writes:
> I want to efficiently support queries like this:
> SELECT * FROM networks WHERE '128.3.4.5' <<= iprange;
> There doesn't seem to be any indexing mechanism in core PostgresSQL that
> supports this; it always does a sequential scan.

Yeah.  Something that's been on the TODO list for a long time is to put
together a GIST index opclass that handles this sort of thing.  I have
not looked at ip4r in detail, but it seems like whatever that's doing
for index support could be transposed to the built-in types.  Or you
could base it on the contrib/seg indexing code (but beware that we've
recently found serious performance bugs in the latter).

            regards, tom lane

Re: Efficiently searching for CIDRs containing an IP address

From
"David F. Skoll"
Date:
I've done some experiments; here are my results for posterity and Google:

I installed the ip4r exension and created the following database:

CREATE TABLE ip4r_networks (
    iprange ip4r,
    satellite integer
);
CREATE INDEX foo2 ON ip4r_networks USING gist (iprange);

CREATE TABLE networks (
    iprange cidr,
    satellite integer
);
CREATE INDEX foo ON networks USING btree (iprange);

I then populated ip4r_networks and networks with 1.78 million rows of
randomly-generated CIDRs, ranging randomly from /16 networks to /32.

Typical queries (recall that for our application, we don't allow networks
"larger" than /8)

=============================================================================
-- A hack expanding an IP address into 25 nested CIDRs...
EXPLAIN ANALYZE SELECT * FROM networks
WHERE iprangeIN('43.45.67.89/32', ..., '43.0.0.0/8');

-- Cleaned-up EXPLAIN results:
Bitmap Heap Scan on networks  (cost=106.67..203.68 rows=25 width=11) (actual time=0.317..0.323 rows=2 loops=1)
   Recheck Cond: ((iprange)::inet = ANY (('{...}'::cidr[])::inet[]))
   ->  Bitmap Index Scan on foo  (cost=0.00..106.66 rows=25 width=0) (actual time=0.304..0.304 rows=2 loops=1)
         Index Cond: ((iprange)::inet = ANY (('{...}'::cidr[])::inet[]))
 Total runtime: 0.387 ms
=============================================================================
-- Using ip4r
EXPLAIN ANALYZE SELECT * FROM ip4r_networks WHERE '43.45.67.89' <<= iprange;

Bitmap Heap Scan on ip4r_networks  (cost=50.37..4450.33 rows=1780 width=12) (actual time=0.123..0.129 rows=2 loops=1)
   Recheck Cond: ('43.45.67.89'::ip4r <<= iprange)
   ->  Bitmap Index Scan on foo2  (cost=0.00..49.93 rows=1780 width=0) (actual time=0.109..0.109 rows=2 loops=1)
         Index Cond: ('43.45.67.89'::ip4r <<= iprange)
 Total runtime: 0.203 ms
=============================================================================

My results show that ip4r is consistently about twice as fast as the
hack that uses 25 nested CIDRs in an IN clause.  However, creating the
GiST index is much, much slower than creating the btree index.  And
the hack has acceptable performance, so we'll probably go with the
hack.

Note that the hack only works for the specific case of CIDRs.  It obviously
won't work for general IP ranges that might not be on CIDR boundaries.

Regards,

David.


Re: Efficiently searching for CIDRs containing an IP address

From
Alan McKay
Date:
Hmmm, I've never done this quite that way, but IPs - especially CIDRs
- are far easier to work with in binary format than in human-readable
format.  At my old workplace about 5 years ago I wrote an IP
management system (PHP/MySQL) that stored the IP in binary and
human-readable formats, but all of the computations and comparisons
and other such stuff always took place with the binary values (binary
stored as a string of 0s and 1s as I recall).

So is this not simply easier to implement with a library of functions
to convert a string to binary and back?  I recall in my implementation
I had only 4 or 5 functions including converting back-and-forth from
binary to human-readable, binary-AND, binary-OR and maybe one or two
others.


--
“Mother Nature doesn’t do bailouts.”
         - Glenn Prickett

Re: Efficiently searching for CIDRs containing an IP address

From
"David F. Skoll"
Date:
Alan McKay wrote:

> So is this not simply easier to implement with a library of functions
> to convert a string to binary and back?

The representation format has nothing to do with making a range search
efficiently use an index, though.

Regards,

David.