Thread: Best way to index IP data?

Best way to index IP data?

From

Kevin Kempter

Date:

10 January 2008, 19:15:22

Hi List;

We'll be loading a table with begining & ending I.P.'s - the table will likely
have upwards of 30million rows.  Any thoughts on how to get the best
performance out of queries that want to look for IP ranges or the use of
between queries? Should these be modeled as integers?

Thanks in advance

/Kevin

Re: Best way to index IP data?

From

"Joshua D. Drake"

Date:

10 January 2008, 19:18:35

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Thu, 10 Jan 2008 16:14:54 -0700
Kevin Kempter <kevin@kevinkempterllc.com> wrote:

> Hi List;
> 
> We'll be loading a table with begining & ending I.P.'s - the table
> will likely have upwards of 30million rows.  Any thoughts on how to
> get the best performance out of queries that want to look for IP
> ranges or the use of between queries? Should these be modeled as
> integers? 
> 

http://www.postgresql.org/docs/current/static/datatype-net-types.html

> Thanks in advance
> 
> /Kevin
> 
> ---------------------------(end of
> broadcast)--------------------------- TIP 1: if posting/reading
> through Usenet, please send an appropriate subscribe-nomail command
> to majordomo@postgresql.org so that your message can get through to
> the mailing list cleanly
> 


- -- 
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ 
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHhqjHATb/zqfZUUQRAvMOAJ984Np5GMrFd1vixP/zECIl3qUWYgCff6U4
bCBBz1VaxqIoZfCFfKEIZLU=
=+9vD
-----END PGP SIGNATURE-----

Re: Best way to index IP data?

From

Steve Atkins

Date:

10 January 2008, 19:25:33

On Jan 10, 2008, at 3:14 PM, Kevin Kempter wrote:

> Hi List;
>
> We'll be loading a table with begining & ending I.P.'s - the table
> will likely
> have upwards of 30million rows.  Any thoughts on how to get the best
> performance out of queries that want to look for IP ranges or the
> use of
> between queries? Should these be modeled as integers?

http://pgfoundry.org/projects/ip4r/

That has the advantage over using integers, or the built-in inet type,
of being indexable for range and overlap queries.

Cheers,
   Steve

Re: Best way to index IP data?

From

"Jonah H. Harris"

Date:

10 January 2008, 19:42:10

On Jan 10, 2008 6:25 PM, Steve Atkins <steve@blighty.com> wrote:
> http://pgfoundry.org/projects/ip4r/
>
> That has the advantage over using integers, or the built-in inet type,
> of being indexable for range and overlap queries.

Agreed.  ip4r is da bomb.

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation                | fax: 732.331.1301
499 Thornall Street, 2nd Floor          | jonah.harris@enterprisedb.com
Edison, NJ 08837                        | http://www.enterprisedb.com/

Re: Best way to index IP data?

From

Pomarede Nicolas

Date:

11 January 2008, 05:31:36

On Thu, 10 Jan 2008, Jonah H. Harris wrote:

> On Jan 10, 2008 6:25 PM, Steve Atkins <steve@blighty.com> wrote:
>> http://pgfoundry.org/projects/ip4r/
>>
>> That has the advantage over using integers, or the built-in inet type,
>> of being indexable for range and overlap queries.
>
> Agreed.  ip4r is da bomb.

Hello to all,

I also have to store a lot of IP v4 addresses, and I think the internal
inet type is somewhat overkill for that, since it always require 8 bytes,
even if you don't need to store a netmask.
When storing millions of IP add, this means MB of space used for nothing
in that case.

As ip4r seems to work very well with postgresql, is there a possibility to
see it merged in postgresql, to have a native 4 bytes IPv4 address date
type ?

Nicolas

Re: Best way to index IP data?

From

Tom Lane

Date:

11 January 2008, 11:20:00

Pomarede Nicolas <npomarede@corp.free.fr> writes:
> As ip4r seems to work very well with postgresql, is there a possibility to
> see it merged in postgresql, to have a native 4 bytes IPv4 address date
> type ?

Given that the world is going to IPv6 in a few years whether you like it
or not, that seems pretty darn short-sighted to me.

What would make sense IMHO is to adapt the improved indexing support in
ip4r to work on the native inet/cidr types.

            regards, tom lane

Re: Best way to index IP data?

From

Andrew Sullivan

Date:

11 January 2008, 11:38:04

On Fri, Jan 11, 2008 at 10:19:51AM -0500, Tom Lane wrote:
> Given that the world is going to IPv6 in a few years whether you like it
> or not, that seems pretty darn short-sighted to me.

Indeed.  Even ARIN has finally started to tell people that IPv4 is running
out.  There are currently significant deployments of IPv6 in the
Asia-Pacific region.  And it appears that Comcast is planning to move to
IPv6 for its own network deployment, which may mean that many U.S. homes
will have native v6 in the near future (the upshot of their plans aren't
actually clear to me yet, but if you're interested in some of what they're
telling people they're doing, look for Alain Durand's presentation to the
v6ops working group at the last IETF meeting).

> What would make sense IMHO is to adapt the improved indexing support in
> ip4r to work on the native inet/cidr types.

This seems like a good idea to me.

Re: Best way to index IP data?

From

Pomarede Nicolas

Date:

11 January 2008, 11:55:44

On Fri, 11 Jan 2008, Tom Lane wrote:

> Pomarede Nicolas <npomarede@corp.free.fr> writes:
>> As ip4r seems to work very well with postgresql, is there a possibility to
>> see it merged in postgresql, to have a native 4 bytes IPv4 address date
>> type ?
>
> Given that the world is going to IPv6 in a few years whether you like it
> or not, that seems pretty darn short-sighted to me.
>
> What would make sense IMHO is to adapt the improved indexing support in
> ip4r to work on the native inet/cidr types.
>

I understand your point on IPv6, but still being able to store IPv4
addresses with as little overhead as possible is important.

IPv6 will certainly grow in the year to come, but if you consider the case
of a very large private lan, with ip in the form 10.x.y.z, the fact that
the outside world is now ipv6 doesn't necessarily imply you will rename
all your internal equipments to be ipv6 if you don't need more addresses
(you can do the translation when packets cross the ipv6/ipv4 gateway in
your network).

To be more concret, I'm working for a large french ISP, so far we have 2+
millions "boxes" (triple play adsl equipments) at our customers' home.

All theses boxes have a private IPv4 address in 10.x.y.z as well as a
public IPV4 address, and although we recently activated a public IPv6 addr
on these boxes too (which certainly gives one of the biggest IPv6 network
so far), we still need to store one ipv4 and one ipv6 addr for each box.

So, my point was not to be short-sighted, we will go IPv6 for sure, it's
just that there're a lot of applications where storing ipv4 addr could be
needed (whether ipv6 is required for other applications or not), and in
this regard, I think that being able to store ipv4 addr with 4 bytes
instead of 8 could be appreciated.

Or perhaps another solution would be to have built-in inet_aton /
inet_ntoa functions in postgres, to store the result using an integer
(unsigned) ?

Nicolas

Re: Best way to index IP data?

From

Steve Atkins

Date:

11 January 2008, 12:11:31

On Jan 11, 2008, at 7:19 AM, Tom Lane wrote:

> Pomarede Nicolas <npomarede@corp.free.fr> writes:
>> As ip4r seems to work very well with postgresql, is there a
>> possibility to
>> see it merged in postgresql, to have a native 4 bytes IPv4 address
>> date
>> type ?
>
> Given that the world is going to IPv6 in a few years whether you
> like it
> or not, that seems pretty darn short-sighted to me.
>
> What would make sense IMHO is to adapt the improved indexing support
> in
> ip4r to work on the native inet/cidr types.

Can't be done. The native types are too limited to be effectively
indexed
in that  way - they cannot represent arbitrary ranges. ip4r started
with me
trying to retrofit decent indexing onto the cidr type and failing
miserably.

I'll likely be rolling out ip6r/ipr sometime in 2008, as users are
beginning to
express an interest. But even then I don't expect it to replace the inet
and cidr types in core, because it isn't compatible with them.

I'd actually support removing inet/cidr from core completely in the
longer
run. Postgresql is extensible, so we really don't need types used only
by niche users in core, once we have pgfoundry and something like
mysqludf.org/CPAN. But that's a longer term thought.

Cheers,
   Steve

Re: Best way to index IP data?

From

"Joshua D. Drake"

Date:

11 January 2008, 12:18:25

Steve Atkins wrote:

> I'd actually support removing inet/cidr from core completely in the longer
> run. Postgresql is extensible, so we really don't need types used only
> by niche users in core, once we have pgfoundry and something like
> mysqludf.org/CPAN. But that's a longer term thought.

I believe this is going to be solved by postgresqlpackages.org.

Joshua D. Drake

>
> Cheers,
>   Steve
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
>
>                http://www.postgresql.org/about/donate
>

Re: Best way to index IP data?

From

Michael Stone

Date:

11 January 2008, 12:28:31

On Fri, Jan 11, 2008 at 10:19:51AM -0500, Tom Lane wrote:
>Given that the world is going to IPv6 in a few years whether you like it
>or not, that seems pretty darn short-sighted to me.

Well, a native IPv6 type would also be nice; inet is ridiculously
bloated for both IPv4 *and* IPv6.

Mike Stone

Re: Best way to index IP data?

From

Tom Lane

Date:

11 January 2008, 16:07:46

Michael Stone <mstone+postgres@mathom.us> writes:
> Well, a native IPv6 type would also be nice; inet is ridiculously
> bloated for both IPv4 *and* IPv6.

Nonsense.  3 bytes overhead on a 16-byte address is not "ridiculously
bloated", especially if you want a netmask with it.

            regards, tom lane

Re: Best way to index IP data?

From

"D'Arcy J.M. Cain"

Date:

11 January 2008, 16:19:46

On Fri, 11 Jan 2008 15:07:38 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Stone <mstone+postgres@mathom.us> writes:
> > Well, a native IPv6 type would also be nice; inet is ridiculously
> > bloated for both IPv4 *and* IPv6.
>
> Nonsense.  3 bytes overhead on a 16-byte address is not "ridiculously
> bloated", especially if you want a netmask with it.

Besides, there are many cases where you want to track both ipv4 and
ipv6 for the same purpose and requiring two different fields would be
less than ideal.

--
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.

Re: Best way to index IP data?

From

Michael Stone

Date:

11 January 2008, 17:27:26

On Fri, Jan 11, 2008 at 03:07:38PM -0500, Tom Lane wrote:
>Michael Stone <mstone+postgres@mathom.us> writes:
>> Well, a native IPv6 type would also be nice; inet is ridiculously
>> bloated for both IPv4 *and* IPv6.
>
>Nonsense.  3 bytes overhead on a 16-byte address is not "ridiculously
>bloated", especially if you want a netmask with it.

Big if, no? There's a very large set of users that *don't* want/need a
netmask, which is why the topic keeps coming back. (Also, according to
the docs, inet requires 24 bytes, which is 50% more than needed; is that
not correct?)

Mike Stone

Re: Best way to index IP data?

From

Michael Stone

Date:

11 January 2008, 17:32:30

On Fri, Jan 11, 2008 at 03:19:35PM -0500, D'Arcy J.M. Cain wrote:
>Besides, there are many cases where you want to track both ipv4 and
>ipv6 for the same purpose and requiring two different fields would be
>less than ideal.

And, there are many cases where you don't. I've got two kinds of db's
that have IPs in them. In some, the IP is a small part of a table which
is focused on something else. For those I use inet, which provides a
nice bit of future-proofing. In other db's the IPs are the primary
focus. There are lots and lots of IPs, and the space used by IPs may be
the largest chunk of a particular table. For those tables, I don't use
inet because the overhead really is a significant fraction of the space.

Mike Stone

Re: Best way to index IP data?

From

Alvaro Herrera

Date:

11 January 2008, 17:37:21

Michael Stone wrote:
> On Fri, Jan 11, 2008 at 03:07:38PM -0500, Tom Lane wrote:
>> Michael Stone <mstone+postgres@mathom.us> writes:
>>> Well, a native IPv6 type would also be nice; inet is ridiculously bloated
>>> for both IPv4 *and* IPv6.
>>
>> Nonsense.  3 bytes overhead on a 16-byte address is not "ridiculously
>> bloated", especially if you want a netmask with it.
>
> Big if, no? There's a very large set of users that *don't* want/need a
> netmask, which is why the topic keeps coming back. (Also, according to the
> docs, inet requires 24 bytes, which is 50% more than needed; is that not
> correct?)

So what this means is that our type oughta be optimized.  How about
having a separate bit to indicate whether there is a netmask or not, and
chop the storage earlier.  (I dunno if this already done)

Also, with packed varlenas the overhead is reduced AFAIK.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Best way to index IP data?

From

Michael Stone

Date:

11 January 2008, 17:41:06

On Fri, Jan 11, 2008 at 04:32:05PM -0500, Michael Stone wrote:
>On Fri, Jan 11, 2008 at 03:19:35PM -0500, D'Arcy J.M. Cain wrote:
>>Besides, there are many cases where you want to track both ipv4 and
>>ipv6 for the same purpose and requiring two different fields would be
>>less than ideal.
>
>And, there are many cases where you don't. I've got two kinds of db's
>that have IPs in them. In some, the IP is a small part of a table which
>is focused on something else. For those I use inet, which provides a
>nice bit of future-proofing. In other db's the IPs are the primary
>focus. There are lots and lots of IPs, and the space used by IPs may be
>the largest chunk of a particular table. For those tables, I don't use
>inet because the overhead really is a significant fraction of the space.

Oh, yeah, the latter type also has seperate IPv4 and IPv6 tables,
because there's no point in bloating 99% of the data for the 1% that's
IPv6. Is that a niche requirement? Maybe--but I think that storing
netmasks is even *more* of a niche...

I'm not arguing for the removal of inet, but I do think there's room for
more than one type--and I certainly think its nuts to pretend that inet
can meet every requirement well.

Mike Stone

Re: Best way to index IP data?

From

Michael Stone

Date:

11 January 2008, 18:02:52

On Fri, Jan 11, 2008 at 06:37:10PM -0300, Alvaro Herrera wrote:
>So what this means is that our type oughta be optimized.  How about
>having a separate bit to indicate whether there is a netmask or not, and
>chop the storage earlier.  (I dunno if this already done)

Why not just have a type that indicates whether there is a netmask or
not? We currently have this (8.3 docs, which I see reflects the 3 byte
overhead--down to 20% rather than 50% for IPv6):

cidr    7 or 19 bytes    IPv4 and IPv6 networks
inet    7 or 19 bytes    IPv4 and IPv6 hosts and networks

Note that there's a type for (networks), and there's a type for (hosts and
networks), but there's a conspicuous lack of a type for (hosts). I
suppose if you really are sure that you want to store hosts and not
networks you should use inet and then set a constraint like
  if (family() == 4 && masklen() == 32)
  elsif (family() == 6 && masklen() == 128)

(For people whose databases don't resolve around network data, this
probably seems like not a big deal. OTOH, I can only imagine the outcry
if the only available arithmetic type was an intfloat, which can be
either an integer or a real number, has very low overhead to keep track
of whether there's a decimal point, and can easily be made to behave
like an integer if you set a constraint forbidding fractional parts.
Because, hey, you *never know* when you might need a real number, and
wouldn't want to paint yourself into a corner by stupidly specifying an
integer-only type.)

Mike Stone

Re: Best way to index IP data?

From

Andrew Sullivan

Date:

11 January 2008, 18:24:39

On Fri, Jan 11, 2008 at 05:02:36PM -0500, Michael Stone wrote:

> networks), but there's a conspicuous lack of a type for (hosts). I
> suppose if you really are sure that you want to store hosts and not
> networks

Well, part of the trouble is that in the CIDR world, an IP without a netmask
can be dangerously ambiguous.  I can see why the design is as it is for that
reason.  (But I understand the problem.)

A

Re: Best way to index IP data?

From

Steve Atkins

Date:

11 January 2008, 18:38:40

On Jan 11, 2008, at 2:24 PM, Andrew Sullivan wrote:

> On Fri, Jan 11, 2008 at 05:02:36PM -0500, Michael Stone wrote:
>
>> networks), but there's a conspicuous lack of a type for (hosts). I
>> suppose if you really are sure that you want to store hosts and not
>> networks
>
> Well, part of the trouble is that in the CIDR world, an IP without a
> netmask
> can be dangerously ambiguous.  I can see why the design is as it is
> for that
> reason.  (But I understand the problem.)

I don't think there's ambiguity about what an dotted-quad without a
netmask
means, and hasn't been for a long time. Am I missing something?

There is ambiguity when you feed non dotted-quads into the
existing cidr I/O functions[1], but that's both a dead horse,
and not something likely to actually affect users negatively.

Cheers,
   Steve

[1] Because postgresql copied obsolete pre-CIDR code from libbind.

Re: Best way to index IP data?

From

Pomarede Nicolas

Date:

11 January 2008, 18:44:07

On Fri, 11 Jan 2008, Andrew Sullivan wrote:

> On Fri, Jan 11, 2008 at 05:02:36PM -0500, Michael Stone wrote:
>
>> networks), but there's a conspicuous lack of a type for (hosts). I
>> suppose if you really are sure that you want to store hosts and not
>> networks
>
> Well, part of the trouble is that in the CIDR world, an IP without a netmask
> can be dangerously ambiguous.  I can see why the design is as it is for that
> reason.  (But I understand the problem.)
>
> A
>

Yes, in fact it all depends on the meaning you give to an IP.

If you want to store subnets, then you need an IP and a netmask, but if
you just want to store the IP of a particular equipment (that is, the IP
that will be refered to in the TCP/IP header), then there's no ambiguity,
you just need 4 bytes to describe this IP.

And it's true for IPv6 too, storing an IP that refer to an end point and
not a subnet is requiring twice as much data as needed, because the
netmask would always be ff:ff:ff:..:ff

So, for people dealing with large database of IPs, it would be nice to be
able to save 50% of the corresponding disk/cache/ram space for these IPs.

Nicolas

Re: Best way to index IP data?

From

Andrew Sullivan

Date:

11 January 2008, 19:01:27

On Fri, Jan 11, 2008 at 02:38:27PM -0800, Steve Atkins wrote:
> I don't think there's ambiguity about what an dotted-quad without a
> netmask
> means, and hasn't been for a long time. Am I missing something?

Well, maybe.  The problem is actually that, without a netmask under CIDR,
the address alone isn't really enough.  You have to have a netmask to get
the packets to the destination.  As it happens, we have some nice
conventions, defined in the RFCs, for how to interpret hosts with no
netmask; note though that some of those are only for humans.  Or, to put it
another way, without context, a dotted-quad is insufficient on its own.
What you're really arguing is that the context ought to be storable
somewhere else (maybe in a human's brain).  I'm not suggesting that's wrong,
but I can see the "correctness" argument that someone might have made to get
to the datatype as it exists.  I think calling it "needless bloat" is just
holding it to the wrong criteria.

If you look at the binary wire data, that netmask is always represented in
some sense.  It can sometimes be more compact than the general-purpose data
type, though, no question.  This is why somewhere in this thread someone
talked about optimisation: there certainly are ways to make these things
more compact.

A

Re: Best way to index IP data?

From

Steve Atkins

Date:

11 January 2008, 19:37:43

On Jan 11, 2008, at 3:00 PM, Andrew Sullivan wrote:

> On Fri, Jan 11, 2008 at 02:38:27PM -0800, Steve Atkins wrote:
>> I don't think there's ambiguity about what an dotted-quad without a
>> netmask
>> means, and hasn't been for a long time. Am I missing something?
>
> Well, maybe.  The problem is actually that, without a netmask under
> CIDR,
> the address alone isn't really enough.  You have to have a netmask
> to get
> the packets to the destination.

Not really. You may well need netmasks to configure your interface, but
there's absolutely no need for them to identify an IP endpoint, which
is all
you need to identify the destination the packet is going to, and that
is the
most common use of IP addresses.

> As it happens, we have some nice
> conventions, defined in the RFCs, for how to interpret hosts with no
> netmask; note though that some of those are only for humans.  Or, to
> put it
> another way, without context, a dotted-quad is insufficient on its
> own.
> What you're really arguing is that the context ought to be storable
> somewhere else (maybe in a human's brain).

A dotted quad without any additional information is an IPv4 address,
the same as it would be if followed by "/32".

Netmasks are rarely needed at any level above routing
or (some forms of) address assignment, and that's where an awful lot
of use of IP
addresses happens. When you do need netmasks then the cidr type is
great, but that's not the common case.

(And the inet-with-a-netmask is an even odder duck, as it's packing
two mostly
unrelated bits of information into a single type).

> I'm not suggesting that's wrong,
> but I can see the "correctness" argument that someone might have
> made to get
> to the datatype as it exists.  I think calling it "needless bloat"
> is just
> holding it to the wrong criteria.
>
>
> If you look at the binary wire data, that netmask is always
> represented in
> some sense.  It can sometimes be more compact than the general-
> purpose data
> type, though, no question.  This is why somewhere in this thread
> someone
> talked about optimisation: there certainly are ways to make these
> things
> more compact.

I think we're drifting a long way away from a -performance topic here,
as
we're agreed that inet or cidr are likely not the best types for the
original
poster to use.

Cheers,
   Steve

Re: Best way to index IP data?

From

"Joshua D. Drake"

Date:

11 January 2008, 19:48:00

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, 11 Jan 2008 15:37:37 -0800
Steve Atkins <steve@blighty.com> wrote:

> > Well, maybe.  The problem is actually that, without a netmask
> > under CIDR,
> > the address alone isn't really enough.  You have to have a netmask  
> > to get
> > the packets to the destination.
> 
> Not really. You may well need netmasks to configure your interface,
> but there's absolutely no need for them to identify an IP endpoint,
> which is all
> you need to identify the destination the packet is going to, and
> that is the
> most common use of IP addresses.

Steve I think you are speaking of practicality and implementation
versus RFC compliance. I believe per the RFC Andrew is correct.

Sincerely,

Joshua D. Drake




- -- 
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ 
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHiAEtATb/zqfZUUQRAqVjAJ9cXrRmDyCYy1vwP6nYI2kbOlYxKgCgga9q
jIMuXCy8LKquevyPdehaQaA=
=FNIf
-----END PGP SIGNATURE-----

Re: Best way to index IP data?

From

Steve Atkins

Date:

11 January 2008, 19:55:52

On Jan 11, 2008, at 3:52 PM, Joshua D. Drake wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Fri, 11 Jan 2008 15:37:37 -0800
> Steve Atkins <steve@blighty.com> wrote:
>
>>> Well, maybe.  The problem is actually that, without a netmask
>>> under CIDR,
>>> the address alone isn't really enough.  You have to have a netmask
>>> to get
>>> the packets to the destination.
>>
>> Not really. You may well need netmasks to configure your interface,
>> but there's absolutely no need for them to identify an IP endpoint,
>> which is all
>> you need to identify the destination the packet is going to, and
>> that is the
>> most common use of IP addresses.
>
> Steve I think you are speaking of practicality and implementation
> versus RFC compliance. I believe per the RFC Andrew is correct.

I don't believe that's the case, but really we're at "how many angels
dance on the head of a pin" level quibbling by this point. :)

Cheers,
   Steve

Re: Best way to index IP data?

From

Michael Stone

Date:

11 January 2008, 20:12:07

On Fri, Jan 11, 2008 at 06:00:55PM -0500, Andrew Sullivan wrote:
>another way, without context, a dotted-quad is insufficient on its own.
>What you're really arguing is that the context ought to be storable
>somewhere else (maybe in a human's brain)

Or, say, in a database schema, where you say "this column contains host
IPs"? Are you suggesting that the IP associated with the host can
somehow be ambiguous? IMO, the only ambiguity is that the type is
defined as "network or host".

Mike Stone

Re: Best way to index IP data?

From

Tom Lane

Date:

11 January 2008, 20:19:19

Michael Stone <mstone+postgres@mathom.us> writes:
> On Fri, Jan 11, 2008 at 03:07:38PM -0500, Tom Lane wrote:
>> Nonsense.  3 bytes overhead on a 16-byte address is not "ridiculously
>> bloated", especially if you want a netmask with it.

> Big if, no? There's a very large set of users that *don't* want/need a
> netmask, which is why the topic keeps coming back. (Also, according to
> the docs, inet requires 24 bytes, which is 50% more than needed; is that
> not correct?)

It was correct, but not as of 8.3.  Considering you could save a whole
one byte by not storing the netmask (well, maybe more depending on
alignment considerations), the complaint level is unjustified.

            regards, tom lane

Re: Best way to index IP data?

From

Michael Stone

Date:

11 January 2008, 20:21:58

On Fri, Jan 11, 2008 at 03:52:11PM -0800, Joshua D. Drake wrote:
>Steve I think you are speaking of practicality and implementation
>versus RFC compliance. I believe per the RFC Andrew is correct.

There's an RFC for storing IPs in a database?

Mike Stone

Re: Best way to index IP data?

From

Michael Stone

Date:

11 January 2008, 20:36:39

On Fri, Jan 11, 2008 at 07:19:10PM -0500, Tom Lane wrote:
>It was correct, but not as of 8.3.  Considering you could save a whole
>one byte by not storing the netmask

Hmm. One for the netmask, plus the other two mystery bytes. :-) A byte
here and a byte there is fine. 20% of a few billion IPs does start to
add up.

Mike Stone

Re: Best way to index IP data?

From

"Joshua D. Drake"

Date:

11 January 2008, 20:39:05

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, 11 Jan 2008 19:21:29 -0500
Michael Stone <mstone+postgres@mathom.us> wrote:

> On Fri, Jan 11, 2008 at 03:52:11PM -0800, Joshua D. Drake wrote:
> >Steve I think you are speaking of practicality and implementation
> >versus RFC compliance. I believe per the RFC Andrew is correct.
> 
> There's an RFC for storing IPs in a database?

Sigh. No. But there is an RFC that declare how IPs are denoted and used.

Joshua D. Drake

> 
> Mike Stone
> 
> ---------------------------(end of
> broadcast)--------------------------- TIP 1: if posting/reading
> through Usenet, please send an appropriate subscribe-nomail command
> to majordomo@postgresql.org so that your message can get through to
> the mailing list cleanly
> 


- -- 
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ 
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHiA0oATb/zqfZUUQRAgWNAJ9F+9Kxxazh0QK0z9zcskkG1UhFnQCgpqA1
hGiNsL5wjqxM7bzW6qNJfrE=
=/FxX
-----END PGP SIGNATURE-----

Re: Best way to index IP data?

From

Tom Lane

Date:

11 January 2008, 20:51:28

Pomarede Nicolas <npomarede@corp.free.fr> writes:
> And it's true for IPv6 too, storing an IP that refer to an end point and
> not a subnet is requiring twice as much data as needed, because the
> netmask would always be ff:ff:ff:..:ff
> So, for people dealing with large database of IPs, it would be nice to be
> able to save 50% of the corresponding disk/cache/ram space for these IPs.

There seem to be a number of people in this thread laboring under the
illusion that we store a netmask as a mask.  It's a bit count (think
/32 or /128) and occupies a whole one byte on disk.  Killer overhead,
for sure.

            regards, tom lane

Re: Best way to index IP data?

From

"Joshua D. Drake"

Date:

11 January 2008, 20:53:26

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, 11 Jan 2008 19:36:02 -0500
Michael Stone <mstone+postgres@mathom.us> wrote:

> On Fri, Jan 11, 2008 at 07:19:10PM -0500, Tom Lane wrote:
> >It was correct, but not as of 8.3.  Considering you could save a
> >whole one byte by not storing the netmask
> 
> Hmm. One for the netmask, plus the other two mystery bytes. :-) A
> byte here and a byte there is fine. 20% of a few billion IPs does
> start to add up.

About a 65.00 hard disk:

select pg_size_pretty((500000000000 * 0.2)::bigint);
 pg_size_pretty 
- ----------------
 93 GB

But wumpf... that is a lot of ip addresses. Question is.. are you going
to have a few billion IPs in your database? Doubtful.

Sincerely,

Joshua D. Drake


> 
> Mike Stone
> 
> ---------------------------(end of
> broadcast)--------------------------- TIP 9: In versions below 8.0,
> the planner will ignore your desire to choose an index scan if your
> joining column's datatypes do not match
> 


- -- 
The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ 
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD'


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHiBCMATb/zqfZUUQRAotjAJ4r6kjuO8pOZzD316Va1AE8VNt6TgCggQcT
lI/kT2DF59Zuu7cbipdBpPI=
=/xl5
-----END PGP SIGNATURE-----

Re: Best way to index IP data?

From

Michael Stone

Date:

11 January 2008, 21:04:37

On Fri, Jan 11, 2008 at 04:43:18PM -0800, Joshua D. Drake wrote:
>Sigh. No. But there is an RFC that declare how IPs are denoted and used.

Sigh, I'm honestly curious about what RFC says that endpoint IPs must
have netmasks associated with them.

Mike Stone

Re: Best way to index IP data?

From

Michael Stone

Date:

11 January 2008, 21:08:11

On Fri, Jan 11, 2008 at 04:57:46PM -0800, Joshua D. Drake wrote:
>Question is.. are you going
>to have a few billion IPs in your database? Doubtful.

Really depends on what you're using the database for, doesn't it?

Mike Stone

Re: Best way to index IP data?

From

Michael Stone

Date:

11 January 2008, 21:58:56

Tom wrote:
>There seem to be a number of people in this thread laboring under the
>illusion that we store a netmask as a mask.  It's a bit count (think
>/32 or /128) and occupies a whole one byte on disk.  Killer overhead,
>for sure.

There's no need to be quite so snarky. The netmask isn't the only part
of the overhead, but you did invite discussion of the netmask in
particular when you said "3 bytes overhead on a 16-byte address is not
'ridiculously bloated', *especially if you want a netmask with it*." You
might be hearing less about netmasks if you hadn't used them to justify
the size of the inet type. :-) There's also a number of issues being
conflated, as tends to happen when the pent up displeasure about inet
erupts on its semi-annual schedule. For myself, I mentioned two distinct
issues:

1) overhead (over absolute minimum required): 20% for IPv6 and *75%* for
IPv4. (In fairness, I actually am testing using inet for ipv6 tables, on
the assumption that I'll get another order of magnitude out of the
hardware before I really need high-volume ipv6 storage, and then it
really won't matter. But today, for the kind of addresses seen in the
real world, it really does matter. Also, recall that while you live in
the development version, those of us in the release world are dealing
with overheads of 50% for IPv6 and *200%* for IPv4. It'll take us a
while to recalibrate. :-)

2) ambiguity/completeness of data types (is it a host? is it a network?
what data type do I use if I really want to ensure that people don't
stick routing information into my host column?)

netmasks are kinda part of both, but they aren't the main point of
either.

As far as the hostility surrounding discussions of inet, I understand
and appreciate that there are reasons inet & cidr work the way they do,
and I find inet to be very useful in certain cases. But there are also
cases where they suck, and responses along the lines of "you should be
using ipv6 anyway" don't ease the pain any. :-) Responses like "it would
just be too much work to support seperate ipv4 & ipv6 data types" would
still suck :-) but at least they wouldn't be telling people that they're
imagining the problems that inet has had meeting their particular
requirements. And regardless of whether postgres gets a seperate ipv4
data type, I'd still like to have an "inethost" data type as a
complement to the "cidr" data type. :-)

Mike Stone

Re: Best way to index IP data?

From

Greg Smith

Date:

11 January 2008, 22:56:50

On Fri, 11 Jan 2008, Steve Atkins wrote:

> You may well need netmasks to configure your interface, but there's
> absolutely no need for them to identify an IP endpoint, which is all you
> need to identify the destination the packet is going to, and that is the
> most common use of IP addresses.

Technically you can't ever send a packet unless you know both the endpoint
and your local netmask.  As the sender, you're obligated to determine if
the destination is on your local LAN (in which case you send it there) or
if it goes to the gateway.  That's similar to a routing decision, but it's
not quite--if you don't have to look in a routing table, it's not actually
part of routing.

I believe this sort of detail is why subnet masks are considered required
for some things even though it doesn't seem like they are needed.
Regardless, the details of how the packets move aren't important to some
applications, and arguing over what the RFCs do and don't require doesn't
change that.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Best way to index IP data?

From

david@lang.hm

Date:

11 January 2008, 23:22:54

On Fri, 11 Jan 2008, Greg Smith wrote:

>> You may well need netmasks to configure your interface, but there's
>> absolutely no need for them to identify an IP endpoint, which is all you
>> need to identify the destination the packet is going to, and that is the
>> most common use of IP addresses.
>
> Technically you can't ever send a packet unless you know both the endpoint
> and your local netmask.  As the sender, you're obligated to determine if the
> destination is on your local LAN (in which case you send it there) or if it
> goes to the gateway.  That's similar to a routing decision, but it's not
> quite--if you don't have to look in a routing table, it's not actually part
> of routing.

you also need to know your local IP address, but there is no reason to
need the netmask of the other end

my IP address is 64.81.33.126 why do you need to know my netmaask? how
would you find out what it is?

DNS doesn't report the netmask,and it's arguably the biggest database of
IP addresses around ;-)

one of the biggest reasons for storing IP addresses in a SQL database is
as part of log analysis.

David Lang

> I believe this sort of detail is why subnet masks are considered required for
> some things even though it doesn't seem like they are needed. Regardless, the
> details of how the packets move aren't important to some applications, and
> arguing over what the RFCs do and don't require doesn't change that.
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>              http://www.postgresql.org/docs/faq
>

Re: Best way to index IP data?

From

"D'Arcy J.M. Cain"

Date:

12 January 2008, 02:01:04

On Fri, 11 Jan 2008 21:56:38 -0500 (EST)
Greg Smith <gsmith@gregsmith.com> wrote:
> On Fri, 11 Jan 2008, Steve Atkins wrote:
>
> > You may well need netmasks to configure your interface, but there's
> > absolutely no need for them to identify an IP endpoint, which is all you
> > need to identify the destination the packet is going to, and that is the
> > most common use of IP addresses.
>
> Technically you can't ever send a packet unless you know both the endpoint
> and your local netmask.  As the sender, you're obligated to determine if
> the destination is on your local LAN (in which case you send it there) or
> if it goes to the gateway.  That's similar to a routing decision, but it's
> not quite--if you don't have to look in a routing table, it's not actually
> part of routing.

Not sure what your point is here.  Sure, you need the netmask but not
of every IP address you send to, only for the IP/network that you are
on.  That's a grand total of one netmask per interface that you need to
know.  And you don't store it in your database.

--
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.

Re: Best way to index IP data?

From

Florian Weimer

Date:

14 January 2008, 04:28:06

* Steve Atkins:

> I don't think there's ambiguity about what an dotted-quad without a
> netmask means, and hasn't been for a long time. Am I missing
> something?

Classful addressing is still part of many user interfaces, for
instance Cisco's IOS.  It's not just that the CLI accepts it, it's
also the standard[*] output format (that is, "192.0.2.0" instead of
"192.0.2.0/24"; if no prefix length is given, the one based on the
class is used).

[*] I don't think you can switch it off.  Obviously, for backwards
    compatibility reasons, the default has to stay anyway.

--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

Re: Best way to index IP data?

From

Michael Stone

Date:

14 January 2008, 07:19:42

On Mon, Jan 14, 2008 at 09:27:52AM +0100, Florian Weimer wrote:
>* Steve Atkins:
>> I don't think there's ambiguity about what an dotted-quad without a
>> netmask means, and hasn't been for a long time. Am I missing
>> something?
>
>Classful addressing is still part of many user interfaces, for
>instance Cisco's IOS.  It's not just that the CLI accepts it, it's
>also the standard[*] output format (that is, "192.0.2.0" instead of
>"192.0.2.0/24"; if no prefix length is given, the one based on the
>class is used).

Again, is this at all ambiguous *in the context of a host IP*? (IOW, if
I say that I recieved an IP packet from 10.0.0.1, are you really going
to ask me to clarify the netmask?)

Mike Stone

Re: Best way to index IP data?

From

Florian Weimer

Date:

14 January 2008, 10:05:42

* Michael Stone:

>>Classful addressing is still part of many user interfaces, for
>>instance Cisco's IOS.  It's not just that the CLI accepts it, it's
>>also the standard[*] output format (that is, "192.0.2.0" instead of
>>"192.0.2.0/24"; if no prefix length is given, the one based on the
>>class is used).
>
> Again, is this at all ambiguous *in the context of a host IP*? (IOW,
> if I say that I recieved an IP packet from 10.0.0.1, are you really
> going to ask me to clarify the netmask?)

Hmm.  It's an argument for a separate CIDR type, not against a host
type.

I agree that a host type would be helpful.  I currently use check
constraints to ensure no one accidentally stores data of the wrong
type, which is not ideal (space is not a main concern at this point).

--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

Re: Best way to index IP data?

From

Michael Stone

Date:

14 January 2008, 10:12:22

On Mon, Jan 14, 2008 at 03:05:27PM +0100, Florian Weimer wrote:
>Hmm.  It's an argument for a separate CIDR type, not against a host
>type.

I don't think anyone argued against the CIDR type. :-)

Mike Stone