Thread: Best way to index IP data?
Hi List; We'll be loading a table with begining & ending I.P.'s - the table will likely have upwards of 30million rows. Any thoughts on how to get the best performance out of queries that want to look for IP ranges or the use of between queries? Should these be modeled as integers? Thanks in advance /Kevin
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Thu, 10 Jan 2008 16:14:54 -0700 Kevin Kempter <kevin@kevinkempterllc.com> wrote: > Hi List; > > We'll be loading a table with begining & ending I.P.'s - the table > will likely have upwards of 30million rows. Any thoughts on how to > get the best performance out of queries that want to look for IP > ranges or the use of between queries? Should these be modeled as > integers? > http://www.postgresql.org/docs/current/static/datatype-net-types.html > Thanks in advance > > /Kevin > > ---------------------------(end of > broadcast)--------------------------- TIP 1: if posting/reading > through Usenet, please send an appropriate subscribe-nomail command > to majordomo@postgresql.org so that your message can get through to > the mailing list cleanly > - -- The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD' -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFHhqjHATb/zqfZUUQRAvMOAJ984Np5GMrFd1vixP/zECIl3qUWYgCff6U4 bCBBz1VaxqIoZfCFfKEIZLU= =+9vD -----END PGP SIGNATURE-----
On Jan 10, 2008, at 3:14 PM, Kevin Kempter wrote: > Hi List; > > We'll be loading a table with begining & ending I.P.'s - the table > will likely > have upwards of 30million rows. Any thoughts on how to get the best > performance out of queries that want to look for IP ranges or the > use of > between queries? Should these be modeled as integers? http://pgfoundry.org/projects/ip4r/ That has the advantage over using integers, or the built-in inet type, of being indexable for range and overlap queries. Cheers, Steve
On Jan 10, 2008 6:25 PM, Steve Atkins <steve@blighty.com> wrote: > http://pgfoundry.org/projects/ip4r/ > > That has the advantage over using integers, or the built-in inet type, > of being indexable for range and overlap queries. Agreed. ip4r is da bomb. -- Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324 EnterpriseDB Corporation | fax: 732.331.1301 499 Thornall Street, 2nd Floor | jonah.harris@enterprisedb.com Edison, NJ 08837 | http://www.enterprisedb.com/
On Thu, 10 Jan 2008, Jonah H. Harris wrote: > On Jan 10, 2008 6:25 PM, Steve Atkins <steve@blighty.com> wrote: >> http://pgfoundry.org/projects/ip4r/ >> >> That has the advantage over using integers, or the built-in inet type, >> of being indexable for range and overlap queries. > > Agreed. ip4r is da bomb. Hello to all, I also have to store a lot of IP v4 addresses, and I think the internal inet type is somewhat overkill for that, since it always require 8 bytes, even if you don't need to store a netmask. When storing millions of IP add, this means MB of space used for nothing in that case. As ip4r seems to work very well with postgresql, is there a possibility to see it merged in postgresql, to have a native 4 bytes IPv4 address date type ? Nicolas
Pomarede Nicolas <npomarede@corp.free.fr> writes: > As ip4r seems to work very well with postgresql, is there a possibility to > see it merged in postgresql, to have a native 4 bytes IPv4 address date > type ? Given that the world is going to IPv6 in a few years whether you like it or not, that seems pretty darn short-sighted to me. What would make sense IMHO is to adapt the improved indexing support in ip4r to work on the native inet/cidr types. regards, tom lane
On Fri, Jan 11, 2008 at 10:19:51AM -0500, Tom Lane wrote: > Given that the world is going to IPv6 in a few years whether you like it > or not, that seems pretty darn short-sighted to me. Indeed. Even ARIN has finally started to tell people that IPv4 is running out. There are currently significant deployments of IPv6 in the Asia-Pacific region. And it appears that Comcast is planning to move to IPv6 for its own network deployment, which may mean that many U.S. homes will have native v6 in the near future (the upshot of their plans aren't actually clear to me yet, but if you're interested in some of what they're telling people they're doing, look for Alain Durand's presentation to the v6ops working group at the last IETF meeting). > What would make sense IMHO is to adapt the improved indexing support in > ip4r to work on the native inet/cidr types. This seems like a good idea to me.
On Fri, 11 Jan 2008, Tom Lane wrote: > Pomarede Nicolas <npomarede@corp.free.fr> writes: >> As ip4r seems to work very well with postgresql, is there a possibility to >> see it merged in postgresql, to have a native 4 bytes IPv4 address date >> type ? > > Given that the world is going to IPv6 in a few years whether you like it > or not, that seems pretty darn short-sighted to me. > > What would make sense IMHO is to adapt the improved indexing support in > ip4r to work on the native inet/cidr types. > I understand your point on IPv6, but still being able to store IPv4 addresses with as little overhead as possible is important. IPv6 will certainly grow in the year to come, but if you consider the case of a very large private lan, with ip in the form 10.x.y.z, the fact that the outside world is now ipv6 doesn't necessarily imply you will rename all your internal equipments to be ipv6 if you don't need more addresses (you can do the translation when packets cross the ipv6/ipv4 gateway in your network). To be more concret, I'm working for a large french ISP, so far we have 2+ millions "boxes" (triple play adsl equipments) at our customers' home. All theses boxes have a private IPv4 address in 10.x.y.z as well as a public IPV4 address, and although we recently activated a public IPv6 addr on these boxes too (which certainly gives one of the biggest IPv6 network so far), we still need to store one ipv4 and one ipv6 addr for each box. So, my point was not to be short-sighted, we will go IPv6 for sure, it's just that there're a lot of applications where storing ipv4 addr could be needed (whether ipv6 is required for other applications or not), and in this regard, I think that being able to store ipv4 addr with 4 bytes instead of 8 could be appreciated. Or perhaps another solution would be to have built-in inet_aton / inet_ntoa functions in postgres, to store the result using an integer (unsigned) ? Nicolas
On Jan 11, 2008, at 7:19 AM, Tom Lane wrote: > Pomarede Nicolas <npomarede@corp.free.fr> writes: >> As ip4r seems to work very well with postgresql, is there a >> possibility to >> see it merged in postgresql, to have a native 4 bytes IPv4 address >> date >> type ? > > Given that the world is going to IPv6 in a few years whether you > like it > or not, that seems pretty darn short-sighted to me. > > What would make sense IMHO is to adapt the improved indexing support > in > ip4r to work on the native inet/cidr types. Can't be done. The native types are too limited to be effectively indexed in that way - they cannot represent arbitrary ranges. ip4r started with me trying to retrofit decent indexing onto the cidr type and failing miserably. I'll likely be rolling out ip6r/ipr sometime in 2008, as users are beginning to express an interest. But even then I don't expect it to replace the inet and cidr types in core, because it isn't compatible with them. I'd actually support removing inet/cidr from core completely in the longer run. Postgresql is extensible, so we really don't need types used only by niche users in core, once we have pgfoundry and something like mysqludf.org/CPAN. But that's a longer term thought. Cheers, Steve
Steve Atkins wrote: > I'd actually support removing inet/cidr from core completely in the longer > run. Postgresql is extensible, so we really don't need types used only > by niche users in core, once we have pgfoundry and something like > mysqludf.org/CPAN. But that's a longer term thought. I believe this is going to be solved by postgresqlpackages.org. Joshua D. Drake > > Cheers, > Steve > > > ---------------------------(end of broadcast)--------------------------- > TIP 7: You can help support the PostgreSQL project by donating at > > http://www.postgresql.org/about/donate >
On Fri, Jan 11, 2008 at 10:19:51AM -0500, Tom Lane wrote: >Given that the world is going to IPv6 in a few years whether you like it >or not, that seems pretty darn short-sighted to me. Well, a native IPv6 type would also be nice; inet is ridiculously bloated for both IPv4 *and* IPv6. Mike Stone
Michael Stone <mstone+postgres@mathom.us> writes: > Well, a native IPv6 type would also be nice; inet is ridiculously > bloated for both IPv4 *and* IPv6. Nonsense. 3 bytes overhead on a 16-byte address is not "ridiculously bloated", especially if you want a netmask with it. regards, tom lane
On Fri, 11 Jan 2008 15:07:38 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Michael Stone <mstone+postgres@mathom.us> writes: > > Well, a native IPv6 type would also be nice; inet is ridiculously > > bloated for both IPv4 *and* IPv6. > > Nonsense. 3 bytes overhead on a 16-byte address is not "ridiculously > bloated", especially if you want a netmask with it. Besides, there are many cases where you want to track both ipv4 and ipv6 for the same purpose and requiring two different fields would be less than ideal. -- D'Arcy J.M. Cain <darcy@druid.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
On Fri, Jan 11, 2008 at 03:07:38PM -0500, Tom Lane wrote: >Michael Stone <mstone+postgres@mathom.us> writes: >> Well, a native IPv6 type would also be nice; inet is ridiculously >> bloated for both IPv4 *and* IPv6. > >Nonsense. 3 bytes overhead on a 16-byte address is not "ridiculously >bloated", especially if you want a netmask with it. Big if, no? There's a very large set of users that *don't* want/need a netmask, which is why the topic keeps coming back. (Also, according to the docs, inet requires 24 bytes, which is 50% more than needed; is that not correct?) Mike Stone
On Fri, Jan 11, 2008 at 03:19:35PM -0500, D'Arcy J.M. Cain wrote: >Besides, there are many cases where you want to track both ipv4 and >ipv6 for the same purpose and requiring two different fields would be >less than ideal. And, there are many cases where you don't. I've got two kinds of db's that have IPs in them. In some, the IP is a small part of a table which is focused on something else. For those I use inet, which provides a nice bit of future-proofing. In other db's the IPs are the primary focus. There are lots and lots of IPs, and the space used by IPs may be the largest chunk of a particular table. For those tables, I don't use inet because the overhead really is a significant fraction of the space. Mike Stone
Michael Stone wrote: > On Fri, Jan 11, 2008 at 03:07:38PM -0500, Tom Lane wrote: >> Michael Stone <mstone+postgres@mathom.us> writes: >>> Well, a native IPv6 type would also be nice; inet is ridiculously bloated >>> for both IPv4 *and* IPv6. >> >> Nonsense. 3 bytes overhead on a 16-byte address is not "ridiculously >> bloated", especially if you want a netmask with it. > > Big if, no? There's a very large set of users that *don't* want/need a > netmask, which is why the topic keeps coming back. (Also, according to the > docs, inet requires 24 bytes, which is 50% more than needed; is that not > correct?) So what this means is that our type oughta be optimized. How about having a separate bit to indicate whether there is a netmask or not, and chop the storage earlier. (I dunno if this already done) Also, with packed varlenas the overhead is reduced AFAIK. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Fri, Jan 11, 2008 at 04:32:05PM -0500, Michael Stone wrote: >On Fri, Jan 11, 2008 at 03:19:35PM -0500, D'Arcy J.M. Cain wrote: >>Besides, there are many cases where you want to track both ipv4 and >>ipv6 for the same purpose and requiring two different fields would be >>less than ideal. > >And, there are many cases where you don't. I've got two kinds of db's >that have IPs in them. In some, the IP is a small part of a table which >is focused on something else. For those I use inet, which provides a >nice bit of future-proofing. In other db's the IPs are the primary >focus. There are lots and lots of IPs, and the space used by IPs may be >the largest chunk of a particular table. For those tables, I don't use >inet because the overhead really is a significant fraction of the space. Oh, yeah, the latter type also has seperate IPv4 and IPv6 tables, because there's no point in bloating 99% of the data for the 1% that's IPv6. Is that a niche requirement? Maybe--but I think that storing netmasks is even *more* of a niche... I'm not arguing for the removal of inet, but I do think there's room for more than one type--and I certainly think its nuts to pretend that inet can meet every requirement well. Mike Stone
On Fri, Jan 11, 2008 at 06:37:10PM -0300, Alvaro Herrera wrote: >So what this means is that our type oughta be optimized. How about >having a separate bit to indicate whether there is a netmask or not, and >chop the storage earlier. (I dunno if this already done) Why not just have a type that indicates whether there is a netmask or not? We currently have this (8.3 docs, which I see reflects the 3 byte overhead--down to 20% rather than 50% for IPv6): cidr 7 or 19 bytes IPv4 and IPv6 networks inet 7 or 19 bytes IPv4 and IPv6 hosts and networks Note that there's a type for (networks), and there's a type for (hosts and networks), but there's a conspicuous lack of a type for (hosts). I suppose if you really are sure that you want to store hosts and not networks you should use inet and then set a constraint like if (family() == 4 && masklen() == 32) elsif (family() == 6 && masklen() == 128) (For people whose databases don't resolve around network data, this probably seems like not a big deal. OTOH, I can only imagine the outcry if the only available arithmetic type was an intfloat, which can be either an integer or a real number, has very low overhead to keep track of whether there's a decimal point, and can easily be made to behave like an integer if you set a constraint forbidding fractional parts. Because, hey, you *never know* when you might need a real number, and wouldn't want to paint yourself into a corner by stupidly specifying an integer-only type.) Mike Stone
On Fri, Jan 11, 2008 at 05:02:36PM -0500, Michael Stone wrote: > networks), but there's a conspicuous lack of a type for (hosts). I > suppose if you really are sure that you want to store hosts and not > networks Well, part of the trouble is that in the CIDR world, an IP without a netmask can be dangerously ambiguous. I can see why the design is as it is for that reason. (But I understand the problem.) A
On Jan 11, 2008, at 2:24 PM, Andrew Sullivan wrote: > On Fri, Jan 11, 2008 at 05:02:36PM -0500, Michael Stone wrote: > >> networks), but there's a conspicuous lack of a type for (hosts). I >> suppose if you really are sure that you want to store hosts and not >> networks > > Well, part of the trouble is that in the CIDR world, an IP without a > netmask > can be dangerously ambiguous. I can see why the design is as it is > for that > reason. (But I understand the problem.) I don't think there's ambiguity about what an dotted-quad without a netmask means, and hasn't been for a long time. Am I missing something? There is ambiguity when you feed non dotted-quads into the existing cidr I/O functions[1], but that's both a dead horse, and not something likely to actually affect users negatively. Cheers, Steve [1] Because postgresql copied obsolete pre-CIDR code from libbind.
On Fri, 11 Jan 2008, Andrew Sullivan wrote: > On Fri, Jan 11, 2008 at 05:02:36PM -0500, Michael Stone wrote: > >> networks), but there's a conspicuous lack of a type for (hosts). I >> suppose if you really are sure that you want to store hosts and not >> networks > > Well, part of the trouble is that in the CIDR world, an IP without a netmask > can be dangerously ambiguous. I can see why the design is as it is for that > reason. (But I understand the problem.) > > A > Yes, in fact it all depends on the meaning you give to an IP. If you want to store subnets, then you need an IP and a netmask, but if you just want to store the IP of a particular equipment (that is, the IP that will be refered to in the TCP/IP header), then there's no ambiguity, you just need 4 bytes to describe this IP. And it's true for IPv6 too, storing an IP that refer to an end point and not a subnet is requiring twice as much data as needed, because the netmask would always be ff:ff:ff:..:ff So, for people dealing with large database of IPs, it would be nice to be able to save 50% of the corresponding disk/cache/ram space for these IPs. Nicolas
On Fri, Jan 11, 2008 at 02:38:27PM -0800, Steve Atkins wrote: > I don't think there's ambiguity about what an dotted-quad without a > netmask > means, and hasn't been for a long time. Am I missing something? Well, maybe. The problem is actually that, without a netmask under CIDR, the address alone isn't really enough. You have to have a netmask to get the packets to the destination. As it happens, we have some nice conventions, defined in the RFCs, for how to interpret hosts with no netmask; note though that some of those are only for humans. Or, to put it another way, without context, a dotted-quad is insufficient on its own. What you're really arguing is that the context ought to be storable somewhere else (maybe in a human's brain). I'm not suggesting that's wrong, but I can see the "correctness" argument that someone might have made to get to the datatype as it exists. I think calling it "needless bloat" is just holding it to the wrong criteria. If you look at the binary wire data, that netmask is always represented in some sense. It can sometimes be more compact than the general-purpose data type, though, no question. This is why somewhere in this thread someone talked about optimisation: there certainly are ways to make these things more compact. A
On Jan 11, 2008, at 3:00 PM, Andrew Sullivan wrote: > On Fri, Jan 11, 2008 at 02:38:27PM -0800, Steve Atkins wrote: >> I don't think there's ambiguity about what an dotted-quad without a >> netmask >> means, and hasn't been for a long time. Am I missing something? > > Well, maybe. The problem is actually that, without a netmask under > CIDR, > the address alone isn't really enough. You have to have a netmask > to get > the packets to the destination. Not really. You may well need netmasks to configure your interface, but there's absolutely no need for them to identify an IP endpoint, which is all you need to identify the destination the packet is going to, and that is the most common use of IP addresses. > As it happens, we have some nice > conventions, defined in the RFCs, for how to interpret hosts with no > netmask; note though that some of those are only for humans. Or, to > put it > another way, without context, a dotted-quad is insufficient on its > own. > What you're really arguing is that the context ought to be storable > somewhere else (maybe in a human's brain). A dotted quad without any additional information is an IPv4 address, the same as it would be if followed by "/32". Netmasks are rarely needed at any level above routing or (some forms of) address assignment, and that's where an awful lot of use of IP addresses happens. When you do need netmasks then the cidr type is great, but that's not the common case. (And the inet-with-a-netmask is an even odder duck, as it's packing two mostly unrelated bits of information into a single type). > I'm not suggesting that's wrong, > but I can see the "correctness" argument that someone might have > made to get > to the datatype as it exists. I think calling it "needless bloat" > is just > holding it to the wrong criteria. > > > If you look at the binary wire data, that netmask is always > represented in > some sense. It can sometimes be more compact than the general- > purpose data > type, though, no question. This is why somewhere in this thread > someone > talked about optimisation: there certainly are ways to make these > things > more compact. I think we're drifting a long way away from a -performance topic here, as we're agreed that inet or cidr are likely not the best types for the original poster to use. Cheers, Steve
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, 11 Jan 2008 15:37:37 -0800 Steve Atkins <steve@blighty.com> wrote: > > Well, maybe. The problem is actually that, without a netmask > > under CIDR, > > the address alone isn't really enough. You have to have a netmask > > to get > > the packets to the destination. > > Not really. You may well need netmasks to configure your interface, > but there's absolutely no need for them to identify an IP endpoint, > which is all > you need to identify the destination the packet is going to, and > that is the > most common use of IP addresses. Steve I think you are speaking of practicality and implementation versus RFC compliance. I believe per the RFC Andrew is correct. Sincerely, Joshua D. Drake - -- The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD' -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFHiAEtATb/zqfZUUQRAqVjAJ9cXrRmDyCYy1vwP6nYI2kbOlYxKgCgga9q jIMuXCy8LKquevyPdehaQaA= =FNIf -----END PGP SIGNATURE-----
On Jan 11, 2008, at 3:52 PM, Joshua D. Drake wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > On Fri, 11 Jan 2008 15:37:37 -0800 > Steve Atkins <steve@blighty.com> wrote: > >>> Well, maybe. The problem is actually that, without a netmask >>> under CIDR, >>> the address alone isn't really enough. You have to have a netmask >>> to get >>> the packets to the destination. >> >> Not really. You may well need netmasks to configure your interface, >> but there's absolutely no need for them to identify an IP endpoint, >> which is all >> you need to identify the destination the packet is going to, and >> that is the >> most common use of IP addresses. > > Steve I think you are speaking of practicality and implementation > versus RFC compliance. I believe per the RFC Andrew is correct. I don't believe that's the case, but really we're at "how many angels dance on the head of a pin" level quibbling by this point. :) Cheers, Steve
On Fri, Jan 11, 2008 at 06:00:55PM -0500, Andrew Sullivan wrote: >another way, without context, a dotted-quad is insufficient on its own. >What you're really arguing is that the context ought to be storable >somewhere else (maybe in a human's brain) Or, say, in a database schema, where you say "this column contains host IPs"? Are you suggesting that the IP associated with the host can somehow be ambiguous? IMO, the only ambiguity is that the type is defined as "network or host". Mike Stone
Michael Stone <mstone+postgres@mathom.us> writes: > On Fri, Jan 11, 2008 at 03:07:38PM -0500, Tom Lane wrote: >> Nonsense. 3 bytes overhead on a 16-byte address is not "ridiculously >> bloated", especially if you want a netmask with it. > Big if, no? There's a very large set of users that *don't* want/need a > netmask, which is why the topic keeps coming back. (Also, according to > the docs, inet requires 24 bytes, which is 50% more than needed; is that > not correct?) It was correct, but not as of 8.3. Considering you could save a whole one byte by not storing the netmask (well, maybe more depending on alignment considerations), the complaint level is unjustified. regards, tom lane
On Fri, Jan 11, 2008 at 03:52:11PM -0800, Joshua D. Drake wrote: >Steve I think you are speaking of practicality and implementation >versus RFC compliance. I believe per the RFC Andrew is correct. There's an RFC for storing IPs in a database? Mike Stone
On Fri, Jan 11, 2008 at 07:19:10PM -0500, Tom Lane wrote: >It was correct, but not as of 8.3. Considering you could save a whole >one byte by not storing the netmask Hmm. One for the netmask, plus the other two mystery bytes. :-) A byte here and a byte there is fine. 20% of a few billion IPs does start to add up. Mike Stone
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, 11 Jan 2008 19:21:29 -0500 Michael Stone <mstone+postgres@mathom.us> wrote: > On Fri, Jan 11, 2008 at 03:52:11PM -0800, Joshua D. Drake wrote: > >Steve I think you are speaking of practicality and implementation > >versus RFC compliance. I believe per the RFC Andrew is correct. > > There's an RFC for storing IPs in a database? Sigh. No. But there is an RFC that declare how IPs are denoted and used. Joshua D. Drake > > Mike Stone > > ---------------------------(end of > broadcast)--------------------------- TIP 1: if posting/reading > through Usenet, please send an appropriate subscribe-nomail command > to majordomo@postgresql.org so that your message can get through to > the mailing list cleanly > - -- The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD' -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFHiA0oATb/zqfZUUQRAgWNAJ9F+9Kxxazh0QK0z9zcskkG1UhFnQCgpqA1 hGiNsL5wjqxM7bzW6qNJfrE= =/FxX -----END PGP SIGNATURE-----
Pomarede Nicolas <npomarede@corp.free.fr> writes: > And it's true for IPv6 too, storing an IP that refer to an end point and > not a subnet is requiring twice as much data as needed, because the > netmask would always be ff:ff:ff:..:ff > So, for people dealing with large database of IPs, it would be nice to be > able to save 50% of the corresponding disk/cache/ram space for these IPs. There seem to be a number of people in this thread laboring under the illusion that we store a netmask as a mask. It's a bit count (think /32 or /128) and occupies a whole one byte on disk. Killer overhead, for sure. regards, tom lane
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 On Fri, 11 Jan 2008 19:36:02 -0500 Michael Stone <mstone+postgres@mathom.us> wrote: > On Fri, Jan 11, 2008 at 07:19:10PM -0500, Tom Lane wrote: > >It was correct, but not as of 8.3. Considering you could save a > >whole one byte by not storing the netmask > > Hmm. One for the netmask, plus the other two mystery bytes. :-) A > byte here and a byte there is fine. 20% of a few billion IPs does > start to add up. About a 65.00 hard disk: select pg_size_pretty((500000000000 * 0.2)::bigint); pg_size_pretty - ---------------- 93 GB But wumpf... that is a lot of ip addresses. Question is.. are you going to have a few billion IPs in your database? Doubtful. Sincerely, Joshua D. Drake > > Mike Stone > > ---------------------------(end of > broadcast)--------------------------- TIP 9: In versions below 8.0, > the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match > - -- The PostgreSQL Company: Since 1997, http://www.commandprompt.com/ Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate SELECT 'Training', 'Consulting' FROM vendor WHERE name = 'CMD' -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) iD8DBQFHiBCMATb/zqfZUUQRAotjAJ4r6kjuO8pOZzD316Va1AE8VNt6TgCggQcT lI/kT2DF59Zuu7cbipdBpPI= =/xl5 -----END PGP SIGNATURE-----
On Fri, Jan 11, 2008 at 04:43:18PM -0800, Joshua D. Drake wrote: >Sigh. No. But there is an RFC that declare how IPs are denoted and used. Sigh, I'm honestly curious about what RFC says that endpoint IPs must have netmasks associated with them. Mike Stone
On Fri, Jan 11, 2008 at 04:57:46PM -0800, Joshua D. Drake wrote: >Question is.. are you going >to have a few billion IPs in your database? Doubtful. Really depends on what you're using the database for, doesn't it? Mike Stone
Tom wrote: >There seem to be a number of people in this thread laboring under the >illusion that we store a netmask as a mask. It's a bit count (think >/32 or /128) and occupies a whole one byte on disk. Killer overhead, >for sure. There's no need to be quite so snarky. The netmask isn't the only part of the overhead, but you did invite discussion of the netmask in particular when you said "3 bytes overhead on a 16-byte address is not 'ridiculously bloated', *especially if you want a netmask with it*." You might be hearing less about netmasks if you hadn't used them to justify the size of the inet type. :-) There's also a number of issues being conflated, as tends to happen when the pent up displeasure about inet erupts on its semi-annual schedule. For myself, I mentioned two distinct issues: 1) overhead (over absolute minimum required): 20% for IPv6 and *75%* for IPv4. (In fairness, I actually am testing using inet for ipv6 tables, on the assumption that I'll get another order of magnitude out of the hardware before I really need high-volume ipv6 storage, and then it really won't matter. But today, for the kind of addresses seen in the real world, it really does matter. Also, recall that while you live in the development version, those of us in the release world are dealing with overheads of 50% for IPv6 and *200%* for IPv4. It'll take us a while to recalibrate. :-) 2) ambiguity/completeness of data types (is it a host? is it a network? what data type do I use if I really want to ensure that people don't stick routing information into my host column?) netmasks are kinda part of both, but they aren't the main point of either. As far as the hostility surrounding discussions of inet, I understand and appreciate that there are reasons inet & cidr work the way they do, and I find inet to be very useful in certain cases. But there are also cases where they suck, and responses along the lines of "you should be using ipv6 anyway" don't ease the pain any. :-) Responses like "it would just be too much work to support seperate ipv4 & ipv6 data types" would still suck :-) but at least they wouldn't be telling people that they're imagining the problems that inet has had meeting their particular requirements. And regardless of whether postgres gets a seperate ipv4 data type, I'd still like to have an "inethost" data type as a complement to the "cidr" data type. :-) Mike Stone
On Fri, 11 Jan 2008, Steve Atkins wrote: > You may well need netmasks to configure your interface, but there's > absolutely no need for them to identify an IP endpoint, which is all you > need to identify the destination the packet is going to, and that is the > most common use of IP addresses. Technically you can't ever send a packet unless you know both the endpoint and your local netmask. As the sender, you're obligated to determine if the destination is on your local LAN (in which case you send it there) or if it goes to the gateway. That's similar to a routing decision, but it's not quite--if you don't have to look in a routing table, it's not actually part of routing. I believe this sort of detail is why subnet masks are considered required for some things even though it doesn't seem like they are needed. Regardless, the details of how the packets move aren't important to some applications, and arguing over what the RFCs do and don't require doesn't change that. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Fri, 11 Jan 2008, Greg Smith wrote: >> You may well need netmasks to configure your interface, but there's >> absolutely no need for them to identify an IP endpoint, which is all you >> need to identify the destination the packet is going to, and that is the >> most common use of IP addresses. > > Technically you can't ever send a packet unless you know both the endpoint > and your local netmask. As the sender, you're obligated to determine if the > destination is on your local LAN (in which case you send it there) or if it > goes to the gateway. That's similar to a routing decision, but it's not > quite--if you don't have to look in a routing table, it's not actually part > of routing. you also need to know your local IP address, but there is no reason to need the netmask of the other end my IP address is 64.81.33.126 why do you need to know my netmaask? how would you find out what it is? DNS doesn't report the netmask,and it's arguably the biggest database of IP addresses around ;-) one of the biggest reasons for storing IP addresses in a SQL database is as part of log analysis. David Lang > I believe this sort of detail is why subnet masks are considered required for > some things even though it doesn't seem like they are needed. Regardless, the > details of how the packets move aren't important to some applications, and > arguing over what the RFCs do and don't require doesn't change that. > > -- > * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >
On Fri, 11 Jan 2008 21:56:38 -0500 (EST) Greg Smith <gsmith@gregsmith.com> wrote: > On Fri, 11 Jan 2008, Steve Atkins wrote: > > > You may well need netmasks to configure your interface, but there's > > absolutely no need for them to identify an IP endpoint, which is all you > > need to identify the destination the packet is going to, and that is the > > most common use of IP addresses. > > Technically you can't ever send a packet unless you know both the endpoint > and your local netmask. As the sender, you're obligated to determine if > the destination is on your local LAN (in which case you send it there) or > if it goes to the gateway. That's similar to a routing decision, but it's > not quite--if you don't have to look in a routing table, it's not actually > part of routing. Not sure what your point is here. Sure, you need the netmask but not of every IP address you send to, only for the IP/network that you are on. That's a grand total of one netmask per interface that you need to know. And you don't store it in your database. -- D'Arcy J.M. Cain <darcy@druid.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
* Steve Atkins: > I don't think there's ambiguity about what an dotted-quad without a > netmask means, and hasn't been for a long time. Am I missing > something? Classful addressing is still part of many user interfaces, for instance Cisco's IOS. It's not just that the CLI accepts it, it's also the standard[*] output format (that is, "192.0.2.0" instead of "192.0.2.0/24"; if no prefix length is given, the one based on the class is used). [*] I don't think you can switch it off. Obviously, for backwards compatibility reasons, the default has to stay anyway. -- Florian Weimer <fweimer@bfk.de> BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99
On Mon, Jan 14, 2008 at 09:27:52AM +0100, Florian Weimer wrote: >* Steve Atkins: >> I don't think there's ambiguity about what an dotted-quad without a >> netmask means, and hasn't been for a long time. Am I missing >> something? > >Classful addressing is still part of many user interfaces, for >instance Cisco's IOS. It's not just that the CLI accepts it, it's >also the standard[*] output format (that is, "192.0.2.0" instead of >"192.0.2.0/24"; if no prefix length is given, the one based on the >class is used). Again, is this at all ambiguous *in the context of a host IP*? (IOW, if I say that I recieved an IP packet from 10.0.0.1, are you really going to ask me to clarify the netmask?) Mike Stone
* Michael Stone: >>Classful addressing is still part of many user interfaces, for >>instance Cisco's IOS. It's not just that the CLI accepts it, it's >>also the standard[*] output format (that is, "192.0.2.0" instead of >>"192.0.2.0/24"; if no prefix length is given, the one based on the >>class is used). > > Again, is this at all ambiguous *in the context of a host IP*? (IOW, > if I say that I recieved an IP packet from 10.0.0.1, are you really > going to ask me to clarify the netmask?) Hmm. It's an argument for a separate CIDR type, not against a host type. I agree that a host type would be helpful. I currently use check constraints to ensure no one accidentally stores data of the wrong type, which is not ideal (space is not a main concern at this point). -- Florian Weimer <fweimer@bfk.de> BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99
On Mon, Jan 14, 2008 at 03:05:27PM +0100, Florian Weimer wrote: >Hmm. It's an argument for a separate CIDR type, not against a host >type. I don't think anyone argued against the CIDR type. :-) Mike Stone