Thread: Brokenness in parsing of pg_hba.conf
So one of the client machines for one of my databases at work resides on 10.128.0.45. I had to enter something in pg_hba.conf for it today, as we're bringing this database up. We have a lot of 10/8 subnets, and I use it at home, so I'm accustomed to just using 10.128.45 for the IP. Strangely, however, postgres refused to acknowledge the host when it connected. I checked it again, and sure enough, the IP was right. It turns out that postgres parses pg_hba.conf in an unexpected way -- it does not accept "abbreviated" ip4 addresses (note that this is common in both ip4 and ip6).

In the manpage for inet_aton, we see:

INTERNET ADDRESSES (IP VERSION 4)
    Values specified using the `.' notation take one of the following forms:

        a.b.c.d
        a.b.c
        a.b
        a

    When four parts are specified, each is interpreted as a byte of data and
    assigned, from left to right, to the four bytes of an Internet address.

Andrew Dunstan on IRC mentioned that the parser is using the native getaddrinfo. I'm not sure if there are any advantages to this; I've said before that I'm really not a C guy.

Paul Vixie had this to say about the convention:

> What this man page is trying to tell you is that BSD users have
> historically said "10.73" rather than "10.0.0.73" because they both
> worked any place where either worked. This includes DNS primary zone
> files, by the way.
>
> I am pretty much assuming that the IETF does not want to standardize
> this BSD practice, and that we ought not to accept ::10.73 as
> equivalent to the longer ::10.0.0.73, especially given that the
> degenerate case given in that man page would be ambiguous with respect
> to ::1234, a valid RFC1884 address specifier whose low order 16 bits
> are hexadecimal 1234 rather than decimal 1234.
>
> However, that's only _my_ assumption, and some other implementor may
> feel differently. In fact some other implementor of RFC 1884 might
> decide to just call inet_aton() for parsing that IPv4 "dotted quad",
> which is what I almost did.

The original article can be found here:

http://www.cs-ipv6.lancs.ac.uk/ipv6/mail-archive/IPng/1996-06/0037.html

I think it is important that postgres behave as expected when handing it a properly formatted ip4 address. However, I'm aware that many people don't even realize this is a valid address. As such, I won't lose any sleep over it, but I thought I'd mention it, since it surprised me today.

Thoughts?

Alex

--
alex@posixnap.net
Alex J. Avriette, Solaris Frobnosticator
"You can get much farther with a kind word and a gun than you can with a kind word alone." - Al Capone
A few points.

1. Clarification of my IRC comment: a quick examination seems to show that we use the native getaddrinfo() where it exists, otherwise we use our own, which in turn calls inet_ntoa().

2. ip6 has a well defined standard for abbreviation, which is quite important to have since ip6 addresses would otherwise often be tediously long. I haven't found a comparable standard for abbreviating IP4 addresses. There appears to be a convention relying on the behaviour of inet_aton, and perhaps hallowed by history, but by any measure surely brain dead and counter intuitive. Why would a.b.c become a.b.0.c and a.b become a.0.0.b? On Linux it is not even documented. See the email from Paul Vixie cited in Alex's original message for further gory details, including a citation of rfc1208 that specifies exactly 4 parts for a dotted notation. It's not surprising that he starts one sentence thus: "Now, before you laugh so hard you fall out of your collective seats,".

3. Maybe some people are used to it. In around 15 years of using and administering Unix I haven't tripped over this before, so I suspect it's probably not a huge problem :-)

4. My personal preference would be that if any change is made it would be to insist on an unabbreviated dotted quad for ip4. Alternatively, we need to make sure that whatever we do is consistent. That might not be possible, however, if different platforms or different library calls behave differently.

cheers

andrew
On Tue, Jan 06, 2004 at 10:52:19PM -0500, Andrew Dunstan wrote: > 4. My personal preference would be that if any change is made it would > be to insist on an unabbreviated dotted quad for ip4. Alternatively, we I really think this is the wrong way to approach it. The 127.1 convention is common, and valid. To disallow it because you haven't experienced it is pretty egocentric. If you would instead object on the grounds of it being difficult to implement, or non portable, or outright incorrect, I would be fine with it. But the attitude of "I've never seen this, and I don't like it, regardless of the documentation" just sucks. > need to make sure that whatever we do is consistent. That might not be > possible, however, if different platforms or different library calls > behave differently. In how many places are we using inet_aton? I see in the docs: http://www.postgresql.org/docs/7.4/static/datatype-net-types.html#DATATYPE-INET It looks like the abbreviated addresses there refer to networks (like the RFC says). Additionally, if you give it '192.168.1/32', you get 192.168.1.0/32. This is even weirder than I expected. I'd really like to hear from others what their opinions on this are. alex -- alex@posixnap.net Alex J. Avriette, Shepherd of wayward Database Administrators "We are paying through the nose to be ignorant." - Larry Ellison
Andrew Dunstan <andrew@dunslane.net> writes:
> 1. clarification of my IRC comment: A quick examination seems to show
> that we use the native getaddrinfo() where it exists, otherwise we use
> our own, which in turn calls inet_ntoa().
> 2. ip6 has a well defined standard for abbreviation, and is quite
> important to have since ip6 addresses would otherwise often be tediously
> long. I haven't found a comparable standard for abbreviating IP4
> addresses.

AFAICS, Alex is quite far out in left field to believe that this is a standard notation. The fact that some BSD platforms have accepted it does not make it standard, especially not when Vixie's research shows that there is no RFC to legitimize it. (Personally I never heard of it before either, not that that proves much...)

> 4. My personal preference would be that if any change is made it would
> be to insist on an unabbreviated dotted quad for ip4.

I can't get excited about replacing or second-guessing the platform's getaddrinfo() or inet_aton() implementation. If you don't like how those library routines behave, forward your bug report appropriately --- but it's not Postgres' problem.

regards, tom lane
"Alex J. Avriette" <alex@posixnap.net> writes: > I really think this is the wrong way to approach it. The 127.1 > convention is common, and valid. AFAICS your own platform's C library doesn't support it, which means you are on pretty shaky ground to make this claim. regards, tom lane
"Alex J. Avriette" <alex@posixnap.net> writes: > In how many places are we using inet_aton? BTW, further digging shows that when the platform has neither getaddrinfo nor inet_aton, we fall back to src/port/inet_aton.c, which is a BSD-derived bit of code that behaves exactly as per your man page quote. So I'm pretty well convinced that your gripe is misdirected: you should be complaining to the authors of your C library. regards, tom lane
On Tue, Jan 06, 2004 at 11:38:44PM -0500, Tom Lane wrote: > AFAICS, Alex is quite far out in left field to believe that this is a > standard notation. The fact that some BSD platforms have accepted it How did I know you'd say that, Tom? By "standard," I mean, "many people use it." Not, "some standard is defined." For me, the manpage is enough. Additionally, the fact that I (and you) can ping 127.1 on our (your) machine is enough for me. Go on, try it. > does not make it standard, especially not when Vixie's research shows > that there is no RFC to legitimize it. (Personally I never heard of Vixie is known for being slightly ... irritable. If he encounters something he doesn't like, his first response is "oh, that's stupid." It seems strange that Linux, BSD, and Solaris (I can investigate IRIX and OSF1 tomorrow) all support it if it is either incorrect or nonstandard. We're not talking about just BSD here. > > 4. My personal preference would be that if any change is made it would > > be to insist on an unabbreviated dotted quad for ip4. > > I can't get excited about replacing or second-guessing the platform's > getaddrinfo() or inet_aton() implementation. If you don't like how Given on both Solaris (my database server) and OpenBSD (the machine from which that manpage came from) I can connect to 127.1, I think you must be mistaken here. What made you think that it isn't supported? > those library routines behave, forward your bug report appropriately > --- but it's not Postgres' problem. There isn't any point in filing a bug if it will be ignored. alex -- alex@posixnap.net Alex J. Avriette, Unix Systems Gladiator "You cannot invade the mainland United States. There would be a rifle behind each blade of grass." - Admiral Isoroku Yamamoto
"Alex J. Avriette" <alex@posixnap.net> writes: > Given on both Solaris (my database server) and OpenBSD (the machine from > which that manpage came from) I can connect to 127.1, I think you must > be mistaken here. What made you think that it isn't supported? AFAICT, our code simply hands the string off to a C library function --- either getaddrinfo() or inet_aton() depending on what your platform has. If it's not behaving the way you want, it's the fault of one of those routines. Just to verify, I changed "127.0.0.1" to "127.1" in my local pg_hba.conf (this is on HPUX 10.20, which has inet_aton but not getaddrinfo), and could still connect over localhost TCP ... then changed it to "127.2", and could not connect. So I don't believe there is anything in the PG code that is discriminating against addresses written this way. If you still think the problem is PG's and not your C library's, I invite you to trace through the code and show where we're going wrong. But right at the moment I don't believe this is our bug. regards, tom lane
Alex J. Avriette said: > On Tue, Jan 06, 2004 at 10:52:19PM -0500, Andrew Dunstan wrote: > >> 4. My personal preference would be that if any change is made it would >> be to insist on an unabbreviated dotted quad for ip4. Alternatively, >> we > > I really think this is the wrong way to approach it. The 127.1 > convention is common, and valid. To disallow it because you haven't > experienced it is pretty egocentric. If you would instead object on the > grounds of it being difficult to implement, or non portable, or > outright incorrect, I would be fine with it. But the attitude of "I've > never seen this, and I don't like it, regardless of the documentation" > just sucks. > Alex, I think you should be a little less ready to throw around terms of opprobrium like this. First, note that I stated that this was my *personal* preference, and that it only applied if a change was to be made. People here, including me, often follow a consensus rather than their personal preferences. Second, you state that this usage is valid. When you first raised the matter, you were so certain that it was sanctified by standard that you asked me if I would implement it if you could quote an RFC sanctifying it (I said yes) and went off to find one. To your surprise, you couldn't, and now want to say that "valid" is defined for every OS in every context by the man page for one library call on one OS (or possibly several). Tom has challenged you to prove that this is caused by Pg code and not code in your native libraries. Until then, the matter should rest. cheers andrew
"Andrew Dunstan" <andrew@dunslane.net> writes:

> Second, you state that this usage is valid. When you first raised the
> matter, you were so certain that it was sanctified by standard that you
> asked me if I would implement it if you could quote an RFC sanctifying it
> (I said yes) and went off to find one. To your surprise, you couldn't, and
> now want to say that "valid" is defined for every OS in every context by
> the man page for one library call on one OS (or possibly several).

Would the POSIX/IEEE/SuS be authoritative enough?

http://www.opengroup.org/onlinepubs/007904975/functions/getaddrinfo.html

    If the specified address family is AF_INET or AF_UNSPEC, address strings
    using Internet standard dot notation as specified in inet_addr() are
    valid.

http://www.opengroup.org/onlinepubs/007904975/functions/inet_addr.html

    Values specified using IPv4 dotted decimal notation take one of the
    following forms:

    a.b.c.d
        When four parts are specified, each shall be interpreted as a byte
        of data and assigned, from left to right, to the four bytes of an
        Internet address.

    a.b.c
        When a three-part address is specified, the last part shall be
        interpreted as a 16-bit quantity and placed in the rightmost two
        bytes of the network address. This makes the three-part address
        format convenient for specifying Class B network addresses as
        "128.net.host" .

    a.b
        When a two-part address is supplied, the last part shall be
        interpreted as a 24-bit quantity and placed in the rightmost three
        bytes of the network address. This makes the two-part address
        format convenient for specifying Class A network addresses as
        "net.host" .

    a
        When only one part is given, the value shall be stored directly in
        the network address without any byte rearrangement.

> Tom has challenged you to prove that this is caused by Pg code and not
> code in your native libraries. Until then, the matter should rest.

Indeed, while I'm not sure what platform the original submitter's using, in the case of glibc it's already a reported bug (by me no less):

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=183814

--
greg
Greg Stark wrote: >"Andrew Dunstan" <andrew@dunslane.net> writes: > > > >>Second, you state that this usage is valid. When you first raised the >>matter, you were so certain that it was sanctified by standard that you >>asked me if I would implement it if you could quote an RFC sanctifying it >>(I said yes) and went off to find one. To your surprise, you couldn't, and >>now want to say that "valid" is defined for every OS in every context by >>the man page for one library call on one OS (or possibly several). >> >> > >Would the POSIX/IEEE/SuS be authoritative enough? > > > Enough for me, yes, no matter how crazy I think it is ;-) But as you seem to imply it still looks like Not Our Problem (tm). cheers andrew
On Tue, Jan 06, 2004 at 10:52:19PM -0500, Andrew Dunstan wrote:
>
> A few points.
>
> 1. clarification of my IRC comment: A quick examination seems to show
> that we use the native getaddrinfo() where it exists, otherwise we use
> our own, which in turn calls inet_ntoa().
> 2. ip6 has a well defined standard for abbreviation, and is quite
> important to have since ip6 addresses would otherwise often be tediously
> long. I haven't found a comparable standard for abbreviating IP4
> addresses.

SUS does define it for inet_ntoa() and inet_addr():

    a.b.c.d
        When four parts are specified, each shall be interpreted as a byte
        of data and assigned, from left to right, to the four bytes of an
        Internet address.

    a.b.c
        When a three-part address is specified, the last part shall be
        interpreted as a 16-bit quantity and placed in the rightmost two
        bytes of the network address. This makes the three-part address
        format convenient for specifying Class B network addresses as
        "128.net.host" .

    a.b
        When a two-part address is supplied, the last part shall be
        interpreted as a 24-bit quantity and placed in the rightmost three
        bytes of the network address. This makes the two-part address
        format convenient for specifying Class A network addresses as
        "net.host" .

    a
        When only one part is given, the value shall be stored directly in
        the network address without any byte rearrangement.

    All numbers supplied as parts in IPv4 dotted decimal notation may be
    decimal, octal, or hexadecimal, as specified in the ISO C standard
    (that is, a leading 0x or 0X implies hexadecimal; otherwise, a leading
    '0' implies octal; otherwise, the number is interpreted as decimal).

For inet_pton() it says:

    If the af argument of inet_pton() is AF_INET, the src string shall be
    in the standard IPv4 dotted-decimal form:

        ddd.ddd.ddd.ddd

    where "ddd" is a one to three digit decimal number between 0 and 255
    (see inet_addr() ). The inet_pton() function does not accept other
    formats (such as the octal numbers, hexadecimal numbers, and
    fewer than four numbers that inet_addr() accepts).
    ^^^^^^^^^^^^^^^^^^^^^^^

For getaddrinfo() it says:

    If the nodename argument is not null, it can be a descriptive name or
    can be an address string. If the specified address family is AF_INET,
    [IP6] [Option Start] AF_INET6, [Option End] or AF_UNSPEC, valid
    descriptive names include host names. If the specified address family
    is AF_INET or AF_UNSPEC, address strings using Internet standard dot
    notation as specified in inet_addr() are valid.

    [IP6] [Option Start] If the specified address family is AF_INET6 or
    AF_UNSPEC, standard IPv6 text forms described in inet_ntop() are
    valid. [Option End]

I'm not sure what this really says, but I can read it to either use the inet_addr() or inet_ntop() behaviour.

Kurt
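To make the contrast in the SUS text quoted above concrete, here is a rough Python sketch. It is not code from PostgreSQL or from any C library, and the names is_strict_dotted_quad and parse_part are made up for illustration. The first helper follows the inet_pton() wording (exactly four decimal parts, each between 0 and 255); the second applies the ISO C base rules that the inet_addr() text describes for a single part.

```python
import re

# Hypothetical helpers for illustration only; not PostgreSQL or libc code.
_STRICT_QUAD = re.compile(r"[0-9]{1,3}(\.[0-9]{1,3}){3}")


def is_strict_dotted_quad(text):
    """True only for ddd.ddd.ddd.ddd with every part decimal and in 0..255."""
    if not _STRICT_QUAD.fullmatch(text):
        return False
    return all(int(part) <= 255 for part in text.split("."))


def parse_part(part):
    """Read one part using the base rules quoted for inet_addr()."""
    if re.fullmatch(r"0[xX][0-9a-fA-F]+", part):
        return int(part[2:], 16)   # leading 0x or 0X: hexadecimal
    if re.fullmatch(r"0[0-7]+", part):
        return int(part[1:], 8)    # leading 0: octal
    if re.fullmatch(r"0|[1-9][0-9]*", part):
        return int(part, 10)       # otherwise: decimal
    raise ValueError(f"not a valid number: {part!r}")


for candidate in ("127.0.0.1", "127.1", "0x7f.0.0.1"):
    print(candidate, is_strict_dotted_quad(candidate))
print(parse_part("0x7f"), parse_part("0177"), parse_part("127"))
```

Under these assumptions only the full four-part decimal form passes the strict check, while the permissive per-part parser reads 0x7f, 0177 and 127 as the same value.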
Greg Stark wrote:
> a.b.c
>
> When a three-part address is specified, the last part shall be interpreted
> as a 16-bit quantity and placed in the rightmost two bytes of the network
> address. This makes the three-part address format convenient for specifying
> Class B network addresses as "128.net.host" .

I can understand the a.b case, but the a.b.c case is just weird. What logic is there that it is a.0.b.c? Nothing I can think of except convention. I agree with Vixie that this syntax is strange and shouldn't be encouraged.

> > Tom has challenged you to prove that this is caused by Pg code and not
> > code in your native libraries. Until then, the matter should rest.
>
> Indeed, while I'm not sure what platform the original submitter's using in the
> case of glibc it's already a reported bug (by me no less):
>
> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=183814

BSD/OS 4.3.1 doesn't like 127.1:

    $ ping 127.1
    ping: 127.1: hostname nor servname provided, or not known
    $ ping 127.0.0.1
    PING 127.0.0.1 (127.0.0.1): 56 data bytes
    64 bytes from 127.0.0.1: icmp_seq=0 ttl=255 time=0.11 ms
    64 bytes from 127.0.0.1: icmp_seq=1 ttl=255 time=0.056 ms

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
On Wed, Jan 07, 2004 at 12:53:19PM -0500, Bruce Momjian wrote: > Greg Stark wrote: > > a.b.c > > > > When a three-part address is specified, the last part shall be interpreted > > as a 16-bit quantity and placed in the rightmost two bytes of the network > > address. This makes the three-part address format convenient for specifying > > Class B network addresses as "128.net.host" . > > I can understand the a.b case, but the a.b.c case is just weird. What > logic is there that it is a.0.b.c? Nothing I can think of except > convention. I agree with Vixie that this syntax is strange and > shouldn't be encouraged. It's a.b.0.c. Note that the "c" can be bigger than 255, so 128.1.512 turns into 128.1.2.0. This can make perfect sense when you still used classes. Kurt
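Kurt's 128.1.512 example is just a base-256 split of the last part. A small Python sketch of that arithmetic follows; the function name three_part_to_quad is invented here, and the range checks are an assumption about what a careful parser would enforce rather than something taken from a particular C library.

```python
def three_part_to_quad(a, b, c):
    """Expand a.b.c, treating c as a 16-bit quantity for the last two bytes."""
    if not (0 <= a <= 255 and 0 <= b <= 255 and 0 <= c <= 0xFFFF):
        raise ValueError("part out of range")
    high, low = divmod(c, 256)  # split the 16-bit value into two bytes
    return f"{a}.{b}.{high}.{low}"


print(three_part_to_quad(128, 1, 512))   # Kurt's example
print(three_part_to_quad(10, 128, 45))   # the address from the original report
```

Since 512 is 2 * 256 + 0, the first call yields 128.1.2.0; 45 is below 256, so the second yields 10.128.0.45, which is the expansion Alex expected for 10.128.45.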
Bruce Momjian wrote:
>Greg Stark wrote:
>
>> a.b.c
>>
>> When a three-part address is specified, the last part shall be interpreted
>> as a 16-bit quantity and placed in the rightmost two bytes of the network
>> address. This makes the three-part address format convenient for specifying
>> Class B network addresses as "128.net.host" .
>
>I can understand the a.b case, but the a.b.c case is just weird. What
>logic is there that it is a.0.b.c? Nothing I can think of except
>convention. I agree with Vixie that this syntax is strange and
>shouldn't be encouraged.

The mention of Class B network addresses proves that this is a convention from ancient times, when a couple of network admins were using up all A and B networks and didn't want to write all those ".0" indicating their waste of address space... Its usefulness nowadays is very limited, and it should be avoided for clarity reasons.

Regards, Andreas
Kurt Roeckx wrote: >For getaddrinfo() it says: > > If the nodename argument is not null, it can be a descriptive name > or can be an address string. If the specified address family is > AF_INET, [IP6] [Option Start] AF_INET6, [Option End] or AF_UNSPEC, > valid descriptive names include host names. If the specified > address family is AF_INET or AF_UNSPEC, address strings using > Internet standard dot notation as specified in inet_addr() are > valid. > > [IP6] [Option Start] If the specified address family is AF_INET6 or > AF_UNSPEC, standard IPv6 text forms described in inet_ntop() are > valid. [Option End] > > >I'm not sure what this really says, but I can read it to either >use the inet_addr() or inet_ntop() behaviour. > > > > I read it to mean that abbreviated forms (via inet_addr()) are OK for AF_INET but not for AF_INET6 (via inet_pton()) Does anyone else read it differently? cheers andrew
On Wed, Jan 07, 2004 at 01:58:54PM -0500, Andrew Dunstan wrote: > > I read it to mean that abbreviated forms (via inet_addr()) are OK for > AF_INET but not for AF_INET6 (via inet_pton()) > But we use AF_UNSPEC/PF_UNSPEC. Kurt
On Wed, Jan 07, 2004 at 01:58:54PM -0500, Andrew Dunstan wrote: > Kurt Roeckx wrote: > > > [IP6] [Option Start] If the specified address family is AF_INET6 or > > AF_UNSPEC, standard IPv6 text forms described in inet_ntop() are > > valid. [Option End] > I read it to mean that abbreviated forms (via inet_addr()) are OK for > AF_INET but not for AF_INET6 (via inet_pton()) I guess I missed that that section is only about IPv6. So it should use inet_addr()'s behaviour. Kurt
Kurt Roeckx <Q@ping.be> writes:
> It's a.b.0.c.
>
> Note that the "c" can be bigger than 255, so 128.1.512 turns into
> 128.1.2.0. This can make perfect sense when you still used
> classes.

Perhaps it'll seem less strange if I restate the rule so there aren't four different cases:

A dotted quad is 1-4 numbers separated by dots where each number is an 8 bit number except for the last which includes all the remaining bits in the 32 bit address.

It might seem strange to people used to networks smaller than /24. But if you have a /16 with a thousand hosts and don't need subnets it makes perfect sense to number them from 1-1000 rather than using base 256.

I use it all the time for my net-10 addresses. They're subnetted into 10.1/16 10.2/16 etc. Sadly, I don't have thousands of hosts though.

--
greg
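Greg's one-sentence rule translates almost directly into code. The following Python sketch is one way to write it. It assumes decimal parts only (the octal and hexadecimal forms that SUS also allows are left out), and the name expand_dotted is hypothetical.

```python
def expand_dotted(text):
    """Expand 1-4 dot-separated decimal parts into a full dotted quad.

    Every part but the last is one byte; the last part fills all
    remaining bits of the 32-bit address.
    """
    parts = text.split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError("expected 1 to 4 parts")
    if not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError("parts must be decimal numbers")
    *leading, last = [int(p) for p in parts]
    remaining_bits = 32 - 8 * len(leading)
    if any(v > 255 for v in leading) or last >= (1 << remaining_bits):
        raise ValueError("part out of range")
    address = 0
    for value in leading:
        address = (address << 8) | value
    address = (address << remaining_bits) | last
    return ".".join(str((address >> shift) & 0xFF) for shift in (24, 16, 8, 0))


for candidate in ("127.1", "10.128.45", "10.1.1000", "10.128.0.45"):
    print(candidate, "->", expand_dotted(candidate))
```

With this rule there is no special casing per form: 127.1 and 10.1.1000 go through the same path as a full four-part address, and only the width of the last field changes.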
Kurt Roeckx <Q@ping.be> writes: > I guess I missed that that section is only about IPv6. So it > should use inet_addr()'s behaviour. Well, the question is still whether *our* code is doing anything wrong, or whether the blame rests entirely with the complainant's C library. AFAICT the issue must be in his C library, because it seemed to work as he wanted on my platform... regards, tom lane
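One way to check where the blame lies on a given machine is to ask the platform's own resolver routines directly. The Python sketch below does that through the standard socket module, which as far as I know hands these calls to the underlying C library. The helper name probe is made up, and the results for abbreviated addresses will differ by platform, which is the whole point, so none are shown.

```python
import socket


def probe(text):
    """Report how inet_aton, inet_pton and getaddrinfo treat an IPv4 string.

    Each entry is the parsed address in dotted-quad form, or None if the
    routine rejected the input.
    """
    results = {}
    try:
        results["inet_aton"] = socket.inet_ntoa(socket.inet_aton(text))
    except OSError:
        results["inet_aton"] = None
    try:
        packed = socket.inet_pton(socket.AF_INET, text)
        results["inet_pton"] = socket.inet_ntoa(packed)
    except OSError:
        results["inet_pton"] = None
    try:
        # AI_NUMERICHOST keeps getaddrinfo from attempting a DNS lookup.
        infos = socket.getaddrinfo(text, None, socket.AF_UNSPEC,
                                   socket.SOCK_STREAM, 0,
                                   socket.AI_NUMERICHOST)
        results["getaddrinfo"] = infos[0][4][0]
    except socket.gaierror:
        results["getaddrinfo"] = None
    return results


for candidate in ("127.0.0.1", "127.1", "10.128.45"):
    print(candidate, probe(candidate))
```

If getaddrinfo reports None for 127.1 while inet_aton accepts it, the difference is in the C library rather than in whatever program passed the string along.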
Greg Stark wrote:
>
> Kurt Roeckx <Q@ping.be> writes:
>
> > It's a.b.0.c.
> >
> > Note that the "c" can be bigger than 255, so 128.1.512 turns into
> > 128.1.2.0. This can make perfect sense when you still used
> > classes.
>
> Perhaps it'll seem less strange if I restate the rule so there aren't four
> different cases:
>
> A dotted quad is 1-4 numbers separated by dots where each number is an 8 bit
> number except for the last which includes all the remaining bits in the 32
> bit address.
>
> It might seem strange to people used to networks smaller than /24. But if you
> have a /16 with thousand hosts and don't need subnets it makes perfect sense
> to number them from 1-1000 rather than using base 256.
>
> I use it all the time for my net-10 addresses. They're subnetted into 10.1/16
> 10.2/16 etc. Sadly, I don't have thousands of hosts though.

Oh, the last number can be >255. That seems useful, I guess.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Kurt Roeckx wrote: >On Wed, Jan 07, 2004 at 01:58:54PM -0500, Andrew Dunstan wrote: > > >>I read it to mean that abbreviated forms (via inet_addr()) are OK for >>AF_INET but not for AF_INET6 (via inet_pton()) >> >> >> > >But we use AF_UNSPEC/PF_UNSPEC. > > > > <mode value="language lawyer"> Even so, as I read it an IP6 address must follow the standard forms set out under inet_pton(), since that is the ONLY provision made for IP6 addresses. </mode> cheers andrew