Thread: host name support in pg_hba.conf
Here is a patch for host name support in pg_hba.conf. I have reviewed various past threads about this, and there appeared to have been a 50/50 split of for and against reverse lookup. I went with the reverse lookup, because 0) I like it. 1) It is more secure. 2) It allows extending it to wildcards in the future. 3) Apache (Allow from) does it that way. To clarify how it works: The client's IP address (known from the kernel) is reverse looked up, which results in a host name. That host name is compared with the line in pg_hba.conf. If it matches, a forward lookup is performed on the host name to check if any of the resulting IP addresses match the client's IP address. If yes, the line is considered to match and the authentication method is selected. Anyway, assuming we will go with this, you will also notice that in the patch I changed the default pg_hba.conf to match against "localhost" instead of numeric addresses. Initially thought of as a temporary change for testing this patch, I think this might actually have some permanent value because it saves you from having to change the IPv4 and IPv6 lines in tandem most of the times, which is a moderately common mistake. We already rely on localhost being (forward) resolvable for the stats collector. Something to think about: Maybe we need a quoting mechanism in case someone names their hosts "samenet".
Attachment
Peter Eisentraut <peter_e@gmx.net> wrote: > The client's IP address (known from the kernel) Some machines have several IP addresses; how is that handled? > is reverse looked up, which results in a host name. Some IP addresses have several host names, including in reverse lookup; how is that handled? -Kevin
On 9 August 2010 19:47, Peter Eisentraut <peter_e@gmx.net> wrote: > Here is a patch for host name support in pg_hba.conf. I have reviewed > various past threads about this, and there appeared to have been a 50/50 > split of for and against reverse lookup. I went with the reverse > lookup, because > > 0) I like it. > > 1) It is more secure. > > 2) It allows extending it to wildcards in the future. > > 3) Apache (Allow from) does it that way. > > To clarify how it works: The client's IP address (known from the > kernel) is reverse looked up, which results in a host name. That host > name is compared with the line in pg_hba.conf. If it matches, a forward > lookup is performed on the host name to check if any of the resulting IP > addresses match the client's IP address. If yes, the line is considered > to match and the authentication method is selected. > > Anyway, assuming we will go with this, you will also notice that in the > patch I changed the default pg_hba.conf to match against "localhost" > instead of numeric addresses. Initially thought of as a temporary > change for testing this patch, I think this might actually have some > permanent value because it saves you from having to change the IPv4 and > IPv6 lines in tandem most of the times, which is a moderately common > mistake. We already rely on localhost being (forward) resolvable for > the stats collector. > > Something to think about: Maybe we need a quoting mechanism in case > someone names their hosts "samenet". > > > -- A couple things: + matches. This field can contain either a host name, an IP + address range, one of the special key words mentioned below. + </para> s/, one/, or one/ + If a host name is specified (anything that is not an IP address + or a special key word is processed as a potential host name), a + reverse DNS lookup is performed on the client's IP address, + then a forward DNS lookup on the resulting name to check if it + matches the original IP address (that is, at least one of the + potentially many IP addresses matches the original one), and + the name found in the reverse lookup is compared with the + specified host name. That's one loooong sentence! -- Thom Brown Registered Linux user: #516935
On Mon, Aug 9, 2010 at 2:47 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > Here is a patch for host name support in pg_hba.conf. I have reviewed > various past threads about this, and there appeared to have been a 50/50 > split of for and against reverse lookup. I went with the reverse > lookup, because > > 0) I like it. > > 1) It is more secure. > > 2) It allows extending it to wildcards in the future. > > 3) Apache (Allow from) does it that way. > > To clarify how it works: The client's IP address (known from the > kernel) is reverse looked up, which results in a host name. That host > name is compared with the line in pg_hba.conf. If it matches, a forward > lookup is performed on the host name to check if any of the resulting IP > addresses match the client's IP address. If yes, the line is considered > to match and the authentication method is selected. Seems reasonable. > Anyway, assuming we will go with this, you will also notice that in the > patch I changed the default pg_hba.conf to match against "localhost" > instead of numeric addresses. Initially thought of as a temporary > change for testing this patch, I think this might actually have some > permanent value because it saves you from having to change the IPv4 and > IPv6 lines in tandem most of the times, which is a moderately common > mistake. We already rely on localhost being (forward) resolvable for > the stats collector. -1 from me, on this part. I think we should be trying to eliminate any dependencies we have on how localhost resolves, and we certainly should not add more. I have spent way too much time fighting with problems caused by localhost being present/absent in /etc/hosts; and more caused by whether it maps to 127.0.0.1 or some IP assigned to one of the boxes' Ethernet cards. What's really special is when you have two parts of the system, one of which will only work with one version of /etc/hosts and the other of which will only work with a different version. I humbly, but fervently, suggest that we try to avoid being polemical about this. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Peter Eisentraut <peter_e@gmx.net> writes: > Here is a patch for host name support in pg_hba.conf. My recollection is that the previous discussions got stuck on the cost of doing DNS lookups for every connect; and the obvious solution of trying to cache the names was shot down on the basis of not knowing when to flush the cache. Have you decided that people who want this feature will just have to pay that cost? If so, I think the documentation needs to be a bit more explicit about names being more expensive than IP addresses in pg_hba.conf. regards, tom lane
* Kevin Grittner (Kevin.Grittner@wicourts.gov) wrote: > > The client's IP address (known from the kernel) > > Some machines have several IP addresses; how is that handled? Sounds like he already described it, or I read it wrong. The fact that some machines have several IP addresses hardly matters- whatever IP is used to connect to PG is what gets the reverse DNS lookup. That then returns a host. That host is then looked up, and as long as *one* of the IPs associated with that host matches the IP of the connector, it's good to go. > > is reverse looked up, which results in a host name. > > Some IP addresses have several host names, including in reverse > lookup; how is that handled? Yeahhhh... That's just busted, imnsho. But then, that's probably because it breaks Kerberos too. :) Thanks, Stephen
On Mon, Aug 09, 2010 at 03:25:23PM -0400, Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > Here is a patch for host name support in pg_hba.conf. > > My recollection is that the previous discussions got stuck on the cost > of doing DNS lookups for every connect; and the obvious solution of > trying to cache the names was shot down on the basis of not knowing when > to flush the cache. Have you decided that people who want this feature > will just have to pay that cost? If so, I think the documentation needs > to be a bit more explicit about names being more expensive than IP > addresses in pg_hba.conf. I am also a bit scared of the startup cost for each connection. In which order are things evaluated? What if I only include IP addresses in my pg_hba, will it still cost me a DNS lookup or two? Kind regards, Kristian. -- Kristian Larsson KLL-RIPE +46 704 264511 kll@spritelink.net
On mån, 2010-08-09 at 15:05 -0400, Robert Haas wrote: > -1 from me, on this part. I think we should be trying to eliminate > any dependencies we have on how localhost resolves, and we certainly > should not add more. Maybe this is something that distributors could add if they have more knowledge about the "localhost" policies on their hosts.
On mån, 2010-08-09 at 13:56 -0500, Kevin Grittner wrote: > Peter Eisentraut <peter_e@gmx.net> wrote: > > > The client's IP address (known from the kernel) > > Some machines have several IP addresses; how is that handled? A connection comes in over exactly one address. > > is reverse looked up, which results in a host name. > > Some IP addresses have several host names, including in reverse > lookup; how is that handled? This is not possible, or at least the C library APIs don't expose it. Compare the getnameinfo() and getaddrinfo() man pages, for example.
On mån, 2010-08-09 at 23:01 +0200, Kristian Larsson wrote: > In which order are things evaluated? What if I only include IP > addresses in my pg_hba, will it still cost me a DNS lookup or > two? No, if you don't use this feature, it won't cost you anything.
On mån, 2010-08-09 at 15:25 -0400, Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > Here is a patch for host name support in pg_hba.conf. > > My recollection is that the previous discussions got stuck on the cost > of doing DNS lookups for every connect; and the obvious solution of > trying to cache the names was shot down on the basis of not knowing when > to flush the cache. Have you decided that people who want this feature > will just have to pay that cost? Either that or install nscd. While the speed of establishing a connection is critical to some setups, there are also plenty of setups where the connection setup speed is pretty much unimportant because of connection pooling. > If so, I think the documentation needs > to be a bit more explicit about names being more expensive than IP > addresses in pg_hba.conf. Fair enough.
Peter Eisentraut wrote: > On mån, 2010-08-09 at 13:56 -0500, Kevin Grittner wrote: >> Peter Eisentraut wrote: >>> is reverse looked up, which results in a host name. >> >> Some IP addresses have several host names, including in reverse >> lookup; how is that handled? > > This is not possible, http://en.wikipedia.org/wiki/Reverse_DNS_lookup#Multiple_pointer_records > or at least the C library APIs don't expose it. That may explain the prevalence of bugs in code dealing with it. -Kevin
On tis, 2010-08-10 at 07:32 -0500, Kevin Grittner wrote: > http://en.wikipedia.org/wiki/Reverse_DNS_lookup#Multiple_pointer_records Yeah, you can configure all kinds of nonsense and sometimes even get away with it, but the basic assumption throughout is that a system has one host name and between 1 and many IP addresses. We must make our implementation robust again other setups, but we don't have to (or rather cannot) support them.
On Tue, Aug 10, 2010 at 10:05 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > On tis, 2010-08-10 at 07:32 -0500, Kevin Grittner wrote: >> http://en.wikipedia.org/wiki/Reverse_DNS_lookup#Multiple_pointer_records > > Yeah, you can configure all kinds of nonsense and sometimes even get > away with it, but the basic assumption throughout is that a system has > one host name and between 1 and many IP addresses. These days, I think it's more common the other way around: one IP address, and many host names. > We must make our > implementation robust again other setups, but we don't have to (or > rather cannot) support them. "Cannot" is a good argument for not supporting just about anything. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Peter Eisentraut <peter_e@gmx.net> wrote: > Yeah, you can configure all kinds of nonsense and sometimes even > get away with it, but the basic assumption throughout is that a > system has one host name and between 1 and many IP addresses. It's hardly nonsense to have multiple names on a machine. While we usually avoid having multiple reverse lookup names, we have many in-house web applications and we neither want users to access them by IP address or have to worry about which web server is hosting which applications at the moment. So it's not unusual for one of our web servers to have 10 or 15 DNS names for forward lookup. If one machine becomes overloaded, we can move an application, change the DNS, and everyone's bookmark still works. This is precisely the sort of situation where using a hostname in pg_hba.conf would be most useful. > We must make our implementation robust again other setups, but we > don't have to (or rather cannot) support them. Without the logic to ensure that the hostname matches the reverse lookup, this might be useful for us. With that logic it is useless for us. I'm wondering how much you gain by having it in there. Why can't a forward lookup which matches the requesting IP be considered sufficient? -Kevin
On tis, 2010-08-10 at 10:11 -0400, Robert Haas wrote: > These days, I think it's more common the other way around: one IP > address, and many host names. Yes, that setup is very common, but it's actually only an illusion that DNS creates. The actual machine still has only one host name and some IP addresses, as far as the kernel is concerned. As you are surely aware, this situation creates all kinds of problems in practice.
On tis, 2010-08-10 at 09:18 -0500, Kevin Grittner wrote: > Without the logic to ensure that the hostname matches the reverse > lookup, this might be useful for us. With that logic it is useless > for us. I'm wondering how much you gain by having it in there. Why > can't a forward lookup which matches the requesting IP be considered > sufficient? For one thing, because people might like to add wildcard support. So I might be able to say host all all appserver*.example.com md5
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Without the logic to ensure that the hostname matches the reverse > lookup, this might be useful for us. With that logic it is useless > for us. I'm wondering how much you gain by having it in there. Why > can't a forward lookup which matches the requesting IP be considered > sufficient? I was about to complain about that same thing. ISTM the logic ought to be that you do a forward DNS lookup on the name presented in pg_hba.conf, and if any of the returned IP addresses match the connection's remote IP address, then you have a match. This business with doing a reverse lookup is at least twice as expensive, far more fragile, and it seems completely bogus from a security viewpoint. Why should I trust the RDNS server for an attacker's IP address? regards, tom lane
* Tom Lane <tgl@sss.pgh.pa.us> [100810 10:39]: > I was about to complain about that same thing. ISTM the logic ought > to be that you do a forward DNS lookup on the name presented in > pg_hba.conf, and if any of the returned IP addresses match the > connection's remote IP address, then you have a match. This business > with doing a reverse lookup is at least twice as expensive, far more > fragile, and it seems completely bogus from a security viewpoint. > Why should I trust the RDNS server for an attacker's IP address? Well, you don't trust the RDNS of the IP, you trust the normal lookup of the hostname returned by the RDNS. So if some other ip network is trying to give hostnames that should be authorized, you see that immediately when you resolve the "authorized" hostname and it doesn't give you that IP. The PTR query is a means to get the "hostname" to check against, so you d'nt have to pre-cache all thos possible results of all the hostnames. Pre-caching all the hostnames in pg_hba.conf is madness. How long do you cache them for? or do send out 1000 queries every connection? You can't support wildcards, or anythign usefull... AFAIK, every software I've used which allows hostnames as some connection control all do PTR->A/AAAA lookups as Peter proposed. a. -- Aidan Van Dyk Create like a god, aidan@highrise.ca command like a king, http://www.highrise.ca/ work like a slave.
On tis, 2010-08-10 at 10:39 -0400, Tom Lane wrote: > I was about to complain about that same thing. ISTM the logic ought > to be that you do a forward DNS lookup on the name presented in > pg_hba.conf, and if any of the returned IP addresses match the > connection's remote IP address, then you have a match. This business > with doing a reverse lookup is at least twice as expensive, far more > fragile, and it seems completely bogus from a security viewpoint. If you have hundreds on lines in pg_hba.conf, then you need to do hundreds of DNS lookups per connection attempt (*), and each of those lookups could result in even more IP addresses, or could time out. So if some unrelated part of the system breaks (DNS down), it could take you hours to establish a connection. On the other hand, with the reverse DNS lookup, you would normally do about two DNS queries per successful connection attempt, and those would only be in relation to the machines actually involved in the connection. Also, if you are in a names-only environment, you might also like to turn on log_hostnames, in which case the reverse lookup is free (well, shared). (*) That could of course be addressed by your earlier idea of caching the resolved names when pg_hba.conf is read, but I don't think many people were on board with that idea.
Peter Eisentraut <peter_e@gmx.net> writes: > On tis, 2010-08-10 at 09:18 -0500, Kevin Grittner wrote: >> Why can't a forward lookup which matches the requesting IP be considered >> sufficient? > For one thing, because people might like to add wildcard support. So I > might be able to say > host all all appserver*.example.com md5 I don't think that the possibility that we might support that in future can justify using a slow and not-too-reliable method for ordinary non-wildcard names. regards, tom lane
Peter Eisentraut <peter_e@gmx.net> writes: > On tis, 2010-08-10 at 10:39 -0400, Tom Lane wrote: >> I was about to complain about that same thing. ISTM the logic ought >> to be that you do a forward DNS lookup on the name presented in >> pg_hba.conf, and if any of the returned IP addresses match the >> connection's remote IP address, then you have a match. This business >> with doing a reverse lookup is at least twice as expensive, far more >> fragile, and it seems completely bogus from a security viewpoint. > If you have hundreds on lines in pg_hba.conf, then you need to do > hundreds of DNS lookups per connection attempt (*), and each of those > lookups could result in even more IP addresses, or could time out. If you have a configuration that would actually require that, then you would have a case for using a wildcard. My complaint is that you're trying to force everyone to pay for that feature whether it's of use to them or not. I think it's at least as likely that typical setups would need exactly *one*, non wildcard, entry, to wit appserver.mycompany.com (which'd necessarily yield IPs for all the machines running your app server code). regards, tom lane
* Peter Eisentraut (peter_e@gmx.net) wrote: > On mån, 2010-08-09 at 13:56 -0500, Kevin Grittner wrote: > > Some IP addresses have several host names, including in reverse > > lookup; how is that handled? > > This is not possible, or at least the C library APIs don't expose it. > Compare the getnameinfo() and getaddrinfo() man pages, for example. Don't know how it happens at a technical level, but I've definitely seen it happen before.. Particularly with Windows domains where they don't have "clean-up reverse DNS" enabled. Manifests itself by having different host names show up on successive requests... Evil in any case. Stephen
* Aidan Van Dyk (aidan@highrise.ca) wrote: > The PTR query is a means to get the "hostname" to check against, so you > d'nt have to pre-cache all thos possible results of all the hostnames. > Pre-caching all the hostnames in pg_hba.conf is madness. How long do > you cache them for? or do send out 1000 queries every connection? You > can't support wildcards, or anythign usefull... > > AFAIK, every software I've used which allows hostnames as some > connection control all do PTR->A/AAAA lookups as Peter proposed. Completely agreed. It's madness to precache all thse hostnames, but we need to figure out the hostname, thus, rDNS is used. The forward lookup is then to double-check that it matches. This is exactly how Kerberos works also. You certainly don't want to be repeatedly doing rDNS lookups to see if maybe that IP has other hosts. I also don't buy that there's an issue with setting up your rDNS to go to what you put in the pg_hba and then having the forward of that include the IP; again, it's how Kerberos works, and even if you don't believe in Kerberos, I hope you realize it's kind of popular. Thanks, Stephen
* Kevin Grittner (Kevin.Grittner@wicourts.gov) wrote: > It's hardly nonsense to have multiple names on a machine. While we > usually avoid having multiple reverse lookup names, we have many > in-house web applications and we neither want users to access them > by IP address or have to worry about which web server is hosting > which applications at the moment. So it's not unusual for one of > our web servers to have 10 or 15 DNS names for forward lookup. If > one machine becomes overloaded, we can move an application, change > the DNS, and everyone's bookmark still works. This is precisely the > sort of situation where using a hostname in pg_hba.conf would be > most useful. We're talking about client systems connecting to PG here. Are you authenticating your web users by looking at their client addresses..? That strikes me as pretty unlikely. Your web servers will be connecting to your PG server from *one* address (whatever the main one is for that pariticular server), and that address just needs to have an rDNS entry that goes to a host whose forward DNS includes that IP. If you have multiple web servers that are connecting to the same PG database, then have multiple pg_hba entries, or make them all have the same hostname per reverse DNS (though I don't really see why you'd want to). > > We must make our implementation robust again other setups, but we > > don't have to (or rather cannot) support them. > > Without the logic to ensure that the hostname matches the reverse > lookup, this might be useful for us. With that logic it is useless > for us. I'm wondering how much you gain by having it in there. Why > can't a forward lookup which matches the requesting IP be considered > sufficient? Because "you can't get there from here". You'd either have to cache all the entries in pg_hba (which is horrible), or do a look-up on each one on every connection (which is going to be a hell of alot slower than doing one more DNS lookup here). This isn't magic. What we have is a bunch of host names and a single IP (the connecting one). Figuring out which one goes with which is the issue. Stephen
On Aug 10, 2010, at 8:23 AM, Stephen Frost wrote: > * Peter Eisentraut (peter_e@gmx.net) wrote: >> On mån, 2010-08-09 at 13:56 -0500, Kevin Grittner wrote: >>> Some IP addresses have several host names, including in reverse >>> lookup; how is that handled? >> >> This is not possible, or at least the C library APIs don't expose it. >> Compare the getnameinfo() and getaddrinfo() man pages, for example. > > Don't know how it happens at a technical level, but I've definitely seen > it happen before.. Particularly with Windows domains where they don't > have "clean-up reverse DNS" enabled. Manifests itself by having > different host names show up on successive requests... Evil in any > case. Multiple hostnames for a given IP address are supported just fine by the DNS. Some C library APIs support this just fine, others (such as getnameinfo) have been simplified to make them more pleasant to use for the common case of displaying a text representation of an IP address in a friendly manner with simple code, at the expense of actually returning correct data. So getnameinfo() is not suitable for this particular usage. If an IP address has multiple hostnames then what getnameinfo() will return isn't well-defined (and I believe there's been a trickle of bugs in implementations such that sometimes they won't return any hostname if there are multiple ones configured in the DNS). Any approach to restrict based on hostnames will either need to just work with forward DNS resolution of hostnames configured in pg_hba.conf to create a list of IP addresses to compare against an incoming connection, or it'll need to use a more general interface to get the reverse DNS of an incoming connection (e.g. gethostbyaddr(), less elegant as that is) before checking forward DNS. The former approach won't work if we want to support wildcard hostnames ("accept connections from *.example.com") - and that's the only useful functionality that adding hostname based ACLs provides, I think. If we want to do that, we need to use gethostbyaddr() to get all the claimed hostnames via reverse DNS, and for each of those that matches our ACL do a getaddrinfo() to check it resolves to the connecting IP. This is something that's pretty common to do in the email world, so stealing some robust code from there might be an idea. Cheers, Steve
It seems to me this patch has been left for a long time, although it tries to provide a useful functionality. In the previous discussion, several issues were pointed out. * Case handling when an IP-address has multiple hostnames. Steve Atkins noticed getaddrinfo() does not have capability to handle this case, because it returns just one hostname from the candidates. So, he suggested to use gethostbyaddr() instead, because it can return all the candidates of hostnames. I also think gethostbyaddr() is more prefereble here rather than getaddrinfo() with the primary hostname as a fallback. How about the following idea? 1. Add a list of hostname on Port structure to store all the candidates of hostnames for the current remote host. 2. Revise the pg_getnameinfo_all() to call the gethostbyaddr(), then it constructs the list from the returned hostent->h_aliases. 3. Revise Port->remote_host to indicate contents of the list head. 4. Revise the new check_hostname() to compare the supplied hostname from pg_hba.conf and each hostnames in the above list. If pg_getnameinfo_all() would have all the candidates of hostnames, we don't need to lookup dns twice on fallbacks. * "localhost" in the default pg_hba.conf Robert Haas disagreed to switch "localhost" in the default pg_hba.conf into numeric expression, because /etc/host should determine where the localhost shall be mapped. I agree with the opinion from Robert. We should not have any assumptions of /ets/hosts settings. * Cost for reverse lookup of DNS Tom Lane concerned about the cost to communicate with DNS for each connections. You answered either caching mechanism in PG or nscd is a solution, and agreed to add an explicit description about names being more expensive than IP addresses in pg_hba.conf. I also think it is fair enough. How about adding a recommendation to install nscd when people uses hostname in pg_hba.conf. Thanks, (2010/08/10 3:47), Peter Eisentraut wrote: > Here is a patch for host name support in pg_hba.conf. I have reviewed > various past threads about this, and there appeared to have been a 50/50 > split of for and against reverse lookup. I went with the reverse > lookup, because > > 0) I like it. > > 1) It is more secure. > > 2) It allows extending it to wildcards in the future. > > 3) Apache (Allow from) does it that way. > > To clarify how it works: The client's IP address (known from the > kernel) is reverse looked up, which results in a host name. That host > name is compared with the line in pg_hba.conf. If it matches, a forward > lookup is performed on the host name to check if any of the resulting IP > addresses match the client's IP address. If yes, the line is considered > to match and the authentication method is selected. > > Anyway, assuming we will go with this, you will also notice that in the > patch I changed the default pg_hba.conf to match against "localhost" > instead of numeric addresses. Initially thought of as a temporary > change for testing this patch, I think this might actually have some > permanent value because it saves you from having to change the IPv4 and > IPv6 lines in tandem most of the times, which is a moderately common > mistake. We already rely on localhost being (forward) resolvable for > the stats collector. > > Something to think about: Maybe we need a quoting mechanism in case > someone names their hosts "samenet". > > > > -- KaiGai Kohei <kaigai@ak.jp.nec.com>
On tis, 2010-10-05 at 12:35 +0900, KaiGai Kohei wrote: > It seems to me this patch has been left for a long time, although it tries > to provide a useful functionality. > > In the previous discussion, several issues were pointed out. > > * Case handling when an IP-address has multiple hostnames. > > Steve Atkins noticed getaddrinfo() does not have capability to handle > this case, because it returns just one hostname from the candidates. > So, he suggested to use gethostbyaddr() instead, because it can return > all the candidates of hostnames. > > I also think gethostbyaddr() is more prefereble here rather than > getaddrinfo() with the primary hostname as a fallback. I have several concerns about this approach. The first is conceptual: Assume that an IP address resolves as host names "foo" and "bar", perhaps even by accident. You have log_hostnames on, which would be typical if you have an all-names pg_hba.conf. log_hostnames only logs the first host name, say, "foo". And then assume that pg_hba.conf is set up to only allow "bar" in. There you have a debugging and auditing nightmare, because what pg_hba.conf says and what you are logging doesn't match. To address this you would have to change the log_hostnames facility to log *all* host names everywhere the host name is mentioned, which could make this whole thing quite silly. Secondly, long-standing and presumably reputable implementations of a similar feature, namely Apache's mod_authz_host and tcp-wrappers use getnameinfo() in preference of gethostbyaddr(), and don't support multiple host names per IP address. In fact, reading through that source code indicates that gethostbyaddr() has all kinds of bugs and issues, including apparently lack of IPv6 support (on some platforms?). Thirdly, gethostbyname() and gethostbyaddr() are deprecated by POSIX in favor of getaddrinfo() and getnameinfo(), so we shouldn't build new features that depend on them. > * "localhost" in the default pg_hba.conf > > Robert Haas disagreed to switch "localhost" in the default pg_hba.conf > into numeric expression, because /etc/host should determine where the > localhost shall be mapped. > > I agree with the opinion from Robert. We should not have any assumptions > of /ets/hosts settings. Note that we already default listen_addresses to 'localhost', so it would only make sense to have pg_hba.conf behave the same by default. To pick up on your argument, we effectively *do* make assumptions about /etc/hosts now, and this change would remove them. Note that this would only be a default. But if you think that it's the wrong default, then we should consider changing the listen_addresses default as well. The reason why I think this is semi-important and not just cosmetic is that (for some reason that is not entirely clear to me) clients connecting to localhost end up appearing to the server as ::1 on a lot of machines I use which are not otherwise keen on IPv6, and it is a common mistake to forget to keep the pg_hba.conf entries for 127.0.0.1 and ::1 in sync.
(2010/10/06 4:41), Peter Eisentraut wrote: > On tis, 2010-10-05 at 12:35 +0900, KaiGai Kohei wrote: >> It seems to me this patch has been left for a long time, although it tries >> to provide a useful functionality. >> >> In the previous discussion, several issues were pointed out. >> >> * Case handling when an IP-address has multiple hostnames. >> >> Steve Atkins noticed getaddrinfo() does not have capability to handle >> this case, because it returns just one hostname from the candidates. >> So, he suggested to use gethostbyaddr() instead, because it can return >> all the candidates of hostnames. >> >> I also think gethostbyaddr() is more prefereble here rather than >> getaddrinfo() with the primary hostname as a fallback. > > I have several concerns about this approach. > > The first is conceptual: Assume that an IP address resolves as host > names "foo" and "bar", perhaps even by accident. You have log_hostnames > on, which would be typical if you have an all-names pg_hba.conf. > log_hostnames only logs the first host name, say, "foo". And then > assume that pg_hba.conf is set up to only allow "bar" in. There you > have a debugging and auditing nightmare, because what pg_hba.conf says > and what you are logging doesn't match. To address this you would have > to change the log_hostnames facility to log *all* host names everywhere > the host name is mentioned, which could make this whole thing quite > silly. > > Secondly, long-standing and presumably reputable implementations of a > similar feature, namely Apache's mod_authz_host and tcp-wrappers use > getnameinfo() in preference of gethostbyaddr(), and don't support > multiple host names per IP address. In fact, reading through that > source code indicates that gethostbyaddr() has all kinds of bugs and > issues, including apparently lack of IPv6 support (on some platforms?). > > Thirdly, gethostbyname() and gethostbyaddr() are deprecated by POSIX in > favor of getaddrinfo() and getnameinfo(), so we shouldn't build new > features that depend on them. > OK, it seems to me fair enough. I consented with your explanation. I'll check the patch for more details, please wait for a few days. >> * "localhost" in the default pg_hba.conf >> >> Robert Haas disagreed to switch "localhost" in the default pg_hba.conf >> into numeric expression, because /etc/host should determine where the >> localhost shall be mapped. >> >> I agree with the opinion from Robert. We should not have any assumptions >> of /ets/hosts settings. > > Note that we already default listen_addresses to 'localhost', so it > would only make sense to have pg_hba.conf behave the same by default. > To pick up on your argument, we effectively *do* make assumptions > about /etc/hosts now, and this change would remove them. > Sorry, I misread something. I read the previous discussions again, then I know I misread the reason why Robert disagreed with this replacement. He said we should not assume resolve of localhost being enough fast because of local /etc/hosts, not saying we should not assume localhost is "127.0.0.1" or "::1". Right? Well, in my personal opinion, we should not assume the way to resolve localhost, but we can expect more than 99.9% of hosts resolve localhost using local /etc/hosts. Even if here is a curious setting, it will pay a bit more cost on connection. It is a responsibility of DBA. I agree with replacement "127.0.0.1" and "::1" by "localhost". It enables to eliminate an assumption that localhost have either of their addresses. Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>
KaiGai Kohei <kaigai@ak.jp.nec.com> writes: >> Note that we already default listen_addresses to 'localhost', so it >> would only make sense to have pg_hba.conf behave the same by default. >> To pick up on your argument, we effectively *do* make assumptions >> about /etc/hosts now, and this change would remove them. >> > Sorry, I misread something. > I read the previous discussions again, then I know I misread the reason > why Robert disagreed with this replacement. He said we should not assume > resolve of localhost being enough fast because of local /etc/hosts, not > saying we should not assume localhost is "127.0.0.1" or "::1". Right? > Well, in my personal opinion, we should not assume the way to resolve > localhost, but we can expect more than 99.9% of hosts resolve localhost > using local /etc/hosts. Even if here is a curious setting, it will pay > a bit more cost on connection. It is a responsibility of DBA. > I agree with replacement "127.0.0.1" and "::1" by "localhost". > It enables to eliminate an assumption that localhost have either of > their addresses. This argument is completely unfounded in reality. Please go read the relevant RFCs. 127.0.0.1 is standardized as the IPv4 loopback address (see RFC3330), and ::1 is standardized as the IPv6 loopback address (see RFC1884 section 2.4.3). So far as I can find, there is *no* standard mandating that localhost means the loopback address. RFC1537 suggests that DNS domains "should" resolve localhost.anything as 127.0.0.1; but that is a lot weaker than the other specifications, and there's nothing whatever to promise that it will work in a DNS-less environment. In fact, we have seen cases where it didn't work even with publicly available DNS servers, eg http://archives.postgresql.org/pgsql-admin/2010-05/msg00073.php That example leads me to think that using localhost in the default pg_hba.conf file would actually be a security hazard: you would be placing it in the hands of your DNS provider as to which addresses Postgres will believe are "local" connections. That's an OK decision for individual admins to make, but it's not a good idea for us to ship it as a universal default. (Note that relying on the DNS provider to interpret listen_addresses is not nearly as dangerous, since in any case the kernel isn't going to let us bind() to nonlocal addresses.) On top of that, there's no way for your DNS server to know whether your local kernel speaks IPv6 or not, so you might not get a resolution of the name that includes the appropriate loopback addresses. This may or may not have anything to do with the reports we occasionally get of people having to change listen_addresses to "*" to get things to work. Between these issues and the speed concern, I don't think that we should change this. regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Between these issues and the speed concern, I don't think that we should > change this. +1. Stephen
On Tue, Oct 5, 2010 at 3:41 PM, Peter Eisentraut <peter_e@gmx.net> wrote:
This is exactly what I am seeing here. However contrary to your case the patch makes it even worse on my side. With the patch compiled in and a pg_hba.conf entry of "localhost", I cannot connect anymore to "-h localhost", I get "no pg_hba.conf entry for host ::1".
This is mostly standard Ubuntu setup.
Joachim
The reason why I think this is semi-important and not just cosmetic is
that (for some reason that is not entirely clear to me) clients
connecting to localhost end up appearing to the server as ::1 on a lot
of machines I use which are not otherwise keen on IPv6, and it is a
common mistake to forget to keep the pg_hba.conf entries for 127.0.0.1
and ::1 in sync.
This is exactly what I am seeing here. However contrary to your case the patch makes it even worse on my side. With the patch compiled in and a pg_hba.conf entry of "localhost", I cannot connect anymore to "-h localhost", I get "no pg_hba.conf entry for host ::1".
This is mostly standard Ubuntu setup.
Joachim
On tis, 2010-10-05 at 22:17 -0400, Tom Lane wrote: > So far as I can find, there is *no* standard > mandating that localhost means the loopback address. Should we then change pgstat.c to use IP addresses instead of hardcoding "localhost"?
On 10/06/2010 04:05 AM, Peter Eisentraut wrote: > On tis, 2010-10-05 at 22:17 -0400, Tom Lane wrote: >> So far as I can find, there is *no* standard >> mandating that localhost means the loopback address. > Should we then change pgstat.c to use IP addresses instead of hardcoding > "localhost"? > I understood Tom to be saying we should not rely on "localhost" for authentication, not that we shouldn't use it at all. cheers andrew
Peter Eisentraut <peter_e@gmx.net> writes: > On tis, 2010-10-05 at 22:17 -0400, Tom Lane wrote: >> So far as I can find, there is *no* standard >> mandating that localhost means the loopback address. > Should we then change pgstat.c to use IP addresses instead of hardcoding > "localhost"? Hm, perhaps so. regards, tom lane
Andrew Dunstan <andrew@dunslane.net> writes: > On 10/06/2010 04:05 AM, Peter Eisentraut wrote: >> On tis, 2010-10-05 at 22:17 -0400, Tom Lane wrote: >>> So far as I can find, there is *no* standard >>> mandating that localhost means the loopback address. >> Should we then change pgstat.c to use IP addresses instead of hardcoding >> "localhost"? > I understood Tom to be saying we should not rely on "localhost" for > authentication, not that we shouldn't use it at all. I think it's all right to use it as the default value for listen_addresses, because (1) it's an understandable default, and (2) users can change the setting if it doesn't work. However, the usage in pgstat.c is hard-wired, meaning that if you have a configuration where "localhost" doesn't resolve correctly for whatever reason, there's no simple recourse to get the stats collector working. So ISTM there is an argument for changing that. regards, tom lane
On Wed, Oct 6, 2010 at 15:16, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> On 10/06/2010 04:05 AM, Peter Eisentraut wrote: >>> On tis, 2010-10-05 at 22:17 -0400, Tom Lane wrote: >>>> So far as I can find, there is *no* standard >>>> mandating that localhost means the loopback address. > >>> Should we then change pgstat.c to use IP addresses instead of hardcoding >>> "localhost"? > >> I understood Tom to be saying we should not rely on "localhost" for >> authentication, not that we shouldn't use it at all. > > I think it's all right to use it as the default value for > listen_addresses, because (1) it's an understandable default, > and (2) users can change the setting if it doesn't work. > > However, the usage in pgstat.c is hard-wired, meaning that if you > have a configuration where "localhost" doesn't resolve correctly > for whatever reason, there's no simple recourse to get the stats > collector working. So ISTM there is an argument for changing that. Well, hardcoding it will break the (unusual) case when localhost isn't 127.0.0.1 / ::1. (You'd obviously have to have it try both ipv4 and ipv6). It's not common, but i've certainly come across a number of virtual machines where localhost resolves (through /etc/hosts) to the machines "real" IP rather than 127.0.01, because 127.0.0.1 simply doesn't exist. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Magnus Hagander <magnus@hagander.net> writes: > On Wed, Oct 6, 2010 at 15:16, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> However, the usage in pgstat.c is hard-wired, meaning that if you >> have a configuration where "localhost" doesn't resolve correctly >> for whatever reason, there's no simple recourse to get the stats >> collector working. So ISTM there is an argument for changing that. > Well, hardcoding it will break the (unusual) case when localhost isn't > 127.0.0.1 / ::1. (You'd obviously have to have it try both ipv4 and > ipv6). You didn't read what I wrote before. Those numeric addresses define the loopback address, *not* "localhost". When localhost fails to resolve as those address(es), it's localhost that is wrong. We have actually seen this in the field with bogus DNS providers. > It's not common, but i've certainly come across a number of virtual > machines where localhost resolves (through /etc/hosts) to the machines > "real" IP rather than 127.0.01, because 127.0.0.1 simply doesn't > exist. That appears to me to be a broken (non RFC compliant) VM setup. However, maybe what this is telling us is we need to expose the setting? Or perhaps better, try 127.0.0.1, ::1, localhost, in that order. regards, tom lane
On Wed, Oct 6, 2010 at 15:34, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Magnus Hagander <magnus@hagander.net> writes: >> On Wed, Oct 6, 2010 at 15:16, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> However, the usage in pgstat.c is hard-wired, meaning that if you >>> have a configuration where "localhost" doesn't resolve correctly >>> for whatever reason, there's no simple recourse to get the stats >>> collector working. So ISTM there is an argument for changing that. > >> Well, hardcoding it will break the (unusual) case when localhost isn't >> 127.0.0.1 / ::1. (You'd obviously have to have it try both ipv4 and >> ipv6). > > You didn't read what I wrote before. Those numeric addresses define the > loopback address, *not* "localhost". When localhost fails to resolve > as those address(es), it's localhost that is wrong. We have actually > seen this in the field with bogus DNS providers. > >> It's not common, but i've certainly come across a number of virtual >> machines where localhost resolves (through /etc/hosts) to the machines >> "real" IP rather than 127.0.01, because 127.0.0.1 simply doesn't >> exist. > > That appears to me to be a broken (non RFC compliant) VM setup. Can't argue with that. But it exists. > However, maybe what this is telling us is we need to expose the setting? > Or perhaps better, try 127.0.0.1, ::1, localhost, in that order. That was kind of my point, that yes, we probably need to do one of those at least. Today it is "kind of exposed", because you can edit /etc/hosts - you don't need to rely on DNS for it. I just don't want to lose that ability. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > That appears to me to be a broken (non RFC compliant) VM setup. > However, maybe what this is telling us is we need to expose the setting? > Or perhaps better, try 127.0.0.1, ::1, localhost, in that order. Yeah, I'd be happier if we exposed it, to be honest. Either that, or figure out a way to get rid of it entirely by using a different method, but that's a much bigger issue. Thanks, Stephen
On 10/06/2010 09:49 AM, Stephen Frost wrote: > * Tom Lane (tgl@sss.pgh.pa.us) wrote: >> That appears to me to be a broken (non RFC compliant) VM setup. >> However, maybe what this is telling us is we need to expose the setting? >> Or perhaps better, try 127.0.0.1, ::1, localhost, in that order. > Yeah, I'd be happier if we exposed it, to be honest. Either that, or > figure out a way to get rid of it entirely by using a different method, > but that's a much bigger issue. Please don't expose it. It will a source of yet more confusion. People already get confused by the difference between listening addresses and pg_hba.conf addresses. It's one of the most frequent points of confusion seen on IRC. Adding another address to configure will just compound the confusion badly. I much prefer Tom's last suggestion. cheers andrew
On Wed, Oct 6, 2010 at 6:21 AM, Magnus Hagander <magnus@hagander.net> wrote: > It's not common, but i've certainly come across a number of virtual > machines where localhost resolves (through /etc/hosts) to the machines > "real" IP rather than 127.0.01, because 127.0.0.1 simply doesn't > exist. It's perfectly fine for localhost to resolve to the machine's external ip address. It would be weird for it to resolve to some other host's ip address like the vm's host machine. But having 127.0.0.1 not exist would be positively broken. All kinds of things wouldn't work. Are you sure about that part? -- greg
(2010/10/06 10:21), KaiGai Kohei wrote: > I'll check the patch for more details, please wait for a few days. I could find out some matters in this patch, independent from the discussion of localhost. (About pg_hba.conf.sample, I'm sorry for the missuggestion. Please fix up it according to Tom's comment.) * The logic is still unclear for me. The check_hostname() immediately returns with false, if the resolved remote hostname is NOT matched with the hostname described in pg_hba.conf. | +static bool | +check_hostname(hbaPort *port, const char *hostname) | +{ | + struct addrinfo *gai_result, *gai; | + int ret; | + bool found; | + | + /* Lookup remote host name if not already done */ | + if (!port->remote_hostname) | + { | + char remote_hostname[NI_MAXHOST]; | + | + if (pg_getnameinfo_all(&port->raddr.addr, port->raddr.salen, | + remote_hostname, sizeof(remote_hostname), | + NULL, 0, | + 0)) | + return false; | + | + port->remote_hostname = strdup(remote_hostname); | + } | + | + if (strcmp(port->remote_hostname, hostname) != 0) | + return false; | + | + /* Lookup IP from host name and check against original IP */ However, it seems to me you expected an opposite behavior. If the resolved hostname is matched with the hostname described in pg_hba.conf, we can consider this HbaLine to be a suitable configuration without any fallbacks. Right? It so, it should be as follows: if (strcmp(port->remote_hostname, hostname) == 0) return true; In addition, we should go the rest of fallback code on mismatch cases only, don't we? * Why getnameinfo() in the fallback loop? At the fallback code when the hostname was matched (I believe this code is intended to handle the case when hostname was NOT matched.) calls getnameinfo() for each candidate of remote addresses. But its result is not referenced by anybody. Is it really necessary? | + found = false; | + for (gai = gai_result; gai; gai = gai->ai_next) | + { | + char hostinfo[NI_MAXHOST]; | + | + getnameinfo(gai->ai_addr, gai->ai_addrlen, | + hostinfo, sizeof(hostinfo), | + NULL, 0, | + NI_NUMERICHOST); | + | + if (gai->ai_addr->sa_family == port->raddr.addr.ss_family) | + { | + if (gai->ai_addr->sa_family == AF_INET) | + { | + if (ipv4eq((struct sockaddr_in *) gai->ai_addr, | + (struct sockaddr_in *) &port->raddr.addr)) | + { | + found = true; | + break; | + } | + } | + else if (gai->ai_addr->sa_family == AF_INET6) | + { | + if (ipv6eq((struct sockaddr_in6 *) gai->ai_addr, | + (struct sockaddr_in6 *) &port->raddr.addr)) | + { | + found = true; | + break; | + } | + } | + } | + } | + | + if (gai_result) | + freeaddrinfo(gai_result); | + | + return found; | +} * Slash ('/') after the hostname At the parse_hba_line(), the parsed token which contains either hostname or cidr address is sliced into two parts on the first '/' character, if exist. Then, even if cidr_slash is not NULL, it shall be ignored when top-half of the token is hostname, not numeric address. | else | { | /* IP and netmask are specified */ | parsedline->ip_cmp_method = ipCmpMask; | | /* need a modifiable copy of token */ | token = pstrdup(token); | | /* Check if it has a CIDR suffix and if so isolate it */ | cidr_slash = strchr(token, '/'); | if (cidr_slash) | *cidr_slash = '\0'; : | ret = pg_getaddrinfo_all(token, NULL, &hints, &gai_result); | - if (ret || !gai_result) | + if (ret == 0 && gai_result) | + memcpy(&parsedline->addr, gai_result->ai_addr, | + gai_result->ai_addrlen); | + else if (ret == EAI_NONAME) | + parsedline->hostname = token; | + else | { It allows the following configuration works without any errors. (In fact, it works for me.) # IPv4/6 local connections: host all all kaigai.myhome.cx/today_is_sunny trust It seems to me, we should raise an error, if both of cidr_slash and parsedline->hostname are not NULL. Thanks, -- KaiGai Kohei <kaigai@ak.jp.nec.com>
On tis, 2010-10-05 at 22:28 -0400, Joachim Wieland wrote: > This is exactly what I am seeing here. However contrary to your case > the > patch makes it even worse on my side. With the patch compiled in and a > pg_hba.conf entry of "localhost", I cannot connect anymore to "-h > localhost", I get "no pg_hba.conf entry for host ::1". > > This is mostly standard Ubuntu setup. Could you post your /etc/hosts file?
On tor, 2010-10-07 at 12:45 +0900, KaiGai Kohei wrote: > * The logic is still unclear for me. > > The check_hostname() immediately returns with false, if the resolved > remote hostname is NOT matched with the hostname described in pg_hba.conf. > If the resolved hostname is matched with the hostname described > in pg_hba.conf, we can consider this HbaLine to be a suitable > configuration without any fallbacks. Right? > It so, it should be as follows: > > if (strcmp(port->remote_hostname, hostname) == 0) > return true; > > In addition, we should go the rest of fallback code on mismatch > cases only, don't we? The code below that is not a fallback, it is the second part of the double DNS lookup that has been extensively discussed throughout this thread. The logic is get hostname from client's IP address strcmp hostname to pg_hba.conf get IP address from hostname if that IP address == client's IP address; then pg_hba.conf entry OK > * Why getnameinfo() in the fallback loop? I checked through my git history; this was actually a leftover from some debugging code. I'll remove it. > * Slash ('/') after the hostname > > At the parse_hba_line(), the parsed token which contains either > hostname or cidr address is sliced into two parts on the first '/' > character, if exist. > Then, even if cidr_slash is not NULL, it shall be ignored when > top-half of the token is hostname, not numeric address. OK, I'll fix that.
(2010/10/12 3:34), Peter Eisentraut wrote: > On tor, 2010-10-07 at 12:45 +0900, KaiGai Kohei wrote: >> * The logic is still unclear for me. >> >> The check_hostname() immediately returns with false, if the resolved >> remote hostname is NOT matched with the hostname described in pg_hba.conf. > >> If the resolved hostname is matched with the hostname described >> in pg_hba.conf, we can consider this HbaLine to be a suitable >> configuration without any fallbacks. Right? >> It so, it should be as follows: >> >> if (strcmp(port->remote_hostname, hostname) == 0) >> return true; >> >> In addition, we should go the rest of fallback code on mismatch >> cases only, don't we? > > The code below that is not a fallback, it is the second part of the > double DNS lookup that has been extensively discussed throughout this > thread. The logic is > > get hostname from client's IP address > strcmp hostname to pg_hba.conf > get IP address from hostname > if that IP address == client's IP address; then pg_hba.conf entry OK > Sorry, I missed what you introduced at the head of this thread. When an entry passes the checks, this routine always checks both of directions for the supplied entries. BTW, I have one other question. Is it really necessary to check reverse dns entries? If check_hostname() compares remote address of the client and all the candidates retrieved using getaddrinfo() for the hostname in pg_hba.conf line (not port->remote_hostname), it seems to me we don't need to set up reverse dns entries. A typical ISP often doesn't assign reverse dns entries for cheap class rental server being shared by multiple users, for instance. It seem to me this idea allows to apply this new feature more widely. # Let's try nslookup to www.kaigai.gr.jp, and its retrieved address. :-) Of course, it is just my idea. If dislike it, please ignore it. >> * Why getnameinfo() in the fallback loop? > > I checked through my git history; this was actually a leftover from some > debugging code. I'll remove it. > >> * Slash ('/') after the hostname >> >> At the parse_hba_line(), the parsed token which contains either >> hostname or cidr address is sliced into two parts on the first '/' >> character, if exist. >> Then, even if cidr_slash is not NULL, it shall be ignored when >> top-half of the token is hostname, not numeric address. > > OK, I'll fix that. > > > -- KaiGai Kohei <kaigai@ak.jp.nec.com>
On mån, 2010-10-11 at 21:34 +0300, Peter Eisentraut wrote: > > * Why getnameinfo() in the fallback loop? > > I checked through my git history; this was actually a leftover from > some > debugging code. I'll remove it. > > > * Slash ('/') after the hostname > > > > At the parse_hba_line(), the parsed token which contains either > > hostname or cidr address is sliced into two parts on the first '/' > > character, if exist. > > Then, even if cidr_slash is not NULL, it shall be ignored when > > top-half of the token is hostname, not numeric address. > > OK, I'll fix that. Hopefully final patch, which addresses the above issues, adds some documentation enhancements, and the possibility to quote host names (in case someone wants to have a host named "samehost").
Attachment
On tis, 2010-10-12 at 10:11 +0900, KaiGai Kohei wrote: > Is it really necessary to check reverse dns entries? This has been extensively discussed in this thread.
Peter Eisentraut <peter_e@gmx.net> writes: > Hopefully final patch, which addresses the above issues, adds some > documentation enhancements, and the possibility to quote host names (in > case someone wants to have a host named "samehost"). A few minor gripes: > + If a host name is specified (anything that is not an IP address > + or a special key word is processed as a potential host name), > + that name is compared with the result of a reverse name > + resolution of the client's IP address (e.g., reverse DNS > + lookup, if DNS is used). If there is a match, then a forward > + name resolution (e.g., forward DNS lookup) is performed on the > + host name to check if it resolves to an IP address that is > + equal to the client's IP address. If both directions match, > + then the entry is considered to match. I think the reason you're getting repeated questions about why the reverse DNS lookup is needed is that the documentation fails to explain that. It'd be helpful if this part of the docs pointed out why the apparently-extra lookup is necessary. If I understand correctly, the point is that we do the reverse lookup only once per connection (when first finding a name-based pg_hba.conf entry) and that saves us having to do forward lookup on all the name-based entries that don't match. Which means BTW that the technique loses if the first name-based entry matches the connection, and only wins when the match comes at the third or later name-based entry. Are we really sure this is a good tradeoff? Are we sure there shouldn't be a way to turn it off? I'm a bit concerned about the fact that the argument seems to be "better performance" versus "makes it completely useless for some use cases", as here: http://archives.postgresql.org/pgsql-hackers/2010-08/msg00726.php Another smaller point is that it might be helpful if the wording didn't make it sound like we expect the forward DNS lookup to produce just one IP address. Perhaps "If there is a match, then a forward name resolution (e.g., forward DNS lookup) is performed on the host name to check if any of the addresses it resolves to are equal to the client's IP address." In check_hostname(): > + port->remote_hostname = strdup(remote_hostname); Seems like this had better be pstrdup(), or at least have an error check. Dumping core at the next line is not a real acceptable substitute for an "out of memory" error. > + /* Lookup IP from host name and check against original IP */ > + ret = getaddrinfo(port->remote_hostname, NULL, NULL, &gai_result); > + if (ret != 0) > + ereport(ERROR, > + (errcode(ERRCODE_CONFIG_FILE_ERROR), > + errmsg("getaddrinfo failed on \"%s\": %s", > + port->remote_hostname, gai_strerror(ret)))); Is ERRCODE_CONFIG_FILE_ERROR really the most appropriate code here? regards, tom lane
Peter Eisentraut <peter_e@gmx.net> writes: >> Hopefully final patch, which addresses the above issues, adds some >> documentation enhancements, and the possibility to quote host names (in >> case someone wants to have a host named "samehost"). Oh, I had an idea for a small improvement to this. It doesn't seem unlikely that pg_hba.conf could contain multiple entries with the same host name (but, presumably, different user and/or database names). As this is coded, you'll do a forward DNS lookup for each one until finding the complete match. You could easily prevent that by adding an additional cache field to struct Port, along the lines of+1 = remote_hostname is known to resolve to client's IP address-1= remote_hostname is known NOT to resolve to client's IP address0 = we have not done the forward DNS lookup yet. With this additional field we could guarantee to do not more than two DNS lookups per connection attempt. It also seems worth taking a second look at the order of tests in check_hba(). I suspect that on average check_db() and check_role() will now be much cheaper than the client IP test; should they be done first? Of course, if you assume that "all" is the typical entry in those columns, this doesn't win. regards, tom lane
On tis, 2010-10-12 at 17:03 -0400, Tom Lane wrote: > Oh, I had an idea for a small improvement to this. It doesn't seem > unlikely that pg_hba.conf could contain multiple entries with the same > host name (but, presumably, different user and/or database names). As > this is coded, you'll do a forward DNS lookup for each one until > finding > the complete match. You could easily prevent that by adding an > additional cache field to struct Port, along the lines of > +1 = remote_hostname is known to resolve to client's IP > address > -1 = remote_hostname is known NOT to resolve to client's IP > address > 0 = we have not done the forward DNS lookup yet. > With this additional field we could guarantee to do not more than two > DNS lookups per connection attempt. That's a very good idea. I will revise my patch.
I have addressed these issues in my final patch. One other change I did was change the host name comparison to case insensitive (Apache and TCP Wrappers do the same). I'm planning to work on a wildcard mechanism soon. After that the issue of which way the host resolution should work will have an additional argument, so I'm not going to address that any further right now. On tis, 2010-10-12 at 16:33 -0400, Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > Hopefully final patch, which addresses the above issues, adds some > > documentation enhancements, and the possibility to quote host names (in > > case someone wants to have a host named "samehost"). > > A few minor gripes: > > > + If a host name is specified (anything that is not an IP address > > + or a special key word is processed as a potential host name), > > + that name is compared with the result of a reverse name > > + resolution of the client's IP address (e.g., reverse DNS > > + lookup, if DNS is used). If there is a match, then a forward > > + name resolution (e.g., forward DNS lookup) is performed on the > > + host name to check if it resolves to an IP address that is > > + equal to the client's IP address. If both directions match, > > + then the entry is considered to match. > > I think the reason you're getting repeated questions about why the > reverse DNS lookup is needed is that the documentation fails to explain > that. It'd be helpful if this part of the docs pointed out why the > apparently-extra lookup is necessary. If I understand correctly, the > point is that we do the reverse lookup only once per connection (when > first finding a name-based pg_hba.conf entry) and that saves us having > to do forward lookup on all the name-based entries that don't match. > > Which means BTW that the technique loses if the first name-based entry > matches the connection, and only wins when the match comes at the third > or later name-based entry. Are we really sure this is a good tradeoff? > Are we sure there shouldn't be a way to turn it off? I'm a bit > concerned about the fact that the argument seems to be "better > performance" versus "makes it completely useless for some use cases", > as here: > http://archives.postgresql.org/pgsql-hackers/2010-08/msg00726.php > > Another smaller point is that it might be helpful if the wording didn't > make it sound like we expect the forward DNS lookup to produce just > one IP address. Perhaps "If there is a match, then a forward name > resolution (e.g., forward DNS lookup) is performed on the host name to > check if any of the addresses it resolves to are equal to the client's > IP address." > > In check_hostname(): > > > + port->remote_hostname = strdup(remote_hostname); > > Seems like this had better be pstrdup(), or at least have an error > check. Dumping core at the next line is not a real acceptable > substitute for an "out of memory" error. > > > + /* Lookup IP from host name and check against original IP */ > > + ret = getaddrinfo(port->remote_hostname, NULL, NULL, &gai_result); > > + if (ret != 0) > > + ereport(ERROR, > > + (errcode(ERRCODE_CONFIG_FILE_ERROR), > > + errmsg("getaddrinfo failed on \"%s\": %s", > > + port->remote_hostname, gai_strerror(ret)))); > > Is ERRCODE_CONFIG_FILE_ERROR really the most appropriate code here? > > regards, tom lane >