Re: Hostnames in pg_hba.conf - Mailing list pgsql-hackers

From Mark Mielke
Subject Re: Hostnames in pg_hba.conf
Date
Msg-id 4B742403.9000009@mark.mielke.cc
In response to Hostnames in pg_hba.conf  (Bart Samwel <bart@samwel.tk>)
Responses Re: Hostnames in pg_hba.conf
List pgsql-hackers
On 02/11/2010 08:13 AM, Bart Samwel wrote:<br /><blockquote
cite="mid:ded01eb21002110513n296d60b7me5255820a69a4bff@mail.gmail.com" type="cite">ISSUE #1: Performance / caching<br
/><br /> At present, I've simply not added caching. The reasoning for this is as follows:<br /> (a) getaddrinfo doesn't
tell us about expiry, so when do you refresh?<br /> (b) If you put the cache in the postmaster, it will not work for
exec-based backends as opposed to fork-based backends, since those read pg_hba.conf every time they are exec'ed.<br />
(c) If you put this in the postmaster, the postmaster will have to update the cache every once in a while, which may be
slow and which may prevent new connections while the cache update takes place.<br /> (d) Outdated cache entries may
inexplicably and without any logging choose the wrong rule for some clients. Big aargh: people will start using this to
specify 'deny' rules based on host names.<br /><br /> If you COULD get expiry info out of getaddrinfo you could
potentially store this info in a table or something like that, and have it updated by the backends? But that's way over
my head for now. ISTM that this stuff may better be handled by a locally-running caching DNS server, if people have
performance issues with the lack of caching. These local caching DNS servers can also handle expiry correctly,
etcetera.<br /><br /> We should of course still take care to look up a given hostname only once for each connection
request.<br /></blockquote><br /> You should cache for some minimal amount of time or some minimal number of records -
even if it's just one minute, and even if it's a fixed-length LRU sorted list. This would deal with situations where a
new connection is raised several times a second (some types of load). For connections raised once a minute or less, the
benefit of caching is far less. But, this can be a feature tagged on later if necessary and doesn't need to gate the
feature.<br /><br /> Many UNIX/Linux boxes have some sort of built-in cache, sometimes persistent, sometimes shared. On
my Linux box, I have nscd - "name server caching daemon" - which should be able to cache these sorts of lookups. I
believe it is used for things as common as mapping uid to username in output of "/bin/ls -l", so it does need to be
pretty fast.<br /><br /> The difference between an in-process cache and something like "nscd" is the inter-process
communication required to use "nscd".<br /><br /><br /><blockquote
cite="mid:ded01eb21002110513n296d60b7me5255820a69a4bff@mail.gmail.com" type="cite">ISSUE #2: Reverse lookup?<br /><br />
There was a suggestion on the TODO list on the wiki, which basically said that maybe we could use reverse lookup to find
"the" hostname and then check for that hostname in the list. I think that won't work, since IPs can go by many names and
may not support reverse lookup for some hostnames (/etc/hosts anybody?). Furthermore, due to the top-to-bottom
processing of pg_hba.conf, you CANNOT SKIP entries that might possibly match. For instance, if the third line is for
host "<a href="http://foo.example.com" moz-do-not-send="true">foo.example.com</a>" and the fifth line is for "<a
href="http://bar.example.com" moz-do-not-send="true">bar.example.com</a>", both lines may apply to the same IP, and you
still HAVE to check the first one, even if reverse lookup turns up the second host name. So it doesn't save you any
lookups, it just costs an extra one.<br /></blockquote><br /> I don't see a need to do a reverse lookup. Reverse lookups
are sometimes done as a verification check, in the sense that it's cheap to get a map from NAME -> IP, but sometimes
it is much harder to get the reverse map from IP -> NAME. However, it's not a reliable check as many legitimate users
have trouble getting a reverse map from IP -> NAME. It also doesn't save anything as IP -> NAME lookups go to a
completely different set of name servers, and these name servers are not always optimized for speed as IP -> NAME
lookups are less common than NAME -> IP. Finally, if one finds a map from IP -> NAME, that doesn't prove that a
map from NAME -> IP exists, so using *any* results from IP -> NAME is questionable.<br /><br /> I think reverse
lookups are unnecessary and undesirable.<br /><br /><blockquote
cite="mid:ded01eb21002110513n296d60b7me5255820a69a4bff@mail.gmail.com" type="cite">ISSUE #3: Multiple hostnames?<br
/><br /> Currently, a pg_hba entry lists an IP / netmask combination. I would suggest allowing lists of hostnames in the
entries, so that you can at least mimic the "match multiple hosts by a single rule". Any reason not to do this?<br
/></blockquote><br /> I'm mixed. In some situations, I've wanted to put multiple IP/netmask. I would say that if
multiple names are supported, then multiple IP/netmask should be supported. But, this does make the lines unwieldy
beyond two or three. This direction leans towards the capability to define "host classes", where the rules allow the
host class, and the host class can have a list of hostnames.<br /><br /> Two other aspects I don't see mentioned:<br
/><br /> 1) What will you do for hostnames that have multiple IP addresses? Will you accept all IP addresses as being
valid?<br /> 2) What will you do if they specify a hostname and a netmask? This seems like a convenient way of saying
"everybody on the same subnet as NAME."<br /><br /> Cheers,<br /> mark<br /><br /><pre class="moz-signature"
cols="72">--
 
Mark Mielke <a class="moz-txt-link-rfc2396E" href="mailto:mark@mielke.cc"><mark@mielke.cc></a>
</pre>
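
To make the caching suggestion in the reply to ISSUE #1 concrete, here is a minimal sketch of a fixed-length LRU list with a fixed time-to-live. It is an illustration in Python, not PostgreSQL code: the class name HostCache, its parameters (max_entries, ttl_seconds, resolver, clock) and the defaults of 64 entries and 60 seconds are all assumptions made for this example. The fixed TTL stands in for the expiry information that getaddrinfo does not report.

```python
import socket
import time
from collections import OrderedDict


class HostCache:
    """Hypothetical fixed-size LRU cache of hostname lookups with a fixed TTL."""

    def __init__(self, max_entries=64, ttl_seconds=60.0,
                 resolver=None, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl_seconds = ttl_seconds
        self.resolver = resolver or self._resolve
        self.clock = clock
        self._entries = OrderedDict()  # hostname -> (expiry time, addresses)

    @staticmethod
    def _resolve(hostname):
        # getaddrinfo reports no expiry, hence the fixed TTL in this sketch.
        infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
        return frozenset(info[4][0] for info in infos)

    def lookup(self, hostname):
        now = self.clock()
        entry = self._entries.get(hostname)
        if entry is not None and entry[0] > now:
            self._entries.move_to_end(hostname)  # mark as most recently used
            return entry[1]
        # Missing or expired: resolve again and store with a fresh expiry.
        addresses = self.resolver(hostname)
        self._entries[hostname] = (now + self.ttl_seconds, addresses)
        self._entries.move_to_end(hostname)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return addresses
```

A cache like this lives inside one process, so it avoids the inter-process round trip that nscd needs, but each process would keep its own copy. Stale entries remain possible for up to ttl_seconds, which is the risk raised in point (d) of the quoted message.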

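The two open questions at the end of the message (hostnames with multiple IP addresses, and a hostname combined with a netmask) can be illustrated with one possible answer each. This is a sketch under stated assumptions, not a description of what PostgreSQL does: the function client_matches and its prefix_len parameter are hypothetical, it accepts every resolved address as valid, and it reads "hostname plus netmask" as "everybody on the same subnet as NAME".

```python
import ipaddress


def client_matches(client_ip, resolved_ips, prefix_len=None):
    """Return True if client_ip matches any address the hostname resolved to.

    With prefix_len set, each resolved address is widened to its enclosing
    network, i.e. "everybody on the same subnet as NAME".
    """
    client = ipaddress.ip_address(client_ip)
    for text in resolved_ips:
        addr = ipaddress.ip_address(text)
        if addr.version != client.version:
            continue  # never compare IPv4 against IPv6
        if prefix_len is None:
            if addr == client:
                return True
        else:
            # strict=False lets a host address stand in for its network.
            network = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
            if client in network:
                return True
    return False
```

For example, client_matches("192.0.2.8", ["192.0.2.7"], prefix_len=24) accepts a neighbour of the named host, while the same call without prefix_len rejects it.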