Thread: LDAP: bugfix and deprecated OpenLDAP API
I found a small bug in the implementation of LDAP connection parameter lookup. As documented in http://www.postgresql.org/docs/current/static/libpq-ldap.html processing should continue after a failed attempt to connect to an LDAP server. The code in src/interfaces/libpq/fe-connect.c defines a timeout of two seconds so that this failure won't block the libpq connection attempt for a long time. As coded now, the timeout won't work - if the LDAP server is down, ldap_simple_bind will wait for the network timeout, which will be quite longer than 2 seconds. The attached patch ldap-bug.patch fixes this problem; unfortunately I found no way that works both with OpenLDAP and Windows LDAP, so I had to add an #ifdef. I think that this patch should be applied and backpatched. I also tried to fix the problem mentioned in http://www.postgresql.org/message-id/CA+TgmoYnj=Es3L_0Q8+ijR4tVhvztW1fb=7C9K9gEmZWqhpwuQ@mail.gmail.com that we use deprecated OpenLDAP functions, see the attached ldap-undeprecate.patch. I added a file ldap.c in src/port with my own implementation of some of the functions that OpenLDAP has deprecated. With that, the code changes necessary are pretty minimal. I guess it's too late for something like that to go into 9.3. Should I add it to the next commitfest? Yours, Laurenz Albe
Attachment
On Tue, Feb 5, 2013 at 4:39 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote: > I guess it's too late for something like that to go into 9.3. > Should I add it to the next commitfest? Bug fixes can go in pretty much whenever, but adding it to the next CommitFest is a good way of backstopping it against the possibility that it might be forgotten. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Tue, Feb 5, 2013 at 10:39 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote: > I found a small bug in the implementation of LDAP connection > parameter lookup. > > As documented in > http://www.postgresql.org/docs/current/static/libpq-ldap.html > processing should continue after a failed attempt > to connect to an LDAP server. > > The code in src/interfaces/libpq/fe-connect.c defines a > timeout of two seconds so that this failure won't block > the libpq connection attempt for a long time. > > As coded now, the timeout won't work - if the LDAP server > is down, ldap_simple_bind will wait for the network > timeout, which will be quite longer than 2 seconds. > > The attached patch ldap-bug.patch fixes this problem; > unfortunately I found no way that works both with OpenLDAP > and Windows LDAP, so I had to add an #ifdef. > > I think that this patch should be applied and backpatched. So just to be clear - the difference is we're going from implicit anonymous bind, to an explicit one? We're not actually causing an extra bind compared to previous versions? > I also tried to fix the problem mentioned in > http://www.postgresql.org/message-id/CA+TgmoYnj=Es3L_0Q8+ijR4tVhvztW1fb=7C9K9gEmZWqhpwuQ@mail.gmail.com > that we use deprecated OpenLDAP functions, see the attached > ldap-undeprecate.patch. > > I added a file ldap.c in src/port with my own implementation > of some of the functions that OpenLDAP has deprecated. > With that, the code changes necessary are pretty minimal. Doesn't this need a version check against OpenSSL at some point, or a configure check? Are we just assuming that all versions that people ever use have the function deprecated? (That's probably not entirely unreasonable, just double checking) --Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/
On 7/1/13 7:58 AM, Magnus Hagander wrote: >> I also tried to fix the problem mentioned in >> > http://www.postgresql.org/message-id/CA+TgmoYnj=Es3L_0Q8+ijR4tVhvztW1fb=7C9K9gEmZWqhpwuQ@mail.gmail.com >> > that we use deprecated OpenLDAP functions, see the attached >> > ldap-undeprecate.patch. >> > >> > I added a file ldap.c in src/port with my own implementation >> > of some of the functions that OpenLDAP has deprecated. >> > With that, the code changes necessary are pretty minimal. > Doesn't this need a version check against OpenSSL at some point, or a > configure check? Are we just assuming that all versions that people > ever use have the function deprecated? (That's probably not entirely > unreasonable, just double checking) Btw., I just checked the source code of Apache, PHP, and PAM, and they are all unconditionally building with LDAP_DEPRECATED. So maybe there is no hurry about this.
Magnus Hagander wrote: > On Tue, Feb 5, 2013 at 10:39 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote: >> I found a small bug in the implementation of LDAP connection >> parameter lookup. [...] >> As coded now, the timeout won't work - if the LDAP server >> is down, ldap_simple_bind will wait for the network >> timeout, which will be quite longer than 2 seconds. >> >> The attached patch ldap-bug.patch fixes this problem; >> unfortunately I found no way that works both with OpenLDAP >> and Windows LDAP, so I had to add an #ifdef. >> >> I think that this patch should be applied and backpatched. > > So just to be clear - the difference is we're going from implicit > anonymous bind, to an explicit one? We're not actually causing an > extra bind compared to previous versions? No, it was an explicit bind before as well. The bug I discovered is that if you ldap_simple_bind, the timeout set in ldap_result does not work for network timeouts. It always waits until the TCP connection is established or fails, which can take much longer than PGLDAP_TIMEOUT seconds. With OpenLDAP you can set the LDAP_OPT_NETWORK_TIMEOUT option to enforce a timeout, but on Windows you have to use the nonstandard function ldap_connect. >> I also tried to fix the problem mentioned in >> http://www.postgresql.org/message-id/CA+TgmoYnj=Es3L_0Q8+ijR4tVhvztW1fb=7C9K9gEmZWqhpwuQ@mail.gmail.com >> that we use deprecated OpenLDAP functions, see the attached >> ldap-undeprecate.patch. >> >> I added a file ldap.c in src/port with my own implementation >> of some of the functions that OpenLDAP has deprecated. >> With that, the code changes necessary are pretty minimal. > > Doesn't this need a version check against OpenSSL at some point, or a > configure check? Are we just assuming that all versions that people > ever use have the function deprecated? (That's probably not entirely > unreasonable, just double checking) I checked, and it should work fine since OpenLDAP 2.0.0, released in 2000. The current version is 2.4.35. In 2.0.0 the old API was not yet deprecated, but the new functions are already there so that PostgreSQL should work. I don't know if that would require a check, but it should be simple to add a configure test if we are sufficiently paranoid. I'll be on vacation from Wednesday on until July 20th. Yours, Laurenz Albe
Peter Eisentraut wrote: > Btw., I just checked the source code of Apache, PHP, and PAM, and they > are all unconditionally building with LDAP_DEPRECATED. So maybe there > is no hurry about this. I don't think that the old API functions will go away until there is a new standard for the LDAP C API, but I have no inside knowledge of OpenLDAP. Yours, Laurenz Albe
On Mon, Jul 1, 2013 at 4:16 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote: > Magnus Hagander wrote: >> On Tue, Feb 5, 2013 at 10:39 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote: >>> I found a small bug in the implementation of LDAP connection >>> parameter lookup. > > [...] > >>> As coded now, the timeout won't work - if the LDAP server >>> is down, ldap_simple_bind will wait for the network >>> timeout, which will be quite longer than 2 seconds. >>> >>> The attached patch ldap-bug.patch fixes this problem; >>> unfortunately I found no way that works both with OpenLDAP >>> and Windows LDAP, so I had to add an #ifdef. >>> >>> I think that this patch should be applied and backpatched. >> >> So just to be clear - the difference is we're going from implicit >> anonymous bind, to an explicit one? We're not actually causing an >> extra bind compared to previous versions? > > No, it was an explicit bind before as well. Ah, got it. In that case, doesn't this patch break Windows? We no longer do the anonymous bind on Windows, since it's now in the #ifdef HAVE_LIBLDAP. Don't we need to keep the ldap_simple_bind() call in the Windows case, or break it up so the call to ldap_sasl_bind_s() is moved outside the #ifdef? At least I can't find anything in the docs that indicate that ldap_connect() on Windows would actually call that for us - only the other way around? > I'll be on vacation from Wednesday on until July 20th. Sorry I couldn't get back to you on that one earlier. I'm going to set this patch as returned with feedback for now, but please feel free to comment on above and possibly resubmit if necessary before the CF and I'll see if I can deal with it before the next CF anyway, as it's a bug fix. --Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/
On Mon, Jul 1, 2013 at 4:18 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote: > Peter Eisentraut wrote: >> Btw., I just checked the source code of Apache, PHP, and PAM, and they >> are all unconditionally building with LDAP_DEPRECATED. So maybe there >> is no hurry about this. > > I don't think that the old API functions will go away until there > is a new standard for the LDAP C API, but I have no inside knowledge > of OpenLDAP. Based on this, I thikn it's reasonable to not worry about LDAP_DEPRECATED at this point, and only take that plunge when we actually *need* some of that functionality. --Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/
Magnus Hagander wrote: > In that case, doesn't this patch break Windows? We no longer do the > anonymous bind on Windows, since it's now in the #ifdef HAVE_LIBLDAP. > > Don't we need to keep the ldap_simple_bind() call in the Windows case, > or break it up so the call to ldap_sasl_bind_s() is moved outside the > #ifdef? At least I can't find anything in the docs that indicate that > ldap_connect() on Windows would actually call that for us - only the > other way around? This patch works for the Windows case, because ldap_connect performs an anonymous bind, see http://msdn.microsoft.com/en-us/library/windows/desktop/aa366171%28v=vs.85%29.aspx If the call to ldap_connect succeeds, the client is connected to the LDAP server as an anonymous user. The session handleshould be freed with a call to ldap_unbind when it is no longer required. > I'm going to set this patch as returned with feedback for now, but > please feel free to comment on above and possibly resubmit if > necessary before the CF and I'll see if I can deal with it before the > next CF anyway, as it's a bug fix. The patch should still be good, but if we keep the deprecated OpenLDAP API, it might be more consistent to use ldap_simple_bind_s instead of ldap_sasl_bind_s. If you agree, I'll change that. Yours, Laurenz Albe
On Tue, Jul 23, 2013 at 11:53 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote: > Magnus Hagander wrote: >> In that case, doesn't this patch break Windows? We no longer do the >> anonymous bind on Windows, since it's now in the #ifdef HAVE_LIBLDAP. >> >> Don't we need to keep the ldap_simple_bind() call in the Windows case, >> or break it up so the call to ldap_sasl_bind_s() is moved outside the >> #ifdef? At least I can't find anything in the docs that indicate that >> ldap_connect() on Windows would actually call that for us - only the >> other way around? > > > This patch works for the Windows case, because ldap_connect performs > an anonymous bind, see > http://msdn.microsoft.com/en-us/library/windows/desktop/aa366171%28v=vs.85%29.aspx > > If the call to ldap_connect succeeds, the client is connected > to the LDAP server as an anonymous user. The session handle > should be freed with a call to ldap_unbind when it is no longer required. > >> I'm going to set this patch as returned with feedback for now, but >> please feel free to comment on above and possibly resubmit if >> necessary before the CF and I'll see if I can deal with it before the >> next CF anyway, as it's a bug fix. > > The patch should still be good, but if we keep the deprecated > OpenLDAP API, it might be more consistent to use ldap_simple_bind_s > instead of ldap_sasl_bind_s. > > If you agree, I'll change that. Sorry, you got this one in just as my vacation started. Yes, I agree with that. So please do. -- Magnus HaganderMe: http://www.hagander.net/Work: http://www.redpill-linpro.com/
Magnus Hagander wrote: >> The patch should still be good, but if we keep the deprecated >> OpenLDAP API, it might be more consistent to use ldap_simple_bind_s >> instead of ldap_sasl_bind_s. >> >> If you agree, I'll change that. > > Sorry, you got this one in just as my vacation started. > > Yes, I agree with that. So please do. Here is the updated patch. To repeat: this fixes a bug in LDAP connection parameter lookup if you want to have failover with more than one LDAP server: the timeout that should ensure that failover does not take too long did not work if there are TCP connection problems; in that case the connection attempt would hang until network timeout before failing over to the second LDAP server. This should be backpatched as far as supported (8.4). Yours, Laurenz Albe
Attachment
At 2013-08-19 11:47:36 +0000, laurenz.albe@wien.gv.at wrote: > > To repeat: this fixes a bug in LDAP connection parameter lookup Hi. I read through the patch, and it looks sensible. I would have preferred the ldap_simple_bind_s() call in the HAVE_LIBLDAP branch to not be inside an else {} (the if block above returns if there is an error anyway), but that's a minor point. I tested the code as follows: 1. Built the patched source --with-ldap 2. Set up ~/.pg_service.conf: [foo] ldap://localhost:3343/cn=dbserver,cn=hosts?pgconnectinfo?base?(objectclass=*) ldap://localhost:3443/cn=dbserver,cn=hosts?pgconnectinfo?base?(objectclass=*) 3. iptables -A INPUT -p tcp -d 127.0.0.1 --dport 3343 -j DROP 4. netcat -l 127.0.0.1 3343 ; netcat -l 127.0.0.1 3443 5. PGSERVICE=foo bin/psql psql did connect to localhost:3443 after a few seconds of trying to connect to :3343 and failing. (I tried without the iptables rule, so I know that it does try to connect to both.) This doesn't seem to handle timeouts in the sense of a server that doesn't respond after you connect (or perhaps the timeout was long enough that it outlasted my patience), but that's not the fault of this patch, anyway. I can't say anything about the patch on Windows, but since Magnus seemed to think it was OK, I'm marking this ready for committer. -- Abhijit
Abhijit Menon-Sen wrote: > I read through the patch, and it looks sensible. Thanks for the thorough review! > I would have preferred the ldap_simple_bind_s() call in the HAVE_LIBLDAP > branch to not be inside an else {} (the if block above returns if there > is an error anyway), but that's a minor point. I agree, changed. > I tested the code as follows: > > 1. Built the patched source --with-ldap > 2. Set up ~/.pg_service.conf: > > [foo] > ldap://localhost:3343/cn=dbserver,cn=hosts?pgconnectinfo?base?(objectclass=*) > ldap://localhost:3443/cn=dbserver,cn=hosts?pgconnectinfo?base?(objectclass=*) > > 3. iptables -A INPUT -p tcp -d 127.0.0.1 --dport 3343 -j DROP > 4. netcat -l 127.0.0.1 3343 ; netcat -l 127.0.0.1 3443 > 5. PGSERVICE=foo bin/psql > > psql did connect to localhost:3443 after a few seconds of trying to > connect to :3343 and failing. (I tried without the iptables rule, so > I know that it does try to connect to both.) > > This doesn't seem to handle timeouts in the sense of a server that > doesn't respond after you connect (or perhaps the timeout was long > enough that it outlasted my patience), but that's not the fault of > this patch, anyway. That's a bug. I thought that setting LDAP_OPT_NETWORK_TIMEOUT would also take care of an unresponsive server, but it seems that I was wrong. The attached patch uses ldap_simple_bind / ldap_result to handle this kind of timeout (like the original code). > I can't say anything about the patch on Windows, but since Magnus seemed > to think it was OK, I'm marking this ready for committer. The Windows part should handle all kinds of timeouts the same. Yours, Laurenz Albe
Attachment
On Tue, 2013-09-24 at 15:07 +0000, Albe Laurenz wrote: > --- 3511,3534 ---- > } > > /* > ! * Perform an explicit anonymous bind. > ! * This is not necessary in principle, but we want to set a timeout > ! * of PGLDAP_TIMEOUT seconds and return 2 if the connection fails. > ! * Unfortunately there is no standard conforming way to do that. > */ This comment has become a bit confusing. What exactly is nonstandard? Setting a timeout, or returning 2? The code below actually returns 3. > + #ifdef HAVE_LIBLDAP > + /* in OpenLDAP, use the LDAP_OPT_NETWORK_TIMEOUT option */ We don't use HAVE_LIBLDAP anywhere else to mean OpenLDAP. Existing LDAP-related code uses #ifdef WIN32. > + #else There should be a comment here indicating what this #else belongs to (#else /* HAVE_LIBLDAP */, or whatever we end up using). > + /* the nonstandard ldap_connect function performs an anonymous bind */ > + if (ldap_connect(ld, &time) != LDAP_SUCCESS) > + { > + /* error or timeout in ldap_connect */ > + free(url); > + ldap_unbind(ld); > + return 2; > + } > + #endif here too Bonus: Write a commit message for your patch. (Consider using git format-patch.)
Peter Eisentraut wrote: [good suggestions for improvement] I'll send an updated patch on Monday. Yours, Laurenz Albe
Peter Eisentraut wrote: >> --- 3511,3534 ---- >> } >> >> /* >> ! * Perform an explicit anonymous bind. >> ! * This is not necessary in principle, but we want to set a timeout >> ! * of PGLDAP_TIMEOUT seconds and return 2 if the connection fails. >> ! * Unfortunately there is no standard conforming way to do that. >> */ > > This comment has become a bit confusing. What exactly is nonstandard? > Setting a timeout, or returning 2? The code below actually returns 3. I have improved the comment. >> + #ifdef HAVE_LIBLDAP >> + /* in OpenLDAP, use the LDAP_OPT_NETWORK_TIMEOUT option */ > > We don't use HAVE_LIBLDAP anywhere else to mean OpenLDAP. Existing > LDAP-related code uses #ifdef WIN32. Changed. > > + #else > > There should be a comment here indicating what this #else belongs to > (#else /* HAVE_LIBLDAP */, or whatever we end up using). Changed. >> + /* the nonstandard ldap_connect function performs an anonymous bind */ >> + if (ldap_connect(ld, &time) != LDAP_SUCCESS) >> + { >> + /* error or timeout in ldap_connect */ >> + free(url); >> + ldap_unbind(ld); >> + return 2; >> + } >> + #endif > > here too Changed. > Bonus: Write a commit message for your patch. (Consider using git > format-patch.) Suggested commit message is included in the patch. Yours, Laurenz Albe
Attachment
On Mon, Oct 21, 2013 at 01:31:26PM +0000, Albe Laurenz wrote: > Bind attempts to an LDAP server should time out after two seconds, > allowing additional lines in the service control file to be parsed > (which provide a fall back to a secondary LDAP server or default options). > The existing code failed to enforce that timeout during TCP connect, > resulting in a hang far longer than two seconds if the LDAP server > does not respond. Where are we on this patch? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 1/31/14, 6:32 PM, Bruce Momjian wrote: > On Mon, Oct 21, 2013 at 01:31:26PM +0000, Albe Laurenz wrote: >> Bind attempts to an LDAP server should time out after two seconds, >> allowing additional lines in the service control file to be parsed >> (which provide a fall back to a secondary LDAP server or default options). >> The existing code failed to enforce that timeout during TCP connect, >> resulting in a hang far longer than two seconds if the LDAP server >> does not respond. > > Where are we on this patch? From my perspective, we need someone who can test this.
On Mon, Oct 21, 2013 at 3:31 PM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Peter Eisentraut wrote:I have improved the comment.
>> --- 3511,3534 ----
>> }
>>
>> /*
>> ! * Perform an explicit anonymous bind.
>> ! * This is not necessary in principle, but we want to set a timeout
>> ! * of PGLDAP_TIMEOUT seconds and return 2 if the connection fails.
>> ! * Unfortunately there is no standard conforming way to do that.
>> */
>
> This comment has become a bit confusing. What exactly is nonstandard?
> Setting a timeout, or returning 2? The code below actually returns 3.Changed.
>> + #ifdef HAVE_LIBLDAP
>> + /* in OpenLDAP, use the LDAP_OPT_NETWORK_TIMEOUT option */
>
> We don't use HAVE_LIBLDAP anywhere else to mean OpenLDAP. Existing
> LDAP-related code uses #ifdef WIN32.Changed.
> > + #else
>
> There should be a comment here indicating what this #else belongs to
> (#else /* HAVE_LIBLDAP */, or whatever we end up using).Changed.
>> + /* the nonstandard ldap_connect function performs an anonymous bind */
>> + if (ldap_connect(ld, &time) != LDAP_SUCCESS)
>> + {
>> + /* error or timeout in ldap_connect */
>> + free(url);
>> + ldap_unbind(ld);
>> + return 2;
>> + }
>> + #endif
>
> here tooSuggested commit message is included in the patch.
> Bonus: Write a commit message for your patch. (Consider using git
> format-patch.)
Sorry about the huge delay in trying to get around to this one.
For the sake of the archives - I looked at the fact that the windows codepath always returns 2, whereas the unix codepath separates at least one case out as a return 3. I can't see a pattern in the windows return codes that would make it possible to do the same though, without explicitly listing different error codes mapping to each of them - so I don't think it's worth doing that.
Thus - applied, and backpatched all the way. Thanks!
(And I just realized I forgot to credit reviewers in the commit message. My apologies - I blame confusion after havnig to re-setup my windows build environments all over again..)
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/