Thread: Ident authentication fails due to bind error on server (8.4.8)

From: "Marinos Yannikos"

Hi,

I'm not sure that this is not a configuration or networking issue (so
apologies if it is), but we seem to be getting rare (a few times/day)
failures with ident authentication because several clients attempt to do
it simultaneously over a high-latency connection (capitalized = edited
IPs/username etc.):

[DB CLIENTADDR(51985) 3173 2011-06-17 10:49:56 CEST] LOG:  could not bind
to local address "SERVERADDR": Address already in use
[DB CLIENTADDR(51985) 3173 2011-06-17 10:49:56 CEST] FATAL:  Ident
authentication failed for user "USER"
[DB CLIENTADDR(51986) 3183 2011-06-17 10:49:56 CEST] FATAL:  no
pg_hba.conf entry for host "CLIENTADDR", user "USER", database "DB", SSL
off

On the client side, we had 2 connection attempts, of which 1 failed
(apparently):

Jun 17 10:49:53 xxx oidentd[12377]: Connection from SERVER (SERVERADDR):0
Jun 17 10:49:53 xxx oidentd[12377]: [SERVER] Successful lookup: 51980 ,
5432 : crm (crm)

[Fri Jun 17 10:49:53 2011] [error] [client 127.0.0.1] [Fri Jun 17 10:49:53
2011] kv_tpl.pl: DBI connect('dbname=DB;host=SERVER','USER',...) failed:
FATAL:  Ident authentication failed for user "USER", referer: URL
[Fri Jun 17 10:49:53 2011] [error] [client 127.0.0.1] [Fri Jun 17 10:49:53
2011] kv_tpl.pl: FATAL:  no pg_hba.conf entry for host "CLIENTADDR", user
"USER", database "DB", SSL off at /var/www/crm/kv_tpl.pl line 100,
referer: URL

Is this a possible race condition in src/backend/libpq/auth.c ?

[note: the client/server clocks are 3 seconds apart at this point, I
haven't investigated whether that causes issues here]

---
     /*
      * Bind to the address which the client originally contacted, otherwise
      * the ident server won't be able to match up the right connection. This
      * is necessary if the PostgreSQL server is running on an IP alias.
      */
     rc = bind(sock_fd, la->ai_addr, la->ai_addrlen);
     if (rc != 0)
     {
         ereport(LOG,
                 (errcode_for_socket_access(),
                  errmsg("could not bind to local address \"%s\": %m",
                         local_addr_s)));
         ident_return = false;
         goto ident_inet_done;
     }
---

Regards,
  Marinos

Re: Ident authentication fails due to bind error on server (8.4.8)

From: Tom Lane

"Marinos Yannikos" <mjy@geizhals.at> writes:
> I'm not sure that this is not a configuration or networking issue (so
> apologies if it is), but we seem to be getting rare (a few times/day)
> failures with ident authentication because several clients attempt to do
> it simultaneously over a high-latency connection (capitalized = edited
> IPs/username etc.):

> [DB CLIENTADDR(51985) 3173 2011-06-17 10:49:56 CEST] LOG:  could not bind
> to local address "SERVERADDR": Address already in use
> [DB CLIENTADDR(51985) 3173 2011-06-17 10:49:56 CEST] FATAL:  Ident
> authentication failed for user "USER"

Hm.  What platform is this on?

> Is this a possible race condition in src/backend/libpq/auth.c ?

I don't think it's a race condition per se.  The code ought to be
setting up the address argument for bind() with sin_port = 0 so that
an unused port number gets assigned.  That seems to be what happens on
a couple of machines that I tried here, but I notice that the Linux
manpage for getaddrinfo says

    service sets the port in each returned address structure.  If
    this argument is a service name (see services(5)), it is
    translated to the corresponding port number.  This argument can
    also be specified as a decimal number, which is simply converted
    to binary.  If service is NULL, then the port number of the
    returned socket addresses will be left uninitialized.

In principle this wording would allow getaddrinfo to return the same
nonzero port number in multiple backends, which would lead to the
reported failure if they were doing ident verification at the same time.
I'm thinking maybe we should explicitly pass "0" rather than NULL to
getaddrinfo here.  On the other hand, it seems to work reliably as-is
on my Linux machine, so this is just speculation at this point.
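
A minimal sketch for checking this on a given platform: it resolves a numeric IPv4 address once with a NULL service and once with "0", then prints the port that getaddrinfo put into the result. The helper name resolved_port and the address 127.0.0.1 are assumptions made for illustration; this is not the code in auth.c. Only the "0" case has a port value guaranteed by the interface.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: resolve a numeric IPv4 address with the given
 * service string and return the port stored in the first result
 * (host byte order), or -1 if the lookup fails.
 */
static int
resolved_port(const char *host, const char *service)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int         port;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;    /* no DNS lookup, address is literal */

    if (getaddrinfo(host, service, &hints, &res) != 0 || res == NULL)
        return -1;

    port = ntohs(((struct sockaddr_in *) res->ai_addr)->sin_port);
    freeaddrinfo(res);
    return port;
}

int
main(void)
{
    /* With NULL the man page makes no promise about the port value. */
    printf("service NULL -> port %d\n", resolved_port("127.0.0.1", NULL));
    /* With "0" the port is explicitly requested to be zero. */
    printf("service \"0\"  -> port %d\n", resolved_port("127.0.0.1", "0"));
    return 0;
}
```

If the first line prints anything other than 0 on some platform, passing "0" explicitly would be the safer choice there.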

(BTW, is it really sane to be using ident auth over a "high latency
connection"?  That would certainly suggest to me that you could be
getting connections from untrustworthy machines ...)

            regards, tom lane

Re: Ident authentication fails due to bind error on server (8.4.8)

From: Tom Lane

I wrote:
> I don't think it's a race condition per se.  The code ought to be
> setting up the address argument for bind() with sin_port = 0 so that
> an unused port number gets assigned.  That seems to be what happens on
> a couple of machines that I tried here, but I notice that the Linux
> manpage for getaddrinfo says

>     service sets the port in each returned address structure.  If
>     this argument is a service name (see services(5)), it is
>     translated to the corresponding port number.  This argument can
>     also be specified as a decimal number, which is simply converted
>     to binary.  If service is NULL, then the port number of the
>     returned socket addresses will be left uninitialized.

> In principle this wording would allow getaddrinfo to return the same
> nonzero port number in multiple backends, which would lead to the
> reported failure if they were doing ident verification at the same time.
> I'm thinking maybe we should explicitly pass "0" rather than NULL to
> getaddrinfo here.  On the other hand, it seems to work reliably as-is
> on my Linux machine, so this is just speculation at this point.

I looked at the glibc source code for getaddrinfo, and it looks like
they do reliably set sin_port to zero when no service argument is
provided, despite the above documentation statement.  So that's why it
works for me.  But still, if you're on a non-Linux platform it seems
possible that this is the mechanism for what's biting you.

            regards, tom lane

Re: Ident authentication fails due to bind error on server (8.4.8)

From: "Marinos Yannikos"

On Fri, 17 Jun 2011 19:51:59 +0200, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I looked at the glibc source code for getaddrinfo, and it looks like
> they do reliably set sin_port to zero when no service argument is
> provided, despite the above documentation statement.  So that's why it
> works for me.  But still, if you're on a non-Linux platform it seems
> possible that this is the mechanism for what's biting you.

Both client and server are Linux systems here, and sin_port is also 0
according to debug output I added. I cannot reproduce the problem reliably
(the users are much better testers it seems), so I'm a bit stuck with my
best guess being TIME_WAIT issues, perhaps FIN packets getting lost. I've
set

sysctl -w net.ipv4.tcp_tw_reuse=1

now and will post again if there is any change.

> (BTW, is it really sane to be using ident auth over a "high latency
> connection"?  That would certainly suggest to me that you could be
> getting connections from untrustworthy machines ...)

Both endpoints are properly firewalled (the sane sysadmins say so), and for
this particular connection only one client IP address is allowed by
pg_hba.conf. The reason we also use ident authentication is to allow
only a few select UIDs on the client host to connect to certain DSNs.

Thanks for all the helpful info!

Regards,
  Marinos

Re: Ident authentication fails due to bind error on server (8.4.8)

From: "Marinos Yannikos"

On Sat, 18 Jun 2011 04:55:59 +0200, Marinos Yannikos <mjy@geizhals.at>
wrote:

> sysctl -w net.ipv4.tcp_tw_reuse=1

This fixed the issue apparently, so bind() seems to choose ports in
TIME_WAIT state for some reason with sin_port=0 and that caused it.

Regards,
  Marinos