Thread: [HACKERS] PQhost may return socket dir for network connection

[HACKERS] PQhost may return socket dir for network connection

From
Kyotaro HORIGUCHI
Date:
Hello.

As the subject, PQhost() seems to be forgeting about the case
where only hostaddr is specified in a connection string.

PQhost() returns /var/run/postgresql or such for a connection
made from the connection string "hostaddr=127.0.0.1 port=5432
dbname=postgres". I don't believe this behavior is useful.

The first attached file is a fix for master. connhost[]->pghost
is properly set by connectOptions2() so we don't need to handle
the default case in PQhost.

The second is a fix for 9.6. connectOption2 was setting
pgunixsocket then clearing pghost reagardless of the value of
pghostaddr. This behavior inhibited PQhost() from getting the
right name to return.

The behavior for several connection strings after this fix are
the follows. The first line is the case fixed by this patch.

1. PQhost = 127.0.0.1 <= "hostaddr=127.0.0.1 port=5432 dbname=postgres"
2. PQhost = /tmp <= "port=5432 dbname=postgres"
3. PQhost = /tmp <= "host=/tmp port=5432 dbname=postgres"
4. PQhost = /tmp <= "host=/tmp hostaddr=127.0.0.1 port=5432 dbname=postgres"
5. PQhost = localhost <= "host=localhost port=5432 dbname=postgres"
6. PQhost = localhost <= "host=localhost hostaddr=127.0.0.1 port=5432 dbname=postgres"
7. PQhost = hoge <= "host=hoge hostaddr=127.0.0.1 port=5432 dbname=postgres"

The third case is somewhat confusing but it would be the user's
fault.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
***************
*** 5836,5854 **** PQhost(const PGconn *conn) {     if (!conn)         return NULL;
!     if (conn->connhost != NULL &&
!         conn->connhost[conn->whichhost].type != CHT_HOST_ADDRESS)
!         return conn->connhost[conn->whichhost].host;
!     else if (conn->pghost != NULL && conn->pghost[0] != '\0')         return conn->pghost;
!     else
!     {
! #ifdef HAVE_UNIX_SOCKETS
!         return DEFAULT_PGSOCKET_DIR;
! #else
!         return DefaultHost;
! #endif
!     } }  char *
--- 5836,5857 ---- {     if (!conn)         return NULL;
! 
!     /*
!      * We should try to avoid returning ip address. connhost[]->host stores IP
!      * address in the case of CHT_HOST_ADDRESS so try to use conn->pghost
!      * instead.
!      */
!     if (conn->connhost == NULL ||
!         (conn->connhost[conn->whichhost].type == CHT_HOST_ADDRESS &&
!          conn->pghost != NULL && conn->pghost[0] != '\0'))         return conn->pghost;
! 
!     /*
!      * Otherwise we use this as host name even though it might be an IP
!      * address.
!      */
!     return conn->connhost[conn->whichhost].host; }  char *
*** a/src/interfaces/libpq/fe-connect.c
--- b/src/interfaces/libpq/fe-connect.c
***************
*** 819,827 **** connectOptions2(PGconn *conn)     }      /*
!      * Allow unix socket specification in the host name      */
!     if (conn->pghost && is_absolute_path(conn->pghost))     {         if (conn->pgunixsocket)
free(conn->pgunixsocket);
--- 819,828 ----     }      /*
!      * Allow unix socket specification in the host name if hostaddr is not set      */
!     if ((conn->pghostaddr == NULL || conn->pghostaddr[0] == '\0') &&
!         conn->pghost && is_absolute_path(conn->pghost))     {         if (conn->pgunixsocket)
free(conn->pgunixsocket);
***************
*** 5354,5370 **** PQhost(const PGconn *conn)         return NULL;     if (conn->pghost != NULL && conn->pghost[0] !=
'\0')        return conn->pghost;
 
!     else
!     { #ifdef HAVE_UNIX_SOCKETS
!         if (conn->pgunixsocket != NULL && conn->pgunixsocket[0] != '\0')
!             return conn->pgunixsocket;
!         else
!             return DEFAULT_PGSOCKET_DIR; #else
!         return DefaultHost; #endif
-     } }  char *
--- 5355,5369 ----         return NULL;     if (conn->pghost != NULL && conn->pghost[0] != '\0')         return
conn->pghost;
!     if (conn->pghostaddr != NULL && conn->pghostaddr[0] != '\0')
!         return conn->pghostaddr; #ifdef HAVE_UNIX_SOCKETS
!     if (conn->pgunixsocket != NULL && conn->pgunixsocket[0] != '\0')
!         return conn->pgunixsocket;
!     return DEFAULT_PGSOCKET_DIR; #else
!     return DefaultHost; #endif }  char *

Re: [HACKERS] PQhost may return socket dir for network connection

From
Robert Haas
Date:
On Fri, Apr 28, 2017 at 3:43 AM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> As the subject, PQhost() seems to be forgeting about the case
> where only hostaddr is specified in a connection string.

I suspect that may have been intentional.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] PQhost may return socket dir for network connection

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Apr 28, 2017 at 3:43 AM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> As the subject, PQhost() seems to be forgeting about the case
>> where only hostaddr is specified in a connection string.

> I suspect that may have been intentional.

See commits 11003eb55 and 40cb21f70 for recent history in this area.
Further back there's more history around host vs. hostaddr.  We've
gone back and forth on this in the past, including an ultimately
reverted attempt to add "PQhostaddr()", so it'd be a good idea to
study those past threads before proposing a change here.

Having said that, the behavior stated in $subject does sound wrong.
        regards, tom lane



Re: [HACKERS] PQhost may return socket dir for network connection

From
Robert Haas
Date:
On Mon, May 1, 2017 at 12:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Apr 28, 2017 at 3:43 AM, Kyotaro HORIGUCHI
>> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>> As the subject, PQhost() seems to be forgeting about the case
>>> where only hostaddr is specified in a connection string.
>
>> I suspect that may have been intentional.
>
> See commits 11003eb55 and 40cb21f70 for recent history in this area.
> Further back there's more history around host vs. hostaddr.  We've
> gone back and forth on this in the past, including an ultimately
> reverted attempt to add "PQhostaddr()", so it'd be a good idea to
> study those past threads before proposing a change here.
>
> Having said that, the behavior stated in $subject does sound wrong.

I'm not sure.  My understanding of the relationship between host and
hostaddr is that hostaddr overrides our notion of where to find host,
but not our notion of the host to which we're connecting.  Under that
definition, the current behavior as described by Kyotaro sounds
correct.  When you say host=X hostaddr=Y, we act as though we're
connecting to X but try to connect to IP Y.  When you just specify
hostaddr=Y, we act as if we're trying to connect to the default host
(which happens to be a socket address in that example) but actually
use the specified IP address.  That's consistent.

Now, against that,
https://www.postgresql.org/docs/9.6/static/libpq-connect.html says:

--
If hostaddr is specified without host, the value for hostaddr gives
the server network address. The connection attempt will fail if the
authentication method requires a host name.
--

And PQhost() asks for the host name, so you could argue that it too
should "fail" in this situation.  But it has no way to report failure,
so that's kind of problematic.
https://www.postgresql.org/docs/9.6/static/libpq-status.html says:

--
Without either a host name or host address, libpq will connect using a
local Unix-domain socket; or on machines without Unix-domain sockets,
it will attempt to connect to localhost.
--

But that's just about where to make the connection, not what we should
consider the hostname to be for purposes other than calling
connect(2).

Kyotaro Horiguchi argues that the current behavior is "not useful" and
that's probably true, but I don't think it's the job of an API to try
as hard as possible to do something useful in every case.  It's more
important that the behavior is predictable and understandable.  In
short, if we're going to change the behavior of PQhost() here, then
there should be a documentation change to go with it, because the
current documentation around the interaction between host and hostaddr
does not make it at all clear that the current behavior is wrong, at
least not as far as I can see.  To the contrary, it suggests that if
you use hostaddr without host, you may find the results surprising or
even unfortunate:

--
Using hostaddr instead of host allows the application to avoid a host
name look-up, which might be important in applications with time
constraints. However, a host name is required for GSSAPI or SSPI
authentication methods, as well as for verify-full SSL certificate
verification. ... Note that authentication is likely to fail if host
is not the name of the server at network address hostaddr.
--

The overall impression that the documentation leaves me with is that
you are expected to use only host unless you care about saving name
lookups; then, use both host and hostaddr; if you want to use just
hostaddr you can try it, but it'll fail to work properly if you then
try to do something that needs a host name.  Calling PQhost() is,
perhaps, one such thing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] PQhost may return socket dir for network connection

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, May 1, 2017 at 12:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Having said that, the behavior stated in $subject does sound wrong.

> I'm not sure.  My understanding of the relationship between host and
> hostaddr is that hostaddr overrides our notion of where to find host,
> but not our notion of the host to which we're connecting.  Under that
> definition, the current behavior as described by Kyotaro sounds
> correct.

Perhaps.  But hostaddr also forces us to believe that we're making an
IP connection, so it still seems pretty dubious to return a socket
path.  The true situation is that we're connecting to an IP host that
we do not know the name of.

I notice that one of the recent changes was made to avoid situations where
PQhost() would return NULL and thereby provoke a crash if the application
wasn't expecting that (which is not unreasonable of it, since the PQhost()
documentation mentions no such hazard).  So I would not want to see us
return NULL in this case.

And I believe we already considered and rejected the idea of having it
return the hostaddr string, back in some of the older discussions.
(We could revisit that decision, no doubt, but let's go back and see
what the reasoning was first.)

But maybe returning an empty string ("") would be OK?
        regards, tom lane



Re: [HACKERS] PQhost may return socket dir for network connection

From
Robert Haas
Date:
On Mon, May 1, 2017 at 1:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, May 1, 2017 at 12:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Having said that, the behavior stated in $subject does sound wrong.
>
>> I'm not sure.  My understanding of the relationship between host and
>> hostaddr is that hostaddr overrides our notion of where to find host,
>> but not our notion of the host to which we're connecting.  Under that
>> definition, the current behavior as described by Kyotaro sounds
>> correct.
>
> Perhaps.  But hostaddr also forces us to believe that we're making an
> IP connection, so it still seems pretty dubious to return a socket
> path.  The true situation is that we're connecting to an IP host that
> we do not know the name of.

Yes, I think that's a reasonable interpretation.

> I notice that one of the recent changes was made to avoid situations where
> PQhost() would return NULL and thereby provoke a crash if the application
> wasn't expecting that (which is not unreasonable of it, since the PQhost()
> documentation mentions no such hazard).  So I would not want to see us
> return NULL in this case.
>
> And I believe we already considered and rejected the idea of having it
> return the hostaddr string, back in some of the older discussions.
> (We could revisit that decision, no doubt, but let's go back and see
> what the reasoning was first.)
>
> But maybe returning an empty string ("") would be OK?

Yeah, that might be OK.  But I'd be inclined not to back-patch any
behavior changes we make in this area unless it's clear that 9.6
regressed relative to previous releases.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] PQhost may return socket dir for network connection

From
Kyotaro HORIGUCHI
Date:
At Mon, 1 May 2017 15:48:17 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
<CA+TgmoZcyGsb621i35AC9TJar__F9fQbmRPHfCOnE4aNE227HA@mail.gmail.com>
> On Mon, May 1, 2017 at 1:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> On Mon, May 1, 2017 at 12:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> Having said that, the behavior stated in $subject does sound wrong.
> >
> >> I'm not sure.  My understanding of the relationship between host and
> >> hostaddr is that hostaddr overrides our notion of where to find host,
> >> but not our notion of the host to which we're connecting.  Under that
> >> definition, the current behavior as described by Kyotaro sounds
> >> correct.
> >
> > Perhaps.  But hostaddr also forces us to believe that we're making an
> > IP connection, so it still seems pretty dubious to return a socket
> > path.  The true situation is that we're connecting to an IP host that
> > we do not know the name of.
> 
> Yes, I think that's a reasonable interpretation.
> 
> > I notice that one of the recent changes was made to avoid situations where
> > PQhost() would return NULL and thereby provoke a crash if the application
> > wasn't expecting that (which is not unreasonable of it, since the PQhost()
> > documentation mentions no such hazard).  So I would not want to see us
> > return NULL in this case.
> > And I believe we already considered and rejected the idea of having it
> > return the hostaddr string, back in some of the older discussions.
> > (We could revisit that decision, no doubt, but let's go back and see
> > what the reasoning was first.)
> >
> > But maybe returning an empty string ("") would be OK?
> 
> Yeah, that might be OK.  But I'd be inclined not to back-patch any
> behavior changes we make in this area unless it's clear that 9.6
> regressed relative to previous releases.

I personally don't have a specific wish on this since I don't
have a specific usage of PQhost. (I think that users are
reposible for the result of contradicting host and hostaddr.)

However, it might be a problem that the documentation doesn't
mention what the returned value from PQhost is.

https://www.postgresql.org/docs/9.6/static/libpq-status.html

> Returns the server host name of the connection. This can be a
> host name, an IP address, or a directory path if the connection
> is via Unix socket. (The path case can be distinguished because
> it will always be an absolute path, beginning with /.)

I don't think more strict definition is required but the above
should describe the expected usage or obvious limitation. Anyway
it is contradicting to the current behavior. It can return a
socket path for a IP connection.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center