Thread: Adding a pg_servername() function

Adding a pg_servername() function

From
Laetitia Avrot
Date:
Dear Hackers,

One of my customers suggested creating a function that could return the server's hostname.

After a quick search, we found [this Wiki page](https://wiki.postgresql.org/wiki/Pg_gethostname) referring to [that extension](https://github.com/theory/pg-hostname/) from David E. Wheeler.

I used shamelessly the idea and created a working proof of concept:

- the function takes no argument and returns the hostname or a null value if any error occurs
- the function was added to the network.c file because it makes sense to me to have that near the inet_server_addr() and inet_server_port() functions.
- I didn't add any test as the inet functions are not tested either.

If you think my design is good enough, I'll go ahead and port/test that function for Windows.

Have a nice day,

Lætitia
Attachment

Re: Adding a pg_servername() function

From
Matthias van de Meent
Date:
On Thu, 3 Aug 2023 at 10:37, Laetitia Avrot <laetitia.avrot@gmail.com> wrote:
>
> Dear Hackers,
>
> One of my customers suggested creating a function that could return the server's hostname.

Mostly for my curiosity: What would be their use case?
I only see limited usability, considering that the local user's
hostname can be very different from the hostname used in the url that
connects to that PostgreSQL instance.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)



Re: Adding a pg_servername() function

From
Laetitia Avrot
Date:


Le jeu. 3 août 2023, 11:31, Matthias van de Meent <boekewurm+postgres@gmail.com> a écrit :
On Thu, 3 Aug 2023 at 10:37, Laetitia Avrot <laetitia.avrot@gmail.com> wrote:
>
> Dear Hackers,
>
> One of my customers suggested creating a function that could return the server's hostname.

Mostly for my curiosity: What would be their use case?

Thank you for showing interest in that patch.

For my customer,  their use case is to be able from an SQL client to double check they're on the right host before doing things that could become a production disaster. 

I see also another use case: being able to identify postgres metrics on a monitoring tool. Look at the hack pg_staviz uses here: 

Those are the  use cases I can think of.

I only see limited usability, considering that the local user's
hostname can be very different from the hostname used in the url that
connects to that PostgreSQL instance.

Agreed, depending on how hosts and dns are set,  it can be useless. But normally,  companies have host making standards to avoid that.

Have a nice day, 

Lætitia 


Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Re: Adding a pg_servername() function

From
066ce286@free.fr
Date:

> Agreed, depending on how hosts and dns are set, it can be useless.
> But normally, companies have host making standards to avoid that.


BTW you already have it : 

select setting from pg_settings where name = 'listen_addresses' 

No ?



Re: Adding a pg_servername() function

From
Laetitia Avrot
Date:


Le jeu. 3 août 2023 à 14:20, <066ce286@free.fr> a écrit :


> Agreed, depending on how hosts and dns are set, it can be useless.
> But normally, companies have host making standards to avoid that.


BTW you already have it :

select setting from pg_settings where name = 'listen_addresses'

No ?


Hello,

No, this is not the same thing. Most of the time, we will find either the wildcard '*' or an Ip address or a list of IP addresses, which is not what we want in this particular use case. But I agree that there are some use cases where we will find the hostname there, but from my experience, that's not the majority of the use cases.

For example, you could use a VIP in listen_addresses, so that you ensure that distant connection will go through that VIP and you'll always end up on the right server even though it might be another physical instance (after switch or failover).

Lætitia
 

Re: Adding a pg_servername() function

From
Christoph Moench-Tegeder
Date:
## Laetitia Avrot (laetitia.avrot@gmail.com):

> For my customer,  their use case is to be able from an SQL client to double
> check they're on the right host before doing things that could become a
> production disaster.

Why not use cluster_name for that? Even if it may not be relevant
for their setup, there might be multiple clusters on the same host
(multiple clusters with the same hostname) or you database could be
hiding behind some failover/loadbalancer mechanism (multiple hostnames
in one cluster), thus using the hostname to identify the cluster is
totally not universal (the same goes for the monitoring/metrics
stuff). And there's of course inet_server_addr(), if you really
need to double-check if you're connected to the right machine.

Regards,
Christoph

-- 
Spare Space.



Re: Adding a pg_servername() function

From
Laetitia Avrot
Date:


Le jeu. 3 août 2023 à 15:17, Christoph Moench-Tegeder <cmt@burggraben.net> a écrit :
## Laetitia Avrot (laetitia.avrot@gmail.com):

> For my customer,  their use case is to be able from an SQL client to double
> check they're on the right host before doing things that could become a
> production disaster.

Why not use cluster_name for that? Even if it may not be relevant
for their setup, there might be multiple clusters on the same host
(multiple clusters with the same hostname) or you database could be
hiding behind some failover/loadbalancer mechanism (multiple hostnames
in one cluster), thus using the hostname to identify the cluster is
totally not universal (the same goes for the monitoring/metrics
stuff). And there's of course inet_server_addr(), if you really
need to double-check if you're connected to the right machine.

Hello Christoph,

I understand your point and sure enough, my customer could set and use the cluster_name for that purpose. I totally disagree with using inet_server_addr() for that purpose as there are so many different network settings with VIPs and so on that it won't help. Also, a socket connection will give a NULL value, not that it's *that* bad because if it's a socket, you're running locally and could still use `\! ifconfig`, but it does not work on some configurations (docker for example). Also, most humans will find it easier to memorize a name than a number and that's one of the reasons why we remember websites' URLs instead of their IP.

I still disagree with the monitoring part. A good monitoring tool will have to reconcile data from the OS with data from the Postgres cluster. So that, we kind of need a way for the monitoring tool to know on which host this particular cluster is running and I think it's smarter to get this information from the Postgres cluster.

Of course, I do love the idea that Postgres is agnostic from where it's running, but I don't think this new function removes that.
 
Have a nice day,

Lætitia

Re: Adding a pg_servername() function

From
Christoph Moench-Tegeder
Date:
Hi,

## Laetitia Avrot (laetitia.avrot@gmail.com):

> I understand your point and sure enough, my customer could set and use the
> cluster_name for that purpose. I totally disagree with using
> inet_server_addr() for that purpose as there are so many different network
> settings with VIPs and so on that it won't help.

That depends a bit on what the exact question is: "where did I connect
to" vs "where is the thing I conencted to running".

> Also, a socket connection
> will give a NULL value, not that it's *that* bad because if it's a socket,
> you're running locally and could still use `\! ifconfig`, but it does not
> work on some configurations (docker for example).

But with a local socket, you already know where you are.

> Also, most humans will
> find it easier to memorize a name than a number and that's one of the
> reasons why we remember websites' URLs instead of their IP.

That's why we have DNS, and it works both ways (especially when
working programatically).

> I still disagree with the monitoring part. A good monitoring tool will have
> to reconcile data from the OS with data from the Postgres cluster. So that,
> we kind of need a way for the monitoring tool to know on which host this
> particular cluster is running and I think it's smarter to get this
> information from the Postgres cluster.

But that's again at least two questions in here: "what is the
instance on this host doing" and "what is the primary/read standby/...
of service X doing", and depending on that ask the base host's
primary address or the service's address. Yes, that's easy to say
when you can define the whole environment, and many setups discover
this only later in the life cycle.

Regards,
Christoph

-- 
Spare Space



Re: Adding a pg_servername() function

From
Laetitia Avrot
Date:
Dear all,

First, thank you so much for your interest in this patch. I didn't think I would have so many answers.

I agree that the feature I'm suggesting could be done with a few tricks. I meant to simplify the life of the user by providing a simple new feature. (Also, I might have trust issues with DNS due to several past production disasters.)

My question is very simple: Do you oppose having this feature in Postgres?

In other words, should I stop working on this?

Have a nice day,

Lætitia


Re: Adding a pg_servername() function

From
Peter Eisentraut
Date:
On 09.08.23 08:42, Laetitia Avrot wrote:
> I agree that the feature I'm suggesting could be done with a few tricks. 
> I meant to simplify the life of the user by providing a simple new 
> feature. (Also, I might have trust issues with DNS due to several past 
> production disasters.)
> 
> My question is very simple: Do you oppose having this feature in Postgres?

I think this is pretty harmless(*) and can be useful, so it seems 
reasonable to pursue.

(*) But we should think about access control for this.  If you're in a 
DBaaS environment, providers might not like that you can read out their 
internal host names.  I'm not sure if there is an existing permission 
role that this could be attached to or if we need a new one.





Re: Adding a pg_servername() function

From
GF
Date:
Hi everybody,

On 09.08.23 08:42, Laetitia Avrot wrote:
> I agree that the feature I'm suggesting could be done with a few tricks.
> I meant to simplify the life of the user by providing a simple new
> feature. (Also, I might have trust issues with DNS due to several past
> production disasters.)

Just my contribution on why this function could be useful, since we had a use case that fits exactly this.
In the past I needed such a pg_servername() function because in a project I was engaged on they needed to distribute "requests" directed to a in-house management service running on network servers, and each one had a DB, so we went with pglogical to be used as a (transactional) messaging middleware. The requests were targeted to specific hostnames, thus on the replication downstream we wanted to filter with a before-insert trigger on the hostname.
At that point, we had to do some contortions to get the hostname the DB was running on, sometimes really nasty like: on linux servers we could go with a python function; but on windows ones (where there was no python available) we had to resort to a function that read the OS "hostname" command into a temporary table via the pg "copy", which is both tricky and inefficient.

So, from me +1 to Laetitia's proposal.

On Wed, 9 Aug 2023 at 10:26, Peter Eisentraut <peter@eisentraut.org> wrote:
(*) But we should think about access control for this.
 
And +1 also to Peter's note on enforcing the access control. BTW that's exactly what we needed with plpythonu, and also with "copy from program".

Best,
Giovanni

Re: Adding a pg_servername() function

From
Tom Lane
Date:
Peter Eisentraut <peter@eisentraut.org> writes:
> On 09.08.23 08:42, Laetitia Avrot wrote:
>> My question is very simple: Do you oppose having this feature in Postgres?

> I think this is pretty harmless(*) and can be useful, so it seems 
> reasonable to pursue.

I actually do object to this, because I think the concept of "server
name" is extremely ill-defined and if we try to support it, we will
forever be chasing requests for alternative behaviors.  Just to start
with, is a prospective user expecting a fully-qualified domain name
or just the base name?  If the machine has several names (perhaps
associated with different IP addresses), what do you do about that?
I wouldn't be too surprised if users would expect to get the name
associated with the IP address that the current connection came
through.  Or at least they might tell you they want that, until
they discover they're getting "localhost.localdomain" on loopback
connections and come right back to bitch about that.

Windows likely adds a whole 'nother set of issues to "what's the
machine name", but I don't know enough about it to say what.

I think the upthread suggestion to use cluster_name is going to
be a superior solution for most people, not least because they
can use it today and it will work the same regardless of platform.

> (*) But we should think about access control for this.  If you're in a 
> DBaaS environment, providers might not like that you can read out their 
> internal host names.

There's that, too.

            regards, tom lane



Re: Adding a pg_servername() function

From
GF
Date:


On Wed, 9 Aug 2023 at 16:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Eisentraut <peter@eisentraut.org> writes:
> On 09.08.23 08:42, Laetitia Avrot wrote:
>> My question is very simple: Do you oppose having this feature in Postgres?

> I think this is pretty harmless(*) and can be useful, so it seems
> reasonable to pursue.

I actually do object to this, because I think the concept of "server
name" is extremely ill-defined

Tom,
But the gethostname() function is well defined, both in Linux and in Windows.
Maybe the proposed name pg_servername() is unfortunate in that it may suggest that DNS or something else is involved, but in Laetitia's patch the call is to gethostname().
Would it be less problematic if the function were called pg_gethostname()?
@Laetitia: why did you propose that name? maybe to avoid clashes with some extension out there?

Best,
Giovanni

Re: Adding a pg_servername() function

From
Tom Lane
Date:
GF <phabriz@gmail.com> writes:
> On Wed, 9 Aug 2023 at 16:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I actually do object to this, because I think the concept of "server
>> name" is extremely ill-defined

> But the gethostname() function is well defined, both in Linux and in
> Windows.

Sure, its call convention is standardized.  But I see nothing in POSIX
saying whether it returns a FQDN or just some random name.  In any
case, the bigger issue is that I don't really want us to expose a
function defined as "whatever gethostname() says".  I think there will
be portability issues on some platforms, and I am dubious that that
definition is what people would want.

One concrete reason why I am doubtful about this is the case of
multiple PG servers running on the same machine.  gethostname()
will be unable to distinguish them.

            regards, tom lane



Re: Adding a pg_servername() function

From
Christoph Moench-Tegeder
Date:
## GF (phabriz@gmail.com):

> In the past I needed such a pg_servername() function because in a project I
> was engaged on they needed to distribute "requests" directed to a in-house
> management service running on network servers, and each one had a DB, so we
> went with pglogical to be used as a (transactional) messaging middleware.

That sounds like an "if all you have is a hammer" kind of architecture.
Or "solutioning by fandom". Short of using an actual addressable
message bus, you could set up the node name as a configuration
parameter, or use cluster_name, or if you really must do some
COPY FROM PROGRAM magic (there's your access control) and just store
the value somewhere.
The more examples of use cases I see, the more I think that actually
beneficial use cases are really rare.

And now that I checked it: I do have systems with gethostname()
returning an FQDN, and other systems return the (short) hostname
only. And it gets worse when you're talking "container" and
"automatic image deployment". So I believe it's a good thing when
a database does not expose too much of the OS below it...

Regards,
Christoph

-- 
Spare Space



Re: Adding a pg_servername() function

From
Laetitia Avrot
Date:


Le mer. 9 août 2023 à 17:32, GF <phabriz@gmail.com> a écrit :

Would it be less problematic if the function were called pg_gethostname()?
@Laetitia: why did you propose that name? maybe to avoid clashes with some extension out there?


I used that name to be kind of coherent with the inet_server_addr(), but I see now how it is a bad decision and can add confusion and wrong expectations. I think pg_gethostname() is better.

Re: Adding a pg_servername() function

From
Laetitia Avrot
Date:
Dear Tom,

Thank you for your interest in that patch and for taking the time to point out several things that need to be better. Please find below my answers.

Le mer. 9 août 2023 à 16:04, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
I actually do object to this, because I think the concept of "server
name" is extremely ill-defined and if we try to support it, we will
forever be chasing requests for alternative behaviors.

Yes, that's on me with choosing a poor name. I will go with pg_gethostname().
 
Just to start
with, is a prospective user expecting a fully-qualified domain name
or just the base name?  If the machine has several names (perhaps
associated with different IP addresses), what do you do about that?
I wouldn't be too surprised if users would expect to get the name
associated with the IP address that the current connection came
through.  Or at least they might tell you they want that, until
they discover they're getting "localhost.localdomain" on loopback
connections and come right back to bitch about that.

If there is a gap between what the function does and the user expectations, it is my job to write the documentation in a more clear way to set expectations to the right level and explain precisely what this function is doing. Again, using a better name as pg_gethostname() will also help to remove the confusion.
 

Windows likely adds a whole 'nother set of issues to "what's the
machine name", but I don't know enough about it to say what.

Windows does have a similar gethostname() function (see here: https://learn.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-gethostname).
 

I think the upthread suggestion to use cluster_name is going to
be a superior solution for most people, not least because they
can use it today and it will work the same regardless of platform.

I don't think cluster_name is the same as hostname. True, people could use that parameter for that usage, but it does not feel right to entertain confusion between the cluster_name (which, in my humble opinion, should be different depending on the Postgres cluster) and the answer to "on which host is this Postgres cluster running?".


> (*) But we should think about access control for this.  If you're in a
> DBaaS environment, providers might not like that you can read out their
> internal host names.

There's that, too.

Of course, this function will need special access and DBaaS providers will be able to not allow their users to use that function, as they already do for other features. As I said, the patch is only at the stage of POC, at the moment.

Le mer. 9 août 2023 à 18:31, Tom Lane <tgl@sss.pgh.pa.us> a écrit :


One concrete reason why I am doubtful about this is the case of
multiple PG servers running on the same machine.  gethostname()
will be unable to distinguish them.


And that's where my bad name for this function brings confusion. If this function returns the hostname, then it does seem totally legit and normal to get the same if 3 Postgres clusters are running on the same host. If people want to identify their cluster, they should use cluster_name. I totally agree with that.

Why do I think this is useful?

1- In most companies I've worked for, people responsible for the OS settings, Network settings, and database settings are different persons. Also, for most companies I've worked with, maintaining their inventory and finding out which Postgres cluster is running on which host is still a difficult thing and error-prone to do. I thought that it could be nice and useful to display easily for the DBAs on which host the cluster they are connected to is running so that when they are called at 2 AM, their life could be a little easier.

2- In addition, as already pointed out, I know that pg_staviz (a monitoring tool) needs that information and uses this very dirty hack to get it (see https://github.com/vyruss/pg_statviz/blob/7cd0c694cea40f780fb8b76275c6097b5d210de6/src/pg_statviz/libs/info.py#L30)

CREATE TEMP TABLE _info(hostname text);
COPY _info FROM PROGRAM 'hostname';

3- Last but not least, as David E. Wheeler had created an extension to do so for Postgres 9.0+ and I found out a customer who asked me for this feature, I thought there might be out there more Postgres users who could find this feature helpful.

I'm sorry if I'm not making myself clear enough about the use cases of that feature. If you still object to my points, I will simply drop this patch.

Have a nice day,

Lætitia

Re: Adding a pg_servername() function

From
Laetitia Avrot
Date:
Dear Christoph,

Please find my answers below.

Le mer. 9 août 2023 à 22:05, Christoph Moench-Tegeder <cmt@burggraben.net> a écrit :
## GF (phabriz@gmail.com):

And now that I checked it: I do have systems with gethostname()
returning an FQDN, and other systems return the (short) hostname
only.

The return of gethostname() depends on what has been configured. So, yes some people will prefer a FQDN while others will prefer a short hostname. Also, it's a POSIX standard function (see https://pubs.opengroup.org/onlinepubs/9699919799/), so I don't get why getting a FQDN or a short name depending on what people set would be a problem for Postgres while it's not for Linux.
 
And it gets worse when you're talking "container" and
"automatic image deployment". So I believe it's a good thing when
a database does not expose too much of the OS below it...


I respectfully disagree. Containers still have a hostname: for example, by default, docker uses their container id. (see https://docs.docker.com/network/). In fact, the Open Container Initiative defines the hostname as a runtime configuration parameter (see https://github.com/opencontainers/runtime-spec/blob/main/config.md):

Hostname

Example

"hostname": "mrsdalloway"

Have a nice day,

Lætitia

Re: Adding a pg_servername() function

From
Jimmy Angelakos
Date:
Hi all,

FWIW, I too believe this is a feature that has been sorely missing for years, leading us to use awful hacks like running "hostname" as PROGRAM and retrieving its output.
Ideally that's what this function (or GUC) should return, what the system believes its hostname to be. It doesn't even have to return any additional hostnames it may have.

2 more points:
1. I too don't see it as relevant to the cluster's name
2. I agree that this may not be desirable for non-monitoring roles but would not go so far as to require superuser. If a cloud provider doesn't want this hostname exposed for whatever reason, they can hack Postgres to block this. They don't seem to have trouble hacking it for other reasons.

Best regards,
Jimmy

Jimmy Angelakos
Senior Solutions Architect
EDB - https://www.enterprisedb.com/


On Thu, 10 Aug 2023 at 08:44, Laetitia Avrot <laetitia.avrot@gmail.com> wrote:
Dear Tom,

Thank you for your interest in that patch and for taking the time to point out several things that need to be better. Please find below my answers.

Le mer. 9 août 2023 à 16:04, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
I actually do object to this, because I think the concept of "server
name" is extremely ill-defined and if we try to support it, we will
forever be chasing requests for alternative behaviors.

Yes, that's on me with choosing a poor name. I will go with pg_gethostname().
 
Just to start
with, is a prospective user expecting a fully-qualified domain name
or just the base name?  If the machine has several names (perhaps
associated with different IP addresses), what do you do about that?
I wouldn't be too surprised if users would expect to get the name
associated with the IP address that the current connection came
through.  Or at least they might tell you they want that, until
they discover they're getting "localhost.localdomain" on loopback
connections and come right back to bitch about that.

If there is a gap between what the function does and the user expectations, it is my job to write the documentation in a more clear way to set expectations to the right level and explain precisely what this function is doing. Again, using a better name as pg_gethostname() will also help to remove the confusion.
 

Windows likely adds a whole 'nother set of issues to "what's the
machine name", but I don't know enough about it to say what.

Windows does have a similar gethostname() function (see here: https://learn.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-gethostname).
 

I think the upthread suggestion to use cluster_name is going to
be a superior solution for most people, not least because they
can use it today and it will work the same regardless of platform.

I don't think cluster_name is the same as hostname. True, people could use that parameter for that usage, but it does not feel right to entertain confusion between the cluster_name (which, in my humble opinion, should be different depending on the Postgres cluster) and the answer to "on which host is this Postgres cluster running?".


> (*) But we should think about access control for this.  If you're in a
> DBaaS environment, providers might not like that you can read out their
> internal host names.

There's that, too.

Of course, this function will need special access and DBaaS providers will be able to not allow their users to use that function, as they already do for other features. As I said, the patch is only at the stage of POC, at the moment.

Le mer. 9 août 2023 à 18:31, Tom Lane <tgl@sss.pgh.pa.us> a écrit :


One concrete reason why I am doubtful about this is the case of
multiple PG servers running on the same machine.  gethostname()
will be unable to distinguish them.


And that's where my bad name for this function brings confusion. If this function returns the hostname, then it does seem totally legit and normal to get the same if 3 Postgres clusters are running on the same host. If people want to identify their cluster, they should use cluster_name. I totally agree with that.

Why do I think this is useful?

1- In most companies I've worked for, people responsible for the OS settings, Network settings, and database settings are different persons. Also, for most companies I've worked with, maintaining their inventory and finding out which Postgres cluster is running on which host is still a difficult thing and error-prone to do. I thought that it could be nice and useful to display easily for the DBAs on which host the cluster they are connected to is running so that when they are called at 2 AM, their life could be a little easier.

2- In addition, as already pointed out, I know that pg_staviz (a monitoring tool) needs that information and uses this very dirty hack to get it (see https://github.com/vyruss/pg_statviz/blob/7cd0c694cea40f780fb8b76275c6097b5d210de6/src/pg_statviz/libs/info.py#L30)

CREATE TEMP TABLE _info(hostname text);
COPY _info FROM PROGRAM 'hostname';

3- Last but not least, as David E. Wheeler had created an extension to do so for Postgres 9.0+ and I found out a customer who asked me for this feature, I thought there might be out there more Postgres users who could find this feature helpful.

I'm sorry if I'm not making myself clear enough about the use cases of that feature. If you still object to my points, I will simply drop this patch.

Have a nice day,

Lætitia