Thread: Adding a pg_servername() function
Dear Hackers,
One of my customers suggested creating a function that could return the server's hostname.- the function takes no argument and returns the hostname or a null value if any error occurs
- the function was added to the network.c file because it makes sense to me to have that near the inet_server_addr() and inet_server_port() functions.
- I didn't add any test as the inet functions are not tested either.
If you think my design is good enough, I'll go ahead and port/test that function for Windows.
Have a nice day,
Lætitia
Attachment
On Thu, 3 Aug 2023 at 10:37, Laetitia Avrot <laetitia.avrot@gmail.com> wrote: > > Dear Hackers, > > One of my customers suggested creating a function that could return the server's hostname. Mostly for my curiosity: What would be their use case? I only see limited usability, considering that the local user's hostname can be very different from the hostname used in the url that connects to that PostgreSQL instance. Kind regards, Matthias van de Meent Neon (https://neon.tech)
Le jeu. 3 août 2023, 11:31, Matthias van de Meent <boekewurm+postgres@gmail.com> a écrit :
On Thu, 3 Aug 2023 at 10:37, Laetitia Avrot <laetitia.avrot@gmail.com> wrote:
>
> Dear Hackers,
>
> One of my customers suggested creating a function that could return the server's hostname.
Mostly for my curiosity: What would be their use case?
Thank you for showing interest in that patch.
For my customer, their use case is to be able from an SQL client to double check they're on the right host before doing things that could become a production disaster.
I see also another use case: being able to identify postgres metrics on a monitoring tool. Look at the hack pg_staviz uses here:
Those are the use cases I can think of.
I only see limited usability, considering that the local user's
hostname can be very different from the hostname used in the url that
connects to that PostgreSQL instance.
Agreed, depending on how hosts and dns are set, it can be useless. But normally, companies have host making standards to avoid that.
Have a nice day,
Lætitia
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)
> Agreed, depending on how hosts and dns are set, it can be useless. > But normally, companies have host making standards to avoid that. BTW you already have it : select setting from pg_settings where name = 'listen_addresses' No ?
Le jeu. 3 août 2023 à 14:20, <066ce286@free.fr> a écrit :
> Agreed, depending on how hosts and dns are set, it can be useless.
> But normally, companies have host making standards to avoid that.
BTW you already have it :
select setting from pg_settings where name = 'listen_addresses'
No ?
Hello,
No, this is not the same thing. Most of the time, we will find either the wildcard '*' or an Ip address or a list of IP addresses, which is not what we want in this particular use case. But I agree that there are some use cases where we will find the hostname there, but from my experience, that's not the majority of the use cases.
For example, you could use a VIP in listen_addresses, so that you ensure that distant connection will go through that VIP and you'll always end up on the right server even though it might be another physical instance (after switch or failover).
Lætitia
## Laetitia Avrot (laetitia.avrot@gmail.com): > For my customer, their use case is to be able from an SQL client to double > check they're on the right host before doing things that could become a > production disaster. Why not use cluster_name for that? Even if it may not be relevant for their setup, there might be multiple clusters on the same host (multiple clusters with the same hostname) or you database could be hiding behind some failover/loadbalancer mechanism (multiple hostnames in one cluster), thus using the hostname to identify the cluster is totally not universal (the same goes for the monitoring/metrics stuff). And there's of course inet_server_addr(), if you really need to double-check if you're connected to the right machine. Regards, Christoph -- Spare Space.
Le jeu. 3 août 2023 à 15:17, Christoph Moench-Tegeder <cmt@burggraben.net> a écrit :
## Laetitia Avrot (laetitia.avrot@gmail.com):
> For my customer, their use case is to be able from an SQL client to double
> check they're on the right host before doing things that could become a
> production disaster.
Why not use cluster_name for that? Even if it may not be relevant
for their setup, there might be multiple clusters on the same host
(multiple clusters with the same hostname) or you database could be
hiding behind some failover/loadbalancer mechanism (multiple hostnames
in one cluster), thus using the hostname to identify the cluster is
totally not universal (the same goes for the monitoring/metrics
stuff). And there's of course inet_server_addr(), if you really
need to double-check if you're connected to the right machine.
Hello Christoph,
I understand your point and sure enough, my customer could set and use the cluster_name for that purpose. I totally disagree with using inet_server_addr() for that purpose as there are so many different network settings with VIPs and so on that it won't help. Also, a socket connection will give a NULL value, not that it's *that* bad because if it's a socket, you're running locally and could still use `\! ifconfig`, but it does not work on some configurations (docker for example). Also, most humans will find it easier to memorize a name than a number and that's one of the reasons why we remember websites' URLs instead of their IP.
I still disagree with the monitoring part. A good monitoring tool will have to reconcile data from the OS with data from the Postgres cluster. So that, we kind of need a way for the monitoring tool to know on which host this particular cluster is running and I think it's smarter to get this information from the Postgres cluster.
Of course, I do love the idea that Postgres is agnostic from where it's running, but I don't think this new function removes that.
Have a nice day,
Lætitia
Hi, ## Laetitia Avrot (laetitia.avrot@gmail.com): > I understand your point and sure enough, my customer could set and use the > cluster_name for that purpose. I totally disagree with using > inet_server_addr() for that purpose as there are so many different network > settings with VIPs and so on that it won't help. That depends a bit on what the exact question is: "where did I connect to" vs "where is the thing I conencted to running". > Also, a socket connection > will give a NULL value, not that it's *that* bad because if it's a socket, > you're running locally and could still use `\! ifconfig`, but it does not > work on some configurations (docker for example). But with a local socket, you already know where you are. > Also, most humans will > find it easier to memorize a name than a number and that's one of the > reasons why we remember websites' URLs instead of their IP. That's why we have DNS, and it works both ways (especially when working programatically). > I still disagree with the monitoring part. A good monitoring tool will have > to reconcile data from the OS with data from the Postgres cluster. So that, > we kind of need a way for the monitoring tool to know on which host this > particular cluster is running and I think it's smarter to get this > information from the Postgres cluster. But that's again at least two questions in here: "what is the instance on this host doing" and "what is the primary/read standby/... of service X doing", and depending on that ask the base host's primary address or the service's address. Yes, that's easy to say when you can define the whole environment, and many setups discover this only later in the life cycle. Regards, Christoph -- Spare Space
Dear all,
First, thank you so much for your interest in this patch. I didn't think I would have so many answers.I agree that the feature I'm suggesting could be done with a few tricks. I meant to simplify the life of the user by providing a simple new feature. (Also, I might have trust issues with DNS due to several past production disasters.)
My question is very simple: Do you oppose having this feature in Postgres?
In other words, should I stop working on this?
Have a nice day,
Lætitia
On 09.08.23 08:42, Laetitia Avrot wrote: > I agree that the feature I'm suggesting could be done with a few tricks. > I meant to simplify the life of the user by providing a simple new > feature. (Also, I might have trust issues with DNS due to several past > production disasters.) > > My question is very simple: Do you oppose having this feature in Postgres? I think this is pretty harmless(*) and can be useful, so it seems reasonable to pursue. (*) But we should think about access control for this. If you're in a DBaaS environment, providers might not like that you can read out their internal host names. I'm not sure if there is an existing permission role that this could be attached to or if we need a new one.
Hi everybody,
On 09.08.23 08:42, Laetitia Avrot wrote:
> I agree that the feature I'm suggesting could be done with a few tricks.
> I meant to simplify the life of the user by providing a simple new
> feature. (Also, I might have trust issues with DNS due to several past
> production disasters.)
> I agree that the feature I'm suggesting could be done with a few tricks.
> I meant to simplify the life of the user by providing a simple new
> feature. (Also, I might have trust issues with DNS due to several past
> production disasters.)
Just my contribution on why this function could be useful, since we had a use case that fits exactly this.
In the past I needed such a pg_servername() function because in a project I was engaged on they needed to distribute "requests" directed to a in-house management service running on network servers, and each one had a DB, so we went with pglogical to be used as a (transactional) messaging middleware. The requests were targeted to specific hostnames, thus on the replication downstream we wanted to filter with a before-insert trigger on the hostname.
At that point, we had to do some contortions to get the hostname the DB was running on, sometimes really nasty like: on linux servers we could go with a python function; but on windows ones (where there was no python available) we had to resort to a function that read the OS "hostname" command into a temporary table via the pg "copy", which is both tricky and inefficient.
So, from me +1 to Laetitia's proposal.
On Wed, 9 Aug 2023 at 10:26, Peter Eisentraut <peter@eisentraut.org> wrote:
(*) But we should think about access control for this.
And +1 also to Peter's note on enforcing the access control. BTW that's exactly what we needed with plpythonu, and also with "copy from program".
Best,
Giovanni
Peter Eisentraut <peter@eisentraut.org> writes: > On 09.08.23 08:42, Laetitia Avrot wrote: >> My question is very simple: Do you oppose having this feature in Postgres? > I think this is pretty harmless(*) and can be useful, so it seems > reasonable to pursue. I actually do object to this, because I think the concept of "server name" is extremely ill-defined and if we try to support it, we will forever be chasing requests for alternative behaviors. Just to start with, is a prospective user expecting a fully-qualified domain name or just the base name? If the machine has several names (perhaps associated with different IP addresses), what do you do about that? I wouldn't be too surprised if users would expect to get the name associated with the IP address that the current connection came through. Or at least they might tell you they want that, until they discover they're getting "localhost.localdomain" on loopback connections and come right back to bitch about that. Windows likely adds a whole 'nother set of issues to "what's the machine name", but I don't know enough about it to say what. I think the upthread suggestion to use cluster_name is going to be a superior solution for most people, not least because they can use it today and it will work the same regardless of platform. > (*) But we should think about access control for this. If you're in a > DBaaS environment, providers might not like that you can read out their > internal host names. There's that, too. regards, tom lane
On Wed, 9 Aug 2023 at 16:05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Peter Eisentraut <peter@eisentraut.org> writes:
> On 09.08.23 08:42, Laetitia Avrot wrote:
>> My question is very simple: Do you oppose having this feature in Postgres?
> I think this is pretty harmless(*) and can be useful, so it seems
> reasonable to pursue.
I actually do object to this, because I think the concept of "server
name" is extremely ill-defined
Tom,
But the gethostname() function is well defined, both in Linux and in Windows.
Maybe the proposed name pg_servername() is unfortunate in that it may suggest that DNS or something else is involved, but in Laetitia's patch the call is to gethostname().
Would it be less problematic if the function were called pg_gethostname()?
@Laetitia: why did you propose that name? maybe to avoid clashes with some extension out there?
Best,
Giovanni
GF <phabriz@gmail.com> writes: > On Wed, 9 Aug 2023 at 16:05, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I actually do object to this, because I think the concept of "server >> name" is extremely ill-defined > But the gethostname() function is well defined, both in Linux and in > Windows. Sure, its call convention is standardized. But I see nothing in POSIX saying whether it returns a FQDN or just some random name. In any case, the bigger issue is that I don't really want us to expose a function defined as "whatever gethostname() says". I think there will be portability issues on some platforms, and I am dubious that that definition is what people would want. One concrete reason why I am doubtful about this is the case of multiple PG servers running on the same machine. gethostname() will be unable to distinguish them. regards, tom lane
## GF (phabriz@gmail.com): > In the past I needed such a pg_servername() function because in a project I > was engaged on they needed to distribute "requests" directed to a in-house > management service running on network servers, and each one had a DB, so we > went with pglogical to be used as a (transactional) messaging middleware. That sounds like an "if all you have is a hammer" kind of architecture. Or "solutioning by fandom". Short of using an actual addressable message bus, you could set up the node name as a configuration parameter, or use cluster_name, or if you really must do some COPY FROM PROGRAM magic (there's your access control) and just store the value somewhere. The more examples of use cases I see, the more I think that actually beneficial use cases are really rare. And now that I checked it: I do have systems with gethostname() returning an FQDN, and other systems return the (short) hostname only. And it gets worse when you're talking "container" and "automatic image deployment". So I believe it's a good thing when a database does not expose too much of the OS below it... Regards, Christoph -- Spare Space
Le mer. 9 août 2023 à 17:32, GF <phabriz@gmail.com> a écrit :
Would it be less problematic if the function were called pg_gethostname()?@Laetitia: why did you propose that name? maybe to avoid clashes with some extension out there?
I used that name to be kind of coherent with the inet_server_addr(), but I see now how it is a bad decision and can add confusion and wrong expectations. I think pg_gethostname() is better.
Dear Tom,
Thank you for your interest in that patch and for taking the time to point out several things that need to be better. Please find below my answers.
I actually do object to this, because I think the concept of "server
name" is extremely ill-defined and if we try to support it, we will
forever be chasing requests for alternative behaviors.
Yes, that's on me with choosing a poor name. I will go with pg_gethostname().
Just to start
with, is a prospective user expecting a fully-qualified domain name
or just the base name? If the machine has several names (perhaps
associated with different IP addresses), what do you do about that?
I wouldn't be too surprised if users would expect to get the name
associated with the IP address that the current connection came
through. Or at least they might tell you they want that, until
they discover they're getting "localhost.localdomain" on loopback
connections and come right back to bitch about that.
If there is a gap between what the function does and the user expectations, it is my job to write the documentation in a more clear way to set expectations to the right level and explain precisely what this function is doing. Again, using a better name as pg_gethostname() will also help to remove the confusion.
Windows likely adds a whole 'nother set of issues to "what's the
machine name", but I don't know enough about it to say what.
Windows does have a similar gethostname() function (see here: https://learn.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-gethostname).
I think the upthread suggestion to use cluster_name is going to
be a superior solution for most people, not least because they
can use it today and it will work the same regardless of platform.
I don't think cluster_name is the same as hostname. True, people could use that parameter for that usage, but it does not feel right to entertain confusion between the cluster_name (which, in my humble opinion, should be different depending on the Postgres cluster) and the answer to "on which host is this Postgres cluster running?".
> (*) But we should think about access control for this. If you're in a
> DBaaS environment, providers might not like that you can read out their
> internal host names.
There's that, too.
Of course, this function will need special access and DBaaS providers will be able to not allow their users to use that function, as they already do for other features. As I said, the patch is only at the stage of POC, at the moment.
Le mer. 9 août 2023 à 18:31, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
One concrete reason why I am doubtful about this is the case of
multiple PG servers running on the same machine. gethostname()
will be unable to distinguish them.
And that's where my bad name for this function brings confusion. If this function returns the hostname, then it does seem totally legit and normal to get the same if 3 Postgres clusters are running on the same host. If people want to identify their cluster, they should use cluster_name. I totally agree with that.
Why do I think this is useful?
1- In most companies I've worked for, people responsible for the OS settings, Network settings, and database settings are different persons. Also, for most companies I've worked with, maintaining their inventory and finding out which Postgres cluster is running on which host is still a difficult thing and error-prone to do. I thought that it could be nice and useful to display easily for the DBAs on which host the cluster they are connected to is running so that when they are called at 2 AM, their life could be a little easier.
2- In addition, as already pointed out, I know that pg_staviz (a monitoring tool) needs that information and uses this very dirty hack to get it (see https://github.com/vyruss/pg_statviz/blob/7cd0c694cea40f780fb8b76275c6097b5d210de6/src/pg_statviz/libs/info.py#L30)
CREATE TEMP TABLE _info(hostname text);
COPY _info FROM PROGRAM 'hostname';
3- Last but not least, as David E. Wheeler had created an extension to do so for Postgres 9.0+ and I found out a customer who asked me for this feature, I thought there might be out there more Postgres users who could find this feature helpful.
I'm sorry if I'm not making myself clear enough about the use cases of that feature. If you still object to my points, I will simply drop this patch.
Have a nice day,
Lætitia
Dear Christoph,
Please find my answers below.
Le mer. 9 août 2023 à 22:05, Christoph Moench-Tegeder <cmt@burggraben.net> a écrit :
## GF (phabriz@gmail.com):
And now that I checked it: I do have systems with gethostname()
returning an FQDN, and other systems return the (short) hostname
only.
The return of gethostname() depends on what has been configured. So, yes some people will prefer a FQDN while others will prefer a short hostname. Also, it's a POSIX standard function (see https://pubs.opengroup.org/onlinepubs/9699919799/), so I don't get why getting a FQDN or a short name depending on what people set would be a problem for Postgres while it's not for Linux.
And it gets worse when you're talking "container" and
"automatic image deployment". So I believe it's a good thing when
a database does not expose too much of the OS below it...
I respectfully disagree. Containers still have a hostname: for example, by default, docker uses their container id. (see https://docs.docker.com/network/). In fact, the Open Container Initiative defines the hostname as a runtime configuration parameter (see https://github.com/opencontainers/runtime-spec/blob/main/config.md):
Hostname
hostname
(string, OPTIONAL) specifies the container's hostname as seen by processes running inside the container. On Linux, for example, this will change the hostname in the container UTS namespace. Depending on your namespace configuration, the container UTS namespace may be the runtime UTS namespace.Example
"hostname": "mrsdalloway"
Have a nice day,
Lætitia
Hi all,
FWIW, I too believe this is a feature that has been sorely missing for years, leading us to use awful hacks like running "hostname" as PROGRAM and retrieving its output. Ideally that's what this function (or GUC) should return, what the system believes its hostname to be. It doesn't even have to return any additional hostnames it may have.
2 more points:
1. I too don't see it as relevant to the cluster's name
On Thu, 10 Aug 2023 at 08:44, Laetitia Avrot <laetitia.avrot@gmail.com> wrote:
Dear Tom,Thank you for your interest in that patch and for taking the time to point out several things that need to be better. Please find below my answers.I actually do object to this, because I think the concept of "server
name" is extremely ill-defined and if we try to support it, we will
forever be chasing requests for alternative behaviors.Yes, that's on me with choosing a poor name. I will go with pg_gethostname().Just to start
with, is a prospective user expecting a fully-qualified domain name
or just the base name? If the machine has several names (perhaps
associated with different IP addresses), what do you do about that?
I wouldn't be too surprised if users would expect to get the name
associated with the IP address that the current connection came
through. Or at least they might tell you they want that, until
they discover they're getting "localhost.localdomain" on loopback
connections and come right back to bitch about that.If there is a gap between what the function does and the user expectations, it is my job to write the documentation in a more clear way to set expectations to the right level and explain precisely what this function is doing. Again, using a better name as pg_gethostname() will also help to remove the confusion.
Windows likely adds a whole 'nother set of issues to "what's the
machine name", but I don't know enough about it to say what.Windows does have a similar gethostname() function (see here: https://learn.microsoft.com/en-us/windows/win32/api/winsock/nf-winsock-gethostname).
I think the upthread suggestion to use cluster_name is going to
be a superior solution for most people, not least because they
can use it today and it will work the same regardless of platform.I don't think cluster_name is the same as hostname. True, people could use that parameter for that usage, but it does not feel right to entertain confusion between the cluster_name (which, in my humble opinion, should be different depending on the Postgres cluster) and the answer to "on which host is this Postgres cluster running?".
> (*) But we should think about access control for this. If you're in a
> DBaaS environment, providers might not like that you can read out their
> internal host names.
There's that, too.Of course, this function will need special access and DBaaS providers will be able to not allow their users to use that function, as they already do for other features. As I said, the patch is only at the stage of POC, at the moment.Le mer. 9 août 2023 à 18:31, Tom Lane <tgl@sss.pgh.pa.us> a écrit :
One concrete reason why I am doubtful about this is the case of
multiple PG servers running on the same machine. gethostname()
will be unable to distinguish them.And that's where my bad name for this function brings confusion. If this function returns the hostname, then it does seem totally legit and normal to get the same if 3 Postgres clusters are running on the same host. If people want to identify their cluster, they should use cluster_name. I totally agree with that.Why do I think this is useful?1- In most companies I've worked for, people responsible for the OS settings, Network settings, and database settings are different persons. Also, for most companies I've worked with, maintaining their inventory and finding out which Postgres cluster is running on which host is still a difficult thing and error-prone to do. I thought that it could be nice and useful to display easily for the DBAs on which host the cluster they are connected to is running so that when they are called at 2 AM, their life could be a little easier.2- In addition, as already pointed out, I know that pg_staviz (a monitoring tool) needs that information and uses this very dirty hack to get it (see https://github.com/vyruss/pg_statviz/blob/7cd0c694cea40f780fb8b76275c6097b5d210de6/src/pg_statviz/libs/info.py#L30)CREATE TEMP TABLE _info(hostname text);COPY _info FROM PROGRAM 'hostname';3- Last but not least, as David E. Wheeler had created an extension to do so for Postgres 9.0+ and I found out a customer who asked me for this feature, I thought there might be out there more Postgres users who could find this feature helpful.I'm sorry if I'm not making myself clear enough about the use cases of that feature. If you still object to my points, I will simply drop this patch.Have a nice day,Lætitia