Re: Observability in Postgres - Mailing list pgsql-hackers

From Greg Stark
Subject Re: Observability in Postgres
Date
Msg-id CAM-w4HPkAO=uitQA6osVWB2rf42gsXNkJsCRvWqTxj7Oz-QukA@mail.gmail.com
In response to Re: Observability in Postgres  (Magnus Hagander <magnus@hagander.net>)
Responses Re: Observability in Postgres  (Jacob Champion <pchampion@vmware.com>)
List pgsql-hackers
On Tue, 15 Feb 2022 at 17:37, Magnus Hagander <magnus@hagander.net> wrote:
>
> On Tue, Feb 15, 2022 at 11:24 PM Greg Stark <stark@mit.edu> wrote:
> >
> > On Tue, 15 Feb 2022 at 16:43, Magnus Hagander <magnus@hagander.net> wrote:
>
> I really don't see the problem with having the monitoring on a different port.
>
> I *do* see the problem with having a different monitoring port for
> each database in a cluster, if that's what you're saying. Definitely.

No, I'm talking about each cluster. Like, say you deploy software that
has an embedded postgres database in it and doesn't know about your
custom port mapping scheme. Or you are trying to pack as many as you
can on your servers and they're dynamically allocating ports when the
jobs start.


> But if it's 5432 for the database and 8432 for the monitoring for
> example, I'd see that as an improvement. And if you're deploying a
> larger cluster you're auto-configuring these things anyway so to have
> your environment always set "monitoring port = database port + 3000"
> for example should be trivial.

It's definitely doable -- it's what people do today -- but it would be
better if people didn't have to do this. In particular the thing that
really bothers me is that it's one of the reasons you can't write
dashboards (or alerting rules or recording rules) that work out of the
box. Your custom +3000 rule is not something that service discovery
tools or dashboards are going to know about.

And when you try to use the metrics for anything further you run into
issues. Like, if you have metrics from clients about connection errors
-- they'll have labels for the database connection address. Or if you
want to use OS metrics for the network traffic -- same thing. Or if
you want to use metrics about replication from replicas...
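
To make the label mismatch concrete, here is a minimal sketch. It assumes the "+3000" convention quoted above; the host names, sample values, and helper functions are hypothetical and do not correspond to any real exporter or service discovery tool. Client-side metrics are keyed by the database address, server-side metrics by the monitoring address, so a generic join finds nothing unless the site-specific rule is baked in.

```python
# Hypothetical site convention from the thread: monitoring port = database port + 3000.
MONITORING_PORT_OFFSET = 3000


def monitoring_address(db_address: str, offset: int = MONITORING_PORT_OFFSET) -> str:
    """Translate 'host:dbport' into 'host:monitoringport' using the local rule."""
    host, _, port = db_address.rpartition(":")
    return f"{host}:{int(port) + offset}"


def join_on_address(client_errors, server_metrics, translate=None):
    """Pair client-side and server-side samples that refer to the same cluster.

    Without `translate`, addresses must match exactly, which is all a generic
    dashboard can assume.
    """
    joined = {}
    for db_addr, errors in client_errors.items():
        key = translate(db_addr) if translate else db_addr
        if key in server_metrics:
            joined[db_addr] = (errors, server_metrics[key])
    return joined


# Made-up sample data: clients label by the address they connect to,
# the server-side scrape is labelled by the monitoring endpoint.
client_errors = {"db1.example.com:5432": 7}
server_metrics = {"db1.example.com:8432": {"connections": 42}}

naive = join_on_address(client_errors, server_metrics)
aware = join_on_address(client_errors, server_metrics, monitoring_address)
```

Here `naive` comes back empty, while `aware` only works because the caller knows the local offset, which is exactly the knowledge an out-of-the-box dashboard lacks.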

> But I think you'll run into a different problem much earlier. Pretty
> much everything out there is going to want to speak http(s). How are
> you going to terminate that, especially https, on the same port as a
> PostgreSQL connection? PostgreSQL will have to reply with its initial
> negotiating byte before anything else is done, including the TLS
> negotiation, and that will kill anything http.

Yeah, this is a serious problem. I think someone else was already
looking at this for other, even more compelling reasons, so I'm kind
of hoping it solves itself :)
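
For illustration only, here is a rough sketch of how a listener sharing one port could tell the protocols apart from the first bytes a client sends. This is not something PostgreSQL does, and the function and category names are hypothetical. It relies on three facts: a TLS ClientHello starts with record type 0x16, an HTTP request starts with an ASCII method name, and a PostgreSQL client starts with either an 8-byte SSLRequest (code 80877103) or a StartupMessage carrying protocol version 3.0 (196608). In the PostgreSQL case the server answers an SSLRequest with a single byte before any TLS handshake, which is the step that plain HTTPS clients cannot cope with.

```python
import struct

SSL_REQUEST_CODE = 80877103   # (1234 << 16) | 5679, sent by clients asking for TLS
PROTOCOL_V3 = 196608          # 3 << 16, protocol version in a StartupMessage

# A small, non-exhaustive set of HTTP methods; enough for a sketch.
HTTP_METHODS = (b"GET ", b"HEAD ", b"POST ", b"PUT ", b"DELETE ", b"OPTIONS ")


def classify_first_bytes(data: bytes) -> str:
    """Guess the protocol from the first bytes received on a connection."""
    if len(data) >= 1 and data[0] == 0x16:
        # TLS handshake record: a client doing direct HTTPS starts this way.
        return "tls"
    if data.startswith(HTTP_METHODS):
        return "http"
    if len(data) >= 8:
        # PostgreSQL messages at startup: int32 length, then int32 code/version.
        length, code = struct.unpack("!II", data[:8])
        if length == 8 and code == SSL_REQUEST_CODE:
            return "postgres-sslrequest"
        if code == PROTOCOL_V3:
            return "postgres-startup"
    return "unknown"
```

A real implementation would also have to handle partial reads, timeouts, and other startup codes (cancel requests, GSSAPI encryption requests), none of which are covered here.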


> > I assume the idea is that that kind of rich structured data belongs in
> > some other system. But I definitely see people squeezing it into
> > metrics. For things like replication topology for example.... I would
> > love to have a
>
> .... love to have a completed sentence there? :)

Oops :) I think I was going to say something like:

I would love to have a system like this but I don't know of one. I
mean there are plenty of tools that could be used to build this but
nothing that does it for you.

-- 
greg


