Thread: Observability in Postgres
So I've been dealing a lot with building and maintaining dashboards for (fleets of) Postgres servers. And it's a pain. I have a few strongly held ideas about where the pain points are and what the right ways to tackle them are. Some of which are going to be controversial I think...

The state of the art is pg_exporter, which is a separate client that connects to the database and runs SQL queries to extract database metrics. The pain points I've seen are:

1) The exporter isn't reliable when things are failing. If your clients can't connect the exporter also can't connect, leading to data gaps in your metrics for precisely the time windows where you need data. This can happen due to connection exhaustion, xid wraparound, or even something as simple as someone taking an exclusive lock on something used in the SQL queries.

2) SQL connections are tied to specific databases within a cluster, making it hard to get data for all your databases if you have more than one. The exporter needs to reconnect to each database.

3) The exporter needs to listen on a different port from the postmaster, making it necessary to write software to manage the mapping from server port to exporter port, and that's left to the end-user as it varies from site to site.

4) The queries are customizable (the built-in ones don't exhaustively export postgres's metrics). As a result there's no standard dashboard that will work on any site out of the box. Moreover issue (3) also makes it impossible to implement one that works properly.

5) Data needs to be marshaled from shared memory into SQL, then read by the client and re-emitted in the metric format. The double processing requires writing SQL queries very carefully to avoid losing fidelity for things like LSN positions, xids, etc. Moreover the latency of gathering data from multiple SQL queries results in metrics that are often out of sync with each other, making them hard to interpret.

All this said, I think we should have a component in Postgres that reads from the stats data directly and outputs metrics in standard metrics format directly. This would probably take the form of a background worker with a few tricky bits. This would mean there would be a standard official set of metrics available that a standard dashboard could rely on to be present at any site, and it would stay reliable even if the SQL layer isn't functioning due to lack of connections or xid wraparound or locking issues.

The elephant in the room is that issue (3) requires a bit of sleight of hand. Ideally I would want it to be listening on the same ports as the database. That means having the postmaster recognize metric requests and hand them to the metrics background worker instead of a backend. I'm not sure people are going to be ok with that....

For now my approach is to implement a background worker that listens on a new port and is basically its own small web server with shared memory access. This ignores issue (3) and my hope is that when we have some experience with this approach we'll see how reliable it is and how comfortable we are with the kind of hacking in postmaster it would take to fix it. Fwiw I do think this is an important issue and not one that we can ignore indefinitely.

There is another elephant in the room (it's a big room) which is that this all makes sense for stats data. It doesn't make much sense for data that currently lives in pg_class, pg_index, etc. In other words I'm mostly solving (2) by ignoring it and concentrating on stats data. I haven't settled on a good solution for that data.
I vaguely lean towards saying that the volatile metrics in those tables should really live in stats or at least be mirrored there. That makes a clean definition of what Postgres thinks a metric is and what it thinks catalog data is. But I'm not sure that will really work in practice. In particular I think it's likely we'll need to get catalog data from every database anyways, for example to label things like tables with better labels than oids.

This work is being funded by Aiven, which is really interested in improving observability and integration between Postgres and other open source cloud software.

--
greg
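[For illustration, a minimal sketch of how such a metrics background worker could be registered through the existing bgworker API, with shared-memory access but no database connection. The module name "pg_metrics" and the entry point "metrics_worker_main" (which would open the listen socket and serve the stats data) are hypothetical, not an actual patch:]

/*
 * Hypothetical sketch: register a metrics background worker from an
 * extension's _PG_init().  "pg_metrics" / "metrics_worker_main" are
 * made-up names for illustration only.
 */
#include "postgres.h"
#include "fmgr.h"
#include "postmaster/bgworker.h"

PG_MODULE_MAGIC;

void _PG_init(void);

void
_PG_init(void)
{
    BackgroundWorker worker;

    memset(&worker, 0, sizeof(worker));
    /* shared memory access is enough; no database connection needed */
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
    worker.bgw_start_time = BgWorkerStart_ConsistentState;
    worker.bgw_restart_time = 10;       /* restart after 10s if it dies */
    snprintf(worker.bgw_library_name, sizeof(worker.bgw_library_name),
             "pg_metrics");
    snprintf(worker.bgw_function_name, sizeof(worker.bgw_function_name),
             "metrics_worker_main");
    snprintf(worker.bgw_name, sizeof(worker.bgw_name),
             "metrics exporter worker");
    snprintf(worker.bgw_type, sizeof(worker.bgw_type),
             "metrics exporter");

    RegisterBackgroundWorker(&worker);
}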
On Mon, Feb 14, 2022 at 10:15 Greg Stark <stark@mit.edu> wrote:
> ....
> For now my approach is to implement a background worker that listens
> on a new port and is basically its own small web server with shared
> memory access
This reminds me of bg_mon (included in Spilo, the docker image used by the Zalando operator for Postgres):
Hi Greg,
On Mon, 14 Feb 2022 at 20:16, Greg Stark <stark@mit.edu> wrote:
<snip>
I agree with pretty much everything above, bar a couple of points:
- Does it really matter if metrics are exposed on a separate port from the postmaster? I actually think doing that is a good thing as it allows use of alternative listen addresses and firewalling rules; you could then confine the monitoring traffic to a management VLAN for example.
- I strongly dislike the idea of building this around the prometheus exporter format. Whilst that is certainly a useful format if you're using prom (as many do), it does have limitations and quirks that would make it painful for other systems to use; for example, the need to encode non-numeric data into labels rather than the metrics themselves (e.g. server version strings or LSNs). I would much prefer to see a common format such as JSON used by default, and perhaps offer a hook to allow alternate formatters to replace that. The prometheus format is also pretty inefficient, as you have to repeat all the key data (labels) for each individual metric.
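[As a rough illustration of the trade-off being discussed — metric names and labels here are made up, loosely modelled on what exporters emit today, not a proposed standard set. In the Prometheus/OpenMetrics text format every sample repeats its labels, and non-numeric values like a version string or an LSN have to be smuggled in as labels on a dummy "1" sample:]

# TYPE pg_stat_database_xact_commit counter
pg_stat_database_xact_commit{server="db1:5432",datname="app"} 412311
pg_stat_database_xact_rollback{server="db1:5432",datname="app"} 17
# version and LSN ride along as labels on info-style metrics with value 1
pg_info{server="db1:5432",version="14.2"} 1
pg_wal_lsn_info{server="db1:5432",lsn="0/3000148"} 1

[whereas a JSON shape can carry the shared labels once and keep non-numeric values as actual values:]

{"server": "db1:5432", "version": "14.2", "wal_lsn": "0/3000148",
 "databases": {"app": {"xact_commit": 412311, "xact_rollback": 17}}}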
On 2/15/22 07:30, Dave Page wrote:
> On Mon, 14 Feb 2022 at 20:16, Greg Stark <stark@mit.edu> wrote:
<snip>
> I agree with pretty much everything above, bar a couple of points:
>
> - Does it really matter if metrics are exposed on a separate port from
> the postmaster? I actually think doing that is a good thing as it allows
> use of alternative listen addresses and firewalling rules; you could
> then confine the monitoring traffic to a management VLAN for example.
>
> - I strongly dislike the idea of building this around the
> prometheus exporter format. Whilst that is certainly a useful format if
> you're using prom (as many do), it does have limitations and quirks that
> would make it painful for other systems to use; for example, the need to
> encode non-numeric data into labels rather than the metrics themselves
> (e.g. server version strings or LSNs). I would much prefer to see a
> common format such as JSON used by default, and perhaps offer a hook to
> allow alternate formatters to replace that. The prometheus format is
> also pretty inefficient, as you have to repeat all the key data (labels)
> for each individual metric.

+1 to Dave's points

Joe

---
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
On Tue, Feb 15, 2022 at 1:30 PM Dave Page <dpage@pgadmin.org> wrote:
>
> Hi Greg,
>
> On Mon, 14 Feb 2022 at 20:16, Greg Stark <stark@mit.edu> wrote:
<snip>
>
> I agree with pretty much everything above, bar a couple of points:
>
> - Does it really matter if metrics are exposed on a separate port from
> the postmaster? I actually think doing that is a good thing as it allows
> use of alternative listen addresses and firewalling rules; you could
> then confine the monitoring traffic to a management VLAN for example.

+1. I think it would be much better to keep it on a separate port.

Doesn't even have to be to the point of VLANs or whatever. You just want your firewall rules to be able to know what data it's talking about.

Another part missing in the proposal is how to deal with authentication. That'll be an even harder problem if it sits on the same port but speaks a different protocol. How would it work with pg_hba etc?

> - I strongly dislike the idea of building this around the
> prometheus exporter format. Whilst that is certainly a useful format if
> you're using prom (as many do), it does have limitations and quirks that
> would make it painful for other systems to use; for example, the need to
> encode non-numeric data into labels rather than the metrics themselves
> (e.g. server version strings or LSNs). I would much prefer to see a
> common format such as JSON used by default, and perhaps offer a hook to
> allow alternate formatters to replace that. The prometheus format is
> also pretty inefficient, as you have to repeat all the key data (labels)
> for each individual metric.

There's good and bad with it. The big "good" with it is that it's an open standard (openmetrics). I think supporting that would be a very good idea. But it would also be good to have a different, "richer", format available. Whether it'd be worth it to go the full "postgresql way" and make it pluggable is questionable, but I would suggest at least having both openmetrics and a native/richer one, and not just the latter. Being able to just point your existing monitoring system at a postgres instance (with auth configured) and have things just show up is in itself a large value. (Then either pluggable or hooks beyond that, but having both those as native)

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
On Tue, 15 Feb 2022 at 16:43, Magnus Hagander <magnus@hagander.net> wrote:
>
> On Tue, Feb 15, 2022 at 1:30 PM Dave Page <dpage@pgadmin.org> wrote:
> >
> > - Does it really matter if metrics are exposed on a separate port from
> > the postmaster? I actually think doing that is a good thing as it allows
> > use of alternative listen addresses and firewalling rules; you could
> > then confine the monitoring traffic to a management VLAN for example.
>
> +1. I think it would be much better to keep it on a separate port.
>
> Doesn't even have to be to the point of VLANs or whatever. You just
> want your firewall rules to be able to know what data it's talking
> about.

I would definitely want that to be an option that could be configured. If you're deploying a server to be accessible as a public service and configuring firewall rules etc then sure you probably want to be very explicit about what is listening where.

But when you're deploying databases automatically in a clustered type environment you really want a service to deploy on a given port and have the monitoring associated with that port as well. If you deploy five databases you don't want to have to deal with five other ports for monitoring and then have to maintain a database of which monitoring ports are associated with which service ports.... It's definitely doable -- that's what people do today -- but it's a pain and it's fragile and it's different at each site which makes it impossible for dashboards to work out of the box.

> Another part missing in the proposal is how to deal with
> authentication. That'll be an even harder problem if it sits on the
> same port but speaks a different protocol. How would it work with
> pg_hba etc?

Wouldn't it make it easier to work with pg_hba? If incoming connections are coming through pg_hba then postmaster gets to accept or refuse the connection based on the host and TLS information. If it's listening on a separate port then unless that logic is duplicated it'll be stuck in a parallel world with different security rules.

I'm not actually sure how to make this work. There's a feature in Unix where a file descriptor can be passed over from one process to another over a socket but that's gotta be a portability pain. And starting a new worker for each incoming connection would be a different pain.

So right now I'm kind of guessing this might be just a hook in postmaster that we can experiment with in the module. The hook would just return a flag to postmaster saying the connection was handled.

> There's good and bad with it. The big "good" with it is that it's an
> open standard (openmetrics). I think supporting that would be a very
> good idea. But it would also be good to have a different, "richer",
> format available. Whether it'd be worth it to go the full "postgresql
> way" and make it pluggable is questionable, but I would suggest at
> least having both openmetrics and a native/richer one, and not just
> the latter. Being able to just point your existing monitoring system
> at a postgres instance (with auth configured) and have things just
> show up is in itself a large value. (Then either pluggable or hooks
> beyond that, but having both those as native)

Ideally I would want to provide OpenMetrics data that doesn't break compatibility with OpenTelemetry -- which I'm still not 100% sure I understand but I gather that means following certain conventions about metadata. But those standards only have quantitative metrics, no rich structured data.
I assume the idea is that that kind of rich structured data belongs in some other system. But I definitely see people squeezing it into metrics. For things like replication topology for example.... I would love to have a

Personally I feel similarly about the inefficiency but I think the feeling is that compression makes it irrelevant. I suspect there's a fair amount of burnout over predecessors like SNMP that went to a lot of trouble to be efficient and implementations were always buggy and impenetrable as a result. (The predecessor in Google had some features that made it slightly more efficient too but also made it more complex. It seems intentional that they didn't carry those over too)

Fwiw one constant source of pain is the insistence on putting everything into floating point numbers. They have 53 bits of precision and that leaves us not quite being able to represent an LSN or 64-bit xid for example.

--
greg
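[For reference, the Unix file-descriptor passing mentioned above works roughly like this — a minimal generic SCM_RIGHTS sketch, not Postgres code, and the portability concern raised above applies to it:]

/*
 * Send an already-accepted socket from one process (e.g. a postmaster-like
 * parent) to another over a Unix-domain socket.  The receiving side reads
 * the descriptor back out of a matching recvmsg().
 */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

int
send_fd(int channel, int fd_to_pass)
{
    struct msghdr msg;
    struct cmsghdr *cmsg;
    struct iovec iov;
    char cbuf[CMSG_SPACE(sizeof(int))];
    char byte = 'F';            /* must send at least one data byte */

    memset(&msg, 0, sizeof(msg));
    memset(cbuf, 0, sizeof(cbuf));
    iov.iov_base = &byte;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cbuf;
    msg.msg_controllen = sizeof(cbuf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;       /* ancillary data is an open fd */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

    return sendmsg(channel, &msg, 0) < 0 ? -1 : 0;
}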
On Tue, Feb 15, 2022 at 11:24 PM Greg Stark <stark@mit.edu> wrote:
>
> On Tue, 15 Feb 2022 at 16:43, Magnus Hagander <magnus@hagander.net> wrote:
<snip>
>
> I would definitely want that to be an option that could be configured.
> If you're deploying a server to be accessible as a public service and
> configuring firewall rules etc then sure you probably want to be very
> explicit about what is listening where.
>
> But when you're deploying databases automatically in a clustered type
> environment you really want a service to deploy on a given port and
> have the monitoring associated with that port as well. If you deploy
> five databases you don't want to have to deal with five other ports
> for monitoring and then have to maintain a database of which
> monitoring ports are associated with which service ports.... It's
> definitely doable -- that's what people do today -- but it's a pain
> and it's fragile and it's different at each site which makes it
> impossible for dashboards to work out of the box.

I really don't see the problem with having the monitoring on a different port.

I *do* see the problem with having a different monitoring port for each database in a cluster, if that's what you're saying. Definitely.

But if it's 5432 for the database and 8432 for the monitoring for example, I'd see that as an improvement. And if you're deploying a larger cluster you're auto-configuring these things anyway so to have your environment always set "monitoring port = database port + 3000" for example should be trivial.

> > Another part missing in the proposal is how to deal with
> > authentication. That'll be an even harder problem if it sits on the
> > same port but speaks a different protocol. How would it work with
> > pg_hba etc?
>
> Wouldn't it make it easier to work with pg_hba? If incoming
> connections are coming through pg_hba then postmaster gets to accept
> or refuse the connection based on the host and TLS information. If
> it's listening on a separate port then unless that logic is duplicated
> it'll be stuck in a parallel world with different security rules.

I guess you could map it against yet another virtual database, like we do with streaming replication.

> I'm not actually sure how to make this work. There's a feature in Unix
> where a file descriptor can be passed over from one process to another
> over a socket but that's gotta be a portability pain. And starting a
> new worker for each incoming connection would be a different pain.
>
> So right now I'm kind of guessing this might be just a hook in
> postmaster that we can experiment with in the module. The hook would
> just return a flag to postmaster saying the connection was handled.

If it was as easy as username/password you could just have a comm channel between postmaster and a bgworker for example. But you also have to implement things like GSSAPI authentication.

But I think you'll run into a different problem much earlier.
Pretty much everything out there is going to want to speak http(s). How are you going to terminate that, especially https, on the same port as a PostgreSQL connection? PostgreSQL will have to reply with its initial negotiating byte before anything else is done, including the TLS negotiation, and that will kill anything http. And if your metrics endpoint isn't going to speak http, you've given up the ability for the "plug and play" setup.

> > There's good and bad with it. The big "good" with it is that it's an
> > open standard (openmetrics). I think supporting that would be a very
> > good idea. But it would also be good to have a different, "richer",
> > format available. Whether it'd be worth it to go the full "postgresql
> > way" and make it pluggable is questionable, but I would suggest at
> > least having both openmetrics and a native/richer one, and not just
> > the latter. Being able to just point your existing monitoring system
> > at a postgres instance (with auth configured) and have things just
> > show up is in itself a large value. (Then either pluggable or hooks
> > beyond that, but having both those as native)
>
> Ideally I would want to provide OpenMetrics data that doesn't break
> compatibility with OpenTelemetry -- which I'm still not 100% sure I
> understand but I gather that means following certain conventions about
> metadata. But those standards only have quantitative metrics, no rich
> structured data.

Yeah. That's why I think the reasonable thing is to provide both.

> I assume the idea is that that kind of rich structured data belongs in
> some other system. But I definitely see people squeezing it into
> metrics. For things like replication topology for example.... I would
> love to have a

.... love to have a completed sentence there? :)

> Personally I feel similarly about the inefficiency but I think the
> feeling is that compression makes it irrelevant. I suspect there's a
> fair amount of burnout over predecessors like SNMP that went to a lot
> of trouble to be efficient and implementations were always buggy and
> impenetrable as a result. (The predecessor in Google had some features
> that made it slightly more efficient too but also made it more
> complex. It seems intentional that they didn't carry those over too)

Given the amount of metrics that people pull in through prometheus or similar today, I think most have just accepted the overhead. You can of course serve it over protobufs to make it more efficient, but yes the basic part is still there -- but overall the market has spoken and said they accept that one.

> Fwiw one constant source of pain is the insistence on putting
> everything into floating point numbers. They have 53 bits of precision
> and that leaves us not quite being able to represent an LSN or 64-bit
> xid for example.

Yeah, it's clearly designed around certain types of metrics only...

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
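[To make the floating-point point concrete: an IEEE double has a 53-bit significand, so a 64-bit LSN or xid stops round-tripping once it exceeds 2^53. A tiny self-contained demonstration, with a made-up large LSN value:]

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
    uint64_t lsn = (UINT64_C(1) << 60) + 1;     /* hypothetical large LSN */
    double metric = (double) lsn;               /* what a float-only format stores */

    printf("original   : %llu\n", (unsigned long long) lsn);
    printf("via double : %llu\n", (unsigned long long) metric);
    /* prints two different numbers: the low-order bits were rounded away */
    return 0;
}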
I am curious what you mean by standard metrics format? I am all for standards-based approaches, but what are those in the case of DBs? For environments where O11y matters a lot, I think the challenge lies in mapping specific query executions back to system characteristics. I am just thinking aloud as a newbie to this community.
On Mon, Feb 14, 2022, 12:16 PM Greg Stark <stark@mit.edu> wrote:
<snip>
On Tue, 15 Feb 2022 at 22:48, Stephan Doliov <stephan.doliov@gmail.com> wrote:
>
> I am curious what you mean by standard metrics format? I am all for
> standards-based approaches, but what are those in the case of DBs? For
> environments where O11y matters a lot, I think the challenge lies in
> mapping specific query executions back to system characteristics. I am
> just thinking aloud as a newbie to this community.

I was about to reply simply that the standard the open source world is coalescing around is OpenMetrics. This is basically a codification of the prometheus format with some extra features.

But on further thought I think what you're asking is whether there are standard database metrics and standard names for them. A lot of this work has already been done with pg_exporter but it is worth looking at other database software and seeing if there are opportunities to standardize metric naming across databases.

I don't really expect a lot of deep examples of that though. As soon as you dig under the surface the differences quickly add up, so you'll have different labels with different semantics. But sure, it would be nice if you could do simple high level queries like "number of active connections" using the same metrics on mysql, postgres, and ... whatever.

--
greg
On Tue, 15 Feb 2022 at 17:37, Magnus Hagander <magnus@hagander.net> wrote:
>
> On Tue, Feb 15, 2022 at 11:24 PM Greg Stark <stark@mit.edu> wrote:
> >
> > On Tue, 15 Feb 2022 at 16:43, Magnus Hagander <magnus@hagander.net> wrote:
>
> I really don't see the problem with having the monitoring on a different port.
>
> I *do* see the problem with having a different monitoring port for
> each database in a cluster, if that's what you're saying. Definitely.

No, I'm talking about each cluster. Like, say you deploy software that has an embedded postgres database in it and doesn't know about your custom port mapping scheme. Or you are trying to pack as many as you can on your servers and they're dynamically allocating ports when the jobs start.

> But if it's 5432 for the database and 8432 for the monitoring for
> example, I'd see that as an improvement. And if you're deploying a
> larger cluster you're auto-configuring these things anyway so to have
> your environment always set "monitoring port = database port + 3000"
> for example should be trivial.

It's definitely doable -- it's what people do today -- but it would be better if people didn't have to do this. In particular the thing that really bothers me is that it's one of the reasons you can't write dashboards (or alerting rules or recording rules) that work out of the box. Your custom +3000 rule is not something that service discovery tools or dashboards are going to know about.

And when you try to use the metrics for anything further you run into issues. Like, if you have metrics from clients about connection errors -- they'll have labels for the database connection address. Or if you want to use OS metrics for the network traffic -- same thing. Or if you want to use metrics about replication from replicas...

> But I think you'll run into a different problem much earlier. Pretty
> much everything out there is going to want to speak http(s). How are
> you going to terminate that, especially https, on the same port as a
> PostgreSQL connection? PostgreSQL will have to reply with its initial
> negotiating byte before anything else is done, including the TLS
> negotiation, and that will kill anything http.

Yeah this is a serious problem. I think there are other even more compelling reasons someone else was already looking at this so I'm kind of hoping it solves itself :)

> > I assume the idea is that that kind of rich structured data belongs in
> > some other system. But I definitely see people squeezing it into
> > metrics. For things like replication topology for example.... I would
> > love to have a
>
> .... love to have a completed sentence there? :)

Oops :) I think I was going to say something like: I would love to have a system like this but I don't know of one. I mean there are plenty of tools that could be used to build this but nothing that does it for you.

--
greg
Hi,

On Mon, Feb 14, 2022 at 03:15:14PM -0500, Greg Stark wrote:
>
> [...]
>
> 2) SQL connections are tied to specific databases within a cluster.
> Making it hard to get data for all your databases if you have more
> than one. The exporter needs to reconnect to each database.
>
> 3) The exporter needs to listen on a different port from the
> postmaster. Making it necessary to write software to manage the
> mapping from server port to exporter port and that's left to the
> end-user as it varies from site to site.
>
> 4) The queries are customizable (the built-in ones don't exhaustively
> export postgres's metrics). As a result there's no standard
> dashboard that will work on any site out of the box. Moreover issue
> (3) also makes it impossible to implement one that works properly.
>
> [...]
>
> All this said, I think we should have a component in Postgres that
> reads from the stats data directly and outputs metrics in standard
> metrics format directly. This would probably take the form of a
> background worker with a few tricky bits.

But having a background worker for that will bring its own set of (new) problems. I never really had a problem with (3), and even if we fixed that users will still have to rely on mapping for other products they monitor, so I don't see that as a really big issue.

Also I don't think that having such a component directly embedded in postgres is a good idea, as it means it would be tied to major version releases. I don't think anyone will like to hear "sorry you need to upgrade to a new postgres major version to monitor X even if the data is available in the catalogs". It also means that you will now maybe have different standard metric definitions depending on the major version, which seems to contradict (4).

> There is another elephant in the room (it's a big room) which is that
> this all makes sense for stats data. It doesn't make much sense for
> data that currently lives in pg_class, pg_index, etc. In other words
> I'm mostly solving (2) by ignoring it and concentrating on stats data.
>
> I haven't settled on a good solution for that data. I vaguely lean
> towards saying that the volatile metrics in those tables should really
> live in stats or at least be mirrored there. That makes a clean
> definition of what Postgres thinks a metric is and what it thinks
> catalog data is. But I'm not sure that will really work in practice.
> In particular I think it's likely we'll need to get catalog data from
> every database anyways, for example to label things like tables with
> better labels than oids.

I also don't think that sending those data in stats is going to work, which makes me quite worried about spending a lot of effort on a solution that has problematic limitations for something as useful as database-specific metrics.
On Wed, 2022-02-16 at 02:10 -0500, Greg Stark wrote:
> On Tue, 15 Feb 2022 at 17:37, Magnus Hagander <magnus@hagander.net> wrote:
> > But I think you'll run into a different problem much earlier. Pretty
> > much everything out there is going to want to speak http(s). How are
> > you going to terminate that, especially https, on the same port as a
> > PostgreSQL connection? PostgreSQL will have to reply with its initial
> > negotiating byte before anything else is done, including the TLS
> > negotiation, and that will kill anything http.
>
> Yeah this is a serious problem. I think there are other even more
> compelling reasons someone else was already looking at this so I'm
> kind of hoping it solves itself :)

Yeah, this seems like a shoo-in with implicit TLS support and ALPN. So hopefully we can help that piece solve itself. :)

That said, I feel like I should probably advise against forwarding HTTP through Postgres. With implicit TLS you should be able to run a reverse proxy out front, which could check the ALPN and redirect traffic to the bgworker port as needed. (I don't think you have to terminate TLS in order to do this -- so channel bindings et al should be unaffected -- but I don't have experience with that.)

So Postgres wouldn't have to touch HTTP traffic at all, and the bgworker extension can upgrade its HTTP stack completely independently. (And if you don't want to share ports, you don't have to deploy the proxy at all.)

--Jacob
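[For what it's worth, the proxy setup described above could look roughly like this with nginx's stream/ssl_preread module. Everything here is an assumption for illustration: it presumes a Postgres that accepts implicit TLS and advertises a "postgresql" ALPN token, and the ports and addresses are made up:]

stream {
    # Route on the ALPN protocols offered in the TLS ClientHello,
    # without terminating TLS (so channel binding is unaffected).
    map $ssl_preread_alpn_protocols $backend {
        ~\bpostgresql\b   127.0.0.1:6432;   # database traffic -> Postgres
        default           127.0.0.1:8432;   # http(s) -> metrics bgworker
    }

    server {
        listen 5432;
        ssl_preread on;
        proxy_pass $backend;
    }
}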
Hi,

On Wed, Feb 16, 2022 at 12:48:11AM -0500, Greg Stark wrote:
> But on further thought I think what you're asking is whether there are
> standard database metrics and standard names for them. A lot of this
> work has already been done with pg_exporter but it is worth looking at
> other database software and seeing if there are opportunities to
> standardize metric naming across databases.

Can you clarify what exactly you mean with "pg_exporter"? I think you mentioned it upthread as well.

https://github.com/Vonng/pg_exporter (90 GH stars, never heard of it)?

https://github.com/prometheus-community/postgres_exporter (1.6k GH stars)?

Something else?


Michael

--
Michael Banck
Teamleiter PostgreSQL-Team
Projektleiter
Tel.: +49 2166 9901-171
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Geoff Richardson, Peter Lilley

Our handling of personal data is subject to the following provisions: https://www.credativ.de/datenschutz