Re: Observability in Postgres - Mailing list pgsql-hackers

From Dave Page
Subject Re: Observability in Postgres
Date
Msg-id CA+OCxow1BCKZ5Uu9a42SJP4KG=f9Bf=4pcGi32Ht-p_ctG3f1w@mail.gmail.com
Whole thread Raw
In response to Observability in Postgres  (Greg Stark <stark@mit.edu>)
Responses Re: Observability in Postgres
Re: Observability in Postgres
List pgsql-hackers
Hi Greg,

On Mon, 14 Feb 2022 at 20:16, Greg Stark <stark@mit.edu> wrote:
So I've been dealing a lot with building and maintaining dashboards
for (fleets of) Postgres servers. And it's a pain. I have a few
strongly held ideas about where the pain points are and what the right
ways to tackle them are. Some of which are going to be controversial I
think...

The state of the art is pg_exporter which is a separate client that
connects to the database and runs SQL queries to extract database
metrics. The pain points I've seen are:

1) The exporter isn't reliable when things are failing. If your
clients can't connect the exporter also can't connect leading to data
gaps in your metrics for precisely the time windows where you need
data. This can happen to connection exhaustion, xid wraparound, or
even something as simple as someone taking an exclusive lock on
something used in the sql queries.

2) SQL connections are tied to specific databases within a cluster.
Making it hard to get data for all your databases if you have more
than one. The exporter needs to reconnect to each database.

3) The exporter needs to listen on a different port from the
postmaster. Making it necessary to write software to manage the
mapping from server port to exporter port and that's left to the
end-user as it varies from site to site.

4) The queries are customizable (the built-in ones don't exhaustively
exporter postgres's metrics). As a result there's no standard
dashboard that will work on any site out of the box. Moreover issue
(3) also makes it impossible to implement one that works properly.

5) data needs to be marshaled from shared memory into SQL and then
read by the client and re-emitted in the metric format. The double
processing requires writing SQL queries very carefully to avoid losing
fidelity for things like LSN positions, xids, etc. Moreover the
latency and gathering data from multiple SQL queries results in
metrics that are often out of sync with each other making them hard to
interpret.

All this said, I think we should have a component in Postgres that
reads from the stats data directly and outputs metrics in standard
metrics format directly. This would probably take the form of a
background worker with a few tricky bits.

This would mean there would be a standard official set of metrics
available that a standard dashboard could rely on to be present at any
site and it would be reliable if the SQL layer isn't functioning due
to lack of connections or xid wraparound or locking issues.

The elephant in the room is that issue (3) requires a bit of sleight
of hand. Ideally I would want it to be listening on the same ports as
the database. That means having the postmaster recognize metric
requests and hand them to the metrics background worker instead of a
backend. I'm not sure people are going to be ok with that....

For now my approach is to implement a background worker that listens
on a new port and is basically its own small web server with shared
memory access. This ignores issue (3) and my hope is that when we have
some experience with this approach we'll see how reliable it is and
how comfortable we are with the kind of hacking in postmaster it would
take to fix it. Fwiw I do think this is an important issue and not one
that we can ignore indefinitely.

There is another elephant in the room (it's a big room) which is that
this all makes sense for stats data. It doesn't make much sense for
data that currently lives in pg_class, pg_index, etc. In other words
I'm mostly solving (2) by ignoring it and concentrating on stats data.

I haven't settled on a good solution for that data. I vaguely lean
towards saying that the volatile metrics in those tables should really
live in stats or at least be mirrored there. That makes a clean
definition of what Postgres thinks a metric is and what it thinks
catalog data is. But I'm not sure that will really work in practice.
In particular I think it's likely we'll need to get catalog data from
every database anyways, for example to label things like tables with
better labels than oids.

This work is being funded by Aiven which is really interested in
improving observability and integration between Postgres and other
open source cloud software.

I agree with pretty much everything above, bar a couple of points:

- Does it really matter if metrics are exposed on a separate port from the postmaster? I actually think doing that is a good thing as it allows use of alternative listen addresses and firewalling rules; you could then confine the monitoring traffic to a management VLAN for example.

- I strongly dislike the idea of building this around the prometheus exporter format. Whilst that is certainly a useful format if you're using prom (as many do), it does have limitations and quirks that would make it painful for other systems to use; for example, the need to encode non-numeric data into labels rather than the metrics themselves (e.g. server version strings or LSNs). I would much prefer to see a common format such as JSON used by default, and perhaps offer a hook to allow alternate formatters to replace that. The prometheus format is also pretty inefficient, as you have to repeat all the key data (labels) for each individual metric.
 
--

pgsql-hackers by date:

Previous
From: Ranier Vilela
Date:
Subject: [PATCH] Fix out-of-bouds access (src/common/wchar.c)
Next
From: Filipe Rosset
Date:
Subject: Re: How did queensnake corrupt zic.o?