Re: Observability in Postgres - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: Observability in Postgres
Msg-id 20220216071849.gooow2xgr7h2emcy@jrouhaud
In response to Observability in Postgres  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
Hi,

On Mon, Feb 14, 2022 at 03:15:14PM -0500, Greg Stark wrote:
>
> [...]
> 2) SQL connections are tied to specific databases within a cluster.
> Making it hard to get data for all your databases if you have more
> than one. The exporter needs to reconnect to each database.
>
> 3) The exporter needs to listen on a different port from the
> postmaster. Making it necessary to write software to manage the
> mapping from server port to exporter port and that's left to the
> end-user as it varies from site to site.
>
> 4) The queries are customizable (the built-in ones don't exhaustively
> export postgres's metrics). As a result there's no standard
> dashboard that will work on any site out of the box. Moreover issue
> (3) also makes it impossible to implement one that works properly.
> [...]
> All this said, I think we should have a component in Postgres that
> reads from the stats data directly and outputs metrics in standard
> metrics format directly. This would probably take the form of a
> background worker with a few tricky bits.

But having a background worker for that will bring its own set of (new)
problems.  I never really had a problem with (3), and even if we fixed that,
users will still have to rely on a mapping for the other products they monitor,
so I don't see that as a really big issue.

Also I don't think that having such a component directly embedded in postgres
is a good idea, as it means it would be tied to major version releases.  I
don't think anyone will like to hear "sorry you need to upgrade to a new
postgres major version to monitor X even if the data is available in the
catalogs".  It also means that you may now have different standard
metric definitions depending on the major version, which seems to contradict
(4).
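
As a rough illustration of the versioning point, here is a minimal Python
sketch of how an external exporter could pick its queries from the server
version, so that new metrics ship with the tool rather than with a postgres
major release.  The names METRIC_QUERIES and queries_for are hypothetical and
not taken from any existing exporter; the thresholds assume pg_stat_slru
appeared in version 13 and pg_stat_wal in version 14.

```python
# Hypothetical sketch: an external exporter choosing queries by server version.
# Each entry is (metric name, minimum server_version_num, SQL text).
METRIC_QUERIES = [
    ("database_commits", 90000,
     "SELECT datname, xact_commit FROM pg_stat_database"),
    ("slru_blocks_hit", 130000,
     "SELECT name, blks_hit FROM pg_stat_slru"),
    ("wal_bytes", 140000,
     "SELECT wal_bytes FROM pg_stat_wal"),
]


def queries_for(server_version_num):
    """Return {metric name: SQL} for the metrics this server can provide."""
    return {
        name: sql
        for name, min_version, sql in METRIC_QUERIES
        if server_version_num >= min_version
    }


if __name__ == "__main__":
    # server_version_num as reported by "SHOW server_version_num",
    # e.g. 130005 for a 13.5 server.
    for name, sql in queries_for(130005).items():
        print(name, "->", sql)
```

Adding a metric then only means appending an entry to the table and releasing
a new version of the exporter.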

> There is another elephant in the room (it's a big room) which is that
> this all makes sense for stats data. It doesn't make much sense for
> data that currently lives in pg_class, pg_index, etc. In other words
> I'm mostly solving (2) by ignoring it and concentrating on stats data.
>
> I haven't settled on a good solution for that data. I vaguely lean
> towards saying that the volatile metrics in those tables should really
> live in stats or at least be mirrored there. That makes a clean
> definition of what Postgres thinks a metric is and what it thinks
> catalog data is. But I'm not sure that will really work in practice.
> In particular I think it's likely we'll need to get catalog data from
> every database anyways, for example to label things like tables with
> better labels than oids.

I also don't think that sending that data to stats is going to work, which
makes me quite worried about spending a lot of effort on a solution that has
problematic limitations for something as useful as database-specific metrics.
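
For reference, the per-database loop that point (2) describes can be sketched
as below.  This is a minimal Python sketch under stated assumptions:
collect_relation_pages is a hypothetical name, and `connect` is a hypothetical
callable that takes a database name and returns a DB-API 2.0 connection (for
instance a thin wrapper around a driver's connect function).  It reads
relpages from pg_class and labels each row with schema and relation names
rather than oids.

```python
def collect_relation_pages(connect, maintenance_db="postgres"):
    """Yield (database, schema, relation, pages) for every connectable database.

    `connect` is a hypothetical callable: connect(dbname) -> DB-API connection.
    """
    # First connection: list the databases that accept connections.
    conn = connect(maintenance_db)
    try:
        cur = conn.cursor()
        cur.execute("SELECT datname FROM pg_database WHERE datallowconn")
        databases = [row[0] for row in cur.fetchall()]
    finally:
        conn.close()

    # One extra connection per database, since catalogs are per-database.
    for dbname in databases:
        conn = connect(dbname)
        try:
            cur = conn.cursor()
            cur.execute(
                "SELECT n.nspname, c.relname, c.relpages "
                "FROM pg_class c "
                "JOIN pg_namespace n ON n.oid = c.relnamespace "
                "WHERE c.relkind = 'r'"
            )
            for schema, relation, pages in cur.fetchall():
                yield dbname, schema, relation, pages
        finally:
            conn.close()
```

The reconnect cost is real, but the loop stays entirely outside the server and
works with whatever is already in the catalogs.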


