Re: shared-memory based stats collector - v70 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: shared-memory based stats collector - v70
Date
Msg-id 20220810222327.5tjgq7hmiipeeuo2@awork3.anarazel.de
Whole thread Raw
In response to Re: shared-memory based stats collector - v70  (Greg Stark <stark@mit.edu>)
List pgsql-hackers
Hi,

On 2022-08-10 14:18:25 -0400, Greg Stark wrote:
> > I don't think that's a large enough issue to worry about unless you're
> > polling at a very high rate, which'd be a bad idea in itself. If a backend
> > can't get the lock for some stats change it'll defer flushing the stats a bit,
> > so it'll not cause a lot of other problems.
> 
> Hm. I wonder if we're on the same page about what constitutes a "high rate".
> 
> I've seen people try push prometheus or other similar systems to 5s
> poll intervals. That would be challenging for Postgres due to the
> volume of statistics. The default is 30s and people often struggle to
> even have that function for large fleets. But if you had a small
> fleet, perhaps an iot style system with a "one large table" type of
> schema you might well want stats every 5s or even every 1s.

That's probably fine. Although I think you might run into trouble not from the
stats subystem side, but from the "amount of data" side. On a system with a
lot of objects that can be a fair amount.  If you really want to do very low
latency stats reporting, I suspect you'd have to build an incremental system.


> > I'm *dead* set against including catalog names in shared memory stats. That'll
> > add a good amount of memory usage and complexity, without any sort of
> > comensurate gain.
> 
> Well it's pushing the complexity there from elsewhere. If the labels
> aren't in the stats structures then the exporter needs to connect to
> each database, gather all the names into some local cache and then it
> needs to worry about keeping it up to date. And if there are any
> database problems such as disk errors or catalog objects being locked
> then your monitoring breaks though perhaps it can be limited to just
> missing some object names or having out of date names.

Shrug. If the stats system state desynchronizes from an alter table rename
you'll also have a problem in monitoring.

And even if you can benefit from having all that information, it'd still be an
overhead born by everybody for a very small share of users.


> > > I also think it would be nice to have a change counter for every stat
> > > object, or perhaps a change time. Prometheus wouldn't be able to make
> > > use of it but other monitoring software might be able to receive only
> > > metrics that have changed since the last update which would really
> > > help on databases with large numbers of mostly static objects.
> >
> > I think you're proposing adding overhead that doesn't even have a real user.
> 
> I guess I'm just brainstorming here. I don't need to currently no. It
> doesn't seem like significant overhead though compared to the locking
> and copying though?

Yes, timestamps aren't cheap to determine (nor free too store, but that's a
lesser issue).

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: optimize lookups in snapshot [sub]xip arrays
Next
From: Andres Freund
Date:
Subject: Re: shared-memory based stats collector - v70