From: Andres Freund
Subject: shared memory stats ideas
Msg-id: 20221019181930.bx73kul4nbiftr65@awork3.anarazel.de
Hi,

shortly after shared memory stats went in I had a conversation with Lukas
about what it'd enable us to do going forward. I also chatted with Peter about
autovacuum related stats. I started to write an email, but then somehow lost
the draft and couldn't bring myself to start from scratch.


Here's a largely unordered list of ideas. I'm not planning to work on them
myself, but thought it'd nevertheless be useful to have them memorialized
somewhere.


1) Track some statistics based on relfilenodes rather than oids

We currently track IO related statistics as part of the normal relation
stats. The problem is that this prevents us from collecting stats whenever we
operate on a relfilenode rather than a Relation.

For example, we currently can't track the number of blocks written out for a
relation, because we don't have a Relation at that point. Nor can we really
get hold of one, as the writeback can happen in a different database, without
access to the relevant pg_class. Which is also the reason why the per-relation
IO stats aren't populated by the startup process, even though it'd obviously
sometimes be helpful to know where the most IO time is spent on a standby.

There are also quite a few contortions of the bufmgr interface related to
this.

I think the solution to this is actually fairly simple: We split the IO
related statistics out from the relation statistics, and track them on a
relfilenode basis instead. That'd allow us to track the IO stats from all
the relevant places, rather than the partial job we do right now.
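
To make that concrete, here is a minimal sketch of what relfilenode-keyed IO
stats entries could look like. This is purely illustrative; none of these
type or field names exist in PostgreSQL:

/*
 * Hypothetical sketch: key IO stats by the physical relfilenode instead
 * of by (database oid, relation oid).  Illustrative names only.
 */
#include "postgres.h"
#include "pgstat.h"

typedef struct PgStat_RelFileNodeKey
{
    Oid         spcOid;         /* tablespace */
    Oid         dbOid;          /* database, InvalidOid for shared rels */
    Oid         relNumber;      /* on-disk relfilenode number */
} PgStat_RelFileNodeKey;

typedef struct PgStat_RelFileNodeIO
{
    PgStat_Counter blocks_read;
    PgStat_Counter blocks_written;  /* incl. checkpointer/bgwriter writes */
    PgStat_Counter blocks_extended;
    PgStat_Counter io_time_us;      /* only meaningful with track_io_timing */
} PgStat_RelFileNodeIO;

Because the key is purely physical, any process that knows a relfilenode -
checkpointer, bgwriter, the startup process during recovery - could bump
these counters without needing a Relation or catalog access.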


2) Split index and table statistics into different types of stats

We track both types of statistics in the same format and rename the columns
in views etc. to make them somewhat sensible. A number of the "columns" in the
index stats are currently unused.

If we split the stats for indexes and tables we can have reasonable names
for the fields, shrink the current memory usage by halving the set of fields
we keep for indexes, and extend the stats in a more targeted fashion.


This would e.g. allow us to keep track of the number of index entries killed
via the killtuples mechanism, which in turn would allow us to decide more
intelligently whether we should vacuum indexes (often the most expensive part
of vacuum). In a lot of workloads killtuples takes care of most of the
cleanup, but in others it doesn't do much.
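
As a rough illustration of the split (field names are mine, not a proposal),
compare:

/*
 * Hypothetical sketch: separate entry types for indexes and tables, so
 * neither carries fields that are meaningless for the other.
 */
#include "postgres.h"
#include "pgstat.h"

typedef struct PgStat_StatIndexEntry
{
    PgStat_Counter numscans;
    PgStat_Counter tuples_returned;
    PgStat_Counter tuples_fetched;
    PgStat_Counter entries_killed;  /* new: killtuples-based deletions */
} PgStat_StatIndexEntry;

typedef struct PgStat_StatTableEntry
{
    PgStat_Counter numscans;
    PgStat_Counter tuples_inserted;
    PgStat_Counter tuples_updated;
    PgStat_Counter tuples_deleted;
    /* ... vacuum/analyze counters etc., no index-only fields ... */
} PgStat_StatTableEntry;

The entries_killed counter is the kind of targeted extension this enables:
vacuum could compare it against the table's dead-tuple counters to judge
whether an index pass is still worthwhile.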


3) Maintain more historical statistics about vacuuming

We currently track the last time a table was vacuumed, the number of times it
was vacuumed and a bunch of counters for the number of modified tuples since
the last vacuum.

However, none of that allows the user to identify which relations are causing
autovacuum to not keep up. Even just keeping track of the total time
autovacuum has spent on certain relations would be a significant improvement,
with more metrics easily imaginable (total IO [time], autovacuum delay time,
xid age).
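
For instance, the per-table stats entry could accumulate counters along these
lines (all names hypothetical):

/*
 * Hypothetical cumulative vacuum counters, added to the per-table stats
 * entry.  Names are illustrative only.
 */
PgStat_Counter total_autovacuum_time_us;     /* wall-clock time spent */
PgStat_Counter total_autovacuum_delay_us;    /* time sleeping in cost delay */
PgStat_Counter total_autovacuum_io_time_us;  /* with track_io_timing */

With those, "which relations eat my autovacuum budget" becomes a simple
ORDER BY over the stats view instead of guesswork.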


4) Make the stats mechanism extensible

Most of the work towards this has already been done, but a bit more work is
necessary. The hardest part is likely how to identify stats belonging to an
extension across restarts.

There's a bunch of extensions with their own stats mechanisms, but it's hard
to get this stuff right from the outside.
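
To sketch where the difficulty lies, an extension-facing registration API
might look roughly like this. Nothing here exists today; the names only
illustrate the shape of the problem:

/*
 * Hypothetical API for extension-defined stats kinds.  The hard part is
 * that the kind id has to remain stable across restarts even though the
 * set and order of loaded extensions can change, so an id handed out at
 * load time isn't enough - some persistent name->id mapping is needed.
 */
#include "postgres.h"
#include "pgstat.h"

typedef struct ExtensionStatsKind
{
    const char *name;           /* stable identifier, e.g. extension name */
    Size        shared_size;    /* size of one entry in shared stats */
    Size        pending_size;   /* size of backend-local pending entry */
    void        (*flush_pending) (void *pending, void *shared);
} ExtensionStatsKind;

extern int  stats_register_kind(const ExtensionStatsKind *kind);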


5) Use extensible shared memory stats to store pg_stat_statements data

pg_stat_statements' current mechanism has a few issues. The top ones I know
of are:

- Contention on individual stats entries when the same queryid is executed
  concurrently. pgstats deals with this by allowing stats to be collected in
  backend local memory and to be flushed into shared stats at a lower
  frequency (see the sketch after this list).

- The querytext file can get huge (I've seen > 100GB) and cause massive
  slowdowns. It's better than the old fixed-length, fixed-shared-memory
  mechanism, don't get me wrong. But we can do better by storing the data in
  dynamic shared memory and then also supporting trimming based on the total
  size.
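
A minimal sketch of that local-accumulate / deferred-flush pattern, with
illustrative names rather than the actual pgstat implementation:

/*
 * Hypothetical sketch: per-backend pending counters, flushed into a
 * shared entry at a lower frequency (cf. pgstat_report_stat()).
 */
#include "postgres.h"
#include "storage/lwlock.h"

typedef struct SharedQueryStats
{
    LWLock     *lock;
    uint64      calls;
    double      total_time;
} SharedQueryStats;

/* backend-local, so incrementing causes no contention */
static uint64 pending_calls = 0;
static double pending_total_time = 0;

static void
count_execution(double elapsed_ms)
{
    /* hot path: no locks, just local memory */
    pending_calls++;
    pending_total_time += elapsed_ms;
}

/* called at transaction end or on a timeout */
static void
flush_pending(SharedQueryStats *shared)
{
    /* the lock is taken once per flush, not once per execution */
    LWLockAcquire(shared->lock, LW_EXCLUSIVE);
    shared->calls += pending_calls;
    shared->total_time += pending_total_time;
    LWLockRelease(shared->lock);

    pending_calls = 0;
    pending_total_time = 0;
}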


There were some other things, but I can't remember them right now.


Greetings,

Andres Freund


