Re: Add pg_stat_vfdcache view for VFD cache statistics - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Add pg_stat_vfdcache view for VFD cache statistics
Date
Msg-id 42776281-3603-4161-b47d-d4ffd2029e8c@vondra.me
Whole thread Raw
In response to Re: Add pg_stat_vfdcache view for VFD cache statistics  (KAZAR Ayoub <ma_kazar@esi.dz>)
List pgsql-hackers
Hi,

Thanks for working on this, I think having some stats about the vfd
cache would be quite helpful. I took a quick look at the patch, and in
general it goes in the right direction.

Here's a couple comments / suggestions:

1) sgml docs

- Formatting seems a bit wrong, both for the entry in the first table,
and then for the view description later. Clearly different from the
nearby tables after the documentation is built. Did you generate the
tables somehow?

- The ordering seems a bit random, but I'd argue the pg_stat_vfdcache
view should go before pg_stat_wal (at least in the first table).


2) system_views.sql

- missing the REVOKE command
- We don't align the AS clauses with spaces (I like to align my queries,
but here it's a question of consistency with the other commands, and the
alignment makes future diffs larger)


3) fd.c

- Wouldn't it be better to have pgstat_count_vfd_access(hit bool)? That
way you would't even need the new else, it'd ne enough to do
pgstat_count_vfd_access(!FileIsNotOpen(file)) at the beginning.

- GetVfdCacheOccupancy does not seem to be called from anywhere


4) pgstat_vfdcache.c

- I'm not sure usagecount makes sense for these stats, because that's
for cases with a single writer. These stats are written by backends, so
it probably needs a lwlock.

- I'm not sure updating PgStatShared_Backend from pgstat_vfdcache.c is a
good idea, when it's already synced from pgstat_backend. We don't do
that for WAL either, but there's pgstat_flush_backend_entry_wal in
pgstat_backend.c. I suppose vfdcache should do it the same way.


5) pg_proc.dat

- formatting seems a bit inconsistent


6) pgstat.h

- Aren't evictions mostly the same as misses, at least after a while?

- I think it would be useful to report how many file descriptors we
  are allowed to open (it's less than max_files_per_process, depending
  on the ulimits etc.)

- I know io_uring can consume quite a few descriptors, and it can cause
  issues, I wonder if this would make it easier to observe


I also suggest to split the patch into smaller patches, to make it
easier to review and evaluate. Not because of size - the patch is fairly
small. But it's better to not mix multiple features with different
cost/benefit trade offs, because then it's possible to evaluate them
separately. Maybe even commit the first part and continue discussion
about the following one(s).

This patch seems to mix two different types of stats - global stats of
the vfd cache, and then also per-backend stats. Those seems like very
different things, both in terms of overhead and benefits.

The global cache stats is going to be virtually free (at least the
hits/misses, I'm not sure about the number of entries and bytes), and
it's obviously useful for tuning the max_files_per_process GUC. I'd even
contemplate getting this into PG19, maybe.

The per-backend stats seem like a much harder sell to me, but I can be
convinced. Maybe it's not an issue in terms of overhead, maybe the stats
we get from that are worth it. Not sure. But I'd keep it in a separate
0002 patch, on top of 0001 with just the "global" stats.



regards

-- 
Tomas Vondra




pgsql-hackers by date:

Previous
From: Melanie Plageman
Date:
Subject: Re: Don't synchronously wait for already-in-progress IO in read stream
Next
From: Tom Lane
Date:
Subject: Re: docs: warn about post-data-only schema dumps with parallel restore.