Re: pg_stat_io_histogram - Mailing list pgsql-hackers
| From | Jakub Wartak |
|---|---|
| Subject | Re: pg_stat_io_histogram |
| Date | |
| Msg-id | CAKZiRmz-KH2a_qcwGeFwsoajCsQEd8jXyFLx0B0yKgkK=LhBLA@mail.gmail.com |
| In response to | Re: pg_stat_io_histogram (Andres Freund <andres@anarazel.de>) |
| Responses | Re: pg_stat_io_histogram |
| List | pgsql-hackers |
Hi Andres,

On Mon, Mar 2, 2026 at 4:06 PM Andres Freund <andres@anarazel.de> wrote:

> On 2026-03-02 09:01:05 +0100, Jakub Wartak wrote:
> > On Thu, Feb 26, 2026 at 5:13 PM Andres Freund <andres@anarazel.de> wrote:
> > > > > > but I think having it in PgStat_BktypeIO is not great. This makes
> > > > > > PgStat_IO 30k*BACKEND_NUM_TYPES bigger, or ~0.5MB. Having a stats
> > > > > > snapshot be half a megabyte bigger for no reason seems too wasteful.
> > > > >
> > > > > Yea, that's not awesome.
> > > >
> > > > Guys, a question: could you please explain to me what the drawbacks are
> > > > of having this semi-big (internal-only) stats snapshot of 0.5MB? I'm
> > > > struggling to understand two things:
> > > > a) 0.5MB is not a lot these days (OK, my 286 had 1MB back in the day ;))
> > >
> > > I don't really agree with that, I guess. And even if I did, it's one
> > > thing to use 0.5MB when you actually use it, it's quite another when
> > > most of that memory is never used.
> > >
> > > With the patch, *every* backend ends up with a substantially larger
> > > pgStatLocal. Before:
> > >
> > > nm -t d --size-sort -r -S src/backend/postgres | head -n 20 | less
> > > (the second column is the decimal size, the third the type of the symbol)
> > >
> > > 0000000004131808 0000000000297456 r yy_transition
> > > ...
> > > 0000000003916352 0000000000054744 r UnicodeDecompMain
> > > 0000000021004896 0000000000052824 B pgStatLocal
> > > 0000000003850592 0000000000040416 r unicode_categories
> > > ...
> > >
> > > after:
> > > 0000000023220512 0000000000329304 B pgStatLocal
> > > 0000000018531648 0000000000297456 r yy_transition
> > > ...
> > >
> > > And because pgStatLocal is zero-initialized data, it'll be
> > > on-demand-allocated in every single backend (whereas e.g. yy_transition
> > > is read-only shared). So you're not talking about a one-time increase,
> > > you're multiplying it by the number of active connections.
> > >
> > > Now, it's true that most backends won't ever touch pgStatLocal.
> > > However, most backends will touch Pending[Backend]IOStats, which also
> > > increased noticeably:
> > >
> > > before:
> > > 0000000021060960 0000000000002880 b PendingIOStats
> > > 0000000021057792 0000000000002880 b PendingBackendStats
> > >
> > > after:
> > > 0000000023568416 0000000000018240 b PendingIOStats
> > > 0000000023549888 0000000000018240 b PendingBackendStats
> > >
> > > Again, I think some increase here doesn't have to be fatal, but
> > > increasing it with mostly impossible-to-use memory seems just too much
> > > waste to me.
> > >
> > > This also increases the shared-memory usage of pgstats: before, it used
> > > ~300kB on a small system. That nearly doubles with this patch. But
> > > that's perhaps less concerning, given it's per-system, rather than
> > > per-backend, memory usage.
> > >
> > > > b) how does it affect anything, given that testing shows it doesn't?
> > >
> > > Which of your tests would conceivably show the effect? The concern here
> > > isn't really performance, it's that it increases our memory usage, which
> > > you'd only see having an effect if you are tight on memory or have a
> > > workload that is cache-sensitive.
> >
> > Oh OK, now I understand the problem with pgStatLocal properly, thanks for
> > the detailed explanation! (Though I'm still a little lost in the woods of
> > the pgstat infrastructure.) Anyway, I agree that PgStat_IO has become way
> > too big, especially when the pg_stat_io(_histogram) views are never
> > actually accessed.
> >
> > How about the attached v6-0002? It now allocates the PgStat_IO memory
> > dynamically to avoid the memory cost (it is only allocated if
> > pgstat_io_snapshot_cb() is used). Is that the right path? And if so,
> > should it perhaps allocate from the mcxt pgStatLocal.snapshot.context
> > instead?
>
> I think even the per-backend pending IO stats are too big.
> And for both pending stats, stored stats and snapshots, I still don't
> think I am OK with storing so many histograms that are not possible to
> use. I think that needs to be fixed first.

v7-0001: No changes for quite some time.

The rest is the memory-reduction stuff (I didn't want to squash it, so for
now the patches are separate):

v7-0002: As PendingBackendStats (the per-individual-backend IO stats) was not
collecting latency buckets at all (but it was sharing the same struct/typedef),
I cloned the struct without those latency buckets. This reduces the struct
from 18240 back to 2880 bytes per backend (BSS), as on master.

v7-0003: Sadly, I couldn't easily make the backend-local recording inside
PendingIOStats allocate dynamically from within pgstat_count_io_op_time() on
first use of a specific IO traffic type (i.e. per
[IOOBJECT_NUM_TYPES][IOCONTEXT_NUM_TYPES][IOOP_NUM_TYPES] slot), because any
MemoryContextAlloc() from there can happen inside a critical section, and that
blows up. It's just +15kB per backend, so I hope it is OK to allocate it only
when we really intend to use it (track_io_timing/track_wal_io_timing on) --
nm(1) then reports just 2888 bytes (so just the +8-byte pointer). The drawback
is that setting those GUCs locally won't enable histogram collection
immediately, but only for newly spawned backends. This also means I had to
switch to a TAP test, so it can be tested. I don't have a strong opinion on
whether saving +15kB is worth it for users not running with
track_[wal_]io_timing.

v7-0004: (This was already sent with the previous message.) With the original
v5, every backend had a big pgStatLocal (0000000000329304 B pgStatLocal) that
was there but not used at all if the pg_stat_io(_histogram) views were never
accessed. Now it is 0000000000000984 B pgStatLocal, and the PgStat_IO part of
PgStat_Snapshot is allocated only when querying those views.
So, with all three of the above combined, we are back to:

0000000011573376 0000000000002888 B PendingIOStats
0000000011570304 0000000000002880 b PendingBackendStats
0000000011569184 0000000000000984 B pgStatLocal

That is an actual saving over master itself:

0000000011577344 0000000000052824 B pgStatLocal
0000000011633408 0000000000002880 b PendingIOStats
0000000011630304 0000000000002880 b PendingBackendStats

> This also increases the shared-memory usage of pgstats: Before it used ~300kB
> on a small system. That nearly doubles with this patch. But that's perhaps
> less concerning, given it's per-system, rather than per-backend memory usage.

v7-0005: Skipping 4 backend types out of 17 means ignoring ~23% of backend
types, and with a simple mapping array I can get this down from ~592384 to
~519424 bytes of _total_ memory allocated for the 'Shared Memory Stats' shmem
(this one was sent earlier).

v7-0006: We could reduce the total pgstats shmem further, down to ~482944
bytes, if we also eliminated tracking of two further (IMHO useless) types:
autovacuum_launcher and standalone_backend. Master is at 315904, so that's
just ~163kB more according to pg_shm_allocations.

The patches probably still need some squashing, pgindent, etc.

-J.
Attachment
- v7-0001-Add-pg_stat_io_histogram-view-to-provide-more-det.patch
- v7-0002-PendingBackendStats-save-memory.patch
- v7-0004-Convert-PgStat_IO-to-pointer-to-avoid-huge-static.patch
- v7-0003-PendingIOStats-save-memory.patch
- v7-0005-Condense-PgStat_IO.stats-BACKEND_NUM_TYPES-array-.patch
- v7-0006-Further-condense-and-reduce-memory-used-by-pgstat.patch