Re: WAL usage calculation patch - Mailing list pgsql-hackers

From Julien Rouhaud
Subject Re: WAL usage calculation patch
Date
Msg-id 20200304160225.GA26982@nol
Whole thread Raw
In response to Re: WAL usage calculation patch  (Kirill Bychik <kirill.bychik@gmail.com>)
Responses Re: WAL usage calculation patch
List pgsql-hackers
On Thu, Feb 20, 2020 at 06:56:27PM +0300, Kirill Bychik wrote:
> > вт, 18 февр. 2020 г. в 06:23, Thomas Munro <thomas.munro@gmail.com>:
> > > On Mon, Feb 10, 2020 at 8:20 PM Craig Ringer <craig@2ndquadrant.com> wrote:
> > > > On Wed, 5 Feb 2020 at 21:36, Kirill Bychik <kirill.bychik@gmail.com> wrote:
> > > > > Patch is separated in two parts: core changes and pg_stat_statements
> > > > > additions. Essentially the extension has its schema updated to allow
> > > > > two more fields, docs updated to reflect the change. Patch is prepared
> > > > > against master branch.
> > > > >
> > > > > Please provide your comments and/or code findings.
> > > >
> > > > I like the concept, I'm a big fan of anything that affordably improves
> > > > visibility into Pg's I/O and activity.
> > >
> > > +1

Huge +1 too.

> > Thank you for testing the patch and running extension checks. I assume
> > the patch applies without problems.
> >
> > As for the regr test, it apparently requires some rework. I didn't pay
> > attention enough to make sure the data I check is actually meaningful
> > and isolated enough to be repeatable.
> >
> > Please consider the extension part of the patch as WIP, I'll resubmit
> > the patch once I get a stable and meanngful test up. Thanks for
> > finding it!
> >
>
> I have reworked the extension regression test to be more isolated.
> Apparently, something merged into master branch shifted my numbers.
>
> PFA the new patch. Core part didn't change a bit, the extension part
> has regression test SQL and expected log changed.

I'm quite worried about the stability of those counters for regression tests.
Wouldn't a checkpoint happening during the test change them?

While at it, did you consider adding a full-page image counter in the WalUsage?
That's something I'd really like to have and it doesn't seem hard to integrate.

Another point is that this patch won't help to see autovacuum activity.
As an example, I did a quick test to store the informations in pgstat, sending
the data in the PG_FINALLY part of vacuum():

rjuju=# create table t1(id integer, val text);
CREATE TABLE
rjuju=# insert into t1 select i, 'val ' || i from generate_series(1, 100000) i;
INSERT 0 100000
rjuju=# vacuum t1;
VACUUM
rjuju=# select datname, vac_wal_records, vac_wal_bytes, autovac_wal_records, autovac_wal_bytes
from pg_stat_database where datname = 'rjuju';
 datname | vac_wal_records | vac_wal_bytes | autovac_wal_records | autovac_wal_bytes
---------+-----------------+---------------+---------------------+-------------------
 rjuju   |             547 |         65201 |                   0 |                 0
(1 row)

rjuju=# delete from t1 where id % 2 = 0;
DELETE 50000
rjuju=# select pg_sleep(60);
 pg_sleep
----------

(1 row)

rjuju=# select datname, vac_wal_records, vac_wal_bytes, autovac_wal_records, autovac_wal_bytes
from pg_stat_database where datname = 'rjuju';
 datname | vac_wal_records | vac_wal_bytes | autovac_wal_records | autovac_wal_bytes
---------+-----------------+---------------+---------------------+-------------------
 rjuju   |             547 |         65201 |                1631 |            323193
(1 row)

That's seems like useful data (especially since I recently had to dig into a
problematic WAL consumption issue that was due to some autovacuum activity),
but that may seem strange to only account for (auto)vacuum activity, rather
than globally, grouping per RmgrId or CommandTag for instance.  We could then
see the complete WAL usage per-database.  What do you think?

Some minor points I noticed:

- the extension patch doesn't apply anymore, I guess since 70a7732007bc4689

 #define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
+#define PARALLEL_KEY_WAL_USAGE         UINT64CONST(0xE000000000000010)

Shouldn't it be 0xA rather than 0x10?

- it would be better to add a version number to the patches, so we're sure
  which one we're talking about.



pgsql-hackers by date:

Previous
From: David Steele
Date:
Subject: Re: PG14 target version?
Next
From: David Steele
Date:
Subject: Re: jsonpath syntax extensions