WAL usage calculation patch

From: Kirill Bychik
Hello pgsql-hackers,

Submitting a patch that enables gathering of per-statement WAL
generation statistics, similar to how it is done for buffer usage.
Collected are the number of records added to the WAL and the number of
WAL bytes written.

The collected data has proven valuable for analyzing update-heavy
loads where WAL generation is the bottleneck.

The usage data is collected at a low level, after compression is
applied to the WAL record. The data is then exposed via
pg_stat_statements, and could also be used in EXPLAIN ANALYZE if
needed. The instrumentation is similar to the one used for buffer
stats. I didn't dare to unify both usage metric sets into a single
struct, nor to rework the way both are passed to parallel workers.

The performance impact is (supposed to be) very low, essentially
adding two integer operations and a memory access per WAL record
insert, plus some additional effort to allocate a shmem chunk for
parallel workers. Parallel worker shmem usage is increased to fit a
struct of two longs.
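
To sketch the idea (the field names here are illustrative; the
attached patch is authoritative):

    /* instrument.h, alongside BufferUsage -- the struct of two longs */
    typedef struct WalUsage
    {
        long        wal_records;    /* number of WAL records produced */
        long        wal_bytes;      /* size of WAL records produced */
    } WalUsage;

    extern WalUsage pgWalUsage;     /* backend-local running totals */

    /* in the WAL record insertion path (e.g. XLogInsertRecord()), once
     * the total record size is known -- the "two int operations and
     * memory access" mentioned above: */
    pgWalUsage.wal_records++;
    pgWalUsage.wal_bytes += rechdr->xl_tot_len;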

The patch is separated into two parts: core changes and
pg_stat_statements additions. Essentially, the extension has its
schema updated to add two more fields, with docs updated to reflect
the change. The patch is prepared against the master branch.

Please provide your comments and/or code findings.

Attachment

Re: WAL usage calculation patch

From: Craig Ringer
On Wed, 5 Feb 2020 at 21:36, Kirill Bychik <kirill.bychik@gmail.com> wrote:
>
> Hello pgsql-hackers,
>
> Submitting a patch that enables gathering of per-statement WAL
> generation statistics, similar to how it is done for buffer usage.
> Collected are the number of records added to the WAL and the number of
> WAL bytes written.
>
> The collected data has proven valuable for analyzing update-heavy
> loads where WAL generation is the bottleneck.
>
> The usage data is collected at a low level, after compression is
> applied to the WAL record. The data is then exposed via
> pg_stat_statements, and could also be used in EXPLAIN ANALYZE if
> needed. The instrumentation is similar to the one used for buffer
> stats. I didn't dare to unify both usage metric sets into a single
> struct, nor to rework the way both are passed to parallel workers.
>
> The performance impact is (supposed to be) very low, essentially
> adding two integer operations and a memory access per WAL record
> insert, plus some additional effort to allocate a shmem chunk for
> parallel workers. Parallel worker shmem usage is increased to fit a
> struct of two longs.
>
> The patch is separated into two parts: core changes and
> pg_stat_statements additions. Essentially, the extension has its
> schema updated to add two more fields, with docs updated to reflect
> the change. The patch is prepared against the master branch.
>
> Please provide your comments and/or code findings.

I like the concept, I'm a big fan of anything that affordably improves
visibility into Pg's I/O and activity.

To date I've been relying on tools like systemtap to do this sort of
thing. But that's a bit specialised, and Pg currently lacks useful
instrumentation for it so it can be a pain to match up activity by
parallel workers and that sort of thing. (I aim to find time to submit
a patch for that.)

I haven't yet reviewed the patch.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 2ndQuadrant - PostgreSQL Solutions for the Enterprise



Re: WAL usage calculation patch

From: Thomas Munro
On Mon, Feb 10, 2020 at 8:20 PM Craig Ringer <craig@2ndquadrant.com> wrote:
> On Wed, 5 Feb 2020 at 21:36, Kirill Bychik <kirill.bychik@gmail.com> wrote:
> > The patch is separated into two parts: core changes and
> > pg_stat_statements additions. Essentially, the extension has its
> > schema updated to add two more fields, with docs updated to reflect
> > the change. The patch is prepared against the master branch.
> >
> > Please provide your comments and/or code findings.
>
> I like the concept, I'm a big fan of anything that affordably improves
> visibility into Pg's I/O and activity.

+1

> To date I've been relying on tools like systemtap to do this sort of
> thing. But that's a bit specialised, and Pg currently lacks useful
> instrumentation for it so it can be a pain to match up activity by
> parallel workers and that sort of thing. (I aim to find time to submit
> a patch for that.)

(I'm interested in seeing your conference talk about that!  I did a
bunch of stuff with static probes to measure PHJ behaviour around
barrier waits and so on but it was hard to figure out what stuff like
that to put in the actual tree, it was all a bit
use-once-to-test-a-theory-and-then-throw-away.)

Kirill, I noticed that you included a regression test that is failing.  Can
this possibly be stable across machines or even on the same machine?
Does it still pass for you or did something change on the master
branch to add a new WAL record since you posted the patch?

  query                                      | calls | rows | wal_write_bytes | wal_write_records
 --------------------------------------------+-------+------+-----------------+-------------------
- CREATE INDEX test_b ON test(b)             |     1 |    0 |            1673 |                16
- DROP FUNCTION IF EXISTS PLUS_ONE(INTEGER)  |     1 |    0 |              56 |                 1
+ CREATE INDEX test_b ON test(b)             |     1 |    0 |            1755 |                17
+ DROP FUNCTION IF EXISTS PLUS_ONE(INTEGER)  |     1 |    0 |               0 |                 0



Re: WAL usage calculation patch

From: Kirill Bychik
On Tue, Feb 18, 2020 at 06:23, Thomas Munro <thomas.munro@gmail.com> wrote:
> On Mon, Feb 10, 2020 at 8:20 PM Craig Ringer <craig@2ndquadrant.com> wrote:
> > On Wed, 5 Feb 2020 at 21:36, Kirill Bychik <kirill.bychik@gmail.com> wrote:
> > > The patch is separated into two parts: core changes and
> > > pg_stat_statements additions. Essentially, the extension has its
> > > schema updated to add two more fields, with docs updated to reflect
> > > the change. The patch is prepared against the master branch.
> > >
> > > Please provide your comments and/or code findings.
> >
> > I like the concept, I'm a big fan of anything that affordably improves
> > visibility into Pg's I/O and activity.
>
> +1
>
> > To date I've been relying on tools like systemtap to do this sort of
> > thing. But that's a bit specialised, and Pg currently lacks useful
> > instrumentation for it so it can be a pain to match up activity by
> > parallel workers and that sort of thing. (I aim to find time to submit
> > a patch for that.)
>
> (I'm interested in seeing your conference talk about that!  I did a
> bunch of stuff with static probes to measure PHJ behaviour around
> barrier waits and so on but it was hard to figure out what stuff like
> that to put in the actual tree, it was all a bit
> use-once-to-test-a-theory-and-then-throw-away.)
>
> Kirill, I noticed that you included a regression test that is failing.  Can
> this possibly be stable across machines or even on the same machine?
> Does it still pass for you or did something change on the master
> branch to add a new WAL record since you posted the patch?

Thank you for testing the patch and running extension checks. I assume
the patch applies without problems.

As for the regression test, it apparently requires some rework. I
didn't pay enough attention to making sure the data I check is
actually meaningful and isolated enough to be repeatable.

Please consider the extension part of the patch as WIP; I'll resubmit
the patch once I get a stable and meaningful test up. Thanks for
finding it!

>   query                                      | calls | rows | wal_write_bytes | wal_write_records
>  --------------------------------------------+-------+------+-----------------+-------------------
> - CREATE INDEX test_b ON test(b)             |     1 |    0 |            1673 |                16
> - DROP FUNCTION IF EXISTS PLUS_ONE(INTEGER)  |     1 |    0 |              56 |                 1
> + CREATE INDEX test_b ON test(b)             |     1 |    0 |            1755 |                17
> + DROP FUNCTION IF EXISTS PLUS_ONE(INTEGER)  |     1 |    0 |               0 |                 0



Re: WAL usage calculation patch

From: Kirill Bychik
> On Tue, Feb 18, 2020 at 06:23, Thomas Munro <thomas.munro@gmail.com> wrote:
> > On Mon, Feb 10, 2020 at 8:20 PM Craig Ringer <craig@2ndquadrant.com> wrote:
> > > On Wed, 5 Feb 2020 at 21:36, Kirill Bychik <kirill.bychik@gmail.com> wrote:
> > > > The patch is separated into two parts: core changes and
> > > > pg_stat_statements additions. Essentially, the extension has its
> > > > schema updated to add two more fields, with docs updated to reflect
> > > > the change. The patch is prepared against the master branch.
> > > >
> > > > Please provide your comments and/or code findings.
> > >
> > > I like the concept, I'm a big fan of anything that affordably improves
> > > visibility into Pg's I/O and activity.
> >
> > +1
> >
> > > To date I've been relying on tools like systemtap to do this sort of
> > > thing. But that's a bit specialised, and Pg currently lacks useful
> > > instrumentation for it so it can be a pain to match up activity by
> > > parallel workers and that sort of thing. (I aim to find time to submit
> > > a patch for that.)
> >
> > (I'm interested in seeing your conference talk about that!  I did a
> > bunch of stuff with static probes to measure PHJ behaviour around
> > barrier waits and so on but it was hard to figure out what stuff like
> > that to put in the actual tree, it was all a bit
> > use-once-to-test-a-theory-and-then-throw-away.)
> >
> > Kirill, I noticed that you included a regression test that is failing.  Can
> > this possibly be stable across machines or even on the same machine?
> > Does it still pass for you or did something change on the master
> > branch to add a new WAL record since you posted the patch?
>
> Thank you for testing the patch and running extension checks. I assume
> the patch applies without problems.
>
> As for the regression test, it apparently requires some rework. I
> didn't pay enough attention to making sure the data I check is
> actually meaningful and isolated enough to be repeatable.
>
> Please consider the extension part of the patch as WIP; I'll resubmit
> the patch once I get a stable and meaningful test up. Thanks for
> finding it!
>

I have reworked the extension regression test to be more isolated.
Apparently, something merged into the master branch shifted my numbers.

PFA the new patch. The core part didn't change a bit; the extension
part has its regression test SQL and expected output changed.

Looking forward to new comments.

Attachment

Re: WAL usage calculation patch

From: Julien Rouhaud
On Thu, Feb 20, 2020 at 06:56:27PM +0300, Kirill Bychik wrote:
> > On Tue, Feb 18, 2020 at 06:23, Thomas Munro <thomas.munro@gmail.com> wrote:
> > > On Mon, Feb 10, 2020 at 8:20 PM Craig Ringer <craig@2ndquadrant.com> wrote:
> > > > On Wed, 5 Feb 2020 at 21:36, Kirill Bychik <kirill.bychik@gmail.com> wrote:
> > > > > The patch is separated into two parts: core changes and
> > > > > pg_stat_statements additions. Essentially, the extension has its
> > > > > schema updated to add two more fields, with docs updated to reflect
> > > > > the change. The patch is prepared against the master branch.
> > > > >
> > > > > Please provide your comments and/or code findings.
> > > >
> > > > I like the concept, I'm a big fan of anything that affordably improves
> > > > visibility into Pg's I/O and activity.
> > >
> > > +1

Huge +1 too.

> > Thank you for testing the patch and running extension checks. I assume
> > the patch applies without problems.
> >
> > As for the regression test, it apparently requires some rework. I
> > didn't pay enough attention to making sure the data I check is
> > actually meaningful and isolated enough to be repeatable.
> >
> > Please consider the extension part of the patch as WIP; I'll resubmit
> > the patch once I get a stable and meaningful test up. Thanks for
> > finding it!
> >
>
> I have reworked the extension regression test to be more isolated.
> Apparently, something merged into the master branch shifted my numbers.
>
> PFA the new patch. The core part didn't change a bit; the extension
> part has its regression test SQL and expected output changed.

I'm quite worried about the stability of those counters for regression tests.
Wouldn't a checkpoint happening during the test change them?

While at it, did you consider adding a full-page image counter in the WalUsage?
That's something I'd really like to have and it doesn't seem hard to integrate.

Another point is that this patch won't help to see autovacuum activity.
As an example, I did a quick test to store the information in pgstat,
sending the data in the PG_FINALLY part of vacuum():

rjuju=# create table t1(id integer, val text);
CREATE TABLE
rjuju=# insert into t1 select i, 'val ' || i from generate_series(1, 100000) i;
INSERT 0 100000
rjuju=# vacuum t1;
VACUUM
rjuju=# select datname, vac_wal_records, vac_wal_bytes, autovac_wal_records, autovac_wal_bytes
from pg_stat_database where datname = 'rjuju';
 datname | vac_wal_records | vac_wal_bytes | autovac_wal_records | autovac_wal_bytes
---------+-----------------+---------------+---------------------+-------------------
 rjuju   |             547 |         65201 |                   0 |                 0
(1 row)

rjuju=# delete from t1 where id % 2 = 0;
DELETE 50000
rjuju=# select pg_sleep(60);
 pg_sleep
----------

(1 row)

rjuju=# select datname, vac_wal_records, vac_wal_bytes, autovac_wal_records, autovac_wal_bytes
from pg_stat_database where datname = 'rjuju';
 datname | vac_wal_records | vac_wal_bytes | autovac_wal_records | autovac_wal_bytes
---------+-----------------+---------------+---------------------+-------------------
 rjuju   |             547 |         65201 |                1631 |            323193
(1 row)

That seems like useful data (especially since I recently had to dig
into a problematic WAL consumption issue that was due to some
autovacuum activity), but it may seem strange to only account for
(auto)vacuum activity rather than globally, grouping per RmgrId or
CommandTag for instance.  We could then see the complete WAL usage per
database.  What do you think?

Some minor points I noticed:

- the extension patch doesn't apply anymore, I guess since 70a7732007bc4689

 #define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
+#define PARALLEL_KEY_WAL_USAGE         UINT64CONST(0xE000000000000010)

Shouldn't it be 0xA rather than 0x10?
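
That is, presumably:

+#define PARALLEL_KEY_WAL_USAGE         UINT64CONST(0xE00000000000000A)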

- it would be better to add a version number to the patches, so we're sure
  which one we're talking about.



Re: WAL usage calculation patch

From: Michael Paquier
On Wed, Mar 04, 2020 at 05:02:25PM +0100, Julien Rouhaud wrote:
> I'm quite worried about the stability of those counters for regression tests.
> Wouldn't a checkpoint happening during the test change them?

Yep.  One way to get around that would be to test whether this output
is non-zero, though at a quick glance I suspect that this won't be
entirely reliable either.

> While at it, did you consider adding a full-page image counter in the WalUsage?
> That's something I'd really like to have and it doesn't seem hard to integrate.

FWIW, one reason here is that we recently had some benchmark work done
internally where this would have been helpful in studying some spiky
WAL load patterns.
--
Michael

Attachment

Fwd: WAL usage calculation patch

From: Kirill Bychik
> I'm quite worried about the stability of those counters for regression tests.
> Wouldn't a checkpoint happening during the test change them?

Agreed, the stability of the test could be an issue: even a shift in
the record format, a change of compression method, or other compatible
changes could break such a test. Frankly speaking, the expected
numbers are not actually calculated; my logic was rather well
described by "these numbers should be non-zero for real tables". I
believe the test can be modified to check that the numbers are above
zero, both for bytes written and for records stored.

Having a checkpoint in the middle of the test can be almost 100%
countered by triggering one before the test. I'll add a checkpoint
call to the test scenario, if there are no objections here.

> While at it, did you consider adding a full-page image counter in the WalUsage?
> That's something I'd really like to have and it doesn't seem hard to integrate.

Well, not sure I understand you 100%, being new to Postgres dev. Do
you want a separate counter for pages written whenever doPageWrites is
true? I can do that, if needed. Please confirm.

> Another point is that this patch won't help to see autovacuum activity.
> As an example, I did a quick te.....
> ...LONG QUOTE...
> but that may seem strange to only account for (auto)vacuum activity, rather
> than globally, grouping per RmgrId or CommandTag for instance.  We could then
> see the complete WAL usage per-database.  What do you think?

I wanted to keep the patch small and simple, and fit for practical
needs. This patch is supposed to provide tuning assistance, catching
an IO-heavy query in a commit-bound situation.
Total WAL usage per DB can be assessed rather easily using other means.
Let's get this change into the codebase and then work on connecting
WAL usage to (auto)vacuum stats.

>
> Some minor points I noticed:
>
> - the extension patch doesn't apply anymore, I guess since 70a7732007bc4689

Will fix, thank you.

>
>  #define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009)
> +#define PARALLEL_KEY_WAL_USAGE         UINT64CONST(0xE000000000000010)
>
> Shouldn't it be 0xA rather than 0x10?

Oww, my bad, this is embarrassing! Will fix, thank you.

> - it would be better to add a version number to the patches, so we're sure
>   which one we're talking about.

Noted, thank you.

Please comment on the proposed changes, I will cook up a new version
once all are agreed upon.



Re: WAL usage calculation patch

From: Julien Rouhaud
On Thu, Mar 5, 2020 at 8:55 PM Kirill Bychik <kirill.bychik@gmail.com> wrote:
>
> > While at it, did you consider adding a full-page image counter in the WalUsage?
> > That's something I'd really like to have and it doesn't seem hard to integrate.
>
> Well, not sure I understand you 100%, being new to Postgres dev. Do
> you want a separate counter for pages written whenever doPageWrites is
> true? I can do that, if needed. Please confirm.

Yes, I meant a separate 3rd counter for the number of full page images
written.  However after a quick look I think that a FPI should be
detected with (doPageWrites && fpw_lsn != InvalidXLogRecPtr && fpw_lsn
<= RedoRecPtr).
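
In code, that would amount to something like the following (just a
sketch; the exact placement, e.g. in XLogInsertRecord(), remains to be
worked out):

    /* count a full page image using the condition above */
    if (doPageWrites &&
        fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr)
        pgWalUsage.wal_fp_records++;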

> > Another point is that this patch won't help to see autovacuum activity.
> > As an example, I did a quick te.....
> > ...LONG QUOTE...
> > but that may seem strange to only account for (auto)vacuum activity, rather
> > than globally, grouping per RmgrId or CommandTag for instance.  We could then
> > see the complete WAL usage per-database.  What do you think?
>
> I wanted to keep the patch small and simple, and fit for practical
> needs. This patch is supposed to provide tuning assistance, catching
> an IO-heavy query in a commit-bound situation.
> Total WAL usage per DB can be assessed rather easily using other means.
> Let's get this change into the codebase and then work on connecting
> WAL usage to (auto)vacuum stats.

I agree that having a view of the full activity is a way bigger scope,
so it could be done later (and at that point in pg14), but I'm still
hoping that we can get insight into other backends' WAL activity, such
as autovacuum, in pg13.



Re: WAL usage calculation patch

From: Kirill Bychik
On Fri, Mar 6, 2020 at 20:14, Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Thu, Mar 5, 2020 at 8:55 PM Kirill Bychik <kirill.bychik@gmail.com> wrote:
> >
> > > While at it, did you consider adding a full-page image counter in the WalUsage?
> > > That's something I'd really like to have and it doesn't seem hard to integrate.
> >
> > Well, not sure I understand you 100%, being new to Postgres dev. Do
> > you want a separate counter for pages written whenever doPageWrites is
> > true? I can do that, if needed. Please confirm.
>
> Yes, I meant a separate 3rd counter for the number of full page images
> written.  However after a quick look I think that a FPI should be
> detected with (doPageWrites && fpw_lsn != InvalidXLogRecPtr && fpw_lsn
> <= RedoRecPtr).

This seems easy, will implement once I get some spare time.

> > > Another point is that this patch won't help to see autovacuum activity.
> > > As an example, I did a quick te.....
> > > ...LONG QUOTE...
> > > but that may seem strange to only account for (auto)vacuum activity, rather
> > > than globally, grouping per RmgrId or CommandTag for instance.  We could then
> > > see the complete WAL usage per-database.  What do you think?
> >
> > > I wanted to keep the patch small and simple, and fit for practical
> > > needs. This patch is supposed to provide tuning assistance, catching
> > > an IO-heavy query in a commit-bound situation.
> > > Total WAL usage per DB can be assessed rather easily using other means.
> > > Let's get this change into the codebase and then work on connecting
> > > WAL usage to (auto)vacuum stats.
>
> I agree that having a view of the full activity is a way bigger scope,
> so it could be done later (and at that point in pg14), but I'm still
> hoping that we can get insight into other backends' WAL activity, such
> as autovacuum, in pg13.

How do you think this information should be exposed? Via pg_stat_statements?

Anyway, I believe this change could be bigger than FPI. I propose
planning a separate patch for it, or even adding it to the TODO after
the core WAL usage patch is merged.

Please expect a new patch version next week, with FPI counters added.



Re: WAL usage calculation patch

From: Julien Rouhaud
On Fri, Mar 6, 2020 at 6:59 PM Kirill Bychik <kirill.bychik@gmail.com> wrote:
>
> On Fri, Mar 6, 2020 at 20:14, Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Thu, Mar 5, 2020 at 8:55 PM Kirill Bychik <kirill.bychik@gmail.com> wrote:
> > > I wanted to keep the patch small and simple, and fit for practical
> > > needs. This patch is supposed to provide tuning assistance, catching
> > > an IO-heavy query in a commit-bound situation.
> > > Total WAL usage per DB can be assessed rather easily using other means.
> > > Let's get this change into the codebase and then work on connecting
> > > WAL usage to (auto)vacuum stats.
> >
> > I agree that having a view of the full activity is a way bigger scope,
> > so it could be done later (and at that point in pg14), but I'm still
> > hoping that we can get insight into other backends' WAL activity, such
> > as autovacuum, in pg13.
>
> How do you think this information should be exposed? Via pg_stat_statements?

That's unlikely, since autovacuum won't trigger any hook.  I was
thinking of some new view for pgstats, similar to the example I
showed previously. The implementation is straightforward, although
pg_stat_database is maybe not the best choice here.

> Anyway, I believe this change could be bigger than FPI. I propose
> planning a separate patch for it, or even adding it to the TODO after
> the core WAL usage patch is merged.

Just in case, if the problem is a lack of time, I'd be happy to help
on that if needed.  Otherwise, I'll definitely not try to block any
progress for the feature as proposed.

> Please expect a new patch version next week, with FPI counters added.

Thanks!



Re: WAL usage calculation patch

From: Kirill Bychik
> > > On Thu, Mar 5, 2020 at 8:55 PM Kirill Bychik <kirill.bychik@gmail.com> wrote:
> > > > I wanted to keep the patch small and simple, and fit for practical
> > > > needs. This patch is supposed to provide tuning assistance, catching
> > > > an IO-heavy query in a commit-bound situation.
> > > > Total WAL usage per DB can be assessed rather easily using other means.
> > > > Let's get this change into the codebase and then work on connecting
> > > > WAL usage to (auto)vacuum stats.
> > >
> > > I agree that having a view of the full activity is a way bigger scope,
> > > so it could be done later (and at that point in pg14), but I'm still
> > > hoping that we can get insight into other backends' WAL activity, such
> > > as autovacuum, in pg13.
> >
> > How do you think this information should be exposed? Via pg_stat_statements?
>
> That's unlikely, since autovacuum won't trigger any hook.  I was
> thinking of some new view for pgstats, similar to the example I
> showed previously. The implementation is straightforward, although
> pg_stat_database is maybe not the best choice here.

After extensive thinking and some code diving, I did not manage to
come up with a sane idea on how to expose data about autovacuum WAL
usage. Must be the flu.

> > Anyway, I believe this change could be bigger than FPI. I propose
> > planning a separate patch for it, or even adding it to the TODO after
> > the core WAL usage patch is merged.
>
> Just in case, if the problem is a lack of time, I'd be happy to help
> on that if needed.  Otherwise, I'll definitely not try to block any
> progress for the feature as proposed.

Please feel free to work on any extension of this patch idea. I lack
both time and knowledge to do it all by myself.

> > Please expect a new patch version next week, with FPI counters added.

Please find attached patch version 003, with FP writes and minor
corrections. Hope I use attachment versioning as expected in this
group :)

The test has been reworked, and I believe the part which checks that
WAL is written and that there is a correlation between affected rows
and WAL records should be stable now. I still have no idea how to test
full-page writes against regular updates; it seems very unstable.
Please share ideas if you have any.

Thanks!

Attachment

Re: WAL usage calculation patch

From: Julien Rouhaud
On Sun, Mar 15, 2020 at 09:52:18PM +0300, Kirill Bychik wrote:
> > > > On Thu, Mar 5, 2020 at 8:55 PM Kirill Bychik <kirill.bychik@gmail.com> wrote:
> After extensive thinking and some code diving, I did not manage to
> come up with a sane idea on how to expose data about autovacuum WAL
> usage. Must be the flu.
>
> > > Anyway, I believe this change could be bigger than FPI. I propose
> > > planning a separate patch for it, or even adding it to the TODO
> > > after the core WAL usage patch is merged.
> >
> > Just in case, if the problem is a lack of time, I'd be happy to help
> > on that if needed.  Otherwise, I'll definitely not try to block any
> > progress for the feature as proposed.
>
> Please feel free to work on any extension of this patch idea. I lack
> both time and knowledge to do it all by myself.


I'm adding a 3rd patch on top of yours to expose the new WAL counters in
pg_stat_database, for vacuum and autovacuum.  I'm not really enthusiastic
about this approach but I didn't find anything better, and maybe this will
raise some better ideas.  The only sure thing is that we're not going to
add a bunch of new fields in pg_stat_all_tables anyway.

We can also drop this 3rd patch entirely if no one's happy about it without
impacting the first two.


> > > Please expect a new patch version next week, with FPI counters added.
>
> Please find attached patch version 003, with FP writes and minor
> corrections. Hope I use attachment versioning as expected in this
> group :)


Thanks!


> The test has been reworked, and I believe the part which checks that
> WAL is written and that there is a correlation between affected rows
> and WAL records should be stable now. I still have no idea how to test
> full-page writes against regular updates; it seems very unstable.
> Please share ideas if you have any.


I just reviewed the patches, and it globally looks good to me.  The way to
detect full page images looks sensible, but I'm really not familiar with that
code so additional review would be useful.

I noticed that the new wal_write_fp_records field in pg_stat_statements wasn't
used in the test.  Since I have to add all the patches to make the cfbot happy,
I slightly adapted the tests to reference the fp column too.  There was also a
minor issue in the documentation, as wal_records and wal_bytes were copy/pasted
twice while wal_write_fp_records wasn't documented, so I also changed it.

Let me know if you're ok with those changes.

Attachment

Re: WAL usage calculation patch

From: Kirill Bychik
> > Please feel free to work on any extension of this patch idea. I lack
> > both time and knowledge to do it all by myself.
>
>
> I'm adding a 3rd patch on top of yours to expose the new WAL counters in
> pg_stat_database, for vacuum and autovacuum.  I'm not really enthusiastic
> about this approach but I didn't find anything better, and maybe this will
> raise some better ideas.  The only sure thing is that we're not going to
> add a bunch of new fields in pg_stat_all_tables anyway.
>
> We can also drop this 3rd patch entirely if no one's happy about it without
> impacting the first two.

No objections about 3rd on my side, unless we miss the CF completely.

As for the code, I believe:
+ walusage.wal_records = pgWalUsage.wal_records -
+ walusage_start.wal_records;
+ walusage.wal_fp_records = pgWalUsage.wal_fp_records -
+ walusage_start.wal_fp_records;
+ walusage.wal_bytes = pgWalUsage.wal_bytes - walusage_start.wal_bytes;

Could be done much simpler via the utility:
WalUsageAccumDiff(walusage, pgWalUsage, walusage_start);
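
(For reference, a sketch of what that helper would do, mirroring
BufferUsageAccumDiff():)

    /* accumulate the delta between two WalUsage snapshots */
    void
    WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
    {
        dst->wal_records += add->wal_records - sub->wal_records;
        dst->wal_fp_records += add->wal_fp_records - sub->wal_fp_records;
        dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
    }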

On a side note, I agree the buf/wal usage API is far from perfect.

> > The test has been reworked, and I believe the part which checks that
> > WAL is written and that there is a correlation between affected rows
> > and WAL records should be stable now. I still have no idea how to test
> > full-page writes against regular updates; it seems very unstable.
> > Please share ideas if you have any.
>
>
> I just reviewed the patches, and it globally looks good to me.  The way to
> detect full page images looks sensible, but I'm really not familiar with that
> code so additional review would be useful.
>
> I noticed that the new wal_write_fp_records field in pg_stat_statements wasn't
> used in the test.  Since I have to add all the patches to make the cfbot happy,
> I slightly adapted the tests to reference the fp column too.  There was also a
> minor issue in the documentation, as wal_records and wal_bytes were copy/pasted
> twice while wal_write_fp_records wasn't documented, so I also changed it.
>
> Let me know if you're ok with those changes.

Sorry for not getting wal_fp_usage into the docs, my fault.

As for the tests, please get somebody else to review this. I strongly
believe checking full page writes here could be a source of
instability.



Re: WAL usage calculation patch

From: Julien Rouhaud
On Tue, Mar 17, 2020 at 10:27:05PM +0300, Kirill Bychik wrote:
> > > Please feel free to work on any extension of this patch idea. I lack
> > > both time and knowledge to do it all by myself.
> >
> > I'm adding a 3rd patch on top of yours to expose the new WAL counters in
> > pg_stat_database, for vacuum and autovacuum.  I'm not really enthusiastic
> > about this approach but I didn't find anything better, and maybe this will
> > raise some better ideas.  The only sure thing is that we're not going to
> > add a bunch of new fields in pg_stat_all_tables anyway.
> >
> > We can also drop this 3rd patch entirely if no one's happy about it without
> > impacting the first two.
>
> No objections about 3rd on my side, unless we miss the CF completely.
>
> As for the code, I believe:
> + walusage.wal_records = pgWalUsage.wal_records -
> + walusage_start.wal_records;
> + walusage.wal_fp_records = pgWalUsage.wal_fp_records -
> + walusage_start.wal_fp_records;
> + walusage.wal_bytes = pgWalUsage.wal_bytes - walusage_start.wal_bytes;
>
> Could be done much simpler via the utility:
> WalUsageAccumDiff(walusage, pgWalUsage, walusage_start);


Indeed, but this function is private to instrument.c.  AFAICT
pg_stat_statements is already duplicating similar code for buffers rather than
having BufferUsageAccumDiff being exported, so I chose the same approach.

I'd be in favor of exporting both functions though.


> On a side note, I agree the buf/wal usage API is far from perfect.


Yes clearly.


> > > The test has been reworked, and I believe the part which checks that
> > > WAL is written and that there is a correlation between affected rows
> > > and WAL records should be stable now. I still have no idea how to test
> > > full-page writes against regular updates; it seems very unstable.
> > > Please share ideas if you have any.
> >
> >
> > I just reviewed the patches, and it globally looks good to me.  The way to
> > detect full page images looks sensible, but I'm really not familiar with that
> > code so additional review would be useful.
> >
> > I noticed that the new wal_write_fp_records field in pg_stat_statements wasn't
> > used in the test.  Since I have to add all the patches to make the cfbot happy,
> > I slightly adapted the tests to reference the fp column too.  There was also a
> > minor issue in the documentation, as wal_records and wal_bytes were copy/pasted
> > twice while wal_write_fp_records wasn't documented, so I also changed it.
> >
> > Let me know if you're ok with those changes.
>
> Sorry for not getting wal_fp_usage into the docs, my fault.
>
> As for the tests, please get somebody else to review this. I strongly
> believe checking full page writes here could be a source of
> instability.


I'm also a little bit dubious about it.  The initial checkpoint should make
things stable (of course unless full_page_writes is disabled), and Cfbot also
seems happy about it.  At least keeping it for the temporary tables test
shouldn't be a problem.



Re: WAL usage calculation patch

From: Kirill Bychik
> > > > Please feel free to work on any extension of this patch idea. I lack
> > > > both time and knowledge to do it all by myself.
> > >
> > > I'm adding a 3rd patch on top of yours to expose the new WAL counters in
> > > pg_stat_database, for vacuum and autovacuum.  I'm not really enthusiastic
> > > about this approach but I didn't find anything better, and maybe this will
> > > raise some better ideas.  The only sure thing is that we're not going to
> > > add a bunch of new fields in pg_stat_all_tables anyway.
> > >
> > > We can also drop this 3rd patch entirely if no one's happy about it without
> > > impacting the first two.
> >
> > No objections about 3rd on my side, unless we miss the CF completely.
> >
> > As for the code, I believe:
> > + walusage.wal_records = pgWalUsage.wal_records -
> > + walusage_start.wal_records;
> > + walusage.wal_fp_records = pgWalUsage.wal_fp_records -
> > + walusage_start.wal_fp_records;
> > + walusage.wal_bytes = pgWalUsage.wal_bytes - walusage_start.wal_bytes;
> >
> > Could be done much simpler via the utility:
> > WalUsageAccumDiff(walusage, pgWalUsage, walusage_start);
>
>
> Indeed, but this function is private to instrument.c.  AFAICT
> pg_stat_statements is already duplicating similar code for buffers rather than
> having BufferUsageAccumDiff being exported, so I chose the same approach.
>
> I'd be in favor of exporting both functions though.
> > On a side note, I agree the buf/wal usage API is far from perfect.
>
>
> Yes clearly.

There is a higher-level Instrumentation API that can be used with the
INSTRUMENT_WAL flag to collect the WAL usage information. I believe
the instrumentation is widely used in the executor code, so it should
not be a problem to collect instrumentation information at the
autovacuum worker level.

Just a recommendation/chat, though. I am happy with the way the data
is collected now. If you commit this variant, please add a TODO to
rework WAL usage to the common instrumentation API.
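
A rough sketch of what that could look like (the INSTRUMENT_WAL flag
would be a new instrument option; the exact calls here are
illustrative, not from the patch):

    /* hypothetical use of the instrumentation API around a vacuum run */
    Instrumentation *instr = InstrAlloc(1, INSTRUMENT_WAL);

    InstrStartNode(instr);
    /* ... perform the (auto)vacuum work ... */
    InstrStopNode(instr, 0);
    InstrEndLoop(instr);
    /* instr would now carry the accumulated WAL usage for the run */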

> > > > The test has been reworked, and I believe the part which checks that
> > > > WAL is written and that there is a correlation between affected rows
> > > > and WAL records should be stable now. I still have no idea how to test
> > > > full-page writes against regular updates; it seems very unstable.
> > > > Please share ideas if you have any.
> > >
> > >
> > > I just reviewed the patches, and it globally looks good to me.  The way to
> > > detect full page images looks sensible, but I'm really not familiar with that
> > > code so additional review would be useful.
> > >
> > > I noticed that the new wal_write_fp_records field in pg_stat_statements wasn't
> > > used in the test.  Since I have to add all the patches to make the cfbot happy,
> > > I slightly adapted the tests to reference the fp column too.  There was also a
> > > minor issue in the documentation, as wal_records and wal_bytes were copy/pasted
> > > twice while wal_write_fp_records wasn't documented, so I also changed it.
> > >
> > > Let me know if you're ok with those changes.
> >
> > Sorry for not getting wal_fp_usage into the docs, my fault.
> >
> > As for the tests, please get somebody else to review this. I strongly
> > believe checking full page writes here could be a source of
> > instability.
>
>
> I'm also a little bit dubious about it.  The initial checkpoint should make
> things stable (of course unless full_page_writes is disabled), and Cfbot also
> seems happy about it.  At least keeping it for the temporary tables test
> shouldn't be a problem.

Temp tables should show zero FPI WAL records, true :)

I have no objections to the patch.



Re: WAL usage calculation patch

From: Julien Rouhaud
On Wed, Mar 18, 2020 at 09:02:58AM +0300, Kirill Bychik wrote:
>
> There is a higher-level Instrumentation API that can be used with the
> INSTRUMENT_WAL flag to collect the WAL usage information. I believe
> the instrumentation is widely used in the executor code, so it should
> not be a problem to collect instrumentation information at the
> autovacuum worker level.
>
> Just a recommendation/chat, though. I am happy with the way the data
> is collected now. If you commit this variant, please add a TODO to
> rework WAL usage to the common instrumentation API.


The instrumentation is somewhat intended to be used with executor nodes,
not backend commands.  I don't see a real technical reason that would
prevent that, but I prefer to keep things as-is for now, as it sounds
less controversial.  This is for the 3rd patch, which may not even be
considered for this CF anyway.


> > > As for the tests, please get somebody else to review this. I strongly
> > > believe checking full page writes here could be a source of
> > > instability.
> >
> >
> > I'm also a little bit dubious about it.  The initial checkpoint should make
> > things stable (of course unless full_page_writes is disabled), and Cfbot also
> > seems happy about it.  At least keeping it for the temporary tables test
> > shouldn't be a problem.
>
> Temp tables should show zero FPI WAL records, true :)
>
> I have no objections to the patch.


I'm attaching a v5 with fp records only for temp tables, so there's no risk of
instability.  As I previously said I'm fine with your two patches, so unless
you have objections on the fpi test for temp tables or the documentation
changes, I believe those should be ready for committer.

Attachment

Re: WAL usage calculation patch

From: Kirill Bychik
> > There is a higher-level Instrumentation API that can be used with the
> > INSTRUMENT_WAL flag to collect the WAL usage information. I believe
> > the instrumentation is widely used in the executor code, so it should
> > not be a problem to collect instrumentation information at the
> > autovacuum worker level.
> >
> > Just a recommendation/chat, though. I am happy with the way the data
> > is collected now. If you commit this variant, please add a TODO to
> > rework WAL usage to the common instrumentation API.
>
>
> The instrumentation is somewhat intended to be used with executor nodes,
> not backend commands.  I don't see a real technical reason that would
> prevent that, but I prefer to keep things as-is for now, as it sounds
> less controversial.  This is for the 3rd patch, which may not even be
> considered for this CF anyway.
>
>
> > > > As for the tests, please get somebody else to review this. I strongly
> > > > believe checking full page writes here could be a source of
> > > > instability.
> > >
> > >
> > > I'm also a little bit dubious about it.  The initial checkpoint should make
> > > things stable (of course unless full_page_writes is disabled), and Cfbot also
> > > seems happy about it.  At least keeping it for the temporary tables test
> > > shouldn't be a problem.
> >
> > Temp tables should show zero FPI WAL records, true :)
> >
> > I have no objections to the patch.
>
>
> I'm attaching a v5 with fp records only for temp tables, so there's no risk of
> instability.  As I previously said I'm fine with your two patches, so unless
> you have objections on the fpi test for temp tables or the documentation
> changes, I believe those should be ready for committer.

No objections on my side either. Thank you for your review, time and efforts!



Re: WAL usage calculation patch

From: Julien Rouhaud
On Wed, Mar 18, 2020 at 08:48:17PM +0300, Kirill Bychik wrote:
> > I'm attaching a v5 with fp records only for temp tables, so there's no risk of
> > instability.  As I previously said I'm fine with your two patches, so unless
> > you have objections on the fpi test for temp tables or the documentation
> > changes, I believe those should be ready for committer.
>
> No objections on my side either. Thank you for your review, time and efforts!


Great, thanks also for the patches and efforts!  I'll mark the entry as RFC.



Re: WAL usage calculation patch

From: Fujii Masao

On 2020/03/19 2:19, Julien Rouhaud wrote:
> On Wed, Mar 18, 2020 at 09:02:58AM +0300, Kirill Bychik wrote:
>>
>> There is a higher-level Instrumentation API that can be used with the
>> INSTRUMENT_WAL flag to collect the WAL usage information. I believe
>> the instrumentation is widely used in the executor code, so it should
>> not be a problem to collect instrumentation information at the
>> autovacuum worker level.
>>
>> Just a recommendation/chat, though. I am happy with the way the data
>> is collected now. If you commit this variant, please add a TODO to
>> rework WAL usage to the common instrumentation API.
> 
> 
> The instrumentation is somewhat intended to be used with executor nodes,
> not backend commands.  I don't see a real technical reason that would
> prevent that, but I prefer to keep things as-is for now, as it sounds
> less controversial.  This is for the 3rd patch, which may not even be
> considered for this CF anyway.
> 
> 
>>>> As for the tests, please get somebody else to review this. I strongly
>>>> believe checking full page writes here could be a source of
>>>> instability.
>>>
>>>
>>> I'm also a little bit dubious about it.  The initial checkpoint should make
>>> things stable (of course unless full_page_writes is disabled), and Cfbot also
>>> seems happy about it.  At least keeping it for the temporary tables test
>>> shouldn't be a problem.
>>
>> Temp tables should show zero FPI WAL records, true :)
>>
>> I have no objections to the patch.
> 
> 
> I'm attaching a v5 with fp records only for temp tables, so there's no risk of
> instability.  As I previously said I'm fine with your two patches, so unless
> you have objections on the fpi test for temp tables or the documentation
> changes, I believe those should be ready for committer.

You added the columns into pg_stat_database, but seem to have forgotten
to update the documentation for pg_stat_database.

Is it really reasonable to add the columns for vacuum's WAL usage into
pg_stat_database? I'm not sure how useful the information about
the amount of WAL generated by vacuum per database is.
Isn't it better to make VACUUM VERBOSE and the autovacuum log include
that information instead, to see how much WAL each vacuum activity
generates? Sorry if this discussion has already happened
upthread.

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: WAL usage calculation patch

From: Julien Rouhaud
On Thu, Mar 19, 2020 at 09:03:02PM +0900, Fujii Masao wrote:
> 
> On 2020/03/19 2:19, Julien Rouhaud wrote:
> > 
> > I'm attaching a v5 with fp records only for temp tables, so there's no risk of
> > instability.  As I previously said I'm fine with your two patches, so unless
> > you have objections on the fpi test for temp tables or the documentation
> > changes, I believe those should be ready for committer.
> 
> You added the columns into pg_stat_database, but seem to have forgotten
> to update the documentation for pg_stat_database.

Ah right, I totally missed that when I tried to clean up the original POC.

> Is it really reasonable to add the columns for vacuum's WAL usage into
> pg_stat_database? I'm not sure how useful the information about
> the amount of WAL generated by vacuum per database is.

The amount per database isn't really useful, but I didn't have a better
idea on how to expose (auto)vacuum WAL usage until this:

> Isn't it better to make VACUUM VERBOSE and the autovacuum log include
> that information instead, to see how much WAL each vacuum activity
> generates? Sorry if this discussion has already happened
> upthread.

That's a way better idea!  I'm attaching the full patchset with the 3rd
patch changed to use this approach instead.  There's a bit of duplicate
code for computing the WalUsage, as I didn't find a better way to avoid
that without exposing WalUsageAccumDiff().

Autovacuum log sample:

2020-03-19 15:49:05.708 CET [5843] LOG:  automatic vacuum of table "rjuju.public.t1": index scans: 0
    pages: 0 removed, 2213 remain, 0 skipped due to pins, 0 skipped frozen
    tuples: 250000 removed, 250000 remain, 0 are dead but not yet removable, oldest xmin: 502
    buffer usage: 4448 hits, 4 misses, 4 dirtied
    avg read rate: 0.160 MB/s, avg write rate: 0.160 MB/s
    system usage: CPU: user: 0.13 s, system: 0.00 s, elapsed: 0.19 s
    WAL usage: 6643 records, 4 full page records, 1402679 bytes

VACUUM log sample:

# vacuum VERBOSE t1;
INFO:  vacuuming "public.t1"
INFO:  "t1": removed 50000 row versions in 443 pages
INFO:  "t1": found 50000 removable, 0 nonremovable row versions in 443 out of 443 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 512
There were 50000 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
1332 WAL records, 4 WAL full page records, 306901 WAL bytes
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s.
INFO:  "t1": truncated 443 to 0 pages
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  vacuuming "pg_toast.pg_toast_16385"
INFO:  index "pg_toast_16385_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "pg_toast_16385": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 513
There were 0 unused item identifiers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
0 WAL records, 0 WAL full page records, 0 WAL bytes
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM

Note that the 3rd patch is an addition on top of Kirill's original patch,
as this is information that would have been greatly helpful in some
performance issues I had to investigate recently.  I'd be happy to have it
land in v13, but if that's controversial or too late I'm happy to postpone
it to v14 if the infrastructure added in Kirill's patches can make it to
v13.

Attachment

Re: WAL usage calculation patch

From: Kirill Bychik
> > > I'm attaching a v5 with fp records only for temp tables, so there's no risk of
> > > instability.  As I previously said I'm fine with your two patches, so unless
> > > you have objections on the fpi test for temp tables or the documentation
> > > changes, I believe those should be ready for committer.
> >
> > You added the columns into pg_stat_database, but seem to have forgotten
> > to update the documentation for pg_stat_database.
>
> Ah right, I totally missed that when I tried to clean up the original POC.
>
> > Is it really reasonable to add the columns for vacuum's WAL usage into
> > pg_stat_database? I'm not sure how useful the information about
> > the amount of WAL generated by vacuum per database is.
>
> The amount per database isn't really useful, but I didn't have a better
> idea on how to expose (auto)vacuum WAL usage until this:
>
> > Isn't it better to make VACUUM VERBOSE and the autovacuum log include
> > that information instead, to see how much WAL each vacuum activity
> > generates? Sorry if this discussion has already happened
> > upthread.
>
> That's a way better idea!  I'm attaching the full patchset with the 3rd
> patch changed to use this approach instead.  There's a bit of duplicate
> code for computing the WalUsage, as I didn't find a better way to avoid
> that without exposing WalUsageAccumDiff().
>
> Autovacuum log sample:
>
> 2020-03-19 15:49:05.708 CET [5843] LOG:  automatic vacuum of table "rjuju.public.t1": index scans: 0
>         pages: 0 removed, 2213 remain, 0 skipped due to pins, 0 skipped frozen
>         tuples: 250000 removed, 250000 remain, 0 are dead but not yet removable, oldest xmin: 502
>         buffer usage: 4448 hits, 4 misses, 4 dirtied
>         avg read rate: 0.160 MB/s, avg write rate: 0.160 MB/s
>         system usage: CPU: user: 0.13 s, system: 0.00 s, elapsed: 0.19 s
>         WAL usage: 6643 records, 4 full page records, 1402679 bytes
>
> VACUUM log sample:
>
> # vacuum VERBOSE t1;
> INFO:  vacuuming "public.t1"
> INFO:  "t1": removed 50000 row versions in 443 pages
> INFO:  "t1": found 50000 removable, 0 nonremovable row versions in 443 out of 443 pages
> DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 512
> There were 50000 unused item identifiers.
> Skipped 0 pages due to buffer pins, 0 frozen pages.
> 0 pages are entirely empty.
> 1332 WAL records, 4 WAL full page records, 306901 WAL bytes
> CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s.
> INFO:  "t1": truncated 443 to 0 pages
> DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
> INFO:  vacuuming "pg_toast.pg_toast_16385"
> INFO:  index "pg_toast_16385_index" now contains 0 row versions in 1 pages
> DETAIL:  0 index row versions were removed.
> 0 index pages have been deleted, 0 are currently reusable.
> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
> INFO:  "pg_toast_16385": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
> DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 513
> There were 0 unused item identifiers.
> Skipped 0 pages due to buffer pins, 0 frozen pages.
> 0 pages are entirely empty.
> 0 WAL records, 0 WAL full page records, 0 WAL bytes
> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
> VACUUM
>
> Note that the 3rd patch is an addition on top of Kirill's original patch,
> as this is information that would have been greatly helpful in some
> performance issues I had to investigate recently.  I'd be happy to have it
> land in v13, but if that's controversial or too late I'm happy to postpone
> it to v14 if the infrastructure added in Kirill's patches can make it to
> v13.

Dear all, can we please focus on getting the core patch committed?
Given the uncertainty regarding autovacuum stats, can we please get
parts 1 and 2 into the codebase, and think about exposing autovacuum
stats later?



Re: WAL usage calculation patch

From: Fujii Masao

On 2020/03/23 7:32, Kirill Bychik wrote:
>>>> I'm attaching a v5 with fp records only for temp tables, so there's no risk of
>>>> instability.  As I previously said I'm fine with your two patches, so unless
>>>> you have objections on the fpi test for temp tables or the documentation
>>>> changes, I believe those should be ready for committer.
>>>
>>> You added the columns into pg_stat_database, but seem to have forgotten
>>> to update the documentation for pg_stat_database.
>>
>> Ah right, I totally missed that when I tried to clean up the original POC.
>>
>>> Is it really reasonable to add the columns for vacuum's WAL usage into
>>> pg_stat_database? I'm not sure how useful the information about
>>> the amount of WAL generated by vacuum per database is.
>>
>> The amount per database isn't really useful, but I didn't have a better
>> idea on how to expose (auto)vacuum WAL usage until this:
>>
>>> Isn't it better to make VACUUM VERBOSE and the autovacuum log include
>>> that information instead, to see how much WAL each vacuum activity
>>> generates? Sorry if this discussion has already happened
>>> upthread.
>>
>> That's a way better idea!  I'm attaching the full patchset with the 3rd
>> patch changed to use this approach instead.  There's a bit of duplicate
>> code for computing the WalUsage, as I didn't find a better way to avoid
>> that without exposing WalUsageAccumDiff().
>>
>> Autovacuum log sample:
>>
>> 2020-03-19 15:49:05.708 CET [5843] LOG:  automatic vacuum of table "rjuju.public.t1": index scans: 0
>>          pages: 0 removed, 2213 remain, 0 skipped due to pins, 0 skipped frozen
>>          tuples: 250000 removed, 250000 remain, 0 are dead but not yet removable, oldest xmin: 502
>>          buffer usage: 4448 hits, 4 misses, 4 dirtied
>>          avg read rate: 0.160 MB/s, avg write rate: 0.160 MB/s
>>          system usage: CPU: user: 0.13 s, system: 0.00 s, elapsed: 0.19 s
>>          WAL usage: 6643 records, 4 full page records, 1402679 bytes
>>
>> VACUUM log sample:
>>
>> # vacuum VERBOSE t1;
>> INFO:  vacuuming "public.t1"
>> INFO:  "t1": removed 50000 row versions in 443 pages
>> INFO:  "t1": found 50000 removable, 0 nonremovable row versions in 443 out of 443 pages
>> DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 512
>> There were 50000 unused item identifiers.
>> Skipped 0 pages due to buffer pins, 0 frozen pages.
>> 0 pages are entirely empty.
>> 1332 WAL records, 4 WAL full page records, 306901 WAL bytes
>> CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s.
>> INFO:  "t1": truncated 443 to 0 pages
>> DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
>> INFO:  vacuuming "pg_toast.pg_toast_16385"
>> INFO:  index "pg_toast_16385_index" now contains 0 row versions in 1 pages
>> DETAIL:  0 index row versions were removed.
>> 0 index pages have been deleted, 0 are currently reusable.
>> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
>> INFO:  "pg_toast_16385": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
>> DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 513
>> There were 0 unused item identifiers.
>> Skipped 0 pages due to buffer pins, 0 frozen pages.
>> 0 pages are entirely empty.
>> 0 WAL records, 0 WAL full page records, 0 WAL bytes
>> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
>> VACUUM
>>
>> Note that the 3rd patch is an addition on top of Kirill's original patch,
>> as this is information that would have been greatly helpful in some
>> performance issues I had to investigate recently.  I'd be happy to have it
>> land in v13, but if that's controversial or too late I'm happy to postpone
>> it to v14 if the infrastructure added in Kirill's patches can make it to
>> v13.
> 
> Dear all, can we please focus on getting the core patch committed?
> Given the uncertainty regarding autovacuum stats, can we please get
> parts 1 and 2 into the codebase, and think about exposing autovacuum
> stats later?

Here are the comments for 0001 patch.

+            /*
+             * Report a full page image constructed for the WAL record
+             */
+            pgWalUsage.wal_fp_records++;

Isn't it better to use "fpw" or "fpi" for the variable name rather than
"fp" here? In other places, "fpw" and "fpi" are used for full page
writes/image.

ISTM that this counter could be incorrect if XLogInsertRecord() determines to
calculate again whether FPI is necessary or not. No? IOW, this issue could
happen if XLogInsert() calls  XLogRecordAssemble() multiple times in
its do-while loop. Isn't this problematic?
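
One way to sidestep that (just a sketch, assuming the existing
"inserted" flag in XLogInsertRecord()) could be to bump the counters
only after the record has actually been inserted, outside the retry
loop:

    if (inserted)
    {
        /* retries of XLogRecordAssemble() would no longer double count */
        pgWalUsage.wal_records++;
        pgWalUsage.wal_bytes += rechdr->xl_tot_len;
    }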

+    long        wal_bytes;        /* size of wal records produced */

Isn't it safer to use uint64 (i.e., XLogRecPtr) as the type of this variable
rather than long?
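
That is, presumably:

+    uint64      wal_bytes;      /* size of wal records produced */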

+    shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, bufusage_space);

bufusage_space should be walusage_space here?
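
That is, presumably:

+    shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);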

/*
  * Finish parallel execution.  We wait for parallel workers to finish, and
  * accumulate their buffer usage.
  */

There are some comments mentioning buffer usage, in execParallel.c.
For example, the top comment for ExecParallelFinish(), as the above.
These should be updated.

Regards,


-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: WAL usage calculation patch

From: Fujii Masao

On 2020/03/23 21:01, Fujii Masao wrote:
> [...]

Here are the comments for the 0002 patch.

+    OUT wal_write_bytes int8,
+    OUT wal_write_records int8,
+    OUT wal_write_fp_records int8

Isn't "write" part in the column names confusing because it's WAL
*generated* (not written) by the statement?

+RETURNS SETOF record
+AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4'
+LANGUAGE C STRICT VOLATILE;

PARALLEL SAFE should be specified?
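
For reference, PARALLEL SAFE just needs to be appended after the LANGUAGE
clause; a self-contained toy example of the syntax:

    CREATE FUNCTION add_one(i int) RETURNS int
    AS $$ SELECT i + 1 $$
    LANGUAGE SQL STRICT VOLATILE PARALLEL SAFE;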

+/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */

ISTM it's good timing to also have pg_stat_statements--1.8.sql since
the definition of pg_stat_statements() is changed. Thoughts?

+-- CHECKPOINT before WAL tests to ensure test stability
+CHECKPOINT;

Is this true? I thought you added this because the number of FPI
should be larger than zero in the subsequent test. No? But there
seems to be no such test. I'm not excited about adding a test checking
the number of FPI because it looks fragile, though...

+UPDATE pgss_test SET b = '333' WHERE a = 3 \;
+UPDATE pgss_test SET b = '444' WHERE a = 4 ;

Could you tell me why several queries need to be run to test
the WAL usage? Isn't running a few queries enough for the test purpose?

Regards,

-- 
Fujii Masao
NTT DATA CORPORATION
Advanced Platform Technology Group
Research and Development Headquarters



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Mon, Mar 23, 2020 at 3:24 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> [...]

FTR I marked the commitfest entry as waiting on author.

Kirill, do you think you'll have time to address Fujii-san's review
shortly?  The end of the commitfest is approaching quite fast :(



Re: WAL usage calculation patch

From
Kirill Bychik
Date:
> > [...]
>
> FTR I marked the commitfest entry as waiting on author.
>
> Kirill, do you think you'll have time to address Fujii-san's review
> shortly?  The end of the commitfest is approaching quite fast :(

All these are really valuable objections. Unfortunately, I won't be
able to get it all sorted out soon, due to a total lack of time. I would be
very glad if somebody could step in for this patch.



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Fri, Mar 27, 2020 at 8:21 PM Kirill Bychik <kirill.bychik@gmail.com> wrote:
>
> > [...]
>
> All these are really valuable objections. Unfortunately, I won't be
> able to get it all sorted out soon, due to a total lack of time. I would be
> very glad if somebody could step in for this patch.

I'll try to do that tomorrow!



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Sat, Mar 28, 2020 at 12:54 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Fri, Mar 27, 2020 at 8:21 PM Kirill Bychik <kirill.bychik@gmail.com> wrote:
> >
> >
> > All these are really valuable objections. Unfortunately, I won't be
> > able to get it all sorted out soon, due to a total lack of time. I would be
> > very glad if somebody could step in for this patch.
>
> I'll try to do that tomorrow!
>

I see some basic problems with the patch.  The way it tries to compute
WAL usage for parallel stuff doesn't seem right to me.  Can you share
or point me to any test done where we have computed WAL for parallel
operations like Parallel Vacuum or Parallel Create Index?  Basically,
I don't know whether the changes done in ExecInitParallelPlan and friends
allow us to compute WAL for parallel operations.  Those will primarily
cover parallel queries that won't write WAL.  How have you tested those
changes?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote:
> 
> I see some basic problems with the patch.  The way it tries to compute
> WAL usage for parallel stuff doesn't seem right to me.  Can you share
> or point me to any test done where we have computed WAL for parallel
> operations like Parallel Vacuum or Parallel Create Index?

Ah, that's indeed a good point and AFAICT WAL records from parallel utility
workers won't be accounted for.  That being said, I think that an argument
could be made that proper infrastructure should have been added in the original
parallel utility patches, as pg_stat_statements is already broken wrt. buffer
usage in parallel utilities, unless I'm missing something.

> Basically,
> I don't know whether the changes done in ExecInitParallelPlan and friends
> allow us to compute WAL for parallel operations.  Those will primarily cover
> parallel queries that won't write WAL.  How have you tested those
> changes?

I didn't test those, and I'm not even sure how to properly and reliably test
that.  Do you have any advice on how to achieve that?

However the patch is mimicking the buffer instrumentation that already exists,
and the approach also looks correct to me.  Do you have a reason to believe
that the approach that works for buffer usage wouldn't work for WAL records? (I
of course agree that this should be tested anyway)



On Sat, Mar 28, 2020 at 02:38:27PM +0100, Julien Rouhaud wrote:
> On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote:
> > 
> > I see some basic problems with the patch.  The way it tries to compute
> > WAL usage for parallel stuff doesn't seem right to me.  Can you share
> > or point me to any test done where we have computed WAL for parallel
> > operations like Parallel Vacuum or Parallel Create Index?
> 
> Ah, that's indeed a good point and AFAICT WAL records from parallel utility
> workers won't be accounted for.  That being said, I think that an argument
> could be made that proper infrastructure should have been added in the original
> parallel utility patches, as pg_stat_statements is already broken wrt. buffer
> usage in parallel utilities, unless I'm missing something.

Just to be sure, I did a quick test of pg_stat_statements' behavior using
parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer usage
doesn't reflect parallel workers' activity.

I added an open item for that, and added Robert in Cc as 9da0cc352 is the
first commit adding parallel maintenance.



On Sat, Mar 28, 2020 at 8:47 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Sat, Mar 28, 2020 at 02:38:27PM +0100, Julien Rouhaud wrote:
> > [...]
>
> Just to be sure, I did a quick test of pg_stat_statements' behavior using
> parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer usage
> doesn't reflect parallel workers' activity.
>

Sawada-San, would you like to investigate this? If not, I will look into
this next week.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Sat, Mar 28, 2020 at 7:08 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote:
> >
> > Basically,
> > I don't know changes done in ExecInitParallelPlan and friends allow us
> > to compute WAL for parallel operations.  Those will primarily cover
> > parallel queries that won't write WAL.  How you have tested those
> > changes?
>
> I didn't test those, and I'm not even sure how to properly and reliably test
> that.  Do you have any advice on how to achieve that?
>
> However the patch is mimicking the buffer instrumentation that already exists,
> and the approach also looks correct to me.  Do you have a reason to believe
> that the approach that works for buffer usage wouldn't work for WAL records? (I
> of course agree that this should be tested anyway)
>

The buffer usage infrastructure is for read-only queries (for ex. for
stats like blks_hit, blks_read).  As far as I can tell, there is no
easy way to test the WAL usage via that API.  It might or might not be
required in the future depending on whether we decide to use the same
infrastructure for parallel writes.  I think for now we should remove
that part of the changes and rather think about how to get that for
parallel operations that can write WAL.  For ex. we might need to do
something similar to what this patch has done in begin_parallel_vacuum
and end_parallel_vacuum.  Would you like to attempt that?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



On Sun, 29 Mar 2020 at 14:23, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Mar 28, 2020 at 8:47 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > [...]
>
> Sawada-San, would you like to investigate this? If not, I will look into
> this next week.

Sure, I'll investigate this issue today.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Sun, 29 Mar 2020 at 15:19, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Sun, 29 Mar 2020 at 14:23, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > [...]
> > Sawada-San, would you like to investigate this? If not, I will look into
> > this next week.
>
> Sure, I'll investigate this issue today.
>

I've run vacuum with and without parallel workers on a table having 5
indexes. The vacuum reads all blocks of the table and indexes.

* VACUUM command with no parallel workers
=# select total_time, shared_blks_hit, shared_blks_read,
shared_blks_hit + shared_blks_read as total_read_blks,
shared_blks_dirtied, shared_blks_written from pg_stat_statements where
query ~ 'vacuum';

  total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
--------------+-----------------+------------------+-----------------+---------------------+---------------------
 19857.217207 |           45238 |           226944 |          272182 |              225943 |              225894
(1 row)

* VACUUM command with 4 parallel workers
=# select total_time, shared_blks_hit, shared_blks_read,
shared_blks_hit + shared_blks_read as total_read_blks,
shared_blks_dirtied, shared_blks_written from pg_stat_statements where
query ~ 'vacuum';

 total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
-------------+-----------------+------------------+-----------------+---------------------+---------------------
 6932.117365 |           45205 |            73079 |          118284 |               72403 |               72365
(1 row)

The table and indexes total about 182243 blocks. As Julien reported,
the total number of blocks read during the parallel vacuum is obviously
much less than the single-process vacuum's result.

Parallel create index has the same issue, but it doesn't exist in
parallel SELECT queries.

I think we need to change parallel maintenance commands so that they
report buffer usage like ParallelQueryMain() does: prepare to track
buffer usage during query execution with InstrStartParallelQuery(), and
report it with InstrEndParallelQuery() after the parallel maintenance
command. To report buffer usage of parallel maintenance commands
correctly, I'm thinking that we can (1) change parallel create index and
parallel vacuum so that they prepare to gather buffer usage, or (2) have
a common entry point for parallel maintenance commands that is
responsible for gathering buffer usage and calling the entry functions
for the individual maintenance commands.
I'll investigate it more in depth.
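
To illustrate approach (1), each worker could wrap its work with the
existing instrumentation start/end calls, roughly like this (only a
sketch; the shared-memory layout and the do_parallel_vacuum_work() name
are assumptions, not actual code):

    /* in each parallel vacuum worker, around the actual work */
    InstrStartParallelQuery();

    do_parallel_vacuum_work();

    /* accumulate this worker's buffer usage into shared memory */
    InstrEndParallelQuery(&shared->buffer_usage[ParallelWorkerNumber]);

    /* and in the leader, after WaitForParallelWorkersToFinish(): */
    for (i = 0; i < pcxt->nworkers_launched; i++)
        InstrAccumParallelQuery(&shared->buffer_usage[i]);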

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Sun, Mar 29, 2020 at 11:03:50AM +0530, Amit Kapila wrote:
> On Sat, Mar 28, 2020 at 7:08 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > [...]
> 
> The buffer usage infrastructure is for read-only queries (for ex. for
> stats like blks_hit, blks_read).  As far as I can tell, there is no
> easy way to test the WAL usage via that API.  It might or might not be
> required in the future depending on whether we decide to use the same
> infrastructure for parallel writes.

I'm not sure that I get your point.  I'm assuming that you meant
parallel read-only queries, but surely the buffer usage infrastructure for
parallel query relies on the same approach as the non-parallel one (each node
computes the process-local pgBufferUsage diff) and sums all of that at the end
of the parallel query execution.  I also don't see how whether the query is
read-only or not is relevant here as far as instrumentation is concerned,
especially since a read-only query can definitely do writes and increase the
count of dirtied buffers, like a write query would.  For instance a hint
bit change can be done in a parallel query AFAIK, and this can generate WAL
records if wal_log_hints is enabled, so that's probably one way to test it.
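
Something along these lines should exercise that path (a sketch, assuming
wal_log_hints = on; the table and GUC settings are just illustrative):

    -- make a parallel plan as cheap as possible
    SET parallel_setup_cost = 0;
    SET parallel_tuple_cost = 0;

    CREATE TABLE hint_test AS SELECT generate_series(1, 1000000) AS i;
    CHECKPOINT;

    -- the first read after the load sets hint bits; with wal_log_hints
    -- enabled, the parallel workers should generate WAL while doing so
    SELECT count(*) FROM hint_test;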

I now think that not adding support for WAL buffers in EXPLAIN output in the
initial patch scope was a mistake, as this is probably the best way to test the
WAL counters for parallel queries.  This shouldn't be hard to add though, and I
can work on it quickly if there's still a chance to get this feature included
in pg13.

> I think for now we should remove
> > that part of the changes and rather think about how to get that for parallel
> operations that can write WAL.  For ex. we might need to do something
> similar to what this patch has done in begin_parallel_vacuum and
> end_parallel_vacuum.  Would you like to attempt that?

Do you mean removing WAL buffers instrumentation from parallel query
infrastructure?

For parallel utilities that can do writes it's probably better to keep the
discussion in the other part of the thread.  I tried to think a little bit
about that, but for now I don't have a better idea than adding something
similar to instrumentation for utility commands to have a general
infrastructure, as building a workaround for a specific utility looks like
the wrong approach.  But this would require quite important changes in
utility handling, which is maybe not a good idea a couple of weeks before
the feature freeze, and that is definitely not backpatchable, so it won't
fix the issue for parallel index build that has existed since pg11.



On Sun, Mar 29, 2020 at 9:52 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Sun, 29 Mar 2020 at 15:19, Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> > [...]
> > > Sawada-San, would you like to investigate this? If not, I will look into
> > > this next week.
> >
> > Sure, I'll investigate this issue today.

Thanks for looking at it!

> [test results trimmed]
>
> I think we need to change parallel maintenance commands so that they
> report buffer usage like ParallelQueryMain() does: prepare to track
> buffer usage during query execution with InstrStartParallelQuery(), and
> report it with InstrEndParallelQuery() after the parallel maintenance
> command. To report buffer usage of parallel maintenance commands
> correctly, I'm thinking that we can (1) change parallel create index and
> parallel vacuum so that they prepare to gather buffer usage, or (2) have
> a common entry point for parallel maintenance commands that is
> responsible for gathering buffer usage and calling the entry functions
> for the individual maintenance commands.
> I'll investigate it more in depth.

As I just mentioned, (2) seems like a better design as the number of
parallel-aware utilities will probably continue to increase.  One problem
is that parallel CREATE INDEX was introduced in pg11, so (2) probably
won't be backpatchable (and (1) seems problematic too).



On Sun, Mar 29, 2020 at 1:44 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> [Sawada-san's test results trimmed]
>
> As I just mentioned, (2) seems like a better design as the number of
> parallel-aware utilities will probably continue to increase.  One problem
> is that parallel CREATE INDEX was introduced in pg11, so (2) probably
> won't be backpatchable (and (1) seems problematic too).
>

I am not sure if we can decide at this stage whether it is
back-patchable or not.  Let's first see the patch, and if it turns out
to be complex, then we can try to do some straightforward fix for
back-branches.  In general, I don't see why the fix here should be
complex.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Sun, Mar 29, 2020 at 1:26 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> I'm not sure that I get your point.  I'm assuming that you meant
> parallel read-only queries, but surely the buffer usage infrastructure for
> parallel query relies on the same approach as the non-parallel one (each node
> computes the process-local pgBufferUsage diff) and sums all of that at the end
> of the parallel query execution.  I also don't see how whether the query is
> read-only or not is relevant here as far as instrumentation is concerned,
> especially since a read-only query can definitely do writes and increase the
> count of dirtied buffers, like a write query would.  For instance a hint
> bit change can be done in a parallel query AFAIK, and this can generate WAL
> records if wal_log_hints is enabled, so that's probably one way to test it.
>

Yeah, that way we can test it.  Can you try that?

> I now think that not adding support for WAL buffers in EXPLAIN output in the
> initial patch scope was a mistake, as this is probably the best way to test the
> WAL counters for parallel queries.  This shouldn't be hard to add though, and I
> can work on it quickly if there's still a chance to get this feature included
> in pg13.
>

I am not sure whether we will add it in Explain or not (maybe we need inputs
from others in this regard), but if it helps in testing this part of
the patch, then it is a good idea to write a patch for it.  You might
want to keep it separate from the main patch as we might not commit
it.

> > I think for now we should remove
> > that part of the changes and rather think about how to get that for parallel
> > operations that can write WAL.  For ex. we might need to do something
> > similar to what this patch has done in begin_parallel_vacuum and
> > end_parallel_vacuum.  Would you like to attempt that?
>
> Do you mean removing WAL buffers instrumentation from parallel query
> infrastructure?
>

Yes, I meant that, but now I realize we need those, and your proposed
way of testing it can help us validate those changes.

> For parallel utilities that can do writes it's probably better to keep the
> discussion in the other part of the thread.
>

Sure, I am fine with that, but I am not sure if it is a good idea to
commit this patch without having a way to compute WAL utilization for
those commands.

> I tried to think a little bit
> about that, but for now I don't have a better idea than adding something
> similar to instrumentation for utility commands to have a general
> infrastructure, as building a workaround for a specific utility looks like
> the wrong approach.
>

I don't know what exactly you have in mind, as I don't see why it
should be too complex.  Let's wait for a patch from Sawada-San on
buffer usage stuff, and in the meantime we can work on other parts of
this patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



On Sun, 29 Mar 2020 at 20:15, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Mar 29, 2020 at 1:44 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Sun, Mar 29, 2020 at 9:52 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > I've run vacuum with/without parallel workers on the table having 5
> > > indexes. The vacuum reads all blocks of table and indexes.
> > >
> > > * VACUUM command with no parallel workers
> > > =# select total_time, shared_blks_hit, shared_blks_read,
> > > shared_blks_hit + shared_blks_read as total_read_blks,
> > > shared_blks_dirtied, shared_blks_written from pg_stat_statements where
> > > query ~ 'vacuum';
> > >
> > >   total_time  | shared_blks_hit | shared_blks_read | total_read_blks |
> > > shared_blks_dirtied | shared_blks_written
> > >
--------------+-----------------+------------------+-----------------+---------------------+---------------------
> > >  19857.217207 |           45238 |           226944 |          272182 |
> > >              225943 |              225894
> > > (1 row)
> > >
> > > * VACUUM command with 4 parallel workers
> > > =# select total_time, shared_blks_hit, shared_blks_read,
> > > shared_blks_hit + shared_blks_read as total_read_blks,
> > > shared_blks_dirtied, shared_blks_written from pg_stat_statements where
> > > query ~ 'vacuum';
> > >
> > >  total_time  | shared_blks_hit | shared_blks_read | total_read_blks |
> > > shared_blks_dirtied | shared_blks_written
> > > -------------+-----------------+------------------+-----------------+---------------------+---------------------
> > >  6932.117365 |           45205 |            73079 |          118284 |
> > >              72403 |               72365
> > > (1 row)
> > >
> > > The total number of blocks of table and indexes are about 182243
> > > blocks. As Julien reported, obviously the total number of read blocks
> > > during parallel vacuum is much less than single process vacuum's
> > > result.
> > >
> > > Parallel create index has the same issue but it doesn't exist in
> > > parallel queries for SELECTs.
> > >
> > > I think we need to change parallel maintenance commands so that they
> > > report buffer usage like what ParallelQueryMain() does; prepare to
> > > track buffer usage during query execution by
> > > InstrStartParallelQuery(), and report it by InstrEndParallelQuery()
> > > after parallel maintenance command. To report buffer usage of parallel
> > > maintenance command correctly, I'm thinking that we can (1) change
> > > parallel create index and parallel vacuum so that they prepare
> > > gathering buffer usage, or (2) have a common entry point for parallel
> > > maintenance commands that is responsible for gathering buffer usage
> > > and calling the entry functions for individual maintenance command.
> > > I'll investigate it more in depth.
> >
> > As I just mentioned, (2) seems like a better design as it's quite
> > likely that the number of parallel-aware utilities will probably
> > continue to increase.  One problem also is that parallel CREATE INDEX
> > has been introduced in pg11, so (2) probably won't be backpatchable
> > (and (1) seems problematic too).
> >
>
> I am not sure if we can decide at this stage whether it is
> back-patchable or not.  Let's first see the patch and if it turns out
> to be complex, then we can try to do some straight-forward fix for
> back-branches.

Agreed.

> In general, I don't see why the fix here should be
> complex?

Yeah, particularly the approach (1) will not be complex. I'll write a
patch tomorrow.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Mon, Mar 23, 2020 at 11:24:50PM +0900, Fujii Masao wrote:
> 
> > Here are the comments for 0001 patch.
> > 
> > +            /*
> > +             * Report a full page image constructed for the WAL record
> > +             */
> > +            pgWalUsage.wal_fp_records++;
> > 
> > Isn't it better to use "fpw" or "fpi" for the variable name rather than
> > "fp" here? In other places, "fpw" and "fpi" are used for full page
> > writes/image.

Agreed, I went with fpw.

> > ISTM that this counter could be incorrect if XLogInsertRecord() determines to
> > calculate again whether FPI is necessary or not. No? IOW, this issue could
> > happen if XLogInsert() calls  XLogRecordAssemble() multiple times in
> > its do-while loop. Isn't this problematic?

Yes, probably.  While adding support for EXPLAIN/auto_explain I also saw that
the previous approach was incrementing both records and fpw_records, while it
should be only one of those for each record.  I fixed this using the approach
I previously mentioned in [1], which seems to work just fine.

> > +    long        wal_bytes;        /* size of wal records produced */
> > 
> > Isn't it safer to use uint64 (i.e., XLogRecPtr) as the type of this variable
> > rather than long?

Yes indeed.  I switched to uint64, and modified everything accordingly (and
changed pgss to output numeric, as there's no other way to handle an unsigned
int8).
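For reference, the resulting counters look something like this (a sketch
assembled from the hunks quoted in this thread, not the committed code):

typedef struct WalUsage
{
    long        wal_records;        /* # of WAL records produced */
    long        wal_fpw_records;    /* # of full page write WAL records
                                     * produced (renamed later in this
                                     * thread) */
    uint64      wal_bytes;          /* size of WAL records produced */
} WalUsage;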

> > +    shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, bufusage_space);
> > 
> > bufusage_space should be walusage_space here?

Good catch, fixed.

> > /*
> >   * Finish parallel execution.  We wait for parallel workers to finish, and
> >   * accumulate their buffer usage.
> >   */
> > 
> > There are some comments mentioning buffer usage, in execParallel.c.
> > For example, the top comment for ExecParallelFinish(), as the above.
> > These should be updated.

I went through all the file and quickly checked in other places, and I think I
fixed all required comments.

> Here are the comments for 0002 patch.
> 
> +    OUT wal_write_bytes int8,
> +    OUT wal_write_records int8,
> +    OUT wal_write_fp_records int8
> 
> Isn't "write" part in the column names confusing because it's WAL
> *generated* (not written) by the statement?

Agreed, I simply dropped the "_write" part everywhere.
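For illustration, a query over the renamed columns would look like this (a
hypothetical example, using the final column names settled on later in this
thread):

-- statements generating the most WAL
SELECT query, calls, wal_records, wal_num_fpw, wal_bytes
  FROM pg_stat_statements
 ORDER BY wal_bytes DESC
 LIMIT 5;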

> +RETURNS SETOF record
> +AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4'
> +LANGUAGE C STRICT VOLATILE;
> 
> PARALLEL SAFE should be specified?

Indeed, fixed.

> +/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */
> 
> ISTM it's good timing to have also pg_stat_statements--1.8.sql since
> the definition of pg_stat_statements() is changed. Thought?

As mentioned in the other pgss thread, I think the general agreement is to
never provide full scripts anymore, so I didn't change that.

> +-- CHECKPOINT before WAL tests to ensure test stability
> +CHECKPOINT;
> 
> Is this true? I thought you added this because the number of FPI
> should be larger than zero in the subsequent test. No? But there
> seems no such test. I'm not excited about adding the test checking
> the number of FPI because it looks fragile, though...

It should ensure a FPW for each new block touched, but yes, that's quite
fragile.

Since I fixed the record / FPW record counters, I saw that this was actually
already broken, as there was a mix of FPW and non-FPW records, so I dropped
the checkpoint and just tested (wal_record + wal_fpw_record) instead.
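Something along these lines (a sketch of the adjusted assertion using the
final column names, not the exact regression test):

-- assert only that some WAL was generated, rather than a FPW count
-- that depends on checkpoint timing
SELECT query, calls, wal_bytes > 0 AS wal_bytes_generated,
       wal_records + wal_num_fpw > 0 AS wal_generated
  FROM pg_stat_statements
 WHERE query LIKE 'UPDATE%'
 ORDER BY query;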

> +UPDATE pgss_test SET b = '333' WHERE a = 3 \;
> +UPDATE pgss_test SET b = '444' WHERE a = 4 ;
> 
> Could you tell me why several queries need to be run to test
> the WAL usage? Isn't running a few queries enough for the test purpose?

As far as I can see it's used to test multiple scenarios (single command /
multiple commands inside or outside an explicit transaction).  It shouldn't
add a lot of overhead, and since some commands are issued with "\;" it's also
testing proper query string isolation when a multi-command query string is
provided, which doesn't seem like a bad idea.  I didn't change that, but I'm
not opposed to removing some of the updates if needed.

Also, to answer Amit Kapila's comments about WAL records and parallel query, I
added support for both EXPLAIN and auto_explain (tab completion and
documentation are also updated).  Using a simple table with an index, forced
parallelism with no leader participation, and a concurrent update on the same
table, I could verify that WAL usage is reported as expected:

rjuju=# explain (analyze, wal, verbose) select * from t1;
                                                          QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.00..8805.05 rows=100010 width=14) (actual time=8.695..47.592 rows=100010 loops=1)
   Output: id, val
   Workers Planned: 2
   Workers Launched: 2
   WAL: records=204 bytes=86198
   ->  Parallel Seq Scan on public.t1  (cost=0.00..8805.05 rows=50005 width=14) (actual time=0.056..29.112 rows=50005 loops
         Output: id, val
         WAL: records=204 bytes=86198
         Worker 0:  actual time=0.060..28.995 rows=49593 loops=1
           WAL: records=105 bytes=44222
         Worker 1:  actual time=0.052..29.230 rows=50417 loops=1
           WAL: records=99 bytes=41976
 Planning Time: 0.038 ms
 Execution Time: 53.957 ms
(14 rows)

and the same query when nothing ends up being modified:

rjuju=# explain (analyze, wal, verbose) select * from t1;
                                                          QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=0.00..8805.05 rows=100010 width=14) (actual time=9.413..48.187 rows=100010 loops=1)
   Output: id, val
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on public.t1  (cost=0.00..8805.05 rows=50005 width=14) (actual time=0.033..24.697 rows=50005 loops
         Output: id, val
         Worker 0:  actual time=0.028..24.786 rows=50447 loops=1
         Worker 1:  actual time=0.038..24.609 rows=49563 loops=1
 Planning Time: 0.282 ms
 Execution Time: 55.643 ms
(10 rows)

So it seems to me that the WAL usage infrastructure for parallel query is
working just fine.  I added the EXPLAIN/auto_explain support in a separate
commit, just in case.

[1] https://www.postgresql.org/message-id/CAOBaU_aECK1Z7Nn+x=MhvEwrJzK8wyPsPtWAafjqtZN1fYjEmg@mail.gmail.com

Attachment

Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
Hi Amit,

Sorry I just noticed your mail.

On Sun, Mar 29, 2020 at 05:12:16PM +0530, Amit Kapila wrote:
> On Sun, Mar 29, 2020 at 1:26 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > I'm not sure that I get your point.  I'm assuming that you meant
> > parallel-read-only queries, but surely buffer usage infrastructure for
> > parallel query relies on the same approach as non-parallel one (each node
> > computes the process-local pgBufferUsage diff) and sums all of that at the end
> > of the parallel query execution.  I also don't see how whether the query is
> > read-only or not is relevant here as far as instrumentation is concerned,
> > especially since read-only query can definitely do writes and increase the
> > count of dirtied buffers, like a write query would.  For instance a hint
> > bit change can be done in a parallel query AFAIK, and this can generate WAL
> records if wal_log_hints is enabled, so that's probably one way to test it.
> >
> 
> Yeah, that way we can test it.  Can you try that?
> 
> > I now think that not adding support for WAL buffers in EXPLAIN output in the
> > initial patch scope was a mistake, as this is probably the best way to test the
> > WAL counters for parallel queries.  This shouldn't be hard to add though, and I
> > can work on it quickly if there's still a chance to get this feature included
> > in pg13.
> >
> 
> I am not sure we will add it in Explain or not (maybe we need inputs
> from others in this regard), but if it helps in testing this part of
> the patch, then it is a good idea to write a patch for it.  You might
> want to keep it separate from the main patch as we might not commit
> it.

As I just wrote in [1], that's exactly what I did.  Using a parallel query
and a concurrent update on the same table, I could see that WAL usage for
parallel query seems to be working as one would expect.
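The setup was along these lines (a sketch; the GUC values and table schema
are assumed, and the concurrent UPDATE is what gives the read-only workers
page-pruning/hint-bit work to log):

-- session 1: force a fully parallel scan with no leader participation
SET max_parallel_workers_per_gather = 2;
SET parallel_leader_participation = off;
SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
EXPLAIN (ANALYZE, WAL, VERBOSE) SELECT * FROM t1;

-- session 2, run concurrently: keep modifying the same table so that
-- the scanning workers end up generating WAL themselves
UPDATE t1 SET val = val || 'x' WHERE id % 100 = 0;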

> Sure, I am fine with that but I am not sure if it is a good idea to
> commit this patch without having a way to compute WAL utilization for
> those commands.

I'm generally fine with waiting for a fix for the existing issue to be
committed.  But as the feature freeze is approaching, I hope that it won't
mean postponing this feature to v14 just because a related two-year-old bug
has been discovered, as that would seem a bit unfair.



On Sun, 29 Mar 2020 at 20:44, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Sun, 29 Mar 2020 at 20:15, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sun, Mar 29, 2020 at 1:44 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > > On Sun, Mar 29, 2020 at 9:52 AM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > I've run vacuum with/without parallel workers on the table having 5
> > > > indexes. The vacuum reads all blocks of table and indexes.
> > > >
> > > > * VACUUM command with no parallel workers
> > > > =# select total_time, shared_blks_hit, shared_blks_read,
> > > > shared_blks_hit + shared_blks_read as total_read_blks,
> > > > shared_blks_dirtied, shared_blks_written from pg_stat_statements where
> > > > query ~ 'vacuum';
> > > >
> > > >   total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
> > > > --------------+-----------------+------------------+-----------------+---------------------+---------------------
> > > >  19857.217207 |           45238 |           226944 |          272182 |              225943 |              225894
> > > > (1 row)
> > > >
> > > > * VACUUM command with 4 parallel workers
> > > > =# select total_time, shared_blks_hit, shared_blks_read,
> > > > shared_blks_hit + shared_blks_read as total_read_blks,
> > > > shared_blks_dirtied, shared_blks_written from pg_stat_statements where
> > > > query ~ 'vacuum';
> > > >
> > > >  total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
> > > > -------------+-----------------+------------------+-----------------+---------------------+---------------------
> > > >  6932.117365 |           45205 |            73079 |          118284 |               72403 |               72365
> > > > (1 row)
> > > >
> > > > The total number of blocks of table and indexes are about 182243
> > > > blocks. As Julien reported, obviously the total number of read blocks
> > > > during parallel vacuum is much less than single process vacuum's
> > > > result.
> > > >
> > > > Parallel create index has the same issue but it doesn't exist in
> > > > parallel queries for SELECTs.
> > > >
> > > > I think we need to change parallel maintenance commands so that they
> > > > report buffer usage like what ParallelQueryMain() does; prepare to
> > > > track buffer usage during query execution by
> > > > InstrStartParallelQuery(), and report it by InstrEndParallelQuery()
> > > > after parallel maintenance command. To report buffer usage of parallel
> > > > maintenance command correctly, I'm thinking that we can (1) change
> > > > parallel create index and parallel vacuum so that they prepare
> > > > gathering buffer usage, or (2) have a common entry point for parallel
> > > > maintenance commands that is responsible for gathering buffer usage
> > > > and calling the entry functions for individual maintenance command.
> > > > I'll investigate it more in depth.
> > >
> > > As I just mentioned, (2) seems like a better design as it's quite
> > > likely that the number of parallel-aware utilities will probably
> > > continue to increase.  One problem also is that parallel CREATE INDEX
> > > has been introduced in pg11, so (2) probably won't be backpatchable
> > > (and (1) seems problematic too).
> > >
> >
> > I am not sure if we can decide at this stage whether it is
> > back-patchable or not.  Let's first see the patch and if it turns out
> > to be complex, then we can try to do some straight-forward fix for
> > back-branches.
>
> Agreed.
>
> > In general, I don't see why the fix here should be
> > complex?
>
> Yeah, particularly the approach (1) will not be complex. I'll write a
> patch tomorrow.
>

I've attached two patches fixing this issue for parallel index
creation and parallel vacuum. Both patches take the same approach:
we allocate DSM to share buffer usage and the leader gathers it,
as described in approach (1) above. I think this is a straightforward
fix for this issue. We could create a common entry point for parallel
maintenance commands that is responsible for gathering buffer usage
as well as sharing the query text etc., but that would involve a
relatively big change and might be overkill at this stage. We can
discuss that, and it will become an item for PG14.
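In outline, the fix looks like this (a sketch pieced together from the hunks
in this thread; the shm_toc key name is made up for illustration):

/* Leader, while setting up the parallel context: reserve one
 * BufferUsage slot per planned worker in the DSM segment. */
buffer_usage = shm_toc_allocate(pcxt->toc,
                                mul_size(sizeof(BufferUsage),
                                         pcxt->nworkers));
shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);

/* Worker: bracket the assigned work with the existing instrumentation
 * entry points, writing its totals into its own slot. */
InstrStartParallelQuery();
/* ... per-worker index vacuuming ... */
InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);

/* Leader, after WaitForParallelWorkersToFinish(): fold the workers'
 * counters into its own. */
for (i = 0; i < lps->pcxt->nworkers_launched; i++)
    InstrAccumParallelQuery(&lps->buffer_usage[i]);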

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment
On Mon, 30 Mar 2020 at 15:46, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Sun, 29 Mar 2020 at 20:44, Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Sun, 29 Mar 2020 at 20:15, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Sun, Mar 29, 2020 at 1:44 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > > >
> > > > On Sun, Mar 29, 2020 at 9:52 AM Masahiko Sawada
> > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > >
> > > > > I've run vacuum with/without parallel workers on the table having 5
> > > > > indexes. The vacuum reads all blocks of table and indexes.
> > > > >
> > > > > * VACUUM command with no parallel workers
> > > > > =# select total_time, shared_blks_hit, shared_blks_read,
> > > > > shared_blks_hit + shared_blks_read as total_read_blks,
> > > > > shared_blks_dirtied, shared_blks_written from pg_stat_statements where
> > > > > query ~ 'vacuum';
> > > > >
> > > > >   total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
> > > > > --------------+-----------------+------------------+-----------------+---------------------+---------------------
> > > > >  19857.217207 |           45238 |           226944 |          272182 |              225943 |              225894
> > > > > (1 row)
> > > > >
> > > > > * VACUUM command with 4 parallel workers
> > > > > =# select total_time, shared_blks_hit, shared_blks_read,
> > > > > shared_blks_hit + shared_blks_read as total_read_blks,
> > > > > shared_blks_dirtied, shared_blks_written from pg_stat_statements where
> > > > > query ~ 'vacuum';
> > > > >
> > > > >  total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
> > > > > -------------+-----------------+------------------+-----------------+---------------------+---------------------
> > > > >  6932.117365 |           45205 |            73079 |          118284 |               72403 |               72365
> > > > > (1 row)
> > > > >
> > > > > The total number of blocks of table and indexes are about 182243
> > > > > blocks. As Julien reported, obviously the total number of read blocks
> > > > > during parallel vacuum is much less than single process vacuum's
> > > > > result.
> > > > >
> > > > > Parallel create index has the same issue but it doesn't exist in
> > > > > parallel queries for SELECTs.
> > > > >
> > > > > I think we need to change parallel maintenance commands so that they
> > > > > report buffer usage like what ParallelQueryMain() does; prepare to
> > > > > track buffer usage during query execution by
> > > > > InstrStartParallelQuery(), and report it by InstrEndParallelQuery()
> > > > > after parallel maintenance command. To report buffer usage of parallel
> > > > > maintenance command correctly, I'm thinking that we can (1) change
> > > > > parallel create index and parallel vacuum so that they prepare
> > > > > gathering buffer usage, or (2) have a common entry point for parallel
> > > > > maintenance commands that is responsible for gathering buffer usage
> > > > > and calling the entry functions for individual maintenance command.
> > > > > I'll investigate it more in depth.
> > > >
> > > > As I just mentioned, (2) seems like a better design as it's quite
> > > > likely that the number of parallel-aware utilities will probably
> > > > continue to increase.  One problem also is that parallel CREATE INDEX
> > > > has been introduced in pg11, so (2) probably won't be backpatchable
> > > > (and (1) seems problematic too).
> > > >
> > >
> > > I am not sure if we can decide at this stage whether it is
> > > back-patchable or not.  Let's first see the patch and if it turns out
> > > to be complex, then we can try to do some straight-forward fix for
> > > back-branches.
> >
> > Agreed.
> >
> > > In general, I don't see why the fix here should be
> > > complex?
> >
> > Yeah, particularly the approach (1) will not be complex. I'll write a
> > patch tomorrow.
> >
>
> I've attached two patches fixing this issue for parallel index
> creation and parallel vacuum. These approaches take the same approach;
> we allocate DSM to share buffer usage and the leader gathers them,
> described as approach (1) above. I think this is a straightforward
> approach for this issue. We can create a common entry point for
> parallel maintenance command that is responsible for gathering buffer
> usage as well as sharing query text etc. But it will accompany
> relatively big change and it might be overkill at this stage. We can
> discuss that and it will become an item for PG14.
>

The patch for vacuum conflicts with recent changes in vacuum, so I've
attached a rebased one.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment
On Mon, Mar 30, 2020 at 04:01:18PM +0900, Masahiko Sawada wrote:
> On Mon, 30 Mar 2020 at 15:46, Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Sun, 29 Mar 2020 at 20:44, Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > > > > I think we need to change parallel maintenance commands so that they
> > > > > > report buffer usage like what ParallelQueryMain() does; prepare to
> > > > > > track buffer usage during query execution by
> > > > > > InstrStartParallelQuery(), and report it by InstrEndParallelQuery()
> > > > > > after parallel maintenance command. To report buffer usage of parallel
> > > > > > maintenance command correctly, I'm thinking that we can (1) change
> > > > > > parallel create index and parallel vacuum so that they prepare
> > > > > > gathering buffer usage, or (2) have a common entry point for parallel
> > > > > > maintenance commands that is responsible for gathering buffer usage
> > > > > > and calling the entry functions for individual maintenance command.
> > > > > > I'll investigate it more in depth.
> > > > >
> > > [...]
> >
> > I've attached two patches fixing this issue for parallel index
> > creation and parallel vacuum. These approaches take the same approach;
> > we allocate DSM to share buffer usage and the leader gathers them,
> > described as approach (1) above. I think this is a straightforward
> > approach for this issue. We can create a common entry point for
> > parallel maintenance command that is responsible for gathering buffer
> > usage as well as sharing query text etc. But it will accompany
> > relatively big change and it might be overkill at this stage. We can
> > discuss that and it will become an item for PG14.
> >
> 
> The patch for vacuum conflicts with recent changes in vacuum. So I've
> attached rebased one.

Thanks Sawada-san!

Just minor nitpicking:

+   int         i;

    Assert(!IsParallelWorker());
    Assert(ParallelVacuumIsActive(lps));
@@ -2166,6 +2172,13 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
    /* Wait for all vacuum workers to finish */
    WaitForParallelWorkersToFinish(lps->pcxt);

+   /*
+    * Next, accumulate buffer usage.  (This must wait for the workers to
+    * finish, or we might get incomplete data.)
+    */
+   for (i = 0; i < nworkers; i++)
+       InstrAccumParallelQuery(&lps->buffer_usage[i]);

We now allow declaring a variable in those loops, so it may be better to avoid
declaring i outside the for scope?

Other than that both patches look good to me and a good fit for backpatching.
I also did some testing on VACUUM and CREATE INDEX and it works as expected.



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Sun, Mar 29, 2020 at 5:49 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>

@@ -1249,6 +1250,16 @@ XLogInsertRecord(XLogRecData *rdata,
  ProcLastRecPtr = StartPos;
  XactLastRecEnd = EndPos;

+ /* Provide WAL update data to the instrumentation */
+ if (inserted)
+ {
+ pgWalUsage.wal_bytes += rechdr->xl_tot_len;
+ if (doPageWrites && fpw_lsn <= RedoRecPtr)
+ pgWalUsage.wal_fpw_records++;
+ else
+ pgWalUsage.wal_records++;
+ }
+

I think the above code has multiple problems. (a) fpw_lsn can be
InvalidXLogRecPtr and still there could be full-page image (for ex.
when REGBUF_FORCE_IMAGE flag for buffer is set).  (b) There could be
multiple FPW records while inserting a record; consider when there are
multiple registered buffers.  I think the right place to figure this
out is XLogRecordAssemble. (c) There are cases when we also attach the
record data even when we decide to write FPW (cf. REGBUF_KEEP_DATA),
so we might want to increment wal_fpw_records and wal_records for such
cases.

I think the right place to compute this information is
XLogRecordAssemble even though we update it at the place where you
have it in the patch.  You can probably compute that in local
variables and then transfer to pgWalUsage in XLogInsertRecord.  I am
fine if you can think of some other way but the current patch doesn't
seem correct to me.
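Concretely, something along these lines (a sketch of the suggested shape,
using the field name agreed on later in this thread; not the exact patch):

/* In XLogRecordAssemble(): count each full page image actually taken
 * into a caller-provided local instead of touching pgWalUsage. */
if (include_image)
{
    /* ... assemble the block image ... */
    *num_fpw += 1;
}

/* In XLogInsertRecord(): transfer the counts only once the record has
 * definitely been inserted. */
if (inserted)
{
    pgWalUsage.wal_bytes += rechdr->xl_tot_len;
    pgWalUsage.wal_records++;
    pgWalUsage.wal_num_fpw += num_fpw;
}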

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Mon, Mar 30, 2020 at 03:52:38PM +0530, Amit Kapila wrote:
> On Sun, Mar 29, 2020 at 5:49 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> 
> @@ -1249,6 +1250,16 @@ XLogInsertRecord(XLogRecData *rdata,
>   ProcLastRecPtr = StartPos;
>   XactLastRecEnd = EndPos;
> 
> + /* Provide WAL update data to the instrumentation */
> + if (inserted)
> + {
> + pgWalUsage.wal_bytes += rechdr->xl_tot_len;
> + if (doPageWrites && fpw_lsn <= RedoRecPtr)
> + pgWalUsage.wal_fpw_records++;
> + else
> + pgWalUsage.wal_records++;
> + }
> +
> 
> I think the above code has multiple problems. (a) fpw_lsn can be
> InvalidXLogRecPtr and still there could be full-page image (for ex.
> when REGBUF_FORCE_IMAGE flag for buffer is set).  (b) There could be
> multiple FPW records while inserting a record; consider when there are
> multiple registered buffers.  I think the right place to figure this
> out is XLogRecordAssemble. (c) There are cases when we also attach the
> record data even when we decide to write FPW (cf. REGBUF_KEEP_DATA),
> so we might want to increment wal_fpw_records and wal_records for such
> cases.
> 
> I think the right place to compute this information is
> XLogRecordAssemble even though we update it at the place where you
> have it in the patch.  You can probably compute that in local
> variables and then transfer to pgWalUsage in XLogInsertRecord.  I am
> fine if you can think of some other way but the current patch doesn't
> seem correct to me.

My previous approach was indeed totally broken.  v8 attached which hopefully
will be ok.

Attachment
On Mon, Mar 30, 2020 at 12:31 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> The patch for vacuum conflicts with recent changes in vacuum. So I've
> attached rebased one.
>

+ /*
+ * Next, accumulate buffer usage.  (This must wait for the workers to
+ * finish, or we might get incomplete data.)
+ */
+ for (i = 0; i < nworkers; i++)
+ InstrAccumParallelQuery(&lps->buffer_usage[i]);
+

This should be done for launched workers aka
lps->pcxt->nworkers_launched.  I think a similar problem exists in
create index related patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



On Tue, 31 Mar 2020 at 12:58, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Mar 30, 2020 at 12:31 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > The patch for vacuum conflicts with recent changes in vacuum. So I've
> > attached rebased one.
> >
>
> + /*
> + * Next, accumulate buffer usage.  (This must wait for the workers to
> + * finish, or we might get incomplete data.)
> + */
> + for (i = 0; i < nworkers; i++)
> + InstrAccumParallelQuery(&lps->buffer_usage[i]);
> +
>
> This should be done for launched workers aka
> lps->pcxt->nworkers_launched.  I think a similar problem exists in
> create index related patch.

You're right. Fixed in the new patches.

On Mon, 30 Mar 2020 at 17:00, Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> Just minor nitpicking:
>
> +   int         i;
>
>     Assert(!IsParallelWorker());
>     Assert(ParallelVacuumIsActive(lps));
> @@ -2166,6 +2172,13 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
>     /* Wait for all vacuum workers to finish */
>     WaitForParallelWorkersToFinish(lps->pcxt);
>
> +   /*
> +    * Next, accumulate buffer usage.  (This must wait for the workers to
> +    * finish, or we might get incomplete data.)
> +    */
> +   for (i = 0; i < nworkers; i++)
> +       InstrAccumParallelQuery(&lps->buffer_usage[i]);
>
> We now allow declaring a variable in those loops, so it may be better to avoid
> declaring i outside the for scope?

We can do that, but I was not sure if it's good since the other code
around there doesn't use that. So I'd like to leave it to the
committers; it's a trivial change.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment
On Tue, Mar 31, 2020 at 10:44 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 31 Mar 2020 at 12:58, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Mar 30, 2020 at 12:31 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > The patch for vacuum conflicts with recent changes in vacuum. So I've
> > > attached rebased one.
> > >
> >
> > + /*
> > + * Next, accumulate buffer usage.  (This must wait for the workers to
> > + * finish, or we might get incomplete data.)
> > + */
> > + for (i = 0; i < nworkers; i++)
> > + InstrAccumParallelQuery(&lps->buffer_usage[i]);
> > +
> >
> > This should be done for launched workers aka
> > lps->pcxt->nworkers_launched.  I think a similar problem exists in
> > create index related patch.
>
> You're right. Fixed in the new patches.
>
> On Mon, 30 Mar 2020 at 17:00, Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > Just minor nitpicking:
> >
> > +   int         i;
> >
> >     Assert(!IsParallelWorker());
> >     Assert(ParallelVacuumIsActive(lps));
> > @@ -2166,6 +2172,13 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
> >     /* Wait for all vacuum workers to finish */
> >     WaitForParallelWorkersToFinish(lps->pcxt);
> >
> > +   /*
> > +    * Next, accumulate buffer usage.  (This must wait for the workers to
> > +    * finish, or we might get incomplete data.)
> > +    */
> > +   for (i = 0; i < nworkers; i++)
> > +       InstrAccumParallelQuery(&lps->buffer_usage[i]);
> >
> > We now allow declaring a variable in those loops, so it may be better to avoid
> > declaring i outside the for scope?
>
> We can do that but I was not sure if it's good since other codes
> around there don't use that. So I'd like to leave it for committers.
> It's a trivial change.

I have reviewed the patch and it looks fine to me.

One minor comment:

+ /* Points to buffer usage are in DSM */
+ BufferUsage *buffer_usage;

/buffer usage are in DSM/ buffer usage area in DSM

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Mon, Mar 30, 2020 at 6:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Mon, Mar 30, 2020 at 03:52:38PM +0530, Amit Kapila wrote:
> >
> > I think the right place to compute this information is
> > XLogRecordAssemble even though we update it at the place where you
> > have it in the patch.  You can probably compute that in local
> > variables and then transfer to pgWalUsage in XLogInsertRecord.  I am
> > fine if you can think of some other way but the current patch doesn't
> > seem correct to me.
>
> My previous approach was indeed totally broken.  v8 attached which hopefully
> will be ok.
>

This is better.  A few more comments:
1. The point (c) from my previous email doesn't seem to be fixed
properly.  Basically, the record data is only attached with FPW in
some particular cases like where REGBUF_KEEP_DATA is set, but the
patch assumes it is always set.

2.
+ /* Report a full page imsage constructed for the WAL record */
+ *num_fpw += 1;

Typo. /imsage/image

3.  We need to enhance the patch to cover WAL usage for parallel
vacuum and parallel create index based on Sawada-San's latest patch[1]
which fixed the case for buffer usage.

[1] - https://www.postgresql.org/message-id/CA%2Bfd4k5L4yVoWz0smymmqB4_SMHd2tyJExUgA_ACsL7k00B5XQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Tue, Mar 31, 2020 at 8:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Mar 30, 2020 at 6:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Mon, Mar 30, 2020 at 03:52:38PM +0530, Amit Kapila wrote:
> > >
> > > I think the right place to compute this information is
> > > XLogRecordAssemble even though we update it at the place where you
> > > have it in the patch.  You can probably compute that in local
> > > variables and then transfer to pgWalUsage in XLogInsertRecord.  I am
> > > fine if you can think of some other way but the current patch doesn't
> > > seem correct to me.
> >
> > My previous approach was indeed totally broken.  v8 attached which hopefully
> > will be ok.
> >
>
> This is better.  Few more comments:
> 1. The point (c) from my previous email doesn't seem to be fixed
> properly.  Basically, the record data is only attached with FPW in
> some particular cases like where REGBUF_KEEP_DATA is set, but the
> patch assumes it is always set.

As I mentioned multiple times already, I'm really not familiar with
the WAL code, so I'll be happy to be proven wrong, but my reading is
that in XLogRecordAssemble() there are 2 different things being done:

- a FPW is optionally added, iff include_image is true, which doesn't
take REGBUF_KEEP_DATA into account.  Looking at that part of the code
I don't see any sign of the recorded FPW being skipped or discarded if
REGBUF_KEEP_DATA is not set, and useful variables such as total_len
are modified;
- then data is also optionally added, iff needs_data is set.

IIUC a FPW can be added even if the WAL record doesn't contain data.
So the behavior looks ok to me, as what seems to be useful is to
distinguish 9KB of WAL for 1 record of 9KB from 9KB of WAL for a 1KB
record and 1 FPW.

What am I missing here?

> 2.
> + /* Report a full page imsage constructed for the WAL record */
> + *num_fpw += 1;
>
> Typo. /imsage/image

Oops yes, will fix.

> 3.  We need to enhance the patch to cover WAL usage for parallel
> vacuum and parallel create index based on Sawada-San's latest patch[1]
> which fixed the case for buffer usage.

I'm sorry but I'm not following.  Do you mean adding regression tests
for that case?



Re: WAL usage calculation patch

From
Dilip Kumar
Date:
On Tue, Mar 31, 2020 at 12:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Mar 30, 2020 at 6:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Mon, Mar 30, 2020 at 03:52:38PM +0530, Amit Kapila wrote:
> > >
> > > I think the right place to compute this information is
> > > XLogRecordAssemble even though we update it at the place where you
> > > have it in the patch.  You can probably compute that in local
> > > variables and then transfer to pgWalUsage in XLogInsertRecord.  I am
> > > fine if you can think of some other way but the current patch doesn't
> > > seem correct to me.
> >
> > My previous approach was indeed totally broken.  v8 attached which hopefully
> > will be ok.
> >
>
> This is better.  Few more comments:
> 1. The point (c) from my previous email doesn't seem to be fixed
> properly.  Basically, the record data is only attached with FPW in
> some particular cases like where REGBUF_KEEP_DATA is set, but the
> patch assumes it is always set.
>
> 2.
> + /* Report a full page imsage constructed for the WAL record */
> + *num_fpw += 1;
>
> Typo. /imsage/image
>
> 3.  We need to enhance the patch to cover WAL usage for parallel
> vacuum and parallel create index based on Sawada-San's latest patch[1]
> which fixed the case for buffer usage.

I have started reviewing this patch and I have some comments/questions.

1.
@@ -22,6 +22,10 @@ static BufferUsage save_pgBufferUsage;

 static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);

+WalUsage pgWalUsage;
+static WalUsage save_pgWalUsage;
+
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);

Better to move all the variable declarations first, along with the
other variables, and then the function declaration along with the
other function declarations.  That is the convention we follow.

2.
  {
  bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0;
+ bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0;

I think you need to run pgindent; we should give only one space
between the variable name and '='.  So we need to change it like
below:

bool            need_wal = (instrument_options & INSTRUMENT_WAL) != 0;

3.
+typedef struct WalUsage
+{
+ long wal_records; /* # of WAL records produced */
+ long wal_fpw_records; /* # of full page write WAL records
+ * produced */

IMHO, the name wal_fpw_records is a bit confusing.  At first I thought
it was counting the number of WAL records which actually have a FPW;
after seeing the code, I realized that it is actually counting the
total number of FPWs.  Shouldn't we rename it to just wal_fpw, or
wal_num_fpw, or wal_fpw_count?


4.  Currently, we are combining all full-page writes
(forced/normal/consistency-check ones) in one category.  I am not sure
whether it would be useful to know how many are force_fpw and how many
are normal_fpw.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Tue, Mar 31, 2020 at 2:51 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> 4.  Currently, we are combining all full-page write
> force/normal/consistency checks in one category.  I am not sure
> whether it will be good information to know how many are force_fpw and
> how many are normal_fpw?
>

We can do it if we want but I am not sure how useful it will be.  I
think we can always enhance this information if people really need
this and have a clear use-case in mind.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Tue, Mar 31, 2020 at 2:39 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Tue, Mar 31, 2020 at 8:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Mar 30, 2020 at 6:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > > On Mon, Mar 30, 2020 at 03:52:38PM +0530, Amit Kapila wrote:
> > > >
> > > > I think the right place to compute this information is
> > > > XLogRecordAssemble even though we update it at the place where you
> > > > have it in the patch.  You can probably compute that in local
> > > > variables and then transfer to pgWalUsage in XLogInsertRecord.  I am
> > > > fine if you can think of some other way but the current patch doesn't
> > > > seem correct to me.
> > >
> > > My previous approach was indeed totally broken.  v8 attached which hopefully
> > > will be ok.
> > >
> >
> > This is better.  Few more comments:
> > 1. The point (c) from my previous email doesn't seem to be fixed
> > properly.  Basically, the record data is only attached with FPW in
> > some particular cases like where REGBUF_KEEP_DATA is set, but the
> > patch assumes it is always set.
>
> As I mentioned multiple times already, I'm really not familiar with
> the WAL code,  so I'll be happy to be proven wrong but my reading is
> that in XLogRecordAssemble(), there are 2 different things being done:
>
> - a FPW is optionally added, iff include_image is true, which doesn't
> take into account REGBUF_KEEP_DATA.  Looking at that part of the code
> I don't see any sign of the recorded FPW being skipped or discarded if
> REGBUF_KEEP_DATA is not set, and useful variables such as total_len
> are modified
> - then data is also optionally added, iff needs_data is set.
>
> IIUC a FPW can be added even if the WAL record doesn't contain data.
> So the behavior look ok to me, as what seems to be useful it to
> distinguish 9KB WAL for 1 record of 9KB from 9KB or WAL for 1KB record
> and 1 FPW.
>

It is possible that the two of us have different meanings in mind for
the two variables below:
+typedef struct WalUsage
+{
+ long wal_records; /* # of WAL records produced */
+ long wal_fpw_records; /* # of full page write WAL records
+ * produced */


Let me clarify my understanding.  Say if the record is just an FPI
(ex. XLOG_FPI) and doesn't contain any data then do we want to add one
to each of wal_fpw_records and wal_records?  My understanding was in
such a case we will just increment wal_fpw_records.

>
> > 3.  We need to enhance the patch to cover WAL usage for parallel
> > vacuum and parallel create index based on Sawada-San's latest patch[1]
> > which fixed the case for buffer usage.
>
> I'm sorry but I'm not following.  Do you mean adding regression tests
> for that case?
>

No.  I mean to say we should implement WAL usage calculation for those
two parallel commands.  AFAICS, your patch doesn't cover those two
commands.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Kuntal Ghosh
Date:
On Mon, Mar 30, 2020 at 6:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
@@ -448,6 +449,7 @@ XLogInsert(RmgrId rmid, uint8 info)
  bool doPageWrites;
  XLogRecPtr fpw_lsn;
  XLogRecData *rdt;
+ int num_fpw = 0;

  /*
  * Get values needed to decide whether to do full-page writes. Since
@@ -457,9 +459,9 @@ XLogInsert(RmgrId rmid, uint8 info)
  GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);

  rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites,
- &fpw_lsn);
+ &fpw_lsn, &num_fpw);

- EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags);
+ EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpw);
  } while (EndPos == InvalidXLogRecPtr);

I think there are some issues in the num_fpw calculation.  In some
cases, we have to return from XLogInsert without inserting a record;
basically, we have to recompute/reassemble the same record.  In those
cases, num_fpw should be reset.  Thoughts?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Tue, Mar 31, 2020 at 11:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> I have started reviewing this patch and I have some comments/questions.

Thanks a lot!

>
> 1.
> @@ -22,6 +22,10 @@ static BufferUsage save_pgBufferUsage;
>
>  static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
>
> +WalUsage pgWalUsage;
> +static WalUsage save_pgWalUsage;
> +
> +static void WalUsageAdd(WalUsage *dst, WalUsage *add);
>
> Better we move all variable declaration first along with other
> variables and then function declaration along with other function
> declaration.  That is the convention we follow.

Agreed, fixed.

> 2.
>   {
>   bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0;
> + bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0;
>
> I think you need to run pgindent,  we should give only one space
> between the variable name and '='.
> so we need to change like below
>
> bool            need_wal = (instrument_options & INSTRUMENT_WAL) != 0;

Done.

> 3.
> +typedef struct WalUsage
> +{
> + long wal_records; /* # of WAL records produced */
> + long wal_fpw_records; /* # of full page write WAL records
> + * produced */
>
> IMHO, the name wal_fpw_records is bit confusing,  First I thought it
> is counting the number of wal records which actually has FPW, then
> after seeing code, I realized that it is actually counting total FPW.
> Shouldn't we rename it to just wal_fpw? or wal_num_fpw or
> wal_fpw_count?

Yes I agree, the name was too confusing.  I went with wal_num_fpw.  I
also used the same for pg_stat_statements.  Other fields are usually
named with a trailing "s" but wal_fpws just seems too weird.  I can
change it if consistency is preferred here.

> 4.  Currently, we are combining all full-page write
> force/normal/consistency checks in one category.  I am not sure
> whether it will be good information to know how many are force_fpw and
> how many are normal_fpw?

I agree with Amit's POV.  For now a single counter seems like enough
to diagnose many behaviors.

I'll keep answering the following mails before sending an updated patchset.



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Tue, Mar 31, 2020 at 12:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> It is possible that both of us are having different meanings for below
> two variables:
> +typedef struct WalUsage
> +{
> + long wal_records; /* # of WAL records produced */
> + long wal_fpw_records; /* # of full page write WAL records
> + * produced */
>
>
> Let me clarify my understanding.  Say if the record is just an FPI
> (ex. XLOG_FPI) and doesn't contain any data then do we want to add one
> to each of wal_fpw_records and wal_records?  My understanding was in
> such a case we will just increment wal_fpw_records.

Yes, as Dilip just pointed out, the misunderstanding is due to this
poor name.  Indeed, in such a case I want both counters to be
incremented: wal_records should reflect the total number of records
generated regardless of their content, and wal_num_fpw the number of
full page images.  That seems to make the most sense, and is the
easiest way to estimate the ratio of WAL due to FPWs.
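For instance (a hypothetical query; a full page image is at most BLCKSZ,
typically 8kB, and can be smaller after compression or hole removal, so this
only gives a rough upper bound):

-- rough upper bound on the share of WAL volume due to full page images
SELECT query, wal_records, wal_num_fpw, wal_bytes,
       round(wal_num_fpw * 8192.0 / NULLIF(wal_bytes, 0), 2) AS fpw_share
  FROM pg_stat_statements
 WHERE wal_bytes > 0
 ORDER BY wal_bytes DESC;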

> > > 3.  We need to enhance the patch to cover WAL usage for parallel
> > > vacuum and parallel create index based on Sawada-San's latest patch[1]
> > > which fixed the case for buffer usage.
> >
> > I'm sorry but I'm not following.  Do you mean adding regression tests
> > for that case?
> >
>
> No.  I mean to say we should implement WAL usage calculation for those
> two parallel commands.  AFAICS, your patch doesn't cover those two
> commands.

Oh I see.  I just assumed that Sawada-san's patch would be committed
first and I'd then rebase the patchset on top of the newly added
infrastructure to also handle WAL counters, to avoid any conflict on
that bugfix while this new feature is being discussed.  I'll rebase
the patchset against those patches then.



On Tue, Mar 31, 2020 at 12:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Mar 31, 2020 at 10:44 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 31 Mar 2020 at 12:58, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Mar 30, 2020 at 12:31 PM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > The patch for vacuum conflicts with recent changes in vacuum. So I've
> > > > attached rebased one.
> > > >
> > >
> > > + /*
> > > + * Next, accumulate buffer usage.  (This must wait for the workers to
> > > + * finish, or we might get incomplete data.)
> > > + */
> > > + for (i = 0; i < nworkers; i++)
> > > + InstrAccumParallelQuery(&lps->buffer_usage[i]);
> > > +
> > >
> > > This should be done for launched workers aka
> > > lps->pcxt->nworkers_launched.  I think a similar problem exists in
> > > create index related patch.
> >
> > You're right. Fixed in the new patches.
> >
> > On Mon, 30 Mar 2020 at 17:00, Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > > Just minor nitpicking:
> > >
> > > +   int         i;
> > >
> > >     Assert(!IsParallelWorker());
> > >     Assert(ParallelVacuumIsActive(lps));
> > > @@ -2166,6 +2172,13 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
> > >     /* Wait for all vacuum workers to finish */
> > >     WaitForParallelWorkersToFinish(lps->pcxt);
> > >
> > > +   /*
> > > +    * Next, accumulate buffer usage.  (This must wait for the workers to
> > > +    * finish, or we might get incomplete data.)
> > > +    */
> > > +   for (i = 0; i < nworkers; i++)
> > > +       InstrAccumParallelQuery(&lps->buffer_usage[i]);
> > >
> > > We now allow declaring a variable in those loops, so it may be better to avoid
> > > declaring i outside the for scope?
> >
> > We can do that but I was not sure if it's good since other codes
> > around there don't use that. So I'd like to leave it for committers.
> > It's a trivial change.
>
> I have reviewed the patch and the patch looks fine to me.
>
> One minor comment
> /+ /* Points to buffer usage are in DSM */
> + BufferUsage *buffer_usage;
> +
> /buffer usage are in DSM / buffer usage area in DSM
>

While testing I have found one issue.  Basically, during a parallel
vacuum, it was showing a higher number of
shared_blks_hit + shared_blks_read.  After some investigation, I found
that during the cleanup phase nworkers is -1, and because of this we
didn't try to launch workers, but "lps->pcxt->nworkers_launched" still
had the old launched-worker count, and shared memory also had old buffer
read data which was never updated, as we did not try to launch the
workers.

diff --git a/src/backend/access/heap/vacuumlazy.c
b/src/backend/access/heap/vacuumlazy.c
index b97b678..5dfaf4d 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -2150,7 +2150,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel,
IndexBulkDeleteResult **stats,
         * Next, accumulate buffer usage.  (This must wait for the workers to
         * finish, or we might get incomplete data.)
         */
-       for (i = 0; i < lps->pcxt->nworkers_launched; i++)
+       nworkers = Min(nworkers, lps->pcxt->nworkers_launched);
+       for (i = 0; i < nworkers; i++)
                InstrAccumParallelQuery(&lps->buffer_usage[i]);

It worked after the above fix.



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Tue, Mar 31, 2020 at 12:21 PM Kuntal Ghosh
<kuntalghosh.2007@gmail.com> wrote:
>
> On Mon, Mar 30, 2020 at 6:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> @@ -448,6 +449,7 @@ XLogInsert(RmgrId rmid, uint8 info)
>   bool doPageWrites;
>   XLogRecPtr fpw_lsn;
>   XLogRecData *rdt;
> + int num_fpw = 0;
>
>   /*
>   * Get values needed to decide whether to do full-page writes. Since
> @@ -457,9 +459,9 @@ XLogInsert(RmgrId rmid, uint8 info)
>   GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);
>
>   rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites,
> - &fpw_lsn);
> + &fpw_lsn, &num_fpw);
>
> - EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags);
> + EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpw);
>   } while (EndPos == InvalidXLogRecPtr);
>
> I think there are some issues in the num_fpw calculation. For some
> cases, we have to return from XLogInsert without inserting a record.
> Basically, we've to recompute/reassemble the same record. In those
> cases, num_fpw should be reset. Thoughts?

Mmm, yes, but since the same record is being recomputed from the same
RedoRecPtr, doesn't it mean that we need to reset the counter?
Otherwise we would count the same FPW multiple times.



Re: WAL usage calculation patch

From
Kuntal Ghosh
Date:
On Tue, Mar 31, 2020 at 7:39 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Tue, Mar 31, 2020 at 12:21 PM Kuntal Ghosh
> <kuntalghosh.2007@gmail.com> wrote:
> >
> > On Mon, Mar 30, 2020 at 6:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > @@ -448,6 +449,7 @@ XLogInsert(RmgrId rmid, uint8 info)
> >   bool doPageWrites;
> >   XLogRecPtr fpw_lsn;
> >   XLogRecData *rdt;
> > + int num_fpw = 0;
> >
> >   /*
> >   * Get values needed to decide whether to do full-page writes. Since
> > @@ -457,9 +459,9 @@ XLogInsert(RmgrId rmid, uint8 info)
> >   GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);
> >
> >   rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites,
> > - &fpw_lsn);
> > + &fpw_lsn, &num_fpw);
> >
> > - EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags);
> > + EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpw);
> >   } while (EndPos == InvalidXLogRecPtr);
> >
> > I think there are some issues in the num_fpw calculation. For some
> > cases, we have to return from XLogInsert without inserting a record.
> > Basically, we've to recompute/reassemble the same record. In those
> > cases, num_fpw should be reset. Thoughts?
>
> Mmm, yes but since that's the same record is being recomputed from the
> same RedoRecPtr, doesn't it mean that we need to reset the counter?
> Otherwise we would count the same FPW multiple times.

Yes, that was my point as well.  I missed the part where you're already
resetting it inside the do-while loop before calling
XLogRecordAssemble.  Sorry for the noise.
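For the record, the loop shape being discussed looks like this (reconstructed
from the v8 hunks quoted above; not the complete function):

do
{
    /* Reset the per-attempt count: if the record has to be
     * reassembled, FPWs counted for the discarded attempt must not
     * be kept. */
    num_fpw = 0;

    GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);

    rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites,
                             &fpw_lsn, &num_fpw);

    EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpw);
} while (EndPos == InvalidXLogRecPtr);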



On Tue, Mar 31, 2020 at 7:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> While testing I have found one issue.  Basically, during a parallel
> vacuum, it was showing a higher number of
> shared_blks_hit + shared_blks_read.  After some investigation, I found
> that during the cleanup phase nworkers are -1, and because of this we
> didn't try to launch worker but "lps->pcxt->nworkers_launched" had the
> old launched worker count and shared memory also had old buffer read
> data which was never updated as we did not try to launch the worker.
>
> diff --git a/src/backend/access/heap/vacuumlazy.c
> b/src/backend/access/heap/vacuumlazy.c
> index b97b678..5dfaf4d 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -2150,7 +2150,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel,
> IndexBulkDeleteResult **stats,
>          * Next, accumulate buffer usage.  (This must wait for the workers to
>          * finish, or we might get incomplete data.)
>          */
> -       for (i = 0; i < lps->pcxt->nworkers_launched; i++)
> +       nworkers = Min(nworkers, lps->pcxt->nworkers_launched);
> +       for (i = 0; i < nworkers; i++)
>                 InstrAccumParallelQuery(&lps->buffer_usage[i]);
>
> It worked after the above fix.
>

Good catch.  I think we should not even call
WaitForParallelWorkersToFinish for such a case.  So, I guess the fix
could be,

if (nworkers > 0)
{
    WaitForParallelWorkersToFinish(lps->pcxt);
    for (i = 0; i < lps->pcxt->nworkers_launched; i++)
        InstrAccumParallelQuery(&lps->buffer_usage[i]);
}

or something along those lines.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



On Wed, Apr 1, 2020 at 8:16 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Mar 31, 2020 at 7:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > While testing I have found one issue.  Basically, during a parallel
> > vacuum, it was showing a higher number of
> > shared_blks_hit + shared_blks_read.  After some investigation, I found
> > that during the cleanup phase nworkers are -1, and because of this we
> > didn't try to launch worker but "lps->pcxt->nworkers_launched" had the
> > old launched worker count and shared memory also had old buffer read
> > data which was never updated as we did not try to launch the worker.
> >
> > diff --git a/src/backend/access/heap/vacuumlazy.c
> > b/src/backend/access/heap/vacuumlazy.c
> > index b97b678..5dfaf4d 100644
> > --- a/src/backend/access/heap/vacuumlazy.c
> > +++ b/src/backend/access/heap/vacuumlazy.c
> > @@ -2150,7 +2150,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel,
> > IndexBulkDeleteResult **stats,
> >          * Next, accumulate buffer usage.  (This must wait for the workers to
> >          * finish, or we might get incomplete data.)
> >          */
> > -       for (i = 0; i < lps->pcxt->nworkers_launched; i++)
> > +       nworkers = Min(nworkers, lps->pcxt->nworkers_launched);
> > +       for (i = 0; i < nworkers; i++)
> >                 InstrAccumParallelQuery(&lps->buffer_usage[i]);
> >
> > It worked after the above fix.
> >
>
> Good catch.  I think we should not even call
> WaitForParallelWorkersToFinish for such a case.  So, I guess the fix
> could be,
>
> if (workers > 0)
> {
> WaitForParallelWorkersToFinish();
> for (i = 0; i < lps->pcxt->nworkers_launched; i++)
>                  InstrAccumParallelQuery(&lps->buffer_usage[i]);
> }
>
> or something along those lines.

Hmm, right!

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



On Wed, 1 Apr 2020 at 11:46, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Mar 31, 2020 at 7:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > While testing I have found one issue.  Basically, during a parallel
> > vacuum, it was showing a higher number of
> > shared_blks_hit + shared_blks_read.  After some investigation, I found
> > that during the cleanup phase nworkers are -1, and because of this we
> > didn't try to launch worker but "lps->pcxt->nworkers_launched" had the
> > old launched worker count and shared memory also had old buffer read
> > data which was never updated as we did not try to launch the worker.
> >
> > diff --git a/src/backend/access/heap/vacuumlazy.c
> > b/src/backend/access/heap/vacuumlazy.c
> > index b97b678..5dfaf4d 100644
> > --- a/src/backend/access/heap/vacuumlazy.c
> > +++ b/src/backend/access/heap/vacuumlazy.c
> > @@ -2150,7 +2150,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel,
> > IndexBulkDeleteResult **stats,
> >          * Next, accumulate buffer usage.  (This must wait for the workers to
> >          * finish, or we might get incomplete data.)
> >          */
> > -       for (i = 0; i < lps->pcxt->nworkers_launched; i++)
> > +       nworkers = Min(nworkers, lps->pcxt->nworkers_launched);
> > +       for (i = 0; i < nworkers; i++)
> >                 InstrAccumParallelQuery(&lps->buffer_usage[i]);
> >
> > It worked after the above fix.
> >
>
> Good catch.  I think we should not even call
> WaitForParallelWorkersToFinish for such a case.  So, I guess the fix
> could be,
>
> if (workers > 0)
> {
> WaitForParallelWorkersToFinish();
> for (i = 0; i < lps->pcxt->nworkers_launched; i++)
>                  InstrAccumParallelQuery(&lps->buffer_usage[i]);
> }
>

Agreed. I've attached the updated patch.

Thank you for testing, Dilip!

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment
On Wed, Apr 1, 2020 at 8:26 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 1 Apr 2020 at 11:46, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Mar 31, 2020 at 7:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > While testing I have found one issue.  Basically, during a parallel
> > > vacuum, it was showing a higher shared_blks_hit + shared_blks_read
> > > count.  After some investigation, I found that during the cleanup
> > > phase nworkers is -1, and because of this we didn't try to launch
> > > workers, but "lps->pcxt->nworkers_launched" still had the old
> > > launched-worker count and shared memory also had old buffer read
> > > data, which was never updated since we did not try to launch the workers.
> > >
> > > diff --git a/src/backend/access/heap/vacuumlazy.c
> > > b/src/backend/access/heap/vacuumlazy.c
> > > index b97b678..5dfaf4d 100644
> > > --- a/src/backend/access/heap/vacuumlazy.c
> > > +++ b/src/backend/access/heap/vacuumlazy.c
> > > @@ -2150,7 +2150,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel,
> > > IndexBulkDeleteResult **stats,
> > >          * Next, accumulate buffer usage.  (This must wait for the workers to
> > >          * finish, or we might get incomplete data.)
> > >          */
> > > -       for (i = 0; i < lps->pcxt->nworkers_launched; i++)
> > > +       nworkers = Min(nworkers, lps->pcxt->nworkers_launched);
> > > +       for (i = 0; i < nworkers; i++)
> > >                 InstrAccumParallelQuery(&lps->buffer_usage[i]);
> > >
> > > It worked after the above fix.
> > >
> >
> > Good catch.  I think we should not even call
> > WaitForParallelWorkersToFinish for such a case.  So, I guess the fix
> > could be,
> >
> > if (workers > 0)
> > {
> >     WaitForParallelWorkersToFinish();
> >     for (i = 0; i < lps->pcxt->nworkers_launched; i++)
> >         InstrAccumParallelQuery(&lps->buffer_usage[i]);
> > }
> >
>
> Agreed. I've attached the updated patch.
>
> Thank you for testing, Dilip!

Thanks!  One hunk is failing on the latest head, so I have rebased the
patch for my testing and am posting the rebased version.  I have also
done some more testing to exercise the multi-pass vacuum path.

postgres[114321]=# show maintenance_work_mem ;
 maintenance_work_mem
----------------------
 1MB
(1 row)

--Test case
select pg_stat_statements_reset();
drop table test;
CREATE TABLE test (a int, b int);
CREATE INDEX idx1 on test(a);
CREATE INDEX idx2 on test(b);
INSERT INTO test SELECT i, i FROM GENERATE_SERIES(1,2000000) as i;
DELETE FROM test where a%2=0;
VACUUM (PARALLEL n) test;
select query, total_time, shared_blks_hit, shared_blks_read,
shared_blks_hit + shared_blks_read as total_read_blks,
shared_blks_dirtied, shared_blks_written from pg_stat_statements where
query like 'VACUUM%';

          query           | total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
--------------------------+-------------+-----------------+------------------+-----------------+---------------------+---------------------
 VACUUM (PARALLEL 0) test | 5964.282408 |           92447 |                6 |           92453 |               19789 |                   0
(1 row)

          query           |     total_time     | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
--------------------------+--------------------+-----------------+------------------+-----------------+---------------------+---------------------
 VACUUM (PARALLEL 1) test | 3957.7658810000003 |           92447 |                6 |           92453 |               19789 |                   0
(1 row)

So I am getting correct results with the multi-pass vacuum.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment
On Wed, Apr 1, 2020 at 8:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> > Agreed. I've attached the updated patch.
> >
> > Thank you for testing, Dilip!
>
> Thanks!  One hunk is failing on the latest head, so I have rebased the
> patch for my testing and am posting the rebased version.  I have also
> done some more testing to exercise the multi-pass vacuum path.
>

The patch looks good to me.  I have made a few minor modifications: (a)
moved the declaration of a variable closer to where it is used, (b)
changed a comment, (c) ran pgindent.  I have also done some additional
testing with a larger number of indexes and found that vacuum and
parallel vacuum used the same number of total_read_blks, which is what
is expected here.

Let me know what you think of the attached?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment
On Wed, Apr 1, 2020 at 8:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Wed, Apr 1, 2020 at 8:26 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 1 Apr 2020 at 11:46, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Mar 31, 2020 at 7:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > While testing I have found one issue.  Basically, during a parallel
> > > > vacuum, it was showing a higher shared_blks_hit + shared_blks_read
> > > > count.  After some investigation, I found that during the cleanup
> > > > phase nworkers is -1, and because of this we didn't try to launch
> > > > workers, but "lps->pcxt->nworkers_launched" still had the old
> > > > launched-worker count and shared memory also had old buffer read
> > > > data, which was never updated since we did not try to launch the workers.
> > > >
> > > > diff --git a/src/backend/access/heap/vacuumlazy.c
> > > > b/src/backend/access/heap/vacuumlazy.c
> > > > index b97b678..5dfaf4d 100644
> > > > --- a/src/backend/access/heap/vacuumlazy.c
> > > > +++ b/src/backend/access/heap/vacuumlazy.c
> > > > @@ -2150,7 +2150,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel,
> > > > IndexBulkDeleteResult **stats,
> > > >          * Next, accumulate buffer usage.  (This must wait for the workers to
> > > >          * finish, or we might get incomplete data.)
> > > >          */
> > > > -       for (i = 0; i < lps->pcxt->nworkers_launched; i++)
> > > > +       nworkers = Min(nworkers, lps->pcxt->nworkers_launched);
> > > > +       for (i = 0; i < nworkers; i++)
> > > >                 InstrAccumParallelQuery(&lps->buffer_usage[i]);
> > > >
> > > > It worked after the above fix.
> > > >
> > >
> > > Good catch.  I think we should not even call
> > > WaitForParallelWorkersToFinish for such a case.  So, I guess the fix
> > > could be,
> > >
> > > if (workers > 0)
> > > {
> > >     WaitForParallelWorkersToFinish();
> > >     for (i = 0; i < lps->pcxt->nworkers_launched; i++)
> > >         InstrAccumParallelQuery(&lps->buffer_usage[i]);
> > > }
> > >
> >
> > Agreed. I've attached the updated patch.
> >
> > Thank you for testing, Dilip!
>
> Thanks!  One hunk is failing on the latest head, so I have rebased the
> patch for my testing and am posting the rebased version.  I have also
> done some more testing to exercise the multi-pass vacuum path.
>
> postgres[114321]=# show maintenance_work_mem ;
>  maintenance_work_mem
> ----------------------
>  1MB
> (1 row)
>
> --Test case
> select pg_stat_statements_reset();
> drop table test;
> CREATE TABLE test (a int, b int);
> CREATE INDEX idx1 on test(a);
> CREATE INDEX idx2 on test(b);
> INSERT INTO test SELECT i, i FROM GENERATE_SERIES(1,2000000) as i;
> DELETE FROM test where a%2=0;
> VACUUM (PARALLEL n) test;
> select query, total_time, shared_blks_hit, shared_blks_read,
> shared_blks_hit + shared_blks_read as total_read_blks,
> shared_blks_dirtied, shared_blks_written from pg_stat_statements where
> query like 'VACUUM%';
>
>           query           | total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
> --------------------------+-------------+-----------------+------------------+-----------------+---------------------+---------------------
>  VACUUM (PARALLEL 0) test | 5964.282408 |           92447 |                6 |           92453 |               19789 |                   0
> (1 row)
>
>           query           |     total_time     | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
> --------------------------+--------------------+-----------------+------------------+-----------------+---------------------+---------------------
>  VACUUM (PARALLEL 1) test | 3957.7658810000003 |           92447 |                6 |           92453 |               19789 |                   0
> (1 row)
>
> So I am getting correct results with the multi-pass vacuum.

I have done some testing for the parallel "create index".

postgres[99536]=# show maintenance_work_mem ;
 maintenance_work_mem
----------------------
 1MB
(1 row)

CREATE TABLE test (a int, b int);
INSERT INTO test SELECT i, i FROM GENERATE_SERIES(1,2000000) as i;
CREATE INDEX idx1 on test(a);
select query, total_time, shared_blks_hit, shared_blks_read,
shared_blks_hit + shared_blks_read as total_read_blks,
shared_blks_dirtied, shared_blks_written from pg_stat_statements where
query like 'CREATE INDEX%';


SET max_parallel_maintenance_workers TO 0;
            query             |     total_time     | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
------------------------------+--------------------+-----------------+------------------+-----------------+---------------------+---------------------
 CREATE INDEX idx1 on test(a) | 1947.4959979999999 |            8947 |               11 |            8958 |                   5 |                   0
(1 row)

SET max_parallel_maintenance_workers TO 2;

            query             |     total_time     | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
------------------------------+--------------------+-----------------+------------------+-----------------+---------------------+---------------------
 CREATE INDEX idx1 on test(a) | 1942.1426040000001 |            8960 |               14 |            8974 |                   5 |                   0
(1 row)

I have noticed that total_read_blks with the parallel create index is
higher than with the non-parallel one.  I created a fresh database
before each run.  I am not very familiar with the internals of parallel
index creation, so I am not sure whether it is expected to read extra
blocks in the parallel case.  My guess is that because multiple workers
are inserting into the btree, they might need to visit some btree nodes
multiple times while traversing down the tree.  But it would be better
if someone more familiar with this code could confirm this.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
So here's a v9, rebased on top of the latest versions of Sawada-san's bug fixes
(Amit's v6 for vacuum and Sawada-san's v2 for create index), with all
previously mentioned changes.

Note that I'm only attaching those patches for convenience and to make sure
that cfbot is happy.

Attachment

Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Wed, Apr 1, 2020 at 1:32 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> So here's a v9, rebased on top of the latest versions of Sawada-san's bug fixes
> (Amit's v6 for vacuum and Sawada-san's v2 for create index), with all
> previously mentioned changes.
>

Few other comments:
v9-0003-Add-infrastructure-to-track-WAL-usage
1.
 static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
-
+static void WalUsageAdd(WalUsage *dst, WalUsage *add);

Looks like a spurious line removal
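
For context, WalUsage is a small counter struct and WalUsageAdd is
member-wise addition, mirroring BufferUsageAdd.  A minimal sketch, with
field names assumed from the wal_records / wal_num_fpw / wal_bytes
counters discussed in this thread:

typedef struct WalUsage
{
    long        wal_records;    /* # of WAL records produced */
    long        wal_num_fpw;    /* # of WAL full page writes produced */
    uint64      wal_bytes;      /* size of WAL records produced */
} WalUsage;

static void
WalUsageAdd(WalUsage *dst, WalUsage *add)
{
    dst->wal_records += add->wal_records;
    dst->wal_num_fpw += add->wal_num_fpw;
    dst->wal_bytes += add->wal_bytes;
}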

2.
+ /* Report a full page imsage constructed for the WAL record */
+ *num_fpw += 1;

Typo. /imsage/image

3. Doing some testing with and without parallelism to ensure WAL usage
data is correct would be great and if possible, share the results?

v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
4.
+-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";
+                              query                               | calls | rows | wal_bytes_generated | wal_records_generated | wal_records_as_rows
+------------------------------------------------------------------+-------+------+---------------------+-----------------------+---------------------
+ DELETE FROM pgss_test WHERE a > $1                               |     1 |    1 | t                   | t                     | t
+ DROP TABLE pgss_test                                             |     1 |    0 | t                   | t                     | f
+ INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) |     1 |    3 | t                   | t                     | t
+ INSERT INTO pgss_test VALUES(generate_series($1, $2), $3)        |     1 |   10 | t                   | t                     | t
+ SELECT * FROM pgss_test ORDER BY a                               |     1 |   12 | f                   | f                     | f
+ SELECT * FROM pgss_test WHERE a > $1 ORDER BY a                  |     2 |    4 | f                   | f                     | f
+ SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5)          |     1 |    8 | f                   | f                     | f
+ SELECT pg_stat_statements_reset()                                |     1 |    1 | f                   | f                     | f
+ SET pg_stat_statements.track_utility = FALSE                     |     1 |    0 | f                   | f                     | t
+ UPDATE pgss_test SET b = $1 WHERE a = $2                         |     6 |    6 | t                   | t                     | t
+ UPDATE pgss_test SET b = $1 WHERE a > $2                         |     1 |    3 | t                   | t                     | t
+(11 rows)
+

I am not sure if the above tests make much sense as they are just
testing whether WAL is generated for these commands.  I understand it
is not easy to make these tests reliable but in that case, we can
think of some simple tests.  It seems to me that the difficulty is due
to full_page_writes as that depends on the checkpoint.  Can we make
full_page_writes = off for these tests and check some simple
Insert/Update/Delete cases?  Alternatively, if you can present the
reason why they are unstable or tricky to write, then we can simply
get rid of these tests because I don't see tests for BufferUsage.  Let's
not write tests for the sake of writing them unless they can detect bugs
in the future or meaningfully cover the new code added.
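
For instance, a sketch of the kind of deterministic check that would
become possible (note that full_page_writes cannot be set per-session,
so the test setup would need ALTER SYSTEM or a config change; the
statement and values are only illustrative):

ALTER SYSTEM SET full_page_writes = off;
SELECT pg_reload_conf();
-- with FPWs ruled out, a single-row INSERT should generate exactly one
-- WAL record:
INSERT INTO pgss_test VALUES (1, 0);
SELECT query, wal_records, rows FROM pg_stat_statements
  WHERE query LIKE 'INSERT INTO pgss_test%';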

5.
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
-               query               | calls | rows
------------------------------------+-------+------
- SELECT $1::TEXT                   |     1 |    1
- SELECT PLUS_ONE($1)               |     2 |    2
- SELECT PLUS_TWO($1)               |     2 |    2
- SELECT pg_stat_statements_reset() |     1 |    1
+SELECT query, calls, rows, wal_bytes, wal_records FROM pg_stat_statements ORDER BY query COLLATE "C";
+               query               | calls | rows | wal_bytes | wal_records
+-----------------------------------+-------+------+-----------+-------------
+ SELECT $1::TEXT                   |     1 |    1 |         0 |           0
+ SELECT PLUS_ONE($1)               |     2 |    2 |         0 |           0
+ SELECT PLUS_TWO($1)               |     2 |    2 |         0 |           0
+ SELECT pg_stat_statements_reset() |     1 |    1 |         0 |           0
 (4 rows)

Again, I am not sure if these modifications make much sense?

6.
 static void pgss_shmem_startup(void);
@@ -313,6 +318,7 @@ static void pgss_store(const char *query, uint64 queryId,
     int query_location, int query_len,
     double total_time, uint64 rows,
     const BufferUsage *bufusage,
+    const WalUsage* walusage,
     pgssJumbleState *jstate);

The alignment for walusage doesn't seem to be correct. Running
pgindent will fix this.

7.
+ values[i++] = Int64GetDatumFast(tmp.wal_records);
+ values[i++] = UInt64GetDatum(tmp.wal_num_fpw);

Why are they different?  I think we should use the same *GetDatum API
(probably Int64GetDatumFast) for these.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Wed, Apr 1, 2020 at 4:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
>

One more comment related to this patch.
+
+ snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
+
+ /* Convert to numeric. */
+ wal_bytes = DirectFunctionCall3(numeric_in,
+                                 CStringGetDatum(buf),
+                                 ObjectIdGetDatum(0),
+                                 Int32GetDatum(-1));
+
+ values[i++] = wal_bytes;

I see that other places that display uint64 values use BIGINT datatype
in SQL, so why can't we do the same here?  See the usage of queryid in
pg_stat_statements or internal_pages, *_pages exposed via
pgstatindex.c.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



On Wed, Apr 1, 2020 at 12:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Apr 1, 2020 at 8:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > > Agreed. I've attached the updated patch.
> > >
> > > Thank you for testing, Dilip!
> >
> > Thanks!  One hunk is failing on the latest head.  And, I have rebased
> > the patch for my testing so posting the same.  I have done some more
> > testing to test multi-pass vacuum.
> >
>
> The patch looks good to me.  I have done a few minor modifications (a)
> moved the declaration of variable closer to where it is used, (b)
> changed a comment, (c) ran pgindent.  I have also done some additional
> testing with more number of indexes and found that vacuum and parallel
> vacuum used the same number of total_read_blks and that is what is
> expected here.
>
> Let me know what you think of the attached?

The patch looks fine to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Dilip Kumar
Date:
On Wed, Apr 1, 2020 at 5:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Apr 1, 2020 at 4:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
> >
>
> One more comment related to this patch.
> +
> + snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
> +
> + /* Convert to numeric. */
> + wal_bytes = DirectFunctionCall3(numeric_in,
> +                                 CStringGetDatum(buf),
> +                                 ObjectIdGetDatum(0),
> +                                 Int32GetDatum(-1));
> +
> + values[i++] = wal_bytes;
>
> I see that other places that display uint64 values use BIGINT datatype
> in SQL, so why can't we do the same here?  See the usage of queryid in
> pg_stat_statements or internal_pages, *_pages exposed via
> pgstatindex.c.

I have reviewed 0003 and 0004; I have a few comments.
v9-0003-Add-infrastructure-to-track-WAL-usage

1.
  /* Points to buffer usage area in DSM */
  BufferUsage *buffer_usage;
+ /* Points to WAL usage area in DSM */
+ WalUsage   *wal_usage;

Better to give one blank line between the previous statement/variable
declaration and the next comment line.

  /* Points to buffer usage area in DSM */
  BufferUsage *buffer_usage;
---------Empty line here--------------------
+ /* Points to WAL usage area in DSM */
+ WalUsage   *wal_usage;

2.
@@ -2154,7 +2157,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel,
IndexBulkDeleteResult **stats,
  WaitForParallelWorkersToFinish(lps->pcxt);

  for (i = 0; i < lps->pcxt->nworkers_launched; i++)
- InstrAccumParallelQuery(&lps->buffer_usage[i]);
+ InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
  }

The existing comment above this loop just mentions the buffer usage,
not the wal usage, so I guess we need to change that.
" /*
* Next, accumulate buffer usage.  (This must wait for the workers to
* finish, or we might get incomplete data.)
*/"


v9-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut

3.
+ if (usage->wal_num_fpw > 0)
+ appendStringInfo(es->str, " full page records=%ld",
+    usage->wal_num_fpw);
+ if (usage->wal_bytes > 0)
+ appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
+    usage->wal_bytes);

Shall we change to 'full page writes' or 'full page image' instead of
full page records?
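
For instance, assuming the new EXPLAIN option is spelled WAL, the text
output after that rename would read roughly as below (hypothetical
statement, illustrative numbers, not taken from the patch):

=# EXPLAIN (ANALYZE, WAL) UPDATE test SET b = b + 1 WHERE a < 100;
   ...
   WAL:  records=99  full page writes=3  bytes=18132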

Apart from this, I have done some testing to see the wal_usage with the
parallel vacuum, and the results look fine.

postgres[104248]=# CREATE TABLE test (a int, b int);
CREATE TABLE
postgres[104248]=# INSERT INTO test SELECT i, i FROM
GENERATE_SERIES(1,2000000) as i;
INSERT 0 2000000
postgres[104248]=# CREATE INDEX idx1 on test(a);
CREATE INDEX
postgres[104248]=# VACUUM (PARALLEL 1) test;
VACUUM
postgres[104248]=# select query , wal_bytes, wal_records, wal_num_fpw
from pg_stat_statements where query like 'VACUUM%';
          query           | wal_bytes | wal_records | wal_num_fpw
--------------------------+-----------+-------------+-------------
 VACUUM (PARALLEL 1) test |  72814331 |        8857 |        8855



postgres[106479]=# CREATE TABLE test (a int, b int);
CREATE TABLE
postgres[106479]=# INSERT INTO test SELECT i, i FROM
GENERATE_SERIES(1,2000000) as i;
INSERT 0 2000000
postgres[106479]=# CREATE INDEX idx1 on test(a);
CREATE INDEX
postgres[106479]=# VACUUM (PARALLEL 0) test;
VACUUM
postgres[106479]=# select query , wal_bytes, wal_records, wal_num_fpw
from pg_stat_statements where query like 'VACUUM%';
          query           | wal_bytes | wal_records | wal_num_fpw
--------------------------+-----------+-------------+-------------
 VACUUM (PARALLEL 0) test |  72814331 |        8857 |        8855

By tomorrow, I will try to finish reviewing 0005 and 0006.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
Hi,

I'm replying here to all reviews that have been sent, thanks a lot!

On Wed, Apr 01, 2020 at 04:29:16PM +0530, Amit Kapila wrote:
> On Wed, Apr 1, 2020 at 1:32 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > So here's a v9, rebased on top of the latest versions of Sawada-san's bug fixes
> > (Amit's v6 for vacuum and Sawada-san's v2 for create index), with all
> > previously mentioned changes.
> >
> 
> Few other comments:
> v9-0003-Add-infrastructure-to-track-WAL-usage
> 1.
>  static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
> -
> +static void WalUsageAdd(WalUsage *dst, WalUsage *add);
> 
> Looks like a spurious line removal


Fixed.


> 2.
> + /* Report a full page imsage constructed for the WAL record */
> + *num_fpw += 1;
> 
> Typo. /imsage/image


Ah sorry, I thought I had fixed that previously; fixed.


> 3. Doing some testing with and without parallelism to ensure WAL usage
> data is correct would be great and if possible, share the results?


I just saw that Dilip did some testing, but just in case here is some
additional testing:

- vacuum, after a truncate, loading 1M rows and a "UPDATE t1 SET id = id"

=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%vacuum%';
         query          | calls | wal_bytes | wal_records | wal_num_fpw
------------------------+-------+-----------+-------------+-------------
 vacuum (parallel 3) t1 |     1 |  20098962 |       34104 |           2
 vacuum (parallel 0) t1 |     1 |  20098962 |       34104 |           2
(2 rows)

- create index, overriding t1's parallel_workers, using the 1M rows just
  vacuumed:

=# alter table t1 set (parallel_workers = 2);
ALTER TABLE

=# create index t1_parallel_2 on t1(id);
CREATE INDEX

=# alter table t1 set (parallel_workers = 0);
ALTER TABLE

=# create index t1_parallel_0 on t1(id);
CREATE INDEX

=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
                query                 | calls | wal_bytes | wal_records | wal_num_fpw
--------------------------------------+-------+-----------+-------------+-------------
 create index t1_parallel_0 on t1(id) |     1 |  20355540 |        2762 |        2745
 create index t1_parallel_2 on t1(id) |     1 |  20406811 |        2762 |        2758
(2 rows)

It all looks good to me.


> v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
> 4.
> +-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE
> +SELECT query, calls, rows,
> +wal_bytes > 0 as wal_bytes_generated,
> +wal_records > 0 as wal_records_generated,
> +wal_records = rows as wal_records_as_rows
> +FROM pg_stat_statements ORDER BY query COLLATE "C";
> +                              query                               | calls | rows | wal_bytes_generated | wal_records_generated | wal_records_as_rows
> +------------------------------------------------------------------+-------+------+---------------------+-----------------------+---------------------
> + DELETE FROM pgss_test WHERE a > $1                               |     1 |    1 | t                   | t                     | t
> + DROP TABLE pgss_test                                             |     1 |    0 | t                   | t                     | f
> + INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) |     1 |    3 | t                   | t                     | t
> + INSERT INTO pgss_test VALUES(generate_series($1, $2), $3)        |     1 |   10 | t                   | t                     | t
> + SELECT * FROM pgss_test ORDER BY a                               |     1 |   12 | f                   | f                     | f
> + SELECT * FROM pgss_test WHERE a > $1 ORDER BY a                  |     2 |    4 | f                   | f                     | f
> + SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5)          |     1 |    8 | f                   | f                     | f
> + SELECT pg_stat_statements_reset()                                |     1 |    1 | f                   | f                     | f
> + SET pg_stat_statements.track_utility = FALSE                     |     1 |    0 | f                   | f                     | t
> + UPDATE pgss_test SET b = $1 WHERE a = $2                         |     6 |    6 | t                   | t                     | t
> + UPDATE pgss_test SET b = $1 WHERE a > $2                         |     1 |    3 | t                   | t                     | t
> +(11 rows)
> +
> 
> I am not sure if the above tests make much sense as they are just
> testing whether WAL is generated for these commands.  I understand it
> is not easy to make these tests reliable but in that case, we can
> think of some simple tests.  It seems to me that the difficulty is due
> to full_page_writes as that depends on the checkpoint.  Can we make
> full_page_writes = off for these tests and check some simple
> Insert/Update/Delete cases?  Alternatively, if you can present the
> reason why they are unstable or tricky to write, then we can simply
> get rid of these tests because I don't see tests for BufferUsage.  Let's
> not write tests for the sake of writing them unless they can detect bugs
> in the future or meaningfully cover the new code added.


I don't think we can hope for a stable amount of WAL bytes generated, so
testing for a positive number looks sensible to me.  Then testing that each
single-row write generates one WAL record also looks sensible, so I kept this.
I realized that Kirill used an existing set of queries that were previously
added to validate the multi-query command behavior, so there's no need to
have all of them again.  I just kept one of each (insert, update, delete,
select) to make sure that we do record WAL activity there, but I don't think
that more can really be done.  I still think that this is better than
nothing, but if you disagree feel free to drop those tests.


> 5.
> -SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
> -               query               | calls | rows
> ------------------------------------+-------+------
> - SELECT $1::TEXT                   |     1 |    1
> - SELECT PLUS_ONE($1)               |     2 |    2
> - SELECT PLUS_TWO($1)               |     2 |    2
> - SELECT pg_stat_statements_reset() |     1 |    1
> +SELECT query, calls, rows, wal_bytes, wal_records FROM pg_stat_statements ORDER BY query COLLATE "C";
> +               query               | calls | rows | wal_bytes | wal_records
> +-----------------------------------+-------+------+-----------+-------------
> + SELECT $1::TEXT                   |     1 |    1 |         0 |           0
> + SELECT PLUS_ONE($1)               |     2 |    2 |         0 |           0
> + SELECT PLUS_TWO($1)               |     2 |    2 |         0 |           0
> + SELECT pg_stat_statements_reset() |     1 |    1 |         0 |           0
>  (4 rows)
> 
> Again, I am not sure if these modifications make much sense?


Those are queries that were previously executed.  As those are read-only
queries that are pretty much guaranteed not to cause any WAL activity, I
don't see how it hurts to also test that pg_stat_statements indeed records
zero WAL activity for them, just to be safe.  Once again, feel free to drop
the extra wal_* columns from the output if you disagree.


> 6.
>  static void pgss_shmem_startup(void);
> @@ -313,6 +318,7 @@ static void pgss_store(const char *query, uint64 queryId,
>      int query_location, int query_len,
>      double total_time, uint64 rows,
>      const BufferUsage *bufusage,
> +    const WalUsage* walusage,
>      pgssJumbleState *jstate);
> 
> The alignment for walusage doesn't seem to be correct. Running
> pgindent will fix this.


Indeed, fixed.

> 7.
> + values[i++] = Int64GetDatumFast(tmp.wal_records);
> + values[i++] = UInt64GetDatum(tmp.wal_num_fpw);
> 
> Why are they different?  I think we should use the same *GetDatum API
> (probably Int64GetDatumFast) for these.


Oops, that's a mistake from when I was working on the wal_bytes output, fixed.

> > v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
> >
>
> One more comment related to this patch.
> +
> + snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
> +
> + /* Convert to numeric. */
> + wal_bytes = DirectFunctionCall3(numeric_in,
> +                                 CStringGetDatum(buf),
> +                                 ObjectIdGetDatum(0),
> +                                 Int32GetDatum(-1));
> +
> + values[i++] = wal_bytes;
>
> I see that other places that display uint64 values use BIGINT datatype
> in SQL, so why can't we do the same here?  See the usage of queryid in
> pg_stat_statements or internal_pages, *_pages exposed via
> pgstatindex.c.


That's because it's harmless to report a signed number for a hash (at least
compared to the overhead of keeping it unsigned), while we certainly don't
want to report a negative amount of WAL bytes generated if the counter goes
beyond the bigint limit.  See the usage of pg_lsn_mi in pg_lsn.c for instance.
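
To illustrate the limit in plain SQL (not from the patch):

=# SELECT 18446744073709551615::numeric;       -- 2^64 - 1 fits in numeric
       numeric
----------------------
 18446744073709551615
(1 row)

=# SELECT 18446744073709551615::bigint;        -- but overflows a signed bigint
ERROR:  bigint out of range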

On Wed, Apr 01, 2020 at 07:20:31PM +0530, Dilip Kumar wrote:
>
> I have reviewed 0003 and 0004; I have a few comments.
> v9-0003-Add-infrastructure-to-track-WAL-usage
>
> 1.
>   /* Points to buffer usage area in DSM */
>   BufferUsage *buffer_usage;
> + /* Points to WAL usage area in DSM */
> + WalUsage   *wal_usage;
>
> Better to give one blank line between the previous statement/variable
> declaration and the next comment line.


Fixed.


> 2.
> @@ -2154,7 +2157,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel,
> IndexBulkDeleteResult **stats,
>   WaitForParallelWorkersToFinish(lps->pcxt);
>
>   for (i = 0; i < lps->pcxt->nworkers_launched; i++)
> - InstrAccumParallelQuery(&lps->buffer_usage[i]);
> + InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
>   }
>
> > The existing comment above this loop just mentions the buffer usage,
> > not the wal usage, so I guess we need to change that.


Ah indeed, I thought I caught all the comments but missed this one.  Fixed.


> v9-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut
>
> 3.
> + if (usage->wal_num_fpw > 0)
> + appendStringInfo(es->str, " full page records=%ld",
> +    usage->wal_num_fpw);
> + if (usage->wal_bytes > 0)
> + appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
> +    usage->wal_bytes);
>
> Shall we change to 'full page writes' or 'full page image' instead of
> full page records?


Indeed, I changed it in the (auto)vacuum output but missed this one.  Fixed.


> Apart from this, I have done some testing to see the wal_usage with the
> parallel vacuum, and the results look fine.
>
> postgres[104248]=# CREATE TABLE test (a int, b int);
> CREATE TABLE
> postgres[104248]=# INSERT INTO test SELECT i, i FROM
> GENERATE_SERIES(1,2000000) as i;
> INSERT 0 2000000
> postgres[104248]=# CREATE INDEX idx1 on test(a);
> CREATE INDEX
> postgres[104248]=# VACUUM (PARALLEL 1) test;
> VACUUM
> postgres[104248]=# select query , wal_bytes, wal_records, wal_num_fpw
> from pg_stat_statements where query like 'VACUUM%';
>           query           | wal_bytes | wal_records | wal_num_fpw
> --------------------------+-----------+-------------+-------------
>  VACUUM (PARALLEL 1) test |  72814331 |        8857 |        8855
>
>
>
> postgres[106479]=# CREATE TABLE test (a int, b int);
> CREATE TABLE
> postgres[106479]=# INSERT INTO test SELECT i, i FROM
> GENERATE_SERIES(1,2000000) as i;
> INSERT 0 2000000
> postgres[106479]=# CREATE INDEX idx1 on test(a);
> CREATE INDEX
> postgres[106479]=# VACUUM (PARALLEL 0) test;
> VACUUM
> postgres[106479]=# select query , wal_bytes, wal_records, wal_num_fpw
> from pg_stat_statements where query like 'VACUUM%';
>           query           | wal_bytes | wal_records | wal_num_fpw
> --------------------------+-----------+-------------+-------------
>  VACUUM (PARALLEL 0) test |  72814331 |        8857 |        8855


Thanks!  I did some similar testing, also with sequential and parallel
index creation, and got similar results.


> By tomorrow, I will try to finish reviewing 0005 and 0006.

Thanks!

Attachment
Adding Peter G.

On Wed, Apr 1, 2020 at 12:41 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> I have done some testing for the parallel "create index".
>
> postgres[99536]=# show maintenance_work_mem ;
>  maintenance_work_mem
> ----------------------
>  1MB
> (1 row)
>
> CREATE TABLE test (a int, b int);
> INSERT INTO test SELECT i, i FROM GENERATE_SERIES(1,2000000) as i;
> CREATE INDEX idx1 on test(a);
> select query, total_time, shared_blks_hit, shared_blks_read,
> shared_blks_hit + shared_blks_read as total_read_blks,
> shared_blks_dirtied, shared_blks_written from pg_stat_statements where
> query like 'CREATE INDEX%';
>
>
> SET max_parallel_maintenance_workers TO 0;
>             query             |     total_time     | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
> ------------------------------+--------------------+-----------------+------------------+-----------------+---------------------+---------------------
>  CREATE INDEX idx1 on test(a) | 1947.4959979999999 |            8947 |               11 |            8958 |                   5 |                   0
> (1 row)
>
> SET max_parallel_maintenance_workers TO 2;
>
>             query             |     total_time     | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
> ------------------------------+--------------------+-----------------+------------------+-----------------+---------------------+---------------------
>  CREATE INDEX idx1 on test(a) | 1942.1426040000001 |            8960 |               14 |            8974 |                   5 |                   0
> (1 row)
>
> I have noticed that total_read_blks with the parallel create index is
> higher than with the non-parallel one.  I created a fresh database
> before each run.  I am not very familiar with the internals of parallel
> index creation, so I am not sure whether it is expected to read extra
> blocks in the parallel case.  My guess is that because multiple workers
> are inserting into the btree, they might need to visit some btree nodes
> multiple times while traversing down the tree.  But it would be better
> if someone more familiar with this code could confirm this.
>

Peter, is this behavior expected?

Let me summarize the situation so that it is easier for Peter to
comment.  Julien has noticed that parallel vacuum and parallel create
index don't seem to report correct values for buffer usage stats.
Sawada-San wrote a patch to fix the problem for both cases.  We expect
that 'total_read_blks' as reported in pg_stat_statements should give
the same value for parallel and non-parallel operations.  We see that
this holds for parallel vacuum, and previously we made the same
observation for parallel queries.  Now, for parallel create index this
doesn't seem to be true, as Dilip's test results show.  We have two
possibilities here: (a) there is some bug in Sawada-San's patch, or
(b) this is expected behavior for parallel create index.  What do you
think?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



On Wed, Apr 1, 2020 at 7:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> Peter, is this behavior expected?
>
> Let me summarize the situation so that it is easier for Peter to
> comment.  Julien has noticed that parallel vacuum and parallel create
> index don't seem to report correct values for buffer usage stats.
> Sawada-San wrote a patch to fix the problem for both cases.  We expect
> that 'total_read_blks' as reported in pg_stat_statements should give
> the same value for parallel and non-parallel operations.  We see that
> this holds for parallel vacuum, and previously we made the same
> observation for parallel queries.  Now, for parallel create index this
> doesn't seem to be true, as Dilip's test results show.  We have two
> possibilities here: (a) there is some bug in Sawada-San's patch, or
> (b) this is expected behavior for parallel create index.  What do you
> think?

nbtree CREATE INDEX doesn't even go through the buffer manager. The
difference that Dilip showed is probably due to extra catalog accesses
in the two parallel workers -- pg_amproc lookups, and the like. Those
are rather small differences, overall.

Can Dilip demonstrate that the "extra" buffer accesses are
proportionate to the number of workers launched in some constant,
predictable way?

-- 
Peter Geoghegan



On Thu, Apr 2, 2020 at 8:34 AM Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Wed, Apr 1, 2020 at 7:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Peter, is this behavior expected?
> >
> > Let me summarize the situation so that it is easier for Peter to
> > comment.  Julien has noticed that parallel vacuum and parallel create
> > index don't seem to report correct values for buffer usage stats.
> > Sawada-San wrote a patch to fix the problem for both cases.  We expect
> > that 'total_read_blks' as reported in pg_stat_statements should give
> > the same value for parallel and non-parallel operations.  We see that
> > this holds for parallel vacuum, and previously we made the same
> > observation for parallel queries.  Now, for parallel create index this
> > doesn't seem to be true, as Dilip's test results show.  We have two
> > possibilities here: (a) there is some bug in Sawada-San's patch, or
> > (b) this is expected behavior for parallel create index.  What do you
> > think?
>
> nbtree CREATE INDEX doesn't even go through the buffer manager.

Thanks for clarifying.  So IIUC, it will not go through the buffer
manager for the index pages, but for the heap pages it will still go
through the buffer manager.

> The
> difference that Dilip showed is probably due to extra catalog accesses
> in the two parallel workers -- pg_amproc lookups, and the like. Those
> are rather small differences, overall.

> Can Dilip demonstrate that the "extra" buffer accesses are
> proportionate to the number of workers launched in some constant,
> predictable way?

Okay, I will test this.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



On Thu, Apr 2, 2020 at 9:13 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Apr 2, 2020 at 8:34 AM Peter Geoghegan <pg@bowt.ie> wrote:
> >
> > On Wed, Apr 1, 2020 at 7:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > Peter, is this behavior expected?
> > >
> > > Let me summarize the situation so that it is easier for Peter to
> > > comment.  Julien has noticed that parallel vacuum and parallel create
> > > index don't seem to report correct values for buffer usage stats.
> > > Sawada-San wrote a patch to fix the problem for both cases.  We expect
> > > that 'total_read_blks' as reported in pg_stat_statements should give
> > > the same value for parallel and non-parallel operations.  We see that
> > > this holds for parallel vacuum, and previously we made the same
> > > observation for parallel queries.  Now, for parallel create index this
> > > doesn't seem to be true, as Dilip's test results show.  We have two
> > > possibilities here: (a) there is some bug in Sawada-San's patch, or
> > > (b) this is expected behavior for parallel create index.  What do you
> > > think?
> >
> > nbtree CREATE INDEX doesn't even go through the buffer manager.
>
> Thanks for clarifying.  So IIUC, it will not go through the buffer
> manager for the index pages, but for the heap pages it will still go
> through the buffer manager.
>
> > The
> > difference that Dilip showed is probably due to extra catalog accesses
> > in the two parallel workers -- pg_amproc lookups, and the like. Those
> > are rather small differences, overall.
>
> > Can Dilip demonstrate that the "extra" buffer accesses are
> > proportionate to the number of workers launched in some constant,
> > predictable way?
>
> Okay, I will test this.

0-worker
            query             | total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
------------------------------+-------------+-----------------+------------------+-----------------+---------------------+---------------------
 CREATE INDEX idx1 on test(a) | 1228.895057 |            8947 |               11 |            8971 |                   5 |                   0

1-worker
            query             | total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
------------------------------+-------------+-----------------+------------------+-----------------+---------------------+---------------------
 CREATE INDEX idx1 on test(a) | 1006.157231 |            8962 |               12 |            8974 |                   5 |                   0

2-workers
            query             | total_time | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
------------------------------+------------+-----------------+------------------+-----------------+---------------------+---------------------
 CREATE INDEX idx1 on test(a) |  949.44663 |            8965 |               12 |            8977 |                   5 |                   0

3-workers
            query             | total_time  | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
------------------------------+-------------+-----------------+------------------+-----------------+---------------------+---------------------
 CREATE INDEX idx1 on test(a) | 1037.297196 |            8968 |               12 |            8980 |                   5 |                   0

4-workers
            query             | total_time | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written
------------------------------+------------+-----------------+------------------+-----------------+---------------------+---------------------
 CREATE INDEX idx1 on test(a) | 889.332782 |            8971 |               12 |            8983 |                   6 |                   0

You are right, it is increasing by a roughly constant amount per worker.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Wed, Apr 01, 2020 at 04:29:16PM +0530, Amit Kapila wrote:
> > 3. Doing some testing with and without parallelism to ensure WAL usage
> > data is correct would be great and if possible, share the results?
>
>
> I just saw that Dilip did some testing, but just in case here is some
> additional testing:
>
> - vacuum, after a truncate, loading 1M rows and a "UPDATE t1 SET id = id"
>
> =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%vacuum%';
>          query          | calls | wal_bytes | wal_records | wal_num_fpw
> ------------------------+-------+-----------+-------------+-------------
>  vacuum (parallel 3) t1 |     1 |  20098962 |       34104 |           2
>  vacuum (parallel 0) t1 |     1 |  20098962 |       34104 |           2
> (2 rows)
>
> - create index, overriding t1's parallel_workers, using the 1M rows just
>   vacuumed:
>
> =# alter table t1 set (parallel_workers = 2);
> ALTER TABLE
>
> =# create index t1_parallel_2 on t1(id);
> CREATE INDEX
>
> =# alter table t1 set (parallel_workers = 0);
> ALTER TABLE
>
> =# create index t1_parallel_0 on t1(id);
> CREATE INDEX
>
> =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
>                 query                 | calls | wal_bytes | wal_records | wal_num_fpw
> --------------------------------------+-------+-----------+-------------+-------------
>  create index t1_parallel_0 on t1(id) |     1 |  20355540 |        2762 |        2745
>  create index t1_parallel_2 on t1(id) |     1 |  20406811 |        2762 |        2758
> (2 rows)
>
> It all looks good to me.
>

Here the wal_num_fpw and wal_bytes are different between the parallel
and non-parallel versions.  Is that due to a checkpoint or something
else?  We can probably rule out checkpoints by increasing
checkpoint_timeout and other checkpoint-related parameters.
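
For instance, something like this should keep checkpoints out of a
short test window (the values are arbitrary):

ALTER SYSTEM SET checkpoint_timeout = '1h';
ALTER SYSTEM SET max_wal_size = '20GB';
SELECT pg_reload_conf();
-- then re-run the parallel and non-parallel cases and compare
-- wal_num_fpw and wal_bytes again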

>
> > 5.
> > -SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
> > -               query               | calls | rows
> > ------------------------------------+-------+------
> > - SELECT $1::TEXT                   |     1 |    1
> > - SELECT PLUS_ONE($1)               |     2 |    2
> > - SELECT PLUS_TWO($1)               |     2 |    2
> > - SELECT pg_stat_statements_reset() |     1 |    1
> > +SELECT query, calls, rows, wal_bytes, wal_records FROM pg_stat_statements ORDER BY query COLLATE "C";
> > +               query               | calls | rows | wal_bytes | wal_records
> > +-----------------------------------+-------+------+-----------+-------------
> > + SELECT $1::TEXT                   |     1 |    1 |         0 |           0
> > + SELECT PLUS_ONE($1)               |     2 |    2 |         0 |           0
> > + SELECT PLUS_TWO($1)               |     2 |    2 |         0 |           0
> > + SELECT pg_stat_statements_reset() |     1 |    1 |         0 |           0
> >  (4 rows)
> >
> > Again, I am not sure if these modifications make much sense?
>
>
> Those are queries that were previously executed.  As those are read-only
> queries that are pretty much guaranteed not to cause any WAL activity, I
> don't see how it hurts to also test that pg_stat_statements indeed records
> zero WAL activity for them, just to be safe.
>

On a similar theory, one could have checked the buffer usage stats as
well.  The statements use some expressions, so I don't see any value in
checking all the usage data for such statements.

>  Once again, feel free to drop the extra
> wal_* columns from the output if you disagree.
>

Right now, that particular patch does not apply (probably due
to recent commit 17e0328224).  Can you rebase it?

>
>
> > v9-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut
> >
> > 3.
> > + if (usage->wal_num_fpw > 0)
> > + appendStringInfo(es->str, " full page records=%ld",
> > +    usage->wal_num_fpw);
> > + if (usage->wal_bytes > 0)
> > + appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
> > +    usage->wal_bytes);
> >
> > Shall we change to 'full page writes' or 'full page image' instead of
> > full page records?
>
>
> Indeed, I changed it in the (auto)vacuum output but missed this one.  Fixed.
>

I don't see this change in the patch.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Thu, Apr 2, 2020 at 11:07 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>

Also, I forgot to mention that we should not base this on the buffer
usage patch for create index
(v10-0002-Allow-parallel-index-creation-to-accumulate-buff) because, as
per the recent discussion, I am not sure about its usefulness.  I think
we can proceed with this patch without
v10-0002-Allow-parallel-index-creation-to-accumulate-buff as well.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Dilip Kumar
Date:
On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> Hi,
>
> I'm replying here to all reviews that have been sent, thanks a lot!
>
> On Wed, Apr 01, 2020 at 04:29:16PM +0530, Amit Kapila wrote:
> > On Wed, Apr 1, 2020 at 1:32 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > > So here's a v9, rebased on top of the latest versions of Sawada-san's bug fixes
> > > (Amit's v6 for vacuum and Sawada-san's v2 for create index), with all
> > > previously mentioned changes.
> > >
> >
> > Few other comments:
> > v9-0003-Add-infrastructure-to-track-WAL-usage
> > 1.
> >  static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add);
> > -
> > +static void WalUsageAdd(WalUsage *dst, WalUsage *add);
> >
> > Looks like a spurious line removal
>
>
> Fixed.
>
>
> > 2.
> > + /* Report a full page imsage constructed for the WAL record */
> > + *num_fpw += 1;
> >
> > Typo. /imsage/image
>
>
> Ah sorry, I thought I had fixed that previously; fixed.
>
>
> > 3. Doing some testing with and without parallelism to ensure WAL usage
> > data is correct would be great and if possible, share the results?
>
>
> I just saw that Dilip did some testing, but just in case here is some
> additional testing:
>
> - vacuum, after a truncate, loading 1M rows and a "UPDATE t1 SET id = id"
>
> =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%vacuum%';
>          query          | calls | wal_bytes | wal_records | wal_num_fpw
> ------------------------+-------+-----------+-------------+-------------
>  vacuum (parallel 3) t1 |     1 |  20098962 |       34104 |           2
>  vacuum (parallel 0) t1 |     1 |  20098962 |       34104 |           2
> (2 rows)
>
> - create index, overriding t1's parallel_workers, using the 1M rows just
>   vacuumed:
>
> =# alter table t1 set (parallel_workers = 2);
> ALTER TABLE
>
> =# create index t1_parallel_2 on t1(id);
> CREATE INDEX
>
> =# alter table t1 set (parallel_workers = 0);
> ALTER TABLE
>
> =# create index t1_parallel_0 on t1(id);
> CREATE INDEX
>
> =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
>                 query                 | calls | wal_bytes | wal_records | wal_num_fpw
> --------------------------------------+-------+-----------+-------------+-------------
>  create index t1_parallel_0 on t1(id) |     1 |  20355540 |        2762 |        2745
>  create index t1_parallel_2 on t1(id) |     1 |  20406811 |        2762 |        2758
> (2 rows)
>
> It all looks good to me.
>
>
> > v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
> > 4.
> > +-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE
> > +SELECT query, calls, rows,
> > +wal_bytes > 0 as wal_bytes_generated,
> > +wal_records > 0 as wal_records_generated,
> > +wal_records = rows as wal_records_as_rows
> > +FROM pg_stat_statements ORDER BY query COLLATE "C";
> > +                              query                               | calls | rows | wal_bytes_generated | wal_records_generated | wal_records_as_rows
> > +------------------------------------------------------------------+-------+------+---------------------+-----------------------+---------------------
> > + DELETE FROM pgss_test WHERE a > $1                               |     1 |    1 | t                   | t                     | t
> > + DROP TABLE pgss_test                                             |     1 |    0 | t                   | t                     | f
> > + INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) |     1 |    3 | t                   | t                     | t
> > + INSERT INTO pgss_test VALUES(generate_series($1, $2), $3)        |     1 |   10 | t                   | t                     | t
> > + SELECT * FROM pgss_test ORDER BY a                               |     1 |   12 | f                   | f                     | f
> > + SELECT * FROM pgss_test WHERE a > $1 ORDER BY a                  |     2 |    4 | f                   | f                     | f
> > + SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5)          |     1 |    8 | f                   | f                     | f
> > + SELECT pg_stat_statements_reset()                                |     1 |    1 | f                   | f                     | f
> > + SET pg_stat_statements.track_utility = FALSE                     |     1 |    0 | f                   | f                     | t
> > + UPDATE pgss_test SET b = $1 WHERE a = $2                         |     6 |    6 | t                   | t                     | t
> > + UPDATE pgss_test SET b = $1 WHERE a > $2                         |     1 |    3 | t                   | t                     | t
> > +(11 rows)
> > +
> >
> > I am not sure if the above tests make much sense as they are just
> > testing whether WAL is generated for these commands.  I understand it
> > is not easy to make these tests reliable but in that case, we can
> > think of some simple tests.  It seems to me that the difficulty is due
> > to full_page_writes as that depends on the checkpoint.  Can we make
> > full_page_writes = off for these tests and check some simple
> > Insert/Update/Delete cases?  Alternatively, if you can present the
> > reason why they are unstable or tricky to write, then we can simply
> > get rid of these tests because I don't see tests for BufferUsage.  Let's
> > not write tests for the sake of writing them unless they can detect bugs
> > in the future or meaningfully cover the new code added.
>
>
> I don't think we can hope for a stable amount of WAL bytes generated, so
> testing for a positive number looks sensible to me.  Then testing that each
> single-row write generates one WAL record also looks sensible, so I kept this.
> I realized that Kirill used an existing set of queries that were previously
> added to validate the multi-query command behavior, so there's no need to
> have all of them again.  I just kept one of each (insert, update, delete,
> select) to make sure that we do record WAL activity there, but I don't think
> that more can really be done.  I still think that this is better than
> nothing, but if you disagree feel free to drop those tests.
>
>
> > 5.
> > -SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
> > -               query               | calls | rows
> > ------------------------------------+-------+------
> > - SELECT $1::TEXT                   |     1 |    1
> > - SELECT PLUS_ONE($1)               |     2 |    2
> > - SELECT PLUS_TWO($1)               |     2 |    2
> > - SELECT pg_stat_statements_reset() |     1 |    1
> > +SELECT query, calls, rows, wal_bytes, wal_records FROM pg_stat_statements ORDER BY query COLLATE "C";
> > +               query               | calls | rows | wal_bytes | wal_records
> > +-----------------------------------+-------+------+-----------+-------------
> > + SELECT $1::TEXT                   |     1 |    1 |         0 |           0
> > + SELECT PLUS_ONE($1)               |     2 |    2 |         0 |           0
> > + SELECT PLUS_TWO($1)               |     2 |    2 |         0 |           0
> > + SELECT pg_stat_statements_reset() |     1 |    1 |         0 |           0
> >  (4 rows)
> >
> > Again, I am not sure if these modifications make much sense?
>
>
> Those are queries that were previously executed.  As those are read-only
> queries that are pretty much guaranteed not to cause any WAL activity, I
> don't see how it hurts to also test that pg_stat_statements indeed records
> zero WAL activity for them, just to be safe.  Once again, feel free to drop
> the extra wal_* columns from the output if you disagree.
>
>
> > 6.
> >  static void pgss_shmem_startup(void);
> > @@ -313,6 +318,7 @@ static void pgss_store(const char *query, uint64 queryId,
> >      int query_location, int query_len,
> >      double total_time, uint64 rows,
> >      const BufferUsage *bufusage,
> > +    const WalUsage* walusage,
> >      pgssJumbleState *jstate);
> >
> > The alignment for walusage doesn't seem to be correct. Running
> > pgindent will fix this.
>
>
> Indeed, fixed.
>
> > 7.
> > + values[i++] = Int64GetDatumFast(tmp.wal_records);
> > + values[i++] = UInt64GetDatum(tmp.wal_num_fpw);
> >
> > Why are they different?  I think we should use the same *GetDatum API
> > (probably Int64GetDatumFast) for these.
>
>
> Oops, that's a mistake from when I was working on the wal_bytes output, fixed.
>
> > > v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
> > >
> >
> > One more comment related to this patch.
> > +
> > + snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes);
> > +
> > + /* Convert to numeric. */
> > + wal_bytes = DirectFunctionCall3(numeric_in,
> > + CStringGetDatum(buf),
> > + ObjectIdGetDatum(0),
> > + Int32GetDatum(-1));
> > +
> > + values[i++] = wal_bytes;
> >
> > I see that other places that display uint64 values use BIGINT datatype
> > in SQL, so why can't we do the same here?  See the usage of queryid in
> > pg_stat_statements or internal_pages, *_pages exposed via
> > pgstatindex.c.
>
>
> That's because it's harmless to report a signed number for a hash (at
> least compared to the overhead of making it unsigned), while we certainly
> don't want to report a negative amount of WAL bytes generated if it goes
> beyond the bigint limit.  See the usage of pg_lsn_mi in pg_lsn.c for
> instance.
>
> On Wed, Apr 01, 2020 at 07:20:31PM +0530, Dilip Kumar wrote:
> >
> > I have reviewed 0003 and 0004,  I have a few comments.
> > v9-0003-Add-infrastructure-to-track-WAL-usage
> >
> > 1.
> >   /* Points to buffer usage area in DSM */
> >   BufferUsage *buffer_usage;
> > + /* Points to WAL usage area in DSM */
> > + WalUsage   *wal_usage;
> >
> > Better to give one blank line between the previous statement/variable
> > declaration and the next comment line.
>
>
> Fixed.
>
>
> > 2.
> > @@ -2154,7 +2157,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel,
> > IndexBulkDeleteResult **stats,
> >   WaitForParallelWorkersToFinish(lps->pcxt);
> >
> >   for (i = 0; i < lps->pcxt->nworkers_launched; i++)
> > - InstrAccumParallelQuery(&lps->buffer_usage[i]);
> > + InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]);
> >   }
> >
> > The existing comment above this loop mentions only the buffer usage,
> > not the WAL usage, so I guess we need to change that.
>
>
> Ah indeed, I thought I caught all the comments but missed this one.  Fixed.
>
>
> > v9-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut
> >
> > 3.
> > + if (usage->wal_num_fpw > 0)
> > + appendStringInfo(es->str, " full page records=%ld",
> > +    usage->wal_num_fpw);
> > + if (usage->wal_bytes > 0)
> > + appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
> > +    usage->wal_bytes);
> >
> > Shall we change to 'full page writes' or 'full page image' instead of
> > full page records?
>
>
> Indeed, I changed it in the (auto)vacuum output but missed this one.  Fixed.
>
>
> > Apart from this, I have done some testing of the WAL usage with the
> > parallel vacuum, and the results look fine.
> >
> > postgres[104248]=# CREATE TABLE test (a int, b int);
> > CREATE TABLE
> > postgres[104248]=# INSERT INTO test SELECT i, i FROM
> > GENERATE_SERIES(1,2000000) as i;
> > INSERT 0 2000000
> > postgres[104248]=# CREATE INDEX idx1 on test(a);
> > CREATE INDEX
> > postgres[104248]=# VACUUM (PARALLEL 1) test;
> > VACUUM
> > postgres[104248]=# select query , wal_bytes, wal_records, wal_num_fpw
> > from pg_stat_statements where query like 'VACUUM%';
> >           query           | wal_bytes | wal_records | wal_num_fpw
> > --------------------------+-----------+-------------+-------------
> >  VACUUM (PARALLEL 1) test |  72814331 |        8857 |        8855
> >
> >
> >
> > postgres[106479]=# CREATE TABLE test (a int, b int);
> > CREATE TABLE
> > postgres[106479]=# INSERT INTO test SELECT i, i FROM
> > GENERATE_SERIES(1,2000000) as i;
> > INSERT 0 2000000
> > postgres[106479]=# CREATE INDEX idx1 on test(a);
> > CREATE INDEX
> > postgres[106479]=# VACUUM (PARALLEL 0) test;
> > VACUUM
> > postgres[106479]=# select query , wal_bytes, wal_records, wal_num_fpw
> > from pg_stat_statements where query like 'VACUUM%';
> >           query           | wal_bytes | wal_records | wal_num_fpw
> > --------------------------+-----------+-------------+-------------
> >  VACUUM (PARALLEL 0) test |  72814331 |        8857 |        8855
>
>
> Thanks!  I did some similar testing, also with sequential/parallel index
> creation, and got similar results.
>
>
> > By tomorrow, I will try to finish reviewing 0005 and 0006.

I have reviewed these patches and I have a few cosmetic comments.
v10-0005-Keep-track-of-WAL-usage-in-pg_stat_statements

1.
+ uint64 wal_bytes; /* total amount of wal bytes written */
+ int64 wal_records; /* # of wal records written */
+ int64 wal_num_fpw; /* # of full page wal records written */


s/# of full page wal records written/# of WAL full page image produced/

2.
 static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
  ProcessUtilityContext context, ParamListInfo params,
  QueryEnvironment *queryEnv,
- DestReceiver *dest, QueryCompletion *qc);
+ DestReceiver *dest, QueryCompletion * qc);

Useless hunk.

3.

v10-0006-Expose-WAL-usage-counters-in-verbose-auto-vacuum

@@ -3105,7 +3105,7 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
  {
  ExplainPropertyInteger("WAL records", NULL,
     usage->wal_records, es);
- ExplainPropertyInteger("WAL full page records", NULL,
+ ExplainPropertyInteger("WAL full page writes", NULL,
     usage->wal_num_fpw, es);
Just noticed that in 0004 you first added "WAL full page records", which
is later corrected to "WAL full page writes" in 0006.  I think we had
better get this right in 0004 itself and avoid this hunk in 0006;
otherwise it creates confusion while reviewing.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Thu, Apr 02, 2020 at 11:07:29AM +0530, Amit Kapila wrote:
> On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Wed, Apr 01, 2020 at 04:29:16PM +0530, Amit Kapila wrote:
> > > 3. Doing some testing with and without parallelism to ensure WAL usage
> > > data is correct would be great and if possible, share the results?
> >
> >
> > I just saw that Dilip did some testing, but just in case here is some
> > additional testing
> >
> > - vacuum, after a truncate, loading 1M rows and a "UPDATE t1 SET id = id"
> >
> > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%vacuum%';
> >          query          | calls | wal_bytes | wal_records | wal_num_fpw
> > ------------------------+-------+-----------+-------------+-------------
> >  vacuum (parallel 3) t1 |     1 |  20098962 |       34104 |           2
> >  vacuum (parallel 0) t1 |     1 |  20098962 |       34104 |           2
> > (2 rows)
> >
> > - create index, overriding t1's parallel_workers, using the 1M rows just
> >   vacuumed:
> >
> > =# alter table t1 set (parallel_workers = 2);
> > ALTER TABLE
> >
> > =# create index t1_parallel_2 on t1(id);
> > CREATE INDEX
> >
> > =# alter table t1 set (parallel_workers = 0);
> > ALTER TABLE
> >
> > =# create index t1_parallel_0 on t1(id);
> > CREATE INDEX
> >
> > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
> >                 query                 | calls | wal_bytes | wal_records | wal_num_fpw
> > --------------------------------------+-------+-----------+-------------+-------------
> >  create index t1_parallel_0 on t1(id) |     1 |  20355540 |        2762 |        2745
> >  create index t1_parallel_2 on t1(id) |     1 |  20406811 |        2762 |        2758
> > (2 rows)
> >
> > It all looks good to me.
> >
> 
> Here the wal_num_fpw and wal_bytes are different between parallel and
> non-parallel versions.  Is it due to checkpoint or something else?  We
> can probably rule out checkpoint by increasing checkpoint_timeout and
> other checkpoint related parameters.

I think this is because I did a checkpoint after the VACUUM tests, so the 1st
CREATE INDEX (with parallelism) induced some FPW on the catalog blocks.  I
didn't try to investigate more since:

On Thu, Apr 02, 2020 at 11:22:16AM +0530, Amit Kapila wrote:
>
> Also, I forgot to mention that let's not base this on buffer usage
> patch for create index
> (v10-0002-Allow-parallel-index-creation-to-accumulate-buff) because as
> per recent discussion I am not sure about its usefulness.  I think we
> can proceed with this patch without
> v10-0002-Allow-parallel-index-creation-to-accumulate-buff as well.


Which is done in attached v11.


> > > 5.
> > > -SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
> > > -               query               | calls | rows
> > > ------------------------------------+-------+------
> > > - SELECT $1::TEXT                   |     1 |    1
> > > - SELECT PLUS_ONE($1)               |     2 |    2
> > > - SELECT PLUS_TWO($1)               |     2 |    2
> > > - SELECT pg_stat_statements_reset() |     1 |    1
> > > +SELECT query, calls, rows, wal_bytes, wal_records FROM
> > > pg_stat_statements ORDER BY query COLLATE "C";
> > > +               query               | calls | rows | wal_bytes | wal_records
> > > +-----------------------------------+-------+------+-----------+-------------
> > > + SELECT $1::TEXT                   |     1 |    1 |         0 |           0
> > > + SELECT PLUS_ONE($1)               |     2 |    2 |         0 |           0
> > > + SELECT PLUS_TWO($1)               |     2 |    2 |         0 |           0
> > > + SELECT pg_stat_statements_reset() |     1 |    1 |         0 |           0
> > >  (4 rows)
> > >
> > > Again, I am not sure if these modifications make much sense?
> >
> >
> > Those are queries that were previously executed.  As those are read-only
> > queries that are pretty much guaranteed not to cause any WAL activity, I
> > don't see how it hurts to also test that that's indeed what
> > pg_stat_statements records, just to be safe.
> >
> 
> On a similar theory, one could have checked bufferusage stats as well.
> The statements are using some expressions, so I don't see any value in
> checking all usage data for such statements.


Dropped.


> Right now, that particular patch is not getting applied (probably due
> to recent commit 17e0328224).  Can you rebase it?


Done.


> > > v9-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut
> > >
> > > 3.
> > > + if (usage->wal_num_fpw > 0)
> > > + appendStringInfo(es->str, " full page records=%ld",
> > > +    usage->wal_num_fpw);
> > > + if (usage->wal_bytes > 0)
> > > + appendStringInfo(es->str, " bytes=" UINT64_FORMAT,
> > > +    usage->wal_bytes);
> > >
> > > Shall we change to 'full page writes' or 'full page image' instead of
> > > full page records?
> >
> >
> > Indeed, I changed it in the (auto)vacuum output but missed this one.  Fixed.
> >
> 
> I don't see this change in the patch.


Yes, as Dilip reported I applied the fixup to the wrong commit, sorry
about that.  This version should now be OK.


On Thu, Apr 02, 2020 at 12:04:32PM +0530, Dilip Kumar wrote:
> On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > > By tomorrow, I will try to finish reviewing 0005 and 0006.
>
> I have reviewed these patches and I have a few cosmetic comments.
> v10-0005-Keep-track-of-WAL-usage-in-pg_stat_statements
>
> 1.
> + uint64 wal_bytes; /* total amount of wal bytes written */
> + int64 wal_records; /* # of wal records written */
> + int64 wal_num_fpw; /* # of full page wal records written */
>
>
> s/# of full page wal records written/# of WAL full page image produced/


Done, I also consistently s/wal/WAL/.

>
> 2.
>  static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString,
>   ProcessUtilityContext context, ParamListInfo params,
>   QueryEnvironment *queryEnv,
> - DestReceiver *dest, QueryCompletion *qc);
> + DestReceiver *dest, QueryCompletion * qc);
>
> Useless hunk.


Oops, a leftover from a pgindent run, as QueryCompletion isn't in the
typedefs list yet.  I thought I had discarded all the useless hunks but
missed this one.  Thanks, fixed.


>
> 3.
>
> v10-0006-Expose-WAL-usage-counters-in-verbose-auto-vacuum
>
> @@ -3105,7 +3105,7 @@ show_wal_usage(ExplainState *es, const WalUsage *usage)
>   {
>   ExplainPropertyInteger("WAL records", NULL,
>      usage->wal_records, es);
> - ExplainPropertyInteger("WAL full page records", NULL,
> + ExplainPropertyInteger("WAL full page writes", NULL,
>      usage->wal_num_fpw, es);
> Just noticed that in 0004 you first added "WAL full page records", which
> is later corrected to "WAL full page writes" in 0006.  I think we had
> better get this right in 0004 itself and avoid this hunk in 0006;
> otherwise it creates confusion while reviewing.


Oh, I didn't realize that I had applied the fixup to the wrong commit.  Fixed.


I also adapted the documentation that mentioned full page records instead of
full page images, and integrated Justin's comment:

> In 0003:
> +       /* Provide WAL update data to the instrumentation */
> Remove "data" ??

so changed to "Report WAL traffic to the instrumentation."

I didn't change the (auto)vacuum output yet (except fixing the s/full page
records/full page writes/ that I previously missed), as it's not clear what the
consensus is yet.  I'll take care of that as soon as we reach a consensus.

Attachment

Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Thu, Apr 2, 2020 at 2:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Thu, Apr 02, 2020 at 11:07:29AM +0530, Amit Kapila wrote:
> > On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > > On Wed, Apr 01, 2020 at 04:29:16PM +0530, Amit Kapila wrote:
> > > > 3. Doing some testing with and without parallelism to ensure WAL usage
> > > > data is correct would be great and if possible, share the results?
> > >
> > >
> > > I just saw that Dilip did some testing, but just in case here is some
> > > additional testing
> > >
> > > - vacuum, after a truncate, loading 1M rows and a "UPDATE t1 SET id = id"
> > >
> > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%vacuum%';
> > >          query          | calls | wal_bytes | wal_records | wal_num_fpw
> > > ------------------------+-------+-----------+-------------+-------------
> > >  vacuum (parallel 3) t1 |     1 |  20098962 |       34104 |           2
> > >  vacuum (parallel 0) t1 |     1 |  20098962 |       34104 |           2
> > > (2 rows)
> > >
> > > - create index, overriding t1's parallel_workers, using the 1M rows just
> > >   vacuumed:
> > >
> > > =# alter table t1 set (parallel_workers = 2);
> > > ALTER TABLE
> > >
> > > =# create index t1_parallel_2 on t1(id);
> > > CREATE INDEX
> > >
> > > =# alter table t1 set (parallel_workers = 0);
> > > ALTER TABLE
> > >
> > > =# create index t1_parallel_0 on t1(id);
> > > CREATE INDEX
> > >
> > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
> > >                 query                 | calls | wal_bytes | wal_records | wal_num_fpw
> > > --------------------------------------+-------+-----------+-------------+-------------
> > >  create index t1_parallel_0 on t1(id) |     1 |  20355540 |        2762 |        2745
> > >  create index t1_parallel_2 on t1(id) |     1 |  20406811 |        2762 |        2758
> > > (2 rows)
> > >
> > > It all looks good to me.
> > >
> >
> > Here the wal_num_fpw and wal_bytes are different between parallel and
> > non-parallel versions.  Is it due to checkpoint or something else?  We
> > can probably rule out checkpoint by increasing checkpoint_timeout and
> > other checkpoint related parameters.
>
> I think this is because I did a checkpoint after the VACUUM tests, so the 1st
> CREATE INDEX (with parallelism) induced some FPW on the catalog blocks.  I
> didn't try to investigate more since:
>

We need to do this.

> On Thu, Apr 02, 2020 at 11:22:16AM +0530, Amit Kapila wrote:
> >
> > Also, I forgot to mention that let's not base this on buffer usage
> > patch for create index
> > (v10-0002-Allow-parallel-index-creation-to-accumulate-buff) because as
> > per recent discussion I am not sure about its usefulness.  I think we
> > can proceed with this patch without
> > v10-0002-Allow-parallel-index-creation-to-accumulate-buff as well.
>
>
> Which is done in attached v11.
>

Hmm, I haven't suggested removing the WAL usage from the parallel
create index.  I just suggested not using the infrastructure of another
patch.  We bypass the buffer manager but do write WAL; see
_bt_blwritepage->log_newpage.  So we need to accumulate WAL usage even
if we decide not to do anything about BufferUsage, which means we need
to investigate the above inconsistency in wal_num_fpw and wal_bytes
between the parallel and non-parallel versions.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Thu, Apr 02, 2020 at 02:32:07PM +0530, Amit Kapila wrote:
> On Thu, Apr 2, 2020 at 2:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Thu, Apr 02, 2020 at 11:07:29AM +0530, Amit Kapila wrote:
> > > On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > > >
> > > > On Wed, Apr 01, 2020 at 04:29:16PM +0530, Amit Kapila wrote:
> > > > > 3. Doing some testing with and without parallelism to ensure WAL usage
> > > > > data is correct would be great and if possible, share the results?
> > > >
> > > >
> > > > I just saw that Dilip did some testing, but just in case here is some
> > > > additional testing
> > > >
> > > > - vacuum, after a truncate, loading 1M rows and a "UPDATE t1 SET id = id"
> > > >
> > > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%vacuum%';
> > > >          query          | calls | wal_bytes | wal_records | wal_num_fpw
> > > > ------------------------+-------+-----------+-------------+-------------
> > > >  vacuum (parallel 3) t1 |     1 |  20098962 |       34104 |           2
> > > >  vacuum (parallel 0) t1 |     1 |  20098962 |       34104 |           2
> > > > (2 rows)
> > > >
> > > > - create index, overriding t1's parallel_workers, using the 1M rows just
> > > >   vacuumed:
> > > >
> > > > =# alter table t1 set (parallel_workers = 2);
> > > > ALTER TABLE
> > > >
> > > > =# create index t1_parallel_2 on t1(id);
> > > > CREATE INDEX
> > > >
> > > > =# alter table t1 set (parallel_workers = 0);
> > > > ALTER TABLE
> > > >
> > > > =# create index t1_parallel_0 on t1(id);
> > > > CREATE INDEX
> > > >
> > > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
> > > >                 query                 | calls | wal_bytes | wal_records | wal_num_fpw
> > > > --------------------------------------+-------+-----------+-------------+-------------
> > > >  create index t1_parallel_0 on t1(id) |     1 |  20355540 |        2762 |        2745
> > > >  create index t1_parallel_2 on t1(id) |     1 |  20406811 |        2762 |        2758
> > > > (2 rows)
> > > >
> > > > It all looks good to me.
> > > >
> > >
> > > Here the wal_num_fpw and wal_bytes are different between parallel and
> > > non-parallel versions.  Is it due to checkpoint or something else?  We
> > > can probably rule out checkpoint by increasing checkpoint_timeout and
> > > other checkpoint related parameters.
> >
> > I think this is because I did a checkpoint after the VACUUM tests, so the 1st
> > CREATE INDEX (with parallelism) induced some FPW on the catalog blocks.  I
> > didn't try to investigate more since:
> >
> 
> We need to do this.
> 
> > On Thu, Apr 02, 2020 at 11:22:16AM +0530, Amit Kapila wrote:
> > >
> > > Also, I forgot to mention that let's not base this on buffer usage
> > > patch for create index
> > > (v10-0002-Allow-parallel-index-creation-to-accumulate-buff) because as
> > > per recent discussion I am not sure about its usefulness.  I think we
> > > can proceed with this patch without
> > > v10-0002-Allow-parallel-index-creation-to-accumulate-buff as well.
> >
> >
> > Which is done in attached v11.
> >
> 
> Hmm, I haven't suggested removing the WAL usage from the parallel
> create index.  I just suggested not using the infrastructure of another
> patch.  We bypass the buffer manager but do write WAL; see
> _bt_blwritepage->log_newpage.  So we need to accumulate WAL usage even
> if we decide not to do anything about BufferUsage, which means we need
> to investigate the above inconsistency in wal_num_fpw and wal_bytes
> between the parallel and non-parallel versions.


Oh, I thought that you wanted to wait on that part, as we'll probably change
the parallel create index to report buffer access eventually.

v12 attached with an adaptation of Sawada-san's original patch but only dealing
with WAL activity.

I did some more experiments, ensuring as much stability as possible:

=# create table t1(id integer);
CREATE TABLE
=# insert into t1 select * from generate_series(1, 1000000);
INSERT 0 1000000
=# select * from pg_stat_statements_reset() ;
 pg_stat_statements_reset
--------------------------

(1 row)

=# alter table t1 set (parallel_workers = 0);
ALTER TABLE
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_0 ON t1(id);
CREATE INDEX

=# alter table t1 set (parallel_workers = 1);
ALTER TABLE
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_1 ON t1(id);
CREATE INDEX

=# alter table t1 set (parallel_workers = 2);
ALTER TABLE
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_2 ON t1(id);
CREATE INDEX

=# alter table t1 set (parallel_workers = 3);
ALTER TABLE
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_3 ON t1(id);
CREATE INDEX

=# alter table t1 set (parallel_workers = 4);
ALTER TABLE
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_4 ON t1(id);
CREATE INDEX

=# alter table t1 set (parallel_workers = 5);
ALTER TABLE
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_5 ON t1(id);
CREATE INDEX

=# alter table t1 set (parallel_workers = 6);
ALTER TABLE
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_6 ON t1(id);
CREATE INDEX

=# alter table t1 set (parallel_workers = 7);
ALTER TABLE
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_7 ON t1(id);
CREATE INDEX

=# alter table t1 set (parallel_workers = 8);
ALTER TABLE
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_8 ON t1(id);
CREATE INDEX

=# alter table t1 set (parallel_workers = 0);
ALTER TABLE
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_0_bis ON t1(id);
CREATE INDEX
=# vacuum;checkpoint;
VACUUM
CHECKPOINT
=# create index t1_idx_parallel_0_ter ON t1(id);
CREATE INDEX

=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
                  query                       | calls | wal_bytes | wal_records | wal_num_fpw
----------------------------------------------+-------+-----------+-------------+-------------
 create index t1_idx_parallel_0 ON t1(id)     |     1 |  20389743 |        2762 |        2758
 create index t1_idx_parallel_0_bis ON t1(id) |     1 |  20394391 |        2762 |        2758
 create index t1_idx_parallel_0_ter ON t1(id) |     1 |  20395155 |        2762 |        2758
 create index t1_idx_parallel_1 ON t1(id)     |     1 |  20388335 |        2762 |        2758
 create index t1_idx_parallel_2 ON t1(id)     |     1 |  20389091 |        2762 |        2758
 create index t1_idx_parallel_3 ON t1(id)     |     1 |  20389847 |        2762 |        2758
 create index t1_idx_parallel_4 ON t1(id)     |     1 |  20390603 |        2762 |        2758
 create index t1_idx_parallel_5 ON t1(id)     |     1 |  20391359 |        2762 |        2758
 create index t1_idx_parallel_6 ON t1(id)     |     1 |  20392115 |        2762 |        2758
 create index t1_idx_parallel_7 ON t1(id)     |     1 |  20392871 |        2762 |        2758
 create index t1_idx_parallel_8 ON t1(id)     |     1 |  20393627 |        2762 |        2758
(11 rows)

=# select relname, pg_relation_size(oid) from pg_class where relname like '%t1_id%';
      relname          | pg_relation_size
-----------------------+------------------
 t1_idx_parallel_0     |         22487040
 t1_idx_parallel_0_bis |         22487040
 t1_idx_parallel_0_ter |         22487040
 t1_idx_parallel_2     |         22487040
 t1_idx_parallel_1     |         22487040
 t1_idx_parallel_4     |         22487040
 t1_idx_parallel_3     |         22487040
 t1_idx_parallel_5     |         22487040
 t1_idx_parallel_6     |         22487040
 t1_idx_parallel_7     |         22487040
 t1_idx_parallel_8     |         22487040
(11 rows)


So while the numbers of WAL records and full page images stay constant, we can
see some small fluctuations in the total amount of generated WAL data, even for
multiple executions of the sequential create index.  I'm wondering if the
fluctuations are due to some other internal details or if the WalUsage support
is just completely broken (although I don't see any obvious issue ATM).
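
FWIW, a rough way to cross-check wal_bytes independently of the new
instrumentation is to diff the WAL insert position around the statement.
A sketch only (assuming an otherwise idle cluster; the index name is made
up), run from psql:

SELECT pg_current_wal_insert_lsn() AS lsn_before \gset
CREATE INDEX t1_idx_check ON t1(id);
SELECT pg_wal_lsn_diff(pg_current_wal_insert_lsn(), :'lsn_before') AS lsn_delta;

The lsn_delta should come out slightly above the reported wal_bytes, since
it also includes WAL page header overhead and record alignment padding.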

Attachment

Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Thu, Apr 2, 2020 at 6:18 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
>                   query                       | calls | wal_bytes | wal_records | wal_num_fpw
> ----------------------------------------------+-------+-----------+-------------+-------------
>  create index t1_idx_parallel_0 ON t1(id)     |     1 |  20389743 |        2762 |        2758
>  create index t1_idx_parallel_0_bis ON t1(id) |     1 |  20394391 |        2762 |        2758
>  create index t1_idx_parallel_0_ter ON t1(id) |     1 |  20395155 |        2762 |        2758
>  create index t1_idx_parallel_1 ON t1(id)     |     1 |  20388335 |        2762 |        2758
>  create index t1_idx_parallel_2 ON t1(id)     |     1 |  20389091 |        2762 |        2758
>  create index t1_idx_parallel_3 ON t1(id)     |     1 |  20389847 |        2762 |        2758
>  create index t1_idx_parallel_4 ON t1(id)     |     1 |  20390603 |        2762 |        2758
>  create index t1_idx_parallel_5 ON t1(id)     |     1 |  20391359 |        2762 |        2758
>  create index t1_idx_parallel_6 ON t1(id)     |     1 |  20392115 |        2762 |        2758
>  create index t1_idx_parallel_7 ON t1(id)     |     1 |  20392871 |        2762 |        2758
>  create index t1_idx_parallel_8 ON t1(id)     |     1 |  20393627 |        2762 |        2758
> (11 rows)
>
> =# select relname, pg_relation_size(oid) from pg_class where relname like '%t1_id%';
>       relname          | pg_relation_size
> -----------------------+------------------
>  t1_idx_parallel_0     |         22487040
>  t1_idx_parallel_0_bis |         22487040
>  t1_idx_parallel_0_ter |         22487040
>  t1_idx_parallel_2     |         22487040
>  t1_idx_parallel_1     |         22487040
>  t1_idx_parallel_4     |         22487040
>  t1_idx_parallel_3     |         22487040
>  t1_idx_parallel_5     |         22487040
>  t1_idx_parallel_6     |         22487040
>  t1_idx_parallel_7     |         22487040
>  t1_idx_parallel_8     |         22487040
> (11 rows)
>
>
> So while the numbers of WAL records and full page images stay constant, we can
> see some small fluctuations in the total amount of generated WAL data, even for
> multiple executions of the sequential create index.  I'm wondering if the
> fluctuations are due to some other internal details or if the WalUsage support
> is just completely broken (although I don't see any obvious issue ATM).
>

I think we need to know the reason for this.  Can you try with small
indexes and see if the problem is reproducible?  If it is, then it
will be easier to debug.

Few other minor comments
------------------------------------
pg_stat_statements patch
1.
+--
+-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to
validate WAL generation metrics
+--

The word 'non-temp' in the above comment appears out of place.  We
don't need to specify it.

2.
+-- SELECT usage data, check WAL usage is reported, wal_records equal
rows count for INSERT/UPDATE/DELETE
+SELECT query, calls, rows,
+wal_bytes > 0 as wal_bytes_generated,
+wal_records > 0 as wal_records_generated,
+wal_records = rows as wal_records_as_rows
+FROM pg_stat_statements ORDER BY query COLLATE "C";

The comment doesn't seem to match what we are doing in the statement.
I think we can simplify it to something like "check WAL is generated
for the above statements".

3.
@@ -185,6 +185,9 @@ typedef struct Counters
  int64 local_blks_written; /* # of local disk blocks written */
  int64 temp_blks_read; /* # of temp blocks read */
  int64 temp_blks_written; /* # of temp blocks written */
+ uint64 wal_bytes; /* total amount of WAL bytes generated */
+ int64 wal_records; /* # of WAL records generated */
+ int64 wal_num_fpw; /* # of WAL full page image generated */
  double blk_read_time; /* time spent reading, in msec */
  double blk_write_time; /* time spent writing, in msec */
  double usage; /* usage factor */

It is better to keep wal_bytes after wal_num_fpw, as in the main patch.
Also consider changing this at other places in the patch.  I think we
should add these new fields after blk_write_time, or at the end after
usage.

4.
/* # of WAL full page image generated */
Can we change it to "/* # of WAL full page image records generated */"?

If you agree, note that a similar comment exists in
v11-0001-Add-infrastructure-to-track-WAL-usage; consider changing that
as well.


v11-0002-Add-option-to-report-WAL-usage-in-EXPLAIN-and-au
5.
Specifically, include the
+      number of records, full page images and bytes generated.

How about making the above slightly clearer?  "Specifically, include the
number of records, number of full page image records and amount of WAL
bytes generated."

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Dilip Kumar
Date:
On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Apr 2, 2020 at 6:18 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
> >                   query                       | calls | wal_bytes | wal_records | wal_num_fpw
> > ----------------------------------------------+-------+-----------+-------------+-------------
> >  create index t1_idx_parallel_0 ON t1(id)     |     1 |  20389743 |        2762 |        2758
> >  create index t1_idx_parallel_0_bis ON t1(id) |     1 |  20394391 |        2762 |        2758
> >  create index t1_idx_parallel_0_ter ON t1(id) |     1 |  20395155 |        2762 |        2758
> >  create index t1_idx_parallel_1 ON t1(id)     |     1 |  20388335 |        2762 |        2758
> >  create index t1_idx_parallel_2 ON t1(id)     |     1 |  20389091 |        2762 |        2758
> >  create index t1_idx_parallel_3 ON t1(id)     |     1 |  20389847 |        2762 |        2758
> >  create index t1_idx_parallel_4 ON t1(id)     |     1 |  20390603 |        2762 |        2758
> >  create index t1_idx_parallel_5 ON t1(id)     |     1 |  20391359 |        2762 |        2758
> >  create index t1_idx_parallel_6 ON t1(id)     |     1 |  20392115 |        2762 |        2758
> >  create index t1_idx_parallel_7 ON t1(id)     |     1 |  20392871 |        2762 |        2758
> >  create index t1_idx_parallel_8 ON t1(id)     |     1 |  20393627 |        2762 |        2758
> > (11 rows)
> >
> > =# select relname, pg_relation_size(oid) from pg_class where relname like '%t1_id%';
> >       relname          | pg_relation_size
> > -----------------------+------------------
> >  t1_idx_parallel_0     |         22487040
> >  t1_idx_parallel_0_bis |         22487040
> >  t1_idx_parallel_0_ter |         22487040
> >  t1_idx_parallel_2     |         22487040
> >  t1_idx_parallel_1     |         22487040
> >  t1_idx_parallel_4     |         22487040
> >  t1_idx_parallel_3     |         22487040
> >  t1_idx_parallel_5     |         22487040
> >  t1_idx_parallel_6     |         22487040
> >  t1_idx_parallel_7     |         22487040
> >  t1_idx_parallel_8     |         22487040
> > (11 rows)
> >
> >
> > So while the numbers of WAL records and full page images stay constant, we can
> > see some small fluctuations in the total amount of generated WAL data, even for
> > multiple executions of the sequential create index.  I'm wondering if the
> > fluctuations are due to some other internal details or if the WalUsage support
> > is just completely broken (although I don't see any obvious issue ATM).
> >
>
> I think we need to know the reason for this.  Can you try with small
> indexes and see if the problem is reproducible?  If it is, then it
> will be easier to debug.
>
> Few other minor comments
> ------------------------------------
> pg_stat_statements patch
> 1.
> +--
> +-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to
> validate WAL generation metrics
> +--
>
> The word 'non-temp' in the above comment appears out of place.  We
> don't need to specify it.
>
> 2.
> +-- SELECT usage data, check WAL usage is reported, wal_records equal
> rows count for INSERT/UPDATE/DELETE
> +SELECT query, calls, rows,
> +wal_bytes > 0 as wal_bytes_generated,
> +wal_records > 0 as wal_records_generated,
> +wal_records = rows as wal_records_as_rows
> +FROM pg_stat_statements ORDER BY query COLLATE "C";
>
> The comment doesn't seem to match what we are doing in the statement.
> I think we can simplify it to something like "check WAL is generated
> for the above statements".
>
> 3.
> @@ -185,6 +185,9 @@ typedef struct Counters
>   int64 local_blks_written; /* # of local disk blocks written */
>   int64 temp_blks_read; /* # of temp blocks read */
>   int64 temp_blks_written; /* # of temp blocks written */
> + uint64 wal_bytes; /* total amount of WAL bytes generated */
> + int64 wal_records; /* # of WAL records generated */
> + int64 wal_num_fpw; /* # of WAL full page image generated */
>   double blk_read_time; /* time spent reading, in msec */
>   double blk_write_time; /* time spent writing, in msec */
>   double usage; /* usage factor */
>
> It is better to keep wal_bytes after wal_num_fpw, as in the main patch.
> Also consider changing this at other places in the patch.  I think we
> should add these new fields after blk_write_time, or at the end after
> usage.
>
> 4.
> /* # of WAL full page image generated */
> Can we change it to "/* # of WAL full page image records generated */"?

IMHO, "# of WAL full-page image records" sounds like the number of WAL
records that contain a full-page image.  But actually this is the total
number of full-page images, not the number of records that have a
full-page image.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Thu, Apr 02, 2020 at 06:40:51PM +0530, Amit Kapila wrote:
> On Thu, Apr 2, 2020 at 6:18 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
> >                   query                       | calls | wal_bytes | wal_records | wal_num_fpw
> > ----------------------------------------------+-------+-----------+-------------+-------------
> >  create index t1_idx_parallel_0 ON t1(id)     |     1 |  20389743 |        2762 |        2758
> >  create index t1_idx_parallel_0_bis ON t1(id) |     1 |  20394391 |        2762 |        2758
> >  create index t1_idx_parallel_0_ter ON t1(id) |     1 |  20395155 |        2762 |        2758
> >  create index t1_idx_parallel_1 ON t1(id)     |     1 |  20388335 |        2762 |        2758
> >  create index t1_idx_parallel_2 ON t1(id)     |     1 |  20389091 |        2762 |        2758
> >  create index t1_idx_parallel_3 ON t1(id)     |     1 |  20389847 |        2762 |        2758
> >  create index t1_idx_parallel_4 ON t1(id)     |     1 |  20390603 |        2762 |        2758
> >  create index t1_idx_parallel_5 ON t1(id)     |     1 |  20391359 |        2762 |        2758
> >  create index t1_idx_parallel_6 ON t1(id)     |     1 |  20392115 |        2762 |        2758
> >  create index t1_idx_parallel_7 ON t1(id)     |     1 |  20392871 |        2762 |        2758
> >  create index t1_idx_parallel_8 ON t1(id)     |     1 |  20393627 |        2762 |        2758
> > (11 rows)
> >
> > =# select relname, pg_relation_size(oid) from pg_class where relname like '%t1_id%';
> >       relname          | pg_relation_size
> > -----------------------+------------------
> >  t1_idx_parallel_0     |         22487040
> >  t1_idx_parallel_0_bis |         22487040
> >  t1_idx_parallel_0_ter |         22487040
> >  t1_idx_parallel_2     |         22487040
> >  t1_idx_parallel_1     |         22487040
> >  t1_idx_parallel_4     |         22487040
> >  t1_idx_parallel_3     |         22487040
> >  t1_idx_parallel_5     |         22487040
> >  t1_idx_parallel_6     |         22487040
> >  t1_idx_parallel_7     |         22487040
> >  t1_idx_parallel_8     |         22487040
> > (11 rows)
> >
> >
> > So while the numbers of WAL records and full page images stay constant, we can
> > see some small fluctuations in the total amount of generated WAL data, even for
> > multiple executions of the sequential create index.  I'm wondering if the
> > fluctuations are due to some other internal details or if the WalUsage support
> > is just completely broken (although I don't see any obvious issue ATM).
> >
> 
> I think we need to know the reason for this.  Can you try with small
> indexes and see if the problem is reproducible?  If it is, then it
> will be easier to debug.


I did some quick testing using the attached shell script:

- on a 1k-line base number of lines, scales 1 10 100 1000 (suffix _s)
- parallel workers from 0 to 8 (suffix _w)
- each index created twice (suffix _pa and _pb)
- with a vacuum;checkpoint;pg_switch_wal executed each time

I get the following results:

                   query                    | wal_bytes | wal_records | wal_num_fpw 
--------------------------------------------+-----------+-------------+-------------
 CREATE INDEX t1_idx_s001_pa_w0 ON t1 (id)  |     61871 |          22 |          18
 CREATE INDEX t1_idx_s001_pa_w1 ON t1 (id)  |     62394 |          21 |          18
 CREATE INDEX t1_idx_s001_pa_w2 ON t1 (id)  |     63150 |          21 |          18
 CREATE INDEX t1_idx_s001_pa_w3 ON t1 (id)  |     63906 |          21 |          18
 CREATE INDEX t1_idx_s001_pa_w4 ON t1 (id)  |     64662 |          21 |          18
 CREATE INDEX t1_idx_s001_pa_w5 ON t1 (id)  |     65418 |          21 |          18
 CREATE INDEX t1_idx_s001_pa_w6 ON t1 (id)  |     65450 |          21 |          18
 CREATE INDEX t1_idx_s001_pa_w7 ON t1 (id)  |     66206 |          21 |          18
 CREATE INDEX t1_idx_s001_pa_w8 ON t1 (id)  |     66962 |          21 |          18
 CREATE INDEX t1_idx_s001_pb_w0 ON t1 (id)  |     67718 |          21 |          18
 CREATE INDEX t1_idx_s001_pb_w1 ON t1 (id)  |     68474 |          21 |          18
 CREATE INDEX t1_idx_s001_pb_w2 ON t1 (id)  |     68418 |          21 |          18
 CREATE INDEX t1_idx_s001_pb_w3 ON t1 (id)  |     69174 |          21 |          18
 CREATE INDEX t1_idx_s001_pb_w4 ON t1 (id)  |     69930 |          21 |          18
 CREATE INDEX t1_idx_s001_pb_w5 ON t1 (id)  |     70686 |          21 |          18
 CREATE INDEX t1_idx_s001_pb_w6 ON t1 (id)  |     71442 |          21 |          18
 CREATE INDEX t1_idx_s001_pb_w7 ON t1 (id)  |     64922 |          21 |          18
 CREATE INDEX t1_idx_s001_pb_w8 ON t1 (id)  |     65682 |          21 |          18
 CREATE INDEX t1_idx_s010_pa_w0 ON t1 (id)  |    250460 |          47 |          44
 CREATE INDEX t1_idx_s010_pa_w1 ON t1 (id)  |    251216 |          47 |          44
 CREATE INDEX t1_idx_s010_pa_w2 ON t1 (id)  |    251972 |          47 |          44
 CREATE INDEX t1_idx_s010_pa_w3 ON t1 (id)  |    252728 |          47 |          44
 CREATE INDEX t1_idx_s010_pa_w4 ON t1 (id)  |    253484 |          47 |          44
 CREATE INDEX t1_idx_s010_pa_w5 ON t1 (id)  |    254240 |          47 |          44
 CREATE INDEX t1_idx_s010_pa_w6 ON t1 (id)  |    253552 |          47 |          44
 CREATE INDEX t1_idx_s010_pa_w7 ON t1 (id)  |    254308 |          47 |          44
 CREATE INDEX t1_idx_s010_pa_w8 ON t1 (id)  |    255064 |          47 |          44
 CREATE INDEX t1_idx_s010_pb_w0 ON t1 (id)  |    255820 |          47 |          44
 CREATE INDEX t1_idx_s010_pb_w1 ON t1 (id)  |    256576 |          47 |          44
 CREATE INDEX t1_idx_s010_pb_w2 ON t1 (id)  |    257332 |          47 |          44
 CREATE INDEX t1_idx_s010_pb_w3 ON t1 (id)  |    258088 |          47 |          44
 CREATE INDEX t1_idx_s010_pb_w4 ON t1 (id)  |    258844 |          47 |          44
 CREATE INDEX t1_idx_s010_pb_w5 ON t1 (id)  |    259600 |          47 |          44
 CREATE INDEX t1_idx_s010_pb_w6 ON t1 (id)  |    260356 |          47 |          44
 CREATE INDEX t1_idx_s010_pb_w7 ON t1 (id)  |    260012 |          47 |          44
 CREATE INDEX t1_idx_s010_pb_w8 ON t1 (id)  |    260768 |          47 |          44
 CREATE INDEX t1_idx_s1000_pa_w0 ON t1 (id) |  20400595 |        2762 |        2759
 CREATE INDEX t1_idx_s1000_pa_w1 ON t1 (id) |  20401351 |        2762 |        2759
 CREATE INDEX t1_idx_s1000_pa_w2 ON t1 (id) |  20402107 |        2762 |        2759
 CREATE INDEX t1_idx_s1000_pa_w3 ON t1 (id) |  20402863 |        2762 |        2759
 CREATE INDEX t1_idx_s1000_pa_w4 ON t1 (id) |  20403619 |        2762 |        2759
 CREATE INDEX t1_idx_s1000_pa_w5 ON t1 (id) |  20404375 |        2762 |        2759
 CREATE INDEX t1_idx_s1000_pa_w6 ON t1 (id) |  20403687 |        2762 |        2759
 CREATE INDEX t1_idx_s1000_pa_w7 ON t1 (id) |  20404443 |        2762 |        2759
 CREATE INDEX t1_idx_s1000_pa_w8 ON t1 (id) |  20405199 |        2762 |        2759
 CREATE INDEX t1_idx_s1000_pb_w0 ON t1 (id) |  20405955 |        2762 |        2759
 CREATE INDEX t1_idx_s1000_pb_w1 ON t1 (id) |  20406711 |        2762 |        2759
 CREATE INDEX t1_idx_s1000_pb_w2 ON t1 (id) |  20407467 |        2762 |        2759
 CREATE INDEX t1_idx_s1000_pb_w3 ON t1 (id) |  20408223 |        2762 |        2759
 CREATE INDEX t1_idx_s1000_pb_w4 ON t1 (id) |  20408979 |        2762 |        2759
 CREATE INDEX t1_idx_s1000_pb_w5 ON t1 (id) |  20409735 |        2762 |        2759
 CREATE INDEX t1_idx_s1000_pb_w6 ON t1 (id) |  20410491 |        2762 |        2759
 CREATE INDEX t1_idx_s1000_pb_w7 ON t1 (id) |  20410147 |        2762 |        2759
 CREATE INDEX t1_idx_s1000_pb_w8 ON t1 (id) |  20410903 |        2762 |        2759
 CREATE INDEX t1_idx_s100_pa_w0 ON t1 (id)  |   2082194 |         293 |         290
 CREATE INDEX t1_idx_s100_pa_w1 ON t1 (id)  |   2082950 |         293 |         290
 CREATE INDEX t1_idx_s100_pa_w2 ON t1 (id)  |   2083706 |         293 |         290
 CREATE INDEX t1_idx_s100_pa_w3 ON t1 (id)  |   2084462 |         293 |         290
 CREATE INDEX t1_idx_s100_pa_w4 ON t1 (id)  |   2085218 |         293 |         290
 CREATE INDEX t1_idx_s100_pa_w5 ON t1 (id)  |   2085974 |         293 |         290
 CREATE INDEX t1_idx_s100_pa_w6 ON t1 (id)  |   2085286 |         293 |         290
 CREATE INDEX t1_idx_s100_pa_w7 ON t1 (id)  |   2086042 |         293 |         290
 CREATE INDEX t1_idx_s100_pa_w8 ON t1 (id)  |   2086798 |         293 |         290
 CREATE INDEX t1_idx_s100_pb_w0 ON t1 (id)  |   2087554 |         293 |         290
 CREATE INDEX t1_idx_s100_pb_w1 ON t1 (id)  |   2088310 |         293 |         290
 CREATE INDEX t1_idx_s100_pb_w2 ON t1 (id)  |   2089066 |         293 |         290
 CREATE INDEX t1_idx_s100_pb_w3 ON t1 (id)  |   2089822 |         293 |         290
 CREATE INDEX t1_idx_s100_pb_w4 ON t1 (id)  |   2090578 |         293 |         290
 CREATE INDEX t1_idx_s100_pb_w5 ON t1 (id)  |   2091334 |         293 |         290
 CREATE INDEX t1_idx_s100_pb_w6 ON t1 (id)  |   2092090 |         293 |         290
 CREATE INDEX t1_idx_s100_pb_w7 ON t1 (id)  |   2091746 |         293 |         290
 CREATE INDEX t1_idx_s100_pb_w8 ON t1 (id)  |   2092502 |         293 |         290
(72 rows)

The fluctuations exist at all scales, but don't seem to depend on the input
size.


Just to be sure, I tried to measure the amount of WAL for various INSERT
sizes using roughly the same approach, and the results are stable:

                        query                        | wal_bytes | wal_records | wal_num_fpw
-----------------------------------------------------+-----------+-------------+-------------
 INSERT INTO t_001_a SELECT generate_series($1, $2)  |     59000 |        1000 |           0
 INSERT INTO t_001_b SELECT generate_series($1, $2)  |     59000 |        1000 |           0
 INSERT INTO t_010_a SELECT generate_series($1, $2)  |    590000 |       10000 |           0
 INSERT INTO t_010_b SELECT generate_series($1, $2)  |    590000 |       10000 |           0
 INSERT INTO t_1000_a SELECT generate_series($1, $2) |  59000000 |     1000000 |           0
 INSERT INTO t_1000_b SELECT generate_series($1, $2) |  59000000 |     1000000 |           0
 INSERT INTO t_100_a SELECT generate_series($1, $2)  |   5900000 |      100000 |           0
 INSERT INTO t_100_b SELECT generate_series($1, $2)  |   5900000 |      100000 |           0
(8 rows)


At this point I tend to think that this is somehow due to btbuild-specific
behavior, or something nearby.


> Few other minor comments
> ------------------------------------
> pg_stat_statements patch
> 1.
> +--
> +-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to
> validate WAL generation metrics
> +--
> 
> The word 'non-temp' in the above comment appears out of place.  We
> don't need to specify it.


Fixed.


> 2.
> +-- SELECT usage data, check WAL usage is reported, wal_records equal
> rows count for INSERT/UPDATE/DELETE
> +SELECT query, calls, rows,
> +wal_bytes > 0 as wal_bytes_generated,
> +wal_records > 0 as wal_records_generated,
> +wal_records = rows as wal_records_as_rows
> +FROM pg_stat_statements ORDER BY query COLLATE "C";
> 
> The comment doesn't seem to match what we are doing in the statement.
> I think we can simplify it to something like "check WAL is generated
> for the above statements".


Done.


> 3.
> @@ -185,6 +185,9 @@ typedef struct Counters
>   int64 local_blks_written; /* # of local disk blocks written */
>   int64 temp_blks_read; /* # of temp blocks read */
>   int64 temp_blks_written; /* # of temp blocks written */
> + uint64 wal_bytes; /* total amount of WAL bytes generated */
> + int64 wal_records; /* # of WAL records generated */
> + int64 wal_num_fpw; /* # of WAL full page image generated */
>   double blk_read_time; /* time spent reading, in msec */
>   double blk_write_time; /* time spent writing, in msec */
>   double usage; /* usage factor */
> 
> It is better to keep wal_bytes after wal_num_fpw, as in the main patch.
> Also consider changing this at other places in the patch.  I think we
> should add these new fields after blk_write_time, or at the end after
> usage.


Done.


> 4.
> /* # of WAL full page image generated */
> Can we change it to "/* # of WAL full page image records generated */"?
> 
> If you agree, note that a similar comment exists in
> v11-0001-Add-infrastructure-to-track-WAL-usage; consider changing that
> as well.


Agreed, and fixed in both places.


> v11-0002-Add-option-to-report-WAL-usage-in-EXPLAIN-and-au
> 5.
> Specifically, include the
> +      number of records, full page images and bytes generated.
> 
> How about making the above slightly clearer?  "Specifically, include the
> number of records, number of full page image records and amount of WAL
> bytes generated."


Thanks, that's clearer.  Done.

Attachment

Re: WAL usage calculation patch

From
Dilip Kumar
Date:
On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Apr 2, 2020 at 6:18 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
> >                   query                       | calls | wal_bytes | wal_records | wal_num_fpw
> > ----------------------------------------------+-------+-----------+-------------+-------------
> >  create index t1_idx_parallel_0 ON t1(id)     |     1 |  20389743 |        2762 |        2758
> >  create index t1_idx_parallel_0_bis ON t1(id) |     1 |  20394391 |        2762 |        2758
> >  create index t1_idx_parallel_0_ter ON t1(id) |     1 |  20395155 |        2762 |        2758
> >  create index t1_idx_parallel_1 ON t1(id)     |     1 |  20388335 |        2762 |        2758
> >  create index t1_idx_parallel_2 ON t1(id)     |     1 |  20389091 |        2762 |        2758
> >  create index t1_idx_parallel_3 ON t1(id)     |     1 |  20389847 |        2762 |        2758
> >  create index t1_idx_parallel_4 ON t1(id)     |     1 |  20390603 |        2762 |        2758
> >  create index t1_idx_parallel_5 ON t1(id)     |     1 |  20391359 |        2762 |        2758
> >  create index t1_idx_parallel_6 ON t1(id)     |     1 |  20392115 |        2762 |        2758
> >  create index t1_idx_parallel_7 ON t1(id)     |     1 |  20392871 |        2762 |        2758
> >  create index t1_idx_parallel_8 ON t1(id)     |     1 |  20393627 |        2762 |        2758
> > (11 rows)
> >
> > =# select relname, pg_relation_size(oid) from pg_class where relname like '%t1_id%';
> >       relname          | pg_relation_size
> > -----------------------+------------------
> >  t1_idx_parallel_0     |         22487040
> >  t1_idx_parallel_0_bis |         22487040
> >  t1_idx_parallel_0_ter |         22487040
> >  t1_idx_parallel_2     |         22487040
> >  t1_idx_parallel_1     |         22487040
> >  t1_idx_parallel_4     |         22487040
> >  t1_idx_parallel_3     |         22487040
> >  t1_idx_parallel_5     |         22487040
> >  t1_idx_parallel_6     |         22487040
> >  t1_idx_parallel_7     |         22487040
> >  t1_idx_parallel_8     |         22487040
> > (11 rows)
> >
> >
> > So while the numbers of WAL records and full page images stay constant, we can
> > see some small fluctuations in the total amount of generated WAL data, even for
> > multiple executions of the sequential create index.  I'm wondering if the
> > fluctuations are due to some other internal details or if the WalUsage support
> > is just completely broken (although I don't see any obvious issue ATM).
> >
>
> I think we need to know the reason for this.  Can you try with small
> indexes and see if the problem is reproducible?  If it is, then it
> will be easier to debug.

I have done some testing to see where this extra WAL size is coming
from.  First I tried creating a new db before every run, and then the
size is consistent.  But then, on the same server, I tried what Julien
showed in his experiment, and I am getting a few extra WAL bytes from
the next create index onwards.  And the waldump output (attached in the
mail) shows that it is the pg_class insert WAL.  I still have to check
why we need to write this extra WAL size.

create extension pg_stat_statements;
drop table t1;
create table t1(id integer);
insert into t1 select * from generate_series(1, 10);
alter table t1 set (parallel_workers = 0);
vacuum;checkpoint;
select * from pg_stat_statements_reset() ;
create index t1_idx_parallel_0 ON t1(id);
select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
                   query                    | calls | wal_bytes | wal_records | wal_num_fpw
--------------------------------------------+-------+-----------+-------------+-------------
 create index t1_idx_parallel_0 ON t1(id)   |     1 |     49320 |          23 |          15


drop table t1;
create table t1(id integer);
insert into t1 select * from generate_series(1, 10);
--select * from pg_stat_statements_reset() ;
alter table t1 set (parallel_workers = 0);
vacuum;checkpoint;
create index t1_idx_parallel_1 ON t1(id);

select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';
postgres[110383]=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements;
                   query                    | calls | wal_bytes | wal_records | wal_num_fpw
--------------------------------------------+-------+-----------+-------------+-------------
 create index t1_idx_parallel_1 ON t1(id)   |     1 |     50040 |          23 |          15

wal_bytes diff = 50040-49320 = 720

The WAL record below is causing the 720-byte difference; all other WAL
records are of the same size.
t1_idx_parallel_0:
rmgr: Heap        len (rec/tot):     54/  7498, tx:        489, lsn: 0/0167B9B0, prev 0/0167B970, desc: INSERT off 30 flags 0x01, blkref #0: rel 1663/13580/1249

t1_idx_parallel_1:
rmgr: Heap        len (rec/tot):     54/  8218, tx:        494, lsn: 0/016B84F8, prev 0/016B84B8, desc: INSERT off 30 flags 0x01, blkref #0: rel 1663/13580/1249

wal diff: 8218 - 7498 = 720


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Thu, Apr 2, 2020 at 8:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > 4.
> > /* # of WAL full page image generated */
> > Can we change it to "/* # of WAL full page image records generated */"?
>
> > IMHO, "# of WAL full-page image records" sounds like the number of WAL
> > records that contain a full-page image.
>

I think this resembles what you have written here.

> >  But actually this is the
> > total number of full-page images, not the number of records that
> > have a full-page image.
>

We count this when forming WAL records.  As per my understanding, all
three counters are about WAL records.  This counter tells how many
records have full page images and one of the purposes of having this
counter is to check what percentage of records contain full page
image.
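
For instance, one could compute that percentage directly from the view
(a quick sketch, not part of the patch; the ratio is approximate to the
extent that a single record can carry more than one image):

SELECT query,
       round(100.0 * wal_num_fpw / nullif(wal_records, 0), 1) AS fpw_pct
FROM pg_stat_statements
ORDER BY fpw_pct DESC NULLS LAST;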

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Kyotaro Horiguchi
Date:
Hello.

The v13 patch seems to be failing to apply on the master branch.

At Fri, 3 Apr 2020 06:37:21 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in 
> On Thu, Apr 2, 2020 at 8:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > 4.
> > > /* # of WAL full page image generated */
> > > Can we change it to "/* # of WAL full page image records generated */"?
> >
> > IMHO, "# of WAL full-page image records" sounds like the number of WAL
> > records that contain a full-page image.
> >
> 
> I think this resembles what you have written here.
> 
> >  But actually this is the
> > total number of full-page images, not the number of records that
> > have a full-page image.
> >
> 
> We count this when forming WAL records.  As per my understanding, all
> three counters are about WAL records.  This counter tells how many
> records have full page images and one of the purposes of having this
> counter is to check what percentage of records contain full page
> image.

Aside from which is desirable or useful, actually XLogRecordAssemble
in v13-0001 counts the number of attached images, then XLogInsertRecord
sums up that number into pgWalUsage.wal_num_fpw.

FWIW, it seems to me that the main concern here is the source of WAL
size.  If that is the case, I think the number of full page images is
more useful.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Fri, Apr 3, 2020 at 6:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Apr 2, 2020 at 8:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > 4.
> > > /* # of WAL full page image generated */
> > > Can we change it to "/* # of WAL full page image records generated */"?
> >
> > IMHO, "# of WAL full-page image records" seems like the number of wal
> > record which contains the full-page image.
> >
>
> I think this resembles what you have written here.
>
> >  But, actually, this is the
> > total number of the full-page images, not the number of records that
> > have a full-page image.
> >
>
> We count this when forming WAL records.  As per my understanding, all
> three counters are about WAL records.  This counter tells how many
> records have full page images and one of the purposes of having this
> counter is to check what percentage of records contain full page
> image.
>

How about if we say "# of full-page writes generated" or "# of WAL
full-page writes generated"?  I think I now understand your concern:
we want to display it as full-page writes, but the comment doesn't
indicate that.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Dilip Kumar
Date:
On Fri, Apr 3, 2020 at 8:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Apr 3, 2020 at 6:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Apr 2, 2020 at 8:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > 4.
> > > > /* # of WAL full page image generated */
> > > > Can we change it to "/* # of WAL full page image records generated */"?
> > >
> > > IMHO, "# of WAL full-page image records" seems like the number of wal
> > > record which contains the full-page image.
> > >
> >
> > I think this resembles what you have written here.
> >
> > >  But, actually, this is the
> > > total number of the full-page images, not the number of records that
> > > have a full-page image.
> > >
> >
> > We count this when forming WAL records.  As per my understanding, all
> > three counters are about WAL records.  This counter tells how many
> > records have full page images and one of the purposes of having this
> > counter is to check what percentage of records contain full page
> > image.
> >
>
> How about if say "# of full-page writes generated" or "# of WAL
> full-page writes generated"?  I think now I understand your concern
> because we want to display it as full page writes and the comment
> doesn't seem to indicate the same.

Either of these seems good to me.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Fri, Apr 3, 2020 at 7:15 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> Hello.
>
> The v13 patch seems to be failing to apply on master.
>

It is probably due to the recent commit ed7a509571.  I have briefly
studied that, and I think we should make this patch account for
plan-time WAL usage, if any, similar to what got committed for buffer
usage.  The reason is that during planning we might write WAL
due to hint bits.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Dilip Kumar
Date:
On Thu, Apr 2, 2020 at 9:28 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Apr 2, 2020 at 6:18 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create
index%';
> > >                   query                       | calls | wal_bytes | wal_records | wal_num_fpw
> > > ----------------------------------------------+-------+-----------+-------------+-------------
> > >  create index t1_idx_parallel_0 ON t1(id)     |     1 |  20389743 |        2762 |        2758
> > >  create index t1_idx_parallel_0_bis ON t1(id) |     1 |  20394391 |        2762 |        2758
> > >  create index t1_idx_parallel_0_ter ON t1(id) |     1 |  20395155 |        2762 |        2758
> > >  create index t1_idx_parallel_1 ON t1(id)     |     1 |  20388335 |        2762 |        2758
> > >  create index t1_idx_parallel_2 ON t1(id)     |     1 |  20389091 |        2762 |        2758
> > >  create index t1_idx_parallel_3 ON t1(id)     |     1 |  20389847 |        2762 |        2758
> > >  create index t1_idx_parallel_4 ON t1(id)     |     1 |  20390603 |        2762 |        2758
> > >  create index t1_idx_parallel_5 ON t1(id)     |     1 |  20391359 |        2762 |        2758
> > >  create index t1_idx_parallel_6 ON t1(id)     |     1 |  20392115 |        2762 |        2758
> > >  create index t1_idx_parallel_7 ON t1(id)     |     1 |  20392871 |        2762 |        2758
> > >  create index t1_idx_parallel_8 ON t1(id)     |     1 |  20393627 |        2762 |        2758
> > > (11 rows)
> > >
> > > =# select relname, pg_relation_size(oid) from pg_class where relname like '%t1_id%';
> > >       relname          | pg_relation_size
> > > -----------------------+------------------
> > >  t1_idx_parallel_0     |         22487040
> > >  t1_idx_parallel_0_bis |         22487040
> > >  t1_idx_parallel_0_ter |         22487040
> > >  t1_idx_parallel_2     |         22487040
> > >  t1_idx_parallel_1     |         22487040
> > >  t1_idx_parallel_4     |         22487040
> > >  t1_idx_parallel_3     |         22487040
> > >  t1_idx_parallel_5     |         22487040
> > >  t1_idx_parallel_6     |         22487040
> > >  t1_idx_parallel_7     |         22487040
> > >  t1_idx_parallel_8     |         22487040
> > > (9 rows)
> > >
> > >
> > > So while the number of WAL records and full page images stay constant, we can
> > > see some small fluctuations in the total amount of generated WAL data, even for
> > > multiple execution of the sequential create index.  I'm wondering if the
> > > fluctuations are due to some other internal details or if the WalUsage support
> > > is just completely broken (although I don't see any obvious issue ATM).
> > >
> >
> > I think we need to know the reason for this.  Can you try with small
> > size indexes and see if the problem is reproducible? If it is, then it
> > will be easier to debug the same.
>
> I have done some testing to see where this extra WAL size is coming
> from.  First I tried creating a new db before every run, and then the
> size is consistent.  But then, on the same server, I tried what Julien
> showed in his experiment, and I am getting a few extra WAL bytes from
> the next create index onwards.  And the waldump (attached in the mail)
> shows that it is a pg_class insert record.  I still have to check why
> we need to write this extra WAL.
>
> create extension pg_stat_statements;
> drop table t1;
> create table t1(id integer);
> insert into t1 select * from generate_series(1, 10);
> alter table t1 set (parallel_workers = 0);
> vacuum;checkpoint;
> select * from pg_stat_statements_reset() ;
> create index t1_idx_parallel_0 ON t1(id);
> select query, calls, wal_bytes, wal_records, wal_num_fpw from
> pg_stat_statements where query ilike '%create index%';;
>                   query                   | calls | wal_bytes | wal_records | wal_num_fpw
> -------------------------------------------+-------+-----------+-------------+-------------
>  create index t1_idx_parallel_0 ON t1(id)  |     1 |     49320 |          23 |          15
>
>
> drop table t1;
> create table t1(id integer);
> insert into t1 select * from generate_series(1, 10);
> --select * from pg_stat_statements_reset() ;
> alter table t1 set (parallel_workers = 0);
> vacuum;checkpoint;
> create index t1_idx_parallel_1 ON t1(id);
>
> select query, calls, wal_bytes, wal_records, wal_num_fpw from
> pg_stat_statements where query ilike '%create index%';;
> postgres[110383]=# select query, calls, wal_bytes, wal_records,
> wal_num_fpw from pg_stat_statements;
>                   query                   | calls | wal_bytes | wal_records | wal_num_fpw
> -------------------------------------------+-------+-----------+-------------+-------------
>  create index t1_idx_parallel_1 ON t1(id)  |     1 |     50040 |          23 |          15
>
> wal_bytes diff = 50040-49320 = 720
>
> The WAL record below is causing the 720-byte difference; all other
> WAL records are of the same size.
> t1_idx_parallel_0:
> rmgr: Heap        len (rec/tot):     54/  7498, tx:        489, lsn:
> 0/0167B9B0, prev 0/0167B970, desc: INSERT off 30 flags 0x01, blkref
> #0: rel 1663/13580/1249
>
> t1_idx_parallel_1:
> rmgr: Heap        len (rec/tot):     54/  8218, tx:        494, lsn:
> 0/016B84F8, prev 0/016B84B8, desc: INSERT off 30 flags 0x01, blkref
> #0: rel 1663/13580/1249
>
> wal diff: 8218 - 7498 = 720

I think now I've got the reason.  Basically, both of these records are
storing an FPW, and the FPW size can vary based on the hole size on the
page.  If the hole size is smaller, the image length will be larger:
image_len = BLCKSZ - hole_size.  So in subsequent records the image size
is bigger.  With the default BLCKSZ of 8192, a hole that is 720 bytes
smaller yields an image that is 720 bytes larger, which matches both the
record-length and wal_bytes differences above.  You can refer to the
code below in XLogRecordAssemble:
{
	....
	bimg.length = BLCKSZ - cbimg.hole_length;

	if (cbimg.hole_length == 0)
	{
		....
	}
	else
	{
		/* must skip the hole */
		rdt_datas_last->data = page;
		rdt_datas_last->len = bimg.hole_offset;

		rdt_datas_last->next = &regbuf->bkp_rdatas[1];
		rdt_datas_last = rdt_datas_last->next;

		rdt_datas_last->data =
			page + (bimg.hole_offset + cbimg.hole_length);
		rdt_datas_last->len =
			BLCKSZ - (bimg.hole_offset + cbimg.hole_length);
	}
}


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Fri, Apr 3, 2020 at 8:55 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> I think now I've got the reason.  Basically, both of these records are
> storing an FPW, and the FPW size can vary based on the hole size on the
> page.  If the hole size is smaller, the image length will be larger:
> image_len = BLCKSZ - hole_size.  So in subsequent records the image size
> is bigger.
>

This means that if we always re-create the database, or maybe keep
full_page_writes off, then we should get consistent WAL usage data for
all tests.
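
For example, a sketch of the second option (full_page_writes is
reloadable; of course not something to leave off outside of a test
setup):

alter system set full_page_writes = off;
select pg_reload_conf();
select pg_stat_statements_reset();
-- re-run the create index statements, then compare wal_bytes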

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Dilip Kumar
Date:
On Fri, Apr 3, 2020 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Apr 3, 2020 at 8:55 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > I think now I've got the reason.  Basically, both of these records are
> > storing an FPW, and the FPW size can vary based on the hole size on the
> > page.  If the hole size is smaller, the image length will be larger:
> > image_len = BLCKSZ - hole_size.  So in subsequent records the image size
> > is bigger.
> >
>
> This means that if we always re-create the database, or maybe keep
> full_page_writes off, then we should get consistent WAL usage data
> for all tests.

With a new database, it is always the same.  But with full-page writes,
I could see that one of the create index statements writes extra WAL,
and if we change the order, then the new create index at that position
writes the extra WAL instead.  I guess that could be due to a
non-in-place update in some of the system tables.

postgres[58554]=# create extension pg_stat_statements;
CREATE EXTENSION
postgres[58554]=#
postgres[58554]=# create table t1(id integer);
CREATE TABLE
postgres[58554]=# insert into t1 select * from generate_series(1, 1000000);
INSERT 0 1000000
postgres[58554]=# select * from pg_stat_statements_reset() ;
 pg_stat_statements_reset
--------------------------

(1 row)

postgres[58554]=#
postgres[58554]=# alter table t1 set (parallel_workers = 0);
ALTER TABLE
postgres[58554]=# vacuum;checkpoint;
VACUUM
CHECKPOINT
postgres[58554]=# create index t1_idx_parallel_0 ON t1(id);
CREATE INDEX
postgres[58554]=#
postgres[58554]=# alter table t1 set (parallel_workers = 1);
ALTER TABLE
postgres[58554]=# vacuum;checkpoint;
VACUUM
CHECKPOINT
postgres[58554]=# create index t1_idx_parallel_1 ON t1(id);
CREATE INDEX
postgres[58554]=#
postgres[58554]=# alter table t1 set (parallel_workers = 2);
ALTER TABLE
postgres[58554]=# vacuum;checkpoint;
VACUUM
CHECKPOINT
postgres[58554]=# create index t1_idx_parallel_2 ON t1(id);
CREATE INDEX
postgres[58554]=#
postgres[58554]=# alter table t1 set (parallel_workers = 3);
ALTER TABLE
postgres[58554]=# vacuum;checkpoint;
VACUUM
CHECKPOINT
postgres[58554]=# create index t1_idx_parallel_3 ON t1(id);
CREATE INDEX
postgres[58554]=#
postgres[58554]=# alter table t1 set (parallel_workers = 4);
ALTER TABLE
postgres[58554]=# vacuum;checkpoint;
VACUUM
CHECKPOINT
postgres[58554]=# create index t1_idx_parallel_4 ON t1(id);
CREATE INDEX
postgres[58554]=#
postgres[58554]=# alter table t1 set (parallel_workers = 5);
ALTER TABLE
postgres[58554]=# vacuum;checkpoint;
VACUUM
CHECKPOINT
postgres[58554]=# create index t1_idx_parallel_5 ON t1(id);
CREATE INDEX
postgres[58554]=#
postgres[58554]=# alter table t1 set (parallel_workers = 6);
ALTER TABLE
postgres[58554]=# vacuum;checkpoint;
VACUUM
CHECKPOINT
postgres[58554]=# create index t1_idx_parallel_6 ON t1(id);
CREATE INDEX
postgres[58554]=#
postgres[58554]=# alter table t1 set (parallel_workers = 7);
ALTER TABLE
postgres[58554]=# vacuum;checkpoint;
VACUUM
CHECKPOINT
postgres[58554]=# create index t1_idx_parallel_7 ON t1(id);
CREATE INDEX
postgres[58554]=#
postgres[58554]=# alter table t1 set (parallel_workers = 8);
ALTER TABLE
postgres[58554]=# vacuum;checkpoint;
VACUUM
CHECKPOINT
postgres[58554]=# create index t1_idx_parallel_8 ON t1(id);
CREATE INDEX
postgres[58554]=#
postgres[58554]=# select query, calls, wal_bytes, wal_records,
wal_num_fpw from pg_stat_statements where query ilike '%create
index%';
                  query                   | calls | wal_bytes | wal_records | wal_num_fpw
-------------------------------------------+-------+-----------+-------------+-------------
 create index t1_idx_parallel_0 ON t1(id) |     1 |  20355953 |        2766 |        2745
 create index t1_idx_parallel_1 ON t1(id) |     1 |  20355953 |        2766 |        2745
 create index t1_idx_parallel_3 ON t1(id) |     1 |  20355953 |        2766 |        2745
 create index t1_idx_parallel_2 ON t1(id) |     1 |  20355953 |        2766 |        2745
 create index t1_idx_parallel_4 ON t1(id) |     1 |  20355953 |        2766 |        2745
 create index t1_idx_parallel_8 ON t1(id) |     1 |  20355953 |        2766 |        2745
 create index t1_idx_parallel_6 ON t1(id) |     1 |  20355953 |        2766 |        2745
 create index t1_idx_parallel_7 ON t1(id) |     1 |  20355953 |        2766 |        2745
 create index t1_idx_parallel_5 ON t1(id) |     1 |  20359585 |        2767 |        2745

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Dilip Kumar
Date:
On Fri, Apr 3, 2020 at 9:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Apr 3, 2020 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Apr 3, 2020 at 8:55 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > I think now I've got the reason.  Basically, both of these records are
> > > storing an FPW, and the FPW size can vary based on the hole size on the
> > > page.  If the hole size is smaller, the image length will be larger:
> > > image_len = BLCKSZ - hole_size.  So in subsequent records the image size
> > > is bigger.
> > >
> >
> > This means that if we always re-create the database, or maybe keep
> > full_page_writes off, then we should get consistent WAL usage data
> > for all tests.
>
> With a new database, it is always the same.  But with full-page writes,
> I could see that one of the create index statements writes extra WAL,
> and if we change the order, then the new create index at that position
> writes the extra WAL instead.  I guess that could be due to a
> non-in-place update in some of the system tables.

I have analyzed the WAL and there could be multiple reasons for the
same.  With small data, I have noticed that while inserting in the
system index there was a Page Split and that created extra WAL.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Fri, Apr 3, 2020 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Fri, Apr 3, 2020 at 9:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Fri, Apr 3, 2020 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Fri, Apr 3, 2020 at 8:55 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > I think now I've got the reason.  Basically, both of these records are
> > > > storing an FPW, and the FPW size can vary based on the hole size on the
> > > > page.  If the hole size is smaller, the image length will be larger:
> > > > image_len = BLCKSZ - hole_size.  So in subsequent records the image size
> > > > is bigger.
> > > >
> > >
> > > This means that if we always re-create the database, or maybe keep
> > > full_page_writes off, then we should get consistent WAL usage data
> > > for all tests.
> >
> > With a new database, it is always the same.  But with full-page writes,
> > I could see that one of the create index statements writes extra WAL,
> > and if we change the order, then the new create index at that position
> > writes the extra WAL instead.  I guess that could be due to a
> > non-in-place update in some of the system tables.
>
> I have analyzed the WAL and there could be multiple reasons for the
> same.  With small data, I have noticed that while inserting in the
> system index there was a Page Split and that created extra WAL.
>

Thanks for the investigation.  I think it is clear that we can't
expect the same WAL size even if we repeat the same operation unless
it is a fresh database.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Fri, Apr 3, 2020 at 9:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Apr 3, 2020 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > I have analyzed the WAL and there could be multiple reasons for the
> > same.  With small data, I have noticed that while inserting in the
> > system index there was a Page Split and that created extra WAL.
> >
>
> Thanks for the investigation.  I think it is clear that we can't
> expect the same WAL size even if we repeat the same operation unless
> it is a fresh database.
>

Attached find the latest patches.  I have modified based on our
discussion on user interface thread [1], ran pgindent on all patches,
slightly modified one comment based on Dilip's input and added commit
messages.  I think the patches are in good shape.  I would like to
commit the first patch in this series tomorrow unless I see more
comments or any other objections.  The patch-2 might need to be
rebased if the other related patch [2] got committed first and we
might need to tweak a bit based on the input from other thread [1]
where we are discussing user interface for it.


[1] - https://www.postgresql.org/message-id/CAA4eK1%2Bo1Vj4Rso09pKOaKhY8QWTA0gWwCL3TGCi1rCLBBf-QQ%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/E1jKC4J-0007R3-Bo%40gemulon.postgresql.org

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Fri, Apr 3, 2020 at 7:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Apr 3, 2020 at 9:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Apr 3, 2020 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > >
> > > I have analyzed the WAL and there could be multiple reasons for the
> > > same.  With small data, I have noticed that while inserting in the
> > > system index there was a Page Split and that created extra WAL.
> > >
> >
> > Thanks for the investigation.  I think it is clear that we can't
> > expect the same WAL size even if we repeat the same operation unless
> > it is a fresh database.
> >
>
> Attached find the latest patches.  I have modified based on our
> discussion on user interface thread [1], ran pgindent on all patches,
> slightly modified one comment based on Dilip's input and added commit
> messages.  I think the patches are in good shape.  I would like to
> commit the first patch in this series tomorrow unless I see more
> comments or any other objections.
>

Pushed.

>  The patch-2 might need to be
> rebased if the other related patch [2] got committed first and we
> might need to tweak a bit based on the input from other thread [1]
> where we are discussing user interface for it.
>

The primary question for patch-2 is whether we want to include WAL
usage information for the planning phase as we did for BUFFERS in
recent commit ce77abe63c (Include information on buffer usage during
planning phase, in EXPLAIN output, take two.).  Initially, I thought
it might be a good idea to do the same for WAL but after reading the
thread that leads to commit, I am not sure if there is any pressing
need to include WAL information for the planning phase.  Because
during planning we might not write much WAL (with the exception of WAL
due to setting of hint-bits) so users might not care much.  What do
you think?
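
For reference, a sketch of how the already-pushed EXPLAIN support looks
(values purely illustrative, not from a real run):

explain (analyze, wal, costs off) update t1 set id = id + 1 where id < 100;
                   QUERY PLAN
-------------------------------------------------
 Update on t1 (actual time=... rows=0 loops=1)
   WAL:  records=99  full page writes=3  bytes=6913
   ...

If we did account for plan-time WAL, a separate planning section would
presumably carry the same counters, as ce77abe63c did for buffers.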

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Sat, Apr 04, 2020 at 10:38:14AM +0530, Amit Kapila wrote:
> On Fri, Apr 3, 2020 at 7:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Apr 3, 2020 at 9:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Fri, Apr 3, 2020 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > > >
> > > > I have analyzed the WAL and there could be multiple reasons for the
> > > > same.  With small data, I have noticed that while inserting in the
> > > > system index there was a Page Split and that created extra WAL.
> > > >
> > >
> > > Thanks for the investigation.  I think it is clear that we can't
> > > expect the same WAL size even if we repeat the same operation unless
> > > it is a fresh database.
> > >
> >
> > Attached find the latest patches.  I have modified based on our
> > discussion on user interface thread [1], ran pgindent on all patches,
> > slightly modified one comment based on Dilip's input and added commit
> > messages.  I think the patches are in good shape.  I would like to
> > commit the first patch in this series tomorrow unless I see more
> > comments or any other objections.
> >
> 
> Pushed.


Thanks!


> >  The patch-2 might need to be
> > rebased if the other related patch [2] got committed first and we
> > might need to tweak a bit based on the input from other thread [1]
> > where we are discussing user interface for it.
> >
> 
> The primary question for patch-2 is whether we want to include WAL
> usage information for the planning phase as we did for BUFFERS in
> recent commit ce77abe63c (Include information on buffer usage during
> planning phase, in EXPLAIN output, take two.).  Initially, I thought
> it might be a good idea to do the same for WAL but after reading the
> thread that leads to commit, I am not sure if there is any pressing
> need to include WAL information for the planning phase.  Because
> during planning we might not write much WAL (with the exception of WAL
> due to setting of hint-bits) so users might not care much.  What do
> you think?


I agree that WAL activity during planning shouldn't be very frequent, but it
might still be worthwhile to add it.  I'm wondering how stable the normalized
WAL information would be in some regression tests, as the counters are only
shown if non-zero.  Maybe it'd be better to remove them from the output, same
as for the buffers?



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Sat, Apr 4, 2020 at 11:33 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Sat, Apr 04, 2020 at 10:38:14AM +0530, Amit Kapila wrote:
>
> > >  The patch-2 might need to be
> > > rebased if the other related patch [2] got committed first and we
> > > might need to tweak a bit based on the input from other thread [1]
> > > where we are discussing user interface for it.
> > >
> >
> > The primary question for patch-2 is whether we want to include WAL
> > usage information for the planning phase as we did for BUFFERS in
> > recent commit ce77abe63c (Include information on buffer usage during
> > planning phase, in EXPLAIN output, take two.).  Initially, I thought
> > it might be a good idea to do the same for WAL but after reading the
> > thread that leads to commit, I am not sure if there is any pressing
> > need to include WAL information for the planning phase.  Because
> > during planning we might not write much WAL (with the exception of WAL
> > due to setting of hint-bits) so users might not care much.  What do
> > you think?
>
>
> I agree that WAL activity during planning shouldn't be very frequent, but it
> might still be worthwhile to add it.
>

We can add it if we want, but I am not able to convince myself of that.
Do you have any use case in mind?  I think in most cases
(except for hint-bit WAL) it will be zero.  If we are not sure,
we can also discuss it separately in a new thread once this
patch series is committed and see if anybody else sees value in it;
if so, adding the code should be easy.

>  I'm wondering how stable the normalized
> WAL information would be in some regression tests, as the counters are only
> showed if non zero.  Maybe it'd be better to remove them from the output, same
> as the buffers?
>

Which regression tests are you referring to? pg_stat_statements?  If
so, why would it be unstable?  It should always generate WAL although
the exact values may differ and we have already taken care of that in
the patch, no?
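
For example, the pg_stat_statements tests can stay stable by asserting
that WAL was generated rather than comparing exact counts, roughly along
these lines (a sketch of that approach):

select query,
       wal_bytes > 0   as wal_bytes_generated,
       wal_records > 0 as wal_records_generated
from pg_stat_statements
order by query collate "C";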

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Sat, Apr 04, 2020 at 02:12:59PM +0530, Amit Kapila wrote:
> On Sat, Apr 4, 2020 at 11:33 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Sat, Apr 04, 2020 at 10:38:14AM +0530, Amit Kapila wrote:
> >
> > > >  The patch-2 might need to be
> > > > rebased if the other related patch [2] got committed first and we
> > > > might need to tweak a bit based on the input from other thread [1]
> > > > where we are discussing user interface for it.
> > > >
> > >
> > > The primary question for patch-2 is whether we want to include WAL
> > > usage information for the planning phase as we did for BUFFERS in
> > > recent commit ce77abe63c (Include information on buffer usage during
> > > planning phase, in EXPLAIN output, take two.).  Initially, I thought
> > > it might be a good idea to do the same for WAL but after reading the
> > > thread that leads to commit, I am not sure if there is any pressing
> > > need to include WAL information for the planning phase.  Because
> > > during planning we might not write much WAL (with the exception of WAL
> > > due to setting of hint-bits) so users might not care much.  What do
> > > you think?
> >
> >
> > I agree that WAL activity during planning shouldn't be very frequent, but it
> > might still be worthwhile to add it.
> >
> 
> We can add if we want but I am not able to convince myself for that.
> Do you have any use case in mind?  I think in most of the cases
> (except for hint-bit WAL) it will be zero. If we are not sure of this
> we can also discuss it separately in a new thread once this
> patch-series is committed and see if anybody else sees the value of it
> and if so adding the code should be easy.


I'm mostly thinking of people trying to investigate possible slowdowns on a
hot-standby replica with a primary without wal_log_hints.  If they explicitly
ask for WAL information, we should provide them, even if it's quite unlikely to
happen.


> 
> >  I'm wondering how stable the normalized
> > WAL information would be in some regression tests, as the counters are only
> > showed if non zero.  Maybe it'd be better to remove them from the output, same
> > as the buffers?
> >
> 
> Which regression tests are you referring to? pg_stat_statements?  If
> so, why would it be unstable?  It should always generate WAL although
> the exact values may differ and we have already taken care of that in
> the patch, no?


I'm talking about a hypothetical new EXPLAIN (ANALYZE, WAL) regression test,
which could be unstable for a similar reason to why the first attempt to add
BUFFERS in the planning part of EXPLAIN was unstable.  I thought that's why
you were hesitant about adding it.



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Sat, Apr 4, 2020 at 2:24 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > We can add if we want but I am not able to convince myself for that.
> > Do you have any use case in mind?  I think in most of the cases
> > (except for hint-bit WAL) it will be zero. If we are not sure of this
> > we can also discuss it separately in a new thread once this
> > patch-series is committed and see if anybody else sees the value of it
> > and if so adding the code should be easy.
>
>
> I'm mostly thinking of people trying to investigate possible slowdowns on a
> hot-standby replica with a primary without wal_log_hints.  If they explicitly
> ask for WAL information, we should provide them, even if it's quite unlikely to
> happen.
>

Yeah, possibly, but I am not completely sure.  I would like to hear the
opinion of others, if any, before adding code for this.  How about we
first commit pg_stat_statements and wait on this till Monday; if
nobody responds, we can commit the current patch but start a new
thread and try to get the opinion of others?

>
> >
> > >  I'm wondering how stable the normalized
> > > WAL information would be in some regression tests, as the counters are only
> > > showed if non zero.  Maybe it'd be better to remove them from the output, same
> > > as the buffers?
> > >
> >
> > Which regression tests are you referring to? pg_stat_statements?  If
> > so, why would it be unstable?  It should always generate WAL although
> > the exact values may differ and we have already taken care of that in
> > the patch, no?
>
>
> I'm talking about a hypothetical new EXPLAIN (ANALYZE, WAL) regression test,
> which could be unstable for a similar reason to why the first attempt to add
> BUFFERS in the planning part of EXPLAIN was unstable.
>

oh, then leave it for now because I don't see much use of those as the
code path can anyway be hit by the tests added by pg_stat_statements
patch.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Sat, Apr 04, 2020 at 02:39:32PM +0530, Amit Kapila wrote:
> On Sat, Apr 4, 2020 at 2:24 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > > We can add if we want but I am not able to convince myself for that.
> > > Do you have any use case in mind?  I think in most of the cases
> > > (except for hint-bit WAL) it will be zero. If we are not sure of this
> > > we can also discuss it separately in a new thread once this
> > > patch-series is committed and see if anybody else sees the value of it
> > > and if so adding the code should be easy.
> >
> >
> > I'm mostly thinking of people trying to investigate possible slowdowns on a
> > hot-standby replica with a primary without wal_log_hints.  If they explicitly
> > ask for WAL information, we should provide them, even if it's quite unlikely to
> > happen.
> >
> 
> Yeah, possible but I am not completely sure.  I would like to hear the
> opinion of others if any before adding code for this.  How about if we
> first commit pg_stat_statements and wait for this till Monday and if
> nobody responds we can commit the current patch but would start a new
> thread and try to get the opinion of others?


I'm fine with it.


> 
> >
> > >
> > > >  I'm wondering how stable the normalized
> > > > WAL information would be in some regression tests, as the counters are only
> > > > showed if non zero.  Maybe it'd be better to remove them from the output, same
> > > > as the buffers?
> > > >
> > >
> > > Which regression tests are you referring to? pg_stat_statements?  If
> > > so, why would it be unstable?  It should always generate WAL although
> > > the exact values may differ and we have already taken care of that in
> > > the patch, no?
> >
> >
> > I'm talking about a hypothetical new EXPLAIN (ANALYZE, WAL) regression test,
> > which could be unstable for a similar reason to why the first attempt to add
> > BUFFERS in the planning part of EXPLAIN was unstable.
> >
> 
> oh, then leave it for now because I don't see much use of those as the
> code path can anyway be hit by the tests added by pg_stat_statements
> patch.
> 


Perfect then!



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Sat, Apr 4, 2020 at 2:50 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Sat, Apr 04, 2020 at 02:39:32PM +0530, Amit Kapila wrote:
> > On Sat, Apr 4, 2020 at 2:24 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > > >
> > > > We can add if we want but I am not able to convince myself for that.
> > > > Do you have any use case in mind?  I think in most of the cases
> > > > (except for hint-bit WAL) it will be zero. If we are not sure of this
> > > > we can also discuss it separately in a new thread once this
> > > > patch-series is committed and see if anybody else sees the value of it
> > > > and if so adding the code should be easy.
> > >
> > >
> > > I'm mostly thinking of people trying to investigate possible slowdowns on a
> > > hot-standby replica with a primary without wal_log_hints.  If they explicitly
> > > ask for WAL information, we should provide them, even if it's quite unlikely to
> > > happen.
> > >
> >
> > Yeah, possible but I am not completely sure.  I would like to hear the
> > opinion of others if any before adding code for this.  How about if we
> > first commit pg_stat_statements and wait for this till Monday and if
> > nobody responds we can commit the current patch but would start a new
> > thread and try to get the opinion of others?
>
>
> I'm fine with it.
>

I have pushed pg_stat_statements and Explain related patches.  I am
now looking into (auto)vacuum patch and have few comments.

@@ -614,6 +616,9 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
 
 	TimestampDifference(starttime, endtime, &secs, &usecs);
 
+	memset(&walusage, 0, sizeof(WalUsage));
+	WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
+
 	read_rate = 0;
 	write_rate = 0;
 	if ((secs > 0) || (usecs > 0))
@@ -666,7 +671,13 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
 			 (long long) VacuumPageDirty);
 	appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
 					 read_rate, write_rate);
-	appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
+	appendStringInfo(&buf, _("system usage: %s\n"), pg_rusage_show(&ru0));
+	appendStringInfo(&buf,
+					 _("WAL usage: %ld records, %ld full page writes, "
+					   UINT64_FORMAT " bytes"),
+					 walusage.wal_records,
+					 walusage.wal_num_fpw,
+					 walusage.wal_bytes);

Here, we are not displaying Buffers-related data, so why do we think
it is important to display WAL data?  I see some point in displaying
Buffers and WAL data in VACUUM (VERBOSE), but I feel it is better to
make a case for both statistics together rather than displaying one
and leaving the other.  The other change, related to autovacuum stats,
seems okay to me.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



On Tue, 31 Mar 2020 at 14:13, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 31 Mar 2020 at 12:58, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Mar 30, 2020 at 12:31 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > The patch for vacuum conflicts with recent changes in vacuum. So I've
> > > attached rebased one.
> > >
> >
> > + /*
> > + * Next, accumulate buffer usage.  (This must wait for the workers to
> > + * finish, or we might get incomplete data.)
> > + */
> > + for (i = 0; i < nworkers; i++)
> > + InstrAccumParallelQuery(&lps->buffer_usage[i]);
> > +
> >
> > This should be done for launched workers aka
> > lps->pcxt->nworkers_launched.  I think a similar problem exists in
> > create index related patch.
>
> You're right. Fixed in the new patches.
>
> On Mon, 30 Mar 2020 at 17:00, Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > Just minor nitpicking:
> >
> > +   int         i;
> >
> >     Assert(!IsParallelWorker());
> >     Assert(ParallelVacuumIsActive(lps));
> > @@ -2166,6 +2172,13 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats,
> >     /* Wait for all vacuum workers to finish */
> >     WaitForParallelWorkersToFinish(lps->pcxt);
> >
> > +   /*
> > +    * Next, accumulate buffer usage.  (This must wait for the workers to
> > +    * finish, or we might get incomplete data.)
> > +    */
> > +   for (i = 0; i < nworkers; i++)
> > +       InstrAccumParallelQuery(&lps->buffer_usage[i]);
> >
> > We now allow declaring a variable in those loops, so it may be better to avoid
> > declaring i outside the for scope?
>
> We can do that but I was not sure if it's good since other codes
> around there don't use that. So I'd like to leave it for committers.
> It's a trivial change.
>

I've updated the buffer usage patch for parallel index creation, as the
previous patch conflicts with commit df3b181499b40.

The following comment from commit df3b181499b40 seems to be the one that
Amit had replaced with a better sentence when introducing buffer usage
for parallel vacuum.

+   /*
+    * Estimate space for WalUsage -- PARALLEL_KEY_WAL_USAGE
+    *
+    * WalUsage during execution of maintenance command can be used by an
+    * extension that reports the WAL usage, such as pg_stat_statements. We
+    * have no way of knowing whether anyone's looking at pgWalUsage, so do it
+    * unconditionally.
+    */

Would the following sentence in vacuumlazy.c also be better for
parallel create index?

    * If there are no extensions loaded that care, we could skip this.  We
    * have no way of knowing whether anyone's looking at pgBufferUsage or
    * pgWalUsage, so do it unconditionally.

The attached patch changes the comment as above and removes the code
that was used to disable only the buffer usage accumulation.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment
On Mon, Apr 6, 2020 at 11:19 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> The attached patch changes to the above comment and removed the code
> that is used to un-support only buffer usage accumulation.
>

So, IIUC, the purpose of this patch will be to count the buffer usage
due to the heap scan (in heapam_index_build_range_scan) we perform
during parallel create index?  Because the index creation itself won't
use the buffer manager.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



On Mon, 6 Apr 2020 at 16:16, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Apr 6, 2020 at 11:19 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > The attached patch changes to the above comment and removed the code
> > that is used to un-support only buffer usage accumulation.
> >
>
> So, IIUC, the purpose of this patch will be to count the buffer usage
> due to the heap scan (in heapam_index_build_range_scan) we perform
> while parallel create index? Because the index creation itself won't
> use buffer manager.

Oops, I'd missed Peter's comment. Btree index doesn't use
heapam_index_build_range_scan so it's not necessary. Sorry for the
noise.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Mon, Apr 06, 2020 at 08:55:01AM +0530, Amit Kapila wrote:
> On Sat, Apr 4, 2020 at 2:50 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> I have pushed pg_stat_statements and Explain related patches.  I am
> now looking into (auto)vacuum patch and have few comments.
> 

Thanks!

> @@ -614,6 +616,9 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
> 
>  	TimestampDifference(starttime, endtime, &secs, &usecs);
> 
> +	memset(&walusage, 0, sizeof(WalUsage));
> +	WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start);
> +
>  	read_rate = 0;
>  	write_rate = 0;
>  	if ((secs > 0) || (usecs > 0))
> @@ -666,7 +671,13 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params,
>  			 (long long) VacuumPageDirty);
>  	appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"),
>  					 read_rate, write_rate);
> -	appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0));
> +	appendStringInfo(&buf, _("system usage: %s\n"), pg_rusage_show(&ru0));
> +	appendStringInfo(&buf,
> +					 _("WAL usage: %ld records, %ld full page writes, "
> +					   UINT64_FORMAT " bytes"),
> +					 walusage.wal_records,
> +					 walusage.wal_num_fpw,
> +					 walusage.wal_bytes);
> 
> Here, we are not displaying Buffers related data, so why do we think
> it is important to display WAL data?  I see some point in displaying
> Buffers and WAL data in a vacuum (verbose), but I feel it is better to
> make a case for both the statistics together rather than just
> displaying one and leaving other.  I think the other change related to
> autovacuum stats seems okay to me.

One thing is that the amount of WAL, and more precisely FPW, is quite
unpredictable wrt. vacuum, and even more so for anti-wraparound vacuum, so
this is IMHO a very useful metric.  That being said, I totally agree with
you that both should be displayed.  Should I send a patch to also expose it?



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Mon, Apr 6, 2020 at 1:53 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Mon, Apr 06, 2020 at 08:55:01AM +0530, Amit Kapila wrote:
> >
> > Here, we are not displaying Buffers related data, so why do we think
> > it is important to display WAL data?  I see some point in displaying
> > Buffers and WAL data in a vacuum (verbose), but I feel it is better to
> > make a case for both the statistics together rather than just
> > displaying one and leaving other.  I think the other change related to
> > autovacuum stats seems okay to me.
>
> One thing is that the amount of WAL, and more precisely FPW, is quite
> unpredictable wrt. vacuum and even more anti-wraparound vacuum, so this is IMHO
> a very useful metric.
>

I agree but we already have a way via pg_stat_statements to find it if
the metric is so useful.
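
For reference, a minimal sketch of that route (assuming
pg_stat_statements.track_utility = on):

select pg_stat_statements_reset();
vacuum t1;
select query, wal_records, wal_num_fpw, wal_bytes
from pg_stat_statements where query ilike 'vacuum%';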

>  That being said I totally agree with you that both
> should be displayed.  Should I send a patch to also expose it?
>

I think this should be a separate proposal.  Let's not add things
unless they are really essential.  We can separately discuss enhancing
VACUUM VERBOSE for buffer and WAL usage stats and see if
others also find that information useful.  I think you can send a
patch removing the code I mentioned above, if you agree.  Thanks for
working on this.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



On Mon, Apr 6, 2020 at 12:55 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Mon, 6 Apr 2020 at 16:16, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Apr 6, 2020 at 11:19 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > The attached patch changes to the above comment and removed the code
> > > that is used to un-support only buffer usage accumulation.
> > >
> >
> > So, IIUC, the purpose of this patch will be to count the buffer usage
> > due to the heap scan (in heapam_index_build_range_scan) we perform
> > while parallel create index? Because the index creation itself won't
> > use buffer manager.
>
> Oops, I'd missed Peter's comment. Btree index doesn't use
> heapam_index_build_range_scan so it's not necessary.
>

AFAIU, it uses heapam_index_build_range_scan, but for writing to the
index it doesn't use the buffer manager.  So, I guess we can probably
accumulate BufferUsage stats for parallel create index.  What I wanted
to know is whether the extra lookup for pg_amproc, or any other catalog
access via parallel workers, is fine, or whether we somehow want to
eliminate that.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Mon, Apr 06, 2020 at 02:34:36PM +0530, Amit Kapila wrote:
> On Mon, Apr 6, 2020 at 1:53 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Mon, Apr 06, 2020 at 08:55:01AM +0530, Amit Kapila wrote:
> > >
> > > Here, we are not displaying Buffers related data, so why do we think
> > > it is important to display WAL data?  I see some point in displaying
> > > Buffers and WAL data in a vacuum (verbose), but I feel it is better to
> > > make a case for both the statistics together rather than just
> > > displaying one and leaving other.  I think the other change related to
> > > autovacuum stats seems okay to me.
> >
> > One thing is that the amount of WAL, and more precisely FPW, is quite
> > unpredictable wrt. vacuum and even more anti-wraparound vacuum, so this is IMHO
> > a very useful metric.
> >
> 
> I agree but we already have a way via pg_stat_statements to find it if
> the metric is so useful.
> 

Agreed.

> 
> >  That being said I totally agree with you that both
> > should be displayed.  Should I send a patch to also expose it?
> >
> 
> I think this should be a separate proposal.  Let's not add things
> unless they are really essential.  We can separately discuss of
> enhancing vacuum verbose for Buffer and WAL usage stats and see if
> others also find that information useful.  I think you can send a
> patch by removing the code I mentioned above if you agree.  Thanks for
> working on this.

Thanks!  v15 attached.

Attachment

Re: WAL usage calculation patch

From
Euler Taveira
Date:
On Mon, 6 Apr 2020 at 00:25, Amit Kapila <amit.kapila16@gmail.com> wrote:

I have pushed pg_stat_statements and Explain related patches.  I am
now looking into (auto)vacuum patch and have few comments.

I wasn't paying much attention to this thread. May I suggest changing wal_num_fpw to wal_fpw? wal_records and wal_bytes do not have a 'num' prefix. It seems inconsistent to me.

 
Regards,


--
Euler Taveira                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Mon, Apr 06, 2020 at 10:12:55AM -0300, Euler Taveira wrote:
> On Mon, 6 Apr 2020 at 00:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> >
> > I have pushed pg_stat_statements and Explain related patches.  I am
> > now looking into (auto)vacuum patch and have few comments.
> >
> > I wasn't paying much attention to this thread. May I suggest changing
> wal_num_fpw to wal_fpw? wal_records and wal_bytes does not have a prefix
> 'num'. It seems inconsistent to me.
> 

If we want to be consistent, shouldn't we rename it to wal_fpws?  FTR I don't
much like either version.



Re: WAL usage calculation patch

From
Euler Taveira
Date:
On Mon, 6 Apr 2020 at 10:37, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Mon, Apr 06, 2020 at 10:12:55AM -0300, Euler Taveira wrote:
> On Mon, 6 Apr 2020 at 00:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> >
> > I have pushed pg_stat_statements and Explain related patches.  I am
> > now looking into (auto)vacuum patch and have few comments.
> >
> > I wasn't paying much attention to this thread. May I suggest changing
> wal_num_fpw to wal_fpw? wal_records and wal_bytes does not have a prefix
> 'num'. It seems inconsistent to me.
>

If we want to be consistent, shouldn't we rename it to wal_fpws?  FTR I don't
much like either version.

Since FPW is an acronym, the plural form reads better when using uppercase (such as FPWs or FPW's); thus, I prefer the singular form because parameter names are lowercase. The function description will clarify that this is the "number of WAL full page writes".


Regards,


--
Euler Taveira                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: WAL usage calculation patch

From
Peter Eisentraut
Date:
I noticed in some of the screenshots that were tweeted that for example in

     WAL:  records=1  bytes=56

there are two spaces between pieces of data.  This doesn't match the 
rest of the EXPLAIN output.  Can that be adjusted?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WAL usage calculation patch

From
Justin Pryzby
Date:
On Mon, Apr 06, 2020 at 05:01:30PM +0200, Peter Eisentraut wrote:
> I noticed in some of the screenshots that were tweeted that for example in
> 
>     WAL:  records=1  bytes=56
> 
> there are two spaces between pieces of data.  This doesn't match the rest of
> the EXPLAIN output.  Can that be adjusted?

We talked about that here:
https://www.postgresql.org/message-id/20200402054120.GC14618%40telsasoft.com

-- 
Justin



On Mon, Apr 6, 2020 at 2:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> AFAIU, it uses heapam_index_build_range_scan but for writing to index,
> it doesn't use buffer manager.

Right. It doesn't need to use the buffer manager to write to the
index, unlike (say) GIN's CREATE INDEX.

-- 
Peter Geoghegan



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Mon, Apr 6, 2020 at 10:01 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Mon, Apr 06, 2020 at 05:01:30PM +0200, Peter Eisentraut wrote:
> > I noticed in some of the screenshots that were tweeted that for example in
> >
> >     WAL:  records=1  bytes=56
> >
> > there are two spaces between pieces of data.  This doesn't match the rest of
> > the EXPLAIN output.  Can that be adjusted?
>
> We talked about that here:
> https://www.postgresql.org/message-id/20200402054120.GC14618%40telsasoft.com
>

Yeah.  Just to recap here: the main reason was that one of the fields
(full page writes) already contained a single space, and we had prior
cases, as mentioned in Justin's email [1], where we use two spaces, which
led us to decide on two spaces in this case.

Now, we can change back to one space as you suggest, but I am not sure
that is an improvement over what we have done.  Let me know if you think
otherwise.


[1] - https://www.postgresql.org/message-id/20200402054120.GC14618%40telsasoft.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Mon, Apr 6, 2020 at 7:58 PM Euler Taveira
<euler.taveira@2ndquadrant.com> wrote:
>
> On Mon, 6 Apr 2020 at 10:37, Julien Rouhaud <rjuju123@gmail.com> wrote:
>>
>> On Mon, Apr 06, 2020 at 10:12:55AM -0300, Euler Taveira wrote:
>> > On Mon, 6 Apr 2020 at 00:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >
>> > >
>> > > I have pushed pg_stat_statements and Explain related patches.  I am
>> > > now looking into (auto)vacuum patch and have few comments.
>> > >
>> > > I wasn't paying much attention to this thread. May I suggest changing
>> > wal_num_fpw to wal_fpw? wal_records and wal_bytes does not have a prefix
>> > 'num'. It seems inconsistent to me.
>> >
>>
>> If we want to be consistent shouldn't we rename it to wal_fpws?  FTR I don't
>> like much either version.
>
>
> > Since FPW is an acronym, the plural form reads better when using
> > uppercase (such as FPWs or FPW's); thus, I prefer the singular form
> > because parameter names are lowercase. The function description will
> > clarify that this is the "number of WAL full page writes".
>

I like Euler's suggestion to change wal_num_fpw to wal_fpw.  It would be
good if others who didn't like this name could also share their opinion
now, because changing the same thing multiple times is not a good idea.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



On Tue, 7 Apr 2020 at 02:40, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Mon, Apr 6, 2020 at 2:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > AFAIU, it uses heapam_index_build_range_scan but for writing to index,
> > it doesn't use buffer manager.
>
> Right. It doesn't need to use the buffer manager to write to the
> index, unlike (say) GIN's CREATE INDEX.

Hmm, after more thought and testing, it seems to me that parallel
btree index creation uses the buffer manager while scanning the table in
parallel, i.e. in heapam_index_build_range_scan, which affects
shared_blks_xxx in pg_stat_statements. I've run some parallel create index
tests with the current HEAD and with the attached patch. The table has
44248 blocks.
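
A sketch of the setup assumed for the numbers below (10M single-integer
rows come to 44248 heap blocks with the default 8kB block size):

create table t1 (id integer);
insert into t1 select generate_series(1, 10000000);
alter table t1 set (parallel_workers = 4);  -- 0 for the "no workers" run
select pg_stat_statements_reset();
create index t1_idx on t1 (id);
select shared_blks_hit, shared_blks_read, shared_blks_dirtied,
       shared_blks_written, wal_records, wal_num_fpw, wal_bytes
from pg_stat_statements where query like 'create index%';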

HEAD, no workers:

-[ RECORD 1 ]-------+----------
total_plan_time     | 0
total_plan_time     | 0
shared_blks_hit     | 148
shared_blks_read    | 44281
total_read_blks     | 44429
shared_blks_dirtied | 44261
shared_blks_written | 24644
wal_records         | 71693
wal_num_fpw         | 71682
wal_bytes           | 566815038

HEAD, 4 workers:

-[ RECORD 1 ]-------+----------
total_plan_time     | 0
total_plan_time     | 0
shared_blks_hit     | 160
shared_blks_read    | 8892
total_read_blks     | 9052
shared_blks_dirtied | 8871
shared_blks_written | 5342
wal_records         | 71693
wal_num_fpw         | 71682
wal_bytes           | 566815038

The WAL usage statistics are good, but the buffer usage statistics
don't seem correct.

Patched, no workers:

-[ RECORD 1 ]-------+----------
total_plan_time     | 0
total_plan_time     | 0
shared_blks_hit     | 148
shared_blks_read    | 44281
total_read_blks     | 44429
shared_blks_dirtied | 44261
shared_blks_written | 24843
wal_records         | 71693
wal_num_fpw         | 71682
wal_bytes           | 566815038

Patched, 4 workers:

-[ RECORD 1 ]-------+----------
total_plan_time     | 0
total_plan_time     | 0
shared_blks_hit     | 172
shared_blks_read    | 44282
total_read_blks     | 44454
shared_blks_dirtied | 44261
shared_blks_written | 26968
wal_records         | 71693
wal_num_fpw         | 71682
wal_bytes           | 566815038

The buffer usage statistics seem correct. The small differences would be
from the catalog lookups Peter mentioned.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment
On Tue, Apr 7, 2020 at 1:30 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> Buffer usage statistics seem correct. The small differences would be
> catalog lookups Peter mentioned.
>

Agreed, but can you check which part of the code does that lookup?  I want
to see if we can exclude it from the buffer usage stats, or at least write
a comment about it; otherwise, we might have to face this question
again and again.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Tue, Apr 7, 2020 at 4:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Apr 6, 2020 at 7:58 PM Euler Taveira
> <euler.taveira@2ndquadrant.com> wrote:
> >
> > On Mon, 6 Apr 2020 at 10:37, Julien Rouhaud <rjuju123@gmail.com> wrote:
> >>
> >> On Mon, Apr 06, 2020 at 10:12:55AM -0300, Euler Taveira wrote:
> >> > On Mon, 6 Apr 2020 at 00:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> >
> >> > >
> >> > > I have pushed pg_stat_statements and Explain related patches.  I am
> >> > > now looking into (auto)vacuum patch and have few comments.
> >> > >
> >> > > I wasn't paying much attention to this thread. May I suggest changing
> >> > wal_num_fpw to wal_fpw? wal_records and wal_bytes does not have a prefix
> >> > 'num'. It seems inconsistent to me.
> >> >
> >>
> >> If we want to be consistent shouldn't we rename it to wal_fpws?  FTR I don't
> >> like much either version.
> >
> >
> > Since FPW is an acronym, the plural form reads better when using
> > uppercase (such as FPWs or FPW's); thus, I prefer the singular form
> > because parameter names are lowercase. The function description will
> > clarify that this is the "number of WAL full page writes".
> >
>
> I like Euler's suggestion to change wal_num_fpw to wal_fpw.  It is
> better if others who didn't like this name can also share their
> opinion now because changing multiple times the same thing is not a
> good idea.

+1

About Justin and your comments on the other thread:

On Tue, Apr 7, 2020 at 4:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Apr 6, 2020 at 10:04 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > On Thu, Apr 02, 2020 at 08:29:31AM +0200, Julien Rouhaud wrote:
> > > > > "full page records" seems to be showing the number of full page
> > > > > images, not the record having full page images.
> > > >
> > > > I am not sure what exactly is a difference but it is the records
> > > > having full page images.  Julien correct me if I am wrong.
> >
> > > Obviously previous complaints about the meaning and parsability of
> > > "full page writes" should be addressed here for consistency.
> >
> > There's a couple places that say "full page image records" which I think is
> > language you were trying to avoid.  It's the number of pages, not the number of
> > records, no ?  I see explain and autovacuum say what I think is wanted, but
> > these say the wrong thing?  Find attached slightly larger patch.
> >
> > $ git grep 'image record'
> > contrib/pg_stat_statements/pg_stat_statements.c:        int64           wal_num_fpw;    /* # of WAL full page image records generated */
> > doc/src/sgml/ref/explain.sgml:      number of records, number of full page image records and amount of WAL
> >
>
> Few comments:
> 1.
> - int64 wal_num_fpw; /* # of WAL full page image records generated */
> + int64 wal_num_fpw; /* # of WAL full page images generated */
>
> Let's change comment as " /* # of WAL full page writes generated */"
> to be consistent with other places like instrument.h.  Also, make a
> similar change at other places if required.

Agreed.  That's pg_stat_statements.c and instrument.h.  I'll send a
patch once we reach consensus with the rest of the comments.

> 2.
>        <entry>
> -        Total amount of WAL bytes generated by the statement
> +        Total number of WAL bytes generated by the statement
>        </entry>
>
> I feel the previous text was better as this field can give us the size
> of WAL with which we can answer "how much WAL data is generated by a
> particular statement?".  Julien, do you have any thoughts on this?

I also prefer "amount" as it feels more natural.  I'm not a native
English speaker though, so maybe I'm just biased.



On Tue, 7 Apr 2020 at 17:42, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Apr 7, 2020 at 1:30 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > Buffer usage statistics seem correct. The small differences would be
> > catalog lookups Peter mentioned.
> >
>
> Agreed, but can you check which part of code does that lookup?  I want
> to see if we can avoid that from buffer usage stats or at least write
> a comment about it, otherwise, we might have to face this question
> again and again.

Okay, I'll check it.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WAL usage calculation patch

From
Peter Eisentraut
Date:
On 2020-04-07 04:12, Amit Kapila wrote:
> On Mon, Apr 6, 2020 at 10:01 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>>
>> On Mon, Apr 06, 2020 at 05:01:30PM +0200, Peter Eisentraut wrote:
>>> I noticed in some of the screenshots that were tweeted that for example in
>>>
>>>      WAL:  records=1  bytes=56
>>>
>>> there are two spaces between pieces of data.  This doesn't match the rest of
>>> the EXPLAIN output.  Can that be adjusted?
>>
>> We talked about that here:
>> https://www.postgresql.org/message-id/20200402054120.GC14618%40telsasoft.com
>>
> 
> Yeah.  Just to brief here, the main reason was that one of the fields
> (full page writes) already had a single space and then we had prior
> cases as mentioned in Justin's email [1] where we use two spaces which
> lead us to decide using two spaces in this case.

We also have existing cases for the other way:

     actual time=0.050..0.052
     Buffers: shared hit=3 dirtied=1

The cases mentioned by Justin are not formatted in a key=value format, 
so it's not quite the same, but it also raises the question why they are 
not.

Let's figure out a way to consolidate this without making up a third format.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Tue, 7 Apr 2020 at 18:29, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 7 Apr 2020 at 17:42, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Apr 7, 2020 at 1:30 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > Buffer usage statistics seem correct. The small differences would be
> > > catalog lookups Peter mentioned.
> > >
> >
> > Agreed, but can you check which part of code does that lookup?  I want
> > to see if we can avoid that from buffer usage stats or at least write
> > a comment about it, otherwise, we might have to face this question
> > again and again.
>
> Okay, I'll check it.
>

I've checked the buffer usage differences during parallel btree index creation.

TL;DR:

During tuple sorting individual parallel workers read blocks of
pg_amproc and pg_amproc_fam_proc_index to get the sort support
function. The call flow is like:

ParallelWorkerMain()
  _bt_parallel_scan_and_sort()
    tuplesort_begin_index_btree()
      PrepareSortSupportFromIndexRel()
        FinishSortSupportFunction()
          get_opfamily_proc()

The details are as follows.

I populated the test table with the following script:

create table test (c int) with (autovacuum_enabled = off, parallel_workers = 8);
insert into test select generate_series(1,10000000);

and the create index DDL is:

create index test_idx on test (c);

Before executing the test script, I put code at the following 4
places that checks the buffer usage at that point, and calculated the
differences between points (a), (b) and (c). For example, (b) shows
the number of blocks read or hit while scanning the heap and building
the index. A rough sketch of that checking code follows the list.

1. Before executing CREATE INDEX command (at pgss_ProcessUtility())
(a)
2. Before parallel create index (at _bt_begin_parallel())
(b)
3. After parallel create index, after accumulating workers' stats (at
_bt_end_parallel())
(c)
4. After executing CREATE INDEX command (at pgss_ProcessUtility())
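
The checking code was essentially of this shape (a minimal sketch
rather than the actual throwaway code I used; it assumes the
BufferUsageAccumDiff() helper exported from instrument.c on master,
and report_buffer_usage() is a made-up name):

#include "postgres.h"
#include "executor/instrument.h"

static BufferUsage last_usage;

/* Log the buffer usage accumulated since the previous call. */
static void
report_buffer_usage(const char *point)
{
    BufferUsage diff;

    memset(&diff, 0, sizeof(diff));
    /* diff = pgBufferUsage - last_usage */
    BufferUsageAccumDiff(&diff, &pgBufferUsage, &last_usage);
    elog(LOG, "%s: hit: %ld, read: %ld",
         point, diff.shared_blks_hit, diff.shared_blks_read);
    last_usage = pgBufferUsage;
}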

And here are the results:

2 workers:
(a) hit: 107, read: 26
(b) hit: 12(=6+3+3), read: 44248(=15538+14453+14527)
(c) hit: 13, read: 2
total hit: 132, read:44276

4 workers:
(a) hit: 107, read: 26
(b) hit: 18(=6+3+3+3+3), read: 44248(=9368+8582+8544+9250+8504)
(c) hit: 13, read: 2
total hit: 138, read:44276

The table 'test' has 44276 blocks.

From the above results, the total number of blocks read (44248
blocks) during parallel index creation is stable and equals the
number of blocks of the test table. And we can see that three extra
blocks are read per worker. These three blocks are two for
pg_amproc_fam_proc_index and one for pg_amproc. That is, individual
parallel workers access these relations to get the sort support
function. The full backtrace is:

* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff779c561a libsystem_kernel.dylib`__select + 10
    frame #1: 0x000000010cc9f90d postgres`pg_usleep(microsec=20000000)
at pgsleep.c:56:10
    frame #2: 0x000000010ca5a668
postgres`ReadBuffer_common(smgr=0x00007fe872848f70,
relpersistence='p', forkNum=MAIN_FORKNUM, blockNum=3, mode=RBM_NORMAL,
strategy=0x0000000000000000, hit=0x00007ffee363071b) at bufmgr.c:685:3
    frame #3: 0x000000010ca5a4b6
postgres`ReadBufferExtended(reln=0x000000010d58f790,
forkNum=MAIN_FORKNUM, blockNum=3, mode=RBM_NORMAL,
strategy=0x0000000000000000) at bufmgr.c:628:8
    frame #4: 0x000000010ca5a397
postgres`ReadBuffer(reln=0x000000010d58f790, blockNum=3) at
bufmgr.c:560:9
    frame #5: 0x000000010c67187e
postgres`_bt_getbuf(rel=0x000000010d58f790, blkno=3, access=1) at
nbtpage.c:792:9
    frame #6: 0x000000010c670507
postgres`_bt_getroot(rel=0x000000010d58f790, access=1) at
nbtpage.c:294:13
    frame #7: 0x000000010c679393
postgres`_bt_search(rel=0x000000010d58f790, key=0x00007ffee36312d0,
bufP=0x00007ffee3631bec, access=1, snapshot=0x00007fe8728388e0) at
nbtsearch.c:107:10
    frame #8: 0x000000010c67b489
postgres`_bt_first(scan=0x00007fe86f814998, dir=ForwardScanDirection)
at nbtsearch.c:1355:10
    frame #9: 0x000000010c676869
postgres`btgettuple(scan=0x00007fe86f814998, dir=ForwardScanDirection)
at nbtree.c:253:10
    frame #10: 0x000000010c6656ad
postgres`index_getnext_tid(scan=0x00007fe86f814998,
direction=ForwardScanDirection) at indexam.c:530:10
    frame #11: 0x000000010c66585b
postgres`index_getnext_slot(scan=0x00007fe86f814998,
direction=ForwardScanDirection, slot=0x00007fe86f814880) at
indexam.c:622:10
    frame #12: 0x000000010c663eac
postgres`systable_getnext(sysscan=0x00007fe86f814828) at genam.c:454:7
    frame #13: 0x000000010cc0be41
postgres`SearchCatCacheMiss(cache=0x00007fe872818e80, nkeys=4,
hashValue=3052139574, hashIndex=6, v1=1976, v2=23, v3=23, v4=2) at
catcache.c:1368:9
    frame #14: 0x000000010cc0bced
postgres`SearchCatCacheInternal(cache=0x00007fe872818e80, nkeys=4,
v1=1976, v2=23, v3=23, v4=2) at catcache.c:1299:9
    frame #15: 0x000000010cc0baa8
postgres`SearchCatCache4(cache=0x00007fe872818e80, v1=1976, v2=23,
v3=23, v4=2) at catcache.c:1191:9
    frame #16: 0x000000010cc27c82 postgres`SearchSysCache4(cacheId=5,
key1=1976, key2=23, key3=23, key4=2) at syscache.c:1156:9
    frame #17: 0x000000010cc105dd
postgres`get_opfamily_proc(opfamily=1976, lefttype=23, righttype=23,
procnum=2) at lsyscache.c:751:7
    frame #18: 0x000000010cc72e1d
postgres`FinishSortSupportFunction(opfamily=1976, opcintype=23,
ssup=0x00007fe86f8147d0) at sortsupport.c:99:24
    frame #19: 0x000000010cc73100
postgres`PrepareSortSupportFromIndexRel(indexRel=0x000000010d5ced48,
strategy=1, ssup=0x00007fe86f8147d0) at sortsupport.c:176:2
    frame #20: 0x000000010cc75463
postgres`tuplesort_begin_index_btree(heapRel=0x000000010d5cf808,
indexRel=0x000000010d5ced48, enforceUnique=false, workMem=21845,
coordinate=0x00007fe872839248, randomAccess=false) at
tuplesort.c:1114:3
    frame #21: 0x000000010c681ffc
postgres`_bt_parallel_scan_and_sort(btspool=0x00007fe872839738,
btspool2=0x0000000000000000, btshared=0x000000010d56c4c0,
sharedsort=0x000000010d56c460, sharedsort2=0x0000000000000000,
sortmem=21845, progress=false) at nbtsort.c:1941:23
    frame #22: 0x000000010c681eb2
postgres`_bt_parallel_build_main(seg=0x00007fe87280a058,
toc=0x000000010d56c000) at nbtsort.c:1889:2
    frame #23: 0x000000010c6b7358
postgres`ParallelWorkerMain(main_arg=1169089032) at parallel.c:1471:2
    frame #24: 0x000000010c9da86f postgres`StartBackgroundWorker at
bgworker.c:813:2
    frame #25: 0x000000010c9efbc0
postgres`do_start_bgworker(rw=0x00007fe86f419290) at
postmaster.c:5852:4
    frame #26: 0x000000010c9eff9f postgres`maybe_start_bgworkers at
postmaster.c:6078:9
    frame #27: 0x000000010c9eee99
postgres`sigusr1_handler(postgres_signal_arg=30) at
postmaster.c:5247:3
    frame #28: 0x00007fff77a74b5d libsystem_platform.dylib`_sigtramp + 29
    frame #29: 0x00007fff779c561b libsystem_kernel.dylib`__select + 11
    frame #30: 0x000000010c9ea48c postgres`ServerLoop at postmaster.c:1691:13
    frame #31: 0x000000010c9e9e06 postgres`PostmasterMain(argc=5,
argv=0x00007fe86f4036f0) at postmaster.c:1400:11
    frame #32: 0x000000010c8ee399 postgres`main(argc=<unavailable>,
argv=<unavailable>) at main.c:210:3
    frame #33: 0x00007fff778893d5 libdyld.dylib`start + 1

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Tue, Apr 7, 2020 at 12:00 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> On 2020-04-07 04:12, Amit Kapila wrote:
> > On Mon, Apr 6, 2020 at 10:01 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >>
> >> On Mon, Apr 06, 2020 at 05:01:30PM +0200, Peter Eisentraut wrote:
> >>> I noticed in some of the screenshots that were tweeted that for example in
> >>>
> >>>      WAL:  records=1  bytes=56
> >>>
> >>> there are two spaces between pieces of data.  This doesn't match the rest of
> >>> the EXPLAIN output.  Can that be adjusted?
> >>
> >> We talked about that here:
> >> https://www.postgresql.org/message-id/20200402054120.GC14618%40telsasoft.com
> >>
> >
> > Yeah.  Just to brief here, the main reason was that one of the fields
> > (full page writes) already had a single space and then we had prior
> > cases as mentioned in Justin's email [1] where we use two spaces which
> > lead us to decide using two spaces in this case.
>
> We also have existing cases for the other way:
>
>      actual time=0.050..0.052
>      Buffers: shared hit=3 dirtied=1
>
> The cases mentioned by Justin are not formatted in a key=value format,
> so it's not quite the same, but it also raises the question why they are
> not.
>
> Let's figure out a way to consolidate this without making up a third format.

The parsability problem Justin was mentioning is only due to "full
page writes", so we could use "full_page_writes" or "fpw" instead and
remove the extra spaces.  There would be a small discrepancy with the
verbose autovacuum log, but there are other differences already.

I'd be slightly in favor of "fpw" to be more concise. Would that be ok?
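
To illustrate, the explain text output would then look something like
this (values made up):

    WAL: records=1 fpw=0 bytes=56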



Re: WAL usage calculation patch

From
Justin Pryzby
Date:
On Tue, Apr 07, 2020 at 12:00:29PM +0200, Peter Eisentraut wrote:
> We also have existing cases for the other way:
> 
>     actual time=0.050..0.052
>     Buffers: shared hit=3 dirtied=1
> 
> The cases mentioned by Justin are not formatted in a key=value format, so
> it's not quite the same, but it also raises the question why they are not.
> 
> Let's figure out a way to consolidate this without making up a third format.

So this re-raises my suggestion here to use colons, Title Case Field Names, and
"Size: ..kB" rather than "bytes=":
|https://www.postgresql.org/message-id/20200403054451.GN14618%40telsasoft.com

As I see it, the sort/hashjoin style is being used for cases with fields with
different units:

   Sort Method: quicksort  Memory: 931kB
   Buckets: 1024  Batches: 1  Memory Usage: 16kB

..which is distinguished from the case where the units are the same, like
buffers (hit=Npages read=Npages dirtied=Npages written=Npages).

Note, as of 1f39bce021, we have hashagg_disk, which looks like this:

template1=# explain analyze SELECT a, COUNT(1) FROM generate_series(1,99999) a GROUP BY 1 ORDER BY 1;
...
   ->  HashAggregate  (cost=1499.99..1501.99 rows=200 width=12) (actual time=166.883..280.943 rows=99999 loops=1)
         Group Key: a
         Peak Memory Usage: 4913 kB
         Disk Usage: 1848 kB
         HashAgg Batches: 8

Incremental sort adds yet another variation, which I've mentioned that thread.
I'm hoping to come to some resolution here, first.
https://www.postgresql.org/message-id/20200407042521.GH2228%40telsasoft.com

-- 
Justin



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Tue, Apr 7, 2020 at 3:30 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> On 2020-04-07 04:12, Amit Kapila wrote:
> > On Mon, Apr 6, 2020 at 10:01 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >>
> >> On Mon, Apr 06, 2020 at 05:01:30PM +0200, Peter Eisentraut wrote:
> >>> I noticed in some of the screenshots that were tweeted that for example in
> >>>
> >>>      WAL:  records=1  bytes=56
> >>>
> >>> there are two spaces between pieces of data.  This doesn't match the rest of
> >>> the EXPLAIN output.  Can that be adjusted?
> >>
> >> We talked about that here:
> >> https://www.postgresql.org/message-id/20200402054120.GC14618%40telsasoft.com
> >>
> >
> > Yeah.  Just to brief here, the main reason was that one of the fields
> > (full page writes) already had a single space and then we had prior
> > cases as mentioned in Justin's email [1] where we use two spaces which
> > lead us to decide using two spaces in this case.
>
> We also have existing cases for the other way:
>
>      actual time=0.050..0.052
>      Buffers: shared hit=3 dirtied=1
>

Buffers case is not the same because 'shared' is used for 'hit',
'read', 'dirtied', etc.  However, I think it is arguable.

> The cases mentioned by Justin are not formatted in a key=value format,
> so it's not quite the same, but it also raises the question why they are
> not.
>
> Let's figure out a way to consolidate this without making up a third format.
>

Sure, I think my intention is to keep the format of WAL stats as close
to Buffers stats as possible because both depict I/O and users would
probably be interested to check/read both together.  There is a point
to keep things in a format so that it is easier for someone to parse
but I guess as these are fixed 'words', it shouldn't be difficult
either way and we should give more weightage to consistency.  Any
suggestions?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



On Tue, Apr 7, 2020 at 5:17 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 7 Apr 2020 at 18:29, Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 7 Apr 2020 at 17:42, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Tue, Apr 7, 2020 at 1:30 PM Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > Buffer usage statistics seem correct. The small differences would be
> > > > catalog lookups Peter mentioned.
> > > >
> > >
> > > Agreed, but can you check which part of code does that lookup?  I want
> > > to see if we can avoid that from buffer usage stats or at least write
> > > a comment about it, otherwise, we might have to face this question
> > > again and again.
> >
> > Okay, I'll check it.
> >
>
> I've checked the buffer usage differences during parallel btree index creation.
>
> TL;DR:
>
> During tuple sorting individual parallel workers read blocks of
> pg_amproc and pg_amproc_fam_proc_index to get the sort support
> function. The call flow is like:
>
> ParallelWorkerMain()
>   _bt_parallel_scan_and_sort()
>     tuplesort_begin_index_btree()
>       PrepareSortSupportFromIndexRel()
>         FinishSortSupportFunction()
>           get_opfamily_proc()
>

Thanks for the investigation.  I don't see we can do anything special
about this.  In an ideal world, this should be done once and not for
each worker but I guess it doesn't matter too much.  I am not sure if
it is worth adding a comment for this, what do you think?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



On Wed, 8 Apr 2020 at 14:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Apr 7, 2020 at 5:17 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 7 Apr 2020 at 18:29, Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Tue, 7 Apr 2020 at 17:42, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Tue, Apr 7, 2020 at 1:30 PM Masahiko Sawada
> > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > >
> > > > > Buffer usage statistics seem correct. The small differences would be
> > > > > catalog lookups Peter mentioned.
> > > > >
> > > >
> > > > Agreed, but can you check which part of code does that lookup?  I want
> > > > to see if we can avoid that from buffer usage stats or at least write
> > > > a comment about it, otherwise, we might have to face this question
> > > > again and again.
> > >
> > > Okay, I'll check it.
> > >
> >
> > I've checked the buffer usage differences during parallel btree index creation.
> >
> > TL;DR:
> >
> > During tuple sorting individual parallel workers read blocks of
> > pg_amproc and pg_amproc_fam_proc_index to get the sort support
> > function. The call flow is like:
> >
> > ParallelWorkerMain()
> >   _bt_parallel_scan_and_sort()
> >     tuplesort_begin_index_btree()
> >       PrepareSortSupportFromIndexRel()
> >         FinishSortSupportFunction()
> >           get_opfamily_proc()
> >
>
> Thanks for the investigation.  I don't see we can do anything special
> about this.  In an ideal world, this should be done once and not for
> each worker but I guess it doesn't matter too much.  I am not sure if
> it is worth adding a comment for this, what do you think?
>

I agree with you. If the differences were considerably large probably
we would do something but I think we don't need to do anything at this
time.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Wed, Apr 8, 2020 at 11:53 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 8 Apr 2020 at 14:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > Thanks for the investigation.  I don't see we can do anything special
> > about this.  In an ideal world, this should be done once and not for
> > each worker but I guess it doesn't matter too much.  I am not sure if
> > it is worth adding a comment for this, what do you think?
> >
>
> I agree with you. If the differences were considerably large probably
> we would do something but I think we don't need to do anything at this
> time.
>

Fair enough, can you once check this in back-branches as this needs to
be backpatched?  I will do that once by myself as well.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



On Wed, Apr 8, 2020 at 8:23 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 8 Apr 2020 at 14:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Apr 7, 2020 at 5:17 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Tue, 7 Apr 2020 at 18:29, Masahiko Sawada
> > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > >
> > > > On Tue, 7 Apr 2020 at 17:42, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > >
> > > > > On Tue, Apr 7, 2020 at 1:30 PM Masahiko Sawada
> > > > > <masahiko.sawada@2ndquadrant.com> wrote:
> > > > > >
> > > > > > Buffer usage statistics seem correct. The small differences would be
> > > > > > catalog lookups Peter mentioned.
> > > > > >
> > > > >
> > > > > Agreed, but can you check which part of code does that lookup?  I want
> > > > > to see if we can avoid that from buffer usage stats or at least write
> > > > > a comment about it, otherwise, we might have to face this question
> > > > > again and again.
> > > >
> > > > Okay, I'll check it.
> > > >
> > >
> > > I've checked the buffer usage differences during parallel btree index creation.
> > >
> > > TL;DR:
> > >
> > > During tuple sorting individual parallel workers read blocks of
> > > pg_amproc and pg_amproc_fam_proc_index to get the sort support
> > > function. The call flow is like:
> > >
> > > ParallelWorkerMain()
> > >   _bt_parallel_scan_and_sort()
> > >     tuplesort_begin_index_btree()
> > >       PrepareSortSupportFromIndexRel()
> > >         FinishSortSupportFunction()
> > >           get_opfamily_proc()
> > >
> >
> > Thanks for the investigation.  I don't see we can do anything special
> > about this.  In an ideal world, this should be done once and not for
> > each worker but I guess it doesn't matter too much.  I am not sure if
> > it is worth adding a comment for this, what do you think?
> >
>
> I agree with you. If the differences were considerably large probably
> we would do something but I think we don't need to do anything at this
> time.

+1



On Wed, 8 Apr 2020 at 16:04, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Apr 8, 2020 at 11:53 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 8 Apr 2020 at 14:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > Thanks for the investigation.  I don't see we can do anything special
> > > about this.  In an ideal world, this should be done once and not for
> > > each worker but I guess it doesn't matter too much.  I am not sure if
> > > it is worth adding a comment for this, what do you think?
> > >
> >
> > I agree with you. If the differences were considerably large probably
> > we would do something but I think we don't need to do anything at this
> > time.
> >
>
> Fair enough, can you once check this in back-branches as this needs to
> be backpatched?  I will do that once by myself as well.

I've done the same test with HEAD of both REL_12_STABLE and
REL_11_STABLE. I think the patch needs to be backpatched to PG11 where
parallel index creation was introduced. I've attached the patches
for PG12 and PG11 I used for this test for reference.

Here are the results:

* PG12

With no worker:
-[ RECORD 1 ]-------+-------------
shared_blks_hit     | 119
shared_blks_read    | 44283
total_read_blks     | 44402
shared_blks_dirtied | 44262
shared_blks_written | 24925

With 4 workers:
-[ RECORD 1 ]-------+------------
shared_blks_hit     | 128
shared_blks_read    | 8844
total_read_blks     | 8972
shared_blks_dirtied | 8822
shared_blks_written | 5393

With 4 workers after patching:
-[ RECORD 1 ]-------+------------
shared_blks_hit     | 140
shared_blks_read    | 44284
total_read_blks     | 44424
shared_blks_dirtied | 44262
shared_blks_written | 26574

* PG11

With no worker:
-[ RECORD 1 ]-------+------------
shared_blks_hit     | 124
shared_blks_read    | 44284
total_read_blks     | 44408
shared_blks_dirtied | 44263
shared_blks_written | 24908

With 4 workers:
-[ RECORD 1 ]-------+-------------
shared_blks_hit     | 132
shared_blks_read    | 8910
total_read_blks     | 9042
shared_blks_dirtied | 8888
shared_blks_written | 5370

With 4 workers after patching:
-[ RECORD 1 ]-------+-------------
shared_blks_hit     | 144
shared_blks_read    | 44285
total_read_blks     | 44429
shared_blks_dirtied | 44263
shared_blks_written | 26861


Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment
On Wed, Apr 8, 2020 at 1:49 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Wed, 8 Apr 2020 at 16:04, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Apr 8, 2020 at 11:53 AM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Wed, 8 Apr 2020 at 14:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > >
> > > > Thanks for the investigation.  I don't see we can do anything special
> > > > about this.  In an ideal world, this should be done once and not for
> > > > each worker but I guess it doesn't matter too much.  I am not sure if
> > > > it is worth adding a comment for this, what do you think?
> > > >
> > >
> > > I agree with you. If the differences were considerably large probably
> > > we would do something but I think we don't need to do anything at this
> > > time.
> > >
> >
> > Fair enough, can you once check this in back-branches as this needs to
> > be backpatched?  I will do that once by myself as well.
>
> I've done the same test with HEAD of both REL_12_STABLE and
> REL_11_STABLE. I think the patch needs to be backpatched to PG11 where
> parallel index creation was introduced. I've attached the patches
> for PG12 and PG11 I used for this test for reference.
>

Thanks, I will once again verify and push this tomorrow if there are
no other comments.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Tue, Apr 7, 2020 at 2:48 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Tue, Apr 7, 2020 at 4:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Apr 6, 2020 at 7:58 PM Euler Taveira
> > <euler.taveira@2ndquadrant.com> wrote:
> > >
> > > On Mon, 6 Apr 2020 at 10:37, Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >>
> > >> On Mon, Apr 06, 2020 at 10:12:55AM -0300, Euler Taveira wrote:
> > >> > On Mon, 6 Apr 2020 at 00:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >> >
> > >> > >
> > >> > > I have pushed pg_stat_statements and Explain related patches.  I am
> > >> > > now looking into (auto)vacuum patch and have few comments.
> > >> > >
> > >> > > I wasn't paying much attention to this thread. May I suggest changing
> > >> > wal_num_fpw to wal_fpw? wal_records and wal_bytes does not have a prefix
> > >> > 'num'. It seems inconsistent to me.
> > >> >
> > >>
> > >> If we want to be consistent shouldn't we rename it to wal_fpws?  FTR I don't
> > >> like much either version.
> > >
> > >
> > > Since FPW is an acronym, plural form reads better when you are using uppercase (such as FPWs or FPW's); thus, I prefer singular form because parameter names are lowercase. Function description will clarify that this is "number of WAL full page writes".
> > >
> >
> > I like Euler's suggestion to change wal_num_fpw to wal_fpw.  It is
> > better if others who didn't like this name can also share their
> > opinion now because changing multiple times the same thing is not a
> > good idea.
>
> +1
>
> About Justin and your comments on the other thread:
>
> On Tue, Apr 7, 2020 at 4:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Apr 6, 2020 at 10:04 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > >
> > > On Thu, Apr 02, 2020 at 08:29:31AM +0200, Julien Rouhaud wrote:
> > > > > > "full page records" seems to be showing the number of full page
> > > > > > images, not the record having full page images.
> > > > >
> > > > > I am not sure what exactly is a difference but it is the records
> > > > > having full page images.  Julien correct me if I am wrong.
> > >
> > > > Obviously previous complaints about the meaning and parsability of
> > > > "full page writes" should be addressed here for consistency.
> > >
> > > There's a couple places that say "full page image records" which I think is
> > > language you were trying to avoid.  It's the number of pages, not the number of
> > > records, no ?  I see explain and autovacuum say what I think is wanted, but
> > > these say the wrong thing?  Find attached slightly larger patch.
> > >
> > > $ git grep 'image record'
> > > contrib/pg_stat_statements/pg_stat_statements.c:        int64           wal_num_fpw;    /* # of WAL full page image records generated */
> > > doc/src/sgml/ref/explain.sgml:      number of records, number of full page image records and amount of WAL
> > >
> >
> > Few comments:
> > 1.
> > - int64 wal_num_fpw; /* # of WAL full page image records generated */
> > + int64 wal_num_fpw; /* # of WAL full page images generated */
> >
> > Let's change comment as " /* # of WAL full page writes generated */"
> > to be consistent with other places like instrument.h.  Also, make a
> > similar change at other places if required.
>
> Agreed.  That's pg_stat_statements.c and instrument.h.  I'll send a
> patch once we reach consensus with the rest of the comments.
>

Would you like to send a consolidated patch that includes Euler's
suggestion and Justin's patch (by making changes for points we
discussed)?  I think we can keep the point related to the number of
spaces before each field open?

> > 2.
> >        <entry>
> > -        Total amount of WAL bytes generated by the statement
> > +        Total number of WAL bytes generated by the statement
> >        </entry>
> >
> > I feel the previous text was better as this field can give us the size
> > of WAL with which we can answer "how much WAL data is generated by a
> > particular statement?".  Julien, do you have any thoughts on this?
>
> I also prefer "amount" as it feels more natural.
>

As we see no other opinion on this matter, we can use "amount" here.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Fri, Apr 10, 2020 at 8:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Apr 7, 2020 at 2:48 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Tue, Apr 7, 2020 at 4:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Apr 6, 2020 at 7:58 PM Euler Taveira
> > > <euler.taveira@2ndquadrant.com> wrote:
> > > Few comments:
> > > 1.
> > > - int64 wal_num_fpw; /* # of WAL full page image records generated */
> > > + int64 wal_num_fpw; /* # of WAL full page images generated */
> > >
> > > Let's change comment as " /* # of WAL full page writes generated */"
> > > to be consistent with other places like instrument.h.  Also, make a
> > > similar change at other places if required.
> >
> > Agreed.  That's pg_stat_statements.c and instrument.h.  I'll send a
> > patch once we reach consensus with the rest of the comments.
> >
>
> Would you like to send a consolidated patch that includes Euler's
> suggestion and Justin's patch (by making changes for points we
> discussed)?  I think we can keep the point related to the number of
> spaces before each field open?

Sure, I'll take care of that tomorrow!

> > > 2.
> > >        <entry>
> > > -        Total amount of WAL bytes generated by the statement
> > > +        Total number of WAL bytes generated by the statement
> > >        </entry>
> > >
> > > I feel the previous text was better as this field can give us the size
> > > of WAL with which we can answer "how much WAL data is generated by a
> > > particular statement?".  Julien, do you have any thoughts on this?
> >
> > I also prefer "amount" as it feels more natural.
> >
>
> As we see no other opinion on this matter, we can use "amount" here.

Ok.



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Fri, Apr 10, 2020 at 9:37 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Fri, Apr 10, 2020 at 8:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Would you like to send a consolidated patch that includes Euler's
> > suggestion and Justin's patch (by making changes for points we
> > discussed)?  I think we can keep the point related to the number of
> > spaces before each field open?
>
> Sure, I'll take care of that tomorrow!

I tried to take into account all that has been discussed, but I have
to admit that I'm absolutely not sure of what was actually decided
here.  I went with those changes:

- rename wal_num_fpw to wal_fpw for consistency, both in pgss view
field name but also everywhere in the code
- change comments to consistently mention "full page writes generated"
- changed pgss and explain documentation to mention "full page images
generated", from Justin's patch on another thread
- kept "amount" of WAL bytes
- no change to the explain output as I have no idea what is the
consensus (one or two spaces, use semicolon or equal, show unit or
not)
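
For instance, with those changes the counter declaration in
pg_stat_statements.c should now read something like this (sketching
from the list above, not quoting the attached patch):

    int64       wal_fpw;        /* # of WAL full page writes generated */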

Attachment
On Sat, Mar 28, 2020 at 04:17:21PM +0100, Julien Rouhaud wrote:
> On Sat, Mar 28, 2020 at 02:38:27PM +0100, Julien Rouhaud wrote:
> > On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote:
> > > 
> > > I see some basic problems with the patch.  The way it tries to compute
> > > WAL usage for parallel stuff doesn't seem right to me.  Can you share
> > > or point me to any test done where we have computed WAL for parallel
> > > operations like Parallel Vacuum or Parallel Create Index?
> > 
> > Ah, that's indeed a good point and AFAICT WAL records from parallel utility
> > workers won't be accounted for.  That being said, I think that an argument
> > could be made that proper infrastructure should have been added in the original
> > parallel utility patches, as pg_stat_statement is already broken wrt. buffer
> > usage in parallel utility, unless I'm missing something.
> 
> Just to be sure I did a quick test with pg_stat_statements behavior using
> parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer usage
> doesn't reflect parallel workers' activity.
> 
> I added an open for that, and adding Robert in Cc as 9da0cc352 is the first
> commit adding parallel maintenance.

I believe this is resolved for parallel vacuum in master and parallel create
index back to PG11.

I marked this as closed.
https://wiki.postgresql.org/index.php?title=PostgreSQL_13_Open_Items&diff=34802&oldid=34781
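
For reference, the shape of the fix (simplified from the commits as I
read them, not a verbatim quote) is the leader folding each worker's
counters into its own once the parallel operation finishes, e.g. in
_bt_end_parallel():

    /* accumulate the workers' buffer/WAL usage into the leader */
    for (i = 0; i < btleader->pcxt->nworkers_launched; i++)
        InstrAccumParallelQuery(&btleader->bufferusage[i],
                                &btleader->walusage[i]);

(on the back branches the function takes only the buffer usage).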

-- 
Justin



On Sun, 12 Apr 2020 at 00:33, Justin Pryzby <pryzby@telsasoft.com> wrote:
On Sat, Mar 28, 2020 at 04:17:21PM +0100, Julien Rouhaud wrote:
>
> Just to be sure I did a quick test with pg_stat_statements behavior using
> parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer usage
> doesn't reflect parallel workers' activity.
>
> I added an open for that, and adding Robert in Cc as 9da0cc352 is the first
> commit adding parallel maintenance.

I believe this is resolved for parallel vacuum in master and parallel create
index back to PG11.

indeed, I was about to take care of this too


thanks a lot! 
On Sun, Apr 12, 2020 at 4:03 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Sat, Mar 28, 2020 at 04:17:21PM +0100, Julien Rouhaud wrote:
> > On Sat, Mar 28, 2020 at 02:38:27PM +0100, Julien Rouhaud wrote:
> > > On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote:
> > > >
> > > > I see some basic problems with the patch.  The way it tries to compute
> > > > WAL usage for parallel stuff doesn't seem right to me.  Can you share
> > > > or point me to any test done where we have computed WAL for parallel
> > > > operations like Parallel Vacuum or Parallel Create Index?
> > >
> > > Ah, that's indeed a good point and AFAICT WAL records from parallel utility
> > > workers won't be accounted for.  That being said, I think that an argument
> > > could be made that proper infrastructure should have been added in the original
> > > parallel utility patches, as pg_stat_statement is already broken wrt. buffer
> > > usage in parallel utility, unless I'm missing something.
> >
> > Just to be sure I did a quick test with pg_stat_statements behavior using
> > parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer usage
> > doesn't reflect parallel workers' activity.
> >
> > I added an open for that, and adding Robert in Cc as 9da0cc352 is the first
> > commit adding parallel maintenance.
>
> I believe this is resolved for parallel vacuum in master and parallel create
> index back to PG11.
>
> I marked this as closed.
> https://wiki.postgresql.org/index.php?title=PostgreSQL_13_Open_Items&diff=34802&oldid=34781
>

Okay, thanks.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Sat, Apr 11, 2020 at 6:55 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Fri, Apr 10, 2020 at 9:37 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Fri, Apr 10, 2020 at 8:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > Would you like to send a consolidated patch that includes Euler's
> > > suggestion and Justin's patch (by making changes for points we
> > > discussed)?  I think we can keep the point related to the number of
> > > spaces before each field open?
> >
> > Sure, I'll take care of that tomorrow!
>
> I tried to take into account all that has been discussed, but I have
> to admit that I'm absolutely not sure of what was actually decided
> here.  I went with those changes:
>
> - rename wal_num_fpw to wal_fpw for consistency, both in pgss view
> field name but also everywhere in the code
> - change comments to consistently mention "full page writes generated"
> - changed pgss and explain documentation to mention "full page images
> generated", from Justin's patch on another thread
>

I think it is better to use "full page writes" to be consistent with
other places.

> - kept "amount" of WAL bytes
>

Okay, but I would like to make another change suggested by Justin
which is to replace "count" with "number" at a few places.

I have made the above two changes in the attached.  Let me know what
you think about attached?

> - no change to the explain output as I have no idea what is the
> consensus (one or two spaces, use semicolon or equal, show unit or
> not)
>

Yeah, let's do this separately once we have consensus.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Mon, Apr 13, 2020 at 8:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Apr 11, 2020 at 6:55 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Fri, Apr 10, 2020 at 9:37 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > I tried to take into account all that has been discussed, but I have
> > to admit that I'm absolutely not sure of what was actually decided
> > here.  I went with those changes:
> >
> > - rename wal_num_fpw to wal_fpw for consistency, both in pgss view
> > field name but also everywhere in the code
> > - change comments to consistently mention "full page writes generated"
> > - changed pgss and explain documentation to mention "full page images
> > generated", from Justin's patch on another thread
> >
>
> I think it is better to use "full page writes" to be consistent with
> other places.
>
> > - kept "amount" of WAL bytes
> >
>
> Okay, but I would like to make another change suggested by Justin
> which is to replace "count" with "number" at a few places.

Ah sorry I missed this one.  +1 it also sounds better.

> I have made the above two changes in the attached.  Let me know what
> you think about attached?

It all looks good to me!

> > - no change to the explain output as I have no idea what is the
> > consensus (one or two spaces, use semicolon or equal, show unit or
> > not)
> >
>
> Yeah, let's do this separately once we have consensus.

Agreed.



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Mon, Apr 13, 2020 at 1:10 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Mon, Apr 13, 2020 at 8:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sat, Apr 11, 2020 at 6:55 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > > On Fri, Apr 10, 2020 at 9:37 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > > >
> > > I tried to take into account all that has been discussed, but I have
> > > to admit that I'm absolutely not sure of what was actually decided
> > > here.  I went with those changes:
> > >
> > > - rename wal_num_fpw to wal_fpw for consistency, both in pgss view
> > > field name but also everywhere in the code
> > > - change comments to consistently mention "full page writes generated"
> > > - changed pgss and explain documentation to mention "full page images
> > > generated", from Justin's patch on another thread
> > >
> >
> > I think it is better to use "full page writes" to be consistent with
> > other places.
> >
> > > - kept "amount" of WAL bytes
> > >
> >
> > Okay, but I would like to make another change suggested by Justin
> > which is to replace "count" with "number" at a few places.
>
> Ah sorry I missed this one.  +1 it also sounds better.
>
> > I have made the above two changes in the attached.  Let me know what
> > you think about attached?
>
> It all looks good to me!
>

Pushed.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Mon, 13 Apr 2020 at 13:47, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 13, 2020 at 1:10 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Mon, Apr 13, 2020 at 8:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sat, Apr 11, 2020 at 6:55 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > > On Fri, Apr 10, 2020 at 9:37 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > > >
> > > I tried to take into account all that has been discussed, but I have
> > > to admit that I'm absolutely not sure of what was actually decided
> > > here.  I went with those changes:
> > >
> > > - rename wal_num_fpw to wal_fpw for consistency, both in pgss view
> > > field name but also everywhere in the code
> > > - change comments to consistently mention "full page writes generated"
> > > - changed pgss and explain documentation to mention "full page images
> > > generated", from Justin's patch on another thread
> > >
> >
> > I think it is better to use "full page writes" to be consistent with
> > other places.
> >
> > > - kept "amount" of WAL bytes
> > >
> >
> > Okay, but I would like to make another change suggested by Justin
> > which is to replace "count" with "number" at a few places.
>
> Ah sorry I missed this one.  +1 it also sounds better.
>
> > I have made the above two changes in the attached.  Let me know what
> > you think about attached?
>
> It all looks good to me!
>

Pushed.

Thanks a lot Amit! 

Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Wed, Apr 8, 2020 at 8:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Apr 7, 2020 at 3:30 PM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> >
> >
> > We also have existing cases for the other way:
> >
> >      actual time=0.050..0.052
> >      Buffers: shared hit=3 dirtied=1
> >
>
> Buffers case is not the same because 'shared' is used for 'hit',
> 'read', 'dirtied', etc.  However, I think it is arguable.
>
> > The cases mentioned by Justin are not formatted in a key=value format,
> > so it's not quite the same, but it also raises the question why they are
> > not.
> >
> > Let's figure out a way to consolidate this without making up a third format.
> >
>
> Sure, I think my intention is to keep the format of WAL stats as close
> to Buffers stats as possible because both depict I/O and users would
> probably be interested to check/read both together.  There is a point
> to keep things in a format so that it is easier for someone to parse
> but I guess as these are fixed 'words', it shouldn't be difficult
> either way and we should give more weightage to consistency.  Any
> suggestions?
>

Peter E, others, any suggestions on how to move forward?  I think here
we should follow the rule "follow the style of nearby code" which in
this case would be to have one space after each field as we would like
it to be closer to the "Buffers" format.  It would be good if we have
a unified format among all Explain stuff but we might not want to
change the existing things and even if we want to do that it might be
a broader/bigger change and we should do that as a PG14 change.  What
do you think?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Peter Eisentraut
Date:
On 2020-04-14 05:57, Amit Kapila wrote:
> Peter E, others, any suggestions on how to move forward?  I think here
> we should follow the rule "follow the style of nearby code" which in
> this case would be to have one space after each field as we would like
> it to be closer to the "Buffers" format.  It would be good if we have
> a unified format among all Explain stuff but we might not want to
> change the existing things and even if we want to do that it might be
> a broader/bigger change and we should do that as a PG14 change.  What
> do you think?

It looks like shortening to fpw= and using one space is the easiest way
to solve this issue.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Fri, Apr 17, 2020 at 6:45 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> On 2020-04-14 05:57, Amit Kapila wrote:
> > Peter E, others, any suggestions on how to move forward?  I think here
> > we should follow the rule "follow the style of nearby code" which in
> > this case would be to have one space after each field as we would like
> > it to be closer to the "Buffers" format.  It would be good if we have
> > a unified format among all Explain stuff but we might not want to
> > change the existing things and even if we want to do that it might be
> > a broader/bigger change and we should do that as a PG14 change.  What
> > do you think?
>
> It looks like shortening to fpw= and using one space is the easiest way
> to solve this issue.
>

I am fine with this approach and will change accordingly.  I will wait
for a few days (3-4 days) to see if someone shows up with either an
objection to this or with a better idea for the display of WAL usage
information.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Sat, Apr 18, 2020 at 6:16 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Apr 17, 2020 at 6:45 PM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> >
> > On 2020-04-14 05:57, Amit Kapila wrote:
> > > Peter E, others, any suggestions on how to move forward?  I think here
> > > we should follow the rule "follow the style of nearby code" which in
> > > this case would be to have one space after each field as we would like
> > > it to be closer to the "Buffers" format.  It would be good if we have
> > > a unified format among all Explain stuff but we might not want to
> > > change the existing things and even if we want to do that it might be
> > > a broader/bigger change and we should do that as a PG14 change.  What
> > > do you think?
> >
> > It looks like shortening to fpw= and using one space is the easiest way
> > to solve this issue.
> >
>
> I am fine with this approach and will change accordingly.  I will wait
> for a few days (3-4 days) to see if someone shows up with either an
> objection to this or with a better idea for the display of WAL usage
> information.

That was also my preferred alternative.  PFA a patch for that.  I also
changed to "fpw" for the non-textual output for consistency.
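
The text-format part of the change boils down to something like this
(an approximation of the attached patch, not the patch itself; field
names as in instrument.h):

    ExplainIndentText(es);
    appendStringInfoString(es->str, "WAL:");
    if (usage->wal_records > 0)
        appendStringInfo(es->str, " records=%ld", usage->wal_records);
    if (usage->wal_num_fpw > 0)
        appendStringInfo(es->str, " fpw=%ld", usage->wal_num_fpw);
    if (usage->wal_bytes > 0)
        appendStringInfo(es->str, " bytes=" UINT64_FORMAT, usage->wal_bytes);
    appendStringInfoChar(es->str, '\n');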

Attachment

Re: WAL usage calculation patch

From
Justin Pryzby
Date:
On Sat, Apr 18, 2020 at 05:39:35PM +0200, Julien Rouhaud wrote:
> On Sat, Apr 18, 2020 at 6:16 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Apr 17, 2020 at 6:45 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> > > On 2020-04-14 05:57, Amit Kapila wrote:
> > > > Peter E, others, any suggestions on how to move forward?  I think here
> > > > we should follow the rule "follow the style of nearby code" which in
> > > > this case would be to have one space after each field as we would like
> > > > it to be closer to the "Buffers" format.  It would be good if we have
> > > > a unified format among all Explain stuff but we might not want to
> > > > change the existing things and even if we want to do that it might be
> > > > a broader/bigger change and we should do that as a PG14 change.  What
> > > > do you think?
> > >
> > > It looks like shortening to fpw= and using one space is the easiest way
> > > to solve this issue.
> > >
> >
> > I am fine with this approach and will change accordingly.  I will wait
> > for a few days (3-4 days) to see if someone shows up with either an
> > objection to this or with a better idea for the display of WAL usage
> > information.
> 
> That was also my preferred alternative.  PFA a patch for that.  I also
> changed to "fpw" for the non-textual output for consistency.

Should capitalize at least the non-text one ?  And maybe the text one for
consistency ?

+               ExplainPropertyInteger("WAL fpw", NULL,

And add the acronym to the docs:

$ git grep 'full page' '*/explain.sgml'
doc/src/sgml/ref/explain.sgml:      number of records, number of full page writes and amount of WAL bytes

"..full page writes (FPW).."

Should we also change vacuumlazy.c for consistency ?

+                                                        _("WAL usage: %ld records, %ld full page writes, "
+                                                          UINT64_FORMAT " bytes"),

-- 
Justin



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
Hi Justin,

Thanks for the review!

On Sat, Apr 18, 2020 at 10:41 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> Should capitalize at least the non-text one ?  And maybe the text one for
> consistency ?
>
> +               ExplainPropertyInteger("WAL fpw", NULL,

I think we should keep both versions consistent, whether lower or upper
case.  The uppercase version is probably more correct, but it's a
little bit weird to have it being the only upper case label in all
output, so I kept it lower case.

> And add the acronym to the docs:
>
> $ git grep 'full page' '*/explain.sgml'
> doc/src/sgml/ref/explain.sgml:      number of records, number of full page writes and amount of WAL bytes
>
> "..full page writes (FPW).."

Indeed!  Fixed (using lowercase to match current output).

> Should we also change vacuumlazy.c for consistency ?
>
> +                                                        _("WAL usage: %ld records, %ld full page writes, "
> +                                                          UINT64_FORMAT " bytes"),

I don't think this one should be changed, vacuumlazy output is already
entirely different, and is way more verbose so keeping it as is makes
sense to me.

Attachment

Re: WAL usage calculation patch

From
Kyotaro Horiguchi
Date:
At Sun, 19 Apr 2020 16:22:26 +0200, Julien Rouhaud <rjuju123@gmail.com> wrote in 
> Hi Justin,
> 
> Thanks for the review!
> 
> On Sat, Apr 18, 2020 at 10:41 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > Should capitalize at least the non-text one ?  And maybe the text one for
> > consistency ?
> >
> > +               ExplainPropertyInteger("WAL fpw", NULL,
> 
> I think we should keep both versions consistent, whether lower or upper
> case.  The uppercase version is probably more correct, but it's a
> little bit weird to have it being the only upper case label in all
> output, so I kept it lower case.

One space followed by an acronym looks perfect.  I'd prefer capital
letters but small letters also work well.

> > And add the acronym to the docs:
> >
> > $ git grep 'full page' '*/explain.sgml'
> > doc/src/sgml/ref/explain.sgml:      number of records, number of full page writes and amount of WAL bytes
> >
> > "..full page writes (FPW).."
> 
> Indeed!  Fixed (using lowercase to match current output).

I searched through the documentation and AFAICS most of the occurrences
of "full page" are followed by "image" and full_page_writes is used only
as the parameter name.

I'm fine with fpw as the acronym, but "fpw means the number of full
page images" looks odd..

> > Should we also change vacuumlazy.c for consistency ?
> >
> > +                                                        _("WAL usage: %ld records, %ld full page writes, "
> > +                                                          UINT64_FORMAT " bytes"),
> 
> I don't think this one should be changed, vacuumlazy output is already
> entirely different, and is way more verbose so keeping it as is makes
> sense to me.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Mon, Apr 20, 2020 at 1:17 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Sun, 19 Apr 2020 16:22:26 +0200, Julien Rouhaud <rjuju123@gmail.com> wrote in
> > Hi Justin,
> >
> > Thanks for the review!
> >
> > On Sat, Apr 18, 2020 at 10:41 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > >
> > > Should capitalize at least the non-text one ?  And maybe the text one for
> > > consistency ?
> > >
> > > +               ExplainPropertyInteger("WAL fpw", NULL,
> >
> > I think we should keep both versions consistent, whether lower or upper
> > case.  The uppercase version is probably more correct, but it's a
> > little bit weird to have it being the only upper case label in all
> > output, so I kept it lower case.

I think we can keep upper-case for all non-text ones in case of WAL
usage, something like WAL Records, WAL FPW, WAL Bytes.  The buffer
usage seems to be following a similar convention.
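
For example, in the JSON format that would presumably come out as
(values made up):

    "WAL Records": 71693,
    "WAL FPW": 71682,
    "WAL Bytes": 566815038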

>
> One space followed by an acronym looks perfect.  I'd prefer capital
> letters but small letters also work well.
>
> > > And add the acronym to the docs:
> > >
> > > $ git grep 'full page' '*/explain.sgml'
> > > doc/src/sgml/ref/explain.sgml:      number of records, number of full page writes and amount of WAL bytes
> > >
> > > "..full page writes (FPW).."
> >
> > Indeed!  Fixed (using lowercase to match current output).
>
> I searched through the documentation and AFAICS most of the occurrences
> of "full page" are followed by "image" and full_page_writes is used only
> as the parameter name.
>
> I'm fine with fpw as the acronym, but "fpw means the number of full
> page images" looks odd..
>

I don't understand this.  Where are we using such a description of fpw?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Justin Pryzby
Date:
On Wed, Apr 22, 2020 at 09:15:08AM +0530, Amit Kapila wrote:
> > > > And add the acronym to the docs:
> > > >
> > > > $ git grep 'full page' '*/explain.sgml'
> > > > doc/src/sgml/ref/explain.sgml:      number of records, number of full page writes and amount of WAL bytes
> > > >
> > > > "..full page writes (FPW).."
> > >
> > > Indeed!  Fixed (using lowercase to match current output).
> >
> > I searched through the documentation and AFAICS most of the occurrences
> > of "full page" are followed by "image" and full_page_writes is used only
> > as the parameter name.
> >
> > I'm fine with fpw as the acronym, but "fpw means the number of full
> > page images" looks odd..
> >
> 
> I don't understand this.  Where are we using such a description of fpw?

I suggested adding " (FPW)" to the new docs for "explain(wal)".
But the documentation before this commit mostly refers to "full page images",
so the implication is that maybe we should use that language (and the FPI acronym).

The only pre-existing use of "full page writes" seems to be here:
$ git grep -iC2 'full page write' origin doc
origin:doc/src/sgml/wal.sgml-      Internal data structures such as <filename>pg_xact</filename>, <filename>pg_subtrans</filename>, <filename>pg_multixact</filename>,
origin:doc/src/sgml/wal.sgml-      <filename>pg_serial</filename>, <filename>pg_notify</filename>, <filename>pg_stat</filename>, <filename>pg_snapshots</filename> are not directly
origin:doc/src/sgml/wal.sgml:      checksummed, nor are pages protected by full page writes. However, where

And we're not using either acronym.

-- 
Justin



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Wed, Apr 22, 2020 at 9:25 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Wed, Apr 22, 2020 at 09:15:08AM +0530, Amit Kapila wrote:
> > > > > And add the acronym to the docs:
> > > > >
> > > > > $ git grep 'full page' '*/explain.sgml'
> > > > > doc/src/sgml/ref/explain.sgml:      number of records, number of full page writes and amount of WAL bytes
> > > > >
> > > > > "..full page writes (FPW).."
> > > >
> > > > Indeed!  Fixed (using lowercase to match current output).
> > >
> > > I searched through the documentation, and AFAICS most occurrences of
> > > "full page" are followed by "image", while full_page_writes is used
> > > only as the parameter name.
> > >
> > > I'm fine with fpw as the acronym, but "fpw means the number of full
> > > page images" looks odd..
> > >
> >
> > I don't understand this.  Where are we using such a description of fpw?
>
> I suggested adding " (FPW)" to the new docs for "explain(wal)".
> But the documentation before this commit mostly refers to "full page images",
> so the implication is that maybe we should use that language (and the FPI acronym).
>

I am not sure if it matters that much.  I think we can use "full page
writes (FPW)" in this case, but we should be consistent wherever we
refer to it in the WAL usage context.  I think we already are; if not,
then let's make it consistent.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Wed, Apr 22, 2020 at 9:15 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Apr 20, 2020 at 1:17 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> >
> > At Sun, 19 Apr 2020 16:22:26 +0200, Julien Rouhaud <rjuju123@gmail.com> wrote in
> > > Hi Justin,
> > >
> > > Thanks for the review!
> > >
> > > On Sat, Apr 18, 2020 at 10:41 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > >
> > > > Should capitalize at least the non-text one ?  And maybe the text one for
> > > > consistency ?
> > > >
> > > > +               ExplainPropertyInteger("WAL fpw", NULL,
> > >
> > > I think we should keep both versions consistent, whether lower or upper
> > > case.  The uppercase version is probably more correct, but it's a
> > > little bit weird for it to be the only upper-case label in all the
> > > output, so I kept it lower case.
>
> I think we can keep upper-case for all non-text ones in case of WAL
> usage, something like WAL Records, WAL FPW, WAL Bytes.  The buffer
> usage seems to be following a similar convention.
>

The attached patch changed the non-text display format as mentioned.
Let me know if you have any comments.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Wed, Apr 22, 2020 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Apr 22, 2020 at 9:25 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > On Wed, Apr 22, 2020 at 09:15:08AM +0530, Amit Kapila wrote:
> > > > > > And add the acronym to the docs:
> > > > > >
> > > > > > $ git grep 'full page' '*/explain.sgml'
> > > > > > doc/src/sgml/ref/explain.sgml:      number of records, number of full page writes and amount of WAL bytes
> > > > > >
> > > > > > "..full page writes (FPW).."
> > > > >
> > > > > Indeed!  Fixed (using lowercase to match current output).
> > > >
> > > > I searched through the documentation, and AFAICS most occurrences of
> > > > "full page" are followed by "image", while full_page_writes is used
> > > > only as the parameter name.
> > > >
> > > > I'm fine with fpw as the acronym, but "fpw means the number of full
> > > > page images" looks odd..
> > > >
> > >
> > > I don't understand this.  Where are we using such a description of fpw?
> >
> > I suggested adding " (FPW)" to the new docs for "explain(wal)".
> > But the documentation before this commit mostly refers to "full page images",
> > so the implication is that maybe we should use that language (and the FPI acronym).
> >
>
> I am not sure if it matters that much.  I think we can use "full page
> writes (FPW)" in this case, but we should be consistent wherever we
> refer to it in the WAL usage context.  I think we already are; if not,
> then let's make it consistent.

I agree that full page writes can be used in this case, but I'm
wondering if that could be misleading for some readers, who might e.g.
confuse it with the full_page_writes GUC.  And as Justin pointed out,
the documentation currently usually says "full page image(s)" in such
cases.
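
(For context, that GUC is the server setting one sees with:

    =# SHOW full_page_writes;
     full_page_writes
    ------------------
     on

i.e. whether full page images are written to WAL after a checkpoint at
all, which is a different thing from a per-statement counter.)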



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Thu, Apr 23, 2020 at 7:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Apr 22, 2020 at 9:15 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Apr 20, 2020 at 1:17 PM Kyotaro Horiguchi
> > <horikyota.ntt@gmail.com> wrote:
> > >
> > > At Sun, 19 Apr 2020 16:22:26 +0200, Julien Rouhaud <rjuju123@gmail.com> wrote in
> > > > Hi Justin,
> > > >
> > > > Thanks for the review!
> > > >
> > > > On Sat, Apr 18, 2020 at 10:41 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > > >
> > > > > Should capitalize at least the non-text one ?  And maybe the text one for
> > > > > consistency ?
> > > > >
> > > > > +               ExplainPropertyInteger("WAL fpw", NULL,
> > > >
> > > > I think we should keep both versions consistent, whether lower or upper
> > > > case.  The uppercase version is probably more correct, but it's a
> > > > little bit weird for it to be the only upper-case label in all the
> > > > output, so I kept it lower case.
> >
> > I think we can keep upper-case for all non-text ones in case of WAL
> > usage, something like WAL Records, WAL FPW, WAL Bytes.  The buffer
> > usage seems to be following a similar convention.
> >
>
> The attached patch changed the non-text display format as mentioned.
> Let me know if you have any comments.

Assuming that we're fine using full page write(s) / FPW  rather than
full page image(s) / FPI (see previous mail), I'm fine with this
patch.



Re: WAL usage calculation patch

From
Kyotaro Horiguchi
Date:
At Thu, 23 Apr 2020 07:33:13 +0200, Julien Rouhaud <rjuju123@gmail.com> wrote in 
> > > > > I think we should keep both versions consistent, whether lower or upper
> > > > > case.  The uppercase version is probably more correct, but it's a
> > > > > little bit weird for it to be the only upper-case label in all the
> > > > > output, so I kept it lower case.
> > >
> > > I think we can keep upper-case for all non-text ones in case of WAL
> > > usage, something like WAL Records, WAL FPW, WAL Bytes.  The buffer
> > > usage seems to be following a similar convention.
> > >
> >
> > The attached patch changed the non-text display format as mentioned.
> > Let me know if you have any comments.
> 
> Assuming that we're fine using full page write(s) / FPW  rather than
> full page image(s) / FPI (see previous mail), I'm fine with this
> patch.

FWIW, I like FPW, and the patch looks good to me.  The index in the
documentation has an entry for full_page_writes (with underscores), so
that would still work.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Thu, Apr 23, 2020 at 12:16 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> On 2020-04-23 07:31, Julien Rouhaud wrote:
> > I agree that full page writes can be used in this case, but I'm
> > wondering if that could be misleading for some readers, who might e.g.
> > confuse it with the full_page_writes GUC.  And as Justin pointed out,
> > the documentation currently usually says "full page image(s)" in such
> > cases.
>
> ISTM that in the context of this patch, "full-page image" is correct.  A
> "full-page write" is what you do to a table or index page when you are
> recovering a full-page image.
>

So what do we call it when we log the whole page after it is first
touched following a checkpoint?  I thought we call that a full-page write.

>  The internal symbol for the WAL record is
> XLOG_FPI and xlogdesc.c prints it as "FPI".
>

That is just one of the ways/reasons we log the page.  There are
others as well.  I thought here we are computing the number of
full-page writes that happened in the system due to various reasons,
like (a) a page is operated upon for the first time after a
checkpoint, (b) we log an XLOG_FPI record, (c) the GUC for the WAL
consistency checker is on, etc.  If we look at XLogRecordAssemble,
where we decide to log this information, there is a comment " .... log
a full-page write for the current block." and there is an existing
variable named 'fpw_lsn', which indicates to an extent that what we
are computing in this patch is full-page writes.  But there is a
reference to full-page image as well.  I think as full_page_writes is
an exposed variable that is well understood, exposing information with
a similar name via this patch doesn't sound illogical to me.  Whatever
we use here, we need to be consistent throughout; even
pg_stat_statements would need to name its exposed variable wal_fpi
instead of wal_fpw.

To me, full-page writes sounds more appealing alongside the other WAL
usage variables like records and bytes.  I might simply be more used
to the term 'fpw', which is why it seemed better to me.  OTOH, if most
of us think that full-page image is better suited here, I am fine with
changing it in all places.
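
As a concrete sketch of what that consistency would mean for the
extension (hypothetical query, column names as discussed above), a
monitoring query would read either

    SELECT query, wal_records, wal_fpw, wal_bytes
      FROM pg_stat_statements
     ORDER BY wal_bytes DESC LIMIT 5;

or the same with wal_fpi, but never a mix of the two spellings across
EXPLAIN, the view and the documentation.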

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Thu, Apr 23, 2020 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Apr 23, 2020 at 12:16 PM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
>
> >  The internal symbol for the WAL record is
> > XLOG_FPI and xlogdesc.c prints it as "FPI".
> >
>
> That is just one of the ways/reasons we log the page.  There are
> others as well.  I thought here we are computing the number of
> full-page writes that happened in the system due to various reasons,
> like (a) a page is operated upon for the first time after a
> checkpoint, (b) we log an XLOG_FPI record, (c) the GUC for the WAL
> consistency checker is on, etc.  If we look at XLogRecordAssemble,
> where we decide to log this information, there is a comment " .... log
> a full-page write for the current block." and there is an existing
> variable named 'fpw_lsn', which indicates to an extent that what we
> are computing in this patch is full-page writes.  But there is a
> reference to full-page image as well.  I think as full_page_writes is
> an exposed variable that is well understood, exposing information with
> a similar name via this patch doesn't sound illogical to me.  Whatever
> we use here, we need to be consistent throughout; even
> pg_stat_statements would need to name its exposed variable wal_fpi
> instead of wal_fpw.
>
> To me, full-page writes sounds more appealing alongside the other WAL
> usage variables like records and bytes.  I might simply be more used
> to the term 'fpw', which is why it seemed better to me.  OTOH, if most
> of us think that full-page image is better suited here, I am fine with
> changing it in all places.
>

Julien, Peter, others, do you have any opinion here?  I think it is
better if we decide on one of FPW or FPI and make the change in all
places for this patch.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Michael Paquier
Date:
On Mon, Apr 27, 2020 at 08:35:51AM +0530, Amit Kapila wrote:
> On Thu, Apr 23, 2020 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Thu, Apr 23, 2020 at 12:16 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
>>>  The internal symbol for the WAL record is
>>> XLOG_FPI and xlogdesc.c prints it as "FPI".
>
> Julien, Peter, others, do you have any opinion here?  I think it is
> better if we decide on one of FPW or FPI and make the change in all
> places for this patch.

It seems to me that Peter is right here.  A full-page write is the
action of writing a full-page image, so if you consider only a way to
refer to the static data of a full page and/or a quantity associated
with it, we should talk about full-page images.
--
Michael

Attachment

Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Mon, Apr 27, 2020 at 8:12 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Apr 27, 2020 at 08:35:51AM +0530, Amit Kapila wrote:
> > On Thu, Apr 23, 2020 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> On Thu, Apr 23, 2020 at 12:16 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> >>>  The internal symbol for the WAL record is
> >>> XLOG_FPI and xlogdesc.c prints it as "FPI".
> >
> > Julien, Peter, others, do you have any opinion here?  I think it is
> > better if we decide on one of FPW or FPI and make the change in all
> > places for this patch.
>
> It seems to me that Peter is right here.  A full-page write is the
> action of writing a full-page image, so if you consider only a way to
> refer to the static data of a full page and/or a quantity associated
> with it, we should talk about full-page images.

I agree with that definition.  I can send a cleanup patch if there's
no objection.



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Mon, Apr 27, 2020 at 1:22 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Mon, Apr 27, 2020 at 8:12 AM Michael Paquier <michael@paquier.xyz> wrote:
> >
> > On Mon, Apr 27, 2020 at 08:35:51AM +0530, Amit Kapila wrote:
> > > On Thu, Apr 23, 2020 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >> On Thu, Apr 23, 2020 at 12:16 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> > >>>  The internal symbol for the WAL record is
> > >>> XLOG_FPI and xlogdesc.c prints it as "FPI".
> > >
> > > Julien, Peter, others, do you have any opinion here?  I think it is
> > > better if we decide on one of FPW or FPI and make the change in all
> > > places for this patch.
> >
> > It seems to me that Peter is right here.  A full-page write is the
> > action of writing a full-page image, so if you consider only a way to
> > refer to the static data of a full page and/or a quantity associated
> > with it, we should talk about full-page images.
>

Fair enough; if more people want the full-page image terminology in
this context, then we can do that.

> I agree with that definition.  I can send a cleanup patch if there's
> no objection.
>

Okay, feel free to send the patch.  Thanks for taking the initiative
to write a patch for this.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Tue, Apr 28, 2020 at 7:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Apr 27, 2020 at 1:22 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
>
> > I agree with that definition.  I can send a cleanup patch if there's
> > no objection.
> >
>
> Okay, feel free to send the patch.  Thanks for taking the initiative
> to write a patch for this.
>

Julien, are you planning to write a cleanup patch for this open item?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Thu, Apr 30, 2020 at 5:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Apr 28, 2020 at 7:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Apr 27, 2020 at 1:22 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> >
> > > I agree with that definition.  I can send a cleanup patch if there's
> > > no objection.
> > >
> >
> > Okay, feel free to send the patch.  Thanks for taking the initiative
> > to write a patch for this.
> >
>
> Julien, are you planning to write a cleanup patch for this open item?

Sorry Amit, I've been quite busy at work for the last couple of days.
I'll take care of that this morning for sure!



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Thu, Apr 30, 2020 at 9:18 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Thu, Apr 30, 2020 at 5:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Apr 28, 2020 at 7:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Apr 27, 2020 at 1:22 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > > >
> > >
> > > > I agree with that definition.  I can send a cleanup patch if there's
> > > > no objection.
> > > >
> > >
> > > Okay, feel free to send the patch.  Thanks for taking the initiative
> > > to write a patch for this.
> > >
> >
> > Julien, are you planning to write a cleanup patch for this open item?
>
> Sorry Amit, I've been quite busy at work for the last couple of days.
> I'll take care of that this morning for sure!

Here's the patch.  I included the content of the
v3-fix_explain_wal_output.patch you provided before, and tried to
consistently replace full page writes/fpw with full page images/fpi
everywhere on top of it (so documentation, command output, variable
names and comments).
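
To give an idea of the shape of the change, a hypothetical hunk (names
illustrative, not quoted from the actual patch):

    -       ExplainPropertyInteger("WAL FPW", NULL, usage->wal_fpw, es);
    +       ExplainPropertyInteger("WAL FPI", NULL, usage->wal_fpi, es);

plus the matching renames in pg_stat_statements (wal_fpw -> wal_fpi),
the documentation and the comments.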

Attachment

Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Thu, Apr 30, 2020 at 2:19 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Thu, Apr 30, 2020 at 9:18 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Thu, Apr 30, 2020 at 5:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > Julien, are you planning to write a cleanup patch for this open item?
> >
> > Sorry Amit, I've been quite busy at work for the last couple of days.
> > I'll take care of that this morning for sure!
>
> Here's the patch.
>

Thanks for the patch. I will look into it early next week.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Thu, Apr 30, 2020 at 2:19 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> Here's the patch.  I included the content of the
> v3-fix_explain_wal_output.patch you provided before, and tried to
> consistently replace full page writes/fpw with full page images/fpi
> everywhere on top of it (so documentation, command output, variable
> names and comments).
>

Your patch looks mostly good to me.  I have made slight modifications,
which include changing the non-text format in show_wal_usage to use a
capital letter for the second word (making it similar to the Buffer
usage stats), and I additionally ran pgindent.

Let me know what you think of the attached.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Mon, May 4, 2020 at 6:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Apr 30, 2020 at 2:19 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > Here's the patch.  I included the content of the
> > v3-fix_explain_wal_output.patch you provided before, and tried to
> > consistently replace full page writes/fpw with full page images/fpi
> > everywhere on top of it (so documentation, command output, variable
> > names and comments).
> >
>
> Your patch looks mostly good to me.  I have made slight modifications,
> which include changing the non-text format in show_wal_usage to use a
> capital letter for the second word (making it similar to the Buffer
> usage stats), and I additionally ran pgindent.
>
> Let me know what you think of the attached.

Thanks a lot Amit.  It looks perfect to me!



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Mon, May 4, 2020 at 8:03 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Mon, May 4, 2020 at 6:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Apr 30, 2020 at 2:19 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > > Here's the patch.  I included the content of the
> > > v3-fix_explain_wal_output.patch you provided before, and tried to
> > > consistently replace full page writes/fpw with full page images/fpi
> > > everywhere on top of it (so documentation, command output, variable
> > > names and comments).
> > >
> >
> > Your patch looks mostly good to me.  I have made slight modifications,
> > which include changing the non-text format in show_wal_usage to use a
> > capital letter for the second word (making it similar to the Buffer
> > usage stats), and I additionally ran pgindent.
> >
> > Let me know what you think of the attached.
>
> Thanks a lot Amit.  It looks perfect to me!
>

Pushed.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WAL usage calculation patch

From
Julien Rouhaud
Date:
On Tue, May 5, 2020 at 12:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, May 4, 2020 at 8:03 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> >
> > On Mon, May 4, 2020 at 6:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Apr 30, 2020 at 2:19 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > > >
> > > > Here's the patch.  I included the content of the
> > > > v3-fix_explain_wal_output.patch you provided before, and tried to
> > > > consistently replace full page writes/fpw with full page images/fpi
> > > > everywhere on top of it (so documentation, command output, variable
> > > > names and comments).
> > > >
> > >
> > > Your patch looks mostly good to me.  I have made slight modifications,
> > > which include changing the non-text format in show_wal_usage to use a
> > > capital letter for the second word (making it similar to the Buffer
> > > usage stats), and I additionally ran pgindent.
> > >
> > > Let me know what you think of the attached.
> >
> > Thanks a lot Amit.  It looks perfect to me!
> >
>
> Pushed.

Thanks!



Re: WAL usage calculation patch

From
Amit Kapila
Date:
On Wed, May 6, 2020 at 12:19 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Tue, May 5, 2020 at 12:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > > >
> > > > Your patch looks mostly good to me.  I have made slight modifications,
> > > > which include changing the non-text format in show_wal_usage to use a
> > > > capital letter for the second word (making it similar to the Buffer
> > > > usage stats), and I additionally ran pgindent.
> > > >
> > > > Let me know what you think of the attached.
> > >
> > > Thanks a lot Amit.  It looks perfect to me!
> > >
> >
> > Pushed.
>
> Thanks!
>

I have updated the open items page to reflect this commit [1].

[1] - https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com