Thread: WAL usage calculation patch
Hello pgsql-hackers,

Submitting a patch that would enable gathering of per-statement WAL generation statistics, similar to how it is done for buffer usage. It collects the number of records added to WAL and the number of WAL bytes written.

The data collected was found valuable for analyzing update-heavy load, with WAL generation being the bottleneck.

The usage data is collected at a low level, after compression is done on the WAL record. The data is then exposed via pg_stat_statements, and could also be used in EXPLAIN ANALYZE if needed. Instrumentation is similar to the one used for buffer stats. I didn't dare to unify both usage metric sets into a single struct, nor rework the way both are passed to parallel workers.

Performance impact is (supposed to be) very low, essentially adding two int operations and a memory access on WAL record insert. Additional effort is needed to allocate a shmem chunk for parallel workers; parallel worker shmem usage is increased to fit in a struct of two longs.

The patch is separated in two parts: core changes and pg_stat_statements additions. Essentially, the extension has its schema updated to allow two more fields, and the docs are updated to reflect the change. The patch is prepared against the master branch.

Please provide your comments and/or code findings.
Attachment
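For reference, a minimal sketch of the kind of per-backend counter described above, mirroring how BufferUsage works; the struct and field names here are assumptions based on the cover letter ("a struct of two longs"), not necessarily the names used in the attached patch.

typedef struct WalUsage
{
    long    wal_records;    /* number of WAL records inserted */
    long    wal_bytes;      /* total size of inserted WAL records, in bytes */
} WalUsage;

extern WalUsage pgWalUsage;     /* accumulated by the current backend */

/*
 * Bumped once per record on WAL insert, i.e. roughly two integer operations
 * and a memory access near the end of XLogInsertRecord(), after the record
 * has been assembled (and compressed, if applicable):
 *
 *     pgWalUsage.wal_records++;
 *     pgWalUsage.wal_bytes += rechdr->xl_tot_len;
 */

Parallel workers would accumulate into their own pgWalUsage and hand the totals back to the leader through a small shared-memory chunk, the same way buffer usage is handled today.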
On Wed, 5 Feb 2020 at 21:36, Kirill Bychik <kirill.bychik@gmail.com> wrote: > > Hello pgsql-hackers, > > Submitting a patch that would enable gathering of per-statement WAL > generation statistics, similar to how it is done for buffer usage. > Collected is the number of records added to WAL and number of WAL > bytes written. > > The data collected was found valuable to analyze update-heavy load, > with WAL generation being the bottleneck. > > The usage data is collected at low level, after compression is done on > WAL record. Data is then exposed via pg_stat_statements, could also be > used in EXPLAIN ANALYZE if needed. Instrumentation is alike to the one > used for buffer stats. I didn't dare to unify both usage metric sets > into single struct, nor rework the way both are passed to parallel > workers. > > Performance impact is (supposed to be) very low, essentially adding > two int operations and memory access on WAL record insert. Additional > efforts to allocate shmem chunk for parallel workers. Parallel workers > shmem usage is increased to fir in a struct of two longs. > > Patch is separated in two parts: core changes and pg_stat_statements > additions. Essentially the extension has its schema updated to allow > two more fields, docs updated to reflect the change. Patch is prepared > against master branch. > > Please provide your comments and/or code findings. I like the concept, I'm a big fan of anything that affordably improves visibility into Pg's I/O and activity. To date I've been relying on tools like systemtap to do this sort of thing. But that's a bit specialised, and Pg currently lacks useful instrumentation for it so it can be a pain to match up activity by parallel workers and that sort of thing. (I aim to find time to submit a patch for that.) I haven't yet reviewed the patch. -- Craig Ringer http://www.2ndQuadrant.com/ 2ndQuadrant - PostgreSQL Solutions for the Enterprise
On Mon, Feb 10, 2020 at 8:20 PM Craig Ringer <craig@2ndquadrant.com> wrote: > On Wed, 5 Feb 2020 at 21:36, Kirill Bychik <kirill.bychik@gmail.com> wrote: > > Patch is separated in two parts: core changes and pg_stat_statements > > additions. Essentially the extension has its schema updated to allow > > two more fields, docs updated to reflect the change. Patch is prepared > > against master branch. > > > > Please provide your comments and/or code findings. > > I like the concept, I'm a big fan of anything that affordably improves > visibility into Pg's I/O and activity. +1 > To date I've been relying on tools like systemtap to do this sort of > thing. But that's a bit specialised, and Pg currently lacks useful > instrumentation for it so it can be a pain to match up activity by > parallel workers and that sort of thing. (I aim to find time to submit > a patch for that.) (I'm interested in seeing your conference talk about that! I did a bunch of stuff with static probes to measure PHJ behaviour around barrier waits and so on but it was hard to figure out what stuff like that to put in the actual tree, it was all a bit use-once-to-test-a-theory-and-then-throw-away.) Kirill, I noticed that you included a regression test that is failing. Can this possibly be stable across machines or even on the same machine? Does it still pass for you or did something change on the master branch to add a new WAL record since you posted the patch? query | calls | rows | wal_write_bytes | wal_write_records -------------------------------------------+-------+------+-----------------+------------------- - CREATE INDEX test_b ON test(b) | 1 | 0 | 1673 | 16 - DROP FUNCTION IF EXISTS PLUS_ONE(INTEGER) | 1 | 0 | 56 | 1 + CREATE INDEX test_b ON test(b) | 1 | 0 | 1755 | 17 + DROP FUNCTION IF EXISTS PLUS_ONE(INTEGER) | 1 | 0 | 0 | 0
On Tue, 18 Feb 2020 at 06:23, Thomas Munro <thomas.munro@gmail.com> wrote: > On Mon, Feb 10, 2020 at 8:20 PM Craig Ringer <craig@2ndquadrant.com> wrote: > > On Wed, 5 Feb 2020 at 21:36, Kirill Bychik <kirill.bychik@gmail.com> wrote: > > > Patch is separated in two parts: core changes and pg_stat_statements > > > additions. Essentially the extension has its schema updated to allow > > > two more fields, docs updated to reflect the change. Patch is prepared > > > against master branch. > > > > > > Please provide your comments and/or code findings. > > > > I like the concept, I'm a big fan of anything that affordably improves > > visibility into Pg's I/O and activity. > > +1 > > > To date I've been relying on tools like systemtap to do this sort of > > thing. But that's a bit specialised, and Pg currently lacks useful > > instrumentation for it so it can be a pain to match up activity by > > parallel workers and that sort of thing. (I aim to find time to submit > > a patch for that.) > > (I'm interested in seeing your conference talk about that! I did a > bunch of stuff with static probes to measure PHJ behaviour around > barrier waits and so on but it was hard to figure out what stuff like > that to put in the actual tree, it was all a bit > use-once-to-test-a-theory-and-then-throw-away.) > > Kirill, I noticed that you included a regression test that is failing. Can > this possibly be stable across machines or even on the same machine? > Does it still pass for you or did something change on the master > branch to add a new WAL record since you posted the patch? Thank you for testing the patch and running extension checks. I assume the patch applies without problems. As for the regression test, it apparently requires some rework. I didn't pay enough attention to make sure the data I check is actually meaningful and isolated enough to be repeatable. Please consider the extension part of the patch as WIP; I'll resubmit the patch once I get a stable and meaningful test up. Thanks for finding it! > query | calls | rows | wal_write_bytes | wal_write_records > -------------------------------------------+-------+------+-----------------+------------------- > - CREATE INDEX test_b ON test(b) | 1 | 0 | 1673 | > 16 > - DROP FUNCTION IF EXISTS PLUS_ONE(INTEGER) | 1 | 0 | 56 | > 1 > + CREATE INDEX test_b ON test(b) | 1 | 0 | 1755 | > 17 > + DROP FUNCTION IF EXISTS PLUS_ONE(INTEGER) | 1 | 0 | 0 | > 0
> On Tue, 18 Feb 2020 at 06:23, Thomas Munro <thomas.munro@gmail.com> wrote: > > On Mon, Feb 10, 2020 at 8:20 PM Craig Ringer <craig@2ndquadrant.com> wrote: > > > On Wed, 5 Feb 2020 at 21:36, Kirill Bychik <kirill.bychik@gmail.com> wrote: > > > > Patch is separated in two parts: core changes and pg_stat_statements > > > > additions. Essentially the extension has its schema updated to allow > > > > two more fields, docs updated to reflect the change. Patch is prepared > > > > against master branch. > > > > > > > > Please provide your comments and/or code findings. > > > > > > I like the concept, I'm a big fan of anything that affordably improves > > > visibility into Pg's I/O and activity. > > > > +1 > > > > > To date I've been relying on tools like systemtap to do this sort of > > > thing. But that's a bit specialised, and Pg currently lacks useful > > > instrumentation for it so it can be a pain to match up activity by > > > parallel workers and that sort of thing. (I aim to find time to submit > > > a patch for that.) > > > > (I'm interested in seeing your conference talk about that! I did a > > bunch of stuff with static probes to measure PHJ behaviour around > > barrier waits and so on but it was hard to figure out what stuff like > > that to put in the actual tree, it was all a bit > > use-once-to-test-a-theory-and-then-throw-away.) > > > > Kirill, I noticed that you included a regression test that is failing. Can > > this possibly be stable across machines or even on the same machine? > > Does it still pass for you or did something change on the master > > branch to add a new WAL record since you posted the patch? > > Thank you for testing the patch and running extension checks. I assume > the patch applies without problems. > > As for the regr test, it apparently requires some rework. I didn't pay > attention enough to make sure the data I check is actually meaningful > and isolated enough to be repeatable. > > Please consider the extension part of the patch as WIP, I'll resubmit > the patch once I get a stable and meanngful test up. Thanks for > finding it! > I have reworked the extension regression test to be more isolated. Apparently, something merged into the master branch shifted my numbers. PFA the new patch. The core part didn't change a bit; the extension part has its regression test SQL and expected log changed. Looking forward to new comments.
Attachment
On Thu, Feb 20, 2020 at 06:56:27PM +0300, Kirill Bychik wrote: > > On Tue, 18 Feb 2020 at 06:23, Thomas Munro <thomas.munro@gmail.com> wrote: > > > On Mon, Feb 10, 2020 at 8:20 PM Craig Ringer <craig@2ndquadrant.com> wrote: > > > > On Wed, 5 Feb 2020 at 21:36, Kirill Bychik <kirill.bychik@gmail.com> wrote: > > > > > Patch is separated in two parts: core changes and pg_stat_statements > > > > > additions. Essentially the extension has its schema updated to allow > > > > > two more fields, docs updated to reflect the change. Patch is prepared > > > > > against master branch. > > > > > > > > > > Please provide your comments and/or code findings. > > > > > > > > I like the concept, I'm a big fan of anything that affordably improves > > > > visibility into Pg's I/O and activity. > > > > > > +1 Huge +1 too. > > Thank you for testing the patch and running extension checks. I assume > > the patch applies without problems. > > > > As for the regr test, it apparently requires some rework. I didn't pay > > attention enough to make sure the data I check is actually meaningful > > and isolated enough to be repeatable. > > > > Please consider the extension part of the patch as WIP, I'll resubmit > > the patch once I get a stable and meanngful test up. Thanks for > > finding it! > > > > I have reworked the extension regression test to be more isolated. > Apparently, something merged into master branch shifted my numbers. > > PFA the new patch. Core part didn't change a bit, the extension part > has regression test SQL and expected log changed. I'm quite worried about the stability of those counters for regression tests. Wouldn't a checkpoint happening during the test change them? While at it, did you consider adding a full-page image counter in the WalUsage? That's something I'd really like to have and it doesn't seem hard to integrate. Another point is that this patch won't help to see autovacuum activity. As an example, I did a quick test to store the information in pgstat, sending the data in the PG_FINALLY part of vacuum(): rjuju=# create table t1(id integer, val text); CREATE TABLE rjuju=# insert into t1 select i, 'val ' || i from generate_series(1, 100000) i; INSERT 0 100000 rjuju=# vacuum t1; VACUUM rjuju=# select datname, vac_wal_records, vac_wal_bytes, autovac_wal_records, autovac_wal_bytes from pg_stat_database where datname = 'rjuju'; datname | vac_wal_records | vac_wal_bytes | autovac_wal_records | autovac_wal_bytes ---------+-----------------+---------------+---------------------+------------------- rjuju | 547 | 65201 | 0 | 0 (1 row) rjuju=# delete from t1 where id % 2 = 0; DELETE 50000 rjuju=# select pg_sleep(60); pg_sleep ---------- (1 row) rjuju=# select datname, vac_wal_records, vac_wal_bytes, autovac_wal_records, autovac_wal_bytes from pg_stat_database where datname = 'rjuju'; datname | vac_wal_records | vac_wal_bytes | autovac_wal_records | autovac_wal_bytes ---------+-----------------+---------------+---------------------+------------------- rjuju | 547 | 65201 | 1631 | 323193 (1 row) That seems like useful data (especially since I recently had to dig into a problematic WAL consumption issue that was due to some autovacuum activity), but that may seem strange to only account for (auto)vacuum activity, rather than globally, grouping per RmgrId or CommandTag for instance. We could then see the complete WAL usage per-database. What do you think?
Some minor points I noticed: - the extension patch doesn't apply anymore, I guess since 70a7732007bc4689 #define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009) +#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE000000000000010) Shouldn't it be 0xA rather than 0x10? - it would be better to add a version number to the patches, so we're sure which one we're talking about.
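For illustration, here is roughly how the example above could be wired up: snapshot the backend-local counters at the start of vacuum() and report the difference per database in the PG_FINALLY block. This is only a sketch; pgstat_report_vacuum_wal() is a hypothetical helper standing in for whatever pgstat message would actually carry the values, and does not exist in the tree.

/* Hypothetical sketch of the approach described above; the reporting
 * function is made up for illustration. */
WalUsage    walusage_start = pgWalUsage;

PG_TRY();
{
    /* ... existing vacuum work ... */
}
PG_FINALLY();
{
    pgstat_report_vacuum_wal(MyDatabaseId,
                             IsAutoVacuumWorkerProcess(),
                             pgWalUsage.wal_records - walusage_start.wal_records,
                             pgWalUsage.wal_bytes - walusage_start.wal_bytes);
}
PG_END_TRY();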
On Wed, Mar 04, 2020 at 05:02:25PM +0100, Julien Rouhaud wrote: > I'm quite worried about the stability of those counters for regression tests. > Wouldn't a checkpoint happening during the test change them? Yep. One way to go through that would be to test if this output is non-zero, though I suspect at a quick glance that this won't be entirely reliable either. > While at it, did you consider adding a full-page image counter in the WalUsage? > That's something I'd really like to have and it doesn't seem hard to integrate. FWIW, one reason here is that we recently had some benchmark work done internally where this would have been helpful in studying some spiky WAL load patterns. -- Michael
Attachment
> I'm quite worried about the stability of those counters for regression tests. > Wouldn't a checkpoint happening during the test change them? Agree, stability of the test could be an issue; even a shift in the write format or compression method, or adding compatible changes, could break such a test. Frankly speaking, the expected numbers are not actually calculated; my logic was rather well described by "these numbers should be non-zero for real tables". I believe the test can be modified to check that the numbers are above zero, both for bytes written and for records stored. Having a checkpoint in the middle of the test can be almost 100% countered by triggering one before the test. I'll add a checkpoint call to the test scenario, if no objections here. > While at it, did you consider adding a full-page image counter in the WalUsage? > That's something I'd really like to have and it doesn't seem hard to integrate. Well, not sure I understand you 100%, being new to Postgres dev. Do you want a separate counter for pages written whenever doPageWrites is true? I can do that, if needed. Please confirm. > Another point is that this patch won't help to see autovacuum activity. > As an example, I did a quick te..... > ...LONG QUOTE... > but that may seem strange to only account for (auto)vacuum activity, rather > than globally, grouping per RmgrId or CommandTag for instance. We could then > see the complete WAL usage per-database. What do you think? I wanted to keep the patch small and simple, and fit to practical needs. This patch is supposed to provide tuning assistance, catching an IO-heavy query in a commit-bound situation. Total WAL usage per DB can be assessed rather easily using other means. Let's get this change into the codebase and then work on connecting WAL usage to (auto)vacuum stats. > > Some minor points I noticed: > > - the extension patch doesn't apply anymore, I guess since 70a7732007bc4689 Will fix, thank you. > > #define PARALLEL_KEY_JIT_INSTRUMENTATION UINT64CONST(0xE000000000000009) > +#define PARALLEL_KEY_WAL_USAGE UINT64CONST(0xE000000000000010) > > Shouldn't it be 0xA rather than 0x10? Oww, my bad, this is embarrassing! Will fix, thank you. > - it would be better to add a version number to the patches, so we're sure > which one we're talking about. Noted, thank you. Please comment on the proposed changes, I will cook up a new version once all are agreed upon.
On Thu, Mar 5, 2020 at 8:55 PM Kirill Bychik <kirill.bychik@gmail.com> wrote: > > > While at it, did you consider adding a full-page image counter in the WalUsage? > > That's something I'd really like to have and it doesn't seem hard to integrate. > > Well, not sure I understand you 100%, being new to Postgres dev. Do > you want a separate counter for pages written whenever doPageWrites is > true? I can do that, if needed. Please confirm. Yes, I meant a separate 3rd counter for the number of full page images written. However after a quick look I think that a FPI should be detected with (doPageWrites && fpw_lsn != InvalidXLogRecPtr && fpw_lsn <= RedoRecPtr). > > Another point is that this patch won't help to see autovacuum activity. > > As an example, I did a quick te..... > > ...LONG QUOTE... > > but that may seem strange to only account for (auto)vacuum activity, rather > > than globally, grouping per RmgrId or CommandTag for instance. We could then > > see the complete WAL usage per-database. What do you think? > > I wanted to keep the patch small and simple, and fit to practical > needs. This patch is supposed to provide tuning assistance, catching > an io heavy query in commit-bound situation. > Total WAL usage per DB can be assessed rather easily using other means. > Let's get this change into the codebase and then work on connecting > WAL usage to (auto)vacuum stats. I agree that having a view of the full activity is a way bigger scope, so it could be done later (and at this point in pg14), but I'm still hoping that we can get insight of other backend WAL activity, such as autovacuum, in pg13.
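To make that concrete, here is a rough sketch of where such a third counter could be bumped; this is not the patch as posted, and the exact placement is an assumption (the condition mentioned above is evaluated around XLogInsertRecord(), while the image itself is included during record assembly).

/* Sketch only: in XLogRecordAssemble(), a backup block image is included
 * when include_image is true, which is one natural place to count it.
 * The counter name is an assumption. */
if (include_image)
{
    /* ... existing code that adds the block image to the record ... */
    pgWalUsage.wal_fp_records++;
}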
On Fri, 6 Mar 2020 at 20:14, Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Thu, Mar 5, 2020 at 8:55 PM Kirill Bychik <kirill.bychik@gmail.com> wrote: > > > > > While at it, did you consider adding a full-page image counter in the WalUsage? > > > That's something I'd really like to have and it doesn't seem hard to integrate. > > > > Well, not sure I understand you 100%, being new to Postgres dev. Do > > you want a separate counter for pages written whenever doPageWrites is > > true? I can do that, if needed. Please confirm. > > Yes, I meant a separate 3rd counter for the number of full page images > written. However after a quick look I think that a FPI should be > detected with (doPageWrites && fpw_lsn != InvalidXLogRecPtr && fpw_lsn > <= RedoRecPtr). This seems easy, will implement once I get some spare time. > > > Another point is that this patch won't help to see autovacuum activity. > > > As an example, I did a quick te..... > > > ...LONG QUOTE... > > > but that may seem strange to only account for (auto)vacuum activity, rather > > > than globally, grouping per RmgrId or CommandTag for instance. We could then > > > see the complete WAL usage per-database. What do you think? > > > > I wanted to keep the patch small and simple, and fit to practical > > needs. This patch is supposed to provide tuning assistance, catching > > an io heavy query in commit-bound situation. > > Total WAL usage per DB can be assessed rather easily using other means. > > Let's get this change into the codebase and then work on connecting > > WAL usage to (auto)vacuum stats. > > I agree that having a view of the full activity is a way bigger scope, > so it could be done later (and at this point in pg14), but I'm still > hoping that we can get insight of other backend WAL activity, such as > autovacuum, in pg13. How do you think this information should be exposed? Via pg_stat_statements? Anyways, I believe this change could be bigger than FPI. I propose to plan a separate patch for it, or even add it to the TODO after the core patch of WAL usage is merged. Please expect a new patch version next week, with FPI counters added.
On Fri, Mar 6, 2020 at 6:59 PM Kirill Bychik <kirill.bychik@gmail.com> wrote: > > On Fri, 6 Mar 2020 at 20:14, Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > On Thu, Mar 5, 2020 at 8:55 PM Kirill Bychik <kirill.bychik@gmail.com> wrote: > > > I wanted to keep the patch small and simple, and fit to practical > > > needs. This patch is supposed to provide tuning assistance, catching > > > an io heavy query in commit-bound situation. > > > Total WAL usage per DB can be assessed rather easily using other means. > > > Let's get this change into the codebase and then work on connecting > > > WAL usage to (auto)vacuum stats. > > > > I agree that having a view of the full activity is a way bigger scope, > > so it could be done later (and at this point in pg14), but I'm still > > hoping that we can get insight of other backend WAL activity, such as > > autovacuum, in pg13. > > How do you think this information should be exposed? Via the pg_stat_statement? That's unlikely, since autovacuum won't trigger any hook. I was thinking of some new view for pgstats, similarly to the example I showed previously. The implementation is straightforward, although pg_stat_database is maybe not the best choice here. > Anyways, I believe this change could be bigger than FPI. I propose to > plan a separate patch for it, or even add it to the TODO after the > core patch of wal usage is merged. Just in case, if the problem is a lack of time, I'd be happy to help on that if needed. Otherwise, I'll definitely not try to block any progress for the feature as proposed. > Please expect a new patch version next week, with FPI counters added. Thanks!
> > > On Thu, Mar 5, 2020 at 8:55 PM Kirill Bychik <kirill.bychik@gmail.com> wrote: > > > > I wanted to keep the patch small and simple, and fit to practical > > > > needs. This patch is supposed to provide tuning assistance, catching > > > > an io heavy query in commit-bound situation. > > > > Total WAL usage per DB can be assessed rather easily using other means. > > > > Let's get this change into the codebase and then work on connecting > > > > WAL usage to (auto)vacuum stats. > > > > > > I agree that having a view of the full activity is a way bigger scope, > > > so it could be done later (and at this point in pg14), but I'm still > > > hoping that we can get insight of other backend WAL activity, such as > > > autovacuum, in pg13. > > > > How do you think this information should be exposed? Via the pg_stat_statement? > > > > That's unlikely, since autovacuum won't trigger any hook. I was > thinking on some new view for pgstats, similarly to the example I > showed previously. The implementation is straightforward, although > pg_stat_database is maybe not the best choice here. After extensive thinking and some code diving, I did not manage to come up with a sane idea on how to expose data about autovacuum WAL usage. Must be the flu. > > Anyways, I believe this change could be bigger than FPI. I propose to > > plan a separate patch for it, or even add it to the TODO after the > > core patch of wal usage is merged. > > Just in case, if the problem is a lack of time, I'd be happy to help > on that if needed. Otherwise, I'll definitely not try to block any > progress for the feature as proposed. Please feel free to work on any extension of this patch idea. I lack both time and knowledge to do it all by myself. > > Please expect a new patch version next week, with FPI counters added. Please find attached patch version 003, with FP writes and minor corrections. Hope I use attachment versioning as expected in this group :) The test has been reworked, and I believe the part which checks that WAL is written and that there is a correlation between affected rows and WAL records should be stable now. I still have no idea how to test full-page writes against regular updates; it seems very unstable. Please share ideas if any. Thanks!
Attachment
On Sun, Mar 15, 2020 at 09:52:18PM +0300, Kirill Bychik wrote: > > > > On Thu, Mar 5, 2020 at 8:55 PM Kirill Bychik <kirill.bychik@gmail.com> wrote: > After extensive thinking and some code diving, I did not manage to > come up with a sane idea on how to expose data about autovacuum WAL > usage. Must be the flu. > > > > Anyways, I believe this change could be bigger than FPI. I propose to > > > plan a separate patch for it, or even add it to the TODO after the > > > core patch of wal usage is merged. > > > > Just in case, if the problem is a lack of time, I'd be happy to help > > on that if needed. Otherwise, I'll definitely not try to block any > > progress for the feature as proposed. > > Please feel free to work on any extension of this patch idea. I lack > both time and knowledge to do it all by myself. I'm adding a 3rd patch on top of yours to expose the new WAL counters in pg_stat_database, for vacuum and autovacuum. I'm not really enthusiastic about this approach, but I didn't find a better one, and maybe this will raise some better ideas. The only sure thing is that we're not going to add a bunch of new fields in pg_stat_all_tables anyway. We can also drop this 3rd patch entirely if no one's happy about it, without impacting the first two. > > > Please expect a new patch version next week, with FPI counters added. > > Please find attached patch version 003, with FP writes and minor > corrections. Hope i use attachment versioning as expected in this > group :) Thanks! > Test had been reworked, and I believe it should be stable now, the > part which checks WAL is written and there is a correlation between > affected rows and WAL records. I still have no idea how to test > full-page writes against regular updates, it seems very unstable. > Please share ideas if any. I just reviewed the patches, and globally they look good to me. The way to detect full page images looks sensible, but I'm really not familiar with that code so additional review would be useful. I noticed that the new wal_write_fp_records field in pg_stat_statements wasn't used in the test. Since I have to add all the patches to make the cfbot happy, I slightly adapted the tests to reference the fp column too. There was also a minor issue in the documentation, as wal_records and wal_bytes were copy/pasted twice while wal_write_fp_records wasn't documented, so I also changed it. Let me know if you're ok with those changes.
Attachment
> > Please feel free to work on any extension of this patch idea. I lack > > both time and knowledge to do it all by myself. > > > I'm adding a 3rd patch on top of yours to expose the new WAL counters in > pg_stat_database, for vacuum and autovacuum. I'm not really enthiusiastic with > this approach but I didn't find better, and maybe this will raise some better > ideas. The only sure thing is that we're not going to add a bunch of new > fields in pg_stat_all_tables anyway. > > We can also drop this 3rd patch entirely if no one's happy about it without > impacting the first two. No objections about 3rd on my side, unless we miss the CF completely. As for the code, I believe: + walusage.wal_records = pgWalUsage.wal_records - + walusage_start.wal_records; + walusage.wal_fp_records = pgWalUsage.wal_fp_records - + walusage_start.wal_fp_records; + walusage.wal_bytes = pgWalUsage.wal_bytes - walusage_start.wal_bytes; Could be done much simpler via the utility: WalUsageAccumDiff(walusage, pgWalUsage, walusage_start); On a side note, I agree API to the buf/wal usage is far from perfect. > > Test had been reworked, and I believe it should be stable now, the > > part which checks WAL is written and there is a correlation between > > affected rows and WAL records. I still have no idea how to test > > full-page writes against regular updates, it seems very unstable. > > Please share ideas if any. > > > I just reviewed the patches, and it globally looks good to me. The way to > detect full page images looks sensible, but I'm really not familiar with that > code so additional review would be useful. > > I noticed that the new wal_write_fp_records field in pg_stat_statements wasn't > used in the test. Since I have to add all the patches to make the cfbot happy, > I slightly adapted the tests to reference the fp column too. There was also a > minor issue in the documentation, as wal_records and wal_bytes were copy/pasted > twice while wal_write_fp_records wasn't documented, so I also changed it. > > Let me know if you're ok with those changes. Sorry for not getting wal_fp_usage into the docs, my fault. As for the tests, please get somebody else to review this. I strongly believe checking full page writes here could be a source of instability.
On Tue, Mar 17, 2020 at 10:27:05PM +0300, Kirill Bychik wrote: > > > Please feel free to work on any extension of this patch idea. I lack > > > both time and knowledge to do it all by myself. > > > > I'm adding a 3rd patch on top of yours to expose the new WAL counters in > > pg_stat_database, for vacuum and autovacuum. I'm not really enthiusiastic with > > this approach but I didn't find better, and maybe this will raise some better > > ideas. The only sure thing is that we're not going to add a bunch of new > > fields in pg_stat_all_tables anyway. > > > > We can also drop this 3rd patch entirely if no one's happy about it without > > impacting the first two. > > No objections about 3rd on my side, unless we miss the CF completely. > > As for the code, I believe: > + walusage.wal_records = pgWalUsage.wal_records - > + walusage_start.wal_records; > + walusage.wal_fp_records = pgWalUsage.wal_fp_records - > + walusage_start.wal_fp_records; > + walusage.wal_bytes = pgWalUsage.wal_bytes - walusage_start.wal_bytes; > > Could be done much simpler via the utility: > WalUsageAccumDiff(walusage, pgWalUsage, walusage_start); Indeed, but this function is private to instrument.c. AFAICT pg_stat_statements is already duplicating similar code for buffers rather than having BufferUsageAccumDiff being exported, so I chose the same approach. I'd be in favor of exporting both functions though. > On a side note, I agree API to the buf/wal usage is far from perfect. Yes clearly. > > > Test had been reworked, and I believe it should be stable now, the > > > part which checks WAL is written and there is a correlation between > > > affected rows and WAL records. I still have no idea how to test > > > full-page writes against regular updates, it seems very unstable. > > > Please share ideas if any. > > > > > > I just reviewed the patches, and it globally looks good to me. The way to > > detect full page images looks sensible, but I'm really not familiar with that > > code so additional review would be useful. > > > > I noticed that the new wal_write_fp_records field in pg_stat_statements wasn't > > used in the test. Since I have to add all the patches to make the cfbot happy, > > I slightly adapted the tests to reference the fp column too. There was also a > > minor issue in the documentation, as wal_records and wal_bytes were copy/pasted > > twice while wal_write_fp_records wasn't documented, so I also changed it. > > > > Let me know if you're ok with those changes. > > Sorry for not getting wal_fp_usage into the docs, my fault. > > As for the tests, please get somebody else to review this. I strongly > believe checking full page writes here could be a source of > instability. I'm also a little bit dubious about it. The initial checkpoint should make things stable (of course unless full_page_writes is disabled), and Cfbot also seems happy about it. At least keeping it for the temporary tables test shouldn't be a problem.
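For reference, exporting the helper would boil down to something like the following sketch (mirroring BufferUsageAccumDiff(); the exact signature is an assumption since the function is currently private to instrument.c).

void
WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
{
    /* Accumulate into *dst the delta between two snapshots of the counters. */
    dst->wal_records += add->wal_records - sub->wal_records;
    dst->wal_fp_records += add->wal_fp_records - sub->wal_fp_records;
    dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
}

With that exported, pg_stat_statements could simply call WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start) instead of open-coding the three subtractions.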
> > > > Please feel free to work on any extension of this patch idea. I lack > > > > both time and knowledge to do it all by myself. > > > > > > I'm adding a 3rd patch on top of yours to expose the new WAL counters in > > > pg_stat_database, for vacuum and autovacuum. I'm not really enthiusiastic with > > > this approach but I didn't find better, and maybe this will raise some better > > > ideas. The only sure thing is that we're not going to add a bunch of new > > > fields in pg_stat_all_tables anyway. > > > > > > We can also drop this 3rd patch entirely if no one's happy about it without > > > impacting the first two. > > > > No objections about 3rd on my side, unless we miss the CF completely. > > > > As for the code, I believe: > > + walusage.wal_records = pgWalUsage.wal_records - > > + walusage_start.wal_records; > > + walusage.wal_fp_records = pgWalUsage.wal_fp_records - > > + walusage_start.wal_fp_records; > > + walusage.wal_bytes = pgWalUsage.wal_bytes - walusage_start.wal_bytes; > > > > Could be done much simpler via the utility: > > WalUsageAccumDiff(walusage, pgWalUsage, walusage_start); > > > Indeed, but this function is private to instrument.c. AFAICT > pg_stat_statements is already duplicating similar code for buffers rather than > having BufferUsageAccumDiff being exported, so I chose the same approach. > > I'd be in favor of exporting both functions though. > > On a side note, I agree API to the buf/wal usage is far from perfect. > > > Yes clearly. There is a higher-level Instrumentation API that can be used with the INSTRUMENT_WAL flag to collect the WAL usage information. I believe the instrumentation is widely used in the executor code, so it should not be a problem to collect instrumentation information at the autovacuum worker level. Just a recommendation/chat, though. I am happy with the way the data is collected now. If you commit this variant, please add a TODO to rework WAL usage to the common instr API. > > > > Test had been reworked, and I believe it should be stable now, the > > > > part which checks WAL is written and there is a correlation between > > > > affected rows and WAL records. I still have no idea how to test > > > > full-page writes against regular updates, it seems very unstable. > > > > Please share ideas if any. > > > > > > > > > I just reviewed the patches, and it globally looks good to me. The way to > > > detect full page images looks sensible, but I'm really not familiar with that > > > code so additional review would be useful. > > > > > > I noticed that the new wal_write_fp_records field in pg_stat_statements wasn't > > > used in the test. Since I have to add all the patches to make the cfbot happy, > > > I slightly adapted the tests to reference the fp column too. There was also a > > > minor issue in the documentation, as wal_records and wal_bytes were copy/pasted > > > twice while wal_write_fp_records wasn't documented, so I also changed it. > > > > > > Let me know if you're ok with those changes. > > > > Sorry for not getting wal_fp_usage into the docs, my fault. > > > > As for the tests, please get somebody else to review this. I strongly > > believe checking full page writes here could be a source of > > instability. > > > I'm also a little bit dubious about it. The initial checkpoint should make > things stable (of course unless full_page_writes is disabled), and Cfbot also > seems happy about it. At least keeping it for the temporary tables test > shouldn't be a problem.
Temp tables should show zero FPI WAL records, true :) I have no objections to the patch.
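As an illustration of the instrumentation route mentioned above, the collection could look roughly like this; a sketch assuming the patch adds an INSTRUMENT_WAL option and a walusage field next to the existing bufusage in Instrumentation.

/* Sketch: measure WAL usage of an arbitrary chunk of work through the
 * executor instrumentation API. INSTRUMENT_WAL and instr->walusage are
 * assumed additions mirroring INSTRUMENT_BUFFERS / instr->bufusage. */
Instrumentation *instr = InstrAlloc(1, INSTRUMENT_WAL);

InstrStartNode(instr);
/* ... the work to measure, e.g. one (auto)vacuum run ... */
InstrStopNode(instr, 0);

elog(LOG, "WAL usage: %ld records, %ld bytes",
     instr->walusage.wal_records,
     (long) instr->walusage.wal_bytes);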
On Wed, Mar 18, 2020 at 09:02:58AM +0300, Kirill Bychik wrote: > > There is a higher-level Instrumentation API that can be used with > INSTRUMENT_WAL flag to collect the wal usage information. I believe > the instrumentation is widely used in the executor code, so it should > not be a problem to colelct instrumentation information on autovacuum > worker level. > > Just a recommendation/chat, though. I am happy with the way the data > is collected now. If you commit this variant, please add a TODO to > rework wal usage to common instr API. The instrumentation is somewhat intended to be used with executor nodes, not backend commands. I don't see a real technical reason that would prevent that, but I prefer to keep things as-is for now, as it sounds less controversial. This is for the 3rd patch, which may not even be considered for this CF anyway. > > > As for the tests, please get somebody else to review this. I strongly > > > believe checking full page writes here could be a source of > > > instability. > > > > > > I'm also a little bit dubious about it. The initial checkpoint should make > > things stable (of course unless full_page_writes is disabled), and Cfbot also > > seems happy about it. At least keeping it for the temporary tables test > > shouldn't be a problem. > > Temp tables should show zero FPI WAL records, true :) > > I have no objections to the patch. I'm attaching a v5 with fp records only for temp tables, so there's no risk of instability. As I previously said I'm fine with your two patches, so unless you have objections on the fpi test for temp tables or the documentation changes, I believe those should be ready for committer.
Attachment
> > There is a higher-level Instrumentation API that can be used with > > INSTRUMENT_WAL flag to collect the wal usage information. I believe > > the instrumentation is widely used in the executor code, so it should > > not be a problem to colelct instrumentation information on autovacuum > > worker level. > > > > Just a recommendation/chat, though. I am happy with the way the data > > is collected now. If you commit this variant, please add a TODO to > > rework wal usage to common instr API. > > > The instrumentation is somewhat intended to be used with executor nodes, not > backend commands. I don't see real technical reason that would prevent that, > but I prefer to keep things as-is for now, as it sound less controversial. > This is for the 3rd patch, which may not even be considered for this CF anyway. > > > > > > As for the tests, please get somebody else to review this. I strongly > > > > believe checking full page writes here could be a source of > > > > instability. > > > > > > > > > I'm also a little bit dubious about it. The initial checkpoint should make > > > things stable (of course unless full_page_writes is disabled), and Cfbot also > > > seems happy about it. At least keeping it for the temporary tables test > > > shouldn't be a problem. > > > > Temp tables should show zero FPI WAL records, true :) > > > > I have no objections to the patch. > > > I'm attaching a v5 with fp records only for temp tables, so there's no risk of > instability. As I previously said I'm fine with your two patches, so unless > you have objections on the fpi test for temp tables or the documentation > changes, I believe those should be ready for committer. No objections on my side either. Thank you for your review, time and efforts!
On Wed, Mar 18, 2020 at 08:48:17PM +0300, Kirill Bychik wrote: > > I'm attaching a v5 with fp records only for temp tables, so there's no risk of > > instability. As I previously said I'm fine with your two patches, so unless > > you have objections on the fpi test for temp tables or the documentation > > changes, I believe those should be ready for committer. > > No objections on my side either. Thank you for your review, time and efforts! Great, thanks also for the patches and efforts! I'll mark the entry as RFC.
On 2020/03/19 2:19, Julien Rouhaud wrote: > On Wed, Mar 18, 2020 at 09:02:58AM +0300, Kirill Bychik wrote: >> >> There is a higher-level Instrumentation API that can be used with >> INSTRUMENT_WAL flag to collect the wal usage information. I believe >> the instrumentation is widely used in the executor code, so it should >> not be a problem to colelct instrumentation information on autovacuum >> worker level. >> >> Just a recommendation/chat, though. I am happy with the way the data >> is collected now. If you commit this variant, please add a TODO to >> rework wal usage to common instr API. > > > The instrumentation is somewhat intended to be used with executor nodes, not > backend commands. I don't see real technical reason that would prevent that, > but I prefer to keep things as-is for now, as it sound less controversial. > This is for the 3rd patch, which may not even be considered for this CF anyway. > > >>>> As for the tests, please get somebody else to review this. I strongly >>>> believe checking full page writes here could be a source of >>>> instability. >>> >>> >>> I'm also a little bit dubious about it. The initial checkpoint should make >>> things stable (of course unless full_page_writes is disabled), and Cfbot also >>> seems happy about it. At least keeping it for the temporary tables test >>> shouldn't be a problem. >> >> Temp tables should show zero FPI WAL records, true :) >> >> I have no objections to the patch. > > > I'm attaching a v5 with fp records only for temp tables, so there's no risk of > instability. As I previously said I'm fine with your two patches, so unless > you have objections on the fpi test for temp tables or the documentation > changes, I believe those should be ready for committer. You added the columns into pg_stat_database, but seem to have forgotten to update the documentation for pg_stat_database. Is it really reasonable to add the columns for vacuum's WAL usage into pg_stat_database? I'm not sure how useful the information about the amount of WAL generated by vacuum per database is. Isn't it better to make VACUUM VERBOSE and the autovacuum log include that information instead, to see how much WAL each vacuum activity generates? Sorry if this has already been discussed upthread. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
On Thu, Mar 19, 2020 at 09:03:02PM +0900, Fujii Masao wrote: > > On 2020/03/19 2:19, Julien Rouhaud wrote: > > > > I'm attaching a v5 with fp records only for temp tables, so there's no risk of > > instability. As I previously said I'm fine with your two patches, so unless > > you have objections on the fpi test for temp tables or the documentation > > changes, I believe those should be ready for committer. > > You added the columns into pg_stat_database, but seem to forget to > update the document for pg_stat_database. Ah right, I totally missed that when I tried to clean up the original POC. > Is it really reasonable to add the columns for vacuum's WAL usage into > pg_stat_database? I'm not sure how much the information about > the amount of WAL generated by vacuum per database is useful. The amount per database isn't really useful, but I didn't have a better idea on how to expose (auto)vacuum WAL usage until this: > Isn't it better to make VACUUM VERBOSE and autovacuum log include > that information, instead, to see how much each vacuum activity > generates the WAL? Sorry if this discussion has already been done > upthread. That's a way better idea! I'm attaching the full patchset with the 3rd patch to use this approach instead. There's a bit of duplicate code for computing the WalUsage, as I didn't find a better way to avoid that without exposing WalUsageAccumDiff(). Autovacuum log sample: 2020-03-19 15:49:05.708 CET [5843] LOG: automatic vacuum of table "rjuju.public.t1": index scans: 0 pages: 0 removed, 2213 remain, 0 skipped due to pins, 0 skipped frozen tuples: 250000 removed, 250000 remain, 0 are dead but not yet removable, oldest xmin: 502 buffer usage: 4448 hits, 4 misses, 4 dirtied avg read rate: 0.160 MB/s, avg write rate: 0.160 MB/s system usage: CPU: user: 0.13 s, system: 0.00 s, elapsed: 0.19 s WAL usage: 6643 records, 4 full page records, 1402679 bytes VACUUM log sample: # vacuum VERBOSE t1; INFO: vacuuming "public.t1" INFO: "t1": removed 50000 row versions in 443 pages INFO: "t1": found 50000 removable, 0 nonremovable row versions in 443 out of 443 pages DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 512 There were 50000 unused item identifiers. Skipped 0 pages due to buffer pins, 0 frozen pages. 0 pages are entirely empty. 1332 WAL records, 4 WAL full page records, 306901 WAL bytes CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s. INFO: "t1": truncated 443 to 0 pages DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s INFO: vacuuming "pg_toast.pg_toast_16385" INFO: index "pg_toast_16385_index" now contains 0 row versions in 1 pages DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. INFO: "pg_toast_16385": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 513 There were 0 unused item identifiers. Skipped 0 pages due to buffer pins, 0 frozen pages. 0 pages are entirely empty. 0 WAL records, 0 WAL full page records, 0 WAL bytes CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. VACUUM Note that the 3rd patch is an addition on top of Kirill's original patch, as this is information that would have been greatly helpful for some performance issues I had to investigate recently. I'd be happy to have it land into v13, but if that's controversial or too late I'm happy to postpone it to v14 if the infrastructure added in Kirill's patches can make it to v13.
Attachment
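For reference, the new line in the autovacuum log output above could be produced along these lines; a sketch only, where walusage is assumed to hold the difference between the counters after and before the vacuum work, and the variable names are assumptions.

/* Autovacuum case: append WAL usage to the log message assembled in
 * heap_vacuum_rel(). */
appendStringInfo(&buf,
                 _("WAL usage: %ld records, %ld full page records, %llu bytes\n"),
                 walusage.wal_records,
                 walusage.wal_fp_records,
                 (unsigned long long) walusage.wal_bytes);

/* The VACUUM VERBOSE lines above would be appended to the corresponding
 * errdetail message in the heap vacuum code in the same way. */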
> > > I'm attaching a v5 with fp records only for temp tables, so there's no risk of > > > instability. As I previously said I'm fine with your two patches, so unless > > > you have objections on the fpi test for temp tables or the documentation > > > changes, I believe those should be ready for committer. > > > > You added the columns into pg_stat_database, but seem to forget to > > update the document for pg_stat_database. > > Ah right, I totally missed that when I tried to clean up the original POC. > > > Is it really reasonable to add the columns for vacuum's WAL usage into > > pg_stat_database? I'm not sure how much the information about > > the amount of WAL generated by vacuum per database is useful. > > The amount per database isn't really useful, but I didn't had a better idea on > how to expose (auto)vacuum WAL usage until this: > > > Isn't it better to make VACUUM VERBOSE and autovacuum log include > > that information, instead, to see how much each vacuum activity > > generates the WAL? Sorry if this discussion has already been done > > upthread. > > That's a way better idea! I'm attaching the full patchset with the 3rd patch > to use this approach instead. There's a bit a duplicate code for computing the > WalUsage, as I didn't find a better way to avoid that without exposing > WalUsageAccumDiff(). > > Autovacuum log sample: > > 2020-03-19 15:49:05.708 CET [5843] LOG: automatic vacuum of table "rjuju.public.t1": index scans: 0 > pages: 0 removed, 2213 remain, 0 skipped due to pins, 0 skipped frozen > tuples: 250000 removed, 250000 remain, 0 are dead but not yet removable, oldest xmin: 502 > buffer usage: 4448 hits, 4 misses, 4 dirtied > avg read rate: 0.160 MB/s, avg write rate: 0.160 MB/s > system usage: CPU: user: 0.13 s, system: 0.00 s, elapsed: 0.19 s > WAL usage: 6643 records, 4 full page records, 1402679 bytes > > VACUUM log sample: > > # vacuum VERBOSE t1; > INFO: vacuuming "public.t1" > INFO: "t1": removed 50000 row versions in 443 pages > INFO: "t1": found 50000 removable, 0 nonremovable row versions in 443 out of 443 pages > DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 512 > There were 50000 unused item identifiers. > Skipped 0 pages due to buffer pins, 0 frozen pages. > 0 pages are entirely empty. > 1332 WAL records, 4 WAL full page records, 306901 WAL bytes > CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s. > INFO: "t1": truncated 443 to 0 pages > DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s > INFO: vacuuming "pg_toast.pg_toast_16385" > INFO: index "pg_toast_16385_index" now contains 0 row versions in 1 pages > DETAIL: 0 index row versions were removed. > 0 index pages have been deleted, 0 are currently reusable. > CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. > INFO: "pg_toast_16385": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages > DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 513 > There were 0 unused item identifiers. > Skipped 0 pages due to buffer pins, 0 frozen pages. > 0 pages are entirely empty. > 0 WAL records, 0 WAL full page records, 0 WAL bytes > CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. > VACUUM > > Note that the 3rd patch is an addition on top of Kirill's original patch, as > this is information that would have been greatly helpful to investigate in some > performance issues I had to investigate recently. 
> I'd be happy to have it land > into v13, but if that's controversial or too late I'm happy to postpone it to > v14 if the infrastructure added in Kirill's patches can make it to v13. Dear all, can we please focus on getting the core patch committed? Given the uncertainty regarding autovacuum stats, can we please get parts 1 and 2 into the codebase, and think about exposing autovacuum stats later?
On 2020/03/23 7:32, Kirill Bychik wrote: >>>> I'm attaching a v5 with fp records only for temp tables, so there's no risk of >>>> instability. As I previously said I'm fine with your two patches, so unless >>>> you have objections on the fpi test for temp tables or the documentation >>>> changes, I believe those should be ready for committer. >>> >>> You added the columns into pg_stat_database, but seem to forget to >>> update the document for pg_stat_database. >> >> Ah right, I totally missed that when I tried to clean up the original POC. >> >>> Is it really reasonable to add the columns for vacuum's WAL usage into >>> pg_stat_database? I'm not sure how much the information about >>> the amount of WAL generated by vacuum per database is useful. >> >> The amount per database isn't really useful, but I didn't had a better idea on >> how to expose (auto)vacuum WAL usage until this: >> >>> Isn't it better to make VACUUM VERBOSE and autovacuum log include >>> that information, instead, to see how much each vacuum activity >>> generates the WAL? Sorry if this discussion has already been done >>> upthread. >> >> That's a way better idea! I'm attaching the full patchset with the 3rd patch >> to use this approach instead. There's a bit a duplicate code for computing the >> WalUsage, as I didn't find a better way to avoid that without exposing >> WalUsageAccumDiff(). >> >> Autovacuum log sample: >> >> 2020-03-19 15:49:05.708 CET [5843] LOG: automatic vacuum of table "rjuju.public.t1": index scans: 0 >> pages: 0 removed, 2213 remain, 0 skipped due to pins, 0 skipped frozen >> tuples: 250000 removed, 250000 remain, 0 are dead but not yet removable, oldest xmin: 502 >> buffer usage: 4448 hits, 4 misses, 4 dirtied >> avg read rate: 0.160 MB/s, avg write rate: 0.160 MB/s >> system usage: CPU: user: 0.13 s, system: 0.00 s, elapsed: 0.19 s >> WAL usage: 6643 records, 4 full page records, 1402679 bytes >> >> VACUUM log sample: >> >> # vacuum VERBOSE t1; >> INFO: vacuuming "public.t1" >> INFO: "t1": removed 50000 row versions in 443 pages >> INFO: "t1": found 50000 removable, 0 nonremovable row versions in 443 out of 443 pages >> DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 512 >> There were 50000 unused item identifiers. >> Skipped 0 pages due to buffer pins, 0 frozen pages. >> 0 pages are entirely empty. >> 1332 WAL records, 4 WAL full page records, 306901 WAL bytes >> CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s. >> INFO: "t1": truncated 443 to 0 pages >> DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s >> INFO: vacuuming "pg_toast.pg_toast_16385" >> INFO: index "pg_toast_16385_index" now contains 0 row versions in 1 pages >> DETAIL: 0 index row versions were removed. >> 0 index pages have been deleted, 0 are currently reusable. >> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. >> INFO: "pg_toast_16385": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages >> DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 513 >> There were 0 unused item identifiers. >> Skipped 0 pages due to buffer pins, 0 frozen pages. >> 0 pages are entirely empty. >> 0 WAL records, 0 WAL full page records, 0 WAL bytes >> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. >> VACUUM >> >> Note that the 3rd patch is an addition on top of Kirill's original patch, as >> this is information that would have been greatly helpful to investigate in some >> performance issues I had to investigate recently. 
>> I'd be happy to have it land >> into v13, but if that's controversial or too late I'm happy to postpone it to >> v14 if the infrastructure added in Kirill's patches can make it to v13. > > Dear all, can we please focus on getting the core patch committed? > Given the uncertainity regarding autovacuum stats, can we please get > parts 1 and 2 into the codebase, and think about exposing autovacuum > stats later? Here are the comments for the 0001 patch. + /* + * Report a full page image constructed for the WAL record + */ + pgWalUsage.wal_fp_records++; Isn't it better to use "fpw" or "fpi" for the variable name rather than "fp" here? In other places, "fpw" and "fpi" are used for full page writes/image. ISTM that this counter could be incorrect if XLogInsertRecord() determines that it needs to recalculate whether an FPI is necessary or not. No? IOW, this issue could happen if XLogInsert() calls XLogRecordAssemble() multiple times in its do-while loop. Isn't this problematic? + long wal_bytes; /* size of wal records produced */ Isn't it safer to use uint64 (i.e., XLogRecPtr) as the type of this variable rather than long? + shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, bufusage_space); bufusage_space should be walusage_space here? /* * Finish parallel execution. We wait for parallel workers to finish, and * accumulate their buffer usage. */ There are some comments mentioning buffer usage, in execParallel.c. For example, the top comment for ExecParallelFinish(), as the above. These should be updated. Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
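To make the review points concrete, the counter struct could end up looking like this; a sketch with the suggestions above applied, and the final names are of course up to the patch author.

typedef struct WalUsage
{
    long    wal_records;    /* # of WAL records produced */
    long    wal_fpi;        /* # of WAL full page images produced */
    uint64  wal_bytes;      /* size of WAL records produced, in bytes */
} WalUsage;

/*
 * To address the re-assembly concern: count images into a local variable
 * inside XLogRecordAssemble() (e.g. num_fpi++ whenever include_image is
 * true) and fold it into pgWalUsage only after XLogInsertRecord() has
 * actually inserted the assembled record, so that restarting the assembly
 * loop does not count the same image twice.
 */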
On 2020/03/23 21:01, Fujii Masao wrote: > > > On 2020/03/23 7:32, Kirill Bychik wrote: >>>>> I'm attaching a v5 with fp records only for temp tables, so there's no risk of >>>>> instability. As I previously said I'm fine with your two patches, so unless >>>>> you have objections on the fpi test for temp tables or the documentation >>>>> changes, I believe those should be ready for committer. >>>> >>>> You added the columns into pg_stat_database, but seem to forget to >>>> update the document for pg_stat_database. >>> >>> Ah right, I totally missed that when I tried to clean up the original POC. >>> >>>> Is it really reasonable to add the columns for vacuum's WAL usage into >>>> pg_stat_database? I'm not sure how much the information about >>>> the amount of WAL generated by vacuum per database is useful. >>> >>> The amount per database isn't really useful, but I didn't had a better idea on >>> how to expose (auto)vacuum WAL usage until this: >>> >>>> Isn't it better to make VACUUM VERBOSE and autovacuum log include >>>> that information, instead, to see how much each vacuum activity >>>> generates the WAL? Sorry if this discussion has already been done >>>> upthread. >>> >>> That's a way better idea! I'm attaching the full patchset with the 3rd patch >>> to use this approach instead. There's a bit a duplicate code for computing the >>> WalUsage, as I didn't find a better way to avoid that without exposing >>> WalUsageAccumDiff(). >>> >>> Autovacuum log sample: >>> >>> 2020-03-19 15:49:05.708 CET [5843] LOG: automatic vacuum of table "rjuju.public.t1": index scans: 0 >>> pages: 0 removed, 2213 remain, 0 skipped due to pins, 0 skipped frozen >>> tuples: 250000 removed, 250000 remain, 0 are dead but not yet removable, oldest xmin: 502 >>> buffer usage: 4448 hits, 4 misses, 4 dirtied >>> avg read rate: 0.160 MB/s, avg write rate: 0.160 MB/s >>> system usage: CPU: user: 0.13 s, system: 0.00 s, elapsed: 0.19 s >>> WAL usage: 6643 records, 4 full page records, 1402679 bytes >>> >>> VACUUM log sample: >>> >>> # vacuum VERBOSE t1; >>> INFO: vacuuming "public.t1" >>> INFO: "t1": removed 50000 row versions in 443 pages >>> INFO: "t1": found 50000 removable, 0 nonremovable row versions in 443 out of 443 pages >>> DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 512 >>> There were 50000 unused item identifiers. >>> Skipped 0 pages due to buffer pins, 0 frozen pages. >>> 0 pages are entirely empty. >>> 1332 WAL records, 4 WAL full page records, 306901 WAL bytes >>> CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s. >>> INFO: "t1": truncated 443 to 0 pages >>> DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s >>> INFO: vacuuming "pg_toast.pg_toast_16385" >>> INFO: index "pg_toast_16385_index" now contains 0 row versions in 1 pages >>> DETAIL: 0 index row versions were removed. >>> 0 index pages have been deleted, 0 are currently reusable. >>> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. >>> INFO: "pg_toast_16385": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages >>> DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 513 >>> There were 0 unused item identifiers. >>> Skipped 0 pages due to buffer pins, 0 frozen pages. >>> 0 pages are entirely empty. >>> 0 WAL records, 0 WAL full page records, 0 WAL bytes >>> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. 
>>> VACUUM >>> >>> Note that the 3rd patch is an addition on top of Kirill's original patch, as >>> this is information that would have been greatly helpful to investigate in some >>> performance issues I had to investigate recently. I'd be happy to have it land >>> into v13, but if that's controversial or too late I'm happy to postpone it to >>> v14 if the infrastructure added in Kirill's patches can make it to v13. >> >> Dear all, can we please focus on getting the core patch committed? >> Given the uncertainity regarding autovacuum stats, can we please get >> parts 1 and 2 into the codebase, and think about exposing autovacuum >> stats later? > > Here are the comments for 0001 patch. > > + /* > + * Report a full page image constructed for the WAL record > + */ > + pgWalUsage.wal_fp_records++; > > Isn't it better to use "fpw" or "fpi" for the variable name rather than > "fp" here? In other places, "fpw" and "fpi" are used for full page > writes/image. > > ISTM that this counter could be incorrect if XLogInsertRecord() determines to > calculate again whether FPI is necessary or not. No? IOW, this issue could > happen if XLogInsert() calls XLogRecordAssemble() multiple times in > its do-while loop. Isn't this problematic? > > + long wal_bytes; /* size of wal records produced */ > > Isn't it safer to use uint64 (i.e., XLogRecPtr) as the type of this variable > rather than long? > > + shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, bufusage_space); > > bufusage_space should be walusage_space here? > > /* > * Finish parallel execution. We wait for parallel workers to finish, and > * accumulate their buffer usage. > */ > > There are some comments mentioning buffer usage, in execParallel.c. > For example, the top comment for ExecParallelFinish(), as the above. > These should be updated. Here are the comments for 0002 patch. + OUT wal_write_bytes int8, + OUT wal_write_records int8, + OUT wal_write_fp_records int8 Isn't "write" part in the column names confusing because it's WAL *generated* (not written) by the statement? +RETURNS SETOF record +AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4' +LANGUAGE C STRICT VOLATILE; PARALLEL SAFE should be specified? +/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */ ISTM it's good timing to have also pg_stat_statements--1.8.sql since the definition of pg_stat_statements() is changed. Thought? +-- CHECKPOINT before WAL tests to ensure test stability +CHECKPOINT; Is this true? I thought you added this because the number of FPI should be larger than zero in the subsequent test. No? But there seems no such test. I'm not excited about adding the test checking the number of FPI because it looks fragile, though... +UPDATE pgss_test SET b = '333' WHERE a = 3 \; +UPDATE pgss_test SET b = '444' WHERE a = 4 ; Could you tell me why several queries need to be run to test the WAL usage? Isn't running a few query enough for the test purpase? Regards, -- Fujii Masao NTT DATA CORPORATION Advanced Platform Technology Group Research and Development Headquarters
On Mon, Mar 23, 2020 at 3:24 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > On 2020/03/23 21:01, Fujii Masao wrote: > > > > > > On 2020/03/23 7:32, Kirill Bychik wrote: > >>>>> I'm attaching a v5 with fp records only for temp tables, so there's no risk of > >>>>> instability. As I previously said I'm fine with your two patches, so unless > >>>>> you have objections on the fpi test for temp tables or the documentation > >>>>> changes, I believe those should be ready for committer. > >>>> > >>>> You added the columns into pg_stat_database, but seem to forget to > >>>> update the document for pg_stat_database. > >>> > >>> Ah right, I totally missed that when I tried to clean up the original POC. > >>> > >>>> Is it really reasonable to add the columns for vacuum's WAL usage into > >>>> pg_stat_database? I'm not sure how much the information about > >>>> the amount of WAL generated by vacuum per database is useful. > >>> > >>> The amount per database isn't really useful, but I didn't had a better idea on > >>> how to expose (auto)vacuum WAL usage until this: > >>> > >>>> Isn't it better to make VACUUM VERBOSE and autovacuum log include > >>>> that information, instead, to see how much each vacuum activity > >>>> generates the WAL? Sorry if this discussion has already been done > >>>> upthread. > >>> > >>> That's a way better idea! I'm attaching the full patchset with the 3rd patch > >>> to use this approach instead. There's a bit a duplicate code for computing the > >>> WalUsage, as I didn't find a better way to avoid that without exposing > >>> WalUsageAccumDiff(). > >>> > >>> Autovacuum log sample: > >>> > >>> 2020-03-19 15:49:05.708 CET [5843] LOG: automatic vacuum of table "rjuju.public.t1": index scans: 0 > >>> pages: 0 removed, 2213 remain, 0 skipped due to pins, 0 skipped frozen > >>> tuples: 250000 removed, 250000 remain, 0 are dead but not yet removable, oldest xmin: 502 > >>> buffer usage: 4448 hits, 4 misses, 4 dirtied > >>> avg read rate: 0.160 MB/s, avg write rate: 0.160 MB/s > >>> system usage: CPU: user: 0.13 s, system: 0.00 s, elapsed: 0.19 s > >>> WAL usage: 6643 records, 4 full page records, 1402679 bytes > >>> > >>> VACUUM log sample: > >>> > >>> # vacuum VERBOSE t1; > >>> INFO: vacuuming "public.t1" > >>> INFO: "t1": removed 50000 row versions in 443 pages > >>> INFO: "t1": found 50000 removable, 0 nonremovable row versions in 443 out of 443 pages > >>> DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 512 > >>> There were 50000 unused item identifiers. > >>> Skipped 0 pages due to buffer pins, 0 frozen pages. > >>> 0 pages are entirely empty. > >>> 1332 WAL records, 4 WAL full page records, 306901 WAL bytes > >>> CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s. > >>> INFO: "t1": truncated 443 to 0 pages > >>> DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s > >>> INFO: vacuuming "pg_toast.pg_toast_16385" > >>> INFO: index "pg_toast_16385_index" now contains 0 row versions in 1 pages > >>> DETAIL: 0 index row versions were removed. > >>> 0 index pages have been deleted, 0 are currently reusable. > >>> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. > >>> INFO: "pg_toast_16385": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages > >>> DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 513 > >>> There were 0 unused item identifiers. > >>> Skipped 0 pages due to buffer pins, 0 frozen pages. > >>> 0 pages are entirely empty. 
> >>> 0 WAL records, 0 WAL full page records, 0 WAL bytes > >>> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. > >>> VACUUM > >>> > >>> Note that the 3rd patch is an addition on top of Kirill's original patch, as > >>> this is information that would have been greatly helpful to investigate in some > >>> performance issues I had to investigate recently. I'd be happy to have it land > >>> into v13, but if that's controversial or too late I'm happy to postpone it to > >>> v14 if the infrastructure added in Kirill's patches can make it to v13. > >> > >> Dear all, can we please focus on getting the core patch committed? > >> Given the uncertainity regarding autovacuum stats, can we please get > >> parts 1 and 2 into the codebase, and think about exposing autovacuum > >> stats later? > > > > Here are the comments for 0001 patch. > > > > + /* > > + * Report a full page image constructed for the WAL record > > + */ > > + pgWalUsage.wal_fp_records++; > > > > Isn't it better to use "fpw" or "fpi" for the variable name rather than > > "fp" here? In other places, "fpw" and "fpi" are used for full page > > writes/image. > > > > ISTM that this counter could be incorrect if XLogInsertRecord() determines to > > calculate again whether FPI is necessary or not. No? IOW, this issue could > > happen if XLogInsert() calls XLogRecordAssemble() multiple times in > > its do-while loop. Isn't this problematic? > > > > + long wal_bytes; /* size of wal records produced */ > > > > Isn't it safer to use uint64 (i.e., XLogRecPtr) as the type of this variable > > rather than long? > > > > + shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, bufusage_space); > > > > bufusage_space should be walusage_space here? > > > > /* > > * Finish parallel execution. We wait for parallel workers to finish, and > > * accumulate their buffer usage. > > */ > > > > There are some comments mentioning buffer usage, in execParallel.c. > > For example, the top comment for ExecParallelFinish(), as the above. > > These should be updated. > > Here are the comments for 0002 patch. > > + OUT wal_write_bytes int8, > + OUT wal_write_records int8, > + OUT wal_write_fp_records int8 > > Isn't "write" part in the column names confusing because it's WAL > *generated* (not written) by the statement? > > +RETURNS SETOF record > +AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4' > +LANGUAGE C STRICT VOLATILE; > > PARALLEL SAFE should be specified? > > +/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */ > > ISTM it's good timing to have also pg_stat_statements--1.8.sql since > the definition of pg_stat_statements() is changed. Thought? > > +-- CHECKPOINT before WAL tests to ensure test stability > +CHECKPOINT; > > Is this true? I thought you added this because the number of FPI > should be larger than zero in the subsequent test. No? But there > seems no such test. I'm not excited about adding the test checking > the number of FPI because it looks fragile, though... > > +UPDATE pgss_test SET b = '333' WHERE a = 3 \; > +UPDATE pgss_test SET b = '444' WHERE a = 4 ; > > Could you tell me why several queries need to be run to test > the WAL usage? Isn't running a few query enough for the test purpase? FTR I marked the commitfest entry as waiting on author. Kirill do you think you'll have time to address Fuji-san's review shortly? The end of the commitfest is approaching quite fast :(
> > >>>>> I'm attaching a v5 with fp records only for temp tables, so there's no risk of > > >>>>> instability. As I previously said I'm fine with your two patches, so unless > > >>>>> you have objections on the fpi test for temp tables or the documentation > > >>>>> changes, I believe those should be ready for committer. > > >>>> > > >>>> You added the columns into pg_stat_database, but seem to forget to > > >>>> update the document for pg_stat_database. > > >>> > > >>> Ah right, I totally missed that when I tried to clean up the original POC. > > >>> > > >>>> Is it really reasonable to add the columns for vacuum's WAL usage into > > >>>> pg_stat_database? I'm not sure how much the information about > > >>>> the amount of WAL generated by vacuum per database is useful. > > >>> > > >>> The amount per database isn't really useful, but I didn't had a better idea on > > >>> how to expose (auto)vacuum WAL usage until this: > > >>> > > >>>> Isn't it better to make VACUUM VERBOSE and autovacuum log include > > >>>> that information, instead, to see how much each vacuum activity > > >>>> generates the WAL? Sorry if this discussion has already been done > > >>>> upthread. > > >>> > > >>> That's a way better idea! I'm attaching the full patchset with the 3rd patch > > >>> to use this approach instead. There's a bit a duplicate code for computing the > > >>> WalUsage, as I didn't find a better way to avoid that without exposing > > >>> WalUsageAccumDiff(). > > >>> > > >>> Autovacuum log sample: > > >>> > > >>> 2020-03-19 15:49:05.708 CET [5843] LOG: automatic vacuum of table "rjuju.public.t1": index scans: 0 > > >>> pages: 0 removed, 2213 remain, 0 skipped due to pins, 0 skipped frozen > > >>> tuples: 250000 removed, 250000 remain, 0 are dead but not yet removable, oldest xmin: 502 > > >>> buffer usage: 4448 hits, 4 misses, 4 dirtied > > >>> avg read rate: 0.160 MB/s, avg write rate: 0.160 MB/s > > >>> system usage: CPU: user: 0.13 s, system: 0.00 s, elapsed: 0.19 s > > >>> WAL usage: 6643 records, 4 full page records, 1402679 bytes > > >>> > > >>> VACUUM log sample: > > >>> > > >>> # vacuum VERBOSE t1; > > >>> INFO: vacuuming "public.t1" > > >>> INFO: "t1": removed 50000 row versions in 443 pages > > >>> INFO: "t1": found 50000 removable, 0 nonremovable row versions in 443 out of 443 pages > > >>> DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 512 > > >>> There were 50000 unused item identifiers. > > >>> Skipped 0 pages due to buffer pins, 0 frozen pages. > > >>> 0 pages are entirely empty. > > >>> 1332 WAL records, 4 WAL full page records, 306901 WAL bytes > > >>> CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s. > > >>> INFO: "t1": truncated 443 to 0 pages > > >>> DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s > > >>> INFO: vacuuming "pg_toast.pg_toast_16385" > > >>> INFO: index "pg_toast_16385_index" now contains 0 row versions in 1 pages > > >>> DETAIL: 0 index row versions were removed. > > >>> 0 index pages have been deleted, 0 are currently reusable. > > >>> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. > > >>> INFO: "pg_toast_16385": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages > > >>> DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 513 > > >>> There were 0 unused item identifiers. > > >>> Skipped 0 pages due to buffer pins, 0 frozen pages. > > >>> 0 pages are entirely empty. > > >>> 0 WAL records, 0 WAL full page records, 0 WAL bytes > > >>> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. 
> > >>> VACUUM > > >>> > > >>> Note that the 3rd patch is an addition on top of Kirill's original patch, as > > >>> this is information that would have been greatly helpful to investigate in some > > >>> performance issues I had to investigate recently. I'd be happy to have it land > > >>> into v13, but if that's controversial or too late I'm happy to postpone it to > > >>> v14 if the infrastructure added in Kirill's patches can make it to v13. > > >> > > >> Dear all, can we please focus on getting the core patch committed? > > >> Given the uncertainity regarding autovacuum stats, can we please get > > >> parts 1 and 2 into the codebase, and think about exposing autovacuum > > >> stats later? > > > > > > Here are the comments for 0001 patch. > > > > > > + /* > > > + * Report a full page image constructed for the WAL record > > > + */ > > > + pgWalUsage.wal_fp_records++; > > > > > > Isn't it better to use "fpw" or "fpi" for the variable name rather than > > > "fp" here? In other places, "fpw" and "fpi" are used for full page > > > writes/image. > > > > > > ISTM that this counter could be incorrect if XLogInsertRecord() determines to > > > calculate again whether FPI is necessary or not. No? IOW, this issue could > > > happen if XLogInsert() calls XLogRecordAssemble() multiple times in > > > its do-while loop. Isn't this problematic? > > > > > > + long wal_bytes; /* size of wal records produced */ > > > > > > Isn't it safer to use uint64 (i.e., XLogRecPtr) as the type of this variable > > > rather than long? > > > > > > + shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, bufusage_space); > > > > > > bufusage_space should be walusage_space here? > > > > > > /* > > > * Finish parallel execution. We wait for parallel workers to finish, and > > > * accumulate their buffer usage. > > > */ > > > > > > There are some comments mentioning buffer usage, in execParallel.c. > > > For example, the top comment for ExecParallelFinish(), as the above. > > > These should be updated. > > > > Here are the comments for 0002 patch. > > > > + OUT wal_write_bytes int8, > > + OUT wal_write_records int8, > > + OUT wal_write_fp_records int8 > > > > Isn't "write" part in the column names confusing because it's WAL > > *generated* (not written) by the statement? > > > > +RETURNS SETOF record > > +AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4' > > +LANGUAGE C STRICT VOLATILE; > > > > PARALLEL SAFE should be specified? > > > > +/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */ > > > > ISTM it's good timing to have also pg_stat_statements--1.8.sql since > > the definition of pg_stat_statements() is changed. Thought? > > > > +-- CHECKPOINT before WAL tests to ensure test stability > > +CHECKPOINT; > > > > Is this true? I thought you added this because the number of FPI > > should be larger than zero in the subsequent test. No? But there > > seems no such test. I'm not excited about adding the test checking > > the number of FPI because it looks fragile, though... > > > > +UPDATE pgss_test SET b = '333' WHERE a = 3 \; > > +UPDATE pgss_test SET b = '444' WHERE a = 4 ; > > > > Could you tell me why several queries need to be run to test > > the WAL usage? Isn't running a few query enough for the test purpase? > > FTR I marked the commitfest entry as waiting on author. > > Kirill do you think you'll have time to address Fuji-san's review > shortly? The end of the commitfest is approaching quite fast :( All these are really valuable objections. 
Unfortunately, I won't be able to get all of this sorted out soon, due to a total lack of time. I would be very glad if somebody could step in for this patch.
On Fri, Mar 27, 2020 at 8:21 PM Kirill Bychik <kirill.bychik@gmail.com> wrote: > > > > >>>>> I'm attaching a v5 with fp records only for temp tables, so there's no risk of > > > >>>>> instability. As I previously said I'm fine with your two patches, so unless > > > >>>>> you have objections on the fpi test for temp tables or the documentation > > > >>>>> changes, I believe those should be ready for committer. > > > >>>> > > > >>>> You added the columns into pg_stat_database, but seem to forget to > > > >>>> update the document for pg_stat_database. > > > >>> > > > >>> Ah right, I totally missed that when I tried to clean up the original POC. > > > >>> > > > >>>> Is it really reasonable to add the columns for vacuum's WAL usage into > > > >>>> pg_stat_database? I'm not sure how much the information about > > > >>>> the amount of WAL generated by vacuum per database is useful. > > > >>> > > > >>> The amount per database isn't really useful, but I didn't had a better idea on > > > >>> how to expose (auto)vacuum WAL usage until this: > > > >>> > > > >>>> Isn't it better to make VACUUM VERBOSE and autovacuum log include > > > >>>> that information, instead, to see how much each vacuum activity > > > >>>> generates the WAL? Sorry if this discussion has already been done > > > >>>> upthread. > > > >>> > > > >>> That's a way better idea! I'm attaching the full patchset with the 3rd patch > > > >>> to use this approach instead. There's a bit a duplicate code for computing the > > > >>> WalUsage, as I didn't find a better way to avoid that without exposing > > > >>> WalUsageAccumDiff(). > > > >>> > > > >>> Autovacuum log sample: > > > >>> > > > >>> 2020-03-19 15:49:05.708 CET [5843] LOG: automatic vacuum of table "rjuju.public.t1": index scans: 0 > > > >>> pages: 0 removed, 2213 remain, 0 skipped due to pins, 0 skipped frozen > > > >>> tuples: 250000 removed, 250000 remain, 0 are dead but not yet removable, oldest xmin: 502 > > > >>> buffer usage: 4448 hits, 4 misses, 4 dirtied > > > >>> avg read rate: 0.160 MB/s, avg write rate: 0.160 MB/s > > > >>> system usage: CPU: user: 0.13 s, system: 0.00 s, elapsed: 0.19 s > > > >>> WAL usage: 6643 records, 4 full page records, 1402679 bytes > > > >>> > > > >>> VACUUM log sample: > > > >>> > > > >>> # vacuum VERBOSE t1; > > > >>> INFO: vacuuming "public.t1" > > > >>> INFO: "t1": removed 50000 row versions in 443 pages > > > >>> INFO: "t1": found 50000 removable, 0 nonremovable row versions in 443 out of 443 pages > > > >>> DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 512 > > > >>> There were 50000 unused item identifiers. > > > >>> Skipped 0 pages due to buffer pins, 0 frozen pages. > > > >>> 0 pages are entirely empty. > > > >>> 1332 WAL records, 4 WAL full page records, 306901 WAL bytes > > > >>> CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s. > > > >>> INFO: "t1": truncated 443 to 0 pages > > > >>> DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s > > > >>> INFO: vacuuming "pg_toast.pg_toast_16385" > > > >>> INFO: index "pg_toast_16385_index" now contains 0 row versions in 1 pages > > > >>> DETAIL: 0 index row versions were removed. > > > >>> 0 index pages have been deleted, 0 are currently reusable. > > > >>> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. > > > >>> INFO: "pg_toast_16385": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages > > > >>> DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 513 > > > >>> There were 0 unused item identifiers. 
> > > >>> Skipped 0 pages due to buffer pins, 0 frozen pages. > > > >>> 0 pages are entirely empty. > > > >>> 0 WAL records, 0 WAL full page records, 0 WAL bytes > > > >>> CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s. > > > >>> VACUUM > > > >>> > > > >>> Note that the 3rd patch is an addition on top of Kirill's original patch, as > > > >>> this is information that would have been greatly helpful to investigate in some > > > >>> performance issues I had to investigate recently. I'd be happy to have it land > > > >>> into v13, but if that's controversial or too late I'm happy to postpone it to > > > >>> v14 if the infrastructure added in Kirill's patches can make it to v13. > > > >> > > > >> Dear all, can we please focus on getting the core patch committed? > > > >> Given the uncertainity regarding autovacuum stats, can we please get > > > >> parts 1 and 2 into the codebase, and think about exposing autovacuum > > > >> stats later? > > > > > > > > Here are the comments for 0001 patch. > > > > > > > > + /* > > > > + * Report a full page image constructed for the WAL record > > > > + */ > > > > + pgWalUsage.wal_fp_records++; > > > > > > > > Isn't it better to use "fpw" or "fpi" for the variable name rather than > > > > "fp" here? In other places, "fpw" and "fpi" are used for full page > > > > writes/image. > > > > > > > > ISTM that this counter could be incorrect if XLogInsertRecord() determines to > > > > calculate again whether FPI is necessary or not. No? IOW, this issue could > > > > happen if XLogInsert() calls XLogRecordAssemble() multiple times in > > > > its do-while loop. Isn't this problematic? > > > > > > > > + long wal_bytes; /* size of wal records produced */ > > > > > > > > Isn't it safer to use uint64 (i.e., XLogRecPtr) as the type of this variable > > > > rather than long? > > > > > > > > + shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, bufusage_space); > > > > > > > > bufusage_space should be walusage_space here? > > > > > > > > /* > > > > * Finish parallel execution. We wait for parallel workers to finish, and > > > > * accumulate their buffer usage. > > > > */ > > > > > > > > There are some comments mentioning buffer usage, in execParallel.c. > > > > For example, the top comment for ExecParallelFinish(), as the above. > > > > These should be updated. > > > > > > Here are the comments for 0002 patch. > > > > > > + OUT wal_write_bytes int8, > > > + OUT wal_write_records int8, > > > + OUT wal_write_fp_records int8 > > > > > > Isn't "write" part in the column names confusing because it's WAL > > > *generated* (not written) by the statement? > > > > > > +RETURNS SETOF record > > > +AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4' > > > +LANGUAGE C STRICT VOLATILE; > > > > > > PARALLEL SAFE should be specified? > > > > > > +/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */ > > > > > > ISTM it's good timing to have also pg_stat_statements--1.8.sql since > > > the definition of pg_stat_statements() is changed. Thought? > > > > > > +-- CHECKPOINT before WAL tests to ensure test stability > > > +CHECKPOINT; > > > > > > Is this true? I thought you added this because the number of FPI > > > should be larger than zero in the subsequent test. No? But there > > > seems no such test. I'm not excited about adding the test checking > > > the number of FPI because it looks fragile, though... 
> > > > > > +UPDATE pgss_test SET b = '333' WHERE a = 3 \; > > > +UPDATE pgss_test SET b = '444' WHERE a = 4 ; > > > > > > Could you tell me why several queries need to be run to test > > > the WAL usage? Isn't running a few query enough for the test purpase? > > > > FTR I marked the commitfest entry as waiting on author. > > > > Kirill do you think you'll have time to address Fuji-san's review > > shortly? The end of the commitfest is approaching quite fast :( > > All these are really valuable objections. Unfortunately, I won't be > able to get all sorted out soon, due to total lack of time. I would be > very glad if somebody could step in for this patch. I'll try to do that tomorrow!
On Sat, Mar 28, 2020 at 12:54 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Fri, Mar 27, 2020 at 8:21 PM Kirill Bychik <kirill.bychik@gmail.com> wrote: > > > > > > All these are really valuable objections. Unfortunately, I won't be > > able to get all sorted out soon, due to total lack of time. I would be > > very glad if somebody could step in for this patch. > > I'll try to do that tomorrow! > I see some basic problems with the patch. The way it tries to compute WAL usage for parallel stuff doesn't seem right to me. Can you share or point me to any test done where we have computed WAL for parallel operations like Parallel Vacuum or Parallel Create Index? Basically, I don't know whether the changes done in ExecInitParallelPlan and friends allow us to compute WAL for parallel operations. Those will primarily cover parallel queries that won't write WAL. How have you tested those changes? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote: > > I see some basic problems with the patch. The way it tries to compute > WAL usage for parallel stuff doesn't seem right to me. Can you share > or point me to any test done where we have computed WAL for parallel > operations like Parallel Vacuum or Parallel Create Index? Ah, that's indeed a good point and AFAICT WAL records from parallel utility workers won't be accounted for. That being said, I think that an argument could be made that proper infrastructure should have been added in the original parallel utility patches, as pg_stat_statements is already broken wrt. buffer usage in parallel utility commands, unless I'm missing something. > Basically, > I don't know changes done in ExecInitParallelPlan and friends allow us > to compute WAL for parallel operations. Those will primarily cover > parallel queries that won't write WAL. How you have tested those > changes? I didn't test those, and I'm not even sure how to properly and reliably test that. Do you have any advice on how to achieve that? However the patch is mimicking the buffer instrumentation that already exists, and the approach also looks correct to me. Do you have a reason to believe that the approach that works for buffer usage wouldn't work for WAL records? (I of course agree that this should be tested anyway)
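To make "mimicking the buffer instrumentation" concrete, here is a rough sketch of the pattern under discussion, reusing the PARALLEL_KEY_WAL_USAGE key from the posted patch; the pei->wal_usage field and the WalUsageAdd() helper are assumed names for illustration, not something taken from the patch:

	/* Leader, while setting up the parallel DSM in ExecInitParallelPlan(): */
	walusage_space = shm_toc_allocate(pcxt->toc,
									  mul_size(sizeof(WalUsage), pcxt->nworkers));
	shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, walusage_space);
	pei->wal_usage = walusage_space;	/* assumed field name */

	/*
	 * Worker, at the end of ParallelQueryMain(): publish what it accumulated
	 * (the patch would do this as a start/end diff, the same way
	 * InstrEndParallelQuery() reports BufferUsage).
	 */
	walusage = shm_toc_lookup(toc, PARALLEL_KEY_WAL_USAGE, false);
	walusage[ParallelWorkerNumber] = pgWalUsage;

	/* Leader, in ExecParallelFinish(), once the workers have exited: */
	for (i = 0; i < pei->pcxt->nworkers_launched; i++)
		WalUsageAdd(&pgWalUsage, &pei->wal_usage[i]);	/* hypothetical helper */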
pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Julien Rouhaud
On Sat, Mar 28, 2020 at 02:38:27PM +0100, Julien Rouhaud wrote: > On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote: > > > > I see some basic problems with the patch. The way it tries to compute > > WAL usage for parallel stuff doesn't seem right to me. Can you share > > or point me to any test done where we have computed WAL for parallel > > operations like Parallel Vacuum or Parallel Create Index? > > Ah, that's indeed a good point and AFAICT WAL records from parallel utility > workers won't be accounted for. That being said, I think that an argument > could be made that proper infrastructure should have been added in the original > parallel utility patches, as pg_stat_statement is already broken wrt. buffer > usage in parallel utility, unless I'm missing something. Just to be sure, I did a quick test of pg_stat_statements behavior using parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer usage doesn't reflect parallel workers' activity. I added an open item for that, and I'm adding Robert in Cc as 9da0cc352 is the first commit adding parallel maintenance.
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Amit Kapila
On Sat, Mar 28, 2020 at 8:47 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Sat, Mar 28, 2020 at 02:38:27PM +0100, Julien Rouhaud wrote: > > On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote: > > > > > > I see some basic problems with the patch. The way it tries to compute > > > WAL usage for parallel stuff doesn't seem right to me. Can you share > > > or point me to any test done where we have computed WAL for parallel > > > operations like Parallel Vacuum or Parallel Create Index? > > > > Ah, that's indeed a good point and AFAICT WAL records from parallel utility > > workers won't be accounted for. That being said, I think that an argument > > could be made that proper infrastructure should have been added in the original > > parallel utility patches, as pg_stat_statement is already broken wrt. buffer > > usage in parallel utility, unless I'm missing something. > > Just to be sure I did a quick test with pg_stat_statements behavior using > parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer usage > doesn't reflect parallel workers' activity. > Sawada-San would like to investigate this? If not, I will look into this next week. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, Mar 28, 2020 at 7:08 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote: > > > > Basically, > > I don't know changes done in ExecInitParallelPlan and friends allow us > > to compute WAL for parallel operations. Those will primarily cover > > parallel queries that won't write WAL. How you have tested those > > changes? > > I didn't tested those, and I'm not even sure how to properly and reliably test > that. Do you have any advice on how to achieve that? > > However the patch is mimicking the buffer instrumentation that already exists, > and the approach also looks correct to me. Do you have a reason to believe > that the approach that works for buffer usage wouldn't work for WAL records? (I > of course agree that this should be tested anyway) > The buffer usage infrastructure is for read-only queries (for ex. for stats like blks_hit, blks_read). As far as I can think, there is no easy way to test the WAL usage via that API. It might or might not be required in the future depending on whether we decide to use the same infrastructure for parallel writes. I think for now we should remove that part of changes and rather think how to get that for parallel operations that can write WAL. For ex. we might need to do something similar to what this patch has done in begin_parallel_vacuum and end_parallel_vacuum. Would you like to attempt that? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Masahiko Sawada
On Sun, 29 Mar 2020 at 14:23, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Mar 28, 2020 at 8:47 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > On Sat, Mar 28, 2020 at 02:38:27PM +0100, Julien Rouhaud wrote: > > > On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote: > > > > > > > > I see some basic problems with the patch. The way it tries to compute > > > > WAL usage for parallel stuff doesn't seem right to me. Can you share > > > > or point me to any test done where we have computed WAL for parallel > > > > operations like Parallel Vacuum or Parallel Create Index? > > > > > > Ah, that's indeed a good point and AFAICT WAL records from parallel utility > > > workers won't be accounted for. That being said, I think that an argument > > > could be made that proper infrastructure should have been added in the original > > > parallel utility patches, as pg_stat_statement is already broken wrt. buffer > > > usage in parallel utility, unless I'm missing something. > > > > Just to be sure I did a quick test with pg_stat_statements behavior using > > parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer usage > > doesn't reflect parallel workers' activity. > > > > Sawada-San would like to investigate this? If not, I will look into > this next week. Sure, I'll investigate this issue today. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Masahiko Sawada
On Sun, 29 Mar 2020 at 15:19, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Sun, 29 Mar 2020 at 14:23, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sat, Mar 28, 2020 at 8:47 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > On Sat, Mar 28, 2020 at 02:38:27PM +0100, Julien Rouhaud wrote: > > > > On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote: > > > > > > > > > > I see some basic problems with the patch. The way it tries to compute > > > > > WAL usage for parallel stuff doesn't seem right to me. Can you share > > > > > or point me to any test done where we have computed WAL for parallel > > > > > operations like Parallel Vacuum or Parallel Create Index? > > > > > > > > Ah, that's indeed a good point and AFAICT WAL records from parallel utility > > > > workers won't be accounted for. That being said, I think that an argument > > > > could be made that proper infrastructure should have been added in the original > > > > parallel utility patches, as pg_stat_statement is already broken wrt. buffer > > > > usage in parallel utility, unless I'm missing something. > > > > > > Just to be sure I did a quick test with pg_stat_statements behavior using > > > parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer usage > > > doesn't reflect parallel workers' activity. > > > > > > > Sawada-San would like to investigate this? If not, I will look into > > this next week. > > Sure, I'll investigate this issue today. > I've run vacuum with/without parallel workers on the table having 5 indexes. The vacuum reads all blocks of table and indexes. * VACUUM command with no parallel workers =# select total_time, shared_blks_hit, shared_blks_read, shared_blks_hit + shared_blks_read as total_read_blks, shared_blks_dirtied, shared_blks_written from pg_stat_statements where query ~ 'vacuum'; total_time | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written --------------+-----------------+------------------+-----------------+---------------------+--------------------- 19857.217207 | 45238 | 226944 | 272182 | 225943 | 225894 (1 row) * VACUUM command with 4 parallel workers =# select total_time, shared_blks_hit, shared_blks_read, shared_blks_hit + shared_blks_read as total_read_blks, shared_blks_dirtied, shared_blks_written from pg_stat_statements where query ~ 'vacuum'; total_time | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written -------------+-----------------+------------------+-----------------+---------------------+--------------------- 6932.117365 | 45205 | 73079 | 118284 | 72403 | 72365 (1 row) The total number of blocks of table and indexes are about 182243 blocks. As Julien reported, obviously the total number of read blocks during parallel vacuum is much less than single process vacuum's result. Parallel create index has the same issue but it doesn't exist in parallel queries for SELECTs. I think we need to change parallel maintenance commands so that they report buffer usage like what ParallelQueryMain() does; prepare to track buffer usage during query execution by InstrStartParallelQuery(), and report it by InstrEndParallelQuery() after parallel maintenance command. 
To report the buffer usage of a parallel maintenance command correctly, I'm thinking that we can (1) change parallel create index and parallel vacuum so that they prepare to gather buffer usage, or (2) have a common entry point for parallel maintenance commands that is responsible for gathering buffer usage and calling the entry functions for each individual maintenance command. I'll investigate it more in depth. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
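For what it's worth, a rough sketch of what approach (1) above could look like for parallel VACUUM; the key name and the worker entry point name are assumptions, and parallel CREATE INDEX would need the same treatment in its own worker function:

void
parallel_vacuum_worker_main(dsm_segment *seg, shm_toc *toc)		/* assumed name */
{
	BufferUsage *bufferusage;

	/* ... existing setup: look up shared state, open the relation and indexes ... */

	bufferusage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);

	InstrStartParallelQuery();			/* start tracking in this worker */

	/* ... perform the index vacuuming / cleanup work ... */

	/* write the usage accumulated above into this worker's slot */
	InstrEndParallelQuery(&bufferusage[ParallelWorkerNumber]);
}

The leader would then fold each slot back into its own counters with InstrAccumParallelQuery() after WaitForParallelWorkersToFinish(), so pg_stat_statements would see the workers' hits and reads as well.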
On Sun, Mar 29, 2020 at 11:03:50AM +0530, Amit Kapila wrote: > On Sat, Mar 28, 2020 at 7:08 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote: > > > > > > Basically, > > > I don't know changes done in ExecInitParallelPlan and friends allow us > > > to compute WAL for parallel operations. Those will primarily cover > > > parallel queries that won't write WAL. How you have tested those > > > changes? > > > > I didn't tested those, and I'm not even sure how to properly and reliably test > > that. Do you have any advice on how to achieve that? > > > > However the patch is mimicking the buffer instrumentation that already exists, > > and the approach also looks correct to me. Do you have a reason to believe > > that the approach that works for buffer usage wouldn't work for WAL records? (I > > of course agree that this should be tested anyway) > > > > The buffer usage infrastructure is for read-only queries (for ex. for > stats like blks_hit, blks_read). As far as I can think, there is no > easy way to test the WAL usage via that API. It might or might not be > required in the future depending on whether we decide to use the same > infrastructure for parallel writes. I'm not sure that I get your point. I'm assuming that you meant parallel read-only queries, but surely the buffer usage infrastructure for parallel query relies on the same approach as the non-parallel one (each node computes the process-local pgBufferUsage diff) and sums all of that at the end of the parallel query execution. I also don't see how whether the query is read-only or not is relevant here as far as instrumentation is concerned, especially since a read-only query can definitely do writes and increase the count of dirtied buffers, like a write query would. For instance a hint bit change can be done in a parallel query AFAIK, and this can generate WAL records if wal_log_hints is enabled, so that's probably one way to test it. I now think that not adding support for WAL usage in the EXPLAIN output in the initial patch scope was a mistake, as this is probably the best way to test the WAL counters for parallel queries. This shouldn't be hard to add though, and I can work on it quickly if there's still a chance to get this feature included in pg13. > I think for now we should remove > that part of changes and rather think how to get that for parallel > operations that can write WAL. For ex. we might need to do something > similar to what this patch has done in begin_parallel_vacuum and > end_parallel_vacuum. Would you like to attempt that? Do you mean removing the WAL usage instrumentation from the parallel query infrastructure? For parallel utilities that can do writes it's probably better to keep the discussion in the other part of the thread. I tried to think a little bit about that, but for now I don't have a better idea than adding something similar to instrumentation for utility commands to have a general infrastructure, as building a workaround for a specific utility looks like the wrong approach. But this would require quite important changes in utility handling, which is maybe not a good idea a couple of weeks before the feature freeze, and that is definitely not backpatchable so it won't fix the issue for parallel index build that has existed since pg11.
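As a sketch of what that EXPLAIN support could look like in text mode, patterned on show_buffer_usage() in explain.c (the function name and the exact output format are assumptions):

static void
show_wal_usage(ExplainState *es, const WalUsage *usage)
{
	/* nothing to print if the node generated no WAL at all */
	if (usage->wal_records <= 0 && usage->wal_bytes == 0)
		return;

	appendStringInfoSpaces(es->str, es->indent * 2);
	appendStringInfo(es->str, "WAL: records=%ld bytes=" UINT64_FORMAT "\n",
					 usage->wal_records, usage->wal_bytes);
}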
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Julien Rouhaud
On Sun, Mar 29, 2020 at 9:52 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Sun, 29 Mar 2020 at 15:19, Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Sun, 29 Mar 2020 at 14:23, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Sat, Mar 28, 2020 at 8:47 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > > > On Sat, Mar 28, 2020 at 02:38:27PM +0100, Julien Rouhaud wrote: > > > > > On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote: > > > > > > > > > > > > I see some basic problems with the patch. The way it tries to compute > > > > > > WAL usage for parallel stuff doesn't seem right to me. Can you share > > > > > > or point me to any test done where we have computed WAL for parallel > > > > > > operations like Parallel Vacuum or Parallel Create Index? > > > > > > > > > > Ah, that's indeed a good point and AFAICT WAL records from parallel utility > > > > > workers won't be accounted for. That being said, I think that an argument > > > > > could be made that proper infrastructure should have been added in the original > > > > > parallel utility patches, as pg_stat_statement is already broken wrt. buffer > > > > > usage in parallel utility, unless I'm missing something. > > > > > > > > Just to be sure I did a quick test with pg_stat_statements behavior using > > > > parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer usage > > > > doesn't reflect parallel workers' activity. > > > > > > > > > > Sawada-San would like to investigate this? If not, I will look into > > > this next week. > > > > Sure, I'll investigate this issue today. Thanks for looking at it! > I've run vacuum with/without parallel workers on the table having 5 > indexes. The vacuum reads all blocks of table and indexes. > > * VACUUM command with no parallel workers > =# select total_time, shared_blks_hit, shared_blks_read, > shared_blks_hit + shared_blks_read as total_read_blks, > shared_blks_dirtied, shared_blks_written from pg_stat_statements where > query ~ 'vacuum'; > > total_time | shared_blks_hit | shared_blks_read | total_read_blks | > shared_blks_dirtied | shared_blks_written > --------------+-----------------+------------------+-----------------+---------------------+--------------------- > 19857.217207 | 45238 | 226944 | 272182 | > 225943 | 225894 > (1 row) > > * VACUUM command with 4 parallel workers > =# select total_time, shared_blks_hit, shared_blks_read, > shared_blks_hit + shared_blks_read as total_read_blks, > shared_blks_dirtied, shared_blks_written from pg_stat_statements where > query ~ 'vacuum'; > > total_time | shared_blks_hit | shared_blks_read | total_read_blks | > shared_blks_dirtied | shared_blks_written > -------------+-----------------+------------------+-----------------+---------------------+--------------------- > 6932.117365 | 45205 | 73079 | 118284 | > 72403 | 72365 > (1 row) > > The total number of blocks of table and indexes are about 182243 > blocks. As Julien reported, obviously the total number of read blocks > during parallel vacuum is much less than single process vacuum's > result. > > Parallel create index has the same issue but it doesn't exist in > parallel queries for SELECTs. > > I think we need to change parallel maintenance commands so that they > report buffer usage like what ParallelQueryMain() does; prepare to > track buffer usage during query execution by > InstrStartParallelQuery(), and report it by InstrEndParallelQuery() > after parallel maintenance command. 
To report buffer usage of parallel > maintenance command correctly, I'm thinking that we can (1) change > parallel create index and parallel vacuum so that they prepare > gathering buffer usage, or (2) have a common entry point for parallel > maintenance commands that is responsible for gathering buffer usage > and calling the entry functions for individual maintenance command. > I'll investigate it more in depth. As I just mentioned, (2) seems like a better design as it's quite likely that the number of parallel-aware utilities will continue to increase. One problem also is that parallel CREATE INDEX was introduced in pg11, so (2) probably won't be backpatchable (and (1) seems problematic too).
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Amit Kapila
On Sun, Mar 29, 2020 at 1:44 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Sun, Mar 29, 2020 at 9:52 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > I've run vacuum with/without parallel workers on the table having 5 > > indexes. The vacuum reads all blocks of table and indexes. > > > > * VACUUM command with no parallel workers > > =# select total_time, shared_blks_hit, shared_blks_read, > > shared_blks_hit + shared_blks_read as total_read_blks, > > shared_blks_dirtied, shared_blks_written from pg_stat_statements where > > query ~ 'vacuum'; > > > > total_time | shared_blks_hit | shared_blks_read | total_read_blks | > > shared_blks_dirtied | shared_blks_written > > --------------+-----------------+------------------+-----------------+---------------------+--------------------- > > 19857.217207 | 45238 | 226944 | 272182 | > > 225943 | 225894 > > (1 row) > > > > * VACUUM command with 4 parallel workers > > =# select total_time, shared_blks_hit, shared_blks_read, > > shared_blks_hit + shared_blks_read as total_read_blks, > > shared_blks_dirtied, shared_blks_written from pg_stat_statements where > > query ~ 'vacuum'; > > > > total_time | shared_blks_hit | shared_blks_read | total_read_blks | > > shared_blks_dirtied | shared_blks_written > > -------------+-----------------+------------------+-----------------+---------------------+--------------------- > > 6932.117365 | 45205 | 73079 | 118284 | > > 72403 | 72365 > > (1 row) > > > > The total number of blocks of table and indexes are about 182243 > > blocks. As Julien reported, obviously the total number of read blocks > > during parallel vacuum is much less than single process vacuum's > > result. > > > > Parallel create index has the same issue but it doesn't exist in > > parallel queries for SELECTs. > > > > I think we need to change parallel maintenance commands so that they > > report buffer usage like what ParallelQueryMain() does; prepare to > > track buffer usage during query execution by > > InstrStartParallelQuery(), and report it by InstrEndParallelQuery() > > after parallel maintenance command. To report buffer usage of parallel > > maintenance command correctly, I'm thinking that we can (1) change > > parallel create index and parallel vacuum so that they prepare > > gathering buffer usage, or (2) have a common entry point for parallel > > maintenance commands that is responsible for gathering buffer usage > > and calling the entry functions for individual maintenance command. > > I'll investigate it more in depth. > > As I just mentioned, (2) seems like a better design as it's quite > likely that the number of parallel-aware utilities will probably > continue to increase. One problem also is that parallel CREATE INDEX > has been introduced in pg11, so (2) probably won't be packpatchable > (and (1) seems problematic too). > I am not sure if we can decide at this stage whether it is back-patchable or not. Let's first see the patch and if it turns out to be complex, then we can try to do some straight-forward fix for back-branches. In general, I don't see why the fix here should be complex? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sun, Mar 29, 2020 at 1:26 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > I'm not sure that I get your point. I'm assuming that you meant > parallel-read-only queries, but surely buffer usage infrastructure for > parallel query relies on the same approach as non-parallel one (each node > computes the process-local pgBufferUsage diff) and sums all of that at the end > of the parallel query execution. I also don't see how whether the query is > read-only or not is relevant here as far as instrumentation is concerned, > especially since read-only query can definitely do writes and increase the > count of dirtied buffers, like a write query would. For instance a hint > bit change can be done in a parallel query AFAIK, and this can generate WAL > records in wal_log_hints is enabled, so that's probably one way to test it. > Yeah, that way we can test it. Can you try that? > I now think that not adding support for WAL buffers in EXPLAIN output in the > initial patch scope was a mistake, as this is probably the best way to test the > WAL counters for parallel queries. This shouldn't be hard to add though, and I > can work on it quickly if there's still a chance to get this feature included > in pg13. > I am not sure whether we will add it in Explain or not (maybe we need inputs from others in this regard), but if it helps in testing this part of the patch, then it is a good idea to write a patch for it. You might want to keep it separate from the main patch as we might not commit it. > > I think for now we should remove > > that part of changes and rather think how to get that for parallel > > operations that can write WAL. For ex. we might need to do something > > similar to what this patch has done in begin_parallel_vacuum and > > end_parallel_vacuum. Would you like to attempt that? > > Do you mean removing WAL buffers instrumentation from parallel query > infrastructure? > Yes, I meant that but now I realize we need those and your proposed way of testing it can help us in validating those changes. > For parallel utility that can do writes it's probably better to keep the > discussion in the other part of the thread. > Sure, I am fine with that but I am not sure if it is a good idea to commit this patch without having a way to compute WAL utilization for those commands. > I tried to think a little bit > about that, but for now I don't have a better idea than adding something > similar to intrumentation for utility command to have a general infrastructure, > as building a workaround for specific utility looks like the wrong approach. > I don't know what exactly you have in mind as I don't see why it should be too complex. Let's wait for a patch from Sawada-San on buffer usage stuff and in the meantime, we can work on other parts of this patch. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Masahiko Sawada
On Sun, 29 Mar 2020 at 20:15, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sun, Mar 29, 2020 at 1:44 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > On Sun, Mar 29, 2020 at 9:52 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > I've run vacuum with/without parallel workers on the table having 5 > > > indexes. The vacuum reads all blocks of table and indexes. > > > > > > * VACUUM command with no parallel workers > > > =# select total_time, shared_blks_hit, shared_blks_read, > > > shared_blks_hit + shared_blks_read as total_read_blks, > > > shared_blks_dirtied, shared_blks_written from pg_stat_statements where > > > query ~ 'vacuum'; > > > > > > total_time | shared_blks_hit | shared_blks_read | total_read_blks | > > > shared_blks_dirtied | shared_blks_written > > > --------------+-----------------+------------------+-----------------+---------------------+--------------------- > > > 19857.217207 | 45238 | 226944 | 272182 | > > > 225943 | 225894 > > > (1 row) > > > > > > * VACUUM command with 4 parallel workers > > > =# select total_time, shared_blks_hit, shared_blks_read, > > > shared_blks_hit + shared_blks_read as total_read_blks, > > > shared_blks_dirtied, shared_blks_written from pg_stat_statements where > > > query ~ 'vacuum'; > > > > > > total_time | shared_blks_hit | shared_blks_read | total_read_blks | > > > shared_blks_dirtied | shared_blks_written > > > -------------+-----------------+------------------+-----------------+---------------------+--------------------- > > > 6932.117365 | 45205 | 73079 | 118284 | > > > 72403 | 72365 > > > (1 row) > > > > > > The total number of blocks of table and indexes are about 182243 > > > blocks. As Julien reported, obviously the total number of read blocks > > > during parallel vacuum is much less than single process vacuum's > > > result. > > > > > > Parallel create index has the same issue but it doesn't exist in > > > parallel queries for SELECTs. > > > > > > I think we need to change parallel maintenance commands so that they > > > report buffer usage like what ParallelQueryMain() does; prepare to > > > track buffer usage during query execution by > > > InstrStartParallelQuery(), and report it by InstrEndParallelQuery() > > > after parallel maintenance command. To report buffer usage of parallel > > > maintenance command correctly, I'm thinking that we can (1) change > > > parallel create index and parallel vacuum so that they prepare > > > gathering buffer usage, or (2) have a common entry point for parallel > > > maintenance commands that is responsible for gathering buffer usage > > > and calling the entry functions for individual maintenance command. > > > I'll investigate it more in depth. > > > > As I just mentioned, (2) seems like a better design as it's quite > > likely that the number of parallel-aware utilities will probably > > continue to increase. One problem also is that parallel CREATE INDEX > > has been introduced in pg11, so (2) probably won't be packpatchable > > (and (1) seems problematic too). > > > > I am not sure if we can decide at this stage whether it is > back-patchable or not. Let's first see the patch and if it turns out > to be complex, then we can try to do some straight-forward fix for > back-branches. Agreed. > In general, I don't see why the fix here should be > complex? Yeah, particularly the approach (1) will not be complex. I'll write a patch tomorrow. 
Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Mar 23, 2020 at 11:24:50PM +0900, Fujii Masao wrote: > > > Here are the comments for 0001 patch. > > > > + /* > > + * Report a full page image constructed for the WAL record > > + */ > > + pgWalUsage.wal_fp_records++; > > > > Isn't it better to use "fpw" or "fpi" for the variable name rather than > > "fp" here? In other places, "fpw" and "fpi" are used for full page > > writes/image. Agreed, I went with fpw. > > ISTM that this counter could be incorrect if XLogInsertRecord() determines to > > calculate again whether FPI is necessary or not. No? IOW, this issue could > > happen if XLogInsert() calls XLogRecordAssemble() multiple times in > > its do-while loop. Isn't this problematic? Yes, probably. I also saw, while adding support for EXPLAIN/auto_explain, that the previous approach was incrementing both records and fpw_records, while it should be only one of those for each record. I fixed this using the approach I previously mentioned in [1], which seems to work just fine. > > + long wal_bytes; /* size of wal records produced */ > > > > Isn't it safer to use uint64 (i.e., XLogRecPtr) as the type of this variable > > rather than long? Yes indeed. I switched to uint64, and modified everything accordingly (and changed pgss to output numeric as there's no other way to handle unsigned int8). > > + shm_toc_insert(pcxt->toc, PARALLEL_KEY_WAL_USAGE, bufusage_space); > > > > bufusage_space should be walusage_space here? Good catch, fixed. > > /* > > * Finish parallel execution. We wait for parallel workers to finish, and > > * accumulate their buffer usage. > > */ > > > > There are some comments mentioning buffer usage, in execParallel.c. > > For example, the top comment for ExecParallelFinish(), as the above. > > These should be updated. I went through the whole file and quickly checked the other places, and I think I fixed all the required comments. > Here are the comments for 0002 patch. > > + OUT wal_write_bytes int8, > + OUT wal_write_records int8, > + OUT wal_write_fp_records int8 > > Isn't "write" part in the column names confusing because it's WAL > *generated* (not written) by the statement? Agreed, I simply dropped the "_write" part everywhere. > +RETURNS SETOF record > +AS 'MODULE_PATHNAME', 'pg_stat_statements_1_4' > +LANGUAGE C STRICT VOLATILE; > > PARALLEL SAFE should be specified? Indeed, fixed. > +/* contrib/pg_stat_statements/pg_stat_statements--1.7--1.8.sql */ > > ISTM it's good timing to have also pg_stat_statements--1.8.sql since > the definition of pg_stat_statements() is changed. Thought? As mentioned in the other pgss thread, I think the general agreement is to never provide full scripts anymore, so I didn't change that. > +-- CHECKPOINT before WAL tests to ensure test stability > +CHECKPOINT; > > Is this true? I thought you added this because the number of FPI > should be larger than zero in the subsequent test. No? But there > seems no such test. I'm not excited about adding the test checking > the number of FPI because it looks fragile, though... It should ensure an FPW for each new block touched, but yes, that's quite fragile. Since I fixed the record / FPW record counters, I saw that this was actually already broken as there was a mix of FPW and non-FPW, so I dropped the checkpoint and just tested (wal_record + wal_fpw_record) instead. > +UPDATE pgss_test SET b = '333' WHERE a = 3 \; > +UPDATE pgss_test SET b = '444' WHERE a = 4 ; > > Could you tell me why several queries need to be run to test > the WAL usage? Isn't running a few query enough for the test purpase?
As far as I can see it's used to test multiple scenarios (single command / multiple commands in or outside explicit transaction). It shouldn't add a lot of overhead and since some commands are issued with "\;" it's also testing proper query string isolation when a multi-command query string is provided, which doesn't seem like a bad idea. I didn't change that but I'm not opposed to removing some of the updates if needed. Also, to answer Amit Kapila's comments about WAL records and parallel query, I added support for both EXPLAIN and auto_explain (tab completion and documentation are also updated), and using a simple table with an index, with forced parallelism and no leader participation and concurrent update on the same table, I could test that WAL usage is working as expected: rjuju=# explain (analyze, wal, verbose) select * from t1; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Gather (cost=0.00..8805.05 rows=100010 width=14) (actual time=8.695..47.592 rows=100010 loops=1) Output: id, val Workers Planned: 2 Workers Launched: 2 WAL: records=204 bytes=86198 -> Parallel Seq Scan on public.t1 (cost=0.00..8805.05 rows=50005 width=14) (actual time=0.056..29.112 rows=50005 loops Output: id, val WAL: records=204 bytes=86198 Worker 0: actual time=0.060..28.995 rows=49593 loops=1 WAL: records=105 bytes=44222 Worker 1: actual time=0.052..29.230 rows=50417 loops=1 WAL: records=99 bytes=41976 Planning Time: 0.038 ms Execution Time: 53.957 ms (14 rows) and the same query when nothing ends up being modified: rjuju=# explain (analyze, wal, verbose) select * from t1; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- Gather (cost=0.00..8805.05 rows=100010 width=14) (actual time=9.413..48.187 rows=100010 loops=1) Output: id, val Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on public.t1 (cost=0.00..8805.05 rows=50005 width=14) (actual time=0.033..24.697 rows=50005 loops Output: id, val Worker 0: actual time=0.028..24.786 rows=50447 loops=1 Worker 1: actual time=0.038..24.609 rows=49563 loops=1 Planning Time: 0.282 ms Execution Time: 55.643 ms (10 rows) So it seems to me that WAL usage infrastructure for parallel query is working just fine. I added the EXPLAIN/auto_explain in a separate commit just in case. [1] https://www.postgresql.org/message-id/CAOBaU_aECK1Z7Nn+x=MhvEwrJzK8wyPsPtWAafjqtZN1fYjEmg@mail.gmail.com
Attachment
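(For reference: the exact settings used to force the parallel plans above are not shown in the message. A minimal sketch of a comparable setup, assuming the WAL option added by the patch and a t1 table like the one above, could be:

    SET max_parallel_workers_per_gather = 2;
    SET parallel_leader_participation = off;
    SET parallel_setup_cost = 0;
    SET parallel_tuple_cost = 0;
    SET min_parallel_table_scan_size = 0;
    EXPLAIN (ANALYZE, WAL, VERBOSE) SELECT * FROM t1;

with a concurrent session updating t1 so that the scan itself has something to WAL-log, e.g. via page pruning or hint-bit setting.)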
Hi Amit, Sorry I just noticed your mail. On Sun, Mar 29, 2020 at 05:12:16PM +0530, Amit Kapila wrote: > On Sun, Mar 29, 2020 at 1:26 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > I'm not sure that I get your point. I'm assuming that you meant > > parallel-read-only queries, but surely buffer usage infrastructure for > > parallel query relies on the same approach as non-parallel one (each node > > computes the process-local pgBufferUsage diff) and sums all of that at the end > > of the parallel query execution. I also don't see how whether the query is > > read-only or not is relevant here as far as instrumentation is concerned, > > especially since read-only query can definitely do writes and increase the > > count of dirtied buffers, like a write query would. For instance a hint > > bit change can be done in a parallel query AFAIK, and this can generate WAL > > records in wal_log_hints is enabled, so that's probably one way to test it. > > > > Yeah, that way we can test it. Can you try that? > > > I now think that not adding support for WAL buffers in EXPLAIN output in the > > initial patch scope was a mistake, as this is probably the best way to test the > > WAL counters for parallel queries. This shouldn't be hard to add though, and I > > can work on it quickly if there's still a chance to get this feature included > > in pg13. > > > > I am not sure we will add it in Explain or not (maybe we need inputs > from others in this regard), but if it helps in testing this part of > the patch, then it is a good idea to write a patch for it. You might > want to keep it separate from the main patch as we might not commit > it. As I just wrote in [1] that's exactly what I did. Using parallel query and concurrent update on a table I could see that WAL usage for parallel query seems to be working as one could expect. > Sure, I am fine with that but I am not sure if it is a good idea to > commit this patch without having a way to compute WAL utilization for > those commands. I'm generally fine with waiting for a fix for the existing issue to be committed. But as the feature freeze is approaching, I hope that it won't mean postponing this feature to v14 because a related 2yo bug has just been discovered, as it would seem a bit unfair.
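(One way to exercise the hint-bit case mentioned above, sketched here under the assumption that wal_log_hints = on is set in postgresql.conf, which requires a server restart, and that the patched EXPLAIN WAL option is available; the table and settings are purely illustrative:

    CREATE TABLE hint_test AS SELECT i AS id FROM generate_series(1, 1000000) i;
    CHECKPOINT;   -- so that the next hint-bit change on each page forces an FPI
    SET parallel_setup_cost = 0;
    SET parallel_tuple_cost = 0;
    SET max_parallel_workers_per_gather = 2;
    EXPLAIN (ANALYZE, WAL) SELECT count(*) FROM hint_test;

The first scan after the inserting transaction commits sets hint bits and dirties the pages, so with wal_log_hints enabled the parallel workers should generate WAL that the new counters can pick up.)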
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Masahiko Sawada
On Sun, 29 Mar 2020 at 20:44, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Sun, 29 Mar 2020 at 20:15, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sun, Mar 29, 2020 at 1:44 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > On Sun, Mar 29, 2020 at 9:52 AM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > I've run vacuum with/without parallel workers on the table having 5 > > > > indexes. The vacuum reads all blocks of table and indexes. > > > > > > > > * VACUUM command with no parallel workers > > > > =# select total_time, shared_blks_hit, shared_blks_read, > > > > shared_blks_hit + shared_blks_read as total_read_blks, > > > > shared_blks_dirtied, shared_blks_written from pg_stat_statements where > > > > query ~ 'vacuum'; > > > > > > > > total_time | shared_blks_hit | shared_blks_read | total_read_blks | > > > > shared_blks_dirtied | shared_blks_written > > > > --------------+-----------------+------------------+-----------------+---------------------+--------------------- > > > > 19857.217207 | 45238 | 226944 | 272182 | > > > > 225943 | 225894 > > > > (1 row) > > > > > > > > * VACUUM command with 4 parallel workers > > > > =# select total_time, shared_blks_hit, shared_blks_read, > > > > shared_blks_hit + shared_blks_read as total_read_blks, > > > > shared_blks_dirtied, shared_blks_written from pg_stat_statements where > > > > query ~ 'vacuum'; > > > > > > > > total_time | shared_blks_hit | shared_blks_read | total_read_blks | > > > > shared_blks_dirtied | shared_blks_written > > > > -------------+-----------------+------------------+-----------------+---------------------+--------------------- > > > > 6932.117365 | 45205 | 73079 | 118284 | > > > > 72403 | 72365 > > > > (1 row) > > > > > > > > The total number of blocks of table and indexes are about 182243 > > > > blocks. As Julien reported, obviously the total number of read blocks > > > > during parallel vacuum is much less than single process vacuum's > > > > result. > > > > > > > > Parallel create index has the same issue but it doesn't exist in > > > > parallel queries for SELECTs. > > > > > > > > I think we need to change parallel maintenance commands so that they > > > > report buffer usage like what ParallelQueryMain() does; prepare to > > > > track buffer usage during query execution by > > > > InstrStartParallelQuery(), and report it by InstrEndParallelQuery() > > > > after parallel maintenance command. To report buffer usage of parallel > > > > maintenance command correctly, I'm thinking that we can (1) change > > > > parallel create index and parallel vacuum so that they prepare > > > > gathering buffer usage, or (2) have a common entry point for parallel > > > > maintenance commands that is responsible for gathering buffer usage > > > > and calling the entry functions for individual maintenance command. > > > > I'll investigate it more in depth. > > > > > > As I just mentioned, (2) seems like a better design as it's quite > > > likely that the number of parallel-aware utilities will probably > > > continue to increase. One problem also is that parallel CREATE INDEX > > > has been introduced in pg11, so (2) probably won't be packpatchable > > > (and (1) seems problematic too). > > > > > > > I am not sure if we can decide at this stage whether it is > > back-patchable or not. Let's first see the patch and if it turns out > > to be complex, then we can try to do some straight-forward fix for > > back-branches. > > Agreed. 
> > > In general, I don't see why the fix here should be > > complex? > > Yeah, particularly the approach (1) will not be complex. I'll write a > patch tomorrow. > I've attached two patches fixing this issue for parallel index creation and parallel vacuum. Both patches take the same approach; we allocate DSM to share buffer usage and the leader gathers the workers' counters, as described in approach (1) above. I think this is a straightforward approach for this issue. We can create a common entry point for parallel maintenance commands that is responsible for gathering buffer usage as well as sharing query text etc. But that would involve a relatively big change and it might be overkill at this stage. We can discuss that and it will become an item for PG14. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
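(A rough sketch of the approach described above, not the actual patch: the toc key and variable names are placeholders, and the shm_toc_estimate_chunk()/shm_toc_estimate_keys() calls are omitted.

    /* leader, while setting up the parallel context */
    BufferUsage *buffer_usage;

    buffer_usage = shm_toc_allocate(pcxt->toc,
                                    mul_size(sizeof(BufferUsage), nworkers));
    shm_toc_insert(pcxt->toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, buffer_usage);

    /* each worker, around its index vacuuming / index building work */
    buffer_usage = shm_toc_lookup(toc, PARALLEL_VACUUM_KEY_BUFFER_USAGE, false);
    InstrStartParallelQuery();
    /* ... perform the parallel work ... */
    InstrEndParallelQuery(&buffer_usage[ParallelWorkerNumber]);

    /* leader, after WaitForParallelWorkersToFinish() */
    for (int i = 0; i < pcxt->nworkers_launched; i++)
        InstrAccumParallelQuery(&buffer_usage[i]);

The InstrStartParallelQuery()/InstrEndParallelQuery()/InstrAccumParallelQuery() functions are the same ones that ParallelQueryMain() and ExecParallelFinish() already use for regular parallel queries.)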
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Masahiko Sawada
On Mon, 30 Mar 2020 at 15:46, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Sun, 29 Mar 2020 at 20:44, Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Sun, 29 Mar 2020 at 20:15, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Sun, Mar 29, 2020 at 1:44 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > > > On Sun, Mar 29, 2020 at 9:52 AM Masahiko Sawada > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > I've run vacuum with/without parallel workers on the table having 5 > > > > > indexes. The vacuum reads all blocks of table and indexes. > > > > > > > > > > * VACUUM command with no parallel workers > > > > > =# select total_time, shared_blks_hit, shared_blks_read, > > > > > shared_blks_hit + shared_blks_read as total_read_blks, > > > > > shared_blks_dirtied, shared_blks_written from pg_stat_statements where > > > > > query ~ 'vacuum'; > > > > > > > > > > total_time | shared_blks_hit | shared_blks_read | total_read_blks | > > > > > shared_blks_dirtied | shared_blks_written > > > > > --------------+-----------------+------------------+-----------------+---------------------+--------------------- > > > > > 19857.217207 | 45238 | 226944 | 272182 | > > > > > 225943 | 225894 > > > > > (1 row) > > > > > > > > > > * VACUUM command with 4 parallel workers > > > > > =# select total_time, shared_blks_hit, shared_blks_read, > > > > > shared_blks_hit + shared_blks_read as total_read_blks, > > > > > shared_blks_dirtied, shared_blks_written from pg_stat_statements where > > > > > query ~ 'vacuum'; > > > > > > > > > > total_time | shared_blks_hit | shared_blks_read | total_read_blks | > > > > > shared_blks_dirtied | shared_blks_written > > > > > -------------+-----------------+------------------+-----------------+---------------------+--------------------- > > > > > 6932.117365 | 45205 | 73079 | 118284 | > > > > > 72403 | 72365 > > > > > (1 row) > > > > > > > > > > The total number of blocks of table and indexes are about 182243 > > > > > blocks. As Julien reported, obviously the total number of read blocks > > > > > during parallel vacuum is much less than single process vacuum's > > > > > result. > > > > > > > > > > Parallel create index has the same issue but it doesn't exist in > > > > > parallel queries for SELECTs. > > > > > > > > > > I think we need to change parallel maintenance commands so that they > > > > > report buffer usage like what ParallelQueryMain() does; prepare to > > > > > track buffer usage during query execution by > > > > > InstrStartParallelQuery(), and report it by InstrEndParallelQuery() > > > > > after parallel maintenance command. To report buffer usage of parallel > > > > > maintenance command correctly, I'm thinking that we can (1) change > > > > > parallel create index and parallel vacuum so that they prepare > > > > > gathering buffer usage, or (2) have a common entry point for parallel > > > > > maintenance commands that is responsible for gathering buffer usage > > > > > and calling the entry functions for individual maintenance command. > > > > > I'll investigate it more in depth. > > > > > > > > As I just mentioned, (2) seems like a better design as it's quite > > > > likely that the number of parallel-aware utilities will probably > > > > continue to increase. One problem also is that parallel CREATE INDEX > > > > has been introduced in pg11, so (2) probably won't be packpatchable > > > > (and (1) seems problematic too). 
> > > > > > > > > > I am not sure if we can decide at this stage whether it is > > > back-patchable or not. Let's first see the patch and if it turns out > > > to be complex, then we can try to do some straight-forward fix for > > > back-branches. > > > > Agreed. > > > > > In general, I don't see why the fix here should be > > > complex? > > > > Yeah, particularly the approach (1) will not be complex. I'll write a > > patch tomorrow. > > > > I've attached two patches fixing this issue for parallel index > creation and parallel vacuum. These approaches take the same approach; > we allocate DSM to share buffer usage and the leader gathers them, > described as approach (1) above. I think this is a straightforward > approach for this issue. We can create a common entry point for > parallel maintenance command that is responsible for gathering buffer > usage as well as sharing query text etc. But it will accompany > relatively big change and it might be overkill at this stage. We can > discuss that and it will become an item for PG14. > The patch for vacuum conflicts with recent changes in vacuum. So I've attached rebased one. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Julien Rouhaud
On Mon, Mar 30, 2020 at 04:01:18PM +0900, Masahiko Sawada wrote: > On Mon, 30 Mar 2020 at 15:46, Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Sun, 29 Mar 2020 at 20:44, Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > I think we need to change parallel maintenance commands so that they > > > > > > report buffer usage like what ParallelQueryMain() does; prepare to > > > > > > track buffer usage during query execution by > > > > > > InstrStartParallelQuery(), and report it by InstrEndParallelQuery() > > > > > > after parallel maintenance command. To report buffer usage of parallel > > > > > > maintenance command correctly, I'm thinking that we can (1) change > > > > > > parallel create index and parallel vacuum so that they prepare > > > > > > gathering buffer usage, or (2) have a common entry point for parallel > > > > > > maintenance commands that is responsible for gathering buffer usage > > > > > > and calling the entry functions for individual maintenance command. > > > > > > I'll investigate it more in depth. > > > > [...] > > > > I've attached two patches fixing this issue for parallel index > > creation and parallel vacuum. These approaches take the same approach; > > we allocate DSM to share buffer usage and the leader gathers them, > > described as approach (1) above. I think this is a straightforward > > approach for this issue. We can create a common entry point for > > parallel maintenance command that is responsible for gathering buffer > > usage as well as sharing query text etc. But it will accompany > > relatively big change and it might be overkill at this stage. We can > > discuss that and it will become an item for PG14. > > > > The patch for vacuum conflicts with recent changes in vacuum. So I've > attached rebased one. Thanks Sawada-san! Just minor nitpicking: + int i; Assert(!IsParallelWorker()); Assert(ParallelVacuumIsActive(lps)); @@ -2166,6 +2172,13 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats, /* Wait for all vacuum workers to finish */ WaitForParallelWorkersToFinish(lps->pcxt); + /* + * Next, accumulate buffer usage. (This must wait for the workers to + * finish, or we might get incomplete data.) + */ + for (i = 0; i < nworkers; i++) + InstrAccumParallelQuery(&lps->buffer_usage[i]); We now allow declaring a variable in those loops, so it may be better to avoid declaring i outside the for scope? Other than that both patches look good to me and a good fit for backpatching. I also did some testing on VACUUM and CREATE INDEX and it works as expected.
On Sun, Mar 29, 2020 at 5:49 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > @@ -1249,6 +1250,16 @@ XLogInsertRecord(XLogRecData *rdata, ProcLastRecPtr = StartPos; XactLastRecEnd = EndPos; + /* Provide WAL update data to the instrumentation */ + if (inserted) + { + pgWalUsage.wal_bytes += rechdr->xl_tot_len; + if (doPageWrites && fpw_lsn <= RedoRecPtr) + pgWalUsage.wal_fpw_records++; + else + pgWalUsage.wal_records++; + } + I think the above code has multiple problems. (a) fpw_lsn can be InvalidXLogRecPtr and still there could be full-page image (for ex. when REGBUF_FORCE_IMAGE flag for buffer is set). (b) There could be multiple FPW records while inserting a record; consider when there are multiple registered buffers. I think the right place to figure this out is XLogRecordAssemble. (c) There are cases when we also attach the record data even when we decide to write FPW (cf. REGBUF_KEEP_DATA), so we might want to increment wal_fpw_records and wal_records for such cases. I think the right place to compute this information is XLogRecordAssemble even though we update it at the place where you have it in the patch. You can probably compute that in local variables and then transfer to pgWalUsage in XLogInsertRecord. I am fine if you can think of some other way but the current patch doesn't seem correct to me. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Mar 30, 2020 at 03:52:38PM +0530, Amit Kapila wrote: > On Sun, Mar 29, 2020 at 5:49 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > @@ -1249,6 +1250,16 @@ XLogInsertRecord(XLogRecData *rdata, > ProcLastRecPtr = StartPos; > XactLastRecEnd = EndPos; > > + /* Provide WAL update data to the instrumentation */ > + if (inserted) > + { > + pgWalUsage.wal_bytes += rechdr->xl_tot_len; > + if (doPageWrites && fpw_lsn <= RedoRecPtr) > + pgWalUsage.wal_fpw_records++; > + else > + pgWalUsage.wal_records++; > + } > + > > I think the above code has multiple problems. (a) fpw_lsn can be > InvalidXLogRecPtr and still there could be full-page image (for ex. > when REGBUF_FORCE_IMAGE flag for buffer is set). (b) There could be > multiple FPW records while inserting a record; consider when there are > multiple registered buffers. I think the right place to figure this > out is XLogRecordAssemble. (c) There are cases when we also attach the > record data even when we decide to write FPW (cf. REGBUF_KEEP_DATA), > so we might want to increment wal_fpw_records and wal_records for such > cases. > > I think the right place to compute this information is > XLogRecordAssemble even though we update it at the place where you > have it in the patch. You can probably compute that in local > variables and then transfer to pgWalUsage in XLogInsertRecord. I am > fine if you can think of some other way but the current patch doesn't > seem correct to me. My previous approach was indeed totally broken. v8 attached which hopefully will be ok.
Attachment
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Amit Kapila
On Mon, Mar 30, 2020 at 12:31 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > The patch for vacuum conflicts with recent changes in vacuum. So I've > attached rebased one. > + /* + * Next, accumulate buffer usage. (This must wait for the workers to + * finish, or we might get incomplete data.) + */ + for (i = 0; i < nworkers; i++) + InstrAccumParallelQuery(&lps->buffer_usage[i]); + This should be done for launched workers aka lps->pcxt->nworkers_launched. I think a similar problem exists in create index related patch. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Masahiko Sawada
On Tue, 31 Mar 2020 at 12:58, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Mar 30, 2020 at 12:31 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > The patch for vacuum conflicts with recent changes in vacuum. So I've > > attached rebased one. > > > > + /* > + * Next, accumulate buffer usage. (This must wait for the workers to > + * finish, or we might get incomplete data.) > + */ > + for (i = 0; i < nworkers; i++) > + InstrAccumParallelQuery(&lps->buffer_usage[i]); > + > > This should be done for launched workers aka > lps->pcxt->nworkers_launched. I think a similar problem exists in > create index related patch. You're right. Fixed in the new patches. On Mon, 30 Mar 2020 at 17:00, Julien Rouhaud <rjuju123@gmail.com> wrote: > > Just minor nitpicking: > > + int i; > > Assert(!IsParallelWorker()); > Assert(ParallelVacuumIsActive(lps)); > @@ -2166,6 +2172,13 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats, > /* Wait for all vacuum workers to finish */ > WaitForParallelWorkersToFinish(lps->pcxt); > > + /* > + * Next, accumulate buffer usage. (This must wait for the workers to > + * finish, or we might get incomplete data.) > + */ > + for (i = 0; i < nworkers; i++) > + InstrAccumParallelQuery(&lps->buffer_usage[i]); > > We now allow declaring a variable in those loops, so it may be better to avoid > declaring i outside the for scope? We can do that but I was not sure if it's good since other codes around there don't use that. So I'd like to leave it for committers. It's a trivial change. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Dilip Kumar
On Tue, Mar 31, 2020 at 10:44 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 31 Mar 2020 at 12:58, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Mar 30, 2020 at 12:31 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > The patch for vacuum conflicts with recent changes in vacuum. So I've > > > attached rebased one. > > > > > > > + /* > > + * Next, accumulate buffer usage. (This must wait for the workers to > > + * finish, or we might get incomplete data.) > > + */ > > + for (i = 0; i < nworkers; i++) > > + InstrAccumParallelQuery(&lps->buffer_usage[i]); > > + > > > > This should be done for launched workers aka > > lps->pcxt->nworkers_launched. I think a similar problem exists in > > create index related patch. > > You're right. Fixed in the new patches. > > On Mon, 30 Mar 2020 at 17:00, Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > Just minor nitpicking: > > > > + int i; > > > > Assert(!IsParallelWorker()); > > Assert(ParallelVacuumIsActive(lps)); > > @@ -2166,6 +2172,13 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats, > > /* Wait for all vacuum workers to finish */ > > WaitForParallelWorkersToFinish(lps->pcxt); > > > > + /* > > + * Next, accumulate buffer usage. (This must wait for the workers to > > + * finish, or we might get incomplete data.) > > + */ > > + for (i = 0; i < nworkers; i++) > > + InstrAccumParallelQuery(&lps->buffer_usage[i]); > > > > We now allow declaring a variable in those loops, so it may be better to avoid > > declaring i outside the for scope? > > We can do that but I was not sure if it's good since other codes > around there don't use that. So I'd like to leave it for committers. > It's a trivial change. I have reviewed the patch and the patch looks fine to me. One minor comment /+ /* Points to buffer usage are in DSM */ + BufferUsage *buffer_usage; + /buffer usage are in DSM / buffer usage area in DSM -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Mon, Mar 30, 2020 at 6:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Mon, Mar 30, 2020 at 03:52:38PM +0530, Amit Kapila wrote: > > > > I think the right place to compute this information is > > XLogRecordAssemble even though we update it at the place where you > > have it in the patch. You can probably compute that in local > > variables and then transfer to pgWalUsage in XLogInsertRecord. I am > > fine if you can think of some other way but the current patch doesn't > > seem correct to me. > > My previous approach was indeed totally broken. v8 attached which hopefully > will be ok. > This is better. Few more comments: 1. The point (c) from my previous email doesn't seem to be fixed properly. Basically, the record data is only attached with FPW in some particular cases like where REGBUF_KEEP_DATA is set, but the patch assumes it is always set. 2. + /* Report a full page imsage constructed for the WAL record */ + *num_fpw += 1; Typo. /imsage/image 3. We need to enhance the patch to cover WAL usage for parallel vacuum and parallel create index based on Sawada-San's latest patch[1] which fixed the case for buffer usage. [1] - https://www.postgresql.org/message-id/CA%2Bfd4k5L4yVoWz0smymmqB4_SMHd2tyJExUgA_ACsL7k00B5XQ%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Mar 31, 2020 at 8:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Mar 30, 2020 at 6:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > On Mon, Mar 30, 2020 at 03:52:38PM +0530, Amit Kapila wrote: > > > > > > I think the right place to compute this information is > > > XLogRecordAssemble even though we update it at the place where you > > > have it in the patch. You can probably compute that in local > > > variables and then transfer to pgWalUsage in XLogInsertRecord. I am > > > fine if you can think of some other way but the current patch doesn't > > > seem correct to me. > > > > My previous approach was indeed totally broken. v8 attached which hopefully > > will be ok. > > > > This is better. Few more comments: > 1. The point (c) from my previous email doesn't seem to be fixed > properly. Basically, the record data is only attached with FPW in > some particular cases like where REGBUF_KEEP_DATA is set, but the > patch assumes it is always set. As I mentioned multiple times already, I'm really not familiar with the WAL code, so I'll be happy to be proven wrong but my reading is that in XLogRecordAssemble(), there are 2 different things being done: - an FPW is optionally added, iff include_image is true, which doesn't take into account REGBUF_KEEP_DATA. Looking at that part of the code I don't see any sign of the recorded FPW being skipped or discarded if REGBUF_KEEP_DATA is not set, and useful variables such as total_len are modified - then data is also optionally added, iff needs_data is set. IIUC an FPW can be added even if the WAL record doesn't contain data. So the behavior looks ok to me, as what seems useful is to distinguish 9KB of WAL for a single 9KB record from 9KB of WAL for a 1KB record plus one FPW. What am I missing here? > 2. > + /* Report a full page imsage constructed for the WAL record */ > + *num_fpw += 1; > > Typo. /imsage/image Oops yes, will fix. > 3. We need to enhance the patch to cover WAL usage for parallel > vacuum and parallel create index based on Sawada-San's latest patch[1] > which fixed the case for buffer usage. I'm sorry but I'm not following. Do you mean adding regression tests for that case?
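(Under that reading, the counting inside XLogRecordAssemble()'s per-registered-buffer loop would be roughly the following sketch, not the exact patch:

    if (include_image)
    {
        /* a full page image is assembled for this buffer */
        *num_fpw += 1;      /* counted whether or not REGBUF_KEEP_DATA is set */
    }

    if (needs_data)
    {
        /* the registered buffer data is attached separately */
    }

i.e. full page images and record data are accounted for independently.)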
On Tue, Mar 31, 2020 at 12:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Mar 30, 2020 at 6:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > On Mon, Mar 30, 2020 at 03:52:38PM +0530, Amit Kapila wrote: > > > > > > I think the right place to compute this information is > > > XLogRecordAssemble even though we update it at the place where you > > > have it in the patch. You can probably compute that in local > > > variables and then transfer to pgWalUsage in XLogInsertRecord. I am > > > fine if you can think of some other way but the current patch doesn't > > > seem correct to me. > > > > My previous approach was indeed totally broken. v8 attached which hopefully > > will be ok. > > > > This is better. Few more comments: > 1. The point (c) from my previous email doesn't seem to be fixed > properly. Basically, the record data is only attached with FPW in > some particular cases like where REGBUF_KEEP_DATA is set, but the > patch assumes it is always set. > > 2. > + /* Report a full page imsage constructed for the WAL record */ > + *num_fpw += 1; > > Typo. /imsage/image > > 3. We need to enhance the patch to cover WAL usage for parallel > vacuum and parallel create index based on Sawada-San's latest patch[1] > which fixed the case for buffer usage. I have started reviewing this patch and I have some comments/questions. 1. @@ -22,6 +22,10 @@ static BufferUsage save_pgBufferUsage; static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add); +WalUsage pgWalUsage; +static WalUsage save_pgWalUsage; + +static void WalUsageAdd(WalUsage *dst, WalUsage *add); Better we move all variable declaration first along with other variables and then function declaration along with other function declaration. That is the convention we follow. 2. { bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0; + bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0; I think you need to run pgindent, we should give only one space between the variable name and '='. so we need to change like below bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0; 3. +typedef struct WalUsage +{ + long wal_records; /* # of WAL records produced */ + long wal_fpw_records; /* # of full page write WAL records + * produced */ IMHO, the name wal_fpw_records is bit confusing, First I thought it is counting the number of wal records which actually has FPW, then after seeing code, I realized that it is actually counting total FPW. Shouldn't we rename it to just wal_fpw? or wal_num_fpw or wal_fpw_count? 4. Currently, we are combining all full-page write force/normal/consistency checks in one category. I am not sure whether it will be good information to know how many are force_fpw and how many are normal_fpw? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Tue, Mar 31, 2020 at 2:51 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > 4. Currently, we are combining all full-page write > force/normal/consistency checks in one category. I am not sure > whether it will be good information to know how many are force_fpw and > how many are normal_fpw? > We can do it if we want but I am not sure how useful it will be. I think we can always enhance this information if people really need this and have a clear use-case in mind. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Mar 31, 2020 at 2:39 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Tue, Mar 31, 2020 at 8:53 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Mar 30, 2020 at 6:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > On Mon, Mar 30, 2020 at 03:52:38PM +0530, Amit Kapila wrote: > > > > > > > > I think the right place to compute this information is > > > > XLogRecordAssemble even though we update it at the place where you > > > > have it in the patch. You can probably compute that in local > > > > variables and then transfer to pgWalUsage in XLogInsertRecord. I am > > > > fine if you can think of some other way but the current patch doesn't > > > > seem correct to me. > > > > > > My previous approach was indeed totally broken. v8 attached which hopefully > > > will be ok. > > > > > > > This is better. Few more comments: > > 1. The point (c) from my previous email doesn't seem to be fixed > > properly. Basically, the record data is only attached with FPW in > > some particular cases like where REGBUF_KEEP_DATA is set, but the > > patch assumes it is always set. > > As I mentioned multiple times already, I'm really not familiar with > the WAL code, so I'll be happy to be proven wrong but my reading is > that in XLogRecordAssemble(), there are 2 different things being done: > > - a FPW is optionally added, iif include_image is true, which doesn't > take into account REGBUF_KEEP_DATA. Looking at that part of the code > I don't see any sign of the recorded FPW being skipped or discarded if > REGBUF_KEEP_DATA is not set, and useful variables such as total_len > are modified > - then data is also optionally added, iif needs_data is set. > > IIUC a FPW can be added even if the WAL record doesn't contain data. > So the behavior look ok to me, as what seems to be useful it to > distinguish 9KB WAL for 1 record of 9KB from 9KB or WAL for 1KB record > and 1 FPW. > It is possible that both of us are having different meanings for below two variables: +typedef struct WalUsage +{ + long wal_records; /* # of WAL records produced */ + long wal_fpw_records; /* # of full page write WAL records + * produced */ Let me clarify my understanding. Say if the record is just an FPI (ex. XLOG_FPI) and doesn't contain any data then do we want to add one to each of wal_fpw_records and wal_records? My understanding was in such a case we will just increment wal_fpw_records. > > > 3. We need to enhance the patch to cover WAL usage for parallel > > vacuum and parallel create index based on Sawada-San's latest patch[1] > > which fixed the case for buffer usage. > > I'm sorry but I'm not following. Do you mean adding regression tests > for that case? > No. I mean to say we should implement WAL usage calculation for those two parallel commands. AFAICS, your patch doesn't cover those two commands. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Mar 30, 2020 at 6:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > @@ -448,6 +449,7 @@ XLogInsert(RmgrId rmid, uint8 info) bool doPageWrites; XLogRecPtr fpw_lsn; XLogRecData *rdt; + int num_fpw = 0; /* * Get values needed to decide whether to do full-page writes. Since @@ -457,9 +459,9 @@ XLogInsert(RmgrId rmid, uint8 info) GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites); rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites, - &fpw_lsn); + &fpw_lsn, &num_fpw); - EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags); + EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpw); } while (EndPos == InvalidXLogRecPtr); I think there are some issues in the num_fpw calculation. For some cases, we have to return from XLogInsert without inserting a record. Basically, we've to recompute/reassemble the same record. In those cases, num_fpw should be reset. Thoughts? -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com
On Tue, Mar 31, 2020 at 11:21 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > I have started reviewing this patch and I have some comments/questions. Thanks a lot! > > 1. > @@ -22,6 +22,10 @@ static BufferUsage save_pgBufferUsage; > > static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add); > > +WalUsage pgWalUsage; > +static WalUsage save_pgWalUsage; > + > +static void WalUsageAdd(WalUsage *dst, WalUsage *add); > > Better we move all variable declaration first along with other > variables and then function declaration along with other function > declaration. That is the convention we follow. Agreed, fixed. > 2. > { > bool need_buffers = (instrument_options & INSTRUMENT_BUFFERS) != 0; > + bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0; > > I think you need to run pgindent, we should give only one space > between the variable name and '='. > so we need to change like below > > bool need_wal = (instrument_options & INSTRUMENT_WAL) != 0; Done. > 3. > +typedef struct WalUsage > +{ > + long wal_records; /* # of WAL records produced */ > + long wal_fpw_records; /* # of full page write WAL records > + * produced */ > > IMHO, the name wal_fpw_records is bit confusing, First I thought it > is counting the number of wal records which actually has FPW, then > after seeing code, I realized that it is actually counting total FPW. > Shouldn't we rename it to just wal_fpw? or wal_num_fpw or > wal_fpw_count? Yes I agree, the name was too confusing. I went with wal_num_fpw. I also used the same for pg_stat_statements. Other fields are usually named with a trailing "s" but wal_fpws just seems too weird. I can change it if consistency is preferred here. > 4. Currently, we are combining all full-page write > force/normal/consistency checks in one category. I am not sure > whether it will be good information to know how many are force_fpw and > how many are normal_fpw? I agree with Amit's POV. For now a single counter seems like enough to diagnose many behaviors. I'll keep answering following mails before sending an updated patchset.
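(With the renames above, and the earlier switch of the byte counter to uint64, the struct would presumably end up looking something like:

    typedef struct WalUsage
    {
        long        wal_records;    /* # of WAL records produced */
        long        wal_num_fpw;    /* # of WAL full page writes produced */
        uint64      wal_bytes;      /* size of WAL records produced */
    } WalUsage;

with the exact comments and field order left to the final patch.)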
On Tue, Mar 31, 2020 at 12:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > It is possible that both of us are having different meanings for below > two variables: > +typedef struct WalUsage > +{ > + long wal_records; /* # of WAL records produced */ > + long wal_fpw_records; /* # of full page write WAL records > + * produced */ > > > Let me clarify my understanding. Say if the record is just an FPI > (ex. XLOG_FPI) and doesn't contain any data then do we want to add one > to each of wal_fpw_records and wal_records? My understanding was in > such a case we will just increment wal_fpw_records. Yes, as Dilip just pointed out the misunderstanding is due to this poor name. Indeed, in such case what I want is both counters to be incremented. What I want is wal_records to reflect the total number of records generated regardless of any content, and wal_num_fpw the number of full page images, as it seems to make the most sense, and the easiest way to estimate the ratio of data due to FPW. > > > 3. We need to enhance the patch to cover WAL usage for parallel > > > vacuum and parallel create index based on Sawada-San's latest patch[1] > > > which fixed the case for buffer usage. > > > > I'm sorry but I'm not following. Do you mean adding regression tests > > for that case? > > > > No. I mean to say we should implement WAL usage calculation for those > two parallel commands. AFAICS, your patch doesn't cover those two > commands. Oh I see. I just assumed that Sawada-san's patch would be committed first and I'd then rebase the patchset on top of the newly added infrastructure to also handle WAL counters, to avoid any conflict on that bugfix while this new feature is being discussed. I'll rebase the patchset against those patches then.
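(In other words, the accounting at the end of XLogInsertRecord() presumably reduces to something like the following sketch:

    /* once the record has actually been inserted */
    if (inserted)
    {
        pgWalUsage.wal_bytes += rechdr->xl_tot_len;
        pgWalUsage.wal_records++;
        pgWalUsage.wal_num_fpw += num_fpw;
    }

every inserted record bumps wal_records, and wal_num_fpw advances by however many full page images XLogRecordAssemble() attached to that record.)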
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Dilip Kumar
On Tue, Mar 31, 2020 at 12:20 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Tue, Mar 31, 2020 at 10:44 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Tue, 31 Mar 2020 at 12:58, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Mon, Mar 30, 2020 at 12:31 PM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > The patch for vacuum conflicts with recent changes in vacuum. So I've > > > > attached rebased one. > > > > > > > > > > + /* > > > + * Next, accumulate buffer usage. (This must wait for the workers to > > > + * finish, or we might get incomplete data.) > > > + */ > > > + for (i = 0; i < nworkers; i++) > > > + InstrAccumParallelQuery(&lps->buffer_usage[i]); > > > + > > > > > > This should be done for launched workers aka > > > lps->pcxt->nworkers_launched. I think a similar problem exists in > > > create index related patch. > > > > You're right. Fixed in the new patches. > > > > On Mon, 30 Mar 2020 at 17:00, Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > Just minor nitpicking: > > > > > > + int i; > > > > > > Assert(!IsParallelWorker()); > > > Assert(ParallelVacuumIsActive(lps)); > > > @@ -2166,6 +2172,13 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats, > > > /* Wait for all vacuum workers to finish */ > > > WaitForParallelWorkersToFinish(lps->pcxt); > > > > > > + /* > > > + * Next, accumulate buffer usage. (This must wait for the workers to > > > + * finish, or we might get incomplete data.) > > > + */ > > > + for (i = 0; i < nworkers; i++) > > > + InstrAccumParallelQuery(&lps->buffer_usage[i]); > > > > > > We now allow declaring a variable in those loops, so it may be better to avoid > > > declaring i outside the for scope? > > > > We can do that but I was not sure if it's good since other codes > > around there don't use that. So I'd like to leave it for committers. > > It's a trivial change. > > I have reviewed the patch and the patch looks fine to me. > > One minor comment > /+ /* Points to buffer usage are in DSM */ > + BufferUsage *buffer_usage; > + > /buffer usage are in DSM / buffer usage area in DSM > While testing I have found one issue. Basically, during a parallel vacuum, it was showing more number of shared_blk_hits+shared_blks_read. After, some investigation, I found that during the cleanup phase nworkers are -1, and because of this we didn't try to launch worker but "lps->pcxt->nworkers_launched" had the old launched worker count and shared memory also had old buffer read data which was never updated as we did not try to launch the worker. diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c index b97b678..5dfaf4d 100644 --- a/src/backend/access/heap/vacuumlazy.c +++ b/src/backend/access/heap/vacuumlazy.c @@ -2150,7 +2150,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats, * Next, accumulate buffer usage. (This must wait for the workers to * finish, or we might get incomplete data.) */ - for (i = 0; i < lps->pcxt->nworkers_launched; i++) + nworkers = Min(nworkers, lps->pcxt->nworkers_launched); + for (i = 0; i < nworkers; i++) InstrAccumParallelQuery(&lps->buffer_usage[i]); It worked after the above fix.
On Tue, Mar 31, 2020 at 12:21 PM Kuntal Ghosh <kuntalghosh.2007@gmail.com> wrote: > > On Mon, Mar 30, 2020 at 6:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > @@ -448,6 +449,7 @@ XLogInsert(RmgrId rmid, uint8 info) > bool doPageWrites; > XLogRecPtr fpw_lsn; > XLogRecData *rdt; > + int num_fpw = 0; > > /* > * Get values needed to decide whether to do full-page writes. Since > @@ -457,9 +459,9 @@ XLogInsert(RmgrId rmid, uint8 info) > GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites); > > rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites, > - &fpw_lsn); > + &fpw_lsn, &num_fpw); > > - EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags); > + EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpw); > } while (EndPos == InvalidXLogRecPtr); > > I think there are some issues in the num_fpw calculation. For some > cases, we have to return from XLogInsert without inserting a record. > Basically, we've to recompute/reassemble the same record. In those > cases, num_fpw should be reset. Thoughts? Mmm, yes, but since the same record is being recomputed from the same RedoRecPtr, doesn't that mean that we need to reset the counter? Otherwise we would count the same FPW multiple times.
On Tue, Mar 31, 2020 at 7:39 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Tue, Mar 31, 2020 at 12:21 PM Kuntal Ghosh > <kuntalghosh.2007@gmail.com> wrote: > > > > On Mon, Mar 30, 2020 at 6:14 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > @@ -448,6 +449,7 @@ XLogInsert(RmgrId rmid, uint8 info) > > bool doPageWrites; > > XLogRecPtr fpw_lsn; > > XLogRecData *rdt; > > + int num_fpw = 0; > > > > /* > > * Get values needed to decide whether to do full-page writes. Since > > @@ -457,9 +459,9 @@ XLogInsert(RmgrId rmid, uint8 info) > > GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites); > > > > rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites, > > - &fpw_lsn); > > + &fpw_lsn, &num_fpw); > > > > - EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags); > > + EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpw); > > } while (EndPos == InvalidXLogRecPtr); > > > > I think there are some issues in the num_fpw calculation. For some > > cases, we have to return from XLogInsert without inserting a record. > > Basically, we've to recompute/reassemble the same record. In those > > cases, num_fpw should be reset. Thoughts? > > Mmm, yes but since that's the same record is being recomputed from the > same RedoRecPtr, doesn't it mean that we need to reset the counter? > Otherwise we would count the same FPW multiple times. Yes. That was my point as well. I missed the part that you're already resetting the same inside the do-while loop before calling XLogRecordAssemble. Sorry for the noise.
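(For reference, the hunk quoted above declares num_fpw inside the do-while block, so it is implicitly reset on every attempt; the retry loop roughly looks like this sketch of the patched XLogInsert():

    do
    {
        XLogRecPtr  RedoRecPtr;
        bool        doPageWrites;
        XLogRecPtr  fpw_lsn;
        XLogRecData *rdt;
        int         num_fpw = 0;    /* (re)initialized on every attempt */

        GetFullPageWriteInfo(&RedoRecPtr, &doPageWrites);

        rdt = XLogRecordAssemble(rmid, info, RedoRecPtr, doPageWrites,
                                 &fpw_lsn, &num_fpw);

        EndPos = XLogInsertRecord(rdt, fpw_lsn, curinsert_flags, num_fpw);
    } while (EndPos == InvalidXLogRecPtr);

so a record that has to be reassembled does not get its full page images counted twice.)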
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Amit Kapila
On Tue, Mar 31, 2020 at 7:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > While testing I have found one issue. Basically, during a parallel > vacuum, it was showing more number of > shared_blk_hits+shared_blks_read. After, some investigation, I found > that during the cleanup phase nworkers are -1, and because of this we > didn't try to launch worker but "lps->pcxt->nworkers_launched" had the > old launched worker count and shared memory also had old buffer read > data which was never updated as we did not try to launch the worker. > > diff --git a/src/backend/access/heap/vacuumlazy.c > b/src/backend/access/heap/vacuumlazy.c > index b97b678..5dfaf4d 100644 > --- a/src/backend/access/heap/vacuumlazy.c > +++ b/src/backend/access/heap/vacuumlazy.c > @@ -2150,7 +2150,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel, > IndexBulkDeleteResult **stats, > * Next, accumulate buffer usage. (This must wait for the workers to > * finish, or we might get incomplete data.) > */ > - for (i = 0; i < lps->pcxt->nworkers_launched; i++) > + nworkers = Min(nworkers, lps->pcxt->nworkers_launched); > + for (i = 0; i < nworkers; i++) > InstrAccumParallelQuery(&lps->buffer_usage[i]); > > It worked after the above fix. > Good catch. I think we should not even call WaitForParallelWorkersToFinish for such a case. So, I guess the fix could be, if (workers > 0) { WaitForParallelWorkersToFinish(); for (i = 0; i < lps->pcxt->nworkers_launched; i++) InstrAccumParallelQuery(&lps->buffer_usage[i]); } or something along those lines. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Dilip Kumar
On Wed, Apr 1, 2020 at 8:16 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Mar 31, 2020 at 7:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > While testing I have found one issue. Basically, during a parallel > > vacuum, it was showing more number of > > shared_blk_hits+shared_blks_read. After, some investigation, I found > > that during the cleanup phase nworkers are -1, and because of this we > > didn't try to launch worker but "lps->pcxt->nworkers_launched" had the > > old launched worker count and shared memory also had old buffer read > > data which was never updated as we did not try to launch the worker. > > > > diff --git a/src/backend/access/heap/vacuumlazy.c > > b/src/backend/access/heap/vacuumlazy.c > > index b97b678..5dfaf4d 100644 > > --- a/src/backend/access/heap/vacuumlazy.c > > +++ b/src/backend/access/heap/vacuumlazy.c > > @@ -2150,7 +2150,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel, > > IndexBulkDeleteResult **stats, > > * Next, accumulate buffer usage. (This must wait for the workers to > > * finish, or we might get incomplete data.) > > */ > > - for (i = 0; i < lps->pcxt->nworkers_launched; i++) > > + nworkers = Min(nworkers, lps->pcxt->nworkers_launched); > > + for (i = 0; i < nworkers; i++) > > InstrAccumParallelQuery(&lps->buffer_usage[i]); > > > > It worked after the above fix. > > > > Good catch. I think we should not even call > WaitForParallelWorkersToFinish for such a case. So, I guess the fix > could be, > > if (workers > 0) > { > WaitForParallelWorkersToFinish(); > for (i = 0; i < lps->pcxt->nworkers_launched; i++) > InstrAccumParallelQuery(&lps->buffer_usage[i]); > } > > or something along those lines. Hmm, Right! -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Masahiko Sawada
On Wed, 1 Apr 2020 at 11:46, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Mar 31, 2020 at 7:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > While testing I have found one issue. Basically, during a parallel > > vacuum, it was showing more number of > > shared_blk_hits+shared_blks_read. After, some investigation, I found > > that during the cleanup phase nworkers are -1, and because of this we > > didn't try to launch worker but "lps->pcxt->nworkers_launched" had the > > old launched worker count and shared memory also had old buffer read > > data which was never updated as we did not try to launch the worker. > > > > diff --git a/src/backend/access/heap/vacuumlazy.c > > b/src/backend/access/heap/vacuumlazy.c > > index b97b678..5dfaf4d 100644 > > --- a/src/backend/access/heap/vacuumlazy.c > > +++ b/src/backend/access/heap/vacuumlazy.c > > @@ -2150,7 +2150,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel, > > IndexBulkDeleteResult **stats, > > * Next, accumulate buffer usage. (This must wait for the workers to > > * finish, or we might get incomplete data.) > > */ > > - for (i = 0; i < lps->pcxt->nworkers_launched; i++) > > + nworkers = Min(nworkers, lps->pcxt->nworkers_launched); > > + for (i = 0; i < nworkers; i++) > > InstrAccumParallelQuery(&lps->buffer_usage[i]); > > > > It worked after the above fix. > > > > Good catch. I think we should not even call > WaitForParallelWorkersToFinish for such a case. So, I guess the fix > could be, > > if (workers > 0) > { > WaitForParallelWorkersToFinish(); > for (i = 0; i < lps->pcxt->nworkers_launched; i++) > InstrAccumParallelQuery(&lps->buffer_usage[i]); > } > Agreed. I've attached the updated patch. Thank you for testing, Dilip! Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Dilip Kumar
On Wed, Apr 1, 2020 at 8:26 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 1 Apr 2020 at 11:46, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Mar 31, 2020 at 7:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > While testing I have found one issue. Basically, during a parallel > > > vacuum, it was showing more number of > > > shared_blk_hits+shared_blks_read. After, some investigation, I found > > > that during the cleanup phase nworkers are -1, and because of this we > > > didn't try to launch worker but "lps->pcxt->nworkers_launched" had the > > > old launched worker count and shared memory also had old buffer read > > > data which was never updated as we did not try to launch the worker. > > > > > > diff --git a/src/backend/access/heap/vacuumlazy.c > > > b/src/backend/access/heap/vacuumlazy.c > > > index b97b678..5dfaf4d 100644 > > > --- a/src/backend/access/heap/vacuumlazy.c > > > +++ b/src/backend/access/heap/vacuumlazy.c > > > @@ -2150,7 +2150,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel, > > > IndexBulkDeleteResult **stats, > > > * Next, accumulate buffer usage. (This must wait for the workers to > > > * finish, or we might get incomplete data.) > > > */ > > > - for (i = 0; i < lps->pcxt->nworkers_launched; i++) > > > + nworkers = Min(nworkers, lps->pcxt->nworkers_launched); > > > + for (i = 0; i < nworkers; i++) > > > InstrAccumParallelQuery(&lps->buffer_usage[i]); > > > > > > It worked after the above fix. > > > > > > > Good catch. I think we should not even call > > WaitForParallelWorkersToFinish for such a case. So, I guess the fix > > could be, > > > > if (workers > 0) > > { > > WaitForParallelWorkersToFinish(); > > for (i = 0; i < lps->pcxt->nworkers_launched; i++) > > InstrAccumParallelQuery(&lps->buffer_usage[i]); > > } > > > > Agreed. I've attached the updated patch. > > Thank you for testing, Dilip! Thanks! One hunk is failing on the latest head. And, I have rebased the patch for my testing so posting the same. I have done some more testing to test multi-pass vacuum. postgres[114321]=# show maintenance_work_mem ; maintenance_work_mem ---------------------- 1MB (1 row) --Test case select pg_stat_statements_reset(); drop table test; CREATE TABLE test (a int, b int); CREATE INDEX idx1 on test(a); CREATE INDEX idx2 on test(b); INSERT INTO test SELECT i, i FROM GENERATE_SERIES(1,2000000) as i; DELETE FROM test where a%2=0; VACUUM (PARALLEL n) test; select query, total_time, shared_blks_hit, shared_blks_read, shared_blks_hit + shared_blks_read as total_read_blks, shared_blks_dirtied, shared_blks_written from pg_stat_statements where query like 'VACUUM%'; query | total_time | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written --------------------------+-------------+-----------------+------------------+-----------------+---------------------+--------------------- VACUUM (PARALLEL 0) test | 5964.282408 | 92447 | 6 | 92453 | 19789 | 0 query | total_time | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written --------------------------+--------------------+-----------------+------------------+-----------------+---------------------+--------------------- VACUUM (PARALLEL 1) test | 3957.7658810000003 | 92447 | 6 | 92453 | 19789 | 0 (1 row) So I am getting correct results with the multi-pass vacuum. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Amit Kapila
On Wed, Apr 1, 2020 at 8:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > Agreed. I've attached the updated patch. > > > > Thank you for testing, Dilip! > > Thanks! One hunk is failing on the latest head. And, I have rebased > the patch for my testing so posting the same. I have done some more > testing to test multi-pass vacuum. > The patch looks good to me. I have done a few minor modifications (a) moved the declaration of variable closer to where it is used, (b) changed a comment, (c) ran pgindent. I have also done some additional testing with more number of indexes and found that vacuum and parallel vacuum used the same number of total_read_blks and that is what is expected here. Let me know what you think of the attached? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Dilip Kumar
On Wed, Apr 1, 2020 at 8:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Apr 1, 2020 at 8:26 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Wed, 1 Apr 2020 at 11:46, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Mar 31, 2020 at 7:32 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > While testing I have found one issue. Basically, during a parallel > > > > vacuum, it was showing more number of > > > > shared_blk_hits+shared_blks_read. After, some investigation, I found > > > > that during the cleanup phase nworkers are -1, and because of this we > > > > didn't try to launch worker but "lps->pcxt->nworkers_launched" had the > > > > old launched worker count and shared memory also had old buffer read > > > > data which was never updated as we did not try to launch the worker. > > > > > > > > diff --git a/src/backend/access/heap/vacuumlazy.c > > > > b/src/backend/access/heap/vacuumlazy.c > > > > index b97b678..5dfaf4d 100644 > > > > --- a/src/backend/access/heap/vacuumlazy.c > > > > +++ b/src/backend/access/heap/vacuumlazy.c > > > > @@ -2150,7 +2150,8 @@ lazy_parallel_vacuum_indexes(Relation *Irel, > > > > IndexBulkDeleteResult **stats, > > > > * Next, accumulate buffer usage. (This must wait for the workers to > > > > * finish, or we might get incomplete data.) > > > > */ > > > > - for (i = 0; i < lps->pcxt->nworkers_launched; i++) > > > > + nworkers = Min(nworkers, lps->pcxt->nworkers_launched); > > > > + for (i = 0; i < nworkers; i++) > > > > InstrAccumParallelQuery(&lps->buffer_usage[i]); > > > > > > > > It worked after the above fix. > > > > > > > > > > Good catch. I think we should not even call > > > WaitForParallelWorkersToFinish for such a case. So, I guess the fix > > > could be, > > > > > > if (workers > 0) > > > { > > > WaitForParallelWorkersToFinish(); > > > for (i = 0; i < lps->pcxt->nworkers_launched; i++) > > > InstrAccumParallelQuery(&lps->buffer_usage[i]); > > > } > > > > > > > Agreed. I've attached the updated patch. > > > > Thank you for testing, Dilip! > > Thanks! One hunk is failing on the latest head. And, I have rebased > the patch for my testing so posting the same. I have done some more > testing to test multi-pass vacuum. 
> > postgres[114321]=# show maintenance_work_mem ; > maintenance_work_mem > ---------------------- > 1MB > (1 row) > > --Test case > select pg_stat_statements_reset(); > drop table test; > CREATE TABLE test (a int, b int); > CREATE INDEX idx1 on test(a); > CREATE INDEX idx2 on test(b); > INSERT INTO test SELECT i, i FROM GENERATE_SERIES(1,2000000) as i; > DELETE FROM test where a%2=0; > VACUUM (PARALLEL n) test; > select query, total_time, shared_blks_hit, shared_blks_read, > shared_blks_hit + shared_blks_read as total_read_blks, > shared_blks_dirtied, shared_blks_written from pg_stat_statements where > query like 'VACUUM%'; > > query | total_time | shared_blks_hit | > shared_blks_read | total_read_blks | shared_blks_dirtied | > shared_blks_written > --------------------------+-------------+-----------------+------------------+-----------------+---------------------+--------------------- > VACUUM (PARALLEL 0) test | 5964.282408 | 92447 | > 6 | 92453 | 19789 | 0 > > > query | total_time | shared_blks_hit | > shared_blks_read | total_read_blks | shared_blks_dirtied | > shared_blks_written > --------------------------+--------------------+-----------------+------------------+-----------------+---------------------+--------------------- > VACUUM (PARALLEL 1) test | 3957.7658810000003 | 92447 | > 6 | 92453 | 19789 | > 0 > (1 row) > > So I am getting correct results with the multi-pass vacuum. I have done some testing for the parallel "create index". postgres[99536]=# show maintenance_work_mem ; maintenance_work_mem ---------------------- 1MB (1 row) CREATE TABLE test (a int, b int); INSERT INTO test SELECT i, i FROM GENERATE_SERIES(1,2000000) as i; CREATE INDEX idx1 on test(a); select query, total_time, shared_blks_hit, shared_blks_read, shared_blks_hit + shared_blks_read as total_read_blks, shared_blks_dirtied, shared_blks_written from pg_stat_statements where query like 'CREATE INDEX%'; SET max_parallel_maintenance_workers TO 0; query | total_time | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written ------------------------------+--------------------+-----------------+------------------+-----------------+---------------------+--------------------- CREATE INDEX idx1 on test(a) | 1947.4959979999999 | 8947 | 11 | 8958 | 5 | 0 SET max_parallel_maintenance_workers TO 2; query | total_time | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written ------------------------------+--------------------+-----------------+------------------+-----------------+---------------------+--------------------- CREATE INDEX idx1 on test(a) | 1942.1426040000001 | 8960 | 14 | 8974 | 5 | 0 (1 row) I have noticed that the total_read_blks, with the parallel, create index is more compared to non-parallel one. I have created a fresh database before each run. I am not much aware of the internal code of parallel create an index so I am not sure whether it is expected to read extra blocks with the parallel create an index. I guess maybe because multiple workers are inserting int the btree they might need to visit some btree nodes multiple times while traversing the tree down. But, it's better if someone who have more idea with this code can confirm this. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
So here's a v9, rebased on top of the latest versions of Sawada-san's bug fixes (Amit's v6 for vacuum and Sawada-san's v2 for create index), with all previously mentioned changes. Note that I'm only attaching those patches for convenience and to make sure that cfbot is happy.
Attachment
- v9-0001-Allow-parallel-vacuum-to-accumulate-buffer-usage.patch
- v9-0002-Allow-parallel-index-creation-to-accumulate-buffe.patch
- v9-0003-Add-infrastructure-to-track-WAL-usage.patch
- v9-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut.patch
- v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements.patch
- v9-0006-Expose-WAL-usage-counters-in-verbose-auto-vacuum-.patch
On Wed, Apr 1, 2020 at 1:32 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > So here's a v9, rebased on top of the latest versions of Sawada-san's bug fixes > (Amit's v6 for vacuum and Sawada-san's v2 for create index), with all > previously mentionned changes. > Few other comments: v9-0003-Add-infrastructure-to-track-WAL-usage 1. static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add); - +static void WalUsageAdd(WalUsage *dst, WalUsage *add); Looks like a spurious line removal 2. + /* Report a full page imsage constructed for the WAL record */ + *num_fpw += 1; Typo. /imsage/image 3. Doing some testing with and without parallelism to ensure WAL usage data is correct would be great and if possible, share the results? v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements 4. +-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE +SELECT query, calls, rows, +wal_bytes > 0 as wal_bytes_generated, +wal_records > 0 as wal_records_generated, +wal_records = rows as wal_records_as_rows +FROM pg_stat_statements ORDER BY query COLLATE "C"; + query | calls | rows | wal_bytes_generated | wal_records_generated | wal_records_as_rows +------------------------------------------------------------------+-------+------+---------------------+-----------------------+--------------------- + DELETE FROM pgss_test WHERE a > $1 | 1 | 1 | t | t | t + DROP TABLE pgss_test | 1 | 0 | t | t | f + INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | 1 | 3 | t | t | t + INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) | 1 | 10 | t | t | t + SELECT * FROM pgss_test ORDER BY a | 1 | 12 | f | f | f + SELECT * FROM pgss_test WHERE a > $1 ORDER BY a | 2 | 4 | f | f | f + SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5) | 1 | 8 | f | f | f + SELECT pg_stat_statements_reset() | 1 | 1 | f | f | f + SET pg_stat_statements.track_utility = FALSE | 1 | 0 | f | f | t + UPDATE pgss_test SET b = $1 WHERE a = $2 | 6 | 6 | t | t | t + UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 | 3 | t | t | t +(11 rows) + I am not sure if the above tests make much sense as they are just testing that if WAL is generated for these commands. I understand it is not easy to make these tests reliable but in that case, we can think of some simple tests. It seems to me that the difficulty is due to full_page_writes as that depends on the checkpoint. Can we make full_page_writes = off for these tests and check some simple Insert/Update/Delete cases? Alternatively, if you can present the reason why that is unstable or are tricky to write, then we can simply get rid of these tests because I don't see tests for BufferUsage. Let not write tests for the sake of writing it unless they can detect bugs in the future or are meaningfully covering the new code added. 5. 
-SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; - query | calls | rows ------------------------------------+-------+------ - SELECT $1::TEXT | 1 | 1 - SELECT PLUS_ONE($1) | 2 | 2 - SELECT PLUS_TWO($1) | 2 | 2 - SELECT pg_stat_statements_reset() | 1 | 1 +SELECT query, calls, rows, wal_bytes, wal_records FROM pg_stat_statements ORDER BY query COLLATE "C"; + query | calls | rows | wal_bytes | wal_records +-----------------------------------+-------+------+-----------+------------- + SELECT $1::TEXT | 1 | 1 | 0 | 0 + SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0 + SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0 + SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0 (4 rows) Again, I am not sure if these modifications make much sense? 6. static void pgss_shmem_startup(void); @@ -313,6 +318,7 @@ static void pgss_store(const char *query, uint64 queryId, int query_location, int query_len, double total_time, uint64 rows, const BufferUsage *bufusage, + const WalUsage* walusage, pgssJumbleState *jstate); The alignment for walusage doesn't seem to be correct. Running pgindent will fix this. 7. + values[i++] = Int64GetDatumFast(tmp.wal_records); + values[i++] = UInt64GetDatum(tmp.wal_num_fpw); Why are they different? I think we should use the same *GetDatum API (probably Int64GetDatumFast) for these. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
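For reference while reading the review items above (the WalUsageAdd() prototype in item 1 and the counters in items 4-7), a rough sketch of the 0003 infrastructure being discussed. It assumes PostgreSQL's int64/uint64 typedefs from c.h; the field layout is taken from the columns shown in this thread and may not match the final patch exactly.

typedef struct WalUsage
{
	int64		wal_records;	/* # of WAL records produced */
	int64		wal_num_fpw;	/* # of WAL full page images produced */
	uint64		wal_bytes;		/* size of WAL records produced */
} WalUsage;

static void
WalUsageAdd(WalUsage *dst, WalUsage *add)
{
	dst->wal_records += add->wal_records;
	dst->wal_num_fpw += add->wal_num_fpw;
	dst->wal_bytes += add->wal_bytes;
}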
On Wed, Apr 1, 2020 at 4:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements > One more comment related to this patch. + + snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes); + + /* Convert to numeric. */ + wal_bytes = DirectFunctionCall3(numeric_in, + CStringGetDatum(buf), + ObjectIdGetDatum(0), + Int32GetDatum(-1)); + + values[i++] = wal_bytes; I see that other places that display uint64 values use BIGINT datatype in SQL, so why can't we do the same here? See the usage of queryid in pg_stat_statements or internal_pages, *_pages exposed via pgstatindex.c. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
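For comparison, the bigint route being asked about would look roughly like the sketch below (wal_bytes_signed is a hypothetical local, not code from the patch). It works, but once the counter exceeds PG_INT64_MAX the cast wraps around and the column would show a negative byte count, which is the downside Julien points out later in the thread.

	int64		wal_bytes_signed = (int64) tmp.wal_bytes;

	/* exposed as bigint; wraps to a negative value past PG_INT64_MAX */
	values[i++] = Int64GetDatumFast(wal_bytes_signed);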
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Dilip Kumar
On Wed, Apr 1, 2020 at 12:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Apr 1, 2020 at 8:51 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > Agreed. I've attached the updated patch. > > > > > > Thank you for testing, Dilip! > > > > Thanks! One hunk is failing on the latest head. And, I have rebased > > the patch for my testing so posting the same. I have done some more > > testing to test multi-pass vacuum. > > > > The patch looks good to me. I have done a few minor modifications (a) > moved the declaration of variable closer to where it is used, (b) > changed a comment, (c) ran pgindent. I have also done some additional > testing with more number of indexes and found that vacuum and parallel > vacuum used the same number of total_read_blks and that is what is > expected here. > > Let me know what you think of the attached? The patch looks fine to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Apr 1, 2020 at 5:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Apr 1, 2020 at 4:29 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements > > > > One more comment related to this patch. > + > + snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes); > + > + /* Convert to numeric. */ > + wal_bytes = DirectFunctionCall3(numeric_in, > + CStringGetDatum(buf), > + ObjectIdGetDatum(0), > + Int32GetDatum(-1)); > + > + values[i++] = wal_bytes; > > I see that other places that display uint64 values use BIGINT datatype > in SQL, so why can't we do the same here? See the usage of queryid in > pg_stat_statements or internal_pages, *_pages exposed via > pgstatindex.c. I have reviewed 0003 and 0004, I have a few comments. v9-0003-Add-infrastructure-to-track-WAL-usage 1. /* Points to buffer usage area in DSM */ BufferUsage *buffer_usage; + /* Points to WAL usage area in DSM */ + WalUsage *wal_usage; Better to give one blank line between the previous statement/variable declaration and the next comment line. /* Points to buffer usage area in DSM */ BufferUsage *buffer_usage; ---------Empty line here-------------------- + /* Points to WAL usage area in DSM */ + WalUsage *wal_usage; 2. @@ -2154,7 +2157,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats, WaitForParallelWorkersToFinish(lps->pcxt); for (i = 0; i < lps->pcxt->nworkers_launched; i++) - InstrAccumParallelQuery(&lps->buffer_usage[i]); + InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]); } The existing comment above this loop, which just mentions the buffer usage, not the wal usage so I guess we need to change that. " /* * Next, accumulate buffer usage. (This must wait for the workers to * finish, or we might get incomplete data.) */" v9-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut 3. + if (usage->wal_num_fpw > 0) + appendStringInfo(es->str, " full page records=%ld", + usage->wal_num_fpw); + if (usage->wal_bytes > 0) + appendStringInfo(es->str, " bytes=" UINT64_FORMAT, + usage->wal_bytes); Shall we change to 'full page writes' or 'full page image' instead of full page records? Apart from this, I have some testing to see the wal_usage with the parallel vacuum and the results look fine. postgres[104248]=# CREATE TABLE test (a int, b int); CREATE TABLE postgres[104248]=# INSERT INTO test SELECT i, i FROM GENERATE_SERIES(1,2000000) as i; INSERT 0 2000000 postgres[104248]=# CREATE INDEX idx1 on test(a); CREATE INDEX postgres[104248]=# VACUUM (PARALLEL 1) test; VACUUM postgres[104248]=# select query , wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query like 'VACUUM%'; query | wal_bytes | wal_records | wal_num_fpw --------------------------+-----------+-------------+------------- VACUUM (PARALLEL 1) test | 72814331 | 8857 | 8855 postgres[106479]=# CREATE TABLE test (a int, b int); CREATE TABLE postgres[106479]=# INSERT INTO test SELECT i, i FROM GENERATE_SERIES(1,2000000) as i; INSERT 0 2000000 postgres[106479]=# CREATE INDEX idx1 on test(a); CREATE INDEX postgres[106479]=# VACUUM (PARALLEL 0) test; VACUUM postgres[106479]=# select query , wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query like 'VACUUM%'; query | wal_bytes | wal_records | wal_num_fpw --------------------------+-----------+-------------+------------- VACUUM (PARALLEL 0) test | 72814331 | 8857 | 8855 By tomorrow, I will try to finish reviewing 0005 and 0006. 
-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
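To make the hunk in comment 2 concrete, the widened accumulator could look roughly like this; pgBufferUsage is the existing backend-global counter in instrument.c, while pgWalUsage is assumed here to be its WAL counterpart added by 0003 (sketch only, not verified against the patch):

void
InstrAccumParallelQuery(BufferUsage *bufusage, WalUsage *walusage)
{
	/* fold one worker's counters, read back from DSM, into the leader's */
	BufferUsageAdd(&pgBufferUsage, bufusage);
	WalUsageAdd(&pgWalUsage, walusage);
}

The leader would call this once per launched worker, exactly as the quoted lazy_parallel_vacuum_indexes() loop already does for buffer usage.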
Hi, I'm replying here to all reviews that have been sent, thanks a lot! On Wed, Apr 01, 2020 at 04:29:16PM +0530, Amit Kapila wrote: > On Wed, Apr 1, 2020 at 1:32 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > So here's a v9, rebased on top of the latest versions of Sawada-san's bug fixes > > (Amit's v6 for vacuum and Sawada-san's v2 for create index), with all > > previously mentionned changes. > > > > Few other comments: > v9-0003-Add-infrastructure-to-track-WAL-usage > 1. > static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add); > - > +static void WalUsageAdd(WalUsage *dst, WalUsage *add); > > Looks like a spurious line removal Fixed. > 2. > + /* Report a full page imsage constructed for the WAL record */ > + *num_fpw += 1; > > Typo. /imsage/image Ah sorry I though I fixed it previously, fixed. > 3. Doing some testing with and without parallelism to ensure WAL usage > data is correct would be great and if possible, share the results? I just saw that Dilip did some testing, but just in case here is some additional one - vacuum, after a truncate, loading 1M row and a "UPDATE t1 SET id = id" =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%vacuum%'; query | calls | wal_bytes | wal_records | wal_num_fpw ------------------------+-------+-----------+-------------+------------- vacuum (parallel 3) t1 | 1 | 20098962 | 34104 | 2 vacuum (parallel 0) t1 | 1 | 20098962 | 34104 | 2 (2 rows) - create index, overload t1's parallel_workers, using the 1M line just vacuumed: =# alter table t1 set (parallel_workers = 2); ALTER TABLE =# create index t1_parallel_2 on t1(id); CREATE INDEX =# alter table t1 set (parallel_workers = 0); ALTER TABLE =# create index t1_parallel_0 on t1(id); CREATE INDEX =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%'; query | calls | wal_bytes | wal_records | wal_num_fpw --------------------------------------+-------+-----------+-------------+------------- create index t1_parallel_0 on t1(id) | 1 | 20355540 | 2762 | 2745 create index t1_parallel_2 on t1(id) | 1 | 20406811 | 2762 | 2758 (2 rows) It all looks good to me. > v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements > 4. 
> +-- SELECT usage data, check WAL usage is reported, wal_records equal > rows count for INSERT/UPDATE/DELETE > +SELECT query, calls, rows, > +wal_bytes > 0 as wal_bytes_generated, > +wal_records > 0 as wal_records_generated, > +wal_records = rows as wal_records_as_rows > +FROM pg_stat_statements ORDER BY query COLLATE "C"; > + query | > calls | rows | wal_bytes_generated | wal_records_generated | > wal_records_as_rows > +------------------------------------------------------------------+-------+------+---------------------+-----------------------+--------------------- > + DELETE FROM pgss_test WHERE a > $1 | > 1 | 1 | t | t | t > + DROP TABLE pgss_test | > 1 | 0 | t | t | f > + INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | > 1 | 3 | t | t | t > + INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) | > 1 | 10 | t | t | t > + SELECT * FROM pgss_test ORDER BY a | > 1 | 12 | f | f | f > + SELECT * FROM pgss_test WHERE a > $1 ORDER BY a | > 2 | 4 | f | f | f > + SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5) | > 1 | 8 | f | f | f > + SELECT pg_stat_statements_reset() | > 1 | 1 | f | f | f > + SET pg_stat_statements.track_utility = FALSE | > 1 | 0 | f | f | t > + UPDATE pgss_test SET b = $1 WHERE a = $2 | > 6 | 6 | t | t | t > + UPDATE pgss_test SET b = $1 WHERE a > $2 | > 1 | 3 | t | t | t > +(11 rows) > + > > I am not sure if the above tests make much sense as they are just > testing that if WAL is generated for these commands. I understand it > is not easy to make these tests reliable but in that case, we can > think of some simple tests. It seems to me that the difficulty is due > to full_page_writes as that depends on the checkpoint. Can we make > full_page_writes = off for these tests and check some simple > Insert/Update/Delete cases? Alternatively, if you can present the > reason why that is unstable or are tricky to write, then we can simply > get rid of these tests because I don't see tests for BufferUsage. Let > not write tests for the sake of writing it unless they can detect bugs > in the future or are meaningfully covering the new code added. I don't think that we can have any hope in a stable amount of WAL bytes generated, so testing a positive number looks sensible to me. Then testing that each 1-line-write query generates a WAL record also looks sensible, so I kept this. I realized that Kirill used an existing set of queries that were previously added to validate the multi queries commands behavior, so there's no need to have all of them again. I just kept one of each (insert, update, delete, select) to make sure that we do record WAL activity there, but I don't think that more can really be done. I still think that this is better than nothing, but if you disagree feel free to drop those tests. > 5. 
> -SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; > - query | calls | rows > ------------------------------------+-------+------ > - SELECT $1::TEXT | 1 | 1 > - SELECT PLUS_ONE($1) | 2 | 2 > - SELECT PLUS_TWO($1) | 2 | 2 > - SELECT pg_stat_statements_reset() | 1 | 1 > +SELECT query, calls, rows, wal_bytes, wal_records FROM > pg_stat_statements ORDER BY query COLLATE "C"; > + query | calls | rows | wal_bytes | wal_records > +-----------------------------------+-------+------+-----------+------------- > + SELECT $1::TEXT | 1 | 1 | 0 | 0 > + SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0 > + SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0 > + SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0 > (4 rows) > > Again, I am not sure if these modifications make much sense? Those are queries that were previously executed. As those are read-only query, that are pretty much guaranteed to not cause any WAL activity, I don't see how it hurts to test at the same time that that's we indeed record with pg_stat_statements, just to be safe. Once again, feel free to drop the extra wal_* columns from the output if you disagree. > 6. > static void pgss_shmem_startup(void); > @@ -313,6 +318,7 @@ static void pgss_store(const char *query, uint64 queryId, > int query_location, int query_len, > double total_time, uint64 rows, > const BufferUsage *bufusage, > + const WalUsage* walusage, > pgssJumbleState *jstate); > > The alignment for walusage doesn't seem to be correct. Running > pgindent will fix this. Indeed, fixed. > 7. > + values[i++] = Int64GetDatumFast(tmp.wal_records); > + values[i++] = UInt64GetDatum(tmp.wal_num_fpw); > > Why are they different? I think we should use the same *GetDatum API > (probably Int64GetDatumFast) for these. Oops, that's a mistake from when I was working on the wal_bytes output, fixed. > > v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements > > > > One more comment related to this patch. > + > + snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes); > + > + /* Convert to numeric. */ > + wal_bytes = DirectFunctionCall3(numeric_in, > + CStringGetDatum(buf), > + ObjectIdGetDatum(0), > + Int32GetDatum(-1)); > + > + values[i++] = wal_bytes; > > I see that other places that display uint64 values use BIGINT datatype > in SQL, so why can't we do the same here? See the usage of queryid in > pg_stat_statements or internal_pages, *_pages exposed via > pgstatindex.c. That's because it's harmless to report a signed number for a hash (at least comapred to the overhead of having it unsigned), while that's certainly not wanted to report a negative amount of WAL bytes generated if it goes beyond bigint limit. See the usage of pg_lsn_mi in pg_lsn.c for instance. On Wed, Apr 01, 2020 at 07:20:31PM +0530, Dilip Kumar wrote: > > I have reviewed 0003 and 0004, I have a few comments. > v9-0003-Add-infrastructure-to-track-WAL-usage > > 1. > /* Points to buffer usage area in DSM */ > BufferUsage *buffer_usage; > + /* Points to WAL usage area in DSM */ > + WalUsage *wal_usage; > > Better to give one blank line between the previous statement/variable > declaration and the next comment line. Fixed. > 2. 
> @@ -2154,7 +2157,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel, > IndexBulkDeleteResult **stats, > WaitForParallelWorkersToFinish(lps->pcxt); > > for (i = 0; i < lps->pcxt->nworkers_launched; i++) > - InstrAccumParallelQuery(&lps->buffer_usage[i]); > + InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]); > } > > The existing comment above this loop, which just mentions the buffer > usage, not the wal usage so I guess we need to change that. Ah indeed, I thought I caught all the comments but missed this one. Fixed. > v9-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut > > 3. > + if (usage->wal_num_fpw > 0) > + appendStringInfo(es->str, " full page records=%ld", > + usage->wal_num_fpw); > + if (usage->wal_bytes > 0) > + appendStringInfo(es->str, " bytes=" UINT64_FORMAT, > + usage->wal_bytes); > > Shall we change to 'full page writes' or 'full page image' instead of > full page records? Indeed, I changed it in the (auto)vacuum output but missed this one. Fixed. > Apart from this, I have some testing to see the wal_usage with the > parallel vacuum and the results look fine. > > postgres[104248]=# CREATE TABLE test (a int, b int); > CREATE TABLE > postgres[104248]=# INSERT INTO test SELECT i, i FROM > GENERATE_SERIES(1,2000000) as i; > INSERT 0 2000000 > postgres[104248]=# CREATE INDEX idx1 on test(a); > CREATE INDEX > postgres[104248]=# VACUUM (PARALLEL 1) test; > VACUUM > postgres[104248]=# select query , wal_bytes, wal_records, wal_num_fpw > from pg_stat_statements where query like 'VACUUM%'; > query | wal_bytes | wal_records | wal_num_fpw > --------------------------+-----------+-------------+------------- > VACUUM (PARALLEL 1) test | 72814331 | 8857 | 8855 > > > > postgres[106479]=# CREATE TABLE test (a int, b int); > CREATE TABLE > postgres[106479]=# INSERT INTO test SELECT i, i FROM > GENERATE_SERIES(1,2000000) as i; > INSERT 0 2000000 > postgres[106479]=# CREATE INDEX idx1 on test(a); > CREATE INDEX > postgres[106479]=# VACUUM (PARALLEL 0) test; > VACUUM > postgres[106479]=# select query , wal_bytes, wal_records, wal_num_fpw > from pg_stat_statements where query like 'VACUUM%'; > query | wal_bytes | wal_records | wal_num_fpw > --------------------------+-----------+-------------+------------- > VACUUM (PARALLEL 0) test | 72814331 | 8857 | 8855 Thanks! I did some similar testing, with also seq/parallel index creation and got similar results. > By tomorrow, I will try to finish reviewing 0005 and 0006. Thanks!
Attachment
- v10-0001-Allow-parallel-vacuum-to-accumulate-buffer-usage.patch
- v10-0002-Allow-parallel-index-creation-to-accumulate-buff.patch
- v10-0003-Add-infrastructure-to-track-WAL-usage.patch
- v10-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-au.patch
- v10-0005-Keep-track-of-WAL-usage-in-pg_stat_statements.patch
- v10-0006-Expose-WAL-usage-counters-in-verbose-auto-vacuum.patch
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Amit Kapila
Adding Peter G. On Wed, Apr 1, 2020 at 12:41 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > I have done some testing for the parallel "create index". > > postgres[99536]=# show maintenance_work_mem ; > maintenance_work_mem > ---------------------- > 1MB > (1 row) > > CREATE TABLE test (a int, b int); > INSERT INTO test SELECT i, i FROM GENERATE_SERIES(1,2000000) as i; > CREATE INDEX idx1 on test(a); > select query, total_time, shared_blks_hit, shared_blks_read, > shared_blks_hit + shared_blks_read as total_read_blks, > shared_blks_dirtied, shared_blks_written from pg_stat_statements where > query like 'CREATE INDEX%'; > > > SET max_parallel_maintenance_workers TO 0; > query | total_time | shared_blks_hit | > shared_blks_read | total_read_blks | shared_blks_dirtied | > shared_blks_written > ------------------------------+--------------------+-----------------+------------------+-----------------+---------------------+--------------------- > CREATE INDEX idx1 on test(a) | 1947.4959979999999 | 8947 | > 11 | 8958 | 5 | > 0 > > SET max_parallel_maintenance_workers TO 2; > > query | total_time | shared_blks_hit | > shared_blks_read | total_read_blks | shared_blks_dirtied | > shared_blks_written > ------------------------------+--------------------+-----------------+------------------+-----------------+---------------------+--------------------- > CREATE INDEX idx1 on test(a) | 1942.1426040000001 | 8960 | > 14 | 8974 | 5 | > 0 > (1 row) > > I have noticed that the total_read_blks, with the parallel, create > index is more compared to non-parallel one. I have created a fresh > database before each run. I am not much aware of the internal code of > parallel create an index so I am not sure whether it is expected to > read extra blocks with the parallel create an index. I guess maybe > because multiple workers are inserting int the btree they might need > to visit some btree nodes multiple times while traversing the tree > down. But, it's better if someone who have more idea with this code > can confirm this. > Peter, Is this behavior expected? Let me summarize the situation so that it would be easier for Peter to comment. Julien has noticed that parallel vacuum and parallel create index doesn't seem to report correct values for buffer usage stats. Sawada-San wrote a patch to fix the problem for both the cases. We expect that 'total_read_blks' as reported in pg_stat_statements should give the same value for parallel and non-parallel operations. We see that is true for parallel vacuum and previously we have the same observation for the parallel query. Now, for parallel create index this doesn't seem to be true as test results by Dilip show that. We have two possibilities here (a) there is some bug in Sawada-San's patch or (b) this is expected behavior for parallel create index. What do you think? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Peter Geoghegan
On Wed, Apr 1, 2020 at 7:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > Peter, Is this behavior expected? > > Let me summarize the situation so that it would be easier for Peter to > comment. Julien has noticed that parallel vacuum and parallel create > index doesn't seem to report correct values for buffer usage stats. > Sawada-San wrote a patch to fix the problem for both the cases. We > expect that 'total_read_blks' as reported in pg_stat_statements should > give the same value for parallel and non-parallel operations. We see > that is true for parallel vacuum and previously we have the same > observation for the parallel query. Now, for parallel create index > this doesn't seem to be true as test results by Dilip show that. We > have two possibilities here (a) there is some bug in Sawada-San's > patch or (b) this is expected behavior for parallel create index. > What do you think? nbtree CREATE INDEX doesn't even go through the buffer manager. The difference that Dilip showed is probably due to extra catalog accesses in the two parallel workers -- pg_amproc lookups, and the like. Those are rather small differences, overall. Can Dilip demonstrate that the "extra" buffer accesses are proportionate to the number of workers launched in some constant, predictable way? -- Peter Geoghegan
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Dilip Kumar
On Thu, Apr 2, 2020 at 8:34 AM Peter Geoghegan <pg@bowt.ie> wrote: > > On Wed, Apr 1, 2020 at 7:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Peter, Is this behavior expected? > > > > Let me summarize the situation so that it would be easier for Peter to > > comment. Julien has noticed that parallel vacuum and parallel create > > index doesn't seem to report correct values for buffer usage stats. > > Sawada-San wrote a patch to fix the problem for both the cases. We > > expect that 'total_read_blks' as reported in pg_stat_statements should > > give the same value for parallel and non-parallel operations. We see > > that is true for parallel vacuum and previously we have the same > > observation for the parallel query. Now, for parallel create index > > this doesn't seem to be true as test results by Dilip show that. We > > have two possibilities here (a) there is some bug in Sawada-San's > > patch or (b) this is expected behavior for parallel create index. > > What do you think? > > nbtree CREATE INDEX doesn't even go through the buffer manager. Thanks for clarifying. So IIUC, it will not go through the buffer manager for the index pages, but for the heap pages, it will still go through the buffer manager. > The > difference that Dilip showed is probably due to extra catalog accesses > in the two parallel workers -- pg_amproc lookups, and the like. Those > are rather small differences, overall. > Can Dilip demonstrate the the "extra" buffer accesses are > proportionate to the number of workers launched in some constant, > predictable way? Okay, I will test this. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Dilip Kumar
On Thu, Apr 2, 2020 at 9:13 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Apr 2, 2020 at 8:34 AM Peter Geoghegan <pg@bowt.ie> wrote: > > > > On Wed, Apr 1, 2020 at 7:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > Peter, Is this behavior expected? > > > > > > Let me summarize the situation so that it would be easier for Peter to > > > comment. Julien has noticed that parallel vacuum and parallel create > > > index doesn't seem to report correct values for buffer usage stats. > > > Sawada-San wrote a patch to fix the problem for both the cases. We > > > expect that 'total_read_blks' as reported in pg_stat_statements should > > > give the same value for parallel and non-parallel operations. We see > > > that is true for parallel vacuum and previously we have the same > > > observation for the parallel query. Now, for parallel create index > > > this doesn't seem to be true as test results by Dilip show that. We > > > have two possibilities here (a) there is some bug in Sawada-San's > > > patch or (b) this is expected behavior for parallel create index. > > > What do you think? > > > > nbtree CREATE INDEX doesn't even go through the buffer manager. > > Thanks for clarifying. So IIUC, it will not go through the buffer > manager for the index pages, but for the heap pages, it will still go > through the buffer manager. > > > The > > difference that Dilip showed is probably due to extra catalog accesses > > in the two parallel workers -- pg_amproc lookups, and the like. Those > > are rather small differences, overall. > > > Can Dilip demonstrate the the "extra" buffer accesses are > > proportionate to the number of workers launched in some constant, > > predictable way? > > Okay, I will test this. 0-worker query | total_time | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written ------------------------------+-------------+-----------------+------------------+-----------------+---------------------+--------------------- CREATE INDEX idx1 on test(a) | 1228.895057 | 8947 | 11 | 8971 | 5 | 0 1-worker query | total_time | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written ------------------------------+-------------+-----------------+------------------+-----------------+---------------------+--------------------- CREATE INDEX idx1 on test(a) | 1006.157231 | 8962 | 12 | 8974 | 5 | 0 2-workers query | total_time | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written ------------------------------+------------+-----------------+------------------+-----------------+---------------------+--------------------- CREATE INDEX idx1 on test(a) | 949.44663 | 8965 | 12 | 8977 | 5 | 0 3-workers query | total_time | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written ------------------------------+-------------+-----------------+------------------+-----------------+---------------------+--------------------- CREATE INDEX idx1 on test(a) | 1037.297196 | 8968 | 12 | 8980 | 5 | 0 4-workers query | total_time | shared_blks_hit | shared_blks_read | total_read_blks | shared_blks_dirtied | shared_blks_written ------------------------------+------------+-----------------+------------------+-----------------+---------------------+--------------------- CREATE INDEX idx1 on test(a) | 889.332782 | 8971 | 12 | 8983 | 6 | 0 You are right, it is increasing with some constant factor. 
-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Wed, Apr 01, 2020 at 04:29:16PM +0530, Amit Kapila wrote: > > 3. Doing some testing with and without parallelism to ensure WAL usage > > data is correct would be great and if possible, share the results? > > > I just saw that Dilip did some testing, but just in case here is some > additional one > > - vacuum, after a truncate, loading 1M row and a "UPDATE t1 SET id = id" > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%vacuum%'; > query | calls | wal_bytes | wal_records | wal_num_fpw > ------------------------+-------+-----------+-------------+------------- > vacuum (parallel 3) t1 | 1 | 20098962 | 34104 | 2 > vacuum (parallel 0) t1 | 1 | 20098962 | 34104 | 2 > (2 rows) > > - create index, overload t1's parallel_workers, using the 1M line just > vacuumed: > > =# alter table t1 set (parallel_workers = 2); > ALTER TABLE > > =# create index t1_parallel_2 on t1(id); > CREATE INDEX > > =# alter table t1 set (parallel_workers = 0); > ALTER TABLE > > =# create index t1_parallel_0 on t1(id); > CREATE INDEX > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%'; > query | calls | wal_bytes | wal_records | wal_num_fpw > --------------------------------------+-------+-----------+-------------+------------- > create index t1_parallel_0 on t1(id) | 1 | 20355540 | 2762 | 2745 > create index t1_parallel_2 on t1(id) | 1 | 20406811 | 2762 | 2758 > (2 rows) > > It all looks good to me. > Here the wal_num_fpw and wal_bytes are different between parallel and non-parallel versions. Is it due to checkpoint or something else? We can probably rule out checkpoint by increasing checkpoint_timeout and other checkpoint related parameters. > > > 5. > > -SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; > > - query | calls | rows > > ------------------------------------+-------+------ > > - SELECT $1::TEXT | 1 | 1 > > - SELECT PLUS_ONE($1) | 2 | 2 > > - SELECT PLUS_TWO($1) | 2 | 2 > > - SELECT pg_stat_statements_reset() | 1 | 1 > > +SELECT query, calls, rows, wal_bytes, wal_records FROM > > pg_stat_statements ORDER BY query COLLATE "C"; > > + query | calls | rows | wal_bytes | wal_records > > +-----------------------------------+-------+------+-----------+------------- > > + SELECT $1::TEXT | 1 | 1 | 0 | 0 > > + SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0 > > + SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0 > > + SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0 > > (4 rows) > > > > Again, I am not sure if these modifications make much sense? > > > Those are queries that were previously executed. As those are read-only query, > that are pretty much guaranteed to not cause any WAL activity, I don't see how > it hurts to test at the same time that that's we indeed record with > pg_stat_statements, just to be safe. > On a similar theory, one could have checked bufferusage stats as well. The statements are using some expressions so don't see any value in check all usage data for such statements. > Once again, feel free to drop the extra > wal_* columns from the output if you disagree. > Right now, that particular patch is not getting applied (probably due to recent commit 17e0328224). Can you rebase it? > > > > v9-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut > > > > 3. 
> > + if (usage->wal_num_fpw > 0) > > + appendStringInfo(es->str, " full page records=%ld", > > + usage->wal_num_fpw); > > + if (usage->wal_bytes > 0) > > + appendStringInfo(es->str, " bytes=" UINT64_FORMAT, > > + usage->wal_bytes); > > > > Shall we change to 'full page writes' or 'full page image' instead of > > full page records? > > > Indeed, I changed it in the (auto)vacuum output but missed this one. Fixed. > I don't see this change in the patch. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 2, 2020 at 11:07 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > Also, I forgot to mention that let's not base this on buffer usage patch for create index (v10-0002-Allow-parallel-index-creation-to-accumulate-buff) because as per recent discussion I am not sure about its usefulness. I think we can proceed with this patch without v10-0002-Allow-parallel-index-creation-to-accumulate-buff as well. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > Hi, > > I'm replying here to all reviews that have been sent, thanks a lot! > > On Wed, Apr 01, 2020 at 04:29:16PM +0530, Amit Kapila wrote: > > On Wed, Apr 1, 2020 at 1:32 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > So here's a v9, rebased on top of the latest versions of Sawada-san's bug fixes > > > (Amit's v6 for vacuum and Sawada-san's v2 for create index), with all > > > previously mentionned changes. > > > > > > > Few other comments: > > v9-0003-Add-infrastructure-to-track-WAL-usage > > 1. > > static void BufferUsageAdd(BufferUsage *dst, const BufferUsage *add); > > - > > +static void WalUsageAdd(WalUsage *dst, WalUsage *add); > > > > Looks like a spurious line removal > > > Fixed. > > > > 2. > > + /* Report a full page imsage constructed for the WAL record */ > > + *num_fpw += 1; > > > > Typo. /imsage/image > > > Ah sorry I though I fixed it previously, fixed. > > > > 3. Doing some testing with and without parallelism to ensure WAL usage > > data is correct would be great and if possible, share the results? > > > I just saw that Dilip did some testing, but just in case here is some > additional one > > - vacuum, after a truncate, loading 1M row and a "UPDATE t1 SET id = id" > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%vacuum%'; > query | calls | wal_bytes | wal_records | wal_num_fpw > ------------------------+-------+-----------+-------------+------------- > vacuum (parallel 3) t1 | 1 | 20098962 | 34104 | 2 > vacuum (parallel 0) t1 | 1 | 20098962 | 34104 | 2 > (2 rows) > > - create index, overload t1's parallel_workers, using the 1M line just > vacuumed: > > =# alter table t1 set (parallel_workers = 2); > ALTER TABLE > > =# create index t1_parallel_2 on t1(id); > CREATE INDEX > > =# alter table t1 set (parallel_workers = 0); > ALTER TABLE > > =# create index t1_parallel_0 on t1(id); > CREATE INDEX > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%'; > query | calls | wal_bytes | wal_records | wal_num_fpw > --------------------------------------+-------+-----------+-------------+------------- > create index t1_parallel_0 on t1(id) | 1 | 20355540 | 2762 | 2745 > create index t1_parallel_2 on t1(id) | 1 | 20406811 | 2762 | 2758 > (2 rows) > > It all looks good to me. > > > > v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements > > 4. 
> > +-- SELECT usage data, check WAL usage is reported, wal_records equal > > rows count for INSERT/UPDATE/DELETE > > +SELECT query, calls, rows, > > +wal_bytes > 0 as wal_bytes_generated, > > +wal_records > 0 as wal_records_generated, > > +wal_records = rows as wal_records_as_rows > > +FROM pg_stat_statements ORDER BY query COLLATE "C"; > > + query | > > calls | rows | wal_bytes_generated | wal_records_generated | > > wal_records_as_rows > > +------------------------------------------------------------------+-------+------+---------------------+-----------------------+--------------------- > > + DELETE FROM pgss_test WHERE a > $1 | > > 1 | 1 | t | t | t > > + DROP TABLE pgss_test | > > 1 | 0 | t | t | f > > + INSERT INTO pgss_test (a, b) VALUES ($1, $2), ($3, $4), ($5, $6) | > > 1 | 3 | t | t | t > > + INSERT INTO pgss_test VALUES(generate_series($1, $2), $3) | > > 1 | 10 | t | t | t > > + SELECT * FROM pgss_test ORDER BY a | > > 1 | 12 | f | f | f > > + SELECT * FROM pgss_test WHERE a > $1 ORDER BY a | > > 2 | 4 | f | f | f > > + SELECT * FROM pgss_test WHERE a IN ($1, $2, $3, $4, $5) | > > 1 | 8 | f | f | f > > + SELECT pg_stat_statements_reset() | > > 1 | 1 | f | f | f > > + SET pg_stat_statements.track_utility = FALSE | > > 1 | 0 | f | f | t > > + UPDATE pgss_test SET b = $1 WHERE a = $2 | > > 6 | 6 | t | t | t > > + UPDATE pgss_test SET b = $1 WHERE a > $2 | > > 1 | 3 | t | t | t > > +(11 rows) > > + > > > > I am not sure if the above tests make much sense as they are just > > testing that if WAL is generated for these commands. I understand it > > is not easy to make these tests reliable but in that case, we can > > think of some simple tests. It seems to me that the difficulty is due > > to full_page_writes as that depends on the checkpoint. Can we make > > full_page_writes = off for these tests and check some simple > > Insert/Update/Delete cases? Alternatively, if you can present the > > reason why that is unstable or are tricky to write, then we can simply > > get rid of these tests because I don't see tests for BufferUsage. Let > > not write tests for the sake of writing it unless they can detect bugs > > in the future or are meaningfully covering the new code added. > > > I don't think that we can have any hope in a stable amount of WAL bytes > generated, so testing a positive number looks sensible to me. Then testing > that each 1-line-write query generates a WAL record also looks sensible, so I > kept this. I realized that Kirill used an existing set of queries that were > previously added to validate the multi queries commands behavior, so there's no > need to have all of them again. I just kept one of each (insert, update, > delete, select) to make sure that we do record WAL activity there, but I don't > think that more can really be done. I still think that this is better than > nothing, but if you disagree feel free to drop those tests. > > > > 5. 
> > -SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; > > - query | calls | rows > > ------------------------------------+-------+------ > > - SELECT $1::TEXT | 1 | 1 > > - SELECT PLUS_ONE($1) | 2 | 2 > > - SELECT PLUS_TWO($1) | 2 | 2 > > - SELECT pg_stat_statements_reset() | 1 | 1 > > +SELECT query, calls, rows, wal_bytes, wal_records FROM > > pg_stat_statements ORDER BY query COLLATE "C"; > > + query | calls | rows | wal_bytes | wal_records > > +-----------------------------------+-------+------+-----------+------------- > > + SELECT $1::TEXT | 1 | 1 | 0 | 0 > > + SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0 > > + SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0 > > + SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0 > > (4 rows) > > > > Again, I am not sure if these modifications make much sense? > > > Those are queries that were previously executed. As those are read-only query, > that are pretty much guaranteed to not cause any WAL activity, I don't see how > it hurts to test at the same time that that's we indeed record with > pg_stat_statements, just to be safe. Once again, feel free to drop the extra > wal_* columns from the output if you disagree. > > > > 6. > > static void pgss_shmem_startup(void); > > @@ -313,6 +318,7 @@ static void pgss_store(const char *query, uint64 queryId, > > int query_location, int query_len, > > double total_time, uint64 rows, > > const BufferUsage *bufusage, > > + const WalUsage* walusage, > > pgssJumbleState *jstate); > > > > The alignment for walusage doesn't seem to be correct. Running > > pgindent will fix this. > > > Indeed, fixed. > > > 7. > > + values[i++] = Int64GetDatumFast(tmp.wal_records); > > + values[i++] = UInt64GetDatum(tmp.wal_num_fpw); > > > > Why are they different? I think we should use the same *GetDatum API > > (probably Int64GetDatumFast) for these. > > > Oops, that's a mistake from when I was working on the wal_bytes output, fixed. > > > > v9-0005-Keep-track-of-WAL-usage-in-pg_stat_statements > > > > > > > One more comment related to this patch. > > + > > + snprintf(buf, sizeof buf, UINT64_FORMAT, tmp.wal_bytes); > > + > > + /* Convert to numeric. */ > > + wal_bytes = DirectFunctionCall3(numeric_in, > > + CStringGetDatum(buf), > > + ObjectIdGetDatum(0), > > + Int32GetDatum(-1)); > > + > > + values[i++] = wal_bytes; > > > > I see that other places that display uint64 values use BIGINT datatype > > in SQL, so why can't we do the same here? See the usage of queryid in > > pg_stat_statements or internal_pages, *_pages exposed via > > pgstatindex.c. > > > That's because it's harmless to report a signed number for a hash (at least > comapred to the overhead of having it unsigned), while that's certainly not > wanted to report a negative amount of WAL bytes generated if it goes beyond > bigint limit. See the usage of pg_lsn_mi in pg_lsn.c for instance. > > On Wed, Apr 01, 2020 at 07:20:31PM +0530, Dilip Kumar wrote: > > > > I have reviewed 0003 and 0004, I have a few comments. > > v9-0003-Add-infrastructure-to-track-WAL-usage > > > > 1. > > /* Points to buffer usage area in DSM */ > > BufferUsage *buffer_usage; > > + /* Points to WAL usage area in DSM */ > > + WalUsage *wal_usage; > > > > Better to give one blank line between the previous statement/variable > > declaration and the next comment line. > > > Fixed. > > > > 2. 
> > @@ -2154,7 +2157,7 @@ lazy_parallel_vacuum_indexes(Relation *Irel, > > IndexBulkDeleteResult **stats, > > WaitForParallelWorkersToFinish(lps->pcxt); > > > > for (i = 0; i < lps->pcxt->nworkers_launched; i++) > > - InstrAccumParallelQuery(&lps->buffer_usage[i]); > > + InstrAccumParallelQuery(&lps->buffer_usage[i], &lps->wal_usage[i]); > > } > > > > The existing comment above this loop, which just mentions the buffer > > usage, not the wal usage so I guess we need to change that. > > > Ah indeed, I thought I caught all the comments but missed this one. Fixed. > > > > v9-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut > > > > 3. > > + if (usage->wal_num_fpw > 0) > > + appendStringInfo(es->str, " full page records=%ld", > > + usage->wal_num_fpw); > > + if (usage->wal_bytes > 0) > > + appendStringInfo(es->str, " bytes=" UINT64_FORMAT, > > + usage->wal_bytes); > > > > Shall we change to 'full page writes' or 'full page image' instead of > > full page records? > > > Indeed, I changed it in the (auto)vacuum output but missed this one. Fixed. > > > > Apart from this, I have some testing to see the wal_usage with the > > parallel vacuum and the results look fine. > > > > postgres[104248]=# CREATE TABLE test (a int, b int); > > CREATE TABLE > > postgres[104248]=# INSERT INTO test SELECT i, i FROM > > GENERATE_SERIES(1,2000000) as i; > > INSERT 0 2000000 > > postgres[104248]=# CREATE INDEX idx1 on test(a); > > CREATE INDEX > > postgres[104248]=# VACUUM (PARALLEL 1) test; > > VACUUM > > postgres[104248]=# select query , wal_bytes, wal_records, wal_num_fpw > > from pg_stat_statements where query like 'VACUUM%'; > > query | wal_bytes | wal_records | wal_num_fpw > > --------------------------+-----------+-------------+------------- > > VACUUM (PARALLEL 1) test | 72814331 | 8857 | 8855 > > > > > > > > postgres[106479]=# CREATE TABLE test (a int, b int); > > CREATE TABLE > > postgres[106479]=# INSERT INTO test SELECT i, i FROM > > GENERATE_SERIES(1,2000000) as i; > > INSERT 0 2000000 > > postgres[106479]=# CREATE INDEX idx1 on test(a); > > CREATE INDEX > > postgres[106479]=# VACUUM (PARALLEL 0) test; > > VACUUM > > postgres[106479]=# select query , wal_bytes, wal_records, wal_num_fpw > > from pg_stat_statements where query like 'VACUUM%'; > > query | wal_bytes | wal_records | wal_num_fpw > > --------------------------+-----------+-------------+------------- > > VACUUM (PARALLEL 0) test | 72814331 | 8857 | 8855 > > > Thanks! I did some similar testing, with also seq/parallel index creation and > got similar results. > > > > By tomorrow, I will try to finish reviewing 0005 and 0006. I have reviewed these patches and I have a few cosmetic comments. v10-0005-Keep-track-of-WAL-usage-in-pg_stat_statements 1. + uint64 wal_bytes; /* total amount of wal bytes written */ + int64 wal_records; /* # of wal records written */ + int64 wal_num_fpw; /* # of full page wal records written */ /s/# of full page wal records written / /* # of WAL full page image produced */ 2. static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString, ProcessUtilityContext context, ParamListInfo params, QueryEnvironment *queryEnv, - DestReceiver *dest, QueryCompletion *qc); + DestReceiver *dest, QueryCompletion * qc); Useless hunk. 3. 
v10-0006-Expose-WAL-usage-counters-in-verbose-auto-vacuum @@ -3105,7 +3105,7 @@ show_wal_usage(ExplainState *es, const WalUsage *usage) { ExplainPropertyInteger("WAL records", NULL, usage->wal_records, es); - ExplainPropertyInteger("WAL full page records", NULL, + ExplainPropertyInteger("WAL full page writes", NULL, usage->wal_num_fpw, es); Just noticed that in 0004 you have first added "WAL full page records", which is later corrected to "WAL full page writes" in 0006. I think we better keep this proper in 0004 itself and avoid this hunk in 0006, otherwise, it creates confusion while reviewing. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 02, 2020 at 11:07:29AM +0530, Amit Kapila wrote: > On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > On Wed, Apr 01, 2020 at 04:29:16PM +0530, Amit Kapila wrote: > > > 3. Doing some testing with and without parallelism to ensure WAL usage > > > data is correct would be great and if possible, share the results? > > > > > > I just saw that Dilip did some testing, but just in case here is some > > additional one > > > > - vacuum, after a truncate, loading 1M row and a "UPDATE t1 SET id = id" > > > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%vacuum%'; > > query | calls | wal_bytes | wal_records | wal_num_fpw > > ------------------------+-------+-----------+-------------+------------- > > vacuum (parallel 3) t1 | 1 | 20098962 | 34104 | 2 > > vacuum (parallel 0) t1 | 1 | 20098962 | 34104 | 2 > > (2 rows) > > > > - create index, overload t1's parallel_workers, using the 1M line just > > vacuumed: > > > > =# alter table t1 set (parallel_workers = 2); > > ALTER TABLE > > > > =# create index t1_parallel_2 on t1(id); > > CREATE INDEX > > > > =# alter table t1 set (parallel_workers = 0); > > ALTER TABLE > > > > =# create index t1_parallel_0 on t1(id); > > CREATE INDEX > > > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%'; > > query | calls | wal_bytes | wal_records | wal_num_fpw > > --------------------------------------+-------+-----------+-------------+------------- > > create index t1_parallel_0 on t1(id) | 1 | 20355540 | 2762 | 2745 > > create index t1_parallel_2 on t1(id) | 1 | 20406811 | 2762 | 2758 > > (2 rows) > > > > It all looks good to me. > > > > Here the wal_num_fpw and wal_bytes are different between parallel and > non-parallel versions. Is it due to checkpoint or something else? We > can probably rule out checkpoint by increasing checkpoint_timeout and > other checkpoint related parameters. I think this is because I did a checkpoint after the VACUUM tests, so the 1st CREATE INDEX (with parallelism) induced some FPW on the catalog blocks. I didn't try to investigate more since: On Thu, Apr 02, 2020 at 11:22:16AM +0530, Amit Kapila wrote: > > Also, I forgot to mention that let's not base this on buffer usage > patch for create index > (v10-0002-Allow-parallel-index-creation-to-accumulate-buff) because as > per recent discussion I am not sure about its usefulness. I think we > can proceed with this patch without > v10-0002-Allow-parallel-index-creation-to-accumulate-buff as well. Which is done in attached v11. > > > 5. > > > -SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; > > > - query | calls | rows > > > ------------------------------------+-------+------ > > > - SELECT $1::TEXT | 1 | 1 > > > - SELECT PLUS_ONE($1) | 2 | 2 > > > - SELECT PLUS_TWO($1) | 2 | 2 > > > - SELECT pg_stat_statements_reset() | 1 | 1 > > > +SELECT query, calls, rows, wal_bytes, wal_records FROM > > > pg_stat_statements ORDER BY query COLLATE "C"; > > > + query | calls | rows | wal_bytes | wal_records > > > +-----------------------------------+-------+------+-----------+------------- > > > + SELECT $1::TEXT | 1 | 1 | 0 | 0 > > > + SELECT PLUS_ONE($1) | 2 | 2 | 0 | 0 > > > + SELECT PLUS_TWO($1) | 2 | 2 | 0 | 0 > > > + SELECT pg_stat_statements_reset() | 1 | 1 | 0 | 0 > > > (4 rows) > > > > > > Again, I am not sure if these modifications make much sense? 
> > > > > > Those are queries that were previously executed. As those are read-only query, > > that are pretty much guaranteed to not cause any WAL activity, I don't see how > > it hurts to test at the same time that that's we indeed record with > > pg_stat_statements, just to be safe. > > > > On a similar theory, one could have checked bufferusage stats as well. > The statements are using some expressions so don't see any value in > check all usage data for such statements. Dropped. > Right now, that particular patch is not getting applied (probably due > to recent commit 17e0328224). Can you rebase it? Done. > > > v9-0004-Add-option-to-report-WAL-usage-in-EXPLAIN-and-aut > > > > > > 3. > > > + if (usage->wal_num_fpw > 0) > > > + appendStringInfo(es->str, " full page records=%ld", > > > + usage->wal_num_fpw); > > > + if (usage->wal_bytes > 0) > > > + appendStringInfo(es->str, " bytes=" UINT64_FORMAT, > > > + usage->wal_bytes); > > > > > > Shall we change to 'full page writes' or 'full page image' instead of > > > full page records? > > > > > > Indeed, I changed it in the (auto)vacuum output but missed this one. Fixed. > > > > I don't see this change in the patch. Yes, as Dilip reported I fixuped the wrong commit, sorry about that. This version should now be ok. On Thu, Apr 02, 2020 at 12:04:32PM +0530, Dilip Kumar wrote: > On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > By tomorrow, I will try to finish reviewing 0005 and 0006. > > I have reviewed these patches and I have a few cosmetic comments. > v10-0005-Keep-track-of-WAL-usage-in-pg_stat_statements > > 1. > + uint64 wal_bytes; /* total amount of wal bytes written */ > + int64 wal_records; /* # of wal records written */ > + int64 wal_num_fpw; /* # of full page wal records written */ > > > /s/# of full page wal records written / /* # of WAL full page image produced */ Done, I also consistently s/wal/WAL/. > > 2. > static void pgss_ProcessUtility(PlannedStmt *pstmt, const char *queryString, > ProcessUtilityContext context, ParamListInfo params, > QueryEnvironment *queryEnv, > - DestReceiver *dest, QueryCompletion *qc); > + DestReceiver *dest, QueryCompletion * qc); > > Useless hunk. Oops, leftover of a pgindent as QueryCompletion isn't in the typedefs yet. I thought I discarded all the useless hunks but missed this one. Thanks, fixed. > > 3. > > v10-0006-Expose-WAL-usage-counters-in-verbose-auto-vacuum > > @@ -3105,7 +3105,7 @@ show_wal_usage(ExplainState *es, const WalUsage *usage) > { > ExplainPropertyInteger("WAL records", NULL, > usage->wal_records, es); > - ExplainPropertyInteger("WAL full page records", NULL, > + ExplainPropertyInteger("WAL full page writes", NULL, > usage->wal_num_fpw, es); > Just noticed that in 0004 you have first added "WAL full page > records", which is later corrected to "WAL full page writes" in 0006. > I think we better keep this proper in 0004 itself and avoid this hunk > in 0006, otherwise, it creates confusion while reviewing. Oh, I didn't realized that I fixuped the wrong commit. Fixed. I also adapted the documentation that mentioned full page records instead of full page images, and integrated Justin's comment: > In 0003: > + /* Provide WAL update data to the instrumentation */ > Remove "data" ?? so changed to "Report WAL traffic to the instrumentation." I didn't change the (auto)vacuum output yet (except fixing the s/full page records/full page writes/ that I previously missed), as it's not clear what the consensus is yet. 
I'll take care of that as soon as we reach a consensus.
Attachment
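For readers following along, here is a minimal standalone sketch of the
counters being discussed. Field names (wal_records, wal_num_fpw, wal_bytes)
follow the patch under review; the real code hooks into WAL record assembly
inside the server and differs in detail, so treat this as a model of the
bookkeeping rather than the implementation:

#include <stdint.h>
#include <stdio.h>

typedef struct WalUsage
{
    long        wal_records;    /* # of WAL records generated */
    long        wal_num_fpw;    /* # of WAL full page writes generated */
    uint64_t    wal_bytes;      /* total amount of WAL bytes generated */
} WalUsage;

/* one instance per backend, analogous to pgWalUsage in the patch */
static WalUsage pgWalUsage;

/* Bump the counters once per inserted WAL record. */
static void
count_wal_record(uint32_t total_len, int num_fpw)
{
    pgWalUsage.wal_records += 1;
    pgWalUsage.wal_num_fpw += num_fpw;
    pgWalUsage.wal_bytes += total_len;
}

int
main(void)
{
    count_wal_record(59, 0);    /* a plain heap INSERT record */
    count_wal_record(8218, 1);  /* a record carrying one full page image */

    printf("records=%ld fpw=%ld bytes=%llu\n",
           pgWalUsage.wal_records, pgWalUsage.wal_num_fpw,
           (unsigned long long) pgWalUsage.wal_bytes);
    return 0;
}

The per-statement numbers exposed through pg_stat_statements can then be
derived as the difference between two snapshots of these counters taken
around each statement.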
On Thu, Apr 2, 2020 at 2:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Thu, Apr 02, 2020 at 11:07:29AM +0530, Amit Kapila wrote: > > On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > On Wed, Apr 01, 2020 at 04:29:16PM +0530, Amit Kapila wrote: > > > > 3. Doing some testing with and without parallelism to ensure WAL usage > > > > data is correct would be great and if possible, share the results? > > > > > > > > > I just saw that Dilip did some testing, but just in case here is some > > > additional one > > > > > > - vacuum, after a truncate, loading 1M row and a "UPDATE t1 SET id = id" > > > > > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%vacuum%'; > > > query | calls | wal_bytes | wal_records | wal_num_fpw > > > ------------------------+-------+-----------+-------------+------------- > > > vacuum (parallel 3) t1 | 1 | 20098962 | 34104 | 2 > > > vacuum (parallel 0) t1 | 1 | 20098962 | 34104 | 2 > > > (2 rows) > > > > > > - create index, overload t1's parallel_workers, using the 1M line just > > > vacuumed: > > > > > > =# alter table t1 set (parallel_workers = 2); > > > ALTER TABLE > > > > > > =# create index t1_parallel_2 on t1(id); > > > CREATE INDEX > > > > > > =# alter table t1 set (parallel_workers = 0); > > > ALTER TABLE > > > > > > =# create index t1_parallel_0 on t1(id); > > > CREATE INDEX > > > > > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%'; > > > query | calls | wal_bytes | wal_records | wal_num_fpw > > > --------------------------------------+-------+-----------+-------------+------------- > > > create index t1_parallel_0 on t1(id) | 1 | 20355540 | 2762 | 2745 > > > create index t1_parallel_2 on t1(id) | 1 | 20406811 | 2762 | 2758 > > > (2 rows) > > > > > > It all looks good to me. > > > > > > > Here the wal_num_fpw and wal_bytes are different between parallel and > > non-parallel versions. Is it due to checkpoint or something else? We > > can probably rule out checkpoint by increasing checkpoint_timeout and > > other checkpoint related parameters. > > I think this is because I did a checkpoint after the VACUUM tests, so the 1st > CREATE INDEX (with parallelism) induced some FPW on the catalog blocks. I > didn't try to investigate more since: > We need to do this. > On Thu, Apr 02, 2020 at 11:22:16AM +0530, Amit Kapila wrote: > > > > Also, I forgot to mention that let's not base this on buffer usage > > patch for create index > > (v10-0002-Allow-parallel-index-creation-to-accumulate-buff) because as > > per recent discussion I am not sure about its usefulness. I think we > > can proceed with this patch without > > v10-0002-Allow-parallel-index-creation-to-accumulate-buff as well. > > > Which is done in attached v11. > Hmm, I haven't suggested removing the WAL usage from the parallel create index. I just told not to use the infrastructure of another patch. We bypass the buffer manager but do write WAL. See _bt_blwritepage->log_newpage. So we need to accumulate WAL usage even if we decide not to do anything about BufferUsage which means we need to investigate the above inconsistency in wal_num_fpw and wal_bytes between parallel and non-parallel version. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 02, 2020 at 02:32:07PM +0530, Amit Kapila wrote: > On Thu, Apr 2, 2020 at 2:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > On Thu, Apr 02, 2020 at 11:07:29AM +0530, Amit Kapila wrote: > > > On Wed, Apr 1, 2020 at 8:00 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > > > On Wed, Apr 01, 2020 at 04:29:16PM +0530, Amit Kapila wrote: > > > > > 3. Doing some testing with and without parallelism to ensure WAL usage > > > > > data is correct would be great and if possible, share the results? > > > > > > > > > > > > I just saw that Dilip did some testing, but just in case here is some > > > > additional one > > > > > > > > - vacuum, after a truncate, loading 1M row and a "UPDATE t1 SET id = id" > > > > > > > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%vacuum%'; > > > > query | calls | wal_bytes | wal_records | wal_num_fpw > > > > ------------------------+-------+-----------+-------------+------------- > > > > vacuum (parallel 3) t1 | 1 | 20098962 | 34104 | 2 > > > > vacuum (parallel 0) t1 | 1 | 20098962 | 34104 | 2 > > > > (2 rows) > > > > > > > > - create index, overload t1's parallel_workers, using the 1M line just > > > > vacuumed: > > > > > > > > =# alter table t1 set (parallel_workers = 2); > > > > ALTER TABLE > > > > > > > > =# create index t1_parallel_2 on t1(id); > > > > CREATE INDEX > > > > > > > > =# alter table t1 set (parallel_workers = 0); > > > > ALTER TABLE > > > > > > > > =# create index t1_parallel_0 on t1(id); > > > > CREATE INDEX > > > > > > > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%'; > > > > query | calls | wal_bytes | wal_records | wal_num_fpw > > > > --------------------------------------+-------+-----------+-------------+------------- > > > > create index t1_parallel_0 on t1(id) | 1 | 20355540 | 2762 | 2745 > > > > create index t1_parallel_2 on t1(id) | 1 | 20406811 | 2762 | 2758 > > > > (2 rows) > > > > > > > > It all looks good to me. > > > > > > > > > > Here the wal_num_fpw and wal_bytes are different between parallel and > > > non-parallel versions. Is it due to checkpoint or something else? We > > > can probably rule out checkpoint by increasing checkpoint_timeout and > > > other checkpoint related parameters. > > > > I think this is because I did a checkpoint after the VACUUM tests, so the 1st > > CREATE INDEX (with parallelism) induced some FPW on the catalog blocks. I > > didn't try to investigate more since: > > > > We need to do this. > > > On Thu, Apr 02, 2020 at 11:22:16AM +0530, Amit Kapila wrote: > > > > > > Also, I forgot to mention that let's not base this on buffer usage > > > patch for create index > > > (v10-0002-Allow-parallel-index-creation-to-accumulate-buff) because as > > > per recent discussion I am not sure about its usefulness. I think we > > > can proceed with this patch without > > > v10-0002-Allow-parallel-index-creation-to-accumulate-buff as well. > > > > > > Which is done in attached v11. > > > > Hmm, I haven't suggested removing the WAL usage from the parallel > create index. I just told not to use the infrastructure of another > patch. We bypass the buffer manager but do write WAL. See > _bt_blwritepage->log_newpage. So we need to accumulate WAL usage even > if we decide not to do anything about BufferUsage which means we need > to investigate the above inconsistency in wal_num_fpw and wal_bytes > between parallel and non-parallel version. 
Oh, I thought that you wanted to wait on that part, as we'll probably change the parallel create index to report buffer access eventually. v12 attached with an adaptation of Sawada-san's original patch but only dealing with WAL activity. I did some more experiment, ensuring as much stability as possible: =# create table t1(id integer); CREATE TABLE =# insert into t1 select * from generate_series(1, 1000000); INSERT 0 1000000 =# select * from pg_stat_statements_reset() ; pg_stat_statements_reset -------------------------- (1 row) =# alter table t1 set (parallel_workers = 0); ALTER TABLE =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_0 ON t1(id); CREATE INDEX =# alter table t1 set (parallel_workers = 1); ALTER TABLE =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_1 ON t1(id); CREATE INDEX =# alter table t1 set (parallel_workers = 2); ALTER TABLE =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_2 ON t1(id); CREATE INDEX =# alter table t1 set (parallel_workers = 3); ALTER TABLE =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_3 ON t1(id); CREATE INDEX =# alter table t1 set (parallel_workers = 4); ALTER TABLE =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_4 ON t1(id); CREATE INDEX =# alter table t1 set (parallel_workers = 5); ALTER TABLE =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_5 ON t1(id); CREATE INDEX =# alter table t1 set (parallel_workers = 6); ALTER TABLE =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_6 ON t1(id); CREATE INDEX =# alter table t1 set (parallel_workers = 7); ALTER TABLE =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_7 ON t1(id); CREATE INDEX =# alter table t1 set (parallel_workers = 8); ALTER TABLE =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_8 ON t1(id); CREATE INDEX =# alter table t1 set (parallel_workers = 0); ALTER TABLE =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_0_bis ON t1(id); CREATE INDEX =# vacuum;checkpoint; VACUUM CHECKPOINT =# create index t1_idx_parallel_0_ter ON t1(id); CREATE INDEX =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%'; query | calls | wal_bytes | wal_records | wal_num_fpw ----------------------------------------------+-------+-----------+-------------+------------- create index t1_idx_parallel_0 ON t1(id) | 1 | 20389743 | 2762 | 2758 create index t1_idx_parallel_0_bis ON t1(id) | 1 | 20394391 | 2762 | 2758 create index t1_idx_parallel_0_ter ON t1(id) | 1 | 20395155 | 2762 | 2758 create index t1_idx_parallel_1 ON t1(id) | 1 | 20388335 | 2762 | 2758 create index t1_idx_parallel_2 ON t1(id) | 1 | 20389091 | 2762 | 2758 create index t1_idx_parallel_3 ON t1(id) | 1 | 20389847 | 2762 | 2758 create index t1_idx_parallel_4 ON t1(id) | 1 | 20390603 | 2762 | 2758 create index t1_idx_parallel_5 ON t1(id) | 1 | 20391359 | 2762 | 2758 create index t1_idx_parallel_6 ON t1(id) | 1 | 20392115 | 2762 | 2758 create index t1_idx_parallel_7 ON t1(id) | 1 | 20392871 | 2762 | 2758 create index t1_idx_parallel_8 ON t1(id) | 1 | 20393627 | 2762 | 2758 (11 rows) =# select relname, pg_relation_size(oid) from pg_class where relname like '%t1_id%'; relname | pg_relation_size -----------------------+------------------ t1_idx_parallel_0 | 22487040 t1_idx_parallel_0_bis | 22487040 t1_idx_parallel_0_ter | 22487040 t1_idx_parallel_2 | 22487040 t1_idx_parallel_1 
| 22487040 t1_idx_parallel_4 | 22487040 t1_idx_parallel_3 | 22487040 t1_idx_parallel_5 | 22487040 t1_idx_parallel_6 | 22487040 t1_idx_parallel_7 | 22487040 t1_idx_parallel_8 | 22487040 (9 rows) So while the number of WAL records and full page images stay constant, we can see some small fluctuations in the total amount of generated WAL data, even for multiple execution of the sequential create index. I'm wondering if the fluctuations are due to some other internal details or if the WalUsage support is just completely broken (although I don't see any obvious issue ATM).
Attachment
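To make the parallel VACUUM and CREATE INDEX numbers above add up, each
worker has to hand its WAL counters back to the leader. Below is a rough
standalone model of that idea, with the shared-memory chunk reduced to a
plain array and made-up per-worker numbers; the real patch uses PostgreSQL's
shared-memory and instrumentation machinery, and the function names here are
placeholders only:

#include <stdint.h>
#include <stdio.h>

#define MAX_WORKERS 8

typedef struct WalUsage
{
    long        wal_records;
    long        wal_num_fpw;
    uint64_t    wal_bytes;
} WalUsage;

/* stands in for the shared-memory area sized for the launched workers */
static WalUsage shared_wal_usage[MAX_WORKERS];

/* the leader's own counters, analogous to pgWalUsage */
static WalUsage pgWalUsage;

static void
WalUsageAccumDiff(WalUsage *dst, const WalUsage *end, const WalUsage *start)
{
    dst->wal_records += end->wal_records - start->wal_records;
    dst->wal_num_fpw += end->wal_num_fpw - start->wal_num_fpw;
    dst->wal_bytes += end->wal_bytes - start->wal_bytes;
}

/* Worker side: snapshot at start, publish the delta into shmem at exit. */
static void
worker_run(int worker_id, long records, long fpw, uint64_t bytes)
{
    WalUsage    local = {0};    /* each worker process has its own counters */
    WalUsage    start = local;

    /* ... the worker builds its part of the index, generating WAL ... */
    local.wal_records += records;
    local.wal_num_fpw += fpw;
    local.wal_bytes += bytes;

    WalUsageAccumDiff(&shared_wal_usage[worker_id], &local, &start);
}

int
main(void)
{
    /* pretend two workers ran; the per-worker numbers are made up */
    worker_run(0, 1400, 1390, 10200000);
    worker_run(1, 1362, 1368, 10200000);

    /* Leader side: fold every worker's published delta into its counters. */
    for (int i = 0; i < 2; i++)
    {
        pgWalUsage.wal_records += shared_wal_usage[i].wal_records;
        pgWalUsage.wal_num_fpw += shared_wal_usage[i].wal_num_fpw;
        pgWalUsage.wal_bytes += shared_wal_usage[i].wal_bytes;
    }

    printf("leader totals: records=%ld fpw=%ld bytes=%llu\n",
           pgWalUsage.wal_records, pgWalUsage.wal_num_fpw,
           (unsigned long long) pgWalUsage.wal_bytes);
    return 0;
}

Everything runs sequentially here for simplicity; in the server each
worker_run would be a separate process writing into its own slot.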
On Thu, Apr 2, 2020 at 6:18 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%'; > query | calls | wal_bytes | wal_records | wal_num_fpw > ----------------------------------------------+-------+-----------+-------------+------------- > create index t1_idx_parallel_0 ON t1(id) | 1 | 20389743 | 2762 | 2758 > create index t1_idx_parallel_0_bis ON t1(id) | 1 | 20394391 | 2762 | 2758 > create index t1_idx_parallel_0_ter ON t1(id) | 1 | 20395155 | 2762 | 2758 > create index t1_idx_parallel_1 ON t1(id) | 1 | 20388335 | 2762 | 2758 > create index t1_idx_parallel_2 ON t1(id) | 1 | 20389091 | 2762 | 2758 > create index t1_idx_parallel_3 ON t1(id) | 1 | 20389847 | 2762 | 2758 > create index t1_idx_parallel_4 ON t1(id) | 1 | 20390603 | 2762 | 2758 > create index t1_idx_parallel_5 ON t1(id) | 1 | 20391359 | 2762 | 2758 > create index t1_idx_parallel_6 ON t1(id) | 1 | 20392115 | 2762 | 2758 > create index t1_idx_parallel_7 ON t1(id) | 1 | 20392871 | 2762 | 2758 > create index t1_idx_parallel_8 ON t1(id) | 1 | 20393627 | 2762 | 2758 > (11 rows) > > =# select relname, pg_relation_size(oid) from pg_class where relname like '%t1_id%'; > relname | pg_relation_size > -----------------------+------------------ > t1_idx_parallel_0 | 22487040 > t1_idx_parallel_0_bis | 22487040 > t1_idx_parallel_0_ter | 22487040 > t1_idx_parallel_2 | 22487040 > t1_idx_parallel_1 | 22487040 > t1_idx_parallel_4 | 22487040 > t1_idx_parallel_3 | 22487040 > t1_idx_parallel_5 | 22487040 > t1_idx_parallel_6 | 22487040 > t1_idx_parallel_7 | 22487040 > t1_idx_parallel_8 | 22487040 > (9 rows) > > > So while the number of WAL records and full page images stay constant, we can > see some small fluctuations in the total amount of generated WAL data, even for > multiple execution of the sequential create index. I'm wondering if the > fluctuations are due to some other internal details or if the WalUsage support > is just completely broken (although I don't see any obvious issue ATM). > I think we need to know the reason for this. Can you try with small size indexes and see if the problem is reproducible? If it is, then it will be easier to debug the same. Few other minor comments ------------------------------------ pg_stat_statements patch 1. +-- +-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to validate WAL generation metrics +-- The word 'non-temp' in the above comment appears out of place. We don't need to specify it. 2. +-- SELECT usage data, check WAL usage is reported, wal_records equal rows count for INSERT/UPDATE/DELETE +SELECT query, calls, rows, +wal_bytes > 0 as wal_bytes_generated, +wal_records > 0 as wal_records_generated, +wal_records = rows as wal_records_as_rows +FROM pg_stat_statements ORDER BY query COLLATE "C"; The comment doesn't seem to match what we are doing in the statement. I think we can simplify it to something like "check WAL is generated for above statements: 3. 
@@ -185,6 +185,9 @@ typedef struct Counters int64 local_blks_written; /* # of local disk blocks written */ int64 temp_blks_read; /* # of temp blocks read */ int64 temp_blks_written; /* # of temp blocks written */ + uint64 wal_bytes; /* total amount of WAL bytes generated */ + int64 wal_records; /* # of WAL records generated */ + int64 wal_num_fpw; /* # of WAL full page image generated */ double blk_read_time; /* time spent reading, in msec */ double blk_write_time; /* time spent writing, in msec */ double usage; /* usage factor */ It is better to keep wal_bytes should be after wal_num_fpw as it is in the main patch. Also, consider changing at other places in this patch. I think we should add these new fields after blk_write_time or at the end after usage. 4. /* # of WAL full page image generated */ Can we change it to "/* # of WAL full page image records generated */"? If you agree, then a similar comment exists in v11-0001-Add-infrastructure-to-track-WAL-usage, consider changing that as well. v11-0002-Add-option-to-report-WAL-usage-in-EXPLAIN-and-au 5. Specifically, include the + number of records, full page images and bytes generated. How about making the above slightly clear? "Specifically, include the number of records, number of full page image records and amount of WAL bytes generated. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Apr 2, 2020 at 6:18 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%'; > > query | calls | wal_bytes | wal_records | wal_num_fpw > > ----------------------------------------------+-------+-----------+-------------+------------- > > create index t1_idx_parallel_0 ON t1(id) | 1 | 20389743 | 2762 | 2758 > > create index t1_idx_parallel_0_bis ON t1(id) | 1 | 20394391 | 2762 | 2758 > > create index t1_idx_parallel_0_ter ON t1(id) | 1 | 20395155 | 2762 | 2758 > > create index t1_idx_parallel_1 ON t1(id) | 1 | 20388335 | 2762 | 2758 > > create index t1_idx_parallel_2 ON t1(id) | 1 | 20389091 | 2762 | 2758 > > create index t1_idx_parallel_3 ON t1(id) | 1 | 20389847 | 2762 | 2758 > > create index t1_idx_parallel_4 ON t1(id) | 1 | 20390603 | 2762 | 2758 > > create index t1_idx_parallel_5 ON t1(id) | 1 | 20391359 | 2762 | 2758 > > create index t1_idx_parallel_6 ON t1(id) | 1 | 20392115 | 2762 | 2758 > > create index t1_idx_parallel_7 ON t1(id) | 1 | 20392871 | 2762 | 2758 > > create index t1_idx_parallel_8 ON t1(id) | 1 | 20393627 | 2762 | 2758 > > (11 rows) > > > > =# select relname, pg_relation_size(oid) from pg_class where relname like '%t1_id%'; > > relname | pg_relation_size > > -----------------------+------------------ > > t1_idx_parallel_0 | 22487040 > > t1_idx_parallel_0_bis | 22487040 > > t1_idx_parallel_0_ter | 22487040 > > t1_idx_parallel_2 | 22487040 > > t1_idx_parallel_1 | 22487040 > > t1_idx_parallel_4 | 22487040 > > t1_idx_parallel_3 | 22487040 > > t1_idx_parallel_5 | 22487040 > > t1_idx_parallel_6 | 22487040 > > t1_idx_parallel_7 | 22487040 > > t1_idx_parallel_8 | 22487040 > > (9 rows) > > > > > > So while the number of WAL records and full page images stay constant, we can > > see some small fluctuations in the total amount of generated WAL data, even for > > multiple execution of the sequential create index. I'm wondering if the > > fluctuations are due to some other internal details or if the WalUsage support > > is just completely broken (although I don't see any obvious issue ATM). > > > > I think we need to know the reason for this. Can you try with small > size indexes and see if the problem is reproducible? If it is, then it > will be easier to debug the same. > > Few other minor comments > ------------------------------------ > pg_stat_statements patch > 1. > +-- > +-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to > validate WAL generation metrics > +-- > > The word 'non-temp' in the above comment appears out of place. We > don't need to specify it. > > 2. > +-- SELECT usage data, check WAL usage is reported, wal_records equal > rows count for INSERT/UPDATE/DELETE > +SELECT query, calls, rows, > +wal_bytes > 0 as wal_bytes_generated, > +wal_records > 0 as wal_records_generated, > +wal_records = rows as wal_records_as_rows > +FROM pg_stat_statements ORDER BY query COLLATE "C"; > > The comment doesn't seem to match what we are doing in the statement. > I think we can simplify it to something like "check WAL is generated > for above statements: > > 3. 
> @@ -185,6 +185,9 @@ typedef struct Counters > int64 local_blks_written; /* # of local disk blocks written */ > int64 temp_blks_read; /* # of temp blocks read */ > int64 temp_blks_written; /* # of temp blocks written */ > + uint64 wal_bytes; /* total amount of WAL bytes generated */ > + int64 wal_records; /* # of WAL records generated */ > + int64 wal_num_fpw; /* # of WAL full page image generated */ > double blk_read_time; /* time spent reading, in msec */ > double blk_write_time; /* time spent writing, in msec */ > double usage; /* usage factor */ > > It is better to keep wal_bytes should be after wal_num_fpw as it is in > the main patch. Also, consider changing at other places in this > patch. I think we should add these new fields after blk_write_time or > at the end after usage. > > 4. > /* # of WAL full page image generated */ > Can we change it to "/* # of WAL full page image records generated */"? IMHO, "# of WAL full-page image records" seems like the number of wal record which contains the full-page image. But, actually, this is the total number of the full-page images, not the number of records that have a full-page image. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 02, 2020 at 06:40:51PM +0530, Amit Kapila wrote: > On Thu, Apr 2, 2020 at 6:18 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%'; > > query | calls | wal_bytes | wal_records | wal_num_fpw > > ----------------------------------------------+-------+-----------+-------------+------------- > > create index t1_idx_parallel_0 ON t1(id) | 1 | 20389743 | 2762 | 2758 > > create index t1_idx_parallel_0_bis ON t1(id) | 1 | 20394391 | 2762 | 2758 > > create index t1_idx_parallel_0_ter ON t1(id) | 1 | 20395155 | 2762 | 2758 > > create index t1_idx_parallel_1 ON t1(id) | 1 | 20388335 | 2762 | 2758 > > create index t1_idx_parallel_2 ON t1(id) | 1 | 20389091 | 2762 | 2758 > > create index t1_idx_parallel_3 ON t1(id) | 1 | 20389847 | 2762 | 2758 > > create index t1_idx_parallel_4 ON t1(id) | 1 | 20390603 | 2762 | 2758 > > create index t1_idx_parallel_5 ON t1(id) | 1 | 20391359 | 2762 | 2758 > > create index t1_idx_parallel_6 ON t1(id) | 1 | 20392115 | 2762 | 2758 > > create index t1_idx_parallel_7 ON t1(id) | 1 | 20392871 | 2762 | 2758 > > create index t1_idx_parallel_8 ON t1(id) | 1 | 20393627 | 2762 | 2758 > > (11 rows) > > > > =# select relname, pg_relation_size(oid) from pg_class where relname like '%t1_id%'; > > relname | pg_relation_size > > -----------------------+------------------ > > t1_idx_parallel_0 | 22487040 > > t1_idx_parallel_0_bis | 22487040 > > t1_idx_parallel_0_ter | 22487040 > > t1_idx_parallel_2 | 22487040 > > t1_idx_parallel_1 | 22487040 > > t1_idx_parallel_4 | 22487040 > > t1_idx_parallel_3 | 22487040 > > t1_idx_parallel_5 | 22487040 > > t1_idx_parallel_6 | 22487040 > > t1_idx_parallel_7 | 22487040 > > t1_idx_parallel_8 | 22487040 > > (9 rows) > > > > > > So while the number of WAL records and full page images stay constant, we can > > see some small fluctuations in the total amount of generated WAL data, even for > > multiple execution of the sequential create index. I'm wondering if the > > fluctuations are due to some other internal details or if the WalUsage support > > is just completely broken (although I don't see any obvious issue ATM). > > > > I think we need to know the reason for this. Can you try with small > size indexes and see if the problem is reproducible? If it is, then it > will be easier to debug the same. 
I did some quick testing using the attached shell script: - one a 1k line base number of lines, scales 1 10 100 1000 (suffix _s) - parallel workers from 0 to 8 (suffix _w) - each index created twice (suffix _pa and _pb) - with a vacuum;checkpoint;pg_switch_wal executed each time I get the following results: query | wal_bytes | wal_records | wal_num_fpw --------------------------------------------+-----------+-------------+------------- CREATE INDEX t1_idx_s001_pa_w0 ON t1 (id) | 61871 | 22 | 18 CREATE INDEX t1_idx_s001_pa_w1 ON t1 (id) | 62394 | 21 | 18 CREATE INDEX t1_idx_s001_pa_w2 ON t1 (id) | 63150 | 21 | 18 CREATE INDEX t1_idx_s001_pa_w3 ON t1 (id) | 63906 | 21 | 18 CREATE INDEX t1_idx_s001_pa_w4 ON t1 (id) | 64662 | 21 | 18 CREATE INDEX t1_idx_s001_pa_w5 ON t1 (id) | 65418 | 21 | 18 CREATE INDEX t1_idx_s001_pa_w6 ON t1 (id) | 65450 | 21 | 18 CREATE INDEX t1_idx_s001_pa_w7 ON t1 (id) | 66206 | 21 | 18 CREATE INDEX t1_idx_s001_pa_w8 ON t1 (id) | 66962 | 21 | 18 CREATE INDEX t1_idx_s001_pb_w0 ON t1 (id) | 67718 | 21 | 18 CREATE INDEX t1_idx_s001_pb_w1 ON t1 (id) | 68474 | 21 | 18 CREATE INDEX t1_idx_s001_pb_w2 ON t1 (id) | 68418 | 21 | 18 CREATE INDEX t1_idx_s001_pb_w3 ON t1 (id) | 69174 | 21 | 18 CREATE INDEX t1_idx_s001_pb_w4 ON t1 (id) | 69930 | 21 | 18 CREATE INDEX t1_idx_s001_pb_w5 ON t1 (id) | 70686 | 21 | 18 CREATE INDEX t1_idx_s001_pb_w6 ON t1 (id) | 71442 | 21 | 18 CREATE INDEX t1_idx_s001_pb_w7 ON t1 (id) | 64922 | 21 | 18 CREATE INDEX t1_idx_s001_pb_w8 ON t1 (id) | 65682 | 21 | 18 CREATE INDEX t1_idx_s010_pa_w0 ON t1 (id) | 250460 | 47 | 44 CREATE INDEX t1_idx_s010_pa_w1 ON t1 (id) | 251216 | 47 | 44 CREATE INDEX t1_idx_s010_pa_w2 ON t1 (id) | 251972 | 47 | 44 CREATE INDEX t1_idx_s010_pa_w3 ON t1 (id) | 252728 | 47 | 44 CREATE INDEX t1_idx_s010_pa_w4 ON t1 (id) | 253484 | 47 | 44 CREATE INDEX t1_idx_s010_pa_w5 ON t1 (id) | 254240 | 47 | 44 CREATE INDEX t1_idx_s010_pa_w6 ON t1 (id) | 253552 | 47 | 44 CREATE INDEX t1_idx_s010_pa_w7 ON t1 (id) | 254308 | 47 | 44 CREATE INDEX t1_idx_s010_pa_w8 ON t1 (id) | 255064 | 47 | 44 CREATE INDEX t1_idx_s010_pb_w0 ON t1 (id) | 255820 | 47 | 44 CREATE INDEX t1_idx_s010_pb_w1 ON t1 (id) | 256576 | 47 | 44 CREATE INDEX t1_idx_s010_pb_w2 ON t1 (id) | 257332 | 47 | 44 CREATE INDEX t1_idx_s010_pb_w3 ON t1 (id) | 258088 | 47 | 44 CREATE INDEX t1_idx_s010_pb_w4 ON t1 (id) | 258844 | 47 | 44 CREATE INDEX t1_idx_s010_pb_w5 ON t1 (id) | 259600 | 47 | 44 CREATE INDEX t1_idx_s010_pb_w6 ON t1 (id) | 260356 | 47 | 44 CREATE INDEX t1_idx_s010_pb_w7 ON t1 (id) | 260012 | 47 | 44 CREATE INDEX t1_idx_s010_pb_w8 ON t1 (id) | 260768 | 47 | 44 CREATE INDEX t1_idx_s1000_pa_w0 ON t1 (id) | 20400595 | 2762 | 2759 CREATE INDEX t1_idx_s1000_pa_w1 ON t1 (id) | 20401351 | 2762 | 2759 CREATE INDEX t1_idx_s1000_pa_w2 ON t1 (id) | 20402107 | 2762 | 2759 CREATE INDEX t1_idx_s1000_pa_w3 ON t1 (id) | 20402863 | 2762 | 2759 CREATE INDEX t1_idx_s1000_pa_w4 ON t1 (id) | 20403619 | 2762 | 2759 CREATE INDEX t1_idx_s1000_pa_w5 ON t1 (id) | 20404375 | 2762 | 2759 CREATE INDEX t1_idx_s1000_pa_w6 ON t1 (id) | 20403687 | 2762 | 2759 CREATE INDEX t1_idx_s1000_pa_w7 ON t1 (id) | 20404443 | 2762 | 2759 CREATE INDEX t1_idx_s1000_pa_w8 ON t1 (id) | 20405199 | 2762 | 2759 CREATE INDEX t1_idx_s1000_pb_w0 ON t1 (id) | 20405955 | 2762 | 2759 CREATE INDEX t1_idx_s1000_pb_w1 ON t1 (id) | 20406711 | 2762 | 2759 CREATE INDEX t1_idx_s1000_pb_w2 ON t1 (id) | 20407467 | 2762 | 2759 CREATE INDEX t1_idx_s1000_pb_w3 ON t1 (id) | 20408223 | 2762 | 2759 CREATE INDEX t1_idx_s1000_pb_w4 ON t1 (id) | 
20408979 | 2762 | 2759 CREATE INDEX t1_idx_s1000_pb_w5 ON t1 (id) | 20409735 | 2762 | 2759 CREATE INDEX t1_idx_s1000_pb_w6 ON t1 (id) | 20410491 | 2762 | 2759 CREATE INDEX t1_idx_s1000_pb_w7 ON t1 (id) | 20410147 | 2762 | 2759 CREATE INDEX t1_idx_s1000_pb_w8 ON t1 (id) | 20410903 | 2762 | 2759 CREATE INDEX t1_idx_s100_pa_w0 ON t1 (id) | 2082194 | 293 | 290 CREATE INDEX t1_idx_s100_pa_w1 ON t1 (id) | 2082950 | 293 | 290 CREATE INDEX t1_idx_s100_pa_w2 ON t1 (id) | 2083706 | 293 | 290 CREATE INDEX t1_idx_s100_pa_w3 ON t1 (id) | 2084462 | 293 | 290 CREATE INDEX t1_idx_s100_pa_w4 ON t1 (id) | 2085218 | 293 | 290 CREATE INDEX t1_idx_s100_pa_w5 ON t1 (id) | 2085974 | 293 | 290 CREATE INDEX t1_idx_s100_pa_w6 ON t1 (id) | 2085286 | 293 | 290 CREATE INDEX t1_idx_s100_pa_w7 ON t1 (id) | 2086042 | 293 | 290 CREATE INDEX t1_idx_s100_pa_w8 ON t1 (id) | 2086798 | 293 | 290 CREATE INDEX t1_idx_s100_pb_w0 ON t1 (id) | 2087554 | 293 | 290 CREATE INDEX t1_idx_s100_pb_w1 ON t1 (id) | 2088310 | 293 | 290 CREATE INDEX t1_idx_s100_pb_w2 ON t1 (id) | 2089066 | 293 | 290 CREATE INDEX t1_idx_s100_pb_w3 ON t1 (id) | 2089822 | 293 | 290 CREATE INDEX t1_idx_s100_pb_w4 ON t1 (id) | 2090578 | 293 | 290 CREATE INDEX t1_idx_s100_pb_w5 ON t1 (id) | 2091334 | 293 | 290 CREATE INDEX t1_idx_s100_pb_w6 ON t1 (id) | 2092090 | 293 | 290 CREATE INDEX t1_idx_s100_pb_w7 ON t1 (id) | 2091746 | 293 | 290 CREATE INDEX t1_idx_s100_pb_w8 ON t1 (id) | 2092502 | 293 | 290 (72 rows) The fluctuations exist for all scales, but doesn't seem to depend on the input size. Just to be sure I tried to measure the amount of WAL for various INSERT size using roughly the same approach, and results are stable: query | wal_bytes | wal_records | wal_num_fpw -----------------------------------------------------+-----------+-------------+------------- INSERT INTO t_001_a SELECT generate_series($1, $2) | 59000 | 1000 | 0 INSERT INTO t_001_b SELECT generate_series($1, $2) | 59000 | 1000 | 0 INSERT INTO t_010_a SELECT generate_series($1, $2) | 590000 | 10000 | 0 INSERT INTO t_010_b SELECT generate_series($1, $2) | 590000 | 10000 | 0 INSERT INTO t_1000_a SELECT generate_series($1, $2) | 59000000 | 1000000 | 0 INSERT INTO t_1000_b SELECT generate_series($1, $2) | 59000000 | 1000000 | 0 INSERT INTO t_100_a SELECT generate_series($1, $2) | 5900000 | 100000 | 0 INSERT INTO t_100_b SELECT generate_series($1, $2) | 5900000 | 100000 | 0 (8 rows) At this point I tend to think that this is somehow due to btbuild specific behavior, or somewhere nearby. > Few other minor comments > ------------------------------------ > pg_stat_statements patch > 1. > +-- > +-- CRUD: INSERT SELECT UPDATE DELETE on test non-temp table to > validate WAL generation metrics > +-- > > The word 'non-temp' in the above comment appears out of place. We > don't need to specify it. Fixed. > 2. > +-- SELECT usage data, check WAL usage is reported, wal_records equal > rows count for INSERT/UPDATE/DELETE > +SELECT query, calls, rows, > +wal_bytes > 0 as wal_bytes_generated, > +wal_records > 0 as wal_records_generated, > +wal_records = rows as wal_records_as_rows > +FROM pg_stat_statements ORDER BY query COLLATE "C"; > > The comment doesn't seem to match what we are doing in the statement. > I think we can simplify it to something like "check WAL is generated > for above statements: Done. > 3. 
> @@ -185,6 +185,9 @@ typedef struct Counters > int64 local_blks_written; /* # of local disk blocks written */ > int64 temp_blks_read; /* # of temp blocks read */ > int64 temp_blks_written; /* # of temp blocks written */ > + uint64 wal_bytes; /* total amount of WAL bytes generated */ > + int64 wal_records; /* # of WAL records generated */ > + int64 wal_num_fpw; /* # of WAL full page image generated */ > double blk_read_time; /* time spent reading, in msec */ > double blk_write_time; /* time spent writing, in msec */ > double usage; /* usage factor */ > > It is better to keep wal_bytes should be after wal_num_fpw as it is in > the main patch. Also, consider changing at other places in this > patch. I think we should add these new fields after blk_write_time or > at the end after usage. Done. > 4. > /* # of WAL full page image generated */ > Can we change it to "/* # of WAL full page image records generated */"? > > If you agree, then a similar comment exists in > v11-0001-Add-infrastructure-to-track-WAL-usage, consider changing that > as well. Agreed, and fixed in both place. > v11-0002-Add-option-to-report-WAL-usage-in-EXPLAIN-and-au > 5. > Specifically, include the > + number of records, full page images and bytes generated. > > How about making the above slightly clear? "Specifically, include the > number of records, number of full page image records and amount of WAL > bytes generated. Thanks, that's clearer. Done
Attachment
On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Apr 2, 2020 at 6:18 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%'; > > query | calls | wal_bytes | wal_records | wal_num_fpw > > ----------------------------------------------+-------+-----------+-------------+------------- > > create index t1_idx_parallel_0 ON t1(id) | 1 | 20389743 | 2762 | 2758 > > create index t1_idx_parallel_0_bis ON t1(id) | 1 | 20394391 | 2762 | 2758 > > create index t1_idx_parallel_0_ter ON t1(id) | 1 | 20395155 | 2762 | 2758 > > create index t1_idx_parallel_1 ON t1(id) | 1 | 20388335 | 2762 | 2758 > > create index t1_idx_parallel_2 ON t1(id) | 1 | 20389091 | 2762 | 2758 > > create index t1_idx_parallel_3 ON t1(id) | 1 | 20389847 | 2762 | 2758 > > create index t1_idx_parallel_4 ON t1(id) | 1 | 20390603 | 2762 | 2758 > > create index t1_idx_parallel_5 ON t1(id) | 1 | 20391359 | 2762 | 2758 > > create index t1_idx_parallel_6 ON t1(id) | 1 | 20392115 | 2762 | 2758 > > create index t1_idx_parallel_7 ON t1(id) | 1 | 20392871 | 2762 | 2758 > > create index t1_idx_parallel_8 ON t1(id) | 1 | 20393627 | 2762 | 2758 > > (11 rows) > > > > =# select relname, pg_relation_size(oid) from pg_class where relname like '%t1_id%'; > > relname | pg_relation_size > > -----------------------+------------------ > > t1_idx_parallel_0 | 22487040 > > t1_idx_parallel_0_bis | 22487040 > > t1_idx_parallel_0_ter | 22487040 > > t1_idx_parallel_2 | 22487040 > > t1_idx_parallel_1 | 22487040 > > t1_idx_parallel_4 | 22487040 > > t1_idx_parallel_3 | 22487040 > > t1_idx_parallel_5 | 22487040 > > t1_idx_parallel_6 | 22487040 > > t1_idx_parallel_7 | 22487040 > > t1_idx_parallel_8 | 22487040 > > (9 rows) > > > > > > So while the number of WAL records and full page images stay constant, we can > > see some small fluctuations in the total amount of generated WAL data, even for > > multiple execution of the sequential create index. I'm wondering if the > > fluctuations are due to some other internal details or if the WalUsage support > > is just completely broken (although I don't see any obvious issue ATM). > > > > I think we need to know the reason for this. Can you try with small > size indexes and see if the problem is reproducible? If it is, then it > will be easier to debug the same. I have done some testing to see where these extra WAL size is coming from. First I tried to create new db before every run then the size is consistent. But, then on the same server, I tired as Julien showed in his experiment then I am getting few extra wal bytes from next create index onwards. And, the waldump(attached in the mail) shows that is pg_class insert wal. I still have to check that why we need to write an extra wal size. 
create extension pg_stat_statements; drop table t1; create table t1(id integer); insert into t1 select * from generate_series(1, 10); alter table t1 set (parallel_workers = 0); vacuum;checkpoint; select * from pg_stat_statements_reset() ; create index t1_idx_parallel_0 ON t1(id); select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';; query | calls | wal_bytes | wal_records | wal_num_fpw ----------------------------------------------------------------------------------+-------+-----------+-------------+------------- create index t1_idx_parallel_0 ON t1(id) | 1 | 49320 | 23 | 15 drop table t1; create table t1(id integer); insert into t1 select * from generate_series(1, 10); --select * from pg_stat_statements_reset() ; alter table t1 set (parallel_workers = 0); vacuum;checkpoint; create index t1_idx_parallel_1 ON t1(id); select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%';; postgres[110383]=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements; query | calls | wal_bytes | wal_records | wal_num_fpw ----------------------------------------------------------------------------------+-------+-----------+-------------+------------- create index t1_idx_parallel_1 ON t1(id) | 1 | 50040 | 23 | 15 wal_bytes diff = 50040-49320 = 720 Below, WAL record is causing the 720 bytes difference, all other WALs are of the same size. t1_idx_parallel_0: rmgr: Heap len (rec/tot): 54/ 7498, tx: 489, lsn: 0/0167B9B0, prev 0/0167B970, desc: INSERT off 30 flags 0x01, blkref #0: rel 1663/13580/1249 t1_idx_parallel_1: rmgr: Heap len (rec/tot): 54/ 8218, tx: 494, lsn: 0/016B84F8, prev 0/016B84B8, desc: INSERT off 30 flags 0x01, blkref #0: rel 1663/13580/1249 wal diff: 8218 - 7498 = 720 -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
On Thu, Apr 2, 2020 at 8:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > 4. > > /* # of WAL full page image generated */ > > Can we change it to "/* # of WAL full page image records generated */"? > > IMHO, "# of WAL full-page image records" seems like the number of wal > record which contains the full-page image. > I think this resembles what you have written here. > But, actually, this is the > total number of the full-page images, not the number of records that > have a full-page image. > We count this when forming WAL records. As per my understanding, all three counters are about WAL records. This counter tells how many records have full page images and one of the purposes of having this counter is to check what percentage of records contain full page image. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Hello. The v13 patch seems failing to apply on the master. At Fri, 3 Apr 2020 06:37:21 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in > On Thu, Apr 2, 2020 at 8:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > 4. > > > /* # of WAL full page image generated */ > > > Can we change it to "/* # of WAL full page image records generated */"? > > > > IMHO, "# of WAL full-page image records" seems like the number of wal > > record which contains the full-page image. > > > > I think this resembles what you have written here. > > > But, actually, this is the > > total number of the full-page images, not the number of records that > > have a full-page image. > > > > We count this when forming WAL records. As per my understanding, all > three counters are about WAL records. This counter tells how many > records have full page images and one of the purposes of having this > counter is to check what percentage of records contain full page > image. Aside from which is desirable or useful, acutually XLogRecordAssemble in v13-0001 counts the number of attached images then XLogInsertRecord sums up the number of images in pgWalUsage.wal_num_fpw. FWIW, it seems to me that the main concern here is the source of WAL size. If it is the case I think that the number of full page image is more useful. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Fri, Apr 3, 2020 at 6:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Apr 2, 2020 at 8:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > 4. > > > /* # of WAL full page image generated */ > > > Can we change it to "/* # of WAL full page image records generated */"? > > > > IMHO, "# of WAL full-page image records" seems like the number of wal > > record which contains the full-page image. > > > > I think this resembles what you have written here. > > > But, actually, this is the > > total number of the full-page images, not the number of records that > > have a full-page image. > > > > We count this when forming WAL records. As per my understanding, all > three counters are about WAL records. This counter tells how many > records have full page images and one of the purposes of having this > counter is to check what percentage of records contain full page > image. > How about if say "# of full-page writes generated" or "# of WAL full-page writes generated"? I think now I understand your concern because we want to display it as full page writes and the comment doesn't seem to indicate the same. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Apr 3, 2020 at 8:14 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Apr 3, 2020 at 6:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Apr 2, 2020 at 8:06 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > 4. > > > > /* # of WAL full page image generated */ > > > > Can we change it to "/* # of WAL full page image records generated */"? > > > > > > IMHO, "# of WAL full-page image records" seems like the number of wal > > > record which contains the full-page image. > > > > > > > I think this resembles what you have written here. > > > > > But, actually, this is the > > > total number of the full-page images, not the number of records that > > > have a full-page image. > > > > > > > We count this when forming WAL records. As per my understanding, all > > three counters are about WAL records. This counter tells how many > > records have full page images and one of the purposes of having this > > counter is to check what percentage of records contain full page > > image. > > > > How about if say "# of full-page writes generated" or "# of WAL > full-page writes generated"? I think now I understand your concern > because we want to display it as full page writes and the comment > doesn't seem to indicate the same. Either of these seem good to me. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Fri, Apr 3, 2020 at 7:15 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>
> Hello.
>
> The v13 patch seems failing to apply on the master.
>

It is probably due to recent commit ed7a509571. I have briefly studied that,
and I think we should make this patch account for plan-time WAL usage, if
any, similar to what got committed for buffer usage. The reason is that
there is a possibility that we might write WAL during planning due to hint
bits.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
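The sketch below is the idea raised here: snapshot the WAL counters before
and after planning and attribute the difference to the statement, mirroring
how buffer usage during planning is handled. It is a standalone stand-in
only; plan_query() is a placeholder rather than a PostgreSQL function, and
the record it emits merely simulates a hint-bit-induced write:

#include <stdint.h>
#include <stdio.h>

typedef struct WalUsage
{
    long        wal_records;
    long        wal_num_fpw;
    uint64_t    wal_bytes;
} WalUsage;

static WalUsage pgWalUsage;     /* per-backend counters */

/* placeholder planner: pretend a hint-bit update produced one small record */
static void
plan_query(void)
{
    pgWalUsage.wal_records += 1;
    pgWalUsage.wal_bytes += 56;
}

int
main(void)
{
    WalUsage    start = pgWalUsage;     /* snapshot taken before planning */

    plan_query();

    /* the delta is what planning itself generated */
    long        records = pgWalUsage.wal_records - start.wal_records;
    uint64_t    bytes = pgWalUsage.wal_bytes - start.wal_bytes;

    /* usually zero; only worth reporting when something was written */
    if (records > 0)
        printf("planning WAL: records=%ld bytes=%llu\n",
               records, (unsigned long long) bytes);
    return 0;
}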
On Thu, Apr 2, 2020 at 9:28 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Apr 2, 2020 at 6:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Apr 2, 2020 at 6:18 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > =# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%'; > > > query | calls | wal_bytes | wal_records | wal_num_fpw > > > ----------------------------------------------+-------+-----------+-------------+------------- > > > create index t1_idx_parallel_0 ON t1(id) | 1 | 20389743 | 2762 | 2758 > > > create index t1_idx_parallel_0_bis ON t1(id) | 1 | 20394391 | 2762 | 2758 > > > create index t1_idx_parallel_0_ter ON t1(id) | 1 | 20395155 | 2762 | 2758 > > > create index t1_idx_parallel_1 ON t1(id) | 1 | 20388335 | 2762 | 2758 > > > create index t1_idx_parallel_2 ON t1(id) | 1 | 20389091 | 2762 | 2758 > > > create index t1_idx_parallel_3 ON t1(id) | 1 | 20389847 | 2762 | 2758 > > > create index t1_idx_parallel_4 ON t1(id) | 1 | 20390603 | 2762 | 2758 > > > create index t1_idx_parallel_5 ON t1(id) | 1 | 20391359 | 2762 | 2758 > > > create index t1_idx_parallel_6 ON t1(id) | 1 | 20392115 | 2762 | 2758 > > > create index t1_idx_parallel_7 ON t1(id) | 1 | 20392871 | 2762 | 2758 > > > create index t1_idx_parallel_8 ON t1(id) | 1 | 20393627 | 2762 | 2758 > > > (11 rows) > > > > > > =# select relname, pg_relation_size(oid) from pg_class where relname like '%t1_id%'; > > > relname | pg_relation_size > > > -----------------------+------------------ > > > t1_idx_parallel_0 | 22487040 > > > t1_idx_parallel_0_bis | 22487040 > > > t1_idx_parallel_0_ter | 22487040 > > > t1_idx_parallel_2 | 22487040 > > > t1_idx_parallel_1 | 22487040 > > > t1_idx_parallel_4 | 22487040 > > > t1_idx_parallel_3 | 22487040 > > > t1_idx_parallel_5 | 22487040 > > > t1_idx_parallel_6 | 22487040 > > > t1_idx_parallel_7 | 22487040 > > > t1_idx_parallel_8 | 22487040 > > > (9 rows) > > > > > > > > > So while the number of WAL records and full page images stay constant, we can > > > see some small fluctuations in the total amount of generated WAL data, even for > > > multiple execution of the sequential create index. I'm wondering if the > > > fluctuations are due to some other internal details or if the WalUsage support > > > is just completely broken (although I don't see any obvious issue ATM). > > > > > > > I think we need to know the reason for this. Can you try with small > > size indexes and see if the problem is reproducible? If it is, then it > > will be easier to debug the same. > > I have done some testing to see where these extra WAL size is coming > from. First I tried to create new db before every run then the size > is consistent. But, then on the same server, I tired as Julien showed > in his experiment then I am getting few extra wal bytes from next > create index onwards. And, the waldump(attached in the mail) shows > that is pg_class insert wal. I still have to check that why we need > to write an extra wal size. 
> > create extension pg_stat_statements; > drop table t1; > create table t1(id integer); > insert into t1 select * from generate_series(1, 10); > alter table t1 set (parallel_workers = 0); > vacuum;checkpoint; > select * from pg_stat_statements_reset() ; > create index t1_idx_parallel_0 ON t1(id); > select query, calls, wal_bytes, wal_records, wal_num_fpw from > pg_stat_statements where query ilike '%create index%';; > query > | calls | wal_bytes | wal_records | wal_num_fpw > ----------------------------------------------------------------------------------+-------+-----------+-------------+------------- > create index t1_idx_parallel_0 ON t1(id) > | 1 | 49320 | 23 | 15 > > > drop table t1; > create table t1(id integer); > insert into t1 select * from generate_series(1, 10); > --select * from pg_stat_statements_reset() ; > alter table t1 set (parallel_workers = 0); > vacuum;checkpoint; > create index t1_idx_parallel_1 ON t1(id); > > select query, calls, wal_bytes, wal_records, wal_num_fpw from > pg_stat_statements where query ilike '%create index%';; > postgres[110383]=# select query, calls, wal_bytes, wal_records, > wal_num_fpw from pg_stat_statements; > query > | calls | wal_bytes | wal_records | wal_num_fpw > ----------------------------------------------------------------------------------+-------+-----------+-------------+------------- > create index t1_idx_parallel_1 ON t1(id) > | 1 | 50040 | 23 | 15 > > wal_bytes diff = 50040-49320 = 720 > > Below, WAL record is causing the 720 bytes difference, all other WALs > are of the same size. > t1_idx_parallel_0: > rmgr: Heap len (rec/tot): 54/ 7498, tx: 489, lsn: > 0/0167B9B0, prev 0/0167B970, desc: INSERT off 30 flags 0x01, blkref > #0: rel 1663/13580/1249 > > t1_idx_parallel_1: > rmgr: Heap len (rec/tot): 54/ 8218, tx: 494, lsn: > 0/016B84F8, prev 0/016B84B8, desc: INSERT off 30 flags 0x01, blkref > #0: rel 1663/13580/1249 > > wal diff: 8218 - 7498 = 720 I think now I got the reason. Basically, both of these records are storing the FPW, and FPW size can vary based on the hole size on the page. If hold size is smaller the image length will be more, the image_len= BLCKSZ-hole_size. So in subsequent records, the image size is bigger. You can refer below code in XLogRecordAssemble { .... bimg.length = BLCKSZ - cbimg.hole_length; if (cbimg.hole_length == 0) { .... } else { /* must skip the hole */ rdt_datas_last->data = page; rdt_datas_last->len = bimg.hole_offset; rdt_datas_last->next = ®buf->bkp_rdatas[1]; rdt_datas_last = rdt_datas_last->next; rdt_datas_last->data = page + (bimg.hole_offset + cbimg.hole_length); rdt_datas_last->len = BLCKSZ - (bimg.hole_offset + cbimg.hole_length); } -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
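The XLogRecordAssemble excerpt quoted above is the key to the 720-byte
difference: a full page image is stored as BLCKSZ minus the page's "hole"
(the unused space between the line pointers and the tuples), so the same
logical insert produces a larger image when the target page has less free
space. The toy program below only illustrates that arithmetic; the hole
sizes are hypothetical, chosen to reproduce the 720-byte delta, and the
pg_waldump totals quoted above additionally include record headers:

#include <stdio.h>

#define BLCKSZ 8192

/* backup-block image length: the hole is skipped when the page is copied */
static int
fpw_image_length(int hole_length)
{
    return BLCKSZ - hole_length;
}

int
main(void)
{
    int         first_hole = 748;   /* hypothetical hole sizes */
    int         second_hole = 28;   /* a fuller page means a smaller hole */

    int         first_image = fpw_image_length(first_hole);    /* 7444 */
    int         second_image = fpw_image_length(second_hole);  /* 8164 */

    printf("first image:  %d bytes\n", first_image);
    printf("second image: %d bytes\n", second_image);
    printf("difference:   %d bytes\n", second_image - first_image); /* 720 */
    return 0;
}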
On Fri, Apr 3, 2020 at 8:55 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> I think now I got the reason. Basically, both of these records are
> storing the FPW, and FPW size can vary based on the hole size on the
> page. If the hole size is smaller, the image length will be more
> (image_len = BLCKSZ - hole_size). So in subsequent records, the image
> size is bigger.
>

This means that if we always re-create the database, or maybe keep
full_page_writes off, then we should get consistent WAL usage data for all
tests.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Fri, Apr 3, 2020 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Apr 3, 2020 at 8:55 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > I think now I got the reason. Basically, both of these records are > > storing the FPW, and FPW size can vary based on the hole size on the > > page. If hold size is smaller the image length will be more, the > > image_len= BLCKSZ-hole_size. So in subsequent records, the image size > > is bigger. > > > > This means if we always re-create the database or may be keep > full_page_writes to off, then we should get consistent WAL usage data > for all tests. With new database, it is always the same. But, with full-page write, I could see one of the create index is writing extra wal and if we change the older then the new create index at that place will write extra wal. I guess that could be due to a non-in place update in some of the system tables. postgres[58554]=# create extension pg_stat_statements; CREATE EXTENSION postgres[58554]=# postgres[58554]=# create table t1(id integer); CREATE TABLE postgres[58554]=# insert into t1 select * from generate_series(1, 1000000); INSERT 0 1000000 postgres[58554]=# select * from pg_stat_statements_reset() ; pg_stat_statements_reset -------------------------- (1 row) postgres[58554]=# postgres[58554]=# alter table t1 set (parallel_workers = 0); ALTER TABLE postgres[58554]=# vacuum;checkpoint; VACUUM CHECKPOINT postgres[58554]=# create index t1_idx_parallel_0 ON t1(id); CREATE INDEX postgres[58554]=# postgres[58554]=# alter table t1 set (parallel_workers = 1); ALTER TABLE postgres[58554]=# vacuum;checkpoint; VACUUM CHECKPOINT postgres[58554]=# create index t1_idx_parallel_1 ON t1(id); CREATE INDEX postgres[58554]=# postgres[58554]=# alter table t1 set (parallel_workers = 2); ALTER TABLE postgres[58554]=# vacuum;checkpoint; VACUUM CHECKPOINT postgres[58554]=# create index t1_idx_parallel_2 ON t1(id); CREATE INDEX postgres[58554]=# postgres[58554]=# alter table t1 set (parallel_workers = 3); ALTER TABLE postgres[58554]=# vacuum;checkpoint; VACUUM CHECKPOINT postgres[58554]=# create index t1_idx_parallel_3 ON t1(id); CREATE INDEX postgres[58554]=# postgres[58554]=# alter table t1 set (parallel_workers = 4); ALTER TABLE postgres[58554]=# vacuum;checkpoint; VACUUM CHECKPOINT postgres[58554]=# create index t1_idx_parallel_4 ON t1(id); CREATE INDEX postgres[58554]=# postgres[58554]=# alter table t1 set (parallel_workers = 5); ALTER TABLE postgres[58554]=# vacuum;checkpoint; VACUUM CHECKPOINT postgres[58554]=# create index t1_idx_parallel_5 ON t1(id); CREATE INDEX postgres[58554]=# postgres[58554]=# alter table t1 set (parallel_workers = 6); ALTER TABLE postgres[58554]=# vacuum;checkpoint; VACUUM CHECKPOINT postgres[58554]=# create index t1_idx_parallel_6 ON t1(id); CREATE INDEX postgres[58554]=# postgres[58554]=# alter table t1 set (parallel_workers = 7); ALTER TABLE postgres[58554]=# vacuum;checkpoint; VACUUM CHECKPOINT postgres[58554]=# create index t1_idx_parallel_7 ON t1(id); CREATE INDEX postgres[58554]=# postgres[58554]=# alter table t1 set (parallel_workers = 8); ALTER TABLE postgres[58554]=# vacuum;checkpoint; VACUUM CHECKPOINT postgres[58554]=# create index t1_idx_parallel_8 ON t1(id); CREATE INDEX postgres[58554]=# postgres[58554]=# select query, calls, wal_bytes, wal_records, wal_num_fpw from pg_stat_statements where query ilike '%create index%'; query | calls | wal_bytes | wal_records | wal_num_fpw ------------------------------------------+-------+-----------+-------------+------------- create 
index t1_idx_parallel_0 ON t1(id) | 1 | 20355953 | 2766 | 2745 create index t1_idx_parallel_1 ON t1(id) | 1 | 20355953 | 2766 | 2745 create index t1_idx_parallel_3 ON t1(id) | 1 | 20355953 | 2766 | 2745 create index t1_idx_parallel_2 ON t1(id) | 1 | 20355953 | 2766 | 2745 create index t1_idx_parallel_4 ON t1(id) | 1 | 20355953 | 2766 | 2745 create index t1_idx_parallel_8 ON t1(id) | 1 | 20355953 | 2766 | 2745 create index t1_idx_parallel_6 ON t1(id) | 1 | 20355953 | 2766 | 2745 create index t1_idx_parallel_7 ON t1(id) | 1 | 20355953 | 2766 | 2745 create index t1_idx_parallel_5 ON t1(id) | 1 | 20359585 | 2767 | 2745 -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Fri, Apr 3, 2020 at 9:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Apr 3, 2020 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Apr 3, 2020 at 8:55 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > I think now I got the reason. Basically, both of these records are > > > storing the FPW, and FPW size can vary based on the hole size on the > > > page. If hold size is smaller the image length will be more, the > > > image_len= BLCKSZ-hole_size. So in subsequent records, the image size > > > is bigger. > > > > > > > This means if we always re-create the database or may be keep > > full_page_writes to off, then we should get consistent WAL usage data > > for all tests. > > With new database, it is always the same. But, with full-page write, > I could see one of the create index is writing extra wal and if we > change the older then the new create index at that place will write > extra wal. I guess that could be due to a non-in place update in some > of the system tables. I have analyzed the WAL and there could be multiple reasons for the same. With small data, I have noticed that while inserting in the system index there was a Page Split and that created extra WAL. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Fri, Apr 3, 2020 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Fri, Apr 3, 2020 at 9:17 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Fri, Apr 3, 2020 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Fri, Apr 3, 2020 at 8:55 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > I think now I got the reason. Basically, both of these records are > > > > storing the FPW, and FPW size can vary based on the hole size on the > > > > page. If hold size is smaller the image length will be more, the > > > > image_len= BLCKSZ-hole_size. So in subsequent records, the image size > > > > is bigger. > > > > > > > > > > This means if we always re-create the database or may be keep > > > full_page_writes to off, then we should get consistent WAL usage data > > > for all tests. > > > > With new database, it is always the same. But, with full-page write, > > I could see one of the create index is writing extra wal and if we > > change the older then the new create index at that place will write > > extra wal. I guess that could be due to a non-in place update in some > > of the system tables. > > I have analyzed the WAL and there could be multiple reasons for the > same. With small data, I have noticed that while inserting in the > system index there was a Page Split and that created extra WAL. > Thanks for the investigation. I think it is clear that we can't expect the same WAL size even if we repeat the same operation unless it is a fresh database. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Apr 3, 2020 at 9:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Apr 3, 2020 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > I have analyzed the WAL and there could be multiple reasons for the > > same. With small data, I have noticed that while inserting in the > > system index there was a Page Split and that created extra WAL. > > > > Thanks for the investigation. I think it is clear that we can't > expect the same WAL size even if we repeat the same operation unless > it is a fresh database. > Attached find the latest patches. I have modified based on our discussion on user interface thread [1], ran pgindent on all patches, slightly modified one comment based on Dilip's input and added commit messages. I think the patches are in good shape. I would like to commit the first patch in this series tomorrow unless I see more comments or any other objections. The patch-2 might need to be rebased if the other related patch [2] got committed first and we might need to tweak a bit based on the input from other thread [1] where we are discussing user interface for it. [1] - https://www.postgresql.org/message-id/CAA4eK1%2Bo1Vj4Rso09pKOaKhY8QWTA0gWwCL3TGCi1rCLBBf-QQ%40mail.gmail.com [2] - https://www.postgresql.org/message-id/E1jKC4J-0007R3-Bo%40gemulon.postgresql.org -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
On Fri, Apr 3, 2020 at 7:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Apr 3, 2020 at 9:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Apr 3, 2020 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > I have analyzed the WAL and there could be multiple reasons for the > > > same. With small data, I have noticed that while inserting in the > > > system index there was a Page Split and that created extra WAL. > > > > > > > Thanks for the investigation. I think it is clear that we can't > > expect the same WAL size even if we repeat the same operation unless > > it is a fresh database. > > > > Attached find the latest patches. I have modified based on our > discussion on user interface thread [1], ran pgindent on all patches, > slightly modified one comment based on Dilip's input and added commit > messages. I think the patches are in good shape. I would like to > commit the first patch in this series tomorrow unless I see more > comments or any other objections. > Pushed. > The patch-2 might need to be > rebased if the other related patch [2] got committed first and we > might need to tweak a bit based on the input from other thread [1] > where we are discussing user interface for it. > The primary question for patch-2 is whether we want to include WAL usage information for the planning phase as we did for BUFFERS in recent commit ce77abe63c (Include information on buffer usage during planning phase, in EXPLAIN output, take two.). Initially, I thought it might be a good idea to do the same for WAL but after reading the thread that leads to commit, I am not sure if there is any pressing need to include WAL information for the planning phase. Because during planning we might not write much WAL (with the exception of WAL due to setting of hint-bits) so users might not care much. What do you think? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, Apr 04, 2020 at 10:38:14AM +0530, Amit Kapila wrote: > On Fri, Apr 3, 2020 at 7:36 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Apr 3, 2020 at 9:40 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Fri, Apr 3, 2020 at 9:35 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > > > > > I have analyzed the WAL and there could be multiple reasons for the > > > > same. With small data, I have noticed that while inserting in the > > > > system index there was a Page Split and that created extra WAL. > > > > > > > > > > Thanks for the investigation. I think it is clear that we can't > > > expect the same WAL size even if we repeat the same operation unless > > > it is a fresh database. > > > > > > > Attached find the latest patches. I have modified based on our > > discussion on user interface thread [1], ran pgindent on all patches, > > slightly modified one comment based on Dilip's input and added commit > > messages. I think the patches are in good shape. I would like to > > commit the first patch in this series tomorrow unless I see more > > comments or any other objections. > > > > Pushed. Thanks! > > The patch-2 might need to be > > rebased if the other related patch [2] got committed first and we > > might need to tweak a bit based on the input from other thread [1] > > where we are discussing user interface for it. > > > > The primary question for patch-2 is whether we want to include WAL > usage information for the planning phase as we did for BUFFERS in > recent commit ce77abe63c (Include information on buffer usage during > planning phase, in EXPLAIN output, take two.). Initially, I thought > it might be a good idea to do the same for WAL but after reading the > thread that leads to commit, I am not sure if there is any pressing > need to include WAL information for the planning phase. Because > during planning we might not write much WAL (with the exception of WAL > due to setting of hint-bits) so users might not care much. What do > you think? I agree that WAL activity during planning shouldn't be very frequent, but it might still be worthwhile to add it. I'm wondering how stable the normalized WAL information would be in some regression tests, as the counters are only showed if non zero. Maybe it'd be better to remove them from the output, same as the buffers?
On Sat, Apr 4, 2020 at 11:33 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Sat, Apr 04, 2020 at 10:38:14AM +0530, Amit Kapila wrote: > > > > The patch-2 might need to be > > > rebased if the other related patch [2] got committed first and we > > > might need to tweak a bit based on the input from other thread [1] > > > where we are discussing user interface for it. > > > > > > > The primary question for patch-2 is whether we want to include WAL > > usage information for the planning phase as we did for BUFFERS in > > recent commit ce77abe63c (Include information on buffer usage during > > planning phase, in EXPLAIN output, take two.). Initially, I thought > > it might be a good idea to do the same for WAL but after reading the > > thread that leads to commit, I am not sure if there is any pressing > > need to include WAL information for the planning phase. Because > > during planning we might not write much WAL (with the exception of WAL > > due to setting of hint-bits) so users might not care much. What do > > you think? > > > I agree that WAL activity during planning shouldn't be very frequent, but it > might still be worthwhile to add it. > We can add if we want but I am not able to convince myself for that. Do you have any use case in mind? I think in most of the cases (except for hint-bit WAL) it will be zero. If we are not sure of this we can also discuss it separately in a new thread once this patch-series is committed and see if anybody else sees the value of it and if so adding the code should be easy. > I'm wondering how stable the normalized > WAL information would be in some regression tests, as the counters are only > showed if non zero. Maybe it'd be better to remove them from the output, same > as the buffers? > Which regression tests are you referring to? pg_stat_statements? If so, why would it be unstable? It should always generate WAL although the exact values may differ and we have already taken care of that in the patch, no? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, Apr 04, 2020 at 02:12:59PM +0530, Amit Kapila wrote: > On Sat, Apr 4, 2020 at 11:33 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > On Sat, Apr 04, 2020 at 10:38:14AM +0530, Amit Kapila wrote: > > > > > > The patch-2 might need to be > > > > rebased if the other related patch [2] got committed first and we > > > > might need to tweak a bit based on the input from other thread [1] > > > > where we are discussing user interface for it. > > > > > > > > > > The primary question for patch-2 is whether we want to include WAL > > > usage information for the planning phase as we did for BUFFERS in > > > recent commit ce77abe63c (Include information on buffer usage during > > > planning phase, in EXPLAIN output, take two.). Initially, I thought > > > it might be a good idea to do the same for WAL but after reading the > > > thread that leads to commit, I am not sure if there is any pressing > > > need to include WAL information for the planning phase. Because > > > during planning we might not write much WAL (with the exception of WAL > > > due to setting of hint-bits) so users might not care much. What do > > > you think? > > > > > > I agree that WAL activity during planning shouldn't be very frequent, but it > > might still be worthwhile to add it. > > > > We can add if we want but I am not able to convince myself for that. > Do you have any use case in mind? I think in most of the cases > (except for hint-bit WAL) it will be zero. If we are not sure of this > we can also discuss it separately in a new thread once this > patch-series is committed and see if anybody else sees the value of it > and if so adding the code should be easy. I'm mostly thinking of people trying to investigate possible slowdowns on a hot-standby replica with a primary without wal_log_hints. If they explicitly ask for WAL information, we should provide them, even if it's quite unlikely to happen. > > > I'm wondering how stable the normalized > > WAL information would be in some regression tests, as the counters are only > > showed if non zero. Maybe it'd be better to remove them from the output, same > > as the buffers? > > > > Which regression tests are you referring to? pg_stat_statements? If > so, why would it be unstable? It should always generate WAL although > the exact values may differ and we have already taken care of that in > the patch, no? I'm talking about a hypothetical new EXPLAIN (ALAYZE, WAL) regression test, which could be unstable for similar reason to why the first attempt to add BUFFERS in the planning part of EXPLAIN was unstable. I thought that's why you were hesitating of adding it.
On Sat, Apr 4, 2020 at 2:24 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > We can add if we want but I am not able to convince myself for that. > > Do you have any use case in mind? I think in most of the cases > > (except for hint-bit WAL) it will be zero. If we are not sure of this > > we can also discuss it separately in a new thread once this > > patch-series is committed and see if anybody else sees the value of it > > and if so adding the code should be easy. > > > I'm mostly thinking of people trying to investigate possible slowdowns on a > hot-standby replica with a primary without wal_log_hints. If they explicitly > ask for WAL information, we should provide them, even if it's quite unlikely to > happen. > Yeah, possible but I am not completely sure. I would like to hear the opinion of others if any before adding code for this. How about if we first commit pg_stat_statements and wait for this till Monday and if nobody responds we can commit the current patch but would start a new thread and try to get the opinion of others? > > > > > > I'm wondering how stable the normalized > > > WAL information would be in some regression tests, as the counters are only > > > showed if non zero. Maybe it'd be better to remove them from the output, same > > > as the buffers? > > > > > > > Which regression tests are you referring to? pg_stat_statements? If > > so, why would it be unstable? It should always generate WAL although > > the exact values may differ and we have already taken care of that in > > the patch, no? > > > I'm talking about a hypothetical new EXPLAIN (ALAYZE, WAL) regression test, > which could be unstable for similar reason to why the first attempt to add > BUFFERS in the planning part of EXPLAIN was unstable. > oh, then leave it for now because I don't see much use of those as the code path can anyway be hit by the tests added by pg_stat_statements patch. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, Apr 04, 2020 at 02:39:32PM +0530, Amit Kapila wrote: > On Sat, Apr 4, 2020 at 2:24 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > We can add if we want but I am not able to convince myself for that. > > > Do you have any use case in mind? I think in most of the cases > > > (except for hint-bit WAL) it will be zero. If we are not sure of this > > > we can also discuss it separately in a new thread once this > > > patch-series is committed and see if anybody else sees the value of it > > > and if so adding the code should be easy. > > > > > > I'm mostly thinking of people trying to investigate possible slowdowns on a > > hot-standby replica with a primary without wal_log_hints. If they explicitly > > ask for WAL information, we should provide them, even if it's quite unlikely to > > happen. > > > > Yeah, possible but I am not completely sure. I would like to hear the > opinion of others if any before adding code for this. How about if we > first commit pg_stat_statements and wait for this till Monday and if > nobody responds we can commit the current patch but would start a new > thread and try to get the opinion of others? I'm fine with it. > > > > > > > > > > I'm wondering how stable the normalized > > > > WAL information would be in some regression tests, as the counters are only > > > > showed if non zero. Maybe it'd be better to remove them from the output, same > > > > as the buffers? > > > > > > > > > > Which regression tests are you referring to? pg_stat_statements? If > > > so, why would it be unstable? It should always generate WAL although > > > the exact values may differ and we have already taken care of that in > > > the patch, no? > > > > > > I'm talking about a hypothetical new EXPLAIN (ALAYZE, WAL) regression test, > > which could be unstable for similar reason to why the first attempt to add > > BUFFERS in the planning part of EXPLAIN was unstable. > > > > oh, then leave it for now because I don't see much use of those as the > code path can anyway be hit by the tests added by pg_stat_statements > patch. > Perfect then!
On Sat, Apr 4, 2020 at 2:50 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Sat, Apr 04, 2020 at 02:39:32PM +0530, Amit Kapila wrote: > > On Sat, Apr 4, 2020 at 2:24 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > > > We can add if we want but I am not able to convince myself for that. > > > > Do you have any use case in mind? I think in most of the cases > > > > (except for hint-bit WAL) it will be zero. If we are not sure of this > > > > we can also discuss it separately in a new thread once this > > > > patch-series is committed and see if anybody else sees the value of it > > > > and if so adding the code should be easy. > > > > > > > > > I'm mostly thinking of people trying to investigate possible slowdowns on a > > > hot-standby replica with a primary without wal_log_hints. If they explicitly > > > ask for WAL information, we should provide them, even if it's quite unlikely to > > > happen. > > > > > > > Yeah, possible but I am not completely sure. I would like to hear the > > opinion of others if any before adding code for this. How about if we > > first commit pg_stat_statements and wait for this till Monday and if > > nobody responds we can commit the current patch but would start a new > > thread and try to get the opinion of others? > > > I'm fine with it. > I have pushed pg_stat_statements and Explain related patches. I am now looking into (auto)vacuum patch and have few comments. @@ -614,6 +616,9 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params, TimestampDifference(starttime, endtime, &secs, &usecs); + memset(&walusage, 0, sizeof(WalUsage)); + WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start); + read_rate = 0; write_rate = 0; if ((secs > 0) || (usecs > 0)) @@ -666,7 +671,13 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params, (long long) VacuumPageDirty); appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: %.3f MB/s\n"), read_rate, write_rate); - appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0)); + appendStringInfo(&buf, _("system usage: %s\n"), pg_rusage_show(&ru0)); + appendStringInfo(&buf, + _("WAL usage: %ld records, %ld full page writes, " + UINT64_FORMAT " bytes"), + walusage.wal_records, + walusage.wal_num_fpw, + walusage.wal_bytes); Here, we are not displaying Buffers related data, so why do we think it is important to display WAL data? I see some point in displaying Buffers and WAL data in a vacuum (verbose), but I feel it is better to make a case for both the statistics together rather than just displaying one and leaving other. I think the other change related to autovacuum stats seems okay to me. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
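For context, a hedged sketch of where the discussed output would surface, based on the format string in the snippet above; the log line is illustrative and the final wording (and whether the verbose part stays at all) depends on the outcome of this discussion:

-- Ask autovacuum to log every run so the extra line becomes visible:
ALTER SYSTEM SET log_autovacuum_min_duration = 0;
SELECT pg_reload_conf();
-- An autovacuum log entry would then include, alongside the buffer and
-- read/write rate lines, something like:
--   WAL usage: 2087 records, 13 full page writes, 3122814 bytes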
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Masahiko Sawada
On Tue, 31 Mar 2020 at 14:13, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 31 Mar 2020 at 12:58, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Mar 30, 2020 at 12:31 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > The patch for vacuum conflicts with recent changes in vacuum. So I've > > > attached rebased one. > > > > > > > + /* > > + * Next, accumulate buffer usage. (This must wait for the workers to > > + * finish, or we might get incomplete data.) > > + */ > > + for (i = 0; i < nworkers; i++) > > + InstrAccumParallelQuery(&lps->buffer_usage[i]); > > + > > > > This should be done for launched workers aka > > lps->pcxt->nworkers_launched. I think a similar problem exists in > > create index related patch. > > You're right. Fixed in the new patches. > > On Mon, 30 Mar 2020 at 17:00, Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > Just minor nitpicking: > > > > + int i; > > > > Assert(!IsParallelWorker()); > > Assert(ParallelVacuumIsActive(lps)); > > @@ -2166,6 +2172,13 @@ lazy_parallel_vacuum_indexes(Relation *Irel, IndexBulkDeleteResult **stats, > > /* Wait for all vacuum workers to finish */ > > WaitForParallelWorkersToFinish(lps->pcxt); > > > > + /* > > + * Next, accumulate buffer usage. (This must wait for the workers to > > + * finish, or we might get incomplete data.) > > + */ > > + for (i = 0; i < nworkers; i++) > > + InstrAccumParallelQuery(&lps->buffer_usage[i]); > > > > We now allow declaring a variable in those loops, so it may be better to avoid > > declaring i outside the for scope? > > We can do that but I was not sure if it's good since other codes > around there don't use that. So I'd like to leave it for committers. > It's a trivial change. > I've updated the buffer usage patch for parallel index creation as the previous patch conflicts with commit df3b181499b40. This comment in commit df3b181499b40 seems the comment which had been replaced by Amit with a better sentence when introducing buffer usage to parallel vacuum. + /* + * Estimate space for WalUsage -- PARALLEL_KEY_WAL_USAGE + * + * WalUsage during execution of maintenance command can be used by an + * extension that reports the WAL usage, such as pg_stat_statements. We + * have no way of knowing whether anyone's looking at pgWalUsage, so do it + * unconditionally. + */ Would the following sentence in lazyvacuum.c be also better for parallel create index? * If there are no extensions loaded that care, we could skip this. We * have no way of knowing whether anyone's looking at pgBufferUsage or * pgWalUsage, so do it unconditionally. The attached patch changes to the above comment and removed the code that is used to un-support only buffer usage accumulation. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Amit Kapila
On Mon, Apr 6, 2020 at 11:19 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > The attached patch changes to the above comment and removed the code > that is used to un-support only buffer usage accumulation. > So, IIUC, the purpose of this patch will be to count the buffer usage due to the heap scan (in heapam_index_build_range_scan) we perform while parallel create index? Because the index creation itself won't use buffer manager. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Masahiko Sawada
On Mon, 6 Apr 2020 at 16:16, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Apr 6, 2020 at 11:19 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > The attached patch changes to the above comment and removed the code > > that is used to un-support only buffer usage accumulation. > > > > So, IIUC, the purpose of this patch will be to count the buffer usage > due to the heap scan (in heapam_index_build_range_scan) we perform > while parallel create index? Because the index creation itself won't > use buffer manager. Oops, I'd missed Peter's comment. Btree index doesn't use heapam_index_build_range_scan so it's not necessary. Sorry for the noise. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Apr 06, 2020 at 08:55:01AM +0530, Amit Kapila wrote: > On Sat, Apr 4, 2020 at 2:50 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > I have pushed pg_stat_statements and Explain related patches. I am > now looking into (auto)vacuum patch and have few comments. > Thanks! > @@ -614,6 +616,9 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params, > > TimestampDifference(starttime, endtime, &secs, &usecs); > > + memset(&walusage, 0, sizeof(WalUsage)); > + WalUsageAccumDiff(&walusage, &pgWalUsage, &walusage_start); > + > read_rate = 0; > write_rate = 0; > if ((secs > 0) || (usecs > 0)) > @@ -666,7 +671,13 @@ heap_vacuum_rel(Relation onerel, VacuumParams *params, > (long long) VacuumPageDirty); > appendStringInfo(&buf, _("avg read rate: %.3f MB/s, avg write rate: > %.3f MB/s\n"), > read_rate, write_rate); > - appendStringInfo(&buf, _("system usage: %s"), pg_rusage_show(&ru0)); > + appendStringInfo(&buf, _("system usage: %s\n"), pg_rusage_show(&ru0)); > + appendStringInfo(&buf, > + _("WAL usage: %ld records, %ld full page writes, " > + UINT64_FORMAT " bytes"), > + walusage.wal_records, > + walusage.wal_num_fpw, > + walusage.wal_bytes); > > Here, we are not displaying Buffers related data, so why do we think > it is important to display WAL data? I see some point in displaying > Buffers and WAL data in a vacuum (verbose), but I feel it is better to > make a case for both the statistics together rather than just > displaying one and leaving other. I think the other change related to > autovacuum stats seems okay to me. One thing is that the amount of WAL, and more precisely FPW, is quite unpredictable wrt. vacuum and even more anti-wraparound vacuum, so this is IMHO a very useful metric. That being said I totally agree with you that both should be displayed. Should I send a patch to also expose it?
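A sketch of the kind of investigation described above, assuming the pg_stat_statements column names used in this thread (wal_records, wal_num_fpw, wal_bytes); the committed column names may differ:

-- On the primary: check whether hint-bit-only changes are being WAL-logged at all,
-- then see which statements dominate WAL (and full page write) volume.
SHOW wal_log_hints;
SELECT query, calls, wal_records, wal_num_fpw, wal_bytes
FROM pg_stat_statements
ORDER BY wal_bytes DESC
LIMIT 10;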
On Mon, Apr 6, 2020 at 1:53 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Mon, Apr 06, 2020 at 08:55:01AM +0530, Amit Kapila wrote: > > > > Here, we are not displaying Buffers related data, so why do we think > > it is important to display WAL data? I see some point in displaying > > Buffers and WAL data in a vacuum (verbose), but I feel it is better to > > make a case for both the statistics together rather than just > > displaying one and leaving other. I think the other change related to > > autovacuum stats seems okay to me. > > One thing is that the amount of WAL, and more precisely FPW, is quite > unpredictable wrt. vacuum and even more anti-wraparound vacuum, so this is IMHO > a very useful metric. > I agree but we already have a way via pg_stat_statements to find it if the metric is so useful. > That being said I totally agree with you that both > should be displayed. Should I send a patch to also expose it? > I think this should be a separate proposal. Let's not add things unless they are really essential. We can separately discuss of enhancing vacuum verbose for Buffer and WAL usage stats and see if others also find that information useful. I think you can send a patch by removing the code I mentioned above if you agree. Thanks for working on this. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Amit Kapila
On Mon, Apr 6, 2020 at 12:55 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Mon, 6 Apr 2020 at 16:16, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Apr 6, 2020 at 11:19 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > The attached patch changes to the above comment and removed the code > > > that is used to un-support only buffer usage accumulation. > > > > > > > So, IIUC, the purpose of this patch will be to count the buffer usage > > due to the heap scan (in heapam_index_build_range_scan) we perform > > while parallel create index? Because the index creation itself won't > > use buffer manager. > > Oops, I'd missed Peter's comment. Btree index doesn't use > heapam_index_build_range_scan so it's not necessary. > AFAIU, it uses heapam_index_build_range_scan but for writing to index, it doesn't use buffer manager. So, I guess probably we can accumulate BufferUsage stats for parallel create index. What I wanted to know is whether the extra lookup for pg_amproc or any other catalog access via parallel workers is fine or we somehow want to eliminate that? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Apr 06, 2020 at 02:34:36PM +0530, Amit Kapila wrote: > On Mon, Apr 6, 2020 at 1:53 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > On Mon, Apr 06, 2020 at 08:55:01AM +0530, Amit Kapila wrote: > > > > > > Here, we are not displaying Buffers related data, so why do we think > > > it is important to display WAL data? I see some point in displaying > > > Buffers and WAL data in a vacuum (verbose), but I feel it is better to > > > make a case for both the statistics together rather than just > > > displaying one and leaving other. I think the other change related to > > > autovacuum stats seems okay to me. > > > > One thing is that the amount of WAL, and more precisely FPW, is quite > > unpredictable wrt. vacuum and even more anti-wraparound vacuum, so this is IMHO > > a very useful metric. > > > > I agree but we already have a way via pg_stat_statements to find it if > the metric is so useful. > Agreed. > > > That being said I totally agree with you that both > > should be displayed. Should I send a patch to also expose it? > > > > I think this should be a separate proposal. Let's not add things > unless they are really essential. We can separately discuss of > enhancing vacuum verbose for Buffer and WAL usage stats and see if > others also find that information useful. I think you can send a > patch by removing the code I mentioned above if you agree. Thanks for > working on this. Thanks! v15 attached.
Attachment
On Mon, 6 Apr 2020 at 00:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I have pushed pg_stat_statements and Explain related patches. I am
> now looking into (auto)vacuum patch and have few comments.
I wasn't paying much attention to this thread. May I suggest changing wal_num_fpw to wal_fpw? wal_records and wal_bytes do not have a prefix 'num'. It seems inconsistent to me.
Regards,
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Apr 06, 2020 at 10:12:55AM -0300, Euler Taveira wrote: > On Mon, 6 Apr 2020 at 00:25, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > I have pushed pg_stat_statements and Explain related patches. I am > > now looking into (auto)vacuum patch and have few comments. > > > > I wasn't paying much attention to this thread. May I suggest changing > wal_num_fpw to wal_fpw? wal_records and wal_bytes does not have a prefix > 'num'. It seems inconsistent to me. > If we want to be consistent shouldn't we rename it to wal_fpws? FTR I don't like much either version.
On Mon, 6 Apr 2020 at 10:37, Julien Rouhaud <rjuju123@gmail.com> wrote:
On Mon, Apr 06, 2020 at 10:12:55AM -0300, Euler Taveira wrote:
> On Mon, 6 Apr 2020 at 00:25, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> >
> > I have pushed pg_stat_statements and Explain related patches. I am
> > now looking into (auto)vacuum patch and have few comments.
> >
> > I wasn't paying much attention to this thread. May I suggest changing
> wal_num_fpw to wal_fpw? wal_records and wal_bytes does not have a prefix
> 'num'. It seems inconsistent to me.
>
> If we want to be consistent shouldn't we rename it to wal_fpws? FTR I don't
> like much either version.
Since FPW is an acronym, plural form reads better when you are using uppercase (such as FPWs or FPW's); thus, I prefer singular form because parameter names are lowercase. Function description will clarify that this is "number of WAL full page writes".
Regards,
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
I noticed in some of the screenshots that were tweeted that for example in WAL: records=1 bytes=56 there are two spaces between pieces of data. This doesn't match the rest of the EXPLAIN output. Can that be adjusted? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Apr 06, 2020 at 05:01:30PM +0200, Peter Eisentraut wrote: > I noticed in some of the screenshots that were tweeted that for example in > > WAL: records=1 bytes=56 > > there are two spaces between pieces of data. This doesn't match the rest of > the EXPLAIN output. Can that be adjusted? We talked about that here: https://www.postgresql.org/message-id/20200402054120.GC14618%40telsasoft.com -- Justin
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Peter Geoghegan
On Mon, Apr 6, 2020 at 2:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > AFAIU, it uses heapam_index_build_range_scan but for writing to index, > it doesn't use buffer manager. Right. It doesn't need to use the buffer manager to write to the index, unlike (say) GIN's CREATE INDEX. -- Peter Geoghegan
On Mon, Apr 6, 2020 at 10:01 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Mon, Apr 06, 2020 at 05:01:30PM +0200, Peter Eisentraut wrote: > > I noticed in some of the screenshots that were tweeted that for example in > > > > WAL: records=1 bytes=56 > > > > there are two spaces between pieces of data. This doesn't match the rest of > > the EXPLAIN output. Can that be adjusted? > > We talked about that here: > https://www.postgresql.org/message-id/20200402054120.GC14618%40telsasoft.com > Yeah. Just to brief here, the main reason was that one of the fields (full page writes) already had a single space and then we had prior cases as mentioned in Justin's email [1] where we use two spaces which lead us to decide using two spaces in this case. Now, we can change back to one space as suggested by you but I am not sure if that is an improvement over what we have done. Let me know if you think otherwise. [1] - https://www.postgresql.org/message-id/20200402054120.GC14618%40telsasoft.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Apr 6, 2020 at 7:58 PM Euler Taveira <euler.taveira@2ndquadrant.com> wrote: > > On Mon, 6 Apr 2020 at 10:37, Julien Rouhaud <rjuju123@gmail.com> wrote: >> >> On Mon, Apr 06, 2020 at 10:12:55AM -0300, Euler Taveira wrote: >> > On Mon, 6 Apr 2020 at 00:25, Amit Kapila <amit.kapila16@gmail.com> wrote: >> > >> > > >> > > I have pushed pg_stat_statements and Explain related patches. I am >> > > now looking into (auto)vacuum patch and have few comments. >> > > >> > > I wasn't paying much attention to this thread. May I suggest changing >> > wal_num_fpw to wal_fpw? wal_records and wal_bytes does not have a prefix >> > 'num'. It seems inconsistent to me. >> > >> >> If we want to be consistent shouldn't we rename it to wal_fpws? FTR I don't >> like much either version. > > > Since FPW is an acronym, plural form reads better when you are using uppercase (such as FPWs or FPW's); thus, I prefersingular form because parameter names are lowercase. Function description will clarify that this is "number of WALfull page writes". > I like Euler's suggestion to change wal_num_fpw to wal_fpw. It is better if others who didn't like this name can also share their opinion now because changing multiple times the same thing is not a good idea. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Masahiko Sawada
On Tue, 7 Apr 2020 at 02:40, Peter Geoghegan <pg@bowt.ie> wrote: > > On Mon, Apr 6, 2020 at 2:21 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > AFAIU, it uses heapam_index_build_range_scan but for writing to index, > > it doesn't use buffer manager. > > Right. It doesn't need to use the buffer manager to write to the > index, unlike (say) GIN's CREATE INDEX. Hmm, after more thoughts and testing, it seems to me that parallel btree index creation uses buffer manager while scanning the table in parallel, i.e in heapam_index_build_range_scan, which affects shared_blks_xxx in pg_stat_statements. I've some parallel create index tests with the current HEAD and with the attached patch. The table has 44248 blocks. HEAD, no workers: -[ RECORD 1 ]-------+---------- total_plan_time | 0 total_plan_time | 0 shared_blks_hit | 148 shared_blks_read | 44281 total_read_blks | 44429 shared_blks_dirtied | 44261 shared_blks_written | 24644 wal_records | 71693 wal_num_fpw | 71682 wal_bytes | 566815038 HEAD, 4 workers: -[ RECORD 1 ]-------+---------- total_plan_time | 0 total_plan_time | 0 shared_blks_hit | 160 shared_blks_read | 8892 total_read_blks | 9052 shared_blks_dirtied | 8871 shared_blks_written | 5342 wal_records | 71693 wal_num_fpw | 71682 wal_bytes | 566815038 The WAL usage statistics are good but the buffer usage statistics seem not correct. Patched, no workers: -[ RECORD 1 ]-------+---------- total_plan_time | 0 total_plan_time | 0 shared_blks_hit | 148 shared_blks_read | 44281 total_read_blks | 44429 shared_blks_dirtied | 44261 shared_blks_written | 24843 wal_records | 71693 wal_num_fpw | 71682 wal_bytes | 566815038 Patched, 4 workers: -[ RECORD 1 ]-------+---------- total_plan_time | 0 total_plan_time | 0 shared_blks_hit | 172 shared_blks_read | 44282 total_read_blks | 44454 shared_blks_dirtied | 44261 shared_blks_written | 26968 wal_records | 71693 wal_num_fpw | 71682 wal_bytes | 566815038 Buffer usage statistics seem correct. The small differences would be catalog lookups Peter mentioned. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
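A sketch of the kind of query that produces numbers like the above, assuming the column names in this patch version and computing total_read_blks as hit plus read; this is not necessarily the exact query used for the test:

SELECT shared_blks_hit,
       shared_blks_read,
       shared_blks_hit + shared_blks_read AS total_read_blks,
       shared_blks_dirtied,
       shared_blks_written,
       wal_records, wal_num_fpw, wal_bytes
FROM pg_stat_statements
WHERE query ILIKE 'create index%';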
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Amit Kapila
On Tue, Apr 7, 2020 at 1:30 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > Buffer usage statistics seem correct. The small differences would be > catalog lookups Peter mentioned. > Agreed, but can you check which part of code does that lookup? I want to see if we can avoid that from buffer usage stats or at least write a comment about it, otherwise, we might have to face this question again and again. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 7, 2020 at 4:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Apr 6, 2020 at 7:58 PM Euler Taveira > <euler.taveira@2ndquadrant.com> wrote: > > > > On Mon, 6 Apr 2020 at 10:37, Julien Rouhaud <rjuju123@gmail.com> wrote: > >> > >> On Mon, Apr 06, 2020 at 10:12:55AM -0300, Euler Taveira wrote: > >> > On Mon, 6 Apr 2020 at 00:25, Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > > >> > > > >> > > I have pushed pg_stat_statements and Explain related patches. I am > >> > > now looking into (auto)vacuum patch and have few comments. > >> > > > >> > > I wasn't paying much attention to this thread. May I suggest changing > >> > wal_num_fpw to wal_fpw? wal_records and wal_bytes does not have a prefix > >> > 'num'. It seems inconsistent to me. > >> > > >> > >> If we want to be consistent shouldn't we rename it to wal_fpws? FTR I don't > >> like much either version. > > > > > > Since FPW is an acronym, plural form reads better when you are using uppercase (such as FPWs or FPW's); thus, I prefersingular form because parameter names are lowercase. Function description will clarify that this is "number of WALfull page writes". > > > > I like Euler's suggestion to change wal_num_fpw to wal_fpw. It is > better if others who didn't like this name can also share their > opinion now because changing multiple times the same thing is not a > good idea. +1 About Justin and your comments on the other thread: On Tue, Apr 7, 2020 at 4:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Apr 6, 2020 at 10:04 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > On Thu, Apr 02, 2020 at 08:29:31AM +0200, Julien Rouhaud wrote: > > > > > "full page records" seems to be showing the number of full page > > > > > images, not the record having full page images. > > > > > > > > I am not sure what exactly is a difference but it is the records > > > > having full page images. Julien correct me if I am wrong. > > > > > Obviously previous complaints about the meaning and parsability of > > > "full page writes" should be addressed here for consistency. > > > > There's a couple places that say "full page image records" which I think is > > language you were trying to avoid. It's the number of pages, not the number of > > records, no ? I see explain and autovacuum say what I think is wanted, but > > these say the wrong thing? Find attached slightly larger patch. > > > > $ git grep 'image record' > > contrib/pg_stat_statements/pg_stat_statements.c: int64 wal_num_fpw; /* # of WAL full page image recordsgenerated */ > > doc/src/sgml/ref/explain.sgml: number of records, number of full page image records and amount of WAL > > > > Few comments: > 1. > - int64 wal_num_fpw; /* # of WAL full page image records generated */ > + int64 wal_num_fpw; /* # of WAL full page images generated */ > > Let's change comment as " /* # of WAL full page writes generated */" > to be consistent with other places like instrument.h. Also, make a > similar change at other places if required. Agreed. That's pg_stat_statements.c and instrument.h. I'll send a patch once we reach consensus with the rest of the comments. > 2. > <entry> > - Total amount of WAL bytes generated by the statement > + Total number of WAL bytes generated by the statement > </entry> > > I feel the previous text was better as this field can give us the size > of WAL with which we can answer "how much WAL data is generated by a > particular statement?". Julien, do you have any thoughts on this? I also prefer "amount" as it feels more natural. 
I'm not a native English speaker though, so maybe I'm just biased.
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Masahiko Sawada
On Tue, 7 Apr 2020 at 17:42, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Apr 7, 2020 at 1:30 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > Buffer usage statistics seem correct. The small differences would be > > catalog lookups Peter mentioned. > > > > Agreed, but can you check which part of code does that lookup? I want > to see if we can avoid that from buffer usage stats or at least write > a comment about it, otherwise, we might have to face this question > again and again. Okay, I'll check it. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-04-07 04:12, Amit Kapila wrote: > On Mon, Apr 6, 2020 at 10:01 PM Justin Pryzby <pryzby@telsasoft.com> wrote: >> >> On Mon, Apr 06, 2020 at 05:01:30PM +0200, Peter Eisentraut wrote: >>> I noticed in some of the screenshots that were tweeted that for example in >>> >>> WAL: records=1 bytes=56 >>> >>> there are two spaces between pieces of data. This doesn't match the rest of >>> the EXPLAIN output. Can that be adjusted? >> >> We talked about that here: >> https://www.postgresql.org/message-id/20200402054120.GC14618%40telsasoft.com >> > > Yeah. Just to brief here, the main reason was that one of the fields > (full page writes) already had a single space and then we had prior > cases as mentioned in Justin's email [1] where we use two spaces which > lead us to decide using two spaces in this case. We also have existing cases for the other way: actual time=0.050..0.052 Buffers: shared hit=3 dirtied=1 The cases mentioned by Justin are not formatted in a key=value format, so it's not quite the same, but it also raises the question why they are not. Let's figure out a way to consolidate this without making up a third format. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Masahiko Sawada
On Tue, 7 Apr 2020 at 18:29, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 7 Apr 2020 at 17:42, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Apr 7, 2020 at 1:30 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > Buffer usage statistics seem correct. The small differences would be > > > catalog lookups Peter mentioned. > > > > > > > Agreed, but can you check which part of code does that lookup? I want > > to see if we can avoid that from buffer usage stats or at least write > > a comment about it, otherwise, we might have to face this question > > again and again. > > Okay, I'll check it. > I've checked the buffer usage differences when parallel btree index creation. TL;DR; During tuple sorting individual parallel workers read blocks of pg_amproc and pg_amproc_fam_proc_index to get the sort support function. The call flow is like: ParallelWorkerMain() _bt_parallel_scan_and_sort() tuplesort_begin_index_btree() PrepareSortSupportFromIndexRel() FinishSortSupportFunction() get_opfamily_proc() The details are as follows. I populated the test table by the following scripts: create table test (c int) with (autovacuum_enabled = off, parallel_workers = 8); insert into test select generate_series(1,10000000); and create index DDL is: create index test_idx on test (c); Before executing the test script, I've put code at the following 4 places which checks the buffer usage at that point, and calculated the difference between points: (a), (b) and (c). For example, (b) shows the number of blocks read or hit during executing scanning heap and building index. 1. Before executing CREATE INDEX command (at pgss_ProcessUtility()) (a) 2. Before parallel create index (at _bt_begin_parallel()) (b) 3. After parallel create index, after accumlating workers stats (at _bt_end_parallel()) (c) 4. After executing CREATE INDEX command (at pgss_ProcessUtility()) And here is the results: 2 workers: (a) hit: 107, read: 26 (b) hit: 12(=6+3+3), read: 44248(=15538+14453+14527) (c) hit: 13, read: 2 total hit: 132, read:44276 4 workers: (a) hit: 107, read: 26 (b) hit: 18(=6+3+3+3+3), read: 44248(=9368+8582+8544+9250+8504) (c) hit: 13, read: 2 total hit: 138, read:44276 The table 'test' has 44276 blocks. From the above results, the total number of reading blocks (44248 blocks) during parallel index creation is stable and equals to the number of blocks of the test table. And we can see that extra three blocks are read per workers. These three blocks are two for pg_amproc_fam_proc_index and one for pg_amproc. That is, individual parallel workers accesses these relations to get the sort support function. 
The full backtrace is: * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP * frame #0: 0x00007fff779c561a libsystem_kernel.dylib`__select + 10 frame #1: 0x000000010cc9f90d postgres`pg_usleep(microsec=20000000) at pgsleep.c:56:10 frame #2: 0x000000010ca5a668 postgres`ReadBuffer_common(smgr=0x00007fe872848f70, relpersistence='p', forkNum=MAIN_FORKNUM, blockNum=3, mode=RBM_NORMAL, strategy=0x0000000000000000, hit=0x00007ffee363071b) at bufmgr.c:685:3 frame #3: 0x000000010ca5a4b6 postgres`ReadBufferExtended(reln=0x000000010d58f790, forkNum=MAIN_FORKNUM, blockNum=3, mode=RBM_NORMAL, strategy=0x0000000000000000) at bufmgr.c:628:8 frame #4: 0x000000010ca5a397 postgres`ReadBuffer(reln=0x000000010d58f790, blockNum=3) at bufmgr.c:560:9 frame #5: 0x000000010c67187e postgres`_bt_getbuf(rel=0x000000010d58f790, blkno=3, access=1) at nbtpage.c:792:9 frame #6: 0x000000010c670507 postgres`_bt_getroot(rel=0x000000010d58f790, access=1) at nbtpage.c:294:13 frame #7: 0x000000010c679393 postgres`_bt_search(rel=0x000000010d58f790, key=0x00007ffee36312d0, bufP=0x00007ffee3631bec, access=1, snapshot=0x00007fe8728388e0) at nbtsearch.c:107:10 frame #8: 0x000000010c67b489 postgres`_bt_first(scan=0x00007fe86f814998, dir=ForwardScanDirection) at nbtsearch.c:1355:10 frame #9: 0x000000010c676869 postgres`btgettuple(scan=0x00007fe86f814998, dir=ForwardScanDirection) at nbtree.c:253:10 frame #10: 0x000000010c6656ad postgres`index_getnext_tid(scan=0x00007fe86f814998, direction=ForwardScanDirection) at indexam.c:530:10 frame #11: 0x000000010c66585b postgres`index_getnext_slot(scan=0x00007fe86f814998, direction=ForwardScanDirection, slot=0x00007fe86f814880) at indexam.c:622:10 frame #12: 0x000000010c663eac postgres`systable_getnext(sysscan=0x00007fe86f814828) at genam.c:454:7 frame #13: 0x000000010cc0be41 postgres`SearchCatCacheMiss(cache=0x00007fe872818e80, nkeys=4, hashValue=3052139574, hashIndex=6, v1=1976, v2=23, v3=23, v4=2) at catcache.c:1368:9 frame #14: 0x000000010cc0bced postgres`SearchCatCacheInternal(cache=0x00007fe872818e80, nkeys=4, v1=1976, v2=23, v3=23, v4=2) at catcache.c:1299:9 frame #15: 0x000000010cc0baa8 postgres`SearchCatCache4(cache=0x00007fe872818e80, v1=1976, v2=23, v3=23, v4=2) at catcache.c:1191:9 frame #16: 0x000000010cc27c82 postgres`SearchSysCache4(cacheId=5, key1=1976, key2=23, key3=23, key4=2) at syscache.c:1156:9 frame #17: 0x000000010cc105dd postgres`get_opfamily_proc(opfamily=1976, lefttype=23, righttype=23, procnum=2) at lsyscache.c:751:7 frame #18: 0x000000010cc72e1d postgres`FinishSortSupportFunction(opfamily=1976, opcintype=23, ssup=0x00007fe86f8147d0) at sortsupport.c:99:24 frame #19: 0x000000010cc73100 postgres`PrepareSortSupportFromIndexRel(indexRel=0x000000010d5ced48, strategy=1, ssup=0x00007fe86f8147d0) at sortsupport.c:176:2 frame #20: 0x000000010cc75463 postgres`tuplesort_begin_index_btree(heapRel=0x000000010d5cf808, indexRel=0x000000010d5ced48, enforceUnique=false, workMem=21845, coordinate=0x00007fe872839248, randomAccess=false) at tuplesort.c:1114:3 frame #21: 0x000000010c681ffc postgres`_bt_parallel_scan_and_sort(btspool=0x00007fe872839738, btspool2=0x0000000000000000, btshared=0x000000010d56c4c0, sharedsort=0x000000010d56c460, sharedsort2=0x0000000000000000, sortmem=21845, progress=false) at nbtsort.c:1941:23 frame #22: 0x000000010c681eb2 postgres`_bt_parallel_build_main(seg=0x00007fe87280a058, toc=0x000000010d56c000) at nbtsort.c:1889:2 frame #23: 0x000000010c6b7358 postgres`ParallelWorkerMain(main_arg=1169089032) at parallel.c:1471:2 frame #24: 
0x000000010c9da86f postgres`StartBackgroundWorker at bgworker.c:813:2 frame #25: 0x000000010c9efbc0 postgres`do_start_bgworker(rw=0x00007fe86f419290) at postmaster.c:5852:4 frame #26: 0x000000010c9eff9f postgres`maybe_start_bgworkers at postmaster.c:6078:9 frame #27: 0x000000010c9eee99 postgres`sigusr1_handler(postgres_signal_arg=30) at postmaster.c:5247:3 frame #28: 0x00007fff77a74b5d libsystem_platform.dylib`_sigtramp + 29 frame #29: 0x00007fff779c561b libsystem_kernel.dylib`__select + 11 frame #30: 0x000000010c9ea48c postgres`ServerLoop at postmaster.c:1691:13 frame #31: 0x000000010c9e9e06 postgres`PostmasterMain(argc=5, argv=0x00007fe86f4036f0) at postmaster.c:1400:11 frame #32: 0x000000010c8ee399 postgres`main(argc=<unavailable>, argv=<unavailable>) at main.c:210:3 frame #33: 0x00007fff778893d5 libdyld.dylib`start + 1 Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
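The per-worker lookup boils down to fetching the btree sort support function for the index column's operator family. As a rough SQL equivalent of what get_opfamily_proc() retrieves in this case (opfamily 1976 is the btree integer_ops family and support procedure 2 is the sort support function), and not the code path itself:

SELECT p.amproc
FROM pg_amproc p
JOIN pg_opfamily f ON f.oid = p.amprocfamily
JOIN pg_am am ON am.oid = f.opfmethod
WHERE am.amname = 'btree'
  AND f.opfname = 'integer_ops'
  AND p.amproclefttype = 'int4'::regtype
  AND p.amprocrighttype = 'int4'::regtype
  AND p.amprocnum = 2;  -- btree sort support procedure

Each worker performs this syscache lookup independently, so with cold caches it reads a couple of pg_amproc and pg_amproc_fam_proc_index blocks on its own, which matches the extra three blocks per worker observed above.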
On Tue, Apr 7, 2020 at 12:00 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2020-04-07 04:12, Amit Kapila wrote: > > On Mon, Apr 6, 2020 at 10:01 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > >> > >> On Mon, Apr 06, 2020 at 05:01:30PM +0200, Peter Eisentraut wrote: > >>> I noticed in some of the screenshots that were tweeted that for example in > >>> > >>> WAL: records=1 bytes=56 > >>> > >>> there are two spaces between pieces of data. This doesn't match the rest of > >>> the EXPLAIN output. Can that be adjusted? > >> > >> We talked about that here: > >> https://www.postgresql.org/message-id/20200402054120.GC14618%40telsasoft.com > >> > > > > Yeah. Just to brief here, the main reason was that one of the fields > > (full page writes) already had a single space and then we had prior > > cases as mentioned in Justin's email [1] where we use two spaces which > > lead us to decide using two spaces in this case. > > We also have existing cases for the other way: > > actual time=0.050..0.052 > Buffers: shared hit=3 dirtied=1 > > The cases mentioned by Justin are not formatted in a key=value format, > so it's not quite the same, but it also raises the question why they are > not. > > Let's figure out a way to consolidate this without making up a third format. The parsability problem Justin was mentioning is only due to "full page writes", so we could use "full_page_writes" or "fpw" instead and remove the extra spaces. There would be a small discrepancy with the verbose autovacuum log, but there are others differences already. I'd slightly in favor of "fpw" to be more concise. Would that be ok?
On Tue, Apr 07, 2020 at 12:00:29PM +0200, Peter Eisentraut wrote: > We also have existing cases for the other way: > > actual time=0.050..0.052 > Buffers: shared hit=3 dirtied=1 > > The cases mentioned by Justin are not formatted in a key=value format, so > it's not quite the same, but it also raises the question why they are not. > > Let's figure out a way to consolidate this without making up a third format. So this re-raises my suggestion here to use colons, Title Case Field Names, and "Size: ..kB" rather than "bytes=": |https://www.postgresql.org/message-id/20200403054451.GN14618%40telsasoft.com As I see it, the sort/hashjoin style is being used for cases with fields with different units: Sort Method: quicksort Memory: 931kB Buckets: 1024 Batches: 1 Memory Usage: 16kB ..which is distinguished from the case where the units are the same, like buffers (hit=Npages read=Npages dirtied=Npages written=Npages). Note, as of 1f39bce021, we have hashagg_disk, which looks like this: template1=# explain analyze SELECT a, COUNT(1) FROM generate_series(1,99999) a GROUP BY 1 ORDER BY 1; ... -> HashAggregate (cost=1499.99..1501.99 rows=200 width=12) (actual time=166.883..280.943 rows=99999 loops=1) Group Key: a Peak Memory Usage: 4913 kB Disk Usage: 1848 kB HashAgg Batches: 8 Incremental sort adds yet another variation, which I've mentioned that thread. I'm hoping to come to some resolution here, first. https://www.postgresql.org/message-id/20200407042521.GH2228%40telsasoft.com -- Justin
On Tue, Apr 7, 2020 at 3:30 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2020-04-07 04:12, Amit Kapila wrote: > > On Mon, Apr 6, 2020 at 10:01 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > >> > >> On Mon, Apr 06, 2020 at 05:01:30PM +0200, Peter Eisentraut wrote: > >>> I noticed in some of the screenshots that were tweeted that for example in > >>> > >>> WAL: records=1 bytes=56 > >>> > >>> there are two spaces between pieces of data. This doesn't match the rest of > >>> the EXPLAIN output. Can that be adjusted? > >> > >> We talked about that here: > >> https://www.postgresql.org/message-id/20200402054120.GC14618%40telsasoft.com > >> > > > > Yeah. Just to brief here, the main reason was that one of the fields > > (full page writes) already had a single space and then we had prior > > cases as mentioned in Justin's email [1] where we use two spaces which > > lead us to decide using two spaces in this case. > > We also have existing cases for the other way: > > actual time=0.050..0.052 > Buffers: shared hit=3 dirtied=1 > Buffers case is not the same because 'shared' is used for 'hit', 'read', 'dirtied', etc. However, I think it is arguable. > The cases mentioned by Justin are not formatted in a key=value format, > so it's not quite the same, but it also raises the question why they are > not. > > Let's figure out a way to consolidate this without making up a third format. > Sure, I think my intention is to keep the format of WAL stats as close to Buffers stats as possible because both depict I/O and users would probably be interested to check/read both together. There is a point to keep things in a format so that it is easier for someone to parse but I guess as these as fixed 'words', it shouldn't be difficult either way and we should give more weightage to consistency. Any suggestions? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Amit Kapila
On Tue, Apr 7, 2020 at 5:17 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Tue, 7 Apr 2020 at 18:29, Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Tue, 7 Apr 2020 at 17:42, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Tue, Apr 7, 2020 at 1:30 PM Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > Buffer usage statistics seem correct. The small differences would be > > > > catalog lookups Peter mentioned. > > > > > > > > > > Agreed, but can you check which part of code does that lookup? I want > > > to see if we can avoid that from buffer usage stats or at least write > > > a comment about it, otherwise, we might have to face this question > > > again and again. > > > > Okay, I'll check it. > > > > I've checked the buffer usage differences when parallel btree index creation. > > TL;DR; > > During tuple sorting individual parallel workers read blocks of > pg_amproc and pg_amproc_fam_proc_index to get the sort support > function. The call flow is like: > > ParallelWorkerMain() > _bt_parallel_scan_and_sort() > tuplesort_begin_index_btree() > PrepareSortSupportFromIndexRel() > FinishSortSupportFunction() > get_opfamily_proc() > Thanks for the investigation. I don't see we can do anything special about this. In an ideal world, this should be done once and not for each worker but I guess it doesn't matter too much. I am not sure if it is worth adding a comment for this, what do you think? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Masahiko Sawada
On Wed, 8 Apr 2020 at 14:44, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Apr 7, 2020 at 5:17 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Tue, 7 Apr 2020 at 18:29, Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Tue, 7 Apr 2020 at 17:42, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Tue, Apr 7, 2020 at 1:30 PM Masahiko Sawada > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > Buffer usage statistics seem correct. The small differences would be > > > > > catalog lookups Peter mentioned. > > > > > > > > > > > > > Agreed, but can you check which part of code does that lookup? I want > > > > to see if we can avoid that from buffer usage stats or at least write > > > > a comment about it, otherwise, we might have to face this question > > > > again and again. > > > > > > Okay, I'll check it. > > > > > > > I've checked the buffer usage differences when parallel btree index creation. > > > > TL;DR; > > > > During tuple sorting individual parallel workers read blocks of > > pg_amproc and pg_amproc_fam_proc_index to get the sort support > > function. The call flow is like: > > > > ParallelWorkerMain() > > _bt_parallel_scan_and_sort() > > tuplesort_begin_index_btree() > > PrepareSortSupportFromIndexRel() > > FinishSortSupportFunction() > > get_opfamily_proc() > > > > Thanks for the investigation. I don't see we can do anything special > about this. In an ideal world, this should be done once and not for > each worker but I guess it doesn't matter too much. I am not sure if > it is worth adding a comment for this, what do you think? > I agree with you. If the differences were considerably large probably we would do something but I think we don't need to anything at this time. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Amit Kapila
On Wed, Apr 8, 2020 at 11:53 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 8 Apr 2020 at 14:44, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Thanks for the investigation. I don't see we can do anything special > > about this. In an ideal world, this should be done once and not for > > each worker but I guess it doesn't matter too much. I am not sure if > > it is worth adding a comment for this, what do you think? > > > > I agree with you. If the differences were considerably large probably > we would do something but I think we don't need to anything at this > time. > Fair enough, can you once check this in back-branches as this needs to be backpatched? I will do that once by myself as well. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Julien Rouhaud
On Wed, Apr 8, 2020 at 8:23 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 8 Apr 2020 at 14:44, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Apr 7, 2020 at 5:17 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Tue, 7 Apr 2020 at 18:29, Masahiko Sawada > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > On Tue, 7 Apr 2020 at 17:42, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > On Tue, Apr 7, 2020 at 1:30 PM Masahiko Sawada > > > > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > > > > > > > Buffer usage statistics seem correct. The small differences would be > > > > > > catalog lookups Peter mentioned. > > > > > > > > > > > > > > > > Agreed, but can you check which part of code does that lookup? I want > > > > > to see if we can avoid that from buffer usage stats or at least write > > > > > a comment about it, otherwise, we might have to face this question > > > > > again and again. > > > > > > > > Okay, I'll check it. > > > > > > > > > > I've checked the buffer usage differences when parallel btree index creation. > > > > > > TL;DR; > > > > > > During tuple sorting individual parallel workers read blocks of > > > pg_amproc and pg_amproc_fam_proc_index to get the sort support > > > function. The call flow is like: > > > > > > ParallelWorkerMain() > > > _bt_parallel_scan_and_sort() > > > tuplesort_begin_index_btree() > > > PrepareSortSupportFromIndexRel() > > > FinishSortSupportFunction() > > > get_opfamily_proc() > > > > > > > Thanks for the investigation. I don't see we can do anything special > > about this. In an ideal world, this should be done once and not for > > each worker but I guess it doesn't matter too much. I am not sure if > > it is worth adding a comment for this, what do you think? > > > > I agree with you. If the differences were considerably large probably > we would do something but I think we don't need to anything at this > time. +1
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Masahiko Sawada
On Wed, 8 Apr 2020 at 16:04, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Apr 8, 2020 at 11:53 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Wed, 8 Apr 2020 at 14:44, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > Thanks for the investigation. I don't see we can do anything special
> > > about this. In an ideal world, this should be done once and not for
> > > each worker but I guess it doesn't matter too much. I am not sure if
> > > it is worth adding a comment for this, what do you think?
> > >
> >
> > I agree with you. If the differences were considerably large probably
> > we would do something but I think we don't need to anything at this
> > time.
> >
>
> Fair enough, can you once check this in back-branches as this needs to
> be backpatched? I will do that once by myself as well.

I've done the same test with HEAD of both REL_12_STABLE and
REL_11_STABLE. I think the patch needs to be backpatched to PG11 where
parallel index creation was introduced. I've attached the patches for
PG12 and PG11 I used for this test for reference.

Here are the results:

* PG12

With no worker:
-[ RECORD 1 ]-------+-------------
shared_blks_hit     | 119
shared_blks_read    | 44283
total_read_blks     | 44402
shared_blks_dirtied | 44262
shared_blks_written | 24925

With 4 workers:
-[ RECORD 1 ]-------+------------
shared_blks_hit     | 128
shared_blks_read    | 8844
total_read_blks     | 8972
shared_blks_dirtied | 8822
shared_blks_written | 5393

With 4 workers after patching:
-[ RECORD 1 ]-------+------------
shared_blks_hit     | 140
shared_blks_read    | 44284
total_read_blks     | 44424
shared_blks_dirtied | 44262
shared_blks_written | 26574

* PG11

With no worker:
-[ RECORD 1 ]-------+------------
shared_blks_hit     | 124
shared_blks_read    | 44284
total_read_blks     | 44408
shared_blks_dirtied | 44263
shared_blks_written | 24908

With 4 workers:
-[ RECORD 1 ]-------+-------------
shared_blks_hit     | 132
shared_blks_read    | 8910
total_read_blks     | 9042
shared_blks_dirtied | 8888
shared_blks_written | 5370

With 4 workers after patching:
-[ RECORD 1 ]-------+-------------
shared_blks_hit     | 144
shared_blks_read    | 44285
total_read_blks     | 44429
shared_blks_dirtied | 44263
shared_blks_written | 26861

Regards,

--
Masahiko Sawada
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
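The figures above are internally consistent (shared_blks_hit + shared_blks_read = total_read_blks in every record), so they were presumably gathered with something like the query below under psql's expanded display (\x). This is a guess at the query, not content taken from the attachment.

```sql
-- Hypothetical reconstruction of the query behind the numbers above.
SELECT shared_blks_hit,
       shared_blks_read,
       shared_blks_hit + shared_blks_read AS total_read_blks,
       shared_blks_dirtied,
       shared_blks_written
FROM pg_stat_statements
WHERE query LIKE 'CREATE INDEX%';
```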
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Amit Kapila
On Wed, Apr 8, 2020 at 1:49 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Wed, 8 Apr 2020 at 16:04, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Apr 8, 2020 at 11:53 AM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Wed, 8 Apr 2020 at 14:44, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > > > Thanks for the investigation. I don't see we can do anything special > > > > about this. In an ideal world, this should be done once and not for > > > > each worker but I guess it doesn't matter too much. I am not sure if > > > > it is worth adding a comment for this, what do you think? > > > > > > > > > > I agree with you. If the differences were considerably large probably > > > we would do something but I think we don't need to anything at this > > > time. > > > > > > > Fair enough, can you once check this in back-branches as this needs to > > be backpatched? I will do that once by myself as well. > > I've done the same test with HEAD of both REL_12_STABLE and > REL_11_STABLE. I think the patch needs to be backpatched to PG11 where > parallel index creation was introduced. I've attached the patches > for PG12 and PG11 I used for this test for reference. > Thanks, I will once again verify and push this tomorrow if there are no other comments. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 7, 2020 at 2:48 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Tue, Apr 7, 2020 at 4:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Apr 6, 2020 at 7:58 PM Euler Taveira > > <euler.taveira@2ndquadrant.com> wrote: > > > > > > On Mon, 6 Apr 2020 at 10:37, Julien Rouhaud <rjuju123@gmail.com> wrote: > > >> > > >> On Mon, Apr 06, 2020 at 10:12:55AM -0300, Euler Taveira wrote: > > >> > On Mon, 6 Apr 2020 at 00:25, Amit Kapila <amit.kapila16@gmail.com> wrote: > > >> > > > >> > > > > >> > > I have pushed pg_stat_statements and Explain related patches. I am > > >> > > now looking into (auto)vacuum patch and have few comments. > > >> > > > > >> > > I wasn't paying much attention to this thread. May I suggest changing > > >> > wal_num_fpw to wal_fpw? wal_records and wal_bytes does not have a prefix > > >> > 'num'. It seems inconsistent to me. > > >> > > > >> > > >> If we want to be consistent shouldn't we rename it to wal_fpws? FTR I don't > > >> like much either version. > > > > > > > > > Since FPW is an acronym, plural form reads better when you are using uppercase (such as FPWs or FPW's); thus, I prefersingular form because parameter names are lowercase. Function description will clarify that this is "number of WALfull page writes". > > > > > > > I like Euler's suggestion to change wal_num_fpw to wal_fpw. It is > > better if others who didn't like this name can also share their > > opinion now because changing multiple times the same thing is not a > > good idea. > > +1 > > About Justin and your comments on the other thread: > > On Tue, Apr 7, 2020 at 4:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Apr 6, 2020 at 10:04 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > On Thu, Apr 02, 2020 at 08:29:31AM +0200, Julien Rouhaud wrote: > > > > > > "full page records" seems to be showing the number of full page > > > > > > images, not the record having full page images. > > > > > > > > > > I am not sure what exactly is a difference but it is the records > > > > > having full page images. Julien correct me if I am wrong. > > > > > > > Obviously previous complaints about the meaning and parsability of > > > > "full page writes" should be addressed here for consistency. > > > > > > There's a couple places that say "full page image records" which I think is > > > language you were trying to avoid. It's the number of pages, not the number of > > > records, no ? I see explain and autovacuum say what I think is wanted, but > > > these say the wrong thing? Find attached slightly larger patch. > > > > > > $ git grep 'image record' > > > contrib/pg_stat_statements/pg_stat_statements.c: int64 wal_num_fpw; /* # of WAL full page imagerecords generated */ > > > doc/src/sgml/ref/explain.sgml: number of records, number of full page image records and amount of WAL > > > > > > > Few comments: > > 1. > > - int64 wal_num_fpw; /* # of WAL full page image records generated */ > > + int64 wal_num_fpw; /* # of WAL full page images generated */ > > > > Let's change comment as " /* # of WAL full page writes generated */" > > to be consistent with other places like instrument.h. Also, make a > > similar change at other places if required. > > Agreed. That's pg_stat_statements.c and instrument.h. I'll send a > patch once we reach consensus with the rest of the comments. > Would you like to send a consolidated patch that includes Euler's suggestion and Justin's patch (by making changes for points we discussed.)? 
I think we can keep the point related to number of spaces before each field open? > > 2. > > <entry> > > - Total amount of WAL bytes generated by the statement > > + Total number of WAL bytes generated by the statement > > </entry> > > > > I feel the previous text was better as this field can give us the size > > of WAL with which we can answer "how much WAL data is generated by a > > particular statement?". Julien, do you have any thoughts on this? > > I also prefer "amount" as it feels more natural. > As we see no other opinion on this matter, we can use "amount" here. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Fri, Apr 10, 2020 at 8:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Apr 7, 2020 at 2:48 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > On Tue, Apr 7, 2020 at 4:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Mon, Apr 6, 2020 at 7:58 PM Euler Taveira > > > <euler.taveira@2ndquadrant.com> wrote: > > > Few comments: > > > 1. > > > - int64 wal_num_fpw; /* # of WAL full page image records generated */ > > > + int64 wal_num_fpw; /* # of WAL full page images generated */ > > > > > > Let's change comment as " /* # of WAL full page writes generated */" > > > to be consistent with other places like instrument.h. Also, make a > > > similar change at other places if required. > > > > Agreed. That's pg_stat_statements.c and instrument.h. I'll send a > > patch once we reach consensus with the rest of the comments. > > > > Would you like to send a consolidated patch that includes Euler's > suggestion and Justin's patch (by making changes for points we > discussed.)? I think we can keep the point related to number of > spaces before each field open? Sure, I'll take care of that tomorrow! > > > 2. > > > <entry> > > > - Total amount of WAL bytes generated by the statement > > > + Total number of WAL bytes generated by the statement > > > </entry> > > > > > > I feel the previous text was better as this field can give us the size > > > of WAL with which we can answer "how much WAL data is generated by a > > > particular statement?". Julien, do you have any thoughts on this? > > > > I also prefer "amount" as it feels more natural. > > > > As we see no other opinion on this matter, we can use "amount" here. Ok.
On Fri, Apr 10, 2020 at 9:37 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Fri, Apr 10, 2020 at 8:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Would you like to send a consolidated patch that includes Euler's
> > suggestion and Justin's patch (by making changes for points we
> > discussed.)? I think we can keep the point related to number of
> > spaces before each field open?
>
> Sure, I'll take care of that tomorrow!

I tried to take into account all that has been discussed, but I have
to admit that I'm absolutely not sure what was actually decided here.
I went with those changes:

- rename wal_num_fpw to wal_fpw for consistency, both in the pgss view
  field name and everywhere in the code
- change comments to consistently mention "full page writes generated"
- changed pgss and explain documentation to mention "full page images
  generated", from Justin's patch on another thread
- kept "amount" of WAL bytes
- no change to the explain output as I have no idea what the consensus
  is (one or two spaces, use semicolon or equal, show unit or not)
Attachment
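With the naming agreed at this point (wal_fpw, which is later in this thread renamed to wal_fpi), the new counters would be read from pg_stat_statements along these lines. This is an illustrative query, not part of the attached patch.

```sql
-- Illustrative only: column names as proposed at this point in the thread
-- (wal_fpw was later renamed to wal_fpi).
SELECT query, calls, wal_records, wal_fpw, wal_bytes
FROM pg_stat_statements
ORDER BY wal_bytes DESC
LIMIT 5;
```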
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Justin Pryzby
On Sat, Mar 28, 2020 at 04:17:21PM +0100, Julien Rouhaud wrote: > On Sat, Mar 28, 2020 at 02:38:27PM +0100, Julien Rouhaud wrote: > > On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote: > > > > > > I see some basic problems with the patch. The way it tries to compute > > > WAL usage for parallel stuff doesn't seem right to me. Can you share > > > or point me to any test done where we have computed WAL for parallel > > > operations like Parallel Vacuum or Parallel Create Index? > > > > Ah, that's indeed a good point and AFAICT WAL records from parallel utility > > workers won't be accounted for. That being said, I think that an argument > > could be made that proper infrastructure should have been added in the original > > parallel utility patches, as pg_stat_statement is already broken wrt. buffer > > usage in parallel utility, unless I'm missing something. > > Just to be sure I did a quick test with pg_stat_statements behavior using > parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer usage > doesn't reflect parallel workers' activity. > > I added an open for that, and adding Robert in Cc as 9da0cc352 is the first > commit adding parallel maintenance. I believe this is resolved for parallel vacuum in master and parallel create index back to PG11. I marked this as closed. https://wiki.postgresql.org/index.php?title=PostgreSQL_13_Open_Items&diff=34802&oldid=34781 -- Justin
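The fix referenced above follows the usual leader/worker accounting pattern for parallel maintenance: each worker publishes its counters into a shared-memory slot, and the leader folds those slots into its own totals once the workers finish. The sketch below is a simplified, self-contained illustration of that pattern, not the actual PostgreSQL code (which uses its BufferUsage/WalUsage structs and DSM machinery).

```c
/* Simplified stand-in for the per-backend usage counters. */
typedef struct UsageCounters
{
	long		blks_hit;
	long		blks_read;
	long		wal_records;
	long		wal_bytes;
} UsageCounters;

/* Worker side: publish the local counters into this worker's shared slot. */
static void
worker_report_usage(UsageCounters *shared_slot, const UsageCounters *local)
{
	*shared_slot = *local;		/* the slot is private to this worker */
}

/* Leader side: fold every worker's slot into the leader's own totals. */
static void
leader_accumulate_usage(UsageCounters *total,
						const UsageCounters *slots, int nworkers)
{
	for (int i = 0; i < nworkers; i++)
	{
		total->blks_hit += slots[i].blks_hit;
		total->blks_read += slots[i].blks_read;
		total->wal_records += slots[i].wal_records;
		total->wal_bytes += slots[i].wal_bytes;
	}
}
```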
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Julien Rouhaud
On Sun, 12 Apr 2020 at 00:33, Justin Pryzby <pryzby@telsasoft.com> wrote:
On Sat, Mar 28, 2020 at 04:17:21PM +0100, Julien Rouhaud wrote:
>
> Just to be sure I did a quick test with pg_stat_statements behavior using
> parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer usage
> doesn't reflect parallel workers' activity.
>
> I added an open for that, and adding Robert in Cc as 9da0cc352 is the first
> commit adding parallel maintenance.
I believe this is resolved for parallel vacuum in master and parallel create
index back to PG11.
indeed, I was about to take care of this too
I marked this as closed.
https://wiki.postgresql.org/index.php?title=PostgreSQL_13_Open_Items&diff=34802&oldid=34781
thanks a lot!
Re: pg_stat_statements issue with parallel maintenance (Was Re: WAL usage calculation patch)
From: Amit Kapila
On Sun, Apr 12, 2020 at 4:03 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Sat, Mar 28, 2020 at 04:17:21PM +0100, Julien Rouhaud wrote: > > On Sat, Mar 28, 2020 at 02:38:27PM +0100, Julien Rouhaud wrote: > > > On Sat, Mar 28, 2020 at 04:14:04PM +0530, Amit Kapila wrote: > > > > > > > > I see some basic problems with the patch. The way it tries to compute > > > > WAL usage for parallel stuff doesn't seem right to me. Can you share > > > > or point me to any test done where we have computed WAL for parallel > > > > operations like Parallel Vacuum or Parallel Create Index? > > > > > > Ah, that's indeed a good point and AFAICT WAL records from parallel utility > > > workers won't be accounted for. That being said, I think that an argument > > > could be made that proper infrastructure should have been added in the original > > > parallel utility patches, as pg_stat_statement is already broken wrt. buffer > > > usage in parallel utility, unless I'm missing something. > > > > Just to be sure I did a quick test with pg_stat_statements behavior using > > parallel/non-parallel CREATE INDEX and VACUUM, and unsurprisingly buffer usage > > doesn't reflect parallel workers' activity. > > > > I added an open for that, and adding Robert in Cc as 9da0cc352 is the first > > commit adding parallel maintenance. > > I believe this is resolved for parallel vacuum in master and parallel create > index back to PG11. > > I marked this as closed. > https://wiki.postgresql.org/index.php?title=PostgreSQL_13_Open_Items&diff=34802&oldid=34781 > Okay, thanks. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, Apr 11, 2020 at 6:55 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Fri, Apr 10, 2020 at 9:37 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > On Fri, Apr 10, 2020 at 8:17 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > Would you like to send a consolidated patch that includes Euler's > > > suggestion and Justin's patch (by making changes for points we > > > discussed.)? I think we can keep the point related to number of > > > spaces before each field open? > > > > Sure, I'll take care of that tomorrow! > > I tried to take into account all that have been discussed, but I have > to admit that I'm absolutely not sure of what was actually decided > here. I went with those changes: > > - rename wal_num_fpw to wal_fpw for consistency, both in pgss view > fiel name but also everywhere in the code > - change comments to consistently mention "full page writes generated" > - changed pgss and explain documentation to mention "full page images > generated", from Justin's patch on another thread > I think it is better to use "full page writes" to be consistent with other places. > - kept "amount" of WAL bytes > Okay, but I would like to make another change suggested by Justin which is to replace "count" with "number" at a few places. I have made the above two changes in the attached. Let me know what you think about attached? > - no change to the explain output as I have no idea what is the > consensus (one or two spaces, use semicolon or equal, show unit or > not) > Yeah, let's do this separately once we have consensus. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
On Mon, Apr 13, 2020 at 8:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Apr 11, 2020 at 6:55 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > On Fri, Apr 10, 2020 at 9:37 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > I tried to take into account all that have been discussed, but I have > > to admit that I'm absolutely not sure of what was actually decided > > here. I went with those changes: > > > > - rename wal_num_fpw to wal_fpw for consistency, both in pgss view > > fiel name but also everywhere in the code > > - change comments to consistently mention "full page writes generated" > > - changed pgss and explain documentation to mention "full page images > > generated", from Justin's patch on another thread > > > > I think it is better to use "full page writes" to be consistent with > other places. > > > - kept "amount" of WAL bytes > > > > Okay, but I would like to make another change suggested by Justin > which is to replace "count" with "number" at a few places. Ah sorry I missed this one. +1 it also sounds better. > I have made the above two changes in the attached. Let me know what > you think about attached? It all looks good to me! > > - no change to the explain output as I have no idea what is the > > consensus (one or two spaces, use semicolon or equal, show unit or > > not) > > > > Yeah, let's do this separately once we have consensus. Agreed.
On Mon, Apr 13, 2020 at 1:10 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Mon, Apr 13, 2020 at 8:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sat, Apr 11, 2020 at 6:55 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > On Fri, Apr 10, 2020 at 9:37 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > > I tried to take into account all that have been discussed, but I have > > > to admit that I'm absolutely not sure of what was actually decided > > > here. I went with those changes: > > > > > > - rename wal_num_fpw to wal_fpw for consistency, both in pgss view > > > fiel name but also everywhere in the code > > > - change comments to consistently mention "full page writes generated" > > > - changed pgss and explain documentation to mention "full page images > > > generated", from Justin's patch on another thread > > > > > > > I think it is better to use "full page writes" to be consistent with > > other places. > > > > > - kept "amount" of WAL bytes > > > > > > > Okay, but I would like to make another change suggested by Justin > > which is to replace "count" with "number" at a few places. > > Ah sorry I missed this one. +1 it also sounds better. > > > I have made the above two changes in the attached. Let me know what > > you think about attached? > > It all looks good to me! > Pushed. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, 13 Apr 2020 at 13:47, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 13, 2020 at 1:10 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
>
> On Mon, Apr 13, 2020 at 8:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sat, Apr 11, 2020 at 6:55 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > >
> > > On Fri, Apr 10, 2020 at 9:37 PM Julien Rouhaud <rjuju123@gmail.com> wrote:
> > > >
> > > I tried to take into account all that have been discussed, but I have
> > > to admit that I'm absolutely not sure of what was actually decided
> > > here. I went with those changes:
> > >
> > > - rename wal_num_fpw to wal_fpw for consistency, both in pgss view
> > > fiel name but also everywhere in the code
> > > - change comments to consistently mention "full page writes generated"
> > > - changed pgss and explain documentation to mention "full page images
> > > generated", from Justin's patch on another thread
> > >
> >
> > I think it is better to use "full page writes" to be consistent with
> > other places.
> >
> > > - kept "amount" of WAL bytes
> > >
> >
> > Okay, but I would like to make another change suggested by Justin
> > which is to replace "count" with "number" at a few places.
>
> Ah sorry I missed this one. +1 it also sounds better.
>
> > I have made the above two changes in the attached. Let me know what
> > you think about attached?
>
> It all looks good to me!
>
Pushed.
Thanks a lot Amit!
On Wed, Apr 8, 2020 at 8:36 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Apr 7, 2020 at 3:30 PM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > > > > > We also have existing cases for the other way: > > > > actual time=0.050..0.052 > > Buffers: shared hit=3 dirtied=1 > > > > Buffers case is not the same because 'shared' is used for 'hit', > 'read', 'dirtied', etc. However, I think it is arguable. > > > The cases mentioned by Justin are not formatted in a key=value format, > > so it's not quite the same, but it also raises the question why they are > > not. > > > > Let's figure out a way to consolidate this without making up a third format. > > > > Sure, I think my intention is to keep the format of WAL stats as close > to Buffers stats as possible because both depict I/O and users would > probably be interested to check/read both together. There is a point > to keep things in a format so that it is easier for someone to parse > but I guess as these as fixed 'words', it shouldn't be difficult > either way and we should give more weightage to consistency. Any > suggestions? > Peter E, others, any suggestions on how to move forward? I think here we should follow the rule "follow the style of nearby code" which in this case would be to have one space after each field as we would like it to be closer to the "Buffers" format. It would be good if we have a unified format among all Explain stuff but we might not want to change the existing things and even if we want to do that it might be a broader/bigger change and we should do that as a PG14 change. What do you think? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 2020-04-14 05:57, Amit Kapila wrote:
> Peter E, others, any suggestions on how to move forward? I think here
> we should follow the rule "follow the style of nearby code" which in
> this case would be to have one space after each field as we would like
> it to be closer to the "Buffers" format. It would be good if we have
> a unified format among all Explain stuff but we might not want to
> change the existing things and even if we want to do that it might be
> a broader/bigger change and we should do that as a PG14 change. What
> do you think?

It looks like shortening to fpw= and using one space is the easiest way
to solve this issue.

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Apr 17, 2020 at 6:45 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2020-04-14 05:57, Amit Kapila wrote: > > Peter E, others, any suggestions on how to move forward? I think here > > we should follow the rule "follow the style of nearby code" which in > > this case would be to have one space after each field as we would like > > it to be closer to the "Buffers" format. It would be good if we have > > a unified format among all Explain stuff but we might not want to > > change the existing things and even if we want to do that it might be > > a broader/bigger change and we should do that as a PG14 change. What > > do you think? > > If looks like shortening to fpw= and using one space is the easiest way > to solve this issue. > I am fine with this approach and will change accordingly. I will wait for a few days (3-4 days) to see if someone shows up with either an objection to this or with a better idea for the display of WAL usage information. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sat, Apr 18, 2020 at 6:16 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Apr 17, 2020 at 6:45 PM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > > > On 2020-04-14 05:57, Amit Kapila wrote: > > > Peter E, others, any suggestions on how to move forward? I think here > > > we should follow the rule "follow the style of nearby code" which in > > > this case would be to have one space after each field as we would like > > > it to be closer to the "Buffers" format. It would be good if we have > > > a unified format among all Explain stuff but we might not want to > > > change the existing things and even if we want to do that it might be > > > a broader/bigger change and we should do that as a PG14 change. What > > > do you think? > > > > If looks like shortening to fpw= and using one space is the easiest way > > to solve this issue. > > > > I am fine with this approach and will change accordingly. I will wait > for a few days (3-4 days) to see if someone shows up with either an > objection to this or with a better idea for the display of WAL usage > information. That was also my preferred alternative. PFA a patch for that. I also changed to "fpw" for the non textual output for consistency.
Attachment
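For reference, with the change attached above the text-format line should come out roughly as below. The numbers are invented, and the "fpw" label is what this patch proposes; it is revisited later in the thread.

```
WAL: records=2014 fpw=12 bytes=1246722
```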
On Sat, Apr 18, 2020 at 05:39:35PM +0200, Julien Rouhaud wrote:
> On Sat, Apr 18, 2020 at 6:16 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Apr 17, 2020 at 6:45 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> > > On 2020-04-14 05:57, Amit Kapila wrote:
> > > > Peter E, others, any suggestions on how to move forward? I think here
> > > > we should follow the rule "follow the style of nearby code" which in
> > > > this case would be to have one space after each field as we would like
> > > > it to be closer to the "Buffers" format. It would be good if we have
> > > > a unified format among all Explain stuff but we might not want to
> > > > change the existing things and even if we want to do that it might be
> > > > a broader/bigger change and we should do that as a PG14 change. What
> > > > do you think?
> > >
> > > If looks like shortening to fpw= and using one space is the easiest way
> > > to solve this issue.
> > >
> >
> > I am fine with this approach and will change accordingly. I will wait
> > for a few days (3-4 days) to see if someone shows up with either an
> > objection to this or with a better idea for the display of WAL usage
> > information.
>
> That was also my preferred alternative. PFA a patch for that. I also
> changed to "fpw" for the non textual output for consistency.

Should we capitalize at least the non-text one? And maybe the text one for
consistency?

+		ExplainPropertyInteger("WAL fpw", NULL,

And add the acronym to the docs:

$ git grep 'full page' '*/explain.sgml'
doc/src/sgml/ref/explain.sgml:      number of records, number of full page writes and amount of WAL bytes

"..full page writes (FPW).."

Should we also change vacuumlazy.c for consistency?

+			_("WAL usage: %ld records, %ld full page writes, "
+			  UINT64_FORMAT " bytes"),

--
Justin
Hi Justin, Thanks for the review! On Sat, Apr 18, 2020 at 10:41 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > Should capitalize at least the non-text one ? And maybe the text one for > consistency ? > > + ExplainPropertyInteger("WAL fpw", NULL, I think we should keep both version consistent, whether lower or upper case. The uppercase version is probably more correct, but it's a little bit weird to have it being the only upper case label in all output, so I kept it lower case. > And add the acronym to the docs: > > $ git grep 'full page' '*/explain.sgml' > doc/src/sgml/ref/explain.sgml: number of records, number of full page writes and amount of WAL bytes > > "..full page writes (FPW).." Indeed! Fixed (using lowercase to match current output). > Should we also change vacuumlazy.c for consistency ? > > + _("WAL usage: %ld records, %ld full page writes, " > + UINT64_FORMAT " bytes"), I don't think this one should be changed, vacuumlazy output is already entirely different, and is way more verbose so keeping it as is makes sense to me.
Attachment
At Sun, 19 Apr 2020 16:22:26 +0200, Julien Rouhaud <rjuju123@gmail.com> wrote in
> Hi Justin,
>
> Thanks for the review!
>
> On Sat, Apr 18, 2020 at 10:41 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > Should capitalize at least the non-text one ? And maybe the text one for
> > consistency ?
> >
> > +		ExplainPropertyInteger("WAL fpw", NULL,
>
> I think we should keep both version consistent, whether lower or upper
> case. The uppercase version is probably more correct, but it's a
> little bit weird to have it being the only upper case label in all
> output, so I kept it lower case.

One space followed by an acronym looks perfect. I'd prefer capital
letters, but small letters also work well.

> > And add the acronym to the docs:
> >
> > $ git grep 'full page' '*/explain.sgml'
> > doc/src/sgml/ref/explain.sgml:      number of records, number of full page writes and amount of WAL bytes
> >
> > "..full page writes (FPW).."
>
> Indeed! Fixed (using lowercase to match current output).

I searched through the documentation and AFAICS most of the occurrences
of "full page" are followed by "image", and full_page_writes is used
only as the parameter name.

I'm fine with fpw as the acronym, but "fpw means the number of full
page images" looks odd.

> > Should we also change vacuumlazy.c for consistency ?
> >
> > +			_("WAL usage: %ld records, %ld full page writes, "
> > +			  UINT64_FORMAT " bytes"),
>
> I don't think this one should be changed, vacuumlazy output is already
> entirely different, and is way more verbose so keeping it as is makes
> sense to me.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
On Mon, Apr 20, 2020 at 1:17 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > At Sun, 19 Apr 2020 16:22:26 +0200, Julien Rouhaud <rjuju123@gmail.com> wrote in > > Hi Justin, > > > > Thanks for the review! > > > > On Sat, Apr 18, 2020 at 10:41 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > Should capitalize at least the non-text one ? And maybe the text one for > > > consistency ? > > > > > > + ExplainPropertyInteger("WAL fpw", NULL, > > > > I think we should keep both version consistent, whether lower or upper > > case. The uppercase version is probably more correct, but it's a > > little bit weird to have it being the only upper case label in all > > output, so I kept it lower case. I think we can keep upper-case for all non-text ones in case of WAL usage, something like WAL Records, WAL FPW, WAL Bytes. The buffer usage seems to be following a similar convention. > > One space follwed by an acronym looks perfect. I'd prefer capital > letters but small-letters also works well. > > > > And add the acronym to the docs: > > > > > > $ git grep 'full page' '*/explain.sgml' > > > doc/src/sgml/ref/explain.sgml: number of records, number of full page writes and amount of WAL bytes > > > > > > "..full page writes (FPW).." > > > > Indeed! Fixed (using lowercase to match current output). > > I searched through the documentation and AFAICS most of occurances of > "full page" are follwed by "image" and full_page_writes is used only > as the parameter name. > > I'm fine with fpw as the acronym, but "fpw means the number of full > page images" looks odd.. > I don't understand this. Where are we using such a description of fpw? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Apr 22, 2020 at 09:15:08AM +0530, Amit Kapila wrote: > > > > And add the acronym to the docs: > > > > > > > > $ git grep 'full page' '*/explain.sgml' > > > > doc/src/sgml/ref/explain.sgml: number of records, number of full page writes and amount of WAL bytes > > > > > > > > "..full page writes (FPW).." > > > > > > Indeed! Fixed (using lowercase to match current output). > > > > I searched through the documentation and AFAICS most of occurances of > > "full page" are follwed by "image" and full_page_writes is used only > > as the parameter name. > > > > I'm fine with fpw as the acronym, but "fpw means the number of full > > page images" looks odd.. > > > > I don't understand this. Where are we using such a description of fpw? I suggested to add " (FPW)" to the new docs for "explain(wal)" But, the documentation before this commit mostly refers to "full page images". So the implication is that maybe we should use that language (and FPI acronym). The only pre-existing use of "full page writes" seems to be here: $ git grep -iC2 'full page write' origin doc origin:doc/src/sgml/wal.sgml- Internal data structures such as <filename>pg_xact</filename>, <filename>pg_subtrans</filename>,<filename>pg_multixact</filename>, origin:doc/src/sgml/wal.sgml- <filename>pg_serial</filename>, <filename>pg_notify</filename>, <filename>pg_stat</filename>,<filename>pg_snapshots</filename> are not directly origin:doc/src/sgml/wal.sgml: checksummed, nor are pages protected by full page writes. However, where And we're not using either acronym. -- Justin
On Wed, Apr 22, 2020 at 9:25 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > On Wed, Apr 22, 2020 at 09:15:08AM +0530, Amit Kapila wrote: > > > > > And add the acronym to the docs: > > > > > > > > > > $ git grep 'full page' '*/explain.sgml' > > > > > doc/src/sgml/ref/explain.sgml: number of records, number of full page writes and amount of WAL bytes > > > > > > > > > > "..full page writes (FPW).." > > > > > > > > Indeed! Fixed (using lowercase to match current output). > > > > > > I searched through the documentation and AFAICS most of occurances of > > > "full page" are follwed by "image" and full_page_writes is used only > > > as the parameter name. > > > > > > I'm fine with fpw as the acronym, but "fpw means the number of full > > > page images" looks odd.. > > > > > > > I don't understand this. Where are we using such a description of fpw? > > I suggested to add " (FPW)" to the new docs for "explain(wal)" > But, the documentation before this commit mostly refers to "full page images". > So the implication is that maybe we should use that language (and FPI acronym). > I am not sure if it matters that much. I think we can use "full page writes (FPW)" in this case but we should be consistent wherever we refer it in the WAL usage context and I think we already are, if not then let's be consistent. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, Apr 22, 2020 at 9:15 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Apr 20, 2020 at 1:17 PM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > > > At Sun, 19 Apr 2020 16:22:26 +0200, Julien Rouhaud <rjuju123@gmail.com> wrote in > > > Hi Justin, > > > > > > Thanks for the review! > > > > > > On Sat, Apr 18, 2020 at 10:41 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > > > Should capitalize at least the non-text one ? And maybe the text one for > > > > consistency ? > > > > > > > > + ExplainPropertyInteger("WAL fpw", NULL, > > > > > > I think we should keep both version consistent, whether lower or upper > > > case. The uppercase version is probably more correct, but it's a > > > little bit weird to have it being the only upper case label in all > > > output, so I kept it lower case. > > I think we can keep upper-case for all non-text ones in case of WAL > usage, something like WAL Records, WAL FPW, WAL Bytes. The buffer > usage seems to be following a similar convention. > The attached patch changed the non-text display format as mentioned. Let me know if you have any comments? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
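As a hedged illustration of the non-text labels being settled here, a JSON-format fragment from the attached patch would look something like the following; the values are invented, and "WAL FPW" was later renamed to "WAL FPI".

```
"WAL Records": 2014,
"WAL FPW": 12,
"WAL Bytes": 1246722
```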
On Wed, Apr 22, 2020 at 2:27 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Apr 22, 2020 at 9:25 AM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > On Wed, Apr 22, 2020 at 09:15:08AM +0530, Amit Kapila wrote: > > > > > > And add the acronym to the docs: > > > > > > > > > > > > $ git grep 'full page' '*/explain.sgml' > > > > > > doc/src/sgml/ref/explain.sgml: number of records, number of full page writes and amount of WAL bytes > > > > > > > > > > > > "..full page writes (FPW).." > > > > > > > > > > Indeed! Fixed (using lowercase to match current output). > > > > > > > > I searched through the documentation and AFAICS most of occurances of > > > > "full page" are follwed by "image" and full_page_writes is used only > > > > as the parameter name. > > > > > > > > I'm fine with fpw as the acronym, but "fpw means the number of full > > > > page images" looks odd.. > > > > > > > > > > I don't understand this. Where are we using such a description of fpw? > > > > I suggested to add " (FPW)" to the new docs for "explain(wal)" > > But, the documentation before this commit mostly refers to "full page images". > > So the implication is that maybe we should use that language (and FPI acronym). > > > > I am not sure if it matters that much. I think we can use "full page > writes (FPW)" in this case but we should be consistent wherever we > refer it in the WAL usage context and I think we already are, if not > then let's be consistent. I agree that full page writes can be used in this case, but I'm wondering if that can be misleading for some reader which might e.g. confuse with the full_page_writes GUC. And as Justin pointed out, the documentation for now usually mentions "full page image(s)" in such cases.
On Thu, Apr 23, 2020 at 7:20 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Apr 22, 2020 at 9:15 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Apr 20, 2020 at 1:17 PM Kyotaro Horiguchi > > <horikyota.ntt@gmail.com> wrote: > > > > > > At Sun, 19 Apr 2020 16:22:26 +0200, Julien Rouhaud <rjuju123@gmail.com> wrote in > > > > Hi Justin, > > > > > > > > Thanks for the review! > > > > > > > > On Sat, Apr 18, 2020 at 10:41 PM Justin Pryzby <pryzby@telsasoft.com> wrote: > > > > > > > > > > Should capitalize at least the non-text one ? And maybe the text one for > > > > > consistency ? > > > > > > > > > > + ExplainPropertyInteger("WAL fpw", NULL, > > > > > > > > I think we should keep both version consistent, whether lower or upper > > > > case. The uppercase version is probably more correct, but it's a > > > > little bit weird to have it being the only upper case label in all > > > > output, so I kept it lower case. > > > > I think we can keep upper-case for all non-text ones in case of WAL > > usage, something like WAL Records, WAL FPW, WAL Bytes. The buffer > > usage seems to be following a similar convention. > > > > The attached patch changed the non-text display format as mentioned. > Let me know if you have any comments? Assuming that we're fine using full page write(s) / FPW rather than full page image(s) / FPI (see previous mail), I'm fine with this patch.
At Thu, 23 Apr 2020 07:33:13 +0200, Julien Rouhaud <rjuju123@gmail.com> wrote in > > > > > I think we should keep both version consistent, whether lower or upper > > > > > case. The uppercase version is probably more correct, but it's a > > > > > little bit weird to have it being the only upper case label in all > > > > > output, so I kept it lower case. > > > > > > I think we can keep upper-case for all non-text ones in case of WAL > > > usage, something like WAL Records, WAL FPW, WAL Bytes. The buffer > > > usage seems to be following a similar convention. > > > > > > > The attached patch changed the non-text display format as mentioned. > > Let me know if you have any comments? > > Assuming that we're fine using full page write(s) / FPW rather than > full page image(s) / FPI (see previous mail), I'm fine with this > patch. FWIW, I like FPW, and the patch looks good to me. The index in the documentation has the entry for full_page_writes (having underscores) and it would work. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Thu, Apr 23, 2020 at 12:16 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> On 2020-04-23 07:31, Julien Rouhaud wrote:
> > I agree that full page writes can be used in this case, but I'm
> > wondering if that can be misleading for some reader which might e.g.
> > confuse with the full_page_writes GUC. And as Justin pointed out, the
> > documentation for now usually mentions "full page image(s)" in such
> > cases.
>
> ISTM that in the context of this patch, "full-page image" is correct. A
> "full-page write" is what you do to a table or index page when you are
> recovering a full-page image.
>

So what do we call it when we log the page after it is first touched after a
checkpoint? I thought we call that a full-page write.

> The internal symbol for the WAL record is
> XLOG_FPI and xlogdesc.c prints it as "FPI".
>

That is just one way/reason we log the page. There are others as well. I
thought that here we are computing the number of full-page writes that
happened in the system for various reasons, such as (a) a page is operated
upon for the first time after a checkpoint, (b) an XLOG_FPI record is logged,
(c) the GUC for the WAL consistency checker is on, etc. If we look at
XLogRecordAssemble, where we decide to log this information, there is a
comment "... log a full-page write for the current block." and there is an
existing variable named 'fpw_lsn', which indicates to an extent that what we
are computing in this patch is full-page writes. But there is a reference to
full-page image as well. I think that as full_page_writes is an exposed
variable that is well understood, exposing information with a similar name
via this patch doesn't sound illogical to me. Whatever we use here, we need
to be consistent throughout; even pg_stat_statements would need to name the
exposed variable wal_fpi instead of wal_fpw.

To me, full-page writes sounds more appealing alongside the other WAL usage
variables like records and bytes. I might just be more used to the term
'fpw', which is why it seemed better to me. OTOH, if most of us think that
full-page image is better suited here, I am fine with changing it in all
places.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
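For context, the decision being described above (whether a block registered with a WAL record gets a full-page image) is roughly the following. This is a simplified paraphrase, not the actual XLogRecordAssemble() source: the flag values are illustrative, and the real code also tracks fpw_lsn so the decision can be rechecked after the insert position is acquired.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;		/* stand-in for the real typedef */

#define REGBUF_FORCE_IMAGE	0x02	/* flag values are illustrative */
#define REGBUF_NO_IMAGE		0x04

/*
 * Does this registered block need a full-page image?  Simplified paraphrase
 * of the logic in XLogRecordAssemble(); not the actual PostgreSQL source.
 */
static bool
block_needs_image(uint8_t flags, XLogRecPtr page_lsn,
				  XLogRecPtr redo_ptr, bool do_page_writes)
{
	if (flags & REGBUF_FORCE_IMAGE)
		return true;			/* e.g. XLOG_FPI, consistency checking */
	if (flags & REGBUF_NO_IMAGE)
		return false;
	if (!do_page_writes)
		return false;

	/* normal case: first change to this page since the last checkpoint */
	return page_lsn <= redo_ptr;
}
```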
On Thu, Apr 23, 2020 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Apr 23, 2020 at 12:16 PM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > > The internal symbol for the WAL record is > > XLOG_FPI and xlogdesc.c prints it as "FPI". > > > > That is just one way/reason we log the page. There are others as > well. I thought here we are computing the number of full-page writes > happened in the system due to various reasons like (a) a page is > operated upon first time after the checkpoint, (b) log the XLOG_FPI > record, (c) Guc for WAL consistency checker is on, etc. If we see in > XLogRecordAssemble where we decide to log this information, there is a > comment " .... log a full-page write for the current block." and there > was an existing variable with 'fpw_lsn' which indicates to an extent > that what we are computing in this patch is full-page writes. But > there is a reference to full-page image as well. I think as > full_page_writes is an exposed variable that is well understood so > exposing information with similar name via this patch doesn't sound > illogical to me. Whatever we use here we need to be consistent all > throughout, even pg_stat_statements need to name exposed variable as > wal_fpi instead of wal_fpw. > > To me, full-page writes sound more appealing with other WAL usage > variables like records and bytes. I might be more used to this term as > 'fpw' that is why it occurred better to me. OTOH, if most of us think > that a full-page image is better suited here, I am fine with changing > it at all places. > Julien, Peter, others do you have any opinion here? I think it is better if we decide on one of FPW or FPI and make the changes at all places for this patch. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Apr 27, 2020 at 08:35:51AM +0530, Amit Kapila wrote: > On Thu, Apr 23, 2020 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> On Thu, Apr 23, 2020 at 12:16 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: >>> The internal symbol for the WAL record is >>> XLOG_FPI and xlogdesc.c prints it as "FPI". > > Julien, Peter, others do you have any opinion here? I think it is > better if we decide on one of FPW or FPI and make the changes at all > places for this patch. It seems to me that Peter is right here. A full-page write is the action to write a full-page image, so if you consider only a way to define the static data of a full-page and/or a quantity associated to it, we should talk about full-page images. -- Michael
Attachment
On Mon, Apr 27, 2020 at 8:12 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Apr 27, 2020 at 08:35:51AM +0530, Amit Kapila wrote: > > On Thu, Apr 23, 2020 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> On Thu, Apr 23, 2020 at 12:16 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > >>> The internal symbol for the WAL record is > >>> XLOG_FPI and xlogdesc.c prints it as "FPI". > > > > Julien, Peter, others do you have any opinion here? I think it is > > better if we decide on one of FPW or FPI and make the changes at all > > places for this patch. > > It seems to me that Peter is right here. A full-page write is the > action to write a full-page image, so if you consider only a way to > define the static data of a full-page and/or a quantity associated to > it, we should talk about full-page images. I agree with that definition. I can send a cleanup patch if there's no objection.
On Mon, Apr 27, 2020 at 1:22 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Mon, Apr 27, 2020 at 8:12 AM Michael Paquier <michael@paquier.xyz> wrote: > > > > On Mon, Apr 27, 2020 at 08:35:51AM +0530, Amit Kapila wrote: > > > On Thu, Apr 23, 2020 at 2:35 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > >> On Thu, Apr 23, 2020 at 12:16 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > >>> The internal symbol for the WAL record is > > >>> XLOG_FPI and xlogdesc.c prints it as "FPI". > > > > > > Julien, Peter, others do you have any opinion here? I think it is > > > better if we decide on one of FPW or FPI and make the changes at all > > > places for this patch. > > > > It seems to me that Peter is right here. A full-page write is the > > action to write a full-page image, so if you consider only a way to > > define the static data of a full-page and/or a quantity associated to > > it, we should talk about full-page images. > Fair enough, if more people want full-page image terminology in this context then we can do that. > I agree with that definition. I can send a cleanup patch if there's > no objection. > Okay, feel free to send the patch. Thanks for taking the initiative to write a patch for this. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 28, 2020 at 7:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Apr 27, 2020 at 1:22 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > I agree with that definition. I can send a cleanup patch if there's > > no objection. > > > > Okay, feel free to send the patch. Thanks for taking the initiative > to write a patch for this. > Julien, are you planning to write a cleanup patch for this open item? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 30, 2020 at 5:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Tue, Apr 28, 2020 at 7:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Apr 27, 2020 at 1:22 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > > > I agree with that definition. I can send a cleanup patch if there's > > > no objection. > > > > > > > Okay, feel free to send the patch. Thanks for taking the initiative > > to write a patch for this. > > > > Julien, are you planning to write a cleanup patch for this open item? Sorry Amit, I've been quite busy at work for the last couple of days. I'll take care of that this morning for sure!
On Thu, Apr 30, 2020 at 9:18 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Thu, Apr 30, 2020 at 5:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Tue, Apr 28, 2020 at 7:38 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Mon, Apr 27, 2020 at 1:22 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > > > > > > I agree with that definition. I can send a cleanup patch if there's > > > > no objection. > > > > > > > > > > Okay, feel free to send the patch. Thanks for taking the initiative > > > to write a patch for this. > > > > > > > Julien, are you planning to write a cleanup patch for this open item? > > Sorry Amit, I've been quite busy at work for the last couple of days. > I'll take care of that this morning for sure! Here's the patch. I included the content of v3-fix_explain_wal_output.patch you provided before, and tried to consistently replace full page writes/fpw to full page images/fpi everywhere on top of it (so documentation, command output, variable names and comments).
Attachment
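After this rename, the per-backend counter struct in instrument.h should end up looking roughly like the sketch below. This is paraphrased from memory of the committed code, so the comments may differ slightly; uint64 is PostgreSQL's own typedef, stubbed here to keep the sketch self-contained.

```c
typedef unsigned long long uint64;	/* stand-in for PostgreSQL's typedef */

/* Roughly the shape of the committed WAL usage counters. */
typedef struct WalUsage
{
	long		wal_records;	/* # of WAL records produced */
	long		wal_fpi;		/* # of WAL full page images produced */
	uint64		wal_bytes;		/* size of WAL records produced */
} WalUsage;
```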
On Thu, Apr 30, 2020 at 2:19 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Thu, Apr 30, 2020 at 9:18 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > On Thu, Apr 30, 2020 at 5:05 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > Julien, are you planning to write a cleanup patch for this open item? > > > > Sorry Amit, I've been quite busy at work for the last couple of days. > > I'll take care of that this morning for sure! > > Here's the patch. > Thanks for the patch. I will look into it early next week. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 30, 2020 at 2:19 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > Here's the patch. I included the content of > v3-fix_explain_wal_output.patch you provided before, and tried to > consistently replace full page writes/fpw to full page images/fpi > everywhere on top of it (so documentation, command output, variable > names and comments). > Your patch looks mostly good to me. I have made slight modifications which include changing the non-text format in show_wal_usage to use a capital letter for the second word, which makes it similar to Buffer usage stats, and additionally, ran pgindent. Let me know what do you think of attached? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
On Mon, May 4, 2020 at 6:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Thu, Apr 30, 2020 at 2:19 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > Here's the patch. I included the content of > > v3-fix_explain_wal_output.patch you provided before, and tried to > > consistently replace full page writes/fpw to full page images/fpi > > everywhere on top of it (so documentation, command output, variable > > names and comments). > > > > Your patch looks mostly good to me. I have made slight modifications > which include changing the non-text format in show_wal_usage to use a > capital letter for the second word, which makes it similar to Buffer > usage stats, and additionally, ran pgindent. > > Let me know what do you think of attached? Thanks a lot Amit. It looks perfect to me!
On Mon, May 4, 2020 at 8:03 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Mon, May 4, 2020 at 6:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Thu, Apr 30, 2020 at 2:19 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > Here's the patch. I included the content of > > > v3-fix_explain_wal_output.patch you provided before, and tried to > > > consistently replace full page writes/fpw to full page images/fpi > > > everywhere on top of it (so documentation, command output, variable > > > names and comments). > > > > > > > Your patch looks mostly good to me. I have made slight modifications > > which include changing the non-text format in show_wal_usage to use a > > capital letter for the second word, which makes it similar to Buffer > > usage stats, and additionally, ran pgindent. > > > > Let me know what do you think of attached? > > Thanks a lot Amit. It looks perfect to me! > Pushed. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, May 5, 2020 at 12:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, May 4, 2020 at 8:03 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > On Mon, May 4, 2020 at 6:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > On Thu, Apr 30, 2020 at 2:19 PM Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > > > > > Here's the patch. I included the content of > > > > v3-fix_explain_wal_output.patch you provided before, and tried to > > > > consistently replace full page writes/fpw to full page images/fpi > > > > everywhere on top of it (so documentation, command output, variable > > > > names and comments). > > > > > > > > > > Your patch looks mostly good to me. I have made slight modifications > > > which include changing the non-text format in show_wal_usage to use a > > > capital letter for the second word, which makes it similar to Buffer > > > usage stats, and additionally, ran pgindent. > > > > > > Let me know what do you think of attached? > > > > Thanks a lot Amit. It looks perfect to me! > > > > Pushed. Thanks!
On Wed, May 6, 2020 at 12:19 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Tue, May 5, 2020 at 12:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > > > Your patch looks mostly good to me. I have made slight modifications > > > > which include changing the non-text format in show_wal_usage to use a > > > > capital letter for the second word, which makes it similar to Buffer > > > > usage stats, and additionally, ran pgindent. > > > > > > > > Let me know what do you think of attached? > > > > > > Thanks a lot Amit. It looks perfect to me! > > > > > > > Pushed. > > Thanks! > I have updated the open items page to reflect this commit [1]. [1] - https://wiki.postgresql.org/wiki/PostgreSQL_13_Open_Items -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
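Putting the thread's decisions together (the WAL option for EXPLAIN, the records/fpi/bytes counters, and the single-space key=value text format), a session exercising the feature should look roughly like the sketch below; the table, plan shape, and numbers are invented.

```sql
-- Illustrative only: table, plan shape, and numbers are invented.
EXPLAIN (ANALYZE, WAL, COSTS OFF) UPDATE test SET b = b + 1;
--  Update on test (actual time=... rows=0 loops=1)
--    WAL: records=2000000 fpi=16 bytes=140000000
--    ->  Seq Scan on test (actual time=... rows=1000000 loops=1)
```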