Thread: Progress reporting for pg_verify_checksums
Hi, my colleague Bernd Helmle recently added progress reporting to our pg_checksums application[1]. I have now forward ported this to pg_verify_checksums for the September commitfest, please see the attached patch. Here's the description: This optionally prints the progress of pg_verify_checksums via read kilobytes to the terminal with the new command line argument -P. If -P was forgotten and pg_verify_checksums operates on a large cluster, the caller can send SIGUSR1 to pg_verify_checksums to turn progress status reporting on during runtime. Michael [1] https://github.com/credativ/pg_checksums/commit/2b691 -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Attachment
Hallo Michael, > my colleague Bernd Helmle recently added progress reporting to our > pg_checksums application[1]. I have now forward ported this to > pg_verify_checksums for the September commitfest, please see the > attached patch. It seems that the patch does not apply cleanly on master, neither with "git apply" nor "patch". Could you rebase it? > Here's the description: > > This optionally prints the progress of pg_verify_checksums via read > kilobytes MB? GB? > to the terminal with the new command line argument -P. > > If -P was forgotten and pg_verify_checksums operates on a large cluster, > the caller can send SIGUSR1 to pg_verify_checksums to turn progress > status reporting on during runtime. Hmmm. I cannot say I like the signal feature much. Would it make sense for the progress to be on by default, and to have a quiet option instead? -- Fabien.
On 2018-Sep-01, Fabien COELHO wrote: > > If -P was forgotten and pg_verify_checksums operates on a large cluster, > > the caller can send SIGUSR1 to pg_verify_checksums to turn progress > > status reporting on during runtime. > > Hmmm. I cannot say I like the signal feature much. Would it make sense for > the progress to be on by default, and to have a quiet option instead? Hmm, I recall this technique being used elsewhere and is sometimes useful. Can't remember where though -- by manpages, it's not rsync nor pv ... How about making it a toggle? Default off, enable-able by option, toggleable by signal. (If you enable it via the signal, what's the rate to report at?) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, On Mon, Sep 03, 2018 at 11:21:32AM -0300, Alvaro Herrera wrote: > On 2018-Sep-01, Fabien COELHO wrote: > > > If -P was forgotten and pg_verify_checksums operates on a large cluster, > > > the caller can send SIGUSR1 to pg_verify_checksums to turn progress > > > status reporting on during runtime. > > > > Hmmm. I cannot say I like the signal feature much. Would it make sense for > > the progress to be on by default, and to have a quiet option instead? > > Hmm, I recall this technique being used elsewhere and is sometimes > useful. Can't remember where though -- by manpages, it's not rsync nor > pv ... e2fsck does it. And I think it would be useful for pg_basebackup as well, but that's for another time. In any case, pg_basebackup has the same option and it is off by default as well. I guess the rationale for not having it on by default for pg_basebackup is the fact is does not have any other output normally. pg_verify_checksums writes a report at the end however, but maybe that one could be demoted to verbose mode now that we have it? > How about making it a toggle? Default off, enable-able by option, > toggleable by signal. That sounds like a good idea. > (If you enable it via the signal, what's the rate to report at?) You mean how often it reports progress (that seems independent of the signal)? From testing, it's roughly every second or so. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Hi, Am Freitag, den 31.08.2018, 14:50 +0200 schrieb Michael Banck: > my colleague Bernd Helmle recently added progress reporting to our > pg_checksums application[1]. I have now forward ported this to > pg_verify_checksums for the September commitfest, please see the > attached patch. > > Here's the description: > > This optionally prints the progress of pg_verify_checksums via read > kilobytes to the terminal with the new command line argument -P. > > If -P was forgotten and pg_verify_checksums operates on a large cluster, > the caller can send SIGUSR1 to pg_verify_checksums to turn progress > status reporting on during runtime. Version 2 of the patch is attached. This is rebased to master as of 422952ee and changes the signal handling to be a toggle as suggested by Alvaro. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Attachment
Hallo Michael, >> This optionally prints the progress of pg_verify_checksums via read >> kilobytes to the terminal with the new command line argument -P. >> >> If -P was forgotten and pg_verify_checksums operates on a large cluster, >> the caller can send SIGUSR1 to pg_verify_checksums to turn progress >> status reporting on during runtime. > > Version 2 of the patch is attached. This is rebased to master as of > 422952ee and changes the signal handling to be a toggle as suggested by > Alvaro. Patch applies cleanly and compiles. About tests: "make check" is okay, but ITSM that the command is not started once, ever, in any test:-( It is unclear whether any test triggers checksums anyway... The time() granularity to the second makes the result awkward on small tests: 8/1554552 kB (0%, 8 kB/s) 1040856/1554552 kB (66%, 1040856 kB/s) 1554552/1554552 kB (100%, 1554552 kB/s) I'd suggest to reuse the "portability/instr_time.h" stuff which allows a much better precision. The "kB" kilo-byte symbols stands for 1000 bytes, but you are displaying 1024-bytes kilos, which symbol is either "KiB" or "KB". Given that we are about storage which is usually measured in powers of 1,000, so I'd suggest to use thousands. Another reserve I have is that on any hardware it is likely that there will be a lot of kilos, so maybe megas (MB) should be used instead. I'm wondering whether the bandwidth display should be local (from the last display) or global (from the start of the command), but for the last forced one. This is an open question. Maybe it would be nice to show elapsed time and expected completion time based on the current speed. I would be in favor or having this turned on by default, and silenced with some option. I'm not sure other people would agree, though, so it is an open question as well. About the code: + if (show_progress) + show_progress = false; + else + show_progress = true; Why not a simple: /* invert current state */ show_progress = !show_progress; I do not see much advantage in using intermediate string buffers for values: + snprintf(totalstr, sizeof(totalstr), INT64_FORMAT, + total_size / 1024); + snprintf(currentstr, sizeof(currentstr), INT64_FORMAT, + current_size / 1024); + snprintf(currspeedstr, sizeof(currspeedstr), INT64_FORMAT, + (current_size / 1024) / (((time(NULL) - scan_started) == 0) ? 1 : (time(NULL) - scan_started))); + fprintf(stderr, "%s/%s kB (%d%%, %s kB/s)", + currentstr, totalstr, total_percent, currspeedstr); ISTM that just one simple fprintf would be fine: fprintf(stderr, INT64_FORMAT "/" INT64_FORMAT " ...", formulas for each slot...); ISTM that this line overwriting trick deserves a comment: + if (isatty(fileno(stderr))) + fprintf(stderr, "\r"); + else + fprintf(stderr, "\n"); And it runs a little amok with "-v". + memset(&act, '\0', sizeof(act)); pg uses 0 instead of '\0' everywhere else. + /* Make sure we just report at least once a second */ + if ((now == last_progress_update) && !force) return; The comments seems contradictory, I understand the code makes sure that it is "at most" once a second. Pgbench uses "-P XXX" to strigger a progress report every XXX second. I'm unsure whether it would be useful to allow the user to change the 1 second display interval. I'm not sure why you removed one unrelated line: #include "storage/checksum.h" #include "storage/checksum_impl.h" - static int64 files = 0; static int64 blocks = 0; static int64 badblocks = 0; There is a problem in the scan_file code: The added sizeonly parameter is not used. It should be removed. -- Fabien.
Hi, thanks for the review! On Wed, Sep 19, 2018 at 05:17:05PM +0200, Fabien COELHO wrote: > >>This optionally prints the progress of pg_verify_checksums via read > >>kilobytes to the terminal with the new command line argument -P. > >> > >>If -P was forgotten and pg_verify_checksums operates on a large cluster, > >>the caller can send SIGUSR1 to pg_verify_checksums to turn progress > >>status reporting on during runtime. > > > >Version 2 of the patch is attached. This is rebased to master as of > >422952ee and changes the signal handling to be a toggle as suggested by > >Alvaro. > > Patch applies cleanly and compiles. > > About tests: "make check" is okay, but ITSM that the command is not started > once, ever, in any test:-( It is unclear whether any test triggers checksums > anyway... Yeah, this was discussed on another thread and some basic tap tests for pg_verify_checksums were proposed (also by me), but this hasn't been further addressed. Personally, I think this would be a good thing to add to v11 even. In any case, this particular feature might not be very easy to tap test, but I am open to suggestions, of course. > The time() granularity to the second makes the result awkward on small > tests: > > 8/1554552 kB (0%, 8 kB/s) > 1040856/1554552 kB (66%, 1040856 kB/s) > 1554552/1554552 kB (100%, 1554552 kB/s) > > I'd suggest to reuse the "portability/instr_time.h" stuff which allows a > much better precision. > > The "kB" kilo-byte symbols stands for 1000 bytes, but you are displaying > 1024-bytes kilos, which symbol is either "KiB" or "KB". Given that we are > about storage which is usually measured in powers of 1,000, so I'd suggest > to use thousands. > > Another reserve I have is that on any hardware it is likely that there will > be a lot of kilos, so maybe megas (MB) should be used instead. The display is exactly the same as in pg_basebackup (except the bandwith is shown as well), so right now I think it is more useful to be consistent here. So if we change that, pg_basebackup should be changed as well I think. Maybe the code could be factored out to some common file in the future. > I'm wondering whether the bandwidth display should be local (from the last > display) or global (from the start of the command), but for the last forced > one. This is an open question. > Maybe it would be nice to show elapsed time and expected completion time > based on the current speed. Maybe; this could be added to the factored out common code I mentioned above. > I would be in favor or having this turned on by default, and silenced with > some option. I'm not sure other people would agree, though, so it is an open > question as well. If this runs in a script, it would be pretty annoying, so everybody would have to add --no-progress so I am not convinced. Also, both pg_basebackup and pgbench don't show progress by default. > About the code: > > + if (show_progress) > + show_progress = false; > + else > + show_progress = true; > > Why not a simple: > > /* invert current state */ > show_progress = !show_progress; Fair enough. > I do not see much advantage in using intermediate string buffers for values: > > + snprintf(totalstr, sizeof(totalstr), INT64_FORMAT, > + total_size / 1024); > + snprintf(currentstr, sizeof(currentstr), INT64_FORMAT, > + current_size / 1024); > + snprintf(currspeedstr, sizeof(currspeedstr), INT64_FORMAT, > + (current_size / 1024) / (((time(NULL) - scan_started) == 0) ? 1 : (time(NULL) - scan_started))); > + fprintf(stderr, "%s/%s kB (%d%%, %s kB/s)", > + currentstr, totalstr, total_percent, currspeedstr); > > ISTM that just one simple fprintf would be fine: > > fprintf(stderr, INT64_FORMAT "/" INT64_FORMAT " ...", > formulas for each slot...); That code is adopted from pg_basebackup.c and the comment there says: | * Separate step to keep platform-dependent format code out of | * translatable strings. And we only test for INT64_FORMAT | * availability in snprintf, not fprintf. So that sounds legit to me. > ISTM that this line overwriting trick deserves a comment: > > + if (isatty(fileno(stderr))) > + fprintf(stderr, "\r"); > + else > + fprintf(stderr, "\n"); I agree a comment should be in there. But that code is also taken verbatim from pg_basebackup.c (but this time, there's no comment there, either). How about this: | * If we are reporting to a terminal, send a carriage return so that | * we stay on the same line. If not, send a newline. > And it runs a little amok with "-v". Do you suggest we should make those mutually exlusive? I agree that in general, -P is not very useful if -v is on, but if you have a really big table, it should still be, no? > + memset(&act, '\0', sizeof(act)); > > pg uses 0 instead of '\0' everywhere else. Ok. > + /* Make sure we just report at least once a second */ > + if ((now == last_progress_update) && !force) return; > > The comments seems contradictory, I understand the code makes sure that it > is "at most" once a second. I guess you're right as the identical code in pg_basebackup.c has a comment saying "Max once per second". > Pgbench uses "-P XXX" to strigger a progress > report every XXX second. I'm unsure whether it would be useful to allow the > user to change the 1 second display interval. I think pgbench writes a new line for each report? In that case, having it every second for longer runs might be annoying and I can see the point in having in configurable. In the case of pg_basebackup/pg_verify_checksums, I guess 1 second is fine. > I'm not sure why you removed one unrelated line: > > #include "storage/checksum.h" > #include "storage/checksum_impl.h" > > - > static int64 files = 0; > static int64 blocks = 0; > static int64 badblocks = 0; Merge error on my side I guess, will fix. > There is a problem in the scan_file code: The added sizeonly parameter is > not used. It should be removed. Right, this might have been a leftover from an earlier version of the code, I'll check back with Bernd to make sure that was not a porting/merge error on my side. I've attached V3 of the patch, addressing some of your comments. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Attachment
On Tue, Sep 4, 2018 at 2:21 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > On 2018-Sep-01, Fabien COELHO wrote: > > > If -P was forgotten and pg_verify_checksums operates on a large cluster, > > > the caller can send SIGUSR1 to pg_verify_checksums to turn progress > > > status reporting on during runtime. > > > > Hmmm. I cannot say I like the signal feature much. Would it make sense for > > the progress to be on by default, and to have a quiet option instead? > > Hmm, I recall this technique being used elsewhere and is sometimes > useful. Can't remember where though -- by manpages, it's not rsync nor > pv ... On BSD-derived systems including macOS, you can hit ^T to deliver SIGINFO to the foreground process (much like ^C delivers SIGINT). Many well behaved programs including rsync (at least on FreeBSD) then show you some kind of progress or status report: munro@asterix $ sleep 60 load: 0.11 cmd: sleep 62234 [nanslp] 1.22r 0.00u 0.00s 0% 2032k sleep: about 58 second(s) left out of the original 60 I've contemplated various crackpot ideas about teaching psql to show information about what my backend is doing when I hit ^T (perhaps via a second connection, similar to the way it cancels queries?)... There was an interesting thread[1] about dumping various things from backends on (multiplexed) SIGUSR1. Also if memory serves, JVMs also dump internal stuff when you signal them, so you can confirm that they really did eat all 64GB of your RAM. But none of these things are examples of enabling/disabling an on-going status reporting mode, which is perhaps what you meant, they're more like one-off pokes to dump information. [1] https://www.postgresql.org/message-id/flat/20140623101501.GN16260%40awork2.anarazel.de -- Thomas Munro http://www.enterprisedb.com
Hallo Michael, About patch v3: applies cleanly and compiles. The xml documentation should be updated! (It is kind of hard to notice what is not there:-) See "doc/src/sgml/ref/pg_verify_checksums.sgml". >> The time() granularity to the second makes the result awkward on small >> tests: >> >> 8/1554552 kB (0%, 8 kB/s) >> 1040856/1554552 kB (66%, 1040856 kB/s) >> 1554552/1554552 kB (100%, 1554552 kB/s) >> >> I'd suggest to reuse the "portability/instr_time.h" stuff which allows a >> much better precision. I still think it would make sense to use that instead of low-precision time(). >> The "kB" kilo-byte symbols stands for 1000 bytes, but you are displaying >> 1024-bytes kilos, which symbol is either "KiB" or "KB". Given that we are >> about storage which is usually measured in powers of 1,000, so I'd suggest >> to use thousands. >> >> Another reserve I have is that on any hardware it is likely that there will >> be a lot of kilos, so maybe megas (MB) should be used instead. > > The display is exactly the same as in pg_basebackup (except the > bandwith is shown as well), so right now I think it is more useful to be > consistent here. Hmmm... units are units, and the display is wrong. The fact that other commands are wrong as well does not look like a good argument to persist in an error. > So if we change that, pg_basebackup should be changed as well I think. I'm okay with fixing pg_basebackup as well! I'm unsure about the best place to put such a function, though. Probably under "src/common/" where there is already some front-end specific code ("fe_memutils.c"). > Maybe the code could be factored out to some common file in the future. I would be okay with a progress printing function shared between some commands. It just needs some place. pg_rewind also has a --rewind option. >> Maybe it would be nice to show elapsed time and expected completion time >> based on the current speed. > > Maybe; this could be added to the factored out common code I mentioned > above. Yep. >> I would be in favor or having this turned on by default, and silenced with >> some option. I'm not sure other people would agree, though, so it is an open >> question as well. > > If this runs in a script, it would be pretty annoying, so everybody > would have to add --no-progress so I am not convinced. Also, both > pg_basebackup and pgbench don't show progress by default. > Ok. >> I do not see much advantage in using intermediate string buffers for values: >> >> + snprintf(totalstr, sizeof(totalstr), INT64_FORMAT, >> + total_size / 1024); >> + snprintf(currentstr, sizeof(currentstr), INT64_FORMAT, >> + current_size / 1024); >> + snprintf(currspeedstr, sizeof(currspeedstr), INT64_FORMAT, >> + (current_size / 1024) / (((time(NULL) - scan_started) == 0) ? 1 : (time(NULL) - scan_started))); >> + fprintf(stderr, "%s/%s kB (%d%%, %s kB/s)", >> + currentstr, totalstr, total_percent, currspeedstr); >> >> ISTM that just one simple fprintf would be fine: >> >> fprintf(stderr, INT64_FORMAT "/" INT64_FORMAT " ...", >> formulas for each slot...); > > That code is adopted from pg_basebackup.c and the comment there says: > > | * Separate step to keep platform-dependent format code out of > | * translatable strings. And we only test for INT64_FORMAT > | * availability in snprintf, not fprintf. > > So that sounds legit to me. Hmmm. Translation. Not sure I like much the idea of translating units, but why not. > | * If we are reporting to a terminal, send a carriage return so that > | * we stay on the same line. If not, send a newline. > >> And it runs a little amok with "-v". > > Do you suggest we should make those mutually exlusive? I agree that in > general, -P is not very useful if -v is on, but if you have a really big > table, it should still be, no? Yep. I was just mentionning that they interfere on a terminal, but I agree that there is no obvious fix. >> + memset(&act, '\0', sizeof(act)); >> >> pg uses 0 instead of '\0' everywhere else. > > Ok. Not '0', plain 0 (the integer of value zero). -- Fabien.
Hi, On Wed, Sep 26, 2018 at 10:46:06AM +0200, Fabien COELHO wrote: > The xml documentation should be updated! (It is kind of hard to notice what > is not there:-) > > See "doc/src/sgml/ref/pg_verify_checksums.sgml". Right, I've added a paragraph. > >>The time() granularity to the second makes the result awkward on small > >>tests: > >> > >> 8/1554552 kB (0%, 8 kB/s) > >> 1040856/1554552 kB (66%, 1040856 kB/s) > >> 1554552/1554552 kB (100%, 1554552 kB/s) > >> > >>I'd suggest to reuse the "portability/instr_time.h" stuff which allows a > >>much better precision. > > I still think it would make sense to use that instead of low-precision > time(). As the use-case of this is not for small tests, I don't think it is useful to make the code more complicated for this. > >>The "kB" kilo-byte symbols stands for 1000 bytes, but you are displaying > >>1024-bytes kilos, which symbol is either "KiB" or "KB". Given that we are > >>about storage which is usually measured in powers of 1,000, so I'd suggest > >>to use thousands. > >> > >>Another reserve I have is that on any hardware it is likely that there will > >>be a lot of kilos, so maybe megas (MB) should be used instead. > > > >The display is exactly the same as in pg_basebackup (except the > >bandwith is shown as well), so right now I think it is more useful to be > >consistent here. > > Hmmm... units are units, and the display is wrong. The fact that other > commands are wrong as well does not look like a good argument to persist in > an error. I've had a look around, and "kB" seems to be a common unit for 1024 bytes, e.g. in pg_test_fsync etc., unless I am mistaken? > >So if we change that, pg_basebackup should be changed as well I think. > > I'm okay with fixing pg_basebackup as well! I'm unsure about the best place > to put such a function, though. Probably under "src/common/" where there is > already some front-end specific code ("fe_memutils.c"). > > >Maybe the code could be factored out to some common file in the future. > > I would be okay with a progress printing function shared between some > commands. It just needs some place. pg_rewind also has a --rewind option. I guess you mean pg_rewind also has a --progress option. I do agree it makes sense to refactor that, but I don't think this should be part of this patch. > >> + memset(&act, '\0', sizeof(act)); > >> > >>pg uses 0 instead of '\0' everywhere else. > > > >Ok. > > Not '0', plain 0 (the integer of value zero). Oops, thanks for spotting that. I've attached v4 of the patch. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Attachment
>>>> The time() granularity to the second makes the result awkward on small >>>> tests: >>>> >>>> 8/1554552 kB (0%, 8 kB/s) >>>> 1040856/1554552 kB (66%, 1040856 kB/s) >>>> 1554552/1554552 kB (100%, 1554552 kB/s) >>>> >>>> I'd suggest to reuse the "portability/instr_time.h" stuff which allows a >>>> much better precision. >> >> I still think it would make sense to use that instead of low-precision >> time(). > > As the use-case of this is not for small tests, Even for a long run, the induce error by the 1 second imprecision on both points is significant at the beginning of the scan. > I don't think it is useful to make the code more complicated for this. I do not think that it would be much more complicated to use the portability macros to get a precise time. >>> The display is exactly the same as in pg_basebackup (except the >>> bandwith is shown as well), so right now I think it is more useful to be >>> consistent here. >> >> Hmmm... units are units, and the display is wrong. The fact that other >> commands are wrong as well does not look like a good argument to persist in >> an error. > > I've had a look around, and "kB" seems to be a common unit for 1024 > bytes, e.g. in pg_test_fsync etc., unless I am mistaken? I can only repeat my above comment: the fact that postgres is wrong elsewhere is not a good reason to persist in reproducing an error. See the mother of all knowledge https://en.wikipedia.org/wiki/Kilobyte - SI (decimal, 1000): kB, MB, GB, ... - IEC (binary 1024): KiB, MiB, GiB, ... - JEDEC (binary 1024, used for memory): KB, MB, GB. Summary: - 1 kB = 1000 bytes - 1 KB = 1 KiB = 1024 bytes Decimal is used for storage (HDD, SSD), binary for memory. That is life, and IMHO Postgres code is not the place to invent new units. Moreover, I still think that the progress should use MB defined as 1000000 bytes for showing the progress. >> I would be okay with a progress printing function shared between some >> commands. It just needs some place. pg_rewind also has a --rewind option. > > I guess you mean pg_rewind also has a --progress option. Indeed. > I do agree it makes sense to refactor that, but I don't think this > should be part of this patch. That's a point. I'd suggest to put the new progress function directly in the common part, and other patches could take advantage of it for other commands when someone feels like it. Other comments: function toggle_progress_report has empty lines after { and before }, but this style is not used elsewhere, I think that they should be removed. The report_progress API should be ready to be used by another client application, which suggest that global variables should be avoided. Maybe: void report_progress(current, total, force) and the scan start and last times could be a static variable inside the function initialized on the first call, which would suggest to call the function at the beginning of the scan, probably with current == 0. Or maybe: time_type report_progress(current, total, start, last, force) Which would return the last time, and the caller has responsability for initializing a start time. -- Fabien.
On Sat, Sep 29, 2018 at 1:07 AM Michael Banck <michael.banck@credativ.de> wrote: > I've attached v4 of the patch. Hi Michael, Windows doesn't like sigaction: https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.15189 I'm not sure if we classify this as a "frontend" program. Should it be using pqsignal() from src/port/pqsignal.c? Or perhaps just sigaction as you have it (pqsignal.c says that we require sigaction on all Unices), but #ifndef WIN32 around that stuff, since SIGUSR1 is never going to work anyway. -- Thomas Munro http://www.enterprisedb.com
> On Wed, Oct 3, 2018 at 12:51 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote: > > On Sat, Sep 29, 2018 at 1:07 AM Michael Banck <michael.banck@credativ.de> wrote: > > I've attached v4 of the patch. > > Hi Michael, > > Windows doesn't like sigaction: > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.15189 > > I'm not sure if we classify this as a "frontend" program. Should it > be using pqsignal() from src/port/pqsignal.c? Or perhaps just > sigaction as you have it (pqsignal.c says that we require sigaction on > all Unices), but #ifndef WIN32 around that stuff, since SIGUSR1 is > never going to work anyway. Unfortunately, patch also has some conflicts with the current master. I'll move it to the next CF.
Hi, On Wed, Oct 03, 2018 at 11:50:36AM +1300, Thomas Munro wrote: > On Sat, Sep 29, 2018 at 1:07 AM Michael Banck <michael.banck@credativ.de> wrote: > Windows doesn't like sigaction: > > https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.15189 Thanks for the report and sorry for taking so long to reply. > I'm not sure if we classify this as a "frontend" program. Should it > be using pqsignal() from src/port/pqsignal.c? Or perhaps just > sigaction as you have it (pqsignal.c says that we require sigaction on > all Unices), but #ifndef WIN32 around that stuff, since SIGUSR1 is > never going to work anyway. I've used pqsignal now, and disabled that feature on Windows. I've also updated one comment and added an additional regression test. V5 attached. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Attachment
Hallo Michael, > V5 attached. Patch applies cleanly, compiles, global & local make check are ok. Given that the specific output is not checked, I do not think that the -P check deserves a test on its own, I think that the -P option could simply be added to any of the existing tests. I'm still unhappy that "kB" is used for 1024 bytes in the output, contrary to all existing standards (1 kB = 1000 bytes -- SI, 1 KB = 1 KiB = 1024 bytes -- IEC & JEDEC). The fact that this is also used wrongly elsewhere in pg is not relevant, these are bugs to be fixed, not to be replicated. Given the speed of verifying checksums and its storage-oriented status, I also still think that a (possibly fractional) MB (1,000,000 bytes), or even GB, is the right unit to use for reporting this progress. On my laptop (SSD), verifying runs at least at 1.26 GB/s (on one small test), there is no point in displaying kilobytes progress. I still think that using a more precise time than time(), eg with existing macros from "instr_time.h", would not cost anything more and result in a better precision output. It would also allow to remove the check used to avoid a division-by-zero by switching to double. If the check is performed while online (other patch in queue), then the size may change thus it may not reach or go beyond 100%. No big deal. I'd consider inverting the sizeonly boolean, so that true does the check and false does only the size collection. It seems more logical to me if it performs more with true than with false. -- Fabien.
> Given the speed of verifying checksums and its storage-oriented status, I > also still think that a (possibly fractional) MB (1,000,000 bytes), or even > GB, is the right unit to use for reporting this progress. On my laptop (SSD), > verifying runs at least at 1.26 GB/s (on one small test), there is no point > in displaying kilobytes progress. Obviously the file is cached by the system at such speed, but still most disks should provides dozens of MB per second of read bandwidth. If GB is used, it should use fractional display (eg 1.25 GB) though. -- Fabien.
Hi, Am Dienstag, den 25.12.2018, 12:12 +0100 schrieb Fabien COELHO: > > Given the speed of verifying checksums and its storage-oriented status, I > > also still think that a (possibly fractional) MB (1,000,000 bytes), or even > > GB, is the right unit to use for reporting this progress. On my laptop (SSD), > > verifying runs at least at 1.26 GB/s (on one small test), there is no point > > in displaying kilobytes progress. > > Obviously the file is cached by the system at such speed, but still most > disks should provides dozens of MB per second of read bandwidth. If GB is > used, it should use fractional display (eg 1.25 GB) though. I think MB indeed makes more sense than kB, so I have changed that now in V7, per attached. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Attachment
> I think MB indeed makes more sense than kB, so I have changed that now > in V7, per attached. You use 1024² bytes. What about 1000000 bytes per MB, as the unit is about stored files? Also, you did not answer to my other points: - use "instr_time.h" for better precision - invert sizeonly - reuse a test -- Fabien.
On Tue, Dec 25, 2018 at 07:05:30PM +0100, Fabien COELHO wrote: > You use 1024² bytes. What about 1000000 bytes per MB, as the unit is about > stored files? > > Also, you did not answer to my other points: > - use "instr_time.h" for better precision > - invert sizeonly > - reuse a test It seems to me that the SIGUSR1 business is not necessary for the basic layer of this feature, so I would rather rip that off now. If necessary we could always discuss that later on. My take about the option is that --progress should not be the default, but that reports should only be provided if the caller wants them. --quiet may have some value by itself, but that seems like a separate discussion to me. + /* + * If we are reporting to a terminal, send a carriage return so that we + * stay on the same line. If not, send a newline. + */ + if (isatty(fileno(stderr))) + fprintf(stderr, "\r"); + else + fprintf(stderr, "\n"); This bit is not really elegant, why not just '\r'? + /* The same for total size */ + if (current_size > total_size) + total_size = current_size / 1048576; Let's use that in a variable and not hardcode the number. What's introduced here is very similar to what progress_report() does in pg_rewind/logging.c. The report depends on the context so this cannot be a common routine logic but perhaps we could keep a consistent output for both tools? -- Michael
Attachment
On 2018-Dec-26, Michael Paquier wrote: > + /* > + * If we are reporting to a terminal, send a carriage return so that we > + * stay on the same line. If not, send a newline. > + */ > + if (isatty(fileno(stderr))) > + fprintf(stderr, "\r"); > + else > + fprintf(stderr, "\n"); > This bit is not really elegant, why not just '\r'? Umm, this is established coding pattern in pg_basebackup.c. Stylistically I'd change all those cases to "fprintf(stderr, isatty(fileno(stderr)) ? "\r" : "\n")" but leave the string alone, since AFAIR it took some time to figure out what to do. (I'd also make the comment one line instead of four, say "Stay on the same line if reporting to a terminal". That makes the whole stanza two lines rather than eight, which is the appropriate amount of space for it). -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Dec 25, 2018 at 11:45:09PM -0300, Alvaro Herrera wrote: > Umm, this is established coding pattern in pg_basebackup.c. > Stylistically I'd change all those cases to "fprintf(stderr, > isatty(fileno(stderr)) ? "\r" : "\n")" but leave the string alone, since > AFAIR it took some time to figure out what to do. (I'd also make the > comment one line instead of four, say "Stay on the same line if > reporting to a terminal". That makes the whole stanza two lines rather > than eight, which is the appropriate amount of space for it). There has been no input from the author for a couple of weeks now, so I have marked the patch as returned with feedback. -- Michael
Attachment
Hi, Am Mittwoch, den 26.12.2018, 11:14 +0900 schrieb Michael Paquier: > On Tue, Dec 25, 2018 at 07:05:30PM +0100, Fabien COELHO wrote: > > You use 1024² bytes. What about 1000000 bytes per MB, as the > > unit is about stored files? I haven't changed that as Ihave not been pointed to a clear project guideline that 1 MB should be 1000000 and not 1024x1024. > > Also, you did not answer to my other points: > > - use "instr_time.h" for better precision Both pg_rewind and pg_basebackup are using time() as well, so I think there's precedence for this. > > - invert sizeonly The current way is consistent with src/backend/replication/basebackup.c so I'd prefer to keep it that way. > > - reuse a test I've removed the test. > It seems to me that the SIGUSR1 business is not necessary for the > basic layer of this feature, so I would rather rip that off now. But is it gaining all so much if it is ripped out? The patch will be a dozen lines shorter, but those are all isolated. And contrary to pg_basebackup, pg_verify_checksums might be run in the foreground much more often - right now it has to be done while the cluster is offline, so if you have large instance and you forgot to pass -P then you're glad you can signal it so you roughly know when you can expect to startup the instance again. > If necessary we could always discuss that later on. If you insist we can rip it out, of course. > My take about the option is that --progress should not be the default, > but that reports should only be provided if the caller wants them. > > --quiet may have some value by itself, but that seems like a separate > discussion to me. Right now --progress isn't default so I think that's how you like it? > + /* > + * If we are reporting to a terminal, send a carriage return so that we > + * stay on the same line. If not, send a newline. > + */ > + if (isatty(fileno(stderr))) > + fprintf(stderr, "\r"); > + else > + fprintf(stderr, "\n"); > This bit is not really elegant, why not just '\r'? I've changed it as Alvaro suggested. > + /* The same for total size */ > + if (current_size > total_size) > + total_size = current_size / 1048576; > Let's use that in a variable and not hardcode the number. Done so, I called it MEGABYTES. > What's introduced here is very similar to what progress_report() does > in pg_rewind/logging.c. The report depends on the context so this > cannot be a common routine logic but perhaps we could keep a > consistent output for both tools? Well, pg_rewind is also using kB, so should I switch it back to that? It looks like the progress reporting is otherwise quite similar, so what exactly did you have in mind? New patch attached. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Attachment
On Sun, Feb 17, 2019 at 09:00:29PM +0100, Michael Banck wrote: > Well, pg_rewind is also using kB, so should I switch it back to that? I am not sure which one is better, sorry :) There is an argument for switching pg_rewind to use MB as well. I don't expect pg_rewind to transfer gigs of data, but it could make the maths harder for a newcomer if it transfers a lot of data. > It looks like the progress reporting is otherwise quite similar, so what > exactly did you have in mind? I was thinking about a routine in src/common/ or such at this time. The tools have separate goals though, so it looks sensible as well to keep two routines. I am fine to let that up to you. -- Michael
Attachment
Hi, Am Montag, den 18.02.2019, 12:24 +0900 schrieb Michael Paquier: > On Sun, Feb 17, 2019 at 09:00:29PM +0100, Michael Banck wrote: > > Well, pg_rewind is also using kB, so should I switch it back to that? > > I am not sure which one is better, sorry :) > > There is an argument for switching pg_rewind to use MB as well. I > don't expect pg_rewind to transfer gigs of data, but it could make the > maths harder for a newcomer if it transfers a lot of data. Right, I think it makes more sense for pg_basebackup and pg_verify_checksums as they go over the whole cluster than for pg_rewind, but being consistent is also worth something. > > It looks like the progress reporting is otherwise quite similar, so what > > exactly did you have in mind? > > I was thinking about a routine in src/common/ or such at this time. > The tools have separate goals though, so it looks sensible as well to > keep two routines. I am fine to let that up to you. I'll leave it as-is for now, but will see whether this can be improved in a follow-up patch. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
On 2019-Feb-17, Michael Banck wrote: > + /* > + * As progress status information may be requested, we need to scan the > + * directory tree(s) twice, once to get the idea how much data we need to > + * scan and finally to do the real legwork. > + */ > + total_size = scan_directory(DataDir, "global", true); > + total_size += scan_directory(DataDir, "base", true); > + total_size += scan_directory(DataDir, "pg_tblspc", true); Surely we know at that point whether this first scan is needed, and we can skip it if not? -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, Am Montag, den 18.02.2019, 11:18 -0300 schrieb Alvaro Herrera: > On 2019-Feb-17, Michael Banck wrote: > > + /* > > + * As progress status information may be requested, we need to scan the > > + * directory tree(s) twice, once to get the idea how much data we need to > > + * scan and finally to do the real legwork. > > + */ > > + total_size = scan_directory(DataDir, "global", true); > > + total_size += scan_directory(DataDir, "base", true); > > + total_size += scan_directory(DataDir, "pg_tblspc", true); > > Surely we know at that point whether this first scan is needed, and we > can skip it if not? Yeah - new patch attached. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Attachment
Am Montag, den 18.02.2019, 16:52 +0100 schrieb Michael Banck: > > Surely we know at that point whether this first scan is needed, and > we > > can skip it if not? > > Yeah - new patch attached. Maybe i'm wrong, but my thought is that this breaks the SIGUSR1 business, since there seems no code path which calculates total_size in this case? Bernd
Attachment
On 2019-Feb-18, Bernd Helmle wrote: > Am Montag, den 18.02.2019, 16:52 +0100 schrieb Michael Banck: > > > Surely we know at that point whether this first scan is needed, and > > we > > > can skip it if not? > > > > Yeah - new patch attached. > > Maybe i'm wrong, but my thought is that this breaks the SIGUSR1 > business, since there seems no code path which calculates total_size in > this case? Oh, yeah, it does. In that case, a comment explaining that is needed. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi, Am Montag, den 18.02.2019, 13:42 -0300 schrieb Alvaro Herrera: > On 2019-Feb-18, Bernd Helmle wrote: > > > Am Montag, den 18.02.2019, 16:52 +0100 schrieb Michael Banck: > > > > Surely we know at that point whether this first scan is needed, and > > > > we can skip it if not? > > > > > > Yeah - new patch attached. > > > > Maybe i'm wrong, but my thought is that this breaks the SIGUSR1 > > business, since there seems no code path which calculates total_size in > > this case? > > Oh, yeah, it does. In that case, a comment explaining that is needed. New patch attached. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Attachment
Hallo Michael, > New patch attached. Patch applies cleanly. Compiles, "make check" ok. doc build is also ok. There are no tests, which is probably fine for such an interactive feature. Docs look okay to me. Clear and to the point. About : total_percent = total_size ? (int64) ((current_size / MEGABYTES) * 100 / (total_size / MEGABYTES)) : 0; MEGABYTES can be simplified and will enhance precision. ISTM that the percent could be a double: total_percent = total_size ? 100.0 * current_size / total_size : 0.0 ; and then just let fprintf do the rounding. I would bother rounding down < 100% to 100, because then you would get 1560/1492 MB (100\%, X MB/s) which is kind of silly. If it goes over it just mean that the initial estimate was wrong, which might happen if the db is running (other patch in the queue). I'm fine with over 100 if it is consistent with the MB ratio next to it. fprintf could be called once instead of twice by merging the eol and the previous message. Maybe I'd consider inverting the sizeonly boolean so that more is done when true. I'd still would use more precise than one second precision time so that the speed displayed at the beginning is more realistic. eg I got on a short test with probably in system cache files: 188/188 MB (100%, 188 MB/s) but the reality is that it took 0.126 seconds, and the speed was really 1492 MB/s. -- Fabien.
Hi, Am Dienstag, den 19.02.2019, 16:37 +0100 schrieb Fabien COELHO > > About : > > total_percent = total_size ? (int64) ((current_size / MEGABYTES) * 100 / (total_size / MEGABYTES)) : 0; > > MEGABYTES can be simplified and will enhance precision. ISTM that the > percent could be a double: > > total_percent = total_size ? 100.0 * current_size / total_size : 0.0 ; > > and then just let fprintf do the rounding. Ok, done so. > I would bother rounding down < 100% to 100, because then you would get > > 1560/1492 MB (100\%, X MB/s) > > which is kind of silly. No, we cap the total_size to current_size so you won't see that (but total_size will potentially gradually increase). pg_basebackup has the same behaviour. > I'd still would use more precise than one second precision time so that > the speed displayed at the beginning is more realistic. eg I got on a > short test with probably in system cache files: > > 188/188 MB (100%, 188 MB/s) > > but the reality is that it took 0.126 seconds, and the speed was really > 1492 MB/s. Because I implemented I/O throttling for pg_checksums and needed it there anyway, I've switched it to the instr_time API now. It now updates the progress 10 times every second. So now I get |20/20 MB (100%, 1382 MB/s) for an empty cluster vs |20/20 MB (100%, 20 MB/s) with the previous version of the patch. New patch attached. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Attachment
Hallo Michael, >> I would bother rounding down < 100% to 100, because then you would get >> >> 1560/1492 MB (100\%, X MB/s) >> >> which is kind of silly. > > No, we cap the total_size to current_size so you won't see that (but > total_size will potentially gradually increase). pg_basebackup has the > same behaviour. Ok. > Because I implemented I/O throttling for pg_checksums Great! > New patch attached. Does not apply because of the renaming committed by Michaël. Could you rebase? -- Fabien.
On Wed, Mar 13, 2019 at 07:22:28AM +0100, Fabien COELHO wrote: > Does not apply because of the renaming committed by Michaël. > > Could you rebase? This stuff touches pg_checksums.c, so you may want to wait one day or two to avoid extra work... I think that I'll be able to finish the addition of the --enable and --disable switches by then. -- Michael
Attachment
Hello. At Wed, 13 Mar 2019 16:25:15 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20190313072515.GB2988@paquier.xyz> > On Wed, Mar 13, 2019 at 07:22:28AM +0100, Fabien COELHO wrote: > > Does not apply because of the renaming committed by Michaël. > > > > Could you rebase? > > This stuff touches pg_checksums.c, so you may want to wait one day or > two to avoid extra work... I think that I'll be able to finish the > addition of the --enable and --disable switches by then. > + /* Make sure we report at most once every tenth of a second */ > + if ((INSTR_TIME_GET_MILLISEC(now) - > + INSTR_TIME_GET_MILLISEC(last_progress_update) < 100) && !force) I'm not a skilled gamer and 10Hz update is a bit hard for my eyes:p The second hand of old clocks ticks at 4Hz. I think it is enough frequently. > 841/1516 MB (55%, 700 MB/s) Differently from other prgress reporting project, estimating ETC (estimated time to completion), which is in the most important, is quite easy. (pgbench does that.) And I'd like to see a progress bar there but it might be overdoing. I'm not sure let the current size occupy a part of screen width is needed. I don't think total size is needed to be fixed in MB and I think at most four or five digits are enough. (That is the same for speed.) If the all of aboves are involved, the line would look as the follows. [======================= ] ( 63% of 12.53 GB, 179 MB/s, ETC 26s) # Note that this is just an opinion. (pg_checksum runs fast at the beginning so ETC behaves somewhat strange in the meanwhile.) > +#define MEGABYTES 1048576 1024 * 1024 is preferable than that many digits. > + /* we handle SIGUSR1 only, and toggle the value of show_progress */ > + if (signum == SIGUSR1) > + show_progress = !show_progress; SIGUSR1 *toggles* progress. A garbage line is left alone after turning it off. It would be erasable. I'm not sure which is better, though. > > @@ -167,7 +255,7 @@ scan_directory(const char *basedir, const char *subdir) > if (strncmp(de->d_name, > PG_TEMP_FILES_DIR, > strlen(PG_TEMP_FILES_DIR)) == 0) > - return; > + continue; Why this patch changes the behavior for temprary directories? It seems like a bug fix of pg_checksums. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Thu, Mar 14, 2019 at 11:54:17AM +0900, Kyotaro HORIGUCHI wrote: > Why this patch changes the behavior for temprary directories? It > seems like a bug fix of pg_checksums. Oops, that's a thinko from 5c995139, so fixed. Note this has no actual consequence though as PG_TEMP_FILE_PREFIX and PG_TEMP_FILES_DIR have the same value. However a bug is a bug. -- Michael
Attachment
Hi, thanks for the additional review! Am Donnerstag, den 14.03.2019, 11:54 +0900 schrieb Kyotaro HORIGUCHI: > At Wed, 13 Mar 2019 16:25:15 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20190313072515.GB2988@paquier.xyz> > > On Wed, Mar 13, 2019 at 07:22:28AM +0100, Fabien COELHO wrote: > > > Does not apply because of the renaming committed by Michaël. > > > > > > Could you rebase? > > > > This stuff touches pg_checksums.c, so you may want to wait one day or > > two to avoid extra work... I think that I'll be able to finish the > > addition of the --enable and --disable switches by then. I have rebased it now. > > + /* Make sure we report at most once every tenth of a second */ > > + if ((INSTR_TIME_GET_MILLISEC(now) - > > + INSTR_TIME_GET_MILLISEC(last_progress_update) < 100) && !force) > > I'm not a skilled gamer and 10Hz update is a bit hard for my > eyes:p The second hand of old clocks ticks at 4Hz. I think it is > enough frequently. Ok. > > 841/1516 MB (55%, 700 MB/s) > > Differently from other prgress reporting project, estimating ETC > (estimated time to completion), which is in the most important, > is quite easy. (pgbench does that.) And I'd like to see a > progress bar there but it might be overdoing. I'm not sure let > the current size occupy a part of screen width is needed. I > don't think total size is needed to be fixed in MB and I think at > most four or five digits are enough. (That is the same for > speed.) > > If the all of aboves are involved, the line would look as the > follows. > > [======================= ] ( 63% of 12.53 GB, 179 MB/s, ETC 26s) > > # Note that this is just an opinion. > > (pg_checksum runs fast at the beginning so ETC behaves somewhat > strange in the meanwhile.) I haven't changed that for now as it seems to be a bit more involved. I'd like to hear other opinions on whether that is worthwhile? > > +#define MEGABYTES 1048576 > > 1024 * 1024 is preferable than that many digits. Good point, changed that way. > > + /* we handle SIGUSR1 only, and toggle the value of show_progress */ > > + if (signum == SIGUSR1) > > + show_progress = !show_progress; > > SIGUSR1 *toggles* progress. Not sure what you mean here, are you saying it should be called toggle_progress and not show_progress? I think it was like that originally but got changed due to review comments. > A garbage line is left alone after turning it off. It would be > erasable. I'm not sure which is better, though. How about just printing a "\n" in the signal handler if we are in a terminal? I think erasing the line might get too clever and will be a portability hazard? I have done that now in the attached patch, let me know what you think. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Attachment
> I have rebased it now. Thanks. Will look at it. >> If the all of aboves are involved, the line would look as the >> follows. >> >> [======================= ] ( 63% of 12.53 GB, 179 MB/s, ETC 26s) >> >> # Note that this is just an opinion. >> >> (pg_checksum runs fast at the beginning so ETC behaves somewhat >> strange in the meanwhile.) > > I haven't changed that for now as it seems to be a bit more involved. > I'd like to hear other opinions on whether that is worthwhile? I think that the bar is overkill, but ETC is easy and nice. >>> + /* we handle SIGUSR1 only, and toggle the value of show_progress */ >>> + if (signum == SIGUSR1) >>> + show_progress = !show_progress; >> >> SIGUSR1 *toggles* progress. > > Not sure what you mean here, Probably it is meant to simplify the comment? -- Fabien.
Hello. At Mon, 18 Mar 2019 23:14:01 +0100 (CET), Fabien COELHO <coelho@cri.ensmp.fr> wrote in <alpine.DEB.2.21.1903182311130.23282@lancre> > > > I have rebased it now. > > Thanks. Will look at it. > > >> If the all of aboves are involved, the line would look as the > >> follows. > >> > >> [======================= ] ( 63% of 12.53 GB, 179 MB/s, > >> ETC 26s) > >> > >> # Note that this is just an opinion. > >> > >> (pg_checksum runs fast at the beginning so ETC behaves somewhat > >> strange in the meanwhile.) > > > > I haven't changed that for now as it seems to be a bit more involved. > > I'd like to hear other opinions on whether that is worthwhile? > I think that the bar is overkill, but ETC is easy and nice. I'm fine with that. > >>> + /* we handle SIGUSR1 only, and toggle the value of show_progress > >>> */ > >>> + if (signum == SIGUSR1) > >>> + show_progress = !show_progress; > >> > >> SIGUSR1 *toggles* progress. > > > > Not sure what you mean here, > > Probably it is meant to simplify the comment? Sorry. I meant that "it can be turned off and a perhaps-garbage line is left alone after turning off. Don't we need to erase it?". regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Hi, Am Dienstag, den 19.03.2019, 09:04 +0900 schrieb Kyotaro HORIGUCHI: > At Mon, 18 Mar 2019 23:14:01 +0100 (CET), Fabien COELHO <coelho@cri.ensmp.fr> wrote in <alpine.DEB.2.21.1903182311130.23282@lancre> > > >>> + /* we handle SIGUSR1 only, and toggle the value of show_progress > > >>> */ > > >>> + if (signum == SIGUSR1) > > >>> + show_progress = !show_progress; > > >> > > >> SIGUSR1 *toggles* progress. > > > > > > Not sure what you mean here, > > > > Probably it is meant to simplify the comment? > > Sorry. I meant that "it can be turned off and a perhaps-garbage > line is left alone after turning off. Don't we need to erase it?". The current version prints a newline when it progress reporting is toggled off. Do you mean there is a hazard that this happens right when we are printing the progress, so end up with a partly garbage line? I don't think that'd be so bad to warrant further code complexitiy, after all, the user explicitly wanted the progress to stop. But maybe I am misunderstanding? Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
On Fri, Mar 22, 2019 at 02:23:17PM +0100, Michael Banck wrote: > The current version prints a newline when it progress reporting is > toggled off. Do you mean there is a hazard that this happens right when > we are printing the progress, so end up with a partly garbage line? I > don't think that'd be so bad to warrant further code complexitiy, after > all, the user explicitly wanted the progress to stop. > > But maybe I am misunderstanding? The latest patch does not apply because of mainly ed308d7. Could you send a rebase? FWIW, I would remove the signal toggling stuff from the patch and keep the logic simple at this stage. Not having a solution which works on Windows is perhaps not a strong reason to not include it, but it's a sign that we could perhaps design something better, and that's annoying. Personally I think that I would just use --progress all the time and not use the signaling part at all, my 2c. -- Michael
Attachment
Hi, Am Samstag, den 23.03.2019, 10:49 +0900 schrieb Michael Paquier: > On Fri, Mar 22, 2019 at 02:23:17PM +0100, Michael Banck wrote: > > The current version prints a newline when it progress reporting is > > toggled off. Do you mean there is a hazard that this happens right when > > we are printing the progress, so end up with a partly garbage line? I > > don't think that'd be so bad to warrant further code complexitiy, after > > all, the user explicitly wanted the progress to stop. > > > > But maybe I am misunderstanding? > > The latest patch does not apply because of mainly ed308d7. Could you > send a rebase? I've rebased it now, see attached. > FWIW, I would remove the signal toggling stuff from the patch and keep > the logic simple at this stage. Not having a solution which works on > Windows is perhaps not a strong reason to not include it, but it's a > sign that we could perhaps design something better, and that's > annoying. Personally I think that I would just use --progress all the > time and not use the signaling part at all, my 2c. We had this discussion already, and back then I was against dropping it as well. Do you see --progress being the default? I don't think this is in-line with other project binaries, so I guess not? If not, I guess your main point is that we might come up with a better implementation/UI further down the road. Which is fair enough, but I don't think this is such an in-your-face feature that changing the way it works in future releases would be a terrible breakage of backwards compatibility. Even more so as it is solely an interactive feature and cannot be used in scripts etc. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Attachment
Hallo Michael, About patch v12: Patch applies cleanly, compiles. make check ok, but feature is not tested. Doc build ok. Although I'm in favor of it, I'm not sure that the signal think will make it for 12. Maybe it is worth compromising, doing a simple version for now, and resubmitting the signal feature later? ISTM that someone suggested 4 Hz was the good eye speed, but you wait for 400 ms, which is 2.5 Hz. What about it? I seems that current_size and total_size are not in the same unit: + if (current_size > total_size) + total_size = current_size / MEGABYTES; But they should both really be bytes, always. I think that the computations should be inverted: - first adjust total_size to current_size if needed - then compute percent - remove the percent adjustement as it is <= 100 since current_size <= total_size I still think that the speed should compute a double to avoid integer rounding errors within the computation. ISTM that rounding should only be done for display in the end. I'm okay with calling the report on each file even if this means every few GB... Someone suggested ETA, and it seems rather simple to do. What about adding it? -- Fabien.
Hi, thanks again for your review! Am Mittwoch, den 27.03.2019, 15:34 +0100 schrieb Fabien COELHO: > Hallo Michael, > > About patch v12: > > Patch applies cleanly, compiles. make check ok, but feature is not tested. > Doc build ok. > > Although I'm in favor of it, I'm not sure that the signal think will make > it for 12. Maybe it is worth compromising, doing a simple version for now, > and resubmitting the signal feature later? Let's wait one more round. I think that feature is quite useful and isolated enough that it could stay, but I'll remove it for the next patch version if needed. > ISTM that someone suggested 4 Hz was the good eye speed, but you wait for > 400 ms, which is 2.5 Hz. What about it? Uh right, I messed that up, fixed. > I seems that current_size and total_size are not in the same unit: > > + if (current_size > total_size) > + total_size = current_size / MEGABYTES; > > But they should both really be bytes, always. Yeah, they are both bytes and the "/ MEGABYTES" is wrong, removed. > I think that the computations should be inverted: > - first adjust total_size to current_size if needed > - then compute percent > - remove the percent adjustement as it is <= 100 > since current_size <= total_size Ok, done so. > I still think that the speed should compute a double to avoid integer > rounding errors within the computation. ISTM that rounding should only be > done for display in the end. Ok, also done this way. I decided to print only full MB and not any further digits as those don't seem very relevant. > I'm okay with calling the report on each file even if this means every few > GB... For my I/O throttling branch I've backed off to only call the report every 100 blocks (and at the end of the file). I've added that logic to this patch as well. > Someone suggested ETA, and it seems rather simple to do. What about > adding it? I don't know, just computing the ETA is easy of course. But you'd have to fiddle with seconds/minutes/hours conversions cause the runtime could be hours. Then, you don't want to have it bumping around weirdly when your I/O rate changes, cf. https://xkcd.com/612/ That doesn't sounds simpler than the signal trigger we might get rid of. I have attached a POC which just prints the remaining seconds. If somebody wants to contribute a full implementation as a co-author, I'm happy to include it. Otherwise, I might take a stab at it over the next days, but it is not on the top of my TODO list. New patch attached. I've also changed the reporting line so that it prints a couple of blanks at the end cause I noticed that if the previously reported line was longer than the current line, then the end of it is still visible. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Attachment
>> I still think that the speed should compute a double to avoid integer >> rounding errors within the computation. ISTM that rounding should only be >> done for display in the end. > > Ok, also done this way. I decided to print only full MB and not any > further digits as those don't seem very relevant. For the computation to be in double, you must convert to double somewhere in the formula, otherwise it is computed as ints and only cast as a double afterwards. Maybe: current_speed = (1000.0 / MEGABYTES) * current_size / elapsed; Or anything which converts to double early. Instead of casting percent to int, maybe use "%.0f" as well, just like current_speed? + if ((blockno % 100 == 0) && show_progress) I'd invert the condition to avoid a modulo when not needed. I'm not very found of global variables, but it does not matter here and doing it differently would involve more code. It would matter though if the progress report would be shared between commands. >> I'm okay with calling the report on each file even if this means every few >> GB... > > For my I/O throttling branch I've backed off to only call the report > every 100 blocks (and at the end of the file). I've added that logic to > this patch as well. The 100 blocks = 800 KiB looks arbitrary. What is the rational? ISTM that postgres will tend to store a power of two number of pages on full segments, so maybe there could be a rational for 1024. Not sure. >> Someone suggested ETA, and it seems rather simple to do. What about >> adding it? > > I don't know, just computing the ETA is easy of course. But you'd have > to fiddle with seconds/minutes/hours conversions cause the runtime could > be hours. Then, you don't want to have it bumping around weirdly when > your I/O rate changes, cf. https://xkcd.com/612/ That doesn't sounds > simpler than the signal trigger we might get rid of. Ok, it is more complicated that it looks for large sizes if second is not the right display unit. -- Fabien.
Hi, Am Mittwoch, den 27.03.2019, 21:31 +0100 schrieb Fabien COELHO: > > > I still think that the speed should compute a double to avoid integer > > > rounding errors within the computation. ISTM that rounding should only be > > > done for display in the end. > > > > Ok, also done this way. I decided to print only full MB and not any > > further digits as those don't seem very relevant. > > For the computation to be in double, you must convert to double somewhere > in the formula, otherwise it is computed as ints and only cast as a double > afterwards. Maybe: > > current_speed = (1000.0 / MEGABYTES) * current_size / elapsed; > > Or anything which converts to double early. Are you sure, seeing elapsed is a double already? If I print out two additional fractional digits for current_speed, I get two non-zero numbers, are those garbage? Anyway, I've done that now as it doesn't hurt. > Instead of casting percent to int, maybe use "%.0f" as well, just like > current_speed? Ok. > + if ((blockno % 100 == 0) && show_progress) > > I'd invert the condition to avoid a modulo when not needed. Ok. > > > I'm okay with calling the report on each file even if this means every few > > > GB... > > > > For my I/O throttling branch I've backed off to only call the report > > every 100 blocks (and at the end of the file). I've added that logic to > > this patch as well. > > The 100 blocks = 800 KiB looks arbitrary. What is the rational? ISTM that > postgres will tend to store a power of two number of pages on full > segments, so maybe there could be a rational for 1024. Not sure. It was arbitrary. I've changed it to 1024 now which looks a bit better. > > > Someone suggested ETA, and it seems rather simple to do. What about > > > adding it? > > > > I don't know, just computing the ETA is easy of course. But you'd have > > to fiddle with seconds/minutes/hours conversions cause the runtime could > > be hours. Then, you don't want to have it bumping around weirdly when > > your I/O rate changes, cf. https://xkcd.com/612/ That doesn't sounds > > simpler than the signal trigger we might get rid of. > > Ok, it is more complicated that it looks for large sizes if second is not > the right display unit. Right, new version attached. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Attachment
Hallo Michael, >> Or anything which converts to double early. > Are you sure, seeing elapsed is a double already? Argh, I missed that. You are right that a double elapsed is enough for the second part. However, with + current_speed = (current_size / MEGABYTES) / (elapsed / 1000.0); the first division is an integer one because both operands are ints, so megabytes conversion is rounded down. I'd suggest: + current_speed = ((double) current_size / MEGABYTES) / (elapsed / 1000); > If I print out two additional fractional digits for current_speed, I get > two non-zero numbers, are those garbage? Nope, they come from the other divisions, but the fist division will have wipped out 0.5 MiB on average. >> Ok, it is more complicated that it looks for large sizes if second is not >> the right display unit. > > Right, new version attached. Applies, compiles, global & local "make check" (although not tested) ok, doc build ok, manual tests ok. Otherwise a very minor comment: I'd invert !force and the computations in the return condition to avoid the computations when not needed. -- Fabien.
Hi, Am Donnerstag, den 28.03.2019, 09:41 +0100 schrieb Fabien COELHO: > Hallo Michael, > > > > Or anything which converts to double early. > > Are you sure, seeing elapsed is a double already? > > > Argh, I missed that. You are right that a double elapsed is enough for the > second part. However, with > > + current_speed = (current_size / MEGABYTES) / (elapsed / 1000.0); > > the first division is an integer one because both operands are ints, so > megabytes conversion is rounded down. I'd suggest: > > + current_speed = ((double) current_size / MEGABYTES) / (elapsed / 1000); Ok. > > > Ok, it is more complicated that it looks for large sizes if second is not > > > the right display unit. > > > > Right, new version attached. > > Applies, compiles, global & local "make check" (although not tested) ok, > doc build ok, manual tests ok. Thanks. > Otherwise a very minor comment: I'd invert !force and the computations in > the return condition to avoid the computations when not needed. The force is only ever true right at the end of the program so it would not save anything really and detract from the main gist of that statement, so I left it as-is. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Attachment
>> Otherwise a very minor comment: I'd invert !force and the computations in >> the return condition to avoid the computations when not needed. > > The force is only ever true right at the end of the program so it would > not save anything really and detract from the main gist of that > statement, so I left it as-is. Ok. I marked it as ready, but I'd advise that you split it in (1) progress and (2) signal toggling so that the first part is more likely to make it before 12 freeze. -- Fabien.
Hi, Am Donnerstag, den 28.03.2019, 10:08 +0100 schrieb Fabien COELHO: > I marked it as ready, Thanks! > but I'd advise that you split it in (1) progress and (2) signal > toggling so that the first part is more likely to make it before 12 > freeze. Ok, done so in the attached. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Attachment
Hallo Michael, >> but I'd advise that you split it in (1) progress and (2) signal >> toggling so that the first part is more likely to make it before 12 >> freeze. > > Ok, done so in the attached. Fine. I think that it is good to show the overall impact of the signal stuff, in particular the fact that the size must always be computed if the progress may be activated. Also, I'd suggest to add "volatile" on the show_progress variable in the second patch. -- Fabien.
On Thu, Mar 28, 2019 at 03:53:59PM +0100, Fabien COELHO wrote: > I think that it is good to show the overall impact of the signal stuff, in > particular the fact that the size must always be computed if the progress > may be activated. Getting to know the total size and the current size are the two important factors that matter when it comes to do progress reporting in my opinion. I have read the patch, and I am not really convinced by the need to show the progress report based on an interval of 250ms as we talk about an operation which could take dozens of minutes. So I have simplified the patch to only show a progress report every second. This also removes the include for the time-related APIs from portability/. A second thing is that I don't think that the speed is much useful. I would expect the speed to be steady, still there is a risk to show incorrect information if the speed of the operation is spiky or irregular leading to an incorrect estimation of the remaining time. In short, I would like to commit the first patch as attached, which is much more simple than what has been sent previously, still it provides the progress information which is useful. -- Michael
Attachment
Bonjour Michaël, > Getting to know the total size and the current size are the two > important factors that matter when it comes to do progress reporting > in my opinion. I have read the patch, and I am not really convinced > by the need to show the progress report based on an interval of 250ms > as we talk about an operation which could take dozens of minutes. I do not think that it matters. I like to see things moving, and the performance impact is null. > So I have simplified the patch to only show a progress report every > second. This also removes the include for the time-related APIs from > portability/. I do not think that it is a good idea, because Michael is thinking of adding some throttling capability, which would be a very good thing, but which will need something precise, so better use the precise stuff from the start. Also, the per second stuff induces rounding effects at the beginning. > A second thing is that I don't think that the speed is much useful. Hmmm. I like this information because I this is where I have expectations, whereas I'm not sure whether 1234 seconds for 12.3 GB is good or bad, but I know that 10 MB/s on my SSD is not very good. > I would expect the speed to be steady, still there is a risk to show > incorrect information if the speed of the operation is spiky or > irregular leading to an incorrect estimation of the remaining time. Hmmm. That is life, I'd say I'm used to it. > In short, I would like to commit the first patch as attached, which is > much more simple than what has been sent previously, still it provides > the progress information which is useful. I would prefer that you would keep the patch with the initial precision & features for the reasons outlined above, but you are the committer and I'm only a reviewer. -- Fabien.
Hi, so we are basically back at the original patch as written by Bernd :) > +/* > + * Report current progress status. Parts borrowed from > + * PostgreSQLs' src/bin/pg_basebackup.c > + */ > +static void > +progress_report(bool force) > +{ > + int percent; > + char total_size_str[32]; > + char current_size_str[32]; > + pg_time_t now; > + > + if (!showprogress) > + return; The way you've now changed this is that there's a function call into progress_report() for every block that's being read, even if there is no progress reporting requested. That looks like a pretty severe performance problem so I suggest to at least stick with checking showprogress before calling progress_report() and not the other way round. I guess there is still some noticeable performance penalty to pay when reporting progress, even more so now that you reverted to only doing it once per second - if we report progress, we presumably call report_progress() thousands of times and return early 99,9% of the time. So my vote is in favour of only calling progress_report() every once in a while - I saw quite a speedup (or removal of slowdown) due to this in my tests, this was not just some unwarranted microoptimization. > + fprintf(stderr, "\r"); I think the isatty() check that was in our versions of the patch is useful here, did you check how this looks when piping the output to a file? > @@ -195,12 +254,23 @@ scan_file(const char *fn, BlockNumber segmentno) > _("%s: checksums enabled in file \"%s\"\n"), progname, fn); > } > > + /* Make sure progress is reported at least once per file */ > + progress_report(false); > + > close(f); > } This hunk is from the performance optimization of calling progress_report only every 1024 blocks and could be removed if we stick with calling it for every block anyway (but see above). > -static void > -scan_directory(const char *basedir, const char *subdir) > +/* > + * Scan the given directory for items which can be checksummed and > + * operate on each one of them. If "sizeonly" is true, the size of > + * all the items which have checksums is computed and returned back "computed" is maybe a bit strong a word here, how about "added up"? Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
On Sat, Mar 30, 2019 at 09:07:41PM +0100, Michael Banck wrote: > The way you've now changed this is that there's a function call into > progress_report() for every block that's being read, even if there is no > progress reporting requested. That looks like a pretty severe > performance problem so I suggest to at least stick with checking > showprogress before calling progress_report() and not the other way > round. > > So my vote is in favour of only calling progress_report() every once in > a while - I saw quite a speedup (or removal of slowdown) due to this in > my tests, this was not just some unwarranted microoptimization. Do you have some runtime numbers? If all the pages are in the OS cache you should be able to see more impact. I have been testing with a pgbench database at scale 300 and --check, and perf is showing me that progress_report counts for 0.10% with or without the option of the profile while pg_checksum_page() counts for 36%. Switching the switch in or out of it makes the performance change lost in the noise. I'd rather keep the check for showprogress within progress_report() so as extra callers don't miss bypassing the report if --progress is not enabled, still we are talking about only one caller now so the attached does it the other way around with an assertion. >> + fprintf(stderr, "\r"); > > I think the isatty() check that was in our versions of the patch is > useful here, did you check how this looks when piping the output to a > file? Oops, fixed. The output was strange. > This hunk is from the performance optimization of calling > progress_report only every 1024 blocks and could be removed if we stick > with calling it for every block anyway (but see above). Indeed, removed this one. >> -static void >> -scan_directory(const char *basedir, const char *subdir) >> +/* >> + * Scan the given directory for items which can be checksummed and >> + * operate on each one of them. If "sizeonly" is true, the size of >> + * all the items which have checksums is computed and returned back > > "computed" is maybe a bit strong a word here, how about "added up"? "computed" sounds fine to me. -- Michael
Attachment
On Sat, Mar 30, 2019 at 06:52:56PM +0100, Fabien COELHO wrote: > I do not think that it matters. I like to see things moving, and the > performance impact is null. Another point is that this bloats the logs redirected to a file by 4 compared to the initial proposal. I am not sure that this helps much for anybody. > I do not think that it is a good idea, because Michael is thinking of adding > some throttling capability, which would be a very good thing, but which will > need something precise, so better use the precise stuff from the start. > Also, the per second stuff induces rounding effects at the beginning. Let's revisit that when the need shows up then. I'd rather have us start with a basic set of metrics which can be extended later on. > Hmmm. I like this information because I this is where I have expectations, > whereas I'm not sure whether 1234 seconds for 12.3 GB is good or bad, but I > know that 10 MB/s on my SSD is not very good. Well, with some progress generated once per second you are one substraction away to guess how much has been computed in the last N second... -- Michael
Attachment
Bonjour Michaël, >> I do not think that it matters. I like to see things moving, and the >> performance impact is null. > > Another point is that this bloats the logs redirected to a file by 4 > compared to the initial proposal. I am not sure that this helps > much for anybody. Hmmm. Progress is more an interactive feature where the previous result is overriden thanks to the \r. Maybe it should be -P X where X is the expected delay in seconds. Pgbench progress reporting on initialization basically outputs 10 rows per second, probably it is too much. >> I do not think that it is a good idea, because Michael is thinking of adding >> some throttling capability, which would be a very good thing, but which will >> need something precise, so better use the precise stuff from the start. >> Also, the per second stuff induces rounding effects at the beginning. > > Let's revisit that when the need shows up then. I'd rather have us > start with a basic set of metrics which can be extended later on. I do not see why it would be better to do it roughly if it is already implemented precisely and nicely. >> Hmmm. I like this information because I this is where I have expectations, >> whereas I'm not sure whether 1234 seconds for 12.3 GB is good or bad, but I >> know that 10 MB/s on my SSD is not very good. > > Well, with some progress generated once per second you are one > substraction away to guess how much has been computed in the last N > second... I would prefer to have the speed simply printed out. The per second or more thing is debatable, but for the other changes I do not think that they improve the feature much. As I said, I'm only a reviewer, you do as you feel. -- Fabien.
Hi, Am Montag, den 01.04.2019, 15:10 +0900 schrieb Michael Paquier: > On Sat, Mar 30, 2019 at 09:07:41PM +0100, Michael Banck wrote: > > The way you've now changed this is that there's a function call into > > progress_report() for every block that's being read, even if there is no > > progress reporting requested. That looks like a pretty severe > > performance problem so I suggest to at least stick with checking > > showprogress before calling progress_report() and not the other way > > round. > > > > So my vote is in favour of only calling progress_report() every once in > > a while - I saw quite a speedup (or removal of slowdown) due to this in > > my tests, this was not just some unwarranted microoptimization. > > Do you have some runtime numbers? I had another look and I don't see any slowdown with your patch. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
On Mon, Apr 01, 2019 at 08:51:03PM +0200, Michael Banck wrote: > I had another look and I don't see any slowdown with your patch. I noticed one slowdown when using --disable --progress as this was scanning the data directory for the total size but we don't need it in this case. Fixed that and committed. -- Michael
Attachment
On Mon, Apr 01, 2019 at 07:26:00PM +0200, Fabien COELHO wrote: > Hmmm. Progress is more an interactive feature where the previous result is > overriden thanks to the \r. Well, many people also redirect the output for such things. > Maybe it should be -P X where X is the expected > delay in seconds. Pgbench progress reporting on initialization basically > outputs 10 rows per second, probably it is too much. I cannot say for pgbench. I personally think that's a lot but you are the one who wrote it as such I guess. > I do not see why it would be better to do it roughly if it is already > implemented precisely and nicely. Simple things can be extended later on, while complicated things cannot, and we don't have similar metrics for other tools which may make sense for them to have (not pg_rewind, but pg_basebackup). Please note that progress reports on the backend also include total amount of data to process vs current amount of data processed, which is reliable output. The speed may be nice, but it is easy enough to see in an output file where things get stuck even if there is no speed showing up (or maybe just the difference with the last progress makes more sense to have?). -- Michael
Attachment
Bonjour Michaël, >> Maybe it should be -P X where X is the expected >> delay in seconds. Pgbench progress reporting on initialization basically >> outputs 10 rows per second, probably it is too much. > > I cannot say for pgbench. I personally think that's a lot but you are > the one who wrote it as such I guess. Nope, this was way before I intervened. ISTM that a patch was submitted to get per second or slower progress reporting on the initalization, and it was rejected. Now that there are many SSD, maybe I could bring it back. An issue with pbbench is that the reported time and progress is for the insertion phase only, but PK and other FK declaration take as much time and are not included, so I'm not sure it can be much better. For pg_checksums, probably some improvement patch will be submitted later, if someone feels like it. -- Fabien.
On Tue, Apr 02, 2019 at 08:11:45AM +0200, Fabien COELHO wrote: > Nope, this was way before I intervened. ISTM that a patch was submitted to > get per second or slower progress reporting on the initalization, and it was > rejected. Now that there are many SSD, maybe I could bring it back. An issue > with pbbench is that the reported time and progress is for the insertion > phase only, but PK and other FK declaration take as much time and are not > included, so I'm not sure it can be much better. Bonus idea: integrate automatically with pg_stat_progress_vacuum() for the last vacuum at the end of initialization. This step can also take a very long time and one can only know the progress of the VACUUM with a second session/terminal. > For pg_checksums, probably some improvement patch will be submitted later, > if someone feels like it. Let's see. I think that what we have now in v12 is good enough for checksum operations on an offline cluster. And my take is that we should focus more on online checksum verification for v13 and future versions. Regarding all this tooling around checksums. With v12, enabling checksums with no actual downtime is doable with a primary-standby deployment using physical replication and one planned failover (possible as well with v10 and newer versions and logical replication but that takes longer), so I think that there is not much value in being able to enable checksums on a single node while it is online. -- Michael
Attachment
>> For pg_checksums, probably some improvement patch will be submitted later, >> if someone feels like it. > > Let's see. I think that what we have now in v12 is good enough for > checksum operations on an offline cluster. And my take is that we > should focus more on online checksum verification for v13 and future > versions. I agree. For online, we should want throttling so that the check can have a reduced performance impact when scrubbing. -- Fabien.
Hi, Am Dienstag, den 02.04.2019, 15:56 +0900 schrieb Michael Paquier: > Regarding all this tooling around checksums. With v12, enabling > checksums with no actual downtime is doable with a primary-standby > deployment using physical replication and one planned failover Can you explain in more detail how this would work? I thought we came to the conclusion (and the documentation seems to indicate so), that you should stop all participating instances of a cluster and then enable checksums on all of them, which would impose a downtime. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
On Tue, Apr 02, 2019 at 04:02:59PM +0200, Michael Banck wrote: > Can you explain in more detail how this would work? I thought we came to > the conclusion (and the documentation seems to indicate so), that you > should stop all participating instances of a cluster and then enable > checksums on all of them, which would impose a downtime. That's what I was pointing out here: https://www.postgresql.org/message-id/20190320225924.GC20192@paquier.xyz I still agree about keeping in the docs safer recommendations, in the shape of the ones currently present, than what I mentioned on the other thread. -- Michael
Attachment
On Tue, Apr 02, 2019 at 10:10:34AM +0200, Fabien COELHO wrote: > For online, we should want throttling so that the check can have a reduced > performance impact when scrubbing. Yes. Throttling is a necessary property in my opinion, perhaps in combination with some autovacuum-like options to only trigger the checks for a relation after a certain amout of activity has happened on it. -- Michael