Thread: [HACKERS] gitlab post-mortem: pg_basebackup waiting for checkpoint
Hi,

one take-away from the Gitlab Post-Mortem[1] appears to be that after their secondary lost replication, they were confused about what pg_basebackup was doing when they tried to rebuild it. It just sat there and did nothing (even with --verbose), so they assumed something was wrong with either the primary or the connection, and restarted it several times.

AFAICT, it turns out the checkpoint was written on the master (they probably did not use -c fast), but this wasn't obvious to them:

"One of the engineers went to the secondary and wiped the data directory, then ran pg_basebackup. Unfortunately pg_basebackup would hang, producing no meaningful output, despite the --verbose option being set."

[...]

"Unfortunately this did not resolve the problem of pg_basebackup not starting replication immediately. One of the engineers decided to run it with strace to see what it was blocking on. strace showed that pg_basebackup was hanging in a poll call, but that did not provide any other meaningful information that might have explained why."

[...]

"It would later be revealed by another engineer (who wasn't around at the time) that this is normal behavior: pg_basebackup will wait for the primary to start sending over replication data and it will sit and wait silently until that time. Unfortunately this was not clearly documented in our engineering runbooks nor in the official pg_basebackup document."

ISTM that even with WAL streaming, nothing would be written on the client server until the checkpoint is complete, as do_pg_start_backup() runs the checkpoint and only returns the starting WAL location afterwards.

The attached (untested) patch is to kick off a discussion on how to improve the situation. It is supposed to mention the checkpoint when --verbose is used, and it adds a paragraph about the checkpoint being run to the Notes section of the documentation.
Michael

[1] https://about.gitlab.com/2017/02/10/postmortem-of-database-outage-of-january-31/

-- 
Michael Banck
Project Manager / Senior Consultant
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
VAT ID: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Managing Directors: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
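The ordering described above (checkpoint first, start WAL location second, data after that) can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not pg_basebackup's code: `fake_primary` and `fake_client` are hypothetical stand-ins, the LSN and file name are made up, and `time.sleep()` stands in for the checkpoint. It only shows that the client sits blocked in a receive call (the `poll()` the GitLab engineers saw in strace) and has nothing to write locally until the primary finishes the checkpoint.

```python
import queue
import threading
import time


def fake_primary(outbox: queue.Queue, checkpoint_seconds: float) -> None:
    """Hypothetical primary: checkpoint first, only then send the start WAL location."""
    time.sleep(checkpoint_seconds)            # stands in for the (spread) checkpoint
    outbox.put(("start_lsn", "0/2000028"))    # made-up LSN, available only now
    outbox.put(("file", "base/1/1234"))       # made-up data file name
    outbox.put(("done", None))


def fake_client(checkpoint_seconds: float = 0.2):
    """Hypothetical client: return (seconds before the first message, message kinds)."""
    inbox: queue.Queue = queue.Queue()
    threading.Thread(target=fake_primary, args=(inbox, checkpoint_seconds),
                     daemon=True).start()
    started = time.monotonic()
    waited = None
    kinds = []
    while True:
        kind, _payload = inbox.get()          # blocks silently, like the poll() in strace
        if waited is None:
            waited = time.monotonic() - started   # nothing could be written locally before this
        kinds.append(kind)
        if kind == "done":
            return waited, kinds


if __name__ == "__main__":
    waited, kinds = fake_client()
    print(f"first message after {waited:.2f}s, then: {kinds}")
```

With a real spread checkpoint the silent phase is minutes rather than a fraction of a second, which is what made the process look hung.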
On Sat, Feb 11, 2017 at 10:38 AM, Michael Banck <michael.banck@credativ.de> wrote:
> Hi,
>
> one take-away from the Gitlab Post-Mortem[1] appears to be that after
> their secondary lost replication, they were confused about what
> pg_basebackup was doing when they tried to rebuild it. It just sat there
> and did nothing (even with --verbose), so they assumed something was
> wrong with either the primary or the connection, and restarted it
> several times.
>
> AFAICT, it turns out the checkpoint was written on the master (they
> probably did not use -c fast), but this wasn't obvious to them:

Yeah, I've seen this happen to a number of people. I think that sounds like what's happened here as well. I've considered things along the lines of the patch you posted, but never got around to actually doing anything about it.

> ISTM that even with WAL streaming, nothing would be written on the
> client server until the checkpoint is complete, as do_pg_start_backup()
> runs the checkpoint and only returns the starting WAL location
> afterwards.
>
> The attached (untested) patch is to kick off a discussion on how to
> improve the situation, it is supposed to mention the checkpoint when
> --verbose is used and adds a paragraph about the checkpoint being run to
> the Notes section of the documentation.

Docs look good to me, other than claiming that pg_basebackup runs on a server (it can run anywhere). I would just say "during which pg_basebackup will appear idle". How does that sound to you?
Hi,

On Saturday, 11.02.2017, 11:07 +0100, Magnus Hagander wrote:
> As for the code, while I haven't tested it, isn't the "checkpoint
> completed" message in the wrong place? Doesn't PQsendQuery() complete
> immediately, and the check needs to be put *after* the PQgetResult()
> call?

I guess you're right, I've moved it further down. There is in fact a message about the xlog location (unless you switch off wal entirely), but having another one right before that mentioning the completed checkpoint sounds ok to me.

There are also some inconsistencies around which messages are prepended with "pg_basebackup: " and which are translatable; I guess all messages printed on --verbose should be translatable? Also, as almost all messages have a "pg_basebackup: " prefix, I've added it to the rest.

Michael
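Magnus's point about message placement is the usual asynchronous split in libpq: PQsendQuery() returns as soon as the command is dispatched, while PQgetResult() blocks until the server's answer arrives (here, only after the checkpoint). The following is a Python analogy only, assuming `concurrent.futures` as a stand-in for libpq: `submit()` plays PQsendQuery() and `result()` plays PQgetResult(). `run_checkpoint`, the LSN and the message strings are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_checkpoint() -> str:
    """Hypothetical server side: checkpoint, then hand back a start WAL location."""
    time.sleep(0.1)          # pretend the checkpoint takes a while
    return "0/2000028"       # made-up LSN


def start_backup(log: list) -> str:
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(run_checkpoint)    # ~ PQsendQuery(): returns immediately
        log.append("waiting for checkpoint")     # fine here: we are about to block
        # Logging "checkpoint completed" at this point would be premature,
        # because the checkpoint may well still be running.
        lsn = pending.result()                   # ~ PQgetResult(): blocks until the answer arrives
        log.append("checkpoint completed")       # correct place: the result is in
        log.append(f"start point: {lsn}")
    return lsn
```

The design point carries over directly: any "done" message belongs after the blocking call that actually observes completion, not after the call that merely sends the request.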
Hi,

On Saturday, 11.02.2017, 11:25 +0100, Michael Banck wrote:
> On Saturday, 11.02.2017, 11:07 +0100, Magnus Hagander wrote:
>> As for the code, while I haven't tested it, isn't the "checkpoint
>> completed" message in the wrong place? Doesn't PQsendQuery() complete
>> immediately, and the check needs to be put *after* the PQgetResult()
>> call?
>
> I guess you're right, I've moved it further down. There is in fact a
> message about the xlog location (unless you switch off wal entirely),
> but having another one right before that mentioning the completed
> checkpoint sounds ok to me.
>
> There are also some inconsistencies around which messages are prepended
> with "pg_basebackup: " and which are translatable; I guess all messages
> printed on --verbose should be translatable? Also, as almost all
> messages have a "pg_basebackup: " prefix, I've added it to the rest.

Sorry, there were two typos in the last patch, I've attached a fixed one.

Michael
On 2/11/17 4:36 AM, Michael Banck wrote:
> I guess you're right, I've moved it further down. There is in fact a
> message about the xlog location (unless you switch off wal entirely),
> but having another one right before that mentioning the completed
> checkpoint sounds ok to me.

1) I don't think this should be verbose output. Having a program sit there "doing nothing" for no apparent reason is just horrible UI design.

2) I think it'd be useful to have a way to get the status of a running checkpoint. The checkpointer already has that info, and I think it might even be in shared memory already. If there was a function that reported checkpoint status, pg_basebackup could poll that to provide users with live status. That should be a separate patch though.

-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)
On Mon, Feb 13, 2017 at 3:29 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 2/11/17 4:36 AM, Michael Banck wrote:
>> I guess you're right, I've moved it further down. There is in fact a
>> message about the xlog location (unless you switch off wal entirely),
>> but having another one right before that mentioning the completed
>> checkpoint sounds ok to me.
>
> 1) I don't think this should be verbose output. Having a program sit there "doing nothing" for no apparent reason is just horrible UI design.

That would include much of Unix then... For example if I run "cp" on a large file it sits around "doing nothing". Same if I do "tar". No?

> 2) I think it'd be useful to have a way to get the status of a running checkpoint. The checkpointer already has that info, and I think it might even be in shared memory already. If there was a function that reported checkpoint status, pg_basebackup could poll that to provide users with live status. That should be a separate patch though.

I agree that this would definitely be useful. But it might be something that's better exposed as a server-side view?

(and if pg_basebackup could poll it, it would probably still not be included by default -- only if -P was given).
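Jim's idea of a checkpoint-status source that pg_basebackup could poll (shown only with -P, per Magnus) might look roughly like this on the client side. It is a hypothetical Python sketch: no such status function is assumed to exist, so `get_status` is an injected callable returning `(done, fraction)`, with `fraction` set to `None` when only a coarse "still checkpointing" answer is available.

```python
import time
from typing import Callable, Optional, Tuple

Status = Tuple[bool, Optional[float]]   # (checkpoint finished?, fraction done if known)


def wait_for_checkpoint(get_status: Callable[[], Status],
                        show: Callable[[str], None] = print,
                        interval: float = 1.0) -> int:
    """Poll a hypothetical checkpoint-status source; return how many polls it took."""
    polls = 0
    while True:
        done, fraction = get_status()
        polls += 1
        if done:
            show("checkpoint completed")
            return polls
        if fraction is None:
            show("waiting for checkpoint")                      # coarse report only
        else:
            show(f"waiting for checkpoint ({fraction:.0%})")    # fine-grained report
        time.sleep(interval)
```

For example, a fake source yielding `(False, None)`, `(False, 0.5)`, `(True, 1.0)` produces a coarse line, a "50%" line, and a completion line. Falling back to the coarse message keeps the loop useful even if the server cannot report progress in detail.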
Hi,

On Monday, 13.02.2017, 09:31 +0100, Magnus Hagander wrote:
> On Mon, Feb 13, 2017 at 3:29 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
> wrote:
>> On 2/11/17 4:36 AM, Michael Banck wrote:
>>> I guess you're right, I've moved it further down. There is in fact a
>>> message about the xlog location (unless you switch off wal entirely),
>>> but having another one right before that mentioning the completed
>>> checkpoint sounds ok to me.
>>
>> 1) I don't think this should be verbose output. Having a
>> program sit there "doing nothing" for no apparent reason is
>> just horrible UI design.
>
> That would include much of Unix then.. For example if I run "cp" on a
> large file it sits around "doing nothing". Same if I do "tar". No?

The expectation for all three commands is that, even if there is no output on stdout, they will write data to the local machine. So you can easily monitor the progress of cp and tar by running du or something in a different terminal.

With pg_basebackup, nothing is happening on the local machine until the checkpoint on the remote is finished; while this is obvious to somebody familiar with how basebackups work internally, it appears to be not clear at all to some users.

So I think notifying the user that something is happening remotely while the local process waits would be useful, but on the other hand, pg_basebackup does not print anything unless (i) --verbose is set or (ii) there is an error, so I think having it mention the checkpoint in --verbose mode only is acceptable.

Michael
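Michael's contrast with cp and tar can be checked the way he suggests, by watching the size of the target directory from another terminal. Below is a minimal Python stand-in for `du`. The `dir_size` helper is hypothetical, not part of PostgreSQL, and it only sums regular-file sizes, so it is a rough approximation of what `du` reports.

```python
import os


def dir_size(path: str) -> int:
    """Sum of regular-file sizes below path; a rough stand-in for `du`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                if not os.path.islink(full):
                    total += os.path.getsize(full)
            except FileNotFoundError:
                pass                      # file vanished while we were scanning
    return total
```

Calling it every few seconds on a pg_basebackup target directory should, per the description above, keep returning zero during the checkpoint wait. That is exactly why the wait is indistinguishable from a hang without some message.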
On Mon, Feb 13, 2017 at 10:33 AM, Michael Banck <michael.banck@credativ.de> wrote:
> Hi,
>
> On Monday, 13.02.2017, 09:31 +0100, Magnus Hagander wrote:
>> On Mon, Feb 13, 2017 at 3:29 AM, Jim Nasby <Jim.Nasby@bluetreble.com>
>> wrote:
>>> On 2/11/17 4:36 AM, Michael Banck wrote:
>>>> I guess you're right, I've moved it further down. There is in fact a
>>>> message about the xlog location (unless you switch off wal entirely),
>>>> but having another one right before that mentioning the completed
>>>> checkpoint sounds ok to me.
>>>
>>> 1) I don't think this should be verbose output. Having a
>>> program sit there "doing nothing" for no apparent reason is
>>> just horrible UI design.
>>
>> That would include much of Unix then.. For example if I run "cp" on a
>> large file it sits around "doing nothing". Same if I do "tar". No?
>
> The expectation for all three commands is that, even if there is no
> output on stdout, they will write data to the local machine. So you can
> easily monitor the progress of cp and tar by running du or something in
> a different terminal.
>
> With pg_basebackup, nothing is happening on the local machine until the
> checkpoint on the remote is finished; while this is obvious to somebody
> familiar with how basebackups work internally, it appears to be not
> clear at all to some users.

True.

However, outputting this info by default will make it show up in things like everybody's cronjobs by default. Right now a successful pg_basebackup run will come out with no output at all, which is how most Unix commands work, and brings its own advantages. If we change that, people will have to send all the output to /dev/null, resulting in missing the things that are actually important in any regard.

> So I think notifying the user that something is happening remotely while
> the local process waits would be useful, but on the other hand,
> pg_basebackup does not print anything unless (i) --verbose is set or
> (ii) there is an error, so I think having it mention the checkpoint in
> --verbose mode only is acceptable.

Yeah, that's my view as well. I'm all for including it in verbose mode.

*Iff* we can get a progress indicator through the checkpoint, we could include that in --progress mode. That's a different patch, of course, but it shouldn't be included in the default output even if we find it.
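Magnus's cron argument rests on cron mailing whatever a job prints, so a silent successful run produces no mail. A small Python model of that behaviour makes the cost of a new default message concrete. It is a simplification of cron, not its implementation, and `cron_mail_body` is a hypothetical helper.

```python
import subprocess
import sys


def cron_mail_body(argv: list) -> str:
    """Roughly what cron would mail for this job: everything written to stdout/stderr.

    An empty string models a silent run, i.e. no mail at all.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.stdout + proc.stderr


# A silent job versus one that prints a notice on stderr every run.
quiet_job = [sys.executable, "-c", "pass"]
chatty_job = [sys.executable, "-c",
              "import sys; print('waiting for checkpoint', file=sys.stderr)"]
```

Under this model the chatty job generates a mail on every run even when it succeeds, so users would be pushed to redirect output to /dev/null and would then also lose the error messages that matter.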
On Tue, Feb 14, 2017 at 12:06 PM, Magnus Hagander <magnus@hagander.net> wrote:
> However, outputting this info by default will make it show up in things like
> everybody's cronjobs by default. Right now a successful pg_basebackup run
> will come out with no output at all, which is how most Unix commands work,
> and brings its own advantages. If we change that, people will have to send
> all the output to /dev/null, resulting in missing the things that are
> actually important in any regard.

I agree with that. I think having this show up in verbose mode is a really good idea - when something just hangs, users don't know what's going on, and that's bad. But showing it all the time seems like a bridge too far. As the postmortem linked above shows, people will think of things like "hey, let's try --verbose mode" when the obvious thing doesn't work. What is really irritating to them is when --verbose mode fails to be, uh, verbose.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas wrote:
> On Tue, Feb 14, 2017 at 12:06 PM, Magnus Hagander <magnus@hagander.net> wrote:
>> However, outputting this info by default will make it show up in things like
>> everybody's cronjobs by default. Right now a successful pg_basebackup run
>> will come out with no output at all, which is how most Unix commands work,
>> and brings its own advantages. If we change that, people will have to send
>> all the output to /dev/null, resulting in missing the things that are
>> actually important in any regard.
>
> I agree with that. I think having this show up in verbose mode is a
> really good idea - when something just hangs, users don't know what's
> going on, and that's bad. But showing it all the time seems like a
> bridge too far. As the postmortem linked above shows, people will
> think of things like "hey, let's try --verbose mode" when the obvious
> thing doesn't work. What is really irritating to them is when
> --verbose mode fails to be, uh, verbose.

I'd rather have a --quiet mode instead. If you're running it by hand, you're likely to omit the switch, whereas when writing the cron job you're going to notice lack of switch even before you let the job run once.

I think progress reporting ought to go to stderr anyway.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> I'd rather have a --quiet mode instead. If you're running it by hand,
> you're likely to omit the switch, whereas when writing the cron job
> you're going to notice lack of switch even before you let the job run
> once.

Well, that might've been a better way to design it, but changing it now would break backward compatibility and I'm not really sure that's a good idea. Even if it is, it's a separate concern from whether or not in the less-quiet mode we should point out that we're waiting for a checkpoint on the server side.

-- 
Robert Haas
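Robert's compatibility concern can be made concrete by asking how an unchanged command line (for example an existing cron entry with no verbosity flag) behaves under each design. This is a hypothetical Python sketch: `prints_checkpoint_notice` models the two policies discussed, the current opt-in `--verbose` and Álvaro's opt-out `--quiet`. It is not pg_basebackup code.

```python
def prints_checkpoint_notice(flags: set, design: str) -> bool:
    """Would a run with these flags print a 'waiting for checkpoint' notice?

    design == "opt-in":  silent by default, chatty with --verbose (current behaviour).
    design == "opt-out": chatty by default, silent with --quiet (the proposal).
    """
    if design == "opt-in":
        return "--verbose" in flags
    if design == "opt-out":
        return "--quiet" not in flags
    raise ValueError(f"unknown design: {design}")
```

The same empty flag set flips from silent to chatty between the two designs. That flip is the backward-compatibility break Robert is weighing against the nicer interactive default.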
On Tue, Feb 14, 2017 at 9:06 AM, Magnus Hagander <magnus@hagander.net> wrote:
> Yeah, that's my view as well. I'm all for including it in verbose mode.
>
> *Iff* we can get a progress indicator through the checkpoint we could include that in --progress mode. But that's a different patch, of course, but it shouldn't be included in the default output even if we find it.
I think it should show up in --progress mode. It would be great if we could show fine-grained progress reports on the checkpoint, but if we can't do that we should still report as fine as we are able to, which is that a checkpoint is in progress. Otherwise we are setting the perfect as the enemy of the good.
Cheers,
Jeff
On 2/14/17 5:18 PM, Robert Haas wrote:
> On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> I'd rather have a --quiet mode instead. If you're running it by hand,
>> you're likely to omit the switch, whereas when writing the cron job
>> you're going to notice lack of switch even before you let the job run
>> once.
>
> Well, that might've been a better way to design it, but changing it
> now would break backward compatibility and I'm not really sure that's

Meh... it's really only going to affect cronjobs or scripts, which are easy enough to fix, and you're not going to have that many of them (or if you do, you certainly have an automated way to push the update).

> a good idea. Even if it is, it's a separate concern from whether or
> not in the less-quiet mode we should point out that we're waiting for
> a checkpoint on the server side.

Well, --quiet was suggested because of confusion from pg_basebackup twiddling its thumbs...

-- 
Jim Nasby
On 02/17/2017 08:17 PM, Jim Nasby wrote:
> On 2/14/17 5:18 PM, Robert Haas wrote:
>> On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera
>> <alvherre@2ndquadrant.com> wrote:
>>> I'd rather have a --quiet mode instead. If you're running it by hand,
>>> you're likely to omit the switch, whereas when writing the cron job
>>> you're going to notice lack of switch even before you let the job run
>>> once.
>>
>> Well, that might've been a better way to design it, but changing it
>> now would break backward compatibility and I'm not really sure that's
>
> Meh... it's really only going to affect cronjobs or scripts, which are
> easy enough to fix, and you're not going to have that many of them (or
> if you do you certainly have an automated way to push the update).

I think you're underestimating the breakage and overestimating how easy it's going to be to fix it. It's true we'd only change this in a major version, so people should assume possible breakage and test.

>> a good idea. Even if it is, it's a separate concern from whether or
>> not in the less-quiet mode we should point out that we're waiting for
>> a checkpoint on the server side.
>
> Well, --quiet was suggested because of confusion from pg_basebackup
> twiddling its thumbs...

I'm in favor of the '--verbose' route. People are used to that when investigating issues, and it does not break existing cron jobs.

I can live with --quiet though, as long as we don't resort to some craziness along the lines of "if there's a tty be verbose, otherwise be quiet".

I have my doubts about this actually addressing gitlab-like mistakes, though, because it's a helluva jump from "It's waiting and not doing anything," to "We need to remove the datadir." (One of the reasons being that a non-empty directory is a local issue, and there's no reason why the tool should wait instead of just reporting an error.)

FWIW before messing with the pg_basebackup code, perhaps we should improve the documentation and explain clearly the meaning of 'fast' and 'spread' checkpoint modes.
Right now, pg_basebackup docs only say this:

    Sets checkpoint mode to fast or spread (default) (see Section 24.3.3).

which is pretty damn useless when you're investigating an issue. And the referenced section (Making a Base Backup Using the Low Level API) does not clearly explain how this maps to pg_start_backup(_,?).

What about adding a paragraph into pg_basebackup docs, explaining that with 'fast' it does an immediate checkpoint, while with 'spread' it'll wait for a spread checkpoint?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
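The mapping Tomas asks about, stated concretely: in the low-level API of this era, the second argument of `pg_start_backup(label, fast)` requests an immediate checkpoint when true and a normal (spread) checkpoint when false, and pg_basebackup's `-c fast` / `-c spread` correspond to those two cases. The Python sketch below only builds the equivalent SQL text. `start_backup_sql` is a hypothetical helper, and newer PostgreSQL releases renamed the low-level function, so check the documentation for your version.

```python
def start_backup_sql(label: str, checkpoint: str = "spread") -> str:
    """Build the low-level-API call that matches pg_basebackup's -c/--checkpoint mode.

    "fast"   -> fast = true:  immediate checkpoint (heavy I/O burst, short wait)
    "spread" -> fast = false: wait for a normal, throttled checkpoint (the default)
    """
    if checkpoint not in ("fast", "spread"):
        raise ValueError("checkpoint mode must be 'fast' or 'spread'")
    quoted = "'" + label.replace("'", "''") + "'"   # SQL string-literal escaping
    fast = "true" if checkpoint == "fast" else "false"
    return f"SELECT pg_start_backup({quoted}, {fast});"
```

For instance, `start_backup_sql("nightly", "fast")` yields a call with `true` as the second argument. This is the case where the base backup starts almost immediately, at the price of a burst of checkpoint I/O on the primary.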
From: "David G. Johnston"
> What about adding a paragraph into pg_basebackup docs, explaining that with 'fast' it does immediate checkpoint, while with 'spread' it'll wait for a spread checkpoint.
I agree that a better, and self-contained, explanation of the behaviors that fast and spread invoke on the server should be included directly in the pg_basebackup docs.
Additionally, a primary benefit of pg_basebackup is hiding the low-level details from the user and in that spirit the cross-reference link to Section 25.3.3 "Making a Base Backup Using the Low Level API" should be removed. If there is specific information there that a user of pg_basebackup needs it should be presented properly in the application documentation.
The top of the pg_basebackup page points to the entire 25.3 section, but the flow from there is solid: pg_basebackup is covered first, and the low-level API is pointed out for those whose needs are not fully served by the bundled application. Someone using pg_basebackup should be able to stop at that point, go back to the application page, continue reading, and skip all of 25.3.3.
The term "spread checkpoint" isn't actually defined in our docs... and aside from the word "spread" itself describing how a checkpoint works, it isn't used outside of the pg_basebackup docs. So "it will wait for a spread checkpoint" doesn't really work; "it will start and then wait for a normal checkpoint to complete" does.
More holistically (i.e., feel free to skip)
This paragraph from 25.3.3:
"""
This is because it performs a checkpoint, and the I/O required for the checkpoint will be spread out over a significant period of time, by default half your inter-checkpoint interval (see the configuration parameter checkpoint_completion_target). This is usually what you want, because it minimizes the impact on query processing. If you want to start the backup as soon as possible, change the second parameter to true.
"""
is good but buried, and it seems like it would be more visible in Chapter 30, "Reliability and the Write-Ahead Log". Both the internals and base backup pages could then point the reader there. There isn't a chapter dedicated to checkpoints - nor does there need to be - but a section in Chapter 30 seems warranted as the official reference. Right now you have to skim the configuration variables, "WAL Configuration", "CHECKPOINT", and "base backup API and pg_basebackup" to cover everything. A checkpoint section with that paragraph as a focus would allow the other items to simply say "immediate or normal checkpoint" as needed and redirect the reader for additional context on the trade-offs of each, whether done manually or during some form of backup script.
David J.
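A rough worked example of the paragraph David quotes: when a checkpoint is paced by time, its writes are spread over about checkpoint_completion_target × checkpoint_timeout. With the defaults of the time (checkpoint_timeout = 5min, checkpoint_completion_target = 0.5) that is roughly 150 seconds in which pg_basebackup appears idle; PostgreSQL 14 later raised the default target to 0.9, i.e. roughly 270 seconds. This Python helper is a hypothetical illustration of that approximation only. WAL-volume pacing, few dirty buffers, or an already running checkpoint can make the real wait shorter or longer.

```python
def spread_checkpoint_seconds(checkpoint_timeout_s: float = 300.0,
                              completion_target: float = 0.5) -> float:
    """Approximate duration of a time-paced (spread) checkpoint's write phase."""
    if not 0.0 <= completion_target <= 1.0:
        raise ValueError("checkpoint_completion_target must be between 0 and 1")
    # The checkpointer aims to finish its writes by this fraction of the interval.
    return checkpoint_timeout_s * completion_target
```

With `-c fast` this waiting period largely disappears, because the checkpoint is requested as immediate instead of being throttled.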
On Sat, Feb 18, 2017 at 4:52 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> I have my doubts about this actually addressing gitlab-like mistakes,
> though, because it's a helluva jump from "It's waiting and not doing
> anything," to "We need to remove the datadir." (One of the reasons being
> that a non-empty directory is a local issue, and there's no reason why the
> tool should wait instead of just reporting an error.)

It's pretty clear that the gitlab postmortem involves multiple people making multiple serious errors, including failing to test that the ostensible backups could actually be restored. I was taught that rule #1 as far as backups are concerned is to test that you can restore them, so that seems like a big miss. However, I don't think the fact they made other mistakes is a reason not to improve the things we can improve and, certainly, having some way for pg_basebackup to tell you that it's waiting for the master to checkpoint will help the next person who is confused by that particular thing. That person may go on to be confused by something else, but then again maybe not. Improving the reporting in this case stands on its own merits.

-- 
Robert Haas
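Tomas's aside (quoted above) is that a non-empty target directory is a local problem the tool should report rather than wait on. That idea amounts to a pre-flight check before contacting the primary. A minimal Python sketch follows. `check_target_dir` is hypothetical, although pg_basebackup itself does refuse a target directory that exists and is not empty.

```python
import os


def check_target_dir(path: str) -> None:
    """Fail fast, before contacting the primary, if the target is unusable.

    Creates the directory if it is missing; raises if it exists but is not empty.
    """
    if not os.path.exists(path):
        os.makedirs(path)
        return
    if not os.path.isdir(path):
        raise NotADirectoryError(f"{path!r} exists but is not a directory")
    if os.listdir(path):
        raise FileExistsError(f"directory {path!r} exists but is not empty")
```

Doing this locally and first means that a directory problem shows up as an immediate error, separate from the remote checkpoint wait.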
Hi,

On Tuesday, 14.02.2017, 18:18 -0500, Robert Haas wrote:
> On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>> I'd rather have a --quiet mode instead. If you're running it by hand,
>> you're likely to omit the switch, whereas when writing the cron job
>> you're going to notice lack of switch even before you let the job run
>> once.
>
> Well, that might've been a better way to design it, but changing it
> now would break backward compatibility and I'm not really sure that's
> a good idea. Even if it is, it's a separate concern from whether or
> not in the less-quiet mode we should point out that we're waiting for
> a checkpoint on the server side.

ISTM the consensus is that there should be no output in regular mode, but a message should be displayed in verbose and progress mode.

So I went forth and also added a message in progress mode (unless verbose messages are requested anyway).

Regarding the documentation, I tried to clarify the difference between the checkpoint types a bit more, but I think any further action is probably a larger rewrite of the documentation on this topic.

So attached are two patches; I've split it up into the documentation and the code output part. I'll add it as one commitfest entry in the "Clients" section though, as it's not really a big patch, unless somebody thinks it should have a second entry in "Documentation"?

Michael
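The policy Michael describes can be summarised as a small decision function: silent by default, a notice in --verbose mode, and a progress-style notice with -P unless --verbose already prints one. This is a Python sketch of the policy only. `checkpoint_notices` is hypothetical, and the message strings are illustrative rather than the exact text of the patch.

```python
def checkpoint_notices(verbose: bool, progress: bool) -> list:
    """Notices printed around the checkpoint wait under the policy agreed in the thread.

    Hypothetical helper; the strings are illustrative, not pg_basebackup's exact text.
    """
    if verbose:
        # --verbose: say what we are waiting for, and say when it is over
        return ["initiating base backup, waiting for checkpoint to complete",
                "checkpoint completed"]
    if progress:
        # -P without --verbose: a single progress-style line
        return ["waiting for checkpoint"]
    # default: no output, so successful cron runs stay silent
    return []
```

Giving verbose mode precedence over progress mode avoids printing the same information twice when both flags are given.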
On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck <michael.banck@credativ.de> wrote:
> Hi,
>
> On Tuesday, 14.02.2017, 18:18 -0500, Robert Haas wrote:
>> On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera
>> <alvherre@2ndquadrant.com> wrote:
>>> I'd rather have a --quiet mode instead. If you're running it by hand,
>>> you're likely to omit the switch, whereas when writing the cron job
>>> you're going to notice lack of switch even before you let the job run
>>> once.
>>
>> Well, that might've been a better way to design it, but changing it
>> now would break backward compatibility and I'm not really sure that's
>> a good idea. Even if it is, it's a separate concern from whether or
>> not in the less-quiet mode we should point out that we're waiting for
>> a checkpoint on the server side.
>
> ISTM the consensus is that there should be no output in regular mode,
> but a message should be displayed in verbose and progress mode.
>
> So I went forth and also added a message in progress mode (unless
> verbose messages are requested anyway).
>
> Regarding the documentation, I tried to clarify the difference between
> the checkpoint types a bit more, but I think any further action is
> probably a larger rewrite of the documentation on this topic.
>
> So attached are two patches; I've split it up into the documentation and
> the code output part. I'll add it as one commitfest entry in the
> "Clients" section though, as it's not really a big patch, unless
> somebody thinks it should have a second entry in "Documentation"?
Agreed, and applied as one patch. Except I noticed you also fixed a couple of entries which were missing the progname in the messages -- I broke those out to a separate patch instead.
Made a small change to "using as much I/O as available" rather than "as possible", which I think is a better wording, along with the change of the idle wording I suggested before. (but feel free to point it out to me if that's wrong).
Hi,

On Sunday, 26.02.2017, 21:32 +0100, Magnus Hagander wrote:
> On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck
> <michael.banck@credativ.de> wrote:
> Agreed, and applied as one patch. Except I noticed you also fixed a
> couple of entries which were missing the progname in the messages -- I
> broke those out to a separate patch instead.

Thanks!

> Made a small change to "using as much I/O as available" rather than
> "as possible", which I think is a better wording, along with the
> change of the idle wording I suggested before. (but feel free to point
> it out to me if that's wrong).

LGTM, I apparently missed your suggestion when I re-read the thread.

I am just wondering whether this could/should be back-patched, maybe? It is not a bug fix, of course, but OTOH it is rather small and probably helpful to some users on current releases.

Michael
On Sun, Feb 26, 2017 at 9:53 PM, Michael Banck <michael.banck@credativ.de> wrote:
> Hi,
>
> On Sunday, 26.02.2017, 21:32 +0100, Magnus Hagander wrote:
>> On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck
>> <michael.banck@credativ.de> wrote:
>> Agreed, and applied as one patch. Except I noticed you also fixed a
>> couple of entries which were missing the progname in the messages -- I
>> broke those out to a separate patch instead.
>
> Thanks!
>
>> Made a small change to "using as much I/O as available" rather than
>> "as possible", which I think is a better wording, along with the
>> change of the idle wording I suggested before. (but feel free to point
>> it out to me if that's wrong).
>
> LGTM, I apparently missed your suggestion when I re-read the thread.
>
> I am just wondering whether this could/should be back-patched, maybe? It
> is not a bug fix, of course, but OTOH it is rather small and probably
> helpful to some users on current releases.
Good point. We should definitely back-patch the documentation updates.
Not 100% sure about the others, as it's a small behaviour change. But since it's only in verbose mode, I doubt it is very likely to break anybody's scripts relying on certain output or so.
What do others think?
Magnus Hagander <magnus@hagander.net> writes:
> On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck <michael.banck@credativ.de>
> wrote:
>> ISTM the consensus is that there should be no output in regular mode,
>> but a message should be displayed in verbose and progress mode.

> Agreed, and applied as one patch.

Is there an argument for back-patching this?

            regards, tom lane
On Sun, Feb 26, 2017 at 9:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck <michael.banck@credativ.de>
>> wrote:
>>> ISTM the consensus is that there should be no output in regular mode,
>>> but a message should be displayed in verbose and progress mode.
>
>> Agreed, and applied as one patch.
>
> Is there an argument for back-patching this?
Seems you were typing that at the same time as we did.
I'm considering it, but not swayed in either direction. Should I take your comment as a vote that we should back-patch it?
Magnus Hagander <magnus@hagander.net> writes:
> On Sun, Feb 26, 2017 at 9:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Is there an argument for back-patching this?

> I'm considering it, but not swayed in either direction. Should I take your
> comment as a vote that we should back-patch it?

Yeah, I'd vote for it.

            regards, tom lane
On 26 February 2017 at 20:55, Magnus Hagander <magnus@hagander.net> wrote:
> What do others think?
Changing the output behaviour of a command isn't something we usually do as a backpatch. This change doesn't affect the default behaviour, so it probably wouldn't make a difference to the outcome of the situation that generated this thread.
Having said that, if it helps others to avoid mistakes in the future then it's worth doing, so +1 to backpatch.
I've looked into changing the actual underlying behaviour and I don't think it's feasible, so making this change will at least allow some responsiveness from us.
Thanks Michael, Magnus.
-- 
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Feb 26, 2017 at 12:32 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Sun, Feb 26, 2017 at 8:27 PM, Michael Banck <michael.banck@credativ.de> wrote:
Hi,
On Tuesday, 14.02.2017, 18:18 -0500, Robert Haas wrote:
> On Tue, Feb 14, 2017 at 4:06 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > I'd rather have a --quiet mode instead. If you're running it by hand,
> > you're likely to omit the switch, whereas when writing the cron job
> > you're going to notice lack of switch even before you let the job run
> > once.
>
> Well, that might've been a better way to design it, but changing it
> now would break backward compatibility and I'm not really sure that's
> a good idea. Even if it is, it's a separate concern from whether or
> not in the less-quiet mode we should point out that we're waiting for
> a checkpoint on the server side.
ISTM the consensus is that there should be no output in regular mode,
but a message should be displayed in verbose and progress mode.
So I went forth and also added a message in progress mode (unless
verbose messages are requested anyway).
Regarding the documentation, I tried to clarify the difference between
the checkpoint types a bit more, but I think any further action is
probably a larger rewrite of the documentation on this topic.
So attached are two patches; I've split it up into the documentation and
the code output part. I'll add it as one commitfest entry in the
"Clients" section though, as it's not really a big patch, unless
somebody thinks it should have a second entry in "Documentation"?
Agreed, and applied as one patch. Except I noticed you also fixed a couple of entries which were missing the progname in the messages -- I broke those out to a separate patch instead.
Made a small change to "using as much I/O as available" rather than "as possible", which I think is a better wording, along with the change of the idle wording I suggested before. (but feel free to point it out to me if that's wrong).
Should the below fprintf end in a \r rather than a \n, so that the progress message gets over-written once the checkpoint is done and we have moved on?
    if (showprogress && !verbose)
        fprintf(stderr, "waiting for checkpoint\n");
That would seem more in keeping with how the other progress messages operate.
Cheers,
Jeff
Hi,
On Monday, 27.02.2017, 16:20 +0100, Magnus Hagander wrote:
> On Sun, Feb 26, 2017 at 9:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Is there an argument for back-patching this?
>
> Seems you were typing that at the same time as we did.
>
> I'm considering it, but not swayed in either direction. Should I take
> your comment as a vote that we should back-patch it?
I've checked back into this thread, and there seems to be a +1 from Tom and a +(0.5-1) from Simon for backpatching, and no obvious -1s. Did you decide against it in the end, or is this still an open item?
Michael
On Wed, Mar 29, 2017 at 1:05 PM, Michael Banck <michael.banck@credativ.de> wrote:
Hi,
On Monday, 27.02.2017, 16:20 +0100, Magnus Hagander wrote:
> On Sun, Feb 26, 2017 at 9:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Is there an argument for back-patching this?
>
>
> Seems you were typing that at the same time as we did.
>
>
> I'm considering it, but not swayed in either direction. Should I take
> your comment as a vote that we should back-patch it?
I've checked back into this thread, and there seems to be a +1 from Tom
and a +(0.5-1) from Simon for backpatching, and no obvious -1s. Did you
decide against it in the end, or is this still an open item?
No, I plan to work on it, so it's still an open item. I've been backlogged with other things, but I will try to get to it soon.
(This also includes considering Jeff's note)
On Mon, Feb 27, 2017 at 7:46 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Sun, Feb 26, 2017 at 12:32 PM, Magnus Hagander <magnus@hagander.net> wrote:
[...]
Should the below fprintf end in a \r rather than a \n, so that the progress message gets over-written once the checkpoint is done and we have moved on?
    if (showprogress && !verbose)
        fprintf(stderr, "waiting for checkpoint\n");
That would seem more in keeping with how the other progress messages operate.
Agreed, that makes more sense. I've pushed a patch that does this.
On Fri, Mar 31, 2017 at 8:59 AM, Magnus Hagander <magnus@hagander.net> wrote:
On Wed, Mar 29, 2017 at 1:05 PM, Michael Banck <michael.banck@credativ.de> wrote:
[...]
No, I plan to work on it, so it's still an open item. I've been backlogged with other things, but I will try to get to it soon.
(This also includes considering Jeff's note)
I've applied a backpatch to 9.4. Prior to that pretty much the entire patch is a conflict, so it would need a full rewrite.
On Saturday, 01.04.2017, 17:29 +0200, Magnus Hagander wrote:
> I've applied a backpatch to 9.4. Prior to that pretty much the entire
> patch is a conflict, so it would need a full rewrite.
Thanks!
Michael