Re: Duplicate history file? - Mailing list pgsql-hackers
From: Kyotaro Horiguchi
Subject: Re: Duplicate history file?
Msg-id: 20210616.120403.229198747003285048.horikyota.ntt@gmail.com
In response to: Re: Duplicate history file? (Stephen Frost <sfrost@snowman.net>)
Responses: Re: Duplicate history file?
List: pgsql-hackers
Thanks for the opinions.

At Tue, 15 Jun 2021 11:33:10 -0400, Stephen Frost <sfrost@snowman.net> wrote in
> Greetings,
>
> * Kyotaro Horiguchi (horikyota.ntt@gmail.com) wrote:
> > At Fri, 11 Jun 2021 16:08:33 +0900, Michael Paquier <michael@paquier.xyz> wrote in
> > > On Fri, Jun 11, 2021 at 03:32:28PM +0900, Kyotaro Horiguchi wrote:
> > > > I think cp can be an example as far as we explain the limitations.
> > > > (On the other hand, "test ! -f" cannot, since it actually prevents
> > > > the server from working correctly.)
> > >
> > > Disagreed. I think that we should not try to change this area until
> > > we can document a reliable solution, and a simple "cp" is not that.
> >
> > Isn't removing cp from the documentation a change in this area? I
> > basically agree not to change anything, but the current example
> > "test ! -f <fn> && cp .." and the relevant description have been known
> > to be problematic in certain situations.
>
> [...]
>
> > - Write the full (known) requirements and use a pseudo tool-name in
> >   the example?
>
> I'm generally in favor of just using a pseudo tool-name and then perhaps
> providing a link to a new place on .Org where people can ask to have
> their PG backup solution listed, or something along those lines.

Looks fine.

> > - provide a minimal implementation of the command?
>
> Having been down this road for a rather long time, I can't accept this
> as a serious suggestion. No, not even with Perl. Been there, done
> that, not going back.
>
> > - recommend some external tools (that we can guarantee conform to
> >   the requirements)?
>
> The requirements are things which are learned over years and change
> over time. Trying to document them and keep up with them would be a
> pretty serious project all on its own. There are external projects that
> spend serious time and energy doing their best to provide the tooling
> needed here, and we should be promoting those, not pretending that this
> is a simple thing which anyone could accomplish with a short perl
> script.

I agree that no simple solution could be really perfect. The reason I
think a simple cp can be a candidate for the example might be based on
the assumption that anyone who is going to build a database system
ought to know their requirements, including the durability/reliability
of archives/backups and the limitations of the adopted
methods/technologies. However, as Julien mentioned, if relatively...
ahem, ill-advised users (sorry in advance if that's rude) actually use
'cp' without a thought, only because it is shown in the example, and
inadvertently lose archives, it might be better that we not suggest any
concrete command for archive_command at all.

> > - not recommend any tools?
>
> This is the approach that has been tried and it's, objectively, failed
> miserably. Our users are ending up with invalid and unusable backups,
> corrupted WAL segments, inability to use PITR, and various other issues
> because we've been trying to pretend that this isn't a hard problem. We
> really need to stop that, accept that it's hard, and promote the tools
> which have been explicitly written to address that hard problem.

I can sympathize with that, but is there any difference from system
backups? One person just copies $HOME to another directory on the same
drive and calls it a day. Another uses dd to make an image backup.
Others need durability, integrity guarantees, or even encryption, so
they acquire or purchase a tool that conforms to their requirements. Or
someone builds their own backup solution that meets their requirements.
On the other hand, what OS distributor offers a long list of
requirements or a recipe for perfect backups? (Yeah, I'm saying this
based on nothing, just from a prejudice.) If the system is serious,
those who don't know enough about backups ought to consult
professionals before building an inadequate backup system and losing
their data.

> > > Hmm. A simple command that could be used as reference is for example
> > > "dd", which flushes the file by itself, or we could just revisit the
> > > discussions about having a pg_copy command, or we could document a
> > > small utility in perl that does the job.
> >
> > I think we should do that if pg_copy conforms to the mandatory
> > requirements, but maybe that's for the future. Showing a minimal
> > implementation in perl looks good.
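(For concreteness, the kind of dd invocation meant above might look like
the sketch below. It assumes GNU dd, whose conv=fsync flushes the copied
file's data to disk before exiting, unlike a plain cp; /mnt/archive is a
placeholder, and the command still neither fsyncs the archive directory
nor checks for a pre-existing file, so it is an illustration, not a
recommendation.)

    archive_command = 'dd if=%p of=/mnt/archive/%f bs=1M conv=fsync'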
> Already tried doing it in perl. No, it's not simple and it's also
> entirely vaporware today, and it implies that we're going to develop
> this tool, improve it in the future as we realize it needs to be
> improved, and maintain it as part of core forever. If we want to
> actually adopt and pull in a backup tool to be part of core then we
> should talk about things which actually exist, such as the various
> existing projects that have been written specifically to address all
> the requirements which are understood today, not say "well, we can
> just write a simple perl script to do it", because it's not actually
> that simple.
>
> Providing yet another half solution would be doubling down on the
> failed approach of documenting a "simple" solution and would be a
> disservice to our users.

OK. If we follow the direction that we are responsible for ensuring
that every user has reliable backups, I cannot come up with a proper
description for that. We could list several requirements like "sync
after copying" or "take a checksum of every file, then verify it
periodically", but the more important thing to document here, I think,
is how we run the archive_command. Doesn't the following work for now?
(No example command.)

- %f is replaced by ..., %p is ..., %r is ... in archive_command.
- We call the archive_command for every WAL segment that is finished.
- We may call the archive_command for the same file more than once.
- We may call the archive_command for different files with the same
  name. In that case the server is working incorrectly and needs a
  check; do not overwrite the existing file with the new content.
- We don't offer any durability or integrity guarantees for the
  archived files. All of that is up to you. You can use an existing
  archiving solution; see the following links.
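(To illustrate those rules, not as proposed documentation text, here is
a rough shell sketch of an archive_command wrapper. ARCHIVE_DIR and the
script path are placeholders, and the archive directory itself is never
fsynced, so this is nowhere near a complete solution.)

    #!/bin/sh
    # Hypothetical usage: archive_command = '/path/to/archive.sh %p %f'
    ARCHIVE_DIR=/mnt/archive        # placeholder
    src=$1
    dest="$ARCHIVE_DIR/$2"
    if [ -f "$dest" ]; then
        # The same name may arrive more than once: succeed if the
        # contents are identical, refuse to overwrite if they differ.
        cmp -s "$src" "$dest" && exit 0
        exit 1
    fi
    # Copy under a temporary name, flush the data (GNU dd's conv=fsync),
    # then rename into place so a crash never leaves a partial file
    # under the final name.
    dd if="$src" of="$dest.tmp" bs=1M conv=fsync 2>/dev/null || exit 1
    mv "$dest.tmp" "$dest"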
regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center