Thread: Sample archive_command is still problematic
All: The example archive_command we give in the docs is this one: archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f' This is a problematic recommendation. If there's any reason why copying the archive file gets interrupted (storage blip, for example), then the command will fail and will continue to fail forever, ending archiving. Is there some good reason why "test ! -f" was added to the sample? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> wrote: > The example archive_command we give in the docs is this one: > > archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f' > > This is a problematic recommendation. I agree with that statement, ... > If there's any reason why copying the archive file gets > interrupted (storage blip, for example), then the command will > fail and will continue to fail forever, ending archiving. ... but not for that reason. > Is there some good reason why "test ! -f" was added to the > sample? In an environment with more than one cluster archiving, it is otherwise way too easy to copy a config file and have the WAL files of the two systems overwriting one another. I consider a check for an already existing file on the target to be very good practice. The errors in the log are a clue that something went wrong, and give you a chance to fix things without data loss. The problem with the recommended command is that cp is not atomic. The file can be read before the contents are materialized, causing early end to recovery. I have seen it happen. The right way to do this is to copy to a different name or directory and mv the file into place once it is complete -- or use software which does that automatically, like rsync does. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
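[Editorial note: the copy-then-rename technique Kevin describes can be sketched as a small wrapper. The function name, `ARCHIVE_DIR`, and the paths are illustrative placeholders, not anything PostgreSQL ships; it would be hooked up as e.g. `archive_command = 'archive_wal.sh %p %f'`.]

```shell
# Hypothetical wrapper combining the "no overwrite" guard with an
# atomic rename-into-place.
archive_wal() {
    wal_path=$1                                # %p: path of the segment
    wal_file=$2                                # %f: file name only
    archive_dir=${ARCHIVE_DIR:-/mnt/server/archivedir}

    # Refuse to overwrite: with several clusters archiving, a copied
    # config file must not silently clobber another cluster's WAL.
    test ! -f "$archive_dir/$wal_file" || return 1

    # Copy under a temporary name, then rename into place.  rename()
    # is atomic within a filesystem, so a restoring server can never
    # read a half-written segment.
    cp "$wal_path" "$archive_dir/.$wal_file.tmp" &&
        mv "$archive_dir/.$wal_file.tmp" "$archive_dir/$wal_file"
}
```

rsync gives the same visibility guarantee for free, since it writes to a temporary name and renames on completion.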
On 08/11/2014 10:21 AM, Kevin Grittner wrote: >> Is there some good reason why "test ! -f" was added to the >> sample? > > In an environment with more than one cluster archiving, it is > otherwise way too easy to copy a config file and have the WAL files > of the two systems overwriting one another. I consider a check for > an already existing file on the target to be very good practice. > The errors in the log are a clue that something went wrong, and > gives you a chance to fix things without data loss. It depends on what you're guarding against. In the case I was dealing with, the master crashed in the middle of an archive write. As a result, the file existed, but was incomplete, and *needed* to be overwritten. But because of 'test ! -f' archiving just kept failing. > The problem with the recommended command is that cp is not atomic. > The file can be read before the contents are materialized, causing > early end to recovery. I have seen it happen. The right way to do > this is to copy to a different name or directory and mv the file > into place once it is complete -- or use software which does that > automatically, like rsync does. Yeah, realistically, I think we need to start supplying a script or two in /contrib and referencing that. I'm not sure how to make it work for the Windows users though. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> wrote: > On 08/11/2014 10:21 AM, Kevin Grittner wrote: >>> Is there some good reason why "test ! -f" was added to the >>> sample? >> >> In an environment with more than one cluster archiving, it is >> otherwise way too easy to copy a config file and have the WAL files >> of the two systems overwriting one another. I consider a check for >> an already existing file on the target to be very good practice. >> The errors in the log are a clue that something went wrong, and >> gives you a chance to fix things without data loss. > > It depends on what you're guarding against. In the case I was dealing > with, the master crashed in the middle of an archive write. As a > result, the file existed, but was incomplete, and *needed* to be > overwritten. But because of 'test -f' archiving just kept failing. I've seen that happen, too. It's just that the script I used sent an email to the DBAs when that happened, so the problem was quickly investigated and resolved. Also, our monitoring "big board" set an "LED" to red if we went an hour without a new WAL appearing in the archive directory. IMV the archiving script should ensure there is no data loss, and you should have monitoring or alert systems in place to know when things stall. >> The problem with the recommended command is that cp is not atomic. >> The file can be read before the contents are materialized, causing >> early end to recovery. I have seen it happen. The right way to do >> this is to copy to a different name or directory and mv the file >> into place once it is complete -- or use software which does that >> automatically, like rsync does. > > Yeah, realistically, I think we need to start supplying a script or two > in /contrib and referencing that. I'm not sure how to make it work for > the Windows users though. That might work. We should do something, though. The example we give in the docs is not production quality IMO, and is something of an embarrassment. 
The problem is, it may be hard to get agreement on what that should look like. As a DBA, I insisted on the check for an existing file. I also insisted on having scripts send an email to the DBAs on the first occurrence of a failure (but not to spam us on each and every failed attempt). -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
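[Editorial note: Kevin's "alert on the first failure, don't spam on every retry" behavior can be sketched with a marker file. The marker path, `notify_once` name, and addresses are invented for illustration.]

```shell
# Run the given alert command only on the FIRST failure; the marker
# file suppresses repeats.  A later successful archive run should
# remove the marker so the next failure alerts again.
notify_once() {
    marker=${FAIL_MARKER:-/var/run/wal_archive_failed}
    if [ ! -f "$marker" ]; then
        touch "$marker"
        "$@"        # the alert action, e.g. mail -s '...' dbas@example.com
    fi
}
# e.g.:  some_archive_step %p %f ||
#            { notify_once mail -s 'WAL archiving failing' dbas@example.com; exit 1; }
```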
Kevin Grittner <kgrittn@ymail.com> writes: > Josh Berkus <josh@agliodbs.com> wrote: >> Yeah, realistically, I think we need to start supplying a script or two >> in /contrib and referencing that. I'm not sure how to make it work for >> the Windows users though. > That might work. We should do something, though. The example we > give in the docs is not production quality IMO, and is something of > an embarrassment. Well, it's not really intended to be production grade, and I think the docs say so (perhaps not emphatically enough). > The problem is, it may be hard to get agreement > on what that should look like. As a DBA, I insisted on the check > for an existing file. I also insisted on having scripts send an > email to the DBAs on the first occurrence of a failure (but not to > spam us on each and every failed attempt). As soon as you get into requirements like "send me email on failure", I think we're well past what we'd want as a documentation example. Conceivably we could put such a script in contrib though. The problem with such things as sample scripts is that it might get hard for people to tell the difference between barnacles (like email ;-)) and properties that they'd better preserve in any custom script. The documentation is primarily trying to make the point that the archive action must not overwrite any existing file (which is such a property) and that's why it has the test ! -f. It doesn't really address the question of appropriate error handling, which is what Josh is on about. It's possible that something like test ! -f $target && cp $source $target.tmp && mv $target.tmp $target adds enough robustness to be worth recommending. But it should go along with docs changes to emphasize that error handling is another important property to worry about. regards, tom lane
On 08/11/2014 12:49 PM, Tom Lane wrote: > Kevin Grittner <kgrittn@ymail.com> writes: >> Josh Berkus <josh@agliodbs.com> wrote: >>> Yeah, realistically, I think we need to start supplying a script or two >>> in /contrib and referencing that. I'm not sure how to make it work for >>> the Windows users though. > >> That might work. We should do something, though. The example we >> give in the docs is not production quality IMO, and is something of >> an embarrassment. > > Well, it's not really intended to be production grade, and I think the > docs say so (perhaps not emphatically enough). Thing is, if we supply a sample command in the docs ... even if it's preceded by ***DO NOT USE WILL EAT YOUR SERVER*** ... people will still copy-and-paste it, and then put it into production. > The problem with such things as sample scripts is that it might get hard > for people to tell the difference between barnacles (like email ;-)) > and properties that they'd better preserve in any custom script. > The documentation is primarily trying to make the point that the archive > action must not overwrite any existing file (which is such a property) > and that's why it has the test ! -f. It doesn't really address the > question of appropriate error handling, which is what Josh is on about. I'm suggesting that we've established that there is no one-liner which will not cause real problems for users who copy it. Given that, we should not supply a one-liner, even as an example; we should supply some sample scripts, and a reference in the docs: "Please look at /share/archiving-scripts/ for some sample shell scripts for archiving, or use a project like WAL-E, Barman, or OmniPITR." The alternative is to supply a C utility ourselves for log copying, but I think the presence of multiple archiving utilities is a good argument that it's not possible for any given utility to cover more than 30% of use-cases. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
From: "Kevin Grittner" <kgrittn@ymail.com> The problem with the recommended command is that cp is not atomic. The file can be read before the contents are materialized, causing early end to recovery. I have seen it happen. The right way to do this is to copy to a different name or directory and mv the file into place once it is complete -- or use software which does that automatically, like rsync does. I submitted a patch a patch for this a few months ago, which is pg_copy listed in the current CF. The patch also addresses the problem that the archived file can get lost after power failure because it is not flushed to disk. The patch consists of a program called pg_copy which can be used instead of cp/copy, and a doc fix to suggest using mv. I made it following the favorable suggestions from people. Regards MauMau
On 08/11/2014 03:23 PM, MauMau wrote: > From: "Kevin Grittner" <kgrittn@ymail.com> > The problem with the recommended command is that cp is not atomic. > The file can be read before the contents are materialized, causing > early end to recovery. I have seen it happen. The right way to do > this is to copy to a different name or directory and mv the file > into place once it is complete -- or use software which does that > automatically, like rsync does. > > > I submitted a patch a patch for this a few months ago, which is pg_copy > listed in the current CF. The patch also addresses the problem that the > archived file can get lost after power failure because it is not flushed > to disk. The patch consists of a program called pg_copy which can be > used instead of cp/copy, and a doc fix to suggest using mv. I made it > following the favorable suggestions from people. Yah? Does it work on Windows? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 8/11/14 6:23 PM, MauMau wrote: > I submitted a patch a patch for this a few months ago, which is pg_copy > listed in the current CF. The patch also addresses the problem that the > archived file can get lost after power failure because it is not flushed > to disk. The patch consists of a program called pg_copy which can be > used instead of cp/copy, and a doc fix to suggest using mv. I made it > following the favorable suggestions from people. I realize that there are about 128 different ways people set this up (which is itself a problem), but it appears to me that a solution like pg_copy only provides local copying, which implies the use of something like NFS. Which may be OK, but then we'd need to get into the details of how to set up NFS properly for this. Also, I think you can get local copy+fsync with dd. The alternatives of doing remote copying inside archive_command are also questionable if you have multiple standbys. Basically, this whole interface is terrible. Maybe it's time to phase it out and start looking into pg_receivexlog.
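[Editorial note: one reading of Peter's dd suggestion, using GNU dd's `conv=fsync` (GNU-specific; as MauMau notes downthread, Solaris dd lacks an equivalent). The helper name and block size are illustrative.]

```shell
# Copy a file and flush its data to disk before returning, so a power
# failure just after archive_command reports success cannot lose the
# copy.  This covers durability only -- it does NOT make the copy
# appear atomically, so it still belongs in front of a
# rename-into-place step.
copy_with_fsync() {
    dd if="$1" of="$2" bs=64k conv=fsync 2>/dev/null
}
```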
Peter Eisentraut <peter_e@gmx.net> wrote: > On 8/11/14 6:23 PM, MauMau wrote: > >> I submitted a patch a patch for this a few months ago, which is pg_copy >> listed in the current CF. The patch also addresses the problem that the >> archived file can get lost after power failure because it is not flushed >> to disk. The patch consists of a program called pg_copy which can be >> used instead of cp/copy, and a doc fix to suggest using mv. I made it >> following the favorable suggestions from people. > > I realize that there are about 128 different ways people set this up > (which is itself a problem), but it appears to me that a solution like > pg_copy only provides local copying, which implies the use of something > like NFS. Not necessarily. What I have done is to use the cp/mv technique on the database server and then rsync (through ssh) from each place that needs it. That seems to me much less fragile than copying to an NFS mount point. > Also, I think you can get local copy+fsync with dd. Does the directory entry only become visible to other processes once the file is complete when you use dd? > The alternatives of doing remote copying inside archive_command are also > questionable if you have multiple standbys. Right. It's a nightmare to try to design anything to serve multiple standbys without having the initial archive be local and copying from that archive to the others. At least, if there is some other good solution, I have yet to see it. The above is regarding WAL file archiving -- I'm not putting down streaming replication. Of course, what I would have *really* liked is a WAL receiver that could write out normal-looking WAL files for archiving purposes and pass through the WAL stream to a hot standby. Last I checked (which was admittedly at least a couple years back) there was no such utility, although I seem to remember that Magnus had done some work that looked like it could be bent to that end. 
-- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
From: "Josh Berkus" <josh@agliodbs.com> > Yah? Does it work on Windows? Yes. pg_copy is meant to be a replacement for cp/copy, not for the entire archive_command script. It just opens, reads, writes, syncs, and closes the file. Regards MauMau
From: "Peter Eisentraut" <peter_e@gmx.net> > I realize that there are about 128 different ways people set this up > (which is itself a problem), but it appears to me that a solution like > pg_copy only provides local copying, which implies the use of something > like NFS. Which may be OK, but then we'd need to get into the details > of how to set up NFS properly for this. Yes, I think the flexibility of archive_command is nice. The problem I want to address is that users don't have a simple way to realiably archive files in very simple use cases -- local copying to local or network storage. pg_copy is a low-level command to fill the gap. > Also, I think you can get local copy+fsync with dd. Yes, dd on Linux has "sync" option. But dd on Solaris doesn't. I can't find a command on Windows which is installed by default. > The alternatives of doing remote copying inside archive_command are also > questionable if you have multiple standbys. Yes, we may need another interface than archive_command for archiving files to multiple locations. That's another issue. > Basically, this whole interface is terrible. Maybe it's time to phase > it out and start looking into pg_receivexlog. pg_receivexlog seems difficult to me. Users have to start, stop, and monitor pg_receivexlog. That's burdonsome. For example, how do we start pg_receivexlog easily on Windows when the PostgreSQL is configured to start/stop automatically on OS startup/shutdown with Windows service? In addition, users have to be aware of connection slots (max_connections and max_wal_senders) and replication slots. pg_receivexlog impose extra overhead even on simple use cases. I want backup-related facilities to use as less resources as possible. e.g., with archive_command, the data flows like this: disk -> OS cache -> copy command's buffer -> OS cache -> disk OTOH, with pg_receivexlog: disk -> OS cache -> walsender's buffer -> socket send buffer -> kernel buffer? 
-> socket receive buffer -> pg_receivexlog's buffer -> OS cache -> disk For reference, \copy of psql is described like this: Tip: This operation is not as efficient as the SQL COPY command because all data must pass through the client/server connection. For large amounts of data the SQL command might be preferable. Regards MauMau
On Wed, Aug 13, 2014 at 11:23 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > The above is regarding WAL file archiving -- I'm not putting down > streaming replication. Of course, what I would have *really* liked > is a WAL receiver that could write out normal-looking WAL files for > archiving purposes and pass through the WAL stream to a hot > standby. Last I checked (which was admittedly at least a couple > years back) there was no such utility, although I seem to remember > that Magnus had done some work that looked like it could be bent to > that end. I did. But I think that has mostly been superseded by replication slots now. As in, if you use pg_receivexlog with a specific replication slot, I believe you no longer need archive command at all, do you? Since the replication slot will block rotation of the WAL files until they are actually archived by pg_receivexlog (What my command did was have an archive command that looked back into pg_stat_replication to see if pg_receivexlog had received the data or not). It did not pass through any WAL stream though - you'd have your standby connect directly to the same master that pg_receivexlog connects to. What would be the actual reason for having that one do the passthrough itself? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
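[Editorial note: a configuration sketch of the slot-based setup Magnus describes, assuming 9.4's physical replication slots. The slot name, host, user, and directory are placeholders; these commands need a running master, so they are shown as a sketch rather than something runnable in isolation.]

```shell
# On the master: create a physical replication slot so the master
# retains WAL until the receiver has actually consumed it.
psql -c "SELECT pg_create_physical_replication_slot('archiver');"

# On the archive host: stream WAL into ordinary segment files.
# --slot ties the stream to the slot created above, replacing
# archive_command for this archive.
pg_receivexlog -h master.example.com -U replication_user \
    -D /mnt/server/archivedir --slot archiver
```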
Magnus Hagander <magnus@hagander.net> wrote: > On Wed, Aug 13, 2014 at 11:23 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > >> The above is regarding WAL file archiving -- I'm not putting down >> streaming replication. Of course, what I would have *really* liked >> is a WAL receiver that could write out normal-looking WAL files for >> archiving purposes and pass through the WAL stream to a hot >> standby. Last I checked (which was admittedly at least a couple >> years back) there was no such utility, although I seem to remember >> that Magnus had done some work that looked like it could be bent to >> that end. > > I did. But I think that has mostly been superceded by replication > slots now. As in, if you use pg_receivexlog with a specific > replication slot, I believe you no longer need archive command at all, > do you? Since the replication slot will block rotation of the WAL > files until they are actually archived by pg_receivexlog (What my > command did was have an archive command that looked back into > pg_stat_replication to see if pg_receivexlog had received the data or > not). > > It did not pass through any WAL stream though - you'd have your > standby connect directly to the same master that pg_receivexlog > connects to. What would be the actual reason for having that one do > the passthrough itself? The use case was to maintain both a hot standby and a set of WAL files to allow PITR recovery (e.g., to recover to just before some catastrophic SQL command was executed) to a remote site across a *slow* WAN connection. Rather than send the WAL across the slow connection twice they would ship and apply WAL files and suffer the consequent replication delay to the hot standby; but if the standby could be done through streaming replication and the WAL files could still be re-created off of the same stream, that would be better. 
Basically, where bandwidth is limited and expensive, you don't want to have to send the same WAL data over the same connection more than once. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Aug 17, 2014 at 9:50 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > Magnus Hagander <magnus@hagander.net> wrote: > >> On Wed, Aug 13, 2014 at 11:23 PM, Kevin Grittner <kgrittn@ymail.com> wrote: > >> >>> The above is regarding WAL file archiving -- I'm not putting down >>> streaming replication. Of course, what I would have *really* liked >>> is a WAL receiver that could write out normal-looking WAL files for >>> archiving purposes and pass through the WAL stream to a hot >>> standby. Last I checked (which was admittedly at least a couple >>> years back) there was no such utility, although I seem to remember >>> that Magnus had done some work that looked like it could be bent to >>> that end. >> >> I did. But I think that has mostly been superceded by replication >> slots now. As in, if you use pg_receivexlog with a specific >> replication slot, I believe you no longer need archive command at all, >> do you? Since the replication slot will block rotation of the WAL >> files until they are actually archived by pg_receivexlog (What my >> command did was have an archive command that looked back into >> pg_stat_replication to see if pg_receivexlog had received the data or >> not). >> >> It did not pass through any WAL stream though - you'd have your >> standby connect directly to the same master that pg_receivexlog >> connects to. What would be the actual reason for having that one do >> the passthrough itself? > > The use case was to maintain both a hot standby and a set of WAL > files to allow PITR recovery (e.g., to recover to just before some > catastrophic SQL command was executed) to a remote site across a > *slow* WAN connection. Rather than send the WAL across the slow > connection twice they would ship and apply WAL files and suffer the > consequent replication delay to the hot standby; but if the standby > could be done through streaming replication and the WAL files could > still be re-created off of the same stream, that would be better. 
> > Basically, where bandwidth is limited and expensive, you don't want > to have to send the same WAL data over the same connection more > than once. Oh, now I remember. Different use case, different tool :) That said, you can almost get there with pg_receivexlog - have it create the archives, and use non-streaming replication on the slave... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Magnus, Kevin: Two things: 1) pg_receivexlog doesn't help for users who need to archive from the master to cold storage (e.g. remote SAN, S3, whatever). So we're going to still need an archive_command. 2) One reason users are using the "test -f" version of the archive command is that we put it in the sample postgresql.conf. I would suggest that we don't put *any* archive command in the sample postgresql.conf, since there is no command we can supply which isn't a potential foot-gun. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Mon, Aug 18, 2014 at 7:12 PM, Josh Berkus <josh@agliodbs.com> wrote: > Magnus, Kevin: > > Two things: > > 1) pg_receivexlog doesn't help for users who need to archive from the > master to cold storage (e.g. remote SAN, S3, whatever). So we're going > to still need an archive_command. It can, but it's definitely ugly :) As in you can run pg_receivexlog locally... (No, I am not seriously suggesting you should) > 2) One reason users are using the "test -f" version of the archive > command is that we put it in the same postgresql.conf. I would suggest > that we don't put *any* archive command in the sample postgresql.conf, > since there is no command we can supply which isn't a potential foot-gun. +1. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Josh Berkus <josh@agliodbs.com> writes: > 2) One reason users are using the "test -f" version of the archive > command is that we put it in the same postgresql.conf. I would suggest > that we don't put *any* archive command in the sample postgresql.conf, > since there is no command we can supply which isn't a potential foot-gun. If you want to remove the last line of this: #archive_command = '' # command to use to archive a logfile segment # placeholders: %p = path of file to archive # %f = file name only # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f' I'm okay with that. But if you want to remove the sample command from the SGML docs I'm going to push back a bit harder... regards, tom lane
On 08/18/2014 10:31 AM, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: >> 2) One reason users are using the "test -f" version of the archive >> command is that we put it in the same postgresql.conf. I would suggest >> that we don't put *any* archive command in the sample postgresql.conf, >> since there is no command we can supply which isn't a potential foot-gun. > > If you want to remove the last line of this: > > #archive_command = '' # command to use to archive a logfile segment > # placeholders: %p = path of file to archive > # %f = file name only > # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f' > > I'm okay with that. But if you want to remove the sample command from the > SGML docs I'm going to push back a bit harder... Well, if we want a sample command in the docs, then we should actually give a few different sample commands and even a sample shell script. If we give the users several alternatives, maybe they'll actually think about it instead of just C&P. Make sense? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus <josh@agliodbs.com> writes: > Well, if we want a sample command in the docs, then we should actually > give a few different sample commands and even a sample shell script. If > we give the users several alternatives, maybe they'll actually think > about it instead of just C&P. Sure, if we can think of some plausibly different samples. regards, tom lane
On 08/19/2014 10:39 AM, Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: >> Well, if we want a sample command in the docs, then we should actually >> give a few different sample commands and even a sample shell script. If >> we give the users several alternatives, maybe they'll actually think >> about it instead of just C&P. > > Sure, if we can think of some plausibly different samples. I was thinking: 1. noclobber copy to a local/mounted dir (the current example) 2. rsync to a remote dir with overwrite 3. Windows command example (hopefully using rsync ... can anyone help with this?) 4. Simple bash script example using rsync to a target server, with noarchiving touch file etc. I'd also like to mention pg_receivexlog, Barman, WAL-E, and RepMgr as external utilities and alternatives. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
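[Editorial note: Josh's item 4 might look roughly like this. The sentinel path, destination, and function name are invented for the sketch; `ARCHIVE_DEST` would normally be a remote `host:/dir` rsync target.]

```shell
# Sample archive_command body: rsync to a target, with a "noarchiving"
# touch file that turns archiving into a successful no-op (handy while
# an archive server is down for maintenance).
archive_or_skip() {
    wal_path=$1; wal_file=$2
    dest=${ARCHIVE_DEST:-archive-host:/walarchive}
    sentinel=${NOARCHIVE_SENTINEL:-/etc/postgresql/noarchiving}

    # Sentinel present: report success without archiving so WAL does
    # not pile up in pg_xlog.  Use with care -- skipped segments never
    # reach the archive.
    test -f "$sentinel" && return 0

    rsync -a "$wal_path" "$dest/$wal_file"
}
```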
On Tue, Aug 19, 2014 at 11:27:47AM -0700, Josh Berkus wrote: > I'd also like to mention pg_receivexlog, Barman, WAL-E, and RepMgr as > external utilities and alternatives. Yes, please! We need to give users information on what else is available, and the strengths of each. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 08/20/2014 04:12 PM, Bruce Momjian wrote: > On Tue, Aug 19, 2014 at 11:27:47AM -0700, Josh Berkus wrote: >> I'd also like to mention pg_receivexlog, Barman, WAL-E, and RepMgr as >> external utilites and alternatives. > > Yes, please! We need to give users on what else is available, and the > strengths of each. > Well, I think putting an *evaluation* of the technologies there would be problematic, because the docs change very slowly and outside tech changes fast. I was just going to post "take a look at these examples". -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Thu, Aug 21, 2014 at 11:52:07AM -0700, Josh Berkus wrote: > On 08/20/2014 04:12 PM, Bruce Momjian wrote: > > On Tue, Aug 19, 2014 at 11:27:47AM -0700, Josh Berkus wrote: > >> I'd also like to mention pg_receivexlog, Barman, WAL-E, and RepMgr as > >> external utilites and alternatives. > > > > Yes, please! We need to give users on what else is available, and the > > strengths of each. > > > > Well, I think putting an *evaluation* of the technologies there would be > problematic, because the docs change very slowly and outside tech > changes fast. I was just going to post "take a look at these examples". OK. I personally would like to know the strengths of each. Is that documented anywhere? -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 08/21/2014 12:20 PM, Bruce Momjian wrote: > On Thu, Aug 21, 2014 at 11:52:07AM -0700, Josh Berkus wrote: >> On 08/20/2014 04:12 PM, Bruce Momjian wrote: >>> On Tue, Aug 19, 2014 at 11:27:47AM -0700, Josh Berkus wrote: >>>> I'd also like to mention pg_receivexlog, Barman, WAL-E, and RepMgr as >>>> external utilites and alternatives. >>> >>> Yes, please! We need to give users on what else is available, and the >>> strengths of each. >>> >> >> Well, I think putting an *evaluation* of the technologies there would be >> problematic, because the docs change very slowly and outside tech >> changes fast. I was just going to post "take a look at these examples". > > OK. I personally would like to know the strengths of each. Is that > documented anywhere? Nope. Would be hard to do, because there's some company partisanship involved. Personally I use WAL-E when I'm on AWS and Barman when I'm not, but I know that the other solutions have their adherents. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com