Thread: [RFC] What should we do for reliable WAL archiving?
Hello, The PostgreSQL documentation describes cp (on UNIX/Linux) or copy (on Windows) as an example for archive_command. However, cp/copy does not sync the copied data to disk. As a result, the completed WAL segments would be lost in the following sequence: 1. A WAL segment fills up. 2. The archiver process archives the just filled WAL segment using archive_command. That is, cp/copy reads the WAL segment file from pg_xlog/ and writes to the archive area. At this point, the WAL file is not persisted to the archive area yet, because cp/copy doesn't sync the writes. 3. The checkpoint processing removes the WAL segment file from pg_xlog/. 4. The OS crashes. The filled WAL segment doesn't exist anywhere any more. Considering the "reliable" image of PostgreSQL and widespread use in enterprise systems, I think something should be done. Could you give me your opinions on the right direction? Although the doc certainly escapes by saying "(This is an example, not a recommendation, and might not work on all platforms.)", it seems from pgsql-xxx MLs that many people are following this example. * Improve the example in the documentation. But what command can we use to reliably sync just one file? * Provide some command, say pg_copy, which copies a file synchronously by using fsync(), and describes in the doc something like "for simple use cases, you can use pg_copy as the standard reliable copy command." Related to this topic, pg_basebackup doesn't fsync the backed up files. I'm afraid this too is different from what the users expect --- I guess they would expect the backup is certainly available after pg_basebackup completes even if the machine crashes. Regards MauMau
On Sun, Mar 16, 2014 at 3:53 PM, MauMau <maumau307@gmail.com> wrote: > Hello, > > The PostgreSQL documentation describes cp (on UNIX/Linux) or copy (on > Windows) as an example for archive_command. However, cp/copy does not sync > the copied data to disk. As a result, the completed WAL segments would be > lost in the following sequence: > > 1. A WAL segment fills up. > > 2. The archiver process archives the just filled WAL segment using > archive_command. That is, cp/copy reads the WAL segment file from pg_xlog/ > and writes to the archive area. At this point, the WAL file is not > persisted to the archive area yet, because cp/copy doesn't sync the writes. > > 3. The checkpoint processing removes the WAL segment file from pg_xlog/. > > 4. The OS crashes. The filled WAL segment doesn't exist anywhere any more. > > Considering the "reliable" image of PostgreSQL and widespread use in > enterprise systems, I think something should be done. Could you give me > your opinions on the right direction? How about using pg_receivexlog for archiving purpose? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
From: "Amit Kapila" <amit.kapila16@gmail.com> > How about using pg_receivexlog for archiving purpose? pg_receivexlog is good in that it does fsync(). But it seems difficult to use correctly, and I'm not sure if I can catch all WAL segments without any loss. pg_receivexlog must be started with postmaster and monitored with some measures. This won't be very easy at least on Windows. The pg_receivexlog reference page suggests another difficulty: Notes When using pg_receivexlog instead of archive_command, the server will continue to recycle transaction log files even if the backups are not properly archived, since there is no command that fails. This can be worked around by having an archive_command that fails when the file has not been properly archived yet, for example: archive_command = 'sleep 5 && test -f /mnt/server/archivedir/%f' This suggestion is not correct, because it only checks the existence of the file. What if the file size is less than 16MB? How can we check if the file is completely archived? Regards MauMau
On Sun, Mar 16, 2014 at 10:23 AM, MauMau <maumau307@gmail.com> wrote: > The PostgreSQL documentation describes cp (on UNIX/Linux) or copy (on > Windows) as an example for archive_command. However, cp/copy does not sync > the copied data to disk I'm actually a lot less concerned about fsyncing the backup than I am about fsyncing the restore. The backup is just a bunch of files for the user to make use of. They might copy them around, compress them, move them onto tape or other storage. They need to be aware of the persistence of whatever storage system they're putting them in. But when they do a restore they just untar or whatever other extraction tool and then hand those files to Postgres to maintain. I bet the number of people who fsync or call sync the data files after untarring their backups is vanishingly small and problems could manifest later after Postgres has been running. WAL-e recently changed to fsync each data file and the directories containing them after restore. But perhaps Postgres should open and fsync each file in the database when it starts up? In most file systems files written to are guaranteed to be synced within a configurable amount of time (in some systems unless the filesystem can't keep up). So the practical risk may be small. But in theory a database that wasn't synced when it was restored could suddenly lose files days or months later when a crash occurs and some data files weren't touched by the database in the intervening time. -- greg
On 03/16/2014 03:23 PM, MauMau wrote: > From: "Amit Kapila" <amit.kapila16@gmail.com> >> How about using pg_receivexlog for archiving purpose? > > pg_receivexlog is good in that it does fsync(). But it seems difficult > to use correctly, and I'm not sure if I can catch all WAL segments > without any loss. pg_receivexlog must be started with postmaster and > monitored with some measures. This won't be very easy at least on Windows. Replication slots should solve the issue of making sure to catch all of the WAL. -- Andreas Karlsson
On Sun, Mar 16, 2014 at 6:23 AM, MauMau <maumau307@gmail.com> wrote: > The PostgreSQL documentation describes cp (on UNIX/Linux) or copy (on > Windows) as an example for archive_command. However, cp/copy does not sync > the copied data to disk. As a result, the completed WAL segments would be > lost in the following sequence: > > 1. A WAL segment fills up. > > 2. The archiver process archives the just filled WAL segment using > archive_command. That is, cp/copy reads the WAL segment file from pg_xlog/ > and writes to the archive area. At this point, the WAL file is not > persisted to the archive area yet, because cp/copy doesn't sync the writes. > > 3. The checkpoint processing removes the WAL segment file from pg_xlog/. > > 4. The OS crashes. The filled WAL segment doesn't exist anywhere any more. > > Considering the "reliable" image of PostgreSQL and widespread use in > enterprise systems, I think something should be done. Could you give me > your opinions on the right direction? Although the doc certainly escapes by > saying "(This is an example, not a recommendation, and might not work on all > platforms.)", it seems from pgsql-xxx MLs that many people are following > this example. > > * Improve the example in the documentation. > But what command can we use to reliably sync just one file? > > * Provide some command, say pg_copy, which copies a file synchronously by > using fsync(), and describes in the doc something like "for simple use > cases, you can use pg_copy as the standard reliable copy command." +1. This won't obviate the need for tools to manage replication, but it would make it possible to get the simplest case right without guessing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Mar 16, 2014 at 7:53 PM, MauMau <maumau307@gmail.com> wrote: > From: "Amit Kapila" <amit.kapila16@gmail.com> > >> How about using pg_receivexlog for archiving purpose? > > > pg_receivexlog is good in that it does fsync(). But it seems difficult to > use correctly, and I'm not sure if I can catch all WAL segments without any > loss. pg_receivexlog must be started with postmaster and monitored with > some measures. This won't be very easy at least on Windows. > > The pg_receivexlog reference page suggests another difficulty: > > Notes > When using pg_receivexlog instead of archive_command, the server will > continue to recycle transaction log files even if the backups are not > properly archived, since there is no command that fails. This can be worked > around by having an archive_command that fails when the file has not been > properly archived yet, for example: > archive_command = 'sleep 5 && test -f /mnt/server/archivedir/%f' > > This suggestion is not correct, because it only checks the existence of the > file. What if the file size is less than 16MB? How can we check if the > file is completely archived? The most probable reasons for un-successful archiving could be: 1. Disk space got full - pg_receivexlog makes sure while open/create new segment file that the size of new file should be 16MB (open_walfile()). So due to this reason there should not be a problem to above command. 2. Permission got denied - I think this will lead to failure of above archive command mentioned by you. 3. n/w connection broken - This will also lead to failure of above command, but here I think there is a possibility that it might have checked the existence of in-complete wal file on archive location and consider it archived, but I think wal_keep_segments can avoid this problem. Also if you are on 9.4, then may be --slot parameter can help you. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Mon, Mar 17, 2014 at 10:20 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Mar 16, 2014 at 6:23 AM, MauMau <maumau307@gmail.com> wrote: >> The PostgreSQL documentation describes cp (on UNIX/Linux) or copy (on >> Windows) as an example for archive_command. However, cp/copy does not sync >> the copied data to disk. As a result, the completed WAL segments would be >> lost in the following sequence: >> >> 1. A WAL segment fills up. >> >> 2. The archiver process archives the just filled WAL segment using >> archive_command. That is, cp/copy reads the WAL segment file from pg_xlog/ >> and writes to the archive area. At this point, the WAL file is not >> persisted to the archive area yet, because cp/copy doesn't sync the writes. >> >> 3. The checkpoint processing removes the WAL segment file from pg_xlog/. >> >> 4. The OS crashes. The filled WAL segment doesn't exist anywhere any more. >> >> Considering the "reliable" image of PostgreSQL and widespread use in >> enterprise systems, I think something should be done. Could you give me >> your opinions on the right direction? Although the doc certainly escapes by >> saying "(This is an example, not a recommendation, and might not work on all >> platforms.)", it seems from pgsql-xxx MLs that many people are following >> this example. >> >> * Improve the example in the documentation. >> But what command can we use to reliably sync just one file? >> >> * Provide some command, say pg_copy, which copies a file synchronously by >> using fsync(), and describes in the doc something like "for simple use >> cases, you can use pg_copy as the standard reliable copy command." > > +1. This won't obviate the need for tools to manage replication, but > it would make it possible to get the simplest case right without > guessing. +1, too. And, what about making pg_copy call posix_fadvise(DONT_NEED) against the archived file after the copy? Also It might be good idea to support the direct copy of the file to avoid wasting the file cache. Regards, -- Fujii Masao
2014-03-17 21:12 GMT+09:00 Fujii Masao <masao.fujii@gmail.com>:
+1, too.On Mon, Mar 17, 2014 at 10:20 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Mar 16, 2014 at 6:23 AM, MauMau <maumau307@gmail.com> wrote:
>> The PostgreSQL documentation describes cp (on UNIX/Linux) or copy (on
>> Windows) as an example for archive_command. However, cp/copy does not sync
>> the copied data to disk. As a result, the completed WAL segments would be
>> lost in the following sequence:
>>
>> 1. A WAL segment fills up.
>>
>> 2. The archiver process archives the just filled WAL segment using
>> archive_command. That is, cp/copy reads the WAL segment file from pg_xlog/
>> and writes to the archive area. At this point, the WAL file is not
>> persisted to the archive area yet, because cp/copy doesn't sync the writes.
>>
>> 3. The checkpoint processing removes the WAL segment file from pg_xlog/.
>>
>> 4. The OS crashes. The filled WAL segment doesn't exist anywhere any more.
>>
>> Considering the "reliable" image of PostgreSQL and widespread use in
>> enterprise systems, I think something should be done. Could you give me
>> your opinions on the right direction? Although the doc certainly escapes by
>> saying "(This is an example, not a recommendation, and might not work on all
>> platforms.)", it seems from pgsql-xxx MLs that many people are following
>> this example.
>>
>> * Improve the example in the documentation.
>> But what command can we use to reliably sync just one file?
>>
>> * Provide some command, say pg_copy, which copies a file synchronously by
>> using fsync(), and describes in the doc something like "for simple use
>> cases, you can use pg_copy as the standard reliable copy command."
>
> +1. This won't obviate the need for tools to manage replication, but
> it would make it possible to get the simplest case right without
> guessing.
And, what about making pg_copy call posix_fadvise(DONT_NEED) against the
archived file after the copy? Also It might be good idea to support the direct
copy of the file to avoid wasting the file cache.
Use direct_cp.
Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
From: "Mitsumasa KONDO" <kondo.mitsumasa@gmail.com> > 2014-03-17 21:12 GMT+09:00 Fujii Masao <masao.fujii@gmail.com>: > >> On Mon, Mar 17, 2014 at 10:20 AM, Robert Haas <robertmhaas@gmail.com> >> wrote: >> > On Sun, Mar 16, 2014 at 6:23 AM, MauMau <maumau307@gmail.com> wrote: >> >> * Improve the example in the documentation. >> >> But what command can we use to reliably sync just one file? >> >> >> >> * Provide some command, say pg_copy, which copies a file synchronously >> by >> >> using fsync(), and describes in the doc something like "for simple use >> >> cases, you can use pg_copy as the standard reliable copy command." >> > >> > +1. This won't obviate the need for tools to manage replication, but >> > it would make it possible to get the simplest case right without >> > guessing. >> >> +1, too. >> >> And, what about making pg_copy call posix_fadvise(DONT_NEED) against the >> archived file after the copy? Also It might be good idea to support the >> direct >> copy of the file to avoid wasting the file cache. > > Use direct_cp. > http://directcp.sourceforge.net/direct_cp.html Thank you all for giving favorable responses and interesting ideas. Then, I think I'll do: * Create pg_copy in C so that it can be used on Windows as well as on UNIX/Linux. It just copies one file. Its source code is located in src/bin/pg_copy/. Please recommend a better name if you have one in mind. * Add a reference page for pg_copy in the chapter "Server applications". Modify the section for continuous archiving to recommend pg_copy for simple use cases as the standard command. * pg_copy calls posix_fadvise(DONT_NEED) on the destination file. * pg_copy passes O_DIRECT flag when opening the destination file when --directio or -d option is specified. O_DIRECT is not used by default because it may not be available on some file systems, as well as it might cause trouble on older platforms such as RHEL4/5. pg_copy does not use O_DIRECT for the source file so that it can copy the data from the filesystem cache, which is just written by postgres. Could you give me your opinions before starting the work, including the following? * Should I refactor the functions (copy_file, copydir, etc.) in src/backend/storage/file/copydir.c so that they can also be used for frontends? If so, which of src/port or src/common/ is the right place to put copydir.c in? * Should I complete the work before 9.4 beta so that it will be available starting with 9.4? I think so because it is a basic capability to archive transaction logs safely (although the time may not allow me to do this). Regards MauMau
On Fri, Mar 21, 2014 at 8:54 PM, MauMau <maumau307@gmail.com> wrote: > * Create pg_copy in C so that it can be used on Windows as well as on > UNIX/Linux. It just copies one file. Its source code is located in > src/bin/pg_copy/. Please recommend a better name if you have one in mind. I'd rather see that as a part of contrib/ if possible. Is there any portion of the code you have in mind that makes mandatory putting it in src/bin? > * Should I complete the work before 9.4 beta so that it will be available > starting with 9.4? I think so because it is a basic capability to archive > transaction logs safely (although the time may not allow me to do this). Pursing efforts on a utility like that is worth the shot IMO (I would use it for sure if it has reliable cross-platform support to unify sets of replication scripts), but including it in 9.4 is out of scope. A saner target would be the 1st commit fest of 9.5. Regards, -- Michael
From: "Michael Paquier" <michael.paquier@gmail.com> > On Fri, Mar 21, 2014 at 8:54 PM, MauMau <maumau307@gmail.com> wrote: >> * Create pg_copy in C so that it can be used on Windows as well as on >> UNIX/Linux. It just copies one file. Its source code is located in >> src/bin/pg_copy/. Please recommend a better name if you have one in >> mind. > I'd rather see that as a part of contrib/ if possible. Is there any > portion of the code you have in mind that makes mandatory putting it > in src/bin? Archiving transaction logs reliably is a basic responsibility of DBMS, so I think it should be treated as part of the core. It is not a desirable feature but actually a mandatory one to persist transaction logs. Even if it were a "better to have" feature, it can be put in the core like pg_basebackup and pg_isready, which are not mandatory tools. >> * Should I complete the work before 9.4 beta so that it will be available >> starting with 9.4? I think so because it is a basic capability to >> archive >> transaction logs safely (although the time may not allow me to do this). > Pursing efforts on a utility like that is worth the shot IMO (I would > use it for sure if it has reliable cross-platform support to unify > sets of replication scripts), but including it in 9.4 is out of scope. > A saner target would be the 1st commit fest of 9.5. OK, I don't mind if it should be targeted at 9.4 or 9.5. If someone wants it for 9.4, I try to hurry. Regards MauMau
From: "Michael Paquier" <michael.paquier@gmail.com> > On Fri, Mar 21, 2014 at 8:54 PM, MauMau <maumau307@gmail.com> wrote: >> * Create pg_copy in C so that it can be used on Windows as well as on >> UNIX/Linux. It just copies one file. Its source code is located in >> src/bin/pg_copy/. Please recommend a better name if you have one in >> mind. > I'd rather see that as a part of contrib/ if possible. Is there any > portion of the code you have in mind that makes mandatory putting it > in src/bin? Archiving transaction logs reliably is a basic responsibility of DBMS, so I think it should be treated as part of the core. It is not a desirable feature but actually a mandatory one to persist transaction logs. Even if it were a "better to have" feature, it can be put in the core like pg_basebackup and pg_isready, which are not mandatory tools. >> * Should I complete the work before 9.4 beta so that it will be available >> starting with 9.4? I think so because it is a basic capability to >> archive >> transaction logs safely (although the time may not allow me to do this). > Pursing efforts on a utility like that is worth the shot IMO (I would > use it for sure if it has reliable cross-platform support to unify > sets of replication scripts), but including it in 9.4 is out of scope. > A saner target would be the 1st commit fest of 9.5. OK, I don't mind if it should be targeted at 9.4 or 9.5. If someone wants it for 9.4, I try to hurry. Regards MauMau
On Sun, Mar 16, 2014 at 3:23 AM, MauMau <maumau307@gmail.com> wrote:
Hello,
The PostgreSQL documentation describes cp (on UNIX/Linux) or copy (on Windows) as an example for archive_command. However, cp/copy does not sync the copied data to disk. As a result, the completed WAL segments would be lost in the following sequence:
1. A WAL segment fills up.
2. The archiver process archives the just filled WAL segment using archive_command. That is, cp/copy reads the WAL segment file from pg_xlog/ and writes to the archive area. At this point, the WAL file is not persisted to the archive area yet, because cp/copy doesn't sync the writes.
3. The checkpoint processing removes the WAL segment file from pg_xlog/.
Note that it takes two checkpoints for this to happen, at least as currently coded.
Also, if the system crashed badly enough to need media recovery, rather than just automatic crash recovery, some lost transactions are expected. Although this could silently break your PITR chain, of a crash happened and automatic recover used the copy in pg_xlog (which of course was synced) , while copy in the archive was not synced.
4. The OS crashes. The filled WAL segment doesn't exist anywhere any more.
Considering the "reliable" image of PostgreSQL and widespread use in enterprise systems, I think something should be done. Could you give me your opinions on the right direction? Although the doc certainly escapes by saying "(This is an example, not a recommendation, and might not work on all platforms.)", it seems from pgsql-xxx MLs that many people are following this example.
I use this as an example, kind of, but what I am copying to is a network mount, so any attempts to fsync it there would probably need unavailable hooks into the remote file system.
Do people really just copy the files from one directory of local storage to another directory of local storage? I don't see the point of that. But it seems like this is an area where there are hundreds of use cases, and often one doesn't see the point of other people's, making it hard to come up with good examples.
* Improve the example in the documentation.
But what command can we use to reliably sync just one file?
* Provide some command, say pg_copy, which copies a file synchronously by using fsync(), and describes in the doc something like "for simple use cases, you can use pg_copy as the standard reliable copy command."
The recommendation is to refuse to overwrite an existing file of the same name, and exit with failure. Which essentially brings archiving to a halt, because it keeps trying but it will keep failing. If we make a custom version, one thing it should do is determine if the existing archived file is just a truncated version of the attempting-to-be archived file, and if so overwrite it. Because if the first archival command fails with a network glitch, it can leave behind a partial file.
Cheers,
Jeff
From: "Jeff Janes" <jeff.janes@gmail.com> > Do people really just copy the files from one directory of local storage > to > another directory of local storage? I don't see the point of that. It makes sense to archive WAL to a directory of local storage for media recovery. Here, the local storage is a different disk drive which is directly attached to the database server or directly connected through SAN. > The recommendation is to refuse to overwrite an existing file of the same > name, and exit with failure. Which essentially brings archiving to a > halt, > because it keeps trying but it will keep failing. If we make a custom > version, one thing it should do is determine if the existing archived file > is just a truncated version of the attempting-to-be archived file, and if > so overwrite it. Because if the first archival command fails with a > network glitch, it can leave behind a partial file. What I'm trying to address is just an alternative to cp/copy which fsyncs a file. It just overwrites an existing file. Yes, you're right, the failed archive attempt leaves behind a partial file which causes subsequent attempts to fail, if you follow the PG manual. That's another undesirable point in the current doc. To overcome this, someone on this ML recommended me to do "cp %p /archive/dir/%f.tmp && mv /archive/dir/%f.tmp /archive/dir/%f". Does this solve your problem? Regards MauMau
On Fri, Mar 21, 2014 at 01:16:08PM -0700, Jeff Janes wrote: > On Sun, Mar 16, 2014 at 3:23 AM, MauMau <maumau307@gmail.com> wrote: > > Hello, > > The PostgreSQL documentation describes cp (on UNIX/Linux) or copy (on > Windows) as an example for archive_command. However, cp/copy does not sync > the copied data to disk. As a result, the completed WAL segments would be > lost in the following sequence: > > 1. A WAL segment fills up. > > 2. The archiver process archives the just filled WAL segment using > archive_command. That is, cp/copy reads the WAL segment file from pg_xlog/ > and writes to the archive area. At this point, the WAL file is not > persisted to the archive area yet, because cp/copy doesn't sync the writes. > > 3. The checkpoint processing removes the WAL segment file from pg_xlog/. > > > Note that it takes two checkpoints for this to happen, at least as currently > coded. > > Also, if the system crashed badly enough to need media recovery, rather than > just automatic crash recovery, some lost transactions are expected. Although > this could silently break your PITR chain, of a crash happened and automatic > recover used the copy in pg_xlog (which of course was synced) , while copy in > the archive was not synced. That is one good reason to keep checkpoint_warning=30, so the typical file system sync that happens every 30 seconds warns that those files might not on permanent storage. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On Sat, Mar 22, 2014 at 06:22:37AM +0900, MauMau wrote: > From: "Jeff Janes" <jeff.janes@gmail.com> > >Do people really just copy the files from one directory of local > >storage to > >another directory of local storage? I don't see the point of that. > > It makes sense to archive WAL to a directory of local storage for > media recovery. Here, the local storage is a different disk drive > which is directly attached to the database server or directly > connected through SAN. I'm one of those peope. They are archived into a local directory in preparation for an rsync over ssh. > >The recommendation is to refuse to overwrite an existing file of the same > >name, and exit with failure. Which essentially brings archiving > >to a halt, > >because it keeps trying but it will keep failing. If we make a custom > >version, one thing it should do is determine if the existing archived file > >is just a truncated version of the attempting-to-be archived file, and if > >so overwrite it. Because if the first archival command fails with a > >network glitch, it can leave behind a partial file. > > What I'm trying to address is just an alternative to cp/copy which > fsyncs a file. It just overwrites an existing file. I ran into a related problem with cp, where halfway the copy the disk was full and I was left with half a WAL file. This caused the rsync to copy only half a file and the replication broke. This is clearly a recoverable situation, but it didn't recover in this case. > Yes, you're right, the failed archive attempt leaves behind a > partial file which causes subsequent attempts to fail, if you follow > the PG manual. That's another undesirable point in the current doc. > To overcome this, someone on this ML recommended me to do "cp %p > /archive/dir/%f.tmp && mv /archive/dir/%f.tmp /archive/dir/%f". > Does this solve your problem? This would probably have handled it, but I find it odd that there's program to handle restoring of archives properly, but on the archiving side you have to cobble together your own shell scripts which fail in various corner cases. I'd love a program that just Did The Right Thing. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > He who writes carelessly confesses thereby at the very outset that he does > not attach much importance to his own thoughts. -- Arthur Schopenhauer
On Fri, Mar 21, 2014 at 2:22 PM, MauMau <maumau307@gmail.com> wrote:
From: "Jeff Janes" <jeff.janes@gmail.com>It makes sense to archive WAL to a directory of local storage for media recovery. Here, the local storage is a different disk drive which is directly attached to the database server or directly connected through SAN.Do people really just copy the files from one directory of local storage to
another directory of local storage? I don't see the point of that.
For a SAN I guess we have different meanings of "local" :)
(I have no doubt yours is correct--the fine art of IT terminology is not my thing.)
What I'm trying to address is just an alternative to cp/copy which fsyncs a file. It just overwrites an existing file.The recommendation is to refuse to overwrite an existing file of the same
name, and exit with failure. Which essentially brings archiving to a halt,
because it keeps trying but it will keep failing. If we make a custom
version, one thing it should do is determine if the existing archived file
is just a truncated version of the attempting-to-be archived file, and if
so overwrite it. Because if the first archival command fails with a
network glitch, it can leave behind a partial file.
Yes, you're right, the failed archive attempt leaves behind a partial file which causes subsequent attempts to fail, if you follow the PG manual. That's another undesirable point in the current doc. To overcome this, someone on this ML recommended me to do "cp %p /archive/dir/%f.tmp && mv /archive/dir/%f.tmp /archive/dir/%f". Does this solve your problem?
As written is doesn't solve it, as it just unconditionally overwrites the file. If you wanted that you could just do the single-statement unconditional overwrite.
You could make it so that the .tmp gets overwritten unconditionally, but the move of it will not overwrite an existing permanent file. That would solve the problem where a glitch in the network leaves in incomplete file behind that blocks the next attempt, *except* that mv on (at least some) network file systems is really a copy, and not an atomic rename, so is still subject to leaving behind incomplete crud.
But, it is hard to tell what the real solution is, because the doc doesn't explain why it should refuse (and fail) to overwrite an existing file. The only reason I can think of to make that recommendation is because it is easy to accidentally configure two clusters to attempt to archive to the same location, and having them overwrite each others files should be guarded against. If I am right, it seems like this reason should be added to the docs, so people know what they are defending against. And if I am wrong, it seems even more important that the (correct) reason is added to the docs.
Cheers,
Jeff
Jeff Janes <jeff.janes@gmail.com> writes: > But, it is hard to tell what the real solution is, because the doc doesn't > explain why it should refuse (and fail) to overwrite an existing file. The > only reason I can think of to make that recommendation is because it is > easy to accidentally configure two clusters to attempt to archive to the > same location, and having them overwrite each others files should be > guarded against. If I am right, it seems like this reason should be added > to the docs, so people know what they are defending against. And if I am > wrong, it seems even more important that the (correct) reason is added to > the docs. If memory serves, that is the reason ... and I thought it *was* explained somewhere in the docs. regards, tom lane
On Saturday, March 29, 2014, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeff Janes <jeff.janes@gmail.com> writes:
> But, it is hard to tell what the real solution is, because the doc doesn't
> explain why it should refuse (and fail) to overwrite an existing file. The
> only reason I can think of to make that recommendation is because it is
> easy to accidentally configure two clusters to attempt to archive to the
> same location, and having them overwrite each others files should be
> guarded against. If I am right, it seems like this reason should be added
> to the docs, so people know what they are defending against. And if I am
> wrong, it seems even more important that the (correct) reason is added to
> the docs.
If memory serves, that is the reason ... and I thought it *was* explained
somewhere in the docs.
You are right, and it has been there for a decade. I don't know how I missed that the last several times I read it. I remember clearly the paragraph below it, just not that one.
Sorry,
Jeff