Thread: WAL archiving to network drive
I'm setting up WAL archiving on a Windows machine & need to copy the WAL files to a network drive. Is it best to give the 'postgres' user network access & archive the WAL files directly to the network drive? Or archive the WAL files to a local folder and then use a scheduled task to move them to the network drive? (Or something else entirely?) Thanks, --Rob Adams
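For reference, the two options look roughly like this in postgresql.conf. These are hedged sketches only: the paths and server names are made up, and the exact quoting rules depend on your PostgreSQL version (backslashes must be doubled inside the single-quoted string on Windows).

```
# Option 1: archive directly to the network drive (hypothetical UNC path)
archive_command = 'copy "%p" "\\\\backupserver\\wal_archive\\%f"'

# Option 2: archive to a local spool folder, then have a scheduled
# task move the files to the network drive (hypothetical local path)
archive_command = 'copy "%p" "C:\\wal_spool\\%f"'
```

In both cases PostgreSQL substitutes %p with the path of the WAL file and %f with its file name, and treats a nonzero exit status as "not archived yet, retry later".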
Rob Adams wrote:
> I'm setting up WAL archiving on a Windows machine & need to copy the WAL
> files to a network drive.
>
> Is it best to give the 'postgres' user network access & archive the WAL
> files directly to the network drive? Or archive the WAL files to a local
> folder and then use a scheduled task to move them to the network drive?
> (Or something else entirely?)

I am archiving them directly, from a Linux installation, to a Windows machine. As long as the error handling is sound, it should work OK. I've had no problems other than the expected downtime causing WAL files to pile up. It's been working great for a couple of years now.

I write the files with an alternate file name, then rename them. That helps make sure half-written files are not mistaken for fully written ones.

I also have a little watchdog script that runs on cron every few minutes and counts the number of WAL files present in the xlog directory. If that count is oddly high, I get an email. I've been alerted to, and quickly fixed, a problem three or four times this way.
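The two tricks above -- write under a temporary name then rename, and a watchdog that counts segments -- can be sketched like this. This is an illustrative sketch, not the poster's actual scripts; the function names and paths are invented.

```python
import os
import shutil

def archive_segment(src_path, archive_dir):
    """Copy a WAL segment under a temporary name, then rename it.

    os.replace() is atomic within a filesystem, so a reader of the
    archive directory never mistakes a half-written file for a
    complete segment. (Hypothetical helper, not from the thread.)
    """
    name = os.path.basename(src_path)
    tmp_path = os.path.join(archive_dir, name + ".tmp")
    shutil.copyfile(src_path, tmp_path)
    os.replace(tmp_path, os.path.join(archive_dir, name))

def count_wal_segments(xlog_dir):
    """Watchdog check: count WAL segment files in the xlog directory.

    WAL segment file names are 24 uppercase hex digits (e.g.
    000000010000000000000001). An oddly high count suggests the
    archiver is failing and segments are piling up.
    """
    return sum(
        1
        for f in os.listdir(xlog_dir)
        if len(f) == 24 and all(c in "0123456789ABCDEF" for c in f)
    )
```

A cron job would call `count_wal_segments` every few minutes and mail an alert when the count crosses a site-specific threshold.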
On Tue, 19 Aug 2008, Rob Adams wrote:
> Is it best to give the 'postgres' user network access & archive the WAL files
> directly to the network drive? Or archive the WAL files to a local folder and
> then use a scheduled task to move them to the network drive?

I normally write first to a local drive and then move to the network, because that has some useful properties:

1) As long as there's local disk space, the archive_command call always returns quickly. While performance here doesn't really matter, it makes me feel better.

2) If there is a problem with the network drive, the slow/blocked network write is not happening where the database is waiting for it.

3) In an emergency where the usual network drive is unavailable, you can temporarily change your archive strategy without touching the database server's configuration.

4) The copying process doesn't need read access to the main database, and the database server process doesn't need permissions on the network drive. I find myself giving it a completely different set of permissions, so that it can't touch the database at all, but can run alert scripts and take actions I'd never want the database user to have access to.

The main downside of doing it this way is that you're stuck writing and debugging a lot more scripts to make sure everything happens correctly. I've found the alternative--just putting something into archive_command that does the transfer directly--isn't quite flexible enough when things go wrong for me to be comfortable with it.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
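The decoupled design Greg describes splits into two pieces: a trivial archive_command target that only copies locally and returns fast, and a separate mover (a cron job or scheduled task, running under its own permissions) that pushes spooled segments to the network drive. A minimal sketch, with invented names and paths:

```python
import os
import shutil

def archive_to_spool(wal_path, spool_dir):
    """archive_command side: quick local copy that succeeds as long as
    there is local disk space. (Illustrative, not Greg's actual script.)"""
    dest = os.path.join(spool_dir, os.path.basename(wal_path))
    shutil.copyfile(wal_path, dest + ".tmp")
    os.replace(dest + ".tmp", dest)  # atomic: never expose a partial file
    return 0  # exit status 0 tells PostgreSQL the segment is archived

def move_spool_to_network(spool_dir, network_dir, alert=print):
    """Mover side: runs outside the database server, possibly under a
    completely different set of permissions. A network outage blocks
    only this process; segments simply accumulate in the local spool."""
    for name in sorted(os.listdir(spool_dir)):
        if name.endswith(".tmp"):
            continue  # half-written file from archive_to_spool
        src = os.path.join(spool_dir, name)
        try:
            shutil.copyfile(src, os.path.join(network_dir, name + ".tmp"))
            os.replace(os.path.join(network_dir, name + ".tmp"),
                       os.path.join(network_dir, name))
            os.remove(src)
        except OSError as e:
            alert("network archive failed for %s: %s" % (name, e))
            break  # keep spooling locally; retry on the next run
```

Because the mover is just another script, it can be swapped out during an outage (point 3 above) without touching the database server's configuration at all.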
Greg Smith wrote:
> 2) If there is a problem with the network drive, the slow/blocked
> network write is not happening where the database is waiting for it.

This is not really a consideration, is it? I was under the impression that no other database activity blocks waiting for the archiver. That doesn't even make sense.

> 3) In an emergency where the usual network drive is unavailable, you can
> temporarily change your archive strategy without touching the database
> server's configuration.

That's just as true with the direct write method. You simply change the archive script to do nothing and return a failed exit status. The DBMS will keep trying until you change it back, and then everything catches up and proceeds as normal.

-Glen
On Wed, 20 Aug 2008, Glen Parker wrote:
> Greg Smith wrote:
>> 2) If there is a problem with the network drive, the slow/blocked network
>> write is not happening where the database is waiting for it.
>
> This is not really a consideration, is it? I was under the impression that
> no other database activity blocks waiting for the archiver.

The database will continue accumulating WAL segments it can't recycle if the archiver keeps failing, which can cause the size of the pg_xlog directory (often mounted on a separate, smaller partition or disk) to increase dramatically. You do not want to be the guy who caused the database to go down because the xlog disk filled up after some network mount flaked out. I've seen that way too many times in WAN environments where the remote location was unreachable for days, due to a natural disaster for example, and since under normal operation pg_xlog never got very big, it wasn't sized for that.

It will also slow things down a bit under heavy write loads, as every segment change will result in creating a new segment file rather than re-using an old one.

You also don't want to be the guy who has to explain why the database is taking hours to come back up after it crashed with 4000 WAL segments to replay, because archiving failed for a long time and prevented proper checkpoints (ask Robert Treat if you don't believe me; he also once was that guy). While occasional archiving failures are tolerated just fine and you can catch up later, a design that presumes it's OK for them to happen for extended periods is not optimal. A crash after an excess of segments has accumulated can result in a large amount of replay downtime for the server, and in some environments (where the primary is very redundant, for example) that risk is worse than saying your PITR backup is hosed and just throwing away the segments if you can't transfer them to the backup for some reason.
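A sketch of the kind of check that would catch the disk-fill failure mode above before it takes the database down. The 80% threshold is an assumption, not something from the thread:

```python
import shutil

def xlog_partition_ok(xlog_dir, max_used_fraction=0.8):
    """Return False if the partition holding pg_xlog is dangerously full.

    On a small dedicated xlog partition, unarchivable segments piling
    up can fill the disk and stop the server; alerting at an assumed
    80% usage threshold leaves time to fix the network mount or shed
    segments before that happens.
    """
    usage = shutil.disk_usage(xlog_dir)
    return usage.used / usage.total < max_used_fraction
```

Run from cron alongside the segment-count watchdog, this catches the case where segments are not just numerous but about to exhaust a partition that was never sized for an extended archiver outage.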
>> 3) In an emergency where the usual network drive is unavailable, you can
>> temporarily change your archive strategy without touching the database
>> server's configuration.
>
> That's just as true with the direct write method. You simply change the
> archive script to do nothing and return a failed exit status. The DBMS will
> keep trying until you change it back, and then everything catches up and
> proceeds as normal.

I've already been through, above, why assuming archive failures have zero cost can be bad. There are some additional issues here as well. When's the last time you tried to get a DBA to send a "kill -HUP" to a production database to get it to recognize that archive_command was changed in postgresql.conf? Sure, we all know it's not supposed to ever hurt the server, but Murphy is a tricky character and lots of people are (rightly) paranoid. I think we've all known admins whose fingers were trained to only ever type "kill -9" no matter what signal they meant to send.

OK, maybe you're smarter than that and used a separate script. DBAs are also not happy changing a script that gets called by the database every couple of minutes, and as soon as there's more than one piece involved it can be difficult to do an atomic update of said script.

Any time you can decouple a failure-recovery cleanup (like a network drive getting disconnected and screwing up archiving) completely from routine database operation, you have gained flexibility in how easily and safely you can recover from that failure. I assure you that all the suggestions I made come from painful and completely unexpected messes.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
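The atomic-script-update worry above can be handled the same way as the WAL files themselves: write the new version beside the old one and rename it into place. A sketch, with a hypothetical script path:

```python
import os

def replace_archive_script(script_path, new_body):
    """Atomically swap a script that archive_command (or a mover job)
    calls every couple of minutes.

    os.replace() is an atomic rename on POSIX filesystems, so a
    concurrent caller sees either the complete old script or the
    complete new one, never a truncated mix of the two.
    """
    tmp = script_path + ".new"
    with open(tmp, "w") as f:
        f.write(new_body)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes hit disk first
    os.chmod(tmp, 0o755)      # keep it executable
    os.replace(tmp, script_path)
```

This only covers a single file; if the archiving setup spans several scripts that must change together, they need to be funneled through one atomically replaced entry point for the same guarantee to hold.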
Greg Smith <gsmith@gregsmith.com> writes:
> You also don't want to be the guy who has to explain why the database is
> taking hours to come back up again after it crashed and has 4000 WAL
> segments to replay, because archiving failed for a long time and prevented
> proper checkpoints (ask Robert Treat if you don't believe me, he also once
> was that guy).

Say what? Archiver failure can't/shouldn't prevent checkpointing.

regards, tom lane
Greg Smith wrote:
> On Wed, 20 Aug 2008, Glen Parker wrote:
>
> The database will continue accumulating WAL segments it can't recycle if
> the archiver keeps failing, which can cause the size of the pg_xlog
> directory (often mounted on a separate, smaller partition or disk) to
> increase dramatically. You do not want to be the guy who caused the
> database to go down because the xlog disk filled after some network
> mount flaked out. I've seen that way too many times in WAN environments
> where the remote location was unreachable for days, due to natural
> disaster for example, and since under normal operation pg_xlog never got
> very big it wasn't sized for that.
>
> It will also slow things down a bit under heavy write loads, as every
> segment change will result in creating a new segment file rather than
> re-using an old one.

So you advocate archiving the WAL files from a small xlog volume to a larger local volume. Why not just make the xlog volume large enough to handle overruns, since you obviously have the space? The cost of copying each WAL file from one place to another on the local machine FAR outweighs the extra overhead created when WAL files must be created rather than recycled.

Also, you mention days of downtime, natural disasters, and a WAN. My DBMS and archive machines are in the same room. If I had to deal with different locations, I'd build more safety into the system. In fact, in a way, I have: my WALs are archived immediately to another machine, where they are (hours later) sent to tape in batches, and the tapes are then taken off-site, emulating your decoupled system to some extent.

> OK, maybe you're smarter than that and used a separate script. DBAs are
> also not happy changing a script that gets called by the database every
> couple of minutes, and as soon as there's more than one piece involved
> it can be difficult to do an atomic update of said script.

Yes, I'm smarter than that, and I'm also the DBA, so I don't mind much ;-)

-Glen
On Wed, 20 Aug 2008, Tom Lane wrote:
> Greg Smith <gsmith@gregsmith.com> writes:
>> You also don't want to be the guy who has to explain why the database is
>> taking hours to come back up again after it crashed and has 4000 WAL
>> segments to replay, because archiving failed for a long time and prevented
>> proper checkpoints (ask Robert Treat if you don't believe me, he also once
>> was that guy).
>
> Say what? Archiver failure can't/shouldn't prevent checkpointing.

Shouldn't, sure. The wacky case of Robert's I was alluding to involved the system not checkpointing anymore and just piling the archive files up, and while I think it's safe to say that was all a hardware problem, stuff like that makes me nervous. It is true that archiver failure prevents *normal* checkpointing, where WAL files get recycled rather than piling up. I know that shouldn't make any difference, but I've also been through two similarly awful situations resulting from odd archiver problems that seemed mysterious at the time (staring at the source later cleared up what really happened), and they left me even more paranoid than usual when working in this area.

The stance I've adopted says anything involving uncertain network resources should be moved outside of the code the database itself runs. Any time you're following a different path than the usual one through the server code (in this case exercising the archive failure and resubmission section), I see that as an opportunity to run into more obscure bugs. That's just not code that gets run/tested as often. It also minimizes the amount of admin-written software that has to be right (bugs in the archive_command script are really bad) in order for the database to keep running.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
On Thu, 21 Aug 2008, Glen Parker wrote:
> So you advocate archiving the WAL files from a small xlog volume, to a larger
> local volume. Why not just make the xlog volume large enough to handle
> overruns, since you obviously have the space? Copying each WAL from one
> place to another on the local machine FAR outweighs the extra overhead
> created when WAL files must be created rather than recycled.

Yes, but WAL segment creation can end up blocking database operation in a way that another process doesn't. Ever fill the write buffer on your OS and have the whole database come to a halt because the 16MB write needed to create a new WAL segment was blocked waiting for I/O to clear? It's a bad situation to run into, and one that can't happen if you're just recycling segments. That's really the fundamental theme of what I was driving at: anything happening inside the database server processes has the potential to disrupt the database, and as a rule I avoid all of those whenever possible. You could easily argue that using a temporary area increases total I/O and adds complexity compared with copying over the network directly, and I'd agree with you--and do it anyway in some places.

Also: there are some situations where the xlog disk has a different cost associated with its use than the others in the system. Some people are starting to put those on flash, for example, and being able to keep the maximum usage well under control is important there. There are other situations where expanding the xlog just isn't quite the same as expanding the other disks on the system. I've been known to adjust the disk layout so the xlog is on a short-stroked partition, for example, which also limits its size.

> If I had to deal with different locations, I'd build more safety into
> the system.

I'm certainly not going to claim that the suggestions I made are universal. You have a setup that sounds appropriate for you; I'm not going to say that you should switch to what I suggested.
Just trying to throw out some things people might consider when deciding what makes sense for their environment. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD