Thread: WAL archiving to network drive

WAL archiving to network drive

From: Rob Adams
I'm setting up WAL archiving on a Windows machine & need to copy the WAL
files to a network drive.

Is it best to give the 'postgres' user network access & archive the WAL
files directly to the network drive? Or archive the WAL files to a local
folder and then use a scheduled task to move them to the network drive?
(Or something else entirely?)

Thanks,
--Rob Adams

Re: WAL archiving to network drive

From: Glen Parker
Rob Adams wrote:
> I'm setting up WAL archiving on a Windows machine & need to copy the WAL
> files to a network drive.
>
> Is it best to give the 'postgres' user network access & archive the WAL
> files directly to the network drive? Or archive the WAL files to a local
> folder and then use a scheduled task to move them to the network drive?
> (Or something else entirely?)

I am archiving them directly, from a Linux installation, to a Windows
machine.  As long as the error handling is sound, it should work OK.
I've had no problems other than some expected down time causing WAL
files to pile up.  It's been working great for a couple of years now.


I write the files under an alternate file name, then rename them.  That
helps make sure half-written files are not mistaken for fully written ones.
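
Roughly like this, say (the script name and mount point are made up;
PostgreSQL fills in %p and %f for archive_command):

    #!/bin/sh
    # archive_command = '/usr/local/bin/archive_wal.sh %p %f'
    # Copy under a temporary name, then rename, so nothing ever sees
    # a half-written file under its final name.
    set -e
    SRC="$1"                 # %p: path to the WAL segment
    NAME="$2"                # %f: segment file name only
    DEST=/mnt/walarchive     # hypothetical network mount

    test ! -f "$DEST/$NAME"  # never overwrite an archived segment
    cp "$SRC" "$DEST/$NAME.partial"
    mv "$DEST/$NAME.partial" "$DEST/$NAME"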

I also have a little watchdog script that runs from cron every few
minutes and counts the number of WAL files present in the xlog
directory.  If that count is oddly high, I get an email.  I've been
alerted to, and quickly fixed, a problem three or four times this way.
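
Something along these lines, for example (the paths, the threshold, and
the address are all illustrative):

    #!/bin/sh
    # Cron watchdog: alert if WAL segments are piling up in pg_xlog,
    # which usually means archiving has been failing for a while.
    XLOGDIR=/var/lib/pgsql/data/pg_xlog   # adjust for your install
    LIMIT=50                              # "oddly high" for this system
    COUNT=$(ls "$XLOGDIR" | grep -c '^[0-9A-F]\{24\}$')
    if [ "$COUNT" -gt "$LIMIT" ]; then
        echo "$COUNT WAL segments in $XLOGDIR (limit $LIMIT)" |
            mail -s "WAL pileup on $(hostname)" dba@example.com
    fi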


Re: WAL archiving to network drive

From: Greg Smith
On Tue, 19 Aug 2008, Rob Adams wrote:

> Is it best to give the 'postgres' user network access & archive the WAL files
> directly to the network drive? Or archive the WAL files to a local folder and
> then use a scheduled task to move them to the network drive?

I normally write first to a local drive then move to the network, because
that has some useful properties:

1) As long as there's local disk space, the archive_command call always
returns quickly.  While performance here doesn't really matter, it makes
me feel better.

2) If there is a problem with the network drive, the slow/blocked network
write is not happening where the database is waiting for it.

3) In an emergency where the usual network drive is unavailable, you can
temporarily change your archive strategy without touching the database
server's configuration.

4) The copying process doesn't need read access to the main database, and
the database server process doesn't need permissions on the network drive.
I find myself giving it a completely different set of permissions, so that
it can't touch the database at all really, but can run alert scripts and
take actions I'd never want the database user to have access to.
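
A rough sketch of that split, with made-up paths; the archive_command
stays trivial and local, and a separate account runs the mover from
cron:

    # postgresql.conf (illustrative; write-then-rename into the spool
    # so the mover below never sees a partial file):
    #   archive_command = 'test ! -f /var/spool/walspool/%f &&
    #     cp %p /var/spool/walspool/%f.tmp &&
    #     mv /var/spool/walspool/%f.tmp /var/spool/walspool/%f'

    #!/bin/sh
    # Mover: runs from cron under its own account; it needs access
    # only to the local spool and the network mount, never to the
    # database itself.
    SPOOL=/var/spool/walspool
    DEST=/mnt/backup/wal            # hypothetical network mount
    for f in "$SPOOL"/*; do
        [ -f "$f" ] || continue
        case "$f" in *.tmp) continue ;; esac  # skip in-progress writes
        n=$(basename "$f")
        cp "$f" "$DEST/$n.partial" &&
            mv "$DEST/$n.partial" "$DEST/$n" &&
            rm "$f"
    done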

The main downside of doing it this way is that you're stuck writing and
debugging a lot more scripts to make sure everything happens correctly.
I've found the alternative--just putting something into archive_command
that does the transfer directly--isn't quite flexible enough when things
go wrong for me to be comfortable with it.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: WAL archiving to network drive

From: Glen Parker
Greg Smith wrote:
> 2) If there is a problem with the network drive, the slow/blocked
> network write is not happening where the database is waiting for it.

This is not really a consideration, is it?  I was under the impression
that no other database activity blocks waiting for the archiver.  That
doesn't even make sense.

> 3) In an emergency where the usual network drive is unavailable, you can
> temporarily change your archive strategy without touching the database
> server's configuration.

That's just as true with the direct write method.  You simply change the
archive script to do nothing and return a failed exit status.  The DBMS
will keep trying until you change it back, and then everything catches
up and proceeds as normal.
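
For instance, temporarily swapping in a stand-in like this (just a
sketch) makes the server hold each segment and retry:

    #!/bin/sh
    # Temporary stand-in for the real archive script: always report
    # failure, so PostgreSQL keeps the WAL segment and retries later.
    exit 1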

-Glen


Re: WAL archiving to network drive

From: Greg Smith
On Wed, 20 Aug 2008, Glen Parker wrote:

> Greg Smith wrote:
>> 2) If there is a problem with the network drive, the slow/blocked network
>> write is not happening where the database is waiting for it.
>
> This is not really a consideration, is it?  I was under the impression that
> no other database activity blocks waiting for the archiver.

The database will continue accumulating WAL segments it can't recycle if
the archiver keeps failing, which can cause the size of the pg_xlog
directory (often mounted into a separate, smaller partition or disk) to
increase dramatically.  You do not want to be the guy who caused the
database to go down because the xlog disk filled after some network mount
flaked out.  I've seen that way too many times in WAN environments where
the remote location was unreachable for days, due to natural disaster for
example, and since under normal operation pg_xlog never got very big it
wasn't sized for that.

It will also slow things down a bit under heavy write loads, as every
segment change will result in creating a new segment file rather than
re-using an old one.

You also don't want to be the guy who has to explain why the database is
taking hours to come back up again after it crashed and has 4000 WAL
segments to replay, because archiving failed for a long time and prevented
proper checkpoints (ask Robert Treat if you don't believe me, he also once
was that guy).  While occasional archiving failures are tolerated just
fine and you can catch up later, a design that presumes it's OK for them to
happen for extended periods is not optimal.  A crash after an excess of
segments has accumulated can result in a large amount of replay downtime
for the server, and in some environments (where the primary is very
redundant for example) the risk of that is worse than saying your PITR
backup is hosed and just throwing away the segments if you can't transfer
them to the backup for some reason.

>> 3) In an emergency where the usual network drive is unavailable, you can
>> temporarily change your archive strategy without touching the database
>> server's configuration.
>
> That's just as true with the direct write method.  You simply change the
> archive script to do nothing and return a failed exit status.  The DBMS will
> keep trying until you change it back, and then everything catches up and
> proceeds as normal.

I've already been through, above, why assuming archive failures have
zero cost can be bad.  There are some additional issues here as well.

When's the last time you tried to get a DBA to send a "kill -HUP" to a
production database to get it to recognize that archive_command was
changed in the postgresql.conf?  Sure, we all know it's not supposed to
ever hurt the server, but Murphy is a tricky character and lots of people
are (rightly) paranoid.  I think we've all known admins whose fingers were
trained to only ever type "kill -9" no matter what signal they meant to
send.
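
For what it's worth, pg_ctl will send the SIGHUP itself, to the right
process, which sidesteps the fat-finger risk (the data directory path is
illustrative):

    pg_ctl reload -D /var/lib/pgsql/data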

OK, maybe you're smarter than that and used a separate script.  DBAs are
also not happy changing a script that gets called by the database every
couple of minutes, and as soon as there's more than one piece involved it
can be difficult to do an atomic update of said script.
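
The closest thing I know of is staging the new version on the same
filesystem and renaming it into place, something like this (the paths,
and the archive_wal.sh name, are made up):

    # rename() within one filesystem is atomic, so the database sees
    # either the old script or the new one, never a half-written mix.
    cp archive_wal.sh.new /usr/local/bin/.archive_wal.sh.tmp
    chmod 755 /usr/local/bin/.archive_wal.sh.tmp
    mv /usr/local/bin/.archive_wal.sh.tmp /usr/local/bin/archive_wal.sh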

Anytime you can decouple failure recovery cleanup (like a network drive
getting disconnected screwing up archiving) completely from routine
database operation, you gain flexibility in how easily and safely you
can recover from that failure.  I assure you that all the suggestions I
made come from painful and completely unexpected messes.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: WAL archiving to network drive

From: Tom Lane
Greg Smith <gsmith@gregsmith.com> writes:
> You also don't want to be the guy who has to explain why the database is
> taking hours to come back up again after it crashed and has 4000 WAL
> segments to replay, because archiving failed for a long time and prevented
> proper checkpoints (ask Robert Treat if you don't believe me, he also once
> was that guy).

Say what?  Archiver failure can't/shouldn't prevent checkpointing.

            regards, tom lane

Re: WAL archiving to network drive

From: Glen Parker
Greg Smith wrote:
> On Wed, 20 Aug 2008, Glen Parker wrote:
> The database will continue accumulating WAL segments it can't recycle if
> the archiver keeps failing, which can cause the size of the pg_xlog
> directory (often mounted into a separate, smaller partition or disk) to
> increase dramatically.  You do not want to be the guy who caused the
> database to go down because the xlog disk filled after some network
> mount flaked out.  I've seen that way too many times in WAN environments
> where the remote location was unreachable for days, due to natural
> disaster for example, and since under normal operation pg_xlog never got
> very big it wasn't sized for that.
>
> It will also slow things down a bit under heavy write loads, as every
> segment change will result in creating a new segment file rather than
> re-using an old one.

So you advocate archiving the WAL files from a small xlog volume to a
larger local volume.  Why not just make the xlog volume large enough to
handle overruns, since you obviously have the space?  Copying each WAL
file from one place to another on the local machine FAR outweighs the
extra overhead created when WAL files must be created rather than
recycled.

Also, you mention days of down time, natural disasters, and a WAN.  My
DBMS and archive machines are in the same room.  If I had to deal with
different locations, I'd build more safety into the system.  In fact, in
a way, I have.  My WALs are archived immediately to another machine,
where they are (hours later) sent to tape in batches; the tapes are then
hauled off-site, emulating to some extent your decoupled system.


> OK, maybe you're smarter than that and used a separate script.  DBAs are
> also not happy changing a script that gets called by the database every
> couple of minutes, and as soon as there's more than one piece involved
> it can be difficult to do an atomic update of said script.


Yes I'm smarter than that, and I'm also the DBA, so I don't mind much ;-)


-Glen




Re: WAL archiving to network drive

From: Greg Smith
On Wed, 20 Aug 2008, Tom Lane wrote:

> Greg Smith <gsmith@gregsmith.com> writes:
>> You also don't want to be the guy who has to explain why the database is
>> taking hours to come back up again after it crashed and has 4000 WAL
>> segments to replay, because archiving failed for a long time and prevented
>> proper checkpoints (ask Robert Treat if you don't believe me, he also once
>> was that guy).
>
> Say what?  Archiver failure can't/shouldn't prevent checkpointing.

Shouldn't, sure.  The wacky case I was alluding to, which Robert ran
into, involved the system not checkpointing anymore and just piling the
archive files up.  While I think it's safe to say that was all a
hardware problem, stuff like that makes me nervous.

It is true that archiver failure prevents *normal* checkpointing, where
WAL files get recycled rather than piling up.  I know that shouldn't
make any difference, but I've also been through two similarly awful
situations resulting from odd archiver problems that seemed mysterious
at the time (staring at the source later cleared up what really
happened), and those left me even more paranoid than usual when working
in this area.

The stance I've adopted says anything involving uncertain network
resources should get moved outside of the code the database itself runs.
Any time you're following a different path than the usual one
through the server code (in this case exercising the archive failure and
resubmission section), I see that as an opportunity to run into more
obscure bugs.  That's just not code that gets run/tested as often.  It
also minimizes the amount of software the admin wrote that has to be right
(bugs in the archive_command script are really bad) in order for the
database to keep running.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: WAL archiving to network drive

From: Greg Smith
On Thu, 21 Aug 2008, Glen Parker wrote:

> So you advocate archiving the WAL files from a small xlog volume to a
> larger local volume.  Why not just make the xlog volume large enough to
> handle overruns, since you obviously have the space?  Copying each WAL
> file from one place to another on the local machine FAR outweighs the
> extra overhead created when WAL files must be created rather than
> recycled.

Yes, but WAL segment creation can end up blocking database operation in
a way that a separate process doesn't.  Ever fill the write buffer on
your OS and have the whole database come to a halt because the 16MB
write needed to create a new segment is blocked waiting for I/O to
clear?  It's a bad situation to run into, and one that can't happen if
you're just recycling segments.

That's really the fundamental theme of what I was driving at:  anything
that's happening inside the database server processes has a potential for
disrupting the database, and as a rule I avoid all of those whenever
possible.  You could easily argue that using a temporary area increases
total I/O and adds complexity compared with copying over the network
directly, and I'd agree with you--and do it anyway in some places.

Also:  there are some situations where the xlog disk has a different
cost associated with its use than the other disks in the system.  Some
people are starting to put those on flash, for example, and being able
to keep the maximum usage well under control is important there.  There
are other situations where expanding the xlog volume just isn't as
simple as expanding the other disks on the system.  I've been known to
adjust the disk layout so the xlog is on a short-stroked partition, for
example, which also limits its size.

> If I had to deal with different locations, I'd build more safety into
> the system.

I'm certainly not going to claim that the suggestions I made are
universal.  You have a setup that sounds appropriate for you; I'm not
going to say that you should switch to what I suggested.  Just trying to
throw out some things people might consider when deciding what makes sense
for their environment.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD