Thread: HELP!!! The WAL Archive is taking up all space

HELP!!! The WAL Archive is taking up all space

From
FattahRozzaq
Date:
Hi all,

Please help...

I have 1 master PostgreSQL and 1 standby PostgreSQL.
Both servers has the same OS Linux Debian Wheezy, the same hardware.

Both server hardware:
CPU: 24 cores
RAM: 128GB
Disk-1: 800GB SAS (for OS, logs, WAL archive directory)
Disk-2: 330GB SSD (for PostgreSQL data directory, except WAL archive
and except pg_log)

The part of the configuration are as below:
checkpoint_segments = 64
checkpoint_completion_target = 0.9
default_statistics_target = 10
maintenance_work_mem = 1GB
effective_cache_size = 64GB
shared_buffers = 24GB
work_mem = 5MB
wal_buffers = 8MB
wal_keep_segments = 4096
wal_level = hot_standby
max_wal_senders = 10
archive_mode = on
archive_command = 'cp -i %p /home/postgres/archive/master/%f'


The WAL archive folder is at /home/postgres/archive/master/, right?
This directory consumes around 750GB of Disk-1.
Each segment in the /home/postgres/archive/master/ is 16MB each
There are currently 47443 files in this folder.

If I want to limit the total size use by WAL archive to around 200-400
GB, what value should I set for the wal_keep_segments,
checkpoint_segments?



Regards,
Fattah


Re: HELP!!! The WAL Archive is taking up all space

From
Alan Hodgson
Date:
On Wednesday, December 09, 2015 07:55:09 AM FattahRozzaq wrote:
> archive_mode = on
> archive_command = 'cp -i %p /home/postgres/archive/master/%f'
>
>
> The WAL archive folder is at /home/postgres/archive/master/, right?
> This directory consumes around 750GB of Disk-1.
> Each segment in the /home/postgres/archive/master/ is 16MB each
> There are currently 47443 files in this folder.
>
> If I want to limit the total size use by WAL archive to around 200-400
> GB, what value should I set for the wal_keep_segments,
> checkpoint_segments?

PostgreSQL doesn't clean up files copied by your archive_command. You need to
have a separate task clean those out. PostgreSQL's active wal_keep_segments
etc. are in the data/pg_xlog directory.

Attachment

Re: HELP!!! The WAL Archive is taking up all space

From
Adrian Klaver
Date:
On 12/09/2015 11:15 AM, Alan Hodgson wrote:
> On Wednesday, December 09, 2015 07:55:09 AM FattahRozzaq wrote:
>> archive_mode = on
>> archive_command = 'cp -i %p /home/postgres/archive/master/%f'
>>
>>
>> The WAL archive folder is at /home/postgres/archive/master/, right?
>> This directory consumes around 750GB of Disk-1.
>> Each segment in the /home/postgres/archive/master/ is 16MB each
>> There are currently 47443 files in this folder.
>>
>> If I want to limit the total size use by WAL archive to around 200-400
>> GB, what value should I set for the wal_keep_segments,
>> checkpoint_segments?
>
> PostgreSQL doesn't clean up files copied by your archive_command. You need to
> have a separate task clean those out. PostgreSQL's active wal_keep_segments
> etc. are in the data/pg_xlog directory.
>

The OP might want to take a look at:

http://www.postgresql.org/docs/9.4/interactive/pgarchivecleanup.html

To be safe I would use:

-n

     Print the names of the files that would have been removed on stdout
(performs a dry run).


at first.

--
Adrian Klaver
adrian.klaver@aklaver.com


Re: HELP!!! The WAL Archive is taking up all space

From
John R Pierce
Date:
On 12/8/2015 4:55 PM, FattahRozzaq wrote:
> ...I want to limit the total size use by WAL archive to around 200-400 GB...?

for what purpose are you keeping a wal archive ?

if its for PITR (point in time recovery), you need ALL WAL records since
the start of a base backup up to the point in time at which you wish to
recover.


--
john r pierce, recycling bits in santa cruz



Re: HELP!!! The WAL Archive is taking up all space

From
FattahRozzaq
Date:
Hi John,

I really don't know why I should keep the wal archives.
I implement streaming replication into 1 server (standby server).
I'm really newbie to PostgreSQL but the boss pushed me to handle it
and implement it in production f*&%*$%%$#%$#&# (forgive me)
They don't hire a database expert, I don't know why.

Hi Alan,
I have no problem with pg_xlog size which is only 67GB.
My issue is with the archive folder size.
I've read in this blog:
http://blog.endpoint.com/2014/09/pgxlog-disk-space-problem-on-postgres.html
And I change the archive_command into: archive_command = '/bin/true'
Is it a bad decision?

Hi Adrian,
Thank you very much for the suggestion. is pgarchivecleanup harmful?
Do you have a pgarchivecleanup example for my case?
How to run a dry-run?


Thank you,
Fattah


Re: HELP!!! The WAL Archive is taking up all space

From
FattahRozzaq
Date:
Quick information,

After I realize, the line "archive_command=/bin/true" is a bad
decision, I have revert it back.
Now I'm really confused and panic.
I don't know what to do, and I don't really understand the postgresql.conf
I'm a network engineer, I should handle the network and also
postgresql database.
Oh man, the office is so good but this part is sucks :((

--
On 10/12/2015, FattahRozzaq <ssoorruu@gmail.com> wrote:
> Hi John,
>
> I really don't know why I should keep the wal archives.
> I implement streaming replication into 1 server (standby server).
> I'm really newbie to PostgreSQL but the boss pushed me to handle it
> and implement it in production f*&%*$%%$#%$#&# (forgive me)
> They don't hire a database expert, I don't know why.
>
> Hi Alan,
> I have no problem with pg_xlog size which is only 67GB.
> My issue is with the archive folder size.
> I've read in this blog:
> http://blog.endpoint.com/2014/09/pgxlog-disk-space-problem-on-postgres.html
> And I change the archive_command into: archive_command = '/bin/true'
> Is it a bad decision?
>
> Hi Adrian,
> Thank you very much for the suggestion. is pgarchivecleanup harmful?
> Do you have a pgarchivecleanup example for my case?
> How to run a dry-run?
>
>
> Thank you,
> Fattah
>


Re: HELP!!! The WAL Archive is taking up all space

From
"Joshua D. Drake"
Date:
On 12/09/2015 04:38 PM, FattahRozzaq wrote:
> Quick information,
>
> After I realize, the line "archive_command=/bin/true" is a bad
> decision, I have revert it back.
> Now I'm really confused and panic.
> I don't know what to do, and I don't really understand the postgresql.conf
> I'm a network engineer, I should handle the network and also
> postgresql database.
> Oh man, the office is so good but this part is sucks :((

If the pg_xlog directory is growing it is likely that either:

* wal_keep_segments is set high and your slave is not correctly
receiving updates.

* You are using a replication slot and the slave is not correctly
receiving updates.

If your archive_command does not return a success, your pg_xlog will
also grow but you don't need the archive_command *IF* your streaming
replication is working *UNLESS* you are also doing archiving or PITR.

Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.


Re: HELP!!! The WAL Archive is taking up all space

From
John R Pierce
Date:
On 12/9/2015 4:27 PM, FattahRozzaq wrote:
> I really don't know why I should keep the wal archives.
> I implement streaming replication into 1 server (standby server).
> I'm really newbie to PostgreSQL but the boss pushed me to handle it
> and implement it in production f*&%*$%%$#%$#&# (forgive me)
> They don't hire a database expert, I don't know why.


primary use for wal archives is to implement a point-in-time-recovery
(PITR) backup system.   It is effectively a continuous backup of your
databases, that can be restored to an arbitrary point-in-time.

typically, every so often (maybe once a week?) you would create a base
backup.    I personally would keep the last 2 basebackups, plus all wal
archives since the start of the older of those two.    both the WAL
archives and base backups should be stored on a separate storage
system,, NOT either the master or standby database servers, typically
via NFS.

one usecase for this basebackup+wal archive is fast recovery in case of
master failure...   if your master fails, you need to promote the
standby to master, then you'll want to bring up a new standby server,
this can be done by using the most recent base backup and then playing
back all wal archives to it til it catches up, and then it can stream
from the recently promoted master which was the standby and you have
your high availability redundancy back.

another use case, and perhaps more critical one..   say someone does
something nasty to your databases, like drops the wrong table, or
clobbers a bunch of financial data.   this will quickly replicate to the
standby, rendering it equally useless for recovery.   With PITR, you can
restore that most recent basebackup, then play back the WAL archive up
just to before the transaction that clobbered your critical data.  you
would need to do this on both the master and slave, resuming replication
in the correct sequence.

about the only use case for wal archives without a base backup is
speeding up the resuming of a standby server that got behind, perhaps
due to being shutdown for hardware or OS maintenance, or whatever.   if
you don't have a wal archive, the standby server has to request all the
xlog's since it last streamed from the master, while with a wal archive,
it can fetch as many as it can from the wal archive THEN catch up with
the master.  This lowers the workload on the master.

--
john r pierce, recycling bits in santa cruz



Re: HELP!!! The WAL Archive is taking up all space

From
FattahRozzaq
Date:
Hi John,

Really thanking you for spend time typing and responding my email.
I think the archive_command returns success, I can see the archive
directory piling up 16MB every 2 minutes.
Maybe the pgarchivecleanup is the solution to cleanup the contents of
archive folder?
How to properly do it?
What is the pgarchivecleanup example that I can use for this case?
How to run a dry-run for pgarchivecleanup?


Best Regards,
FR

On 10/12/2015, Joshua D. Drake <jd@commandprompt.com> wrote:
> On 12/09/2015 04:38 PM, FattahRozzaq wrote:
>> Quick information,
>>
>> After I realize, the line "archive_command=/bin/true" is a bad
>> decision, I have revert it back.
>> Now I'm really confused and panic.
>> I don't know what to do, and I don't really understand the
>> postgresql.conf
>> I'm a network engineer, I should handle the network and also
>> postgresql database.
>> Oh man, the office is so good but this part is sucks :((
>
> If the pg_xlog directory is growing it is likely that either:
>
> * wal_keep_segments is set high and your slave is not correctly
> receiving updates.
>
> * You are using a replication slot and the slave is not correctly
> receiving updates.
>
> If your archive_command does not return a success, your pg_xlog will
> also grow but you don't need the archive_command *IF* your streaming
> replication is working *UNLESS* you are also doing archiving or PITR.
>
> Sincerely,
>
> JD
>
> --
> Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
> PostgreSQL Centered full stack support, consulting and development.
> Announcing "I'm offended" is basically telling the world you can't
> control your own emotions, so everyone else should do it for you.
>


Re: HELP!!! The WAL Archive is taking up all space

From
Adrian Klaver
Date:
On 12/09/2015 04:27 PM, FattahRozzaq wrote:
> Hi John,
>
> I really don't know why I should keep the wal archives.

So who set up the archiving and why?

Is archive recovery set up on the standby?:

http://www.postgresql.org/docs/9.4/interactive/archive-recovery-settings.html

> I implement streaming replication into 1 server (standby server).

Is that the only standby or is there another set up previously?

Per another recent thread having a WAL archive to fall back on is handy
if the streaming replication falls behind and wal_keep_segments is not
high enough:

http://www.postgresql.org/docs/9.4/interactive/warm-standby.html#STREAMING-REPLICATION

"If you use streaming replication without file-based continuous
archiving, the server might recycle old WAL segments before the standby
has received them. If this occurs, the standby will need to be
reinitialized from a new base backup. You can avoid this by setting
wal_keep_segments to a value large enough to ensure that WAL segments
are not recycled too early, or by configuring a replication slot for the
standby. If you set up a WAL archive that's accessible from the standby,
these solutions are not required, since the standby can always use the
archive to catch up provided it retains enough segments."

> I'm really newbie to PostgreSQL but the boss pushed me to handle it
> and implement it in production f*&%*$%%$#%$#&# (forgive me)
> They don't hire a database expert, I don't know why.
>
> Hi Alan,
> I have no problem with pg_xlog size which is only 67GB.
> My issue is with the archive folder size.
> I've read in this blog:
> http://blog.endpoint.com/2014/09/pgxlog-disk-space-problem-on-postgres.html
> And I change the archive_command into: archive_command = '/bin/true'
> Is it a bad decision?
>
> Hi Adrian,
> Thank you very much for the suggestion. is pgarchivecleanup harmful?
> Do you have a pgarchivecleanup example for my case?
> How to run a dry-run?

Before you do any of the below make sure you know what your
replication/archiving set up is, per the questions above.

On the standby do:

select pg_last_xlog_receive_location();

This will give you the last WAL synced to the standby disk.

You can use that to see what WAL files are before this in the WAL
archive. Then you can do this:

pg_archivecleanup -d -n <archive_dir> <some_wal_file_before_the one you
found_above>


>
>
> Thank you,
> Fattah
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: HELP!!! The WAL Archive is taking up all space

From
Andreas Kretschmer
Date:

> FattahRozzaq <ssoorruu@gmail.com> hat am 10. Dezember 2015 um 01:27
> geschrieben:
>
>
> Hi John,
>
> I really don't know why I should keep the wal archives.


That's the problem! But that's your part, not our. If you need a Backup with
PITR-capability you have to create a so called basebackup and continously WAL's.
If you create later, say the next day, a new Basebackup and your Backup-Policy
is hold one Backup, than you can delete all WAL's untill to the new Basebackup
and the old Backup.

If i where you i would use somethink like barman (see: http://www.pgbarman.org/
) for that. And yes: you should a extra Backup-Server. If you have both
(Database and Backup) on the same machine and the machine burns you will lost
both, data and backup.


Questions?



> I implement streaming replication into 1 server (standby server).


Streamin Replication can't replace a Backup!


> I'm really newbie to PostgreSQL but the boss pushed me to handle it
> and implement it in production f*&%*$%%$#%$#&# (forgive me)
> They don't hire a database expert, I don't know why.

You can learn that. PostgreSQL is really, really great.
Btw.: i know companies providing remote DBA service for PostgreSQL.



> http://blog.endpoint.com/2014/09/pgxlog-disk-space-problem-on-postgres.html
> And I change the archive_command into: archive_command = '/bin/true'
> Is it a bad decision?

Maybe. You don't have a backup now. In case of desaster your boss will fire YOU!
(and not me)


Re: HELP!!! The WAL Archive is taking up all space

From
Jim Nasby
Date:
On 12/9/15 7:05 PM, Andreas Kretschmer wrote:
>> I'm really newbie to PostgreSQL but the boss pushed me to handle it
>> >and implement it in production f*&%*$%%$#%$#&# (forgive me)
>> >They don't hire a database expert, I don't know why.
> You can learn that. PostgreSQL is really, really great.
> Btw.: i know companies providing remote DBA service for PostgreSQL.

Even if you want to learn all about Postgres, I'd strongly recommend
your company get some kind of a contract in place with an experienced
Postgres expert. Databases are some of the most complicated pieces of
software out there, and there's any number of ways you can really screw
yourself if you're not careful. Almost all other software is stateless
and pretty easy to fix mistakes in (screwed up a firewall config? Hit
the console, fix it, and you're good. Screwed up the database? All your
data may now be gone forever!)

As a Postgres consultant my opinion is obviously biased, but I've seen
plenty of horror stories where recovery of data becomes virtually
impossible, frequently without any kind of a backup in place.

It is possible that the data you're storing just isn't that important
(as hard as database people find that to believe!). If that's your case
then you can probably just turn off archiving and not worry about it.

Finally, as someone else said, *a replica is NOT a backup!*
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com