Thread: WAL for backup

WAL for backup

From
Shreesha
Date:
Hey All,
I am new to the postgreSQL and need some clarifications from your end.

I am running a standalone PostgreSQL server and I want to have the data backed up for consistency and system availability. Upon exploring options available in postgresql, I think continuous archival and PITR suits for my system. But this method requires archive mode to be enabled which comes with resource requirements like space/memory etc. 
The following are my concerns:
1. Currently, our system is backing up the database data before restart and periodically for every 3 hours. Is there any option in postgresql to do the archive periodically, say for every 3 hours?
2. If there is a system reboot and if postgreSQL server was running in the system, Will I be ensured from postgreSQL that all the WAL logs archived so far contains all the necessary data that was committed before system reboot?
3. As I am running standalone server, I want to set the wal_level='archive' and not 'hot standby'. Am I right in my understanding here? (I saw couple of examples over the internet and they are using 'hot standby' as the wal_level. So wanted to clarify.)

There are some constraints in my system as well.
4. The hard disk space in my system is minimal (around 300 MB). So I can't keep more archive files in my system as it may run out of space. I know that each archive files are of 16MB of size. So I want to keep it less. Is there any mechanism which can be used to cleanup the archive files periodically while the server is running, and ensure data consistency?
5. The system's performance is a concern. So I wanted to know whether by enabling the archiving option, will PostgreSQL show any delays in query execution and result retrieval?

Sorry if I am bombarding many questions here. It would be great if any of you could answer each of my questions mentioned above.
Eagerly looking forward for your response.

Thanks in advance!

Regards,
Shreesha.

Re: WAL for backup

From
David G Johnston
Date:
shreesha21 wrote
> Hey All,
> I am new to the postgreSQL and need some clarifications from your end.
>
> I am running a standalone PostgreSQL server and I want to have the data
> backed up for consistency and system availability. Upon exploring options
> available in postgresql, I think continuous archival and PITR suits for my
> system. But this method requires archive mode to be enabled which comes
> with resource requirements like space/memory etc.
> The following are my concerns:
> 1. Currently, our system is backing up the database data before restart
> and
> periodically for every 3 hours. Is there any option in postgresql to do
> the
> archive periodically, say for every 3 hours?
> 2. If there is a system reboot and if postgreSQL server was running in the
> system, Will I be ensured from postgreSQL that all the WAL logs archived
> so
> far contains all the necessary data that was committed before system
> reboot?
> 3. As I am running standalone server, I want to set the
> wal_level='archive'
> and not 'hot standby'. Am I right in my understanding here? (I saw couple
> of examples over the internet and they are using 'hot standby' as the
> wal_level. So wanted to clarify.)
>
> There are some constraints in my system as well.
> 4. The hard disk space in my system is minimal (around 300 MB). So I can't
> keep more archive files in my system as it may run out of space. I know
> that each archive files are of 16MB of size. So I want to keep it less. Is
> there any mechanism which can be used to cleanup the archive files
> periodically while the server is running, and ensure data consistency?
> 5. The system's performance is a concern. So I wanted to know whether by
> enabling the archiving option, will PostgreSQL show any delays in query
> execution and result retrieval?
>
> Sorry if I am bombarding many questions here. It would be great if any of
> you could answer each of my questions mentioned above.
> Eagerly looking forward for your response.
>
> Thanks in advance!
>
> Regards,
> Shreesha.

You should indicate what version of PostgreSQL you are using.

1) The archives will occur as the files fill up.  The files themselves are
continually written.  In archive modes additional data is always written.
So, basically no, you cannot guarantee the archive will occur no more
frequently than every 3 hours.

2) If the database is shutdown cleanly I am almost positive you are
guaranteed a WAL archive

3) Some additional info is captured for hot standby mode but for offline
PITR either one will work.

4) Not totally fluent on this point but once a file has been archived
off-system it can and will be re-used so the main issue is if the archive
fails for some reason and you fill up before you can resolve the issue.

5) You will be writing a little more data to disk than in a non-backup setup
and you will periodically eat network bandwidth transferring the WAL files
to another system.  I doubt either hit is significant enough to forgo the
benefit of PITR.


I would probably be very concerned about only having 300MB on the production
system (I assume you have considerably more on the target archive storage
location).  If it is the archive system you are referring to you would have
to run pg_basebackup more frequently to minimize the overall archive storage
space but it has the added benefit of making recovery occur very quickly.

David J.







--
View this message in context: http://postgresql.1045698.n5.nabble.com/WAL-for-backup-tp5813194p5813197.html
Sent from the PostgreSQL - novice mailing list archive at Nabble.com.


Re: WAL for backup

From
Shreesha
Date:
Thank you David for the quick response. Appreciate it!
I am currently running PostgreSQL version 9.3.4.

Actually, when I said 300MB, I was referring to the archive system space. So like you suggested, I might have to run pg_basebackup more frequently. 
So are you saying that every time after I run pg_basebackup, if I do pg_archivecleanup then can I ensure data consistency and reduce the archive storage space?


On Tue, Jul 29, 2014 at 10:41 AM, David G Johnston <david.g.johnston@gmail.com> wrote:
shreesha21 wrote
> Hey All,
> I am new to the postgreSQL and need some clarifications from your end.
>
> I am running a standalone PostgreSQL server and I want to have the data
> backed up for consistency and system availability. Upon exploring options
> available in postgresql, I think continuous archival and PITR suits for my
> system. But this method requires archive mode to be enabled which comes
> with resource requirements like space/memory etc.
> The following are my concerns:
> 1. Currently, our system is backing up the database data before restart
> and
> periodically for every 3 hours. Is there any option in postgresql to do
> the
> archive periodically, say for every 3 hours?
> 2. If there is a system reboot and if postgreSQL server was running in the
> system, Will I be ensured from postgreSQL that all the WAL logs archived
> so
> far contains all the necessary data that was committed before system
> reboot?
> 3. As I am running standalone server, I want to set the
> wal_level='archive'
> and not 'hot standby'. Am I right in my understanding here? (I saw couple
> of examples over the internet and they are using 'hot standby' as the
> wal_level. So wanted to clarify.)
>
> There are some constraints in my system as well.
> 4. The hard disk space in my system is minimal (around 300 MB). So I can't
> keep more archive files in my system as it may run out of space. I know
> that each archive files are of 16MB of size. So I want to keep it less. Is
> there any mechanism which can be used to cleanup the archive files
> periodically while the server is running, and ensure data consistency?
> 5. The system's performance is a concern. So I wanted to know whether by
> enabling the archiving option, will PostgreSQL show any delays in query
> execution and result retrieval?
>
> Sorry if I am bombarding many questions here. It would be great if any of
> you could answer each of my questions mentioned above.
> Eagerly looking forward for your response.
>
> Thanks in advance!
>
> Regards,
> Shreesha.

You should indicate what version of PostgreSQL you are using.

1) The archives will occur as the files fill up.  The files themselves are
continually written.  In archive modes additional data is always written.
So, basically no, you cannot guarantee the archive will occur no more
frequently than every 3 hours.

2) If the database is shutdown cleanly I am almost positive you are
guaranteed a WAL archive

3) Some additional info is captured for hot standby mode but for offline
PITR either one will work.

4) Not totally fluent on this point but once a file has been archived
off-system it can and will be re-used so the main issue is if the archive
fails for some reason and you fill up before you can resolve the issue.

5) You will be writing a little more data to disk than in a non-backup setup
and you will periodically eat network bandwidth transferring the WAL files
to another system.  I doubt either hit is significant enough to forgo the
benefit of PITR.


I would probably be very concerned about only having 300MB on the production
system (I assume you have considerably more on the target archive storage
location).  If it is the archive system you are referring to you would have
to run pg_basebackup more frequently to minimize the overall archive storage
space but it has the added benefit of making recovery occur very quickly.

David J.







--
View this message in context: http://postgresql.1045698.n5.nabble.com/WAL-for-backup-tp5813194p5813197.html
Sent from the PostgreSQL - novice mailing list archive at Nabble.com.


--
Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-novice



--
~Shreesha.

Re: WAL for backup

From
David G Johnston
Date:
shreesha21 wrote
> Thank you David for the quick response. Appreciate it!
> I am currently running PostgreSQL version 9.3.4.
>
> Actually, when I said 300MB, I was referring to the archive system space.
> So like you suggested, I might have to run pg_basebackup more frequently.
> So are you saying that every time after I run pg_basebackup, if I do
> pg_archivecleanup
> then can I ensure data consistency and reduce the archive storage space?

I am not familiar enough with the specific mechanics to feel comfortable
giving prescriptive advice.  The pg_basebackup and associated mandatory WAL
file comprise a full image of the database and any WAL files created before
then can be removed.

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/WAL-for-backup-tp5813194p5813214.html
Sent from the PostgreSQL - novice mailing list archive at Nabble.com.


Re: WAL for backup

From
Sameer Kumar
Date:

What is the version postgresql you are using? What is the transaction volume you have got? Have you tried to run with minimal wal_level and assert the number of wal files generated during your activity? 300mb is considerably very small storage space for production/archive disk.

You may want to consider provisioning a separate server which has backups and archives.

You can stream archives using pg_receivexlog and take backup using pg_basebackup.

pg_receivexlog works same as streaming and will not wait for WAL to be archived and hence your RPO can be more closer to point of crash.

All this would work with lot of ifs and buts. Hence its is imp to know your version and OS platform.

Since the over head of having hot_standby over having archive is hardly noticeable you may want to use hot_standby instead.

Regards
Sameer

PS: Sent from my Mobile device. Pls ignore typo n abb

On 30 Jul 2014 01:18, "Shreesha" <shreesha1988@gmail.com> wrote:
Hey All,
I am new to the postgreSQL and need some clarifications from your end.

I am running a standalone PostgreSQL server and I want to have the data backed up for consistency and system availability. Upon exploring options available in postgresql, I think continuous archival and PITR suits for my system. But this method requires archive mode to be enabled which comes with resource requirements like space/memory etc. 
The following are my concerns:
1. Currently, our system is backing up the database data before restart and periodically for every 3 hours. Is there any option in postgresql to do the archive periodically, say for every 3 hours?
2. If there is a system reboot and if postgreSQL server was running in the system, Will I be ensured from postgreSQL that all the WAL logs archived so far contains all the necessary data that was committed before system reboot?
3. As I am running standalone server, I want to set the wal_level='archive' and not 'hot standby'. Am I right in my understanding here? (I saw couple of examples over the internet and they are using 'hot standby' as the wal_level. So wanted to clarify.)

There are some constraints in my system as well.
4. The hard disk space in my system is minimal (around 300 MB). So I can't keep more archive files in my system as it may run out of space. I know that each archive files are of 16MB of size. So I want to keep it less. Is there any mechanism which can be used to cleanup the archive files periodically while the server is running, and ensure data consistency?
5. The system's performance is a concern. So I wanted to know whether by enabling the archiving option, will PostgreSQL show any delays in query execution and result retrieval?

Sorry if I am bombarding many questions here. It would be great if any of you could answer each of my questions mentioned above.
Eagerly looking forward for your response.

Thanks in advance!

Regards,
Shreesha.

Re: WAL for backup

From
Sameer Kumar
Date:

On 30 Jul 2014 02:04, "Shreesha" <shreesha1988@gmail.com> wrote:
>
> Thank you David for the quick response. Appreciate it!
> I am currently running PostgreSQL version 9.3.4.
>

Saw your response now... Both the tools I mentioned will work in v9.3. And if you do a clean shutdown all pending wals will be streamed before shutdown is complete.

> Actually, when I said 300MB, I was referring to the archive system space. So like you suggested, I might have to run pg_basebackup more frequently. 
> So are you saying that every time after I run pg_basebackup, if I do pg_archivecleanup then can I ensure data consistency and reduce the archive storage space?
>
>
> On Tue, Jul 29, 2014 at 10:41 AM, David G Johnston <david.g.johnston@gmail.com> wrote:
>>
>> shreesha21 wrote
>> > Hey All,
>> > I am new to the postgreSQL and need some clarifications from your end.
>> >
>> > I am running a standalone PostgreSQL server and I want to have the data
>> > backed up for consistency and system availability. Upon exploring options
>> > available in postgresql, I think continuous archival and PITR suits for my
>> > system. But this method requires archive mode to be enabled which comes
>> > with resource requirements like space/memory etc.
>> > The following are my concerns:
>> > 1. Currently, our system is backing up the database data before restart
>> > and
>> > periodically for every 3 hours. Is there any option in postgresql to do
>> > the
>> > archive periodically, say for every 3 hours?
>> > 2. If there is a system reboot and if postgreSQL server was running in the
>> > system, Will I be ensured from postgreSQL that all the WAL logs archived
>> > so
>> > far contains all the necessary data that was committed before system
>> > reboot?
>> > 3. As I am running standalone server, I want to set the
>> > wal_level='archive'
>> > and not 'hot standby'. Am I right in my understanding here? (I saw couple
>> > of examples over the internet and they are using 'hot standby' as the
>> > wal_level. So wanted to clarify.)
>> >
>> > There are some constraints in my system as well.
>> > 4. The hard disk space in my system is minimal (around 300 MB). So I can't
>> > keep more archive files in my system as it may run out of space. I know
>> > that each archive files are of 16MB of size. So I want to keep it less. Is
>> > there any mechanism which can be used to cleanup the archive files
>> > periodically while the server is running, and ensure data consistency?
>> > 5. The system's performance is a concern. So I wanted to know whether by
>> > enabling the archiving option, will PostgreSQL show any delays in query
>> > execution and result retrieval?
>> >
>> > Sorry if I am bombarding many questions here. It would be great if any of
>> > you could answer each of my questions mentioned above.
>> > Eagerly looking forward for your response.
>> >
>> > Thanks in advance!
>> >
>> > Regards,
>> > Shreesha.
>>
>> You should indicate what version of PostgreSQL you are using.
>>
>> 1) The archives will occur as the files fill up.  The files themselves are
>> continually written.  In archive modes additional data is always written.
>> So, basically no, you cannot guarantee the archive will occur no more
>> frequently than every 3 hours.
>>
>> 2) If the database is shutdown cleanly I am almost positive you are
>> guaranteed a WAL archive
>>
>> 3) Some additional info is captured for hot standby mode but for offline
>> PITR either one will work.
>>
>> 4) Not totally fluent on this point but once a file has been archived
>> off-system it can and will be re-used so the main issue is if the archive
>> fails for some reason and you fill up before you can resolve the issue.
>>
>> 5) You will be writing a little more data to disk than in a non-backup setup
>> and you will periodically eat network bandwidth transferring the WAL files
>> to another system.  I doubt either hit is significant enough to forgo the
>> benefit of PITR.
>>
>>
>> I would probably be very concerned about only having 300MB on the production
>> system (I assume you have considerably more on the target archive storage
>> location).  If it is the archive system you are referring to you would have
>> to run pg_basebackup more frequently to minimize the overall archive storage
>> space but it has the added benefit of making recovery occur very quickly.
>>
>> David J.
>>
>>
>>
>>
>>
>>
>>
>> --
>> View this message in context: http://postgresql.1045698.n5.nabble.com/WAL-for-backup-tp5813194p5813197.html
>> Sent from the PostgreSQL - novice mailing list archive at Nabble.com.
>>
>>
>> --
>> Sent via pgsql-novice mailing list (pgsql-novice@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-novice
>
>
>
>
> --
> ~Shreesha.