Thread: PostgreSQL data loads - turn off WAL

PostgreSQL data loads - turn off WAL

From
hartrc
Date:
My version: PostgreSQL v9.1.5
Version string: "PostgreSQL 9.1.5 on x86_64-unknown-linux-gnu, compiled by
gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973], 64-bit"

Basically my question is:
Is there currently any way to avoid wal generation during data load for
given tables and then have point in time recovery after that?

Background and blurb
The situation i'm referring to here is for a development environment. I
require point in time recovery because if there is crash etc I don't want to
lose up to a days work for 12 developers. I'm fairly new to PostgreSQL so
please forgive any gaps in my knowledge.

A developer did a data load yesterday of approximately 5GB of data into a
new schema. This generated approximately 7GB of wal.  The situation arises
where if something is incorrect in the data load the data load may need to
be repeated 2 or 3 times (thus generating 20GB +of WAL). For a data load i
don't want wal to be generated. I accept the fact before there was nothing
and from the point of the next pg_basebackup there was everything. It is
from the point i say ok that is everything (the next backup) that i want
point in time recovery to apply to that table.
It is doesn't seem practical, and appears very risky to turn off wal_archive
during the data load.

I'd appreciate your thoughts and suggestions,
Thanks,
Rob





--
View this message in context: http://postgresql.1045698.n5.nabble.com/PostgreSQL-data-loads-turn-off-WAL-tp5725374.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: PostgreSQL data loads - turn off WAL

From
Jayadevan M
Date:
Hi,
> Basically my question is:
> Is there currently any way to avoid wal generation during data load for given
> tables and then have point in time recovery after that?
Please have a look at unlogged and temporary options here -
http://www.postgresql.org/docs/9.1/static/sql-createtable.html
I don't think they are crash safe and point in time recovery may not be possible for these tables.
If this is something similar to a daily load in a data warehouse, you could consider using temporary tables for all the
processing/aggregationand then move data to the target tables (real, logged tables). This url might also help -  
http://www.postgresql.org/docs/9.1/static/populate.html
Regards,
Jayadevan




DISCLAIMER:   "The information in this e-mail and any attachment is intended only for the person to whom it is
addressedand may contain confidential and/or privileged material. If you have received this e-mail in error, kindly
contactthe sender and destroy all copies of the original communication. IBS makes no warranty, express or implied, nor
guaranteesthe accuracy, adequacy or completeness of the information contained in this email or any attachment and is
notliable for any errors, defects, omissions, viruses or for resultant loss or damage, if any, direct or indirect."  

Re: PostgreSQL data loads - turn off WAL

From
"Albe Laurenz"
Date:
hartrc wrote:
> Basically my question is:
> Is there currently any way to avoid wal generation during data load
for
> given tables and then have point in time recovery after that?
>
> Background and blurb
> The situation i'm referring to here is for a development environment.
I
> require point in time recovery because if there is crash etc I don't
want to
> lose up to a days work for 12 developers. I'm fairly new to PostgreSQL
so
> please forgive any gaps in my knowledge.
>
> A developer did a data load yesterday of approximately 5GB of data
into a
> new schema. This generated approximately 7GB of wal.  The situation
arises
> where if something is incorrect in the data load the data load may
need to
> be repeated 2 or 3 times (thus generating 20GB +of WAL). For a data
load i
> don't want wal to be generated. I accept the fact before there was
nothing
> and from the point of the next pg_basebackup there was everything. It
is
> from the point i say ok that is everything (the next backup) that i
want
> point in time recovery to apply to that table.
> It is doesn't seem practical, and appears very risky to turn off
wal_archive
> during the data load.

But that's exactly what I'd recommend.

Set archive_mode=off, restart the server, load your data,
set archive_mode=on, restart the server, perform a backup.

If something goes wrong during data load, restore the
previous backup and PITR to the end.

Turning off WAL archiving is no data corruption risk.
The server will still be able to recover from crashes.

Yours,
Laurenz Albe


Re: PostgreSQL data loads - turn off WAL

From
Jeff Janes
Date:
On Tue, Sep 25, 2012 at 9:09 AM, hartrc <rhart2@mt.gov> wrote:
> My version: PostgreSQL v9.1.5
> Version string: "PostgreSQL 9.1.5 on x86_64-unknown-linux-gnu, compiled by
> gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973], 64-bit"
>
> Basically my question is:
> Is there currently any way to avoid wal generation during data load for
> given tables and then have point in time recovery after that?
>
> Background and blurb
> The situation i'm referring to here is for a development environment. I
> require point in time recovery because if there is crash etc I don't want to
> lose up to a days work for 12 developers.

What is it that is being developed?  If you are developing
applications that use postgres for storage, then with a crash of that
database you should not lose the developers' work, unless you are
using the same cluster as both the source-code repository and the
database to which the test applications being developed connect (which
seems like a bad idea)

Or are you not documenting your database schema and code anywhere
other than in the database itself?  That too seems like a bad idea, at
least for a busy multi-person development shop.


> A developer did a data load yesterday of approximately 5GB of data into a
> new schema. This generated approximately 7GB of wal.  The situation arises
> where if something is incorrect in the data load the data load may need to
> be repeated 2 or 3 times (thus generating 20GB +of WAL).

Is the problem the IO generated by this, or the short-term storage, or
the long term storage of it?  If it is the storage, you could do a
specially-scheduled backup as soon as the load is done, and then
delete the WAL soon.  Are you compressing your WAL?  If not, then
doing that might be enough to solve the problem.

It sounds like the main thing your group "produces" is code
development, and from what you describe the database is a "production"
database since losing it loses your product.  If that is the case, you
really need a "development" database as well.  This could run in
archive_mode=off, and then the 5GB would not go to "production" until
they are fairly confident it will work and not have to be repeated.
So that would cut out 2/3 of the WAL.

Or, they could create the table as "unlogged" until they know the bulk
load works, then recreate it as logged and repeat the load.  (I think
that currently there is no way to make an unlogged table then promote
it to logged, that would probably be ideal for you, but someday....)

> For a data load i
> don't want wal to be generated. I accept the fact before there was nothing
> and from the point of the next pg_basebackup there was everything. It is
> from the point i say ok that is everything (the next backup) that i want
> point in time recovery to apply to that table.
> It is doesn't seem practical, and appears very risky to turn off wal_archive
> during the data load.

How long does the bulk load plus one backup plus two database restarts
take?  That would probably be far less than one day, so the risk of
turning off archive_mode temporarily seems like it should be minimal.

If your hard drives were to die horribly, how long would it take you
replace them and restore the database from the last backup and roll it
forward?  That could calibrate how much past work you are willing to
lose in the event it comes to that.

Cheers,

Jeff