Thread: PostgreSQL data loads - turn off WAL
My version: PostgreSQL v9.1.5 Version string: "PostgreSQL 9.1.5 on x86_64-unknown-linux-gnu, compiled by gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973], 64-bit" Basically my question is: Is there currently any way to avoid wal generation during data load for given tables and then have point in time recovery after that? Background and blurb The situation i'm referring to here is for a development environment. I require point in time recovery because if there is crash etc I don't want to lose up to a days work for 12 developers. I'm fairly new to PostgreSQL so please forgive any gaps in my knowledge. A developer did a data load yesterday of approximately 5GB of data into a new schema. This generated approximately 7GB of wal. The situation arises where if something is incorrect in the data load the data load may need to be repeated 2 or 3 times (thus generating 20GB +of WAL). For a data load i don't want wal to be generated. I accept the fact before there was nothing and from the point of the next pg_basebackup there was everything. It is from the point i say ok that is everything (the next backup) that i want point in time recovery to apply to that table. It is doesn't seem practical, and appears very risky to turn off wal_archive during the data load. I'd appreciate your thoughts and suggestions, Thanks, Rob -- View this message in context: http://postgresql.1045698.n5.nabble.com/PostgreSQL-data-loads-turn-off-WAL-tp5725374.html Sent from the PostgreSQL - general mailing list archive at Nabble.com.
Hi, > Basically my question is: > Is there currently any way to avoid wal generation during data load for given > tables and then have point in time recovery after that? Please have a look at unlogged and temporary options here - http://www.postgresql.org/docs/9.1/static/sql-createtable.html I don't think they are crash safe and point in time recovery may not be possible for these tables. If this is something similar to a daily load in a data warehouse, you could consider using temporary tables for all the processing/aggregationand then move data to the target tables (real, logged tables). This url might also help - http://www.postgresql.org/docs/9.1/static/populate.html Regards, Jayadevan DISCLAIMER: "The information in this e-mail and any attachment is intended only for the person to whom it is addressedand may contain confidential and/or privileged material. If you have received this e-mail in error, kindly contactthe sender and destroy all copies of the original communication. IBS makes no warranty, express or implied, nor guaranteesthe accuracy, adequacy or completeness of the information contained in this email or any attachment and is notliable for any errors, defects, omissions, viruses or for resultant loss or damage, if any, direct or indirect."
hartrc wrote: > Basically my question is: > Is there currently any way to avoid wal generation during data load for > given tables and then have point in time recovery after that? > > Background and blurb > The situation i'm referring to here is for a development environment. I > require point in time recovery because if there is crash etc I don't want to > lose up to a days work for 12 developers. I'm fairly new to PostgreSQL so > please forgive any gaps in my knowledge. > > A developer did a data load yesterday of approximately 5GB of data into a > new schema. This generated approximately 7GB of wal. The situation arises > where if something is incorrect in the data load the data load may need to > be repeated 2 or 3 times (thus generating 20GB +of WAL). For a data load i > don't want wal to be generated. I accept the fact before there was nothing > and from the point of the next pg_basebackup there was everything. It is > from the point i say ok that is everything (the next backup) that i want > point in time recovery to apply to that table. > It is doesn't seem practical, and appears very risky to turn off wal_archive > during the data load. But that's exactly what I'd recommend. Set archive_mode=off, restart the server, load your data, set archive_mode=on, restart the server, perform a backup. If something goes wrong during data load, restore the previous backup and PITR to the end. Turning off WAL archiving is no data corruption risk. The server will still be able to recover from crashes. Yours, Laurenz Albe
On Tue, Sep 25, 2012 at 9:09 AM, hartrc <rhart2@mt.gov> wrote: > My version: PostgreSQL v9.1.5 > Version string: "PostgreSQL 9.1.5 on x86_64-unknown-linux-gnu, compiled by > gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973], 64-bit" > > Basically my question is: > Is there currently any way to avoid wal generation during data load for > given tables and then have point in time recovery after that? > > Background and blurb > The situation i'm referring to here is for a development environment. I > require point in time recovery because if there is crash etc I don't want to > lose up to a days work for 12 developers. What is it that is being developed? If you are developing applications that use postgres for storage, then with a crash of that database you should not lose the developers' work, unless you are using the same cluster as both the source-code repository and the database to which the test applications being developed connect (which seems like a bad idea) Or are you not documenting your database schema and code anywhere other than in the database itself? That too seems like a bad idea, at least for a busy multi-person development shop. > A developer did a data load yesterday of approximately 5GB of data into a > new schema. This generated approximately 7GB of wal. The situation arises > where if something is incorrect in the data load the data load may need to > be repeated 2 or 3 times (thus generating 20GB +of WAL). Is the problem the IO generated by this, or the short-term storage, or the long term storage of it? If it is the storage, you could do a specially-scheduled backup as soon as the load is done, and then delete the WAL soon. Are you compressing your WAL? If not, then doing that might be enough to solve the problem. It sounds like the main thing your group "produces" is code development, and from what you describe the database is a "production" database since losing it loses your product. If that is the case, you really need a "development" database as well. This could run in archive_mode=off, and then the 5GB would not go to "production" until they are fairly confident it will work and not have to be repeated. So that would cut out 2/3 of the WAL. Or, they could create the table as "unlogged" until they know the bulk load works, then recreate it as logged and repeat the load. (I think that currently there is no way to make an unlogged table then promote it to logged, that would probably be ideal for you, but someday....) > For a data load i > don't want wal to be generated. I accept the fact before there was nothing > and from the point of the next pg_basebackup there was everything. It is > from the point i say ok that is everything (the next backup) that i want > point in time recovery to apply to that table. > It is doesn't seem practical, and appears very risky to turn off wal_archive > during the data load. How long does the bulk load plus one backup plus two database restarts take? That would probably be far less than one day, so the risk of turning off archive_mode temporarily seems like it should be minimal. If your hard drives were to die horribly, how long would it take you replace them and restore the database from the last backup and roll it forward? That could calibrate how much past work you are willing to lose in the event it comes to that. Cheers, Jeff