Thread: planned recovery from a certain transaction

planned recovery from a certain transaction

From
"Chris Spotts"
Date:

I know questions like this have been asked before, but I hadn’t seen one quite from the same perspective (although I’m sure it’s out there somewhere)…

 

We have a database with one long, involved procedure early in the morning that updates all sorts of things, moves data around, deletes some stuff, alters some DDL - you name it, it does it.  The rest of the day, the database is read only.  Autovacuum is not on; it was killing performance when it kicked on in the middle of the proc.  Vacuum is run right before the long procedure.  We typically wait until the guy onsite verifies the procedure went smoothly and then vacuum (Why? Because we’ve read enough of “well, you would have been able to restore that if autovacuum hadn’t been running”.  We have a backup, so this checking before vacuuming is technically unneeded.).  Don’t get me wrong, I’m a big autovacuum fan, just not for this specific case.

 

The transaction itself works flawlessly, but every once in a while the data that it uploads from comes in flawed and we have to find a way to reset it.  This reset involves restoring a backup that was taken right before the proc started.  If we had the xid of the long-running transaction, is there a better way to reset it to right before that transaction happened?  Restoring the backup is a lengthy process because several of the tables that are affected are rather large.

 

Chris Spotts

 

 

 

 

Re: planned recovery from a certain transaction

From
Richard Huxton
Date:
Chris Spotts wrote:
>
> The transaction itself works flawlessly, but every once in a while the data
> that it uploads from comes in flawed and we have to find a way to reset it.
> This reset involves restoring a backup that was taken right before the proc
> started.   If we had the xid of the long running transaction, is there a
> better way to reset it right before that transaction happened?  Restoring
> the backup is a lengthy process because several of the tables that are
> affected are rather large.

No way really to "rewind" to a previous transaction (although I believe
the original academic code PostgreSQL is based on could do this sort of
thing).

I'd look at doing a PITR backup (full+WAL) just before the long
transaction is started. Alternatively, if you can afford the downtime
you could just stop the database server and take a snapshot of all the
DB files (as for PITR).

Use of rsync or filesystems that handle snapshots would make both of
these reasonably fast. Restores are just a matter of moving the files /
pointing PG at the backup set and starting it - effectively instant.
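
Roughly, just before the procedure kicks off (this assumes WAL archiving is
already configured via archive_command; the label and paths are only
placeholders - untested sketch):

psql -c "SELECT pg_start_backup('pre_morning_proc');"
rsync -a --delete --exclude=pg_xlog /var/lib/pgsql/data/ /backups/base/  # copy the cluster files
psql -c "SELECT pg_stop_backup();"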

--
   Richard Huxton
   Archonet Ltd

Re: planned recovery from a certain transaction

From
Alan Hodgson
Date:
On Thursday 25 June 2009, "Chris Spotts" <rfusca@gmail.com> wrote:
> The transaction itself works flawlessly, but every once in a while the
> data that it uploads from comes in flawed and we have to find a way to
> reset it. This reset involves restoring a backup that was taken right
> before the proc started.   If we had the xid of the long running
> transaction, is there a better way to reset it right before that
> transaction happened?  Restoring the backup is a lengthy process because
> several of the tables that are affected are rather large.
>

PITR would permit you to restore the database to a point in time or a
transaction ID. You could probably do something with filesystem snapshots
to minimize backup/restore time for it.
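
A sketch of the restore side, assuming you captured the procedure's xid when it
started (txid_current() gives it on 8.3+, though it returns an epoch-extended
value) - the xid and paths below are only examples:

# restore the base backup into the data directory, then:
cat > /var/lib/pgsql/data/recovery.conf <<'EOF'
restore_command = 'cp /backups/wal/%f %p'
# xid of the morning procedure (example value)
recovery_target_xid = '1234567'
# stop just before that transaction, rather than just after it
recovery_target_inclusive = 'false'
EOF
pg_ctl -D /var/lib/pgsql/data start    # replays WAL up to the target and stops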


--
WARNING:  Do not look into laser with remaining eye.

Re: planned recovery from a certain transaction

From
Alvaro Herrera
Date:
Chris Spotts escribió:

> The transaction itself works flawlessly, but every once in a while the data
> that it uploads from comes in flawed and we have to find a way to reset it.
> This reset involves restoring a backup that was taken right before the proc
> started.   If we had the xid of the long running transaction, is there a
> better way to reset it right before that transaction happened?  Restoring
> the backup is a lengthy process because several of the tables that are
> affected are rather large.

You could mark it aborted in pg_clog, assuming none of the tuples it
touched have been examined by anyone else after it finished.  Since you
likely want to crosscheck the data (thus examine it, which sets its hint
bits), it's going to be very hard to do.

Another idea would be to use PITR to restore to the time just before the
transaction, but that's going to be painful too because restoring from a
base backup is going to take a long time for your big tables.

Lastly, you could use a filesystem snapshot taken just before the long
procedure, to which to revert if you don't like how it went.
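
A minimal sketch of the snapshot route, assuming the data directory sits on an
LVM volume (volume names, sizes and paths are placeholders) and you can afford
a brief shutdown so the snapshot is clean:

pg_ctl -D /var/lib/pgsql/data stop
lvcreate --size 20G --snapshot --name pgdata_pre_proc /dev/vg0/pgdata
pg_ctl -D /var/lib/pgsql/data start

# to revert after a bad run: stop the server and copy the snapshot back
pg_ctl -D /var/lib/pgsql/data stop
mkdir -p /mnt/pgsnap
mount -o ro /dev/vg0/pgdata_pre_proc /mnt/pgsnap
rsync -a --delete /mnt/pgsnap/ /var/lib/pgsql/data/
umount /mnt/pgsnap
pg_ctl -D /var/lib/pgsql/data start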

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: planned recovery from a certain transaction

From
Scott Marlowe
Date:
On Thu, Jun 25, 2009 at 9:40 AM, Chris Spotts<rfusca@gmail.com> wrote:
>
>
> We have a database with one long, involved procedure early in the morning
> that updates all sorts of things, moves data around, deletes some stuff,
> alters some DDL - you name it, it does it.  The rest of the day, the
> database is read only.  Autovacuum is not on; it was killing performance
> when it kicked on in the middle of the proc.  Vacuum is run right before
> the long procedure.  We typically wait until the guy onsite verifies the
> procedure went smoothly and then vacuum (Why? Because we’ve read enough of
> “well, you would have been able to restore that if autovacuum hadn’t been
> running”.  We have a backup, so this checking before vacuuming is
> technically unneeded.).  Don’t get me wrong, I’m a big autovacuum fan, just
> not for this specific case.
>
>
>
> The transaction itself works flawlessly, but every once in a while the data
> that it uploads from comes in flawed and we have to find a way to reset it.
>   This reset involves restoring a backup that was taken right before the
> proc started.   If we had the xid of the long running transaction, is there
> a better way to reset it right before that transaction happened?  Restoring
> the backup is a lengthy process because several of the tables that are
> affected are rather large.

Assuming that the data is mostly created from whole cloth each
morning, it might do to have two dbs, and rename one to replace the
other when you're ready.  Gives you 20 or so hours to discover a screw
up and still have the backup db before you toss it away to build the
next day's db.
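
The swap itself is cheap - something like this, with made-up database names
(ALTER DATABASE ... RENAME needs you connected to a different database, and
nobody else connected to the ones being renamed):

psql -d postgres <<'EOF'
ALTER DATABASE warehouse RENAME TO warehouse_old;
ALTER DATABASE warehouse_new RENAME TO warehouse;
EOF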

Re: planned recovery from a certain transaction

From
Greg Stark
Date:
>> The transaction itself works flawlessly, but every once in a while the data
>> that it uploads from comes in flawed and we have to find a way to reset it.

If you can automate the tests for the flaws, you can run the whole
procedure as one big transaction in Postgres. Even DDL can be
done in transactions in Postgres. Then run the tests, still in the
same transaction, and if you detect anything wrong, roll the whole
shebang back.
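
Something along these lines - the function names here are invented, but the
idea is that the checks live in a plpgsql function whose RAISE EXCEPTION
aborts the whole thing, DDL included:

psql -d warehouse -v ON_ERROR_STOP=1 <<'EOF'
BEGIN;
SELECT do_morning_load();      -- the existing long procedure (hypothetical name)
SELECT check_morning_load();   -- sanity checks; RAISE EXCEPTION rolls everything back
COMMIT;
EOF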

--
greg
http://mit.edu/~gsstark/resume.pdf

Re: planned recovery from a certain transaction

From
Chris Spotts
Date:
>
> Assuming that the data is mostly created from whole cloth each
> morning, it might do to have two dbs, and rename one to replace the
> other when you're ready.  Gives you 20 or so hours to discover a screw
> up and still have the backup db before you toss it away to build the
> next day
For this database, we're dumping in the neighborhood of 75GB of data
each morning.  This is mostly data warehousing.
I'm not quite sure how effective it's going to be to do a filesystem
snapshot since the whole db is so large.

Marking the transaction aborted in pg_clog sounds similar to what I was
dreaming of, but "very hard to do" doesn't sound very promising.

Here's the approach we've come up with today to help combat it; let
me know your thoughts on it.

We're mostly warehousing logistics data.  So we have an import table
that the files are uploaded to.  We have a set of tables for today's
data processing, a set of tables for yesterday's results (we do
some cross-day comparisons that make this necessary), and a set of
partitioned archive tables.

The data is uploaded into the import table, the previous day tables are
moved to history, and then the current day tables are moved to previous
day, leaving the current day tables ready to be populated.

The import table is then normalized into several current day tables,
mostly split into dynamic and static data.  Some fields are normalized
to reduce size, i.e. we store a city_id int instead of a 60 character
field.  Reports and stats are generated off the current day's records and
the comparison of the two days.

Because we calculate certain things like durations of certain statuses,
it's not as simple as just "delete everything with a certain date and
then copy the appropriate dates from history into current and previous
respectively."  We effectively alter some fields that cannot be
"unaltered" short of a restore of some kind.  Due to speed concerns,
we've not found a way around this.  We'd like to store the fields needed
to calculate the durations until after the end of the duration, but that
can be quite lengthy.  Instead, triggers update a table that tracks key events.

All of this is what we currently do.
It means that we'd have to have a HUGE filesystem snapshot to do it
like that.  To do a pg_dump restore, we have to restore one of the
partitions, which is 250GB (2 weeks of data partitioned) - it's a lengthy
process.

We're looking at adding a set of daily queue tables that sit right
before archive.  We'd write the files to these tables and then, if
everything checks out, dump them to history.  This way, we should only ever
need a dump of the current, previous, and queue tables to be
able to "undo" (rough sketch of the promotion step below).  Suggestions?
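
Very roughly, with made-up names (the warehouse database name, daily_queue, and
the use of "history" for the partitioned archive tables are all placeholders):

psql -d warehouse <<'EOF'
BEGIN;
-- runs only once the onsite verification has passed
INSERT INTO history SELECT * FROM daily_queue;
TRUNCATE daily_queue;
COMMIT;
EOF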


Re: planned recovery from a certain transaction

From
Chris Spotts
Date:
On Thu, 2009-06-25 at 21:59 +0100, Greg Stark wrote:
> >> The transaction itself works flawlessly, but every once in a while the data
> >> that it uploads from comes in flawed and we have to find a way to reset it.
>
> If you can automate the tests for the flaws you can do the whole
> transaction itself as one big transaction in Postgres. Even DDL can be
> done in transactions in Postgres. Then run the tests, still in the
> same transaction, and if you detect anything wrong roll the whole
> shebang back.
>
We automate dozens of tests for the data, as the very first thing, all in one
large transaction.  We love the fact that it undoes DDL and have taken
full advantage of it.  But we get "these stats don't look right" and then
go look at the import and find out that the context of the data was
incorrect.  I can't imagine how to write tests for the stuff we find
wrong.  If anybody has an extra AI system lying around, let me know ;).
