Thread: Saving snapshots for later use
I've been a PostgreSQL user for years and have a feature that I'd like to see implemented. I've started playing with the source and have a vague implementation plan for it, but before I go blundering around I'd like to run the idea by this list. So here it is:
I'd like to be able to save the current snapshot and then at a later date roll the entire database back to that snapshot, essentially erasing everything that happened since the snapshot. Well, more like making all that stuff invisible. This would be really useful when testing applications who's databases take a while to setup and who's tests change the database in some way. It could also provide a way to recover from some horrible user error. Frankly, I'm more interested in this for testing, but having the ability to recover from errors would be nice.
Once you've saved the snapshot it'd be tempting to expose the ability to run queries against the old snapshot as well. This seems like more of a bonus in my mind, though.
>From here on out I'm not sure if I have my terms right so please correct me if I'm wrong.
So I'm wondering a few things:
1. I _think_ all I have to do to store one of these snapshots is to store the current xid and a list of all in progress transactions. I believe I should be able to reconstitute this with the commit log to determine visibility at that snapshot.
2. I _think_ I can save that information in a system table. It feels weird to use a table to store visibility information but part of me thinks that is ok. I can't put my finger on why I think that is ok though.
3. I'm not really sure at all what to do with other connections that are doing things during a snapshot restore. They might just have to get cut. For my purposes this would be ok but I can immagine that would be a problem.
4. I think not allowing the user to save a snapshot during a transaction would simplify the book keeping a ton. I know what I typed in (1) wouldn't cover snapshots inside transactions. I just don't think that is worth the effort.
5. On a more personal note I have no idea if I am capable of implementing this. I'm really not very good with C but I haven't had too much trouble figuring out how to add new parser rules or map them into the tcop layer. I figured out how to create a new system table without too much trouble but haven't a clue how upgrades work for the new table. I'm not sure how to read from the system table but I see there is a faq entry for that.
6. I think it'd be ok if restoring an older snapshot removes a newer snapshot. Using the snapshot mechanism to jump forward in time just sounds hard to keep strait.
I'd attach a patch of my work so far but it certainly isn't anything mind blowing at this point.
Sorry to ramble on for a solid page. Thanks for everyone who made it to the end.
Cheers,
Nik
On 15.06.2012 06:19, Nikolas Everett wrote: > I've been a PostgreSQL user for years and have a feature that I'd like to > see implemented. I've started playing with the source and have a vague > implementation plan for it, but before I go blundering around I'd like to > run the idea by this list. So here it is: > > I'd like to be able to save the current snapshot and then at a later date > roll the entire database back to that snapshot, essentially erasing > everything that happened since the snapshot. Well, more like making all > that stuff invisible. This would be really useful when testing > applications who's databases take a while to setup and who's tests change > the database in some way. It could also provide a way to recover from some > horrible user error. Frankly, I'm more interested in this for testing, but > having the ability to recover from errors would be nice. Have you considered using filesystem snapshots? > So I'm wondering a few things: > 1. I _think_ all I have to do to store one of these snapshots is to store > the current xid and a list of all in progress transactions. I believe I > should be able to reconstitute this with the commit log to determine > visibility at that snapshot. > 2. I _think_ I can save that information in a system table. It feels > weird to use a table to store visibility information but part of me thinks > that is ok. I can't put my finger on why I think that is ok though. > 3. I'm not really sure at all what to do with other connections that are > doing things during a snapshot restore. They might just have to get cut. > For my purposes this would be ok but I can immagine that would be a > problem. > 4. I think not allowing the user to save a snapshot during a transaction > would simplify the book keeping a ton. I know what I typed in (1) wouldn't > cover snapshots inside transactions. I just don't think that is worth the > effort. > 5. On a more personal note I have no idea if I am capable of implementing > this. I'm really not very good with C but I haven't had too much trouble > figuring out how to add new parser rules or map them into the tcop layer. > I figured out how to create a new system table without too much trouble > but haven't a clue how upgrades work for the new table. I'm not sure how > to read from the system table but I see there is a faq entry for that. > 6. I think it'd be ok if restoring an older snapshot removes a newer > snapshot. Using the snapshot mechanism to jump forward in time just sounds > hard to keep strait. To revert the database to the earlier state, you'll also need to somehow roll back all the already-committed transactions. At first sight, that seems easy - just modify clog to mark them as aborted. However, it's not that easy, because you'd also need to somehow clear hint bits that claim those transactions to be committed. Or prevent those hint bits from being set in the first place, but that affects performance, so I don't think that would be very useful for testing. All in all, I think a filesystem snapshot is simpler. Or even just use a copy of the database, and use rsync to revert it back to the original state. You will have to restart postgres, though. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > On 15.06.2012 06:19, Nikolas Everett wrote: >> I'd like to be able to save the current snapshot and then at a later date >> roll the entire database back to that snapshot, essentially erasing >> everything that happened since the snapshot. > To revert the database to the earlier state, you'll also need to somehow > roll back all the already-committed transactions. At first sight, that > seems easy - just modify clog to mark them as aborted. However, it's not > that easy, because you'd also need to somehow clear hint bits that claim > those transactions to be committed. Not to mention prevent VACUUM from removing rows deleted by those committed transactions. A saved snapshot of this sort would have to act like an open transaction from the standpoint of resource reclamation, which makes it (a) complicated and (b) very expensive if you intend to hold that snapshot for a long time. I wonder whether your actual use-case could be solved with 9.2's exportable-snapshots feature, though. regards, tom lane