Thread: recovery after long delete
Hi. Our professor told us the following story: Oracle. A client issued a selective delete statement on a big table. After two days he lost patience and pulled the plug. Unfortunately while starting up, oracle had to restore all the deleted rows, which took it another two days. He reasoned that one better copies all rows that are not to be deleted in another table drops the original table afterwards. (Concurrency, fks, indexes are not the question here). So I wondered how this works in PostgreSQL. As I understand it, what's going on is the following: 1. The transaction 45 is started. It is recorded as in-progress. 2. The rows selected in the delete statement are one by one marked as to-be-deleted by txn 45. Among them row 27. 3. If a concurrently running read committed txn 47 wants to update row 27, it blocks, awaiting whether txn 45 commits or aborts. 4.1 When txn 45 commits, it is marked as such. 5.1 txn 47 can continue, but as row 27 was deleted, it is not affected by txn 47's update statement. 4.2 When txn 45 aborts, it is marked as such. This means the same as not being marked at all. 5.2 txn 47 continues and updates row 27. Now if you pull the plug after 2, at startup, pg will go through the in-progress txns and mark them as aborted. That's all the recovery in this case. All rows are still there. O(1). How does oracle do that? Has all this something to do with mvcc? Why does it take oracle so long to recover? Thanks Markus -- Markus Bertheau <twanger@bluetwanger.de>
Attachment
Markus Bertheau <twanger@bluetwanger.de> writes: > Now if you pull the plug after 2, at startup, pg will go through the > in-progress txns and mark them as aborted. That's all the recovery in > this case. All rows are still there. O(1). Right. (Actually it's O(checkpoint interval), because we have to make sure that everything we did since the last checkpoint actually got to disk --- but in principle, there's zero recovery effort.) > How does oracle do that? Has all this something to do with mvcc? Why > does it take oracle so long to recover? Oracle doesn't do MVCC the same way we do. They update rows in place and put the previous version of a row into an "undo log". If the transaction aborts, they have to go back through the undo log and put back the previous version of the row. I'm not real clear on how that applies to deletions, but I suppose it's the same deal: cost of undoing a transaction in Oracle is proportional to the number of rows it changed. There's also the little problem that the space available for UNDO logs is limited :-( As against which, they don't have to VACUUM. So it's a tradeoff. regards, tom lane
Markus Bertheau <twanger@bluetwanger.de> writes: > How does oracle do that? Has all this something to do with mvcc? Why > does it take oracle so long to recover? Postgres does "pessimistic MVCC" where it keeps the old versions where they are in the table. Only after it's committed can they be cleaned up and reused. So aborting is a noop but committing requires additional cleanup (which is put off until vacuum runs). Oracle does "optimistic MVCC" where it assumes most transactions will commit and most transactions will be reading mostly committed data. So it immediately does all the cleanup for the commit. It stores the old version in separate storage spaces called the rollback segment and redo logs. Committing is a noop (almost, there are some details, search for "delayed block cleanout") whereas rolling back requires copying back all that old data from the redo logs back to the table. Engineering is all about tradeoffs. -- greg