Thread: recovery after long delete

recovery after long delete

From

Markus Bertheau

Date:

14 April 2005, 12:47:46

Hi.

Our professor told us the following story: Oracle. A client issued a
selective delete statement on a big table. After two days he lost
patience and pulled the plug. Unfortunately while starting up, oracle
had to restore all the deleted rows, which took it another two days. He
reasoned that one better copies all rows that are not to be deleted in
another table drops the original table afterwards. (Concurrency, fks,
indexes are not the question here). So I wondered how this works in
PostgreSQL. As I understand it, what's going on is the following:

1. The transaction 45 is started. It is recorded as in-progress.
2. The rows selected in the delete statement are one by one marked as
to-be-deleted by txn 45. Among them row 27.
3. If a concurrently running read committed txn 47 wants to update row
27, it blocks, awaiting whether txn 45 commits or aborts.
4.1 When txn 45 commits, it is marked as such.
5.1 txn 47 can continue, but as row 27 was deleted, it is not affected
by txn 47's update statement.
4.2 When txn 45 aborts, it is marked as such. This means the same as not
being marked at all.
5.2 txn 47 continues and updates row 27.

Now if you pull the plug after 2, at startup, pg will go through the
in-progress txns and mark them as aborted. That's all the recovery in
this case. All rows are still there. O(1).

How does oracle do that? Has all this something to do with mvcc? Why
does it take oracle so long to recover?

Thanks

Markus
--
Markus Bertheau <twanger@bluetwanger.de>

Attachment

signature.asc

Re: recovery after long delete

From

Tom Lane

Date:

14 April 2005, 13:34:43

Markus Bertheau <twanger@bluetwanger.de> writes:
> Now if you pull the plug after 2, at startup, pg will go through the
> in-progress txns and mark them as aborted. That's all the recovery in
> this case. All rows are still there. O(1).

Right.  (Actually it's O(checkpoint interval), because we have to make
sure that everything we did since the last checkpoint actually got to
disk --- but in principle, there's zero recovery effort.)

> How does oracle do that? Has all this something to do with mvcc? Why
> does it take oracle so long to recover?

Oracle doesn't do MVCC the same way we do.  They update rows in place
and put the previous version of a row into an "undo log".  If the
transaction aborts, they have to go back through the undo log and put
back the previous version of the row.  I'm not real clear on how that
applies to deletions, but I suppose it's the same deal: cost of undoing
a transaction in Oracle is proportional to the number of rows it
changed.  There's also the little problem that the space available for
UNDO logs is limited :-(

As against which, they don't have to VACUUM.  So it's a tradeoff.

            regards, tom lane

Re: recovery after long delete

From

Greg Stark

Date:

14 April 2005, 15:11:39

Markus Bertheau <twanger@bluetwanger.de> writes:

> How does oracle do that? Has all this something to do with mvcc? Why
> does it take oracle so long to recover?

Postgres does "pessimistic MVCC" where it keeps the old versions where they
are in the table. Only after it's committed can they be cleaned up and reused.
So aborting is a noop but committing requires additional cleanup (which is put
off until vacuum runs).

Oracle does "optimistic MVCC" where it assumes most transactions will commit
and most transactions will be reading mostly committed data. So it immediately
does all the cleanup for the commit. It stores the old version in separate
storage spaces called the rollback segment and redo logs. Committing is a noop
(almost, there are some details, search for "delayed block cleanout") whereas
rolling back requires copying back all that old data from the redo logs back
to the table.

Engineering is all about tradeoffs.

--
greg