Re: Protecting against unexpected zero-pages: proposal - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Protecting against unexpected zero-pages: proposal
Msg-id AANLkTintPQKjFBEMGc3Ww_db7wmTt8FeF=VqLgeQ69AG@mail.gmail.com
In response to Re: Protecting against unexpected zero-pages: proposal  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
On Tue, Nov 9, 2010 at 3:05 PM, Greg Stark <gsstark@mit.edu> wrote:
> On Tue, Nov 9, 2010 at 7:37 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Well, most of the other MVCC-in-table DBMSes simply don't deal with
>> large, on-disk databases.  In fact, I can't think of one which does,
>> currently; while MVCC has been popular for the New Databases, they're
>> all focused on "in-memory" databases.  Oracle and InnoDB use rollback
>> segments.
>
> Well rollback segments are still MVCC. However Oracle's MVCC is
> block-based. So they only have to do the visibility check once per
> block, not once per row. Once they find the right block version they
> can process all the rows on it.
>
> Also Oracle's snapshots are just the log position. Instead of having
> to check whether every transaction committed or not, they just find
> the block version which was last modified before the log position for
> when their transaction started.

That is cool.  One problem is that it might sometimes result in
additional I/O.  A transaction begins and writes a tuple.  We must
write a preimage of the page (or at least, sufficient information to
reconstruct a preimage of the page) to the undo segment.  If the
transaction commits relatively quickly, and every transaction that took
its snapshot before the commit has since ended (by committing or by
aborting), we can discard that information from the undo segment
without ever writing it to disk.  However, if that doesn't happen, the
undo log page may get evicted, and we're now doing three writes (WAL,
page, undo) rather than just two (WAL, page).  That's no worse than an
update where the old and new tuples land on different pages, but it IS
worse than an update where the old and new tuples are on the same
page, or at least I think it is.
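
To make the write counting above concrete, here is a rough sketch in
Python.  It is only a toy model of the argument, not how any real
system accounts for I/O; the function names and the boolean flags are
made up for illustration.

```python
from typing import List


def undo_scheme_writes(committed_quickly: bool,
                       older_snapshots_ended: bool,
                       undo_page_evicted: bool) -> List[str]:
    """Writes for one update under a hypothetical undo-segment scheme."""
    writes = ["WAL", "page"]
    # The preimage can be dropped from memory only if the writer has
    # committed and nobody with an older snapshot can still need it.
    discardable = committed_quickly and older_snapshots_ended
    if not discardable and undo_page_evicted:
        writes.append("undo")
    return writes


def in_table_mvcc_writes(same_page: bool) -> List[str]:
    """Writes for one update when old and new tuple versions stay in the table."""
    writes = ["WAL", "page"]
    if not same_page:
        # Old and new tuples landed on different pages: one more page write.
        writes.append("second page")
    return writes


if __name__ == "__main__":
    print("undo, discarded early:", undo_scheme_writes(True, True, False))
    print("undo, page evicted:   ", undo_scheme_writes(False, False, True))
    print("in-table, same page:  ", in_table_mvcc_writes(True))
    print("in-table, two pages:  ", in_table_mvcc_writes(False))
```

Under these assumptions the evicted-undo case costs the same number of
writes as an in-table update that spans two pages, and one more than an
in-table update that stays on a single page.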

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
