Re: Protecting against unexpected zero-pages: proposal - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Protecting against unexpected zero-pages: proposal
Msg-id AANLkTik5G-KeNGu0VqZCAbP1Qn-ZDDc2qZu38q1n_0JE@mail.gmail.com
In response to Re: Protecting against unexpected zero-pages: proposal  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Tue, Nov 9, 2010 at 6:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> <dons asbestos underpants>
>> 4. There would presumably be some finite limit on the size of the
>> shared memory structure for aborted transactions.  I don't think
>> there'd be any reason to make it particularly small, but if you sat
>> there and aborted transactions at top speed you might eventually run
>> out of room, at which point any transactions you started wouldn't be
>> able to abort until vacuum made enough progress to free up an entry.
>
> Um, that bit is a *complete* nonstarter.  The possibility of a failed
> transaction always has to be allowed.  What if vacuum itself gets an
> error for example?  Or, what if the system crashes?

I wasn't proposing that it was impossible to abort, only that aborts
might have to block.  I admit I don't know what to do about VACUUM
itself failing.  A transient failure mightn't be so bad, but if you
find yourself permanently unable to eradicate the XIDs left behind by
an aborted transaction, you'll eventually have to shut down the
database, lest the XID space wrap around.

Actually, come to think of it, there's no reason you COULDN'T spill
the list of aborted-but-not-yet-cleaned-up XIDs to disk.  It's just
that XidInMVCCSnapshot() would get reeeeeeally expensive after a
while.
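
To make the bounded structure concrete, here is a minimal sketch of a fixed-capacity table of aborted-but-not-yet-cleaned-up XIDs. Everything in it is hypothetical and for illustration only: the names (AbortedXidTable, aborted_xid_record, and so on), the tiny capacity, and the flat array are assumptions, not anything in PostgreSQL, and a real version would live in shared memory under a lock. A false return from aborted_xid_record() marks the point at which an abort would have to block or the list would have to spill to disk.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;     /* 32 bits wide, like a PostgreSQL XID */

#define ABORTED_XID_CAPACITY 4      /* hypothetical; tiny for illustration */

/* Hypothetical table of aborted XIDs whose tuples vacuum has not yet removed. */
typedef struct AbortedXidTable
{
    size_t        count;
    TransactionId xids[ABORTED_XID_CAPACITY];
} AbortedXidTable;

/*
 * Remember an aborted XID.  Returns false when the table is full, which is
 * the case where the abort would have to wait for vacuum (or spill to disk).
 */
bool
aborted_xid_record(AbortedXidTable *t, TransactionId xid)
{
    if (t->count >= ABORTED_XID_CAPACITY)
        return false;
    t->xids[t->count++] = xid;
    return true;
}

/* Linear membership test; this is the part that gets slow as the list grows. */
bool
aborted_xid_contains(const AbortedXidTable *t, TransactionId xid)
{
    for (size_t i = 0; i < t->count; i++)
    {
        if (t->xids[i] == xid)
            return true;
    }
    return false;
}

/*
 * Drop an XID once vacuum has removed everything it left behind.
 * Returns false if the XID was not in the table.
 */
bool
aborted_xid_forget(AbortedXidTable *t, TransactionId xid)
{
    for (size_t i = 0; i < t->count; i++)
    {
        if (t->xids[i] == xid)
        {
            /* Order does not matter, so fill the hole with the last entry. */
            t->xids[i] = t->xids[--t->count];
            return true;
        }
    }
    return false;
}
```

The linear scan in aborted_xid_contains() is deliberate: it shows why a snapshot check that consults this list gets more expensive as entries pile up, and why it matters that the list is usually short or empty.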

> I thought for a bit about inverting the idea, such that there were a
> limit on the number of unvacuumed *successful* transactions rather than
> the number of failed ones.  But that seems just as unforgiving: what if
> you really need to commit a transaction to effect some system state
> change?  An example might be dropping some enormous table that you no
> longer need, but vacuum is going to insist on plowing through before
> it'll let you have any more transactions.

The number of relevant aborted XIDs tends naturally to decline to zero
as vacuum does its thing, while the number of relevant committed XIDs
tends to grow very, very large (it starts to decline only when we
start freezing things), so remembering the not-yet-cleaned-up aborted
XIDs seems likely to be cheaper.  In fact, in many cases, the set of
not-yet-cleaned-up aborted XIDs will be completely empty.

> I'm of the opinion that any design that presumes it can always fit all
> the required transaction-status data in memory is probably not even
> worth discussing.

Well, InnoDB does it.

> There always has to be a way for status data to spill
> to disk.  What's interesting is how you can achieve enough locality of
> access so that most of what you need to look at is usually in memory.

We're not going to get any more locality of reference than we're
already getting from hint bits, are we?  The advantage of trying to do
timely cleanup of aborted transactions is that you can assume that any
XID before RecentGlobalXmin is committed, without checking CLOG and
without having to update hint bits and write out the ensuing dirty
pages.  If we could make CLOG access cheap enough that we didn't need
hint bits, that would also solve that problem, but nobody (including
me) seems to think that's feasible.
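
A rough sketch of the shortcut described above, under the assumption that every aborted XID whose tuples are still around is tracked in some list. The function classify_xid(), the XidStatus enum, and the flat array of aborted XIDs are hypothetical and exist only for this illustration. The wraparound-aware comparison uses the same modulo-2^32 idea as PostgreSQL's TransactionIdPrecedes() for normal XIDs, but ignores the special permanent XIDs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;     /* 32 bits wide, like a PostgreSQL XID */

/* Hypothetical result type for this sketch. */
typedef enum XidStatus
{
    XID_COMMITTED,
    XID_ABORTED,
    XID_NEEDS_CLOG_LOOKUP
} XidStatus;

/* True if a logically precedes b, allowing for XID wraparound. */
bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * Classify an XID without touching CLOG where possible.
 *
 * Assumption: "aborted" lists every aborted XID whose tuples vacuum has not
 * yet removed.  Given that, an XID older than recent_global_xmin that is not
 * in the list can be treated as committed, with no CLOG lookup and no hint
 * bit to set.
 */
XidStatus
classify_xid(TransactionId xid, TransactionId recent_global_xmin,
             const TransactionId *aborted, size_t n_aborted)
{
    for (size_t i = 0; i < n_aborted; i++)
    {
        if (aborted[i] == xid)
            return XID_ABORTED;
    }

    if (xid_precedes(xid, recent_global_xmin))
        return XID_COMMITTED;

    /* Recent XID: may still be in progress, so the normal path applies. */
    return XID_NEEDS_CLOG_LOOKUP;
}
```

In the common case where the aborted list is empty, the check for an old XID reduces to a single comparison against the horizon.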

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

