Re: Freezing without write I/O - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Freezing without write I/O
Date
Msg-id CA+TgmoaG1+2CNQe5aYpMKukPbdT0krK=L2fuZ6A-0FeeuCFmkw@mail.gmail.com
In response to Re: Freezing without write I/O  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Freezing without write I/O  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On Thu, May 30, 2013 at 2:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Random thought: Could you compute the reference XID based on the page
> LSN?  That would eliminate the storage overhead.

After mulling this over a bit, I think this is definitely possible.
We begin a new "half-epoch" every 2 billion transactions.  We remember
the LSN at which the current half-epoch began and the LSN at which the
previous half-epoch began.  When a new half-epoch begins, the first
backend that wants to stamp a tuple with an XID from the new
half-epoch must first emit a "new half-epoch" WAL record, which
becomes the starting LSN for the new half-epoch.
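
A minimal sketch of that bookkeeping, for illustration only. It assumes a 64-bit transaction counter so that the half-epoch number is simply the counter divided by 2^31; the class, its fields, and the `insert_lsn` argument are hypothetical names, not existing PostgreSQL structures.

```python
from dataclasses import dataclass

# "2 billion transactions" is taken here to mean 2**31 XIDs (an assumption).
HALF_EPOCH_XIDS = 2**31


@dataclass
class HalfEpochTracker:
    """Remembers where the current and previous half-epochs began (hypothetical)."""
    current_half_epoch: int = 0
    current_start_lsn: int = 0
    previous_start_lsn: int = 0

    def stamp_xid(self, full_xid: int, insert_lsn: int) -> bool:
        """Call before stamping a tuple with full_xid.

        insert_lsn stands in for the LSN that a "new half-epoch" WAL record
        would receive if it were emitted now. Returns True when such a
        record had to be emitted, False otherwise.
        """
        half_epoch = full_xid // HALF_EPOCH_XIDS
        if half_epoch <= self.current_half_epoch:
            return False  # still inside a half-epoch we already know about
        # First XID from a new half-epoch: shift the remembered boundaries.
        self.previous_start_lsn = self.current_start_lsn
        self.current_start_lsn = insert_lsn
        self.current_half_epoch = half_epoch
        return True
```

Only the first backend to cross the boundary takes the slow path; every later XID from the same half-epoch returns immediately.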

We define a new page-level bit, something like PD_RECENTLY_FROZEN.
When this bit is set, it means there are no unfrozen tuples on the
page with XIDs that predate the current half-epoch.  Whenever we know
this to be true, we set the bit.  If the page LSN crosses more than
one half-epoch boundary at a time, we freeze the page and set the bit.
If the page LSN crosses exactly one half-epoch boundary, then (1) if
the bit is set, we clear it and (2) if the bit is not set, we freeze
the page and set the bit.  The advantage of this is that we avoid an
epidemic of freezing right after a half-epoch change.  Immediately
after a half-epoch change, many pages will mix tuples from the current
and previous half-epoch - but relatively few pages will have tuples
from the current half-epoch and a half-epoch more than one in the
past.
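
The rule above can be sketched as a small decision function. This is a rough illustration, not a patch: the function names are hypothetical, and treating a boundary b as crossed when old_lsn < b <= new_lsn is an assumed convention that the proposal does not spell out.

```python
from bisect import bisect_right


def boundaries_crossed(boundaries, old_lsn, new_lsn):
    """Count half-epoch start LSNs b with old_lsn < b <= new_lsn.

    boundaries must be sorted ascending; it may hold just the last two
    boundaries or every boundary ever recorded.
    """
    return bisect_right(boundaries, new_lsn) - bisect_right(boundaries, old_lsn)


def on_page_lsn_advance(recently_frozen, crossed):
    """Apply the PD_RECENTLY_FROZEN rule when a page's LSN moves forward.

    Returns (freeze_page, new_bit_value).
    """
    if crossed == 0:
        return (False, recently_frozen)  # no boundary crossed, nothing to do
    if crossed == 1 and recently_frozen:
        return (False, False)            # case (1): just clear the bit
    # Case (2), or more than one boundary crossed: freeze and set the bit.
    return (True, True)
```

With boundaries at LSNs 1000 and 5000, a page moving from LSN 500 to 1200 crosses one boundary, so it is frozen only if its bit was clear; a page moving from 500 to 6000 crosses two and is always frozen.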

As things stand today, we really only need to remember the last two
half-epoch boundaries; they could be stored, for example, in the
control file.  But if we someday generalize CLOG to allow indefinite
retention as you suggest, we could instead remember all half-epoch
boundaries that have ever occurred; just maintain a file someplace
with 8 bytes of data for every 2 billion XIDs consumed over the
lifetime of the cluster.  In fact, we might want to do it that way
anyhow, just to keep our options open, and perhaps for forensics.
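
As a sketch of what such a file might hold, assuming each entry is one unsigned 64-bit LSN stored little-endian (the byte order, the in-memory buffer standing in for the file, and the function names are all assumptions made for illustration):

```python
import struct

# One boundary = one unsigned 64-bit LSN = 8 bytes (little-endian assumed).
ENTRY = struct.Struct("<Q")


def append_boundary(buf: bytearray, start_lsn: int) -> None:
    """Append the starting LSN of a newly begun half-epoch."""
    buf.extend(ENTRY.pack(start_lsn))


def read_boundaries(buf) -> list:
    """Return every recorded half-epoch starting LSN, oldest first."""
    return [lsn for (lsn,) in ENTRY.iter_unpack(buf)]
```

At 8 bytes per 2 billion XIDs the file stays tiny, and because entries are appended in LSN order the list is already sorted for lookups.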

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


