Re: First draft of snapshot snapshot building design document - Mailing list pgsql-hackers

From Robert Haas
Subject Re: First draft of snapshot snapshot building design document
Date
Msg-id CA+TgmoZXkCo5FAbU=3JHuXXF0Op2SLhGJcVuFM3tkmcBnmhBMQ@mail.gmail.com
In response to First draft of snapshot snapshot building design document  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: First draft of snapshot snapshot building design document  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Tue, Oct 16, 2012 at 7:30 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On Thursday, October 11, 2012 01:02:26 AM Peter Geoghegan wrote:
>> The design document [2] really just explains the problem (which is the
>> need for catalog metadata at a point in time to make sense of heap
>> tuples), without describing the solution that this patch offers with
>> any degree of detail. Rather, [2] says "How we build snapshots is
>> somewhat intricate and complicated and seems to be out of scope for
>> this document", which is unsatisfactory. I look forward to reading the
>> promised document that describes this mechanism in more detail.
>
> Here's the first version of the promised document. I hope it answers most of
> the questions.
>
> Input welcome!

I haven't grokked all of this in its entirety, but I'm kind of
uncomfortable with the relfilenode -> OID mapping stuff.  I'm
wondering if we should, when logical replication is enabled, find a
way to cram the table OID into the XLOG record.  It seems like that
would simplify things.

If we don't choose to do that, it's worth noting that you actually
need 16 bytes of data to generate a unique identifier for a relation,
as in database OID + tablespace OID + relfilenode# + backend ID.
Backend ID might be ignorable because WAL-based logical replication is
going to ignore temporary relations anyway, but you definitely need
the other two.  There's nothing, for example, to keep you from having
two relations with the same value in pg_class.relfilenode in the same
database but in different tablespaces.  It's unlikely to happen,
because for new relations we set OID = relfilenode, but a subsequent
rewrite can bring it about if the stars align just right.  (Such
situations are, of course, a breeding ground for bugs, which might
make you question whether our current scheme for assigning
relfilenodes has much of anything to recommend it.)
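
To make the 16-byte point concrete, here is a rough sketch of such an identifier. It is not actual PostgreSQL source: the names RelPhysKey and rel_phys_key_equal, the local Oid typedef, and the use of -1 as the "not temporary" backend ID are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an OID: an unsigned 32-bit value. */
typedef uint32_t Oid;

/* Hypothetical 16-byte key holding all four components named above. */
typedef struct RelPhysKey
{
    Oid     db_oid;         /* database OID */
    Oid     tablespace_oid; /* tablespace OID */
    Oid     relfilenode;    /* value of pg_class.relfilenode */
    int32_t backend_id;     /* assumed: -1 means "not a temporary relation" */
} RelPhysKey;

/* Four 4-byte fields with no padding give the 16 bytes mentioned above. */
static_assert(sizeof(RelPhysKey) == 16, "expected four 4-byte fields");

/* Compare every component; matching on relfilenode alone is not enough. */
static bool
rel_phys_key_equal(const RelPhysKey *a, const RelPhysKey *b)
{
    return a->db_oid == b->db_oid &&
           a->tablespace_oid == b->tablespace_oid &&
           a->relfilenode == b->relfilenode &&
           a->backend_id == b->backend_id;
}
```

With a key like this, two relations that share a relfilenode and database but live in different tablespaces compare as different, which is the case a relfilenode-only lookup would get wrong.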

Another thing to think about is that, like catalog snapshots,
relfilenode mappings have to be time-relativized; that is, you need to
know what the mapping was at the proper point in the WAL sequence, not
what it is now.  In practice, the risk here seems to be minimal,
because it takes a while to churn through 4 billion OIDs.  However, I
suspect it pays to think about this fairly carefully because if we do
ever run into a situation where the OID counter wraps during a time
period comparable to the replication lag, the bugs will be extremely
difficult to debug.
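
As a sketch of what "time-relativized" could mean in code: the lookup takes a WAL position as well as the relfilenode and answers with the mapping that was in effect at that position. Everything here is hypothetical (the MappingChange history, the WalPos type, the function name), and for brevity it keys on relfilenode alone rather than on the full identifier discussed earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;      /* hypothetical stand-in for an OID */
typedef uint64_t WalPos;   /* hypothetical WAL position */

#define INVALID_OID ((Oid) 0)

/* One recorded change to the relfilenode -> table OID mapping. */
typedef struct MappingChange
{
    WalPos  valid_from;    /* WAL position at which this entry took effect */
    Oid     relfilenode;
    Oid     table_oid;     /* INVALID_OID if the relfilenode was released */
} MappingChange;

/*
 * Return the table OID that 'relfilenode' mapped to at WAL position 'pos',
 * or INVALID_OID if there was no mapping at that point.
 * 'history' must be sorted by ascending valid_from.
 */
static Oid
lookup_table_oid_at(const MappingChange *history, size_t n,
                    Oid relfilenode, WalPos pos)
{
    Oid result = INVALID_OID;

    for (size_t i = 0; i < n; i++)
    {
        /* Check the sortedness precondition as we go. */
        assert(i == 0 || history[i - 1].valid_from <= history[i].valid_from);

        if (history[i].valid_from > pos)
            break;              /* later entries are not yet visible */
        if (history[i].relfilenode == relfilenode)
            result = history[i].table_oid;  /* latest visible entry wins */
    }
    return result;
}
```

The point of the sketch is only that the same relfilenode can yield different answers at different WAL positions, so a lookup against the current state is not a substitute.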

Anyhow, adding the table OID to the WAL header would chew up a few
more bytes of WAL space, but it seems like it might be worth it to
avoid having to think very hard about all of these issues.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


