Re: First draft of snapshot snapshot building design document - Mailing list pgsql-hackers

From Robert Haas
Subject Re: First draft of snapshot snapshot building design document
Date
Msg-id CA+TgmoZ29jsb3yha_+Sshu=miJZNW4BizLTs7cckipkt=a7_7Q@mail.gmail.com
In response to Re: First draft of snapshot snapshot building design document  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: First draft of snapshot snapshot building design document  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Thu, Oct 18, 2012 at 11:20 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On Thursday, October 18, 2012 04:47:12 PM Robert Haas wrote:
>> On Tue, Oct 16, 2012 at 7:30 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > On Thursday, October 11, 2012 01:02:26 AM Peter Geoghegan wrote:
>> >> The design document [2] really just explains the problem (which is the
>> >> need for catalog metadata at a point in time to make sense of heap
>> >> tuples), without describing the solution that this patch offers with
>> >> any degree of detail. Rather, [2] says "How we build snapshots is
>> >> somewhat intricate and complicated and seems to be out of scope for
>> >> this document", which is unsatisfactory. I look forward to reading the
>> >> promised document that describes this mechanism in more detail.
>> >
>> > Here's the first version of the promised document. I hope it answers most
>> > of the questions.
>> >
>> > Input welcome!
>>
>> I haven't grokked all of this in its entirety, but I'm kind of
>> uncomfortable with the relfilenode -> OID mapping stuff.  I'm
>> wondering if we should, when logical replication is enabled, find a
>> way to cram the table OID into the XLOG record.  It seems like that
>> would simplify things.
>>
>> If we don't choose to do that, it's worth noting that you actually
>> need 16 bytes of data to generate a unique identifier for a relation,
>> as in database OID + tablespace OID + relfilenode# + backend ID.
>> Backend ID might be ignorable because WAL-based logical replication is
>> going to ignore temporary relations anyway, but you definitely need
>> the other two.  ...
>
> Hm. I should take a look at the way temporary tables are represented. As you
> say, it is not going to matter for WAL decoding, but still...
>
>> Another thing to think about is that, like catalog snapshots,
>> relfilenode mappings have to be time-relativized; that is, you need to
>> know what the mapping was at the proper point in the WAL sequence, not
>> what it is now.  In practice, the risk here seems to be minimal,
>> because it takes a while to churn through 4 billion OIDs.  However, I
>> suspect it pays to think about this fairly carefully because if we do
>> ever run into a situation where the OID counter wraps during a time
>> period comparable to the replication lag, the bugs will be extremely
>> difficult to debug.
>
> I think with rollbacks + restarts we might even be able to see the same
> relfilenode earlier.
>
>> Anyhow, adding the table OID to the WAL header would chew up a few
>> more bytes of WAL space, but it seems like it might be worth it to
>> avoid having to think very hard about all of these issues.
>
> I don't think it's necessary to change WAL logging here. The relfilenode
> mapping is now looked up using the timetravel snapshot we've built, using
> (spcNode, relNode) as the key, so the time-relativized lookup is "builtin".
> If we screw that up, much more is broken anyway.
>
> Two problems are left:
>
> 1) (reltablespace, relfilenode) is not unique in pg_class because InvalidOid is
> stored for relfilenode if it's a shared or nailed table. That's not a problem
> for the lookup because we've already checked the relmapper before that, so we
> never look those up anyway. But it violates documented requirements of
> syscache.c. Even after some looking I haven't found any problem that this
> could cause.
>
> 2) We need to decide whether a HEAP[1-2]_* record made catalog changes when
> building/updating snapshots. Unfortunately we also need to do this *before*
> we have built the first snapshot. For now, treating all tables as catalog
> modifying before we have built the snapshot seems to work fine.
> I think encoding the OID in the xlog header wouldn't help all that much here,
> because I am pretty sure we want the set of "catalog tables" to be
> extensible at some point...

I don't like catalog-only snapshots at all.  I think that's just a
recipe for subtle or not-so-subtle breakage down the road...
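
For reference, the 16-byte identifier mentioned in the quoted text above
(database OID + tablespace OID + relfilenode# + backend ID) can be sketched
roughly as below. This is only an illustration, not code from the patch:
SketchRelKey and sketch_rel_key_equal are hypothetical names, Oid is a local
stand-in typedef, and using -1 to mean "no backend" for non-temporary
relations is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for an OID: an unsigned 32-bit integer. */
typedef uint32_t Oid;

/*
 * Hypothetical key type, for illustration only.
 * Four 4-byte fields give the 16 bytes discussed above.
 */
typedef struct SketchRelKey
{
    Oid     db_oid;         /* database OID */
    Oid     spc_oid;        /* tablespace OID */
    Oid     relfilenode;    /* relfilenode number */
    int32_t backend_id;     /* owning backend for temporary relations;
                             * -1 (assumed here) for everything else */
} SketchRelKey;

/* Fail the build if the layout is not the expected 16 bytes. */
static_assert(sizeof(SketchRelKey) == 16, "SketchRelKey must be 16 bytes");

/*
 * Two keys name the same relation only if all four fields match.
 * Fields are compared one by one rather than with memcmp so the result
 * does not depend on padding, should the layout ever change.
 */
static bool
sketch_rel_key_equal(const SketchRelKey *a, const SketchRelKey *b)
{
    return a->db_oid == b->db_oid &&
           a->spc_oid == b->spc_oid &&
           a->relfilenode == b->relfilenode &&
           a->backend_id == b->backend_id;
}
```

If temporary relations are ignored, as they would be for WAL-based logical
replication, backend_id is the same for every key of interest and the
remaining three fields carry all the information.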

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


