Re: logical changeset generation v6.2 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: logical changeset generation v6.2
Msg-id CA+TgmoY9MY0hh4Od=fBZW3n+5e9dPh8Ey3axdR547TT_ZfnG7Q@mail.gmail.com
In response to Re: logical changeset generation v6.2  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: logical changeset generation v6.2  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Tue, Oct 22, 2013 at 11:02 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-10-22 10:52:48 -0400, Robert Haas wrote:
>> On Fri, Oct 18, 2013 at 2:26 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > So. As it turns out, that solution isn't sufficient in the face of VACUUM
>> > FULL and mixed DML/DDL transactions that have not yet been decoded.
>> >
>> > To reiterate, as published it works like this:
>> > For every modification of a catalog tuple (insert, multi_insert, update,
>> > delete) that affects visibility, issue a record that contains:
>> > * filenode
>> > * ctid
>> > * (cmin, cmax)
>> >
>> > When doing a visibility check on a catalog row while decoding a mixed
>> > DML/DDL transaction, look up (cmin, cmax) for that row, since we don't
>> > store both in the tuple.
>> >
>> > That mostly works great.
>> >
>> > The problematic scenario is decoding a transaction that has done mixed
>> > DML/DDL *after* a VACUUM FULL/CLUSTER has been performed. The VACUUM
>> > FULL obviously changes the filenode and the ctid of a tuple, so we
>> > cannot successfully do a lookup based on what we logged before.
>>
>> So I have a new idea for handling this problem, which seems obvious in
>> retrospect.  What if we make the VACUUM FULL or CLUSTER log the old
>> CTID -> new CTID mappings?  This would only need to be done for
>> catalog tables, and maybe could be skipped for tuples whose XIDs are
>> old enough that we know those transactions must already be decoded.
>
> Ah. If only it were so simple ;). That was my first idea, and after I'd
> bragged in a 2ndQ internal chat that I'd found a simple idea, I
> obviously had to realize it doesn't work.
>
> Consider:
> INIT_LOGICAL_REPLICATION;
> CREATE TABLE foo(...);
> BEGIN;
> INSERT INTO foo;
> ALTER TABLE foo ...;
> INSERT INTO foo;
> COMMIT TX 3;
> VACUUM FULL pg_class;
> START_LOGICAL_REPLICATION;
>
> When we decode tx 3, we haven't yet read the mapping from the VACUUM
> FULL. That scenario can happen either because decoding was stopped for
> a moment or because decoding couldn't keep up (slow connection,
> whatever).
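
A minimal Python sketch of the published scheme and of the failure described above, under stated assumptions: the decoder keeps a per-transaction table keyed by (filenode, ctid) that holds the logged (cmin, cmax). The names TupleLoc, Cids, log_catalog_change and lookup_cids, and all relfilenode and ctid values, are hypothetical and purely illustrative; this is not the patch's code. The lookup succeeds at the logged location and misses once VACUUM FULL has moved the row to a new relfilenode and ctid.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class TupleLoc(NamedTuple):
    """Physical location of a catalog tuple: relfilenode plus ctid (block, offset)."""
    filenode: int
    ctid: Tuple[int, int]


class Cids(NamedTuple):
    """Command ids of one catalog tuple version; None means 'not set by this transaction'."""
    cmin: Optional[int]
    cmax: Optional[int]


# Hypothetical per-transaction table rebuilt from the logged (filenode, ctid, cmin, cmax) records.
logged_cids: Dict[TupleLoc, Cids] = {}


def log_catalog_change(loc: TupleLoc, cmin: Optional[int], cmax: Optional[int]) -> None:
    """Record what a catalog insert/update/delete would log for the decoder (sketch)."""
    logged_cids[loc] = Cids(cmin, cmax)


def lookup_cids(loc: TupleLoc) -> Optional[Cids]:
    """Visibility check during decoding: find (cmin, cmax) by the tuple's location."""
    return logged_cids.get(loc)


# Tx 3's ALTER TABLE updates foo's pg_class row (illustrative values):
# the old version was created by an earlier transaction (CREATE TABLE), so cmin is None,
# and it is deleted at command 1; the new version is created at command 1.
log_catalog_change(TupleLoc(1259, (0, 5)), cmin=None, cmax=1)
log_catalog_change(TupleLoc(1259, (0, 6)), cmin=1, cmax=None)

# Before any rewrite, the decoder finds the row where it was logged.
print(lookup_cids(TupleLoc(1259, (0, 6))))   # Cids(cmin=1, cmax=None)

# VACUUM FULL pg_class rewrites the table into a new (hypothetical) relfilenode 16400,
# moving the row to ctid (0, 3). A lookup keyed on the new location finds nothing.
print(lookup_cids(TupleLoc(16400, (0, 3))))  # None
```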

That strikes me as a flaw in the implementation rather than the idea.
You're presupposing a patch where the necessary information is
available in WAL yet you don't make use of it at the proper time.  It
seems to me that you have to think of the CTID map as tied to a
relfilenode; if you try to use one relfilenode's map with a different
relfilenode, it's obviously not going to work.  So don't do that.
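
One way to read this suggestion, as a hedged Python sketch: each (hypothetical) rewrite record carries the relfilenode it read from, the one it wrote, and an old ctid -> new ctid map, and the decoder carries every logged location forward through only the maps belonging to that relfilenode, in rewrite order, before doing its lookup. RewriteMap, translate, rekey_cids and all numeric values are illustrative assumptions, not the patch's API; how the decoder would obtain mappings logged after the transaction it is decoding (the tx 3 case) is left open here, as in the thread.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

Ctid = Tuple[int, int]                          # (block number, line pointer offset)
Loc = Tuple[int, Ctid]                          # (relfilenode, ctid)
CidPair = Tuple[Optional[int], Optional[int]]   # (cmin, cmax)


class RewriteMap(NamedTuple):
    """Hypothetical record a catalog VACUUM FULL/CLUSTER would log: the relfilenode
    it read from, the relfilenode it wrote, and old ctid -> new ctid."""
    old_node: int
    new_node: int
    moves: Dict[Ctid, Ctid]


def translate(loc: Loc, maps: List[RewriteMap]) -> Optional[Loc]:
    """Carry a logged tuple location forward through all rewrites.
    `maps` must be in the order the rewrites happened; a map is applied
    only to the relfilenode it was generated for."""
    node, ctid = loc
    for m in maps:
        if m.old_node != node:
            continue          # another relfilenode's map: never apply it here
        if ctid not in m.moves:
            return None       # tuple was not copied into the new relfilenode
        node, ctid = m.new_node, m.moves[ctid]
    return node, ctid


def rekey_cids(logged: Dict[Loc, CidPair], maps: List[RewriteMap]) -> Dict[Loc, CidPair]:
    """Re-key the decoder's (cmin, cmax) table by each tuple's current location."""
    current: Dict[Loc, CidPair] = {}
    for loc, cids in logged.items():
        new_loc = translate(loc, maps)
        if new_loc is not None:
            current[new_loc] = cids
    return current


# Illustrative usage: tx 3 logged two pg_class row versions at relfilenode 1259.
logged = {(1259, (0, 5)): (None, 1), (1259, (0, 6)): (1, None)}
maps = [
    RewriteMap(1249, 16401, {(0, 5): (9, 9)}),                  # some other catalog's rewrite
    RewriteMap(1259, 16400, {(0, 5): (0, 2), (0, 6): (0, 3)}),  # VACUUM FULL pg_class
]
print(rekey_cids(logged, maps))
# {(16400, (0, 2)): (None, 1), (16400, (0, 3)): (1, None)}
```

Keying each map by its (old, new) relfilenode pair is what keeps the other catalog's map (relfilenode 1249 here) from being applied to pg_class's tuples, even though both maps mention ctid (0, 5).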

/me looks innocent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


