Re: logical changeset generation v6.2 - Mailing list pgsql-hackers

From Robert Haas
Subject Re: logical changeset generation v6.2
Date
Msg-id CA+TgmoabJ17SngfGWL_OA+5GVGyb3JHniy8smoDhn=eSbPKB5w@mail.gmail.com
In response to Re: logical changeset generation v6.2  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Tue, Oct 29, 2013 at 11:43 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> I think modifying GetNewRelFileNode() is attacking the problem from
>> the wrong end.  The point is that when a table is dropped, that fact
>> can be communicated to the same machinery that's been tracking
>> the CTID->CTID mappings.  Instead of saying "hey, the tuples that were
>> in relfilenode 12345 are now in relfilenode 67890 in these new
>> positions", it can say "hey, the tuples that were in relfilenode 12345
>> are now GONE".
>
> Unfortunately I don't understand what you're suggesting. What I am
> worried about is something like:
>
> <- decoding is here
> VACUUM FULL pg_class; -- rewrites filenode 1 to 2
> VACUUM FULL pg_class; -- rewrites filenode 2 to 3
> VACUUM FULL pg_class; -- rewrites filenode 3 to 1
> <- now decode up to here
>
> In this case there are two possible (cmin,cmax) values for a specific
> tuple: one from the original filenode 1 and one from the new filenode 1
> generated by rewriting filenode 3.
> Now that will only happen if there's an oid wraparound which hopefully
> shouldn't happen very often, but I'd like to not rely on that.

Ah, OK.  I didn't properly understand the scenario you were concerned
about.  There's only a potential problem here if we get behind by more
than 4 billion relfilenodes, which seems remote, but maybe not:

http://www.pgcon.org/2013/schedule/events/595.en.html

This still seems to me to be basically an accounting problem.  At any
given time, we should *know* where the catalog tuples are located.  We
can't be decoding changes that require a given system catalog while
that system catalog is locked, so any given decoding operation happens
either before or after, not during, the rewrite of the corresponding
catalog.  As long as that VACUUM FULL operation is responsible for
updating the logical decoding metadata, we should be fine.  Any
relcache entries referencing the old relfilenode need to be
invalidated, and any CTID->[cmin,cmax] maps we're storing for those
old relfilenodes need to be invalidated, too.
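
As a rough sketch of the accounting described above (written in Python
rather than the backend's C, and with every name hypothetical rather than
an existing PostgreSQL structure or API), the rewrite itself could be the
thing that discards the stale maps. It assumes the maps are keyed by
relfilenode and that decoding never runs concurrently with the rewrite:

```python
from typing import Dict, Optional, Tuple

Ctid = Tuple[int, int]      # (block number, offset) of a catalog tuple
CidPair = Tuple[int, int]   # (cmin, cmax) recorded for that tuple


class CatalogTupleMaps:
    """Hypothetical bookkeeping for CTID -> (cmin, cmax), per relfilenode."""

    def __init__(self) -> None:
        self._maps: Dict[int, Dict[Ctid, CidPair]] = {}

    def record(self, relfilenode: int, ctid: Ctid, cmin: int, cmax: int) -> None:
        self._maps.setdefault(relfilenode, {})[ctid] = (cmin, cmax)

    def lookup(self, relfilenode: int, ctid: Ctid) -> Optional[CidPair]:
        return self._maps.get(relfilenode, {}).get(ctid)

    def on_rewrite(self, old_relfilenode: int, new_relfilenode: int) -> None:
        """Assumed to be called by the rewrite (e.g. VACUUM FULL) itself."""
        # Entries for the old file describe tuples that no longer exist there.
        self._maps.pop(old_relfilenode, None)
        # A reused relfilenode number must start with no inherited entries.
        self._maps.pop(new_relfilenode, None)


maps = CatalogTupleMaps()
maps.record(1, (0, 1), cmin=5, cmax=6)   # tuple in the original filenode 1
for old, new in [(1, 2), (2, 3), (3, 1)]:
    maps.on_rewrite(old, new)            # three rewrites, ending back at 1
print(maps.lookup(1, (0, 1)))            # stale entry is gone
```

With the invalidation tied to the rewrite, the 1 -> 2 -> 3 -> 1 sequence
leaves nothing behind for the reused filenode 1, so there is only ever one
candidate (cmin,cmax) for a given tuple.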

>> >> Completely aside from this issue, what
>> >> keeps a relation from being dropped before we've decoded all of the
>> >> changes made to its data before the point at which it was dropped?  (I
>> >> hope the answer isn't "nothing".)
>> >
>> > Nothing. But there's no need to prevent it, it'll still be in the
>> > catalog and we don't ever access a non-catalog relation's data during
>> > decoding.
>>
>> Oh, right.  But what about a drop of a user-catalog table?
>
> Currently nothing prevents that. I am not sure it's worth worrying about
> it, do you think we should?

Maybe.  Depends partly on how ugly things get if it happens, I suppose.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


