Re: Logical decoding & exported base snapshot - Mailing list pgsql-hackers

From Steve Singer
Subject Re: Logical decoding & exported base snapshot
Date
Msg-id BLU0-SMTP8417A16727A6855996B47DC4F0@phx.gbl
In response to Logical decoding & exported base snapshot  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: Logical decoding & exported base snapshot
List pgsql-hackers
On 12-12-11 06:52 PM, Andres Freund wrote:
> Hi,

>
> Problem 1:
>
> One problem I see is that while exporting a snapshot solves the
> visibility issues of the table's contents it does not protect against
> schema changes. I am not sure whether that's a problem.
>
> If somebody runs a CLUSTER or something like that, the table's contents
> will be preserved including MVCC semantics. That's fine.
> The more problematic cases I see are TRUNCATE, DROP and ALTER
> TABLE. Imagine the following:
>
> S1: INIT_LOGICAL_REPLICATION
> S1: get snapshot: 00000333-1
> S2: ALTER TABLE foo ALTER COLUMN blub TYPE text USING (blub::text);
> S3: pg_dump --snapshot 00000333-1
> S1: START_LOGICAL_REPLICATION
>
> In that case the pg_dump would dump foo using the schema *after* the
> ALTER TABLE but showing only rows visible to our snapshot. After
> START_LOGICAL_REPLICATION all changes after the xlog position from
> INIT_LOGICAL_REPLICATION will be returned though - including all the
> tuples from the ALTER TABLE and potentially - if some form of schema
> capturing was in place - the ALTER TABLE itself. The copied schema would
> have the new format already though.
>
> Does anybody see that as a problem or is it just a case of PEBKAC? One
> argument for the latter is that that's already a problematic case for
> normal pg_dump runs. It's just that the window is a bit larger here.

Is there any way to detect this situation as part of the pg_dump?  If I 
could detect this, abort my pg_dump, and then go and get a new snapshot, 
I don't see this as a big deal.  I can live with telling users, "don't 
do DDL-like things while subscribing a new node; if you do, the 
subscription will restart".  I am less keen on telling users "don't do 
DDL-like things while subscribing a new node or the results will be 
unpredictable".

I'm trying to convince myself that I will be able to take a pg_dump from 
an exported snapshot, plus the changes made between the snapshot and some 
later point in time, and turn the results into a consistent database.  
What if something like this comes along:

INIT_LOGICAL_REPLICATION
insert into foo (id,bar) values (1,2);
alter table foo drop column bar;
pg_dump --snapshot


The schema I get as part of the pg_dump won't have bar because it has 
been dropped, even though the dump will have the rows that existed while 
bar was there.  I then go to process the INSERT statement.  It will have 
a WAL record with column data for bar, and the logical replication replay 
will look up the catalog rows from before the ALTER TABLE, so it will 
generate a logical INSERT record with bar.  That will fail on the replica.
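
To make the failure concrete (my own illustration, assuming the replay 
ends up emitting plain SQL): the decoded change would amount to

  INSERT INTO foo (id, bar) VALUES (1, 2);

while the replica, restored from that pg_dump, no longer has the column, 
so the apply would fail with something like

  ERROR:  column "bar" of relation "foo" does not exist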

I'm worried that it will be difficult to programmatically stitch together 
the inconsistent snapshot from the pg_dump plus the logical records 
generated between the snapshot and the dump (along with any record of 
the DDL, if it exists).




> Problem 2:
>
> Control Flow.
>
> To be able to do a "SET TRANSACTION SNAPSHOT" the source transaction
> needs to be alive. That's currently solved by exporting the snapshot in
> the walsender connection that did the INIT_LOGICAL_REPLICATION. The
> question is how long should we preserve that snapshot?
>
> There is no requirement - and there *cannot* be one in the general case,
> the slot needs to be usable after a restart - that
> START_LOGICAL_REPLICATION has to be run in the same replication
> connection as INIT.
>
> Possible solutions:
> 1) INIT_LOGICAL_REPLICATION waits for an answer from the client that
> confirms that logical replication initialization is finished. Before
> that the walsender connection cannot be used for anything else.
>
> 2) we remove the snapshot as soon as any other command is received; this
> way the replication connection stays usable, e.g. to issue a
> START_LOGICAL_REPLICATION in parallel to the initial data dump. In that
> case the snapshot would have to be imported *before* the next command
> was received as SET TRANSACTION SNAPSHOT requires the source transaction
> to be still open.
>
> Opinions?

Option 2 sounds more flexible.  Is it more difficult to implement?
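
For what it's worth, I picture the consumer side of option 2 looking 
roughly like this (snapshot id taken from your example above; I'm 
assuming the importing session is a plain SQL connection):

  -- second connection, while the snapshot on the walsender is still held
  BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
  SET TRANSACTION SNAPSHOT '00000333-1';
  -- dump / COPY the initial data under this snapshot
  COMMIT;

with START_LOGICAL_REPLICATION issued on the replication connection in 
parallel.  The "import before the next command arrives" ordering is the 
part that looks easiest to get wrong on the client side.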
> Andres
> --
>   Andres Freund                       http://www.2ndQuadrant.com/
>   PostgreSQL Development, 24x7 Support, Training & Services
>
>



