Logical decoding & exported base snapshot - Mailing list pgsql-hackers

From Andres Freund
Subject Logical decoding & exported base snapshot
Date
Msg-id 20121211235216.GA4337@awork2.anarazel.de
Whole thread Raw
Responses Re: Logical decoding & exported base snapshot
Re: Logical decoding & exported base snapshot
List pgsql-hackers
Hi,

When initiating a new logical replication "slot" I want to provide a
'SET TRANSACTION SNAPSHOT'able snapshot which can be used to setup a new
replica. I have some questions arround this where I could use some input
on.

First, some basics around how this currently works:

Test the other side:

psql "port=5440 host=/tmp dbname=postgres replication=1"
postgres=# IDENTIFY_SYSTEM;     systemid       | timeline |  xlogpos  |  dbname
---------------------+----------+-----------+----------5820768138794874841 |        1 | 0/190AF98 | postgres
(1 row)


Now, initiate a replication slot:

postgres=# INIT_LOGICAL_REPLICATION 'test';
WARNING:  Initiating logical rep
WARNING:  reached consistent point, stopping!replication_id | consistent_point | snapshot_name | plugin
----------------+------------------+---------------+--------id-0           | 0/190AFD0        | 0xDEADBEEF    | test
(1 row)

If you would actually want to receive changes you would now (and later)
use START_LOGICAL_REPLICATION 'id-0' 0/190AFD0; to stream the changes
from that point onwards.

INIT_LOGICAL_REPLICATION starts to decode WAL from the last checkpoint
on and reads until it finds a point where it has enough information
(including a suitable xmin horizon) to decode the contents.. Then it
tells you the name of the newly created slot, the wal location and the
name of a snapshot it exported (obviously fake here).


I have the code to export a real snapshot, but exporting the snapshot is
actually the easy part. The idea with the exported snapshot obviously is
that you can start a pg_dump using it to replicate the base date for a
new replica.

Problem 1:

One problem I see is that while exporting a snapshot solves the
visibility issues of the table's contents it does not protect against
schema changes. I am not sure whether thats a problem.

If somebody runs a CLUSTER or something like that, the table's contents
will be preserved including MVCC semantics. That's fine.
The more problematic cases I see are TRUNCATE, DROP and ALTER
TABLE. Imagine the following:

S1: INIT_LOGICAL_REPLICATION
S1: get snapshot: 00000333-1
S2: ALTER TABLE foo ALTER COLUMN blub text USING (blub::text);
S3: pg_dump --snapshot 00000333-1
S1: START_LOGICAL_REPLICATION

In that case the pg_dump would dump foo using the schema *after* the
ALTER TABLE but showing only rows visible to our snapshot. After
START_LOGICAL_REPLICATION all changes after the xlog position from
INIT_LOGICAL_REPLICATION will be returned though - including all the
tuples from the ALTER TABLE and potentially - if some form of schema
capturing was in place - the ALTER TABLE itself. The copied schema would
have the new format already though.

Does anybody see that as aproblem or is it just a case of PEBKAC? One
argument for the latter is that thats already a problematic case for
normal pg_dump's. Its just that the window is a bit larger here.

Problem 2:

Control Flow.

To be able to do a "SET TRANSACTION SNAPSHOT" the source transaction
needs to be alive. That's currently solved by exporting the snapshot in
the walsender connection that did the INIT_LOGICAL_REPLICATION. The
question is how long should we preserve that snapshot?

There is no requirement - and there *cannot* be one in the general case,
the slot needs to be usable after a restart - that
START_LOGICAL_REPLICATION has to be run in the same replication
connection as INIT.

Possible solutions:
1) INIT_LOGICAL_REPLICATION waits for an answer from the client that
confirms that logical replication initialization is finished. Before
that the walsender connection cannot be used for anything else.

2) we remove the snapshot as soon as any other commend is received, this
way the replication connection stays usable, e.g. to issue a
START_LOGICAL_REPLICATION in parallel to the initial data dump. In that
case the snapshot would have to be imported *before* the next command
was received as SET TRANSACTION SNAPSHOT requires the source transaction
to be still open.

Opinions?

Andres
--Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Dimitri Fontaine
Date:
Subject: Re: Re: [PATCH 02/14] Add support for a generic wal reading facility dubbed XLogReader
Next
From: Greg Smith
Date:
Subject: Re: [WIP PATCH] for Performance Improvement in Buffer Management