Logical decoding & exported base snapshot - Mailing list pgsql-hackers
From:      Andres Freund
Subject:   Logical decoding & exported base snapshot
Date:
Msg-id:    20121211235216.GA4337@awork2.anarazel.de
Responses: Re: Logical decoding & exported base snapshot
List:      pgsql-hackers
Hi,

When initiating a new logical replication "slot" I want to provide a 'SET TRANSACTION SNAPSHOT'-able snapshot which can be used to set up a new replica. I have some questions around this where I could use some input.

First, some basics around how this currently works. Test the other side:

    psql "port=5440 host=/tmp dbname=postgres replication=1"

    postgres=# IDENTIFY_SYSTEM;
          systemid       | timeline |  xlogpos  |  dbname
    ---------------------+----------+-----------+----------
     5820768138794874841 |        1 | 0/190AF98 | postgres
    (1 row)

Now, initiate a replication slot:

    postgres=# INIT_LOGICAL_REPLICATION 'test';
    WARNING:  Initiating logical rep
    WARNING:  reached consistent point, stopping!
     replication_id | consistent_point | snapshot_name | plugin
    ----------------+------------------+---------------+--------
     id-0           | 0/190AFD0        | 0xDEADBEEF    | test
    (1 row)

If you actually wanted to receive changes you would now (and later) use

    START_LOGICAL_REPLICATION 'id-0' 0/190AFD0;

to stream the changes from that point onwards.

INIT_LOGICAL_REPLICATION starts to decode WAL from the last checkpoint on and reads until it finds a point where it has enough information (including a suitable xmin horizon) to decode the contents. Then it tells you the name of the newly created slot, the WAL location and the name of a snapshot it exported (obviously fake here). I have the code to export a real snapshot, but exporting the snapshot is actually the easy part.

The idea with the exported snapshot obviously is that you can start a pg_dump using it to replicate the base data for a new replica.

Problem 1:

One problem I see is that while exporting a snapshot solves the visibility issues of the table's contents, it does not protect against schema changes. I am not sure whether that's a problem. If somebody runs a CLUSTER or something like that, the table's contents will be preserved, including MVCC semantics. That's fine. The more problematic cases I see are TRUNCATE, DROP and ALTER TABLE.
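(For reference, consuming the exported snapshot from a separate, regular connection would look roughly like the following; this is a sketch based on PostgreSQL's existing SET TRANSACTION SNAPSHOT support, with the placeholder snapshot name from the example above:

    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SET TRANSACTION SNAPSHOT '0xDEADBEEF';
    -- all reads in this transaction now see the state as of the
    -- slot's consistent point
    SELECT * FROM foo;
    COMMIT;

SET TRANSACTION SNAPSHOT requires REPEATABLE READ or stricter, and must be the first thing done in the importing transaction.)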
Imagine the following:

    S1: INIT_LOGICAL_REPLICATION
    S1: get snapshot: 00000333-1
    S2: ALTER TABLE foo ALTER COLUMN blub text USING (blub::text);
    S3: pg_dump --snapshot 00000333-1
    S1: START_LOGICAL_REPLICATION

In that case the pg_dump would dump foo using the schema *after* the ALTER TABLE, but showing only rows visible to our snapshot. After START_LOGICAL_REPLICATION all changes after the xlog position from INIT_LOGICAL_REPLICATION will be returned though - including all the tuples from the ALTER TABLE and potentially - if some form of schema capturing was in place - the ALTER TABLE itself. The copied schema would already have the new format though.

Does anybody see that as a problem, or is it just a case of PEBKAC? One argument for the latter is that this is already a problematic case for normal pg_dumps. It's just that the window is a bit larger here.

Problem 2: Control flow.

To be able to do a "SET TRANSACTION SNAPSHOT" the source transaction needs to be alive. That's currently solved by exporting the snapshot in the walsender connection that did the INIT_LOGICAL_REPLICATION. The question is how long we should preserve that snapshot.

There is no requirement - and there *cannot* be one in the general case, the slot needs to be usable after a restart - that START_LOGICAL_REPLICATION has to be run in the same replication connection as INIT.

Possible solutions:

1) INIT_LOGICAL_REPLICATION waits for an answer from the client that confirms that logical replication initialization is finished. Before that the walsender connection cannot be used for anything else.

2) We remove the snapshot as soon as any other command is received; this way the replication connection stays usable, e.g. to issue a START_LOGICAL_REPLICATION in parallel to the initial data dump. In that case the snapshot would have to be imported *before* the next command is received, since SET TRANSACTION SNAPSHOT requires the source transaction to still be open.

Opinions?
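(To illustrate option 2, the client-side ordering would have to be roughly the following; C1 is the walsender connection, C2 a regular connection doing the dump, with the placeholder names from the example above:

    C1: INIT_LOGICAL_REPLICATION 'test';
        -- exports snapshot, source transaction stays open
    C2: BEGIN ISOLATION LEVEL REPEATABLE READ;
    C2: SET TRANSACTION SNAPSHOT '0xDEADBEEF';
        -- must happen before C1 sends its next command
    C1: START_LOGICAL_REPLICATION 'id-0' 0/190AFD0;
        -- source transaction/snapshot released on the walsender side
    C2: ... dump the base data using the imported snapshot ...
    C2: COMMIT;

If C1 issued its next command before C2 imported the snapshot, the import would fail because the exporting transaction is gone.)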
Andres

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services