Re: Finalizing logical replication limitations as well as potential features - Mailing list pgsql-hackers

From Joshua D. Drake
Subject Re: Finalizing logical replication limitations as well as potential features
Date
Msg-id 104a4c3a-6e4f-e76e-ab83-9d0399d5dfa6@commandprompt.com
In response to Re: Finalizing logical replication limitations as well as potential features  (Craig Ringer <craig@2ndquadrant.com>)
Responses Re: Finalizing logical replication limitations as well as potential features
List pgsql-hackers
On 12/21/2017 06:15 PM, Craig Ringer wrote:
> On 22 December 2017 at 05:24, Joshua D. Drake <jd@commandprompt.com> wrote:
>
>     -Hackers,
>
>
>     Lastly, I noted that a full sync of a replication set is performed
>     by a COPY, this is fine for small sets but if we have a large data
>     set that may take some time it may be a problem with overall
>     performance and maintenance. We may want to see if we can do an
>     initial sync incrementally (optional) via a cursor (?) and queue
>     all changed rows until the sync completes?
>
>
> I'm not sure I understand this.
>
> The COPY is streamed from source to destination, IIRC it's not 
> buffering to a tempfile or anything. So I fail to see what using a 
> cursor would gain you. No matter whether you're using a cursor, a 
> COPY, or something else, you have to hold down a specific xmin and 
> work with the same snapshot for the whole sync operation. If you 
> instead did something like incremental SELECTs, each with a new 
> xmin+snapshot, across ranges of a PK your copy would see changes from 
> different points in time depending on where in the copy it was up to, 
> and you'd get an inconsistent view. It could possibly be worked around 
> with some tricky key-range-based filtering of the applied 
> change-stream if you were willing to require that no PK updates may 
> occur, but it'd probably be bug city. It's hard enough to get sync 
> correct at all.

I am not sure that this is entirely true. Granted, it is easiest just to 
do everything within a single snapshot, but we shouldn't have to. It would 
be possible to perform incremental (even parallel) syncs, whether via COPY 
or some other mechanism. We would have to track changes to the table as we 
sync, but that isn't impossible either (especially if we have a PK). I 
would think that this would only be valid with async replication, but it 
is possible. We just queue/audit the changes as they happen and apply the 
queued changes after the initial sync completes. Multi-phase sync baby :D
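
To make the multi-phase idea concrete, here is a rough in-memory sketch in Python. It is only an illustration under stated assumptions, not how PostgreSQL's table sync is implemented: the Source class, read_chunk, multi_phase_sync and the between_chunks hook are all hypothetical names, the "table" is a dict keyed by PK, and every change is assumed to be captured from before the first chunk is read. Each chunk is read at a different point in time, so the copy alone is inconsistent; replaying the queued changes in order as idempotent upserts/deletes keyed by PK is what brings the destination back in line.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical stand-in for the source table plus a change queue."""
    rows: dict = field(default_factory=dict)    # pk -> value
    queue: list = field(default_factory=list)   # (op, pk, value) in commit order
    capturing: bool = False

    def upsert(self, pk, value):
        self.rows[pk] = value
        if self.capturing:
            self.queue.append(("upsert", pk, value))

    def delete(self, pk):
        self.rows.pop(pk, None)
        if self.capturing:
            self.queue.append(("delete", pk, None))

    def read_chunk(self, after_pk, limit):
        # Each call sees the table as it is *now*, i.e. a fresh "snapshot".
        keys = sorted(k for k in self.rows if after_pk is None or k > after_pk)
        return [(k, self.rows[k]) for k in keys[:limit]]


def multi_phase_sync(source, chunk_size, between_chunks=None):
    dest = {}
    # Phase 0: start queueing changes before the first row is read.
    source.capturing = True
    # Phase 1: copy the table incrementally in PK order.
    last_pk = None
    while True:
        chunk = source.read_chunk(last_pk, chunk_size)
        if not chunk:
            break
        dest.update(chunk)
        last_pk = chunk[-1][0]
        if between_chunks:
            between_chunks(source)   # simulates concurrent writes
    # Phase 2: replay the queued changes in order, keyed by PK.
    for op, pk, value in source.queue:
        if op == "upsert":
            dest[pk] = value
        else:
            dest.pop(pk, None)
    source.queue.clear()
    return dest


def demo():
    src = Source({i: f"v{i}" for i in range(1, 7)})

    def concurrent_writes(s):
        s.upsert(2, "changed")   # row that was already copied
        s.delete(5)              # row that was not copied yet
        s.upsert(9, "new")       # row inserted during the sync

    dest = multi_phase_sync(src, chunk_size=2, between_chunks=concurrent_writes)
    return dest, src.rows
```

Calling demo() returns the synced copy and the source rows, which should match once the queue has been replayed. The sketch sidesteps the hard parts raised upthread: a real implementation would need a well-defined cutover point between replay and normal streaming, and PK updates are only safe here because they would be modeled as a delete followed by an insert.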

Thanks,

JD

-- 
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
*****     Unless otherwise stated, opinions are my own.   *****


