RE: Initial Schema Sync for Logical Replication - Mailing list pgsql-hackers

From Kumar, Sachin
Subject RE: Initial Schema Sync for Logical Replication
Date
Msg-id 12867b0b0c7a44208d0e6653b19c8f54@amazon.com
In response to Re: Initial Schema Sync for Logical Replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Initial Schema Sync for Logical Replication  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
> > > > From: Amit Kapila <amit.kapila16@gmail.com>
> > > > > I think we won't be able to use the same snapshot because the
> > > > > transaction will be committed.
> > > > > In CreateSubscription() we can use the transaction snapshot from
> > > > > walrcv_create_slot() until walrcv_disconnect() is called. (I am
> > > > > not sure about this part; maybe walrcv_disconnect() commits
> > > > > internally?)
> > > > > So somehow we need to keep this snapshot alive even after the
> > > > > transaction is committed (or delay committing the transaction,
> > > > > but we can have CREATE SUBSCRIPTION with ENABLED=FALSE, so there
> > > > > could be a restart before tableSync is able to use the same
> > > > > snapshot).
> > > > >
> > > >
> > > > Can we think of getting the table data as well, along with the schema,
> > > > via pg_dump? Won't both the schema and the initial data then
> > > > correspond to the same snapshot?
> > >
> > > Right, that will work. Thanks!
> >
> > While it works, we cannot get the initial data in parallel, no?
> >

I was thinking each tablesync process would call pg_dump --table. This way, if we have N
tablesync processes, we can have N pg_dump --table=table_name invocations running in parallel.
In fact, we can use --schema-only to fetch just the schema and then let COPY take care of the
data sync. We would use the same snapshot for pg_dump as well as for the COPY of the table.
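To make this concrete, here is a rough sketch of what a single tablesync worker could do
(the table, slot, and snapshot names below are made up for illustration, and the exact
slot-creation syntax depends on the server version; note that an exported snapshot is only
importable while the exporting transaction/walsender command is still active, which is the
"keep the snapshot alive" problem discussed above):

    -- 1. On the walsender (replication) connection: create the per-table slot
    --    and export its snapshot.  The command returns a snapshot_name,
    --    e.g. '00000003-000001A1-1'.
    CREATE_REPLICATION_SLOT sub1_sync_t1 LOGICAL pgoutput EXPORT_SNAPSHOT;

    # 2. Dump only that table's schema, pinned to the exported snapshot,
    #    and restore it on the subscriber.
    pg_dump --schema-only --table=public.t1 --snapshot='00000003-000001A1-1' \
            -d srcdb | psql -d subdb

    -- 3. Copy the data on a regular publisher connection, importing the same snapshot.
    BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
    SET TRANSACTION SNAPSHOT '00000003-000001A1-1';
    COPY public.t1 TO STDOUT;
    COMMIT;

Since both pg_dump and the COPY import the snapshot exported at slot creation, the schema
and the initial data would be consistent with the slot's starting point.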

Regards
Sachin
