Re: [PATCH] Add `truncate` option to subscription commands - Mailing list pgsql-hackers

From David Christensen
Subject Re: [PATCH] Add `truncate` option to subscription commands
Date
Msg-id CANKeMr_JmUzkEw9WHQYXXdKFU+xrkAQ9jd5U=TrcrHMC_2BrWg@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Add `truncate` option to subscription commands  (Rahila Syed <rahilasyed90@gmail.com>)
Responses Re: [PATCH] Add `truncate` option to subscription commands  (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
List pgsql-hackers
Hi,

At this time I do not have time to make the necessary changes for this
commitfest so I am voluntarily withdrawing this patch, but will
revisit at a future time.

Best,

David

On Wed, Oct 28, 2020 at 1:06 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
>
> Hi David,
>
> The feature seems useful to me.  The code will need to be refactored due to changes in commit : b05fe7b442
>
> Please see the following comments.
> 1. Is there a specific reason behind having new relstate for truncate?
> The current state flow is INIT->DATATSYNC->SYNCWAIT->CATCHUP->SYNCDONE->READY.
> Can we accommodate the truncate in either INIT or DATASYNC?
>
> 2.   +                               StartTransactionCommand();
>       +                               rel = table_open(MyLogicalRepWorker->relid, RowExclusiveLock);
>       +
>       +                               rels = lappend(rels, rel);
>       +                               relids = lappend_oid(relids, MyLogicalRepWorker->relid);
>       +
>       +                               ExecuteTruncateGuts(rels, relids, NIL, DROP_RESTRICT, false);
>       +                               CommitTransactionCommand();
>
> Truncate is being performed in a separate transaction as data copy, I think that leaves a window
> open for concurrent transactions to modify the data after truncate and before copy.
>
> 3. Regarding the truncate of the referenced table, I think one approach can be to perform the following:
> i. lock the referencing and referenced tables against writes
> ii. drop the foriegn key constraints,
> iii.truncate
> iv. sync
> v. recreate the constraints
> vi. release lock.
> However, I am not sure of the implications of locking these tables on the main apply process.
>
>
> Thank you,
>
>
> On Mon, Oct 12, 2020 at 11:31 PM David Christensen <david@endpoint.com> wrote:
>>
>> > On Oct 11, 2020, at 10:00 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >
>> > On Mon, Oct 12, 2020 at 3:44 AM David Christensen <david@endpoint.com> wrote:
>> >>
>> >>
>> >> On Oct 11, 2020, at 1:14 PM, Euler Taveira <euler.taveira@2ndquadrant.com> wrote:
>> >>
>> >> 
>> >> On Fri, 9 Oct 2020 at 15:54, David Christensen <david@endpoint.com> wrote:
>> >>>
>> >>>
>> >>> Enclosed find a patch to add a “truncate” option to subscription commands.
>> >>>
>> >>> When adding new tables to a subscription (either via `CREATE SUBSCRIPTION` or `REFRESH PUBLICATION`), tables on
thetarget which are being newly subscribed will be truncated before the data copy step.  This saves explicit
coordinationof a manual `TRUNCATE` on the target tables and allows the results of the initial data sync to be the same
ason the publisher at the time of sync. 
>> >>>
>> >>> To preserve compatibility with existing behavior, the default value for this parameter is `false`.
>> >>>
>> >>
>> >> Truncate will fail for tables whose foreign keys refer to it. If such a feature cannot handle foreign keys, the
usefulnesswill be restricted. 
>> >>
>> >>
>> >> This is true for existing “truncate” with FKs, so doesn’t seem to be any different to me.
>> >>
>> >
>> > What would happen if there are multiple tables and truncate on only
>> > one of them failed due to FK check? Does it give an error in such a
>> > case, if so will the other tables be truncated?
>>
>> Currently each SyncRep relation is sync’d separately in its own worker process; we are doing the truncate at the
initializationstep of this, so it’s inherently in its own transaction. I think if we are going to do any sort of
validationon this, it would have to be at the point of the CREATE SUBSCRIPTION/REFRESH PUBLICATION where we have the
relationlist and can do sanity-checking there. 
>>
>> Obviously if someone changes the schema at some point between when it does this and when relation syncs start there
isa race condition, but the same issue would affect other data sync things, so I don’t care to solve that as part of
thispatch. 
>>
>> >> Hypothetically if you checked all new tables and could verify if there were FK cycles only already in the new
tablesbeing added then “truncate cascade” would be fine. Arguably if they had existing tables that were part of an FK
thatwasn’t fully replicated they were already operating brokenly. 
>> >>
>> >
>> > I think if both PK_table and FK_table are part of the same
>> > subscription then there should be a problem as both them first get
>> > truncated? If they are part of a different subscription (or if they
>> > are not subscribed due to whatever reason) then probably we need to
>> > deal such cases carefully.
>>
>> You mean “should not be a problem” here?  If so, I agree with that.  Obviously if we determine this features is only
usefulwith this support we’d have to chase the entire dependency graph and make sure that is all contained in the set
ofnewly-subscribed tables (or at least FK referents). 
>>
>> I have not considered tables that are part of more than one subscription (is that possible?); we presumably should
errorout if any table exists already in a separate subscription, as we’d want to avoid truncating tables already part
ofan existing subscription. 
>>
>> While I’m happy to take a stab at fixing some of the FK/PK issues, it seems easy to go down a rabbit hole.  I’m not
convincedthat we couldn’t just detect FK issues and choose to not handle this case without decreasing the utility for
atleast some cases.  (Perhaps we could give a hint as to the issues detected to point someone in the right direction.)
Anyway,glad to keep discussing potential implications, etc. 
>>
>> Best,
>>
>> David
>> --
>> David Christensen
>> Senior Software and Database Engineer
>> End Point Corporation
>> david@endpoint.com
>> 785-727-1171
>>
>>
>>
>>



pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: walsender bug: stuck during shutdown
Next
From: "Bossart, Nathan"
Date:
Subject: Re: A few new options for CHECKPOINT