Thread: [PATCH] Add `truncate` option to subscription commands

[PATCH] Add `truncate` option to subscription commands

From
David Christensen
Date:
-hackers,

Enclosed find a patch to add a “truncate” option to subscription commands.

When adding new tables to a subscription (either via `CREATE SUBSCRIPTION` or `REFRESH PUBLICATION`), tables on the
targetwhich are being newly subscribed will be truncated before the data copy step.  This saves explicit coordination
ofa manual `TRUNCATE` on the target tables and allows the results of the initial data sync to be the same as on the
publisherat the time of sync. 

To preserve compatibility with existing behavior, the default value for this parameter is `false`.

Best,

David



--
David Christensen
Senior Software and Database Engineer
End Point Corporation
david@endpoint.com
785-727-1171



Attachment

Re: [PATCH] Add `truncate` option to subscription commands

From
Amit Kapila
Date:
On Sat, Oct 10, 2020 at 12:24 AM David Christensen <david@endpoint.com> wrote:
>
> -hackers,
>
> Enclosed find a patch to add a “truncate” option to subscription commands.
>
> When adding new tables to a subscription (either via `CREATE SUBSCRIPTION` or `REFRESH PUBLICATION`), tables on the
targetwhich are being newly subscribed will be truncated before the data copy step.  This saves explicit coordination
ofa manual `TRUNCATE` on the target tables and allows the results of the initial data sync to be the same as on the
publisherat the time of sync. 
>

So IIUC, this will either truncate all the tables for a particular
subscription or none?  Is it possible that the user wants some of
those tables to be truncated which made me think what exactly made you
propose this feature? Basically, is it from user complaint, or is it
some optimization that you think will be helpful to users?

--
With Regards,
Amit Kapila.



Re: [PATCH] Add `truncate` option to subscription commands

From
David Christensen
Date:

> On Oct 10, 2020, at 12:14 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Oct 10, 2020 at 12:24 AM David Christensen <david@endpoint.com> wrote:
>>
>> -hackers,
>>
>> Enclosed find a patch to add a “truncate” option to subscription commands.
>>
>> When adding new tables to a subscription (either via `CREATE SUBSCRIPTION` or `REFRESH PUBLICATION`), tables on the
targetwhich are being newly subscribed will be truncated before the data copy step.  This saves explicit coordination
ofa manual `TRUNCATE` on the target tables and allows the results of the initial data sync to be the same as on the
publisherat the time of sync. 
>>
>
> So IIUC, this will either truncate all the tables for a particular
> subscription or none?

Correct, when creating or altering the subscription all newly added tables would be left alone (current behavior) or
truncated(new functionality from the patch). 

> Is it possible that the user wants some of
> those tables to be truncated which made me think what exactly made you
> propose this feature? Basically, is it from user complaint, or is it
> some optimization that you think will be helpful to users?

This comes from my own experience with setting up/modifying subscriptions with adding many multiple additional tables,
someof which had data in the subscribing node. I would have found this feature very helpful.  

Thanks,

David


Re: [PATCH] Add `truncate` option to subscription commands

From
Euler Taveira
Date:
On Fri, 9 Oct 2020 at 15:54, David Christensen <david@endpoint.com> wrote:

Enclosed find a patch to add a “truncate” option to subscription commands.

When adding new tables to a subscription (either via `CREATE SUBSCRIPTION` or `REFRESH PUBLICATION`), tables on the target which are being newly subscribed will be truncated before the data copy step.  This saves explicit coordination of a manual `TRUNCATE` on the target tables and allows the results of the initial data sync to be the same as on the publisher at the time of sync.

To preserve compatibility with existing behavior, the default value for this parameter is `false`.


Truncate will fail for tables whose foreign keys refer to it. If such a feature cannot handle foreign keys, the usefulness will be restricted.

Regards,

--
Euler Taveira                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [PATCH] Add `truncate` option to subscription commands

From
David Christensen
Date:

On Oct 11, 2020, at 1:14 PM, Euler Taveira <euler.taveira@2ndquadrant.com> wrote:


On Fri, 9 Oct 2020 at 15:54, David Christensen <david@endpoint.com> wrote:

Enclosed find a patch to add a “truncate” option to subscription commands.

When adding new tables to a subscription (either via `CREATE SUBSCRIPTION` or `REFRESH PUBLICATION`), tables on the target which are being newly subscribed will be truncated before the data copy step.  This saves explicit coordination of a manual `TRUNCATE` on the target tables and allows the results of the initial data sync to be the same as on the publisher at the time of sync.

To preserve compatibility with existing behavior, the default value for this parameter is `false`.


Truncate will fail for tables whose foreign keys refer to it. If such a feature cannot handle foreign keys, the usefulness will be restricted.

This is true for existing “truncate” with FKs, so doesn’t seem to be any different to me.

Hypothetically if you checked all new tables and could verify if there were FK cycles only already in the new tables being added then “truncate cascade” would be fine. Arguably if they had existing tables that were part of an FK that wasn’t fully replicated they were already operating brokenly.

But you would definitely want to avoid “truncate cascade” if the FK target tables were already in the publication, unless we were willing to re-sync the other tables that would be truncated. 

David

Re: [PATCH] Add `truncate` option to subscription commands

From
Amit Kapila
Date:
On Mon, Oct 12, 2020 at 3:44 AM David Christensen <david@endpoint.com> wrote:
>
>
> On Oct 11, 2020, at 1:14 PM, Euler Taveira <euler.taveira@2ndquadrant.com> wrote:
>
> 
> On Fri, 9 Oct 2020 at 15:54, David Christensen <david@endpoint.com> wrote:
>>
>>
>> Enclosed find a patch to add a “truncate” option to subscription commands.
>>
>> When adding new tables to a subscription (either via `CREATE SUBSCRIPTION` or `REFRESH PUBLICATION`), tables on the
targetwhich are being newly subscribed will be truncated before the data copy step.  This saves explicit coordination
ofa manual `TRUNCATE` on the target tables and allows the results of the initial data sync to be the same as on the
publisherat the time of sync. 
>>
>> To preserve compatibility with existing behavior, the default value for this parameter is `false`.
>>
>
> Truncate will fail for tables whose foreign keys refer to it. If such a feature cannot handle foreign keys, the
usefulnesswill be restricted. 
>
>
> This is true for existing “truncate” with FKs, so doesn’t seem to be any different to me.
>

What would happen if there are multiple tables and truncate on only
one of them failed due to FK check? Does it give an error in such a
case, if so will the other tables be truncated?

> Hypothetically if you checked all new tables and could verify if there were FK cycles only already in the new tables
beingadded then “truncate cascade” would be fine. Arguably if they had existing tables that were part of an FK that
wasn’tfully replicated they were already operating brokenly. 
>

I think if both PK_table and FK_table are part of the same
subscription then there should be a problem as both them first get
truncated? If they are part of a different subscription (or if they
are not subscribed due to whatever reason) then probably we need to
deal such cases carefully.

--
With Regards,
Amit Kapila.



Re: [PATCH] Add `truncate` option to subscription commands

From
David Christensen
Date:
> On Oct 11, 2020, at 10:00 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Oct 12, 2020 at 3:44 AM David Christensen <david@endpoint.com> wrote:
>>
>>
>> On Oct 11, 2020, at 1:14 PM, Euler Taveira <euler.taveira@2ndquadrant.com> wrote:
>>
>> 
>> On Fri, 9 Oct 2020 at 15:54, David Christensen <david@endpoint.com> wrote:
>>>
>>>
>>> Enclosed find a patch to add a “truncate” option to subscription commands.
>>>
>>> When adding new tables to a subscription (either via `CREATE SUBSCRIPTION` or `REFRESH PUBLICATION`), tables on the
targetwhich are being newly subscribed will be truncated before the data copy step.  This saves explicit coordination
ofa manual `TRUNCATE` on the target tables and allows the results of the initial data sync to be the same as on the
publisherat the time of sync. 
>>>
>>> To preserve compatibility with existing behavior, the default value for this parameter is `false`.
>>>
>>
>> Truncate will fail for tables whose foreign keys refer to it. If such a feature cannot handle foreign keys, the
usefulnesswill be restricted. 
>>
>>
>> This is true for existing “truncate” with FKs, so doesn’t seem to be any different to me.
>>
>
> What would happen if there are multiple tables and truncate on only
> one of them failed due to FK check? Does it give an error in such a
> case, if so will the other tables be truncated?

Currently each SyncRep relation is sync’d separately in its own worker process; we are doing the truncate at the
initializationstep of this, so it’s inherently in its own transaction. I think if we are going to do any sort of
validationon this, it would have to be at the point of the CREATE SUBSCRIPTION/REFRESH PUBLICATION where we have the
relationlist and can do sanity-checking there. 

Obviously if someone changes the schema at some point between when it does this and when relation syncs start there is
arace condition, but the same issue would affect other data sync things, so I don’t care to solve that as part of this
patch.

>> Hypothetically if you checked all new tables and could verify if there were FK cycles only already in the new tables
beingadded then “truncate cascade” would be fine. Arguably if they had existing tables that were part of an FK that
wasn’tfully replicated they were already operating brokenly. 
>>
>
> I think if both PK_table and FK_table are part of the same
> subscription then there should be a problem as both them first get
> truncated? If they are part of a different subscription (or if they
> are not subscribed due to whatever reason) then probably we need to
> deal such cases carefully.

You mean “should not be a problem” here?  If so, I agree with that.  Obviously if we determine this features is only
usefulwith this support we’d have to chase the entire dependency graph and make sure that is all contained in the set
ofnewly-subscribed tables (or at least FK referents). 

I have not considered tables that are part of more than one subscription (is that possible?); we presumably should
errorout if any table exists already in a separate subscription, as we’d want to avoid truncating tables already part
ofan existing subscription. 

While I’m happy to take a stab at fixing some of the FK/PK issues, it seems easy to go down a rabbit hole.  I’m not
convincedthat we couldn’t just detect FK issues and choose to not handle this case without decreasing the utility for
atleast some cases.  (Perhaps we could give a hint as to the issues detected to point someone in the right direction.)
Anyway,glad to keep discussing potential implications, etc. 

Best,

David
--
David Christensen
Senior Software and Database Engineer
End Point Corporation
david@endpoint.com
785-727-1171





Attachment

Re: [PATCH] Add `truncate` option to subscription commands

From
Rahila Syed
Date:
Hi David,

The feature seems useful to me.  The code will need to be refactored due to changes in commit : b05fe7b442

Please see the following comments. 
1. Is there a specific reason behind having new relstate for truncate?
The current state flow is INIT->DATATSYNC->SYNCWAIT->CATCHUP->SYNCDONE->READY.
Can we accommodate the truncate in either INIT or DATASYNC?

2.   +                               StartTransactionCommand();
      +                               rel = table_open(MyLogicalRepWorker->relid, RowExclusiveLock);
      +
      +                               rels = lappend(rels, rel);
      +                               relids = lappend_oid(relids, MyLogicalRepWorker->relid);
      +
      +                               ExecuteTruncateGuts(rels, relids, NIL, DROP_RESTRICT, false);
      +                               CommitTransactionCommand();

Truncate is being performed in a separate transaction as data copy, I think that leaves a window
open for concurrent transactions to modify the data after truncate and before copy.

3. Regarding the truncate of the referenced table, I think one approach can be to perform the following:
i. lock the referencing and referenced tables against writes
ii. drop the foriegn key constraints, 
iii.truncate
iv. sync
v. recreate the constraints 
vi. release lock.  
However, I am not sure of the implications of locking these tables on the main apply process. 


Thank you,


On Mon, Oct 12, 2020 at 11:31 PM David Christensen <david@endpoint.com> wrote:
> On Oct 11, 2020, at 10:00 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Oct 12, 2020 at 3:44 AM David Christensen <david@endpoint.com> wrote:
>>
>>
>> On Oct 11, 2020, at 1:14 PM, Euler Taveira <euler.taveira@2ndquadrant.com> wrote:
>>
>> 
>> On Fri, 9 Oct 2020 at 15:54, David Christensen <david@endpoint.com> wrote:
>>>
>>>
>>> Enclosed find a patch to add a “truncate” option to subscription commands.
>>>
>>> When adding new tables to a subscription (either via `CREATE SUBSCRIPTION` or `REFRESH PUBLICATION`), tables on the target which are being newly subscribed will be truncated before the data copy step.  This saves explicit coordination of a manual `TRUNCATE` on the target tables and allows the results of the initial data sync to be the same as on the publisher at the time of sync.
>>>
>>> To preserve compatibility with existing behavior, the default value for this parameter is `false`.
>>>
>>
>> Truncate will fail for tables whose foreign keys refer to it. If such a feature cannot handle foreign keys, the usefulness will be restricted.
>>
>>
>> This is true for existing “truncate” with FKs, so doesn’t seem to be any different to me.
>>
>
> What would happen if there are multiple tables and truncate on only
> one of them failed due to FK check? Does it give an error in such a
> case, if so will the other tables be truncated?

Currently each SyncRep relation is sync’d separately in its own worker process; we are doing the truncate at the initialization step of this, so it’s inherently in its own transaction. I think if we are going to do any sort of validation on this, it would have to be at the point of the CREATE SUBSCRIPTION/REFRESH PUBLICATION where we have the relation list and can do sanity-checking there.

Obviously if someone changes the schema at some point between when it does this and when relation syncs start there is a race condition, but the same issue would affect other data sync things, so I don’t care to solve that as part of this patch.

>> Hypothetically if you checked all new tables and could verify if there were FK cycles only already in the new tables being added then “truncate cascade” would be fine. Arguably if they had existing tables that were part of an FK that wasn’t fully replicated they were already operating brokenly.
>>
>
> I think if both PK_table and FK_table are part of the same
> subscription then there should be a problem as both them first get
> truncated? If they are part of a different subscription (or if they
> are not subscribed due to whatever reason) then probably we need to
> deal such cases carefully.

You mean “should not be a problem” here?  If so, I agree with that.  Obviously if we determine this features is only useful with this support we’d have to chase the entire dependency graph and make sure that is all contained in the set of newly-subscribed tables (or at least FK referents).

I have not considered tables that are part of more than one subscription (is that possible?); we presumably should error out if any table exists already in a separate subscription, as we’d want to avoid truncating tables already part of an existing subscription.

While I’m happy to take a stab at fixing some of the FK/PK issues, it seems easy to go down a rabbit hole.  I’m not convinced that we couldn’t just detect FK issues and choose to not handle this case without decreasing the utility for at least some cases.  (Perhaps we could give a hint as to the issues detected to point someone in the right direction.)  Anyway, glad to keep discussing potential implications, etc.

Best,

David
--
David Christensen
Senior Software and Database Engineer
End Point Corporation
david@endpoint.com
785-727-1171




Re: [PATCH] Add `truncate` option to subscription commands

From
David Christensen
Date:
Hi,

At this time I do not have time to make the necessary changes for this
commitfest so I am voluntarily withdrawing this patch, but will
revisit at a future time.

Best,

David

On Wed, Oct 28, 2020 at 1:06 PM Rahila Syed <rahilasyed90@gmail.com> wrote:
>
> Hi David,
>
> The feature seems useful to me.  The code will need to be refactored due to changes in commit : b05fe7b442
>
> Please see the following comments.
> 1. Is there a specific reason behind having new relstate for truncate?
> The current state flow is INIT->DATATSYNC->SYNCWAIT->CATCHUP->SYNCDONE->READY.
> Can we accommodate the truncate in either INIT or DATASYNC?
>
> 2.   +                               StartTransactionCommand();
>       +                               rel = table_open(MyLogicalRepWorker->relid, RowExclusiveLock);
>       +
>       +                               rels = lappend(rels, rel);
>       +                               relids = lappend_oid(relids, MyLogicalRepWorker->relid);
>       +
>       +                               ExecuteTruncateGuts(rels, relids, NIL, DROP_RESTRICT, false);
>       +                               CommitTransactionCommand();
>
> Truncate is being performed in a separate transaction as data copy, I think that leaves a window
> open for concurrent transactions to modify the data after truncate and before copy.
>
> 3. Regarding the truncate of the referenced table, I think one approach can be to perform the following:
> i. lock the referencing and referenced tables against writes
> ii. drop the foriegn key constraints,
> iii.truncate
> iv. sync
> v. recreate the constraints
> vi. release lock.
> However, I am not sure of the implications of locking these tables on the main apply process.
>
>
> Thank you,
>
>
> On Mon, Oct 12, 2020 at 11:31 PM David Christensen <david@endpoint.com> wrote:
>>
>> > On Oct 11, 2020, at 10:00 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> >
>> > On Mon, Oct 12, 2020 at 3:44 AM David Christensen <david@endpoint.com> wrote:
>> >>
>> >>
>> >> On Oct 11, 2020, at 1:14 PM, Euler Taveira <euler.taveira@2ndquadrant.com> wrote:
>> >>
>> >> 
>> >> On Fri, 9 Oct 2020 at 15:54, David Christensen <david@endpoint.com> wrote:
>> >>>
>> >>>
>> >>> Enclosed find a patch to add a “truncate” option to subscription commands.
>> >>>
>> >>> When adding new tables to a subscription (either via `CREATE SUBSCRIPTION` or `REFRESH PUBLICATION`), tables on
thetarget which are being newly subscribed will be truncated before the data copy step.  This saves explicit
coordinationof a manual `TRUNCATE` on the target tables and allows the results of the initial data sync to be the same
ason the publisher at the time of sync. 
>> >>>
>> >>> To preserve compatibility with existing behavior, the default value for this parameter is `false`.
>> >>>
>> >>
>> >> Truncate will fail for tables whose foreign keys refer to it. If such a feature cannot handle foreign keys, the
usefulnesswill be restricted. 
>> >>
>> >>
>> >> This is true for existing “truncate” with FKs, so doesn’t seem to be any different to me.
>> >>
>> >
>> > What would happen if there are multiple tables and truncate on only
>> > one of them failed due to FK check? Does it give an error in such a
>> > case, if so will the other tables be truncated?
>>
>> Currently each SyncRep relation is sync’d separately in its own worker process; we are doing the truncate at the
initializationstep of this, so it’s inherently in its own transaction. I think if we are going to do any sort of
validationon this, it would have to be at the point of the CREATE SUBSCRIPTION/REFRESH PUBLICATION where we have the
relationlist and can do sanity-checking there. 
>>
>> Obviously if someone changes the schema at some point between when it does this and when relation syncs start there
isa race condition, but the same issue would affect other data sync things, so I don’t care to solve that as part of
thispatch. 
>>
>> >> Hypothetically if you checked all new tables and could verify if there were FK cycles only already in the new
tablesbeing added then “truncate cascade” would be fine. Arguably if they had existing tables that were part of an FK
thatwasn’t fully replicated they were already operating brokenly. 
>> >>
>> >
>> > I think if both PK_table and FK_table are part of the same
>> > subscription then there should be a problem as both them first get
>> > truncated? If they are part of a different subscription (or if they
>> > are not subscribed due to whatever reason) then probably we need to
>> > deal such cases carefully.
>>
>> You mean “should not be a problem” here?  If so, I agree with that.  Obviously if we determine this features is only
usefulwith this support we’d have to chase the entire dependency graph and make sure that is all contained in the set
ofnewly-subscribed tables (or at least FK referents). 
>>
>> I have not considered tables that are part of more than one subscription (is that possible?); we presumably should
errorout if any table exists already in a separate subscription, as we’d want to avoid truncating tables already part
ofan existing subscription. 
>>
>> While I’m happy to take a stab at fixing some of the FK/PK issues, it seems easy to go down a rabbit hole.  I’m not
convincedthat we couldn’t just detect FK issues and choose to not handle this case without decreasing the utility for
atleast some cases.  (Perhaps we could give a hint as to the issues detected to point someone in the right direction.)
Anyway,glad to keep discussing potential implications, etc. 
>>
>> Best,
>>
>> David
>> --
>> David Christensen
>> Senior Software and Database Engineer
>> End Point Corporation
>> david@endpoint.com
>> 785-727-1171
>>
>>
>>
>>



Re: [PATCH] Add `truncate` option to subscription commands

From
Bharath Rupireddy
Date:
On Thu, Nov 26, 2020 at 12:16 AM David Christensen <david@endpoint.com> wrote:
>
> Hi,
>
> At this time I do not have time to make the necessary changes for this
> commitfest so I am voluntarily withdrawing this patch, but will
> revisit at a future time.

Hi,

This feature looks useful in the sense that it avoids users having to
manually lookup all the tables on all the subscribers for truncation
(in case they want the subscriber tables to exactly sync with the
publisher tables).

I have gone through the prior discussions on this thread. IMO, we can
always go ahead with TRUNCATE ... RESTRICT behavior to avoid some
unnecessary truncation of subscriber local tables (if at all users
have such tables) that can arise due to CASCADE option. It looks like
there are some problems with the FK - PK dependencies. Below are my
thoughts:

1) Whether a table the sync worker is trying to truncate is having any
referencing (foreign key) tables on the subscriber? If yes, whether
all the referencing tables are present in the list of subscription
tables (output of fetch_table_list)? In this case, the sync worker is
truncating the primary key/referenced table.

One way to solve the above problem is by storing the table oids of the
subscription tables (output of fetch_table_list) in a new column in
the pg_subscription catalog (like subpublications text[] column). In
the sync worker, before truncation of a table, use
heap_truncate_find_FKs to get all the referencing tables of the given
table and get all the subscription tables from the new pg_subscription
column. If all the referencing tables exist in the subscription
tables, then truncate the table, otherwise don't, just skip it. There
can be a problem here if there are many subscription tables, the size
of the new column in pg_susbcription can be huge. However, we can
choose to store the table ids in this new column only when the
truncate option is specified.

Another way is to let each table sync worker scan the
pg_subscription_rel to get all the relations that belong to a
subscription. But I felt this was costly.

2) Whether a table the sync worker is trying to truncate is a
referencing table for any of the subscriber tables that is not part of
the subscription list of tables? In this case, the table the sync
worker is truncating is the foreign key/referencing table.

This isn't a problem actually, the sync worker can safely truncate the
table. This is also inline with the current TRUNCATE command
behaviour.

3) I think we should allow the truncate option with CREATE
SUBSCRIPTION, ALTER SUBSCRIPTION ... REFRESH/SET/ADD PUBLICATION,
basically wherever copy_data and refresh options can be specified. And
there's no need to store the truncate option in the pg_subscription
catalogue because we allow it to be specified with only DDLs.

4) If there are a huge number of tables with lots of data, then all
the sync workers will have to spend an extra amount of time in
truncating the tables. At times the publications can use "FOR ALL
TABLES" i.e. all the tables within a database, so truncating all of
them on the subscriber would be a time consuming task. I'm not sure if
this is okay.

5) We can choose to skip the errors that arise out of
ExecuteTruncateGuts in a sync worker using PG_TRY/PG_CATCH or changing
ExecuteTruncateGuts API to return false on error instead of emitting
an error.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: [PATCH] Add `truncate` option to subscription commands

From
Amit Kapila
Date:
On Sat, May 22, 2021 at 9:58 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Thu, Nov 26, 2020 at 12:16 AM David Christensen <david@endpoint.com> wrote:
> >
> > Hi,
> >
> > At this time I do not have time to make the necessary changes for this
> > commitfest so I am voluntarily withdrawing this patch, but will
> > revisit at a future time.
>
> Hi,
>
> This feature looks useful in the sense that it avoids users having to
> manually lookup all the tables on all the subscribers for truncation
> (in case they want the subscriber tables to exactly sync with the
> publisher tables).
>
> I have gone through the prior discussions on this thread. IMO, we can
> always go ahead with TRUNCATE ... RESTRICT behavior to avoid some
> unnecessary truncation of subscriber local tables (if at all users
> have such tables) that can arise due to CASCADE option. It looks like
> there are some problems with the FK - PK dependencies. Below are my
> thoughts:
>
> 1) Whether a table the sync worker is trying to truncate is having any
> referencing (foreign key) tables on the subscriber? If yes, whether
> all the referencing tables are present in the list of subscription
> tables (output of fetch_table_list)? In this case, the sync worker is
> truncating the primary key/referenced table.
>
> One way to solve the above problem is by storing the table oids of the
> subscription tables (output of fetch_table_list) in a new column in
> the pg_subscription catalog (like subpublications text[] column). In
> the sync worker, before truncation of a table, use
> heap_truncate_find_FKs to get all the referencing tables of the given
> table and get all the subscription tables from the new pg_subscription
> column. If all the referencing tables exist in the subscription
> tables, then truncate the table, otherwise don't, just skip it.
>

Here, silently skipping doesn't seem like a good idea when the user
has asked to truncate the table. Shouldn't we allow it if the user has
provided say cascade with a truncate option?

> There
> can be a problem here if there are many subscription tables, the size
> of the new column in pg_susbcription can be huge. However, we can
> choose to store the table ids in this new column only when the
> truncate option is specified.
>
> Another way is to let each table sync worker scan the
> pg_subscription_rel to get all the relations that belong to a
> subscription. But I felt this was costly.
>

I feel it is better to use pg_subscription_rel especially because we
will do so when the user has given the truncate option and note that
we are already accessing it in sync worker for both reading and
writing. See LogicalRepSyncTableStart.

> 2) Whether a table the sync worker is trying to truncate is a
> referencing table for any of the subscriber tables that is not part of
> the subscription list of tables? In this case, the table the sync
> worker is truncating is the foreign key/referencing table.
>
> This isn't a problem actually, the sync worker can safely truncate the
> table. This is also inline with the current TRUNCATE command
> behaviour.
>
> 3) I think we should allow the truncate option with CREATE
> SUBSCRIPTION, ALTER SUBSCRIPTION ... REFRESH/SET/ADD PUBLICATION,
> basically wherever copy_data and refresh options can be specified. And
> there's no need to store the truncate option in the pg_subscription
> catalogue because we allow it to be specified with only DDLs.
>

makes sense.

One other problem discussed in this thread was what to do when the
same table is part of multiple subscriptions and the user has provided
a truncate option while operating on such a subscription.

-- 
With Regards,
Amit Kapila.



Re: [PATCH] Add `truncate` option to subscription commands

From
Bharath Rupireddy
Date:
On Mon, May 24, 2021 at 11:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > 1) Whether a table the sync worker is trying to truncate is having any
> > referencing (foreign key) tables on the subscriber? If yes, whether
> > all the referencing tables are present in the list of subscription
> > tables (output of fetch_table_list)? In this case, the sync worker is
> > truncating the primary key/referenced table.
> >
> > One way to solve the above problem is by storing the table oids of the
> > subscription tables (output of fetch_table_list) in a new column in
> > the pg_subscription catalog (like subpublications text[] column). In
> > the sync worker, before truncation of a table, use
> > heap_truncate_find_FKs to get all the referencing tables of the given
> > table and get all the subscription tables from the new pg_subscription
> > column. If all the referencing tables exist in the subscription
> > tables, then truncate the table, otherwise don't, just skip it.
> >
>
> Here, silently skipping doesn't seem like a good idea when the user
> has asked to truncate the table. Shouldn't we allow it if the user has
> provided say cascade with a truncate option?

We could do that. In that case, the truncate option just can't be a
boolean, but it has to be an enum accepting "restrict", "cascade",
maybe "restart identity" or "continue identity" too. I have a concern
here - what if the ExecuteTruncateGuts fails with the cascade option
for whatever reason? Should the table sync worker be trapped in that
error? Will that table ever finish initial table sync/data copy?
Basically, how will this error info be known to the user other than
from the subscriber logs?

Or should it just continue by skipping the error?

If required, we could introduce another rel state, say,
SUBREL_STATE_READY_WITH_TRUNCATION_DONE if the table is truncated as
per the user expectation. Otherwise just SUBREL_STATE_READY if there
has been any error occurred while truncating.

Another thing is that, if we allow the cascade option we must document
it saying that the truncate might cascade down to any subscriber local
tables that are not part of the subscription.

Thoughts?

> > There
> > can be a problem here if there are many subscription tables, the size
> > of the new column in pg_susbcription can be huge. However, we can
> > choose to store the table ids in this new column only when the
> > truncate option is specified.
> >
> > Another way is to let each table sync worker scan the
> > pg_subscription_rel to get all the relations that belong to a
> > subscription. But I felt this was costly.
> >
>
> I feel it is better to use pg_subscription_rel especially because we
> will do so when the user has given the truncate option and note that
> we are already accessing it in sync worker for both reading and
> writing. See LogicalRepSyncTableStart.

Note that in pg_subscription_rel, there can exist multiple rows for
each table for a given subscription. Say, t1 is a table that the sync
worker is trying to truncate and copy. Say, t1_dep1, t1_dep2, t1_dep3
.... are the dependent tables (we can find these using
heap_truncate_find_FKs). Now, we need to see if all the t1_dep1,
t1_dep2, t1_dep3 .... tables are in the pg_suscription_rel with the
same subscription id, then only we can delete all of them with
EexecuteTruncateGuts() using cascade option. If any of the t1_depX is
either not in the pg_subscription_rel or it is subscribed in another
subscription, then is it okay if we scan pg_suscription_rel in a loop
with t1_depX relid's? Isn't it costiler? Or since it is a cache
lookup, maybe that's okay?

    /* Try finding the mapping. */
    tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
                          ObjectIdGetDatum(relid),
                          ObjectIdGetDatum(subid));

> One other problem discussed in this thread was what to do when the
> same table is part of multiple subscriptions and the user has provided
> a truncate option while operating on such a subscription.

I think we can just skip truncating a table when it is part of
multiple subscriptions. We can tell this clearly in the documentation.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: [PATCH] Add `truncate` option to subscription commands

From
Amit Kapila
Date:
On Mon, May 24, 2021 at 2:10 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, May 24, 2021 at 11:01 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > 1) Whether a table the sync worker is trying to truncate is having any
> > > referencing (foreign key) tables on the subscriber? If yes, whether
> > > all the referencing tables are present in the list of subscription
> > > tables (output of fetch_table_list)? In this case, the sync worker is
> > > truncating the primary key/referenced table.
> > >
> > > One way to solve the above problem is by storing the table oids of the
> > > subscription tables (output of fetch_table_list) in a new column in
> > > the pg_subscription catalog (like subpublications text[] column). In
> > > the sync worker, before truncation of a table, use
> > > heap_truncate_find_FKs to get all the referencing tables of the given
> > > table and get all the subscription tables from the new pg_subscription
> > > column. If all the referencing tables exist in the subscription
> > > tables, then truncate the table, otherwise don't, just skip it.
> > >
> >
> > Here, silently skipping doesn't seem like a good idea when the user
> > has asked to truncate the table. Shouldn't we allow it if the user has
> > provided say cascade with a truncate option?
>
> We could do that. In that case, the truncate option just can't be a
> boolean, but it has to be an enum accepting "restrict", "cascade",
> maybe "restart identity" or "continue identity" too. I have a concern
> here - what if the ExecuteTruncateGuts fails with the cascade option
> for whatever reason? Should the table sync worker be trapped in that
> error?
>

How is it any different from any other error we got during table sync
(say PK violation, out of memory, or any other such error)?

>
> > > There
> > > can be a problem here if there are many subscription tables, the size
> > > of the new column in pg_susbcription can be huge. However, we can
> > > choose to store the table ids in this new column only when the
> > > truncate option is specified.
> > >
> > > Another way is to let each table sync worker scan the
> > > pg_subscription_rel to get all the relations that belong to a
> > > subscription. But I felt this was costly.
> > >
> >
> > I feel it is better to use pg_subscription_rel especially because we
> > will do so when the user has given the truncate option and note that
> > we are already accessing it in sync worker for both reading and
> > writing. See LogicalRepSyncTableStart.
>
> Note that in pg_subscription_rel, there can exist multiple rows for
> each table for a given subscription. Say, t1 is a table that the sync
> worker is trying to truncate and copy. Say, t1_dep1, t1_dep2, t1_dep3
> .... are the dependent tables (we can find these using
> heap_truncate_find_FKs). Now, we need to see if all the t1_dep1,
> t1_dep2, t1_dep3 .... tables are in the pg_suscription_rel with the
> same subscription id, then only we can delete all of them with
> EexecuteTruncateGuts() using cascade option. If any of the t1_depX is
> either not in the pg_subscription_rel or it is subscribed in another
> subscription, then is it okay if we scan pg_suscription_rel in a loop
> with t1_depX relid's?
>

Why do you need to search in a loop? There is an index for relid, subid.

> Isn't it costiler? Or since it is a cache
> lookup, maybe that's okay?
>
>     /* Try finding the mapping. */
>     tup = SearchSysCache2(SUBSCRIPTIONRELMAP,
>                           ObjectIdGetDatum(relid),
>                           ObjectIdGetDatum(subid));
>
> > One other problem discussed in this thread was what to do when the
> > same table is part of multiple subscriptions and the user has provided
> > a truncate option while operating on such a subscription.
>
> I think we can just skip truncating a table when it is part of
> multiple subscriptions. We can tell this clearly in the documentation.
>

Okay, I don't have any better ideas at this stage.

-- 
With Regards,
Amit Kapila.