Re: [PATCH] Add `truncate` option to subscription commands - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: [PATCH] Add `truncate` option to subscription commands |
Date | |
Msg-id | CALj2ACUOkYk9WBHLmqEp9JDM333jZXCvqNQnS=0ukRdnTOcftg@mail.gmail.com Whole thread Raw |
In response to | Re: [PATCH] Add `truncate` option to subscription commands (David Christensen <david@endpoint.com>) |
Responses |
Re: [PATCH] Add `truncate` option to subscription commands
|
List | pgsql-hackers |
On Thu, Nov 26, 2020 at 12:16 AM David Christensen <david@endpoint.com> wrote: > > Hi, > > At this time I do not have time to make the necessary changes for this > commitfest so I am voluntarily withdrawing this patch, but will > revisit at a future time. Hi, This feature looks useful in the sense that it avoids users having to manually lookup all the tables on all the subscribers for truncation (in case they want the subscriber tables to exactly sync with the publisher tables). I have gone through the prior discussions on this thread. IMO, we can always go ahead with TRUNCATE ... RESTRICT behavior to avoid some unnecessary truncation of subscriber local tables (if at all users have such tables) that can arise due to CASCADE option. It looks like there are some problems with the FK - PK dependencies. Below are my thoughts: 1) Whether a table the sync worker is trying to truncate is having any referencing (foreign key) tables on the subscriber? If yes, whether all the referencing tables are present in the list of subscription tables (output of fetch_table_list)? In this case, the sync worker is truncating the primary key/referenced table. One way to solve the above problem is by storing the table oids of the subscription tables (output of fetch_table_list) in a new column in the pg_subscription catalog (like subpublications text[] column). In the sync worker, before truncation of a table, use heap_truncate_find_FKs to get all the referencing tables of the given table and get all the subscription tables from the new pg_subscription column. If all the referencing tables exist in the subscription tables, then truncate the table, otherwise don't, just skip it. There can be a problem here if there are many subscription tables, the size of the new column in pg_susbcription can be huge. However, we can choose to store the table ids in this new column only when the truncate option is specified. Another way is to let each table sync worker scan the pg_subscription_rel to get all the relations that belong to a subscription. But I felt this was costly. 2) Whether a table the sync worker is trying to truncate is a referencing table for any of the subscriber tables that is not part of the subscription list of tables? In this case, the table the sync worker is truncating is the foreign key/referencing table. This isn't a problem actually, the sync worker can safely truncate the table. This is also inline with the current TRUNCATE command behaviour. 3) I think we should allow the truncate option with CREATE SUBSCRIPTION, ALTER SUBSCRIPTION ... REFRESH/SET/ADD PUBLICATION, basically wherever copy_data and refresh options can be specified. And there's no need to store the truncate option in the pg_subscription catalogue because we allow it to be specified with only DDLs. 4) If there are a huge number of tables with lots of data, then all the sync workers will have to spend an extra amount of time in truncating the tables. At times the publications can use "FOR ALL TABLES" i.e. all the tables within a database, so truncating all of them on the subscriber would be a time consuming task. I'm not sure if this is okay. 5) We can choose to skip the errors that arise out of ExecuteTruncateGuts in a sync worker using PG_TRY/PG_CATCH or changing ExecuteTruncateGuts API to return false on error instead of emitting an error. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: