RE: Initial Schema Sync for Logical Replication - Mailing list pgsql-hackers
From | Wei Wang (Fujitsu) |
---|---|
Subject | RE: Initial Schema Sync for Logical Replication |
Date | |
Msg-id | OS3PR01MB627503ED3CE7E2E3F14AFE9A9E6A9@OS3PR01MB6275.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Initial Schema Sync for Logical Replication (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Initial Schema Sync for Logical Replication
|
List | pgsql-hackers |
On Fri, Apr 21, 2023 at 16:48 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > On Thu, Apr 20, 2023 at 8:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Apr 17, 2023 at 9:12 AM Masahiko Sawada > <sawada.mshk@gmail.com> wrote: > > > > > > On Fri, Apr 7, 2023 at 6:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > > > > > On Thu, Apr 6, 2023 at 6:57 PM Masahiko Sawada > <sawada.mshk@gmail.com> wrote: > > > > > > > > > > > > > > > While writing a PoC patch, I found some difficulties in this idea. > > > > > First, I tried to add schemaname+relname to pg_subscription_rel but I > > > > > could not define the primary key of pg_subscription_rel. The primary > > > > > key on (srsubid, srrelid) doesn't work since srrelid could be NULL. > > > > > Similarly, the primary key on (srsubid, srrelid, schemaname, relname) > > > > > also doesn't work. > > > > > > > > > > > > > Can we think of having a separate catalog table say > > > > pg_subscription_remote_rel for this? You can have srsubid, > > > > remote_schema_name, remote_rel_name, etc. We may need some other > state > > > > to be maintained during the initial schema sync where this table can > > > > be used. Basically, this can be used to maintain the state till the > > > > initial schema sync is complete because we can create a relation entry > > > > in pg_subscritption_rel only after the initial schema sync is > > > > complete. > > > > > > It might not be ideal but I guess it works. But I think we need to > > > modify the name of replication slot for initial sync as it currently > > > includes OID of the table: > > > > > > void > > > ReplicationSlotNameForTablesync(Oid suboid, Oid relid, > > > char *syncslotname, Size szslot) > > > { > > > snprintf(syncslotname, szslot, "pg_%u_sync_%u_" UINT64_FORMAT, > suboid, > > > relid, GetSystemIdentifier()); > > > } > > > > > > If we use both schema name and table name, it's possible that slot > > > names are duplicated if schema and/or table names are long. Another > > > idea is to use the hash value of schema+table names, but it cannot > > > completely eliminate that possibility, and probably would make > > > investigation and debugging hard in case of any failure. Probably we > > > can use the OID of each entry in pg_subscription_remote_rel instead, > > > but I'm not sure it's a good idea, mainly because we will end up using > > > twice as many OIDs as before. > > > > > > > The other possibility is to use worker_pid. To make debugging easier, > > we may want to LOG schema_name+rel_name vs slot_name mapping at > DEBUG1 > > log level. > > Since worker_pid changes after the worker restarts, a new worker > cannot find the slot that had been used, no? > > After thinking it over, a better solution would be that we add an oid > column, nspname column, and relname column to pg_subscription_rel and > the primary key on the oid. If the table is not present on the > subscriber we store the schema name and table name to the catalog, and > otherwise we store the local table oid same as today. The local table > oid will be filled after the schema sync. The names of origin and > replication slot the tablesync worker uses use the oid instead of the > table oid. > > I've attached a PoC patch of this idea (very rough patch and has many > TODO comments). It mixes the following changes: > > 1. Add oid column to the pg_subscription_rel. The oid is used as the > primary key and in the names of origin and slot the tablesync workers > use. > > 2. Add copy_schema = on/off option to CREATE SUBSCRIPTION (not yet > support for ALTER SUBSCRIPTION). > > 3. Add CRS_EXPORT_USE_SNAPSHOT new action in order to use the same > snapshot by both walsender and other processes (e.g. pg_dump). In this > patch, the snapshot is exported for pg_dump and is used by the > walsender for COPY. > > It seems to work well but there might be a pitfall as I've not fully > implemented it. Thanks for your POC patch. After reviewing this patch, I have a question below that want to confirm: 1. In the function synchronize_table_schema. I think some changes to GUC and table-related object SQLs are included in the pg_dump result. And in this POC, these SQLs will be executed. Do we need to alter the pg_dump results to only execute the table schema related SQLs? For example, if we have below table schema in the publisher-side: ``` create table tbl(a int, b int); create index idx_t on tbl (a); CREATE FUNCTION trigger_func() RETURNS TRIGGER LANGUAGE PLPGSQL AS $$ BEGIN INSERT INTO public.tbl VALUES (NEW.*); RETURNNEW; END; $$; CREATE TRIGGER tri_tbl BEFORE INSERT ON public.tbl FOR EACH ROW EXECUTE PROCEDURE trigger_func(); ``` The result of pg_dump executed on the subscriber-side: ``` SET statement_timeout = 0; SET lock_timeout = 0; SET idle_in_transaction_session_timeout = 0; SET client_encoding = 'UTF8'; SET standard_conforming_strings = on; SELECT pg_catalog.set_config('search_path', '', false); SET check_function_bodies = false; SET xmloption = content; SET client_min_messages = warning; SET row_security = off; SET default_tablespace = ''; SET default_table_access_method = heap; CREATE TABLE public.tbl ( a integer, b integer ); ALTER TABLE public.tbl OWNER TO postgres; CREATE INDEX idx_t ON public.tbl USING btree (a); CREATE TRIGGER tri_tbl BEFORE INSERT ON public.tbl FOR EACH ROW EXECUTE FUNCTION public.trigger_func(); ``` And this will cause an error when `CREATE TRIGGER` because we did not dump the function trigger_func. Regards, Wang Wei
pgsql-hackers by date: