Re: Allow logical replication to copy tables in binary format - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: Allow logical replication to copy tables in binary format |
Date | |
Msg-id | CALj2ACVy1zpMvvbXw=UsBc+mh8UrJLUob-0V=rJv+CjXhWEBiw@mail.gmail.com Whole thread Raw |
In response to | Re: Allow logical replication to copy tables in binary format (Peter Smith <smithpb2250@gmail.com>) |
List | pgsql-hackers |
On Fri, Feb 24, 2023 at 8:32 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here is my summary of this thread so far, plus some other thoughts. Thanks. > 1. Initial Goal > ------------------ > > Currently, the CREATE SUBSCRIPTION ... WITH(binary=true) mode does > data replication in binary mode, but tablesync COPY phase is still > only in text mode. IIUC Melih just wanted to unify those phases so > binary=true would mean also do the COPY phase in binary [1]. > > Actually, this was my very first expectation too. > > 2. Objections to unification > ----------------------------------- > > Bharath posted suggesting tying the COPY/replication parts is not a > good idea [2]. But if these are not to be unified then it requires a > new subscription option to be introduced, and currently, the thread > refers to this new option as copy_format=binary. Looking closely at the existing binary=true option and COPY's binary protocol, essentially they depend on the data type's binary send and receive functions. > 3. A problem: binary replication is not always binary! > ---------------------------------------------------------------------- > > Shi-san reported [3] that under special circumstances (e.g. if the > datatype has no binary output function) the current HEAD binary=true > mode for replication has the ability to fall back to text replication. > Since the binary COPY doesn't do this, it means binary COPY could fail > in some cases where the binary=true replication will be OK. Right. Apply workers can fallback to text mode transparently, whereas with binary COPY it's a bit difficult to do so. > AFAIK, this means that even if we *wanted* to unify everything with > just 'binary=true' it can't be done like that. Hm, it looks like that. > 4. New option is needed > --------------------------------- > > OK, so let's assume these options must be separated (because of the > problem of #3). > > ~ > > 4a. New string option called copy_format? > > Currently, the patch/thread describes a new 'copy_format' string > option with values 'text' (default) and 'binary'. > > Why? If there are only 2 values then IMO it would be better to have a > *boolean* option called something like binary_copy=true/false. > > ~ > > 4b. Or, we could extend copy_data > > Jelte suggested [4] we could extend copy_data = 'text' (aka on/true) > OR 'binary' OR 'none' (aka off/false). How about copy_binary = {true/false}? So, the options for the user are: copy_binary - defaults to false and when true, the subscriber requests publisher to send pre-existing table's data in binary format (as opposed to text) during data synchronization/table copy phase. It is recommended to enable this option only when 1) the column data types have appropriate binary send/receive functions, 2) not replicating between different major versions or different platforms, 3) both publisher and subscriber tables have the exact same column types (not when replicating from smallint to int or numeric to int8 and so on), 4) both publisher and subscriber supports COPY with binary option, otherwise the table copy can fail. binary - defaults to false and when true, the subscriber requests publisher to send table's data in binary format (as opposed to text) during normal replication phase. <<existing description from the docs continues>> Note that with this we made users responsible to choose copy_binary rather than we being smart. > AFAIK currently the patch disallows some combinations: > > + ereport(ERROR, > + (errcode(ERRCODE_SYNTAX_ERROR), > + errmsg("%s and %s are mutually exclusive options", > + "binary = false", "copy_format = binary"))); > > 6. pub/sub version checking > ---------------------------- > > There were some discussions about doing some server checking to reject > some PG combinations from being allowed to use the copy_format=binary. IMHO, these restrictions make the feature more complicated to use and be removed and the options be made simple to use (usability and simplicity clearly wins the race). If there's any kind of feedback from the users/developers, we can always come back and improve. > But if the publication was defined as FOR ALL TABLES, or as ALL TABLES > IN SCHEMA, then I think the tablesync can still crash if a user > creates a new table that suffers the same COPY binary trouble Shi-san > described earlier [3]. > > What is the user supposed to do when that tablesync fails? > > They had no way to predict it could happen, And it will be too painful > to have to DROP and re-create the entire SUBSCRIPTION again now that > it has happened. Can't ALTER SUBSCRIPTION .... SET copy_format = 'text'; and ALTER SUBSCRIPTION ... REFRESH PUBLICATION WITH (copy_data = true); work here instead of drop and recreate subscription? > 7a. A thought bubble > > I wondered (perhaps this is naive) would it be it possible to enhance > the COPY command to enhance the "binary" mode so it can be made to > fall back to text mode if it needs to in the same way that binary > replication does. > > If such an enhanced COPY format mode worked, then most of the patch is redundant > - there is no need for any new option > - tablesync COPY could then *always* use this "binary-with-falback" mode I don't think that's a great idea. COPY is a user-facing SQL command and any errors (because of data type not having binary send/receive functions) better be reported to the users. Again, such an option might complicate both COPY code and usability. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: