Support logical replication of global object commands - Mailing list pgsql-hackers

From Zheng Li
Subject Support logical replication of global object commands
Date
Msg-id CAAD30UKD7YPEbYcs_L9PYLcLZjnxyqO=JF5_mnAwx7g_PtOi3A@mail.gmail.com
Whole thread Raw
Responses Re: Support logical replication of global object commands
Re: Support logical replication of global object commands
List pgsql-hackers
Hello,

Logical replication of DDL commands support is being worked on in [1].
However, global object commands are quite different from other
non-global object DDL commands and need to be handled differently. For
example, global object commands include ROLE statements, DATABASE
statements, TABLESPACE statements and a subset of GRANT/REVOKE
statements if the object being modified is a global object. These
commands are different from other DDL commands in that:

1. Global object commands can be executed in any database.
2. Global objects are not schema qualified.
3. Global object commands are not captured by event triggers.

I’ve put together a prototype to support logical replication of global
object commands in the attached patch. This patch builds on the DDL
replication patch from ZJ in [2] and must be applied on top of it.
Here is a list of global object commands that the patch replicate, you
can find more details in function LogGlobalObjectCommand:

/* ROLE statements */
CreateRoleStmt
AlterRoleStmt
AlterRoleSetStmt
DropRoleStmt
ReassignOwnedStmt
GrantRoleStmt

/* Database statements */
CreatedbStmt
AlterDatabaseStmt
AlterDatabaseRefreshCollStmt
AlterDatabaseSetStmt
DropdbStmt

/* TableSpace statements */
CreateTableSpaceStmt
DropTableSpaceStmt
AlterTableSpaceOptionsStmt

/* GrantStmt and RevokeStmt if objtype is a global object determined
by EventTriggerSupportsObjectType() */
GrantStmt
RevokeStmt

The idea with this patch is to support global objects commands
replication by WAL logging the command using the same function for DDL
logging - LogLogicalDDLMessage towards the end of
standard_ProcessUtility. Because global objects are not schema
qualified, we can skip the deparser invocation and directly log the
original command string for replay on the subscriber.

A key problem to address is that global objects can become
inconsistent between the publisher and the subscriber if a command
modifying the global object gets executed in a database (on the source
side) that doesn't replicate the global object commands. I think we
can work on the following two aspects in order to avoid such
inconsistency:

1. Introduce a publication option for global object commands
replication and document that logical replication of global object
commands is preferred to be enabled on all databases. Otherwise
inconsistency can happen if a command modifies the global object in a
database that doesn't replicate global object commands.

For example, we could introduce the following publication option
publish_global_object_command :
CREATE PUBLICATION mypub
FOR ALL TABLES
WITH (publish = 'insert, delete, update', publish_global_object_command = true);

We may consider other fine tuned global command options such as
“publish_role_statements”, “publish_database_statements”,
“publish_tablespace_statements” and "publish_grant_statements", i.e.
you pick which global commands you want replicated. For example, you
can do this if you need a permission or tablespace to be set up
differently on the target cluster. In addition, we may need to adjust
the syntax once the DDL replication syntax finalizes.

2. Introduce the following database cluster level logical replication
commands to avoid such inconsistency, this is especially handy when
there is a large number of databases to configure for logical
replication.

CREATE PUBLICATION GROUP mypub_
FOR ALL DATABASES
WITH (publish = 'insert, delete, update', publish_global_object_command = true);

CREATE SUBSCRIPTION GROUP mysub_
CONNECTION 'dbnames = \“path to file\” host=hostname user=username port=5432'
PUBLICATION GROUP mypub_;

Under the hood, the CREATE PUBLICATION GROUP command generates one
CREATE PUBLICATION mypub_n sub-command for each database in the
cluster where n is a monotonically increasing integer from 1. The
command outputs the (dbname, publication name) pairs which can be
saved in a file and then used on the subscription side.

Similarly, the CREATE SUBSCRIPTION GROUP command will generate one
CREATE SUBSCRIPTION mysub_n sub-command for each database in the
dbnames file. The dbnames file contains the (dbname, publication name)
pairs which come from the output of the CREATE PUBLICATION GROUP
command. Notice the connection string doesn’t have the dbname field,
During execution the connection string will be appended the dbname
retrieved from the dbnames file. By default the target DB name is the
same as the source DB name, optionally user can specify the source_db
to target_db mapping in the dbnames file.

In addition, we might want to create dependencies for the
publications/subscriptions created by the above commands in order to
guarantee the group consistency. Also we need to enforce that there is
only one group of publications/subscriptions for database cluster
level replication.

Logical replication of all commands across an entire cluster (instead
of on a per-database basis) is a separate topic. We can start another
thread after implementing a prototype.

Please let me know your thoughts.

[1] https://www.postgresql.org/message-id/flat/CAAD30U%2BpVmfKwUKy8cbZOnUXyguJ-uBNejwD75Kyo%3DOjdQGJ9g%40mail.gmail.com

[2]https://www.postgresql.org/message-id/OS0PR01MB5716009FDCCC0B50BCB14A99949D9%40OS0PR01MB5716.jpnprd01.prod.outlook.com

With Regards,
Zheng Li
Amazon RDS/Aurora for PostgreSQL

Attachment

pgsql-hackers by date:

Previous
From: Mikael Kjellström
Date:
Subject: Re: conchuela doesn't like gnu_printf anymore
Next
From: Zheng Li
Date:
Subject: Re: Support logical replication of DDLs