RE: Support logical replication of DDLs - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Support logical replication of DDLs
Date
Msg-id OS0PR01MB5716CBA51E912315468C320894629@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Support logical replication of DDLs  (vignesh C <vignesh21@gmail.com>)
Responses RE: Support logical replication of DDLs
List pgsql-hackers
On Mon, Apr 17, 2023 18:32 PM vignesh C <vignesh21@gmail.com> wrote:
> On Sat, 15 Apr 2023 at 06:38, vignesh C <vignesh21@gmail.com> wrote:
> >
> > On Fri, 14 Apr 2023 at 13:06, vignesh C <vignesh21@gmail.com> wrote:
> > >
> > > Few comments:

Thanks for your comments.
Improved the patch set according to your comments. Please refer to the details below.

===
About the comments in [1]:

> 1) I felt is_present_flag variable can be removed by moving 
> "object_name = append_object_to_format_string(tree, sub_fmt);" inside 
> the if condition:
> +static void
> +append_bool_object(ObjTree *tree, char *sub_fmt, bool value) {
> +       ObjElem    *param;
> +       char       *object_name = sub_fmt;
> +       bool            is_present_flag = false;
> +
> +       Assert(sub_fmt);
> +
> +       /*
> +        * Check if the format string is 'present' and if yes, store the boolean
> +        * value
> +        */
> +       if (strcmp(sub_fmt, "present") == 0)
> +       {
> +               is_present_flag = true;
> +               tree->present = value;
> +       }
> +
> +       if (!is_present_flag)
> +               object_name = append_object_to_format_string(tree, 
> + sub_fmt);
> +
> +       param = new_object(ObjTypeBool, object_name);
> +       param->value.boolean = value;
> +       append_premade_object(tree, param); }
> 
> By changing it to something like below:
> +static void
> +append_bool_object(ObjTree *tree, char *sub_fmt, bool value) {
> +       ObjElem    *param;
> +       char       *object_name = sub_fmt;
> +
> +       Assert(sub_fmt);
> +
> +       /*
> +        * Check if the format string is 'present' and if yes, store the boolean
> +        * value
> +        */
> +       if (strcmp(sub_fmt, "present") == 0)
> +       {
> +               tree->present = value;
> +               object_name = append_object_to_format_string(tree, sub_fmt);
> +       }
> +
> +       param = new_object(ObjTypeBool, object_name);
> +       param->value.boolean = value;
> +       append_premade_object(tree, param); }

Changed.

> 2) We could remove the temporary variable tmp_str here:
> +       if (start_ptr != NULL && end_ptr != NULL)
> +       {
> +               length = end_ptr - start_ptr - 1;
> +               tmp_str = (char *) palloc(length + 1);
> +               strncpy(tmp_str, start_ptr + 1, length);
> +               tmp_str[length] = '\0';
> +               appendStringInfoString(&object_name, tmp_str);
> +               pfree(tmp_str);
> +       }
> 
> by changing to:
> +       if (start_ptr != NULL && end_ptr != NULL)
> +               appendBinaryStringInfo(&object_name, start_ptr + 1,
> end_ptr - start_ptr - 1);

Changed.

> 3) I did not see the usage of ObjTypeFloat type used anywhere, we 
> could remove it:
> +typedef enum
> +{
> +       ObjTypeNull,
> +       ObjTypeBool,
> +       ObjTypeString,
> +       ObjTypeArray,
> +       ObjTypeInteger,
> +       ObjTypeFloat,
> +       ObjTypeObject
> +} ObjType;

Removed.

> 4) I noticed that none of the file names in src/backend/commands uses 
> "_" in the filenames, but in case of ddl_deparse.c and ddl_json.c we 
> have used "_", it might be better to be consistent with other 
> filenames in this directory:
> 
> diff --git a/src/backend/commands/Makefile 
> b/src/backend/commands/Makefile index 48f7348f91..171dfb2800 100644
> --- a/src/backend/commands/Makefile
> +++ b/src/backend/commands/Makefile
> @@ -29,6 +29,8 @@ OBJS = \
>         copyto.o \
>         createas.o \
>         dbcommands.o \
> +       ddl_deparse.o \
> +       ddl_json.o \
>         define.o \
>         discard.o \
>         dropcmds.o \

Changed.

> 5) The following includes are no more required in ddl_deparse.c as we 
> have removed support for deparsing of other objects:
> #include "catalog/pg_am.h"
> #include "catalog/pg_aggregate.h"
> #include "catalog/pg_authid.h"
> #include "catalog/pg_cast.h"
> #include "catalog/pg_conversion.h"
> #include "catalog/pg_depend.h"
> #include "catalog/pg_extension.h"
> #include "catalog/pg_foreign_data_wrapper.h"
> #include "catalog/pg_foreign_server.h"
> #include "catalog/pg_language.h"
> #include "catalog/pg_largeobject.h"
> #include "catalog/pg_opclass.h"
> #include "catalog/pg_operator.h"
> #include "catalog/pg_opfamily.h"
> #include "catalog/pg_policy.h"
> #include "catalog/pg_range.h"
> #include "catalog/pg_rewrite.h"
> #include "catalog/pg_sequence.h"
> #include "catalog/pg_statistic_ext.h"
> #include "catalog/pg_transform.h"
> #include "catalog/pg_ts_config.h"
> #include "catalog/pg_ts_dict.h"
> #include "catalog/pg_ts_parser.h"
> #include "catalog/pg_ts_template.h"
> #include "catalog/pg_user_mapping.h"
> #include "foreign/foreign.h"
> #include "mb/pg_wchar.h"
> #include "nodes/nodeFuncs.h"
> #include "nodes/parsenodes.h"
> #include "parser/parse_type.h"

Removed.

===
About the comments in [2]:

> 1) We could add a space after the 2nd parameter
> + * Note we don't have the luxury of sprintf-like compiler warnings 
> +for
> + * malformed argument lists.
> + */
> +static ObjTree *
> +new_objtree_VA(char *fmt, int numobjs,...)

Changed.

> 2) I felt objtree_to_jsonb_element is a helper function for 
> objtree_to_jsonb_rec, we should update the comments accordingly:
> +/*
> + * Helper for objtree_to_jsonb: process an individual element from an 
> +object or
> + * an array into the output parse state.
> + */
> +static void
> +objtree_to_jsonb_element(JsonbParseState *state, ObjElem *object,
> +                                                JsonbIteratorToken 
> +elem_token) {
> +       JsonbValue      val;
> +
> +       switch (object->objtype)
> +       {
> +               case ObjTypeNull:
> +                       val.type = jbvNull;
> +                       pushJsonbValue(&state, elem_token, &val);
> +                       break;

Changed.

> 3) domainId parameter change should be removed from the first patch:
> +static List *
> +obtainConstraints(List *elements, Oid relationId, Oid domainId,
> +                                 ConstraintObjType objType) {
> +       Relation        conRel;
> +       ScanKeyData key;
> +       SysScanDesc scan;
> +       HeapTuple       tuple;
> +       ObjTree    *constr;
> +       Oid                     relid;
> +
> +       /* Only one may be valid */
> +       Assert(OidIsValid(relationId) ^ OidIsValid(domainId));
> +
> +       relid = OidIsValid(relationId) ? ConstraintRelidTypidNameIndexId :
> +                       ConstraintTypidIndexId;

Removed in the last version.

> 4) Do we have any scenario for CONSTRAINT_TRIGGER in table/index, if 
> so could we add a test for this?
> +                       case CONSTRAINT_UNIQUE:
> +                               contype = "unique";
> +                               break;
> +                       case CONSTRAINT_TRIGGER:
> +                               contype = "trigger";
> +                               break;
> +                       case CONSTRAINT_EXCLUSION:
> +                               contype = "exclusion";
> +                               break;

No. Moved from 0001 patch to later patch (the deparser for the rest commands).

> 5) The below code adds information about compression but the comment 
> says "USING clause", the comment should be updated accordingly:
> +       /* USING clause */
> +       tmp_obj = new_objtree("COMPRESSION");
> +       if (coldef->compression)
> +               append_string_object(tmp_obj, 
> + "%{compression_method}I",
> +
> "compression_method", coldef->compression);
> +       else
> +       {
> +               append_null_object(tmp_obj, "%{compression_method}I");
> +               append_not_present(tmp_obj);
> +       }
> +       append_object_object(ret, "%{compression}s", tmp_obj);

Changed.

> 6) Generally we add append_null_object followed by append_not_present, 
> but it is not present for "COLLATE" handling, is this correct?
> +       tmp_obj = new_objtree("COMPRESSION");
> +       if (coldef->compression)
> +               append_string_object(tmp_obj, 
> + "%{compression_method}I",
> +
> "compression_method", coldef->compression);
> +       else
> +       {
> +               append_null_object(tmp_obj, "%{compression_method}I");
> +               append_not_present(tmp_obj);
> +       }
> +       append_object_object(ret, "%{compression}s", tmp_obj);
> +
> +       tmp_obj = new_objtree("COLLATE");
> +       if (OidIsValid(typcollation))
> +               append_object_object(tmp_obj, "%{name}D",
> +
> new_objtree_for_qualname_id(CollationRelationId,
> +
>                                           typcollation));
> +       else
> +               append_not_present(tmp_obj);
> +       append_object_object(ret, "%{collation}s", tmp_obj);

Changed.

> 7) I felt attrTup can be released after get_atttypetypmodcoll as we 
> are not using the tuple after that, I'm not sure if it is required to 
> hold the reference to this tuple till the end of the function:
> +static ObjTree *
> +deparse_ColumnDef_typed(Relation relation, List *dpcontext, ColumnDef
> *coldef)
> +{
> +       ObjTree    *ret = NULL;
> +       ObjTree    *tmp_obj;
> +       Oid                     relid = RelationGetRelid(relation);
> +       HeapTuple       attrTup;
> +       Form_pg_attribute attrForm;
> +       Oid                     typid;
> +       int32           typmod;
> +       Oid                     typcollation;
> +       bool            saw_notnull;
> +       ListCell   *cell;
> +
> +       attrTup = SearchSysCacheAttName(relid, coldef->colname);
> +       if (!HeapTupleIsValid(attrTup))
> +               elog(ERROR, "could not find cache entry for column
> \"%s\" of relation %u",
> +                        coldef->colname, relid);
> +       attrForm = (Form_pg_attribute) GETSTRUCT(attrTup);
> +
> +       get_atttypetypmodcoll(relid, attrForm->attnum,
> +                                                 &typid, &typmod,
> &typcollation);
> +
> +       /*
> +        * Search for a NOT NULL declaration. As in deparse_ColumnDef,
> we rely on
> +        * finding a constraint on the column rather than coldef->is_not_null.
> +        * (This routine is never used for ALTER cases.)
> +        */
> +       saw_notnull = false;
> +       foreach(cell, coldef->constraints)
> +       {
> +               Constraint *constr = (Constraint *) lfirst(cell);
> +
> +               if (constr->contype == CONSTR_NOTNULL)
> +               {
> +                       saw_notnull = true;
> +                       break;
> +               }
> +       }

'attrTup' can not be released earlier as it is used till the end in the form of 'attrForm'.

> 8) This looks like ALTER TABLE ... SET/RESET, the function header 
> should be updated accordingly:
> /*
>  * ... ALTER COLUMN ... SET/RESET (...)
>  *
>  * Verbose syntax
>  * RESET|SET (%{options:, }s)
>  */
> static ObjTree *
> deparse_RelSetOptions(AlterTableCmd *subcmd) {
> List    *sets = NIL;
> ListCell   *cell;
> bool is_reset = subcmd->subtype == AT_ResetRelOptions;

Changed.

> 9) Since we don't replicate temporary tables, is this required:
> +/*
> + * Deparse the ON COMMIT ... clause for CREATE ... TEMPORARY ...
> + *
> + * Verbose syntax
> + * ON COMMIT %{on_commit_value}s
> + */
> +static ObjTree *
> +deparse_OnCommitClause(OnCommitAction option) {
> +       ObjTree    *ret  = new_objtree("ON COMMIT");
> +       switch (option)

I will consider this in the next version.

> 10) Since we don't support MATERIALIZED VIEW, VIEW and FOREIGN TABLE, 
> they can be removed:
> +       switch (rel->rd_rel->relkind)
> +       {
> +               case RELKIND_RELATION:
> +               case RELKIND_PARTITIONED_TABLE:
> +                       reltype = "TABLE";
> +                       break;
> +               case RELKIND_INDEX:
> +               case RELKIND_PARTITIONED_INDEX:
> +                       reltype = "INDEX";
> +                       break;
> +               case RELKIND_VIEW:
> +                       reltype = "VIEW";
> +                       break;
> +               case RELKIND_COMPOSITE_TYPE:
> +                       reltype = "TYPE";
> +                       istype = true;
> +                       break;
> +               case RELKIND_FOREIGN_TABLE:
> +                       reltype = "FOREIGN TABLE";
> +                       break;
> +               case RELKIND_MATVIEW:
> +                       reltype = "MATERIALIZED VIEW";
> +                       break;

The code is changed in a way that these are no-op and return NULL instead of returning a valid value. I think this
shouldsuffice.
 
Please let me know if you think these must be removed.

===
About the comments in [3]:

> 1) since missing_ok is passed as false, if there is an error the error 
> will be handled in find_string_in_jsonbcontainer, "missing operator 
> name" handling can be removed from here:
> +/*
> + * Expand a JSON value as an operator name. The value may contain 
> +element
> + * "schemaname" (optional).
> + */
> +static void
> +expand_jsonval_operator(StringInfo buf, JsonbValue *jsonval) {
> +       char       *str;
> +       JsonbContainer *data = jsonval->val.binary.data;
> +
> +       str = find_string_in_jsonbcontainer(data, "schemaname", true, NULL);
> +       /* Schema might be NULL or empty */
> +       if (str != NULL && str[0] != '\0')
> +       {
> +               appendStringInfo(buf, "%s.", quote_identifier(str));
> +               pfree(str);
> +       }
> +
> +       str = find_string_in_jsonbcontainer(data, "objname", false, NULL);
> +       if (!str)
> +               ereport(ERROR,
> +                               errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                               errmsg("missing operator name"));
> +

Removed.

> 2) This should be present at the beginning of the file before the functions:
> +#define ADVANCE_PARSE_POINTER(ptr,end_ptr) \
> +       do { \
> +               if (++(ptr) >= (end_ptr)) \
> +                       ereport(ERROR, \
> +
> errcode(ERRCODE_INVALID_PARAMETER_VALUE), \
> +                                       errmsg("unterminated format
> specifier")); \
> +       } while (0)
> +

Changed.

> 3) Should we add this to the documentation, we have documented other 
> event triggers like ddl_command_start, ddl_command_end, table_rewrite 
> and sql_drop at [1]:
>  +       runlist = EventTriggerCommonSetup(command->parsetree,
> +
>    EVT_TableInitWrite,
> +
>    "table_init_write",
> +
>    &trigdata);

Added.

> 4) The inclusion of stringinfo.h is not required, I was able to 
> compile the code without it:
> + *       src/backend/commands/ddl_json.c
> + *
> + 
> +*--------------------------------------------------------------------
> +-----
> + */
> +#include "postgres.h"
> +
> +#include "lib/stringinfo.h"
> +#include "tcop/ddl_deparse.h"
> +#include "utils/builtins.h"
> +#include "utils/jsonb.h"

Removed.

> 5) schema and typmodstr might not be allocated, should we still call pfree?
> +       schema = find_string_in_jsonbcontainer(data, "schemaname", true, NULL);
> +       typename = find_string_in_jsonbcontainer(data, "typename", false, NULL);
> +       typmodstr = find_string_in_jsonbcontainer(data, "typmod", true, NULL);
> +       is_array = find_bool_in_jsonbcontainer(data, "typarray");
> +       switch (is_array)
> +       {
> +               case tv_true:
> +                       array_decor = "[]";
> +                       break;
> +
> +               case tv_false:
> +                       array_decor = "";
> +                       break;
> +
> +               case tv_absent:
> +               default:
> +                       ereport(ERROR,
> +
> errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +                                       errmsg("missing typarray element"));
> +       }
> +
> +       if (schema == NULL)
> +               appendStringInfo(buf, "%s", quote_identifier(typename));
> +       else if (schema[0] == '\0')
> +               appendStringInfo(buf, "%s", typename);  /* Special
> typmod needs */
> +       else
> +               appendStringInfo(buf, "%s.%s", quote_identifier(schema),
> +                                                
> + quote_identifier(typename));
> +
> +       appendStringInfo(buf, "%s%s", typmodstr ? typmodstr : "", array_decor);
> +       pfree(schema);
> +       pfree(typename);
> +       pfree(typmodstr);

Changed code to do NULL initialization in the begining and call pfree only if these are allocated.

> 6) SHould the following from ddl_deparse_expand_command function 
> header be moved to expand_one_jsonb_element function header, as the 
> specified are being handled in expand_one_jsonb_element.
> * % expand to a literal %
>  * I expand as a single, non-qualified identifier
>  * D expand as a possibly-qualified identifier
>  * T expand as a type name
>  * O expand as an operator name
>  * L expand as a string literal (quote using single quotes)
>  * s expand as a simple string (no quoting)
>  * n expand as a simple number (no quoting)
>  * R expand as a role name (possibly quoted name, or PUBLIC)

It seems more suitable here as this is the exposed function
and thus complete details here make sense. But I have changed comment little
bit to indicate that actual conversion work happens in expand_one_jsonb_element().
Please let me know if you still think it is better to move the comments.

>  7) In ddl_deparse.c we have used elog(ERROR, ...) for error handling 
> and in ddl_json.c we have used ereport(ERROR, ...) for error handling, 
> Is this required for any special handling?

I checked occurences of elog in ddl_deparse.c. That looked appropriate to me
as they are internal errors which we usually don't expect user to encounter.
In fact I changed few of ereport added earlier by me to elog to be consistent.
The occurences of ereport in ddl_json.c  also looks fine to me. 
Please let me know if you want any specific error handling to be changed.

> 8) Is this required as part of create table implementation:
> 8.a)
> +/*
> + * EventTriggerAlterTypeStart
> + *             Save data about a single part of an ALTER TYPE.
> + *
> + * ALTER TABLE can have multiple subcommands which might include DROP
> COLUMN
> + * command and ALTER TYPE referring the drop column in USING expression.
> + * As the dropped column cannot be accessed after the execution of 
> + DROP
> COLUMN,
> + * a special trigger is required to handle this case before the drop 
> +column is
> + * executed.
> + */
> +void
> +EventTriggerAlterTypeStart(AlterTableCmd *subcmd, Relation rel) {
> 
> 8.b)
> +/*
> + * EventTriggerAlterTypeEnd
> + *             Finish up saving an ALTER TYPE command, and add it to
> command list.
> + */
> +void
> +EventTriggerAlterTypeEnd(Node *subcmd, ObjectAddress address, bool 
> +rewrite)

It is used to store the using expression of alter column type command,
and the using expression(usingexpr) is used in the deparser. So I think it
cannot be removed/moved to other patches.

> 9) Since we need only the table and index related implementation in 
> 0001, the rest can be moved to a different patch accordingly:
> +/*
> + * Return the given object type as a string.
> + *
> + * If isgrant is true, then this function is called while deparsing 
> +GRANT
> + * statement and some object names are replaced.
> + */
> +const char *
> +stringify_objtype(ObjectType objtype, bool isgrant) {
> +       switch (objtype)
> +       {
> +               case OBJECT_AGGREGATE:
> +                       return "AGGREGATE";
> +               case OBJECT_CAST:
> +                       return "CAST";
> +               case OBJECT_COLLATION:
> +                       return "COLLATION";
> +               case OBJECT_COLUMN:
> +                       return isgrant ? "TABLE" : "COLUMN";

Changed.

> 10) json_trivalue should be added to typedefs:
> +typedef enum
> +{
> +       tv_absent,
> +       tv_true,
> +       tv_false
> +}                      json_trivalue;

Added.

Attach the new patch set and thanks Shveta for helping
address the above comments.

Apart from above comments.
The new version patch also did the following changes:

- Fixed the cfbot warnings about the header file.
- Added documents for ALTER TABLE rewrite restrictions.
- Fixed the crash when applying create index concurrently.
- Used the consistent API as DML apply when switch the current role
 to the another before applying.

[1] - https://www.postgresql.org/message-id/CALDaNm1aTHyeMfmkyunq%3DHZ6dyOJNqgszhmsLkeVMEgWfJ8frA%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CALDaNm1RnYRfzsL4GfznU4zuoPMkgnAAM8Ons3kCtLr2Tdzoow%40mail.gmail.com
[3] - https://www.postgresql.org/message-id/CALDaNm1NeyTrCDizXHvUqhbOdH2%3D%2Bf%3DR8aX3r0AbDr7rRJeQAA%40mail.gmail.com

Best Regards,
Hou zj

Attachment

pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Request for comment on setting binary format output per session
Next
From: Tom Lane
Date:
Subject: Re: Remove references to pre-11 versions