Re: Support logical replication of DDLs - Mailing list pgsql-hackers

From vignesh C
Subject Re: Support logical replication of DDLs
Date
Msg-id CALDaNm1WEnw2Oykb90PO1c4oDAVrAR+16W8Cm_F-KzgNvqmmKg@mail.gmail.com
Whole thread Raw
In response to Re: Support logical replication of DDLs  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On Fri, 11 Nov 2022 at 10:39, Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Fri, Nov 11, 2022 at 3:47 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Here are more review comments for the v32-0001 file ddl_deparse.c
> >
> > *** NOTE - my review post became too big, so I split it into smaller parts.
>
> THIS IS PART 2 OF 4.
>
> =======
>
> src/backend/commands/ddl_deparse.c
>
> 1. deparse_AlterExtensionStmt
>
> +/*
> + * Deparse an AlterExtensionStmt (ALTER EXTENSION .. UPDATE TO VERSION)
> + *
> + * Given an extension  OID and a parse tree that modified it, return an ObjTree
> + * representing the alter type.
> + */
> +static ObjTree *
> +deparse_AlterExtensionStmt(Oid objectId, Node *parsetree)
>
> Spurious blank space before "OID"

Modified

> 2.
>
> + ObjTree    *stmt;
> + ObjTree    *tmp;
> + List    *list = NIL;
> + ListCell   *cell;
>
> Variable 'tmp' can be declared only in the scope that it is used.

Modified

> 3.
>
> + foreach(cell, node->options)
> + {
> + DefElem    *opt = (DefElem *) lfirst(cell);
> +
> + if (strcmp(opt->defname, "new_version") == 0)
> + {
> + tmp = new_objtree_VA("TO %{version}L", 2,
> + "type", ObjTypeString, "version",
> + "version", ObjTypeString, defGetString(opt));
> + list = lappend(list, new_object_object(tmp));
> + }
> + else
> + elog(ERROR, "unsupported option %s", opt->defname);
> + }
>
> This code seems strange to be adding new versions to a list. How can
> there be multiple new versions? It does not seem compatible with the
> command syntax [1]

Modified

> 4. deparse_CreateCastStmt
>
> + initStringInfo(&func);
> + appendStringInfo(&func, "%s(",
> + quote_qualified_identifier(get_namespace_name(funcForm->pronamespace),
> + NameStr(funcForm->proname)));
> + for (i = 0; i < funcForm->pronargs; i++)
> + appendStringInfoString(&func,
> +    format_type_be_qualified(funcForm->proargtypes.values[i]));
> + appendStringInfoChar(&func, ')');
>
> Is this correct, or should there be some separators (e.g. commas)
> between multiple arg-types?

Modified

> 5. deparse_AlterDefaultPrivilegesStmt
>
> +
> +static ObjTree *
> +deparse_AlterDefaultPrivilegesStmt(CollectedCommand *cmd)
>
> Missing function comment

Modified

> 6.
>
> + schemas = lappend(schemas,
> +   new_string_object(strVal(val)));
>
> Unnecessary wrapping.

Modified

> 7.
>
> + /* Add the IN SCHEMA clause, if any */
> + tmp = new_objtree("IN SCHEMA");
> + append_array_object(tmp, "%{schemas:, }I", schemas);
> + if (schemas == NIL)
> + append_bool_object(tmp, "present", false);
> + append_object_object(alterStmt, "%{in_schema}s", tmp);
> +
> + /* Add the FOR ROLE clause, if any */
> + tmp = new_objtree("FOR ROLE");
> + append_array_object(tmp, "%{roles:, }R", roles);
> + if (roles == NIL)
> + append_bool_object(tmp, "present", false);
> + append_object_object(alterStmt, "%{for_roles}s", tmp);
>
>
> I don't really understand why the logic prefers to add a whole new
> empty tree with "present: false" versus just adding nothing at all
> unless it is relevant.

This code was intended to generate a verbose json node. So that user
can easily modify the command by changing the value to generate a new
ddl command.

> 8.
>
> + if (stmt->action->is_grant)
> + grant = new_objtree("GRANT");
> + else
> + grant = new_objtree("REVOKE");
> +
> + /* add the GRANT OPTION clause for REVOKE subcommand */
> + if (!stmt->action->is_grant)
> + {
> + tmp = new_objtree_VA("GRANT OPTION FOR",
> +    1, "present", ObjTypeBool,
> +    stmt->action->grant_option);
> + append_object_object(grant, "%{grant_option}s", tmp);
> + }
>
> That 2nd 'if' can just be combined with the 'else' logic of the prior if.

Modified

> 9.
>
> + Assert(priv->cols == NIL);
> + privs = lappend(privs,
> + new_string_object(priv->priv_name));
>
> Unnecessary wrapping.

Modified

> 10. deparse_AlterTableStmt
>
> Maybe this function name should be different because it is not only
> for TABLEs but also serves for INDEX, VIEW, TYPE, etc

Changed the function name

> 11.
>
> AFAICT every case in the switch (subcmd->subtype) is doing subcmds =
> lappend(subcmds, new_object_object(tmpobj));
>
> Just doing this in common code at the end might be an easy way to
> remove ~50 lines of duplicate code.

There are fewer cases where we don't do anything:
case AT_ReAddIndex:
case AT_ReAddConstraint:
case AT_ReAddComment:
case AT_ReplaceRelOptions:
case AT_CheckNotNull:
case AT_ReAddStatistics:
/* Subtypes used for internal operations; nothing to do here */
break;

case AT_AddColumnToView:
/* CREATE OR REPLACE VIEW -- nothing to do here */
break;

This lappend cannot be  moved to end as it is not applicable for all the cases.

> 12. deparse_ColumnDef
>
> + * NOT NULL constraints in the column definition are emitted directly in the
> + * column definition by this routine; other constraints must be emitted
> + * elsewhere (the info in the parse node is incomplete anyway.).
> + */
> +static ObjTree *
> +deparse_ColumnDef(Relation relation, List *dpcontext, bool composite,
> +   ColumnDef *coldef, bool is_alter, List **exprs)
>
> "anyway.)." -> "anyway)."

Modified

> 13.
>
> + /* USING clause */
> + tmpobj = new_objtree("COMPRESSION");
> + if (coldef->compression)
> + append_string_object(tmpobj, "%{compression_method}I", coldef->compression);
> + else
> + {
> + append_null_object(tmpobj, "%{compression_method}I");
> + append_bool_object(tmpobj, "present", false);
> + }
>
> Why is it necessary to specify a NULL compression method if the entire
> "COMPRESSION" is anyway flagged as present=false?

This code was intended to generate a verbose json node. So that user
can easily modify the command by changing the value to generate a new
ddl command.

> 14.
>
> + foreach(cell, coldef->constraints)
> + {
> + Constraint *constr = (Constraint *) lfirst(cell);
> +
> + if (constr->contype == CONSTR_NOTNULL)
> + saw_notnull = true;
> + }
>
> Why not break immediately from this loop the first time you find
> 'saw_notnull' true?

Modified

> 15.
>
> + tmpobj = new_objtree("DEFAULT");
> + if (attrForm->atthasdef)
> + {
> + char    *defstr;
> +
> + defstr = RelationGetColumnDefault(relation, attrForm->attnum,
> +   dpcontext, exprs);
> +
> + append_string_object(tmpobj, "%{default}s", defstr);
> + }
> + else
> + append_bool_object(tmpobj, "present", false);
> + append_object_object(column, "%{default}s", tmpobj);
>
> Something seems a bit strange here. It looks like there are formats
> called "%{default}s" at 2 levels in this tree, so will it cause a
> hierarchy of objects with the same name?

Yes it will have contents something like:
"default": {"fmt": "DEFAULT %{default}s", "default": "11"}}}],

I don't not find any issues even though it has same name, it is able
to replicate the statement without any issue

> 16. deparse_ColumnIdentity
>
> + column = new_objtree("");
> +
> + if (!OidIsValid(seqrelid))
> + {
> + append_bool_object(column, "present", false);
> + return column;
> + }
>
> I don't really understand the point of making empty tree structures
> for not "present" elements. IIUC this is just going to make the tree
> bigger for no reason and all these not "present" branches will be
> ultimately thrown away, right? I guess the justification is that it
> might be for debugging/documentation but that does not really stand up
> in this case because it seems like just a nameless tree here.

Modified

> 17. deparse_CreateDomain
>
> + createDomain = new_objtree("CREATE");
> +
> + append_object_object(createDomain,
> + "DOMAIN %{identity}D AS",
> + new_objtree_for_qualname_id(TypeRelationId,
> + objectId));
> + append_object_object(createDomain,
> + "%{type}T",
> + new_objtree_for_type(typForm->typbasetype, typForm->typtypmod));
> +
> + if (typForm->typnotnull)
> + append_string_object(createDomain, "%{not_null}s", "NOT NULL");
> + else
> + append_string_object(createDomain, "%{not_null}s", "");
>
> 17a.
> I don't understand why this is not just a single _VA() call instead of
> spread over multiple append_objects like this.

Modified

> 17b.
> In other places, something like the "%{not_null}s" is done with a
> ternary operator instead of the excessive if/else.

Modified

> 18. deparse_CreateFunction
>
> + if (isnull)
> + probin = NULL;
> + else
> + {
> + probin = TextDatumGetCString(tmpdatum);
> + if (probin[0] == '\0' || strcmp(probin, "-") == 0)
> + probin = NULL;
> + }
>
> Maybe it is simpler to assign prbin = NULL where it is declared, then
> here you only need to test the !isnull case.

Modified

> 19.
>
> + append_string_object(createFunc, "%{or_replace}s",
> + node->replace ? "OR REPLACE" : "");
>
> It is not clear to me what is the point of such code - I mean if
> node->replace is false why do append at all? ... Why not use
> appen_format_string() instead()?
> My guess is that this way is preferred to simplify the calling code,
> but knowing that a "" value will just do nothing anyway - seems an
> overcomplicated way to do it though.

We generally use this along with new_objtree_VA, in this case it was
not used along with new_objtree_VA. I have changed it to
new_objtree_VA so that it can be combined

> 20.
>
> + typarray = palloc(list_length(node->parameters) * sizeof(Oid));
> + if (list_length(node->parameters) > procForm->pronargs)
> + {
> + Datum alltypes;
> + Datum    *values;
> + bool    *nulls;
> + int nelems;
> +
> + alltypes = SysCacheGetAttr(PROCOID, procTup,
> +    Anum_pg_proc_proallargtypes, &isnull);
> + if (isnull)
> + elog(ERROR, "NULL proallargtypes, but more parameters than args");
> + deconstruct_array(DatumGetArrayTypeP(alltypes),
> +   OIDOID, 4, 't', 'i',
> +   &values, &nulls, &nelems);
> + if (nelems != list_length(node->parameters))
> + elog(ERROR, "mismatched proallargatypes");
> + for (i = 0; i < list_length(node->parameters); i++)
> + typarray[i] = values[i];
> + }
> + else
> + {
> + for (i = 0; i < list_length(node->parameters); i++)
> + typarray[i] = procForm->proargtypes.values[i];
> + }
>
> The list_length(node->parameters) is used multiple times here; it
> might have been cleaner code to assign that to some local variable.

Modified

> 21.
>
> + * Note that %{name}s is a string here, not an identifier; the reason
> + * for this is that an absent parameter name must produce an empty
> + * string, not "", which is what would happen if we were to use
> + * %{name}I here.  So we add another level of indirection to allow us
> + * to inject a "present" parameter.
> + */
>
> The above comment says:
> must produce an empty string, not ""
>
> I didn't get the point - what is the difference between an empty string and ""?

if we specify as  %{variable}s and if variable is specified as "", the
append_XXX functions will add a " " while deparsing json to string in
the subscriber side. It is not intended to add " " in this case.
I have changed it to below which is better and being followed
similarly in other places too:
if (param->name)
append_string_object(name, "%{name}I", param->name);
else
{
append_null_object(name, "%{name}I");
append_bool_object(name, "present", false);
}

append_object_object(paramobj, "%{name}s", name);

I have removed the comment which is confusing and changed it to above
which is common way that is used in these scenarios

> 22.
>
> + append_string_object(paramobj, "%{mode}s",
> + param->mode == FUNC_PARAM_IN ? "IN" :
> + param->mode == FUNC_PARAM_OUT ? "OUT" :
> + param->mode == FUNC_PARAM_INOUT ? "INOUT" :
> + param->mode == FUNC_PARAM_VARIADIC ? "VARIADIC" :
> + "IN");
>
> There doesn't seem to be much point to test for param->mode ==
> FUNC_PARAM_IN here since "IN" is the default mode anyway.

Modified

> 23.
>
> + name = new_objtree("");
> + append_string_object(name, "%{name}I",
> + param->name ? param->name : "NULL");
> +
> + append_bool_object(name, "present",
> +    param->name ? true : false);
>
> IIUC it is uncommon to inject a "present" object if it was "true", so
> why do it like that here?

Modified to keep it similar as we do in other places

> 24.
>
> + append_format_string(tmpobj, "(");
> + append_array_object(tmpobj, "%{arguments:, }s", params);
> + append_format_string(tmpobj, ")");
>
> Is it necessary to do that in 3 lines? IIUC it would be the same if
> the parens were just included in the append_array_object format,
> right?

There is a possibility that arguments is null, if it is null then
append_array_object will not add "(" and ")". That is the reason we
need to keep it separate.

> 25.
>
> + if (procForm->prosupport)
> + {
> + Oid argtypes[1];
> +
> + /*
> + * We should qualify the support function's name if it wouldn't be
> + * resolved by lookup in the current search path.
> + */
> + argtypes[0] = INTERNALOID;
>
> Might as well just declare this as:
>
> Oid argtypes[] = { INTERNALOID };

Modified

> 26. deparse_CreateOpClassStmt
>
> +
> + stmt = new_objtree_VA("CREATE OPERATOR CLASS %{identity}D", 1,
> +   "identity", ObjTypeObject,
> +   new_objtree_for_qualname(opcForm->opcnamespace,
> +    NameStr(opcForm->opcname)));
> +
> + /* Add the DEFAULT clause */
> + append_string_object(stmt, "%{default}s",
> + opcForm->opcdefault ? "DEFAULT" : "");
> +
> + /* Add the FOR TYPE clause */
> + append_object_object(stmt, "FOR TYPE %{type}T",
> + new_objtree_for_type(opcForm->opcintype, -1));
> +
> + /* Add the USING clause */
> + append_string_object(stmt, "USING %{amname}I",
> get_am_name(opcForm->opcmethod));
>
> This can all be done just as a single VA call I think.

Modified

> 27.
>
> + append_format_string(tmpobj, "(");
> + append_array_object(tmpobj, "%{argtypes:, }T", arglist);
> + append_format_string(tmpobj, ")");
>
> AFAIK this can just be done by a single call including the parens in
> the format string of appen_array_object.

There is a possibility that arguments is null, if it is null then
append_array_object will not add "(" and ")". That is the reason we
need to keep it separate.

> 28. deparse_CreatePolicyStmt
>
> +
> +static ObjTree *
> +deparse_CreatePolicyStmt(Oid objectId, Node *parsetree)
>
> Missing function comment.

Modified

> 29.
>
> + /* Add the rest of the stuff */
> + add_policy_clauses(policy, objectId, node->roles, !!node->qual,
> +    !!node->with_check);
>
> The !! to cast the pointer parameter to boolean is cute, but IIUC that
> is not commonly used in the PG source. Maybe it is more conventional
> to just pass node->qual != NULL etc?

Modified

> 30. deparse_AlterPolicyStmt
>
> +
> +static ObjTree *
> +deparse_AlterPolicyStmt(Oid objectId, Node *parsetree)
>
> Missing function comment.

Modified

> 31.
>
> + /* Add the rest of the stuff */
> + add_policy_clauses(policy, objectId, node->roles, !!node->qual,
> +    !!node->with_check);
>
> The !! to cast the pointer parameter to boolean is cute, but IIUC that
> technique is not commonly used in the PG source. Maybe it is more
> conventional to just pass node->qual != NULL etc?

Modifed

> 32. deparse_CreateSchemaStmt
>
> + else
> + {
> + append_null_object(auth, "%{authorization_role}I ");
> + append_bool_object(auth, "present", false);
> + }
>
> 32a.
> Why append a NULL object if the "present" says it is false anyway?

This code was intended to generate a verbose json node. So that user
can easily modify the command by changing the value to generate a new
ddl command.

> 32b.
> "%{authorization_role}I " -- why do they have extra space on the end?
> Just let the append_XXX functions can take care of the space
> separators automagically instead.

Modified

> 33. deparse_AlterDomainStmt
>
> + {
> + fmt = "ALTER DOMAIN";
> + type = "drop default";
> + alterDom = new_objtree_VA(fmt, 1, "type", ObjTypeString, type);
>
> This code style of assigning the 'fmt' and 'type' like this is not
> typical of all the other deparse_XXX functions which just pass
> parameter literals. Also, I see no good reason that the 'fmt' is
> unconditionally assigned to "ALTER DOMAIN" in 6 different places.

Removed it

> 34.
>
> AFAICT all these cases can be simplified to use single VA() calls and
> remove all the append_XXX.

Modified

> 35.
>
> +
> + break;
> + case 'N':
>
> Spurious or misplaced blank line.

Modified

> 36.
>
> + case 'C':
> +
> + /*
> + * ADD CONSTRAINT.  Only CHECK constraints are supported by
> + * domains
> + */
>
> A spurious blank line is inconsistent with the other cases.

Modified

> 36.
>
> +
> + break;
> + default:
>
> Spurious or misplaced blank line.

Modified

> 37. deparse_CreateStatisticsStmt
>
> + append_format_string(createStat, "FROM");
> +
> + append_object_object(createStat, "%{stat_table_identity}D",
> + new_objtree_for_qualname(get_rel_namespace(statform->stxrelid),
> +   get_rel_name(statform->stxrelid)));
>
> It would be easier to do things like this using a single call using a
> format of "FROM %{stat_table_identity}D", rather than have the extra
> append_format_string call.

Modified

> 38. deparse_CreateForeignServerStmt
>
> + /* Add a TYPE clause, if any */
> + tmp = new_objtree_VA("TYPE", 0);
> + if (node->servertype)
> + append_string_object(tmp, "%{type}L", node->servertype);
> + else
> + append_bool_object(tmp, "present", false);
> + append_object_object(createServer, "%{type}s", tmp);
> +
> + /* Add a VERSION clause, if any */
> + tmp = new_objtree_VA("VERSION", 0);
>
> Why use the VA() function if passing 0 args?

Modified

> 39.
>
> + append_string_object(createServer, "FOREIGN DATA WRAPPER %{fdw}I",
> node->fdwname);
> + /* add an OPTIONS clause, if any */
> + append_object_object(createServer, "%{generic_options}s",
> + deparse_FdwOptions(node->options, NULL));
>
> 39a.
> Use uppercase comment.

Modified

> 39b.
> Missing blank line above comment?

Modified

> 40. deparse_AlterForeignServerStmt
>
> + /* Add a VERSION clause, if any */
> + tmp = new_objtree_VA("VERSION", 0);
>
> Why use the VA() function if passing 0 args?

This was already fixed

> 41.
>
> + /* Add a VERSION clause, if any */
> + tmp = new_objtree_VA("VERSION", 0);
> + if (node->has_version && node->version)
> + append_string_object(tmp, "%{version}L", node->version);
> + else if (node->has_version)
> + append_string_object(tmp, "version", "NULL");
> + else
> + append_bool_object(tmp, "present", false);
> + append_object_object(alterServer, "%{version}s", tmp);
> +
> + /* Add a VERSION clause, if any */
> + tmp = new_objtree_VA("VERSION", 0);
> + if (node->has_version && node->version)
> + append_string_object(tmp, "%{version}L", node->version);
> + else if (node->has_version)
> + append_string_object(tmp, "version", "NULL");
> + else
> + append_bool_object(tmp, "present", false);
> + append_object_object(alterServer, "%{version}s", tmp);
>
> Huh? Looks like a cut/paste error of duplicate VERSION clauses. Is this correct?

Removed the duplicate code

> 42. deparse_CreateStmt
>
> + if (tableelts == NIL)
> + {
> + tmpobj = new_objtree("");
> + append_bool_object(tmpobj, "present", false);
> + }
> + else
> + tmpobj = new_objtree_VA("(%{elements:, }s)", 1,
> + "elements", ObjTypeArray, tableelts);
>
> This fragment seemed a bit complicated. IIUC this is the same as just:
>
> tmpobj = new_objtree("");
> if (tableelts)
> append_array_object(tmpobj, "(%{elements:, }s)", tableelts);
> else
> append_bool_object(tmpobj, "present", false);

Modified

> 43.
>
> + tmpobj = new_objtree("INHERITS");
> + if (list_length(node->inhRelations) > 0)
> + append_array_object(tmpobj, "(%{parents:, }D)",
> deparse_InhRelations(objectId));
> + else
> + {
> + append_null_object(tmpobj, "(%{parents:, }D)");
> + append_bool_object(tmpobj, "present", false);
> + }
> + append_object_object(createStmt, "%{inherits}s", tmpobj);
>
> 43a.
> AFAIK convention for checking non-empty List is just "if
> (node->inhRelations != NIL)" or simply "if (node->inhRelations)

Modified

> 43b.
> Maybe I misunderstand something but I don't see why append_null_object
> is needed for tree marked as "present"=false anyhow. This similar
> pattern happens multiple times in this function.

This code was intended to generate a verbose json node. So that user
can easily modify the command by changing the value to generate a new
ddl command.

> 44. deparse_DefineStmt
>
> + switch (define->kind)
> + {
>
> IMO better to put all these OBJECT_XXX cases in alphabetical order
> instead of just random.

Modified

> 45.
>
> + default:
> + elog(ERROR, "unsupported object kind");
> + }
>
> Should this also log what the define->kind was attempted?

Modified

> 46. deparse_DefineStmt_Collation
>
> + stmt = new_objtree_VA("CREATE COLLATION", 0);
> +
> + append_object_object(stmt, "%{identity}D",
> + new_objtree_for_qualname(colForm->collnamespace,
> +   NameStr(colForm->collname)));
>
> Why not combine there to avoid VA args with 0 and use VA args with 1 instead?

Modified

> 47.
>
> + if (fromCollid.objectId != InvalidOid)
>
> Use OisIsValid macro.

Modified

> 48.
>
> + append_object_object(stmt, "FROM %{from_identity}D",
> + new_objtree_for_qualname(fromColForm->collnamespace,
> +   NameStr(fromColForm->collname)));
> +
> +
> + ReleaseSysCache(tp);
> + ReleaseSysCache(colTup);
> + return stmt;
>
> Extra blank line.

Modified

> 49.
>
> + if (!isnull)
> + {
> + tmp = new_objtree_VA("LOCALE=", 1,
> + "clause", ObjTypeString, "locale");
> + append_string_object(tmp, "%{locale}L",
> + psprintf("%s", TextDatumGetCString(datum)));
>
> IMO it should be easy enough to express this using a single VA(2 args)
> function, so avoiding the extra append_string. e.g. other functions
> like deparse_DefineStmt_Operator do this.
>
> And this same comment also applies to the rest of this function:
> - tmp = new_objtree_VA("LC_COLLATE=", 1,
> - tmp = new_objtree_VA("LC_CTYPE=", 1,
> - tmp = new_objtree_VA("PROVIDER=", 1,
> - tmp = new_objtree_VA("PROVIDER=", 1,
> - tmp = new_objtree_VA("DETERMINISTIC=", 1,
> - tmp = new_objtree_VA("VERSION=", 1,

Modified

Thanks for the comments, the attached v40 patch has the changes for the same.

Regards,
Vignesh

Attachment

pgsql-hackers by date:

Previous
From: "Drouvot, Bertrand"
Date:
Subject: Re: Minimal logical decoding on standbys
Next
From: Ashutosh Bapat
Date:
Subject: Re: Avoid streaming the transaction which are skipped (in corner cases)