Re: Support logical replication of DDLs - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Support logical replication of DDLs
Date
Msg-id CAHut+Puxo_kq2toicNK_BQdeccK3REGW-Xv8tVauFvTNku6V-w@mail.gmail.com
Whole thread Raw
In response to Re: Support logical replication of DDLs  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: Support logical replication of DDLs
Re: Support logical replication of DDLs
List pgsql-hackers
Here are some more review comments for the v32-0001 file ddl_deparse.c

(This is a WIP since it is such a large file)

======

1. General - calling VA with 0 args

There are some calls to new_objtree_VA() where 0 extra args are passed.

e.g. See in deparse_AlterFunction
* alterFunc = new_objtree_VA("ALTER FUNCTION", 0);
* ObjTree    *tmpobj = new_objtree_VA("%{type}T", 0);
* tmpobj = new_objtree_VA(intVal(defel->arg) ? "RETURNS NULL ON NULL
INPUT" : "CALLED ON NULL INPUT", 0);
* tmpobj = new_objtree_VA(intVal(defel->arg) ? "SECURITY DEFINER" :
"SECURITY INVOKER", 0);
* tmpobj = new_objtree_VA(intVal(defel->arg) ? "LEAKPROOF" : "NOT
LEAKPROOF", 0);
* etc.

Shouldn't all those just be calling the new_objtree() function instead
of new_objtree_VA()?

Probably there are more than just those cited - please search for others.

~~~

2. General - when to call append_xxx functions?

I did not always understand what seems like an arbitrary choice of
function calls to append_xxx.

e.g. Function deparse_AlterObjectSchemaStmt() does:

+ append_string_object(alterStmt, "%{identity}s", ident);
+
+ append_string_object(alterStmt, "SET SCHEMA %{newschema}I", newschema);

Why doesn't the above just use new_objtree_VA instead -- it seemed to
me like the _VA function is underused in some places. Maybe search all
the append_xxx usage - I suspect many of those can in fact be combined
to use new_objtree_VA().

~~~

3. General - extract object names from formats

IIUC the calls to append_XXX__object will call deeper to
append_object_to_format_string(), which has a main side-effect loop to
extract the "name" part out of the sub_fmt string. But this logic all
seems inefficient and unnecessary to me. I think in most (all?) cases
the caller already knows what the object name should be, so instead of
making code work to figure it out again, it can just be passed in the
same way the _VA() function passes the known object name.

There are many cases of this:

e.g.

BEFORE
append_string_object(alterop, "(%{left_type}s", "NONE");

AFTER - just change the signature to pass the known object name
append_string_object(alterop, "(%{left_type}s", "left_type", "NONE");

~~~

4. General - fwd declares

static void append_array_object(ObjTree *tree, char *name, List *array);
static void append_bool_object(ObjTree *tree, char *name, bool value);
static void append_float_object(ObjTree *tree, char *name, float8 value);
static void append_null_object(ObjTree *tree, char *name);
static void append_object_object(ObjTree *tree, char *name, ObjTree *value);
static char *append_object_to_format_string(ObjTree *tree, char *sub_fmt);
static void append_premade_object(ObjTree *tree, ObjElem *elem);
static void append_string_object(ObjTree *tree, char *name, char *value);


I think those signatures are misleading. AFAIK seems what is called
the 'name' param above is often a 'sub_fmt' param in the
implementation.

~~~

5. General - inconsistent append_array object calls.

Sometimes enclosing brackets are included as part of the format string
to be appended and other times they are appended separately. IIUC
there is no difference but IMO the code should always be consistent to
avoid it being confusing.

e.g.1 (brackets in fmt)
append_array_object(tmpobj, "(%{rettypes:, }s)", rettypes);

e.g.2 (brackets appended separately)
+ append_format_string(tmpobj, "(");
+ append_array_object(tmpobj, "%{argtypes:, }T", arglist);
+ append_format_string(tmpobj, ")");

~~~

6. General - missing space before '('

I noticed a number of comment where there is a space missing before a '('.
Here are some examples:

- * An element of an object tree(ObjTree).
- * typmod at the middle of name(e.g. TIME(6) with time zone ). We cannot
- * Sequence for IDENTITY COLUMN output separately(via CREATE TABLE or
- * Sequence for IDENTITY COLUMN output separately(via CREATE TABLE or

Search all the patch-code to find others and add missing spaces.


~~~

7. General - Verbose syntax comments

Some (but not all) of the deparse_XXX functions have a comment
describing the verbose syntax.

e.g.
    /*
     * Verbose syntax
     *
     * CREATE %{default}s CONVERSION %{identity}D FOR %{source}L TO %{dest}L
     * FROM %{function}D
     */

These are helpful for understanding the logic of the function, so IMO
similar comments should be written for *all* of the deparse_xxx
function.

And maybe a more appropriate place to put these comments is in the
function header comment.


======

8. new_objtree_VA

+ /*
+ * For all other param types there must be a value in the varargs.
+ * Fetch it and add the fully formed subobject into the main object.
+ */
+ switch (type)

What does the comment mean when it says - for all "other" param types?

~~~

9. objtree_to_jsonb_element

+ ListCell   *cell;
+ JsonbValue val;

The 'cell' is only for the ObjTypeArray so consider declaring it for
that case only.

~~~

10. obtainConstraints

+ else
+ {
+ Assert(OidIsValid(domainId));

Looks like the above Assert is unnecessary because the earlier Assert
(below) already ensures this:
+ /* Only one may be valid */
+ Assert(OidIsValid(relationId) ^ OidIsValid(domainId));

~~~

11. pg_get_indexdef_detailed

+ /* Output tablespace */
+ {
+ Oid tblspc;
+
+ tblspc = get_rel_tablespace(indexrelid);
+ if (OidIsValid(tblspc))
+ *tablespace = pstrdup(quote_identifier(get_tablespace_name(tblspc)));
+ else
+ *tablespace = NULL;
+ }
+
+ /* Report index predicate, if any */
+ if (!heap_attisnull(ht_idx, Anum_pg_index_indpred, NULL))
+ {
+ Node    *node;
+ Datum predDatum;
+ char    *predString;
+
+ /* Convert text string to node tree */
+ predDatum = SysCacheGetAttr(INDEXRELID, ht_idx,
+ Anum_pg_index_indpred, &isnull);
+ Assert(!isnull);
+ predString = TextDatumGetCString(predDatum);
+ node = (Node *) stringToNode(predString);
+ pfree(predString);
+
+ /* Deparse */
+ *whereClause =
+ deparse_expression(node, context, false, false);
+ }
+ else
+ *whereClause = NULL;

Maybe just assign defaults:
*tablespace = NULL;
*whereClause = NULL;

then overwrite those defaults, so can avoid the 'else' code.

~~~

12. stringify_objtype

+/*
+ * Return the given object type as a string.
+ */
+static const char *
+stringify_objtype(ObjectType objtype)

12a.
This statics function feels like it belongs more in another module as
a utility function.

~

12b.
Actually, this function looks like it might be more appropriate just
as a static lookup array/map of names keys by the ObjectType, and
using a StaticAssertDecl for sanity checking.

~~~

13. deparse_GrantStmt

+ /*
+ * If there are no objects from "ALL ... IN SCHEMA" to be granted, then we
+ * need not do anything.
+ */
+ if (istmt->objects == NIL)
+ return NULL;

"we need not do anything." -> "nothing to do."

~~~

14. deparse_GrantStmt

+ switch (istmt->objtype)
+ {
+ case OBJECT_COLUMN:
+ case OBJECT_TABLE:
+ objtype = "TABLE";
+ classId = RelationRelationId;
+ break;
+ case OBJECT_SEQUENCE:
+ objtype = "SEQUENCE";
+ classId = RelationRelationId;
+ break;
+ case OBJECT_DOMAIN:
+ objtype = "DOMAIN";
+ classId = TypeRelationId;
+ break;
+ case OBJECT_FDW:
+ objtype = "FOREIGN DATA WRAPPER";
+ classId = ForeignDataWrapperRelationId;
+ break;
+ case OBJECT_FOREIGN_SERVER:
+ objtype = "FOREIGN SERVER";
+ classId = ForeignServerRelationId;
+ break;
+ case OBJECT_FUNCTION:
+ objtype = "FUNCTION";
+ classId = ProcedureRelationId;
+ break;
+ case OBJECT_PROCEDURE:
+ objtype = "PROCEDURE";
+ classId = ProcedureRelationId;
+ break;
+ case OBJECT_ROUTINE:
+ objtype = "ROUTINE";
+ classId = ProcedureRelationId;
+ break;
+ case OBJECT_LANGUAGE:
+ objtype = "LANGUAGE";
+ classId = LanguageRelationId;
+ break;
+ case OBJECT_LARGEOBJECT:
+ objtype = "LARGE OBJECT";
+ classId = LargeObjectRelationId;
+ break;
+ case OBJECT_SCHEMA:
+ objtype = "SCHEMA";
+ classId = NamespaceRelationId;
+ break;
+ case OBJECT_TYPE:
+ objtype = "TYPE";
+ classId = TypeRelationId;
+ break;
+ case OBJECT_DATABASE:
+ case OBJECT_TABLESPACE:
+ objtype = "";
+ classId = InvalidOid;
+ elog(ERROR, "global objects not supported");
+ break;
+ default:
+ elog(ERROR, "invalid OBJECT value %d", istmt->objtype);
+ }


Shouldn't code be calling to the other function stringify_objtype() to
do some of this?

~~~

15.

+ grantStmt = new_objtree_VA(fmt, 0);
+
+ /* build a list of privileges to grant/revoke */
+ if (istmt->all_privs)
+ {
+ tmp = new_objtree_VA("ALL PRIVILEGES", 0);

Here are some more examples of the _VA function being called with 0
args. Why use _VA function?

~~~

16.

+ list = NIL;
+
+ if (istmt->privileges & ACL_INSERT)
+ list = lappend(list, new_string_object("INSERT"));
+ if (istmt->privileges & ACL_SELECT)
+ list = lappend(list, new_string_object("SELECT"));
+ if (istmt->privileges & ACL_UPDATE)
+ list = lappend(list, new_string_object("UPDATE"));
+ if (istmt->privileges & ACL_DELETE)
+ list = lappend(list, new_string_object("DELETE"));
+ if (istmt->privileges & ACL_TRUNCATE)
+ list = lappend(list, new_string_object("TRUNCATE"));
+ if (istmt->privileges & ACL_REFERENCES)
+ list = lappend(list, new_string_object("REFERENCES"));
+ if (istmt->privileges & ACL_TRIGGER)
+ list = lappend(list, new_string_object("TRIGGER"));
+ if (istmt->privileges & ACL_EXECUTE)
+ list = lappend(list, new_string_object("EXECUTE"));
+ if (istmt->privileges & ACL_USAGE)
+ list = lappend(list, new_string_object("USAGE"));
+ if (istmt->privileges & ACL_CREATE)
+ list = lappend(list, new_string_object("CREATE"));
+ if (istmt->privileges & ACL_CREATE_TEMP)
+ list = lappend(list, new_string_object("TEMPORARY"));
+ if (istmt->privileges & ACL_CONNECT)
+ list = lappend(list, new_string_object("CONNECT"));

16a.
Shouldn't this be trying to re-use code like privilege_to_string()
mapping function already in aclchk.c to get all those ACL strings?

~

16b.
Is it correct that ACL_SET and ACL_ALTER_SYSTEM are missing?

~~~

17.

The coding style is inconsistent in this function...

For the same things - sometimes use the ternary operator; sometimes use if/else.

e.g.1
+ append_string_object(grantStmt, "%{grant_option}s",
+ istmt->grant_option ? "WITH GRANT OPTION" : "");

e.g.2
+ if (istmt->behavior == DROP_CASCADE)
+ append_string_object(grantStmt, "%{cascade}s", "CASCADE");
+ else
+ append_string_object(grantStmt, "%{cascade}s", "");

~~~

18. deparse_AlterOpFamily

+ tmpobj2 = new_objtree_VA("FOR ORDER BY", 0);
+ append_object_object(tmpobj2, "%{opfamily}D",
+ new_objtree_for_qualname_id(OperatorFamilyRelationId,
+ oper->sortfamily));

Why isn't something like this combined to be written as a signle
new_objtree_VA call?

~~~

19. deparse_Type_Storage

+ tmpstr = psprintf("%s", str);
+
+ fmt = "STORAGE = %{value}s";
+
+ storage = new_objtree_VA(fmt, 2,
+ "clause", ObjTypeString, "storage",
+ "value", ObjTypeString, tmpstr);

19a.
What is the purpose of tmpstr? Seems unnecessary

~

19b.
What is the purpose of separate 'fmt' var? Why not just pass format
string as a parameter literal to the new_objtree_VA()

~~~

20. deparse_CreateConversion

+ /* Add the DEFAULT clause */
+ append_string_object(ccStmt, "default",
+ conForm->condefault ? "DEFAULT" : "");

20a.
Is that code correct?  I thought the fmt should look like
"%{default}s", otherwise won't the resulting string object have no
name?

~

20b.
Anyway, it does not seem to match what the preceding verbose syntax
comment says.

~~~

21.

+
+
+ /* Add the DEFAULT clause */
+ append_string_object(ccStmt, "default",
+ conForm->condefault ? "DEFAULT" : "");
+
+ tmpObj = new_objtree_for_qualname(conForm->connamespace,
NameStr(conForm->conname));
+ append_object_object(ccStmt, "CONVERSION %{identity}D", tmpObj);
+ append_string_object(ccStmt, "FOR %{source}L", (char *)
+ pg_encoding_to_char(conForm->conforencoding));
+ append_string_object(ccStmt, "TO %{dest}L", (char *)
+ pg_encoding_to_char(conForm->contoencoding));
+ append_object_object(ccStmt, "FROM %{function}D",
+ new_objtree_for_qualname_id(ProcedureRelationId,
+ conForm->conproc));

I don't really understand why all this is not written instead using a
single new_objtree_VA() call.

~~~

22. deparse_CreateEnumStmt

+ enumtype = new_objtree("CREATE TYPE");
+ append_object_object(enumtype, "%{identity}D",
+ new_objtree_for_qualname_id(TypeRelationId,
+ objectId));
+
+ values = NIL;
+ foreach(cell, node->vals)
+ {
+ String    *val = lfirst_node(String, cell);
+
+ values = lappend(values, new_string_object(strVal(val)));
+ }
+
+ append_array_object(enumtype, "AS ENUM (%{values:, }L)", values);
+ return enumtype;

Ditto. I don't really understand why all this is not written instead
using a single new_objtree_VA() call.

~~~

23. deparse_CreateExtensionStmt

+ extStmt = new_objtree("CREATE EXTENSION");
+
+ append_string_object(extStmt, "%{if_not_exists}s",
+ node->if_not_exists ? "IF NOT EXISTS" : "");
+
+ append_string_object(extStmt, "%{name}I", node->extname);

Ditto. I don't really understand why all this is not written instead
using a single new_objtree_VA() call.

~~~

24. deparse_FdwOptions

+ tmp = new_objtree_VA("OPTIONS", 0);

Isn't it better to call other function instead of passing zero params
to this one?

~~~

25. deparse_CreateFdwStmt

25a.
+ /* add HANDLER clause */
+ if (fdwForm->fdwhandler == InvalidOid)
+ tmp = new_objtree_VA("NO HANDLER", 0);
+ else

Isn't it better to call other function instead of passing zero params
to this one?

~

25b.
+ /* add VALIDATOR clause */
+ if (fdwForm->fdwvalidator == InvalidOid)
+ tmp = new_objtree_VA("NO VALIDATOR", 0);

Ditto #25a

~

25c.
Both above should use OidIsValid macro.

~~~

26. deparse_AlterFdwStmt

26a.
+ /* add HANDLER clause */
+ if (fdwForm->fdwhandler == InvalidOid)
+ tmp = new_objtree_VA("NO HANDLER", 0);

Ditto #25a

~

26b.
+ /* add VALIDATOR clause */
+ if (fdwForm->fdwvalidator == InvalidOid)
+ tmp = new_objtree_VA("NO VALIDATOR", 0);

Ditto #25a

~

26c.
Both above should use OidIsValid macro.

~~~

27. deparse_CreateRangeStmt

+ /* SUBTYPE */
+ tmp = new_objtree_for_qualname_id(TypeRelationId,
+   rangeForm->rngsubtype);
+ tmp = new_objtree_VA("SUBTYPE = %{type}D",
+ 2,
+ "clause", ObjTypeString, "subtype",
+ "type", ObjTypeObject, tmp);
+ definition = lappend(definition, new_object_object(tmp));


The reusing of 'tmp' variable seems a bit sneaky to me. Perhaps using
'tmp' and 'tmp_qualid' might be a more readable way to go here.

~~~

28. deparse_AlterEnumStmt

+ if (node->newValNeighbor)
+ {
+ append_string_object(alterEnum, "%{after_or_before}s",
+ node->newValIsAfter ? "AFTER" : "BEFORE");
+ append_string_object(alterEnum, "%{neighbour}L", node->newValNeighbor);
+ }

Has a mix of US and UK spelling of neighbor/neighbour?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Ian Lawrence Barwick
Date:
Subject: Re: Improve description of XLOG_RUNNING_XACTS
Next
From: Bharath Rupireddy
Date:
Subject: Re: Adding doubly linked list type which stores the number of items in the list