Re: Support logical replication of DDLs - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Support logical replication of DDLs
Date
Msg-id CAHut+Pu6H6hY0JJNNRCRmFpM_3817z=0xjm-_ibP+cL85pBOpQ@mail.gmail.com
Whole thread Raw
In response to Re: Support logical replication of DDLs  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: Support logical replication of DDLs
Re: Support logical replication of DDLs
List pgsql-hackers
Here are more review comments for the v32-0001 file ddl_deparse.c

This completes my first review pass over this overly large file.

This review has taken a long time, so for any of my review comments
(in this and previous posts) that get rejected, please reply citing
the rejected reference numbers, because I hope to avoid spending
multiple days (in a future review) trying to reconcile what was
addressed vs what was not addressed. TIA.

*** NOTE - my review post became too big, so I split it into smaller parts.

THIS IS PART 1 OF 4.

======

src/backend/commands/ddl_deparse.c

G.1. GENERAL _VA args wrapping

+ tmp = new_objtree_VA("WITH GRANT OPTION",
+ 1, "present", ObjTypeBool,
+ stmt->action->grant_option);

In general, I think all these _VA() style function calls are easier to
read if you can arrange to put each of the argument names on a new
line instead of just wrapping them randomly.

So the above would be better as:

tmp = new_objtree_VA("WITH GRANT OPTION", 1,
"present", ObjTypeBool, stmt->action->grant_option);

Please search/modify all cases of new_objtree_VA like this.

~~~

G.2. GENERAL - "type" object

There are many functions that insert a "type" object for some purpose:

e.g.
+ tmpobj = new_objtree_VA("DETACH PARTITION %{partition_identity}D FINALIZE", 2,
+ "type", ObjTypeString, "detach partition finalize",
+ "partition_identity", ObjTypeObject,
+ new_objtree_for_qualname_id(RelationRelationId,
+ sub->address.objectId));

e.g.
+ tmpobj = new_objtree_VA(fmtstr, 2,
+ "type", ObjTypeString, "add column",
+ "definition", ObjTypeObject, tree);

I'm not sure yet what these "type" objects are used for, but I felt
that these unsubstituted values should look slightly more like enum
values, and slightly less like real SQL syntax.

For example - maybe do like this (for the above):

"detach partition finalize" -> "type_detach_partition_finalize"
"add column" -> "type_add_column"
etc.

~~~

G.3. GENERAL - JSON deparsed structures should be documented

AFAICT there are mixtures of different JSON structure styles at play
in this module. Sometimes there are trees with names and sometimes
not, sometimes there are "present" objects and sometimes not.
Sometimes entire trees seemed unnecessary to me. It feels quite
arbitrary in places but it's quite hard to compare them because
everything is spread across 9000+ lines.

IMO all these deparse structures ought to be documented. Then I think
it will become apparent that lots of them are inconsistent with the
others. Even if such documentation is ultimately not needed by
end-users, I think it would be a very valuable internal design
accompaniment to this module, and it would help a lot for
reviews/maintenance/bug prevention etc. Better late than never.

~~~

G.4 GENERAL - Underuse of _VA() function.

(Probably I've mentioned this before in previous review comments, but
I keep encountering this many times).

The json is sort of built up part by part and objects are appended ...
it was probably easier to think about each part during coding but OTOH
I think this style is often unnecessary. IMO most times the function
can be far simpler just by gathering together all the necessary values
and then using a single big new_objtree_VA() call to deparse the
complete format in one call. I think it could also shave 100s of lines
of code from the module.

~~~

G.5 GENERAL - Inconsistent function comment wording.

The function comments are worded in different ways...

"Given a XXX OID and the parse tree that created it, return an ObjTree
representing the creation command."

versus

"Given a XXX OID and the parse tree that created it, return the JSON
blob representing the creation command."

Please use consistent wording throughout.

~~~

G.6 GENERAL - present=false

There are many calls that do like:
append_bool_object(tmpobj, "present", false);

I was thinking the code would be cleaner if there was a wrapper function like:

static void
append_not_present(ObjTree objTree)
{
append_bool_object(objTree, "present", false);
}

~~~

G.7 GENERAL - psprintf format strings

There are quite a few places where the format string is
pre-constructed using psprintf.

e.g.
+ fmt = psprintf("ALTER %s %%{identity}s OWNER TO %%{newowner}I",
+    stringify_objtype(node->objectType));
+
+ ownerStmt = new_objtree_VA(fmt, 2,
+    "identity", ObjTypeString,
+    getObjectIdentity(&address, false),
+    "newowner", ObjTypeString,
+    get_rolespec_name(node->newowner));

It's not entirely clear to me why this kind of distinction is even
made, or even what are the rules governing the choice. AFAICT this
same result could be achieved by using another string substitution
marker. So why not do it that way instead of mixing different styles?

IMO many/most of the psprintf can be removed.

e.g. I mean something like this (for the above example):

fmt = "ALTER %{obj_type}s %{identity}s OWNER TO %{newowner}I";

ownerStmt = new_objtree_VA(fmt, 3,
"obj_type", ObjTypeString, stringify_objtype(node->objectType),
"identity", ObjTypeString, getObjectIdentity(&address, false),
"newowner", ObjTypeString, get_rolespec_name(node->newowner));

~~~

G.8 GENERAL - Inconsistent OID/oid in error messages.

errmsg("role with OID %u does not exist", roleoid)));
elog(ERROR, "cache lookup failed for collation with OID %u", objectId);
elog(ERROR, "cache lookup failure for function with OID %u",
elog(ERROR, "cache lookup failed for schema with OID %u",
errmsg("role with OID %u does not exist", istmt->grantor_uid)));
elog(ERROR, "cache lookup failed for operator with OID %u", objectId);
elog(ERROR, "cache lookup failed for type with OID %u", objectId);
elog(ERROR, "cache lookup failed for conversion with OID %u", objectId);
elog(ERROR, "cache lookup failed for extension with OID %u",
elog(ERROR, "cache lookup failed for extension with OID %u",
elog(ERROR, "cache lookup failed for cast with OID %u", objectId);
elog(ERROR, "cache lookup failed for domain with OID %u", objectId);
elog(ERROR, "cache lookup failure for function with OID %u",
elog(ERROR, "cache lookup failure for language with OID %u",
elog(ERROR, "null prosrc in function with OID %u", objectId);
elog(ERROR, "cache lookup failed for opclass with OID %u", opcoid);
elog(ERROR, "cache lookup failed for operator family with OID %u",
opcForm->opcfamily);
elog(ERROR, "cache lookup failed for operator family with OID %u", objectId);
elog(ERROR, "cache lookup failed for domain with OID %u",
elog(ERROR, "cache lookup failed for collation with OID %u", objectId);
elog(ERROR, "cache lookup failed for operator with OID %u", objectId);
elog(ERROR, "cache lookup failed for type with OID %u", objectId);
elog(ERROR, "cache lookup failed for text search parser with OID %u",
elog(ERROR, "cache lookup failed for text search dictionary " "with
OID %u", objectId);
elog(ERROR, "cache lookup failed for text search template with OID %u",
elog(ERROR, "cache lookup failed for text search dictionary " "with
OID %u", objectId);
elog(ERROR, "cache lookup failed for opclass with OID %u",
elog(ERROR, "cache lookup failed for operator family with OID %u",
elog(ERROR, "cache lookup failure for transform with OID %u",
elog(ERROR, "cache lookup failure for language with OID %u",
elog(ERROR, "cache lookup failure for function with OID %u",
elog(ERROR, "cache lookup failure for function with OID %u",
elog(ERROR, "cache lookup failed for rewrite rule for view with OID
%u", viewoid)

elog(ERROR, "cache lookup failed for range with type oid %u",
elog(ERROR, "cache lookup failed for rewrite rule with oid %u",

G.8a.
Most are uppercase 'OID'. A few are lowercase 'oid'

~

G.8b.
There is a mixture of "cache lookup failed" and "cache lookup failure"
-- should all be the same.

~

G.8c.
A few above (e.g. role) have a different message text. Shouldn't those
also be "cache lookup failed"?

~~~

G.9 GENERAL - ObjTree variables

Often the ObjTree variable (for the deparse_XXX function return) is
given the name of the statement it is creating.

Although it is good to be descriptive, often there is no need for long
variable names (e.g. 'createTransform' etc), because there is no
ambiguity anyway and it just makes for extra code characters and
unnecessary wrapping. IMO it would be better to just call everything
some short but *consistent* name across every function -- like 'stmt'
or 'json_ddl' or 'root' or 'ret' ... or whatever.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



pgsql-hackers by date:

Previous
From: "Jonathan S. Katz"
Date:
Subject: Re: User functions for building SCRAM secrets
Next
From: Michael Paquier
Date:
Subject: Re: Avoid overhead open-close indexes (catalog updates)