Re: Proposal: Conflict log history table for Logical Replication - Mailing list pgsql-hackers
| From | Peter Smith |
|---|---|
| Subject | Re: Proposal: Conflict log history table for Logical Replication |
| Date | |
| Msg-id | CAHut+PtSggpJH36YOwdfmY5gU6yr7Wa-=reht4c2v+n8FYUKJg@mail.gmail.com Whole thread Raw |
| In response to | Re: Proposal: Conflict log history table for Logical Replication (Dilip Kumar <dilipbalaut@gmail.com>) |
| List | pgsql-hackers |
Some review comments for v12-0001.
======
General
1.
There is no documentation. Even if it seems a bit premature IMO
writing/reviewing the documention could help identify unanticipated
usability issues.
======
src/backend/commands/subscriptioncmds.c
2.
+
+ /* Setting conflict_log_table = NONE is treated as no table. */
+ if (strcmp(opts->conflictlogtable, "none") == 0)
+ opts->conflictlogtable = NULL;
+ }
2a.
This was unexpected when I cam across this code. This feature needs to
be described in the commit message.
~
2b.
Case sensitive?
~~~
CreateSubscription:
3.
+ List *names;
+
+ /* Explicitly check for empty string before any processing. */
+ if (opts.conflictlogtable[0] == '\0')
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("conflict log table name cannot be empty"),
+ errhint("Provide a valid table name or omit the parameter.")));
+
+ names = stringToQualifiedNameList(opts.conflictlogtable, NULL);
Should '' just be equivalent of NONE instead of another error condition?
~~~
AlterSubscription:
4.
+ Oid old_nspid = InvalidOid;
+ char *old_relname = NULL;
+ char *relname = NULL;
+ List *names = NIL;
Var 'names' can be declared at a lower scope -- e.g. in the 'if' block.
~~~
DropSubscription:
5.
+ /*
+ * Conflict log tables are recorded as internal dependencies of the
+ * subscription. We must drop the dependent objects before the
+ * subscription itself is removed. By using
+ * PERFORM_DELETION_SKIP_ORIGINAL, we ensure that only the conflict log
+ * table is reaped while the subscription remains for the final deletion
+ * step.
+ */
Double spaces? /the subscription/the subscription/
~~~
create_conflict_log_table_tupdesc:
6.
+static TupleDesc
+create_conflict_log_table_tupdesc(void)
+{
+ TupleDesc tupdesc;
+ int i;
+
+ tupdesc = CreateTemplateTupleDesc(MAX_CONFLICT_ATTR_NUM);
+
+ for (i = 0; i < MAX_CONFLICT_ATTR_NUM; i++)
Declare 'i' as a for-loop var.
~~~
create_conflict_log_table:
7.
+/*
+ * Create conflict log table.
+ *
+ * The subscription owner becomes the owner of this table and has all
+ * privileges on it.
+ */
+static void
+create_conflict_log_table(Oid namespaceId, char *conflictrel, Oid subid)
+{
I felt that the 'subid' should be the first parameter, not the last.
~~~
8.
namespace > relation, so I felt it is more natural to check for the
temp namespace *before* checking for clashing table names.
======
src/backend/replication/logical/conflict.c
9.
+ if (ValidateConflictLogTable(conflictlogrel))
+ {
+ /*
+ * Prepare the conflict log tuple. If the error level is below
+ * ERROR, insert it immediately. Otherwise, defer the insertion to
+ * a new transaction after the current one aborts, ensuring the
+ * insertion of the log tuple is not rolled back.
+ */
+ prepare_conflict_log_tuple(estate,
+ relinfo->ri_RelationDesc,
+ conflictlogrel,
+ type,
+ searchslot,
+ conflicttuples,
+ remoteslot);
+ if (elevel < ERROR)
+ InsertConflictLogTuple(conflictlogrel);
+ }
+ else
+ ereport(WARNING,
+ errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("Conflict log table \"%s.%s\" structure changed, skipping insertion",
+ get_namespace_name(RelationGetNamespace(conflictlogrel)),
+ RelationGetRelationName(conflictlogrel)));
9a.
AFAICT in the only few places this function is called it emits exactly
the same warning, so it seems unnecessary duplication. Would it be
better to have that WARNING code inside the ValidateConflictLogTable
(eg always give the warning when returning false). But see also 9b.
~
9b.
I have some doubts about this validation function. It seems
inefficient to be validating the same CLT structure over and over
every time there is a new conflict. Not only is that going to be
slower, but the logfile is going to fill up with warnings. Maybe this
"validation" phase should be a one-time check only during the
CREATE/ALTER SUBSCRIPTION.
Maybe if validation fails it could give some NOTICE that the CLT
logging is broken and then reset the CLT to NONE?
~~~
ValidateConflictLogTable:
10.
+/*
+ * ValidateConflictLogTable - Validate conflict log table
+ *
+ * Validate whether the conflict log table is still suitable for considering as
+ * conflict log table.
+ */
+bool
+ValidateConflictLogTable(Relation rel)
This function comment seems unhelpful. 3 times it mentions equivalent
of "validate conflict log table" but nowhere does it say what that
even means.
Maybe the later comment (below):
+ /*
+ * Check whether the table definition including its column names, data
+ * types, and column ordering meets the requirements for conflict log
+ * table.
+ */
Should be moved into the function comment part.
~~~
11.
+ Relation pg_attribute;
+ HeapTuple atup;
+ ScanKeyData scankey;
+ SysScanDesc scan;
+ Form_pg_attribute attForm;
+ int attcnt = 0;
+ bool tbl_ok = true;
'attForm' can be declared within the while loop.
~~~
12.
+ if (attcnt != MAX_CONFLICT_ATTR_NUM || !tbl_ok)
+ return false;
As per previous review comment, this could emit the WARNING log right
here. But see also #9b.
~~~
build_local_conflicts_json_array:
13.
+ Datum values[MAX_LOCAL_CONFLICT_INFO_ATTRS];
+ bool nulls[MAX_LOCAL_CONFLICT_INFO_ATTRS];
+ char *origin_name = NULL;
+ HeapTuple tuple;
+ Datum json_datum;
+ int attno;
+
+ memset(values, 0, sizeof(Datum) * MAX_LOCAL_CONFLICT_INFO_ATTRS);
+ memset(nulls, 0, sizeof(bool) * MAX_LOCAL_CONFLICT_INFO_ATTRS);
You could also just use designated initializer syntax here and avoid
the memsets.
e.g. = {0}
~~~
14.
+ memset(values, 0, sizeof(Datum) * MAX_LOCAL_CONFLICT_INFO_ATTRS);
+ memset(nulls, 0, sizeof(bool) * MAX_LOCAL_CONFLICT_INFO_ATTRS);
Another place where you could've avoided memset and just done = {0};
~~~
15.
+ json_datum_array = (Datum *) palloc(num_conflicts * sizeof(Datum));
+ json_null_array = (bool *) palloc0(num_conflicts * sizeof(bool));
- index_value = BuildIndexValueDescription(indexDesc, values, isnull);
+ i = 0;
+ foreach(lc, json_datums)
+ {
+ json_datum_array[i] = (Datum) lfirst(lc);
+ i++;
+ }
Should these be using new palloc_array instead of palloc?
======
src/include/replication/conflict.h
16.
+typedef struct ConflictLogColumnDef
+{
+ const char *attname; /* Column name */
+ Oid atttypid; /* Data type OID */
+} ConflictLogColumnDef;
Add this to typedefs.list
~~~
17.
+/* The single source of truth for the conflict log table schema */
+static const ConflictLogColumnDef ConflictLogSchema[] =
+{
+ { .attname = "relid", .atttypid = OIDOID },
+ { .attname = "schemaname", .atttypid = TEXTOID },
+ { .attname = "relname", .atttypid = TEXTOID },
+ { .attname = "conflict_type", .atttypid = TEXTOID },
+ { .attname = "remote_xid", .atttypid = XIDOID },
+ { .attname = "remote_commit_lsn",.atttypid = LSNOID },
+ { .attname = "remote_commit_ts", .atttypid = TIMESTAMPTZOID },
+ { .attname = "remote_origin", .atttypid = TEXTOID },
+ { .attname = "replica_identity", .atttypid = JSONOID },
+ { .attname = "remote_tuple", .atttypid = JSONOID },
+ { .attname = "local_conflicts", .atttypid = JSONARRAYOID }
+};
I like this, but I felt it would be better if all the definitions for
"local_conflicts" were defined here too. Then everythin gis in one
place.
e.g. MAX_LOCAL_CONFLICT_INFO_ATTRS and most of the content of
build_conflict_tupledesc().
~~~
18.
+/* Define the count using the array size */
+#define MAX_CONFLICT_ATTR_NUM (sizeof(ConflictLogSchema) /
sizeof(ConflictLogSchema[0]))
This comment is just saying same as the code so doesn't seem to be useful.
======
src/test/regress/expected/subscription.out
19.
+\dt+ clt.regress_conflict_log3
+ List of tables
+ Schema | Name | Type | Owner |
Persistence | Size | Description
+--------+-----------------------+-------+---------------------------+-------------+---------+-------------
+ clt | regress_conflict_log3 | table | regress_subscription_user |
permanent | 0 bytes |
+(1 row)
Since the CLT is auto-created internally, and since there is a
"Description" attribute, I wonder should you also be auto-generating
that description so that here it might say something useful like:
"Conflict Log File for subscription XYZ"
~~~
20.
+-- ok - create subscription with conflict_log_table = NONE
+CREATE SUBSCRIPTION regress_conflict_test1 CONNECTION
'dbname=regress_doesnotexist' PUBLICATION testpub WITH (connect =
false, conflict_log_table = NONE);
+SELECT subname, subconflictlogtable FROM pg_subscription WHERE
subname = 'regress_conflict_test2';
+ subname | subconflictlogtable
+------------------------+-----------------------
+ regress_conflict_test2 | regress_conflict_log3
+(1 row)
+
I didn't understand this test case; You are setting a NONE clt for
subscription 'regress_conflict_test1'. But then you are checking
subname 'regress_conflict_test2'.
Is that a typo?
~~~
21.
+ALTER SUBSCRIPTION regress_conflict_test1 DISABLE;
+ALTER SUBSCRIPTION regress_conflict_test1 SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_conflict_test1;
+-- Clean up remaining test subscription
+ALTER SUBSCRIPTION regress_conflict_test2 DISABLE;
+ALTER SUBSCRIPTION regress_conflict_test2 SET (slot_name = NONE);
+DROP SUBSCRIPTION regress_conflict_test2;
Something seems misplaced. Why aren't all of the cleanups under the
'cleanup' comment?
======
Kind Regards,
Peter Smith.
Fujitsu Australia
pgsql-hackers by date: