Issues in Replication Progress Tracking - Mailing list pgsql-hackers

From Amit Kapila
Subject Issues in Replication Progress Tracking
Date
Msg-id CAA4eK1JqUBVeWWKwUmBPryFaje4190ug0y-OAUHWQ6tD83V4xg@mail.gmail.com
Whole thread Raw
Responses Re: Issues in Replication Progress Tracking
List pgsql-hackers
While reading the commit- 5aa23504 for Replication Progress
Tracking, I came across few issues which I would like to share.

1. catalogs.sgml
+     <row>
+      <entry><structfield>local_lsn</structfield></entry>
+      
<entry><type>pg_lsn</type></entry>
+      <entry></entry>
+      <entry>This node's LSN that at
+      which 
<literal>remote_lsn</literal> has been replicated. Used to
+      flush commit records before persisting data 
to disk when using
+      asynchronous commits.</entry>

I think part of the sentence is not very clear.
"This node's LSN *that at which* remote_lsn has been
replicated."

Reference link in docs

2.
closing ')' is missing.

Check below line:

Using the replication origin infrastructure a session can be marked
as replaying from a remote node (using 
the pg_replication_origin_session_setup() function.


3.
+      <row id="pg-replication-origin-session-progress">
+       <entry>
+        <indexterm>
+         
<primary>pg_replication_origin_session_progress</primary>
+        </indexterm>
+        
<literal><function>pg_replication_origin_progress(<parameter>flush</parameter> <type>bool</type>)
</function></literal>
+       </entry>

Specified function name is wrong
/pg_replication_origin_progress/pg_replication_origin_session_progress

Refer below page in docs:


4.
xloginsert.c
+/* Should te in-progress insertion log the origin */
+static bool include_origin = false;
+

typo /te

5.
origin.c

* * To create and drop replication origins an exclusive lock on
 *   pg_replication_slot is required for the 
duration. That allows us to
 *   safely and conflict free assign new origins using a dirty snapshot.

a. Isn't in above comment, we need to use pg_replication_origin
   instead of pg_replication_slot?
b. "..safely and conflict free assign..", I understand this part
of comment, but not sure if this is the best way to write it.

6.
origin.c
* setting up replorigin_sesssion_origin_lsn and replorigin_sesssion_origin that ensures we

line length greater than 80 and looks slightly odd.

7.
origin.c
replorigin_create()
{
..
if (tuple == NULL)
ereport(ERROR,
(errcode
(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
errmsg("no free replication oid could be 
found")));
}

I think here error message should be
"no free replication origin oid could be found"

8.
origin.c
replorigin_drop()
{
..
tuple = SearchSysCache1(REPLORIGIDENT, ObjectIdGetDatum(roident));
simple_heap_delete(rel, &tuple-
>t_self);
..
}

Isn't it better to have check for a valid tuple after SearchSysCache1()?
if (!HeapTupleIsValid(tuple))
elog(ERROR, "cache lookup failed

9.
origin.c
ReplicationOriginShmemSize(void)
{
..
* XXX: max_replication_slots is arguablethe wrong
..
}

a. *arguablethe*, space is required and it better to use arguably
b. One thing that in favour of using a separate/new guc for
   ReplicationState is that even if the user has configured
   max_replication_slots for some other usage (other than
   tracking Replication Origin) of ReplicationSlots, even then we
   will end up allocating shared memory which will never be used,
   OTOH as the memory will not be huge, so we can even ignore it.
   

10.
origin.c
CheckPointReplicationOrigin()
{
..
if (unlink(tmppath) < 0 && errno != ENOENT)
ereport(PANIC,
(errcode_for_file_access(),
errmsg("could not remove file \"%s\": %m",
path)));
}

In ereport, tmppath should be used instead of path.

11.
origin.c
* local_commit needs to be a local LSN of the commit so that we can make sure
 * uppon a checkpoint that 
enough WAL has been persisted to disk.

/uppon/upon

12.
In funcions replorigin_advance() and replorigin_session_setup(),
different ways (free_state and free_slot) are used. Isn't it better
to use same way?

13.
In function replorigin_session_setup() and or
replorigin_session_advance(), don't we need to WAL log the
use of Replication state?

14.
pg_replication_origin_session_reset()
{
..
/* FIXME */
replorigin_sesssion_origin = InvalidRepOriginId;
..
}

What needs be fixed in FIXME is not clear?

15.
Below changes in pg_resetxlog.c seems redundant.
--------------------- src/bin/pg_resetxlog/pg_resetxlog.c ---------------------
index a0805d8..4a22575 100644
@@ -56,6 +56,8 @@
 #include "common/restricted_token.h"
 #include "storage/large_object.h"
 #include "pg_getopt.h"
+#include "replication/logical.h"
+#include "replication/origin.h"
 
 
 static ControlFileData ControlFile; /* pg_control values */
@@ -1091,6 +1093,7 @@ WriteEmptyXLOG(void)
  record->xl_tot_len = SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort + sizeof(CheckPoint);
  record->xl_info = XLOG_CHECKPOINT_SHUTDOWN;
  record->xl_rmid = RM_XLOG_ID;
+
  recptr += SizeOfXLogRecord;
  *(recptr++) = XLR_BLOCK_ID_DATA_SHORT;
  *(recptr++) = sizeof(CheckPoint);

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Change pg_cancel_*() to ignore current backend
Next
From: Heikki Linnakangas
Date:
Subject: Re: small typo