Re: pg_rewind docs correction - Mailing list pgsql-hackers

From James Coleman
Subject Re: pg_rewind docs correction
Date
Msg-id CAAaqYe_DfddPrKs++Kz9A13eXtNXjCtbr_amr=AEd2UZsXFikw@mail.gmail.com
Whole thread Raw
In response to Re: pg_rewind docs correction  (Michael Paquier <michael@paquier.xyz>)
Responses Re: pg_rewind docs correction  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Mon, Mar 9, 2020 at 2:59 AM Michael Paquier <michael@paquier.xyz> wrote:
On Sun, Mar 08, 2020 at 05:13:21PM -0400, James Coleman wrote:
> I've added the information about how the backup label control file is
> written, and updated the How It Works steps to refer to that separately
> from restart.
>
> Additionally the How It Works is updated to include WAL segments and new
> relation files in the list of files copied wholesale, since that was
> previously stated but somewhat contradicted there.

-   The result is equivalent to replacing the target data directory with the
-   source one. Only changed blocks from relation files are copied;
-   all other files are copied in full, including configuration files. The
-   advantage of <application>pg_rewind</application> over taking a new base backup, or
-   tools like <application>rsync</application>, is that <application>pg_rewind</application> does
-   not require reading through unchanged blocks in the cluster. This makes
-   it a lot faster when the database is large and only a small
-   fraction of blocks differ between the clusters.
+   After a successful rewind and subsequent WAL replay, the target data
+   directory is equivalent to a base backup of the source data directory. While
+   only changed blocks from existing relation files are copied; all other files
+   are copied in full, including new relation files, configuration files, and WAL
+   segments. The advantage of <application>pg_rewind</application> over taking a

The first sentence you are adding refers to "subsequent WAL replay".
However, this paragraph emphasizes with the state of the target
cluster after running pg_rewind but *before* make the target cluster
start recovery.  So shouldn't you just remove the part "and subsequent
WAL replay" from your first new sentence?

I'd originally typed this:
I'm not sure I follow. After pg_rewind but before replay the directory is *not* equivalent to a base backup. I don't see how paragraph is clearly limited to describing what pg_rewind does. While the 2nd sentence is about pg_rewind steps specifically, the paragraph (even in the original) goes on to compare it to a base backup so we're talking about the operation in totality not just the one tool.
 
But I realized while typing it that I was probably missing something of what you were getting at: is the hangup on calling out the WAL replay that a base backup (or rsync even) *also* requires WAL reply to reach a consistent state? I hadn't thought of that while writing this initially, so I've updated the patch to eliminate that part but also to make the analogy to base backups more direct, since it's helpful in understanding what result the tool is trying to accomplish and how it differs.

In the same paragraph, I think that you should remove the "While" from
"While only changed blocks", as the second part of the sentence refers
to the other files, WAl segments, etc.

Fixed as part of the above.
 
The second paragraph of the docs regarding timeline lookup is
unchanged, which is fine.

-   When the target server is started for the first time after running
-   <application>pg_rewind</application>, it will go into recovery mode and replay all
-   WAL generated in the source server after the point of divergence.
+   After running <application>pg_rewind</application> the data directory is
+   not immediately in a consistent state. However
+   <application>pg_rewind</application> configures the control file so that when
+   the target server is started again it will enter recovery mode and replay all
+   WAL generated in the source server after the point of divergence.

The second part of the third paragraph is not changed, and the
modification you are doing here is about the control file.  I am
still unconvinced that this is a good change, because mentioning the
control file would be actually more adapted to the part "How it
works", where you are adding details about the backup_label file, and
already include details about the minimum consistency LSN itself
stored in the control file.

I've removed the control file reference and instead continued the analogy to base backups.
 
+   <para>
+    Because <application>pg_rewind</application> copies configuration files
+    entirely from the source, correcting recovery configuration options before
+    restarting the server is necessary if you intend to re-introduce the target
+    as a replica of the source. If you restart the server after the rewind
+    operation has finished but without configuring recovery, the target will
+    again diverge from the primary.
+   </para>

True that this is not outlined enough.

Thanks.
 
+      The relation files are now to their state at the last checkpoint completed
+      prior to the point at which the WAL timelines of the source and target
+      diverged plus the current state on the source of any blocks changed on the
+      target after that divergence.

"Relation files are now in a state equivalent to the moment of the
last completed checkpoint prior to the point.."?

Updated.
 
-      <filename>pg_stat_tmp/</filename>, and
-      <filename>pg_subtrans/</filename> are omitted from the data copied
-      from the source cluster. Any file or directory beginning with
-      <filename>pgsql_tmp</filename> is omitted, as well as are
+      <filename>pg_stat_tmp/</filename>, and <filename>pg_subtrans/</filename>
+      are omitted from the data copied from the source cluster. The files

This is just reorganizing an existing list, why?

The grammar seemed a bit awkward to me, so while I was already reworking this paragraph I tried to clean that up a bit.
 
+      Create a backup label file to begin WAL replay at the checkpoint created
+      at failover and  a minimum consistency LSN using
+      <literal>pg_current_wal_insert_lsn()</literal>, when using a live source
+      and the last checkpoint LSN, when using a stopped source.

Now would be the moment to mention the control file.

I made that more explicit here, and also referenced the filenames directly (and with tags).
 
> I realized I didn't previously add this to the CF; since it's not a new
> patch I've added it to the current CF, but if this is incorrect please let
> me know.

The last CF of Postgres 13 began at the beginning of February :(

Still ongoing, correct? I guess I mentally think of them as being only one month, but I guess that's not actually true. Regardless I'm not sure what policy is for patches that have been in flight in hackers for a while but just missed being added to the CF app.

Thanks,
James
Attachment

pgsql-hackers by date:

Previous
From: David Steele
Date:
Subject: Re: POC: rational number type (fractions)
Next
From: tushar
Date:
Subject: Re: [Proposal] Global temporary tables