Thread: Documentation Edits for pg_createsubscriber

Documentation Edits for pg_createsubscriber

From
"David G. Johnston"
Date:
Hi.

With all the recent work on pg_createsubscriber I decided to take a look at it, namely the documentation.

The 0001 patch is basically just fixing some typos and other minor edits.

The 0002 includes some of the more editoralized (but still limited) changes.

The main thing I noted is that our synopsis is simply wrong regarding the optionality of the --database option.  I went with a fix that included only showing the two mandatory options and adding a paragraph to usage explaining that --database exists and how to use it.  Our existing example also covers that common usage.

I also noted that we've started including both the long and short forms of the options in our synopsis.  I went with the "choose the short option as the only one" as exemplified by pg_upgrade.  It seems quite noisy to include both in a synopsis that is fine with including 10+ other options under [options...].

I tweaked the wording of config-file and felt it worth pointing out the caveat regarding the port setting.

It took me a bit, and reading implementation code, to understand what was meant by "Additional recovery parameters are added to avoid ...".  I decided that detail seemed unnecessary for the user-facing documentation - supposedly they will never see the recovery file if everything works OK (and if they do the various empty string settings should be reasonably self-evident).  If we do want to keep it I'll try to find a better wording.

I removed the use of the word "switch" and replaced it with "option" for consistency.

There might be some items in 0001 that belong in 0002 and vice-versa; the final patch would just be a single one in any case.

I also dropped the bit about "intended for large databases, use normal logical replication for small ones".  If you already have a standby why not use this even for small ones?  Maybe the simplicity of normal replication is more important than how long an overnight process to copy the data tables might take.  We've explained how the two options compare to each other but we tend not to, for good reason, include such conclusions/recommendations in the technical/reference documentation.

David J.


Attachment

RE: Documentation Edits for pg_createsubscriber

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear David,

> The main thing I noted is that our synopsis is simply wrong regarding the
> optionality of the --database option.  I went with a fix that included only
> showing the two mandatory options and adding a paragraph to usage explaining
> that --database exists and how to use it.  Our existing example also covers
> that common usage.

According to the code, -D and -P is anyway needed, but -d can be omitted when -P
contians dbname option. And you wanted to follow the rule, right?
IIUC shorter synopsis is always welcomed.

> It took me a bit, and reading implementation code, to understand what was meant
> by "Additional recovery parameters are added to avoid ...".  I decided that
> detail seemed unnecessary for the user-facing documentation - supposedly they
> will never see the recovery file if everything works OK (and if they do the
> various empty string settings should be reasonably self-evident).  If we do
> want to keep it I'll try to find a better wording.

To confirm; users can run SHOW command after the conversion and they can see
our setting. Do you think it is a user-facing?

Here are my comments. (I ran Grammarly to check your sentences)

01.
Your patch breaks 80-column-per-line rule. Please check it.

02.
```
+   The minimal command shown above...
```

"shown" can be removed.

03.
```
+   then created using auto-generated names, which can be overriden
```

s/overriden/overridden/.

04.
```
+   using the corresponding long-form command-line options, once each per database.
```

"," and "each" can be removed.

05.
```
+   does not perform an initial data copy.  Instead it relies on a LSN-based
```

Latter sentence can be "Inseted, it relies on an LSN-based..."

06.
```
+      <para>
+       Note, the configuration setting <literal>port</literal> in this file is effectively ignored
+       as <application>pg_createsubscriber</application> always specifies the target listening port
+       on the <application>pg_ctl</application> command-line.
+       Use the <option>-p</option> option to specify which port the target server should listen on.
       </para>
```

Can you clarify the <literal>port</literal> in the config file is ignored only
while the pg_createsuscriber commands?

07.
```
+    <application>pg_createsubscriber</application> is designed have full and
```

s/is designed have/is designed to have/.

Best regards,
Hayato Kuroda
FUJITSU LIMITED


Re: Documentation Edits for pg_createsubscriber

From
"David G. Johnston"
Date:
Thank you for the quick reply.

On Mon, Mar 10, 2025 at 7:56 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
Dear David,

> The main thing I noted is that our synopsis is simply wrong regarding the
> optionality of the --database option.  I went with a fix that included only
> showing the two mandatory options and adding a paragraph to usage explaining
> that --database exists and how to use it.  Our existing example also covers
> that common usage.

According to the code, -D and -P is anyway needed, but -d can be omitted when -P
contians dbname option. And you wanted to follow the rule, right?
IIUC shorter synopsis is always welcomed.

I'm having second thoughts here a bit as I absorb more.  I'm not really sure why we added --publication-name etc... but I was giving them more weight than is probably needed.  They don't seem like something a typical user would care about.  I'll probably add back -d but leave mention of all the long-form option names out still.  Or, my other thought, two lines of commands and explain the single-database and multi-database mechanics that distinguish them in the description.


> It took me a bit, and reading implementation code, to understand what was meant
> by "Additional recovery parameters are added to avoid ...".  I decided that
> detail seemed unnecessary for the user-facing documentation - supposedly they
> will never see the recovery file if everything works OK (and if they do the
> various empty string settings should be reasonably self-evident).  If we do
> want to keep it I'll try to find a better wording.

To confirm; users can run SHOW command after the conversion and they can see
our setting. Do you think it is a user-facing?

And they will see empty strings (or blanks or whatever the defaults are) for everything except LSN, like they always do, and not realize or care that our code generated file actually contained lines for them. (I haven't actually experimented here though.)  But in normal use they have no reason to inspect the state data generated by this process so which it is accessible as a matter of design it is not practically user-facing for a user of this application.  The only state they need to manage is that their WAL retention for their standby is large enough to execute the needed point-in-time recovery.


Here are my comments. (I ran Grammarly to check your sentences)

01.
Your patch breaks 80-column-per-line rule. Please check it.

Ok, though it seems like a very loosely enforced one in the documentation.


02.
```
+   The minimal command shown above...
```

"shown" can be removed.

 
ok
 
03.
```
+   then created using auto-generated names, which can be overriden
```

s/overriden/overridden/.


ok

04.
```
+   using the corresponding long-form command-line options, once each per database.
```

"," and "each" can be removed.


I'll ponder this one and explain if I leave it as-is.
 
05.
```
+   does not perform an initial data copy.  Instead it relies on a LSN-based
```

Latter sentence can be "Inseted, it relies on an LSN-based..."

 
probably, though I have the vague impression that this entire page has too many unneeded commas right now. But not obviously so.

 
06.
```
+      <para>
+       Note, the configuration setting <literal>port</literal> in this file is effectively ignored
+       as <application>pg_createsubscriber</application> always specifies the target listening port
+       on the <application>pg_ctl</application> command-line.
+       Use the <option>-p</option> option to specify which port the target server should listen on.
       </para>
```

Can you clarify the <literal>port</literal> in the config file is ignored only
while the pg_createsuscriber commands?

I'm not following what you want me to do here?  This note only applies while pg_createsubscriber manages the target service which the first second basically states directly.


07.
```
+    <application>pg_createsubscriber</application> is designed have full and
```

s/is designed have/is designed to have/.

yep

David J.

Re: Documentation Edits for pg_createsubscriber

From
Peter Smith
Date:
On Tue, Mar 11, 2025 at 8:20 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> Hi.
>
> With all the recent work on pg_createsubscriber I decided to take a look at it, namely the documentation.
>

I agree this page needs some improvement. Thanks for taking up this task.

> The 0001 patch is basically just fixing some typos and other minor edits.
>
> The 0002 includes some of the more editoralized (but still limited) changes.
>
> The main thing I noted is that our synopsis is simply wrong regarding the optionality of the --database option.  I
wentwith a fix that included only showing the two mandatory options and adding a paragraph to usage explaining that
--databaseexists and how to use it.  Our existing example also covers that common usage. 
>
> I also noted that we've started including both the long and short forms of the options in our synopsis.  I went with
the"choose the short option as the only one" as exemplified by pg_upgrade.  It seems quite noisy to include both in a
synopsisthat is fine with including 10+ other options under [options...]. 

I think the synopsis format is destined to change significantly in the
thread [1] ("Enhance 'pg_createsubscriber' to retrieve databases
automatically when no database is provided"). Maybe it is worth
checking that thread to see if what you have in mind is compatible
with that proposal.

FYI some months ago I posted a thread [2]  ("DOCS: Make the Server
Application docs synopses more consistent"), but unfortunately, it
received no responses. Perhaps you can take a look at that thread and
we can try to bash all of these tool synopses until they become more
consistent.

...
> I removed the use of the word "switch" and replaced it with "option" for consistency.

Yes, +1 for this. You pipped me by a few days from posting a very
similar patch. It might be OK if "switch" was used only for boolean
flags and "option" was used for others, but the master SGML for this
page was an inconsistent muddle, so changing everything to "option" as
you have done is probably the best way to go.

======

Some other minor comments:

1.
I felt that all the sentences like "If this option is not
specified..." overuse the word "option" which is now everywhere. It
might be nicer to just name the options explicitly sometimes

e.g. --publication
CURRENT: If this option is not specified, a generated name is assigned
to the publication name.
SUGGEST: If --publication is not specified, a generated name is
assigned to the publication name.

e.g. --replication-slot
CURRENT: If this option is not specified, the subscription name is
assigned to the replication slot name.
SUGGEST: If this --replication-slot is not specified, the subscription
name is assigned to the replication slot name.

e.g. --subscription
CURRENT: If this option is not specified, a generated name is assigned
to the subscription name.
SUGGEST: If this --subscription is not specified, a generated name is
assigned to the subscription name.

======
[1] https://www.postgresql.org/message-id/flat/CAHv8RjKhA%3D_h5vAbozzJ1Opnv%3DKXYQHQ-fJyaMfqfRqPpnC2bA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/CAHut%2BPv%2B96CykSY6-uLKZWaa6to9x1DurmyJh8Jmu1_1P7hp4Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



RE: Documentation Edits for pg_createsubscriber

From
"Hayato Kuroda (Fujitsu)"
Date:
Dear David,

> And they will see empty strings (or blanks or whatever the defaults are) for
> everything except LSN, like they always do, and not realize or care that our code
> generated file actually contained lines for them. (I haven't actually experimented
> here though.)

It is OK for me to omit the description, but there is a note - recovery_target_lsn
would be set with certain LSN.

> Ok, though it seems like a very loosely enforced one in the documentation.

Yeah, e.g., no need to break lines in the tag.

> I'm not following what you want me to do here?  This note only applies while
> pg_createsubscriber manages the target service which the first second basically
> states directly.

I expected like;

```
Note, the configuration setting <literal>port</literal> in this file is
effectively ignored while running <application>pg_createsubscriber</application>
as it always specifies the target listening port on the <application>pg_ctl</application>
command-line.
```

Best regards,
Hayato Kuroda
FUJITSU LIMITED