Thread: Glossary and initdb definition work for "superuser" and database/cluster

Glossary and initdb definition work for "superuser" and database/cluster

From: "David G. Johnston"

Hey,

Recent threads have pointed out some long-standing doc language in initdb that could be made more precise, especially in light of the relatively recent addition of a glossary.  Toward this end I'm attaching a patch that defines three terms: "bootstrap superuser", "database superuser" and "superuser".  I didn't add any extra-glossary links for the latter two but did for the limited-in-scope bootstrap superuser that is really only defined in initdb (actually, I suspect the authorization docs could use a link too but haven't gone looking for an appropriate place yet).

In passing I also changed a few places where the documentation says "database" when the thing being referred to is basically the file system data directory, which is a cluster-scoped thing.

I did some grep'ing, though another pass or two is probably worthwhile.  For now I submit a preliminary patch for consideration and buy-in before trying to polish it up.

David J.


Re: Glossary and initdb definition work for "superuser" and database/cluster

From: Justin Pryzby

On Tue, Nov 01, 2022 at 03:47:15PM -0700, David G. Johnston wrote:
> Hey,
> 
> Recent threads have pointed out some long-standing doc language in initdb
> that could be made more precise, especially in light of the relatively
> recent addition of a glossary.  Toward this end I'm attaching a patch that
> defines three terms: "bootstrap superuser", "database superuser" and
> "superuser".  I didn't add any extra-glossary links for the latter two but
> did for the limited-in-scope bootstrap superuser that is really only
> defined in initdb (actually, I suspect the authorization docs could use a
> link too but haven't gone looking for an appropriate place yet).
> 
> In passing I also changed a few places where the documentation says
> "database" when the thing being referred to is basically the file system
> data directory, which is a cluster-scoped thing.
> 
> I did some grep'ing, though another pass or two is probably worthwhile.
> For now I submit a preliminary patch for consideration and buy-in before
> trying to polish it up.

I think this is wrong:

| https://www.postgresql.org/docs/devel/app-initdb.html
| -U username
| --username=username
| 
|     Selects the user name of the database superuser. This defaults to
|     the name of the effective user running initdb [...]

It's true that the user who runs initdb is typically named "postgres",
but that's only by convention.
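
The defaulting rule in the quoted option text can be sketched roughly as below. This is an illustrative sketch only, not initdb's actual code: the helper name bootstrap_superuser_name and the user name "alice" are hypothetical, and the lookup of the effective user assumes a POSIX system.

```python
import os
import pwd  # POSIX-only; assumption: the sketch is run on a Unix-like system


def bootstrap_superuser_name(username_option=None, effective_user=None):
    """Return the name the bootstrap superuser would be given.

    Hypothetical helper: an explicit -U/--username value wins; otherwise
    the name of the effective OS user running the program is used.
    """
    if username_option:
        return username_option
    if effective_user is not None:
        return effective_user
    # Look up the name of the effective user of the current process.
    return pwd.getpwuid(os.geteuid()).pw_name


# No -U given: the OS user's name is used, whatever it happens to be.
print(bootstrap_superuser_name(effective_user="alice"))              # alice
# -U given: the option decides, regardless of who runs the program.
print(bootstrap_superuser_name("postgres", effective_user="alice"))  # postgres
```

The point of the sketch is that "postgres" shows up only when the operating system user is named that way or when it is passed explicitly.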

>+     This user owns all system catalog tables in each database.  It also is the role
>+     from which all granted permission originate.  Because of these things this
>+     role may not be dropped.

plural permissions

these comma

>+     While the <glossterm linkend="glossary-bootstrap-superuser">bootstrap superuser</glossterm> is
>+     a database superuser it has special obligations and restrictions that plain database superusers do not.

comma it

>+  <glossentry id="glossary-superuser">
>+   <glossterm>Superuser</glossterm>
>+   <glossdef>
>+    <para>
>+     As used in this documentation it is a synonym for

comma it

>    Creating a database cluster consists of creating the directories in
>-   which the database data will live, generating the shared catalog
>+   which the cluster data will live, generating the shared catalog

+1

>    tables (tables that belong to the whole cluster rather than to any
>-   particular database), and creating the <literal>postgres</literal>,
>-   <literal>template1</literal>, and <literal>template0</literal> databases.
>+   particular database), creating the <literal>postgres</literal>,
>+   <literal>template1</literal>, and <literal>template0</literal> databases,
>+   and creating the
>+   <glossterm linkend="glossary-bootstrap-superuser">boostrap superuser</glossterm>
>+   (<literal>postgres</literal>, by default).

"postgres" is wrong

>     For security reasons the new cluster created by <command>initdb</command>
>-    will only be accessible by the cluster owner by default.  The
>+    will only be accessible by the cluster user by default.  The

I prefer "cluster owner"

>         <command>initdb</command>, but you can avoid writing it by
>         setting the <envar>PGDATA</envar> environment variable, which
>         can be convenient since the database server
>-        (<command>postgres</command>) can find the database
>+        (<command>postgres</command>) can find the data
>         directory later by the same variable.

+1
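
The -D versus PGDATA behaviour described in this hunk can be sketched as follows. It is a sketch under stated assumptions, not initdb's implementation: resolve_data_directory and the example paths are hypothetical names introduced only for illustration.

```python
import os


def resolve_data_directory(d_option=None, environ=None):
    """Pick the data directory the way the quoted text describes.

    Hypothetical helper: an explicit -D value is used if present,
    otherwise the PGDATA environment variable, otherwise it is an error.
    """
    env = os.environ if environ is None else environ
    if d_option:
        return d_option
    pgdata = env.get("PGDATA")
    if pgdata:
        return pgdata
    raise ValueError("no data directory specified: pass -D or set PGDATA")


# PGDATA alone is enough; the same variable can later be read by the server.
print(resolve_data_directory(environ={"PGDATA": "/srv/pg/data"}))
# An explicit option takes precedence over the environment.
print(resolve_data_directory("/tmp/other", environ={"PGDATA": "/srv/pg/data"}))
```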

>-        Makes <command>initdb</command> read the database superuser's password
>+        Makes <command>initdb</command> read the bootstrap superuser's password
>         from a file.  The first line of the file is taken as the password.

+1

>-        Safely write all database files to disk and exit.  This does not
>+        Safely write all database cluster files to disk and exit.  This does not

+1

>         It may be useful to adjust this size to control the granularity of
>-        WAL log shipping or archiving.  Also, in databases with a high volume
>+        WAL log shipping or archiving.  Also, in clusters with a high volume
>         of WAL, the sheer number of WAL files per directory can become a

+1



Re: Glossary and initdb definition work for "superuser" and database/cluster

From: "David G. Johnston"

On Tue, Nov 1, 2022 at 5:20 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Tue, Nov 01, 2022 at 03:47:15PM -0700, David G. Johnston wrote:
>
> I think this is wrong:
>
> | https://www.postgresql.org/docs/devel/app-initdb.html
> | -U username
> | --username=username
> |
> |     Selects the user name of the database superuser. This defaults to
> |     the name of the effective user running initdb [...]
>
> It's true that the user who runs initdb is typically named "postgres",
> but that's only by convention.

Thanks.  I feel bad for missing this one given that I've been working on fixing up the default libpq user name wording.

> >+     This user owns all system catalog tables in each database.  It also is the role
> >+     from which all granted permission originate.  Because of these things this
> >+     role may not be dropped.
>
> plural permissions

+1

> these comma

things comma actually (+0.5)

> >+     While the <glossterm linkend="glossary-bootstrap-superuser">bootstrap superuser</glossterm> is
> >+     a database superuser it has special obligations and restrictions that plain database superusers do not.
>
> comma it

+0.5

> >    tables (tables that belong to the whole cluster rather than to any
> >-   particular database), and creating the <literal>postgres</literal>,
> >-   <literal>template1</literal>, and <literal>template0</literal> databases.
> >+   particular database), creating the <literal>postgres</literal>,
> >+   <literal>template1</literal>, and <literal>template0</literal> databases,
> >+   and creating the
> >+   <glossterm linkend="glossary-bootstrap-superuser">boostrap superuser</glossterm>
> >+   (<literal>postgres</literal>, by default).
>
> "postgres" is wrong

Yep, will give this another look to see if anywhere but the actual option description wants to cover how this really works (or maybe just point the reader there).

> >     For security reasons the new cluster created by <command>initdb</command>
> >-    will only be accessible by the cluster owner by default.  The
> >+    will only be accessible by the cluster user by default.  The
>
> I prefer "cluster owner"

I'll either need to change it back or fix the one in the next sentence...

I'm still leaning toward continuing to use cluster user like everywhere else on the page instead of adding a new term.  The fact that this doesn't work on Windows makes having it in the description section at all arguable.  I'd rather rewrite it something like:

"On POSIX systems, the resulting data directory, and all of its contents, will have permissions of 700, though you can use --allow-group-access to instead get 750.  In either case, the effective user running initdb will become the owner and group for the files created within the data directory."

(I haven't tried to prove this owner:group dynamic, but having 700 or 750 and specifying the alternative does result in the directory having its permission bits changed during initdb)
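
One way to check the mode bits and ownership mentioned above on a POSIX system is sketched below. The temporary directory merely stands in for a real data directory, and describe_data_directory is a hypothetical helper name; nothing here runs initdb or proves what it does.

```python
import os
import stat
import tempfile


def describe_data_directory(path):
    """Return (octal permission string, owner uid, group gid) for path."""
    info = os.stat(path)
    mode = stat.S_IMODE(info.st_mode)  # keep only the permission bits
    return format(mode, "03o"), info.st_uid, info.st_gid


# Stand-in for a data directory; a real check would point at the cluster's
# data directory instead of a temporary one.
with tempfile.TemporaryDirectory() as demo_dir:
    os.chmod(demo_dir, 0o700)  # owner-only access
    print(describe_data_directory(demo_dir))
    os.chmod(demo_dir, 0o750)  # adds group read and search (execute) access
    print(describe_data_directory(demo_dir))
```

Running the same function against an actual data directory, once after a default initdb and once after one using --allow-group-access, would be a direct way to confirm the owner:group behaviour left unverified here.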

Feel free to suggest something if similar wording should be added for non-POSIX systems.

I intend to try and integrate something like the above to replace the existing paragraph in the next version.

Thank you for the review!

David J.

P.S. I'm now looking at the very first paragraph to initdb more closely, not liking "single server instance" all that much and wondering how to fit in "cluster user" there - possibly by saying something like "...managed by a single server process, and physical data directory, whose effective user and owner respectively is called the cluster user.  That user must exist and be used to execute this program."

Then the whole "initdb must be run as..." paragraph can probably just go away.  Moving the commentary about "root", again a non-Windows thing, to the notes area.


Re: Glossary and initdb definition work for "superuser" and database/cluster

From: "David G. Johnston"

On Tue, Nov 1, 2022 at 6:59 PM David G. Johnston <david.g.johnston@gmail.com> wrote:

> P.S. I'm now looking at the very first paragraph to initdb more closely, not liking "single server instance" all that much and wondering how to fit in "cluster user" there - possibly by saying something like "...managed by a single server process, and physical data directory, whose effective user and owner respectively is called the cluster user.  That user must exist and be used to execute this program."
>
> Then the whole "initdb must be run as..." paragraph can probably just go away.  Moving the commentary about "root", again a non-Windows thing, to the notes area.


Version 2 attached, some significant re-working.  Starting to think that initdb isn't the place for some of this content - in particular the stuff I'm deciding to move down to the Notes section.  Might consider moving some of it to the Server Setup and Operation chapter 19 - Creating Cluster (or nearby...) [1].

I settled on "cluster owner" over "cluster user" and made the terminology consistent throughout initdb and the glossary (haven't looked at chapter 19 yet).  Also added it to the glossary.

Moved quite a bit of material to notes from the description and options and expanded upon what had already been said based upon various discussions I've been part of on the mailing lists.

Decided to call out, in the glossary, the effective equivalence of database superuser and cluster owner, which also serves as an explanation of why root is prohibited from being a cluster owner.

David J.



Re: Glossary and initdb definition work for "superuser" and database/cluster

From: Alvaro Herrera

On 2022-Nov-02, David G. Johnston wrote:

> Version 2 attached, some significant re-working.  Starting to think that
> initdb isn't the place for some of this content - in particular the stuff
> I'm deciding to move down to the Notes section.  Might consider moving some
> of it to the Server Setup and Operation chapter 19 - Creating Cluster (or
> nearby...) [1].
> 
> I settled on "cluster owner" over "cluster user" and made the terminology
> consistent throughout initdb and the glossary (haven't looked at chapter 19
> yet).  Also added it to the glossary.

Generally speaking, I like the idea of documenting these things.
However it sounds like you're not done with the wording and editing, so
I'm not committing the whole patch, but it seems a good starting point
to at least have some basic definitions.  So I've extracted them from
your patch and pushed those.  You can already see it at
https://www.postgresql.org/docs/devel/glossary.html

I left out almost all the material from the patch that's not in the
glossary proper, and also a few phrases in the glossary itself.  Some of
these sounded like security considerations rather than part of the
definitions.  I think we should have a separate chapter in Part III
(Server Administration) that explains many security aspects; right now
there's no hope of collecting a lot of very important advice in a single
place, so a wannabe admin has no chance of getting things right.  That
seems to me a serious deficiency.  A new chapter could provide a lot of
general advice on every aspect that needs to be considered, and link to
the reference section for additional details.  Maybe part of these
initdb considerations could be there, too.

> Moved quite a bit of material to notes from the description and options and
> expanded upon what had already been said based upon various discussions
> I've been part of on the mailing lists.

Please rebase.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Always assume the user will do much worse than the stupidest thing
you can imagine."                                (Julien PUYDT)



Re: Glossary and initdb definition work for "superuser" and database/cluster

From: "David G. Johnston"

On Fri, Nov 18, 2022 at 4:11 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2022-Nov-02, David G. Johnston wrote:
>
> > Version 2 attached, some significant re-working.  Starting to think that
> > initdb isn't the place for some of this content - in particular the stuff
> > I'm deciding to move down to the Notes section.  Might consider moving some
> > of it to the Server Setup and Operation chapter 19 - Creating Cluster (or
> > nearby...) [1].
> >
> > I settled on "cluster owner" over "cluster user" and made the terminology
> > consistent throughout initdb and the glossary (haven't looked at chapter 19
> > yet).  Also added it to the glossary.
>
> Generally speaking, I like the idea of documenting these things.
> However it sounds like you're not done with the wording and editing, so
> I'm not committing the whole patch, but it seems a good starting point
> to at least have some basic definitions.  So I've extracted them from
> your patch and pushed those.  You can already see it at
> https://www.postgresql.org/docs/devel/glossary.html

Agreed on the not quite ready yet, and that the glossary is indeed self-contained enough to go in by itself at this point.  Thank you for doing that.

> I left out almost all the material from the patch that's not in the
> glossary proper, and also a few phrases in the glossary itself.  Some of
> these sounded like security considerations rather than part of the
> definitions.  I think we should have a separate chapter in Part III
> (Server Administration) that explains many security aspects; right now
> there's no hope of collecting a lot of very important advice in a single
> place, so a wannabe admin has no chance of getting things right.  That
> seems to me a serious deficiency.  A new chapter could provide a lot of
> general advice on every aspect that needs to be considered, and link to
> the reference section for additional details.  Maybe part of these
> initdb considerations could be there, too.

I'll consider that approach as well as other spots in the documentation on this next pass.

David J.