Thread: Re: Additional Chapter for Tutorial

Re: Additional Chapter for Tutorial

From
"David G. Johnston"
Removing -docs as moderation won’t let me cross-post.

On Monday, October 26, 2020, David G. Johnston <david.g.johnston@gmail.com> wrote:
> On Monday, October 26, 2020, Jürgen Purtz <juergen@purtz.de> wrote:
>> On 21.10.20 22:33, David G. Johnston wrote:
>>
>>> Two, I find the amount of detail being provided here to be on the too-much side.  A bit more judicious use of links into the appropriate detail chapters seems warranted.
>>
>> The patch is intended to give every interested person an overall impression of the chapter within its new position. Because it has moved from part 'Tutorial' to 'Internals' the text should be very accurate concerning technical issues - like all the other chapters in this part. A tutorial chapter has a more superficial nature.
>
> Haven’t reviewed the patches yet but...
>
> I still think that my comment applies even with the move to internals.  The value here is putting together a coherent narrative and making deeper implementation details accessible.  If those details are already covered elsewhere in the documentation (not source code), links should be given serious consideration.

David J.

Re: Additional Chapter for Tutorial

From
Jürgen Purtz
On 26.10.20 15:53, David G. Johnston wrote:
> Removing -docs as moderation won’t let me cross-post.
>
> On Monday, October 26, 2020, David G. Johnston <david.g.johnston@gmail.com> wrote:
>> On Monday, October 26, 2020, Jürgen Purtz <juergen@purtz.de> wrote:
>>> On 21.10.20 22:33, David G. Johnston wrote:
>>>
>>>> Two, I find the amount of detail being provided here to be on the too-much side.  A bit more judicious use of links into the appropriate detail chapters seems warranted.
>>>
>>> The patch is intended to give every interested person an overall impression of the chapter within its new position. Because it has moved from part 'Tutorial' to 'Internals' the text should be very accurate concerning technical issues - like all the other chapters in this part. A tutorial chapter has a more superficial nature.
>>
>> Haven’t reviewed the patches yet but...
>>
>> I still think that my comment applies even with the move to internals.  The value here is putting together a coherent narrative and making deeper implementation details accessible.  If those details are already covered elsewhere in the documentation (not source code), links should be given serious consideration.
>
> David J.

Please find attached the new patch, which integrates David's suggestions, as a) a diff against the last patch and b) a diff against master.

Notably, it contains

  • nearly all of his suggestions (see the comments marked 'DGJ' in the SGML file)
  • fewer uses of <firstterm>. This was a holdover from pre-glossary times, when I tried to emphasize standard terms. That is no longer necessary because these terms are now clearly defined in the glossary.

--

J. Purtz



Re: Additional Chapter for Tutorial

From
Erik Rijkers
On 2020-10-30 11:57, Jürgen Purtz wrote:
> On 26.10.20 15:53, David G. Johnston wrote:
>> Removing -docs as moderation won’t let me cross-post.
>> 

Hi,

I applied 0009-architecture-vs-master.patch to head
and went through architecture.sgml (only that file),
then produced the attached .diff


And I wrote down some separate items:

1.
'Two Phase Locking' and 'TPL' should be, I think,
'Two-Phase Commit'. Please someone confirm.
(no changes made)

2.
Comparing xids to sequences because both 'count up' seems a bad
idea.
(I don't think it's always true in the case of sequences)
(no changes made)

3.
'accesses' seems a somewhat strange word; most of the time, just 'access'
may be better.  Not sure; a native speaker is wanted. (no changes made)

4.
In Postgres, 'heap' often (always?) means files. More generally, though,
the term is associated with memory.  Therefore I think it would be good
to explicitly use 'heap file' at least once at the beginning, to
make clear that heap implies 'safely written away to disk'.  Again, I'm
not quite sure my understanding is correct; I have made no changes
in this regard.



Erik Rijkers


Re: Additional Chapter for Tutorial

From
Jürgen Purtz
On 30.10.20 17:45, Erik Rijkers wrote:
> Hi,
>
> I applied 0009-architecture-vs-master.patch to head
> and went through architecture.sgml (only that file),
> then produced the attached .diff
>
>
> And I wrote down some separate items:
>
> 1.
> 'Two Phase Locking' and 'TPL' should be, I think,
> 'Two-Phase Commit'. Please someone confirm.
> (no changes made)
>
> 2.
> To compare xid to sequence because they similarly 'count up' seems a 
> bad idea.
> (I don't think it's always true in the case of sequences)
> (no changes made)
>
> 3.
> 'accesses' seems a somewhat strange word most of the time just 
> 'access' may be better.  Not sure - native speaker wanted. (no changes 
> made)
>
> 4.
> 'heap', in postgres, means often (always?) files. But more generally, 
> the meaning is more associated with memory.  Therefore it would be 
> good I think to explicitly use 'heap file' at least in the beginning 
> once to make clear that heap implies 'safely written away to disk'.  
> Again, I'm not quite sure if my understanding is correct - I have made 
> no changes in this regard.
>
>
>
> Erik Rijkers

All suggestions so far are incorporated in the attached patch, with the
following exceptions:

- 'Two Phase Locking' is the intended term.

- Not adopted:

      Second, the transfer of dirty buffers from Shared Memory to
      files must take place. This is the primary task of the
-    Background Writer process. Because I/O activities can block
+    Checkpointer process. Because I/O activities can block
      other processes, it starts periodically and

Partly adopted:

-    the data in the old version of the row does not change! ...

-    before. Nothing is thrown away so far! Only <literal>xmax</literal> ...

--

J. Purtz


Re: Additional Chapter for Tutorial

From
Erik Rijkers
On 2020-11-01 16:38, Jürgen Purtz wrote:
> On 30.10.20 17:45, Erik Rijkers wrote:
>> 
>> And I wrote down some separate items:
>> 
>> 1.
>> 'Two Phase Locking' and 'TPL' should be, I think,
>> 'Two-Phase Commit'. Please someone confirm.
>> (no changes made)
>> 
>> Erik Rijkers
> 
> All suggestions so far are summarized in the attached patch with the
> following exceptions:
> 
> - 'Two Phase Locking' is the intended term.

OK, so what is 'Two Phase Locking'?  The term is not explained, and not 
used anywhere else in the manual.  You propose to introduce it here, in 
the tutorial.  I don't know what it means, and I am not really a 
beginner.

'Two Phase Locking' should be explained somewhere, and how it relates 
(or not) to Two-Phase Commit (2PC), don't you agree?


Erik Rijkers

Re: Additional Chapter for Tutorial

From
Jürgen Purtz
On 02.11.20 07:15, Erik Rijkers wrote:
> On 2020-11-01 16:38, Jürgen Purtz wrote:
>> On 30.10.20 17:45, Erik Rijkers wrote:
>>>
>>> And I wrote down some separate items:
>>>
>>> 1.
>>> 'Two Phase Locking' and 'TPL' should be, I think,
>>> 'Two-Phase Commit'. Please someone confirm.
>>> (no changes made)
>>>
>>> Erik Rijkers
>>
>> All suggestions so far are summarized in the attached patch with the
>> following exceptions:
>>
>> - 'Two Phase Locking' is the intended term.
>
> OK, so what is 'Two Phase Locking'?  The term is not explained, and 
> not used anywhere else in the manual.  You propose to introduce it 
> here, in the tutorial.  I don't know what it means, and I am not 
> really a beginner.
>
> 'Two Phase Locking' should be explained somewhere, and how it relates 
> (or not) to Two-Phase Commit (2PC), don't you agree?
>
>
> Erik Rijkers
>
>
It may be possible to explain OCC and 2PL in two or three sentences
in the glossary. But I think we should not try to explain such
general strategies: they are not specific to PG and are not even
implemented in it. Instead, if the paragraph is too detailed, we can use a
more general formulation that does not name specific locking strategies.

OLD:

     A first approach to implement protections against concurrent
     access to the same data may be the locking of critical
     rows. Two such techniques are:
     <emphasis>Optimistic Concurrency Control</emphasis> (OCC)
     and <emphasis>Two Phase Locking</emphasis> (2PL).
     <productname>PostgreSQL</productname> implements a third, more
     sophisticated technique: <firstterm>Multiversion Concurrency
     Control</firstterm> (MVCC). The crucial advantage of MVCC ...

Proposal:

     A first approach to implement protections against concurrent
     access to the same data may be the locking of critical
     rows.
     <productname>PostgreSQL</productname> implements a more
     sophisticated technique which avoids any locking:
     <firstterm>Multiversion Concurrency
     Control</firstterm> (MVCC). The crucial advantage of MVCC ...
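The idea behind the proposal (new row versions instead of in-place overwrites) can be illustrated with a simplified visibility check. This is only a sketch under stated assumptions: all names (`Snapshot`, `RowVersion`, `row_version_visible`, `example_committed`) are hypothetical, and it ignores the reader's own changes, subtransactions, hint bits and xid wraparound, all of which PostgreSQL's real visibility code has to handle. It shows how a reader decides from a row version's `xmin` (creating transaction) and `xmax` (deleting transaction) whether that version belongs to its snapshot, without waiting for a concurrent writer.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t Xid;
#define INVALID_XID ((Xid) 0)   /* xmax not set: row version never deleted/updated */

/* Hypothetical, simplified snapshot. */
typedef struct {
    Xid xmin;                   /* every xid below this had finished */
    Xid xmax;                   /* every xid at or above this had not started */
    const Xid *in_progress;     /* xids still running when the snapshot was taken */
    int n_in_progress;
} Snapshot;

/* One row version: who created it and who deleted it. */
typedef struct {
    Xid xmin;
    Xid xmax;
} RowVersion;

typedef bool (*CommitLookup)(Xid xid);

/* Hypothetical commit log for the example: pretend only xid 105 aborted. */
bool example_committed(Xid xid)
{
    return xid != 105;
}

/* Had xid committed before the snapshot was taken?
 * Plain '<' and '>=' ignore wraparound, which real code must handle. */
static bool committed_before_snapshot(Xid xid, const Snapshot *s, CommitLookup committed)
{
    if (xid >= s->xmax)
        return false;                       /* started after the snapshot */
    if (xid >= s->xmin) {
        for (int i = 0; i < s->n_in_progress; i++)
            if (s->in_progress[i] == xid)
                return false;               /* still running at snapshot time */
    }
    return committed(xid);                  /* finished: counts only if committed */
}

bool row_version_visible(const RowVersion *row, const Snapshot *s, CommitLookup committed)
{
    if (!committed_before_snapshot(row->xmin, s, committed))
        return false;                       /* creator not visible: version does not exist for us */
    if (row->xmax == INVALID_XID)
        return true;                        /* never deleted or updated */
    return !committed_before_snapshot(row->xmax, s, committed);
}
```

With a snapshot taken while xid 107 is running, a version deleted by 107 stays visible to the reader, and a version deleted by the aborted xid 105 also stays visible; the old version is only hidden once its deleter has committed before the snapshot.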

Any thoughts or other suggestions?

--

J. Purtz





Re: Additional Chapter for Tutorial

From
Erik Rijkers
On 2020-11-02 09:26, Jürgen Purtz wrote:

> OLD:
> 
>     A first approach to implement protections against concurrent
>     access to the same data may be the locking of critical
>     rows. Two such techniques are:
>     <emphasis>Optimistic Concurrency Control</emphasis> (OCC)
>     and <emphasis>Two Phase Locking</emphasis> (2PL).
>     <productname>PostgreSQL</productname> implements a third, more
>     sophisticated technique: <firstterm>Multiversion Concurrency
>     Control</firstterm> (MVCC). The crucial advantage of MVCC ...
> 
> Proposal:
> 
>     A first approach to implement protections against concurrent
>     access to the same data may be the locking of critical
>     rows.
>     <productname>PostgreSQL</productname> implements a more
>     sophisticated technique which avoids any locking:
>     <firstterm>Multiversion Concurrency
>     Control</firstterm> (MVCC). The crucial advantage of MVCC ...
> 
> Any thoughts or other suggestions?
> 

Yes, just leave it out. Much better, as far as I'm concerned.

Erik





Re: Additional Chapter for Tutorial

From
Jürgen Purtz
On 02.11.20 09:44, Erik Rijkers wrote:
> On 2020-11-02 09:26, Jürgen Purtz wrote:
>
>> OLD:
>>
>>     A first approach to implement protections against concurrent
>>     access to the same data may be the locking of critical
>>     rows. Two such techniques are:
>>     <emphasis>Optimistic Concurrency Control</emphasis> (OCC)
>>     and <emphasis>Two Phase Locking</emphasis> (2PL).
>>     <productname>PostgreSQL</productname> implements a third, more
>>     sophisticated technique: <firstterm>Multiversion Concurrency
>>     Control</firstterm> (MVCC). The crucial advantage of MVCC ...
>>
>> Proposal:
>>
>>     A first approach to implement protections against concurrent
>>     access to the same data may be the locking of critical
>>     rows.
>>     <productname>PostgreSQL</productname> implements a more
>>     sophisticated technique which avoids any locking:
>>     <firstterm>Multiversion Concurrency
>>     Control</firstterm> (MVCC). The crucial advantage of MVCC ...
>>
>> Any thoughts or other suggestions?
>>
>
> Yes, just leave it out. Much better, as far as I'm concerned.
>
> Erik
>
>
Because there have been no further comments in the last few days, I have created a
consolidated patch. It contains Erik's suggestion and some tweaks to
the text size in the graphics.

--

J. Purtz



Re: Additional Chapter for Tutorial

From
Erik Rijkers
On 2020-11-07 13:24, Jürgen Purtz wrote:
>> 
> Because there have been no more comments in the last days I created a
> consolidated patch. It contains Erik's suggestion and some tweaks for
> the text size within graphics.
> 
> [0011-architecture.patch]

Hi,

I went through architecture.sgml once more; some proposed changes 
attached.

Also, in some .svg files I noticed 'jungest', which I suppose should be
'youngest'.
I did not change them, but below is the file list from grep -l 'jung'.

./doc/src/sgml/images/freeze-ink.svg
./doc/src/sgml/images/freeze-ink-svgo.svg
./doc/src/sgml/images/freeze-raw.svg
./doc/src/sgml/images/wraparound-ink.svg
./doc/src/sgml/images/wraparound-ink-svgo.svg
./doc/src/sgml/images/wraparound-raw.svg


Thanks,

Erik




Re: Additional Chapter for Tutorial

From
Jürgen Purtz
On 07.11.20 20:15, Erik Rijkers wrote:
> On 2020-11-07 13:24, Jürgen Purtz wrote:
>>>
>> Because there have been no more comments in the last days I created a
>> consolidated patch. It contains Erik's suggestion and some tweaks for
>> the text size within graphics.
>>
>> [0011-architecture.patch]
>
> Hi,
>
> I went through architecture.sgml once more; some proposed changes 
> attached.
>
> And in some .svg files I noticed 'jungest' which should be 'youngest', 
> I suppose.
> I did not change them but below is filelist of  grep -l 'jung'.
>
> ./doc/src/sgml/images/freeze-ink.svg
> ./doc/src/sgml/images/freeze-ink-svgo.svg
> ./doc/src/sgml/images/freeze-raw.svg
> ./doc/src/sgml/images/wraparound-ink.svg
> ./doc/src/sgml/images/wraparound-ink-svgo.svg
> ./doc/src/sgml/images/wraparound-raw.svg
>
>
> Thanks,
>
> Erik
>
>
Good catches. Everything applied.

--

J. Purtz



Re: Additional Chapter for Tutorial

From
"David G. Johnston"
On Sun, Nov 8, 2020 at 8:56 AM Jürgen Purtz <juergen@purtz.de> wrote:

> Good catches. Everything applied.

Reviewed the first three sections.

template0 - I would remove the schema portions of this and simply note this as being a pristine recovery database in the diagram.

I would drop the word "more" and just say "system schemas".  I would drop pg_toast from the list of system schemas and focus on the three user-facing ones.

Instead of "my_schema" (optional) I would do "my_schema" (example)

Server Graphic
#3 Global SQL Objects: Objects which are shared among all databases within a cluster.
#6 Client applications are prohibited from connecting to template0
#1 If by "you" we mean "the client", saying that you work "in the cluster data" doesn't really help.  I would emphasize that the client sees an endpoint that the Postmaster publishes as a port or socket file, and that this plus the database name defines what the client connects to (meld with #5).
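To make #1 concrete, here is a small sketch of the two pieces a local client combines: the socket file the postmaster publishes and the database name. It assumes the documented naming convention `.s.PGSQL.<port>` for the Unix-domain socket inside a directory listed in `unix_socket_directories` (compiled-in default `/tmp`; many distributions use `/var/run/postgresql` instead), and libpq's convention that a `host` value starting with `/` names a socket directory. Both helper functions are hypothetical, and values containing spaces would need quoting, which this sketch does not do.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: path of the Unix-domain socket the postmaster
 * creates for a given port in a given socket directory.
 * Returns 0 on success, -1 if buf is too small. */
int pg_socket_path(char *buf, size_t len, const char *socket_dir, int port)
{
    int n = snprintf(buf, len, "%s/.s.PGSQL.%d", socket_dir, port);
    return (n >= 0 && (size_t) n < len) ? 0 : -1;
}

/* Hypothetical helper: libpq-style keyword/value connection string.
 * The endpoint (socket directory or host name, plus port) and the
 * database name together identify what the client connects to. */
int pg_conninfo(char *buf, size_t len, const char *host, int port, const char *dbname)
{
    int n = snprintf(buf, len, "host=%s port=%d dbname=%s", host, port, dbname);
    return (n >= 0 && (size_t) n < len) ? 0 : -1;
}
```

For example, a client connecting to database `postgres` through the default socket directory and port would use `/tmp/.s.PGSQL.5432` and the connection string `host=/tmp port=5432 dbname=postgres`.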

In lieu of some of the existing detail provided about structure I would add information about configuration and search_path at this level.
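The search_path behavior mentioned here can be sketched as a first-match lookup over an ordered list of schemas. The sketch assumes the documented rule that `pg_catalog` is searched first unless the path lists it explicitly; it ignores `$user` expansion, the temporary schema and permission checks, and `TableEntry` and `resolve_table` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical catalog entry: a table name qualified by its schema. */
typedef struct {
    const char *schema;
    const char *table;
} TableEntry;

/* Resolve an unqualified table name: walk the effective search path in
 * order and return the schema of the first match, or NULL if none. */
const char *resolve_table(const char *name,
                          const char *const *search_path, size_t n_path,
                          const TableEntry *catalog, size_t n_catalog)
{
    bool catalog_listed = false;
    for (size_t i = 0; i < n_path; i++)
        if (strcmp(search_path[i], "pg_catalog") == 0)
            catalog_listed = true;

    /* Step 0 is the implicit pg_catalog; step k >= 1 is search_path[k - 1]. */
    for (size_t step = catalog_listed ? 1 : 0; step <= n_path; step++) {
        const char *schema = (step == 0) ? "pg_catalog" : search_path[step - 1];
        for (size_t j = 0; j < n_catalog; j++)
            if (strcmp(catalog[j].schema, schema) == 0 &&
                strcmp(catalog[j].table, name) == 0)
                return schema;
    }
    return NULL;
}
```

With the path `my_schema, public`, a table `orders` present in both schemas resolves to `my_schema`; a table that also exists in `pg_catalog` resolves there unless `pg_catalog` is placed later in the path.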

I like the object type enumeration - I would suggest grouping them by type in a manner consistent with the documentation and making each one a link to its "primary" section - the SQL Command reference if all else fails.

The "i" in "internal" in 51.3 (the image) needs capitalization.

You correctly add both Extension and Collation as database-level objects but they are not mentioned anywhere else.  They do belong here and need to be tied in properly in the text.

The whole thing needs a good pass focused on capitalization.  Both for typos and to decide when various primary concepts like Instance should be capitalized and when not.

51.4 - When you look at the diagram seeing /pg/data/base looks really cool, but when reading the prose where both the "pg" and the "base" are omitted and all you get are repeated references to "data", the directory name choice becomes an issue IMO.  I suggest (and have changed in the attachment) naming the actual root directory "pgdata".  You should change the /pg/ directory name to something like ".../tutorial_project/".

Since you aren't following alphabetical order anyway, I would place pg_tblspc after global, since tablespaces are global objects and proximity links them here; pointing out that pg_tblspc holds the data also makes it unnecessary to state that global doesn't contain tablespace data.

Maybe point out somewhere that the "base/databaseOID" directory represents the default tablespace for each database, which isn't "global"; only the non-default tablespaces are considered globals (or just get rid of the mention of "non-default tablespace" for now).
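A sketch of the resulting file locations may help the prose: each database's default tablespace is `base/<database OID>`, shared catalogs live in `global`, and non-default tablespaces are reached through the symbolic links in `pg_tblspc`. The helper names are hypothetical, `Oid` only mirrors PostgreSQL's unsigned 32-bit OID type, the version directory (`PG_<major version>_<catalog version>`) is passed in rather than computed, and extra segment files (`.1`, `.2`, ...) and forks such as `_fsm` and `_vm` are ignored.

```c
#include <stdio.h>
#include <string.h>

typedef unsigned int Oid;   /* stand-in for PostgreSQL's 32-bit OID type */

/* Relation in a database's default tablespace:
 *   $PGDATA/base/<database OID>/<relfilenode> */
int relation_path_default(char *buf, size_t len, const char *pgdata,
                          Oid db_oid, Oid relfilenode)
{
    int n = snprintf(buf, len, "%s/base/%u/%u", pgdata, db_oid, relfilenode);
    return (n >= 0 && (size_t) n < len) ? 0 : -1;
}

/* Cluster-wide (shared) relation:
 *   $PGDATA/global/<relfilenode> */
int relation_path_global(char *buf, size_t len, const char *pgdata, Oid relfilenode)
{
    int n = snprintf(buf, len, "%s/global/%u", pgdata, relfilenode);
    return (n >= 0 && (size_t) n < len) ? 0 : -1;
}

/* Relation in a non-default tablespace, via the symlink in pg_tblspc:
 *   $PGDATA/pg_tblspc/<tablespace OID>/<version dir>/<database OID>/<relfilenode> */
int relation_path_tablespace(char *buf, size_t len, const char *pgdata, Oid spc_oid,
                             const char *version_dir, Oid db_oid, Oid relfilenode)
{
    int n = snprintf(buf, len, "%s/pg_tblspc/%u/%s/%u/%u",
                     pgdata, spc_oid, version_dir, db_oid, relfilenode);
    return (n >= 0 && (size_t) n < len) ? 0 : -1;
}
```

With PGDATA at `/var/lib/pgsql/data`, a table with relfilenode 16385 in database 16384 would live in `/var/lib/pgsql/data/base/16384/16385`.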

David J.


Re: Additional Chapter for Tutorial

From
"David G. Johnston"
On Sun, Nov 8, 2020 at 8:56 AM Jürgen Purtz <juergen@purtz.de> wrote:
> Good catches. Everything applied.

MVCC Section

The first paragraph and example in the MVCC section is a good example but seems misplaced - its relationship to MVCC generally is tenuous; rather, I would expect a discussion of the serializable isolation mode to follow.

I'm not sure how much detail this section wants to get into given the coverage of concurrency elsewhere in the documentation.  "Not much" would be my baseline.

I would suggest spelling out what "OLTP" stands for and ideally pointing the user to the glossary for the term.

This tends more toward a style gripe, but the amount of lead-in phrases and redundancy is at a level where I notice them while reading this, which is not my impression from reading large portions of the rest of the documentation.  In particular:

"When we speak about transaction IDs, you need to know that xids are like sequences."

"But keep in mind that xids are independent of any time measurement — in milliseconds or otherwise. If you dive deeper into PostgreSQL, you will recognize parameters with names such as 'xxx_age'. Despite their names, these '_age' parameters do not specify a period of time but represent a certain number of transactions, e.g., 100 million."

Could just be:  xids are sequences and age computations involving them measure a transaction count as opposed to a time interval.

Then I would consider adding a bit more detail/context here.

xids are 32-bit sequences, with a reserved value to handle wrap-around.  There are 4 billion values in the sequence, but wrap-around handling must occur every 2 billion transactions.  Age computations involving xids measure a transaction count as opposed to a time interval.
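The wrap-around arithmetic behind these numbers can be sketched in C. The sketch assumes PostgreSQL's layout of reserved xids (0 invalid, 1 bootstrap, 2 frozen, normal xids from 3) and mirrors the modulo-2^32 comparison idea; the names `xid_precedes` and `xid_age` are hypothetical. Because a normal xid can only be ordered against xids at most 2^31 away, wrap-around handling is needed roughly every 2 billion transactions even though about 4 billion values exist, and an age is simply a difference of xids, i.e. a transaction count.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Reserved xids: 0 = invalid, 1 = bootstrap, 2 = frozen.
 * Normal xids start at 3 and wrap around from 2^32 - 1 back to 3. */
#define FROZEN_XID       ((TransactionId) 2)
#define FIRST_NORMAL_XID ((TransactionId) 3)

/* Does xid a precede xid b?  Normal xids are compared on a circle:
 * a precedes b if b lies 1 to 2^31 steps ahead of a (modulo 2^32).
 * Reserved xids, e.g. frozen, compare as older than any normal xid. */
bool xid_precedes(TransactionId a, TransactionId b)
{
    if (a < FIRST_NORMAL_XID || b < FIRST_NORMAL_XID)
        return a < b;
    return (int32_t) (a - b) < 0;
}

/* Age of xid as seen from current_xid: a count of transactions,
 * computed modulo 2^32, not a time interval. */
uint32_t xid_age(TransactionId xid, TransactionId current_xid)
{
    return current_xid - xid;
}
```

For example, xid 4294967000 precedes xid 50 under this comparison, because (50 - 4294967000) mod 2^32 = 346, which is below 2^31. Conversely, an xid more than 2^31 steps ahead would appear to be in the past, which is exactly what freezing has to prevent.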

I would move the mentioning of "vacuum" to the main paragraph about delete and not solely as a "keep in mind" note.

The part before the diagram seems like it should be much shorter, concise, and provide links to the excellent documentation.  The part after the image, and the image itself, are good material, though possibly should be in a main administration chapter instead of an internals chapter.

The first bullet of "keep in mind" is both wordy and wrong - in particular "as xids grow old row versions get out of scope over time" doesn't make sense (or rather it only does in the context of wrap-around, not normal visibility).  Having the only mention of bloat be here is also not ideal; it too should be woven into the main narrative.  The "keep in mind" section here should be a recap of already covered material in a succinct form; nothing should be new to someone who just read the entire section.

I don't think that usage of exclamation marks (!) is warranted here, though emphasis on the key phrase wouldn't hurt.

Vacuum Section

avoid -> prevent (continued growth)

Autovacuum is enabled by default.  The whole note needs commas.
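For reference, the documented rule by which autovacuum picks a table for vacuuming (dead tuples beyond a base threshold plus a fraction of the table's row count) can be sketched as follows. The defaults shown (`autovacuum_vacuum_threshold = 50`, `autovacuum_vacuum_scale_factor = 0.2`) are the stock values; the function name is hypothetical, and other triggers such as the insert-based threshold and anti-wraparound vacuuming are left out.

```c
#include <stdbool.h>

/* Stock defaults of the corresponding postgresql.conf parameters. */
#define AUTOVACUUM_VACUUM_THRESHOLD    50.0  /* autovacuum_vacuum_threshold */
#define AUTOVACUUM_VACUUM_SCALE_FACTOR 0.2   /* autovacuum_vacuum_scale_factor */

/* Simplified form of the documented rule: vacuum the table once its dead
 * tuples exceed base_threshold + scale_factor * reltuples, where reltuples
 * is the estimated number of rows in the table. */
bool needs_autovacuum(double reltuples, double dead_tuples,
                      double base_threshold, double scale_factor)
{
    double vacuum_threshold = base_threshold + scale_factor * reltuples;
    return dead_tuples > vacuum_threshold;
}
```

With the defaults, a table of 1,000,000 rows qualifies only after more than 200,050 dead tuples; per-table storage parameters can override both values.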

I'd try to get rid of "at arbitrary point in time"

"Instance.": we've already described where instances live ("on the server").

The other sections - these seem misplaced for the tutorial, update the main documentation if this information is wholly missing or lacking.  The MVCC chapter can incorporate overview information as it is a strict consequence of that implementation.

Statistics belong elsewhere - the tutorial should not use poor command implementation choices as a guide for user education.

In short, this whole section should not exist and its content moved to more appropriate areas (mainly MVCC).  Vacuum is a tool that one must use but the narrative should be about the system generally.

David J.

Re: Additional Chapter for Tutorial

From
Jürgen Purtz
On 10.11.20 00:14, David G. Johnston wrote:
> Reviewed the first three sections.
>
> template0 - I would remove the schema portions of this and simply note this as being a pristine recovery database in the diagram.
ok

> I would drop the word "more" and just say "system schemas".  I would drop pg_toast from the list of system schemas and focus on the three user-facing ones.
ok

> Instead of "my_schema" (optional) I would do "my_schema" (example)
The terms 'optional' and 'default' are used in various places with their literal meaning. We should not change them.

> Server Graphic
> #3 Global SQL Objects: Objects which are shared among all databases within a cluster.
> #6 Client applications are prohibited from connecting to template0
ok
> #1 If by you we mean "the client" saying that you work "in the cluster data" doesn't really help.  I would emphasize the point that the client sees an endpoint the Postmaster publishes as a port or socket file and that plus the database name defines the endpoint the client connects to (meld with #5)
ok, with some changes.

> In lieu of some of the existing detail provided about structure I would add information about configuration and search_path at this level.
Search path appended. But IMO, configuration questions are out of scope for this sub-chapter.

> I like the object type enumeration - I would suggest grouping them by type in a manner consistent with the documentation and making each one a link to its "primary" section - the SQL Command reference if all else fails.
ok. But I don't know how to group them in a better way.

> The "i" in internal in 51.3 (the image) needs capitalization.
ok

> You correctly add both Extension and Collation as database-level objects but they are not mentioned anywhere else.  They do belong here and need to be tied in properly in the text.
Have some courage to leave gaps; it's an introductory chapter.

> The whole thing needs a good pass focused on capitalization.  Both for typos and to decide when various primary concepts like Instance should be capitalized and when not.
'Instance' and 'Cluster' are now capitalized because of their importance; everything else is lowercase for better readability.

> 51.4 - When you look at the diagram seeing /pg/data/base looks really cool, but when reading the prose where both the "pg" and the "base" are omitted and all you get are repeated references to "data", the directory name choice becomes an issue IMO.  I suggest (and changed the attached) to name the actual root directory "pgdata".  You should change the /pg/ directory name to something like ".../tutorial_project/".

The graphic should reflect the default behavior of PG. Without the option -D, initdb creates the new cluster in the directory that PGDATA points to. In many cases this is /var/lib/pgsql/data. Therefore 'data' and its subdirectory 'base' are not my invention but reflect the default situation.

(Diving a little deeper into this issue, I noticed that there is a parameter 'cluster_name' in the config file. But it does not change the name of the cluster's root directory; it only changes the names of the running processes. Choosing 'instance_name' instead of 'cluster_name' as the parameter's name would be a better choice IMO, but that is not what we are discussing in the context of the new chapter.)

I changed the very first directory in the graphic to visualize the standard behavior, and I reverted your change from 'data' to 'pgdata' in the text part.

> Since you aren't following alphabetical order anyway I would place pg_tblspc after globals since tablespaces are globals and thus proximity links them here - and pointing out that pg_tblspc holds the data makes stating that global doesn't contain tablespace data unnecessary.
ok

> Maybe point out somewhere that the "base/databaseOID" directory represents the default tablespace for each database, which isn't "global", only the non-default tablespaces are considered globals (or just get rid of the mention of "non-default tablespace" for now).

ok

more:

1) Some changes concerning the nature of connections (52.2: logical perspective). IMO, accessing multiple databases within one connection is not a matter of configuration; it requires additional measures. But I'm not sure we should mention this at all.

2) You propose to remove or trim down the paragraphs after figure 51.2 (cluster, database, schema). I believe that a textual description of this hierarchy is essential for understanding the system. Because it isn't described explicitly anywhere else, it should remain.

--- snipp -------- from other e-mail ----

> MVCC Section
>
> The first paragraph and example in the MVCC section is a good example but seems misplaced - its relationship to MVCC generally is tenuous, rather I would expect a discussion of the serializable isolation mode to follow.
>
> I'm not sure how much detail this section wants to get into given the coverage of concurrency elsewhere in the documentation.  "Not much" would be my baseline.
The paragraph focuses on the fact that new row versions are generated instead of locking something. Explaining the serializable isolation mode is IMO very complicated and out of the scope of this subchapter. If we want to give an overview in addition to the existing documentation, it should be a separate subchapter.

> I would suggest spelling out what "OLTP" stands for and ideally pointing the user to the glossary for the term.
ok, but not added to the glossary. The given explanation "... with a massive number of concurrent write actions" should be sufficient.

> Tending more toward a style gripe but the amount of leader phrases and redundancy are at a level that I am noticing them when I read this but do not have the same impression having read large portions of documentation. In particular:
Because I'm not a native English speaker, spelling and style hints are always welcome.

> "When we speak about transaction IDs, you need to know that xids are like sequences."
>
> "But keep in mind that xids are independent of any time measurement — in milliseconds or otherwise. If you dive deeper into PostgreSQL, you will recognize parameters with names such as 'xxx_age'. Despite their names, these '_age' parameters do not specify a period of time but represent a certain number of transactions, e.g., 100 million."
>
> Could just be:  xids are sequences and age computations involving them measure a transaction count as opposed to a time interval.
ok

> Then I would consider adding a bit more detail/context here.
>
> xids are 32-bit sequences, with a reserved value to handle wrap-around.  There are 4 billion values in the sequence but wrap-around handling must occur every 2 billion transactions. Age computations involving xids measure a transaction count as opposed to a time interval.
>
> I would move the mentioning of "vacuum" to the main paragraph about delete and not solely as a "keep in mind" note.
The mention here at the foot of the page is a bridge to the next subchapter.

> The part before the diagram seems like it should be much shorter, concise, and provide links to the excellent documentation.  The part after the image, and the image itself, are good material, though possibly should be in a main administration chapter instead of an internals chapter.

Vacuum: the problem, and one reason for the existence of this subchapter, is that vacuum's documentation is scattered across many pages:

19.4: parameters to configure the server, especially five parameters 'vacuum_cost_xxx'.

19.10: parameters to configure autovacuum.

19.11: parameters to configure client connections, especially five parameters 'vacuum_xxx' concerning their freeze behavior.

24.1: explains the general necessity of (auto)vacuum and its strategies.

The page about the SQL command VACUUM explains the different options (FULL, FREEZE, ...) and their meaning.

Because of the structure of our documentation and the complexity of the issue, that's ok. The existing documentation describes every parameter very well, but I'm missing a page where the 'big picture' of vacuum is explained (not necessarily here). It should show the relationships between the large number of parameters and explain *why* they exist. As long as we don't have such a page within the vacuum documentation, the proposed subchapter fills the gap. (The provided graphics can be included multiple times without creating redundancy, here and at any other place.)
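As a small example of the 'big picture' meant here, three of the freeze-related parameters can be read as nested thresholds on an xid age. This is a simplified sketch, not PostgreSQL's code: the function names are hypothetical, the defaults shown are the stock values (50, 150 and 200 million transactions), and details such as the clamping of these settings relative to each other, multixact freezing and visibility-map checks are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stock defaults, counted in transactions (not time). */
#define VACUUM_FREEZE_MIN_AGE       50000000u  /* vacuum_freeze_min_age */
#define VACUUM_FREEZE_TABLE_AGE    150000000u  /* vacuum_freeze_table_age */
#define AUTOVACUUM_FREEZE_MAX_AGE  200000000u  /* autovacuum_freeze_max_age */

/* Ages are differences of xids, computed modulo 2^32. */
static uint32_t age_of(uint32_t xid, uint32_t reference_xid)
{
    return reference_xid - xid;
}

/* Row level: while vacuuming, a row version is a candidate for freezing
 * once its xmin is older than vacuum_freeze_min_age, measured from the
 * oldest transaction still considered running. */
bool should_freeze_row(uint32_t row_xmin, uint32_t oldest_xmin)
{
    return age_of(row_xmin, oldest_xmin) > VACUUM_FREEZE_MIN_AGE;
}

/* Table level: once the table's relfrozenxid has reached
 * vacuum_freeze_table_age, VACUUM performs an aggressive scan. */
bool needs_aggressive_vacuum(uint32_t relfrozenxid, uint32_t next_xid)
{
    return age_of(relfrozenxid, next_xid) >= VACUUM_FREEZE_TABLE_AGE;
}

/* Safety net: beyond autovacuum_freeze_max_age an anti-wraparound
 * autovacuum is launched, even if autovacuum is otherwise disabled. */
bool forces_antiwraparound_autovacuum(uint32_t relfrozenxid, uint32_t next_xid)
{
    return age_of(relfrozenxid, next_xid) > AUTOVACUUM_FREEZE_MAX_AGE;
}
```

Read together, the ordering min_age < table_age < max_age explains why the parameters exist: ordinary vacuums freeze old rows opportunistically, a periodic aggressive vacuum advances relfrozenxid, and the forced vacuum is the last line of defense before wrap-around.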


> The first bullet of "keep in mind" is both wordy and wrong - in particular "as xids grow old row versions get out of scope over time" doesn't make sense (or rather it only does in the context of wrap-around, not normal visibility).  Having the only mention of bloat be here is also not ideal, it too should be woven into the main narrative.  The "keep in mind" section here should be a recap of already covered material in a succinct form, nothing should be new to someone who just read the entire section.
ok.

> I don't think that usage of exclamation marks (!) is warranted here, though emphasis on the key phrase wouldn't hurt.
ok

> Vacuum Section
>
> avoid -> prevent (continued growth)
ok

> Autovacuum is enabled by default.  The whole note needs commas.
ok

> I'd try to get rid of "at arbitrary point in time"
ok

> "Instance." we've already described where instances are previously ("on the server")
ok

> The other sections - these seem misplaced for the tutorial, update the main documentation if this information is wholly missing or lacking.  The MVCC chapter can incorporate overview information as it is a strict consequence of that implementation.
>
> Statistics belong elsewhere - the tutorial should not use poor command implementation choices as a guide for user education.
>
> In short, this whole section should not exist and its content moved to more appropriate areas (mainly MVCC).  Vacuum is a tool that one must use but the narrative should be about the system generally.


Concerning the vacuum section: see my comments above.

Concerning 'the other sections' (transactions, reliability, backup; plus, someone should add 'replication', which I'm not familiar with): the intention of the chapter is to give a *summary* of PG's essential architecture and of central implementation aspects. This implies that these sections do not present any new information. They should only show (or repeat) essential things in their context and explain *why* they are used. In this sense, the three sections may be reasonable. On this point, I would like to hear comments from other people.


Attachments:

0013-architecture.patch: complete patch vs. master

0013-architecture.sgml.diff: changes in file architecture.sgml since 0012

0013-images.diff: changes in files *-raw.svg since 0012

--

J. Purtz



Re: Additional Chapter for Tutorial - arch-dev.sgml

From
Erik Rijkers
On 2020-11-15 19:45, Jürgen Purtz wrote:
>> 

(smallish) Changes to arch-dev.sgml

Erik


Re: Additional Chapter for Tutorial - arch-dev.sgml

From
Heikki Linnakangas
On 20/11/2020 23:52, Erik Rijkers wrote:
> (smallish) Changes to arch-dev.sgml

This looks good to me. One little complaint:

> @@ -125,7 +122,7 @@
>      use a <firstterm>supervisor process</firstterm> (also
>      <firstterm>master process</firstterm>) that spawns a new
>      server process every time a connection is requested. This supervisor
> -    process is called <literal>postgres</literal> and listens at a
> +    process is called <literal>postgres</literal> (formerly 'postmaster') and listens at a
>      specified TCP/IP port for incoming connections. Whenever a request
>      for a connection is detected the <literal>postgres</literal>
>      process spawns a new server process. The server tasks

I believe we still call it the postmaster process. We renamed the binary 
a long time ago (commit 5266f221a2), and the above text was changed as 
part of that commit. I think that was a mistake, and this should say simply:

... This supervisor process is called <literal>postmaster</literal> and ...

like it did before we renamed the binary.

Barring objections, I'll commit this with that change (as attached).

- Heikki


Re: Additional Chapter for Tutorial - arch-dev.sgml

From
Jürgen Purtz
Date:
On 18.01.21 15:13, Heikki Linnakangas wrote:
> On 20/11/2020 23:52, Erik Rijkers wrote:
>> (smallish) Changes to arch-dev.sgml
>
> This looks good to me. One little complaint:
>
>> @@ -125,7 +122,7 @@
>>      use a <firstterm>supervisor process</firstterm> (also
>>      <firstterm>master process</firstterm>) that spawns a new
>>      server process every time a connection is requested. This supervisor
>> -    process is called <literal>postgres</literal> and listens at a
>> +    process is called <literal>postgres</literal> (formerly 'postmaster') and listens at a
>>      specified TCP/IP port for incoming connections. Whenever a request
>>      for a connection is detected the <literal>postgres</literal>
>>      process spawns a new server process. The server tasks
>
> I believe we still call it the postmaster process. We renamed the binary a long time ago (commit 5266f221a2), and the above text was changed as part of that commit. I think that was a mistake, and this should say simply:
>
> ... This supervisor process is called <literal>postmaster</literal> and ...
>
> like it did before we renamed the binary.
>
> Barring objections, I'll commit this with that change (as attached).
>
> - Heikki

I fear that the patch 'Additional chapter for Tutorial' is growing beyond manageable limits. It has been running for nearly a year, it has grown to a very large 228 KB, and many people have made significant contributions. But a commit still seems far off. Having said that, I'm pleased with Heikki's proposal to split the changes to the existing file 'arch-dev.sgml' from the rest of the patch and commit them separately.

But I have some concerns about the chapter '51.2. How Connections Are Established'. It uses central terms like 'client process', 'server process', 'supervisor process', 'master process', 'server tasks', 'backend (server)', 'frontend (client)', 'server', 'client'. A few months ago, we clarified this terminology in the new chapter 'Glossary'. As long as it leads to readable text, we should use the glossary terms instead of the current ones, and we should include some links to the glossary.

I propose starting a new thread that contains only the changes to 'arch-dev.sgml'. Should it be on pgsql-hackers or pgsql-docs? Should Heikki or I start it?

--

Jürgen Purtz


Re: Additional Chapter for Tutorial - arch-dev.sgml

From
Jürgen Purtz
Date:
On 18.01.21 15:13, Heikki Linnakangas wrote:
> On 20/11/2020 23:52, Erik Rijkers wrote:
>> (smallish) Changes to arch-dev.sgml
>
> This looks good to me. One little complaint:
>
>> @@ -125,7 +122,7 @@
>>      use a <firstterm>supervisor process</firstterm> (also
>>      <firstterm>master process</firstterm>) that spawns a new
>>      server process every time a connection is requested. This supervisor
>> -    process is called <literal>postgres</literal> and listens at a
>> +    process is called <literal>postgres</literal> (formerly 'postmaster') and listens at a
>>      specified TCP/IP port for incoming connections. Whenever a request
>>      for a connection is detected the <literal>postgres</literal>
>>      process spawns a new server process. The server tasks
>
> I believe we still call it the postmaster process. We renamed the 
> binary a long time ago (commit 5266f221a2), and the above text was 
> changed as part of that commit. I think that was a mistake, and this 
> should say simply:
>
> ... This supervisor process is called <literal>postmaster</literal> and ...
>
> like it did before we renamed the binary.
>
> Barring objections, I'll commit this with that change (as attached).
>
> - Heikki

Some additional changes in 51.2:

  - smaller number of different terms

  - aligning with Glossary

  - active voice instead of passive voice

  - commas

---

J. Purtz



Attachment

Re: Additional Chapter for Tutorial - arch-dev.sgml

From
Heikki Linnakangas
Date:
On 21/01/2021 14:38, Jürgen Purtz wrote:
> This supervisor process is called <glossterm
> linkend="glossary-postmaster">postmaster</glossterm> and listens at
> a specified TCP/IP port for incoming connections. Whenever he
> detects a request for a connection, he spawns a new backend process.

It sounds weird to refer to a process with "he". I left out this hunk, 
and the other with similar changes.

Committed the rest, thanks!

- Heikki



Re: Additional Chapter for Tutorial - arch-dev.sgml

From
David Steele
Date:
On 1/22/21 4:15 AM, Heikki Linnakangas wrote:
> On 21/01/2021 14:38, Jürgen Purtz wrote:
>> This supervisor process is called <glossterm
>> linkend="glossary-postmaster">postmaster</glossterm> and listens at
>> a specified TCP/IP port for incoming connections. Whenever he
>> detects a request for a connection, he spawns a new backend process.
> 
> It sounds weird to refer to a process with "he". I left out this hunk, 
> and the other with similar changes.
> 
> Committed the rest, thanks!

So it looks like this was committed. Is there anything left to do?

If not, we should close the CF entry.

Regards,
-- 
-David
david@pgmasters.net



Re: Additional Chapter for Tutorial - arch-dev.sgml

From
Alvaro Herrera
Date:
On 2021-Mar-25, David Steele wrote:

> On 1/22/21 4:15 AM, Heikki Linnakangas wrote:
> > On 21/01/2021 14:38, Jürgen Purtz wrote:
> > > This supervisor process is called <glossterm
> > > linkend="glossary-postmaster">postmaster</glossterm> and listens at
> > > a specified TCP/IP port for incoming connections. Whenever he
> > > detects a request for a connection, he spawns a new backend process.
> > 
> > It sounds weird to refer to a process with "he". I left out this hunk,
> > and the other with similar changes.
> > 
> > Committed the rest, thanks!
> 
> So it looks like this was committed. Is there anything left to do?

Yes, there is.  AFAICS Heikki committed a small wordsmithing patch --
not the large patch with the additional chapter.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W
"Ed is the standard text editor."
      http://groups.google.com/group/alt.religion.emacs/msg/8d94ddab6a9b0ad3



Re: Additional Chapter for Tutorial - arch-dev.sgml

From
Jürgen Purtz
Date:
On 03.04.21 15:39, Alvaro Herrera wrote:
> Yes, there is.  AFAICS Heikki committed a small wordsmithing patch --
> not the large patch with the additional chapter.

What can I do to move the matter forward?

--

J. Purtz





Re: Additional Chapter for Tutorial - arch-dev.sgml

From
Alvaro Herrera
Date:
On 2021-Apr-03, Jürgen Purtz wrote:

> On 03.04.21 15:39, Alvaro Herrera wrote:
> > Yes, there is.  AFAICS Heikki committed a small wordsmithing patch --
> > not the large patch with the additional chapter.
> 
> What can I do to move the matter forward?

Please post a version that applies to the current sources.  If the
latest version posted does, please state so.

-- 
Álvaro Herrera                            39°49'30"S 73°17'W



Re: Additional Chapter for Tutorial - arch-dev.sgml

From
Jürgen Purtz
Date:
On 03.04.21 21:01, Alvaro Herrera wrote:
> On 2021-Apr-03, Jürgen Purtz wrote:
>
>> On 03.04.21 15:39, Alvaro Herrera wrote:
>>> Yes, there is.  AFAICS Heikki committed a small wordsmithing patch --
>>> not the large patch with the additional chapter.
>> What can I do to move the matter forward?
> Please post a version that applies to the current sources.  If the
> latest version posted does, please state so.
>
The small patch 'arch-dev.sgml.20210121.diff' contains only some
cleanup of the terminology used and its alignment with the
glossary. The patch was rejected by Heikki.

The latest version of the huge patch '0013-architecture.patch' is valid 
and doesn't contain merge conflicts.

--
Jürgen Purtz




Re: Additional Chapter for Tutorial - arch-dev.sgml

From
Alvaro Herrera
Date:
On 2021-Apr-04, Jürgen Purtz wrote:

> The small patch 'arch-dev.sgml.20210121.diff' contains only some cleanup
> of the terminology used and its alignment with the glossary. The
> patch was rejected by Heikki.

This comment is not helpful, because it's not obvious where I would find
that patch.  Also, you say "the patch was rejected by Heikki" but
upthread he said he committed it.  His comment was that he left out some
paragraphs because of a style issue.  Did you re-post that patch after
fixing the style issues?  If you did, I couldn't find it.


> The latest version of the huge patch '0013-architecture.patch' is valid and
> doesn't contain merge conflicts.

Yeah, OK, but I have to dive deep in the thread to find it.  Please post
it again.  When you have a patch series, please post it as a whole every
time -- that makes it easier for a committer to review it.

You seem to be making your life hard by not using git to assist you.  Do
you know you can have several commits in a branch of your own, rebase it
to latest master, merge master to it, rebase on top of master, commit
fixups, "rebase -i" and change commit ordering to remove unnecessary
fixup commits, and so on?  Such techniques are extremely helpful when
dealing with a patch series.  When you want to post a new version to the
list, you can just do "git format-patch -v14 origin/master" to produce a
set of patch files.  You don't need to manually give names to your patch
files, or come up with a versioning scheme.  Just increment the argument
to -v by +1 each time you (or somebody else) posts a new version of the
patch series.

-- 
Álvaro Herrera       Valdivia, Chile
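The branch-and-rebase workflow Álvaro describes can be sketched as the shell script below. It is a minimal, self-contained illustration under stated assumptions, not an official PostgreSQL procedure: instead of cloning the real PostgreSQL repository it builds a throwaway "upstream" repository in a temporary directory, and the branch name, file contents, and commit messages are hypothetical. The git options used (`commit --fixup`, `rebase -i --autosquash`, `format-patch -v`) are standard git features.

```shell
#!/bin/sh
# Hypothetical sketch of a patch-series workflow with git.
# Everything happens in a throwaway directory; "upstream" stands in for a
# real PostgreSQL clone, and all branch names, files, and messages are made up.
set -e

export GIT_AUTHOR_NAME="Doc Author" GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Doc Author" GIT_COMMITTER_EMAIL="author@example.com"

tmp=$(mktemp -d)
cd "$tmp"

# Stand-in upstream repository with a single commit on "master".
git init -q upstream
cd upstream
git symbolic-ref HEAD refs/heads/master
echo "original text" > arch-dev.sgml
git add arch-dev.sgml
git commit -q -m "Upstream baseline"
cd ..

git clone -q upstream work
cd work

# 1. Keep the work as several commits on a branch of your own.
git checkout -q -b architecture-chapter
echo "new chapter" > architecture.sgml
git add architecture.sgml
git commit -q -m "Add architecture chapter"
echo "glossary links" >> arch-dev.sgml
git commit -q -a -m "Align arch-dev.sgml terminology with glossary"

# 2. A later correction to the first commit, recorded as a fixup commit.
echo "typo fixed" >> architecture.sgml
git commit -q -a --fixup=HEAD~1

# 3. Rebase onto the latest upstream master; --autosquash folds the fixup
#    into its target commit, and ":" accepts the generated todo list as-is.
git fetch -q origin
GIT_SEQUENCE_EDITOR=: git rebase -q -i --autosquash origin/master

# 4. Export the series; bump -v by one each time a new version is posted.
git format-patch -q -v14 origin/master
ls v14-*.patch
```

If it runs as intended, the final `ls` lists two files, v14-0001-*.patch and v14-0002-*.patch, and the fixup change ends up inside the first patch rather than as a third one. The next posting of the series would use -v15.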



Re: Additional Chapter for Tutorial - arch-dev.sgml

From
Jürgen Purtz
Date:
On 04.04.21 19:02, Alvaro Herrera wrote:
> On 2021-Apr-04, Jürgen Purtz wrote:
>
>> The small patch 'arch-dev.sgml.20210121.diff' contains only some cleanup
>> of the terminology used and its alignment with the glossary. The
>> patch was rejected by Heikki.
> This comment is not helpful, because it's not obvious where I would find
> that patch.  Also, you say "the patch was rejected by Heikki" but
> upthread he said he committed it.  His comment was that he left out some
> paragraphs because of a style issue.  Did you re-post that patch after
> fixing the style issues?  If you did, I couldn't find it.
>
>
>> The latest version of the huge patch '0013-architecture.patch' is valid and
>> doesn't contain merge conflicts.
> Yeah, OK, but I have to dive deep in the thread to find it.  Please post
> it again.  When you have a patch series, please post it as a whole every
> time -- that makes it easier for a committer to review it.
>
> You seem to be making your life hard by not using git to assist you.  Do
> you know you can have several commits in a branch of your own, rebase it
> to latest master, merge master to it, rebase on top of master, commit
> fixups, "rebase -i" and change commit ordering to remove unnecessary
> fixup commits, and so on?  Such techniques are extremely helpful when
> dealing with a patch series.  When you want to post a new version to the
> list, you can just do "git format-patch -v14 origin/master" to produce a
> set of patch files.  You don't need to manually give names to your patch
> files, or come up with a versioning scheme.  Just increment the argument
> to -v by +1 each time you (or somebody else) posts a new version of the
> patch series.
>
The thread contains a sequence of files '0001_architecture.patch' to
'0013_architecture.patch' (with gaps in the numbering) created by me and
other authors over the last 12 months. This is what I call the 'huge
patch'. Indeed, the files were created more or less manually, without
git format-patch. I appreciate the pointer to rebase and format-patch
and am considering using them in the future.

In addition to this chain, in November Erik introduced, within the same
thread, some changes to the chapter "Overview of Query Handling", which
Heikki and I subsequently expanded with the sequence of
'arch-dev.sgml.xxxxx.diff' files. This is what I call the 'small patch'.
It is independent of the 'huge patch'. Heikki has partly committed that
'small patch'. In case a committer wants to give the uncommitted part a
second chance, I have attached a patch. Because I'm not a native English
speaker, any linguistic improvements are highly welcome.

--

Jürgen Purtz



Attachment

Re: Additional Chapter for Tutorial - arch-dev.sgml

From
Alvaro Herrera
Date:
On 2021-Apr-05, Jürgen Purtz wrote:

> In addition to this chain, in November Erik introduced, within the same
> thread, some changes to the chapter "Overview of Query Handling", which
> Heikki and I subsequently expanded with the sequence of
> 'arch-dev.sgml.xxxxx.diff' files. This is what I call the 'small patch'.
> It is independent of the 'huge patch'. Heikki has partly committed that
> 'small patch'. In case a committer wants to give the uncommitted part a
> second chance, I have attached a patch. Because I'm not a native English
> speaker, any linguistic improvements are highly welcome.

Pushed this one with cosmetic adjustments.

-- 
Álvaro Herrera       Valdivia, Chile
"When tomorrow comes, we will fight according to what tomorrow demands" (Mowgli)