Thread: Add A Glossary

Add A Glossary

From
Corey Huinker
Date:
Attached is a v1 patch to add a Glossary to the appendix of our current documentation.

I believe that our documentation needs a glossary for a few reasons:

1. It's hard to ask for help if you don't know the proper terminology of the problem you're having.

2. Readers who are new to databases may not understand a few of the terms that are used casually both in the documentation and in forums. This helps to make our documentation a bit more useful as a teaching tool.

3. Readers whose primary language is not English may struggle to find the correct search terms, and this glossary may help them grasp that a given term has a usage in databases that is different from common English usage.

3b. If we are not able to find the resources to translate all of the documentation into a given language, translating the glossary page would be a good first step.

4. The glossary would be web-searchable, and draw viewers to the official documentation.

5. Adding link anchors to each term would make them citable, which is useful in forum conversations.


A few notes about this patch:

1. It's obviously incomplete. There are more terms, a lot more, to add.

2. The individual definitions supplied are off-the-cuff, and should be thoroughly reviewed.

3. The definitions as a whole should be reviewed by an actual tech writer (one was initially involved but had to step back due to prior commitments), and the definitions should be normalized in terms of voice, tone, audience, etc.

4. My understanding of DocBook is not strong. The glossary vs glosslist tag issue is a bit confusing to me, and I'm not sure if the glossary tag is even appropriate for our needs.

5. I've made no effort at making each term an anchor, nor have I done any CSS styling at all.

6. I'm not quite sure how to handle terms that have different definitions in different contexts. Should that be two glossdefs following one glossterm, or two separate def/term pairs?

Please review and share your thoughts.
Attachment

Re: Add A Glossary

From
Fabien COELHO
Date:
Hello Corey,

My 0.02€:

On principle, I'm fine with having a glossary, i.e. word definitions, 
which are expected to be rather stable in the long run.

I'm wondering whether the effort would not be made redundant by other 
on-line effort such as wikipedia, wiktionary, stackoverflow, standards, 
whatever.

When explaining something, the teacher I am usually provides some level of 
example. This may or may not be appropriate there.

ISTM that there should be pointers to relevant sections in the 
documentation, for instance "Analytics" provided definition suggests
pointing to windowing functions.

There is significant redundancy involved, because a lot of terms would be 
defined in other sections anyway.

There should be cross references, eg "Column" definition talks about 
Attribute, Table & View, which should be linked to.

I'd consider making SQL keywords uppercase.

Developing that is a significant undertaking. Do we have the available 
energy?

Patch generates a warning on "git apply".

  sh> git apply ...
  ... terms-and-definitions.patch:159: tab in indent. [...]
  warning: 1 line adds whitespace errors.
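
For reviewers who want to locate such lines before applying a patch, the check can be approximated in plain shell. This is only a rough sketch, not what git does internally: `find_tab_indents` is a hypothetical helper name, and the sample patch content (including the file name) is made up for illustration. It prints every added line ("+" prefix) whose leading whitespace contains a tab, numbered by its position in the patch file, in the spirit of the "patch:159" warning above.

```shell
#!/bin/sh
# Hypothetical helper (not part of git): list added lines of a unified
# diff whose indentation contains a tab character.
find_tab_indents() {
    tab=$(printf '\t')
    # "^+" is an added line; "[ ]*" skips leading spaces; then a tab.
    # File headers ("+++ b/...") do not match, since "+" follows "+".
    grep -n "^+[ ]*${tab}" "$1"
}

# Illustrative patch fragment: line 5 is indented with a tab.
sample=$(mktemp)
printf '%s\n' \
    '--- a/doc/src/sgml/glossary.sgml' \
    '+++ b/doc/src/sgml/glossary.sgml' \
    '@@ -0,0 +1,2 @@' \
    '+   <para>' > "$sample"
printf '+\t<para>\n' >> "$sample"

find_tab_indents "$sample"
rm -f "$sample"
```

Alternatively, `git apply --whitespace=fix` asks git to correct whitespace errors while applying, which should cover this case as well.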

"Record" def as nested <para> for some unclear reason.

Basically the drafted definitions look pretty clear and well written to 
the non-native English speaker I am.

-- 
Fabien Coelho - CRI, MINES ParisTech

Re: Add A Glossary

From
Michael Paquier
Date:
On Sat, Nov 09, 2019 at 09:19:16AM +0100, Fabien COELHO wrote:
> On principle, I'm fine with having a glossary, i.e. word definitions, which
> are expected to be rather stable in the long run.
>
> I'm wondering whether the effort would not be made redundant by other
> on-line effort such as wikipedia, wiktionary, stackoverflow, standards,
> whatever.
>
> When explaining something, the teacher I am usually provides some level of
> example. This may or may not be appropriate there.

That's exactly a good reason for being a reviewer here.  You have
quite some insight here.

> I'd consider making SQL keywords uppercase.
>
> Developing that is a significant undertaking. Do we have the available
> energy?

It seems like this could be a good idea.  Still, the patch has been
waiting on its author for more than two weeks now, so I have marked it
as returned with feedback.
--
Michael

Attachment

Re: Add A Glossary

From
Corey Huinker
Date:
> It seems like this could be a good idea.  Still, the patch has been
> waiting on its author for more than two weeks now, so I have marked it
> as returned with feedback.

In light of feedback, I enlisted the help of an actual technical writer (Roger Harkavy, CCed) and we eventually found the time to take a second pass at this.

Attached is a revised patch.
 
Attachment

Re: Add A Glossary

From
Corey Huinker
Date:
This latest version is an attempt at merging the work of Jürgen Purtz into what I had posted earlier. There was relatively little overlap in the terms we had chosen to define.

Each glossary definition now has a reference id (good idea Jürgen), the form of which is "glossary-term". So we can link to the glossary from outside if we so choose.
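
As a rough illustration of that id convention, the sketch below writes a small DocBook fragment and lists the ids it defines. The entries use ids of the "glossary-term" form; the placeholder definitions, the temporary file, and the `list_glossary_ids` helper are assumptions made up for this example, not taken from the patch.

```shell
#!/bin/sh
# Hypothetical helper: print the id of every <glossentry> start tag.
list_glossary_ids() {
    sed -n 's/.*<glossentry id="\([^"]*\)".*/\1/p' "$1"
}

# Illustrative fragment only; the real definitions live in the patch.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<glossary id="glossary">
 <glossentry id="glossary-connection">
  <glossterm>Connection</glossterm>
  <glossdef>
   <para>Placeholder definition.</para>
  </glossdef>
 </glossentry>
 <glossentry id="glossary-foreign-key">
  <glossterm>Foreign Key</glossterm>
  <glossdef>
   <para>Placeholder definition.</para>
  </glossdef>
 </glossentry>
</glossary>
EOF

list_glossary_ids "$sample"
rm -f "$sample"
```

Another page could then point at an entry by using one of the listed ids, such as "glossary-connection", as its linkend.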

I encourage everyone to read the definitions, and suggest fixes to any inaccuracies or awkward phrasings. Mostly, though, I'm seeking feedback on the structure itself, and hoping to get that committed.


Attachment

Re: Add A Glossary

From
Roger Harkavy
Date:
Hello, everyone, I'm Roger, the tech writer who worked with Corey on the glossary file. I just thought I'd announce that I am also on the list, and I'm looking forward to any questions or comments people may have. Thanks!


Re: Add A Glossary

From
Jürgen Purtz
Date:
I made changes on top of 0001-add-glossary-page.patch which was supplied 
by C. Huinker. This affects not only terms proposed by me but also his 
original terms. If my changes are not obvious, please let me know and I 
will describe my motivation.

Please note especially lines marked with question marks.

It will be helpful for diff-ing to restrict the length of lines in the 
SGML files to 71 characters (as usual).
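
A quick way to find lines over that limit is a short awk filter. This is a minimal sketch: `long_lines` is a hypothetical helper, the sample file is generated on the spot, and awk's `length` may count bytes rather than characters for non-ASCII text, depending on the awk implementation and locale.

```shell
#!/bin/sh
# Hypothetical helper: report lines longer than a limit (default 71).
long_lines() {
    awk -v limit="${2:-71}" \
        'length($0) > limit {
             printf "%s:%d: %d characters\n", FILENAME, FNR, length($0)
         }' "$1"
}

sample=$(mktemp)
printf 'a short line\n' > "$sample"
printf '%080d\n' 0 >> "$sample"    # one line of 80 characters

long_lines "$sample"        # reports line 2
long_lines "$sample" 100    # prints nothing: no line exceeds 100
rm -f "$sample"
```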

J. Purtz



Attachment

Re: Add A Glossary

From
Corey Huinker
Date:


On Wed, Mar 11, 2020 at 12:50 PM Jürgen Purtz <juergen@purtz.de> wrote:
> I made changes on top of 0001-add-glossary-page.patch which was supplied
> by C. Huinker. This affects not only terms proposed by me but also his
> original terms. If my changes are not obvious, please let me know and I
> will describe my motivation.
>
> Please note especially lines marked with question marks.
>
> It will be helpful for diff-ing to restrict the length of lines in the
> SGML files to 71 characters (as usual).
>
> J. Purtz

A new person replied off-list with some suggested edits, all of which seemed pretty good. I'll incorporate them myself if that person chooses to remain off-list.



 

Re: Add A Glossary

From
Corey Huinker
Date:
> It will be helpful for diff-ing to restrict the length of lines in the
> SGML files to 71 characters (as usual).

I did it that way for the following reasons:
1. It aids grep-ability.
2. The committers seem to be moving towards that for SQL strings, mostly for reason #1.
3. I recall that the code is put through a linter as one of the final steps before release, and I assumed that the SGML gets the same.
4. Even if #3 is false, it's easy enough for me to do manually for this one file once we've settled on the text of the definitions.

As for the changes, most things seem fine, I specifically like:
* Checkpoint - looks good
* yes, PGDATA should have been a literal
* Partition - the a/b split works for me
* Unlogged - it reads better

Things I'm not so sure about, and responses to your ???s:
* The statement that names of schema objects are unique isn't strictly true, just mostly true. Take the case of unique constraints. The constraint has a name and the unique index has the same name, to the point where adding a unique constraint using an existing index renames that index to conform to the constraint name.
* Serializable "other way around" question - It's both. Outside the transaction you can't see changes made inside another transaction (though you can be blocked by them), and inside serializable you can't see any changes made since you started. Does that make sense? Were you asking a different question?
* Transaction - yes, all those things could be "visible" or they could be "side effects". It may be best to leave the over-simplified definition in place, and add a "For more information see <<linkref to tutorial-transactions>>".

Re: Add A Glossary

From
Corey Huinker
Date:

> * Transaction - yes, all those things could be "visible" or they could be "side effects". It may be best to leave the over-simplified definition in place, and add a "For more information see <<linkref to tutorial-transactions>>".

transaction-iso would be a better linkref in this case 

Re: Add A Glossary

From
Jürgen Purtz
Date:

> The statement that names of schema objects are unique isn't strictly true, just mostly true. Take the case of unique constraints.

Concerning CONSTRAINTS you are right. Constraints seem to be an exception:

  • Their names belong to a schema, but are not necessarily unique within this context: https://www.postgresql.org/docs/current/catalog-pg-constraint.html.
  • There is a UNIQUE index within the system catalog pg_constraint:  "pg_constraint_conrelid_contypid_conname_index" UNIQUE, btree (conrelid, contypid, conname), which expresses that names are unique within the context of a table/constraint type. Nevertheless, tests have shown that some stronger restrictions exist across table borders (which seem to be implemented in CREATE statements, or to be a consequence of the correlation between constraint and index that you mentioned?).

I hope that there are no more such exceptions to the global rule 'object names in a schema are unique': https://www.postgresql.org/docs/current/sql-createschema.html

These facts must be mentioned as a short note in the glossary and in more detail in the later patch about the architecture.

J. Purtz


Re: Add A Glossary

From
Corey Huinker
Date:
On Fri, Mar 13, 2020 at 12:18 AM Jürgen Purtz <juergen@purtz.de> wrote:

>> The statement that names of schema objects are unique isn't strictly true, just mostly true. Take the case of unique constraints.
>
> Concerning CONSTRAINTS you are right. Constraints seem to be an exception:
>
>   • Their names belong to a schema, but are not necessarily unique within this context: https://www.postgresql.org/docs/current/catalog-pg-constraint.html.
>   • There is a UNIQUE index within the system catalog pg_constraint:  "pg_constraint_conrelid_contypid_conname_index" UNIQUE, btree (conrelid, contypid, conname), which expresses that names are unique within the context of a table/constraint type. Nevertheless, tests have shown that some stronger restrictions exist across table borders (which seem to be implemented in CREATE statements, or to be a consequence of the correlation between constraint and index that you mentioned?).
>
> I hope that there are no more such exceptions to the global rule 'object names in a schema are unique': https://www.postgresql.org/docs/current/sql-createschema.html
>
> These facts must be mentioned as a short note in the glossary and in more detail in the later patch about the architecture.


I did what I could to address the near uniqueness, as well as incorporate your earlier edits into this new, squashed patch attached.
 
Attachment

Re: Add A Glossary

From
Alvaro Herrera
Date:
I gave this a look.  I first reformatted it so I could read it; that's
0001.  Second I changed all the long <link> items into <xref>s, which
are shorter and don't have to repeat the title of the referred-to page.
(Of course, this changes the link to be in the same style as every other
link in our documentation; some people don't like it. But it's our
style.)

There are some mistakes.  "Tupple" is the most glaring one -- not just the
typo but also the fact that it goes to sql-revoke.  A few definitions
we'll want to modify.  Nothing too big.  In general I like this work and
I think we should have it in pg13.

Please bikeshed the definition of your favorite term, and suggest what
other terms to add.  No pointing out of mere typos yet, please.

I think we should have the terms Consistency, Isolation, Durability.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Add A Glossary

From
Corey Huinker
Date:
On Thu, Mar 19, 2020 at 8:11 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> I gave this a look.  I first reformatted it so I could read it; that's
> 0001.  Second I changed all the long <link> items into <xref>s, which

Thanks! I didn't know about xrefs, that is a big improvement.
 
> are shorter and don't have to repeat the title of the referred-to page.
> (Of course, this changes the link to be in the same style as every other
> link in our documentation; some people don't like it. But it's our
> style.)
>
> There are some mistakes.  "Tupple" is the most glaring one -- not just the
> typo but also the fact that it goes to sql-revoke.  A few definitions
> we'll want to modify.  Nothing too big.  In general I like this work and
> I think we should have it in pg13.
>
> Please bikeshed the definition of your favorite term, and suggest what
> other terms to add.  No pointing out of mere typos yet, please.

Jürgen mentioned off-list that the man page doesn't build. I was going to look into that, but if anyone has more familiarity with that, I'm listening.


> I think we should have the terms Consistency, Isolation, Durability.

+1

Re: Add A Glossary

From
Corey Huinker
Date:
> Jürgen mentioned off-list that the man page doesn't build. I was going to look into that, but if anyone has more familiarity with that, I'm listening.

Looking at this some more, I'm not sure anything needs to be done for man pages. man1 is for executables, man3 seems to be dblink and SPI, and man7 is all SQL commands. This isn't any of those. The only possible thing left would be how to render the text of a <glossterm>foo</glossterm>, and so I looked to see what we do in man pages for acronyms, and the answer appears to be "nothing":

postgres/doc/src$ git grep acronym | grep -v '\/acronym'
sgml/filelist.sgml:<!ENTITY acronyms   SYSTEM "acronyms.sgml">
sgml/postgres.sgml:  &acronyms;
sgml/release.sgml:[A-Z][A-Z_ ]+[A-Z_]             <command>, <literal>, <envar>, <acronym>
sgml/stylesheet.css:acronym { font-style: inherit; }

filelist.sgml, postgres.sgml, and stylesheet.css already have the corresponding change, and the release.sgml is just an incidental mention of acronym.

Of course I could be missing something.

Re: Add A Glossary

From
Alvaro Herrera
Date:
On 2020-Mar-20, Corey Huinker wrote:

> > Jürgen mentioned off-list that the man page doesn't build. I was going to
> > look into that, but if anyone has more familiarity with that, I'm listening.

> Looking at this some more, I'm not sure anything needs to be done for man
> pages.

Yeah, I don't think he was saying that we needed to do anything to
produce a glossary man page; rather that the "make man" command failed.
I tried it here, and indeed it failed.  But on further investigation,
after a "make maintainer-clean" it no longer failed.  I'm not sure what
to make of it, but it seems that this patch needn't concern itself with
that.

I gave a read through the first few actual definitions.  It's much
slower work than I thought!  Attached you'll find the first few edits
that I propose.

Looking at the definition of "Aggregate" it seemed weird to have it
stand as a verb infinitive.  I looked up other glossaries, found this
one
https://www.gartner.com/en/information-technology/glossary?glossaryletter=T
and realized that when they do verbs, they put the present participle
(-ing) form.  So I changed it to "Aggregating", and split out the
"Aggregate function" into its own term.

In Atomic, there seemed to be excessive use of <glossterm> in the
definitions.  Style guides seem to suggest to do that only the first
time you use a term in a definition.  I removed some markup.

I'm not sure about some terms such as "analytic" and "backend server".
I put them in XML comments for now.

The other changes should be self-explanatory.

It's hard to review work from a professional tech writer.  I'm under the
constant impression that I'm ruining somebody's perfect end product,
making a fool of myself.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Add A Glossary

From
Roger Harkavy
Date:
Alvaro, I know that you are joking, but I want to impress on everyone: please don't feel like anyone here is breaking anything when it comes to modifying the content and structure of this glossary.

I do have technical writing experience, but everyone else here is a subject matter expert when it comes to the world of databases and how this one in particular functions.


Re: Add A Glossary

From
Corey Huinker
Date:
> It's hard to review work from a professional tech writer.  I'm under the
> constant impression that I'm ruining somebody's perfect end product,
> making a fool of myself.

If it makes you feel better, it's a mix of definitions I wrote that Roger proofed and restructured, ones that Jürgen had written for a separate effort which then got a Roger-pass, and then some edits of my own and some by Jürgen which I merged without consulting Roger.

Re: Add A Glossary

From
Justin Pryzby
Date:
On Thu, Mar 19, 2020 at 09:11:22PM -0300, Alvaro Herrera wrote:
> +    <glossterm>Aggregate</glossterm>
> +    <glossdef>
> +     <para>
> +      To combine a collection of data values into a single value, whose
> +      value may not be of the same type as the original values.
> +      <glossterm>Aggregate</glossterm> <glossterm>Functions</glossterm>
> +      combine multiple <glossterm>Rows</glossterm> that share a common set
> +      of values into one <glossterm>Row</glossterm>, which means that the
> +      only data visible in the values in common, and the aggregates of the

IS the values in common ?
(or, "is the shared values")

> +    <glossterm>Analytic</glossterm>
> +    <glossdef>
> +     <para>
> +      A <glossterm>Function</glossterm> whose computed value can reference
> +      values found in nearby <glossterm>Rows</glossterm> of the same
> +      <glossterm>Result Set</glossterm>.

> +    <glossterm>Archiver</glossterm>

Can you change that to archiver process ?

> +    <glossterm>Atomic</glossterm>
..
> +     <para>
> +      In reference to an operation: An event that cannot be completed in
> +      part: it must either entirely succeed or entirely fail. A series of

Can you say: "an action which is not allowed to partially succeed and then fail,
..."

> +    <glossterm>Autovacuum</glossterm>

Say autovacuum process ?

> +    <glossdef>
> +     <para>
> +      Processes that remove outdated <acronym>MVCC</acronym>

I would say "A set of processes that remove..."

> +      <glossterm>Records</glossterm> of the <glossterm>Heap</glossterm> and

I'm not sure, can you say "tuples" ?

> +    <glossterm>Backend Process</glossterm>
> +    <glossdef>
> +     <para>
> +      Processes of an <glossterm>Instance</glossterm> which act on behalf of

Say DATABASE instance

> +    <glossterm>Backend Server</glossterm>
> +    <glossdef>
> +     <para>
> +      See <glossterm>Instance</glossterm>.
same

> +    <glossterm>Background Worker</glossterm>
> +    <glossdef>
> +     <para>
> +      Individual processes within an <glossterm>Instance</glossterm>, which
same

> +      run system- or user-supplied code. Typical use cases are processes
> +      which handle parts of an <acronym>SQL</acronym> query to take
> +      advantage of parallel execution on servers with multiple
> +      <acronym>CPUs</acronym>.

I would say "A typical use case is"

> +    <glossterm>Background Writer</glossterm>

Add "process" ?

> +    <glossdef>
> +     <para>
> +      Writes continuously dirty pages from <glossterm>Shared

Say "Continuously writes"

> +      Memory</glossterm> to the file system. It starts periodically, but

Hm, maybe "wakes up periodically"

> +    <glossterm>Cast</glossterm>
> +    <glossdef>
> +     <para>
> +      A conversion of a <glossterm>Datum</glossterm> from its current data
> +      type to another data type.

maybe just say
A conversion of a <glossterm>Datum</glossterm> to another data type.

> +    <glossterm>Catalog</glossterm>
> +    <glossdef>
> +     <para>
> +      The <acronym>SQL</acronym> standard uses this standalone term to
> +      indicate what is called a <glossterm>Database</glossterm> in
> +      <productname>PostgreSQL</productname>'s terminology.

Maybe remove "standalone" ?

> +    <glossterm>Checkpointer</glossterm>

Process

> +      A process that writes dirty pages and <glossterm>WAL
> +      Records</glossterm> to the file system and creates a special

Does the checkpointer actually write WAL ?

> +      checkpoint record. This process is initiated when predefined
> +      conditions are met, such as a specified amount of time has passed, or
> +      a certain volume of records have been collected.

collected or written?

I would say:
> +      A checkpoint is usually initiated by
> +      a specified amount of time having passed, or
> +      a certain volume of records having been written.

> +    <glossterm>Checkpoint</glossterm>
> +    <glossdef>
> +     <para>
> +      A <link linkend="sql-checkpoint"> Checkpoint</link> is a point in time

Extra space

> +   <glossentry id="glossary-connection">
> +    <glossterm>Connection</glossterm>
> +    <glossdef>
> +     <para>
> +      A <acronym>TCP/IP</acronym> or socket line for inter-process

I don't know if I've ever heard the phrase "socket line"
I guess you mean a unix socket.

> +    <glossterm>Constraint</glossterm>
> +    <glossdef>
> +     <para>
> +      A concept of restricting the values of data allowed within a
> +      <glossterm>Table</glossterm>.

Just say: "A restriction on the values..."?

> +    <glossterm>Data Area</glossterm>

Remove this ?  I've never heard this phrase before.

> +    <glossdef>
> +     <para>
> +      The base directory on the filesystem of a
> +      <glossterm>Server</glossterm> that contains all data files and
> +      subdirectories associated with a <glossterm>Cluster</glossterm> with
> +      the exception of tablespaces. The environment variable

Should add an entry for "tablespace".

> +    <glossterm>Datum</glossterm>
> +    <glossdef>
> +     <para>
> +      The internal representation of a <acronym>SQL</acronym> data type.

I'm not sure if should use "a SQL" or "an SQL", but not both.

> +    <glossterm>Delete</glossterm>
> +    <glossdef>
> +     <para>
> +      A <acronym>SQL</acronym> command whose purpose is to remove

just say "which removes"

> +   <glossentry id="glossary-file-segment">
> +    <glossterm>File Segment</glossterm>
> +    <glossdef>
> +     <para>
> +       If a heap or index file grows in size over 1 GB, it will be split

1GB is the default "segment size", which you should define.

> +   <glossentry id="glossary-foreign-data-wrapper">
> +    <glossterm>Foreign Data Wrapper</glossterm>
> +    <glossdef>
> +     <para>
> +      A means of representing data that is not contained in the local
> +      <glossterm>Database</glossterm> as if were in local
> +      <glossterm>Table</glossterm>(s).

I'd say:

+ A means of representing data as a <glossterm>Table</glossterm>(s) even though
+ it is not contained in the local <glossterm>Database</glossterm> 


> +   <glossentry id="glossary-foreign-key">
> +    <glossterm>Foreign Key</glossterm>
> +    <glossdef>
> +     <para>
> +      A type of <glossterm>Constraint</glossterm> defined on one or more
> +      <glossterm>Column</glossterm>s in a <glossterm>Table</glossterm> which
> +      requires the value in those <glossterm>Column</glossterm>s to uniquely
> +      identify a <glossterm>Row</glossterm> in the specified
> +      <glossterm>Table</glossterm>.

An FK doesn't require the values in its table to be unique, right ?
I'd say something like: "..which enforces that the values in those Columns are
also present in an(other) table."
Reference Referential Integrity?

> +    <glossterm>Function</glossterm>
> +    <glossdef>
> +     <para>
> +      Any pre-defined transformation of data. Many
> +      <glossterm>Functions</glossterm> are already defined within
> +      <productname>PostgreSQL</productname> itself, but can also be
> +      user-defined.

I would remove "pre-", since you mentioned that it can be user-defined.

> +    <glossterm>Global SQL Object</glossterm>
> +    <glossdef>
> +     <para>
> +     <!-- FIXME -->
> +      Not all <glossterm>SQL Objects</glossterm> belong to a certain
> +      <glossterm>Schema</glossterm>. Some belong to the complete
> +      <glossterm>Database</glossterm>, or even to the complete
> +      <glossterm>Cluster</glossterm>. These are referred to as
> +      <glossterm>Global SQL Objects</glossterm>. Collations and Extensions
> +      such as <glossterm>Foreign Data Wrappers</glossterm> reside at the
> +      <glossterm>Database</glossterm> level; <glossterm>Database</glossterm>
> +      names, <glossterm>Roles</glossterm>,
> +      <glossterm>Tablespaces</glossterm>, <glossterm>Replication</glossterm>
> +      origins, and subscriptions for logical
> +      <glossterm>Replication</glossterm> at the
> +      <glossterm>Cluster</glossterm> level.

I think "complete" is the wrong word.
I would say:
"An object which is not specific to a given database, but instead shared across
the entire Cluster".

> +   <glossentry id="glossary-grant">
> +    <glossterm>Grant</glossterm>
> +    <glossdef>
> +     <para>
> +      A <acronym>SQL</acronym> command that is used to enable

I'd say "allow"

> +   <glossentry id="glossary-heap">
> +    <glossterm>Heap</glossterm>
> +    <glossdef>
> +     <para>
> +      Contains the original values of <glossterm>Row</glossterm> attributes

I'm not sure what "original" means here ?

> +      (i.e. the data). The <glossterm>Heap</glossterm> is realized within
> +      <glossterm>Database</glossterm> files and mirrored in
> +      <glossterm>Shared Memory</glossterm>.

I wouldn't say mirrored, and probably just remove at least the part after "and".

> +   <glossentry id="glossary-host">
> +    <glossterm>Host</glossterm>
> +    <glossdef>
> +     <para>
> +      See <glossterm>Server</glossterm>.

Or client.  Or proxy at some layer or other intermediate thing.  Maybe just
remove this.

> +   <glossentry id="glossary-index">
> +    <glossterm>Index</glossterm>
> +    <glossdef>
> +     <para>
> +      A <glossterm>Relation</glossterm> that contains data derived from a
> +      <glossterm>Table</glossterm> (or <glossterm>Relation</glossterm> such
> +      as a <glossterm>Materialized View</glossterm>). It's internal

Its

> +      structure supports very fast retrieval of and access to the original
> +      data.

> +    <glossterm>Instance</glossterm>
> +    <glossdef>
> +     <para>
...
> +     <para>
> +      Many <glossterm>Instances</glossterm> can run on the same server as
> +      long as they use different <acronym>IP</acronym> ports and manage

I would say "as long as their TCP/IP ports or sockets don't conflict, and manage..."

> +    <glossterm>Join</glossterm>
> +    <glossdef>
> +     <para>
> +      A technique used with <command>SELECT</command> statements for
> +      correlating data in one or more <glossterm>Relations</glossterm>.

I would refer to this as a SQL keyword that allows combining data from
multiple relations.

> +    <glossterm>Lock</glossterm>
> +    <glossdef>
> +     <para>
> +      A mechanism for one process temporarily preventing data from being
> +      manipulated by any other process.

I'd say:

+      A mechanism by which a process protects simultaneous access to a resource
+      by other processes.

(I said "protects" since shared locks don't prevent all access, and it's easier
than explaining "unsafe access").
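
A rough sketch of the shared-lock case, assuming a hypothetical table
"accounts" and two concurrent sessions: both sessions can hold a share lock
on the same row, and only the conflicting write has to wait.

```sql
-- Hypothetical table "accounts"; run in two separate sessions.

-- Session 1
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR SHARE;

-- Session 2
BEGIN;
-- Succeeds immediately: the two share locks coexist.
SELECT * FROM accounts WHERE id = 1 FOR SHARE;
-- Blocks until session 1 ends its transaction.
UPDATE accounts SET balance = 0 WHERE id = 1;
```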


> +   <glossentry id="glossary-log-file">
> +    <glossterm>Log File</glossterm>
> +    <glossdef>
> +     <para>
> +      <link linkend="logfile-maintenance">LOG files</link> contain readable
> +      text lines about serious and non-serious events, e.g.: use of wrong
> +      password, long-running queries, ... .

Serious and non-serious?

> +    <glossterm>Log Writer</glossterm>

process

> +    <glossdef>
> +     <para>
> +      If activated and parameterized, the

I don't know what parameterized means here

> +      <link linkend="runtime-config-logging">Log Writer</link> process
> +      writes information about database events into the current
> +      <glossterm>Log file</glossterm>. When reaching certain time- or
> +      volume-dependent criterias, he <!-- FIXME "he"? --> creates a new

I think "criteria" is the plural.

> +    <glossterm>Log Record</glossterm>

Can we remove this ?
A couple of releases ago, "pg_xlog" was renamed to "pg_wal".
I'd prefer to avoid defining something called "Log Record" about WAL that's
right next to text logs.

> +    <glossterm>Logged</glossterm>
> +    <glossdef>
> +     <para>
> +      A <glossterm>Table</glossterm> is considered
> +      <glossterm>Logged</glossterm> if changes to it are sent to the
> +      <glossterm>WAL Log</glossterm>. By default, all regular
> +      <glossterm>Tables</glossterm> are <glossterm>Logged</glossterm>. A
> +      <glossterm>Table</glossterm> can be speficied as unlogged either at
> +      creation time or via the <command>ALTER TABLE</command> command. The
> +      primary use of unlogged <glossterm>Tables</glossterm> is for storing
> +      transient work data that must be shared across processes, but with a
> +      final result stored in logged <glossterm>Tables</glossterm>.
> +      <glossterm>Temporary Tables</glossterm> are always unlogged.
> +     </para>
> +    </glossdef>
> +   </glossentry>

Maybe it'd be better to define "unlogged", since 1) logged is the default; and
2) it's right next to text logs.
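
For reference, a short sketch of the two ways a table becomes unlogged that
the entry mentions (the table name is hypothetical):

```sql
-- Hypothetical table name, for illustration only.
-- Declared unlogged at creation time.
CREATE UNLOGGED TABLE scratch_results (id integer, payload text);

-- Persistence can be changed afterwards with ALTER TABLE.
ALTER TABLE scratch_results SET LOGGED;
ALTER TABLE scratch_results SET UNLOGGED;
```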

> +    <glossterm>Master</glossterm>
> +    <glossdef>
> +     <para>
> +      When two or more <glossterm>Databases</glossterm> are linked via
> +      <glossterm>Replication</glossterm>, the <glossterm>Server</glossterm>
> +      that is considered the authoritative source of information is called
> +      the <glossterm>Master</glossterm>.

I think it'd actually be the <<instance>> which is authoritative, in case they're
running on the same <<Server>>

> +   <glossentry id="glossary-materialized">
> +    <glossterm>Materialized</glossterm>
> +    <glossdef>
> +     <para>
> +      The act of storing information rather than just the means of accessing

remove "means of" ?

> +      the information. This term is used in <glossterm>Materialized
> +      Views</glossterm> meaning that the data derived from the
> +      <glossterm>View</glossterm> is actually stored on disk separate from

separately

> +      the sources of that data. When the term
> +      <glossterm>Materialized</glossterm> is used in speaking about
> +      mulit-step queries, it means that the data of a given step is stored

multi

> +      (in memory, but that storage may spill over onto disk).
> +     </para>
> +    </glossdef>
> +   </glossentry>
> +
> +   <glossentry id="glossary-materialized-view">
> +    <glossterm>Materialized View</glossterm>
> +    <glossdef>
> +     <para>
> +      A <glossterm>Relation</glossterm> that is defined in the same way that
> +      a <glossterm>View</glossterm> is, but it stores data in the same way

change "it stores" to stores

> +   <glossentry id="glossary-partition">
> +    <glossterm>Partition</glossterm>
> +    <glossdef>
> +     <para>
> +      <!-- FIXME should this use the style used in "atomic"? -->
> +      a) A <glossterm>Table</glossterm> that can be queried independently by
> +      its own name, but can also be queried via another

just say "on its own" or "directly"

> +      <glossterm>Table</glossterm>, a partitionend

partitioned
also, put it in parens, like "via another table (a partitioned table)..."

> +      <glossterm>Table</glossterm>, which is a collection of

Say "set" here since you later talk about "subsets" and sets.

> +   <glossentry id="glossary-primary-key">
> +    <glossterm>Primary Key</glossterm>
> +    <glossdef>
> +     <para>
> +      A special case of <glossterm>Unique Index</glossterm> defined on a
> +      <glossterm>Table</glossterm> or other <glossterm>Relation</glossterm>
> +      that also guarantees that all of the <glossterm>Attributes</glossterm>
> +      within the <glossterm>Primary Key</glossterm> do not have
> +      <glossterm>Null</glossterm> values.  As the name implies, there can be
> +      only one <glossterm>Primary Key</glossterm> per
> +      <glossterm>Table</glossterm>, though it is possible to have multiple
> +      <glossterm>Unique Indexes</glossterm> that also have no
> +      <glossterm>Null</glossterm>-capable <glossterm>Attributes</glossterm>.

I would say "multiple >>unique indexes<< on >>attributes<< defined as not
nullable".

> +    <glossterm>Procedure</glossterm>
> +    <glossdef>
> +     <para>
> +      A defined set of instructions for manipulating data within a
> +      <glossterm>Database</glossterm>. <glossterm>Procedure</glossterm> can

"procedures" or "a procedure"

> +    <glossterm>Record</glossterm>
> +    <glossdef>
> +     <para>
> +      See <link linkend="sql-revoke">Tupple</link>.

Tupple is back.  And again below.

> +      A single <glossterm>Row</glossterm> of a <glossterm>Table</glossterm>
> +      or other Relation.

I think it's commonly used to mean "an instance of a row" (in an MVCC sense),
but maybe that's too much detail for here.

> +    <glossterm>Referential Integrity</glossterm>
> +    <glossdef>
> +     <para>
> +      The means of restricting data in one <glossterm>Relation</glossterm>

A means

> +   <glossentry id="glossary-relation">
> +    <glossterm>Relation</glossterm>
> +    <glossdef>
> +     <para>
> +      The generic term for all objects in a <glossterm>Database</glossterm>

"A generic term for any object in a >>database<< that has a name and..."

> +   <glossentry id="glossary-result-set">
> +    <glossterm>Result Set</glossterm>
> +    <glossdef>
> +     <para>
> +      A data structure transmitted from a <glossterm>Server</glossterm> to
> +      client program upon the completion of a <acronym>SQL</acronym>
> +      command, usually a <command>SELECT</command> but it can be an
> +      <command>INSERT</command>, <command>UPDATE</command>, or
> +      <command>DELETE</command> command if the <literal>RETURNING</literal>
> +      clause is specified.

I'd remove everything in that sentence after "usually".

> +    <glossterm>Revoke</glossterm>
> +    <glossdef>
> +     <para>
> +      A command to reduce access to a named set of

s/reduce/prevent/ ?

> +    <glossterm>Row</glossterm>
> +    <glossdef>
> +     <para>
> +      See <link linkend="sql-revoke">Tupple</link>.

tuple

> +   <glossentry id="glossary-savepoint">
> +    <glossterm>Savepoint</glossterm>
> +    <glossdef>
> +     <para>
> +      A special mark (such as a timestamp) inside a
> +      <glossterm>Transaction</glossterm>. Data modifications after this
> +      point in time may be rolled back to the time of the savepoint.

I don't think "timestamp" is a useful or accurate analogy for this.
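
A savepoint is a named marker inside the transaction rather than a point on
the clock. A minimal sketch, assuming a hypothetical table "t" with a single
integer column:

```sql
-- Hypothetical table "t" with one integer column.
BEGIN;
INSERT INTO t VALUES (1);
SAVEPOINT before_second;              -- a named marker, not a timestamp
INSERT INTO t VALUES (2);
ROLLBACK TO SAVEPOINT before_second;  -- undoes only the second insert
COMMIT;                               -- the row with value 1 is committed
```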

> +    <glossterm>Schema</glossterm>
> +    <glossdef>
> +     <para>
> +      A <link linkend="ddl-schemas">schema</link> is a namespace for
> +      <glossterm>SQL objects</glossterm>, which all reside in the same
> +      <glossterm>database</glossterm>.  Each <glossterm>SQL
> +      object</glossterm> must reside in exactly one
> +      <glossterm>Schema</glossterm>.
> +     </para>

> +     <para>
> +      In general, the names of <glossterm>SQL objects</glossterm> in the
> +      schema are unique - even across different types of objects.  The lone
> +      exception is the case of <glossterm>Unique</glossterm>
> +      <glossterm>Constraint</glossterm>s, in which case there
> +      <emphasis>must</emphasis> be a <glossterm>Unique Index</glossterm>
> +      with the same name and <glossterm>Schema</glossterm> as the
> +      <glossterm>Constraint</glossterm>.  There is no restriction on having
> +      a name used in multiple <glossterm>Schema</glossterm>s.

I think there's some confusion.  Constraints are not objects, right ?

But, constraints do have an exception (not just unique constraints, though):
the constraint is only unique on its table, not in its database/schema.
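
A small sketch of that behaviour, using hypothetical tables and CHECK
constraints (which have no backing index, so no index name gets in the way):

```sql
-- Hypothetical tables; the same constraint name is accepted on both.
CREATE TABLE a (x integer CONSTRAINT x_positive CHECK (x > 0));
CREATE TABLE b (x integer CONSTRAINT x_positive CHECK (x > 0));

-- Rejected: a constraint named x_positive already exists on table a.
ALTER TABLE a ADD CONSTRAINT x_positive CHECK (x > 1);
```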

    "pg_constraint_conrelid_contypid_conname_index" UNIQUE, btree (conrelid, contypid, conname) CLUSTER

> +    <glossterm>Select</glossterm>
> +    <glossdef>
> +     <para>
> +      The command used to query a <glossterm>Database</glossterm>. Normally,
> +      <command>SELECT</command>s are not expected to modify the
> +      <glossterm>Database</glossterm> in any way, but it is possible that
> +      <glossterm>Functions</glossterm> invoked within the query could have
> +      side-effects that do modify data.  </para>

I think there should be references to the sql-* pages for this and others.

> +   <glossentry id="glossary-serializable">
> +    <glossterm>Serializable</glossterm>
> +    <glossdef>
> +     <para>
> +      Transactions defined as <literal>SERIALIZABLE</literal> are unable to
> +      see changes made within other transactions. In effect, for the
> +      initializing session the entire <glossterm>Database</glossterm>
> +      appears to be frozen duration such a
> +      <glossterm>Transaction</glossterm>.

Do you mean "for the duration of the >>Transaction<<"

> +   <glossentry id="glossary-session">
> +    <glossterm>Session</glossterm>
> +    <glossdef>
> +     <para>
> +      A <glossterm>Connection</glossterm> to the <glossterm>Database</glossterm>.
> +     </para>
> +     <para>
> +      A description of the commands that were issued in the life cycle of a
> +      particular <glossterm>Connection</glossterm> to the
> +      <glossterm>Database</glossterm>.

I'm not sure what this <para> means.

> +    <glossterm>Sequence</glossterm>
> +    <glossdef>
> +     <para>
> +      <!-- sounds excessively complicated a definition -->
> +      An <glossterm>Database</glossterm> object which represents the

A not An

> +      mathematical concept of a numerical integral sequence. It can be
> +      thought of as a <glossterm>Table</glossterm> with exactly one
> +      <glossterm>Row</glossterm> and one <glossterm>Column</glossterm>. The
> +      value stored is known as the current value. A
> +      <glossterm>Sequence</glossterm> has a defined direction (almost always
> +      increasing) and an interval step (usually 1).  Whenever the
> +      <literal>NEXTVAL</literal> pseudo-column of a
> +      <glossterm>Sequence</glossterm> is accessed, the current value is moved
> +      in the defined direction by the defined interval step, and that value

say "given interval step"
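
As a side note, in PostgreSQL the next value is obtained through the
nextval() function rather than a pseudo-column. A minimal sketch, with a
hypothetical sequence name:

```sql
-- Hypothetical sequence name, for illustration only.
CREATE SEQUENCE demo_seq INCREMENT BY 1;

SELECT nextval('demo_seq');  -- advances the sequence and returns the new value
SELECT nextval('demo_seq');  -- advances again by the interval step
SELECT currval('demo_seq');  -- value most recently returned by nextval in this session
```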

> +    <glossterm>Shared Memory</glossterm>
> +    <glossdef>
> +     <para>
> +      <acronym>RAM</acronym> which is used by the processes common to an
> +      <glossterm>Instance</glossterm>. It mirrors parts of
> +      <glossterm>Database</glossterm> files, provides an area for
> +      <glossterm>WAL Records</glossterm>,

Do we use shared_buffers for WAL ?

> +   <glossentry id="glossary-table">
> +    <glossterm>Table</glossterm>
> +    <glossdef>
> +     <para>
> +      A collection of <glossterm>Tuples</glossterm> (also known as
> +      <glossterm>Rows</glossterm> or <glossterm>Records</glossterm>) having
> +      a common data structure (the same number of
> +      <glossterm>Attributes</glossterm>s, in the same order, having the same

Attributes has two esses.

> +      name and type per position). A <glossterm>Table</glossterm> is the

I don't think you need to say here that the columns of a table all have the
same type and order.

> +    <glossterm>Temporary Tables</glossterm>
> +    <glossdef>
> +     <para>
> +      <glossterm>Table</glossterm>s that exist either for the lifetime of a
> +      <glossterm>Session</glossterm> or a
> +      <glossterm>Transaction</glossterm>, as defined at creation time. The

I would say "as specified at the time of its creation".

> +    <glossterm>Transaction</glossterm>
> +    <glossdef>
> +     <para>
> +      A combination of one or more commands that must act as a single

Remove "one or more"

> +    <glossterm>Trigger</glossterm>
> +    <glossdef>
> +     <para>
> +      A <glossterm>Function</glossterm> which can be defined to execute
> +      whenever a certain operation (<command>INSERT</command>,
> +      <command>UPDATE</command>, or <command>DELTE</command>) is applied to
> +      that <glossterm>Relation</glossterm>. A <glossterm>Trigger</glossterm>

s/that/a/

> +   <glossentry id="glossary-unique">
> +    <glossterm>Unique</glossterm>
> +    <glossdef>
> +     <para>
> +      The condition of having no matching values in the same

s/matching/duplicate/

> +      <glossterm>Relation</glossterm>. Most often used in the concept of

s/concept/context/

> +   <glossentry id="glossary-update">
> +    <glossterm>Update</glossterm>
> +    <glossdef>
> +     <para>
> +      A command used to modify <glossterm>Rows</glossterm> that already

or 'may already'

> +    <glossterm>WAL File</glossterm>
...
> +     <para>
> +      The sequence of <glossterm>WAL Records</glossterm> in combination with
> +      the sequence of <glossterm>WAL Files</glossterm> represents the

Remove "in combination with the sequence of >WAL Files<"

> +    <glossentry id="glossary-wal-log">
> +    <glossterm>WAL Log</glossterm>

Can you just say WAL or "write-ahead log".

> +    <glossdef>
> +     <para>
> +      A <glossterm>WAL Record</glossterm> contains either new or changed
> +      <glossterm>Heap</glossterm> or <glossterm>Index</glossterm> data or
> +      information about a <command>COMMIT</command>,
> +      <command>ROLLBACK</command>, <command>SAVEPOINT</command>, or
> +      <glossterm>Checkpointer</glossterm> operation. WAL records use a
> +      non-printabe binary format.

non-printable
Or just remove it.
Or just remove the sentence.

> +   <glossterm>WAL Writer</glossterm>

process

> +   <glossentry id="glossary-window-function">
> +    <glossterm>Window Function</glossterm>
> +    <glossdef>
> +     <para>
> +      A type of <glossterm>Function</glossterm> similar to an
> +      <glossterm>Aggregate</glossterm> in that can derive its value from a

in that IT

> +      set of <glossterm>Rows</glossterm> in a <glossterm>Result
> +      Set</glossterm>, but still retaining the original source data.

-- 
Justin



Re: Add A Glossary

From
Jürgen Purtz
Date:
man pages: Sorry if I confused anyone with my poor English. In my
'offline' mail I just wanted to say that we don't have to worry about
man page generation. The patch doesn't affect files in the /ref
subdirectory, from which the man pages are created.

review process: Yes, it will be time-consuming, and it may be a hard
job because a) the patch has multiple authors with divergent writing
styles and b) the terms touch on two different fundamental areas: SQL
basics and PG basics. Concerning PG basics, in the past we have used a
wide range of similar terms with different meanings, as well as
different terms for the same thing, both within our documentation and
in secondary publications. The terms "backend server" and "instance"
are one such example, and there should be a clear decision in favor of
one of the two. Presumably we will see more discussion about which term
is preferred (remember the discussion about master/slave vs.
primary/secondary some weeks ago).

ongoing: Questions for clarification along the way are welcome.


Kind regards, Jürgen





Re: Add A Glossary

From
Jürgen Purtz
Date:
On 20.03.20 20:58, Justin Pryzby wrote:
> On Thu, Mar 19, 2020 at 09:11:22PM -0300, Alvaro Herrera wrote:
>> +    <glossterm>Aggregate</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      To combine a collection of data values into a single value, whose
>> +      value may not be of the same type as the original values.
>> +      <glossterm>Aggregate</glossterm> <glossterm>Functions</glossterm>
>> +      combine multiple <glossterm>Rows</glossterm> that share a common set
>> +      of values into one <glossterm>Row</glossterm>, which means that the
>> +      only data visible in the values in common, and the aggregates of the
> IS the values in common ?
> (or, "is the shared values")
>
>> +    <glossterm>Analytic</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A <glossterm>Function</glossterm> whose computed value can reference
>> +      values found in nearby <glossterm>Rows</glossterm> of the same
>> +      <glossterm>Result Set</glossterm>.
>> +    <glossterm>Archiver</glossterm>
> Can you change that to archiver process ?


I prefer the short term without the addition of 'process', for
'Archiver' as well as the other cases. But I'm not a native English
speaker.


>> +    <glossterm>Atomic</glossterm>
> ..
>> +     <para>
>> +      In reference to an operation: An event that cannot be completed in
>> +      part: it must either entirely succeed or entirely fail. A series of
> Can you say: "an action which is not allowed to partially succed and then fail,
> ..."
>
>> +    <glossterm>Autovacuum</glossterm>
> Say autovacuum process ?
>
>> +    <glossdef>
>> +     <para>
>> +      Processes that remove outdated <acronym>MVCC</acronym>
> I would say "A set of processes that remove..."
>
>> +      <glossterm>Records</glossterm> of the <glossterm>Heap</glossterm> and
> I'm not sure, can you say "tuples" ?


This concerns the upcoming MVCC terms. We need a linguistic distinction
between the different versions of 'records' or 'tuples'. In my
understanding, the term 'tuple' is closer to a logical construct
(relational algebra), and a 'record' is some concrete implementation on
disk. Therefore I prefer 'record' in this context.


>> +    <glossterm>Backend Process</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      Processes of an <glossterm>Instance</glossterm> which act on behalf of
> Say DATABASE instance


-1: The term 'database' is overused. We should restrict it to a few
cases.


>> +    <glossterm>Backend Server</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      See <glossterm>Instance</glossterm>.
> same
>
>> +    <glossterm>Background Worker</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      Individual processes within an <glossterm>Instance</glossterm>, which
> same
>
>> +      run system- or user-supplied code. Typical use cases are processes
>> +      which handle parts of an <acronym>SQL</acronym> query to take
>> +      advantage of parallel execution on servers with multiple
>> +      <acronym>CPUs</acronym>.
> I would say "A typical use case is"
+1
>> +    <glossterm>Background Writer</glossterm>
> Add "process" ?
>
>> +    <glossdef>
>> +     <para>
>> +      Writes continuously dirty pages from <glossterm>Shared
> Say "Continuously writes"
+1
>> +      Memory</glossterm> to the file system. It starts periodically, but
> Hm, maybe "wakes up periodically"
+1
>> +    <glossterm>Cast</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A conversion of a <glossterm>Datum</glossterm> from its current data
>> +      type to another data type.
> maybe just say
> A conversion of a <glossterm>Datum</glossterm> another data type.
>
>> +    <glossterm>Catalog</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      The <acronym>SQL</acronym> standard uses this standalone term to
>> +      indicate what is called a <glossterm>Database</glossterm> in
>> +      <productname>PostgreSQL</productname>'s terminology.
> Maybe remove "standalone" ?
>
>> +    <glossterm>Checkpointer</glossterm>
> Process
>
>> +      A process that writes dirty pages and <glossterm>WAL
>> +      Records</glossterm> to the file system and creates a special
> Does the chckpointer actually write WAL ?


YES, not only WAL Writer.


>> +      checkpoint record. This process is initiated when predefined
>> +      conditions are met, such as a specified amount of time has passed, or
>> +      a certain volume of records have been collected.
> collected or written?
>
> I would say:
>> +      A checkpoint is usually initiated by
>> +      a specified amount of time having passed, or
>> +      a certain volume of records having been written.


+-0


>> +    <glossterm>Checkpoint</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A <link linkend="sql-checkpoint"> Checkpoint</link> is a point in time
> Extra space
>
>> +   <glossentry id="glossary-connection">
>> +    <glossterm>Connection</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A <acronym>TCP/IP</acronym> or socket line for inter-process
> I don't know if I've ever heard the phase "socket line"
> I guess you mean a unix socket.


+1


>> +    <glossterm>Constraint</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A concept of restricting the values of data allowed within a
>> +      <glossterm>Table</glossterm>.
> Just say: "A restriction on the values..."?
>
>> +    <glossterm>Data Area</glossterm>
> Remove this ?  I've never heard this phrase before.


grep on *.sgml finds 4 occurrences.


>> +    <glossdef>
>> +     <para>
>> +      The base directory on the filesystem of a
>> +      <glossterm>Server</glossterm> that contains all data files and
>> +      subdirectories associated with a <glossterm>Cluster</glossterm> with
>> +      the exception of tablespaces. The environment variable
> Should add an entry for "tablespace".


+1


>> +    <glossterm>Datum</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      The internal representation of a <acronym>SQL</acronym> data type.
> I'm not sure if should use "a SQL" or "an SQL", but not both.


grep | wc reports 106 occurrences of "an SQL" and 63 of "a SQL". It
depends on how people pronounce the term SQL: "an ess-cue-ell" or
"a sequel".


>> +    <glossterm>Delete</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A <acronym>SQL</acronym> command whose purpose is to remove
> just say "which removes"


+1


>> +   <glossentry id="glossary-file-segment">
>> +    <glossterm>File Segment</glossterm>
>> +    <glossdef>
>> +     <para>
>> +       If a heap or index file grows in size over 1 GB, it will be split
> 1GB is the default "segment size", which you should define.


???


>> +   <glossentry id="glossary-foreign-data-wrapper">
>> +    <glossterm>Foreign Data Wrapper</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A means of representing data that is not contained in the local
>> +      <glossterm>Database</glossterm> as if were in local
>> +      <glossterm>Table</glossterm>(s).
> I'd say:
>
> + A means of representing data as a <glossterm>Table</glossterm>(s) even though
> + it is not contained in the local <glossterm>Database</glossterm>
>
>
>> +   <glossentry id="glossary-foreign-key">
>> +    <glossterm>Foreign Key</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A type of <glossterm>Constraint</glossterm> defined on one or more
>> +      <glossterm>Column</glossterm>s in a <glossterm>Table</glossterm> which
>> +      requires the value in those <glossterm>Column</glossterm>s to uniquely
>> +      identify a <glossterm>Row</glossterm> in the specified
>> +      <glossterm>Table</glossterm>.
> An FK doesn't require the values in its table to be unique, right ?
> I'd say something like: "..which enforces that the values in those Columns are
> also present in an(other) table."
> Reference Referential Integrity?
>
>> +    <glossterm>Function</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      Any pre-defined transformation of data. Many
>> +      <glossterm>Functions</glossterm> are already defined within
>> +      <productname>PostgreSQL</productname> itself, but can also be
>> +      user-defined.
> I would remove "pre-", since you mentioned that it can be user-defined.
>
>> +    <glossterm>Global SQL Object</glossterm>
>> +    <glossdef>
>> +     <para>
>> +     <!-- FIXME -->
>> +      Not all <glossterm>SQL Objects</glossterm> belong to a certain
>> +      <glossterm>Schema</glossterm>. Some belong to the complete
>> +      <glossterm>Database</glossterm>, or even to the complete
>> +      <glossterm>Cluster</glossterm>. These are referred to as
>> +      <glossterm>Global SQL Objects</glossterm>. Collations and Extensions
>> +      such as <glossterm>Foreign Data Wrappers</glossterm> reside at the
>> +      <glossterm>Database</glossterm> level; <glossterm>Database</glossterm>
>> +      names, <glossterm>Roles</glossterm>,
>> +      <glossterm>Tablespaces</glossterm>, <glossterm>Replication</glossterm>
>> +      origins, and subscriptions for logical
>> +      <glossterm>Replication</glossterm> at the
>> +      <glossterm>Cluster</glossterm> level.
> I think "complete" is the wrong world.
> I would say:
> "An object which is not specific to a given database, but instead shared across
> the entire Cluster".


This phrase seems too simple. We must differentiate between the
different levels: schema, database, cluster. Perhaps someone can find a
better phrase.


>> +   <glossentry id="glossary-grant">
>> +    <glossterm>Grant</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A <acronym>SQL</acronym> command that is used to enable
> I'd say "allow"
>
>> +   <glossentry id="glossary-heap">
>> +    <glossterm>Heap</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      Contains the original values of <glossterm>Row</glossterm> attributes
> I'm not sure what "original" means here ?


Yes, this may be misleading. I wanted to express that values are stored
in the heap (the 'original') and possibly repeated as a key in an index.


>> +      (i.e. the data). The <glossterm>Heap</glossterm> is realized within
>> +      <glossterm>Database</glossterm> files and mirrored in
>> +      <glossterm>Shared Memory</glossterm>.
> I wouldn't say mirrored, and probably just remove at least the part after "and".


+-0


>> +   <glossentry id="glossary-host">
>> +    <glossterm>Host</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      See <glossterm>Server</glossterm>.
> Or client.  Or proxy at some layer or other intermediate thing.  Maybe just
> remove this.


Sometimes the term "host" is used with a different meaning. Therefore we
should keep this glossary entry to clarify that it is to be used only in
the sense of a "server".


>> +   <glossentry id="glossary-index">
>> +    <glossterm>Index</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A <glossterm>Relation</glossterm> that contains data derived from a
>> +      <glossterm>Table</glossterm> (or <glossterm>Relation</glossterm> such
>> +      as a <glossterm>Materialized View</glossterm>). It's internal
> Its
>
>> +      structure supports very fast retrieval of and access to the original
>> +      data.
>> +    <glossterm>Instance</glossterm>
>> +    <glossdef>
>> +     <para>
> ...
>> +     <para>
>> +      Many <glossterm>Instances</glossterm> can run on the same server as
>> +      long as they use different <acronym>IP</acronym> ports and manage
> I would say "as long as their TCP/IP ports or sockets don't conflict, and manage..."


+1


>> +    <glossterm>Join</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A technique used with <command>SELECT</command> statements for
>> +      correlating data in one or more <glossterm>Relations</glossterm>.
> I would refer to this as a SQL keyword allowing to combine data from multiple
> relations.
>
>> +    <glossterm>Lock</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A mechanism for one process temporarily preventing data from being
>> +      manipulated by any other process.
> I'd say:
>
> +      A mechanism by which a process protects simultaneous access to a resource
> +      by other processes.
>
> (I said "protects" since shared locks don't prevent all access, and it's easier
> than explaining "unsafe access").
>
>
>> +   <glossentry id="glossary-log-file">
>> +    <glossterm>Log File</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      <link linkend="logfile-maintenance">LOG files</link> contain readable
>> +      text lines about serious and non-serious events, e.g.: use of wrong
>> +      password, long-running queries, ... .
> Serious and non-serious?


ok, can be removed: 'events' only.


>> +    <glossterm>Log Writer</glossterm>
> process
>
>> +    <glossdef>
>> +     <para>
>> +      If activated and parameterized, the
> I don't know what parameterized means here


ok, unnecessary term. (There are parameters for the Log Writer process 
in the config file.)


>> +      <link linkend="runtime-config-logging">Log Writer</link> process
>> +      writes information about database events into the current
>> +      <glossterm>Log file</glossterm>. When reaching certain time- or
>> +      volume-dependent criterias, he <!-- FIXME "he"? --> creates a new
> I think criteria is the plural..


+1


>> +    <glossterm>Log Record</glossterm>
> Can we remove this ?
> Couple releases ago, "pg_xlog" was renamed to pg_wal.
> I'd prefer to avoid defining something called "Log Record" about WAL that's
> right next to text logs.


"... that's right next to text logs."  This is the problem, which shall 
be clarified. The rename of the directory does not affect the records 
which are written into the WAL files or are used for replication. The 
term "log record" is used in the documentation as well as in error messages.


>> +    <glossterm>Logged</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A <glossterm>Table</glossterm> is considered
>> +      <glossterm>Logged</glossterm> if changes to it are sent to the
>> +      <glossterm>WAL Log</glossterm>. By default, all regular
>> +      <glossterm>Tables</glossterm> are <glossterm>Logged</glossterm>. A
>> +      <glossterm>Table</glossterm> can be speficied as unlogged either at
>> +      creation time or via the <command>ALTER TABLE</command> command. The
>> +      primary use of unlogged <glossterm>Tables</glossterm> is for storing
>> +      transient work data that must be shared across processes, but with a
>> +      final result stored in logged <glossterm>Tables</glossterm>.
>> +      <glossterm>Temporary Tables</glossterm> are always unlogged.
>> +     </para>
>> +    </glossdef>
>> +   </glossentry>
> Maybe it's be better to define "unlogged", since 1) logged is the default; and
> 2) it's right next to text logs.
>
>> +    <glossterm>Master</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      When two or more <glossterm>Databases</glossterm> are linked via
>> +      <glossterm>Replication</glossterm>, the <glossterm>Server</glossterm>
>> +      that is considered the authoritative source of information is called
>> +      the <glossterm>Master</glossterm>.
> I think it'd actually be the <<instance>> which is authoritative, in case they're
> running on the same <<Server>>


In this phase of the glossary we should avoid the discussion about 
master/slave vs. primary/secondary. A few weeks ago there were many 
contributions on this topic without a clear result. In one of the next 
phases of the glossary we should discuss all terms concerning replication 
separately.


>> +   <glossentry id="glossary-materialized">
>> +    <glossterm>Materialized</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      The act of storing information rather than just the means of accessing
> remove "means of" ?
>
>> +      the information. This term is used in <glossterm>Materialized
>> +      Views</glossterm> meaning that the data derived from the
>> +      <glossterm>View</glossterm> is actually stored on disk separate from
> separately
>
>> +      the sources of that data. When the term
>> +      <glossterm>Materialized</glossterm> is used in speaking about
>> +      mulit-step queries, it means that the data of a given step is stored
> multi
>
>> +      (in memory, but that storage may spill over onto disk).
>> +     </para>
>> +    </glossdef>
>> +   </glossentry>
>> +
>> +   <glossentry id="glossary-materialized-view">
>> +    <glossterm>Materialized View</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A <glossterm>Relation</glossterm> that is defined in the same way that
>> +      a <glossterm>View</glossterm> is, but it stores data in the same way
> change "it stores" to stores
>
>> +   <glossentry id="glossary-partition">
>> +    <glossterm>Partition</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      <!-- FIXME should this use the style used in "atomic"? -->
>> +      a) A <glossterm>Table</glossterm> that can be queried independently by
>> +      its own name, but can also be queried via another
> just say "on its own" or "directly"
>
>> +      <glossterm>Table</glossterm>, a partitionend
> partitioned
> also, put it in parens, like "via another table (a partitioned table)..."
>
>> +      <glossterm>Table</glossterm>, which is a collection of
> Say "set" here since you later talk about "subsets" and sets.
>
>> +   <glossentry id="glossary-primary-key">
>> +    <glossterm>Primary Key</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A special case of <glossterm>Unique Index</glossterm> defined on a
>> +      <glossterm>Table</glossterm> or other <glossterm>Relation</glossterm>
>> +      that also guarantees that all of the <glossterm>Attributes</glossterm>
>> +      within the <glossterm>Primary Key</glossterm> do not have
>> +      <glossterm>Null</glossterm> values.  As the name implies, there can be
>> +      only one <glossterm>Primary Key</glossterm> per
>> +      <glossterm>Table</glossterm>, though it is possible to have multiple
>> +      <glossterm>Unique Indexes</glossterm> that also have no
>> +      <glossterm>Null</glossterm>-capable <glossterm>Attributes</glossterm>.
> I would say "multiple >>unique indexes<< on >>attributes<< defined as not
> nullable.
>
>> +    <glossterm>Procedure</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A defined set of instructions for manipulating data within a
>> +      <glossterm>Database</glossterm>. <glossterm>Procedure</glossterm> can
> "procedures" or "a procedure"
>
>> +    <glossterm>Record</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      See <link linkend="sql-revoke">Tupple</link>.
> Tupple is back.  And again below.
>
>> +      A single <glossterm>Row</glossterm> of a <glossterm>Table</glossterm>
>> +      or other Relation.
> I think it's commonly used to mean "an instance of a row" (in an MVCC sense),
> but maybe that's too much detail for here.
>
>> +    <glossterm>Referential Integrity</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      The means of restricting data in one <glossterm>Relation</glossterm>
> A means
>
>> +   <glossentry id="glossary-relation">
>> +    <glossterm>Relation</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      The generic term for all objects in a <glossterm>Database</glossterm>
> "A generic term for any object in a >>database<< that has a name and..."
>
>> +   <glossentry id="glossary-result-set">
>> +    <glossterm>Result Set</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A data structure transmitted from a <glossterm>Server</glossterm> to
>> +      client program upon the completion of a <acronym>SQL</acronym>
>> +      command, usually a <command>SELECT</command> but it can be an
>> +      <command>INSERT</command>, <command>UPDATE</command>, or
>> +      <command>DELETE</command> command if the <literal>RETURNING</literal>
>> +      clause is specified.
> I'd remove everything in that sentence after "usually".
>
>> +    <glossterm>Revoke</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A command to reduce access to a named set of
> s/reduce/prevent/ ?
>
>> +    <glossterm>Row</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      See <link linkend="sql-revoke">Tupple</link>.
> tuple
>
>> +   <glossentry id="glossary-savepoint">
>> +    <glossterm>Savepoint</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A special mark (such as a timestamp) inside a
>> +      <glossterm>Transaction</glossterm>. Data modifications after this
>> +      point in time may be rolled back to the time of the savepoint.
> I don't think "timestamp" is a useful or accurate analogy for this.


+1
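
To illustrate why a timestamp is a poor analogy: a savepoint is a named marker inside a transaction, and rolling back to it undoes the work done after the marker while keeping the work done before it. A minimal sketch follows, using Python's built-in sqlite3 module (SQLite rather than PostgreSQL, chosen only so the example is self-contained; the table name t and the savepoint name sp1 are made up for illustration):

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# transaction is controlled explicitly with BEGIN / COMMIT below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (v INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO t VALUES (1)")      # done before the savepoint: kept
conn.execute("SAVEPOINT sp1")                 # named marker, not a point in time
conn.execute("INSERT INTO t VALUES (2)")      # done after the savepoint
conn.execute("ROLLBACK TO SAVEPOINT sp1")     # undoes only the second insert
conn.execute("INSERT INTO t VALUES (3)")      # the transaction is still open
conn.execute("COMMIT")

rows = [r[0] for r in conn.execute("SELECT v FROM t ORDER BY v")]
print(rows)
```

The marker is identified by name, and the transaction continues after the partial rollback, which is what a definition based on "point in time" tends to obscure.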


>> +    <glossterm>Schema</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A <link linkend="ddl-schemas">schema</link> is a namespace for
>> +      <glossterm>SQL objects</glossterm>, which all reside in the same
>> +      <glossterm>database</glossterm>.  Each <glossterm>SQL
>> +      object</glossterm> must reside in exactly one
>> +      <glossterm>Schema</glossterm>.
>> +     </para>
>> +     <para>
>> +      In general, the names of <glossterm>SQL objects</glossterm> in the
>> +      schema are unique - even across different types of objects.  The lone
>> +      exception is the case of <glossterm>Unique</glossterm>
>> +      <glossterm>Constraint</glossterm>s, in which case there
>> +      <emphasis>must</emphasis> be a <glossterm>Unique Index</glossterm>
>> +      with the same name and <glossterm>Schema</glossterm> as the
>> +      <glossterm>Constraint</glossterm>.  There is no restriction on having
>> +      a name used in multiple <glossterm>Schema</glossterm>s.
> I think there's some confusion.  Constraints are not objects, right ?
>
> But, constraints do have an exception (not just unique constraints, though):
> the constraint is only unique on its table, not in its database/schema.
>
>      "pg_constraint_conrelid_contypid_conname_index" UNIQUE, btree (conrelid, contypid, conname) CLUSTER


Yes, you are right. But give me some time for a better suggestion.


>> +    <glossterm>Select</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      The command used to query a <glossterm>Database</glossterm>. Normally,
>> +      <command>SELECT</command>s are not expected to modify the
>> +      <glossterm>Database</glossterm> in any way, but it is possible that
>> +      <glossterm>Functions</glossterm> invoked within the query could have
>> +      side-effects that do modify data.  </para>
> I think there should be references to the sql-* pages for this and others.
>
>> +   <glossentry id="glossary-serializable">
>> +    <glossterm>Serializable</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      Transactions defined as <literal>SERIALIZABLE</literal> are unable to
>> +      see changes made within other transactions. In effect, for the
>> +      initializing session the entire <glossterm>Database</glossterm>
>> +      appears to be frozen duration such a
>> +      <glossterm>Transaction</glossterm>.
> Do you mean "for the duration of the >>Transaction<<"
>
>> +   <glossentry id="glossary-session">
>> +    <glossterm>Session</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A <glossterm>Connection</glossterm> to the <glossterm>Database</glossterm>.
>> +     </para>
>> +     <para>
>> +      A description of the commands that were issued in the life cycle of a
>> +      particular <glossterm>Connection</glossterm> to the
>> +      <glossterm>Database</glossterm>.
> I'm not sure what this <para> means.
>
>> +    <glossterm>Sequence</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      <!-- sounds excessively complicated a definition -->
>> +      An <glossterm>Database</glossterm> object which represents the
> A not An
>
>> +      mathematical concept of a numerical integral sequence. It can be
>> +      thought of as a <glossterm>Table</glossterm> with exactly one
>> +      <glossterm>Row</glossterm> and one <glossterm>Column</glossterm>. The
>> +      value stored is known as the current value. A
>> +      <glossterm>Sequence</glossterm> has a defined direction (almost always
>> +      increasing) and an interval step (usually 1).  Whenever the
>> +      <literal>NEXTVAL</literal> pseudo-column of a
>> +      <glossterm>Sequence</glossterm> is accessed, the current value is moved
>> +      in the defined direction by the defined interval step, and that value
> say "given interval step"
>
>> +    <glossterm>Shared Memory</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      <acronym>RAM</acronym> which is used by the processes common to an
>> +      <glossterm>Instance</glossterm>. It mirrors parts of
>> +      <glossterm>Database</glossterm> files, provides an area for
>> +      <glossterm>WAL Records</glossterm>,
> Do we use shared_buffers for WAL ?


Yes, my information is that WAL records are part of the shared_buffers.


>> +   <glossentry id="glossary-table">
>> +    <glossterm>Table</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A collection of <glossterm>Tuples</glossterm> (also known as
>> +      <glossterm>Rows</glossterm> or <glossterm>Records</glossterm>) having
>> +      a common data structure (the same number of
>> +      <glossterm>Attributes</glossterm>s, in the same order, having the same
> Attributes has two esses.
>
>> +      name and type per position). A <glossterm>Table</glossterm> is the
> I don't think you need to say here that the columns of a table all have the
> same type and order.


In my opinion this is essential information.


>> +    <glossterm>Temporary Tables</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      <glossterm>Table</glossterm>s that exist either for the lifetime of a
>> +      <glossterm>Session</glossterm> or a
>> +      <glossterm>Transaction</glossterm>, as defined at creation time. The
> I would say "as specified at the time of its creation".
>
>> +    <glossterm>Transaction</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A combination of one or more commands that must act as a single
> Remove "one or more"
>
>> +    <glossterm>Trigger</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A <glossterm>Function</glossterm> which can be defined to execute
>> +      whenever a certain operation (<command>INSERT</command>,
>> +      <command>UPDATE</command>, or <command>DELTE</command>) is applied to
>> +      that <glossterm>Relation</glossterm>. A <glossterm>Trigger</glossterm>
> s/that/a/
>
>> +   <glossentry id="glossary-unique">
>> +    <glossterm>Unique</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      The condition of having no matching values in the same
> s/matching/duplicate/
>
>> +      <glossterm>Relation</glossterm>. Most often used in the concept of
> s/concept/context/
>
>> +   <glossentry id="glossary-update">
>> +    <glossterm>Update</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A command used to modify <glossterm>Rows</glossterm> that already
> or 'may already'
>
>> +    <glossterm>WAL File</glossterm>
> ...
>> +     <para>
>> +      The sequence of <glossterm>WAL Records</glossterm> in combination with
>> +      the sequence of <glossterm>WAL Files</glossterm> represents the
> Remove "in combination with the sequence of >WAL Files<"
>
>> +    <glossentry id="glossary-wal-log">
>> +    <glossterm>WAL Log</glossterm>
> Can you just say WAL or "write-ahead log".


Sometimes the term "WAL log" is used in the documentation. But the 
preferred term is "WAL file". This glossary entry does nothing but 
point to the preferred term, which indicates that "WAL log" should be 
avoided in the future.


>> +    <glossdef>
>> +     <para>
>> +      A <glossterm>WAL Record</glossterm> contains either new or changed
>> +      <glossterm>Heap</glossterm> or <glossterm>Index</glossterm> data or
>> +      information about a <command>COMMIT</command>,
>> +      <command>ROLLBACK</command>, <command>SAVEPOINT</command>, or
>> +      <glossterm>Checkpointer</glossterm> operation. WAL records use a
>> +      non-printabe binary format.
> non-printable
+1
> Or just remove it.
> Or just remove the sentence.
>
>> +   <glossterm>WAL Writer</glossterm>
> process
>
>> +   <glossentry id="glossary-window-function">
>> +    <glossterm>Window Function</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      A type of <glossterm>Function</glossterm> similar to an
>> +      <glossterm>Aggregate</glossterm> in that can derive its value from a
> in that IT
>
>> +      set of <glossterm>Rows</glossterm> in a <glossterm>Result
>> +      Set</glossterm>, but still retaining the original source data.


Kind regards, Jürgen





Re: Add A Glossary

From
Justin Pryzby
Date:
On Fri, Mar 20, 2020 at 11:32:25PM +0100, Jürgen Purtz wrote:
> > > +   <glossentry id="glossary-file-segment">
> > > +    <glossterm>File Segment</glossterm>
> > > +    <glossdef>
> > > +     <para>
> > > +       If a heap or index file grows in size over 1 GB, it will be split
> > 1GB is the default "segment size", which you should define.
> 
> ???

"A <<Table>> or other >>Relation<< that is larger than a >Cluster's< segment size
is stored in multiple physical files.  This avoids file size limitations which
vary across operating systems."

https://www.postgresql.org/docs/devel/runtime-config-preset.html

ts=# SELECT name, setting, unit, category, short_desc FROM pg_settings WHERE name~'block_size|segment_size';
       name       | setting  | unit |    category    |                  short_desc                  
------------------+----------+------+----------------+----------------------------------------------
 block_size       | 8192     |      | Preset Options | Shows the size of a disk block.
 segment_size     | 131072   | 8kB  | Preset Options | Shows the number of pages per disk file.
 wal_block_size   | 8192     |      | Preset Options | Shows the block size in the write ahead log.
 wal_segment_size | 16777216 | B    | Preset Options | Shows the size of write ahead log segments.
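
As a cross-check of the numbers above: segment_size is reported in units of 8kB blocks, so the 1 GB figure falls out of block_size times segment_size. A small sketch with the values copied from the pg_settings output (the helper segment_files and the 2.5 GiB example relation are hypothetical, added only for illustration):

```python
# Values copied from the pg_settings output above.
block_size = 8192             # bytes per disk block
segment_size = 131072         # blocks per segment file (unit: 8kB)
wal_segment_size = 16777216   # bytes per WAL segment file

segment_bytes = block_size * segment_size
print(segment_bytes // 2**30, "GiB per heap/index segment file")
print(wal_segment_size // 2**20, "MiB per WAL segment file")


def segment_files(relation_bytes: int) -> int:
    """Number of physical files needed for a relation of the given size."""
    # Ceiling division; an empty relation still has one (empty) file.
    return max(1, -(-relation_bytes // segment_bytes))


# Hypothetical 2.5 GiB relation: two full segments plus a partial third.
print(segment_files(5 * 2**29), "files")
```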

> > > +   <glossentry id="glossary-heap">
> > > +    <glossterm>Heap</glossterm>
> > > +    <glossdef>
> > > +     <para>
> > > +      Contains the original values of <glossterm>Row</glossterm> attributes
> > I'm not sure what "original" means here ?
> 
> Yes, this may be misleading. I want to express, that values are stored in
> the heap (the 'original') and possibly repeated as a key in an index.

Maybe "this is the content of rows/attributes in >>Tables<< or other >>Relations<<".
or "this is the data store for ..."

> > > +   <glossentry id="glossary-host">
> > > +    <glossterm>Host</glossterm>
> > > +    <glossdef>
> > > +     <para>
> > > +      See <glossterm>Server</glossterm>.
> > Or client.  Or proxy at some layer or other intermediate thing.  Maybe just
> > remove this.
> 
> Sometimes the term "host" is used in a different meaning. Therefor we shall
> have this glossary entry for clarification that it shall be used only in the
> sense of a "server".

I think that suggests just removing "host" and consistently saying "server".

-- 
Justin



Re: Add A Glossary

From
Corey Huinker
Date:
On Fri, Mar 20, 2020 at 6:32 PM Jürgen Purtz <juergen@purtz.de> wrote:
> man pages: Sorry, if I confused someone with my poor English. I just
> want to express in my 'offline' mail that we don't have to worry about
> man page generation. The patch doesn't affect files in the /ref
> subdirectory from where man pages are created.

It wasn't your poor English - everyone else understood what you meant. I had wondered if our docs went into man page format as well, so my research was still time well spent. 

Re: Add A Glossary

From
Jürgen Purtz
Date:
On 21.03.20 00:03, Justin Pryzby wrote:
> > > > +   <glossentry id="glossary-host">
> > > > +    <glossterm>Host</glossterm>
> > > > +    <glossdef>
> > > > +     <para>
> > > > +      See <glossterm>Server</glossterm>.
> > > Or client.  Or proxy at some layer or other intermediate thing.  Maybe just
> > > remove this.
> > Sometimes the term "host" is used in a different meaning. Therefor we shall
> > have this glossary entry for clarification that it shall be used only in the
> > sense of a "server".
> I think that suggests just removing "host" and consistently saying "server".

"server", "host", "database server": All three terms are used intensively in the documentation. When we define glossary terms, we should also take into account the consequences for those parts. "database server" is the most ambiguous. E.g., in 'config.sgml' it is used in the sense of some hardware or VM ("... If you have a dedicated database server with 1GB or more of RAM ...") as well as in the sense of an instance ("... To start the database server on the command prompt ..."). Additionally, the term is completely misleading: in both cases we do not mean something related to a database but something related to a cluster.

In the past, people accepted such blurs. My - minimal - intention is to raise awareness of such ambiguities, or - better - to clearly define the situation in the glossary. But this is only a first step. The second step shall be a rework of the documentation to use the preferred terms defined in the glossary. Because there will be a time gap between the two steps, we may want to be a little chatty in the glossary and define ambiguous terms as shown in the following example:

---

Server: The term "Server" denotes ....  .

Host: An outdated term which will be replaced by <xref-to-the-glossary>Server</xref> over time.

Database Server: An outdated term which sometimes denotes a <xref-to-the-glossary>Server</xref> and sometimes an <xref-to-the-glossary>Instance</xref>.

---

This is a pattern for all terms which we currently describe with the phrase "See ...". Later, once the documentation has been reviewed and the non-preferred terms eliminated, the glossary entries with "An outdated term ..." can be dropped.

In the last few days we have seen many large and small proposals. I think it will be helpful to consolidate this work by waiting for a patch from Alvaro containing everything he deems useful from his point of view.

Kind regards, Jürgen


Re: Add A Glossary

From
Justin Pryzby
Date:
On Sat, Mar 21, 2020 at 03:08:30PM +0100, Jürgen Purtz wrote:
> On 21.03.20 00:03, Justin Pryzby wrote:
> > > > > +   <glossentry id="glossary-host">
> > > > > +    <glossterm>Host</glossterm>
> > > > > +    <glossdef>
> > > > > +     <para>
> > > > > +      See <glossterm>Server</glossterm>.
> > > > Or client.  Or proxy at some layer or other intermediate thing.  Maybe just
> > > > remove this.
> > > Sometimes the term "host" is used in a different meaning. Therefor we shall
> > > have this glossary entry for clarification that it shall be used only in the
> > > sense of a "server".
> > I think that suggests just removing "host" and consistently saying "server".
> 
> "server", "host", "database server": All three terms are used intensively in
> the documentation. When we define glossary terms, we should also take into
> account the consequences for those parts.

The documentation uses "host", but doesn't always mean "server".

$ git grep -Fw host doc/src/
doc/src/sgml/backup.sgml:   that you can perform this backup procedure from any remote host that has

pg_hba appears to be all about client "hosts".
FATAL:  no pg_hba.conf entry for host "123.123.123.123", user "andym", database "testdb"

-- 
Justin



Re: Add A Glossary

From
Peter Eisentraut
Date:
On 2020-03-20 01:11, Alvaro Herrera wrote:
> I gave this a look.  I first reformatted it so I could read it; that's
> 0001.  Second I changed all the long <link> items into <xref>s, which
> are shorter and don't have to repeat the title of the refered to page.
> (Of course, this changes the link to be in the same style as every other
> link in our documentation; some people don't like it. But it's our
> style.)

AFAICT, all the <link> elements in this patch should be changed to <xref>.

If there is something undesirable about the output style, we can change 
that, but it's not this patch's job to make up its own rules.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Add A Glossary

From
Robert Haas
Date:
On Fri, Mar 20, 2020 at 3:58 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > +      A process that writes dirty pages and <glossterm>WAL
> > +      Records</glossterm> to the file system and creates a special
>
> Does the chckpointer actually write WAL ?

Yes.

> An FK doesn't require the values in its table to be unique, right ?

I believe it does require that the values are unique.

> I think there's some confusion.  Constraints are not objects, right ?

I think constraints are definitely objects. They have names and you
can, for example, COMMENT on them.

> Do we use shared_buffers for WAL ?

No.

(I have not reviewed the patch; these are just a few comments on your comments.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Add A Glossary

From
Corey Huinker
Date:

> > > +      Records</glossterm> to the file system and creates a special
> >
> > Does the chckpointer actually write WAL ?
>
> Yes.
>
> > An FK doesn't require the values in its table to be unique, right ?
>
> I believe it does require that the values are unique.
>
> > I think there's some confusion.  Constraints are not objects, right ?
>
> I think constraints are definitely objects. They have names and you
> can, for example, COMMENT on them.
>
> > Do we use shared_buffers for WAL ?
>
> No.
>
> (I have not reviewed the patch; these are just a few comments on your comments.)


I'm going to be coalescing the feedback into an updated patch very soon (tonight/tomorrow), so please keep the feedback on the text/wording coming until then.
If anyone has a first attempt at all the ACID definitions, I'd love to see those as well.

 

Re: Add A Glossary

From
Jürgen Purtz
Date:


On 24.03.20 19:46, Robert Haas wrote:
> > Do we use shared_buffers for WAL ?
> No.

What about the explanation in https://www.postgresql.org/docs/12/runtime-config-wal.html : "wal_buffers (integer): The amount of shared memory used for WAL data that has not yet been written to disk. The default setting of -1 selects a size equal to 1/32nd (about 3%) of shared_buffers, ..."? My understanding was that the parameter wal_buffers grabs some of the existing shared_buffers for its own purpose. Is this a misinterpretation? Are shared_buffers and wal_buffers two different shared memory areas?

Kind regards, Jürgen


Re: Add A Glossary

From
Robert Haas
Date:
On Tue, Mar 24, 2020 at 3:40 PM Jürgen Purtz <juergen@purtz.de> wrote:
> On 24.03.20 19:46, Robert Haas wrote:
> Do we use shared_buffers for WAL ?
>
> No.
>
> What's about the explanation in https://www.postgresql.org/docs/12/runtime-config-wal.html : "wal_buffers (integer)
> The amount of shared memory used for WAL data that has not yet been written to disk. The default setting of -1 selects
> a size equal to 1/32nd (about 3%) of shared_buffers, ... " ? My understanding was, that the parameter wal_buffers grabs
> some of the existing shared_buffers for its own purpose. Is this a misinterpretation? Are shared_buffers and wal_buffers
> two different shared memory areas?

Yes. The code adds up the shared memory requests from all of the
different subsystems and then allocates one giant chunk of shared
memory which is divided up between them. The overwhelming majority of
that memory goes into shared_buffers, but not all of it. You can use
the new pg_get_shmem_allocations() function to see how it's used. For
example, with shared_buffers=4GB:

rhaas=# select name, pg_size_pretty(size) from
pg_get_shmem_allocations() order by size desc limit 10;
         name         | pg_size_pretty
----------------------+----------------
 Buffer Blocks        | 4096 MB
 Buffer Descriptors   | 32 MB
 <anonymous>          | 32 MB
 XLOG Ctl             | 16 MB
 Buffer IO Locks      | 16 MB
 Checkpointer Data    | 12 MB
 Checkpoint BufferIds | 10 MB
 clog                 | 2067 kB
                      | 1876 kB
 subtrans             | 261 kB
(10 rows)

rhaas=# select count(*), pg_size_pretty(sum(size)) from
pg_get_shmem_allocations();
 count | pg_size_pretty
-------+----------------
    54 | 4219 MB
(1 row)

So, in this configuration, the whole shared memory segment is
4219MB, of which 4096MB is allocated to shared_buffers and the rest to
dozens of smaller allocations, with 1876 kB left over that might get
snapped up later by an extension that wants some shared memory.
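
To make the distinction concrete: by default wal_buffers is only sized relative to shared_buffers; it is still a separate allocation. The sketch below applies the 1/32 rule from the documentation quoted earlier in the thread, together with the bounds the same documentation entry gives (not less than 64kB, not more than one WAL segment, typically 16MB). The function name default_wal_buffers is made up for illustration, and this is an approximation of the documented rule, not the server's actual code:

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def default_wal_buffers(shared_buffers: int, wal_segment_size: int = 16 * MB) -> int:
    """Approximate the wal_buffers = -1 default, in bytes."""
    size = shared_buffers // 32                    # 1/32nd of shared_buffers
    return max(64 * KB, min(size, wal_segment_size))


# With shared_buffers=4GB: 4GB / 32 = 128MB, capped at one 16MB WAL segment.
wal = default_wal_buffers(4 * GB)
print(wal // MB, "MB of WAL buffers")

# Totals taken from the pg_get_shmem_allocations() output above.
total_mb, buffer_blocks_mb = 4219, 4096
other_mb = total_mb - buffer_blocks_mb
print(other_mb, "MB of shared memory outside the buffer blocks")
```

The 16MB result would be consistent with the 16 MB "XLOG Ctl" entry in the listing, although that entry may also contain other WAL control data.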

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Add A Glossary

From
Justin Pryzby
Date:
On Fri, Mar 20, 2020 at 11:32:25PM +0100, Jürgen Purtz wrote:
> > > +    <glossterm>Archiver</glossterm>
> > Can you change that to archiver process ?
> 
> I prefer the short term without the addition of 'process' - concerning
> 'Archiver' as well as the other cases. But I'm not an native English
> speaker.

I didn't like it due to lack of context.

What about "wal archiver" ?

It occurred to me when I read this.
https://www.postgresql.org/message-id/20200327.163007.128069746774242774.horikyota.ntt%40gmail.com

-- 
Justin



Re: Add A Glossary

From
Jürgen Purtz
Date:
On 27.03.20 21:12, Justin Pryzby wrote:
> On Fri, Mar 20, 2020 at 11:32:25PM +0100, Jürgen Purtz wrote:
>>>> +    <glossterm>Archiver</glossterm>
>>> Can you change that to archiver process ?
>> I prefer the short term without the addition of 'process' - concerning
>> 'Archiver' as well as the other cases. But I'm not an native English
>> speaker.
> I didn't like it due to lack of context.
>
> What about "wal archiver" ?
>
> It occured to me when I read this.
> https://www.postgresql.org/message-id/20200327.163007.128069746774242774.horikyota.ntt%40gmail.com
>
"WAL archiver" is ok for me. In the current documentation we have 2 
places with "WAL archiver" and 4 with "archiver"-only 
(high-availability.sgml, monitoring.sgml).

"backend process" is an exception to the other terms because the 
standalone term "backend" is sensibly used in diverse situations.

Kind regards, Jürgen





Re: Add A Glossary

From
Corey Huinker
Date:


On Sun, Mar 29, 2020 at 5:29 AM Jürgen Purtz <juergen@purtz.de> wrote:
> On 27.03.20 21:12, Justin Pryzby wrote:
> > On Fri, Mar 20, 2020 at 11:32:25PM +0100, Jürgen Purtz wrote:
> >>>> +    <glossterm>Archiver</glossterm>
> >>> Can you change that to archiver process ?
> >> I prefer the short term without the addition of 'process' - concerning
> >> 'Archiver' as well as the other cases. But I'm not an native English
> >> speaker.
> > I didn't like it due to lack of context.
> >
> > What about "wal archiver" ?
> >
> > It occured to me when I read this.
> > https://www.postgresql.org/message-id/20200327.163007.128069746774242774.horikyota.ntt%40gmail.com
> >
> "WAL archiver" is ok for me. In the current documentation we have 2
> places with "WAL archiver" and 4 with "archiver"-only
> (high-availability.sgml, monitoring.sgml).
>
> "backend process" is an exception to the other terms because the
> standalone term "backend" is sensibly used in diverse situations.
>
> Kind regards, Jürgen

I've taken Alvaro's fixes and done my best to incorporate the feedback into a new patch, which Roger (tech writer) reviewed yesterday.

The changes are too numerous to list, but the highlights are:

New definitions:
* All four ACID terms
* Vacuum (split off from Autovacuum)
* Tablespace
* WAL Archiver (replaces Archiver)

Changes to existing terms:
* Implemented most wording changes recommended by Justin
* all remaining links were either made into xrefs or edited out of existence
* de-tagged most second uses of a term within a definition

Did not do:
* Address the " Process" suffix suggested by Justin. There isn't consensus on these changes, and I'm neutral on the matter
* change the Cast definition. I think it's important to express that a cast has a FROM datatype as well as a TO
* anything host/server related as I couldn't see a consensus reached

Other thoughts:
* Trivial definitions that are just see-other-definition are ok with me, as the goal of this glossary is to aid in discovery of term meanings, so knowing that two terms are interchangeable is itself helpful

It is my hope that this revision represents the final _structural_ change to the glossary. New definitions and edits to existing definitions will, of course, go on forever. 
Attachment

Re: Add A Glossary

From
Jürgen Purtz
Date:
On 30.03.20 19:10, Corey Huinker wrote:


On Sun, Mar 29, 2020 at 5:29 AM Jürgen Purtz <juergen@purtz.de> wrote:
On 27.03.20 21:12, Justin Pryzby wrote:
> On Fri, Mar 20, 2020 at 11:32:25PM +0100, Jürgen Purtz wrote:
>>>> +    <glossterm>Archiver</glossterm>
>>> Can you change that to archiver process ?
>> I prefer the short term without the addition of 'process' - concerning
>> 'Archiver' as well as the other cases. But I'm not an native English
>> speaker.
> I didn't like it due to lack of context.
>
> What about "wal archiver" ?
>
> It occured to me when I read this.
> https://www.postgresql.org/message-id/20200327.163007.128069746774242774.horikyota.ntt%40gmail.com
>
"WAL archiver" is ok for me. In the current documentation we have 2
places with "WAL archiver" and 4 with "archiver"-only
(high-availability.sgml, monitoring.sgml).

"backend process" is an exception to the other terms because the
standalone term "backend" is sensibly used in diverse situations.

Kind regards, Jürgen

I've taken Alvarao's fixes and done my best to incorporate the feedback into a new patch, which Roger's (tech writer) reviewed yesterday.

The changes are too numerous to list, but the highlights are:

New definitions:
* All four ACID terms
* Vacuum (split off from Autovacuum)
* Tablespace
* WAL Archiver (replaces Archiver)

Changes to existing terms:
* Implemented most wording changes recommended by Justin
* all remaining links were either made into xrefs or edited out of existence
* de-tagged most second uses of of a term within a definition

Did not do
* Addressed the " Process" suffix suggested by Justin. There isn't consensus on these changes, and I'm neutral on the matter
* change the Cast definition. I think it's important to express that a cast has a FROM datatype as well as a TO
* anything host/server related as I couldn't see a consensus reached

Other thoughts:
* Trivial definitions that are just see-other-definition are ok with me, as the goal of this glossary is to aid in discovery of term meanings, so knowing that two terms are interchangeable is itself helpful

It is my hope that this revision represents the final _structural_ change to the glossary. New definitions and edits to existing definitions will, of course, go on forever.

Please find some minor suggestions in the attachment. They are based on Corey's last patch 0001-glossary-v4.patch.

Kind regards, Jürgen


Attachment

Re: Add A Glossary

From
Justin Pryzby
Date:
On Tue, Mar 31, 2020 at 04:13:00PM +0200, Jürgen Purtz wrote:
> Please find some minor suggestions in the attachment. They are based on
> Corey's last patch 0001-glossary-v4.patch.

> @@ -220,7 +220,7 @@
>        Record</glossterm>s to the file system and creates a special
>        checkpoint record. This process is initiated when predefined
>        conditions are met, such as a specified amount of time has passed, or
> -      a certain volume of records have been collected.
> +      a certain volume of records has been collected.

I think you're correct in that "volume" is singular.  But I think "collected"
is the wrong word.  I suggested "written".

>       <para>
> -      One of the <acronym>ACID</acronym> properties. This means that concurrently running 
> +      One of the <acronym>ACID</acronym> properties. This means that concurrently running

These could maybe say "required" or "essential" >ACID< properties

>       <para>
> +      In reference to a <glossterm>Table</glossterm>:
>        A <glossterm>Table</glossterm> that can be queried directly,

Maybe: "In reference to a >Relation<: A table which can be queried directly,"

>        table in the collection.
>       </para>
>       <para>
> -      When referring to an <glossterm>Analytic</glossterm>
> -      <glossterm>Function</glossterm>: a partition is a definition
> -      that identifies which neighboring
> +      In reference to a <glossterm>Analytic Function</glossterm>:
s/a/an/

> @@ -1333,7 +1334,8 @@
>      <glossdef>
>       <para>
>        The condition of having no duplicate values in the same
> -      <glossterm>Relation</glossterm>. Often used in the concept of
> +      <glossterm>Column</glossterm> of a <glossterm>Relation</glossterm>.
> +      Often used in the concept of

s/concept/context/, but  I said that before, so maybe it was rejected.

-- 
Justin



Re: Add A Glossary

From
Justin Pryzby
Date:
On Mon, Mar 30, 2020 at 01:10:19PM -0400, Corey Huinker wrote:
> +   <glossentry id="glossary-aggregating">
> +    <glossterm>Aggregating</glossterm>
> +    <glossdef>
> +     <para>
> +      The act of combining a collection of data (input) values into
> +      a single output value, which may not be of the same type as the
> +      input values.

I think we maybe already tried to address this ; but could we define a noun
form ?  But not "aggregate" since it's the same word as the verb form.  I think
it would maybe be best to merge with "aggregate function", below.

> +   <glossentry id="glossary-consistency">
> +    <glossterm>Consistency</glossterm>
> +    <glossdef>
> +     <para>
> +      One of the <acronym>ACID</acronym> properties. This means that the database
> +      is always in compliance with its own rules such as <glossterm>Table</glossterm>
> +      structure, <glossterm>Constraint</glossterm>s,

I don't think the definition of "compliance" is good.  The state of being
consistent means an absence of corruption more than an absence of data
integrity issues (which could be caused by corruption).

> +   <glossentry id="glossary-datum">
> +    <glossterm>Datum</glossterm>
> +    <glossdef>
> +     <para>
> +      The internal representation of a <acronym>SQL</acronym> data type.

Could you say "..used by PostgreSQL" ?

> +    <glossterm>File Segment</glossterm>
> +    <glossdef>
> +     <para>
> +       A physical file which stores data for a given
> +       <glossterm>Heap</glossterm> or <glossterm>Index</glossterm> object.
> +       <glossterm>File Segment</glossterm>s are limited in size by a
> +       configuration value and if that size is exceeded, it will be split
> +       into multiple physical files.

Say "if an object exceeds that size, then it will be stored across multiple
physical files".

> +      which handles parts of an <acronym>SQL</acronym> query to take
...
> +      A <acronym>SQL</acronym> command used to add new data into a

I mentioned before, please be consistent: "A SQL or An SQL".

> +     </para>
> +     <para>
> +      Many <glossterm>Instance</glossterm>s can run on the same server as

Say "multiple" not many.

> +   <glossentry id="glossary-join">
> +    <glossterm>Join</glossterm>
> +    <glossdef>
> +     <para>
> +      A <acronym>SQL</acronym> keyword used in <command>SELECT</command> statements for
> +      combining data from multiple <glossterm>Relation</glossterm>s.

Could you add a link to the docs ?

> +   <glossentry id="glossary-log-writer">
> +    <glossterm>Log Writer</glossterm>
> +    <glossdef>
> +     <para>
> +      If activated and parameterized, the

I still don't know what parameterized means here.

> +   <glossentry id="glossary-system-catalog">
> +    <glossterm>System Catalog</glossterm>
> +    <glossdef>
> +     <para>
> +      A collection of <glossterm>Table</glossterm>s and
> +      <glossterm>View</glossterm>s which describe the structure of all
> +      <acronym>SQL</acronym> objects of the <glossterm>Database</glossterm>

I would say "... a PostgreSQL >Database<"

> +      and the <glossterm>Global SQL Object</glossterm>s of the
> +      <glossterm>Cluster</glossterm>. The <glossterm>System
> +      Catalog</glossterm> resides in the schema
> +      <literal>pg_catalog</literal>. Main parts are mirrored as
> +      <glossterm>View</glossterm>s in the <glossterm>Schema</glossterm>
> +      <literal>information_schema</literal>.

I wouldn't say "mirror":  Some information is also exposed as >Views< in the
>information_schema< >Schema<.

> +   <glossentry id="glossary-tablespace">
> +    <glossterm>Tablespace</glossterm>
> +    <glossdef>
> +     <para>
> +      A named location on the server filesystem. All <glossterm>SQL Object</glossterm>s
> +      which require storage beyond their definition in the
> +      <glossterm>System Catalog</glossterm>
> +      must belong to a single tablespace.

Remove "single" as it sounds like we only support one.

> +    <glossterm>Transaction</glossterm>
> +    <glossdef>
> +     <para>
> +      A combination of commands that must act as a single
> +      <glossterm>Atomic</glossterm> command: they all succeed or all fail
> +      as a single unit, and their effects are not visible to other
> +      <glossterm>Session</glossterm>s until
> +      the <glossterm>Transaction</glossterm> is complete.

s/complete/committed/ ?
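
The "all succeed or all fail as a single unit" part of the quoted definition can be illustrated with a minimal sketch. It uses Python's standard-library sqlite3 module as a stand-in for a PostgreSQL session; the table `account`, the helper `transfer`, and the sample values are hypothetical and only serve the illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; not a PostgreSQL connection.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit or neither does."""
    try:
        # The connection as a context manager commits on success and
        # rolls back the whole transaction if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


ok = transfer(conn, "alice", "bob", 30)       # both updates are committed
failed = transfer(conn, "alice", "bob", 500)  # second update fails, first is undone
balances = dict(conn.execute("SELECT name, balance FROM account"))
```

In the failing call, the credit to `bob` is applied first and then undone by the rollback, so no partial effect of the transaction survives.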


> +   <glossentry id="glossary-unique">
> +    <glossterm>Unique</glossterm>
> +    <glossdef>
> +     <para>
> +      The condition of having no duplicate values in the same
> +      <glossterm>Relation</glossterm>. Often used in the concept of

s/concept/context/

> +    <glossterm>Vacuum</glossterm>
> +    <glossdef>
> +     <para>
> +      The process of removing outdated <acronym>MVCC</acronym>

Maybe say "tuples which were deleted or obsoleted by an UPDATE".
But maybe you're trying to use generic language.

-- 
Justin



Re: Add A Glossary

From
Justin Pryzby
Date:
On Sun, Oct 13, 2019 at 04:52:05PM -0400, Corey Huinker wrote:
> 1. It's obviously incomplete. There are more terms, a lot more, to add.

How did you come up with the initial list of terms ?

Here's some ideas; I'm *not* suggesting to include all of everything, but
hopefully start with a coherent, self-contained list.

grep -roh '<firstterm>[^<]*' doc/src/ |sed 's/.*/\L&/' |sort |uniq -c |sort -nr |less

Maybe also:
object identifier
operator classes
operator family
visibility map
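
For readers who do not have GNU sed available (the `\L` case conversion is a GNU extension), the grep pipeline above can be approximated in Python. This is a rough sketch, not an exact equivalent: it reports the term text without the `<firstterm>` tag, and the function names `count_firstterms` and `count_in_tree` are made up for the example.

```python
import re
from collections import Counter
from pathlib import Path

# Like grep -o, stop at the next "<" or at the end of the line.
FIRSTTERM = re.compile(r"<firstterm>([^<\n]*)")


def count_firstterms(texts):
    """Count lower-cased <firstterm> contents across an iterable of strings."""
    counts = Counter()
    for text in texts:
        for term in FIRSTTERM.findall(text):
            counts[term.lower()] += 1
    return counts


def count_in_tree(root):
    """Apply count_firstterms to every regular file below root (like grep -r)."""
    files = (p for p in Path(root).rglob("*") if p.is_file())
    return count_firstterms(p.read_text(errors="replace") for p in files)
```

Calling `count_in_tree("doc/src").most_common()` from the top of a source checkout would then list the terms by descending frequency, similar to the `sort | uniq -c | sort -nr` stage.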

-- 
Justin



Re: Add A Glossary

From
Corey Huinker
Date:
On Tue, Mar 31, 2020 at 2:09 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Sun, Oct 13, 2019 at 04:52:05PM -0400, Corey Huinker wrote:
> 1. It's obviously incomplete. There are more terms, a lot more, to add.

How did you come up with the initial list of terms ?

1. I asked some newer database people to come up with a list of terms that they used.
2. I then added some more terms that seemed obvious given that first list.
3. That combined list was long on general database concepts and theory, and short on administration concepts
4. Then Jürgen suggested that we integrate his working list of terms, very much focused on internals, so I did that.
5. Everything after that was applying suggested edits and new terms.
 
Here's some ideas; I'm *not* suggesting to include all of everything, but
hopefully start with a coherent, self-contained list.

I don't think this list will ever be complete. It will always be a work in progress. I'd prefer to get the general structure of a glossary committed in the short term, and we're free to follow up with edits that focus on the wording.
 

grep -roh '<firstterm>[^<]*' doc/src/ |sed 's/.*/\L&/' |sort |uniq -c |sort -nr |less

Maybe also:
object identifier
operator classes
operator family
visibility map

Just so I can prioritize my work, which of these things, along with your suggestions in previous emails, would you say is a barrier to considering this ready for a committer? 

Re: Add A Glossary

From
Jürgen Purtz
Date:
On 31.03.20 19:58, Justin Pryzby wrote:
> On Tue, Mar 31, 2020 at 04:13:00PM +0200, Jürgen Purtz wrote:
>> Please find some minor suggestions in the attachment. They are based on
>> Corey's last patch 0001-glossary-v4.patch.
>> @@ -220,7 +220,7 @@
>>         Record</glossterm>s to the file system and creates a special
>>         checkpoint record. This process is initiated when predefined
>>         conditions are met, such as a specified amount of time has passed, or
>> -      a certain volume of records have been collected.
>> +      a certain volume of records has been collected.
> I think you're correct in that "volume" is singular.  But I think "collected"
> is the wrong word.  I suggested "written".
>
"collected" is not optimal. I suggest "created". Please avoid "written", 
the WAL records will be written when the Checkpointer is running, not 
before. So:

  "a certain volume of <glossterm>WAL records</glossterm> has been
collected."


Everything else is ok for me.

Kind regards, Jürgen



Re: Add A Glossary

From
Jürgen Purtz
Date:
On 31.03.20 20:07, Justin Pryzby wrote:
> On Mon, Mar 30, 2020 at 01:10:19PM -0400, Corey Huinker wrote:
>> +   <glossentry id="glossary-aggregating">
>> +    <glossterm>Aggregating</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      The act of combining a collection of data (input) values into
>> +      a single output value, which may not be of the same type as the
>> +      input values.
> I think we maybe already tried to address this ; but could we define a noun
> form ?  But not "aggregate" since it's the same word as the verb form.  I think
> it would maybe be best to merge with "aggregate function", below.

Yes, combine the two. Or remove "aggregating" altogether.


> + <glossentry id="glossary-log-writer">
>> +    <glossterm>Log Writer</glossterm>
>> +    <glossdef>
>> +     <para>
>> +      If activated and parameterized, the
> I still don't know what parameterized means here.

Remove "and parameterized". The Log Writer always has (default) parameters.


Everything else is ok for me.

Kind regards, Jürgen



Re: Add A Glossary

From
Alvaro Herrera
Date:
On 2020-Apr-01, Jürgen Purtz wrote:

> 
> On 31.03.20 19:58, Justin Pryzby wrote:
> > On Tue, Mar 31, 2020 at 04:13:00PM +0200, Jürgen Purtz wrote:
> > > Please find some minor suggestions in the attachment. They are based on
> > > Corey's last patch 0001-glossary-v4.patch.
> > > @@ -220,7 +220,7 @@
> > >         Record</glossterm>s to the file system and creates a special
> > >         checkpoint record. This process is initiated when predefined
> > >         conditions are met, such as a specified amount of time has passed, or
> > > -      a certain volume of records have been collected.
> > > +      a certain volume of records has been collected.
> > I think you're correct in that "volume" is singular.  But I think "collected"
> > is the wrong word.  I suggested "written".
> > 
> "collected" is not optimal. I suggest "created". Please avoid "written", the
> WAL records will be written when the Checkpointer is running, not before.

Actually, you're mistaken; the checkpointer hardly writes any WAL
records.  In fact, it only writes *one* WAL record, which is the
checkpoint record itself.  All the other WAL records are written either
by the backends that produce them, or by the WAL writer process.  By the
time the checkpoint runs, the WAL records are expected to have been
written long before.

Anyway I changed a lot of terms again, as well as changing the way the
terms are marked up -- for two reasons:

1. I didn't like the way the WAL-related entries were structured.  I
created a new entry called "Write-Ahead Log", which explains what WAL
is; this replaces the term "WAL Log", which is redundant (since the L in
WAL stands for "log" already). I kept the id as glossary-wal, though,
because it's shorter and *shrug*.  The definition uses the terms "wal
record" and "wal file", which I also rewrote.

2. I found out that "see xyz" and "see also" have bespoke markup in
Docbook -- <glosssee> and <glossseealso>.  I changed some glossentries
to use those, removing some glossdefs and changing a couple of paras to
glossseealsos.  I also removed all "id" properties from glossentries
that are just <glosssee>, because I think it's a mistake to have
references to entries that will make the reader look up a different
term; for me as a reader that's annoying, and I don't like to annoy
people.


While at it, I again came across "analytic", which is a term we don't
use much, so I made it a glosssee for "window function"; and while at it
I realized we didn't clearly explain what a window was. So I added
"window frame" for that.  I considered adding the term "partition" which
is used in this context, but decided it wasn't necessary.

I also added "(process)" to terms that define processes.  So
now we have "checkpointer (process)" and so on.

I rewrote the definition for "atomic" once again.  Made it two
glossdefs, because I can.  If you don't like this, I can undo.

I added "recycling".

I still have to go through some other defs.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Add A Glossary

From
Justin Pryzby
Date:
On Tue, Mar 31, 2020 at 03:26:02PM -0400, Corey Huinker wrote:
> Just so I can prioritize my work, which of these things, along with your
> suggestions in previous emails, would you say is a barrier to considering
> this ready for a committer?

To answer your off-list inquiry, I'm not likely to mark it "ready" myself.
I don't know if any of these would be a "blocker" for someone else.

> > Here's some ideas; I'm *not* suggesting to include all of everything, but
> > hopefully start with a coherent, self-contained list.
> 
> > grep -roh '<firstterm>[^<]*' doc/src/ |sed 's/.*/\L&/' |sort |uniq -c
> > |sort -nr |less

I looked through that list and found these that might be good to include now or
in the future.  Probably all of these need language polishing; I'm not
requesting you to just copy them in just to say they're there.

join: concept of combining columns from two tables or other relations.  The
result of joining a table with N rows to another table with M rows might have
up to N*M rows (if every row from the first table "joins to" every row on the
second table).
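
The N*M upper bound can be checked with a small sketch. It uses Python's standard-library sqlite3 module as a stand-in for PostgreSQL, and the tables `a` and `b` and their contents are invented for the example.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; not a PostgreSQL connection.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER);
    INSERT INTO a VALUES (1), (2), (3);  -- N = 3 rows
    INSERT INTO b VALUES (1), (2);       -- M = 2 rows
    """
)

# Every row of a pairs with every row of b: the N*M worst case.
cross = conn.execute("SELECT count(*) FROM a CROSS JOIN b").fetchall()[0][0]

# With a join condition, only the matching pairs remain.
matched = conn.execute(
    "SELECT count(*) FROM a JOIN b ON a.id = b.id"
).fetchall()[0][0]
```

Here the unconditional join yields 3*2 = 6 rows, while the join on `a.id = b.id` yields only the 2 matching pairs.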

normalized: A database schema is said to be "normalized" if its redundancy has
been removed.  Typically a "normalized" schema has a larger number of tables,
which include ID columns, and queries typically involve joining together
multiple tables.

query: a request sent by a client to a server, usually to return results or to
modify data on the server;

query plan: the particular procedure by which the database server executes a
query.  A simple query involving a single table might be planned using a
sequential scan or an index scan.  For a complex query involving multiple
tables joined together, the optimizer attempts to determine the
cheapest/fastest/best way to execute the query, by joining tables in the
optimal order, and with the optimal join strategy.

planner/optimizer: ...

transaction isolation:
psql: ...

synchronous: An action is said to be "synchronous" if it does not return to its
requestor until its completion;

bind parameters: arguments to a SQL query that are sent separately from the
query text.  For example, the query text "SELECT * FROM tbl WHERE col=$1" might
be executed for some certain value of the $1 parameter.  If parameters are sent
"in-line" as a part of the query text, they need to be properly
quoted/escaped/sanitized, to avoid accidental or malicious misbehavior if the
input contains special characters like semicolons or quotes.
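
The difference between bound and in-line parameters can be sketched as follows. The example uses Python's standard-library sqlite3 module as a stand-in, so the placeholder is written `?` rather than PostgreSQL's `$1`; the table `tbl`, the column `col`, and the input string are illustrative only.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; not a PostgreSQL connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?)", [("a",), ("b",)])

# Input containing quote characters, as a malicious client might send.
user_input = "x' OR '1'='1"

# Bound parameter: the value travels separately from the query text and
# is compared as an ordinary string, so no row matches.
bound = conn.execute(
    "SELECT * FROM tbl WHERE col = ?", (user_input,)
).fetchall()

# In-line (unsafe) interpolation: the quotes in the input change the
# meaning of the query, and every row is returned.
inline = conn.execute(
    f"SELECT * FROM tbl WHERE col = '{user_input}'"
).fetchall()
```

With the bound parameter the result is empty, whereas the interpolated query effectively becomes `WHERE col = 'x' OR '1'='1'` and returns both rows.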

> > Maybe also:
> > object identifier
> > operator classes
> > operator family
> > visibility map

-- 
Justin



Re: Add A Glossary

From
Alvaro Herrera
Date:
On 2020-Apr-01, Justin Pryzby wrote:

> planner/optimizer: ...

I propose we define "planner" and make "optimizer" a <glosssee> entry.

I further propose not to define the term "normalized", at least not for
now.  That seems a very deep rabbit hole.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Add A Glossary

From
Corey Huinker
Date:
2. I found out that "see xyz" and "see also" have bespoke markup in
Docbook -- <glosssee> and <glossseealso>.  I changed some glossentries
to use those, removing some glossdefs and changing a couple of paras to
glossseealsos.  I also removed all "id" properties from glossentries
that are just <glosssee>, because I think it's a mistake to have
references to entries that will make the reader look up a different
term; for me as a reader that's annoying, and I don't like to annoy
people.

+1 These structural enhancements are great. I'm fine with removing the id from entries that are just <glosssee>, and glad that we're keeping the entries to aid discovery.
 
I rewrote the definition for "atomic" once again.  Made it two
glossdefs, because I can.  If you don't like this, I can undo.

+1 Splitting this into two definitions, one for each context, is the most sensible thing and I don't know why I didn't do that in the first place.

Re: Add A Glossary

From
Corey Huinker
Date:
I propose we define "planner" and make "optimizer" a <glosssee> entry.

I have no objection to more entries, or edits to entries, but am concerned that the process leads to someone having to manually merge several start-from-scratch patches, with no clear sense of when we'll be done. It may make sense to appoint an edit-collector.
 
I further propose not to define the term "normalized", at least not for
now.  That seems a very deep rabbit hole.

+1 I think we appointed a guy named Xeno to work on that definition. He says he's getting close...
 

Re: Add A Glossary

From
Alvaro Herrera
Date:
On 2020-Apr-01, Corey Huinker wrote:

> > I propose we define "planner" and make "optimizer" a <glosssee> entry.
> 
> I have no objection to more entries, or edits to entries, but am concerned
> that the process leads to someone having to manually merge several
> start-from-scratch patches, with no clear sense of when we'll be done. It
> may make sense to appoint an edit-collector.

I added "query planner" (please suggest edits) and "query" (using
Justin's def) and edited the defs of the ACID terms a little bit (in
particular moved the definition of atomic transaction to "atomicity"
from "atomic", and made the latter reference the former instead of the
other way around).  Also removed "Aggregating" as suggested upthread.  I
moved "master" over to "primary (server)", keeping the ref; we don't use
the former much.

There's only one "serious" mistake in the defs AFAICS which is that of
"global objects".  Only roles, tablespace, databases are global objects.
Objects that are not in a schema (extensions, etc) are not "global" in
that sense.

I think all <glossterm> used in definitions should have linkend.

I hope to get this committed today, but I'm going to sleep now so if you
want to suggest further edits, now's the time.  I think the terms
proposed by Justin are good to have -- please discuss the defs he
proposed -- only "normalized" I'd rather stay away from.

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Add A Glossary

From
Jürgen Purtz
Date:
+1 and many thanks for Alvaro's edits.


Kind regards

Jürgen Purtz





Re: Add A Glossary

From
Corey Huinker
Date:


On Thu, Apr 2, 2020 at 8:44 AM Jürgen Purtz <juergen@purtz.de> wrote:
+1 and many thanks for Alvaro's edits.


I did some of the grunt work Alvaro alluded to in v6, and the results are attached and they build, which means there are no invalid links.

Notes:
* no definition wordings were changed
* added a linkend to all remaining glossterms that do not immediately follow a glossentry
* renamed id glossary-temporary-tables to glossary-temporary-table
* temporarily re-added an id for glossary-row as we have many references to that. unsure if we should use the term Tuple in all those places or say Row while linking to glossary-tuple, or something else
* temporarily re-added an id for glossary-segment, glossary-wal-segment, glossary-analytic-function, as those were also referenced and will need similar decisions made
* added a stub entry for glossary-unique-index, unsure if it should have a definition on its own, or we split it into unique and index.
* I noticed several cases where a glossterm is used twice in a definition, but didn't de-term them
* I'm curious about how we should tag a term when using it in its own definition. same as anywhere else?


 
Attachment

Re: Add A Glossary

From
Alvaro Herrera
Date:
On 2020-Apr-02, Corey Huinker wrote:

> On Thu, Apr 2, 2020 at 8:44 AM Jürgen Purtz <juergen@purtz.de> wrote:
> 
> > +1 and many thanks for Alvaro's edits.
> >
> >
> I did some of the grunt work Alvaro alluded to in v6, and the results are
> attached and they build, which means there are no invalid links.

Thank you!  I had been working on some other changes myself, and merged
most of your changes.  I give you v8.

> * renamed id glossary-temporary-tables to glossary-temporary-table

Good.

> * temporarily re-added an id for glossary-row as we have many references to
> that. unsure if we should use the term Tuple in all those places or say Row
> while linking to glossary-tuple, or something else

I changed these to link to glossary-tuple; that entry already explains
these two other terms, so this seems acceptable.

> * temporarily re-added an id for glossary-segment, glossary-wal-segment,
> glossary-analytic-function, as those were also referenced and will need
> similar decisions made

Ditto.

> * added a stub entry for glossary-unique-index, unsure if it should have a
> definition on its own, or we split it into unique and index.

I changed Unique Index into Unique Constraint, which is supposed to be
the overarching concept.  Used that in the definition of primary key.

> * I noticed several cases where a glossterm is used twice in a definition,
> but didn't de-term them

Did that for most I found, but I expect that some remain.

> * I'm curious about how we should tag a term when using it in its own
> definition. same as anywhere else?

I think we should not tag those.

I fixed the definition of global object as mentioned previously.  Also
added "client", made "connection" have less importance compared to
"session", and removed "window frame" (made "window function" refer to
"partition" instead).  If you (or anybody) have suggestions for the
definition of "client" and "session", I'm all ears.

I'm quite liking the result of this now.  Thanks for all your efforts.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Add A Glossary

From
Justin Pryzby
Date:
On Thu, Apr 02, 2020 at 07:09:32PM -0300, Alvaro Herrera wrote:
> "partition" instead).  If you (or anybody) have suggestions for the
> definition of "client" and "session", I'm all ears.

We already have Session:
    A Connection to the Database. 

I propose: Client:
    A host (or a process on a host) which connects to a server to make
queries or other requests.

But note, "host" is still defined as "server", which I didn't like.

Maybe it should be:
    A computer which may act as a >client< or a >server<.

-- 
Justin



Re: Add A Glossary

From
Alvaro Herrera
Date:
Pushed now.  Many thanks to Corey who put the main thrust, and to Jürgen
and Roger for the great help, and to Justin for the extensive review and
Fabien for the initial discussion.

This is just a starting point.  Let's keep improving it.  And now that
we have it, we can start thinking of patching the main part of the docs
to make reference to it by using <glossterm> in key spots.  Right now
the glossary links to itself, but it makes lots of sense to have other
places point to it.

On 2020-Apr-02, Justin Pryzby wrote:

> We already have Session:
>     A Connection to the Database. 

Yes, but I didn't like that much, so I rewrote it -- I was asking for
suggestions on how to improve it further.  While I think we use those
terms (connection and session) interchangeably sometimes, they're not
exactly the same and the glossary should be more precise or at least
less vague about the distinction.

> I propose: Client:
>     A host (or a process on a host) which connects to a server to make
> queries or other requests.
> 
> But note, "host" is still defined as "server", which I didn't like.
> 
> Maybe it should be:
>     A computer which may act as a >client< or a >server<.

I changed all these terms, and a few others, added a couple more and
commented out some that I was not happy with, and pushed.

I think this still needs more work:

* We had "serializable", but none of the other isolation levels were
  defined.  If we think we should define them, let's define them all.
  But also the definition we had for serializable was not correct;
  it seemed more suited to define "repeatable read".

* I commented out the definition of "sequence", which seemed to go into
  excessive detail.  Let's have a more concise definition?

* We're missing exclusion constraints, and NOT NULL which is also a
  weird type of constraint.

Patches for these omissions, and other contributions, welcome.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Add A Glossary

From
Corey Huinker
Date:
we have it, we can start thinking of patching the main part of the docs
to make reference to it by using <glossterm> in key spots.  Right now
the glossary links to itself, but it makes lots of sense to have other
places point to it.

I have some ideas about how to patch the main docs, but will leave those to a separate thread.
 
* I commented out the definition of "sequence", which seemed to go into
  excessive detail.  Let's have a more concise definition?

That one's my fault. 
 

Patches for these omissions, and other contributions, welcome.

Thanks for all your work on this! 

Re: Add A Glossary

From
Roger Harkavy
Date:
On Fri, Apr 3, 2020 at 1:34 PM Corey Huinker <corey.huinker@gmail.com> wrote:
Thanks for all your work on this! 

And to add on to Corey's message of thanks, I also want to thank everyone for their input and assistance on this. I am very grateful for the opportunity to contribute to this project!

Re: Add A Glossary

From
Erik Rijkers
Date:
On 2020-04-03 18:45, Alvaro Herrera wrote:
> Pushed now.  Many thanks to Corey who put the main thrust, and to 
> Jürgen
> and Roger for the great help, and to Justin for the extensive review 
> and
> Fabien for the initial discussion.

A few improvements:

'its value that cannot'  should be
'its value cannot'

'A newly created Cluster'  should be
'A newly created cluster'

'term Cluster'  should be
'term cluster'

'allowed within a Table.'  should be
'allowed within a table.'

'of a SQL data type.'  should be
'of an SQL data type.'

'A SQL command'  should be
'An SQL command'

'i.e. the data'  should be
'i.e., the data'

'that a a view is'  should be
'that a view is'

'One of the tables that each contain part'  should be
'One of multiple tables that each contain part'

'a partition is a user-defined criteria'  should be
'a partition is a user-defined criterion'

'Roless are'  should be
'Roles are'

'undo all of the operations'  should be
'undo all operations'

'A special mark inside the sequence of steps'  should be
'A special mark in the sequence of steps'

'are enforced unique'  should be (?)
'are enforced to be unique'

'the term Schema is used'  should be
'the term schema is used'

'belong to exactly one Schema.'  should be
'belong to exactly one schema.'

'about the Cluster's activities'  should be
'about the cluster's activities'

'the most common form of Relation'  should be
'the most common form of relation'

'A Trigger executes'  should be
'A trigger executes'

'and other closely related garbage-collection-like processing'  should be
'and other processing'

'each of the changes are replayed'  should be
'each of the changes is replayed'

Should also be a lemma in the glossary:

ACID


'archaic' should maybe be 'obsolete'. That seems to me to be an easier 
word for non-native speakers.


Thanks,

Erik Rijkers



Re: Add A Glossary

From
Alvaro Herrera
Date:
On 2020-Apr-03, Erik Rijkers wrote:

> On 2020-04-03 18:45, Alvaro Herrera wrote:
> > Pushed now.  Many thanks to Corey who put the main thrust, and to Jürgen
> > and Roger for the great help, and to Justin for the extensive review and
> > Fabien for the initial discussion.
> 
> A few improvements:

Thanks!  That gives me the attached patch.

> Should also be a lemma in the glossary:
> 
> ACID

Agreed.  Wording suggestions welcome.

> 'archaic' should maybe be 'obsolete'. That seems to me to be an easier word
> for non-native speakers.

Bummer ;-)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Add A Glossary

From
Justin Pryzby
Date:
On Fri, Apr 03, 2020 at 05:51:43PM -0300, Alvaro Herrera wrote:
> -     The internal representation of one value of a <acronym>SQL</acronym>
> +     The internal representation of one value of an <acronym>SQL</acronym>

I'm not sure about this one.  The new glossary says "a SQL" seven times, and
doesn't say "an sql" at all.

"An SQL" does appear to be more common in the rest of the docs, but if you
change one, I think you'd change them all.

BTW it's now visible at:
https://www.postgresql.org/docs/devel/glossary.html

-- 
Justin



Re: Add A Glossary

From
Erik Rijkers
Date:
On 2020-04-03 22:51, Alvaro Herrera wrote:
> On 2020-Apr-03, Erik Rijkers wrote:
> 
>> On 2020-04-03 18:45, Alvaro Herrera wrote:
>> > Pushed now.  Many thanks to Corey who put the main thrust, and to Jürgen
>> > and Roger for the great help, and to Justin for the extensive review and
>> > Fabien for the initial discussion.
>> 
>> A few improvements:
> 
> Thanks!  That gives me the attached patch.
> 
>> Should also be a lemma in the glossary:
>> 
>> ACID
> 
> Agreed.  Wording suggestions welcome.

How about:

"
ACID

Atomicity, consistency, isolation, and durability. ACID is a set of 
properties of database transactions intended to guarantee validity even 
in the event of power failures, etc.
ACID is concerned with how the database recovers from such failures that 
might occur while processing a transaction.
"

>> 'archaic' should maybe be 'obsolete'. That seems to me to be an easier 
>> word
>> for non-native speakers.
> 
> Bummer ;-)

OK - we'll figure it out :)
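
To make the ACID wording proposed above a little more concrete, here is a minimal sketch of atomicity and consistency. It uses Python's built-in sqlite3 module rather than PostgreSQL, purely so it runs without a server; the table, the names, and the `transfer` helper are invented for illustration. It does not demonstrate durability or recovery after a power failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates take effect or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False

def balances(conn):
    """Return {name: balance} for all accounts."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))

print(transfer(conn, "alice", "bob", 150))  # False
print(balances(conn))                       # {'alice': 100, 'bob': 0}
```

The failed transfer leaves no trace even though its first UPDATE had already succeeded, which is the "all or nothing" behaviour the definition refers to.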


> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Add A Glossary

From
Laurenz Albe
Date:
On Fri, 2020-04-03 at 16:01 -0500, Justin Pryzby wrote:
> BTW it's now visible at:
> https://www.postgresql.org/docs/devel/glossary.html

Great!

Some comments:

- SQL object: There are more kinds of objects, like roles or full text dictionaries.
  Perhaps better:

    Anything that is created with a CREATE statement, for example ...
    Most objects belong to a database schema, except ...

  Or do we consider a replication slot to be an object?

- The glossary has "Primary (server)", but not "Standby (server)".
  That should be a synonym for "Replica".

- Server: is that really our definition?
  I thought that "server" is what the glossary defines as "instance", and
  the thing called "server" in the glossary should really be called "host".

  Maybe I am too Unix-centered.

  Many people I know use "instance" synonymous to "cluster".

- Role: I understand the motivation behind the definition (except that the word "instance"
  is ill chosen), but a role is more than a collection of privileges.
  How can a collection of privileges have a password or own an object?
  Perhaps, instead of the first sentence:

    A database object used for authentication, authorization and ownership.
    Both database users and user groups are "roles" in PostgreSQL.

  In the second sentence, "roles" is mis-spelled as "roless".

- Null

  I think it should say "It represents the absence of *a definite* value."
  Usually it is better to think of NULL as "unknown".

- Function

  I don't know if "transformation of data" describes it well.
  Quite a lot of functions in PostgreSQL have side effects.
  How about:

    Procedural code stored in the database that can be used in SQL statements.

Yours,
Laurenz Albe




Re: Add A Glossary

From
Fabien COELHO
Date:
> BTW it's now visible at:
> https://www.postgresql.org/docs/devel/glossary.html

Awesome! Linking between defs and to relevant sections is great.

BTW, I'm in favor of "an SQL" because I pronounce it "ess-kew-el", but I 
guess that people who say "sequel" would prefer "a SQL". Failing that, I'm 
fine with some heterogeneity, life is diverse!

ISTM that occurrences of these words elsewhere in the documentation should 
link to the glossary definitions?

As the definitions are short and to the point, maybe the HTML display 
could (also) "hover" the definitions when the mouse passes over the word, 
using the "title" attribute?

"ACID" does not appear as an entry, nor in the acronyms sections. Also no 
DCL, although DML & DDL are in acronyms.

Entries could link to relevant wikipedia pages, like the acronyms section 
does?

-- 
Fabien.



Re: Add A Glossary

From
Jürgen Purtz
Date:
> - Server: is that really our definition?
>    I thought that "server" is what the glossary defines as "instance", and
>    the thing called "server" in the glossary should really be called "host".
>
>    Maybe I am too Unix-centered.
>
>    Many people I know use "instance" synonymous to "cluster".

Currently our documentation uses 'server', 'database server', 'host', 
'instance', ...  in an indifferent way. Similar problem with 
database/cluster. Now we have the chance to come to a conclusion about 
preferred terms and their exact meaning. Definitions in the glossary 
shall be the guideline, the documentation itself can adopt these terms 
over time.

Here is my point of view. We have distinguishable things:

(1) (virtual) hardware

(2) an abstract structure of several object types, which models a 
management system for data

(3) a group of closely related processes. They implement the internal 
'business logic' or 'work flow' of (2).

(4) abstract data, which fits into (2)

(5) a physical representation of (4). Mainly and long lasting on disc, 
but - partly - mirrored in RAM.

(6) client processes, which connect to (3)


IMO for (1) the two terms 'server' and 'host' both have their 
justification, depending on the context. There are historical terms 
('server-side', 'foreign server', 'client/server architecture', 'host' 
or 'host name' for IP-specification, 'host variable') which cannot be 
changed. Therefore we shall accept both with identical definition and use 
them as synonyms. Independent from this, there are many paragraphs in 
the documentation, where they are used in a misleading sense ('server 
crash', '... started the server', 'database server'). They should be 
changed over time.

For me, (3) is an 'instance' and (5) is a 'cluster'. There is a 1:1 
relation between the two, because one 'instance' controls exactly one 
'cluster'. But the 'instance' consists of processes and memory whereas 
the 'cluster' of databases which resides (mainly) on disc.

Concerning (6) we are not interested in any hardware-question. We are 
only interested in the processes, which connect to backend processes. We 
should only define the term "Client process".

Kind regards, Jürgen





Re: Add A Glossary

From
Corey Huinker
Date:
On Sat, Apr 4, 2020 at 2:55 AM Fabien COELHO <coelho@cri.ensmp.fr> wrote:

> BTW it's now visible at:
> https://www.postgresql.org/docs/devel/glossary.html
 
Nice. I went looking for it yesterday and the docs hadn't rebuilt yet.
 
> ISTM that occurrences of these words elsewhere in the documentation should
> link to the glossary definitions?

Yes, that's a big project. I was considering writing a script to compile all the terms as search terms, paired with their glossary ids, and then invoke git grep to identify all pages that have term FOO but don't have glossary-foo. We would then go about gloss-linking those pages as appropriate, but only a few pages at a time to keep scope sane. Also, I'm unclear about the circumstances under which we should _not_ tag a term. I remember hearing that we should only tag it on the first usage, but is that per section or per page?
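
A rough sketch of the script idea described above. Everything named here is hypothetical: the helper names, the `doc/src/sgml` pathspec, and the term-to-id mapping are assumptions for illustration. The only external command used is `git grep` with its `-l`, `-i`, `-w`, `-F` and `-e` options.

```python
import subprocess

def git_grep_files(pattern, pathspec, fixed=False):
    """Return the set of tracked files under pathspec that contain pattern."""
    cmd = ["git", "grep", "-l", "-i"]
    # Search ids as fixed strings; search terms as whole words.
    cmd += ["-F"] if fixed else ["-w"]
    cmd += ["-e", pattern, "--", pathspec]
    # git grep exits with status 1 when nothing matches, so no check=True here.
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    return set(out.splitlines())

def untagged_files(terms, grep=git_grep_files, pathspec="doc/src/sgml"):
    """Map each term to files that mention it but never reference its glossary id."""
    report = {}
    for term, gloss_id in terms.items():
        missing = grep(term, pathspec) - grep(gloss_id, pathspec, fixed=True)
        if missing:
            report[term] = sorted(missing)
    return report
```

Called as `untagged_files({"tuple": "glossary-tuple"})` from the top of a checkout, this would list candidate pages for gloss-linking. The `grep` parameter exists only so the logic can be exercised without a repository; whether a given hit should actually be tagged remains a manual decision.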
 
> As the definitions are short and to the point, maybe the HTML display
> could (also) "hover" the definitions when the mouse passes over the word,
> using the "title" attribute?

I like that idea, if it doesn't conflict with accessibility standards (maybe that's just titles on images, not sure).
I suspect we would want to just carry over the first sentence or so with a ... to avoid cluttering the screen with my overblown definition of a sequence.
I suggest we pursue this idea in another thread, as we'd probably want to do it for acronyms as well.
 

> "ACID" does not appear as an entry, nor in the acronyms sections. Also no
> DCL, although DML & DDL are in acronyms.

It needs to be in the acronyms page, and in light of all the docbook wizardry that I've learned from Alvaro, those should probably get their own acronym-foo ids as well. The cutoff date for 13 fast approaches, so it might be for 14+ unless doc-only patches are treated differently.
 
> Entries could link to relevant wikipedia pages, like the acronyms section
> does?

They could. I opted not to do that because each external link invites debate about how authoritative that link is, which is easier to do with acronyms. Now that the glossary is a reality, it's easier to have those discussions.

Re: Add A Glossary

From
Fabien COELHO
Date:
Hi Corey,

>> ISTM that occurrences of these words elsewhere in the documentation should
>> link to the glossary definitions?
>
> Yes, that's a big project. I was considering writing a script to compile
> all the terms as search terms, paired with their glossary ids, and then
> invoke git grep to identify all pages that have term FOO but don't have
> glossary-foo. We would then go about gloss-linking those pages as
> appropriate, but only a few pages at a time to keep scope sane.

I'd go for scripting the thing.

Should the glossary be backpatched, to possibly ease doc backpatches?

> Also, I'm unclear about the circumstances under which we should _not_ 
> tag a term.

At least when they are explained locally.

> I remember hearing that we should only tag it on the first usage, but is 
> that per section or per page?

Page?

>> As the definitions are short and to the point, maybe the HTML display
>> could (also) "hover" the definitions when the mouse passes over the word,
>> using the "title" attribute?
>
> I like that idea, if it doesn't conflict with accessibility standards
> (maybe that's just titles on images, not sure).

The following worked fine:

   <html><head><title>Title Tag Test</title></head>
   <body>The <a href="acid.html" title="ACID stands for Atomic, Consistent, Isolated & Durable">ACID</a>
   property is great.
   </body></html>

So basically the def can be put on the glossary link, however retrieving 
the definition should be automatic.

> I suspect we would want to just carry over the first sentence or so with a
> ... to avoid cluttering the screen with my overblown definition of a
> sequence.

Dunno. The definitions are quite short, maybe they can fit whole.
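
For illustration, generating such links automatically could look roughly like the sketch below. It is not part of any patch: `first_sentence`, `glossary_link` and the 120-character limit are assumptions, and the sample definition is taken from the HTML test above. Escaping the text matters because definitions may contain `&` or quotes.

```python
import html
import re

def first_sentence(definition: str, limit: int = 120) -> str:
    """Return the first sentence, shortened with '...' if longer than limit."""
    text = " ".join(definition.split())  # collapse line breaks and runs of spaces
    sentence = re.split(r"(?<=[.!?])\s", text, maxsplit=1)[0]
    if len(sentence) > limit:
        sentence = sentence[: limit - 3].rstrip() + "..."
    return sentence

def glossary_link(term: str, href: str, definition: str) -> str:
    """Build an <a> element whose title attribute carries the definition."""
    title = html.escape(first_sentence(definition), quote=True)
    return f'<a href="{href}" title="{title}">{html.escape(term)}</a>'

print(glossary_link(
    "ACID",
    "acid.html",
    "ACID stands for Atomic, Consistent, Isolated & Durable. More text follows.",
))
# <a href="acid.html" title="ACID stands for Atomic, Consistent, Isolated &amp; Durable.">ACID</a>
```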

> I suggest we pursue this idea in another thread, as we'd probably want to
> do it for acronyms as well.

Or not. I'd test committer temperature before investing time because it 
would mean that backpatching the doc would be a little harder.

>> Entries could link to relevant wikipedia pages, like the acronyms section
>> does?
>
> They could. I opted not to do that because each external link invites
> debate about how authoritative that link is, which is easier to do with
> acronyms. Now that the glossary is a reality, it's easier to have those
> discussions.

Ok.

-- 
Fabien.



Re: Add A Glossary

From
Jürgen Purtz
Date:
a) Some rearrangements of the sequence of terms to meet alphabetical order.

b) <glossterm id="linkend-xxx">  --> <glossterm linkend="glossary-xxx">  
in two cases. Or should it be a <firstterm>?
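
Mix-ups like the one in (b) could be caught mechanically. The following is a minimal sketch, assuming the glossary fragment can be parsed as standalone XML (the real SGML sources use entities and may need preprocessing). The function name and the sample fragment are invented for illustration.

```python
import xml.etree.ElementTree as ET

def dangling_linkends(docbook_xml: str) -> list:
    """Return glossterm linkend values that match no id attribute in the document."""
    root = ET.fromstring(docbook_xml)
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    links = [el.get("linkend") for el in root.iter("glossterm") if el.get("linkend")]
    return sorted({target for target in links if target not in ids})

# Made-up fragment: glossary-vm is referenced but never defined.
sample = """
<glossary>
 <glossentry id="glossary-fork">
  <glossterm>Fork</glossterm>
  <glossdef><para>See <glossterm linkend="glossary-fsm">free space map</glossterm>
   and <glossterm linkend="glossary-vm">visibility map</glossterm>.</para></glossdef>
 </glossentry>
 <glossentry id="glossary-fsm">
  <glossterm>Free space map (fork)</glossterm>
  <glossdef><para>Placeholder definition.</para></glossdef>
 </glossentry>
</glossary>
"""
print(dangling_linkends(sample))  # ['glossary-vm']
```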


Kind regards, Jürgen



Attachment

Re: Add A Glossary

From
Alvaro Herrera
Date:
On 2020-Apr-05, Jürgen Purtz wrote:

> a) Some rearrangements of the sequence of terms to meet alphabetical order.

Thanks, will get this pushed.

> b) <glossterm id="linkend-xxx">  --> <glossterm linkend="glossary-xxx">  in
> two cases. Or should it be a <firstterm>?

Ah, yeah, those should be linkend.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Add A Glossary

From
Alvaro Herrera
Date:
On 2020-Apr-05, Fabien COELHO wrote:

> > > As the definitions are short and to the point, maybe the HTML display
> > > could (also) "hover" the definitions when the mouse passes over the word,
> > > using the "title" attribute?
> > 
> > I like that idea, if it doesn't conflict with accessibility standards
> > (maybe that's just titles on images, not sure).
> 
> The following worked fine:
> 
>   <html><head><title>Title Tag Test</title></head>
>   <body>The <a href="acid.html" title="ACID stands for Atomic, Consistent, Isolated & Durable">ACID</a>
>   property is great.
>   </body></html>

I don't see myself patching the stylesheet as would be needed to do
this.

> > I suggest we pursue this idea in another thread, as we'd probably want to
> > do it for acronyms as well.
> 
> Or not. I'd test committer temperature before investing time because it
> would mean that backpatching the doc would be a little harder.

TBH I can't get very excited about this idea.  Maybe other documentation
champions would be happier about doing that.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Add A Glossary

From
Jürgen Purtz
Date:
> On 2020-Apr-05, Jürgen Purtz wrote:
>
>> a) Some rearrangements of the sequence of terms to meet alphabetical order.
> Thanks, will get this pushed.
>
>> b) <glossterm id="linkend-xxx">  --> <glossterm linkend="glossary-xxx">  in
>> two cases. Or should it be a <firstterm>?
> Ah, yeah, those should be linkend.
>
Term 'relation': A sequence is internally a table with one row - right? 
Shall we extend the list of concrete relations by 'sequence'? Or is this 
not necessary because 'table' is already there?

Kind regards, Jürgen





Re: Add A Glossary

From
Corey Huinker
Date:

> Term 'relation': A sequence is internally a table with one row - right?
> Shall we extend the list of concrete relations by 'sequence'? Or is this
> not necessary because 'table' is already there?

I wrote one for sequence, it was a bit math-y for Alvaro's taste, so we're going to try again.

 

Re: Add A Glossary

From
Jürgen Purtz
Date:


On 11.04.20 21:47, Corey Huinker wrote:

>> Term 'relation': A sequence is internally a table with one row - right?
>> Shall we extend the list of concrete relations by 'sequence'? Or is this
>> not necessary because 'table' is already there?
>
> I wrote one for sequence, it was a bit math-y for Alvaro's taste, so we're going to try again.

 

This seems to be a misunderstanding. My question was whether we shall extend the definition of Relation to: "... Tables, views, foreign tables, materialized views, indexes, and *sequences* are all relations."

Kind regards, Jürgen

Re: Add A Glossary

From
Peter Eisentraut
Date:
Why are all the glossary terms capitalized?  Seems kind of strange.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Add A Glossary

From
Corey Huinker
Date:
On Wed, Apr 29, 2020 at 3:15 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> Why are all the glossary terms capitalized?  Seems kind of strange.


They weren't intended to be, and they don't appear to be in the page I'm looking at. Are you referring to the anchor like in https://www.postgresql.org/docs/devel/glossary.html#GLOSSARY-RELATION ? If so, that all-capping is part of the rendering, as the ids were all named in all-lower-case.
 

Re: Add A Glossary

From
Alvaro Herrera
Date:
Thanks everybody.  I have compiled together all the suggestions and the
result is in the attached patch.  Some of it is of my own devising.

* I changed "instance", and made "cluster" be mostly a synonym of that.

* I removed "global SQL object" and made "SQL object" explain it.

* Added definitions for ACID, sequence, bloat, fork, FSM, VM, data page,
  transaction ID, epoch.

* Changed "a SQL" to "an SQL" everywhere.

* Sorted alphabetically.

* Removed caps in term names.

I think I should get this pushed, and if there are further suggestions,
they're welcome.

Dim Fontaine and others suggested a number of terms that could be
included; see https://twitter.com/alvherre/status/1246192786287865856

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Add A Glossary

From
Justin Pryzby
Date:
On Thu, May 14, 2020 at 08:00:17PM -0400, Alvaro Herrera wrote:
> +   <glossterm>ACID</glossterm>
> +   <glossdef>
> +    <para>
> +     <glossterm linkend="glossary-atomicity">Atomicity</glossterm>,
> +     <glossterm linkend="glossary-consistency">consistency</glossterm>,
> +     <glossterm linkend="glossary-isolation">isolation</glossterm>, and
> +     <glossterm linkend="glossary-durability">durability</glossterm>.
> +     A set of properties of database transactions intended to guarantee validity
> +     in concurrent operation and even in event of errors, power failures, etc.

I would capitalize Consistency, Isolation, Durability, and say "These four
properties" or "This set of four properties" (although that makes this sound
more like a fun game of DBA jeopardy).

> +   <glossterm>Background writer (process)</glossterm>
>     <glossdef>
>      <para>
> -     A process that continuously writes dirty pages from
> +     A process that continuously writes dirty

I don't like "continuously"

> +     <glossterm linkend="glossary-data-page">data pages</glossterm> from
>  
> +  <glossentry id="glossary-bloat">
> +   <glossterm>Bloat</glossterm>
> +   <glossdef>
> +    <para>
> +     Space in data pages which does not contain relevant data,
> +     such as unused (free) space or outdated row versions.

"current row versions" instead of relevant ?

> +  <glossentry id="glossary-data-page">
> +   <glossterm>Data page</glossterm>
> +   <glossdef>
> +    <para>
> +     The basic structure used to store relation data.
> +     All pages are of the same size.
> +     Data pages are typically stored on disk, each in a specific file,
> +     and can be read to <glossterm linkend="glossary-shared-memory">shared buffers</glossterm>
> +     where they can be modified, becoming
> +     <firstterm>dirty</firstterm>.  They get clean by being written down

say "They become clean when written to disk"

> +     to disk.  New pages, which initially exist in memory only, are also
> +     dirty until written.

> +  <glossentry id="glossary-fork">
> +   <glossterm>Fork</glossterm>
> +   <glossdef>
> +    <para>
> +     Each of the separate segmented file sets that a relation stores its
> +     data in.  There exist a <firstterm>main fork</firstterm> and two secondary

"in which a relation's data is stored"

> +     forks: the <glossterm linkend="glossary-fsm">free space map</glossterm>
> +     <glossterm linkend="glossary-vm">visibility map</glossterm>.

missing "and" ?

> +  <glossentry id="glossary-fsm">
> +   <glossterm>Free space map (fork)</glossterm>
> +   <glossdef>
> +    <para>
> +     A storage structure that keeps metadata about each data page in a table's
> +     main storage space.

s/in/of/

just say "main fork"?

> The free space map entry for each space stores the

for each page ?

> +     amount of free space that's available for future tuples, and is structured
> +     so it is efficient to search for available space for a new tuple of a given
> +     size.

..to be efficiently searched to find free space..
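
As an aside, one generic way to make "find a page with at least N free bytes" efficient is a binary tree in which every inner node stores the maximum free space below it. The sketch below illustrates only that idea; the class and method names are invented, and it is not a description of PostgreSQL's actual on-disk layout.

```python
class FreeSpaceTree:
    """Binary tree over pages; each inner node holds the max free space below it."""

    def __init__(self, free_bytes):
        self.n = len(free_bytes)
        size = 1
        while size < self.n:
            size *= 2
        self.size = size
        # Leaves live at indexes size .. 2*size-1; unused leaves hold 0.
        self.tree = [0] * (2 * size)
        self.tree[size:size + self.n] = free_bytes
        for i in range(size - 1, 0, -1):
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])

    def find_page(self, needed):
        """Return the index of a page with at least `needed` free bytes, or None."""
        if self.n == 0 or self.tree[1] < needed:
            return None
        i = 1
        while i < self.size:
            # Descend into the left child if it can satisfy the request.
            i = 2 * i if self.tree[2 * i] >= needed else 2 * i + 1
        return i - self.size

    def set_free(self, page, free):
        """Record a page's new free space and refresh the maxima above it."""
        i = page + self.size
        self.tree[i] = free
        i //= 2
        while i >= 1:
            self.tree[i] = max(self.tree[2 * i], self.tree[2 * i + 1])
            i //= 2

fsm = FreeSpaceTree([10, 200, 50, 800, 0])
print(fsm.find_page(100))  # 1
print(fsm.find_page(500))  # 3
```

A lookup walks one root-to-leaf path, so its cost grows with the logarithm of the number of pages rather than with the number of pages.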

>       The heap is realized within
> -     <glossterm linkend="glossary-file-segment">segment files</glossterm>.
> +     <glossterm linkend="glossary-file-segment">segmented files</glossterm>
> +     in the relation's <glossterm linkend="glossary-fork">main fork</glossterm>.

Hm, the files aren't segmented.  Say "one or more file segments per relation"

> +      There also exist local objects that do not belong to schemas; some examples are
> +      <glossterm linkend="glossary-extension">extensions</glossterm>,
> +      <glossterm linkend="glossary-cast">data type casts</glossterm>, and
> +      <glossterm linkend="glossary-foreign-data-wrapper">foreign data wrappers</glossterm>.

Don't extensions have schemas ?

> +  <glossentry id="glossary-xid">
> +   <glossterm>Transaction ID</glossterm>
> +   <glossdef>
> +    <para>
> +     The numerical, unique, sequentially-assigned identifier that each
> +     transaction receives when it first causes a database modification.
> +     Frequently abbreviated <firstterm>xid</firstterm>.

abbreviated *as* xid

> +     approximately four billion write transactions IDs can be generated;
> +     to permit the system to run for longer than that would allow,

remove "would allow"
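
For reference, the "approximately four billion" figure is what a 32-bit counter gives. A small sketch of the arithmetic, and of how an epoch can extend such a counter, follows; it assumes a 32-bit ID space, ignores any reserved ID values, and `split_full_xid` is a made-up name.

```python
XID_BITS = 32             # assumption: transaction IDs are 32-bit values
XID_SPACE = 2 ** XID_BITS

def split_full_xid(full_xid: int):
    """Split an ever-increasing counter into (epoch, xid within the epoch)."""
    return divmod(full_xid, XID_SPACE)

print(XID_SPACE)                      # 4294967296
print(split_full_xid(XID_SPACE + 5))  # (1, 5)
```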

>      <para>
>       The process of removing outdated <glossterm linkend="glossary-tuple">tuple
>       versions</glossterm> from tables, and other closely related

actually tables or materialized views..

> +  <glossentry id="glossary-vm">
> +   <glossterm>Visibility map (fork)</glossterm>
> +   <glossdef>
> +    <para>
> +     A storage structure that keeps metadata about each data page
> +     in a table's main storage space.  The visibility map entry for

s/in/of/

main fork?

-- 
Justin



Re: Add A Glossary

From
Alvaro Herrera
Date:
Applied all these suggestions, and made a few additional very small
edits, and pushed -- better to ship what we have now in beta1, but
further edits are still possible.

Other possible terms to define, including those from the tweet I linked
to and a couple more:

archive
availability
backup
composite type
common table expression
data type
domain
dump
export
fault tolerance
GUC
high availability
hot standby
LSN
restore
secondary server (?)
snapshot
transactions per second

Anybody want to try their hand at a tentative definition?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Add A Glossary

From
Erik Rijkers
Date:
On 2020-05-15 19:26, Alvaro Herrera wrote:
> Applied all these suggestions, and made a few additional very small
> edits, and pushed -- better to ship what we have now in beta1, but
> further edits are still possible.

I've gone through the glossary as committed and found some more small 
things; patch attached.

Thanks,


Erik Rijkers


> Other possible terms to define, including those from the tweet I linked
> to and a couple more:
> 
> archive
> availability
> backup
> composite type
> common table expression
> data type
> domain
> dump
> export
> fault tolerance
> GUC
> high availability
> hot standby
> LSN
> restore
> secondary server (?)
> snapshot
> transactions per second
> 
> Anybody want to try their hand at a tentative definition?
> 
> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Add A Glossary

From
Alvaro Herrera
Date:
On 2020-May-16, Erik Rijkers wrote:

> On 2020-05-15 19:26, Alvaro Herrera wrote:
> > Applied all these suggestions, and made a few additional very small
> > edits, and pushed -- better to ship what we have now in beta1, but
> > further edits are still possible.
> 
> I've gone through the glossary as committed and found some more small
> things; patch attached.

All pushed!  Many thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Add A Glossary

From
Jürgen Purtz
Date:
On 15.05.20 02:00, Alvaro Herrera wrote:
> Thanks everybody.  I have compiled together all the suggestions and the
> result is in the attached patch.  Some of it is of my own devising.
>
> * I changed "instance", and made "cluster" be mostly a synonym of that.
In my understanding, "instance" and "cluster" should be different 
things, not only synonyms. "instance" can be the term for permanently 
fluctuating objects (processes and RAM) and "cluster" can denote the 
more static objects (directories and files). What do you think? If you 
agree, I would create a patch.
> * I removed "global SQL object" and made "SQL object" explain it.
+1., but see the (huge) different spellings in patch.

bloat: changed 'current row' to 'relevant row' because not only the 
youngest one is relevant (non-bloat).

data type casts: Are you sure that they are global? In pg_cast 
'relisshared' is 'false'.

--

Jürgen Purtz



Attachment

Re: Add A Glossary

From
Alvaro Herrera
Date:
On 2020-May-17, Jürgen Purtz wrote:

> On 15.05.20 02:00, Alvaro Herrera wrote:
> > Thanks everybody.  I have compiled together all the suggestions and the
> > result is in the attached patch.  Some of it is of my own devising.
> > 
> > * I changed "instance", and made "cluster" be mostly a synonym of that.
> In my understanding, "instance" and "cluster" should be different things,
> not only synonyms. "instance" can be the term for permanently fluctuating
> objects (processes and RAM) and "cluster" can denote the more static objects
> (directories and files). What do you think? If you agree, I would create a
> patch.

I don't think that's the general understanding of those terms.  For all
I know, they *are* synonyms, and there's no specific term for "the
fluctuating objects" as you call them.  The instance is either running
(in which case there are processes and RAM) or it isn't.


> > * I removed "global SQL object" and made "SQL object" explain it.
> +1., but see the (huge) different spellings in patch.

This seems a misunderstanding of what "local" means.  Any object that
exists in a database is local, regardless of whether it exists in a
schema or not.  "Extensions" is one type of object that does not belong
in a schema.  "Foreign data wrapper" is another type of object that does
not belong in a schema.  Same with data type casts.  They are *not*
global objects.

> bloat: changed 'current row' to 'relevant row' because not only the youngest
> one is relevant (non-bloat).

Hm.  TBH I'm not sure of this term at all.  I think we sometimes use the
term "bloat" to talk about the dead rows only, ignoring the free space.

> data type casts: Are you sure that they are global? In pg_cast 'relisshared'
> is 'false'.

I'm not saying they're global.  I'm saying they're outside schemas.
Maybe this definition needs more rewording, if this bit is unclear.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Add A Glossary

From
Jürgen Purtz
Date:
On 17.05.20 08:51, Alvaro Herrera wrote:
> Any object that
> exists in a database is local, regardless of whether it exists in a
> schema or not.
This implies that the term "local" is unnecessary, just call them "SQL 
object".
> "Extensions" is one type of object that does not belong
> in a schema.  "Foreign data wrapper" is another type of object that does
> not belong in a schema.  ...  They are *not*
> global objects.
postgres_fdw is a module among many others. It's only an example for 
"extensions" and has no different nature. Yes, they are not global SQL 
objects because they don't belong to the cluster.

In summary we have 3 types of objects: belonging to a schema, to a 
database, or to the cluster (global). Maybe, we can avoid the use of the 
different names 'local SQL object' and 'global SQL object' at all and 
just call them 'SQL object'. 'global SQL object' is used only once. We 
could rephrase "A set of databases and accompanying global SQL objects 
... " to "A set of databases and accompanying SQL objects, which exists 
at the cluster level, ... "

> TBH I'm not sure of this term at all.  I think we sometimes use the
> term "bloat" to talk about the dead rows only, ignoring the free space.

That's a good example for the necessity of the glossary. Currently we 
don't have a common understanding about all of our used terms. The 
glossary shall fix that and give a mandatory definition - after a 
clearing discussion.

--

Jürgen Purtz





Re: Add A Glossary

From
Jürgen Purtz
Date:
On 17.05.20 08:51, Alvaro Herrera wrote:
>> On 15.05.20 02:00, Alvaro Herrera wrote:
>>> Thanks everybody.  I have compiled together all the suggestions and the
>>> result is in the attached patch.  Some of it is of my own devising.
>>>
>>> * I changed "instance", and made "cluster" be mostly a synonym of that.
>> In my understanding, "instance" and "cluster" should be different things,
>> not only synonyms. "instance" can be the term for permanently fluctuating
>> objects (processes and RAM) and "cluster" can denote the more static objects
>> (directories and files). What do you think? If you agree, I would create a
>> patch.
> I don't think that's the general understanding of those terms.  For all
> I know, they *are* synonyms, and there's no specific term for "the
> fluctuating objects" as you call them.  The instance is either running
> (in which case there are processes and RAM) or it isn't.

We have the basic tools "initdb — create a new PostgreSQL database cluster" which affects nothing but files, and we have "pg_ctl — initialize, start, stop, or control a PostgreSQL server" which - directly - affects nothing but processes and RAM. (Here the term "server" collides with new definitions in the glossary. But that's another story.)

--

Jürgen Purtz


Re: Add A Glossary

From
Erik Rijkers
Date:
On 2020-05-17 08:51, Alvaro Herrera wrote:
> On 2020-May-17, Jürgen Purtz wrote:
> 
>> On 15.05.20 02:00, Alvaro Herrera wrote:
>> > Thanks everybody.  I have compiled together all the suggestions and the
>> >
>> > * I changed "instance", and made "cluster" be mostly a synonym of that.
>> In my understanding, "instance" and "cluster" should be different 
>> things,
> 
> I don't think that's the general understanding of those terms.  For all
> I know, they *are* synonyms, and there's no specific term for "the
> fluctuating objects" as you call them.  The instance is either running
> (in which case there are processes and RAM) or it isn't.

For what it's worth, I've also always understood 'instance' as 'a 
running database'.  I admit it might be a left-over from my oracle 
years:

   
https://docs.oracle.com/cd/E11882_01/server.112/e40540/startup.htm#CNCPT601

There, 'instance' clearly refers to a running database.  When that 
database is stopped, it ceases to be an instance.  I've always 
understood this to be the same for the PostgreSQL 'instance'.  Once 
stopped, it is no longer an instance, but it is, of course, still a 
cluster.

I know, we don't have to do the same as Oracle, but clearly it's going 
to be an ongoing source of misunderstanding if we define such a 
high-level term differently.

Erik Rijkers



Re: Add A Glossary

From
Alvaro Herrera
Date:
On 2020-May-17, Erik Rijkers wrote:

> On 2020-05-17 08:51, Alvaro Herrera wrote:

> > I don't think that's the general understanding of those terms.  For all
> > I know, they *are* synonyms, and there's no specific term for "the
> > fluctuating objects" as you call them.  The instance is either running
> > (in which case there are processes and RAM) or it isn't.
> 
> For what it's worth, I've also always understood 'instance' as 'a running
> database'.  I admit it might be a left-over from my oracle years:
> 
> https://docs.oracle.com/cd/E11882_01/server.112/e40540/startup.htm#CNCPT601
> 
> There, 'instance' clearly refers to a running database.  When that database
> is stopped, it ceases to be an instance.

I've never understood it that way, but I'm open to having my opinion on
it changed.  So let's discuss it and maybe gather opinions from others.

I think the terms under discussion are just

* cluster
* instance
* server

We don't have "host" (I just made it a synonym for server), but perhaps
we can add that too, if it's useful.  It would be good to be consistent
with historical Postgres usage, such as the initdb usage of "cluster"
etc.

Perhaps we should not only define what our use of each term is, but also
explain how each term is used outside PostgreSQL and highlight the
differences.  (This would be particularly useful for "cluster" ISTM.)

It seems difficult to get this sorted out before beta1, but there's
still time before the glossary is released.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Add A Glossary

From
Jürgen Purtz
Date:
On 17.05.20 17:28, Alvaro Herrera wrote:
> On 2020-May-17, Erik Rijkers wrote:
>
>> On 2020-05-17 08:51, Alvaro Herrera wrote:
>>> I don't think that's the general understanding of those terms.  For all
>>> I know, they *are* synonyms, and there's no specific term for "the
>>> fluctuating objects" as you call them.  The instance is either running
>>> (in which case there are processes and RAM) or it isn't.
>> For what it's worth, I've also always understood 'instance' as 'a running
>> database'.  I admit it might be a left-over from my oracle years:
>>
>> https://docs.oracle.com/cd/E11882_01/server.112/e40540/startup.htm#CNCPT601
>>
>> There, 'instance' clearly refers to a running database.  When that database
>> is stopped, it ceases to be an instance.
> I've never understood it that way, but I'm open to having my opinion on
> it changed.  So let's discuss it and maybe gather opinions from others.
>
> I think the terms under discussion are just
>
> * cluster
> * instance
> * server
>
> We don't have "host" (I just made it a synonym for server), but perhaps
> we can add that too, if it's useful.  It would be good to be consistent
> with historical Postgres usage, such as the initdb usage of "cluster"
> etc.
>
> Perhaps we should not only define what our use of each term is, but also
> explain how each term is used outside PostgreSQL and highlight the
> differences.  (This would be particularly useful for "cluster" ISTM.)

In fact, we have reached a point where we don't have a common understanding of a group of terms. I'm sure that we will meet more situations like this in the future. Such discussions, subsequent decisions, and implementations in the docs are necessary to gain a solid foundation - primarily for newcomers (which is my main motivation) as well as for more complex discussions among experts. Obviously, each of us brings his previous understanding of terms. But we should also be open to revising old terms sometimes.

Here are my two cents.

cluster/instance: PG (mainly) consists of a group of processes that commonly act on shared buffers. The processes are very closely related to each other and to the buffers. They exist altogether or not at all. They use a common initialization file and are incarnated by one command. Everything exists solely in RAM and therefore has a fluctuating nature. In summary: they build a unit and this unit needs to have a name of its own. In some pages we used to use the term *instance* - sometimes in extended forms: *database instance*, *PG instance*, *standby instance*, *standby server instance*, *server instance*, or *remote instance*.  For me, the term *instance* makes sense, the extensions *standby instance* and *remote instance* in their context too.

The next essential component is the data itself. It is organized as a group of databases plus some common management information (global, pg_wal, pg_xact, pg_tblspc, ...). The complete data must be treated as a whole because the management information concerns all databases. Its nature is different from the processes and shared buffers. Of course, its content changes, but it has a steady nature. It even survives a 'power down'. There is one command to instantiate a new incarnation of the directory structure and all files. In summary, it's something of its own and should have its own name. 'database' is not possible because it consists of databases and other things. My favorite is *cluster*; *database cluster* is also possible.

server/host: We need a term to describe the underlying hardware, or the virtual machine or container, where PG is running. I suggest using both *server* and *host*. In computer science, both are legitimate and widely used. Everybody understands *client/server architecture* or *host* in TCP/IP configuration. We cannot change such established usage. I suggest using both depending on the context, but with the same meaning: "real hardware, a container, or a virtual machine".

-- 
Jürgen Purtz

(PS: I added the docs mailing list)


Re: Add A Glossary

From
Laurenz Albe
Date:
On Mon, 2020-05-18 at 18:08 +0200, Jürgen Purtz wrote:
> cluster/instance: PG (mainly) consists of a group of processes that commonly
> act on shared buffers. The processes are very closely related to each other
> and with the buffers. They exist altogether or not at all. They use a common
> initialization file and are incarnated by one command. Everything exists
> solely in RAM and therefore has a fluctuating nature. In summary: they build
> a unit and this unit needs to have a name of itself. In some pages we used
> to use the term *instance* - sometimes in extended forms: *database instance*,
> *PG instance*, *standby instance*, *standby server instance*, *server instance*,
> or *remote instance*.  For me, the term *instance* makes sense, the extensions
> *standby instance* and *remote instance* in their context too.

FWIW, I feel somewhat like Alvaro on that point; I use those terms synonymously,
perhaps distinguishing between a "started cluster" and a "stopped cluster".
After all, "cluster" refers to "a cluster of databases", which are there, regardless
if you start the server or not.

The term "cluster" is unfortunate, because to most people it suggests a group of
machines, so the term "instance" is better, but that ship has sailed long ago.

The static part of a cluster to me is the "data directory".
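
The "started cluster" versus "stopped cluster" distinction can be made concrete with a small check against the data directory. What follows is only a minimal Python sketch, not anything proposed in this thread. It assumes the usual layout in which initdb writes a PG_VERSION file into the data directory and a running server keeps a postmaster.pid file there whose first line is the server PID. The function name cluster_state is hypothetical, and the liveness probe via os.kill(pid, 0) is meant for POSIX systems only.

```python
import os
from pathlib import Path


def cluster_state(data_dir):
    """Return 'not a cluster', 'stopped', or 'running' for a directory.

    Hypothetical helper for illustration; POSIX only.
    """
    data_dir = Path(data_dir)

    # Static part: initdb leaves a PG_VERSION file in the data directory.
    if not (data_dir / "PG_VERSION").is_file():
        return "not a cluster"

    # Dynamic part: a running server keeps postmaster.pid in the directory.
    pid_file = data_dir / "postmaster.pid"
    if not pid_file.is_file():
        return "stopped"

    try:
        # Assumption: the first line of postmaster.pid is the server PID.
        pid = int(pid_file.read_text().splitlines()[0])
    except (IndexError, ValueError):
        return "stopped"
    if pid <= 0:
        return "stopped"

    try:
        # Signal 0 sends nothing; it only checks that the process exists.
        os.kill(pid, 0)
    except ProcessLookupError:
        # Stale file left behind, for example after a crash.
        return "stopped"
    except PermissionError:
        # The process exists but belongs to another user.
        pass
    return "running"
```

The data directory exists in both the "stopped" and "running" cases; only the presence of a live process differs, which is the part some posters here would call the instance.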

> server/host: We need a term to describe the underlying hardware respectively
> the virtual machine or container, where PG is running. I suggest to use both
> *server* and *host*. In computer science, both have their eligibility and are
> widely used. Everybody understands *client/server architecture* or *host* in
> TCP/IP configuration. We cannot change such matter of course. I suggest to
> use both depending on the context, but with the same meaning: "real hardware,
> a container, or a virtual machine".

On this I have a strong opinion because of my Unix mindset.
"machine" and "host" are synonyms, and it doesn't matter to the database if they
are virtualized or not.  You can always disambiguate by adding "virtual" or "physical".

A "server" is a piece of software that responds to client requests, never a machine.
In my book, this is purely Windows jargon.  The term "client-server architecture"
that you quote emphasizes that.

Perhaps "machine" would be the preferable term, because "host" is more prone to
misunderstandings (except in a networking context).

Yours,
Laurenz Albe




Re: Add A Glossary

From
Andrew Grillet
Date:
I think there needs to be a careful analysis of the language and a formal effort to stabilise it for the future.

Consider, say, an Oracle T series machine, which is partitioned into multiple domains (virtual machines). Each
of these has multiple CPUs and can run an instance of the OS, which in turn hosts multiple virtual instances
of the same or different OSes. Some domains might do this while others do not!

A host could be a domain, one of many virtual machines, or it could be one of many hosts on that VM
but even these hosts could be virtual machines that each runs several virtual servers!

Of course, PostgreSQL can run on any tier of this regime, but the documentation at least needs to be consistent
about language.

A "machine" should probably refer to hardware, although I would accept that a domain might count as "virtual
hardware" while a host should probably refer to a single instance of OS.

Of course it is possible for a single  instance of OS to run multiple instances of PostgreSQL, and people do this. (I have
in the past).

Slightly more confusingly, it would appear possible for a single instance of an OS to have multiple IP addresses
and if there are multiple instances of PostgreSQL, they may serve different IP Addresses uniquely, or
share them. I think this case suggests that a host probably best describes an OS instance. I might be wrong.

The word "server" might be an instance of any of the above, or a waiter with a bowl of soup. It is best
reserved for situations where clarity is not required.

If you are new to all this, I am sure it is very confusing, and inconsistent language is not going to help.

Andrew



AFAICT





On Tue, 19 May 2020 at 07:17, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Mon, 2020-05-18 at 18:08 +0200, Jürgen Purtz wrote:
> cluster/instance: PG (mainly) consists of a group of processes that commonly
> act on shared buffers. The processes are very closely related to each other
> and with the buffers. They exist altogether or not at all. They use a common
> initialization file and are incarnated by one command. Everything exists
> solely in RAM and therefore has a fluctuating nature. In summary: they build
> a unit and this unit needs to have a name of itself. In some pages we used
> to use the term *instance* - sometimes in extended forms: *database instance*,
> *PG instance*, *standby instance*, *standby server instance*, *server instance*,
> or *remote instance*.  For me, the term *instance* makes sense, the extensions
> *standby instance* and *remote instance* in their context too.

FWIW, I feel somewhat like Alvaro on that point; I use those terms synonymously,
perhaps distinguishing between a "started cluster" and a "stopped cluster".
After all, "cluster" refers to "a cluster of databases", which are there, regardless
if you start the server or not.

The term "cluster" is unfortunate, because to most people it suggests a group of
machines, so the term "instance" is better, but that ship has sailed long ago.

The static part of a cluster to me is the "data directory".

> server/host: We need a term to describe the underlying hardware respectively
> the virtual machine or container, where PG is running. I suggest to use both
> *server* and *host*. In computer science, both have their eligibility and are
> widely used. Everybody understands *client/server architecture* or *host* in
> TCP/IP configuration. We cannot change such matter of course. I suggest to
> use both depending on the context, but with the same meaning: "real hardware,
> a container, or a virtual machine".

On this I have a strong opinion because of my Unix mindset.
"machine" and "host" are synonyms, and it doesn't matter to the database if they
are virtualized or not.  You can always disambiguate by adding "virtual" or "physical".

A "server" is a piece of software that responds to client requests, never a machine.
In my book, this is purely Windows jargon.  The term "client-server architecture"
that you quote emphasized that.

Perhaps "machine" would be the preferable term, because "host" is more prone to
misunderstandings (except in a networking context).

Yours,
Laurenz Albe



Re: Add A Glossary

From
Peter Eisentraut
Date:
On 2020-05-19 08:17, Laurenz Albe wrote:
> The term "cluster" is unfortunate, because to most people it suggests a group of
> machines, so the term "instance" is better, but that ship has sailed long ago.

I don't see what would stop us from renaming some things, with some care.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Add A Glossary

From
Jürgen Purtz
Date:
On 19.05.20 08:17, Laurenz Albe wrote:
> On Mon, 2020-05-18 at 18:08 +0200, Jürgen Purtz wrote:
> > cluster/instance: PG (mainly) consists of a group of processes that commonly
> > act on shared buffers. The processes are very closely related to each other
> > and with the buffers. They exist altogether or not at all. They use a common
> > initialization file and are incarnated by one command. Everything exists
> > solely in RAM and therefore has a fluctuating nature. In summary: they build
> > a unit and this unit needs to have a name of itself. In some pages we used
> > to use the term *instance* - sometimes in extended forms: *database instance*,
> > *PG instance*, *standby instance*, *standby server instance*, *server instance*,
> > or *remote instance*.  For me, the term *instance* makes sense, the extensions
> > *standby instance* and *remote instance* in their context too.
> 
> FWIW, I feel somewhat like Alvaro on that point; I use those terms synonymously,
> perhaps distinguishing between a "started cluster" and a "stopped cluster".
> After all, "cluster" refers to "a cluster of databases", which are there, regardless
> if you start the server or not.
> 
> The term "cluster" is unfortunate, because to most people it suggests a group of
> machines, so the term "instance" is better, but that ship has sailed long ago.
> 
> The static part of a cluster to me is the "data directory".

cluster/instance: The different nature (static/dynamic) of what I call "cluster" and "instance", as well as the existence of the two commands "initdb — create a new PostgreSQL database cluster" and "pg_ctl — initialize, start, stop, or control a PostgreSQL server", confirms my opinion that we need two different terms for them. These two terms should not be synonyms; they label distinct things. If people prefer "data directory" instead of "cluster", that is OK with me.

There are situations where we need a single term for both of them. "Instance and its data directory" or "Instance and its cluster" are too wordy. In many cases we use "database server" or "server" in this sense. Imo "Server" is too short and ambiguous. "database server", the plural form "databases server", or the new term "cluster server", which is more accurate, would be ok for me. (Similar to "server", the term "cluster" is also used in many different contexts - but only outside of the PG world; within our context "cluster" is not ambiguous.) 

> > server/host: We need a term to describe the underlying hardware respectively
> > the virtual machine or container, where PG is running. I suggest to use both
> > *server* and *host*. In computer science, both have their eligibility and are
> > widely used. Everybody understands *client/server architecture* or *host* in
> > TCP/IP configuration. We cannot change such matter of course. I suggest to
> > use both depending on the context, but with the same meaning: "real hardware,
> > a container, or a virtual machine".
> 
> On this I have a strong opinion because of my Unix mindset.
> "machine" and "host" are synonyms, and it doesn't matter to the database if they
> are virtualized or not.  You can always disambiguate by adding "virtual" or "physical".
> 
> A "server" is a piece of software that responds to client requests, never a machine.
> In my book, this is purely Windows jargon.  The term "client-server architecture"
> that you quote emphasized that.
> 
> Perhaps "machine" would be the preferable term, because "host" is more prone to
> misunderstandings (except in a networking context).

server/host: I agree that we are not interested in whether there is real hardware or a virtualization container. We are not even interested in the operating system. Our primary concern is the existence of a port of the Internet Protocol. But is the term "server" appropriate to name an IP-port? Additionally, "server" is used for other meanings: a) the previously mentioned "database server" b) a (virtual) machine: "server-side", "... the file ... loaded by the server ..." c) binaries: "... the server must be built with SSL support ..." d) whenever it seems to be appropriate: "standby server", "... the server parses query ...", "server configuration", "server process".

Because of its ambiguous usage, the definition of "server" must clarify the allowed meanings. What about:

--

server: Depending on the context, the term *server* denotes:

  • An IP-port which is offered by any OS.   ?????
  • A - possibly virtualized - machine
  • An abbreviation for the slightly longer term "database(s)/cluster server"  ??? this will support the readability, but not the clarity ???
  • More ?

--

The term "host" is used mainly for IP configuration "host name", "host address" and in the context of compiling "host language", "host variable". These are clear situations and can be defined easily.

Re: Add A Glossary

From
Laurenz Albe
Date:
On Wed, 2020-05-20 at 13:17 +0200, Jürgen Purtz wrote:
> > FWIW, I feel somewhat like Alvaro on that point; I use those terms synonymously,
> > perhaps distinguishing between a "started cluster" and a "stopped cluster".
> > After all, "cluster" refers to "a cluster of databases", which are there, regardless
> > if you start the server or not.
> > 
> > The term "cluster" is unfortunate, because to most people it suggests a group of
> > machines, so the term "instance" is better, but that ship has sailed long ago.
> > 
> > The static part of a cluster to me is the "data directory".
>   
> cluster/instance: The different nature (static/dynamic) of what I
>       call "cluster" and "instance" as well as the existence of the two
>       commands "initdb — create a new PostgreSQL database cluster" and 
>       "pg_ctl — initialize, start, stop, or control a PostgreSQL server"
>       confirms me in my opinion that we need two different terms for
>       them.

I think that the "pg_ctl" example does not apply:
It does not talk about starting the cluster, but about starting the server process,
that is "server" in the way I understand it.

> There are situations where we need a single term for both of
>       them. "Instance and its data directory" or "Instance and its
>       cluster" are too wordy. In many cases we use "database server" or
>       "server" in this sense. Imo "Server" is too short and ambiguous.
>       "database server", the plural form "databases server", or the new
>       term "cluster server", which is more accurate, would be ok for me.
>       (Similar to "server", the term "cluster" is also used in many
>       different contexts - but only outside of the PG world; within our
>       context "cluster" is not ambiguous.) 

That does not feel right to me.

"cluster server", ouch. "databases server", ouch as well.

I never felt the term "cluster" was unclear in these contexts.
Sometimes it means "data directory", sometimes it is used for "server process",
but I think few people would think one could connect to a data directory
or create a process in a directory (initdb).

I think clarity is a Good Thing, but it can be overdone.

> > > server/host: We need a term to describe the underlying hardware respectively
> > > the virtual machine or container, where PG is running. I suggest to use both
> > > *server* and *host*. In computer science, both have their eligibility and are
> > > widely used. Everybody understands *client/server architecture* or *host* in
> > > TCP/IP configuration. We cannot change such matter of course. I suggest to
> > > use both depending on the context, but with the same meaning: "real hardware,
> > > a container, or a virtual machine".
> > 
> > On this I have a strong opinion because of my Unix mindset.
> > "machine" and "host" are synonyms, and it doesn't matter to the database if they
> > are virtualized or not.  You can always disambiguate by adding "virtual" or "physical".
> > 
> > A "server" is a piece of software that responds to client requests, never a machine.
> > In my book, this is purely Windows jargon.  The term "client-server architecture"
> > that you quote emphasized that.
> > 
> > Perhaps "machine" would be the preferable term, because "host" is more prone to
> > misunderstandings (except in a networking context).
>     
> server/host: I agree that we are not interested in the question
>       whether there is real hardware or any virtualization container. We
>       are even not interested in the operating system. Our primary
>       concern is the existence of a port of the Internet Protocol. But
>       is the term "server" appropriate to name an IP-port? Additionally,
>       "server" is used for other meanings: a) the previously mentioned
>       "database server" b) a (virtual) machine: "server-side", "... the
>       file ... loaded by the server ..." c) binaries "... the server
>       must be built with SSL support ..." d) whenever it seems to be
>       appropriate: "standby server", "... the server parses query ...",
>       "server configuration", "server process".

You are most thorough :^)
   
> Because of its ambiguous usage, the definition of "server" must
>       clarify the allowed meanings. What's about:
> 
> server: Depending on the context, the term *server* denotes:
>       
> An IP-port which is offered by any OS.   ?????

A port is a server?  No way.
      
> A - possibly virtualized - machine

It might be good to disambiguate that, but I don't think that the PostgreSQL
documentation should use the word "server" to mean "machine".

> An abbreviation for the slightly longer term
>         "database(s)/cluster server"  ??? this will support the
>         readability, but not the clarity ???

"Server" is short for "database server" and is a set of processes that listen
for and handle incoming database client requests.

I think that covers all the meanings you quoted from the documentation,
except c), where it is used as shorthand for "server executable".

Yours,
Laurenz Albe




Re: Add A Glossary

From
Peter Eisentraut
Date:
On 2020-04-29 21:55, Corey Huinker wrote:
> On Wed, Apr 29, 2020 at 3:15 PM Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> 
>     Why are all the glossary terms capitalized?  Seems kind of strange.
> 
> 
> They weren't intended to be, and they don't appear to be in the page I'm 
> looking at. Are you referring to the anchor like in 
> https://www.postgresql.org/docs/devel/glossary.html#GLOSSARY-RELATION ? 
> If so, that all-capping is part of the rendering, as the ids were all 
> named in all-lower-case.

Sorry, I meant why is the first letter of each term capitalized.  That 
seems unusual.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Add A Glossary

From
Jürgen Purtz
Date:
On 17.05.20 17:28, Alvaro Herrera wrote:
> I think the terms under discussion are just
>
> * cluster
> * instance
> * server


Despite its short existence, the glossary has already gained some
importance; see
https://www.postgresql.org/message-id/b8e12875ebec9e6d3107df5fa1129e1e%40postgrespro.ru
We have to be careful with what we publish. It is not acceptable to
change definitions from release to release. Therefore, IMO, we should mark
or even omit terms for which we cannot reach consensus.

Can you agree to the following definitions? If not, we can alternatively
state for each of them: "Under discussion - currently not defined".
My proposals are inspired by chapter 2.2 Concepts: "Tables are grouped
into databases, and a collection of databases managed by a single
PostgreSQL server instance constitutes a database cluster."


- "Database" (No change to existing definition): "A named collection of 
SQL objects."


- "Database Cluster", "Cluster" (New definition and rearrangements of 
some sentences): "A collection of related databases, and their common 
static and dynamic meta-data.

This term is sometimes used to refer to an instance.

(Don't confuse the term CLUSTER with the SQL command CLUSTER.)"


- "Data Directory" (Replaced 'instance' by 'cluster'): "The base 
directory on the filesystem of a server that contains all data files and 
subdirectories associated with a cluster (with the exception of 
tablespaces). The environment variable PGDATA is commonly used to refer 
to the data directory.

A cluster's storage space comprises the data directory plus any 
additional tablespaces.

For more information, see Section 68.1."


- "Database Server", "Instance" (Major changes): "A group of backend and 
auxiliary processes that communicate using a common shared memory area. 
One postmaster process manages the instance; one instance manages 
exactly one cluster with all its databases. Many instances can run on 
the same server as long as their TCP ports do not conflict.

The instance handles all key features of a DBMS: read and write access 
to files and shared memory, assurance of the ACID properties, 
connections to client processes, privilege verification, crash recovery, 
replication, etc."


- "Server" (No change to existing definition): "A computer on which 
PostgreSQL instances run. The term server denotes real hardware, a 
container, or a virtual machine.

This term is sometimes used to refer to an instance or to a host."


- "Host" (No change to existing definition): "A computer that 
communicates with other computers over a network. This is sometimes used 
as a synonym for server. It is also used to refer to a computer where 
client processes run."
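
To make the relationships in these proposed definitions easier to check, here is a minimal Python sketch of them. It is only an illustration, not anything from the patch: the class and method names (Cluster, Instance, Server, start) are hypothetical, and the rule that a cluster has at most one running instance at a time is an assumption added for the sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """On-disk side: a data directory and the databases stored in it."""
    data_directory: str
    databases: List[str] = field(default_factory=list)


@dataclass
class Instance:
    """Running side: the processes that manage exactly one cluster."""
    cluster: Cluster
    port: int


@dataclass
class Server:
    """A machine (real hardware, container, or VM) on which instances run."""
    name: str
    instances: List[Instance] = field(default_factory=list)

    def start(self, cluster: Cluster, port: int) -> Instance:
        # Many instances may share a server as long as their TCP ports differ.
        if any(i.port == port for i in self.instances):
            raise ValueError(f"port {port} is already in use on {self.name}")
        # Assumption of this sketch: one cluster, at most one running instance.
        if any(i.cluster is cluster for i in self.instances):
            raise ValueError(f"{cluster.data_directory} is already being served")
        instance = Instance(cluster=cluster, port=port)
        self.instances.append(instance)
        return instance
```

Here Server follows the "computer" meaning proposed above; under the competing reading in this thread (a server is software, never a machine) the same class would be named Machine or Host instead.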


--

Jürgen Purtz





Re: Add A Glossary

From
Alvaro Herrera
Date:
On 2020-Jun-09, Jürgen Purtz wrote:

> Can you agree to the following definitions? If no, we can alternatively
> formulate for each of them: "Under discussion - currently not defined". My
> proposals are inspired by chapter 2.2 Concepts: "Tables are grouped into
> databases, and a collection of databases managed by a single PostgreSQL
> server instance constitutes a database cluster."

After sleeping on it a few more times, I don't oppose the idea of making
"instance" be the running state and "database cluster" the on-disk stuff
that supports the instance.  Here's a patch that does things pretty much
along the lines you suggested.

I made small adjustments to "SQL objects":

* SQL objects in schemas were said to have their names unique in the
schema, but we failed to say anything about names of objects not in
schemas and global objects.  Added that.

* Had example object types for global objects and objects not in
schemas, but no examples for objects in schemas.  Added that.


Some programs whose output we could tweak per this:
pg_ctl
> pg_ctl is a utility to initialize, start, stop, or control a PostgreSQL server.
>  -D, --pgdata=DATADIR   location of the database storage area
to:
> pg_ctl is a utility to initialize or control a PostgreSQL database cluster.
>  -D, --pgdata=DATADIR   location of the database directory

pg_basebackup:
> pg_basebackup takes a base backup of a running PostgreSQL server.
to:
> pg_basebackup takes a base backup of a PostgreSQL instance.


-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Add A Glossary

From
Justin Pryzby
Date:
On Tue, Jun 16, 2020 at 08:09:26PM -0400, Alvaro Herrera wrote:
> diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml
> index 25b03f3b37..e29b55e5ac 100644
> --- a/doc/src/sgml/glossary.sgml
> +++ b/doc/src/sgml/glossary.sgml
> @@ -395,15 +395,15 @@
>      <para>
>       The base directory on the filesystem of a
>       <glossterm linkend="glossary-server">server</glossterm> that contains all
> -     data files and subdirectories associated with an
> -     <glossterm linkend="glossary-instance">instance</glossterm> (with the
> -     exception of <glossterm linkend="glossary-tablespace">tablespaces</glossterm>).
> +     data files and subdirectories associated with a
> +     <glossterm linkend="glossary-db-cluster">database cluster</glossterm>
> +     (with the exception of
> +     <glossterm linkend="glossary-tablespace">tablespaces</glossterm>).

and (optionally) WAL

> +  <glossentry id="glossary-db-cluster">
> +   <glossterm>Database cluster</glossterm>
> +   <glossdef>
> +    <para>
> +     A collection of databases and global SQL objects,
> +     and their common static and dynamic meta-data.

metadata

> @@ -1245,12 +1255,17 @@
>       <glossterm linkend="glossary-sql-object">SQL objects</glossterm>,
>       which all reside in the same
>       <glossterm linkend="glossary-database">database</glossterm>.
> -     Each SQL object must reside in exactly one schema.
> +     Each SQL object must reside in exactly one schema
> +     (though certain types of SQL objects exist outside schemas).

(except for global objects which ..)

>      <para>
>       The names of SQL objects of the same type in the same schema are enforced
>       to be unique.
>       There is no restriction on reusing a name in multiple schemas.
> +     For local objects that exist outside schemas, their names are enforced
> +     unique across the whole database.  For global objects, their names

I would say "unique within the database"

> +     are enforced unique across the whole
> +     <glossterm linkend="glossary-db-cluster">database cluster</glossterm>.

and "unique within the whole db cluster"
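
The three uniqueness scopes in this hunk (within a schema, within a database, within the whole database cluster) can be sketched as a single lookup key. This is a rough Python illustration, not how PostgreSQL implements it: NameRegistry and register are hypothetical names, and keeping the object type in the key for all three scopes is an assumption, since the quoted text only says "of the same type" for objects in schemas.

```python
from typing import Optional, Set, Tuple

# (database, schema, object type, name); None means "not scoped by this level".
Key = Tuple[Optional[str], Optional[str], str, str]


class NameRegistry:
    """Tracks object names for one database cluster."""

    def __init__(self) -> None:
        self._taken: Set[Key] = set()

    def register(self, obj_type: str, name: str,
                 database: Optional[str] = None,
                 schema: Optional[str] = None) -> None:
        if schema is not None and database is None:
            raise ValueError("a schema always belongs to a database")
        # schema given        -> unique within that schema
        # only database given -> local object outside schemas, unique within the database
        # neither given       -> global object, unique within the whole cluster
        key = (database, schema, obj_type, name)
        if key in self._taken:
            raise ValueError(f"duplicate {obj_type} name: {name}")
        self._taken.add(key)
```

With this key, the same table name can be reused in two schemas of one database, while a second global object of the same type and name is rejected.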

>        Most local objects belong to a specific
> -      <glossterm linkend="glossary-schema">schema</glossterm> in their containing database.
> +      <glossterm linkend="glossary-schema">schema</glossterm> in their
> +      containing database, such as
> +      <glossterm linkend="glossary-relation">all types of relations</glossterm>,
> +      <glossterm linkend="glossary-function">all types of functions</glossterm>,

Maybe say: >Relations< (all types), and >Functions< (all types)

>       used as the default one for all SQL objects, called <literal>pg_default</literal>.
                                                                                                            
 
"the default" (remove "one")

-- 
Justin



Re: Add A Glossary

From
Jürgen Purtz
Date:

On 17.06.20 02:09, Alvaro Herrera wrote:
> On 2020-Jun-09, Jürgen Purtz wrote:
>
> > Can you agree to the following definitions? If no, we can alternatively
> > formulate for each of them: "Under discussion - currently not defined". My
> > proposals are inspired by chapter 2.2 Concepts: "Tables are grouped into
> > databases, and a collection of databases managed by a single PostgreSQL
> > server instance constitutes a database cluster."
>
> After sleeping on it a few more times, I don't oppose the idea of making
> "instance" be the running state and "database cluster" the on-disk stuff
> that supports the instance.  Here's a patch that does things pretty much
> along the lines you suggested.
>
> I made small adjustments to "SQL objects":
>
> * SQL objects in schemas were said to have their names unique in the
> schema, but we failed to say anything about names of objects not in
> schemas and global objects.  Added that.
>
> * Had example object types for global objects and objects not in
> schemas, but no examples for objects in schemas.  Added that.
>
>
> Some programs whose output we could tweak per this:
> pg_ctl
> > pg_ctl is a utility to initialize, start, stop, or control a PostgreSQL server.
> >  -D, --pgdata=DATADIR   location of the database storage area
> to:
> > pg_ctl is a utility to initialize or control a PostgreSQL database cluster.
> >  -D, --pgdata=DATADIR   location of the database directory
>
> pg_basebackup:
> > pg_basebackup takes a base backup of a running PostgreSQL server.
> to:
> > pg_basebackup takes a base backup of a PostgreSQL instance.

+1, with two formal changes:

-  Rearrangement of term "Data page" to meet alphabetical order.

-  Add </glossdef> in one case to meet xml-well-formedness.


One last question: The definition of "Data directory" reads "... A cluster's storage space comprises the data directory plus ..." and "cluster" links to "glossary-instance". Shouldn't it link to "glossary-db-cluster"?

--

Jürgen Purtz


Attachment

Re: Add A Glossary

From
Alvaro Herrera
Date:
On 2020-Jun-16, Justin Pryzby wrote:
> On Tue, Jun 16, 2020 at 08:09:26PM -0400, Alvaro Herrera wrote:

Thanks for the review.  I merged all your suggestions.  This one:

> >        Most local objects belong to a specific
> > +      <glossterm linkend="glossary-schema">schema</glossterm> in their
> > +      containing database, such as
> > +      <glossterm linkend="glossary-relation">all types of relations</glossterm>,
> > +      <glossterm linkend="glossary-function">all types of functions</glossterm>,
> 
> Maybe say: >Relations< (all types), and >Functions< (all types)

led me down not one but two rabbit holes; first I realized that
"functions" is an insufficient term since procedures should also be
included but weren't, so I had to add the more generic term "routine"
and then modify the definitions of all routine types to mix in well.  I
think overall the quality of these definitions is improved as a result.

I also felt the need to revise the definition of "relations", so I did
that too; this made me change the definition of resultset too.

On 2020-Jun-17, Jürgen Purtz wrote:

> +1, with two formal changes:
> 
> -  Rearrangement of term "Data page" to meet alphabetical order.

To forestall these ordering issues (look, another rabbit hole), I
grepped the file for all glossterms and sorted that under en_US rules,
then reordered the terms to match that.  Turns out there were several
other ordering mistakes.

git grep '<glossterm>'  | sed -e 's/<[^>]*>\([^<]*\)<[^>]*>/\1/' > orig
LC_COLLATE=en_US.UTF-8 sort orig > sorted

(Eliminating the tags is important, otherwise the sort uses the tags
themselves to disambiguate)
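
For anyone who prefers a check that points at the misplaced entries directly, the same idea can be sketched in Python. This is only an approximation of the commands above: defined_terms and out_of_order are hypothetical helpers, and the default key is a plain case-insensitive comparison standing in for en_US collation, which can order spaces and punctuation differently.

```python
import re

# Like the grep above, match only attribute-less <glossterm> tags, i.e. the
# terms being defined, not <glossterm linkend="..."> cross-references.
GLOSSTERM = re.compile(r"<glossterm>([^<]*)</glossterm>")


def defined_terms(sgml):
    """Return the defined terms in file order, with the tags stripped."""
    return [m.group(1).strip() for m in GLOSSTERM.finditer(sgml)]


def out_of_order(terms, key=str.casefold):
    """Return adjacent (previous, current) pairs that break the ordering."""
    return [(a, b) for a, b in zip(terms, terms[1:]) if key(a) > key(b)]
```

Passing a different key function (for example one built on locale.strxfrm, where the en_US.UTF-8 locale is installed) would bring the result closer to what sort reports.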

> One last question: The definition of "Data directory" reads "... A cluster's
> storage space comprises the data directory plus ..." and 'cluster' links to
> '"glossary-instance". Shouldn't it link to "glossary-db-cluster"?

Yes, an oversight, thanks.

I also added TPS, because I had already written it.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

Re: Add A Glossary

From
Erik Rijkers
Date:
On 2020-06-19 01:51, Alvaro Herrera wrote:
> On 2020-Jun-16, Justin Pryzby wrote:
>> On Tue, Jun 16, 2020 at 08:09:26PM -0400, Alvaro Herrera wrote:

I noticed one typo:

'aggregates functions'  should be
'aggregate functions'


And one thing that I am not sure of (but strikes me as a bit odd):
there are several cases of
'are enforced unique'. Should that not be
'are enforced to be unique'  ?


Another small mistake (2x):

'The name of such objects of the same type are'  should be
'The names of such objects of the same type are'

(this phrase occurs 2x wrong, 1x correct)


thanks,

Erik Rijkers
















Re: Add A Glossary

From
Alvaro Herrera
Date:
Thanks for these fixes!  I included all of these.

On 2020-Jun-19, Erik Rijkers wrote:

> And one thing that I am not sure of (but strikes me as a bit odd):
> there are several cases of
> 'are enforced unique'. Should that not be
> 'are enforced to be unique'  ?

I included this change too; I am not too sure of it myself.  If some
English language neatnik wants to argue one way or the other, be my
guest.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


