Thread: Add A Glossary
5. adding link anchors to each term would make them cite-able, useful in forum conversations.
4. My understanding of DocBook is not strong. The glossary vs glosslist tag issue is a bit confusing to me, and I'm not sure if the glossary tag is even appropriate for our needs.
6. I'm not quite sure how to handle terms that have different definitions in different contexts. Should that be two glossdefs following one glossterm, or two separate def/term pairs?
Please review and share your thoughts.
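On the question in note 6 (one term, several meanings): DocBook's glossentry content model accepts one or more glossdef elements after a single glossterm, so both layouts are valid markup. The fragment below is a minimal sketch of the two options. The ids and the definition wording are illustrative assumptions and are not taken from the patch.

```xml
<!-- Option 1: one entry, two definitions under the same term -->
<glossentry id="glossary-cluster">
 <glossterm>Cluster</glossterm>
 <glossdef>
  <para>
   A group of databases managed by a single server instance.
  </para>
 </glossdef>
 <glossdef>
  <para>
   To physically reorder a table according to an index, as done by
   the <command>CLUSTER</command> command.
  </para>
 </glossdef>
</glossentry>

<!-- Option 2: two separate entries, each with its own anchor -->
<glossentry id="glossary-cluster-databases">
 <glossterm>Cluster (of databases)</glossterm>
 <glossdef>
  <para>
   A group of databases managed by a single server instance.
  </para>
 </glossdef>
</glossentry>

<glossentry id="glossary-cluster-command">
 <glossterm>Cluster (command)</glossterm>
 <glossdef>
  <para>
   To physically reorder a table according to an index.
  </para>
 </glossdef>
</glossentry>
```

Option 1 keeps a single headword; Option 2 gives every meaning its own id, which matters if each meaning should be individually linkable.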
Hello Corey,

My 0.02€:

On principle, I'm fine with having a glossary, i.e. word definitions, which are expected to be rather stable in the long run.

I'm wondering whether the effort would not be made redundant by other on-line effort such as wikipedia, wiktionary, stackoverflow, standards, whatever.

When explaining something, the teacher I am usually provides some level of example. This may or may not be appropriate there.

ISTM that there should be pointers to relevant sections in the documentation; for instance, the definition provided for "Analytics" suggests pointing to windowing functions.

There is significant redundancy involved, because a lot of terms would be defined in other sections anyway.

There should be cross references, e.g. the "Column" definition talks about Attribute, Table & View, which should be linked to.

I'd consider making SQL keywords uppercase.

Developing that is a significant undertaking. Do we have the available energy?

Patch generates a warning on "git apply".

  sh> git apply ...
  ...
  terms-and-definitions.patch:159: tab in indent.
  [...]
  warning: 1 line adds whitespace errors.

The "Record" def has a nested <para> for some unclear reason.

Basically the redacted definitions look pretty clear and well written to the non-native English speaker I am.

On Sun, 13 Oct 2019, Corey Huinker wrote:

> Date: Sun, 13 Oct 2019 16:52:05 -0400
> From: Corey Huinker <corey.huinker@gmail.com>
> To: pgsql-hackers@postgresql.org
> Subject: Add A Glossary
>
> Attached is a v1 patch to add a Glossary to the appendix of our current
> documentation.
>
> I believe that our documentation needs a glossary for a few reasons:
>
> 1. It's hard to ask for help if you don't know the proper terminology of
> the problem you're having.
>
> 2. Readers who are new to databases may not understand a few of the terms
> that are used casually both in the documentation and in forums. This helps
> to make our documentation a bit more useful as a teaching tool.
>
> 3. Readers whose primary language is not English may struggle to find the
> correct search terms, and this glossary may help them grasp that a given
> term has a usage in databases that is different from common English usage.
>
> 3b. If we are not able to find the resources to translate all of the
> documentation into a given language, translating the glossary page would be
> a good first step.
>
> 4. The glossary would be web-searchable, and draw viewers to the official
> documentation.
>
> 5. adding link anchors to each term would make them cite-able, useful in
> forum conversations.
>
>
> A few notes about this patch:
>
> 1. It's obviously incomplete. There are more terms, a lot more, to add.
>
> 2. The individual definitions supplied are off-the-cuff, and should be
> thoroughly reviewed.
>
> 3. The definitions as a whole should be reviewed by an actual tech writer
> (one was initially involved but had to step back due to prior commitments),
> and the definitions should be normalized in terms of voice, tone, audience,
> etc.
>
> 4. My understanding of DocBook is not strong. The glossary vs glosslist tag
> issue is a bit confusing to me, and I'm not sure if the glossary tag is
> even appropriate for our needs.
>
> 5. I've made no effort at making each term an anchor, nor have I done any
> CSS styling at all.
>
> 6. I'm not quite sure how to handle terms that have different definitions
> in different contexts. Should that be two glossdefs following one
> glossterm, or two separate def/term pairs?
>
> Please review and share your thoughts.
>

--
Fabien Coelho - CRI, MINES ParisTech
On Sat, Nov 09, 2019 at 09:19:16AM +0100, Fabien COELHO wrote:
> On principle, I'm fine with having a glossary, i.e. word definitions, which
> are expected to be rather stable in the long run.
>
> I'm wondering whether the effort would not be made redundant by other
> on-line effort such as wikipedia, wiktionary, stackoverflow, standards,
> whatever.
>
> When explaining something, the teacher I am usually provides some level of
> example. This may or may not be appropriate there.

That's exactly a good reason for being a reviewer here. You have quite some insight here.

> I'd consider making SQL keywords uppercase.
>
> Developing that is a significant undertaking. Do we have the available
> energy?

It seems like this could be a good idea, still the patch has been waiting on its author for more than two weeks now, so I have marked it as returned with feedback.

--
Michael
> It seems like this could be a good idea, still the patch has been
> waiting on its author for more than two weeks now, so I have marked it
> as returned with feedback.

In light of feedback, I enlisted the help of an actual technical writer (Roger Harkavy, CCed) and we eventually found the time to take a second pass at this.

Attached is a revised patch.
This latest version is an attempt at merging the work of Jürgen Purtz into what I had posted earlier. There was relatively little overlap in the terms we had chosen to define.

Each glossary definition now has a reference id (good idea Jürgen), the form of which is "glossary-term". So we can link to the glossary from outside if we so choose.

I encourage everyone to read the definitions, and suggest fixes to any inaccuracies or awkward phrasings. Mostly, though, I'm seeking feedback on the structure itself, and hoping to get that committed.

On Tue, Feb 11, 2020 at 11:22 PM Corey Huinker <corey.huinker@gmail.com> wrote:

>> It seems like this could be a good idea, still the patch has been
>> waiting on its author for more than two weeks now, so I have marked it
>> as returned with feedback.
>
> In light of feedback, I enlisted the help of an actual technical writer (Roger Harkavy, CCed) and we eventually found the time to take a second pass at this.
>
> Attached is a revised patch.
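To make the "glossary-term" id convention concrete, here is a small sketch of an entry and of two ways another page could point at it. The id "glossary-savepoint" appears in the patch text quoted later in this thread; the definition wording and the surrounding paragraph are illustrative assumptions. Whether to use a plain link or a glossterm carrying a linkend attribute is a style choice the thread does not settle.

```xml
<!-- In the glossary: the id follows the "glossary-term" pattern -->
<glossentry id="glossary-savepoint">
 <glossterm>Savepoint</glossterm>
 <glossdef>
  <para>
   A named point inside a transaction to which later changes can be
   rolled back.
  </para>
 </glossdef>
</glossentry>

<!-- Elsewhere in the documentation: two ways to point at that entry -->
<para>
 Use a <link linkend="glossary-savepoint">savepoint</link> to undo part
 of a transaction, or mark the word itself as a
 <glossterm linkend="glossary-savepoint">savepoint</glossterm>.
</para>
```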
I made changes on top of 0001-add-glossary-page.patch which was supplied by C. Huinker. This affects not only terms proposed by me but also his original terms. If my changes are not obvious, please let me know and I will describe my motivation.

Please note especially lines marked with question marks.

It will be helpful for diff-ing to restrict the length of lines in the SGML files to 71 characters (as usual).

J. Purtz
> It will be helpful for diff-ing to restrict the length of lines in the
> SGML files to 71 characters (as usual).
3. I recall that the code is put through a linter as one of the final steps before release; I assumed that the SGML gets the same treatment.
* Partition - the a/b split works for me
I'm not so sure on / responses to your ???s:
* The statement that names of schema objects are unique isn't strictly true, just mostly true. Take the case of unique constraints. The constraint has a name and the unique index has the same name, to the point where adding a unique constraint using an existing index renames that index to conform to the constraint name.
* Transaction - yes, all those things could be "visible" or they could be "side effects". It may be best to leave the over-simplified definition in place, and add a "For more information see <<linkref to tutorial-transactions>>".
transaction-iso would be a better linkref in this case
> The statement that names of schema objects are unique isn't strictly true, just mostly true. Take the case of unique constraints.
Concerning CONSTRAINTS you are right. Constraints seem to be an exception:
- Their names belong to a schema, but are not necessarily unique within this context: https://www.postgresql.org/docs/current/catalog-pg-constraint.html.
- There is a UNIQUE index within the system catalog pg_constraint: "pg_constraint_conrelid_contypid_conname_index" UNIQUE, btree (conrelid, contypid, conname), which expresses that names are unique within the context of a table/constraint-type. Nevertheless, tests have shown that some stronger restrictions exist across table borders (which seem to be implemented in CREATE statements, or to be a consequence of the correlation between constraint and index that you mentioned?).
I hope that there are no more such exceptions to the global rule 'object names in a schema are unique': https://www.postgresql.org/docs/current/sql-createschema.html
These facts must be mentioned as a short note in the glossary and in more detail in the later patch about the architecture.
J. Purtz
I gave this a look. I first reformatted it so I could read it; that's 0001. Second I changed all the long <link> items into <xref>s, which are shorter and don't have to repeat the title of the referred-to page. (Of course, this changes the link to be in the same style as every other link in our documentation; some people don't like it. But it's our style.)

There are some mistakes. "Tupple" is the most glaring one -- not just the typo but also the fact that it goes to sql-revoke. A few definitions we'll want to modify. Nothing too big. In general I like this work and I think we should have it in pg13.

Please bikeshed the definition of your favorite term, and suggest what other terms to add. No pointing out of mere typos yet, please.

I think we should have the terms Consistency, Isolation, Durability.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
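For readers less familiar with DocBook, the difference between the two link styles mentioned above can be sketched as follows. The target id "sql-revoke" is the one named in this message; the surrounding sentences are illustrative assumptions, and the fragment is not copied from the patch.

```xml
<!-- Before: explicit link text that repeats the target's title -->
<para>
 See <link linkend="sql-revoke">REVOKE</link> for details.
</para>

<!-- After: the link text is generated from the target's title -->
<para>
 See <xref linkend="sql-revoke"/> for details.
</para>
```

With xref the visible text cannot drift from the title of the page it points to, at the cost of not being able to choose the wording at the point of use.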
Jürgen mentioned off-list that the man page doesn't build. I was going to look into that, but if anyone has more familiarity with that, I'm listening.
sgml/filelist.sgml:<!ENTITY acronyms SYSTEM "acronyms.sgml">
sgml/postgres.sgml: &acronyms;
sgml/release.sgml:[A-Z][A-Z_ ]+[A-Z_] <command>, <literal>, <envar>, <acronym>
sgml/stylesheet.css:acronym { font-style: inherit; }
Of course I could be missing something.
On 2020-Mar-20, Corey Huinker wrote:

> > Jürgen mentioned off-list that the man page doesn't build. I was going to
> > look into that, but if anyone has more familiarity with that, I'm listening.

> Looking at this some more, I'm not sure anything needs to be done for man
> pages.

Yeah, I don't think he was saying that we needed to do anything to produce a glossary man page; rather that the "make man" command failed. I tried it here, and indeed it failed. But on further investigation, after a "make maintainer-clean" it no longer failed. I'm not sure what to make of it, but it seems that this patch needn't concern itself with that.

I gave a read through the first few actual definitions. It's much slower work than I thought! Attached you'll find the first few edits that I propose.

Looking at the definition of "Aggregate" it seemed weird to have it stand as a verb infinitive. I looked up other glossaries, found this one
https://www.gartner.com/en/information-technology/glossary?glossaryletter=T
and realized that when they do verbs, they put the present participle (-ing) form. So I changed it to "Aggregating", and split out the "Aggregate function" into its own term.

In Atomic, there seemed to be excessive use of <glossterm> in the definitions. Style guides seem to suggest doing that only the first time you use a term in a definition. I removed some markup.

I'm not sure about some terms such as "analytic" and "backend server". I put them in XML comments for now.

The other changes should be self-explanatory.

It's hard to review work from a professional tech writer. I'm under the constant impression that I'm ruining somebody's perfect end product, making a fool of myself.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
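The two markup conventions described in this message (marking a term only at its first use inside a definition, and parking undecided entries in XML comments) might look roughly like the fragment below. The ids follow the "glossary-term" pattern discussed in the thread but are assumptions, as is the definition wording; this is not the text of the attached patch.

```xml
<glossentry id="glossary-atomic">
 <glossterm>Atomic</glossterm>
 <glossdef>
  <para>
   In reference to a <glossterm>Transaction</glossterm>: either all of
   its changes take effect or none do.  A transaction that fails
   partway leaves no partial results behind.
  </para>
 </glossdef>
</glossentry>

<!-- Parked until the term and its wording are settled:
<glossentry id="glossary-analytic">
 <glossterm>Analytic</glossterm>
 ...
</glossentry>
-->
```

Only the first mention of "Transaction" carries glossterm markup; the second is plain text.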
On Thu, Mar 19, 2020 at 09:11:22PM -0300, Alvaro Herrera wrote:
> + <glossterm>Aggregate</glossterm>
> + <glossdef>
> + <para>
> + To combine a collection of data values into a single value, whose
> + value may not be of the same type as the original values.
> + <glossterm>Aggregate</glossterm> <glossterm>Functions</glossterm>
> + combine multiple <glossterm>Rows</glossterm> that share a common set
> + of values into one <glossterm>Row</glossterm>, which means that the
> + only data visible in the values in common, and the aggregates of the

IS the values in common ?
(or, "is the shared values")

> + <glossterm>Analytic</glossterm>
> + <glossdef>
> + <para>
> + A <glossterm>Function</glossterm> whose computed value can reference
> + values found in nearby <glossterm>Rows</glossterm> of the same
> + <glossterm>Result Set</glossterm>.

> + <glossterm>Archiver</glossterm>

Can you change that to archiver process ?

> + <glossterm>Atomic</glossterm>
..
> + <para>
> + In reference to an operation: An event that cannot be completed in
> + part: it must either entirely succeed or entirely fail. A series of

Can you say: "an action which is not allowed to partially succeed and then fail, ..."

> + <glossterm>Autovacuum</glossterm>

Say autovacuum process ?

> + <glossdef>
> + <para>
> + Processes that remove outdated <acronym>MVCC</acronym>

I would say "A set of processes that remove..."

> + <glossterm>Records</glossterm> of the <glossterm>Heap</glossterm> and

I'm not sure, can you say "tuples" ?

> + <glossterm>Backend Process</glossterm>
> + <glossdef>
> + <para>
> + Processes of an <glossterm>Instance</glossterm> which act on behalf of

Say DATABASE instance

> + <glossterm>Backend Server</glossterm>
> + <glossdef>
> + <para>
> + See <glossterm>Instance</glossterm>.

same

> + <glossterm>Background Worker</glossterm>
> + <glossdef>
> + <para>
> + Individual processes within an <glossterm>Instance</glossterm>, which

same

> + run system- or user-supplied code. Typical use cases are processes
> + which handle parts of an <acronym>SQL</acronym> query to take
> + advantage of parallel execution on servers with multiple
> + <acronym>CPUs</acronym>.

I would say "A typical use case is"

> + <glossterm>Background Writer</glossterm>

Add "process" ?

> + <glossdef>
> + <para>
> + Writes continuously dirty pages from <glossterm>Shared

Say "Continuously writes"

> + Memory</glossterm> to the file system. It starts periodically, but

Hm, maybe "wakes up periodically"

> + <glossterm>Cast</glossterm>
> + <glossdef>
> + <para>
> + A conversion of a <glossterm>Datum</glossterm> from its current data
> + type to another data type.

maybe just say
A conversion of a <glossterm>Datum</glossterm> to another data type.

> + <glossterm>Catalog</glossterm>
> + <glossdef>
> + <para>
> + The <acronym>SQL</acronym> standard uses this standalone term to
> + indicate what is called a <glossterm>Database</glossterm> in
> + <productname>PostgreSQL</productname>'s terminology.

Maybe remove "standalone" ?

> + <glossterm>Checkpointer</glossterm> Process
> + A process that writes dirty pages and <glossterm>WAL
> + Records</glossterm> to the file system and creates a special

Does the checkpointer actually write WAL ?

> + checkpoint record. This process is initiated when predefined
> + conditions are met, such as a specified amount of time has passed, or
> + a certain volume of records have been collected.

collected or written?

I would say:
> + A checkpoint is usually initiated by
> + a specified amount of time having passed, or
> + a certain volume of records having been written.
> + <glossterm>Checkpoint</glossterm>
> + <glossdef>
> + <para>
> + A <link linkend="sql-checkpoint"> Checkpoint</link> is a point in time

Extra space

> + <glossentry id="glossary-connection">
> + <glossterm>Connection</glossterm>
> + <glossdef>
> + <para>
> + A <acronym>TCP/IP</acronym> or socket line for inter-process

I don't know if I've ever heard the phrase "socket line"
I guess you mean a unix socket.

> + <glossterm>Constraint</glossterm>
> + <glossdef>
> + <para>
> + A concept of restricting the values of data allowed within a
> + <glossterm>Table</glossterm>.

Just say: "A restriction on the values..."?

> + <glossterm>Data Area</glossterm>

Remove this ? I've never heard this phrase before.

> + <glossdef>
> + <para>
> + The base directory on the filesystem of a
> + <glossterm>Server</glossterm> that contains all data files and
> + subdirectories associated with a <glossterm>Cluster</glossterm> with
> + the exception of tablespaces. The environment variable

Should add an entry for "tablespace".

> + <glossterm>Datum</glossterm>
> + <glossdef>
> + <para>
> + The internal representation of a <acronym>SQL</acronym> data type.

I'm not sure if we should use "a SQL" or "an SQL", but not both.

> + <glossterm>Delete</glossterm>
> + <glossdef>
> + <para>
> + A <acronym>SQL</acronym> command whose purpose is to remove

just say "which removes"

> + <glossentry id="glossary-file-segment">
> + <glossterm>File Segment</glossterm>
> + <glossdef>
> + <para>
> + If a heap or index file grows in size over 1 GB, it will be split

1GB is the default "segment size", which you should define.

> + <glossentry id="glossary-foreign-data-wrapper">
> + <glossterm>Foreign Data Wrapper</glossterm>
> + <glossdef>
> + <para>
> + A means of representing data that is not contained in the local
> + <glossterm>Database</glossterm> as if were in local
> + <glossterm>Table</glossterm>(s).
I'd say:

+ A means of representing data as a <glossterm>Table</glossterm>(s) even though
+ it is not contained in the local <glossterm>Database</glossterm>

> + <glossentry id="glossary-foreign-key">
> + <glossterm>Foreign Key</glossterm>
> + <glossdef>
> + <para>
> + A type of <glossterm>Constraint</glossterm> defined on one or more
> + <glossterm>Column</glossterm>s in a <glossterm>Table</glossterm> which
> + requires the value in those <glossterm>Column</glossterm>s to uniquely
> + identify a <glossterm>Row</glossterm> in the specified
> + <glossterm>Table</glossterm>.

An FK doesn't require the values in its table to be unique, right ?
I'd say something like: "..which enforces that the values in those Columns are also present in an(other) table."
Reference Referential Integrity?

> + <glossterm>Function</glossterm>
> + <glossdef>
> + <para>
> + Any pre-defined transformation of data. Many
> + <glossterm>Functions</glossterm> are already defined within
> + <productname>PostgreSQL</productname> itself, but can also be
> + user-defined.

I would remove "pre-", since you mentioned that it can be user-defined.

> + <glossterm>Global SQL Object</glossterm>
> + <glossdef>
> + <para>
> + <!-- FIXME -->
> + Not all <glossterm>SQL Objects</glossterm> belong to a certain
> + <glossterm>Schema</glossterm>. Some belong to the complete
> + <glossterm>Database</glossterm>, or even to the complete
> + <glossterm>Cluster</glossterm>. These are referred to as
> + <glossterm>Global SQL Objects</glossterm>. Collations and Extensions
> + such as <glossterm>Foreign Data Wrappers</glossterm> reside at the
> + <glossterm>Database</glossterm> level; <glossterm>Database</glossterm>
> + names, <glossterm>Roles</glossterm>,
> + <glossterm>Tablespaces</glossterm>, <glossterm>Replication</glossterm>
> + origins, and subscriptions for logical
> + <glossterm>Replication</glossterm> at the
> + <glossterm>Cluster</glossterm> level.

I think "complete" is the wrong word.
I would say: "An object which is not specific to a given database, but instead shared across the entire Cluster".

> + <glossentry id="glossary-grant">
> + <glossterm>Grant</glossterm>
> + <glossdef>
> + <para>
> + A <acronym>SQL</acronym> command that is used to enable

I'd say "allow"

> + <glossentry id="glossary-heap">
> + <glossterm>Heap</glossterm>
> + <glossdef>
> + <para>
> + Contains the original values of <glossterm>Row</glossterm> attributes

I'm not sure what "original" means here ?

> + (i.e. the data). The <glossterm>Heap</glossterm> is realized within
> + <glossterm>Database</glossterm> files and mirrored in
> + <glossterm>Shared Memory</glossterm>.

I wouldn't say mirrored, and probably just remove at least the part after "and".

> + <glossentry id="glossary-host">
> + <glossterm>Host</glossterm>
> + <glossdef>
> + <para>
> + See <glossterm>Server</glossterm>.

Or client. Or proxy at some layer or other intermediate thing. Maybe just remove this.

> + <glossentry id="glossary-index">
> + <glossterm>Index</glossterm>
> + <glossdef>
> + <para>
> + A <glossterm>Relation</glossterm> that contains data derived from a
> + <glossterm>Table</glossterm> (or <glossterm>Relation</glossterm> such
> + as a <glossterm>Materialized View</glossterm>). It's internal

Its

> + structure supports very fast retrieval of and access to the original
> + data.

> + <glossterm>Instance</glossterm>
> + <glossdef>
> + <para>
...
> + <para>
> + Many <glossterm>Instances</glossterm> can run on the same server as
> + long as they use different <acronym>IP</acronym> ports and manage

I would say "as long as their TCP/IP ports or sockets don't conflict, and manage..."

> + <glossterm>Join</glossterm>
> + <glossdef>
> + <para>
> + A technique used with <command>SELECT</command> statements for
> + correlating data in one or more <glossterm>Relations</glossterm>.

I would refer to this as a SQL keyword that allows combining data from multiple relations.
> + <glossterm>Lock</glossterm>
> + <glossdef>
> + <para>
> + A mechanism for one process temporarily preventing data from being
> + manipulated by any other process.

I'd say:

+ A mechanism by which a process protects simultaneous access to a resource
+ by other processes.

(I said "protects" since shared locks don't prevent all access, and it's easier than explaining "unsafe access").

> + <glossentry id="glossary-log-file">
> + <glossterm>Log File</glossterm>
> + <glossdef>
> + <para>
> + <link linkend="logfile-maintenance">LOG files</link> contain readable
> + text lines about serious and non-serious events, e.g.: use of wrong
> + password, long-running queries, ... .

Serious and non-serious?

> + <glossterm>Log Writer</glossterm> process
> + <glossdef>
> + <para>
> + If activated and parameterized, the

I don't know what parameterized means here

> + <link linkend="runtime-config-logging">Log Writer</link> process
> + writes information about database events into the current
> + <glossterm>Log file</glossterm>. When reaching certain time- or
> + volume-dependent criterias, he <!-- FIXME "he"? --> creates a new

I think criteria is the plural..

> + <glossterm>Log Record</glossterm>

Can we remove this ?
A couple of releases ago, "pg_xlog" was renamed to pg_wal.
I'd prefer to avoid defining something called "Log Record" about WAL that's right next to text logs.

> + <glossterm>Logged</glossterm>
> + <glossdef>
> + <para>
> + A <glossterm>Table</glossterm> is considered
> + <glossterm>Logged</glossterm> if changes to it are sent to the
> + <glossterm>WAL Log</glossterm>. By default, all regular
> + <glossterm>Tables</glossterm> are <glossterm>Logged</glossterm>. A
> + <glossterm>Table</glossterm> can be speficied as unlogged either at
> + creation time or via the <command>ALTER TABLE</command> command. The
> + primary use of unlogged <glossterm>Tables</glossterm> is for storing
> + transient work data that must be shared across processes, but with a
> + final result stored in logged <glossterm>Tables</glossterm>.
> + <glossterm>Temporary Tables</glossterm> are always unlogged.
> + </para>
> + </glossdef>
> + </glossentry>

Maybe it'd be better to define "unlogged", since 1) logged is the default; and 2) it's right next to text logs.

> + <glossterm>Master</glossterm>
> + <glossdef>
> + <para>
> + When two or more <glossterm>Databases</glossterm> are linked via
> + <glossterm>Replication</glossterm>, the <glossterm>Server</glossterm>
> + that is considered the authoritative source of information is called
> + the <glossterm>Master</glossterm>.

I think it'd actually be the <<instance>> which is authoritative, in case they're running on the same <<Server>>

> + <glossentry id="glossary-materialized">
> + <glossterm>Materialized</glossterm>
> + <glossdef>
> + <para>
> + The act of storing information rather than just the means of accessing

remove "means of" ?

> + the information. This term is used in <glossterm>Materialized
> + Views</glossterm> meaning that the data derived from the
> + <glossterm>View</glossterm> is actually stored on disk separate from

separately

> + the sources of that data. When the term
> + <glossterm>Materialized</glossterm> is used in speaking about
> + mulit-step queries, it means that the data of a given step is stored

multi

> + (in memory, but that storage may spill over onto disk).
> + </para>
> + </glossdef>
> + </glossentry>
> +
> + <glossentry id="glossary-materialized-view">
> + <glossterm>Materialized View</glossterm>
> + <glossdef>
> + <para>
> + A <glossterm>Relation</glossterm> that is defined in the same way that
> + a <glossterm>View</glossterm> is, but it stores data in the same way

change "it stores" to stores

> + <glossentry id="glossary-partition">
> + <glossterm>Partition</glossterm>
> + <glossdef>
> + <para>
> + <!-- FIXME should this use the style used in "atomic"? -->
> + a) A <glossterm>Table</glossterm> that can be queried independently by
> + its own name, but can also be queried via another

just say "on its own" or "directly"

> + <glossterm>Table</glossterm>, a partitionend

partitioned
also, put it in parens, like "via another table (a partitioned table)..."

> + <glossterm>Table</glossterm>, which is a collection of

Say "set" here since you later talk about "subsets" and sets.

> + <glossentry id="glossary-primary-key">
> + <glossterm>Primary Key</glossterm>
> + <glossdef>
> + <para>
> + A special case of <glossterm>Unique Index</glossterm> defined on a
> + <glossterm>Table</glossterm> or other <glossterm>Relation</glossterm>
> + that also guarantees that all of the <glossterm>Attributes</glossterm>
> + within the <glossterm>Primary Key</glossterm> do not have
> + <glossterm>Null</glossterm> values. As the name implies, there can be
> + only one <glossterm>Primary Key</glossterm> per
> + <glossterm>Table</glossterm>, though it is possible to have multiple
> + <glossterm>Unique Indexes</glossterm> that also have no
> + <glossterm>Null</glossterm>-capable <glossterm>Attributes</glossterm>.

I would say "multiple >>unique indexes<< on >>attributes<< defined as not nullable".

> + <glossterm>Procedure</glossterm>
> + <glossdef>
> + <para>
> + A defined set of instructions for manipulating data within a
> + <glossterm>Database</glossterm>. <glossterm>Procedure</glossterm> can

"procedures" or "a procedure"

> + <glossterm>Record</glossterm>
> + <glossdef>
> + <para>
> + See <link linkend="sql-revoke">Tupple</link>.

Tupple is back. And again below.

> + A single <glossterm>Row</glossterm> of a <glossterm>Table</glossterm>
> + or other Relation.

I think it's commonly used to mean "an instance of a row" (in an MVCC sense), but maybe that's too much detail for here.

> + <glossterm>Referential Integrity</glossterm>
> + <glossdef>
> + <para>
> + The means of restricting data in one <glossterm>Relation</glossterm>

A means

> + <glossentry id="glossary-relation">
> + <glossterm>Relation</glossterm>
> + <glossdef>
> + <para>
> + The generic term for all objects in a <glossterm>Database</glossterm>

"A generic term for any object in a >>database<< that has a name and..."

> + <glossentry id="glossary-result-set">
> + <glossterm>Result Set</glossterm>
> + <glossdef>
> + <para>
> + A data structure transmitted from a <glossterm>Server</glossterm> to
> + client program upon the completion of a <acronym>SQL</acronym>
> + command, usually a <command>SELECT</command> but it can be an
> + <command>INSERT</command>, <command>UPDATE</command>, or
> + <command>DELETE</command> command if the <literal>RETURNING</literal>
> + clause is specified.

I'd remove everything in that sentence after "usually".

> + <glossterm>Revoke</glossterm>
> + <glossdef>
> + <para>
> + A command to reduce access to a named set of

s/reduce/prevent/ ?

> + <glossterm>Row</glossterm>
> + <glossdef>
> + <para>
> + See <link linkend="sql-revoke">Tupple</link>.

tuple

> + <glossentry id="glossary-savepoint">
> + <glossterm>Savepoint</glossterm>
> + <glossdef>
> + <para>
> + A special mark (such as a timestamp) inside a
> + <glossterm>Transaction</glossterm>. Data modifications after this
> + point in time may be rolled back to the time of the savepoint.

I don't think "timestamp" is a useful or accurate analogy for this.
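For the "See Tupple" entries flagged above, DocBook offers a dedicated element: a glossentry may contain a glosssee in place of a glossdef, with its otherterm attribute naming the id of the entry to point to. A minimal sketch, assuming a hypothetical "glossary-tuple" entry exists (the ids and the definition wording are illustrative, not taken from the patch):

```xml
<glossentry id="glossary-tuple">
 <glossterm>Tuple</glossterm>
 <glossdef>
  <para>
   A collection of attribute values in a fixed order, as stored in a
   table or other relation.
  </para>
 </glossdef>
</glossentry>

<!-- "Row" carries no definition of its own; it only redirects -->
<glossentry id="glossary-row">
 <glossterm>Row</glossterm>
 <glosssee otherterm="glossary-tuple"/>
</glossentry>
```

Because the cross-reference text is generated from the target entry, a misspelling such as "Tupple" or a wrong linkend such as "sql-revoke" cannot creep into a hand-written "See ..." sentence.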
> + <glossterm>Schema</glossterm> > + <glossdef> > + <para> > + A <link linkend="ddl-schemas">schema</link> is a namespace for > + <glossterm>SQL objects</glossterm>, which all reside in the same > + <glossterm>database</glossterm>. Each <glossterm>SQL > + object</glossterm> must reside in exactly one > + <glossterm>Schema</glossterm>. > + </para> > + <para> > + In general, the names of <glossterm>SQL objects</glossterm> in the > + schema are unique - even across different types of objects. The lone > + exception is the case of <glossterm>Unique</glossterm> > + <glossterm>Constraint</glossterm>s, in which case there > + <emphasis>must</emphasis> be a <glossterm>Unique Index</glossterm> > + with the same name and <glossterm>Schema</glossterm> as the > + <glossterm>Constraint</glossterm>. There is no restriction on having > + a name used in multiple <glossterm>Schema</glossterm>s. I think there's some confusion. Constraints are not objects, right ? But, constraints do have an exception (not just unique constraints, though): the constraint is only unique on its table, not in its database/schema. "pg_constraint_conrelid_contypid_conname_index" UNIQUE, btree (conrelid, contypid, conname) CLUSTER > + <glossterm>Select</glossterm> > + <glossdef> > + <para> > + The command used to query a <glossterm>Database</glossterm>. Normally, > + <command>SELECT</command>s are not expected to modify the > + <glossterm>Database</glossterm> in any way, but it is possible that > + <glossterm>Functions</glossterm> invoked within the query could have > + side-effects that do modify data. </para> I think there should be references to the sql-* pages for this and others. > + <glossentry id="glossary-serializable"> > + <glossterm>Serializable</glossterm> > + <glossdef> > + <para> > + Transactions defined as <literal>SERIALIZABLE</literal> are unable to > + see changes made within other transactions. 
In effect, for the > + initializing session the entire <glossterm>Database</glossterm> > + appears to be frozen duration such a > + <glossterm>Transaction</glossterm>. Do you mean "for the duration of the >>Transaction<<" > + <glossentry id="glossary-session"> > + <glossterm>Session</glossterm> > + <glossdef> > + <para> > + A <glossterm>Connection</glossterm> to the <glossterm>Database</glossterm>. > + </para> > + <para> > + A description of the commands that were issued in the life cycle of a > + particular <glossterm>Connection</glossterm> to the > + <glossterm>Database</glossterm>. I'm not sure what this <para> means. > + <glossterm>Sequence</glossterm> > + <glossdef> > + <para> > + <!-- sounds excessively complicated a definition --> > + An <glossterm>Database</glossterm> object which represents the A not An > + mathematical concept of a numerical integral sequence. It can be > + thought of as a <glossterm>Table</glossterm> with exactly one > + <glossterm>Row</glossterm> and one <glossterm>Column</glossterm>. The > + value stored is known as the current value. A > + <glossterm>Sequence</glossterm> has a defined direction (almost always > + increasing) and an interval step (usually 1). Whenever the > + <literal>NEXTVAL</literal> pseudo-column of a > + <glossterm>Sequence</glossterm> is accessed, the current value is moved > + in the defined direction by the defined interval step, and that value say "given interval step" > + <glossterm>Shared Memory</glossterm> > + <glossdef> > + <para> > + <acronym>RAM</acronym> which is used by the processes common to an > + <glossterm>Instance</glossterm>. It mirrors parts of > + <glossterm>Database</glossterm> files, provides an area for > + <glossterm>WAL Records</glossterm>, Do we use shared_buffers for WAL ? 
> + <glossentry id="glossary-table"> > + <glossterm>Table</glossterm> > + <glossdef> > + <para> > + A collection of <glossterm>Tuples</glossterm> (also known as > + <glossterm>Rows</glossterm> or <glossterm>Records</glossterm>) having > + a common data structure (the same number of > + <glossterm>Attributes</glossterm>s, in the same order, having the same Attributes has two esses. > + name and type per position). A <glossterm>Table</glossterm> is the I don't think you need to say here that the columns of a table all have the same type and order. > + <glossterm>Temporary Tables</glossterm> > + <glossdef> > + <para> > + <glossterm>Table</glossterm>s that exist either for the lifetime of a > + <glossterm>Session</glossterm> or a > + <glossterm>Transaction</glossterm>, as defined at creation time. The I would say "as specified at the time of its creation". > + <glossterm>Transaction</glossterm> > + <glossdef> > + <para> > + A combination of one or more commands that must act as a single Remove "one or more" > + <glossterm>Trigger</glossterm> > + <glossdef> > + <para> > + A <glossterm>Function</glossterm> which can be defined to execute > + whenever a certain operation (<command>INSERT</command>, > + <command>UPDATE</command>, or <command>DELTE</command>) is applied to > + that <glossterm>Relation</glossterm>. A <glossterm>Trigger</glossterm> s/that/a/ > + <glossentry id="glossary-unique"> > + <glossterm>Unique</glossterm> > + <glossdef> > + <para> > + The condition of having no matching values in the same s/matching/duplicate/ > + <glossterm>Relation</glossterm>. Most often used in the concept of s/concept/context/ > + <glossentry id="glossary-update"> > + <glossterm>Update</glossterm> > + <glossdef> > + <para> > + A command used to modify <glossterm>Rows</glossterm> that already or 'may already' > + <glossterm>WAL File</glossterm> ... 
> + <para> > + The sequence of <glossterm>WAL Records</glossterm> in combination with > + the sequence of <glossterm>WAL Files</glossterm> represents the Remove "in combination with the sequence of >WAL Files<" > + <glossentry id="glossary-wal-log"> > + <glossterm>WAL Log</glossterm> Can you just say WAL or "write-ahead log". > + <glossdef> > + <para> > + A <glossterm>WAL Record</glossterm> contains either new or changed > + <glossterm>Heap</glossterm> or <glossterm>Index</glossterm> data or > + information about a <command>COMMIT</command>, > + <command>ROLLBACK</command>, <command>SAVEPOINT</command>, or > + <glossterm>Checkpointer</glossterm> operation. WAL records use a > + non-printabe binary format. non-printable Or just remove it. Or just remove the sentence. > + <glossterm>WAL Writer</glossterm> process > + <glossentry id="glossary-window-function"> > + <glossterm>Window Function</glossterm> > + <glossdef> > + <para> > + A type of <glossterm>Function</glossterm> similar to an > + <glossterm>Aggregate</glossterm> in that can derive its value from a in that IT > + set of <glossterm>Rows</glossterm> in a <glossterm>Result > + Set</glossterm>, but still retaining the original source data. -- Justin
man pages: Sorry if I confused anyone with my poor English. In my 'offline' mail I only wanted to say that we don't have to worry about man page generation. The patch doesn't affect files in the /ref subdirectory, from which the man pages are created. review process: Yes, it will be time-consuming, and it may be a hard job because a) the patch has multiple authors with divergent writing styles and b) the terms touch two different fundamental areas: SQL basics and PG basics. Concerning PG basics, in the past we have used a wide range of similar terms with different meanings, as well as different terms for the same thing, both within our documentation and in secondary publications. The terms "backend server" and "instance" are one such example, and there should be a clear decision in favor of one of the two. Presumably we will see more discussion about which term is preferred (remember the discussion of master/slave vs. primary/secondary some weeks ago). ongoing: Intermediate questions for clarification are welcome. Kind regards, Jürgen
On 20.03.20 20:58, Justin Pryzby wrote: > On Thu, Mar 19, 2020 at 09:11:22PM -0300, Alvaro Herrera wrote: >> + <glossterm>Aggregate</glossterm> >> + <glossdef> >> + <para> >> + To combine a collection of data values into a single value, whose >> + value may not be of the same type as the original values. >> + <glossterm>Aggregate</glossterm> <glossterm>Functions</glossterm> >> + combine multiple <glossterm>Rows</glossterm> that share a common set >> + of values into one <glossterm>Row</glossterm>, which means that the >> + only data visible in the values in common, and the aggregates of the > IS the values in common ? > (or, "is the shared values") > >> + <glossterm>Analytic</glossterm> >> + <glossdef> >> + <para> >> + A <glossterm>Function</glossterm> whose computed value can reference >> + values found in nearby <glossterm>Rows</glossterm> of the same >> + <glossterm>Result Set</glossterm>. >> + <glossterm>Archiver</glossterm> > Can you change that to archiver process ? I prefer the short term without the addition of 'process', for 'Archiver' as well as for the other cases. But I'm not a native English speaker. >> + <glossterm>Atomic</glossterm> > .. >> + <para> >> + In reference to an operation: An event that cannot be completed in >> + part: it must either entirely succeed or entirely fail. A series of > Can you say: "an action which is not allowed to partially succeed and then fail, > ..." > >> + <glossterm>Autovacuum</glossterm> > Say autovacuum process ? > >> + <glossdef> >> + <para> >> + Processes that remove outdated <acronym>MVCC</acronym> > I would say "A set of processes that remove..." > >> + <glossterm>Records</glossterm> of the <glossterm>Heap</glossterm> and > I'm not sure, can you say "tuples" ? This concerns the upcoming MVCC terms. We need a linguistic distinction between the different versions of 'records' or 'tuples'.
In my understanding, the term 'tuple' is closer to a logical construct (relational algebra), while a 'record' is a concrete implementation on disk. Therefore I prefer 'record' in this context. >> + <glossterm>Backend Process</glossterm> >> + <glossdef> >> + <para> >> + Processes of an <glossterm>Instance</glossterm> which act on behalf of > Say DATABASE instance -1: The term 'database' is overused. We should restrict it to a few cases. >> + <glossterm>Backend Server</glossterm> >> + <glossdef> >> + <para> >> + See <glossterm>Instance</glossterm>. > same > >> + <glossterm>Background Worker</glossterm> >> + <glossdef> >> + <para> >> + Individual processes within an <glossterm>Instance</glossterm>, which > same > >> + run system- or user-supplied code. Typical use cases are processes >> + which handle parts of an <acronym>SQL</acronym> query to take >> + advantage of parallel execution on servers with multiple >> + <acronym>CPUs</acronym>. > I would say "A typical use case is" +1 >> + <glossterm>Background Writer</glossterm> > Add "process" ? > >> + <glossdef> >> + <para> >> + Writes continuously dirty pages from <glossterm>Shared > Say "Continuously writes" +1 >> + Memory</glossterm> to the file system. It starts periodically, but > Hm, maybe "wakes up periodically" +1 >> + <glossterm>Cast</glossterm> >> + <glossdef> >> + <para> >> + A conversion of a <glossterm>Datum</glossterm> from its current data >> + type to another data type. > maybe just say > A conversion of a <glossterm>Datum</glossterm> to another data type. > >> + <glossterm>Catalog</glossterm> >> + <glossdef> >> + <para> >> + The <acronym>SQL</acronym> standard uses this standalone term to >> + indicate what is called a <glossterm>Database</glossterm> in >> + <productname>PostgreSQL</productname>'s terminology. > Maybe remove "standalone" ?
> >> + <glossterm>Checkpointer</glossterm> > Process > >> + A process that writes dirty pages and <glossterm>WAL >> + Records</glossterm> to the file system and creates a special > Does the chckpointer actually write WAL ? YES, not only WAL Writer. >> + checkpoint record. This process is initiated when predefined >> + conditions are met, such as a specified amount of time has passed, or >> + a certain volume of records have been collected. > collected or written? > > I would say: >> + A checkpoint is usually initiated by >> + a specified amount of time having passed, or >> + a certain volume of records having been written. +-0 >> + <glossterm>Checkpoint</glossterm> >> + <glossdef> >> + <para> >> + A <link linkend="sql-checkpoint"> Checkpoint</link> is a point in time > Extra space > >> + <glossentry id="glossary-connection"> >> + <glossterm>Connection</glossterm> >> + <glossdef> >> + <para> >> + A <acronym>TCP/IP</acronym> or socket line for inter-process > I don't know if I've ever heard the phase "socket line" > I guess you mean a unix socket. +1 >> + <glossterm>Constraint</glossterm> >> + <glossdef> >> + <para> >> + A concept of restricting the values of data allowed within a >> + <glossterm>Table</glossterm>. > Just say: "A restriction on the values..."? > >> + <glossterm>Data Area</glossterm> > Remove this ? I've never heard this phrase before. grep on *.sgml delivers 4 occurrences. >> + <glossdef> >> + <para> >> + The base directory on the filesystem of a >> + <glossterm>Server</glossterm> that contains all data files and >> + subdirectories associated with a <glossterm>Cluster</glossterm> with >> + the exception of tablespaces. The environment variable > Should add an entry for "tablespace". +1 >> + <glossterm>Datum</glossterm> >> + <glossdef> >> + <para> >> + The internal representation of a <acronym>SQL</acronym> data type. > I'm not sure if should use "a SQL" or "an SQL", but not both. grep | wc delivers 106 occurrences for "an SQL" and 63 for "a SQL". 
It depends on how people pronounce the term SQL: "an esquel" or "a sequel". >> + <glossterm>Delete</glossterm> >> + <glossdef> >> + <para> >> + A <acronym>SQL</acronym> command whose purpose is to remove > just say "which removes" +1 >> + <glossentry id="glossary-file-segment"> >> + <glossterm>File Segment</glossterm> >> + <glossdef> >> + <para> >> + If a heap or index file grows in size over 1 GB, it will be split > 1GB is the default "segment size", which you should define. ??? >> + <glossentry id="glossary-foreign-data-wrapper"> >> + <glossterm>Foreign Data Wrapper</glossterm> >> + <glossdef> >> + <para> >> + A means of representing data that is not contained in the local >> + <glossterm>Database</glossterm> as if were in local >> + <glossterm>Table</glossterm>(s). > I'd say: > > + A means of representing data as a <glossterm>Table</glossterm>(s) even though > + it is not contained in the local <glossterm>Database</glossterm> > > >> + <glossentry id="glossary-foreign-key"> >> + <glossterm>Foreign Key</glossterm> >> + <glossdef> >> + <para> >> + A type of <glossterm>Constraint</glossterm> defined on one or more >> + <glossterm>Column</glossterm>s in a <glossterm>Table</glossterm> which >> + requires the value in those <glossterm>Column</glossterm>s to uniquely >> + identify a <glossterm>Row</glossterm> in the specified >> + <glossterm>Table</glossterm>. > An FK doesn't require the values in its table to be unique, right ? > I'd say something like: "..which enforces that the values in those Columns are > also present in an(other) table." > Reference Referential Integrity? > >> + <glossterm>Function</glossterm> >> + <glossdef> >> + <para> >> + Any pre-defined transformation of data. Many >> + <glossterm>Functions</glossterm> are already defined within >> + <productname>PostgreSQL</productname> itself, but can also be >> + user-defined. > I would remove "pre-", since you mentioned that it can be user-defined. 
> >> + <glossterm>Global SQL Object</glossterm> >> + <glossdef> >> + <para> >> + <!-- FIXME --> >> + Not all <glossterm>SQL Objects</glossterm> belong to a certain >> + <glossterm>Schema</glossterm>. Some belong to the complete >> + <glossterm>Database</glossterm>, or even to the complete >> + <glossterm>Cluster</glossterm>. These are referred to as >> + <glossterm>Global SQL Objects</glossterm>. Collations and Extensions >> + such as <glossterm>Foreign Data Wrappers</glossterm> reside at the >> + <glossterm>Database</glossterm> level; <glossterm>Database</glossterm> >> + names, <glossterm>Roles</glossterm>, >> + <glossterm>Tablespaces</glossterm>, <glossterm>Replication</glossterm> >> + origins, and subscriptions for logical >> + <glossterm>Replication</glossterm> at the >> + <glossterm>Cluster</glossterm> level. > I think "complete" is the wrong world. > I would say: > "An object which is not specific to a given database, but instead shared across > the entire Cluster". This phrase seems to be too simple. We must differentiate between the different levels: schema, database, cluster. Possibly someone finds a better phrase. >> + <glossentry id="glossary-grant"> >> + <glossterm>Grant</glossterm> >> + <glossdef> >> + <para> >> + A <acronym>SQL</acronym> command that is used to enable > I'd say "allow" > >> + <glossentry id="glossary-heap"> >> + <glossterm>Heap</glossterm> >> + <glossdef> >> + <para> >> + Contains the original values of <glossterm>Row</glossterm> attributes > I'm not sure what "original" means here ? Yes, this may be misleading. I want to express, that values are stored in the heap (the 'original') and possibly repeated as a key in an index. >> + (i.e. the data). The <glossterm>Heap</glossterm> is realized within >> + <glossterm>Database</glossterm> files and mirrored in >> + <glossterm>Shared Memory</glossterm>. > I wouldn't say mirrored, and probably just remove at least the part after "and". 
+-0 >> + <glossentry id="glossary-host"> >> + <glossterm>Host</glossterm> >> + <glossdef> >> + <para> >> + See <glossterm>Server</glossterm>. > Or client. Or proxy at some layer or other intermediate thing. Maybe just > remove this. Sometimes the term "host" is used in a different meaning. Therefor we shall have this glossary entry for clarification that it shall be used only in the sense of a "server". >> + <glossentry id="glossary-index"> >> + <glossterm>Index</glossterm> >> + <glossdef> >> + <para> >> + A <glossterm>Relation</glossterm> that contains data derived from a >> + <glossterm>Table</glossterm> (or <glossterm>Relation</glossterm> such >> + as a <glossterm>Materialized View</glossterm>). It's internal > Its > >> + structure supports very fast retrieval of and access to the original >> + data. >> + <glossterm>Instance</glossterm> >> + <glossdef> >> + <para> > ... >> + <para> >> + Many <glossterm>Instances</glossterm> can run on the same server as >> + long as they use different <acronym>IP</acronym> ports and manage > I would say "as long as their TCP/IP ports or sockets don't conflict, and manage..." +1 >> + <glossterm>Join</glossterm> >> + <glossdef> >> + <para> >> + A technique used with <command>SELECT</command> statements for >> + correlating data in one or more <glossterm>Relations</glossterm>. > I would refer to this as a SQL keyword allowing to combine data from multiple > relations. > >> + <glossterm>Lock</glossterm> >> + <glossdef> >> + <para> >> + A mechanism for one process temporarily preventing data from being >> + manipulated by any other process. > I'd say: > > + A mechanism by which a process protects simultaneous access to a resource > + by other processes. > > (I said "protects" since shared locks don't prevent all access, and it's easier > than explaining "unsafe access"). 
> > >> + <glossentry id="glossary-log-file"> >> + <glossterm>Log File</glossterm> >> + <glossdef> >> + <para> >> + <link linkend="logfile-maintenance">LOG files</link> contain readable >> + text lines about serious and non-serious events, e.g.: use of wrong >> + password, long-running queries, ... . > Serious and non-serious? ok, can be removed: 'events' only. >> + <glossterm>Log Writer</glossterm> > process > >> + <glossdef> >> + <para> >> + If activated and parameterized, the > I don't know what parameterized means here ok, unnecessary term. (There are parameters for the Log Writer process in the config file.) >> + <link linkend="runtime-config-logging">Log Writer</link> process >> + writes information about database events into the current >> + <glossterm>Log file</glossterm>. When reaching certain time- or >> + volume-dependent criterias, he <!-- FIXME "he"? --> creates a new > I think criteria is the plural.. +1 >> + <glossterm>Log Record</glossterm> > Can we remove this ? > Couple releases ago, "pg_xlog" was renamed to pg_wal. > I'd prefer to avoid defining something called "Log Record" about WAL that's > right next to text logs. "... that's right next to text logs." This is the problem, which shall be clarified. The rename of the directory does not affect the records which are written into the WAL files or are used for replication. The term "log record" is used in the documentation as well as in error messages. >> + <glossterm>Logged</glossterm> >> + <glossdef> >> + <para> >> + A <glossterm>Table</glossterm> is considered >> + <glossterm>Logged</glossterm> if changes to it are sent to the >> + <glossterm>WAL Log</glossterm>. By default, all regular >> + <glossterm>Tables</glossterm> are <glossterm>Logged</glossterm>. A >> + <glossterm>Table</glossterm> can be speficied as unlogged either at >> + creation time or via the <command>ALTER TABLE</command> command. 
The >> + primary use of unlogged <glossterm>Tables</glossterm> is for storing >> + transient work data that must be shared across processes, but with a >> + final result stored in logged <glossterm>Tables</glossterm>. >> + <glossterm>Temporary Tables</glossterm> are always unlogged. >> + </para> >> + </glossdef> >> + </glossentry> > Maybe it's be better to define "unlogged", since 1) logged is the default; and > 2) it's right next to text logs. > >> + <glossterm>Master</glossterm> >> + <glossdef> >> + <para> >> + When two or more <glossterm>Databases</glossterm> are linked via >> + <glossterm>Replication</glossterm>, the <glossterm>Server</glossterm> >> + that is considered the authoritative source of information is called >> + the <glossterm>Master</glossterm>. > I think it'd actually be the <<instance>> which is authoritative, in case they're > running on the same <<Server>> In this phase of the glossary we shall avoid the discussion about master/slave vs. primary/secondary. Some weeks ago we have seen many contributions without a clear result. In one of the next phases of the glossary we shall discuss all terms concerning replication separately. >> + <glossentry id="glossary-materialized"> >> + <glossterm>Materialized</glossterm> >> + <glossdef> >> + <para> >> + The act of storing information rather than just the means of accessing > remove "means of" ? > >> + the information. This term is used in <glossterm>Materialized >> + Views</glossterm> meaning that the data derived from the >> + <glossterm>View</glossterm> is actually stored on disk separate from > separately > >> + the sources of that data. When the term >> + <glossterm>Materialized</glossterm> is used in speaking about >> + mulit-step queries, it means that the data of a given step is stored > multi > >> + (in memory, but that storage may spill over onto disk). 
>> + </para> >> + </glossdef> >> + </glossentry> >> + >> + <glossentry id="glossary-materialized-view"> >> + <glossterm>Materialized View</glossterm> >> + <glossdef> >> + <para> >> + A <glossterm>Relation</glossterm> that is defined in the same way that >> + a <glossterm>View</glossterm> is, but it stores data in the same way > change "it stores" to stores > >> + <glossentry id="glossary-partition"> >> + <glossterm>Partition</glossterm> >> + <glossdef> >> + <para> >> + <!-- FIXME should this use the style used in "atomic"? --> >> + a) A <glossterm>Table</glossterm> that can be queried independently by >> + its own name, but can also be queried via another > just say "on its own" or "directly" > >> + <glossterm>Table</glossterm>, a partitionend > partitioned > also, put it in parens, like "via another table (a partitioned table)..." > >> + <glossterm>Table</glossterm>, which is a collection of > Say "set" here since you later talk about "subsets" and sets. > >> + <glossentry id="glossary-primary-key"> >> + <glossterm>Primary Key</glossterm> >> + <glossdef> >> + <para> >> + A special case of <glossterm>Unique Index</glossterm> defined on a >> + <glossterm>Table</glossterm> or other <glossterm>Relation</glossterm> >> + that also guarantees that all of the <glossterm>Attributes</glossterm> >> + within the <glossterm>Primary Key</glossterm> do not have >> + <glossterm>Null</glossterm> values. As the name implies, there can be >> + only one <glossterm>Primary Key</glossterm> per >> + <glossterm>Table</glossterm>, though it is possible to have multiple >> + <glossterm>Unique Indexes</glossterm> that also have no >> + <glossterm>Null</glossterm>-capable <glossterm>Attributes</glossterm>. > I would say "multiple >>unique indexes<< on >>attributes<< defined as not > nullable. > >> + <glossterm>Procedure</glossterm> >> + <glossdef> >> + <para> >> + A defined set of instructions for manipulating data within a >> + <glossterm>Database</glossterm>. 
<glossterm>Procedure</glossterm> can > "procedures" or "a procedure" > >> + <glossterm>Record</glossterm> >> + <glossdef> >> + <para> >> + See <link linkend="sql-revoke">Tupple</link>. > Tupple is back. And again below. > >> + A single <glossterm>Row</glossterm> of a <glossterm>Table</glossterm> >> + or other Relation. > I think it's commonly used to mean "an instance of a row" (in an MVCC sense), > but maybe that's too much detail for here. > >> + <glossterm>Referential Integrity</glossterm> >> + <glossdef> >> + <para> >> + The means of restricting data in one <glossterm>Relation</glossterm> > A means > >> + <glossentry id="glossary-relation"> >> + <glossterm>Relation</glossterm> >> + <glossdef> >> + <para> >> + The generic term for all objects in a <glossterm>Database</glossterm> > "A generic term for any object in a >>database<< that has a name and..." > >> + <glossentry id="glossary-result-set"> >> + <glossterm>Result Set</glossterm> >> + <glossdef> >> + <para> >> + A data structure transmitted from a <glossterm>Server</glossterm> to >> + client program upon the completion of a <acronym>SQL</acronym> >> + command, usually a <command>SELECT</command> but it can be an >> + <command>INSERT</command>, <command>UPDATE</command>, or >> + <command>DELETE</command> command if the <literal>RETURNING</literal> >> + clause is specified. > I'd remove everything in that sentence after "usually". > >> + <glossterm>Revoke</glossterm> >> + <glossdef> >> + <para> >> + A command to reduce access to a named set of > s/reduce/prevent/ ? > >> + <glossterm>Row</glossterm> >> + <glossdef> >> + <para> >> + See <link linkend="sql-revoke">Tupple</link>. > tuple > >> + <glossentry id="glossary-savepoint"> >> + <glossterm>Savepoint</glossterm> >> + <glossdef> >> + <para> >> + A special mark (such as a timestamp) inside a >> + <glossterm>Transaction</glossterm>. Data modifications after this >> + point in time may be rolled back to the time of the savepoint. 
> I don't think "timestamp" is a useful or accurate analogy for this. +1 >> + <glossterm>Schema</glossterm> >> + <glossdef> >> + <para> >> + A <link linkend="ddl-schemas">schema</link> is a namespace for >> + <glossterm>SQL objects</glossterm>, which all reside in the same >> + <glossterm>database</glossterm>. Each <glossterm>SQL >> + object</glossterm> must reside in exactly one >> + <glossterm>Schema</glossterm>. >> + </para> >> + <para> >> + In general, the names of <glossterm>SQL objects</glossterm> in the >> + schema are unique - even across different types of objects. The lone >> + exception is the case of <glossterm>Unique</glossterm> >> + <glossterm>Constraint</glossterm>s, in which case there >> + <emphasis>must</emphasis> be a <glossterm>Unique Index</glossterm> >> + with the same name and <glossterm>Schema</glossterm> as the >> + <glossterm>Constraint</glossterm>. There is no restriction on having >> + a name used in multiple <glossterm>Schema</glossterm>s. > I think there's some confusion. Constraints are not objects, right ? > > But, constraints do have an exception (not just unique constraints, though): > the constraint is only unique on its table, not in its database/schema. > > "pg_constraint_conrelid_contypid_conname_index" UNIQUE, btree (conrelid, contypid, conname) CLUSTER Yes, you are right. But give me some time for a better suggestion. >> + <glossterm>Select</glossterm> >> + <glossdef> >> + <para> >> + The command used to query a <glossterm>Database</glossterm>. Normally, >> + <command>SELECT</command>s are not expected to modify the >> + <glossterm>Database</glossterm> in any way, but it is possible that >> + <glossterm>Functions</glossterm> invoked within the query could have >> + side-effects that do modify data. </para> > I think there should be references to the sql-* pages for this and others. 
> >> + <glossentry id="glossary-serializable"> >> + <glossterm>Serializable</glossterm> >> + <glossdef> >> + <para> >> + Transactions defined as <literal>SERIALIZABLE</literal> are unable to >> + see changes made within other transactions. In effect, for the >> + initializing session the entire <glossterm>Database</glossterm> >> + appears to be frozen duration such a >> + <glossterm>Transaction</glossterm>. > Do you mean "for the duration of the >>Transaction<<" > >> + <glossentry id="glossary-session"> >> + <glossterm>Session</glossterm> >> + <glossdef> >> + <para> >> + A <glossterm>Connection</glossterm> to the <glossterm>Database</glossterm>. >> + </para> >> + <para> >> + A description of the commands that were issued in the life cycle of a >> + particular <glossterm>Connection</glossterm> to the >> + <glossterm>Database</glossterm>. > I'm not sure what this <para> means. > >> + <glossterm>Sequence</glossterm> >> + <glossdef> >> + <para> >> + <!-- sounds excessively complicated a definition --> >> + An <glossterm>Database</glossterm> object which represents the > A not An > >> + mathematical concept of a numerical integral sequence. It can be >> + thought of as a <glossterm>Table</glossterm> with exactly one >> + <glossterm>Row</glossterm> and one <glossterm>Column</glossterm>. The >> + value stored is known as the current value. A >> + <glossterm>Sequence</glossterm> has a defined direction (almost always >> + increasing) and an interval step (usually 1). Whenever the >> + <literal>NEXTVAL</literal> pseudo-column of a >> + <glossterm>Sequence</glossterm> is accessed, the current value is moved >> + in the defined direction by the defined interval step, and that value > say "given interval step" > >> + <glossterm>Shared Memory</glossterm> >> + <glossdef> >> + <para> >> + <acronym>RAM</acronym> which is used by the processes common to an >> + <glossterm>Instance</glossterm>. 
It mirrors parts of >> + <glossterm>Database</glossterm> files, provides an area for >> + <glossterm>WAL Records</glossterm>, > Do we use shared_buffers for WAL ? Yes, my information is that WAL records are part of the shared_buffers. >> + <glossentry id="glossary-table"> >> + <glossterm>Table</glossterm> >> + <glossdef> >> + <para> >> + A collection of <glossterm>Tuples</glossterm> (also known as >> + <glossterm>Rows</glossterm> or <glossterm>Records</glossterm>) having >> + a common data structure (the same number of >> + <glossterm>Attributes</glossterm>s, in the same order, having the same > Attributes has two esses. > >> + name and type per position). A <glossterm>Table</glossterm> is the > I don't think you need to say here that the columns of a table all have the > same type and order. In my opinion this is an essential information. >> + <glossterm>Temporary Tables</glossterm> >> + <glossdef> >> + <para> >> + <glossterm>Table</glossterm>s that exist either for the lifetime of a >> + <glossterm>Session</glossterm> or a >> + <glossterm>Transaction</glossterm>, as defined at creation time. The > I would say "as specified at the time of its creation". > >> + <glossterm>Transaction</glossterm> >> + <glossdef> >> + <para> >> + A combination of one or more commands that must act as a single > Remove "one or more" > >> + <glossterm>Trigger</glossterm> >> + <glossdef> >> + <para> >> + A <glossterm>Function</glossterm> which can be defined to execute >> + whenever a certain operation (<command>INSERT</command>, >> + <command>UPDATE</command>, or <command>DELTE</command>) is applied to >> + that <glossterm>Relation</glossterm>. A <glossterm>Trigger</glossterm> > s/that/a/ > >> + <glossentry id="glossary-unique"> >> + <glossterm>Unique</glossterm> >> + <glossdef> >> + <para> >> + The condition of having no matching values in the same > s/matching/duplicate/ > >> + <glossterm>Relation</glossterm>. 
Most often used in the concept of > s/concept/context/ > >> + <glossentry id="glossary-update"> >> + <glossterm>Update</glossterm> >> + <glossdef> >> + <para> >> + A command used to modify <glossterm>Rows</glossterm> that already > or 'may already' > >> + <glossterm>WAL File</glossterm> > ... >> + <para> >> + The sequence of <glossterm>WAL Records</glossterm> in combination with >> + the sequence of <glossterm>WAL Files</glossterm> represents the > Remove "in combination with the sequence of >WAL Files<" > >> + <glossentry id="glossary-wal-log"> >> + <glossterm>WAL Log</glossterm> > Can you just say WAL or "write-ahead log". Sometimes the term "WAL log" is used in the documentation. But the preferred term is "WAL file". This glossary entry does nothing but points to the preferred term, which indicates that he shall be avoided in the future. >> + <glossdef> >> + <para> >> + A <glossterm>WAL Record</glossterm> contains either new or changed >> + <glossterm>Heap</glossterm> or <glossterm>Index</glossterm> data or >> + information about a <command>COMMIT</command>, >> + <command>ROLLBACK</command>, <command>SAVEPOINT</command>, or >> + <glossterm>Checkpointer</glossterm> operation. WAL records use a >> + non-printabe binary format. > non-printable +1 > Or just remove it. > Or just remove the sentence. > >> + <glossterm>WAL Writer</glossterm> > process > >> + <glossentry id="glossary-window-function"> >> + <glossterm>Window Function</glossterm> >> + <glossdef> >> + <para> >> + A type of <glossterm>Function</glossterm> similar to an >> + <glossterm>Aggregate</glossterm> in that can derive its value from a > in that IT > >> + set of <glossterm>Rows</glossterm> in a <glossterm>Result >> + Set</glossterm>, but still retaining the original source data. Kind regards, Jürgen
On Fri, Mar 20, 2020 at 11:32:25PM +0100, Jürgen Purtz wrote:
> > > + <glossentry id="glossary-file-segment">
> > > + <glossterm>File Segment</glossterm>
> > > + <glossdef>
> > > + <para>
> > > + If a heap or index file grows in size over 1 GB, it will be split
> > 1GB is the default "segment size", which you should define.
>
> ???

"A >>Table<< or other >>Relation<< which is larger than a >Cluster's< segment size is stored in multiple physical files. This avoids file size limitations which vary across operating systems."

https://www.postgresql.org/docs/devel/runtime-config-preset.html

ts=# SELECT name, setting, unit, category, short_desc FROM pg_settings WHERE name~'block_size|segment_size';
       name       | setting  | unit |    category    |                  short_desc
------------------+----------+------+----------------+----------------------------------------------
 block_size       | 8192     |      | Preset Options | Shows the size of a disk block.
 segment_size     | 131072   | 8kB  | Preset Options | Shows the number of pages per disk file.
 wal_block_size   | 8192     |      | Preset Options | Shows the block size in the write ahead log.
 wal_segment_size | 16777216 | B    | Preset Options | Shows the size of write ahead log segments.

> > > + <glossentry id="glossary-heap">
> > > + <glossterm>Heap</glossterm>
> > > + <glossdef>
> > > + <para>
> > > + Contains the original values of <glossterm>Row</glossterm> attributes
> > I'm not sure what "original" means here ?
>
> Yes, this may be misleading. I want to express, that values are stored in
> the heap (the 'original') and possibly repeated as a key in an index.

Maybe "this is the content of rows/attributes in >>Tables<< or other >>Relations<<".
or "this is the data store for ..."

> > > + <glossentry id="glossary-host">
> > > + <glossterm>Host</glossterm>
> > > + <glossdef>
> > > + <para>
> > > + See <glossterm>Server</glossterm>.
> > Or client. Or proxy at some layer or other intermediate thing. Maybe just
> > remove this.
>
> Sometimes the term "host" is used in a different meaning. Therefore we shall
> have this glossary entry for clarification that it shall be used only in the
> sense of a "server".

I think that suggests just removing "host" and consistently saying "server".

-- Justin
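As a rough check of the segment-size discussion above, the following shell sketch multiplies the two preset values from the quoted pg_settings output and estimates how many physical files a relation of a given size would occupy. The helper name segment_files and the 2.5 GB example size are illustrative assumptions, not anything taken from the patch.

```shell
#!/bin/sh
# Values copied from the pg_settings output quoted above.
block_size=8192        # bytes per page
segment_size=131072    # pages per segment file (reported in units of 8kB)

segment_bytes=$((block_size * segment_size))
echo "segment size in bytes: $segment_bytes"

# segment_files RELATION_BYTES SEGMENT_BYTES
# Hypothetical helper: rounds up, because a partially filled last
# segment still needs a file of its own.
segment_files() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# A made-up 2.5 GB relation, purely for illustration.
segment_files 2684354560 "$segment_bytes"
```

With the values shown, the product is 1073741824 bytes (1 GB), so the 2.5 GB example is spread over three files.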
man pages: Sorry, if I confused someone with my poor English. I just
want to express in my 'offline' mail that we don't have to worry about
man page generation. The patch doesn't affect files in the /ref
subdirectory from where man pages are created.
> > > > + <glossentry id="glossary-host">
> > > > + <glossterm>Host</glossterm>
> > > > + <glossdef>
> > > > + <para>
> > > > + See <glossterm>Server</glossterm>.
> > > Or client. Or proxy at some layer or other intermediate thing. Maybe just remove this.
> > Sometimes the term "host" is used in a different meaning. Therefor we shall have this glossary entry for clarification that it shall be used only in the sense of a "server".
> I think that suggests just removing "host" and consistently saying "server".
"server", "host", "database server": All three terms are used intensively in the documentation. When we define glossary terms, we should also take into account the consequences for those parts. "database server" is the most diffuse. E.g.: In 'config.sgml' it is used in the sense of some hardware or VM "... If you have a dedicated database server with 1GB or more of RAM ..." as well as in the sense of an instance "... To start the database server on the command prompt ...". Additionally the term is completely misleading. In both cases we do not mean something which is related to a database but something which is related to a cluster.
In the past, people accepted such blurs. My - minimal - intention is to raise awareness of such ambiguities, or - better - to clearly define the situation in the glossary. But this is only a first step. The second step shall be a rework of the documentation to use the preferred terms defined in the glossary. Because there will be a time gap between the two steps, we may want to be a little chatty in the glossary and define ambiguous terms as shown in the following example:
---
Server: The term "Server" denotes .... .
Host: An outdated term which will be replaced by <xref-to-the-glossary>Server</xref> over time.
Database Server: An outdated term which sometimes denotes a <xref-to-the-glossary>Server</xref> and sometimes an <xref-to-the-glossary>Instance</xref>.
---
This is a pattern for all terms which we currently describe with the phrase "See ...". Later, after reviewing the documentation and eliminating the non-preferred terms, the glossary entries with "An outdated term ..." can be dropped.
In the last days we have seen many huge and small proposals. I think it will be helpful to summarize this work by waiting for a patch from Alvaro containing everything he deems useful from his point of view.
Kind regards, Jürgen
On Sat, Mar 21, 2020 at 03:08:30PM +0100, Jürgen Purtz wrote:
> On 21.03.20 00:03, Justin Pryzby wrote:
> > > > > + <glossentry id="glossary-host">
> > > > > + <glossterm>Host</glossterm>
> > > > > + <glossdef>
> > > > > + <para>
> > > > > + See <glossterm>Server</glossterm>.
> > > > Or client. Or proxy at some layer or other intermediate thing. Maybe just
> > > > remove this.
> > > Sometimes the term "host" is used in a different meaning. Therefor we shall
> > > have this glossary entry for clarification that it shall be used only in the
> > > sense of a "server".
> > I think that suggests just removing "host" and consistently saying "server".
>
> "server", "host", "database server": All three terms are used intensively in
> the documentation. When we define glossary terms, we should also take into
> account the consequences for those parts.

The documentation uses "host", but doesn't always mean "server".

$ git grep -Fw host doc/src/
doc/src/sgml/backup.sgml: that you can perform this backup procedure from any remote host that has

pg_hba appears to be all about client "hosts".

FATAL: no pg_hba.conf entry for host "123.123.123.123", user "andym", database "testdb"

-- Justin
On 2020-03-20 01:11, Alvaro Herrera wrote:
> I gave this a look. I first reformatted it so I could read it; that's
> 0001. Second I changed all the long <link> items into <xref>s, which
> are shorter and don't have to repeat the title of the refered to page.
> (Of course, this changes the link to be in the same style as every other
> link in our documentation; some people don't like it. But it's our
> style.)

AFAICT, all the <link> elements in this patch should be changed to <xref>.
If there is something undesirable about the output style, we can change
that, but it's not this patch's job to make up its own rules.

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Mar 20, 2020 at 3:58 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > + A process that writes dirty pages and <glossterm>WAL
> > + Records</glossterm> to the file system and creates a special
>
> Does the chckpointer actually write WAL ?

Yes.

> An FK doesn't require the values in its table to be unique, right ?

I believe it does require that the values are unique.

> I think there's some confusion. Constraints are not objects, right ?

I think constraints are definitely objects. They have names and you
can, for example, COMMENT on them.

> Do we use shared_buffers for WAL ?

No.

(I have not reviewed the patch; these are just a few comments on your comments.)

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
> > Do we use shared_buffers for WAL ?
> No.
What's about the explanation in https://www.postgresql.org/docs/12/runtime-config-wal.html : "wal_buffers (integer) The amount of shared memory used for WAL data that has not yet been written to disk. The default setting of -1 selects a size equal to 1/32nd (about 3%) of shared_buffers, ... " ? My understanding was, that the parameter wal_buffers grabs some of the existing shared_buffers for its own purpose. Is this a misinterpretation? Are shared_buffers and wal_buffers two different shared memory areas?
Kind regards, Jürgen
On Tue, Mar 24, 2020 at 3:40 PM Jürgen Purtz <juergen@purtz.de> wrote:
> On 24.03.20 19:46, Robert Haas wrote:
> Do we use shared_buffers for WAL ?
>
> No.
>
> What's about the explanation in https://www.postgresql.org/docs/12/runtime-config-wal.html : "wal_buffers (integer) The amount of shared memory used for WAL data that has not yet been written to disk. The default setting of -1 selects a size equal to 1/32nd (about 3%) of shared_buffers, ... " ? My understanding was, that the parameter wal_buffers grabs some of the existing shared_buffers for its own purpose. Is this a misinterpretation? Are shared_buffers and wal_buffers two different shared memory areas?

Yes. The code adds up the shared memory requests from all of the different subsystems and then allocates one giant chunk of shared memory which is divided up between them. The overwhelming majority of that memory goes into shared_buffers, but not all of it. You can use the new pg_get_shmem_allocations() function to see how it's used. For example, with shared_buffers=4GB:

rhaas=# select name, pg_size_pretty(size) from pg_get_shmem_allocations() order by size desc limit 10;
         name         | pg_size_pretty
----------------------+----------------
 Buffer Blocks        | 4096 MB
 Buffer Descriptors   | 32 MB
 <anonymous>          | 32 MB
 XLOG Ctl             | 16 MB
 Buffer IO Locks      | 16 MB
 Checkpointer Data    | 12 MB
 Checkpoint BufferIds | 10 MB
 clog                 | 2067 kB
                      | 1876 kB
 subtrans             | 261 kB
(10 rows)

rhaas=# select count(*), pg_size_pretty(sum(size)) from pg_get_shmem_allocations();
 count | pg_size_pretty
-------+----------------
    54 | 4219 MB
(1 row)

So, in this configuration, the whole shared memory segment is 4219MB, of which 4096MB is allocated to shared_buffers and the rest to dozens of smaller allocations, with 1876 kB left over that might get snapped up later by an extension that wants some shared memory.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
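The numbers in this exchange can be checked with a little shell arithmetic. The sketch below first subtracts the buffer blocks from the total reported above, then estimates the automatic wal_buffers size. The clamp between 64 kB and one WAL segment is my reading of the part of the wal_buffers documentation that the quote elides with "...", so treat it as an assumption; default_wal_buffers_kb is a hypothetical helper that works in kB rather than in WAL blocks.

```shell
#!/bin/sh
# Sizes in MB, copied from the pg_get_shmem_allocations() output above.
total_mb=4219
buffer_blocks_mb=4096
other_mb=$((total_mb - buffer_blocks_mb))
echo "allocations other than buffer blocks: ${other_mb} MB"

# default_wal_buffers_kb SHARED_BUFFERS_KB WAL_SEGMENT_KB
# Assumed rule: 1/32 of shared_buffers, but not below 64 kB and
# not above one WAL segment.
default_wal_buffers_kb() {
    wal_kb=$(( $1 / 32 ))
    if [ "$wal_kb" -lt 64 ]; then wal_kb=64; fi
    if [ "$wal_kb" -gt "$2" ]; then wal_kb=$2; fi
    echo "$wal_kb"
}

# shared_buffers = 4GB, WAL segment = 16MB (both expressed in kB)
default_wal_buffers_kb $((4096 * 1024)) 16384
```

Under these assumptions 1/32 of 4 GB would be 128 MB, which the clamp reduces to 16384 kB (16 MB). That appears consistent with the 16 MB "XLOG Ctl" entry, and it illustrates that wal_buffers is sized relative to shared_buffers without being carved out of it.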
On Fri, Mar 20, 2020 at 11:32:25PM +0100, Jürgen Purtz wrote:
> > > + <glossterm>Archiver</glossterm>
> > Can you change that to archiver process ?
>
> I prefer the short term without the addition of 'process' - concerning
> 'Archiver' as well as the other cases. But I'm not an native English
> speaker.

I didn't like it due to lack of context.

What about "wal archiver" ?

It occurred to me when I read this.
https://www.postgresql.org/message-id/20200327.163007.128069746774242774.horikyota.ntt%40gmail.com

-- Justin
On 27.03.20 21:12, Justin Pryzby wrote:
> On Fri, Mar 20, 2020 at 11:32:25PM +0100, Jürgen Purtz wrote:
>>>> + <glossterm>Archiver</glossterm>
>>> Can you change that to archiver process ?
>> I prefer the short term without the addition of 'process' - concerning
>> 'Archiver' as well as the other cases. But I'm not an native English
>> speaker.
> I didn't like it due to lack of context.
>
> What about "wal archiver" ?
>
> It occured to me when I read this.
> https://www.postgresql.org/message-id/20200327.163007.128069746774242774.horikyota.ntt%40gmail.com
>
"WAL archiver" is ok for me. In the current documentation we have 2
places with "WAL archiver" and 4 with "archiver"-only
(high-availability.sgml, monitoring.sgml).
"backend process" is an exception to the other terms because the
standalone term "backend" is sensibly used in diverse situations.
Kind regards, Jürgen
Attachment
On Sun, Mar 29, 2020 at 5:29 AM Jürgen Purtz <juergen@purtz.de> wrote:
On 27.03.20 21:12, Justin Pryzby wrote:
> On Fri, Mar 20, 2020 at 11:32:25PM +0100, Jürgen Purtz wrote:
>>>> + <glossterm>Archiver</glossterm>
>>> Can you change that to archiver process ?
>> I prefer the short term without the addition of 'process' - concerning
>> 'Archiver' as well as the other cases. But I'm not an native English
>> speaker.
> I didn't like it due to lack of context.
>
> What about "wal archiver" ?
>
> It occured to me when I read this.
> https://www.postgresql.org/message-id/20200327.163007.128069746774242774.horikyota.ntt%40gmail.com
>
"WAL archiver" is ok for me. In the current documentation we have 2
places with "WAL archiver" and 4 with "archiver"-only
(high-availability.sgml, monitoring.sgml).
"backend process" is an exception to the other terms because the
standalone term "backend" is sensibly used in diverse situations.
Kind regards, Jürgen

I've taken Alvaro's fixes and done my best to incorporate the feedback into a new patch, which Roger's (tech writer) reviewed yesterday.

The changes are too numerous to list, but the highlights are:

New definitions:
* All four ACID terms
* Vacuum (split off from Autovacuum)
* Tablespace
* WAL Archiver (replaces Archiver)

Changes to existing terms:
* Implemented most wording changes recommended by Justin
* all remaining links were either made into xrefs or edited out of existence
* de-tagged most second uses of a term within a definition

Did not do:
* Address the " Process" suffix suggested by Justin. There isn't consensus on these changes, and I'm neutral on the matter
* change the Cast definition. I think it's important to express that a cast has a FROM datatype as well as a TO
* anything host/server related as I couldn't see a consensus reached

Other thoughts:
* Trivial definitions that are just see-other-definition are ok with me, as the goal of this glossary is to aid in discovery of term meanings, so knowing that two terms are interchangeable is itself helpful

It is my hope that this revision represents the final _structural_ change to the glossary. New definitions and edits to existing definitions will, of course, go on forever.
Please find some minor suggestions in the attachment. They are based on Corey's last patch 0001-glossary-v4.patch.
Kind regards, Jürgen
Attachment
On Tue, Mar 31, 2020 at 04:13:00PM +0200, Jürgen Purtz wrote:
> Please find some minor suggestions in the attachment. They are based on
> Corey's last patch 0001-glossary-v4.patch.

> @@ -220,7 +220,7 @@
> Record</glossterm>s to the file system and creates a special
> checkpoint record. This process is initiated when predefined
> conditions are met, such as a specified amount of time has passed, or
> - a certain volume of records have been collected.
> + a certain volume of records has been collected.

I think you're correct in that "volume" is singular. But I think "collected"
is the wrong word. I suggested "written".

> <para>
> - One of the <acronym>ACID</acronym> properties. This means that concurrently running
> + One of the <acronym>ACID</acronym> properties. This means that concurrently running

These could maybe say "required" or "essential" >ACID< properties

> <para>
> + In reference to a <glossterm>Table</glossterm>:
> A <glossterm>Table</glossterm> that can be queried directly,

Maybe: "In reference to a >Relation<: A table which can be queried directly,"

> table in the collection.
> </para>
> <para>
> - When referring to an <glossterm>Analytic</glossterm>
> - <glossterm>Function</glossterm>: a partition is a definition
> - that identifies which neighboring
> + In reference to a <glossterm>Analytic Function</glossterm>:

s/a/an/

> @@ -1333,7 +1334,8 @@
> <glossdef>
> <para>
> The condition of having no duplicate values in the same
> - <glossterm>Relation</glossterm>. Often used in the concept of
> + <glossterm>Column</glossterm> of a <glossterm>Relation</glossterm>.
> + Often used in the concept of

s/concept/context/, but I said that before, so maybe it was rejected.

-- Justin
On Mon, Mar 30, 2020 at 01:10:19PM -0400, Corey Huinker wrote:
> + <glossentry id="glossary-aggregating">
> + <glossterm>Aggregating</glossterm>
> + <glossdef>
> + <para>
> + The act of combining a collection of data (input) values into
> + a single output value, which may not be of the same type as the
> + input values.

I think we maybe already tried to address this ; but could we define a noun
form ? But not "aggregate" since it's the same word as the verb form. I think
it would maybe be best to merge with "aggregate function", below.

> + <glossentry id="glossary-consistency">
> + <glossterm>Consistency</glossterm>
> + <glossdef>
> + <para>
> + One of the <acronym>ACID</acronym> properties. This means that the database
> + is always in compliance with its own rules such as <glossterm>Table</glossterm>
> + structure, <glossterm>Constraint</glossterm>s,

I don't think the definition of "compliance" is good. The state of being
consistent means an absence of corruption more than an absence of data
integrity issues (which could be caused by corruption).

> + <glossentry id="glossary-datum">
> + <glossterm>Datum</glossterm>
> + <glossdef>
> + <para>
> + The internal representation of a <acronym>SQL</acronym> data type.

Could you say "..used by PostgreSQL" ?

> + <glossterm>File Segment</glossterm>
> + <glossdef>
> + <para>
> + A physical file which stores data for a given
> + <glossterm>Heap</glossterm> or <glossterm>Index</glossterm> object.
> + <glossterm>File Segment</glossterm>s are limited in size by a
> + configuration value and if that size is exceeded, it will be split
> + into multiple physical files.

Say "if an object exceeds that size, then it will be stored across multiple
physical files".

> + which handles parts of an <acronym>SQL</acronym> query to take
...
> + A <acronym>SQL</acronym> command used to add new data into a

I mentioned before, please be consistent: "A SQL or An SQL".

> + </para>
> + <para>
> + Many <glossterm>Instance</glossterm>s can run on the same server as

Say "multiple" not many.

> + <glossentry id="glossary-join">
> + <glossterm>Join</glossterm>
> + <glossdef>
> + <para>
> + A <acronym>SQL</acronym> keyword used in <command>SELECT</command> statements for
> + combining data from multiple <glossterm>Relation</glossterm>s.

Could you add a link to the docs ?

> + <glossentry id="glossary-log-writer">
> + <glossterm>Log Writer</glossterm>
> + <glossdef>
> + <para>
> + If activated and parameterized, the

I still don't know what parameterized means here.

> + <glossentry id="glossary-system-catalog">
> + <glossterm>System Catalog</glossterm>
> + <glossdef>
> + <para>
> + A collection of <glossterm>Table</glossterm>s and
> + <glossterm>View</glossterm>s which describe the structure of all
> + <acronym>SQL</acronym> objects of the <glossterm>Database</glossterm>

I would say "... a PostgreSQL >Database<"

> + and the <glossterm>Global SQL Object</glossterm>s of the
> + <glossterm>Cluster</glossterm>. The <glossterm>System
> + Catalog</glossterm> resides in the schema
> + <literal>pg_catalog</literal>. Main parts are mirrored as
> + <glossterm>View</glossterm>s in the <glossterm>Schema</glossterm>
> + <literal>information_schema</literal>.

I wouldn't say "mirror":
Some information is also exposed as >Views< in the >information_schema< >Schema<.

> + <glossentry id="glossary-tablespace">
> + <glossterm>Tablespace</glossterm>
> + <glossdef>
> + <para>
> + A named location on the server filesystem. All <glossterm>SQL Object</glossterm>s
> + which require storage beyond their definition in the
> + <glossterm>System Catalog</glossterm>
> + must belong to a single tablespace.

Remove "single" as it sounds like we only support one.

> + <glossterm>Transaction</glossterm>
> + <glossdef>
> + <para>
> + A combination of commands that must act as a single
> + <glossterm>Atomic</glossterm> command: they all succeed or all fail
> + as a single unit, and their effects are not visible to other
> + <glossterm>Session</glossterm>s until
> + the <glossterm>Transaction</glossterm> is complete.

s/complete/committed/ ?

> + <glossentry id="glossary-unique">
> + <glossterm>Unique</glossterm>
> + <glossdef>
> + <para>
> + The condition of having no duplicate values in the same
> + <glossterm>Relation</glossterm>. Often used in the concept of

s/concept/context/

> + <glossterm>Vacuum</glossterm>
> + <glossdef>
> + <para>
> + The process of removing outdated <acronym>MVCC</acronym>

Maybe say "tuples which were deleted or obsoleted by an UPDATE".
But maybe you're trying to use generic language.

-- Justin
On Sun, Oct 13, 2019 at 04:52:05PM -0400, Corey Huinker wrote:
> 1. It's obviously incomplete. There are more terms, a lot more, to add.

How did you come up with the initial list of terms ?

Here's some ideas; I'm *not* suggesting to include all of everything, but
hopefully start with a coherent, self-contained list.

grep -roh '<firstterm>[^<]*' doc/src/ |sed 's/.*/\L&/' |sort |uniq -c |sort -nr |less

Maybe also:
object identifier
operator classes
operator family
visibility map

-- Justin
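For readers who want to try the term-counting idea on a small sample before running it against doc/src/, here is a self-contained variant of the pipeline above. It is only a sketch: count_firstterms is a hypothetical wrapper, the sample file is made up, and the GNU-only sed '\L' is swapped for tr so that it should also run where sed lacks that extension.

```shell
#!/bin/sh
# count_firstterms DIR
# Lists every <firstterm> in DIR (recursively), lower-cased, with the
# most frequent terms first.
count_firstterms() {
    grep -roh '<firstterm>[^<]*' "$1" |
        sed 's/<firstterm>//' |
        tr '[:upper:]' '[:lower:]' |
        sort | uniq -c | sort -nr
}

# Made-up sample input, purely for illustration.
demo=$(mktemp -d)
printf '%s\n' \
    '<para>A <firstterm>Tuple</firstterm> is stored in the heap.</para>' \
    '<para>The <firstterm>tuple</firstterm> and the <firstterm>heap</firstterm>.</para>' \
    > "$demo/sample.sgml"

count_firstterms "$demo"
rm -rf "$demo"
```

On the sample, "tuple" is counted twice and "heap" once, which shows why lower-casing matters before uniq -c.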
On 31.03.20 19:58, Justin Pryzby wrote:
> On Tue, Mar 31, 2020 at 04:13:00PM +0200, Jürgen Purtz wrote:
>> Please find some minor suggestions in the attachment. They are based on
>> Corey's last patch 0001-glossary-v4.patch.
>> @@ -220,7 +220,7 @@
>> Record</glossterm>s to the file system and creates a special
>> checkpoint record. This process is initiated when predefined
>> conditions are met, such as a specified amount of time has passed, or
>> - a certain volume of records have been collected.
>> + a certain volume of records has been collected.
> I think you're correct in that "volume" is singular. But I think "collected"
> is the wrong world. I suggested "written".
>
"collected" is not optimal. I suggest "created". Please avoid "written", the WAL records will be written when the Checkpointer is running, not before.

So: "a certain volume of <glossterm>WAL records</glossterm> has been collected."

Everything else is ok for me.

Kind regards, Jürgen
On 31.03.20 20:07, Justin Pryzby wrote:
> On Mon, Mar 30, 2020 at 01:10:19PM -0400, Corey Huinker wrote:
>> + <glossentry id="glossary-aggregating">
>> + <glossterm>Aggregating</glossterm>
>> + <glossdef>
>> + <para>
>> + The act of combining a collection of data (input) values into
>> + a single output value, which may not be of the same type as the
>> + input values.
> I think we maybe already tried to address this ; but could we define a noun
> form ? But not "aggregate" since it's the same word as the verb form. I think
> it would maybe be best to merge with "aggregate function", below.

Yes, combine the two. Or remove "aggregating" altogether.

>> + <glossentry id="glossary-log-writer">
>> + <glossterm>Log Writer</glossterm>
>> + <glossdef>
>> + <para>
>> + If activated and parameterized, the
> I still don't know what parameterized means here.

Remove "and parameterized". The Log Writer always has (default) parameters.

Everything else is ok for me.

Kind regards, Jürgen
On 2020-Apr-01, Jürgen Purtz wrote:
>
> On 31.03.20 19:58, Justin Pryzby wrote:
> > On Tue, Mar 31, 2020 at 04:13:00PM +0200, Jürgen Purtz wrote:
> > > Please find some minor suggestions in the attachment. They are based on
> > > Corey's last patch 0001-glossary-v4.patch.
> > > @@ -220,7 +220,7 @@
> > > Record</glossterm>s to the file system and creates a special
> > > checkpoint record. This process is initiated when predefined
> > > conditions are met, such as a specified amount of time has passed, or
> > > - a certain volume of records have been collected.
> > > + a certain volume of records has been collected.
> > I think you're correct in that "volume" is singular. But I think "collected"
> > is the wrong world. I suggested "written".
> >
> "collected" is not optimal. I suggest "created". Please avoid "written", the
> WAL records will be written when the Checkpointer is running, not before.

Actually, you're mistaken; the checkpointer hardly writes any WAL
records. In fact, it only writes *one* wal record, which is the
checkpoint record itself. All the other wal records are written either
by the backends that produce it, or by the wal writer process. By the
time the checkpoint runs, the wal records are long expected to be written.

Anyway I changed a lot of terms again, as well as changing the way the
terms are marked up -- for two reasons:

1. I didn't like the way the WAL-related entries were structured. I
created a new entry called "Write-Ahead Log", which explains what WAL
is; this replaces the term "WAL Log", which is redundant (since the L in
WAL stands for "log" already). I kept the id as glossary-wal, though,
because it's shorter and *shrug*. The definition uses the terms "wal
record" and "wal file", which I also rewrote.

2. I found out that "see xyz" and "see also" have bespoke markup in
Docbook -- <glosssee> and <glossseealso>. I changed some glossentries
to use those, removing some glossdefs and changing a couple of paras to
glossseealsos. I also removed all "id" properties from glossentries
that are just <glosssee>, because I think it's a mistake to have
references to entries that will make the reader look up a different
term; for me as a reader that's annoying, and I don't like to annoy
people.

While at it, I again came across "analytic", which is a term we don't
use much, so I made it a glosssee for "window function"; and while at it
I realized we didn't clearly explain what a window was. So I added
"window frame" for that. I considered adding the term "partition" which
is used in this context, but decided it wasn't necessary.

I also added "(process)" to terms that define processes. So now we have
"checkpointer (process)" and so on.

I rewrote the definition for "atomic" once again. Made it two
glossdefs, because I can. If you don't like this, I can undo.

I added "recycling".

I still have to go through some other defs.

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Tue, Mar 31, 2020 at 03:26:02PM -0400, Corey Huinker wrote:
> Just so I can prioritize my work, which of these things, along with your
> suggestions in previous emails, would you say is a barrier to considering
> this ready for a committer?

To answer your off-list inquiry, I'm not likely to mark it "ready" myself.
I don't know if any of these would be a "blocker" for someone else.

> > Here's some ideas; I'm *not* suggesting to include all of everything, but
> > hopefully start with a coherent, self-contained list.
>
> > grep -roh '<firstterm>[^<]*' doc/src/ |sed 's/.*/\L&/' |sort |uniq -c
> > |sort -nr |less

I looked through that list and found these that might be good to include now or
in the future. Probably all of these need language polishing; I'm not
requesting you to just copy them in just to say they're there.

join: concept of combining columns from two tables or other relations. The
result of joining a table with N rows to another table with M rows might have
up to N*M rows (if every row from the first table "joins to" every row on the
second table).

normalized: A database schema is said to be "normalized" if its redundancy has
been removed. Typically a "normalized" schema has a larger number of tables,
which include ID columns, and queries typically involve joining together
multiple tables.

query: a request sent by a client to a server, usually to return results or to
modify data on the server;

query plan: the particular procedure by which the database server executes a
query. A simple query involving a single table might be planned using a
sequential scan or an index scan. For a complex query involving multiple
tables joined together, the optimizer attempts to determine the
cheapest/fastest/best way to execute the query, by joining tables in the
optimal order, and with the optimal join strategy.

planner/optimizer: ...

transaction isolation:

psql: ...

synchronous: An action is said to be "synchronous" if it does not return to its
requestor until its completion;

bind parameters: arguments to a SQL query that are sent separately from the
query text. For example, the query text "SELECT * FROM tbl WHERE col=$1" might
be executed for some certain value of the $1 parameter. If parameters are sent
"in-line" as a part of the query text, they need to be properly
quoted/escaped/sanitized, to avoid accidental or malicious misbehavior if the
input contains special characters like semicolons or quotes.

> > Maybe also:
> > object identifier
> > operator classes
> > operator family
> > visibility map

-- Justin
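To make the "up to N*M rows" remark in the proposed join definition concrete, here is a small shell sketch that counts matching pairs with a nested loop. It only illustrates the arithmetic and says nothing about how PostgreSQL executes joins; count_join_rows is a hypothetical helper, and the key lists are made up.

```shell
#!/bin/sh
# count_join_rows LEFT_KEYS RIGHT_KEYS
# Each argument is a space-separated list of join keys, one per row.
# Prints how many (left row, right row) pairs have equal keys.
count_join_rows() {
    n=0
    for l in $1; do
        for r in $2; do
            if [ "$l" = "$r" ]; then
                n=$((n + 1))
            fi
        done
    done
    echo "$n"
}

# Every row joins to every row: N=3, M=2, so 3 * 2 result rows.
count_join_rows "1 1 1" "1 1"

# Distinct keys: each right row matches at most one left row.
count_join_rows "1 2 3" "2 3"
```

The first call prints 6 and the second prints 2, so N*M is an upper bound that is only reached when every key matches.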
On 2020-Apr-01, Justin Pryzby wrote: > planner/optimizer: ... I propose we define "planner" and make "optimizer" a <glosssee> entry. I further propose not to define the term "normalized", at least not for now. That seems a very deep rabbit hole. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
2. I found out that "see xyz" and "see also" have bespoke markup in
Docbook -- <glosssee> and <glossseealso>. I changed some glossentries
to use those, removing some glossdefs and changing a couple of paras to
glossseealsos. I also removed all "id" properties from glossentries
that are just <glosssee>, because I think it's a mistake to have
references to entries that will make the reader look up a different
term; for me as a reader that's annoying, and I don't like to annoy
people.
I rewrote the definition for "atomic" once again. Made it two
glossdefs, because I can. If you don't like this, I can undo.
On 2020-Apr-01, Corey Huinker wrote: > > I propose we define "planner" and make "optimizer" a <glosssee> entry. > > I have no objection to more entries, or edits to entries, but am concerned > that the process leads to someone having to manually merge several > start-from-scratch patches, with no clear sense of when we'll be done. I > may make sense to appoint an edit-collector. I added "query planner" (please suggest edits) and "query" (using Justin's def) and edited the defs of the ACID terms a little bit (in particular moved the definition of atomic transaction to "atomicity" from "atomic", and made the latter reference the former instead of the other way around). Also removed "Aggregating" as suggested upthread. I moved "master" over to "primary (server)", keeping the ref; we don't use the former much. There's only one "serious" mistake in the defs AFAICS which is that of "global objects". Only roles, tablespace, databases are global objects. Objects that are not in a schema (extensions, etc) are not "global" in that sense. I think all <glossterm> used in definitions should have linkend. I hope to get this committed today, but I'm going to sleep now so if you want to suggest further edits, now's the time. I think the terms proposed by Justin are good to have -- please discuss the defs he proposed -- only "normalized" I'd rather stay away from. Thanks, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
+1 and many thanks to Alvaros edits. Kind regards Jürgen Purtz
* temporarily re-added an id for glossary-row as we have many references to that. unsure if we should use the term Tuple in all those places or say Row while linking to glossary-tuple, or something else
* temporarily re-added an id for glossary-segment, glossary-wal-segment, glossary-analytic-function, as those were also referenced and will need similar decisions made
* added a stub entry for glossary-unique-index, unsure if it should have a definition on its own, or we split it into unique and index.
Attachment
On 2020-Apr-02, Corey Huinker wrote: > On Thu, Apr 2, 2020 at 8:44 AM Jürgen Purtz <juergen@purtz.de> wrote: > > > +1 and many thanks to Alvaros edits. > > > > > I did some of the grunt work Alvaro alluded to in v6, and the results are > attached and they build, which means there are no invalid links. Thank you! I had been working on some other changes myself, and merged most of your changes. I give you v8. > * renamed id glossary-temporary-tables to glossary-temporary-table Good. > * temporarily re-added an id for glossary-row as we have many references to > that. unsure if we should use the term Tuple in all those places or say Row > while linking to glossary-tuple, or something else I changed these to link to glossary-tuple; that entry already explains these two other terms, so this seems acceptable. > * temporarily re-added an id for glossary-segment, glossary-wal-segment, > glossary-analytic-function, as those were also referenced and will need > similar decisions made Ditto. > * added a stub entry for glossary-unique-index, unsure if it should have a > definition on it's own, or we split it into unique and index. I changed Unique Index into Unique Constraint, which is supposed to be the overarching concept. Used that in the definition of primary key. > * I noticed several cases where a glossterm is used twice in a definition, > but didn't de-term them Did that for most I found, but I expect that some remain. > * I'm curious about how we should tag a term when using it in its own > definition. same as anywhere else? I think we should not tag those. I fixed the definition of global object as mentioned previously. Also added "client", made "connection" have less importance compared to "session", and removed "window frame" (made "window function" refer to "partition" instead). If you (or anybody) have suggestions for the definition of "client" and "session", I'm all ears. I'm quite liking the result of this now. Thanks for all your efforts. 
-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Thu, Apr 02, 2020 at 07:09:32PM -0300, Alvaro Herrera wrote: > "partition" instead). If you (or anybody) have suggestions for the > definition of "client" and "session", I'm all ears. We already have Session: A Connection to the Database. I propose: Client: A host (or a process on a host) which connects to a server to make queries or other requests. But note, "host" is still defined as "server", which I didn't like. Maybe it should be: A computer which may act as a >client< or a >server<. -- Justin
Pushed now. Many thanks to Corey who put the main thrust, and to Jürgen and Roger for the great help, and to Justin for the extensive review and Fabien for the initial discussion.

This is just a starting point. Let's keep improving it. And now that we have it, we can start thinking of patching the main part of the docs to make reference to it by using <glossterm> in key spots. Right now the glossary links to itself, but it makes lots of sense to have other places point to it.

On 2020-Apr-02, Justin Pryzby wrote:
> We already have Session:
> A Connection to the Database.

Yes, but I didn't like that much, so I rewrote it -- I was asking for suggestions on how to improve it further. While I think we use those terms (connection and session) interchangeably sometimes, they're not exactly the same and the glossary should be more precise or at least less vague about the distinction.

> I propose: Client:
> A host (or a process on a host) which connects to a server to make
> queries or other requests.
>
> But note, "host" is still defined as "server", which I didn't like.
>
> Maybe it should be:
> A computer which may act as a >client< or a >server<.

I changed all these terms, and a few others, added a couple more and commented out some that I was not happy with, and pushed. I think this still needs more work:

* We had "serializable", but none of the other isolation levels were defined. If we think we should define them, let's define them all. But also the definition we had for serializable was not correct; it seemed more suited to define "repeatable read".

* I commented out the definition of "sequence", which seemed to go into excessive detail. Let's have a more concise definition?

* We're missing exclusion constraints, and NOT NULL which is also a weird type of constraint.

Patches for these omissions, and other contributions, welcome.

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thanks for all your work on this!
On 2020-04-03 18:45, Alvaro Herrera wrote:
> Pushed now. Many thanks to Corey who put the main thrust, and to Jürgen
> and Roger for the great help, and to Justin for the extensive review and
> Fabien for the initial discussion.

A few improvements:

'its value that cannot' should be 'its value cannot'
'A newly created Cluster' should be 'A newly created cluster'
'term Cluster' should be 'term cluster'
'allowed within a Table.' should be 'allowed within a table.'
'of a SQL data type.' should be 'of an SQL data type.'
'A SQL command' should be 'An SQL command'
'i.e. the data' should be 'i.e., the data'
'that a a view is' should be 'that a view is'
'One of the tables that each contain part' should be 'One of multiple tables that each contain part'
'a partition is a user-defined criteria' should be 'a partition is a user-defined criterion'
'Roless are' should be 'Roles are'
'undo all of the operations' should be 'undo all operations'
'A special mark inside the sequence of steps' should be 'A special mark in the sequence of steps'
'are enforced unique' should be (?) 'are enforced to be unique'
'the term Schema is used' should be 'the term schema is used'
'belong to exactly one Schema.' should be 'belong to exactly one schema.'
'about the Cluster's activities' should be 'about the cluster's activities'
'the most common form of Relation' should be 'the most common form of relation'
'A Trigger executes' should be 'A trigger executes'
'and other closely related garbage-collection-like processing' should be 'and other processing'
'each of the changes are replayed' should be 'each of the changes is replayed'

Should also be a lemmata in the glossary:

ACID

'archaic' should maybe be 'obsolete'. That seems to me to be an easier word for non-native speakers.

Thanks,

Erik Rijkers
On 2020-Apr-03, Erik Rijkers wrote: > On 2020-04-03 18:45, Alvaro Herrera wrote: > > Pushed now. Many thanks to Corey who put the main thrust, and to Jürgen > > and Roger for the great help, and to Justin for the extensive review and > > Fabien for the initial discussion. > > A few improvements: Thanks! That gives me the attached patch. > Should also be a lemmata in the glossary: > > ACID Agreed. Wording suggestions welcome. > 'archaic' should maybe be 'obsolete'. That seems to me to be an easier word > for non-native speakers. Bummer ;-) -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Fri, Apr 03, 2020 at 05:51:43PM -0300, Alvaro Herrera wrote: > - The internal representation of one value of a <acronym>SQL</acronym> > + The internal representation of one value of an <acronym>SQL</acronym> I'm not sure about this one. The new glossary says "a SQL" seven times, and doesn't say "an sql" at all. "An SQL" does appear to be more common in the rest of the docs, but if you change one, I think you'd change them all. BTW it's now visible at: https://www.postgresql.org/docs/devel/glossary.html -- Justin
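A count like the one Justin mentions can be reproduced with a short script. This is a minimal sketch, not the method used in the thread: the function name and the sample sentence are made up for illustration, and it only sees plain text, so an article followed by markup such as `<acronym>SQL</acronym>` would need the tags stripped first.

```python
import re


def count_articles(text):
    # Count "a SQL" and "an SQL" as whole words, ignoring case. The \b
    # before the article keeps "data SQL" from being counted as "a SQL".
    a_count = len(re.findall(r"\ba\s+SQL\b", text, flags=re.IGNORECASE))
    an_count = len(re.findall(r"\ban\s+SQL\b", text, flags=re.IGNORECASE))
    return {"a SQL": a_count, "an SQL": an_count}


# Hypothetical sample text, not taken from the glossary.
sample = "A SQL command. One value of an SQL data type. Run a SQL query."
counts = count_articles(sample)
```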
On 2020-04-03 22:51, Alvaro Herrera wrote: > On 2020-Apr-03, Erik Rijkers wrote: > >> On 2020-04-03 18:45, Alvaro Herrera wrote: >> > Pushed now. Many thanks to Corey who put the main thrust, and to Jürgen >> > and Roger for the great help, and to Justin for the extensive review and >> > Fabien for the initial discussion. >> >> A few improvements: > > Thanks! That gives me the attached patch. > >> Should also be a lemmata in the glossary: >> >> ACID > > Agreed. Wording suggestions welcome. How about: " ACID Atomicity, consistency, isolation, and durability. ACID is a set of properties of database transactions intended to guarantee validity even in the event of power failures, etc. ACID is concerned with how the database recovers from such failures that might occur while processing a transaction. " >> 'archaic' should maybe be 'obsolete'. That seems to me to be an easier >> word >> for non-native speakers. > > Bummer ;-) OK - we'll figure it out :) > -- > Álvaro Herrera https://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, 2020-04-03 at 16:01 -0500, Justin Pryzby wrote:
> BTW it's now visible at:
> https://www.postgresql.org/docs/devel/glossary.html

Great! Some comments:

- SQL object: There are more kinds of objects, like roles or full text dictionaries.
  Perhaps better:
  Anything that is created with a CREATE statement, for example ...
  Most objects belong to a database schema, except ...
  Or do we consider a replication slot to be an object?

- The glossary has "Primary (server)", but not "Standby (server)".
  That should be a synonym for "Replica".

- Server: is that really our definition?
  I thought that "server" is what the glossary defines as "instance", and the thing called "server" in the glossary should really be called "host".

  Maybe I am too Unix-centered.

  Many people I know use "instance" synonymous to "cluster".

- Role: I understand the motivation behind the definition (except that the word "instance" is ill chosen), but a role is more than a collection of privileges. How can a collection of privileges have a password or own an object?
  Perhaps, instead of the first sentence:
  A database object used for authentication, authorization and ownership.
  Both database users and user groups are "roles" in PostgreSQL.

  In the second sentence, "roles" is mis-spelled as "roless".

- Null
  I think it should say "It represents the absence of *a definite* value."
  Usually it is better to think of NULL as "unknown".

- Function
  I don't know if "transformation of data" describes it well. Quite a lot of functions in PostgreSQL have side effects.
  How about:
  Procedural code stored in the database that can be used in SQL statements.

Yours,
Laurenz Albe
> BTW it's now visible at:
> https://www.postgresql.org/docs/devel/glossary.html

Awesome! Linking between defs and to relevant sections is great.

BTW, I'm in favor of "an SQL" because I pronounce it "ess-kew-el", but I guess that people who say "sequel" would prefer "a SQL". Failing that, I'm fine with some heterogeneity, life is diverse!

ISTM that occurrences of these words elsewhere in the documentation should link to the glossary definitions?

As the definitions are short and to the point, maybe the HTML display could (also) "hover" the definitions when the mouse passes over the word, using the "title" attribute?

"ACID" does not appear as an entry, nor in the acronyms sections. Also no DCL, although DML & DDL are in acronyms.

Entries could link to relevant wikipedia pages, like the acronyms section does?

-- Fabien.
> - Server: is that really our definition?
>   I thought that "server" is what the glossary defines as "instance", and
>   the thing called "server" in the glossary should really be called "host".
>
>   Maybe I am too Unix-centered.
>
>   Many people I know use "instance" synonymous to "cluster".

Currently our documentation uses 'server', 'database server', 'host', 'instance', ... in an indifferent way. Similar problem with database/cluster. Now we have the chance to come to a conclusion about preferred terms and their exact meaning. Definitions in the glossary shall be the guideline, the documentation itself can adopt these terms over time.

Here is my point of view. We have distinguishable things:

(1) (virtual) hardware
(2) an abstract structure of several object types, which models a management system for data
(3) a group of closely related processes. They implement the internal 'business logic' or 'work flow' of (2).
(4) abstract data, which fits into (2)
(5) a physical representation of (4). Mainly and long lasting on disc, but - partly - mirrored in RAM.
(6) client processes, which connect to (3)

IMO for (1) the two terms 'server' and 'host' both have their justification, depending on the context. There are historical terms ('server-side', 'foreign server', 'client/server architecture', 'host' or 'host name' for IP-specification, 'host variable') which cannot be changed. Therefore we shall accept both with identical definition and use them as synonyms. Independent from this, there are many paragraphs in the documentation, where they are used in a misleading sense ('server crash', '... started the server', 'database server'). They should be changed over time.

For me, (3) is an 'instance' and (5) is a 'cluster'. There is a 1:1 relation between the two, because one 'instance' controls exactly one 'cluster'. But the 'instance' consists of processes and memory whereas the 'cluster' consists of databases which reside (mainly) on disc.

Concerning (6) we are not interested in any hardware-question. We are only interested in the processes, which connect to backend processes. We should only define the term "Client process".

Kind regards, Jürgen
Hi Corey,

>> ISTM that occurrences of these words elsewhere in the documentation should
>> link to the glossary definitions?
>
> Yes, that's a big project. I was considering writing a script to compile
> all the terms as search terms, paired with their glossary ids, and then
> invoke git grep to identify all pages that have term FOO but don't have
> glossary-foo. We would then go about gloss-linking those pages as
> appropriate, but only a few pages at a time to keep scope sane.

I'd go for scripting the thing.

Should the glossary be backpatched, to possibly ease doc backpatches?

> Also, I'm unclear about the circumstances under which we should _not_
> tag a term.

At least when they are explained locally.

> I remember hearing that we should only tag it on the first usage, but is
> that per section or per page?

Page?

>> As the definitions are short and to the point, maybe the HTML display
>> could (also) "hover" the definitions when the mouse passes over the word,
>> using the "title" attribute?
>
> I like that idea, if it doesn't conflict with accessibility standards
> (maybe that's just titles on images, not sure).

The following worked fine:

<html><head><title>Title Tag Test</title></head>
<body>The <a href="acid.html" title="ACID stands for Atomic, Consistent, Isolated & Durable">ACID</a>
property is great.
</body></html>

So basically the def can be put on the glossary link, however retrieving the definition should be automatic.

> I suspect we would want to just carry over the first sentence or so with a
> ... to avoid cluttering the screen with my overblown definition of a
> sequence.

Dunno. The definitions are quite short, maybe they can fit whole.

> I suggest we pursue this idea in another thread, as we'd probably want to
> do it for acronyms as well.

Or not. I'd test committer temperature before investing time because it would mean that backpatching the doc would be a little harder.

>> Entries could link to relevant wikipedia pages, like the acronyms section
>> does?
>
> They could. I opted not to do that because each external link invites
> debate about how authoritative that link is, which is easier to do with
> acronyms. Now that the glossary is a reality, it's easier to have those
> discussions.

Ok.

-- Fabien.
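The script Corey describes (find pages that mention term FOO but never reference glossary-foo) might look roughly like the sketch below. It is not his script: the function name, the in-memory `pages` dictionary (instead of invoking `git grep` on doc/src/) and the sample file names are assumptions made to keep the example self-contained.

```python
import re


def pages_missing_links(terms, pages):
    """Return {page: [term, ...]} for pages that mention a term but never
    reference its glossary id.

    terms: dict mapping a term to its glossary id, e.g. "tuple" -> "glossary-tuple"
    pages: dict mapping a page name to its source text
    """
    missing = {}
    for page, text in sorted(pages.items()):
        for term, gloss_id in sorted(terms.items()):
            # Whole-word, case-insensitive match on the term itself.
            mentions = re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
            # A page counts as linked if any element points at the glossary id.
            linked = 'linkend="' + gloss_id + '"' in text
            if mentions and not linked:
                missing.setdefault(page, []).append(term)
    return missing


# Hypothetical inputs for illustration only.
terms = {"tuple": "glossary-tuple", "schema": "glossary-schema"}
pages = {
    "ddl.sgml": 'Each <glossterm linkend="glossary-schema">schema</glossterm> '
                "holds tables; a tuple is a row.",
    "mvcc.sgml": "Nothing relevant here.",
}
result = pages_missing_links(terms, pages)
```

The output is only a list of candidates; whether a given occurrence should be tagged (first use per page, not when explained locally) remains a manual decision, as discussed above.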
a) Some rearrangements of the sequence of terms to meet alphabetical order. b) <glossterm id="linkend-xxx"> --> <glossterm linkend="glossary-xxx"> in two cases. Or should it be a <firstterm>? Kind regards, Jürgen
Attachment
On 2020-Apr-05, Jürgen Purtz wrote: > a) Some rearrangements of the sequence of terms to meet alphabetical order. Thanks, will get this pushed. > b) <glossterm id="linkend-xxx"> --> <glossterm linkend="glossary-xxx"> in > two cases. Or should it be a <firstterm>? Ah, yeah, those should be linkend. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2020-Apr-05, Fabien COELHO wrote: > > > As the definitions are short and to the point, maybe the HTML display > > > could (also) "hover" the definitions when the mouse passes over the word, > > > using the "title" attribute? > > > > I like that idea, if it doesn't conflict with accessibility standards > > (maybe that's just titles on images, not sure). > > The following worked fine: > > <html><head><title>Title Tag Test</title></head> > <body>The <a href="acid.html" title="ACID stands for Atomic, Consistent, Isolated & Durable">ACID</a> > property is great. > </body></html> I don't see myself patching the stylesheet as would be needed to do this. > > I suggest we pursue this idea in another thread, as we'd probably want to > > do it for acronyms as well. > > Or not. I'd test committer temperature before investing time because it > would mean that backpatching the doc would be a little harder. TBH I can't get very excited about this idea. Maybe other documentation champions would be happier about doing that. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
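For reference, generating the hover markup quoted above "automatically" could be sketched as follows. This is an assumption-laden illustration, not a stylesheet patch: the `glossary_link` helper and its `max_len` cut-off are hypothetical, and the sketch escapes the `&` in the definition, which the hand-written test page left raw.

```python
from html import escape


def glossary_link(term, href, definition, max_len=80):
    # Shorten long definitions so the tooltip stays small: cut at max_len
    # characters and mark the cut with "...".
    if len(definition) > max_len:
        definition = definition[:max_len].rstrip() + "..."
    # escape(..., quote=True) turns & < > " ' into entities, so the text is
    # safe inside a double-quoted attribute.
    return '<a href="{}" title="{}">{}</a>'.format(
        escape(href, quote=True), escape(definition, quote=True), escape(term)
    )


link = glossary_link(
    "ACID",
    "acid.html",
    "ACID stands for Atomic, Consistent, Isolated & Durable",
)
```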
> On 2020-Apr-05, Jürgen Purtz wrote: > >> a) Some rearrangements of the sequence of terms to meet alphabetical order. > Thanks, will get this pushed. > >> b) <glossterm id="linkend-xxx"> --> <glossterm linkend="glossary-xxx"> in >> two cases. Or should it be a <firstterm>? > Ah, yeah, those should be linkend. > Term 'relation': A sequence is internally a table with one row - right? Shall we extend the list of concrete relations by 'sequence'? Or is this not necessary because 'table' is already there? Kind regards, Jürgen
> Term 'relation': A sequence is internally a table with one row - right?
> Shall we extend the list of concrete relations by 'sequence'? Or is this
> not necessary because 'table' is already there?

I wrote one for sequence, it was a bit math-y for Alvaro's taste, so we're going to try again.
This seems to be a misunderstanding. My question was whether we shall extend the definition of Relation to: "... Tables, views, foreign tables, materialized views, indexes, and *sequences* are all relations."
Kind regards, Jürgen
Why are all the glossary terms capitalized? Seems kind of strange. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Thanks everybody. I have compiled together all the suggestions and the result is in the attached patch. Some of it is of my own devising.

* I changed "instance", and made "cluster" be mostly a synonym of that.
* I removed "global SQL object" and made "SQL object" explain it.
* Added definitions for ACID, sequence, bloat, fork, FSM, VM, data page, transaction ID, epoch.
* Changed "a SQL" to "an sql" everywhere.
* Sorted alphabetically.
* Removed caps in term names.

I think I should get this pushed, and if there are further suggestions, they're welcome.

Dim Fontaine and others suggested a number of terms that could be included; see https://twitter.com/alvherre/status/1246192786287865856

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Thu, May 14, 2020 at 08:00:17PM -0400, Alvaro Herrera wrote:
> + <glossterm>ACID</glossterm>
> + <glossdef>
> + <para>
> + <glossterm linkend="glossary-atomicity">Atomicity</glossterm>,
> + <glossterm linkend="glossary-consistency">consistency</glossterm>,
> + <glossterm linkend="glossary-isolation">isolation</glossterm>, and
> + <glossterm linkend="glossary-durability">durability</glossterm>.
> + A set of properties of database transactions intended to guarantee validity
> + in concurrent operation and even in event of errors, power failures, etc.

I would capitalize Consistency, Isolation, Durability, and say "These four properties" or "This set of four properties" (although that makes this sound more like a fun game of DBA jeopardy).

> + <glossterm>Background writer (process)</glossterm>
> <glossdef>
> <para>
> - A process that continuously writes dirty pages from
> + A process that continuously writes dirty

I don't like "continuously"

> + <glossterm linkend="glossary-data-page">data pages</glossterm> from
>
> + <glossentry id="glossary-bloat">
> + <glossterm>Bloat</glossterm>
> + <glossdef>
> + <para>
> + Space in data pages which does not contain relevant data,
> + such as unused (free) space or outdated row versions.

"current row versions" instead of relevant ?

> + <glossentry id="glossary-data-page">
> + <glossterm>Data page</glossterm>
> + <glossdef>
> + <para>
> + The basic structure used to store relation data.
> + All pages are of the same size.
> + Data pages are typically stored on disk, each in a specific file,
> + and can be read to <glossterm linkend="glossary-shared-memory">shared buffers</glossterm>
> + where they can be modified, becoming
> + <firstterm>dirty</firstterm>. They get clean by being written down

say "They become clean when written to disk"

> + to disk. New pages, which initially exist in memory only, are also
> + dirty until written.

> + <glossentry id="glossary-fork">
> + <glossterm>Fork</glossterm>
> + <glossdef>
> + <para>
> + Each of the separate segmented file sets that a relation stores its
> + data in. There exist a <firstterm>main fork</firstterm> and two secondary

"in which a relation's data is stored"

> + forks: the <glossterm linkend="glossary-fsm">free space map</glossterm>
> + <glossterm linkend="glossary-vm">visibility map</glossterm>.

missing "and" ?

> + <glossentry id="glossary-fsm">
> + <glossterm>Free space map (fork)</glossterm>
> + <glossdef>
> + <para>
> + A storage structure that keeps metadata about each data page in a table's
> + main storage space.

s/in/of/
just say "main fork"?

> The free space map entry for each space stores the

for each page ?

> + amount of free space that's available for future tuples, and is structured
> + so it is efficient to search for available space for a new tuple of a given
> + size.

..to be efficiently searched to find free space..

> The heap is realized within
> - <glossterm linkend="glossary-file-segment">segment files</glossterm>.
> + <glossterm linkend="glossary-file-segment">segmented files</glossterm>
> + in the relation's <glossterm linkend="glossary-fork">main fork</glossterm>.

Hm, the files aren't segmented. Say "one or more file segments per relation"

> + There also exist local objects that do not belong to schemas; some examples are
> + <glossterm linkend="glossary-extension">extensions</glossterm>,
> + <glossterm linkend="glossary-cast">data type casts</glossterm>, and
> + <glossterm linkend="glossary-foreign-data-wrapper">foreign data wrappers</glossterm>.

Don't extensions have schemas ?

> + <glossentry id="glossary-xid">
> + <glossterm>Transaction ID</glossterm>
> + <glossdef>
> + <para>
> + The numerical, unique, sequentially-assigned identifier that each
> + transaction receives when it first causes a database modification.
> + Frequently abbreviated <firstterm>xid</firstterm>.

abbreviated *as* xid

> + approximately four billion write transactions IDs can be generated;
> + to permit the system to run for longer than that would allow,

remove "would allow"

> <para>
> The process of removing outdated <glossterm linkend="glossary-tuple">tuple
> versions</glossterm> from tables, and other closely related

actually tables or materialized views..

> + <glossentry id="glossary-vm">
> + <glossterm>Visibility map (fork)</glossterm>
> + <glossdef>
> + <para>
> + A storage structure that keeps metadata about each data page
> + in a table's main storage space. The visibility map entry for

s/in/of/
main fork?

-- Justin
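The "approximately four billion" figure in the quoted transaction ID definition follows from simple arithmetic. The sketch below assumes a 32-bit counter and uses the "epoch" term mentioned elsewhere in the thread as a wraparound count; the constant and function names are invented for illustration and the special reserved values a real server keeps are left out.

```python
XID_BITS = 32               # assumption: transaction IDs are 32-bit counters
XID_SPACE = 2 ** XID_BITS   # 4294967296, i.e. "approximately four billion"


def next_xid(xid):
    # Advance the counter, wrapping to 0 at the end of the 32-bit space.
    # (Reserved low values are deliberately not modelled here.)
    return (xid + 1) % XID_SPACE


def full_xid(epoch, xid):
    # Combine a wraparound count ("epoch") with the 32-bit xid to get a
    # value that keeps growing across wraparounds.
    return epoch * XID_SPACE + xid
```

For example, `next_xid(XID_SPACE - 1)` wraps to 0, while `full_xid(1, 0)` continues from where `full_xid(0, XID_SPACE - 1)` left off.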
Applied all these suggestions, and made a few additional very small edits, and pushed -- better to ship what we have now in beta1, but further edits are still possible.

Other possible terms to define, including those from the tweet I linked to and a couple more:

archive
availability
backup
composite type
common table expression
data type
domain
dump
export
fault tolerance
GUC
high availability
hot standby
LSN
restore
secondary server (?)
snapshot
transactions per second

Anybody want to try their hand at a tentative definition?

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 2020-05-15 19:26, Alvaro Herrera wrote: > Applied all these suggestions, and made a few additional very small > edits, and pushed -- better to ship what we have now in beta1, but > further edits are still possible. I've gone through the glossary as committed and found some more small things; patch attached. Thanks, Erik Rijkers > Other possible terms to define, including those from the tweet I linked > to and a couple more: > > archive > availability > backup > composite type > common table expression > data type > domain > dump > export > fault tolerance > GUC > high availability > hot standby > LSN > restore > secondary server (?) > snapshot > transactions per second > > Anybody want to try their hand at a tentative definition? > > -- > Álvaro Herrera https://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 2020-May-16, Erik Rijkers wrote: > On 2020-05-15 19:26, Alvaro Herrera wrote: > > Applied all these suggestions, and made a few additional very small > > edits, and pushed -- better to ship what we have now in beta1, but > > further edits are still possible. > > I've gone through the glossary as committed and found some more small > things; patch attached. All pushed! Many thanks, -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 15.05.20 02:00, Alvaro Herrera wrote: > Thanks everybody. I have compiled together all the suggestions and the > result is in the attached patch. Some of it is of my own devising. > > * I changed "instance", and made "cluster" be mostly a synonym of that. In my understanding, "instance" and "cluster" should be different things, not only synonyms. "instance" can be the term for permanently fluctuating objects (processes and RAM) and "cluster" can denote the more static objects (directories and files). What do you think? If you agree, I would create a patch. > * I removed "global SQL object" and made "SQL object" explain it. +1., but see the (huge) different spellings in patch. bloat: changed 'current row' to 'relevant row' because not only the youngest one is relevant (non-bloat). data type casts: Are you sure that they are global? In pg_cast 'relisshared' is 'false'. -- Jürgen Purtz
Attachment
On 2020-May-17, Jürgen Purtz wrote: > On 15.05.20 02:00, Alvaro Herrera wrote: > > Thanks everybody. I have compiled together all the suggestions and the > > result is in the attached patch. Some of it is of my own devising. > > > > * I changed "instance", and made "cluster" be mostly a synonym of that. > In my understanding, "instance" and "cluster" should be different things, > not only synonyms. "instance" can be the term for permanently fluctuating > objects (processes and RAM) and "cluster" can denote the more static objects > (directories and files). What do you think? If you agree, I would create a > patch. I don't think that's the general understanding of those terms. For all I know, they *are* synonyms, and there's no specific term for "the fluctuating objects" as you call them. The instance is either running (in which case there are processes and RAM) or it isn't. > > * I removed "global SQL object" and made "SQL object" explain it. > +1., but see the (huge) different spellings in patch. This seems a misunderstanding of what "local" means. Any object that exists in a database is local, regardless of whether it exists in a schema or not. "Extensions" is one type of object that does not belong in a schema. "Foreign data wrapper" is another type of object that does not belong in a schema. Same with data type casts. They are *not* global objects. > bloat: changed 'current row' to 'relevant row' because not only the youngest > one is relevant (non-bloat). Hm. TBH I'm not sure of this term at all. I think we sometimes use the term "bloat" to talk about the dead rows only, ignoring the free space. > data type casts: Are you sure that they are global? In pg_cast 'relisshared' > is 'false'. I'm not saying they're global. I'm saying they're outside schemas. Maybe this definition needs more rewording, if this bit is unclear. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 17.05.20 08:51, Alvaro Herrera wrote: > Any object that > exists in a database is local, regardless of whether it exists in a > schema or not. This implies that the term "local" is unnecessary; just call them "SQL object". > "Extensions" is one type of object that does not belong > in a schema. "Foreign data wrapper" is another type of object that does > not belong in a schema. ... They are *not* > global objects. postgres_fdw is a module among many others. It's only an example of an "extension" and is not different in nature. Yes, they are not global SQL objects because they don't belong to the cluster. In summary we have 3 types of objects: belonging to a schema, to a database, or to the cluster (global). Maybe we can avoid the different names 'local SQL object' and 'global SQL object' altogether and just call them 'SQL object'. 'global SQL object' is used only once. We could rephrase "A set of databases and accompanying global SQL objects ... " to "A set of databases and accompanying SQL objects, which exist at the cluster level, ... " > TBH I'm not sure of this term at all. I think we sometimes use the > term "bloat" to talk about the dead rows only, ignoring the free space. That's a good example of the need for the glossary. Currently we don't have a common understanding of all the terms we use. The glossary should fix that and give a binding definition, after a clarifying discussion. -- Jürgen Purtz
> > On 15.05.20 02:00, Alvaro Herrera wrote:
> > > Thanks everybody. I have compiled together all the suggestions and the result is in the attached patch. Some of it is of my own devising.
> > > * I changed "instance", and made "cluster" be mostly a synonym of that.
> > In my understanding, "instance" and "cluster" should be different things, not only synonyms. "instance" can be the term for permanently fluctuating objects (processes and RAM) and "cluster" can denote the more static objects (directories and files). What do you think? If you agree, I would create a patch.
> I don't think that's the general understanding of those terms. For all I know, they *are* synonyms, and there's no specific term for "the fluctuating objects" as you call them. The instance is either running (in which case there are processes and RAM) or it isn't.
We have the basic tools "initdb — create a new PostgreSQL database cluster" which affects nothing but files, and we have "pg_ctl — initialize, start, stop, or control a PostgreSQL server" which - directly - affects nothing but processes and RAM. (Here the term "server" collides with new definitions in the glossary. But that's another story.)
--
Jürgen Purtz
On 2020-05-17 08:51, Alvaro Herrera wrote: > On 2020-May-17, Jürgen Purtz wrote: > >> On 15.05.20 02:00, Alvaro Herrera wrote: >> > Thanks everybody. I have compiled together all the suggestions and the >> > >> > * I changed "instance", and made "cluster" be mostly a synonym of that. >> In my understanding, "instance" and "cluster" should be different >> things, > > I don't think that's the general understanding of those terms. For all > I know, they *are* synonyms, and there's no specific term for "the > fluctuating objects" as you call them. The instance is either running > (in which case there are processes and RAM) or it isn't. For what it's worth, I've also always understood 'instance' as 'a running database'. I admit it might be a left-over from my oracle years: https://docs.oracle.com/cd/E11882_01/server.112/e40540/startup.htm#CNCPT601 There, 'instance' clearly refers to a running database. When that database is stopped, it ceases to be an instance. I've always understood this to be the same for the PostgreSQL 'instance'. Once stopped, it is no longer an instance, but it is, of course, still a cluster. I know, we don't have to do the same as Oracle, but clearly it's going to be an ongoing source of misunderstanding if we define such a high-level term differently. Erik Rijkers
On 2020-May-17, Erik Rijkers wrote: > On 2020-05-17 08:51, Alvaro Herrera wrote: > > I don't think that's the general understanding of those terms. For all > > I know, they *are* synonyms, and there's no specific term for "the > > fluctuating objects" as you call them. The instance is either running > > (in which case there are processes and RAM) or it isn't. > > For what it's worth, I've also always understood 'instance' as 'a running > database'. I admit it might be a left-over from my oracle years: > > https://docs.oracle.com/cd/E11882_01/server.112/e40540/startup.htm#CNCPT601 > > There, 'instance' clearly refers to a running database. When that database > is stopped, it ceases to be an instance. I've never understood it that way, but I'm open to having my opinion on it changed. So let's discuss it and maybe gather opinions from others. I think the terms under discussion are just * cluster * instance * server We don't have "host" (I just made it a synonym for server), but perhaps we can add that too, if it's useful. It would be good to be consistent with historical Postgres usage, such as the initdb usage of "cluster" etc. Perhaps we should not only define what our use of each term is, but also explain how each term is used outside PostgreSQL and highlight the differences. (This would be particularly useful for "cluster" ISTM.) It seems difficult to get this sorted out before beta1, but there's still time before the glossary is released. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On 2020-May-17, Erik Rijkers wrote:
> > On 2020-05-17 08:51, Alvaro Herrera wrote:
> > > I don't think that's the general understanding of those terms. For all I know, they *are* synonyms, and there's no specific term for "the fluctuating objects" as you call them. The instance is either running (in which case there are processes and RAM) or it isn't.
> > For what it's worth, I've also always understood 'instance' as 'a running database'. I admit it might be a left-over from my oracle years:
> > https://docs.oracle.com/cd/E11882_01/server.112/e40540/startup.htm#CNCPT601
> > There, 'instance' clearly refers to a running database. When that database is stopped, it ceases to be an instance.
> I've never understood it that way, but I'm open to having my opinion on it changed. So let's discuss it and maybe gather opinions from others.
> I think the terms under discussion are just
> * cluster
> * instance
> * server
> We don't have "host" (I just made it a synonym for server), but perhaps we can add that too, if it's useful.
> It would be good to be consistent with historical Postgres usage, such as the initdb usage of "cluster" etc. Perhaps we should not only define what our use of each term is, but also explain how each term is used outside PostgreSQL and highlight the differences. (This would be particularly useful for "cluster" ISTM.)
In fact, we have reached a point where we don't have a common understanding of a group of terms. I'm sure that we will meet more situations like this in the future. Such discussions, subsequent decisions, and implementations in the docs are necessary to build a solid foundation - primarily for newcomers (which is my main motivation), but also for more complex discussions among experts. Obviously, each of us brings along his previous understanding of the terms. But we should also be open to revising old terms now and then.
Here are my two cents.
cluster/instance: PG (mainly) consists of a group of processes that commonly act on shared buffers. The processes are very closely related to each other and to the buffers. They exist all together or not at all. They use a common initialization file and are started by one command. Everything exists solely in RAM and therefore has a fluctuating nature. In summary: they form a unit, and this unit needs a name of its own. On some pages we use the term *instance* - sometimes in extended forms: *database instance*, *PG instance*, *standby instance*, *standby server instance*, *server instance*, or *remote instance*. For me, the term *instance* makes sense, and so do the extensions *standby instance* and *remote instance* in their context.
The next essential component is the data itself. It is organized as a group of databases plus some common management information (global, pg_wal, pg_xact, pg_tblspc, ...). The complete data must be treated as a whole because the management information concerns all databases. Its nature is different from the processes and shared buffers. Of course, its content changes, but it has a steady nature. It even survives a 'power down'. There is one command to instantiate a new incarnation of the directory structure and all files. In summary, it's something of its own and should have its own name. 'database' is not possible because it consists of databases and other things. My favorite is *cluster*; *database cluster* is also possible.
server/host: We need a term to describe the underlying hardware, or the virtual machine or container, where PG is running. I suggest using both *server* and *host*. In computer science, both are legitimate and widely used. Everybody understands *client/server architecture* or *host* in TCP/IP configuration. We cannot change such established usage. I suggest using both depending on the context, but with the same meaning: "real hardware, a container, or a virtual machine".
--
Jürgen Purtz
(PS: I added the docs mailing list)
On Mon, 2020-05-18 at 18:08 +0200, Jürgen Purtz wrote:
> cluster/instance: PG (mainly) consists of a group of processes that commonly
> act on shared buffers. The processes are very closely related to each other
> and with the buffers. They exist altogether or not at all. They use a common
> initialization file and are incarnated by one command. Everything exists
> solely in RAM and therefor has a fluctuating nature. In summary: they build
> a unit and this unit needs to have a name of itself. In some pages we used
> to use the term *instance* - sometimes in extended forms: *database instance*,
> *PG instance*, *standby instance*, *standby server instance*, *server instance*,
> or *remote instance*. For me, the term *instance* makes sense, the extensions
> *standby instance* and *remote instance* in their context too.
FWIW, I feel somewhat like Alvaro on that point; I use those terms synonymously,
perhaps distinguishing between a "started cluster" and a "stopped cluster".
After all, "cluster" refers to "a cluster of databases", which are there, regardless
if you start the server or not.
The term "cluster" is unfortunate, because to most people it suggests a group of
machines, so the term "instance" is better, but that ship has sailed long ago.
The static part of a cluster to me is the "data directory".
> server/host: We need a term to describe the underlying hardware respectively
> the virtual machine or container, where PG is running. I suggest to use both
> *server* and *host*. In computer science, both have their eligibility and are
> widely used. Everybody understands *client/server architecture* or *host* in
> TCP/IP configuration. We cannot change such matter of course. I suggest to
> use both depending on the context, but with the same meaning: "real hardware,
> a container, or a virtual machine".
On this I have a strong opinion because of my Unix mindset.
"machine" and "host" are synonyms, and it doesn't matter to the database if they
are virtualized or not. You can always disambiguate by adding "virtual" or "physical".
A "server" is a piece of software that responds to client requests, never a machine.
In my book, this is purely Windows jargon. The term "client-server architecture"
that you quote emphasized that.
Perhaps "machine" would be the preferable term, because "host" is more prone to
misunderstandings (except in a networking context).
Yours,
Laurenz Albe
On 2020-05-19 08:17, Laurenz Albe wrote: > The term "cluster" is unfortunate, because to most people it suggests a group of > machines, so the term "instance" is better, but that ship has sailed long ago. I don't see what would stop us from renaming some things, with some care. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> On Mon, 2020-05-18 at 18:08 +0200, Jürgen Purtz wrote:
> > cluster/instance: PG (mainly) consists of a group of processes that commonly act on shared buffers. The processes are very closely related to each other and with the buffers. They exist altogether or not at all. They use a common initialization file and are incarnated by one command. Everything exists solely in RAM and therefor has a fluctuating nature. In summary: they build a unit and this unit needs to have a name of itself. In some pages we used to use the term *instance* - sometimes in extended forms: *database instance*, *PG instance*, *standby instance*, *standby server instance*, *server instance*, or *remote instance*. For me, the term *instance* makes sense, the extensions *standby instance* and *remote instance* in their context too.
> FWIW, I feel somewhat like Alvaro on that point; I use those terms synonymously, perhaps distinguishing between a "started cluster" and a "stopped cluster". After all, "cluster" refers to "a cluster of databases", which are there, regardless if you start the server or not.
> The term "cluster" is unfortunate, because to most people it suggests a group of machines, so the term "instance" is better, but that ship has sailed long ago.
> The static part of a cluster to me is the "data directory".
cluster/instance: The different nature (static/dynamic) of what I call "cluster" and "instance", as well as the existence of the two commands "initdb — create a new PostgreSQL database cluster" and "pg_ctl — initialize, start, stop, or control a PostgreSQL server", confirms my opinion that we need two different terms for them. Those two terms should not be synonyms of each other; they label distinct things. If people prefer "data directory" instead of "cluster", that is ok for me.
There are situations where we need a single term for both of them. "Instance and its data directory" or "Instance and its cluster" are too wordy. In many cases we use "database server" or "server" in this sense. Imo "Server" is too short and ambiguous. "database server", the plural form "databases server", or the new term "cluster server", which is more accurate, would be ok for me. (Similar to "server", the term "cluster" is also used in many different contexts - but only outside of the PG world; within our context "cluster" is not ambiguous.)
> > server/host: We need a term to describe the underlying hardware respectively the virtual machine or container, where PG is running. I suggest to use both *server* and *host*. In computer science, both have their eligibility and are widely used. Everybody understands *client/server architecture* or *host* in TCP/IP configuration. We cannot change such matter of course. I suggest to use both depending on the context, but with the same meaning: "real hardware, a container, or a virtual machine".
> On this I have a strong opinion because of my Unix mindset.
> "machine" and "host" are synonyms, and it doesn't matter to the database if they are virtualized or not. You can always disambiguate by adding "virtual" or "physical".
> A "server" is a piece of software that responds to client requests, never a machine. In my book, this is purely Windows jargon. The term "client-server architecture" that you quote emphasized that.
> Perhaps "machine" would be the preferable term, because "host" is more prone to misunderstandings (except in a networking context).
server/host: I agree that we are not interested in the question of whether there is real hardware or some virtualization container. We are not even interested in the operating system. Our primary concern is the existence of a port of the Internet Protocol. But is the term "server" appropriate to name an IP-port? Additionally, "server" is used for other meanings: a) the previously mentioned "database server" b) a (virtual) machine: "server-side", "... the file ... loaded by the server ..." c) binaries "... the server must be built with SSL support ..." d) whenever it seems to be appropriate: "standby server", "... the server parses query ...", "server configuration", "server process".
Because of its ambiguous usage, the definition of "server" must clarify the allowed meanings. What about:
--
server: Depending on the context, the term *server* denotes:
- An IP-port which is offered by any OS. ?????
- A - possibly virtualized - machine
- An abbreviation for the slightly longer term "database(s)/cluster server" ??? this will support the readability, but not the clarity ???
- More ?
--
The term "host" is used mainly for IP configuration "host name", "host address" and in the context of compiling "host language", "host variable". These are clear situations and can be defined easily.
On Wed, 2020-05-20 at 13:17 +0200, Jürgen Purtz wrote: > > FWIW, I feel somewhat like Alvaro on that point; I use those terms synonymously, > > perhaps distinguishing between a "started cluster" and a "stopped cluster". > > After all, "cluster" refers to "a cluster of databases", which are there, regardless > > if you start the server or not. > > > > The term "cluster" is unfortunate, because to most people it suggests a group of > > machines, so the term "instance" is better, but that ship has sailed long ago. > > > > The static part of a cluster to me is the "data directory". > > cluster/instance: The different nature (static/dynamic) of what I > call "cluster" and "instance" as well as the existence of the two > commands "initdb — create a new PostgreSQL database cluster" and > "pg_ctl — initialize, start, stop, or control a PostgreSQL server" > confirms me in my opinion that we need two different terms for > them. I think that the "pg_ctl" example does not apply: It does not talk about starting the cluster, but about starting the server process, that is "server" in the way I understand it. > There are situations where we need a single term for both of > them. "Instance and its data directory" or "Instance and its > cluster" are too wordy. In many cases we use "database server" or > "server" in this sense. Imo "Server" is too short and ambiguous. > "database server", the plural form "databases server", or the new > term "cluster server", which is more accurate, would be ok for me. > (Similar to "server", the term "cluster" is also used in many > different contexts - but only outside of the PG world; within our > context "cluster" is not ambiguous.) That does not feel right to me. "cluster server", ouch. "databases server", ouch as well. I never felt the term "cluster" was unclear in these contexts. 
Sometimes it means "data directory", sometimes it is used for "server process", but I think few people would think one could connect to a data directory or create a process in a directory (initdb). I think clarity is a Good Thing, but it can be overdone. > > > server/host: We need a term to describe the underlying hardware respectively > > > the virtual machine or container, where PG is running. I suggest to use both > > > *server* and *host*. In computer science, both have their eligibility and are > > > widely used. Everybody understands *client/server architecture* or *host* in > > > TCP/IP configuration. We cannot change such matter of course. I suggest to > > > use both depending on the context, but with the same meaning: "real hardware, > > > a container, or a virtual machine". > > > > On this I have a strong opinion because of my Unix mindset. > > "machine" and "host" are synonyms, and it doesn't matter to the database if they > > are virtualized or not. You can always disambiguate by adding "virtual" or "physical". > > > > A "server" is a piece of software that responds to client requests, never a machine. > > In my book, this is purely Windows jargon. The term "client-server architecture" > > that you quote emphasized that. > > > > Perhaps "machine" would be the preferable term, because "host" is more prone to > > misunderstandings (except in a networking context). > > server/host: I agree that we are not interested in the question > whether there is real hardware or any virtualization container. We > are even not interested in the operating system. Our primary > concern is the existence of a port of the Internet Protocol. But > is the term "server" appropriate to name an IP-port? Additionally, > "server" is used for other meanings: a) the previously mentioned > "database server" b) a (virtual) machine: "server-side", "... the > file ... loaded by the server ..." c) binaries "... the server > must be built with SSL support ..." 
d) whenever it seems to be > appropriate: "standby server", "... the server parses query ...", > "server configuration", "server process". You are most thorough :^) > Because of its ambiguous usage, the definition of "server" must > clarify the allowed meanings. What's about: > > server: Depending on the context, the term *server* denotes: > > An IP-port which is offered by any OS. ????? A port is a server? No way. > A - possibly virtualized - machine It might be good to disambiguate that, but I don't think that the PostgreSQL documentation should use the word "server" to mean "machine". > An abbreviation for the slightly longer term > "database(s)/cluster server" ??? this will support the > readability, but not the clarity ??? "Server" is short for "database server" and is a set of processes that listen for and handle incoming database client requests. I think that covers all the meanings you quoted from the documentation, except c), where it is used as shorthand for "server executable". Yours, Laurenz Albe
On 2020-04-29 21:55, Corey Huinker wrote: > On Wed, Apr 29, 2020 at 3:15 PM Peter Eisentraut > <peter.eisentraut@2ndquadrant.com > <mailto:peter.eisentraut@2ndquadrant.com>> wrote: > > Why are all the glossary terms capitalized? Seems kind of strange. > > > They weren't intended to be, and they don't appear to be in the page I'm > looking at. Are you referring to the anchor like in > https://www.postgresql.org/docs/devel/glossary.html#GLOSSARY-RELATION ? > If so, that all-capping is part of the rendering, as the ids were all > named in all-lower-case. Sorry, I meant why is the first letter of each term capitalized. That seems unusual. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 17.05.20 17:28, Alvaro Herrera wrote: > I think the terms under discussion are just > > * cluster > * instance > * server Despite the short period of its existence the glossary achieved some importance, see: https://www.postgresql.org/message-id/b8e12875ebec9e6d3107df5fa1129e1e%40postgrespro.ru . We have to be careful with publications. It's not acceptable that we change definitions from release to release. Therefore IMO we should mark or even ignore such terms for which we cannot reach consensus. Can you agree to the following definitions? If no, we can alternatively formulate for each of them: "Under discussion - currently not defined". My proposals are inspired by chapter 2.2 Concepts: "Tables are grouped into databases, and a collection of databases managed by a single PostgreSQL server instance constitutes a database cluster." - "Database" (No change to existing definition): "A named collection of SQL objects." - "Database Cluster", "Cluster" (New definition and rearrangements of some sentences): "A collection of related databases, and their common static and dynamic meta-data. This term is sometimes used to refer to an instance. (Don't confuse the term CLUSTER with the SQL command CLUSTER.)" - "Data Directory" (Replaced 'instance' by 'cluster'): "The base directory on the filesystem of a server that contains all data files and subdirectories associated with a cluster (with the exception of tablespaces). The environment variable PGDATA is commonly used to refer to the data directory. A cluster's storage space comprises the data directory plus any additional tablespaces. For more information, see Section 68.1." - "Database Server", "Instance" (Major changes): "A group of backend and auxiliary processes that communicate using a common shared memory area. One postmaster process manages the instance; one instance manages exactly one cluster with all its databases. Many instances can run on the same server as long as their TCP ports do not conflict. 
The instance handles all key features of a DBMS: read and write access to files and shared memory, assurance of the ACID properties, connections to client processes, privilege verification, crash recovery, replication, etc." - "Server" (No change to existing definition): "A computer on which PostgreSQL instances run. The term server denotes real hardware, a container, or a virtual machine. This term is sometimes used to refer to an instance or to a host." - "Host" (No change to existing definition): "A computer that communicates with other computers over a network. This is sometimes used as a synonym for server. It is also used to refer to a computer where client processes run." -- Jürgen Purtz
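The relationships in the proposed definitions above can be sketched as a small data model. This is only an illustration of the wording in this proposal (one instance manages exactly one cluster; many instances can share a server if their TCP ports differ), not anything from the PostgreSQL source. All class and method names are hypothetical, and "Server" is modelled as a machine, which is the reading proposed here and is disputed elsewhere in the thread.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseCluster:
    """On-disk part: a data directory holding a collection of databases."""
    data_directory: str
    databases: list[str] = field(default_factory=list)


@dataclass
class Instance:
    """Running part: the processes that serve exactly one cluster."""
    cluster: DatabaseCluster
    port: int


@dataclass
class Server:
    """The (possibly virtual) machine on which instances run (hypothetical model)."""
    name: str
    instances: list[Instance] = field(default_factory=list)

    def start(self, cluster: DatabaseCluster, port: int) -> Instance:
        # Many instances may share a server as long as their TCP ports differ.
        if any(i.port == port for i in self.instances):
            raise ValueError(f"port {port} is already in use on {self.name}")
        # One instance manages exactly one cluster, so a cluster that is
        # already being served is not started a second time.
        if any(i.cluster is cluster for i in self.instances):
            raise ValueError(f"{cluster.data_directory} is already running")
        instance = Instance(cluster=cluster, port=port)
        self.instances.append(instance)
        return instance
```

Under this model, a stopped cluster is simply a `DatabaseCluster` with no `Instance` pointing at it, which matches the static/dynamic distinction argued for in the thread.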
On 2020-Jun-09, Jürgen Purtz wrote: > Can you agree to the following definitions? If no, we can alternatively > formulate for each of them: "Under discussion - currently not defined". My > proposals are inspired by chapter 2.2 Concepts: "Tables are grouped into > databases, and a collection of databases managed by a single PostgreSQL > server instance constitutes a database cluster." After sleeping on it a few more times, I don't oppose the idea of making "instance" be the running state and "database cluster" the on-disk stuff that supports the instance. Here's a patch that does things pretty much along the lines you suggested. I made small adjustments to "SQL objects": * SQL objects in schemas were said to have their names unique in the schema, but we failed to say anything about names of objects not in schemas and global objects. Added that. * Had example object types for global objects and objects not in schemas, but no examples for objects in schemas. Added that. Some programs whose output we could tweak per this: pg_ctl > pg_ctl is a utility to initialize, start, stop, or control a PostgreSQL server. > -D, --pgdata=DATADIR location of the database storage area to: > pg_ctl is a utility to initialize or control a PostgreSQL database cluster. > -D, --pgdata=DATADIR location of the database directory pg_basebackup: > pg_basebackup takes a base backup of a running PostgreSQL server. to: > pg_basebackup takes a base backup of a PostgreSQL instance. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On Tue, Jun 16, 2020 at 08:09:26PM -0400, Alvaro Herrera wrote: > diff --git a/doc/src/sgml/glossary.sgml b/doc/src/sgml/glossary.sgml > index 25b03f3b37..e29b55e5ac 100644 > --- a/doc/src/sgml/glossary.sgml > +++ b/doc/src/sgml/glossary.sgml > @@ -395,15 +395,15 @@ > <para> > The base directory on the filesystem of a > <glossterm linkend="glossary-server">server</glossterm> that contains all > - data files and subdirectories associated with an > - <glossterm linkend="glossary-instance">instance</glossterm> (with the > - exception of <glossterm linkend="glossary-tablespace">tablespaces</glossterm>). > + data files and subdirectories associated with a > + <glossterm linkend="glossary-db-cluster">database cluster</glossterm> > + (with the exception of > + <glossterm linkend="glossary-tablespace">tablespaces</glossterm>). and (optionally) WAL > + <glossentry id="glossary-db-cluster"> > + <glossterm>Database cluster</glossterm> > + <glossdef> > + <para> > + A collection of databases and global SQL objects, > + and their common static and dynamic meta-data. metadata > @@ -1245,12 +1255,17 @@ > <glossterm linkend="glossary-sql-object">SQL objects</glossterm>, > which all reside in the same > <glossterm linkend="glossary-database">database</glossterm>. > - Each SQL object must reside in exactly one schema. > + Each SQL object must reside in exactly one schema > + (though certain types of SQL objects exist outside schemas). (except for global objects which ..) > <para> > The names of SQL objects of the same type in the same schema are enforced > to be unique. > There is no restriction on reusing a name in multiple schemas. > + For local objects that exist outside schemas, their names are enforced > + unique across the whole database. For global objects, their names I would say "unique within the database" > + are enforced unique across the whole > + <glossterm linkend="glossary-db-cluster">database cluster</glossterm>. 
and "unique within the whole db cluster" > Most local objects belong to a specific > - <glossterm linkend="glossary-schema">schema</glossterm> in their containing database. > + <glossterm linkend="glossary-schema">schema</glossterm> in their > + containing database, such as > + <glossterm linkend="glossary-relation">all types of relations</glossterm>, > + <glossterm linkend="glossary-function">all types of functions</glossterm>, Maybe say: >Relations< (all types), and >Functions< (all types) > used as the default one for all SQL objects, called <literal>pg_default</literal>. "the default" (remove "one") -- Justin
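The three uniqueness scopes discussed in this review (within a schema, within the database, within the whole database cluster) can be written down as a key function. This is a minimal sketch of the glossary wording only; `SqlObject`, `uniqueness_key`, and `find_conflicts` are hypothetical names, and treating every object type as its own namespace is a simplifying assumption.

```python
from typing import NamedTuple, Optional


class SqlObject(NamedTuple):
    kind: str                 # e.g. "table", "extension", "role"
    name: str
    database: Optional[str]   # None for global objects
    schema: Optional[str]     # None for objects that live outside schemas


def uniqueness_key(obj: SqlObject) -> tuple:
    """Return the tuple within which the object's name must be unique."""
    if obj.database is None:
        # Global object: unique within the whole database cluster.
        return ("cluster", obj.kind, obj.name)
    if obj.schema is None:
        # Local object outside any schema: unique within the database.
        return ("database", obj.database, obj.kind, obj.name)
    # Schema-resident object: unique per type within its schema.
    return ("schema", obj.database, obj.schema, obj.kind, obj.name)


def find_conflicts(objects):
    """Return pairs of objects whose names collide under the rules above."""
    seen = {}
    conflicts = []
    for obj in objects:
        key = uniqueness_key(obj)
        if key in seen:
            conflicts.append((seen[key], obj))
        else:
            seen[key] = obj
    return conflicts
```

With this key, two tables named `t` in different schemas of one database do not conflict, while two roles with the same name always do, because roles are global.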
On 2020-Jun-09, Jürgen Purtz wrote: > Can you agree to the following definitions? If no, we can alternatively formulate for each of them: "Under discussion - currently not defined". My proposals are inspired by chapter 2.2 Concepts: "Tables are grouped into databases, and a collection of databases managed by a single PostgreSQL server instance constitutes a database cluster." After sleeping on it a few more times, I don't oppose the idea of making "instance" be the running state and "database cluster" the on-disk stuff that supports the instance. Here's a patch that does things pretty much along the lines you suggested. I made small adjustments to "SQL objects": * SQL objects in schemas were said to have their names unique in the schema, but we failed to say anything about names of objects not in schemas and global objects. Added that. * Had example object types for global objects and objects not in schemas, but no examples for objects in schemas. Added that. Some programs whose output we could tweak per this: pg_ctl > pg_ctl is a utility to initialize, start, stop, or control a PostgreSQL server. > -D, --pgdata=DATADIR location of the database storage area to: > pg_ctl is a utility to initialize or control a PostgreSQL database cluster. > -D, --pgdata=DATADIR location of the database directory pg_basebackup: > pg_basebackup takes a base backup of a running PostgreSQL server. to: > pg_basebackup takes a base backup of a PostgreSQL instance.
+1, with two formal changes:
- Rearrangement of term "Data page" to meet alphabetical order.
- Add </glossdef> in one case to meet xml-well-formedness.
One last question: The definition of "Data directory" reads "... A cluster's storage space comprises the data directory plus ..." and 'cluster' links to "glossary-instance". Shouldn't it link to "glossary-db-cluster"?
--
Jürgen Purtz
Attachment
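The missing `</glossdef>` mentioned above is the kind of error a plain XML parse catches. The following is a rough sketch using Python's standard `xml.etree.ElementTree` on a standalone snippet; the entry id and definition text are made up for illustration, and the real glossary.sgml is a fragment of a larger document (with entities), so it would need the documentation build tooling rather than this check.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# Hypothetical glossary entry; id and text are illustrative only.
GOOD = """<glossentry id="glossary-data-page">
 <glossterm>Data page</glossterm>
 <glossdef>
  <para>Hypothetical definition text.</para>
 </glossdef>
</glossentry>"""

# The same entry with the closing </glossdef> removed.
BAD = GOOD.replace(" </glossdef>\n", "")
```

Here `is_well_formed(GOOD)` is true and `is_well_formed(BAD)` is false, because the parser meets `</glossentry>` while `<glossdef>` is still open.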
On 2020-Jun-16, Justin Pryzby wrote: > On Tue, Jun 16, 2020 at 08:09:26PM -0400, Alvaro Herrera wrote: Thanks for the review. I merged all your suggestions. This one: > > Most local objects belong to a specific > > + <glossterm linkend="glossary-schema">schema</glossterm> in their > > + containing database, such as > > + <glossterm linkend="glossary-relation">all types of relations</glossterm>, > > + <glossterm linkend="glossary-function">all types of functions</glossterm>, > > Maybe say: >Relations< (all types), and >Functions< (all types) led me down not one but two rabbit holes; first I realized that "functions" is an insufficient term since procedures should also be included but weren't, so I had to add the more generic term "routine" and then modify the definitions of all routine types to mix in well. I think overall the quality of these definitions is improved as a result. I also felt the need to revise the definition of "relations", so I did that too; this made me change the definition of resultset too. On 2020-Jun-17, Jürgen Purtz wrote: > +1, with two formal changes: > > - Rearrangement of term "Data page" to meet alphabetical order. To forestall these ordering issues (look, another rabbit hole), I grepped the file for all glossterms and sorted that under en_US rules, then reordered the terms to match that. Turns out there were several other ordering mistakes. git grep '<glossterm>' | sed -e 's/<[^>]*>\([^<]*\)<[^>]*>/\1/' > orig LC_COLLATE=en_US.UTF-8 sort orig > sorted (Eliminating the tags is important, otherwise the sort uses the tags themselves to disambiguate) > One last question: The definition of "Data directory" reads "... A cluster's > storage space comprises the data directory plus ..." and 'cluster' links to > '"glossary-instance". Shouldn't it link to "glossary-db-cluster"? Yes, an oversight, thanks. I also added TPS, because I had already written it. 
-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
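The ordering check described in the message above (extract the text of each `<glossterm>`, then compare against a sorted copy) can be approximated without the shell pipeline. This sketch is not the commands from the message: it uses a case-insensitive codepoint comparison as a stand-in for `LC_COLLATE=en_US.UTF-8`, which is an assumption and can differ from real en_US collation for terms containing spaces or punctuation. The function names and the sample input are hypothetical.

```python
import re

# Matches only plain <glossterm> tags, i.e. the defining occurrences,
# not <glossterm linkend="..."> cross references.
GLOSSTERM = re.compile(r"<glossterm>([^<]*)</glossterm>")


def extract_terms(sgml: str) -> list[str]:
    """Collect the text of every plain <glossterm>, tags stripped."""
    return GLOSSTERM.findall(sgml)


def out_of_order(terms: list[str]) -> list[tuple[str, str]]:
    """Return adjacent pairs that disagree with a case-insensitive sort."""
    return [
        (prev, cur)
        for prev, cur in zip(terms, terms[1:])
        if prev.casefold() > cur.casefold()
    ]


# Illustrative input, not the actual contents of glossary.sgml.
sample = (
    "<glossterm>Cast</glossterm>\n"
    '<glossterm linkend="glossary-catalog">catalog</glossterm>\n'
    "<glossterm>Backend</glossterm>\n"
    "<glossterm>Catalog</glossterm>\n"
)
misplaced = out_of_order(extract_terms(sample))
```

As in the original pipeline, stripping the tags before comparing matters; otherwise the markup itself would take part in the ordering.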
On 2020-Jun-16, Justin Pryzby wrote: > On Tue, Jun 16, 2020 at 08:09:26PM -0400, Alvaro Herrera wrote: Thanks for the review. I merged all your suggestions. This one: > > Most local objects belong to a specific > > + <glossterm linkend="glossary-schema">schema</glossterm> in their > > + containing database, such as > > + <glossterm linkend="glossary-relation">all types of relations</glossterm>, > > + <glossterm linkend="glossary-function">all types of functions</glossterm>, > > Maybe say: >Relations< (all types), and >Functions< (all types) led me down not one but two rabbit holes; first I realized that "functions" is an insufficient term since procedures should also be included but weren't, so I had to add the more generic term "routine" and then modify the definitions of all routine types to mix in well. I think overall the quality of these definitions is improved as a result. I also felt the need to revise the definition of "relations", so I did that too; this made me change the definition of resultset too. On 2020-Jun-17, Jürgen Purtz wrote: > +1, with two formal changes: > > - Rearrangement of term "Data page" to meet alphabetical order. To forestall these ordering issues (look, another rabbit hole), I grepped the file for all glossterms and sorted that under en_US rules, then reordered the terms to match that. Turns out there were several other ordering mistakes. git grep '<glossterm>' | sed -e 's/<[^>]*>\([^<]*\)<[^>]*>/\1/' > orig LC_COLLATE=en_US.UTF-8 sort orig > sorted (Eliminating the tags is important, otherwise the sort uses the tags themselves to disambiguate) > One last question: The definition of "Data directory" reads "... A cluster's > storage space comprises the data directory plus ..." and 'cluster' links to > '"glossary-instance". Shouldn't it link to "glossary-db-cluster"? Yes, an oversight, thanks. I also added TPS, because I had already written it. 
-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
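The ordering check described in the message above can be sketched as a small self-contained script. This is only an illustration under stated assumptions: `sample_glossary.sgml` and its four terms are a hypothetical stand-in for the real glossary source, and `sed` is run directly on that file instead of on `git grep` output. The `en_US.UTF-8` locale also has to be installed for the collation setting to take effect; where it is missing, `sort` will typically fall back to plain byte order.

```shell
# Hypothetical sample standing in for the glossary source (not the real file).
cat > sample_glossary.sgml <<'EOF'
<glossterm>Aggregate function</glossterm>
<glossterm>Data page</glossterm>
<glossterm>Data directory</glossterm>
<glossterm>Database</glossterm>
EOF

# Strip the surrounding tags so that only the term text is compared.
sed -e 's/<[^>]*>\([^<]*\)<[^>]*>/\1/' sample_glossary.sgml > orig

# Sort the terms under en_US collation rules (requires that locale to be installed).
LC_COLLATE=en_US.UTF-8 sort orig > sorted

# diff prints the lines that differ; any output means a term is out of place.
diff orig sorted || echo "glossary terms are out of order"
```

In this sample "Data page" is deliberately placed before "Data directory", so the final command reports a mismatch; once `orig` and `sorted` are identical, `diff` prints nothing.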
On 2020-06-19 01:51, Alvaro Herrera wrote:

> On 2020-Jun-16, Justin Pryzby wrote:
>> On Tue, Jun 16, 2020 at 08:09:26PM -0400, Alvaro Herrera wrote:

I noticed one typo:

'aggregates functions' should be 'aggregate functions'

And one thing that I am not sure of (but strikes me as a bit odd): there are several cases of 'are enforced unique'. Should that not be 'are enforced to be unique'?

Another small mistake (2x):

'The name of such objects of the same type are' should be 'The names of such objects of the same type are' (this phrase occurs 2x wrong, 1x correct)

thanks,

Erik Rijkers
Thanks for these fixes! I included all of these.

On 2020-Jun-19, Erik Rijkers wrote:

> And one thing that I am not sure of (but strikes me as a bit odd):
> there are several cases of
> 'are enforced unique'. Should that not be
> 'are enforced to be unique' ?

I included this change too; I am not too sure of it myself. If some English language neatnik wants to argue one way or the other, be my guest.

-- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services