Thread: Additional Chapter for Tutorial
Our documentation explains many details about commands, tools, parameters in detail and with high accuracy. Nevertheless my impression is that we neglect the 'big picture': why certain processes exist and what their relation to each other is, summary of strategies, visualization of key situations, ... . People with mature knowledge don't miss this information because they know all about it. But for beginners such explanations would be a great help. In the time before GSoD 2019 we had similar discussions. I plan to extend over time the part 'Tutorial' by an additional chapter with an overview about key design decisions and basic features. The typical audience should consist of persons with limited pre-knowledge in database systems and some interest in PostgreSQL. In the attachment you find a patch for the first sub-chapter. Subsequent sub-chapters should be: MVCC, transactions, VACUUM, backup, replication, ... - mostly with the focus on the PostgreSQL implementation and not on generic topics like b-trees. There is a predecessor of this patch: https://www.postgresql.org/message-id/974e09b8-edf5-f38f-2fb5-a5875782ffc9%40purtz.de . In the meanwhile its glossary-part is separated and commited. The new patch contains two elements: textual descriptions and 4 figures. My opinion concerning figures is set out in detail in the previous patch. Kind regards, Jürgen Purtz
Attachment
I plan to extend over time the part 'Tutorial' by an additional chapter
with an overview about key design decisions and basic features. The
typical audience should consist of persons with limited pre-knowledge in
database systems and some interest in PostgreSQL. In the attachment you
find a patch for the first sub-chapter. Subsequent sub-chapters should
be: MVCC, transactions, VACUUM, backup, replication, ... - mostly with
the focus on the PostgreSQL implementation and not on generic topics
like b-trees.
On 2020-04-17 19:56, Jürgen Purtz wrote: > Our documentation explains many details about commands, tools, > parameters in detail and with high accuracy. Nevertheless my > impression is that we neglect the 'big picture': why certain processes > [0001-architecture.patch] Very good stuff, and useful. I think. I mean that but nevertheless here is a lot of comment :) (I didn't fully compile as docs, just read the 'text' from the patch file) Collabortion Collaboration drop 'resulting' He acts in close cooperation with the It acts in close cooperation with the He loads the configuration files, allocates the It loads the configuration files, allocates the process</firstterm>. He checks the authorization, starts a process</firstterm>. it checks the authorization, starts a and instructs the client application to connect to him. All further and instructs the client application to connect to it. All further by him. by it. In an first attempt In a first attempt much huger than memory, it's likely that much larger than memory, it's likely that RAM is performed in units of complete pages while retaining RAM is performed in units of complete pages, retaining Sooner or later it is necessary to overwrite old RAM Sooner or later it becomes necessary to overwrite old RAM transfered transferred (multiple times) who runs which runs He writes it writes This is the primarily duty of the This is primarily the duty of the or possibly: This is the primary duty of the he starts periodically it starts periodically speeds up a possibly occurring recovery. can speed up recovery. writen written collects counter about accesses collects counters about accesses and others. He stores the obtained information in system and more. It stores the obtained information in system sudirectories consists subdirectories consist <-- plural, no -s there are information there is information and contains the ID of the and contains the ID (pid) of the ( IMHO, it is conventional (and therefore easier to read) to have 'e.g.' followed by a comma, and not by a semi-colon, although obviously that's not really wrong either. ) Thanks, Erik Rijkers
On 17.04.20 20:40, Erik Rijkers wrote: > Very good stuff, and useful. I think. > > I mean that but nevertheless here is a lot of comment :) > > (I didn't fully compile as docs, just read the 'text' from the patch > file) Thanks. Added nearly all of the suggestions. -- Jürgen Purtz
Attachment
On 20.04.20 10:30, Jürgen Purtz wrote: > On 17.04.20 20:40, Erik Rijkers wrote: >> Very good stuff, and useful. I think. >> >> I mean that but nevertheless here is a lot of comment :) >> >> (I didn't fully compile as docs, just read the 'text' from the patch >> file) > > Thanks. Added nearly all of the suggestions. > > What is new? Added two sub-chapters 'mvcc' and 'vacuum' plus graphics. Made some modifications in previous sub-chapters and in existing titles. Added some glossary entries. -- Jürgen Purtz
Attachment
On 2020-04-29 16:13, Jürgen Purtz wrote: > On 20.04.20 10:30, Jürgen Purtz wrote: >> On 17.04.20 20:40, Erik Rijkers wrote: >>> Very good stuff, and useful. I think. >>> >>> I mean that but nevertheless here is a lot of comment :) >>> >>> (I didn't fully compile as docs, just read the 'text' from the patch >>> file) >> >> Thanks. Added nearly all of the suggestions. >> >> > What is new? Added two sub-chapters 'mvcc' and 'vacuum' plus graphics. > Made some modifications in previous sub-chapters and in existing > titles. Added some glossary entries. > [0003-architecture.patch] Hi Jürgen, Here are again some suggested changes, up to line 600 (of the patch - that is around start of the new NVCC paragraph) I may have repeated some thing you have already rejected (it was too much work to go back and check). I am not a native speaker of english. One general remark: in my humble opinion, you write too many capitalized words. It's not really a problem but overall it's becomes bit too much. But I have not marked these. perhaps some future iteration. I'll probably read through the latter part of the patch later (probably tomorrow). Thanks, Erik Rijkers they merely send requests to the server side and receives they merely send requests to the server side and receive is a group of tightly coupled other server side processes plus a is a group of tightly coupled other server-side processes plus a Client requests (SELECT, UPDATE, ...) usually leads to the Client requests (SELECT, UPDATE, ...) usually lead to the Because files are much larger than memory, it's likely that Because files are often larger than memory, it's likely that RAM is performed in units of complete pages, retaining their size and layout. RAM is performed in units of complete pages. Reading file pages is notedly slower than reading Reading file pages is slower than reading of the <firstterm>Backend processes</firstterm> has done the job those pages are available for all other of the <firstterm>Backend processes</firstterm> has read pages into memory those pages are available for all other they must be transferred back to disk. This is a two-step process. they must be written back to disk. This is a two-step process. Because of the sequential nature of this writing, it is much Because of this writing is sequential, it is much in an independent process. Nevertheless all in an independent process. Nevertheless, all huge I/O activities can block other processes significantly, I/O activities can block other processes, it starts periodically and acts only for a short period. it starts periodically and is active only for a short period. duty. As its name suggests, he has to create duty. As its name suggests, it has to create In consequence, after a <firstterm>Checkpoint</firstterm> After a <firstterm>Checkpoint</firstterm>, In correlation with data changes, As a result of data changes, text lines about serious and non-serious events which can happen text lines about serious and less serious events which can happen database contains many <glossterm linkend="glossary-schema">schema</glossterm>, database contains many <glossterm linkend="glossary-schema">schemas</glossterm>, belongs to a certain <firstterm>schema</firstterm>, they cannot belongs to a single <firstterm>schema</firstterm>, they cannot A <firstterm>Cluster</firstterm> is the outer frame for a A <firstterm>Cluster</firstterm> is the outer container for a <literal>postgres</literal> as a copy of <literal>postgres</literal> is generated as a copy of role of <literal>template0</literal> as the origin role of <literal>template0</literal> as the pristine origin are different objects and absolutely independent from each are different objects and independent from each complete <firstterm>cluster</firstterm>, independent from <firstterm>cluster</firstterm>, independent from anywhere in the file system. In many cases, the environment somewhere in the file system. In many cases, the environment some files, all of which are necessary to store long lasting some files, all of which are necessary to store long-lasting <firstterm>tablespaces</firstterm> itself. <firstterm>tablespaces</firstterm> themselves. <firstterm>Postgres</firstterm> (respectively <firstterm>Postmaster</firstterm>) process. <firstterm>Postgres</firstterm> process (also known as <firstterm>Postmaster</firstterm>). <title>MVCC</title> <title>MVCC - Multiversion Concurrency Control</title> The dabase must take a sensible decision to prevent the application The database must take a sensible decision to prevent the application # this sentence I just don't understand - can you please elucidate? The database must take a sensible decision to prevent the application from promising delivery of the single article to both clients.
On 2020-04-29 16:13, Jürgen Purtz wrote: > On 20.04.20 10:30, Jürgen Purtz wrote: >> On 17.04.20 20:40, Erik Rijkers wrote: >>> Very good stuff, and useful. I think. >>> >>> I mean that but nevertheless here is a lot of comment :) >>> >>> (I didn't fully compile as docs, just read the 'text' from the patch >>> file) >> >> Thanks. Added nearly all of the suggestions. >> >> > What is new? Added two sub-chapters 'mvcc' and 'vacuum' plus graphics. > Made some modifications in previous sub-chapters and in existing titles. > Added some glossary entries. I don't see this really as belonging into the tutorial. The tutorial should be hands-on, how do you get started, how do you get some results. Your material is more of an overview of the whole system. What's a new user supposed to do with that? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 29.04.20 21:12, Peter Eisentraut wrote: > > I don't see this really as belonging into the tutorial. The tutorial > should be hands-on, how do you get started, how do you get some results. > Yes, the tutorial should be a short overview and give instructions how to start. IMO the first 4 sub-chapters fulfill this expectation. Indeed, the fifth (VACUUM) is extensive and offers many details. During the inspection of the existing documentation I recognized that there are many details about VACUUM, AUTOVACUUM, all of their parameters as well as their behavior. But the information is spread across many pages: Automatic Vacuuming, Client Connection Defaults, Routine Vacuuming, Resource Consumption, VACUUM. Even for a person with some pre-knowledge it is hard to get an overview how this fits together and why things are solved in exactly this way. In the end we have very good descriptions of all details but I miss the 'big picture'. Therefore I summarized central aspects and tried to give an answer to the question 'why is it done in this way?'. I do not dispute that the current version of the page is not adequate for beginners. But at some place we should have such a summary about vacuuming and freezing. How to proceed? - Remove the page and add a short paragraph to the MVCC page instead. - Cut down the page to a tiny portion. - Divide it into two parts: a) a short introduction and b) the rest after a statement like 'The following offers more details and parameters that are more interesting for an experienced user than for a beginner. You can easily skip it.' > Your material is more of an overview of the whole system. What's a > new user supposed to do with that? When I dive into a new subject, I'm more interested in its architecture than in its details. We shall offer an overview about the major PG components and strategies to beginners. -- Jürgen Purtz
On 30.04.20 14:31, Jürgen Purtz wrote: > On 29.04.20 21:12, Peter Eisentraut wrote: >> >> I don't see this really as belonging into the tutorial. The tutorial >> should be hands-on, how do you get started, how do you get some results. >> > Yes, the tutorial should be a short overview and give instructions how > to start. IMO the first 4 sub-chapters fulfill this expectation. > Indeed, the fifth (VACUUM) is extensive and offers many details. > > During the inspection of the existing documentation I recognized that > there are many details about VACUUM, AUTOVACUUM, all of their > parameters as well as their behavior. But the information is spread > across many pages: Automatic Vacuuming, Client Connection Defaults, > Routine Vacuuming, Resource Consumption, VACUUM. Even for a person > with some pre-knowledge it is hard to get an overview how this fits > together and why things are solved in exactly this way. In the end we > have very good descriptions of all details but I miss the 'big > picture'. Therefore I summarized central aspects and tried to give an > answer to the question 'why is it done in this way?'. I do not dispute > that the current version of the page is not adequate for beginners. > But at some place we should have such a summary about vacuuming and > freezing. > > How to proceed? > > - Remove the page and add a short paragraph to the MVCC page instead. > > - Cut down the page to a tiny portion. > > - Divide it into two parts: a) a short introduction and b) the rest > after a statement like 'The following offers more details and > parameters that are more interesting for an experienced user than for > a beginner. You can easily skip it.' > > >> Your material is more of an overview of the whole system. What's a >> new user supposed to do with that? > > When I dive into a new subject, I'm more interested in its > architecture than in its details. We shall offer an overview about the > major PG components and strategies to beginners. > > In comparison with to previous patch this one contains: - Position and title changed to reflect its intention and importance. - A <note> delimits VACUUM basics from details. This is done because I cannot find another suitable place for such a summarizing description. - Three additional sub-chapters. -- Jürgen Purtz
Attachment
> On 2 Jun 2020, at 17:01, Jürgen Purtz <juergen@purtz.de> wrote: > In comparison with to previous patch this one contains: > > - Position and title changed to reflect its intention and importance. > > - A <note> delimits VACUUM basics from details. This is done because I cannot find another suitable place for such a summarizingdescription. > > - Three additional sub-chapters. This patch no longer applies, due to conflicts in start.sgml, can you please submit a rebased version? cheers ./daniel
On 12.07.20 22:45, Daniel Gustafsson wrote: > This patch no longer applies, due to conflicts in start.sgml, can you please > submit a rebased version? ok. but I need some days. juergen
On 12.07.20 22:45, Daniel Gustafsson wrote:
> This patch no longer applies, due to conflicts in start.sgml, can you please
> submit a rebased version?
ok. but I need some days. juergen
> On 13 Jul 2020, at 14:20, Naresh gandi <naresh5310@gmail.com> wrote: (please avoid top-posting) > Which version is this application for? > > I tried for v12 and v13 Beta, both failed. Unless being a bugfix, all patches are only considered against the main development branch in Git. As this is new material, it would be for v14. cheers ./daniel
On 12.07.20 22:45, Daniel Gustafsson wrote: > > This patch no longer applies, due to conflicts in start.sgml, can you please > submit a rebased version? > > cheers ./daniel > New version attached. -- Jürgen Purtz
Attachment
On 2020-07-17 11:32, Jürgen Purtz wrote: > On 12.07.20 22:45, Daniel Gustafsson wrote: >> >> This patch no longer applies, due to conflicts in start.sgml, can you >> please >> submit a rebased version? >> >> cheers ./daniel >> > New version attached. > > [0005-architecture.patch] Hi, I went through the architecture.sgml file once, and accumulated the attached edits. There are still far too many Unneeded Capitals On Words for my taste but I have not changed many of those. We could use some more opinions on that, I suppose. (if it becomes too silent maybe include the pgsql-hackers again?) Thanks, Erik Rijkers > -- > > Jürgen Purtz
Attachment
On 18.07.20 19:17, Erik Rijkers wrote: > > Hi, > > I went through the architecture.sgml file once, and accumulated the > attached edits. > > There are still far too many Unneeded Capitals On Words for my taste > but I have not changed many of those. We could use some more opinions > on that, I suppose. (if it becomes too silent maybe include the > pgsql-hackers again?) > > Thanks, > > > Erik Rijkers The attached patch contains: - integration of Erik's suggestions - coordination of terms in text, graphic and glossary - some changes in upper-case usage - fewer usage of <firstterm> with two exceptions: The first chapter 4.1 emphasize all important terms to help beginners in their learning process; chapter 4.5. emphasize the term 'autovacuum' to straighten the fact that - despite its similarities - the tool autovacuum is something else than the SQL command vacuum. -- Jürgen Purtz
Attachment
Hi all,
I want to import XML file into PG database table.
I've find functions to get the XML content of a cell after imported an XML file with the pg_get_file function.
But, I want to explode the XML content to colums. How can I do this ?
PG 10 under Ubuntu 18
Cordialement, Pascal CROZET
DBA - Qualis Consulting
• 300 Route Nationale 6 – 69760 LIMONEST
_________________________________
Again, I don't see how this belongs into the tutorial. It is mostly advanced low-level information that is irrelevant for someone starting up, it is not hands-on, so quite unlike the rest of the tutorial, and for the most part the information just duplicates what is already explained elsewhere. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 01.09.20 23:30, Peter Eisentraut wrote: > It is mostly advanced low-level information that is irrelevant for > someone starting up, That applies only to the VACUUM chapter. VACUUM and AUTOVACUUM are controlled by a lot of parameters. Therefor the current documentation concerning the two mechanism spreads the description across different pages (20.4, 25.1, VACUUM command). Because of the structure of our documentation that's ok. But we should have a summary page somewhere - not necessarily in the tutorial. > the most part the information just duplicates what is already > explained elsewhere. That is the nature of a tutorial respectively a summary. -- J. Purtz
On 2020-09-02 09:04, Jürgen Purtz wrote: > On 01.09.20 23:30, Peter Eisentraut wrote: >> It is mostly advanced low-level information that is irrelevant for >> someone starting up, > That applies only to the VACUUM chapter. VACUUM and AUTOVACUUM are > controlled by a lot of parameters. Therefor the current documentation > concerning the two mechanism spreads the description across different > pages (20.4, 25.1, VACUUM command). Because of the structure of our > documentation that's ok. But we should have a summary page somewhere - > not necessarily in the tutorial. There is probably room for improvement, but the section numbers you mention are not about VACUUM, AFAICT, so I can't really comment on what you have in mind. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 10.09.20 18:26, Peter Eisentraut wrote: > On 2020-09-02 09:04, Jürgen Purtz wrote: >> On 01.09.20 23:30, Peter Eisentraut wrote: >>> It is mostly advanced low-level information that is irrelevant for >>> someone starting up, >> That applies only to the VACUUM chapter. VACUUM and AUTOVACUUM are >> controlled by a lot of parameters. Therefor the current documentation >> concerning the two mechanism spreads the description across different >> pages (20.4, 25.1, VACUUM command). Because of the structure of our >> documentation that's ok. But we should have a summary page somewhere - >> not necessarily in the tutorial. > > There is probably room for improvement, but the section numbers you > mention are not about VACUUM, AFAICT, so I can't really comment on > what you have in mind. > Because of the additional chapter for the 'tutorial' on my local computer, the numbers increased for me. The regular chapter numbers are 19.4 and 24.1. Sorry for the confusion. In detail: 19.4: parameters to configure the server, especially five parameters 'vacuum_cost_xxx'. 19.10: parameters to configure autovacuum. 19.11: parameters to configure client connections, especially five parameters 'vacuum_xxx' concerning their freeze-behavior. 24.1: explains the general necessity of (auto)vacuum and their strategies. The page about the SQL command VACUUM explains the different options (FULL, FREEZE, ..) and their meaning. -- Jürgen Purtz
On 01.09.20 23:30, Peter Eisentraut wrote:
> It is mostly advanced low-level information that is irrelevant for
> someone starting up,
That applies only to the VACUUM chapter. VACUUM and AUTOVACUUM are
controlled by a lot of parameters. Therefor the current documentation
concerning the two mechanism spreads the description across different
pages (20.4, 25.1, VACUUM command). Because of the structure of our
documentation that's ok. But we should have a summary page somewhere -
not necessarily in the tutorial.
> the most part the information just duplicates what is already
> explained elsewhere.
That is the nature of a tutorial respectively a summary.
Attachment
I've begun looking at this and have included quite a few html comments within the patch. However, the two main items that I have found so far are:One, I agree with Peter that this seems misplaced in Tutorial. I would create a new Internals Chapter and place this material there, or maybe consider a sub-chapter under "Overview of PostgreSQL Internals". If this is deemed to be of a more primary importance than the content in the Internals section I would recommend placing it in Reference. I feel it does fit there and given the general importance of that section readers will be inclined to click into it and skim over its content.
I like the idea of dividing the material into two different chapters. The existing part "I. Tutorial" contains the first concrete steps: installation, creating database and database objects, using SQL basic and advanced features. Its typical audience consists of persons doing their first steps with PG. The new material is aimed at persons interested in implementation aspects of PG. Therefore, the part "VII. Internals" seems to be the natural place to integrate it, something like "Architecture and Implementation Aspects" or "Architecture and Implementation Cornerstones".
Creating such a chapter in "VII. Internals" will increase the existing chapter numbers 50 - 71, which may lead to some confusion. On the other hand the content can possibly be applied to all supported PG versions at the same time, which will lead to a consistent behavior. Extending one of the existing chapters won't work because all of them handle their own topic, eg.: "50. Overview of PostgreSQL Internals" (misleading title?) focuses on the handling of SQL statements from parsing to execution.
What are your thoughts?
--
J. Purtz
Creating such a chapter in "VII. Internals" will increase the existing chapter numbers 50 - 71, which may lead to some confusion. On the other hand the content can possibly be applied to all supported PG versions at the same time, which will lead to a consistent behavior. Extending one of the existing chapters won't work because all of them handle their own topic, eg.: "50. Overview of PostgreSQL Internals" (misleading title?) focuses on the handling of SQL statements from parsing to execution.
What are your thoughts?
I've begun looking at this and have included quite a few html comments within the patch. However, the two main items that I have found so far are:One, I agree with Peter that this seems misplaced in Tutorial. I would create a new Internals Chapter and place this material there, or maybe consider a sub-chapter under "Overview of PostgreSQL Internals". If this is deemed to be of a more primary importance than the content in the Internals section I would recommend placing it in Reference. I feel it does fit there and given the general importance of that section readers will be inclined to click into it and skim over its content.Two, I find the amount of detail being provided here to be on the too-much side. A bit more judicious use of links into the appropriate detail chapters seems warranted.I took a pretty heavy hand to the original section though aside from the scope comment it can probably be considered a bit weighted toward style preferences. Though I did note/rewrite a couple of things that seemed factually incorrect - and seemingly not done intentionally in the interest of simplification. Specifically the client connection process and, I think, the relationship between the checkpointer and background writer.I do like the idea and the general flow of the material so far - though I haven't really looked at the overall structure yet, just started reading and editing from the top of the new file.I've attached the original 0007 patch and my diff against it applied to HEAD.Took a quick peek at the image (at the end) and while I will need a second pass over this section regardless I figured I'd provide this subset of feedback now in order to move things along a bit.David J.
The attached patch is an intermediate, mostly formal step. It includes:
- Moving the chapter to "Part VII. Internals".
- Changing the title of the current chapter "Chapter 50. Overview of PostgreSQL Internals" to "Overview of Query Handling" because the old title is too generic. This chapter is focused on the handling of queries.
- Integration of David's smaller suggestions. For the more important suggestions I need some days.
The patch is intended to give every interested person an overall impression of the chapter within its new position. Because it has moved from part 'Tutorial' to 'Internals' the text should be very accurate concerning technical issues - like all the other chapters in this part. A tutorial chapter has a more superficial nature.
--
J. Purtz
Attachment
Hi Jürgen, What's going to happen with this work? If you intend to have it eventually committed, I think it will be necessary to make the patches smaller, and bring them into the commitfest app, so that others can follow progress. I for one, cannot see/remember/understand what has been done, or even whether you intend to continue with it. Thanks, Erik
> Hi Jürgen, > > What's going to happen with this work? > > If you intend to have it eventually committed, I think it will be > necessary to make the patches smaller, and bring them into the > commitfest app, so that others can follow progress. > > I for one, cannot see/remember/understand what has been done, or even > whether you intend to continue with it. > > Thanks, > > Erik > > Peter changed the status to 'Returned with feedback' at the end of the last commit fest. I'm not absolutely sure, but my understanding is that the patch is rejected. -- Jürgen Purtz