Thread: documentation structure

documentation structure

From

Robert Haas

Date:

18 March 2024, 14:11:51

I was looking at the documentation index this morning[1], and I can't
help feeling like there are some parts of it that are over-emphasized
and some parts that are under-emphasized. I'm not sure what we can do
about this exactly, but I thought it worth writing an email and seeing
what other people think.

The two sections of the documentation that seem really
under-emphasized to me are the GUC documentation and the SQL
reference. The GUC documentation is all buried under "20. Server
Configuration" and the SQL command reference is under "I. SQL
commands". For reasons that I don't understand, all chapters except
for those in "VI. Reference" are numbered, but the chapters in that
section have Roman numerals instead.

I don't know what other people's experience is, but for me, wanting to
know what a command does or what a setting does is extremely common.
Therefore, I think these chapters are disproportionately important and
should be emphasized more. In the case of the GUC reference, one idea
I have is to split up "III. Server Administration". My proposal is
that we divide it into three sections. The first would be called "III.
Server Installation" and would cover chapters 16 (installation from
binaries) through 19 (server setup and operation). The second would be
called "IV. Server Configuration" -- so every section that's currently
a subsection of "server configuration" would become a top-level
chapter. The third division would be "V. Server Administration," and
would cover the current chapters 21-33. This is probably far from
perfect, but it seems like a relatively simple change and better than
what we have now.

I don't know what to do about "I. SQL commands". It's obviously
impractical to promote that to a top-level section, because it's got a
zillion sub-pages which I don't think we want in the top-level
documentation index. But having it as one of several unnumbered
chapters interposed between 51 and 52 doesn't seem great either.

The stuff that I think is over-emphasized is as follows: (a) chapters
1-3, the tutorial; (b) chapters 4-6, which are essentially a
continuation of the tutorial, and not at all similar to chapters 8-11
which are chock-full of detailed technical information; (c) chapters
43-46, one per procedural language; perhaps these could just be
demoted to sub-sections of chapter 42 on procedural languages; (d)
chapters 47 (server programming interface), 50 (replication progress
tracking), and 51 (archive modules), all of which are important to
document but none of which seem important enough to put them in the
top-level documentation index; and (e) large parts of section "VII.
Internals," which again contain tons of stuff of very marginal
interest. The first ~4 chapters of the internals section seem like
they might be mainstream enough to justify the level of prominence
that we give them, but the rest has got to be of interest to a tiny
minority of readers.

I think it might be possible to consolidate the internals section by
grouping a bunch of existing entries together by category. Basically,
after the first few chapters, you've got stuff that is of interest to
C programmers writing core or extension code; and you've got
explainers on things like GEQO and index op-classes and support
functions which might be of interest even to non-programmers. I think
for example that we don't need separate top-level chapters on writing
procedural language handlers, FDWs, tablesample methods, custom scan
providers, table access methods, index access methods, and WAL
resource managers. Some or all of those could be grouped under a
single chapter, perhaps, e.g. Using PostgreSQL Extensibility
Interfaces.

Thoughts? I realize that this topic is HIGHLY prone to ENDLESS
bikeshedding, and it's inevitable that not everybody is going to
agree. But I hope we can agree that it's completely silly that it's
vastly easier to find the documentation about the backup manifest
format than it is to find the documentation on CREATE TABLE or
shared_buffers, and if we can agree on that, then perhaps we can agree
on some way to make things better.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/docs/16/index.html

Re: documentation structure

From

Matthias van de Meent

Date:

18 March 2024, 14:54:45

On Mon, 18 Mar 2024 at 15:12, Robert Haas <robertmhaas@gmail.com> wrote:

I'm not going into detail about the other docs comments; I don't have
much of an opinion either way on the mentioned sections. You make good
arguments, but I don't usually use those sections of the docs and
rather do code searches instead.

> I don't know what to do about "I. SQL commands". It's obviously
> impractical to promote that to a top-level section, because it's got a
> zillion sub-pages which I don't think we want in the top-level
> documentation index. But having it as one of several unnumbered
> chapters interposed between 51 and 52 doesn't seem great either.

Could "SQL Commands" be a top-level construct, with subsections for
SQL/DML, SQL/DDL, SQL/Transaction management, and PG's
extensions/administrative/misc features? I sometimes find myself
trying to mentally organize what SQL commands users can use vs those
accessible to database owners and administrators, which is not
currently organized as such in the SQL Commands section.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Re: documentation structure

From

Robert Haas

Date:

18 March 2024, 17:21:35

On Mon, Mar 18, 2024 at 10:55 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> > I don't know what to do about "I. SQL commands". It's obviously
> > impractical to promote that to a top-level section, because it's got a
> > zillion sub-pages which I don't think we want in the top-level
> > documentation index. But having it as one of several unnumbered
> > chapters interposed between 51 and 52 doesn't seem great either.
>
> Could "SQL Commands" be a top-level construct, with subsections for
> SQL/DML, SQL/DDL, SQL/Transaction management, and PG's
> extensions/administrative/misc features? I sometimes find myself
> trying to mentally organize what SQL commands users can use vs those
> accessible to database owners and administrators, which is not
> currently organized as such in the SQL Commands section.

Yeah, I wondered about that, too. Or for example you could group all
CREATE commands together, all ALTER commands together, all DROP
commands together, etc. But I can't really see a future in such
schemes, because having a single page that links to the reference
documentation for every single command we have in alphabetical order
is incredibly handy, or at least I have found it so. So my feeling -
at least at present - is that it's more fruitful to look into cutting
down the amount of clutter that appears in the top-level documentation
index, and maybe finding ways to make important sections like the SQL
reference more prominent.

Given how much documentation we have, it's just not going to be
possible to make everything that matters conveniently visible at the
top level. I think if people have to click down a level for the SQL
reference, that's fine, as long as the link they need to click on is
reasonably visible. What annoys me about the present structure is that
it isn't. You don't get any visual clue that the "SQL Commands" page
with ~100 subpages is more important than "51. Archive Modules" or
"33. Regression Tests" or "58. Writing a Procedural Language Handler,"
but it totally is.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

Roberto Mello

Date:

18 March 2024, 19:06:38

On Mon, Mar 18, 2024 at 10:12 AM Robert Haas <robertmhaas@gmail.com> wrote:

> I was looking at the documentation index this morning[1], and I can't
> help feeling like there are some parts of it that are over-emphasized
> and some parts that are under-emphasized. I'm not sure what we can do
> about this exactly, but I thought it worth writing an email and seeing
> what other people think.

I agree, and my usage patterns of the docs are similar.

As the project progresses and more features are added and tacked on to
existing docs, things can get murky or buried. I imagine that web access
and search logs could paint a picture of documentation usage.

> I don't know what other people's experience is, but for me, wanting to
> know what a command does or what a setting does is extremely common.
> Therefore, I think these chapters are disproportionately important and
> should be emphasized more. In the case of the GUC reference, one idea
> I have is to split up "III. Server Administration". My proposal is
> that we divide it into three sections. The first would be called "III.
> Server Installation" and would cover chapters 16 (installation from
> binaries) through 19 (server setup and operation). The second would be
> called "IV. Server Configuration" -- so every section that's currently
> a subsection of "server configuration" would become a top-level
> chapter. The third division would be "V. Server Administration," and
> would cover the current chapters 21-33. This is probably far from

I like all of those.

> I don't know what to do about "I. SQL commands". It's obviously
> impractical to promote that to a top-level section, because it's got a
> zillion sub-pages which I don't think we want in the top-level
> documentation index. But having it as one of several unnumbered
> chapters interposed between 51 and 52 doesn't seem great either.

I think it'd be easier to read if the current "VI. Reference" came right
after "Server Administration", ahead of "Client Interfaces" and "Server
Programming", which are of interest to a much smaller subset of users.

It would also help if the subchapters were numbered like the rest of
them. I don't think the Roman numerals are particularly helpful.

> The stuff that I think is over-emphasized is as follows: (a) chapters
> 1-3, the tutorial; (b) chapters 4-6, which are essentially a
> ...

Also +1

> Thoughts? I realize that this topic is HIGHLY prone to ENDLESS
> bikeshedding, and it's inevitable that not everybody is going to
> agree. But I hope we can agree that it's completely silly that it's
> vastly easier to find the documentation about the backup manifest
> format than it is to find the documentation on CREATE TABLE or
> shared_buffers, and if we can agree on that, then perhaps we can agree
> on some way to make things better.

Impossible to please everyone, but I'm sure we can improve things.

I've contributed to different parts of the docs over the years, and
would be happy to help with this work.

Roberto

Re: documentation structure

From

Laurenz Albe

Date:

18 March 2024, 21:40:07

On Mon, 2024-03-18 at 10:11 -0400, Robert Haas wrote:
> The two sections of the documentation that seem really
> under-emphasized to me are the GUC documentation and the SQL
> reference. The GUC documentation is all buried under "20. Server
> Configuration" and the SQL command reference is under "I. SQL
> commands". For reasons that I don't understand, all chapters except
> for those in "VI. Reference" are numbered, but the chapters in that
> section have Roman numerals instead.

That last fact is very odd indeed and could be easily fixed.

> I don't know what other people's experience is, but for me, wanting to
> know what a command does or what a setting does is extremely common.
> Therefore, I think these chapters are disproportionately important and
> should be emphasized more. In the case of the GUC reference, one idea
> I have is to split up "III. Server Administration". My proposal is
> that we divide it into three sections. The first would be called "III.
> Server Installation" and would cover chapters 16 (installation from
> binaries) through 19 (server setup and operation). The second would be
> called "IV. Server Configuration" -- so every section that's currently
> a subsection of "server configuration" would become a top-level
> chapter. The third division would be "V. Server Administration," and
> would cover the current chapters 21-33. This is probably far from
> perfect, but it seems like a relatively simple change and better than
> what we have now.

I'm fine with splitting up "Server Administration" into three sections
like you propose.

> I don't know what to do about "I. SQL commands". It's obviously
> impractical to promote that to a top-level section, because it's got a
> zillion sub-pages which I don't think we want in the top-level
> documentation index. But having it as one of several unnumbered
> chapters interposed between 51 and 52 doesn't seem great either.

I think that both the GUCs and the SQL reference could be top-level
sections.  For the GUCs there is an obvious split in sub-chapters,
and the SQL reference could be a top-level section without any chapters
under it.

> The stuff that I think is over-emphasized is as follows: (a) chapters
> 1-3, the tutorial; (b) chapters 4-6, which are essentially a
> continuation of the tutorial, and not at all similar to chapters 8-11
> which are chock-full of detailed technical information; (c) chapters
> 43-46, one per procedural language; perhaps these could just be
> demoted to sub-sections of chapter 42 on procedural languages; (d)
> chapters 47 (server programming interface), 50 (replication progress
> tracking), and 51 (archive modules), all of which are important to
> document but none of which seem important enough to put them in the
> top-level documentation index; and (e) large parts of section "VII.
> Internals," which again contain tons of stuff of very marginal
> interest. The first ~4 chapters of the internals section seem like
> they might be mainstream enough to justify the level of prominence
> that we give them, but the rest has got to be of interest to a tiny
> minority of readers.

I disagree that the tutorial is over-emphasized.

I also disagree that chapters 4 to 6 are a continuation of the tutorial.
Or at least, they shouldn't be.
When I am looking for a documentation reference on something like
security considerations of SECURITY DEFINER functions, my first
impulse is to look in chapter 5 (Data Definition) or in chapter 38
(Extending SQL), and I am surprised to find it discussed in the
SQL reference of CREATE FUNCTION.

Another case in point is the "Notes" section for CREATE VIEW.  Why is
that not somewhere under "Data Definition"?

For me, the reference should be terse and focused on the syntax.

Changing that is probably a lost cause by now, but I feel that we need
not encourage that development any more by playing down the earlier
chapters.

> I think it might be possible to consolidate the internals section by
> grouping a bunch of existing entries together by category. Basically,
> after the first few chapters, you've got stuff that is of interest to
> C programmers writing core or extension code; and you've got
> explainers on things like GEQO and index op-classes and support
> functions which might be of interest even to non-programmers. I think
> for example that we don't need separate top-level chapters on writing
> procedural language handlers, FDWs, tablesample methods, custom scan
> providers, table access methods, index access methods, and WAL
> resource managers. Some or all of those could be grouped under a
> single chapter, perhaps, e.g. Using PostgreSQL Extensibility
> Interfaces.

I have no strong feelings about that.

Yours,
Laurenz Albe

Re: documentation structure

From

Tom Lane

Date:

18 March 2024, 22:51:13

Laurenz Albe <laurenz.albe@cybertec.at> writes:
> On Mon, 2024-03-18 at 10:11 -0400, Robert Haas wrote:
>> I don't know what to do about "I. SQL commands". It's obviously
>> impractical to promote that to a top-level section, because it's got a
>> zillion sub-pages which I don't think we want in the top-level
>> documentation index. But having it as one of several unnumbered
>> chapters interposed between 51 and 52 doesn't seem great either.

> I think that both the GUCs and the SQL reference could be top-level
> sections.  For the GUCs there is an obvious split in sub-chapters,
> and the SQL reference could be a top-level section without any chapters
> under it.

I'd be in favor of promoting all three of the "Reference" things to
the top level, except that as Robert says, it seems likely that that
would end in having a hundred individual command reference pages
visible in the topmost table of contents.  Also, if we manage to
suppress that, did we really make it any more prominent?  Not sure.

Making "SQL commands" top-level with half a dozen subsections would
solve the visibility problem, but I'm not real eager to go there,
because I foresee endless arguments about which subsection a given
command goes in.  Robert's point about wanting a single alphabetized
list is valid too (although you could imagine that being a list in an
introductory section, similar to what we have for system catalogs).

This might be a silly suggestion, but: could we just render the
"most important" chapter titles in a larger font?

            regards, tom lane

Re: documentation structure

From

Robert Haas

Date:

18 March 2024, 23:34:23

On Mon, Mar 18, 2024 at 6:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> This might be a silly suggestion, but: could we just render the
> "most important" chapter titles in a larger font?

It's not the silliest suggestion ever -- you could have proposed
<blink>! -- but I also suspect it's not the right answer. Of course,
varying the font size can be a great way of emphasizing some things
more than others, but it doesn't usually work out well to just take a
document that was designed to be displayed in a uniform font size and
enlarge bits of text here and there. You usually want to have some
kind of overall plan of which font size is a single component.

For example, on a corporate home page, it's quite common to have two
nav bars, the larger of which has entries that correspond to the
company's product offerings and/or marketing materials, and the
smaller of which has "utility functions" like "login", "contact us",
and "search". Font size can be an effective tool for emphasizing the
relative importance of one nav bar versus the other, but you don't
start by deciding which things are going to get displayed in a larger
font. You start with an overall idea of the layout and then the font
size flows out of that.

Just riffing a bit, you could imagine adding a nav bar to our
documentation, either across the top or along the side, that is always
there on every page of the documentation and contains those links that
we want to make sure are always visible. Necessarily, these must be
limited in number. Then on the home page you could have the whole
table of contents as we do today, and you use that to navigate to
everything that isn't one of the quick links.

Or you can imagine that the home page of our documentation isn't just
a tree view like it is today; it might instead be written in paragraph
form. "Welcome to the PostgreSQL documentation! If you're new here,
check out our <link>tutorial</link>! Otherwise, you might be
interested in our <link>SQL reference</link>, our <link>configuration
reference</link>, or our <link>banana plantation</link>. If none of
those sound like what you want, check out the <link>documentation
index</link>." Obviously in order to actually work, something like
this would need to be expanded into enough paragraphs to actually
cover all of the important sections of the documentation, and probably
not mention banana plantations. Or maybe it wouldn't be just
paragraphs, but a two-column table, with each row of the table having
a main title and link in the narrower lefthand column and a blurb with
more links in the wider righthand column.

I'm sure there are a lot of other ways to do this, too. Our main
documentation page is very old-school, and there are probably a bunch
of ways to do better.

But I'm not sure how easy it would be to get agreement on something
specific, and I don't know how well our toolchain can support anything
other than what we've already got. I've also learned from painful
experience that you can't fix bad content with good markup. I think it
is worth spending some effort on trying to beat the existing format
into submission, promoting things that seem to deserve it and demoting
those that seem to deserve that. At some point, we'll probably reach a
point of diminishing returns, either because we all agree we've done
as well as we can, or because we can't agree on what else to do, and
maybe at that point the only way to improve further is with better web
design and/or a different documentation toolchain. But I think it's
fairly clear that we're not at that point now.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

Tom Lane

Date:

19 March 2024, 13:50:36

Daniel Gustafsson <daniel@yesql.se> writes:
> It's actually not very odd, the reference section is using <reference> elements
> and we had missed the arabic numerals setting on those.  The attached fixes
> that for me.  That being said, we've had roman numerals for the reference
> section since forever (all the way down to the 7.2 docs online has it) so maybe
> it was intentional?

I'm quite sure it *was* intentional.  Maybe it was a bad idea, but
it's not that way simply because nobody thought about it.

            regards, tom lane

Re: documentation structure

From

Andrew Dunstan

Date:

19 March 2024, 21:39:39

On Mon, Mar 18, 2024 at 10:12 AM Robert Haas <robertmhaas@gmail.com> wrote:

> I was looking at the documentation index this morning[1], and I can't
> help feeling like there are some parts of it that are over-emphasized
> and some parts that are under-emphasized. I'm not sure what we can do
> about this exactly, but I thought it worth writing an email and seeing
> what other people think.
>
> The two sections of the documentation that seem really
> under-emphasized to me are the GUC documentation and the SQL
> reference. The GUC documentation is all buried under "20. Server
> Configuration" and the SQL command reference is under "I. SQL
> commands". For reasons that I don't understand, all chapters except
> for those in "VI. Reference" are numbered, but the chapters in that
> section have Roman numerals instead.
>
> I don't know what other people's experience is, but for me, wanting to
> know what a command does or what a setting does is extremely common.
> Therefore, I think these chapters are disproportionately important and
> should be emphasized more. In the case of the GUC reference, one idea
> I have is to split up "III. Server Administration". My proposal is
> that we divide it into three sections. The first would be called "III.
> Server Installation" and would cover chapters 16 (installation from
> binaries) through 19 (server setup and operation). The second would be
> called "IV. Server Configuration" -- so every section that's currently
> a subsection of "server configuration" would become a top-level
> chapter. The third division would be "V. Server Administration," and
> would cover the current chapters 21-33. This is probably far from
> perfect, but it seems like a relatively simple change and better than
> what we have now.
>
> I don't know what to do about "I. SQL commands". It's obviously
> impractical to promote that to a top-level section, because it's got a
> zillion sub-pages which I don't think we want in the top-level
> documentation index. But having it as one of several unnumbered
> chapters interposed between 51 and 52 doesn't seem great either.
>
> The stuff that I think is over-emphasized is as follows: (a) chapters
> 1-3, the tutorial; (b) chapters 4-6, which are essentially a
> continuation of the tutorial, and not at all similar to chapters 8-11
> which are chock-full of detailed technical information; (c) chapters
> 43-46, one per procedural language; perhaps these could just be
> demoted to sub-sections of chapter 42 on procedural languages; (d)
> chapters 47 (server programming interface), 50 (replication progress
> tracking), and 51 (archive modules), all of which are important to
> document but none of which seem important enough to put them in the
> top-level documentation index; and (e) large parts of section "VII.
> Internals," which again contain tons of stuff of very marginal
> interest. The first ~4 chapters of the internals section seem like
> they might be mainstream enough to justify the level of prominence
> that we give them, but the rest has got to be of interest to a tiny
> minority of readers.
>
> I think it might be possible to consolidate the internals section by
> grouping a bunch of existing entries together by category. Basically,
> after the first few chapters, you've got stuff that is of interest to
> C programmers writing core or extension code; and you've got
> explainers on things like GEQO and index op-classes and support
> functions which might be of interest even to non-programmers. I think
> for example that we don't need separate top-level chapters on writing
> procedural language handlers, FDWs, tablesample methods, custom scan
> providers, table access methods, index access methods, and WAL
> resource managers. Some or all of those could be grouped under a
> single chapter, perhaps, e.g. Using PostgreSQL Extensibility
> Interfaces.
>
> Thoughts? I realize that this topic is HIGHLY prone to ENDLESS
> bikeshedding, and it's inevitable that not everybody is going to
> agree. But I hope we can agree that it's completely silly that it's
> vastly easier to find the documentation about the backup manifest
> format than it is to find the documentation on CREATE TABLE or
> shared_buffers, and if we can agree on that, then perhaps we can agree
> on some way to make things better.

+many for improving the index.

My own pet docs peeve is a purely editorial one: func.sgml is a 30k line beast, and I think there's a good case for splitting out at least the larger chunks of it.

cheers

andrew

Re: documentation structure

From

Robert Haas

Date:

20 March 2024, 14:32:53

On Mon, Mar 18, 2024 at 5:40 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> I also disagree that chapters 4 to 6 are a continuation of the tutorial.
> Or at least, they shouldn't be.
> When I am looking for a documentation reference on something like
> security considerations of SECURITY DEFINER functions, my first
> impulse is to look in chapter 5 (Data Definition) or in chapter 38
> (Extending SQL), and I am surprised to find it discussed in the
> SQL reference of CREATE FUNCTION.

I looked at this a bit more closely. There's actually a lot of
detailed technical information in chapters 4 and 5, but chapter 6 is
extremely short and mostly recapitulates chapter 2.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

Robert Haas

Date:

20 March 2024, 16:43:08

On Tue, Mar 19, 2024 at 5:39 PM Andrew Dunstan <andrew@dunslane.net> wrote:
> +many for improving the index.

Here's a series of four patches. Taken together, they cut down the
number of numbered chapters from 76 to 68. I think we could easily
save that much again if I wrote a few more patches along similar
lines, but I'm posting these first to see what people think.

0001 removes the "Installation from Binaries" chapter. The whole thing
is four sentences. I moved the most important information into the
"Installation from Source Code" chapter and retitled it
"Installation".

0002 removes the "Monitoring Disk Usage" chapter by folding it into
the immediately-preceding "Monitoring Database Activity" chapter. I
kind of feel like the "Monitoring Disk Usage" chapter might be in need
of a bigger rewrite or just outright removal, but there's surely not
enough content here to justify making it a top-level chapter.

0003 merges all of the "Internals" chapters whose names are the names
of built-in index access methods (Btree, Gin, etc.) into a single
chapter called "Built-In Index Access Methods". All of these chapters
have a very similar structure and none of them are very long, so it
makes a lot of sense, at least in my mind, to consolidate them into
one.

0004 merges the "Generic WAL Records" and "Custom WAL Resource
Managers" chapters together, creating a new chapter called "Write Ahead
Logging for Extensions".

Overall, I think this achieves a minor but pleasant level of
de-cluttering of the index. It's going to take a lot more than one
morning's work to produce a major improvement, but at least this is
something.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

Bruce Momjian

Date:

20 March 2024, 17:35:42

On Wed, Mar 20, 2024 at 12:43:08PM -0400, Robert Haas wrote:
> Overall, I think this achieves a minor but pleasant level of
> de-cluttering of the index. It's going to take a lot more than one
> morning's work to produce a major improvement, but at least this is
> something.

I think this kind of doc structure review is long overdue.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Re: documentation structure

From

Robert Haas

Date:

20 March 2024, 18:16:07

On Wed, Mar 20, 2024 at 1:35 PM Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Mar 20, 2024 at 12:43:08PM -0400, Robert Haas wrote:
> > Overall, I think this achieves a minor but pleasant level of
> > de-cluttering of the index. It's going to take a lot more than one
> > morning's work to produce a major improvement, but at least this is
> > something.
>
> I think this kind of doc structure review is long overdue.

Thanks, Bruce!

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

Alvaro Herrera

Date:

20 March 2024, 21:05:45

On 2024-Mar-20, Robert Haas wrote:

> 0003 merges all of the "Internals" chapters whose names are the names
> of built-in index access methods (Btree, Gin, etc.) into a single
> chapter called "Built-In Index Access Methods". All of these chapters
> have a very similar structure and none of them are very long, so it
> makes a lot of sense, at least in my mind, to consolidate them into
> one.

I think you can achieve this with a much smaller patch that just changes
the outer tag in each file so that each file is a <sect1>, then create a
single file that includes all of these plus an additional outer tag for
the <chapter> (or maybe just add the <chapter> in postgres.sgml).  This
has the advantage that each AM continues to be a separate single file,
and you still have your desired structure.
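
A minimal sketch of the per-file change described above, using
btree.sgml as the example. The ids and titles here are illustrative
assumptions rather than text copied from the tree, and the two
fragments are a before/after pair, not one document. The point is only
that the outer <chapter> becomes a <sect1> and every nested section
level moves down by one.

```xml
<!-- btree.sgml, before: the file is a chapter of its own -->
<chapter id="btree">
 <title>B-Tree Indexes</title>
 <sect1 id="btree-intro">
  <title>Introduction</title>
  <para>...</para>
 </sect1>
</chapter>

<!-- btree.sgml, after: same file, same content, one level lower -->
<sect1 id="btree">
 <title>B-Tree Indexes</title>
 <sect2 id="btree-intro">
  <title>Introduction</title>
  <para>...</para>
 </sect2>
</sect1>
```

Keeping the ids unchanged in a change like this would presumably let
existing cross-references keep resolving, which is part of why the
patch can stay small.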

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/

Re: documentation structure

From

Robert Haas

Date:

20 March 2024, 21:21:53

On Wed, Mar 20, 2024 at 5:05 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> I think you can achieve this with a much smaller patch that just changes
> the outer tag in each file so that each file is a <sect1>, then create a
> single file that includes all of these plus an additional outer tag for
> the <chapter> (or maybe just add the <chapter> in postgres.sgml).  This
> has the advantage that each AM continues to be a separate single file,
> and you still have your desired structure.

Right, that could also be done, and not just for 0003. I just wasn't
sure that was the right approach. It would mean that the division of
the SGML into files continues to reflect the original chapter
divisions rather than the current ones forever. In the short run
that's less churn, less back-patching pain, etc.; but in the long term
it means you've got relics of a structure that doesn't exist any more
sticking around forever.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

Tom Lane

Date:

20 March 2024, 21:25:45

Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Mar 20, 2024 at 5:05 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> I think you can achieve this with a much smaller patch that just changes
>> the outer tag in each file so that each file is a <sect1>, then create a
>> single file that includes all of these plus an additional outer tag for
>> the <chapter> (or maybe just add the <chapter> in postgres.sgml).  This
>> has the advantage that each AM continues to be a separate single file,
>> and you still have your desired structure.

> Right, that could also be done, and not just for 0003. I just wasn't
> sure that was the right approach. It would mean that the division of
> the SGML into files continues to reflect the original chapter
> divisions rather than the current ones forever. In the short run
> that's less churn, less back-patching pain, etc.; but in the long term
> it means you've got relics of a structure that doesn't exist any more
> sticking around forever.

I'd say that a separate file per AM is a good thing regardless.
Elsewhere in this same thread are grumblings about how big func.sgml
is; why would you think it good to start down that same path for the
AM documentation?

            regards, tom lane

Re: documentation structure

From

Robert Haas

Date:

21 March 2024, 13:23:57

On Wed, Mar 20, 2024 at 5:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'd say that a separate file per AM is a good thing regardless.
> Elsewhere in this same thread are grumblings about how big func.sgml
> is; why would you think it good to start down that same path for the
> AM documentation?

Well, I suppose I thought it was a good idea because (1) we don't seem
to have any existing precedent for file-per-sect1 rather than
file-per-chapter and (2) all of the per-AM files combined are less
than 20% of the size of func.sgml.

But, OK, if you want to establish a new paradigm here, sure. I see two
ways to do it. We can either put the <chapter> tag directly in
postgres.sgml, or I can still create a new indextypes.sgml and put
&btree; etc. inside of it. Which way do you prefer?

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

Tom Lane

Date:

21 March 2024, 13:38:18

Robert Haas <robertmhaas@gmail.com> writes:
> Well, I suppose I thought it was a good idea because (1) we don't seem
> to have any existing precedent for file-per-sect1 rather than
> file-per-chapter and (2) all of the per-AM files combined are less
> than 20% of the size of func.sgml.

We have done (1) in places, eg. json.sgml, array.sgml,
rangetypes.sgml, rowtypes.sgml, and the bulk of extend.sgml is split
out into xaggr, xfunc, xindex, xoper, xtypes.  I'd be the first to
concede it's a bit haphazard, but it's not like there's no precedent.

As for (2), func.sgml likely should have been split years ago.

> But, OK, if you want to establish a new paradigm here, sure. I see two
> ways to do it. We can either put the <chapter> tag directly in
> postgres.sgml, or I can still create a new indextypes.sgml and put
> &btree; etc. inside of it. Which way do you prefer?

I'd follow the extend.sgml precedent: have a file corresponding to the
chapter and containing any top-level text we need, then that includes
a file per sect1.
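
A rough sketch of that layout, for illustration only: the file name
indextypes.sgml and the chapter title come from upthread, while the id
values, the section title, and the placeholder text are assumptions
rather than what any patch actually contains.

```sgml
<!-- filelist.sgml: one entity per file -->
<!ENTITY indextypes SYSTEM "indextypes.sgml">
<!ENTITY btree      SYSTEM "btree.sgml">
<!ENTITY gist       SYSTEM "gist.sgml">

<!-- postgres.sgml: only the chapter-level file is referenced here -->
&indextypes;

<!-- indextypes.sgml: the chapter, any top-level text, then one include per AM -->
<chapter id="indextypes">
 <title>Built-In Index Access Methods</title>
 <para>
  Introductory text for the chapter goes here.
 </para>
 &btree;
 &gist;
</chapter>

<!-- btree.sgml: the outer tag changes from chapter to sect1 -->
<sect1 id="btree">
 <title>B-Tree</title>
 <!-- former sect1 content follows, each section demoted one level -->
</sect1>
```

Each AM stays in its own file, and only the outer tag of each file plus
the entity list would need to change.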

            regards, tom lane

Re: documentation structure

From

Robert Haas

Date:

21 March 2024, 14:31:15

On Thu, Mar 21, 2024 at 9:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'd follow the extend.sgml precedent: have a file corresponding to the
> chapter and containing any top-level text we need, then that includes
> a file per sect1.

OK, here's a new patch set. I've revised 0003 and 0004 to use this
approach, and I've added a new 0005 that does essentially the same
thing for the PL chapters.

0001 and 0002 are changed. Should 0002 use the include-an-entity
approach as well?

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment

Re: documentation structure

From

Robert Haas

Date:

21 March 2024, 16:16:11

On Thu, Mar 21, 2024 at 10:31 AM Robert Haas <robertmhaas@gmail.com> wrote:
> 0001 and 0002 are changed. Should 0002 use the include-an-entity
> approach as well?

Woops. I meant to say that 0001 and 0002 are *unchanged*.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

Alvaro Herrera

Date:

21 March 2024, 16:42:58

On 2024-Mar-21, Robert Haas wrote:

> On Thu, Mar 21, 2024 at 9:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I'd follow the extend.sgml precedent: have a file corresponding to the
> > chapter and containing any top-level text we need, then that includes
> > a file per sect1.
> 
> OK, here's a new patch set. I've revised 0003 and 0004 to use this
> approach, 

Great, thanks.  Looking at the index in the PDF after (only) 0003, we
now have this structure

62. Table Access Method Interface Definition ........ 2475
63. Index Access Method Interface Definition ........ 2476
    63.1. Basic API Structure for Indexes ........... 2476
    63.2. Index Access Method Functions ............. 2479
    63.3. Index Scanning ............................ 2485
    63.4. Index Locking Considerations .............. 2486
    63.5. Index Uniqueness Checks ................... 2487
    63.6. Index Cost Estimation Functions ........... 2489
64. Generic WAL Records ............................. 2492
65. Custom WAL Resource Managers .................... 2494
66. Built-in Index Access Methods ................... 2496

which is a bit odd: why are the two WAL chapters in the middle of the
chapters 62 and 63 talking about AMs?  Maybe put 66 right after 63
instead.    Also, is it really better to have 62/63 first and 66
later?  It sounds to me like 66 is more user-oriented and the other two
are developer-oriented, so I'm inclined to suggest putting them the
other way around, but I'm not really sure about this.  (Also, starting
chapter 66 straight with 66.1 BTree without any intro text looks a bit
odd; maybe one short introductory paragraph is sufficient?)

> and I've added a new 0005 that does essentially the same
> thing for the PL chapters.

I was looking at the PL chapters earlier today too, wondering whether
this would be valuable; but I worry that there are too many
sub-sub-sections there, so it could end up being a bit messy.  I didn't
look at the resulting output though.

> 0001 and 0002 are [un]changed. Should 0002 use the include-an-entity
> approach as well?

Shrug, I wouldn't, doesn't look worth it.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"It is not good to walk with a dead man"

Re: documentation structure

From

Robert Haas

Date:

21 March 2024, 18:29:55

On Thu, Mar 21, 2024 at 12:43 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> which is a bit odd: why are the two WAL chapters in the middle of the
> chapters 62 and 63 talking about AMs?  Maybe put 66 right after 63
> instead.    Also, is it really better to have 62/63 first and 66
> later?  It sounds to me like 66 is more user-oriented and the other two
> are developer-oriented, so I'm inclined to suggest putting them the
> other way around, but I'm not really sure about this.  (Also, starting
> chapter 66 straight with 66.1 BTree without any intro text looks a bit
> odd; maybe one short introductory paragraph is sufficient?)

I had similar thoughts. I think that we should consider some changes
to the chapter ordering, but I didn't want to try to change too many
things all at once, since somebody only has to hate one thing about
the patch to sink the whole thing.

But since you brought it up, what I've been thinking about is that the
whole division into parts might need to be rethought a bit. I feel
like "VII. Internals" is a mix of about four different kinds of
content. First, the biggest portion of it is information about
developing certain kinds of C extensions -- all the "Writing a
Whatever" chapters, the "Whatever Access Method Interface Definition"
chapters, "Generic WAL Records", "Custom WAL Resource Managers", and
all the index-related chapters. Second, we've got some information
that I think is mainly of interest to people developing PostgreSQL
itself, namely, "PostgreSQL Coding Conventions", "Native Language
Support", and "System Catalog Declarations and Initial Contents". You
*might* care about these if you're developing extensions, or even if
you're not a developer at all, but then again you might not. Third,
we've got some reference material, namely "System Catalogs", "System
Views", and perhaps "Frontend/Backend Protocol". I distinguish these
from the previous two categories because I think you could care about
this stuff as a random user, or a developer of products that
interoperate with PostgreSQL but don't link with it or share any
common code. Finally, there's just a bunch of random bits and bobs
that we've decided to document here for one reason or another, either
because somebody else did a bunch of the work, like "Overview of
PostgreSQL Internals", or because some developer did something and
someone said "hey, that should be documented!", like "Backup Manifest
Format."

So my first thought is to pull out the stuff that's mainly for
PostgreSQL core developers and move it to an appendix. I propose we
create an appendix called "Developer Guide" and that it absorb the
existing appendix I, "The Source Code Repository", possibly all or
part of appendix J, "Documentation", and the chapters from "VII.
Internals" that are mostly of developer interest. I think that
possibly some of what's in "J. Documentation" should actually be moved
into the "Installation" chapter where we talk about building the
source code, because it doesn't make much sense to document the build
tool chain in one part of the documentation and the documentation
toolchain someplace else entirely, but "J.6. Style Guide" is developer
information, not build instructions.

My second thought is that the stuff from "VII. Internals" that I
categorized as reference material should move into section "VI.
Reference". I think we should also consider moving appendix F,
"Additional Supplied Modules and Extensions," and appendix G,
"Additional Supplied Programs" to the reference section. However,
prior to doing that, I think that appendix G needs some cleanup or
possibly we should just find a way to remove it outright. We're
shipping an appendix G with two major subsections, one of which is
completely empty and has been since v14, and the other of which
contains only two things. I think we should just remove the empty
sub-section entirely. I'm not sure what to do about the one with only
2 things in it (vacuumlo and oid2name). Would it be a bad idea to just
merge those bits into the client applications reference section?

My third thought is about what to do with the material in "VII.
Internals" that is about developing specific kinds of extensions, like,
say, "Writing a Foreign Data Wrapper." If you look at "V. Server
Programming", you see that we actually have some very similar sections
there, like chapter 47, "Background Worker Processes" and chapter 50,
"Archive Modules". I think it's not very clear in the current
structure whether topics that are relevant for extension developers
should go under "Server Programming" or under "Internals", and it
looks to me like different people have just done different things and
it's all a bit haphazard. One idea is to decide that the correct
answer is "Server Programming" and move all of the internals chapters
that fall into this category over to there. I don't think this is the
right answer, because that section also contains information about a
bunch of stuff that's strictly SQL-level, like rules and triggers. So
what I think we should do is create either [A] a new top-level part,
just before or just after what's currently called "VI. Reference" or
[B] a new appendix or [C] a new "Reference" section, that is
specifically for documentation of server APIs intended for extension
use. And then all the chapters under "V. Server Programming" or "VII.
Internals" that are documenting APIs would get moved there.

If we adopted all of the patches that I proposed in my previous email
and all of the suggestions that I just dropped in the preceding wall
of text, then the internals section would be left with only these
chapters:

- Overview of PostgreSQL Internals
- Genetic Query Optimizer
- Database Physical Storage
- Transaction Processing
- How the Planner Uses Statistics
- Backup Manifest Format

A lot of those chapters are pretty dated and maybe not that useful in
2024, but this email is already long enough and full of
sufficiently-aggressive proposals that I'm not inclined to opine too
much further on what we might want to do if and when we've done
everything I just proposed. For now, suffice it to say that I think we
might choose to either rewrite and expand some of these to make them
more useful, or demote some of them to some less prominent place in
the documentation, or just delete some of them entirely; but we can
figure that out if and when we get there.

> I was looking at the PL chapters earlier today too, wondering whether
> this would be valuable; but I worry that there are too many
> sub-sub-sections there, so it could end up being a bit messy.  I didn't
> look at the resulting output though.

That thought occurred to me as well. I certainly think that if we
perform the sort of aggressive purging of the top-level index for
which I'm advocating, there are going to be some people who are grumpy
that the stuff they're trying to find isn't where it used to be, or
who legitimately had trouble finding the content that they want. It
seems to me that if you're looking for the documentation on one of the
individual procedural languages and you don't see it, you'll try
clicking on "Procedural Languages" and then you'll find that it's now
under there. Now, what's maybe a bit unfortunate is that chapter
indexes only show two levels of section headings, and the PL/pgsql
chapter in particular has a lot of <sect2> items. If those get demoted
to <sect3> as I am proposing, they won't show up in the chapter index any
more. I do think there's a possibility that this could be a problem
for someone.
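
To make the demotion concrete, here is an illustrative fragment. The id
values and titles are assumptions for the sake of the sketch, not quoted
from plpgsql.sgml or from 0005.

```sgml
<!-- Before: PL/pgSQL is its own chapter, and the chapter index lists
     both the sect1 and the sect2 headings. -->
<chapter id="plpgsql">
 <title>PL/pgSQL</title>
 <sect1 id="plpgsql-control-structures">
  <title>Control Structures</title>
  <sect2 id="plpgsql-statements-returning">
   <title>Returning from a Function</title>
  </sect2>
 </sect1>
</chapter>

<!-- After: every level moves down by one, so the former sect2 becomes
     a sect3 and drops out of the two-level chapter index. -->
<sect1 id="plpgsql">
 <title>PL/pgSQL</title>
 <sect2 id="plpgsql-control-structures">
  <title>Control Structures</title>
  <sect3 id="plpgsql-statements-returning">
   <title>Returning from a Function</title>
  </sect3>
 </sect2>
</sect1>
```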

On the other hand, the table of contents for
https://www.postgresql.org/docs/devel/plpgsql.html is so long right
now that it doesn't fit on the page, so maybe losing a level of
subsections won't be so bad. Alternately, maybe we could revise the
structure of the section a bit to ameliorate the problem. It seems to
me that most of the first-level section headers are actually pretty
clear about what you're likely to find underneath. If you're looking
for WHILE and you see a section called "Control Structures", it seems
like your chances of guessing that WHILE will be underneath that
section are pretty good. The major exception that I see is 45.2,
"Basic Statements," which isn't very clear about what might be covered
there. But what if we split that apart into separate sections called
"Assignment", "Executing SQL", and "Doing Nothing at All"? And maybe
we'd even pull "Returning" out of "Control Structures" as well. I
think that would be clear enough for people to find what they need
without the extra level of headers.

(For the sake of completeness, let me note that PL/python and PL/perl
have a few <sect2> headings as well, but I don't think it would create
a problem for users if all of those got changed to <sect3>. PL/Tcl has
no <sect2> headings.)

I'm not sure if this kind of rearrangement is actually necessary or
not; but my point here is that if we think that people will have
problems or we find out that they actually did have problems, we can
look at doing this kind of stuff to compensate. What I don't think we
should do is decide that the only workable solution is to keep having
so many separate chapters at the top level. We're way, way beyond the
point where you can easily find anything on that page, and trying to
emphasize everything just ends up emphasizing nothing. We need to push
in a direction where every chapter and every appendix is expected to
have a large amount of content under it, so that the top-level index
becomes a way of finding the kind of content you want (SQL reference
pages, extension APIs, built-in SQL-callable functions, whatever) and
then you use that page to find the specific content that you want
within that category.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

"David G. Johnston"

Date:

21 March 2024, 22:31:37

On Wed, Mar 20, 2024 at 9:43 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Mar 19, 2024 at 5:39 PM Andrew Dunstan <andrew@dunslane.net> wrote:
> > +many for improving the index.
>
> Here's a series of four patches.

I reviewed the most recent set of 5 patches.

> Taken together, they cut down the
> number of numbered chapters from 76 to 68. I think we could easily
> save that much again if I wrote a few more patches along similar
> lines, but I'm posting these first to see what people think.
>
> 0001 removes the "Installation from Binaries" chapter. The whole thing
> is four sentences. I moved the most important information into the
> "Installation from Source Code" chapter and retitled it
> "Installation".

Makes sense

> 0002 removes the "Monitoring Disk Usage" chapter by folding it into
> the immediately-preceding "Monitoring Database Activity" chapter. I
> kind of feel like the "Monitoring Disk Usage" chapter might be in need
> of a bigger rewrite or just outright removal, but there's surely not
> enough content here to justify making it a top-level chapter.

Just going to note that the section on the cumulative statistics views being a single page is still a strongly bothersome issue here. Though the quick fix actually requires upgrading the section to chapter status...

Maybe we can stub out that section in the "Monitoring Database Activity" chapter and move that entire section after "System Views" in the Internals part?

I agree with subordinating Monitoring Disk Usage.

> 0003 merges all of the "Internals" chapters whose names are the names
> of built-in index access methods (Btree, Gin, etc.) into a single
> chapter called "Built-In Index Access Methods". All of these chapters
> have a very similar structure and none of them are very long, so it
> makes a lot of sense, at least in my mind, to consolidate them into
> one.

One of the more impactful and wanted improvements, IMO.

> 0004 merges the "Generic WAL Records" and "Custom WAL Resource
> Managers" chapter together, creating a new chapter called "Write Ahead
> Logging for Extensions".

The positioning of this and the preceding Built-in Index Access Methods chapter seem like they should be switched.

If this sticks we should add an introductory paragraph for the chapter.

> and I've added a new 0005 that does essentially the same
> thing for the PL chapters.

The following page needs to be reworded to take the new structure into account:

https://www.postgresql.org/docs/current/xfunc-pl.html

Not having pl/pgsql appear on the main ToC seems like a loss but the others make sense and a special exception for it probably isn't warranted.

Maybe "pl/pgsql and Other Procedural Languages" as the title?

David J.

Re: documentation structure

From

"David G. Johnston"

Date:

21 March 2024, 22:57:05

On Thu, Mar 21, 2024 at 11:30 AM Robert Haas <robertmhaas@gmail.com> wrote:

> My second thought is that the stuff from "VII. Internals" that I
> categorized as reference material should move into section "VI.
> Reference". I think we should also consider moving appendix F,
> "Additional Supplied Modules and Extensions," and appendix G,
> "Additional Supplied Programs" to the reference section.

For "VI. Reference" I propose the following Chapters:

SQL Commands

PL/pgSQL

Cumulative Statistics Views

System Views

System Catalogs

Client Applications

Server Applications

Modules and Extensions

-- Remove Appendix G (Programs) altogether and just note for the two that are listed that they are in contrib as opposed to core.

-- The PostgreSQL qualifier doesn't seem helpful and once you add the additional chapters its unusual presence stands out even more.

-- PL/pgSQL gets its own reference chapter since we wrote it. Stuff like Perl and Python have entire books that the user can consult as reference material for those languages.

David J.

Re: documentation structure

From

Peter Eisentraut

Date:

21 March 2024, 23:33:22

On 19.03.24 14:50, Tom Lane wrote:
> Daniel Gustafsson <daniel@yesql.se> writes:
>> It's actually not very odd, the reference section is using <reference> elements
>> and we had missed the arabic numerals setting on those.  The attached fixes
>> that for me.  That being said, we've had roman numerals for the reference
>> section since forever (all the way down to the 7.2 docs online has it) so maybe
>> it was intentional?
> 
> I'm quite sure it *was* intentional.  Maybe it was a bad idea, but
> it's not that way simply because nobody thought about it.

Looks to me it was just that way because it's the default setting of the 
stylesheets.

Re: documentation structure

From

Peter Eisentraut

Date:

21 March 2024, 23:37:19

On 20.03.24 17:43, Robert Haas wrote:
> 0001 removes the "Installation from Binaries" chapter. The whole thing
> is four sentences. I moved the most important information into the
> "Installation from Source Code" chapter and retitled it
> "Installation".

But this separation was explicitly added a few years ago, because most 
people just want to read about the binaries.

Re: documentation structure

From

Peter Eisentraut

Date:

21 March 2024, 23:40:38

On 21.03.24 15:31, Robert Haas wrote:
> On Thu, Mar 21, 2024 at 9:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'd follow the extend.sgml precedent: have a file corresponding to the
>> chapter and containing any top-level text we need, then that includes
>> a file per sect1.
> 
> OK, here's a new patch set. I've revised 0003 and 0004 to use this
> approach, and I've added a new 0005 that does essentially the same
> thing for the PL chapters.

I'm highly against this.  If I want to read about PL/Python, why should 
I have to wade through PL/Perl and PL/Tcl?

I think, abstractly, in a book, PL/Python should be a chapter of its 
own.  Just like GiST should be a chapter of its own.  Because they are 
self-contained topics.

Re: documentation structure

From

Daniel Gustafsson

Date:

22 March 2024, 00:12:30

> On 22 Mar 2024, at 00:33, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 19.03.24 14:50, Tom Lane wrote:
>> Daniel Gustafsson <daniel@yesql.se> writes:
>>> It's actually not very odd, the reference section is using <reference> elements
>>> and we had missed the arabic numerals setting on those.  The attached fixes
>>> that for me.  That being said, we've had roman numerals for the reference
>>> section since forever (all the way down to the 7.2 docs online has it) so maybe
>>> it was intentional?
>> I'm quite sure it *was* intentional.  Maybe it was a bad idea, but
>> it's not that way simply because nobody thought about it.
>
> Looks to me it was just that way because it's the default setting of the stylesheets.

That's quite possible.  I don't have strong opinions on whether we should
change, or keep it the way it is.

--
Daniel Gustafsson

Re: documentation structure

From

Bruce Momjian

Date:

22 March 2024, 00:14:37

On Fri, Mar 22, 2024 at 01:12:30AM +0100, Daniel Gustafsson wrote:
> > On 22 Mar 2024, at 00:33, Peter Eisentraut <peter@eisentraut.org> wrote:
> > 
> > On 19.03.24 14:50, Tom Lane wrote:
> >> Daniel Gustafsson <daniel@yesql.se> writes:
> >>> It's actually not very odd, the reference section is using <reference> elements
> >>> and we had missed the arabic numerals setting on those.  The attached fixes
> >>> that for me.  That being said, we've had roman numerals for the reference
> >>> section since forever (all the way down to the 7.2 docs online has it) so maybe
> >>> it was intentional?
> >> I'm quite sure it *was* intentional.  Maybe it was a bad idea, but
> >> it's not that way simply because nobody thought about it.
> > 
> > Looks to me it was just that way because it's the default setting of the stylesheets.
> 
> That's quite possible.  I don't have strong opinions on whether we should
> change, or keep it the way it is.

If we can't justify why it should be different, it should be like the
surrounding sections.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Re: documentation structure

From

Robert Haas

Date:

22 March 2024, 12:32:14

On Thu, Mar 21, 2024 at 6:32 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> Just going to note that the section on the cumulative statistics views
> being a single page is still a strongly bothersome issue here. Though
> the quick fix actually requires upgrading the section to chapter status...

Yeah, I've been bothered by this in the past, too. I'm not very keen
to start promoting things to the top-level, though. I think we need a
more thoughtful fix than that.

One question I have is why all of these views are documented here
rather than in chapter 53, "System Views," because surely they are
system views. I feel like if our documentation index weren't a mile
long and if you could easily find the entry for "System Views," that's
where you would naturally look for these details. I don't think it's
natural for a user to expect that most of the system views are going
to be documented in section VII, chapter 53 but one particular kind is
going to be documented in section III, chapter 27, under a chapter
title that gives no hint that it will document any views.

> Maybe "pl/pgsql and Other Procedural Languages" as the title?

I guess I have a hard time seeing this as an improvement. It would
help someone who knows that plpgsql exists but doesn't know that it
falls into the general category called procedural languages, but I
suspect that's not a very common confusion. I think it's better to
keep the chapter titles short and to the point.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

Robert Haas

Date:

22 March 2024, 12:50:38

On Thu, Mar 21, 2024 at 7:37 PM Peter Eisentraut <peter@eisentraut.org> wrote:
> On 20.03.24 17:43, Robert Haas wrote:
> > 0001 removes the "Installation from Binaries" chapter. The whole thing
> > is four sentences. I moved the most important information into the
> > "Installation from Source Code" chapter and retitled it
> > "Installation".
>
> But this separation was explicitly added a few years ago, because most
> people just want to read about the binaries.

I really doubt that this is true. I've been installing software on
UNIX-like operating systems for more than 30 years now, and I don't
think there's been a single time when I have ever consulted the
documentation for a software package to find the download location for
that package. When I first started out, everything was ftp rather than
www, so you went to ftp.whatever.{com,org,net,gov,edu} and tried to
download the distribution bundle, and then you untarred it and ran
configure and make. Then you read the README or the documentation or
whatever afterward. These days, I think what people do is either (a)
use their package manager to install PostgreSQL and then come to the
documentation afterward to find out how to use it or (b) do a search
for "PostgreSQL download" and click on whatever comes up. I'm not
saying there's never been a user who made use of this section of the
documentation to find the download location, but surely the normal
thing to do if you come to www.postgresql.org and you want to download
the software is to click "Download" on the nav bar, not
"Documentation," then a specific version, then chapter 16, then the
exact same download link that's already there on the nav bar.

I do agree that it is very questionable whether "Installation from
Source Code" is of sufficient interest to ordinary users to justify
including it in "III. Server Administration." Most people, probably
including many extension developers, are only going to install the
binary packages. But the solution to that isn't to have a
four-sentence chapter telling me about a download location that I
likely found long before I looked at the documentation, and that I can
certainly find very easily without needing the documentation. Rather,
what we should do if we think that installing from source code is of
marginal interest is move it to an appendix. As I said to Alvaro
yesterday, I think that a "Developer Guide" appendix could be a good
place to house a number of things that currently have top-level
chapters but don't really need them because they're only of interest
to a small minority of users. This might be another thing that could
go there.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

Peter Eisentraut

Date:

22 March 2024, 13:35:47

On 22.03.24 13:50, Robert Haas wrote:
> On Thu, Mar 21, 2024 at 7:37 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>> On 20.03.24 17:43, Robert Haas wrote:
>>> 0001 removes the "Installation from Binaries" chapter. The whole thing
>>> is four sentences. I moved the most important information into the
>>> "Installation from Source Code" chapter and retitled it
>>> "Installation".
>>
>> But this separation was explicitly added a few years ago, because most
>> people just want to read about the binaries.
> 
> I really doubt that this is true.

Here is the thread: 
https://www.postgresql.org/message-id/flat/CABUevExRCf8waYOsrCO-QxQL50XGapMf5dnWScOXj7X%3DMXW--g%40mail.gmail.com

Re: documentation structure

From

Robert Haas

Date:

22 March 2024, 13:59:47

On Thu, Mar 21, 2024 at 7:40 PM Peter Eisentraut <peter@eisentraut.org> wrote:
> I'm highly against this.  If I want to read about PL/Python, why should
> I have to wade through PL/Perl and PL/Tcl?
>
> I think, abstractly, in a book, PL/Python should be a chapter of its
> own.  Just like GiST should be a chapter of its own.  Because they are
> self-contained topics.

On the other hand, in a book, chapters tend to be of relatively
uniform length. People don't usually write a book with some chapters
that are 100+ pages long, and others that are a single page, or even
just a couple of sentences. I mean, I'm sure it's been done, but it's
not a normal way to write a book.

And I don't believe that if someone were writing a physical book about
PostgreSQL from scratch, they'd ever end up with a top-level chapter
that looks anything like our GiST chapter. All of the index AM
chapters are quite obviously clones of each other, and they're all
quite short. Surely you'd make them sections within a chapter, not
entire chapters.

I do agree that PL/pgsql is more arguable. I can imagine somebody
writing a book about PostgreSQL and choosing to make that topic into a
whole chapter.

However, I also think that people don't make decisions about what
should be a chapter in a vacuum. If you've got 100 people writing a
book together, which is essentially what we actually do have, and each
of those people makes decisions in isolation about what is worthy of
being a chapter, then you end up with exactly the kind of mess that we
now have. Some chapters are long and some are short. Some are
well-written and some are poorly written. Some are updated regularly
and others have hardly been touched in a decade. Books have editors to
straighten out those kinds of inconsistencies so that there's some
uniformity to the product as a whole.

The problem with that, of course, is that it invites bike-shedding. As
you say, every decision that is reflected in our documentation was
made for some reason, and most of them will have been made by
prominent, active committers. So discussions about how to improve
things can easily bog down even when people agree on the overall
goals, simply because few individual changes find consensus. I hope
that doesn't happen here, because I think most people who have
commented so far agree that there is a problem here and that we should
try to fix it. Let's not let the perfect be the enemy of the good.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

Robert Haas

Date:

22 March 2024, 14:10:22

On Fri, Mar 22, 2024 at 9:35 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> >> But this separation was explicitly added a few years ago, because most
> >> people just want to read about the binaries.
> >
> > I really doubt that this is true.
>
> Here is the thread:
> https://www.postgresql.org/message-id/flat/CABUevExRCf8waYOsrCO-QxQL50XGapMf5dnWScOXj7X%3DMXW--g%40mail.gmail.com

Sorry. I didn't mean to dispute the point that the section was added a
few years ago, nor the point that most people just want to read about
the binaries. I am confident that both of those things are true. What
I do want to dispute is that having a four-sentence chapter in the
documentation index that tells people something they can find much
more easily without using the documentation at all is a good plan. I
agree with the concern that Magnus expressed on the thread, i.e:

> It's kind of strange that if you start your PostgreSQL journey by reading our instructions, you get nothing useful
> about installing PostgreSQL from binary packages other than "go ask somebody else about it".

But I don't agree that this was the right way to address that problem.
I think it would have been better to just add the download link to the
existing installation chapter. That's actually what we had in chapter
18, "Installation from Source Code on Windows", since removed. But for
some reason we decided that on non-Windows platforms, it needed a
whole new chapter rather than an extra sentence in the existing one. I
think that's massively overkill.

Alternately, I think it would be reasonable to address the concern by
just moving all the stuff about building from source code to an
appendix, and assume people can figure out how to download the
software without us needing to say anything in the documentation at
all. What was weird about the state before that patch, IMHO, was that
we both talked about building from source code and didn't talk about
binary packages. That can be addressed either by adding a mention of
binary packages, or by deemphasizing the idea of installing from
source code.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

"David G. Johnston"

Date:

22 March 2024, 15:50:21

On Fri, Mar 22, 2024 at 7:10 AM Robert Haas <robertmhaas@gmail.com> wrote:

> That's actually what we had in chapter
> 18, "Installation from Source Code on Windows", since removed. But for
> some reason we decided that on non-Windows platforms, it needed a
> whole new chapter rather than an extra sentence in the existing one. I
> think that's massively overkill.

I agree with the premise that we should have a single chapter, in the main documentation flow, named "Installation". It should cover the architectural overview and point people to where they can find the stuff they need to install PostgreSQL in the various ways available to them. I agree with moving the source installation material to the appendix. None of the sections under Installation would then actually detail how to install the software since that isn't something the project itself handles but has delegated to packagers for the vast majority of cases and the source install details are in the appendix for the one "supported" mechanism that most people do not use.

David J.

Re: documentation structure

From

Robert Haas

Date:

22 March 2024, 16:32:40

On Fri, Mar 22, 2024 at 11:50 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> On Fri, Mar 22, 2024 at 7:10 AM Robert Haas <robertmhaas@gmail.com> wrote:
>> That's actually what we had in chapter
>> 18, "Installation from Source Code on Windows", since removed. But for
>> some reason we decided that on non-Windows platforms, it needed a
>> whole new chapter rather than an extra sentence in the existing one. I
>> think that's massively overkill.
>
> I agree with the premise that we should have a single chapter, in the main documentation flow, named "Installation".
> It should cover the architectural overview and point people to where they can find the stuff they need to install
> PostgreSQL in the various ways available to them.  I agree with moving the source installation material to the appendix.
> None of the sections under Installation would then actually detail how to install the software since that isn't
> something the project itself handles but has delegated to packagers for the vast majority of cases and the source
> install details are in the appendix for the one "supported" mechanism that most people do not use.

Hmm, that's not quite the same as my position. I'm fine with either
moving the installation from source material to an appendix, or
leaving it where it is. But I'm strongly against continuing to have a
chapter with four sentences in it that says to use the same download
link that is on the main navigation bar of every page on the
postgresql.org web site. We're never going to get the chapter index
down to a reasonable size if we insist on having chapters that have a
totally de minimis amount of content.

So my feeling is that if we keep the installation from source material
where it is, then we can make it also mention the download link, just
as we used to do in the installation-on-windows chapter. But if we
banish installation from source to the appendixes, then we shouldn't
keep a whole chapter in the main documentation to tell people
something that is anyway obvious. I don't really think that material
needs to be there at all, but if we want to have it, surely we can
find someplace to put it such that it doesn't require a whole chapter
to say that and nothing else. It could, for example, go at the beginning
of the "Server Setup and Operation" chapter; if that
were the first chapter of section III, I think that would be natural
enough.

I notice that you say that the "Installation" section should "cover
the architectural overview and point people to where they can find the
stuff they need to install PostgreSQL in the various ways available to
them" so maybe you're not imagining a four-sentence chapter, either.
But this project is going to be impossible unless we stick to limited
goals. We can, and should, rewrite some sections of the documentation
to be more useful; but if we try to do that as part of the same
project that aims to tidy up the index, the chances of us getting
stuck in an endless bikeshedding loop go from "high" to "certain". So
I don't want to hypothesize the existence of an installation chapter
that isn't any of the things we have today. Let's try to get the
things we have into places that make sense, and then consider other
improvements separately.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

"David G. Johnston"

Date:

22 March 2024, 16:55:24

On Fri, Mar 22, 2024, 09:32 Robert Haas <robertmhaas@gmail.com> wrote:

> I notice that you say that the "Installation" section should "cover
> the architectural overview and point people to where they can find the
> stuff they need to install PostgreSQL in the various ways available to
> them" so maybe you're not imagining a four-sentence chapter, either.

Fair point but I posit that new users are looking for a chapter named Installation in the documentation. At least the ones willing to read documentation. Having two of them isn't needed but having zero doesn't make sense either.

The current proposal does that so I'm ok as-is but it can be further improved by moving source install talk elsewhere and having the installation chapter redirect the reader there for details. I'm not concerned with how long or short the resultant installation chapter is.

David J.

Re: documentation structure

From

Bruce Momjian

Date:

22 March 2024, 17:35:17

On Fri, Mar 22, 2024 at 08:32:14AM -0400, Robert Haas wrote:
> On Thu, Mar 21, 2024 at 6:32 PM David G. Johnston
> <david.g.johnston@gmail.com> wrote:
> > Just going to note that the section on the cumulative statistics views being a single page is still a strongly
> > bothersome issue here.  Though the quick fix actually requires upgrading the section to chapter status...
> 
> Yeah, I've been bothered by this in the past, too. I'm not very keen
> to start promoting things to the top-level, though. I think we need a
> more thoughtful fix than that.
> 
> One question I have is why all of these views are documented here
> rather than in chapter 53, "System Views," because surely they are
> system views. I feel like if our documentation index weren't a mile
> long and if you could easily find the entry for "System Views," that's
> where you would naturally look for these details. I don't think it's
> natural for a user to expect that most of the system views are going
> to be documented in section VII, chapter 53 but one particular kind is
> going to be documented in section III, chapter 27, under a chapter

Well, until this commit in 2022, the system views were _under_ the
system catalogs chapter:

    commit 64d364bb39c
    Author: Bruce Momjian <bruce@momjian.us>
    Date:   Thu Jul 14 16:07:12 2022 -0400
    
        doc:  move system views section to its own chapter
    
        Previously it was inside the system catalogs chapter.
    
        Reported-by: Peter Smith
    
        Discussion: https://postgr.es/m/CAHut+PsMc18QP60D+L0hJBOXrLQT5m88yVaCDyxLq34gfPHsow@mail.gmail.com
    
        Backpatch-through: 15

The thread contains more discussion of the issue, and I think it still needs help:


https://www.postgresql.org/message-id/flat/CAHut%2BPsMc18QP60D%2BL0hJBOXrLQT5m88yVaCDyxLq34gfPHsow%40mail.gmail.com

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Re: documentation structure

From

Robert Haas

Date:

22 March 2024, 18:19:29

On Fri, Mar 22, 2024 at 1:35 PM Bruce Momjian <bruce@momjian.us> wrote:
> > One question I have is why all of these views are documented here
> > rather than in chapter 53, "System Views," because surely they are
> > system views. I feel like if our documentation index weren't a mile
> > long and if you could easily find the entry for "System Views," that's
> > where you would naturally look for these details. I don't think it's
> > natural for a user to expect that most of the system views are going
> > to be documented in section VII, chapter 53 but one particular kind is
> > going to be documented in section III, chapter 27, under a chapter
>
> Well, until this commit in 2022, the system views were _under_ the
> system catalogs chapter:

Even before that commit, the statistics collector views were
documented in a completely separate part of the documentation from all
of the other system views.

I think that commit was a good idea, even though it made the top-level
documentation index bigger, because in v14, the "System Catalogs"
chapter looks like this:

...
52.61. pg_ts_template
52.62. pg_type
52.63. pg_user_mapping
52.64. System Views
52.65. pg_available_extensions
52.66. pg_available_extension_versions
52.67. pg_backend_memory_contexts
...

If you were actually looking for the section called "System Views",
you weren't likely to see it here unless you already knew it was
there, because it was 64 items into a 97-item list. Having one of
these two sections inside the other just doesn't work at all. We could
have alternatively chosen to have one chapter with two <sect1> tags
inside of it, but I think what you actually did was perfectly fine.
IMHO, "System Views" is important enough (and big enough) that giving
it its own chapter is perfectly reasonable.

But that all seems like a separate question from why we have the
statistic collector views in a completely different part of the
documentation from the rest of the system views. My guess is that it's
just kind of a historical accident, but maybe there was some other
logic to it.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

Bruce Momjian

Date:

22 March 2024, 18:59:44

On Fri, Mar 22, 2024 at 02:19:29PM -0400, Robert Haas wrote:
> If you were actually looking for the section called "System Views",
> you weren't likely to see it here unless you already knew it was
> there, because it was 64 items into a 97-item list. Having one of
> these two sections inside the other just doesn't work at all. We could
> have alternatively chosen to have one chapter with two <sect1> tags
> inside of it, but I think what you actually did was perfectly fine.
> IMHO, "System Views" is important enough (and big enough) that giving
> it its own chapter is perfectly reasonable.
> 
> But that all seems like a separate question from why we have the
> statistic collector views in a completely different part of the
> documentation from the rest of the system views. My guess is that it's
> just kind of a historical accident, but maybe there was some other
> logic to it.

I assume statistics collector views are in "Monitoring Database
Activity" because that is their purpose.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Re: documentation structure

From

"David G. Johnston"

Date:

22 March 2024, 19:06:18

On Fri, Mar 22, 2024 at 11:19 AM Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Mar 22, 2024 at 1:35 PM Bruce Momjian <bruce@momjian.us> wrote:

> But that all seems like a separate question from why we have the
> statistic collector views in a completely different part of the
> documentation from the rest of the system views. My guess is that it's
> just kind of a historical accident, but maybe there was some other
> logic to it.

The details underpinning the cumulative statistics subsystem are definitely large enough to warrant their own subsection. And it isn't as if placing them in the monitoring chapter is wrong; aside from a couple of views, those under System Views don't fit into what we've defined as monitoring. I don't have any desire to lump them under the generic system views, which itself could probably use a level of categorization, since the nature of pg_locks and pg_cursors is decidedly different from that of pg_indexes and pg_config. This all becomes more appealing to work on once we solve the problem of all sect2 entries being placed on a single page.

For a long while I struggled because I'd always look for pg_stat_activity under system views instead of monitoring. Amending my prior suggestion in light of this, I would suggest we move the Cumulative Statistics Views into Reference, but as its own chapter, not part of System Views, and change its name to "Monitoring Views" (going more generalized here feels like a win to me). I'd also move pg_locks, pg_cursors, pg_backend_memory_contexts, pg_prepared_*, pg_shmem_allocations, and pg_replication_* there. Those all have the same general monitoring nature to them, compared to the others, which basically provide details regarding schema and static or session configuration.

The original server admin monitoring section can go into detail regarding Cumulative Statistics versus other kinds of monitoring. We can use section ordering to fulfill logical grouping desires until we are able to make section3 entries appear on their own pages.

David J.

Re: documentation structure

From

Robert Haas

Date:

22 March 2024, 19:13:29

On Fri, Mar 22, 2024 at 2:59 PM Bruce Momjian <bruce@momjian.us> wrote:
> I assume statistics collector views are in "Monitoring Database
> Activity" because that is their purpose.

Well, yes. :-)

But the point is that all other system views are in a single
section regardless of their purpose. We don't document pg_roles in the
"Database Roles" chapter, for example.

And on the flip side, pg_locks and pg_replication_origin_status are
also for monitoring database activity, but they're in the "System
Views" chapter anyway. The only system views that are in "Monitoring
Database Activity" rather than "System Views" are the ones where the
name starts with "pg_stat_".

So the reason you state is why these views are under "Monitoring
Database Activity" rather than a chapter chosen at random. But it
doesn't really explain why they're separate from the other system
views at all. That seems to be a pretty much random choice, AFAICT.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

Bruce Momjian

Date:

22 March 2024, 19:17:48

On Fri, Mar 22, 2024 at 03:13:29PM -0400, Robert Haas wrote:
> On Fri, Mar 22, 2024 at 2:59 PM Bruce Momjian <bruce@momjian.us> wrote:
> > I assume statistics collector views are in "Monitoring Database
> > Activity" because that is their purpose.
> 
> Well, yes. :-)
> 
> But the point is that all other system views are in a single
> section regardless of their purpose. We don't document pg_roles in the
> "Database Roles" chapter, for example.
> 
> And on the flip side, pg_locks and pg_replication_origin_status are
> also for monitoring database activity, but they're in the "System
> Views" chapter anyway. The only system views that are in "Monitoring
> Database Activity" rather than "System Views" are the ones where the
> name starts with "pg_stat_".
> 
> So the reason you state is why these views are under "Monitoring
> Database Activity" rather than a chapter chosen at random. But it
> doesn't really explain why they're separate from the other system
> views at all. That seems to be a pretty much random choice, AFAICT.

I agree and they should be with the other views.  I was just explaining
why, at the time, I didn't touch them.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Re: documentation structure

From

Robert Haas

Date:

22 March 2024, 19:20:41

On Fri, Mar 22, 2024 at 3:17 PM Bruce Momjian <bruce@momjian.us> wrote:
> I agree and they should be with the other views.  I was just explaining
> why, at the time, I didn't touch them.

Ah, OK. That makes total sense.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

Peter Eisentraut

Date:

25 March 2024, 15:26:20

On 22.03.24 14:59, Robert Haas wrote:
> And I don't believe that if someone were writing a physical book about
> PostgreSQL from scratch, they'd ever end up with a top-level chapter
> that looks anything like our GiST chapter. All of the index AM
> chapters are quite obviously clones of each other, and they're all
> quite short. Surely you'd make them sections within a chapter, not
> entire chapters.
> 
> I do agree that PL/pgsql is more arguable. I can imagine somebody
> writing a book about PostgreSQL and choosing to make that topic into a
> whole chapter.

Yeah, I think there is probably a range of things from pretty obvious
to mostly controversial.

Re: documentation structure

From

Peter Eisentraut

Date:

25 March 2024, 15:40:52

On 22.03.24 15:10, Robert Haas wrote:
> Sorry. I didn't mean to dispute the point that the section was added a
> few years ago, nor the point that most people just want to read about
> the binaries. I am confident that both of those things are true. What
> I do want to dispute is that having a four-sentence chapter in the
> documentation index that tells people something they can find much
> more easily without using the documentation at all is a good plan.

I think a possible problem we need to consider with these proposals to 
combine chapters is that they could make the chapters themselves too 
deep and harder to navigate.  For example, if we combined the 
installation from source and binaries chapters, the structure of the new 
chapter would presumably be

<chapter> Installation
  <sect1>   Installation from Binaries
  <sect1>   Installation from Source
   <sect2>   Requirements
   <sect2>   Getting the Source
   <sect2>   Building and Installation with Autoconf and Make
   <sect2>   Building and Installation with Meson
etc.

This would mean that the entire "Installation from Source" part would be 
rendered on a single HTML page.
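
To illustrate the concern, here is a rough Python sketch (not DocBook
tooling) of how a fixed chunk depth maps such an outline onto HTML pages.
It assumes a simplified model in which the chapter and every section down
to the chunk depth get their own page, and anything deeper is folded into
its parent's page. The names Node and pages are made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One element of the outline: a chapter, sect1, sect2, ..."""
    title: str
    children: list["Node"] = field(default_factory=list)


def pages(node, chunk_depth, depth=0):
    """Return the titles that get their own HTML page.

    depth 0 is the chapter, depth 1 is <sect1>, depth 2 is <sect2>.
    Assumption: nodes deeper than chunk_depth are rendered inline
    on the page of their nearest ancestor that has a page.
    """
    titles = [node.title]
    if depth < chunk_depth:
        for child in node.children:
            titles.extend(pages(child, chunk_depth, depth + 1))
    return titles


# The combined chapter outlined above.
chapter = Node("Installation", [
    Node("Installation from Binaries"),
    Node("Installation from Source", [
        Node("Requirements"),
        Node("Getting the Source"),
        Node("Building and Installation with Autoconf and Make"),
        Node("Building and Installation with Meson"),
    ]),
])

one_level = pages(chapter, chunk_depth=1)
two_levels = pages(chapter, chunk_depth=2)
print(len(one_level), len(two_levels))
```

With a chunk depth of 1, the four <sect2> entries have no page of their
own and all end up on the "Installation from Source" page.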

The rendering can be adjusted to some degree, but then we also need to 
make sure any new chunking makes sense in other chapters.  (And it might 
also change a bunch of externally known HTML links.)

I think maybe more could also be done at the top-level structure, too. 
Right now, we have <book> -> <part> -> <chapter>.  We could add <set> on 
top of that.

We could also play with CSS or JavaScript to make the top-level table of 
contents more navigable, with collapsing subsections or whatever.

We could also render additional tables of contents or indexes, so there 
is more than one way to navigate into the content from the top.

We could also build better search.

Re: documentation structure

From

Robert Haas

Date:

25 March 2024, 16:01:28

On Mon, Mar 25, 2024 at 11:40 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> I think a possible problem we need to consider with these proposals to
> combine chapters is that they could make the chapters themselves too
> deep and harder to navigate.  For example, if we combined the
> installation from source and binaries chapters, the structure of the new
> chapter would presumably be

I agree with this in theory, but in practice I think the patches that
I posted don't have this issue to a degree that is problematic, and I
posted some specific proposals on adjustments that we could make to
ameliorate the problem if other people feel differently.

> I think maybe more could also be done at the top-level structure, too.
> Right now, we have <book> -> <part> -> <chapter>.  We could add <set> on
> top of that.
>
> We could also play with CSS or JavaScript to make the top-level table of
> contents more navigable, with collapsing subsections or whatever.
>
> We could also render additional tables of contents or indexes, so there
> is more than one way to navigate into the content from the top.
>
> We could also build better search.

These are all reasonable ideas. I think some better CSS and JavaScript
could definitely help, and I also wondered whether the entrypoint to
the documentation has to be the index page, or whether it could maybe
be a page we've crafted specifically for that purpose, that might
include some text as well as a bunch of links.

But that having been said, I don't believe that any of those ideas (or
anything else we do) will obviate the need for some curation of the
toplevel index. If you're going to add another level, as you propose
in the first point, you still need to make decisions about which
things properly go at which levels. If you're going to allow for
collapsing subsections, you still want the overall tree in which
subsections are expanded and collapsed to make logical sense. If
you have multiple ways to navigate to the content, one of them will
probably still be the index, and it should be good. And good search is
good, but it shouldn't be the only convenient way to find the content.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

Robert Haas

Date:

29 March 2024, 13:40:08

OK, so I'm coming back to this thread after giving it a few days to
cool off. My last series of patches proposed to do five things:

1. Merge the four-sentence "Installation from Binaries" chapter back
into "Installation from Source". I thought this was a slam-dunk, but
Peter pointed out that exactly the opposite of this was done a few
years ago to create the "Installation from Binaries" chapter in the
first place. Based on subsequent discussion, what I'm now inclined to
do is come up with a new proposal that involves moving the information
about compiling from source to an appendix. So never mind about this
one for now.

2. Demote "Monitoring Disk Usage" from a chapter on its own to a
section of the "Monitoring Database Activity" chapter. I haven't seen
any objections to this, and I'd like to move ahead with it.

3. Merge the separate chapters on various built-in index AMs into one.
Peter didn't think this was a good idea, but Tom and Alvaro's comments
focused on how to do it mechanically, and on whether the chapters
needed to be reordered afterwards, which I took to mean that they were
OK with the basic concept. David Johnston was also clearly in favor of
it. So I'd like to move ahead with this one, too.

4. Consolidate the "Generic WAL Records" and "Custom WAL Resource
Managers" chapters, which cover related topics, into a single one. I
didn't see anyone object to this, but David Johnston pointed out that
the patch I posted was a few bricks short of a load, because it really
needed to put some introductory text into the new chapter. I'll study
this a bit more and propose a new patch that does the same thing a bit
more carefully than my previous version did.

5. Consolidate all of the procedural language chapters into one. This
was clearly the most controversial part of the proposal. I'm going to
lay this one aside for now and possibly come back to it at a later
time.

I hope that this way of proceeding makes sense to people.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

Robert Haas

Date:

05 April 2024, 15:11:53

On Fri, Mar 29, 2024 at 9:40 AM Robert Haas <robertmhaas@gmail.com> wrote:
> 2. Demote "Monitoring Disk Usage" from a chapter on its own to a
> section of the "Monitoring Database Activity" chapter. I haven't seen
> any objections to this, and I'd like to move ahead with it.
>
> 3. Merge the separate chapters on various built-in index AMs into one.
> Peter didn't think this was a good idea, but Tom and Alvaro's comments
> focused on how to do it mechanically, and on whether the chapters
> needed to be reordered afterwards, which I took to mean that they were
> OK with the basic concept. David Johnston was also clearly in favor of
> it. So I'd like to move ahead with this one, too.

I committed these two patches.

> 4. Consolidate the "Generic WAL Records" and "Custom WAL Resource
> Managers" chapters, which cover related topics, into a single one. I
> didn't see anyone object to this, but David Johnston pointed out that
> the patch I posted was a few bricks short of a load, because it really
> needed to put some introductory text into the new chapter. I'll study
> this a bit more and propose a new patch that does the same thing a bit
> more carefully than my previous version did.

Here is a new version of this patch. I think this is v18 material at
this point, absent an outcry to the contrary. Sometimes we're flexible
about doc patches.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment

v3-0001-docs-Consolidate-into-new-WAL-for-Extensions-chap.patch

Re: documentation structure

From

Robert Haas

Date:

05 April 2024, 16:01:10

On Mon, Mar 25, 2024 at 11:40 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> I think a possible problem we need to consider with these proposals to
> combine chapters is that they could make the chapters themselves too
> deep and harder to navigate.

I looked into various options for further combining chapters and/or
appendixes and found that this is indeed a huge problem. For example,
I had thought of creating a Developer Information chapter in the
appendix and moving various existing chapters and appendixes inside of
it, but that means that the <sect1> elements in those chapters get
demoted to <sect2>, and what used to be a whole chapter or appendix
becomes a <sect1>. And since you get one HTML page per <sect1>, that
means that instead of a bunch of individual HTML pages of very
pleasant length, you suddenly get one very long HTML page that is,
exactly as you say, hard to navigate.

> The rendering can be adjusted to some degree, but then we also need to
> make sure any new chunking makes sense in other chapters.  (And it might
> also change a bunch of externally known HTML links.)

I looked into this and I'm unclear how much customization is possible.
I gather that the current scheme comes from having chunk.section.depth
of 1, and I guess you can change that to 2 to get an HTML page per
<sect2>, but it seems like it would take a LOT of restructuring to
make that work. It would be much easier if you could vary this across
different parts of the documentation; for instance, if you could say,
well, in this particular chapter or appendix, I want
chunk.section.depth of 2, but elsewhere 1, that would be quite handy,
but after several hours reading various things about DocBook on the
Internet, I was still unable to determine conclusively whether this
was possible. There's an interesting comment in
stylesheet-speedup-xhtml.xsl that says "Since we set a fixed
$chunk.section.depth, we can do away with a bunch of complicated XPath
searches for the previous and next sections at various levels." That
sounds like it's suggesting that it is in fact possible for this
setting to vary, but I don't know if that's true, or how to do it, and
it sounds like there might be performance consequences, too.
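
To make the wished-for behavior concrete, here is a toy Python model of a
default chunk depth of 1 with a per-chapter override. This is not DocBook
or XSLT, and it says nothing about whether the stylesheets can actually do
this, which is the open question. DEPTH_OVERRIDES, chunk_titles, and the
placeholder section titles are all hypothetical.

```python
DEFAULT_CHUNK_DEPTH = 1

# Hypothetical per-chapter overrides, keyed by chapter title.
DEPTH_OVERRIDES = {"Developer Information": 2}


def chunk_titles(title, sections, depth=0, limit=None):
    """Return the titles that would get their own HTML page.

    sections is a list of (title, subsections) pairs. The chunk depth
    is looked up once, at the chapter level, and then applied to the
    whole subtree; deeper sections stay on their parent's page.
    """
    if limit is None:
        limit = DEPTH_OVERRIDES.get(title, DEFAULT_CHUNK_DEPTH)
    titles = [title]
    if depth < limit:
        for sub_title, sub_sections in sections:
            titles.extend(chunk_titles(sub_title, sub_sections, depth + 1, limit))
    return titles


# A merged chapter whose former chapters became <sect1> elements.
merged = ("Developer Information", [
    ("Former chapter A", [("Former sect1 A.1", []), ("Former sect1 A.2", [])]),
])

# An ordinary chapter that keeps the default depth.
ordinary = ("Some Other Chapter", [
    ("A sect1", [("A sect2", [])]),
])

merged_pages = chunk_titles(*merged)
ordinary_pages = chunk_titles(*ordinary)
print(merged_pages)
print(ordinary_pages)
```

In this model the demoted sections in the merged chapter keep their
individual pages, while every other chapter still chunks at <sect1>.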

> I think maybe more could also be done at the top-level structure, too.
> Right now, we have <book> -> <part> -> <chapter>.  We could add <set> on
> top of that.

Does this let you create structures of non-uniform depth? i.e. is
there a way that we can group some chapters into sets while leaving
others as standalone chapters, or somesuch?

I'm not 100% confident that non-uniform depth (either via <set> or via
chunk.section.depth or via some other mechanism) is a good idea.
There's a sort of uniformity to our navigation right now that does
have some real appeal. The downside, though, is that if you want
something to be a single HTML page, it's got to either be a chapter
(or appendix) by itself with no sections inside of it, or it's got to
be a <sect1> inside of a chapter, and so anything that's long enough
that it should be an HTML page by itself can never be more than one
level below the index. And that seems to make it quite difficult to
keep the index small.

Without some kind of variable-depth structure, the only other ways
that I can see to improve things are:

1. Make chunk.section.depth 2 and restructure the entire documentation
until the results look reasonable. This might be possible but I bet
it's difficult. We have, at present, chapters of *wildly* varying
length, from a few sentences to many, many pages. That is perhaps a
bad thing; you most likely wouldn't do that in a printed book. But
fixing it is a huge project. We don't necessarily have the same amount
of content about each topic, and there isn't necessarily a way of
grouping related topics together that produces units of relatively
uniform length. I think it's sensible to try to make improvements
where we can, by pushing stuff down that's short and not that
important, but finding our way to a chunk.section.depth=2 world that
feels good to most people compared to what we have today seems like
it's going to be challenging.

2. Replace the current index with a custom index or landing page of
some kind. Or keep the current index and add a new landing page
alongside it. Something that isn't derived automatically from the
documentation structure but is created by hand.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

"David G. Johnston"

Date:

05 April 2024, 16:14:56

On Fri, Apr 5, 2024 at 9:01 AM Robert Haas <robertmhaas@gmail.com> wrote:

> The rendering can be adjusted to some degree, but then we also need to
> make sure any new chunking makes sense in other chapters. (And it might
> also change a bunch of externally known HTML links.)

I looked into this and I'm unclear how much customization is possible.

Here is a link to my attempt at this a couple of years ago. It basically "abuses" refentry.

https://www.postgresql.org/message-id/CAKFQuwaVm%3D6d_sw9Wrp4cdSm5_k%3D8ZVx0--v2v4BH4KnJtqXqg%40mail.gmail.com

I never did dive into the man page or PDF dynamics of this particular change but it seemed to solve HTML pagination without negative consequences and with minimal risk of unintended consequences since only the markup on the pages we want to alter is changed, not global configuration.

David J.

Re: documentation structure

From

Robert Haas

Date:

05 April 2024, 16:18:30

On Fri, Apr 5, 2024 at 12:15 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
> Here is a link to my attempt at this a couple of years ago.  It basically "abuses" refentry.
>
> https://www.postgresql.org/message-id/CAKFQuwaVm%3D6d_sw9Wrp4cdSm5_k%3D8ZVx0--v2v4BH4KnJtqXqg%40mail.gmail.com
>
> I never did dive into the man page or PDF dynamics of this particular change but it seemed to solve HTML pagination without negative consequences and with minimal risk of unintended consequences since only the markup on the pages we want to alter is changed, not global configuration.

Hmm, but it seems like that might have generated some man page entries
that we don't want?

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

"David G. Johnston"

Date:

05 April 2024, 16:22:36

On Fri, Apr 5, 2024 at 9:18 AM Robert Haas <robertmhaas@gmail.com> wrote:

> On Fri, Apr 5, 2024 at 12:15 PM David G. Johnston
> <david.g.johnston@gmail.com> wrote:
> > Here is a link to my attempt at this a couple of years ago. It basically "abuses" refentry.
> >
> > https://www.postgresql.org/message-id/CAKFQuwaVm%3D6d_sw9Wrp4cdSm5_k%3D8ZVx0--v2v4BH4KnJtqXqg%40mail.gmail.com
> >
> > I never did dive into the man page or PDF dynamics of this particular change but it seemed to solve HTML pagination without negative consequences and with minimal risk of unintended consequences since only the markup on the pages we want to alter is changed, not global configuration.
>
> Hmm, but it seems like that might have generated some man page entries
> that we don't want?

If so (didn't check) maybe just remove them in post?

David J.

Re: documentation structure

From

Peter Eisentraut

Date:

08 April 2024, 14:15:29

On 05.04.24 17:11, Robert Haas wrote:
>> 4. Consolidate the "Generic WAL Records" and "Custom WAL Resource
>> Managers" chapters, which cover related topics, into a single one. I
>> didn't see anyone object to this, but David Johnston pointed out that
>> the patch I posted was a few bricks short of a load, because it really
>> needed to put some introductory text into the new chapter. I'll study
>> this a bit more and propose a new patch that does the same thing a bit
>> more carefully than my previous version did.
> 
> Here is a new version of this patch. I think this is v18 material at
> this point, absent an outcry to the contrary. Sometimes we're flexible
> about doc patches.

Looks good to me.  I think this could go into PG17.

Re: documentation structure

From

jian he

Date:

15 April 2024, 05:00:00

On Wed, Mar 20, 2024 at 5:40 AM Andrew Dunstan <andrew@dunslane.net> wrote:
>
>
> +many for improving the index.
>
> My own pet docs peeve is a purely editorial one: func.sgml is a 30k line beast, and I think there's a good case for
> splitting out at least the larger chunks of it.
>

I think I successfully reduced func.sgml from 31132 lines to 13167 lines.
(base-commit: 93582974315174d544592185d797a2b44696d1e5)

A patch for this would be unreviewable.
The key gotcha is to put the contents between the opening `<sect1>` and the
closing `</sect1>` (both inclusive)
into a new file.
In func.sgml, use `&entity;` to reference the new file.
Also update filelist.sgml.

Here is how I did it:

I found that these built HTML files are the biggest ones:
doc/src/sgml/html/functions-string.html
doc/src/sgml/html/functions-matching.html
doc/src/sgml/html/functions-datetime.html
doc/src/sgml/html/functions-json.html
doc/src/sgml/html/functions-aggregate.html
doc/src/sgml/html/functions-info.html
doc/src/sgml/html/functions-admin.html

So I created these new sgml files to hold the corresponding content:
func-string.sgml
func-matching.sgml
func-datetime.sgml
func-json.sgml
func-aggregate.sgml
func-info.sgml
func-admin.sgml

Based on the structure of func.sgml, each section ends right before the next <sect1>:
<sect1 id="functions-string">
next section1 line number:
<sect1 id="functions-binarystring">

<sect1 id="functions-matching">
next section1 line number:
<sect1 id="functions-formatting">

<sect1 id="functions-datetime">
next section1 line number:
<sect1 id="functions-enum">

<sect1 id="functions-json">
next section1 line number:
<sect1 id="functions-sequence">

<sect1 id="functions-aggregate">
next section1 line number:
<sect1 id="functions-window">

<sect1 id="functions-info">
next section1 line number:
<sect1 id="functions-admin">

<sect1 id="functions-admin">
next section1 line number:
<sect1 id="functions-trigger">
------------------------------------
step1: write the relevant line ranges to the new sgml files.
(example: lines 2407 to 4177 contain all the content that corresponds to
functions-string.html)

sed -n '2407,4177 p' func.sgml > func-string.sgml
sed -n '5328,7756 p' func.sgml >  func-matching.sgml
sed -n '8939,11122 p' func.sgml > func-datetime.sgml
sed -n '15498,19348 p' func.sgml > func-json.sgml
sed -n '21479,22896 p' func.sgml > func-aggregate.sgml
sed -n '24257,27896 p' func.sgml > func-info.sgml
sed -n '27898,30579 p' func.sgml > func-admin.sgml

step2:
delete these line ranges from func.sgml in place:
sed --in-place "2407,4177d ; 5328,7756d ; 8939,11122d ; 15498,19348d ; 21479,22896d ; 24257,27896d ; 27898,30579d" \
    func.sgml
reference: https://unix.stackexchange.com/questions/676210/matching-multiple-ranges-with-sed-range-expressions
           https://www.gnu.org/software/sed/manual/sed.html#Command_002dLine-Options

step3:
put the following lines at the corresponding positions in func.sgml
(the structure pattern above makes it quick to find each position):

`
&func-string;
&func-matching;
&func-datetime;
&func-json;
&func-aggregate;
&func-info;
&func-admin;
`

step4: update filelist.sgml:
diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 3fb0709f..0b78a361 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -18,6 +18,13 @@
 <!ENTITY ddl        SYSTEM "ddl.sgml">
 <!ENTITY dml        SYSTEM "dml.sgml">
 <!ENTITY func       SYSTEM "func.sgml">
+<!ENTITY func-string       SYSTEM "func-string.sgml">
+<!ENTITY func-matching       SYSTEM "func-matching.sgml">
+<!ENTITY func-datetime       SYSTEM "func-datetime.sgml">
+<!ENTITY func-json       SYSTEM "func-json.sgml">
+<!ENTITY func-aggregate       SYSTEM "func-aggregate.sgml">
+<!ENTITY func-info       SYSTEM "func-info.sgml">
+<!ENTITY func-admin       SYSTEM "func-admin.sgml">
 <!ENTITY indices    SYSTEM "indices.sgml">
 <!ENTITY json       SYSTEM "json.sgml">
 <!ENTITY mvcc       SYSTEM "mvcc.sgml">

 doc/src/sgml/filelist.sgml       |     7 +
 doc/src/sgml/func-admin.sgml     |  2682 +++++
 doc/src/sgml/func-aggregate.sgml |  1418 +++
 doc/src/sgml/func-datetime.sgml  |  2184 ++++
 doc/src/sgml/func-info.sgml      |  3640 ++++++
 doc/src/sgml/func-json.sgml      |  3851 ++++++
 doc/src/sgml/func-matching.sgml  |  2429 ++++
 doc/src/sgml/func-string.sgml    |  1771 +++
 doc/src/sgml/func.sgml           | 17979 +----------------------------

We could do this one section at a time, but it's still worth it.
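
The hard-coded line numbers in steps 1 and 2 go stale as soon as func.sgml changes. Below is a minimal sketch of the same split driven by section ids instead. It assumes each section starts at a `<sect1 id="...">` line and ends at the next `</sect1>` line. The function name `split_sections` and the sample text are made up for illustration, and this is not the procedure used above.

```python
import re


def split_sections(text, ids):
    """Return (remaining_text, {id: section_text}) for the given <sect1> ids.

    Each section runs from its opening <sect1 id="..."> line through the
    next </sect1> line, both inclusive, and is replaced in the remaining
    text by an entity reference such as &func-string;.
    """
    extracted = {}
    for sect_id in ids:
        pattern = re.compile(
            r'^[ \t]*<sect1 id="%s"[^>]*>.*?^[ \t]*</sect1>[ \t]*\n'
            % re.escape(sect_id),
            re.DOTALL | re.MULTILINE,
        )
        match = pattern.search(text)
        if match is None:
            raise ValueError("section not found: " + sect_id)
        extracted[sect_id] = match.group(0)
        # Assumed naming rule: functions-string -> func-string
        entity = "&" + sect_id.replace("functions-", "func-") + ";\n"
        text = text[:match.start()] + entity + text[match.end():]
    return text, extracted


# Tiny stand-in for func.sgml, not the real file.
sample = (
    '<chapter id="functions">\n'
    ' <sect1 id="functions-string">\n'
    '  <title>String Functions</title>\n'
    ' </sect1>\n'
    ' <sect1 id="functions-binarystring">\n'
    '  <title>Binary String Functions</title>\n'
    ' </sect1>\n'
    '</chapter>\n'
)
remaining, parts = split_sections(sample, ["functions-string"])
print(remaining)
```

Against the real tree one would read func.sgml, write each extracted section to its func-*.sgml file, and write the remaining text back. filelist.sgml still needs the ENTITY lines from step 4.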

Re: documentation structure

From

Robert Haas

Date:

15 April 2024, 20:06:11

On Mon, Apr 8, 2024 at 10:15 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> > Here is a new version of this patch. I think this is v18 material at
> > this point, absent an outcry to the contrary. Sometimes we're flexible
> > about doc patches.
>
> Looks good to me.  I think this could go into PG17.

Hearing no objections, done.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: documentation structure

From

Andres Freund

Date:

16 April 2024, 18:23:10

Hi,

On 2024-03-19 17:39:39 -0400, Andrew Dunstan wrote:
> My own pet docs peeve is a purely editorial one: func.sgml is a 30k line
> beast, and I think there's a good case for splitting out at least the
> larger chunks of it.

I think we should work on generating a lot of func.sgml.  Particularly the
signature etc should just come from pg_proc.dat, it's pointlessly painful to
generate that by hand. And for a lot of the functions we should probably move
the existing func.sgml comments to the description in pg_proc.dat.

I suspect that we can't just generate all the documentation from pg_proc,
because of xrefs etc.  Although perhaps we could just strip those out for
pg_proc.

We'd need to add some more metadata to pg_proc, for grouping kinds of
functions together. But that seems doable.

Greetings,

Andres Freund

Re: documentation structure

From

Tom Lane

Date:

16 April 2024, 19:05:32

Andres Freund <andres@anarazel.de> writes:
> I think we should work on generating a lot of func.sgml.  Particularly the
> signature etc should just come from pg_proc.dat, it's pointlessly painful to
> generate that by hand. And for a lot of the functions we should probably move
> the existing func.sgml comments to the description in pg_proc.dat.

Where are you going to get the examples and text descriptions from?
(And no, I don't agree that the pg_description string should match
what's in the docs.  The description string has to be a short
one-liner in just about every case.)

This sounds to me like it would be a painful exercise with not a
lot of benefit in the end.

I do agree with Andrew that splitting func.sgml into multiple files
would be beneficial.

            regards, tom lane

Re: documentation structure

From

Bruce Momjian

Date:

16 April 2024, 19:29:29

On Tue, Apr 16, 2024 at 03:05:32PM -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I think we should work on generating a lot of func.sgml.  Particularly the
> > signature etc should just come from pg_proc.dat, it's pointlessly painful to
> > generate that by hand. And for a lot of the functions we should probably move
> > the existing func.sgml comments to the description in pg_proc.dat.
> 
> Where are you going to get the examples and text descriptions from?
> (And no, I don't agree that the pg_description string should match
> what's in the docs.  The description string has to be a short
> one-liner in just about every case.)
> 
> This sounds to me like it would be a painful exercise with not a
> lot of benefit in the end.

Maybe we could _verify_ the contents of func.sgml against pg_proc.
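
A rough sketch of what such a check might look like, purely as an illustration: it compares only function names, pulled with regexes from `func_signature` paragraphs and from `proname` fields. The helper names and the second sample entry (`hypothetical_fn`, oid 9999) are made up. A real check would also have to deal with overloads, argument types, and functions that are not defined in pg_proc.dat.

```python
import re


def documented_functions(sgml_text):
    """Collect the first <function> name in each func_signature paragraph."""
    names = set()
    paragraphs = re.findall(
        r'<para role="func_signature">(.*?)</para>', sgml_text, re.DOTALL
    )
    for para in paragraphs:
        match = re.search(r"<function>([^<]+)</function>", para)
        if match:
            names.add(match.group(1).strip())
    return names


def catalog_functions(dat_text):
    """Collect proname values from pg_proc.dat-style text."""
    return set(re.findall(r"proname => '([^']+)'", dat_text))


def compare(sgml_text, dat_text):
    """Return (names missing from the docs, names missing from the catalog)."""
    documented = documented_functions(sgml_text)
    catalog = catalog_functions(dat_text)
    return sorted(catalog - documented), sorted(documented - catalog)


sample_sgml = """
<entry role="func_table_entry"><para role="func_signature">
 <indexterm><primary>regr_slope</primary></indexterm>
 <function>regr_slope</function> ( <parameter>Y</parameter> <type>double precision</type>, <parameter>X</parameter> <type>double precision</type> )
 <returnvalue>double precision</returnvalue>
</para></entry>
"""
sample_dat = """
{ oid => '2825', proname => 'regr_slope', prokind => 'a' },
{ oid => '9999', proname => 'hypothetical_fn', prokind => 'f' },
"""
missing_docs, missing_catalog = compare(sample_sgml, sample_dat)
print(missing_docs)
print(missing_catalog)
```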

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Re: documentation structure

From

Andres Freund

Date:

16 April 2024, 20:17:00

Hi,

On 2024-04-16 15:05:32 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I think we should work on generating a lot of func.sgml.  Particularly the
> > signature etc should just come from pg_proc.dat, it's pointlessly painful to
> > generate that by hand. And for a lot of the functions we should probably move
> > the existing func.sgml comments to the description in pg_proc.dat.
>
> Where are you going to get the examples and text descriptions from?

I think there are a few different ways to do that. E.g. having long_desc, example
fields in pg_proc.dat. Or having examples and description in a separate file
and "enriching" that with auto-generated function signatures.

> (And no, I don't agree that the pg_description string should match
> what's in the docs.  The description string has to be a short
> one-liner in just about every case.)

Definitely shouldn't be the same in all cases, but I think there's a decent
number of cases where they can be the same. The differences between the two are
often minimal today.

Entirely randomly chosen example:

{ oid => '2825',
  descr => 'slope of the least-squares-fit linear equation determined by the (X, Y) pairs',
  proname => 'regr_slope', prokind => 'a', proisstrict => 'f',
  prorettype => 'float8', proargtypes => 'float8 float8',
  prosrc => 'aggregate_dummy' },

and

      <row>
       <entry role="func_table_entry"><para role="func_signature">
        <indexterm>
         <primary>regression slope</primary>
        </indexterm>
        <indexterm>
         <primary>regr_slope</primary>
        </indexterm>
        <function>regr_slope</function> ( <parameter>Y</parameter> <type>double precision</type>, <parameter>X</parameter> <type>double precision</type> )
        <returnvalue>double precision</returnvalue>
       </para>
       <para>
        Computes the slope of the least-squares-fit linear equation determined
        by the (<parameter>X</parameter>, <parameter>Y</parameter>)
        pairs.
       </para></entry>
       <entry>Yes</entry>
      </row>

The description is quite similar, the pg_proc entry lacks argument names. 
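
To make the idea concrete, here is a hedged sketch of rendering the signature markup from an entry like the one above. It is not an existing tool. The `render_signature` name, the small type-name mapping, and the separately supplied argument names (which the pg_proc entry lacks today) are all assumptions for illustration.

```python
from xml.sax.saxutils import escape

# Assumed mapping from internal type names to the names used in the docs;
# only a few entries are listed for this example.
DOC_TYPE_NAMES = {
    "float8": "double precision",
    "int4": "integer",
    "bool": "boolean",
}


def render_signature(entry, argnames=None):
    """Render func_signature-style markup from a pg_proc.dat-like dict."""
    argtypes = entry["proargtypes"].split()
    if argnames is None:
        argnames = [None] * len(argtypes)
    if len(argnames) != len(argtypes):
        raise ValueError("argnames and proargtypes differ in length")
    args = []
    for name, typ in zip(argnames, argtypes):
        doc_type = "<type>%s</type>" % escape(DOC_TYPE_NAMES.get(typ, typ))
        if name:
            args.append("<parameter>%s</parameter> %s" % (escape(name), doc_type))
        else:
            args.append(doc_type)
    rettype = escape(DOC_TYPE_NAMES.get(entry["prorettype"], entry["prorettype"]))
    return "<function>%s</function> ( %s )\n<returnvalue>%s</returnvalue>" % (
        escape(entry["proname"]),
        ", ".join(args),
        rettype,
    )


entry = {
    "proname": "regr_slope",
    "prorettype": "float8",
    "proargtypes": "float8 float8",
}
# Argument names are passed separately because the entry has none.
signature = render_signature(entry, argnames=["Y", "X"])
print(signature)
```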

> This sounds to me like it would be a painful exercise with not a
> lot of benefit in the end.

I think the manual work for writing signatures in sgml is not insignificant,
nor is the volume of sgml for them. Manually maintaining the signatures makes
it impractical to significantly improve the presentation - which I don't think
is all that great today.

And the lack of argument names in the pg_proc entries is occasionally fairly
annoying, because a \df+ doesn't provide enough information to use functions.

It'd also be quite useful if clients could render more of the documentation
for functions. People are used to language servers providing full
documentation for functions etc...

Greetings,

Andres Freund

Re: documentation structure

From

Corey Huinker

Date:

17 April 2024, 06:46:53

> > This sounds to me like it would be a painful exercise with not a
> > lot of benefit in the end.
>
> Maybe we could _verify_ the contents of func.sgml against pg_proc.

All of the functions redefined in catalog/system_functions.sql complicate using pg_proc.dat as a doc generator or source of validation. We'd probably do better to validate against a live instance, and even then the benefit wouldn't be great.

Re: documentation structure

From

Dagfinn Ilmari Mannsåker

Date:

17 April 2024, 11:07:24

Andres Freund <andres@anarazel.de> writes:

> Definitely shouldn't be the same in all cases, but I think there's a decent
> number of cases where they can be the same. The differences between the two are
> often minimal today.
>
> Entirely randomly chosen example:
>
> { oid => '2825',
>   descr => 'slope of the least-squares-fit linear equation determined by the (X, Y) pairs',
>   proname => 'regr_slope', prokind => 'a', proisstrict => 'f',
>   prorettype => 'float8', proargtypes => 'float8 float8',
>   prosrc => 'aggregate_dummy' },
>
> and
>
>       <row>
>        <entry role="func_table_entry"><para role="func_signature">
>         <indexterm>
>          <primary>regression slope</primary>
>         </indexterm>
>         <indexterm>
>          <primary>regr_slope</primary>
>         </indexterm>
>         <function>regr_slope</function> ( <parameter>Y</parameter> <type>double precision</type>, <parameter>X</parameter> <type>double precision</type> )
>         <returnvalue>double precision</returnvalue>
>        </para>
>        <para>
>         Computes the slope of the least-squares-fit linear equation determined
>         by the (<parameter>X</parameter>, <parameter>Y</parameter>)
>         pairs.
>        </para></entry>
>        <entry>Yes</entry>
>       </row>
>
>
> The description is quite similar, the pg_proc entry lacks argument names. 
>
>
>> This sounds to me like it would be a painful exercise with not a
>> lot of benefit in the end.
>
> I think the manual work for writing signatures in sgml is not insignificant,
> nor is the volume of sgml for them. Manually maintaining the signatures makes
> it impractical to significantly improve the presentation - which I don't think
> is all that great today.

And it's very inconsistent.  For example, some functions use <optional>
tags for optional parameters, others use square brackets, and some use
<literal>VARIADIC</literal> to indicate variadic parameters, others use
ellipses (sometimes in <optional> tags or brackets).

> And the lack of argument names in the pg_proc entries is occasionally fairly
> annoying, because a \df+ doesn't provide enough information to use functions.

I was also annoyed by this the other day (specifically wrt. the boolean
arguments to pg_ls_dir), and started whipping up a Perl script to parse
func.sgml and generate missing proargnames values for pg_proc.dat, which
is how I discovered the above.  The script currently has a pile of hacky
regexes to cope with that, so I'd be happy to submit a doc patch to turn
it into actual markup to get rid of that, if people think that's a
worthwhile use of time and won't clash with any other plans for the
documentation.
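
For readers wondering what that extraction amounts to once the markup is consistent, here is a minimal sketch. It is in Python rather than Perl and is not the script mentioned above. It assumes every argument in a signature is wrapped in `<parameter>` and that array-valued fields in pg_proc.dat are written as `'{a,b,c}'`. The helper names are made up.

```python
import re


def signature_argnames(signature):
    """Return the parameter names of one func_signature, in order."""
    return re.findall(r"<parameter>([^<]+)</parameter>", signature)


def proargnames_field(names):
    """Format the names as a pg_proc.dat-style array field (assumed format)."""
    return "proargnames => '{%s}'" % ",".join(names)


signature = (
    "<function>regexp_like</function> ( "
    "<parameter>string</parameter> <type>text</type>, "
    "<parameter>pattern</parameter> <type>text</type> "
    "<optional>, <parameter>flags</parameter> <type>text</type> </optional> )"
)
names = signature_argnames(signature)
print(proargnames_field(names))
```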

> It'd also be quite useful if clients could render more of the documentation
> for functions. People are used to language servers providing full
> documentation for functions etc...

A more user-friendly version of \df+ (maybe spelled \hf, for symmetry
with \h for commands?) would certainly be nice.

> Greetings,
>
> Andres Freund

- ilmari

Re: documentation structure

From

Corey Huinker

Date:

17 April 2024, 17:11:50

> And it's very inconsistent. For example, some functions use <optional>
> tags for optional parameters, others use square brackets, and some use
> <literal>VARIADIC</literal> to indicate variadic parameters, others use
> ellipses (sometimes in <optional> tags or brackets).

Having just written a couple of those functions, I wasn't able to find any guidance on how to document them with regards to <optional> vs [], etc. Having such a thing would be helpful.

While we're throwing out ideas, does it make sense to have function parameters and return values be things that can accept COMMENTs? Like so:

COMMENT ON FUNCTION function_name [ ( [ [ argmode ] [ argname ] argtype [, ...] ] ) ] ARGUMENT argname IS '....';
COMMENT ON FUNCTION function_name [ ( [ [ argmode ] [ argname ] argtype [, ...] ] ) ] RETURN VALUE IS '....';

I don't think this is a great idea, but if we're going to auto-generate documentation then we've got to store the metadata somewhere, and pg_proc.dat is already lacking relevant details.

Re: documentation structure

From

Andres Freund

Date:

17 April 2024, 17:21:34

Hi,

On 2024-04-17 02:46:53 -0400, Corey Huinker wrote:
> > > This sounds to me like it would be a painful exercise with not a
> > > lot of benefit in the end.
> >
> > Maybe we could _verify_ the contents of func.sgml against pg_proc.
> >
> 
> All of the functions redefined in catalog/system_functions.sql complicate
> using pg_proc.dat as a doc generator or source of validation. We'd probably
> do better to validate against a live instance, and even then the benefit
> wouldn't be great.

There are 80 'CREATE OR REPLACE's in system_functions.sql, 1016 occurrences of
func_table_entry in func.sgml and 3.3k functions in pg_proc. I'm not saying
that differences due to system_functions.sql wouldn't be annoying to deal
with, but it'd also be far from the end of the world.

Greetings,

Andres Freund

Re: documentation structure

From

Andres Freund

Date:

17 April 2024, 17:28:03

Hi,

On 2024-04-17 12:07:24 +0100, Dagfinn Ilmari Mannsåker wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I think the manual work for writing signatures in sgml is not insignificant,
> > nor is the volume of sgml for them. Manually maintaining the signatures makes
> > it impractical to significantly improve the presentation - which I don't think
> > is all that great today.
> 
> And it's very inconsistent.  For example, some functions use <optional>
> tags for optional parameters, others use square brackets, and some use
> <literal>VARIADIC</literal> to indicate variadic parameters, others use
> ellipses (sometimes in <optional> tags or brackets).

That seems almost inevitably the outcome of many people having to manually
infer the recommended semantics, for writing something boring but nontrivial,
from a 30k line file.


> > And the lack of argument names in the pg_proc entries is occasionally fairly
> > annoying, because a \df+ doesn't provide enough information to use functions.
> 
> I was also annoyed by this the other day (specifically wrt. the boolean
> arguments to pg_ls_dir),

My bane is regexp_match et al, I have given up on remembering the argument
order.


> and started whipping up a Perl script to parse func.sgml and generate
> missing proargnames values for pg_proc.dat, which is how I discovered the
> above.

Nice.


> The script currently has a pile of hacky regexes to cope with that,
> so I'd be happy to submit a doc patch to turn it into actual markup to get
> rid of that, if people think that's a worthwhile use of time and won't clash
> with any other plans for the documentation.

I guess it's a bit hard to say without knowing how voluminous the changes
would be. If we end up rewriting the whole file the tradeoff is less clear
than if it's a dozen inconsistent entries.


> > It'd also be quite useful if clients could render more of the documentation
> > for functions. People are used to language servers providing full
> > documentation for functions etc...
> 
> A more user-friendly version of \df+ (maybe spelled \hf, for symmetry
> with \h for commands?) would certainly be nice.

Indeed.

Greetings,

Andres Freund

Re: documentation structure

From

Dagfinn Ilmari Mannsåker

Date:

17 April 2024, 18:37:21

Andres Freund <andres@anarazel.de> writes:

> Hi,
>
> On 2024-04-17 12:07:24 +0100, Dagfinn Ilmari Mannsåker wrote:
>> Andres Freund <andres@anarazel.de> writes:
>> > I think the manual work for writing signatures in sgml is not insignificant,
>> > nor is the volume of sgml for them. Manually maintaining the signatures makes
>> > it impractical to significantly improve the presentation - which I don't think
>> > is all that great today.
>> 
>> And it's very inconsistent.  For example, some functions use <optional>
>> tags for optional parameters, others use square brackets, and some use
>> <literal>VARIADIC</literal> to indicate variadic parameters, others use
>> ellipses (sometimes in <optional> tags or brackets).
>
> That seems almost inevitably the outcome of many people having to manually
> infer the recommended semantics, for writing something boring but nontrivial,
> from a 30k line file.

As Corey mentioned elsethread, having a markup style guide (maybe a
comment at the top of the file?) would be nice.

>> > And the lack of argument names in the pg_proc entries is occasionally fairly
>> > annoying, because a \df+ doesn't provide enough information to use functions.
>> 
>> I was also annoyed by this the other day (specifically wrt. the boolean
>> arguments to pg_ls_dir),
>
> My bane is regexp_match et al, I have given up on remembering the argument
> order.

There's a thread elsewhere about those specifically, but I can't be
bothered to find the link right now.

>> and started whipping up a Perl script to parse func.sgml and generate
>> missing proargnames values for pg_proc.dat, which is how I discovered the
>> above.
>
> Nice.
>
>> The script currently has a pile of hacky regexes to cope with that,
>> so I'd be happy to submit a doc patch to turn it into actual markup to get
>> rid of that, if people think that's a worthwhile use of time and won't clash
>> with any other plans for the documentation.
>
> I guess it's a bit hard to say without knowing how voluminous the changes
> would be. If we end up rewriting the whole file the tradeoff is less clear
> than if it's a dozen inconsistent entries.

It turned out to not be that many that used [] for optional parameters,
see the attached patch. 

I haven't dealt with variadic yet, since the two styles are visually
different, not just markup (<optional>...</optional> renders as [...]).
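
The bracket-to-<optional> change is mechanical enough to sketch. The following is a hedged illustration, not the script or the patch from this thread. It assumes an optional parameter always opens with `[,`, and it deliberately leaves `[, ...]` and array types such as `text[]` alone, which is what the attached patch appears to do. The name `brackets_to_optional` is made up.

```python
def brackets_to_optional(signature):
    """Rewrite "[, param type ]" as "<optional>, param type </optional>"."""
    out = []
    converted = []  # one flag per open bracket: was it turned into <optional>?
    for pos, ch in enumerate(signature):
        if ch == "[":
            rest = signature[pos + 1:].lstrip()
            # Convert only brackets that introduce a further parameter;
            # "[]" (array type) and "[, ...]" (repetition) stay as they are.
            is_param = rest.startswith(",") and not rest[1:].lstrip().startswith("...")
            converted.append(is_param)
            out.append("<optional>" if is_param else "[")
        elif ch == "]":
            was_param = converted.pop() if converted else False
            out.append("</optional>" if was_param else "]")
        else:
            out.append(ch)
    return "".join(out)


before = (
    '<parameter>val1</parameter> <type>"any"</type> '
    '[, <parameter>val2</parameter> <type>"any"</type> [, ...] ] )'
)
after = brackets_to_optional(before)
print(after)
```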

The two styles for variadic are what I call caller-style:

   concat ( val1 "any" [, val2 "any" [, ...] ] )
   format(formatstr text [, formatarg "any" [, ...] ])

which shows more clearly how you'd call it, versus definition-style:

   num_nonnulls ( VARIADIC "any" )
   jsonb_extract_path ( from_json jsonb, VARIADIC path_elems text[] )

which matches the CREATE FUNCTION statement.  I don't have a strong
opinion on which we should use, but we should be consistent.

> Greetings,
>
> Andres Freund

- ilmari

From f71e0669eb25b205bd5065f15657ba6d749261f3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Wed, 17 Apr 2024 16:00:52 +0100
Subject: [PATCH] func.sgml: Consistently use <optional> to indicate optional
 parameters

Some functions were using square brackets instead.
---
 doc/src/sgml/func.sgml | 54 +++++++++++++++++++++---------------------
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8dfb42ad4d..afaaf61d69 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3036,7 +3036,7 @@
          <primary>concat</primary>
         </indexterm>
         <function>concat</function> ( <parameter>val1</parameter> <type>"any"</type>
-         [, <parameter>val2</parameter> <type>"any"</type> [, ...] ] )
+         <optional>, <parameter>val2</parameter> <type>"any"</type> [, ...] </optional> )
         <returnvalue>text</returnvalue>
        </para>
        <para>
@@ -3056,7 +3056,7 @@
         </indexterm>
         <function>concat_ws</function> ( <parameter>sep</parameter> <type>text</type>,
         <parameter>val1</parameter> <type>"any"</type>
-        [, <parameter>val2</parameter> <type>"any"</type> [, ...] ] )
+        <optional>, <parameter>val2</parameter> <type>"any"</type> [, ...] </optional> )
         <returnvalue>text</returnvalue>
        </para>
        <para>
@@ -3076,7 +3076,7 @@
          <primary>format</primary>
         </indexterm>
         <function>format</function> ( <parameter>formatstr</parameter> <type>text</type>
-        [, <parameter>formatarg</parameter> <type>"any"</type> [, ...] ] )
+        <optional>, <parameter>formatarg</parameter> <type>"any"</type> [, ...] </optional> )
         <returnvalue>text</returnvalue>
        </para>
        <para>
@@ -3170,7 +3170,7 @@
          <primary>parse_ident</primary>
         </indexterm>
         <function>parse_ident</function> ( <parameter>qualified_identifier</parameter> <type>text</type>
-        [, <parameter>strict_mode</parameter> <type>boolean</type> <literal>DEFAULT</literal> <literal>true</literal> ] )
+        <optional>, <parameter>strict_mode</parameter> <type>boolean</type> <literal>DEFAULT</literal> <literal>true</literal> </optional> )
         <returnvalue>text[]</returnvalue>
        </para>
        <para>
@@ -3309,8 +3309,8 @@
          <primary>regexp_count</primary>
         </indexterm>
         <function>regexp_count</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>
-         [, <parameter>start</parameter> <type>integer</type>
-         [, <parameter>flags</parameter> <type>text</type> ] ] )
+         <optional>, <parameter>start</parameter> <type>integer</type>
+         <optional>, <parameter>flags</parameter> <type>text</type> </optional> </optional> )
         <returnvalue>integer</returnvalue>
        </para>
        <para>
@@ -3331,11 +3331,11 @@
          <primary>regexp_instr</primary>
         </indexterm>
         <function>regexp_instr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>
-         [, <parameter>start</parameter> <type>integer</type>
-         [, <parameter>N</parameter> <type>integer</type>
-         [, <parameter>endoption</parameter> <type>integer</type>
-         [, <parameter>flags</parameter> <type>text</type>
-         [, <parameter>subexpr</parameter> <type>integer</type> ] ] ] ] ] )
+         <optional>, <parameter>start</parameter> <type>integer</type>
+         <optional>, <parameter>N</parameter> <type>integer</type>
+         <optional>, <parameter>endoption</parameter> <type>integer</type>
+         <optional>, <parameter>flags</parameter> <type>text</type>
+         <optional>, <parameter>subexpr</parameter> <type>integer</type> </optional> </optional> </optional> </optional> </optional> )
         <returnvalue>integer</returnvalue>
        </para>
        <para>
@@ -3360,7 +3360,7 @@
          <primary>regexp_like</primary>
         </indexterm>
         <function>regexp_like</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>
-         [, <parameter>flags</parameter> <type>text</type> ] )
+         <optional>, <parameter>flags</parameter> <type>text</type> </optional> )
         <returnvalue>boolean</returnvalue>
        </para>
        <para>
@@ -3380,7 +3380,7 @@
         <indexterm>
          <primary>regexp_match</primary>
         </indexterm>
-        <function>regexp_match</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type> ] )
+        <function>regexp_match</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> <optional>, <parameter>flags</parameter> <type>text</type> </optional> )
         <returnvalue>text[]</returnvalue>
        </para>
        <para>
@@ -3400,7 +3400,7 @@
         <indexterm>
          <primary>regexp_matches</primary>
         </indexterm>
-        <function>regexp_matches</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type> ] )
+        <function>regexp_matches</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> <optional>, <parameter>flags</parameter> <type>text</type> </optional> )
         <returnvalue>setof text[]</returnvalue>
        </para>
        <para>
@@ -3426,8 +3426,8 @@
          <primary>regexp_replace</primary>
         </indexterm>
         <function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type>
-         [, <parameter>start</parameter> <type>integer</type> ]
-         [, <parameter>flags</parameter> <type>text</type> ] )
+         <optional>, <parameter>start</parameter> <type>integer</type> </optional>
+         <optional>, <parameter>flags</parameter> <type>text</type> </optional> )
         <returnvalue>text</returnvalue>
        </para>
        <para>
@@ -3447,7 +3447,7 @@
         <function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type>, <parameter>replacement</parameter> <type>text</type>,
          <parameter>start</parameter> <type>integer</type>,
          <parameter>N</parameter> <type>integer</type>
-         [, <parameter>flags</parameter> <type>text</type> ] )
+         <optional>, <parameter>flags</parameter> <type>text</type> </optional> )
         <returnvalue>text</returnvalue>
        </para>
        <para>
@@ -3467,7 +3467,7 @@
         <indexterm>
          <primary>regexp_split_to_array</primary>
         </indexterm>
-        <function>regexp_split_to_array</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> [, <parameter>flags</parameter> <type>text</type> ] )
+        <function>regexp_split_to_array</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter> <type>text</type> <optional>, <parameter>flags</parameter> <type>text</type> </optional> )
         <returnvalue>text[]</returnvalue>
        </para>
        <para>
@@ -3486,7 +3486,7 @@
         <indexterm>
          <primary>regexp_split_to_table</primary>
         </indexterm>
-        <function>regexp_split_to_table</function> ( <parameter>string</parameter> <type>text</type>,
<parameter>pattern</parameter><type>text</type> [, <parameter>flags</parameter> <type>text</type> ] )
 
+        <function>regexp_split_to_table</function> ( <parameter>string</parameter> <type>text</type>,
<parameter>pattern</parameter><type>text</type> <optional>, <parameter>flags</parameter> <type>text</type> </optional>
)
         <returnvalue>setof text</returnvalue>
        </para>
        <para>
@@ -3510,10 +3510,10 @@
          <primary>regexp_substr</primary>
         </indexterm>
         <function>regexp_substr</function> ( <parameter>string</parameter> <type>text</type>,
<parameter>pattern</parameter><type>text</type>
 
-         [, <parameter>start</parameter> <type>integer</type>
-         [, <parameter>N</parameter> <type>integer</type>
-         [, <parameter>flags</parameter> <type>text</type>
-         [, <parameter>subexpr</parameter> <type>integer</type> ] ] ] ] )
+         <optional>, <parameter>start</parameter> <type>integer</type>
+         <optional>, <parameter>N</parameter> <type>integer</type>
+         <optional>, <parameter>flags</parameter> <type>text</type>
+         <optional>, <parameter>subexpr</parameter> <type>integer</type> </optional> </optional> </optional>
</optional>)
 
         <returnvalue>text</returnvalue>
        </para>
        <para>
@@ -3980,7 +3980,7 @@
 
     <para>
 <synopsis>
-<function>format</function>(<parameter>formatstr</parameter> <type>text</type> [, <parameter>formatarg</parameter>
<type>"any"</type>[, ...] ])
 
+<function>format</function>(<parameter>formatstr</parameter> <type>text</type> <optional>,
<parameter>formatarg</parameter><type>"any"</type> [, ...] </optional>)
 
 </synopsis>
      <parameter>formatstr</parameter> is a format string that specifies how the
      result should be formatted.  Text in the format string is copied
@@ -10568,7 +10568,7 @@
 
    <para>
 <synopsis>
-date_trunc(<replaceable>field</replaceable>, <replaceable>source</replaceable> [, <replaceable>time_zone</replaceable>
])
+date_trunc(<replaceable>field</replaceable>, <replaceable>source</replaceable> <optional>,
<replaceable>time_zone</replaceable></optional>)
 
 </synopsis>
     <replaceable>source</replaceable> is a value expression of type
     <type>timestamp</type>, <type>timestamp with time zone</type>,
@@ -29308,11 +29308,11 @@
         <indexterm>
          <primary>pg_logical_emit_message</primary>
         </indexterm>
-        <function>pg_logical_emit_message</function> ( <parameter>transactional</parameter> <type>boolean</type>,
<parameter>prefix</parameter><type>text</type>, <parameter>content</parameter> <type>text</type> [,
<parameter>flush</parameter><type>boolean</type> <literal>DEFAULT</literal> <literal>false</literal>] )
 
+        <function>pg_logical_emit_message</function> ( <parameter>transactional</parameter> <type>boolean</type>,
<parameter>prefix</parameter><type>text</type>, <parameter>content</parameter> <type>text</type> <optional>,
<parameter>flush</parameter><type>boolean</type> <literal>DEFAULT</literal> <literal>false</literal></optional> )
 
         <returnvalue>pg_lsn</returnvalue>
        </para>
        <para role="func_signature">
-        <function>pg_logical_emit_message</function> ( <parameter>transactional</parameter> <type>boolean</type>,
<parameter>prefix</parameter><type>text</type>, <parameter>content</parameter> <type>bytea</type> [,
<parameter>flush</parameter><type>boolean</type> <literal>DEFAULT</literal> <literal>false</literal>] )
 
+        <function>pg_logical_emit_message</function> ( <parameter>transactional</parameter> <type>boolean</type>,
<parameter>prefix</parameter><type>text</type>, <parameter>content</parameter> <type>bytea</type> <optional>,
<parameter>flush</parameter><type>boolean</type> <literal>DEFAULT</literal> <literal>false</literal></optional> )
 
         <returnvalue>pg_lsn</returnvalue>
        </para>
        <para>
-- 
2.39.2

Re: documentation structure

From

jian he

Date:

18 April 2024, 13:28:00

On Thu, Apr 18, 2024 at 2:37 AM Dagfinn Ilmari Mannsåker
<ilmari@ilmari.org> wrote:
>
> Andres Freund <andres@anarazel.de> writes:
>
> > Hi,
> >
> > On 2024-04-17 12:07:24 +0100, Dagfinn Ilmari Mannsåker wrote:
> >> Andres Freund <andres@anarazel.de> writes:
> >> > I think the manual work for writing signatures in sgml is not insignificant,
> >> > nor is the volume of sgml for them. Manually maintaining the signatures makes
> >> > it impractical to significantly improve the presentation - which I don't think
> >> > is all that great today.
> >>
> >> And it's very inconsistent.  For example, some functions use <optional>
> >> tags for optional parameters, others use square brackets, and some use
> >> <literal>VARIADIC</literal> to indicate variadic parameters, others use
> >> ellipses (sometimes in <optional> tags or brackets).
> >
> > That seems almost inevitably the outcome of many people having to manually
> > infer the recommended semantics, for writing something boring but nontrivial,
> > from a 30k line file.
>
> As Corey mentioned elsethread, having a markup style guide (maybe a
> comment at the top of the file?) would be nice.
>
> >> > And the lack of argument names in the pg_proc entries is occasionally fairly
> >> > annoying, because a \df+ doesn't provide enough information to use functions.
> >>
> >> I was also annoyed by this the other day (specifically wrt. the boolean
> >> arguments to pg_ls_dir),
> >
> > My bane is regexp_match et al, I have given up on remembering the argument
> > order.
>
> There's a thread elsewhere about those specifically, but I can't be
> bothered to find the link right now.
>
> >> and started whipping up a Perl script to parse func.sgml and generate
> >> missing proargnames values for pg_proc.dat, which is how I discovered the
> >> above.
> >
> > Nice.
> >
> >> The script currently has a pile of hacky regexes to cope with that,
> >> so I'd be happy to submit a doc patch to turn it into actual markup to get
> >> rid of that, if people think that's a worthwhile use of time and won't clash
> >> with any other plans for the documentation.
> >
> > I guess it's a bit hard to say without knowing how voluminous the changes
> > would be. If we end up rewriting the whole file the tradeoff is less clear
> > than if it's a dozen inconsistent entries.
>
> It turned out to not be that many that used [] for optional parameters,
> see the attached patch.
>

hi.
I manually checked the html output. It looks good to me.

Re: documentation structure

From

Corey Huinker

Date:

18 April 2024, 17:51:10

I haven't dealt with variadic yet, since the two styles are visually
different, not just markup (<optional>...</optional> renders as [...]).

The two styles for variadic are what I call caller-style:

concat ( val1 "any" [, val2 "any" [, ...] ] )
format(formatstr text [, formatarg "any" [, ...] ])

While this style is obviously clumsier for us to compose, it does avoid relying on the user understanding what the word variadic means. Searching through online documentation of the Python *args parameter, the word variadic never comes up; the closest they get is "variable length argument". I realize that Python is not SQL, but I think it's a good point of reference for what concepts the average reader is likely to know.

Looking at the patch, I think it is good, though I'd consider doing some indentation for the nested <optional>s to allow the author to do more visual tag-matching. The ']'s were sufficiently visually distinct that we didn't really need or want nesting, but <optional> is just another tag to my eyes in a sea of tags.

Re: documentation structure

From

Dagfinn Ilmari Mannsåker

Date:

18 April 2024, 21:34:26

Corey Huinker <corey.huinker@gmail.com> writes:

>>
>> I haven't dealt with variadic yet, since the two styles are visually
>> different, not just markup (<optional>...</optional> renders as [...]).
>>
>> The two styles for variadic are what I call caller-style:
>>
>>    concat ( val1 "any" [, val2 "any" [, ...] ] )
>>    format(formatstr text [, formatarg "any" [, ...] ])
>>
>
> While this style is obviously clumsier for us to compose, it does avoid
> relying on the user understanding what the word variadic means. Searching
> through online documentation of the Python *args parameter, the word
> variadic never comes up; the closest they get is "variable length
> argument". I realize that Python is not SQL, but I think it's a good point
> of reference for what concepts the average reader is likely to know.

Yeah, we can't expect everyone wanting to call a built-in function to
know how they would define an equivalent one themselves. In that case I
propose marking it up like this:

    <function>format</function> (
    <parameter>formatstr</parameter> <type>text</type>
    <optional>, <parameter>formatarg</parameter> <type>"any"</type>
    <optional>, ...</optional> </optional> )
    <returnvalue>text</returnvalue>

> Looking at the patch, I think it is good, though I'd consider doing some
> indentation for the nested <optional>s to allow the author to do more
> visual tag-matching. The ']'s were sufficiently visually distinct that we
> didn't really need or want nesting, but <optional> is just another tag to
> my eyes in a sea of tags.

The requisite nesting when there are multiple optional parameters makes
it annoying to wrap and indent it "properly" per XML convention, but how
about something like this, with each parameter on a line of its own, and
all the closing </optional> tags on one line?

    <function>regexp_substr</function> (
    <parameter>string</parameter> <type>text</type>,
    <parameter>pattern</parameter> <type>text</type>
    <optional>, <parameter>start</parameter> <type>integer</type>
    <optional>, <parameter>N</parameter> <type>integer</type>
    <optional>, <parameter>flags</parameter> <type>text</type>
    <optional>, <parameter>subexpr</parameter> <type>integer</type>
    </optional> </optional> </optional> </optional> )
    <returnvalue>text</returnvalue>

A lot of functions mostly follow this style, except they tend to put the
first parameter on the same line as the function name, even when that
makes the line overly long. I propose going the other way, with each
parameter on a line of its own, even if the first one would fit after
the function name, unless the whole parameter list fits after the
function name.

Also, when there's only one optional argument, or they're independently
optional, not nested, the </optional> tag should go on the same line as
the parameter.

    <function>substring</function> (
    <parameter>bits</parameter> <type>bit</type>
    <optional> <literal>FROM</literal> <parameter>start</parameter> <type>integer</type> </optional>
    <optional> <literal>FOR</literal> <parameter>count</parameter> <type>integer</type> </optional> )
    <returnvalue>bit</returnvalue>

I'm not quite sure what to do with things like json_object which have even
more complex nesting of optional parameters, but I do think the current
200+ character lines are too long.
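
As a rough illustration of the layout rules described above (not part of any patch in this thread), the following Python sketch emits a signature with one parameter per line and all closing </optional> tags together on one line. The function name format_signature and its argument shapes are hypothetical, and it assumes that multiple optional parameters are always nested, as in regexp_substr.

```python
def format_signature(name, required, optional, returns):
    """Lay out a func.sgml-style signature, one parameter per line.

    required / optional are lists of (param_name, type_name) pairs.
    Assumption: multiple optional parameters are nested, i.e. each one
    can only be given if all the previous ones are given too.
    """
    def param(pname, ptype):
        return f"<parameter>{pname}</parameter> <type>{ptype}</type>"

    lines = [f"<function>{name}</function> ("]
    for i, (pname, ptype) in enumerate(required):
        # Required parameters are comma-separated; the last one has no comma.
        comma = "," if i < len(required) - 1 else ""
        lines.append(param(pname, ptype) + comma)

    if len(optional) == 1:
        # A single optional parameter keeps its closing tag on the same line.
        pname, ptype = optional[0]
        lines.append("<optional>, " + param(pname, ptype) + " </optional> )")
    else:
        for pname, ptype in optional:
            lines.append("<optional>, " + param(pname, ptype))
        # All closing tags share one line, so they can be counted against
        # the opening tags stacked above them.
        closers = " ".join(["</optional>"] * len(optional))
        lines.append(closers + " )" if optional else ")")

    lines.append(f"<returnvalue>{returns}</returnvalue>")
    return "\n".join(lines)


signature = format_signature(
    "regexp_substr",
    [("string", "text"), ("pattern", "text")],
    [("start", "integer"), ("N", "integer"),
     ("flags", "text"), ("subexpr", "integer")],
    "text",
)
print(signature)
```

For the regexp_substr arguments shown, the output has the same shape as the hand-written example earlier in this message, minus the indentation.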

- ilmari

Re: documentation structure

From

Corey Huinker

Date:

19 April 2024, 00:57:17

Yeah, we can't expect everyone wanting to call a built-in function to
know how they would define an equivalent one themselves. In that case I
propose marking it up like this:

<function>format</function> (
<parameter>formatstr</parameter> <type>text</type>
<optional>, <parameter>formatarg</parameter> <type>"any"</type>
<optional>, ...</optional> </optional> )
<returnvalue>text</returnvalue>

Looks good, but I guess I have to ask: is there a parameter-list tag out there instead of (, and should we be using that?

The requisite nesting when there are multiple optional parameters makes
it annoying to wrap and indent it "properly" per XML convention, but how
about something like this, with each parameter on a line of its own, and
all the closing </optional> tags on one line?

<function>regexp_substr</function> (
<parameter>string</parameter> <type>text</type>,
<parameter>pattern</parameter> <type>text</type>
<optional>, <parameter>start</parameter> <type>integer</type>
<optional>, <parameter>N</parameter> <type>integer</type>
<optional>, <parameter>flags</parameter> <type>text</type>
<optional>, <parameter>subexpr</parameter> <type>integer</type>
</optional> </optional> </optional> </optional> )
<returnvalue>text</returnvalue>

Yes, that has an easy count-the-vertical, count-the-horizontal, do-they-match flow to it.

A lot of functions mostly follow this style, except they tend to put the
first parameter on the same line as the function name, even when that
makes the line overly long. I propose going the other way, with each
parameter on a line of its own, even if the first one would fit after
the function name, unless the whole parameter list fits after the
function name.

Also, when there's only one optional argument, or they're independently
optional, not nested, the </optional> tag should go on the same line as
the parameter.

<function>substring</function> (
<parameter>bits</parameter> <type>bit</type>
<optional> <literal>FROM</literal> <parameter>start</parameter> <type>integer</type> </optional>
<optional> <literal>FOR</literal> <parameter>count</parameter> <type>integer</type> </optional> )
<returnvalue>bit</returnvalue>

Re: documentation structure

From

jian he

Date:

19 April 2024, 11:54:43

On Wed, Apr 17, 2024 at 7:07 PM Dagfinn Ilmari Mannsåker
<ilmari@ilmari.org> wrote:
>
>
> > It'd also be quite useful if clients could render more of the documentation
> > for functions. People are used to language servers providing full
> > documentation for functions etc...
>
> A more user-friendly version of \df+ (maybe spelled \hf, for symmetry
> with \h for commands?) would certainly be nice.
>

I think `\hf` would be useful.
Otherwise people first need to search the web to find the function's HTML page,
then use Ctrl + F to locate the specific function entry.

For \hf
we may need to offer a doc URL link,
but currently many functions cannot be linked to directly in the docs.
Also, one section can have many subsections.
I guess just linking directly to a nearby position in an HTML page
should be fine.

We could also add a URL anchor for each function, decorated with an underscore
like MySQL does (https://dev.mysql.com/doc/refman/8.3/en/string-functions.html#function_concat).
I am not sure it is an elegant solution.

Re: documentation structure

From

jian he

Date:

28 April 2024, 00:00:00

On Mon, Apr 15, 2024 at 1:00 PM jian he <jian.universality@gmail.com> wrote:
>
> On Wed, Mar 20, 2024 at 5:40 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> >
> >
> > +many for improving the index.
> >
> > My own pet docs peeve is a purely editorial one: func.sgml is a 30k line beast, and I think there's a good case for splitting out at least the larger chunks of it.
> >
>
> I think I successfully reduced func.sgml from 311322 lines to 13167 lines.
> (base-commit: 93582974315174d544592185d797a2b44696d1e5)
>
> writing a patch would be unreviewable.

I've split it into 7 patches.
Each patch splits one <sect1> out into a separate new file.

> func-string.sgml
> func-matching.sgml
> func-datetime.sgml
> func-json.sgml
> func-aggregate.sgml
> func-info.sgml
> func-admin.sgml

The files above will be newly created, each by its corresponding
individual patch.

Attachment

Re: documentation structure

From

Corey Huinker

Date:

29 April 2024, 05:17:39

I've split it into 7 patches.
Each patch splits one <sect1> out into a separate new file.

Seems like a good start. Looking at the diffs of these, I wonder if we would be better off with a func/ directory, each function gets its own file in that dir, and either these files above include the individual files, or the original func.sgml just becomes the organizer of all the functions. That would allow us to do future reorganizations with minimal churn, make validation of this patch a bit more straightforward, and make it easier for future editors to find the function they need to edit.

Re: documentation structure

From

jian he

Date:

15 July 2024, 06:35:36

On Mon, Apr 29, 2024 at 1:17 PM Corey Huinker <corey.huinker@gmail.com> wrote:
>>
>> I've split it into 7 patches.
>> Each patch splits one <sect1> out into a separate new file.
>
>
> Seems like a good start. Looking at the diffs of these, I wonder if we would be better off with a func/ directory, each function gets its own file in that dir, and either these files above include the individual files, or the original func.sgml just becomes the organizer of all the functions. That would allow us to do future reorganizations with minimal churn, make validation of this patch a bit more straightforward, and make it easier for future editors to find the function they need to edit.

Looking back: the patch is big, and there is no convenient way to review or validate it,
so I created a Python script to automate the split.
We can review the Python script instead.
(I wrote it by googling around; I know little about Python.)

* create new files for holding the content.
func-string.sgml
func-matching.sgml
func-datetime.sgml
func-json.sgml
func-aggregate.sgml
func-info.sgml
func-admin.sgml

* locate the parts that need to be copied into a newly created file, based
on line numbers.
The line number pattern is described here:
http://postgr.es/m/CACJufxEcMjjn-m6fpC2wXHsQbE5nyd%3Dxt6k-jDizBVUKK6O4KQ%40mail.gmail.com

* insert placeholder string in func.sgml:
&func-string;
&func-matching;
&func-datetime;
&func-json;
&func-aggregate;
&func-info;
&func-admin;

* copy the parts to new files.

* validate each newly created file (it must contain exactly 2 occurrences of "sect1").

* delete the parts from func.sgml, since they have already been copied to the new files.
sed --in-place "2408,4180d ; 5330,7760d ; 8942,11127d ; 15502,19436d ;
21567,22985d ; 24346,28017d ; 28020,30714d " func.sgml

* manually change doc/src/sgml/filelist.sgml
 <!ENTITY func       SYSTEM "func.sgml">
+<!ENTITY func-string       SYSTEM "func-string.sgml">
+<!ENTITY func-matching       SYSTEM "func-matching.sgml">
+<!ENTITY func-datetime       SYSTEM "func-datetime.sgml">
+<!ENTITY func-json       SYSTEM "func-json.sgml">
+<!ENTITY func-aggregate       SYSTEM "func-aggregate.sgml">
+<!ENTITY func-info       SYSTEM "func-info.sgml">
+<!ENTITY func-admin     SYSTEM "func-admin.sgml">

Two requirements:
1. manually change doc/src/sgml/filelist.sgml as mentioned before;
2. in the Python script, at line 35, I use
"os.chdir("/home/jian/Desktop/pg_src/src7/postgres/doc/src/sgml")"
you need to change this to your own "doc/src/sgml" directory.
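
The steps above (placeholder insertion, copying, validation, deletion) can be sketched roughly as follows. This is not the attached test.py. It is a minimal in-memory illustration with a hypothetical function name, split_by_ranges, and toy line numbers rather than the real ranges from the sed command.

```python
def split_by_ranges(lines, ranges):
    """Move 1-based, inclusive line ranges out of `lines`.

    `ranges` maps an entity name (e.g. "func-string") to (first, last).
    Returns (remaining, extracted): in `remaining`, each moved range is
    replaced by a single "&name;" placeholder line; `extracted` maps each
    entity name to the lines that would go into its new file.
    """
    starts = {first: (name, last) for name, (first, last) in ranges.items()}
    remaining, extracted = [], {}
    lineno = 1
    while lineno <= len(lines):
        if lineno in starts:
            name, last = starts[lineno]
            chunk = lines[lineno - 1:last]
            # Validation step: "sect1" must occur exactly twice in the
            # chunk (once in the opening tag, once in the closing tag).
            count = sum(line.count("sect1") for line in chunk)
            if count != 2:
                raise ValueError(f"{name}: found {count} occurrences of sect1")
            extracted[name] = chunk
            remaining.append(f"&{name};")
            lineno = last + 1  # skip (i.e. delete) the copied lines
        else:
            remaining.append(lines[lineno - 1])
            lineno += 1
    return remaining, extracted


# Toy stand-in for func.sgml; real ranges would come from the actual file.
doc = [
    '<chapter id="functions">',
    '<sect1 id="functions-string">',
    "string functions go here",
    "</sect1>",
    '<sect1 id="functions-json">',
    "json functions go here",
    "</sect1>",
    "</chapter>",
]
remaining, extracted = split_by_ranges(
    doc, {"func-string": (2, 4), "func-json": (5, 7)}
)
print("\n".join(remaining))
```

Working on a list of lines rather than on files keeps the sketch easy to check by eye; a real script would read func.sgml and write each extracted chunk to its own file.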

Attachment

test.py

Re: documentation structure

From

Corey Huinker

Date:

18 July 2024, 20:16:37

Looking back: the patch is big, and there is no convenient way to review or validate it.

Perhaps we can break up the patches as follows:

1. create the filelist.sgml entries, and create new files as you detailed, empty with func.sgml still managing the sections, but each section now has its corresponding &func-something; reference. The files are included, but they're completely empty.

2 to 999+. one commit per function moved from func.sgml to its corresponding func-something.sgml.

It'll be a ton of commits, but each one will be very easy to review.

Alternately, if we put each function in its own file, there would be a series of commits, one per function like such:

1. create the new func-my-function.sgml, copying the definition of the same name

2. delete the definition in func.sgml, replaced with the &func-include;

3. new entry in the filelist.

This approach looks (and IS) tedious, but it has several key advantages:

1. Each one is very easy to review.
2. Big reduction in future merge conflicts on func.sgml.

3. location of a given function's docs is now trivial.

4. separation of concerns with regard to content of function def vs placement of same.

5. Easy to ensure that all functions have an anchor.

6. The effort can stall and be resumed at our own pace.

Perhaps your python script can be adapted to this approach? I'm willing to review, or collaborate, or both.

Re: documentation structure

From

Andrew Dunstan

Date:

18 July 2024, 20:33:08

On 2024-07-18 Th 4:16 PM, Corey Huinker wrote:

Looking back: the patch is big, and there is no convenient way to review or validate it.

Perhaps we can break up the patches as follows:

1. create the filelist.sgml entries, and create new files as you detailed, empty with func.sgml still managing the sections, but each section now has its corresponding &func-something; reference. The files are included, but they're completely empty.

2 to 999+. one commit per function moved from func.sgml to its corresponding func-something.sgml.

It'll be a ton of commits, but each one will be very easy to review.

Alternately, if we put each function in its own file, there would be a series of commits, one per function like such:

1. create the new func-my-function.sgml, copying the definition of the same name
2. delete the definition in func.sgml, replaced with the &func-include;
3. new entry in the filelist.

This approach looks (and IS) tedious, but it has several key advantages:

1. Each one is very easy to review.
2. Big reduction in future merge conflicts on func.sgml.
3. location of a given function's docs is now trivial.
4. separation of concerns with regard to content of function def vs placement of same.
5. Easy to ensure that all functions have an anchor.
6. The effort can stall and be resumed at our own pace.

Perhaps your python script can be adapted to this approach? I'm willing to review, or collaborate, or both.

I'm opposed to having a separate file for every function. I think breaking up func.sgml into one piece per sect1 is about right. If that proves cumbersome still we can look at breaking it up further, but let's start with that.

More concretely, sometimes the bits that relate to a particular function are not contiguous, e.g. you might have an entry in a table for a function and then at the end Notes relating to that function. That makes doing a file per function not very practical.

Also, being able to view the context for your function documentation is useful when editing.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: documentation structure

From

Tom Lane

Date:

18 July 2024, 20:40:49

Andrew Dunstan <andrew@dunslane.net> writes:
> I'm opposed to having a separate file for every function.

I also think that would be a disaster.  It would result in a huge
number of files which would make global editing (eg markup changes)
really painful, and it by no means flattens the document structure.
Somewhere there will still need to be decisions that functions
A, B, C go into one documentation section while D, E, F go somewhere
else.

> I think 
> breaking up func.sgml into one piece per sect1 is about right. If that 
> proves cumbersome still we can look at breaking it up further, but let's 
> start with that.

+1

            regards, tom lane

Re: documentation structure

From

Tatsuo Ishii

Date:

19 July 2024, 03:06:22

> I'm opposed to having a separate file for every function. I think
> breaking up func.sgml into one piece per sect1 is about right. If that
> proves cumbersome still we can look at breaking it up further, but
> let's start with that.

That will create at least 30 func-xx.sgml files.

t-ishii$ grep '<sect1' func.sgml|wc -l
30

I am afraid that's too many?

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: documentation structure

From

"David G. Johnston"

Date:

19 July 2024, 04:07:20

On Thu, Jul 18, 2024 at 8:06 PM Tatsuo Ishii <ishii@postgresql.org> wrote:

> I'm opposed to having a separate file for every function. I think
> breaking up func.sgml into one piece per sect1 is about right. If that
> proves cumbersome still we can look at breaking it up further, but
> let's start with that.

That will create at least 30 func-xx.sgml files.

t-ishii$ grep '<sect1' func.sgml|wc -l
30

I am afraid that's too many?

The premise and the resultant number of files both seem reasonable to me. I could get that number down to maybe 20 if pressed but I don't see any benefit to doing so. I look at a page on the website that needs updating then go open its source file. Nice and tidy.

David J.

Re: documentation structure

From

jian he

Date:

19 July 2024, 06:55:00

On Fri, Jul 19, 2024 at 12:07 PM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>
> On Thu, Jul 18, 2024 at 8:06 PM Tatsuo Ishii <ishii@postgresql.org> wrote:
>>
>> > I'm opposed to having a separate file for every function. I think
>> > breaking up func.sgml into one piece per sect1 is about right. If that
>> > proves cumbersome still we can look at breaking it up further, but
>> > let's start with that.
>>
>> That will create at least 30 func-xx.sgml files.
>>
>> t-ishii$ grep '<sect1' func.sgml|wc -l
>> 30
>>
>> I am afraid that's too many?
>>
>
> The premise and the resultant number of files both seem reasonable to me.  I could get that number down to maybe 20 if pressed but I don't see any benefit to doing so. I look at a page on the website that needs updating then go open its source file.  Nice and tidy.
>

Hi. My Python script test.py in [1] cuts func.sgml from 31k lines to
13k lines by moving some of its content into these 7 new files.
        new file:   func-admin.sgml
        new file:   func-aggregate.sgml
        new file:   func-datetime.sgml
        new file:   func-info.sgml
        new file:   func-json.sgml
        new file:   func-matching.sgml
        new file:   func-string.sgml

The doc/src/sgml/filelist.sgml changes are addressed in the attached patch.
Cutting func.sgml in half is OK for me.


In test.py, at line 35, I use
"os.chdir("/home/jian/Desktop/pg_src/src7/postgres/doc/src/sgml")"
you need to change this to your corresponding "doc/src/sgml" directory.
Overall, you need to apply the attached patch and change test.py line 35.



The main gotcha is that the split is based on the pattern mentioned in [2] and
the index layout from https://www.postgresql.org/docs/devel/functions.html

[1] https://postgr.es/m/CACJufxH%2BYi521QrncwnW4sFGOhPmJQpsmoJ%2BYnj%2BVpHu5wAahQ%40mail.gmail.com
[2] http://postgr.es/m/CACJufxEcMjjn-m6fpC2wXHsQbE5nyd%3Dxt6k-jDizBVUKK6O4KQ%40mail.gmail.com

Attachment

v1-0001-filelist.sgml-add-new-file-entries.patch

Re: documentation structure

From

Tom Lane

Date:

19 July 2024, 16:22:14

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Thu, Jul 18, 2024 at 8:06 PM Tatsuo Ishii <ishii@postgresql.org> wrote:
>>> I'm opposed to having a separate file for every function. I think
>>> breaking up func.sgml into one piece per sect1 is about right.

>> That will create at least 30 func-xx.sgml files.
>> I am afraid that's too many?

> The premise and the resultant number of files both seem reasonable to me.

I agree.  The hundreds that would result from file-per-function, or
anything close to that, would be too many.  But I can deal with
file-per-sect1.  For context, I count currently 167 sgml/*.sgml files
plus 219 ref/*.sgml, so adding 30 more would be an 8% increase.
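
The arithmetic behind that estimate, spelled out as a small Python check using only the counts quoted above:

```python
existing = 167 + 219   # sgml/*.sgml plus ref/*.sgml, as counted above
added = 30             # one new file per <sect1> in func.sgml
increase = added / existing * 100
print(f"{existing} existing files; adding {added} is a {increase:.1f}% increase")
```

That gives 386 existing files and an increase of about 7.8%, which rounds to the 8% quoted.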

Do we want to use a "func-" prefix on the file names?  I could
imagine dispensing with that as unnecessary; or another idea
could be to make a new subdirectory func/ for these.

            regards, tom lane

Re: documentation structure

From

Tatsuo Ishii

Date:

19 July 2024, 20:00:46

> Do we want to use a "func-" prefix on the file names?  I could
> imagine dispensing with that as unnecessary;

If we don't use the prefix and we generate new file names from sect1
tag, we could have file name collision: for example, json.sgml because
there's sect1 tag "functions-json".

> or another idea
> could be to make a new subdirectory func/ for these.

+1. Looks better than adding +30 files right under sgml directory.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: documentation structure

From

"David G. Johnston"

Date:

19 July 2024, 20:11:46

On Fri, Jul 19, 2024 at 1:01 PM Tatsuo Ishii <ishii@postgresql.org> wrote:

> Do we want to use a "func-" prefix on the file names? I could
> imagine dispensing with that as unnecessary;

If we don't use the prefix and we generate new file names from sect1
tag, we could have file name collision: for example, json.sgml because
there's sect1 tag "functions-json".

> or another idea
> could be to make a new subdirectory func/ for these.

+1. Looks better than adding +30 files right under sgml directory.

+many for placing these under a subdirectory.

IMO the file name should match the ID of the sect1 element with the leading "functions-" removed, naming the directory "functions". Thus when viewing the web page the corresponding sgml file is determinable.
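
A minimal Python sketch of that naming rule, for illustration only: it pulls the sect1 IDs out of some SGML text, strips the leading "functions-", and places the result under a subdirectory. The helper name sect1_files and the sample input are hypothetical, and the regular expression assumes the id attribute directly follows the tag name.

```python
import re

# Assumption: opening tags look like <sect1 id="functions-...">.
SECT1_ID = re.compile(r'<sect1\s+id="([^"]+)"')


def sect1_files(sgml_text, directory, prefix="functions-"):
    """Map each sect1 ID to a file path inside `directory`."""
    paths = []
    for sect_id in SECT1_ID.findall(sgml_text):
        # Strip the common prefix so the file name matches the page name.
        stem = sect_id[len(prefix):] if sect_id.startswith(prefix) else sect_id
        paths.append(f"{directory}/{stem}.sgml")
    return paths


sample = """
<sect1 id="functions-string">
</sect1>
<sect1 id="functions-json">
</sect1>
"""
paths = sect1_files(sample, "functions")
print(paths)
```

Because the generated names live in their own subdirectory, a stem such as "json" does not clash with a json.sgml in the parent directory, which is the collision concern raised earlier in the thread.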

David J.

Re: documentation structure

From

Tom Lane

Date:

19 July 2024, 20:17:41

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> +many for placing these under a subdirectory.

> IMO the file name should match the ID of the sect1 element with the leading
> "functions-" removed, naming the directory "functions".  Thus when viewing
> the web page the corresponding sgml file is determinable.

I'd go for shorter myself (ie "func/"), mainly due to the precedent
of the existing subdirectory which is "ref/" not "reference/".
It's hardly a big deal though.

            regards, tom lane

Re: documentation structure

From

Tatsuo Ishii

Date:

20 July 2024, 00:47:17

>> IMO the file name should match the ID of the sect1 element with the leading
>> "functions-" removed, naming the directory "functions".  Thus when viewing
>> the web page the corresponding sgml file is determinable.
> 
> I'd go for shorter myself (ie "func/"), mainly due to the precedent
> of the existing subdirectory which is "ref/" not "reference/".
> It's hardly a big deal though.

I don't have a strong preference either but I agree that "func/" is
more consistent with existing subdirectory names.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Re: documentation structure

From

"David G. Johnston"

Date:

20 July 2024, 01:00:18

On Fri, Jul 19, 2024 at 5:47 PM Tatsuo Ishii <ishii@postgresql.org> wrote:

>> IMO the file name should match the ID of the sect1 element with the leading
>> "functions-" removed, naming the directory "functions". Thus when viewing
>> the web page the corresponding sgml file is determinable.
>
> I'd go for shorter myself (ie "func/"), mainly due to the precedent
> of the existing subdirectory which is "ref/" not "reference/".
> It's hardly a big deal though.

I don't have a strong preference either but I agree that "func/" is
more consistent with existing subdirectory names.

Works for me. I was definitely on the fence between the two.

David J.

Re: documentation structure

From

jian he

Date:

22 July 2024, 02:42:00

1. manually change line 4 in split_func_sgml.py and run the script.
2. git apply v6-0001-all-filelist-for-directory-doc-src-sgml-func.patch
Now you can run "ninja doc/src/sgml/html".
The logic is simple, but I am using verbose variable names, which is why
split_func_sgml.py is large.


This adds a new directory: doc/src/sgml/func
New files in that directory: doc/src/sgml/func/allfiles.sgml and others.
For file names we can use "func-pattern" or just "pattern"; right now
I use the prefix "func".
Each newly created file is validated: it must contain exactly 2
occurrences of "sect1", and its only opening tag of that kind must match
the "<sect1 id=" pattern.
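
The split and the per-file check described above can be sketched roughly as
follows. This is not the attached split_func_sgml.py (which is not shown in
this thread); the names split_sect1 and is_valid_chunk are hypothetical, and
it assumes every section id starts with "functions-" and that sect1 elements
are not nested.

```python
import re

# Assumption: every top-level section opens with <sect1 id="functions-...">.
SECT1_OPEN = re.compile(r'<sect1 id="functions-([^"]+)"')

def split_sect1(text):
    """Return {file_name: chunk}, one chunk per <sect1>...</sect1> block."""
    chunks = {}
    for opening in SECT1_OPEN.finditer(text):
        start = opening.start()
        # sect1 is assumed not to nest, so the next close tag ends the block.
        end = text.index("</sect1>", start) + len("</sect1>")
        chunks[f"func-{opening.group(1)}.sgml"] = text[start:end]
    return chunks

def is_valid_chunk(chunk):
    """Check from the mail: 2 'sect1' strings and a single '<sect1 id='."""
    return chunk.count("sect1") == 2 and chunk.count("<sect1 id=") == 1

sample = (
    '<chapter id="functions">\n'
    '<sect1 id="functions-logical">\n<title>Logical Operators</title>\n</sect1>\n'
    '<sect1 id="functions-math">\n<title>Mathematical Functions</title>\n</sect1>\n'
    '</chapter>\n'
)
files = split_sect1(sample)
```

A chunk that accidentally swallowed two sections would contain four "sect1"
strings and fail the check, which is presumably the point of the validation.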


The following is the content of doc/src/sgml/func/allfiles.sgml:

<!--
doc/src/sgml/func/allfiles.sgml
PostgreSQL documentation
Complete list of usable sgml source files in this directory.
-->

<!-- function references -->

<!ENTITY func                       SYSTEM "func.sgml">
<!ENTITY func-logical               SYSTEM "func-logical.sgml">
<!ENTITY func-comparison            SYSTEM "func-comparison.sgml">
<!ENTITY func-math                  SYSTEM "func-math.sgml">
<!ENTITY func-string                SYSTEM "func-string.sgml">
<!ENTITY func-binarystring          SYSTEM "func-binarystring.sgml">
<!ENTITY func-bitstring             SYSTEM "func-bitstring.sgml">
<!ENTITY func-matching              SYSTEM "func-matching.sgml">
<!ENTITY func-formatting            SYSTEM "func-formatting.sgml">
<!ENTITY func-datetime              SYSTEM "func-datetime.sgml">
<!ENTITY func-enum                  SYSTEM "func-enum.sgml">
<!ENTITY func-geometry              SYSTEM "func-geometry.sgml">
<!ENTITY func-net                   SYSTEM "func-net.sgml">
<!ENTITY func-textsearch            SYSTEM "func-textsearch.sgml">
<!ENTITY func-uuid                  SYSTEM "func-uuid.sgml">
<!ENTITY func-xml                   SYSTEM "func-xml.sgml">
<!ENTITY func-json                  SYSTEM "func-json.sgml">
<!ENTITY func-sequence              SYSTEM "func-sequence.sgml">
<!ENTITY func-conditional           SYSTEM "func-conditional.sgml">
<!ENTITY func-array                 SYSTEM "func-array.sgml">
<!ENTITY func-range                 SYSTEM "func-range.sgml">
<!ENTITY func-aggregate             SYSTEM "func-aggregate.sgml">
<!ENTITY func-window                SYSTEM "func-window.sgml">
<!ENTITY func-merge-support         SYSTEM "func-merge-support.sgml">
<!ENTITY func-subquery              SYSTEM "func-subquery.sgml">
<!ENTITY func-comparisons           SYSTEM "func-comparisons.sgml">
<!ENTITY func-srf                   SYSTEM "func-srf.sgml">
<!ENTITY func-info                  SYSTEM "func-info.sgml">
<!ENTITY func-admin                 SYSTEM "func-admin.sgml">
<!ENTITY func-trigger               SYSTEM "func-trigger.sgml">
<!ENTITY func-event-triggers        SYSTEM "func-event-triggers.sgml">
<!ENTITY func-statistics            SYSTEM "func-statistics.sgml">
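
An entity list like the one above could be generated from the file names
rather than maintained by hand. The following is only a sketch under that
assumption; entity_lines is a hypothetical helper, not part of the attached
script, and the column width is an arbitrary padding choice.

```python
import os

def entity_lines(file_names, width=26):
    """Build one <!ENTITY name SYSTEM "file"> declaration per file name."""
    lines = []
    for file_name in file_names:
        # The entity name is the file name without its .sgml extension.
        entity = os.path.splitext(file_name)[0]
        lines.append(f'<!ENTITY {entity.ljust(width)} SYSTEM "{file_name}">')
    return lines

declarations = entity_lines(["func.sgml", "func-logical.sgml"])
```

Joining the result with newlines gives the body of an allfiles.sgml-style
file; the header comment would still have to be written separately.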


Re: documentation structure

From

jian he

Date:

20 August 2024, 06:37:53

On Tue, Aug 20, 2024 at 11:18 AM jian he <jian.universality@gmail.com> wrote:
>
> On Mon, Jul 22, 2024 at 10:42 AM jian he <jian.universality@gmail.com> wrote:
> >
> hi. this time everything should be ok!
>
>
> step1. python3  split_func_sgml.py
> step2. git apply
> v6-0001-all-filelist-for-directory-doc-src-sgml-func.patch (step2,
> cannot use "git am").
>

Sorry, I mean the files attached to the previous mail:
step1. python3 v7_split_func_sgml.py
step2. git apply v7-0001-all-filelist-for-directory-doc-src-sgml-func.patch
(step2 cannot use "git am").

Re: documentation structure

From

Heikki Linnakangas

Date:

04 November 2024, 19:42:24

On 30/09/2024 11:15, jian he wrote:
> In, 0001-func.sgml-Consistently-use-optional-to-indicate-opti.patch
> 
> -<function>format</function>(<parameter>formatstr</parameter>
> <type>text</type> [, <parameter>formatarg</parameter>
> <type>"any"</type> [, ...] ])
> +<function>format</function>(<parameter>formatstr</parameter>
> <type>text</type> <optional>, <parameter>formatarg</parameter>
> <type>"any"</type> [, ...] </optional>)
> 
> i change it further to
> +<function>format</function>(<parameter>formatstr</parameter>
> <type>text</type> <optional>, <parameter>formatarg</parameter>
> <type>"any"</type> <optional>, ...</optional> </optional>)
> 
> i did these kind of change to <function>format</function>,
> <function>concat_ws</function>, <function>concat</function>
> 
> I've rebased your patch,
> added a commitfest entry: https://commitfest.postgresql.org/50/5278/
> it seems I cannot mark you as the author in commitfest.
> anyway, you ( Dagfinn Ilmari Mannsåker ) are the author of it.

Committed, thanks!

One funny case remains:

        <entry role="func_table_entry"><para role="func_signature">
         <function>unnest</function> ( <type>anyarray</type>,
         <type>anyarray</type> <optional>, ... </optional> )
         <returnvalue>setof anyelement, anyelement [, ... ]</returnvalue>
        </para>
        <para>

The returnvalue contains brackets. I tried to change that to use 
"<optional>" tag too, but it's not valid:

postgres.sgml:20552: element returnvalue: validity error : Element 
optional is not declared in returnvalue list of possible children
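
For anyone who wants to look for other leftovers of this kind, a rough scan
could look like the sketch below. It is an illustration only: the helper name
bracketed_returnvalues is hypothetical, it uses a plain regular expression
rather than a real SGML parser, and it only flags the "[, ..." style of
bracket so that array types such as text[] are not reported.

```python
import re

RETURNVALUE = re.compile(r"<returnvalue>(.*?)</returnvalue>", re.DOTALL)
# A literal "[" followed by a comma marks an optional, repeatable item.
OPTIONAL_BRACKET = re.compile(r"\[\s*,")

def bracketed_returnvalues(sgml_text):
    """Return the bodies of <returnvalue> elements that use '[, ...' brackets."""
    return [
        match.group(1)
        for match in RETURNVALUE.finditer(sgml_text)
        if OPTIONAL_BRACKET.search(match.group(1))
    ]

sample = (
    "<returnvalue>setof anyelement, anyelement [, ... ]</returnvalue>\n"
    "<returnvalue>text[]</returnvalue>\n"
)
found = bracketed_returnvalues(sample)
```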

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: documentation structure

From

Dagfinn Ilmari Mannsåker

Date:

05 November 2024, 00:08:51

Heikki Linnakangas <hlinnaka@iki.fi> writes:

> On 30/09/2024 11:15, jian he wrote:
>> In, 0001-func.sgml-Consistently-use-optional-to-indicate-opti.patch
>> -<function>format</function>(<parameter>formatstr</parameter>
>> <type>text</type> [, <parameter>formatarg</parameter>
>> <type>"any"</type> [, ...] ])
>> +<function>format</function>(<parameter>formatstr</parameter>
>> <type>text</type> <optional>, <parameter>formatarg</parameter>
>> <type>"any"</type> [, ...] </optional>)
>> i change it further to
>> +<function>format</function>(<parameter>formatstr</parameter>
>> <type>text</type> <optional>, <parameter>formatarg</parameter>
>> <type>"any"</type> <optional>, ...</optional> </optional>)
>> i did these kind of change to <function>format</function>,
>> <function>concat_ws</function>, <function>concat</function>
>> I've rebased your patch,
>> added a commitfest entry: https://commitfest.postgresql.org/50/5278/
>> it seems I cannot mark you as the author in commitfest.
>> anyway, you ( Dagfinn Ilmari Mannsåker ) are the author of it.
>
> Committed, thanks!

Thanks!

> One funny case remains:
>
>        <entry role="func_table_entry"><para role="func_signature">
>         <function>unnest</function> ( <type>anyarray</type>,
>         <type>anyarray</type> <optional>, ... </optional> )
>         <returnvalue>setof anyelement, anyelement [, ... ]</returnvalue>
>        </para>
>        <para>
>
> The returnvalue contains brackets. I tried to change that to use
> "<optional>" tag too, but it's not valid:
>
> postgres.sgml:20552: element returnvalue: validity error : Element
> optional is not declared in returnvalue list of possible children

Yeah, I remember running into that, and deciding I couldn't be bothered
to try to learn enough about docbook to figure out if a solution was
possible.

- ilmari