Thread: documentation structure
I was looking at the documentation index this morning[1], and I can't help feeling like there are some parts of it that are over-emphasized and some parts that are under-emphasized. I'm not sure what we can do about this exactly, but I thought it worth writing an email and seeing what other people think.

The two sections of the documentation that seem really under-emphasized to me are the GUC documentation and the SQL reference. The GUC documentation is all buried under "20. Server Configuration" and the SQL command reference is under "I. SQL commands". For reasons that I don't understand, all chapters except for those in "VI. Reference" are numbered, but the chapters in that section have Roman numerals instead.

I don't know what other people's experience is, but for me, wanting to know what a command does or what a setting does is extremely common. Therefore, I think these chapters are disproportionately important and should be emphasized more. In the case of the GUC reference, one idea I have is to split up "III. Server Administration". My proposal is that we divide it into three sections. The first would be called "III. Server Installation" and would cover chapters 16 (installation from binaries) through 19 (server setup and operation). The second would be called "IV. Server Configuration" -- so every section that's currently a subsection of "server configuration" would become a top-level chapter. The third division would be "V. Server Administration," and would cover the current chapters 21-33. This is probably far from perfect, but it seems like a relatively simple change and better than what we have now.

I don't know what to do about "I. SQL commands". It's obviously impractical to promote that to a top-level section, because it's got a zillion sub-pages which I don't think we want in the top-level documentation index. But having it as one of several unnumbered chapters interposed between 51 and 52 doesn't seem great either.

The stuff that I think is over-emphasized is as follows: (a) chapters 1-3, the tutorial; (b) chapters 4-6, which are essentially a continuation of the tutorial, and not at all similar to chapters 8-11 which are chock-full of detailed technical information; (c) chapters 43-46, one per procedural language; perhaps these could just be demoted to sub-sections of chapter 42 on procedural languages; (d) chapters 47 (server programming interface), 50 (replication progress tracking), and 51 (archive modules), all of which are important to document but none of which seem important enough to put them in the top-level documentation index; and (e) large parts of section "VII. Internals," which again contain tons of stuff of very marginal interest. The first ~4 chapters of the internals section seem like they might be mainstream enough to justify the level of prominence that we give them, but the rest has got to be of interest to a tiny minority of readers.

I think it might be possible to consolidate the internals section by grouping a bunch of existing entries together by category. Basically, after the first few chapters, you've got stuff that is of interest to C programmers writing core or extension code; and you've got explainers on things like GEQO and index op-classes and support functions which might be of interest even to non-programmers. I think for example that we don't need separate top-level chapters on writing procedural language handlers, FDWs, tablesample methods, custom scan providers, table access methods, index access methods, and WAL resource managers. Some or all of those could be grouped under a single chapter, perhaps, e.g. Using PostgreSQL Extensibility Interfaces.

Thoughts? I realize that this topic is HIGHLY prone to ENDLESS bikeshedding, and it's inevitable that not everybody is going to agree. But I hope we can agree that it's completely silly that it's vastly easier to find the documentation about the backup manifest format than it is to find the documentation on CREATE TABLE or shared_buffers, and if we can agree on that, then perhaps we can agree on some way to make things better.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/docs/16/index.html
On Mon, 18 Mar 2024 at 15:12, Robert Haas <robertmhaas@gmail.com> wrote:

I'm not going into detail about the other docs comments; I don't have much of an opinion either way on the mentioned sections. You make good arguments, yet I don't usually use those sections of the docs but rather do code searches.

> I don't know what to do about "I. SQL commands". It's obviously
> impractical to promote that to a top-level section, because it's got a
> zillion sub-pages which I don't think we want in the top-level
> documentation index. But having it as one of several unnumbered
> chapters interposed between 51 and 52 doesn't seem great either.

Could "SQL Commands" be a top-level construct, with subsections for SQL/DML, SQL/DDL, SQL/Transaction management, and PG's extensions/administrative/misc features? I sometimes find myself trying to mentally organize what SQL commands users can use vs those accessible to database owners and administrators, which is not currently organized as such in the SQL Commands section.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)
On Mon, Mar 18, 2024 at 10:55 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> > I don't know what to do about "I. SQL commands". It's obviously
> > impractical to promote that to a top-level section, because it's got a
> > zillion sub-pages which I don't think we want in the top-level
> > documentation index. But having it as one of several unnumbered
> > chapters interposed between 51 and 52 doesn't seem great either.
>
> Could "SQL Commands" be a top-level construct, with subsections for
> SQL/DML, SQL/DDL, SQL/Transaction management, and PG's
> extensions/administrative/misc features? I sometimes find myself
> trying to mentally organize what SQL commands users can use vs those
> accessible to database owners and administrators, which is not
> currently organized as such in the SQL Commands section.

Yeah, I wondered about that, too. Or for example you could group all CREATE commands together, all ALTER commands together, all DROP commands together, etc. But I can't really see a future in such schemes, because having a single page that links to the reference documentation for every single command we have in alphabetical order is incredibly handy, or at least I have found it so.

So my feeling - at least at present - is that it's more fruitful to look into cutting down the amount of clutter that appears in the top-level documentation index, and maybe finding ways to make important sections like the SQL reference more prominent.

Given how much documentation we have, it's just not going to be possible to make everything that matters conveniently visible at the top level. I think if people have to click down a level for the SQL reference, that's fine, as long as the link they need to click on is reasonably visible. What annoys me about the present structure is that it isn't. You don't get any visual clue that the "SQL Commands" page with ~100 subpages is more important than "51. Archive Modules" or "33. Regression Tests" or "58. Writing a Procedural Language Handler," but it totally is.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
On Mon, Mar 18, 2024 at 10:12 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I was looking at the documentation index this morning[1], and I can't
> help feeling like there are some parts of it that are over-emphasized
> and some parts that are under-emphasized. I'm not sure what we can do
> about this exactly, but I thought it worth writing an email and seeing
> what other people think.
I agree, and my usage patterns of the docs are similar.
As the project progresses and more features are added and tacked on to existing docs, things can get
murky or buried. I imagine that web access and search logs could paint a picture of documentation usage.
> I don't know what other people's experience is, but for me, wanting to
> know what a command does or what a setting does is extremely common.
> Therefore, I think these chapters are disproportionately important and
> should be emphasized more. In the case of the GUC reference, one idea
+1
> I have is to split up "III. Server Administration". My proposal is
> that we divide it into three sections. The first would be called "III.
> Server Installation" and would cover chapters 16 (installation from
> binaries) through 19 (server setup and operation). The second would be
> called "IV. Server Configuration" -- so every section that's currently
> a subsection of "server configuration" would become a top-level
> chapter. The third division would be "V. Server Administration," and
> would cover the current chapters 21-33. This is probably far from
I like all of those.
> I don't know what to do about "I. SQL commands". It's obviously
> impractical to promote that to a top-level section, because it's got a
> zillion sub-pages which I don't think we want in the top-level
> documentation index. But having it as one of several unnumbered
> chapters interposed between 51 and 52 doesn't seem great either.
I think it'd be easier to read if current "VI. Reference" came right after "Server Administration",
ahead of "Client Interfaces" and "Server Programming", which are of interest to a much smaller
subset of users.
It would also help if the subchapters were numbered like the rest of them. I don't think the roman numerals are
particularly helpful.
> The stuff that I think is over-emphasized is as follows: (a) chapters
> 1-3, the tutorial; (b) chapters 4-6, which are essentially a
> ...
Also +1
> Thoughts? I realize that this topic is HIGHLY prone to ENDLESS
> bikeshedding, and it's inevitable that not everybody is going to
> agree. But I hope we can agree that it's completely silly that it's
> vastly easier to find the documentation about the backup manifest
> format than it is to find the documentation on CREATE TABLE or
> shared_buffers, and if we can agree on that, then perhaps we can agree
> on some way to make things better.
Impossible to please everyone, but I'm sure we can improve things.
I've contributed to different parts of the docs over the years, and would be happy
to help with this work.
Roberto
On Mon, 2024-03-18 at 10:11 -0400, Robert Haas wrote:
> The two sections of the documentation that seem really
> under-emphasized to me are the GUC documentation and the SQL
> reference. The GUC documentation is all buried under "20. Server
> Configuration" and the SQL command reference is under "I. SQL
> commands". For reasons that I don't understand, all chapters except
> for those in "VI. Reference" are numbered, but the chapters in that
> section have Roman numerals instead.

That last fact is very odd indeed and could be easily fixed.

> I don't know what other people's experience is, but for me, wanting to
> know what a command does or what a setting does is extremely common.
> Therefore, I think these chapters are disproportionately important and
> should be emphasized more. In the case of the GUC reference, one idea
> I have is to split up "III. Server Administration". My proposal is
> that we divide it into three sections. The first would be called "III.
> Server Installation" and would cover chapters 16 (installation from
> binaries) through 19 (server setup and operation). The second would be
> called "IV. Server Configuration" -- so every section that's currently
> a subsection of "server configuration" would become a top-level
> chapter. The third division would be "V. Server Administration," and
> would cover the current chapters 21-33. This is probably far from
> perfect, but it seems like a relatively simple change and better than
> what we have now.

I'm fine with splitting up "Server Administration" into three sections like you propose.

> I don't know what to do about "I. SQL commands". It's obviously
> impractical to promote that to a top-level section, because it's got a
> zillion sub-pages which I don't think we want in the top-level
> documentation index. But having it as one of several unnumbered
> chapters interposed between 51 and 52 doesn't seem great either.

I think that both the GUCs and the SQL reference could be top-level sections. For the GUCs there is an obvious split in sub-chapters, and the SQL reference could be a top-level section without any chapters under it.

> The stuff that I think is over-emphasized is as follows: (a) chapters
> 1-3, the tutorial; (b) chapters 4-6, which are essentially a
> continuation of the tutorial, and not at all similar to chapters 8-11
> which are chock-full of detailed technical information; (c) chapters
> 43-46, one per procedural language; perhaps these could just be
> demoted to sub-sections of chapter 42 on procedural languages; (d)
> chapters 47 (server programming interface), 50 (replication progress
> tracking), and 51 (archive modules), all of which are important to
> document but none of which seem important enough to put them in the
> top-level documentation index; and (e) large parts of section "VII.
> Internals," which again contain tons of stuff of very marginal
> interest. The first ~4 chapters of the internals section seem like
> they might be mainstream enough to justify the level of prominence
> that we give them, but the rest has got to be of interest to a tiny
> minority of readers.

I disagree that the tutorial is over-emphasized.

I also disagree that chapters 4 to 6 are a continuation of the tutorial. Or at least, they shouldn't be. When I am looking for a documentation reference on something like security considerations of SECURITY DEFINER functions, my first impulse is to look in chapter 5 (Data Definition) or in chapter 38 (Extending SQL), and I am surprised to find it discussed in the SQL reference of CREATE FUNCTION.

Another case in point is the "Notes" section for CREATE VIEW. Why is that not somewhere under "Data Definition"? For me, the reference should be terse and focused on the syntax.

Changing that is probably a lost cause by now, but I feel that we need not encourage that development any more by playing down the earlier chapters.

> I think it might be possible to consolidate the internals section by
> grouping a bunch of existing entries together by category. Basically,
> after the first few chapters, you've got stuff that is of interest to
> C programmers writing core or extension code; and you've got
> explainers on things like GEQO and index op-classes and support
> functions which might be of interest even to non-programmers. I think
> for example that we don't need separate top-level chapters on writing
> procedural language handlers, FDWs, tablesample methods, custom scan
> providers, table access methods, index access methods, and WAL
> resource managers. Some or all of those could be grouped under a
> single chapter, perhaps, e.g. Using PostgreSQL Extensibility
> Interfaces.

I have no strong feelings about that.

Yours,
Laurenz Albe
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> On Mon, 2024-03-18 at 10:11 -0400, Robert Haas wrote:
>> I don't know what to do about "I. SQL commands". It's obviously
>> impractical to promote that to a top-level section, because it's got a
>> zillion sub-pages which I don't think we want in the top-level
>> documentation index. But having it as one of several unnumbered
>> chapters interposed between 51 and 52 doesn't seem great either.

> I think that both the GUCs and the SQL reference could be top-level
> sections. For the GUCs there is an obvious split in sub-chapters,
> and the SQL reference could be a top-level section without any chapters
> under it.

I'd be in favor of promoting all three of the "Reference" things to the top level, except that as Robert says, it seems likely that that would end in having a hundred individual command reference pages visible in the topmost table of contents. Also, if we manage to suppress that, did we really make it any more prominent? Not sure.

Making "SQL commands" top-level with half a dozen subsections would solve the visibility problem, but I'm not real eager to go there, because I foresee endless arguments about which subsection a given command goes in. Robert's point about wanting a single alphabetized list is valid too (although you could imagine that being a list in an introductory section, similar to what we have for system catalogs).

This might be a silly suggestion, but: could we just render the "most important" chapter titles in a larger font?

regards, tom lane
On Mon, Mar 18, 2024 at 6:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> This might be a silly suggestion, but: could we just render the
> "most important" chapter titles in a larger font?

It's not the silliest suggestion ever -- you could have proposed <blink>! -- but I also suspect it's not the right answer. Of course, varying the font size can be a great way of emphasizing some things more than others, but it doesn't usually work out well to just take a document that was designed to be displayed in a uniform font size and enlarge bits of text here and there. You usually want to have some kind of overall plan of which font size is a single component.

For example, on a corporate home page, it's quite common to have two nav bars, the larger of which has entries that correspond to the company's product offerings and/or marketing materials, and the smaller of which has "utility functions" like "login", "contact us", and "search". Font size can be an effective tool for emphasizing the relative importance of one nav bar versus the other, but you don't start by deciding which things are going to get displayed in a larger font. You start with an overall idea of the layout and then the font size flows out of that.

Just riffing a bit, you could imagine adding a nav bar to our documentation, either across the top or along the side, that is always there on every page of the documentation and contains those links that we want to make sure are always visible. Necessarily, these must be limited in number. Then on the home page you could have the whole table of contents as we do today, and you use that to navigate to everything that isn't one of the quick links.

Or you can imagine that the home page of our documentation isn't just a tree view like it is today; it might instead be written in paragraph form. "Welcome to the PostgreSQL documentation! If you're new here, check out our <link>tutorial</link>! Otherwise, you might be interested in our <link>SQL reference</link>, our <link>configuration reference</link>, or our <link>banana plantation</link>. If none of those sound like what you want, check out the <link>documentation index</link>." Obviously in order to actually work, something like this would need to be expanded into enough paragraphs to actually cover all of the important sections of the documentation, and probably not mention banana plantations. Or maybe it wouldn't be just paragraphs, but a two-column table, with each row of the table having a main title and link in the narrower lefthand column and a blurb with more links in the wider righthand column.

I'm sure there are a lot of other ways to do this, too. Our main documentation page is very old-school, and there are probably a bunch of ways to do better. But I'm not sure how easy it would be to get agreement on something specific, and I don't know how well our toolchain can support anything other than what we've already got. I've also learned from painful experience that you can't fix bad content with good markup.

I think it is worth spending some effort on trying to beat the existing format into submission, promoting things that seem to deserve it and demoting those that seem to deserve that. At some point, we'll probably reach a point of diminishing returns, either because we all agree we've done as well as we can, or because we can't agree on what else to do, and maybe at that point the only way to improve further is with better web design and/or a different documentation toolchain. But I think it's fairly clear that we're not at that point now.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
> On 18 Mar 2024, at 22:40, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > On Mon, 2024-03-18 at 10:11 -0400, Robert Haas wrote: >> For reasons that I don't understand, all chapters except >> for those in "VI. Reference" are numbered, but the chapters in that >> section have Roman numerals instead. > > That last fact is very odd indeed and could be easily fixed. It's actually not very odd, the reference section is using <reference> elements and we had missed the arabic numerals setting on those. The attached fixes that for me. That being said, we've had roman numerals for the reference section since forever (all the way down to the 7.2 docs online has it) so maybe it was intentional? Or no one managed to see it until Robert did, I've certainly never noticed it until now. -- Daniel Gustafsson
Attachment
Daniel Gustafsson <daniel@yesql.se> writes:
> It's actually not very odd, the reference section is using <reference> elements
> and we had missed the arabic numerals setting on those. The attached fixes
> that for me. That being said, we've had roman numerals for the reference
> section since forever (all the way down to the 7.2 docs online has it) so maybe
> it was intentional?

I'm quite sure it *was* intentional. Maybe it was a bad idea, but it's not that way simply because nobody thought about it.

regards, tom lane
On Mon, Mar 18, 2024 at 10:12 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I was looking at the documentation index this morning[1], and I can't
> help feeling like there are some parts of it that are over-emphasized
> and some parts that are under-emphasized. I'm not sure what we can do
> about this exactly, but I thought it worth writing an email and seeing
> what other people think.
> The two sections of the documentation that seem really
> under-emphasized to me are the GUC documentation and the SQL
> reference. The GUC documentation is all buried under "20. Server
> Configuration" and the SQL command reference is under "I. SQL
> commands". For reasons that I don't understand, all chapters except
> for those in "VI. Reference" are numbered, but the chapters in that
> section have Roman numerals instead.
> I don't know what other people's experience is, but for me, wanting to
> know what a command does or what a setting does is extremely common.
> Therefore, I think these chapters are disproportionately important and
> should be emphasized more. In the case of the GUC reference, one idea
> I have is to split up "III. Server Administration". My proposal is
> that we divide it into three sections. The first would be called "III.
> Server Installation" and would cover chapters 16 (installation from
> binaries) through 19 (server setup and operation). The second would be
> called "IV. Server Configuration" -- so every section that's currently
> a subsection of "server configuration" would become a top-level
> chapter. The third division would be "V. Server Administration," and
> would cover the current chapters 21-33. This is probably far from
> perfect, but it seems like a relatively simple change and better than
> what we have now.
> I don't know what to do about "I. SQL commands". It's obviously
> impractical to promote that to a top-level section, because it's got a
> zillion sub-pages which I don't think we want in the top-level
> documentation index. But having it as one of several unnumbered
> chapters interposed between 51 and 52 doesn't seem great either.
> The stuff that I think is over-emphasized is as follows: (a) chapters
> 1-3, the tutorial; (b) chapters 4-6, which are essentially a
> continuation of the tutorial, and not at all similar to chapters 8-11
> which are chock-full of detailed technical information; (c) chapters
> 43-46, one per procedural language; perhaps these could just be
> demoted to sub-sections of chapter 42 on procedural languages; (d)
> chapters 47 (server programming interface), 50 (replication progress
> tracking), and 51 (archive modules), all of which are important to
> document but none of which seem important enough to put them in the
> top-level documentation index; and (e) large parts of section "VII.
> Internals," which again contain tons of stuff of very marginal
> interest. The first ~4 chapters of the internals section seem like
> they might be mainstream enough to justify the level of prominence
> that we give them, but the rest has got to be of interest to a tiny
> minority of readers.
> I think it might be possible to consolidate the internals section by
> grouping a bunch of existing entries together by category. Basically,
> after the first few chapters, you've got stuff that is of interest to
> C programmers writing core or extension code; and you've got
> explainers on things like GEQO and index op-classes and support
> functions which might be of interest even to non-programmers. I think
> for example that we don't need separate top-level chapters on writing
> procedural language handlers, FDWs, tablesample methods, custom scan
> providers, table access methods, index access methods, and WAL
> resource managers. Some or all of those could be grouped under a
> single chapter, perhaps, e.g. Using PostgreSQL Extensibility
> Interfaces.
> Thoughts? I realize that this topic is HIGHLY prone to ENDLESS
> bikeshedding, and it's inevitable that not everybody is going to
> agree. But I hope we can agree that it's completely silly that it's
> vastly easier to find the documentation about the backup manifest
> format than it is to find the documentation on CREATE TABLE or
> shared_buffers, and if we can agree on that, then perhaps we can agree
> on some way to make things better.
+many for improving the index.
My own pet docs peeve is a purely editorial one: func.sgml is a 30k line beast, and I think there's a good case for splitting out at least the larger chunks of it.
cheers
andrew
On Mon, Mar 18, 2024 at 5:40 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> I also disagree that chapters 4 to 6 are a continuation of the tutorial.
> Or at least, they shouldn't be.
> When I am looking for a documentation reference on something like
> security considerations of SECURITY DEFINER functions, my first
> impulse is to look in chapter 5 (Data Definition) or in chapter 38
> (Extending SQL), and I am surprised to find it discussed in the
> SQL reference of CREATE FUNCTION.

I looked at this a bit more closely. There's actually a lot of detailed technical information in chapters 4 and 5, but chapter 6 is extremely short and mostly recapitulates chapter 2.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
On Tue, Mar 19, 2024 at 5:39 PM Andrew Dunstan <andrew@dunslane.net> wrote:
> +many for improving the index.

Here's a series of four patches. Taken together, they cut down the number of numbered chapters from 76 to 68. I think we could easily save that much again if I wrote a few more patches along similar lines, but I'm posting these first to see what people think.

0001 removes the "Installation from Binaries" chapter. The whole thing is four sentences. I moved the most important information into the "Installation from Source Code" chapter and retitled it "Installation".

0002 removes the "Monitoring Disk Usage" chapter by folding it into the immediately-preceding "Monitoring Database Activity" chapter. I kind of feel like the "Monitoring Disk Usage" chapter might be in need of a bigger rewrite or just outright removal, but there's surely not enough content here to justify making it a top-level chapter.

0003 merges all of the "Internals" chapters whose names are the names of built-in index access methods (Btree, Gin, etc.) into a single chapter called "Built-In Index Access Methods". All of these chapters have a very similar structure and none of them are very long, so it makes a lot of sense, at least in my mind, to consolidate them into one.

0004 merges the "Generic WAL Records" and "Custom WAL Resource Managers" chapters together, creating a new chapter called "Write Ahead Logging for Extensions".

Overall, I think this achieves a minor but pleasant level of de-cluttering of the index. It's going to take a lot more than one morning's work to produce a major improvement, but at least this is something.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Attachment
On Wed, Mar 20, 2024 at 12:43:08PM -0400, Robert Haas wrote:
> Overall, I think this achieves a minor but pleasant level of
> de-cluttering of the index. It's going to take a lot more than one
> morning's work to produce a major improvement, but at least this is
> something.

I think this kind of doc structure review is long overdue.

-- 
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.
On Wed, Mar 20, 2024 at 1:35 PM Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Mar 20, 2024 at 12:43:08PM -0400, Robert Haas wrote:
> > Overall, I think this achieves a minor but pleasant level of
> > de-cluttering of the index. It's going to take a lot more than one
> > morning's work to produce a major improvement, but at least this is
> > something.
>
> I think this kind of doc structure review is long overdue.

Thanks, Bruce!

-- 
Robert Haas
EDB: http://www.enterprisedb.com
On 2024-Mar-20, Robert Haas wrote:
> 0003 merges all of the "Internals" chapters whose names are the names
> of built-in index access methods (Btree, Gin, etc.) into a single
> chapter called "Built-In Index Access Methods". All of these chapters
> have a very similar structure and none of them are very long, so it
> makes a lot of sense, at least in my mind, to consolidate them into
> one.

I think you can achieve this with a much smaller patch that just changes the outer tag in each file so that each file is a <sect1>, then create a single file that includes all of these plus an additional outer tag for the <chapter> (or maybe just add the <chapter> in postgres.sgml). This has the advantage that each AM continues to be a separate single file, and you still have your desired structure.

-- 
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
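The "change the outer tag" step suggested above can be sketched mechanically. This is only a minimal illustration in Python, not anything taken from a posted patch: the helper name demote_chapter is hypothetical, and it assumes that sections nested inside the old chapter also have to move down one level (sect1 becomes sect2, and so on) and that no file already uses sect5.

```python
import re


def demote_chapter(sgml: str) -> str:
    """Turn a <chapter> file into a <sect1> fragment (hypothetical helper).

    Assumption: nested sections must move down one level as well, and the
    input contains no <sect5>, which could not be demoted any further.
    """
    # Demote the deepest levels first so that no tag is renamed twice.
    for level in (4, 3, 2, 1):
        sgml = re.sub(rf"(</?)sect{level}\b", rf"\g<1>sect{level + 1}", sgml)
    # Finally rename the outer element itself; attributes such as id are kept.
    return re.sub(r"(</?)chapter\b", r"\g<1>sect1", sgml)


if __name__ == "__main__":
    sample = (
        '<chapter id="btree">\n'
        "<title>B-Tree Indexes</title>\n"
        '<sect1 id="btree-intro">\n'
        "<title>Introduction</title>\n"
        "</sect1>\n"
        "</chapter>"
    )
    print(demote_chapter(sample))
```

A regular-expression rewrite like this ignores comments and CDATA, so the result would still need to be reviewed and the documentation rebuilt before trusting it.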
On Wed, Mar 20, 2024 at 5:05 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> I think you can achieve this with a much smaller patch that just changes
> the outer tag in each file so that each file is a <sect1>, then create a
> single file that includes all of these plus an additional outer tag for
> the <chapter> (or maybe just add the <chapter> in postgres.sgml). This
> has the advantage that each AM continues to be a separate single file,
> and you still have your desired structure.

Right, that could also be done, and not just for 0003. I just wasn't sure that was the right approach. It would mean that the division of the SGML into files continues to reflect the original chapter divisions rather than the current ones forever. In the short run that's less churn, less back-patching pain, etc.; but in the long term it means you've got relics of a structure that doesn't exist any more sticking around forever.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Mar 20, 2024 at 5:05 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>> I think you can achieve this with a much smaller patch that just changes
>> the outer tag in each file so that each file is a <sect1>, then create a
>> single file that includes all of these plus an additional outer tag for
>> the <chapter> (or maybe just add the <chapter> in postgres.sgml). This
>> has the advantage that each AM continues to be a separate single file,
>> and you still have your desired structure.

> Right, that could also be done, and not just for 0003. I just wasn't
> sure that was the right approach. It would mean that the division of
> the SGML into files continues to reflect the original chapter
> divisions rather than the current ones forever. In the short run
> that's less churn, less back-patching pain, etc.; but in the long term
> it means you've got relics of a structure that doesn't exist any more
> sticking around forever.

I'd say that a separate file per AM is a good thing regardless. Elsewhere in this same thread are grumblings about how big func.sgml is; why would you think it good to start down that same path for the AM documentation?

regards, tom lane
On Wed, Mar 20, 2024 at 5:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'd say that a separate file per AM is a good thing regardless. > Elsewhere in this same thread are grumblings about how big func.sgml > is; why would you think it good to start down that same path for the > AM documentation? Well, I suppose I thought it was a good idea because (1) we don't seem to have any existing precedent for file-per-sect1 rather than file-per-chapter and (2) all of the per-AM files combined are less than 20% of the size of func.sgml. But, OK, if you want to establish a new paradigm here, sure. I see two ways to do it. We can either put the <chapter> tag directly in postgres.sgml, or I can still create a new indextypes.sgml and put &btree; etc. inside of it. Which way do you prefer? -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > Well, I suppose I thought it was a good idea because (1) we don't seem > to have any existing precedent for file-per-sect1 rather than > file-per-chapter and (2) all of the per-AM files combined are less > than 20% of the size of func.sgml. We have done (1) in places, eg. json.sgml, array.sgml, rangetypes.sgml, rowtypes.sgml, and the bulk of extend.sgml is split out into xaggr, xfunc, xindex, xoper, xtypes. I'd be the first to concede it's a bit haphazard, but it's not like there's no precedent. As for (2), func.sgml likely should have been split years ago. > But, OK, if you want to establish a new paradigm here, sure. I see two > ways to do it. We can either put the <chapter> tag directly in > postgres.sgml, or I can still create a new indextypes.sgml and put > &btree; etc. inside of it. Which way do you prefer? I'd follow the extend.sgml precedent: have a file corresponding to the chapter and containing any top-level text we need, then that includes a file per sect1. regards, tom lane
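For readers following along, the layout Tom describes (one file for the chapter, which then includes one file per sect1) might look roughly like the sketch below. This is only an illustration under stated assumptions: indextypes.sgml and the &btree; entity are the names floated upthread, while the id values, the second entity, and the placeholder text are hypothetical and not taken from any posted patch.

```sgml
<!-- filelist.sgml: entity declarations (names are illustrative) -->
<!ENTITY indextypes SYSTEM "indextypes.sgml">
<!ENTITY btree      SYSTEM "btree.sgml">
<!ENTITY gist       SYSTEM "gist.sgml">

<!-- indextypes.sgml: the chapter wrapper, holding any top-level text -->
<chapter id="indextypes">
 <title>Built-In Index Access Methods</title>

 <para>
  Short introductory paragraph for the chapter (placeholder text).
 </para>

 &btree;
 &gist;
</chapter>

<!-- btree.sgml: stays a separate file; only the outer tag changes
     from <chapter> to <sect1>, so inner sections drop one level -->
<sect1 id="btree">
 <title>B-Tree Indexes</title>
 <!-- existing content, with former sect1 elements demoted to sect2 -->
</sect1>
```

With this arrangement postgres.sgml would reference only &indextypes;, and each AM keeps its own file.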
On Thu, Mar 21, 2024 at 9:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'd follow the extend.sgml precedent: have a file corresponding to the > chapter and containing any top-level text we need, then that includes > a file per sect1. OK, here's a new patch set. I've revised 0003 and 0004 to use this approach, and I've added a new 0005 that does essentially the same thing for the PL chapters. 0001 and 0002 are changed. Should 0002 use the include-an-entity approach as well? -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
- v2-0004-docs-Consolidate-into-new-WAL-for-Extensions-chap.patch
- v2-0002-docs-Demote-Monitoring-Disk-Usage-from-chapter-to.patch
- v2-0003-docs-Merge-separate-chapters-on-built-in-index-AM.patch
- v2-0001-docs-Remove-the-Installation-from-Binaries-chapte.patch
- v2-0005-docs-Merge-all-procedural-language-documentation-.patch
On Thu, Mar 21, 2024 at 10:31 AM Robert Haas <robertmhaas@gmail.com> wrote: > 0001 and 0002 are changed. Should 0002 use the include-an-entity > approach as well? Woops. I meant to say that 0001 and 0002 are *unchanged*. -- Robert Haas EDB: http://www.enterprisedb.com
On 2024-Mar-21, Robert Haas wrote:
> On Thu, Mar 21, 2024 at 9:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I'd follow the extend.sgml precedent: have a file corresponding to the
> > chapter and containing any top-level text we need, then that includes
> > a file per sect1.
>
> OK, here's a new patch set. I've revised 0003 and 0004 to use this
> approach,
Great, thanks. Looking at the index in the PDF after (only) 0003, we now have this structure
62. Table Access Method Interface Definition
63. Index Access Method Interface Definition
    63.1. Basic API Structure for Indexes
    63.2. Index Access Method Functions
    63.3. Index Scanning
    63.4. Index Locking Considerations
    63.5. Index Uniqueness Checks
    63.6. Index Cost Estimation Functions
64. Generic WAL Records
65. Custom WAL Resource Managers
66. Built-in Index Access Methods
which is a bit odd: why are the two WAL chapters in the middle of the chapters 62 and 63 talking about AMs? Maybe put 66 right after 63 instead. Also, is it really better to have 62/63 first and 66 later? It sounds to me like 66 is more user-oriented and the other two are developer-oriented, so I'm inclined to suggest putting them the other way around, but I'm not really sure about this.
(Also, starting chapter 66 straight with 66.1 BTree without any intro text looks a bit odd; maybe one short introductory paragraph is sufficient?) > and I've added a new 0005 that does essentially the same > thing for the PL chapters. I was looking at the PL chapters earlier today too, wondering whether this would be valuable; but I worry that there are too many sub-sub-sections there, so it could end up being a bit messy. I didn't look at the resulting output though. > 0001 and 0002 are [un]changed. Should 0002 use the include-an-entity > approach as well? Shrug, I wouldn't, doesn't look worth it. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "No es bueno caminar con un hombre muerto"
On Thu, Mar 21, 2024 at 12:43 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > which is a bit odd: why are the two WAL chapters in the middle of the > chapters 62 and 63 talking about AMs? Maybe put 66 right after 63 > instead. Also, is it really better to have 62/63 first and 66 > later? It sounds to me like 66 is more user-oriented and the other two > are developer-oriented, so I'm inclined to suggest putting them the > other way around, but I'm not really sure about this. (Also, starting > chapter 66 straight with 66.1 BTree without any intro text looks a bit > odd; maybe one short introductory paragraph is sufficient?) I had similar thoughts. I think that we should consider some changes to the chapter ordering, but I didn't want to try to change too many things all at once, since somebody only has to hate one thing about the patch to sink the whole thing. But since you brought it up, what I've been thinking about is that the whole division into parts might need to be rethought a bit. I feel like "VII. Internals" is a mix of about four different kinds of content. First, the biggest portion of it is information about developing certain kinds of C extensions -- all the "Writing a Whatever" chapters, the "Whatever Access Method Interface Definition" chapters, "Generic WAL Records", "Custom WAL Resource Managers", and all the index-related chapters. Second, we've got some information that I think is mainly of interest to people developing PostgreSQL itself, namely, "PostgreSQL Coding Conventions", "Native Language Support", and "System Catalog Declarations and Initial Contents". You *might* care about these if you're developing extensions, or even if you're not a developer at all, but then again you might not. Third, we've got some reference material, namely "System Catalogs", "System Views", and perhaps "Frontend/Backend Protocol". 
I distinguish these from the previous two categories because I think you could care about this stuff as a random user, or a developer of products that interoperate with PostgreSQL but don't link with it or share any common code. Finally, there's just a bunch of random bits and bobs that we've decided to document here for one reason or another, either because somebody else did a bunch of the work, like "Overview of PostgreSQL Internals", or because some developer did something and someone said "hey, that should be documented!", like "Backup Manifest Format." So my first thought is to pull out the stuff that's mainly for PostgreSQL core developers and move it to an appendix. I propose we create an appendix called "Developer Guide" and that it absorb the existing appendix I, "The Source Code Repository", possibly all or part of appendix J, "Documentation", and the chapters from "VII. Internals" that are mostly of developer interest. I think that possibly some of what's in "J. Documentation" should actually be moved into the "Installation" chapter where we talk about building the source code, because it doesn't make much sense to document the build tool chain in one part of the documentation and the documentation toolchain someplace else entirely, but "J.6. Style Guide" is developer information, not build instructions. My second thought is that the stuff from "VII. Internals" that I categorized as reference material should move into section "VI. Reference". I think we should also consider moving appendix F, "Additional Supplied Modules and Extensions," and appendix G, "Additional Supplied Programs" to the reference section. However, prior to doing that, I think that appendix G needs some cleanup or possibly we should just find a way to remove it outright. We're shipping an appendix G with two major subsections, one of which is completely empty and has been since v14, and the other of which contains only two things. 
I think we should just remove the empty sub-section entirely. I'm not sure what to do about the one with only 2 things in it (vacuumlo and oid2name). Would it be a bad idea to just merge those bits into the client applications reference section? My third thought is about what to do with the material in "VII. Internals" that is about developing specific kinds of extensions, like, say, "Writing a Foreign Data Wrapper." If you look at "V. Server Programming", you see that we actually have some very similar sections there, like chapter 47, "Background Worker Processes" and chapter 50, "Archive Modules". I think it's not very clear in the current structure whether topics that are relevant for extension developers should go under "Server Programming" or under "Internals", and it looks to me like different people have just done different things and it's all a bit haphazard. One idea is to decide that the correct answer is "Server Programming" and move all of the internals chapters that fall into this category over to there. I don't think this is the right answer, because that section also contains information about a bunch of stuff that's strictly SQL-level, like rules and triggers. So what I think we should do is create either [A] a new top-level part, just before or just after what's currently called "VI. Reference" or [B] a new appendix or [C] a new "Reference" section, that is specifically for documentation of server APIs intended for extension use. And then all the chapters under "V. Server Programming" or "VII. Internals" that are documenting APIs would get moved there. 
If we adopted all of the patches that I proposed in my previous email and all of the suggestions that I just dropped in the preceding wall of text, then the internals section would be left with only these chapters: - Overview of PostgreSQL Internals - Genetic Query Optimizer - Database Physical Storage - Transaction Processing - How the Planner Uses Statistics - Backup Manifest Format A lot of those chapters are pretty dated and maybe not that useful in 2024, but this email is already long enough and full of sufficiently-aggressive proposals that I'm not inclined to opine too much further on what we might want to do if and when we've done everything I just proposed. For now, suffice it to say that I think we might choose to either rewrite and expand some of these to make them more useful, or demote some of them to some less prominent place in the documentation, or just delete some of them entirely; but we can figure that out if and when we get there. > I was looking at the PL chapters earlier today too, wondering whether > this would be valuable; but I worry that there are too many > sub-sub-sections there, so it could end up being a bit messy. I didn't > look at the resulting output though. That thought occurred to me as well. I certainly think that if we perform the sort of aggressive purging of the top-level index for which I'm advocating, there are going to be some people who are grumpy that the stuff they're trying to find isn't where it used to be, or who legitimately had trouble finding the content that they want. It seems to me that if you're looking for the documentation on one of the individual procedural languages and you don't see it, you'll try clicking on "Procedural Languages" and then you'll find that it's now under there. Now, what's maybe a bit unfortunate is that chapter indexes only show two levels of section headings, and the PL/pgsql chapter in particular has a lot of <sect2> items. 
If those get demoted to <sect3> as I am proposing, they won't show up in the chapter index any more. I do think there's a possibility that this could be a problem for someone. On the other hand, the table of contents for https://www.postgresql.org/docs/devel/plpgsql.html is so long right now that it doesn't fit on the page, so maybe losing a level of subsections won't be so bad. Alternately, maybe we could revise the structure of the section a bit to ameliorate the problem. It seems to me that most of the first-level section headers are actually pretty clear about what you're likely to find underneath. If you're looking for WHILE and you see a section called "Control Structures", it seems like your chances of guessing that WHILE will be underneath that section are pretty good. The major exception that I see is 45.2, "Basic Statements," which isn't very clear about what might be covered there. But what if we split that apart into separate sections called "Assignment", "Executing SQL", and "Doing Nothing at All"? And maybe we'd even pull "Returning" out of "Control Structures" as well. I think that would be clear enough for people to find what they need without the extra level of headers. (For the sake of completeness, let me note that PL/python and PL/perl have a few <sect2> headings as well, but I don't think it would create a problem for users if all of those got changed to <sect3>. PL/Tcl has no <sect2> headings.) I'm not sure if this kind of rearrangement is actually necessary or not; but my point here is that if we think that people will have problems or we find out that they actually did have problems, we can look at doing this kind of stuff to compensate. What I don't think we should do is decide that the only workable solution is to keep having so many separate chapters at the top level. We're way, way beyond the point where you can easily find anything on that page, and trying to emphasize everything just ends up emphasizing nothing. 
We need to push in a direction where every chapter and every appendix is expected to have a large amount of content under it, so that the top-level index becomes a way of finding the kind of content you want (SQL reference pages, extension APIs, built-in SQL-callable functions, whatever) and then you use that page to find the specific content that you want within that category. -- Robert Haas EDB: http://www.enterprisedb.com
On Wed, Mar 20, 2024 at 9:43 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Mar 19, 2024 at 5:39 PM Andrew Dunstan <andrew@dunslane.net> wrote:
> > +many for improving the index.
> Here's a series of four patches.
I reviewed the most recent set of 5 patches.
> Taken together, they cut down the
> number of numbered chapters from 76 to 68. I think we could easily
> save that much again if I wrote a few more patches along similar
> lines, but I'm posting these first to see what people think.
> 0001 removes the "Installation from Binaries" chapter. The whole thing
> is four sentences. I moved the most important information into the
> "Installation from Source Code" chapter and retitled it
> "Installation".
Makes sense
> 0002 removes the "Monitoring Disk Usage" chapter by folding it into
> the immediately-preceding "Monitoring Database Activity" chapter. I
> kind of feel like the "Monitoring Disk Usage" chapter might be in need
> of a bigger rewrite or just outright removal, but there's surely not
> enough content here to justify making it a top-level chapter.
Just going to note that the section on the cumulative statistics views being a single page is still a strongly bothersome issue here. Though the quick fix actually requires upgrading the section to chapter status...
Maybe we can stub out that section in the "Monitoring Database Activity" chapter and move that entire section after "System Views" in the Internals part?
I agree with subordinating Monitoring Disk Usage.
> 0003 merges all of the "Internals" chapters whose names are the names
> of built-in index access methods (Btree, Gin, etc.) into a single
> chapter called "Built-In Index Access Methods". All of these chapters
> have a very similar structure and none of them are very long, so it
> makes a lot of sense, at least in my mind, to consolidate them into
> one.
One of the more impactful and wanted improvements, IMO.
> 0004 merges the "Generic WAL Records" and "Custom WAL Resource
> Managers" chapter together, creating a new chapter called "Write Ahead
> Logging for Extensions".
The positioning of this and the preceding Built-in Index Access Methods chapter seem like they should be switched.
If this sticks we should add an introductory paragraph for the chapter.
> and I've added a new 0005 that does essentially the same
> thing for the PL chapters.
The following page needs to be reworded to take the new structure into account:
Not having pl/pgsql appear on the main ToC seems like a loss but the others make sense and a special exception for it probably isn't warranted.
Maybe "pl/pgsql and Other Procedural Languages" as the title?
David J.
On Thu, Mar 21, 2024 at 11:30 AM Robert Haas <robertmhaas@gmail.com> wrote:
> My second thought is that the stuff from "VII. Internals" that I
> categorized as reference material should move into section "VI.
> Reference". I think we should also consider moving appendix F,
> "Additional Supplied Modules and Extensions," and appendix G,
> "Additional Supplied Programs" to the reference section.
For "VI. Reference" I propose the following Chapters:
SQL Commands
PL/pgSQL
Cumulative Statistics Views
System Views
System Catalogs
Client Applications
Server Applications
Modules and Extensions
-- Remove Appendix G (Programs) altogether and just note for the two that are listed that they are in contrib as opposed to core.
-- The PostgreSQL qualifier doesn't seem helpful and once you add the additional chapters its unusual presence stands out even more.
-- PL/pgSQL gets its own reference chapter since we wrote it. Stuff like Perl and Python have entire books that the user can consult as reference material for those languages.
David J.
On 19.03.24 14:50, Tom Lane wrote: > Daniel Gustafsson <daniel@yesql.se> writes: >> It's actually not very odd, the reference section is using <reference> elements >> and we had missed the arabic numerals setting on those. The attached fixes >> that for me. That being said, we've had roman numerals for the reference >> section since forever (all the way down to the 7.2 docs online has it) so maybe >> it was intentional? > > I'm quite sure it *was* intentional. Maybe it was a bad idea, but > it's not that way simply because nobody thought about it. Looks to me it was just that way because it's the default setting of the stylesheets.
On 20.03.24 17:43, Robert Haas wrote: > 0001 removes the "Installation from Binaries" chapter. The whole thing > is four sentences. I moved the most important information into the > "Installation from Source Code" chapter and retitled it > "Installation". But this separation was explicitly added a few years ago, because most people just want to read about the binaries.
On 21.03.24 15:31, Robert Haas wrote: > On Thu, Mar 21, 2024 at 9:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'd follow the extend.sgml precedent: have a file corresponding to the >> chapter and containing any top-level text we need, then that includes >> a file per sect1. > > OK, here's a new patch set. I've revised 0003 and 0004 to use this > approach, and I've added a new 0005 that does essentially the same > thing for the PL chapters. I'm highly against this. If I want to read about PL/Python, why should I have to wade through PL/Perl and PL/Tcl? I think, abstractly, in a book, PL/Python should be a chapter of its own. Just like GiST should be a chapter of its own. Because they are self-contained topics.
> On 22 Mar 2024, at 00:33, Peter Eisentraut <peter@eisentraut.org> wrote: > > On 19.03.24 14:50, Tom Lane wrote: >> Daniel Gustafsson <daniel@yesql.se> writes: >>> It's actually not very odd, the reference section is using <reference> elements >>> and we had missed the arabic numerals setting on those. The attached fixes >>> that for me. That being said, we've had roman numerals for the reference >>> section since forever (all the way down to the 7.2 docs online has it) so maybe >>> it was intentional? >> I'm quite sure it *was* intentional. Maybe it was a bad idea, but >> it's not that way simply because nobody thought about it. > > Looks to me it was just that way because it's the default setting of the stylesheets. That's quite possible. I don't have strong opinions on whether we should change, or keep it the way it is. -- Daniel Gustafsson
On Fri, Mar 22, 2024 at 01:12:30AM +0100, Daniel Gustafsson wrote: > > On 22 Mar 2024, at 00:33, Peter Eisentraut <peter@eisentraut.org> wrote: > > > > On 19.03.24 14:50, Tom Lane wrote: > >> Daniel Gustafsson <daniel@yesql.se> writes: > >>> It's actually not very odd, the reference section is using <reference> elements > >>> and we had missed the arabic numerals setting on those. The attached fixes > >>> that for me. That being said, we've had roman numerals for the reference > >>> section since forever (all the way down to the 7.2 docs online has it) so maybe > >>> it was intentional? > >> I'm quite sure it *was* intentional. Maybe it was a bad idea, but > >> it's not that way simply because nobody thought about it. > > > > Looks to me it was just that way because it's the default setting of the stylesheets. > > That's quite possible. I don't have strong opinions on whether we should > change, or keep it the way it is. If we can't justify why it should be different, it should be like the surrounding sections. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
On Thu, Mar 21, 2024 at 6:32 PM David G. Johnston <david.g.johnston@gmail.com> wrote: > Just going to note that the section on the cumulative statistics views being a single page is still a strongly bothersome issue here. Though the quick fix actually requires upgrading the section to chapter status... Yeah, I've been bothered by this in the past, too. I'm not very keen to start promoting things to the top-level, though. I think we need a more thoughtful fix than that. One question I have is why all of these views are documented here rather than in chapter 53, "System Views," because surely they are system views. I feel like if our documentation index weren't a mile long and if you could easily find the entry for "System Views," that's where you would naturally look for these details. I don't think it's natural for a user to expect that most of the system views are going to be documented in section VII, chapter 53 but one particular kind is going to be documented in section III, chapter 27, under a chapter title that gives no hint that it will document any views. > Maybe "pl/pgsql and Other Procedural Languages" as the title? I guess I have a hard time seeing this as an improvement. It would help someone who knows that plpgsql exists but doesn't know that it falls into the general category called procedural languages, but I suspect that's not a very common confusion. I think it's better to keep the chapter titles short and to the point. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, Mar 21, 2024 at 7:37 PM Peter Eisentraut <peter@eisentraut.org> wrote: > On 20.03.24 17:43, Robert Haas wrote: > > 0001 removes the "Installation from Binaries" chapter. The whole thing > > is four sentences. I moved the most important information into the > > "Installation from Source Code" chapter and retitled it > > "Installation". > > But this separation was explicitly added a few years ago, because most > people just want to read about the binaries. I really doubt that this is true. I've been installing software on UNIX-like operating systems for more than 30 years now, and I don't think there's been a single time when I have ever consulted the documentation for a software package to find the download location for that package. When I first started out, everything was ftp rather than www, so you went to ftp.whatever.{com,org,net,gov,edu} and tried to download the distribution bundle, and then you untarred it and ran configure and make. Then you read the README or the documentation or whatever afterward. These days, I think what people do is either (a) use their package manager to install PostgreSQL and then come to the documentation afterward to find out how to use it or (b) do a search for "PostgreSQL download" and click on whatever comes up. I'm not saying there's never been a user who made use of this section of the documentation to find the download location, but surely the normal thing to do if you come to www.postgresql.org and you want to download the software is to click "Download" on the nav bar, not "Documentation," then a specific version, then chapter 16, then the exact same download link that's already there on the nav bar. I do agree that it is very questionable whether "Installation from Source Code" is of sufficient interest to ordinary users to justify including it in "III. Server Administration." Most people, probably including many extension developers, are only going to install the binary packages. 
But the solution to that isn't to have a four-sentence chapter telling me about a download location that I likely found long before I looked at the documentation, and that I can certainly find very easily without needing the documentation. Rather, what we should do if we think that installing from source code is of marginal interest is move it to an appendix. As I said to Alvaro yesterday, I think that a "Developer Guide" appendix could be a good place to house a number of things that currently have top-level chapters but don't really need them because they're only of interest to a small minority of users. This might be another thing that could go there. -- Robert Haas EDB: http://www.enterprisedb.com
On 22.03.24 13:50, Robert Haas wrote: > On Thu, Mar 21, 2024 at 7:37 PM Peter Eisentraut <peter@eisentraut.org> wrote: >> On 20.03.24 17:43, Robert Haas wrote: >>> 0001 removes the "Installation from Binaries" chapter. The whole thing >>> is four sentences. I moved the most important information into the >>> "Installation from Source Code" chapter and retitled it >>> "Installation". >> >> But this separation was explicitly added a few years ago, because most >> people just want to read about the binaries. > > I really doubt that this is true. Here is the thread: https://www.postgresql.org/message-id/flat/CABUevExRCf8waYOsrCO-QxQL50XGapMf5dnWScOXj7X%3DMXW--g%40mail.gmail.com
On Thu, Mar 21, 2024 at 7:40 PM Peter Eisentraut <peter@eisentraut.org> wrote: > I'm highly against this. If I want to read about PL/Python, why should > I have to wade through PL/Perl and PL/Tcl? > > I think, abstractly, in a book, PL/Python should be a chapter of its > own. Just like GiST should be a chapter of its own. Because they are > self-contained topics. On the other hand, in a book, chapters tend to be of relatively uniform length. People don't usually write a book with some chapters that are 100+ pages long, and others that are a single page, or even just a couple of sentences. I mean, I'm sure it's been done, but it's not a normal way to write a book. And I don't believe that if someone were writing a physical book about PostgreSQL from scratch, they'd ever end up with a top-level chapter that looks anything like our GiST chapter. All of the index AM chapters are quite obviously clones of each other, and they're all quite short. Surely you'd make them sections within a chapter, not entire chapters. I do agree that PL/pgsql is more arguable. I can imagine somebody writing a book about PostgreSQL and choosing to make that topic into a whole chapter. However, I also think that people don't make decisions about what should be a chapter in a vacuum. If you've got 100 people writing a book together, which is essentially what we actually do have, and each of those people makes decisions in isolation about what is worthy of being a chapter, then you end up with exactly the kind of mess that we now have. Some chapters are long and some are short. Some are well-written and some are poorly written. Some are updated regularly and others have hardly been touched in a decade. Books have editors to straighten out those kinds of inconsistencies so that there's some uniformity to the product as a whole. The problem with that, of course, is that it invites bike-shedding. 
As you say, every decision that is reflected in our documentation was made for some reason, and most of them will have been made by prominent, active committers. So discussions about how to improve things can easily bog down even when people agree on the overall goals, simply because few individual changes find consensus. I hope that doesn't happen here, because I think most people who have commented so far agree that there is a problem here and that we should try to fix it. Let's not let the perfect be the enemy of the good. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Mar 22, 2024 at 9:35 AM Peter Eisentraut <peter@eisentraut.org> wrote: > >> But this separation was explicitly added a few years ago, because most > >> people just want to read about the binaries. > > > > I really doubt that this is true. > > Here is the thread: > https://www.postgresql.org/message-id/flat/CABUevExRCf8waYOsrCO-QxQL50XGapMf5dnWScOXj7X%3DMXW--g%40mail.gmail.com Sorry. I didn't mean to dispute the point that the section was added a few years ago, nor the point that most people just want to read about the binaries. I am confident that both of those things are true. What I do want to dispute is that having a four-sentence chapter in the documentation index that tells people something they can find much more easily without using the documentation at all is a good plan. I agree with the concern that Magnus expressed on the thread, i.e.: > It's kind of strange that if you start your PostgreSQL journey by reading our instructions, you get nothing useful about installing PostgreSQL from binary packages other than "go ask somebody else about it". But I don't agree that this was the right way to address that problem. I think it would have been better to just add the download link to the existing installation chapter. That's actually what we had in chapter 18, "Installation from Source Code on Windows", since removed. But for some reason we decided that on non-Windows platforms, it needed a whole new chapter rather than an extra sentence in the existing one. I think that's massively overkill. Alternately, I think it would be reasonable to address the concern by just moving all the stuff about building from source code to an appendix, and assume people can figure out how to download the software without us needing to say anything in the documentation at all. What was weird about the state before that patch, IMHO, was that we both talked about building from source code and didn't talk about binary packages. 
That can be addressed either by adding a mention of binary packages, or by deemphasizing the idea of installing from source code. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Mar 22, 2024 at 7:10 AM Robert Haas <robertmhaas@gmail.com> wrote:
> That's actually what we had in chapter
> 18, "Installation from Source Code on Windows", since removed. But for
> some reason we decided that on non-Windows platforms, it needed a
> whole new chapter rather than an extra sentence in the existing one. I
> think that's massively overkill.
I agree with the premise that we should have a single chapter, in the main documentation flow, named "Installation". It should cover the architectural overview and point people to where they can find the stuff they need to install PostgreSQL in the various ways available to them. I agree with moving the source installation material to the appendix. None of the sections under Installation would then actually detail how to install the software since that isn't something the project itself handles but has delegated to packagers for the vast majority of cases and the source install details are in the appendix for the one "supported" mechanism that most people do not use.
David J.
On Fri, Mar 22, 2024 at 11:50 AM David G. Johnston <david.g.johnston@gmail.com> wrote:
> On Fri, Mar 22, 2024 at 7:10 AM Robert Haas <robertmhaas@gmail.com> wrote:
>> That's actually what we had in chapter
>> 18, "Installation from Source Code on Windows", since removed. But for
>> some reason we decided that on non-Windows platforms, it needed a
>> whole new chapter rather than an extra sentence in the existing one. I
>> think that's massively overkill.
>
> I agree with the premise that we should have a single chapter, in the main documentation flow, named "Installation". It should cover the architectural overview and point people to where they can find the stuff they need to install PostgreSQL in the various ways available to them. I agree with moving the source installation material to the appendix. None of the sections under Installation would then actually detail how to install the software since that isn't something the project itself handles but has delegated to packagers for the vast majority of cases and the source install details are in the appendix for the one "supported" mechanism that most people do not use.
Hmm, that's not quite the same as my position. I'm fine with either moving the installation from source material to an appendix, or leaving it where it is. But I'm strongly against continuing to have a chapter with four sentences in it that says to use the same download link that is on the main navigation bar of every page on the postgresql.org web site. We're never going to get the chapter index down to a reasonable size if we insist on having chapters that have a totally de minimis amount of content.
So my feeling is that if we keep the installation from source material where it is, then we can make it also mention the download link, just as we used to do in the installation-on-windows chapter. But if we banish installation from source to the appendixes, then we shouldn't keep a whole chapter in the main documentation to tell people something that is anyway obvious. I don't really think that material needs to be there at all, but if we want to have it, surely we can find someplace to put it such that it doesn't require a whole chapter to say that and nothing else. It could, for example, go at the beginning of the "Server Setup and Operation" chapter; if that were the first chapter of section III, I think that would be natural enough.
I notice that you say that the "Installation" section should "cover the architectural overview and point people to where they can find the stuff they need to install PostgreSQL in the various ways available to them" so maybe you're not imagining a four-sentence chapter, either. But this project is going to be impossible unless we stick to limited goals. We can, and should, rewrite some sections of the documentation to be more useful; but if we try to do that as part of the same project that aims to tidy up the index, the chances of us getting stuck in an endless bikeshedding loop go from "high" to "certain". So I don't want to hypothesize the existence of an installation chapter that isn't any of the things we have today. Let's try to get the things we have into places that make sense, and then consider other improvements separately.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Mar 22, 2024, 09:32 Robert Haas <robertmhaas@gmail.com> wrote:
> I notice that you say that the "Installation" section should "cover
> the architectural overview and point people to where they can find the
> stuff they need to install PostgreSQL in the various ways available to
> them" so maybe you're not imagining a four-sentence chapter, either.
Fair point but I posit that new users are looking for a chapter named Installation in the documentation. At least the ones willing to read documentation. Having two of them isn't needed but having zero doesn't make sense either.
The current proposal does that so I'm ok as-is but it can be further improved by moving source install talk elsewhere and having the installation chapter redirect the reader there for details. I'm not concerned with how long or short the resultant installation chapter is.
David J.
On Fri, Mar 22, 2024 at 08:32:14AM -0400, Robert Haas wrote:
> On Thu, Mar 21, 2024 at 6:32 PM David G. Johnston
> <david.g.johnston@gmail.com> wrote:
> > Just going to note that the section on the cumulative statistics views being a single page is still a strongly bothersome issue here. Though the quick fix actually requires upgrading the section to chapter status...
>
> Yeah, I've been bothered by this in the past, too. I'm not very keen
> to start promoting things to the top-level, though. I think we need a
> more thoughtful fix than that.
>
> One question I have is why all of these views are documented here
> rather than in chapter 53, "System Views," because surely they are
> system views. I feel like if our documentation index weren't a mile
> long and if you could easily find the entry for "System Views," that's
> where you would naturally look for these details. I don't think it's
> natural for a user to expect that most of the system views are going
> to be documented in section VII, chapter 53 but one particular kind is
> going to be documented in section III, chapter 27, under a chapter
Well, until this commit in 2022, the system views were _under_ the system catalogs chapter:
    commit 64d364bb39c
    Author: Bruce Momjian <bruce@momjian.us>
    Date:   Thu Jul 14 16:07:12 2022 -0400
        doc: move system views section to its own chapter
        Previously it was inside the system catalogs chapter.
        Reported-by: Peter Smith
        Discussion: https://postgr.es/m/CAHut+PsMc18QP60D+L0hJBOXrLQT5m88yVaCDyxLq34gfPHsow@mail.gmail.com
        Backpatch-through: 15
The thread contains more discussion of the issue, and I think it still needs help:
    https://www.postgresql.org/message-id/flat/CAHut%2BPsMc18QP60D%2BL0hJBOXrLQT5m88yVaCDyxLq34gfPHsow%40mail.gmail.com
--
Bruce Momjian <bruce@momjian.us> https://momjian.us
EDB https://enterprisedb.com
Only you can decide what is important to you.
On Fri, Mar 22, 2024 at 1:35 PM Bruce Momjian <bruce@momjian.us> wrote:
> > One question I have is why all of these views are documented here
> > rather than in chapter 53, "System Views," because surely they are
> > system views. I feel like if our documentation index weren't a mile
> > long and if you could easily find the entry for "System Views," that's
> > where you would naturally look for these details. I don't think it's
> > natural for a user to expect that most of the system views are going
> > to be documented in section VII, chapter 53 but one particular kind is
> > going to be documented in section III, chapter 27, under a chapter
>
> Well, until this commit in 2022, the system views were _under_ the
> system catalogs chapter:
Even before that commit, the statistics collector views were documented in a completely separate part of the documentation from all of the other system views. I think that commit was a good idea, even though it made the top-level documentation index bigger, because in v14, the "System Catalogs" chapter looks like this:
    ...
    52.61. pg_ts_template
    52.62. pg_type
    52.63. pg_user_mapping
    52.64. System Views
    52.65. pg_available_extensions
    52.66. pg_available_extension_versions
    52.67. pg_backend_memory_contexts
    ...
If you were actually looking for the section called "System Views", you weren't likely to see it here unless you already knew it was there, because it was 64 items into a 97-item list. Having one of these two sections inside the other just doesn't work at all. We could have alternatively chosen to have one chapter with two <sect1> tags inside of it, but I think what you actually did was perfectly fine. IMHO, "System Views" is important enough (and big enough) that giving it its own chapter is perfectly reasonable.
But that all seems like a separate question from why we have the statistic collector views in a completely different part of the documentation from the rest of the system views. My guess is that it's just kind of a historical accident, but maybe there was some other logic to it.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Mar 22, 2024 at 02:19:29PM -0400, Robert Haas wrote: > If you were actually looking for the section called "System Views", > you weren't likely to see it here unless you already knew it was > there, because it was 64 items into a 97-item list. Having one of > these two sections inside the other just doesn't work at all. We could > have alternatively chosen to have one chapter with two <sect1> tags > inside of it, but I think what you actually did was perfectly fine. > IMHO, "System Views" is important enough (and big enough) that giving > it its own chapter is perfectly reasonable. > > But that all seems like a separate question from why we have the > statistic collector views in a completely different part of the > documentation from the rest of the system views. My guess is that it's > just kind of a historical accident, but maybe there was some other > logic to it. I assume statistics collector views are in "Monitoring Database Activity" because that is their purpose. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
On Fri, Mar 22, 2024 at 11:19 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Mar 22, 2024 at 1:35 PM Bruce Momjian <bruce@momjian.us> wrote:
> But that all seems like a separate question from why we have the
> statistic collector views in a completely different part of the
> documentation from the rest of the system views. My guess is that it's
> just kind of a historical accident, but maybe there was some other
> logic to it.
The details underpinning the cumulative statistics subsystem are definitely large enough to warrant their own subsection. Placing them in the monitoring chapter isn't wrong, and aside from a couple of views, those under System Views don't fit into what we've defined as monitoring. I don't have any desire to lump them under the generic system views, which itself could probably use a level of categorization, since the nature of pg_locks and pg_cursors is decidedly different from that of pg_indexes and pg_config. This all becomes more appealing to work on once we solve the problem of all sect2 entries being placed on a single page.
For a long while I struggled with this myself: I'd always look for pg_stat_activity under system views instead of monitoring. Amending my prior suggestion in light of this, I would suggest we move the Cumulative Statistics Views into Reference, but as its own chapter, not part of System Views, and change its name to "Monitoring Views" (going more generalized here feels like a win to me). I'd also move pg_locks, pg_cursors, pg_backend_memory_contexts, pg_prepared_*, pg_shmem_allocations, and pg_replication_* there. Those all have the same general monitoring nature, compared to the others, which basically provide details regarding schema and static or session configuration.
The original server admin monitoring section can go into detail regarding Cumulative Statistics versus other kinds of monitoring. We can use section ordering to fulfill logical grouping desires until we are able to make section3 entries appear on their own pages.
David J.
On Fri, Mar 22, 2024 at 2:59 PM Bruce Momjian <bruce@momjian.us> wrote: > I assume statistics collector views are in "Monitoring Database > Activity" because that is their purpose. Well, yes. :-) But the point is that all other statistics views are in a single section regardless of their purpose. We don't document pg_roles in the "Database Roles" chapter, for example. And on the flip side, pg_locks and pg_replication_origin_status are also for monitoring database activity, but they're in the "System Views" chapter anyway. The only system views that are in "Monitoring Database Activity" rather than "System Views" are the ones where the name starts with "pg_stat_". So the reason you state is why these views are under "Monitoring Database Activity" rather than a chapter chosen at random. But it doesn't really explain why they're separate from the other system views at all. That seems to be a pretty much random choice, AFAICT. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Mar 22, 2024 at 03:13:29PM -0400, Robert Haas wrote: > On Fri, Mar 22, 2024 at 2:59 PM Bruce Momjian <bruce@momjian.us> wrote: > > I assume statistics collector views are in "Monitoring Database > > Activity" because that is their purpose. > > Well, yes. :-) > > But the point is that all other statistics views are in a single > section regardless of their purpose. We don't document pg_roles in the > "Database Roles" chapter, for example. > > And on the flip side, pg_locks and pg_replication_origin_status are > also for monitoring database activity, but they're in the "System > Views" chapter anyway. The only system views that are in "Monitoring > Database Activity" rather than "System Views" are the ones where the > name starts with "pg_stat_". > > So the reason you state is why these views are under "Monitoring > Database Activity" rather than a chapter chosen at random. But it > doesn't really explain why they're separate from the other system > views at all. That seems to be a pretty much random choice, AFAICT. I agree and they should be with the other views. I was just explaining why, at the time, I didn't touch them. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
On Fri, Mar 22, 2024 at 3:17 PM Bruce Momjian <bruce@momjian.us> wrote: > I agree and they should be with the other views. I was just explaining > why, at the time, I didn't touch them. Ah, OK. That makes total sense. -- Robert Haas EDB: http://www.enterprisedb.com
On 22.03.24 14:59, Robert Haas wrote:
> And I don't believe that if someone were writing a physical book about
> PostgreSQL from scratch, they'd ever end up with a top-level chapter
> that looks anything like our GiST chapter. All of the index AM
> chapters are quite obviously clones of each other, and they're all
> quite short. Surely you'd make them sections within a chapter, not
> entire chapters.
>
> I do agree that PL/pgsql is more arguable. I can imagine somebody
> writing a book about PostgreSQL and choosing to make that topic into a
> whole chapter.
Yeah, I think there is probably a range of things from pretty obvious to mostly controversial.
On 22.03.24 15:10, Robert Haas wrote:
> Sorry. I didn't mean to dispute the point that the section was added a
> few years ago, nor the point that most people just want to read about
> the binaries. I am confident that both of those things are true. What
> I do want to dispute is that having a four-sentence chapter in the
> documentation index that tells people something they can find much
> more easily without using the documentation at all is a good plan.
I think a possible problem we need to consider with these proposals to combine chapters is that they could make the chapters themselves too deep and harder to navigate. For example, if we combined the installation from source and binaries chapters, the structure of the new chapter would presumably be
    <chapter> Installation
      <sect1> Installation from Binaries
      <sect1> Installation from Source
        <sect2> Requirements
        <sect2> Getting the Source
        <sect2> Building and Installation with Autoconf and Make
        <sect2> Building and Installation with Meson
        etc.
This would mean that the entire "Installation from Source" part would be rendered on a single HTML page.
The rendering can be adjusted to some degree, but then we also need to make sure any new chunking makes sense in other chapters. (And it might also change a bunch of externally known HTML links.)
I think maybe more could also be done at the top-level structure, too. Right now, we have <book> -> <part> -> <chapter>. We could add <set> on top of that.
We could also play with CSS or JavaScript to make the top-level table of contents more navigable, with collapsing subsections or whatever.
We could also render additional tables of contents or indexes, so there is more than one way to navigate into the content from the top.
We could also build better search.
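To make the page-granularity concern concrete, here is a rough Python sketch. It models the hypothetical combined chapter from the outline in the message above and lists which subsections would share an HTML page under the rule described in this thread (one page per chapter and per <sect1>, with deeper sections rendered inline). The `pages` helper, the `title` attributes, and the toy rule are illustrative assumptions, not the DocBook stylesheets' actual chunking logic.

```python
import xml.etree.ElementTree as ET

# Hypothetical combined chapter, following the outline in the message above.
# The real markup uses <title> child elements; attributes keep the toy short.
OUTLINE = """
<chapter title="Installation">
  <sect1 title="Installation from Binaries"/>
  <sect1 title="Installation from Source">
    <sect2 title="Requirements"/>
    <sect2 title="Getting the Source"/>
    <sect2 title="Building and Installation with Autoconf and Make"/>
    <sect2 title="Building and Installation with Meson"/>
  </sect1>
</chapter>
""".strip()


def pages(chapter):
    """Toy rule: the chapter and each <sect1> get a page; <sect2> stays inline."""
    result = {chapter.get("title"): []}
    for sect1 in chapter.findall("sect1"):
        result[sect1.get("title")] = [s.get("title") for s in sect1.iter("sect2")]
    return result


layout = pages(ET.fromstring(OUTLINE))
for page, inline_sections in layout.items():
    print(f"{page}: {len(inline_sections)} inline subsection(s)")
```

Under this simplified rule, every <sect2> of "Installation from Source" lands on the same page, which is the long-page effect the message warns about.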
On Mon, Mar 25, 2024 at 11:40 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> I think a possible problem we need to consider with these proposals to
> combine chapters is that they could make the chapters themselves too
> deep and harder to navigate. For example, if we combined the
> installation from source and binaries chapters, the structure of the new
> chapter would presumably be
I agree with this in theory, but in practice I think the patches that I posted don't have this issue to a degree that is problematic, and I posted some specific proposals on adjustments that we could make to ameliorate the problem if other people feel differently.
> I think maybe more could also be done at the top-level structure, too.
> Right now, we have <book> -> <part> -> <chapter>. We could add <set> on
> top of that.
>
> We could also play with CSS or JavaScript to make the top-level table of
> contents more navigable, with collapsing subsections or whatever.
>
> We could also render additional tables of contents or indexes, so there
> is more than one way to navigate into the content from the top.
>
> We could also build better search.
These are all reasonable ideas. I think some better CSS and JavaScript could definitely help, and I also wondered whether the entrypoint to the documentation has to be the index page, or whether it could maybe be a page we've crafted specifically for that purpose, that might include some text as well as a bunch of links.
But that having been said, I don't believe that any of those ideas (or anything else we do) will obviate the need for some curation of the top-level index. If you're going to add another level, as you propose in the first point, you still need to make decisions about which things properly go at which levels. If you're going to allow for collapsing subsections, you still want the overall tree in which subsections are expanded and collapsed to make logical sense. If you have multiple ways to navigate to the content, one of them will probably still be the index, and it should be good. And good search is good, but it shouldn't be the only convenient way to find the content.
--
Robert Haas
EDB: http://www.enterprisedb.com
OK, so I'm coming back to this thread after giving it a few days to cool off. My last series of patches proposed to do five things:
1. Merge the four-sentence "Installation from Binaries" chapter back into "Installation from Source". I thought this was a slam-dunk, but Peter pointed out that exactly the opposite of this was done a few years ago to create the "Installation from Binaries" chapter in the first place. Based on subsequent discussion, what I'm now inclined to do is come up with a new proposal that involves moving the information about compiling from source to an appendix. So never mind about this one for now.
2. Demote "Monitoring Disk Usage" from a chapter on its own to a section of the "Monitoring Database Activity" chapter. I haven't seen any objections to this, and I'd like to move ahead with it.
3. Merge the separate chapters on various built-in index AMs into one. Peter didn't think this was a good idea, but Tom and Alvaro's comments focused on how to do it mechanically, and on whether the chapters needed to be reordered afterwards, which I took to mean that they were OK with the basic concept. David Johnston was also clearly in favor of it. So I'd like to move ahead with this one, too.
4. Consolidate the "Generic WAL Records" and "Custom WAL Resource Managers" chapters, which cover related topics, into a single one. I didn't see anyone object to this, but David Johnston pointed out that the patch I posted was a few bricks short of a load, because it really needed to put some introductory text into the new chapter. I'll study this a bit more and propose a new patch that does the same thing a bit more carefully than my previous version did.
5. Consolidate all of the procedural language chapters into one. This was clearly the most controversial part of the proposal. I'm going to lay this one aside for now and possibly come back to it at a later time.
I hope that this way of proceeding makes sense to people.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Mar 29, 2024 at 9:40 AM Robert Haas <robertmhaas@gmail.com> wrote: > 2. Demote "Monitoring Disk Usage" from a chapter on its own to a > section of the "Monitoring Database Activity" chapter. I haven't seen > any objections to this, and I'd like to move ahead with it. > > 3. Merge the separate chapters on various built-in index AMs into one. > Peter didn't think this was a good idea, but Tom and Alvaro's comments > focused on how to do it mechanically, and on whether the chapters > needed to be reordered afterwards, which I took to mean that they were > OK with the basic concept. David Johnston was also clearly in favor of > it. So I'd like to move ahead with this one, too. I committed these two patches. > 4. Consolidate the "Generic WAL Records" and "Custom WAL Resource > Managers" chapters, which cover related topics, into a single one. I > didn't see anyone object to this, but David Johnston pointed out that > the patch I posted was a few bricks short of a load, because it really > needed to put some introductory text into the new chapter. I'll study > this a bit more and propose a new patch that does the same thing a bit > more carefully than my previous version did. Here is a new version of this patch. I think this is v18 material at this point, absent an outcry to the contrary. Sometimes we're flexible about doc patches. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Mar 25, 2024 at 11:40 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> I think a possible problem we need to consider with these proposals to
> combine chapters is that they could make the chapters themselves too
> deep and harder to navigate.
I looked into various options for further combining chapters and/or appendixes and found that this is indeed a huge problem. For example, I had thought of creating a Developer Information chapter in the appendix and moving various existing chapters and appendixes inside of it, but that means that the <sect1> elements in those chapters get demoted to <sect2>, and what used to be a whole chapter or appendix becomes a <sect1>. And since you get one HTML page per <sect1>, that means that instead of a bunch of individual HTML pages of very pleasant length, you suddenly get one very long HTML page that is, exactly as you say, hard to navigate.
> The rendering can be adjusted to some degree, but then we also need to
> make sure any new chunking makes sense in other chapters. (And it might
> also change a bunch of externally known HTML links.)
I looked into this and I'm unclear how much customization is possible. I gather that the current scheme comes from having chunk.section.depth of 1, and I guess you can change that to 2 to get an HTML page per <sect2>, but it seems like it would take a LOT of restructuring to make that work. It would be much easier if you could vary this across different parts of the documentation; for instance, if you could say, well, in this particular chapter or appendix, I want chunk.section.depth of 2, but elsewhere 1, that would be quite handy, but after several hours reading various things about DocBook on the Internet, I was still unable to determine conclusively whether this was possible. There's an interesting comment in stylesheet-speedup-xhtml.xsl that says "Since we set a fixed $chunk.section.depth, we can do away with a bunch of complicated XPath searches for the previous and next sections at various levels." That sounds like it's suggesting that it is in fact possible for this setting to vary, but I don't know if that's true, or how to do it, and it sounds like there might be performance consequences, too.
> I think maybe more could also be done at the top-level structure, too.
> Right now, we have <book> -> <part> -> <chapter>. We could add <set> on
> top of that.
Does this let you create structures of non-uniform depth? i.e. is there a way that we can group some chapters into sets while leaving others as standalone chapters, or somesuch?
I'm not 100% confident that non-uniform depth (either via <set> or via chunk.section.depth or via some other mechanism) is a good idea. There's a sort of uniformity to our navigation right now that does have some real appeal. The downside, though, is that if you want something to be a single HTML page, it's got to either be a chapter (or appendix) by itself with no sections inside of it, or it's got to be a <sect1> inside of a chapter, and so anything that's long enough that it should be an HTML page by itself can never be more than one level below the index. And that seems to make it quite difficult to keep the index small.
Without some kind of variable-depth structure, the only other ways that I can see to improve things are:
1. Make chunk.section.depth 2 and restructure the entire documentation until the results look reasonable. This might be possible but I bet it's difficult. We have, at present, chapters of *wildly* varying length, from a few sentences to many, many pages. That is perhaps a bad thing; you most likely wouldn't do that in a printed book. But fixing it is a huge project. We don't necessarily have the same amount of content about each topic, and there isn't necessarily a way of grouping related topics together that produces units of relatively uniform length. I think it's sensible to try to make improvements where we can, by pushing stuff down that's short and not that important, but finding our way to a chunk.section.depth=2 world that feels good to most people compared to what we have today seems like it's going to be challenging.
2. Replace the current index with a custom index or landing page of some kind. Or keep the current index and add a new landing page alongside it. Something that isn't derived automatically from the documentation structure but is created by hand.
--
Robert Haas
EDB: http://www.enterprisedb.com
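The trade-off between a uniform and a per-chapter chunk depth can be illustrated with some simple arithmetic. The Python sketch below counts HTML pages for a made-up documentation tree, first with depth 1 everywhere and then with a depth-2 override for one chapter. The chapter names, section counts, and the `overrides` mechanism are all hypothetical; whether the DocBook stylesheets can actually vary chunk.section.depth per chapter is exactly what the message above could not determine.

```python
# Toy model: each chapter maps to a list of <sect1> entries, and each entry
# is just the number of <sect2> elements inside that <sect1>.
# Chapter names and counts are invented for illustration.
DOC = {
    "Short chapter": [0, 0],
    "Developer Information": [6, 4, 5],
}


def count_pages(doc, default_depth=1, overrides=None):
    """Count pages if sections down to the given depth each get their own page."""
    overrides = overrides or {}
    total = 0
    for chapter, sect1s in doc.items():
        depth = overrides.get(chapter, default_depth)
        total += 1                    # the chapter's own page
        if depth >= 1:
            total += len(sect1s)      # one page per <sect1>
        if depth >= 2:
            total += sum(sect1s)      # one page per <sect2>
    return total


uniform = count_pages(DOC)
mixed = count_pages(DOC, overrides={"Developer Information": 2})
print(uniform, mixed)
```

In this toy tree the override splits the three long <sect1> pages of the combined chapter into fifteen additional shorter pages while leaving the short chapter untouched, which is the appeal of non-uniform depth; the cost is the loss of uniform navigation that the message also mentions.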
On Fri, Apr 5, 2024 at 9:01 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > The rendering can be adjusted to some degree, but then we also need to
> > make sure any new chunking makes sense in other chapters. (And it might
> > also change a bunch of externally known HTML links.)
> I looked into this and I'm unclear how much customization is possible.
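Here is a link to my attempt at this a couple of years ago. It basically "abuses" refentry.
https://www.postgresql.org/message-id/CAKFQuwaVm%3D6d_sw9Wrp4cdSm5_k%3D8ZVx0--v2v4BH4KnJtqXqg%40mail.gmail.com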
I never did dive into the man page or PDF dynamics of this particular change but it seemed to solve HTML pagination without negative consequences and with minimal risk of unintended consequences since only the markup on the pages we want to alter is changed, not global configuration.
David J.
On Fri, Apr 5, 2024 at 12:15 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
> Here is a link to my attempt at this a couple of years ago. It basically "abuses" refentry.
>
> https://www.postgresql.org/message-id/CAKFQuwaVm%3D6d_sw9Wrp4cdSm5_k%3D8ZVx0--v2v4BH4KnJtqXqg%40mail.gmail.com
>
> I never did dive into the man page or PDF dynamics of this particular change but it seemed to solve HTML pagination without negative consequences and with minimal risk of unintended consequences since only the markup on the pages we want to alter is changed, not global configuration.
Hmm, but it seems like that might have generated some man page entries that we don't want?
--
Robert Haas
EDB: http://www.enterprisedb.com
On Fri, Apr 5, 2024 at 9:18 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Apr 5, 2024 at 12:15 PM David G. Johnston
> <david.g.johnston@gmail.com> wrote:
> > Here is a link to my attempt at this a couple of years ago. It basically "abuses" refentry.
> >
> > https://www.postgresql.org/message-id/CAKFQuwaVm%3D6d_sw9Wrp4cdSm5_k%3D8ZVx0--v2v4BH4KnJtqXqg%40mail.gmail.com
> >
> > I never did dive into the man page or PDF dynamics of this particular change but it seemed to solve HTML pagination without negative consequences and with minimal risk of unintended consequences since only the markup on the pages we want to alter is changed, not global configuration.
> Hmm, but it seems like that might have generated some man page entries
> that we don't want?
If so (didn't check) maybe just remove them in post?
David J.
On 05.04.24 17:11, Robert Haas wrote: >> 4. Consolidate the "Generic WAL Records" and "Custom WAL Resource >> Managers" chapters, which cover related topics, into a single one. I >> didn't see anyone object to this, but David Johnston pointed out that >> the patch I posted was a few bricks short of a load, because it really >> needed to put some introductory text into the new chapter. I'll study >> this a bit more and propose a new patch that does the same thing a bit >> more carefully than my previous version did. > > Here is a new version of this patch. I think this is v18 material at > this point, absent an outcry to the contrary. Sometimes we're flexible > about doc patches. Looks good to me. I think this could go into PG17.
On Wed, Mar 20, 2024 at 5:40 AM Andrew Dunstan <andrew@dunslane.net> wrote:
>
> +many for improving the index.
>
> My own pet docs peeve is a purely editorial one: func.sgml is a 30k line beast, and I think there's a good case for splitting out at least the larger chunks of it.
>

I think I successfully reduced func.sgml from about 31k lines to 13167 lines.
(base-commit: 93582974315174d544592185d797a2b44696d1e5)

Posting this as a patch would be unreviewable. The key gotcha is to move everything between an opening `<sect1>` and its closing `</sect1>` (both inclusive) into a new file, reference the new file from func.sgml with an `&entity;` reference, and also update filelist.sgml.

Here is how I did it. I found that these built HTML files are the biggest ones:
doc/src/sgml/html/functions-string.html
doc/src/sgml/html/functions-matching.html
doc/src/sgml/html/functions-datetime.html
doc/src/sgml/html/functions-json.html
doc/src/sgml/html/functions-aggregate.html
doc/src/sgml/html/functions-info.html
doc/src/sgml/html/functions-admin.html

so I created these new sgml files to hold the corresponding content:
func-string.sgml
func-matching.sgml
func-datetime.sgml
func-json.sgml
func-aggregate.sgml
func-info.sgml
func-admin.sgml

Based on the func.sgml structure, each section to move is listed with the <sect1> that follows it:
<sect1 id="functions-string">      next sect1: <sect1 id="functions-binarystring">
<sect1 id="functions-matching">    next sect1: <sect1 id="functions-formatting">
<sect1 id="functions-datetime">    next sect1: <sect1 id="functions-enum">
<sect1 id="functions-json">        next sect1: <sect1 id="functions-sequence">
<sect1 id="functions-aggregate">   next sect1: <sect1 id="functions-window">
<sect1 id="functions-info">        next sect1: <sect1 id="functions-admin">
<sect1 id="functions-admin">       next sect1: <sect1 id="functions-trigger">
------------------------------------
step1: pipe the relevant line ranges into the new sgml files.
(example: lines 2407 to 4177 contain all the content that corresponds to functions-string.html)

sed -n '2407,4177 p' func.sgml > func-string.sgml
sed -n '5328,7756 p' func.sgml > func-matching.sgml
sed -n '8939,11122 p' func.sgml > func-datetime.sgml
sed -n '15498,19348 p' func.sgml > func-json.sgml
sed -n '21479,22896 p' func.sgml > func-aggregate.sgml
sed -n '24257,27896 p' func.sgml > func-info.sgml
sed -n '27898,30579 p' func.sgml > func-admin.sgml

step2: delete these line ranges from func.sgml in place. All ranges go into a single sed invocation, so the addresses still refer to the original line numbers.

sed --in-place "2407,4177d ; 5328,7756d ; 8939,11122d ; 15498,19348d ; 21479,22896d ; 24257,27896d ; 27898,30579d" \
    func.sgml

reference:
https://unix.stackexchange.com/questions/676210/matching-multiple-ranges-with-sed-range-expressions
https://www.gnu.org/software/sed/manual/sed.html#Command_002dLine-Options

step3: put the following lines at the corresponding positions in func.sgml
(the structure listed above makes it quick to locate each position):
`
&func-string;
&func-matching;
&func-datetime;
&func-json;
&func-aggregate;
&func-info;
&func-admin;
`

step4: update filelist.sgml:

diff --git a/doc/src/sgml/filelist.sgml b/doc/src/sgml/filelist.sgml
index 3fb0709f..0b78a361 100644
--- a/doc/src/sgml/filelist.sgml
+++ b/doc/src/sgml/filelist.sgml
@@ -18,6 +18,13 @@
 <!ENTITY ddl SYSTEM "ddl.sgml">
 <!ENTITY dml SYSTEM "dml.sgml">
 <!ENTITY func SYSTEM "func.sgml">
+<!ENTITY func-string SYSTEM "func-string.sgml">
+<!ENTITY func-matching SYSTEM "func-matching.sgml">
+<!ENTITY func-datetime SYSTEM "func-datetime.sgml">
+<!ENTITY func-json SYSTEM "func-json.sgml">
+<!ENTITY func-aggregate SYSTEM "func-aggregate.sgml">
+<!ENTITY func-info SYSTEM "func-info.sgml">
+<!ENTITY func-admin SYSTEM "func-admin.sgml">
 <!ENTITY indices SYSTEM "indices.sgml">
 <!ENTITY json SYSTEM "json.sgml">
 <!ENTITY mvcc SYSTEM "mvcc.sgml">

 doc/src/sgml/filelist.sgml       |     7 +
 doc/src/sgml/func-admin.sgml     |  2682 +++++
 doc/src/sgml/func-aggregate.sgml |  1418 +++
 doc/src/sgml/func-datetime.sgml  |  2184 ++++
 doc/src/sgml/func-info.sgml      |  3640 ++++++
 doc/src/sgml/func-json.sgml      |  3851 ++++++
 doc/src/sgml/func-matching.sgml  |  2429 ++++
 doc/src/sgml/func-string.sgml    |  1771 +++
 doc/src/sgml/func.sgml           | 17979 +---------------------------

We can do it one section at a time, but it's still worth it.
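The hard-coded line ranges above only hold for one particular base commit. As a minimal sketch of the same idea that locates each section by its id instead, the following Python cuts a `<sect1>` block out of a text and leaves an entity reference behind. The function name `split_sect1` and the sample text are hypothetical, and the sketch assumes `<sect1>` elements are never nested and that each opening tag is written exactly as `<sect1 id="...">`.

```python
import re


def split_sect1(text, section_ids):
    """Cut the named <sect1> blocks out of text.

    Returns (remaining_text, {section_id: extracted_block}). Each extracted
    block is replaced by an entity reference such as &func-string;.
    """
    extracted = {}
    for sect_id in section_ids:
        # Non-greedy match from the opening tag with this id to the first
        # closing </sect1>, plus the rest of that line.
        pattern = re.compile(
            r'<sect1 id="' + re.escape(sect_id) + r'">.*?</sect1>[ \t]*\n?',
            re.DOTALL,
        )
        match = pattern.search(text)
        if match is None:
            raise ValueError(f"no <sect1> with id {sect_id!r}")
        extracted[sect_id] = match.group(0)
        # functions-string -> &func-string; (naming taken from the steps above)
        entity = "&" + sect_id.replace("functions-", "func-", 1) + ";\n"
        text = text[:match.start()] + entity + text[match.end():]
    return text, extracted


# Tiny made-up stand-in for func.sgml.
sample = (
    "<chapter>\n"
    '<sect1 id="functions-string">\n<title>String Functions</title>\n</sect1>\n'
    '<sect1 id="functions-binarystring">\n<title>Binary String Functions</title>\n</sect1>\n'
    "</chapter>\n"
)
remaining, parts = split_sect1(sample, ["functions-string"])
print(remaining)
```

Each value in `parts` would then be written to its own file (for example func-string.sgml), with filelist.sgml updated as in step4.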
On Mon, Apr 8, 2024 at 10:15 AM Peter Eisentraut <peter@eisentraut.org> wrote: > > Here is a new version of this patch. I think this is v18 material at > > this point, absent an outcry to the contrary. Sometimes we're flexible > > about doc patches. > > Looks good to me. I think this could go into PG17. Hearing no objections, done. -- Robert Haas EDB: http://www.enterprisedb.com
Hi, On 2024-03-19 17:39:39 -0400, Andrew Dunstan wrote: > My own pet docs peeve is a purely editorial one: func.sgml is a 30k line > beast, and I think there's a good case for splitting out at least the > larger chunks of it. I think we should work on generating a lot of func.sgml. Particularly the signature etc should just come from pg_proc.dat, it's pointlessly painful to generate that by hand. And for a lot of the functions we should probably move the existing func.sgml comments to the description in pg_proc.dat. I suspect that we can't just generate all the documentation from pg_proc, because of xrefs etc. Although perhaps we could just strip those out for pg_proc. We'd need to add some more metadata to pg_proc, for grouping kinds of functions together. But that seems doable. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > I think we should work on generating a lot of func.sgml. Particularly the > signature etc should just come from pg_proc.dat, it's pointlessly painful to > generate that by hand. And for a lot of the functions we should probably move > the existing func.sgml comments to the description in pg_proc.dat. Where are you going to get the examples and text descriptions from? (And no, I don't agree that the pg_description string should match what's in the docs. The description string has to be a short one-liner in just about every case.) This sounds to me like it would be a painful exercise with not a lot of benefit in the end. I do agree with Andrew that splitting func.sgml into multiple files would be beneficial. regards, tom lane
On Tue, Apr 16, 2024 at 03:05:32PM -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > I think we should work on generating a lot of func.sgml. Particularly the > > signature etc should just come from pg_proc.dat, it's pointlessly painful to > > generate that by hand. And for a lot of the functions we should probably move > > the existing func.sgml comments to the description in pg_proc.dat. > > Where are you going to get the examples and text descriptions from? > (And no, I don't agree that the pg_description string should match > what's in the docs. The description string has to be a short > one-liner in just about every case.) > > This sounds to me like it would be a painful exercise with not a > lot of benefit in the end. Maybe we could _verify_ the contents of func.sgml against pg_proc. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Hi,

On 2024-04-16 15:05:32 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I think we should work on generating a lot of func.sgml. Particularly the
> > signature etc should just come from pg_proc.dat, it's pointlessly painful to
> > generate that by hand. And for a lot of the functions we should probably move
> > the existing func.sgml comments to the description in pg_proc.dat.
>
> Where are you going to get the examples and text descriptions from?

I think there are a few different ways to do that. E.g. having long_desc and example fields in pg_proc.dat. Or having examples and descriptions in a separate file and "enriching" that with auto-generated function signatures.

> (And no, I don't agree that the pg_description string should match
> what's in the docs. The description string has to be a short
> one-liner in just about every case.)

Definitely shouldn't be the same in all cases, but I think there's a decent number of cases where they can be the same. The differences between the two are often minimal today.

Entirely randomly chosen example:

{ oid => '2825',
  descr => 'slope of the least-squares-fit linear equation determined by the (X, Y) pairs',
  proname => 'regr_slope', prokind => 'a', proisstrict => 'f',
  prorettype => 'float8', proargtypes => 'float8 float8',
  prosrc => 'aggregate_dummy' },

and

<row>
 <entry role="func_table_entry"><para role="func_signature">
  <indexterm>
   <primary>regression slope</primary>
  </indexterm>
  <indexterm>
   <primary>regr_slope</primary>
  </indexterm>
  <function>regr_slope</function> ( <parameter>Y</parameter> <type>double precision</type>, <parameter>X</parameter><type>double precision</type> )
  <returnvalue>double precision</returnvalue>
 </para>
 <para>
  Computes the slope of the least-squares-fit linear equation determined
  by the (<parameter>X</parameter>, <parameter>Y</parameter>)
  pairs.
 </para></entry>
 <entry>Yes</entry>
</row>

The descriptions are quite similar; the pg_proc entry lacks argument names.

> This sounds to me like it would be a painful exercise with not a
> lot of benefit in the end.

I think the manual work for writing signatures in sgml is not insignificant, nor is the volume of sgml for them. Manually maintaining the signatures makes it impractical to significantly improve the presentation - which I don't think is all that great today.

And the lack of argument names in the pg_proc entries is occasionally fairly annoying, because a \df+ doesn't provide enough information to use functions.

It'd also be quite useful if clients could render more of the documentation for functions. People are used to language servers providing full documentation for functions etc...

Greetings,

Andres Freund
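To make the "generate the signature from pg_proc.dat" idea concrete, here is a minimal sketch in Python. It is not an existing tool: `render_signature` and the `DISPLAY_TYPE` table are hypothetical, the entry is a plain dict carrying only the fields shown in the regr_slope example, and the argument names are passed in separately because, as noted above, that entry does not carry them.

```python
# Illustrative mapping from catalog type names to the names the docs display.
DISPLAY_TYPE = {"float8": "double precision", "int4": "integer", "bool": "boolean"}


def render_signature(entry, argnames):
    """Render a func_signature <para> from a pg_proc.dat-like dict."""
    argtypes = entry["proargtypes"].split()
    if len(argtypes) != len(argnames):
        raise ValueError("argument name/type count mismatch")
    params = ", ".join(
        f"<parameter>{name}</parameter> <type>{DISPLAY_TYPE.get(typ, typ)}</type>"
        for name, typ in zip(argnames, argtypes)
    )
    ret = DISPLAY_TYPE.get(entry["prorettype"], entry["prorettype"])
    return (
        '<para role="func_signature">\n'
        f" <function>{entry['proname']}</function> ( {params} )\n"
        f" <returnvalue>{ret}</returnvalue>\n"
        "</para>"
    )


# Fields copied from the regr_slope entry quoted above; Y and X come from the docs.
entry = {"proname": "regr_slope", "prorettype": "float8", "proargtypes": "float8 float8"}
sgml = render_signature(entry, ["Y", "X"])
print(sgml)
```

The sketch leaves out index terms, descriptions and examples, which is exactly the material the thread disagrees about where to store.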
> This sounds to me like it would be a painful exercise with not a
> lot of benefit in the end.
> Maybe we could _verify_ the contents of func.sgml against pg_proc.
All of the functions redefined in catalog/system_functions.sql complicate using pg_proc.dat as a doc generator or source of validation. We'd probably do better to validate against a live instance, and even then the benefit wouldn't be great.
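A rough sketch of what such a verification could look like, in Python. It only compares function names: the names documented in `func_signature` paragraphs against a list of names that, in practice, might come from a live instance (for example the `proname` column of `pg_proc`). The helper names and the sample SGML are hypothetical, and signature paragraphs without a `<function>` tag are not handled.

```python
import re

# First <function> name inside each func_signature paragraph.
FUNC_IN_SIGNATURE = re.compile(
    r'<para role="func_signature">.*?<function>([^<]+)</function>', re.DOTALL
)


def documented_functions(sgml_text):
    """Names that appear as <function> in func_signature paragraphs."""
    return {m.group(1).strip() for m in FUNC_IN_SIGNATURE.finditer(sgml_text)}


def compare(sgml_text, catalog_names):
    """Return (documented but not in catalog, in catalog but not documented)."""
    documented = documented_functions(sgml_text)
    catalog = set(catalog_names)
    return sorted(documented - catalog), sorted(catalog - documented)


# Made-up documentation fragment and catalog name list.
sample = '''
<entry role="func_table_entry"><para role="func_signature">
 <indexterm><primary>regr_slope</primary></indexterm>
 <function>regr_slope</function> ( ... )
</para></entry>
<entry role="func_table_entry"><para role="func_signature">
 <function>made_up_func</function> ( )
</para></entry>
'''
only_in_docs, only_in_catalog = compare(sample, ["regr_slope", "regr_intercept"])
print(only_in_docs, only_in_catalog)
```

Comparing argument lists as well would run into the functions redefined in system_functions.sql, which is the complication raised above.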
Andres Freund <andres@anarazel.de> writes: > Definitely shouldn't be the same in all cases, but I think there's a decent > number of cases where they can be the same. The differences between the two is > often minimal today. > > Entirely randomly chosen example: > > { oid => '2825', > descr => 'slope of the least-squares-fit linear equation determined by the (X, Y) pairs', > proname => 'regr_slope', prokind => 'a', proisstrict => 'f', > prorettype => 'float8', proargtypes => 'float8 float8', > prosrc => 'aggregate_dummy' }, > > and > > <row> > <entry role="func_table_entry"><para role="func_signature"> > <indexterm> > <primary>regression slope</primary> > </indexterm> > <indexterm> > <primary>regr_slope</primary> > </indexterm> > <function>regr_slope</function> ( <parameter>Y</parameter> <type>double precision</type>, <parameter>X</parameter><type>double precision</type> ) > <returnvalue>double precision</returnvalue> > </para> > <para> > Computes the slope of the least-squares-fit linear equation determined > by the (<parameter>X</parameter>, <parameter>Y</parameter>) > pairs. > </para></entry> > <entry>Yes</entry> > </row> > > > The description is quite similar, the pg_proc entry lacks argument names. > > >> This sounds to me like it would be a painful exercise with not a >> lot of benefit in the end. > > I think the manual work for writing signatures in sgml is not insignificant, > nor is the volume of sgml for them. Manually maintaining the signatures makes > it impractical to significantly improve the presentation - which I don't think > is all that great today. And it's very inconsistent. For example, some functions use <optional> tags for optional parameters, others use square brackets, and some use <literal>VARIADIC</literal> to indicate variadic parameters, others use ellipses (sometimes in <optional> tags or brackets). 
> And the lack of argument names in the pg_proc entries is occasionally fairly
> annoying, because a \df+ doesn't provide enough information to use functions.

I was also annoyed by this the other day (specifically wrt. the boolean arguments to pg_ls_dir), and started whipping up a Perl script to parse func.sgml and generate missing proargnames values for pg_proc.dat, which is how I discovered the above.

The script currently has a pile of hacky regexes to cope with that, so I'd be happy to submit a doc patch to turn it into actual markup to get rid of that, if people think that's a worthwhile use of time and won't clash with any other plans for the documentation.

> It'd also be quite useful if clients could render more of the documentation
> for functions. People are used to language servers providing full
> documentation for functions etc...

A more user-friendly version of \df+ (maybe spelled \hf, for symmetry with \h for commands?) would certainly be nice.

> Greetings,
>
> Andres Freund

- ilmari
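The script mentioned above is Perl and is not shown in the thread. As a hedged illustration of the same idea in Python, the sketch below pulls the `<parameter>` names out of one signature and formats them as an array literal of the kind used for `proargnames`. The function names are hypothetical, and the exact literal format expected by pg_proc.dat is an assumption here.

```python
import re


def parameter_names(signature_sgml):
    """Return the <parameter> names in order of appearance."""
    return re.findall(r"<parameter>([^<]+)</parameter>", signature_sgml)


def proargnames_literal(names):
    """Format names as a quoted array literal, e.g. '{a,b}' (assumed format)."""
    return "'{" + ",".join(names) + "}'"


# Signature text in the <optional> style discussed in this thread.
signature = (
    "<function>regexp_like</function> ( <parameter>string</parameter> <type>text</type>, "
    "<parameter>pattern</parameter><type>text</type> "
    "<optional>, <parameter>flags</parameter> <type>text</type> </optional> )"
)
names = parameter_names(signature)
print(proargnames_literal(names))
```

Because the names come from tags rather than from bracket or ellipsis conventions, this part works the same whichever optional-parameter style a signature uses.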
> And it's very inconsistent. For example, some functions use <optional>
> tags for optional parameters, others use square brackets, and some use
> <literal>VARIADIC</literal> to indicate variadic parameters, others use
> ellipses (sometimes in <optional> tags or brackets).
Having just written a couple of those functions, I wasn't able to find any guidance on how to document them with regard to <optional> vs [], etc. Having such guidance would be helpful.
While we're throwing out ideas, does it make sense to have function parameters and return values be things that can accept COMMENTs? Like so:
COMMENT ON FUNCTION function_name [ ( [ [ argmode ] [ argname ] argtype [, ...] ] ) ] ARGUMENT argname IS '....';
COMMENT ON FUNCTION function_name [ ( [ [ argmode ] [ argname ] argtype [, ...] ] ) ] RETURN VALUE IS '....';
I don't think this is a great idea, but if we're going to auto-generate documentation then we've got to store the metadata somewhere, and pg_proc.dat is already lacking relevant details.
Hi,

On 2024-04-17 02:46:53 -0400, Corey Huinker wrote:
> > > This sounds to me like it would be a painful exercise with not a
> > > lot of benefit in the end.
> >
> > Maybe we could _verify_ the contents of func.sgml against pg_proc.
> >
>
> All of the functions redefined in catalog/system_functions.sql complicate
> using pg_proc.dat as a doc generator or source of validation. We'd probably
> do better to validate against a live instance, and even then the benefit
> wouldn't be great.

There are 80 'CREATE OR REPLACE's in system_functions.sql, 1016 occurrences of func_table_entry in func.sgml and 3.3k functions in pg_proc. I'm not saying that differences due to system_functions.sql wouldn't be annoying to deal with, but it'd also be far from the end of the world.

Greetings,

Andres Freund
Hi,

On 2024-04-17 12:07:24 +0100, Dagfinn Ilmari Mannsåker wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I think the manual work for writing signatures in sgml is not insignificant,
> > nor is the volume of sgml for them. Manually maintaining the signatures makes
> > it impractical to significantly improve the presentation - which I don't think
> > is all that great today.
>
> And it's very inconsistent. For example, some functions use <optional>
> tags for optional parameters, others use square brackets, and some use
> <literal>VARIADIC</literal> to indicate variadic parameters, others use
> ellipses (sometimes in <optional> tags or brackets).

That seems almost inevitably the outcome of many people having to manually infer the recommended semantics, for writing something boring but nontrivial, from a 30k line file.

> > And the lack of argument names in the pg_proc entries is occasionally fairly
> > annoying, because a \df+ doesn't provide enough information to use functions.
>
> I was also annoyed by this the other day (specifically wrt. the boolean
> arguments to pg_ls_dir),

My bane is regexp_match et al, I have given up on remembering the argument order.

> and started whipping up a Perl script to parse func.sgml and generate
> missing proargnames values for pg_proc.dat, which is how I discovered the
> above.

Nice.

> The script currently has a pile of hacky regexes to cope with that,
> so I'd be happy to submit a doc patch to turn it into actual markup to get
> rid of that, if people think that's a worthwhile use of time and won't clash
> with any other plans for the documentation.

I guess it's a bit hard to say without knowing how voluminous the changes would be. If we end up rewriting the whole file the tradeoff is less clear than if it's a dozen inconsistent entries.

> > It'd also be quite useful if clients could render more of the documentation
> > for functions. People are used to language servers providing full
> > documentation for functions etc...
>
> A more user-friendly version of \df+ (maybe spelled \hf, for symmetry
> with \h for commands?) would certainly be nice.

Indeed.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes: > Hi, > > On 2024-04-17 12:07:24 +0100, Dagfinn Ilmari Mannsåker wrote: >> Andres Freund <andres@anarazel.de> writes: >> > I think the manual work for writing signatures in sgml is not insignificant, >> > nor is the volume of sgml for them. Manually maintaining the signatures makes >> > it impractical to significantly improve the presentation - which I don't think >> > is all that great today. >> >> And it's very inconsistent. For example, some functions use <optional> >> tags for optional parameters, others use square brackets, and some use >> <literal>VARIADIC</literal> to indicate variadic parameters, others use >> ellipses (sometimes in <optional> tags or brackets). > > That seems almost inevitably the outcome of many people having to manually > infer the recommended semantics, for writing something boring but nontrivial, > from a 30k line file. As Corey mentioned elsethread, having a markup style guide (maybe a comment at the top of the file?) would be nice. >> > And the lack of argument names in the pg_proc entries is occasionally fairly >> > annoying, because a \df+ doesn't provide enough information to use functions. >> >> I was also annoyed by this the other day (specifically wrt. the boolean >> arguments to pg_ls_dir), > > My bane is regexp_match et al, I have given up on remembering the argument > order. There's a thread elsewhere about those specifically, but I can't be bothered to find the link right now. >> and started whipping up a Perl script to parse func.sgml and generate >> missing proargnames values for pg_proc.dat, which is how I discovered the >> above. > > Nice. > >> The script currently has a pile of hacky regexes to cope with that, >> so I'd be happy to submit a doc patch to turn it into actual markup to get >> rid of that, if people think that's a worhtwhile use of time and won't clash >> with any other plans for the documentation. 
>
> I guess it's a bit hard to say without knowing how voluminous the changes
> would be. If we end up rewriting the whole file the tradeoff is less clear
> than if it's a dozen inconsistent entries.

It turned out to not be that many that used [] for optional parameters, see the attached patch.

I haven't dealt with variadic yet, since the two styles are visually different, not just markup (<optional>...</optional> renders as [...]).

The two styles for variadic are what I call caller-style:

concat ( val1 "any" [, val2 "any" [, ...] ] )
format(formatstr text [, formatarg "any" [, ...] ])

which shows more clearly how you'd call it, versus definition-style:

num_nonnulls ( VARIADIC "any" )
jsonb_extract_path ( from_json jsonb, VARIADIC path_elems text[] )

which matches the CREATE FUNCTION statement. I don't have a strong opinion on which we should use, but we should be consistent.

> Greetings,
>
> Andres Freund

- ilmari

From f71e0669eb25b205bd5065f15657ba6d749261f3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org> Date: Wed, 17 Apr 2024 16:00:52 +0100 Subject: [PATCH] func.sgml: Consistently use <optional> to indicate optional parameters Some functions were using square brackets instead. --- doc/src/sgml/func.sgml | 54 +++++++++++++++++++++--------------------- 1 file changed, 27 insertions(+), 27 deletions(-) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 8dfb42ad4d..afaaf61d69 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -3036,7 +3036,7 @@ <primary>concat</primary> </indexterm> <function>concat</function> ( <parameter>val1</parameter> <type>"any"</type> - [, <parameter>val2</parameter> <type>"any"</type> [, ...] ] ) + <optional>, <parameter>val2</parameter> <type>"any"</type> [, ...] 
</optional> ) <returnvalue>text</returnvalue> </para> <para> @@ -3056,7 +3056,7 @@ </indexterm> <function>concat_ws</function> ( <parameter>sep</parameter> <type>text</type>, <parameter>val1</parameter> <type>"any"</type> - [, <parameter>val2</parameter> <type>"any"</type> [, ...] ] ) + <optional>, <parameter>val2</parameter> <type>"any"</type> [, ...] </optional> ) <returnvalue>text</returnvalue> </para> <para> @@ -3076,7 +3076,7 @@ <primary>format</primary> </indexterm> <function>format</function> ( <parameter>formatstr</parameter> <type>text</type> - [, <parameter>formatarg</parameter> <type>"any"</type> [, ...] ] ) + <optional>, <parameter>formatarg</parameter> <type>"any"</type> [, ...] </optional> ) <returnvalue>text</returnvalue> </para> <para> @@ -3170,7 +3170,7 @@ <primary>parse_ident</primary> </indexterm> <function>parse_ident</function> ( <parameter>qualified_identifier</parameter> <type>text</type> - [, <parameter>strict_mode</parameter> <type>boolean</type> <literal>DEFAULT</literal> <literal>true</literal> ]) + <optional>, <parameter>strict_mode</parameter> <type>boolean</type> <literal>DEFAULT</literal> <literal>true</literal></optional> ) <returnvalue>text[]</returnvalue> </para> <para> @@ -3309,8 +3309,8 @@ <primary>regexp_count</primary> </indexterm> <function>regexp_count</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter><type>text</type> - [, <parameter>start</parameter> <type>integer</type> - [, <parameter>flags</parameter> <type>text</type> ] ] ) + <optional>, <parameter>start</parameter> <type>integer</type> + <optional>, <parameter>flags</parameter> <type>text</type> </optional> </optional> ) <returnvalue>integer</returnvalue> </para> <para> @@ -3331,11 +3331,11 @@ <primary>regexp_instr</primary> </indexterm> <function>regexp_instr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter><type>text</type> - [, <parameter>start</parameter> <type>integer</type> - [, 
<parameter>N</parameter> <type>integer</type> - [, <parameter>endoption</parameter> <type>integer</type> - [, <parameter>flags</parameter> <type>text</type> - [, <parameter>subexpr</parameter> <type>integer</type> ] ] ] ] ] ) + <optional>, <parameter>start</parameter> <type>integer</type> + <optional>, <parameter>N</parameter> <type>integer</type> + <optional>, <parameter>endoption</parameter> <type>integer</type> + <optional>, <parameter>flags</parameter> <type>text</type> + <optional>, <parameter>subexpr</parameter> <type>integer</type> </optional> </optional> </optional> </optional></optional> ) <returnvalue>integer</returnvalue> </para> <para> @@ -3360,7 +3360,7 @@ <primary>regexp_like</primary> </indexterm> <function>regexp_like</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter><type>text</type> - [, <parameter>flags</parameter> <type>text</type> ] ) + <optional>, <parameter>flags</parameter> <type>text</type> </optional> ) <returnvalue>boolean</returnvalue> </para> <para> @@ -3380,7 +3380,7 @@ <indexterm> <primary>regexp_match</primary> </indexterm> - <function>regexp_match</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter><type>text</type> [, <parameter>flags</parameter> <type>text</type> ] ) + <function>regexp_match</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter><type>text</type> <optional>, <parameter>flags</parameter> <type>text</type> </optional> ) <returnvalue>text[]</returnvalue> </para> <para> @@ -3400,7 +3400,7 @@ <indexterm> <primary>regexp_matches</primary> </indexterm> - <function>regexp_matches</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter><type>text</type> [, <parameter>flags</parameter> <type>text</type> ] ) + <function>regexp_matches</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter><type>text</type> <optional>, 
<parameter>flags</parameter> <type>text</type> </optional> ) <returnvalue>setof text[]</returnvalue> </para> <para> @@ -3426,8 +3426,8 @@ <primary>regexp_replace</primary> </indexterm> <function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter><type>text</type>, <parameter>replacement</parameter> <type>text</type> - [, <parameter>start</parameter> <type>integer</type> ] - [, <parameter>flags</parameter> <type>text</type> ] ) + <optional>, <parameter>start</parameter> <type>integer</type> </optional> + <optional>, <parameter>flags</parameter> <type>text</type> </optional> ) <returnvalue>text</returnvalue> </para> <para> @@ -3447,7 +3447,7 @@ <function>regexp_replace</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter><type>text</type>, <parameter>replacement</parameter> <type>text</type>, <parameter>start</parameter> <type>integer</type>, <parameter>N</parameter> <type>integer</type> - [, <parameter>flags</parameter> <type>text</type> ] ) + <optional>, <parameter>flags</parameter> <type>text</type> </optional> ) <returnvalue>text</returnvalue> </para> <para> @@ -3467,7 +3467,7 @@ <indexterm> <primary>regexp_split_to_array</primary> </indexterm> - <function>regexp_split_to_array</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter><type>text</type> [, <parameter>flags</parameter> <type>text</type> ] ) + <function>regexp_split_to_array</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter><type>text</type> <optional>, <parameter>flags</parameter> <type>text</type> </optional> ) <returnvalue>text[]</returnvalue> </para> <para> @@ -3486,7 +3486,7 @@ <indexterm> <primary>regexp_split_to_table</primary> </indexterm> - <function>regexp_split_to_table</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter><type>text</type> [, <parameter>flags</parameter> 
<type>text</type> ] ) + <function>regexp_split_to_table</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter><type>text</type> <optional>, <parameter>flags</parameter> <type>text</type> </optional> ) <returnvalue>setof text</returnvalue> </para> <para> @@ -3510,10 +3510,10 @@ <primary>regexp_substr</primary> </indexterm> <function>regexp_substr</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter><type>text</type> - [, <parameter>start</parameter> <type>integer</type> - [, <parameter>N</parameter> <type>integer</type> - [, <parameter>flags</parameter> <type>text</type> - [, <parameter>subexpr</parameter> <type>integer</type> ] ] ] ] ) + <optional>, <parameter>start</parameter> <type>integer</type> + <optional>, <parameter>N</parameter> <type>integer</type> + <optional>, <parameter>flags</parameter> <type>text</type> + <optional>, <parameter>subexpr</parameter> <type>integer</type> </optional> </optional> </optional> </optional>) <returnvalue>text</returnvalue> </para> <para> @@ -3980,7 +3980,7 @@ <para> <synopsis> -<function>format</function>(<parameter>formatstr</parameter> <type>text</type> [, <parameter>formatarg</parameter> <type>"any"</type>[, ...] ]) +<function>format</function>(<parameter>formatstr</parameter> <type>text</type> <optional>, <parameter>formatarg</parameter><type>"any"</type> [, ...] </optional>) </synopsis> <parameter>formatstr</parameter> is a format string that specifies how the result should be formatted. 
Text in the format string is copied @@ -10568,7 +10568,7 @@ <para> <synopsis> -date_trunc(<replaceable>field</replaceable>, <replaceable>source</replaceable> [, <replaceable>time_zone</replaceable> ]) +date_trunc(<replaceable>field</replaceable>, <replaceable>source</replaceable> <optional>, <replaceable>time_zone</replaceable></optional>) </synopsis> <replaceable>source</replaceable> is a value expression of type <type>timestamp</type>, <type>timestamp with time zone</type>, @@ -29308,11 +29308,11 @@ <indexterm> <primary>pg_logical_emit_message</primary> </indexterm> - <function>pg_logical_emit_message</function> ( <parameter>transactional</parameter> <type>boolean</type>, <parameter>prefix</parameter><type>text</type>, <parameter>content</parameter> <type>text</type> [, <parameter>flush</parameter><type>boolean</type> <literal>DEFAULT</literal> <literal>false</literal>] ) + <function>pg_logical_emit_message</function> ( <parameter>transactional</parameter> <type>boolean</type>, <parameter>prefix</parameter><type>text</type>, <parameter>content</parameter> <type>text</type> <optional>, <parameter>flush</parameter><type>boolean</type> <literal>DEFAULT</literal> <literal>false</literal></optional> ) <returnvalue>pg_lsn</returnvalue> </para> <para role="func_signature"> - <function>pg_logical_emit_message</function> ( <parameter>transactional</parameter> <type>boolean</type>, <parameter>prefix</parameter><type>text</type>, <parameter>content</parameter> <type>bytea</type> [, <parameter>flush</parameter><type>boolean</type> <literal>DEFAULT</literal> <literal>false</literal>] ) + <function>pg_logical_emit_message</function> ( <parameter>transactional</parameter> <type>boolean</type>, <parameter>prefix</parameter><type>text</type>, <parameter>content</parameter> <type>bytea</type> <optional>, <parameter>flush</parameter><type>boolean</type> <literal>DEFAULT</literal> <literal>false</literal></optional> ) <returnvalue>pg_lsn</returnvalue> </para> <para> -- 2.39.2
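For anyone wanting to check whether other signatures still use literal square brackets for optional parameters, a small detector along the following lines might help. This is a hedged Python sketch, not part of the patch: the function name is hypothetical, the sample fragment is made up from signatures that appear in the patch, and it deliberately ignores the `[, ...]` ellipsis used for variadic arguments, which the patch leaves alone.

```python
import re

SIGNATURE = re.compile(r'<para role="func_signature">(.*?)</para>', re.DOTALL)
# A literal "[" followed by ", <parameter>" marks a bracket-style optional
# parameter; "[, ...]" and array types such as text[] do not match.
BRACKET_OPTIONAL = re.compile(r"\[\s*,\s*<parameter>")


def bracket_style_signatures(sgml_text):
    """Return names of functions whose signature uses [ ] for optional parameters."""
    hits = []
    for match in SIGNATURE.finditer(sgml_text):
        body = match.group(1)
        if BRACKET_OPTIONAL.search(body):
            name = re.search(r"<function>([^<]+)</function>", body)
            hits.append(name.group(1) if name else "?")
    return hits


sample = '''
<para role="func_signature">
 <function>regexp_like</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter><type>text</type>
 [, <parameter>flags</parameter> <type>text</type> ] )
</para>
<para role="func_signature">
 <function>regexp_match</function> ( <parameter>string</parameter> <type>text</type>, <parameter>pattern</parameter><type>text</type> <optional>, <parameter>flags</parameter> <type>text</type> </optional> )
</para>
<para role="func_signature">
 <function>concat</function> ( <parameter>val1</parameter> <type>"any"</type>
 <optional>, <parameter>val2</parameter> <type>"any"</type> [, ...] </optional> )
</para>
'''
print(bracket_style_signatures(sample))
```

Signatures inside `<synopsis>` blocks, such as the date_trunc one touched by the patch, would need a separate pattern.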
On Thu, Apr 18, 2024 at 2:37 AM Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> wrote: > > Andres Freund <andres@anarazel.de> writes: > > > Hi, > > > > On 2024-04-17 12:07:24 +0100, Dagfinn Ilmari Mannsåker wrote: > >> Andres Freund <andres@anarazel.de> writes: > >> > I think the manual work for writing signatures in sgml is not insignificant, > >> > nor is the volume of sgml for them. Manually maintaining the signatures makes > >> > it impractical to significantly improve the presentation - which I don't think > >> > is all that great today. > >> > >> And it's very inconsistent. For example, some functions use <optional> > >> tags for optional parameters, others use square brackets, and some use > >> <literal>VARIADIC</literal> to indicate variadic parameters, others use > >> ellipses (sometimes in <optional> tags or brackets). > > > > That seems almost inevitably the outcome of many people having to manually > > infer the recommended semantics, for writing something boring but nontrivial, > > from a 30k line file. > > As Corey mentioned elsethread, having a markup style guide (maybe a > comment at the top of the file?) would be nice. > > >> > And the lack of argument names in the pg_proc entries is occasionally fairly > >> > annoying, because a \df+ doesn't provide enough information to use functions. > >> > >> I was also annoyed by this the other day (specifically wrt. the boolean > >> arguments to pg_ls_dir), > > > > My bane is regexp_match et al, I have given up on remembering the argument > > order. > > There's a thread elsewhere about those specifically, but I can't be > bothered to find the link right now. > > >> and started whipping up a Perl script to parse func.sgml and generate > >> missing proargnames values for pg_proc.dat, which is how I discovered the > >> above. > > > > Nice. 
> >
> >> The script currently has a pile of hacky regexes to cope with that,
> >> so I'd be happy to submit a doc patch to turn it into actual markup to get
> >> rid of that, if people think that's a worthwhile use of time and won't clash
> >> with any other plans for the documentation.
> >
> > I guess it's a bit hard to say without knowing how voluminous the changes
> > would be. If we end up rewriting the whole file the tradeoff is less clear
> > than if it's a dozen inconsistent entries.
>
> It turned out to not be that many that used [] for optional parameters,
> see the attached patch.
>

Hi. I manually checked the HTML output. It looks good to me.
> I haven't dealt with variadic yet, since the two styles are visually
> different, not just markup (<optional>...</optional> renders as [...]).
> The two styles for variadic are what I call caller-style:
> concat ( val1 "any" [, val2 "any" [, ...] ] )
> format(formatstr text [, formatarg "any" [, ...] ])
While this style is obviously clumsier for us to compose, it does avoid relying on the user understanding what the word variadic means. Searching through online documentation of the Python *args parameter, the word variadic never comes up; the closest they get is "variable length argument". I realize that Python is not SQL, but I think it's a good point of reference for what concepts the average reader is likely to know.
Looking at the patch, I think it is good, though I'd consider doing some indentation for the nested <optional>s to allow the author to do more visual tag-matching. The ']'s were sufficiently visually distinct that we didn't really need or want nesting, but <optional> is just another tag to my eyes in a sea of tags.
Corey Huinker <corey.huinker@gmail.com> writes:
>> I haven't dealt with variadic yet, since the two styles are visually
>> different, not just markup (<optional>...</optional> renders as [...]).
>>
>> The two styles for variadic are what I call caller-style:
>>
>> concat ( val1 "any" [, val2 "any" [, ...] ] )
>> format(formatstr text [, formatarg "any" [, ...] ])
>
> While this style is obviously clumsier for us to compose, it does avoid
> relying on the user understanding what the word variadic means. Searching
> through online documentation of the python *args parameter, the word
> variadic never comes up, the closest they get is "variable length
> argument". I realize that python is not SQL, but I think it's a good point
> of reference for what concepts the average reader is likely to know.
Yeah, we can't expect everyone wanting to call a built-in function to know how they would define an equivalent one themselves. In that case I propose marking it up like this:
<function>format</function> (
<parameter>formatstr</parameter> <type>text</type>
<optional>, <parameter>formatarg</parameter> <type>"any"</type>
<optional>, ...</optional> </optional> )
<returnvalue>text</returnvalue>
> Looking at the patch, I think it is good, though I'd consider doing some
> indentation for the nested <optional>s to allow the author to do more
> visual tag-matching. The ']'s were sufficiently visually distinct that we
> didn't really need or want nesting, but <optional> is just another tag to
> my eyes in a sea of tags.
The requisite nesting when there are multiple optional parameters makes it annoying to wrap and indent it "properly" per XML convention, but how about something like this, with each parameter on a line of its own, and all the closing </optional> tags on one line?
<function>regexp_substr</function> (
<parameter>string</parameter> <type>text</type>,
<parameter>pattern</parameter> <type>text</type>
<optional>, <parameter>start</parameter> <type>integer</type>
<optional>, <parameter>N</parameter> <type>integer</type>
<optional>, <parameter>flags</parameter> <type>text</type>
<optional>, <parameter>subexpr</parameter> <type>integer</type>
</optional> </optional> </optional> </optional> )
<returnvalue>text</returnvalue>
A lot of functions mostly follow this style, except they tend to put the first parameter on the same line as the function name, even when that makes the line overly long. I propose going the other way, with each parameter on a line of its own, even if the first one would fit after the function name, except when the whole parameter list fits after the function name.
Also, when there's only one optional argument, or they're independently optional, not nested, the </optional> tag should go on the same line as the parameter.
<function>substring</function> (
<parameter>bits</parameter> <type>bit</type>
<optional> <literal>FROM</literal> <parameter>start</parameter> <type>integer</type> </optional>
<optional> <literal>FOR</literal> <parameter>count</parameter> <type>integer</type> </optional> )
<returnvalue>bit</returnvalue>
I'm not quite sure what to do with things like json_object which have even more complex nesting of optional parameters, but I do think the current 200+ character lines are too long.
- ilmari
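To make the proposed layout concrete, here is a minimal Python sketch that emits a signature in that shape: one parameter per line, nested optional parameters each opening an <optional>, and all closing tags gathered on one line. The function name render_signature and its argument shapes are hypothetical, invented for illustration; nothing like it exists in the PostgreSQL tree, and it only covers the simple "each optional depends on the previous one" case, not independently optional arguments or json_object-style signatures.

```python
def render_signature(name, required, optional, returns):
    """Render a func_signature in the one-parameter-per-line layout.

    required / optional are lists of (parameter_name, type_name) pairs.
    Optional parameters are assumed to be nested: each one can only be
    given if the previous one is given too.
    """
    lines = [f"<function>{name}</function> ("]
    for i, (param, type_name) in enumerate(required):
        # Comma after every required parameter except the last one.
        sep = "," if i < len(required) - 1 else ""
        lines.append(f"<parameter>{param}</parameter> <type>{type_name}</type>{sep}")
    if len(optional) == 1:
        # A single optional argument keeps its closing tag on the same line.
        param, type_name = optional[0]
        lines.append(
            f"<optional>, <parameter>{param}</parameter> "
            f"<type>{type_name}</type> </optional> )"
        )
    elif optional:
        for param, type_name in optional:
            lines.append(
                f"<optional>, <parameter>{param}</parameter> <type>{type_name}</type>"
            )
        # All closing tags go on one line, so they can be counted
        # against the number of <optional> lines above.
        lines.append(" ".join(["</optional>"] * len(optional)) + " )")
    else:
        lines[-1] += " )"
    lines.append(f"<returnvalue>{returns}</returnvalue>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_signature(
        "regexp_substr",
        [("string", "text"), ("pattern", "text")],
        [("start", "integer"), ("N", "integer"),
         ("flags", "text"), ("subexpr", "integer")],
        "text",
    ))
```

Run as a script, this prints the regexp_substr markup in the same shape as the example above, without any indentation.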
Yeah, we can't expect everyone wanting to call a built-in function to
know how they would define an equivalent one themselves. In that case I
propose marking it up like this:
<function>format</function> (
<parameter>formatstr</parameter> <type>text</type>
<optional>, <parameter>formatarg</parameter> <type>"any"</type>
<optional>, ...</optional> </optional> )
<returnvalue>text</returnvalue>
Looks good, but I guess I have to ask: is there a parameter-list tag out there instead of (, and should we be using that?
The requisite nesting when there are multiple optional parameters makes
it annoying to wrap and indent it "properly" per XML convention, but how
about something like this, with each parameter on a line of its own, and
all the closing </optional> tags on one line?
<function>regexp_substr</function> (
<parameter>string</parameter> <type>text</type>,
<parameter>pattern</parameter> <type>text</type>
<optional>, <parameter>start</parameter> <type>integer</type>
<optional>, <parameter>N</parameter> <type>integer</type>
<optional>, <parameter>flags</parameter> <type>text</type>
<optional>, <parameter>subexpr</parameter> <type>integer</type>
</optional> </optional> </optional> </optional> )
<returnvalue>text</returnvalue>
Yes, that has an easy count-the-vertical, count-the-horizontal, do-they-match flow to it.
A lot of functions mostly follow this style, except they tend to put the
first parameter on the same line as the function name, even when that
makes the line overly long. I propose going the other way, with each
parameter on a line of its own, even if the first one would fit after
the function name, except when the whole parameter list fits after the
function name.
+1
Also, when there's only one optional argument, or they're independently
optional, not nested, the </optional> tag should go on the same line as
the parameter.
<function>substring</function> (
<parameter>bits</parameter> <type>bit</type>
<optional> <literal>FROM</literal> <parameter>start</parameter> <type>integer</type> </optional>
<optional> <literal>FOR</literal> <parameter>count</parameter> <type>integer</type> </optional> )
<returnvalue>bit</returnvalue>
+1
On Wed, Apr 17, 2024 at 7:07 PM Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> wrote:
> > It'd also be quite useful if clients could render more of the documentation
> > for functions. People are used to language servers providing full
> > documentation for functions etc...
>
> A more user-friendly version of \df+ (maybe spelled \hf, for symmetry
> with \h for commands?) would certainly be nice.
I think `\hf` is useful. Otherwise people first need to google for the function's HTML page, then use Ctrl + F to locate the specific function entry.
For \hf we may need to offer a doc URL link, but currently many functions are unlinkable in the docs. Also, one section can contain many sections. I guess just linking directly to a nearby position in an HTML page should be fine.
We could also add a URL anchor for each function, named with an underscore as MySQL does (https://dev.mysql.com/doc/refman/8.3/en/string-functions.html#function_concat). I am not sure it is an elegant solution.
On Mon, Apr 15, 2024 at 1:00 PM jian he <jian.universality@gmail.com> wrote:
> On Wed, Mar 20, 2024 at 5:40 AM Andrew Dunstan <andrew@dunslane.net> wrote:
> >
> > +many for improving the index.
> >
> > My own pet docs peeve is a purely editorial one: func.sgml is a 30k line beast, and I think there's a good case for splitting out at least the larger chunks of it.
>
> I think I successfully reduced func.sgml from 311322 lines to 13167 lines.
> (base-commit: 93582974315174d544592185d797a2b44696d1e5)
>
> A single patch would be unreviewable, so I've split it into 7 patches. Each patch splits one <sect1> into a separate new file.
> func-string.sgml
> func-matching.sgml
> func-datetime.sgml
> func-json.sgml
> func-aggregate.sgml
> func-info.sgml
> func-admin.sgml
The above are the newly created files, each corresponding to one of the individual patches.
Attachment
- v1-0001-split-sect1-functions-string-from-func.sgml.patch
- v1-0006-split-sect1-functions-info-from-func.sgml.patch
- v1-0005-split-sect1-functions-aggregate-from-func.sgml.patch
- v1-0004-split-sect1-functions-json-from-func.sgml.patch
- v1-0007-split-sect1-functions-admin-from-func.sgml.patch
- v1-0003-split-functions-datetime-sect1-from-func.sgml.patch
- v1-0002-split-functions-matching-sect1-from-func.sgml.patch
I've split it into 7 patches.
Each patch splits one <sect1> into a separate new file.
Seems like a good start. Looking at the diffs of these, I wonder if we would be better off with a func/ directory, each function gets its own file in that dir, and either these files above include the individual files, or the original func.sgml just becomes the organizer of all the functions. That would allow us to do future reorganizations with minimal churn, make validation of this patch a bit more straightforward, and make it easier for future editors to find the function they need to edit.
On Mon, Apr 29, 2024 at 1:17 PM Corey Huinker <corey.huinker@gmail.com> wrote:
>> I've split it into 7 patches.
>> Each patch splits one <sect1> into a separate new file.
>
> Seems like a good start. Looking at the diffs of these, I wonder if we would be better off with a func/ directory, each function gets its own file in that dir, and either these files above include the individual files, or the original func.sgml just becomes the organizer of all the functions. That would allow us to do future reorganizations with minimal churn, make validation of this patch a bit more straightforward, and make it easier for future editors to find the function they need to edit.
Looking back, the patch is big and there is no convenient way to review or validate it, so I created a Python script to automate the split. We can review the Python script instead. (I wrote it by googling around; I know little about Python.) The script does the following:
* create new files for holding the content: func-string.sgml func-matching.sgml func-datetime.sgml func-json.sgml func-aggregate.sgml func-info.sgml func-admin.sgml
* locate the parts that need to be copied to a newly created file, based on line number. The line number pattern is mentioned here: http://postgr.es/m/CACJufxEcMjjn-m6fpC2wXHsQbE5nyd%3Dxt6k-jDizBVUKK6O4KQ%40mail.gmail.com
* insert placeholder strings in func.sgml: &func-string; &func-matching; &func-datetime; &func-json; &func-aggregate; &func-info; &func-admin;
* copy the parts to the new files.
* validate each newly created file (it must have only 2 occurrences of "sect1").
* delete the parts from func.sgml, since they have already been copied to the new files:
sed --in-place "2408,4180d ; 5330,7760d ; 8942,11127d ; 15502,19436d ; 21567,22985d ; 24346,28017d ; 28020,30714d " func.sgml
* manually change doc/src/sgml/filelist.sgml:
<!ENTITY func SYSTEM "func.sgml">
+<!ENTITY func-string SYSTEM "func-string.sgml">
+<!ENTITY func-matching SYSTEM "func-matching.sgml">
+<!ENTITY func-datetime SYSTEM "func-datetime.sgml">
+<!ENTITY func-json SYSTEM "func-json.sgml">
+<!ENTITY func-aggregate SYSTEM "func-aggregate.sgml">
+<!ENTITY func-info SYSTEM "func-info.sgml">
+<!ENTITY func-admin SYSTEM "func-admin.sgml">
There are 2 requirements:
1. manually change doc/src/sgml/filelist.sgml as mentioned before;
2. in the Python script, at line 35, I use "os.chdir("/home/jian/Desktop/pg_src/src7/postgres/doc/src/sgml")"; you need to change this to your "doc/src/sgml" directory accordingly.
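The steps listed above can be sketched roughly as follows. This is not the attached script, only a minimal illustration of the same idea under stated assumptions: the helper names split_by_ranges and has_one_sect1 are hypothetical, the line ranges are 1-indexed and inclusive like the sed ranges, and the sketch works on an in-memory list of lines rather than editing func.sgml on disk.

```python
def split_by_ranges(lines, ranges):
    """Cut line ranges out of a document and leave entity placeholders.

    lines:  list of strings (the content of func.sgml, one item per line)
    ranges: list of (entity_name, first_line, last_line) tuples,
            1-indexed and inclusive, assumed not to overlap
    Returns (remaining_lines, {entity_name: extracted_lines}).
    """
    remaining = list(lines)
    extracted = {}
    # Work from the bottom of the file upwards so that the line numbers
    # of ranges not yet processed stay valid.
    for name, first, last in sorted(ranges, key=lambda r: r[1], reverse=True):
        extracted[name] = remaining[first - 1:last]
        remaining[first - 1:last] = [f"&{name};\n"]
    return remaining, extracted


def has_one_sect1(chunk):
    """Validation from the list above: exactly 2 occurrences of "sect1",
    i.e. one opening and one closing tag."""
    return sum(line.count("sect1") for line in chunk) == 2


if __name__ == "__main__":
    doc = [
        "<chapter>\n",
        '<sect1 id="functions-string">\n', "string stuff\n", "</sect1>\n",
        '<sect1 id="functions-json">\n', "json stuff\n", "</sect1>\n",
        "</chapter>\n",
    ]
    rest, parts = split_by_ranges(doc, [("func-string", 2, 4), ("func-json", 5, 7)])
    print("".join(rest))
    for name, chunk in parts.items():
        print(name, has_one_sect1(chunk))
```

Writing each extracted chunk to its own file and adding the matching ENTITY line to filelist.sgml would still be separate steps, as in the description above.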
looking back.
The patch is big. no convenient way to review/validate it.
Perhaps we can break up the patches as follows:
1. create the filelist.sgml entries, and create new files as you detailed, empty with func.sgml still managing the sections, but each section now has its corresponding &func-something; The files are included, but they're completely empty.
2 to 999+. one commit per function moved from func.sgml to its corresponding func-something.sgml.
It'll be a ton of commits, but each one will be very easy to review.
Alternately, if we put each function in its own file, there would be a series of commits, one per function like such:
1. create the new func-my-function.sgml, copying the definition of the same name
2. delete the definition in func.sgml, replaced with the &func-include;
3. new entry in the filelist.
This approach looks (and IS) tedious, but it has several key advantages:
1. Each one is very easy to review.
2. Big reduction in future merge conflicts on func.sgml.
3. location of a given function's docs is now trivial.
4. separation of concerns with regard to content of function def vs placement of same.
5. Easy to ensure that all functions have an anchor.
6. The effort can stall and be resumed at our own pace.
Perhaps your python script can be adapted to this approach? I'm willing to review, or collaborate, or both.
On 2024-07-18 Th 4:16 PM, Corey Huinker wrote:
looking back.
The patch is big. no convenient way to review/validate it.
Perhaps we can break up the patches as follows:
1. create the filelist.sgml entries, and create new files as you detailed, empty with func.sgml still managing the sections, but each section now has its corresponding &func-something; The files are included, but they're completely empty.
2 to 999+. one commit per function moved from func.sgml to its corresponding func-something.sgml.
It'll be a ton of commits, but each one will be very easy to review.
Alternately, if we put each function in its own file, there would be a series of commits, one per function like such:
1. create the new func-my-function.sgml, copying the definition of the same name
2. delete the definition in func.sgml, replaced with the &func-include;
3. new entry in the filelist.
This approach looks (and IS) tedious, but it has several key advantages:
1. Each one is very easy to review.
2. Big reduction in future merge conflicts on func.sgml.
3. location of a given function's docs is now trivial.
4. separation of concerns with regard to content of function def vs placement of same.
5. Easy to ensure that all functions have an anchor.
6. The effort can stall and be resumed at our own pace.
Perhaps your python script can be adapted to this approach? I'm willing to review, or collaborate, or both.
I'm opposed to having a separate file for every function. I think breaking up func.sgml into one piece per sect1 is about right. If that proves cumbersome still we can look at breaking it up further, but let's start with that.
More concretely, sometimes the bits that relate to a particular function are not contiguous. e.g. you might have an entry in a table for a function and then at the end Notes relating to that function. That makes doing a file per function not very practical.
Also, being able to view the context for your function documentation is useful when editing.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Andrew Dunstan <andrew@dunslane.net> writes:
> I'm opposed to having a separate file for every function.
I also think that would be a disaster. It would result in a huge number of files which would make global editing (eg markup changes) really painful, and it by no means flattens the document structure. Somewhere there will still need to be decisions that functions A, B, C go into one documentation section while D, E, F go somewhere else.
> I think
> breaking up func.sgml into one piece per sect1 is about right. If that
> proves cumbersome still we can look at breaking it up further, but let's
> start with that.
+1
regards, tom lane
> I'm opposed to having a separate file for every function. I think
> breaking up func.sgml into one piece per sect1 is about right. If that
> proves cumbersome still we can look at breaking it up further, but
> let's start with that.
That will create at least 30 func-xx.sgml files.
t-ishii$ grep '<sect1' func.sgml|wc -l
30
I am afraid that's too many?
Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
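For anyone who wants the section names as well as the count, a rough Python counterpart to the grep above might look like this. The helper name sect1_ids is hypothetical, and the sketch assumes every opening <sect1> tag carries an id="..." attribute inside the same tag, which is not verified against the real func.sgml here.

```python
import re

# Matches an opening <sect1 ...> tag and captures its id attribute.
# Closing tags (</sect1>) do not match because of the leading "<sect1".
SECT1_OPEN = re.compile(r'<sect1\b[^>]*\bid="([^"]+)"')


def sect1_ids(text):
    """Return the ids of all <sect1> elements, in document order."""
    return SECT1_OPEN.findall(text)


if __name__ == "__main__":
    sample = (
        '<sect1 id="functions-string">\n</sect1>\n'
        '<sect1 id="functions-json">\n</sect1>\n'
    )
    ids = sect1_ids(sample)
    print(len(ids), ids)
```

Pointing it at the text of func.sgml instead of the sample string would give one id per prospective file.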
On Thu, Jul 18, 2024 at 8:06 PM Tatsuo Ishii <ishii@postgresql.org> wrote:
> I'm opposed to having a separate file for every function. I think
> breaking up func.sgml into one piece per sect1 is about right. If that
> proves cumbersome still we can look at breaking it up further, but
> let's start with that.
That will create at least 30 func-xx.sgml files.
t-ishii$ grep '<sect1' func.sgml|wc -l
30
I am afraid that's too many?
The premise and the resultant number of files both seem reasonable to me. I could get that number down to maybe 20 if pressed but I don't see any benefit to doing so. I look at a page on the website that needs updating then go open its source file. Nice and tidy.
David J.
On Fri, Jul 19, 2024 at 12:07 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
> On Thu, Jul 18, 2024 at 8:06 PM Tatsuo Ishii <ishii@postgresql.org> wrote:
>>
>> > I'm opposed to having a separate file for every function. I think
>> > breaking up func.sgml into one piece per sect1 is about right. If that
>> > proves cumbersome still we can look at breaking it up further, but
>> > let's start with that.
>>
>> That will create at least 30 func-xx.sgml files.
>>
>> t-ishii$ grep '<sect1' func.sgml|wc -l
>> 30
>>
>> I am afraid that's too many?
>
> The premise and the resultant number of files both seem reasonable to me. I could get that number down to maybe 20 if pressed but I don't see any benefit to doing so. I look at a page on the website that needs updating then go open its source file. Nice and tidy.
hi.
My Python test.py script in [1] cuts func.sgml from 31k lines to 13k lines by moving some content to these 7 new files:
new file: func-admin.sgml
new file: func-aggregate.sgml
new file: func-datetime.sgml
new file: func-info.sgml
new file: func-json.sgml
new file: func-matching.sgml
new file: func-string.sgml
The doc/src/sgml/filelist.sgml changes are addressed in the attached patch. Cutting func.sgml in half is ok for me.
In test.py, at line 35, I use "os.chdir("/home/jian/Desktop/pg_src/src7/postgres/doc/src/sgml")"; you need to change this to your corresponding "doc/src/sgml" directory.
Overall, you need to apply the attached patch and change test.py line 35.
The main gotcha is that the script relies on the line-number pattern mentioned in [2] and the index layout from https://www.postgresql.org/docs/devel/functions.html
[1] https://postgr.es/m/CACJufxH%2BYi521QrncwnW4sFGOhPmJQpsmoJ%2BYnj%2BVpHu5wAahQ%40mail.gmail.com
[2] http://postgr.es/m/CACJufxEcMjjn-m6fpC2wXHsQbE5nyd%3Dxt6k-jDizBVUKK6O4KQ%40mail.gmail.com
"David G. Johnston" <david.g.johnston@gmail.com> writes: > On Thu, Jul 18, 2024 at 8:06 PM Tatsuo Ishii <ishii@postgresql.org> wrote: >>> I'm opposed to having a separate file for every function. I think >>> breaking up func.sgml into one piece per sect1 is about right. >> That will create at least 30 func-xx.sgml files. >> I am afraid that's too many? > The premise and the resultant number of files both seem reasonable to me. I agree. The hundreds that would result from file-per-function, or anything close to that, would be too many. But I can deal with file-per-sect1. For context, I count currently 167 sgml/*.sgml files plus 219 ref/*.sgml, so adding 30 more would be an 8% increase. Do we want to use a "func-" prefix on the file names? I could imagine dispensing with that as unnecessary; or another idea could be to make a new subdirectory func/ for these. regards, tom lane
> Do we want to use a "func-" prefix on the file names? I could
> imagine dispensing with that as unnecessary;
If we don't use the prefix and we generate new file names from sect1 tag, we could have file name collision: for example, json.sgml because there's sect1 tag "functions-json".
> or another idea
> could be to make a new subdirectory func/ for these.
+1. Looks better than adding +30 files right under sgml directory.
Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On Fri, Jul 19, 2024 at 1:01 PM Tatsuo Ishii <ishii@postgresql.org> wrote:
> Do we want to use a "func-" prefix on the file names? I could
> imagine dispensing with that as unnecessary;
If we don't use the prefix and we generate new file names from sect1
tag, we could have file name collision: for example, json.sgml because
there's sect1 tag "functions-json".
> or another idea
> could be to make a new subdirectory func/ for these.
+1. Looks better than adding +30 files right under sgml directory.
+many for placing these under a subdirectory.
IMO the file name should match the ID of the sect1 element with the leading "functions-" removed, naming the directory "functions". Thus when viewing the web page the corresponding sgml file is determinable.
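As a small illustration of the naming rule proposed here, the mapping from a sect1 ID to its source file could be written like this. The function name source_file_for is hypothetical, and both the "functions" directory name and the doc/src/sgml base path are parameters of the sketch rather than settled choices (the directory name was still under discussion at this point in the thread).

```python
from pathlib import PurePosixPath


def source_file_for(sect1_id, directory="functions", prefix="functions-"):
    """Map a sect1 ID such as "functions-json" to its SGML source file.

    The leading prefix is stripped and the rest becomes the file name,
    so the ID visible in a web page URL identifies the file to edit.
    """
    if not sect1_id.startswith(prefix):
        raise ValueError(f"{sect1_id!r} does not start with {prefix!r}")
    stem = sect1_id[len(prefix):]
    return PurePosixPath("doc/src/sgml") / directory / f"{stem}.sgml"


if __name__ == "__main__":
    # The page functions-json.html would correspond to this file.
    print(source_file_for("functions-json"))
    # The shorter directory name is just a different argument.
    print(source_file_for("functions-json", directory="func"))
```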
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes: > +many for placing these under a subdirectory. > IMO the file name should match the ID of the sect1 element with the leading > "functions-" removed, naming the directory "functions". Thus when viewing > the web page the corresponding sgml file is determinable. I'd go for shorter myself (ie "func/"), mainly due to the precedent of the existing subdirectory which is "ref/" not "reference/". It's hardly a big deal though. regards, tom lane
>> IMO the file name should match the ID of the sect1 element with the leading
>> "functions-" removed, naming the directory "functions". Thus when viewing
>> the web page the corresponding sgml file is determinable.
>
> I'd go for shorter myself (ie "func/"), mainly due to the precedent
> of the existing subdirectory which is "ref/" not "reference/".
> It's hardly a big deal though.
I don't have a strong preference either, but I agree that "func/" is more consistent with existing subdirectory names.
Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
On Fri, Jul 19, 2024 at 5:47 PM Tatsuo Ishii <ishii@postgresql.org> wrote:
>> IMO the file name should match the ID of the sect1 element with the leading
>> "functions-" removed, naming the directory "functions". Thus when viewing
>> the web page the corresponding sgml file is determinable.
>
> I'd go for shorter myself (ie "func/"), mainly due to the precedent
> of the existing subdirectory which is "ref/" not "reference/".
> It's hardly a big deal though.
I don't have a strong preference either, but I agree that "func/" is
more consistent with existing subdirectory names.
Works for me. I was definitely on the fence between the two.
David J.
1. manually change line 4 in split_func_sgml.py and run the script.
2. git apply v6-0001-all-filelist-for-directory-doc-src-sgml-func.patch
Now you can run "ninja doc/src/sgml/html".
The logic is simple, but I am using verbose variable names; that's why split_func_sgml.py is large.
This adds a new directory, doc/src/sgml/func, containing doc/src/sgml/func/allfiles.sgml and the other new files.
For file names, we can use "func-pattern" or just "pattern". Right now, I use the prefix "func".
Each newly created file is validated as follows: only 2 occurrences of "sect1" exist, and each file has only one "<sect1 id=" pattern.
The following is the content of doc/src/sgml/func/allfiles.sgml:
<!-- doc/src/sgml/func/allfiles.sgml PostgreSQL documentation Complete list of usable sgml source files in this directory. -->
<!-- function references -->
<!ENTITY func SYSTEM "func.sgml">
<!ENTITY func-logical SYSTEM "func-logical.sgml">
<!ENTITY func-comparison SYSTEM "func-comparison.sgml">
<!ENTITY func-math SYSTEM "func-math.sgml">
<!ENTITY func-string SYSTEM "func-string.sgml">
<!ENTITY func-binarystring SYSTEM "func-binarystring.sgml">
<!ENTITY func-bitstring SYSTEM "func-bitstring.sgml">
<!ENTITY func-matching SYSTEM "func-matching.sgml">
<!ENTITY func-formatting SYSTEM "func-formatting.sgml">
<!ENTITY func-datetime SYSTEM "func-datetime.sgml">
<!ENTITY func-enum SYSTEM "func-enum.sgml">
<!ENTITY func-geometry SYSTEM "func-geometry.sgml">
<!ENTITY func-net SYSTEM "func-net.sgml">
<!ENTITY func-textsearch SYSTEM "func-textsearch.sgml">
<!ENTITY func-uuid SYSTEM "func-uuid.sgml">
<!ENTITY func-xml SYSTEM "func-xml.sgml">
<!ENTITY func-json SYSTEM "func-json.sgml">
<!ENTITY func-sequence SYSTEM "func-sequence.sgml">
<!ENTITY func-conditional SYSTEM "func-conditional.sgml">
<!ENTITY func-array SYSTEM "func-array.sgml">
<!ENTITY func-range SYSTEM "func-range.sgml">
<!ENTITY func-aggregate SYSTEM "func-aggregate.sgml">
<!ENTITY func-window SYSTEM "func-window.sgml">
<!ENTITY func-merge-support SYSTEM "func-merge-support.sgml">
<!ENTITY func-subquery SYSTEM "func-subquery.sgml">
<!ENTITY func-comparisons SYSTEM "func-comparisons.sgml">
<!ENTITY func-srf SYSTEM "func-srf.sgml">
<!ENTITY func-info SYSTEM "func-info.sgml">
<!ENTITY func-admin SYSTEM "func-admin.sgml">
<!ENTITY func-trigger SYSTEM "func-trigger.sgml">
<!ENTITY func-event-triggers SYSTEM "func-event-triggers.sgml">
<!ENTITY func-statistics SYSTEM "func-statistics.sgml">
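The entity list and the per-file check described above are mechanical enough to sketch in a few lines of Python. This is an illustration only, not split_func_sgml.py itself; the helper names entity_declarations and looks_like_one_sect1 are hypothetical, and the check simply restates the two stated conditions (two occurrences of "sect1", one "<sect1 id=").

```python
def entity_declarations(names):
    """Build one ENTITY line per file, in the style of allfiles.sgml."""
    return [f'<!ENTITY {name} SYSTEM "{name}.sgml">' for name in names]


def looks_like_one_sect1(text):
    """Check that a split-out file holds exactly one sect1 element:
    "sect1" appears twice (open and close tag) and only one of those
    occurrences is an opening tag with an id."""
    return text.count("sect1") == 2 and text.count("<sect1 id=") == 1


if __name__ == "__main__":
    for line in entity_declarations(["func", "func-logical", "func-json"]):
        print(line)
    good = '<sect1 id="functions-json">\n<title>JSON</title>\n</sect1>\n'
    bad = good + '<sect1 id="functions-xml">\n</sect1>\n'
    print(looks_like_one_sect1(good), looks_like_one_sect1(bad))
```

A cross-reference that mentions "sect1" in running text would trip the first condition, so a failure means "look at this file by hand", not necessarily "the split is wrong".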
On Tue, Aug 20, 2024 at 11:18 AM jian he <jian.universality@gmail.com> wrote:
> On Mon, Jul 22, 2024 at 10:42 AM jian he <jian.universality@gmail.com> wrote:
> >
> hi. this time everything should be ok!
>
> step1. python3 split_func_sgml.py
> step2. git apply v6-0001-all-filelist-for-directory-doc-src-sgml-func.patch
> (step2 cannot use "git am").
Sorry, I mean the files attached in the previous mail.
step1. python3 v7_split_func_sgml.py
step2. git apply v7-0001-all-filelist-for-directory-doc-src-sgml-func.patch
(step2 cannot use "git am").
On 30/09/2024 11:15, jian he wrote:
> In, 0001-func.sgml-Consistently-use-optional-to-indicate-opti.patch
>
> -<function>format</function>(<parameter>formatstr</parameter>
> <type>text</type> [, <parameter>formatarg</parameter>
> <type>"any"</type> [, ...] ])
> +<function>format</function>(<parameter>formatstr</parameter>
> <type>text</type> <optional>, <parameter>formatarg</parameter>
> <type>"any"</type> [, ...] </optional>)
>
> i change it further to
> +<function>format</function>(<parameter>formatstr</parameter>
> <type>text</type> <optional>, <parameter>formatarg</parameter>
> <type>"any"</type> <optional>, ...</optional> </optional>)
>
> i did these kind of change to <function>format</function>,
> <function>concat_ws</function>, <function>concat</function>
>
> I've rebased your patch,
> added a commitfest entry: https://commitfest.postgresql.org/50/5278/
> it seems I cannot mark you as the author in commitfest.
> anyway, you ( Dagfinn Ilmari Mannsåker ) are the author of it.
Committed, thanks!
One funny case remains:
<entry role="func_table_entry"><para role="func_signature">
<function>unnest</function> ( <type>anyarray</type>, <type>anyarray</type> <optional>, ... </optional> )
<returnvalue>setof anyelement, anyelement [, ... ]</returnvalue>
</para>
<para>
The returnvalue contains brackets. I tried to change that to use "<optional>" tag too, but it's not valid:
postgres.sgml:20552: element returnvalue: validity error : Element optional is not declared in returnvalue list of possible children
--
Heikki Linnakangas
Neon (https://neon.tech)
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 30/09/2024 11:15, jian he wrote:
>> In, 0001-func.sgml-Consistently-use-optional-to-indicate-opti.patch
>> -<function>format</function>(<parameter>formatstr</parameter>
>> <type>text</type> [, <parameter>formatarg</parameter>
>> <type>"any"</type> [, ...] ])
>> +<function>format</function>(<parameter>formatstr</parameter>
>> <type>text</type> <optional>, <parameter>formatarg</parameter>
>> <type>"any"</type> [, ...] </optional>)
>> i change it further to
>> +<function>format</function>(<parameter>formatstr</parameter>
>> <type>text</type> <optional>, <parameter>formatarg</parameter>
>> <type>"any"</type> <optional>, ...</optional> </optional>)
>> i did these kind of change to <function>format</function>,
>> <function>concat_ws</function>, <function>concat</function>
>> I've rebased your patch,
>> added a commitfest entry: https://commitfest.postgresql.org/50/5278/
>> it seems I cannot mark you as the author in commitfest.
>> anyway, you ( Dagfinn Ilmari Mannsåker ) are the author of it.
>
> Committed, thanks!
Thanks!
> One funny case remains:
>
> <entry role="func_table_entry"><para role="func_signature">
> <function>unnest</function> ( <type>anyarray</type>,
> <type>anyarray</type> <optional>, ... </optional> )
> <returnvalue>setof anyelement, anyelement [, ... ]</returnvalue>
> </para>
> <para>
>
> The returnvalue contains brackets. I tried to change that to use
> "<optional>" tag too, but it's not valid:
>
> postgres.sgml:20552: element returnvalue: validity error : Element
> optional is not declared in returnvalue list of possible children
Yeah, I remember running into that, and deciding I couldn't be bothered to try to learn enough about docbook to figure out if a solution was possible.
- ilmari