Thread: Should we put command options in alphabetical order in the doc?
Over on [1], Peter mentions that we might want to consider putting the VACUUM options into some order that's better than the apparent random order that they're currently in.

VACUUM is certainly one command that's grown a fairly good number of options over the years and it appears we've not given much consideration to what order to put those in in the documentation.

It's not just VACUUM that has this issue. I see 6 commands using the following text:

$ git grep "option</replaceable> can be one of"
src/sgml/ref/analyze.sgml: ...
src/sgml/ref/cluster.sgml: ...
src/sgml/ref/copy.sgml: ...
src/sgml/ref/explain.sgml: ...
src/sgml/ref/reindex.sgml: ...
src/sgml/ref/vacuum.sgml: ...

(maybe there's more we should consider adjusting?)

Likely if we do opt to put these options in a more well-defined order, we should apply that to at least the 6 commands listed above.

For the case of reindex.sgml, I do see that the existing parameter order lists INDEX | TABLE | SCHEMA | DATABASE | SYSTEM first which is the target of the reindex. I wondered if that was worth keeping. I'm just thinking that since all of these are under the "Parameters" heading that we should class them all as equals and just make the order alphabetical. I feel that if we don't do that then the order to add any new parameters is just not going to be obvious and we'll end up with things getting out of order again quite quickly.

I've attached a patch which makes the changes as I propose them.

David

[1] https://postgr.es/m/16845cb1-b228-e157-f293-5892bced9253@enterprisedb.com
On Mon, Apr 17, 2023 at 10:45 PM David Rowley <dgrowleyml@gmail.com> wrote: > For the case of reindex.sgml, I do see that the existing parameter > order lists INDEX | TABLE | SCHEMA | DATABASE | SYSTEM first which is > the target of the reindex. I wondered if that was worth keeping. I'm > just thinking that since all of these are under the "Parameters" > heading that we should class them all as equals and just make the > order alphabetical. I feel that if we don't do that then the order to > add any new parameters is just not going to be obvious and we'll end > up with things getting out of order again quite quickly. I don't think that alphabetical order makes much sense. Surely some parameters are more important than others. Surely there is some kind of natural grouping that makes somewhat more sense than alphabetical order. Take the VACUUM command. Right now FULL, FREEZE, and VERBOSE all come first. Those options are approximately the most important options -- especially VERBOSE. But your patch places VERBOSE dead last. -- Peter Geoghegan
On Tue, 18 Apr 2023 at 18:53, Peter Geoghegan <pg@bowt.ie> wrote: > Take the VACUUM command. Right now FULL, FREEZE, and VERBOSE all come > first. Those options are approximately the most important options -- > especially VERBOSE. But your patch places VERBOSE dead last. hmm, how can we verify that the options are kept in order of importance? What guidance can we provide to developers adding options about where they should slot in the new option to the docs? "Importance order" just seems horribly subjective to me. I'd be interested to know if you could tell me if SKIP_LOCKED has more importance than INDEX_CLEANUP, for example. If you can, it would seem like trying to say apples are more important than oranges, or vice-versa. David
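One way to make the verifiability question concrete: alphabetical order can be checked mechanically, whereas an importance order cannot. The following is only a minimal Python sketch. The function name `first_out_of_order` is hypothetical and is not part of any PostgreSQL tooling, and the option list is the VACUUM order as it is quoted elsewhere in this thread.

```python
# Hypothetical helper, for illustration only: report the first adjacent
# pair of option names that breaks alphabetical order, or None if the
# list is already sorted.
def first_out_of_order(options):
    for prev, cur in zip(options, options[1:]):
        if prev > cur:
            return (prev, cur)
    return None


# VACUUM options in the documentation order quoted in this thread.
CURRENT_ORDER = [
    "FULL", "FREEZE", "VERBOSE", "ANALYZE", "DISABLE_PAGE_SKIPPING",
    "SKIP_LOCKED", "INDEX_CLEANUP", "PROCESS_MAIN", "PROCESS_TOAST",
    "TRUNCATE", "PARALLEL", "SKIP_DATABASE_STATS", "ONLY_DATABASE_STATS",
    "BUFFER_USAGE_LIMIT",
]

print(first_out_of_order(CURRENT_ORDER))          # first violating pair
print(first_out_of_order(sorted(CURRENT_ORDER)))  # None once sorted
```

A check like this could in principle run against the docs, but no equivalent test exists for "importance", which is the point being argued above.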
On Tue, Apr 18, 2023 at 4:18 PM David Rowley <dgrowleyml@gmail.com> wrote: > "Importance order" just seems horribly subjective to me. Alphabetical order seems objectively bad. At least to me. > I'd be interested to know if you could tell me if SKIP_LOCKED has more > importance than INDEX_CLEANUP, for example. If you can, it would seem > like trying to say apples are more important than oranges, or > vice-versa. I don't accept your premise that the only thing that matters (or the most important thing) is adherence to some unambiguous and consistent order. -- Peter Geoghegan
On Tue, Apr 18, 2023 at 4:30 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > I'd be interested to know if you could tell me if SKIP_LOCKED has more
> > importance than INDEX_CLEANUP, for example. If you can, it would seem
> > like trying to say apples are more important than oranges, or
> > vice-versa.
>
> I don't accept your premise that the only thing that matters (or the
> most important thing) is adherence to some unambiguous and consistent
> order.

In the case of VACUUM, the current devel order is:

FULL, FREEZE, VERBOSE, ANALYZE, DISABLE_PAGE_SKIPPING, SKIP_LOCKED, INDEX_CLEANUP, PROCESS_MAIN, PROCESS_TOAST, TRUNCATE, PARALLEL, SKIP_DATABASE_STATS, ONLY_DATABASE_STATS, BUFFER_USAGE_LIMIT

I think that this order is far superior to alphabetical order, which is tantamount to random order. The first 4 items are indeed the really important ones to users, in my experience.

I do have some minor quibbles beyond that, though. These are:

* PARALLEL deserves to be at the start, maybe 4th or 5th overall.

* DISABLE_PAGE_SKIPPING should be later, since it's really only a testing option that probably never proved useful in production. In particular, it has little business being before SKIP_LOCKED, which is much more important and relevant.

* TRUNCATE and INDEX_CLEANUP are similar options, and ought to be side by side. I would put PROCESS_MAIN and PROCESS_TOAST after those two for the same reason.

While I'm certain that nobody will agree with me on every little detail, I have to imagine that most would find my preferred ordering quite understandable and unsurprising, at a high level -- this is not a hopelessly idiosyncratic ranking, that could just as easily have been generated by a PRNG. People may not easily agree that "apples are more important than oranges, or vice-versa", but what does it matter? I've really only put each option into buckets of items with *roughly* the same importance. All of the details beyond that don't matter to me, at all.

--
Peter Geoghegan
On 2023-Apr-18, Peter Geoghegan wrote: > While I'm certain that nobody will agree with me on every little > detail, I have to imagine that most would find my preferred ordering > quite understandable and unsurprising, at a high level -- this is not > a hopelessly idiosyncratic ranking, that could just as easily have > been generated by a PRNG. People may not easily agree that "apples are > more important than oranges, or vice-versa", but what does it matter? > I've really only put each option into buckets of items with *roughly* > the same importance. All of the details beyond that don't matter to > me, at all. I agree with you that roughly bucketing items is a good approach. Within each bucket we can then sort alphabetically. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "If you have nothing to say, maybe you need just the right tool to help you not say it." (New York Times, about Microsoft PowerPoint)
On 19.04.23 01:30, Peter Geoghegan wrote: >> I'd be interested to know if you could tell me if SKIP_LOCKED has more >> importance than INDEX_CLEANUP, for example. If you can, it would seem >> like trying to say apples are more important than oranges, or >> vice-versa. > > I don't accept your premise that the only thing that matters (or the > most important thing) is adherence to some unambiguous and consistent > order. My thinking is, if I want to look up FREEZE on the VACUUM man page, I would welcome some easily identifiable way of locating it. At that point, I don't know whether FREEZE is important or what kind of option it is. For reference material, easy lookup should be a priority. For a narrative chapter on VACUUM, you can introduce the options in any other suitable order.
> On 19 Apr 2023, at 10:52, Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > For reference material, easy lookup should be a priority. +1. Alphabetical ordering is consistent with POLA. > For a narrative chapter on VACUUM, you can introduce the options in any other > suitable order. I would even phrase it such that in this case one *should* present the options in the order most suitable to educate the reader. -- Daniel Gustafsson
On Wed, Apr 19, 2023 at 3:04 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > While I'm certain that nobody will agree with me on every little > > detail, I have to imagine that most would find my preferred ordering > > quite understandable and unsurprising, at a high level -- this is not > > a hopelessly idiosyncratic ranking, that could just as easily have > > been generated by a PRNG. People may not easily agree that "apples are > > more important than oranges, or vice-versa", but what does it matter? > > I've really only put each option into buckets of items with *roughly* > > the same importance. All of the details beyond that don't matter to > > me, at all. > > I agree with you that roughly bucketing items is a good approach. > Within each bucket we can then sort alphabetically. I think of these buckets as working at a logarithmic scale. The FULL, FREEZE, VERBOSE, and ANALYZE options are multiple orders of magnitude more important than most of the other options, and maybe one order of magnitude more important than the PARALLEL, TRUNCATE, and INDEX_CLEANUP options. With differences that big, you have a structure that generalizes across all users quite well. This doesn't seem particularly subjective. -- Peter Geoghegan
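The "rough buckets, then alphabetical within each bucket" idea can be sketched as a two-part sort key. This is only an illustration in Python. The two named tiers follow the groupings described in the message above; putting every remaining option into a single last bucket is an assumption made here, and `bucket_of` and `bucketed_order` are made-up names.

```python
# Importance tiers as described in the message above. Any option not
# listed falls into one final bucket (an assumption for this sketch).
BUCKETS = [
    {"FULL", "FREEZE", "VERBOSE", "ANALYZE"},
    {"PARALLEL", "TRUNCATE", "INDEX_CLEANUP"},
]

ALL_OPTIONS = [
    "FULL", "FREEZE", "VERBOSE", "ANALYZE", "DISABLE_PAGE_SKIPPING",
    "SKIP_LOCKED", "INDEX_CLEANUP", "PROCESS_MAIN", "PROCESS_TOAST",
    "TRUNCATE", "PARALLEL", "SKIP_DATABASE_STATS", "ONLY_DATABASE_STATS",
    "BUFFER_USAGE_LIMIT",
]


def bucket_of(option):
    """Return the index of the tier containing option, or the last bucket."""
    for index, bucket in enumerate(BUCKETS):
        if option in bucket:
            return index
    return len(BUCKETS)


def bucketed_order(options):
    """Sort by tier first, then alphabetically inside each tier."""
    return sorted(options, key=lambda opt: (bucket_of(opt), opt))


for name in bucketed_order(ALL_OPTIONS):
    print(bucket_of(name), name)
```

With this key, a new option only needs a bucket assignment; its position inside the bucket is determined by its name.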
On Wed, Apr 19, 2023 at 2:39 PM Peter Geoghegan <pg@bowt.ie> wrote: > > On Wed, Apr 19, 2023 at 3:04 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > While I'm certain that nobody will agree with me on every little > > > detail, I have to imagine that most would find my preferred ordering > > > quite understandable and unsurprising, at a high level -- this is not > > > a hopelessly idiosyncratic ranking, that could just as easily have > > > been generated by a PRNG. People may not easily agree that "apples are > > > more important than oranges, or vice-versa", but what does it matter? > > > I've really only put each option into buckets of items with *roughly* > > > the same importance. All of the details beyond that don't matter to > > > me, at all. > > > > I agree with you that roughly bucketing items is a good approach. > > Within each bucket we can then sort alphabetically. > > I think of these buckets as working at a logarithmic scale. The FULL, > FREEZE, VERBOSE, and ANALYZE options are multiple orders of magnitude > more important than most of the other options, and maybe one order of > magnitude more important than the PARALLEL, TRUNCATE, and > INDEX_CLEANUP options. With differences that big, you have a structure > that generalizes across all users quite well. This doesn't seem > particularly subjective. I actually favor query/command order followed by alphabetical order for most of the commands David included in his patch. Of course the parameter argument types, like boolean and integer, should be grouped together separate from the main parameters. David fit this into the alphabetical paradigm by doing uppercase alphabetical followed by lowercase alphabetical. There are some specific cases where I think this isn't working quite as intended in his patch. I've called those out in my command-by-command code review below. I actually think we should consider having a single location which defines all argument types for all SQL command parameters. 
Then we wouldn't need to define them for each command. We could simply link to the definition from the synopsis. That would clean up these lists quite a bit. Perhaps there is some variation from command to command in the actual definitions, though (I haven't checked). I would be happy to try and write this patch if folks are interested in the idea. As for alphabetical ordering vs importance ordering: while I do think that if a user does not know what parameter they are looking for, an alphabetical ordering is unhelpful, I also think the primary issue with grouping them by "importance" is that it is difficult to maintain. Doing so requires a discussion of importance for every new option added. That seems like an annoying bit of overhead to give ourselves. Having a subjective ordering seems worse than having a rule-based ordering. I think command/query order followed by alphabetical order is a reasonable rule-based ordering. I went and took a look at some of the other SQL commands' documentation and noticed that they are all pretty different (for good reason). ALTER ROLE parameters [1], for example, have a seemingly meaningless order except for the fact that there are pairs of parameters. SUPERUSER and NOSUPERUSER, INHERIT and NOINHERIT, etc. It might be a bit odd for these to follow an absolute alphabetical ordering rule. Many of the CREATE type SQL commands don't really have this problem because there are only one or two options within each section of the command and otherwise the order the parameters must appear in the query dictates their order [2]. Others, like EXPLAIN [3], for example, obviously benefit from an alphabetical ordering of parameters -- which David has done in the patch. I think most of the commands that David has patched here are good candidates for alphabetical ordering. Below I've reviewed each command in the patch specifically: For ANALYZE, I think this looks good in its new alphabetized form. 
Though table_name is alphabetically last for the lower case parameters and thus doesn't pose an issue, if it were alphabetically earlier, I would still favor putting it at the end to maintain a query order then alphabetical order ordering. For CLUSTER, I think alphabetical order isn't working well. I think we should maintain query order followed by alphabetical order. Even though table_name is optional, in the event that it is included, it would precede index_name. So, perhaps the order should be VERBOSE, boolean, table_name, index_name -- which pretty much cancels out alphabetizing. For COPY, I think the new ordering of COPY has some issues. table_name is no longer first even though for COPY FROM it is required before the other parameters. I think this is confusing. Perhaps the options should be after the other parameters are defined. I think having the options alphabetized at the end of the others would be nice. So, my suggested ordering is table_name, column_name, filename, PROGRAM, STDIN, STDOUT, then the WITH options alphabetically, WHERE, and then the parameter argument types alphabetically. The last one (where to put the parameter argument types) I'm not so sure about. EXPLAIN looks good to me as is. For REINDEX, I would again suggest a query ordering followed by alphabetical ordering. CONCURRENTLY, TABLESPACE, VERBOSE, DATABASE, INDEX, SCHEMA, SYSTEM, TABLE, name, then all of the parameter argument types alphabetically. (Also, you can put CONCURRENTLY in two different places in the REINDEX command?) For VACUUM, I'd perhaps suggest the options in alphabetical order followed by table_name and then column_name and then putting the parameter argument types at the end alphabetically. Of course, we could decide VACUUM is special and group its options by importance because this is especially helpful for users. I think that there are other SQL commands whose options' importance is not particularly worth debating. 
I do think we should consider deprecating and dropping documentation of the options that are supported without parentheses (relevant to commands like ANALYZE, CLUSTER, VACUUM, and others). It is fine if we keep the code to make ANALYZE VERBOSE work, but I don't think it is useful to keep that documented. That is not a concern of this patch, however. - Melanie [1] https://www.postgresql.org/docs/devel/sql-alterrole.html [2] https://www.postgresql.org/docs/devel/sql-createindex.html [3] https://www.postgresql.org/docs/devel/sql-explain.html
Melanie Plageman <melanieplageman@gmail.com> writes: > I do think we should consider deprecating and dropping documentation of > the options that are supported without parentheses (relevant to commands > like ANALYZE, CLUSTER, VACUUM, and others). It is fine if we keep the > code to make ANALYZE VERBOSE work, but I don't think it is useful to > keep that documented. That is not a concern of this patch, however. I doubt it's a great idea to de-document syntax that's still allowed and will still be widely used for years to come; that just promotes confusion. However, we could do something similar to what we did for COPY years ago, and move the un-parenthesized syntax to the "Compatibility" section. regards, tom lane
On Wed, Apr 19, 2023 at 2:33 PM Melanie Plageman <melanieplageman@gmail.com> wrote: > As for alphabetical ordering vs importance ordering: while I do think > that if a user does not know what parameter they are looking for, an > alphabetical ordering is unhelpful, I also think the primary issue with > grouping them by "importance" is that it is difficult to maintain. Doing > so requires a discussion of importance for every new option added. Not really. It's a matter that requires some amount of individual judgement, in some cases. It may require effort, but I think that that's likely to be worth it. I won't be the one that quibbles over every little thing. > For VACUUM, I'd perhaps suggest the options in alphabetical order > followed by table_name and then column_name and then putting the > parameter argument types at the end alphabetically. > > Of course, we could decide VACUUM is special and group its options by > importance because this is especially helpful for users. I think that > there are other SQL commands whose options' importance is not > particularly worth debating. That's very likely true -- it may be that most individual commands really wouldn't be any worse off if they just used a standard alphabetical order. I agree that consistency can be a virtue. But it's not the highest virtue. There will be a number of important exceptions, which will have outsized impact. VACUUM, ANALYZE, maybe CREATE INDEX. So if there is going to be a new standard, there should also be significant wiggle-room. Kind of like with the guidelines for rmgr desc authors discussion. -- Peter Geoghegan
On Wed, Apr 19, 2023 at 05:33:47PM -0400, Melanie Plageman wrote: > On Wed, Apr 19, 2023 at 2:39 PM Peter Geoghegan <pg@bowt.ie> wrote: > > > > On Wed, Apr 19, 2023 at 3:04 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > While I'm certain that nobody will agree with me on every little > > > > detail, I have to imagine that most would find my preferred ordering > > > > quite understandable and unsurprising, at a high level -- this is not > > > > a hopelessly idiosyncratic ranking, that could just as easily have > > > > been generated by a PRNG. People may not easily agree that "apples are > > > > more important than oranges, or vice-versa", but what does it matter? > > > > I've really only put each option into buckets of items with *roughly* > > > > the same importance. All of the details beyond that don't matter to > > > > me, at all. > > > > > > I agree with you that roughly bucketing items is a good approach. > > > Within each bucket we can then sort alphabetically. > > > > I think of these buckets as working at a logarithmic scale. The FULL, > > FREEZE, VERBOSE, and ANALYZE options are multiple orders of magnitude > > more important than most of the other options, and maybe one order of > > magnitude more important than the PARALLEL, TRUNCATE, and > > INDEX_CLEANUP options. With differences that big, you have a structure > > that generalizes across all users quite well. This doesn't seem > > particularly subjective. > > I actually favor query/command order followed by alphabetical order for > most of the commands David included in his patch. > > Of course the parameter argument types, like boolean and integer, should > be grouped together separate from the main parameters. David fit this > into the alphabetical paradigm by doing uppercase alphabetical followed > by lowercase alphabetical. There are some specific cases where I think > this isn't working quite as intended in his patch. I've called those out > in my command-by-command code review below. 
> > I actually think we should consider having a single location which > defines all argument types for all SQL command parameters. Then we > wouldn't need to define them for each command. We could simply link to > the definition from the synopsis. That would clean up these lists quite > a bit. Perhaps there is some variation from command to command in the > actual definitions, though (I haven't checked). I would be happy to try > and write this patch if folks are interested in the idea. I looked into this and it isn't a good idea. Out of the 183 SQL commands, really only ANALYZE, VACUUM, COPY, CLUSTER, EXPLAIN, and REINDEX have parameter argument types that are context-independent. And out of those, boolean is the only type shared by all. VACUUM is the only one with more than one parameter argument "type". So, it is basically just a bad idea. Oh well... - Melanie
On Wed, 19 Apr 2023 at 22:04, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2023-Apr-18, Peter Geoghegan wrote:
>
> > While I'm certain that nobody will agree with me on every little
> > detail, I have to imagine that most would find my preferred ordering
> > quite understandable and unsurprising, at a high level -- this is not
> > a hopelessly idiosyncratic ranking, that could just as easily have
> > been generated by a PRNG. People may not easily agree that "apples are
> > more important than oranges, or vice-versa", but what does it matter?
> > I've really only put each option into buckets of items with *roughly*
> > the same importance. All of the details beyond that don't matter to
> > me, at all.
>
> I agree with you that roughly bucketing items is a good approach.
> Within each bucket we can then sort alphabetically.

If these "buckets" were subcategories, then it might be ok. I see "man grep" categorises the command line options and then sorts alphabetically within the category. If we could come up with a way of categorising the options then this would satisfy what Melanie mentioned about having the argument types listed separately. However, I'm really not sure which categories we could have. I really don't have any concrete ideas here, but I'll attempt to at least start something:

Behavioral:
ANALYZE
DISABLE_PAGE_SKIPPING
FREEZE
FULL
INDEX_CLEANUP
ONLY_DATABASE_STATS
PROCESS_MAIN
PROCESS_TOAST
SKIP_DATABASE_STATS
SKIP_LOCKED
TRUNCATE

Resource Usage:
BUFFER_USAGE_LIMIT
PARALLEL

Informational:
VERBOSE

Option Parameters:
boolean
column_name
integer
size
table_name

I'm just not sure if we have enough options to have a need to categorise them. Also, going by the categories I attempted to come up with, it just feels like "Behavioral" contains too many and "Informational" is likely only ever going to contain VERBOSE. So I'm not very happy with them. I'm not really feeling excited enough about this to even come up with a draft patch. I thought I'd send out this anyway to see if anyone can think of anything better.

FWIW, vacuumdb --help has its options in alphabetical order using the abbreviated form of the option.

David
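The categorised layout proposed above can be written down as data to see how lopsided the groups are. This is a rough Python sketch, not a proposal for the docs build. The category names and members are taken from the message above, while `render` and `category_sizes` are hypothetical helpers.

```python
# Categories and members as proposed in the message above.
CATEGORIES = {
    "Behavioral": [
        "ANALYZE", "DISABLE_PAGE_SKIPPING", "FREEZE", "FULL",
        "INDEX_CLEANUP", "ONLY_DATABASE_STATS", "PROCESS_MAIN",
        "PROCESS_TOAST", "SKIP_DATABASE_STATS", "SKIP_LOCKED", "TRUNCATE",
    ],
    "Resource Usage": ["BUFFER_USAGE_LIMIT", "PARALLEL"],
    "Informational": ["VERBOSE"],
    "Option Parameters": [
        "boolean", "column_name", "integer", "size", "table_name",
    ],
}


def render(categories):
    """Build a listing with one heading per category and the names
    sorted alphabetically beneath it."""
    lines = []
    for heading, names in categories.items():
        lines.append(f"{heading}:")
        lines.extend(f"  {name}" for name in sorted(names))
        lines.append("")
    return "\n".join(lines).rstrip()


def category_sizes(categories):
    """Number of entries per category, to show how uneven the split is."""
    return {heading: len(names) for heading, names in categories.items()}


print(render(CATEGORIES))
print(category_sizes(CATEGORIES))
```

The size counts show the concern raised above: one category holds most of the options and another holds a single entry.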
> On 20 Apr 2023, at 14:40, David Rowley <dgrowleyml@gmail.com> wrote: > I see "man grep" categorises the command line options and then sorts > alphabetically within the category. On FreeBSD and macOS "man grep" lists all options alphabetically. > FWIW, vacuumdb --help has its options in alphabetical order using the > abbreviated form of the option. It does (as most of our binaries do) group "Connection options" separately though, and in initdb --help and pg_dump --help we have other groupings as well. -- Daniel Gustafsson