Thread: Make name optional in CREATE STATISTICS
Currently, CREATE STATS requires you to think of a name for each stats object, which is fairly painful, so users would prefer an automatically assigned name. Attached patch allows this, which turns out to be very simple, since a name assignment function already exists. The generated name is simple, but that's exactly what users do anyway, so it is not too bad. Tests, docs included. -- Simon Riggs http://www.EnterpriseDB.com/
Attachment
ne 15. 5. 2022 v 14:20 odesílatel Simon Riggs <simon.riggs@enterprisedb.com> napsal:
Currently, CREATE STATS requires you to think of a name for each stats
object, which is fairly painful, so users would prefer an
automatically assigned name.
Attached patch allows this, which turns out to be very simple, since a
name assignment function already exists.
The generated name is simple, but that's exactly what users do anyway,
so it is not too bad.
good idea
Pavel
Tests, docs included.
--
Simon Riggs http://www.EnterpriseDB.com/
On Sun, 15 May 2022 at 14:20, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > Currently, CREATE STATS requires you to think of a name for each stats > object, which is fairly painful, so users would prefer an > automatically assigned name. > > Attached patch allows this, which turns out to be very simple, since a > name assignment function already exists. > > The generated name is simple, but that's exactly what users do anyway, > so it is not too bad. Cool. > Tests, docs included. Something I noticed is that this grammar change is quite different from how create index specifies its optional name. Because we already have a seperate statement sections for with and without IF NOT EXISTS, adding another branch will add even more duplication. Using a new opt_name production (potentially renamed from opt_index_name?) would probably reduce the amount of duplication in the grammar. We might be able to use opt_if_not_exists to fully remove the duplicated grammars, but I don't think we would be able to keep the "CREATE STATISTICS IF NOT EXISTS <<no name>> ON col1, col2 FROM table" syntax illegal. Please also update the comment in gram.y above the updated section that details the expected grammar for CREATE STATISTICS, as you seem to have overlooked that copy of grammar documentation. Apart from these two small issues, this passes tests and seems complete. Kind regards, Matthias van de Meent
On Wed, 6 Jul 2022 at 19:35, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > > On Sun, 15 May 2022 at 14:20, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > > > Currently, CREATE STATS requires you to think of a name for each stats > > object, which is fairly painful, so users would prefer an > > automatically assigned name. > > > > Attached patch allows this, which turns out to be very simple, since a > > name assignment function already exists. > > > > The generated name is simple, but that's exactly what users do anyway, > > so it is not too bad. > > Cool. Thanks for your review. > > Tests, docs included. > > Something I noticed is that this grammar change is quite different > from how create index specifies its optional name. Because we already > have a seperate statement sections for with and without IF NOT EXISTS, > adding another branch will add even more duplication. Using a new > opt_name production (potentially renamed from opt_index_name?) would > probably reduce the amount of duplication in the grammar. There are various other ways of doing this and, yes, we could refactor other parts of the grammar to make this work. There is a specific guideline about patch submission that says the best way to get a patch rejected is to include unnecessary changes. With that in mind, let's keep the patch simple and exactly aimed at the original purpose. I'll leave it for committers to decide whether other refactoring is wanted. > We might be able to use opt_if_not_exists to fully remove the > duplicated grammars, but I don't think we would be able to keep the > "CREATE STATISTICS IF NOT EXISTS <<no name>> ON col1, col2 FROM table" > syntax illegal. > > Please also update the comment in gram.y above the updated section > that details the expected grammar for CREATE STATISTICS, as you seem > to have overlooked that copy of grammar documentation. I have made the comment show that the name is optional, thank you. > Apart from these two small issues, this passes tests and seems complete. Patch v4 attached -- Simon Riggs http://www.EnterpriseDB.com/
Attachment
On Thu, 7 Jul 2022 at 12:55, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > There are various other ways of doing this and, yes, we could refactor > other parts of the grammar to make this work. There is a specific > guideline about patch submission that says the best way to get a patch > rejected is to include unnecessary changes. With that in mind, let's > keep the patch simple and exactly aimed at the original purpose. > > I'll leave it for committers to decide whether other refactoring is wanted. Fair enough. > I have made the comment show that the name is optional, thank you. The updated comment implies that IF NOT EXISTS is allowed without a defined name, which is false: > + * CREATE STATISTICS [IF NOT EXISTS] [stats_name] [(stat types)] A more correct version would be + * CREATE STATISTICS [ [IF NOT EXISTS] stats_name ] [(stat types)] > Patch v4 attached Thanks for working on this! Kind regards, Matthias van de Meent
On Thu, 7 Jul 2022 at 11:58, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > > On Thu, 7 Jul 2022 at 12:55, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > There are various other ways of doing this and, yes, we could refactor > > other parts of the grammar to make this work. There is a specific > > guideline about patch submission that says the best way to get a patch > > rejected is to include unnecessary changes. With that in mind, let's > > keep the patch simple and exactly aimed at the original purpose. > > > > I'll leave it for committers to decide whether other refactoring is wanted. > > Fair enough. > > > I have made the comment show that the name is optional, thank you. > > The updated comment implies that IF NOT EXISTS is allowed without a > defined name, which is false: > > > + * CREATE STATISTICS [IF NOT EXISTS] [stats_name] [(stat types)] > > A more correct version would be > > + * CREATE STATISTICS [ [IF NOT EXISTS] stats_name ] > [(stat types)] There you go -- Simon Riggs http://www.EnterpriseDB.com/
Attachment
On Wed, 13 Jul 2022 at 08:07, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > On Thu, 7 Jul 2022 at 11:58, Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > A more correct version would be > > > > + * CREATE STATISTICS [ [IF NOT EXISTS] stats_name ] > > [(stat types)] > > There you go Thanks! I think this is ready for a committer, so I've marked it as such. Kind regards, Matthias van de Meent
On Wed, 20 Jul 2022 at 12:01, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > > On Wed, 13 Jul 2022 at 08:07, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > > > > + * CREATE STATISTICS [ [IF NOT EXISTS] stats_name ] > > I think this is ready for a committer, so I've marked it as such. > Picking this up... I tend to agree with Matthias' earlier point about avoiding code duplication in the grammar. Without going off and refactoring other parts of the grammar not related to this patch, it's still a slightly smaller, simpler change, and less code duplication, to do this using a new opt_stats_name production in the grammar, as in the attached. I also noticed a comment in CreateStatistics() that needed updating. Barring any further comments, I'll push this shortly. Regards, Dean
Attachment
On Thu, 21 Jul 2022 at 15:12, Dean Rasheed <dean.a.rasheed@gmail.com> wrote: > > On Wed, 20 Jul 2022 at 12:01, Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > > > On Wed, 13 Jul 2022 at 08:07, Simon Riggs <simon.riggs@enterprisedb.com> wrote: > > > > > > > + * CREATE STATISTICS [ [IF NOT EXISTS] stats_name ] > > > > I think this is ready for a committer, so I've marked it as such. > > > > Picking this up... > > I tend to agree with Matthias' earlier point about avoiding code > duplication in the grammar. Without going off and refactoring other > parts of the grammar not related to this patch, it's still a slightly > smaller, simpler change, and less code duplication, to do this using a > new opt_stats_name production in the grammar, as in the attached. > > I also noticed a comment in CreateStatistics() that needed updating. > > Barring any further comments, I'll push this shortly. Nice change, please proceed. -- Simon Riggs http://www.EnterpriseDB.com/
On 7/21/22 16:12, Dean Rasheed wrote: > On Wed, 20 Jul 2022 at 12:01, Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: >> >> On Wed, 13 Jul 2022 at 08:07, Simon Riggs <simon.riggs@enterprisedb.com> wrote: >>> >>>> + * CREATE STATISTICS [ [IF NOT EXISTS] stats_name ] >> >> I think this is ready for a committer, so I've marked it as such. >> > > Picking this up... > > I tend to agree with Matthias' earlier point about avoiding code > duplication in the grammar. Without going off and refactoring other > parts of the grammar not related to this patch, it's still a slightly > smaller, simpler change, and less code duplication, to do this using a > new opt_stats_name production in the grammar, as in the attached. > > I also noticed a comment in CreateStatistics() that needed updating. > > Barring any further comments, I'll push this shortly. > +1 -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2022-Jul-21, Dean Rasheed wrote: > I tend to agree with Matthias' earlier point about avoiding code > duplication in the grammar. Without going off and refactoring other > parts of the grammar not related to this patch, it's still a slightly > smaller, simpler change, and less code duplication, to do this using a > new opt_stats_name production in the grammar, as in the attached. > > I also noticed a comment in CreateStatistics() that needed updating. > > Barring any further comments, I'll push this shortly. Thanks. I was looking at the recently modified REINDEX syntax and noticed there another spot for taking an optional name. I ended up reusing OptSchemaName for that, as in the attached patch. I think adding production-specific additional productions is pointless and probably bloats the grammar. So let me +1 your push of the patch you posted, just to keep things moving forward, but in addition I propose to later rename OptSchemaName to something more generic and use it in these three places. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Attachment
On Thu, 21 Jul 2022 at 18:42, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > Thanks. I was looking at the recently modified REINDEX syntax and > noticed there another spot for taking an optional name. I ended up > reusing OptSchemaName for that, as in the attached patch. I think > adding production-specific additional productions is pointless and > probably bloats the grammar. So let me +1 your push of the patch you > posted, just to keep things moving forward, but in addition I propose to > later rename OptSchemaName to something more generic and use it in these > three places. > OK, pushed. Before writing opt_stats_name, I went looking for anything else I could use, but didn't see anything. The difference between this and the index case, is that statistics objects can be schema-qualified, so it might be tricky to get something that'll work for all 3 places. Regards, Dean
On Thu, Jul 21, 2022 at 07:42:12PM +0200, Alvaro Herrera wrote: > Thanks. I was looking at the recently modified REINDEX syntax and > noticed there another spot for taking an optional name. I ended up > reusing OptSchemaName for that, as in the attached patch. I think > adding production-specific additional productions is pointless and > probably bloats the grammar. So let me +1 your push of the patch you > posted, just to keep things moving forward, but in addition I propose to > later rename OptSchemaName to something more generic and use it in these > three places. This slightly changes the behavior of the grammar, as CONCURRENTLY was working on DATABASE as follows: * On HEAD: =# reindex database concurrently postgres; REINDEX =# reindex (concurrently) database concurrently postgres; REINDEX =# reindex (concurrently) database ; REINDEX =# reindex (concurrently) database postgres; REINDEX =# reindex database concurrently postgres; REINDEX =# reindex database concurrently; ERROR: 42601: syntax error at or near ";" And actually, even on HEAD, the last case is marked as supported by the docs but we don't allow it in the parser. My mistake on this one. Now, with the patch, I get: =# reindex (concurrently) database concurrently; ERROR: 42601: syntax error at or near "concurrently" LINE 1: reindex (concurrently) database concurrently postgres ; =# reindex (concurrently) database concurrently postgres; ERROR: 42601: syntax error at or near "concurrently" LINE 1: reindex (concurrently) database concurrently postgres; =# reindex (concurrently) database ; REINDEX =# reindex (concurrently) database postgres; REINDEX =# reindex database concurrently postgres; ERROR: 42601: syntax error at or near "concurrently" LINE 1: reindex database concurrently postgres; =# reindex database concurrently; ERROR: 42601: syntax error at or near "concurrently" So this indeed has as effect to make possible the use of CONCURRENTLY for DATABASE and SYSTEM only within the parenthesized grammar. Seeing the simplifications this creates, I'd agree with dropping this part of the grammar. I think that I would add the following queries to create_index.sql to test this grammar, as we can rely on this code path generating an error: REINDEX (CONCURRENTLY) SYSTEM postgres; REINDEX (CONCURRENTLY) SYSTEM; At least it would validate the parsing for DATABASE. This breaks reindexdb for the database case, because the query generated in run_reindex_command() adds CONCURRENTLY only *after* the database name, and we should be careful to generate something backward-compatible in this tool, as well. The fix is simple: you can add CONCURRENTLY within the section with TABLESPACE and VERBOSE for a connection >= 14, and use it after the object name for <= 13. -- Michael
Attachment
On 2022-Jul-22, Michael Paquier wrote: > So this indeed has as effect to make possible the use of CONCURRENTLY > for DATABASE and SYSTEM only within the parenthesized grammar. Seeing > the simplifications this creates, I'd agree with dropping this part of > the grammar. Actually, looking at the grammar again I realized that the '('options')' part could be refactored; and with that, keeping an extra production for REINDEX DATABASE CONCURRENTLY is short enough. It is removed from REINDEX SYSTEM, but that's OK because that doesn't work anyway. I added the new test lines you proposed and amended the docs; the result is attached. Initially I wanted to use the "optional list of options" for all utilities that have similar constructions, (VACUUM, ANALYZE, CLUSTER, EXPLAIN) but it is not possible because their alternative productions accept different keywords, so it doesn't look possible. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "If you have nothing to say, maybe you need just the right tool to help you not say it." (New York Times, about Microsoft PowerPoint)
Attachment
On Fri, Jul 22, 2022 at 03:06:46PM +0200, Alvaro Herrera wrote: > Actually, looking at the grammar again I realized that the '('options')' > part could be refactored; and with that, keeping an extra production for > REINDEX DATABASE CONCURRENTLY is short enough. It is removed from > REINDEX SYSTEM, but that's OK because that doesn't work anyway. I have just looked at 83011ce, and got what you've done here. You have thrown away reindex_target_multitable and added three parts for SCHEMA, DATABASE and SYSTEM instead with their own options, enforcing the restriction on CONCURRENTLY at the end of REINDEX SYSTEM in the parser rather than indexcmds.c. + | REINDEX opt_reindex_option_list SYSTEM_P opt_single_name { ReindexStmt *n = makeNode(ReindexStmt); - - n->kind = $5; - n->name = $7; + n->kind = REINDEX_OBJECT_SYSTEM; + n->name = NULL; I think that there is a bug in this logic. ReindexStmt->name is always set to NULL, meaning that a REINDEX command with any database name would still pass, but I don't think that we should allow that. For example, with something like these commands, we should complain that the database name specified does not match with the database we are connected to: =# reindex system popo_foo_bar; REINDEX =# reindex database popo_foo_bar; REINDEX It may have been better to wait a bit if you wanted me to look at all that, as our timezones are not compatible, especially on Fridays, but that's fine :D. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > I have just looked at 83011ce, and got what you've done here. You > have thrown away reindex_target_multitable and added three parts for > SCHEMA, DATABASE and SYSTEM instead with their own options, enforcing > the restriction on CONCURRENTLY at the end of REINDEX SYSTEM in the > parser rather than indexcmds.c. That does not seem like an improvement. In v15: regression=# REINDEX SYSTEM CONCURRENTLY db; ERROR: cannot reindex system catalogs concurrently As of HEAD: regression=# REINDEX SYSTEM CONCURRENTLY db; ERROR: syntax error at or near "CONCURRENTLY" LINE 1: REINDEX SYSTEM CONCURRENTLY db; ^ That is not a very helpful error, not even if the man page doesn't show the syntax as legal. regards, tom lane
On Fri, Jul 22, 2022 at 11:54:27PM -0400, Tom Lane wrote: > That does not seem like an improvement. In v15: > > regression=# REINDEX SYSTEM CONCURRENTLY db; > ERROR: cannot reindex system catalogs concurrently > > As of HEAD: > > regression=# REINDEX SYSTEM CONCURRENTLY db; > ERROR: syntax error at or near "CONCURRENTLY" > LINE 1: REINDEX SYSTEM CONCURRENTLY db; > ^ > > That is not a very helpful error, not even if the man page > doesn't show the syntax as legal. As the problem comes down to the fact that INDEX/TABLE, SCHEMA and DATABASE/SYSTEM need to handle names for different object types each, I think that we could do something like the attached, removing one block on the way at the cost of an extra parser node. By the way, it seems that 83011ce also broke the case of "REINDEX DATABASE CONCURRENTLY", where the parser missed the addition of a DefElem for "concurrently" in this case. -- Michael
Attachment
On 2022-Jul-23, Michael Paquier wrote: > As the problem comes down to the fact that INDEX/TABLE, SCHEMA and > DATABASE/SYSTEM need to handle names for different object types each, > I think that we could do something like the attached, removing one > block on the way at the cost of an extra parser node. Yeah, looks good. I propose to also test the error for reindexing a different database, which is currently uncovered, as attached. > By the way, it seems that 83011ce also broke the case of "REINDEX > DATABASE CONCURRENTLY", where the parser missed the addition of a > DefElem for "concurrently" in this case. Wow. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Escucha y olvidarás; ve y recordarás; haz y entenderás" (Confucio)
Attachment
On Mon, Jul 25, 2022 at 11:49:50AM +0200, Alvaro Herrera wrote: > On 2022-Jul-23, Michael Paquier wrote: >> As the problem comes down to the fact that INDEX/TABLE, SCHEMA and >> DATABASE/SYSTEM need to handle names for different object types each, >> I think that we could do something like the attached, removing one >> block on the way at the cost of an extra parser node. > > Yeah, looks good. I propose to also test the error for reindexing a > different database, which is currently uncovered, as attached. Good idea. >> By the way, it seems that 83011ce also broke the case of "REINDEX >> DATABASE CONCURRENTLY", where the parser missed the addition of a >> DefElem for "concurrently" in this case. > > Wow. For this one, we have a gap in the test, actually. It seems to me that we'd better make sure that the OID of the indexes rebuilt concurrently is changed. There is a REINDEX DATABASE CONCURRENTLY already in the TAP tests, and the only thing that would be needed for the job is an extra query that compares the OID saved before the reindex with the one in the catalogs after the fact.. -- Michael
Attachment
On 2022-Jul-25, Michael Paquier wrote: > On Mon, Jul 25, 2022 at 11:49:50AM +0200, Alvaro Herrera wrote: > > On 2022-Jul-23, Michael Paquier wrote: > >> By the way, it seems that 83011ce also broke the case of "REINDEX > >> DATABASE CONCURRENTLY", where the parser missed the addition of a > >> DefElem for "concurrently" in this case. > > > > Wow. > > For this one, we have a gap in the test, actually. It seems to me > that we'd better make sure that the OID of the indexes rebuilt > concurrently is changed. There is a REINDEX DATABASE CONCURRENTLY > already in the TAP tests, and the only thing that would be needed for > the job is an extra query that compares the OID saved before the > reindex with the one in the catalogs after the fact.. Agreed. I think you already have the query for that elsewhere in the test, so it's just a matter of copying it from there. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "No tengo por qué estar de acuerdo con lo que pienso" (Carlos Caszeli)
On Mon, Jul 25, 2022 at 12:55:54PM +0200, Alvaro Herrera wrote: > Agreed. I think you already have the query for that elsewhere in the > test, so it's just a matter of copying it from there. I actually already wrote most of it in 2cbc3c1, and I just needed to extend things a bit to detect the OID changes :) So, applied, with all the extra tests. -- Michael
Attachment
On 2022-Jul-26, Michael Paquier wrote: > On Mon, Jul 25, 2022 at 12:55:54PM +0200, Alvaro Herrera wrote: > > Agreed. I think you already have the query for that elsewhere in the > > test, so it's just a matter of copying it from there. > > I actually already wrote most of it in 2cbc3c1, and I just needed to > extend things a bit to detect the OID changes :) > > So, applied, with all the extra tests. Thank you! -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "I am amazed at [the pgsql-sql] mailing list for the wonderful support, and lack of hesitasion in answering a lost soul's question, I just wished the rest of the mailing list could be like this." (Fotis) (http://archives.postgresql.org/pgsql-sql/2006-06/msg00265.php)