Thread: Make name optional in CREATE STATISTICS

Make name optional in CREATE STATISTICS

From
Simon Riggs
Date:
Currently, CREATE STATS requires you to think of a name for each stats
object, which is fairly painful, so users would prefer an
automatically assigned name.

Attached patch allows this, which turns out to be very simple, since a
name assignment function already exists.

The generated name is simple, but that's exactly what users do anyway,
so it is not too bad.

Tests, docs included.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Make name optional in CREATE STATISTICS

From
Pavel Stehule
Date:


ne 15. 5. 2022 v 14:20 odesílatel Simon Riggs <simon.riggs@enterprisedb.com> napsal:
Currently, CREATE STATS requires you to think of a name for each stats
object, which is fairly painful, so users would prefer an
automatically assigned name.

Attached patch allows this, which turns out to be very simple, since a
name assignment function already exists.

The generated name is simple, but that's exactly what users do anyway,
so it is not too bad.


good idea

Pavel
 
Tests, docs included.

--
Simon Riggs                http://www.EnterpriseDB.com/

Re: Make name optional in CREATE STATISTICS

From
Matthias van de Meent
Date:
On Sun, 15 May 2022 at 14:20, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> Currently, CREATE STATS requires you to think of a name for each stats
> object, which is fairly painful, so users would prefer an
> automatically assigned name.
>
> Attached patch allows this, which turns out to be very simple, since a
> name assignment function already exists.
>
> The generated name is simple, but that's exactly what users do anyway,
> so it is not too bad.

Cool.

> Tests, docs included.

Something I noticed is that this grammar change is quite different
from how create index specifies its optional name. Because we already
have a seperate statement sections for with and without IF NOT EXISTS,
adding another branch will add even more duplication. Using a new
opt_name production (potentially renamed from opt_index_name?) would
probably reduce the amount of duplication in the grammar.

We might be able to use opt_if_not_exists to fully remove the
duplicated grammars, but I don't think we would be able to keep the
"CREATE STATISTICS IF NOT EXISTS <<no name>> ON col1, col2 FROM table"
syntax illegal.

Please also update the comment in gram.y above the updated section
that details the expected grammar for CREATE STATISTICS, as you seem
to have overlooked that copy of grammar documentation.

Apart from these two small issues, this passes tests and seems complete.


Kind regards,

Matthias van de Meent



Re: Make name optional in CREATE STATISTICS

From
Simon Riggs
Date:
On Wed, 6 Jul 2022 at 19:35, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> On Sun, 15 May 2022 at 14:20, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> >
> > Currently, CREATE STATS requires you to think of a name for each stats
> > object, which is fairly painful, so users would prefer an
> > automatically assigned name.
> >
> > Attached patch allows this, which turns out to be very simple, since a
> > name assignment function already exists.
> >
> > The generated name is simple, but that's exactly what users do anyway,
> > so it is not too bad.
>
> Cool.

Thanks for your review.


> > Tests, docs included.
>
> Something I noticed is that this grammar change is quite different
> from how create index specifies its optional name. Because we already
> have a seperate statement sections for with and without IF NOT EXISTS,
> adding another branch will add even more duplication. Using a new
> opt_name production (potentially renamed from opt_index_name?) would
> probably reduce the amount of duplication in the grammar.

There are various other ways of doing this and, yes, we could refactor
other parts of the grammar to make this work. There is a specific
guideline about patch submission that says the best way to get a patch
rejected is to include unnecessary changes. With that in mind, let's
keep the patch simple and exactly aimed at the original purpose.

I'll leave it for committers to decide whether other refactoring is wanted.

> We might be able to use opt_if_not_exists to fully remove the
> duplicated grammars, but I don't think we would be able to keep the
> "CREATE STATISTICS IF NOT EXISTS <<no name>> ON col1, col2 FROM table"
> syntax illegal.
>
> Please also update the comment in gram.y above the updated section
> that details the expected grammar for CREATE STATISTICS, as you seem
> to have overlooked that copy of grammar documentation.

I have made the comment show that the name is optional, thank you.


> Apart from these two small issues, this passes tests and seems complete.

Patch v4 attached

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Make name optional in CREATE STATISTICS

From
Matthias van de Meent
Date:
On Thu, 7 Jul 2022 at 12:55, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> There are various other ways of doing this and, yes, we could refactor
> other parts of the grammar to make this work. There is a specific
> guideline about patch submission that says the best way to get a patch
> rejected is to include unnecessary changes. With that in mind, let's
> keep the patch simple and exactly aimed at the original purpose.
>
> I'll leave it for committers to decide whether other refactoring is wanted.

Fair enough.

> I have made the comment show that the name is optional, thank you.

The updated comment implies that IF NOT EXISTS is allowed without a
defined name, which is false:

> + *                CREATE STATISTICS [IF NOT EXISTS] [stats_name] [(stat types)]

A more correct version would be

+ *                CREATE STATISTICS [ [IF NOT EXISTS] stats_name ]
[(stat types)]

> Patch v4 attached

Thanks for working on this!


Kind regards,

Matthias van de Meent



Re: Make name optional in CREATE STATISTICS

From
Simon Riggs
Date:
On Thu, 7 Jul 2022 at 11:58, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> On Thu, 7 Jul 2022 at 12:55, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> > There are various other ways of doing this and, yes, we could refactor
> > other parts of the grammar to make this work. There is a specific
> > guideline about patch submission that says the best way to get a patch
> > rejected is to include unnecessary changes. With that in mind, let's
> > keep the patch simple and exactly aimed at the original purpose.
> >
> > I'll leave it for committers to decide whether other refactoring is wanted.
>
> Fair enough.
>
> > I have made the comment show that the name is optional, thank you.
>
> The updated comment implies that IF NOT EXISTS is allowed without a
> defined name, which is false:
>
> > + *                CREATE STATISTICS [IF NOT EXISTS] [stats_name] [(stat types)]
>
> A more correct version would be
>
> + *                CREATE STATISTICS [ [IF NOT EXISTS] stats_name ]
> [(stat types)]

There you go

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Attachment

Re: Make name optional in CREATE STATISTICS

From
Matthias van de Meent
Date:
On Wed, 13 Jul 2022 at 08:07, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>
> On Thu, 7 Jul 2022 at 11:58, Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > A more correct version would be
> >
> > + *                CREATE STATISTICS [ [IF NOT EXISTS] stats_name ]
> > [(stat types)]
>
> There you go

Thanks!

I think this is ready for a committer, so I've marked it as such.

Kind regards,

Matthias van de Meent



Re: Make name optional in CREATE STATISTICS

From
Dean Rasheed
Date:
On Wed, 20 Jul 2022 at 12:01, Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> On Wed, 13 Jul 2022 at 08:07, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> >
> > > + *                CREATE STATISTICS [ [IF NOT EXISTS] stats_name ]
>
> I think this is ready for a committer, so I've marked it as such.
>

Picking this up...

I tend to agree with Matthias' earlier point about avoiding code
duplication in the grammar. Without going off and refactoring other
parts of the grammar not related to this patch, it's still a slightly
smaller, simpler change, and less code duplication, to do this using a
new opt_stats_name production in the grammar, as in the attached.

I also noticed a comment in CreateStatistics() that needed updating.

Barring any further comments, I'll push this shortly.

Regards,
Dean

Attachment

Re: Make name optional in CREATE STATISTICS

From
Simon Riggs
Date:
On Thu, 21 Jul 2022 at 15:12, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> On Wed, 20 Jul 2022 at 12:01, Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> >
> > On Wed, 13 Jul 2022 at 08:07, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> > >
> > > > + *                CREATE STATISTICS [ [IF NOT EXISTS] stats_name ]
> >
> > I think this is ready for a committer, so I've marked it as such.
> >
>
> Picking this up...
>
> I tend to agree with Matthias' earlier point about avoiding code
> duplication in the grammar. Without going off and refactoring other
> parts of the grammar not related to this patch, it's still a slightly
> smaller, simpler change, and less code duplication, to do this using a
> new opt_stats_name production in the grammar, as in the attached.
>
> I also noticed a comment in CreateStatistics() that needed updating.
>
> Barring any further comments, I'll push this shortly.

Nice change, please proceed.

-- 
Simon Riggs                http://www.EnterpriseDB.com/



Re: Make name optional in CREATE STATISTICS

From
Tomas Vondra
Date:
On 7/21/22 16:12, Dean Rasheed wrote:
> On Wed, 20 Jul 2022 at 12:01, Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
>>
>> On Wed, 13 Jul 2022 at 08:07, Simon Riggs <simon.riggs@enterprisedb.com> wrote:
>>>
>>>> + *                CREATE STATISTICS [ [IF NOT EXISTS] stats_name ]
>>
>> I think this is ready for a committer, so I've marked it as such.
>>
> 
> Picking this up...
> 
> I tend to agree with Matthias' earlier point about avoiding code
> duplication in the grammar. Without going off and refactoring other
> parts of the grammar not related to this patch, it's still a slightly
> smaller, simpler change, and less code duplication, to do this using a
> new opt_stats_name production in the grammar, as in the attached.
> 
> I also noticed a comment in CreateStatistics() that needed updating.
> 
> Barring any further comments, I'll push this shortly.
> 

+1


-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Make name optional in CREATE STATISTICS

From
Alvaro Herrera
Date:
On 2022-Jul-21, Dean Rasheed wrote:

> I tend to agree with Matthias' earlier point about avoiding code
> duplication in the grammar. Without going off and refactoring other
> parts of the grammar not related to this patch, it's still a slightly
> smaller, simpler change, and less code duplication, to do this using a
> new opt_stats_name production in the grammar, as in the attached.
> 
> I also noticed a comment in CreateStatistics() that needed updating.
> 
> Barring any further comments, I'll push this shortly.

Thanks.  I was looking at the recently modified REINDEX syntax and
noticed there another spot for taking an optional name.  I ended up
reusing OptSchemaName for that, as in the attached patch.  I think
adding production-specific additional productions is pointless and
probably bloats the grammar.  So let me +1 your push of the patch you
posted, just to keep things moving forward, but in addition I propose to
later rename OptSchemaName to something more generic and use it in these
three places.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/

Attachment

Re: Make name optional in CREATE STATISTICS

From
Dean Rasheed
Date:
On Thu, 21 Jul 2022 at 18:42, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> Thanks.  I was looking at the recently modified REINDEX syntax and
> noticed there another spot for taking an optional name.  I ended up
> reusing OptSchemaName for that, as in the attached patch.  I think
> adding production-specific additional productions is pointless and
> probably bloats the grammar.  So let me +1 your push of the patch you
> posted, just to keep things moving forward, but in addition I propose to
> later rename OptSchemaName to something more generic and use it in these
> three places.
>

OK, pushed.

Before writing opt_stats_name, I went looking for anything else I
could use, but didn't see anything. The difference between this and
the index case, is that statistics objects can be schema-qualified, so
it might be tricky to get something that'll work for all 3 places.

Regards,
Dean



Re: Make name optional in CREATE STATISTICS

From
Michael Paquier
Date:
On Thu, Jul 21, 2022 at 07:42:12PM +0200, Alvaro Herrera wrote:
> Thanks.  I was looking at the recently modified REINDEX syntax and
> noticed there another spot for taking an optional name.  I ended up
> reusing OptSchemaName for that, as in the attached patch.  I think
> adding production-specific additional productions is pointless and
> probably bloats the grammar.  So let me +1 your push of the patch you
> posted, just to keep things moving forward, but in addition I propose to
> later rename OptSchemaName to something more generic and use it in these
> three places.

This slightly changes the behavior of the grammar, as CONCURRENTLY
was working on DATABASE as follows:
* On HEAD:
=# reindex database concurrently postgres;
REINDEX
=# reindex (concurrently) database concurrently postgres;
REINDEX
=# reindex (concurrently) database ;
REINDEX
=# reindex (concurrently) database postgres;
REINDEX
=# reindex database concurrently postgres;
REINDEX
=# reindex database concurrently;
ERROR:  42601: syntax error at or near ";"

And actually, even on HEAD, the last case is marked as supported by
the docs but we don't allow it in the parser.  My mistake on this
one.

Now, with the patch, I get:
=# reindex (concurrently) database concurrently;
ERROR:  42601: syntax error at or near "concurrently"
LINE 1: reindex (concurrently) database concurrently postgres ;
=# reindex (concurrently) database concurrently postgres;
ERROR:  42601: syntax error at or near "concurrently"
LINE 1: reindex (concurrently) database concurrently postgres;
=# reindex (concurrently) database ;
REINDEX
=# reindex (concurrently) database postgres;
REINDEX
=# reindex database concurrently postgres;
ERROR:  42601: syntax error at or near "concurrently"
LINE 1: reindex database concurrently postgres;
=# reindex database concurrently;
ERROR:  42601: syntax error at or near "concurrently"

So this indeed has as effect to make possible the use of CONCURRENTLY
for DATABASE and SYSTEM only within the parenthesized grammar.  Seeing
the simplifications this creates, I'd agree with dropping this part of
the grammar.  I think that I would add the following queries to
create_index.sql to test this grammar, as we can rely on this code
path generating an error:
REINDEX (CONCURRENTLY) SYSTEM postgres;
REINDEX (CONCURRENTLY) SYSTEM;
At least it would validate the parsing for DATABASE.

This breaks reindexdb for the database case, because the query
generated in run_reindex_command() adds CONCURRENTLY only *after* the
database name, and we should be careful to generate something
backward-compatible in this tool, as well.  The fix is simple: you
can add CONCURRENTLY within the section with TABLESPACE and VERBOSE
for a connection >= 14, and use it after the object name for <= 13.
--
Michael

Attachment

Re: Make name optional in CREATE STATISTICS

From
Alvaro Herrera
Date:
On 2022-Jul-22, Michael Paquier wrote:

> So this indeed has as effect to make possible the use of CONCURRENTLY
> for DATABASE and SYSTEM only within the parenthesized grammar.  Seeing
> the simplifications this creates, I'd agree with dropping this part of
> the grammar.

Actually, looking at the grammar again I realized that the '('options')'
part could be refactored; and with that, keeping an extra production for
REINDEX DATABASE CONCURRENTLY is short enough.  It is removed from
REINDEX SYSTEM, but that's OK because that doesn't work anyway.

I added the new test lines you proposed and amended the docs; the result
is attached.

Initially I wanted to use the "optional list of options" for all
utilities that have similar constructions, (VACUUM, ANALYZE, CLUSTER,
EXPLAIN) but it is not possible because their alternative productions
accept different keywords, so it doesn't look possible.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"If you have nothing to say, maybe you need just the right tool to help you
not say it."                   (New York Times, about Microsoft PowerPoint)

Attachment

Re: Make name optional in CREATE STATISTICS

From
Michael Paquier
Date:
On Fri, Jul 22, 2022 at 03:06:46PM +0200, Alvaro Herrera wrote:
> Actually, looking at the grammar again I realized that the '('options')'
> part could be refactored; and with that, keeping an extra production for
> REINDEX DATABASE CONCURRENTLY is short enough.  It is removed from
> REINDEX SYSTEM, but that's OK because that doesn't work anyway.

I have just looked at 83011ce, and got what you've done here.  You
have thrown away reindex_target_multitable and added three parts for
SCHEMA, DATABASE and SYSTEM instead with their own options, enforcing
the restriction on CONCURRENTLY at the end of REINDEX SYSTEM in the
parser rather than indexcmds.c.

+      | REINDEX opt_reindex_option_list SYSTEM_P opt_single_name
           {
                ReindexStmt *n =
                makeNode(ReindexStmt);
-
-               n->kind = $5;
-               n->name = $7;
+               n->kind = REINDEX_OBJECT_SYSTEM;
+               n->name = NULL;

I think that there is a bug in this logic.  ReindexStmt->name is
always set to NULL, meaning that a REINDEX command with any database
name would still pass, but I don't think that we should allow that.
For example, with something like these commands, we should complain
that the database name specified does not match with the database we
are connected to:
=# reindex system popo_foo_bar;
REINDEX
=# reindex database popo_foo_bar;
REINDEX

It may have been better to wait a bit if you wanted me to look at all
that, as our timezones are not compatible, especially on Fridays, but
that's fine :D.
--
Michael

Attachment

Re: Make name optional in CREATE STATISTICS

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> I have just looked at 83011ce, and got what you've done here.  You
> have thrown away reindex_target_multitable and added three parts for
> SCHEMA, DATABASE and SYSTEM instead with their own options, enforcing
> the restriction on CONCURRENTLY at the end of REINDEX SYSTEM in the
> parser rather than indexcmds.c.

That does not seem like an improvement.  In v15:

regression=# REINDEX SYSTEM CONCURRENTLY db;
ERROR:  cannot reindex system catalogs concurrently

As of HEAD:

regression=# REINDEX SYSTEM CONCURRENTLY db;
ERROR:  syntax error at or near "CONCURRENTLY"
LINE 1: REINDEX SYSTEM CONCURRENTLY db;
                       ^

That is not a very helpful error, not even if the man page
doesn't show the syntax as legal.

            regards, tom lane



Re: Make name optional in CREATE STATISTICS

From
Michael Paquier
Date:
On Fri, Jul 22, 2022 at 11:54:27PM -0400, Tom Lane wrote:
> That does not seem like an improvement.  In v15:
>
> regression=# REINDEX SYSTEM CONCURRENTLY db;
> ERROR:  cannot reindex system catalogs concurrently
>
> As of HEAD:
>
> regression=# REINDEX SYSTEM CONCURRENTLY db;
> ERROR:  syntax error at or near "CONCURRENTLY"
> LINE 1: REINDEX SYSTEM CONCURRENTLY db;
>                        ^
>
> That is not a very helpful error, not even if the man page
> doesn't show the syntax as legal.

As the problem comes down to the fact that INDEX/TABLE, SCHEMA and
DATABASE/SYSTEM need to handle names for different object types each,
I think that we could do something like the attached, removing one
block on the way at the cost of an extra parser node.

By the way, it seems that 83011ce also broke the case of "REINDEX
DATABASE CONCURRENTLY", where the parser missed the addition of a
DefElem for "concurrently" in this case.
--
Michael

Attachment

Re: Make name optional in CREATE STATISTICS

From
Alvaro Herrera
Date:
On 2022-Jul-23, Michael Paquier wrote:

> As the problem comes down to the fact that INDEX/TABLE, SCHEMA and
> DATABASE/SYSTEM need to handle names for different object types each,
> I think that we could do something like the attached, removing one
> block on the way at the cost of an extra parser node.

Yeah, looks good.  I propose to also test the error for reindexing a
different database, which is currently uncovered, as attached.

> By the way, it seems that 83011ce also broke the case of "REINDEX
> DATABASE CONCURRENTLY", where the parser missed the addition of a
> DefElem for "concurrently" in this case.

Wow.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Escucha y olvidarás; ve y recordarás; haz y entenderás" (Confucio)

Attachment

Re: Make name optional in CREATE STATISTICS

From
Michael Paquier
Date:
On Mon, Jul 25, 2022 at 11:49:50AM +0200, Alvaro Herrera wrote:
> On 2022-Jul-23, Michael Paquier wrote:
>> As the problem comes down to the fact that INDEX/TABLE, SCHEMA and
>> DATABASE/SYSTEM need to handle names for different object types each,
>> I think that we could do something like the attached, removing one
>> block on the way at the cost of an extra parser node.
>
> Yeah, looks good.  I propose to also test the error for reindexing a
> different database, which is currently uncovered, as attached.

Good idea.

>> By the way, it seems that 83011ce also broke the case of "REINDEX
>> DATABASE CONCURRENTLY", where the parser missed the addition of a
>> DefElem for "concurrently" in this case.
>
> Wow.

For this one, we have a gap in the test, actually.  It seems to me
that we'd better make sure that the OID of the indexes rebuilt
concurrently is changed.  There is a REINDEX DATABASE CONCURRENTLY
already in the TAP tests, and the only thing that would be needed for
the job is an extra query that compares the OID saved before the
reindex with the one in the catalogs after the fact..
--
Michael

Attachment

Re: Make name optional in CREATE STATISTICS

From
Alvaro Herrera
Date:
On 2022-Jul-25, Michael Paquier wrote:

> On Mon, Jul 25, 2022 at 11:49:50AM +0200, Alvaro Herrera wrote:
> > On 2022-Jul-23, Michael Paquier wrote:

> >> By the way, it seems that 83011ce also broke the case of "REINDEX
> >> DATABASE CONCURRENTLY", where the parser missed the addition of a
> >> DefElem for "concurrently" in this case.
> > 
> > Wow.
> 
> For this one, we have a gap in the test, actually.  It seems to me
> that we'd better make sure that the OID of the indexes rebuilt
> concurrently is changed.  There is a REINDEX DATABASE CONCURRENTLY
> already in the TAP tests, and the only thing that would be needed for
> the job is an extra query that compares the OID saved before the
> reindex with the one in the catalogs after the fact..

Agreed.  I think you already have the query for that elsewhere in the
test, so it's just a matter of copying it from there.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"No tengo por qué estar de acuerdo con lo que pienso"
                             (Carlos Caszeli)



Re: Make name optional in CREATE STATISTICS

From
Michael Paquier
Date:
On Mon, Jul 25, 2022 at 12:55:54PM +0200, Alvaro Herrera wrote:
> Agreed.  I think you already have the query for that elsewhere in the
> test, so it's just a matter of copying it from there.

I actually already wrote most of it in 2cbc3c1, and I just needed to
extend things a bit to detect the OID changes :)

So, applied, with all the extra tests.
--
Michael

Attachment

Re: Make name optional in CREATE STATISTICS

From
Alvaro Herrera
Date:
On 2022-Jul-26, Michael Paquier wrote:

> On Mon, Jul 25, 2022 at 12:55:54PM +0200, Alvaro Herrera wrote:
> > Agreed.  I think you already have the query for that elsewhere in the
> > test, so it's just a matter of copying it from there.
> 
> I actually already wrote most of it in 2cbc3c1, and I just needed to
> extend things a bit to detect the OID changes :)
> 
> So, applied, with all the extra tests.

Thank you!

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"I am amazed at [the pgsql-sql] mailing list for the wonderful support, and
lack of hesitasion in answering a lost soul's question, I just wished the rest
of the mailing list could be like this."                               (Fotis)
               (http://archives.postgresql.org/pgsql-sql/2006-06/msg00265.php)