Thread: Getting rid of SQLValueFunction

Getting rid of SQLValueFunction

From
Michael Paquier
Date:
Hi all,

I have bumped a few days ago on the fact that COERCE_SQL_SYNTAX
(introduced by 40c24bf) and SQLValueFunction are around to do the
exact same thing, as known as enforcing single-function calls with
dedicated SQL keywords.  For example, keywords like SESSION_USER,
CURRENT_DATE, etc. go through SQLValueFunction and rely on the parser
to set a state that gets then used in execExprInterp.c.  And it is
rather easy to implement incorrect SQLValueFunctions, as these rely on
more hardcoded assumptions in the parser and executor than the
equivalent FuncCalls (like collation to assign when using a text-like
SQLValueFunctions).

There are two categories of single-value functions:
- The ones returning names, where we enforce a C collation in two
places of the code (current_role, role, current_catalog,
current_schema, current_database, current_user), even if
get_typcollation() should do that for name types.
- The ones working on time, date and timestamps (localtime[stamp],
current_date, current_time[stamp]), for 9 patterns as these accept an
optional typmod.

I have dug into the possibility to unify all that with a single
interface, and finish with the attached patch set which is a reduction
of code, where all the SQLValueFunctions are replaced by a set of
FuncCalls:
 25 files changed, 338 insertions(+), 477 deletions(-)

0001 is the move done for the name-related functions, cleaning up two
places in the executor when a C collation is assigned to those
function expressions.  0002 is the remaining cleanup for the
time-related ones, moving a set of parser-side checks to the execution
path within each function, so as all this knowledge is now local to
each file holding the date and timestamp types.  Most of the gain is
in 0002, obviously.

The pg_proc entries introduced for the sake of the move use the same
name as the SQL keywords.  These should perhaps be prefixed with a
"pg_" at least.  There would be an exception with pg_localtime[stamp],
though, where we could use a pg_localtime[stamp]_sql for the function
name for prosrc.  I am open to suggestions for these names.

Thoughts?
--
Michael

Attachment

Re: Getting rid of SQLValueFunction

From
Corey Huinker
Date:


On Fri, Sep 30, 2022 at 2:04 AM Michael Paquier <michael@paquier.xyz> wrote:
Hi all,

I have bumped a few days ago on the fact that COERCE_SQL_SYNTAX
(introduced by 40c24bf) and SQLValueFunction are around to do the
exact same thing, as known as enforcing single-function calls with
dedicated SQL keywords.  For example, keywords like SESSION_USER,
CURRENT_DATE, etc. go through SQLValueFunction and rely on the parser
to set a state that gets then used in execExprInterp.c.  And it is
rather easy to implement incorrect SQLValueFunctions, as these rely on
more hardcoded assumptions in the parser and executor than the
equivalent FuncCalls (like collation to assign when using a text-like
SQLValueFunctions).

There are two categories of single-value functions:
- The ones returning names, where we enforce a C collation in two
places of the code (current_role, role, current_catalog,
current_schema, current_database, current_user), even if
get_typcollation() should do that for name types.
- The ones working on time, date and timestamps (localtime[stamp],
current_date, current_time[stamp]), for 9 patterns as these accept an
optional typmod.

I have dug into the possibility to unify all that with a single
interface, and finish with the attached patch set which is a reduction
of code, where all the SQLValueFunctions are replaced by a set of
FuncCalls:
 25 files changed, 338 insertions(+), 477 deletions(-)

0001 is the move done for the name-related functions, cleaning up two
places in the executor when a C collation is assigned to those
function expressions.  0002 is the remaining cleanup for the
time-related ones, moving a set of parser-side checks to the execution
path within each function, so as all this knowledge is now local to
each file holding the date and timestamp types.  Most of the gain is
in 0002, obviously.

The pg_proc entries introduced for the sake of the move use the same
name as the SQL keywords.  These should perhaps be prefixed with a
"pg_" at least.  There would be an exception with pg_localtime[stamp],
though, where we could use a pg_localtime[stamp]_sql for the function
name for prosrc.  I am open to suggestions for these names.

Thoughts?
--
Michael

I like this a lot. Deleted code is debugged code.

Patch applies and passes make check-world.

No trace of SQLValueFunction is left in the codebase, at least according to `git grep -l`.

I have only one non-nitpick question about the code:

+ /*
+ * we're not too tense about good error message here because grammar
+ * shouldn't allow wrong number of modifiers for TIME
+ */
+ if (n != 1)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid type modifier")));

I agree that we shouldn't spend too much effort on a good error message here, but perhaps we should have the message mention that it is date/time-related? A person git-grepping for this error message will get 4 hits in .c files (date.c, timestamp.c, varbit.c, varchar.c) so even a slight variation in the message could save them some time.

This is an extreme nitpick, but the patchset seems like it should have been 1 file or 3 (remove name functions, remove time functions, remove SQLValueFunction infrastructure), but that will only matter in the unlikely case that we find a need for SQLValueFunction but we want to leave the timestamp function as COERCE_SQL_SYNTAX.
 

Re: Getting rid of SQLValueFunction

From
Michael Paquier
Date:
(Adding Tom in CC, in case.)

On Tue, Oct 18, 2022 at 04:35:33PM -0400, Corey Huinker wrote:
> I agree that we shouldn't spend too much effort on a good error message
> here, but perhaps we should have the message mention that it is
> date/time-related? A person git-grepping for this error message will get 4
> hits in .c files (date.c, timestamp.c, varbit.c, varchar.c) so even a
> slight variation in the message could save them some time.

The message is the same between HEAD and the patch and these have been
around for a long time, except that we would see it at parsing time on
HEAD, and at executor time with the patch.  I would not mind changing
if there are better ideas than what's used now, of course ;)

> This is an extreme nitpick, but the patchset seems like it should have been
> 1 file or 3 (remove name functions, remove time functions, remove
> SQLValueFunction infrastructure), but that will only matter in the unlikely
> case that we find a need for SQLValueFunction but we want to leave the
> timestamp function as COERCE_SQL_SYNTAX.

Once the timestamp functions are removed, SQLValueFunction is just
dead code so including its removal in 0002 does not change much in my
opinion.

An other thing I had on my list for this patch was to check its
performance impact.  So I have spent some time having a look at the
perf profiles produced on HEAD and with the patch using queries like
SELECT current_role FROM generate_series(1,N) where N > 10M and I have
not noticed any major differences in runtime or in the profiles, at
the difference that we don't have anymore SQLValueFunction() and its
internal functions called, hence they are missing from the stacks, but
that's the whole point of the patch.

With this in mind, would somebody complain if I commit that?  That's a
nice reduction in code, while completing the work done in 40c24bf:
 25 files changed, 338 insertions(+), 477 deletions(-)

Thanks,
--
Michael

Attachment

Re: Getting rid of SQLValueFunction

From
Michael Paquier
Date:
On Wed, Oct 19, 2022 at 03:45:48PM +0900, Michael Paquier wrote:
> With this in mind, would somebody complain if I commit that?  That's a
> nice reduction in code, while completing the work done in 40c24bf:
>  25 files changed, 338 insertions(+), 477 deletions(-)

On second look, there is something I have underestimated here with
FigureColnameInternal().  This function would create an attribute name
based on the SQL keyword given in input.  For example, on HEAD we
would get that:
=# SELECT * FROM CURRENT_CATALOG;
 current_catalog
-----------------
 postgres
(1 row)

But the patch enforces the attribute name to be the underlying
function name, switching the previous "current_catalog" to
"current_database".  For example:
=# SELECT * FROM CURRENT_CATALOG;
 current_database
------------------
 postgres
(1 row)

I am not sure how much it matters in practice, but this could break
some queries.  One way to tackle that is to extend
FigureColnameInternal() so as we use a compatible name when the node
is a T_FuncCall, but that won't be entirely water-proof as long as
there is not a one-one mapping between the SQL keywords and the
underlying function names, aka we would need a current_catalog.
"user" would be also too generic as a catalog function name, so we
should name its proc entry to a pg_user anyway, requiring a shortcut
in FigureColnameInternal().  Or perhaps I am worrying too much and
keeping the code simpler is better?  Does the SQL specification
require that the attribute name has to match its SQL keyword when
specified in a FROM clause when there is no aliases?

Thoughts?
--
Michael

Attachment

Re: Getting rid of SQLValueFunction

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> But the patch enforces the attribute name to be the underlying
> function name, switching the previous "current_catalog" to
> "current_database".

The entire point of SQLValueFunction IMO was to hide the underlying
implementation(s).  Replacing it with something that leaks
implementation details does not seem like a step forward.

            regards, tom lane



Re: Getting rid of SQLValueFunction

From
Michael Paquier
Date:
On Thu, Oct 20, 2022 at 11:10:22PM -0400, Tom Lane wrote:
> The entire point of SQLValueFunction IMO was to hide the underlying
> implementation(s).  Replacing it with something that leaks
> implementation details does not seem like a step forward.

Hmm..  Okay, thanks.  So this just comes down that I am going to need
one different pg_proc entry per SQL keyword, then, or this won't fly
far.  For example, note that on HEAD or with the patch, a view with a
SQL keyword in a FROM clause translates the same way with quotes
applied in the same places, as of:
=# create view test as select (SELECT * FROM CURRENT_USER) as cu;
CREATE VIEW
=# select pg_get_viewdef('test', true);
                           pg_get_viewdef
---------------------------------------------------------------------
  SELECT ( SELECT "current_user"."current_user"                     +
            FROM CURRENT_USER "current_user"("current_user")) AS cu;
(1 row)

A sticky point is that this would need the creation of a pg_proc entry
for "user" which is a generic word, or a shortcut around
FigureColnameInternal().  The code gain overall still looks appealing
in the executor, even if we do all that and the resulting backend code
gets kind of nicer and easier to maintain long-term IMO.
--
Michael

Attachment

Re: Getting rid of SQLValueFunction

From
Michael Paquier
Date:
On Fri, Oct 21, 2022 at 12:34:23PM +0900, Michael Paquier wrote:
> A sticky point is that this would need the creation of a pg_proc entry
> for "user" which is a generic word, or a shortcut around
> FigureColnameInternal().  The code gain overall still looks appealing
> in the executor, even if we do all that and the resulting backend code
> gets kind of nicer and easier to maintain long-term IMO.

I have looked at that, and the attribute mapping remains compatible
with past versions once the appropriate pg_proc entries are added.
The updated patch set attached does that (with a user() function as
well to keep the code a maximum simple), with more tests to cover the
attribute case mentioned upthread.
--
Michael

Attachment

Re: Getting rid of SQLValueFunction

From
Michael Paquier
Date:
On Fri, Oct 21, 2022 at 02:27:07PM +0900, Michael Paquier wrote:
> I have looked at that, and the attribute mapping remains compatible
> with past versions once the appropriate pg_proc entries are added.
> The updated patch set attached does that (with a user() function as
> well to keep the code a maximum simple), with more tests to cover the
> attribute case mentioned upthread.

Attached is a rebased patch set, as of the conflicts from 2e0d80c.
--
Michael

Attachment

Re: Getting rid of SQLValueFunction

From
Michael Paquier
Date:
On Tue, Oct 25, 2022 at 02:20:12PM +0900, Michael Paquier wrote:
> Attached is a rebased patch set, as of the conflicts from 2e0d80c.

So, this patch set has been sitting in the CF app for a few weeks now,
and I would like to apply them to remove a bit of code from the
executor.

Please note that in order to avoid tweaks when choosing the attribute
name of function call, this needs a total of 8 new catalog functions
mapping to the SQL keywords, which is what the test added by 2e0d80c
is about:
- current_role
- user
- current_catalog
- current_date
- current_time
- current_timestamp
- localtime
- localtimestamp

Any objections?
--
Michael

Attachment

Re: Getting rid of SQLValueFunction

From
Michael Paquier
Date:
On Fri, Nov 18, 2022 at 10:23:58AM +0900, Michael Paquier wrote:
> Please note that in order to avoid tweaks when choosing the attribute
> name of function call, this needs a total of 8 new catalog functions
> mapping to the SQL keywords, which is what the test added by 2e0d80c
> is about:
> - current_role
> - user
> - current_catalog
> - current_date
> - current_time
> - current_timestamp
> - localtime
> - localtimestamp
>
> Any objections?

Hearing nothing, I have gone through 0001 again and applied it as
fb32748 to remove the dependency between names and SQLValueFunction.
Attached is 0002, to bring back the CI to a green state.
--
Michael

Attachment

Re: Getting rid of SQLValueFunction

From
Ted Yu
Date:


On Sat, Nov 19, 2022 at 7:01 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Nov 18, 2022 at 10:23:58AM +0900, Michael Paquier wrote:
> Please note that in order to avoid tweaks when choosing the attribute
> name of function call, this needs a total of 8 new catalog functions
> mapping to the SQL keywords, which is what the test added by 2e0d80c
> is about:
> - current_role
> - user
> - current_catalog
> - current_date
> - current_time
> - current_timestamp
> - localtime
> - localtimestamp
>
> Any objections?

Hearing nothing, I have gone through 0001 again and applied it as
fb32748 to remove the dependency between names and SQLValueFunction.
Attached is 0002, to bring back the CI to a green state.
--
Michael

Hi,
For get_func_sql_syntax(), the code for cases of F_CURRENT_TIME, F_CURRENT_TIMESTAMP, F_LOCALTIME and F_LOCALTIMESTAMP is mostly the same.
Maybe we can introduce a helper so that code duplication is reduced.

Cheers 

Re: Getting rid of SQLValueFunction

From
Michael Paquier
Date:
On Sun, Nov 20, 2022 at 08:21:10AM -0800, Ted Yu wrote:
> For get_func_sql_syntax(), the code for cases
> of F_CURRENT_TIME, F_CURRENT_TIMESTAMP, F_LOCALTIME and F_LOCALTIMESTAMP is
> mostly the same.
> Maybe we can introduce a helper so that code duplication is reduced.

It would.  Thanks for the suggestion.

Do you like something like the patch 0002 attached?  This reduces a
bit the overall size of the patch.  Both ought to be merged in the
same commit, still it is easier to see the simplification created.
--
Michael

Attachment

Re: Getting rid of SQLValueFunction

From
Ted Yu
Date:


On Sun, Nov 20, 2022 at 3:12 PM Michael Paquier <michael@paquier.xyz> wrote:
On Sun, Nov 20, 2022 at 08:21:10AM -0800, Ted Yu wrote:
> For get_func_sql_syntax(), the code for cases
> of F_CURRENT_TIME, F_CURRENT_TIMESTAMP, F_LOCALTIME and F_LOCALTIMESTAMP is
> mostly the same.
> Maybe we can introduce a helper so that code duplication is reduced.

It would.  Thanks for the suggestion.

Do you like something like the patch 0002 attached?  This reduces a
bit the overall size of the patch.  Both ought to be merged in the
same commit, still it is easier to see the simplification created.
--
Michael
Hi,
Thanks for the quick response.

+ * timestamp.  These require a specific handling with their typmod is given
+ * by the function caller through their SQL keyword. 

typo: typmod is given -> typmod given

Other than the above, code looks good to me.

Cheers

Re: Getting rid of SQLValueFunction

From
Michael Paquier
Date:
On Sun, Nov 20, 2022 at 03:15:34PM -0800, Ted Yu wrote:
> + * timestamp.  These require a specific handling with their typmod is given
> + * by the function caller through their SQL keyword.
>
> typo: typmod is given -> typmod given
>
> Other than the above, code looks good to me.

Thanks for double-checking.  I intended a different wording, actually,
so fixed this one.  And applied after an extra round of reviews.
--
Michael

Attachment

Re: Getting rid of SQLValueFunction

From
Ian Lawrence Barwick
Date:
Hi

2022年11月21日(月) 18:39 Michael Paquier <michael@paquier.xyz>:
>
> On Sun, Nov 20, 2022 at 03:15:34PM -0800, Ted Yu wrote:
> > + * timestamp.  These require a specific handling with their typmod is given
> > + * by the function caller through their SQL keyword.
> >
> > typo: typmod is given -> typmod given
> >
> > Other than the above, code looks good to me.
>
> Thanks for double-checking.  I intended a different wording, actually,
> so fixed this one.  And applied after an extra round of reviews.

I noticed this commit (f193883f) introduces following regressions:

    postgres=# SELECT current_timestamp(7);
    WARNING:  TIMESTAMP(7) WITH TIME ZONE precision reduced to maximum
allowed, 6
    ERROR:  timestamp(7) precision must be between 0 and 6

    postgres=# SELECT localtimestamp(7);
    WARNING:  TIMESTAMP(7) precision reduced to maximum allowed, 6
    ERROR:  timestamp(7) precision must be between 0 and 6

Suggested fix attached.

Regards

Ian Barwick

Attachment

Re: Getting rid of SQLValueFunction

From
Michael Paquier
Date:
On Fri, Dec 30, 2022 at 10:57:52AM +0900, Ian Lawrence Barwick wrote:
> I noticed this commit (f193883f) introduces following regressions:
>
>     postgres=# SELECT current_timestamp(7);
>     WARNING:  TIMESTAMP(7) WITH TIME ZONE precision reduced to maximum
> allowed, 6
>     ERROR:  timestamp(7) precision must be between 0 and 6
>
>     postgres=# SELECT localtimestamp(7);
>     WARNING:  TIMESTAMP(7) precision reduced to maximum allowed, 6
>     ERROR:  timestamp(7) precision must be between 0 and 6
>
> Suggested fix attached.

Thanks for the report, Ian.  Will fix.
--
Michael

Attachment

Re: Getting rid of SQLValueFunction

From
Michael Paquier
Date:
On Fri, Dec 30, 2022 at 10:57:52AM +0900, Ian Lawrence Barwick wrote:
> I noticed this commit (f193883f) introduces following regressions:
>
>     postgres=# SELECT current_timestamp(7);
>     WARNING:  TIMESTAMP(7) WITH TIME ZONE precision reduced to maximum
> allowed, 6
>     ERROR:  timestamp(7) precision must be between 0 and 6
>
>     postgres=# SELECT localtimestamp(7);
>     WARNING:  TIMESTAMP(7) precision reduced to maximum allowed, 6
>     ERROR:  timestamp(7) precision must be between 0 and 6
>
> Suggested fix attached.

The two changes in timestamp.c are fine,  Now I can see that the same
mistake was introduced in date.c.  The WARNINGs were issued and the
compilation went through the same way as the default, but they passed
down an incorrect precision, so I have fixed all that.  Coverage has
been added for all four, while the patch proposed covered only two.
--
Michael

Attachment