Thread: Getting rid of SQLValueFunction
Hi all, I have bumped a few days ago on the fact that COERCE_SQL_SYNTAX (introduced by 40c24bf) and SQLValueFunction are around to do the exact same thing, as known as enforcing single-function calls with dedicated SQL keywords. For example, keywords like SESSION_USER, CURRENT_DATE, etc. go through SQLValueFunction and rely on the parser to set a state that gets then used in execExprInterp.c. And it is rather easy to implement incorrect SQLValueFunctions, as these rely on more hardcoded assumptions in the parser and executor than the equivalent FuncCalls (like collation to assign when using a text-like SQLValueFunctions). There are two categories of single-value functions: - The ones returning names, where we enforce a C collation in two places of the code (current_role, role, current_catalog, current_schema, current_database, current_user), even if get_typcollation() should do that for name types. - The ones working on time, date and timestamps (localtime[stamp], current_date, current_time[stamp]), for 9 patterns as these accept an optional typmod. I have dug into the possibility to unify all that with a single interface, and finish with the attached patch set which is a reduction of code, where all the SQLValueFunctions are replaced by a set of FuncCalls: 25 files changed, 338 insertions(+), 477 deletions(-) 0001 is the move done for the name-related functions, cleaning up two places in the executor when a C collation is assigned to those function expressions. 0002 is the remaining cleanup for the time-related ones, moving a set of parser-side checks to the execution path within each function, so as all this knowledge is now local to each file holding the date and timestamp types. Most of the gain is in 0002, obviously. The pg_proc entries introduced for the sake of the move use the same name as the SQL keywords. These should perhaps be prefixed with a "pg_" at least. There would be an exception with pg_localtime[stamp], though, where we could use a pg_localtime[stamp]_sql for the function name for prosrc. I am open to suggestions for these names. Thoughts? -- Michael
Attachment
On Fri, Sep 30, 2022 at 2:04 AM Michael Paquier <michael@paquier.xyz> wrote:
Hi all,
I have bumped a few days ago on the fact that COERCE_SQL_SYNTAX
(introduced by 40c24bf) and SQLValueFunction are around to do the
exact same thing, as known as enforcing single-function calls with
dedicated SQL keywords. For example, keywords like SESSION_USER,
CURRENT_DATE, etc. go through SQLValueFunction and rely on the parser
to set a state that gets then used in execExprInterp.c. And it is
rather easy to implement incorrect SQLValueFunctions, as these rely on
more hardcoded assumptions in the parser and executor than the
equivalent FuncCalls (like collation to assign when using a text-like
SQLValueFunctions).
There are two categories of single-value functions:
- The ones returning names, where we enforce a C collation in two
places of the code (current_role, role, current_catalog,
current_schema, current_database, current_user), even if
get_typcollation() should do that for name types.
- The ones working on time, date and timestamps (localtime[stamp],
current_date, current_time[stamp]), for 9 patterns as these accept an
optional typmod.
I have dug into the possibility to unify all that with a single
interface, and finish with the attached patch set which is a reduction
of code, where all the SQLValueFunctions are replaced by a set of
FuncCalls:
25 files changed, 338 insertions(+), 477 deletions(-)
0001 is the move done for the name-related functions, cleaning up two
places in the executor when a C collation is assigned to those
function expressions. 0002 is the remaining cleanup for the
time-related ones, moving a set of parser-side checks to the execution
path within each function, so as all this knowledge is now local to
each file holding the date and timestamp types. Most of the gain is
in 0002, obviously.
The pg_proc entries introduced for the sake of the move use the same
name as the SQL keywords. These should perhaps be prefixed with a
"pg_" at least. There would be an exception with pg_localtime[stamp],
though, where we could use a pg_localtime[stamp]_sql for the function
name for prosrc. I am open to suggestions for these names.
Thoughts?
--
Michael
I like this a lot. Deleted code is debugged code.
Patch applies and passes make check-world.
No trace of SQLValueFunction is left in the codebase, at least according to `git grep -l`.
I have only one non-nitpick question about the code:
+ /*+ * we're not too tense about good error message here because grammar+ * shouldn't allow wrong number of modifiers for TIME+ */+ if (n != 1)+ ereport(ERROR,+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),+ errmsg("invalid type modifier")));
I agree that we shouldn't spend too much effort on a good error message here, but perhaps we should have the message mention that it is date/time-related? A person git-grepping for this error message will get 4 hits in .c files (date.c, timestamp.c, varbit.c, varchar.c) so even a slight variation in the message could save them some time.
This is an extreme nitpick, but the patchset seems like it should have been 1 file or 3 (remove name functions, remove time functions, remove SQLValueFunction infrastructure), but that will only matter in the unlikely case that we find a need for SQLValueFunction but we want to leave the timestamp function as COERCE_SQL_SYNTAX.
(Adding Tom in CC, in case.) On Tue, Oct 18, 2022 at 04:35:33PM -0400, Corey Huinker wrote: > I agree that we shouldn't spend too much effort on a good error message > here, but perhaps we should have the message mention that it is > date/time-related? A person git-grepping for this error message will get 4 > hits in .c files (date.c, timestamp.c, varbit.c, varchar.c) so even a > slight variation in the message could save them some time. The message is the same between HEAD and the patch and these have been around for a long time, except that we would see it at parsing time on HEAD, and at executor time with the patch. I would not mind changing if there are better ideas than what's used now, of course ;) > This is an extreme nitpick, but the patchset seems like it should have been > 1 file or 3 (remove name functions, remove time functions, remove > SQLValueFunction infrastructure), but that will only matter in the unlikely > case that we find a need for SQLValueFunction but we want to leave the > timestamp function as COERCE_SQL_SYNTAX. Once the timestamp functions are removed, SQLValueFunction is just dead code so including its removal in 0002 does not change much in my opinion. An other thing I had on my list for this patch was to check its performance impact. So I have spent some time having a look at the perf profiles produced on HEAD and with the patch using queries like SELECT current_role FROM generate_series(1,N) where N > 10M and I have not noticed any major differences in runtime or in the profiles, at the difference that we don't have anymore SQLValueFunction() and its internal functions called, hence they are missing from the stacks, but that's the whole point of the patch. With this in mind, would somebody complain if I commit that? That's a nice reduction in code, while completing the work done in 40c24bf: 25 files changed, 338 insertions(+), 477 deletions(-) Thanks, -- Michael
Attachment
On Wed, Oct 19, 2022 at 03:45:48PM +0900, Michael Paquier wrote: > With this in mind, would somebody complain if I commit that? That's a > nice reduction in code, while completing the work done in 40c24bf: > 25 files changed, 338 insertions(+), 477 deletions(-) On second look, there is something I have underestimated here with FigureColnameInternal(). This function would create an attribute name based on the SQL keyword given in input. For example, on HEAD we would get that: =# SELECT * FROM CURRENT_CATALOG; current_catalog ----------------- postgres (1 row) But the patch enforces the attribute name to be the underlying function name, switching the previous "current_catalog" to "current_database". For example: =# SELECT * FROM CURRENT_CATALOG; current_database ------------------ postgres (1 row) I am not sure how much it matters in practice, but this could break some queries. One way to tackle that is to extend FigureColnameInternal() so as we use a compatible name when the node is a T_FuncCall, but that won't be entirely water-proof as long as there is not a one-one mapping between the SQL keywords and the underlying function names, aka we would need a current_catalog. "user" would be also too generic as a catalog function name, so we should name its proc entry to a pg_user anyway, requiring a shortcut in FigureColnameInternal(). Or perhaps I am worrying too much and keeping the code simpler is better? Does the SQL specification require that the attribute name has to match its SQL keyword when specified in a FROM clause when there is no aliases? Thoughts? -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > But the patch enforces the attribute name to be the underlying > function name, switching the previous "current_catalog" to > "current_database". The entire point of SQLValueFunction IMO was to hide the underlying implementation(s). Replacing it with something that leaks implementation details does not seem like a step forward. regards, tom lane
On Thu, Oct 20, 2022 at 11:10:22PM -0400, Tom Lane wrote: > The entire point of SQLValueFunction IMO was to hide the underlying > implementation(s). Replacing it with something that leaks > implementation details does not seem like a step forward. Hmm.. Okay, thanks. So this just comes down that I am going to need one different pg_proc entry per SQL keyword, then, or this won't fly far. For example, note that on HEAD or with the patch, a view with a SQL keyword in a FROM clause translates the same way with quotes applied in the same places, as of: =# create view test as select (SELECT * FROM CURRENT_USER) as cu; CREATE VIEW =# select pg_get_viewdef('test', true); pg_get_viewdef --------------------------------------------------------------------- SELECT ( SELECT "current_user"."current_user" + FROM CURRENT_USER "current_user"("current_user")) AS cu; (1 row) A sticky point is that this would need the creation of a pg_proc entry for "user" which is a generic word, or a shortcut around FigureColnameInternal(). The code gain overall still looks appealing in the executor, even if we do all that and the resulting backend code gets kind of nicer and easier to maintain long-term IMO. -- Michael
Attachment
On Fri, Oct 21, 2022 at 12:34:23PM +0900, Michael Paquier wrote: > A sticky point is that this would need the creation of a pg_proc entry > for "user" which is a generic word, or a shortcut around > FigureColnameInternal(). The code gain overall still looks appealing > in the executor, even if we do all that and the resulting backend code > gets kind of nicer and easier to maintain long-term IMO. I have looked at that, and the attribute mapping remains compatible with past versions once the appropriate pg_proc entries are added. The updated patch set attached does that (with a user() function as well to keep the code a maximum simple), with more tests to cover the attribute case mentioned upthread. -- Michael
Attachment
On Fri, Oct 21, 2022 at 02:27:07PM +0900, Michael Paquier wrote: > I have looked at that, and the attribute mapping remains compatible > with past versions once the appropriate pg_proc entries are added. > The updated patch set attached does that (with a user() function as > well to keep the code a maximum simple), with more tests to cover the > attribute case mentioned upthread. Attached is a rebased patch set, as of the conflicts from 2e0d80c. -- Michael
Attachment
On Tue, Oct 25, 2022 at 02:20:12PM +0900, Michael Paquier wrote: > Attached is a rebased patch set, as of the conflicts from 2e0d80c. So, this patch set has been sitting in the CF app for a few weeks now, and I would like to apply them to remove a bit of code from the executor. Please note that in order to avoid tweaks when choosing the attribute name of function call, this needs a total of 8 new catalog functions mapping to the SQL keywords, which is what the test added by 2e0d80c is about: - current_role - user - current_catalog - current_date - current_time - current_timestamp - localtime - localtimestamp Any objections? -- Michael
Attachment
On Fri, Nov 18, 2022 at 10:23:58AM +0900, Michael Paquier wrote: > Please note that in order to avoid tweaks when choosing the attribute > name of function call, this needs a total of 8 new catalog functions > mapping to the SQL keywords, which is what the test added by 2e0d80c > is about: > - current_role > - user > - current_catalog > - current_date > - current_time > - current_timestamp > - localtime > - localtimestamp > > Any objections? Hearing nothing, I have gone through 0001 again and applied it as fb32748 to remove the dependency between names and SQLValueFunction. Attached is 0002, to bring back the CI to a green state. -- Michael
Attachment
On Sat, Nov 19, 2022 at 7:01 PM Michael Paquier <michael@paquier.xyz> wrote:
On Fri, Nov 18, 2022 at 10:23:58AM +0900, Michael Paquier wrote:
> Please note that in order to avoid tweaks when choosing the attribute
> name of function call, this needs a total of 8 new catalog functions
> mapping to the SQL keywords, which is what the test added by 2e0d80c
> is about:
> - current_role
> - user
> - current_catalog
> - current_date
> - current_time
> - current_timestamp
> - localtime
> - localtimestamp
>
> Any objections?
Hearing nothing, I have gone through 0001 again and applied it as
fb32748 to remove the dependency between names and SQLValueFunction.
Attached is 0002, to bring back the CI to a green state.
--
Michael
Hi,
For get_func_sql_syntax(), the code for cases of F_CURRENT_TIME, F_CURRENT_TIMESTAMP, F_LOCALTIME and F_LOCALTIMESTAMP is mostly the same.
Maybe we can introduce a helper so that code duplication is reduced.
Cheers
On Sun, Nov 20, 2022 at 08:21:10AM -0800, Ted Yu wrote: > For get_func_sql_syntax(), the code for cases > of F_CURRENT_TIME, F_CURRENT_TIMESTAMP, F_LOCALTIME and F_LOCALTIMESTAMP is > mostly the same. > Maybe we can introduce a helper so that code duplication is reduced. It would. Thanks for the suggestion. Do you like something like the patch 0002 attached? This reduces a bit the overall size of the patch. Both ought to be merged in the same commit, still it is easier to see the simplification created. -- Michael
Attachment
On Sun, Nov 20, 2022 at 3:12 PM Michael Paquier <michael@paquier.xyz> wrote:
On Sun, Nov 20, 2022 at 08:21:10AM -0800, Ted Yu wrote:
> For get_func_sql_syntax(), the code for cases
> of F_CURRENT_TIME, F_CURRENT_TIMESTAMP, F_LOCALTIME and F_LOCALTIMESTAMP is
> mostly the same.
> Maybe we can introduce a helper so that code duplication is reduced.
It would. Thanks for the suggestion.
Do you like something like the patch 0002 attached? This reduces a
bit the overall size of the patch. Both ought to be merged in the
same commit, still it is easier to see the simplification created.
--
Michael
Hi,
Thanks for the quick response.
+ * by the function caller through their SQL keyword.
typo: typmod is given -> typmod given
Other than the above, code looks good to me.
Cheers
On Sun, Nov 20, 2022 at 03:15:34PM -0800, Ted Yu wrote: > + * timestamp. These require a specific handling with their typmod is given > + * by the function caller through their SQL keyword. > > typo: typmod is given -> typmod given > > Other than the above, code looks good to me. Thanks for double-checking. I intended a different wording, actually, so fixed this one. And applied after an extra round of reviews. -- Michael
Attachment
Hi 2022年11月21日(月) 18:39 Michael Paquier <michael@paquier.xyz>: > > On Sun, Nov 20, 2022 at 03:15:34PM -0800, Ted Yu wrote: > > + * timestamp. These require a specific handling with their typmod is given > > + * by the function caller through their SQL keyword. > > > > typo: typmod is given -> typmod given > > > > Other than the above, code looks good to me. > > Thanks for double-checking. I intended a different wording, actually, > so fixed this one. And applied after an extra round of reviews. I noticed this commit (f193883f) introduces following regressions: postgres=# SELECT current_timestamp(7); WARNING: TIMESTAMP(7) WITH TIME ZONE precision reduced to maximum allowed, 6 ERROR: timestamp(7) precision must be between 0 and 6 postgres=# SELECT localtimestamp(7); WARNING: TIMESTAMP(7) precision reduced to maximum allowed, 6 ERROR: timestamp(7) precision must be between 0 and 6 Suggested fix attached. Regards Ian Barwick
Attachment
On Fri, Dec 30, 2022 at 10:57:52AM +0900, Ian Lawrence Barwick wrote: > I noticed this commit (f193883f) introduces following regressions: > > postgres=# SELECT current_timestamp(7); > WARNING: TIMESTAMP(7) WITH TIME ZONE precision reduced to maximum > allowed, 6 > ERROR: timestamp(7) precision must be between 0 and 6 > > postgres=# SELECT localtimestamp(7); > WARNING: TIMESTAMP(7) precision reduced to maximum allowed, 6 > ERROR: timestamp(7) precision must be between 0 and 6 > > Suggested fix attached. Thanks for the report, Ian. Will fix. -- Michael
Attachment
On Fri, Dec 30, 2022 at 10:57:52AM +0900, Ian Lawrence Barwick wrote: > I noticed this commit (f193883f) introduces following regressions: > > postgres=# SELECT current_timestamp(7); > WARNING: TIMESTAMP(7) WITH TIME ZONE precision reduced to maximum > allowed, 6 > ERROR: timestamp(7) precision must be between 0 and 6 > > postgres=# SELECT localtimestamp(7); > WARNING: TIMESTAMP(7) precision reduced to maximum allowed, 6 > ERROR: timestamp(7) precision must be between 0 and 6 > > Suggested fix attached. The two changes in timestamp.c are fine, Now I can see that the same mistake was introduced in date.c. The WARNINGs were issued and the compilation went through the same way as the default, but they passed down an incorrect precision, so I have fixed all that. Coverage has been added for all four, while the patch proposed covered only two. -- Michael