Re: SQL/JSON functions vs. ECPG vs. STRING as a reserved word - Mailing list pgsql-hackers

From Tom Lane
Subject Re: SQL/JSON functions vs. ECPG vs. STRING as a reserved word
Date
Msg-id 4158666.1653945615@sss.pgh.pa.us
Whole thread Raw
In response to Re: SQL/JSON functions vs. ECPG vs. STRING as a reserved word  (Michael Meskes <meskes@postgresql.org>)
Responses Re: SQL/JSON functions vs. ECPG vs. STRING as a reserved word
List pgsql-hackers
Michael Meskes <meskes@postgresql.org> writes:
>> This seems to be because what follows ecpgstart can be either a general
>> SQL statement or an ECPGVarDeclaration (beginning with var_type),
>> and bison isn't smart enough to disambiguate.  I have a feeling that
>> this situation could be improved with enough elbow grease, because
>> plpgsql manages to solve a closely-related problem in allowing its
>> assignment statements to coexist with general SQL statements.

> Right, the reason for all this is that the part after the "exec sql" could be
> either language, SQL or C. It has been like this for all those years. I do not
> claim that the solution we have is the best, it's only the best I could come up
> with when I implemented it. If anyone has a better way, please be my guest.

I pushed the proposed patch, but after continuing to think about
it I have an idea for a possible solution to the older problem.
I noticed that the problematic cases in var_type aren't really
interested in seeing any possible unreserved keyword: they care about
certain specific built-in type names (most of which are keywords
already) and about typedef names.  Now, almost every C-parsing program
I've ever seen has to lex typedef names specially, so what if we made
ecpg do that too?  After a couple of false starts, I came up with the
attached draft patch.  The key ideas are:

1. In pgc.l, if an identifier is a typedef name, ignore any possible
keyword meaning and return it as an IDENT.  (I'd originally supposed
that we'd want to return some new TYPEDEF token type, but that does
not seem to be necessary right now, and adding a new token type would
increase the patch footprint quite a bit.)

2. In the var_type production, forget about ECPGColLabel[Common]
and just handle the keywords we know we need, plus IDENT for the
typedef case.  It turns out that we have to have duplicate coding
because most of these keywords are not keywords in C lexing mode,
so that they'll come through the IDENT path anyway when we're
in a C rather than SQL context.  That seemed acceptable to me.
I thought about adding them all to the C keywords list but that
seemed likely to have undesirable side-effects, and again it'd
bloat the patch footprint.

This fix is not without downsides.  Disabling recognition of
keywords that match typedefs means that, for example, if you
declare a typedef named "work" then ECPG will fail to parse
"EXEC SQL BEGIN WORK".  So in a real sense this is just trading
one hazard for another.  But there is an important difference:
with this, whether your ECPG program works depends only on what
typedef names and SQL commands are used in the program.  If
it compiles today it'll still compile next year, whereas with
the present implementation the addition of some new unreserved
SQL keyword could break it.  We'd have to document this change
for sure, and it wouldn't be something to back-patch, but it
seems like it might be acceptable from the users' standpoint.

We could narrow (not eliminate) this hazard if we could get the
typedef lookup in pgc.l to happen only when we're about to parse
a var_type construct.  But because of Bison's lookahead behavior,
that seems to be impossible, or at least undesirably messy
and fragile.  But perhaps somebody else will see a way.

Anyway, this seems like too big a change to consider for v15,
so I'll stick this patch into the v16 CF queue.  It's only
draft quality anyway --- lacks documentation changes and test
cases.  There are also some coding points that could use review.
Notably, I made the typedef lookup override SQL keywords but
not C keywords; this is for consistency with the C-mode lookup
rules, but is it the right thing?

            regards, tom lane

diff --git a/src/interfaces/ecpg/preproc/ecpg.trailer b/src/interfaces/ecpg/preproc/ecpg.trailer
index b95fc44314..fba35f6be6 100644
--- a/src/interfaces/ecpg/preproc/ecpg.trailer
+++ b/src/interfaces/ecpg/preproc/ecpg.trailer
@@ -564,8 +564,29 @@ var_type:    simple_type
             $$.type_index = mm_strdup("-1");
             $$.type_sizeof = NULL;
         }
-        | ECPGColLabelCommon '(' precision opt_scale ')'
+        | NUMERIC '(' precision opt_scale ')'
         {
+            $$.type_enum = ECPGt_numeric;
+            $$.type_str = mm_strdup("numeric");
+            $$.type_dimension = mm_strdup("-1");
+            $$.type_index = mm_strdup("-1");
+            $$.type_sizeof = NULL;
+        }
+        | DECIMAL_P '(' precision opt_scale ')'
+        {
+            $$.type_enum = ECPGt_decimal;
+            $$.type_str = mm_strdup("decimal");
+            $$.type_dimension = mm_strdup("-1");
+            $$.type_index = mm_strdup("-1");
+            $$.type_sizeof = NULL;
+        }
+        | IDENT '(' precision opt_scale ')'
+        {
+            /*
+             * In C parsing mode, NUMERIC and DECIMAL are not keywords, so
+             * they will show up here as a plain identifier, and we need
+             * this duplicate code to recognize them.
+             */
             if (strcmp($1, "numeric") == 0)
             {
                 $$.type_enum = ECPGt_numeric;
@@ -587,15 +608,98 @@ var_type:    simple_type
             $$.type_index = mm_strdup("-1");
             $$.type_sizeof = NULL;
         }
-        | ECPGColLabelCommon ecpg_interval
+        | VARCHAR
         {
-            if (strlen($2) != 0 && strcmp ($1, "datetime") != 0 && strcmp ($1, "interval") != 0)
-                mmerror (PARSE_ERROR, ET_ERROR, "interval specification not allowed here");
+            $$.type_enum = ECPGt_varchar;
+            $$.type_str = EMPTY; /*mm_strdup("varchar");*/
+            $$.type_dimension = mm_strdup("-1");
+            $$.type_index = mm_strdup("-1");
+            $$.type_sizeof = NULL;
+        }
+        | FLOAT_P
+        {
+            /* Note: DOUBLE is handled in simple_type */
+            $$.type_enum = ECPGt_float;
+            $$.type_str = mm_strdup("float");
+            $$.type_dimension = mm_strdup("-1");
+            $$.type_index = mm_strdup("-1");
+            $$.type_sizeof = NULL;
+        }
+        | NUMERIC
+        {
+            $$.type_enum = ECPGt_numeric;
+            $$.type_str = mm_strdup("numeric");
+            $$.type_dimension = mm_strdup("-1");
+            $$.type_index = mm_strdup("-1");
+            $$.type_sizeof = NULL;
+        }
+        | DECIMAL_P
+        {
+            $$.type_enum = ECPGt_decimal;
+            $$.type_str = mm_strdup("decimal");
+            $$.type_dimension = mm_strdup("-1");
+            $$.type_index = mm_strdup("-1");
+            $$.type_sizeof = NULL;
+        }
+        | TIMESTAMP
+        {
+            $$.type_enum = ECPGt_timestamp;
+            $$.type_str = mm_strdup("timestamp");
+            $$.type_dimension = mm_strdup("-1");
+            $$.type_index = mm_strdup("-1");
+            $$.type_sizeof = NULL;
+        }
+        | INTERVAL ecpg_interval
+        {
+            $$.type_enum = ECPGt_interval;
+            $$.type_str = mm_strdup("interval");
+            $$.type_dimension = mm_strdup("-1");
+            $$.type_index = mm_strdup("-1");
+            $$.type_sizeof = NULL;
+        }
+        | STRING
+        {
+            if (INFORMIX_MODE)
+            {
+                /* In Informix mode, "string" is automatically a typedef */
+                $$.type_enum = ECPGt_string;
+                $$.type_str = mm_strdup("char");
+                $$.type_dimension = mm_strdup("-1");
+                $$.type_index = mm_strdup("-1");
+                $$.type_sizeof = NULL;
+            }
+            else
+            {
+                /* Otherwise, legal only if user typedef'ed it */
+                struct typedefs *this = get_typedef("string", false);
+
+                $$.type_str = (this->type->type_enum == ECPGt_varchar || this->type->type_enum == ECPGt_bytea) ? EMPTY
:mm_strdup(this->name); 
+                $$.type_enum = this->type->type_enum;
+                $$.type_dimension = this->type->type_dimension;
+                $$.type_index = this->type->type_index;
+                if (this->type->type_sizeof && strlen(this->type->type_sizeof) != 0)
+                    $$.type_sizeof = this->type->type_sizeof;
+                else
+                    $$.type_sizeof = cat_str(3, mm_strdup("sizeof("), mm_strdup(this->name), mm_strdup(")"));

+                struct_member_list[struct_level] = ECPGstruct_member_dup(this->struct_member_list);
+            }
+        }
+        | IDENT ecpg_interval
+        {
             /*
-             * Check for type names that the SQL grammar treats as
-             * unreserved keywords
+             * In C parsing mode, the above SQL type names are not keywords,
+             * so they will show up here as a plain identifier, and we need
+             * this duplicate code to recognize them.
+             *
+             * Note that we also handle the type names bytea, date, and
+             * datetime here, but not above because those are not currently
+             * SQL keywords.  If they ever become so, they must gain duplicate
+             * productions above.
              */
+            if (strlen($2) != 0 && strcmp ($1, "datetime") != 0 && strcmp ($1, "interval") != 0)
+                mmerror (PARSE_ERROR, ET_ERROR, "interval specification not allowed here");
+
             if (strcmp($1, "varchar") == 0)
             {
                 $$.type_enum = ECPGt_varchar;
@@ -686,45 +790,8 @@ var_type:    simple_type
             }
             else
             {
-                /* this is for typedef'ed types */
-                struct typedefs *this = get_typedef($1);
-
-                $$.type_str = (this->type->type_enum == ECPGt_varchar || this->type->type_enum == ECPGt_bytea) ? EMPTY
:mm_strdup(this->name); 
-                $$.type_enum = this->type->type_enum;
-                $$.type_dimension = this->type->type_dimension;
-                $$.type_index = this->type->type_index;
-                if (this->type->type_sizeof && strlen(this->type->type_sizeof) != 0)
-                    $$.type_sizeof = this->type->type_sizeof;
-                else
-                    $$.type_sizeof = cat_str(3, mm_strdup("sizeof("), mm_strdup(this->name), mm_strdup(")"));
-
-                struct_member_list[struct_level] = ECPGstruct_member_dup(this->struct_member_list);
-            }
-        }
-        | STRING
-        {
-            /*
-             * It's quite horrid that ECPGColLabelCommon excludes
-             * unreserved_keyword, meaning that unreserved keywords can't be
-             * used as type names in var_type.  However, this is hard to avoid
-             * since what follows ecpgstart can be either a random SQL
-             * statement or an ECPGVarDeclaration (beginning with var_type).
-             * Pending a bright idea about how to fix that, we must
-             * special-case STRING (and any other unreserved keywords that are
-             * likely to be needed here).
-             */
-            if (INFORMIX_MODE)
-            {
-                $$.type_enum = ECPGt_string;
-                $$.type_str = mm_strdup("char");
-                $$.type_dimension = mm_strdup("-1");
-                $$.type_index = mm_strdup("-1");
-                $$.type_sizeof = NULL;
-            }
-            else
-            {
-                /* this is for typedef'ed types */
-                struct typedefs *this = get_typedef("string");
+                /* Otherwise, it must be a user-defined typedef name */
+                struct typedefs *this = get_typedef($1, false);

                 $$.type_str = (this->type->type_enum == ECPGt_varchar || this->type->type_enum == ECPGt_bytea) ? EMPTY
:mm_strdup(this->name); 
                 $$.type_enum = this->type->type_enum;
@@ -751,7 +818,7 @@ var_type:    simple_type
             {
                 /* No */

-                this = get_typedef(name);
+                this = get_typedef(name, false);
                 $$.type_str = mm_strdup(this->name);
                 $$.type_enum = this->type->type_enum;
                 $$.type_dimension = this->type->type_dimension;
@@ -1657,17 +1724,14 @@ ColLabel:  ECPGColLabel                { $$ = $1; }
         | ECPGunreserved_interval    { $$ = $1; }
         ;

-ECPGColLabel:  ECPGColLabelCommon    { $$ = $1; }
+ECPGColLabel:  ecpg_ident            { $$ = $1; }
         | unreserved_keyword        { $$ = $1; }
-        | reserved_keyword            { $$ = $1; }
-        | ECPGKeywords_rest            { $$ = $1; }
-        | CONNECTION                { $$ = mm_strdup("connection"); }
-        ;
-
-ECPGColLabelCommon:  ecpg_ident        { $$ = $1; }
         | col_name_keyword            { $$ = $1; }
         | type_func_name_keyword    { $$ = $1; }
+        | reserved_keyword            { $$ = $1; }
         | ECPGKeywords_vanames        { $$ = $1; }
+        | ECPGKeywords_rest            { $$ = $1; }
+        | CONNECTION                { $$ = mm_strdup("connection"); }
         ;

 ECPGCKeywords: S_AUTO                { $$ = mm_strdup("auto"); }
diff --git a/src/interfaces/ecpg/preproc/ecpg.type b/src/interfaces/ecpg/preproc/ecpg.type
index d1cde691c0..4fe80a5a4b 100644
--- a/src/interfaces/ecpg/preproc/ecpg.type
+++ b/src/interfaces/ecpg/preproc/ecpg.type
@@ -3,7 +3,6 @@
 %type <str> ECPGCKeywords
 %type <str> ECPGColId
 %type <str> ECPGColLabel
-%type <str> ECPGColLabelCommon
 %type <str> ECPGConnect
 %type <str> ECPGCursorStmt
 %type <str> ECPGDeallocateDescr
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index 996718cb8a..c344f8f30f 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -983,10 +983,19 @@ cppline            {space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
                     {
                         int        kwvalue;

-                        /* Is it an SQL/ECPG keyword? */
-                        kwvalue = ScanECPGKeywordLookup(yytext);
-                        if (kwvalue >= 0)
-                            return kwvalue;
+                        /*
+                         * User-defined typedefs override SQL keywords, but
+                         * not C keywords.  Currently, a typedef name is just
+                         * reported as IDENT, but someday we might need to
+                         * return a distinct token type.
+                         */
+                        if (get_typedef(yytext, true) == NULL)
+                        {
+                            /* Is it an SQL/ECPG keyword? */
+                            kwvalue = ScanECPGKeywordLookup(yytext);
+                            if (kwvalue >= 0)
+                                return kwvalue;
+                        }

                         /* Is it a C keyword? */
                         kwvalue = ScanCKeywordLookup(yytext);
diff --git a/src/interfaces/ecpg/preproc/preproc_extern.h b/src/interfaces/ecpg/preproc/preproc_extern.h
index 992797b8bb..6be59b7193 100644
--- a/src/interfaces/ecpg/preproc/preproc_extern.h
+++ b/src/interfaces/ecpg/preproc/preproc_extern.h
@@ -93,7 +93,7 @@ extern void add_variable_to_head(struct arguments **, struct variable *, struct
 extern void add_variable_to_tail(struct arguments **, struct variable *, struct variable *);
 extern void remove_variable_from_list(struct arguments **list, struct variable *var);
 extern void dump_variables(struct arguments *, int);
-extern struct typedefs *get_typedef(char *);
+extern struct typedefs *get_typedef(const char *name, bool noerror);
 extern void adjust_array(enum ECPGttype, char **, char **, char *, char *, int, bool);
 extern void reset_variables(void);
 extern void check_indicator(struct ECPGtype *);
diff --git a/src/interfaces/ecpg/preproc/variable.c b/src/interfaces/ecpg/preproc/variable.c
index 887d479e73..2a2b953118 100644
--- a/src/interfaces/ecpg/preproc/variable.c
+++ b/src/interfaces/ecpg/preproc/variable.c
@@ -497,15 +497,20 @@ check_indicator(struct ECPGtype *var)
 }

 struct typedefs *
-get_typedef(char *name)
+get_typedef(const char *name, bool noerror)
 {
     struct typedefs *this;

-    for (this = types; this && strcmp(this->name, name) != 0; this = this->next);
-    if (!this)
+    for (this = types; this != NULL; this = this->next)
+    {
+        if (strcmp(this->name, name) == 0)
+            return this;
+    }
+
+    if (!noerror)
         mmfatal(PARSE_ERROR, "unrecognized data type name \"%s\"", name);

-    return this;
+    return NULL;
 }

 void

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Ignoring BRIN for HOT udpates seems broken
Next
From: David Rowley
Date:
Subject: Re: PG15 beta1 sort performance regression due to Generation context change