Re: SQL/JSON functions vs. ECPG vs. STRING as a reserved word - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: SQL/JSON functions vs. ECPG vs. STRING as a reserved word |
Date | |
Msg-id | 4158666.1653945615@sss.pgh.pa.us Whole thread Raw |
In response to | Re: SQL/JSON functions vs. ECPG vs. STRING as a reserved word (Michael Meskes <meskes@postgresql.org>) |
Responses |
Re: SQL/JSON functions vs. ECPG vs. STRING as a reserved word
|
List | pgsql-hackers |
Michael Meskes <meskes@postgresql.org> writes: >> This seems to be because what follows ecpgstart can be either a general >> SQL statement or an ECPGVarDeclaration (beginning with var_type), >> and bison isn't smart enough to disambiguate. I have a feeling that >> this situation could be improved with enough elbow grease, because >> plpgsql manages to solve a closely-related problem in allowing its >> assignment statements to coexist with general SQL statements. > Right, the reason for all this is that the part after the "exec sql" could be > either language, SQL or C. It has been like this for all those years. I do not > claim that the solution we have is the best, it's only the best I could come up > with when I implemented it. If anyone has a better way, please be my guest. I pushed the proposed patch, but after continuing to think about it I have an idea for a possible solution to the older problem. I noticed that the problematic cases in var_type aren't really interested in seeing any possible unreserved keyword: they care about certain specific built-in type names (most of which are keywords already) and about typedef names. Now, almost every C-parsing program I've ever seen has to lex typedef names specially, so what if we made ecpg do that too? After a couple of false starts, I came up with the attached draft patch. The key ideas are: 1. In pgc.l, if an identifier is a typedef name, ignore any possible keyword meaning and return it as an IDENT. (I'd originally supposed that we'd want to return some new TYPEDEF token type, but that does not seem to be necessary right now, and adding a new token type would increase the patch footprint quite a bit.) 2. In the var_type production, forget about ECPGColLabel[Common] and just handle the keywords we know we need, plus IDENT for the typedef case. It turns out that we have to have duplicate coding because most of these keywords are not keywords in C lexing mode, so that they'll come through the IDENT path anyway when we're in a C rather than SQL context. That seemed acceptable to me. I thought about adding them all to the C keywords list but that seemed likely to have undesirable side-effects, and again it'd bloat the patch footprint. This fix is not without downsides. Disabling recognition of keywords that match typedefs means that, for example, if you declare a typedef named "work" then ECPG will fail to parse "EXEC SQL BEGIN WORK". So in a real sense this is just trading one hazard for another. But there is an important difference: with this, whether your ECPG program works depends only on what typedef names and SQL commands are used in the program. If it compiles today it'll still compile next year, whereas with the present implementation the addition of some new unreserved SQL keyword could break it. We'd have to document this change for sure, and it wouldn't be something to back-patch, but it seems like it might be acceptable from the users' standpoint. We could narrow (not eliminate) this hazard if we could get the typedef lookup in pgc.l to happen only when we're about to parse a var_type construct. But because of Bison's lookahead behavior, that seems to be impossible, or at least undesirably messy and fragile. But perhaps somebody else will see a way. Anyway, this seems like too big a change to consider for v15, so I'll stick this patch into the v16 CF queue. It's only draft quality anyway --- lacks documentation changes and test cases. There are also some coding points that could use review. Notably, I made the typedef lookup override SQL keywords but not C keywords; this is for consistency with the C-mode lookup rules, but is it the right thing? regards, tom lane diff --git a/src/interfaces/ecpg/preproc/ecpg.trailer b/src/interfaces/ecpg/preproc/ecpg.trailer index b95fc44314..fba35f6be6 100644 --- a/src/interfaces/ecpg/preproc/ecpg.trailer +++ b/src/interfaces/ecpg/preproc/ecpg.trailer @@ -564,8 +564,29 @@ var_type: simple_type $$.type_index = mm_strdup("-1"); $$.type_sizeof = NULL; } - | ECPGColLabelCommon '(' precision opt_scale ')' + | NUMERIC '(' precision opt_scale ')' { + $$.type_enum = ECPGt_numeric; + $$.type_str = mm_strdup("numeric"); + $$.type_dimension = mm_strdup("-1"); + $$.type_index = mm_strdup("-1"); + $$.type_sizeof = NULL; + } + | DECIMAL_P '(' precision opt_scale ')' + { + $$.type_enum = ECPGt_decimal; + $$.type_str = mm_strdup("decimal"); + $$.type_dimension = mm_strdup("-1"); + $$.type_index = mm_strdup("-1"); + $$.type_sizeof = NULL; + } + | IDENT '(' precision opt_scale ')' + { + /* + * In C parsing mode, NUMERIC and DECIMAL are not keywords, so + * they will show up here as a plain identifier, and we need + * this duplicate code to recognize them. + */ if (strcmp($1, "numeric") == 0) { $$.type_enum = ECPGt_numeric; @@ -587,15 +608,98 @@ var_type: simple_type $$.type_index = mm_strdup("-1"); $$.type_sizeof = NULL; } - | ECPGColLabelCommon ecpg_interval + | VARCHAR { - if (strlen($2) != 0 && strcmp ($1, "datetime") != 0 && strcmp ($1, "interval") != 0) - mmerror (PARSE_ERROR, ET_ERROR, "interval specification not allowed here"); + $$.type_enum = ECPGt_varchar; + $$.type_str = EMPTY; /*mm_strdup("varchar");*/ + $$.type_dimension = mm_strdup("-1"); + $$.type_index = mm_strdup("-1"); + $$.type_sizeof = NULL; + } + | FLOAT_P + { + /* Note: DOUBLE is handled in simple_type */ + $$.type_enum = ECPGt_float; + $$.type_str = mm_strdup("float"); + $$.type_dimension = mm_strdup("-1"); + $$.type_index = mm_strdup("-1"); + $$.type_sizeof = NULL; + } + | NUMERIC + { + $$.type_enum = ECPGt_numeric; + $$.type_str = mm_strdup("numeric"); + $$.type_dimension = mm_strdup("-1"); + $$.type_index = mm_strdup("-1"); + $$.type_sizeof = NULL; + } + | DECIMAL_P + { + $$.type_enum = ECPGt_decimal; + $$.type_str = mm_strdup("decimal"); + $$.type_dimension = mm_strdup("-1"); + $$.type_index = mm_strdup("-1"); + $$.type_sizeof = NULL; + } + | TIMESTAMP + { + $$.type_enum = ECPGt_timestamp; + $$.type_str = mm_strdup("timestamp"); + $$.type_dimension = mm_strdup("-1"); + $$.type_index = mm_strdup("-1"); + $$.type_sizeof = NULL; + } + | INTERVAL ecpg_interval + { + $$.type_enum = ECPGt_interval; + $$.type_str = mm_strdup("interval"); + $$.type_dimension = mm_strdup("-1"); + $$.type_index = mm_strdup("-1"); + $$.type_sizeof = NULL; + } + | STRING + { + if (INFORMIX_MODE) + { + /* In Informix mode, "string" is automatically a typedef */ + $$.type_enum = ECPGt_string; + $$.type_str = mm_strdup("char"); + $$.type_dimension = mm_strdup("-1"); + $$.type_index = mm_strdup("-1"); + $$.type_sizeof = NULL; + } + else + { + /* Otherwise, legal only if user typedef'ed it */ + struct typedefs *this = get_typedef("string", false); + + $$.type_str = (this->type->type_enum == ECPGt_varchar || this->type->type_enum == ECPGt_bytea) ? EMPTY :mm_strdup(this->name); + $$.type_enum = this->type->type_enum; + $$.type_dimension = this->type->type_dimension; + $$.type_index = this->type->type_index; + if (this->type->type_sizeof && strlen(this->type->type_sizeof) != 0) + $$.type_sizeof = this->type->type_sizeof; + else + $$.type_sizeof = cat_str(3, mm_strdup("sizeof("), mm_strdup(this->name), mm_strdup(")")); + struct_member_list[struct_level] = ECPGstruct_member_dup(this->struct_member_list); + } + } + | IDENT ecpg_interval + { /* - * Check for type names that the SQL grammar treats as - * unreserved keywords + * In C parsing mode, the above SQL type names are not keywords, + * so they will show up here as a plain identifier, and we need + * this duplicate code to recognize them. + * + * Note that we also handle the type names bytea, date, and + * datetime here, but not above because those are not currently + * SQL keywords. If they ever become so, they must gain duplicate + * productions above. */ + if (strlen($2) != 0 && strcmp ($1, "datetime") != 0 && strcmp ($1, "interval") != 0) + mmerror (PARSE_ERROR, ET_ERROR, "interval specification not allowed here"); + if (strcmp($1, "varchar") == 0) { $$.type_enum = ECPGt_varchar; @@ -686,45 +790,8 @@ var_type: simple_type } else { - /* this is for typedef'ed types */ - struct typedefs *this = get_typedef($1); - - $$.type_str = (this->type->type_enum == ECPGt_varchar || this->type->type_enum == ECPGt_bytea) ? EMPTY :mm_strdup(this->name); - $$.type_enum = this->type->type_enum; - $$.type_dimension = this->type->type_dimension; - $$.type_index = this->type->type_index; - if (this->type->type_sizeof && strlen(this->type->type_sizeof) != 0) - $$.type_sizeof = this->type->type_sizeof; - else - $$.type_sizeof = cat_str(3, mm_strdup("sizeof("), mm_strdup(this->name), mm_strdup(")")); - - struct_member_list[struct_level] = ECPGstruct_member_dup(this->struct_member_list); - } - } - | STRING - { - /* - * It's quite horrid that ECPGColLabelCommon excludes - * unreserved_keyword, meaning that unreserved keywords can't be - * used as type names in var_type. However, this is hard to avoid - * since what follows ecpgstart can be either a random SQL - * statement or an ECPGVarDeclaration (beginning with var_type). - * Pending a bright idea about how to fix that, we must - * special-case STRING (and any other unreserved keywords that are - * likely to be needed here). - */ - if (INFORMIX_MODE) - { - $$.type_enum = ECPGt_string; - $$.type_str = mm_strdup("char"); - $$.type_dimension = mm_strdup("-1"); - $$.type_index = mm_strdup("-1"); - $$.type_sizeof = NULL; - } - else - { - /* this is for typedef'ed types */ - struct typedefs *this = get_typedef("string"); + /* Otherwise, it must be a user-defined typedef name */ + struct typedefs *this = get_typedef($1, false); $$.type_str = (this->type->type_enum == ECPGt_varchar || this->type->type_enum == ECPGt_bytea) ? EMPTY :mm_strdup(this->name); $$.type_enum = this->type->type_enum; @@ -751,7 +818,7 @@ var_type: simple_type { /* No */ - this = get_typedef(name); + this = get_typedef(name, false); $$.type_str = mm_strdup(this->name); $$.type_enum = this->type->type_enum; $$.type_dimension = this->type->type_dimension; @@ -1657,17 +1724,14 @@ ColLabel: ECPGColLabel { $$ = $1; } | ECPGunreserved_interval { $$ = $1; } ; -ECPGColLabel: ECPGColLabelCommon { $$ = $1; } +ECPGColLabel: ecpg_ident { $$ = $1; } | unreserved_keyword { $$ = $1; } - | reserved_keyword { $$ = $1; } - | ECPGKeywords_rest { $$ = $1; } - | CONNECTION { $$ = mm_strdup("connection"); } - ; - -ECPGColLabelCommon: ecpg_ident { $$ = $1; } | col_name_keyword { $$ = $1; } | type_func_name_keyword { $$ = $1; } + | reserved_keyword { $$ = $1; } | ECPGKeywords_vanames { $$ = $1; } + | ECPGKeywords_rest { $$ = $1; } + | CONNECTION { $$ = mm_strdup("connection"); } ; ECPGCKeywords: S_AUTO { $$ = mm_strdup("auto"); } diff --git a/src/interfaces/ecpg/preproc/ecpg.type b/src/interfaces/ecpg/preproc/ecpg.type index d1cde691c0..4fe80a5a4b 100644 --- a/src/interfaces/ecpg/preproc/ecpg.type +++ b/src/interfaces/ecpg/preproc/ecpg.type @@ -3,7 +3,6 @@ %type <str> ECPGCKeywords %type <str> ECPGColId %type <str> ECPGColLabel -%type <str> ECPGColLabelCommon %type <str> ECPGConnect %type <str> ECPGCursorStmt %type <str> ECPGDeallocateDescr diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l index 996718cb8a..c344f8f30f 100644 --- a/src/interfaces/ecpg/preproc/pgc.l +++ b/src/interfaces/ecpg/preproc/pgc.l @@ -983,10 +983,19 @@ cppline {space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+ { int kwvalue; - /* Is it an SQL/ECPG keyword? */ - kwvalue = ScanECPGKeywordLookup(yytext); - if (kwvalue >= 0) - return kwvalue; + /* + * User-defined typedefs override SQL keywords, but + * not C keywords. Currently, a typedef name is just + * reported as IDENT, but someday we might need to + * return a distinct token type. + */ + if (get_typedef(yytext, true) == NULL) + { + /* Is it an SQL/ECPG keyword? */ + kwvalue = ScanECPGKeywordLookup(yytext); + if (kwvalue >= 0) + return kwvalue; + } /* Is it a C keyword? */ kwvalue = ScanCKeywordLookup(yytext); diff --git a/src/interfaces/ecpg/preproc/preproc_extern.h b/src/interfaces/ecpg/preproc/preproc_extern.h index 992797b8bb..6be59b7193 100644 --- a/src/interfaces/ecpg/preproc/preproc_extern.h +++ b/src/interfaces/ecpg/preproc/preproc_extern.h @@ -93,7 +93,7 @@ extern void add_variable_to_head(struct arguments **, struct variable *, struct extern void add_variable_to_tail(struct arguments **, struct variable *, struct variable *); extern void remove_variable_from_list(struct arguments **list, struct variable *var); extern void dump_variables(struct arguments *, int); -extern struct typedefs *get_typedef(char *); +extern struct typedefs *get_typedef(const char *name, bool noerror); extern void adjust_array(enum ECPGttype, char **, char **, char *, char *, int, bool); extern void reset_variables(void); extern void check_indicator(struct ECPGtype *); diff --git a/src/interfaces/ecpg/preproc/variable.c b/src/interfaces/ecpg/preproc/variable.c index 887d479e73..2a2b953118 100644 --- a/src/interfaces/ecpg/preproc/variable.c +++ b/src/interfaces/ecpg/preproc/variable.c @@ -497,15 +497,20 @@ check_indicator(struct ECPGtype *var) } struct typedefs * -get_typedef(char *name) +get_typedef(const char *name, bool noerror) { struct typedefs *this; - for (this = types; this && strcmp(this->name, name) != 0; this = this->next); - if (!this) + for (this = types; this != NULL; this = this->next) + { + if (strcmp(this->name, name) == 0) + return this; + } + + if (!noerror) mmfatal(PARSE_ERROR, "unrecognized data type name \"%s\"", name); - return this; + return NULL; } void
pgsql-hackers by date: