Thread: Planned cleanups in attribute parsing

Planned cleanups in attribute parsing

From: Tom Lane
On the way to supporting schemas, I am thinking about trying to make the
parsing of attributes a little more intelligible.  The Attr node type
seems overused to mean several different things.  I'd like to do the
following:

For column references:

Split "Attr" into three node types for different uses:

Alias: for AS clauses.  Carries a "char *aliasname" and a List of column
alias names.  The current uses of Attr in range table entries would
become Alias.

ColumnRef: for referencing a column (possibly qualified, possibly with
array subscripts) in the raw grammar output.  Carries a List of names
which correspond to the dotted names (eg, a.b.c), plus a List of array
subscripting info (currently called "indirection" in Attr, but I wonder
if "subscripts" wouldn't be a more useful name).

ParamRef: for referencing a parameter.  Carries parameter number,
possibly-empty list of field names to qualify the param, and a subscript
list.  The ParamNo node type goes away, to be merged into this.

The Ident node type is not semantically distinct from ColumnRef with a
one-element name list.  Probably should retire it.

Perhaps indirection should be split out as a separate node type, with an eye
to allowing (arbitrary-expression)[42] someday.
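
As a rough illustration of the three-way split proposed above, the node
types might be sketched as follows.  This is only a sketch under stated
assumptions: NameList and SubscriptList are hypothetical fixed-size
stand-ins for the parser's List type, subscripts are simplified to integer
constants, and none of the field or function names here are taken from
the actual sources.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NAMES      8
#define MAX_SUBSCRIPTS 4

/* Hypothetical stand-in for a List of names. */
typedef struct NameList {
    size_t      count;
    const char *names[MAX_NAMES];
} NameList;

/* Hypothetical stand-in for a List of subscripts (constants only). */
typedef struct SubscriptList {
    size_t count;
    int    values[MAX_SUBSCRIPTS];
} SubscriptList;

/* AS clause: an alias name plus optional column alias names. */
typedef struct Alias {
    const char *aliasname;
    NameList    colnames;
} Alias;

/* Raw column reference: dotted names plus array subscripts. */
typedef struct ColumnRef {
    NameList      fields;
    SubscriptList subscripts;
} ColumnRef;

/* Parameter reference: number, optional field names, subscripts. */
typedef struct ParamRef {
    int           number;
    NameList      fields;
    SubscriptList subscripts;
} ParamRef;

/* Append one name; returns 0 on success, -1 if the list is full. */
int name_list_append(NameList *list, const char *name)
{
    if (list->count >= MAX_NAMES)
        return -1;
    list->names[list->count++] = name;
    return 0;
}

/* Build the raw ColumnRef that a grammar might emit for a.b.c[42]. */
ColumnRef make_example_column_ref(void)
{
    ColumnRef ref = {0};
    int rc = 0;

    rc |= name_list_append(&ref.fields, "a");
    rc |= name_list_append(&ref.fields, "b");
    rc |= name_list_append(&ref.fields, "c");
    assert(rc == 0);

    ref.subscripts.values[ref.subscripts.count++] = 42;
    return ref;
}
```

Note that the name list in ColumnRef makes no claim about which names are
qualifiers; it only records what was written.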

For table references:

Currently, the table name associated with an unparsed statement is typically
just a string.  I propose replacing this with a RelationRef node type,
carrying a List of names corresponding to the dotted names of the reference
(1 to 3 names).  Alternatively, we could just use the raw List of names and
not bother with an explicit node; any preferences?


Also, I think we could retire the notion of "relation vs. column
precedence" in the parser.  AFAICS the only place where transformExpr is
told EXPR_RELATION_FIRST is for processing an Attr's paramNo --- but
the ParamNo path through transformExpr never looks at the precedence!
Accordingly, only the EXPR_COLUMN_FIRST cases are ever actually used
anywhere, and there's no need for the notational cruft of passing
precedence around.

Comments?
        regards, tom lane


Re: Planned cleanups in attribute parsing

From: Fernando Nasser
Tom Lane wrote:
> 
> On the way to supporting schemas, I am thinking about trying to make the
> parsing of attributes a little more intelligible.  The Attr node type
> seems overused to mean several different things.  I'd like to do the
> following:
> 
> For column references:
> 
> Split "Attr" into three node types for different uses:
> 
> Alias: for AS clauses.  Carries a "char *aliasname" and a List of column
> alias names.  The current uses of Attr in range table entries would
> become Alias.
> 
> ColumnRef: for referencing a column (possibly qualified, possibly with
> array subscripts) in the raw grammar output.  Carries a List of names
> which correspond to the dotted names (eg, a.b.c), plus a List of array
> subscripting info (currently called "indirection" in Attr, but I wonder
> if "subscripts" wouldn't be a more useful name).
> 
> ParamRef: for referencing a parameter.  Carries parameter number,
> possibly-empty list of field names to qualify the param, and a subscript
> list.  The ParamNo node type goes away, to be merged into this.
> 
> The Ident node type is not semantically distinct from ColumnRef with a
> one-element name list.  Probably should retire it.
> 

These sound good to me.

> Perhaps indirection should be split out as a separate node type, with an eye
> to allowing (arbitrary-expression)[42] someday.
> 
> For table references:
> 
> Currently, the table name associated with an unparsed statement is typically
> just a string.  I propose replacing this with a RelationRef node type,
> carrying a List of names corresponding to the dotted names of the reference
> (1 to 3 names).  Alternatively, we could just use the raw List of names and
> not bother with an explicit node; any preferences?
> 

We can handle most cases with RangeVar (+ the ones you've proposed
above).  The schema name will have to go into RangeVar anyway.


> Also, I think we could retire the notion of "relation vs. column
> precedence" in the parser.  AFAICS the only place where transformExpr is
> told EXPR_RELATION_FIRST is for processing an Attr's paramNo --- but
> the ParamNo path through transformExpr never looks at the precedence!
> Accordingly, only the EXPR_COLUMN_FIRST cases are ever actually used
> anywhere, and there's no need for the notational cruft of passing
> precedence around.
> 
> Comments?
> 
>                         regards, tom lane
> 

-- 
Fernando Nasser
Red Hat Canada Ltd.                     E-Mail:  fnasser@redhat.com
2323 Yonge Street, Suite #300
Toronto, Ontario   M4P 2C9


Re: Planned cleanups in attribute parsing

From: Tom Lane
Fernando Nasser <fnasser@redhat.com> writes:
> Tom Lane wrote:
>> Currently, the table name associated with an unparsed statement is typically
>> just a string.  I propose replacing this with a RelationRef node type,
>> carrying a List of names corresponding to the dotted names of the reference
>> (1 to 3 names).  Alternatively, we could just use the raw List of names and
>> not bother with an explicit node; any preferences?

> We can handle most cases with RangeVar (+ the ones you've proposed
> above).

Right, I had not noticed there was already a suitable node type.
RangeVar will do fine, no need to invent RelationRef ...
        regards, tom lane
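
To make the "1 to 3 names" idea concrete, here is a minimal sketch of
filling a table-reference node from a dotted name list.  The TableRef
struct and table_ref_from_names function are hypothetical and are not the
actual RangeVar definition; the reading of a three-part name as
catalog.schema.relation is an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified table reference (not the real RangeVar). */
typedef struct TableRef {
    const char *catalogname;   /* NULL if not given */
    const char *schemaname;    /* NULL if not given */
    const char *relname;       /* always the last name */
} TableRef;

/*
 * Fill *out from 1 to 3 dotted names.
 * Returns 0 on success, -1 on bad arguments or a bad name count.
 */
int table_ref_from_names(const char *const *names, size_t count,
                         TableRef *out)
{
    if (names == NULL || out == NULL || count < 1 || count > 3)
        return -1;

    out->catalogname = (count == 3) ? names[0] : NULL;
    out->schemaname  = (count >= 2) ? names[count - 2] : NULL;
    out->relname     = names[count - 1];
    return 0;
}

/* Usage example: "myschema.mytable" has no catalog part. */
void example_two_part_name(void)
{
    const char *const names[] = {"myschema", "mytable"};
    TableRef ref;
    int rc = table_ref_from_names(names, 2, &ref);

    assert(rc == 0);
    assert(ref.catalogname == NULL);
    assert(ref.schemaname == names[0]);
    assert(ref.relname == names[1]);
}
```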


Re: Planned cleanups in attribute parsing

From: Thomas Lockhart
> On the way to supporting schemas, I am thinking about trying to make the
> parsing of attributes a little more intelligible.  The Attr node type
> seems overused to mean several different things.

Right.

> For column references:
> Split "Attr" into three node types for different uses:
> Alias: for AS clauses.  Carries a "char *aliasname" and a List of column
> alias names.  The current uses of Attr in range table entries would
> become Alias.

Is there a one-to-one relationship between the alias and the column? Or
does the list of column names actually have more than one entry? Or is
the "list of column alias names" the qualified name of the column?

> ColumnRef: for referencing a column (possibly qualified, possibly with
> array subscripts) in the raw grammar output.  Carries a List of names
> which correspond to the dotted names (eg, a.b.c), plus a List of array
> subscripting info (currently called "indirection" in Attr, but I wonder
> if "subscripts" wouldn't be a more useful name).

Would it be helpful to separate the name itself from the qualifying
prefixes? istm that most use cases would require this anyway...

> ParamRef: for referencing a parameter.  Carries parameter number,
> possibly-empty list of field names to qualify the param, and a subscript
> list.  The ParamNo node type goes away, to be merged into this.

OK.

> The Ident node type is not semantically distinct from ColumnRef with a
> one-element name list.  Probably should retire it.
> Perhaps indirection should be split out as a separate node type, with an eye
> to allowing (arbitrary-expression)[42] someday.

OK.

> Currently, the table name associated with an unparsed statement is typically
> just a string.  I propose replacing this with a RelationRef node type,
> carrying a List of names corresponding to the dotted names of the reference
> (1 to 3 names).  Alternatively, we could just use the raw List of names and
> not bother with an explicit node; any preferences?

Nodes are better imho.

> Also, I think we could retire the notion of "relation vs. column
> precedence" in the parser.  AFAICS the only place where transformExpr is
> told EXPR_RELATION_FIRST is for processing an Attr's paramNo --- but
> the ParamNo path through transformExpr never looks at the precedence!
> Accordingly, only the EXPR_COLUMN_FIRST cases are ever actually used
> anywhere, and there's no need for the notational cruft of passing
> precedence around.

Hmm. I can't think of a case where either columns *or* tables could be
mentioned and where a table name would take precedence. otoh we should
decide pretty carefully that this will *never* happen before ripping too
much out.
                    - Thomas


Re: Planned cleanups in attribute parsing

From: Tom Lane
Thomas Lockhart <lockhart@fourpalms.org> writes:
>> Alias: for AS clauses.  Carries a "char *aliasname" and a List of column
>> alias names.  The current uses of Attr in range table entries would
>> become Alias.

> Is there a one-to-one relationship between the alias and the column? Or
> does the list of column names actually have more than one entry? Or is
> the "list of column alias names" the qualified name of the column?

Basically type Alias represents an AS clause, which can come in two
flavors: just "AS foo", or "AS foo(bar1,bar2,bar3)" for renaming a
FROM-list item along with its columns.  So the list of names in this
case represents individual column names, *not* a qualified name.
One reason that I want to separate this from Attr is that the list
of names has a totally different meaning from what it has in Attr.
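
The two flavors of AS clause can be sketched roughly as below.  The struct
layout and the effective_column_name helper are hypothetical, and the rule
that columns without a listed alias keep their original names is an
assumption added for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_COLNAMES 8

/* Hypothetical Alias node: "AS foo" or "AS foo(bar1,bar2,bar3)". */
typedef struct Alias {
    const char *aliasname;               /* "foo" */
    size_t      ncolnames;               /* 0 for plain "AS foo" */
    const char *colnames[MAX_COLNAMES];  /* individual column aliases */
} Alias;

/*
 * Return the name a column is known by under the alias: the listed
 * column alias if one exists at that position, else the original name.
 */
const char *effective_column_name(const Alias *alias, size_t colno,
                                  const char *original)
{
    if (alias != NULL && colno < alias->ncolnames)
        return alias->colnames[colno];
    return original;
}

/* Usage example covering both flavors. */
void example_alias_flavors(void)
{
    const char *orig = "id";
    Alias plain   = {"foo", 0, {NULL}};
    Alias renamed = {"foo", 3, {"bar1", "bar2", "bar3"}};

    /* "AS foo": column names are untouched. */
    assert(effective_column_name(&plain, 0, orig) == orig);

    /* "AS foo(bar1,bar2,bar3)": each entry renames one column. */
    assert(effective_column_name(&renamed, 1, orig) == renamed.colnames[1]);
}
```

Each entry in colnames names one column, which is why the list cannot be
read as a dotted qualified name.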

>> ColumnRef: for referencing a column (possibly qualified, possibly with
>> array subscripts) in the raw grammar output.  Carries a List of names
>> which correspond to the dotted names (eg, a.b.c), plus a List of array
>> subscripting info (currently called "indirection" in Attr, but I wonder
>> if "subscripts" wouldn't be a more useful name).

> Would it be helpful to separate the name itself from the qualifying
> prefixes? istm that most use cases would require this anyway...

Remember this is raw parsetree output; the grammar does not have a real
good idea which names are qualifiers and which are field names and/or
function names.  The parse analysis phase will rip the list apart and
determine what's what.  The output of that will be some other node type
(eg, a Var).
        regards, tom lane
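
As a rough sketch of the "rip the list apart" step described above, the
following takes the raw dotted name list and decides which names are
qualifiers by consulting the known relation aliases.  Everything here is
hypothetical: the ResolvedColumn struct, the resolve_column_ref function,
and the resolution rule itself are illustrative assumptions, not the
parser's actual logic (which would produce a node such as a Var).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of analyzing a raw dotted name list. */
typedef struct ResolvedColumn {
    const char *relname;   /* qualifying relation alias, or NULL */
    const char *colname;   /* the column itself */
    size_t      nfields;   /* trailing names left as field selections */
} ResolvedColumn;

/* Return 1 if name matches one of the known relation aliases. */
static int is_known_relation(const char *name,
                             const char *const *rels, size_t nrels)
{
    for (size_t i = 0; i < nrels; i++)
        if (strcmp(name, rels[i]) == 0)
            return 1;
    return 0;
}

/*
 * Assumed rule: if the first of several names is a known relation
 * alias, it qualifies the second name; otherwise the first name is
 * the column.  Remaining names are counted as field selections.
 * Returns 0 on success, -1 on bad arguments.
 */
int resolve_column_ref(const char *const *names, size_t count,
                       const char *const *rels, size_t nrels,
                       ResolvedColumn *out)
{
    if (names == NULL || out == NULL || count == 0)
        return -1;

    if (count >= 2 && is_known_relation(names[0], rels, nrels)) {
        out->relname = names[0];
        out->colname = names[1];
        out->nfields = count - 2;
    } else {
        out->relname = NULL;
        out->colname = names[0];
        out->nfields = count - 1;
    }
    return 0;
}

/* Usage example: a.b.c where "a" is a relation alias in scope. */
void example_resolve(void)
{
    const char *const rels[]  = {"a"};
    const char *const names[] = {"a", "b", "c"};
    ResolvedColumn col;
    int rc = resolve_column_ref(names, 3, rels, 1, &col);

    assert(rc == 0);
    assert(col.relname == names[0]);
    assert(col.colname == names[1]);
    assert(col.nfields == 1);
}
```

The same three names resolve differently when "a" is not a relation in
scope, which is the reason the grammar itself cannot make this decision.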