Re: [RFC] nodeToString format and exporting the SQL parser - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [RFC] nodeToString format and exporting the SQL parser
Msg-id 201004250055.o3P0ttT21506@momjian.us
In response to Re: [RFC] nodeToString format and exporting the SQL parser  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers

Added to TODO:

        Allow the parser output to be seen by clients

        This is useful for external tools.

        * http://archives.postgresql.org/pgsql-hackers/2010-04/msg00095.php

---------------------------------------------------------------------------

Robert Haas wrote:
> On Fri, Apr 2, 2010 at 3:53 PM, Michael Tharp <gxti@partiallystapled.com> wrote:
> > Most Esteemed Hackers:
> >
> > Due to popular demand on #postgresql (by which I mean David Fetter), I have
> > been spending a little time making the internal SQL parser available to
> > clients via a C-language SQL function. The function itself is extremely
> > simple: just a wrapper around a call to raw_parser followed by nodeToString.
> 
> Seems reasonable.
> 
> > Most of the "hard stuff" has been in parsing the output of nodeToString on
> > the client side. So, I have a few questions to help gauge interest in
> > related patches:
> >
> > Is there interest in a patch to extend nodes/outfuncs.c with support for
> > serializing more node types? Coverage has been pretty good so far but
> > various utility statements and their related nodes are missing, e.g.
> > AlterTableStmt and GrantStmt. I expect that this will be the least
> > contentious suggestion.
> 
> This wouldn't bother me provided the code footprint is small.  I would
> be against adding a lot of complexity for this.
> 
> > The nodeToString format as it stands is somewhat ambiguous with respect to
> > the type of a node member's value if one does not have access to
> > readfuncs.c. For example, a T_BitString called foo is serialized as ':foo
> > b1010' while a char * containing 'b1010' is also serialized as ':foo b1010'.
> > This may just mean that _outToken needs to escape the leading 'b'. A similar
> > problem exists for booleans ('true' as a string vs. as a boolean).
> 
> I am not inclined to change this.  Turning the format into something
> self-describing seems to me to be significant work and a significant
> compatibility break for a very small amount of benefit.
> 
> > Additionally, values may span more than one token for certain types e.g.
> > Datum (":constvalue 4 [ 16 0 0 0 ]"). Plan trees have a few types that don't
> > have a corresponding read function and output an array of space-separated
> > integers. PlanInvalItem even seems to use a format containing parentheses,
> > which the tokenizer splits as if it were a list. While most of these only
> > occur in plan nodes and thus don't affect my use case (Datum being the
> > exception), it would be ideal if they could be parsed more
> > straightforwardly.
> 
> I'm not inclined to change this, either.
> 
> > These last two problems perhaps can be worked around by escaping more things
> > in _outToken, but maybe it would be smarter to make the fields
> > self-descriptive in terms of type. For example, the field names could be
> > prefixed with a short string describing its type, which in most cases would
> > be a single character, e.g. 's:schemaname' for a char*, 'b:true' for a bool,
> > 'n:...' for any node (including Value nodes), or longer strings for less
> > commonly used types like the integer arrays in plan nodes (although these
> > would probably be better as a real integer list). These could be used to
> > unambiguously parse individual tokens and also to determine how many or what
> > kind of token to expect for multi-token values such as Datum which would
> > otherwise require guessing. Does this seem reasonable? Is there another
> > format that might make more sense?
> 
> This seems ugly to me and I don't see the utility of it.
> 
> > As far as I can tell, the current parser in nodes/read.c ignores the field
> > names entirely, so this can be done without changing postgres' own parsing
> > code at all and without affecting backwards compatibility of any stored
> > trees. Does anyone else out there use nodeToString() output in their own
> > tools, and if so, does this make your life easier or harder?
> >
> > Lastly, I'll leave a link to my WIP implementation in case anyone is
> > interested:
> > http://bitbucket.org/gxti/parse_sql/src/
> > Currently I'm working on adding support for cooked parse trees and figuring
> > out what, if anything, I need to do to support multibyte encodings. My
> > personal use is for parsing DDL so the input is decidedly not hostile but
> > I'd still like to make this a generally useful module.
> >
> > Thanks in advance for any comments, tips, or flames sent my way.
> 
> Thanks for having a thick skin.  :-)
> 
> I'm having a hard time imagining what you could use this for without
> encoding a lot of information about the meaning of particular
> constructs.  In which case the self-describing stuff is not needed.
> As you point out downthread, if all you want to do is compare, it's
> not needed either.
> 
> ...Robert
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
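
For anyone reading this in the archives, the ambiguity Michael describes
can be sketched in a few lines of Python. This is not PostgreSQL code:
BitString, render, out_plain, out_typed and type_tag are hypothetical
names, the escaping done by the real _outToken is left out, and putting
the type tag in front of the field name is only one reading of the
proposal ('s' for a char*, 'b' for a bool, 'n' for any node).

```python
from dataclasses import dataclass


@dataclass
class BitString:
    """Hypothetical stand-in for a T_BitString value, e.g. bits='1010'."""
    bits: str


def render(value):
    """Render a value roughly as the email describes (no escaping)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, BitString):
        return "b" + value.bits
    return str(value)


def out_plain(name, value):
    """Current style as described in the email: ':fieldname value'."""
    return f":{name} {render(value)}"


def type_tag(value):
    """Assumed tags: 'b' for bool, 'n' for nodes, 's' for strings."""
    if isinstance(value, bool):
        return "b"
    if isinstance(value, BitString):
        return "n"
    return "s"


def out_typed(name, value):
    """Proposed style (one reading): ':tag:fieldname value'."""
    return f":{type_tag(value)}:{name} {render(value)}"


plain_bits = out_plain("foo", BitString("1010"))
plain_text = out_plain("foo", "b1010")
typed_bits = out_typed("foo", BitString("1010"))
typed_text = out_typed("foo", "b1010")

print(plain_bits == plain_text)   # True: both are ':foo b1010'
print(typed_bits, "|", typed_text)  # :n:foo b1010 | :s:foo b1010
```

With the plain form a client cannot tell the bit string from the char *
without knowing the field's type from readfuncs.c or the node
definitions; with the tagged field name it can. Multi-token values such
as Datum are not covered by this sketch.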

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

