Re: record statement boundaries - Mailing list pgsql-patches

From Tom Lane
Subject Re: record statement boundaries
Date
Msg-id 5488.1140156922@sss.pgh.pa.us
Whole thread Raw
In response to record statement boundaries  (Neil Conway <neilc@samurai.com>)
List pgsql-patches
Neil Conway <neilc@samurai.com> writes:
> Attached is a patch from Joachim Wieland that changes the parser to
> optionally maintain a list of the source texts of the statements in the
> input query string.

I'm too tired to put together a coherent response, but on a first
look-over this seems to be doing a lot of things in the wrong place.
In particular it looks like it only considers the case of query strings
arriving at the parser via an interactive query.  What about SPI,
SQL functions, etc?  I do not like the side data structure that the
strings are returned in either ... why isn't the information added to
the grammar's main return data structure, ie the list of statement
nodes?  Using a global variable for this is inherently non-reentrant.

I don't think I like the extensive fooling with scanbufhandle,
yy_buf_pos, etc either.  It's been awhile since I read the flex manual
but I think there are APIs defined that you are supposed to use for
the purpose of tracing token locations, and these ain't them.  (For one
thing, the lexer may well be a token ahead of what the grammar has
managed to reduce --- this technique will not be able to cope with
such cases.)

A more general comment is that we should not be designing something that
only tracks statement boundaries.  To improve our error messages, we
eventually need something that keeps track of the source-string location
of just about every token --- for example, when complaining about an
unknown column name, we ought to be able to point out the errant
identifier in the command, rather than leaving the user to search
through what might be many dozens of lines of SQL.  I think now would be
a good time to bite that bullet.  Again, flex/bison already have notions
about how to do that embedded in them, and we should work with the tools
rather than outside them.

(To make this practical, what we probably want to do is store integer
character positions in the parse data structure, not make multiple
copies of pieces of the source string.)

Anyway, I'm glad to see someone tackling this area, but what you've got
here is just a rough draft.  Read the stuff about "locations" in the
bison manual.

            regards, tom lane

pgsql-patches by date:

Previous
From: Neil Conway
Date:
Subject: record statement boundaries
Next
From: "Marko Kreen"
Date:
Subject: pgcrypto: fix memory leak in openssl.c