Thread: General Parser

General Parser

From
Ulrich Meis
Date:
I was still worried about that "two words" table bug, so I tried myself on the
parser todo.

I skimmed through the code and found parsers at the following spots:
(tell me if I missed something)

1. AbstractJdbc2Statement.replaceProcessing
2. AbstractJdbc2Statement.modifyJdbcCall
3. V2Query constructor
4. v3/QueryExecutorImpl.parseQuery
5. AbstractJdbc2ResultSet.parseQuery

Problems I discovered:

Number 3 and 4 do not handle an escaped literal quote, i.e. \'
Number 5 doesn't handle (among other things) any quotes, joins, unions.

Finally, none of the above handle dollar quoting.

Attached you'll find a QueryParser class I've written that has all the
functionality that is implemented in the above methods and handles dollar
quoting. It parses the query in one run, so I believe it's pretty fast.

For testing purposes it has a main method so you can fire queries at it and it
will show you the determined subqueries and their fragments (delimited by
X?X).

I hope it's what you had in mind with the parser todo.

The class still needs extensive testing but I wanted to make sure that I'm
doing the right thing first...so what are your thoughts?

Uli

Attachment

Re: General Parser

From
Oliver Jowett
Date:
Ulrich Meis wrote:

> 3. V2Query constructor
> 4. v3/QueryExecutorImpl.parseQuery

> Number 3 and 4 do not handle an escaped literal quote, i.e. \'

Also they don't handle double quotes ("table with ? in it") properly.

I've just committed fixes for these problems -- could you take a look
and see if they look OK now?

-O

Re: General Parser

From
Oliver Jowett
Date:
Ulrich Meis wrote:

> Attached you'll find a QueryParser class I've written that has all the
> functionality that is implemented in the above methods and handles dollar
> quoting. It parses the query in one run, so I believe it's pretty fast.

> The class still needs extensive testing but I wanted to make sure that I'm
> doing the right thing first...so what are your thoughts?

I think it's a good idea to unify all the parsing that needs to be done,
if only because it reduces code duplication.

That said, this still needs some work as the various different callers
need different parsing done. For example, we only care about splitting
queries into multiple statements when using the V3 protocol, we only
care about ? placeholders when parsing a query supplied via
prepareStatement, the { call } syntax only makes sense for
CallableStatements, and there's no need to disassemble the SELECT etc if
the application never uses an updatable resultset.

I'm not sure how easy it will be to have a single flexible parser and
still have low overhead for the cases where we don't want to completely
disassemble the query.

-O

Re: General Parser

From
Ulrich Meis
Date:
Hi Oliver!

On Monday 01 November 2004 12:12, you wrote:
> That said, this still needs some work as the various different callers
> need different parsing done. For example, we only care about splitting
> queries into multiple statements when using the V3 protocol, we only
> care about ? placeholders when parsing a query supplied via
> prepareStatement, the { call } syntax only makes sense for
> CallableStatements, and there's no need to disassemble the SELECT etc if
> the application never uses an updatable resultset.

I thought about these aspects and in a first draft I splitted off the
disassembly of select statements in a separate method. But finally, I merged
them again because I thought the overhead would be neglectable.
I never considered splitting off the others because they really only involve
one char comparison per word(literals/identifiers being one word) or less.

All the handling you describe above is done in the addFraction method and that
is roughly called once per word. There's not a single loop so we don't add to
complexity.
If not neglectable in the scope of the parser itself, I would think that it is
regarding the whole query execution, i.e.  parsing,data conversion, sending
the query, the backend handling the query, receiving the results, data
conversion.
If you have an application that fires tons of queries that the backend needs
almost no time to execute for, then those queries are likely to be very
short. In that case parsing is quite quickly, too.

The cases in detail:

As said, all this happens once per word:

1. Splitting multiple statements.

if (first == SEMICOLON) {....}

That char comparison is all the overhead if it's a single query only.
Maybe it would even be nice to be able to throw an exception if multiple
queries are used in V2?

2. ? placeholders

if (first == QUESTIONMARK) {...}

As above.

3. call syntax

if (first == QUESTIONMARK) {...}

if (fraction.equalsIgnoreCase("call")) {...}

Those two comparisons are only done on the first word in a Java Escape and
otherwise never executed!

4. select statement disassembly

This is the only case I would consider to split off although I think it's not
necessary.
The whole section is completely ignored if the following holds true, i.e. it's
not a select statement and/or we already know it's not to a single table:

if (querytype == NONSINGLE) break;

Otherwise, we have on average less than one string comparison per word.

> I'm not sure how easy it will be to have a single flexible parser and
> still have low overhead for the cases where we don't want to completely
> disassemble the query.

Most of the overhead here comes from determining the tableName for an
updatable ResultSet. I can easily move that handling into a separate method.
In that case, I would make the parser keep the fractions ArrayList, so that
when the table name is actually needed in a ResultSet, the parsing can be
done real quick. Tell me if you prefer it that way and I'll do it.

I would really recommend to keep the other handling where it is, I don't think
it is neccessary to complicate things here. But tell me if I'm wrong.

Regards,

Uli


Re: General Parser

From
Ulrich Meis
Date:
On Monday 01 November 2004 12:06, you wrote:
> Ulrich Meis wrote:
> > 3. V2Query constructor
> > 4. v3/QueryExecutorImpl.parseQuery
> >
> > Number 3 and 4 do not handle an escaped literal quote, i.e. \'
>
> Also they don't handle double quotes ("table with ? in it") properly.
>
> I've just committed fixes for these problems -- could you take a look
> and see if they look OK now?

Seems fine!

Uli


code duplication (was: Re: General Parser)

From
Vadim Nasardinov
Date:
On Monday 01 November 2004 06:12, Oliver Jowett wrote:
> if only because it reduces code duplication.

Speaking of which, I ran CPD on the source tree and found a few cases
of copied and pasted code.  The results are posted here:

  http://people.redhat.com/~vadimn/scratch/pgsql-jdbc/pgjdbc-cpd-brief.txt
  http://people.redhat.com/~vadimn/scratch/pgsql-jdbc/pgjdbc-cpd-full.txt

I haven't had a chance to try and assess how difficult it would be to
factor out the duplicated chunks.


Vadim



stable release

From
Dave Cramer
Date:
As the server is nearing release, we should consider releasing a beta
version of the driver.
I think a feature freeze first, and since only Kris, and Oliver are
adding things, this could be relatively short.

Then tag a stable version and attempt to release it at the same time or
shortly after the server ?

Thoughts ?

Dave

--
Dave Cramer
http://www.postgresintl.com
519 939 0336
ICQ#14675561



Re: General Parser

From
Ulrich Meis
Date:
Just to make sure we are on the same page...I thought of the class to be
instantiated only once for every query. Methods that need access to
subqueries or fragments fetch them from that object.

Uli


Re: stable release

From
Kris Jurka
Date:

On Mon, 1 Nov 2004, Dave Cramer wrote:

> As the server is nearing release, we should consider releasing a beta
> version of the driver.
> I think a feature freeze first, and since only Kris, and Oliver are
> adding things, this could be relatively short.

In addition to the 8.0 list
http://jdbc.postgresql.org/todo.html#Before+8.0+Release

I also have the setNull(i, Types.OTHER) and custom datatypes without
explicitly using addDataType on my list of 8.0 things.  Documentation is
also an always needed thing.

>
> Then tag a stable version and attempt to release it at the same time or
> shortly after the server ?

That's the idea.

Kris Jurka

Re: stable release

From
Dave Cramer
Date:
This seems like quite a bit of work, my understanding was that we were
shooting for releasing with the server ?

Dave
Kris Jurka wrote:

>On Mon, 1 Nov 2004, Dave Cramer wrote:
>
>
>
>>As the server is nearing release, we should consider releasing a beta
>>version of the driver.
>>I think a feature freeze first, and since only Kris, and Oliver are
>>adding things, this could be relatively short.
>>
>>
>
>In addition to the 8.0 list
>http://jdbc.postgresql.org/todo.html#Before+8.0+Release
>
>I also have the setNull(i, Types.OTHER) and custom datatypes without
>explicitly using addDataType on my list of 8.0 things.  Documentation is
>also an always needed thing.
>
>
>
>>Then tag a stable version and attempt to release it at the same time or
>>shortly after the server ?
>>
>>
>
>That's the idea.
>
>Kris Jurka
>
>---------------------------(end of broadcast)---------------------------
>TIP 3: if posting/reading through Usenet, please send an appropriate
>      subscribe-nomail command to majordomo@postgresql.org so that your
>      message can get through to the mailing list cleanly
>
>
>
>

--
Dave Cramer
http://www.postgresintl.com
519 939 0336
ICQ#14675561


Re: stable release

From
Kris Jurka
Date:

On Mon, 1 Nov 2004, Dave Cramer wrote:

> This seems like quite a bit of work, my understanding was that we were
> shooting for releasing with the server ?

The only one that looks particularly time consuming is the date/time
issues.  The ACL problem isn't really a must fix for release kind of thing
because I imagine it's largely unused.  The server release is coming, but
not all that quickly.  I suppose the issue is how much time do we
need/want for our own beta?

Kris Jurka

Re: stable release

From
Dave Cramer
Date:
Since the server is already in beta, and has had feature freeze for
quite a while, I'd like to see the same relatively shortly.

Can we agree on a hard date for feature freeze ? ( I'm suggesting
earlier than later )

Dave



Dave Cramer wrote:

> This seems like quite a bit of work, my understanding was that we were
> shooting for releasing with the server ?
>
> Dave
> Kris Jurka wrote:
>
>> On Mon, 1 Nov 2004, Dave Cramer wrote:
>>
>>
>>
>>> As the server is nearing release, we should consider releasing a
>>> beta version of the driver.
>>> I think a feature freeze first, and since only Kris, and Oliver are
>>> adding things, this could be relatively short.
>>>
>>
>>
>> In addition to the 8.0 list
>> http://jdbc.postgresql.org/todo.html#Before+8.0+Release
>>
>> I also have the setNull(i, Types.OTHER) and custom datatypes without
>> explicitly using addDataType on my list of 8.0 things.  Documentation
>> is also an always needed thing.
>>
>>
>>
>>> Then tag a stable version and attempt to release it at the same time
>>> or shortly after the server ?
>>>
>>
>>
>> That's the idea.
>>
>> Kris Jurka
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 3: if posting/reading through Usenet, please send an appropriate
>>      subscribe-nomail command to majordomo@postgresql.org so that your
>>      message can get through to the mailing list cleanly
>>
>>
>>
>>
>

--
Dave Cramer
http://www.postgresintl.com
519 939 0336
ICQ#14675561


Re: stable release

From
Kris Jurka
Date:
On Wed, 3 Nov 2004, Dave Cramer wrote:

> Since the server is already in beta, and has had feature freeze for
> quite a while, I'd like to see the same relatively shortly.
>
> Can we agree on a hard date for feature freeze ? ( I'm suggesting
> earlier than later )
>

Well if we go with the servers definition of feature freeze (we won't
accept new features that patches have not yet been submitted for) then I
would say we are already in it.  For a beta date I would say Monday, Nov.
8 is reasonable for me.  Checking the server's open items list they do
seem to be closer than I thought, but they still haven't called for port
reports and all that, so I wouldn't imagine they'll release in less than a
month.

Kris Jurka

Re: stable release

From
Dave Cramer
Date:
Kris,

Monday the 8th sounds good to me.  My concern is that we don't get
nearly the exposure to users that the server gets so I'd like to get
some user testing before the server releases.

Oliver ?

Dave

Kris Jurka wrote:

>On Wed, 3 Nov 2004, Dave Cramer wrote:
>
>
>
>>Since the server is already in beta, and has had feature freeze for
>>quite a while, I'd like to see the same relatively shortly.
>>
>>Can we agree on a hard date for feature freeze ? ( I'm suggesting
>>earlier than later )
>>
>>
>>
>
>Well if we go with the servers definition of feature freeze (we won't
>accept new features that patches have not yet been submitted for) then I
>would say we are already in it.  For a beta date I would say Monday, Nov.
>8 is reasonable for me.  Checking the server's open items list they do
>seem to be closer than I thought, but they still haven't called for port
>reports and all that, so I wouldn't imagine they'll release in less than a
>month.
>
>Kris Jurka
>
>---------------------------(end of broadcast)---------------------------
>TIP 9: the planner will ignore your desire to choose an index scan if your
>      joining column's datatypes do not match
>
>
>
>

--
Dave Cramer
http://www.postgresintl.com
519 939 0336
ICQ#14675561


Re: stable release

From
Oliver Jowett
Date:
Dave Cramer wrote:

> Monday the 8th sounds good to me.  My concern is that we don't get
> nearly the exposure to users that the server gets so I'd like to get
> some user testing before the server releases.

We are starting to get some testing already, which is good.

> Oliver ?

8th is fine, I am a bit out of action at the moment anyway so I don't
have any changes pending.

-O