Thread: General Parser
I was still worried about that "two words" table bug, so I tried myself on the parser todo. I skimmed through the code and found parsers at the following spots: (tell me if I missed something) 1. AbstractJdbc2Statement.replaceProcessing 2. AbstractJdbc2Statement.modifyJdbcCall 3. V2Query constructor 4. v3/QueryExecutorImpl.parseQuery 5. AbstractJdbc2ResultSet.parseQuery Problems I discovered: Number 3 and 4 do not handle an escaped literal quote, i.e. \' Number 5 doesn't handle (among other things) any quotes, joins, unions. Finally, none of the above handle dollar quoting. Attached you'll find a QueryParser class I've written that has all the functionality that is implemented in the above methods and handles dollar quoting. It parses the query in one run, so I believe it's pretty fast. For testing purposes it has a main method so you can fire queries at it and it will show you the determined subqueries and their fragments (delimited by X?X). I hope it's what you had in mind with the parser todo. The class still needs extensive testing but I wanted to make sure that I'm doing the right thing first...so what are your thoughts? Uli
Attachment
Ulrich Meis wrote: > 3. V2Query constructor > 4. v3/QueryExecutorImpl.parseQuery > Number 3 and 4 do not handle an escaped literal quote, i.e. \' Also they don't handle double quotes ("table with ? in it") properly. I've just committed fixes for these problems -- could you take a look and see if they look OK now? -O
Ulrich Meis wrote: > Attached you'll find a QueryParser class I've written that has all the > functionality that is implemented in the above methods and handles dollar > quoting. It parses the query in one run, so I believe it's pretty fast. > The class still needs extensive testing but I wanted to make sure that I'm > doing the right thing first...so what are your thoughts? I think it's a good idea to unify all the parsing that needs to be done, if only because it reduces code duplication. That said, this still needs some work as the various different callers need different parsing done. For example, we only care about splitting queries into multiple statements when using the V3 protocol, we only care about ? placeholders when parsing a query supplied via prepareStatement, the { call } syntax only makes sense for CallableStatements, and there's no need to disassemble the SELECT etc if the application never uses an updatable resultset. I'm not sure how easy it will be to have a single flexible parser and still have low overhead for the cases where we don't want to completely disassemble the query. -O
Hi Oliver! On Monday 01 November 2004 12:12, you wrote: > That said, this still needs some work as the various different callers > need different parsing done. For example, we only care about splitting > queries into multiple statements when using the V3 protocol, we only > care about ? placeholders when parsing a query supplied via > prepareStatement, the { call } syntax only makes sense for > CallableStatements, and there's no need to disassemble the SELECT etc if > the application never uses an updatable resultset. I thought about these aspects and in a first draft I splitted off the disassembly of select statements in a separate method. But finally, I merged them again because I thought the overhead would be neglectable. I never considered splitting off the others because they really only involve one char comparison per word(literals/identifiers being one word) or less. All the handling you describe above is done in the addFraction method and that is roughly called once per word. There's not a single loop so we don't add to complexity. If not neglectable in the scope of the parser itself, I would think that it is regarding the whole query execution, i.e. parsing,data conversion, sending the query, the backend handling the query, receiving the results, data conversion. If you have an application that fires tons of queries that the backend needs almost no time to execute for, then those queries are likely to be very short. In that case parsing is quite quickly, too. The cases in detail: As said, all this happens once per word: 1. Splitting multiple statements. if (first == SEMICOLON) {....} That char comparison is all the overhead if it's a single query only. Maybe it would even be nice to be able to throw an exception if multiple queries are used in V2? 2. ? placeholders if (first == QUESTIONMARK) {...} As above. 3. call syntax if (first == QUESTIONMARK) {...} if (fraction.equalsIgnoreCase("call")) {...} Those two comparisons are only done on the first word in a Java Escape and otherwise never executed! 4. select statement disassembly This is the only case I would consider to split off although I think it's not necessary. The whole section is completely ignored if the following holds true, i.e. it's not a select statement and/or we already know it's not to a single table: if (querytype == NONSINGLE) break; Otherwise, we have on average less than one string comparison per word. > I'm not sure how easy it will be to have a single flexible parser and > still have low overhead for the cases where we don't want to completely > disassemble the query. Most of the overhead here comes from determining the tableName for an updatable ResultSet. I can easily move that handling into a separate method. In that case, I would make the parser keep the fractions ArrayList, so that when the table name is actually needed in a ResultSet, the parsing can be done real quick. Tell me if you prefer it that way and I'll do it. I would really recommend to keep the other handling where it is, I don't think it is neccessary to complicate things here. But tell me if I'm wrong. Regards, Uli
On Monday 01 November 2004 12:06, you wrote: > Ulrich Meis wrote: > > 3. V2Query constructor > > 4. v3/QueryExecutorImpl.parseQuery > > > > Number 3 and 4 do not handle an escaped literal quote, i.e. \' > > Also they don't handle double quotes ("table with ? in it") properly. > > I've just committed fixes for these problems -- could you take a look > and see if they look OK now? Seems fine! Uli
On Monday 01 November 2004 06:12, Oliver Jowett wrote: > if only because it reduces code duplication. Speaking of which, I ran CPD on the source tree and found a few cases of copied and pasted code. The results are posted here: http://people.redhat.com/~vadimn/scratch/pgsql-jdbc/pgjdbc-cpd-brief.txt http://people.redhat.com/~vadimn/scratch/pgsql-jdbc/pgjdbc-cpd-full.txt I haven't had a chance to try and assess how difficult it would be to factor out the duplicated chunks. Vadim
As the server is nearing release, we should consider releasing a beta version of the driver. I think a feature freeze first, and since only Kris, and Oliver are adding things, this could be relatively short. Then tag a stable version and attempt to release it at the same time or shortly after the server ? Thoughts ? Dave -- Dave Cramer http://www.postgresintl.com 519 939 0336 ICQ#14675561
Just to make sure we are on the same page...I thought of the class to be instantiated only once for every query. Methods that need access to subqueries or fragments fetch them from that object. Uli
On Mon, 1 Nov 2004, Dave Cramer wrote: > As the server is nearing release, we should consider releasing a beta > version of the driver. > I think a feature freeze first, and since only Kris, and Oliver are > adding things, this could be relatively short. In addition to the 8.0 list http://jdbc.postgresql.org/todo.html#Before+8.0+Release I also have the setNull(i, Types.OTHER) and custom datatypes without explicitly using addDataType on my list of 8.0 things. Documentation is also an always needed thing. > > Then tag a stable version and attempt to release it at the same time or > shortly after the server ? That's the idea. Kris Jurka
This seems like quite a bit of work, my understanding was that we were shooting for releasing with the server ? Dave Kris Jurka wrote: >On Mon, 1 Nov 2004, Dave Cramer wrote: > > > >>As the server is nearing release, we should consider releasing a beta >>version of the driver. >>I think a feature freeze first, and since only Kris, and Oliver are >>adding things, this could be relatively short. >> >> > >In addition to the 8.0 list >http://jdbc.postgresql.org/todo.html#Before+8.0+Release > >I also have the setNull(i, Types.OTHER) and custom datatypes without >explicitly using addDataType on my list of 8.0 things. Documentation is >also an always needed thing. > > > >>Then tag a stable version and attempt to release it at the same time or >>shortly after the server ? >> >> > >That's the idea. > >Kris Jurka > >---------------------------(end of broadcast)--------------------------- >TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > > > > -- Dave Cramer http://www.postgresintl.com 519 939 0336 ICQ#14675561
On Mon, 1 Nov 2004, Dave Cramer wrote: > This seems like quite a bit of work, my understanding was that we were > shooting for releasing with the server ? The only one that looks particularly time consuming is the date/time issues. The ACL problem isn't really a must fix for release kind of thing because I imagine it's largely unused. The server release is coming, but not all that quickly. I suppose the issue is how much time do we need/want for our own beta? Kris Jurka
Since the server is already in beta, and has had feature freeze for quite a while, I'd like to see the same relatively shortly. Can we agree on a hard date for feature freeze ? ( I'm suggesting earlier than later ) Dave Dave Cramer wrote: > This seems like quite a bit of work, my understanding was that we were > shooting for releasing with the server ? > > Dave > Kris Jurka wrote: > >> On Mon, 1 Nov 2004, Dave Cramer wrote: >> >> >> >>> As the server is nearing release, we should consider releasing a >>> beta version of the driver. >>> I think a feature freeze first, and since only Kris, and Oliver are >>> adding things, this could be relatively short. >>> >> >> >> In addition to the 8.0 list >> http://jdbc.postgresql.org/todo.html#Before+8.0+Release >> >> I also have the setNull(i, Types.OTHER) and custom datatypes without >> explicitly using addDataType on my list of 8.0 things. Documentation >> is also an always needed thing. >> >> >> >>> Then tag a stable version and attempt to release it at the same time >>> or shortly after the server ? >>> >> >> >> That's the idea. >> >> Kris Jurka >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 3: if posting/reading through Usenet, please send an appropriate >> subscribe-nomail command to majordomo@postgresql.org so that your >> message can get through to the mailing list cleanly >> >> >> >> > -- Dave Cramer http://www.postgresintl.com 519 939 0336 ICQ#14675561
On Wed, 3 Nov 2004, Dave Cramer wrote: > Since the server is already in beta, and has had feature freeze for > quite a while, I'd like to see the same relatively shortly. > > Can we agree on a hard date for feature freeze ? ( I'm suggesting > earlier than later ) > Well if we go with the servers definition of feature freeze (we won't accept new features that patches have not yet been submitted for) then I would say we are already in it. For a beta date I would say Monday, Nov. 8 is reasonable for me. Checking the server's open items list they do seem to be closer than I thought, but they still haven't called for port reports and all that, so I wouldn't imagine they'll release in less than a month. Kris Jurka
Kris, Monday the 8th sounds good to me. My concern is that we don't get nearly the exposure to users that the server gets so I'd like to get some user testing before the server releases. Oliver ? Dave Kris Jurka wrote: >On Wed, 3 Nov 2004, Dave Cramer wrote: > > > >>Since the server is already in beta, and has had feature freeze for >>quite a while, I'd like to see the same relatively shortly. >> >>Can we agree on a hard date for feature freeze ? ( I'm suggesting >>earlier than later ) >> >> >> > >Well if we go with the servers definition of feature freeze (we won't >accept new features that patches have not yet been submitted for) then I >would say we are already in it. For a beta date I would say Monday, Nov. >8 is reasonable for me. Checking the server's open items list they do >seem to be closer than I thought, but they still haven't called for port >reports and all that, so I wouldn't imagine they'll release in less than a >month. > >Kris Jurka > >---------------------------(end of broadcast)--------------------------- >TIP 9: the planner will ignore your desire to choose an index scan if your > joining column's datatypes do not match > > > > -- Dave Cramer http://www.postgresintl.com 519 939 0336 ICQ#14675561
Dave Cramer wrote: > Monday the 8th sounds good to me. My concern is that we don't get > nearly the exposure to users that the server gets so I'd like to get > some user testing before the server releases. We are starting to get some testing already, which is good. > Oliver ? 8th is fine, I am a bit out of action at the moment anyway so I don't have any changes pending. -O