Thread: Re: [JDBC] Strange server error with current 8.0beta driver
Tom, Here is what you requested. (Thanks to Oliver for the good logging in the jdbc driver). I also have the test case (in java) down to the bare minimum that generated the following output (that test case is attached). (Note that if the FETCH in the test case is not executed then the backend crashes; with the FETCH you get an error: "ERROR: unrecognized node type: 0") Thanks, --Barry PostgreSQL 8.0devel JDBC3 with SSL (build 308) Trying to establish a protocol version 3 connection to localhost:5432 FE=> StartupPacket(user=blind, database=fileswfs43, client_encoding=UNICODE, DateStyle=ISO) <=BE AuthenticationOk <=BE ParameterStatus(client_encoding = UNICODE) <=BE ParameterStatus(DateStyle = ISO, MDY) <=BE ParameterStatus(integer_datetimes = off) <=BE ParameterStatus(is_superuser = on) <=BE ParameterStatus(server_encoding = UNICODE) <=BE ParameterStatus(server_version = 8.0.0beta4) <=BE ParameterStatus(session_authorization = blind) <=BE BackendKeyData(pid=3348,ckey=914259969) <=BE ReadyForQuery(I) compatible = 8.0 loglevel = 2 prepare threshold = 1 getConnection returning driver[className=org.postgresql.Driver,org.postgresql.Driver@a9ae05] simple execute, handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandl er@1df5a8f, maxRows=0, fetchSize=0, flags=0 FE=> Parse(stmt=S_1,query="BEGIN",oids={}) FE=> Bind(stmt=S_1,portal=null) FE=> Execute(portal=null,limit=0) FE=> Parse(stmt=S_2,query="DECLARE CUR CURSOR FOR SELECT 1",oids={}) FE=> Bind(stmt=S_2,portal=null) FE=> Describe(portal=null) FE=> Execute(portal=null,limit=0) FE=> Sync <=BE ParseComplete [S_1] <=BE BindComplete [null] <=BE CommandStatus(BEGIN) <=BE ParseComplete [S_2] <=BE BindComplete [null] <=BE NoData <=BE CommandStatus(DECLARE CURSOR) <=BE ReadyForQuery(T) simple execute, handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandl er@1db699b, maxRows=0, fetchSize=0, flags=0 FE=> Parse(stmt=S_3,query="FETCH FORWARD 10 FROM CUR",oids={}) FE=> Bind(stmt=S_3,portal=null) FE=> Describe(portal=null) FE=> Execute(portal=null,limit=0) FE=> Sync <=BE ParseComplete [S_3] <=BE BindComplete [null] <=BE RowDescription(1) <=BE DataRow <=BE CommandStatus(FETCH) <=BE ReadyForQuery(T) simple execute, handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandl er@12a0f6c, maxRows=0, fetchSize=0, flags=0 FE=> Parse(stmt=S_4,query="CLOSE CUR",oids={}) FE=> Bind(stmt=S_4,portal=null) FE=> Describe(portal=null) FE=> Execute(portal=null,limit=0) FE=> Sync <=BE ParseComplete [S_4] <=BE BindComplete [null] <=BE NoData <=BE CommandStatus(CLOSE CURSOR) <=BE ReadyForQuery(T) simple execute, handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandl er@134a7d8, maxRows=0, fetchSize=0, flags=0 FE=> Bind(stmt=S_2,portal=null) FE=> Describe(portal=null) FE=> Execute(portal=null,limit=0) FE=> Sync <=BE BindComplete [null] <=BE NoData <=BE ErrorMessage(ERROR: unrecognized node type: 0 Location: File: clauses.c, Routine: expression_tree_mutator, Line: 3220 Server SQLState: XX000) java.sql.SQLException: ERROR: unrecognized node type: 0 Location: File: clauses.c, Routine: expression_tree_mutator, Line: 3220 Server SQLState: XX000 at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecu torImpl.java:1356) at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImp l.java:1151) at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java: 166) at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Stateme nt.java:363) at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdb c2Statement.java:308) at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Stateme nt.java:299) at test80.main(test80.java:31) SQLException: SQLState(XX000) <=BE ReadyForQuery(E) java.sql.SQLException: ERROR: unrecognized node type: 0 Location: File: clauses.c, Routine: expression_tree_mutator, Line: 3220 Server SQLState: XX000 -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Tuesday, November 23, 2004 7:10 AM To: Barry Lind Cc: pgsql-jdbc@postgresql.org; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] [JDBC] Strange server error with current 8.0beta driver "Barry Lind" <blind@xythos.com> writes: > OK, getting closer. The error happens if in jdbc I reuse > PreparedStatement objects to reexecute the same set of queries > multiple times. The TRAP you showed looked like it might have something to do with trying to execute code outside any transaction. But I dunno how it got there. > If you still need a test case, let me know, and I will continue to > package up what I have been working on. What I'd actually rather have is a list of the exact sequence of messages sent to the server. regards, tom lane
Attachment
Barry Lind wrote: > I also have the test case (in java) down to the bare minimum that > generated the following output (that test case is attached). (Note that > if the FETCH in the test case is not executed then the backend crashes; > with the FETCH you get an error: "ERROR: unrecognized node type: 0") I narrowed this down to: while (true) { l_stmtDeclare.execute(); } producing: > FE=> Parse(stmt=S_1,query="BEGIN",oids={}) > FE=> Bind(stmt=S_1,portal=null) > FE=> Execute(portal=null,limit=0) > FE=> Parse(stmt=S_2,query="DECLARE CUR CURSOR FOR SELECT 1",oids={}) > FE=> Bind(stmt=S_2,portal=null) > FE=> Describe(portal=null) > FE=> Execute(portal=null,limit=0) > FE=> Sync > <=BE ParseComplete [S_1] > <=BE BindComplete [null] > <=BE CommandStatus(BEGIN) > <=BE ParseComplete [S_2] > <=BE BindComplete [null] > <=BE NoData > <=BE CommandStatus(DECLARE CURSOR) > <=BE ReadyForQuery(T) > simple execute, handler=org.postgresql.jdbc2.AbstractJdbc2Statement$StatementResultHandler@7ecd2c3c, maxRows=0, fetchSize=0,flags=0 > FE=> Bind(stmt=S_2,portal=null) > FE=> Describe(portal=null) > FE=> Execute(portal=null,limit=0) > FE=> Sync > <=BE BindComplete [null] > <=BE NoData > <=BE ErrorMessage(ERROR: unrecognized node type: 2139062143 > Location: File: clauses.c, Routine: expression_tree_mutator, Line: 3237 > Server SQLState: XX000) Valgrind says this is the culprit: > ==26451== Invalid read of size 4 > ==26451== at 0x8185C86: eval_const_expressions_mutator (clauses.c:1185) > ==26451== by 0x8185C32: eval_const_expressions (clauses.c:1152) > ==26451== by 0x817D1A6: preprocess_expression (planner.c:415) > ==26451== by 0x817CEBF: subquery_planner (planner.c:240) > ==26451== by 0x817CD59: planner (planner.c:129) > ==26451== by 0x810DF03: PerformCursorOpen (portalcmds.c:87) > ==26451== by 0x81C1402: PortalRunUtility (pquery.c:934) > ==26451== by 0x81C1762: PortalRunMulti (pquery.c:1001) > ==26451== by 0x81C0D8E: PortalRun (pquery.c:617) > ==26451== by 0x81BDDA7: exec_execute_message (postgres.c:1673) > ==26451== by 0x81BF6E1: PostgresMain (postgres.c:3035) > ==26451== by 0x818FC39: BackendRun (postmaster.c:2817) > ==26451== by 0x818F642: BackendStartup (postmaster.c:2453) > ==26451== by 0x818D989: ServerLoop (postmaster.c:1198) > ==26451== by 0x818CDBA: PostmasterMain (postmaster.c:917) > ==26451== by 0x81570F4: main (main.c:268) > ==26451== Address 0x1BBBF704 is 260 bytes inside a block of size 1024 free'd > ==26451== at 0x1B905460: free (vg_replace_malloc.c:153) > ==26451== by 0x8245706: AllocSetDelete (aset.c:466) > ==26451== by 0x82468B8: MemoryContextDelete (mcxt.c:193) > ==26451== by 0x8247BCF: PortalDrop (portalmem.c:384) > ==26451== by 0x82475B5: CreatePortal (portalmem.c:179) > ==26451== by 0x81BD735: exec_bind_message (postgres.c:1369) > ==26451== by 0x81BF4EF: PostgresMain (postgres.c:3023) > ==26451== by 0x818FC39: BackendRun (postmaster.c:2817) > ==26451== by 0x818F642: BackendStartup (postmaster.c:2453) > ==26451== by 0x818D989: ServerLoop (postmaster.c:1198) > ==26451== by 0x818CDBA: PostmasterMain (postmaster.c:917) > ==26451== by 0x81570F4: main (main.c:268) With a bit of gdb work, I think what is happening is this: The first Execute of S_2, running in portal context, calls the planner on the query contained in S_2's DeclareCursorStmt. The planner modifies the query tree in the course of planning it (specifically, it modifies parse->targetList). Memory allocated for the modified query comes from the portal context. The portal context is freed implicitly by the second Bind of S_2 (second stack trace above). The second Execute of S_2 then tries to use parse->targetList when planning (first stack trace above), but that's now pointing to freed memory. Boom. Perhaps PerformCursorOpen should copy the query tree before planning, or plan in a different memory context? -O
Oliver Jowett wrote: > Perhaps PerformCursorOpen should copy the query tree before planning, or > plan in a different memory context? Patch attached. It moves query planning inside the new portal's memory context. With this applied I can run Barry's testcase without errors, and valgrind seems OK with it too. -O Index: src/backend/commands/portalcmds.c =================================================================== RCS file: /projects/cvsroot/pgsql/src/backend/commands/portalcmds.c,v retrieving revision 1.36 diff -c -r1.36 portalcmds.c *** src/backend/commands/portalcmds.c 16 Sep 2004 16:58:28 -0000 1.36 --- src/backend/commands/portalcmds.c 24 Nov 2004 09:28:34 -0000 *************** *** 67,73 **** * query, so we are not expecting rule rewriting to do anything * strange. */ ! rewritten = QueryRewrite((Query *) stmt->query); if (list_length(rewritten) != 1 || !IsA(linitial(rewritten), Query)) elog(ERROR, "unexpected rewrite result"); query = (Query *) linitial(rewritten); --- 67,86 ---- * query, so we are not expecting rule rewriting to do anything * strange. */ ! ! /* Create a new portal, and do all query planning on a copy of ! * the query allocated in the new portal's memory context. The ! * planner may modify the query, and it is not safe to have ! * those modifications persist as we are ourselves running in a ! * transient portal context. ! */ ! portal = CreatePortal(stmt->portalname, false, false); ! ! oldContext = MemoryContextSwitchTo(PortalGetHeapMemory(portal)); ! ! query = copyObject(stmt->query); ! ! rewritten = QueryRewrite(query); if (list_length(rewritten) != 1 || !IsA(linitial(rewritten), Query)) elog(ERROR, "unexpected rewrite result"); query = (Query *) linitial(rewritten); *************** *** 86,102 **** plan = planner(query, true, stmt->options, NULL); - /* - * Create a portal and copy the query and plan into its memory - * context. - */ - portal = CreatePortal(stmt->portalname, false, false); - - oldContext = MemoryContextSwitchTo(PortalGetHeapMemory(portal)); - - query = copyObject(query); - plan = copyObject(plan); - PortalDefineQuery(portal, NULL, /* unfortunately don't have sourceText */ "SELECT", /* cursor's query is always a SELECT */ --- 99,104 ----
Oliver Jowett <oliver@opencloud.com> writes: >> Perhaps PerformCursorOpen should copy the query tree before planning, or >> plan in a different memory context? > Patch attached. It moves query planning inside the new portal's memory > context. With this applied I can run Barry's testcase without errors, > and valgrind seems OK with it too. I think the better solution is the first way (copy the querytree first). The problem with the way you did it is that all the temporary structures built by the planner will be left behind in the cursor's memory context, and can't be reclaimed until the cursor is destroyed. In the case of a complex query that could represent a pretty serious memory leak. It seems better to eat the cost of copying the querytree an extra time, especially since this way forms a patch that's easy to reverse whenever we fix the planner to be less cavalier about scribbling on its input. I've applied the attached patch instead (and analogously in 7.4 branch). Would you confirm it fixes the problem you see? regards, tom lane *** src/backend/commands/portalcmds.c.orig Thu Sep 16 12:58:28 2004 --- src/backend/commands/portalcmds.c Sun Nov 28 17:02:22 2004 *************** *** 62,73 **** RequireTransactionChain((void *) stmt, "DECLARE CURSOR"); /* * The query has been throughparse analysis, but not rewriting or * planning as yet. Note that the grammar ensured we have a SELECT * query, so we are not expecting rule rewriting to do anything * strange. */ ! rewritten = QueryRewrite((Query *) stmt->query); if (list_length(rewritten) != 1 || !IsA(linitial(rewritten), Query)) elog(ERROR, "unexpected rewrite result"); query = (Query *) linitial(rewritten); --- 62,82 ---- RequireTransactionChain((void *) stmt, "DECLARE CURSOR"); /* + * Because the planner is not cool about not scribbling on its input, + * we make a preliminary copy of the source querytree. This prevents + * problems in the case that the DECLARE CURSOR is in a portal and is + * executed repeatedly. XXX the planner really shouldn't modify its + * input ... FIXME someday. + */ + query = copyObject(stmt->query); + + /* * The query has been through parse analysis, but not rewriting or * planning as yet. Note that the grammarensured we have a SELECT * query, so we are not expecting rule rewriting to do anything * strange. */ ! rewritten = QueryRewrite(query); if (list_length(rewritten) != 1 || !IsA(linitial(rewritten), Query)) elog(ERROR,"unexpected rewrite result"); query = (Query *) linitial(rewritten);