Thread: named queries and the wire protocol
Hello, I sort of volunteered for work on an Erlang Postgres driver, here: https://github.com/epgsql/epgsql And one thing I was looking into was a patch regarding the cleanup of named queries. Specifically, I'm wondering how this code behaves in the case that the Execute runs into trouble: https://github.com/epgsql/epgsql/blob/0e84176be4b54fb712d1cc227a2b91c24b7a66ab/src/pgsql_sock.erl#L199 Does the Close still clean things up properly? Thank you, -- David N. Welton http://www.welton.it/davidw/ http://www.dedasys.com/
David Welton <davidnwelton@gmail.com> writes: > Specifically, I'm wondering how this code behaves in the case that the > Execute runs into trouble: > https://github.com/epgsql/epgsql/blob/0e84176be4b54fb712d1cc227a2b91c24b7a66ab/src/pgsql_sock.erl#L199 I guess you mean this: command({equery, Statement, Parameters}, State) -> #statement{name = StatementName, columns = Columns} = Statement, Bin1 = pgsql_wire:encode_parameters(Parameters), Bin2 = pgsql_wire:encode_formats(Columns), send(State, ?BIND, ["", 0, StatementName, 0, Bin1, Bin2]), send(State, ?EXECUTE, ["", 0, <<0:?int32>>]), send(State, ?CLOSE, [?PREPARED_STATEMENT, StatementName, 0]), send(State, ?SYNC, []), {noreply, State}; Bearing in mind that I don't know Erlang, so I'm just assuming that these commands work as they're apparently intended to ... > Does the Close still clean things up properly? If either the Bind or the Execute fails, the server will discard messages till Sync, so the Close is not executed. But I'm not sure why you'd want this subroutine to destroy the prepared statement? Which is what this code appears to be doing. A Close on the unnamed portal created by the Bind would make sense there. It's not really necessary, since the unnamed portal is recycled anyway when next used, but it's good style. (Because it's not necessary, there's no need to worry about it not being executed if the Execute fails.) If you were using a named portal for execution, error recovery would become a more interesting topic, but with the unnamed portal you don't need to sweat it much. regards, tom lane
Hi, > David Welton <davidnwelton@gmail.com> writes: >> Specifically, I'm wondering how this code behaves in the case that the >> Execute runs into trouble: > >> https://github.com/epgsql/epgsql/blob/0e84176be4b54fb712d1cc227a2b91c24b7a66ab/src/pgsql_sock.erl#L199 > > I guess you mean this: Yes > command({equery, Statement, Parameters}, State) -> > #statement{name = StatementName, columns = Columns} = Statement, > Bin1 = pgsql_wire:encode_parameters(Parameters), > Bin2 = pgsql_wire:encode_formats(Columns), > send(State, ?BIND, ["", 0, StatementName, 0, Bin1, Bin2]), > send(State, ?EXECUTE, ["", 0, <<0:?int32>>]), > send(State, ?CLOSE, [?PREPARED_STATEMENT, StatementName, 0]), > send(State, ?SYNC, []), > {noreply, State}; > > Bearing in mind that I don't know Erlang, so I'm just assuming that these > commands work as they're apparently intended to ... They're asynchronous, but yes, I think it's a safe assumption. >> Does the Close still clean things up properly? > If either the Bind or the Execute fails, the server will discard messages > till Sync, so the Close is not executed. But I'm not sure why you'd want > this subroutine to destroy the prepared statement? Which is what this > code appears to be doing. A Close on the unnamed portal created by the > Bind would make sense there. It's not really necessary, since the unnamed > portal is recycled anyway when next used, but it's good style. (Because > it's not necessary, there's no need to worry about it not being executed > if the Execute fails.) > If you were using a named portal for execution, error recovery would > become a more interesting topic, but with the unnamed portal you don't > need to sweat it much. This is code I inherited and am trying to clean up, so I'm not 100% sure why it does what it does. The general flow looks like this, though: equery(C, Sql, Parameters) -> Name = ["equery-", atom_to_list(node()), pid_to_list(self())], %% ^^^^^^^^^^^^^^^^^^^ generated name ^^^^^^^^^^^^^^^^^^^^^^^^^^ case parse(C, Name, Sql, []) of {ok, #statement{types = Types} = S} -> Typed_Parameters = lists:zip(Types, Parameters), gen_server:call(C, {equery, S, Typed_Parameters}, infinity); Error -> Error end. And then the code above. So it's generating a name itself and then destroying it once the query is done. Perhaps this behavior is not a good idea and using the unnamed portal would be a better idea? Thank you! -- David N. Welton http://www.dedasys.com/
David Welton <davidw@dedasys.com> writes: >> send(State, ?BIND, ["", 0, StatementName, 0, Bin1, Bin2]), >> send(State, ?EXECUTE, ["", 0, <<0:?int32>>]), >> send(State, ?CLOSE, [?PREPARED_STATEMENT, StatementName, 0]), >> send(State, ?SYNC, []), > And then the code above. So it's generating a name itself and then > destroying it once the query is done. > Perhaps this behavior is not a good idea and using the unnamed portal > would be a better idea? My point is that it *is* using the unnamed portal, AFAICS --- the ""s in the Bind and Execute commands appear to correspond to the empty strings that would select that portal. The Close on the other hand is specifying closing a prepared statement, not a portal. If you're right about the control flow around this function, then the code is generating a prepared statement, using it once, and destroying it. Which is dumb; you should instead use the unnamed-statement protocol flow, which is better optimized for that usage pattern. regards, tom lane
Hi, On Thu, Mar 13, 2014 at 1:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > David Welton <davidw@dedasys.com> writes: >>> send(State, ?BIND, ["", 0, StatementName, 0, Bin1, Bin2]), >>> send(State, ?EXECUTE, ["", 0, <<0:?int32>>]), >>> send(State, ?CLOSE, [?PREPARED_STATEMENT, StatementName, 0]), >>> send(State, ?SYNC, []), > >> And then the code above. So it's generating a name itself and then >> destroying it once the query is done. >> Perhaps this behavior is not a good idea and using the unnamed portal >> would be a better idea? > My point is that it *is* using the unnamed portal, AFAICS --- the ""s > in the Bind and Execute commands appear to correspond to the empty > strings that would select that portal. Ok, yes, that makes sense. > The Close on the other hand is specifying closing a prepared statement, > not a portal. If you're right about the control flow around this > function, then the code is generating a prepared statement, using it > once, and destroying it. Which is dumb; you should instead use the > unnamed-statement protocol flow, which is better optimized for that > usage pattern. We tracked down the commit that introduced the automatically generated prepared statement names: https://github.com/epgsql/epgsql/commit/dabf972f74735d2 The author wrote "Usage of unnamed prepared statement and portals leads to unpredictable results in case of concurrent access to same connection." For my own clarification, going by http://www.postgresql.org/docs/devel/static/protocol-overview.html - the named statement has no parameters - it's just a parsed statement, whereas a portal is a statement subsequently bound to some parameters? Can you or someone speak to the concurrency issues? A big thanks for taking the time to go over this with me, -- David N. Welton http://www.dedasys.com/
David Welton <davidw@dedasys.com> writes: > We tracked down the commit that introduced the automatically generated > prepared statement names: > https://github.com/epgsql/epgsql/commit/dabf972f74735d2 > The author wrote "Usage of unnamed prepared statement and portals > leads to unpredictable results in case of concurrent access to same > connection." Um ... in general, concurrent use of a single PG connection by multiple threads will not work, period. Neither the server nor the wire protocol definition are designed for that. Now, you can have multiple portals open and fetch a few rows at a time from each one, if you're careful to serialize the fetch operations; but TBH such usage is a niche case. It's possible this is of use depending on the details of the API your driver exposes to applications, but I don't have those details. If there's not locking in your driver that restricts concurrent access to the connection, then use of named rather than unnamed statements and portals isn't going to fix that. > For my own clarification, going by > http://www.postgresql.org/docs/devel/static/protocol-overview.html - > the named statement has no parameters - it's just a parsed statement, > whereas a portal is a statement subsequently bound to some parameters? A prepared statement is a static object created by a Parse protocol command. The Bind command creates a portal (a query execution state) from a statement, and must supply values for any parameters the statement has. (Too lazy to double-check, but I think either a named or an unnamed statement can have parameters.) Then you say Execute to run the portal. The main difference between named and unnamed statements/portals is that the protocol details are designed to simplify one-shot use of the unnamed statement and portal, for instance the provision for implicit Close of the old unnamed statement or portal if you just go and create a new one. Also the server is optimized to expect a single use of an unnamed statement vs multiple uses of named statements. regards, tom lane