Thread: named queries and the wire protocol

named queries and the wire protocol

From
David Welton
Date:
Hello,

I sort of volunteered for work on an Erlang Postgres driver, here:

https://github.com/epgsql/epgsql

And one thing I was looking into was a patch regarding the cleanup of
named queries.

Specifically, I'm wondering how this code behaves in the case that the
Execute runs into trouble:

https://github.com/epgsql/epgsql/blob/0e84176be4b54fb712d1cc227a2b91c24b7a66ab/src/pgsql_sock.erl#L199

Does the Close still clean things up properly?

Thank you,
--
David N. Welton

http://www.welton.it/davidw/

http://www.dedasys.com/


Re: named queries and the wire protocol

From
Tom Lane
Date:
David Welton <davidnwelton@gmail.com> writes:
> Specifically, I'm wondering how this code behaves in the case that the
> Execute runs into trouble:

> https://github.com/epgsql/epgsql/blob/0e84176be4b54fb712d1cc227a2b91c24b7a66ab/src/pgsql_sock.erl#L199

I guess you mean this:

command({equery, Statement, Parameters}, State) ->
    #statement{name = StatementName, columns = Columns} = Statement,
    Bin1 = pgsql_wire:encode_parameters(Parameters),
    Bin2 = pgsql_wire:encode_formats(Columns),
    send(State, ?BIND, ["", 0, StatementName, 0, Bin1, Bin2]),
    send(State, ?EXECUTE, ["", 0, <<0:?int32>>]),
    send(State, ?CLOSE, [?PREPARED_STATEMENT, StatementName, 0]),
    send(State, ?SYNC, []),
    {noreply, State};

Bearing in mind that I don't know Erlang, so I'm just assuming that these
commands work as they're apparently intended to ...

> Does the Close still clean things up properly?

If either the Bind or the Execute fails, the server will discard messages
till Sync, so the Close is not executed.  But I'm not sure why you'd want
this subroutine to destroy the prepared statement?  Which is what this
code appears to be doing.  A Close on the unnamed portal created by the
Bind would make sense there.  It's not really necessary, since the unnamed
portal is recycled anyway when next used, but it's good style.  (Because
it's not necessary, there's no need to worry about it not being executed
if the Execute fails.)

If you were using a named portal for execution, error recovery would
become a more interesting topic, but with the unnamed portal you don't
need to sweat it much.

            regards, tom lane


Re: named queries and the wire protocol

From
David Welton
Date:
Hi,

> David Welton <davidnwelton@gmail.com> writes:
>> Specifically, I'm wondering how this code behaves in the case that the
>> Execute runs into trouble:
>
>> https://github.com/epgsql/epgsql/blob/0e84176be4b54fb712d1cc227a2b91c24b7a66ab/src/pgsql_sock.erl#L199
>
> I guess you mean this:

Yes

> command({equery, Statement, Parameters}, State) ->
>     #statement{name = StatementName, columns = Columns} = Statement,
>     Bin1 = pgsql_wire:encode_parameters(Parameters),
>     Bin2 = pgsql_wire:encode_formats(Columns),
>     send(State, ?BIND, ["", 0, StatementName, 0, Bin1, Bin2]),
>     send(State, ?EXECUTE, ["", 0, <<0:?int32>>]),
>     send(State, ?CLOSE, [?PREPARED_STATEMENT, StatementName, 0]),
>     send(State, ?SYNC, []),
>     {noreply, State};
>
> Bearing in mind that I don't know Erlang, so I'm just assuming that these
> commands work as they're apparently intended to ...

They're asynchronous, but yes, I think it's a safe assumption.

>> Does the Close still clean things up properly?

> If either the Bind or the Execute fails, the server will discard messages
> till Sync, so the Close is not executed.  But I'm not sure why you'd want
> this subroutine to destroy the prepared statement?  Which is what this
> code appears to be doing.  A Close on the unnamed portal created by the
> Bind would make sense there.  It's not really necessary, since the unnamed
> portal is recycled anyway when next used, but it's good style.  (Because
> it's not necessary, there's no need to worry about it not being executed
> if the Execute fails.)

> If you were using a named portal for execution, error recovery would
> become a more interesting topic, but with the unnamed portal you don't
> need to sweat it much.

This is code I inherited and am trying to clean up, so I'm not 100%
sure why it does what it does.  The general flow looks like this,
though:

equery(C, Sql, Parameters) ->
    Name = ["equery-", atom_to_list(node()), pid_to_list(self())],
%% ^^^^^^^^^^^^^^^^^^^ generated name ^^^^^^^^^^^^^^^^^^^^^^^^^^

    case parse(C, Name, Sql, []) of
        {ok, #statement{types = Types} = S} ->
            Typed_Parameters = lists:zip(Types, Parameters),
            gen_server:call(C, {equery, S, Typed_Parameters}, infinity);
        Error ->
            Error
    end.

And then the code above.  So it's generating a name itself and then
destroying it once the query is done.

Perhaps this behavior is not a good idea and using the unnamed portal
would be a better idea?

Thank you!

--
David N. Welton

http://www.dedasys.com/


Re: named queries and the wire protocol

From
Tom Lane
Date:
David Welton <davidw@dedasys.com> writes:
>> send(State, ?BIND, ["", 0, StatementName, 0, Bin1, Bin2]),
>> send(State, ?EXECUTE, ["", 0, <<0:?int32>>]),
>> send(State, ?CLOSE, [?PREPARED_STATEMENT, StatementName, 0]),
>> send(State, ?SYNC, []),

> And then the code above.  So it's generating a name itself and then
> destroying it once the query is done.
> Perhaps this behavior is not a good idea and using the unnamed portal
> would be a better idea?

My point is that it *is* using the unnamed portal, AFAICS --- the ""s
in the Bind and Execute commands appear to correspond to the empty
strings that would select that portal.

The Close on the other hand is specifying closing a prepared statement,
not a portal.  If you're right about the control flow around this
function, then the code is generating a prepared statement, using it
once, and destroying it.  Which is dumb; you should instead use the
unnamed-statement protocol flow, which is better optimized for that
usage pattern.

            regards, tom lane


Re: named queries and the wire protocol

From
David Welton
Date:
Hi,

On Thu, Mar 13, 2014 at 1:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Welton <davidw@dedasys.com> writes:
>>> send(State, ?BIND, ["", 0, StatementName, 0, Bin1, Bin2]),
>>> send(State, ?EXECUTE, ["", 0, <<0:?int32>>]),
>>> send(State, ?CLOSE, [?PREPARED_STATEMENT, StatementName, 0]),
>>> send(State, ?SYNC, []),
>
>> And then the code above.  So it's generating a name itself and then
>> destroying it once the query is done.
>> Perhaps this behavior is not a good idea and using the unnamed portal
>> would be a better idea?

> My point is that it *is* using the unnamed portal, AFAICS --- the ""s
> in the Bind and Execute commands appear to correspond to the empty
> strings that would select that portal.

Ok, yes, that makes sense.

> The Close on the other hand is specifying closing a prepared statement,
> not a portal.  If you're right about the control flow around this
> function, then the code is generating a prepared statement, using it
> once, and destroying it.  Which is dumb; you should instead use the
> unnamed-statement protocol flow, which is better optimized for that
> usage pattern.

We tracked down the commit that introduced the automatically generated
prepared statement names:

https://github.com/epgsql/epgsql/commit/dabf972f74735d2

The author wrote "Usage of unnamed prepared statement and portals
leads to unpredictable results in case of concurrent access to same
connection."

For my own clarification, going by
http://www.postgresql.org/docs/devel/static/protocol-overview.html -
the named statement has no parameters - it's just a parsed statement,
whereas a portal is a statement subsequently bound to some parameters?

Can you or someone speak to the concurrency issues?

A big thanks for taking the time to go over this with me,
--
David N. Welton

http://www.dedasys.com/


Re: named queries and the wire protocol

From
Tom Lane
Date:
David Welton <davidw@dedasys.com> writes:
> We tracked down the commit that introduced the automatically generated
> prepared statement names:

> https://github.com/epgsql/epgsql/commit/dabf972f74735d2

> The author wrote "Usage of unnamed prepared statement and portals
> leads to unpredictable results in case of concurrent access to same
> connection."

Um ... in general, concurrent use of a single PG connection by multiple
threads will not work, period.  Neither the server nor the wire protocol
definition are designed for that.

Now, you can have multiple portals open and fetch a few rows at a time
from each one, if you're careful to serialize the fetch operations;
but TBH such usage is a niche case.  It's possible this is of use
depending on the details of the API your driver exposes to applications,
but I don't have those details.

If there's not locking in your driver that restricts concurrent access to
the connection, then use of named rather than unnamed statements and
portals isn't going to fix that.

> For my own clarification, going by
> http://www.postgresql.org/docs/devel/static/protocol-overview.html -
> the named statement has no parameters - it's just a parsed statement,
> whereas a portal is a statement subsequently bound to some parameters?

A prepared statement is a static object created by a Parse protocol
command.  The Bind command creates a portal (a query execution state) from
a statement, and must supply values for any parameters the statement has.
(Too lazy to double-check, but I think either a named or an unnamed
statement can have parameters.)  Then you say Execute to run the portal.

The main difference between named and unnamed statements/portals is that
the protocol details are designed to simplify one-shot use of the unnamed
statement and portal, for instance the provision for implicit Close of the
old unnamed statement or portal if you just go and create a new one.  Also
the server is optimized to expect a single use of an unnamed statement vs
multiple uses of named statements.

            regards, tom lane