Thread: More protocol discussion: breaking down query processing

More protocol discussion: breaking down query processing

From
Tom Lane
Date:
Last month's discussions about a revised FE/BE protocol make it clear that
there is usefulness in breaking down the query-processing cycle and making
individual steps available at the protocol level.  This surfaced in
several forms in the discussion, including requests for protocol-level
PREPARE functionality, concerns about the inefficiency of fetching a
RowDescription with each row pulled from a cursor, etc.  Here are some
thoughts about how to do this.

There are two intermediate objects involved in processing a query: a
parsed query and a cursor (perhaps we can find better names for them).
The parsed query contains all the info needed to execute a query, except
for actual values of any parameter placeholders used in the query.
A cursor is a parsed query instantiated with actual parameter values;
it's ready to execute and return rows (if the query returns any --- a
"cursor" for an UPDATE, say, wouldn't return anything).

Query processing can then be broken down into several steps.  The PARSE
step takes a query text string, and optionally some indication of the
types of the parameters needed, and produces a parsed query object (this
is exactly equivalent to the existing PREPARE command's functionality).
Parameter types are induced from the query if not specified.  The BIND
step takes a parsed-query object and some actual parameter values and
produces a ready-to-execute cursor.  The FETCH step fetches some or all
rows from a cursor.  One may CLOSE a cursor when done with it, and may
DEALLOCATE a parsed query object when it's no longer needed.  We will also
want a DESCRIBE operation that can be applied to either cursors or parsed
queries (these perhaps should be thought of as two separate operations).
Describing a cursor yields info about the output columns (equivalent to
RowDescription message in the present protocol).  Describing a parsed
query yields RowDescription plus some description of the parameter(s).

The existing Q"sql statement" command can now be thought of as a macro
that invokes PARSE, BIND no parameters, DESCRIBE cursor, FETCH all from
cursor, CLOSE, DEALLOCATE.

We will still offer that all-in-one functionality, but we'll also offer an
extended query command that allows any combination of these functions to
be requested at the protocol level.

These protocol operations are interoperable with the SQL-level commands
for prepared queries and cursors, in the sense that a prepared query or
open cursor can be created either way and then used either way.
(Alternatively we might think that this is a bad idea, and that
protocol-level operations should use a different namespace from SQL
commands, so that application-requested operations can't interfere with
state that the client-side library has set up.  Any opinions?)

The BIND operation will allow actual parameter values to be sent in either
text or binary form; in either case we'll use a bytecount-followed-by-data
format so that it's fully eight-bit-clean.

I am inclined to think also that the FETCH operation should allow
specification of whether it wants the results in text or binary form.
It's not real clear how that should interoperate with the existing
DECLARE BINARY CURSOR command, though --- which one should win if there's
a conflict?

All of these functions can specify the relevant parsed-query and cursor
objects by name.  A named parsed-query object lives until end of session
unless deallocated.  A named cursor object lives until end of transaction
unless closed.  (This might be extended to allow holdable cursors, though
I'm more inclined to say that such things can only be created by SQL
commands.)  Alternatively, the protocol functions can reference nameless
(zero-length-name) parsed-query and cursor objects.  There can be only one
of each of these, living only till the next one is created.  The system
can optimize nameless objects a little since it knows it need not save
them beyond the end of the current query.  The all-in-one query command
always uses nameless objects.

A difficulty with this scheme is that it doesn't scale very well to
querystrings containing more than one SQL statement.  For the all-in-one
query command, we can just define that the BIND-DESCRIBE-FETCH-CLOSE cycle
is automatically repeated for each parsetree constructed from the string;
this is backwards-compatible with what happens now.  It seems fairly messy
to extend that to the case where the steps are individually commanded ...
especially if you want to assume that the steps are being commanded by a
client library that hasn't parsed the querystring and doesn't know how
many commands exist therein.  Perhaps we can get away with disallowing
multiple commands per string when using the extended protocol.

Comments?
        regards, tom lane



FE/BE - Side effects

From
Bruce Badger
Date:
There was some discussion of including the results of side effects
(triggers etc.) along with the results of the primary query over the
FE/BE protocol.

Has this been taken any further for the new protocol version?

If this is to be pursued, I think it would be great if: - The side effect response was explicitly marked as such
(asopposed to just being the nth CursorResponse message) - There would be a way (SET, perhaps) to suppress side effect
 responses .
 



Re: More protocol discussion: breaking down query processing

From
Greg Stark
Date:
Yay, this would be very cool.

Two small things to keep in mind during design, though they aren't necessarily
things needed in a first implementation:

1) There's an alternate form of the execute step that provides an array of  sets of bind values and executes the
originalquery many times, once for  each set of bind values. This is essential for performing large batch data  updates
efficiently.Usually this would be for insert or update statements,  I'm not sure if other databases even allow it for
selectqueries, though I  could imagine it being useful.
 

2) There could be an option to delay optimization until the bind stage. This  would be an option the interface layer or
applicationwould provide in  cases where either the query is unlikely to be repeated (such as if it was  called from a
higherlevel function that does the parse and bind step  together) or is likely to be heavily affected by the input
parameters(if  the application programmer knows something special about the data  distribution).
 

Oh, and I'm sure it's obvious, but there would be no problem having multiple
open cursors, right? So an application could prepare all the queries it's
going to ever execute during initialization, then execute query 1, and for
each row it fetches from 1 execute query 2 with calculated values. For some
reason this isn't working with the PHP interface currently, I assume that's
just a bug in the PHP driver, not in libpq.

--
greg



Re: [HACKERS] More protocol discussion: breaking down query processing

From
Joe Conway
Date:
Tom Lane wrote:
> The existing Q"sql statement" command can now be thought of as a macro
> that invokes PARSE, BIND no parameters, DESCRIBE cursor, FETCH all from
> cursor, CLOSE, DEALLOCATE.

This makes good sense to me.

> These protocol operations are interoperable with the SQL-level commands
> for prepared queries and cursors, in the sense that a prepared query or
> open cursor can be created either way and then used either way.
> (Alternatively we might think that this is a bad idea, and that
> protocol-level operations should use a different namespace from SQL
> commands, so that application-requested operations can't interfere with
> state that the client-side library has set up.  Any opinions?)

Does this mean that a statement PREPAREd at the protocol level could be 
EXECUTEd at the sql level? If that's the case, I'd tend to agree it is a 
bad idea.

> I am inclined to think also that the FETCH operation should allow
> specification of whether it wants the results in text or binary form.
> It's not real clear how that should interoperate with the existing
> DECLARE BINARY CURSOR command, though --- which one should win if there's
> a conflict?

I'd think that binary support at the protocol level would obsolete the 
need for the DECLARE BINARY CURSOR command.

A related question: how difficult would it be to support the ability to 
specify specific attributes as binary, while the rest are not. The 
reason this would be useful is that you might want to receive bytea 
fields in binary, but not have to deal with binary-to-string conversion 
of other fields.

> A difficulty with this scheme is that it doesn't scale very well to
> querystrings containing more than one SQL statement.  For the all-in-one
> query command, we can just define that the BIND-DESCRIBE-FETCH-CLOSE cycle
> is automatically repeated for each parsetree constructed from the string;
> this is backwards-compatible with what happens now.  It seems fairly messy
> to extend that to the case where the steps are individually commanded ...
> especially if you want to assume that the steps are being commanded by a
> client library that hasn't parsed the querystring and doesn't know how
> many commands exist therein.  Perhaps we can get away with disallowing
> multiple commands per string when using the extended protocol.

I think disallowing multiple commands is probably the way to go, but I 
guess you could make the interim results into arrays of interim results 
instead.

Joe



Re: [HACKERS] More protocol discussion: breaking down query processing

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> Tom Lane wrote:
>> These protocol operations are interoperable with the SQL-level commands
>> for prepared queries and cursors, in the sense that a prepared query or
>> open cursor can be created either way and then used either way.
>> (Alternatively we might think that this is a bad idea, and that
>> protocol-level operations should use a different namespace from SQL
>> commands, so that application-requested operations can't interfere with
>> state that the client-side library has set up.  Any opinions?)

> Does this mean that a statement PREPAREd at the protocol level could be 
> EXECUTEd at the sql level? If that's the case, I'd tend to agree it is a 
> bad idea.

I'm of two minds about it.  On the basis of flexibility and ease of
debugging I'd think that sharing one namespace is good.  But I suppose
a client library that wants to defend itself against clueless
application programmers might prefer a separate namespace.

> I'd think that binary support at the protocol level would obsolete the 
> need for the DECLARE BINARY CURSOR command.

Yeah, but making something obsolete is not the same as being willing to
remove it immediately.  If we keep DECLARE BINARY CURSOR around, how
should it act?

> A related question: how difficult would it be to support the ability to 
> specify specific attributes as binary, while the rest are not.

I'd prefer not to go there.  It'd be messy and I don't see that it has
any use for common client libraries.  (Remember that most of this
discussion has focused on serving client libs that don't want to know
very much about the queries they're passing on --- so how are they going
to know which columns to ask for in binary?)
        regards, tom lane



Re: [HACKERS] More protocol discussion: breaking down

From
Joe Conway
Date:
Tom Lane wrote:
>>I'd think that binary support at the protocol level would obsolete the 
>>need for the DECLARE BINARY CURSOR command.
> 
> Yeah, but making something obsolete is not the same as being willing to
> remove it immediately.  If we keep DECLARE BINARY CURSOR around, how
> should it act?

The protocol level should win if it is set to binary, but I think the 
statement level has to win otherwise in order to maintain backward 
compatibility, at least for the next release.

Joe