Thread: More protocol discussion: breaking down query processing
Last month's discussions about a revised FE/BE protocol make it clear that there is usefulness in breaking down the query-processing cycle and making individual steps available at the protocol level. This surfaced in several forms in the discussion, including requests for protocol-level PREPARE functionality, concerns about the inefficiency of fetching a RowDescription with each row pulled from a cursor, etc. Here are some thoughts about how to do this. There are two intermediate objects involved in processing a query: a parsed query and a cursor (perhaps we can find better names for them). The parsed query contains all the info needed to execute a query, except for actual values of any parameter placeholders used in the query. A cursor is a parsed query instantiated with actual parameter values; it's ready to execute and return rows (if the query returns any --- a "cursor" for an UPDATE, say, wouldn't return anything). Query processing can then be broken down into several steps. The PARSE step takes a query text string, and optionally some indication of the types of the parameters needed, and produces a parsed query object (this is exactly equivalent to the existing PREPARE command's functionality). Parameter types are induced from the query if not specified. The BIND step takes a parsed-query object and some actual parameter values and produces a ready-to-execute cursor. The FETCH step fetches some or all rows from a cursor. One may CLOSE a cursor when done with it, and may DEALLOCATE a parsed query object when it's no longer needed. We will also want a DESCRIBE operation that can be applied to either cursors or parsed queries (these perhaps should be thought of as two separate operations). Describing a cursor yields info about the output columns (equivalent to RowDescription message in the present protocol). Describing a parsed query yields RowDescription plus some description of the parameter(s). The existing Q"sql statement" command can now be thought of as a macro that invokes PARSE, BIND no parameters, DESCRIBE cursor, FETCH all from cursor, CLOSE, DEALLOCATE. We will still offer that all-in-one functionality, but we'll also offer an extended query command that allows any combination of these functions to be requested at the protocol level. These protocol operations are interoperable with the SQL-level commands for prepared queries and cursors, in the sense that a prepared query or open cursor can be created either way and then used either way. (Alternatively we might think that this is a bad idea, and that protocol-level operations should use a different namespace from SQL commands, so that application-requested operations can't interfere with state that the client-side library has set up. Any opinions?) The BIND operation will allow actual parameter values to be sent in either text or binary form; in either case we'll use a bytecount-followed-by-data format so that it's fully eight-bit-clean. I am inclined to think also that the FETCH operation should allow specification of whether it wants the results in text or binary form. It's not real clear how that should interoperate with the existing DECLARE BINARY CURSOR command, though --- which one should win if there's a conflict? All of these functions can specify the relevant parsed-query and cursor objects by name. A named parsed-query object lives until end of session unless deallocated. A named cursor object lives until end of transaction unless closed. (This might be extended to allow holdable cursors, though I'm more inclined to say that such things can only be created by SQL commands.) Alternatively, the protocol functions can reference nameless (zero-length-name) parsed-query and cursor objects. There can be only one of each of these, living only till the next one is created. The system can optimize nameless objects a little since it knows it need not save them beyond the end of the current query. The all-in-one query command always uses nameless objects. A difficulty with this scheme is that it doesn't scale very well to querystrings containing more than one SQL statement. For the all-in-one query command, we can just define that the BIND-DESCRIBE-FETCH-CLOSE cycle is automatically repeated for each parsetree constructed from the string; this is backwards-compatible with what happens now. It seems fairly messy to extend that to the case where the steps are individually commanded ... especially if you want to assume that the steps are being commanded by a client library that hasn't parsed the querystring and doesn't know how many commands exist therein. Perhaps we can get away with disallowing multiple commands per string when using the extended protocol. Comments? regards, tom lane
There was some discussion of including the results of side effects (triggers etc.) along with the results of the primary query over the FE/BE protocol. Has this been taken any further for the new protocol version? If this is to be pursued, I think it would be great if: - The side effect response was explicitly marked as such (asopposed to just being the nth CursorResponse message) - There would be a way (SET, perhaps) to suppress side effect responses .
Yay, this would be very cool. Two small things to keep in mind during design, though they aren't necessarily things needed in a first implementation: 1) There's an alternate form of the execute step that provides an array of sets of bind values and executes the originalquery many times, once for each set of bind values. This is essential for performing large batch data updates efficiently.Usually this would be for insert or update statements, I'm not sure if other databases even allow it for selectqueries, though I could imagine it being useful. 2) There could be an option to delay optimization until the bind stage. This would be an option the interface layer or applicationwould provide in cases where either the query is unlikely to be repeated (such as if it was called from a higherlevel function that does the parse and bind step together) or is likely to be heavily affected by the input parameters(if the application programmer knows something special about the data distribution). Oh, and I'm sure it's obvious, but there would be no problem having multiple open cursors, right? So an application could prepare all the queries it's going to ever execute during initialization, then execute query 1, and for each row it fetches from 1 execute query 2 with calculated values. For some reason this isn't working with the PHP interface currently, I assume that's just a bug in the PHP driver, not in libpq. -- greg
Tom Lane wrote: > The existing Q"sql statement" command can now be thought of as a macro > that invokes PARSE, BIND no parameters, DESCRIBE cursor, FETCH all from > cursor, CLOSE, DEALLOCATE. This makes good sense to me. > These protocol operations are interoperable with the SQL-level commands > for prepared queries and cursors, in the sense that a prepared query or > open cursor can be created either way and then used either way. > (Alternatively we might think that this is a bad idea, and that > protocol-level operations should use a different namespace from SQL > commands, so that application-requested operations can't interfere with > state that the client-side library has set up. Any opinions?) Does this mean that a statement PREPAREd at the protocol level could be EXECUTEd at the sql level? If that's the case, I'd tend to agree it is a bad idea. > I am inclined to think also that the FETCH operation should allow > specification of whether it wants the results in text or binary form. > It's not real clear how that should interoperate with the existing > DECLARE BINARY CURSOR command, though --- which one should win if there's > a conflict? I'd think that binary support at the protocol level would obsolete the need for the DECLARE BINARY CURSOR command. A related question: how difficult would it be to support the ability to specify specific attributes as binary, while the rest are not. The reason this would be useful is that you might want to receive bytea fields in binary, but not have to deal with binary-to-string conversion of other fields. > A difficulty with this scheme is that it doesn't scale very well to > querystrings containing more than one SQL statement. For the all-in-one > query command, we can just define that the BIND-DESCRIBE-FETCH-CLOSE cycle > is automatically repeated for each parsetree constructed from the string; > this is backwards-compatible with what happens now. It seems fairly messy > to extend that to the case where the steps are individually commanded ... > especially if you want to assume that the steps are being commanded by a > client library that hasn't parsed the querystring and doesn't know how > many commands exist therein. Perhaps we can get away with disallowing > multiple commands per string when using the extended protocol. I think disallowing multiple commands is probably the way to go, but I guess you could make the interim results into arrays of interim results instead. Joe
Joe Conway <mail@joeconway.com> writes: > Tom Lane wrote: >> These protocol operations are interoperable with the SQL-level commands >> for prepared queries and cursors, in the sense that a prepared query or >> open cursor can be created either way and then used either way. >> (Alternatively we might think that this is a bad idea, and that >> protocol-level operations should use a different namespace from SQL >> commands, so that application-requested operations can't interfere with >> state that the client-side library has set up. Any opinions?) > Does this mean that a statement PREPAREd at the protocol level could be > EXECUTEd at the sql level? If that's the case, I'd tend to agree it is a > bad idea. I'm of two minds about it. On the basis of flexibility and ease of debugging I'd think that sharing one namespace is good. But I suppose a client library that wants to defend itself against clueless application programmers might prefer a separate namespace. > I'd think that binary support at the protocol level would obsolete the > need for the DECLARE BINARY CURSOR command. Yeah, but making something obsolete is not the same as being willing to remove it immediately. If we keep DECLARE BINARY CURSOR around, how should it act? > A related question: how difficult would it be to support the ability to > specify specific attributes as binary, while the rest are not. I'd prefer not to go there. It'd be messy and I don't see that it has any use for common client libraries. (Remember that most of this discussion has focused on serving client libs that don't want to know very much about the queries they're passing on --- so how are they going to know which columns to ask for in binary?) regards, tom lane
Tom Lane wrote: >>I'd think that binary support at the protocol level would obsolete the >>need for the DECLARE BINARY CURSOR command. > > Yeah, but making something obsolete is not the same as being willing to > remove it immediately. If we keep DECLARE BINARY CURSOR around, how > should it act? The protocol level should win if it is set to binary, but I think the statement level has to win otherwise in order to maintain backward compatibility, at least for the next release. Joe