Re: Roadmap for FE/BE protocol redesign - Mailing list pgsql-hackers

From Barry Lind
Subject Re: Roadmap for FE/BE protocol redesign
Date
Msg-id 3E6EEA9B.3050201@xythos.com
Whole thread Raw
In response to Roadmap for FE/BE protocol redesign  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Roadmap for FE/BE protocol redesign  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Roadmap for FE/BE protocol redesign  (Christof Petig <christof@petig-baender.de>)
List pgsql-hackers
> * Backend should pass its version number, database encoding, default
> client encoding, and possibly other data (any ideas?) to frontend during
> startup, to avoid need for explicit queries to get this info.  We could
> also consider eliminating SET commands sent by libpq in favor of adding
> variable settings to startup packet's PGOPTIONS field.  Ideally we could
> get back to the point where a standard connection startup takes only one
> packet in each direction.

This handles the JDBC needs (currently on startup the jdbc driver 
selects the database encoding and version number and sets the datestyle 
and autocommit parameters).

One addition I would personally like to see (it comes up in my apps 
code) is the ability to detect wheather the server is big endian or 
little endian.  When using binary cursors this is necessary in order to 
read int data.  Currently I issue a 'select 1' statement at connection 
startup to determine what format the server is using.


Other things I would like to see to help jdbc:

1) More information about the attributes selected in a query (I see 
there is an entire thread on this already) to minimize the work 
necessary to implement updateable result sets as defined by the jdbc spec.

2) Better support for domains.  Currently the jdbc driver is broken with 
regards to domains (although no one has reported this yet).  The driver 
will treat a datatype that is a domain as an unknown/unsupported 
datatype.  It would be great if the T response included the 'base' 
datatype for a domain attribute so that the driver would know what 
parsing routines to call to convert to/from the text representation the 
backend expects.

3) Protocol level support for CURSORs.  It would be nice if cursor 
support was done at the protocol level and not as a SQL command.  The 
current default behavior of returning all results from a query in the 
query response message is often a problem (can easily lead to out of 
memory problems for poorly written queries).  So it is desirable to use 
cursors.  But with the current implementation in SQL, cursors are not 
the appropriate choice if a query is only going to return one or a few 
rows.  The reason is that using a cursor requires a minimum of three SQL 
statements:  DECLARE, FETCH, CLOSE.  The jdbc driver issues the DECLARE 
and FETCH in one server call, but the CLOSE needs to be a second call. 
Thus for simple one row selects (which in many cases are the majority of 
selects issued) using CURSORS requires two roundtrips to the server vs. 
one for the nonCursor case.
This leaves me with a problem in the jdbc driver, I can either use 
standard fast/performant queries for single row selects that blowup with 
out of memory errors for large results, or I can use cursors and avoid 
large memory usage but hurt overall performance.  What I have currently 
done is require that the developer call an extra method to turn on the 
use of cursors when they know that the cursor is going to return a large 
number of rows and leave the default be the non-cursor case.  This works 
but requires that developers who are writing code to interact with 
multiple different databases, code differently for the postgres jdbc 
driver.  And this is a problem since one of the goals of jdbc is to be 
able to write code that works against multiple different databases.
So I would request the ability of the client to set a max rows parameter   for query results.  If a query were to
returnmore than the max 
 
number of rows, the client would be given a handle (essentially a cursor 
name) that it could use to fetch additional sets of rows.

4) Protocol level support of PREPARE.  In jdbc and most other 
interfaces, there is support for parameterized SQL.  If you want to take 
advantage of the performance benefits of reusing parsed plans you have 
to use the PREPARE SQL statement.  My complaint on doing this at the SQL 
level vs the protocol level is similar to the problem with cursors 
above.  To use prepare you need to issue three SQL statements: PREPARE, 
EXCECUTE, DEALLOCATE.  If you know ahead of time that you are going to 
reuse a statement many times doing PREPARE, EXECUTE, EXECUTE, ..., 
DEALLOCATE makes sense and can be a big win in performance.  However if 
you only ever execute the statement once then you need to use two round 
trips (one for the PREPARE, EXECUTE and another for the DEALLOCATE) 
versus one round trip to execute the statement 'normally'.  So it 
decreases performance to use prepares for all parameterized sql 
statements.  So the current implementation in jdbc requires the user to 
issue a postgres specific call to turn on the use of prepared statements 
for those cases the developer knows will be a performance win.  But this 
requires coding differently for postgres jdbc than for other databases.
So being better able to handle this in the protocol would be nice.

5) Better support for "large values".  Generally I recommend that users 
of jdbc use bytea to store large binary values.  I generally tell people 
to avoid using LOs (Large Objects).  The reason for this is that LOs 
have two significant problems:  1) security - any user on the database 
can access all LOs even though they may not be able to access the row 
that contains the LO reference, 2) cleanup - deleting the row containing 
the LO reference doesn't delete the LO requireing extra code or triggers 
to behave like a regular value in a regular column.  Bytea works OK for 
small to medium sized values, but doesn't work for very large values, 
where by very large I mean over a few Megabytes.  The reason very large 
values are a problem is memory usage.  There is no way to 'stream' bytea 
values from the server like you can do with LOs, so the driver ends up 
storeing the entire value in memory as it reads the result from the 
backend for a query.  And if the query returns multiple rows each with a 
large value you quickly run out of memory.
So what I would like to see is the ability for the client to set a MAX 
VALUE size parameter.  The server would send up to this amount of data 
for any column.  If the value was longer than MAX VALUE, the server 
would respond with a handle that the client could use to get the rest of 
the value (in chunks of MAX VALUE) if it wanted to.  This would allow 
the client to get the entire result set which could contain perhaps many 
large bytea values, but not use a lot of memory up front.  Then fetch 
the entire values only when/if the application asked for them and stream 
the result to the application and never bring the entire contents of the 
column into memory at once.  (There are probably a number of different 
implementation posibilities so use this one as a suggestion to explain 
what I would like to see not necessarily how it should be implemented).

6)  Better over the wire support for bytea.  The current encoding of 
binary data \000 results in a significant expansion in the size of data 
transmitted.  It would be nice if bytea data didn't result in 2 or 3 
times data expansion. (and all the cpu cycles to convert to/from the 
escaped format).  This may not be a protocol issue, but IMHO the best 
way to fix this would be in the protocol.

thanks,
--Barry





pgsql-hackers by date:

Previous
From: Hiroshi Inoue
Date:
Subject: Re: Roadmap for FE/BE protocol redesign
Next
From: Barry Lind
Date:
Subject: Re: Roadmap for FE/BE protocol redesign