Thread: PostgreSQL gaps wrt to java, and jdbc
Binary represenation of XML and JSON instead of converting it to a String BEFORE transmission. Both JSON and XML essentially are graphs, and can be inflated with whitespace on the driver side AFTER transmission. Same in the other direction (deflate by transmitting binary object graph instead of whitespace-inflated String representation).
From: pgsql-jdbc-owner@postgresql.org [mailto:pgsql-jdbc-owner@postgresql.org] On Behalf Of Dave Cramer
Sent: Montag, 6. Juli 2015 14:15
To: List
Subject: [JDBC] PostgreSQL gaps wrt to java, and jdbc
I have been actively maintaining the driver off and on since 1999 or so. Recently we have had a flurry of activity and one of the things I noticed was the surprise that PostgreSQL didn't support X or did Y in some unexpected way. Sometimes we are a bit too complacent, and accept things the way they are.
I am wondering what could the server do better that would help JDBC?
Obviously streaming a column is one. Possibly rowid's. Anything else ?
Dave Cramer
Binary represenation of XML and JSON instead of converting it to a String BEFORE transmission. Both JSON and XML essentially are graphs, and can be inflated with whitespace on the driver side AFTER transmission. Same in the other direction (deflate by transmitting binary object graph instead of whitespace-inflated String representation).
From: pgsql-jdbc-owner@postgresql.org [mailto:pgsql-jdbc-owner@postgresql.org] On Behalf Of Dave Cramer
Sent: Montag, 6. Juli 2015 14:15
To: List
Subject: [JDBC] PostgreSQL gaps wrt to java, and jdbc
I have been actively maintaining the driver off and on since 1999 or so. Recently we have had a flurry of activity and one of the things I noticed was the surprise that PostgreSQL didn't support X or did Y in some unexpected way. Sometimes we are a bit too complacent, and accept things the way they are.
I am wondering what could the server do better that would help JDBC?
Obviously streaming a column is one. Possibly rowid's. Anything else ?
Dave Cramer
By “streaming a column” I assume you mean large bytea columns? If so, I say… hell yes. Also,.. == Binary type coercion == The biggest impediment I’ve run into is the fact that PG will not attempt coercion on binary types. If you specifya binary type in the protocol, that’s the final word. A better system would be to have PG perform the same coercionit does for text types and then send the results back with coerced types, in binary; finally falling back to textwhen binary cannot be produced. This would solve almost every issue we have had to work around in the “ng” driver. == Unbound binary results == Being able to send a query, without the parse & bind step, that returns binary results. == Better cursor support == Providing capabilities that better match JDBC. PG is pretty close but a few cases make cursors not work as expecteda lot of the time. I could probably come up with a bunch more but these are the big ones that jump out at me. I have a couple solutions (with no idea how to implement them)... For "binary type coercion” & “unbound binary results” a simple setting flag telling PG that we basically know how to handleall the binary types would suffice. In this flag was set PG could just start returning stuff in the fashion we wantand it would helpfully “just work”. For streaming it seems a sub mode, similar to the copy protocol, that doesn’t force us out of the transaction but can sendus multiple packets would really help. Although I know there are many factors at work on the server that I definitelydon’t understand. > On Jul 6, 2015, at 5:14 AM, Dave Cramer <davecramer@gmail.com> wrote: > > I have been actively maintaining the driver off and on since 1999 or so. Recently we have had a flurry of activity andone of the things I noticed was the surprise that PostgreSQL didn't support X or did Y in some unexpected way. Sometimeswe are a bit too complacent, and accept things the way they are. > > I am wondering what could the server do better that would help JDBC? > > Obviously streaming a column is one. Possibly rowid's. Anything else ? > > > Dave Cramer
On 07/07/2015 09:31 PM, Kevin Wooten wrote: > == Binary type coercion == > The biggest impediment I’ve run into is the fact that PG will not > attempt coercion on binary types. If you specify a binary type in the > protocol, that’s the final word. A better system would be to have PG > perform the same coercion it does for text types and then send the > results back with coerced types, in binary; finally falling back to > text when binary cannot be produced. This would solve almost every > issue we have had to work around in the “ng” driver. Can you elaborate? I'm confused, because the server can produce binary output for any datatype that has binary output functions. That covers all built-in datatypes and all extension datatypes people use in practice. What I've imagined to be a problem, though, is that you have to either understand the binary representation of all datatypes in the client, or you have to request binary for only those datatypes you can handle. And to know which datatype a result set's column has, you have to Describe it before fetching the results, which adds an extra round-trip. So it would be handy if the driver could provide the server a list of datatypes that should be sent as binary, as a one-time operation at the beginning of the session for example, rather than indicating for each column in each query separately. - Heikki
On 07/07/2015 09:31 PM, Kevin Wooten wrote:== Binary type coercion ==
The biggest impediment I’ve run into is the fact that PG will not
attempt coercion on binary types. If you specify a binary type in the
protocol, that’s the final word. A better system would be to have PG
perform the same coercion it does for text types and then send the
results back with coerced types, in binary; finally falling back to
text when binary cannot be produced. This would solve almost every
issue we have had to work around in the “ng” driver.
Can you elaborate? I'm confused, because the server can produce binary output for any datatype that has binary output functions. That covers all built-in datatypes and all extension datatypes people use in practice.
What I've imagined to be a problem, though, is that you have to either understand the binary representation of all datatypes in the client, or you have to request binary for only those datatypes you can handle. And to know which datatype a result set's column has, you have to Describe it before fetching the results, which adds an extra round-trip. So it would be handy if the driver could provide the server a list of datatypes that should be sent as binary, as a one-time operation at the beginning of the session for example, rather than indicating for each column in each query separately.
On 07/07/2015 10:51 PM, Dave Cramer wrote: > On 7 July 2015 at 15:46, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > >> On 07/07/2015 09:31 PM, Kevin Wooten wrote: >> >>> == Binary type coercion == >>> The biggest impediment I’ve run into is the fact that PG will not >>> attempt coercion on binary types. If you specify a binary type in the >>> protocol, that’s the final word. A better system would be to have PG >>> perform the same coercion it does for text types and then send the >>> results back with coerced types, in binary; finally falling back to >>> text when binary cannot be produced. This would solve almost every >>> issue we have had to work around in the “ng” driver. >>> >> >> Can you elaborate? I'm confused, because the server can produce binary >> output for any datatype that has binary output functions. That covers all >> built-in datatypes and all extension datatypes people use in practice. >> >> What I've imagined to be a problem, though, is that you have to either >> understand the binary representation of all datatypes in the client, or you >> have to request binary for only those datatypes you can handle. And to know >> which datatype a result set's column has, you have to Describe it before >> fetching the results, which adds an extra round-trip. So it would be handy >> if the driver could provide the server a list of datatypes that should be >> sent as binary, as a one-time operation at the beginning of the session for >> example, rather than indicating for each column in each query separately. >> > Is there a way to get json and or xml out as binary ? Well, as far as the protocol is considered, yes. However, the "binary representation" of json and xml datatypes is not too exciting; here's the binary send function for json for example: /* * Binary send. */ Datum json_send(PG_FUNCTION_ARGS) { text *t = PG_GETARG_TEXT_PP(0); StringInfoData buf; pq_begintypsend(&buf); pq_sendtext(&buf, VARDATA_ANY(t), VARSIZE_ANY_EXHDR(t)); PG_RETURN_BYTEA_P(pq_endtypsend(&buf)); } I.e. it just sends the string with a bytea header. There is no benefit to using that instead of the regular text representation. That dummy representation is used just so that e.g. binary-format pg_dump works. - Heikki
Although what you suggest might be useful. For the ng driver we’ve handled that. At startup we download every type andmap them based on the send,recv,in,out. So we just specify them for each query. As far as “coercion" goes maybe a better term would be “deduction”. There are numerous cases where it would be great ifwe could send the server a “don’t care” type and just a binary format. Instead we have to specify a type and format together.Doing this causes issues in edge cases (e.g. strings coming back padded because we specified varchar but the columnwas text). Actually an easier solution specific to our implementation would be to simply tell the driver to always return results inbinary when it’s available. It would cover most cases including removing the parse/bind step for many queries. > On Jul 7, 2015, at 12:46 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > On 07/07/2015 09:31 PM, Kevin Wooten wrote: >> == Binary type coercion == >> The biggest impediment I’ve run into is the fact that PG will not >> attempt coercion on binary types. If you specify a binary type in the >> protocol, that’s the final word. A better system would be to have PG >> perform the same coercion it does for text types and then send the >> results back with coerced types, in binary; finally falling back to >> text when binary cannot be produced. This would solve almost every >> issue we have had to work around in the “ng” driver. > > Can you elaborate? I'm confused, because the server can produce binary output for any datatype that has binary output functions.That covers all built-in datatypes and all extension datatypes people use in practice. > > What I've imagined to be a problem, though, is that you have to either understand the binary representation of all datatypesin the client, or you have to request binary for only those datatypes you can handle. And to know which datatypea result set's column has, you have to Describe it before fetching the results, which adds an extra round-trip. Soit would be handy if the driver could provide the server a list of datatypes that should be sent as binary, as a one-timeoperation at the beginning of the session for example, rather than indicating for each column in each query separately. > > - Heikki >
On Jul 7, 2015, at 1:02 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:On 07/07/2015 10:51 PM, Dave Cramer wrote:On 7 July 2015 at 15:46, Heikki Linnakangas <hlinnaka@iki.fi> wrote:On 07/07/2015 09:31 PM, Kevin Wooten wrote:Is there a way to get json and or xml out as binary ?== Binary type coercion ==
The biggest impediment I’ve run into is the fact that PG will not
attempt coercion on binary types. If you specify a binary type in the
protocol, that’s the final word. A better system would be to have PG
perform the same coercion it does for text types and then send the
results back with coerced types, in binary; finally falling back to
text when binary cannot be produced. This would solve almost every
issue we have had to work around in the “ng” driver.
Can you elaborate? I'm confused, because the server can produce binary
output for any datatype that has binary output functions. That covers all
built-in datatypes and all extension datatypes people use in practice.
What I've imagined to be a problem, though, is that you have to either
understand the binary representation of all datatypes in the client, or you
have to request binary for only those datatypes you can handle. And to know
which datatype a result set's column has, you have to Describe it before
fetching the results, which adds an extra round-trip. So it would be handy
if the driver could provide the server a list of datatypes that should be
sent as binary, as a one-time operation at the beginning of the session for
example, rather than indicating for each column in each query separately.
Well, as far as the protocol is considered, yes. However, the "binary representation" of json and xml datatypes is not too exciting; here's the binary send function for json for example:
/*
* Binary send.
*/
Datum
json_send(PG_FUNCTION_ARGS)
{
text *t = PG_GETARG_TEXT_PP(0);
StringInfoData buf;
pq_begintypsend(&buf);
pq_sendtext(&buf, VARDATA_ANY(t), VARSIZE_ANY_EXHDR(t));
PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
}
I.e. it just sends the string with a bytea header. There is no benefit to using that instead of the regular text representation. That dummy representation is used just so that e.g. binary-format pg_dump works.
- Heikki
Well, as far as the protocol is considered, yes. However, the "binary representation" of json and xml datatypes is not too exciting; here's the binary send function for json for example:On 07/07/2015 10:51 PM, Dave Cramer wrote:On 7 July 2015 at 15:46, Heikki Linnakangas <hlinnaka@iki.fi> wrote:On 07/07/2015 09:31 PM, Kevin Wooten wrote:Is there a way to get json and or xml out as binary ?== Binary type coercion ==
The biggest impediment I’ve run into is the fact that PG will not
attempt coercion on binary types. If you specify a binary type in the
protocol, that’s the final word. A better system would be to have PG
perform the same coercion it does for text types and then send the
results back with coerced types, in binary; finally falling back to
text when binary cannot be produced. This would solve almost every
issue we have had to work around in the “ng” driver.
Can you elaborate? I'm confused, because the server can produce binary
output for any datatype that has binary output functions. That covers all
built-in datatypes and all extension datatypes people use in practice.
What I've imagined to be a problem, though, is that you have to either
understand the binary representation of all datatypes in the client, or you
have to request binary for only those datatypes you can handle. And to know
which datatype a result set's column has, you have to Describe it before
fetching the results, which adds an extra round-trip. So it would be handy
if the driver could provide the server a list of datatypes that should be
sent as binary, as a one-time operation at the beginning of the session for
example, rather than indicating for each column in each query separately.
/*
* Binary send.
*/
Datum
json_send(PG_FUNCTION_ARGS)
{
text *t = PG_GETARG_TEXT_PP(0);
StringInfoData buf;
pq_begintypsend(&buf);
pq_sendtext(&buf, VARDATA_ANY(t), VARSIZE_ANY_EXHDR(t));
PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
}
I.e. it just sends the string with a bytea header. There is no benefit to using that instead of the regular text representation. That dummy representation is used just so that e.g. binary-format pg_dump works.
- Heikki
On 07/07/2015 11:04 PM, Kevin Wooten wrote: > As far as “coercion" goes maybe a better term would be “deduction”. > There are numerous cases where it would be great if we could send the > server a “don’t care” type and just a binary format. Instead we > have to specify a type and format together. Doing this causes issues > in edge cases (e.g. strings coming back padded because we specified > varchar but the column was text). Oh, are you talking about query parameters, sent from the client to the server? The type OID and format code are not actually sent in the same message - you give the OID in the Parse message and the format code in Bind. You don't have to specify the OID in the Parse if you don't want to, and the server will deduce the most sensible one from the query context. After the Parse, you can find out what the server deduced by sending a ParameterDescription message, and then form the appropriate binary representation for the deduced type, and send it with Bind. Am I missing something? - Heikki
On 07/07/2015 11:04 PM, Kevin Wooten wrote:As far as “coercion" goes maybe a better term would be “deduction”.
There are numerous cases where it would be great if we could send the
server a “don’t care” type and just a binary format. Instead we
have to specify a type and format together. Doing this causes issues
in edge cases (e.g. strings coming back padded because we specified
varchar but the column was text).
Oh, are you talking about query parameters, sent from the client to the server? The type OID and format code are not actually sent in the same message - you give the OID in the Parse message and the format code in Bind. You don't have to specify the OID in the Parse if you don't want to, and the server will deduce the most sensible one from the query context. After the Parse, you can find out what the server deduced by sending a ParameterDescription message, and then form the appropriate binary representation for the deduced type, and send it with Bind. Am I missing something?
On 07/07/2015 11:08 PM, Kevin Wooten wrote: > >> On Jul 7, 2015, at 1:02 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> >> On 07/07/2015 10:51 PM, Dave Cramer wrote: >>> Is there a way to get json and or xml out as binary ? >> >> Well, as far as the protocol is considered, yes. However, the >> "binary representation" of json and xml datatypes is not too >> exciting; here's the binary send function for json for example: >> >> /* >> * Binary send. >> */ >> Datum >> json_send(PG_FUNCTION_ARGS) >> { >> text *t = PG_GETARG_TEXT_PP(0); >> StringInfoData buf; >> >> pq_begintypsend(&buf); >> pq_sendtext(&buf, VARDATA_ANY(t), VARSIZE_ANY_EXHDR(t)); >> PG_RETURN_BYTEA_P(pq_endtypsend(&buf)); >> } >> >> I.e. it just sends the string with a bytea header. There is no >> benefit to using that instead of the regular text representation. >> That dummy representation is used just so that e.g. binary-format >> pg_dump works. > > This confused me when we went to implement this as a binary type. > Isn’t it stored in a binary format? Or is that just an intermediate > format that’s cached around inside the server? Json is stored as text. It's really nothing more than a text field with a validation step in the input function to check that it's valid json. It preserves formatting and all. Jsonb is stored in a binary format, but funnily enough its "binary representation" in the protocol is mostly text: > /* > * jsonb type send function > * > * Just send jsonb as a version number, then a string of text > */ > Datum > jsonb_send(PG_FUNCTION_ARGS) > { > Jsonb *jb = PG_GETARG_JSONB(0); > StringInfoData buf; > StringInfo jtext = makeStringInfo(); > int version = 1; > > (void) JsonbToCString(jtext, &jb->root, VARSIZE(jb)); > > pq_begintypsend(&buf); > pq_sendint(&buf, version, 1); > pq_sendtext(&buf, jtext->data, jtext->len); > pfree(jtext->data); > pfree(jtext); > > PG_RETURN_BYTEA_P(pq_endtypsend(&buf)); > } There's a version number there, so perhaps we'll replace that with something more binary in the future ;-) - Heikki
> On Jul 7, 2015, at 1:30 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > On 07/07/2015 11:08 PM, Kevin Wooten wrote: >> >>> On Jul 7, 2015, at 1:02 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote: >>> >>> On 07/07/2015 10:51 PM, Dave Cramer wrote: >>>> Is there a way to get json and or xml out as binary ? >>> >>> Well, as far as the protocol is considered, yes. However, the >>> "binary representation" of json and xml datatypes is not too >>> exciting; here's the binary send function for json for example: >>> >>> /* >>> * Binary send. >>> */ >>> Datum >>> json_send(PG_FUNCTION_ARGS) >>> { >>> text *t = PG_GETARG_TEXT_PP(0); >>> StringInfoData buf; >>> >>> pq_begintypsend(&buf); >>> pq_sendtext(&buf, VARDATA_ANY(t), VARSIZE_ANY_EXHDR(t)); >>> PG_RETURN_BYTEA_P(pq_endtypsend(&buf)); >>> } >>> >>> I.e. it just sends the string with a bytea header. There is no >>> benefit to using that instead of the regular text representation. >>> That dummy representation is used just so that e.g. binary-format >>> pg_dump works. >> >> This confused me when we went to implement this as a binary type. >> Isn’t it stored in a binary format? Or is that just an intermediate >> format that’s cached around inside the server? > > Json is stored as text. It's really nothing more than a text field with a validation step in the input function to checkthat it's valid json. It preserves formatting and all. > > Jsonb is stored in a binary format, but funnily enough its "binary representation" in the protocol is mostly text: > >> /* >> * jsonb type send function >> * >> * Just send jsonb as a version number, then a string of text >> */ >> Datum >> jsonb_send(PG_FUNCTION_ARGS) >> { >> Jsonb *jb = PG_GETARG_JSONB(0); >> StringInfoData buf; >> StringInfo jtext = makeStringInfo(); >> int version = 1; >> >> (void) JsonbToCString(jtext, &jb->root, VARSIZE(jb)); >> >> pq_begintypsend(&buf); >> pq_sendint(&buf, version, 1); >> pq_sendtext(&buf, jtext->data, jtext->len); >> pfree(jtext->data); >> pfree(jtext); >> >> PG_RETURN_BYTEA_P(pq_endtypsend(&buf)); >> } > > There's a version number there, so perhaps we'll replace that with something more binary in the future ;-) > > - Heikki > Apologies I wasn’t paying enough attention to see the first example wasn’t JSONB. Seems crazy to add another format indirectionwith the “version” field. Seems the field itself would have solved whatever problems would have been encounteredwith just spitting out the binary type in the first place.
On 07/07/2015 11:26 PM, Dave Cramer wrote: > On 7 July 2015 at 16:24, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > >> On 07/07/2015 11:04 PM, Kevin Wooten wrote: >> >>> As far as “coercion" goes maybe a better term would be “deduction”. >>> There are numerous cases where it would be great if we could send the >>> server a “don’t care” type and just a binary format. Instead we >>> have to specify a type and format together. Doing this causes issues >>> in edge cases (e.g. strings coming back padded because we specified >>> varchar but the column was text). >> >> Oh, are you talking about query parameters, sent from the client to the >> server? The type OID and format code are not actually sent in the same >> message - you give the OID in the Parse message and the format code in >> Bind. You don't have to specify the OID in the Parse if you don't want to, >> and the server will deduce the most sensible one from the query context. >> After the Parse, you can find out what the server deduced by sending a >> ParameterDescription message, and then form the appropriate binary >> representation for the deduced type, and send it with Bind. Am I missing >> something? >> > I think Kevin was hoping the server would just "figure it out" without all > the back and forth. Well, if you use the binary representation, you need to somehow tell the server which datatype's binary representation you're using. Perhaps you could tell the server "I'm sending this in the binary representation of int4, but I still want you to deduce the actual type as if it was unknown", but that raises the question: if you know it's an integer, why do you want the server to do the deduction? - Heikki
On Jul 7, 2015, at 1:26 PM, Dave Cramer <davecramer@gmail.com> wrote:On 7 July 2015 at 16:24, Heikki Linnakangas <hlinnaka@iki.fi> wrote:On 07/07/2015 11:04 PM, Kevin Wooten wrote:As far as “coercion" goes maybe a better term would be “deduction”.
There are numerous cases where it would be great if we could send the
server a “don’t care” type and just a binary format. Instead we
have to specify a type and format together. Doing this causes issues
in edge cases (e.g. strings coming back padded because we specified
varchar but the column was text).
Oh, are you talking about query parameters, sent from the client to the server? The type OID and format code are not actually sent in the same message - you give the OID in the Parse message and the format code in Bind. You don't have to specify the OID in the Parse if you don't want to, and the server will deduce the most sensible one from the query context. After the Parse, you can find out what the server deduced by sending a ParameterDescription message, and then form the appropriate binary representation for the deduced type, and send it with Bind. Am I missing something?I think Kevin was hoping the server would just "figure it out" without all the back and forth.Dave Cramer
No, unfortunately not, as there is a big difference between compression and binary transmission. See the following example:
<Some-Element xyz="345678,901">
<Some-Other-Element abc="54321,876">
<Some-Value my-attribute="123456,789"/>
</Some-Other-Element>
</Some-Element>
With compression, you can certainly get rid of the whitespace, and if the compression algorithms is better you even will have refs instead of element names, but that's it, mostly. What about the numbers? Still transfered untouched, as unique hence uncompressable. So lots of integers and decimals screws compression.
Binary transmission on the other hand will only need four bytes per integer. That makes twelve bytes for all the above attributes.
To sum up, compressed transmission should be an option that is performed on a lower level (like gzip transmission with http), but is *additive* to binary transmission. The latter should be used even in case no compressed transmission is in place at all.
-Markus
From: davecramer@gmail.com [mailto:davecramer@gmail.com] On Behalf Of Dave Cramer
Sent: Dienstag, 7. Juli 2015 19:54
To: Markus KARG
Cc: List
Subject: Re: [JDBC] PostgreSQL gaps wrt to java, and jdbc
Looking at the backend code. One possibility is to just use compression to send it over?
Dave Cramer
dave.cramer(at)credativ(dot)ca
http://www.credativ.ca
On 6 July 2015 at 12:35, Markus KARG <markus@headcrashing.eu> wrote:
Binary represenation of XML and JSON instead of converting it to a String BEFORE transmission. Both JSON and XML essentially are graphs, and can be inflated with whitespace on the driver side AFTER transmission. Same in the other direction (deflate by transmitting binary object graph instead of whitespace-inflated String representation).
From: pgsql-jdbc-owner@postgresql.org [mailto:pgsql-jdbc-owner@postgresql.org] On Behalf Of Dave Cramer
Sent: Montag, 6. Juli 2015 14:15
To: List
Subject: [JDBC] PostgreSQL gaps wrt to java, and jdbc
I have been actively maintaining the driver off and on since 1999 or so. Recently we have had a flurry of activity and one of the things I noticed was the surprise that PostgreSQL didn't support X or did Y in some unexpected way. Sometimes we are a bit too complacent, and accept things the way they are.
I am wondering what could the server do better that would help JDBC?
Obviously streaming a column is one. Possibly rowid's. Anything else ?
Dave Cramer
No, unfortunately not, as there is a big difference between compression and binary transmission. See the following example:
<Some-Element xyz="345678,901">
<Some-Other-Element abc="54321,876">
<Some-Value my-attribute="123456,789"/>
</Some-Other-Element>
</Some-Element>
With compression, you can certainly get rid of the whitespace, and if the compression algorithms is better you even will have refs instead of element names, but that's it, mostly. What about the numbers? Still transfered untouched, as unique hence uncompressable. So lots of integers and decimals screws compression.
Binary transmission on the other hand will only need four bytes per integer. That makes twelve bytes for all the above attributes.
From: davecramer@gmail.com [mailto:davecramer@gmail.com] On Behalf Of Dave Cramer
Sent: Dienstag, 7. Juli 2015 19:54
To: Markus KARG
Cc: List
Subject: Re: [JDBC] PostgreSQL gaps wrt to java, and jdbc
Looking at the backend code. One possibility is to just use compression to send it over?
Dave Cramer
dave.cramer(at)credativ(dot)ca
http://www.credativ.ca
On 6 July 2015 at 12:35, Markus KARG <markus@headcrashing.eu> wrote:
Binary represenation of XML and JSON instead of converting it to a String BEFORE transmission. Both JSON and XML essentially are graphs, and can be inflated with whitespace on the driver side AFTER transmission. Same in the other direction (deflate by transmitting binary object graph instead of whitespace-inflated String representation).
From: pgsql-jdbc-owner@postgresql.org [mailto:pgsql-jdbc-owner@postgresql.org] On Behalf Of Dave Cramer
Sent: Montag, 6. Juli 2015 14:15
To: List
Subject: [JDBC] PostgreSQL gaps wrt to java, and jdbc
I have been actively maintaining the driver off and on since 1999 or so. Recently we have had a flurry of activity and one of the things I noticed was the surprise that PostgreSQL didn't support X or did Y in some unexpected way. Sometimes we are a bit too complacent, and accept things the way they are.
I am wondering what could the server do better that would help JDBC?
Obviously streaming a column is one. Possibly rowid's. Anything else ?
Dave Cramer
On Jul 7, 2015, at 1:49 PM, Dave Cramer <pg@fastcrypt.com> wrote:On 7 July 2015 at 16:46, Markus KARG <markus@headcrashing.eu> wrote:No, unfortunately not, as there is a big difference between compression and binary transmission. See the following example:
<Some-Element xyz="345678,901">
<Some-Other-Element abc="54321,876">
<Some-Value my-attribute="123456,789"/>
</Some-Other-Element>
</Some-Element>
With compression, you can certainly get rid of the whitespace, and if the compression algorithms is better you even will have refs instead of element names, but that's it, mostly. What about the numbers? Still transfered untouched, as unique hence uncompressable. So lots of integers and decimals screws compression.
Binary transmission on the other hand will only need four bytes per integer. That makes twelve bytes for all the above attributes.
You would require oids (or some other way to decode) for numbers, boolean, strings, etc for each binary representation, no ?
From: davecramer@gmail.com [mailto:davecramer@gmail.com] On Behalf Of Dave Cramer
Sent: Dienstag, 7. Juli 2015 19:54
To: Markus KARG
Cc: List
Subject: Re: [JDBC] PostgreSQL gaps wrt to java, and jdbc
Looking at the backend code. One possibility is to just use compression to send it over?
Dave Cramer
dave.cramer(at)credativ(dot)ca
http://www.credativ.ca
On 6 July 2015 at 12:35, Markus KARG <markus@headcrashing.eu> wrote:
Binary represenation of XML and JSON instead of converting it to a String BEFORE transmission. Both JSON and XML essentially are graphs, and can be inflated with whitespace on the driver side AFTER transmission. Same in the other direction (deflate by transmitting binary object graph instead of whitespace-inflated String representation).
From: pgsql-jdbc-owner@postgresql.org [mailto:pgsql-jdbc-owner@postgresql.org] On Behalf Of Dave Cramer
Sent: Montag, 6. Juli 2015 14:15
To: List
Subject: [JDBC] PostgreSQL gaps wrt to java, and jdbc
I have been actively maintaining the driver off and on since 1999 or so. Recently we have had a flurry of activity and one of the things I noticed was the surprise that PostgreSQL didn't support X or did Y in some unexpected way. Sometimes we are a bit too complacent, and accept things the way they are.
I am wondering what could the server do better that would help JDBC?
Obviously streaming a column is one. Possibly rowid's. Anything else ?
Dave Cramer
On Jul 7, 2015, at 1:49 PM, Dave Cramer <pg@fastcrypt.com> wrote:On 7 July 2015 at 16:46, Markus KARG <markus@headcrashing.eu> wrote:No, unfortunately not, as there is a big difference between compression and binary transmission. See the following example:
<Some-Element xyz="345678,901">
<Some-Other-Element abc="54321,876">
<Some-Value my-attribute="123456,789"/>
</Some-Other-Element>
</Some-Element>
With compression, you can certainly get rid of the whitespace, and if the compression algorithms is better you even will have refs instead of element names, but that's it, mostly. What about the numbers? Still transfered untouched, as unique hence uncompressable. So lots of integers and decimals screws compression.
Binary transmission on the other hand will only need four bytes per integer. That makes twelve bytes for all the above attributes.
You would require oids (or some other way to decode) for numbers, boolean, strings, etc for each binary representation, no ?My assumption has been that the true binary format is something dictated by Javascript engine (I assume it would be V8 since Pl/V8 exists). So it seems we could just glean it from there and need no other transmitted information besides that “standard”.
On Jul 7, 2015, at 1:59 PM, Dave Cramer <pg@fastcrypt.com> wrote: > My assumption has been that the true binary format is something dictated by Javascript engine (I assume it would be V8since Pl/V8 exists). So it seems we could just glean it from there and need no other transmitted information besidesthat “standard”. > > Not having followed JS too closely are there other binary formats besides V8 ? http://wiki.fasterxml.com/SmileFormatSpec