Thread: Proposal to provide the facility to set binary format output for specific OIDs per session

Greetings,


The JDBC driver has a similar problem and defers switching to binary format until a statement has been reused 5 times, at which point we create a named prepared statement and incur the overhead of an extra round trip for the DESCRIBE statement. Because the extra round trip generally negates any performance gain from receiving the data in binary format, we avoid binary and receive everything in text format until we are sure the extra trip is worth it.

Connection pools further complicate the issue: we can't use named statements with connection pools since the connection is not bound to the client. As such, in the JDBC driver we recommend turning off the ability to create named statements, and thus binary formats.

As a proof of concept, I provide the attached patch, which implements the ability to specify which OIDs will be returned in binary format per session.

For instance: set format_binary='20,21,25'

After this, the specified OIDs will be output in binary format even if there is no describe statement, and even when using simpleQuery.
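
To illustrate the intended behaviour, here is a minimal sketch in standalone C of how a per-session OID list could decide the format code of each result column. It is not the patch's code: the function name format_code_for_type and the zero-terminated list layout are assumptions made for illustration. The values 0 and 1 are the wire protocol's text and binary format codes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Oid;              /* stand-in for PostgreSQL's Oid type */
#define InvalidOid ((Oid) 0)

/*
 * Hypothetical helper: return 1 (binary) if type_oid appears in the
 * zero-terminated session list, otherwise 0 (text).
 */
int16_t
format_code_for_type(const Oid *binary_oids, Oid type_oid)
{
    assert(binary_oids != NULL);

    for (size_t i = 0; binary_oids[i] != InvalidOid; i++)
    {
        if (binary_oids[i] == type_oid)
            return 1;
    }
    return 0;
}
```

With the list from the example above (20 = int8, 21 = int2, 25 = text), an int8 column would be sent in binary while an int4 column (OID 23) would stay in text.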

Both the JDBC driver and the Go driver can exploit this with no changes. I haven't confirmed whether other drivers would work without changes.

Furthermore, jackc/postgresql_simple_protocol_binary_format_bench (github.com) suggests that there is a considerable performance benefit. To quote: 'At 100 rows the text format takes 48% longer than the binary format.'

Regards,
Dave Cramer
Attachment
At Fri, 22 Jul 2022 11:00:18 -0400, Dave Cramer <davecramer@gmail.com> wrote in 
> As a proof of concept I provide the attached patch which implements the
> ability to specify which oids will be returned in binary format per
> session.
...
> Both the JDBC driver and the go driver can exploit this change with no
> changes. I haven't confirmed if other drivers would work without changes.

I'm not sure about the needs of that, but binary exchange format is
not the one that can be turned on ignoring the peer's capability.  If
JDBC driver wants some types be sent in binary format, it seems to be
able to be specified in bind message.
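
For reference, the result formats travel at the end of the extended-protocol Bind message. The sketch below, in standalone C, builds a Bind for the unnamed statement and portal with no parameters and a single result-format code of 1, which the protocol applies to every result column. The helper names are made up for illustration. Choosing formats column by column instead requires the client to already know the result column types, which is where a Describe round trip comes in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write a 16-bit integer in network byte order; returns the new offset. */
size_t
put_int16(uint8_t *buf, size_t off, uint16_t v)
{
    buf[off] = (uint8_t) (v >> 8);
    buf[off + 1] = (uint8_t) (v & 0xFF);
    return off + 2;
}

/* Write a 32-bit integer in network byte order; returns the new offset. */
size_t
put_int32(uint8_t *buf, size_t off, uint32_t v)
{
    buf[off] = (uint8_t) (v >> 24);
    buf[off + 1] = (uint8_t) ((v >> 16) & 0xFF);
    buf[off + 2] = (uint8_t) ((v >> 8) & 0xFF);
    buf[off + 3] = (uint8_t) (v & 0xFF);
    return off + 4;
}

/*
 * Hypothetical helper: build a Bind message that requests binary format
 * for all result columns. Returns the total number of bytes written.
 */
size_t
build_bind_all_binary(uint8_t *buf, size_t cap)
{
    size_t off = 0;

    assert(cap >= 15);
    buf[off++] = 'B';                 /* message type: Bind */
    off = put_int32(buf, off, 0);     /* length, patched below */
    buf[off++] = '\0';                /* destination portal: unnamed */
    buf[off++] = '\0';                /* source statement: unnamed */
    off = put_int16(buf, off, 0);     /* number of parameter format codes */
    off = put_int16(buf, off, 0);     /* number of parameter values */
    off = put_int16(buf, off, 1);     /* number of result format codes */
    off = put_int16(buf, off, 1);     /* 1 = binary, applied to all columns */
    /* The length field counts itself but not the type byte. */
    put_int32(buf, 1, (uint32_t) (off - 1));
    return off;
}
```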

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Dave Cramer


On Sun, 24 Jul 2022 at 23:02, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
> At Fri, 22 Jul 2022 11:00:18 -0400, Dave Cramer <davecramer@gmail.com> wrote in
> > As a proof of concept I provide the attached patch which implements the
> > ability to specify which oids will be returned in binary format per
> > session.
> ...
> > Both the JDBC driver and the go driver can exploit this change with no
> > changes. I haven't confirmed if other drivers would work without changes.
>
> I'm not sure about the needs of that, but binary exchange format is
> not the one that can be turned on ignoring the peer's capability.

I'm not sure what this means. The client is specifying which types it wants in binary format.

> If
> JDBC driver wants some types be sent in binary format, it seems to be
> able to be specified in bind message.

To be clear it's not just the JDBC client; the original idea came from the author of go driver.
And yes you can specify it in the bind message but you have to specify it in *every* bind message which pretty much negates any advantage you might get out of binary format due to the extra round trip.

Regards,
Dave

> regards.
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center


On Mon, Jul 25, 2022 at 4:57 AM Dave Cramer <davecramer@gmail.com> wrote:
> And yes you can specify it in the bind message but you have to specify it in *every* bind message which pretty much negates any advantage you might get out of binary format due to the extra round trip.

The advantage is being able to use the binary format with only a single network round trip in cases where prepared statements are not possible, e.g. when using PgBouncer. Using the simple protocol with this patch lets users of pgx (the Go driver mentioned above) and PgBouncer use the binary format. The performance gains can be significant, especially with types such as timestamptz that are very slow to parse.

As far as only sending binary types that the client can understand, the client driver would call `set format_binary` at the beginning of the session.
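
To make the timestamptz point concrete, here is a minimal sketch in standalone C of what a client does with a binary timestamptz field. It relies on the binary representation being an 8-byte big-endian count of microseconds since 2000-01-01 00:00:00 UTC. The function names are made up for illustration, and the special values infinity and -infinity are deliberately not handled.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds from the Unix epoch (1970-01-01) to the PostgreSQL epoch (2000-01-01), UTC. */
#define PG_EPOCH_OFFSET_SECS INT64_C(946684800)

/* Read a big-endian (network byte order) signed 64-bit integer. */
int64_t
read_int64_be(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return (int64_t) v;
}

/*
 * Hypothetical helper: convert a binary timestamptz field to microseconds
 * since the Unix epoch. Not safe for the infinity/-infinity sentinels,
 * which would overflow the addition.
 */
int64_t
timestamptz_to_unix_micros(const uint8_t *field, int32_t len)
{
    assert(len == 8);
    return read_int64_be(field) + PG_EPOCH_OFFSET_SECS * INT64_C(1000000);
}
```

Decoding is a fixed-width read plus one addition, whereas the text format requires parsing a date, a time, fractional seconds, and a time zone offset out of a string.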

Jack Christensen
On 7/25/22 10:07, Jack Christensen wrote:
> The advantage is to be able to use the binary format with only a single 
> network round trip in cases where prepared statements are not possible. 
> e.g. when using PgBouncer. Using the simple protocol with this patch 
> lets users of pgx (the Go driver mentioned above) and PgBouncer use the 
> binary format. The performance gains can be significant especially with 
> types such as timestamptz that are very slow to parse.
> 
> As far as only sending binary types that the client can understand, the 
> client driver would call `set format_binary` at the beginning of the 
> session.

+1 makes a lot of sense to me.

Dave, please add this to the open commitfest (202209)

-- 
Joe Conway
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Idea here makes sense and I've seen this brought up repeatedly on the JDBC lists.

Does the driver need to be aware that this SET command was executed? I'm wondering what happens if an end user executes this with an OID the driver does not actually know how to handle.

> + Oid *tmpOids = palloc(length+1);
> ...
> + tmpOids = repalloc(tmpOids, length+1);

These should be: sizeof(Oid) * (length + 1)

Also, I think you need to specify an explicit context via MemoryContextAlloc or the allocated memory will be in the default context and released at the end of the command.
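
As a sketch of the sizing fix, the standalone C below parses a comma-separated OID list into a zero-terminated array, always allocating sizeof(Oid) * (number of slots). It is not the patch's code: parse_oid_list is a hypothetical name, and malloc/realloc stand in for palloc/repalloc so that the example compiles outside the backend. In the backend, a long-lived allocation could instead go through MemoryContextAlloc with a context that outlives the command; which context is appropriate is an assumption left open here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Oid;              /* stand-in for PostgreSQL's Oid type */
#define InvalidOid ((Oid) 0)

/*
 * Hypothetical parser: turn "20,21,25" into a zero-terminated Oid array.
 * Returns NULL on malformed input or allocation failure; caller frees.
 */
Oid *
parse_oid_list(const char *str, size_t *count)
{
    size_t      length = 0;
    const char *p = str;
    Oid        *oids;

    assert(str != NULL && count != NULL);

    /* One slot for the terminator: bytes = sizeof(Oid) * slots. */
    oids = malloc(sizeof(Oid) * (length + 1));
    if (oids == NULL)
        return NULL;

    while (*p != '\0')
    {
        char          *end;
        unsigned long  v;
        Oid           *tmp;

        /* Each entry must start with a digit (no sign, no whitespace). */
        if (!isdigit((unsigned char) *p))
        {
            free(oids);
            return NULL;
        }
        v = strtoul(p, &end, 10);
        if (v == 0 || v > UINT32_MAX || (*end != ',' && *end != '\0'))
        {
            free(oids);
            return NULL;
        }

        /* Room for the new entry plus the terminator. */
        tmp = realloc(oids, sizeof(Oid) * (length + 2));
        if (tmp == NULL)
        {
            free(oids);
            return NULL;
        }
        oids = tmp;
        oids[length++] = (Oid) v;

        p = (*end == ',') ? end + 1 : end;
    }

    oids[length] = InvalidOid;
    *count = length;
    return oids;
}
```

Passing length + 1 as the byte count, as in the quoted hunks, would reserve only a quarter of the space needed for 4-byte OIDs.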

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/
Hi Sehrope,

On Mon, 25 Jul 2022 at 17:22, Sehrope Sarkuni <sehrope@jackdb.com> wrote:
> Idea here makes sense and I've seen this brought up repeatedly on the JDBC lists.
>
> Does the driver need to be aware that this SET command was executed? I'm wondering what happens if an end user executes this with an OID the driver does not actually know how to handle.

I suppose there would be a failure to read the attribute correctly.

> > + Oid *tmpOids = palloc(length+1);
> > ...
> > + tmpOids = repalloc(tmpOids, length+1);
>
> These should be: sizeof(Oid) * (length + 1)

Yes they should, thanks!

> Also, I think you need to specify an explicit context via MemoryContextAlloc or the allocated memory will be in the default context and released at the end of the command.

Also good catch

Thanks,

Dave
Hi Sehrope,

On Mon, 25 Jul 2022 at 17:53, Dave Cramer <davecramer@gmail.com> wrote:
> > These should be: sizeof(Oid) * (length + 1)
>
> Yes they should, thanks!
>
> > Also, I think you need to specify an explicit context via MemoryContextAlloc or the allocated memory will be in the default context and released at the end of the command.
>
> Also good catch

Attached patch to correct these deficiencies.

Thanks again,

Dave
Attachment
On Tue, Jul 26, 2022 at 08:11:04AM -0400, Dave Cramer wrote:
> Attached patch to correct these deficiencies.

You sent a patch to be applied on top of the first patch, but cfbot doesn't
know that, so it says the patch doesn't apply.
http://cfbot.cputube.org/dave-cramer.html

BTW, a previous discussion about this idea is here:
https://www.postgresql.org/message-id/flat/40cbb35d-774f-23ed-3079-03f938aacdae@2ndquadrant.com

-- 
Justin





On Fri, 5 Aug 2022 at 17:51, Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Tue, Jul 26, 2022 at 08:11:04AM -0400, Dave Cramer wrote:
> > Attached patch to correct these deficiencies.
>
> You sent a patch to be applied on top of the first patch, but cfbot doesn't
> know that, so it says the patch doesn't apply.
> http://cfbot.cputube.org/dave-cramer.html
>
> BTW, a previous discussion about this idea is here:
> https://www.postgresql.org/message-id/flat/40cbb35d-774f-23ed-3079-03f938aacdae@2ndquadrant.com

squashed patch attached

Dave
Attachment


On Fri, Aug 12, 2022 at 5:48 PM Dave Cramer <davecramer@gmail.com> wrote:
> squashed patch attached

The patch does not apply successfully; a rebase is required.

=== applying patch ./0001-add-format_binary.patch
patching file src/backend/tcop/postgres.c
Hunk #1 succeeded at 97 (offset -8 lines).
patching file src/backend/tcop/pquery.c
patching file src/backend/utils/init/globals.c
patching file src/backend/utils/misc/guc.c
Hunk #1 succeeded at 144 (offset 1 line).
Hunk #2 succeeded at 244 with fuzz 2 (offset 1 line).
Hunk #3 succeeded at 4298 (offset -1 lines).
Hunk #4 FAILED at 12906.
1 out of 4 hunks FAILED -- saving rejects to file src/backend/utils/misc/guc.c.rej
patching file src/include/miscadmin.h
 


--
Ibrar Ahmed



On Tue, 6 Sept 2022 at 02:30, Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
> The patch does not apply successfully; a rebase is required.

Thanks,

New rebased patch attached  

Dave
Attachment
Waiting on the author to do what? I'm waiting for a review.
On Tue, 6 Sep 2022 at 21:32, Dave Cramer <davecramer@gmail.com> wrote:
> Thanks,
>
> New rebased patch attached

Hi

cfbot reports the patch no longer applies [1].  As CommitFest 2022-11 is
currently underway, this would be an excellent time to update the patch again.

[1] http://cfbot.cputube.org/patch_40_3777.log

Thanks

Ian Barwick



Hi Ian,

Thanks, will do
Dave Cramer


On Thu, 3 Nov 2022 at 21:36, Ian Lawrence Barwick <barwick@gmail.com> wrote:
> cfbot reports the patch no longer applies [1].  As CommitFest 2022-11 is
> currently underway, this would be an excellent time to update the patch again.
>
> [1] http://cfbot.cputube.org/patch_40_3777.log