Thread: Streaming bytea implementation offered

Streaming bytea implementation offered

From

Werner Donné

Date:

31 August 2012, 16:49:49

Hello,

I have modified version 9.1-902 of the driver in order for it to stream bytea values when calling
ResultSet.getBinaryStream().The implementation is here: 

http://www.pincette.biz/home/werner/postgresql-jdbc-9.1-902-patched.src.zip

When a bytea value is larger than 1MB it is streamed to a temporary file instead of being copied in the "answer" array.
Whenthe stream is later accessed, either with "getBinaryInputStream" or "getBytes", the bytes are fetched from the
temporaryfile, which is subsequently deleted. The "getBytes" method copies the bytes in the result array, while
"getBinaryInputStream"returns a FileInputStream with an overridden "close" method that deletes the temporary file. 

Best regards,

Werner.
--
http://www.pincette.biz/
Handling your documents with care, wherever you are.

Re: Streaming bytea implementation offered

From

Craig Ringer

Date:

01 September 2012, 01:06:16

On 09/01/2012 12:49 AM, Werner Donné wrote:
> Hello,
>
> I have modified version 9.1-902 of the driver in order for it to stream bytea values when calling
ResultSet.getBinaryStream().The implementation is here: 
>
> http://www.pincette.biz/home/werner/postgresql-jdbc-9.1-902-patched.src.zip
>
> When a bytea value is larger than 1MB it is streamed to a temporary file instead of being copied in the "answer"
array.When the stream is later accessed, either with "getBinaryInputStream" or "getBytes", the bytes are fetched from
thetemporary file, which is subsequently deleted. The "getBytes" method copies the bytes in the result array, while
"getBinaryInputStream"returns a FileInputStream with an overridden "close" method that deletes the temporary file 

Sounds potentially good, but what happens if it's run inside the
SecurityManager, or in an environment with no ability to create
temporary files?

What about out-of-disk causing failures that wouldn't occur with OOM?

Do you propose to make this always-on, or configurable?

How (if at all) does this interact with JDBC-standard BLOB support, and
with large objects?

If you'd like this change considered, it'd be great if you could create
a patch against the current PgJDBC sources, rather than just publishing
patched sources. I'd recommend that you clone PgJDBC
(https://github.com/pgjdbc/pgjdbc):

     git://github.com/pgjdbc/pgjdbc.git

then copy your changes into that working tree and use `git diff`.
Alternately, you can , `git add` them, `git commit`, then use `git
format-patch` to get a diff; see

     http://www.kernel.org/pub/software/scm/git/docs/git-format-patch.html

Make sure not to include any generated files, and please add some unit
tests demonstrating the functionality if possible.

--
Craig Ringer

Re: Streaming bytea implementation offered

From

Werner Donné

Date:

01 September 2012, 15:13:21

Op 01 Sep 2012 om 03:04 heeft Craig Ringer <ringerc@ringerc.id.au> het volgende geschreven:

> On 09/01/2012 12:49 AM, Werner Donné wrote:
>> Hello,
>>
>> I have modified version 9.1-902 of the driver in order for it to stream bytea values when calling
ResultSet.getBinaryStream().The implementation is here: 
>>
>> http://www.pincette.biz/home/werner/postgresql-jdbc-9.1-902-patched.src.zip
>>
>> When a bytea value is larger than 1MB it is streamed to a temporary file instead of being copied in the "answer"
array.When the stream is later accessed, either with "getBinaryInputStream" or "getBytes", the bytes are fetched from
thetemporary file, which is subsequently deleted. The "getBytes" method copies the bytes in the result array, while
"getBinaryInputStream"returns a FileInputStream with an overridden "close" method that deletes the temporary file 
>
> Sounds potentially good, but what happens if it's run inside the SecurityManager, or in an environment with no
abilityto create temporary files? 
>
> What about out-of-disk causing failures that wouldn't occur with OOM?

We will have to fall back on copying the byte array in case a temporary file couldn't be created, for whatever reason.

>
> Do you propose to make this always-on, or configurable?

The treashold could be configurable.

>
> How (if at all) does this interact with JDBC-standard BLOB support, and with large objects?

It has no effect on either.

>
>
> If you'd like this change considered, it'd be great if you could create a patch against the current PgJDBC sources,
ratherthan just publishing patched sources. I'd recommend that you clone PgJDBC (https://github.com/pgjdbc/pgjdbc): 
>
>    git://github.com/pgjdbc/pgjdbc.git
>
> then copy your changes into that working tree and use `git diff`. Alternately, you can , `git add` them, `git
commit`,then use `git format-patch` to get a diff; see 
>
>    http://www.kernel.org/pub/software/scm/git/docs/git-format-patch.html
>
> Make sure not to include any generated files, and please add some unit tests demonstrating the functionality if
possible.

I have never used git, nor have I written unit tests, but I'll have a look.

>
> --
> Craig Ringer
>

Werner.
--
http://www.pincette.biz/
Handling your documents with care, wherever you are.

Re: Streaming bytea implementation offered

From

Craig Ringer

Date:

01 September 2012, 23:09:21

On 09/01/2012 11:13 PM, Werner Donné wrote:

We will have to fall back on copying the byte array in case a temporary file couldn't be created, for whatever reason.

That's reasonable, especially if we log the fact at debug/trace level too, to encourage people to turn the feature off if it doesn't apply in their environment.

Do you propose to make this always-on, or configurable?

The treashold could be configurable.

I think it'll need to be, if nothing else to stop people from having tempfile issues.

How (if at all) does this interact with JDBC-standard BLOB support, and with large objects?

It has no effect on either.

I suspected not, but it's good to confirm. Thanks.

I have never used git, nor have I written unit tests, but I'll have a look.

If you've used any SCM like CVS, Subversion, or Perforce the principles are similar for basic use. I highly recommend it. This'll be handy:

https://help.github.com/

GitHub have spent a lot of time and effort writing documentation and tutorials for git.

Sorry to ask you to do that; I'd probably just grab the source tarball and pick through it, but I'm on a travelling holiday for the next couple of weeks and just trying to check in on the mailing list sometimes.

--
Craig Ringer

Re: Streaming bytea implementation offered

From

Gavin Flower

Date:

04 September 2012, 15:38:08

On 01/09/12 13:04, Craig Ringer wrote:

On 09/01/2012 12:49 AM, Werner Donné wrote:
Hello,

I have modified version 9.1-902 of the driver in order for it to stream bytea values when calling ResultSet.getBinaryStream(). The implementation is here:

http://www.pincette.biz/home/werner/postgresql-jdbc-9.1-902-patched.src.zip

When a bytea value is larger than 1MB it is streamed to a temporary file instead of being copied in the "answer" array. When the stream is later accessed, either with "getBinaryInputStream" or "getBytes", the bytes are fetched from the temporary file, which is subsequently deleted. The "getBytes" method copies the bytes in the result array, while "getBinaryInputStream" returns a FileInputStream with an overridden "close" method that deletes the temporary file

Sounds potentially good, but what happens if it's run inside the SecurityManager, or in an environment with no ability to create temporary files?

What about out-of-disk causing failures that wouldn't occur with OOM?

Do you propose to make this always-on, or configurable?

How (if at all) does this interact with JDBC-standard BLOB support, and with large objects?

If you'd like this change considered, it'd be great if you could create a patch against the current PgJDBC sources, rather than just publishing patched sources. I'd recommend that you clone PgJDBC (https://github.com/pgjdbc/pgjdbc):

git://github.com/pgjdbc/pgjdbc.git

then copy your changes into that working tree and use `git diff`. Alternately, you can , `git add` them, `git commit`, then use `git format-patch` to get a diff; see

http://www.kernel.org/pub/software/scm/git/docs/git-format-patch.html

Make sure not to include any generated files, and please add some unit tests demonstrating the functionality if possible.

--
Craig Ringer

Hi,

BACKGROUND: I am working on a project at the University of Auckland, that will involve serving images over the web from JBoss backed by PostgreSQL (probably 9.2, though 9.1 is sufficient) running on a Linux box. We are talking transactions per minute rather than per second, but the SQL queries are likely to be quite complicated and involve many images – though detailed database design is yet to be done. We will probably have a reasonable amount of RAM

So in the future, I may have need of the functionality discussed.

At this stage, I think I would prefer to avoid work files as much as possible, so having the threshold for deciding when to use a temporary file being configurable would be appreciated. Currently the test image files are under 1 MB, but I expect larger images will need to be dealt with.

Cheers,
Gavin