Thread: BYTEA

BYTEA

From
Eugene Yin
Date:
<div style="color:#000; background-color:#fff; font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida
Grande,sans-serif;font-size:13px"><div id="yui_3_16_0_1_1453003373099_23692">Pg 9.4+</div><div
id="yui_3_16_0_1_1453003373099_23692"><br/></div><div id="yui_3_16_0_1_1453003373099_23692">Storing binary data using
<aclass="" href="http://www.postgresql.org/docs/8.4/static/datatype-binary.html" id="yui_3_16_0_1_1453003373099_23702"
rel="nofollow">bytea</a>or <a class="" href="http://www.postgresql.org/docs/8.4/static/datatype-character.html"
id="yui_3_16_0_1_1453003373099_23704"rel="nofollow">text</a> data types </div><ul class=""
id="yui_3_16_0_1_1453003373099_23706"><liclass="" id="yui_3_16_0_1_1453003373099_23708"> Pluses <ul class=""
id="yui_3_16_0_1_1453003373099_23710"><liclass="" id="yui_3_16_0_1_1453003373099_23712"> Storing and Accessing entry
utilizesthe same interface when accessing any other data type or record. <li class=""
id="yui_3_16_0_1_1453003373099_23714">No need to track OID of a "large object" you create </ul><li class=""
id="yui_3_16_0_1_1453003373099_23716">Minus <ul class="" dir="ltr" id="yui_3_16_0_1_1453003373099_23718"><li class=""
id="yui_3_16_0_1_1453003373099_23720">bytea and text data type both use <a class=""
href="http://www.postgresql.org/docs/8.4/static/storage-toast.html"id="yui_3_16_0_1_1453003373099_23722"
rel="nofollow">TOAST</a>(details <a class="" href="https://wiki.postgresql.org/wiki/TOAST"
id="yui_3_16_0_1_1453003373099_23724"title="TOAST" title-off="">here</a>) <ul class=""
id="yui_3_16_0_1_1453003373099_23726"><liclass="" id="yui_3_16_0_1_1453003373099_23728"> limited to 1G per entry <li
class=""id="yui_3_16_0_1_1453003373099_23730"> 4 Billion (> 2KB) entries per table <a class=""
href="https://wiki.postgresql.org/wiki/TOAST"id="yui_3_16_0_1_1453003373099_23732" title="TOAST" title-off="">max</a>.
</ul><liclass="" id="yui_3_16_0_1_1453003373099_23734"><b id="yui_3_16_0_1_1453003373099_23878"> Need to escape/encode
binarydata before sending to DB then do the reverse after retrieving the data </b><li class=""
id="yui_3_16_0_1_1453003373099_23736">Memory requirements on the server can be steep even on a small record
set. </ul></ul><divdir="ltr" id="yui_3_16_0_1_1453003373099_23806"><a
href="https://wiki.postgresql.org/wiki/BinaryFilesInDB"
id="yui_3_16_0_1_1453003373099_23804">https://wiki.postgresql.org/wiki/BinaryFilesInDB</a><br/></div><div
id="yui_3_16_0_1_1453003373099_23806"><br/></div><div id="yui_3_16_0_1_1453003373099_23806"><br /></div><div
id="yui_3_16_0_1_1453003373099_23806"><br/></div><div id="yui_3_16_0_1_1453003373099_23806">Do I  really <b
id="yui_3_16_0_1_1453003373099_23885"><fontid="yui_3_16_0_1_1453003373099_23963" size="3">Need to escape/encode binary
databefore sending to DB then do the reverse after retrieving the data?</font></b></div><div
id="yui_3_16_0_1_1453003373099_23806"><b><fontsize="3"><br /></font></b></div><div dir="ltr"
id="yui_3_16_0_1_1453003373099_23806"><spanid="yui_3_16_0_1_1453003373099_24014"><font
id="yui_3_16_0_1_1453003373099_24013"size="3">If so, what (<b id="yui_3_16_0_1_1453003373099_24092">Java</b>) codes
shouldI use to achieve this goal (I am using the Java to interface with the DB)?</font></span></div><div dir="ltr"
id="yui_3_16_0_1_1453003373099_23806"><span><fontsize="3"><br /></font></span></div><div dir="ltr"
id="yui_3_16_0_1_1453003373099_23806"><span><fontsize="3"><br /></font></span></div><div dir="ltr"
id="yui_3_16_0_1_1453003373099_23806"><span><fontsize="3"><br /></font></span></div><div dir="ltr"
id="yui_3_16_0_1_1453003373099_23806"><span><fontsize="3">Thanks</font></span></div><div dir="ltr"
id="yui_3_16_0_1_1453003373099_23806"><span><fontsize="3"><br /></font></span></div><div dir="ltr"
id="yui_3_16_0_1_1453003373099_23806"><span><fontsize="3">Eugene</font></span></div></div> 

Re: BYTEA

From
Adrian Klaver
Date:
On 01/17/2016 11:33 AM, Eugene Yin wrote:
> Pg 9.4+
>
> Storing binary data using bytea
> <http://www.postgresql.org/docs/8.4/static/datatype-binary.html> or text
> <http://www.postgresql.org/docs/8.4/static/datatype-character.html> data
> types
>
>   * Pluses
>       o Storing and Accessing entry utilizes the same interface when
>         accessing any other data type or record.
>       o No need to track OID of a "large object" you create
>   * Minus
>       o bytea and text data type both use TOAST
>         <http://www.postgresql.org/docs/8.4/static/storage-toast.html>
>         (details here <https://wiki.postgresql.org/wiki/TOAST>)
>           + limited to 1G per entry
>           + 4 Billion (> 2KB) entries per table max
>             <https://wiki.postgresql.org/wiki/TOAST>.
>       o *Need to escape/encode binary data before sending to DB then do
>         the reverse after retrieving the data *
>       o Memory requirements on the server can be steep even on a small
>         record set.
>
> https://wiki.postgresql.org/wiki/BinaryFilesInDB
>
>
>
> Do I  really *Need to escape/encode binary data before sending to DB
> then do the reverse after retrieving the data?*
> *
> *
> If so, what (*Java*) codes should I use to achieve this goal (I am using
> the Java to interface with the DB)?

https://jdbc.postgresql.org/documentation/94/binary-data.html

>
>
>
> Thanks
>
> Eugene


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: BYTEA

From
Andreas Joseph Krogh
Date:
På søndag 17. januar 2016 kl. 21:29:08, skrev Adrian Klaver <adrian.klaver@aklaver.com>:
On 01/17/2016 11:33 AM, Eugene Yin wrote:
> Pg 9.4+
>
> Storing binary data using bytea
> <http://www.postgresql.org/docs/8.4/static/datatype-binary.html> or text
> <http://www.postgresql.org/docs/8.4/static/datatype-character.html> data
> types
>
>   * Pluses
>       o Storing and Accessing entry utilizes the same interface when
>         accessing any other data type or record.
>       o No need to track OID of a "large object" you create
>   * Minus
>       o bytea and text data type both use TOAST
>         <http://www.postgresql.org/docs/8.4/static/storage-toast.html>
>         (details here <https://wiki.postgresql.org/wiki/TOAST>)
>           + limited to 1G per entry
>           + 4 Billion (> 2KB) entries per table max
>             <https://wiki.postgresql.org/wiki/TOAST>.
>       o *Need to escape/encode binary data before sending to DB then do
>         the reverse after retrieving the data *
>       o Memory requirements on the server can be steep even on a small
>         record set.
>
> https://wiki.postgresql.org/wiki/BinaryFilesInDB
>
>
>
> Do I  really *Need to escape/encode binary data before sending to DB
> then do the reverse after retrieving the data?*
> *
> *
> If so, what (*Java*) codes should I use to achieve this goal (I am using
> the Java to interface with the DB)?

https://jdbc.postgresql.org/documentation/94/binary-data.html
 
Save yourself the trouble and don't go this route. Use https://github.com/impossibl/pgjdbc-ng instead.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment

Re: BYTEA

From
Thomas Kellerer
Date:
Andreas Joseph Krogh schrieb am 17.01.2016 um 23:09:
>      > Do I  really *Need to escape/encode binary data before sending to DB
>      > then do the reverse after retrieving the data?*
>      > *
>      > *
>      > If so, what (*Java*) codes should I use to achieve this goal (I am using
>      > the Java to interface with the DB)?
>
>     https://jdbc.postgresql.org/documentation/94/binary-data.html
>
> Save yourself the trouble and don't go this route. Use https://github.com/impossibl/pgjdbc-ng instead.

Can you elaborate?

Using the "official" JDBC driver with bytea column works just fine for me.







Re: BYTEA

From
Cat
Date:
On Sun, Jan 17, 2016 at 07:33:38PM +0000, Eugene Yin wrote:
> Pg 9.4+
> Storing binary data using bytea or text data types   
>    -  Pluses      
>       -  Storing and Accessing entry utilizes the same interface when accessing any other data type or record. 
>       -  No need to track OID of a "large object" you create
> 
>    -  Minus 
>    
>    -  bytea and text data type both use TOAST (details here)      
>       -  limited to 1G per entry 
>       -  4 Billion (> 2KB) entries per table max.
> 
>    -  Need to escape/encode binary data before sending to DB then do the reverse after retrieving the data 
>    -  Memory requirements on the server can be steep even on a small record set. 
> https://wiki.postgresql.org/wiki/BinaryFilesInDB

Would http://multicorn.org/foreign-data-wrappers/#idfilesystem-foreign-data-wrapper be of greater
use? Not sure if it's r/w or r/o.

--  "A search of his car uncovered pornography, a homemade sex aid, women's  stockings and a Jack Russell terrier."   -
http://www.dailytelegraph.com.au/news/wacky/indeed/story-e6frev20-1111118083480



Re: BYTEA

From
Karsten Hilbert
Date:
On Mon, Jan 18, 2016 at 11:25:43AM +1100, Cat wrote:

> On Sun, Jan 17, 2016 at 07:33:38PM +0000, Eugene Yin wrote:
> > Pg 9.4+
> > Storing binary data using bytea or text data types   
> >    -  Pluses      
> >       -  Storing and Accessing entry utilizes the same interface when accessing any other data type or record. 
> >       -  No need to track OID of a "large object" you create
> > 
> >    -  Minus 
> >    
> >    -  bytea and text data type both use TOAST (details here)      
> >       -  limited to 1G per entry 
> >       -  4 Billion (> 2KB) entries per table max.
> > 
> >    -  Need to escape/encode binary data before sending to DB then do the reverse after retrieving the data 
> >    -  Memory requirements on the server can be steep even on a small record set. 
> > https://wiki.postgresql.org/wiki/BinaryFilesInDB
> 
> Would http://multicorn.org/foreign-data-wrappers/#idfilesystem-foreign-data-wrapper be of greater use?

It would certainly be of use to add that option to the aforementioned wiki page.

Karsten
-- 
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346



Re: BYTEA

From
Andreas Joseph Krogh
Date:
På søndag 17. januar 2016 kl. 23:13:09, skrev Thomas Kellerer <spam_eater@gmx.net>:
Andreas Joseph Krogh schrieb am 17.01.2016 um 23:09:
>      > Do I  really *Need to escape/encode binary data before sending to DB
>      > then do the reverse after retrieving the data?*
>      > *
>      > *
>      > If so, what (*Java*) codes should I use to achieve this goal (I am using
>      > the Java to interface with the DB)?
>
>     https://jdbc.postgresql.org/documentation/94/binary-data.html
>
> Save yourself the trouble and don't go this route. Use https://github.com/impossibl/pgjdbc-ng instead.

Can you elaborate?

Using the "official" JDBC driver with bytea column works just fine for me.
 
Depends on what "works" is.
Using BLOBs (that is SQL-BLOB, not *ps.setBinaryStream etc.) with ps.setBlob/rs.getBlob and Connection.createBlob certainly doesn't work using the official driver.
 
https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/jdbc/PgConnection.java#L1284-L1287
 
  public Blob createBlob() throws SQLException {    checkClosed();    throw org.postgresql.Driver.notImplemented(this.getClass(), "createBlob()");  }
 
AFAIU this thread is about working with LARGE OBJECTS, not only binary data.
Also, using BYTEA with LARGE objects (not just binary data) quickly leads to OutOfMemoryError. Which is why I recommend using pgjdbc-ng and real BLOBs (using OID) instead. It is true that get/setBinaryStream "works", in essence that it appears to do the jobb. The problem is that despite using get/setBinaryStream with BYTEA appears to use streams, it doesn't, and the whole byte-array is kept in memory, both in the JAVA-app and in PG. The only way to work with real streams all the way is using OID, not BYTEA.
 
But, of course, if you only work with small-ish binary data, yes - BYTEA does the job.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment

Re: BYTEA

From
Eugene Yin
Date:
"if you only work with small-ish binary data, yes - BYTEA does the job."

--The goal is to save/retrieve the user's photos (JPEG, TIFF, GIF) and PDF files.  The size of each file is less than
5 MB.  For such purpose, is BYTEA ok?  If not, then how about 1 MB each?


"whole byte-array is kept in memory, both in the JAVA-app and in PG"

--I believe in Java the GarbageCollector will clean it up (?).  "Who" will then clean up the Pg side?


"whole byte-array is kept in memory, both in the JAVA-app and in PG"

--Does that mean for the OID, byte-array is NOT kept in memory (of JAVA-app or PG)?  If so, where is it kept? And how they are got cleaned up?



Thanks

Eugene








On Monday, January 18, 2016 5:07 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:


På søndag 17. januar 2016 kl. 23:13:09, skrev Thomas Kellerer <spam_eater@gmx.net>:
Andreas Joseph Krogh schrieb am 17.01.2016 um 23:09:
>      > Do I  really *Need to escape/encode binary data before sending to DB
>      > then do the reverse after retrieving the data?*
>      > *
>      > *
>      > If so, what (*Java*) codes should I use to achieve this goal (I am using
>      > the Java to interface with the DB)?
>
>     https://jdbc.postgresql.org/documentation/94/binary-data.html
>
> Save yourself the trouble and don't go this route. Use https://github.com/impossibl/pgjdbc-ng instead.

Can you elaborate?

Using the "official" JDBC driver with bytea column works just fine for me.
 
Depends on what "works" is.
Using BLOBs (that is SQL-BLOB, not *ps.setBinaryStream etc.) with ps.setBlob/rs.getBlob and Connection.createBlob certainly doesn't work using the official driver.
 
https://github.com/pgjdbc/pgjdbc/blob/master/pgjdbc/src/main/java/org/postgresql/jdbc/PgConnection.java#L1284-L1287
 
  public Blob createBlob() throws SQLException {    checkClosed();    throw org.postgresql.Driver.notImplemented(this.getClass(), "createBlob()");  }
 
AFAIU this thread is about working with LARGE OBJECTS, not only binary data.
Also, using BYTEA with LARGE objects (not just binary data) quickly leads to OutOfMemoryError. Which is why I recommend using pgjdbc-ng and real BLOBs (using OID) instead. It is true that get/setBinaryStream "works", in essence that it appears to do the jobb. The problem is that despite using get/setBinaryStream with BYTEA appears to use streams, it doesn't, and the whole byte-array is kept in memory, both in the JAVA-app and in PG. The only way to work with real streams all the way is using OID, not BYTEA.
 
But, of course, if you only work with small-ish binary data, yes - BYTEA does the job.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 


Attachment

Re: BYTEA

From
Andreas Joseph Krogh
Date:
På mandag 18. januar 2016 kl. 16:40:26, skrev Eugene Yin <eugeneymail@ymail.com>:
"if you only work with small-ish binary data, yes - BYTEA does the job."
 
--The goal is to save/retrieve the user's photos (JPEG, TIFF, GIF) and PDF files.  The size of each file is less than
5 MB.  For such purpose, is BYTEA ok?  If not, then how about 1 MB each?
 
Again, that depends. If you plan on having 1000 simultaneous users then having everything in memory might not be a good idea. Remember the memory consumed by the JVM is quite a lot more the the raw byte-size of each image/blob.
 
"whole byte-array is kept in memory, both in the JAVA-app and in PG"
 
--I believe in Java the GarbageCollector will clean it up (?).  "Who" will then clean up the Pg side?
 
It will clean it up, if it's not referenced anymore. PG will clean up its side at the end of the transaction (at least).
 
 
"whole byte-array is kept in memory, both in the JAVA-app and in PG"
 
--Does that mean for the OID, byte-array is NOT kept in memory (of JAVA-app or PG)?  If so, where is it kept? And how they are got cleaned up?
 
As reading BLOBs (using the OID-type) really means your processing a stream of bytes it's really up to you to decide the size of the byte-buffer you want to use. Only this byte-buffer is kept in memory (and is freed by the GC when not referenced anymore), which often is much less than the whole BLOB.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment

Re: BYTEA

From
Eugene Yin
Date:
To understand it further, let me give an example that I have a group of pictures (assume they a set of house pictures, such as living room, bedroom, kitchen, patio, yard, etc).  I first save them (larger size) to the DB.  Then when the user retrieve them, the user first see a line-up of small pics, then they pick and click one, the selected one will popup a larger picture

I have two ways to do this:

1) BYTEA

2) OID

For BYTEA, the whole set of pictures (larger size) are retrieved (dumped) into the memories on both the Java and Pg sides, occupying the same size as the original picture is.  Hence taking up larger memories.

For OID, it takes smaller memories on both Java and Pg sides (store the byte array in a buffer?).  Whenever the user clicks the smaller pic, it then get the byte array from the buffer and display the larger (original) sized one, hence taking up less memories (avoiding the memory leaking)?


Thanks


Eugene






On Monday, January 18, 2016 7:51 AM, Andreas Joseph Krogh <andreas@visena.com> wrote:


På mandag 18. januar 2016 kl. 16:40:26, skrev Eugene Yin <eugeneymail@ymail.com>:
"if you only work with small-ish binary data, yes - BYTEA does the job."
 
--The goal is to save/retrieve the user's photos (JPEG, TIFF, GIF) and PDF files.  The size of each file is less than
5 MB.  For such purpose, is BYTEA ok?  If not, then how about 1 MB each?
 
Again, that depends. If you plan on having 1000 simultaneous users then having everything in memory might not be a good idea. Remember the memory consumed by the JVM is quite a lot more the the raw byte-size of each image/blob.
 
"whole byte-array is kept in memory, both in the JAVA-app and in PG"
 
--I believe in Java the GarbageCollector will clean it up (?).  "Who" will then clean up the Pg side?
 
It will clean it up, if it's not referenced anymore. PG will clean up its side at the end of the transaction (at least).
 
 
"whole byte-array is kept in memory, both in the JAVA-app and in PG"
 
--Does that mean for the OID, byte-array is NOT kept in memory (of JAVA-app or PG)?  If so, where is it kept? And how they are got cleaned up?
 
As reading BLOBs (using the OID-type) really means your processing a stream of bytes it's really up to you to decide the size of the byte-buffer you want to use. Only this byte-buffer is kept in memory (and is freed by the GC when not referenced anymore), which often is much less than the whole BLOB.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 


Attachment

Re: BYTEA

From
Andreas Joseph Krogh
Date:
På mandag 18. januar 2016 kl. 17:30:08, skrev Eugene Yin <eugeneymail@ymail.com>:
To understand it further, let me give an example that I have a group of pictures (assume they a set of house pictures, such as living room, bedroom, kitchen, patio, yard, etc).  I first save them (larger size) to the DB.  Then when the user retrieve them, the user first see a line-up of small pics, then they pick and click one, the selected one will popup a larger picture
 
I have two ways to do this:
 
1) BYTEA
 
2) OID
 
For BYTEA, the whole set of pictures (larger size) are retrieved (dumped) into the memories on both the Java and Pg sides, occupying the same size as the original picture is.  Hence taking up larger memories.
 
For OID, it takes smaller memories on both Java and Pg sides (store the byte array in a buffer?).  Whenever the user clicks the smaller pic, it then get the byte array from the buffer and display the larger (original) sized one, hence taking up less memories (avoiding the memory leaking)?
 
One is always free to design an app in such a way that no matter how you store the images it may still use all available memory. My point is that using BLOB with OID (together with the pgjdbc-ng driver) lets you get away with using as little memory as you choose. It's the same as working with a large file using FileInputStream. Consuming such streams is done by reading chunks of it into a pre-allocated byte-array, which is the common way in most programming-languages. This byte-array is the only extra penalty in you app. How you feed the contents of the stream back to the browser is up to you, and there are lots of ways to do that inefficiently, even if the underlying producer of data is a stream of bytes. If you are in a JEE servlet-environment then writing to ServletOutputStream while reading form the BLOB's inputstream is the most memory-efficient way to stream data from PG to the browser.
 
Note that reading BLOBs in PG requires a transaction.
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
 
Attachment