Thread: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Stefan Keller
Date:
Hi,

I run into a nasty behavior of current PostgreSQL JDBC.

I maintain images (from Webcams). In the Java and Hibernate (JPA) code
I specified a @Lob annotation on class MyData and a attribte/data type
"byte[] mydata;". Hibernate then generates two tables in PostgreSQL,
one called MyData with a column mydata of type oid and an internal one
called pg_largobjects (which contain foreign keys to the oid). That's
also explained in the JDBC docs [1], saying "PostgreSQL provides two
distinct ways to store binary data. Binary data can be stored in a
table using the data type bytea or by using the Large Object feature
which stores the binary data in a separate table in a special format
and refers to that table by storing a value of type oid in your
table."

Now, when replacing the images (few hundred) with new ones using Java,
pg_largeobjects grew constantly until the file system run out of
space. So old image data did'nt get released! This is to me a bug
because the user/programmer must (and should) assume that there is a
strict 1:1 relationship between generated table MyData and its LOB
column data (stored in pg_largeobjects).
=> I finally found the supplied module 'lo' [2] which releases
detached records. Is this the recommended way to resolve this problem?

Searching for explanations I found a ticket HHH2244 [3] which was
closed by the Hibernate team without action referring to the JDBC
Spec. which says: "An SQL BLOB is a built-in type that stores a Binary
Large Object as a column value in a row of a database table".
=> In other words: The PostgreSQL JDBC team should take action on this
but didn't until now, right?

There is another issue about "PostgreSQL and BLOBs" [4]. First it
cites PostgreSQL JDBC docs [1]. The thread [4] ends somehow too in a
'deadlock' concluding  "...the PostgreSQLDialect (as of 3.5.5) needs
to change not to use MaterializedBlobType until the Postgres (JDBC)
team changes their driver (which does not seem to have happened in the
last 6 years)."
=> Any solutions or comments form insiders on this?

Yours, Stefan

[1] http://jdbc.postgresql.org/documentation/head/binary-data.html
[2] http://www.postgresql.org/docs/current/interactive/lo.html
[3] https://hibernate.onjira.com/browse/HHH-2244
[4] http://relation.to/Bloggers/PostgreSQLAndBLOBs

Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Stefan Keller
Date:
I'd like to backup my statement below regarding in JDBC driver from PostgreSQL:

When storing fields of type BLOB it inserts the binary string in
system table pg_largeobject (via user table). But when rows in user
table get updated or deleted it does not update nor delete
corresponding rows in table pg_largeobject.

That's really a bug!

Fortunately there's a solution indicated in the official docs
(http://www.postgresql.org/docs/9.1/static/lo.html)
To me, something like this should be implemented before hand in JDBC driver.
And in any case there should be a bold note about this in the JDBC
docs (http://jdbc.postgresql.org/documentation/head/binary-data.html )

Yours, Stefan



2012/1/6 Stefan Keller <sfkeller@gmail.com>:
> Hi,
>
> I run into a nasty behavior of current PostgreSQL JDBC.
>
> I maintain images (from Webcams). In the Java and Hibernate (JPA) code
> I specified a @Lob annotation on class MyData and a attribte/data type
> "byte[] mydata;". Hibernate then generates two tables in PostgreSQL,
> one called MyData with a column mydata of type oid and an internal one
> called pg_largobjects (which contain foreign keys to the oid). That's
> also explained in the JDBC docs [1], saying "PostgreSQL provides two
> distinct ways to store binary data. Binary data can be stored in a
> table using the data type bytea or by using the Large Object feature
> which stores the binary data in a separate table in a special format
> and refers to that table by storing a value of type oid in your
> table."
>
> Now, when replacing the images (few hundred) with new ones using Java,
> pg_largeobjects grew constantly until the file system run out of
> space. So old image data did'nt get released! This is to me a bug
> because the user/programmer must (and should) assume that there is a
> strict 1:1 relationship between generated table MyData and its LOB
> column data (stored in pg_largeobjects).
> => I finally found the supplied module 'lo' [2] which releases
> detached records. Is this the recommended way to resolve this problem?
>
> Searching for explanations I found a ticket HHH2244 [3] which was
> closed by the Hibernate team without action referring to the JDBC
> Spec. which says: "An SQL BLOB is a built-in type that stores a Binary
> Large Object as a column value in a row of a database table".
> => In other words: The PostgreSQL JDBC team should take action on this
> but didn't until now, right?
>
> There is another issue about "PostgreSQL and BLOBs" [4]. First it
> cites PostgreSQL JDBC docs [1]. The thread [4] ends somehow too in a
> 'deadlock' concluding  "...the PostgreSQLDialect (as of 3.5.5) needs
> to change not to use MaterializedBlobType until the Postgres (JDBC)
> team changes their driver (which does not seem to have happened in the
> last 6 years)."
> => Any solutions or comments form insiders on this?
>
> Yours, Stefan
>
> [1] http://jdbc.postgresql.org/documentation/head/binary-data.html
> [2] http://www.postgresql.org/docs/current/interactive/lo.html
> [3] https://hibernate.onjira.com/browse/HHH-2244
> [4] http://relation.to/Bloggers/PostgreSQLAndBLOBs

Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Thomas Kellerer
Date:
Stefan Keller wrote on 06.01.2012 19:04:
> I maintain images (from Webcams). In the Java and Hibernate (JPA) code
> I specified a @Lob annotation on class MyData and a attribte/data type
> "byte[] mydata;". Hibernate then generates two tables in PostgreSQL,
> one called MyData with a column mydata of type oid and an internal one
> called pg_largobjects (which contain foreign keys to the oid). That's
> also explained in the JDBC docs [1], saying "PostgreSQL provides two
> distinct ways to store binary data. Binary data can be stored in a
> table using the data type bytea or by using the Large Object feature
> which stores the binary data in a separate table in a special format
> and refers to that table by storing a value of type oid in your
> table."

I think you are better off using bytea unless you need to access only parts of the blob regularly.



Re: Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Stefan Keller
Date:
2012/1/8 Thomas Kellerer <spam_eater@gmx.net> wrote:
> I think you are better off using bytea unless you need to access only parts
> of the blob regularly.

That's a valid tip. But it's to the current JDBC implementation to
take action because it currently leads to disk space leakage when
using JDBC and JPA/Hibernate.

After a tedious time of bug searching I found a possible solution and
a testimonial here: "One of the problems with the JDBC driver (and
this affects the ODBC driver also), is that the specification assumes
that references to BLOBs (Binary Large OBjects) are stored within a
table, and if that entry is changed, the associated BLOB is deleted
from the database. As PostgreSQL stands, this doesn't occur." (taken
from the docs http://www.postgresql.org/docs/9.1/static/lo.html )

In addition, unfortunately the (lo_unlink) problem of LargeObjects and
BLOBs is not mentioned in the JDBC docs:
http://jdbc.postgresql.org/documentation/head/binary-data.html

Stefan

2012/1/8 Thomas Kellerer <spam_eater@gmx.net>:
> Stefan Keller wrote on 06.01.2012 19:04:
>
>> I maintain images (from Webcams). In the Java and Hibernate (JPA) code
>> I specified a @Lob annotation on class MyData and a attribte/data type
>> "byte[] mydata;". Hibernate then generates two tables in PostgreSQL,
>> one called MyData with a column mydata of type oid and an internal one
>> called pg_largobjects (which contain foreign keys to the oid). That's
>> also explained in the JDBC docs [1], saying "PostgreSQL provides two
>> distinct ways to store binary data. Binary data can be stored in a
>> table using the data type bytea or by using the Large Object feature
>> which stores the binary data in a separate table in a special format
>> and refers to that table by storing a value of type oid in your
>> table."
>
>
> I think you are better off using bytea unless you need to access only parts
> of the blob regularly.
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Thomas Kellerer
Date:
Stefan Keller wrote on 08.01.2012 19:13:
>> I think you are better off using bytea unless you need to access only parts
>> of the blob regularly.
>
> That's a valid tip. But it's to the current JDBC implementation to
> take action because it currently leads to disk space leakage when
> using JDBC and JPA/Hibernate.

But only if you use large objects.

 From my perspective bytea is the (only) data type that matches the JDBC BLOB type.
And none of the problems you have occur when using bytea.

There is no match for PG's large objects in the JDBC API so I don't see your claim that it's a fault of the driver.

What's the reason for you to stick with LargeObjects?





Re: Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Stefan Keller
Date:
2012/1/8 Thomas Kellerer <spam_eater@gmx.net>:> What's the reason for
you to stick with LargeObjects?

I simply used the @Lob annotation in Hibernate/JPA.
That's all to get a leaking pg_largeobject table.
See http://relation.to/Bloggers/PostgreSQLAndBLOBs,
https://hibernate.onjira.com/browse/HHH-2244 and
https://hibernate.onjira.com/browse/HHH-4876 for some background of
the dilemma.

Stefan

2012/1/8 Thomas Kellerer <spam_eater@gmx.net>:
> Stefan Keller wrote on 08.01.2012 19:13:
>
>>> I think you are better off using bytea unless you need to access only
>>> parts the blob regularly.
>>
>>
>> That's a valid tip. But it's to the current JDBC implementation to
>> take action because it currently leads to disk space leakage when
>> using JDBC and JPA/Hibernate.
>
>
> But only if you use large objects.
>
> From my perspective bytea is the (only) data type that matches the JDBC BLOB
> type.
> And none of the problems you have occur when using bytea.
>
> There is no match for PG's large objects in the JDBC API so I don't see your
> claim that it's a fault of the driver.
>
> What's the reason for you to stick with LargeObjects?
>
>
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Thomas Kellerer
Date:
Stefan Keller wrote on 08.01.2012 20:35:
> 2012/1/8 Thomas Kellerer<spam_eater@gmx.net>:
> What's the reason for you to stick with LargeObjects?
>
> I simply used the @Lob annotation in Hibernate/JPA.
> That's all to get a leaking pg_largeobject table.
> See http://relation.to/Bloggers/PostgreSQLAndBLOBs,
> https://hibernate.onjira.com/browse/HHH-2244 and
> https://hibernate.onjira.com/browse/HHH-4876 for some background of
> the dilemma.

So it's clearly a Hibernate bug.

Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Stefan Keller
Date:
Thanks, Radosław, for the clarification.

2012/1/8 Radosław Smogura <rsmogura@softperience.eu> wrote:
> 3. pg_largeobjects is system table, hibernate do not creates it.
>
> 4. Trigger based approach is good for this, but You need to do this mannualy

That's exactly my point:
Since JDBC manages creation of OID and pg_largeobjects it's also JDBC
which is responsible for update/delete of rows in pg_largeobjects.
As the user expects, the only thing JDBC would have to do is to issue
lo_unlink() when rows are updated or deleted.
Thus, it's currently a bug in the JDBC driver. And whatever the
solution is, it needs to be mentioned in the JDBC docs.

> 5. If you want to use bytea use
> @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I think you
> should remove @Lob too) on your field.

In fact, this annotation syntax usage looks like the Hibernate mapping
for PostgreSQL could be enhanced.

Yours, Stefan

2012/1/8 Radosław Smogura <rsmogura@softperience.eu>:
> On Sun, 8 Jan 2012 18:08:09 +0100, Stefan Keller wrote:
>>
>> I'd like to backup my statement below regarding in JDBC driver from
>> PostgreSQL:
>>
>> When storing fields of type BLOB it inserts the binary string in
>> system table pg_largeobject (via user table). But when rows in user
>> table get updated or deleted it does not update nor delete
>> corresponding rows in table pg_largeobject.
>>
>> That's really a bug!
>>
>> Fortunately there's a solution indicated in the official docs
>> (http://www.postgresql.org/docs/9.1/static/lo.html)
>> To me, something like this should be implemented before hand in JDBC
>> driver.
>> And in any case there should be a bold note about this in the JDBC
>> docs (http://jdbc.postgresql.org/documentation/head/binary-data.html )
>>
>> Yours, Stefan
>>
>>
>>
>> 2012/1/6 Stefan Keller <sfkeller@gmail.com>:
>>>
>>> Hi,
>>>
>>> I run into a nasty behavior of current PostgreSQL JDBC.
>>>
>>> I maintain images (from Webcams). In the Java and Hibernate (JPA) code
>>> I specified a @Lob annotation on class MyData and a attribte/data type
>>> "byte[] mydata;". Hibernate then generates two tables in PostgreSQL,
>>> one called MyData with a column mydata of type oid and an internal one
>>> called pg_largobjects (which contain foreign keys to the oid). That's
>>> also explained in the JDBC docs [1], saying "PostgreSQL provides two
>>> distinct ways to store binary data. Binary data can be stored in a
>>> table using the data type bytea or by using the Large Object feature
>>> which stores the binary data in a separate table in a special format
>>> and refers to that table by storing a value of type oid in your
>>> table."
>>>
>>> Now, when replacing the images (few hundred) with new ones using Java,
>>> pg_largeobjects grew constantly until the file system run out of
>>> space. So old image data did'nt get released! This is to me a bug
>>> because the user/programmer must (and should) assume that there is a
>>> strict 1:1 relationship between generated table MyData and its LOB
>>> column data (stored in pg_largeobjects).
>>> => I finally found the supplied module 'lo' [2] which releases
>>> detached records. Is this the recommended way to resolve this problem?
>>>
>>> Searching for explanations I found a ticket HHH2244 [3] which was
>>> closed by the Hibernate team without action referring to the JDBC
>>> Spec. which says: "An SQL BLOB is a built-in type that stores a Binary
>>> Large Object as a column value in a row of a database table".
>>> => In other words: The PostgreSQL JDBC team should take action on this
>>> but didn't until now, right?
>>>
>>> There is another issue about "PostgreSQL and BLOBs" [4]. First it
>>> cites PostgreSQL JDBC docs [1]. The thread [4] ends somehow too in a
>>> 'deadlock' concluding  "...the PostgreSQLDialect (as of 3.5.5) needs
>>> to change not to use MaterializedBlobType until the Postgres (JDBC)
>>> team changes their driver (which does not seem to have happened in the
>>> last 6 years)."
>>> => Any solutions or comments form insiders on this?
>>>
>>> Yours, Stefan
>>>
>>> [1] http://jdbc.postgresql.org/documentation/head/binary-data.html
>>> [2] http://www.postgresql.org/docs/current/interactive/lo.html
>>> [3] https://hibernate.onjira.com/browse/HHH-2244
>>> [4] http://relation.to/Bloggers/PostgreSQLAndBLOBs
>
> This is common approach for PostgreSQL and some proxy of real life Blobs.
> One time I submitted bug about this to Hibernate. But step by step.
>
> 1. BLOBs are... Large Objects, they are stored as reference because those
> objects are large, if you will store this objects as bytea then select *
> will return all large data. It may not be comfortable not only to download
> few GB of data, but to keep this on stack too. From your perspective it
> doesn't matters because you put it in byte[]. But if You will keep e.g.
> CD-ROM images then it's much more better to use streaming approach then
> bytea[]. More over due to some security JDBC driver will at least double
> memory consumed by bytea.
>
> 2. Specifying hibernate data type as bytea do not resolve problems because
> it will still use LOB approach.
>
> 3. pg_largeobjects is system table, hibernate do not creates it.
>
> 4. Trigger based approach is good for this, but You need to do this mannualy
>
> 5. If you want to use bytea use
> @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I think you
> should remove @Lob too) on your field.
>
> Regards,
> Radosław Smogura

Re: Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Stefan Keller
Date:
2012/1/8 Thomas Kellerer <spam_eater@gmx.net>:>  So it's clearly a
Hibernate bug.

Obviously not :->
Hibernate mapping just uses one of two valid variants to map large
objects in JDBC.
So, AFAIK it's a PostgreSQL JDBC bug and an omission in the JDBC docs as well.

Stefan

2012/1/8 Thomas Kellerer <spam_eater@gmx.net>:
> Stefan Keller wrote on 08.01.2012 20:35:
>
>> 2012/1/8 Thomas Kellerer<spam_eater@gmx.net>:
>> What's the reason for you to stick with LargeObjects?
>>
>> I simply used the @Lob annotation in Hibernate/JPA.
>> That's all to get a leaking pg_largeobject table.
>> See http://relation.to/Bloggers/PostgreSQLAndBLOBs,
>> https://hibernate.onjira.com/browse/HHH-2244 and
>> https://hibernate.onjira.com/browse/HHH-4876 for some background of
>> the dilemma.
>
>  So it's clearly a Hibernate bug.
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Radosław Smogura
Date:
On Sun, 8 Jan 2012 18:08:09 +0100, Stefan Keller wrote:
> I'd like to backup my statement below regarding in JDBC driver from
> PostgreSQL:
>
> When storing fields of type BLOB it inserts the binary string in
> system table pg_largeobject (via user table). But when rows in user
> table get updated or deleted it does not update nor delete
> corresponding rows in table pg_largeobject.
>
> That's really a bug!
>
> Fortunately there's a solution indicated in the official docs
> (http://www.postgresql.org/docs/9.1/static/lo.html)
> To me, something like this should be implemented before hand in JDBC
> driver.
> And in any case there should be a bold note about this in the JDBC
> docs (http://jdbc.postgresql.org/documentation/head/binary-data.html
> )
>
> Yours, Stefan
>
>
>
> 2012/1/6 Stefan Keller <sfkeller@gmail.com>:
>> Hi,
>>
>> I run into a nasty behavior of current PostgreSQL JDBC.
>>
>> I maintain images (from Webcams). In the Java and Hibernate (JPA)
>> code
>> I specified a @Lob annotation on class MyData and a attribte/data
>> type
>> "byte[] mydata;". Hibernate then generates two tables in PostgreSQL,
>> one called MyData with a column mydata of type oid and an internal
>> one
>> called pg_largobjects (which contain foreign keys to the oid).
>> That's
>> also explained in the JDBC docs [1], saying "PostgreSQL provides two
>> distinct ways to store binary data. Binary data can be stored in a
>> table using the data type bytea or by using the Large Object feature
>> which stores the binary data in a separate table in a special format
>> and refers to that table by storing a value of type oid in your
>> table."
>>
>> Now, when replacing the images (few hundred) with new ones using
>> Java,
>> pg_largeobjects grew constantly until the file system run out of
>> space. So old image data did'nt get released! This is to me a bug
>> because the user/programmer must (and should) assume that there is a
>> strict 1:1 relationship between generated table MyData and its LOB
>> column data (stored in pg_largeobjects).
>> => I finally found the supplied module 'lo' [2] which releases
>> detached records. Is this the recommended way to resolve this
>> problem?
>>
>> Searching for explanations I found a ticket HHH2244 [3] which was
>> closed by the Hibernate team without action referring to the JDBC
>> Spec. which says: "An SQL BLOB is a built-in type that stores a
>> Binary
>> Large Object as a column value in a row of a database table".
>> => In other words: The PostgreSQL JDBC team should take action on
>> this
>> but didn't until now, right?
>>
>> There is another issue about "PostgreSQL and BLOBs" [4]. First it
>> cites PostgreSQL JDBC docs [1]. The thread [4] ends somehow too in a
>> 'deadlock' concluding  "...the PostgreSQLDialect (as of 3.5.5) needs
>> to change not to use MaterializedBlobType until the Postgres (JDBC)
>> team changes their driver (which does not seem to have happened in
>> the
>> last 6 years)."
>> => Any solutions or comments form insiders on this?
>>
>> Yours, Stefan
>>
>> [1] http://jdbc.postgresql.org/documentation/head/binary-data.html
>> [2] http://www.postgresql.org/docs/current/interactive/lo.html
>> [3] https://hibernate.onjira.com/browse/HHH-2244
>> [4] http://relation.to/Bloggers/PostgreSQLAndBLOBs
This is common approach for PostgreSQL and some proxy of real life
Blobs. One time I submitted bug about this to Hibernate. But step by
step.

1. BLOBs are... Large Objects, they are stored as reference because
those objects are large, if you will store this objects as bytea then
select * will return all large data. It may not be comfortable not only
to download few GB of data, but to keep this on stack too. From your
perspective it doesn't matters because you put it in byte[]. But if You
will keep e.g. CD-ROM images then it's much more better to use streaming
approach then bytea[]. More over due to some security JDBC driver will
at least double memory consumed by bytea.

2. Specifying hibernate data type as bytea do not resolve problems
because it will still use LOB approach.

3. pg_largeobjects is system table, hibernate do not creates it.

4. Trigger based approach is good for this, but You need to do this
mannualy

5. If you want to use bytea use
@Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I think you
should remove @Lob too) on your field.

Regards,
Radosław Smogura

Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Radosław Smogura
Date:
On Sun, 8 Jan 2012 21:57:37 +0100, Stefan Keller wrote:
> Thanks, Radosław, for the clarification.
>
> 2012/1/8 Radosław Smogura <rsmogura@softperience.eu> wrote:
>> 3. pg_largeobjects is system table, hibernate do not creates it.
>>
>> 4. Trigger based approach is good for this, but You need to do this
>> mannualy
>
> That's exactly my point:
> Since JDBC manages creation of OID and pg_largeobjects it's also JDBC
> which is responsible for update/delete of rows in pg_largeobjects.
> As the user expects, the only thing JDBC would have to do is to issue
> lo_unlink() when rows are updated or deleted.
> Thus, it's currently a bug in the JDBC driver. And whatever the
> solution is, it needs to be mentioned in the JDBC docs.
Not quite, PostgreSQL doesn't have LOB, nor OID type that is only
reference to LOB. In fact, BLOB behaviour in JDBC
is just thin wrapper for this what is missing in PostgreSQL - BLOBs. It
was build form available parts. In addition OID type may be used and
it's used as the system id or may be used as just some kind of row id -
all types, tables, sequences etc has OID. You may create table with
"WITH OIDS" clause. You may use OID data type just as replacement for
(unsigned) int, so JDBC can create LOB but it can't decide if given
field in row is reference to LOB or e.g. table and what with statements
"DELETE WHERE date > "?

In fact JDBC driver is so "stupid" that if you will call getBytes or
getBlob on any column with Oid it will ask for LOB.

Hibernate knows this and it does what is best - calls standard BLOB
interface, and creates table with Oid column.

And here again round trip, in case of deletion only Hibernate may
delete given LOB because only Hibernate and You knows that any value in
Oid column will reflect LOB - JDBC driver doesn't "knows " this, but...
oids may be shared because those are only numbers, not all tables may be
covered by Hibernate, and assuming huge imagination, someone may encode
OID by adding 1, may store it as long, etc. I know it's quite popular
that DB schema comes from entities, but not always You have such
flexibility and sometimes You create Entities for particular schema.

So, as You see only this trigger approach is currently (universally)
the best way. If this is that module I think about it's just trigger
which calls unlink on replace or deletion - in fact You may write own
without any problems. Those are few lines only.

>> 5. If you want to use bytea use
>> @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I think
>> you
>> should remove @Lob too) on your field.
>
> In fact, this annotation syntax usage looks like the Hibernate
> mapping
> for PostgreSQL could be enhanced.
You have right, but Hibernate team will be in need to add auto deletion
for bulk updates, too. PostgreSQL isn't still so popular it's worth of
it (and they have no so much developers bug report may stand for months
without any comment). Look how many peoples ask for Lobs. It means no
one needs true LOB - true LOB large object stored outside table/row data
space, to allow out of statements operation like streaming, partial
updates etc. This is my definition of LOB, because this is idea of LOB.
I think only DB2 stores LOBs in row, PostgreSQL do not make this but
adverts this bytea ugly approach.

Personally, I create app which captures images from WebCam (like You) -
just frames not movies. From above reason I wanted to move to bytea, but
due to changes (wired instability and leak of backward compatibility) I
still have Oid. Because I have only two tables for Lobs I have garbage
collection simple script.

Best regards
Radek

> Yours, Stefan
>
> 2012/1/8 Radosław Smogura <rsmogura@softperience.eu>:
>> On Sun, 8 Jan 2012 18:08:09 +0100, Stefan Keller wrote:
>>>
>>> I'd like to backup my statement below regarding in JDBC driver from
>>> PostgreSQL:
>>>
>>> When storing fields of type BLOB it inserts the binary string in
>>> system table pg_largeobject (via user table). But when rows in user
>>> table get updated or deleted it does not update nor delete
>>> corresponding rows in table pg_largeobject.
>>>
>>> That's really a bug!
>>>
>>> Fortunately there's a solution indicated in the official docs
>>> (http://www.postgresql.org/docs/9.1/static/lo.html)
>>> To me, something like this should be implemented before hand in
>>> JDBC
>>> driver.
>>> And in any case there should be a bold note about this in the JDBC
>>> docs
>>> (http://jdbc.postgresql.org/documentation/head/binary-data.html )
>>>
>>> Yours, Stefan
>>>
>>>
>>>
>>> 2012/1/6 Stefan Keller <sfkeller@gmail.com>:
>>>>
>>>> Hi,
>>>>
>>>> I run into a nasty behavior of current PostgreSQL JDBC.
>>>>
>>>> I maintain images (from Webcams). In the Java and Hibernate (JPA)
>>>> code
>>>> I specified a @Lob annotation on class MyData and a attribte/data
>>>> type
>>>> "byte[] mydata;". Hibernate then generates two tables in
>>>> PostgreSQL,
>>>> one called MyData with a column mydata of type oid and an internal
>>>> one
>>>> called pg_largobjects (which contain foreign keys to the oid).
>>>> That's
>>>> also explained in the JDBC docs [1], saying "PostgreSQL provides
>>>> two
>>>> distinct ways to store binary data. Binary data can be stored in a
>>>> table using the data type bytea or by using the Large Object
>>>> feature
>>>> which stores the binary data in a separate table in a special
>>>> format
>>>> and refers to that table by storing a value of type oid in your
>>>> table."
>>>>
>>>> Now, when replacing the images (few hundred) with new ones using
>>>> Java,
>>>> pg_largeobjects grew constantly until the file system run out of
>>>> space. So old image data did'nt get released! This is to me a bug
>>>> because the user/programmer must (and should) assume that there is
>>>> a
>>>> strict 1:1 relationship between generated table MyData and its LOB
>>>> column data (stored in pg_largeobjects).
>>>> => I finally found the supplied module 'lo' [2] which releases
>>>> detached records. Is this the recommended way to resolve this
>>>> problem?
>>>>
>>>> Searching for explanations I found a ticket HHH2244 [3] which was
>>>> closed by the Hibernate team without action referring to the JDBC
>>>> Spec. which says: "An SQL BLOB is a built-in type that stores a
>>>> Binary
>>>> Large Object as a column value in a row of a database table".
>>>> => In other words: The PostgreSQL JDBC team should take action on
>>>> this
>>>> but didn't until now, right?
>>>>
>>>> There is another issue about "PostgreSQL and BLOBs" [4]. First it
>>>> cites PostgreSQL JDBC docs [1]. The thread [4] ends somehow too in
>>>> a
>>>> 'deadlock' concluding  "...the PostgreSQLDialect (as of 3.5.5)
>>>> needs
>>>> to change not to use MaterializedBlobType until the Postgres
>>>> (JDBC)
>>>> team changes their driver (which does not seem to have happened in
>>>> the
>>>> last 6 years)."
>>>> => Any solutions or comments form insiders on this?
>>>>
>>>> Yours, Stefan
>>>>
>>>> [1] http://jdbc.postgresql.org/documentation/head/binary-data.html
>>>> [2] http://www.postgresql.org/docs/current/interactive/lo.html
>>>> [3] https://hibernate.onjira.com/browse/HHH-2244
>>>> [4] http://relation.to/Bloggers/PostgreSQLAndBLOBs
>>
>> This is common approach for PostgreSQL and some proxy of real life
>> Blobs.
>> One time I submitted bug about this to Hibernate. But step by step.
>>
>> 1. BLOBs are... Large Objects, they are stored as reference because
>> those
>> objects are large, if you will store this objects as bytea then
>> select *
>> will return all large data. It may not be comfortable not only to
>> download
>> few GB of data, but to keep this on stack too. From your perspective
>> it
>> doesn't matters because you put it in byte[]. But if You will keep
>> e.g.
>> CD-ROM images then it's much more better to use streaming approach
>> then
>> bytea[]. More over due to some security JDBC driver will at least
>> double
>> memory consumed by bytea.
>>
>> 2. Specifying hibernate data type as bytea do not resolve problems
>> because
>> it will still use LOB approach.
>>
>> 3. pg_largeobjects is system table, hibernate do not creates it.
>>
>> 4. Trigger based approach is good for this, but You need to do this
>> mannualy
>>
>> 5. If you want to use bytea use
>> @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I think
>> you
>> should remove @Lob too) on your field.
>>
>> Regards,
>> Radosław Smogura


Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Stefan Keller
Date:
2012/1/8 Radosław Smogura <rsmogura@softperience.eu> wrote:
> Not quite, PostgreSQL doesn't have LOB, nor OID type that is only reference> to LOB. In fact, BLOB behaviour in JDBC>
isjust thin wrapper for this what is missing in PostgreSQL - BLOBs. 

I can't follow: PostgreSQL has bytea and LO. I'm ok with bytea too but
I'd like to get the job done with LO too.
JDBC offers both:
http://jdbc.postgresql.org/documentation/head/binary-data.html
I tried to summarize the characteristics here:
http://www.gis.hsr.ch/wiki/PostgreSQL_-_Binary_Large_Objects

There's an important principle that the code which "allocates"
resources is also responsible to release it.
AFAIK in this case it's JDBC which choses to use LO (which creates
pg_largeobjects entries) and it's therefore also JDBC which has to
clean up.

Yours, Stefan

P.S. Just in order get some more insight I also tried to forward this
to the JDBC list (which currently seems to have problems accepting new
subscriptions).

2012/1/8 Radosław Smogura <rsmogura@softperience.eu>:
> On Sun, 8 Jan 2012 21:57:37 +0100, Stefan Keller wrote:
>>
>> Thanks, Radosław, for the clarification.
>>
>> 2012/1/8 Radosław Smogura <rsmogura@softperience.eu> wrote:
>>>
>>> 3. pg_largeobjects is system table, hibernate do not creates it.
>>>
>>> 4. Trigger based approach is good for this, but You need to do this
>>> mannualy
>>
>>
>> That's exactly my point:
>> Since JDBC manages creation of OID and pg_largeobjects it's also JDBC
>> which is responsible for update/delete of rows in pg_largeobjects.
>> As the user expects, the only thing JDBC would have to do is to issue
>> lo_unlink() when rows are updated or deleted.
>> Thus, it's currently a bug in the JDBC driver. And whatever the
>> solution is, it needs to be mentioned in the JDBC docs.
>
> Not quite, PostgreSQL doesn't have LOB, nor OID type that is only reference
> to LOB. In fact, BLOB behaviour in JDBC
> is just thin wrapper for this what is missing in PostgreSQL - BLOBs. It was
> build form available parts. In addition OID type may be used and it's used
> as the system id or may be used as just some kind of row id - all types,
> tables, sequences etc has OID. You may create table with "WITH OIDS" clause.
> You may use OID data type just as replacement for (unsigned) int, so JDBC
> can create LOB but it can't decide if given field in row is reference to LOB
> or e.g. table and what with statements "DELETE WHERE date > "?
>
> In fact JDBC driver is so "stupid" that if you will call getBytes or getBlob
> on any column with Oid it will ask for LOB.
>
> Hibernate knows this and it does what is best - calls standard BLOB
> interface, and creates table with Oid column.
>
> And here again round trip, in case of deletion only Hibernate may delete
> given LOB because only Hibernate and You knows that any value in Oid column
> will reflect LOB - JDBC driver doesn't "knows " this, but... oids may be
> shared because those are only numbers, not all tables may be covered by
> Hibernate, and assuming huge imagination, someone may encode OID by adding
> 1, may store it as long, etc. I know it's quite popular that DB schema comes
> from entities, but not always You have such flexibility and sometimes You
> create Entities for particular schema.
>
> So, as You see only this trigger approach is currently (universally) the
> best way. If this is that module I think about it's just trigger which calls
> unlink on replace or deletion - in fact You may write own without any
> problems. Those are few lines only.
>
>
>>> 5. If you want to use bytea use
>>> @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I think you
>>> should remove @Lob too) on your field.
>>
>>
>> In fact, this annotation syntax usage looks like the Hibernate mapping
>> for PostgreSQL could be enhanced.
>
> You have right, but Hibernate team will be in need to add auto deletion for
> bulk updates, too. PostgreSQL isn't still so popular it's worth of it (and
> they have no so much developers bug report may stand for months without any
> comment). Look how many peoples ask for Lobs. It means no one needs true LOB
> - true LOB large object stored outside table/row data space, to allow out of
> statements operation like streaming, partial updates etc. This is my
> definition of LOB, because this is idea of LOB. I think only DB2 stores LOBs
> in row, PostgreSQL do not make this but adverts this bytea ugly approach.
>
> Personally, I create app which captures images from WebCam (like You) - just
> frames not movies. From above reason I wanted to move to bytea, but due to
> changes (wired instability and leak of backward compatibility) I still have
> Oid. Because I have only two tables for Lobs I have garbage collection
> simple script.
>
> Best regards
> Radek
>
>
>> Yours, Stefan
>>
>> 2012/1/8 Radosław Smogura <rsmogura@softperience.eu>:
>>>
>>> On Sun, 8 Jan 2012 18:08:09 +0100, Stefan Keller wrote:
>>>>
>>>>
>>>> I'd like to backup my statement below regarding in JDBC driver from
>>>> PostgreSQL:
>>>>
>>>> When storing fields of type BLOB it inserts the binary string in
>>>> system table pg_largeobject (via user table). But when rows in user
>>>> table get updated or deleted it does not update nor delete
>>>> corresponding rows in table pg_largeobject.
>>>>
>>>> That's really a bug!
>>>>
>>>> Fortunately there's a solution indicated in the official docs
>>>> (http://www.postgresql.org/docs/9.1/static/lo.html)
>>>> To me, something like this should be implemented before hand in JDBC
>>>> driver.
>>>> And in any case there should be a bold note about this in the JDBC
>>>> docs (http://jdbc.postgresql.org/documentation/head/binary-data.html )
>>>>
>>>> Yours, Stefan
>>>>
>>>>
>>>>
>>>> 2012/1/6 Stefan Keller <sfkeller@gmail.com>:
>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I run into a nasty behavior of current PostgreSQL JDBC.
>>>>>
>>>>> I maintain images (from Webcams). In the Java and Hibernate (JPA) code
>>>>> I specified a @Lob annotation on class MyData and a attribte/data type
>>>>> "byte[] mydata;". Hibernate then generates two tables in PostgreSQL,
>>>>> one called MyData with a column mydata of type oid and an internal one
>>>>> called pg_largobjects (which contain foreign keys to the oid). That's
>>>>> also explained in the JDBC docs [1], saying "PostgreSQL provides two
>>>>> distinct ways to store binary data. Binary data can be stored in a
>>>>> table using the data type bytea or by using the Large Object feature
>>>>> which stores the binary data in a separate table in a special format
>>>>> and refers to that table by storing a value of type oid in your
>>>>> table."
>>>>>
>>>>> Now, when replacing the images (few hundred) with new ones using Java,
>>>>> pg_largeobjects grew constantly until the file system run out of
>>>>> space. So old image data did'nt get released! This is to me a bug
>>>>> because the user/programmer must (and should) assume that there is a
>>>>> strict 1:1 relationship between generated table MyData and its LOB
>>>>> column data (stored in pg_largeobjects).
>>>>> => I finally found the supplied module 'lo' [2] which releases
>>>>> detached records. Is this the recommended way to resolve this problem?
>>>>>
>>>>> Searching for explanations I found a ticket HHH2244 [3] which was
>>>>> closed by the Hibernate team without action referring to the JDBC
>>>>> Spec. which says: "An SQL BLOB is a built-in type that stores a Binary
>>>>> Large Object as a column value in a row of a database table".
>>>>> => In other words: The PostgreSQL JDBC team should take action on this
>>>>> but didn't until now, right?
>>>>>
>>>>> There is another issue about "PostgreSQL and BLOBs" [4]. First it
>>>>> cites PostgreSQL JDBC docs [1]. The thread [4] ends somehow too in a
>>>>> 'deadlock' concluding  "...the PostgreSQLDialect (as of 3.5.5) needs
>>>>> to change not to use MaterializedBlobType until the Postgres (JDBC)
>>>>> team changes their driver (which does not seem to have happened in the
>>>>> last 6 years)."
>>>>> => Any solutions or comments form insiders on this?
>>>>>
>>>>> Yours, Stefan
>>>>>
>>>>> [1] http://jdbc.postgresql.org/documentation/head/binary-data.html
>>>>> [2] http://www.postgresql.org/docs/current/interactive/lo.html
>>>>> [3] https://hibernate.onjira.com/browse/HHH-2244
>>>>> [4] http://relation.to/Bloggers/PostgreSQLAndBLOBs
>>>
>>>
>>> This is common approach for PostgreSQL and some proxy of real life Blobs.
>>> One time I submitted bug about this to Hibernate. But step by step.
>>>
>>> 1. BLOBs are... Large Objects, they are stored as reference because those
>>> objects are large, if you will store this objects as bytea then select *
>>> will return all large data. It may not be comfortable not only to
>>> download
>>> few GB of data, but to keep this on stack too. From your perspective it
>>> doesn't matters because you put it in byte[]. But if You will keep e.g.
>>> CD-ROM images then it's much more better to use streaming approach then
>>> bytea[]. More over due to some security JDBC driver will at least double
>>> memory consumed by bytea.
>>>
>>> 2. Specifying hibernate data type as bytea do not resolve problems
>>> because
>>> it will still use LOB approach.
>>>
>>> 3. pg_largeobjects is system table, hibernate do not creates it.
>>>
>>> 4. Trigger based approach is good for this, but You need to do this
>>> mannualy
>>>
>>> 5. If you want to use bytea use
>>> @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I think you
>>> should remove @Lob too) on your field.
>>>
>>> Regards,
>>> Radosław Smogura
>
>

Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Oliver Jowett
Date:
On 9 January 2012 12:40, Stefan Keller <sfkeller@gmail.com> wrote:

> There's an important principle that the code which "allocates"
> resources is also responsible to release it.

That's one resource allocation model, yes. The other common model is
that resources are freed when there are no remaining references to
them, i.e. a GC model.

> AFAIK in this case it's JDBC which choses to use LO (which creates
> pg_largeobjects entries) and it's therefore also JDBC which has to
> clean up.

If the application calls LargeObjectManager.create() then it's also
responsible for eventually calling LargeObjectManager.unlink().

If you're using JDBC's Blob API, there's no API there to tell the
driver to actually delete the underlying data (there is Blob.free(),
but that appears to just be about freeing local resources, not the
underlying storage).
As a LO is independent storage that might have multiple references to
it (the OID might be stored in many places), without explicit deletion
you need a GC mechanism to collect unreferenced LOs eventually -
that's what vacuumlo etc are doing.

What do you suggest that the driver does differently here? (Perhaps we
could do something like interpret Blob.truncate(0) as "delete the blob
right now" - but is that what Hibernate actually does?)

(Much of this is the whole LO vs. bytea argument all over again. If
you want to store data with a lifetime that's the same as the row it's
embedded in, then bytea is a much better mapping)

Oliver

Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Oliver Jowett
Date:
On 9 January 2012 14:29, Stefan Keller <sfkeller@gmail.com> wrote:
> 2012/1/9 Oliver Jowett <oliver@opencloud.com>:
>> As a LO is independent storage that might have multiple references to> it (the OID might be stored in many places),
withoutexplicit deletion> you need a GC mechanism to collect unreferenced LOs eventually -> that's what vacuumlo etc
aredoing. 
> I can follow that. But that's not what the JDBC user expects nor is it
> explained (nor mentioned) in the JDBC docs.
>
> From a conceptual view I have just an entity MyWebcam with an
> attribute called image. Attribute image is of attribute cardinality
> 1:1 (and private):
>
> // Java using Hibernate/JPA:
>  @Entity
>  @Lob
>  @Basic(fetch=FetchType.LAZY)
>  public class MyWebcam {
>    private byte[] image;
>    private String name;
>    public byte[] getImage() { return image; }
>    public void setImage(byte[] _image) { image=_image; }
>    // ... other stuff
>  }
>
> That's the classic use case.
> Isn't it obvious that if setImage() sets another byte[] that the image
> space get's cleared by the layers below?
> And since Hibernate chose to use one variant of JDBC, it's also JDBC
> which has to take care about orphans.

Well, either the Hibernate mapping is misconfigured, or your database
is misconfigured i.e. you are not collecting garbage LOs. If you have
a suitable GC mechanism configured, then what happens?

Otherwise, what should JDBC do differently here? Be specific. It would
be helpful if you could provide a native JDBC example, rather than a
Hibernate example, since it's not clear what JDBC calls are being made
by Hibernate.

Oliver

Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Stefan Keller
Date:
2012/1/9 Oliver Jowett <oliver@opencloud.com>:
> As a LO is independent storage that might have multiple references to> it (the OID might be stored in many places),
withoutexplicit deletion> you need a GC mechanism to collect unreferenced LOs eventually -> that's what vacuumlo etc
aredoing. 
I can follow that. But that's not what the JDBC user expects nor is it
explained (nor mentioned) in the JDBC docs.

From a conceptual view I have just an entity MyWebcam with an
attribute called image. Attribute image is of attribute cardinality
1:1 (and private):

// Java using Hibernate/JPA:
  @Entity
  @Lob
  @Basic(fetch=FetchType.LAZY)
  public class MyWebcam {
    private byte[] image;
    private String name;
    public byte[] getImage() { return image; }
    public void setImage(byte[] _image) { image=_image; }
    // ... other stuff
  }

That's the classic use case.
Isn't it obvious that if setImage() sets another byte[] that the image
space get's cleared by the layers below?
And since Hibernate chose to use one variant of JDBC, it's also JDBC
which has to take care about orphans.

Yours, Stefan

2012/1/9 Oliver Jowett <oliver@opencloud.com>:
> On 9 January 2012 12:40, Stefan Keller <sfkeller@gmail.com> wrote:
>
>> There's an important principle that the code which "allocates"
>> resources is also responsible to release it.
>
> That's one resource allocation model, yes. The other common model is
> that resources are freed when there are no remaining references to
> them, i.e. a GC model.
>
>> AFAIK in this case it's JDBC which choses to use LO (which creates
>> pg_largeobjects entries) and it's therefore also JDBC which has to
>> clean up.
>
> If the application calls LargeObjectManager.create() then it's also
> responsible for eventually calling LargeObjectManager.unlink().
>
> If you're using JDBC's Blob API, there's no API there to tell the
> driver to actually delete the underlying data (there is Blob.free(),
> but that appears to just be about freeing local resources, not the
> underlying storage).
> As a LO is independent storage that might have multiple references to
> it (the OID might be stored in many places), without explicit deletion
> you need a GC mechanism to collect unreferenced LOs eventually -
> that's what vacuumlo etc are doing.
>
> What do you suggest that the driver does differently here? (Perhaps we
> could do something like interpret Blob.truncate(0) as "delete the blob
> right now" - but is that what Hibernate actually does?)
>
> (Much of this is the whole LO vs. bytea argument all over again. If
> you want to store data with a lifetime that's the same as the row it's
> embedded in, then bytea is a much better mapping)
>
> Oliver

Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Stefan Keller
Date:
2012/1/9 Oliver Jowett <oliver@opencloud.com>:
> Otherwise, what should JDBC do differently here? Be specific. It would

First, I pretty sure that Hibernate nor the Tomcat/Java GC are
misconfigured - since it works now after having installed the trigger
by hand.

To become more specific read the first two sections as a first hint
here in this official doc:
http://www.postgresql.org/docs/current/interactive/lo.html

I try to trace the JDBC calls coming from Hibernate (although the
problem seems to me pretty clearly located).
That investigation will take some time.

Yours, Stefan


2012/1/9 Oliver Jowett <oliver@opencloud.com>:
> On 9 January 2012 14:29, Stefan Keller <sfkeller@gmail.com> wrote:
>> 2012/1/9 Oliver Jowett <oliver@opencloud.com>:
>>> As a LO is independent storage that might have multiple references to> it (the OID might be stored in many places),
withoutexplicit deletion> you need a GC mechanism to collect unreferenced LOs eventually -> that's what vacuumlo etc
aredoing. 
>> I can follow that. But that's not what the JDBC user expects nor is it
>> explained (nor mentioned) in the JDBC docs.
>>
>> From a conceptual view I have just an entity MyWebcam with an
>> attribute called image. Attribute image is of attribute cardinality
>> 1:1 (and private):
>>
>> // Java using Hibernate/JPA:
>>  @Entity
>>  @Lob
>>  @Basic(fetch=FetchType.LAZY)
>>  public class MyWebcam {
>>    private byte[] image;
>>    private String name;
>>    public byte[] getImage() { return image; }
>>    public void setImage(byte[] _image) { image=_image; }
>>    // ... other stuff
>>  }
>>
>> That's the classic use case.
>> Isn't it obvious that if setImage() sets another byte[] that the image
>> space get's cleared by the layers below?
>> And since Hibernate chose to use one variant of JDBC, it's also JDBC
>> which has to take care about orphans.
>
> Well, either the Hibernate mapping is misconfigured, or your database
> is misconfigured i.e. you are not collecting garbage LOs. If you have
> a suitable GC mechanism configured, then what happens?
>
> Otherwise, what should JDBC do differently here? Be specific. It would
> be helpful if you could provide a native JDBC example, rather than a
> Hibernate example, since it's not clear what JDBC calls are being made
> by Hibernate.
>
> Oliver

Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Oliver Jowett
Date:
On 10 January 2012 00:06, Stefan Keller <sfkeller@gmail.com> wrote:
> 2012/1/9 Oliver Jowett <oliver@opencloud.com>:
>> Otherwise, what should JDBC do differently here? Be specific. It would
>
> First, I pretty sure that Hibernate nor the Tomcat/Java GC are
> misconfigured - since it works now after having installed the trigger
> by hand.

You misunderstand - by GC I mean the process that collects garbage LOs
that are no longer referenced. I don't mean the JVM's heap GC.
You need a GC process like this if you are using LOs and not managing
their lifetimes explicitly from the application. Consider it part of
the necessary DB setup. You've already discovered the usual mechanisms
for it ('lo' or 'vacuumlo' depending on exactly what your data model
looks like).

> To become more specific read the first two sections as a first hint
> here in this official doc:
> http://www.postgresql.org/docs/current/interactive/lo.html

FWIW, that documentation is pretty old (the JDBC docs now live
separately on jdbc.postgresql.org; the JDBC references in that
appendix are mostly historical)
But I'm not sure quite what you're referring to - those docs are
fairly clear about what you need to do? Specifically:

> Now this is fine for PostgreSQL-specific applications, but standard code using JDBC or ODBC won't delete the objects,
resultingin orphan objects — objects that are not referenced by anything, and simply occupy disk space. 

Which is exactly my point - if you are going to use generic JDBC code
that does not explicitly delete LOs when they become detached, then
you need a separate mechanism to clean them up - that's just the way
the model works. If you want to avoid that, don't use LOs, use bytea.

So I'm still confused about what you'd like to see changed in the JDBC
driver. Can you explain?

Oliver

Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Oliver Jowett
Date:
On 10 January 2012 00:29, Oliver Jowett <oliver@opencloud.com> wrote:

> So I'm still confused about what you'd like to see changed in the JDBC
> driver. Can you explain?

Perhaps what you're looking for here is "it all just works out of the
box". In that case, the missing piece seems to be that the DDL that
Hibernate emits (or the DDL which you have set up by hand - I don't
know how you have things set up) does not match the data model that
Hibernate is expecting. If Hibernate is expecting a data model where
the lifecycle of the LO is managed by the database and there's only at
most one reference to a particular LO (i.e. you can't link to the same
LO from multiple places), then it can set up appropriate cleanup
triggers on LO columns as part of its DDL automatically. Or it could
just map blobs to bytea, which might be a more natural mapping in that
case anyway and doesn't require triggers etc. So you'd need to talk to
the Hibernate guys about making those changes.

Oliver

Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Radosław Smogura
Date:
On Mon, 9 Jan 2012 00:40:08 +0100, Stefan Keller wrote:
> 2012/1/8 Radosław Smogura <rsmogura@softperience.eu> wrote:
>> Not quite, PostgreSQL doesn't have LOB, nor OID type that is only
>> reference> to LOB. In fact, BLOB behaviour in JDBC> is just thin
>> wrapper for this what is missing in PostgreSQL - BLOBs.
>
> I can't follow: PostgreSQL has bytea and LO. I'm ok with bytea too
> but
> I'd like to get the job done with LO too.
> JDBC offers both:
> http://jdbc.postgresql.org/documentation/head/binary-data.html
> I tried to summarize the characteristics here:
> http://www.gis.hsr.ch/wiki/PostgreSQL_-_Binary_Large_Objects
>
> There's an important principle that the code which "allocates"
> resources is also responsible to release it.
> AFAIK in this case it's JDBC which choses to use LO (which creates
> pg_largeobjects entries) and it's therefore also JDBC which has to
> clean up.
Hmm we have really different point of view. JDBC driver is just
interface which translates JDBC "commands" to database commands. JDBC
driver is not database. So responsible for this is database, but
database doesn't have "real" BLOBs, this what is made in PG JDBC driver
is just "not perfect" way to add this functionality to PostgreSQL. To
make complete system You need PostgreSQL, JDBC driver, and this trigger.

 From reasons I mentioned before it's quite hard for JDBC driver to
implement deallocation, because driver will be in need to ask and to
"know" for OIDs to unlink. This could made from driver database which
need to parse queries, manages db structure etc, and this will be
executed twice in driver and in database - this is ineffective. Just for
example driver may support this for "DELETE FROM", but what will be if
deletion will be result of calling stored procedure? What if stored
procedure will be created in language different then pgSQL (driver still
may try to download it), but if this will be C, or any other? JDBC
driver must be universal and fully "compatible" with PG functionality,
as it may be used only as "helper" for presentation layer, true
processing logic may be somewhere e. g. legacy system for which we add
web interface.

Because of this JDBC driver can't expose desired behavior, even if
everyone agrees that current behavior is unperfect.

Only PG server may do this, or Hibernate for example by new annotation
@MangedPgLob.

Regards,
Radek

> Yours, Stefan
>
> P.S. Just in order get some more insight I also tried to forward this
> to the JDBC list (which currently seems to have problems accepting
> new
> subscriptions).
>
> 2012/1/8 Radosław Smogura <rsmogura@softperience.eu>:
>> On Sun, 8 Jan 2012 21:57:37 +0100, Stefan Keller wrote:
>>>
>>> Thanks, Radosław, for the clarification.
>>>
>>> 2012/1/8 Radosław Smogura <rsmogura@softperience.eu> wrote:
>>>>
>>>> 3. pg_largeobjects is system table, hibernate do not creates it.
>>>>
>>>> 4. Trigger based approach is good for this, but You need to do
>>>> this
>>>> mannualy
>>>
>>>
>>> That's exactly my point:
>>> Since JDBC manages creation of OID and pg_largeobjects it's also
>>> JDBC
>>> which is responsible for update/delete of rows in pg_largeobjects.
>>> As the user expects, the only thing JDBC would have to do is to
>>> issue
>>> lo_unlink() when rows are updated or deleted.
>>> Thus, it's currently a bug in the JDBC driver. And whatever the
>>> solution is, it needs to be mentioned in the JDBC docs.
>>
>> Not quite, PostgreSQL doesn't have LOB, nor OID type that is only
>> reference
>> to LOB. In fact, BLOB behaviour in JDBC
>> is just thin wrapper for this what is missing in PostgreSQL - BLOBs.
>> It was
>> build form available parts. In addition OID type may be used and
>> it's used
>> as the system id or may be used as just some kind of row id - all
>> types,
>> tables, sequences etc has OID. You may create table with "WITH OIDS"
>> clause.
>> You may use OID data type just as replacement for (unsigned) int, so
>> JDBC
>> can create LOB but it can't decide if given field in row is
>> reference to LOB
>> or e.g. table and what with statements "DELETE WHERE date > "?
>>
>> In fact JDBC driver is so "stupid" that if you will call getBytes or
>> getBlob
>> on any column with Oid it will ask for LOB.
>>
>> Hibernate knows this and it does what is best - calls standard BLOB
>> interface, and creates table with Oid column.
>>
>> And here again round trip, in case of deletion only Hibernate may
>> delete
>> given LOB because only Hibernate and You knows that any value in Oid
>> column
>> will reflect LOB - JDBC driver doesn't "knows " this, but... oids
>> may be
>> shared because those are only numbers, not all tables may be covered
>> by
>> Hibernate, and assuming huge imagination, someone may encode OID by
>> adding
>> 1, may store it as long, etc. I know it's quite popular that DB
>> schema comes
>> from entities, but not always You have such flexibility and
>> sometimes You
>> create Entities for particular schema.
>>
>> So, as You see only this trigger approach is currently (universally)
>> the
>> best way. If this is that module I think about it's just trigger
>> which calls
>> unlink on replace or deletion - in fact You may write own without
>> any
>> problems. Those are few lines only.
>>
>>
>>>> 5. If you want to use bytea use
>>>> @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I
>>>> think you
>>>> should remove @Lob too) on your field.
>>>
>>>
>>> In fact, this annotation syntax usage looks like the Hibernate
>>> mapping
>>> for PostgreSQL could be enhanced.
>>
>> You have right, but Hibernate team will be in need to add auto
>> deletion for
>> bulk updates, too. PostgreSQL isn't still so popular it's worth of
>> it (and
>> they have no so much developers bug report may stand for months
>> without any
>> comment). Look how many peoples ask for Lobs. It means no one needs
>> true LOB
>> - true LOB large object stored outside table/row data space, to
>> allow out of
>> statements operation like streaming, partial updates etc. This is my
>> definition of LOB, because this is idea of LOB. I think only DB2
>> stores LOBs
>> in row, PostgreSQL do not make this but adverts this bytea ugly
>> approach.
>>
>> Personally, I create app which captures images from WebCam (like
>> You) - just
>> frames not movies. From above reason I wanted to move to bytea, but
>> due to
>> changes (wired instability and leak of backward compatibility) I
>> still have
>> Oid. Because I have only two tables for Lobs I have garbage
>> collection
>> simple script.
>>
>> Best regards
>> Radek
>>
>>
>>> Yours, Stefan
>>>
>>> 2012/1/8 Radosław Smogura <rsmogura@softperience.eu>:
>>>>
>>>> On Sun, 8 Jan 2012 18:08:09 +0100, Stefan Keller wrote:
>>>>>
>>>>>
>>>>> I'd like to backup my statement below regarding in JDBC driver
>>>>> from
>>>>> PostgreSQL:
>>>>>
>>>>> When storing fields of type BLOB it inserts the binary string in
>>>>> system table pg_largeobject (via user table). But when rows in
>>>>> user
>>>>> table get updated or deleted it does not update nor delete
>>>>> corresponding rows in table pg_largeobject.
>>>>>
>>>>> That's really a bug!
>>>>>
>>>>> Fortunately there's a solution indicated in the official docs
>>>>> (http://www.postgresql.org/docs/9.1/static/lo.html)
>>>>> To me, something like this should be implemented before hand in
>>>>> JDBC
>>>>> driver.
>>>>> And in any case there should be a bold note about this in the
>>>>> JDBC
>>>>> docs
>>>>> (http://jdbc.postgresql.org/documentation/head/binary-data.html )
>>>>>
>>>>> Yours, Stefan
>>>>>
>>>>>
>>>>>
>>>>> 2012/1/6 Stefan Keller <sfkeller@gmail.com>:
>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I run into a nasty behavior of current PostgreSQL JDBC.
>>>>>>
>>>>>> I maintain images (from Webcams). In the Java and Hibernate
>>>>>> (JPA) code
>>>>>> I specified a @Lob annotation on class MyData and a
>>>>>> attribte/data type
>>>>>> "byte[] mydata;". Hibernate then generates two tables in
>>>>>> PostgreSQL,
>>>>>> one called MyData with a column mydata of type oid and an
>>>>>> internal one
>>>>>> called pg_largobjects (which contain foreign keys to the oid).
>>>>>> That's
>>>>>> also explained in the JDBC docs [1], saying "PostgreSQL provides
>>>>>> two
>>>>>> distinct ways to store binary data. Binary data can be stored in
>>>>>> a
>>>>>> table using the data type bytea or by using the Large Object
>>>>>> feature
>>>>>> which stores the binary data in a separate table in a special
>>>>>> format
>>>>>> and refers to that table by storing a value of type oid in your
>>>>>> table."
>>>>>>
>>>>>> Now, when replacing the images (few hundred) with new ones using
>>>>>> Java,
>>>>>> pg_largeobjects grew constantly until the file system run out of
>>>>>> space. So old image data did'nt get released! This is to me a
>>>>>> bug
>>>>>> because the user/programmer must (and should) assume that there
>>>>>> is a
>>>>>> strict 1:1 relationship between generated table MyData and its
>>>>>> LOB
>>>>>> column data (stored in pg_largeobjects).
>>>>>> => I finally found the supplied module 'lo' [2] which releases
>>>>>> detached records. Is this the recommended way to resolve this
>>>>>> problem?
>>>>>>
>>>>>> Searching for explanations I found a ticket HHH2244 [3] which
>>>>>> was
>>>>>> closed by the Hibernate team without action referring to the
>>>>>> JDBC
>>>>>> Spec. which says: "An SQL BLOB is a built-in type that stores a
>>>>>> Binary
>>>>>> Large Object as a column value in a row of a database table".
>>>>>> => In other words: The PostgreSQL JDBC team should take action
>>>>>> on this
>>>>>> but didn't until now, right?
>>>>>>
>>>>>> There is another issue about "PostgreSQL and BLOBs" [4]. First
>>>>>> it
>>>>>> cites PostgreSQL JDBC docs [1]. The thread [4] ends somehow too
>>>>>> in a
>>>>>> 'deadlock' concluding  "...the PostgreSQLDialect (as of 3.5.5)
>>>>>> needs
>>>>>> to change not to use MaterializedBlobType until the Postgres
>>>>>> (JDBC)
>>>>>> team changes their driver (which does not seem to have happened
>>>>>> in the
>>>>>> last 6 years)."
>>>>>> => Any solutions or comments form insiders on this?
>>>>>>
>>>>>> Yours, Stefan
>>>>>>
>>>>>> [1]
>>>>>> http://jdbc.postgresql.org/documentation/head/binary-data.html
>>>>>> [2] http://www.postgresql.org/docs/current/interactive/lo.html
>>>>>> [3] https://hibernate.onjira.com/browse/HHH-2244
>>>>>> [4] http://relation.to/Bloggers/PostgreSQLAndBLOBs
>>>>
>>>>
>>>> This is common approach for PostgreSQL and some proxy of real life
>>>> Blobs.
>>>> One time I submitted bug about this to Hibernate. But step by
>>>> step.
>>>>
>>>> 1. BLOBs are... Large Objects, they are stored as reference
>>>> because those
>>>> objects are large, if you will store this objects as bytea then
>>>> select *
>>>> will return all large data. It may not be comfortable not only to
>>>> download
>>>> few GB of data, but to keep this on stack too. From your
>>>> perspective it
>>>> doesn't matters because you put it in byte[]. But if You will keep
>>>> e.g.
>>>> CD-ROM images then it's much more better to use streaming approach
>>>> then
>>>> bytea[]. More over due to some security JDBC driver will at least
>>>> double
>>>> memory consumed by bytea.
>>>>
>>>> 2. Specifying hibernate data type as bytea do not resolve problems
>>>> because
>>>> it will still use LOB approach.
>>>>
>>>> 3. pg_largeobjects is system table, hibernate do not creates it.
>>>>
>>>> 4. Trigger based approach is good for this, but You need to do
>>>> this
>>>> mannualy
>>>>
>>>> 5. If you want to use bytea use
>>>> @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I
>>>> think you
>>>> should remove @Lob too) on your field.
>>>>
>>>> Regards,
>>>> Radosław Smogura
>>
>>


Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Alban Hertroys
Date:
On 9 January 2012 14:55, Radosław Smogura <rsmogura@softperience.eu> wrote:
> So responsible for this is database, but database doesn't have
> "real" BLOBs, this what is made in PG JDBC driver is just "not perfect" way
> to add this functionality to PostgreSQL.

I think you should elaborate on what you mean when you say that
Postgres doesn't have "real" BLOBs.

This discussion did make me wonder about something in Postgres'
LOB-support though. As explained earlier, the current implementation
allows for dedubbing LOB's, so that it's not necessary to store the
same large(!) object multiple times. That is also what's causing this
issue with the JDBC driver, or perhaps Hybernate in particular.

However, shouldn't it be up to the application designer to dedup large
objects or not?

The current implementation is probably rather convenient for projects
where duplicate large objects are common, but - as it turns out - it
can be quite inconvenient when that's opposite to expectations.

ISTMT this behaviour should at least be optional. Of course that
raises the question what should happen with an existing set of LOB's
when that setting gets changed.
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.

Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Thomas Markus
Date:
Hi,

thats not a bug its a feature ;)

the combination hibernate/pgjdbc uses pg large objects for byte[] and
Blob properties so only oids are referenced. Use 'vacuumlo' to free up
your space.

regards
Thomas


Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Radosław Smogura
Date:
Dnia poniedziałek, 9 stycznia 2012 o 15:52:23 Alban Hertroys napisał(a):
> On 9 January 2012 14:55, Radosław Smogura <rsmogura@softperience.eu> wrote:
> > So responsible for this is database, but database doesn't have
> > "real" BLOBs, this what is made in PG JDBC driver is just "not perfect"
> > way to add this functionality to PostgreSQL.
>
> I think you should elaborate on what you mean when you say that
> Postgres doesn't have "real" BLOBs.
>
> This discussion did make me wonder about something in Postgres'
> LOB-support though. As explained earlier, the current implementation
> allows for dedubbing LOB's, so that it's not necessary to store the
> same large(!) object multiple times. That is also what's causing this
> issue with the JDBC driver, or perhaps Hybernate in particular.
>
> However, shouldn't it be up to the application designer to dedup large
> objects or not?
>
> The current implementation is probably rather convenient for projects
> where duplicate large objects are common, but - as it turns out - it
> can be quite inconvenient when that's opposite to expectations.
>
> ISTMT this behaviour should at least be optional. Of course that
> raises the question what should happen with an existing set of LOB's
> when that setting gets changed.
In real world BLOBs are transfered as references, and those references are
managed in way as the trigger does. Nacked PG doesn't support deletion, Oid is
universal type so it can't be used by GC approach, unles collector will know
which Oid is LOB oid.

Oid is like void*, it's abstarct pointer. If you get void* you don't know if
data referenced by it represent person row, or car row, you don't know if
void* is even reference or just 64 bit number. Current implementation is not
type safe. You can't just write UPDATE TABLE x SET blob = 'aadfasfasfda' which
in current times should be supported, but you may write (if are not fully
familiar with db) UPDATE table X set varchar_d = blob_column;

In fact LOB's id may be stored even as varchar. So true is that PG supports
LOBs, but due to missing functionality LOBs are quite hard to manage. It's
like car withot steering wheel - you may drive, but it's little bit hard.

Regards,
Radek




Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues

From
Alban Hertroys
Date:
On 9 Jan 2012, at 18:57, Radosław Smogura wrote:

> In real world BLOBs are transfered as references, and those references are
> managed in way as the trigger does. Nacked PG doesn't support deletion, Oid is
> universal type so it can't be used by GC approach, unles collector will know
> which Oid is LOB oid.

What do you mean by "nacked"?
You can unlink lob's, there's your support for deletion.

> Oid is like void*, it's abstarct pointer. If you get void* you don't know if
> data referenced by it represent person row, or car row, you don't know if
> void* is even reference or just 64 bit number. Current implementation is not
> type safe. You can't just write UPDATE TABLE x SET blob = 'aadfasfasfda' which
> in current times should be supported, but you may write (if are not fully
> familiar with db) UPDATE table X set varchar_d = blob_column;

That's easy to remedy, similar to how most implementations in C don't use straight void pointers. In C you'd just
typedefthem to something meaningful: 

    typedef blob oid;

Similarly you can wrap them in a domain in PG:

    create domain blob as oid;


It would be cool if that would allow to add an FK-constraint to the oid in pg_largeobject to that domain, but alas,
thatisn't possible in my version (I'm a bit behind with pg 8.4). 

I agree that it would be nice if PG provided a built-in type for lobs (blob's are a subdivision of those), especially
ifthat would also handle the reference to pg_largeobject. 

> In fact LOB's id may be stored even as varchar. So true is that PG supports
> LOBs, but due to missing functionality LOBs are quite hard to manage. It's
> like car withot steering wheel - you may drive, but it's little bit hard.


That's probably just because PG knows to cast that varchar to something compatible with oid's. I suspect that in recent
versionsthat cast may not be allowed anymore though. 

And remember, SELECT 'my explicit string value'; does not in fact denote a string value, but a literal. While the query
isstill in SQL notation (meaning until the query parser is done with it), everything is text. 
The way I understand it, a literal gets a meaningful type once it is compared to a value of a known type (typically
froma column) or once it gets cast to a type explicitly. If that never happens, I expect that the literal will not be
convertedto any type and stay the text value that it was in the SQL query string. 

This is probably documented, but I don't have time to dig into the manuals right now.

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


On 10 January 2012 06:57, Radosław Smogura <rsmogura@softperience.eu> wrote:

> In real world BLOBs are transfered as references, and those references are
> managed in way as the trigger does. Nacked PG doesn't support deletion, Oid is
> universal type so it can't be used by GC approach, unles collector will know
> which Oid is LOB oid.

So you just end up with a conservative collector, not an exact
collector. In practice conservative collectors work OK. (You can use a
subtype to identify OIDs-that-refer-to-a-LO as suggested elsewhere in
the thread if you want an exact collector)

Oliver

I'm still fumbling in the dark but I think I have a smell:

Does somebody know what supportsLobValueChangePropogation according to
the current JDBC specs?

There's an interesting note there:
> NOTE : I do not know the correct answer currently for databases which (1) are not part of the cruise control process
or(2) do not supportsExpectedLobUsagePattern(). 

http://srcrr.org/java/hibernate/3.6.2/reference/org/hibernate/dialect/PostgreSQLDialect.html#supportsLobValueChangePropogation()

It's currently set to false:

https://github.com/hibernate/hibernate-core/blob/master/hibernate-core/src/main/java/org/hibernate/dialect/PostgreSQLDialect.java

Yours, Stefan

2012/1/9 Oliver Jowett <oliver@opencloud.com>:
> On 10 January 2012 06:57, Radosław Smogura <rsmogura@softperience.eu> wrote:
>
>> In real world BLOBs are transfered as references, and those references are
>> managed in way as the trigger does. Nacked PG doesn't support deletion, Oid is
>> universal type so it can't be used by GC approach, unles collector will know
>> which Oid is LOB oid.
>
> So you just end up with a conservative collector, not an exact
> collector. In practice conservative collectors work OK. (You can use a
> subtype to identify OIDs-that-refer-to-a-LO as suggested elsewhere in
> the thread if you want an exact collector)
>
> Oliver