Thread: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
Hi, I run into a nasty behavior of current PostgreSQL JDBC. I maintain images (from Webcams). In the Java and Hibernate (JPA) code I specified a @Lob annotation on class MyData and a attribte/data type "byte[] mydata;". Hibernate then generates two tables in PostgreSQL, one called MyData with a column mydata of type oid and an internal one called pg_largobjects (which contain foreign keys to the oid). That's also explained in the JDBC docs [1], saying "PostgreSQL provides two distinct ways to store binary data. Binary data can be stored in a table using the data type bytea or by using the Large Object feature which stores the binary data in a separate table in a special format and refers to that table by storing a value of type oid in your table." Now, when replacing the images (few hundred) with new ones using Java, pg_largeobjects grew constantly until the file system run out of space. So old image data did'nt get released! This is to me a bug because the user/programmer must (and should) assume that there is a strict 1:1 relationship between generated table MyData and its LOB column data (stored in pg_largeobjects). => I finally found the supplied module 'lo' [2] which releases detached records. Is this the recommended way to resolve this problem? Searching for explanations I found a ticket HHH2244 [3] which was closed by the Hibernate team without action referring to the JDBC Spec. which says: "An SQL BLOB is a built-in type that stores a Binary Large Object as a column value in a row of a database table". => In other words: The PostgreSQL JDBC team should take action on this but didn't until now, right? There is another issue about "PostgreSQL and BLOBs" [4]. First it cites PostgreSQL JDBC docs [1]. The thread [4] ends somehow too in a 'deadlock' concluding "...the PostgreSQLDialect (as of 3.5.5) needs to change not to use MaterializedBlobType until the Postgres (JDBC) team changes their driver (which does not seem to have happened in the last 6 years)." => Any solutions or comments form insiders on this? Yours, Stefan [1] http://jdbc.postgresql.org/documentation/head/binary-data.html [2] http://www.postgresql.org/docs/current/interactive/lo.html [3] https://hibernate.onjira.com/browse/HHH-2244 [4] http://relation.to/Bloggers/PostgreSQLAndBLOBs
Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Stefan Keller
Date:
I'd like to backup my statement below regarding in JDBC driver from PostgreSQL: When storing fields of type BLOB it inserts the binary string in system table pg_largeobject (via user table). But when rows in user table get updated or deleted it does not update nor delete corresponding rows in table pg_largeobject. That's really a bug! Fortunately there's a solution indicated in the official docs (http://www.postgresql.org/docs/9.1/static/lo.html) To me, something like this should be implemented before hand in JDBC driver. And in any case there should be a bold note about this in the JDBC docs (http://jdbc.postgresql.org/documentation/head/binary-data.html ) Yours, Stefan 2012/1/6 Stefan Keller <sfkeller@gmail.com>: > Hi, > > I run into a nasty behavior of current PostgreSQL JDBC. > > I maintain images (from Webcams). In the Java and Hibernate (JPA) code > I specified a @Lob annotation on class MyData and a attribte/data type > "byte[] mydata;". Hibernate then generates two tables in PostgreSQL, > one called MyData with a column mydata of type oid and an internal one > called pg_largobjects (which contain foreign keys to the oid). That's > also explained in the JDBC docs [1], saying "PostgreSQL provides two > distinct ways to store binary data. Binary data can be stored in a > table using the data type bytea or by using the Large Object feature > which stores the binary data in a separate table in a special format > and refers to that table by storing a value of type oid in your > table." > > Now, when replacing the images (few hundred) with new ones using Java, > pg_largeobjects grew constantly until the file system run out of > space. So old image data did'nt get released! This is to me a bug > because the user/programmer must (and should) assume that there is a > strict 1:1 relationship between generated table MyData and its LOB > column data (stored in pg_largeobjects). > => I finally found the supplied module 'lo' [2] which releases > detached records. Is this the recommended way to resolve this problem? > > Searching for explanations I found a ticket HHH2244 [3] which was > closed by the Hibernate team without action referring to the JDBC > Spec. which says: "An SQL BLOB is a built-in type that stores a Binary > Large Object as a column value in a row of a database table". > => In other words: The PostgreSQL JDBC team should take action on this > but didn't until now, right? > > There is another issue about "PostgreSQL and BLOBs" [4]. First it > cites PostgreSQL JDBC docs [1]. The thread [4] ends somehow too in a > 'deadlock' concluding "...the PostgreSQLDialect (as of 3.5.5) needs > to change not to use MaterializedBlobType until the Postgres (JDBC) > team changes their driver (which does not seem to have happened in the > last 6 years)." > => Any solutions or comments form insiders on this? > > Yours, Stefan > > [1] http://jdbc.postgresql.org/documentation/head/binary-data.html > [2] http://www.postgresql.org/docs/current/interactive/lo.html > [3] https://hibernate.onjira.com/browse/HHH-2244 > [4] http://relation.to/Bloggers/PostgreSQLAndBLOBs
Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Thomas Kellerer
Date:
Stefan Keller wrote on 06.01.2012 19:04: > I maintain images (from Webcams). In the Java and Hibernate (JPA) code > I specified a @Lob annotation on class MyData and a attribte/data type > "byte[] mydata;". Hibernate then generates two tables in PostgreSQL, > one called MyData with a column mydata of type oid and an internal one > called pg_largobjects (which contain foreign keys to the oid). That's > also explained in the JDBC docs [1], saying "PostgreSQL provides two > distinct ways to store binary data. Binary data can be stored in a > table using the data type bytea or by using the Large Object feature > which stores the binary data in a separate table in a special format > and refers to that table by storing a value of type oid in your > table." I think you are better off using bytea unless you need to access only parts of the blob regularly.
Re: Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Stefan Keller
Date:
2012/1/8 Thomas Kellerer <spam_eater@gmx.net> wrote: > I think you are better off using bytea unless you need to access only parts > of the blob regularly. That's a valid tip. But it's to the current JDBC implementation to take action because it currently leads to disk space leakage when using JDBC and JPA/Hibernate. After a tedious time of bug searching I found a possible solution and a testimonial here: "One of the problems with the JDBC driver (and this affects the ODBC driver also), is that the specification assumes that references to BLOBs (Binary Large OBjects) are stored within a table, and if that entry is changed, the associated BLOB is deleted from the database. As PostgreSQL stands, this doesn't occur." (taken from the docs http://www.postgresql.org/docs/9.1/static/lo.html ) In addition, unfortunately the (lo_unlink) problem of LargeObjects and BLOBs is not mentioned in the JDBC docs: http://jdbc.postgresql.org/documentation/head/binary-data.html Stefan 2012/1/8 Thomas Kellerer <spam_eater@gmx.net>: > Stefan Keller wrote on 06.01.2012 19:04: > >> I maintain images (from Webcams). In the Java and Hibernate (JPA) code >> I specified a @Lob annotation on class MyData and a attribte/data type >> "byte[] mydata;". Hibernate then generates two tables in PostgreSQL, >> one called MyData with a column mydata of type oid and an internal one >> called pg_largobjects (which contain foreign keys to the oid). That's >> also explained in the JDBC docs [1], saying "PostgreSQL provides two >> distinct ways to store binary data. Binary data can be stored in a >> table using the data type bytea or by using the Large Object feature >> which stores the binary data in a separate table in a special format >> and refers to that table by storing a value of type oid in your >> table." > > > I think you are better off using bytea unless you need to access only parts > of the blob regularly. > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general
Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Thomas Kellerer
Date:
Stefan Keller wrote on 08.01.2012 19:13: >> I think you are better off using bytea unless you need to access only parts >> of the blob regularly. > > That's a valid tip. But it's to the current JDBC implementation to > take action because it currently leads to disk space leakage when > using JDBC and JPA/Hibernate. But only if you use large objects. From my perspective bytea is the (only) data type that matches the JDBC BLOB type. And none of the problems you have occur when using bytea. There is no match for PG's large objects in the JDBC API so I don't see your claim that it's a fault of the driver. What's the reason for you to stick with LargeObjects?
Re: Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Stefan Keller
Date:
2012/1/8 Thomas Kellerer <spam_eater@gmx.net>:> What's the reason for you to stick with LargeObjects? I simply used the @Lob annotation in Hibernate/JPA. That's all to get a leaking pg_largeobject table. See http://relation.to/Bloggers/PostgreSQLAndBLOBs, https://hibernate.onjira.com/browse/HHH-2244 and https://hibernate.onjira.com/browse/HHH-4876 for some background of the dilemma. Stefan 2012/1/8 Thomas Kellerer <spam_eater@gmx.net>: > Stefan Keller wrote on 08.01.2012 19:13: > >>> I think you are better off using bytea unless you need to access only >>> parts the blob regularly. >> >> >> That's a valid tip. But it's to the current JDBC implementation to >> take action because it currently leads to disk space leakage when >> using JDBC and JPA/Hibernate. > > > But only if you use large objects. > > From my perspective bytea is the (only) data type that matches the JDBC BLOB > type. > And none of the problems you have occur when using bytea. > > There is no match for PG's large objects in the JDBC API so I don't see your > claim that it's a fault of the driver. > > What's the reason for you to stick with LargeObjects? > > > > > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general
Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Thomas Kellerer
Date:
Stefan Keller wrote on 08.01.2012 20:35: > 2012/1/8 Thomas Kellerer<spam_eater@gmx.net>: > What's the reason for you to stick with LargeObjects? > > I simply used the @Lob annotation in Hibernate/JPA. > That's all to get a leaking pg_largeobject table. > See http://relation.to/Bloggers/PostgreSQLAndBLOBs, > https://hibernate.onjira.com/browse/HHH-2244 and > https://hibernate.onjira.com/browse/HHH-4876 for some background of > the dilemma. So it's clearly a Hibernate bug.
Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Stefan Keller
Date:
Thanks, Radosław, for the clarification. 2012/1/8 Radosław Smogura <rsmogura@softperience.eu> wrote: > 3. pg_largeobjects is system table, hibernate do not creates it. > > 4. Trigger based approach is good for this, but You need to do this mannualy That's exactly my point: Since JDBC manages creation of OID and pg_largeobjects it's also JDBC which is responsible for update/delete of rows in pg_largeobjects. As the user expects, the only thing JDBC would have to do is to issue lo_unlink() when rows are updated or deleted. Thus, it's currently a bug in the JDBC driver. And whatever the solution is, it needs to be mentioned in the JDBC docs. > 5. If you want to use bytea use > @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I think you > should remove @Lob too) on your field. In fact, this annotation syntax usage looks like the Hibernate mapping for PostgreSQL could be enhanced. Yours, Stefan 2012/1/8 Radosław Smogura <rsmogura@softperience.eu>: > On Sun, 8 Jan 2012 18:08:09 +0100, Stefan Keller wrote: >> >> I'd like to backup my statement below regarding in JDBC driver from >> PostgreSQL: >> >> When storing fields of type BLOB it inserts the binary string in >> system table pg_largeobject (via user table). But when rows in user >> table get updated or deleted it does not update nor delete >> corresponding rows in table pg_largeobject. >> >> That's really a bug! >> >> Fortunately there's a solution indicated in the official docs >> (http://www.postgresql.org/docs/9.1/static/lo.html) >> To me, something like this should be implemented before hand in JDBC >> driver. >> And in any case there should be a bold note about this in the JDBC >> docs (http://jdbc.postgresql.org/documentation/head/binary-data.html ) >> >> Yours, Stefan >> >> >> >> 2012/1/6 Stefan Keller <sfkeller@gmail.com>: >>> >>> Hi, >>> >>> I run into a nasty behavior of current PostgreSQL JDBC. >>> >>> I maintain images (from Webcams). In the Java and Hibernate (JPA) code >>> I specified a @Lob annotation on class MyData and a attribte/data type >>> "byte[] mydata;". Hibernate then generates two tables in PostgreSQL, >>> one called MyData with a column mydata of type oid and an internal one >>> called pg_largobjects (which contain foreign keys to the oid). That's >>> also explained in the JDBC docs [1], saying "PostgreSQL provides two >>> distinct ways to store binary data. Binary data can be stored in a >>> table using the data type bytea or by using the Large Object feature >>> which stores the binary data in a separate table in a special format >>> and refers to that table by storing a value of type oid in your >>> table." >>> >>> Now, when replacing the images (few hundred) with new ones using Java, >>> pg_largeobjects grew constantly until the file system run out of >>> space. So old image data did'nt get released! This is to me a bug >>> because the user/programmer must (and should) assume that there is a >>> strict 1:1 relationship between generated table MyData and its LOB >>> column data (stored in pg_largeobjects). >>> => I finally found the supplied module 'lo' [2] which releases >>> detached records. Is this the recommended way to resolve this problem? >>> >>> Searching for explanations I found a ticket HHH2244 [3] which was >>> closed by the Hibernate team without action referring to the JDBC >>> Spec. which says: "An SQL BLOB is a built-in type that stores a Binary >>> Large Object as a column value in a row of a database table". >>> => In other words: The PostgreSQL JDBC team should take action on this >>> but didn't until now, right? >>> >>> There is another issue about "PostgreSQL and BLOBs" [4]. First it >>> cites PostgreSQL JDBC docs [1]. The thread [4] ends somehow too in a >>> 'deadlock' concluding "...the PostgreSQLDialect (as of 3.5.5) needs >>> to change not to use MaterializedBlobType until the Postgres (JDBC) >>> team changes their driver (which does not seem to have happened in the >>> last 6 years)." >>> => Any solutions or comments form insiders on this? >>> >>> Yours, Stefan >>> >>> [1] http://jdbc.postgresql.org/documentation/head/binary-data.html >>> [2] http://www.postgresql.org/docs/current/interactive/lo.html >>> [3] https://hibernate.onjira.com/browse/HHH-2244 >>> [4] http://relation.to/Bloggers/PostgreSQLAndBLOBs > > This is common approach for PostgreSQL and some proxy of real life Blobs. > One time I submitted bug about this to Hibernate. But step by step. > > 1. BLOBs are... Large Objects, they are stored as reference because those > objects are large, if you will store this objects as bytea then select * > will return all large data. It may not be comfortable not only to download > few GB of data, but to keep this on stack too. From your perspective it > doesn't matters because you put it in byte[]. But if You will keep e.g. > CD-ROM images then it's much more better to use streaming approach then > bytea[]. More over due to some security JDBC driver will at least double > memory consumed by bytea. > > 2. Specifying hibernate data type as bytea do not resolve problems because > it will still use LOB approach. > > 3. pg_largeobjects is system table, hibernate do not creates it. > > 4. Trigger based approach is good for this, but You need to do this mannualy > > 5. If you want to use bytea use > @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I think you > should remove @Lob too) on your field. > > Regards, > Radosław Smogura
Re: Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Stefan Keller
Date:
2012/1/8 Thomas Kellerer <spam_eater@gmx.net>:> So it's clearly a Hibernate bug. Obviously not :-> Hibernate mapping just uses one of two valid variants to map large objects in JDBC. So, AFAIK it's a PostgreSQL JDBC bug and an omission in the JDBC docs as well. Stefan 2012/1/8 Thomas Kellerer <spam_eater@gmx.net>: > Stefan Keller wrote on 08.01.2012 20:35: > >> 2012/1/8 Thomas Kellerer<spam_eater@gmx.net>: >> What's the reason for you to stick with LargeObjects? >> >> I simply used the @Lob annotation in Hibernate/JPA. >> That's all to get a leaking pg_largeobject table. >> See http://relation.to/Bloggers/PostgreSQLAndBLOBs, >> https://hibernate.onjira.com/browse/HHH-2244 and >> https://hibernate.onjira.com/browse/HHH-4876 for some background of >> the dilemma. > > So it's clearly a Hibernate bug. > > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general
Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Radosław Smogura
Date:
On Sun, 8 Jan 2012 18:08:09 +0100, Stefan Keller wrote: > I'd like to backup my statement below regarding in JDBC driver from > PostgreSQL: > > When storing fields of type BLOB it inserts the binary string in > system table pg_largeobject (via user table). But when rows in user > table get updated or deleted it does not update nor delete > corresponding rows in table pg_largeobject. > > That's really a bug! > > Fortunately there's a solution indicated in the official docs > (http://www.postgresql.org/docs/9.1/static/lo.html) > To me, something like this should be implemented before hand in JDBC > driver. > And in any case there should be a bold note about this in the JDBC > docs (http://jdbc.postgresql.org/documentation/head/binary-data.html > ) > > Yours, Stefan > > > > 2012/1/6 Stefan Keller <sfkeller@gmail.com>: >> Hi, >> >> I run into a nasty behavior of current PostgreSQL JDBC. >> >> I maintain images (from Webcams). In the Java and Hibernate (JPA) >> code >> I specified a @Lob annotation on class MyData and a attribte/data >> type >> "byte[] mydata;". Hibernate then generates two tables in PostgreSQL, >> one called MyData with a column mydata of type oid and an internal >> one >> called pg_largobjects (which contain foreign keys to the oid). >> That's >> also explained in the JDBC docs [1], saying "PostgreSQL provides two >> distinct ways to store binary data. Binary data can be stored in a >> table using the data type bytea or by using the Large Object feature >> which stores the binary data in a separate table in a special format >> and refers to that table by storing a value of type oid in your >> table." >> >> Now, when replacing the images (few hundred) with new ones using >> Java, >> pg_largeobjects grew constantly until the file system run out of >> space. So old image data did'nt get released! This is to me a bug >> because the user/programmer must (and should) assume that there is a >> strict 1:1 relationship between generated table MyData and its LOB >> column data (stored in pg_largeobjects). >> => I finally found the supplied module 'lo' [2] which releases >> detached records. Is this the recommended way to resolve this >> problem? >> >> Searching for explanations I found a ticket HHH2244 [3] which was >> closed by the Hibernate team without action referring to the JDBC >> Spec. which says: "An SQL BLOB is a built-in type that stores a >> Binary >> Large Object as a column value in a row of a database table". >> => In other words: The PostgreSQL JDBC team should take action on >> this >> but didn't until now, right? >> >> There is another issue about "PostgreSQL and BLOBs" [4]. First it >> cites PostgreSQL JDBC docs [1]. The thread [4] ends somehow too in a >> 'deadlock' concluding "...the PostgreSQLDialect (as of 3.5.5) needs >> to change not to use MaterializedBlobType until the Postgres (JDBC) >> team changes their driver (which does not seem to have happened in >> the >> last 6 years)." >> => Any solutions or comments form insiders on this? >> >> Yours, Stefan >> >> [1] http://jdbc.postgresql.org/documentation/head/binary-data.html >> [2] http://www.postgresql.org/docs/current/interactive/lo.html >> [3] https://hibernate.onjira.com/browse/HHH-2244 >> [4] http://relation.to/Bloggers/PostgreSQLAndBLOBs This is common approach for PostgreSQL and some proxy of real life Blobs. One time I submitted bug about this to Hibernate. But step by step. 1. BLOBs are... Large Objects, they are stored as reference because those objects are large, if you will store this objects as bytea then select * will return all large data. It may not be comfortable not only to download few GB of data, but to keep this on stack too. From your perspective it doesn't matters because you put it in byte[]. But if You will keep e.g. CD-ROM images then it's much more better to use streaming approach then bytea[]. More over due to some security JDBC driver will at least double memory consumed by bytea. 2. Specifying hibernate data type as bytea do not resolve problems because it will still use LOB approach. 3. pg_largeobjects is system table, hibernate do not creates it. 4. Trigger based approach is good for this, but You need to do this mannualy 5. If you want to use bytea use @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I think you should remove @Lob too) on your field. Regards, Radosław Smogura
Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Radosław Smogura
Date:
On Sun, 8 Jan 2012 21:57:37 +0100, Stefan Keller wrote: > Thanks, Radosław, for the clarification. > > 2012/1/8 Radosław Smogura <rsmogura@softperience.eu> wrote: >> 3. pg_largeobjects is system table, hibernate do not creates it. >> >> 4. Trigger based approach is good for this, but You need to do this >> mannualy > > That's exactly my point: > Since JDBC manages creation of OID and pg_largeobjects it's also JDBC > which is responsible for update/delete of rows in pg_largeobjects. > As the user expects, the only thing JDBC would have to do is to issue > lo_unlink() when rows are updated or deleted. > Thus, it's currently a bug in the JDBC driver. And whatever the > solution is, it needs to be mentioned in the JDBC docs. Not quite, PostgreSQL doesn't have LOB, nor OID type that is only reference to LOB. In fact, BLOB behaviour in JDBC is just thin wrapper for this what is missing in PostgreSQL - BLOBs. It was build form available parts. In addition OID type may be used and it's used as the system id or may be used as just some kind of row id - all types, tables, sequences etc has OID. You may create table with "WITH OIDS" clause. You may use OID data type just as replacement for (unsigned) int, so JDBC can create LOB but it can't decide if given field in row is reference to LOB or e.g. table and what with statements "DELETE WHERE date > "? In fact JDBC driver is so "stupid" that if you will call getBytes or getBlob on any column with Oid it will ask for LOB. Hibernate knows this and it does what is best - calls standard BLOB interface, and creates table with Oid column. And here again round trip, in case of deletion only Hibernate may delete given LOB because only Hibernate and You knows that any value in Oid column will reflect LOB - JDBC driver doesn't "knows " this, but... oids may be shared because those are only numbers, not all tables may be covered by Hibernate, and assuming huge imagination, someone may encode OID by adding 1, may store it as long, etc. I know it's quite popular that DB schema comes from entities, but not always You have such flexibility and sometimes You create Entities for particular schema. So, as You see only this trigger approach is currently (universally) the best way. If this is that module I think about it's just trigger which calls unlink on replace or deletion - in fact You may write own without any problems. Those are few lines only. >> 5. If you want to use bytea use >> @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I think >> you >> should remove @Lob too) on your field. > > In fact, this annotation syntax usage looks like the Hibernate > mapping > for PostgreSQL could be enhanced. You have right, but Hibernate team will be in need to add auto deletion for bulk updates, too. PostgreSQL isn't still so popular it's worth of it (and they have no so much developers bug report may stand for months without any comment). Look how many peoples ask for Lobs. It means no one needs true LOB - true LOB large object stored outside table/row data space, to allow out of statements operation like streaming, partial updates etc. This is my definition of LOB, because this is idea of LOB. I think only DB2 stores LOBs in row, PostgreSQL do not make this but adverts this bytea ugly approach. Personally, I create app which captures images from WebCam (like You) - just frames not movies. From above reason I wanted to move to bytea, but due to changes (wired instability and leak of backward compatibility) I still have Oid. Because I have only two tables for Lobs I have garbage collection simple script. Best regards Radek > Yours, Stefan > > 2012/1/8 Radosław Smogura <rsmogura@softperience.eu>: >> On Sun, 8 Jan 2012 18:08:09 +0100, Stefan Keller wrote: >>> >>> I'd like to backup my statement below regarding in JDBC driver from >>> PostgreSQL: >>> >>> When storing fields of type BLOB it inserts the binary string in >>> system table pg_largeobject (via user table). But when rows in user >>> table get updated or deleted it does not update nor delete >>> corresponding rows in table pg_largeobject. >>> >>> That's really a bug! >>> >>> Fortunately there's a solution indicated in the official docs >>> (http://www.postgresql.org/docs/9.1/static/lo.html) >>> To me, something like this should be implemented before hand in >>> JDBC >>> driver. >>> And in any case there should be a bold note about this in the JDBC >>> docs >>> (http://jdbc.postgresql.org/documentation/head/binary-data.html ) >>> >>> Yours, Stefan >>> >>> >>> >>> 2012/1/6 Stefan Keller <sfkeller@gmail.com>: >>>> >>>> Hi, >>>> >>>> I run into a nasty behavior of current PostgreSQL JDBC. >>>> >>>> I maintain images (from Webcams). In the Java and Hibernate (JPA) >>>> code >>>> I specified a @Lob annotation on class MyData and a attribte/data >>>> type >>>> "byte[] mydata;". Hibernate then generates two tables in >>>> PostgreSQL, >>>> one called MyData with a column mydata of type oid and an internal >>>> one >>>> called pg_largobjects (which contain foreign keys to the oid). >>>> That's >>>> also explained in the JDBC docs [1], saying "PostgreSQL provides >>>> two >>>> distinct ways to store binary data. Binary data can be stored in a >>>> table using the data type bytea or by using the Large Object >>>> feature >>>> which stores the binary data in a separate table in a special >>>> format >>>> and refers to that table by storing a value of type oid in your >>>> table." >>>> >>>> Now, when replacing the images (few hundred) with new ones using >>>> Java, >>>> pg_largeobjects grew constantly until the file system run out of >>>> space. So old image data did'nt get released! This is to me a bug >>>> because the user/programmer must (and should) assume that there is >>>> a >>>> strict 1:1 relationship between generated table MyData and its LOB >>>> column data (stored in pg_largeobjects). >>>> => I finally found the supplied module 'lo' [2] which releases >>>> detached records. Is this the recommended way to resolve this >>>> problem? >>>> >>>> Searching for explanations I found a ticket HHH2244 [3] which was >>>> closed by the Hibernate team without action referring to the JDBC >>>> Spec. which says: "An SQL BLOB is a built-in type that stores a >>>> Binary >>>> Large Object as a column value in a row of a database table". >>>> => In other words: The PostgreSQL JDBC team should take action on >>>> this >>>> but didn't until now, right? >>>> >>>> There is another issue about "PostgreSQL and BLOBs" [4]. First it >>>> cites PostgreSQL JDBC docs [1]. The thread [4] ends somehow too in >>>> a >>>> 'deadlock' concluding "...the PostgreSQLDialect (as of 3.5.5) >>>> needs >>>> to change not to use MaterializedBlobType until the Postgres >>>> (JDBC) >>>> team changes their driver (which does not seem to have happened in >>>> the >>>> last 6 years)." >>>> => Any solutions or comments form insiders on this? >>>> >>>> Yours, Stefan >>>> >>>> [1] http://jdbc.postgresql.org/documentation/head/binary-data.html >>>> [2] http://www.postgresql.org/docs/current/interactive/lo.html >>>> [3] https://hibernate.onjira.com/browse/HHH-2244 >>>> [4] http://relation.to/Bloggers/PostgreSQLAndBLOBs >> >> This is common approach for PostgreSQL and some proxy of real life >> Blobs. >> One time I submitted bug about this to Hibernate. But step by step. >> >> 1. BLOBs are... Large Objects, they are stored as reference because >> those >> objects are large, if you will store this objects as bytea then >> select * >> will return all large data. It may not be comfortable not only to >> download >> few GB of data, but to keep this on stack too. From your perspective >> it >> doesn't matters because you put it in byte[]. But if You will keep >> e.g. >> CD-ROM images then it's much more better to use streaming approach >> then >> bytea[]. More over due to some security JDBC driver will at least >> double >> memory consumed by bytea. >> >> 2. Specifying hibernate data type as bytea do not resolve problems >> because >> it will still use LOB approach. >> >> 3. pg_largeobjects is system table, hibernate do not creates it. >> >> 4. Trigger based approach is good for this, but You need to do this >> mannualy >> >> 5. If you want to use bytea use >> @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I think >> you >> should remove @Lob too) on your field. >> >> Regards, >> Radosław Smogura
Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Stefan Keller
Date:
2012/1/8 Radosław Smogura <rsmogura@softperience.eu> wrote: > Not quite, PostgreSQL doesn't have LOB, nor OID type that is only reference> to LOB. In fact, BLOB behaviour in JDBC> isjust thin wrapper for this what is missing in PostgreSQL - BLOBs. I can't follow: PostgreSQL has bytea and LO. I'm ok with bytea too but I'd like to get the job done with LO too. JDBC offers both: http://jdbc.postgresql.org/documentation/head/binary-data.html I tried to summarize the characteristics here: http://www.gis.hsr.ch/wiki/PostgreSQL_-_Binary_Large_Objects There's an important principle that the code which "allocates" resources is also responsible to release it. AFAIK in this case it's JDBC which choses to use LO (which creates pg_largeobjects entries) and it's therefore also JDBC which has to clean up. Yours, Stefan P.S. Just in order get some more insight I also tried to forward this to the JDBC list (which currently seems to have problems accepting new subscriptions). 2012/1/8 Radosław Smogura <rsmogura@softperience.eu>: > On Sun, 8 Jan 2012 21:57:37 +0100, Stefan Keller wrote: >> >> Thanks, Radosław, for the clarification. >> >> 2012/1/8 Radosław Smogura <rsmogura@softperience.eu> wrote: >>> >>> 3. pg_largeobjects is system table, hibernate do not creates it. >>> >>> 4. Trigger based approach is good for this, but You need to do this >>> mannualy >> >> >> That's exactly my point: >> Since JDBC manages creation of OID and pg_largeobjects it's also JDBC >> which is responsible for update/delete of rows in pg_largeobjects. >> As the user expects, the only thing JDBC would have to do is to issue >> lo_unlink() when rows are updated or deleted. >> Thus, it's currently a bug in the JDBC driver. And whatever the >> solution is, it needs to be mentioned in the JDBC docs. > > Not quite, PostgreSQL doesn't have LOB, nor OID type that is only reference > to LOB. In fact, BLOB behaviour in JDBC > is just thin wrapper for this what is missing in PostgreSQL - BLOBs. It was > build form available parts. In addition OID type may be used and it's used > as the system id or may be used as just some kind of row id - all types, > tables, sequences etc has OID. You may create table with "WITH OIDS" clause. > You may use OID data type just as replacement for (unsigned) int, so JDBC > can create LOB but it can't decide if given field in row is reference to LOB > or e.g. table and what with statements "DELETE WHERE date > "? > > In fact JDBC driver is so "stupid" that if you will call getBytes or getBlob > on any column with Oid it will ask for LOB. > > Hibernate knows this and it does what is best - calls standard BLOB > interface, and creates table with Oid column. > > And here again round trip, in case of deletion only Hibernate may delete > given LOB because only Hibernate and You knows that any value in Oid column > will reflect LOB - JDBC driver doesn't "knows " this, but... oids may be > shared because those are only numbers, not all tables may be covered by > Hibernate, and assuming huge imagination, someone may encode OID by adding > 1, may store it as long, etc. I know it's quite popular that DB schema comes > from entities, but not always You have such flexibility and sometimes You > create Entities for particular schema. > > So, as You see only this trigger approach is currently (universally) the > best way. If this is that module I think about it's just trigger which calls > unlink on replace or deletion - in fact You may write own without any > problems. Those are few lines only. > > >>> 5. If you want to use bytea use >>> @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I think you >>> should remove @Lob too) on your field. >> >> >> In fact, this annotation syntax usage looks like the Hibernate mapping >> for PostgreSQL could be enhanced. > > You have right, but Hibernate team will be in need to add auto deletion for > bulk updates, too. PostgreSQL isn't still so popular it's worth of it (and > they have no so much developers bug report may stand for months without any > comment). Look how many peoples ask for Lobs. It means no one needs true LOB > - true LOB large object stored outside table/row data space, to allow out of > statements operation like streaming, partial updates etc. This is my > definition of LOB, because this is idea of LOB. I think only DB2 stores LOBs > in row, PostgreSQL do not make this but adverts this bytea ugly approach. > > Personally, I create app which captures images from WebCam (like You) - just > frames not movies. From above reason I wanted to move to bytea, but due to > changes (wired instability and leak of backward compatibility) I still have > Oid. Because I have only two tables for Lobs I have garbage collection > simple script. > > Best regards > Radek > > >> Yours, Stefan >> >> 2012/1/8 Radosław Smogura <rsmogura@softperience.eu>: >>> >>> On Sun, 8 Jan 2012 18:08:09 +0100, Stefan Keller wrote: >>>> >>>> >>>> I'd like to backup my statement below regarding in JDBC driver from >>>> PostgreSQL: >>>> >>>> When storing fields of type BLOB it inserts the binary string in >>>> system table pg_largeobject (via user table). But when rows in user >>>> table get updated or deleted it does not update nor delete >>>> corresponding rows in table pg_largeobject. >>>> >>>> That's really a bug! >>>> >>>> Fortunately there's a solution indicated in the official docs >>>> (http://www.postgresql.org/docs/9.1/static/lo.html) >>>> To me, something like this should be implemented before hand in JDBC >>>> driver. >>>> And in any case there should be a bold note about this in the JDBC >>>> docs (http://jdbc.postgresql.org/documentation/head/binary-data.html ) >>>> >>>> Yours, Stefan >>>> >>>> >>>> >>>> 2012/1/6 Stefan Keller <sfkeller@gmail.com>: >>>>> >>>>> >>>>> Hi, >>>>> >>>>> I run into a nasty behavior of current PostgreSQL JDBC. >>>>> >>>>> I maintain images (from Webcams). In the Java and Hibernate (JPA) code >>>>> I specified a @Lob annotation on class MyData and a attribte/data type >>>>> "byte[] mydata;". Hibernate then generates two tables in PostgreSQL, >>>>> one called MyData with a column mydata of type oid and an internal one >>>>> called pg_largobjects (which contain foreign keys to the oid). That's >>>>> also explained in the JDBC docs [1], saying "PostgreSQL provides two >>>>> distinct ways to store binary data. Binary data can be stored in a >>>>> table using the data type bytea or by using the Large Object feature >>>>> which stores the binary data in a separate table in a special format >>>>> and refers to that table by storing a value of type oid in your >>>>> table." >>>>> >>>>> Now, when replacing the images (few hundred) with new ones using Java, >>>>> pg_largeobjects grew constantly until the file system run out of >>>>> space. So old image data did'nt get released! This is to me a bug >>>>> because the user/programmer must (and should) assume that there is a >>>>> strict 1:1 relationship between generated table MyData and its LOB >>>>> column data (stored in pg_largeobjects). >>>>> => I finally found the supplied module 'lo' [2] which releases >>>>> detached records. Is this the recommended way to resolve this problem? >>>>> >>>>> Searching for explanations I found a ticket HHH2244 [3] which was >>>>> closed by the Hibernate team without action referring to the JDBC >>>>> Spec. which says: "An SQL BLOB is a built-in type that stores a Binary >>>>> Large Object as a column value in a row of a database table". >>>>> => In other words: The PostgreSQL JDBC team should take action on this >>>>> but didn't until now, right? >>>>> >>>>> There is another issue about "PostgreSQL and BLOBs" [4]. First it >>>>> cites PostgreSQL JDBC docs [1]. The thread [4] ends somehow too in a >>>>> 'deadlock' concluding "...the PostgreSQLDialect (as of 3.5.5) needs >>>>> to change not to use MaterializedBlobType until the Postgres (JDBC) >>>>> team changes their driver (which does not seem to have happened in the >>>>> last 6 years)." >>>>> => Any solutions or comments form insiders on this? >>>>> >>>>> Yours, Stefan >>>>> >>>>> [1] http://jdbc.postgresql.org/documentation/head/binary-data.html >>>>> [2] http://www.postgresql.org/docs/current/interactive/lo.html >>>>> [3] https://hibernate.onjira.com/browse/HHH-2244 >>>>> [4] http://relation.to/Bloggers/PostgreSQLAndBLOBs >>> >>> >>> This is common approach for PostgreSQL and some proxy of real life Blobs. >>> One time I submitted bug about this to Hibernate. But step by step. >>> >>> 1. BLOBs are... Large Objects, they are stored as reference because those >>> objects are large, if you will store this objects as bytea then select * >>> will return all large data. It may not be comfortable not only to >>> download >>> few GB of data, but to keep this on stack too. From your perspective it >>> doesn't matters because you put it in byte[]. But if You will keep e.g. >>> CD-ROM images then it's much more better to use streaming approach then >>> bytea[]. More over due to some security JDBC driver will at least double >>> memory consumed by bytea. >>> >>> 2. Specifying hibernate data type as bytea do not resolve problems >>> because >>> it will still use LOB approach. >>> >>> 3. pg_largeobjects is system table, hibernate do not creates it. >>> >>> 4. Trigger based approach is good for this, but You need to do this >>> mannualy >>> >>> 5. If you want to use bytea use >>> @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I think you >>> should remove @Lob too) on your field. >>> >>> Regards, >>> Radosław Smogura > >
Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Oliver Jowett
Date:
On 9 January 2012 12:40, Stefan Keller <sfkeller@gmail.com> wrote: > There's an important principle that the code which "allocates" > resources is also responsible to release it. That's one resource allocation model, yes. The other common model is that resources are freed when there are no remaining references to them, i.e. a GC model. > AFAIK in this case it's JDBC which choses to use LO (which creates > pg_largeobjects entries) and it's therefore also JDBC which has to > clean up. If the application calls LargeObjectManager.create() then it's also responsible for eventually calling LargeObjectManager.unlink(). If you're using JDBC's Blob API, there's no API there to tell the driver to actually delete the underlying data (there is Blob.free(), but that appears to just be about freeing local resources, not the underlying storage). As a LO is independent storage that might have multiple references to it (the OID might be stored in many places), without explicit deletion you need a GC mechanism to collect unreferenced LOs eventually - that's what vacuumlo etc are doing. What do you suggest that the driver does differently here? (Perhaps we could do something like interpret Blob.truncate(0) as "delete the blob right now" - but is that what Hibernate actually does?) (Much of this is the whole LO vs. bytea argument all over again. If you want to store data with a lifetime that's the same as the row it's embedded in, then bytea is a much better mapping) Oliver
Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Oliver Jowett
Date:
On 9 January 2012 14:29, Stefan Keller <sfkeller@gmail.com> wrote: > 2012/1/9 Oliver Jowett <oliver@opencloud.com>: >> As a LO is independent storage that might have multiple references to> it (the OID might be stored in many places), withoutexplicit deletion> you need a GC mechanism to collect unreferenced LOs eventually -> that's what vacuumlo etc aredoing. > I can follow that. But that's not what the JDBC user expects nor is it > explained (nor mentioned) in the JDBC docs. > > From a conceptual view I have just an entity MyWebcam with an > attribute called image. Attribute image is of attribute cardinality > 1:1 (and private): > > // Java using Hibernate/JPA: > @Entity > @Lob > @Basic(fetch=FetchType.LAZY) > public class MyWebcam { > private byte[] image; > private String name; > public byte[] getImage() { return image; } > public void setImage(byte[] _image) { image=_image; } > // ... other stuff > } > > That's the classic use case. > Isn't it obvious that if setImage() sets another byte[] that the image > space get's cleared by the layers below? > And since Hibernate chose to use one variant of JDBC, it's also JDBC > which has to take care about orphans. Well, either the Hibernate mapping is misconfigured, or your database is misconfigured i.e. you are not collecting garbage LOs. If you have a suitable GC mechanism configured, then what happens? Otherwise, what should JDBC do differently here? Be specific. It would be helpful if you could provide a native JDBC example, rather than a Hibernate example, since it's not clear what JDBC calls are being made by Hibernate. Oliver
Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Stefan Keller
Date:
2012/1/9 Oliver Jowett <oliver@opencloud.com>: > As a LO is independent storage that might have multiple references to> it (the OID might be stored in many places), withoutexplicit deletion> you need a GC mechanism to collect unreferenced LOs eventually -> that's what vacuumlo etc aredoing. I can follow that. But that's not what the JDBC user expects nor is it explained (nor mentioned) in the JDBC docs. From a conceptual view I have just an entity MyWebcam with an attribute called image. Attribute image is of attribute cardinality 1:1 (and private): // Java using Hibernate/JPA: @Entity @Lob @Basic(fetch=FetchType.LAZY) public class MyWebcam { private byte[] image; private String name; public byte[] getImage() { return image; } public void setImage(byte[] _image) { image=_image; } // ... other stuff } That's the classic use case. Isn't it obvious that if setImage() sets another byte[] that the image space get's cleared by the layers below? And since Hibernate chose to use one variant of JDBC, it's also JDBC which has to take care about orphans. Yours, Stefan 2012/1/9 Oliver Jowett <oliver@opencloud.com>: > On 9 January 2012 12:40, Stefan Keller <sfkeller@gmail.com> wrote: > >> There's an important principle that the code which "allocates" >> resources is also responsible to release it. > > That's one resource allocation model, yes. The other common model is > that resources are freed when there are no remaining references to > them, i.e. a GC model. > >> AFAIK in this case it's JDBC which choses to use LO (which creates >> pg_largeobjects entries) and it's therefore also JDBC which has to >> clean up. > > If the application calls LargeObjectManager.create() then it's also > responsible for eventually calling LargeObjectManager.unlink(). > > If you're using JDBC's Blob API, there's no API there to tell the > driver to actually delete the underlying data (there is Blob.free(), > but that appears to just be about freeing local resources, not the > underlying storage). > As a LO is independent storage that might have multiple references to > it (the OID might be stored in many places), without explicit deletion > you need a GC mechanism to collect unreferenced LOs eventually - > that's what vacuumlo etc are doing. > > What do you suggest that the driver does differently here? (Perhaps we > could do something like interpret Blob.truncate(0) as "delete the blob > right now" - but is that what Hibernate actually does?) > > (Much of this is the whole LO vs. bytea argument all over again. If > you want to store data with a lifetime that's the same as the row it's > embedded in, then bytea is a much better mapping) > > Oliver
Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Stefan Keller
Date:
2012/1/9 Oliver Jowett <oliver@opencloud.com>: > Otherwise, what should JDBC do differently here? Be specific. It would First, I pretty sure that Hibernate nor the Tomcat/Java GC are misconfigured - since it works now after having installed the trigger by hand. To become more specific read the first two sections as a first hint here in this official doc: http://www.postgresql.org/docs/current/interactive/lo.html I try to trace the JDBC calls coming from Hibernate (although the problem seems to me pretty clearly located). That investigation will take some time. Yours, Stefan 2012/1/9 Oliver Jowett <oliver@opencloud.com>: > On 9 January 2012 14:29, Stefan Keller <sfkeller@gmail.com> wrote: >> 2012/1/9 Oliver Jowett <oliver@opencloud.com>: >>> As a LO is independent storage that might have multiple references to> it (the OID might be stored in many places), withoutexplicit deletion> you need a GC mechanism to collect unreferenced LOs eventually -> that's what vacuumlo etc aredoing. >> I can follow that. But that's not what the JDBC user expects nor is it >> explained (nor mentioned) in the JDBC docs. >> >> From a conceptual view I have just an entity MyWebcam with an >> attribute called image. Attribute image is of attribute cardinality >> 1:1 (and private): >> >> // Java using Hibernate/JPA: >> @Entity >> @Lob >> @Basic(fetch=FetchType.LAZY) >> public class MyWebcam { >> private byte[] image; >> private String name; >> public byte[] getImage() { return image; } >> public void setImage(byte[] _image) { image=_image; } >> // ... other stuff >> } >> >> That's the classic use case. >> Isn't it obvious that if setImage() sets another byte[] that the image >> space get's cleared by the layers below? >> And since Hibernate chose to use one variant of JDBC, it's also JDBC >> which has to take care about orphans. > > Well, either the Hibernate mapping is misconfigured, or your database > is misconfigured i.e. you are not collecting garbage LOs. If you have > a suitable GC mechanism configured, then what happens? > > Otherwise, what should JDBC do differently here? Be specific. It would > be helpful if you could provide a native JDBC example, rather than a > Hibernate example, since it's not clear what JDBC calls are being made > by Hibernate. > > Oliver
Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Oliver Jowett
Date:
On 10 January 2012 00:06, Stefan Keller <sfkeller@gmail.com> wrote: > 2012/1/9 Oliver Jowett <oliver@opencloud.com>: >> Otherwise, what should JDBC do differently here? Be specific. It would > > First, I pretty sure that Hibernate nor the Tomcat/Java GC are > misconfigured - since it works now after having installed the trigger > by hand. You misunderstand - by GC I mean the process that collects garbage LOs that are no longer referenced. I don't mean the JVM's heap GC. You need a GC process like this if you are using LOs and not managing their lifetimes explicitly from the application. Consider it part of the necessary DB setup. You've already discovered the usual mechanisms for it ('lo' or 'vacuumlo' depending on exactly what your data model looks like). > To become more specific read the first two sections as a first hint > here in this official doc: > http://www.postgresql.org/docs/current/interactive/lo.html FWIW, that documentation is pretty old (the JDBC docs now live separately on jdbc.postgresql.org; the JDBC references in that appendix are mostly historical) But I'm not sure quite what you're referring to - those docs are fairly clear about what you need to do? Specifically: > Now this is fine for PostgreSQL-specific applications, but standard code using JDBC or ODBC won't delete the objects, resultingin orphan objects — objects that are not referenced by anything, and simply occupy disk space. Which is exactly my point - if you are going to use generic JDBC code that does not explicitly delete LOs when they become detached, then you need a separate mechanism to clean them up - that's just the way the model works. If you want to avoid that, don't use LOs, use bytea. So I'm still confused about what you'd like to see changed in the JDBC driver. Can you explain? Oliver
Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Oliver Jowett
Date:
On 10 January 2012 00:29, Oliver Jowett <oliver@opencloud.com> wrote: > So I'm still confused about what you'd like to see changed in the JDBC > driver. Can you explain? Perhaps what you're looking for here is "it all just works out of the box". In that case, the missing piece seems to be that the DDL that Hibernate emits (or the DDL which you have set up by hand - I don't know how you have things set up) does not match the data model that Hibernate is expecting. If Hibernate is expecting a data model where the lifecycle of the LO is managed by the database and there's only at most one reference to a particular LO (i.e. you can't link to the same LO from multiple places), then it can set up appropriate cleanup triggers on LO columns as part of its DDL automatically. Or it could just map blobs to bytea, which might be a more natural mapping in that case anyway and doesn't require triggers etc. So you'd need to talk to the Hibernate guys about making those changes. Oliver
Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Radosław Smogura
Date:
On Mon, 9 Jan 2012 00:40:08 +0100, Stefan Keller wrote: > 2012/1/8 Radosław Smogura <rsmogura@softperience.eu> wrote: >> Not quite, PostgreSQL doesn't have LOB, nor OID type that is only >> reference> to LOB. In fact, BLOB behaviour in JDBC> is just thin >> wrapper for this what is missing in PostgreSQL - BLOBs. > > I can't follow: PostgreSQL has bytea and LO. I'm ok with bytea too > but > I'd like to get the job done with LO too. > JDBC offers both: > http://jdbc.postgresql.org/documentation/head/binary-data.html > I tried to summarize the characteristics here: > http://www.gis.hsr.ch/wiki/PostgreSQL_-_Binary_Large_Objects > > There's an important principle that the code which "allocates" > resources is also responsible to release it. > AFAIK in this case it's JDBC which choses to use LO (which creates > pg_largeobjects entries) and it's therefore also JDBC which has to > clean up. Hmm we have really different point of view. JDBC driver is just interface which translates JDBC "commands" to database commands. JDBC driver is not database. So responsible for this is database, but database doesn't have "real" BLOBs, this what is made in PG JDBC driver is just "not perfect" way to add this functionality to PostgreSQL. To make complete system You need PostgreSQL, JDBC driver, and this trigger. From reasons I mentioned before it's quite hard for JDBC driver to implement deallocation, because driver will be in need to ask and to "know" for OIDs to unlink. This could made from driver database which need to parse queries, manages db structure etc, and this will be executed twice in driver and in database - this is ineffective. Just for example driver may support this for "DELETE FROM", but what will be if deletion will be result of calling stored procedure? What if stored procedure will be created in language different then pgSQL (driver still may try to download it), but if this will be C, or any other? JDBC driver must be universal and fully "compatible" with PG functionality, as it may be used only as "helper" for presentation layer, true processing logic may be somewhere e. g. legacy system for which we add web interface. Because of this JDBC driver can't expose desired behavior, even if everyone agrees that current behavior is unperfect. Only PG server may do this, or Hibernate for example by new annotation @MangedPgLob. Regards, Radek > Yours, Stefan > > P.S. Just in order get some more insight I also tried to forward this > to the JDBC list (which currently seems to have problems accepting > new > subscriptions). > > 2012/1/8 Radosław Smogura <rsmogura@softperience.eu>: >> On Sun, 8 Jan 2012 21:57:37 +0100, Stefan Keller wrote: >>> >>> Thanks, Radosław, for the clarification. >>> >>> 2012/1/8 Radosław Smogura <rsmogura@softperience.eu> wrote: >>>> >>>> 3. pg_largeobjects is system table, hibernate do not creates it. >>>> >>>> 4. Trigger based approach is good for this, but You need to do >>>> this >>>> mannualy >>> >>> >>> That's exactly my point: >>> Since JDBC manages creation of OID and pg_largeobjects it's also >>> JDBC >>> which is responsible for update/delete of rows in pg_largeobjects. >>> As the user expects, the only thing JDBC would have to do is to >>> issue >>> lo_unlink() when rows are updated or deleted. >>> Thus, it's currently a bug in the JDBC driver. And whatever the >>> solution is, it needs to be mentioned in the JDBC docs. >> >> Not quite, PostgreSQL doesn't have LOB, nor OID type that is only >> reference >> to LOB. In fact, BLOB behaviour in JDBC >> is just thin wrapper for this what is missing in PostgreSQL - BLOBs. >> It was >> build form available parts. In addition OID type may be used and >> it's used >> as the system id or may be used as just some kind of row id - all >> types, >> tables, sequences etc has OID. You may create table with "WITH OIDS" >> clause. >> You may use OID data type just as replacement for (unsigned) int, so >> JDBC >> can create LOB but it can't decide if given field in row is >> reference to LOB >> or e.g. table and what with statements "DELETE WHERE date > "? >> >> In fact JDBC driver is so "stupid" that if you will call getBytes or >> getBlob >> on any column with Oid it will ask for LOB. >> >> Hibernate knows this and it does what is best - calls standard BLOB >> interface, and creates table with Oid column. >> >> And here again round trip, in case of deletion only Hibernate may >> delete >> given LOB because only Hibernate and You knows that any value in Oid >> column >> will reflect LOB - JDBC driver doesn't "knows " this, but... oids >> may be >> shared because those are only numbers, not all tables may be covered >> by >> Hibernate, and assuming huge imagination, someone may encode OID by >> adding >> 1, may store it as long, etc. I know it's quite popular that DB >> schema comes >> from entities, but not always You have such flexibility and >> sometimes You >> create Entities for particular schema. >> >> So, as You see only this trigger approach is currently (universally) >> the >> best way. If this is that module I think about it's just trigger >> which calls >> unlink on replace or deletion - in fact You may write own without >> any >> problems. Those are few lines only. >> >> >>>> 5. If you want to use bytea use >>>> @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I >>>> think you >>>> should remove @Lob too) on your field. >>> >>> >>> In fact, this annotation syntax usage looks like the Hibernate >>> mapping >>> for PostgreSQL could be enhanced. >> >> You have right, but Hibernate team will be in need to add auto >> deletion for >> bulk updates, too. PostgreSQL isn't still so popular it's worth of >> it (and >> they have no so much developers bug report may stand for months >> without any >> comment). Look how many peoples ask for Lobs. It means no one needs >> true LOB >> - true LOB large object stored outside table/row data space, to >> allow out of >> statements operation like streaming, partial updates etc. This is my >> definition of LOB, because this is idea of LOB. I think only DB2 >> stores LOBs >> in row, PostgreSQL do not make this but adverts this bytea ugly >> approach. >> >> Personally, I create app which captures images from WebCam (like >> You) - just >> frames not movies. From above reason I wanted to move to bytea, but >> due to >> changes (wired instability and leak of backward compatibility) I >> still have >> Oid. Because I have only two tables for Lobs I have garbage >> collection >> simple script. >> >> Best regards >> Radek >> >> >>> Yours, Stefan >>> >>> 2012/1/8 Radosław Smogura <rsmogura@softperience.eu>: >>>> >>>> On Sun, 8 Jan 2012 18:08:09 +0100, Stefan Keller wrote: >>>>> >>>>> >>>>> I'd like to backup my statement below regarding in JDBC driver >>>>> from >>>>> PostgreSQL: >>>>> >>>>> When storing fields of type BLOB it inserts the binary string in >>>>> system table pg_largeobject (via user table). But when rows in >>>>> user >>>>> table get updated or deleted it does not update nor delete >>>>> corresponding rows in table pg_largeobject. >>>>> >>>>> That's really a bug! >>>>> >>>>> Fortunately there's a solution indicated in the official docs >>>>> (http://www.postgresql.org/docs/9.1/static/lo.html) >>>>> To me, something like this should be implemented before hand in >>>>> JDBC >>>>> driver. >>>>> And in any case there should be a bold note about this in the >>>>> JDBC >>>>> docs >>>>> (http://jdbc.postgresql.org/documentation/head/binary-data.html ) >>>>> >>>>> Yours, Stefan >>>>> >>>>> >>>>> >>>>> 2012/1/6 Stefan Keller <sfkeller@gmail.com>: >>>>>> >>>>>> >>>>>> Hi, >>>>>> >>>>>> I run into a nasty behavior of current PostgreSQL JDBC. >>>>>> >>>>>> I maintain images (from Webcams). In the Java and Hibernate >>>>>> (JPA) code >>>>>> I specified a @Lob annotation on class MyData and a >>>>>> attribte/data type >>>>>> "byte[] mydata;". Hibernate then generates two tables in >>>>>> PostgreSQL, >>>>>> one called MyData with a column mydata of type oid and an >>>>>> internal one >>>>>> called pg_largobjects (which contain foreign keys to the oid). >>>>>> That's >>>>>> also explained in the JDBC docs [1], saying "PostgreSQL provides >>>>>> two >>>>>> distinct ways to store binary data. Binary data can be stored in >>>>>> a >>>>>> table using the data type bytea or by using the Large Object >>>>>> feature >>>>>> which stores the binary data in a separate table in a special >>>>>> format >>>>>> and refers to that table by storing a value of type oid in your >>>>>> table." >>>>>> >>>>>> Now, when replacing the images (few hundred) with new ones using >>>>>> Java, >>>>>> pg_largeobjects grew constantly until the file system run out of >>>>>> space. So old image data did'nt get released! This is to me a >>>>>> bug >>>>>> because the user/programmer must (and should) assume that there >>>>>> is a >>>>>> strict 1:1 relationship between generated table MyData and its >>>>>> LOB >>>>>> column data (stored in pg_largeobjects). >>>>>> => I finally found the supplied module 'lo' [2] which releases >>>>>> detached records. Is this the recommended way to resolve this >>>>>> problem? >>>>>> >>>>>> Searching for explanations I found a ticket HHH2244 [3] which >>>>>> was >>>>>> closed by the Hibernate team without action referring to the >>>>>> JDBC >>>>>> Spec. which says: "An SQL BLOB is a built-in type that stores a >>>>>> Binary >>>>>> Large Object as a column value in a row of a database table". >>>>>> => In other words: The PostgreSQL JDBC team should take action >>>>>> on this >>>>>> but didn't until now, right? >>>>>> >>>>>> There is another issue about "PostgreSQL and BLOBs" [4]. First >>>>>> it >>>>>> cites PostgreSQL JDBC docs [1]. The thread [4] ends somehow too >>>>>> in a >>>>>> 'deadlock' concluding "...the PostgreSQLDialect (as of 3.5.5) >>>>>> needs >>>>>> to change not to use MaterializedBlobType until the Postgres >>>>>> (JDBC) >>>>>> team changes their driver (which does not seem to have happened >>>>>> in the >>>>>> last 6 years)." >>>>>> => Any solutions or comments form insiders on this? >>>>>> >>>>>> Yours, Stefan >>>>>> >>>>>> [1] >>>>>> http://jdbc.postgresql.org/documentation/head/binary-data.html >>>>>> [2] http://www.postgresql.org/docs/current/interactive/lo.html >>>>>> [3] https://hibernate.onjira.com/browse/HHH-2244 >>>>>> [4] http://relation.to/Bloggers/PostgreSQLAndBLOBs >>>> >>>> >>>> This is common approach for PostgreSQL and some proxy of real life >>>> Blobs. >>>> One time I submitted bug about this to Hibernate. But step by >>>> step. >>>> >>>> 1. BLOBs are... Large Objects, they are stored as reference >>>> because those >>>> objects are large, if you will store this objects as bytea then >>>> select * >>>> will return all large data. It may not be comfortable not only to >>>> download >>>> few GB of data, but to keep this on stack too. From your >>>> perspective it >>>> doesn't matters because you put it in byte[]. But if You will keep >>>> e.g. >>>> CD-ROM images then it's much more better to use streaming approach >>>> then >>>> bytea[]. More over due to some security JDBC driver will at least >>>> double >>>> memory consumed by bytea. >>>> >>>> 2. Specifying hibernate data type as bytea do not resolve problems >>>> because >>>> it will still use LOB approach. >>>> >>>> 3. pg_largeobjects is system table, hibernate do not creates it. >>>> >>>> 4. Trigger based approach is good for this, but You need to do >>>> this >>>> mannualy >>>> >>>> 5. If you want to use bytea use >>>> @Type(type="org.hibernate.type.PrimitiveByteArrayBlobType") (I >>>> think you >>>> should remove @Lob too) on your field. >>>> >>>> Regards, >>>> Radosław Smogura >> >>
Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Alban Hertroys
Date:
On 9 January 2012 14:55, Radosław Smogura <rsmogura@softperience.eu> wrote: > So responsible for this is database, but database doesn't have > "real" BLOBs, this what is made in PG JDBC driver is just "not perfect" way > to add this functionality to PostgreSQL. I think you should elaborate on what you mean when you say that Postgres doesn't have "real" BLOBs. This discussion did make me wonder about something in Postgres' LOB-support though. As explained earlier, the current implementation allows for dedubbing LOB's, so that it's not necessary to store the same large(!) object multiple times. That is also what's causing this issue with the JDBC driver, or perhaps Hybernate in particular. However, shouldn't it be up to the application designer to dedup large objects or not? The current implementation is probably rather convenient for projects where duplicate large objects are common, but - as it turns out - it can be quite inconvenient when that's opposite to expectations. ISTMT this behaviour should at least be optional. Of course that raises the question what should happen with an existing set of LOB's when that setting gets changed. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.
Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Thomas Markus
Date:
Hi, thats not a bug its a feature ;) the combination hibernate/pgjdbc uses pg large objects for byte[] and Blob properties so only oids are referenced. Use 'vacuumlo' to free up your space. regards Thomas
Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Radosław Smogura
Date:
Dnia poniedziałek, 9 stycznia 2012 o 15:52:23 Alban Hertroys napisał(a): > On 9 January 2012 14:55, Radosław Smogura <rsmogura@softperience.eu> wrote: > > So responsible for this is database, but database doesn't have > > "real" BLOBs, this what is made in PG JDBC driver is just "not perfect" > > way to add this functionality to PostgreSQL. > > I think you should elaborate on what you mean when you say that > Postgres doesn't have "real" BLOBs. > > This discussion did make me wonder about something in Postgres' > LOB-support though. As explained earlier, the current implementation > allows for dedubbing LOB's, so that it's not necessary to store the > same large(!) object multiple times. That is also what's causing this > issue with the JDBC driver, or perhaps Hybernate in particular. > > However, shouldn't it be up to the application designer to dedup large > objects or not? > > The current implementation is probably rather convenient for projects > where duplicate large objects are common, but - as it turns out - it > can be quite inconvenient when that's opposite to expectations. > > ISTMT this behaviour should at least be optional. Of course that > raises the question what should happen with an existing set of LOB's > when that setting gets changed. In real world BLOBs are transfered as references, and those references are managed in way as the trigger does. Nacked PG doesn't support deletion, Oid is universal type so it can't be used by GC approach, unles collector will know which Oid is LOB oid. Oid is like void*, it's abstarct pointer. If you get void* you don't know if data referenced by it represent person row, or car row, you don't know if void* is even reference or just 64 bit number. Current implementation is not type safe. You can't just write UPDATE TABLE x SET blob = 'aadfasfasfda' which in current times should be supported, but you may write (if are not fully familiar with db) UPDATE table X set varchar_d = blob_column; In fact LOB's id may be stored even as varchar. So true is that PG supports LOBs, but due to missing functionality LOBs are quite hard to manage. It's like car withot steering wheel - you may drive, but it's little bit hard. Regards, Radek
Re: Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Alban Hertroys
Date:
On 9 Jan 2012, at 18:57, Radosław Smogura wrote: > In real world BLOBs are transfered as references, and those references are > managed in way as the trigger does. Nacked PG doesn't support deletion, Oid is > universal type so it can't be used by GC approach, unles collector will know > which Oid is LOB oid. What do you mean by "nacked"? You can unlink lob's, there's your support for deletion. > Oid is like void*, it's abstarct pointer. If you get void* you don't know if > data referenced by it represent person row, or car row, you don't know if > void* is even reference or just 64 bit number. Current implementation is not > type safe. You can't just write UPDATE TABLE x SET blob = 'aadfasfasfda' which > in current times should be supported, but you may write (if are not fully > familiar with db) UPDATE table X set varchar_d = blob_column; That's easy to remedy, similar to how most implementations in C don't use straight void pointers. In C you'd just typedefthem to something meaningful: typedef blob oid; Similarly you can wrap them in a domain in PG: create domain blob as oid; It would be cool if that would allow to add an FK-constraint to the oid in pg_largeobject to that domain, but alas, thatisn't possible in my version (I'm a bit behind with pg 8.4). I agree that it would be nice if PG provided a built-in type for lobs (blob's are a subdivision of those), especially ifthat would also handle the reference to pg_largeobject. > In fact LOB's id may be stored even as varchar. So true is that PG supports > LOBs, but due to missing functionality LOBs are quite hard to manage. It's > like car withot steering wheel - you may drive, but it's little bit hard. That's probably just because PG knows to cast that varchar to something compatible with oid's. I suspect that in recent versionsthat cast may not be allowed anymore though. And remember, SELECT 'my explicit string value'; does not in fact denote a string value, but a literal. While the query isstill in SQL notation (meaning until the query parser is done with it), everything is text. The way I understand it, a literal gets a meaningful type once it is compared to a value of a known type (typically froma column) or once it gets cast to a type explicitly. If that never happens, I expect that the literal will not be convertedto any type and stay the text value that it was in the SQL query string. This is probably documented, but I don't have time to dig into the manuals right now. Alban Hertroys -- Screwing up is an excellent way to attach something to the ceiling.
Re: [JDBC] Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Oliver Jowett
Date:
On 10 January 2012 06:57, Radosław Smogura <rsmogura@softperience.eu> wrote: > In real world BLOBs are transfered as references, and those references are > managed in way as the trigger does. Nacked PG doesn't support deletion, Oid is > universal type so it can't be used by GC approach, unles collector will know > which Oid is LOB oid. So you just end up with a conservative collector, not an exact collector. In practice conservative collectors work OK. (You can use a subtype to identify OIDs-that-refer-to-a-LO as suggested elsewhere in the thread if you want an exact collector) Oliver
Re: [JDBC] Binary Large Objects (LOB/BLOB) in Hibernate and JDBC: Unresolved issues
From
Stefan Keller
Date:
I'm still fumbling in the dark but I think I have a smell: Does somebody know what supportsLobValueChangePropogation according to the current JDBC specs? There's an interesting note there: > NOTE : I do not know the correct answer currently for databases which (1) are not part of the cruise control process or(2) do not supportsExpectedLobUsagePattern(). http://srcrr.org/java/hibernate/3.6.2/reference/org/hibernate/dialect/PostgreSQLDialect.html#supportsLobValueChangePropogation() It's currently set to false: https://github.com/hibernate/hibernate-core/blob/master/hibernate-core/src/main/java/org/hibernate/dialect/PostgreSQLDialect.java Yours, Stefan 2012/1/9 Oliver Jowett <oliver@opencloud.com>: > On 10 January 2012 06:57, Radosław Smogura <rsmogura@softperience.eu> wrote: > >> In real world BLOBs are transfered as references, and those references are >> managed in way as the trigger does. Nacked PG doesn't support deletion, Oid is >> universal type so it can't be used by GC approach, unles collector will know >> which Oid is LOB oid. > > So you just end up with a conservative collector, not an exact > collector. In practice conservative collectors work OK. (You can use a > subtype to identify OIDs-that-refer-to-a-LO as suggested elsewhere in > the thread if you want an exact collector) > > Oliver