Thread: BLOB / CLOB support in PostgreSQL
Just in case, I'm PgJDBC committer.
PgJDBC receives requests to "support CLOB" from time to time, however, I believe it is impossible without the support from the database.
To my best knowledge, the database does not have APIs for "streaming large text data".
The only "streaming large binary data" API I know is LargeObject which seems to be old-fashioned.
I believe Java is not the only client that wants streaming access for binary and text data.
Here's a recent pull request to PgJDBC https://github.com/pgjdbc/pgjdbc/pull/1892 where Andrew suggests to add BLOB/CLOB support via bytea/text,
and apparently, Andrew is surprised that the database lacks BLOB/CLOB support.
Any ideas on the way to proceed here?
I don't think it is right to implement Clob via text, especially in case the database provides its own "large text with streaming" datatype in the future.
On 9/28/20 9:46 AM, Vladimir Sitnikov wrote: > Let me please bump the thread. > > Just in case, I'm PgJDBC committer. > > PgJDBC receives requests to "support CLOB" from time to time, however, > I believe it is impossible without the support from the database. > To my best knowledge, the database does not have APIs for "streaming > large text data". > The only "streaming large binary data" API I know is LargeObject which > seems to be old-fashioned. > > I believe Java is not the only client that wants streaming access for > binary and text data. > > Here's a recent pull request to PgJDBC > https://github.com/pgjdbc/pgjdbc/pull/1892 where Andrew suggests to > add BLOB/CLOB support via bytea/text, > and apparently, Andrew is surprised that the database lacks BLOB/CLOB > support. I didn't at all say I was surprised. I said I didn't recall seeing a call for what you're talking about, and indeed there has been very little. > > Any ideas on the way to proceed here? > I don't think it is right to implement Clob via text, especially in > case the database provides its own "large text with streaming" > datatype in the future. > > The concerns to avoid "Clob maps to text" could be: > a) Once the behavior is implemented, it is hard to change. That is > applications would rely on it (and it becomes a defacto standard), and > it would be hard to move to the proper "text with streaming API" datatype. > b) If we make «clob is text», then people might start using > update/substring APIs (which is the primary motivation for Clob) > without realizing there’s full value update behind the scenes. > Currently, they can use setString/getString for text, and it is > crystal clear that the text is updated fully on every update. > > I think your concerns are seriously misplaced. In the first place, what I have proposed is an optional behaviour, and it does not at all preclude other behaviour being made available in future, as well as the current behaviour which would remain the default. Furthermore, it's consistent with how the MySQL driver behaves, AIUI, and possibly others as well. That consistency is a major motivator for the work I've done. Do you suggest they should rip out their Clob interface until MySQL supports a streaming data type? But this is hardly the place for such arguments. To the best of my knowledge there is no concrete proposal for the type of data type / interface you suggest. I don't even have any very good idea what such a thing would look like. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andrew>consistent with how the MySQL driver behaves, AIUI,
On 9/28/20 4:44 PM, Vladimir Sitnikov wrote: > Andrew>To the best of my knowledge there is no concrete proposal for > the type > Andrew>of data type / interface you suggest. > > To the best of my knowledge, one of the concrete proposals was in the very > first message. > > Let me please copy it: > > Vladimir> **Here goes the question**: do you think such an implementation > Vladimir> ("large string stored in Large Objects" could be merged into > the core eventually)? You and I clearly have a different idea from what constitutes a concrete proposal. This is hardly the ghost of a proposal. > > --- > > Andrew>In the first place, what > Andrew>I have proposed is an optional behaviour > > Adding optional behaviors makes support complicated, so flags should > be avoided when > a standard solution is possible. > > Even if you name the behavior "optional", people would still rely on it. > For instance, you implement Clob.truncate(int) as in-driver-memory > truncate operation, however, > in practice, that API should truncate the value in the database. > > Just in case: the current behavior for LO-based lobs is that > Blob.truncate(int) immediately truncates LO. > So if the application works with the current LO mode, then they would > get a behavior change if they flip the switch. > > The same for setString and other modifications. > > An escape hatch could be like "ok, let's throw unimplemented for clob > modifications", however, it contradicts the spec > which says: > > JDBC Spec> * All methods on the <code>Clob</code> interface must be > fully implemented if the > JDBC Spec> * JDBC driver supports the data type. > > What I say here is that the behavior you suggest spans far away from > just "blob maps to bytea". Again, the truncate() I implemented is 100% compatible with the MySQL driver. I just checked the MSSQL driver and it too just appears to truncate what's in memory. So maybe we wouldn't be in such bad company. > > ---- > > Andrew>Furthermore, it's > Andrew>consistent with how the MySQL driver behaves, AIUI, > Andrew>and possibly others as well > > Oracle DB behaves differently. They have BLOB/CLOB column types, and > Clob/Blob interfaces > map to CLOB/BLOB. > > Andrew>That consistency is a major motivator for the work I've done. > > PostgreSQL supports large binary via LargeObject API, so the driver > maps Blob to LO. > > If people want to use bytea, they can use `setBinaryStream(...)`. > What does drive people to use Clob/Blob when the database lacks APIs > for it? > > The reason for my proposal (as I stated more than once) is that people want to be able to use the same code across databases. cheers andrew. -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>100% compatible with the MySQLIt is hardly a justification for a feature or for a change request.Vladimir
On Tue, Sep 29, 2020 at 5:22 AM Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:>100% compatible with the MySQLIt is hardly a justification for a feature or for a change request.Vladimir
Glad to see this topic.The obviously different opinion for this feature is based on if we need a "perfect" solutionor a "OK-to-most-user cases" solution.As for PG core developers, I'm +1 with "pg have their own serious problems" , andwe are lacking the resources to handle everything well. However, "serious problems"to different people may be different.As a rare experienced Java developer, looks raise "NotImplemented" error for someunimplemented APIs will not make the maintenance work hard, that probably not commonused APIs. Not fully supported API should be better than fully not supported APIs at all.
As an Oracle DBA before, I do see users need CLOB/BLOB some time but for most of them,they just want to save/get big stuff. This case in Oracle may be more outstanding because ofthe max length of varchar2 is too low.
When come to the JDBC standardJDBC Spec> * All methods on the <code>Clob</code> interface must be fully implemented if theJDBC Spec> * JDBC driver supports the data type.What would be the sense behind this? This is not reasonable based on limited experience.
On 2020-09-28 15:46, Vladimir Sitnikov wrote: > The concerns to avoid "Clob maps to text" could be: > a) Once the behavior is implemented, it is hard to change. That is > applications would rely on it (and it becomes a defacto standard), and > it would be hard to move to the proper "text with streaming API" datatype. > b) If we make «clob is text», then people might start using > update/substring APIs (which is the primary motivation for Clob) without > realizing there’s full value update behind the scenes. Currently, they > can use setString/getString for text, and it is crystal clear that the > text is updated fully on every update. When we added TOAST, we made the explicit decision to not add a "LONG" type but instead have the toasting mechanism transparent in all variable-length types. And that turned out to be a very successful decision, because it allows this system to be used by all data types, not only one or two hardcoded ones. Therefore, I'm very strongly of the opinion that if a streaming system of the sort you allude to were added, it would also be added transparently into the TOAST system. The JDBC spec says """ An implementation of a Blob, Clob or NClob object may either be locator based or result in the object being fully materialized on the client. By default, a JDBC driver should implement the Blob, Clob and NClob interfaces using the appropriate locator type. An application does not deal directly with the locator types that are defined in SQL. """ (A "locator" in SQL is basically what you might call a streaming handle.) So yes, this encourages the implementation of locators. But it also specifies that if you don't have locators, you can implement this using non-large-object types. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 9/29/20 10:26 AM, Peter Eisentraut wrote: > On 2020-09-28 15:46, Vladimir Sitnikov wrote: >> The concerns to avoid "Clob maps to text" could be: >> a) Once the behavior is implemented, it is hard to change. That is >> applications would rely on it (and it becomes a defacto standard), >> and it would be hard to move to the proper "text with streaming API" >> datatype. >> b) If we make «clob is text», then people might start using >> update/substring APIs (which is the primary motivation for Clob) >> without realizing there’s full value update behind the scenes. >> Currently, they can use setString/getString for text, and it is >> crystal clear that the text is updated fully on every update. > > When we added TOAST, we made the explicit decision to not add a "LONG" > type but instead have the toasting mechanism transparent in all > variable-length types. And that turned out to be a very successful > decision, because it allows this system to be used by all data types, > not only one or two hardcoded ones. Therefore, I'm very strongly of > the opinion that if a streaming system of the sort you allude to were > added, it would also be added transparently into the TOAST system. > > The JDBC spec says > > """ > An implementation of a Blob, Clob or NClob object may either be > locator based or result in the object being fully materialized on the > client. > > By default, a JDBC driver should implement the Blob, Clob and NClob > interfaces using the appropriate locator type. An application does not > deal directly with the locator types that are defined in SQL. > """ > > (A "locator" in SQL is basically what you might call a streaming handle.) > > So yes, this encourages the implementation of locators. But it also > specifies that if you don't have locators, you can implement this > using non-large-object types. > > So if I read this correctly what I have proposed is completely kosher according to the spec - it's the "fully materialized on the client" variant, just like the MySQL and MSSQL drivers. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 9/29/20 10:26 AM, Peter Eisentraut wrote:
> On 2020-09-28 15:46, Vladimir Sitnikov wrote:
>> The concerns to avoid "Clob maps to text" could be:
>> a) Once the behavior is implemented, it is hard to change. That is
>> applications would rely on it (and it becomes a defacto standard),
>> and it would be hard to move to the proper "text with streaming API"
>> datatype.
>> b) If we make «clob is text», then people might start using
>> update/substring APIs (which is the primary motivation for Clob)
>> without realizing there’s full value update behind the scenes.
>> Currently, they can use setString/getString for text, and it is
>> crystal clear that the text is updated fully on every update.
>
> When we added TOAST, we made the explicit decision to not add a "LONG"
> type but instead have the toasting mechanism transparent in all
> variable-length types. And that turned out to be a very successful
> decision, because it allows this system to be used by all data types,
> not only one or two hardcoded ones. Therefore, I'm very strongly of
> the opinion that if a streaming system of the sort you allude to were
> added, it would also be added transparently into the TOAST system.
>
> The JDBC spec says
>
> """
> An implementation of a Blob, Clob or NClob object may either be
> locator based or result in the object being fully materialized on the
> client.
>
> By default, a JDBC driver should implement the Blob, Clob and NClob
> interfaces using the appropriate locator type. An application does not
> deal directly with the locator types that are defined in SQL.
> """
>
> (A "locator" in SQL is basically what you might call a streaming handle.)
>
> So yes, this encourages the implementation of locators. But it also
> specifies that if you don't have locators, you can implement this
> using non-large-object types.
>
>
So if I read this correctly what I have proposed is completely kosher
according to the spec - it's the "fully materialized on the client"
variant, just like the MySQL and MSSQL drivers.
On 9/29/20 2:39 PM, Dave Cramer wrote: > > > On Tue, 29 Sep 2020 at 14:33, Andrew Dunstan > <andrew.dunstan@2ndquadrant.com > <mailto:andrew.dunstan@2ndquadrant.com>> wrote: > > > On 9/29/20 10:26 AM, Peter Eisentraut wrote: > > On 2020-09-28 15:46, Vladimir Sitnikov wrote: > >> The concerns to avoid "Clob maps to text" could be: > >> a) Once the behavior is implemented, it is hard to change. That is > >> applications would rely on it (and it becomes a defacto standard), > >> and it would be hard to move to the proper "text with streaming > API" > >> datatype. > >> b) If we make «clob is text», then people might start using > >> update/substring APIs (which is the primary motivation for Clob) > >> without realizing there’s full value update behind the scenes. > >> Currently, they can use setString/getString for text, and it is > >> crystal clear that the text is updated fully on every update. > > > > When we added TOAST, we made the explicit decision to not add a > "LONG" > > type but instead have the toasting mechanism transparent in all > > variable-length types. And that turned out to be a very successful > > decision, because it allows this system to be used by all data > types, > > not only one or two hardcoded ones. Therefore, I'm very strongly of > > the opinion that if a streaming system of the sort you allude to > were > > added, it would also be added transparently into the TOAST system. > > > > The JDBC spec says > > > > """ > > An implementation of a Blob, Clob or NClob object may either be > > locator based or result in the object being fully materialized > on the > > client. > > > > By default, a JDBC driver should implement the Blob, Clob and NClob > > interfaces using the appropriate locator type. An application > does not > > deal directly with the locator types that are defined in SQL. > > """ > > > > (A "locator" in SQL is basically what you might call a streaming > handle.) > > > > So yes, this encourages the implementation of locators. But it also > > specifies that if you don't have locators, you can implement this > > using non-large-object types. > > > > > > So if I read this correctly what I have proposed is completely kosher > according to the spec - it's the "fully materialized on the client" > variant, just like the MySQL and MSSQL drivers. > > > I haven't really looked at MySQL or MSSQL but do they implement the > full CLOB API ? > We would need to implement the full API. > > BTW, just because it adheres to the spec doesn't seem to hold water in > the PostgreSQL project. Just sayin' > > I take your point, but my remark was more in response to the apparent suggestion that what I submitted was not according to spec. There are two Clob methods I didn't implement, and one Blob method - the set*Stream methods, I think they should be implementable, but they will make the implementation somewhat more complex. Anyway, at this stage let's take the discussion back to the github forums. cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 9/29/20 3:48 PM, Vladimir Sitnikov wrote: > Andrew>You and I clearly have a different idea from what constitutes a > concrete > Andrew>proposal. This is hardly the ghost of a proposal. > > Can you please clarify what is a proposal from your point of view? > Is it documented? > > I think I have read the relevant TODO items: > https://wiki.postgresql.org/wiki/Developer_FAQ#What_do_I_do_after_choosing_an_item_to_work_on.3F > > Wiki clearly suggests posting a mail to pgsql-hackers before starting > work. > > A concrete proposal needs to be more than "a feature that does X". It needs to contain a substantial implementation plan. What structures will be affected, what APIS, what protocol changes and so on. You don't need to have the code for these things but you do need a description of what's intended by way of implementation that is detailed enough for the community to critique. If you don't the danger is that you will spend a lot of time coding and then present it to the community and they will say "Oh, the design is all wrong." That's happened to a number of people in the past, including some quite high profile people, and it's very sad and frustrating for everyone when it happens. A feature for streaming large data types could well be very valuable, but right at the moment I at least don't have any idea what such a thing could look like (and as you might imagine I'm quite interested). cheers andrew -- Andrew Dunstan https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From: Vladimir Sitnikov <sitnikov.vladimir@gmail.com>
> Just in case, I'm PgJDBC committer.
Thank you very much for your great efforts for the wonderful PgJDBC. I saw you active.
# I'd be happy if you send emails in text format so that the reply looks nice. Your email seems to be in HTML.
> and apparently, Andrew is surprised that the database lacks BLOB/CLOB support.
I was a bit surprised too when I first saw Postgres not support blob/clob but bytea, because I had an impression that Postgres is highly SQL standard compliant. I'm for adding blob/clob data types in server.
At the same time, I wonder why Postgres had to add bytea instead of blob. It may be that there are or were some technical issues. They may stand in the way even now.
One thing I can think of is the parameter format (text/binary). libpq's PQexecParams() can specify input format for each parameter, but it can only specify the output format for all returned columns, not for each column. As a consequence, the bytea host variable support added in PG 12 can INSERT 1 GB of binary data, but retrieval of the value fails with an error message like "invalid alloc request." That's because the server allocates twice the size of stored data to convert it into text format, whose size becomes about 2 GB. That exceeds the limit palloc() can allocate.
33.3. Command Execution Functions
https://www.postgresql.org/docs/devel/libpq-exec.html
> The concerns to avoid "Clob maps to text" could be:
> a) Once the behavior is implemented, it is hard to change. That is applications would rely on it (and it becomes a defacto standard), and it would be hard to move to the proper "text with streaming API" datatype.
> b) If we make «clob is text», then people might start using update/substring APIs (which is the primary motivation for Clob) without realizing there’s full value update behind the scenes. Currently, they can use setString/getString for text, and it is crystal clear that the text is updated fully on every update.
And if we treat clob as a synonym for text (just like the relationship between char and nchar), even when the user writes clob in DDL, pg_dump will output it as text. That makes it a bit harder to use the output for other DBMSs.
Regards
Takayuki Tsunakawa