Thread: BLOB / CLOB support in PostgreSQL

BLOB / CLOB support in PostgreSQL

From
Vladimir Sitnikov
Date:
Hi,

According to Pgjdbc GitHub statistics, the most popular page is https://github.com/pgjdbc/pgjdbc/issues/1102 which is
 "org.postgresql.jdbc.PgConnection.createClob() is not yet implemented" issue (1600 visits from 1400 unique visitors per a fortnight).

There are workarounds to silence the error, however I'm sure CLOB (as in "streaming text datatype") is not yet supported in PostgreSQL backend.

I have browsed pgsql-hackers mailing list re CLOB, and it looks like there's no relevant discussion, so I'm quite sure I've done my homework on "check prior mails regarding the subject".

**Issue**: there's no sensible way to map java.sql.Clob to the existing backend datatypes. `text` can't stream (it always loads the data fully), and it is 1GB limited.

Java distinguishes java.sql.Blob and java.sql.Clob.
Blob is a "binary stream with streaming features". It can be mapped to existing "Large Objects", and existing Large Object API somewhat suits for the implementation.
There are glitches (like "pgjdbc has to perform 4 API calls tell/seek/tell/seek in order just to get LO length once"), however it is fine.

Java Clob API is just a dozen of methods (13 to be exact), however there are two major issues there:
1) "Large Object" is just a binary object. There's no way to tell if the contents is a UTF-8 string or Windows-1251 string or protobuf-encoded message or whatever.
That is if pgjdbc encodes java.sql.Clob (large string) into some form of binary (e.g. UTF-8) and store it as PostgreSQL Large Object, then this LO automatically becomes "pgjdbc-specific blob".
There's no way to use the data in SQL or pl/pgsql or other applications. 
For instance, one can't perform " where clob_column like '%abcd%' "

2) "characters". For instance, `long length()` should return the number of characters in the string.
If pgjdbc implements java.sql.Clob as a UTF-8 encoded binary, then it would have to **process the whole blob** in order to measure string length.
The same thing goes for `String getSubString(long pos, int length)`. It would have to process all the bytes up to character `long pos` (how otherwise it would know byte position for character `pos`?).

Currently pgjdbc encodes strings using client_encoding, stores them as LO, and has been like that for ages. Apparently that might easily produce garbage in the DB if clients use various encodings, however pgjdbc's default setting is to use UTF-8 so the problem should be not that visible.

I fully understand LO has issues with "removing obsolete entries", however mapping java.sql.Clob to `text` seems to make less sense.
For instance: suppose pgjdbc choses "Clob == text". Then a client meets "1GB" limit.

"Streaming TOAST data" looks more like a research project rather than a clear thing to implement.

What if there was a standard of storing strings in Large Objects?
For instance: "CLOB is a UTF-8 encoded string stored as a single LO". When such an agreement exists, various applications could read and write the data.
Of course, UTF-8 might not suit everybody, so a format might be "prefix that specifies encoding, then encoded string".
Of course both variations above fail to support streaming (as in "need to process all the contents in order to get the last character"), so it might be better to use
"prefix that specifies encoding + 'index block' (that specifies offsets for each 1M characters) + encoded string".
I'm sure there are known algorithms to store strings in binary format that support subsequence / overwrite / length in reasonable time (O(1) or O(N) with reasonable constant).
There might be an option to use UTF-16 (so each "character" becomes 2 bytes always), however it would come at a cost of space usage.

**Here goes the question**:  do you think such an implementation ("large string stored in Large Objects" could be merged into the core eventually)?

Q2: any ideas/existing libraries for random access read-write large strings stored as binary?


--
Regards,
Vladimir Sitnikov

Re: BLOB / CLOB support in PostgreSQL

From
Vladimir Sitnikov
Date:
Let me please bump the thread.

Just in case, I'm PgJDBC committer.

PgJDBC receives requests to "support CLOB" from time to time, however, I believe it is impossible without the support from the database.
To my best knowledge, the database does not have APIs for "streaming large text data".
The only "streaming large binary data" API I know is LargeObject which seems to be old-fashioned.

I believe Java is not the only client that wants streaming access for binary and text data.

Here's a recent pull request to PgJDBC https://github.com/pgjdbc/pgjdbc/pull/1892 where Andrew suggests to add BLOB/CLOB support via bytea/text,
and apparently, Andrew is surprised that the database lacks BLOB/CLOB support.

Any ideas on the way to proceed here?
I don't think it is right to implement Clob via text, especially in case the database provides its own "large text with streaming" datatype in the future.

The concerns to avoid "Clob maps to text" could be:
a) Once the behavior is implemented, it is hard to change. That is applications would rely on it (and it becomes a defacto standard), and it would be hard to move to the proper "text with streaming API" datatype.
b) If we make «clob is text», then people might start using update/substring APIs (which is the primary motivation for Clob) without realizing there’s full value update behind the scenes. Currently, they can use setString/getString for text, and it is crystal clear that the text is updated fully on every update.

Vladimir

Re: BLOB / CLOB support in PostgreSQL

From
Andrew Dunstan
Date:
On 9/28/20 9:46 AM, Vladimir Sitnikov wrote:
> Let me please bump the thread.
>
> Just in case, I'm PgJDBC committer.
>
> PgJDBC receives requests to "support CLOB" from time to time, however,
> I believe it is impossible without the support from the database.
> To my best knowledge, the database does not have APIs for "streaming
> large text data".
> The only "streaming large binary data" API I know is LargeObject which
> seems to be old-fashioned.
>
> I believe Java is not the only client that wants streaming access for
> binary and text data.
>
> Here's a recent pull request to PgJDBC
> https://github.com/pgjdbc/pgjdbc/pull/1892 where Andrew suggests to
> add BLOB/CLOB support via bytea/text,
> and apparently, Andrew is surprised that the database lacks BLOB/CLOB
> support.


I didn't at all say I was surprised. I said I didn't recall seeing  a
call for what you're talking about, and indeed there has been very little.


>
> Any ideas on the way to proceed here?
> I don't think it is right to implement Clob via text, especially in
> case the database provides its own "large text with streaming"
> datatype in the future.
>
> The concerns to avoid "Clob maps to text" could be:
> a) Once the behavior is implemented, it is hard to change. That is
> applications would rely on it (and it becomes a defacto standard), and
> it would be hard to move to the proper "text with streaming API" datatype.
> b) If we make «clob is text», then people might start using
> update/substring APIs (which is the primary motivation for Clob)
> without realizing there’s full value update behind the scenes.
> Currently, they can use setString/getString for text, and it is
> crystal clear that the text is updated fully on every update.
>
>


I think your concerns are seriously misplaced. In the first place, what
I have proposed is an optional behaviour, and it does not at all
preclude other behaviour being made available in future, as well as the
current behaviour which would remain the default. Furthermore, it's
consistent with how the MySQL driver behaves, AIUI, and possibly others
as well. That consistency is a major motivator for the work I've done.
Do you suggest they should rip out their Clob interface until MySQL
supports a streaming data type?


But this is hardly the place for such arguments.


To the best of my knowledge there is no concrete proposal for the type
of data type / interface you suggest. I don't even have any very good
idea what such a thing would look like.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: BLOB / CLOB support in PostgreSQL

From
Vladimir Sitnikov
Date:
Andrew>To the best of my knowledge there is no concrete proposal for the type
Andrew>of data type / interface you suggest. 

To the best of my knowledge, one of the concrete proposals was in the very
first message.

Let me please copy it:

Vladimir> **Here goes the question**:  do you think such an implementation
Vladimir> ("large string stored in Large Objects" could be merged into the core eventually)?

---

Andrew>In the first place, what
Andrew>I have proposed is an optional behaviour

Adding optional behaviors makes support complicated, so flags should be avoided when
a standard solution is possible.

Even if you name the behavior "optional", people would still rely on it.
For instance, you implement Clob.truncate(int) as in-driver-memory truncate operation, however,
in practice, that API should truncate the value in the database.

Just in case: the current behavior for LO-based lobs is that Blob.truncate(int) immediately truncates LO.
So if the application works with the current LO mode, then they would get a behavior change if they flip the switch.

The same for setString and other modifications.

An escape hatch could be like "ok, let's throw unimplemented for clob modifications", however, it contradicts the spec
which says:

JDBC Spec> * All methods on the <code>Clob</code> interface must be fully implemented if the
JDBC Spec> * JDBC driver supports the data type.

What I say here is that the behavior you suggest spans far away from just "blob maps to bytea".

----

Andrew>Furthermore, it's
Andrew>consistent with how the MySQL driver behaves, AIUI,
Andrew>and possibly others as well

Oracle DB behaves differently. They have BLOB/CLOB column types, and Clob/Blob interfaces
map to CLOB/BLOB.

Andrew>That consistency is a major motivator for the work I've done.

PostgreSQL supports large binary via LargeObject API, so the driver maps Blob to LO.

If people want to use bytea, they can use `setBinaryStream(...)`.
What does drive people to use Clob/Blob when the database lacks APIs for it?

Vladimir

Re: BLOB / CLOB support in PostgreSQL

From
Andrew Dunstan
Date:
On 9/28/20 4:44 PM, Vladimir Sitnikov wrote:
> Andrew>To the best of my knowledge there is no concrete proposal for
> the type
> Andrew>of data type / interface you suggest. 
>
> To the best of my knowledge, one of the concrete proposals was in the very
> first message.
>
> Let me please copy it:
>
> Vladimir> **Here goes the question**:  do you think such an implementation
> Vladimir> ("large string stored in Large Objects" could be merged into
> the core eventually)?



You and I clearly have a different idea from what constitutes a concrete
proposal. This is hardly the ghost of a proposal.


>
> ---
>
> Andrew>In the first place, what
> Andrew>I have proposed is an optional behaviour
>
> Adding optional behaviors makes support complicated, so flags should
> be avoided when
> a standard solution is possible.
>
> Even if you name the behavior "optional", people would still rely on it.
> For instance, you implement Clob.truncate(int) as in-driver-memory
> truncate operation, however,
> in practice, that API should truncate the value in the database.
>
> Just in case: the current behavior for LO-based lobs is that
> Blob.truncate(int) immediately truncates LO.
> So if the application works with the current LO mode, then they would
> get a behavior change if they flip the switch.
>
> The same for setString and other modifications.
>
> An escape hatch could be like "ok, let's throw unimplemented for clob
> modifications", however, it contradicts the spec
> which says:
>
> JDBC Spec> * All methods on the <code>Clob</code> interface must be
> fully implemented if the
> JDBC Spec> * JDBC driver supports the data type.
>
> What I say here is that the behavior you suggest spans far away from
> just "blob maps to bytea".



Again, the truncate() I implemented is 100% compatible with the MySQL
driver. I just checked the MSSQL driver and it too just appears to
truncate what's in memory. So maybe we wouldn't be in such bad company.


>
> ----
>
> Andrew>Furthermore, it's
> Andrew>consistent with how the MySQL driver behaves, AIUI,
> Andrew>and possibly others as well
>
> Oracle DB behaves differently. They have BLOB/CLOB column types, and
> Clob/Blob interfaces
> map to CLOB/BLOB.
>
> Andrew>That consistency is a major motivator for the work I've done.
>
> PostgreSQL supports large binary via LargeObject API, so the driver
> maps Blob to LO.
>
> If people want to use bytea, they can use `setBinaryStream(...)`.
> What does drive people to use Clob/Blob when the database lacks APIs
> for it?
>
>

The reason for my proposal (as I stated more than once) is that people
want to be able to use the same code across databases.


cheers


andrew.


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: BLOB / CLOB support in PostgreSQL

From
Vladimir Sitnikov
Date:
>100% compatible with the MySQL

It is hardly a justification for a feature or for a change request.

Vladimir

Re: BLOB / CLOB support in PostgreSQL

From
Andy Fan
Date:


On Tue, Sep 29, 2020 at 5:22 AM Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
>100% compatible with the MySQL

It is hardly a justification for a feature or for a change request.

Vladimir


Glad to see this topic. 

The obviously different opinion for this feature is based on if we need a "perfect" solution
or  a  "OK-to-most-user cases" solution.  

As for PG core developers,  I'm +1 with "pg have their own serious problems" , and
we are lacking the resources to handle everything well. However, "serious problems"
to different people may be different.  

As a rare experienced Java developer,  looks raise "NotImplemented" error for some 
unimplemented APIs will not make the maintenance work hard, that probably not common
used APIs.  Not fully supported API should be better than fully not supported APIs at all.

As an Oracle DBA before, I do see users need CLOB/BLOB some time but for most of them, 
they just want to save/get big stuff.  This case in Oracle may be more outstanding because of
the max length of varchar2 is too low. 

When come to the JDBC standard 

JDBC Spec> * All methods on the <code>Clob</code> interface must be fully implemented if the
JDBC Spec> * JDBC driver supports the data type.

What would be the sense behind this?  This is not reasonable based on limited experience. 

To be short, I am supporting Andrew's proposal for now. 

--
Best Regards
Andy Fan

Re: BLOB / CLOB support in PostgreSQL

From
Dave Cramer
Date:


On Mon, 28 Sep 2020 at 20:08, Andy Fan <zhihui.fan1213@gmail.com> wrote:


On Tue, Sep 29, 2020 at 5:22 AM Vladimir Sitnikov <sitnikov.vladimir@gmail.com> wrote:
>100% compatible with the MySQL

It is hardly a justification for a feature or for a change request.

Vladimir

I would have to agree. This argument alone has not swayed discussions in the past.  


Glad to see this topic. 

The obviously different opinion for this feature is based on if we need a "perfect" solution
or  a  "OK-to-most-user cases" solution.  

As for PG core developers,  I'm +1 with "pg have their own serious problems" , and
we are lacking the resources to handle everything well. However, "serious problems"
to different people may be different.  

As a rare experienced Java developer,  looks raise "NotImplemented" error for some 
unimplemented APIs will not make the maintenance work hard, that probably not common
used APIs.  Not fully supported API should be better than fully not supported APIs at all.

I don't agree with this statement. Currently the largest source of problems with CLOB not being implemented is from hibernate which chose to not follow the spec and for some reason fails if we throw a not implemented (which is allowed by the spec). If we now implement something which only partially implements the API I can imagine others failing in other ways. Honestly I have no idea but what is the point of he spec if we don't adhere to it?
 
As an Oracle DBA before, I do see users need CLOB/BLOB some time but for most of them, 
they just want to save/get big stuff.  This case in Oracle may be more outstanding because of
the max length of varchar2 is too low. 

It is rare to see people writing JDBC now. It is much more likely they are using JPA or hibernate. Getting around this is rather trivial simply by using @Type(org.hibernate.type.TextType) 
 
When come to the JDBC standard 

JDBC Spec> * All methods on the <code>Clob</code> interface must be fully implemented if the
JDBC Spec> * JDBC driver supports the data type.

What would be the sense behind this?  This is not reasonable based on limited experience. 


The sense about this is that others writing code above it expect everything to be implemented. If they aren't then things fail. 

As mentioned above hibernate fails to check for not implemented and we see the number of issues that resulted in that. Imagine what happens when we partially implement the interface. It will be all but useless to them.

I'm not currently convinced the risk/reward ratio is in our favour.

Dave

Re: BLOB / CLOB support in PostgreSQL

From
Peter Eisentraut
Date:
On 2020-09-28 15:46, Vladimir Sitnikov wrote:
> The concerns to avoid "Clob maps to text" could be:
> a) Once the behavior is implemented, it is hard to change. That is 
> applications would rely on it (and it becomes a defacto standard), and 
> it would be hard to move to the proper "text with streaming API" datatype.
> b) If we make «clob is text», then people might start using 
> update/substring APIs (which is the primary motivation for Clob) without 
> realizing there’s full value update behind the scenes. Currently, they 
> can use setString/getString for text, and it is crystal clear that the 
> text is updated fully on every update.

When we added TOAST, we made the explicit decision to not add a "LONG" 
type but instead have the toasting mechanism transparent in all 
variable-length types.  And that turned out to be a very successful 
decision, because it allows this system to be used by all data types, 
not only one or two hardcoded ones.  Therefore, I'm very strongly of the 
opinion that if a streaming system of the sort you allude to were added, 
it would also be added transparently into the TOAST system.

The JDBC spec says

"""
An implementation of a Blob, Clob or NClob object may either be locator 
based or result in the object being fully materialized on the client.

By default, a JDBC driver should implement the Blob, Clob and NClob 
interfaces using the appropriate locator type. An application does not 
deal directly with the locator types that are defined in SQL.
"""

(A "locator" in SQL is basically what you might call a streaming handle.)

So yes, this encourages the implementation of locators.  But it also 
specifies that if you don't have locators, you can implement this using 
non-large-object types.


-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: BLOB / CLOB support in PostgreSQL

From
Andrew Dunstan
Date:
On 9/29/20 10:26 AM, Peter Eisentraut wrote:
> On 2020-09-28 15:46, Vladimir Sitnikov wrote:
>> The concerns to avoid "Clob maps to text" could be:
>> a) Once the behavior is implemented, it is hard to change. That is
>> applications would rely on it (and it becomes a defacto standard),
>> and it would be hard to move to the proper "text with streaming API"
>> datatype.
>> b) If we make «clob is text», then people might start using
>> update/substring APIs (which is the primary motivation for Clob)
>> without realizing there’s full value update behind the scenes.
>> Currently, they can use setString/getString for text, and it is
>> crystal clear that the text is updated fully on every update.
>
> When we added TOAST, we made the explicit decision to not add a "LONG"
> type but instead have the toasting mechanism transparent in all
> variable-length types.  And that turned out to be a very successful
> decision, because it allows this system to be used by all data types,
> not only one or two hardcoded ones.  Therefore, I'm very strongly of
> the opinion that if a streaming system of the sort you allude to were
> added, it would also be added transparently into the TOAST system.
>
> The JDBC spec says
>
> """
> An implementation of a Blob, Clob or NClob object may either be
> locator based or result in the object being fully materialized on the
> client.
>
> By default, a JDBC driver should implement the Blob, Clob and NClob
> interfaces using the appropriate locator type. An application does not
> deal directly with the locator types that are defined in SQL.
> """
>
> (A "locator" in SQL is basically what you might call a streaming handle.)
>
> So yes, this encourages the implementation of locators.  But it also
> specifies that if you don't have locators, you can implement this
> using non-large-object types.
>
>

So if I read this correctly what I have proposed is completely kosher
according to the spec - it's the "fully materialized on the client"
variant, just like the MySQL and MSSQL drivers.


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: BLOB / CLOB support in PostgreSQL

From
Dave Cramer
Date:


On Tue, 29 Sep 2020 at 14:33, Andrew Dunstan <andrew.dunstan@2ndquadrant.com> wrote:

On 9/29/20 10:26 AM, Peter Eisentraut wrote:
> On 2020-09-28 15:46, Vladimir Sitnikov wrote:
>> The concerns to avoid "Clob maps to text" could be:
>> a) Once the behavior is implemented, it is hard to change. That is
>> applications would rely on it (and it becomes a defacto standard),
>> and it would be hard to move to the proper "text with streaming API"
>> datatype.
>> b) If we make «clob is text», then people might start using
>> update/substring APIs (which is the primary motivation for Clob)
>> without realizing there’s full value update behind the scenes.
>> Currently, they can use setString/getString for text, and it is
>> crystal clear that the text is updated fully on every update.
>
> When we added TOAST, we made the explicit decision to not add a "LONG"
> type but instead have the toasting mechanism transparent in all
> variable-length types.  And that turned out to be a very successful
> decision, because it allows this system to be used by all data types,
> not only one or two hardcoded ones.  Therefore, I'm very strongly of
> the opinion that if a streaming system of the sort you allude to were
> added, it would also be added transparently into the TOAST system.
>
> The JDBC spec says
>
> """
> An implementation of a Blob, Clob or NClob object may either be
> locator based or result in the object being fully materialized on the
> client.
>
> By default, a JDBC driver should implement the Blob, Clob and NClob
> interfaces using the appropriate locator type. An application does not
> deal directly with the locator types that are defined in SQL.
> """
>
> (A "locator" in SQL is basically what you might call a streaming handle.)
>
> So yes, this encourages the implementation of locators.  But it also
> specifies that if you don't have locators, you can implement this
> using non-large-object types.
>
>

So if I read this correctly what I have proposed is completely kosher
according to the spec - it's the "fully materialized on the client"
variant, just like the MySQL and MSSQL drivers.


I haven't really looked at MySQL or MSSQL but do they implement the full CLOB API ?
We would need to implement the full API.

BTW, just because it adheres to the spec doesn't seem to hold water in the PostgreSQL project. Just sayin'

Dave

Re: BLOB / CLOB support in PostgreSQL

From
Andrew Dunstan
Date:
On 9/29/20 2:39 PM, Dave Cramer wrote:
>
>
> On Tue, 29 Sep 2020 at 14:33, Andrew Dunstan
> <andrew.dunstan@2ndquadrant.com
> <mailto:andrew.dunstan@2ndquadrant.com>> wrote:
>
>
>     On 9/29/20 10:26 AM, Peter Eisentraut wrote:
>     > On 2020-09-28 15:46, Vladimir Sitnikov wrote:
>     >> The concerns to avoid "Clob maps to text" could be:
>     >> a) Once the behavior is implemented, it is hard to change. That is
>     >> applications would rely on it (and it becomes a defacto standard),
>     >> and it would be hard to move to the proper "text with streaming
>     API"
>     >> datatype.
>     >> b) If we make «clob is text», then people might start using
>     >> update/substring APIs (which is the primary motivation for Clob)
>     >> without realizing there’s full value update behind the scenes.
>     >> Currently, they can use setString/getString for text, and it is
>     >> crystal clear that the text is updated fully on every update.
>     >
>     > When we added TOAST, we made the explicit decision to not add a
>     "LONG"
>     > type but instead have the toasting mechanism transparent in all
>     > variable-length types.  And that turned out to be a very successful
>     > decision, because it allows this system to be used by all data
>     types,
>     > not only one or two hardcoded ones.  Therefore, I'm very strongly of
>     > the opinion that if a streaming system of the sort you allude to
>     were
>     > added, it would also be added transparently into the TOAST system.
>     >
>     > The JDBC spec says
>     >
>     > """
>     > An implementation of a Blob, Clob or NClob object may either be
>     > locator based or result in the object being fully materialized
>     on the
>     > client.
>     >
>     > By default, a JDBC driver should implement the Blob, Clob and NClob
>     > interfaces using the appropriate locator type. An application
>     does not
>     > deal directly with the locator types that are defined in SQL.
>     > """
>     >
>     > (A "locator" in SQL is basically what you might call a streaming
>     handle.)
>     >
>     > So yes, this encourages the implementation of locators.  But it also
>     > specifies that if you don't have locators, you can implement this
>     > using non-large-object types.
>     >
>     >
>
>     So if I read this correctly what I have proposed is completely kosher
>     according to the spec - it's the "fully materialized on the client"
>     variant, just like the MySQL and MSSQL drivers.
>
>
> I haven't really looked at MySQL or MSSQL but do they implement the
> full CLOB API ?
> We would need to implement the full API.
>
> BTW, just because it adheres to the spec doesn't seem to hold water in
> the PostgreSQL project. Just sayin'
>
>

I take your point, but my remark was more in response to the apparent
suggestion that what I submitted was not according to spec.


There are two Clob methods I didn't implement, and one Blob method - the
set*Stream methods, I think they should be implementable, but they will
make the implementation somewhat more complex.


Anyway, at this stage let's take the discussion back to the github forums.


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: BLOB / CLOB support in PostgreSQL

From
Vladimir Sitnikov
Date:
Andrew>You and I clearly have a different idea from what constitutes a concrete
Andrew>proposal. This is hardly the ghost of a proposal.

Can you please clarify what is a proposal from your point of view?
Is it documented?

I think I have read the relevant TODO items:

Wiki clearly suggests posting a mail to pgsql-hackers before starting work.

Vladimir

Re: BLOB / CLOB support in PostgreSQL

From
Andrew Dunstan
Date:
On 9/29/20 3:48 PM, Vladimir Sitnikov wrote:
> Andrew>You and I clearly have a different idea from what constitutes a
> concrete
> Andrew>proposal. This is hardly the ghost of a proposal.
>
> Can you please clarify what is a proposal from your point of view?
> Is it documented?
>
> I think I have read the relevant TODO items:
> https://wiki.postgresql.org/wiki/Developer_FAQ#What_do_I_do_after_choosing_an_item_to_work_on.3F
>
> Wiki clearly suggests posting a mail to pgsql-hackers before starting
> work.
>
>

A concrete proposal needs to be more than "a feature that does X". It
needs to contain a substantial implementation plan. What structures will
be affected, what APIS, what protocol changes and so on. You don't need
to have the code for these things but you do need a description of
what's intended by way of implementation that is detailed enough for the
community to critique. If you don't the danger is that you will spend a
lot of time coding and then present it to the community and they will
say "Oh, the design is all wrong." That's happened to a number of people
in the past, including some quite high profile people, and it's very sad
and frustrating for everyone when it happens.

A feature for streaming large data types could well be very valuable,
but right at the moment I at least don't have any idea what such a thing
could look like (and as you might imagine I'm quite interested).


cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: BLOB / CLOB support in PostgreSQL

From
Vladimir Sitnikov
Date:
Andrew>It needs to contain a substantial implementation plan

Here's an implementation plan, again, quoted from the very same mail:

Vladimir>Of course both variations above fail to support streaming
Vladimir> (as in "need to process all the contents in order to get the last character"), so it might be better to use
Vladimir>"prefix that specifies encoding + 'index block' (that specifies offsets for each 1M characters) + encoded string".

It does describe the data structure.

Andrew>what APIS

I believe it does not matter much.
However, it might be similar to the existing LO APIs, except the indices are in characters rather than bytes.

Andrew>protocol changes

None.

Andrew>and so on

Well, it looks like I had everything you mentioned in the very first email.

Vladimir

RE: BLOB / CLOB support in PostgreSQL

From
"tsunakawa.takay@fujitsu.com"
Date:

From: Vladimir Sitnikov <sitnikov.vladimir@gmail.com>

> Just in case, I'm PgJDBC committer.

 

Thank you very much for your great efforts for the wonderful PgJDBC.  I saw you active.

 

# I'd be happy if you send emails in text format so that the reply looks nice.  Your email seems to be in HTML.

 

> and apparently, Andrew is surprised that the database lacks BLOB/CLOB support.

 

I was a bit surprised too when I first saw Postgres not support blob/clob but bytea, because I had an impression that Postgres is highly SQL standard compliant.  I'm for adding blob/clob data types in server.

 

At the same time, I wonder why Postgres had to add bytea instead of blob.  It may be that there are or were some technical issues.  They may stand in the way even now.

 

One thing I can think of is the parameter format (text/binary).  libpq's PQexecParams() can specify input format for each parameter, but it can only specify the output format for all returned columns, not for each column.  As a consequence, the bytea host variable support added in PG 12 can INSERT 1 GB of binary data, but retrieval of the value fails with an error message like "invalid alloc request."  That's because the server allocates twice the size of stored data to convert it into text format, whose size becomes about 2 GB.  That exceeds the limit palloc() can allocate.

 

33.3. Command Execution Functions

https://www.postgresql.org/docs/devel/libpq-exec.html

 

 

> The concerns to avoid "Clob maps to text" could be:

> a) Once the behavior is implemented, it is hard to change. That is applications would rely on it (and it becomes a defacto standard), and it would be hard to move to the proper "text with streaming API" datatype.

> b) If we make «clob is text», then people might start using update/substring APIs (which is the primary motivation for Clob) without realizing there’s full value update behind the scenes. Currently, they can use setString/getString for text, and it is crystal clear that the text is updated fully on every update.

 

And if we treat clob as a synonym for text (just like the relationship between char and nchar), even when the user writes clob in DDL, pg_dump will output it as text.  That makes it a bit harder to use the output for other DBMSs.

 

 

Regards

Takayuki Tsunakawa