Thread: BLOB support

BLOB support

From
Radosław Smogura
Date:
Hello,

I have partially implemented the following missing LOB types. The requirement was
to give the ability to create (B/C)LOB columns and to add casting functionality, e.g.
SET my_clob = 'My long text'.

The idea is as follows:
0. A Blob is a two-state object: in memory it contains just a bytea; serialized, it
contains the Oid of a large object.
1. Each type has an additional boolean haslobs, which is set recursively.
2. Each relation has the same boolean haslobs (used to speed up tables without LOBs).
3. When data is inserted/updated, a "special" function is called and the tuple
is modified so that all LOBs are serialized to the (old) LOB table and just the
Oid is stored.
4. When removed, the LOB is removed from the (old) LOB table.

I have implemented:
- 0.
- 1. and 2., without altering relations and types.
- 3., inserts only; updates need polishing. No recursion (no support for BLOBs
in composites or arrays).

The current patch is here (many changes to pg_type.h - a new haslobs column was added);
it's in an _early_ stage, for those who want to look inside it.

Any ideas or suggestions?

Regards,
Radek

P. S.
I'm in the middle of moving house, and I have limited access to the Internet.

Re: BLOB support

From
Peter Eisentraut
Date:
On ons, 2011-06-01 at 22:00 +0200, Radosław Smogura wrote:
> I partialy implemented following missing LOBs types. Requirement for this was 
> to give ability to create (B/C)LOB columns and add casting functionality e.g. 
> SET my_clob = 'My long text'.
> 
> Idea is as follow:
> 0. Blob is two state object: 1st in memory contains just bytea, serialized 
> contains Oid of large object.
> 1. Each type has additional boolean haslobs, which is set recursivly.
> 2. Relation has same bool haslobs (used to speed up tables without LOBs)
> 3. When data are inserted/updated then "special" function is called and tuple 
> is modified in this way all LOBs are serialized to (old) LOB table and just 
> Oid is stored.
> 4. When removed LOB is removed from (old) LOB table.

Superficially, this looks like a reimplementation of TOAST.  What
functionality exactly do you envision that the BLOB and CLOB types would
need to have that would warrant treating them different from, say, bytea
and text?




Re: BLOB support

From
Pavel Stehule
Date:
2011/6/2 Peter Eisentraut <peter_e@gmx.net>:
> On ons, 2011-06-01 at 22:00 +0200, Radosław Smogura wrote:
>> I partialy implemented following missing LOBs types. Requirement for this was
>> to give ability to create (B/C)LOB columns and add casting functionality e.g.
>> SET my_clob = 'My long text'.
>>
>> Idea is as follow:
>> 0. Blob is two state object: 1st in memory contains just bytea, serialized
>> contains Oid of large object.
>> 1. Each type has additional boolean haslobs, which is set recursivly.
>> 2. Relation has same bool haslobs (used to speed up tables without LOBs)
>> 3. When data are inserted/updated then "special" function is called and tuple
>> is modified in this way all LOBs are serialized to (old) LOB table and just
>> Oid is stored.
>> 4. When removed LOB is removed from (old) LOB table.
>
> Superficially, this looks like a reimplementation of TOAST.  What
> functionality exactly do you envision that the BLOB and CLOB types would
> need to have that would warrant treating them different from, say, bytea
> and text?
>

Streaming for bytea could be nice. Very large bytea values are limited by the
query size; processing a long query needs too much RAM.

Pavel

>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: BLOB support

From
"ktm@rice.edu"
Date:
On Thu, Jun 02, 2011 at 02:58:52PM +0200, Pavel Stehule wrote:
> 2011/6/2 Peter Eisentraut <peter_e@gmx.net>:
> > On ons, 2011-06-01 at 22:00 +0200, Radosław Smogura wrote:
> >> I partialy implemented following missing LOBs types. Requirement for this was
> >> to give ability to create (B/C)LOB columns and add casting functionality e.g.
> >> SET my_clob = 'My long text'.
> >>
> >> Idea is as follow:
> >> 0. Blob is two state object: 1st in memory contains just bytea, serialized
> >> contains Oid of large object.
> >> 1. Each type has additional boolean haslobs, which is set recursivly.
> >> 2. Relation has same bool haslobs (used to speed up tables without LOBs)
> >> 3. When data are inserted/updated then "special" function is called and tuple
> >> is modified in this way all LOBs are serialized to (old) LOB table and just
> >> Oid is stored.
> >> 4. When removed LOB is removed from (old) LOB table.
> >
> > Superficially, this looks like a reimplementation of TOAST.  What
> > functionality exactly do you envision that the BLOB and CLOB types would
> > need to have that would warrant treating them different from, say, bytea
> > and text?
> >
> 
> a streaming for bytea could be nice. A very large bytea are limited by
> query size - processing long query needs too RAM,
> 
> Pavel
> 

+1 for a streaming interface to bytea/text. I do agree that there is no need
to reinvent the TOAST architecture with another name, just improve the existing
implementation.

Regards,
Ken


Re: BLOB support

From
Pavel Golub
Date:
Hello, Pavel.

You wrote:

PS> 2011/6/2 Peter Eisentraut <peter_e@gmx.net>:
>> On ons, 2011-06-01 at 22:00 +0200, Radosław Smogura wrote:
>>> I partialy implemented following missing LOBs types. Requirement for this was
>>> to give ability to create (B/C)LOB columns and add casting functionality e.g.
>>> SET my_clob = 'My long text'.
>>>
>>> Idea is as follow:
>>> 0. Blob is two state object: 1st in memory contains just bytea, serialized
>>> contains Oid of large object.
>>> 1. Each type has additional boolean haslobs, which is set recursivly.
>>> 2. Relation has same bool haslobs (used to speed up tables without LOBs)
>>> 3. When data are inserted/updated then "special" function is called and tuple
>>> is modified in this way all LOBs are serialized to (old) LOB table and just
>>> Oid is stored.
>>> 4. When removed LOB is removed from (old) LOB table.
>>
>> Superficially, this looks like a reimplementation of TOAST.  What
>> functionality exactly do you envision that the BLOB and CLOB types would
>> need to have that would warrant treating them different from, say, bytea
>> and text?
>>

PS> a streaming for bytea could be nice. A very large bytea are limited by
PS> query size - processing long query needs too RAM,

LO (oid) solves this, doesn't it?

PS> Pavel

>>
>>
>>




-- 
With best wishes,
 Pavel                          mailto:pavel@gf.microolap.com



Re: BLOB support

From
Radosław Smogura
Date:
On Thu, 02 Jun 2011 15:39:39 +0300, Peter Eisentraut wrote:
> On ons, 2011-06-01 at 22:00 +0200, Radosław Smogura wrote:
>> I partialy implemented following missing LOBs types. Requirement for 
>> this was
>> to give ability to create (B/C)LOB columns and add casting 
>> functionality e.g.
>> SET my_clob = 'My long text'.
>>
>> Idea is as follow:
>> 0. Blob is two state object: 1st in memory contains just bytea, 
>> serialized
>> contains Oid of large object.
>> 1. Each type has additional boolean haslobs, which is set 
>> recursivly.
>> 2. Relation has same bool haslobs (used to speed up tables without 
>> LOBs)
>> 3. When data are inserted/updated then "special" function is called 
>> and tuple
>> is modified in this way all LOBs are serialized to (old) LOB table 
>> and just
>> Oid is stored.
>> 4. When removed LOB is removed from (old) LOB table.
>
> Superficially, this looks like a reimplementation of TOAST.

It may look similar, but Datums don't support more than 32-bit lengths, and the
size of any field is limited to 1GB, am I right? Serialization is only for
casting simple values < 1GB and for simple operations, so as not to incur the
overhead of creating hundreds of LOBs.

> What functionality exactly do you envision that the BLOB and CLOB 
> types would
> need to have that would warrant treating them different from, say, 
> bytea
> and text?
Actually, I thought about less sophisticated support for LOBs: casting and
copying data, plus the idea, well known from other databases, that LOBs are not
"downloaded" during normal query execution (just their ids are taken).
Currently, e.g., LOBs are not connected with tables, so deleting a row doesn't
delete the LOB (the table actually holds only the Oid of the large object);
there is no support for casting to/from LOB and no support for CLOBs. Some
drivers try to emulate BLOBs/CLOBs, but it is not perfect, mainly for the above
reasons.

Regards,
Radek


Re: BLOB support

From
Pavel Stehule
Date:
2011/6/2 Pavel Golub <pavel@microolap.com>:
> Hello, Pavel.
>
> You wrote:
>
> PS> 2011/6/2 Peter Eisentraut <peter_e@gmx.net>:
>>> On ons, 2011-06-01 at 22:00 +0200, Radosław Smogura wrote:
>>>> I partialy implemented following missing LOBs types. Requirement for this was
>>>> to give ability to create (B/C)LOB columns and add casting functionality e.g.
>>>> SET my_clob = 'My long text'.
>>>>
>>>> Idea is as follow:
>>>> 0. Blob is two state object: 1st in memory contains just bytea, serialized
>>>> contains Oid of large object.
>>>> 1. Each type has additional boolean haslobs, which is set recursivly.
>>>> 2. Relation has same bool haslobs (used to speed up tables without LOBs)
>>>> 3. When data are inserted/updated then "special" function is called and tuple
>>>> is modified in this way all LOBs are serialized to (old) LOB table and just
>>>> Oid is stored.
>>>> 4. When removed LOB is removed from (old) LOB table.
>>>
>>> Superficially, this looks like a reimplementation of TOAST.  What
>>> functionality exactly do you envision that the BLOB and CLOB types would
>>> need to have that would warrant treating them different from, say, bytea
>>> and text?
>>>
>
> PS> a streaming for bytea could be nice. A very large bytea are limited by
> PS> query size - processing long query needs too RAM,
>
> LO (oid) solves this, doesn't it?

partially

There are a few disadvantages of LO compared to bytea, so there are requests
for a "smarter" API for bytea.

A significant problem is the different implementation of LO, for people who
have to port applications to PostgreSQL from Oracle or DB2. There are some JDBC
issues too.

For me, the main disadvantage of LO is the single shared space for all objects.
Bytea removes this disadvantage, but it is slower for lengths > 20 MB. It could
be really practical to be able to insert some large fields in a second, non-SQL
stream. The same applies when a large bytea is read.

Pavel


>
> PS> Pavel
>
>>>
>>>
>>>
>
>
>
>
> --
> With best wishes,
>  Pavel                          mailto:pavel@gf.microolap.com
>
>


Re: BLOB support

From
Robert Haas
Date:
On Thu, Jun 2, 2011 at 9:29 AM, Radosław Smogura
<rsmogura@softperience.eu> wrote:
>> What functionality exactly do you envision that the BLOB and CLOB types
>> would
>> need to have that would warrant treating them different from, say, bytea
>> and text?
>
> Actually I thought about less sophisticated support of LOBs, supporting
> casting and copying data, as well known form other databases idea that LOBs
> are not "downloaded" during normal query execution (just ids are taken).
> Currently, e.g. LOBs are not connected with tables, so deleting rows doesn't
> delete LOB, table actually holds Oid of large objects, no support for
> casting to/from LOB, no support for CLOBS. Some drivers try to emulate
> BLOBs/CLOBs, but it is not perfect, mainly from above reasons.

But these problems can be fixed without inventing a completely new
system, I think.  Or at least we should try.  I can see the point of a
data type that is really a pointer to a LOB, and the LOB gets deleted
when the pointer is removed, but I don't think that should require
far-reaching changes all over the system (like relhaslobs) to make it
work efficiently.  I think you need to start with a problem statement,
get agreement that it is a problem and on what the solution should be,
and then go write the code to implement that solution.  This is a
classic example of writing the code first and then working backwards
toward the problem you're trying to solve, and that rarely works out
well for the reasons that you're now finding out: people may not agree
with your proposed solution, they may want things done differently,
and now you're stuck reworking code that you've already written.  It's
much easier to change a design document than it is to rewrite code.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: BLOB support

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> But these problems can be fixed without inventing a completely new
> system, I think.  Or at least we should try.  I can see the point of a
> data type that is really a pointer to a LOB, and the LOB gets deleted
> when the pointer is removed, but I don't think that should require
> far-reaching changes all over the system (like relhaslobs) to make it
> work efficiently.  I think you need to start with a problem statement,
> get agreement that it is a problem and on what the solution should be,
> and then go write the code to implement that solution.

Yes.  I think the appropriate problem statement is "provide streaming
access to large field values, as an alternative to just fetching/storing
the entire value at once".  I see no good reason to import the entire
messy notion of LOBS/CLOBS.  (The fact that other databases have done it
is not a good reason.)

For primitive types like text or bytea it seems pretty obvious what
"streaming access" should entail, but it might be interesting to
consider what it should mean for structured types.  For instance, if I
have an array field with umpteen zillion elements, it might be nice to
fetch them one at a time using the streaming access mechanism.  I don't
say that that has to be in the first version, but it'd be a good idea to
keep that in the back of your head so you don't design a dead-end
solution that can't be extended in that direction.
        regards, tom lane


Re: BLOB support

From
Pavel Stehule
Date:
2011/6/2 Tom Lane <tgl@sss.pgh.pa.us>:
> Robert Haas <robertmhaas@gmail.com> writes:
>> But these problems can be fixed without inventing a completely new
>> system, I think.  Or at least we should try.  I can see the point of a
>> data type that is really a pointer to a LOB, and the LOB gets deleted
>> when the pointer is removed, but I don't think that should require
>> far-reaching changes all over the system (like relhaslobs) to make it
>> work efficiently.  I think you need to start with a problem statement,
>> get agreement that it is a problem and on what the solution should be,
>> and then go write the code to implement that solution.
>
> Yes.  I think the appropriate problem statement is "provide streaming
> access to large field values, as an alternative to just fetching/storing
> the entire value at once".  I see no good reason to import the entire
> messy notion of LOBS/CLOBS.  (The fact that other databases have done it
> is not a good reason.)
>
> For primitive types like text or bytea it seems pretty obvious what
> "streaming access" should entail, but it might be interesting to
> consider what it should mean for structured types.  For instance, if I
> have an array field with umpteen zillion elements, it might be nice to
> fetch them one at a time using the streaming access mechanism.  I don't
> say that that has to be in the first version, but it'd be a good idea to
> keep that in the back of your head so you don't design a dead-end
> solution that can't be extended in that direction.

+1

Pavel

>
>                        regards, tom lane
>
>


Re: BLOB support

From
Radosław Smogura
Date:
Tom Lane <tgl@sss.pgh.pa.us> Thursday 02 of June 2011 16:42:42
> Robert Haas <robertmhaas@gmail.com> writes:
> > But these problems can be fixed without inventing a completely new
> > system, I think.  Or at least we should try.  I can see the point of a
> > data type that is really a pointer to a LOB, and the LOB gets deleted
> > when the pointer is removed, but I don't think that should require
> > far-reaching changes all over the system (like relhaslobs) to make it
> > work efficiently.  I think you need to start with a problem statement,
> > get agreement that it is a problem and on what the solution should be,
> > and then go write the code to implement that solution.
> 
> Yes.  I think the appropriate problem statement is "provide streaming
> access to large field values, as an alternative to just fetching/storing
> the entire value at once".  I see no good reason to import the entire
> messy notion of LOBS/CLOBS.  (The fact that other databases have done it
> is not a good reason.)
> 
> For primitive types like text or bytea it seems pretty obvious what
> "streaming access" should entail, but it might be interesting to
> consider what it should mean for structured types.  For instance, if I
> have an array field with umpteen zillion elements, it might be nice to
> fetch them one at a time using the streaming access mechanism.  I don't
> say that that has to be in the first version, but it'd be a good idea to
> keep that in the back of your head so you don't design a dead-end
> solution that can't be extended in that direction.
> 
>             regards, tom lane

In the context of LOBs, streaming is resolved... I use the current LO
functionality (so a driver may be able to read LOBs the way psql's \lo_export
does, or using a COPY subprotocol), and the client gets just the LO's id. BLOBs
in this implementation, as Robert wanted, are just wrappers around core LOs,
with some extensions for special situations... Adding relhaslobs in this
implementation is quite important to avoid examining the tupledesc for each
table operation, though the value could instead be deduced during relation open
(with a performance penalty). I saw something similar done a few lines above,
where triggers are fired, and a few lines below, where indices are updated.

Currently BLOBs may be emulated using core LOs (the JDBC driver does it), but
among everything else, the remaining problems, from the point of view of
application development, are:

1. No tracking of unused LOs (you store just the id of such an object). You may
leak an LO after a row remove/update. The user may write triggers for this, but
that is not an argument - the BLOB type is popular, and its simplicity of use
is quite important. When I build an application, this is the worst problem.

2. No support for casting in UPDATE/INSERT, so there is no simple way to
migrate data (e.g. from too-long varchars) or to copy BLOBs.

3. Field size limited to 1GB.
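
Problem 1 is what the contrib vacuumlo utility works around today: scan the columns that hold large-object ids and unlink any large object nobody references. A toy Python model of that sweep (all table and column names are invented):

```python
# Toy model of an orphaned-LO sweep (the idea behind contrib's vacuumlo):
# collect every Oid still referenced by table rows, then drop the rest.

lob_table = {101: b"blob A", 102: b"blob B", 103: b"blob C"}

# Rows hold only Oids; deleting a row does NOT delete the large object.
rows = [{"id": 1, "image": 101}, {"id": 2, "image": 103}]
# The row that once referenced 102 was deleted, so 102 has leaked.

def sweep_orphans(lob_table, rows, lob_columns=("image",)):
    referenced = {row[c] for row in rows for c in lob_columns if row.get(c)}
    for oid in list(lob_table):
        if oid not in referenced:
            del lob_table[oid]   # unlink the leaked large object

sweep_orphans(lob_table, rows)
assert sorted(lob_table) == [101, 103]
```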

Another solution I was thinking about is to introduce system triggers (triggers
that can't be disabled or removed), which would require a new flag in the
triggers table.

Now I think we should try to mix both approaches, as system triggers may give
an interesting API for other developers.

Other databases (may) store LOBs, arrays, and composites in external tables, so
the user gets just the id of such an object.

I have been thinking about streaming for about two weeks, and I have some
concepts, particularly from the point of view of memory consumption and
performance. I will send the concept later; I want to think about it a little
more and research what can actually be done.

Regards,
Radek


Re: BLOB support

From
Tom Lane
Date:
Radosław Smogura <rsmogura@softperience.eu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> Thursday 02 of June 2011 16:42:42
>> Yes.  I think the appropriate problem statement is "provide streaming
>> access to large field values, as an alternative to just fetching/storing
>> the entire value at once".  I see no good reason to import the entire
>> messy notion of LOBS/CLOBS.  (The fact that other databases have done it
>> is not a good reason.)

> In context of LOBs streaming is resolved... I use current LO functionallity 
> (so driver may be able to read LOBs as psql \lo_export does it or using COPY 
> subprotocol) and client should get just LO's id.

Just to be clear: I do not want to expose a concept of object IDs for
field values in the first place.  All of the problems you enumerate stem
from the idea that LOBs ought to be a distinct kind of field, and I
don't buy that.
        regards, tom lane


Re: BLOB support

From
Radosław Smogura
Date:
Tom Lane <tgl@sss.pgh.pa.us> Thursday 02 of June 2011 19:43:16
> Radosław Smogura <rsmogura@softperience.eu> writes:
> > Tom Lane <tgl@sss.pgh.pa.us> Thursday 02 of June 2011 16:42:42
> > 
> >> Yes.  I think the appropriate problem statement is "provide streaming
> >> access to large field values, as an alternative to just fetching/storing
> >> the entire value at once".  I see no good reason to import the entire
> >> messy notion of LOBS/CLOBS.  (The fact that other databases have done it
> >> is not a good reason.)
> > 
> > In context of LOBs streaming is resolved... I use current LO
> > functionallity (so driver may be able to read LOBs as psql \lo_export
> > does it or using COPY subprotocol) and client should get just LO's id.
> 
> Just to be clear: I do not want to expose a concept of object IDs for
> field values in the first place.  All of the problems you enumerate stem
> from the idea that LOBs ought to be a distinct kind of field, and I
> don't buy that.
> 
>             regards, tom lane

So do I understand correctly that we should think about creating an improved
TOAST to support values larger than the 30-bit length allows? I like this much
more, but without object ids, querying a relation with LOBs will require
locking the relation for some time, as the client will need to reference the
LOB in some way - I think using the TID or some derivative of the TID, am I
right?

Regards,
Radek


Re: BLOB support

From
Tomas Vondra
Date:
Dne 2.6.2011 15:49, Pavel Stehule napsal(a):
> 2011/6/2 Pavel Golub <pavel@microolap.com>:
>> Hello, Pavel.
>>
>> You wrote:
>>
>> PS> 2011/6/2 Peter Eisentraut <peter_e@gmx.net>:
>>>> On ons, 2011-06-01 at 22:00 +0200, Radosław Smogura wrote:
>>>>> I partialy implemented following missing LOBs types. Requirement for this was
>>>>> to give ability to create (B/C)LOB columns and add casting functionality e.g.
>>>>> SET my_clob = 'My long text'.
>>>>>
>>>>> Idea is as follow:
>>>>> 0. Blob is two state object: 1st in memory contains just bytea, serialized
>>>>> contains Oid of large object.
>>>>> 1. Each type has additional boolean haslobs, which is set recursivly.
>>>>> 2. Relation has same bool haslobs (used to speed up tables without LOBs)
>>>>> 3. When data are inserted/updated then "special" function is called and tuple
>>>>> is modified in this way all LOBs are serialized to (old) LOB table and just
>>>>> Oid is stored.
>>>>> 4. When removed LOB is removed from (old) LOB table.
>>>>
>>>> Superficially, this looks like a reimplementation of TOAST.  What
>>>> functionality exactly do you envision that the BLOB and CLOB types would
>>>> need to have that would warrant treating them different from, say, bytea
>>>> and text?
>>>>
>>
>> PS> a streaming for bytea could be nice. A very large bytea are limited by
>> PS> query size - processing long query needs too RAM,
>>
>> LO (oid) solves this, doesn't it?
> 
> partially
> 
> There is a few disadvantages LO against bytea, so there are requests
> for "smarter" API for bytea.
> 
> Significant problem is different implementation of LO for people who
> have to port application to PostgreSQL from Oracle, DB2. There are
> some JDBC issues too.
> 
> For me - main disadvantage of LO in one space for all. Bytea removes
> this disadvantage, but it is slower for lengths > 20 MB. It could be
> really very practical have a possibility insert some large fields in
> second NON SQL stream. Same situation is when large bytea is read.

Yes, being able to do this (without the need to use LOs as they have
their own set of problems - no FKs, etc.) would help a lot of people who
want/need to keep memory usage low.

What I'd like to see is the ability to stream BYTEA columns in both
directions - let's not reinvent the API, other databases already support
this. E.g. with Oracle you can do this using PDO (PHP):

<?php

$db = new PDO('oci:', 'scott', 'tiger');
$stmt = $db->prepare("insert into images (imagedata) " .
                     "VALUES (EMPTY_BLOB())");

$fp = fopen('./myfile.data', 'rb');

$stmt->bindParam(1, $fp, PDO::PARAM_LOB);
$stmt->execute();

?>

If we could make it work in a similar way, that would be great. There
are some more details at http://cz2.php.net/manual/en/pdo.lobs.php.

Tomas


Re: BLOB support

From
Tomas Vondra
Date:
Dne 2.6.2011 15:18, ktm@rice.edu napsal(a):
> On Thu, Jun 02, 2011 at 02:58:52PM +0200, Pavel Stehule wrote:
>> 2011/6/2 Peter Eisentraut <peter_e@gmx.net>:
>>> Superficially, this looks like a reimplementation of TOAST.  What
>>> functionality exactly do you envision that the BLOB and CLOB types would
>>> need to have that would warrant treating them different from, say, bytea
>>> and text?
>>>
>>
>> a streaming for bytea could be nice. A very large bytea are limited by
>> query size - processing long query needs too RAM,
>>
>> Pavel
>>
> 
> +1 for a streaming interface to bytea/text. I do agree that there is no need
> to reinvent the TOAST architecture with another name, just improve the existing
> implementation.

Building a "parallel" architecture that mimics TOAST is obviously a bad
idea.

But I do have a curious question - the current LO approach is based on
splitting the data into small chunks (2kB) and storing those chunks in a
bytea column of the pg_largeobject table.

How much overhead does all this add? What if there were a special kind of block
for binary data that limited the number of chunks and the TOAST overhead?
Actually, this probably would not need a special type of block; when writing a
block there would be a single row with as much data as possible (plus some
metadata), i.e. almost 8kB of compressed data per page.

This would probably bring some restrictions (e.g. the inability to update the
data, but I don't think that's possible with the current LO anyway).
Has anyone thought about this?
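
The overhead question can be put into rough numbers. A back-of-envelope sketch (Python; the tuple-header, line-pointer, and page-header sizes are approximations, not measured values):

```python
# Back-of-envelope (assumed, not measured) comparison of per-page payload:
# pg_largeobject stores ~2 kB chunks, each chunk a full heap tuple.

PAGE = 8192          # default BLCKSZ
CHUNK = 2048         # LOBLKSIZE = BLCKSZ / 4
TUPLE_OVERHEAD = 28  # approx. heap tuple header (assumption)
ITEM_PTR = 4         # line pointer per tuple
PAGE_HEADER = 24

def payload_per_page(chunk):
    usable = PAGE - PAGE_HEADER
    per_tuple = chunk + TUPLE_OVERHEAD + ITEM_PTR
    return (usable // per_tuple) * chunk

small = payload_per_page(CHUNK)          # current-style 2 kB chunks
big = payload_per_page(PAGE - PAGE_HEADER - TUPLE_OVERHEAD - ITEM_PTR)

print(small, big)  # the single big tuple per page carries more payload
assert big > small
```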

regards
Tomas


Re: BLOB support

From
"Ross J. Reedstrom"
Date:
On Thu, Jun 02, 2011 at 01:43:16PM -0400, Tom Lane wrote:
> Radosław Smogura <rsmogura@softperience.eu> writes:
> > Tom Lane <tgl@sss.pgh.pa.us> Thursday 02 of June 2011 16:42:42
> >> Yes.  I think the appropriate problem statement is "provide streaming
> >> access to large field values, as an alternative to just fetching/storing
> >> the entire value at once".  I see no good reason to import the entire
> >> messy notion of LOBS/CLOBS.  (The fact that other databases have done it
> >> is not a good reason.)
> 
> > In context of LOBs streaming is resolved... I use current LO functionallity 
> > (so driver may be able to read LOBs as psql \lo_export does it or using COPY 
> > subprotocol) and client should get just LO's id.
> 
> Just to be clear: I do not want to expose a concept of object IDs for
> field values in the first place.  All of the problems you enumerate stem
> from the idea that LOBs ought to be a distinct kind of field, and I
> don't buy that.
> 

I think you're saying no OIDs exposed to the SQL, i.e. actually stored in a
field and returned by a SELECT? (Which seems to be the proposal.)

As I mentioned recently on another list, I've wrapped a block-oriented
"streaming" interface over bytea in python for a web app, specifically
to deal with the latency and memory footprint issues of storing
'largish' files directly in the db.  I find that with a 64K blocksize,
latency is 'good enough' and substr() seems to be constant time for a
given size, no matter what part of the bytea value I'm fetching: toast
does a fine job of random access.
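
The block-oriented wrapper Ross describes can be sketched as a generator that walks a value in substr()-style windows (Python, simulated locally on a bytes value; in a real application each window would be one `SELECT substr(...)` round-trip, and the table/column names in the comment are invented):

```python
# Sketch of substr()-based "streaming" over a bytea value, simulated
# locally on a bytes object.  In the real app each window would be one
# round-trip, e.g.:
#   SELECT substr(data, %s, %s) FROM my_file_table WHERE id = %s
# (query text illustrative, not from the post).

def stream_bytea(value: bytes, blocksize: int = 64 * 1024):
    """Yield successive substr()-style windows of the value."""
    pos = 0
    while pos < len(value):
        yield value[pos:pos + blocksize]   # substr(value, pos+1, blocksize)
        pos += blocksize

data = bytes(range(256)) * 1024            # 256 kB of test data
chunks = list(stream_bytea(data))
assert b"".join(chunks) == data
assert len(chunks) == 4 and len(chunks[0]) == 64 * 1024
```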


I was musing about providing a way to use the existing client lo
streaming interface (rather than the backend bits) for this type of
access. The thing called an OID in the client interface is really just a
nonce to tell the backend what data to send. With a single generator
function: 

SELECT CASE WHEN is_lo THEN my_loid ELSE make_lo_oid(my_bytea) END
  FROM my_file_table WHERE id = 34534;

Then plugging that back into the lo interface from the client side,
would let me use bytea as I currently do for files under 1GB, lo for
larger, and gain client side streaming that is transparent to the
storage of that particular value. Admittedly, application software would
still need to know how to _store_ different values, and manage large
objects, with all the pain that entails. But there's some gain in
unifying the reading part.

It's hard not to call it an oid, since that's what the client libraries
already document it as (at least, Python's does).

Ross
-- 
Ross Reedstrom, Ph.D.                                 reedstrm@rice.edu
Systems Engineer & Admin, Research Scientist        phone: 713-348-6166
Connexions                  http://cnx.org            fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE


Re: BLOB support

From
Alvaro Herrera
Date:
Excerpts from Radosław Smogura's message of jue jun 02 15:26:29 -0400 2011:

> So do I understand good should We think about create bettered TOAST to support 
> larger values then 30-bit length? I like this much more,

Good :-)

(BTW while it'd be good to have longer-than-30 bit length words for
varlena, I'm not sure we have room for that.)

> but without Objects ID quering relation with lobs will require to lock
> relation for some time,

Why?  The tuples are not going away due to MVCC anyway.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: BLOB support

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Radosław Smogura's message of jue jun 02 15:26:29 -0400 2011:
>> So do I understand good should We think about create bettered TOAST to support 
>> larger values then 30-bit length? I like this much more,

> Good :-)

> (BTW while it'd be good to have longer-than-30 bit length words for
> varlena, I'm not sure we have room for that.)

You wouldn't want to push such values around as whole values anyway.
Possibly what would work here is a variant form of TOAST pointer for
which we'd simply throw error if you tried to fetch the entire value
at once.
        regards, tom lane
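
Tom's variant TOAST pointer can be modeled as a handle that serves slices but refuses whole-value detoasting (a toy Python sketch; the class and method names are invented and nothing here reflects actual varlena internals):

```python
# Toy model of the suggestion: a TOAST-pointer variant that supports
# chunked reads but raises if you try to detoast the whole value at once.

class StreamOnlyPointer:
    def __init__(self, store: bytes, limit: int = 1 << 16):
        self._store = store     # stands in for out-of-line storage
        self._limit = limit     # largest slice a caller may fetch

    def read_slice(self, offset: int, length: int) -> bytes:
        if length > self._limit:
            raise ValueError("slice too large; fetch the value in pieces")
        return self._store[offset:offset + length]

    def read_all(self):
        raise ValueError("whole-value fetch not allowed for this pointer")

p = StreamOnlyPointer(b"x" * 100_000)
assert p.read_slice(0, 10) == b"x" * 10
try:
    p.read_all()
    raised = False
except ValueError:
    raised = True
assert raised
```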


Re: BLOB support

From
Robert Haas
Date:
On Thu, Jun 2, 2011 at 12:53 PM, Radosław Smogura
<rsmogura@softperience.eu> wrote:
> 1. No tracking of unused LO (you store just id of such object). You may leak
> LO after row remove/update. User may write triggers for this, but it is not
> argument - BLOB type is popular, and it's simplicity of use is quite
> important. When I create app this is worst thing.
>
> 2. No support for casting in UPDATE/INSERT. So there is no way to simple
> migrate data (e.g. from too long varchars). Or to copy BLOBs.
>
> 3. Limitation of field size to 1GB.

As a general point, it would probably be a good idea to address each
of these issues separately, and to have a separate discussion about
each one.

As to #1 specifically, if you use a text or bytea field rather than a
large object per se, then this issue goes away.  But then you lose the
streaming functionality.  So at least some people here are saying that
we should try to fix that by adding the streaming functionality to
text/bytea rather than by doing anything to the large object facility.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: BLOB support

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Jun 2, 2011 at 12:53 PM, Radosław Smogura
> <rsmogura@softperience.eu> wrote:
>> 1. No tracking of unused LO (you store just id of such object). You may leak
>> LO after row remove/update. User may write triggers for this, but it is not
>> argument - BLOB type is popular, and it's simplicity of use is quite
>> important. When I create app this is worst thing.
>> 
>> 2. No support for casting in UPDATE/INSERT. So there is no way to simple
>> migrate data (e.g. from too long varchars). Or to copy BLOBs.
>> 
>> 3. Limitation of field size to 1GB.

> As a general point, it would probably be a good idea to address each
> of these issues separately, and to have a separate discussion about
> each one.

> As to #1 specifically, if you use a text or bytea field rather than a
> large object per se, then this issue goes away.  But then you lose the
> streaming functionality.  So at least some people here are saying that
> we should try to fix that by adding the streaming functionality to
> text/bytea rather than by doing anything to the large object facility.

#2 is also a problem that only becomes a problem if you insist that LOBs
have to be a distinct kind of value.
        regards, tom lane


Re: BLOB support

From
Radosław Smogura
Date:
Tom Lane <tgl@sss.pgh.pa.us> Friday 03 of June 2011 16:44:13
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Excerpts from Radosław Smogura's message of jue jun 02 15:26:29 -0400
2011:
> >> So do I understand good should We think about create bettered TOAST to
> >> support larger values then 30-bit length? I like this much more,
> >
> > Good :-)
> >
> > (BTW while it'd be good to have longer-than-30 bit length words for
> > varlena, I'm not sure we have room for that.)
>
> You wouldn't want to push such values around as whole values anyway.
> Possibly what would work here is a variant form of TOAST pointer for
> which we'd simply throw error if you tried to fetch the entire value
> at once.
>
>             regards, tom lane

Ok, now what you have talked about is clearer to me, but I still need to pass a
constant ID to the client.

Actually, this variant must be passed to the client.

On the other hand, a BLOB may be created before the statement is invoked or
while it is being called. This will require creating temporary BLOBs and
introducing a v3.1 protocol, which will allow streaming values greater than 4GB
by passing -2 in the size/length fields, and introducing stream_in/out in
pg_type (this comes from my concept of a streaming protocol).

So I think it will be better to introduce the streaming protocol first, as LOBs
sit on top of it. I will start a thread for this in a moment.

>> Why?  The tuples are not going away due to MVCC anyway.
Vacuum / autovacuum + no lock may be enough, I think. A constant ID is required.

Regards,
Radek


Re: BLOB support

From
Radosław Smogura
Date:
Tom Lane <tgl@sss.pgh.pa.us> Friday 03 of June 2011 18:08:56
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Thu, Jun 2, 2011 at 12:53 PM, Radosław Smogura
> >
> > <rsmogura@softperience.eu> wrote:
> >> 1. No tracking of unused LO (you store just id of such object). You may
> >> leak LO after row remove/update. User may write triggers for this, but
> >> it is not argument - BLOB type is popular, and it's simplicity of use
> >> is quite important. When I create app this is worst thing.
> >>
> >> 2. No support for casting in UPDATE/INSERT. So there is no way to simple
> >> migrate data (e.g. from too long varchars). Or to copy BLOBs.
> >>
> >> 3. Limitation of field size to 1GB.
> >
> > As a general point, it would probably be a good idea to address each
> > of these issues separately, and to have a separate discussion about
> > each one.
> >
> > As to #1 specifically, if you use a text or bytea field rather than a
> > large object per se, then this issue goes away.  But then you lose the
> > streaming functionality.  So at least some people here are saying that
> > we should try to fix that by adding the streaming functionality to
> > text/bytea rather than by doing anything to the large object facility.
>
> #2 is also a problem that only becomes a problem if you insist that LOBs
> have to be a distinct kind of value.
>
>             regards, tom lane

And one more topic to discuss: should a blob be referenceable? E.g. I create a
new Blob in JDBC and set a stream for it; what should happen if I then call
UPDATE t SET b = ? WHERE 1=1
?

This is not about copy on write.

Regards,
Radek


Re: BLOB support

From
Radosław Smogura
Date:
Tom Lane <tgl@sss.pgh.pa.us> Friday 03 of June 2011 16:44:13
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Excerpts from Radosław Smogura's message of jue jun 02 15:26:29 -0400
2011:
> >> So do I understand good should We think about create bettered TOAST to
> >> support larger values then 30-bit length? I like this much more,
> >
> > Good :-)
> >
> > (BTW while it'd be good to have longer-than-30 bit length words for
> > varlena, I'm not sure we have room for that.)
>
> You wouldn't want to push such values around as whole values anyway.
> Possibly what would work here is a variant form of TOAST pointer for
> which we'd simply throw error if you tried to fetch the entire value
> at once.
>
>             regards, tom lane

I mentioned the JDBC call.
1.
b = conn.createBlob();
ps = conn.prepareStatement("INSERT INTO t (blob) VALUES (?)");
ps.setBlob(1, b);
ps.executeUpdate();

2.
Statements could also be like
"SELECT blob_index_of(?, ?)";
where the 1st ? is a blob and the 2nd one is some text/sequence.

This call must be supported to make BLOBs as similar as possible to other
types. It actually disallows putting the LOB in TOAST, as there is no relation
(2.) or the relation is not known (1.) while reading the LOB - and in any case
you can't skip bytes from the protocol stream - so possible solutions are:

1. Create a temporary LOB in a file or in memory, depending on its size.
2. Use the existing Large Object interface.
3. The client will not send the LOB, just a faked ID for it, and during the
call the server will ask the client to serialize this LOB by that faked id.
4. Any other propositions?

I vote for 2.

For the pg_type/pg_class changes I am thinking about removing haslobs and
putting this in attstorage or somewhere else (for example 'l' may stand for
lobs, or a -3 length), but currently TOASTed composites don't support
recursion, and those types are toasted as a whole. I may add recursion for
those types and support special maintenance for LOBs. In any case, handling
this will be better in the toasting code, not in nodeModifyTable.

Any ideas about this?

Regards,
Radek


Re: BLOB support

From
Dimitri Fontaine
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Yes.  I think the appropriate problem statement is "provide streaming
> access to large field values, as an alternative to just fetching/storing
> the entire value at once".  I see no good reason to import the entire
> messy notion of LOBS/CLOBS.  (The fact that other databases have done it
> is not a good reason.)

Spent some time in the archive to confirm a certain “déjà vu”
impression.  Couldn't find it.  Had to manually search in closed commit
fests… but here we are, I think:
 https://commitfest.postgresql.org/action/patch_view?id=70
http://archives.postgresql.org/message-id/17891.1246301879@sss.pgh.pa.us
http://archives.postgresql.org/message-id/4A4BF87E.7010107@ak.jp.nec.com

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr     PostgreSQL : Expertise, Formation et Support


Re: BLOB support

From
Radosław Smogura
Date:
On Sun, 05 Jun 2011 22:16:41 +0200, Dimitri Fontaine wrote:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Yes.  I think the appropriate problem statement is "provide 
>> streaming
>> access to large field values, as an alternative to just 
>> fetching/storing
>> the entire value at once".  I see no good reason to import the 
>> entire
>> messy notion of LOBS/CLOBS.  (The fact that other databases have 
>> done it
>> is not a good reason.)
>
> Spent some time in the archive to confirm a certain “déjà vu”
> impression.  Couldn't find it.  Had to manually search in closed 
> commit
> fests… but here we are, I think:
>
>   https://commitfest.postgresql.org/action/patch_view?id=70
>   
> http://archives.postgresql.org/message-id/17891.1246301879@sss.pgh.pa.us
>   
> http://archives.postgresql.org/message-id/4A4BF87E.7010107@ak.jp.nec.com
>
> Regards,
I think more about this in contrast to the sent references, but I still have
in my mind the construct

Blob myWeddingDvd = conn.createBlob(myWeddingStream, size); //A bit outdated, we have BluRay
conn.prepareStatement("INSERT INTO someonetubevideos VALUES (?)")

where the 1st parameter is myWeddingDvd; or if someone doesn't like Java,
he/she/it may wish to pass a C++ istream, or a C FILE.

I think (with respect to the consideration below) this implicitly requires
that LOBs be stored in one centralized place - it doesn't matter if this is
the file system, a special table, or something else - because when the
statement is processed there is no idea which table the LOB will be associated
with. If we want to TOAST, where do we TOAST it? What happens if the insertion
is done by an SQL function which chooses the table depending on the BLOB
content?

A quite interesting idea from the cited patch was the string identifying the
LOB, but with the above it closes the road for JDBC createBlob. I think, as
well, that constructs which first insert a small LOB into the table to obtain
some driver-dependent API are a little bit old-fashioned.

Possible solutions, if we don't want centralized storage, may be:
1. Keep the BLOB in memory, but this may, depending on implementation, limit
the size of the initial BLOB.
2. Temporarily back up the blob in a file, then when the values are stored
copy the file to the TOAST table; but still some changes are required to
support LOBs for complex types and arrays.

So please give some ideas how to resolve this, or maybe it has low priority?

Regards,
Radek


Re: BLOB support

From
Pavel Stehule
Date:
2011/6/6 Radosław Smogura <rsmogura@softperience.eu>:
> On Sun, 05 Jun 2011 22:16:41 +0200, Dimitri Fontaine wrote:
>>
>> Tom Lane <tgl@sss.pgh.pa.us> writes:
>>>
>>> Yes.  I think the appropriate problem statement is "provide streaming
>>> access to large field values, as an alternative to just fetching/storing
>>> the entire value at once".  I see no good reason to import the entire
>>> messy notion of LOBS/CLOBS.  (The fact that other databases have done it
>>> is not a good reason.)
>>
>> Spent some time in the archive to confirm a certain “déjà vu”
>> impression.  Couldn't find it.  Had to manually search in closed commit
>> fests… but here we are, I think:
>>
>>  https://commitfest.postgresql.org/action/patch_view?id=70
>>  http://archives.postgresql.org/message-id/17891.1246301879@sss.pgh.pa.us
>>  http://archives.postgresql.org/message-id/4A4BF87E.7010107@ak.jp.nec.com
>>
>> Regards,
>
> I think more about this with contrast to sent references, but I still have
> in my mind construct
> Blob myWeddingDvd = conn.createBlob(myWeddingStream, size); //A bit outdated
> we have BlueRay
> conn.prepareStatemnt("INSERT INTO someonetubevideos values (?)")
> where 1st parameter is myWeddingDvd,
> or if someone doesn't like Java he/she/it may wish to put C++ istream, or C
> FILE.
>
> I think (with respect to below consideration), this implicite requires that
> LOBs should be stored in one, centralized place doesn't matter if this will
> be file system or special table, or something else, but when statement is
> processed there is no idea with which table LOB will be associated, if we
> want to TOAST, where TOAST it, what will be if insertion will by SQL
> function, which choose table depending on BLOB content?
>
> Quite interesting idea from cited patch was about string identifying LOB,
> but with above it close road to for JDBC create LOB. I think, as well
> constructs that insert 1st, small LOB into table to get some driver
> depending API are little bit old fashioned.
>
> Possible solutions, if we don't want centralized storage, may be:
> 1. Keep BLOB in memory, but this may, depending on implementation, reduce
> size of initial BLOB.
> 2. Temporally backup blob in file, then when values are stored copy file to
> TOAST table, but still some changes are required to support LOBs for complex
> types and arrays.

@1 is useless for multiuser applications. This is a problem of the current
implementation for large TOAST values. You can hold around
"work_mem" bytes in memory, but any larger content should be forwarded
to a file.

Pavel

>
> So please give some ideas how to resolve this, or may be it has low
> priority?
>
> Regards,
> Radek
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
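The spill-to-file strategy Pavel describes (small values in memory, anything larger forwarded to a file) is the usual approach in client drivers. A minimal, self-contained sketch under that assumption follows; the class name and the in-memory threshold are only illustrative, not an existing API or GUC:

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch: buffer LOB bytes in memory up to a threshold (playing the role
// of a max-in-memory-lob setting), then spill everything written so far
// to a temporary file and keep appending there.
public class SpillableLobBuffer {
    private final int maxInMemory;
    private ByteArrayOutputStream mem = new ByteArrayOutputStream();
    private File spillFile;            // null while still in memory
    private OutputStream out;

    public SpillableLobBuffer(int maxInMemory) {
        this.maxInMemory = maxInMemory;
        this.out = mem;
    }

    public void write(byte[] chunk) throws IOException {
        if (spillFile == null && mem.size() + chunk.length > maxInMemory) {
            // Threshold exceeded: move the in-memory contents to a temp file.
            spillFile = File.createTempFile("lob", ".tmp");
            spillFile.deleteOnExit();
            out = new FileOutputStream(spillFile);
            mem.writeTo(out);
            mem = null;
        }
        out.write(chunk);
    }

    public boolean spilled() {
        return spillFile != null;
    }

    public long size() throws IOException {
        out.flush();
        return spilled() ? spillFile.length() : mem.size();
    }
}
```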


Re: BLOB support

From
Tom Lane
Date:
Radosław Smogura <rsmogura@softperience.eu> writes:
>  I think more about this with contrast to sent references, but I still 
>  have in my mind construct
>  Blob myWeddingDvd = conn.createBlob(myWeddingStream, size); //A bit 
>  outdated we have BlueRay
>  conn.prepareStatemnt("INSERT INTO someonetubevideos values (?)")
>  where 1st parameter is myWeddingDvd,

Yes, if you insist upon designing the API like that, then you come to
the conclusion that you need global LOB identifiers.

However, there are many ways to design this that don't work that way.
One idea to think about is
insert into someonetubevideos values('')
    returning open_for_write(videocolumn)

which gives you back some kind of writable stream ID (this is a
transient, within-session ID, not global) for the target field in the
row you just inserted.

BTW, as was noted upthread by Dimitri, this whole subject has been
discussed before on pgsql-hackers.  You really ought to go re-read the
previous threads.
        regards, tom lane
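The transient, within-session handle Tom describes can be illustrated roughly as follows. Everything here is hypothetical (a stand-in registry, invented names) and exists only to show that the identifier need not be global: it is a small number that is meaningless outside the session that issued it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

// Sketch of a per-session registry: openForWrite() hands back a small
// integer valid only inside this "session"; writes are routed through it.
public class SessionWriteHandles {
    private final Map<Integer, OutputStream> open = new HashMap<>();
    private int nextId = 1;

    // Corresponds loosely to "returning open_for_write(videocolumn)".
    public int openForWrite() {
        int id = nextId++;
        open.put(id, new ByteArrayOutputStream()); // stand-in for the field
        return id;
    }

    public void write(int id, byte[] chunk) throws IOException {
        OutputStream os = open.get(id);
        if (os == null) {
            throw new IllegalArgumentException("no such handle in this session");
        }
        os.write(chunk);
    }

    public void close(int id) {
        open.remove(id); // the handle is meaningless once the session forgets it
    }
}
```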


Re: BLOB support

From
Radosław Smogura
Date:
Tom Lane <tgl@sss.pgh.pa.us> Monday 06 of June 2011 16:13:26
> Radosław Smogura <rsmogura@softperience.eu> writes:
> >  I think more about this with contrast to sent references, but I still
> >  have in my mind construct
> >  Blob myWeddingDvd = conn.createBlob(myWeddingStream, size); //A bit
> >  outdated we have BlueRay
> >  conn.prepareStatemnt("INSERT INTO someonetubevideos values (?)")
> >  where 1st parameter is myWeddingDvd,
> 
> Yes, if you insist upon designing the API like that, then you come to
> the conclusion that you need global LOB identifiers.
> 
> However, there are many ways to design this that don't work that way.
> One idea to think about is
> 
>     insert into someonetubevideos values('')
>     returning open_for_write(videocolumn)
> 
> which gives you back some kind of writable stream ID (this is a
> transient, within-session ID, not global) for the target field in the
> row you just inserted.
I know, but this is a little bit old-fashioned behaviour.

> BTW, as was noted upthread by Dimitri, this whole subject has been
> discussed before on pgsql-hackers.  You really ought to go re-read the
> previous threads.
> 
>             regards, tom lane

I read this, but it may be rethought again. Actually, the changes to TOAST (I
mean streaming) will be just for LOBs. I had written down all the
disadvantages I found of TOAST for LOBs, and it looks like only the
performance of the above is some kind of disadvantage; as well, this prevents
some less useful concepts of Copy on Write for LOBs.

Introducing streaming for TOAST is a little useless; sorry, just to cite from
my mentioned document:

(This is generally about on-demand streaming of a TOASTed value; in the
context of LOBs it is acceptable, as long as non-transaction-aware LOBs are
acceptable.) If we add streaming of TOASTed values, so the caller gets some
reference to this value, we need to ensure that the pointed-to data will not
be changed or deleted - I think this will require the caller to add FOR UPDATE
(or the server to add it silently) for each statement returning pointers to
TOASTed values, as one client may do a transactional query, and another client
just after (the 1st) may remove the record, commit, and call VACUUM. In this
situation, when the 1st tries to read data from the given row, it will get an
error. This may be acceptable for LOBs (commonly LOBs may not be transaction
aware, but I will be angry if this happens with VARCHAR).

If this is acceptable I will make the following changes.

Add:
- server_max_in_memory_lob_size - a start-time-only server GUC describing the
maximum value of the client session parameter max_in_memory_lob.

- max_in_memory_lob - a session GUC describing how large LOBs may be kept in
memory before being backed up to a file.

- recursive toasting/detoasting during insert/update/remove to search for LOBs
(we need this for arrays and composite types) - this is for the last stage (an
error disallowing LOBs in composites/arrays may be quite enough for the
beginning) - I want LOBs to be the starting point for LOBing other types (e.g.
some big arrays may be LOBbed).

- during toasting, the LOB will be toasted and in place of the LOB a reference
to it will be put, encoded in the LOB datum.

- an API for LOB manipulation (a few changes to the current implementation) in
such a way that BLOB is a field type and a union of in_memory, in_file,
in_toast. These few extra bytes will not affect the size of the datum,
considering the size of the LOB.

- an API for maintaining temporary LOB files; we need this as a per-session
list of id -> file descriptor, to prevent stealing of LOBs by different
connections (security).

- a streaming API for TOASTed values (based on the COPY protocol, or a changed
COPY protocol), or at least function calls - I haven't looked at this in the
context of TOASTed LOBs.

Is it good?

Regards,
Radek



Re: BLOB support

From
Tom Lane
Date:
Radosław Smogura <rsmogura@softperience.eu> writes:
> Introducing streaming for TOAST is little useless, sorry just for cite from 
> my, mentoined document:

> (This is generally about on demand stream of TOASTed value, in 
> context of LOBs is acceptable, as long not transactional aware LOBs are 
> acceptable). If we will add streaming of TOASTed values, so caller will 
> get some reference to this value, we need to ensure that pointed data 
> will not be changed, nor deleted - I think this will require caller to 
> add FOR UPDATE (or silently to add this by server) for each statement 
> returning pointers to TOASTed, as client may do transactional query,

It's already been explained to you that that's not the case.

> If this is acceptable I will do following changes.

> Add 
> - server_max_in_memory_lob_size - GUC server start-only config to describe 
> maximum value of client session parameter max_in_memory_lob.

> - max_in_memory_lob - session GUC describing how huge LOBs may be keept in 
> memory before backing up to file

> - rescursivly toasting, detoasting during insert/update/remove for searching 
> for LOBs (we need this for arrays and complex types) - this is for last stage 
> (error disallowing LOBs in composites/arrays may be quite enaugh, for 
> begining) - I want LOBs to be starting point for LOBing other types (e.g. some 
> big arrays may be LOBbed).

> - during toasting, lob will be toasted and in place of LOB, the reference to 
> it will be putted, and encoded in LOB datum.

> - api for LOB manipulation (few changes to current implementation) in way that 
> BLOB is field type and uninon of in_memory, in_file, in_toast. This few bytes 
> for LOB will not affect size of datum looking at size of LOB.

> - api for maintaing temoraly lob files, we need this as per session list of id 
> -> file desc, to prevent prevent stealing of lobs by different connections 
> (security)

> - streaming api for TOASTED values (based on COPY protocol, or changed COPY 
> protocol) or at least function calls - I havent looked at this in context of 
> TOASTed LOBs.

> Is it good?

This all looks like you decided on a solution first and then started to
look for a problem to apply it to.  I don't want to see us inventing a
pile of GUCs for this, and I don't think there is a need to make any
fundamental changes in the TOAST mechanism either.  What we do need is a
streaming access protocol to read or write wide field values *without*
forcing people to change the basic structure of their tables.  You keep
on wanting to invent weird, IMO unnecessary language features instead.
Try to make the feature as transparent as possible, not as visible as
possible.
        regards, tom lane


Re: BLOB support

From
Heikki Linnakangas
Date:
On 06.06.2011 17:13, Tom Lane wrote:
> Radosław Smogura<rsmogura@softperience.eu>  writes:
>>   I think more about this with contrast to sent references, but I still
>>   have in my mind construct
>>   Blob myWeddingDvd = conn.createBlob(myWeddingStream, size); //A bit
>>   outdated we have BlueRay
>>   conn.prepareStatemnt("INSERT INTO someonetubevideos values (?)")
>>   where 1st parameter is myWeddingDvd,
>
> Yes, if you insist upon designing the API like that, then you come to
> the conclusion that you need global LOB identifiers.

That's what the JDBC api looks like, but it doesn't mean you need global 
LOB identifiers. When you create the Blob object (myWeddingDvd), the 
driver can just keep a reference to the given stream (myWeddingStream) 
to the Blob object. When you execute the INSERT statement, the driver 
can read the stream and stream the data to the server.

The protocol changes I think Tom and I and others are envisioning would 
work just fine with that.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
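Heikki's point can be sketched as follows: the driver-side Blob is only a holder for the client's stream, and nothing is read or sent until the statement actually executes. This is a hypothetical driver fragment, not actual JDBC driver code; the ByteArrayOutputStream merely stands in for the wire protocol:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: the "Blob" the application creates is just a reference to its
// stream; the driver reads and forwards it only at execute time.
public class DeferredBlob {
    private final InputStream source;

    public DeferredBlob(InputStream source) {
        this.source = source; // keep a reference, read nothing yet
    }

    // Called by the driver when the INSERT executes: stream the data out.
    public long streamTo(ByteArrayOutputStream wire) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = source.read(buf)) != -1) {
            wire.write(buf, 0, n); // stands in for the protocol message body
            total += n;
        }
        return total;
    }
}
```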


Re: BLOB support

From
Radoslaw Smogura
Date:
I just started with some image of how blobs should work. And I think this
topic will come back, and many other problems will arise as well. Flattening
tuples, etc.

I will send sketches of streaming in this vein, I hope, as I want to go back
to the clusterization work.

Sorry for the top reply - Windows Phone 7.

Regards,
Radek 
