Thread: Extending varlena

Extending varlena

From
David Fetter
Date:
Folks,

As the things stored in databases grow, we're going to start needing
to think about database objects that 4 bytes of size can't describe.
People are already storing video in lo and bytea fields.  To date, the
sizes of media files have never trended downward.

What would need to happen for the next jump up from where varlena is
now, to 8 bytes?  Would we want to use the bit-stuffing model that the
current varvarlena uses?
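
For reference, the bit-stuffing trick works roughly like this (a
simplified little-endian sketch, not the exact postgres.h encoding):

#include <stdint.h>
#include <string.h>

/*
 * Simplified varvarlena-style header: if the low bit of the first byte
 * is set, that byte alone carries a 7-bit total length (short datum);
 * otherwise the first 4 bytes carry a 30-bit length above two flag
 * bits.  A jump to 8 bytes would need another flag pattern and a
 * correspondingly wider length field.
 */
static uint64_t
stuffed_size(const uint8_t *datum)
{
    if (datum[0] & 0x01)
        return datum[0] >> 1;       /* 1-byte header: 7-bit length */
    else
    {
        uint32_t header;

        memcpy(&header, datum, sizeof(header));
        return header >> 2;         /* 4-byte header: 30-bit length */
    }
}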

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Extending varlena

From
Tom Lane
Date: Mon, Aug 18, 2008 16:22:56 -0400
David Fetter <david@fetter.org> writes:
> What would need to happen for the next jump up from where varlena is
> now, to 8 bytes?

Dealing with upwards-of-4GB blobs as single Datums isn't remotely sane,
and won't become so in the near (or even medium) future.  So I don't
see the point of doing all the work that would be involved in making
this go.

What would make more sense is to redesign the large-object stuff to be
somewhat modern and featureful, and provide stream-access APIs (think
lo_read, lo_seek, etc) that allow offsets wider than 32 bits.  The main
things I think we'd need to consider besides just the access API are

- permissions features (more than "none" anyway)
- better management of orphaned objects (obsoleting vacuumlo)
- support > 16TB of large objects (maybe partition pg_largeobject?)
- dump and restore probably need improvement to be practical for such large data volumes
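
From the client side, such an API might look like today's lo_* calls
with 64-bit offsets; a quick sketch (dump_tail is illustrative, and
lo_lseek64 is a hypothetical wider variant, not an existing libpq
function):

#include <stdio.h>
#include <libpq-fe.h>
#include <libpq/libpq-fs.h>     /* INV_READ and lo_* declarations */

/* Stream everything past a beyond-4GB offset out in 8kB chunks. */
static int
dump_tail(PGconn *conn, Oid loid, long long start, FILE *out)
{
    char buf[8192];
    int  fd, n;

    if ((fd = lo_open(conn, loid, INV_READ)) < 0)
        return -1;
    if (lo_lseek64(conn, fd, start, SEEK_SET) < 0)  /* hypothetical */
        return -1;
    while ((n = lo_read(conn, fd, buf, sizeof buf)) > 0)
        fwrite(buf, 1, n, out);
    return lo_close(conn, fd);
}
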
        regards, tom lane


Re: Extending varlena

From
David Fetter
Date:
On Mon, Aug 18, 2008 at 04:22:56PM -0400, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
> > What would need to happen for the next jump up from where varlena
> > is now, to 8 bytes?
> 
> Dealing with upwards-of-4GB blobs as single Datums isn't remotely
> sane, and won't become so in the near (or even medium) future.  So I
> don't see the point of doing all the work that would be involved in
> making this go.

OK

> What would make more sense is to redesign the large-object stuff to
> be somewhat modern and featureful, and provide stream-access APIs
> (think lo_read, lo_seek, etc) that allow offsets wider than 32 bits.

Great!

> The main things I think we'd need to consider besides just the
> access API are
> 
> - permissions features (more than "none" anyway)

Would ROLEs work, or are you thinking of the per-row and per-column
access controls people sometimes want?

> - better management of orphaned objects (obsoleting vacuumlo)
> - support > 16TB of large objects (maybe partition pg_largeobject?)
> - dump and restore probably need improvement to be practical for such
>   large data volumes

That, and the usual upgrade-in-place :)

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Extending varlena

From
Andrew Chernow
Date: Mon, Aug 18, 2008 18:09:13 -0400
David Fetter wrote:
> Folks,
> 
> As the things stored in databases grow, we're going to start needing
> to think about database objects that 4 bytes of size can't describe.
> People are already storing video in lo and bytea fields.  To date, the
> sizes of media files have never trended downward.
> 

I always find these requests puzzling.  Is it really useful to store the 
data for a jpeg, video file or a 10GB tar ball in a database column? 
Does anyone actually search for byte sequences within those data streams 
(maybe if it were text)?  I would think that the metadata is what gets 
searched: title, track, name, file times, size, etc...  Database storage 
is normally pricey, stocked with 15K drives, so wasting that expensive 
storage with non-searchable binary blobs doesn't make much sense.  Why 
not offload the data to a file system with 7200 RPM SATA drives and 
store a reference to it in the db?  Keep the db more compact and simpler 
to manage.

Andrew Chernow
eSilo, LLC



Re: Extending varlena

From
David Fetter
Date:
On Mon, Aug 18, 2008 at 06:09:13PM -0400, Andrew Chernow wrote:
> David Fetter wrote:
>> Folks,
>>
>> As the things stored in databases grow, we're going to start
>> needing to think about database objects that 4 bytes of size can't
>> describe.  People are already storing video in lo and bytea fields.
>> To date, the sizes of media files have never trended downward.
>
> I always find these requests puzzling.  Is it really useful to store
> the  data for a jpeg, video file or a 10GB tar ball in a database
> column?  

It is if you need transaction semantics.  Think medical records, etc.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Extending varlena

From
Andrew Chernow
Date:
David Fetter wrote:
> On Mon, Aug 18, 2008 at 06:09:13PM -0400, Andrew Chernow wrote:
>> David Fetter wrote:
>>> Folks,
>>>
>>> As the things stored in databases grow, we're going to start
>>> needing to think about database objects that 4 bytes of size can't
>>> describe.  People are already storing video in lo and bytea fields.
>>> To date, the sizes of media files have never trended downward.
>> I always find these requests puzzling.  Is it really useful to store
>> the  data for a jpeg, video file or a 10GB tar ball in a database
>> column?  
> 
> It is if you need transaction semantics.  Think medical records, etc.
> 
> Cheers,
> David.

I see that, although developing the middleware between db and fs is 
rather trivial.  I think that is the puzzling part.  It just feels awkward 
to "me" to just stuff it in the db.  You can do more by distributing.

Anyways (back on topic), I am in favor of removing limits from any 
section of the database ... not just your suggestion.  The end-user 
application should impose limits.

Andrew



Re: Extending varlena

From
Josh Berkus
Date:
Andrew,

> I always find these requests puzzling.  Is it really useful to store the
> data for a jpeg, video file or a 10GB tar ball in a database column?

Some people find it useful.  Because LOs are actually easier to manage in 
PG than in most other DBMSes, right now that's a significant source of 
PostgreSQL adoption.  I'd like to encourage those users by giving them 
more useful LO features.

-- 
--Josh

Josh Berkus
PostgreSQL
San Francisco


Re: Extending varlena

From
Jeff Davis
Date:
> I always find these requests puzzling.  Is it really useful to store the 
> data for a jpeg, video file or a 10GB tar ball in a database column? 

One use case is that it can use the existing postgresql protocol, and
does not require extra filesystem mounts, extra error handling, and
other complexity.

Regards,
Jeff Davis



Re: Extending varlena

From
Tom Lane
Date:
Andrew Chernow <ac@esilo.com> writes:
> Anyways (back on topic), I am in favor of removing limits from any 
> section of the database ... not just your suggestion.  The end-user 
> application should impose limits.

That's nice as an abstract principle, but there are only so many hours
in the day, so we need to prioritize which limits we're going to get rid
of.  The 4-byte limit on individual Datum sizes does not strike me as a
limit that's going to be significant for practical use any time soon.
(I grant David's premise that people will soon want to work with objects
that are larger than that --- but not that they'll want to push them
around as indivisible, store-and-fetch-as-a-unit field values.)
        regards, tom lane


Re: Extending varlena

From
Andrew Chernow
Date:
Tom Lane wrote:
> Andrew Chernow <ac@esilo.com> writes:
>> Anyways (back on topic), I am in favor of removing limits from any 
>> section of the database ... not just your suggestion.  The end-user 
>> application should impose limits.
> 
> That's nice as an abstract principle, but there are only so many hours
> in the day, so we need to prioritize which limits we're going to get rid
> of.  The 4-byte limit on individual Datum sizes does not strike me as a
> limit that's going to be significant for practical use any time soon.
> (I grant David's premise that people will soon want to work with objects
> that are larger than that --- but not that they'll want to push them
> around as indivisible, store-and-fetch-as-a-unit field values.)
> 
>             regards, tom lane
> 
> 

Yeah, my comments were overly general.  I wasn't suggesting attention be 
put on one limit over another.  I was only saying that the act of 
removing a limit (of which many are arbitrary) is most often a good one.

andrew


Re: Extending varlena

From
Andrew Chernow
Date:
Jeff Davis wrote:
>> I always find these requests puzzling.  Is it really useful to store the 
>> data for a jpeg, video file or a 10GB tar ball in a database column? 
> 
> One use case is that it can use the existing postgresql protocol, 

So can what I am suggesting.  How about a user-defined C function in the 
backend that talks to the fs and uses SPI to sync info with a record? 
Now the operation is behind a transaction.  Yet, one must handle fs 
orphans from evil crash cases.
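
A rough sketch of that shape (store_blob and blob_meta are illustrative
names, and quoting and error handling are skimped):

#include "postgres.h"
#include "fmgr.h"
#include "executor/spi.h"
#include "utils/builtins.h"

PG_MODULE_MAGIC;
PG_FUNCTION_INFO_V1(store_blob);

/*
 * Write the payload to the filesystem, then record it via SPI so the
 * metadata insert rides the surrounding transaction.  The file write
 * itself is not transactional -- a crash between the two steps leaves
 * exactly the fs orphan mentioned above.
 */
Datum
store_blob(PG_FUNCTION_ARGS)
{
    char  *path = text_to_cstring(PG_GETARG_TEXT_PP(0));
    bytea *data = PG_GETARG_BYTEA_PP(1);
    char   sql[1024];
    FILE  *f = fopen(path, "wb");

    if (f == NULL)
        ereport(ERROR, (errmsg("could not open \"%s\"", path)));
    fwrite(VARDATA_ANY(data), 1, VARSIZE_ANY_EXHDR(data), f);
    fclose(f);

    SPI_connect();
    /* blob_meta is a hypothetical table; quoting omitted for brevity */
    snprintf(sql, sizeof sql,
             "INSERT INTO blob_meta (path, bytes) VALUES ('%s', %lu)",
             path, (unsigned long) VARSIZE_ANY_EXHDR(data));
    SPI_execute(sql, false, 0);
    SPI_finish();

    PG_RETURN_BOOL(true);
}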

Just one solution, but other more creative cats may have better ideas. 
The point is, it can be done without too much effort.  A little TLC :)

Andrew


Re: Extending varlena

From
Tom Lane
Date: Mon, Aug 18, 2008 19:31:04 -0400
David Fetter <david@fetter.org> writes:
> On Mon, Aug 18, 2008 at 04:22:56PM -0400, Tom Lane wrote:
>> The main things I think we'd need to consider besides just the
>> access API are
>> 
>> - permissions features (more than "none" anyway)

> Would ROLEs work, or are you thinking of the per-row and per-column
> access controls people sometimes want?

Well, obviously roles are the entities that receive permissions, but
on what do we base granting permissions to LOs?

With the current model that a LO is an independent entity that is merely
referenced (or not) by OIDs in the database, it seems like we'd have to
grant/revoke permissions to individual LOs, identified by OID; which
sure seems messy to me.  People don't really want to name their LOs
by OID anyway --- it's just a convention that's forced on them by the
current implementation.

I was kinda wondering about something closer to the TOAST model, where
a blob is only referenceable from a value that's in a table field;
and that value encapsulates the "name" of the blob in some way that
needn't even be user-visible.  This'd greatly simplify the
cleanup-dead-objects problem, and we could avoid addressing the
permissions problem at all, since regular SQL permissions on the table
would serve fine.  But it's not clear what regular SQL fetch and update
behaviors should be like for such a thing.  (Fetching or storing the
whole blob value is right out, IMHO.)  ISTR hearing of concepts roughly
like this in other DBs --- does it ring a bell for anyone?
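
Concretely, the field might hold nothing more than a small fixed-size
reference, something like (purely a sketch, not an existing structure):

#include "postgres.h"           /* for uint64 */

typedef struct BlobRef
{
    uint64  blob_id;            /* internal name, never user-visible */
    uint64  nbytes;             /* full blob length, 64 bits */
} BlobRef;

SELECT would hand back just the reference, the bytes would be reached
through a stream API keyed on blob_id, and reclaiming dead blobs would
reduce to reaping blob_ids that no stored reference mentions.
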
        regards, tom lane


Re: Extending varlena

From
Tom Lane
Date:
David Fetter <david@fetter.org> writes:
> On Mon, Aug 18, 2008 at 06:09:13PM -0400, Andrew Chernow wrote:
>> I always find these requests puzzling.  Is it really useful to store
>> the  data for a jpeg, video file or a 10GB tar ball in a database
>> column?  

> It is if you need transaction semantics.  Think medical records, etc.

The basic problem with outside-the-DB data storage is keeping it in sync
with your inside-the-DB metadata.  In a slowly changing dataset you can
probably get away with external storage, but if there's lots of updates
then allowing the DB to manage the storage definitely makes life easier.

This is not to say that you want SQL-style operations on the blobs;
in fact I think you probably don't, which is why I was pointing to
a LO-style API.
        regards, tom lane


Re: Extending varlena

From
David Fetter
Date:
On Mon, Aug 18, 2008 at 07:31:04PM -0400, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
> > On Mon, Aug 18, 2008 at 04:22:56PM -0400, Tom Lane wrote:
> >> The main things I think we'd need to consider besides just the
> >> access API are
> >> 
> >> - permissions features (more than "none" anyway)
> 
> > Would ROLEs work, or are you thinking of the per-row and per-column
> > access controls people sometimes want?
> 
> Well, obviously roles are the entities that receive permissions, but
> on what do we base granting permissions to LOs?
> 
> With the current model that a LO is an independent entity that is merely
> referenced (or not) by OIDs in the database, it seems like we'd have to
> grant/revoke permissions to individual LOs, identified by OID; which
> sure seems messy to me.  People don't really want to name their LOs
> by OID anyway --- it's just a convention that's forced on them by the
> current implementation.
> 
> I was kinda wondering about something closer to the TOAST model, where
> a blob is only referenceable from a value that's in a table field;
> and that value encapsulates the "name" of the blob in some way that
> needn't even be user-visible.

This vaguely reminds me of Sybase's hidden primary keys.

> This'd greatly simplify the
> cleanup-dead-objects problem, and we could avoid addressing the
> permissions problem at all, since regular SQL permissions on the table
> would serve fine.  But it's not clear what regular SQL fetch and update
> behaviors should be like for such a thing.  (Fetching or storing the
> whole blob value is right out, IMHO.)  ISTR hearing of concepts roughly
> like this in other DBs --- does it ring a bell for anyone?

Informix has some pretty good blob-handling:

http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.sqlr.doc/sqlrmst101.htm

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Extending varlena

From
"Gregory Williamson"
Date:
David Fetter wrote:

<...>
>
> > This'd greatly simplify the
> > cleanup-dead-objects problem, and we could avoid addressing the
> > permissions problem at all, since regular SQL permissions on the table
> > would serve fine.  But it's not clear what regular SQL fetch and update
> > behaviors should be like for such a thing.  (Fetching or storing the
> > whole blob value is right out, IMHO.)  ISTR hearing of concepts roughly
> > like this in other DBs --- does it ring a bell for anyone?
>
> Informix has some pretty good blob-handling:
>
> http://publib.boulder.ibm.com/infocenter/idshelp/v10/index.jsp?topic=/com.ibm.sqlr.doc/sqlrmst101.htm

Agreed.  I used Informix a few years back in a system that scanned both
sides of multi-page financial documents; we stored them in Informix
blobs, which IIRC could be tuned to a given number of bytes.  We found
that 90% of our images fit in a given size, and since Informix raw disk
access let it move the whole blob in a single pass, it was quite fast,
and gave us all the warmth and fuzziness of ACID functionality.  But we
didn't fetch parts of the BLOB -- metadata lived in its own table.
There is/was an Illustra/Informix blade which lets you, in theory, do
some processing of images (indexing), but that seems like a very
specialized case.

Greg Williamson
Senior DBA
DigitalGlobe

Re: Extending varlena

From
Simon Riggs
Date:
On Mon, 2008-08-18 at 16:22 -0400, Tom Lane wrote:
> David Fetter <david@fetter.org> writes:
> > What would need to happen for the next jump up from where varlena is
> > now, to 8 bytes?
> 
> Dealing with upwards-of-4GB blobs as single Datums isn't remotely sane,
> and won't become so in the near (or even medium) future.  So I don't
> see the point of doing all the work that would be involved in making
> this go.
> 
> What would make more sense is to redesign the large-object stuff to be
> somewhat modern and featureful, and provide stream-access APIs (think
> lo_read, lo_seek, etc) that allow offsets wider than 32 bits.  The main
> things I think we'd need to consider besides just the access API are
> 
> - permissions features (more than "none" anyway)
> - better management of orphaned objects (obsoleting vacuumlo)
> - support > 16TB of large objects (maybe partition pg_largeobject?)
> - dump and restore probably need improvement to be practical for such
>   large data volumes

Sounds like a good list.

Probably also using a separate Sequence to allocate numbers rather than
using up all the Oids on LOs would be a good plan.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Extending varlena

From
Tom Lane
Date: Mon, Aug 18, 2008 23:43 -0400
Simon Riggs <simon@2ndQuadrant.com> writes:
> Probably also using a separate Sequence to allocate numbers rather than
> using up all the Oids on LOs would be a good plan.

Well, assuming that your Large Objects are actually Large, you aren't
going to need as many OIDs as all that ;-)

However: I was chewing on this point a few days ago, and it seemed to me
that essentially duplicating the functionality of the OID generator
wasn't likely to be a win.  What seems more practical is to extend the
internal next-OID counter to 64 bits, and allow callers to get either
the full 64 bits or just the lowest 32 bits depending on what they need.
This change would actually be entirely transparent to 32-bit callers,
and the extra cycles to manage a 64-bit counter would surely be lost in
the noise compared to acquiring/releasing OidGenLock.
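
In other words, something like this (illustrative only, not the real
generator code):

#include "postgres.h"
#include "access/transam.h"     /* FirstNormalObjectId */

static uint64 next_oid_counter = FirstNormalObjectId;

/* the real thing would hold OidGenLock around the increment */
static uint64
GetNewOid64(void)
{
    return next_oid_counter++;
}

static Oid
GetNewOid32(void)
{
    return (Oid) GetNewOid64();     /* low 32 bits, as today */
}
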
        regards, tom lane


Re: Extending varlena

From
Simon Riggs
Date:
On Mon, 2008-08-18 at 23:43 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > Probably also using a separate Sequence to allocate numbers rather than
> > using up all the Oids on LOs would be a good plan.
> 
> Well, assuming that your Large Objects are actually Large, you aren't
> going to need as many OIDs as all that ;-)
> 
> However: I was chewing on this point a few days ago, and it seemed to me
> that essentially duplicating the functionality of the OID generator
> wasn't likely to be a win.  What seems more practical is to extend the
> internal next-OID counter to 64 bits, and allow callers to get either
> the full 64 bits or just the lowest 32 bits depending on what they need.
> This change would actually be entirely transparent to 32-bit callers,
> and the extra cycles to manage a 64-bit counter would surely be lost in
> the noise compared to acquiring/releasing OidGenLock.

Sounds very cool.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support



Re: Extending varlena

From
Jeremy Drake
Date:
On Mon, 18 Aug 2008, Tom Lane wrote:

> What would make more sense is to redesign the large-object stuff to be
> somewhat modern and featureful, and provide stream-access APIs (think
> lo_read, lo_seek, etc) that allow offsets wider than 32 bits.

A few years ago, I was working on such a project for a company I used to
work for.  The company changed directions shortly thereafter, and the
project was dropped, but perhaps the patch might still be useful as a
starting point for someone else.

The original patch is
http://archives.postgresql.org/pgsql-hackers/2005-09/msg01026.php, and the
advice I was working on implementing was in
http://archives.postgresql.org/pgsql-hackers/2005-09/msg01063.php

I am attaching the latest version of the patch I found around.  As it was
almost 3 years ago, I am a little fuzzy on where I left off, but I do
remember that I was trying to work through the suggestions Tom Lane gave
in that second linked email.  I would recommend discarding the libpq
changes, since that seemed to not pass muster.

Note that this patch was against 8.0.3.  There only seem to be a few
issues applying it to the current head, but I haven't really dug into them
to see how difficult it would be to update.  Luckily, the large object
code is fairly slow-moving, so there aren't too many conflicts.  One thing
I did notice is that it looks like someone extracted one of the functions
I wrote in this patch and applied it as a 32-bit version.  Good for them.
I'm glad someone got some use out of this project, and perhaps more use
will come of it.



-- 
At the source of every error which is blamed on the computer you will
find at least two human errors, including the error of blaming it on
the computer.

Re: Extending varlena

From
Magnus Hagander
Date:
Josh Berkus wrote:
> Andrew,
> 
>> I always find these requests puzzling.  Is it really useful to store the
>> data for a jpeg, video file or a 10GB tar ball in a database column?
> 
> Some people find it useful.  Because LOs are actually easier to manage in 
> PG than in most other DBMSes, right now that's a significant source of 
> PostgreSQL adoption.  I'd like to encourage those users by giving them 
> more useful LO features.

Given that they don't replicate with most (any?) of the currently
available replication solutions, the fact that they are easy to use
becomes irrelevant fairly quickly to larger installations in my experience.

But the interface *is* nice, so if we could fix that, it would be very good.

//Magnus


Re: Extending varlena

From
Magnus Hagander
Date:
Simon Riggs wrote:
> On Mon, 2008-08-18 at 16:22 -0400, Tom Lane wrote:
>> David Fetter <david@fetter.org> writes:
>>> What would need to happen for the next jump up from where varlena is
>>> now, to 8 bytes?
>> Dealing with upwards-of-4GB blobs as single Datums isn't remotely sane,
>> and won't become so in the near (or even medium) future.  So I don't
>> see the point of doing all the work that would be involved in making
>> this go.
>>
>> What would make more sense is to redesign the large-object stuff to be
>> somewhat modern and featureful, and provide stream-access APIs (think
>> lo_read, lo_seek, etc) that allow offsets wider than 32 bits.  The main
>> things I think we'd need to consider besides just the access API are
>>
>> - permissions features (more than "none" anyway)
>> - better management of orphaned objects (obsoleting vacuumlo)
>> - support > 16TB of large objects (maybe partition pg_largeobject?)
>> - dump and restore probably need improvement to be practical for such
>>   large data volumes
> 
> Sounds like a good list.
> 
> Probably also using a separate Sequence to allocate numbers rather than
> using up all the Oids on LOs would be a good plan.

The ability to partition the large object store would not suck either...
For backup/recovery purposes mainly.

//Magnus



Re: Extending varlena

From
Peter Eisentraut
Date:
On Monday, 18 August 2008, Tom Lane wrote:
> - permissions features (more than "none" anyway)
> - better management of orphaned objects (obsoleting vacuumlo)
> - support > 16TB of large objects (maybe partition pg_largeobject?)
> - dump and restore probably need improvement to be practical for such
>   large data volumes

If you replace the third point by "maybe partition TOAST tables", replace 
large object handle by TOAST pointer, and create an API to work on TOAST 
pointers, how are the two so much different?  And why should they be?  I can 
see that there are going to be needs to access large data with interfaces 
that are not traditional SQL, but at least the storage handling could be the 
same.  That way you would solve the first two points and others for free.


Re: Extending varlena

From
"Kevin Grittner"
Date:
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote: 
> I was kinda wondering about something closer to the TOAST model, where
> a blob is only referenceable from a value that's in a table field;
> and that value encapsulates the "name" of the blob in some way that
> needn't even be user-visible.  This'd greatly simplify the
> cleanup-dead-objects problem, and we could avoid addressing the
> permissions problem at all, since regular SQL permissions on the table
> would serve fine.  But it's not clear what regular SQL fetch and update
> behaviors should be like for such a thing.  (Fetching or storing the
> whole blob value is right out, IMHO.)  ISTR hearing of concepts roughly
> like this in other DBs --- does it ring a bell for anyone?

It'd probably be good to have methods parallel to the JDBC API within
the implementation:
http://java.sun.com/javase/6/docs/api/java/sql/Blob.html
http://java.sun.com/javase/6/docs/api/java/sql/Clob.html
http://java.sun.com/javase/6/docs/api/java/sql/NClob.html
http://java.sun.com/javase/6/docs/api/java/sql/ResultSet.html#getBlob(int)
http://java.sun.com/javase/6/docs/api/java/sql/ResultSet.html#getClob(int)
http://java.sun.com/javase/6/docs/api/java/sql/ResultSet.html#getNClob(int)
http://java.sun.com/javase/6/docs/api/java/sql/ResultSet.html#updateBlob(int,%20java.sql.Blob)
http://java.sun.com/javase/6/docs/api/java/sql/ResultSet.html#updateBlob(int,%20java.io.InputStream)
http://java.sun.com/javase/6/docs/api/java/sql/ResultSet.html#updateBlob(int,%20java.io.InputStream,%20long)
(similar for CLOB & NCLOB)

Reading a blob value gives you an object which lets you perform the
stream-based operations against the blob.  To set a blob value on an
insert, you prepare a statement and then link the stream to the blob
-- the insertRow method sucks the data from the stream.  To set a blob
on an update, you use an updateable cursor (or maybe a prepared
statement) to do the same.  You can set a lob from another lob
directly in SQL.  I assume we'd want to support streams directly inline
in the protocol, as well as support functions to convert between
datums and streams (for, say, tiny little 2MB or 10MB values), and
files and streams (kinda like COPY).
-Kevin


Re: Extending varlena

From
"Stephen R. van den Berg"
Date:
Peter Eisentraut wrote:
>If you replace the third point by "maybe partition TOAST tables", replace 
>large object handle by TOAST pointer, and create an API to work on TOAST 
>pointers, how are the two so much different?  And why should they be?  I can 
>see that there are going to be needs to access large data with interfaces 
>that are not traditional SQL, but at least the storage handling could be the 
>same.  That way you would solve the first two points and others for free.

I've long wondered why toast and large object storage are not one and
the same (indeed).  It seems like a natural solution to marry the two.
-- 
Sincerely,          Stephen R. van den Berg.
E-mails should be like a lady's skirt:
Long enough to cover the subject, and short enough to be interesting.


Re: Extending varlena

From
Tom Lane
Date:
> Peter Eisentraut wrote:
>> If you replace the third point by "maybe partition TOAST tables", replace 
>> large object handle by TOAST pointer, and create an API to work on TOAST 
>> pointers, how are the two so much different?  And why should they be?

The reason they should be different is that (IMHO anyway) you don't want
the default behavior of SELECT * FROM ... to include pulling back the
entire contents of the blob.  Indeed, we *can't* have that be the
behavior, unless we want to go back to the proposal that started this
thread of making the entire system safe for multi-gigabyte datums.

It's certainly possible that the underlying implementation could be
just TOAST, but we need some other API at the SQL level.
        regards, tom lane