Thread: Auto-delete large objects when referencing row is deleted

Auto-delete large objects when referencing row is deleted

From
higepon
Date:
Hi.
I found a TODO item "pg_dump Add dumping of comments on index columns"
for large objects.
and want to write a patch for it.

I examined contrib/lo which offers this functionality.

I have two plans, can anybody give me some advice on these?

Plan A:
   (1) Define a new type for large object   PostgreSQL stores blob columns as Oid type.   But to delete large objects,
we have to distinguish large objects as being different from Oid
 
type objects.
   So a new type for large object, say "lo type" should be defined on
pg_type.h .
   For compatibility with Oid values, we may add some code to pg_cast.h .
   (2) Define a trigger on create table   When "create table" has large object type columns,   PostgreSQL define a
tirggerwhich delete large object on
 
update/delete the row.   We can use the trigger defined contrib/lo for this purpose.
   We may add hook code on "create table" at
/src/backend/commands/tablecmds.c .
   (3) truncate/drop table   On truncate table or drop table, the trigger can't be used.   We have to handle this
case.

Plan B:
   This plan is quite simple.   Merge contrib/vacumelo to VACUUM.
   (1) Define a new type for large object     Same as Plan A. (unnecessary ?)
   (2) Delete on VACUUM     On VACUUM, we check all tables which have "lo type",     and delete un-referenced large
objects.    We may add a option "deleting large objects automatically" on VACUUM.
 


Best regards,

-----
Taro Minowa(Higepon)

Cybozu Labs, Inc.    

http://www.monaos.org/
http://code.google.com/p/mosh-scheme/


Re: Auto-delete large objects when referencing row is deleted

From
Tom Lane
Date:
higepon <higepon@gmail.com> writes:
> I found a TODO item "pg_dump Add dumping of comments on index columns"
> for large objects.
> and want to write a patch for it.

I assume you mean $subject and not what you wrote here.

> I examined contrib/lo which offers this functionality.

Yes.  I wonder why the TODO item is there at all, when contrib/lo
already solves it in a perfectly reasonable way.
        regards, tom lane


Re: Auto-delete large objects when referencing row is deleted

From
higepon
Date:
Hi

> I assume you mean $subject and not what you wrote here.

Yes. Sorry it's my mistake.

>> I examined contrib/lo which offers this functionality.
>
> Yes.  I wonder why the TODO item is there at all, when contrib/lo
> already solves it in a perfectly reasonable way.

As a user of database, I think contrib/lo is not the best way.
Because it's not a part of core PostgreSQL, users may forget to use them.
Or it is a little messy to use.
So I think we need to implement *Auto* delete functionality in PostgreSQL core.

Cheers.

-----
Taro Minowa(Higepon)

Cybozu Labs, Inc.

http://www.monaos.org/
http://code.google.com/p/mosh-scheme/

On Mon, Apr 6, 2009 at 11:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> higepon <higepon@gmail.com> writes:
>> I found a TODO item "pg_dump Add dumping of comments on index columns"
>> for large objects.
>> and want to write a patch for it.
>
> I assume you mean $subject and not what you wrote here.
>
>> I examined contrib/lo which offers this functionality.
>
> Yes.  I wonder why the TODO item is there at all, when contrib/lo
> already solves it in a perfectly reasonable way.
>
>                        regards, tom lane
>


Re: Auto-delete large objects when referencing row is deleted

From
Itagaki Takahiro
Date:
higepon <higepon@gmail.com> wrote:

> As a user of database, I think contrib/lo is not the best way.
> Because it's not a part of core PostgreSQL, users may forget to use them.
> Or it is a little messy to use.
> So I think we need to implement *Auto* delete functionality in PostgreSQL core.

(It would be a rare case, but) A large object might be referenced
by two or more rows because LO interface is split into two steps;
allocating oid and storing data for it. The oid could be stored in
two or more places and auto deletion would break such usecases.

BTW, bytea and TOASTing would works perfectly as you expected.
Why don't you use bytea instead of large objects? In other words,
what you want actually is not LO improvement but efficient TOASTing, no?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center




Re: Auto-delete large objects when referencing row is deleted

From
higepon
Date:
Hi.
> <itagaki.takahiro@oss.ntt.co.jp> wrote:

> (It would be a rare case, but) A large object might be referenced
> by two or more rows because LO interface is split into two steps;
> allocating oid and storing data for it. The oid could be stored in
> two or more places and auto deletion would break such usecases.

Indeed. We have to check the references on garbage collecting.
For this reason, my plan B "Merge contrib/vacumelo to VACUUM" is
easier to implement.

> BTW, bytea and TOASTing would works perfectly as you expected.
> Why don't you use bytea instead of large objects? In other words,
> what you want actually is not LO improvement but efficient TOASTing, no?

First of all, what I want is to contribute to PostgreSQL community by
writing patches.
And picked this issue up from TODO list.
So if there's no need to do about this issue, I will pick up another one :-)

I've checked some articles about "Oid large objects vs bytea".
If I understand them correctly, I think
both large objects and bytea are useful for different situations.
Neither of them are obsolete.

Is there no need to do about this issue?

Cheers.

==========================================================
the negative points of bytea: memory hungry. slower than large objects. 1GB limitation.

the negative points of large objects: ghost problem (no auto-delete). unable to store number of objects greater than
2^32.
==========================================================

-----
Taro Minowa(Higepon)

Cybozu Labs, Inc.    

http://www.monaos.org/
http://code.google.com/p/mosh-scheme/


On Wed, Apr 8, 2009 at 1:15 PM, Itagaki Takahiro
<itagaki.takahiro@oss.ntt.co.jp> wrote:
>
> higepon <higepon@gmail.com> wrote:
>
>> As a user of database, I think contrib/lo is not the best way.
>> Because it's not a part of core PostgreSQL, users may forget to use them.
>> Or it is a little messy to use.
>> So I think we need to implement *Auto* delete functionality in PostgreSQL core.
>
> (It would be a rare case, but) A large object might be referenced
> by two or more rows because LO interface is split into two steps;
> allocating oid and storing data for it. The oid could be stored in
> two or more places and auto deletion would break such usecases.
>
> BTW, bytea and TOASTing would works perfectly as you expected.
> Why don't you use bytea instead of large objects? In other words,
> what you want actually is not LO improvement but efficient TOASTing, no?
>
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center
>
>
>