Thread: Patch for better large objects support

Patch for better large objects support

From
Denis Perchine
Date:
Hello all,

Attached is a patch which implements the following strategy for large object handling:
1. There's a new system table: pg_largeobject.
2. All large objects are stored inside files, not relations.
3. Large objects are stored in dir $PGDATA/base/$DATABASE/lo in hashed dirs.
Hashing density can be tuned in config.h.in.
4. For searches in pg_largeobject we always use an index scan.

That's all. This strategy is better than the existing one because:
1. The pg_class, pg_index and pg_attribute system tables are not bloated with large objects.
2. The hashed dir mechanism is faster than lots of files in one dir.
3. Files are much faster than relations. Also we save lots of space on indices.
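
To illustrate the hashed directory layout from point 3 of the strategy, here is a minimal Python sketch. It is not taken from the patch: the bucket count `LO_HASH_BUCKETS`, the modulo hash, and the helper names `lo_path` and `lo_write` are all assumptions made for illustration, and the patch's real hashing scheme and its config.h.in setting may well differ.

```python
import os
import tempfile

# Hypothetical tuning knob, standing in for the density setting in config.h.in.
LO_HASH_BUCKETS = 256


def lo_path(base_dir: str, lo_oid: int, buckets: int = LO_HASH_BUCKETS) -> str:
    """Return the file path for a large object, bucketed by OID modulo `buckets`."""
    bucket = lo_oid % buckets
    return os.path.join(base_dir, "lo", f"{bucket:03d}", str(lo_oid))


def lo_write(base_dir: str, lo_oid: int, data: bytes) -> str:
    """Create the bucket directory on demand and store the object as one file."""
    path = lo_path(base_dir, lo_oid)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    return path


if __name__ == "__main__":
    # A temporary directory stands in for $PGDATA/base/$DATABASE.
    with tempfile.TemporaryDirectory() as base:
        stored = lo_write(base, 70001, b"hello")
        print(os.path.relpath(stored, base))
```

Spreading objects over a fixed number of buckets keeps each directory small, which is the speed argument in point 2, but the bucket count has to be guessed in advance.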

What is not done:
1. Dirs are not removed when there are no LOs left inside. (Is it necessary???)

--
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------

Attachment

Re: Patch for better large objects support

From
Chris Bitmead
Date:
Will anybody want to use this when TOAST comes to be?

Denis Perchine wrote:
>
> Hello all,
>
> Attached is a patch which implements the following strategy for large object handling:
> 1. There's a new system table: pg_largeobject.
> 2. All large objects are stored inside files, not relations.
> 3. Large objects are stored in dir $PGDATA/base/$DATABASE/lo in hashed dirs.
> Hashing density can be tuned in config.h.in.
> 4. For searches in pg_largeobject we always use an index scan.
>
> That's all. This strategy is better than the existing one because:
> 1. The pg_class, pg_index and pg_attribute system tables are not bloated with large objects.
> 2. The hashed dir mechanism is faster than lots of files in one dir.
> 3. Files are much faster than relations. Also we save lots of space on indices.
>
> What is not done:
> 1. Dirs are not removed when there are no LOs left inside. (Is it necessary???)
>
>   ------------------------------------------------------------------------
>                             Name: pgsql.lo.new.patch.gz
>    pgsql.lo.new.patch.gz    Type: application/x-gzip
>                         Encoding: base64

Re: Patch for better large objects support

From
Tom Lane
Date:
Chris Bitmead <chris@bitmead.com> writes:
> Will anybody want to use this when TOAST comes to be?

I think it's worth doing --- existing users of large objects will
probably not want to move all their code overnight.  The core developers
have mostly felt they had more pressing problems to work on, but if
someone wants to contribute a better implementation of large objects
I have no objection...

>> Attached is a patch which implements the following strategy for large object handling:
>> 1. There's a new system table: pg_largeobject.
>> 2. All large objects are stored inside files, not relations.
>> 3. Large objects are stored in dir $PGDATA/base/$DATABASE/lo in hashed dirs.
>> Hashing density can be tuned in config.h.in.
>> 4. For searches in pg_largeobject we always use an index scan.

However, that is the wrong way to go about it.  The really fatal
objection is that you have given up transactional semantics for large
objects --- if you don't keep the data in a relation then how will you
roll back an aborted write?  A lesser objection is that you are working
hard to create a poor substitute for indexing that Postgres already has
perfectly good mechanisms for.  Having to tune a config parameter by
guessing how many LOs I will have doesn't strike me as attractive.

The approach that's been discussed in the past is to retain the existing
relation-based storage mechanism for large objects, but to combine all
the LOs of a database into one relation by adding an additional column
that is the LO identifier number.  By indexing this single relation on
LO identifier + chunk number (two columns), access should be just as
fast as for any other scheme you might come up with.
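
The single-relation layout described above can be sketched as follows. This is only an illustration using Python's built-in sqlite3 module rather than Postgres itself; the table name `lo_chunks`, its column names, and `CHUNK_SIZE` are hypothetical. It shows the two-column key on LO identifier + chunk number, and that an aborted write is rolled back simply because the data lives in a table.

```python
import sqlite3

CHUNK_SIZE = 2048  # hypothetical chunk size, chosen only for illustration


def open_store() -> sqlite3.Connection:
    """Create an in-memory database holding one table for all large objects."""
    conn = sqlite3.connect(":memory:")
    # The two-column primary key doubles as the index on
    # (LO identifier, chunk number).
    conn.execute(
        "CREATE TABLE lo_chunks ("
        " loid INTEGER NOT NULL,"
        " chunkno INTEGER NOT NULL,"
        " data BLOB NOT NULL,"
        " PRIMARY KEY (loid, chunkno))"
    )
    conn.commit()
    return conn


def lo_write(conn: sqlite3.Connection, loid: int, payload: bytes) -> None:
    """Split the payload into fixed-size chunks and insert one row per chunk."""
    rows = [
        (loid, offset // CHUNK_SIZE, payload[offset:offset + CHUNK_SIZE])
        for offset in range(0, len(payload), CHUNK_SIZE)
    ]
    conn.executemany("INSERT INTO lo_chunks VALUES (?, ?, ?)", rows)


def lo_read(conn: sqlite3.Connection, loid: int) -> bytes:
    """Reassemble an object by scanning its chunks in key order."""
    cur = conn.execute(
        "SELECT data FROM lo_chunks WHERE loid = ? ORDER BY chunkno", (loid,)
    )
    return b"".join(row[0] for row in cur)


if __name__ == "__main__":
    conn = open_store()
    lo_write(conn, 1, b"a" * 5000)
    conn.commit()    # object 1 is durable
    lo_write(conn, 2, b"b" * 5000)
    conn.rollback()  # aborted write: object 2 never becomes visible
    print(len(lo_read(conn, 1)), len(lo_read(conn, 2)))
```

No per-installation tuning parameter is needed here: the index handles any number of objects, and rollback comes from the storage layer rather than from extra file-handling code.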

            regards, tom lane

Re: Patch for better large objects support

From
Denis Perchine
Date:
> Will anybody want to use this when TOAST comes to be?

1. There's no TOAST at the moment.
2. For really large objects TOAST will be really inefficient for quite small < 64K other
way around.

--
Sincerely Yours,
Denis Perchine

Re: Patch for better large objects support

From
Bruce Momjian
Date:
> > Will anybody want to use this when TOAST comes to be?
>
> 1. There's no TOAST at the moment.
> 2. For really large objects TOAST will be really inefficient for quite small < 64K other
> way around.

This stuff is going into 7.1, and TOAST will be there.  Also TOAST will
not be inefficient for small objects.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: Patch for better large objects support

From
Denis Perchine
Date:
> > > Will anybody want to use this when TOAST comes to be?
> >
> > 1. There's no TOAST at the moment.
> > 2. For really large objects TOAST will be really inefficient for quite small < 64K other
> > way around.
>
> This stuff is going into 7.1, and TOAST will be there.  Also TOAST will
> not be inefficient for small objects.

Sorry. Maybe my English is a little bit dumb. I said that TOAST will be inefficient
for really large objects. And for small objects it will be much more efficient.

BTW, when is 7.1 planned?

--
Sincerely Yours,
Denis Perchine

Re: Patch for better large objects support

From
Bruce Momjian
Date:
> > > > Will anybody want to use this when TOAST comes to be?
> > >
> > > 1. There's no TOAST at the moment.
> > > 2. For really large objects TOAST will be really inefficient for quite small < 64K other
> > > way around.
> >
> > This stuff is going into 7.1, and TOAST will be there.  Also TOAST will
> > not be inefficient for small objects.
>
> Sorry. Maybe my English is a little bit dumb. I said that TOAST will be inefficient
> for really large objects. And for small objects it will be much more efficient.

I think we will have good performance up to 1 gig.

>
> BTW, when is 7.1 planned?

August.

Re: Patch for better large objects support

From
Chris Bitmead
Date:
> > Will anybody want to use this when TOAST comes to be?
>
> 1. There's no TOAST at the moment.

I wasn't implying the patch is bad, only wondering out loud whether TOAST
will be a superset of large objects.

> 2. For really large objects TOAST will be really inefficient for quite small < 64K other
> way around.

Why will TOAST be inefficient for really large objects?

Re: Patch for better large objects support

From
Denis Perchine
Date:
> > > Will anybody want to use this when TOAST comes to be?
> >
> > 1. There's no TOAST at the moment.
>
> I wasn't implying the patch is bad, only wondering out loud whether TOAST
> will be a superset of large objects.

Not exactly.

> > 2. For really large objects TOAST will be really inefficient for quite small < 64K other
> > way around.
>
> Why will TOAST be inefficient for really large objects?

Because data is stored in relations, and there's extra overhead for managing them.
Just look at Jan's mail on [HACKERS] for a better description of the difference.

--
Sincerely Yours,
Denis Perchine

Re: Patch for better large objects support

From
Chris Bitmead
Date:
Denis Perchine wrote:

> > > 2. For really large objects TOAST will be really inefficient for quite small < 64K other
> > > way around.
> >
> > Why will TOAST be inefficient for really large objects?
>
> Because data is stored in relations, and there's extra overhead for managing them.
> Just look at Jan's mail on [HACKERS] for a better description of the difference.

According to Tom's last email, current LOs are stored in relations now.

--
Chris Bitmead
mailto:chris@bitmead.com

Re: Patch for better large objects support

From
Denis Perchine
Date:
> > > > 2. For really large objects TOAST will be really inefficient for quite small < 64K other
> > > > way around.
> > >
> > > Why will TOAST be inefficient for really large objects?
> >
> > Because data is stored in relations, and there's extra overhead for managing them.
> > Just look at Jan's mail on [HACKERS] for a better description of the difference.
>
> According to Tom's last email, current LOs are stored in relations now.

That was a part of my patch. :-)) Anyway I will try to implement one-table LO storage
for transaction integrity. Read Jan's mail.

--
Sincerely Yours,
Denis Perchine
