Thread: Patch for better large objects support
Hello all,

Here is a patch attached which implements the following strategy of large object handling:
1. There's a new system table: pg_largeobject.
2. All large objects are stored inside files, not relations.
3. Large objects are stored in dir $PGDATA/base/$DATABASE/lo in hashed dirs.
   Hashing density can be tuned in config.h.in.
4. For search in pg_largeobject we always use index scan.

That's all. This strategy is better than the existing one because:
1. The pg_class, pg_index and pg_attribute system tables are not bloated with large objects.
2. The hashed dir mechanism is faster than lots of files in one dir.
3. Files are much faster than relations. Also we save lots of space on indices.

What is not done:
1. Dirs are not removed if there are no LOs inside. (Is it necessary?)

--
Sincerely Yours,
Denis Perchine

----------------------------------
E-Mail: dyp@perchine.com
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
----------------------------------

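For readers who have not opened the patch, the hashed-directory layout from point 3 could look roughly like the sketch below. It is an illustration only: the function name lo_path, the modulus-based hash and the constant LO_HASH_BUCKETS are assumptions made here, not details taken from the patch, which only says that the hashing density is set in config.h.in.

```python
from pathlib import PurePosixPath

# Hypothetical fan-out. The patch tunes its hashing density in config.h.in;
# the real setting's name and value are not given in this thread.
LO_HASH_BUCKETS = 256


def lo_path(pgdata: str, database: str, oid: int,
            buckets: int = LO_HASH_BUCKETS) -> PurePosixPath:
    """Return an illustrative on-disk path for the large object `oid`.

    Objects are spread over `buckets` subdirectories of
    $PGDATA/base/$DATABASE/lo so that no single directory grows too large.
    """
    if oid < 0:
        raise ValueError("oid must be non-negative")
    bucket = oid % buckets  # assumed hash: a plain modulus on the OID
    return (PurePosixPath(pgdata) / "base" / database / "lo"
            / f"{bucket:02x}" / str(oid))
```

With this assumed scheme, object 18231 in database mydb lands in bucket 18231 % 256 = 55, i.e. subdirectory 37 (hex). A larger bucket count gives fewer files per directory, at the cost of more directories.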
Attachment
Will anybody want to use this when TOAST comes to be?

Denis Perchine wrote:
> Here is a patch attached which implement the following strategy of large object handling:
> 1. There's new system table: pg_largeobject.
> 2. All large objects are stored inside files not relations.
> 3. Large objects stored in dir $PGDATA/base/$DATABASE/lo in hashed dirs.
>    Hashing density can be tuned in config.h.in.
> 4. For search in pg_largeobject we always use index scan.
>
> ------------------------------------------------------------------------
> Name: pgsql.lo.new.patch.gz
> Type: application/x-gzip
> Encoding: base64

Chris Bitmead <chris@bitmead.com> writes:
> Will anybody want to use this when TOAST comes to be?

I think it's worth doing --- existing users of large objects will probably not want to move all their code overnight. The core developers have mostly felt they had more pressing problems to work on, but if someone wants to contribute a better implementation of large objects I have no objection...

>> Here is a patch attached which implement the following strategy of large object handling:
>> 1. There's new system table: pg_largeobject.
>> 2. All large objects are stored inside files not relations.
>> 3. Large objects stored in dir $PGDATA/base/$DATABASE/lo in hashed dirs.
>>    Hashing density can be tuned in config.h.in.
>> 4. For search in pg_largeobject we always use index scan.

However, that is the wrong way to go about it. The really fatal objection is that you have given up transactional semantics for large objects --- if you don't keep the data in a relation then how will you roll back an aborted write?

A lesser objection is that you are working hard to create a poor substitute for indexing that Postgres already has perfectly good mechanisms for. Having to tune a config parameter by guessing how many LOs I will have doesn't strike me as attractive.

The approach that's been discussed in the past is to retain the existing relation-based storage mechanism for large objects, but to combine all the LOs of a database into one relation by adding an additional column that is the LO identifier number. By indexing this single relation on LO identifier + chunk number (two columns), access should be just as fast as for any other scheme you might come up with.

regards, tom lane

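The single-relation layout described above, and the rollback behaviour that file-based storage gives up, can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than PostgreSQL, and the table name lo_chunks, the column names, the helper functions and the tiny chunk size are all assumptions made for illustration. Nothing here is taken from the patch or from the PostgreSQL sources.

```python
import sqlite3

CHUNK_SIZE = 4  # hypothetical and deliberately tiny, to keep the example readable

conn = sqlite3.connect(":memory:")
# One table holds every large object; the two-column primary key acts as
# the (LO identifier, chunk number) index.
conn.execute(
    "CREATE TABLE lo_chunks ("
    " loid INTEGER NOT NULL,"
    " chunkno INTEGER NOT NULL,"
    " data BLOB NOT NULL,"
    " PRIMARY KEY (loid, chunkno))"
)


def lo_write(conn, loid, payload):
    """Replace the contents of object `loid`, split into fixed-size chunks."""
    conn.execute("DELETE FROM lo_chunks WHERE loid = ?", (loid,))
    chunks = [
        (loid, offset // CHUNK_SIZE, payload[offset:offset + CHUNK_SIZE])
        for offset in range(0, len(payload), CHUNK_SIZE)
    ]
    conn.executemany("INSERT INTO lo_chunks VALUES (?, ?, ?)", chunks)


def lo_read(conn, loid):
    """Reassemble object `loid` by scanning its chunks in index order."""
    rows = conn.execute(
        "SELECT data FROM lo_chunks WHERE loid = ? ORDER BY chunkno", (loid,)
    )
    return b"".join(row[0] for row in rows)


lo_write(conn, 1, b"hello world")
conn.commit()

# An aborted write: because the chunks are ordinary rows, rolling back the
# transaction restores the previous contents with no extra bookkeeping.
lo_write(conn, 1, b"garbage")
conn.rollback()
```

After the rollback, lo_read(conn, 1) returns the originally committed bytes. With one file per object, the application or the backend would have to undo the overwritten file itself.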
> Will anybody want to use this when TOAST comes to be?

1. There is no TOAST at the moment.
2. For really large objects TOAST will be really inefficient; for quite small (< 64K), the other way around.

--
Sincerely Yours,
Denis Perchine

> > Will anybody want to use this when TOAST comes to be?
>
> 1. There's no any TOAST at the moment.
> 2. For really large objects TOAST will be really inefficient for quite small < 64K other
> way around.

This stuff is going into 7.1, and TOAST will be there. Also TOAST will not be inefficient for small objects.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

> This stuff is going into 7.1, and TOAST will be there. Also TOAST will
> not be inefficient for small objects.

Sorry. Maybe my English is a little bit dumb. I said that TOAST will be inefficient for really large objects, and that for small objects it will be much more efficient.

BTW, when is 7.1 planned?

--
Sincerely Yours,
Denis Perchine

> Sorry. Maybe my english is a little bit dumb. I said that TOAST will be inefficient
> for really large objects. And for small objects it will be much more efficient.

I think we will have good performance up to 1 gig.

> BTW, when 7.1 is planned?

August.

--
Bruce Momjian

> > Will anybody want to use this when TOAST comes to be?
>
> 1. There's no any TOAST at the moment.

I wasn't implying the patch is bad. I was only wondering out loud whether TOAST will be a superset of large objects.

> 2. For really large objects TOAST will be really inefficient for quite small < 64K other
> way around.

Why will TOAST be inefficient for really large objects?

> I wasn't implying the patch is bad. Only wondering out load if toast
> will be a super-set of large objects.

Not exactly.

> Why will toast be inefficient for really large objects?

Because the data is stored in relations, and there's extra overhead for managing them. Just look at Jan's mail in [HACKERS] for a better description of the difference.

--
Sincerely Yours,
Denis Perchine

Denis Perchine wrote:
> > Why will toast be inefficient for really large objects?
>
> Because data is stored in relations, and there's extra overhead for managing them.
> Just look on Jan's mail in [HACKERS] for better description of the difference.

According to Tom's last email, current LOs are stored in relations now.

--
Chris Bitmead
mailto:chris@bitmead.com

> According to Tom's last email, current LOs are stored in relations now.

That was a part of my patch. :-)) Anyway, I will try to implement one-table LO storage for transaction integrity. Read Jan's mail.

--
Sincerely Yours,
Denis Perchine