Thread: Large Objects
A couple of questions and concerns about BLOBs.

I'm wondering what kind of performance hit BLOBs have on a large database. I'm currently working on implementing a database with images. I guess I'm looking for some numbers showing the performance. Note that it would be for a web database project.
Hello Steven,

Tuesday, September 19, 2000, 11:00:02 PM, you wrote:

SL> A couple of questions and concerns about BLOBs.
SL> I'm wondering what kind of performance hit BLOBs have on a large
SL> database. I'm currently working on implementing a database with
SL> images. I guess I'm looking for some numbers showing the performance.
SL> Note that it would be for a web database project.

It depends heavily on the number of objects you have. The current LO implementation has a really huge drawback: it creates 2 files for each LO. When you have > 10000 LOs you are lost.

I made a patch to provide LO support inside one table. Without this patch it took me approximately 30 minutes to insert 10000 LOs; with the patch, only 5 minutes were necessary.

Also, you should understand that any such measurements depend heavily on the size of the LOs you will have. It would be easier for you to just write some small test cases on random data and measure them.

-- 
Best regards,
 dyp                            mailto:dyp@perchine.com
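As a rough illustration of the kind of test case suggested above, here is a minimal sketch using the client-side libpq large-object API (lo_creat, lo_open, lo_write). It is not Denis's patch or test code; the connection string, object count, and object size are placeholders to adjust for your own data, and large-object calls must run inside a transaction.

------------
/* lo_timing.c -- rough sketch: time inserting N large objects of random data.
 * Assumes libpq and its large-object API; adjust the conninfo, N_OBJECTS,
 * and OBJ_SIZE placeholders for your own setup.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <libpq-fe.h>
#include <libpq/libpq-fs.h>		/* INV_READ / INV_WRITE */

#define N_OBJECTS	10000
#define OBJ_SIZE	8192

int
main(void)
{
	PGconn	   *conn = PQconnectdb("dbname=test");	/* placeholder conninfo */
	char		buf[OBJ_SIZE];
	int			i;
	time_t		start,
				stop;

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		return 1;
	}

	for (i = 0; i < OBJ_SIZE; i++)
		buf[i] = rand() & 0xff;			/* random payload */

	start = time(NULL);
	for (i = 0; i < N_OBJECTS; i++)
	{
		Oid		loid;
		int		fd;

		PQclear(PQexec(conn, "BEGIN"));	/* LO calls need a transaction */
		loid = lo_creat(conn, INV_READ | INV_WRITE);
		fd = lo_open(conn, loid, INV_WRITE);
		lo_write(conn, fd, buf, OBJ_SIZE);
		lo_close(conn, fd);
		PQclear(PQexec(conn, "COMMIT"));
	}
	stop = time(NULL);

	printf("%d objects of %d bytes in %ld seconds\n",
		   N_OBJECTS, OBJ_SIZE, (long) (stop - start));
	PQfinish(conn);
	return 0;
}
------------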
Steven Lacroix asks:
> ... what kind of performance hit BLOBs have on a large database ...
> Note that it would be for a web database project.

I haven't tried this with PostgreSQL, but I ran some experiments to compare the speed of access for large numbers of BLOBs stored in a single MySQL table and in a directory in the local filesystem.

For more than 10,000 BLOBs, the database was a clear winner. That is, it took less time to locate and read an 8k-16k BLOB from the MySQL database than from the local disk. For smaller numbers of BLOBs, the filesystem was faster. This is pretty much what you should expect: reading a file through a system call should be faster than doing a network operation to request a BLOB. Conversely, Unix filesystems store directories as unsorted lists, which are a lot slower to search than the database's structured indexes.

I'm currently working on a web publishing system that stores all content as BLOBs in a back-end MySQL database; performance is very good so far. I'll be giving a talk about this at BSDCon in Monterey next month with more details...

- Tim Kientzle
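For the filesystem side of such a comparison, a sketch like the following just times raw open/read/close of many small files. The directory layout and file naming are placeholders, not Tim's actual benchmark; the database side could reuse a loop like the earlier libpq sketch with lo_read instead of lo_write.

------------
/* fs_timing.c -- rough sketch: time reading N small files back from disk.
 * The file naming scheme and sizes are placeholders.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N_FILES		10000
#define BUF_SIZE	16384

int
main(void)
{
	char		path[256];
	char		buf[BUF_SIZE];
	int			i;
	time_t		start = time(NULL);

	for (i = 0; i < N_FILES; i++)
	{
		FILE	   *fp;

		snprintf(path, sizeof(path), "/data/blobs/%d.bin", i);	/* placeholder layout */
		fp = fopen(path, "rb");
		if (fp == NULL)
			continue;					/* skip missing files in this sketch */
		fread(buf, 1, sizeof(buf), fp);	/* read up to one BLOB's worth */
		fclose(fp);
	}

	printf("read %d files in %ld seconds\n",
		   N_FILES, (long) (time(NULL) - start));
	return 0;
}
------------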
On Wed, Sep 20, 2000 at 03:45:39PM -0700, Tim Kientzle wrote:
> Steven Lacroix asks:
> > ... what kind of performance hit BLOBs have on a large database ...
> > Note that it would be for a web database project.
>
> I haven't tried this with PostgreSQL, but I ran some experiments
> to compare the speed of access for large numbers of BLOBs stored
> in a single MySQL table and in a directory in the local filesystem.
>
> For more than 10,000 BLOBs, the database was a clear winner.
> That is, it took less time to locate and read an 8k-16k BLOB
> from the MySQL database than from the local disk. For smaller
> numbers of BLOBs, the filesystem was faster. This is pretty
> much what you should expect: reading a file through a system
> call should be faster than doing a network operation to request
> a BLOB. Conversely, Unix filesystems store directories as unsorted
> lists, which are a lot slower to search than the database's
> structured indexes.

Wow, can anyone confirm this (with Postgres preferably)? In talking with some developers at my old job, they all agreed that storing large pieces of data (1k < x < 16k) was significantly faster on the FS than in the DB. They were running Postgres 6.5 w/ JDBC on Solaris. I didn't ask them what gave them this idea, but are they incorrect?

The project I'm working on currently needs to store a large number of relatively small pieces of data (from 256 chars to a rarely used maximum of ~64k, average ~1-2k). Should this data be stored as an LO, or as a file on the FS? There could be up to 1 or 2 million of these objects (although something around 100,000 would be more common).

Also, will this change with 7.1? I've heard the 8k row-length limit is going away, so I suppose I could store this data directly in the DB (as a text column, I guess). At that point, would this be the best choice?

Thanks in advance,

Neil

-- 
Neil Conway <neilconway@home.com>
Get my GnuPG key from: http://klamath.dyndns.org/mykey.asc
Encrypted mail welcomed

They that give up liberty to obtain a little temporary security deserve
neither liberty nor safety.
        -- Benjamin Franklin
Neil Conway wrote:
> > a BLOB. Conversely, Unix filesystems store directories as unsorted
> > lists, which are a lot slower to search than the database's
> > structured indexes.
>
> Wow, can anyone confirm this (with Postgres preferably)? In talking
> with some developers at my old job, they all agreed that storing large
> pieces of data (1k < x < 16k) was significantly faster on the FS than

I believe he's talking about storing all files in the same directory, which is simply The Wrong Way for a number of reasons. When saving a large number of external files, we use a sub-dir structure in the form /data/f4/d3/12/myfile.bin in order to spread the number of files in a tree pseudorandomly. This is the same approach used by the Squid webcache.

-- 
Alessio F. Bragadini		alessio@albourne.com
APL Financial Services		http://village.albourne.com
Nicosia, Cyprus			phone: +357-2-755750

"It is more complicated than you think"
		-- The Eighth Networking Truth from RFC 1925
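As a sketch of that kind of layout (the base directory, fan-out, and hash are illustrative, not Alessio's actual code), a simple hash of the object's key can be turned into nested two-character directory names:

------------
/* hashed_path.c -- rough sketch of a Squid-style hashed directory layout.
 * The point is only to spread many files across many small directories
 * instead of one huge one; the fan-out and hash are placeholders.
 */
#include <stdio.h>

/* A simple string hash (djb2); any well-mixed hash will do. */
static unsigned long
hash_key(const char *key)
{
	unsigned long h = 5381;

	while (*key)
		h = h * 33 + (unsigned char) *key++;
	return h;
}

/* Build e.g. "/data/f4/d3/12/myfile.bin" from a key and a filename. */
static void
make_path(char *out, size_t outlen, const char *key, const char *filename)
{
	unsigned long h = hash_key(key);

	snprintf(out, outlen, "/data/%02lx/%02lx/%02lx/%s",
			 (h >> 16) & 0xff, (h >> 8) & 0xff, h & 0xff, filename);
}

int
main(void)
{
	char	path[512];

	make_path(path, sizeof(path), "myfile", "myfile.bin");
	printf("%s\n", path);		/* something like /data/7c/9a/41/myfile.bin */
	return 0;
}
------------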
In effect, this turns the filesystem into a "poor man's" balanced tree. The RDBMS gives you a "rich man's" balanced tree, but along with the overhead of the RDBMS.

cheers
--e--

On Thu, 21 Sep 2000 15:20:39 +0300, Alessio Bragadini wrote:
> Neil Conway wrote:
> > > a BLOB. Conversely, Unix filesystems store directories as unsorted
> > > lists, which are a lot slower to search than the database's
> > > structured indexes.
> >
> > Wow, can anyone confirm this (with Postgres preferably)? In talking
> > with some developers at my old job, they all agreed that storing large
> > pieces of data (1k < x < 16k) was significantly faster on the FS than
>
> I believe he's talking about storing all files in the same directory,
> which is simply The Wrong Way for a number of reasons. When saving a
> large number of external files, we use a sub-dir structure in the form
> /data/f4/d3/12/myfile.bin in order to spread the number of files in a
> tree pseudorandomly. This is the same approach used by the Squid
> webcache.
Neil Conway <nconway@klamath.dyndns.org> writes:
>> For more than 10,000 BLOBs, the database was a clear winner.
>> That is, it took less time to locate and read an 8k-16k BLOB
>> from the MySQL database than from the local disk. For smaller
>> numbers of BLOBs, the filesystem was faster.

> Wow, can anyone confirm this (with Postgres preferably)? In talking
> with some developers at my old job, they all agreed that storing large
> pieces of data (1k < x < 16k) was significantly faster on the FS than
> in the DB. They were running Postgres 6.5 w/ JDBC on Solaris.

This is undoubtedly right for existing Postgres releases, because currently PG stores each BLOB as an independent table --- and just to add insult to injury, creates an index for it :-(. So you actually have *two* added files in the DB directory per BLOB. Needless to say, this does not scale real well to large numbers of BLOBs.

Denis Perchine has done the legwork to store BLOBs in a more reasonable fashion, ie all together in one big (but indexed!) table. His patch needs review by a core person before we'll consider it safe to commit, but I have that as a personal "must do for 7.1". So those MySQL results should also apply to Postgres in 7.1 and later.

You may be wondering "why bother, given TOAST?" (TOAST is long-tuple support, for those who haven't been paying attention.) Although TOAST should make many uses of BLOBs obsolete, there will be lots of applications that don't get updated right away (if ever), and still some where BLOBs are a natural, unforced solution. So I think it's worth capturing this performance improvement. It's just a shame we didn't get round to it sooner when BLOBs were the only game in town.

			regards, tom lane
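For readers wondering what "one big (but indexed!) table" looks like in practice, here is a hedged sketch of the general shape of such a layout; the struct and column names are illustrative, not necessarily what the committed patch uses. Each large object is split into fixed-size pages keyed by (object OID, page number), and reads and writes locate pages through a unique index on that pair.

------------
/* Rough sketch of a single-table large-object layout, as described above.
 * Names and types are illustrative only; the idea is that every LO page
 * lives in one shared table, found via an index on (loid, pageno).
 */
#include "postgres.h"			/* Oid, int32, bytea */

typedef struct LargeObjectPage
{
	Oid		loid;		/* which large object this page belongs to */
	int32	pageno;		/* page number within that object, starting at 0 */
	bytea	data;		/* this page's slice of the object's contents */
} LargeObjectPage;

/* A unique index on (loid, pageno) turns "read bytes [offset, offset+len)"
 * into a few index probes instead of a scan, and the whole LO store stays
 * in two files (heap + index) no matter how many objects exist.
 */
------------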
On Thu, Sep 21, 2000 at 10:30:31PM -0400, Tom Lane wrote:
> You may be wondering "why bother, given TOAST?" (TOAST is long-tuple
> support, for those who haven't been paying attention.) Although TOAST
> should make many uses of BLOBs obsolete, there will be lots of
> applications that don't get updated right away (if ever), and still some
> where BLOBs are a natural, unforced solution. So I think it's worth
> capturing this performance improvement. It's just a shame we didn't get
> round to it sooner when BLOBs were the only game in town.

On that note, since it seems that the large objects I've just started using aren't likely to go away right this moment (and their interface works for what I'm doing anyway), what is a good way to write backend functions in C that deal with large objects? I would like to write some backend routines that do things like resize, watermark, and cache graphics files. If I pass an Oid into a function, how would I access its data?

-- 
Adam Haberlach            | A billion hours ago, human life appeared on
adam@newsnipple.com       | earth.  A billion minutes ago, Christianity
http://www.newsnipple.com | emerged.  A billion Coca-Colas ago was
'88 EX500                 | yesterday morning. -1996 Coca-Cola Ann. Rpt.
> currently PG stores each BLOB as an independent table --- and just to
> add insult to injury, creates an index for it :-(. So you actually have
> *two* added files in the DB directory per BLOB. Needless to say, this
> does not scale real well to large numbers of BLOBs.
>
> Denis Perchine has done the legwork to store BLOBs in a more reasonable
> fashion, ie all together in one big (but indexed!) table. His patch
> needs review by a core person before we'll consider it safe to commit,
> but I have that as a personal "must do for 7.1". So those MySQL results
> should also apply to Postgres in 7.1 and later.

I was experimenting with the patches in question, after applying them to postgres-7.0.2. I'm encountering a nasty problem when inserting my own huge (several-megabyte) files into the relation that I can't quite figure out. Apparently, what's going on is that the inserts/updates into the pg_largeobject table result in buffers getting leaked (as reported by BufferPoolCheckLeak()). Eventually, importing a file exhausts the block free list.

I think there were a couple of places where a ReleaseBuffer was needed but not called... I've modified his code for inv_write_existing to look more like:

------------
	buffer = InvalidBuffer;
	while (index_getnext(...))
	{
		if (buffer != InvalidBuffer)
		{
			ReleaseBuffer(buffer);
			buffer = InvalidBuffer;
		}
		heap_fetch(..., &buffer);
	}
	index_endscan(...);

	{
		/* build a block of data */
		...
		ntup = heap_modifytuple(...);
		heap_update(...);
		heap_freetuple(ntup);
	}

	if (buffer != InvalidBuffer)
	{
		WriteBuffer(buffer);
	}
------------

The leak check seems to indicate that the pages getting leaked are all dirty pages, which seems slightly odd to me. If I call it at the end of this loop function (which gets called over and over again during an import), I end up getting something that looks like this:

-------
lockNum=56, flags=0x85, refcount=1 1)
NOTICE:  Buffer Leak: [058] (freeNext=-3, freePrev=-3, relname=pg_largeobject, blockNum=9, flags=0x85, refcount=1 1)
NOTICE:  Buffer Leak: [059] (freeNext=-3, freePrev=-3, relname=pg_largeobject, blockNum=58, flags=0x85, refcount=1 3)
lockNum=49, flags=0x85, refcount=1 1)
NOTICE:  Buffer Leak: [061] (freeNext=-3, freePrev=-3, relname=pg_largeobject, blockNum=51, flags=0x85, refcount=1 3)
NOTICE:  Buffer Leak: [062] (freeNext=-3, freePrev=-3, relname=pg_largeobject, blockNum=37, flags=0x85, refcount=1 3)
ERROR:  out of free buffers: time to abort !
-----

The number of buffers leaked increases monotonically as the import proceeds... I can't quite pin down where the buffers in question are being pinned...

-David

----------------------[=========]------------------------
David T. McWherter                udmcwher@mcs.drexel.edu

Many Myths are based on truth
		-- Spock, "The Way to Eden", stardate 5832.3
On Fri, 22 Sep 2000, David McWherter wrote:
> I was experimenting with the patches in question, after applying
> them to postgres-7.0.2. I'm encountering a nasty problem when
> inserting my own huge (several-megabyte) files into the relation
> that I can't quite figure out.

I have already fixed this. There were some quite obvious mistakes in the code. I will send a patch on the 27th of September, when I am back home.

Denis.
Adam Haberlach <adam@newsnipple.com> writes:
> On that note, since it seems that the large objects I've
> just started using aren't likely to go away right this moment (and
> their interface works for what I'm doing anyway), what is a good way
> to write backend functions in C that deal with large objects?

Call the routines in src/backend/libpq/be-fsstubs.c, which are the backend support for lo_read and so forth. Note that these routines are changing to "new function manager" call style in 7.1, so you will need to tweak your calling code when you update...

			regards, tom lane
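A rough sketch of the kind of backend routine Adam describes, written against the 7.0-style interface Tom points at. The signatures of lo_open/loread/lo_close are assumed from the 7.0-era sources and should be checked against be-fsstubs.c in your tree (and rewritten in new-fmgr style for 7.1); process_image() is a hypothetical placeholder for the resize/watermark work, not a PostgreSQL function.

------------
/* lo_backend_sketch.c -- rough sketch of a backend routine that reads a
 * large object's bytes by Oid via the be-fsstubs routines mentioned above.
 * Signatures are assumptions based on the 7.0-era old-style fmgr interface;
 * verify them against be-fsstubs.c and libpq/libpq-fs.h before relying on
 * this. process_image() is hypothetical.
 */
#include "postgres.h"
#include "libpq/libpq-fs.h"		/* INV_READ */
#include "libpq/be-fsstubs.h"	/* lo_open, loread, lo_close prototypes */

extern void process_image(const char *bytes, int len);	/* hypothetical */

int
watermark_image(Oid lobjId)
{
	int				fd;
	struct varlena *chunk;

	fd = lo_open(lobjId, INV_READ);		/* open by Oid, read-only */
	if (fd < 0)
		elog(ERROR, "watermark_image: cannot open large object %u", lobjId);

	/* read one chunk; a real routine would loop until loread returns no data */
	chunk = loread(fd, 8192);
	process_image(VARDATA(chunk), VARSIZE(chunk) - VARHDRSZ);

	lo_close(fd);
	return 0;
}
------------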