Large objects [BLOB] again - general howto - Mailing list pgsql-general

From Lada 'Ray' Lostak
Subject Large objects [BLOB] again - general howto
Date
Msg-id 007901c3b1a5$47bacad0$0d01a8c0@utopia
Whole thread Raw
In response to array iteration?  (CSN <cool_screen_name90001@yahoo.com>)
List pgsql-general
Hi there :)

    Someone asks about performance of Large objects [LO] in PgSql [PG]. It
was interesting for me, because I didn't work yet with them yet and I will
have to soon. I tried search the web, doc and mailinglists, but I didn't
found any adequate reply. I would be happy, of someone, who have experimence
with them (or who know PG internals :) can make it clean. I think I am not
alone, who is interested in this topic.

    In past time, where row data was limited by single tupple (8K-32K), LO
was really needed, if you wanted to store more than 8K per row. Older
implementation issues are described in documentation.

    Today, there is 1G limit per row. Is there reason to use LO instead of
storing simple 'text' into database table ? (let's ignore 2G limit on LO,
which can make LO interesting in some cases :)Documentation says it is
obsolvete. But documentation also doesn't tell anything about performance,
suggestion about using, etc.

    By 'access' to binary file, I understand:

        - reading (and then working with its contents on server side)
        - sending (read and send as-is to client, without workaround)
        - updating (from contents which I get from somewhere)

    LO means storing file using large objects,
    column means storing file as 'text' column (standard way).

    Let's say I am having 'file' - smaller (tenhts of Kb) - let's say some
JPG image (photo of beer :) The file is 'readed' as is and passed to client
(browser, app, whatever). In this case, would be probably better to store
file as normal column, because it is easier to access it. This is one of
typical use. Every time I will write to that image, SQL parser need to parse
all datas, allocate proper memory, .... I think, real differnece will not be
too big, since picture having just tenths of Kb.

    But let's have the same picture, but in pretty fine resolution for
printing. File size will be megabytes and more [let's say 10Mb]. Ofcourse,
file should be stored in DB because of data integrity. In this case, we will
need pretty much memory while reading/updating. All the time. And if I will
get many similar requests - and they will go directly to DB, not throw sime
'local cache', it can raise problem.

Using 'text' column as binaru file facts [Am I right ???]:

    - time which SQL parser need to parse statements will be
      significant (adequate its size)
    - memory which SQL parser need will be at least equal to filesize
    - while fetching datas, file will be stored in memory while sending
      result
    - when I will receive result, PG will 'release' sql result memory,
      but file stay on 'my' side (still eating memory)
    - while updating file, we need 2*file_size memory at least (one for my
      generated SQLstatement, second for PgSql internal process - at least)
    - using char/text type is not 'binary safe', so, I need to convert it
      (next  additional memory/CPU)

On the other side, LO facts [Am I right ???]:

    - datas to file are not going throw SQL parser (no additional CPU/TIME
      requirements)
    - PG uses pg_largeobjects table to store the files 'directly'
    - datas are transfered in blocks - there is no nned for memory
    - because of binary safe functions, no additional converts needed
    - when I am updatinng the file, it doesn't need to be in memory - can be
      copied throw buffer from local stored file and PG also doesn't need to
      have its contents in memory (I don't count buffer/tupple/...)

Please, let me know, where I am true and where I am false.

If the PG at the end uses LO throw standard SQL, then everigth I wrote is
false and there is really no reason for use LO. Then I will miss that
feature..... So, I hope it works the assumed way :)

So, my questions:

    1. How big are claims of parser while analyzing 'megs' statement ? I
        mean CPU/Memory in comparing with LO ?
    2. How PG works with LO internally - throw SQL statements or directly ?
    3. When I will access LO file, it will be accessed really thow 'buffers'
        or PgSql somehow prepare file 'as-is' and then access it ?

If the things works as I assume (but I don't know where to verify right now,
except sources and mailing list, so, I selected faster way :)

Conclusion:

    a. For smaller files the difference between CPU/MEMORY usage is small
        LO: is not really needed for this case
        column: wins, because of consistent/portable work

    b. Files, which contents are needed 'as is' (full size) and work with
        them on server side it is also not too big difference
        LO: files here will probably save additional CPU, but only while
            updating contents
        column: wins, CPU difference only while updating

    c. Larger files, not processed on server-side (typically sent to client)
        LO: wins, save a LOTT OF memory while reading and also memory
            and cpu time while updating
        column: lost, easts redundant CPU & MEMORY, disadvantages of
                LO are lees then saved resources

    d. Larger files, processed on server @ local fs - like [c]

I wish new PgSql 7.4 documentation (as far I know it was not yet released)
contain small chapter / paragraph at LO about this. If someone is new to PG
(like me), there is no enough information about 'using' LO.

Thank you for corrections and comments about LO/columns binary storing.

I hope this email belong to general discussion. If no, I am sorry and let me
know - I will repost.

Best regards,
Lada 'Ray' Lostak
Unreal64 Develop group
http://www.orcave.com
http://www.unreal64.net


--------------------------------------------------------------------------
In the 1960s you needed the power of two C64s to get a rocket
to the moon. Now you need a machine which is a vast number
of times more powerful just to run the most popular GUI.



pgsql-general by date:

Previous
From: "Durai"
Date:
Subject: Re: Multiuser testing -- PostgreSQL 7.3.4
Next
From: "Ben-Nes Michael"
Date:
Subject: Re: defferable update & unique