Large objects [BLOB] again - general howto - Mailing list pgsql-general
From | Lada 'Ray' Lostak
---|---
Subject | Large objects [BLOB] again - general howto
Date |
Msg-id | 007901c3b1a5$47bacad0$0d01a8c0@utopia
In response to | array iteration? (CSN <cool_screen_name90001@yahoo.com>)
List | pgsql-general
Hi there :)

Someone asked about the performance of large objects [LO] in PgSql [PG]. It was interesting to me, because I haven't worked with them yet but will have to soon. I searched the web, the docs and the mailing lists, but didn't find an adequate answer. I would be happy if someone who has experience with them (or who knows the PG internals :) could clear it up. I don't think I am alone in being interested in this topic.

In the past, when row data was limited to a single tuple (8K-32K), LO was really needed if you wanted to store more than 8K per row. The older implementation issues are described in the documentation. Today there is a 1G limit per row. Is there still a reason to use LO instead of storing a simple 'text' value in a table column? (Let's ignore the 2G limit on LO, which can make LO interesting in some cases :) The documentation says LO is obsolete, but it says nothing about performance, suggestions on usage, etc.

By 'accessing' a binary file, I mean:

- reading (and then working with its contents on the server side)
- sending (read and send as-is to the client, without further processing)
- updating (from contents I get from somewhere else)

'LO' means storing the file as a large object; 'column' means storing the file in a 'text' column (the standard way).

Let's say I have a file - small (tens of KB) - say a JPG image (photo of beer :). The file is read as-is and passed to the client (browser, app, whatever). In this case it would probably be better to store the file as a normal column, because it is easier to access. This is one typical use. Still, every time I write that image, the SQL parser needs to parse all the data, allocate the proper memory, .... I think the real difference will not be too big, since the picture is only tens of KB.

But take the same picture in a resolution fine enough for printing. The file size will be megabytes or more [say 10 MB]. Of course, the file should be stored in the DB because of data integrity.
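To make the parser-overhead point concrete: with a 'text' column the binary data must first be escaped into the SQL statement text, which both inflates the statement and forces the parser to scan every byte of it. A rough sketch of my own (using an old-style octal escape scheme for non-printable bytes; the exact format PG expects differs in details and versions):

```python
import os

def bytea_escape(data: bytes) -> str:
    """Escape binary data for embedding in a SQL string literal
    (octal-escape style, roughly like the old bytea escape format)."""
    out = []
    for b in data:
        if b == 0x5C:            # backslash: doubled twice for SQL literal
            out.append("\\\\\\\\")
        elif 0x20 <= b <= 0x7E:  # printable ASCII passes through
            out.append(chr(b))
        else:                    # everything else -> \\ooo octal escape
            out.append("\\\\%03o" % b)
    return "".join(out)

raw = os.urandom(10_000)       # stand-in for a small JPEG
escaped = bytea_escape(raw)
# the escaped statement text is noticeably larger than the raw file,
# and all of it has to pass through the SQL parser
print(len(raw), len(escaped))
```

For a 10 MB print-resolution image the same multiplication applies, which is exactly the scenario discussed below.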
In this case we will need quite a lot of memory while reading/updating - all the time. And if I get many similar requests, and they go directly to the DB rather than through some local cache, it can become a problem.

Facts about using a 'text' column for a binary file [am I right???]:

- the time the SQL parser needs to parse the statement will be significant (proportional to its size)
- the memory the SQL parser needs will be at least equal to the file size
- while fetching data, the file is held in memory while the result is sent
- when I receive the result, PG will release the SQL result memory, but the file stays on 'my' side (still eating memory)
- while updating the file, we need at least 2*file_size of memory (one copy for my generated SQL statement, a second for PG's internal processing - at least)
- the char/text types are not binary safe, so I need to convert/escape the data (additional memory/CPU)

On the other side, facts about LO [am I right???]:

- the file data does not go through the SQL parser (no additional CPU/time cost)
- PG uses the pg_largeobject table to store the files 'directly'
- data is transferred in blocks - there is no need to hold the whole file in memory
- because the functions are binary safe, no additional conversion is needed
- when I update the file, it doesn't need to be in memory - it can be copied through a buffer from a locally stored file, and PG also doesn't need to hold its contents in memory (not counting buffers/tuples/...)

Please let me know where I am right and where I am wrong. If PG in the end accesses LO through standard SQL, then everything I wrote is false and there really is no reason to use LO - then I will miss that feature... So I hope it works the way I assume :)

So, my questions:

1. How expensive is it for the parser to analyze a multi-megabyte statement? I mean CPU/memory, compared with LO?
2. How does PG work with LO internally - through SQL statements or directly?
3. When I access an LO file, is it really accessed through buffers, or does PG somehow prepare the file 'as-is' and then access it?
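The "transferred in blocks" point above is the key one: with the lo_read()/lo_write() style of access, the client touches the object one chunk at a time, so peak memory is one buffer rather than the whole file. A minimal sketch of that access pattern in plain Python, with io.BytesIO standing in for the large object and the client connection (an illustration of the pattern, not the actual libpq API):

```python
import io

CHUNK = 8192  # one transfer buffer; LOs are stored server-side in small chunks

def stream_copy(src, dst, chunk_size=CHUNK):
    """Copy src to dst block by block, so peak client buffering is
    one chunk - the access pattern block-wise LO reads allow."""
    copied = 0
    while True:
        block = src.read(chunk_size)
        if not block:
            break
        dst.write(block)
        copied += len(block)
    return copied

# stand-ins for a server-side large object and a client connection
large_object = io.BytesIO(b"x" * 10_000_000)  # ~10 MB "file"
client = io.BytesIO()
n = stream_copy(large_object, client)
print(n)  # -> 10000000, yet only 8 KB was ever buffered at once
```

Contrast this with the 'text' column path, where the whole value sits in the statement or result set at once.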
If things work as I assume (I don't know where to verify this right now except the sources and the mailing list, so I chose the faster way :), my conclusion:

a. Smaller files: the difference in CPU/memory usage is small. LO: not really needed here. Column: wins, because of consistent/portable handling.
b. Files whose contents are needed 'as is' (full size) and worked with on the server side: also not too big a difference. LO: will probably save some CPU, but only while updating contents. Column: wins; the CPU difference matters only while updating.
c. Larger files, not processed server-side (typically sent to the client). LO: wins; saves a LOT of memory while reading, and memory and CPU time while updating. Column: loses; it eats redundant CPU and memory, and the disadvantages of LO are smaller than the resources saved.
d. Larger files, processed on the server via the local FS: like [c].

I wish the new PgSql 7.4 documentation (as far as I know it has not been released yet) contained a small chapter/paragraph on LO covering this. For someone new to PG (like me), there is not enough information about 'using' LO.

Thank you for corrections and comments on LO vs. column binary storage. I hope this email belongs on the general list; if not, I am sorry - let me know and I will repost.

Best regards,
Lada 'Ray' Lostak
Unreal64 Develop group
http://www.orcave.com
http://www.unreal64.net

--------------------------------------------------------------------------
In the 1960s you needed the power of two C64s to get a rocket to the moon. Now you need a machine which is a vast number of times more powerful just to run the most popular GUI.