Re: storing binary files / memory limit - Mailing list pgsql-php
From | Tomas Vondra |
---|---|
Subject | Re: storing binary files / memory limit |
Date | |
Msg-id | 48F11D6C.1000209@fuzzy.cz |
In response to | Re: storing binary files / memory limit (Andrew McMillan <andrew@catalyst.net.nz>) |
Responses | Re: storing binary files / memory limit |
List | pgsql-php |
> On Sat, 2008-10-11 at 18:26 +0200, Tomas Vondra wrote:
>> Is there any other way to solve storing of large files in PostgreSQL?
>
> No, not until there are functions that let you fopen() on the bytea
> column. :-(
>
> Also, your "... || more_column" solution will generate large numbers of
> dead rows and require frequent vacuuming.

Yes, I'm aware of this, and it is one more deficiency of that solution. But inserting the data is not very frequent (a few inserts a day), especially compared to reading (several hundred SELECTs).

>> - Optimization is a serious criterion, as is reliability.
>
> If you're using tables with very large columns, make sure you index on
> every other column you're going to access it by. If PostgreSQL has to
> resort to full-table scans on this table, and especially with a low
> memory constraint, you could easily end up with it doing an on-disk sort
> on a copy of the data.

Well, I believe the schema is quite well optimized, so this shouldn't be a problem. All the necessary indexes are in place, so the queries execute quite fast (we have comprehensive performance logging of queries, so I'm fairly sure about this). But I'm not sure what you mean by a 'low memory constraint' - the memory limit I've been talking about is purely a PHP feature, so it affects inserting / reading and escaping / unescaping the data. Anyway, 99% of the queries returning multiple rows do not return BYTEA columns - those columns are referenced only in queries returning a single row, so there is no problem with sorting or with the memory used by the postmaster process.

> If you *have* to store it in a table column (and it really isn't the
> most efficient way of doing it) then create a separate table for it
> which is just SERIAL + data.

I *want* to store it in a table column, because I'm not able to come up with a better solution. As I understand it, creating a separate table for the binary data is almost the same as using plain BYTEA columns. We don't have problems with performance (thanks to proper indexes and the TOAST architecture), and it does not solve the problem I described in my original post (hitting the PHP memory limit).

OK, it would save a little bit of space when using the 'append' strategy described in my previous post (data = data || new_data), but such a table usually consists of a small amount of metadata plus a large amount of binary data. So the space wasted by storing the metadata in dead rows is negligible compared to the space wasted by the BYTEA data in those dead rows.

For example, one of our tables is used to store documents, so it has roughly this structure:

```
Documents (
    id          SERIAL,
    created     DATE,
    title       VARCHAR(128),
    author      VARCHAR(128),
    description TEXT,   -- short (usually less than 2000 characters)
    data        BYTEA
)
```

The 'data' column may hold several megabytes, so the metadata occupies less than 1% of the row.

If you know a better way to store binary data, please describe it here. The only other way I'm aware of is a LOB - it solves the problem of inserting the data (by streaming), but it has other disadvantages (no referential integrity, etc.).

But creating a separate table for the binary data does look interesting under one condition - the file is stored split into chunks. That basically mimics the LOB storage (the pg_largeobject table). It still does not support streaming the data, but it solves the problem with the PHP memory limit and does not create a large number of dead rows.

regards
Tomas
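For illustration, here is a minimal sketch of that chunked-storage idea in PHP. The table name, column names, chunk size and connection string are all made up for this example and are not taken from the thread; adjust them to the actual schema.

```php
<?php
// Minimal sketch of the chunked-storage idea, assuming a hypothetical
// table created roughly like this (names are made up for illustration):
//
//   CREATE TABLE file_data (
//       file_id INTEGER NOT NULL REFERENCES documents(id),
//       seq     INTEGER NOT NULL,
//       chunk   BYTEA   NOT NULL,
//       PRIMARY KEY (file_id, seq)
//   );

$db = pg_connect('dbname=test');    // adjust the connection string

$chunkSize = 1024 * 1024;           // 1 MB per chunk, well below the PHP memory limit

// INSERT: read and escape the file one chunk at a time, so only a single
// chunk is ever held in memory (instead of the whole file).
function store_file($db, $fileId, $path, $chunkSize)
{
    $fp  = fopen($path, 'rb');
    $seq = 0;
    while (!feof($fp)) {
        $chunk = fread($fp, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $escaped = pg_escape_bytea($db, $chunk);
        pg_query($db, "INSERT INTO file_data (file_id, seq, chunk)
                       VALUES (" . (int)$fileId . ", " . $seq++ . ", '" . $escaped . "')");
    }
    fclose($fp);
}

// SELECT: fetch the chunks one by one and send them straight to the client,
// again keeping only one chunk in memory at a time.
function send_file($db, $fileId)
{
    $res = pg_query_params($db,
        'SELECT chunk FROM file_data WHERE file_id = $1 ORDER BY seq',
        array($fileId));
    while ($row = pg_fetch_row($res)) {
        echo pg_unescape_bytea($row[0]);
    }
}
?>
```

This is only a sketch: error handling and wrapping the inserts in a transaction are omitted, and the read loop still pulls each row through pg_fetch_row, so it is not true streaming - but each chunk stays comfortably under the PHP memory limit, which was the original problem.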