Thread: Storing binary data.
Hi. Please be a bit patient.. I'm quite new to PostgreSQL. I'd like some advise on storing binary data in the database. Currently I have about 300.000 320.000 Bytes "Bytea" records in the database. It works quite well but I have a feeling that it actually is slowing the database down on queries only related to the surrounding attributes. The "common" solution, I guess would be to store them in the filesystem instead, but I like to have them just in the database it is nice clean database and application design and if I can get PostgreSQL to "not cache" them then it should be excactly as fast i assume. The binary data is not a part of most queries in the database only a few explicitly written to fetch them and they are not accessed very often. What do people normally do? Thanks, Jesper -- ./Jesper Krogh, jesper@krogh.cc Jabber ID: jesper@jabbernet.dk
On Wednesday 11 Aug 2004 7:59 pm, Jesper Krogh wrote: > The "common" solution, I guess would be to store them in the filesystem > instead, but I like to have them just in the database it is nice clean > database and application design and if I can get PostgreSQL to "not > cache" them then it should be excactly as fast i assume. You can normalize them so that a table contains an id and a bytea column only. Te main table will contain all the other attributes and a mapping id. That way you will have only the main table cached. You don't have to go to filesystem for this, I hope. HTH Shridhar
Jesper Krogh <jesper@krogh.cc> writes: > I'd like some advise on storing binary data in the database. > Currently I have about 300.000 320.000 Bytes "Bytea" records in the > database. It works quite well but I have a feeling that it actually is > slowing the database down on queries only related to the surrounding > attributes. > The "common" solution, I guess would be to store them in the filesystem > instead, but I like to have them just in the database it is nice clean > database and application design and if I can get PostgreSQL to "not > cache" them then it should be excactly as fast i assume. > The binary data is not a part of most queries in the database only a few > explicitly written to fetch them and they are not accessed very often. > What do people normally do? Nothing. If the bytea values are large enough to be worth splitting out, Postgres will actually do that for you automatically. Wide field values get pushed into a separate "toast" table, and are not fetched by a query unless the value is specifically demanded. You can control this behavior to some extent by altering the storage option for the bytea column (see ALTER TABLE), but the default choice is usually fine. If you just want to see whether anything is happening, do a VACUUM VERBOSE on that table and note the amount of storage in the toast table as compared to the main table. regards, tom lane
I gmane.comp.db.postgresql.performance, skrev Shridhar Daithankar: > On Wednesday 11 Aug 2004 7:59 pm, Jesper Krogh wrote: > > The "common" solution, I guess would be to store them in the filesystem > > instead, but I like to have them just in the database it is nice clean > > database and application design and if I can get PostgreSQL to "not > > cache" them then it should be excactly as fast i assume. > > You can normalize them so that a table contains an id and a bytea column only. > Te main table will contain all the other attributes and a mapping id. That > way you will have only the main table cached. > > You don't have to go to filesystem for this, I hope. Further benchmarking. I tried to create a table with the excact same attributes but without the binary attribute. It didn't change anything, so my idea that it should be the binary-stuff that sloved it down was wrong. I have a timestamp column in the table that I sort on. Data is ordered over the last 4 years and I select based on a timerange, I cannot make the query-planner use the index on the timestamp by itself but if I "set enable_seqscan = false" the query time drops by 1/3 (from 1.200 msec to about 400 msec). I cannot figure out why the query-planner chooses wrong. NB: It's postgresql 7.4.3 -- ./Jesper Krogh, jesper@krogh.cc Jabber ID: jesper@jabbernet.dk
On Thu, 12 Aug 2004 02:29 am, Jesper Krogh wrote: > I gmane.comp.db.postgresql.performance, skrev Shridhar Daithankar: > > On Wednesday 11 Aug 2004 7:59 pm, Jesper Krogh wrote: > > > The "common" solution, I guess would be to store them in the filesystem > > > instead, but I like to have them just in the database it is nice clean > > > database and application design and if I can get PostgreSQL to "not > > > cache" them then it should be excactly as fast i assume. > > > > You can normalize them so that a table contains an id and a bytea column only. > > Te main table will contain all the other attributes and a mapping id. That > > way you will have only the main table cached. > > > > You don't have to go to filesystem for this, I hope. > > Further benchmarking. > > I tried to create a table with the excact same attributes but without > the binary attribute. It didn't change anything, so my idea that it > should be the binary-stuff that sloved it down was wrong. > > I have a timestamp column in the table that I sort on. Data is ordered > over the last 4 years and I select based on a timerange, I cannot make > the query-planner use the index on the timestamp by itself but if I "set > enable_seqscan = false" the query time drops by 1/3 (from 1.200 msec to > about 400 msec). > > I cannot figure out why the query-planner chooses wrong. > NB: It's postgresql 7.4.3 > Please post explain analyze of the query. I would guess you are using now() is your query, which is not optimized perfectly by the planner, so you end up with problems. But if you post explain analyze people will be able to tell you what the problem is. Maybe on with seqscan on, and one with it off. Regards Russell Smith