Thread: Storing binary data.

Storing binary data.

From

Jesper Krogh

Date:

11 August 2004, 14:49:32

Hi.

Please be a bit patient.. I'm quite new to PostgreSQL.

I'd like some advise on storing binary data in the database.

Currently I have about 300.000 320.000 Bytes "Bytea" records in the
database. It works quite well but I have a feeling that it actually is
slowing the database down on queries only related to the surrounding
attributes.

The "common" solution, I guess would be to store them in the filesystem
instead, but I like to have them just in the database it is nice clean
database and application design and if I can get PostgreSQL to "not
cache" them then it should be excactly as fast i assume.

The binary data is not a part of most queries in the database only a few
explicitly written to fetch them and they are not accessed very often.

What do people normally do?


Thanks, Jesper

--
./Jesper Krogh, jesper@krogh.cc
Jabber ID: jesper@jabbernet.dk

Re: Storing binary data.

From

Shridhar Daithankar

Date:

11 August 2004, 15:33:58

On Wednesday 11 Aug 2004 7:59 pm, Jesper Krogh wrote:
> The "common" solution, I guess would be to store them in the filesystem
> instead, but I like to have them just in the database it is nice clean
> database and application design and if I can get PostgreSQL to "not
> cache" them then it should be excactly as fast i assume.

You can normalize them so that a table contains an id and a bytea column only.
Te main table will contain all the other attributes and a mapping id. That
way you will have only the main table cached.

You don't have to go to filesystem for this, I hope.

HTH

 Shridhar

Re: Storing binary data.

From

Tom Lane

Date:

11 August 2004, 16:47:17

Jesper Krogh <jesper@krogh.cc> writes:
> I'd like some advise on storing binary data in the database.

> Currently I have about 300.000 320.000 Bytes "Bytea" records in the
> database. It works quite well but I have a feeling that it actually is
> slowing the database down on queries only related to the surrounding
> attributes.

> The "common" solution, I guess would be to store them in the filesystem
> instead, but I like to have them just in the database it is nice clean
> database and application design and if I can get PostgreSQL to "not
> cache" them then it should be excactly as fast i assume.

> The binary data is not a part of most queries in the database only a few
> explicitly written to fetch them and they are not accessed very often.

> What do people normally do?

Nothing.  If the bytea values are large enough to be worth splitting
out, Postgres will actually do that for you automatically.  Wide field
values get pushed into a separate "toast" table, and are not fetched by
a query unless the value is specifically demanded.

You can control this behavior to some extent by altering the storage
option for the bytea column (see ALTER TABLE), but the default choice
is usually fine.

If you just want to see whether anything is happening, do a VACUUM
VERBOSE on that table and note the amount of storage in the toast table
as compared to the main table.

            regards, tom lane

Re: Storing binary data.

From

Jesper Krogh

Date:

11 August 2004, 18:02:12

I gmane.comp.db.postgresql.performance, skrev Shridhar Daithankar:
>  On Wednesday 11 Aug 2004 7:59 pm, Jesper Krogh wrote:
> > The "common" solution, I guess would be to store them in the filesystem
> > instead, but I like to have them just in the database it is nice clean
> > database and application design and if I can get PostgreSQL to "not
> > cache" them then it should be excactly as fast i assume.
>
>  You can normalize them so that a table contains an id and a bytea column only.
>  Te main table will contain all the other attributes and a mapping id. That
>  way you will have only the main table cached.
>
>  You don't have to go to filesystem for this, I hope.

Further benchmarking.

I tried to create a table with the excact same attributes but without
the binary attribute. It didn't change anything, so my idea that it
should be the binary-stuff that sloved it down was wrong.

I have a timestamp column in the table that I sort on. Data is ordered
over the last 4 years and I select based on a timerange, I cannot make
the query-planner use the index on the timestamp by itself but if I "set
enable_seqscan = false" the query time drops by 1/3 (from 1.200 msec to
about 400 msec).

I cannot figure out why the query-planner chooses wrong.
NB: It's postgresql 7.4.3

--
./Jesper Krogh, jesper@krogh.cc
Jabber ID: jesper@jabbernet.dk

Re: Storing binary data.

From

Russell Smith

Date:

11 August 2004, 23:58:05

On Thu, 12 Aug 2004 02:29 am, Jesper Krogh wrote:
> I gmane.comp.db.postgresql.performance, skrev Shridhar Daithankar:
> >  On Wednesday 11 Aug 2004 7:59 pm, Jesper Krogh wrote:
> > > The "common" solution, I guess would be to store them in the filesystem
> > > instead, but I like to have them just in the database it is nice clean
> > > database and application design and if I can get PostgreSQL to "not
> > > cache" them then it should be excactly as fast i assume.
> >
> >  You can normalize them so that a table contains an id and a bytea column only.
> >  Te main table will contain all the other attributes and a mapping id. That
> >  way you will have only the main table cached.
> >
> >  You don't have to go to filesystem for this, I hope.
>
> Further benchmarking.
>
> I tried to create a table with the excact same attributes but without
> the binary attribute. It didn't change anything, so my idea that it
> should be the binary-stuff that sloved it down was wrong.
>
> I have a timestamp column in the table that I sort on. Data is ordered
> over the last 4 years and I select based on a timerange, I cannot make
> the query-planner use the index on the timestamp by itself but if I "set
> enable_seqscan = false" the query time drops by 1/3 (from 1.200 msec to
> about 400 msec).
>
> I cannot figure out why the query-planner chooses wrong.
> NB: It's postgresql 7.4.3
>
Please post explain analyze of the query.

I would guess you are using now() is your query, which is not optimized perfectly
by the planner, so you end up with problems.  But if you post explain analyze
people will be able to tell you what the problem is.

Maybe on with seqscan on, and one with it off.

Regards

Russell Smith