Thread: reading large BYTEA type is slower than expected

reading large BYTEA type is slower than expected

From: "Mark Harris"

We have recently ported our application to the postgres database. For the most part performance has not been an issue; however, there is one situation that is a problem, and that is the initial read of rows containing BYTEA values that have an average size of 2 kilobytes or greater. For BYTEA values postgres requires as much as 3 seconds to read the values from disk into its buffer cache. After the initial read into the buffer cache, performance is comparable to the other commercial DBMSs we have ported to. As would be expected, the commercial DBMSs are also slower to display data that is not already in the buffer cache, but for postgres the gap between reading this type of data from disk and reading it from the buffer cache is much greater.

 

We have vacuumed the table and experimented with the database initialization parameters in postgresql.conf. Neither helped with this problem.

 

Does anyone have any tips on improving the read-from-disk performance of BYTEA data that is typically 2KB or larger?
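
To illustrate what we are measuring, here is a minimal sketch (the table
and column names are hypothetical, not our real schema):

    -- Hypothetical table holding BYTEA values of ~2KB and up
    CREATE TABLE blobs (id integer PRIMARY KEY, val bytea);

    -- First read after a server restart: values come from disk (slow for us)
    EXPLAIN ANALYZE SELECT val FROM blobs WHERE id BETWEEN 1 AND 120;

    -- The same query repeated: values now come from the buffer cache (fast)
    EXPLAIN ANALYZE SELECT val FROM blobs WHERE id BETWEEN 1 AND 120;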

 

Mark

Re: reading large BYTEA type is slower than expected

From: Tom Lane
"Mark Harris" <mharris@esri.com> writes:
> We have recently ported our application to the postgres database. For
> the most part performance has not been an issue; however there is one
> situation that is a problem and that is the initial read of rows
> containing BYTEA values that have an average size of 2 kilobytes or
> greater. For BYTEA values postgres requires as much as 3 seconds to read
> the values from disk into its buffer cache.

How large is "large"?

(No, I don't believe it takes 3 sec to fetch a single 2Kb value.)

            regards, tom lane

Re: reading large BYTEA type is slower than expected

From: "Y Sidhu"
Mark,

I am no expert, but this looks like a file system I/O issue. On my FreeBSD systems I set hw.ata.wc=1 for a SATA drive and =0 for a SCSI drive in /boot/loader.conf. That seems to provide some needed tuning.
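
For example, the relevant line from my loader.conf (illustrative; I use
0 instead for a SCSI drive):

    # /boot/loader.conf -- enable the ATA write cache for a SATA drive
    hw.ata.wc="1"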

Yudhvir
==========
On 5/18/07, Mark Harris <mharris@esri.com> wrote:

We have recently ported our application to the postgres database. For the most part performance has not been an issue; however, there is one situation that is a problem, and that is the initial read of rows containing BYTEA values that have an average size of 2 kilobytes or greater. For BYTEA values postgres requires as much as 3 seconds to read the values from disk into its buffer cache. After the initial read into the buffer cache, performance is comparable to the other commercial DBMSs we have ported to. As would be expected, the commercial DBMSs are also slower to display data that is not already in the buffer cache, but for postgres the gap between reading this type of data from disk and reading it from the buffer cache is much greater.

 

We have vacuumed the table and experimented with the database initialization parameters in postgresql.conf. Neither helped with this problem.

 

Does anyone have any tips on improving the read-from-disk performance of BYTEA data that is typically 2KB or larger?

 

Mark

--
Yudhvir Singh Sidhu
408 375 3134 cell

Re: reading large BYTEA type is slower than expected

From: "Mark Harris"
Tom,

No, it is not 3 seconds to read a single value. Multiple records are
read: approximately 120 records if the raster dataset is created with
our application's default configuration.

Please read on if you want to understand why.

We are a GIS software company and have two basic types of data, raster
and vector. Both types are stored in BYTEA columns.

Vector data is a representation of geometry stored as a series of
vertices describing points, lines and polygons. This type of data is
typically 30 to 200 bytes, but can be very large (consider how many
vertices would be required to represent the Pacific Ocean at a
detailed resolution). Vector data does not seem to exhibit the
cold-fetch issue (fetch from disk as opposed to fetch from buffer
cache).

It is with raster data that we see the problem. Raster data is image
data stored in the database. When we store a georeferenced image in
the database we block it up into tiles; the default tile size is 128
by 128 pixels. We compress the data using LZ77, JPEG or JPEG2000.
Typically the pixel blocks stored in the BYTEA range in size from 800
bytes to 16000 bytes for 8-bit data stored with the default tile
size, depending on the type of compression and the variability of the
data.
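
To make the layout concrete, here is a simplified sketch of the kind of
raster blocks table I am describing (hypothetical names; our actual
schema has more columns):

    CREATE TABLE raster_blocks (
        raster_id  integer NOT NULL,  -- which raster dataset
        band       integer NOT NULL,  -- image band (e.g. 1-3 for RGB)
        level      integer NOT NULL,  -- pyramid level (see below), 0 = base
        tile_x     integer NOT NULL,  -- tile column within the level
        tile_y     integer NOT NULL,  -- tile row within the level
        block_data bytea   NOT NULL,  -- one compressed 128x128 pixel tile
        PRIMARY KEY (raster_id, band, level, tile_x, tile_y)
    );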

Our application is capable of mosaicking source images together into
huge raster datasets that can grow into the terabytes. Consider the
entire Landsat imagery, with a resolution of 15 meters, mosaicked
into one raster dataset: it requires less than a terabyte to store.

For practical reasons, as you can imagine, we construct a
reduced-resolution pyramid on the raster base level, allowing
applications to view a reduced-resolution level of the raster dataset
as the user zooms out, and a higher-resolution level as the user
zooms in. The pyramid levels are also stored as pixel blocks in the
table. Each pyramid level is reduced in resolution by 1/2 in the X
and Y dimensions; therefore pyramid level 1 will be 1/4 the size of
pyramid level 0 (the base).
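
As a worked example of that arithmetic (the base size here is
hypothetical), the tile counts per level for a 100000 by 100000 pixel
base with 128 by 128 tiles could be computed as:

    -- Each level halves X and Y, so tile counts shrink by about 4x
    SELECT level,
           ceil(100000 / (128.0 * 2 ^ level))
         * ceil(100000 / (128.0 * 2 ^ level)) AS tiles
    FROM generate_series(0, 5) AS level;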

As the application queries the raster blocks table, which stores the
raster data tiles, it requests the raster tiles that fall within the
spatial extent of the window for a particular pyramid level.
Therefore the number of records queried from the raster blocks table
containing the BYTEA column of pixel data is fairly constant. For the
screen resolution of 1680 by 1050 that I am testing with, about 120
records will be fetched from the raster blocks table each time the
user pans or zooms.
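
So the query pattern for each pan or zoom is roughly the following
(again, hypothetical names and values):

    -- Fetch the tiles of one band that intersect the current window,
    -- at the pyramid level chosen for the current zoom.
    -- 14 x 9 tiles of 128 pixels covers a 1680 x 1050 window (~126 rows).
    SELECT tile_x, tile_y, block_data
    FROM raster_blocks
    WHERE raster_id = 42
      AND band = 1
      AND level = 3
      AND tile_x BETWEEN 100 AND 113
      AND tile_y BETWEEN 200 AND 208;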

Mark

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, May 18, 2007 10:48 AM
To: Mark Harris
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] reading large BYTEA type is slower than expected

"Mark Harris" <mharris@esri.com> writes:
> We have recently ported our application to the postgres database. For
> the most part performance has not been an issue; however there is one
> situation that is a problem and that is the initial read of rows
> containing BYTEA values that have an average size of 2 kilobytes or
> greater. For BYTEA values postgres requires as much as 3 seconds to read
> the values from disk into its buffer cache.

How large is "large"?

(No, I don't believe it takes 3 sec to fetch a single 2Kb value.)

            regards, tom lane


Re: reading large BYTEA type is slower than expected

From: "Mark Harris"
Tom,

Actually, the 120 records I quoted is a mistake. Since it is a
three-band image, the number of records should be 360: 120 records
for each band.

Mark

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, May 18, 2007 10:48 AM
To: Mark Harris
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] reading large BYTEA type is slower than expected

"Mark Harris" <mharris@esri.com> writes:
> We have recently ported our application to the postgres database. For
> the most part performance has not been an issue; however there is one
> situation that is a problem and that is the initial read of rows
> containing BYTEA values that have an average size of 2 kilobytes or
> greater. For BYTEA values postgres requires as much as 3 seconds to read
> the values from disk into its buffer cache.

How large is "large"?

(No, I don't believe it takes 3 sec to fetch a single 2Kb value.)

            regards, tom lane