Re: reading large BYTEA type is slower than expected - Mailing list pgsql-performance
From | Mark Harris |
---|---|
Subject | Re: reading large BYTEA type is slower than expected |
Date | |
Msg-id | D7BFFE348C53EF4E8AA0698B1E395FA9085ABE79@flybywire.esri.com Whole thread Raw |
In response to | Re: reading large BYTEA type is slower than expected (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-performance |
Tom, No it is not 3 seconds to read a single value. Multiple records are read, approximately 120 records if the raster dataset is created with our application's default configuration. Please read on to understand why, if you need to. We are a GIS software company and have two basic types of data, raster and vector. Both types of data are stored in a BYTEA. Vector data are representations of geometry stored as a series of vertices to represent points, lines and polygons. This type of data is typically 30 to 200 bytes, but can be very large (consider how many vertices would be required to represent the Pacific Ocean at a detailed resolution). Vector data does not seem to exhibit the cold fetch issue (fetch from disk as opposed to fetch from buffer cache). It is with raster data that we see the problem. Raster data is image data stored in the database. When we store a georeferenced image in the database we block it up into tiles. The default tile size is 128 by 128 pixels. We compress the data using either: LZ77, JPEG or JPEG2000. Typically the pixel blocks stored in the BYTEA range in size from 800 bytes to 16000 bytes for 8-bit data stored with the default tile size, depending on the type of compression and the variability of the data. Our application is capable of mosaicking source images together into huge raster datasets that can grow into terabytes. Consider the entire landsat imagery with a resolution of 15 meters mosaicked into one raster dataset. It requires less than a terabyte to store that data. For practical reasons, as you can imagine, we construct a reduced resolution pyramid on the raster base level, allowing applications to view a reduced resolution level of the raster dataset as the user zooms out, and a higher resolution level as the user zooms in. The pyramid levels are also stored as pixel blocks in the table. Each pyramid level is reduced in resolution by 1/2 in the X and Y dimension. Therefore pyramid level 1 will be 1/4 of pyramid level 0 (the base). As the application queries the raster blocks table which stores the raster data tiles, it will request a raster tiles that fall within the spatial extent of the window for a particular pyramid level. Therefore the number of records queried from the raster blocks table containing the BYTEA column of pixel data is fairly constant. For the screen resolution of 1680 by 1050 that I am testing with about 120 records will be fetched from the raster blocks table each time the user pans or zooms. Mark -----Original Message----- From: Tom Lane [mailto:tgl@sss.pgh.pa.us] Sent: Friday, May 18, 2007 10:48 AM To: Mark Harris Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] reading large BYTEA type is slower than expected "Mark Harris" <mharris@esri.com> writes: > We have recently ported our application to the postgres database. For > the most part performance has not been an issue; however there is one > situation that is a problem and that is the initial read of rows > containing BYTEA values that have an average size of 2 kilobytes or > greater. For BYTEA values postgres requires as much 3 seconds to read > the values from disk into its buffer cache. How large is "large"? (No, I don't believe it takes 3 sec to fetch a single 2Kb value.) regards, tom lane
pgsql-performance by date: