Re: Atomic access to large arrays - Mailing list pgsql-performance

From Oleg Bartunov
Subject Re: Atomic access to large arrays
Date
Msg-id Pine.LNX.4.64.0907220910220.8065@sn.sai.msu.ru
In response to Atomic access to large arrays  ("Victor de Buen (Bayes)" <vdebuen@bayesinf.com>)
List pgsql-performance
Victor,

Just wondering, why do you use an array?

Oleg
On Wed, 22 Jul 2009, Victor de Buen (Bayes) wrote:

> Hi
>
> I'm storing historical meteorological gridded data from GFS (
> http://www.nco.ncep.noaa.gov/pmb/products/gfs/) into an array field in a
> table like this:
>
> CREATE TABLE grid_f_data_i2 (
>  -- Specifies the variable and other features of the data
>  id_inventory integer REFERENCES grid_d_inventory(id_inventory),
>  -- A new grid has been available every 3 hours for the last 5 years
>  dh_date timestamp,
>  -- Values are scaled to be stored as signed 2-byte integers
>  vl_grid smallint[361][720],
> CONSTRAINT meteo_f_gfs_tmp PRIMARY KEY
>  (id_inventory, dh_date)
> );
>
> Each value of field vl_grid has dimensions (lat: 361 x lon: 720 = 259,920
> cells) for a grid of 0.5 degrees (about 55 km) around the world. So,
> vl_grid[y][x] stores the value at dh_date of the meteorological variable
> specified by id_inventory at the geodesic point
>
> latitude  = -90 + y*0.5
> longitude = x*0.5
>
> The reverse formula for the closest grid point to an arbitrary geodesic
> point is
>
> y = Round((latitude + 90) * 2)
> x = Round(longitude * 2)
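>
> For example, the subscripts for an illustrative point at latitude 40.0,
> longitude 356.0 can be computed inline (an offset of +1 may be needed,
> depending on the arrays' lower bound):
>
> SELECT dh_date,
>        vl_grid[round((40.0 + 90) * 2)::int][round(356.0 * 2)::int]
> FROM grid_f_data_i2
> WHERE id_inventory = 6
> ORDER BY dh_date;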
>
> Field vl_grid is stored in the TOAST table and compresses well.
> PostgreSQL is the only database we have found that can store this huge
> amount of data in only 34 GB of disk. It's a really great system.
> Queries returning big rectangular areas are very fast, but the target of
> almost all queries is to get the historical series for a geodesic point:
>
> SELECT  dh_date, vl_grid[123][152]
> FROM  grid_f_data_i2
> WHERE  id_inventory = 6
> ORDER BY dh_date
>
> In this case, atomic access to just one cell in each of only a few
> thousand rows becomes too slow.
>
> Could somebody please answer some of these questions?
>
>   - Is it possible to tune some TOAST parameters to get faster atomic
>   access to large arrays?
>
>   - Could using the "EXTERNAL" strategy for storing TOAST-able columns
>   solve the problem? (A sketch follows this list.)
>
>   - Would atomic access be faster if the table stored vectors of data
>   along the same parallel instead of matrices of global data? (An example
>   query follows this list.)
>   CREATE TABLE grid_f_data_i2 (
>     -- Specifies the variable and other features of the data
>     id_inventory integer REFERENCES grid_d_inventory(id_inventory),
>     -- A new grid has been available every 3 hours for the last 5 years
>     dh_date timestamp,
>     -- nu_parallel = y = Round((latitude + 90) * 2)
>     nu_parallel smallint,
>     -- Values are scaled to be stored as signed 2-byte integers
>     vl_parallel smallint[],
>   CONSTRAINT meteo_f_gfs_tmp PRIMARY KEY
>     (id_inventory, nu_parallel, dh_date)
>   );
>
>   - Is there another, faster solution?
>
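> As a sketch of the EXTERNAL idea (question 2 above), the column's
> storage strategy could be switched so that values stay out-of-line but
> uncompressed; I am not sure whether array element access benefits from
> this, and it would trade away the compression behind the 34 GB figure:
>
> ALTER TABLE grid_f_data_i2 ALTER COLUMN vl_grid SET STORAGE EXTERNAL;
>
> And the historical-series query against the per-parallel table
> (question 3) would become something like this, with nu_parallel = 123
> and subscript 152 matching the earlier example:
>
> SELECT dh_date, vl_parallel[152]
> FROM grid_f_data_i2
> WHERE id_inventory = 6
>   AND nu_parallel = 123
> ORDER BY dh_date;
>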
> Thanks in advance and best regards
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
