Re: Optimizing Time Series Access - Mailing list pgsql-performance
From: Robert Burgholzer
Subject: Re: Optimizing Time Series Access
Date:
Msg-id: CACT-NG+bKY96EA3pYMyDMJU19EAyzdZrx1Y+GQo4f1MNQT9q=A@mail.gmail.com
In response to: Optimizing Time Series Access (Robert Burgholzer <rburghol@vt.edu>)
List: pgsql-performance
Dave,
Thanks for asking about the structure. The data appears to me to be moderately structured overall. Below I list the aspects that are well defined (STRUCTURED) and those that are more variable (MODERATELY STRUCTURED), with a rough schema sketch after the list.
STRUCTURED:
Location - Values are keyed according to a location, and there are only about 500 locations in my data set, so in theory the data could be organized by location.
dataval - Always numerical data
MODERATELY STRUCTURED:
timestamp - A set of values will be sequential in time, but on a variable scale (15 minutes, 1 hour, or 1 day are the typical temporal scales).
scenarioid - There may be several copies of each piece of data, representing different model "scenarios".
param_group, param_block & param_name - Descriptors of a piece of data. There may be an unbounded number of these, depending on what our models are doing, but most groups have between 3 and 10 parameters.
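For illustration only, here is a minimal sketch of what a table holding these fields might look like; the table name, column types, and index definition are my assumptions, not the actual schema:

    -- Hypothetical sketch of a table holding the fields described above.
    -- All names and types are assumptions, not the actual production DDL.
    CREATE TABLE model_output (
        locationid  integer          NOT NULL,  -- ~500 distinct locations
        scenarioid  integer          NOT NULL,  -- scenario copies of the data
        param_group text             NOT NULL,
        param_block text             NOT NULL,
        param_name  text             NOT NULL,
        ts          timestamptz      NOT NULL,  -- 15-minute, hourly, or daily steps
        dataval     double precision NOT NULL   -- always numerical
    );

    -- The original post mentions retrieval keyed on 3 columns; which three
    -- is an assumption here. A composite index such as this would serve
    -- pulling one group of rows at a time:
    CREATE INDEX model_output_key_idx
        ON model_output (locationid, scenarioid, param_name);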
On Tue, Apr 8, 2014 at 7:30 PM, Dave Duke <dave.duke@cryptic.co.uk> wrote:
> Could you be more specific about the data - is it random or structured in some way?
>
>> I am looking for advice on dealing with large tables of environmental model data, and for alternatives to my current optimization approaches. Basically, I have about 1 billion records stored in a table which I access in groups of roughly 23 million at a time, which means that I have somewhere in the neighborhood of 400-500 sets of 23 million points.
>>
>> The 23 million that I pull at a time are keyed on 3 different columns, it's all indexed, and retrieval happens in, say, 2-3 minutes (my hardware is so-so). So, my thought is to use some kind of caching, and I wonder if I can get advice - here are my thoughts on options; would love to hear others:
>>
>> * Use cached tables for this - since my # of actual data groups is small, why not just retrieve them once, then keep them around in a specially named table (I do this with some other stuff, using a 30 day cache expiration)?
>> * Use some sort of stored procedure? I don't even know if such a thing really exists in PG and how it works.
>> * Use table partitioning?
>>
>> Thanks,
>> /r/b
>> ----
>> Robert W. Burgholzer
>> 'Making the simple complicated is commonplace; making the complicated simple, awesomely simple, that's creativity.' - Charles Mingus
>> Athletics: http://athleticalgorithm.wordpress.com/
>> Science: http://robertwb.wordpress.com/
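As a minimal sketch of the cached-table idea from the quoted post: the object names and the expiration bookkeeping below are hypothetical (only the 30-day window comes from the post itself), and the table follows the assumed schema sketched earlier:

    -- Hypothetical sketch of the "cached table" approach: materialize one
    -- group of ~23M rows into its own table, and record when it was built
    -- so a 30-day expiration can be checked before reuse. All names are
    -- assumptions for illustration.
    CREATE TABLE cache_loc42_scen7 AS
    SELECT ts, dataval
    FROM model_output
    WHERE locationid = 42
      AND scenarioid = 7
      AND param_name = 'flow';

    -- Track when each cache table was built:
    CREATE TABLE IF NOT EXISTS cache_registry (
        cache_table text NOT NULL,
        built_at    timestamptz NOT NULL DEFAULT now()
    );
    DELETE FROM cache_registry WHERE cache_table = 'cache_loc42_scen7';
    INSERT INTO cache_registry (cache_table) VALUES ('cache_loc42_scen7');

    -- Before reuse, check whether the cache is stale and needs a rebuild:
    SELECT built_at < now() - interval '30 days' AS expired
    FROM cache_registry
    WHERE cache_table = 'cache_loc42_scen7';

Table partitioning, the other option raised, was done in PostgreSQL of that era via table inheritance plus constraint exclusion; with only ~500 locations, one child table per location (or per location/scenario pair) would be the obvious partitioning key.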
--
Robert W. Burgholzer
'Making the simple complicated is commonplace; making the complicated simple, awesomely simple, that's creativity.' - Charles Mingus
Athletics: http://athleticalgorithm.wordpress.com/
Science: http://robertwb.wordpress.com/