Re: Optimizing Time Series Access - Mailing list pgsql-performance

From Robert Burgholzer
Subject Re: Optimizing Time Series Access
Date
Msg-id CACT-NG+bKY96EA3pYMyDMJU19EAyzdZrx1Y+GQo4f1MNQT9q=A@mail.gmail.com
In response to Optimizing Time Series Access  (Robert Burgholzer <rburghol@vt.edu>)
List pgsql-performance
Dave,
Thanks for asking about the structure.  I would describe it as moderately structured.  Below I list the aspects that I think are well defined (STRUCTURED) and those which are more variable (MODERATELY STRUCTURED).

STRUCTURED:
Location - Values are keyed to a location, and there are only about 500 locations in my data set, so in principle the data can be organized by location.
dataval - Always numerical data

MODERATELY STRUCTURED:
timestamp - A set of values is sequential in time, but on a variable scale (15 minutes, 1 hour, and 1 day are the typical temporal resolutions).
scenarioid - There may be several copies of each piece of data, representing different model "scenarios".
param_group, param_block & param_name - descriptors of a piece of data.  There is no fixed limit on how many of these exist, depending upon what our models are doing, but most models use between 3 and 10 parameters.  (A sketch of a table with this shape follows below.)
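For concreteness, here is a minimal sketch of a table with the shape described above.  The column types, and which 3 columns actually key the retrievals, are assumptions rather than my actual schema:

    CREATE TABLE model_results (
        locationid   integer          NOT NULL, -- ~500 distinct locations
        scenarioid   integer          NOT NULL, -- model "scenario" copy
        param_group  text             NOT NULL, -- parameter descriptors
        param_block  text             NOT NULL,
        param_name   text             NOT NULL,
        "timestamp"  timestamptz      NOT NULL, -- 15 min / 1 hr / 1 day steps
        dataval      double precision NOT NULL  -- always numerical
    );

    -- The 23M-row pulls are keyed on 3 columns; this composite index
    -- is a guess at which ones:
    CREATE INDEX model_results_grp_idx
        ON model_results (locationid, scenarioid, param_name, "timestamp");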



On Tue, Apr 8, 2014 at 7:30 PM, Dave Duke <dave.duke@cryptic.co.uk> wrote:
Could you be more specific about whether the data is random or structured in some way?




On 8 Apr 2014, at 22:20, Robert Burgholzer <rburghol@vt.edu> wrote:

I am looking for advice on dealing with large tables of environmental model data, and I am looking for alternatives to my current optimization approaches.  Basically, I have about 1 billion records stored in a table, which I access in groups of roughly 23 million at a time; that means I have somewhere in the neighborhood of 400-500 sets of 23 million points.

The 23 million rows that I pull at a time are keyed on 3 different columns, everything is indexed, and retrieval takes, say, 2-3 minutes (my hardware is so-so).  So my thought is to use some kind of caching, and I wonder if I can get advice.  Here are my thoughts on options; I would love to hear others:

* Use cached tables - since my number of actual data groups is small, why not just retrieve each group once, then keep it around in a specially named table?  (I do this with some other data, using a 30-day cache expiration; a sketch follows after this list.)
* Use some sort of stored procedure?  I don't even know whether such a thing exists in PG or how it works.
* Use table partitioning?  (Also sketched after this list.)
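To make the cached-table idea concrete, a rough sketch (the table name, key values, and the registry used for the 30-day expiration are invented for illustration):

    -- Materialize one group into its own specially named table:
    CREATE TABLE cache_loc42_scn7 AS
    SELECT "timestamp", dataval
      FROM model_results
     WHERE locationid = 42
       AND scenarioid = 7
       AND param_name = 'flow';

    -- Record build times so caches older than 30 days can be dropped:
    CREATE TABLE IF NOT EXISTS cache_registry (
        cache_table text PRIMARY KEY,
        built_at    timestamptz NOT NULL DEFAULT now()
    );
    INSERT INTO cache_registry (cache_table) VALUES ('cache_loc42_scn7');

Note that PostgreSQL 9.3 and later also offer CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW, which package the same idea with built-in commands.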
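And a sketch of the partitioning option, using the inheritance-based partitioning PG supports (the partitioning key, ranges, and names are illustrative only):

    -- Each child table carries a CHECK constraint on the partitioning key:
    CREATE TABLE model_results_loc_000_099 (
        CHECK (locationid BETWEEN 0 AND 99)
    ) INHERITS (model_results);

    CREATE INDEX ON model_results_loc_000_099 (scenarioid, param_name);

    -- With constraint_exclusion = partition (the default), a query like
    --   SELECT ... FROM model_results WHERE locationid = 42
    -- skips every child table whose CHECK constraint excludes it.

Since there are only about 500 locations, partitioning by location would keep each child table at a manageable, scannable size.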

Thanks,
/r/b

--
Robert W. Burgholzer
 'Making the simple complicated is commonplace; making the complicated simple, awesomely simple, that's creativity.'  - Charles Mingus



