Optimizing Time Series Access - Mailing list pgsql-performance

From	Robert Burgholzer
Subject	Optimizing Time Series Access
Date	April 8, 2014 21:21:01
Msg-id	CACT-NGJAMP0KULGMf3B8+0YqeywD1i34R7Okzb74Gbzn7_oU-A@mail.gmail.com
List	pgsql-performance

I am looking for advice on dealing with large tables of environmental model data, and for alternatives to my current optimization approaches. Basically, I have about 1 billion records stored in one table, which I access in groups of roughly 23 million at a time. That means I have somewhere in the neighborhood of 40-50 sets of 23 million points.
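Dividing the total record count by the size of one group gives the number of groups:

```latex
\frac{1\,000\,000\,000 \text{ records}}{23\,000\,000 \text{ records/group}} \approx 43.5 \text{ groups}
```

So 1 billion records split into 23-million-row groups is on the order of 40-50 groups.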

Each 23-million-row group I pull is keyed on 3 different columns, all indexed, and retrieval takes roughly 2-3 minutes (my hardware is so-so). So my thought is to use some kind of caching, and I'd appreciate advice. Here are the options I'm considering; I would love to hear others:

* Use cached tables for this: since my number of actual data groups is small, why not retrieve each one once and then keep it around in a specially named table? (I already do this with some other data, using a 30-day cache expiration.)

* Use some sort of stored procedure? I don't even know whether such a thing exists in PG, or how it would work.

* Use table partitioning?
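A minimal sketch of the cached-table option (the first bullet above), written in Python with the standard-library sqlite3 module standing in for PostgreSQL so it runs self-contained. The table model_data, the key columns key_a/key_b/key_c, the cache_registry bookkeeping table, and the cache_<a>_<b>_<c> naming scheme are all hypothetical; the post only says each group is keyed on three indexed columns. Each group is materialized into its own table on first request and rebuilt once its copy is older than 30 days.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Assumption: sqlite3 stands in for PostgreSQL; schema and names are hypothetical.
CACHE_TTL = timedelta(days=30)  # the 30-day expiration mentioned in the post


def init_cache(conn: sqlite3.Connection) -> None:
    """Create the registry that records when each cache table was built."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache_registry ("
        "table_name TEXT PRIMARY KEY, created_at TEXT NOT NULL)"
    )


def cache_table_name(key_a: int, key_b: int, key_c: int) -> str:
    """Deterministic name per group; int() keeps the name safe to interpolate into SQL."""
    return f"cache_{int(key_a)}_{int(key_b)}_{int(key_c)}"


def get_group(conn, key_a, key_b, key_c, now=None):
    """Return (rows, rebuilt) for one group, rebuilding its cache table if missing or stale."""
    if now is None:
        now = datetime.now(timezone.utc)
    name = cache_table_name(key_a, key_b, key_c)
    row = conn.execute(
        "SELECT created_at FROM cache_registry WHERE table_name = ?", (name,)
    ).fetchone()
    fresh = row is not None and now - datetime.fromisoformat(row[0]) < CACHE_TTL
    if not fresh:
        # Slow path: run the expensive indexed query against the big table once,
        # and keep the result in a small, specially named table.
        conn.execute(f"DROP TABLE IF EXISTS {name}")
        conn.execute(f"CREATE TABLE {name} (ts TEXT, value REAL)")
        conn.execute(
            f"INSERT INTO {name} SELECT ts, value FROM model_data "
            "WHERE key_a = ? AND key_b = ? AND key_c = ?",
            (key_a, key_b, key_c),
        )
        conn.execute(
            "INSERT OR REPLACE INTO cache_registry (table_name, created_at) VALUES (?, ?)",
            (name, now.isoformat()),
        )
        conn.commit()
    # Fast path (and end of slow path): read only the cached copy.
    rows = conn.execute(f"SELECT ts, value FROM {name} ORDER BY ts").fetchall()
    return rows, not fresh
```

With this pattern the 2-3 minute query runs at most once per group per 30 days, and later requests read only the small cache table. In PostgreSQL 9.3 and later, a materialized view per group (CREATE MATERIALIZED VIEW, refreshed with REFRESH MATERIALIZED VIEW) is a built-in alternative worth comparing against hand-managed cache tables.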

Thanks,

/r/b

--
Robert W. Burgholzer
'Making the simple complicated is commonplace; making the complicated simple, awesomely simple, that's creativity.' - Charles Mingus

Athletics: http://athleticalgorithm.wordpress.com/

Science: http://robertwb.wordpress.com/

Wine: http://reesvineyard.wordpress.com/