Home > mailing lists

Strategy for doing number-crunching - Mailing list pgsql-novice

From	Matthew Foster
Subject	Strategy for doing number-crunching
Date	January 4, 2012 15:36:36
Msg-id	CAP1ZYZGAXpa-WciBK-is5RB=eJ=+jqStUz28=Fy1HXrN4LDMiA@mail.gmail.com Whole thread Raw
Responses	Re: Strategy for doing number-crunching (Tom Lane <tgl@sss.pgh.pa.us>) Re: Strategy for doing number-crunching (Sean Davis <sdavis2@mail.nih.gov>) Re: Strategy for doing number-crunching ("Jean-Yves F. Barbier" <12ukwn@gmail.com>) Re: Strategy for doing number-crunching (János Löbb <janos.lobb@yale.edu>)
List	pgsql-novice

Tree view

We have a database with approximately 130M rows, and we need to produce statistics (e.g. mean, standard deviation, etc.) on the data. Right now, we're generating these stats via a single SELECT, and it is extremely slow...like it can take hours to return results. The queries are variable, so there is no way I can think of to "pre-calculate" the stats.

Here is an example query...

SELECT count(mae) AS count, to_char(avg(mae), 'FM990.999') AS mae,

to_char(avg(sqrt(rms)), 'FM990.999') AS rms, to_char(avg(bias), 'FM990.999') AS bias,

to_char(max(mae), 'FM990.999') AS mae_max, to_char(min(mae), 'FM990.999') AS mae_min,

to_char(max(bias), 'FM990.999') AS bias_max, to_char(min(bias), 'FM990.999') AS bias_min,

to_char(max(sqrt(rms)), 'FM990.999') AS rms_max, to_char(min(sqrt(rms)), 'FM990.999') AS rms_min,

to_char(stddev(sqrt(rms)), 'FM990.999') AS rms_stddev,

to_char(stddev(mae), 'FM990.999') AS mae_stddev,

to_char(stddev(bias), 'FM990.999') AS bias_stddev

FROM verify_data.verification_data

WHERE model_name='foo'

AND...several other conditions...

As one might imagine, this query pounds the CPU relentlessly for hours.

I'm thinking there must be a better approach for doing large amounts of calculations on the data. Any pointers would be greatly appreciated!

Matt

pgsql-novice by date:

From: "Birchall, Austen"
Date: 04 January 2012, 11:01:18
Subject: Removal of WAL logs

From: Tom Lane
Date: 04 January 2012, 15:48:53
Subject: Re: Strategy for doing number-crunching

Strategy for doing number-crunching - Mailing list pgsql-novice

Previous

Next