Re: Growth planning - Mailing list pgsql-general

From Alban Hertroys
Subject Re: Growth planning
Date
Msg-id 5F0C1A8D-F199-497A-B7FA-8143E48BB020@gmail.com
In response to Growth planning  (Israel Brewster <ijbrewster@alaska.edu>)
List pgsql-general
> On 4 Oct 2021, at 18:22, Israel Brewster <ijbrewster@alaska.edu> wrote:

(…)

> the script owner is talking about wanting to process and pull in “all the historical data we have access to”, which
> would go back several years, not to mention the probable desire to keep things running into the foreseeable future.

(…)

> - The largest SELECT workflow currently is a script that pulls all available data for ONE channel of each station
> (currently, I suspect that will change to all channels in the near future), and runs some post-processing machine
> learning algorithms on it. This script (written in R, if that makes a difference) currently takes around half an hour to
> run, and is run once every four hours. I would estimate about 50% of the run time is data retrieval and the rest doing
> its own thing. I am only responsible for integrating this script with the database, what it does with the data (and
> therefore how long that takes, as well as what data is needed), is up to my colleague. I have this script running on the
> same machine as the DB to minimize data transfer times.

I suspect that a large portion of that time is spent on downloading this data to the R script. Would it help to rewrite it
in PL/R and do (part of) the ML calculations on the DB side?
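
To illustrate the general idea (not PL/R itself), here is a minimal sketch in Python using an in-memory SQLite table as a stand-in for the real database. The table name `readings`, its columns, and the per-station mean are all hypothetical; the actual schema and ML steps are not known from this thread. It only shows that reducing the data where it lives can shrink what has to be transferred to the client.

```python
import sqlite3


def make_demo_db(n_stations=3, n_samples=1000):
    """Build a small in-memory table of fake readings (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE readings (station INTEGER, channel TEXT, ts INTEGER, value REAL)"
    )
    rows = [
        (s, "ch1", t, float(s * 10 + t % 7))
        for s in range(n_stations)
        for t in range(n_samples)
    ]
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", rows)
    return conn


def mean_client_side(conn):
    """Pull every row for the channel to the client, then reduce there."""
    sums, counts, n_rows = {}, {}, 0
    query = "SELECT station, value FROM readings WHERE channel = 'ch1'"
    for station, value in conn.execute(query):
        sums[station] = sums.get(station, 0.0) + value
        counts[station] = counts.get(station, 0) + 1
        n_rows += 1
    means = {s: sums[s] / counts[s] for s in sums}
    return means, n_rows


def mean_db_side(conn):
    """Let the database do the reduction; only one row per station comes back."""
    query = (
        "SELECT station, AVG(value) FROM readings "
        "WHERE channel = 'ch1' GROUP BY station"
    )
    rows = conn.execute(query).fetchall()
    return dict(rows), len(rows)


if __name__ == "__main__":
    conn = make_demo_db()
    client_means, pulled = mean_client_side(conn)
    db_means, returned = mean_db_side(conn)
    print("rows transferred, client-side reduction:", pulled)
    print("rows transferred, DB-side reduction:", returned)
```

Both functions compute the same per-station means, but the second transfers one row per station instead of every sample. Whether the real post-processing can be split this way depends on what the R script actually needs.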

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



