Re: Parallel Scaling of a pgplsql problem - Mailing list pgsql-performance

From Samuel Gendler
Subject Re: Parallel Scaling of a pgplsql problem
Date
Msg-id CAEV0TzAVQXf42n9vXu26JAe0-eQ9AgwwAErYFn0HMm8ScQaodA@mail.gmail.com
In response to Parallel Scaling of a pgplsql problem  (Venki Ramachandran <venki_ramachandran@yahoo.com>)
Responses Re: Parallel Scaling of a pgplsql problem
List pgsql-performance


On Wed, Apr 25, 2012 at 11:52 AM, Venki Ramachandran <venki_ramachandran@yahoo.com> wrote:
Hi all:
Can someone please guide me as to how to solve this problem? If this is the wrong forum, please let me know which one to post it in. I am new to Postgres (about 3 months into it).

I have a PostgreSQL 9.0 database on an AWS server (x-large) and a PL/pgSQL program that does some computation. It takes a date range and, for one pair of personnel (two employees in a company), calculates some values over that period. It takes about 40 ms (milliseconds) to complete and give me the answer. All good so far.

Now I have to run the same PL/pgSQL function on all possible combinations of employees, and with 542 employees that is about 300,000 unique pairs.

So (300000 * 40) / (1000 * 60 * 60) = 3.33 hours, and I have to rank the pairs and show them on a screen. No user wants to wait for 3 hours; they can probably wait for 10 minutes (even that is too much for a UI application). How do I solve this scaling problem? Can I have multiple parallel sessions, each with multiple processes that handle one pair each at 40 ms, and then collate the results? Do PostgreSQL or PL/pgSQL have any parallel computing capability?

The question is: how much of that 40 ms is spent performing the calculation, how much is spent querying, how much is function call overhead, and how much is the round trip between the client and server for the query and results? Depending on the breakdown, it is entirely possible that the actual per-record multiplier can be kept down to a couple of milliseconds if you restructure things to query the data in bulk and call a single function to do the work. If you get it down to 4 ms, that's a 20-minute query. Get it down to 1 ms and you're looking at only 5 minutes for what would appear to be a fairly compute-intensive report over a relatively large dataset.
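For instance, assuming a hypothetical work_log(emp_id, work_date, hours) table (your actual schema and calculation will differ), the whole pair-wise report can often be expressed as one set-based statement instead of ~300,000 separate function calls:

    -- Hypothetical schema: work_log(emp_id int, work_date date, hours double precision).
    -- Compute one metric per employee pair in a single pass over the data,
    -- instead of calling a PL/pgSQL function once per pair from the client.
    SELECT a.emp_id               AS emp_a,
           b.emp_id               AS emp_b,
           corr(a.hours, b.hours) AS score      -- stand-in for the real per-pair calculation
    FROM   work_log a
    JOIN   work_log b ON b.work_date = a.work_date
                     AND b.emp_id    > a.emp_id -- each unordered pair once
    WHERE  a.work_date BETWEEN '2012-01-01' AND '2012-03-31'
    GROUP  BY a.emp_id, b.emp_id
    ORDER  BY score DESC;

corr() here is just a stand-in for whatever your per-pair calculation really is; the point is that generating the pairs, computing the metric, and ranking all happen server-side in a single query, so there is no per-pair function call or client round trip at all.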

