Re: Parallel Scaling of a pgplsql problem - Mailing list pgsql-performance

From Jan Nielsen
Subject Re: Parallel Scaling of a pgplsql problem
Msg-id CANxH4hFe=VDS4WzFAnOFArhVUUfarG4Vubku2CBxjbgz1TWmCQ@mail.gmail.com
In response to Parallel Scaling of a pgplsql problem  (Venki Ramachandran <venki_ramachandran@yahoo.com>)
List pgsql-performance
On Wed, Apr 25, 2012 at 12:52 PM, Venki Ramachandran <venki_ramachandran@yahoo.com> wrote:
Hi all:
Can someone please guide me as to how to solve this problem? If this is the wrong forum, please let me know which one to post this one in. I am new to Postgres (about 3 months into it)

I have a PostgreSQL 9.0 database on an AWS server (x-large) and a PL/pgSQL function that does some computation. It takes in a date range and, for one pair of personnel (two employees in a company), calculates some values over that period. It takes about 40 ms (milliseconds) to complete and give me the answer. All good so far.

Now I have to run the same PL/pgSQL function on all possible combinations of employees, and with 542 employees that is about, say, 300,000 unique pairs.

So (300000 * 40) / (1000 * 60 * 60) = 3.33 hours, and I have to rank them and show it on a screen. No user wants to wait for 3 hours; they can probably wait for 10 minutes (even that is too much for a UI application). How do I solve this scaling problem? Can I have multiple parallel sessions, each with multiple processes that do a pair each at 40 ms, and then collate the results? Does PostgreSQL or PL/pgSQL have any parallel computing capability?
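As a side note, the back-of-envelope arithmetic above can be checked in a few lines (the worker counts below are illustrative, not from the thread); 542 employees give 542 × 541 = 293,222 ordered pairs, which is the "about 300,000" in the post, or 146,611 if each unordered pair only needs to be computed once:

```python
# Back-of-envelope check of the scaling arithmetic quoted above.
employees = 542
ordered_pairs = employees * (employees - 1)   # 293,222 -- the ~300,000 in the post
unordered_pairs = ordered_pairs // 2          # 146,611 if (a,b) and (b,a) are the same

ms_per_pair = 40
serial_hours = ordered_pairs * ms_per_pair / (1000 * 60 * 60)
print(f"{ordered_pairs} pairs, ~{serial_hours:.2f} h serial")

# With N perfectly parallel workers, the wall-clock time divides by N
# (ignoring contention and collation overhead):
for workers in (8, 32):
    print(f"{workers} workers: ~{serial_hours / workers * 60:.1f} min")
```

Even 32 perfectly parallel workers still leave a multi-minute wait, which is why the caching suggestions below matter regardless of parallelism.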

Setting aside the database concurrency question, have you considered application-level solutions?

How often does a user expect their rank to change? If a daily rank change is fine, trigger the (lengthy) ranking calculation nightly and cache the results for all users in a summary table (PostgreSQL 9.0 has no built-in materialized views, but a hand-maintained one works); you could rebuild it continuously to keep freshness within about 4 hours. To go faster with an application-level solution, you will have to reduce your calculation to *what's* likely to be most important to the individual, which, again, you can cache; or, if you can predict *who's* most likely to request a ranking, calculate those first; or both.
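A minimal sketch of such a hand-maintained cache, assuming a hypothetical `employees` table and a `pair_score(a, b, from_date, to_date)` function standing in for the poster's PL/pgSQL calculation (none of these names are from the thread):

```sql
-- Nightly job: rebuild the cache table inside one transaction, since
-- PostgreSQL 9.0 predates CREATE MATERIALIZED VIEW (added in 9.3).
BEGIN;
DROP TABLE IF EXISTS pair_rank_cache;
CREATE TABLE pair_rank_cache AS
SELECT e1.id AS emp_a,
       e2.id AS emp_b,
       pair_score(e1.id, e2.id, date '2012-01-01', date '2012-03-31') AS score
FROM   employees e1
JOIN   employees e2 ON e1.id < e2.id;   -- each unordered pair exactly once
CREATE INDEX ON pair_rank_cache (score DESC);
COMMIT;

-- The UI then reads precomputed ranks instantly:
SELECT emp_a, emp_b, rank() OVER (ORDER BY score DESC) AS rnk
FROM   pair_rank_cache;
```

The `e1.id < e2.id` join halves the work versus computing every ordered pair, and the window-function `rank()` (available since 8.4) pushes the ranking itself into a cheap read-time query.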

These are likely good things to consider regardless of any improvements you make to the back-end ranking calculation, though you will hit a point of diminishing returns once the calculation drops below some "tolerable" wait. In the web world, "tolerable" is about 3 seconds for the general public and about 30 seconds for a captive audience, e.g., employees. YMMV.


Cheers,

Jan
