Re: Parallel Scaling of a pgplsql problem - Mailing list pgsql-performance

From Jan Nielsen
Subject Re: Parallel Scaling of a pgplsql problem
Msg-id CANxH4hFe=VDS4WzFAnOFArhVUUfarG4Vubku2CBxjbgz1TWmCQ@mail.gmail.com
In response to Parallel Scaling of a pgplsql problem  (Venki Ramachandran <venki_ramachandran@yahoo.com>)
List pgsql-performance
On Wed, Apr 25, 2012 at 12:52 PM, Venki Ramachandran <venki_ramachandran@yahoo.com> wrote:
Hi all:
Can someone please guide me as to how to solve this problem? If this is the wrong forum, please let me know which one to post this one in. I am new to Postgres (about 3 months into it)

I have a PostgreSQL 9.0 database on an AWS server (x-large) and a PL/pgSQL function that does some computation. It takes in a date range and, for one pair of personnel (two employees in a company), calculates some values over that period. It takes about 40 ms (milliseconds) to complete and give me the answer. All good so far.

Now I have to run the same PL/pgSQL function on all possible combinations of employees, and with 542 employees that is about, say, 300,000 unique pairs.

So (300000 * 40) / (1000 * 60 * 60) = 3.33 hours, and I have to rank them and show it on a screen. No user wants to wait for 3 hours; they can probably wait for 10 minutes (even that is too much for a UI application). How do I solve this scaling problem? Can I have multiple parallel sessions, each with multiple processes that do a pair each at 40 ms, and then collate the results? Does PostgreSQL or PL/pgSQL have any parallel computing capability?
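As a side note, the back-of-envelope arithmetic above can be checked in a few lines (the worker counts below are illustrative, not from the thread); 542 employees give 542 × 541 = 293,222 ordered pairs, which is the "about 300,000" in the post, or 146,611 if each unordered pair only needs to be computed once:

```python
# Back-of-envelope check of the scaling arithmetic quoted above.
employees = 542
ordered_pairs = employees * (employees - 1)   # 293,222 -- the ~300,000 in the post
unordered_pairs = ordered_pairs // 2          # 146,611 if (a,b) and (b,a) are the same

ms_per_pair = 40
serial_hours = ordered_pairs * ms_per_pair / (1000 * 60 * 60)
print(f"{ordered_pairs} pairs, ~{serial_hours:.2f} h serial")

# With N perfectly parallel workers, the wall-clock time divides by N
# (ignoring contention and collation overhead):
for workers in (8, 32):
    print(f"{workers} workers: ~{serial_hours / workers * 60:.1f} min")
```

Even 32 perfectly parallel workers still leave a multi-minute wait, which is why the caching suggestions below matter regardless of parallelism.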

Setting aside the database concurrency question, have you considered application-level solutions?

How often does a user expect their rank to change? If a daily rank change is fine, trigger the (lengthy) ranking calculation nightly and cache the results for all users in a summary table (PostgreSQL 9.0 has no built-in materialized views, but a hand-maintained one works); you could rebuild it continuously to keep freshness within about 4 hours. To go faster with an application-level solution, you will have to reduce your calculation to *what's* likely to be most important to the individual, which, again, you can cache; or, if you can predict *who's* most likely to request a ranking, calculate those first; or both.
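A minimal sketch of such a hand-maintained cache, assuming a hypothetical `employees` table and a `pair_score(a, b, from_date, to_date)` function standing in for the poster's PL/pgSQL calculation (none of these names are from the thread):

```sql
-- Nightly job: rebuild the cache table inside one transaction, since
-- PostgreSQL 9.0 predates CREATE MATERIALIZED VIEW (added in 9.3).
BEGIN;
DROP TABLE IF EXISTS pair_rank_cache;
CREATE TABLE pair_rank_cache AS
SELECT e1.id AS emp_a,
       e2.id AS emp_b,
       pair_score(e1.id, e2.id, date '2012-01-01', date '2012-03-31') AS score
FROM   employees e1
JOIN   employees e2 ON e1.id < e2.id;   -- each unordered pair exactly once
CREATE INDEX ON pair_rank_cache (score DESC);
COMMIT;

-- The UI then reads precomputed ranks instantly:
SELECT emp_a, emp_b, rank() OVER (ORDER BY score DESC) AS rnk
FROM   pair_rank_cache;
```

The `e1.id < e2.id` join halves the work versus computing every ordered pair, and the window-function `rank()` (available since 8.4) pushes the ranking itself into a cheap read-time query.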

These are likely good things to consider regardless of any improvements you make to the back-end ranking calculation, though you will hit a point of diminishing returns once the calculation drops below some "tolerable" wait. In the web world, "tolerable" is about 3 seconds for the general public and about 30 seconds for a captive audience, e.g., employees. YMMV.


Cheers,

Jan
