Re: Parallel Scaling of a pgplsql problem - Mailing list pgsql-performance

From Pavel Stehule
Subject Re: Parallel Scaling of a pgplsql problem
Msg-id CAFj8pRB7gvUocfW5Uv5-NVFtRPSE97AS6Sbk5gzrdMweCpNS_Q@mail.gmail.com
In response to Re: Parallel Scaling of a pgplsql problem  (Venki Ramachandran <venki_ramachandran@yahoo.com>)
Responses Re: Parallel Scaling of a pgplsql problem  (Venki Ramachandran <venki_ramachandran@yahoo.com>)
List pgsql-performance
Hello

http://www.postgresql.org/docs/8.1/static/functions-datetime.html

CURRENT_TIME and CURRENT_TIMESTAMP deliver values with time zone;
LOCALTIME and LOCALTIMESTAMP deliver values without time zone.

CURRENT_TIME, CURRENT_TIMESTAMP, LOCALTIME, and LOCALTIMESTAMP can
optionally take a precision parameter, which causes the result to be
rounded to that many fractional digits in the seconds field. Without a
precision parameter, the result is given to the full available
precision.

Some examples:

SELECT CURRENT_TIME;
Result: 14:39:53.662522-05

SELECT CURRENT_DATE;
Result: 2001-12-23

SELECT CURRENT_TIMESTAMP;
Result: 2001-12-23 14:39:53.662522-05

SELECT CURRENT_TIMESTAMP(2);
Result: 2001-12-23 14:39:53.66-05

SELECT LOCALTIMESTAMP;
Result: 2001-12-23 14:39:53.662522

Since these functions return the start time of the current
transaction, their values do not change during the transaction. This
is considered a feature: the intent is to allow a single transaction
to have a consistent notion of the "current" time, so that multiple
modifications within the same transaction bear the same time stamp.

    Note: Other database systems might advance these values more frequently.

PostgreSQL also provides functions that return the start time of the
current statement, as well as the actual current time at the instant
the function is called. The complete list of non-SQL-standard time
functions is:

transaction_timestamp()
statement_timestamp()
clock_timestamp()
timeofday()
now()
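For timing individual steps inside a function on the poster's 9.0 server, clock_timestamp() (volatile, available since 8.2) returns the actual wall-clock time at each call, unlike CURRENT_TIMESTAMP, which is frozen at transaction start. A quick illustration:

```sql
BEGIN;
SELECT current_timestamp, clock_timestamp();
SELECT pg_sleep(1);
-- current_timestamp repeats the transaction start time;
-- clock_timestamp() is about one second later than in the first SELECT.
SELECT current_timestamp, clock_timestamp();
COMMIT;
```

So a debug-logging insert that should record per-row times needs clock_timestamp() rather than current_timestamp.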

Regards

Pavel Stehule

2012/4/25 Venki Ramachandran <venki_ramachandran@yahoo.com>:
> Another question (probably a silly mistake) while debugging this problem:
> I put insert statements into the pgplsql code to collect current_timestamp
> and see where the code was spending most of its time. The output is as
> follows:
>
>  log_text | insertdatetime
> ----------+-------------------------------
>  INPUT VARIABLES,Src,emp_id_1,emp_id_100,StartDate,2012-01-01,EndDate,2012-02-01,Tou,DOnPk,CRR Type,ptp-obligations,MinPositivePirce,1.00 | 2012-04-25 20:56:42.691965+00
>  Starting get_DAM_Value function | 2012-04-25 20:56:42.691965+00
>  StartDate,2012-01-01 01:00:00,EndDate,2012-02-02 00:00:00 | 2012-04-25 20:56:42.691965+00
>  StartHr,vSrcprice,vSinkprice,vDiff,vAvgValue,vTotalDAMValue,vPkTotal,vOffPkTotal | 2012-04-25 20:56:42.691965+00
>  2012-01-01 01:00:00,18.52,16.15,-2.370000,-2.370000,0.000000,0.000000,-2.370000 | 2012-04-25 20:56:42.691965+00
>  2012-01-01 02:00:00,17.22,15.60,-1.620000,-1.620000,0.000000,0.000000,-3.990000 | 2012-04-25 20:56:42.691965+00
>  2012-01-01 03:00:00,18.22,17.55,-0.670000,-0.670000,0.000000,0.000000,-4.660000 | 2012-04-25 20:56:42.691965+00
>  2012-01-01 04:00:00,18.53,18.13,-0.400000,-0.400000,0.000000,0.000000,-5.060000 | 2012-04-25 20:56:42.691965+00
>  ...
>  2012-02-02 00:00:00,2,17.13,17.13,0.000000,0.000000,0.000000,-7940.250000,-5216.290000 | 2012-04-25 20:56:42.691965+00
>  2012-02-02 00:00:00,3,16.54,16.54,0.000000,0.000000,0.000000,-7940.250000,-5216.290000 | 2012-04-25 20:56:42.691965+00
>  2012-02-02 00:00:00,4,16.27,16.27,0.000000,0.000000,0.000000,-7940.250000,-5216.290000 | 2012-04-25 20:56:42.691965+00
>  TotalPEAKMVal,-7940.250000,NumIntervals,2034,AvgRTMVAlue,-3.903761 | 2012-04-25 20:56:42.691965+00
>  OUTPUT VARIABLES,AvgPurValue,-2.84,AvgSaleValue,-3.90,Profit,-1.06,ProfitPercentage,-106.00 | 2012-04-25 20:56:42.691965+00
> (3832 rows)
>
> Why doesn't the current_timestamp value change within the pgplsql code? For
> every select from a table to compute, I insert a row into my debug_log dummy
> table. For all 3832 rows I have the same current_timestamp value, which I
> was hoping to collect with the following insert statement:
> insert into debug_log ('some log text', current_timestamp);
>
> But when I run this procedure from the psql command line, it takes 11
> seconds. (I used a bigger date range than the one behind the 40 ms figure in
> my first post, hence the change to 11 seconds.)
>
> dev=> select calc_comp('emp_id_1','emp_id_100','DAM-RTM',to_date('2012-01-01','yyyy-mm-dd'),to_date('2012-02-01','yyyy-mm-dd'),'DOnPk','ptp-obligations',1.00, 0);
>          crr_valuation
> -------------------------------
>  0||-2.84|-3.90|-1.06|-106.00|
> (1 row)
>
> Time: 11685.828 ms.
>
> The last input value (1/0) when calling the above function controls whether
> log line messages are inserted: 0 = insert, 1 = do not insert. When I toggle
> the flag, the overall timing does not change. Shouldn't it take some time,
> in ms, to write 3832 rows to a table?
> Why is my current_timestamp value not changing? I was expecting the
> difference between the last row's timestamp value and the first row's
> timestamp value to equal my 11.685 seconds. What is wrong here?
>
>
> -Venki
>
> ________________________________
> From: Samuel Gendler <sgendler@ideasculptor.com>
>
> To: Venki Ramachandran <venki_ramachandran@yahoo.com>
> Cc: "pgsql-performance@postgresql.org" <pgsql-performance@postgresql.org>
> Sent: Wednesday, April 25, 2012 12:36 PM
>
> Subject: Re: [PERFORM] Parallel Scaling of a pgplsql problem
>
>
>
> On Wed, Apr 25, 2012 at 11:52 AM, Venki Ramachandran
> <venki_ramachandran@yahoo.com> wrote:
>
> Hi all:
> Can someone please guide me as to how to solve this problem? If this is the
> wrong forum, please let me know which one to post it in. I am new to
> Postgres (about 3 months into it).
>
> I have a PostgreSQL 9.0 database on an AWS server (x-large) and a pgplsql
> program that does some computation. It takes in a date range and, for one
> pair of personnel (two employees in a company), it calculates some values
> over that time period. It takes about 40 ms (milliseconds) to complete and
> give me the answer. All good so far.
>
> Now I have to run the same pgplsql on all possible combinations of
> employees, and with 542 employees that is roughly 300,000 unique pairs.
>
> So (300000 * 40)/(1000 * 60 * 60) = 3.33 hours, and I have to rank them and
> show it on a screen. No user wants to wait for 3 hours; they can probably
> wait for 10 minutes (even that is too much for a UI application). How do I
> solve this scaling problem? Can I have multiple parallel sessions, each
> with multiple processes that do a pair each at 40 ms, and then collate the
> results? Does Postgres or pgplsql have any parallel computing capability?
>
>
> The question is, how much of that 40ms is spent performing the calculation,
> how much is spent querying, and how much is function call overhead, and how
> much is round trip between the client and server with the query and results?
>  Depending upon the breakdown, it is entirely possible that the actual
> per-record multiplier can be kept down to a couple of milliseconds if you
> restructure things to query data in bulk and only call a single function to
> do the work.  If you get it down to 4ms, that's a 20 minute query.  Get it
> down to 1ms and you're looking at only 5 minutes for what would appear to be
> a fairly compute-intensive report over a relatively large dataset.
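One way to act on Samuel's bulk-query advice above is to push the pair loop into a single SQL statement so the per-call round trip between client and server is paid once. This is only a hypothetical sketch: the employees table and its emp_id column are assumptions, not something shown in the thread.

```sql
-- Hypothetical: evaluate every unique employee pair in one statement
-- instead of ~300,000 separate client-driven calls (logging disabled
-- via the final 1 argument, as described by the poster).
SELECT e1.emp_id AS emp_a,
       e2.emp_id AS emp_b,
       calc_comp(e1.emp_id, e2.emp_id, 'DAM-RTM',
                 to_date('2012-01-01','yyyy-mm-dd'),
                 to_date('2012-02-01','yyyy-mm-dd'),
                 'DOnPk', 'ptp-obligations', 1.00, 1) AS crr_valuation
FROM employees e1
JOIN employees e2 ON e1.emp_id < e2.emp_id;
```

This still invokes the function once per pair, so it removes only client/server round-trip and statement overhead; the larger win Samuel describes would come from restructuring calc_comp itself to fetch its price data for all pairs in bulk.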
