Is there a faster way to do this? - Mailing list pgsql-sql

From Patrick Hatcher
Subject Is there a faster way to do this?
Date
Msg-id OFCB14A796.8F1A370B-ON88256EB4.006549CA-88256EB4.0068B9F7@fds.com
Whole thread Raw
Responses Re: Is there a faster way to do this?  (Richard Huxton <dev@archonet.com>)
Re: Is there a faster way to do this?  (Edmund Bacon <ebacon@onesystem.com>)
Re: Is there a faster way to do this?  (Jean-Luc Lachance <jllachan@sympatico.ca>)
Re: Is there a faster way to do this?  ("Greg Sabino Mullane" <greg@turnstep.com>)
List pgsql-sql



pg: 7.4.2
RedHat 7.2

Can I get some advice on a possible faster way of doing this:

Scenario:  Each day I update a column in a table with an internal
percentile value.  To arrive at  this value, I'll get a count of records
with sales > 0 and then divide this count by the total number of tile
groups I want.  So for example:
Total records w/sales > 0 = 730,000tile# = 100
total percentile groups (730,000 / 100):7300

Now that I have the total number of groups I need, I cycle through my
recordset, grab the key field and the current percentile number and stuff
the values into a temp table. I mod the current row counter against the
total percentile group number.  If it is 0 then I add 1 to the  current
percentile number .  After inserting records into the temp file I then
update the main table.

Using the example above, the first 7300 records would get a  percentile
number of 1, the next 7300  records would get  a percentile number of 2,
then 3,4,5,etc.

Unfortunately, I am going record by record in a loop and the process takes
upwards of 20mins.  Is there a faster way to do this?  I thought about
using limit and offset, but I'm not sure how I would accomplish it.

Below is the function I currently use.  Thanks for any help provided

CREATE OR REPLACE FUNCTION cdm.percentile_calc() RETURNS text AS
'DECLARE    v_interval int4;    v_tile int4;    v_percentile int4;    v_check int4;    v_count int4;    v_rowcount
int4;   myRec  RECORD;
 

BEGIN    v_count:=0;    v_tile:= 100;    v_percentile:=1;    v_rowcount :=1;    v_check:=0;

    /* Get count of records with val_purch > 0 */    select into v_count count(*)  from cdm.cdm_indiv_mast where
val_purch_com >0;
    /* this number will be used as part of our MOD to tell when to add one
to our percentile */    v_interval := v_count / v_tile;
CREATE TEMP TABLE cdmperct (f1 int8, f2 int2);

    FOR myRec IN  select indiv_key from cdm.cdm_indiv_mast where
val_purch_com  >0 order by val_purch_com desc  LOOP     INSERT INTO cdmperct values (myRec.indiv_key,v_percentile);
v_check= mod(v_rowcount,v_interval);     IF v_check = 0 THEN      v_percentile:=v_percentile+1;     END IF;
v_rowcount:=v_rowcount+1;    END LOOP;
 

      UPDATE cdm.cdm_indiv_mast SET percentiler = f2 from  cdmperct where
indiv_key = f1;

    DROP TABLE cdmperct;    RETURN  \'DONE\';
END; ' LANGUAGE 'plpgsql' IMMUTABLE;

Patrick Hatcher
Macys.Com



pgsql-sql by date:

Previous
From: "Jie Liang"
Date:
Subject: Re: Prepare Statement
Next
From: Richard Huxton
Date:
Subject: Re: Is there a faster way to do this?