Is there a faster way to do this? - Mailing list pgsql-sql
From | Patrick Hatcher |
---|---|
Subject | Is there a faster way to do this? |
Date | |
Msg-id | OFCB14A796.8F1A370B-ON88256EB4.006549CA-88256EB4.0068B9F7@fds.com Whole thread Raw |
Responses |
Re: Is there a faster way to do this?
Re: Is there a faster way to do this? Re: Is there a faster way to do this? Re: Is there a faster way to do this? |
List | pgsql-sql |
pg: 7.4.2 RedHat 7.2 Can I get some advice on a possible faster way of doing this: Scenario: Each day I update a column in a table with an internal percentile value. To arrive at this value, I'll get a count of records with sales > 0 and then divide this count by the total number of tile groups I want. So for example: Total records w/sales > 0 = 730,000tile# = 100 total percentile groups (730,000 / 100):7300 Now that I have the total number of groups I need, I cycle through my recordset, grab the key field and the current percentile number and stuff the values into a temp table. I mod the current row counter against the total percentile group number. If it is 0 then I add 1 to the current percentile number . After inserting records into the temp file I then update the main table. Using the example above, the first 7300 records would get a percentile number of 1, the next 7300 records would get a percentile number of 2, then 3,4,5,etc. Unfortunately, I am going record by record in a loop and the process takes upwards of 20mins. Is there a faster way to do this? I thought about using limit and offset, but I'm not sure how I would accomplish it. Below is the function I currently use. Thanks for any help provided CREATE OR REPLACE FUNCTION cdm.percentile_calc() RETURNS text AS 'DECLARE v_interval int4; v_tile int4; v_percentile int4; v_check int4; v_count int4; v_rowcount int4; myRec RECORD; BEGIN v_count:=0; v_tile:= 100; v_percentile:=1; v_rowcount :=1; v_check:=0; /* Get count of records with val_purch > 0 */ select into v_count count(*) from cdm.cdm_indiv_mast where val_purch_com >0; /* this number will be used as part of our MOD to tell when to add one to our percentile */ v_interval := v_count / v_tile; CREATE TEMP TABLE cdmperct (f1 int8, f2 int2); FOR myRec IN select indiv_key from cdm.cdm_indiv_mast where val_purch_com >0 order by val_purch_com desc LOOP INSERT INTO cdmperct values (myRec.indiv_key,v_percentile); v_check= mod(v_rowcount,v_interval); IF v_check = 0 THEN v_percentile:=v_percentile+1; END IF; v_rowcount:=v_rowcount+1; END LOOP; UPDATE cdm.cdm_indiv_mast SET percentiler = f2 from cdmperct where indiv_key = f1; DROP TABLE cdmperct; RETURN \'DONE\'; END; ' LANGUAGE 'plpgsql' IMMUTABLE; Patrick Hatcher Macys.Com