Re: Is there a faster way to do this? - Mailing list pgsql-sql
From | Jean-Luc Lachance |
---|---|
Subject | Re: Is there a faster way to do this? |
Date | |
Msg-id | 40CF5E3D.2070208@sympatico.ca Whole thread Raw |
In response to | Is there a faster way to do this? (Patrick Hatcher <PHatcher@macys.com>) |
List | pgsql-sql |
One way to do it would be to: Not put the percentile in the sales table; Create an percentile table with a foreign key to the sales table primary key and percentile int4: CREATE TABLE percentiles( fkey <whatever> PRIMARY KEY REFERENCES sales( <primary key>), percentile INT4 ); Create a sequence for that ancillary table: CREATE SEQUENCE percent_seq; When ready to create the percentiles, truncate the percentile table and reset the sequence next value: TRUNCATE percentiles; setval( 'percent_seq', 0); Now query your sales with the proper order by: INSERT INTO percentiles SELECT pk_sales, nextval( 'percent_seq') / tile_size FROM sales ORDER BY sales_value; HTH Patrick Hatcher wrote: > > > > pg: 7.4.2 > RedHat 7.2 > > Can I get some advice on a possible faster way of doing this: > > Scenario: Each day I update a column in a table with an internal > percentile value. To arrive at this value, I'll get a count of records > with sales > 0 and then divide this count by the total number of tile > groups I want. So for example: > Total records w/sales > 0 = 730,000 > tile# = 100 > total percentile groups (730,000 / 100):7300 > > Now that I have the total number of groups I need, I cycle through my > recordset, grab the key field and the current percentile number and stuff > the values into a temp table. I mod the current row counter against the > total percentile group number. If it is 0 then I add 1 to the current > percentile number . After inserting records into the temp file I then > update the main table. > > Using the example above, the first 7300 records would get a percentile > number of 1, the next 7300 records would get a percentile number of 2, > then 3,4,5,etc. > > Unfortunately, I am going record by record in a loop and the process takes > upwards of 20mins. Is there a faster way to do this? I thought about > using limit and offset, but I'm not sure how I would accomplish it. > > Below is the function I currently use. Thanks for any help provided > > CREATE OR REPLACE FUNCTION cdm.percentile_calc() > RETURNS text AS > 'DECLARE > v_interval int4; > v_tile int4; > v_percentile int4; > v_check int4; > v_count int4; > v_rowcount int4; > myRec RECORD; > > BEGIN > v_count:=0; > v_tile:= 100; > v_percentile:=1; > v_rowcount :=1; > v_check:=0; > > > /* Get count of records with val_purch > 0 */ > select into v_count count(*) from cdm.cdm_indiv_mast where > val_purch_com >0; > > /* this number will be used as part of our MOD to tell when to add one > to our percentile */ > v_interval := v_count / v_tile; > > CREATE TEMP TABLE cdmperct (f1 int8, f2 int2); > > > FOR myRec IN select indiv_key from cdm.cdm_indiv_mast where > val_purch_com >0 order by val_purch_com desc LOOP > INSERT INTO cdmperct values (myRec.indiv_key,v_percentile); > v_check = mod(v_rowcount,v_interval); > IF v_check = 0 THEN > v_percentile:=v_percentile+1; > END IF; > v_rowcount:= v_rowcount+1; > END LOOP; > > > UPDATE cdm.cdm_indiv_mast SET percentiler = f2 from cdmperct where > indiv_key = f1; > > > DROP TABLE cdmperct; > RETURN \'DONE\'; > END; ' > LANGUAGE 'plpgsql' IMMUTABLE; > > Patrick Hatcher > Macys.Com > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) >