Re: Is there a faster way to do this? - Mailing list pgsql-sql

From Jean-Luc Lachance
Subject Re: Is there a faster way to do this?
Date
Msg-id 40CF5E3D.2070208@sympatico.ca
Whole thread Raw
In response to Is there a faster way to do this?  (Patrick Hatcher <PHatcher@macys.com>)
List pgsql-sql
One way to do it would be to:

Not put the percentile in the sales table;

Create an percentile table with a foreign key to the sales table primary 
key and percentile int4:

CREATE TABLE percentiles(
fkey <whatever> PRIMARY KEY REFERENCES sales( <primary key>),
percentile INT4 );

Create a sequence for that ancillary table:
CREATE SEQUENCE percent_seq;

When ready to create the percentiles, truncate the percentile table and 
reset the sequence next value:

TRUNCATE percentiles; setval( 'percent_seq', 0);

Now query your sales with the proper order by:

INSERT INTO percentiles SELECT pk_sales, nextval( 'percent_seq') / 
tile_size FROM sales ORDER BY sales_value;

HTH



Patrick Hatcher wrote:
> 
> 
> 
> pg: 7.4.2
> RedHat 7.2
> 
> Can I get some advice on a possible faster way of doing this:
> 
> Scenario:  Each day I update a column in a table with an internal
> percentile value.  To arrive at  this value, I'll get a count of records
> with sales > 0 and then divide this count by the total number of tile
> groups I want.  So for example:
> Total records w/sales > 0 = 730,000
>  tile# = 100
> total percentile groups (730,000 / 100):7300
> 
> Now that I have the total number of groups I need, I cycle through my
> recordset, grab the key field and the current percentile number and stuff
> the values into a temp table. I mod the current row counter against the
> total percentile group number.  If it is 0 then I add 1 to the  current
> percentile number .  After inserting records into the temp file I then
> update the main table.
> 
> Using the example above, the first 7300 records would get a  percentile
> number of 1, the next 7300  records would get  a percentile number of 2,
> then 3,4,5,etc.
> 
> Unfortunately, I am going record by record in a loop and the process takes
> upwards of 20mins.  Is there a faster way to do this?  I thought about
> using limit and offset, but I'm not sure how I would accomplish it.
> 
> Below is the function I currently use.  Thanks for any help provided
> 
> CREATE OR REPLACE FUNCTION cdm.percentile_calc()
>   RETURNS text AS
> 'DECLARE
>      v_interval int4;
>      v_tile int4;
>      v_percentile int4;
>      v_check int4;
>      v_count int4;
>      v_rowcount int4;
>      myRec  RECORD;
> 
> BEGIN
>      v_count:=0;
>      v_tile:= 100;
>      v_percentile:=1;
>      v_rowcount :=1;
>      v_check:=0;
> 
> 
>      /* Get count of records with val_purch > 0 */
>      select into v_count count(*)  from cdm.cdm_indiv_mast where
> val_purch_com >0;
> 
>      /* this number will be used as part of our MOD to tell when to add one
> to our percentile */
>      v_interval := v_count / v_tile;
> 
>  CREATE TEMP TABLE cdmperct (f1 int8, f2 int2);
> 
> 
>      FOR myRec IN  select indiv_key from cdm.cdm_indiv_mast where
> val_purch_com  >0 order by val_purch_com desc  LOOP
>       INSERT INTO cdmperct values (myRec.indiv_key,v_percentile);
>       v_check = mod(v_rowcount,v_interval);
>       IF v_check = 0 THEN
>        v_percentile:=v_percentile+1;
>       END IF;
>       v_rowcount:= v_rowcount+1;
>      END LOOP;
> 
> 
>        UPDATE cdm.cdm_indiv_mast SET percentiler = f2 from  cdmperct where
> indiv_key = f1;
> 
> 
>      DROP TABLE cdmperct;
>      RETURN  \'DONE\';
> END; '
>   LANGUAGE 'plpgsql' IMMUTABLE;
> 
> Patrick Hatcher
> Macys.Com
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> 



pgsql-sql by date:

Previous
From: Edmund Bacon
Date:
Subject: Re: Is there a faster way to do this?
Next
From: "Stijn Vanroye"
Date:
Subject: Re: Your question in postgresql.org forum (Diff. between two times as a numeric value in a stored proc)