Thread: temp tables and function performance

temp tables and function performance

From
"Anton Melser"
Date:
Hi,
I am trying to move up in the world with my sql and need to do the following...
I have a subscribers table and I need to export to csv (semi-colon
separated) certain fields - that is fine, but I also need to export a
multi-select field from another table as one string (0 to n values
separated by commas) per line. The problem being that to get the
actual string I have to go via 4 other relations... and as I have
200k+ subscribers this takes a while.
My idea (which seems to work, though I haven't tested fully as it
takes too damn long!), was to do the following. I would REALLY
appreciate any pointers as my sql has never been this challenged!

CREATE OR REPLACE FUNCTION mytest()
  RETURNS integer AS
$BODY$DECLARE kindy INTEGER;
BEGIN

create temporary table tmp_interests(
id bigint,
value character varying(100)
)
WITHOUT OIDS
ON COMMIT DROP;

insert into tmp_interests
select distinct si.subid, rbi.value
from restem rbi, cats cc, trm_terms tt, subrest si
where rbi.key = cc.name
    and cc.catid = tt.modcid
    and tt.tid = si.themeid;

create temporary table tmp_subscribers(
email character varying(200),
format character varying(4),
interests  character varying(1000),
)
WITHOUT OIDS
ON COMMIT DROP;

insert into tmp_subscribers
Select email,
 format,
 my_interests(id) as interests
from subscriber;

GET DIAGNOSTICS kindy = ROW_COUNT;

copy tmp_subscribers to '/home/myname/subs.csv' WITH CSV DELIMITER AS
';' NULL AS '';


GET DIAGNOSTICS kindy = ROW_COUNT;

return kindy;

END;$BODY$
  LANGUAGE 'plpgsql' VOLATILE;

...


CREATE OR REPLACE FUNCTION my_interests(bigint)
  RETURNS character varying AS
$BODY$DECLARE
subid ALIAS FOR $1;
interests character varying;
myinterest RECORD;

BEGIN
    interests := '';
    FOR myinterest IN execute 'select value from tmp_interests where id =
' || subid LOOP
        if interests = '' then
            interests := myinterest.value;
        else
            interests := interests || ',' || myinterest.value;
        end if;
    END LOOP;

    RETURN interests;
END$BODY$
  LANGUAGE 'plpgsql' VOLATILE;

...

select mytest();

If there are errors then please just focus on the logic, as I have cut
back on the number of columns (for clarity) and changed a lot of the
real table/names... Am I going about it the right way? Is there a
better way?
Thanks heaps,
Antoine

Re: temp tables and function performance

From
Alban Hertroys
Date:
Anton Melser wrote:
> Hi,
> I am trying to move up in the world with my sql and need to do the
> following...
> I have a subscribers table and I need to export to csv (semi-colon
> separated) certain fields - that is fine, but I also need to export a
> multi-select field from another table as one string (0 to n values
> separated by commas) per line. The problem being that to get the
> actual string I have to go via 4 other relations... and as I have
> 200k+ subscribers this takes a while.

200k isn't all that much, unless you have a lot of large columns.

> My idea (which seems to work, though I haven't tested fully as it
> takes too damn long!), was to do the following. I would REALLY
> appreciate any pointers as my sql has never been this challenged!
>
> CREATE OR REPLACE FUNCTION mytest()
>  RETURNS integer AS
> $BODY$DECLARE kindy INTEGER;
> BEGIN


> create temporary table tmp_interests(
> id bigint,
> value character varying(100)
> )
> WITHOUT OIDS
> ON COMMIT DROP;
>
> insert into tmp_interests
> select distinct si.subid, rbi.value
> from restem rbi, cats cc, trm_terms tt, subrest si
> where rbi.key = cc.name
>     and cc.catid = tt.modcid
>     and tt.tid = si.themeid;

You can do these in one go using
 CREATE TEMPORARY TABLE tmp_interests AS SELECT ...

> create temporary table tmp_subscribers(
> email character varying(200),
> format character varying(4),
> interests  character varying(1000),
> )
> WITHOUT OIDS
> ON COMMIT DROP;
>
> insert into tmp_subscribers
> Select email,
> format,
> my_interests(id) as interests
> from subscriber;
>
> GET DIAGNOSTICS kindy = ROW_COUNT;
>
> copy tmp_subscribers to '/home/myname/subs.csv' WITH CSV DELIMITER AS
> ';' NULL AS '';

If you have PG8.2 and can combine your 2 select queries into one, then
you can create a view of them and copy that instead. It takes out all
the inserts and can use your already existing table statistics - it
should be faster.

Also, after inserting a bunch of records into a table, make a habit of
running ANALYSE on it. Otherwise the query-planner knows nothing about
the data in the tables and is likely to come up with a sub-optimal query
plan.

>
> GET DIAGNOSTICS kindy = ROW_COUNT;
>
> return kindy;
>
> END;$BODY$
>  LANGUAGE 'plpgsql' VOLATILE;
>
> ...
>
>
> CREATE OR REPLACE FUNCTION my_interests(bigint)
>  RETURNS character varying AS
> $BODY$DECLARE
> subid ALIAS FOR $1;
> interests character varying;
> myinterest RECORD;
>
> BEGIN
>     interests := '';
>     FOR myinterest IN execute 'select value from tmp_interests where id =
> ' || subid LOOP

No need for a dynamic query here...

>         if interests = '' then
>             interests := myinterest.value;
>         else
>             interests := interests || ',' || myinterest.value;
>         end if;
>     END LOOP;
>
>     RETURN interests;
> END$BODY$
>  LANGUAGE 'plpgsql' VOLATILE;

I'd have to look up the syntax, but I'm quite certain you can put the
results of a select into an array. After that you can call
array_to_string(...) to convert it into a comma seperated string. That'd
take away the need for this SP (which I think is actually STABLE instead
of VOLATILE).

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
   7500 AK Enschede

// Integrate Your World //