Thread: HOWTO caching data across function calls: temporary tables, cursor?
HOWTO caching data across function calls: temporary tables, cursor?
From
Ivan Sergio Borgonovo
Date:
I made a similar question but maybe it was not that clear. I've a large table (items) linked with other tables (attributes). Some product ends into a basket. create table items( item_id serial primary key, attributes... ); create table item_attributes( item_id int references items (item_id) attributes... ); create table baskets( basket_id serial primary key, ...other stuff ); create table basket_items( item_id int references items (item_id), basket_id int references baskets (basket_id), ... ); I've a bunch of functions that operates on the basket (a smaller list of products with their attributes). So many functions ends up in repeating over and over a select similar to: select [list of columns] from baskets b join basket_items bi on b.basket_=bi.basket_id join items i on i.item_id=bi.item_id join item_attributes a a.item_id=i.item_id where b.basket_id=$1 It would be nice if I could avoid to execute this query over and over. I'd have to find a way to pass this data across functions. One way would be to put this data in a temporary table, but many things are unclear to me. I still have to find a way to reference these tables across functions (there will be different basket_id, and each transaction should see the same temp table and not "steal" the one of other transactions). I've to take care of name clash and visibility. I need to take care of garbage collection at the right time. I've no idea of the performance gain. Caching of queries in function (relation with OID ##### does not exist) and all the above make the use of temp tables a bit overwhelming. It seems that another way would be to use cursors... but I haven't been able to find any example. I think this is a common problem but I can't find general guidelines. I'm on 8.1 thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it
Ivan Sergio Borgonovo wrote: > I've a bunch of functions that operates on the basket (a smaller list > of products with their attributes). > > So many functions ends up in repeating over and over a select similar > to: > > select [list of columns] from baskets b > join basket_items bi on b.basket_=bi.basket_id > join items i on i.item_id=bi.item_id > join item_attributes a a.item_id=i.item_id > where b.basket_id=$1 > > It would be nice if I could avoid to execute this query over and over. > I'd have to find a way to pass this data across functions. You could pass arrays containing the selected rows between functions. Something like: CREATE TYPE basket_row AS(id integer, name text, count integer, ...); CREATE FUNCTION sell (items basket_row[]) RETURNS boolean LANGUAGE plpgsql AS $$........$$; Yours, Laurenz Albe
Re: HOWTO caching data across function calls: temporary tables, cursor?
From
Ivan Sergio Borgonovo
Date:
On Tue, 1 Apr 2008 12:01:21 +0200 "Albe Laurenz" <laurenz.albe@wien.gv.at> wrote: > Ivan Sergio Borgonovo wrote: > > I've a bunch of functions that operates on the basket (a smaller > > list of products with their attributes). > > > > So many functions ends up in repeating over and over a select > > similar to: > > > > select [list of columns] from baskets b > > join basket_items bi on b.basket_=bi.basket_id > > join items i on i.item_id=bi.item_id > > join item_attributes a a.item_id=i.item_id > > where b.basket_id=$1 > > > > It would be nice if I could avoid to execute this query over and > > over. I'd have to find a way to pass this data across functions. > > You could pass arrays containing the selected rows between > functions. Something like: > > CREATE TYPE basket_row AS(id integer, name text, count > integer, ...); CREATE FUNCTION sell (items basket_row[]) RETURNS > boolean LANGUAGE plpgsql AS $$........$$; It doesn't look as I can do the same stuff with array and tables/records. Many times I use joint or aggregates on the basket. Is there any good tutorial/example on how to use cursors or temp tables in such circumstance? -- Ivan Sergio Borgonovo http://www.webthatworks.it
Ivan Sergio Borgonovo wrote: > > > I've a bunch of functions that operates on the basket (a smaller > > > list of products with their attributes). > > > > > > So many functions ends up in repeating over and over a select > > > similar to: > > > > > > select [list of columns] from baskets b > > > join basket_items bi on b.basket_=bi.basket_id > > > join items i on i.item_id=bi.item_id > > > join item_attributes a a.item_id=i.item_id > > > where b.basket_id=$1 > > > > > > It would be nice if I could avoid to execute this query over and > > > over. I'd have to find a way to pass this data across functions. > > > > You could pass arrays containing the selected rows between > > functions. Something like: > > > > CREATE TYPE basket_row AS(id integer, name text, count > > integer, ...); CREATE FUNCTION sell (items basket_row[]) RETURNS > > boolean LANGUAGE plpgsql AS $$........$$; > > > It doesn't look as I can do the same stuff with array and > tables/records. > > Many times I use joint or aggregates on the basket. Sorry, my example was unclear. The Type I declare should not hold one row from a single table, but a result row from the 4 table join you wrote above. You said that you have to do the same select over and over in each function, and my idea is to execute the query only once and store the results in an array. That should work just fine. > Is there any good tutorial/example on how to use cursors or temp > tables in such circumstance? Don't know about temporary tables, but the docs have enough about "refcursor": http://www.postgresql.org/docs/current/static/plpgsql-cursors.html A cursor is actually a good idea for this, I hadn't thought of it. You'd need scrollable cursors though, and these did not exist in PL/pgSQL before version 8.3. You could just pass a variable of type refcursor between your functions and each function could do a MOVE FIRST FROM curs; FOR var IN FETCH ALL FROM curs LOOP .... END LOOP; Does that help? Yours, Laurenz Albe
Re: HOWTO caching data across function calls: temporary tables, cursor?
From
Ivan Sergio Borgonovo
Date:
On Tue, 1 Apr 2008 14:00:39 +0200 "Albe Laurenz" <laurenz.albe@wien.gv.at> wrote: > Ivan Sergio Borgonovo wrote: > > It doesn't look as I can do the same stuff with array and > > tables/records. > > Many times I use joint or aggregates on the basket. > Sorry, my example was unclear. I was the first not to be clear ;) > The Type I declare should not hold one row from a single table, but > a result row from the 4 table join you wrote above. yes... but it is not just a matter of caching the data but rather being able to exploit them with SQL. > You said that you have to do the same select over and over in each > function, and my idea is to execute the query only once and store > the results in an array. I've to deal with a very small subset of a larger one over and over. I think picking up the attributes of all items in a basket and use aggregates, join on that small subset but having to refer to the larger set has its cost. I'd like to pay this cost just one time. Actually I even don't know if this approach is worth, but still I won't know if it works unless I'll have a working implementation. I could even retrieve a larger set of columns so that I could use the same slice for many purposes. After all I think it should be faster to retrieve a larger set of columns one time rather than retrieving the same set divided into multiple requests. This way the query will actually be the same and it should be easier to cache the result. > That should work just fine. > > > Is there any good tutorial/example on how to use cursors or temp > > tables in such circumstance? > Don't know about temporary tables, but the docs have enough about > "refcursor": > http://www.postgresql.org/docs/current/static/plpgsql-cursors.html > A cursor is actually a good idea for this, I hadn't thought of it. > You'd need scrollable cursors though, and these did not exist in > PL/pgSQL before version 8.3. > You could just pass a variable of type refcursor between your > functions and each function could do a > MOVE FIRST FROM curs; > FOR var IN FETCH ALL FROM curs LOOP > .... > END LOOP; > Does that help? Well it does look as if I could refine a search in a cursor. A possible way could be to encapsulate the temp table in a function, but still I'd like to find a tutorial/howto etc... that will point out the troubles I'm going to meet with uniqueness of the name, visibility, garbage collection... Suppose I've a function that return a setof record I could use that function inside other function as in select into [vars,...] [cols,...] from AFunction(...) where cols1<7... Now AFuncion(...) is going to retrieve over and over the same record set inside a transaction. AFuncion will be used inside several other functions. It would be nice if I could cache the result of AFunction. For me it's not clear if adding some additional caching system (eg. create a temp table inside the function) would obtain the same result as marking the function STABLE. thx -- Ivan Sergio Borgonovo http://www.webthatworks.it
Ivan Sergio Borgonovo wrote: [wants to cache query results in a temporary table for use in several functions] > yes... but it is not just a matter of caching the data but rather > being able to exploit them with SQL. Oh, I see, you want to select/join with the cached data. Then neither arrays nor cursors can help; you need a temporary table. > A possible way could be to encapsulate the temp table in a function, > but still I'd like to find a tutorial/howto etc... that will point > out the troubles I'm going to meet with uniqueness of the name, > visibility, garbage collection... You can CREATE TEMPORARY TABLE basket_123 [ON COMMIT DROP] AS SELECT .... If you do not include ON COMMIT DROP, the table will be dropped at the end of your database session. To have more than one table and have a unique name, you can use a sequence to construct the table name. Tha means you will have to use dynamic SQL. > Suppose I've a function that return a setof record > > I could use that function inside other function as in > > select into [vars,...] [cols,...] from AFunction(...) where cols1<7... > > Now AFuncion(...) is going to retrieve over and over the same record > set inside a transaction. > > AFuncion will be used inside several other functions. > > It would be nice if I could cache the result of AFunction. Since you are looking for a sample, maybe something like that can get you started: CREATE SEQUENCE temp_names; CREATE FUNCTION mkcache(param1 integer, param2 text) RETURNS name LANGUAGE plpgsql VOLATILE STRICT AS $$DECLARE i integer; BEGIN SELECT nextval('temp_names') INTO i; EXECUTE 'CREATE TEMPORARY TABLE basket_' || i || '(col1, col2, ...) AS SELECT .....'; RETURN 'basket_' || i; END;$$; CREATE FUNCTION getcache(tabname IN name, col1 OUT integer, col2 OUT text, ...) RETURNS SETOF RECORD LANGUAGE plpgsql IMMUTABLE STRICT AS $$BEGIN FOR col1, col2, ... IN EXECUTE 'SELECT col1, col2, ... FROM ' || tabname LOOP RETURN NEXT; END LOOP; RETURN; END;$$; Sample use: SELECT mkcache(1, 'test'); mkcache ---------- basket_1 (1 row) SELECT * FROM getcache('basket_1'); col1 | col2 | ... ------+-------+----- 27 | item1 | ... .... (n rows) > For me it's not clear if adding some additional caching system (eg. > create a temp table inside the function) would obtain the same result > as marking the function STABLE. No, STABLE doen't help here - that is only a hint for the optimizer. Yours, Laurenz Albe
optimiser STABLE vs. temp table was: HOWTO caching data across function calls: temporary tables, cursor?
From
Ivan Sergio Borgonovo
Date:
On Tue, 1 Apr 2008 16:08:45 +0200 "Albe Laurenz" <laurenz.albe@wien.gv.at> wrote: [snip] > Since you are looking for a sample, maybe something like that can > get you started: > > CREATE SEQUENCE temp_names; [snip] I was thinking something in the line of it. I wasn't that sure of the course to follow. Just few hours ago it looked like a swamp. Thanks, now I've more confidence it is a feasible plan. > > For me it's not clear if adding some additional caching system > > (eg. create a temp table inside the function) would obtain the > > same result as marking the function STABLE. > No, STABLE doen't help here - that is only a hint for the optimizer. I can't really appreciate the difference... or better... I think the difference may be that I can't take for granted the function will be cached if I delegate the choice to the optimiser. I'd say the optimiser will try to cache it unless it finds something better to do with the memory. If that's what happens I think I'll avoid to complicate my life with a manual cache management and just wrap the most general query in a function. Can anybody confirm that's how the optimiser work or explain the differences between providing a "manual" cache and just declaring a function STABLE? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: optimiser STABLE vs. temp table was: HOWTO caching data across function calls: temporary tables, cursor?
From
Tom Lane
Date:
Ivan Sergio Borgonovo <mail@webthatworks.it> writes: > I can't really appreciate the difference... or better... I think the > difference may be that I can't take for granted the function will be > cached if I delegate the choice to the optimiser. You can take for granted that it won't be, because there is no function cache in Postgres. regards, tom lane
Re: optimiser STABLE vs. temp table was: HOWTO caching data across function calls: temporary tables, cursor?
From
Ivan Sergio Borgonovo
Date:
On Tue, 01 Apr 2008 11:22:20 -0400 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Ivan Sergio Borgonovo <mail@webthatworks.it> writes: > > I can't really appreciate the difference... or better... I think > > the difference may be that I can't take for granted the function > > will be cached if I delegate the choice to the optimiser. > You can take for granted that it won't be, because there is no > function cache in Postgres. Would you please be so kind to rephrase: http://www.postgresql.org/docs/8.1/interactive/xfunc-volatility.html "A STABLE function cannot modify the database and is guaranteed to return the same results given the same arguments for all rows within a single statement. This category allows the optimizer to optimize multiple calls of the function to a single call. In particular, it is safe to use an expression containing such a function in an index scan condition. (Since an index scan will evaluate the comparison value only once, not once at each row, it is not valid to use a VOLATILE function in an index scan condition.)" I can't understand how it can call a function a single time and avoid to cache the result. Is it limited to a single statement? eg. create or replace function(...) as $$ begin select into bau cetti from stablefunc(sameparam); select into bingo t.bongo from stablefunc(sameparam) as s join sometable t on s.cetti=t.cetti; ... will call stablefunc 2 times? I do appreciate the difference between: create or replace function(...) as $$ begin select into bau cetti from stablefunc(sameparam); insert into sometable... select into bingo t.bongo from stablefunc(sameparam) as s join sometable t on s.cetti=t.cetti; since stablefunc may depend on sometable and it is going to see the change. So... then any pointer to some places where I could learn some caching techniques if STABLE doesn't do the trick? thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: optimiser STABLE vs. temp table was: HOWTO caching data across function calls: temporary tables, cursor?
From
Martijn van Oosterhout
Date:
On Tue, Apr 01, 2008 at 06:06:35PM +0200, Ivan Sergio Borgonovo wrote: > Would you please be so kind to rephrase: > > http://www.postgresql.org/docs/8.1/interactive/xfunc-volatility.html <snip> > I can't understand how it can call a function a single time and avoid > to cache the result. > Is it limited to a single statement? Yes, it's limited to a single statememnt. Let's say hypothetically postgres could optimise: select f() + f(); to select 2*f(); It would have to optimised so it only calls the function once yet at no time does it "cache" the result in the usual sense. Does this helps, > select into bau cetti from stablefunc(sameparam); > > select into bingo t.bongo from stablefunc(sameparam) as s > join sometable t on s.cetti=t.cetti; > > will call stablefunc 2 times? Yes. > So... then any pointer to some places where I could learn some > caching techniques if STABLE doesn't do the trick? Anything persistnat usually needs to be in a table. If it's a really small amount you could use the global namespace in pl/perl or similar in other languages. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
Attachment
still on techniques to cache table slices was: optimiser STABLE vs. temp table
From
Ivan Sergio Borgonovo
Date:
On Tue, 1 Apr 2008 18:32:25 +0200 Martijn van Oosterhout <kleptog@svana.org> wrote: > On Tue, Apr 01, 2008 at 06:06:35PM +0200, Ivan Sergio Borgonovo > wrote: > > Would you please be so kind to rephrase: > > > > http://www.postgresql.org/docs/8.1/interactive/xfunc-volatility.html > <snip> > > I can't understand how it can call a function a single time and > > avoid to cache the result. > > Is it limited to a single statement? > > Yes, it's limited to a single statememnt. Let's say hypothetically > postgres could optimise: It be nice if it was a bit clearer in the docs. Someone may guess it from the fact that between 2 statements there could be something that change the result of the function... but well on the other side people may hope the optimiser is smarter than what it really is and still can call a function just when actually needed and discern between: select ... stablefunc() select ... stablefunc() and select ... stablefunc() insert somestuff select ... stablefunc() That would make caching stuff definitively easier... IMMUTABLE is too much but extending the "cacheability" of functions a little bit further with some other attribute would make things much easier. > Does this helps, Definitively. thanks. > Anything persistnat usually needs to be in a table. If it's a really > small amount you could use the global namespace in pl/perl or > similar in other languages. It does look as it is "semi-persistent"... so temp tables may actually do the trick. Albe Laurenz's example was partially comforting. Could somebody point me to some other technique or a more in depth discussion or whatever that will help me to learn a bit more about this issue and available techniques? I'm wondering about the visibility of temp tables defined in a function. From my understanding pg behaviour is the one I'm looking for. I could use the same temp table name across different sessions without the trouble of clashes... but well this behaviour is not standard and I'm worried it will bite me once pg will follow the standard. Using the same name would make possible to do something like: if(table doesn't exist) create table end if return rows but this looks like it is going to be harder than expected since temp table aren't easy to "find". I'd have to resort to catching exceptions etc... If I use Albe Laurenz's technique I'll have to do some bookkeeping to store and pass the temp table name across functions. thanks -- Ivan Sergio Borgonovo http://www.webthatworks.it
Re: optimiser STABLE vs. temp table was: HOWTO caching data across function calls: temporary tables, cursor?
From
"Albe Laurenz"
Date:
Ivan Sergio Borgonovo wrote: > http://www.postgresql.org/docs/8.1/interactive/xfunc-volatility.html > > "A STABLE function cannot modify the database and is guaranteed to > return the same results given the same arguments for all rows within > a single statement. This category allows the optimizer to optimize > multiple calls of the function to a single call. In particular, it is > safe to use an expression containing such a function in an index scan > condition. (Since an index scan will evaluate the comparison value > only once, not once at each row, it is not valid to use a VOLATILE > function in an index scan condition.)" > > I can't understand how it can call a function a single time and avoid > to cache the result. > Is it limited to a single statement? You did not notice the sentence *before* that list: "The volatility category is a promise to the optimizer about the behavior of the function" What is meant is this: You must not define a function STABLE unless you are certain that it will always have the same result if called multiple times with the same arguments within a single query. The function itself will not behave any differently if you define it IMMUTABLE or if you define it VOLATILE. It's up to the function definer to choose the right setting. If you do it wrong, weird things may happen. As an example, if you define: CREATE FUNCTION random_nr() RETURNS double precision IMMUTABLE LANGUAGE sql AS 'SELECT random()'; and you try to select 10 random numbers like: SELECT random_nr() FROM generate_series(1, 10); you will get the same number 10 times, because the optimizer will cause the function to be called only once. If you define the function as VOLATILE, it will be called ten times because the optimizer knows that it cannot reuse the first result again. On the other hand, if you CREATE OR REPLACE FUNCTION to_upcase(text) RETURNS text VOLATILE STRICT LANGUAGE plpgsql AS 'BEGIN RETURN upper($1); END;'; and you define CREATE TABLE tab (id integer PRIMARY KEY, val text UNIQUE); INSERT INTO tab VALUES (1, 'one'), (2, 'two'), (3, 'three'), (4, 'four'); then look at the following EXPLAIN output: EXPLAIN SELECT id FROM tab WHERE val = to_upcase('three'); QUERY PLAN ----------------------------------------------------- Seq Scan on tab (cost=0.00..332.88 rows=1 width=4) Filter: (val = to_upcase('three'::text)) (2 rows) If you had defined the function as IMMUTABLE, the optimizer would know that it is safe to use the function in an index scan, because it won't change value: EXPLAIN SELECT id FROM tab WHERE val = to_upcase('three'); QUERY PLAN ----------------------------------------------------------------------- Index Scan using tab_val_key on tab (cost=0.00..8.27 rows=1 width=4) Index Cond: (val = 'THREE'::text) (2 rows) Yours, Laurenz Albe