Thread: HOWTO caching data across function calls: temporary tables, cursor?

HOWTO caching data across function calls: temporary tables, cursor?

From
Ivan Sergio Borgonovo
Date:
I made a similar question but maybe it was not that clear.

I've a large table (items) linked with other tables (attributes).

Some product ends into a basket.

create table items(
  item_id serial primary key,
  attributes...
);

create table item_attributes(
  item_id int references items (item_id)
  attributes...
);

create table baskets(
  basket_id serial primary key,
  ...other stuff
);
create table basket_items(
  item_id int references items (item_id),
  basket_id int references baskets (basket_id),
  ...
);


I've a bunch of functions that operates on the basket (a smaller list
of products with their attributes).

So many functions ends up in repeating over and over a select similar
to:

select [list of columns] from baskets b
join basket_items bi on b.basket_=bi.basket_id
join items i on i.item_id=bi.item_id
join item_attributes a a.item_id=i.item_id
where b.basket_id=$1

It would be nice if I could avoid to execute this query over and over.
I'd have to find a way to pass this data across functions.

One way would be to put this data in a temporary table, but many
things are unclear to me.
I still have to find a way to reference these tables across functions
(there will be different basket_id, and each transaction should see
the same temp table and not "steal" the one of other transactions).
I've to take care of name clash and visibility.
I need to take care of garbage collection at the right time.
I've no idea of the performance gain.
Caching of queries in function (relation with OID ##### does not
exist) and all the above make the use of temp tables a bit
overwhelming.

It seems that another way would be to use cursors... but I haven't
been able to find any example.

I think this is a common problem but I can't find general guidelines.

I'm on 8.1

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: HOWTO caching data across function calls: temporary tables, cursor?

From
"Albe Laurenz"
Date:
Ivan Sergio Borgonovo wrote:
> I've a bunch of functions that operates on the basket (a smaller list
> of products with their attributes).
>
> So many functions ends up in repeating over and over a select similar
> to:
>
> select [list of columns] from baskets b
> join basket_items bi on b.basket_=bi.basket_id
> join items i on i.item_id=bi.item_id
> join item_attributes a a.item_id=i.item_id
> where b.basket_id=$1
>
> It would be nice if I could avoid to execute this query over and over.
> I'd have to find a way to pass this data across functions.

You could pass arrays containing the selected rows between functions.
Something like:

CREATE TYPE basket_row AS(id integer, name text, count integer, ...);
CREATE FUNCTION sell (items basket_row[]) RETURNS boolean LANGUAGE plpgsql
   AS $$........$$;

Yours,
Laurenz Albe

Re: HOWTO caching data across function calls: temporary tables, cursor?

From
Ivan Sergio Borgonovo
Date:
On Tue, 1 Apr 2008 12:01:21 +0200
"Albe Laurenz" <laurenz.albe@wien.gv.at> wrote:

> Ivan Sergio Borgonovo wrote:
> > I've a bunch of functions that operates on the basket (a smaller
> > list of products with their attributes).
> >
> > So many functions ends up in repeating over and over a select
> > similar to:
> >
> > select [list of columns] from baskets b
> > join basket_items bi on b.basket_=bi.basket_id
> > join items i on i.item_id=bi.item_id
> > join item_attributes a a.item_id=i.item_id
> > where b.basket_id=$1
> >
> > It would be nice if I could avoid to execute this query over and
> > over. I'd have to find a way to pass this data across functions.
>
> You could pass arrays containing the selected rows between
> functions. Something like:
>
> CREATE TYPE basket_row AS(id integer, name text, count
> integer, ...); CREATE FUNCTION sell (items basket_row[]) RETURNS
> boolean LANGUAGE plpgsql AS $$........$$;


It doesn't look as I can do the same stuff with array and
tables/records.

Many times I use joint or aggregates on the basket.

Is there any good tutorial/example on how to use cursors or temp
tables in such circumstance?


--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: HOWTO caching data across function calls: temporary tables, cursor?

From
"Albe Laurenz"
Date:
Ivan Sergio Borgonovo wrote:
> > > I've a bunch of functions that operates on the basket (a smaller
> > > list of products with their attributes).
> > >
> > > So many functions ends up in repeating over and over a select
> > > similar to:
> > >
> > > select [list of columns] from baskets b
> > > join basket_items bi on b.basket_=bi.basket_id
> > > join items i on i.item_id=bi.item_id
> > > join item_attributes a a.item_id=i.item_id
> > > where b.basket_id=$1
> > >
> > > It would be nice if I could avoid to execute this query over and
> > > over. I'd have to find a way to pass this data across functions.
> >
> > You could pass arrays containing the selected rows between
> > functions. Something like:
> >
> > CREATE TYPE basket_row AS(id integer, name text, count
> > integer, ...); CREATE FUNCTION sell (items basket_row[]) RETURNS
> > boolean LANGUAGE plpgsql AS $$........$$;
>
>
> It doesn't look as I can do the same stuff with array and
> tables/records.
>
> Many times I use joint or aggregates on the basket.

Sorry, my example was unclear.

The Type I declare should not hold one row from a single table, but a
result row from the 4 table join you wrote above.

You said that you have to do the same select over and over in each
function, and my idea is to execute the query only once and store the
results in an array.

That should work just fine.

> Is there any good tutorial/example on how to use cursors or temp
> tables in such circumstance?

Don't know about temporary tables, but the docs have enough about
"refcursor":
http://www.postgresql.org/docs/current/static/plpgsql-cursors.html

A cursor is actually a good idea for this, I hadn't thought of it.
You'd need scrollable cursors though, and these did not exist in PL/pgSQL
before version 8.3.

You could just pass a variable of type refcursor between your functions
and each function could do a
MOVE FIRST FROM curs;
FOR var IN FETCH ALL FROM curs LOOP
   ....
END LOOP;

Does that help?

Yours,
Laurenz Albe

Re: HOWTO caching data across function calls: temporary tables, cursor?

From
Ivan Sergio Borgonovo
Date:
On Tue, 1 Apr 2008 14:00:39 +0200
"Albe Laurenz" <laurenz.albe@wien.gv.at> wrote:

> Ivan Sergio Borgonovo wrote:

> > It doesn't look as I can do the same stuff with array and
> > tables/records.

> > Many times I use joint or aggregates on the basket.

> Sorry, my example was unclear.

I was the first not to be clear ;)

> The Type I declare should not hold one row from a single table, but
> a result row from the 4 table join you wrote above.

yes... but it is not just a matter of caching the data but rather
being able to exploit them with SQL.

> You said that you have to do the same select over and over in each
> function, and my idea is to execute the query only once and store
> the results in an array.

I've to deal with a very small subset of a larger one over and over.
I think picking up the attributes of all items in a basket and use
aggregates, join on that small subset but having to refer to the
larger set has its cost.

I'd like to pay this cost just one time. Actually I even don't know
if this approach is worth, but still I won't know if it works unless
I'll have a working implementation.

I could even retrieve a larger set of columns so that I could use the
same slice for many purposes. After all I think it should be faster
to retrieve a larger set of columns one time rather than retrieving
the same set divided into multiple requests.
This way the query will actually be the same and it should be easier
to cache the result.


> That should work just fine.
>
> > Is there any good tutorial/example on how to use cursors or temp
> > tables in such circumstance?

> Don't know about temporary tables, but the docs have enough about
> "refcursor":
> http://www.postgresql.org/docs/current/static/plpgsql-cursors.html

> A cursor is actually a good idea for this, I hadn't thought of it.
> You'd need scrollable cursors though, and these did not exist in
> PL/pgSQL before version 8.3.

> You could just pass a variable of type refcursor between your
> functions and each function could do a
> MOVE FIRST FROM curs;
> FOR var IN FETCH ALL FROM curs LOOP
>    ....
> END LOOP;

> Does that help?

Well it does look as if I could refine a search in a cursor.

A possible way could be to encapsulate the temp table in a function,
but still I'd like to find a tutorial/howto etc... that will point
out the troubles I'm going to meet with uniqueness of the name,
visibility, garbage collection...

Suppose I've a function that return a setof record

I could use that function inside other function as in

select into [vars,...] [cols,...] from AFunction(...) where cols1<7...

Now AFuncion(...) is going to retrieve over and over the same record
set inside a transaction.

AFuncion will be used inside several other functions.

It would be nice if I could cache the result of AFunction.
For me it's not clear if adding some additional caching system (eg.
create a temp table inside the function) would obtain the same result
as marking the function STABLE.

thx

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Re: HOWTO caching data across function calls: temporary tables, cursor?

From
"Albe Laurenz"
Date:
Ivan Sergio Borgonovo wrote:
[wants to cache query results in a temporary table for use in
several functions]

> yes... but it is not just a matter of caching the data but rather
> being able to exploit them with SQL.

Oh, I see, you want to select/join with the cached data.

Then neither arrays nor cursors can help; you need a temporary table.

> A possible way could be to encapsulate the temp table in a function,
> but still I'd like to find a tutorial/howto etc... that will point
> out the troubles I'm going to meet with uniqueness of the name,
> visibility, garbage collection...

You can
CREATE TEMPORARY TABLE basket_123 [ON COMMIT DROP] AS SELECT ....
If you do not include ON COMMIT DROP, the table will be dropped at the end
of your database session.

To have more than one table and have a unique name, you can
use a sequence to construct the table name. Tha means you will have
to use dynamic SQL.

> Suppose I've a function that return a setof record
>
> I could use that function inside other function as in
>
> select into [vars,...] [cols,...] from AFunction(...) where cols1<7...
>
> Now AFuncion(...) is going to retrieve over and over the same record
> set inside a transaction.
>
> AFuncion will be used inside several other functions.
>
> It would be nice if I could cache the result of AFunction.

Since you are looking for a sample, maybe something like that can get you started:

CREATE SEQUENCE temp_names;

CREATE FUNCTION mkcache(param1 integer, param2 text) RETURNS name
   LANGUAGE plpgsql VOLATILE STRICT AS
$$DECLARE
   i integer;
BEGIN
   SELECT nextval('temp_names') INTO i;
   EXECUTE 'CREATE TEMPORARY TABLE basket_' || i || '(col1, col2, ...) AS SELECT .....';
   RETURN 'basket_' || i;
END;$$;

CREATE FUNCTION getcache(tabname IN name, col1 OUT integer, col2 OUT text, ...) RETURNS SETOF RECORD
   LANGUAGE plpgsql IMMUTABLE STRICT AS
$$BEGIN
   FOR col1, col2, ... IN EXECUTE 'SELECT col1, col2, ... FROM ' || tabname LOOP
      RETURN NEXT;
   END LOOP;
   RETURN;
END;$$;

Sample use:

SELECT mkcache(1, 'test');
 mkcache
----------
 basket_1
(1 row)

SELECT * FROM getcache('basket_1');
 col1 | col2  | ...
------+-------+-----
   27 | item1 | ...
....
(n rows)

> For me it's not clear if adding some additional caching system (eg.
> create a temp table inside the function) would obtain the same result
> as marking the function STABLE.

No, STABLE doen't help here - that is only a hint for the optimizer.

Yours,
Laurenz Albe

On Tue, 1 Apr 2008 16:08:45 +0200
"Albe Laurenz" <laurenz.albe@wien.gv.at> wrote:

[snip]

> Since you are looking for a sample, maybe something like that can
> get you started:
>
> CREATE SEQUENCE temp_names;

[snip]

I was thinking something in the line of it.
I wasn't that sure of the course to follow.
Just few hours ago it looked like a swamp. Thanks, now I've more
confidence it is a feasible plan.

> > For me it's not clear if adding some additional caching system
> > (eg. create a temp table inside the function) would obtain the
> > same result as marking the function STABLE.

> No, STABLE doen't help here - that is only a hint for the optimizer.

I can't really appreciate the difference... or better... I think the
difference may be that I can't take for granted the function will be
cached if I delegate the choice to the optimiser.

I'd say the optimiser will try to cache it unless it finds something
better to do with the memory.
If that's what happens I think I'll avoid to complicate my life with
a manual cache management and just wrap the most general query in a
function.

Can anybody confirm that's how the optimiser work or explain the
differences between providing a "manual" cache and just declaring a
function STABLE?

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
> I can't really appreciate the difference... or better... I think the
> difference may be that I can't take for granted the function will be
> cached if I delegate the choice to the optimiser.

You can take for granted that it won't be, because there is no function
cache in Postgres.

            regards, tom lane

On Tue, 01 Apr 2008 11:22:20 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
> > I can't really appreciate the difference... or better... I think
> > the difference may be that I can't take for granted the function
> > will be cached if I delegate the choice to the optimiser.

> You can take for granted that it won't be, because there is no
> function cache in Postgres.

Would you please be so kind to rephrase:

http://www.postgresql.org/docs/8.1/interactive/xfunc-volatility.html

"A STABLE function cannot modify the database and is guaranteed to
return the same results given the same arguments for all rows within
a single statement. This category allows the optimizer to optimize
multiple calls of the function to a single call. In particular, it is
safe to use an expression containing such a function in an index scan
condition. (Since an index scan will evaluate the comparison value
only once, not once at each row, it is not valid to use a VOLATILE
function in an index scan condition.)"

I can't understand how it can call a function a single time and avoid
to cache the result.
Is it limited to a single statement?

eg.
create or replace function(...) as
$$
begin

  select into bau cetti from stablefunc(sameparam);

  select into bingo t.bongo from stablefunc(sameparam) as s
    join sometable t on s.cetti=t.cetti;

...

will call stablefunc 2 times?


I do appreciate the difference between:

create or replace function(...) as
$$
begin

  select into bau cetti from stablefunc(sameparam);

  insert into sometable...

  select into bingo t.bongo from stablefunc(sameparam) as s
    join sometable t on s.cetti=t.cetti;

since stablefunc may depend on sometable and it is going to see the
change.

So... then any pointer to some places where I could learn some
caching techniques if STABLE doesn't do the trick?

thanks


--
Ivan Sergio Borgonovo
http://www.webthatworks.it


On Tue, Apr 01, 2008 at 06:06:35PM +0200, Ivan Sergio Borgonovo wrote:
> Would you please be so kind to rephrase:
>
> http://www.postgresql.org/docs/8.1/interactive/xfunc-volatility.html

<snip>

> I can't understand how it can call a function a single time and avoid
> to cache the result.
> Is it limited to a single statement?

Yes, it's limited to a single statememnt. Let's say hypothetically
postgres could optimise:

select f() + f();

to

select 2*f();

It would have to optimised so it only calls the function once yet at no
time does it "cache" the result in the usual sense.

Does this helps,
>   select into bau cetti from stablefunc(sameparam);
>
>   select into bingo t.bongo from stablefunc(sameparam) as s
>     join sometable t on s.cetti=t.cetti;
>
> will call stablefunc 2 times?

Yes.

> So... then any pointer to some places where I could learn some
> caching techniques if STABLE doesn't do the trick?

Anything persistnat usually needs to be in a table. If it's a really
small amount you could use the global namespace in pl/perl or similar in
other languages.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Attachment

still on techniques to cache table slices was: optimiser STABLE vs. temp table

From
Ivan Sergio Borgonovo
Date:
On Tue, 1 Apr 2008 18:32:25 +0200
Martijn van Oosterhout <kleptog@svana.org> wrote:

> On Tue, Apr 01, 2008 at 06:06:35PM +0200, Ivan Sergio Borgonovo
> wrote:
> > Would you please be so kind to rephrase:
> >
> > http://www.postgresql.org/docs/8.1/interactive/xfunc-volatility.html

> <snip>

> > I can't understand how it can call a function a single time and
> > avoid to cache the result.
> > Is it limited to a single statement?
>
> Yes, it's limited to a single statememnt. Let's say hypothetically
> postgres could optimise:

It be nice if it was a bit clearer in the docs.
Someone may guess it from the fact that between 2 statements there
could be something that change the result of the function... but well
on the other side people may hope the optimiser is smarter than what
it really is and still can call a function just when actually needed
and discern between:

select ... stablefunc()
select ... stablefunc()

and

select ... stablefunc()
insert somestuff
select ... stablefunc()

That would make caching stuff definitively easier...
IMMUTABLE is too much but extending the "cacheability" of functions a
little bit further with some other attribute would make things much
easier.

> Does this helps,

Definitively. thanks.

> Anything persistnat usually needs to be in a table. If it's a really
> small amount you could use the global namespace in pl/perl or
> similar in other languages.

It does look as it is "semi-persistent"... so temp tables may
actually do the trick.

Albe Laurenz's example was partially comforting. Could somebody point
me to some other technique or a more in depth discussion or whatever
that will help me to learn a bit more about this issue and available
techniques?

I'm wondering about the visibility of temp tables defined in a
function.
From my understanding pg behaviour is the one I'm looking for.
I could use the same temp table name across different sessions
without the trouble of clashes... but well this behaviour is not
standard and I'm worried it will bite me once pg will follow the
standard.

Using the same name would make possible to do something like:

if(table doesn't exist)
  create table
end if
return rows

but this looks like it is going to be harder than expected since temp
table aren't easy to "find".
I'd have to resort to catching exceptions etc...

If I use Albe Laurenz's technique I'll have to do some bookkeeping to
store and pass the temp table name across functions.


thanks


--
Ivan Sergio Borgonovo
http://www.webthatworks.it


Ivan Sergio Borgonovo wrote:
> http://www.postgresql.org/docs/8.1/interactive/xfunc-volatility.html
>
> "A STABLE function cannot modify the database and is guaranteed to
> return the same results given the same arguments for all rows within
> a single statement. This category allows the optimizer to optimize
> multiple calls of the function to a single call. In particular, it is
> safe to use an expression containing such a function in an index scan
> condition. (Since an index scan will evaluate the comparison value
> only once, not once at each row, it is not valid to use a VOLATILE
> function in an index scan condition.)"
>
> I can't understand how it can call a function a single time and avoid
> to cache the result.
> Is it limited to a single statement?

You did not notice the sentence *before* that list:

"The volatility category is a promise to the optimizer about
 the behavior of the function"

What is meant is this:

You must not define a function STABLE unless you are certain that it
will always have the same result if called multiple times with the
same arguments within a single query.

The function itself will not behave any differently if you define
it IMMUTABLE or if you define it VOLATILE. It's up to the function definer
to choose the right setting.

If you do it wrong, weird things may happen.

As an example, if you define:

CREATE FUNCTION random_nr() RETURNS double precision
   IMMUTABLE LANGUAGE sql AS 'SELECT random()';

and you try to select 10 random numbers like:

SELECT random_nr() FROM generate_series(1, 10);

you will get the same number 10 times, because the optimizer will
cause the function to be called only once. If you define the function
as VOLATILE, it will be called ten times because the optimizer knows
that it cannot reuse the first result again.

On the other hand, if you

CREATE OR REPLACE FUNCTION to_upcase(text) RETURNS text
   VOLATILE STRICT LANGUAGE plpgsql AS
  'BEGIN RETURN upper($1); END;';

and you define

CREATE TABLE tab (id integer PRIMARY KEY, val text UNIQUE);
INSERT INTO tab VALUES (1, 'one'), (2, 'two'), (3, 'three'), (4, 'four');

then look at the following EXPLAIN output:

EXPLAIN SELECT id FROM tab WHERE val = to_upcase('three');
                     QUERY PLAN
-----------------------------------------------------
 Seq Scan on tab  (cost=0.00..332.88 rows=1 width=4)
   Filter: (val = to_upcase('three'::text))
(2 rows)

If you had defined the function as IMMUTABLE, the optimizer would know
that it is safe to use the function in an index scan, because it won't
change value:

EXPLAIN SELECT id FROM tab WHERE val = to_upcase('three');
                              QUERY PLAN
-----------------------------------------------------------------------
 Index Scan using tab_val_key on tab  (cost=0.00..8.27 rows=1 width=4)
   Index Cond: (val = 'THREE'::text)
(2 rows)

Yours,
Laurenz Albe