Thread: Refreshing functional index

Refreshing functional index

From
Grzegorz Tańczyk
Date:
Hello,

I have a problem with functional index feature in Postgres 8.3

There are two tables, lets call them: PARENTS and CHILDREN(with timestamp column)

I created functional index on parents with function, which selects max value of timestamp from child elements(for given parent_id).

The problem is that plpgsql function, which returns the value is IMMUTABLE and it works like a cache.  When I insert new record to children table, select over parents with function gives wrong(outdated) results.

So far I figured out only one way to flush this "cache". It's by calling REINDEX on my index. I guess I should call it after every insert to children table. It's not good for me since it locks the table.

I'm thinking about partitioning the index by my app, so reindexing will be less painful, but perhaps there is some other easier way to solve tihs problem?

Thanks

--
Regards,
  Grzegorz

Re: Refreshing functional index

From
Merlin Moncure
Date:
On Wed, Aug 29, 2012 at 4:01 PM, Grzegorz Tańczyk <goliatus@polzone.pl> wrote:
> Hello,
>
> I have a problem with functional index feature in Postgres 8.3
>
> There are two tables, lets call them: PARENTS and CHILDREN(with timestamp
> column)
>
> I created functional index on parents with function, which selects max value
> of timestamp from child elements(for given parent_id).
>
> The problem is that plpgsql function, which returns the value is IMMUTABLE
> and it works like a cache.  When I insert new record to children table,
> select over parents with function gives wrong(outdated) results.
>
> So far I figured out only one way to flush this "cache". It's by calling
> REINDEX on my index. I guess I should call it after every insert to children
> table. It's not good for me since it locks the table.
>
> I'm thinking about partitioning the index by my app, so reindexing will be
> less painful, but perhaps there is some other easier way to solve tihs
> problem?

Well, the only reason what you're trying to do works at all is because
the database isn't stricter about double checking to see if your stuff
is IMMUTABLE: it isn't, so of course it doesn't work.

How about a trigger on the child table that updates an indexed column
on parent?

merlin


Re: Refreshing functional index

From
"David Johnston"
Date:
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Grzegorz Tanczyk
Sent: Wednesday, August 29, 2012 5:02 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Refreshing functional index

Hello,

I have a problem with functional index feature in Postgres 8.3

There are two tables, lets call them: PARENTS and CHILDREN(with timestamp column)

I created functional index on parents with function, which selects max value of timestamp from child elements(for given
parent_id). 

The problem is that plpgsql function, which returns the value is IMMUTABLE and it works like a cache.  When I insert
newrecord to children table, select over parents with function gives wrong(outdated) results.  

So far I figured out only one way to flush this "cache". It's by calling REINDEX on my index. I guess I should call it
afterevery insert to children table. It's not good for me since it locks the table.  

I'm thinking about partitioning the index by my app, so reindexing will be less painful, but perhaps there is some
othereasier way to solve tihs problem?  

Thanks

=========================================================================================

This is not really a problem rather the behavior is working as designed and you are trying or expecting it to behave
differently.

First I would see whether I actually need an INDEX for whatever problem you are trying to solve.  A view with an
embeddedsub-query to dynamically determine the max(timestamp) is likely to perform decently if there are not too many
rowsfor each child. 

Any other suggestions require guessing about your data properties but if said timestamp is auto-assigned and thus
alwaysincreasing as new children are added (and children are not deleted or change their timestamp) then adding an
INSERTtrigger on the child table that updates either the parent or some other relation-maintaining table would likely
suffice.

Also, your function is STABLE, not IMMUTABLE.  Saying it is IMMUTABLE in order to create the index doesn't magically
overcomethe very reason that STABLE functions cannot be indexed. 

I would also suggest that using a timestamp is probably not the best decision.  It is possible that two transactions
couldbe started at the same time and thus the resultant timestamps would match as well.  Also are you guaranteed that
thetimes with always come from the same source?  Again, they WHY behind your decision is unknown but storing the PK of
the"most recent child" would make more sense conceptually rather than storing a time and having to back into the child. 

If you go that route upon INSERT you simply "UPDATE parent SET child_id_mostrecent = child_pk".  In the face of
concurrencythe last one to commit remains.  Upon deletion, if allowed, you simply set it to NULL so that there is no
"mostrecent".  Another option in that case would be to store an array and pre-pend each new child PK but also truncate
thearray to a maximum of, say 10 children.  Upon delete you would then just remove any references to the deleted child
fromthe array.  For query purposes the first child is the one that matters (if present since an empty array is still
possible). Updates could work the same way if you want to "refresh" the recentness of the children in that situation. 

Lots of options (mostly trigger on the child oriented) but a functional index is not one of them.

David J.






Re: Refreshing functional index

From
Grzegorz Tańczyk
Date:
Hello,

W dniu 2012-08-29 23:06, Merlin Moncure pisze:
Well, the only reason what you're trying to do works at all is because the database isn't stricter about double checking to see if your stuff is IMMUTABLE: it isn't, so of course it doesn't work. How about a trigger on the child table that updates an indexed column on parent? merlin

According to docs:
"An IMMUTABLE function cannot modify the database and is guaranteed to return the same results given the same arguments forever."

My tables look like this:
CREATE TABLE groups (
  id serial PRIMARY KEY,
  last_item integer REFERENCES items
) WITHOUT OIDS;

CREATE TABLE items (
  id serial PRIMARY KEY,
  group integer NOT NULL REFERENCES groups,
  ts timestamp DEFAULT now()
) WITHOUT OIDS;

The index:
CREATE INDEX groups_last_ts
  ON groups
  USING btree
  (items_ts(last_post));

Plpgsql function items_ts returns timestamp for given item, which will never change(that's my assumption), so in fact according to definition IT IS immutable fuction.

Unfortunately, whenever I update last_item column in groups, I get wrong results, so I query like this:

SELECT * FROM groups WHERE items_ts(last_item) > now() - interval '1 week'

returns "outdated" results

I do realize about other ways for solving this problem, however I would prefer if it worked in the way described above.

Thanks!

--
Regards,
  Grzegorz

Re: Refreshing functional index

From
"David Johnston"
Date:
>>Grzegorz>>
Hello,

W dniu 2012-08-29 23:06, Merlin Moncure pisze:
Well, the only reason what you're trying to do works at all is because the database isn't stricter about double
checkingto see if your stuff is IMMUTABLE: it isn't, so of course it doesn't work. How about a trigger on the child
tablethat updates an indexed column on parent? merlin  

According to docs:
"An IMMUTABLE function cannot modify the database and is guaranteed to return the same results given the same arguments
forever."

My tables look like this:
CREATE TABLE groups (
  id serial PRIMARY KEY,
  last_item integer REFERENCES items
) WITHOUT OIDS;

CREATE TABLE items (
  id serial PRIMARY KEY,
  group integer NOT NULL REFERENCES groups,
  ts timestamp DEFAULT now()
) WITHOUT OIDS;

The index:
CREATE INDEX groups_last_ts
  ON groups
  USING btree
  (items_ts(last_post));

Plpgsql function items_ts returns timestamp for given item, which will never change(that's my assumption), so in fact
accordingto definition IT IS immutable fuction. 

Unfortunately, whenever I update last_item column in groups, I get wrong results, so I query like this:

SELECT * FROM groups WHERE items_ts(last_item) > now() - interval '1 week'

returns "outdated" results

I do realize about other ways for solving this problem, however I would prefer if it worked in the way described above.

>>/Grzegorz>>


From before you wrote:

"When I insert new record to children table, select over parents with function gives wrong(outdated) results."

Which is not the same as what you are describing above.


Furthermore:

"I created functional index on parents with function, which selects max value of timestamp from child elements(for
givenparent_id)." 

Is not the same as:

"..items_ts returns the timestamp for the given item..."


If all "items_ts" did was return the timestamp of the provided child then when you update the "last_item" column on
"groups"(however you would decide to do that) a new index entry would be created that stores the timestamp for the
specifiedchild id.  As long as the child's timestamp doesn't change (or become deleted) then the index will maintain
thecorrect value. 

Given that you are seeing "outdated" results that means you are changing the "items" table without updating the
"groups"table in a corresponding manner but instead are expecting the index function to somehow magically update.  That
isnot how the system works. 

If you want to put forth a self-contained example with descriptions of exactly where you believe there is a problem
thenmaybe we can help you understand better.  As it stands now you have provided two different descriptions of your
situation. The first one seems to be the most accurate and based upon that description the advice you have been given
iscorrect.  The second example is incomplete but could indeed work (given specific assumptions).  The idea you are
suggestingis that you maintain the id of the most recent "item" on the "group" table then use a functional index to
cachethe timestamp of that child.  The question becomes how do you update the "item id" on the "groups" table when you
addnew records to "items". 

David J.







Re: Refreshing functional index

From
Grzegorz Tańczyk
Date:
Hello,

W dniu 2012-08-31 20:25, David Johnston pisze:
> The question becomes how do you update the "item id" on the "groups"
> table when you add new records to "items".

I have a trigger on items table:

CREATE TRIGGER items_insert
   BEFORE INSERT
   ON items
   FOR EACH ROW
   EXECUTE PROCEDURE items_oninsert();

CREATE OR REPLACE FUNCTION items_oninsert()
   RETURNS trigger AS
$BODY$
BEGIN
   UPDATE groups SET last_item = NEW.id WHERE id=NEW.group AND
(last_item IS NULL OR last_item<NEW.id);
    RETURN NEW;
END;
$BODY$
   LANGUAGE 'plpgsql' VOLATILE;

Thanks!

--
Regards,
   Grzegorz



Re: Refreshing functional index

From
Grzegorz Tańczyk
Date:
"BEFORE INSERT" was the problem here, "AFTER INSERT" works fine

Thanks for all answers.

--
Regards,
   Grzegorz


W dniu 2012-08-31 20:41, Grzegorz Tańczyk pisze:
> Hello,
>
> W dniu 2012-08-31 20:25, David Johnston pisze:
>> The question becomes how do you update the "item id" on the "groups"
>> table when you add new records to "items".
>
> I have a trigger on items table:
>
> CREATE TRIGGER items_insert
>   BEFORE INSERT
>   ON items
>   FOR EACH ROW
>   EXECUTE PROCEDURE items_oninsert();
>
> CREATE OR REPLACE FUNCTION items_oninsert()
>   RETURNS trigger AS
> $BODY$
> BEGIN
>   UPDATE groups SET last_item = NEW.id WHERE id=NEW.group AND
> (last_item IS NULL OR last_item<NEW.id);
>    RETURN NEW;
> END;
> $BODY$
>   LANGUAGE 'plpgsql' VOLATILE;
>
> Thanks!
>
> --
> Regards,
>   Grzegorz