Thread: Re: [BUGS] BUG #6325: Useless Index updates

Re: [BUGS] BUG #6325: Useless Index updates

From
Daniel Migowski
Date:
Continuing this talk on general, as requested by Craig.

I have a functional Index on a table that is relative expensive to calculate. Now I noticed on every update of even
index-unrelatedfields of the table the index function is calculated again and again.  

I currenly understand that if the update moves the row to a new location (no HOT replacement), the key to the index has
tobe calculated from the old and the new row to update the index. 

This is expensive in my case, and useless, if the input to the immutable index function has not changed in my update
statement,and as such the calculation should always be done just once. In case of HOT replacement, it hasn't to be done
atall. 

I assume, that comparing values to each other is in most times cheaper than calling a function have these values as
parameters.If there is a high cost on the function (>1000?), it would be a good thing to always check if the inputs to
thefunction have changed, before calling this function once or twice. Since I have a lot of functional indexes, I would
greatlyprofit from an improvement in this area. 

Is anyone interested in implementing this?

Regards,
Daniel Migowski
________________________________  ________
Von: Craig Ringer [ringerc@ringerc.id.au]
Gesendet: Sonntag, 4. Dezember 2011 15:02
Bis: Daniel Migowski
Cc: pgsql-bugs@postgresql.org
Betreff: Re: [BUGS] BUG #6325: Useless Index updates

On 12/04/2011 08:54 PM, dmigowski@ikoffice.de wrote:
> The following bug has been logged on the website:
>
> Bug reference:      6325
> Logged by:          Daniel Migowski
> Email address:      dmigowski@ikoffice.de
> PostgreSQL version: 8.3.16
> Operating system:   Linux
> Description:
>
> It seems that an update to a row in a table always removes the element from
> an index and adds it again. Wouldn't it be faster to check for equality of
> the index parameters in the OLD and NEW record first?

- This isn't a bug report, it's a feature/enhancement request. Please
   use the mailing lists.

- You're reporting this issue against an old patch release of an old
   major release. Why not check with 9.1?

- The index isn't always updated. Check out HOT (introduced in 8.4, the
   release after your current one) which reduces unnecessary index
   updates in cases where the old and new row can fit on the same
   heap page.

- In most other cases the index update can't be avoided, because
   the new and old rows are on different database pages. The old index
   entry has to remain in place so that still-running transactions that
   can see the old row can still find it in the index, so it can't be
   overwritten and instead a new entry has to be added.

> I have this problem with an functional index using a relative expensive
> index function, and noticed that the index function is always called even if
> the parameter to the index function has not changed. Wouldn't it be better
> to validate that the input to the index functions has not changed, instead
> of calling the index function over and over again? Especially since the
> index functions seems to be called with the new and the old value anyway.

That's a more interesting one. Perhaps you could write it up in more
detail, with a test case, and submit it to the pgsql-general mailing list?

This isn't just about functions anyway. Pg would have to compare *all*
inputs to the old index expression to see if they were the same.
Otherwise, in an expression like  f(g(x,y),z)  Pg would not have any
stored value for the result of g(x,y) to compare against. It'd have to
instead compare (x1,y1,z1) to (x2,y2,z2) and decide that if they were
the same the result of the index expression hadn't changed.

That's probably possible, but I'm not sure it'd be a win over just
evaluating the expression in most cases. How would Pg know when to do
it? Using function COST parameters?

Essentially, this isn't as simple as it looks at face value.

> I can understand that this might be a precaution in the case that the index
> function isn't stable (is it even possible to use such a function for an
> index?)

No, it isn't possible. Index functions must be immutable, not just
stable, so their output must be determined entirely by their parameters.
At least on newer versions STABLE or VOLATILE functions should be
rejected in index expressions.

--
Craig Ringer

Re: [BUGS] BUG #6325: Useless Index updates

From
Gregg Jaskiewicz
Date:
btw, HOT was introduced in 8.3.


On 6 December 2011 14:51, Daniel Migowski <dmigowski@ikoffice.de> wrote:
>
> Continuing this talk on general, as requested by Craig.
>
> I have a functional Index on a table that is relative expensive to calculate. Now I noticed on every update of even
index-unrelatedfields of the table the index function is calculated again and again. 
>
> I currenly understand that if the update moves the row to a new location (no HOT replacement), the key to the index
hasto be calculated from the old and the new row to update the index. 
>
> This is expensive in my case, and useless, if the input to the immutable index function has not changed in my update
statement,and as such the calculation should always be done just once. In case of HOT replacement, it hasn't to be done
atall. 
>
> I assume, that comparing values to each other is in most times cheaper than calling a function have these values as
parameters.If there is a high cost on the function (>1000?), it would be a good thing to always check if the inputs to
thefunction have changed, before calling this function once or twice. Since I have a lot of functional indexes, I would
greatlyprofit from an improvement in this area. 
>
> Is anyone interested in implementing this?
>
> Regards,
> Daniel Migowski
> ________________________________  ________
> Von: Craig Ringer [ringerc@ringerc.id.au]
> Gesendet: Sonntag, 4. Dezember 2011 15:02
> Bis: Daniel Migowski
> Cc: pgsql-bugs@postgresql.org
> Betreff: Re: [BUGS] BUG #6325: Useless Index updates
>
> On 12/04/2011 08:54 PM, dmigowski@ikoffice.de wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference:      6325
>> Logged by:          Daniel Migowski
>> Email address:      dmigowski@ikoffice.de
>> PostgreSQL version: 8.3.16
>> Operating system:   Linux
>> Description:
>>
>> It seems that an update to a row in a table always removes the element from
>> an index and adds it again. Wouldn't it be faster to check for equality of
>> the index parameters in the OLD and NEW record first?
>
> - This isn't a bug report, it's a feature/enhancement request. Please
>   use the mailing lists.
>
> - You're reporting this issue against an old patch release of an old
>   major release. Why not check with 9.1?
>
> - The index isn't always updated. Check out HOT (introduced in 8.4, the
>   release after your current one) which reduces unnecessary index
>   updates in cases where the old and new row can fit on the same
>   heap page.
>
> - In most other cases the index update can't be avoided, because
>   the new and old rows are on different database pages. The old index
>   entry has to remain in place so that still-running transactions that
>   can see the old row can still find it in the index, so it can't be
>   overwritten and instead a new entry has to be added.
>
>> I have this problem with an functional index using a relative expensive
>> index function, and noticed that the index function is always called even if
>> the parameter to the index function has not changed. Wouldn't it be better
>> to validate that the input to the index functions has not changed, instead
>> of calling the index function over and over again? Especially since the
>> index functions seems to be called with the new and the old value anyway.
>
> That's a more interesting one. Perhaps you could write it up in more
> detail, with a test case, and submit it to the pgsql-general mailing list?
>
> This isn't just about functions anyway. Pg would have to compare *all*
> inputs to the old index expression to see if they were the same.
> Otherwise, in an expression like  f(g(x,y),z)  Pg would not have any
> stored value for the result of g(x,y) to compare against. It'd have to
> instead compare (x1,y1,z1) to (x2,y2,z2) and decide that if they were
> the same the result of the index expression hadn't changed.
>
> That's probably possible, but I'm not sure it'd be a win over just
> evaluating the expression in most cases. How would Pg know when to do
> it? Using function COST parameters?
>
> Essentially, this isn't as simple as it looks at face value.
>
>> I can understand that this might be a precaution in the case that the index
>> function isn't stable (is it even possible to use such a function for an
>> index?)
>
> No, it isn't possible. Index functions must be immutable, not just
> stable, so their output must be determined entirely by their parameters.
> At least on newer versions STABLE or VOLATILE functions should be
> rejected in index expressions.
>
> --
> Craig Ringer
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



--
GJ

Re: [BUGS] BUG #6325: Useless Index updates

From
Craig Ringer
Date:
On 06/12/11 22:51, Daniel Migowski wrote:
> Continuing this talk on general, as requested by Craig.
>
> I have a functional Index on a table that is relative expensive to calculate. Now I noticed on every update of even
index-unrelatedfields of the table the index function is calculated again and again.  

I thought I'd test this out, so I put together a quick test case. It's
clear that HOT is doing its job, because it's not easy to reproduce your
issue in a trivial dummy table. I had to fill up the table so there was
no free space on most of the pages before I could force repeat
evaluation of the expensive function. Anyway, here's a test-case:


DROP TABLE IF EXISTS funcindex;

CREATE OR REPLACE FUNCTION expensive(integer) RETURNS integer AS $$
BEGIN
  RAISE NOTICE 'Sleeping for 1s in expensive(%)',$1;
  PERFORM pg_sleep(1);
  RETURN $1 / 2;
END;
$$ LANGUAGE 'plpgsql' STRICT IMMUTABLE;

CREATE TABLE funcindex (
  id SERIAL PRIMARY KEY,
  somenumber integer not null,
  nonindexed integer not null
);

CREATE INDEX funcindex_somenumber_expensive_idx
ON funcindex( (expensive(somenumber)) );

-- Sleeps 3 times, once per expensive() invocation
INSERT INTO funcindex(somenumber, nonindexed) VALUES (1,1), (2,2), (3,3);

-- Uses index, avoiding evaluating expensive()
SELECT * from funcindex where expensive(somenumber) = 1;

-- Avoids evaluating expensive(): no changes
UPDATE funcindex SET somenumber = somenumber;

-- Avoids evaluating expensive(): no change to indexed col
UPDATE funcindex SET nonindexed = nonindexed + 1;

-- Evaluates expensive(): indexed col changed
UPDATE funcindex SET somenumber = somenumber + 1;

-- Let's put in some proper dummy data to fill the table
-- up and try to defeat HOT:
TRUNCATE TABLE funcindex;

-- Replace the wait while we populate the table
CREATE OR REPLACE FUNCTION expensive(integer) RETURNS integer AS $$
BEGIN
  RETURN $1 / 2;
END;
$$ LANGUAGE 'plpgsql' STRICT IMMUTABLE;

INSERT INTO funcindex(somenumber,nonindexed)
SELECT x.n, x.n*2 FROM (SELECT generate_series(1,10000) AS n) AS x;

-- and make it slow again after
CREATE OR REPLACE FUNCTION expensive(integer) RETURNS integer AS $$
BEGIN
  RAISE NOTICE 'Sleeping for 1s in expensive(%)',$1;
  PERFORM pg_sleep(1);
  RETURN $1 / 2;
END;
$$ LANGUAGE 'plpgsql' STRICT IMMUTABLE;

-- Because there's no free space in the pages to allow HOT to work, this
will
-- basically never end, it'll just sit in endless sleeps.
-- Using a FILLFACTOR makes this complete nearly instantly.
UPDATE funcindex SET somenumber = somenumber+1;




>
> I currenly understand that if the update moves the row to a new location (no HOT replacement), the key to the index
hasto be calculated from the old and the new row to update the index. 
>
> This is expensive in my case, and useless, if the input to the immutable index function has not changed in my update
statement,and as such the calculation should always be done just once. In case of HOT replacement, it hasn't to be done
atall. 

At least on 8.4, when HOT kicks in it appears the evaluation is *not*
done. See test case above.

If HOT can't do its job due to lack of space in the page, then Pg will
evaluate the function, and it would be nice if it didn't have to. It's
not as simple as skipping this for functions, though; Pg would need a
way to determine which fields the result of any expression depended on,
as the slow index expression might be something like:

  ( f(a) * f(g(b)) )

where the result depends on the values of (a) and (b).