Thread: Trigger function, bad performance

Trigger function, bad performance

From
"Rogatzki Rainer"
Date:
Hello,

I'm having problems with the following bad performing select-statement
in a trigger-function (on update before):

  ...
  for c in
    select id_country, sum(cost) as sum_cost
    from costs
    where id_user = p_id_user
    and id_state = 1
    and date(request) between p_begin and p_until
    group by id_country;
  loop
    ...
  end loop;
  ...

Explain shows that the following existing partial index isn't used:

  CREATE INDEX ix_costs_user_state_date_0701
  ON costs
  USING btree(id_user, id_state, date(request))
  WHERE id_state = 1 AND date(request) >= '2007-01-01'::date AND
date(request) <= '2007-01-31'::date;


The funny thing is, that while executing the statement with type-casted
string-literals the index is used as expected:

  ...
  for c in
    select id_country, sum(cost) as sum_cost
    from costs
    where id_user = p_id_user
    and id_state = 1
    and date(request) between '2007-01-01'::date AND '2007-01-31'::date
    group by id_country;
  loop
    ...
  end loop;
  ...

Any ideas?


Best regards

Rainer Rogatzki (mailto:rainer.rogatzki@ggrz-hagen.nrw.de)

Re: Trigger function, bad performance

From
"Albe Laurenz"
Date:
Rogatzki Rainer wrote:
> I'm having problems with the following bad performing select-statement
> in a trigger-function (on update before):
>
>   ...
>   for c in
>     select id_country, sum(cost) as sum_cost
>     from costs
>     where id_user = p_id_user
>     and id_state = 1
>     and date(request) between p_begin and p_until
>     group by id_country;
>   loop
>     ...
>   end loop;
>   ...
>
> Explain shows that the following existing partial index isn't used:
>
>   CREATE INDEX ix_costs_user_state_date_0701
>   ON costs
>   USING btree(id_user, id_state, date(request))
>   WHERE id_state = 1 AND date(request) >= '2007-01-01'::date AND
> date(request) <= '2007-01-31'::date;
>
>
> The funny thing is, that while executing the statement with
> type-casted
> string-literals the index is used as expected:
>
>   ...
>   for c in
>     select id_country, sum(cost) as sum_cost
>     from costs
>     where id_user = p_id_user
>     and id_state = 1
>     and date(request) between '2007-01-01'::date AND
> '2007-01-31'::date
>     group by id_country;
>   loop
>     ...
>   end loop;
>   ...
>
> Any ideas?

The problem is that "p_begin" and "p_until" are variables. Consequently PostgreSQL,
when the function is run the first time, will prepare this statement:

    select id_country, sum(cost) as sum_cost
    from costs
    where id_user = $1
    and id_state = 1
    and date(request) between $2 and $3
    group by id_country;

That prepared statement will be reused for subsequent invocations of the trigger
function, whiere the parameters will probably have different values.

So it cannot use the partial index.

If you want the index to be used, don't include "date(request)" in the WHERE clause.

Yours,
Laurenz Albe

Re: Trigger function, bad performance

From
"Rogatzki Rainer"
Date:
> Rogatzki Rainer wrote:
> > I'm having problems with the following bad performing
select-statement
> > in a trigger-function (on update before):
> >
> >   ...
> >   for c in
> >     select id_country, sum(cost) as sum_cost
> >     from costs
> >     where id_user = p_id_user
> >     and id_state = 1
> >     and date(request) between p_begin and p_until
> >     group by id_country;
> >   loop
> >     ...
> >   end loop;
> >   ...
> >
> > Explain shows that the following existing partial index isn't used:
> >
> >   CREATE INDEX ix_costs_user_state_date_0701
> >   ON costs
> >   USING btree(id_user, id_state, date(request))
> >   WHERE id_state = 1 AND date(request) >= '2007-01-01'::date AND
> > date(request) <= '2007-01-31'::date;
> >
> >
> > The funny thing is, that while executing the statement with
> > type-casted string-literals the index is used as expected:
> >
> >   ...
> >   for c in
> >     select id_country, sum(cost) as sum_cost
> >     from costs
> >     where id_user = p_id_user
> >     and id_state = 1
> >     and date(request) between '2007-01-01'::date AND
> > '2007-01-31'::date
> >     group by id_country;
> >   loop
> >     ...
> >   end loop;
> >   ...
> >
> > Any ideas?

Albe Laurenz wrote:
> The problem is that "p_begin" and "p_until" are variables.
Consequently PostgreSQL, when the function is run the first time, will
prepare this statement:
>
>     select id_country, sum(cost) as sum_cost
>     from costs
>     where id_user = $1
>     and id_state = 1
>     and date(request) between $2 and $3
>     group by id_country;
>
> That prepared statement will be reused for subsequent invocations of
the trigger function, whiere the parameters will probably have different
values.
>
> So it cannot use the partial index.
>
> If you want the index to be used, don't include "date(request)" in the
WHERE clause.
>
> Yours,
> Laurenz Albe

Hello Laurenz,

thank you for your analysis!

Unfortunately your proposal is no option for me, since I do have to
include the WHERE clause in both index and procedure.

By the way: The parameters in the WHERE clause (p_begin,p_until) come
from a RECORD which is filled before via SELECT INTO with begin and end
tablefields of the type date like the following:

  my_record RECORD;
  ...
  select into my_record p_begin, p_until
  from accounting_interval
  where id = 1;

I omitted this information in my first posting to make it easier to
read.

In fact I extracted the bad performing statement to let pgadmin explain
and the same effect shows:

  -- Example with bad performance since index isn't used
  explain
  select c.id_country, sum(c.cost) as sum_cost
  from costs c, accounting_interval a
  where c.id_user = 123
  and c.id_state = 1
  and a.id = 1
  and date(c.request) between a.p_begin and a.p_until
  group by id_country;

  -- Example with invoked index (100 times better performance)
  explain
  select c.id_country, sum(c.cost) as sum_cost
  from costs c
  where c.id_user = 123
  and c.id_state = 1
  and date(c.request) between '2007-01-01'::date and '2007-01-31'::date
  group by id_country;

Here I cannot see why statement preparation has an effect at all.

Apart from this I don't really understand why statement preparation
combined with parameters in functions prevent index invocation.
Especially since p_id_user is a parameter as well which doesn't prevent
the usage of another existing index on costs.id_user and costs.id_state.


So you see me still clueless :O)


Best regards
Rainer Rogatzki

Re: Trigger function, bad performance

From
"A. Kretschmer"
Date:
am  Fri, dem 05.12.2008, um 11:41:11 +0100 mailte Rogatzki Rainer folgendes:
>   -- Example with bad performance since index isn't used
>   explain
>   select c.id_country, sum(c.cost) as sum_cost
>   from costs c, accounting_interval a
>   where c.id_user = 123
>   and c.id_state = 1
>   and a.id = 1
>   and date(c.request) between a.p_begin and a.p_until
>   group by id_country;
>
>   -- Example with invoked index (100 times better performance)
>   explain
>   select c.id_country, sum(c.cost) as sum_cost
>   from costs c
>   where c.id_user = 123
>   and c.id_state = 1
>   and date(c.request) between '2007-01-01'::date and '2007-01-31'::date
>   group by id_country;
>
> Here I cannot see why statement preparation has an effect at all.

The planner don't know the parameters at compile-time. Because of this
fakt, the planner choose a other plan (a seq-scan).

You can try to use execute 'your query'. In this case the planner
investigate a new plan, and (maybe) with the index.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: Trigger function, bad performance

From
"Rogatzki Rainer"
Date:
> > Rogatzki wrote:
> >   -- Example with bad performance since index isn't used
> >   explain
> >   select c.id_country, sum(c.cost) as sum_cost
> >   from costs c, accounting_interval a
> >   where c.id_user = 123
> >   and c.id_state = 1
> >   and a.id = 1
> >   and date(c.request) between a.p_begin and a.p_until
> >   group by id_country;
> >
> >   -- Example with invoked index (100 times better performance)
> >   explain
> >   select c.id_country, sum(c.cost) as sum_cost
> >   from costs c
> >   where c.id_user = 123
> >   and c.id_state = 1
> >   and date(c.request) between '2007-01-01'::date and
'2007-01-31'::date
> >   group by id_country;
> >
> > Here I cannot see why statement preparation has an effect at all.
>
> Andreas Kretschmer wrote:
> The planner don't know the parameters at compile-time. Because of this
fakt, the planner choose a other plan (a seq-scan).
>
> You can try to use execute 'your query'. In this case the planner
investigate a new plan, and (maybe) with the index.
>
>
> Andreas
> --
> Andreas Kretschmer
> Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
> GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net
Hello Andreas,

your hint did the trick - thank you very much!

After using "execute 'my query'" the index is used as expected.

Though I still wonder why the poor performance occurred since november,
without having done any relevant changes to neither postgres nor db
model (including index, procedures, ...) as far as I know.

Anyway - I'm deeply content with your solution.


Best regards

Rainer Rogatzki (mailto:rainer.rogatzki@ggrz-hagen.nrw.de)

Re: Trigger function, bad performance

From
"A. Kretschmer"
Date:
am  Fri, dem 05.12.2008, um 14:23:33 +0100 mailte Rogatzki Rainer folgendes:
> Hello Andreas,
>
> your hint did the trick - thank you very much!

glad to help you...


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: Trigger function, bad performance

From
"Albe Laurenz"
Date:
Rogatzki Rainer wrote:
> > > in a trigger-function (on update before):
> > >
> > >   ...
> > >   for c in
> > >     select id_country, sum(cost) as sum_cost
> > >     from costs
> > >     where id_user = p_id_user
> > >     and id_state = 1
> > >     and date(request) between p_begin and p_until
> > >     group by id_country;
> > >   loop
> > >     ...
> > >   end loop;
> > >   ...
> > >
> > > Explain shows that the following existing partial index isn't used:
> > >
> > >   CREATE INDEX ix_costs_user_state_date_0701
> > >   ON costs
> > >   USING btree(id_user, id_state, date(request))
> > >   WHERE id_state = 1 AND date(request) >= '2007-01-01'::date AND
> > > date(request) <= '2007-01-31'::date;
> > >
> >
> > The problem is that "p_begin" and "p_until" are variables.
> >
> > So it cannot use the partial index.
> >
> > If you want the index to be used, don't include "date(request)" in the
> > WHERE clause.
>
> Unfortunately your proposal is no option for me, since I do have to
> include the WHERE clause in both index and procedure.

You have been ordered to use a partial index?

> Apart from this I don't really understand why statement preparation
> combined with parameters in functions prevent index invocation.
> Especially since p_id_user is a parameter as well which doesn't prevent
> the usage of another existing index on costs.id_user and costs.id_state.

The connection with parameters is by chance.

The main thing is that both "p_begin" and "p_until" are variables.

Andreas Kretschmer gave you the advice you'll want: use dynamic SQL.

Yours,
Laurenz Albe

Re: Trigger function, bad performance

From
"Rogatzki Rainer"
Date:
Laurenz Albe wrote:
> You have been ordered to use a partial index?
>
> > Apart from this I don't really understand why statement preparation
> > combined with parameters in functions prevent index invocation.
> > Especially since p_id_user is a parameter as well which doesn't
> > prevent the usage of another existing index on costs.id_user and
costs.id_state.
>
> The connection with parameters is by chance.
>
> The main thing is that both "p_begin" and "p_until" are variables.
>
> Andreas Kretschmer gave you the advice you'll want: use dynamic SQL.
>
> Yours,
> Laurenz Albe
Well it was no 'order' to use a partial index but it was necessary,
since the table is vastly filled with log-entries that have to persist
over 4 years for documentation. Since old entries are accessed and
changed less often we decided to introduce partial indexes for a better
performance. This prooved to speed up especially frequently used monthly
reports.

Now I understand (thanks to Andreas Kretschmer and you) that at
execution planning time postgres cannot decide what partial index to use
for following procedure calls and thus doesn't invoke it.

After implementing Andreas' proposal (execute 'my statement') the
expected index is used and performance is 125 time better :O))


Best regards
Rainer Rogatzki