Re: unexpected stable function behavior - Mailing list pgsql-performance

From Merlin Moncure
Subject Re: unexpected stable function behavior
Date
Msg-id AANLkTimTR_f98H4+h57iW6-S=BjZaMTSQ6VznRgXU5Ly@mail.gmail.com
Whole thread Raw
In response to unexpected stable function behavior  (Julius Tuskenis <julius@nsoft.lt>)
Responses Re: unexpected stable function behavior  (Julius Tuskenis <julius@nsoft.lt>)
List pgsql-performance
On Thu, Mar 10, 2011 at 10:26 AM, Julius Tuskenis <julius@nsoft.lt> wrote:
> Hello, list
>
> Our company is creating a ticketing system. Of course the performance issues
> are very important to us (as to all of you I guess). To increase speed of
> some queries stable functions are used, but somehow they don't act exactly
> as I expect, so would you please explain what am I doing (or expecting)
> wrong...
>
> First of all I have the stable function witch runs fast and I have no
> problems with it at all.
> CREATE OR REPLACE FUNCTION web_select_extra_price(prm_price_id integer,
> prm_event_id integer, prm_cashier_id integer)
>  RETURNS numeric AS
> '
> ........ some code here
> '
>  LANGUAGE plpgsql STABLE
>  COST 100;
>
> Now the test:
>
> 1) query without using the function
> explain analyze
>  SELECT thtp_tick_id, price_id,
>    price_price,
>    price_color
>  FROM ticket_price
>    JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
>  WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
> current_timestamp <= price_date_till)
>  ORDER BY price_id;
>
> Result:
> "Sort  (cost=132.47..133.77 rows=518 width=25) (actual time=5.125..5.842
> rows=4335 loops=1)"
> "  Sort Key: ticket_price.price_id"
> "  Sort Method:  quicksort  Memory: 433kB"
> "  ->  Nested Loop  (cost=0.00..109.12 rows=518 width=25) (actual
> time=0.037..3.148 rows=4335 loops=1)"
> "        ->  Index Scan using index_price_event_id on ticket_price
>  (cost=0.00..8.52 rows=2 width=21) (actual time=0.014..0.026 rows=7
> loops=1)"
> "              Index Cond: (price_event_id = 7820)"
> "              Filter: ((now() >= price_date) AND (now() <=
> price_date_till))"
> "        ->  Index Scan using idx_thtp_price_id on ticket_has_ticket_price
>  (cost=0.00..47.06 rows=259 width=8) (actual time=0.013..0.211 rows=619
> loops=7)"
> "              Index Cond: (ticket_has_ticket_price.thtp_price_id =
> ticket_price.price_id)"
> "Total runtime: 6.425 ms"
>
>
> 2) Query using the function
> explain analyze
>  SELECT thtp_tick_id, price_id,
>    price_price, web_select_extra_price(price_id, price_event_id, 1),
>    price_color
>  FROM ticket_price
>    JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
>  WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
> current_timestamp <= price_date_till)
>  ORDER BY price_id;
>
> Result:
> "Sort  (cost=261.97..263.27 rows=518 width=29) (actual time=704.224..704.927
> rows=4335 loops=1)"
> "  Sort Key: ticket_price.price_id"
> "  Sort Method:  quicksort  Memory: 433kB"
> "  ->  Nested Loop  (cost=0.00..238.62 rows=518 width=29) (actual
> time=0.272..699.073 rows=4335 loops=1)"
> "        ->  Index Scan using index_price_event_id on ticket_price
>  (cost=0.00..8.52 rows=2 width=25) (actual time=0.011..0.052 rows=7
> loops=1)"
> "              Index Cond: (price_event_id = 7820)"
> "              Filter: ((now() >= price_date) AND (now() <=
> price_date_till))"
> "        ->  Index Scan using idx_thtp_price_id on ticket_has_ticket_price
>  (cost=0.00..47.06 rows=259 width=8) (actual time=0.017..0.582 rows=619
> loops=7)"
> "              Index Cond: (ticket_has_ticket_price.thtp_price_id =
> ticket_price.price_id)"
> "Total runtime: 705.531 ms"
>
>
> Now what you can think is that executing web_select_extra_price takes the
> difference, but
> 3) As STABLE function should be executed once for every different set of
> parameters I do
> SELECT web_select_extra_price(price_id, 7820, 1) FROM (
>
>  SELECT distinct price_id
>  FROM ticket_price
>    JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
>  WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
> current_timestamp <= price_date_till)
>  ) as qq;
>
> Result:
> "Subquery Scan on qq  (cost=110.34..110.88 rows=2 width=4) (actual
> time=7.265..8.907 rows=7 loops=1)"
> "  ->  HashAggregate  (cost=110.34..110.36 rows=2 width=4) (actual
> time=6.866..6.873 rows=7 loops=1)"
> "        ->  Nested Loop  (cost=0.00..109.05 rows=517 width=4) (actual
> time=0.037..4.643 rows=4335 loops=1)"
> "              ->  Index Scan using index_price_event_id on ticket_price
>  (cost=0.00..8.52 rows=2 width=4) (actual time=0.014..0.038 rows=7 loops=1)"
> "                    Index Cond: (price_event_id = 7820)"
> "                    Filter: ((now() >= price_date) AND (now() <=
> price_date_till))"
> "              ->  Index Scan using idx_thtp_price_id on
> ticket_has_ticket_price  (cost=0.00..47.04 rows=258 width=4) (actual
> time=0.019..0.336 rows=619 loops=7)"
> "                    Index Cond: (ticket_has_ticket_price.thtp_price_id =
> ticket_price.price_id)"
> "Total runtime: 8.966 ms"
>
>
> You can see the query has only 7 distinct parameter sets to pass to the
> function but...
> 4)   Explain analyze
>  SELECT web_select_extra_price(price_id, 7820, 1)
>  FROM ticket_price
>    JOIN ticket_has_ticket_price ON (price_id = thtp_price_id)
>  WHERE price_event_id = 7820 and (current_timestamp >= price_date AND
> current_timestamp <= price_date_till)
>
> Result:
> "Nested Loop  (cost=0.00..238.30 rows=517 width=4) (actual
> time=0.365..808.537 rows=4335 loops=1)"
> "  ->  Index Scan using index_price_event_id on ticket_price
>  (cost=0.00..8.52 rows=2 width=4) (actual time=0.014..0.040 rows=7 loops=1)"
> "        Index Cond: (price_event_id = 7820)"
> "        Filter: ((now() >= price_date) AND (now() <= price_date_till))"
> "  ->  Index Scan using idx_thtp_price_id on ticket_has_ticket_price
>  (cost=0.00..47.04 rows=258 width=4) (actual time=0.016..0.655 rows=619
> loops=7)"
> "        Index Cond: (ticket_has_ticket_price.thtp_price_id =
> ticket_price.price_id)"
> "Total runtime: 810.143 ms"
>
>
> So I am totally confused... It seems that selecting 4335 rows is a joke for
> Postgresql, but the great job is done then adding one of 7 possible values
> to the result set... Please help me understand what I am missing here?...
>
> Finally the system:
> Server
> PG: Version string    PostgreSQL 9.0.3 on i486-pc-linux-gnu, compiled by GCC
> gcc-4.4.real (Debian 4.4.5-10) 4.4.5, 32-bit
>
> Client
> Win XP SP3 with pgAdmin 1.12.2.

This is a huge problem with non trivial functions in the select list.
Pushing the result into and a subquery does NOT guarantee that the
inner result is materialized first.   Try a CTE.

with foo as
(
  select yadda;
)
select func(foo.a), foo.* from foo;

merlin

pgsql-performance by date:

Previous
From: fork
Date:
Subject: Re: Tuning massive UPDATES and GROUP BY's?
Next
From: Dan Ancona
Date:
Subject: big joins not converging