Query much slower when run from postgres function - Mailing list pgsql-performance

From Mario Splivalo
Subject Query much slower when run from postgres function
Date
Msg-id 49B51B60.6030600@megafon.hr
Whole thread Raw
Responses Re: Query much slower when run from postgres function  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Query much slower when run from postgres function  (decibel <decibel@decibel.org>)
List pgsql-performance
I have a function, looking like this:

CREATE OR REPLACE FUNCTION get_memo_display_queue_size(a_service_id integer)
   RETURNS integer AS
$BODY$
SELECT
    COUNT(*)::integer
FROM
    v_messages_memo
    LEFT JOIN messages_memo_displayed
        ON id = message_id
WHERE
    service_id = $1
    AND state = 1
    AND admin_id IS NULL;
$BODY$
   LANGUAGE 'sql' VOLATILE SECURITY DEFINER
   COST 100;

Now, when I run that function from psql, it takes around 200ms to complete:

pulitzer2=# explain analyze select get_memo_display_queue_size(1829);
                                        QUERY PLAN

----------------------------------------------------------------------------------------
  Result  (cost=0.00..0.26 rows=1 width=0) (actual time=219.728..219.730
rows=1 loops=1)
  Total runtime: 219.758 ms
(2 rows)

pulitzer2=#

And it takes around 200ms each time I run the function!


When I rewrite the query so I can see queryplan, I get this:

create view _v1 as
SELECT
    *
FROM
    v_messages_memo
    LEFT JOIN messages_memo_displayed
        ON id = message_id
WHERE
    state = 1
    AND admin_id IS NULL;

pulitzer2=# EXPLAIN ANALYZE select count(*) from _v1 WHERE service_id =
1829;

   QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=23506.14..23506.15 rows=1 width=0) (actual
time=6.001..6.002 rows=1 loops=1)
    ->  Nested Loop  (cost=150.69..23505.59 rows=216 width=0) (actual
time=5.744..5.971 rows=13 loops=1)
          ->  Hash Left Join  (cost=150.69..11035.16 rows=2104 width=4)
(actual time=5.721..5.793 rows=13 loops=1)
                Hash Cond: (messages.id =
messages_memo_displayed.message_id)
                Filter: (messages_memo_displayed.admin_id IS NULL)
                ->  Bitmap Heap Scan on messages  (cost=97.03..10955.11
rows=4209 width=4) (actual time=0.042..0.075 rows=13 loops=1)
                      Recheck Cond: (service_id = 1829)
                      ->  Bitmap Index Scan on
messages_uq__service_id__tan  (cost=0.00..95.98 rows=4209 width=0)
(actual time=0.032..0.032 rows=13 loops=1)
                            Index Cond: (service_id = 1829)
                ->  Hash  (cost=28.85..28.85 rows=1985 width=8) (actual
time=5.666..5.666 rows=1985 loops=1)
                      ->  Seq Scan on messages_memo_displayed
(cost=0.00..28.85 rows=1985 width=8) (actual time=0.009..2.697 rows=1985
loops=1)
          ->  Index Scan using messages_memo_pk on messages_memo
(cost=0.00..5.91 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=13)
                Index Cond: (messages_memo.message_id = messages.id)
                Filter: ((messages_memo.state)::integer = 1)
  Total runtime: 6.079 ms
(15 rows)


So I noticed that postgres is using seq_scan on messages_memo_displayed,
although there is a PK (and an index) on message_id in
messages_memo_displayed (I'll post DDL of the tables at the end of the
post).

So, I tried EXPLAIN ANALYZE after I forced planner not to use sequential
scans:

pulitzer2=# EXPLAIN ANALYZE select count(*) from _v1 WHERE service_id =
1829;

               QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=25403.60..25403.61 rows=1 width=0) (actual
time=6.546..6.547 rows=1 loops=1)
    ->  Nested Loop  (cost=2048.16..25403.06 rows=216 width=0) (actual
time=6.287..6.512 rows=13 loops=1)
          ->  Hash Left Join  (cost=2048.16..12932.63 rows=2104 width=4)
(actual time=6.268..6.340 rows=13 loops=1)
                Hash Cond: (messages.id =
messages_memo_displayed.message_id)
                Filter: (messages_memo_displayed.admin_id IS NULL)
                ->  Bitmap Heap Scan on messages  (cost=97.03..10955.11
rows=4209 width=4) (actual time=0.043..0.078 rows=13 loops=1)
                      Recheck Cond: (service_id = 1829)
                      ->  Bitmap Index Scan on
messages_uq__service_id__tan  (cost=0.00..95.98 rows=4209 width=0)
(actual time=0.032..0.032 rows=13 loops=1)
                            Index Cond: (service_id = 1829)
                ->  Hash  (cost=1926.31..1926.31 rows=1985 width=8)
(actual time=6.211..6.211 rows=1985 loops=1)
                      ->  Index Scan using messages_memo_displayed_pk on
messages_memo_displayed  (cost=0.00..1926.31 rows=1985 width=8) (actual
time=0.069..3.221 rows=1985 loops=1)
          ->  Index Scan using messages_memo_pk on messages_memo
(cost=0.00..5.91 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=13)
                Index Cond: (messages_memo.message_id = messages.id)
                Filter: ((messages_memo.state)::integer = 1)
  Total runtime: 6.628 ms
(15 rows)


No sequential scan. So I 'changed' my function so that first row says
'SET enable_seqscan TO false'. After that, here are the times for the
function call:

mike@som:~$ psql -U postgres pulitzer2
Welcome to psql 8.3.5, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
        \h for help with SQL commands
        \? for help with psql commands
        \g or terminate with semicolon to execute query
        \q to quit
pulitzer2=# explain analyze select get_memo_display_queue_size(1829);
                                        QUERY PLAN

----------------------------------------------------------------------------------------
  Result  (cost=0.00..0.26 rows=1 width=0) (actual time=300.448..300.450
rows=1 loops=1)
  Total runtime: 300.491 ms
(2 rows)

pulitzer2=# explain analyze select get_memo_display_queue_size(1829);
                                      QUERY PLAN

------------------------------------------------------------------------------------
  Result  (cost=0.00..0.26 rows=1 width=0) (actual time=1.940..1.941
rows=1 loops=1)
  Total runtime: 1.961 ms
(2 rows)

pulitzer2=# explain analyze select get_memo_display_queue_size(1829);
                                      QUERY PLAN

------------------------------------------------------------------------------------
  Result  (cost=0.00..0.26 rows=1 width=0) (actual time=1.946..1.947
rows=1 loops=1)
  Total runtime: 1.973 ms
(2 rows)

pulitzer2=# explain analyze select get_memo_display_queue_size(1829);
                                      QUERY PLAN

------------------------------------------------------------------------------------
  Result  (cost=0.00..0.26 rows=1 width=0) (actual time=1.936..1.937
rows=1 loops=1)
  Total runtime: 1.964 ms
(2 rows)

pulitzer2=#

So, first query on the same connection takes 300ms, and any
subsequential query on the same connection takes less than 2 ms. If I
remove 'SET enable_seqscan TO false' from the top of the function, every
call to the function takes around 200-300ms.


Now, as I was explained on pg-jdbc mailinglist, that 'SET enable_seqscan
TO false' affects all queries on that persistent connection from tomcat,
and It's not good solution. So I wanted to post here to ask what other
options do I have.

While writing this I realized that, without forcing sequential scan out,
I get much quicker execution times when I do:

SELECT count(*) FROM _v1 WHERE service_id = 1829

then when I do

SELECT get_memo_display_queue_size(1829),

as seen here:



mike@som:~$ psql -U postgres pulitzer2
Welcome to psql 8.3.5, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
        \h for help with SQL commands
        \? for help with psql commands
        \g or terminate with semicolon to execute query
        \q to quit

pulitzer2=# explain analyze select get_memo_display_queue_size(1829);
                                        QUERY PLAN

----------------------------------------------------------------------------------------
  Result  (cost=0.00..0.26 rows=1 width=0) (actual time=259.090..259.092
rows=1 loops=1)
  Total runtime: 259.132 ms
(2 rows)

pulitzer2=# EXPLAIN ANALYZE select count(*) from _v1 WHERE service_id =
1829;

   QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=23517.98..23517.99 rows=1 width=0) (actual
time=5.942..5.943 rows=1 loops=1)
    ->  Nested Loop  (cost=150.70..23517.44 rows=216 width=0) (actual
time=5.674..5.909 rows=13 loops=1)
          ->  Hash Left Join  (cost=150.70..11037.87 rows=2105 width=4)
(actual time=5.633..5.706 rows=13 loops=1)
                Hash Cond: (messages.id =
messages_memo_displayed.message_id)
                Filter: (messages_memo_displayed.admin_id IS NULL)
                ->  Bitmap Heap Scan on messages  (cost=97.04..10957.81
rows=4210 width=4) (actual time=0.032..0.063 rows=13 loops=1)
                      Recheck Cond: (service_id = 1829)
                      ->  Bitmap Index Scan on
messages_uq__service_id__tan  (cost=0.00..95.98 rows=4210 width=0)
(actual time=0.022..0.022 rows=13 loops=1)
                            Index Cond: (service_id = 1829)
                ->  Hash  (cost=28.85..28.85 rows=1985 width=8) (actual
time=5.588..5.588 rows=1985 loops=1)
                      ->  Seq Scan on messages_memo_displayed
(cost=0.00..28.85 rows=1985 width=8) (actual time=0.009..2.690 rows=1985
loops=1)
          ->  Index Scan using messages_memo_pk on messages_memo
(cost=0.00..5.92 rows=1 width=4) (actual time=0.008..0.010 rows=1 loops=13)
                Index Cond: (messages_memo.message_id = messages.id)
                Filter: ((messages_memo.state)::integer = 1)
  Total runtime: 6.026 ms
(15 rows)

pulitzer2=# explain analyze select get_memo_display_queue_size(1829);
                                        QUERY PLAN

----------------------------------------------------------------------------------------
  Result  (cost=0.00..0.26 rows=1 width=0) (actual time=211.712..211.714
rows=1 loops=1)
  Total runtime: 211.742 ms
(2 rows)

pulitzer2=# EXPLAIN ANALYZE select count(*) from _v1 WHERE service_id =
1829;

   QUERY PLAN


------------------------------------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=23517.98..23517.99 rows=1 width=0) (actual
time=5.918..5.920 rows=1 loops=1)
    ->  Nested Loop  (cost=150.70..23517.44 rows=216 width=0) (actual
time=5.659..5.885 rows=13 loops=1)
          ->  Hash Left Join  (cost=150.70..11037.87 rows=2105 width=4)
(actual time=5.638..5.711 rows=13 loops=1)
                Hash Cond: (messages.id =
messages_memo_displayed.message_id)
                Filter: (messages_memo_displayed.admin_id IS NULL)
                ->  Bitmap Heap Scan on messages  (cost=97.04..10957.81
rows=4210 width=4) (actual time=0.043..0.078 rows=13 loops=1)
                      Recheck Cond: (service_id = 1829)
                      ->  Bitmap Index Scan on
messages_uq__service_id__tan  (cost=0.00..95.98 rows=4210 width=0)
(actual time=0.033..0.033 rows=13 loops=1)
                            Index Cond: (service_id = 1829)
                ->  Hash  (cost=28.85..28.85 rows=1985 width=8) (actual
time=5.581..5.581 rows=1985 loops=1)
                      ->  Seq Scan on messages_memo_displayed
(cost=0.00..28.85 rows=1985 width=8) (actual time=0.009..2.678 rows=1985
loops=1)
          ->  Index Scan using messages_memo_pk on messages_memo
(cost=0.00..5.92 rows=1 width=4) (actual time=0.006..0.008 rows=1 loops=13)
                Index Cond: (messages_memo.message_id = messages.id)
                Filter: ((messages_memo.state)::integer = 1)
  Total runtime: 5.994 ms
(15 rows)

pulitzer2=#



Now I'm confused, why is 'sql' function much slower than 'direct' SELECT?

    Mike

P.S. Here are tables definition, from psql:

pulitzer2=# \d messages
                                                Table "public.messages"
        Column       |           Type           |
        Modifiers
--------------------+--------------------------+---------------------------------------------------------------------
  id                 | integer                  | not null default
nextval(('public.message_id_seq'::text)::regclass)
  from               | character varying(15)    | not null
  to                 | character varying(10)    | not null
  receiving_time     | timestamp with time zone | not null default now()
  raw_text           | character varying        | not null
  keyword            | character varying        |
  destination_id     | integer                  | not null
  vpn_id             | integer                  |
  service_id         | integer                  |
  status             | integer                  | not null default 2
  gateway_message_id | character varying        | not null
  prize_id           | integer                  |
  tan                | character varying        |
Indexes:
     "messages_pk" PRIMARY KEY, btree (id)
     "messages_uq__gateway_message_id" UNIQUE, btree (gateway_message_id)
     "messages_uq__service_id__tan" UNIQUE, btree (service_id, tan)
     "messages_ix_from" btree ("from")
     "messages_ix_receiving_time__service_id__status" btree
(receiving_time, service_id, status)
     "messages_ix_vpn_id" btree (vpn_id)
Foreign-key constraints:
     "messages_fk__destinations_id" FOREIGN KEY (destination_id)
REFERENCES destinations(id)
     "messages_fk__service_prizes_prize_id" FOREIGN KEY (prize_id)
REFERENCES service_prizes(prize_id)
     "messages_fk__services_id" FOREIGN KEY (service_id) REFERENCES
services(id)
     "messages_fk__vpns_id" FOREIGN KEY (vpn_id) REFERENCES vpns(id)


pulitzer2=# \d messages_memo
                  Table "public.messages_memo"
          Column         |           Type           | Modifiers
------------------------+--------------------------+-----------
  message_id             | integer                  | not null
  memo                   | character varying        |
  state                  | dom_messages_memo_state  | not null
  admin_id               | integer                  |
  admin_change_timestamp | timestamp with time zone |
Indexes:
     "messages_memo_pk" PRIMARY KEY, btree (message_id)
Foreign-key constraints:
     "messages_memo_fk__messages_id" FOREIGN KEY (message_id) REFERENCES
messages(id)

pulitzer2=# \d messages_memo_displayed
Table "public.messages_memo_displayed"
    Column   |  Type   | Modifiers
------------+---------+-----------
  message_id | integer | not null
  admin_id   | integer | not null
Indexes:
     "messages_memo_displayed_pk" PRIMARY KEY, btree (message_id, admin_id)
Foreign-key constraints:
     "messages_memo_displayed_fk__admins_id" FOREIGN KEY (admin_id)
REFERENCES admins(id)
     "messages_memo_displayed_fk__messages_id" FOREIGN KEY (message_id)
REFERENCES messages(id)

pulitzer2=# \d v_messages_memo
                View "public.v_messages_memo"
        Column       |           Type           | Modifiers
--------------------+--------------------------+-----------
  id                 | integer                  |
  from               | character varying(15)    |
  to                 | character varying(10)    |
  receiving_time     | timestamp with time zone |
  raw_text           | character varying        |
  keyword            | character varying        |
  destination_id     | integer                  |
  vpn_id             | integer                  |
  service_id         | integer                  |
  status             | integer                  |
  gateway_message_id | character varying        |
  prize_id           | integer                  |
  tan                | character varying        |
  memo               | character varying        |
  state              | dom_messages_memo_state  |
  displayed          | boolean                  |
View definition:
  SELECT v_messages_full.id, v_messages_full."from",
v_messages_full."to", v_messages_full.receiving_time,
v_messages_full.raw_text, v_messages_full.keyword,
v_messages_full.destination_id, v_messages_full.vpn_id,
v_messages_full.service_id, v_messages_full.status,
v_messages_full.gateway_message_id, v_messages_full.prize_id,
v_messages_full.tan, messages_memo.memo, messages_memo.state,
NULL::boolean AS displayed
    FROM messages v_messages_full
    JOIN messages_memo ON v_messages_full.id = messages_memo.message_id;

pulitzer2=#



pgsql-performance by date:

Previous
From: Aaron Guyon
Date:
Subject: Re: Postgres 8.3, four times slower queries?
Next
From: Lee Hughes
Date:
Subject: DBT Presentation Location?