Hi all.
We=E2=80=99ve met a strange problem with not using functional GIN-index =
in one of the SELECT-queries used inside the PL/pgSQL function. Seems, =
that it happens after 5 calls of such function inside one session. All =
attempts to run this query from psql with exactly the same binds don=E2=80=
=99t reproduce the problem (index is used). Calling such function with =
the same arguments in other sessions also run fast. So my guess is that =
it is somehow related with generic plan. Below are some details.
We use PostgreSQL 9.4.6 (upgrade is planned) on RHEL 6. The table looks =
like this (a bit simplified):
pgload03g/maildb M # \dS+ mail.box
Table "mail.box"
Column | Type | Modifiers | =
Storage | Stats target | Description
=
---------------+--------------------------+------------------------+------=
----+--------------+-------------
uid | bigint | not null | =
plain | |
mid | bigint | not null | =
plain | |
<...>
tid | bigint | | =
plain | |
lids | integer[] | not null | =
extended | |
<...>
Indexes:
"pk_box" PRIMARY KEY, btree (uid, mid)
<...>
"i_box_uid_lids" gin (mail.ulids(uid, lids)) WITH (fastupdate=3Doff)
"i_box_uid_tid" btree (uid, tid)
Check constraints:
"box_lids_check" CHECK (array_ndims(lids) =3D 1)
<...>
pgload03g/maildb M # \sf mail.ulids
CREATE OR REPLACE FUNCTION mail.ulids(i_uid bigint, i_lids integer[])
RETURNS bigint[]
LANGUAGE sql
IMMUTABLE STRICT
AS $function$
SELECT array_agg((i_uid << 32) | lid)
FROM unnest(i_lids) lid;
$function$
pgload03g/maildb M #
The best reproduction of the problem until now looks like that:
pgload03g/maildb M # PREPARE has_lids (bigint, int) AS
SELECT EXISTS (
SELECT 1 FROM mail.box
WHERE mail.ulids(uid, lids)
&& mail.ulids($1::bigint, ARRAY[$2])
AND tid IS NOT NULL
);
PREPARE
Time: 0.398 ms
pgload03g/maildb M # explain (ANALYZE, VERBOSE, BUFFERS) execute =
has_lids(9, 1);
QUERY PLAN
=
--------------------------------------------------------------------------=
-----------------------------------------------------------
Result (cost=3D297.41..297.42 rows=3D1 width=3D0) (actual =
time=3D0.015..0.015 rows=3D1 loops=3D1)
Output: $0
Buffers: shared hit=3D4
InitPlan 1 (returns $0)
-> Bitmap Heap Scan on mail.box (cost=3D293.22..120596.51 =
rows=3D28663 width=3D0) (actual time=3D0.013..0.013 rows=3D0 loops=3D1)
Recheck Cond: (mail.ulids(box.uid, box.lids) && =
'{38654705665}'::bigint[])
Filter: (box.tid IS NOT NULL)
Buffers: shared hit=3D4
-> Bitmap Index Scan on i_box_uid_lids (cost=3D0.00..286.05 =
rows=3D29574 width=3D0) (actual time=3D0.013..0.013 rows=3D0 loops=3D1)
Index Cond: (mail.ulids(box.uid, box.lids) && =
'{38654705665}'::bigint[])
Buffers: shared hit=3D4
Execution time: 0.037 ms
(12 rows)
Time: 0.667 ms
pgload03g/maildb M # explain (ANALYZE, VERBOSE, BUFFERS) execute =
has_lids(9, 2);
QUERY PLAN
=
--------------------------------------------------------------------------=
-----------------------------------------------------------
Result (cost=3D297.41..297.42 rows=3D1 width=3D0) (actual =
time=3D0.017..0.017 rows=3D1 loops=3D1)
Output: $0
Buffers: shared hit=3D4
InitPlan 1 (returns $0)
-> Bitmap Heap Scan on mail.box (cost=3D293.22..120596.51 =
rows=3D28663 width=3D0) (actual time=3D0.016..0.016 rows=3D0 loops=3D1)
Recheck Cond: (mail.ulids(box.uid, box.lids) && =
'{38654705666}'::bigint[])
Filter: (box.tid IS NOT NULL)
Buffers: shared hit=3D4
-> Bitmap Index Scan on i_box_uid_lids (cost=3D0.00..286.05 =
rows=3D29574 width=3D0) (actual time=3D0.015..0.015 rows=3D0 loops=3D1)
Index Cond: (mail.ulids(box.uid, box.lids) && =
'{38654705666}'::bigint[])
Buffers: shared hit=3D4
Execution time: 0.042 ms
(12 rows)
Time: 0.695 ms
pgload03g/maildb M # explain (ANALYZE, VERBOSE, BUFFERS) execute =
has_lids(9, 3);
QUERY PLAN
=
--------------------------------------------------------------------------=
-----------------------------------------------------------
Result (cost=3D297.41..297.42 rows=3D1 width=3D0) (actual =
time=3D0.014..0.015 rows=3D1 loops=3D1)
Output: $0
Buffers: shared hit=3D4
InitPlan 1 (returns $0)
-> Bitmap Heap Scan on mail.box (cost=3D293.22..120596.51 =
rows=3D28663 width=3D0) (actual time=3D0.014..0.014 rows=3D0 loops=3D1)
Recheck Cond: (mail.ulids(box.uid, box.lids) && =
'{38654705667}'::bigint[])
Filter: (box.tid IS NOT NULL)
Buffers: shared hit=3D4
-> Bitmap Index Scan on i_box_uid_lids (cost=3D0.00..286.05 =
rows=3D29574 width=3D0) (actual time=3D0.012..0.012 rows=3D0 loops=3D1)
Index Cond: (mail.ulids(box.uid, box.lids) && =
'{38654705667}'::bigint[])
Buffers: shared hit=3D4
Execution time: 0.038 ms
(12 rows)
Time: 0.632 ms
pgload03g/maildb M # explain (ANALYZE, VERBOSE, BUFFERS) execute =
has_lids(9, 4);
QUERY PLAN
=
--------------------------------------------------------------------------=
-----------------------------------------------------------
Result (cost=3D297.41..297.42 rows=3D1 width=3D0) (actual =
time=3D0.015..0.015 rows=3D1 loops=3D1)
Output: $0
Buffers: shared hit=3D4
InitPlan 1 (returns $0)
-> Bitmap Heap Scan on mail.box (cost=3D293.22..120596.51 =
rows=3D28663 width=3D0) (actual time=3D0.013..0.013 rows=3D0 loops=3D1)
Recheck Cond: (mail.ulids(box.uid, box.lids) && =
'{38654705668}'::bigint[])
Filter: (box.tid IS NOT NULL)
Buffers: shared hit=3D4
-> Bitmap Index Scan on i_box_uid_lids (cost=3D0.00..286.05 =
rows=3D29574 width=3D0) (actual time=3D0.012..0.012 rows=3D0 loops=3D1)
Index Cond: (mail.ulids(box.uid, box.lids) && =
'{38654705668}'::bigint[])
Buffers: shared hit=3D4
Execution time: 0.036 ms
(12 rows)
Time: 0.802 ms
pgload03g/maildb M # explain (ANALYZE, VERBOSE, BUFFERS) execute =
has_lids(9, 5);
QUERY PLAN
=
--------------------------------------------------------------------------=
-----------------------------------------------------------
Result (cost=3D297.41..297.42 rows=3D1 width=3D0) (actual =
time=3D0.014..0.014 rows=3D1 loops=3D1)
Output: $0
Buffers: shared hit=3D4
InitPlan 1 (returns $0)
-> Bitmap Heap Scan on mail.box (cost=3D293.22..120596.51 =
rows=3D28663 width=3D0) (actual time=3D0.013..0.013 rows=3D0 loops=3D1)
Recheck Cond: (mail.ulids(box.uid, box.lids) && =
'{38654705669}'::bigint[])
Filter: (box.tid IS NOT NULL)
Buffers: shared hit=3D4
-> Bitmap Index Scan on i_box_uid_lids (cost=3D0.00..286.05 =
rows=3D29574 width=3D0) (actual time=3D0.013..0.013 rows=3D0 loops=3D1)
Index Cond: (mail.ulids(box.uid, box.lids) && =
'{38654705669}'::bigint[])
Buffers: shared hit=3D4
Execution time: 0.037 ms
(12 rows)
Time: 0.763 ms
pgload03g/maildb M # explain (ANALYZE, VERBOSE, BUFFERS) execute =
has_lids(8, 5);
QUERY PLAN
=
--------------------------------------------------------------------------=
----------------------------------------------------------
Result (cost=3D54.64..54.65 rows=3D1 width=3D0) (actual =
time=3D6523020.562..6523020.562 rows=3D1 loops=3D1)
Output: $0
Buffers: shared hit=3D116142 read=3D7458134 dirtied=3D21
I/O Timings: read=3D29892.580
InitPlan 1 (returns $0)
-> Seq Scan on mail.box (cost=3D0.00..234921342.40 rows=3D4299410 =
width=3D0) (actual time=3D6523020.560..6523020.560 rows=3D0 loops=3D1)
Filter: ((box.tid IS NOT NULL) AND (mail.ulids(box.uid, =
box.lids) && mail.ulids($1, ARRAY[$2])))
Rows Removed by Filter: 295557368
Buffers: shared hit=3D116142 read=3D7458134 dirtied=3D21
I/O Timings: read=3D29892.580
Execution time: 6523020.634 ms
(11 rows)
Time: 6523021.312 ms
pgload03g/maildb M #
Running last query in another session also uses Bitmap Index Scan on =
i_box_uid_lids and completes fast.
It is clear that first five times postgres used Bitmap Index Scan on =
i_box_uid_lids with total cost 297.41..297.42. But then it decided to =
use Seq Scan on mail.box with total cost 54.64..54.65. The cost of seq =
scan is lower (and it seems to be the reason of choosing it) but it =
actually takes enormous amount of time to finish this query and it is a =
huge miss of optimizer.
It seems that the reason is that we are using SELECT EXISTS which should =
stop after receiving the first tuple from subnode and seq scan thinks =
that it would receive the first tuple immediately (cost=3D0.00..) but =
actually it wouldn=E2=80=99t. Another thing to pay attention is that =
optimizer thinks to scan 4.299.410 rows but actually it scans the whole =
table (295.557.368 rows). It is not related to inaccurate statistics =
because running VACUUM ANALYZE on that table doesn=E2=80=99t change the =
picture.
We are probably going to rewrite the query inside the PL/pgSQL function =
but it seems that such behavior isn=E2=80=99t expected for PostgreSQL =
and it would be cool to fix it.
--
May the force be with you=E2=80=A6
https://staff.yandex-team.ru/d0uble