Re: performance of sql and plpgsql functions - Mailing list pgsql-performance

From Julius Tuskenis
Subject Re: performance of sql and plpgsql functions
Date
Msg-id 09c67567-f351-415f-af2d-76f7e8661d12@gmail.com
Whole thread Raw
In response to Re: performance of sql and plpgsql functions  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On 2024-06-17 17:24, Tom Lane wrote:
Julius Tuskenis <julius.tuskenis@gmail.com> writes:
EXPLAIN (ANALYZE, BUFFERS, VERBOSE)   SELECT     COALESCE(sum(mok_nepadengta), 0)   FROM     public.b_pardavimai     JOIN public.b_mokejimai ON (mok_pardavimas = pard_id)   WHERE     (pard_tipas = ANY('{1, 2, 6, 7}'))     AND (mok_saskaita = 7141968)
I believe that the SQL-language function executor always uses generic
plans for parameterized queries (which is bad, but nobody's gotten
round to improving it).  So the above is a poor way of investigating
what will happen, because it corresponds to a custom plan for the
value 7141968.  You should try something like

PREPARE p(integer) AS  SELECT COALESCE ...  ... AND (mok_saskaita = $1);

SET plan_cache_mode TO force_generic_plan;

EXPLAIN ANALYZE EXECUTE p(7141968);

What I suspect is that the statistics for mok_saskaita are
highly skewed and so with a generic plan the planner will
not risk using a plan that depends on the parameter value
being infrequent, as the one you're showing does.
			regards, tom lane


Thank you Tom Lane, for pointing the problem.

In deed, after setting plan_cache_mode to force_generic_plan I see very different plan:

```

"Finalize Aggregate  (cost=6901.01..6901.02 rows=1 width=32) (actual time=50.258..56.004 rows=1 loops=1)"
"  Output: COALESCE(sum((b_mokejimai.mok_nepadengta)::numeric), '0'::numeric)"
"  Buffers: shared hit=4"
"  ->  Gather  (cost=6900.89..6901.00 rows=1 width=32) (actual time=0.809..55.993 rows=2 loops=1)"
"        Output: (PARTIAL sum((b_mokejimai.mok_nepadengta)::numeric))"
"        Workers Planned: 1"
"        Workers Launched: 1"
"        Buffers: shared hit=4"
"        ->  Partial Aggregate  (cost=5900.89..5900.90 rows=1 width=32) (actual time=0.077..0.079 rows=1 loops=2)"
"              Output: PARTIAL sum((b_mokejimai.mok_nepadengta)::numeric)"
"              Buffers: shared hit=4"
"              Worker 0: actual time=0.052..0.053 rows=1 loops=1"
"              ->  Nested Loop  (cost=25.92..5897.69 rows=1280 width=3) (actual time=0.070..0.072 rows=0 loops=2)"
"                    Output: b_mokejimai.mok_nepadengta"
"                    Inner Unique: true"
"                    Buffers: shared hit=4"
"                    Worker 0: actual time=0.043..0.043 rows=0 loops=1"
"                    ->  Parallel Bitmap Heap Scan on public.b_mokejimai  (cost=25.48..2455.36 rows=1307 width=7) (actual time=0.069..0.070 rows=0 loops=2)"
"                          Output: b_mokejimai.mok_id, b_mokejimai.mok_moketojas, b_mokejimai.mok_pardavimas, b_mokejimai.mok_laikas, b_mokejimai.mok_suma, b_mokejimai.mok_budas, b_mokejimai.mok_terminas, b_mokejimai.mok_cekis, b_mokejimai.mok_saskaita, b_mokejimai.mok_suma_bazine, b_mokejimai.mok_nepadengta, b_mokejimai.mok_padengta, b_mokejimai.mok_laiko_diena"
"                          Recheck Cond: (b_mokejimai.mok_saskaita = $1)"
"                          Buffers: shared hit=4"
"                          Worker 0: actual time=0.042..0.042 rows=0 loops=1"
"                          ->  Bitmap Index Scan on idx_saskaita  (cost=0.00..24.93 rows=2222 width=0) (actual time=0.023..0.023 rows=0 loops=1)"
"                                Index Cond: (b_mokejimai.mok_saskaita = $1)"
"                                Buffers: shared hit=4"
"                    ->  Index Scan using pk_b_pardavimai_id on public.b_pardavimai  (cost=0.44..2.63 rows=1 width=4) (never executed)"
"                          Output: b_pardavimai.pard_id, b_pardavimai.pard_preke, b_pardavimai.pard_kaina, b_pardavimai.pard_nuolaida, b_pardavimai.pard_kiekis, b_pardavimai.pard_kasos_nr, b_pardavimai.pard_laikas, b_pardavimai.pard_prekes_id, b_pardavimai.pard_pirkejo_id, b_pardavimai.pard_pardavejas, b_pardavimai.pard_spausdinta, b_pardavimai.pard_reikia_grazinti, b_pardavimai.pard_kam_naudoti, b_pardavimai.pard_susieta, b_pardavimai.pard_galima_anuliuoti, b_pardavimai.pard_tipas, b_pardavimai.pard_pvm, b_pardavimai.pard_apsilankymas, b_pardavimai.pard_fk, b_pardavimai.pard_kelintas, b_pardavimai.pard_precekis, b_pardavimai.pard_imone, b_pardavimai.pard_grazintas, b_pardavimai.pard_debeto_sutartis, b_pardavimai.pard_kaina_be_nld, b_pardavimai.pard_uzsakymas_pos, b_pardavimai.pard_pvm_suma, b_pardavimai.pard_uzsakymo_nr, b_pardavimai.pard_nuolaidos_id, b_pardavimai.pard_nuolaida_taikyti, b_pardavimai.pard_pirkeja_keisti_galima, b_pardavimai.pard_suma_keisti_galima"
"                          Index Cond: (b_pardavimai.pard_id = b_mokejimai.mok_pardavimas)"
"                          Filter: (b_pardavimai.pard_tipas = ANY ('{1,2,6,7}'::integer[]))"
"Planning Time: 0.016 ms"
"Execution Time: 56.097 ms"

```

If I understand the plan correctly, the problem is the planner expects to find 2222 records for a provide value of `mok_saskaita`.  I've tried running analyze on `b_mokejimai`, but the plan remains the same - must be because some values of `mok_saskaita` do really return tens of thousands of records.

I don't know how the planner comes up with value 2222, because on average there are 15 b_mokejimai records for a single mok_saskaita (if NULL in mok_saskata is ignored), and 628 records if not.


Anyway...

Do you think rewriting a function in plpgsql is a way to go in such case? In pg documentation (https://www.postgresql.org/docs/12/plpgsql-implementation.html#PLPGSQL-PLAN-CACHING) I read that the plan for the plpgsql function is calculated the first time the function is executed (for a connection). I'm concerned, that the function execution is not replanned: I will be stuck with a plan that corresponds to the `mok_saskaita`  parameter value passed on the first execution. Or am I wrong?

Is there a way to make PostgreSQL recalculate the plan on each execution of the function? The observed planning times are acceptable for my application.


Regards,

Julius Tuskenis

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: performance of sql and plpgsql functions
Next
From: Thomas Simpson
Date:
Subject: Row level security