Re: Getting an optimal plan on the first execution of a pl/pgsql function - Mailing list pgsql-performance

From: David G. Johnston
Subject: Re: Getting an optimal plan on the first execution of a pl/pgsql function
Msg-id: CAKFQuwbavm1XnbX1dS=8R6vrpUbnAo6HraqCg_UD2+v3nEjaRA@mail.gmail.com
In response to: Getting an optimal plan on the first execution of a pl/pgsql function (Pedro França <pedro.franca@golsat.com.br>)
Responses: Re: Getting an optimal plan on the first execution of a pl/pgsql function (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-performance
On Mon, Dec 14, 2015 at 11:53 AM, Pedro França <pedro.franca@golsat.com.br> wrote:
I have a really busy function that I need to optimize the best way I can. This function is just a nested select statement that is requested several times a second by a legacy application. I'm running PostgreSQL 9.4 on CentOS 6.

The indexes are in place, but I've noticed that they are only used after the first execution of the function.

How do you know this?
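
One way to verify, as an aside not raised in the thread, is the auto_explain contrib module, which can log the plans of statements executed inside functions (the arguments below are invented for illustration):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;        -- log the plan of every statement
SET auto_explain.log_analyze = on;            -- include actual times and row counts
SET auto_explain.log_nested_statements = on;  -- include statements run inside PL/pgSQL functions
SELECT * FROM public.ap_keepalive_geteqpid_veiid(123456789, 1);
-- the plan chosen for the SELECT inside the function then appears in the server log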

I think the problem is that Postgres isn't picking the best execution plan at first because of a parameter that is highly selective in the majority of cases, but sometimes isn't. We can't change the way we call the function to a plain SQL statement or a view because we can't change the application code itself.

When I test with EXPLAIN ANALYZE after the first execution, the query runs really fast, but the application sessions call the function only once and are then terminated. I need the first execution to use the optimized plan.

We tried tweaking the connector driver that manages the connection pooling to issue DISCARD TEMP instead of DISCARD ALL, so that the sessions could keep their cached plans, and the performance improved a lot, but I don't want to do that in a production environment.

Given the constraints you've listed, this seems like it might be your only avenue of improvement.  Your problem is that the performance improvement is due to caching effects; if you throw away the cache, you lose the improvement.
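
For reference, here is what the two reset commands do (per the DISCARD documentation); the exact setting that controls which one is issued depends on the pooler or driver in use:

DISCARD TEMP;  -- drops the session's temporary tables only; cached plans survive
DISCARD ALL;   -- resets the whole session, including DISCARD PLANS, which releases
               -- all cached query plans and forces the next call to re-plan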


I've tried changing the language to a SQL function, but it didn't help, as the execution time didn't drop after the first execution.

Yes, this likely would make things worse...depending upon how it is called.

I've tried adding "SET LOCAL join_collapse_limit = 1" too, but it appears it doesn't work inside a function.

I wouldn't expect that parameter to have any effect in this scenario.
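
As an aside (not a suggestion from the thread, and per the previous point it likely wouldn't change anything here), planner settings can be attached to a function instead of using SET LOCAL in its body:

-- the setting takes effect for the duration of each call and is reverted afterwards
ALTER FUNCTION public.ap_keepalive_geteqpid_veiid(bigint, integer)
    SET join_collapse_limit = 1;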

Here is the function code:

CREATE OR REPLACE FUNCTION public.ap_keepalive_geteqpid_veiid(
    IN tcbserie bigint,
    IN protocolo integer)
  RETURNS TABLE(eqpid integer, veiid integer, tcbid integer, veiplaca character varying, veiproprietariocliid integer, tcbtppid integer, tcbversao character, veirpmparametro double precision, tcbconfiguracao bigint, tcbevtconfig integer, veibitsalertas integer, sluid integer, harid integer) AS
$BODY$
BEGIN

RETURN QUERY
SELECT teqp.eqpID, 
teqp.eqpveiID AS veiID, 
tcb.tcbID, 
tvei.veiPlaca, 
tvei.veiProprietariocliID, 
tcb.tcbtppID, 
tcb.tcbVersao,
tvei.veiRPMParametro, 
COALESCE(COALESCE(NULLIF(tcb.tcbConfiguracao, 0), tcc.clcConfiguracaoBitsVeic), 0) AS tcbConfiguracao,
COALESCE(tcb.tcbevtConfig, 0) AS tcbevtConfig,
COALESCE(tvei.veiBitsAlertas, 0) AS veiBitsAlertas,
COALESCE(tvei.veisluID, 0) AS sluID,
COALESCE(tcb.tcbharID, 0) AS harID
FROM TabComputadorBordo tcb
INNER JOIN TabEquipamento teqp ON teqp.eqptcbID = tcb.tcbID
INNER JOIN TabPacoteProduto tpp ON teqp.eqptppID = tpp.tppID
INNER JOIN TabVeiculos tvei ON teqp.eqpveiID = tvei.veiID
LEFT JOIN TabCliente tcli ON tcli.cliid = tvei.veiProprietariocliID
LEFT JOIN TabClienteConfig tcc ON tcc.clcCliID = tcli.cliID
WHERE   tcb.tcbserie = $1
AND teqp.eqpAtivo = 1
AND tpp.tppIDProtocolo = $2
AND tvei.veiBloqueioSinal = 0;

END
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 10000
  ROWS 1;

Execution plan in the first execution:

You likely could make this STABLE instead of VOLATILE; though that doesn't solve your problem.
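
A minimal sketch of that change (syntax only; as noted, it would not remove the first-call overhead):

ALTER FUNCTION public.ap_keepalive_geteqpid_veiid(bigint, integer) STABLE;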


"Function Scan on ap_keepalive_geteqpid_veiid  (cost=0.25..0.26 rows=1 width=116) (actual time=3.268..3.268 rows=1 loops=1)"
"Planning time: 0.032 ms"
"Execution time: 3.288 ms"

Second execution:

"Function Scan on ap_keepalive_geteqpid_veiid  (cost=0.25..0.26 rows=1 width=116) (actual time=0.401..0.402 rows=1 loops=1)"
"Planning time: 0.058 ms"
"Execution time: 0.423 ms"


I'm doubting the query inside of the function is the problem here...it is the function usage itself.  Calling a function has overhead in that the body of the function needs to be processed.  This only has to happen once per session.  The first call of the function incurs this overhead while subsequent calls do not.
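
A hypothetical illustration of that per-session effect (arguments invented; timings correspond to the two plans quoted above): in a single session, the first call pays for parsing the function body and planning its query, while a repeat call does not:

\timing on
SELECT * FROM public.ap_keepalive_geteqpid_veiid(123456789, 1);  -- first call: ~3 ms
SELECT * FROM public.ap_keepalive_geteqpid_veiid(123456789, 1);  -- repeat call in the same session: ~0.4 ms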

Pending others correcting me...I'm fairly certain of my conclusions, though I'm somewhat inexperienced in doing this kind of diagnostics.

David J.
