Re: plan cache overhead on plpgsql expression - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: plan cache overhead on plpgsql expression
Date
Msg-id CAFj8pRDCJcv6RwrVdceQVx_Ct0uxADmZN51nv=Z9Z0QBy68K+g@mail.gmail.com
Whole thread Raw
In response to Re: plan cache overhead on plpgsql expression  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: plan cache overhead on plpgsql expression  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers


so 21. 3. 2020 v 19:24 odesílatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
Pavel Stehule <pavel.stehule@gmail.com> writes:
> So the patch has a problem with constant casting - unfortunately the mix of
> double precision variables and numeric constants is pretty often in
> Postgres.

Yeah.  I believe the cause of that is that the patch thinks it can skip
passing an inline-function-free simple expression through the planner.
That's flat out wrong.  Quite aside from failing to perform
constant-folding (which is presumably the cause of the slowdown you
spotted), that means that we miss performing such non-optional
transformations as rearranging named-function-argument notation into
positional order.  I didn't bother to test that but I'm sure it can be
shown to lead to crashes.

Now that I've looked at the patch I don't like it one bit from a
structural standpoint either.  It's basically trying to make an end
run around the plancache, which is not going to be maintainable even
if it correctly accounted for everything the plancache does today.
Which it doesn't.  Two big problems are:

* It doesn't account for the possibility of search_path changes
affecting the interpretation of an expression.

* It assumes that the *only* things that a simple plan could get
invalidated for are functions that were inlined.  This isn't the
case --- a counterexample is that removal of no-op CoerceToDomain
nodes requires the plan to be invalidated if the domain's constraints
change.  And there are likely to be more such issues in future.


So while there's clearly something worth pursuing here, I do not like
anything about the way it was done.  I think that the right way to
think about this problem is "how can the plan cache provide a fast
path for checking validity of simple-expression plans?".  And when you
think about it that way, there's a pretty obvious answer: if the plan
involves no table references, there's not going to be any locks that
have to be taken before we can check the is_valid flag.  So we can
have a fast path that skips AcquirePlannerLocks and
AcquireExecutorLocks, which are a big part of the problem, and we can
also bypass some of the other random checks that GetCachedPlan has to
make, like whether RLS affects the plan.

Another chunk of the issue is the constant acquisition and release of
reference counts on the plan.  We can't really skip that (I suspect
there are additional bugs in Amit's patch arising from trying to do so).
However, plpgsql already has mechanisms for paying simple-expression
setup costs once per transaction rather than once per expression use.
So we can set up a simple-expression ResourceOwner managed much like
the simple-expression EState, and have it hold a refcount on the
CachedPlan for each simple expression, and pay that overhead just once
per transaction.

So I worked on those ideas for awhile, and came up with the attached
patchset:

0001 adds some regression tests in this area (Amit's patch fails the
tests concerning search_path changes).

0002 does what's suggested above.  I also did a little bit of marginal
micro-tuning in exec_eval_simple_expr() itself.

0003 improves the biggest remaining cost of validity rechecking,
which is verifying that the search_path is the same as it was when
the plan was cached.

I haven't done any serious performance testing on this, but it gives
circa 2X speedup on Pavel's original example, which is at least
fairly close to the results that Amit's patch got there.  And it
makes this last batch of test cases faster not slower, too.

With this patch, perf shows the hotspots on Pavel's original example
as being

+   19.24%    19.17%         46470  postmaster       plpgsql.so                   [.] exec_eval_expr
+   15.19%    15.15%         36720  postmaster       plpgsql.so                   [.] plpgsql_param_eval_var
+   14.98%    14.94%         36213  postmaster       postgres                     [.] ExecInterpExpr
+    6.32%     6.30%         15262  postmaster       plpgsql.so                   [.] exec_stmt
+    6.08%     6.06%         14681  postmaster       plpgsql.so                   [.] exec_assign_value

Maybe there's more that could be done to knock fat out of
exec_eval_expr and/or plpgsql_param_eval_var, but at least
the plan cache isn't the bottleneck anymore.

I tested Tom's patches, and I can confirm these results.

It doesn't break tests (and all tests plpgsql_check tests passed without problems).

The high overhead of ExecInterpExpr is related to prepare fcinfo, and checking nulls arguments because all functions are strict
plpgsql_param_eval_var, looks like expensive is var = (PLpgSQL_var *) estate->datums[dno] and *op->resvalue = var->value;

It looks great.

Pavel

 

                        regards, tom lane

pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: SQL/JSON: functions
Next
From: Tomas Vondra
Date:
Subject: Re: Additional improvements to extended statistics