Thread: [HACKERS] Minor codegen silliness in ExecInterpExpr()

[HACKERS] Minor codegen silliness in ExecInterpExpr()

From
Tom Lane
Date:
I noticed the following while poking around with perf:
      |                             fcinfo->isnull = false;      |b5b:   movb   $0x0,0x1c(%rdx)      |
          *op->resvalue = op->d.func.fn_addr(fcinfo); 0.02 |       mov    0x8(%rbx),%rcx 1.19 |       mov    %rdx,%rdi
0.93|       mov    %rdx,(%rsp)      |       mov    %rcx,0x8(%rsp) 0.01 |       callq  *0x28(%rbx) 2.17 |       mov
0x8(%rsp),%rcx     |       mov    %rax,(%rcx)      |                             *op->resnull = fcinfo->isnull; 1.18 |
    mov    (%rsp),%rdx 4.32 |       mov    0x10(%rbx),%rax 0.06 |       movzbl 0x1c(%rdx),%edx 9.14 |       mov
%dl,(%rax)

It looks to me like gcc believes it is required to evaluate "op->resvalue"
before invoking the called function, just in case the function somehow has
access to *op and modifies that.  We could save a pointless register spill
and reload if there were a temporary variable in there, ie
    EEO_CASE(EEOP_FUNCEXPR)    {        FunctionCallInfo fcinfo = op->d.func.fcinfo_data;
+            Datum    fvalue;        fcinfo->isnull = false;
-            *op->resvalue = op->d.func.fn_addr(fcinfo);
+            fvalue = op->d.func.fn_addr(fcinfo);
+            *op->resvalue = fvalue;        *op->resnull = fcinfo->isnull;
        EEO_NEXT();    }

and likewise in the other FUNCEXPR cases.

This is on a rather old gcc, haven't checked on bleeding-edge versions.
        regards, tom lane


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Minor codegen silliness in ExecInterpExpr()

From
Andres Freund
Date:
Hi,

On 2017-09-28 16:21:34 -0400, Tom Lane wrote:
> I noticed the following while poking around with perf:
>
>        |                             fcinfo->isnull = false;
>        |b5b:   movb   $0x0,0x1c(%rdx)
>        |                             *op->resvalue = op->d.func.fn_addr(fcinfo);
>   0.02 |       mov    0x8(%rbx),%rcx
>   1.19 |       mov    %rdx,%rdi
>   0.93 |       mov    %rdx,(%rsp)
>        |       mov    %rcx,0x8(%rsp)
>   0.01 |       callq  *0x28(%rbx)
>   2.17 |       mov    0x8(%rsp),%rcx
>        |       mov    %rax,(%rcx)
>        |                             *op->resnull = fcinfo->isnull;
>   1.18 |       mov    (%rsp),%rdx
>   4.32 |       mov    0x10(%rbx),%rax
>   0.06 |       movzbl 0x1c(%rdx),%edx
>   9.14 |       mov    %dl,(%rax)
>
> It looks to me like gcc believes it is required to evaluate "op->resvalue"
> before invoking the called function, just in case the function somehow has
> access to *op and modifies that.

Yea, the compiler probably doesn't have much choice besides assuming
that. Might be different if we'd annote function pointers as pure, and
used strict aliasing + perhaps some restrict markers, but ...


> We could save a pointless register spill
> and reload if there were a temporary variable in there, ie
>
>         EEO_CASE(EEOP_FUNCEXPR)
>         {
>             FunctionCallInfo fcinfo = op->d.func.fcinfo_data;
> +            Datum    fvalue;
>
>             fcinfo->isnull = false;
> -            *op->resvalue = op->d.func.fn_addr(fcinfo);
> +            fvalue = op->d.func.fn_addr(fcinfo);
> +            *op->resvalue = fvalue;
>             *op->resnull = fcinfo->isnull;
>
>             EEO_NEXT();
>         }
>
> and likewise in the other FUNCEXPR cases.
>
> This is on a rather old gcc, haven't checked on bleeding-edge versions.

Makes sense.  Do you want to make it so, or shall I?  I'd personally be
somewhat tempted to keep the branches in sync here...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Minor codegen silliness in ExecInterpExpr()

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2017-09-28 16:21:34 -0400, Tom Lane wrote:
>> We could save a pointless register spill
>> and reload if there were a temporary variable in there,

> Makes sense.  Do you want to make it so, or shall I?

I just finished testing a patch, as attached.  On my machine (again,
not latest gcc: 4.4.7 on RHEL6 x86_64), it reduces the code size of
execExprInterp.o by a fraction of a percent, and it seems to offer
a slight benefit in "pgbench -S" performance although I'd not put
much stock in that being reproducible.

> I'd personally be
> somewhat tempted to keep the branches in sync here...

I was tempted that way too, but it doesn't apply cleanly to v10,
because of Peter's recent cleanup of function pointer invocation
style.  I don't think it's really worth worrying about.

            regards, tom lane

diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c
index 09abd46..68a1f96 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -501,15 +501,17 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
         EEO_CASE(EEOP_INNER_SYSVAR)
         {
             int            attnum = op->d.var.attnum;
+            Datum        d;

             /* these asserts must match defenses in slot_getattr */
             Assert(innerslot->tts_tuple != NULL);
             Assert(innerslot->tts_tuple != &(innerslot->tts_minhdr));
-            /* heap_getsysattr has sufficient defenses against bad attnums */

-            *op->resvalue = heap_getsysattr(innerslot->tts_tuple, attnum,
-                                            innerslot->tts_tupleDescriptor,
-                                            op->resnull);
+            /* heap_getsysattr has sufficient defenses against bad attnums */
+            d = heap_getsysattr(innerslot->tts_tuple, attnum,
+                                innerslot->tts_tupleDescriptor,
+                                op->resnull);
+            *op->resvalue = d;

             EEO_NEXT();
         }
@@ -517,15 +519,17 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
         EEO_CASE(EEOP_OUTER_SYSVAR)
         {
             int            attnum = op->d.var.attnum;
+            Datum        d;

             /* these asserts must match defenses in slot_getattr */
             Assert(outerslot->tts_tuple != NULL);
             Assert(outerslot->tts_tuple != &(outerslot->tts_minhdr));

             /* heap_getsysattr has sufficient defenses against bad attnums */
-            *op->resvalue = heap_getsysattr(outerslot->tts_tuple, attnum,
-                                            outerslot->tts_tupleDescriptor,
-                                            op->resnull);
+            d = heap_getsysattr(outerslot->tts_tuple, attnum,
+                                outerslot->tts_tupleDescriptor,
+                                op->resnull);
+            *op->resvalue = d;

             EEO_NEXT();
         }
@@ -533,15 +537,17 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
         EEO_CASE(EEOP_SCAN_SYSVAR)
         {
             int            attnum = op->d.var.attnum;
+            Datum        d;

             /* these asserts must match defenses in slot_getattr */
             Assert(scanslot->tts_tuple != NULL);
             Assert(scanslot->tts_tuple != &(scanslot->tts_minhdr));
-            /* heap_getsysattr has sufficient defenses against bad attnums */

-            *op->resvalue = heap_getsysattr(scanslot->tts_tuple, attnum,
-                                            scanslot->tts_tupleDescriptor,
-                                            op->resnull);
+            /* heap_getsysattr has sufficient defenses against bad attnums */
+            d = heap_getsysattr(scanslot->tts_tuple, attnum,
+                                scanslot->tts_tupleDescriptor,
+                                op->resnull);
+            *op->resvalue = d;

             EEO_NEXT();
         }
@@ -641,13 +647,22 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
          * As both STRICT checks and function-usage are noticeable performance
          * wise, and function calls are a very hot-path (they also back
          * operators!), it's worth having so many separate opcodes.
+         *
+         * Note: the reason for using a temporary variable "d", here and in
+         * other places, is that some compilers think "*op->resvalue = f();"
+         * requires them to evaluate op->resvalue into a register before
+         * calling f(), just in case f() is able to modify op->resvalue
+         * somehow.  The extra line of code can save a useless register spill
+         * and reload, on architectures without many registers.
          */
         EEO_CASE(EEOP_FUNCEXPR)
         {
             FunctionCallInfo fcinfo = op->d.func.fcinfo_data;
+            Datum        d;

             fcinfo->isnull = false;
-            *op->resvalue = op->d.func.fn_addr(fcinfo);
+            d = op->d.func.fn_addr(fcinfo);
+            *op->resvalue = d;
             *op->resnull = fcinfo->isnull;

             EEO_NEXT();
@@ -658,6 +673,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
             FunctionCallInfo fcinfo = op->d.func.fcinfo_data;
             bool       *argnull = fcinfo->argnull;
             int            argno;
+            Datum        d;

             /* strict function, so check for NULL args */
             for (argno = 0; argno < op->d.func.nargs; argno++)
@@ -669,7 +685,8 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
                 }
             }
             fcinfo->isnull = false;
-            *op->resvalue = op->d.func.fn_addr(fcinfo);
+            d = op->d.func.fn_addr(fcinfo);
+            *op->resvalue = d;
             *op->resnull = fcinfo->isnull;

     strictfail:
@@ -680,11 +697,13 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
         {
             FunctionCallInfo fcinfo = op->d.func.fcinfo_data;
             PgStat_FunctionCallUsage fcusage;
+            Datum        d;

             pgstat_init_function_usage(fcinfo, &fcusage);

             fcinfo->isnull = false;
-            *op->resvalue = op->d.func.fn_addr(fcinfo);
+            d = op->d.func.fn_addr(fcinfo);
+            *op->resvalue = d;
             *op->resnull = fcinfo->isnull;

             pgstat_end_function_usage(&fcusage, true);
@@ -698,6 +717,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
             PgStat_FunctionCallUsage fcusage;
             bool       *argnull = fcinfo->argnull;
             int            argno;
+            Datum        d;

             /* strict function, so check for NULL args */
             for (argno = 0; argno < op->d.func.nargs; argno++)
@@ -712,7 +732,8 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
             pgstat_init_function_usage(fcinfo, &fcusage);

             fcinfo->isnull = false;
-            *op->resvalue = op->d.func.fn_addr(fcinfo);
+            d = op->d.func.fn_addr(fcinfo);
+            *op->resvalue = d;
             *op->resnull = fcinfo->isnull;

             pgstat_end_function_usage(&fcusage, true);
@@ -1113,6 +1134,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
             if (!op->d.iocoerce.finfo_in->fn_strict || str != NULL)
             {
                 FunctionCallInfo fcinfo_in;
+                Datum        d;

                 fcinfo_in = op->d.iocoerce.fcinfo_data_in;
                 fcinfo_in->arg[0] = PointerGetDatum(str);
@@ -1120,7 +1142,8 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
                 /* second and third arguments are already set up */

                 fcinfo_in->isnull = false;
-                *op->resvalue = FunctionCallInvoke(fcinfo_in);
+                d = FunctionCallInvoke(fcinfo_in);
+                *op->resvalue = d;

                 /* Should get null result if and only if str is NULL */
                 if (str == NULL)
@@ -1268,6 +1291,7 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)
         EEO_CASE(EEOP_ROWCOMPARE_STEP)
         {
             FunctionCallInfo fcinfo = op->d.rowcompare_step.fcinfo_data;
+            Datum        d;

             /* force NULL result if strict fn and NULL input */
             if (op->d.rowcompare_step.finfo->fn_strict &&
@@ -1279,7 +1303,8 @@ ExecInterpExpr(ExprState *state, ExprContext *econtext, bool *isnull)

             /* Apply comparison function */
             fcinfo->isnull = false;
-            *op->resvalue = op->d.rowcompare_step.fn_addr(fcinfo);
+            d = op->d.rowcompare_step.fn_addr(fcinfo);
+            *op->resvalue = d;

             /* force NULL result if NULL function result */
             if (fcinfo->isnull)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Minor codegen silliness in ExecInterpExpr()

From
Andres Freund
Date:
On 2017-09-28 18:39:03 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2017-09-28 16:21:34 -0400, Tom Lane wrote:
> >> We could save a pointless register spill
> >> and reload if there were a temporary variable in there,
> 
> > Makes sense.  Do you want to make it so, or shall I?
> 
> I just finished testing a patch, as attached.  On my machine (again,
> not latest gcc: 4.4.7 on RHEL6 x86_64), it reduces the code size of
> execExprInterp.o by a fraction of a percent, and it seems to offer
> a slight benefit in "pgbench -S" performance although I'd not put
> much stock in that being reproducible.

Cool.

> +         * Note: the reason for using a temporary variable "d", here and in
> +         * other places, is that some compilers think "*op->resvalue = f();"
> +         * requires them to evaluate op->resvalue into a register before
> +         * calling f(), just in case f() is able to modify op->resvalue
> +         * somehow.  The extra line of code can save a useless register spill
> +         * and reload, on architectures without many registers.

I'd remove the "without many registers" bit - that's really more an
functioncall ABI question (#caller vs #callee saved registers) than
about the actual architecture.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Minor codegen silliness in ExecInterpExpr()

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2017-09-28 18:39:03 -0400, Tom Lane wrote:
>> +         * Note: the reason for using a temporary variable "d", here and in
>> +         * other places, is that some compilers think "*op->resvalue = f();"
>> +         * requires them to evaluate op->resvalue into a register before
>> +         * calling f(), just in case f() is able to modify op->resvalue
>> +         * somehow.  The extra line of code can save a useless register spill
>> +         * and reload, on architectures without many registers.

> I'd remove the "without many registers" bit - that's really more an
> functioncall ABI question (#caller vs #callee saved registers) than
> about the actual architecture.

Fair enough.

I wondered how pervasive this behavior is.  AFAICS it is *not* required
by the C standard; C99 does not say that the left operand of assignment
must be evaluated first, in fact it says that the order of evaluation is
unspecified.  But the latest gcc I have at hand (6.4.1 on Fedora 25) still
does it this way.  OTOH, Apple's latest edition of clang (LLVM version
9.0.0 (clang-900.0.37)) appears to be just fine with waiting till after
the function call to load op->resvalue.  So that's not many data points,
but it does suggest that this is worth fixing, and is not just an artifact
of an old compiler version.
        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Minor codegen silliness in ExecInterpExpr()

From
Andres Freund
Date:
On 2017-09-28 20:01:37 -0400, Tom Lane wrote:
> I wondered how pervasive this behavior is.  AFAICS it is *not* required
> by the C standard; C99 does not say that the left operand of assignment
> must be evaluated first, in fact it says that the order of evaluation is
> unspecified.

FWIW, it's being specificied in C++17 ([1]), which seems to make it not
unlikely to end up in C as well.

> but it does suggest that this is worth fixing, and is not just an artifact
> of an old compiler version.

+1, I saw the same in a bleeding edge c++ version.

Greetings,

Andres Freund

[1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2016/p0145r3.pdf


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Minor codegen silliness in ExecInterpExpr()

From
Andres Freund
Date:
On 2017-09-28 20:56:57 -0700, Andres Freund wrote:
> +1, I saw the same in a bleeding edge c++ version.

err, gcc version.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers