Thread: Time to drop old-style (V0) functions?

Time to drop old-style (V0) functions?

From
Andres Freund
Date:
Hi,

I'm wondering if it's not time for $subject:
- V0 causes confusion / weird crashes when PG_FUNCTION_INFO_V1 was forgotten
- They have us keep weird hacks around just for the sake of testing V0
- they actually cost performance, because we have to zero initialize Datums, even if the corresponding isnull marker is
set.
- they allow to call arbitrary functions pretty easily

I don't see any reason to keep them around. If seriously doubt anybody
is using them seriously in anything but error.

Greetings,

Andres Freund



Re: Time to drop old-style (V0) functions?

From
Stephen Frost
Date:
Andres, all,

* Andres Freund (andres@anarazel.de) wrote:
> I'm wondering if it's not time for $subject:
> - V0 causes confusion / weird crashes when PG_FUNCTION_INFO_V1 was
>   forgotten
> - They have us keep weird hacks around just for the sake of testing V0
> - they actually cost performance, because we have to zero initialize Datums, even if
>   the corresponding isnull marker is set.
> - they allow to call arbitrary functions pretty easily
>
> I don't see any reason to keep them around. If seriously doubt anybody
> is using them seriously in anything but error.

+100

Thanks!

Stephen

Re: Time to drop old-style (V0) functions?

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> I'm wondering if it's not time for $subject:
> - V0 causes confusion / weird crashes when PG_FUNCTION_INFO_V1 was
>   forgotten
> - They have us keep weird hacks around just for the sake of testing V0
> - they actually cost performance, because we have to zero initialize Datums, even if
>   the corresponding isnull marker is set.
> - they allow to call arbitrary functions pretty easily

If by the first point you mean "assume V1 when no info function is found",
I object to that.  If you mean you want to require an info function, that
might be OK.

The habit of zero-initializing Datums has got exactly nothing to do with
V0 functions; it's about ensuring consistent results and avoiding
heisenbugs from use of uninitialized memory.  I do not think we should
drop it.
        regards, tom lane



Re: Time to drop old-style (V0) functions?

From
Andres Freund
Date:
On 2016-12-08 17:38:38 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I'm wondering if it's not time for $subject:
> > - V0 causes confusion / weird crashes when PG_FUNCTION_INFO_V1 was
> >   forgotten
> > - They have us keep weird hacks around just for the sake of testing V0
> > - they actually cost performance, because we have to zero initialize Datums, even if
> >   the corresponding isnull marker is set.
> > - they allow to call arbitrary functions pretty easily
> 
> If by the first point you mean "assume V1 when no info function is found",
> I object to that.  If you mean you want to require an info function, that
> might be OK.

I mean throwing an error.  Silently assuming V1 seems like a horrible
idea to me.  It doesn't seem unlikely that we want to introduce a new
call interface at some point given the runtime cost of the current one,
and that'd just bring back the current problem.


> The habit of zero-initializing Datums has got exactly nothing to do with
> V0 functions; it's about ensuring consistent results and avoiding
> heisenbugs from use of uninitialized memory.  I do not think we should
> drop it.

Well, V0 functions don't have a real way to get information about NULL,
and we allow non-strict V0 functions, so?

Regards,

Andres



Re: Time to drop old-style (V0) functions?

From
Andres Freund
Date:
On 2016-12-08 14:53:58 -0800, Andres Freund wrote:
> On 2016-12-08 17:38:38 -0500, Tom Lane wrote:
> > The habit of zero-initializing Datums has got exactly nothing to do with
> > V0 functions; it's about ensuring consistent results and avoiding
> > heisenbugs from use of uninitialized memory.  I do not think we should
> > drop it.
> 
> Well, V0 functions don't have a real way to get information about NULL,
> and we allow non-strict V0 functions, so?

Oh, and the regression tests fail in V0 functions if you drop that.

Andres



Re: Time to drop old-style (V0) functions?

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2016-12-08 17:38:38 -0500, Tom Lane wrote:
>> The habit of zero-initializing Datums has got exactly nothing to do with
>> V0 functions; it's about ensuring consistent results and avoiding
>> heisenbugs from use of uninitialized memory.  I do not think we should
>> drop it.

> Well, V0 functions don't have a real way to get information about NULL,
> and we allow non-strict V0 functions, so?

Non-strict V0 functions are pretty fundamentally broken, although IIRC
there was some hack whereby they could see the isnull marker for their
first argument, which is why we didn't just disallow the case.  There was
never any expectation that checking for value == 0 was an appropriate
coding method for detecting nulls, because it couldn't work for
pass-by-value data types.

Again, the point of initializing those values is not to support broken
tests for nullness.  It's to ensure consistent behavior in case of
buggy attempts to use null values.  It's much like the fact that makeNode
zero-fills new node structs: that's mostly wasted work, if you want to
look at it in a certain way, but it's good for reproducibility.
        regards, tom lane



Re: Time to drop old-style (V0) functions?

From
Andres Freund
Date:
On 2016-12-08 18:03:04 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2016-12-08 17:38:38 -0500, Tom Lane wrote:
> >> The habit of zero-initializing Datums has got exactly nothing to do with
> >> V0 functions; it's about ensuring consistent results and avoiding
> >> heisenbugs from use of uninitialized memory.  I do not think we should
> >> drop it.
> 
> > Well, V0 functions don't have a real way to get information about NULL,
> > and we allow non-strict V0 functions, so?
> 
> Non-strict V0 functions are pretty fundamentally broken, although IIRC
> there was some hack whereby they could see the isnull marker for their
> first argument, which is why we didn't just disallow the case.  There was
> never any expectation that checking for value == 0 was an appropriate
> coding method for detecting nulls, because it couldn't work for
> pass-by-value data types.

Well, we have a bunch in our regression tests ;). And I'm not saying
it's *good* that they rely on that, I think it's a reason to drop the
whole V0 interface.

(I also suspect there's a bunch in brin related to this)


> Again, the point of initializing those values is not to support broken
> tests for nullness.  It's to ensure consistent behavior in case of
> buggy attempts to use null values.

Well, it also makes such attempts undetectable. I'm not really convinced
that that's such an improvement.


Greetings,

Andres Freund



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Stephen Frost
Date:
Andres, all,

* Andres Freund (andres@anarazel.de) wrote:
> I'm wondering if it's not time for $subject:
> - V0 causes confusion / weird crashes when PG_FUNCTION_INFO_V1 was
>   forgotten
> - They have us keep weird hacks around just for the sake of testing V0
> - they actually cost performance, because we have to zero initialize Datums, even if
>   the corresponding isnull marker is set.
> - they allow to call arbitrary functions pretty easily
>
> I don't see any reason to keep them around. If seriously doubt anybody
> is using them seriously in anything but error.

+100

Thanks!

Stephen

Re: [HACKERS] Time to drop old-style (V0) functions?

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> I'm wondering if it's not time for $subject:
> - V0 causes confusion / weird crashes when PG_FUNCTION_INFO_V1 was
>   forgotten
> - They have us keep weird hacks around just for the sake of testing V0
> - they actually cost performance, because we have to zero initialize Datums, even if
>   the corresponding isnull marker is set.
> - they allow to call arbitrary functions pretty easily

If by the first point you mean "assume V1 when no info function is found",
I object to that.  If you mean you want to require an info function, that
might be OK.

The habit of zero-initializing Datums has got exactly nothing to do with
V0 functions; it's about ensuring consistent results and avoiding
heisenbugs from use of uninitialized memory.  I do not think we should
drop it.
        regards, tom lane



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Andres Freund
Date:
On 2016-12-08 17:38:38 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I'm wondering if it's not time for $subject:
> > - V0 causes confusion / weird crashes when PG_FUNCTION_INFO_V1 was
> >   forgotten
> > - They have us keep weird hacks around just for the sake of testing V0
> > - they actually cost performance, because we have to zero initialize Datums, even if
> >   the corresponding isnull marker is set.
> > - they allow to call arbitrary functions pretty easily
> 
> If by the first point you mean "assume V1 when no info function is found",
> I object to that.  If you mean you want to require an info function, that
> might be OK.

I mean throwing an error.  Silently assuming V1 seems like a horrible
idea to me.  It doesn't seem unlikely that we want to introduce a new
call interface at some point given the runtime cost of the current one,
and that'd just bring back the current problem.


> The habit of zero-initializing Datums has got exactly nothing to do with
> V0 functions; it's about ensuring consistent results and avoiding
> heisenbugs from use of uninitialized memory.  I do not think we should
> drop it.

Well, V0 functions don't have a real way to get information about NULL,
and we allow non-strict V0 functions, so?

Regards,

Andres



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Andres Freund
Date:
On 2016-12-08 14:53:58 -0800, Andres Freund wrote:
> On 2016-12-08 17:38:38 -0500, Tom Lane wrote:
> > The habit of zero-initializing Datums has got exactly nothing to do with
> > V0 functions; it's about ensuring consistent results and avoiding
> > heisenbugs from use of uninitialized memory.  I do not think we should
> > drop it.
> 
> Well, V0 functions don't have a real way to get information about NULL,
> and we allow non-strict V0 functions, so?

Oh, and the regression tests fail in V0 functions if you drop that.

Andres



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2016-12-08 17:38:38 -0500, Tom Lane wrote:
>> The habit of zero-initializing Datums has got exactly nothing to do with
>> V0 functions; it's about ensuring consistent results and avoiding
>> heisenbugs from use of uninitialized memory.  I do not think we should
>> drop it.

> Well, V0 functions don't have a real way to get information about NULL,
> and we allow non-strict V0 functions, so?

Non-strict V0 functions are pretty fundamentally broken, although IIRC
there was some hack whereby they could see the isnull marker for their
first argument, which is why we didn't just disallow the case.  There was
never any expectation that checking for value == 0 was an appropriate
coding method for detecting nulls, because it couldn't work for
pass-by-value data types.

Again, the point of initializing those values is not to support broken
tests for nullness.  It's to ensure consistent behavior in case of
buggy attempts to use null values.  It's much like the fact that makeNode
zero-fills new node structs: that's mostly wasted work, if you want to
look at it in a certain way, but it's good for reproducibility.
        regards, tom lane



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Andres Freund
Date:
On 2016-12-08 18:03:04 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2016-12-08 17:38:38 -0500, Tom Lane wrote:
> >> The habit of zero-initializing Datums has got exactly nothing to do with
> >> V0 functions; it's about ensuring consistent results and avoiding
> >> heisenbugs from use of uninitialized memory.  I do not think we should
> >> drop it.
> 
> > Well, V0 functions don't have a real way to get information about NULL,
> > and we allow non-strict V0 functions, so?
> 
> Non-strict V0 functions are pretty fundamentally broken, although IIRC
> there was some hack whereby they could see the isnull marker for their
> first argument, which is why we didn't just disallow the case.  There was
> never any expectation that checking for value == 0 was an appropriate
> coding method for detecting nulls, because it couldn't work for
> pass-by-value data types.

Well, we have a bunch in our regression tests ;). And I'm not saying
it's *good* that they rely on that, I think it's a reason to drop the
whole V0 interface.

(I also suspect there's a bunch in brin related to this)


> Again, the point of initializing those values is not to support broken
> tests for nullness.  It's to ensure consistent behavior in case of
> buggy attempts to use null values.

Well, it also makes such attempts undetectable. I'm not really convinced
that that's such an improvement.


Greetings,

Andres Freund



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Robert Haas
Date:
On Thu, Dec 8, 2016 at 5:53 PM, Andres Freund <andres@anarazel.de> wrote:
> I mean throwing an error.  Silently assuming V1 seems like a horrible
> idea to me.  It doesn't seem unlikely that we want to introduce a new
> call interface at some point given the runtime cost of the current one,
> and that'd just bring back the current problem.

I don't have a problem with getting rid of V0; I think it's an
annoyance.  But I think a much more interesting question is how to
redesign this interface.  I think V0 actually had the right idea in
passing arguments as C arguments rather than marshaling them in some
other data structure.  If the function doesn't need flinfo or context
or collation and takes a fixed number of argument each of which is
either strict or a pass-by-reference data type (where you can use
DatumGetPointer(NULL) to represent a NULL!) and either never returns
NULL or returns a pass-by-reference data type, then you don't need any
of this complicated fcinfo stuff.  You can just pass the arguments and
get back the result.  That's less work for both the caller (who
doesn't have to stick the arguments into an fcinfo) and the callee
(who doesn't have to fish them back out).  In the sortsupport
machinery, we've gone to some lengths to make it possible - at least
in the context of sorts - to get back to something closer to that kind
of interface.  But that interface goes to considerable trouble to
achieve that result, making it practical only for bulk operations.
It's kind of ironic, at least IMHO, that the DirectionFunctionCall
macros are anything but direct.  Each one is a non-inlined function
call that does a minimum of 8 variable assignments before actually
calling the function.

There obviously has to be some provision for functions that actually
need all the stuff we pass in the V1 calling convention, but there are
a huge number of functions that don't need any of it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Peter Eisentraut
Date:
On 12/9/16 7:52 AM, Robert Haas wrote:
> It's kind of ironic, at least IMHO, that the DirectionFunctionCall
> macros are anything but direct.  Each one is a non-inlined function
> call that does a minimum of 8 variable assignments before actually
> calling the function.

If this is a problem (it might be), then we can just make those calls,
er, direct C function calls to different function.  For example,
   result = DatumGetObjectId(DirectFunctionCall1(oidin,
CStringGetDatum(pro_name_or_oid)));

could just be
   result = oidin_internal(pro_name_or_oid);

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Robert Haas
Date:
On Mon, Dec 19, 2016 at 3:13 PM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 12/9/16 7:52 AM, Robert Haas wrote:
>> It's kind of ironic, at least IMHO, that the DirectionFunctionCall
>> macros are anything but direct.  Each one is a non-inlined function
>> call that does a minimum of 8 variable assignments before actually
>> calling the function.
>
> If this is a problem (it might be), then we can just make those calls,
> er, direct C function calls to different function.  For example,
>
>     result = DatumGetObjectId(DirectFunctionCall1(oidin,
>                               CStringGetDatum(pro_name_or_oid)));
>
> could just be
>
>     result = oidin_internal(pro_name_or_oid);

Yeah, true -- that works for simple cases, and might be beneficial
where you're doing lots and lots of calls in a tight loop.

For the more general problem, I wonder if we should try to figure out
something where we have one calling convention for "simple" functions
(those that little or none of the information in fcinfo and can pretty
much just take their SQL args as C args) and another for "complex"
functions (where we do the full push-ups).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Andres Freund
Date:
On 2016-12-19 15:25:42 -0500, Robert Haas wrote:
> On Mon, Dec 19, 2016 at 3:13 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> > On 12/9/16 7:52 AM, Robert Haas wrote:
> >> It's kind of ironic, at least IMHO, that the DirectionFunctionCall
> >> macros are anything but direct.  Each one is a non-inlined function
> >> call that does a minimum of 8 variable assignments before actually
> >> calling the function.
> >
> > If this is a problem (it might be), then we can just make those calls,
> > er, direct C function calls to different function.  For example,
> >
> >     result = DatumGetObjectId(DirectFunctionCall1(oidin,
> >                               CStringGetDatum(pro_name_or_oid)));
> >
> > could just be
> >
> >     result = oidin_internal(pro_name_or_oid);
>
> Yeah, true -- that works for simple cases, and might be beneficial
> where you're doing lots and lots of calls in a tight loop.
>
> For the more general problem, I wonder if we should try to figure out
> something where we have one calling convention for "simple" functions
> (those that little or none of the information in fcinfo and can pretty
> much just take their SQL args as C args) and another for "complex"
> functions (where we do the full push-ups).

Unfortunately I think so would likely also increase overhead in a bunch
of places, because suddenly we need branches determining which callling
convention is in use. There's a fair amount of places, some of them
performance sensitive, that deal with functions that can get different
number of arguments. Prominently nodeAgg.c's transition functions for
example.

When JITing expressions that doesn't matter, because we avoid doing so
repetitively. But that's not really sufficient imo, non JITed
performance matters a lot too.

There's imo primarily two parts that make our calling convention
expensive:
a) Additional instructions required to push to/from fcinfo->arg[null],  including stack spilling required to have space
forthe arguments.
 
b) Increased number of cachelines touched, even for even trivial  functions. We need to read/write to at least five:
1)fcinfo->isnull to reset it  2) fcinfo->arg[0...] to set the argument  3) fcinfo->argnull[0...] to set argnull
(separatecacheline)  4) fcinfo->flinfo->fn_addr to get the actual address to call  5) instruction cache miss for the
function'scontents
 

We can doctor around this some. A retively easy way to reduce the
overhead of a similar function call interface would be to restructure
things so we have something like:

typedef struct FunctionArg2
{       Datum arg;       bool argnull;
} FunctionArg2;

typedef struct FunctionCallInfoData2
{       /* cached function address from ->flinfo */       PGFunction      fn_addr;       /* ptr to lookup info used for
thiscall */       FmgrInfo       *flinfo;       /* function must set true if result is NULL */       bool
isnull;      /* # arguments actually passed */       short           nargs;       /* collation for function to use */
   Oid             fncollation;    /* collation for function to use */       /* pass or return extra info about result
*/      fmNodePtr       resultinfo;       /* array big enough to fit flinfo->fn_nargs */       FunctionArg2
args[FLEXIBLE_ARRAY_MEMBER];
} FunctionCallInfoData2;

That'd mean some code changes because accessing arguments can't be done
quite as now, but it'd be fairly minor.  We'd end up with a lot fewer
cache misses for common cases, because there's no need to fetch fn_addr,
and because the first two arg/argnull pairs still fit within the first
cacheline (which they never can in the current interface!).

But we'd still be passing arguments via memory, instead of using
registers.

I think a more efficient variant would make the function signature look
something like:

Datum /* directly returned argument */
pgfunc(       /* extra information about function call */       FunctionCallInfo *fcinfo,       /* bitmap of NULL
arguments*/       uint64_t nulls,       /* first argument */       Datum arg0,       /* second argument */       Datum
arg1,      /* returned NULL */       bool *isnull
 
);

with additional arguments passed via fcinfo as currently done. Since
most of the performance critical function calls only have two arguments
that should allow to keep usage of fcinfo-> to the cases where we need
the extra information.  On 64bit linuxes the above will be entirely in
registers, on 64bit windows isnull would be placed on the stack.

I don't quite see how we could avoid stack usage at all for windows, any
of the above arguments seems important.

There'd be a need for some trickery to make PG_GETARG_DATUM() work
efficiently - afaics the argument numbers passed to PG_GETARG_*() are
always constants, so that shouldn't be too bad.

Regards,

Andres



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Pavel Stehule
Date:


2016-12-20 9:11 GMT+01:00 Andres Freund <andres@anarazel.de>:
On 2016-12-19 15:25:42 -0500, Robert Haas wrote:
> On Mon, Dec 19, 2016 at 3:13 PM, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> > On 12/9/16 7:52 AM, Robert Haas wrote:
> >> It's kind of ironic, at least IMHO, that the DirectionFunctionCall
> >> macros are anything but direct.  Each one is a non-inlined function
> >> call that does a minimum of 8 variable assignments before actually
> >> calling the function.
> >
> > If this is a problem (it might be), then we can just make those calls,
> > er, direct C function calls to different function.  For example,
> >
> >     result = DatumGetObjectId(DirectFunctionCall1(oidin,
> >                               CStringGetDatum(pro_name_or_oid)));
> >
> > could just be
> >
> >     result = oidin_internal(pro_name_or_oid);
>
> Yeah, true -- that works for simple cases, and might be beneficial
> where you're doing lots and lots of calls in a tight loop.
>
> For the more general problem, I wonder if we should try to figure out
> something where we have one calling convention for "simple" functions
> (those that little or none of the information in fcinfo and can pretty
> much just take their SQL args as C args) and another for "complex"
> functions (where we do the full push-ups).

Unfortunately I think so would likely also increase overhead in a bunch
of places, because suddenly we need branches determining which callling
convention is in use. There's a fair amount of places, some of them
performance sensitive, that deal with functions that can get different
number of arguments. Prominently nodeAgg.c's transition functions for
example.

When JITing expressions that doesn't matter, because we avoid doing so
repetitively. But that's not really sufficient imo, non JITed
performance matters a lot too.

There's imo primarily two parts that make our calling convention
expensive:
a) Additional instructions required to push to/from fcinfo->arg[null],
   including stack spilling required to have space for the arguments.
b) Increased number of cachelines touched, even for even trivial
   functions. We need to read/write to at least five:
   1) fcinfo->isnull to reset it
   2) fcinfo->arg[0...] to set the argument
   3) fcinfo->argnull[0...] to set argnull (separate cacheline)
   4) fcinfo->flinfo->fn_addr to get the actual address to call
   5) instruction cache miss for the function's contents

We can doctor around this some. A retively easy way to reduce the
overhead of a similar function call interface would be to restructure
things so we have something like:

typedef struct FunctionArg2
{
        Datum arg;
        bool argnull;
} FunctionArg2;

typedef struct FunctionCallInfoData2
{
        /* cached function address from ->flinfo */
        PGFunction      fn_addr;
        /* ptr to lookup info used for this call */
        FmgrInfo       *flinfo;
        /* function must set true if result is NULL */
        bool            isnull;
        /* # arguments actually passed */
        short           nargs;
        /* collation for function to use */
        Oid             fncollation;    /* collation for function to use */
        /* pass or return extra info about result */
        fmNodePtr       resultinfo;
        /* array big enough to fit flinfo->fn_nargs */
        FunctionArg2    args[FLEXIBLE_ARRAY_MEMBER];
} FunctionCallInfoData2;

That'd mean some code changes because accessing arguments can't be done
quite as now, but it'd be fairly minor.  We'd end up with a lot fewer
cache misses for common cases, because there's no need to fetch fn_addr,
and because the first two arg/argnull pairs still fit within the first
cacheline (which they never can in the current interface!).

But we'd still be passing arguments via memory, instead of using
registers.

I think a more efficient variant would make the function signature look
something like:

Datum /* directly returned argument */
pgfunc(
        /* extra information about function call */
        FunctionCallInfo *fcinfo,
        /* bitmap of NULL arguments */
        uint64_t nulls,
        /* first argument */
        Datum arg0,
        /* second argument */
        Datum arg1,
        /* returned NULL */
        bool *isnull
);

with additional arguments passed via fcinfo as currently done. Since
most of the performance critical function calls only have two arguments
that should allow to keep usage of fcinfo-> to the cases where we need
the extra information.  On 64bit linuxes the above will be entirely in
registers, on 64bit windows isnull would be placed on the stack.

I don't quite see how we could avoid stack usage at all for windows, any
of the above arguments seems important.

There'd be a need for some trickery to make PG_GETARG_DATUM() work
efficiently - afaics the argument numbers passed to PG_GETARG_*() are
always constants, so that shouldn't be too bad.

In this case some benchmark can be very important (and interesting). I am not sure if faster function execution has significant benefit on Vulcano like executor.

Regards

Pavel
 

Regards,

Andres


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Time to drop old-style (V0) functions?

From
Andres Freund
Date:
On 2016-12-20 09:59:43 +0100, Pavel Stehule wrote:
> In this case some benchmark can be very important (and interesting). I am
> not sure if faster function execution has significant benefit on Vulcano
> like executor.

It's fairly to see function calls as significant overhead. In fact, I
moved things *away* from a pure Vulcano style executor, and the benefits
weren't huge, primarily due to expression evaluation overhead (of which
function call overhead is one part).  After JITing of expressions, it
becomes even more noticeable, because the overhead of the expression
evaluation is reduced.

I don't quite see why a Vulcano style executor should make function call
overhead meaningless?

Regards,

Andres



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Andres Freund
Date:
On 2016-12-20 01:14:10 -0800, Andres Freund wrote:
> On 2016-12-20 09:59:43 +0100, Pavel Stehule wrote:
> > In this case some benchmark can be very important (and interesting). I am
> > not sure if faster function execution has significant benefit on Vulcano
> > like executor.
>
> It's fairly to see function calls as significant overhead. In fact, I
> moved things *away* from a pure Vulcano style executor, and the benefits
> weren't huge, primarily due to expression evaluation overhead (of which
> function call overhead is one part).  After JITing of expressions, it
> becomes even more noticeable, because the overhead of the expression
> evaluation is reduced.

As an example, here's a JITed TPCH Q1 profile:
+   15.48%  postgres  postgres                          [.] slot_deform_tuple
+    8.42%  postgres  perf-27760.map                    [.] evalexpr90
+    5.98%  postgres  postgres                          [.] float8_accum
+    4.63%  postgres  postgres                          [.] slot_getattr
+    3.69%  postgres  postgres                          [.] bpchareq
+    3.39%  postgres  postgres                          [.] heap_getnext
+    3.22%  postgres  postgres                          [.] float8pl
+    2.86%  postgres  postgres                          [.] TupleHashTableMatch.isra.7
+    2.77%  postgres  postgres                          [.] hashbpchar
+    2.77%  postgres  postgres                          [.] float8mul
+    2.73%  postgres  postgres                          [.] ExecAgg
+    2.40%  postgres  postgres                          [.] hash_any
+    2.34%  postgres  postgres                          [.] MemoryContextReset
+    1.98%  postgres  postgres                          [.] pg_detoast_datum_packed

evalexpr90 is the expression that does the aggregate transition
function. float8_accum, bpchareq, float8pl , float8mul, ...  are all
function calls, and a good percentage of the overhead in evalexpr90 is
pushing arguments onto fcinfo->arg[nulls].

Greetings,

Andres Freund



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Pavel Stehule
Date:


2016-12-20 10:28 GMT+01:00 Andres Freund <andres@anarazel.de>:
On 2016-12-20 01:14:10 -0800, Andres Freund wrote:
> On 2016-12-20 09:59:43 +0100, Pavel Stehule wrote:
> > In this case some benchmark can be very important (and interesting). I am
> > not sure if faster function execution has significant benefit on Vulcano
> > like executor.
>
> It's fairly to see function calls as significant overhead. In fact, I
> moved things *away* from a pure Vulcano style executor, and the benefits
> weren't huge, primarily due to expression evaluation overhead (of which
> function call overhead is one part).  After JITing of expressions, it
> becomes even more noticeable, because the overhead of the expression
> evaluation is reduced.

For me a Vulcano style is row by row processing. Using JIT or not using has not significant impact.

Interesting change can be block level processing.
 

As an example, here's a JITed TPCH Q1 profile:
+   15.48%  postgres  postgres                          [.] slot_deform_tuple
+    8.42%  postgres  perf-27760.map                    [.] evalexpr90
+    5.98%  postgres  postgres                          [.] float8_accum
+    4.63%  postgres  postgres                          [.] slot_getattr
+    3.69%  postgres  postgres                          [.] bpchareq
+    3.39%  postgres  postgres                          [.] heap_getnext
+    3.22%  postgres  postgres                          [.] float8pl
+    2.86%  postgres  postgres                          [.] TupleHashTableMatch.isra.7
+    2.77%  postgres  postgres                          [.] hashbpchar
+    2.77%  postgres  postgres                          [.] float8mul
+    2.73%  postgres  postgres                          [.] ExecAgg
+    2.40%  postgres  postgres                          [.] hash_any
+    2.34%  postgres  postgres                          [.] MemoryContextReset
+    1.98%  postgres  postgres                          [.] pg_detoast_datum_packed


Our bottle neck is row format and row related processing.
 
evalexpr90 is the expression that does the aggregate transition
function. float8_accum, bpchareq, float8pl , float8mul, ...  are all
function calls, and a good percentage of the overhead in evalexpr90 is
pushing arguments onto fcinfo->arg[nulls].

Greetings,

Andres Freund

Re: [HACKERS] Time to drop old-style (V0) functions?

From
Andres Freund
Date:
On 2016-12-20 10:44:35 +0100, Pavel Stehule wrote:
> 2016-12-20 10:28 GMT+01:00 Andres Freund <andres@anarazel.de>:
>
> > On 2016-12-20 01:14:10 -0800, Andres Freund wrote:
> > > On 2016-12-20 09:59:43 +0100, Pavel Stehule wrote:
> > > > In this case some benchmark can be very important (and interesting). I
> > am
> > > > not sure if faster function execution has significant benefit on
> > Vulcano
> > > > like executor.
> > >
> > > It's fairly to see function calls as significant overhead. In fact, I
> > > moved things *away* from a pure Vulcano style executor, and the benefits
> > > weren't huge, primarily due to expression evaluation overhead (of which
> > > function call overhead is one part).  After JITing of expressions, it
> > > becomes even more noticeable, because the overhead of the expression
> > > evaluation is reduced.
> >
>
> For me a Vulcano style is row by row processing. Using JIT or not using has
> not significant impact.
>
> Interesting change can be block level processing.

I don't think that's true. The largest bottlenecks atm have relatively
little to do with block level processing. I know, because I went
there. We have so many other bottlenecks that row-by-row processing
vanishes behind them.  Without changing the tuple flow, the performance
with either applied or posted patches for TPCH-Q1 already went up by
more than a factor of 2.5.

Anyway, this seems like a side-track discussion, better done in another
thread.

Andres



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Robert Haas
Date:
On Tue, Dec 20, 2016 at 3:11 AM, Andres Freund <andres@anarazel.de> wrote:
> I think a more efficient variant would make the function signature look
> something like:
>
> Datum /* directly returned argument */
> pgfunc(
>         /* extra information about function call */
>         FunctionCallInfo *fcinfo,
>         /* bitmap of NULL arguments */
>         uint64_t nulls,
>         /* first argument */
>         Datum arg0,
>         /* second argument */
>         Datum arg1,
>         /* returned NULL */
>         bool *isnull
> );

Yeah, that's kind of nice.  I like the uint64 for nulls, although
FUNC_MAX_ARGS > 64 by default and certainly can be configured that
way.  It wouldn't be a problem for any common cases, of course.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Andres Freund
Date:
On 2016-12-20 08:15:07 -0500, Robert Haas wrote:
> On Tue, Dec 20, 2016 at 3:11 AM, Andres Freund <andres@anarazel.de> wrote:
> > I think a more efficient variant would make the function signature look
> > something like:
> >
> > Datum /* directly returned argument */
> > pgfunc(
> >         /* extra information about function call */
> >         FunctionCallInfo *fcinfo,
> >         /* bitmap of NULL arguments */
> >         uint64_t nulls,
> >         /* first argument */
> >         Datum arg0,
> >         /* second argument */
> >         Datum arg1,
> >         /* returned NULL */
> >         bool *isnull
> > );
> 
> Yeah, that's kind of nice.  I like the uint64 for nulls, although
> FUNC_MAX_ARGS > 64 by default and certainly can be configured that
> way.  It wouldn't be a problem for any common cases, of course.

Well, I suspect we'd have to change that then. Is anybody seriously
interested in supporting FUNC_MAX_ARGS > 64? We don't have to make our
life hard by supporting useless features... I'd even question using
64bit for this on 32bit platforms.

Andres



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Robert Haas
Date:
On Tue, Dec 20, 2016 at 8:21 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-12-20 08:15:07 -0500, Robert Haas wrote:
>> On Tue, Dec 20, 2016 at 3:11 AM, Andres Freund <andres@anarazel.de> wrote:
>> > I think a more efficient variant would make the function signature look
>> > something like:
>> >
>> > Datum /* directly returned argument */
>> > pgfunc(
>> >         /* extra information about function call */
>> >         FunctionCallInfo *fcinfo,
>> >         /* bitmap of NULL arguments */
>> >         uint64_t nulls,
>> >         /* first argument */
>> >         Datum arg0,
>> >         /* second argument */
>> >         Datum arg1,
>> >         /* returned NULL */
>> >         bool *isnull
>> > );
>>
>> Yeah, that's kind of nice.  I like the uint64 for nulls, although
>> FUNC_MAX_ARGS > 64 by default and certainly can be configured that
>> way.  It wouldn't be a problem for any common cases, of course.
>
> Well, I suspect we'd have to change that then. Is anybody seriously
> interested in supporting FUNC_MAX_ARGS > 64? We don't have to make our
> life hard by supporting useless features... I'd even question using
> 64bit for this on 32bit platforms.

Advanced Server uses 256, and we had a customer complain recently that
256 wasn't high enough.  So apparently the use case for functions with
ridiculous numbers of arguments isn't exactly 0.  For most people it's
not a big problem in practice, but actually I think that it's pretty
lame that we have a limit at all.  I mean, there's no reason that we
can't use dynamic allocation here.  If we put the first, say, 8
arguments in the FunctionCallInfoData itself and then separately
allocated space any extras, the structure could be a lot smaller
without needing an arbitrary limit on the argument count.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Andres Freund
Date:
On 2016-12-20 08:35:05 -0500, Robert Haas wrote:
> On Tue, Dec 20, 2016 at 8:21 AM, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-12-20 08:15:07 -0500, Robert Haas wrote:
> >> On Tue, Dec 20, 2016 at 3:11 AM, Andres Freund <andres@anarazel.de> wrote:
> >> > I think a more efficient variant would make the function signature look
> >> > something like:
> >> >
> >> > Datum /* directly returned argument */
> >> > pgfunc(
> >> >         /* extra information about function call */
> >> >         FunctionCallInfo *fcinfo,
> >> >         /* bitmap of NULL arguments */
> >> >         uint64_t nulls,
> >> >         /* first argument */
> >> >         Datum arg0,
> >> >         /* second argument */
> >> >         Datum arg1,
> >> >         /* returned NULL */
> >> >         bool *isnull
> >> > );
> >>
> >> Yeah, that's kind of nice.  I like the uint64 for nulls, although
> >> FUNC_MAX_ARGS > 64 by default and certainly can be configured that
> >> way.  It wouldn't be a problem for any common cases, of course.
> >
> > Well, I suspect we'd have to change that then. Is anybody seriously
> > interested in supporting FUNC_MAX_ARGS > 64? We don't have to make our
> > life hard by supporting useless features... I'd even question using
> > 64bit for this on 32bit platforms.
>
> Advanced Server uses 256, and we had a customer complain recently that
> 256 wasn't high enough.  So apparently the use case for functions with
> ridiculous numbers of arguments isn't exactly 0.

Well, there's a cost/benefit analysis involved here, as in a lot of
other places. There's a lot of things one can conceive a use-case for,
doesn't mean it's worth catering for all of them.


> I mean, there's no reason that we can't use dynamic allocation here.
> If we put the first, say, 8 arguments in the FunctionCallInfoData
> itself and then separately allocated space any extras, the structure
> could be a lot smaller without needing an arbitrary limit on the
> argument count.

Except that that'll mean additional overhead during evaluation of
function arguments, an overhead that everyone will have to pay. Suddenly
you need two loops that fill in arguments, depending on their
argument-number (i.e. additional branches in a thight spot).  And not
being able to pass the null bitmap via an register argument also costs
everyone.

Andres



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Robert Haas
Date:
On Tue, Dec 20, 2016 at 8:44 AM, Andres Freund <andres@anarazel.de> wrote:
>> Advanced Server uses 256, and we had a customer complain recently that
>> 256 wasn't high enough.  So apparently the use case for functions with
>> ridiculous numbers of arguments isn't exactly 0.
>
> Well, there's a cost/benefit analysis involved here, as in a lot of
> other places. There's a lot of things one can conceive a use-case for,
> doesn't mean it's worth catering for all of them.

Clearly true.

>> I mean, there's no reason that we can't use dynamic allocation here.
>> If we put the first, say, 8 arguments in the FunctionCallInfoData
>> itself and then separately allocated space any extras, the structure
>> could be a lot smaller without needing an arbitrary limit on the
>> argument count.
>
> Except that that'll mean additional overhead during evaluation of
> function arguments, an overhead that everyone will have to pay. Suddenly
> you need two loops that fill in arguments, depending on their
> argument-number (i.e. additional branches in a thight spot).  And not
> being able to pass the null bitmap via an register argument also costs
> everyone.

Well, I'm hoping there is a way to have a fast-path and a slow-path
without slowing things down too much.  You're proposing something like
that anyway, because you're proposing to put the first two arguments
in actual function arguments and the rest someplace else.  I wouldn't
be surprised if 99% of function calls had 2 arguments or less because
most people probably call functions mostly or entirely as operators
and even user-defined functions don't necessarily have lots of
arguments.  But we wouldn't propose restricting all function calls to
2 arguments just so +(int4, int4) can be fast.  There's clearly got to
be a slow path for calls with more than 2 arguments and, if that's the
case, I don't see why there can't be a really-slow path, perhaps
guarded by unlikely(), for cases involving more than 8 or 16 or
whatever.  I find it really hard to believe that in 2016 that we can't
find a way to make simple things fast and complicated things possible.
Saying we can't make SQL-callable functions with large number of
arguments work feels to me like saying that we have to go back to
8-character filenames with 3-character extensions for efficiency - in
the 1980s, that argument might have held some water, but nobody buys
it any more.  Human time is worth more than computer time, and humans
save time when programs don't impose inconveniently-small limits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Well, I'm hoping there is a way to have a fast-path and a slow-path
> without slowing things down too much.

Seems like this discussion has veered way off into the weeds.
I suggest we confine it to the stated topic; if you want to discuss
ways to optimize V1 protocol (or invent a V2, which some of this
sounds like it would need), that should be a distinct thread.
        regards, tom lane



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Andres Freund
Date:
On 2016-12-08 13:34:41 -0800, Andres Freund wrote:
> Hi,
> 
> I'm wondering if it's not time for $subject:
> - V0 causes confusion / weird crashes when PG_FUNCTION_INFO_V1 was
>   forgotten
> - They have us keep weird hacks around just for the sake of testing V0
> - they actually cost performance, because we have to zero initialize Datums, even if
>   the corresponding isnull marker is set.
> - they allow to call arbitrary functions pretty easily
> 
> I don't see any reason to keep them around. If seriously doubt anybody
> is using them seriously in anything but error.

Patches attached.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Time to drop old-style (V0) functions?

From
Andres Freund
Date:
On 2017-02-28 23:15:15 -0800, Andres Freund wrote:
> On 2016-12-08 13:34:41 -0800, Andres Freund wrote:
> > Hi,
> > 
> > I'm wondering if it's not time for $subject:
> > - V0 causes confusion / weird crashes when PG_FUNCTION_INFO_V1 was
> >   forgotten
> > - They have us keep weird hacks around just for the sake of testing V0
> > - they actually cost performance, because we have to zero initialize Datums, even if
> >   the corresponding isnull marker is set.
> > - they allow to call arbitrary functions pretty easily
> > 
> > I don't see any reason to keep them around. If seriously doubt anybody
> > is using them seriously in anything but error.
> 
> Patches attached.

One unaddressed question in those patches is what we do with
src/backend/utils/fmgr/README - I'm not quite sure what its purpose is,
in its current state.  If we want to keep it, we'd probably have to
pretty aggressively revise it?

- Andres



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Peter Eisentraut
Date:
On 3/1/17 02:22, Andres Freund wrote:
> One unaddressed question in those patches is what we do with
> src/backend/utils/fmgr/README - I'm not quite sure what its purpose is,
> in its current state.  If we want to keep it, we'd probably have to
> pretty aggressively revise it?

Some of the information in there, such as the use of the FmgrInfo and
FunctionCallInfoData structs, doesn't seem to appear anywhere else in
that amount of detail, so it would be a loss to just delete the file, I
think.  Perhaps just lightly editing out the "I propose to do this" tone
would work.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2016-12-08 13:34:41 -0800, Andres Freund wrote:
>> Hi,
>> 
>> I'm wondering if it's not time for $subject:
>> - V0 causes confusion / weird crashes when PG_FUNCTION_INFO_V1 was
>> forgotten
>> - They have us keep weird hacks around just for the sake of testing V0
>> - they actually cost performance, because we have to zero initialize Datums, even if
>> the corresponding isnull marker is set.
>> - they allow to call arbitrary functions pretty easily
>> 
>> I don't see any reason to keep them around. If seriously doubt anybody
>> is using them seriously in anything but error.

I find these arguments pretty weak.  In particular I don't buy your claim
that we could stop zero-initializing Datums, because I think we'd just
start getting valgrind complaints if we did.  It's too common to copy both
a Datum and its isnull flag without any particular inquiry into whether
the datum is null.
        regards, tom lane



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Andres Freund
Date:
On 2017-03-01 11:18:34 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2016-12-08 13:34:41 -0800, Andres Freund wrote:
> >> Hi,
> >> 
> >> I'm wondering if it's not time for $subject:
> >> - V0 causes confusion / weird crashes when PG_FUNCTION_INFO_V1 was
> >> forgotten
> >> - They have us keep weird hacks around just for the sake of testing V0
> >> - they actually cost performance, because we have to zero initialize Datums, even if
> >> the corresponding isnull marker is set.
> >> - they allow to call arbitrary functions pretty easily
> >> 
> >> I don't see any reason to keep them around. If seriously doubt anybody
> >> is using them seriously in anything but error.
> 
> I find these arguments pretty weak.  In particular I don't buy your claim
> that we could stop zero-initializing Datums, because I think we'd just
> start getting valgrind complaints if we did.

I tink we've litigated that in a side-thread - I'm not planning to change
any policy around this. Perhaps I shouldn't have quoted the original
start of the thread...

At this point I primarily am concerned about
a) the way they're confusing for authors, by causing spurious crashes
b) at some point not too far away in the future I want to introduce a  faster interface, and I don't want unnecessarily
manyaround.
 

Greetings,

Andres Freund



Re: [HACKERS] Time to drop old-style (V0) functions?

From
Peter Eisentraut
Date:
I think we have consensus to go ahead with this, and the patches are
mostly mechanical, so I only have a few comments on style and one
possible bug below:


0001-Move-contrib-seg-to-only-use-V1-calling-conventions.patch

 static int     restore(char *s, float val, int n);

-
 /*****************************************************************************
  * Input/Output functions
  *****************************************************************************/

+

doesn't seem like the right whitespace change


@@ -359,13 +360,14 @@ gseg_picksplit(GistEntryVector *entryvec,
        /*
         * Emit segments to the left output page, and compute its bounding box.
         */
-       datum_l = (SEG *) palloc(sizeof(SEG));
-       memcpy(datum_l, sort_items[0].data, sizeof(SEG));
+       datum_l = PointerGetDatum(palloc(sizeof(SEG)));
+       memcpy(DatumGetPointer(datum_l), sort_items[0].data, sizeof(SEG));

There would be a little bit less back-and-forth here if you kept
datum_l and datum_r of type SEG *.


                case RTOverlapStrategyNumber:
-                       retval = (bool) seg_overlap(key, query);
+                       retval =
+                               !DatumGetBool(DirectFunctionCall2(seg_overlap, key, query));
                        break;

Accidentally flipped the logic?


-bool
-seg_contains(SEG *a, SEG *b)
+Datum
+seg_contains(PG_FUNCTION_ARGS)
 {
-       return ((a->lower <= b->lower) && (a->upper >= b->upper));
+       SEG                *a = (SEG *) PG_GETARG_POINTER(0);
+       SEG                *b = (SEG *) PG_GETARG_POINTER(1);
+
+       PG_RETURN_BOOL((a->lower <= b->lower) && (a->upper >= b->upper));
 }

-bool
-seg_contained(SEG *a, SEG *b)
+Datum
+seg_contained(PG_FUNCTION_ARGS)
 {
-       return (seg_contains(b, a));
+       PG_RETURN_DATUM(
+                                       DirectFunctionCall2(seg_contains,
+                                                                               PG_GETARG_DATUM(1),
+                                                                               PG_GETARG_DATUM(0)));
 }

Maybe in seg_contained also assign the arguments to a and b, so it's
easier to see the relationship between contains and contained.


-bool
-seg_same(SEG *a, SEG *b)
+Datum
+seg_same(PG_FUNCTION_ARGS)
 {
-       return seg_cmp(a, b) == 0;
+       Datum           cmp =
+       DirectFunctionCall2(seg_cmp, PG_GETARG_DATUM(0), PG_GETARG_DATUM(1));
+
+       PG_RETURN_BOOL(DatumGetInt32(cmp) == 0);
 }

I would write this as

int32 cmp = DatumGetInt32(DirectFunctionCall2(seg_cmp, PG_GETARG_DATUM(0), PG_GETARG_DATUM(1));

Having a variable of type Datum is a bit meaningless.

Oh, I see you did it that way later in seg_lt() etc., so same here.


0002-Remove-support-for-version-0-calling-conventions.patch

diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
index 255bfddad7..cd41b89136 100644
--- a/doc/src/sgml/xfunc.sgml
+++ b/doc/src/sgml/xfunc.sgml
@@ -675,7 +675,7 @@ CREATE FUNCTION mleast(VARIADIC arr numeric[]) RETURNS numeric AS $$
 $$ LANGUAGE SQL;

 SELECT mleast(10, -1, 5, 4.4);
- mleast
+ mleast
 --------
      -1
 (1 row)

These changes are neither right nor wrong, but we have had previous
discussions about this and settled on leaving the whitespace as psql
outputs it.  In any case it seems unrelated here.


    <para>
-    Two different calling conventions are currently used for C functions.
-    The newer <quote>version 1</quote> calling convention is indicated by writing
-    a <literal>PG_FUNCTION_INFO_V1()</literal> macro call for the function,
-    as illustrated below.  Lack of such a macro indicates an old-style
-    (<quote>version 0</quote>) function.  The language name specified in <command>CREATE FUNCTION</command>
-    is <literal>C</literal> in either case.  Old-style functions are now deprecated
-    because of portability problems and lack of functionality, but they
-    are still supported for compatibility reasons.
+
+    Currently only one calling convention is used for C functions
+    (<quote>version 1</quote>). Support for that calling convention is
+    indicated by writing a <literal>PG_FUNCTION_INFO_V1()</literal> macro
+    call for the function, as illustrated below.
    </para>

extra blank line


@@ -1655,8 +1652,8 @@ CREATE FUNCTION square_root(double precision) RETURNS double precision
       <para>
        If the name starts with the string <literal>$libdir</literal>,
        that part is replaced by the <productname>PostgreSQL</> package
-        library directory
-       name, which is determined at build time.<indexterm><primary>$libdir</></>
+       library directory name, which is determined at build time.
+       <indexterm><primary>$libdir</></>
       </para>
      </listitem>

unrelated?


@@ -455,9 +394,12 @@ fetch_finfo_record(void *filehandle, char *funcname)

infofuncname);
        if (infofunc == NULL)
        {
-               /* Not found --- assume version 0 */
-               pfree(infofuncname);
-               return &default_inforec;
+               ereport(ERROR,
+                               (errcode(ERRCODE_UNDEFINED_FUNCTION),
+                                errmsg("could not find function information for function \"%s\"",
+                                               funcname),
+                                errhint("SQL-callable functions need an accompanying
PG_FUNCTION_INFO_V1(funcname).")));
+               return NULL; /* silence compiler */
        }

        /* Found, so call it */

Perhaps plug in the actual C function name into the error message, like

    errhint("SQL-callable functions need an accompanying PG_FUNCTION_INFO_V1(%s).", funcname)


@@ -270,14 +269,16 @@ widget_in(char *str)
        result->center.y = atof(coord[1]);
        result->radius = atof(coord[2]);

-       return result;
+       PG_RETURN_DATUM(PointerGetDatum(result));
 }

Could be PG_RETURN_POINTER().


Attached is a patch for src/backend/utils/fmgr/README that edits out the
transitional comments and just keeps the parts still relevant.


Tests pass for me.  So this is mostly ready to go AFAICS.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: Time to drop old-style (V0) functions?

From
Craig Ringer
Date:
On 7 March 2017 at 22:50, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> I think we have consensus to go ahead with this, and the patches are
> mostly mechanical, so I only have a few comments on style and one
> possible bug below:
>
>
> 0001-Move-contrib-seg-to-only-use-V1-calling-conventions.patch
>
>  static int     restore(char *s, float val, int n);
>
> -
>  /*****************************************************************************
>   * Input/Output functions
>   *****************************************************************************/
>
> +
>
> doesn't seem like the right whitespace change

Fixed.

> @@ -359,13 +360,14 @@ gseg_picksplit(GistEntryVector *entryvec,
>         /*
>          * Emit segments to the left output page, and compute its bounding box.
>          */
> -       datum_l = (SEG *) palloc(sizeof(SEG));
> -       memcpy(datum_l, sort_items[0].data, sizeof(SEG));
> +       datum_l = PointerGetDatum(palloc(sizeof(SEG)));
> +       memcpy(DatumGetPointer(datum_l), sort_items[0].data, sizeof(SEG));
>
> There would be a little bit less back-and-forth here if you kept
> datum_l and datum_r of type SEG *.

Also, currently it does:

    v->spl_ldatum = PointerGetDatum(datum_l);
    v->spl_rdatum = PointerGetDatum(datum_r);

even though they're already Datum.

Downside of keeping them as SEG is we land up with:

        seg_l = DatumGetPointer(DirectFunctionCall2(seg_union,
                                                    PointerGetDatum(datum_l),

PointerGetDatum(sort_items[i].data)));

but at least it's tied to the fmgr call.

>
>                 case RTOverlapStrategyNumber:
> -                       retval = (bool) seg_overlap(key, query);
> +                       retval =
> +                               !DatumGetBool(DirectFunctionCall2(seg_overlap, key, query));
>                         break;
>
> Accidentally flipped the logic?

Looks like it. I don't see a reason for it; there's no corresponding
change around seg_overlap and the other callsite isn't negated:

         case RTOverlapStrategyNumber:
-            retval = (bool) seg_overlap(key, query);
+            retval = DirectFunctionCall2(seg_overlap, key, query);

so I'd say copy-pasteo, given nearby negated bools.

Fixed. Didn't find any other cases.

> -bool
> -seg_contains(SEG *a, SEG *b)
> +Datum
> +seg_contains(PG_FUNCTION_ARGS)
>  {
> -       return ((a->lower <= b->lower) && (a->upper >= b->upper));
> +       SEG                *a = (SEG *) PG_GETARG_POINTER(0);
> +       SEG                *b = (SEG *) PG_GETARG_POINTER(1);
> +
> +       PG_RETURN_BOOL((a->lower <= b->lower) && (a->upper >= b->upper));
>  }
>
> -bool
> -seg_contained(SEG *a, SEG *b)
> +Datum
> +seg_contained(PG_FUNCTION_ARGS)
>  {
> -       return (seg_contains(b, a));
> +       PG_RETURN_DATUM(
> +                                       DirectFunctionCall2(seg_contains,
> +                                                                               PG_GETARG_DATUM(1),
> +                                                                               PG_GETARG_DATUM(0)));
>  }
>
> Maybe in seg_contained also assign the arguments to a and b, so it's
> easier to see the relationship between contains and contained.

Done. Makes for nicer formatting too.

> -bool
> -seg_same(SEG *a, SEG *b)
> +Datum
> +seg_same(PG_FUNCTION_ARGS)
>  {
> -       return seg_cmp(a, b) == 0;
> +       Datum           cmp =
> +       DirectFunctionCall2(seg_cmp, PG_GETARG_DATUM(0), PG_GETARG_DATUM(1));
> +
> +       PG_RETURN_BOOL(DatumGetInt32(cmp) == 0);
>  }
>
> I would write this as
>
> int32 cmp = DatumGetInt32(DirectFunctionCall2(seg_cmp, PG_GETARG_DATUM(0), PG_GETARG_DATUM(1));
>
> Having a variable of type Datum is a bit meaningless.

If you're passing things around between other fmgr-using functions
it's often useful to just carry the Datum form around.

In this case it doesn't make much sense though. Done.


> 0002-Remove-support-for-version-0-calling-conventions.patch
>
> diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
> index 255bfddad7..cd41b89136 100644
> --- a/doc/src/sgml/xfunc.sgml
> +++ b/doc/src/sgml/xfunc.sgml
> @@ -675,7 +675,7 @@ CREATE FUNCTION mleast(VARIADIC arr numeric[]) RETURNS numeric AS $$
>  $$ LANGUAGE SQL;
>
>  SELECT mleast(10, -1, 5, 4.4);
> - mleast
> + mleast
>  --------
>       -1
>  (1 row)
>
> These changes are neither right nor wrong, but we have had previous
> discussions about this and settled on leaving the whitespace as psql
> outputs it.  In any case it seems unrelated here.

Removed. Though personally since the patch touches the file anyway it
doesn't seem to matter much either way.

> +
> +    Currently only one calling convention is used for C functions
> +    (<quote>version 1</quote>). Support for that calling convention is
> +    indicated by writing a <literal>PG_FUNCTION_INFO_V1()</literal> macro
> +    call for the function, as illustrated below.
>     </para>
>
> extra blank line

Fixed.

> @@ -1655,8 +1652,8 @@ CREATE FUNCTION square_root(double precision) RETURNS double precision
>        <para>
>         If the name starts with the string <literal>$libdir</literal>,
>         that part is replaced by the <productname>PostgreSQL</> package
> -        library directory
> -       name, which is determined at build time.<indexterm><primary>$libdir</></>
> +       library directory name, which is determined at build time.
> +       <indexterm><primary>$libdir</></>
>        </para>
>       </listitem>
>
> unrelated?

Reverted. Though personally I'd like to be more forgiving of
unrelated-ish changes in docs, since they often need a tidy up when
you're touching the file anyway.

> @@ -455,9 +394,12 @@ fetch_finfo_record(void *filehandle, char *funcname)
>
infofuncname);
>         if (infofunc == NULL)
>         {
> -               /* Not found --- assume version 0 */
> -               pfree(infofuncname);
> -               return &default_inforec;
> +               ereport(ERROR,
> +                               (errcode(ERRCODE_UNDEFINED_FUNCTION),
> +                                errmsg("could not find function information for function \"%s\"",
> +                                               funcname),
> +                                errhint("SQL-callable functions need an accompanying
PG_FUNCTION_INFO_V1(funcname).")));
> +               return NULL; /* silence compiler */
>         }
>
>         /* Found, so call it */
>
> Perhaps plug in the actual C function name into the error message, like
>
>     errhint("SQL-callable functions need an accompanying PG_FUNCTION_INFO_V1(%s).", funcname)

Doesn't make sense unless we then make it singlular, IMO, otherwise it
just reads weirdly. I'd rather keep the 'funcname'. If you're writing
C code it shouldn't be too much of an ask.

> @@ -270,14 +269,16 @@ widget_in(char *str)
>         result->center.y = atof(coord[1]);
>         result->radius = atof(coord[2]);
>
> -       return result;
> +       PG_RETURN_DATUM(PointerGetDatum(result));
>  }
>
> Could be PG_RETURN_POINTER().

Done.

> Attached is a patch for src/backend/utils/fmgr/README that edits out the
> transitional comments and just keeps the parts still relevant.

Applied.

Attached with the suggested amendments. I'll have a read-through, but
you seem to have done the fine-tooth comb treatment already.

Passes "make check" and recovery tests, check-world running now.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Time to drop old-style (V0) functions?

From
Craig Ringer
Date:
On 27 March 2017 at 10:45, Craig Ringer <craig@2ndquadrant.com> wrote:

> Passes "make check" and recovery tests, check-world running now.

A couple of fixes pending.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Time to drop old-style (V0) functions?

From
Craig Ringer
Date:
On 27 March 2017 at 10:59, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 27 March 2017 at 10:45, Craig Ringer <craig@2ndquadrant.com> wrote:
>
>> Passes "make check" and recovery tests, check-world running now.
>
> A couple of fixes pending.

Updated.

I didn't have any way to make

                seg_l = (SEG *) DatumGetPointer(DirectFunctionCall2(seg_union,
                        PointerGetDatum(seg_l),
PointerGetDatum(sort_items[i].data)));

pretty, but *shrug*.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Time to drop old-style (V0) functions?

From
Tom Lane
Date:
Craig Ringer <craig@2ndquadrant.com> writes:
> I didn't have any way to make

>                 seg_l = (SEG *) DatumGetPointer(DirectFunctionCall2(seg_union,
>                         PointerGetDatum(seg_l),
> PointerGetDatum(sort_items[i].data)));

> pretty, but *shrug*.

For the builtin types, fmgr.h generally defines a datum-to-specific-
pointer-type macro, eg it has these for bytea:

#define DatumGetByteaP(X)           ((bytea *) PG_DETOAST_DATUM(X))
#define PG_GETARG_BYTEA_P(n)        DatumGetByteaP(PG_GETARG_DATUM(n))

(A type that doesn't have toast support would just use DatumGetPointer
instead of PG_DETOAST_DATUM.)  Replacing "(SEG *) DatumGetPointer(...)"
with "DatumGetSegP(...)" would help a little here.

I wouldn't necessarily do that if this is the only place that would
benefit, but there might be more scope for upgrading seg.c's notation
than just here.
        regards, tom lane