Thread: NUMERIC private methods?

NUMERIC private methods?

From
David Fetter
Date:
Folks,

While noodling with some weighted statistics
<https://github.com/davidfetter/weighted_stats>, I noticed I was
having to jump through a lot of hoops because of all the private
methods in numeric.c, especially NumericVar.  Would there be some
major objection to exposing NumericVar as an opaque blob?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: NUMERIC private methods?

From
Heikki Linnakangas
Date:
On 12/16/2014 08:34 AM, David Fetter wrote:
> Folks,
>
> While noodling with some weighted statistics
> <https://github.com/davidfetter/weighted_stats>, I noticed I was
> having to jump through a lot of hoops because of all the private
> methods in numeric.c, especially NumericVar.  Would there be some
> major objection to exposing NumericVar as an opaque blob?

Hmm. You'd want to make add_var, mul_var etc. non-static?

Looking at the weighed_stats code, this probably illustrates the hoops 
you had to jump through:

> /* sqrt((n/(n-1)) * ((s0*s2 - s1*s1)/(s0*s0)) */
>
>         result
>             = DirectFunctionCall1(
>                 numeric_sqrt,
>                 DirectFunctionCall2(
>                     numeric_mul,
>                     DirectFunctionCall2(
>                         numeric_div,
>                         n_prime,
>                         DirectFunctionCall2(
>                             numeric_sub,
>                             n_prime,
>                             /*
>                              * This rather convoluted way to compute the value
>                              * 1 gives us a result which should have at least
>                              * as big a decimal scale as s_2 does, which should
>                              * guarantee that our result is as precise as the
>                              * input...
>                              */
>                             DirectFunctionCall2(
>                                 numeric_add,
>                                 DirectFunctionCall2(
>                                     numeric_sub,
>                                     state->s_2,
>                                     state->s_2
>                                     ),
>                                 make_numeric(1)
>                                 )
>                             )
>                         ),
>                     DirectFunctionCall2(
>                         numeric_div,
>                         DirectFunctionCall2(
>                             numeric_sub,
>                             DirectFunctionCall2(
>                                 numeric_mul,
>                                 state->s_0,
>                                 state->s_2
>                                 ),
>                             DirectFunctionCall2(
>                                 numeric_mul,
>                                 state->s_1,
>                                 state->s_1
>                                 )
>                             ),
>                         DirectFunctionCall2(
>                             numeric_mul,
>                             state->s_0,
>                             state->s_0
>                             )
>                         )
>                     )
>                 );

As a start, it would help a lot to #define a few helper macros like:

#define ADD(a, b) DirectFunctionCall2(numeric_add, a, b)
#define MUL(a, b) DirectFunctionCall2(numeric_mul, a, b)

in your extension. That would already make that a lot shorter.

You might also be worrying about performance, though. The above snippet 
was from the aggregate's final function, which isn't performance 
critical, but you have some numeric operations in the transition 
function too. I wonder how big the impact really is, though. 
init_var_from_num and make_result look quite cheap, but certainly not free.

- Heikki



Re: NUMERIC private methods?

From
Andrew Gierth
Date:
>>>>> "Heikki" == Heikki Linnakangas <hlinnakangas@vmware.com> writes:
Heikki> Looking at the weighed_stats code, this probably illustratesHeikki> the hoops you had to jump through:

Actually that hoop-jumping expression is almost irrelevant.

The part that hurts (and yes, it's performance that's at issue here,
and not code aesthetics) is not being able to use NumericVar in the
aggregate's transition variable, because that means that every
computed intermediate value is palloc'd and pfree'd twice (once as the
digit buffer of a NumericVar and again as a Numeric datum).

-- 
Andrew (irc:RhodiumToad)



Re: NUMERIC private methods?

From
David Fetter
Date:
On Tue, Dec 16, 2014 at 09:01:47AM +0000, Andrew Gierth wrote:
> >>>>> "Heikki" == Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> 
>  Heikki> Looking at the weighed_stats code, this probably illustrates
>  Heikki> the hoops you had to jump through:
> 
> Actually that hoop-jumping expression is almost irrelevant.

Right.  Not that it made things fun or easy (at least for me) to
debug, as you can see by the git history.

> The part that hurts (and yes, it's performance that's at issue here,
> and not code aesthetics) is not being able to use NumericVar in the
> aggregate's transition variable, because that means that every
> computed intermediate value is palloc'd and pfree'd twice (once as
> the digit buffer of a NumericVar and again as a Numeric datum).

Yep.  Performance of NUMERIC might yet get some of the love it
deserves.  Perhaps something along the lines of a 128-bit default
structure with a promotion/demotion scheme for larger representations.
Until then, those of us writing extensions are stuck with heaps of
extra instructions in it that could easily be trimmed away to good
effect.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: NUMERIC private methods?

From
Tom Lane
Date:
Heikki Linnakangas <hlinnakangas@vmware.com> writes:
> On 12/16/2014 08:34 AM, David Fetter wrote:
>> While noodling with some weighted statistics
>> <https://github.com/davidfetter/weighted_stats>, I noticed I was
>> having to jump through a lot of hoops because of all the private
>> methods in numeric.c, especially NumericVar.  Would there be some
>> major objection to exposing NumericVar as an opaque blob?

> Hmm. You'd want to make add_var, mul_var etc. non-static?

-1 for that.

> Looking at the weighed_stats code, this probably illustrates the hoops 
> you had to jump through:

>> /* sqrt((n/(n-1)) * ((s0*s2 - s1*s1)/(s0*s0)) */

If you're concerned about arithmetic performance, there is a very obvious
fix here: use double.  Is there some utterly compelling reason to use
numeric, despite the fact that it's certain to be orders of magnitude
slower?

(It would still be orders of magnitude slower, no matter how much we
were willing to destroy numeric.c's modularity boundary.)
        regards, tom lane



Re: NUMERIC private methods?

From
Andrew Gierth
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Hmm. You'd want to make add_var, mul_var etc. non-static?
Tom> -1 for that.

possibly with more meaningful names.
Tom> If you're concerned about arithmetic performance, there is aTom> very obvious fix here: use double.

Independently of this specific example, the obvious issue there is that
there are applications for which double is simply not acceptable.

As it stands, no extension can use the numeric type in any non-trivial
way without paying a large penalty for repeated pallocs and data copies.
Given that the ability to write C extensions easily is one of pg's great
strengths, this is a defect that should be corrected.
Tom> (It would still be orders of magnitude slower, no matter howTom> much we were willing to destroy numeric.c's
modularityTom>boundary.)
 

There is no need to expose any details of NumericVar's implementation;
it would suffice to provide an interface to allocate NumericVars, and
access to the functions.

-- 
Andrew (irc:RhodiumToad)



Re: NUMERIC private methods?

From
Tom Lane
Date:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  Tom> If you're concerned about arithmetic performance, there is a
>  Tom> very obvious fix here: use double.

> Independently of this specific example, the obvious issue there is that
> there are applications for which double is simply not acceptable.

As the guy who last fooled with the numeric calculation algorithms in any
major way, I'm painfully aware that numeric is not necessarily more
accurate than double for anything more complicated than
addition/subtraction/multiplication.  The example that was shown upthread
is pretty nearly a textbook case of something where I'd not believe that
numeric offers any accuracy improvement without *very* careful
investigation.  In general, if your application is sufficiently
arithmetic-intensive that you need to care about the speed of calculations,
then it's not obvious which one to pick, and if I hear a knee-jerk "simply
not acceptable" I'm simply going to conclude that you haven't actually
studied the subject.

> As it stands, no extension can use the numeric type in any non-trivial
> way without paying a large penalty for repeated pallocs and data copies.
> Given that the ability to write C extensions easily is one of pg's great
> strengths, this is a defect that should be corrected.

>  Tom> (It would still be orders of magnitude slower, no matter how
>  Tom> much we were willing to destroy numeric.c's modularity
>  Tom> boundary.)

> There is no need to expose any details of NumericVar's implementation;
> it would suffice to provide an interface to allocate NumericVars, and
> access to the functions.

I call BS on that.  Exposing NumericVar is *already* violating numeric.c's
abstraction boundary.  If somebody were to want to reimplement numeric for
more speed, changing that data structure and/or the usage conventions for
it would likely be one of the core things they need to do.  As soon as we
expose the functions you're asking for, that becomes impossible.
        regards, tom lane



Re: NUMERIC private methods?

From
Jim Nasby
Date:
On 12/18/14, 9:21 AM, Tom Lane wrote:
>> As it stands, no extension can use the numeric type in any non-trivial
>> >way without paying a large penalty for repeated pallocs and data copies.
>> >Given that the ability to write C extensions easily is one of pg's great
>> >strengths, this is a defect that should be corrected.

If copying data/palloc is the root of numeric's performance problems then we need to address that, because it will
providebenefit across the entire database. The pattern of (palloc; copy) is repeated throughout a large part of the
codebase.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: NUMERIC private methods?

From
Robert Haas
Date:
On Thu, Dec 18, 2014 at 10:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>>  Tom> If you're concerned about arithmetic performance, there is a
>>  Tom> very obvious fix here: use double.
>
>> Independently of this specific example, the obvious issue there is that
>> there are applications for which double is simply not acceptable.
>
> As the guy who last fooled with the numeric calculation algorithms in any
> major way, I'm painfully aware that numeric is not necessarily more
> accurate than double for anything more complicated than
> addition/subtraction/multiplication.  The example that was shown upthread
> is pretty nearly a textbook case of something where I'd not believe that
> numeric offers any accuracy improvement without *very* careful
> investigation.  In general, if your application is sufficiently
> arithmetic-intensive that you need to care about the speed of calculations,
> then it's not obvious which one to pick, and if I hear a knee-jerk "simply
> not acceptable" I'm simply going to conclude that you haven't actually
> studied the subject.

I think that's ridiculous.  You're basically arguing that numeric
doesn't offer meaningful advantages over float8, which flies in the
face of the fact that essentially every database application I've ever
seen uses numeric and I'm not sure I've ever seen one using float8.
Nearly all database users prefer to store quantities like currency
units in a type that is guaranteed not to lose precision.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: NUMERIC private methods?

From
Alvaro Herrera
Date:
Robert Haas wrote:

> I think that's ridiculous.  You're basically arguing that numeric
> doesn't offer meaningful advantages over float8, which flies in the
> face of the fact that essentially every database application I've ever
> seen uses numeric and I'm not sure I've ever seen one using float8.
> Nearly all database users prefer to store quantities like currency
> units in a type that is guaranteed not to lose precision.

I think it's reasonable to expose NumericVar and the supporting function
prototypes in, say, numeric_internal.h; normal applications that just
want to operate on numerics as today can just include numeric.h, and
continue to be at arms-length of the implementation details, while code
that wants to optimize operations further can use numeric_internal.h and
be very aware that they are subject to heavy breakage if we ever feel a
need to change the internal API.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: NUMERIC private methods?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Dec 18, 2014 at 10:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> As the guy who last fooled with the numeric calculation algorithms in any
>> major way, I'm painfully aware that numeric is not necessarily more
>> accurate than double for anything more complicated than
>> addition/subtraction/multiplication.  The example that was shown upthread
>> is pretty nearly a textbook case of something where I'd not believe that
>> numeric offers any accuracy improvement without *very* careful
>> investigation.

> I think that's ridiculous.  You're basically arguing that numeric
> doesn't offer meaningful advantages over float8, which flies in the
> face of the fact that essentially every database application I've ever
> seen uses numeric and I'm not sure I've ever seen one using float8.
> Nearly all database users prefer to store quantities like currency
> units in a type that is guaranteed not to lose precision.

If you're doing banking, you don't do anything except addition,
subtraction, and multiplication.  And that is what those users
who want "guaranteed precision" are doing, and yeah numeric will
make them happy.

If you're doing any sort of higher math or statistics, I stand by my
statement that you'd better think rather than just blindly assume that
numeric is going to be better for you.  A moment's fooling about finds
this example, which is pretty relevant to the formula we started this
thread with:

regression=# select (1234::numeric/1235) * 1235;         ?column?          
---------------------------1234.00000000000000000100
(1 row)

regression=# select (1234::float8/1235) * 1235; ?column? 
----------    1234
(1 row)

What it boils down to is that numeric is great for storing given decimal
inputs exactly, and it can do exact addition/subtraction/multiplication
on those too, but as soon as you get into territory where the result is
fundamentally inexact it is *not* promised to be better than float8.
In fact, it's designed to be more or less the same as float8; see the
comments in select_div_scale.

We could probably improve on this if we were to redesign the algorithms
around a concept of decimal floating-point, rather than decimal
fixed-point as it is now.  But I'm not sure how well that would comport
with the SQL standard.  And I'm very not sure that we could still do it
once we'd tied one hand behind our backs by virtue of exporting a bunch
of the internals as public API.
        regards, tom lane



Re: NUMERIC private methods?

From
Robert Haas
Date:
On Thu, Dec 18, 2014 at 11:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What it boils down to is that numeric is great for storing given decimal
> inputs exactly, and it can do exact addition/subtraction/multiplication
> on those too, but as soon as you get into territory where the result is
> fundamentally inexact it is *not* promised to be better than float8.

I think what it boils down to is that several people here (and I'll
add my voice to the chorus) are saying, hey, numeric is really useful,
and we'd like to be able to manipulate numerics without all the palloc
and fmgr overhead, and your response appears to be to say, use float8.
That may be the right answer in some cases, but certainly not in all.

> We could probably improve on this if we were to redesign the algorithms
> around a concept of decimal floating-point, rather than decimal
> fixed-point as it is now.  But I'm not sure how well that would comport
> with the SQL standard.  And I'm very not sure that we could still do it
> once we'd tied one hand behind our backs by virtue of exporting a bunch
> of the internals as public API.

You make this argument every time somebody wants to drop static from a
function or stick PGDLLIMPORT on a variable, and frankly I think it's
pretty developer-hostile.  I would much rather see us go ahead and do
those things and if people later complain that we broke stuff, we'll
go tell them to pound sand.  That's what we said when people
complained about relistemp -> relpersistence, and the number of people
who are affected by a change in the system catalogs has got to be a
good two orders of magnitude more than the number of people who are
going to notice a change in the C API.  Giving developers the ability
to write extensions that *might* get broken by future changes is a lot
better than not giving them that ability in the first place.  There's
only a problem for the core project if we think we're bound not to
break the APIs in a future release, and I don't feel so bound.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: NUMERIC private methods?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I think what it boils down to is that several people here (and I'll
> add my voice to the chorus) are saying, hey, numeric is really useful,
> and we'd like to be able to manipulate numerics without all the palloc
> and fmgr overhead, and your response appears to be to say, use float8.
> That may be the right answer in some cases, but certainly not in all.

Well, there are two components to what I'm saying.  One is that the
example David started with looks like it could use some better-informed
consideration about which datatype to use.  The other is that numeric
leaves quite a lot to be desired still, and someday we might want to fix
that, and that might require breaking the APIs you want to expose.

> You make this argument every time somebody wants to drop static from a
> function or stick PGDLLIMPORT on a variable, and frankly I think it's
> pretty developer-hostile.  I would much rather see us go ahead and do
> those things and if people later complain that we broke stuff, we'll
> go tell them to pound sand.

Really.  I will remember that next time you bitch about us changing some
extension-visible behavior, which AFAIR you are usually one of the first
to complain about.

Anyway, if we do this, I concur with Alvaro's suggestion that the
additional exposure be in a header file named something like
numeric_private.h, so that there's less room for complaint when
we change it.
        regards, tom lane



Re: NUMERIC private methods?

From
Robert Haas
Date:
On Fri, Dec 19, 2014 at 10:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Well, there are two components to what I'm saying.  One is that the
> example David started with looks like it could use some better-informed
> consideration about which datatype to use.  The other is that numeric
> leaves quite a lot to be desired still, and someday we might want to fix
> that, and that might require breaking the APIs you want to expose.

As I say, I'm OK with that.  That file has been nearly-static for a
really long time, so I don't expect a massive rewrite imminently.  But
if it happens, and it breaks things for people relying on those
functions, tough luck for them.

>> You make this argument every time somebody wants to drop static from a
>> function or stick PGDLLIMPORT on a variable, and frankly I think it's
>> pretty developer-hostile.  I would much rather see us go ahead and do
>> those things and if people later complain that we broke stuff, we'll
>> go tell them to pound sand.
>
> Really.  I will remember that next time you bitch about us changing some
> extension-visible behavior, which AFAIR you are usually one of the first
> to complain about.

Really?  I have bitched about what I see as nearly-pointless
reorganizations of header files, because I personally place very low
priority on speeding up incremental compiles, and most of those
changes have no other benefit.  But I can't remember ever bitching
about function signature changes or changes to the types of global
variables, or even the disappearance of previously-exposed types or
global variables.  And I daresay, given the nature of my
responsibilities at EDB, such things have at least one order of
magnitude more impact on me than they do on most extension
maintainers.  If it were up to me, I'd go stick PGDLLIMPORT on every
global variable I could find; I estimate that would make things easier
for me, say, five times as often as it would make them harder.  The
amount of time we've spent tracking down regressions on Windows
because some loadable module of ours depended on a variable that
wasn't PGDLLIMPORT'd is large - everything works fine on Linux, and
sometimes survives cursory testing on Windows too, but then breaks in
some obscure scenario.  And there have also been cases where we've had
to work around the absence of PGDLLIMPORT markings with ugly hacks and
then that workaround code has turned out to be buggy.  I would not
argue for indiscriminately making global every function we have
anywhere in the backend, but I would favor a policy of being
significantly more liberal about it than we have been heretofore.

> Anyway, if we do this, I concur with Alvaro's suggestion that the
> additional exposure be in a header file named something like
> numeric_private.h, so that there's less room for complaint when
> we change it.

Yes, I liked that proposal, too.  Also, I'm not sure whether there's
room to worry that making those function extern rather than static
would defeat compiler optimizations that materially affect
performance.  That might just be paranoia on my part, but I've seen
cases where it matters.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: NUMERIC private methods?

From
David Fetter
Date:
On Thu, Dec 18, 2014 at 11:51:37PM -0300, Alvaro Herrera wrote:
> Robert Haas wrote:
> 
> > I think that's ridiculous.  You're basically arguing that numeric
> > doesn't offer meaningful advantages over float8, which flies in
> > the face of the fact that essentially every database application
> > I've ever seen uses numeric and I'm not sure I've ever seen one
> > using float8.  Nearly all database users prefer to store
> > quantities like currency units in a type that is guaranteed not to
> > lose precision.
> 
> I think it's reasonable to expose NumericVar and the supporting
> function prototypes in, say, numeric_internal.h; normal applications
> that just want to operate on numerics as today can just include
> numeric.h, and continue to be at arms-length of the implementation
> details, while code that wants to optimize operations further can
> use numeric_internal.h and be very aware that they are subject to
> heavy breakage if we ever feel a need to change the internal API.

While nothing can prevent negligence and pilot error, making it clear
by the name of the included header that breakable stuff is being used
seems like an excellent way to proceed.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: NUMERIC private methods?

From
Bruce Momjian
Date:
On Thu, Dec 18, 2014 at 11:51:12PM -0500, Tom Lane wrote:
> If you're doing any sort of higher math or statistics, I stand by my
> statement that you'd better think rather than just blindly assume that
> numeric is going to be better for you.  A moment's fooling about finds
> this example, which is pretty relevant to the formula we started this
> thread with:
>
> regression=# select (1234::numeric/1235) * 1235;
>          ?column?
> ---------------------------
>  1234.00000000000000000100
> (1 row)
>
> regression=# select (1234::float8/1235) * 1235;
>  ?column?
> ----------
>      1234
> (1 row)
>
> What it boils down to is that numeric is great for storing given decimal
> inputs exactly, and it can do exact addition/subtraction/multiplication
> on those too, but as soon as you get into territory where the result is
> fundamentally inexact it is *not* promised to be better than float8.
> In fact, it's designed to be more or less the same as float8; see the
> comments in select_div_scale.

Based on the analysis above, I have written the attached patch to the
NUMERIC docs to mention this.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +

Attachment

Re: NUMERIC private methods?

From
Andrew Gierth
Date:
>>>>> "Bruce" == Bruce Momjian <bruce@momjian.us> writes:
> !      However, calculations on <type>numeric</type> values is very slow

arithmetic ... is,  but calculations ... are

-- 
Andrew (irc:RhodiumToad)



Re: NUMERIC private methods?

From
Bruce Momjian
Date:
On Sun, Mar 22, 2015 at 04:08:32AM +0000, Andrew Gierth wrote:
> >>>>> "Bruce" == Bruce Momjian <bruce@momjian.us> writes:
> 
>  > !      However, calculations on <type>numeric</type> values is very slow
> 
> arithmetic ... is,  but calculations ... are

Ah, good point.  Fixed an applied.  Thanks.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +