Thread: access numeric data in module

access numeric data in module

From
Ed Behn
Date:
I'm the maintainer of the PL/Haskell language extension. (https://github.com/ed-o-saurus/PLHaskell/)

I want to be able to have a function receive and/or return numeric data. However, I'm having trouble getting data from Datums and building Datums to return. numeric.h does not contain the macros needed to do this; they are in numeric.c.

Is there a way to access the values in the numeric structures? If not, would a PR to move the macros to numeric.h be welcome in the next commitfest?

                  -Ed

Re: access numeric data in module

From
Tom Lane
Date:
Ed Behn <ed@behn.us> writes:
> I want to be able to have a function receive and/or return numeric data.
> However, I'm having trouble getting data from Datums and building Datums to
> return. numeric.h does not contain the macros to do this. They are in
> numeric.c.

> Is there a way to access the values in the numeric structures? If not,
> would a PR to move macros to numeric.h be welcome in the next commitfest?

It's intentional that that stuff is not exposed, so no.

What actual functionality do you need that numeric.h doesn't expose?

            regards, tom lane



Re: access numeric data in module

From
Robert Haas
Date:
On Mon, Sep 9, 2024 at 10:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ed Behn <ed@behn.us> writes:
> > I want to be able to have a function receive and/or return numeric data.
> > However, I'm having trouble getting data from Datums and building Datums to
> > return. numeric.h does not contain the macros to do this. They are in
> > numeric.c.
>
> > Is there a way to access the values in the numeric structures? If not,
> > would a PR to move macros to numeric.h be welcome in the next commitfest?
>
> It's intentional that that stuff is not exposed, so no.
>
> What actual functionality do you need that numeric.h doesn't expose?

I don't agree with this response at all. It seems entirely reasonable
for third-party code to want to have a way to construct and interpret
numeric datums. Keeping the details private would MAYBE make sense if
the internal details were changing release to release, but that's
clearly not the case. Even if it were, an extension author is
completely entitled to say "hey, I'd rather have access to an unstable
API and update my code for new releases" and we should accommodate
that. If we don't, people don't give up on writing the code that they
want to write -- they just cut-and-paste private declarations/code
into their own source tree, which is WAY worse than if we just put the
stuff in a .h file.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: access numeric data in module

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Sep 9, 2024 at 10:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It's intentional that that stuff is not exposed, so no.
>> What actual functionality do you need that numeric.h doesn't expose?

> I don't agree with this response at all. It seems entirely reasonable
> for third-party code to want to have a way to construct and interpret
> numeric datums. Keeping the details private would MAYBE make sense if
> the internal details were changing release to release, but that's
> clearly not the case.

We have changed numeric's internal representation in the past, and
I'd like to keep the freedom to do so again.  There's been discussion
for example of reconsidering the choice of NBASE to make more sense
on 64-bit hardware.  Yeah, maintaining on-disk compatibility limits
what we can do there, but not as much as if some external module
is in bed with the representation.
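To make the NBASE point concrete, here is a minimal Haskell sketch, assuming the usual description of numeric as a list of base-NBASE digits plus a weight (the power of NBASE carried by the first digit). It does not model PostgreSQL's actual struct layout, and the helper names `toDigits` and `fromDigits` are illustrative only. The same integer spelled in base 10000 and in a hypothetical base 10^8 yields different digit lists, which is what any external code tied to the layout would have to track.

```haskell
-- Illustrative model only: a value written as base-nbase digits plus a weight.
-- This is not PostgreSQL's C layout; all names here are hypothetical.

-- | Base-nbase digits of a non-negative integer, most significant first.
toDigits :: Integer -> Integer -> [Integer]
toDigits nbase n
  | n < nbase = [n]
  | otherwise = toDigits nbase (n `div` nbase) ++ [n `mod` nbase]

-- | Value of a digit list whose first digit carries nbase^weight:
--   the sum over i of d_i * nbase^(weight - i), computed exactly as a Rational.
fromDigits :: Integer -> Int -> [Integer] -> Rational
fromDigits nbase weight ds =
  sum [fromInteger d * fromInteger nbase ^^ (weight - i) | (i, d) <- zip [0 ..] ds]
```

For example, `toDigits 10000 123456789` gives `[1,2345,6789]` (weight 2), while `toDigits 100000000 123456789` gives `[1,23456789]` (weight 1); `fromDigits` recovers 123456789 from either spelling.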

> Even if it were, an extension author is
> completely entitled to say "hey, I'd rather have access to an unstable
> API and update my code for new releases" and we should accommodate
> that. If we don't, people don't give up on writing the code that they
> want to write -- they just cut-and-paste private declarations/code
> into their own source tree, which is WAY worse than if we just put the
> stuff in a .h file.

IMO it'd be a lot better if numeric.c exposed whatever functionality
Ed feels is missing, while keeping the contents of a numeric opaque.

            regards, tom lane



Re: access numeric data in module

From
Robert Haas
Date:
On Mon, Sep 9, 2024 at 1:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We have changed numeric's internal representation in the past, and
> I'd like to keep the freedom to do so again.  There's been discussion
> for example of reconsidering the choice of NBASE to make more sense
> on 64-bit hardware.  Yeah, maintaining on-disk compatibility limits
> what we can do there, but not as much as if some external module
> is in bed with the representation.

I disagree with the idea that a contrib module looking at the details
of a Numeric value means we can't make these kinds of updates.

> > Even if it were, an extension author is
> > completely entitled to say "hey, I'd rather have access to an unstable
> > API and update my code for new releases" and we should accommodate
> > that. If we don't, people don't give up on writing the code that they
> > want to write -- they just cut-and-paste private declarations/code
> > into their own source tree, which is WAY worse than if we just put the
> > stuff in a .h file.
>
> IMO it'd be a lot better if numeric.c exposed whatever functionality
> Ed feels is missing, while keeping the contents of a numeric opaque.

We could certainly expose a bunch of functions, but I think that would
actually be a bigger maintenance burden for us than just exposing some
of the details that are currently private to numeric.c. It would also
presumably be less performant, since it means somebody has to call a
function rather than just using a macro.

Also, this seems to me to be holding the numeric data type to a
different standard than other things. For numeric, we have
NumericData, NumericChoice, NumericShort, and NumericLong as structs
that define the on-disk representation. They're in numeric.c. But
ArrayType is in array.h. RangeType is in rangetypes.h. MultirangeType
is in multirangetypes.h. PATH and POLYGON are in geo_decls.h. inet and
inet_data are in inet.h. int2vector and oidvector are in c.h (which
seems like questionable placement, but I digress). And there must be
tons of third-party code out there that knows how to interpret a text
or bytea varlena. So it's not like we have some principled
project-wide policy of hiding these implementation details. At first
look, numeric seems like an outlier.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: access numeric data in module

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Sep 9, 2024 at 1:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> IMO it'd be a lot better if numeric.c exposed whatever functionality
>> Ed feels is missing, while keeping the contents of a numeric opaque.

> We could certainly expose a bunch of functions, but I think that would
> actually be a bigger maintenance burden for us than just exposing some
> of the details that are currently private to numeric.c.

This whole argument is contingent on details that haven't been
provided, namely exactly what it is that Ed wants to do that he can't
do today.  I think we should investigate that before deciding that
publishing previously-private detail is the best solution.

> Also, this seems to me to be holding the numeric data type to a
> different standard than other things.

By that argument, we should move every declaration in every .c file
into c.h and be done.  I'd personally be happier if we had *not*
exposed the other data structure details you mention, but that
ship has sailed.

If we do do what you're advocating, I'd at least insist that the
declarations go into a new file numeric_internal.h, so that it's
clear to all concerned that they're playing with fire if they
depend on that stuff.

            regards, tom lane



Re: access numeric data in module

From
Chapman Flack
Date:
On 09/09/24 13:00, Robert Haas wrote:
> I don't agree with this response at all. It seems entirely reasonable
> for third-party code to want to have a way to construct and interpret
> numeric datums. Keeping the details private would MAYBE make sense if
> the internal details were changing release to release, but that's
> clearly not the case. Even if it were, an extension author is
> completely entitled to say "hey, I'd rather have access to an unstable
> API and update my code for new releases" and we should accommodate
> that. If we don't, people don't give up on writing the code that they
> want to write -- they just cut-and-paste private declarations/code
> into their own source tree, which is WAY worse than if we just put the
> stuff in a .h file.

Amen.


https://tada.github.io/pljava/preview1.7/pljava-api/apidocs/org.postgresql.pljava/org/postgresql/pljava/adt/Numeric.html

The above API documentation was written when the PostgreSQL source
comments read "values of NBASE other than 10000 are considered of historical
interest only and are no longer supported in any sense".
I will have to generalize it a bit more if other NBASEs are now
to be considered again.

If Tom prefers the idea of keeping the datum layout strongly encapsulated
(pretty much uniquely among PG data types) and providing only a callable
C API for manipulating it, then I might propose something like the above-
linked Java API as one source of API ideas.

I think it's worth remembering that most PLs will have their own
libraries (sometimes multiple alternatives) for arbitrary-precision numbers,
and it's hard to generalize about /those/ libraries regarding what API
they will provide for most efficiently and faithfully converting a
foreign representation to or from their own. Conversion through a decimal
string (a) may not be most efficient, and (b) may not faithfully roundtrip
possible combinations of digits, displayScale, and weight.
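One concrete way a conversion can lose information: if the receiving library keeps only the exact value, the display scale is gone. The Haskell sketch below is a hypothetical model (an unscaled integer plus a display scale, roughly the role displayScale plays), not numeric's real layout, and the names `Dec`, `render`, and `exactValue` are illustrative. It shows that 1.50 and 1.5 are the same value but render differently, so a value-only round trip cannot tell them apart.

```haskell
-- Hypothetical model: an unscaled integer plus a display scale
-- (digits after the decimal point). Assumes dscale >= 0.
data Dec = Dec { unscaled :: Integer, dscale :: Int }
  deriving (Eq, Show)

-- | Render with exactly dscale fractional digits, e.g. Dec 150 2 -> "1.50".
render :: Dec -> String
render (Dec u s)
  | s == 0 = sign ++ show a
  | otherwise = sign ++ show ip ++ "." ++ pad (show fp)
  where
    sign = if u < 0 then "-" else ""
    a = abs u
    (ip, fp) = a `divMod` (10 ^ s)
    -- left-pad the fractional part with zeros to exactly s digits
    pad str = replicate (s - length str) '0' ++ str

-- | The exact value as a Rational; the display scale is discarded here.
exactValue :: Dec -> Rational
exactValue (Dec u s) = fromInteger u / 10 ^ s
```

Here `render (Dec 150 2)` is "1.50" and `render (Dec 15 1)` is "1.5", yet `exactValue` returns the same Rational (3/2) for both, so converting to a value-only type and back cannot restore the original scale.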

From Java's perspective, there has historically been a significant JNI
overhead for calling from Java into a C API, so that it's advantageous
to know the memory layout and keep the processing in Java. There is
at last a batteries-included Java foreign-function interface that can
make it less costly to call into a C API, but that has only landed in
Java 22, which not everyone will be using right away.

Regards,
-Chap



Re: access numeric data in module

From
Robert Haas
Date:
On Mon, Sep 9, 2024 at 2:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> By that argument, we should move every declaration in every .c file
> into c.h and be done.  I'd personally be happier if we had *not*
> exposed the other data structure details you mention, but that
> ship has sailed.

Not every declaration in every .c file is of general interest, but the
ones that are probably should be moved into .h files. The on-disk
representation of a commonly-used data type certainly qualifies.

You can see from Chapman's reply that I'm not making this up: when we
don't expose things, it doesn't keep people from depending on them, it
just makes them copy our code into their own repository. That's not a
win. It makes those extensions more fragile, not less, and it makes
the PostgreSQL extension ecosystem worse. pg_hint_plan is another,
recently-discussed example of this phenomenon: refuse to give people
the keys, and they start hot-wiring stuff.

> If we do do what you're advocating, I'd at least insist that the
> declarations go into a new file numeric_internal.h, so that it's
> clear to all concerned that they're playing with fire if they
> depend on that stuff.

I think that's a bit pointless considering that we don't do it in any
of the other cases. I'd rather be consistent with our usual practice.
But if it ends up in a separate header file that's still better than
the status quo.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: access numeric data in module

From
Ed Behn
Date:
Sorry for taking so long to respond. I was at my day job.

As I mentioned, I am the maintainer of the PL/Haskell language extension. This extension allows users to write code in the Haskell language. In order to use numeric types, I will need to create an equivalent Haskell type. Something like:

import Data.Int (Int16)

data Numeric = PosInfinity | NegInfinity | NaN | Number Integer Int16

where the Number constructor represents a numeric's mantissa and weight. 

In order to get or return data, I would need to be able to access those fields of the numeric type. 
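A minimal sketch of the "get" direction, assuming an extension could somehow read a numeric's sign, weight, and base-10000 digits. The `RawSign`/`RawNumeric` types and the `decode` function are hypothetical, since numeric.h exposes no such fields, and reading `Number m e` as m * 10000^e is an assumption: the message above does not say how the weight is scaled.

```haskell
import Data.Int (Int16)

-- Ed's proposed type from the message above, with derived instances added.
data Numeric = PosInfinity | NegInfinity | NaN | Number Integer Int16
  deriving (Eq, Show)

-- Hypothetical view of a numeric's fields; numeric.h exposes no such record.
data RawSign = Pos | Neg | RawNaN | RawPInf | RawNInf
  deriving (Eq, Show)

data RawNumeric = RawNumeric
  { rawSign   :: RawSign
  , rawWeight :: Int16     -- power of nbase carried by the first digit
  , rawDigits :: [Integer] -- base-nbase digits, most significant first
  } deriving (Eq, Show)

-- Assumed base for this sketch.
nbase :: Integer
nbase = 10000

-- | Decode into Ed's type, reading Number m e as m * nbase^e (an assumption).
decode :: RawNumeric -> Numeric
decode (RawNumeric s w ds) =
  case s of
    RawNaN  -> NaN
    RawPInf -> PosInfinity
    RawNInf -> NegInfinity
    Pos     -> Number mant expo
    Neg     -> Number (negate mant) expo
  where
    -- fold the digits into one Integer mantissa
    mant = foldl (\acc d -> acc * nbase + d) 0 ds
    -- exponent of the last digit = weight of the first minus (count - 1)
    expo = w - fromIntegral (length ds) + 1
```

For instance, digits `[1, 2345, 6789]` with weight 2 decode to `Number 123456789 0`, and a negative value with digits `[5000]` and weight -1 decodes to `Number (-5000) (-1)`, i.e. -0.5.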

I'm not proposing giving access to the actual numeric structure. Rather, the data should be accessed by function call or macro. This would allow future changes to the inner workings without breaking compatibility, as long as the interface is maintained. It looks to me like all of the code to access the data already exists; it should simply be made accessible. An additional function should exist that allows an extension to create a numeric structure by passing the needed data.
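For the other direction, a function that builds a numeric from caller-supplied data would need its inputs in some canonical form. The sketch below is hypothetical (no such function exists in numeric.h, and `encode` is an illustrative name); it reads a mantissa and exponent as m * 10000^e and splits them into a sign flag, the weight of the first base-10000 digit, and the digit list, dropping trailing zero digits so that each nonzero value has exactly one spelling.

```haskell
import Data.Int (Int16)

-- Hypothetical helper: split m * nbase^e into (isNegative, weight, digits),
-- where weight is the power of nbase carried by the first digit.
-- Names and the m * nbase^e reading are assumptions for illustration.
nbase :: Integer
nbase = 10000

encode :: Integer -> Int16 -> (Bool, Int16, [Integer])
encode m e
  | m == 0 = (False, 0, [])
  | otherwise = (m < 0, weight, digits)
  where
    full = toBase (abs m)
    -- the first digit sits (length full - 1) places above the last one
    weight = e + fromIntegral (length full) - 1
    -- drop trailing zero digits; the first digit's weight is unaffected
    digits = reverse (dropWhile (== 0) (reverse full))
    -- base-nbase digits of a positive integer, most significant first
    toBase 0 = []
    toBase n = toBase (n `div` nbase) ++ [n `mod` nbase]
```

For example, `encode 123456789 0` yields `(False,2,[1,2345,6789])`, and `encode 10000 0` and `encode 1 1` both yield `(False,1,[1])`.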

             -Ed


On Mon, Sep 9, 2024 at 2:45 PM Robert Haas <robertmhaas@gmail.com> wrote:

Re: access numeric data in module

From
Ed Behn
Date:
Good afternoon-
    Was there a resolution of this? I'm wondering if it is worth it for me to submit a PR for the next commitfest. 

               -Ed

On Mon, Sep 9, 2024 at 8:40 PM Ed Behn <ed@behn.us> wrote:

Re: access numeric data in module

From
Robert Haas
Date:
On Sat, Sep 14, 2024 at 2:10 PM Ed Behn <ed@behn.us> wrote:
>     Was there a resolution of this? I'm wondering if it is worth it for me to submit a PR for the next commitfest.

Well, it seems like what you want is different than what I want, and
what Tom wants is different from both of us. I'd like there to be a
way forward here but at the moment I'm not quite sure what it is.

--
Robert Haas
EDB: http://www.enterprisedb.com