Thread: access numeric data in module
I'm the maintainer of the PL/Haskell language extension. (https://github.com/ed-o-saurus/PLHaskell/)
I want to be able to have a function received and/or return numeric data. However, I'm having trouble getting data from Datums and building Datums to return. numeric.h does not contain the macros to do this. They are in numeric.c.
Is there a way to access the values in the numeric structures? If not, would a PR to move macros to numeric.h be welcome in the next commitfest?
-Ed
Ed Behn <ed@behn.us> writes: > I want to be able to have a function received and/or return numeric data. > However, I'm having trouble getting data from Datums and building Datums to > return. numeric.h does not contain the macros to do this. They are in > numeric.c. > Is there a way to access the values in the numeric structures? If not, > would a PR to move macros to numeric.h be welcome in the next commitfest? It's intentional that that stuff is not exposed, so no. What actual functionality do you need that numeric.h doesn't expose? regards, tom lane
On Mon, Sep 9, 2024 at 10:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Ed Behn <ed@behn.us> writes: > > I want to be able to have a function received and/or return numeric data. > > However, I'm having trouble getting data from Datums and building Datums to > > return. numeric.h does not contain the macros to do this. They are in > > numeric.c. > > > Is there a way to access the values in the numeric structures? If not, > > would a PR to move macros to numeric.h be welcome in the next commitfest? > > It's intentional that that stuff is not exposed, so no. > > What actual functionality do you need that numeric.h doesn't expose? I don't agree with this reponse at all. It seems entirely reasonable for third-party code to want to have a way to construct and interpret numeric datums. Keeping the details private would MAYBE make sense if the internal details were changing release to release, but that's clearly not the case. Even if it were, an extension author is completely entitled to say "hey, I'd rather have access to an unstable API and update my code for new releases" and we should accommodate that. If we don't, people don't give up on writing the code that they want to write -- they just cut-and-paste private declarations/code into their own source tree, which is WAY worse than if we just put the stuff in a .h file. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Sep 9, 2024 at 10:14 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It's intentional that that stuff is not exposed, so no. >> What actual functionality do you need that numeric.h doesn't expose? > I don't agree with this reponse at all. It seems entirely reasonable > for third-party code to want to have a way to construct and interpret > numeric datums. Keeping the details private would MAYBE make sense if > the internal details were changing release to release, but that's > clearly not the case. We have changed numeric's internal representation in the past, and I'd like to keep the freedom to do so again. There's been discussion for example of reconsidering the choice of NBASE to make more sense on 64-bit hardware. Yeah, maintaining on-disk compatibility limits what we can do there, but not as much as if some external module is in bed with the representation. > Even if it were, an extension author is > completely entitled to say "hey, I'd rather have access to an unstable > API and update my code for new releases" and we should accommodate > that. If we don't, people don't give up on writing the code that they > want to write -- they just cut-and-paste private declarations/code > into their own source tree, which is WAY worse than if we just put the > stuff in a .h file. IMO it'd be a lot better if numeric.c exposed whatever functionality Ed feels is missing, while keeping the contents of a numeric opaque. regards, tom lane
On Mon, Sep 9, 2024 at 1:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > We have changed numeric's internal representation in the past, and > I'd like to keep the freedom to do so again. There's been discussion > for example of reconsidering the choice of NBASE to make more sense > on 64-bit hardware. Yeah, maintaining on-disk compatibility limits > what we can do there, but not as much as if some external module > is in bed with the representation. I disagree with the idea that a contrib module looking at the details of a Numeric value means we can't make these kinds of updates. > > Even if it were, an extension author is > > completely entitled to say "hey, I'd rather have access to an unstable > > API and update my code for new releases" and we should accommodate > > that. If we don't, people don't give up on writing the code that they > > want to write -- they just cut-and-paste private declarations/code > > into their own source tree, which is WAY worse than if we just put the > > stuff in a .h file. > > IMO it'd be a lot better if numeric.c exposed whatever functionality > Ed feels is missing, while keeping the contents of a numeric opaque. We could certainly expose a bunch of functions, but I think that would actually be a bigger maintenance burden for us than just exposing some of the details that are currently private to numeric.c. It would also presumably be less performant, since it means somebody has to call a function rather than just using a macro. Also, this seems to me to be holding the numeric data type to a different standard than other things. For numeric, we have NumericData, NumericChoice, NumericShort, and NumericLong as structs that define the on-disk representation. They're in numeric.c. But ArrayType is in array.h. RangeType is in rangetypes.h. MultiRangeType is in multirangetypes.h. PATH and POLYGON are in geo_decls.h. inet and inet_data are in inet.h. int2vector and oidvector are in c.h (which seems like questionable placement, but I digress). And there must be tons of third-party code out there that knows how to interpret a text or bytea varlena. So it's not like we have some principled project-wide policy of hiding these implementation details. At first look, numeric seems like an outlier. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Sep 9, 2024 at 1:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> IMO it'd be a lot better if numeric.c exposed whatever functionality >> Ed feels is missing, while keeping the contents of a numeric opaque. > We could certainly expose a bunch of functions, but I think that would > actually be a bigger maintenance burden for us than just exposing some > of the details that are currently private to numeric.c. This whole argument is contingent on details that haven't been provided, namely exactly what it is that Ed wants to do that he can't do today. I think we should investigate that before deciding that publishing previously-private detail is the best solution. > Also, this seems to me to be holding the numeric data type to a > different standard than other things. By that argument, we should move every declaration in every .c file into c.h and be done. I'd personally be happier if we had *not* exposed the other data structure details you mention, but that ship has sailed. If we do do what you're advocating, I'd at least insist that the declarations go into a new file numeric_internal.h, so that it's clear to all concerned that they're playing with fire if they depend on that stuff. regards, tom lane
On 09/09/24 13:00, Robert Haas wrote: > I don't agree with this reponse at all. It seems entirely reasonable > for third-party code to want to have a way to construct and interpret > numeric datums. Keeping the details private would MAYBE make sense if > the internal details were changing release to release, but that's > clearly not the case. Even if it were, an extension author is > completely entitled to say "hey, I'd rather have access to an unstable > API and update my code for new releases" and we should accommodate > that. If we don't, people don't give up on writing the code that they > want to write -- they just cut-and-paste private declarations/code > into their own source tree, which is WAY worse than if we just put the > stuff in a .h file. Amen. https://tada.github.io/pljava/preview1.7/pljava-api/apidocs/org.postgresql.pljava/org/postgresql/pljava/adt/Numeric.html The above API documentation was written when the PostgreSQL source comments read "values of NBASE other than 10000 are considered of historical interest only and are no longer supported in any sense". I will have to generalize it a bit more if other NBASEs are now to be considered again. If Tom prefers the idea of keeping the datum layout strongly encapsulated (pretty much uniquely among PG data types) and providing only a callable C API for manipulating it, then I might propose something like the above- linked Java API as one source of API ideas. I think it's worth remembering that most PLs will have their own libraries (sometimes multiple alternatives) for arbitrary-precision numbers, and it's hard to generalize about /those/ libraries regarding what API they will provide for most efficiently and faithfully converting a foreign representation to or from their own. Conversion through a decimal string (a) may not be most efficient, and (b) may not faithfully roundtrip possible combinations of digits, displayScale, and weight. From Java's perspective, there has historically been a significant JNI overhead for calling from Java into a C API, so that it's advantageous to know the memory layout and keep the processing in Java. There is at last a batteries-included Java foreign-function interface that can make it less costly to call into a C API, but that has only landed in Java 22, which not everyone will be using right away. Regards, -Chap
On Mon, Sep 9, 2024 at 2:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > By that argument, we should move every declaration in every .c file > into c.h and be done. I'd personally be happier if we had *not* > exposed the other data structure details you mention, but that > ship has sailed. Not every declaration in every .c file is of general interest, but the ones that are probably should be moved into .h files. The on-disk representation of a commonly-used data type certainly qualifies. You can see from Chapman's reply that I'm not making this up: when we don't expose things, it doesn't keep people from depending on them, it just makes them copy our code into their own repository. That's not a win. It makes those extensions more fragile, not less, and it makes the PostgreSQL extension ecosystem worse. pg_hint_plan is another, recently-discussed example of this phenomenon: refuse to give people the keys, and they start hot-wiring stuff. > If we do do what you're advocating, I'd at least insist that the > declarations go into a new file numeric_internal.h, so that it's > clear to all concerned that they're playing with fire if they > depend on that stuff. I think that's a bit pointless considering that we don't do it in any of the other cases. I'd rather be consistent with our usual practice. But if it ends up in a separate header file that's still better than the status quo. -- Robert Haas EDB: http://www.enterprisedb.com
Sorry for taking so long to respond. I was at my day-job.
As I mentioned, I am the maintainer of the PL/Haskell language extension. This extension allows users to write code in the Haskell language. In order to use numeric types, I will need to create a Haskell type equivalent. Something like
data Numeric = PosInfinity | NegInfinity | NaN | Number Integer Int16
where the Number constructor represents a numeric's mantissa and weight.
In order to get or return data, I would need to be able to access those fields of the numeric type.
I'm not proposing giving access to the actual numeric structure. Rather, the data should be accessed by function call or macro. This would allow future changes to the inner workings without breaking compatibility as long as the interface is maintained. It looks to me like all of the code to access data exists, it should simply be made accessible. An additional function should exist that allows an extension to create a numeric structure by passing the needed data.
-Ed
On Mon, Sep 9, 2024 at 2:45 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Sep 9, 2024 at 2:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> By that argument, we should move every declaration in every .c file
> into c.h and be done. I'd personally be happier if we had *not*
> exposed the other data structure details you mention, but that
> ship has sailed.
Not every declaration in every .c file is of general interest, but the
ones that are probably should be moved into .h files. The on-disk
representation of a commonly-used data type certainly qualifies.
You can see from Chapman's reply that I'm not making this up: when we
don't expose things, it doesn't keep people from depending on them, it
just makes them copy our code into their own repository. That's not a
win. It makes those extensions more fragile, not less, and it makes
the PostgreSQL extension ecosystem worse. pg_hint_plan is another,
recently-discussed example of this phenomenon: refuse to give people
the keys, and they start hot-wiring stuff.
> If we do do what you're advocating, I'd at least insist that the
> declarations go into a new file numeric_internal.h, so that it's
> clear to all concerned that they're playing with fire if they
> depend on that stuff.
I think that's a bit pointless considering that we don't do it in any
of the other cases. I'd rather be consistent with our usual practice.
But if it ends up in a separate header file that's still better than
the status quo.
--
Robert Haas
EDB: http://www.enterprisedb.com
Good afternoon-
Was there a resolution of this? I'm wondering if it is worth it for me to submit a PR for the next commitfest.
-Ed
On Mon, Sep 9, 2024 at 8:40 PM Ed Behn <ed@behn.us> wrote:
Sorry for taking so long to respond. I was at my day-job.As I mentioned, I am the maintainer of the PL/Haskell language extension. This extension allows users to write code in the Haskell language. In order to use numeric types, I will need to create a Haskell type equivalent. Something likedata Numeric = PosInfinity | NegInfinity | NaN | Number Integer Int16where the Number constructor represents a numeric's mantissa and weight.In order to get or return data, I would need to be able to access those fields of the numeric type.I'm not proposing giving access to the actual numeric structure. Rather, the data should be accessed by function call or macro. This would allow future changes to the inner workings without breaking compatibility as long as the interface is maintained. It looks to me like all of the code to access data exists, it should simply be made accessible. An additional function should exist that allows an extension to create a numeric structure by passing the needed data.-EdOn Mon, Sep 9, 2024 at 2:45 PM Robert Haas <robertmhaas@gmail.com> wrote:On Mon, Sep 9, 2024 at 2:02 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> By that argument, we should move every declaration in every .c file
> into c.h and be done. I'd personally be happier if we had *not*
> exposed the other data structure details you mention, but that
> ship has sailed.
Not every declaration in every .c file is of general interest, but the
ones that are probably should be moved into .h files. The on-disk
representation of a commonly-used data type certainly qualifies.
You can see from Chapman's reply that I'm not making this up: when we
don't expose things, it doesn't keep people from depending on them, it
just makes them copy our code into their own repository. That's not a
win. It makes those extensions more fragile, not less, and it makes
the PostgreSQL extension ecosystem worse. pg_hint_plan is another,
recently-discussed example of this phenomenon: refuse to give people
the keys, and they start hot-wiring stuff.
> If we do do what you're advocating, I'd at least insist that the
> declarations go into a new file numeric_internal.h, so that it's
> clear to all concerned that they're playing with fire if they
> depend on that stuff.
I think that's a bit pointless considering that we don't do it in any
of the other cases. I'd rather be consistent with our usual practice.
But if it ends up in a separate header file that's still better than
the status quo.
--
Robert Haas
EDB: http://www.enterprisedb.com
On Sat, Sep 14, 2024 at 2:10 PM Ed Behn <ed@behn.us> wrote: > Was there a resolution of this? I'm wondering if it is worth it for me to submit a PR for the next commitfest. Well, it seems like what you want is different than what I want, and what Tom wants is different from both of us. I'd like there to be a way forward here but at the moment I'm not quite sure what it is. -- Robert Haas EDB: http://www.enterprisedb.com
Upon further review, I've updated the patch. This avoids possible name conflicts with other functions. See attached.
On Sat, Feb 8, 2025 at 3:24 PM Ed Behn <ed@behn.us> wrote:
I've created a patch (attached) to implement the changes discussed below.This change moves the definition of the NumericVar structure to numeric.h. Function definitions for functions used to work with NumericVar are also moved to the header as are definitions of functions used to convert NumericVar to Numeric. (Numeric is used to store numeric and decimal types.)All of this is so that third-party libraries can access numeric and decimal values without having to access the opaque Numeric structure.There is actually no new code. Code is simply moved from numeric.c to numeric.h.This is a patch against branch master.This successfully compiles and is tested with regression tests.Also attached is a code sample that uses the change.Please provide feedback. I'm planning to submit this for the March commitfest.-EdOn Wed, Sep 18, 2024 at 9:50 AM Robert Haas <robertmhaas@gmail.com> wrote:On Sat, Sep 14, 2024 at 2:10 PM Ed Behn <ed@behn.us> wrote:
> Was there a resolution of this? I'm wondering if it is worth it for me to submit a PR for the next commitfest.
Well, it seems like what you want is different than what I want, and
what Tom wants is different from both of us. I'd like there to be a
way forward here but at the moment I'm not quite sure what it is.
--
Robert Haas
EDB: http://www.enterprisedb.com
Attachment
Ed Behn <ed@behn.us> writes: >> There is actually no new code. Code is simply moved from numeric.c to >> numeric.h. I will absolutely not hold still for that. It would mean that any time we want to think about messing with the contents of numerics, we need to examine more or less the whole Postgres code base to see what else is poking into those structures. If we must do something like this, then a separate header "numeric_internal.h" or something like that would reduce the blast radius for changes. But IMO you still haven't made an acceptable case for deciding that these data structures aren't private to numeric.c. What behaviors do you actually need that aren't accessible via the existing exported functons? regards, tom lane
Tom-
I think that I can allay your concerns. Please give me a day or so to put together my more complete thoughts on the matter. I'll be in touch.
-Ed
On Sat, Mar 1, 2025 at 11:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ed Behn <ed@behn.us> writes:
>> There is actually no new code. Code is simply moved from numeric.c to
>> numeric.h.
I will absolutely not hold still for that. It would mean that any
time we want to think about messing with the contents of numerics,
we need to examine more or less the whole Postgres code base to see
what else is poking into those structures.
If we must do something like this, then a separate header
"numeric_internal.h" or something like that would reduce the blast
radius for changes. But IMO you still haven't made an acceptable case
for deciding that these data structures aren't private to numeric.c.
What behaviors do you actually need that aren't accessible via the
existing exported functons?
regards, tom lane
On Sat, 1 Mar 2025 at 17:33, Tom Lane <tgl@sss.pgh.pa.us> wrote: > But IMO you still haven't made an acceptable case > for deciding that these data structures aren't private to numeric.c. > What behaviors do you actually need that aren't accessible via the > existing exported functons? FWIW in pg_duckdb we would definitely have liked to have access to some of the currently unexposed numeric internals. We vendored in some of the definitions that we required ourselves[1], but it would be very nice if we didn't have to. I cannot speak for Ed's reasoning, but for pg_duckdb the reason we cannot use the many of the already exposed functions are not thread-safe. So we create our own thread-safe way of constructing these functions. Another reason is so we can construct numeric types from int128/uint128 types efficiently. I would be very happy if we'd have these definitions available, even if they would change in a future PG version (that's what you sign up for as an extension author). And indeed as Robert already said, numeric seems to be the only type that has this kind of restriction on viewing the internal representation. [1]: https://github.com/duckdb/pg_duckdb/blob/main/include/pgduckdb/vendor/pg_numeric_c.hpp
Tom-
I understand that you are concerned about future maintenance costs vs benefits of this change. I hope that I can address those concerns. An important thing to note is that there are two different structures that represent numeric values:
* NumericData is an opaque structure that is defined in numeric.c. It is this struct that is used to store values. The patch I submitted has this structure remain opaque and in numeric.c. Its internals are messy and subject to future changes. I agree that third parties should not have access to this. Of note is that the type Numeric is a typedef of NumericData*.
* NumericVar is a user-friendly structure that already exists. It is this structure that I propose moving to numeric.h. There are functions that exist to convert it to and from NumericData. It is these functions that I propose giving access to.
What the patch boils down to is the movement of NumericVar to numeric.h along with function declarations for the basic function to work with it and a few pre-processor declarations.
I agree that there is the potential for future maintenance costs here. However, any future changes to NumericData would necessitate updating the code to convert to and from NumericVar regardless of the proposed changes. I think that this small increase in costs is outweighed by the benefits of allowing third parties to access this powerful datatype.
As for the reason that I would like to make this change: I am the maintainer of the PL/Haskell extension. (It allows the use of Haskell code as a procedural language. https://github.com/ed-o-saurus/PLHaskell) In the extension, users can currently pass several Postgres types and also have the function return them. This is accomplished by using the functions and macros that convert between Datums and C data types. (For example DatumGetFloat8 and Float8GetDatum to handle double-precision floating point values) I would like to add support for the use of the numeric type to the extension. To this end, I would need to create a Haskell type that mirrors the Postgres numeric type. Passed Haskell values would be instantiated by reading the data from Postgres. Conversely, returned values would be converted to the Postgres type. Internally, users would be able to perform mathematical operations with the Haskell values like any other type. Currently, there is no way for a third-party extension to get needed information about numeric values or build new numeric values. The proposed changes would remedy this.
An alternative approach would be to make available calls to read and create numeric data. However, as the NumericVar struct already exists, I feel that utilizing it is the more natural approach.
What do you think?
-Ed
I understand that you are concerned about future maintenance costs vs benefits of this change. I hope that I can address those concerns. An important thing to note is that there are two different structures that represent numeric values:
* NumericData is an opaque structure that is defined in numeric.c. It is this struct that is used to store values. The patch I submitted has this structure remain opaque and in numeric.c. Its internals are messy and subject to future changes. I agree that third parties should not have access to this. Of note is that the type Numeric is a typedef of NumericData*.
* NumericVar is a user-friendly structure that already exists. It is this structure that I propose moving to numeric.h. There are functions that exist to convert it to and from NumericData. It is these functions that I propose giving access to.
What the patch boils down to is the movement of NumericVar to numeric.h along with function declarations for the basic function to work with it and a few pre-processor declarations.
I agree that there is the potential for future maintenance costs here. However, any future changes to NumericData would necessitate updating the code to convert to and from NumericVar regardless of the proposed changes. I think that this small increase in costs is outweighed by the benefits of allowing third parties to access this powerful datatype.
As for the reason that I would like to make this change: I am the maintainer of the PL/Haskell extension. (It allows the use of Haskell code as a procedural language. https://github.com/ed-o-saurus/PLHaskell) In the extension, users can currently pass several Postgres types and also have the function return them. This is accomplished by using the functions and macros that convert between Datums and C data types. (For example DatumGetFloat8 and Float8GetDatum to handle double-precision floating point values) I would like to add support for the use of the numeric type to the extension. To this end, I would need to create a Haskell type that mirrors the Postgres numeric type. Passed Haskell values would be instantiated by reading the data from Postgres. Conversely, returned values would be converted to the Postgres type. Internally, users would be able to perform mathematical operations with the Haskell values like any other type. Currently, there is no way for a third-party extension to get needed information about numeric values or build new numeric values. The proposed changes would remedy this.
An alternative approach would be to make available calls to read and create numeric data. However, as the NumericVar struct already exists, I feel that utilizing it is the more natural approach.
What do you think?
-Ed
On Sat, Mar 1, 2025 at 3:32 PM Ed Behn <ed@behn.us> wrote:
Tom-I think that I can allay your concerns. Please give me a day or so to put together my more complete thoughts on the matter. I'll be in touch.-EdOn Sat, Mar 1, 2025 at 11:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:Ed Behn <ed@behn.us> writes:
>> There is actually no new code. Code is simply moved from numeric.c to
>> numeric.h.
I will absolutely not hold still for that. It would mean that any
time we want to think about messing with the contents of numerics,
we need to examine more or less the whole Postgres code base to see
what else is poking into those structures.
If we must do something like this, then a separate header
"numeric_internal.h" or something like that would reduce the blast
radius for changes. But IMO you still haven't made an acceptable case
for deciding that these data structures aren't private to numeric.c.
What behaviors do you actually need that aren't accessible via the
existing exported functons?
regards, tom lane