Thread: Is a UDF binary portable across different minor releases and PostgreSQL distributions?


From: "Tsunakawa, Takayuki"
Hello,

While I was thinking of application binary compatibility between PostgreSQL releases, some questions arose about C language user-defined functions (UDFs) and extensions that depend on them.

[Q1]
Can the same UDF binary be used with different PostgreSQL minor releases?  If so, is that a defined policy (e.g. written somewhere in the manual, the wiki, or documentation in the source code)?

For example, suppose you build a UDF X (some_extension.so/dll) with PostgreSQL 9.5.0.  Can I use the binary with PostgreSQL 9.5.x without rebuilding?

Here, the UDF references the contents of server-side data structures, as pgstattuple accesses the members of HeapScanData.  If a bug fix in PostgreSQL changes the member layout of those structures, the UDF binary could misbehave.  Basically, should all UDFs be rebuilt with each new minor release?  Or are PostgreSQL developers aware of such incompatibilities and careful not to change data structure layouts?
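To make the layout concern concrete, here is a minimal, self-contained C sketch (not PostgreSQL code). It shows two hypothetical versions of a scan-state struct, where a hypothetical bug fix inserts a member in the middle. A binary compiled against the old header keeps using the old member offset, so it silently reads a different field. The struct and field names are invented for illustration and do not mirror the real server structures.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout shipped in release x.y.0 (names are illustrative). */
struct scan_state_v0 {
    void     *relation;
    uint32_t  nblocks;
    uint32_t  current_block;
};

/* Hypothetical layout after a bug fix in x.y.1 inserts a member mid-struct. */
struct scan_state_v1 {
    void     *relation;
    uint32_t  flags;          /* member added by the fix */
    uint32_t  nblocks;
    uint32_t  current_block;
};

/* Offset baked into a binary compiled against the old header. */
size_t old_offset_of_current_block(void)
{
    return offsetof(struct scan_state_v0, current_block);
}

/* Offset the upgraded server actually uses for the same member. */
size_t new_offset_of_current_block(void)
{
    return offsetof(struct scan_state_v1, current_block);
}

/*
 * Simulates the stale binary: it reads "current_block" at the old offset
 * from a struct laid out by the new server, and silently gets another field.
 */
uint32_t stale_read_current_block(const struct scan_state_v1 *s)
{
    uint32_t value;

    memcpy(&value, (const char *) s + old_offset_of_current_block(), sizeof value);
    return value;
}
```

With `nblocks = 100` and `current_block = 7`, the stale read returns 100 rather than 7, and nothing crashes. This is why such breakage can go unnoticed for a while.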
 


[Q2]
Can the same UDF binary be used with different PostgreSQL distributions (EnterpriseDB, OpenSCG, RHEL packages, etc.)?  Or should the UDF be built with the target distribution?

I guess a rebuild is necessary if the distribution has modified the PostgreSQL source code.  That is, a UDF binary built with unmodified community PostgreSQL cannot be used with EnterpriseDB's advanced edition, which may modify various data structures.

How about other distributions, which probably don't modify the source code?  Should the UDF still be built with the target PostgreSQL, since configure options may differ and affect data structures?


Regards
Takayuki Tsunakawa





On Fri, Jul 1, 2016 at 9:33 AM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> [Q1]
> Can the same UDF binary be used with different PostgreSQL minor releases?  If it is, is it a defined policy (e.g. written somewhere in the manual, wiki, documentation in the source code)?
>
> For example, suppose you build a UDF X (some_extension.so/dll) with PostgreSQL 9.5.0.  Can I use the binary with PostgreSQL 9.5.x without rebuilding?

Yes, that works properly. There could be problems with potential
changes in the backend APIs in a stable branch, but this usually does
not happen much.

> Here, the UDF references the contents of server-side data structures, like pgstattuple accesses the members of HeapScanData. If some bug fix of PostgreSQL changes the member layout of those structures, the UDF binary would possibly misbehave.  Basically, should all UDFs be rebuilt with the new minor release?

Not necessarily.

> Or, are PostgreSQL developers aware of such incompatibility and careful not to change data structure layout?

Committers are aware and careful about that, that's why exposed APIs
and structures are normally kept stable. At least that's what I see.

> [Q2]
> Can the same UDF binary be used with different PostgreSQL distributions (EnterpriseDB, OpenSCG, RHEL packages, etc.)?
Or should the UDF be built with the target distribution? 

Each distribution usually has its own compilation options (say, page
size), although I recall that most of them use the defaults, so it
clearly depends on what each of them uses. I would recommend
recompiling just to be safe. It may not be worth spending time
checking each one's differences.
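For reference, the safety net PostgreSQL itself provides here is the PG_MODULE_MAGIC block that every loadable module must contain. At load time the server compares a few build parameters recorded in the module against its own and refuses to load the module on a mismatch. In the 9.5 series, to my knowledge, these parameters are the major version (PG_VERSION_NUM / 100) plus FUNC_MAX_ARGS, INDEX_MAX_KEYS, NAMEDATALEN, and the float4/float8 pass-by-value settings. The minor version is not compared, and other configure-time choices, such as block size, are not checked this way. The sketch below is a simplified, stand-alone re-creation of that comparison. The struct and function names are hypothetical, and the hard-coded values are assumed defaults (float8 pass-by-value assumes a 64-bit build), not the server's actual code.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical mirror of the fields a 9.5-era magic block records. */
typedef struct
{
    int len;            /* sizeof(this struct) */
    int version;        /* PG_VERSION_NUM / 100: major version only */
    int funcmaxargs;    /* FUNC_MAX_ARGS (default assumed: 100) */
    int indexmaxkeys;   /* INDEX_MAX_KEYS (default assumed: 32) */
    int namedatalen;    /* NAMEDATALEN (default 64) */
    int float4byval;    /* FLOAT4PASSBYVAL */
    int float8byval;    /* FLOAT8PASSBYVAL (assumes a 64-bit build) */
} sketch_magic;

/* Builds the record a module would embed, from a full PG_VERSION_NUM value. */
sketch_magic sketch_magic_for(int pg_version_num, int namedatalen)
{
    sketch_magic m = { (int) sizeof(sketch_magic), pg_version_num / 100,
                       100, 32, namedatalen, 1, 1 };
    return m;
}

/* True if, in this sketch, the server would accept the module. */
bool sketch_magic_compatible(const sketch_magic *server, const sketch_magic *module)
{
    return memcmp(server, module, sizeof(sketch_magic)) == 0;
}
```

In this model a module built against 9.5.0 loads into a 9.5.3 server, while a different NAMEDATALEN or a 9.4 build is rejected. Passing the check says nothing about struct layouts that a minor release or a distribution's other build options may have changed.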

> I guess the rebuild is necessary if the distribution modified the source code of PostgreSQL.  That is, the UDF binary built with the bare PostgreSQL cannot be used with EnterpriseDB's advanced edition, which may modify various data structures.

That's for sure.

> How about other distributions which probably don't modify the source code?  Should the UDF be built with the target PostgreSQL because configure options may differ, which affects data structures?

It depends on how they build it, but recompiling is the safest bet to
avoid surprises. I recall seeing extension code that caused a
SIGSEGV with fclose(NULL) on SLES but only reported an error on
Ubuntu. The code was faulty in that case, but recompiling is usually
the better bet for stability.
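The fclose(NULL) case is a good example of platform-dependent failure. Passing NULL to fclose() is undefined behaviour in C, so one C library may crash while another returns an error. A minimal defensive pattern is to check fopen()'s result before using or closing the stream. The helper below is hypothetical and written in plain C; inside a backend extension you would more likely use the server's own file APIs.

```c
#include <stdio.h>

/*
 * Hypothetical helper: returns the number of bytes in the file at 'path',
 * or -1 if it cannot be opened. fclose() is only ever called on a stream
 * that fopen() actually returned, never on NULL.
 */
long count_bytes(const char *path)
{
    FILE *fp = fopen(path, "rb");
    long  n = 0;

    if (fp == NULL)
        return -1;              /* report the failure instead of fclose(NULL) */

    while (fgetc(fp) != EOF)
        n++;

    fclose(fp);
    return n;
}
```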
--
Michael



> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier
> On Fri, Jul 1, 2016 at 9:33 AM, Tsunakawa, Takayuki
> <tsunakawa.takay@jp.fujitsu.com> wrote:
> > [Q1]
> > Can the same UDF binary be used with different PostgreSQL minor releases?
> If it is, is it a defined policy (e.g. written somewhere in the manual,
> wiki, documentation in the source code)?
> >
> > For example, suppose you build a UDF X (some_extension.so/dll) with
> PostgreSQL 9.5.0.  Can I use the binary with PostgreSQL 9.5.x without
> rebuilding?
> 
> Yes, that works properly. There could be problems with potential changes
> in the backend APIs in a stable branch, but this usually does not happen
> much.
> 
> > Here, the UDF references the contents of server-side data structures,
> like pgstattuple accesses the members of HeapScanData.  If some bug fix
> of PostgreSQL changes the member layout of those structures, the UDF binary
> would possibly misbehave.  Basically, should all UDFs be rebuilt with the
> new minor release?
> 
> Not necessarily.
> 
> > Or, are PostgreSQL developers aware of such incompatibility and careful
> not to change data structure layout?
> 
> Committers are aware and careful about that, that's why exposed APIs and
> structures are normally kept stable. At least that's what I see.
> 
> > [Q2]
> > Can the same UDF binary be used with different PostgreSQL distributions
> (EnterpriseDB, OpenSCG, RHEL packages, etc.)?  Or should the UDF be built
> with the target distribution?
> 
> Each distribution has usually its own compilation options (say page size,
> etc.) even if I recall that most of them use the defaults, so it clearly
> depends on what kind of things each of them uses. I would recommend a
> recompilation just to be safe. It may not be worth spending time at looking
> and checking each one's differences.

Thanks for sharing your experience, Michael.

I'd like to document the policy clearly in the upgrade section of the PostgreSQL manual, eliminating any ambiguity, so that users can determine what they should do without the fear of "may or may not work".  Which of the following policies should I base it on?

Option 1:
Rebuild UDFs with the target PostgreSQL distribution and minor release.

Option 2:
Rebuild UDFs with the target PostgreSQL distribution.
You do not have to rebuild UDFs when you upgrade or downgrade the minor release.  (If your UDF doesn't work after changing the minor release, it's a bug in PostgreSQL.  You can report it to pgsql-bugs.)


Regards
Takayuki Tsunakawa


On Fri, Jul 1, 2016 at 10:35 AM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> I'd like to document the policy clearly in the upgrade section of PostgreSQL manual, eliminating any ambiguity, so that users can determine what they should do without fear like "may or may not work".  Which of the following policies should I base on?
>
> Option 1:
> Rebuild UDFs with the target PostgreSQL distribution and minor release.
>
> Option 2:
> Rebuild UDFs with the target PostgreSQL distribution.
> You do not have to rebuild UDFs when you upgrade or downgrade the minor release.  (If your UDF doesn't work after changing the minor release, it's the bug of PostgreSQL.  You can report it to pgsql-bugs.)

That would not be a bug of PostgreSQL, the terms are incorrect. If
there is an API breakage, the extension needs to keep up in this case,
so it would be better to mention asking on the lists what may have
gone wrong.
-- 
Michael



"Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> writes:
> I'd like to document the policy clearly in the upgrade section of PostgreSQL manual, eliminating any ambiguity, so that users can determine what they should do without fear like "may or may not work".  Which of the following policies should I base on?

> Option 1:
> Rebuild UDFs with the target PostgreSQL distribution and minor release.

> Option 2:
> Rebuild UDFs with the target PostgreSQL distribution.
> You do not have to rebuild UDFs when you upgrade or downgrade the minor release.  (If your UDF doesn't work after changing the minor release, it's the bug of PostgreSQL.  You can report it to pgsql-bugs.)

I do not like either of those.  We try hard not to break extensions in
minor releases, but I'm not willing to state it as a hard-and-fast policy
that we never will --- especially because there's no bright line as to
which internal APIs extensions can rely on or not.  With sufficiently
negative assumptions about what third-party authors might have chosen to
do, it could become impossible to fix anything at all in released
branches.

In practice, extensions seldom need to be modified for new minor releases.
But there's a long way between that statement and a promise that it won't
ever happen for any conceivable extension.

To make this situation better, what we'd really need is a bunch of work
to identify and document the specific APIs that we would promise won't
change within a release branch.  That idea has been batted around before,
but nobody's stepped up to do all the tedious (and, no doubt, contentious)
work that would be involved.
        regards, tom lane



> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier
> On Fri, Jul 1, 2016 at 10:35 AM, Tsunakawa, Takayuki
> <tsunakawa.takay@jp.fujitsu.com> wrote:
> > I'd like to document the policy clearly in the upgrade section of
> PostgreSQL manual, eliminating any ambiguity, so that users can determine
> what they should do without fear like "may or may not work".  Which of the
> following policies should I base on?
> >
> > Option 1:
> > Rebuild UDFs with the target PostgreSQL distribution and minor release.
> >
> > Option 2:
> > Rebuild UDFs with the target PostgreSQL distribution.
> > You do not have to rebuild UDFs when you upgrade or downgrade the
> > minor release.  (If your UDF doesn't work after changing the minor
> > release, it's the bug of PostgreSQL.  You can report it to
> > pgsql-bugs.)
> 
> That would not be a bug of PostgreSQL, the terms are incorrect. If there
> is an API breakage, the extension needs to keep up in this case, so it would
> be better to mention asking on the lists what may have gone wrong.

OK, I understand that your choice is option 2, and that the UDF developer should report the problem and ask for its cause on pgsql-bugs, possibly ending up having to rebuild the UDF.  But if so, it sounds like option 1.  That is, "For safety, rebuild your UDF with each minor release.  That way, you can avoid severe problems that might take time to surface."  I wonder if this is similar to Linux's loadable kernel modules.

I'd like to hear opinions from other decision makers here, as well as from Michael, before proceeding.


Regards
Takayuki Tsunakawa


On Fri, Jul 1, 2016 at 11:33 AM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> OK, I understood that your choice is option 2.  And the UDF developer should report the problem and ask for its reason on pgsql-bugs, possibly end up having to rebuild the UDF.  But if so, it sounds like option 1.  That is, "For safety, rebuild your UDF with each minor release.  That way, you can avoid severe problems that might take time to pop up above water."  I wonder if this is similar to the Linux's loadable kernel modules.
> I'd like to hear opinions from other decision makers here before proceeding, as well as Michael.

Speaking from past experience, I once got bitten by the change of
signature of DefineIndex() done in 820ab11 with 9.2, because at some
point the tree I was maintaining kept a static copy of Postgres code,
and bootparse.c (!) was in the set. Guess the result. That was a lot
of fun to debug, finding why Postgres kept crashing at initdb, and
extensions could blow up similarly if they expect routines with a
different signature.
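A common way for extension code to cope with such signature changes is to branch on PG_VERSION_NUM at compile time. pg_config.h defines this value, e.g. 90502 for 9.5.2. Because the branch is resolved when the module is built, it only helps if you rebuild against the server you actually run, which is one more argument for recompiling. The sketch below uses a hypothetical routine that, in this example only, gains a parameter at an arbitrarily chosen version. The fallback #define exists solely so the snippet compiles on its own.

```c
#include <stdbool.h>

/* Normally provided by pg_config.h; the fallback only lets this sketch compile standalone. */
#ifndef PG_VERSION_NUM
#define PG_VERSION_NUM 90502
#endif

/*
 * Stand-in for a server routine whose signature (hypothetically) gained a
 * 'concurrent' parameter at 90502. In a real extension the prototype would
 * come from a server header rather than being defined here.
 */
#if PG_VERSION_NUM >= 90502
static int hypothetical_define_thing(const char *name, bool concurrent)
{
    (void) name;
    return concurrent ? 2 : 1;
}
#else
static int hypothetical_define_thing(const char *name)
{
    (void) name;
    return 1;
}
#endif

/* Extension-side wrapper that hides the signature difference from callers. */
int compat_define_thing(const char *name)
{
#if PG_VERSION_NUM >= 90502
    return hypothetical_define_thing(name, false);
#else
    return hypothetical_define_thing(name);
#endif
}
```

The wrapper keeps the version conditionals in one place. The rest of the extension calls compat_define_thing() regardless of which server headers it was built against.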

Since then I have played it safe: all my in-house backend extensions
get recompiled for each minor release, as well as at each point in
between. So for the internal stuff I work on, it is clearly option 1.

Even if the docs listed routines that are promised not to break,
in some cases it would be really hard to keep that promise. Looking
at the diff of 820ab11, for example, my guess is that there was a lot
of discussion around this change, and in the end the signature of
DefineIndex had to change, for the better.

Now, speaking from the heart, it is something of a waste to have to
recompile all the time... But if you look at any package maintainer's
history, there are rebuilds triggered from time to time by changes in
dependent libraries like OpenSSL. Take this one, for example:
https://git.archlinux.org/svntogit/packages.git/log/trunk?h=packages/postgresql

So perhaps the best answer is neither 1 nor 2: just say that the
routines are carefully maintained on a best-effort basis, though
sometimes you may need to rebuild because of unavoidable changes in
routine signatures that had to be introduced.
--
Michael



> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> "Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com> writes:
> > Option 2:
> > Rebuild UDFs with the target PostgreSQL distribution.
> > You do not have to rebuild UDFs when you upgrade or downgrade the
> > minor release.  (If your UDF doesn't work after changing the minor
> > release, it's the bug of PostgreSQL.  You can report it to
> > pgsql-bugs.)
> 
> I do not like either of those.  We try hard not to break extensions in minor
> releases, but I'm not willing to state it as a hard-and-fast policy that
> we never will --- especially because there's no bright line as to which
> internal APIs extensions can rely on or not.  With sufficiently negative
> assumptions about what third-party authors might have chosen to do, it could
> become impossible to fix anything at all in released branches.

I sympathize, but I think something needs to be documented so that users can upgrade and/or change distributions with confidence.  In practice, though it may be a shame, isn't option 1 the current answer?

Again, the current situation seems similar to Linux loadable kernel modules.  So PostgreSQL is not alone.  See the "Binary compatibility" section in:

https://en.wikipedia.org/wiki/Loadable_kernel_module


> In practice, extensions seldom need to be modified for new minor releases.
> But there's a long way between that statement and a promise that it won't
> ever happen for any conceivable extension.

I think so, too.

> To make this situation better, what we'd really need is a bunch of work
> to identify and document the specific APIs that we would promise won't change
> within a release branch.  That idea has been batted around before, but
> nobody's stepped up to do all the tedious (and, no doubt, contentious) work
> that would be involved.

I can't yet imagine whether such an API (including data structures) can really be defined so that UDF developers feel comfortable with its flexibility.  I wonder how other OSes provide such an API and ABI.

Regards
Takayuki Tsunakawa




On Fri, Jul 1, 2016 at 12:19 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
>> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>> To make this situation better, what we'd really need is a bunch of work
>> to identify and document the specific APIs that we would promise won't change
>> within a release branch.  That idea has been batted around before, but
>> nobody's stepped up to do all the tedious (and, no doubt, contentious) work
>> that would be involved.
>
> I can't yet imagine if such API (including data structures) can really be defined so that UDF developers feel comfortable with its flexibility.  I wonder how other OSes provide such API and ABI.

That would be a lot of work for little result. And in the end, zero
risk does not exist and things may change. I still quite like an
answer that mixes 1 and 2: we do our best to keep the backend APIs
stable, but be aware that things may break if a change proves to be
necessary.
-- 
Michael



> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier
> So perhaps the best answer, is not 1 nor 2. Just saying that the routines
> are carefully maintained with a best effort, though sometimes you may need
> to rebuild depending on unavoidable changes in routine signatures that had
> to be introduced.

Good, I'd like to use that "mild" expression in the manual.  Although the expression is mild, the reality for users is not, is it?

Because UDF developers and users cannot easily or reliably determine whether rebuilding is necessary, nervous (enterprise) users will rebuild their UDFs with each minor release for maximum safety, as Michael does.

Regards
Takayuki Tsunakawa


On 1 July 2016 at 08:33, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote:
> Hello,
>
> While I was thinking of application binary compatibility between PostgreSQL releases, some questions arose about C language user-defined functions (UDFs) and extensions that depend on them.
>
> [Q1]
> Can the same UDF binary be used with different PostgreSQL minor releases?  If it is, is it a defined policy (e.g. written somewhere in the manual, wiki, documentation in the source code)?
>
> For example, suppose you build a UDF X (some_extension.so/dll) with PostgreSQL 9.5.0.  Can I use the binary with PostgreSQL 9.5.x without rebuilding?

Probably - but we don't guarantee it.

There's no formal extension API. So there's no boundary between "internal stuff we might have to change to fix a problem" and "things extensions can rely on not changing under them". In theory anything that changed behaviour or changed a header file in almost any way could break an extension.

There's no deliberate breakage and some awareness of possible consequences to extensions, but no formal process. I would prefer that the manual explicitly recommend recompiling extensions against each minor update (or updating them along with the packages), and advise that packagers make their extensions depend on an = minor version in their package specifications, not a >= .

However, in practice it's fine almost all the time.

I think making this more formal would require, as Tom noted, a formal extension API we can promise to maintain, likely incorporating:

- fmgr
- datatype functions and macros
- elog and other core infrastructure
- major shmem structures
- GUC variables
- plan nodes and command structs
- SPI
- replication origins
- bgworkers
- catalog definitions
- ... endlessly more

To actually ensure extensions conform to the API we'd probably have to build with -fvisibility=hidden (gcc) and on Windows change our .def generation, so we don't expose anything that's not part of the formal API. That's a very strict boundary though; there's no practical way an extension can say "I know what I'm doing, gimme the internals anyway" and reach through it. I'd prefer a soft boundary that spat warnings when you touch stuff you're not allowed to, but I don't know of any good way to do that that works across multiple compilers and toolchains.
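For readers unfamiliar with the mechanism, here is a minimal sketch of what building with -fvisibility=hidden means on GCC/Clang. Every symbol is left out of the shared library's dynamic symbol table unless it is explicitly marked with the visibility("default") attribute, so other modules cannot bind to unmarked symbols at all. The EXT_API macro and function names are hypothetical, not PostgreSQL's. On Windows, __declspec(dllexport) or the .def file plays the equivalent role.

```c
/* Build as a shared library with, e.g.: cc -shared -fPIC -fvisibility=hidden api.c */
#include <stddef.h>

#if defined(__GNUC__)
#define EXT_API __attribute__((visibility("default")))
#else
#define EXT_API
#endif

/* Internal helper: not exported under -fvisibility=hidden, so modules cannot link to it. */
int internal_checksum(const unsigned char *buf, int len)
{
    int sum = 0;

    for (int i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Part of the hypothetical formal API: exported regardless of the flag. */
EXT_API int api_checksum(const unsigned char *buf, int len)
{
    return buf == NULL ? -1 : internal_checksum(buf, len);
}
```

This is the "very strict boundary" described above. A hidden symbol simply does not exist for the dynamic linker, so no compiler warning lets an extension opt in anyway.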

We'd almost certainly have to allow ourselves to _expand_ the API in minor releases since otherwise the early introduction of the formal API would be a nightmare. That's fine on pretty much every platform though.
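Expanding an API without breaking existing binaries usually means only appending: adding new functions, or adding new members at the end of a struct whose users record its size. (PostgreSQL's magic block similarly records its own size in a leading len field.) The sketch below, with hypothetical struct and field names, shows a server-side reader that accepts both an older, shorter layout and the extended one. All members are 32-bit so that neither layout has padding bytes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 1.0 layout of a struct passed from an extension to the server. */
typedef struct {
    uint32_t len;         /* sizeof the struct the caller was compiled with */
    int32_t  flags;
} opts_v1;

/* Hypothetical 1.1 layout: the new member is appended, never inserted. */
typedef struct {
    uint32_t len;
    int32_t  flags;
    int32_t  timeout_ms;  /* added in the later release */
} opts_v2;

/*
 * Server-side reader: copies only the bytes the caller says it has and keeps
 * defaults for members an older caller does not know about.
 */
opts_v2 read_opts(const void *caller_opts)
{
    opts_v2  out = { (uint32_t) sizeof(opts_v2), 0, 1000 };  /* 1000 ms: hypothetical default */
    uint32_t caller_len;

    memcpy(&caller_len, caller_opts, sizeof caller_len);
    if (caller_len > sizeof out)
        caller_len = (uint32_t) sizeof out;
    if (caller_len >= sizeof caller_len)
        memcpy(&out, caller_opts, caller_len);
    out.len = (uint32_t) sizeof out;
    return out;
}
```

Inserting timeout_ms before flags instead would shift flags and break every caller compiled against 1.0, which is the failure mode from the original question.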

The main thing is that it's a great deal of work for limited benefit. I don't know about you, but I'm not keen.
 
> Can the same UDF binary be used with different PostgreSQL distributions (EnterpriseDB, OpenSCG, RHEL packages, etc.)?  Or should the UDF be built with the target distribution?

Not especially safely.

If you verified that all the compiler flags were the same, and that your extension doesn't transitively reference bundled libraries that might be different, incompatible versions (notably gettext, which Pg exposes in its own headers), you're probably OK.

Again, in practice it generally works, but I wouldn't recommend it. Nor is this something we can easily address with an extension API policy.
 
> How about other distributions which probably don't modify the source code?  Should the UDF be built with the target PostgreSQL because configure options may differ, which affects data structures?

Yeah. And exposed ABI. I don't recommend it.

It's probably safe-ish on MS Windows, which is designed to allow greater compatibility between executables built with differing toolchains and options. I wouldn't do it on any unix.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Craig Ringer
> There's no formal extension API. So there's no boundary between "internal stuff we might have to change to fix a problem" and "things extensions can rely on not changing under them". In theory anything that changed behaviour or changed a header file in almost any way could break an extension.
>
> There's no deliberate breakage and some awareness of possible consequences to extensions, but no formal process. I would prefer that the manual explicitly recommend recompiling extensions against each minor update (or updating them along with the packages), and advise that packagers make their extensions depend on an = minor version in their package specifications, not a >= .

Yes, I think such a recommendation in the manual is the best approach.

> However, in practice it's fine almost all the time.

Maybe most extensions don’t use sensitive parts of the server…

> I think making this more formal would require, as Tom noted, a formal extension API we can promise to maintain, likely incorporating:
>
> - ... endlessly more

Endless (^^;)

> The main thing is that it's a great deal of work for limited benefit. I don't know about you, but I'm not keen.

I’m not keen, either… I don’t think I can design an API that advanced extension developers would be satisfied with.  I’ll just document compatibility in the upgrade section.

Regards

Takayuki Tsunakawa