Thread: Stored procedure code no longer stored in v14 and v15, changed behaviour

Stored procedure code no longer stored in v14 and v15, changed behaviour

From
"Martijn Tonies \(Upscene Productions\)"
Date:
Hi all,

In PosgreSQL version 13, the source code for a stored procedure or function 
in SQL/plpgsql/etc was stored in pg_proc.prosrc. This column would hold the 
original procedure or function body, verbatim.

Since version 14, the source code for a stored procedure or function written 
in plain (compound) SQL, a new feature, is no longer stored in 
pg_proc.prosrc, instead, there’s an additional column prosqlbody which 
returns some kind of pre-parsed SQL which has no use for the user.

I know you can use pg_get_functiondef to get a CREATE PROCEDURE/FUNCTION 
statement, but for compound SQL based routines in v14 and v15, this removes 
all formatting and comments in the original CREATE statement. For database 
development, including browsing an existing database and know what each 
routine can do, this is quite terrible.


What is the reason for this?


Can you modify the server code to store the original body in proc.prosrc 
again? It would be very helpful.


With regards,

Martijn Tonies
Upscene Productions
https://www.upscene.com

Database Workbench - developer tool for Oracle, SQL Server, PostgreSQL, 
MySQL, InterBase, NexusDB and Firebird. 




Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
"David G. Johnston"
Date:
On Thu, Dec 1, 2022 at 7:59 AM Martijn Tonies (Upscene Productions) <m.tonies@upscene.com> wrote:

Since version 14, the source code for a stored procedure or function written
in plain (compound) SQL, a new feature, is no longer stored in
pg_proc.prosrc, instead, there’s an additional column prosqlbody which
returns some kind of pre-parsed SQL which has no use for the user.
 
For database
development, including browsing an existing database and know what each
routine can do, this is quite terrible.


Frankly, this is not all that compelling.  The version controlled source code should be readily referenced to find out the extended details of this nature.  The function name, and a COMMENT ON comment, provide arguably sufficient in-database knowledge for cursory browsing redirecting the reader to the source code for implementation details and history.

Can you modify the server code to store the original body in proc.prosrc
again? It would be very helpful.


I seem to recall that this option had been discussed and rejected when this feature went in.  The parsed body is a feature because its contents can be updated due to, e.g., renaming of objects.  The text source of the original command would be unable to be updated in the same way and so it is possible the text and the parsed contents would diverge over time, which is a situation we do not want to have.

David J.

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Thu, Dec 1, 2022 at 7:59 AM Martijn Tonies (Upscene Productions) <
> m.tonies@upscene.com> wrote:
>> Can you modify the server code to store the original body in proc.prosrc
>> again? It would be very helpful.

> I seem to recall that this option had been discussed and rejected when this
> feature went in.  The parsed body is a feature because its contents can be
> updated due to, e.g., renaming of objects.  The text source of the original
> command would be unable to be updated in the same way and so it is possible
> the text and the parsed contents would diverge over time, which is a
> situation we do not want to have.

Indeed.  We used to have a similar situation with respect to column
default expressions and CHECK constraint expressions.  Eventually we got
rid of the textual storage of both, because it couldn't be maintained
in a reasonable way.

I think the answer here is "don't use the new syntax if you want the
function body stored textually".  You can have one set of benefits,
or the other set, but not both at once.

            regards, tom lane



Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Dominique Devienne
Date:
On Thu, Dec 1, 2022 at 4:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "David G. Johnston" <david.g.johnston@gmail.com> writes:
> > On Thu, Dec 1, 2022 at 7:59 AM Martijn Tonies (Upscene Productions) <
> > m.tonies@upscene.com> wrote:
> >> Can you modify the server code to store the original body in proc.prosrc
> >> again? It would be very helpful.
>
> > I seem to recall that this option had been discussed and rejected when this
> > feature went in.  The parsed body is a feature because its contents can be
> > updated due to, e.g., renaming of objects.  The text source of the original
> > command would be unable to be updated in the same way and so it is possible
> > the text and the parsed contents would diverge over time, which is a
> > situation we do not want to have.
>
> Indeed.  We used to have a similar situation with respect to column
> default expressions and CHECK constraint expressions.  Eventually we got
> rid of the textual storage of both, because it couldn't be maintained
> in a reasonable way.
>
> I think the answer here is "don't use the new syntax if you want the
> function body stored textually".  You can have one set of benefits,
> or the other set, but not both at once.

FWIW, we have a custom schema introspection and diff'ing ad-hoc framework,
and the fact the original SQL is not conserved as-is has also created
issues for us.

On Oracle, our SQL was preserved as-is, so could be compared reliably. While on
PostgreSQL, some names-in-SQL are rewritten, the text reformatted,
etc... So this
byte-perfect comparison is no longer possible, and we must rely on heuristics
(a.k.a, hacks...) to do a fuzzy-compare (since we lack a real PG-compatible SQL
parsers to do an infoset-comparison instead, at the AST level for example).

So it's not just a matter of browsing the schema. For us, it's a
*functional* issue. --DD



Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
"Martijn Tonies \(Upscene Productions\)"
Date:
On Thu, Dec 1, 2022 at 7:59 AM Martijn Tonies (Upscene Productions) <m.tonies@upscene.com> wrote:

Since version 14, the source code for a stored procedure or function written
in plain (compound) SQL, a new feature, is no longer stored in
pg_proc.prosrc, instead, there’s an additional column prosqlbody which
returns some kind of pre-parsed SQL which has no use for the user.
 
For database
development, including browsing an existing database and know what each
routine can do, this is quite terrible.

 
Frankly, this is not all that compelling.  The version controlled source code should be readily referenced to find out the extended details of this nature.  The function name, and a COMMENT ON comment, provide arguably sufficient in-database knowledge for cursory browsing redirecting the reader to the source code for implementation details and history.
 
 
Frankly, David, you’re wrong. Although I agree a version controlled source code is a very good way to maintain a proper version of your development database, –while- developing and changing source code, it’s simply not the easiest way to run scripts, compared to having a tool that allows more interactive development, code commenting (instead of “COMMENT ON”), SQL Insight, browsing the current structure and so on.
 
 
 
 
Can you modify the server code to store the original body in proc.prosrc
again? It would be very helpful.

 
I seem to recall that this option had been discussed and rejected when this feature went in.  The parsed body is a feature because its contents can be updated due to, e.g., renaming of objects.  The text source of the original command would be unable to be updated in the same way and so it is possible the text and the parsed contents would diverge over time, which is a situation we do not want to have.
 
 
Ah yes, automatic renaming of database objects with its dependencies can be useful. Oracle invalidates routines that use the objects, marks those routines ‘invalid’ and they need to be recompiled, that would fail when the source code references an object that no longer exists. This also means that you actually know which files you need to touch in your version controlled source Winking smile
 
With regards,

Martijn Tonies
Upscene Productions
https://www.upscene.com

Database Workbench - developer tool for Oracle, SQL Server, PostgreSQL, MySQL, InterBase, NexusDB and Firebird.
Attachment

Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Laurenz Albe
Date:
On Thu, 2022-12-01 at 16:38 +0100, Dominique Devienne wrote:
> FWIW, we have a custom schema introspection and diff'ing ad-hoc framework,
> and the fact the original SQL is not conserved as-is has also created
> issues for us.
> 
> On Oracle, our SQL was preserved as-is, so could be compared reliably. While on
> PostgreSQL, some names-in-SQL are rewritten, the text reformatted,
> etc...
>
> So it's not just a matter of browsing the schema. For us, it's a
> *functional* issue. --DD

This is arguable, but my opinion is that this is not a robust way to
do development.  You should use a schema versioning tool like Liquibase,
develop schema migration scripts and maintain the SQL code in a source
repository like other software.

At any rate, you won't be able to do it in your accustomed way in
PostgreSQL.

Yours,
Laurenz Albe



Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Dominique Devienne
Date:
On Thu, Dec 1, 2022 at 5:10 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> On Thu, 2022-12-01 at 16:38 +0100, Dominique Devienne wrote:
> > FWIW, we have a custom schema introspection and diff'ing ad-hoc framework,
>
> This is arguable, but my opinion is that this is not a robust way to
> do development.  You should use a schema versioning tool like Liquibase,
> develop schema migration scripts and maintain the SQL code in a source
> repository like other software.

We don't maintain SQL. We maintain a *Logical* model, and generate the
physical model from it.
FKs, NKs, Enums, CHECK constraints, indexes, etc.. all that
"implementation details"
is programmatically generated, and always consistent, from a much
higher-level and simpler model.
And you also get auto-upgrade most of the time, greatly increasing
development velocity too.

I would argue that NOT doing it this way, is the non-robust way myself :)
We've been refactoring a large data-model maintained manually like you advocate,
and I can't tell you how many anomalies we've discovered and had to fix,
using the more robust formalism of using a high-level logical model
and (DDL) code gen.

I guess is a DBA-versus-Developer point-of-view difference. --DD



Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Adrian Klaver
Date:
On 12/1/22 09:24, Dominique Devienne wrote:
> On Thu, Dec 1, 2022 at 5:10 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>> On Thu, 2022-12-01 at 16:38 +0100, Dominique Devienne wrote:
>>> FWIW, we have a custom schema introspection and diff'ing ad-hoc framework,
>>
>> This is arguable, but my opinion is that this is not a robust way to
>> do development.  You should use a schema versioning tool like Liquibase,
>> develop schema migration scripts and maintain the SQL code in a source
>> repository like other software.
> 
> We don't maintain SQL. We maintain a *Logical* model, and generate the
> physical model from it.
> FKs, NKs, Enums, CHECK constraints, indexes, etc.. all that
> "implementation details"
> is programmatically generated, and always consistent, from a much
> higher-level and simpler model.
> And you also get auto-upgrade most of the time, greatly increasing
> development velocity too.
> 
> I would argue that NOT doing it this way, is the non-robust way myself :)
> We've been refactoring a large data-model maintained manually like you advocate,
> and I can't tell you how many anomalies we've discovered and had to fix,
> using the more robust formalism of using a high-level logical model
> and (DDL) code gen.
> 
> I guess is a DBA-versus-Developer point-of-view difference. --DD

What this points to is that there are multiple ways to handle this, many 
external to the server itself. My take is that the system catalogs are 
there for the proper operation of the server and that is their task, 
first and foremost. If you can piggyback of that then great, but with 
the knowledge that the information may change to meet the needs of the 
server not external users.

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Dominique Devienne
Date:
On Thu, Dec 1, 2022 at 6:41 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> On 12/1/22 09:24, Dominique Devienne wrote:
> > I guess is a DBA-versus-Developer point-of-view difference. --DD
>
> What this points to is that there are multiple ways to handle this, many
> external to the server itself. My take is that the system catalogs are
> there for the proper operation of the server and that is their task,
> first and foremost. If you can piggyback of that then great, but with
> the knowledge that the information may change to meet the needs of the
> server not external users.

I see. Still, Oracle preserves SQL as-is. SQLite preserve SQL as-is.
Would be nice if PostgreSQL did too. That's all I'm saying.

Having in libpq functionality that allows to do the same kind of SQL
normalization / rewriting done server-side would help. Then I could use
that to generate the DDL "just right" the first time.

For now, our current work-around is a two step process.
We first generate at-build-time the DDL using "our syntax" and
instantiate the schema.
Then introspect that and re-generate code with the "rewritten syntax".
Subsequent generation (it's dynamic, at runtime) will use the re-generated code
that matches the syntax re-write. Thus now the introspection and
diff'ing match the in-memory DDL.
Still, that's a PITA. I of course don't contest that PostgreSQL
maintains what it needs.
But ALSO maintaining the original, at least until a re-write is
necessary on renames,
would go A LONG WAY to satisfy the OP and myself in our use-cases. FWIW. --DD



Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Christophe Pettus
Date:

> On Dec 1, 2022, at 11:05, Dominique Devienne <ddevienne@gmail.com> wrote:
>
> I see. Still, Oracle preserves SQL as-is. SQLite preserve SQL as-is.
> Would be nice if PostgreSQL did too. That's all I'm saying.

Since this is a custom-built system, there is nothing keeping you from creating your own table in the database that
storesthe original text of the function. 


Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Dominique Devienne
Date:
On Thu, Dec 1, 2022 at 8:09 PM Christophe Pettus <xof@thebuild.com> wrote:
> > On Dec 1, 2022, at 11:05, Dominique Devienne <ddevienne@gmail.com> wrote:
> > I see. Still, Oracle preserves SQL as-is. SQLite preserve SQL as-is.
> > Would be nice if PostgreSQL did too. That's all I'm saying.
>
> Since this is a custom-built system, there is nothing keeping you from creating your own table in the database that
storesthe original text of the function.
 

That's not the point. If a DBA updates one of our triggers or proc or
whatever else,
the recorded info in a custom table won't be affected. We are diff'ing
the server-side
schema, against the expected in-memory model of the physical model.

Thus the dictionaries are the only source of truth we can trust for
the current state of the schema.
And beside minor syntactic differences, and some more troublesome
object-name rewrites, this is
exactly what we want. The system itself needs to preserve the original
DDL IMHO. --DD



Dominique Devienne <ddevienne@gmail.com> writes:
> On Thu, Dec 1, 2022 at 8:09 PM Christophe Pettus <xof@thebuild.com> wrote:
>> Since this is a custom-built system, there is nothing keeping you from creating your own table in the database that
storesthe original text of the function. 

> That's not the point. If a DBA updates one of our triggers or proc or
> whatever else,
> the recorded info in a custom table won't be affected. We are diff'ing
> the server-side
> schema, against the expected in-memory model of the physical model.

> Thus the dictionaries are the only source of truth we can trust for
> the current state of the schema.
> And beside minor syntactic differences, and some more troublesome
> object-name rewrites, this is
> exactly what we want. The system itself needs to preserve the original
> DDL IMHO. --DD

Do you really fail to see the contradictions in this?  You want the
database to preserve the original DDL, but you also want it to update
in response to subsequent alterations.  You can't have both those
things.

            regards, tom lane



Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Gavan Schneider
Date:
On 2 Dec 2022, at 6:51, Tom Lane wrote:

> Dominique Devienne <ddevienne@gmail.com> writes:
>> On Thu, Dec 1, 2022 at 8:09 PM Christophe Pettus <xof@thebuild.com> wrote:
>>> Since this is a custom-built system, there is nothing keeping you from creating your own table in the database that
storesthe original text of the function. 
>
>> That's not the point. If a DBA updates one of our triggers or proc or
>> whatever else,
>> the recorded info in a custom table won't be affected. We are diff'ing
>> the server-side
>> schema, against the expected in-memory model of the physical model.
>
>> Thus the dictionaries are the only source of truth we can trust for
>> the current state of the schema.
>> And beside minor syntactic differences, and some more troublesome
>> object-name rewrites, this is
>> exactly what we want. The system itself needs to preserve the original
>> DDL IMHO. --DD
>
> Do you really fail to see the contradictions in this?  You want the
> database to preserve the original DDL, but you also want it to update
> in response to subsequent alterations.  You can't have both those
> things.
>
At the risk of stating the (over) obvious…

NEW:  the Pg system has a parsed/tokenised version of functions and such like, with the excellent feature that these
willbe kept synchronised with any name change of objects referenced 
OLD: easily available text version of functions, etc. with comments and layout (but not easily updatable in the event
referencedobjects get renamed) — sysadmin usage 

The contradiction is obvious but both needs can be justified…

NEXT(?): human readable version of function, etc definitions be generated from the parsed version, with the addition of
tokens within this parsed version that allow programer’s comments to be reinstated. Leave the layout style conventions
tothe system and promise to keep this style consistent and maintained as part of the feature. Putting this generated
(andupdated as needed) text into pg_proc.prosrc would seem to be the least change needed to allow existing usage to
movewith the new Pg versions (and maybe help pgAdmin as well) 

Regards

Gavan Schneider
——
Gavan Schneider, Sodwalls, NSW, Australia
Explanations exist; they have existed for all time; there is always a well-known solution to every human problem —
neat,plausible, and wrong. 
— H. L. Mencken, 1920



On Thu, Dec 01, 2022 at 04:38:57PM +0100, Dominique Devienne <ddevienne@gmail.com> wrote:

> On Thu, Dec 1, 2022 at 4:23 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > "David G. Johnston" <david.g.johnston@gmail.com> writes:
> > > On Thu, Dec 1, 2022 at 7:59 AM Martijn Tonies (Upscene Productions) <
> > > m.tonies@upscene.com> wrote:
> > >> Can you modify the server code to store the original body in proc.prosrc
> > >> again? It would be very helpful.
> >
> > > I seem to recall that this option had been discussed and rejected when this
> > > feature went in.  The parsed body is a feature because its contents can be
> > > updated due to, e.g., renaming of objects.  The text source of the original
> > > command would be unable to be updated in the same way and so it is possible
> > > the text and the parsed contents would diverge over time, which is a
> > > situation we do not want to have.
> >
> > Indeed.  We used to have a similar situation with respect to column
> > default expressions and CHECK constraint expressions.  Eventually we got
> > rid of the textual storage of both, because it couldn't be maintained
> > in a reasonable way.
> >
> > I think the answer here is "don't use the new syntax if you want the
> > function body stored textually".  You can have one set of benefits,
> > or the other set, but not both at once.
> 
> FWIW, we have a custom schema introspection and diff'ing ad-hoc framework,
> and the fact the original SQL is not conserved as-is has also created
> issues for us.
> 
> On Oracle, our SQL was preserved as-is, so could be compared reliably. While on
> PostgreSQL, some names-in-SQL are rewritten, the text reformatted,
> etc... So this
> byte-perfect comparison is no longer possible, and we must rely on heuristics
> (a.k.a, hacks...) to do a fuzzy-compare (since we lack a real PG-compatible SQL
> parsers to do an infoset-comparison instead, at the AST level for example).
> 
> So it's not just a matter of browsing the schema. For us, it's a
> *functional* issue. --DD

Same here. Accessing the loaded stored procedure source
is how I audit the state of stored procedures in the
database against the code in the code repository.
Without the ability to make that comparison, there is
no way to audit the stored procedures, and the only way
to make sure that the database is up to date with the
stored procedures would be to reload every single
stored procedure. I might have to alter the procedure
loading program to store its own copy of the source code
in the database somewhere, and just hope that nobody
loads stored procedures using any other tool. Even so,
it will slow down loading old database backups and
bringing their schema and code up to date. But that's
probably OK compared with the time to load the backup
itself.

cheers,
raf




Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
"Martijn Tonies \(Upscene Productions\)"
Date:
>> On 12/1/22 09:24, Dominique Devienne wrote:
>> > I guess is a DBA-versus-Developer point-of-view difference. --DD
>>
>> What this points to is that there are multiple ways to handle this, many
>> external to the server itself. My take is that the system catalogs are
>> there for the proper operation of the server and that is their task,
>> first and foremost. If you can piggyback of that then great, but with
>> the knowledge that the information may change to meet the needs of the
>> server not external users.
>
>I see. Still, Oracle preserves SQL as-is. SQLite preserve SQL as-is.
>Would be nice if PostgreSQL did too. That's all I'm saying.

So do InterBase, Firebird, SQL Server, MySQL (except for Views, strangely 
enough),
MariaDB, NexusDB, SQL Anywhere, and, frankly, all others I know of.

And this is used all the time by database developers.

And at least InterBase and Firebird -also- stored a 'parsed version' (in 
binary).


With regards,

Martijn Tonies
Upscene Productions
https://www.upscene.com

Database Workbench - developer tool for Oracle, SQL Server, PostgreSQL, 
MySQL, InterBase, NexusDB and Firebird. 




Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Laurenz Albe
Date:
On Fri, 2022-12-02 at 08:49 +0100, Martijn Tonies (Upscene Productions) wrote:
> So do InterBase, Firebird, SQL Server, MySQL (except for Views, strangely 
> enough),
> MariaDB, NexusDB, SQL Anywhere, and, frankly, all others I know of.
> 
> And this is used all the time by database developers.
> 
> And at least InterBase and Firebird -also- stored a 'parsed version' (in 
> binary).

Great; then go ahead and use those databases, if it is important for you.

In the same vein, I don't think any of those databases have trigram or
bloom indexes.  Perhaps we should improve the acceptance of PostgreSQL
by removing those features?

Yours,
Laurenz Albe



Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Pasi Oja-Nisula
Date:
On Fri, 2 Dec 2022 at 02:24, raf <raf@raf.org> wrote:
> Same here. Accessing the loaded stored procedure source
> is how I audit the state of stored procedures in the
> database against the code in the code repository.

Exactly. If our software is audited, how can I reliably prove to auditor
that the running version of the procedure has not been tampered with
either by customer, rogue developer or some malicious party?

Suggested solution "store the text in user table" does not work, because
it requires following a process. And the process overriding changes are those
we are interested in.

I'm not so much concerned about whether the stored procedure text compiles
anymore (because of object name changes etc.) I just really would like to
have the exact source code stored with the database. So if release management
goes horribly wrong, there is at least the possibility to compare procedure
text manually.

Pasi



Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Daniel Gustafsson
Date:
> On 2 Dec 2022, at 14:00, Pasi Oja-Nisula <pon@iki.fi> wrote:
> 
> On Fri, 2 Dec 2022 at 02:24, raf <raf@raf.org> wrote:
>> Same here. Accessing the loaded stored procedure source
>> is how I audit the state of stored procedures in the
>> database against the code in the code repository.
> 
> Exactly. If our software is audited, how can I reliably prove to auditor
> that the running version of the procedure has not been tampered with
> either by customer, rogue developer or some malicious party?

How do you today prove that for other compiled programs in your system?

--
Daniel Gustafsson        https://vmware.com/




Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Pasi Oja-Nisula
Date:
On Fri, 2 Dec 2022 at 15:02, Daniel Gustafsson <daniel@yesql.se> wrote:
> How do you today prove that for other compiled programs in your system?

Good question. I guess I never considered database to be compiled.

How do you debug this kind of system? What if the you suspect that there is
a wrong version of procedure in the system? How can you then find what release
or patch produced the problem if you can't see the actual procedure text
that produced the running version?

I'm not claiming that there isn't benefits in this system. I'm sure it actually
prevents rogue changes for not having the source code readily available. It's
just a big fundamental change to get over if you can't do comparison from
database to version control.

Pasi



Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Dominique Devienne
Date:
On Fri, Dec 2, 2022 at 1:37 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> Great; then go ahead and use those databases, if it is important for you.

Now come on. We all love PostgreSQL.
But that doesn't mean we can't disagree on some decisions.

Especially when you are a USER of PostgreSQL, not a DEV of it,
and it's too late by the time you are even aware of the changes.
Not everyone can chime in on the dev-list when those are discussed.

From a user point of view, can also be seen as a "regression",
when an observable property of the system changes to a new
different / incompatible way, to some extent. I'm not saying it is,
still it is a change one discovers too late, creates pain to some,
and is both worth reporting and discussing.

Given the tone of the discussion here though,
I don't think there's much to hope for a middle ground...

> In the same vein, I don't think any of those databases have ... bloom indexes.

SQLite has bloom filters now :) (not persistent indexes)



On Fri, Dec 02, 2022 at 03:00:09PM +0200, Pasi Oja-Nisula <pon@iki.fi> wrote:

> On Fri, 2 Dec 2022 at 02:24, raf <raf@raf.org> wrote:
> > Same here. Accessing the loaded stored procedure source
> > is how I audit the state of stored procedures in the
> > database against the code in the code repository.
> 
> Exactly. If our software is audited, how can I reliably prove to auditor
> that the running version of the procedure has not been tampered with
> either by customer, rogue developer or some malicious party?

> Suggested solution "store the text in user table" does not work, because
> it requires following a process. And the process overriding changes are those
> we are interested in.
> 
> I'm not so much concerned about whether the stored procedure text compiles
> anymore (because of object name changes etc.) I just really would like to
> have the exact source code stored with the database. So if release management
> goes horribly wrong, there is at least the possibility to compare procedure
> text manually.
> 
> Pasi

That's a different concern to mine. I just want to be
able to check that multiple test systems all have the
same schema and stored procedures, and to be able to
reload an old database, apply any schema updates made
since it was backed up, and load/drop any stored
procedures that were added/changed/removed since then
without having to reload all of them. So storing
procedure code in a user table would be OK. I just have
to update my db loading/auditing tools for v14+. It was
great to be able to make use of the source in pg_proc
for my needs, but that's not what it was for. I was
just lucky it was there.

If you're concerned about tampering by
customers/users/developers, you can either set
permissions to prevent it in some cases, and when you
can't prevent it, make it tamper-evident by logging
actions to somewhere remote and monitoring for what
concerns you. That should satisfy auditors.

cheers,
raf




Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Dominique Devienne
Date:
On Thu, Dec 1, 2022 at 8:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Do you really fail to see the contradictions in this?  You want the
> database to preserve the original DDL, but you also want it to update
> in response to subsequent alterations.  You can't have both those

Hi. I probably didn't express myself correctly. I don't think there's
a contradiction.

I originally wrote:
"maintaining the original, at least until a re-write is necessary on renames".

But that I meant that the SQL would be preserved as-is, *initially*.
But that if/when a rename affecting that SQL happens, then it's fair
game to re-write it.
Because then the diff between my in-memory code-generated DDL, and the
server-side
DDL is no longer a false positive, as it is now from the "pre-emptive" re-write.

What is creating me pain, is the fact the re-write of the SQL is
*eager* instead of *lazy*.
I.e. I'm paying for the rewrite, even when it's not strictly necessary
(from my POV at least).

I hope that makes more sense. Thanks, --DD



Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
DAVID ROTH
Date:
Is there a way to reverse engineer the original code (or its equivalent) from what is saved in the database?

> On 12/02/2022 8:48 AM Dominique Devienne <ddevienne@gmail.com> wrote:
> 
>  
> On Thu, Dec 1, 2022 at 8:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Do you really fail to see the contradictions in this?  You want the
> > database to preserve the original DDL, but you also want it to update
> > in response to subsequent alterations.  You can't have both those
> 
> Hi. I probably didn't express myself correctly. I don't think there's
> a contradiction.
> 
> I originally wrote:
> "maintaining the original, at least until a re-write is necessary on renames".
> 
> But that I meant that the SQL would be preserved as-is, *initially*.
> But that if/when a rename affecting that SQL happens, then it's fair
> game to re-write it.
> Because then the diff between my in-memory code-generated DDL, and the
> server-side
> DDL is no longer a false positive, as it is now from the "pre-emptive" re-write.
> 
> What is creating me pain, is the fact the re-write of the SQL is
> *eager* instead of *lazy*.
> I.e. I'm paying for the rewrite, even when it's not strictly necessary
> (from my POV at least).
> 
> I hope that makes more sense. Thanks, --DD



Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Jeremy Smith
Date:


From a user point of view, can also be seen as a "regression",
when an observable property of the system changes to a new
different / incompatible way, to some extent. I'm not saying it is,
still it is a change one discovers too late, creates pain to some,
and is both worth reporting and discussing.

Isn't this only true if you use the new sql_body/BEGIN ATOMIC syntax, though?  Can't you keep using the older AS 'definition' syntax and still get the body of the function, unchanged, in the prosrc column?  You would, of course, lose the benefit of the dependency tracking.  As a user, though, I don't find it surprising that creating a function using new syntax specifically designed to provide dependency tracking would change the way the function is stored internally.

To me, the new syntax gives the same benefits and drawbacks to SQL functions that we already have with views, which are also re-written on creation.

      -Jeremy

Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Pasi Oja-Nisula
Date:
On Fri, 2 Dec 2022 at 15:47, raf <raf@raf.org> wrote:
> If you're concerned about tampering by
> customers/users/developers, you can either set
> permissions to prevent it in some cases, and when you
> can't prevent it, make it tamper-evident by logging
> actions to somewhere remote and monitoring for what
> concerns you. That should satisfy auditors.

True. But isn't this extra work compared to previous situation?
If you can compare procedure text directly and say to your developers
"you scoundrel did a change outside version control, no dessert for you".

I would be perfectly satisfied, if the sql that produced the procedure
would be stored "as is" read-only copy when it was compiled. If an object
rename makes it invalid, tweak a bit telling so, but don't change the text
until next alter procedure is run.

Pasi



Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
"Peter J. Holzer"
Date:
On 2022-12-02 14:02:37 +0100, Daniel Gustafsson wrote:
> > On 2 Dec 2022, at 14:00, Pasi Oja-Nisula <pon@iki.fi> wrote:
> >
> > On Fri, 2 Dec 2022 at 02:24, raf <raf@raf.org> wrote:
> >> Same here. Accessing the loaded stored procedure source
> >> is how I audit the state of stored procedures in the
> >> database against the code in the code repository.
> >
> > Exactly. If our software is audited, how can I reliably prove to auditor
> > that the running version of the procedure has not been tampered with
> > either by customer, rogue developer or some malicious party?
>
> How do you today prove that for other compiled programs in your system?

Generally by storing hashes of the binaries in some tamper-proof way
(for example, the packages may be signed by the distributor). Then you
can compute the hashes of the binaries on your system and compare them
with the known-good hashes.

But that assumes that the binaries that are installed are actually the
binaries which are used. As I understand it, this is not the case here
as the artefact which is sent to the server is the source code which is
then tokenized/compiled and stored by the server. So you can't simply
tell whether the stored/used version corresponds to the code you
installed.

I don't know how reproducable that tokenization process is. Can you just
do it again and compere the results?

        hp

--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"

Attachment
Pasi Oja-Nisula <pon@iki.fi> writes:
> I would be perfectly satisfied, if the sql that produced the procedure
> would be stored "as is" read-only copy when it was compiled.

As I mentioned upthread, we used to have that behavior (store both
compiled form and original text) for column default expressions and
CHECK constraint expressions.  We got rid of that because it caused
far more confusion, and bugs, than it was worth.  Not bugs in the
database, but faulty behavior in applications that thought they
could trust the text form to be accurate.  I don't intend to
fail to learn from that history.

Also, we've *never* had storage of the original text for views.
I'm a little mystified by people claiming they use original
text for vetting functions when they clearly have no equivalent
ability for views ... or are your applications completely
view-free?

If you want an audit comparison point, I'd suggest capturing
the result of pg_get_functiondef or one of its sibling functions
just after creating your function.  "pg_dump -s" is another
pretty credible mechanism for capturing schema details.

> If an object
> rename makes it invalid, tweak a bit telling so, but don't change the text
> until next alter procedure is run.

I'm astonished at the number of people who think that poorly-implemented
Oracle behavior is something we should emulate.

            regards, tom lane



Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Christophe Pettus
Date:

> On Dec 2, 2022, at 08:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Pasi Oja-Nisula <pon@iki.fi> writes:
>> I would be perfectly satisfied, if the sql that produced the procedure
>> would be stored "as is" read-only copy when it was compiled.

> If you want an audit comparison point, I'd suggest capturing
> the result of pg_get_functiondef or one of its sibling functions
> just after creating your function.  "pg_dump -s" is another
> pretty credible mechanism for capturing schema details.

Agreeing with Tom, I find the argument that this is important for auditing unpersuasive.  Storing the original text but
allowingthe version that is actually executed to be different is a step away from anything like good auditing practice. 


Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Pasi Oja-Nisula
Date:
> If you want an audit comparison point, I'd suggest capturing
> the result of pg_get_functiondef or one of its sibling functions
> just after creating your function.  "pg_dump -s" is another
> pretty credible mechanism for capturing schema details.

That's very good info, thank you. I guess freaking out was unnecessary. Sorry.

So the idea is to do apples to apples comparison and forget the sql source files
with whitespace and comments. When comparing a know good database
I can do pg_get_functiondef in both databases and compare the results.
It's just a matter of how well each query tool can represent diffs or
do results
comparison. pg_dump output is even better for comparison.

Pasi



On Fri, Dec 02, 2022 at 11:02:18AM -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Also, we've *never* had storage of the original text for views.
> I'm a little mystified by people claiming they use original
> text for vetting functions when they clearly have no equivalent
> ability for views ... or are your applications completely
> view-free?

In my case, yes, all access is via procedures.
No views or triggers. So it was OK.

cheers,
raf




Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Alban Hertroys
Date:
> On 2 Dec 2022, at 15:10, DAVID ROTH <adaptron@comcast.net> wrote:
>
> Is there a way to reverse engineer the original code (or its equivalent) from what is saved in the database?

I’m dumping an idea here… Treading back quite a bit with information from later in this thread.

With the original procedure source code under version control and assuming the tokenization converts that source code
consistently(since it’s done by a computer), you could store another level of diffs: From the source code you pushed,
againstthe output of pg_get_functiondef. 

Assuming that changes to the tokenised code, when converted back to text, only involve renames of database objects, the
resultof a reversely applied diff could very well be comparable to the original source code. 

I suspect that would be sufficient for telling whether a developer is responsible for the changes, or that they were
causedby renaming of database artefacts. 

You would need to wrap the function creation calls into some automation to generate and store those diffs, comparing it
back,etc, but that may be doable. I would also generate new diffs right after major version updates of the database (a
beforeand after of the output of pg_get_functiondef, applied to the stored diff?). 

I’m not so sure that would work for auditing, but that seems to have been tackled down-thread.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Brad White
Date:
On 12/2/2022 9:05 AM, Peter J. Holzer wrote:
I don't know how reproducable that tokenization process is. Can you just
do it again and compere the results?
Right.
You can compare passwords, even though you don't store the original.
You might be able to run your virgin copy through the tokenization and compare the results.
--
I talk with clients, find out where their pain points are, and solve those.
On-call IT Management for small companies and non-profits.
SCP, Inc.
bwhite@inebraska.com
402-601-7990


Quote of the Day
   There is a huge difference between fathering a child and being a father.
   One produces a child. The other produces an adult.
    -- John Eldredge

Aw: Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Karsten Hilbert
Date:
> You would need to wrap the function creation calls into some automation to generate and store those diffs, comparing
itback, etc, but that may be doable. I would also generate new diffs right after major version updates of the database
(abefore and after of the output of pg_get_functiondef, applied to the stored diff?).
 

I wonder whether that would tie the sanity check to a particular PG version.

I mean, pg_get_functiondef output being a server runtime artifact it might
well change between server versions, no ?

Best,
Karsten




Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Alban Hertroys
Date:
> On 3 Dec 2022, at 20:55, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote:
>
>> You would need to wrap the function creation calls into some automation to generate and store those diffs, comparing
itback, etc, but that may be doable. I would also generate new diffs right after major version updates of the database
(abefore and after of the output of pg_get_functiondef, applied to the stored diff?). 
>
> I wonder whether that would tie the sanity check to a particular PG version.
>
> I mean, pg_get_functiondef output being a server runtime artifact it might
> well change between server versions, no ?

I meant to write: “I would also generate new diffs right _before and_ after…”, precisely for that reason. The before
patchshould get you the last ’sane’ situation to get back to the source code. Next, you can diff that to the newly
tokenisedversion after the upgrade. 

It is a bit of a hassle, as you need to remember to do that before an upgrade, but at least you’d have something…

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




On 12/4/22 11:05, Alban Hertroys wrote:
>> On 3 Dec 2022, at 20:55, Karsten Hilbert <Karsten.Hilbert@gmx.net> wrote:
>>
>>> You would need to wrap the function creation calls into some automation to generate and store those diffs,
comparingit back, etc, but that may be doable. I would also generate new diffs right after major version updates of the
database(a before and after of the output of pg_get_functiondef, applied to the stored diff?).
 
>> I wonder whether that would tie the sanity check to a particular PG version.
>>
>> I mean, pg_get_functiondef output being a server runtime artifact it might
>> well change between server versions, no ?
> I meant to write: “I would also generate new diffs right _before and_ after…”, precisely for that reason. The before
patchshould get you the last ’sane’ situation to get back to the source code. Next, you can diff that to the newly
tokenisedversion after the upgrade.
 
>
> It is a bit of a hassle, as you need to remember to do that before an upgrade, but at least you’d have something…

I run this cron job every midnight:

#!/bin/bash
PGUSER=postgres
PGHOST=<some_host>
TS=`date +"%Y%m%d_%H%M%S%Z"`

cd /dba/schema

PREVSQL=`ls -1 *sql | tail -n1`
PREV_MD5=`md5sum $PREVSQL | cut -c1-32`
NEWSCHEMA=sides_${TS}${SUFFIX}.sql

pg_dump --schema-only --dbname=<some_db> --clean --create > $NEWSCHEMA

CURRSQL=`ls -1 *sql | tail -n1`
CURR_MD5=`md5sum $CURRSQL | cut -c1-32`

echo "--------"
date +"%F %T, %a"
if [ "$PREV_MD5" = "$CURR_MD5" ];
then
     echo Schema unchanged.  Deleting $NEWSCHEMA
     rm $NEWSCHEMA
else
     ls -aFl $NEWSCHEMA
fi


-- 
Angular momentum makes the world go 'round.



Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
Karsten Hilbert
Date:
Am Sun, Dec 04, 2022 at 06:05:30PM +0100 schrieb Alban Hertroys:

> > I mean, pg_get_functiondef output being a server runtime artifact it might
> > well change between server versions, no ?
>
> I meant to write: “I would also generate new diffs right
> _before and_ after…”, precisely for that reason.

I see. That makes sense.

Karsten
--
GPG  40BE 5B0E C98E 1713 AFA6  5BC0 3BEA AC80 7D4F C89B



Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
"Martijn Tonies \(Upscene Productions\)"
Date:
Hello Jeremy,
 
Yes and no Winking smile
 
For Stored Functions, the “sql_body” can either be a block with BEGIN ATOMIC ... END or a single statement RETURN expression;
 
For a Stored Procedure, the “sql_body” is always the block.
 
 
Using “language SQL” and an “sql_body” (as per documentation) is certainly easier (no string constant) and more compatible with the SQL standard. But if you do so, you loose the possibility to retrieve your actual code from the database, which, IMO, is very counter intuitive.
 
Lots of people use, how shall we call it, ‘interactive development’ in tools (like the one we create) and instead of running a saved and modified script, you load the objects from the database, you modify it in a database development tool, test, modify, test etc.
 
I understand that dependency tracking is useful, and automatic object modification when doing object renaming is nice, but personally, that would be a corner case compared to day-to-day development of stored code.
 
With regards,

Martijn Tonies
Upscene Productions
https://www.upscene.com

Database Workbench - developer tool for Oracle, SQL Server, PostgreSQL, MySQL, InterBase, NexusDB and Firebird.
 
Sent: Friday, December 2, 2022 3:10 PM
Subject: Re: Stored procedure code no longer stored in v14 and v15, changed behaviour
 
 

From a user point of view, can also be seen as a "regression",
when an observable property of the system changes to a new
different / incompatible way, to some extent. I'm not saying it is,
still it is a change one discovers too late, creates pain to some,
and is both worth reporting and discussing.
 
Isn't this only true if you use the new sql_body/BEGIN ATOMIC syntax, though?  Can't you keep using the older AS 'definition' syntax and still get the body of the function, unchanged, in the prosrc column?  You would, of course, lose the benefit of the dependency tracking.  As a user, though, I don't find it surprising that creating a function using new syntax specifically designed to provide dependency tracking would change the way the function is stored internally.
 
To me, the new syntax gives the same benefits and drawbacks to SQL functions that we already have with views, which are also re-written on creation.
 
      -Jeremy
Attachment

Re: Stored procedure code no longer stored in v14 and v15, changed behaviour

From
"Martijn Tonies \(Upscene Productions\)"
Date:
>Also, we've *never* had storage of the original text for views.
>I'm a little mystified by people claiming they use original
>text for vetting functions when they clearly have no equivalent
>ability for views ... or are your applications completely
>view-free?

Well, I would say that's annoying too!

>If you want an audit comparison point, I'd suggest capturing
>the result of pg_get_functiondef or one of its sibling functions
>just after creating your function.  "pg_dump -s" is another
>pretty credible mechanism for capturing schema details.

Do note, that pg_get_viewdef returns the SELECT statement of a view,
no column name specification, while pg_get_function_def returns a CREATE 
statement.

So the implementation is different too.


With regards,

Martijn Tonies
Upscene Productions
https://www.upscene.com

Database Workbench - developer tool for Oracle, SQL Server, PostgreSQL, 
MySQL, InterBase, NexusDB and Firebird.