Thread: [GENERAL] LYDB: What advice about stored procedures and other server side code?

[GENERAL] LYDB: What advice about stored procedures and other server side code?

From
Guyren Howe
Date:
I am putting together some advice for developers about getting the most out of SQL servers in general and Postgres in particular. I have in mind the likes of most web developers, who through ignorance or a strange cultural preference that has emerged, tend to treat their database server as a dumb data bucket.

I call the project Love Your Database (LYDB). It is starting as a series of blog posts:


I would next like to cover server-side code such as stored procedures and triggers.

I am inclined to advise folks to use PL/V8 on Postgres, because it is a reasonable language, everyone knows it, it has good string functions, decent performance and it tends to be installed everywhere (in particular, Amazon RDF offers it).

Broadly, what advice should I offer that isn’t obvious? Not just about PL/V8 but server side code in general.

TIA
> On Dec 27, 2016, at 2:03 PM, Guyren Howe <guyren@gmail.com> wrote:
>
> I am putting together some advice for developers about getting the most out of SQL servers in general and Postgres in
particular.I have in mind the likes of most web developers, who through ignorance or a strange cultural preference that
hasemerged, tend to treat their database server as a dumb data bucket. 
>
> I call the project Love Your Database (LYDB). It is starting as a series of blog posts:
>
> https://medium.com/@gisborne/love-your-database-lydb-23c69f480a1d#.4jngp2rcb
> https://medium.com/@gisborne/love-your-database-simple-validations-68d5d6d0bbf3#.az4o2s152
>
> I would next like to cover server-side code such as stored procedures and triggers.
>
> I am inclined to advise folks to use PL/V8 on Postgres, because it is a reasonable language, everyone knows it, it
hasgood string functions, decent performance and it tends to be installed everywhere (in particular, Amazon RDF offers
it).
>

Think hard about the "impedance mismatch" between parts of the system.

pl/pgsql uses sql data types and operators, and so interfaces very cleanly with the rest of postgresql. pl/v8 uses
javascriptdata types and *for database related things* is likely to be a less perfect match to the rest of the system -
asit's translating (or, in some cases, failing to translate) between sql data types and javascript data types that may
notbe entirely compatible, or which may not exist at all. 

So if your functions are mostly doing databasey things, pl/pgsql may well be a better choice. If they're mostly doing
appythings, that just happen to be in the database, then pl/v8 may be a better choice (but so might just doing the work
inthe app, perhaps with some listen/notify assistance). 

Most of the functions I write are short trigger functions, or data wrapper/modification functions for migration or
makingbusiness logic available for SQL. For the majority of those I find pl/pgsql the best match (if I can't get away
withsql functions). 

If you're trying to convince people to get the most out of their database, pushing them towards pl/v8 as their first
choiceof embedded language might not be the best path. (That it might encourage them to write code to iterate through
tablesrather than taking advantage of SQL where they can might be a thing too). 

Cheers,
  Steve


> Broadly, what advice should I offer that isn’t obvious? Not just about PL/V8 but server side code in general.
>
> TIA



Re: [GENERAL] LYDB: What advice about stored procedures and otherserver side code?

From
Alban Hertroys
Date:
> On 27 Dec 2016, at 23:03, Guyren Howe <guyren@gmail.com> wrote:
>
> I am putting together some advice for developers about getting the most out of SQL servers in general and Postgres in
particular.I have in mind the likes of most web developers, who through ignorance or a strange cultural preference that
hasemerged, tend to treat their database server as a dumb data bucket. 
>
> I call the project Love Your Database (LYDB). It is starting as a series of blog posts:
>
> https://medium.com/@gisborne/love-your-database-lydb-23c69f480a1d#.4jngp2rcb
> https://medium.com/@gisborne/love-your-database-simple-validations-68d5d6d0bbf3#.az4o2s152
>
> I would next like to cover server-side code such as stored procedures and triggers.

I think there's an objection that most developers would come up with: Namely that stored procedures living in the
centralrelational database can usually not be changed as easily as client-side code, without disrupting development (or
worse,operation of the production system) for other users. I think that needs some thought - I don't have the answer to
thatone. 

IMO, as long as those stored procedures are part of the model (ensuring data integrity, interpreting database values,
etc.),there shouldn't be a problem, but it's a terrain I haven't trodden much. There are things you usually don't want
tohandle at the database side, such as firing off other applications to perform tasks or user interface related
operations,things like that. I once was tasked with writing a library on a database server (Oracle 8i) to handle
printingof data and sending out faxes etc. That was a nightmare and in hindsight it just doesn't belong in a database.
That'sa task for middleware. Where the dividing line is is probably an interesting topic. 

Another thought related to seeing the database as a model server; since we're apparently in the model-view-controller
paradigm,it seems to me that database views would probably be useful to the view-part of MVC. That's just something
thatstruck me to make more than a bit of sense… 
Of course, for the actual view in the MVC paradigm there should be some kind of user interface, but database views
couldbe really useful in preparing the data required for those, to make it fit the shape of the view. 

So far I like what you're writing, so keep it coming!

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Re: [GENERAL] LYDB: What advice about stored procedures and otherserver side code?

From
Christoph Moench-Tegeder
Date:
## Guyren Howe (guyren@gmail.com):

> I am inclined to advise folks to use PL/V8 on Postgres, because it is
> a reasonable language, everyone knows it, it has good string functions,
> decent performance and it tends to be installed everywhere (in particular,
> Amazon RDF offers it).

I'd be careful with "everyone knows JavaScript" - that may hold in
web development, but there are many cases where Javascript is completely
off the map. And for V8 itself - it adds quite a chunk of code to your
PostgreSQL installation, that may put some people off. At least, I
haven't seen it installed "generally" in the wild, but my view might
be as biased as anyone else's.

> Broadly, what advice should I offer that isn’t obvious? Not just about
> PL/V8 but server side code in general.

Initially, running code in your database can make life easier for
the developers (ise pgTap for testing, pl/profiler and pl/debugger,
etc.). But once you have to change your schema, the hurt begins:
you'll need downtime for that, or you'll have to deal with the
possibility of changing the API of your "database side" code, and
matching code and tables on the database. There have been talks
about that topic (using versioned schemas and leveraging search_path),
but still: a simple change to a function suddenly requires a lot
of care.
The next pain point is scalability: running code on the database server
puts your code on the most expensive and hardest to scale CPUs. You
can (almost) always add another appserver to your setup (just spin
up a VM with a tomcat or whatever-you-use). But if the bottleneck
is your database CPUs, you'd have to move to a larger server (that
got easier with VMs, within limits); or use replication to offload
some code to standbys, keeping writing code on the primary (and
hope you'll have enough horsepower there). Multi-Master introduces
some special limitations and operational overhead on it's own, I'd
not generally recommend that for all applications and developers
just moving up from the "dump data bucket" model.

TL;DR: database side code can be a great thing in a small application,
but once the application and traffic grows, "code in the database"
requires specialist attention and may become a burden.
Unfortunately, most large applications started small...

Regards,
Christoph

--
Spare Space


Re: [GENERAL] LYDB: What advice about stored procedures and otherserver side code?

From
Pavel Stehule
Date:


2016-12-28 16:12 GMT+01:00 Christoph Moench-Tegeder <cmt@burggraben.net>:
## Guyren Howe (guyren@gmail.com):

> I am inclined to advise folks to use PL/V8 on Postgres, because it is
> a reasonable language, everyone knows it, it has good string functions,
> decent performance and it tends to be installed everywhere (in particular,
> Amazon RDF offers it).

I'd be careful with "everyone knows JavaScript" - that may hold in
web development, but there are many cases where Javascript is completely
off the map. And for V8 itself - it adds quite a chunk of code to your
PostgreSQL installation, that may put some people off. At least, I
haven't seen it installed "generally" in the wild, but my view might
be as biased as anyone else's.

> Broadly, what advice should I offer that isn’t obvious? Not just about
> PL/V8 but server side code in general.

Initially, running code in your database can make life easier for
the developers (ise pgTap for testing, pl/profiler and pl/debugger,
etc.). But once you have to change your schema, the hurt begins:
you'll need downtime for that, or you'll have to deal with the
possibility of changing the API of your "database side" code, and
matching code and tables on the database. There have been talks
about that topic (using versioned schemas and leveraging search_path),
but still: a simple change to a function suddenly requires a lot
of care.

you are forgot on reduction of network cost - when some task generates lot of fast queries, then main bottleneck is a network. Stored procedures removes this bottleneck.

PLpgSQL shares data formats and process with PostgreSQL database engine - there are not data conversations, there are not network/protocols/API overhead, there are not interprocess communication overhead.

 
The next pain point is scalability: running code on the database server
puts your code on the most expensive and hardest to scale CPUs. You
can (almost) always add another appserver to your setup (just spin
up a VM with a tomcat or whatever-you-use). But if the bottleneck
is your database CPUs, you'd have to move to a larger server (that
got easier with VMs, within limits); or use replication to offload
some code to standbys, keeping writing code on the primary (and
hope you'll have enough horsepower there). Multi-Master introduces
some special limitations and operational overhead on it's own, I'd
not generally recommend that for all applications and developers
just moving up from the "dump data bucket" model.

TL;DR: database side code can be a great thing in a small application,
but once the application and traffic grows, "code in the database"
requires specialist attention and may become a burden.
Unfortunately, most large applications started small...

When you use stored procedures, you have to choose well the border - what should be done by server, what should be done by outside. Usually stored procedures should be glue of SQL - and then the overhead of stored procedures is near to zero. Surely, stupid ORM techniques has terrible impact on server side.

regards

Pavel

 

Regards,
Christoph

--
Spare Space


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

|From: Christoph Moench-Tegeder
|Initially, running code in your database can make life easier for the developers
|(ise pgTap for testing, pl/profiler and pl/debugger, etc.). But once you have to
|change your schema, the hurt begins:
|you'll need downtime for that, or you'll have to deal with the possibility of
|changing the API of your "database side" code, and matching code and tables on
|the database.

I look at this from the opposite direction: with a stable database API (via stored procs), I can change the schema and
logicwithin the procs without causing any app code breakage…the app tier is completely insulated from those changes –
that’sworth a lot.  Yes, for deploying the schema change there must be an outage, but that’s true regardless of where
thedata is being manipulated – the data is still in a db…and there are ways to mitigate/eliminate the duration of the
outage.

|The next pain point is scalability: running code on the database server puts your
|code on the most expensive and hardest to scale CPUs. You can (almost) always
|add another appserver to your setup (just spin up a VM with a tomcat or
|whatever-you-use). But if the bottleneck is your database CPUs, you'd have to
|move to a larger server

Our dedicated db servers have not yet shown any real CPU consumption during app use - memory, for us, is the only real
limitingfactor.  The only time CPU consumption spikes is during admin activities - reindexing, vacuuming, bulk data
loads...thatsort of thing.  Even the boxplot calculations barely cause a ripple.  To me that speaks to the efficiency
oflanguage and engine working together.  You are right of course on the scaling - if we do run out of CPU horsepower
andcannot scale up any more, we'd have to scale out, and there are ways to do that too.  IOWs, the model doesn't have
tochange, just the solution to solve the scaling (solve the problem, not the symptom). 

|TL;DR: database side code can be a great thing in a small application, but once
|the application and traffic grows, "code in the database"
|requires specialist attention and may become a burden.
|Unfortunately, most large applications started small...

In my opinion, having a database specialist work on database stuff is a GOOD thing.  Tables get designed properly,
correctindexes are built, efficient query plans are created, etc.  ORMs are a shortcut to getting an app talking to
data,but aren't a substitute for a proper, scalable data tier.  IMO...being a data specialist...  :-) 

Mike Sofen (Synthetic Genomics)



Re: [GENERAL] LYDB: What advice about stored procedures and otherserver side code?

From
Christoph Moench-Tegeder
Date:
## Mike Sofen (msofen@runbox.com):

> I look at this from the opposite direction: with a stable database
> API (via stored procs), I can change the schema and logic within the
> procs without causing any app code breakage…the app tier is completely
> insulated from those changes – that’s worth a lot.

Many applications are not designed to have a "stable" database API.
And suddenly... there's not only dependencies between functions
in the "core application" but also between application and database,
in more ways than "just a few tables". One needs to be aware of
that.

> Our dedicated db servers have not yet shown any real CPU consumption
> during app use - memory, for us, is the only real limiting factor.

That depends on the usage pattern. An application for "company internal"
use will behave quite differently from a popular internet application.
I do know of cases where too much code in the database became the
number one problem.
What I'm saying is: choose wisely, and be prepared for the day where
all your assumptions will be proven wrong. Predicting future use and
changes of an application in an emergent field is near impossible.
Database-side code can be a great thing, and a real problem. In the
worst case, it's both at the same time.

Regards,
Christoph

--
Spare Space


Re: [GENERAL] LYDB: What advice about stored procedures and otherserver side code?

From
"Karsten Hilbert"
Date:
> Many applications are not designed to have a "stable" database API.

It seems OP is arguing they should.

Regards,
Karsten


Re: [GENERAL] LYDB: What advice about stored procedures and otherserver side code?

From
Christoph Moench-Tegeder
Date:
## Karsten Hilbert (Karsten.Hilbert@gmx.net):

> > Many applications are not designed to have a "stable" database API.
> It seems OP is arguing they should.

Well, if the environment allows for that, fine. If not, well, duh.

Regards,
Christoph

--
Spare Space


Re: [GENERAL] LYDB: What advice about stored procedures and otherserver side code?

From
Adrian Klaver
Date:
On 12/29/2016 02:12 AM, Christoph Moench-Tegeder wrote:
> ## Karsten Hilbert (Karsten.Hilbert@gmx.net):
>
>>> Many applications are not designed to have a "stable" database API.
>> It seems OP is arguing they should.
>
> Well, if the environment allows for that, fine. If not, well, duh.

Been following along and while there are several views there seems to be
one constant: "there should be a method to the madness". For a
particular situation a best try should be made to analyze what the
current and future needs are and plan a course of action that meets
those needs the best. This could be database-centric or app-centric. In
a perfect world this would be a non-biased decision, though I suspect it
tends to follow the inclinations/backgrounds of those involved in the
planning. In any case a defined course of action that is set out and
followed would seem to be better then arbitrarily bolting on
new/different methods on a whim.


>
> Regards,
> Christoph
>


--
Adrian Klaver
adrian.klaver@aklaver.com