Thread: [HACKERS] RustgreSQL

[HACKERS] RustgreSQL

From
Joel Jacobson
Date:
Hi all,

Is anyone working on porting PostgreSQL to Rust?

Corrode looks a bit limited for the task, but maybe it can be a start.
It doesn't support goto or switch, but maybe the goto patterns are not too complicated.

My motivation is primarily that I don't want to learn all the over-complicated details of C,
but at the same time I would like to be productive in a safe system language,
a category in which Rust seems to be alone.

Porting PostgreSQL to Rust would be a multi-year project,
and it could only be done if the process could be fully automated,
by supporting all the coding patterns used by the project,
otherwise a Rust-port would quickly fall behind the master branch.
But if all git commits could be automatically converted to Rust,
then the RustgreSQL project could pull all commits from upstream
until all development has switched over to Rust among all developers.

Is this completely unrealistic or is it carved in stone PostgreSQL will always be a C project forever and ever?

Re: [HACKERS] RustgreSQL

From
Gavin Flower
Date:
On 08/01/17 22:09, Joel Jacobson wrote:
> Hi all,
>
> Is anyone working on porting PostgreSQL to Rust?
>
> Corrode looks a bit limited for the task, but maybe it can be a start.
> It doesn't support goto or switch, but maybe the gotos patterns are 
> not too complicated.
>
> My motivation is primarily I don't want to learn all the 
> over-complicated details of C,
> but at the same time I would like to be productive in a safe system 
> language,
> a category in which Rust seems to be alone.
>
> Porting PostgreSQL to Rust would be a multi-year project,
> and it could only be done if the process could be fully automated,
> by supporting all the coding patterns used by the project,
> otherwise a Rust-port would quickly fall behind the master branch.
> But if all git commits could be automatically converted to Rust,
> then the RustgreSQL project could pull all commits from upstream
> until all development has switched over to Rust among all developers.
>
> Is this completely unrealistic or is it carved in stone PostgreSQL 
> will always be a C project forever and ever?

From my very limited understanding, PostgreSQL is more likely to be 
converted to C++!


Cheers,
Gavin




Re: [HACKERS] RustgreSQL

From
Fabien COELHO
Date:
>> Is this completely unrealistic or is it carved in stone PostgreSQL will 
>> always be a C project forever and ever?
>
> From my very limited understanding, PostgreSQL is more likely to be converted 
> to C++!

ISTM that currently pg is written in C89. Personally I think that C99 
(standard from 18 years ago...) would be progress, but this has been 
rejected in the past because of portability issues on some platforms (eg 
MS Visual C++ started to support part of C99 in ... 2013).

-- 
Fabien.



Re: [HACKERS] RustgreSQL

From
"Greg Sabino Mullane"
Date:
Joel Jacobson asked:

> Is anyone working on porting PostgreSQL to Rust?

No; extremely unlikely.

> My motivation is primarily I don't want to learn all the 
> over-complicated details of C

Well that's going to be a show-stopper right there. For a proper 
port, a deep understanding of the current source code is necessary.
You'd need a team expert in both C and Rust to pull it off.

> Porting PostgreSQL to Rust would be a multi-year project,
> and it could only be done if the process could be fully automated,
> by supporting all the coding patterns used by the project,
> otherwise a Rust-port would quickly fall behind the master branch.
> But if all git commits could be automatically converted to Rust,

Developing such a system is bordering on AI and likely more complex 
than Postgres itself. :)

> Is this completely unrealistic or is it carved in stone PostgreSQL will
> always be a C project forever and ever?

It's unrealistic, but there is nothing to say Postgres will stay in C 
forever. Right now, however, there is no compelling reason to move 
away from it, and the porting effort to any language would be immense. 
C++ would be the least painful option, probably.

-- 
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201701080905
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8





Re: [HACKERS] RustgreSQL

From
Craig Ringer
Date:
On 8 Jan. 2017 17:10, "Joel Jacobson" <joel@trustly.com> wrote:

> Is this completely unrealistic or is it carved in stone PostgreSQL will always be a C project forever and ever?

Incredibly unrealistic.

PostgreSQL makes heavy use of variable length arrays. longjmp() is critical to its error handling. Lots of other "unsafe" things.
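
For readers unfamiliar with the pattern, here is a minimal sketch of longjmp()-based error handling in plain C. It is not PostgreSQL's actual elog.c code; the names raise_error, checked_divide, and try_divide are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not PostgreSQL's elog.c. */
static jmp_buf *current_handler = NULL;  /* innermost active recovery point */
static const char *last_error = NULL;    /* message of the most recent error */

/* Report an error by jumping to the innermost recovery point. Never returns. */
static void raise_error(const char *msg)
{
    assert(current_handler != NULL);     /* only valid inside try_divide() */
    last_error = msg;
    longjmp(*current_handler, 1);
}

/* A function somewhere down the call stack that can fail. */
static int checked_divide(int a, int b)
{
    if (b == 0)
        raise_error("division by zero");
    return a / b;
}

/* Accessor for the last error message, or NULL if none was raised. */
const char *last_error_message(void)
{
    return last_error;
}

/*
 * Run checked_divide() under a recovery point.
 * Returns 0 and stores the quotient in *result on success,
 * or -1 (leaving *result untouched) if an error was raised.
 */
int try_divide(int a, int b, int *result)
{
    jmp_buf  env;
    jmp_buf *saved = current_handler;    /* remember outer handler for nesting */
    int      status;

    if (setjmp(env) == 0)
    {
        current_handler = &env;
        *result = checked_divide(a, b);
        status = 0;
    }
    else
    {
        /*
         * Reached via longjmp(). No local of this function is changed
         * between setjmp() and longjmp(), so none needs to be volatile.
         */
        status = -1;
    }

    current_handler = saved;             /* pop the recovery point */
    return status;
}
```

Note that the jump skips every stack frame between raise_error() and try_divide() without running any cleanup code in them, which is the part that is hard to reconcile with a language that relies on destructors running at scope exit.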

More to the point though, we struggle to get any acceptance of the most trivial added dependencies. Even optionally. We still support a more-than-10-year-old Perl. We still use C89. The project is VERY conservative and expects new releases to run on ancient UNIXes long forgotten by the rest of the world.

This has some advantages. Those weird compilers and platforms help catch bugs. Most of the time they cost us little but minor inconvenience and they're part of why some contributors stay around. 

But it also makes it way harder to make significant change.

IMO the chances of the project switching to Rust are about as high as Oracle Database going open source, or MongoDB declaring that it's changing to SQL as the primary and preferred query language.

The ONLY way I could imagine it happening would be if you could show that it could be done incrementally, in a way that retained support for 10+ year old platforms, with a significant increase in performance and decrease in code size/complexity for converted modules. With minimal or no "rust droppings" (macros everywhere etc) to help such an incremental adaptation along. Even then you'd have to convince a lot of people who know C well (or well enough, or think they do) that it was worth such massive change. If a low pain seamless conversion/adaptation like that were possible I'd have to wonder what Rust could actually offer us over C since it clearly has the same scope for issues if such an intermixture is possible. Kind of a catch-22.

A restricted subset of C++ is a lot more likely. Even then longjmp will cause us pain... I suspect we'd land up having to move to C++ exceptions.

Take a look at elog.c, the memory contexts code, etc. If you think Rust can play well with that, cool. I can't imagine how though.
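
To give a rough idea of what "memory contexts" means here, the following is a heavily simplified sketch of a region-style allocator in which every allocation belongs to a region and the whole region is released in one call. PostgreSQL's real MemoryContext code is far richer; Region, region_alloc, and region_reset are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every allocation (hypothetical layout). */
typedef union Chunk
{
    union Chunk *next;    /* previously allocated chunk in the same region */
    max_align_t  align;   /* C11: keeps the payload suitably aligned */
} Chunk;

typedef struct Region
{
    Chunk  *chunks;       /* linked list of all live allocations */
    size_t  count;        /* number of live allocations */
} Region;

void region_init(Region *r)
{
    r->chunks = NULL;
    r->count = 0;
}

/* Allocate size bytes owned by the region; returns NULL on failure. */
void *region_alloc(Region *r, size_t size)
{
    Chunk *c = malloc(sizeof(Chunk) + size);

    if (c == NULL)
        return NULL;
    c->next = r->chunks;
    r->chunks = c;
    r->count++;
    return c + 1;         /* payload starts right after the header */
}

/* Free everything the region owns; returns how many chunks were freed. */
size_t region_reset(Region *r)
{
    size_t freed = 0;

    while (r->chunks != NULL)
    {
        Chunk *next = r->chunks->next;

        free(r->chunks);
        r->chunks = next;
        freed++;
    }
    assert(freed == r->count);
    r->count = 0;
    return freed;
}
```

The point of the design is that code which is aborted part-way through (for example by a longjmp()) does not need to free anything itself: whoever catches the error resets the region. Individual pointers carry no ownership information at all, which is quite different from how Rust tracks lifetimes.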

You'd have a lot more chance writing extensions in Rust though. If you can make it play OK with the exception handling in postgres and our memory management, at least. If you really wanted to push this forward... start there. Show how great it is.

Then come up with a plan for how you'd handle existing extensions (PostGIS?), external PLs, ecpg, pgxs, etc. Make sure libpq stays totally compatible. All that fun.

I just don't see it happening. Nor do I see you suggesting any possible reason why we'd care or want to. You don't want to learn C so dozens/hundreds of people need to learn Rust. What the?

Re: [HACKERS] RustgreSQL

From
Craig Ringer
Date:
On 8 Jan. 2017 18:14, "Fabien COELHO" <coelho@cri.ensmp.fr> wrote:

>>> Is this completely unrealistic or is it carved in stone PostgreSQL will always be a C project forever and ever?
>>
>> From my very limited understanding, PostgreSQL is more likely to be converted to C++!
>
> ISTM that currently pg is written in C89. Personally I think that C99 (standard from 18 years ago...) would be progress, but this has been rejected in the past because of portability issues on some platforms (eg MS Visual C++ started to support part of C99 in ... 2013).

MSVC was really the main issue. MS really insisted that C++ was the future and C99 was a pointless diversion. 

I kinda agree with them TBH, albeit with a preference for a small-ish and carefully used subset of C++. But they've recognised that it matters to enough people to add support now anyways. I suspect their increased interest in open source and Linux is related.

Re: [HACKERS] RustgreSQL

From
Craig Ringer
Date:



> Is this completely unrealistic or is it carved in stone PostgreSQL will always be a C project forever and ever?

Another thing to look at if you want to approach this as a serious, practical effort is the atomics,  memory barrier, spinlock and lwlock code.

I just don't see it happening.

Re: [HACKERS] RustgreSQL

From
Joel Jacobson
Date:
Thank you Craig for explaining the current C state of the project,
very interesting to learn about.

On Sun, Jan 8, 2017 at 4:19 AM, Craig Ringer
<craig.ringer@2ndquadrant.com> wrote:
> If a low pain seamless conversion/adaptation like that were possible I'd have to wonder what Rust could actually
> offer us over C since it clearly has the same scope for issues if such an intermixture is possible. Kind of a catch-22.

Not necessarily. If you write non-idiomatic Rust and preserve as much
of the code style as possible from the original C code, and just focus
on getting rid of all usage of unsafe, then it will be more easily
understandable by existing C developers than if strictly writing
idiomatic Rust code.

> You'd have a lot more chance writing extensions in Rust though. If you can make it play OK with the exception
> handling in postgres and our memory management, at least. If you really wanted to push this forward... start there. Show
> how great it is.

Funny you mention it, that's actually exactly how these thoughts started
in my head. I realized I would need to write a C function to do some
computations that needed to be fast, but since C is dangerous and I don't
know the language well enough, I thought "I wish there was a safe and
fast system language you could write database functions in!", and
that's how I first found Rust and then later this project:
https://github.com/thehydroimpulse/postgres-extension.rs
which allows doing exactly that, writing extensions in Rust.

> You don't want to learn C so dozens/hundreds of people need to learn Rust. What the?

Oh, I don't think you seriously think I meant to suggest others should
learn Rust just because I don't want to learn C.
I don't want to learn the complicated details of C, that's true.
But that has nothing to do with why others would need to learn Rust. They
don't, unless the majority of the project would also want to move to
Rust, and that has of course nothing to do with me.
I'm just asking possibly stupid questions and having possibly stupid
theories, trying to understand why such a project would be possible or
not.



Re: [HACKERS] RustgreSQL

From
Jim Nasby
Date:
On 1/8/17 10:28 AM, Joel Jacobson wrote:
>> You'd have a lot more chance writing extensions in Rust though. If you can make it play OK with the exception
>> handling in postgres and our memory management, at least. If you really wanted to push this forward... start there. Show
>> how great it is.
>
> Funny you mention, that's actually exactly how these thoughts started
> in my head. I realized I would need to write a C function to do some
> computations that needed to be fast, but since C is dangerous I don't
> know the language well enough I thought "I wish there was a safe and
> fast system language you could
> write database functions in!", and that's how I first found Rust and
> then later this project:
> https://github.com/thehydroimpulse/postgres-extension.rs
> Which allows doing exactly that, writing extensions in Rust.

Somewhat related to that... it would be useful if Postgres had "fenced" 
functions; functions that ran in a separate process and only talked to a 
backend via a well defined API (such as libpq). There's two major 
advantages that would give us:

- Untrusted languages could be made trusted. Currently there's no way to 
limit what a user could do with a function like plpythonu or pl/r, and 
there's no good way to make either of those languages trusted. But if 
the function code was running in a separate process, that process could 
be owned by a user and group (ie: nobody:nogroup) that has no 
permissions to do anything at the OS level. This would make it possible 
for hosting platforms like RDS to offer these languages without undue 
risk. I believe Joe Conway has done some investigation in this area.

- If you had a buggy C function that crashed it wouldn't force us to 
panic the entire database. The fenced process would die, we'd detect 
that and just throw a normal error. Currently, C functions can access 
*everything* in backend memory, including shared memory. That's why if 
one crashes we have to panic. Even worse, a bug that corrupts data but 
doesn't always crash could do enormous amounts of damage. The worst a 
bug in a fenced function could do is screw up whatever it's returning to 
the backend.
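
As a rough illustration of the crash-isolation half of this idea, the sketch below runs a function in a forked child process and passes the result back through a pipe, so that a crash in the function kills only the child. It assumes a POSIX system; call_fenced, square, and crash are hypothetical names, and the sketch neither drops privileges nor talks to a backend through libpq or any other real API.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*fenced_fn)(int);      /* hypothetical function signature */

/*
 * Run fn(arg) in a separate process.
 * Returns 0 and stores the result in *out on success.
 * Returns -1 and leaves *out untouched if the child crashed,
 * exited abnormally, or the process/pipe could not be set up.
 */
int call_fenced(fenced_fn fn, int arg, int *out)
{
    int     fds[2];
    int     value = 0;
    int     status = 0;
    pid_t   pid;
    ssize_t n;

    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0)
    {
        /* Child: compute, send the result to the parent, exit. */
        close(fds[0]);
        value = fn(arg);
        n = write(fds[1], &value, sizeof value);
        _exit(n == (ssize_t) sizeof value ? 0 : 1);
    }

    /* Parent: read the result; a crashed child yields a short read. */
    close(fds[1]);
    n = read(fds[0], &value, sizeof value);
    close(fds[0]);

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (n != (ssize_t) sizeof value ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;                  /* report an ordinary error */

    *out = value;
    return 0;
}

/* A well-behaved function and a deliberately crashing one. */
int square(int x) { return x * x; }
int crash(int x)  { (void) x; abort(); }
```

Calling call_fenced(crash, 1, &out) returns -1 while the calling process keeps running, which is the behaviour described above: the caller can raise a normal error instead of panicking. Forking per call is expensive, so a real design would presumably keep a long-lived worker process instead.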

As for Postgres in general, I think it would be nice if it was easier to 
do some things in a language other than C, for code simplicity reasons. 
The issue with that though is not significantly expanding what's 
necessary to build Postgres. I doubt that Rust would meet that criterion.

That said, the community will at least consider things that offer 
*significant* advantages. For example, there's been some discussion 
about making use of LLVM for the executor. Since LLVM is meant to make 
it easier to build new languages, it's possible that it could be used to 
simplify other pieces of code as well.

BTW, I just came across http://safecode.cs.illinois.edu; that might get 
you a lot of the benefits you're looking for in Rust. The description on 
http://llvm.org claims it can be used like Valgrind, which the project 
currently supports.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] RustgreSQL

From
Jan de Visser
Date:
On Sunday, January 8, 2017 6:28:17 AM EST Joel Jacobson wrote:
> I don't want to learn the complicated details of C, that's true.

And this is where I think you're wrong, and why conversion would be hard. C 
has very few complicated details. I don't think it has any, frankly. It 
basically says "If you want your datastructure to outlive a function call, 
I'll give you a chunk of memory and you're now responsible for it. Have fun". 
That's not complicated: it's two functions, malloc() and free(), basically. 
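
A small sketch of that contract, for concreteness. Point, make_point, and free_point are hypothetical names; the broken variant appears only inside a comment because using its return value would be undefined behaviour.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Point
{
    int x;
    int y;
} Point;

/*
 * WRONG (comment only, do not use): returning the address of a local.
 *
 *     Point *make_point_broken(int x, int y)
 *     {
 *         Point p = { x, y };
 *         return &p;      // p stops existing when the function returns
 *     }
 */

/*
 * Heap memory outlives the call. The caller now owns the Point and is
 * responsible for passing it to free_point() exactly once.
 * Returns NULL if malloc() fails.
 */
Point *make_point(int x, int y)
{
    Point *p = malloc(sizeof *p);

    if (p == NULL)
        return NULL;
    p->x = x;
    p->y = y;
    return p;
}

/*
 * Release a Point and clear the caller's pointer, so that an accidental
 * later use is a NULL dereference rather than a silent use-after-free.
 */
void free_point(Point **pp)
{
    assert(pp != NULL);
    free(*pp);
    *pp = NULL;
}
```

The language rules really are this short; nothing in the compiler checks that every make_point() is eventually paired with one free_point(), and that bookkeeping is left entirely to the programmer.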

What's hard and complicated is keeping track of all those little chunks of 
memory you have laying around. That management is deeply intertwined with the 
algorithmics of the system, and separating memory management from the actual 
work done will be very hard. In many cases the algorithm will have been 
implemented with cheap memory management in mind, and porting it to a 
different paradigm (garbage collection or rust's reference ownership) can 
result in either a lot of work and probably bugs, or poorly performing code.

I personally find keeping track of allocated memory easier than rust's 
convoluted (to me) rules regarding reference ownership.

Your fear of C is unfounded. The definitive c89 reference is a little book of 
about 250 pages, more than half of which is about the standard library of 
which you'll never use more than half. If you have some notepaper lying about 
on which to scribble pointer diagrams you can be a C programmer too :-)




Re: [HACKERS] RustgreSQL

From
Michael Paquier
Date:
On Mon, Jan 9, 2017 at 6:51 AM, Jan de Visser <jan@de-visser.net> wrote:
> Your fear of C in unfounded. The definitive c89 reference is a little book of
> about 250 pages, more than half of which is about the standard library of
> which you'll never use more than half. If you have some notepaper laying about
> on which to scribble pointer diagrams you can be a C programmer to :-)

The reference guide of Brian Kernighan and Dennis Ritchie? Definitely
a must-have!
-- 
Michael



Re: [HACKERS] RustgreSQL

From
Peter Geoghegan
Date:
On Sun, Jan 8, 2017 at 1:51 PM, Jan de Visser <jan@de-visser.net> wrote:
> And this is where I think you're wrong, and why conversion would be hard. C
> has very few complicated details. I don't think it has any, frankly. It
> basically says "If you want your datastructure to outlive a function call,
> I'll give you a chunk of memory and you're now responsible for it. Have fun".
> That's not complicated: it's two functions, malloc() and free(), basically.

I don't think that that is true in practice. This paper summarizes why
this is the case: https://www.cl.cam.ac.uk/~pes20/cerberus/pldi16.pdf

At the same time, I don't think it would be a good idea to adopt Rust
for Postgres development, and not purely because of our legacy (it
might be interesting as a language for extensions, however). The
contradictory goals of C are what results in the kind of ambiguity
that that paper goes into. C may have contradictory goals, but that
doesn't mean they're the wrong goals, even when considered as a whole.
The culture that C is steeped in still makes a lot of sense for a
system like Postgres.

-- 
Peter Geoghegan



Re: [HACKERS] RustgreSQL

From
Gavin Flower
Date:
On 09/01/17 11:31, Michael Paquier wrote:
> On Mon, Jan 9, 2017 at 6:51 AM, Jan de Visser <jan@de-visser.net> wrote:
>> Your fear of C in unfounded. The definitive c89 reference is a little book of
>> about 250 pages, more than half of which is about the standard library of
>> which you'll never use more than half. If you have some notepaper laying about
>> on which to scribble pointer diagrams you can be a C programmer to :-)
> The reference guide of Brian Kernighan and Dennis Ritchie? Definitely
> a must-have!

I learnt C from the original version of K&R, and bought the ANSI version 
when that came out - at the time I was a COBOL programmer on a mighty 
MainFrame (with a 'massive' 1 MB of Core Memory and a 'fast' 2 MHz 
processor).  Now I use Java.


Cheers,
Gavin




Re: [HACKERS] RustgreSQL

From
Craig Ringer
Date:


On 9 Jan. 2017 05:51, "Jan de Visser" <jan@de-visser.net> wrote:
> On Sunday, January 8, 2017 6:28:17 AM EST Joel Jacobson wrote:
>> I don't want to learn the complicated details of C, that's true.
>
> And this is where I think you're wrong, and why conversion would be hard. C
> has very few complicated details. I don't think it has any, frankly.

Oh, that's really rather optimistic. 

C is a small-ish language. But achieving a good understanding of its memory model and its implications isn't easy at all. There is lots of undefined behaviour in the language too, which makes it easy to write code that works... mostly. Usually. Until you hit some edge case, run on a different architecture/platform, etc.

Then the system libraries and their implementations add complexity. pthreads may not be part of C but for many projects a solid understanding of them is crucial... and not that easy.

Do you comprehensively understand the rules for memory ordering when processes interact in shared memory? Can you explain the correct uses of volatile and when declaring something volatile is / isn't sufficient for ensuring safe concurrent access? What happens if you dereference a pointer to a struct allocated on the stack of a just-returned function? 

I had a quick look at Rust and it sounds like it tries to make this sort of stuff simpler. I didn't see any formal definition of a memory model though - it seems like it figures you'll just use its concurrency primitives. And C looks simple enough at first too... emergent complexity from seemingly simple rules is hard.

It is easy to write C programs of moderate complexity that work reliably within tested conditions and are somewhat portable to a set of tested-for architectures. It is very hard to write C that is generally portable, robust in the face of various edge-case inputs and environmental conditions, is free from race conditions and memory ordering problems, and relies on no undefined behaviour.

This is only partly a deficiency of C. Lots of it is down to low level systems being complex, hard and varied. Weak vs strong memory ordering, LP64 vs ILP64, etc etc etc.

I know only just enough C to be dangerous. Admittedly I haven't adequately studied the language, but I have some idea how much I don't know. I doubt there are a lot of people who can write truly error-free C. But that's also true of pretty much any language, even ones that purport to be safe.

Re: [HACKERS] RustgreSQL

From
Jim Nasby
Date:
On 1/8/17 5:06 PM, Craig Ringer wrote:
> It is very hard to write C that is generally portable, robust in the
> face of various edge-case inputs and environmental conditions, are free
> from race conditions and memory ordering problems, and rely on no
> undefined behaviour.

BTW, if you s/memory/set/ then that exactly describes building 
non-trivial systems on top of relational databases. The devil is always 
in the details.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] RustgreSQL

From
Andrew Dunstan
Date:

On 01/08/2017 09:19 AM, Craig Ringer wrote:
> On 8 Jan. 2017 17:10, "Joel Jacobson" <joel@trustly.com 
> <mailto:joel@trustly.com>> wrote:
>
>
>     Is this completely unrealistic or is it carved in stone PostgreSQL
>     will always be a C project forever and ever?
>
>
> Incredibly unrealistic.
>
> [lots of other stuff I agree with]
>
> You'd have a lot more chance writing extensions in Rust though. If you 
> can make it play OK with the exception handling in postgres and our 
> memory management, at least. If you really wanted to push this 
> forward... start there. Show how great it is.
>
>



Yeah. A few years ago Tom Dunstan and I started creating a sample 
extension in Rust while he was here on a short visit. We ran out of time 
so we didn't quite get it finished. Bottom line is it's possible but far 
from straightforward. Rust's inbuilt build system makes life, er, 
interesting. The fun of getting PG_MODULE_MAGIC in was one of the things 
we had to deal with. We ended up using a small amount of C glue.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] RustgreSQL

From
Greg Stark
Date:
On 8 January 2017 at 21:50, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> Somewhat related to that... it would be useful if Postgres had "fenced"
> functions; functions that ran in a separate process and only talked to a
> backend via a well defined API (such as libpq). There's two major advantages
> that would give us:

The problem with this is that any of the "interesting" extensions need
to use the server API. That is, they need to be able to do things like
throw errors, expand toast data, etc.

IMHO just about anything you could do in an external process would be
something you could much more easily and conveniently do in the
client. And it would be more flexible and scalable as well as it's a
lot easier to add more clients than it is to scale up the database.

That said, there were several pl language implementations that worked
this way. IIRC one of the Java pl languages ran in a separate Java
process.

I think the solution to the problem you're describing is the project
formerly known as NaCl
https://en.wikipedia.org/wiki/Google_Native_Client

-- 
greg



Re: [HACKERS] RustgreSQL

From
Jan de Visser
Date:
On Monday, January 9, 2017 7:06:21 AM EST Craig Ringer wrote:
> On 9 Jan. 2017 05:51, "Jan de Visser" <jan@de-visser.net> wrote:
> 
> On Sunday, January 8, 2017 6:28:17 AM EST Joel Jacobson wrote:
> > I don't want to learn the complicated details of C, that's true.
> 
> And this is where I think you're wrong, and why conversion would be hard. C
> has very few complicated details. I don't think it has any, frankly.
> 
> 
> Oh, that's really rather optimistic.
> 
[snip]

Allow me to be snarky and summarize your (very true) points as: "writing 
complicated software systems is hard". 

That's not the fault of the language (in most cases). The complexity can 
somewhat be abstracted by the language, which Rust and Java try to do, or 
completely left to the design, as you're forced to do in C. The former gives 
you either a more complicated language or severe limits on what you can do, and 
in the end you will find walls to bang your head against and edges of cliffs to 
fall off of.

The latter, of course, makes your head explode if your system is large enough 
and you don't have Tom Lane and Robert Haas around.




Re: [HACKERS] RustgreSQL

From
Jim Nasby
Date:
On 1/8/17 5:56 PM, Greg Stark wrote:
> On 8 January 2017 at 21:50, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> Somewhat related to that... it would be useful if Postgres had "fenced"
>> functions; functions that ran in a separate process and only talked to a
>> backend via a well defined API (such as libpq). There's two major advantages
>> that would give us:
>
> The problem with this is that any of the "interesting" extensions need
> to use the server API. That is, they need to be able to do things like
> throw errors, expand toast data, etc.

There's plenty of interesting things you can do in python or R, even 
without that ability.

> IMHO just about anything you could do in an external process would be
> something you could much more easily and conveniently do in the
> client. And it would be more flexible and scalable as well as it's a
> lot easier to add more clients than it is to scale up the database.

Well, then you're suffering from serious network latency, and you're 
forced into worrying about endian issues and what-not. Those problems 
don't exist when you're running on the same server. There's also things 
that might make sense on a local-only protocol but would make no sense 
with an external one. My guess is that you'd ultimately want a protocol 
that's something "in between" SPI and libpq.

> That said, there were several pl language implementations that worked
> this way. IIRC one of the Java pl languages ran in a separate Java
> process.
>
> I think the solution to the problem you're describing is the project
> formerly known as NaCl
> https://en.wikipedia.org/wiki/Google_Native_Client

Possibly; depends on if it would allow running things like R or python.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] RustgreSQL

From
Pavel Stehule
Date:


2017-01-09 1:21 GMT+01:00 Jim Nasby <Jim.Nasby@bluetreble.com>:
> On 1/8/17 5:56 PM, Greg Stark wrote:
>> On 8 January 2017 at 21:50, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>>> Somewhat related to that... it would be useful if Postgres had "fenced"
>>> functions; functions that ran in a separate process and only talked to a
>>> backend via a well defined API (such as libpq). There's two major advantages
>>> that would give us:
>>
>> The problem with this is that any of the "interesting" extensions need
>> to use the server API. That is, they need to be able to do things like
>> throw errors, expand toast data, etc.
>
> There's plenty of interesting things you can do in python or R, even without that ability.
>
>> IMHO just about anything you could do in an external process would be
>> something you could much more easily and conveniently do in the
>> client. And it would be more flexible and scalable as well as it's a
>> lot easier to add more clients than it is to scale up the database.
>
> Well, then you're suffering from serious network latency, and you're forced into worrying about endian issues and what-not. Those problems don't exist when you're running on the same server. There's also things that might make sense on a local-only protocol but would make no sense with an external one. My guess is that you'd ultimately want a protocol that's something "in between" SPI and libpq.

Running unsafe PLs in their own managed processes is a good idea. For years I have offered a diploma thesis topic, "better management of unsafe PL in Postgres", but still without any interest from students :(. I have twice had the chance to see catastrophic errors caused by wrong usage of PLPerlu. If we could lock the interpreter/environment in some safe sandbox, that would be great.

Regards

Pavel
 


>> That said, there were several pl language implementations that worked
>> this way. IIRC one of the Java pl languages ran in a separate Java
>> process.
>>
>> I think the solution to the problem you're describing is the project
>> formerly known as NaCl
>> https://en.wikipedia.org/wiki/Google_Native_Client
>
> Possibly; depends on if it would allow running things like R or python.

Re: [HACKERS] RustgreSQL

From
Robert Haas
Date:
On Sun, Jan 8, 2017 at 4:59 AM, Gavin Flower
<GavinFlower@archidevsys.co.nz> wrote:
>> Is this completely unrealistic or is it carved in stone PostgreSQL will
>> always be a C project forever and ever?
>>
> From my very limited understanding, PostgreSQL is more likely to be
> converted to C++!

I'm tempted to snarkily reply that we should start by finishing the
conversion of PostgreSQL from LISP to C before we worry about
converting it to anything else.  There are various code comments that
imply that it actually was LISP at one time and I can certainly
believe that given our incredibly wasteful use of linked lists in so
many places.  gram.y asserts that this problem was fixed as far as the
grammar is concerned...
 *        AUTHOR                        DATE                    MAJOR EVENT
 *        Andrew Yu                     Sept, 1994              POSTQUEL to SQL conversion
 *        Andrew Yu                     Oct, 1994               lispy code conversion

...but I think it'd be fair to say that even there it was fixed only in part.

Anyway, with regards to either Rust (which I know very little about)
or C++ (which I know more about) I think it would be more promising to
think about enabling extensions to be written in such languages than
to think about converting the entire source base.  A system like
PostgreSQL is almost a language of its own; we don't really code for
PostgreSQL in C, but in "PG-C".  Learning the PG-specific idioms is
arguably more work than learning C itself, and that would still be
true, I think, if we had a "PG-C++" or a "PG-Rust" or a "PG-D"
variant.  Still, if having such variants drew more programmers to work
on extending PostgreSQL, I think that would be worth some work on our
part to enable it.  However, maintaining multiple copies of our
1.2-million-line source base just for easier reference by people more
familiar with one of those languages than with C sounds to me like it
would create more problems than it would solve.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] RustgreSQL

From
Jim Nasby
Date:
On 1/9/17 3:15 AM, Pavel Stehule wrote:
> The running unsafe PL in own managed processes is good idea - Years, I
> have a one diploma theme "better management of unsafe PL in Postgres" -
> but still without any interest from students :(.  I had two
> possibilities to see catastrophic errors related to wrong usage of
> PLPerlu. If we can locks interpret/environment in some safe sandbox,
> then it should be great.

Incidentally, Tom just observed in another thread that Tcl treats out of 
memory as a panic situation, so anyone using pltcl opens the risk of 
arbitrary database-wide panics. Fenced pltcl is one possible solution 
for that problem.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] RustgreSQL

From
Jim Nasby
Date:
On 1/9/17 11:51 AM, Robert Haas wrote:
> Anyway, with regards to either Rust (which I know very little about)
> or C++ (which I know more about) I think it would be more promising to
> think about enabling extensions to be written in such languages than
> to think about converting the entire source base.  A system like

Yeah, converting the entire codebase is probably doomed to failure from 
the start.

> PostgreSQL is almost a language of its own; we don't really code for
> PostgreSQL in C, but in "PG-C".  Learning the PG-specific idioms is
> arguably more work than learning C itself, and that would still be
> true, I think, if we had a "PG-C++" or a "PG-Rust" or a "PG-D"
> variant.  Still, if having such variants drew more programmers to work
> on extending PostgreSQL, I think that would be worth some work on our
> part to enable it.  However, maintaining multiple copies of our
> 1.2-million-line source base just for easier reference by people more
> familiar with one of those languages than with C sounds to me like it
> would create more problems than it would solve.

I do wonder if there are parts of the codebase that would be much better 
suited to a language other than C, and could reasonably be ported. 
Especially if that could be done in such a way that the net result is 
still C code so we're not adding dependencies to non developers (similar 
to bison).

Extensions are a step in that direction, but they're ultimately not core 
Postgres (which is a different issue).
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)



Re: [HACKERS] RustgreSQL

From
Joel Jacobson
Date:
On Mon, Jan 9, 2017 at 3:22 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> I do wonder if there are parts of the codebase that would be much better
> suited to a language other than C, and could reasonably be ported.
> Especially if that could be done in such a way that the net result is still
> C code so we're not adding dependencies to non developers (similar to
> bison).
>
> Extensions are a step in that direction, but they're ultimately not core
> Postgres (which is a different issue).

I think this is a great idea!

That way the amount of C code could be reduced over time,
while safely extending the official version with new functionality on
the surface,
without increasing the amount of C code.

One idea of an area that would be most useful from a user-perspective
is probably all pg_catalog functions that are immutable.
They should be able to be written without expertise of the PostgreSQL internals,
since they only depend on the input parameters to produce the output.

And that also means it should be easier to write them in a different
language than C,
because they don't need access to any PostgreSQL internal data structures,
or make use of existing C functions.



Re: [HACKERS] RustgreSQL

From
Craig Ringer
Date:
On 10 January 2017 at 09:54, Joel Jacobson <joel@trustly.com> wrote:

> One idea of an area that would be most useful from a user-perspective
> is probably all pg_catalog functions that are immutable.
> They should be able to be written without expertise of the PostgreSQL internals,
> since they only depend on the input parameters to produce the output.

Wait, what?

No, that doesn't follow at all.

Immutable functions can and do use functionality from all over the
server. They just don't depend on user-visible mutable _state_
elsewhere in the server.

As with all the rest, you'd need something that integrates well enough
with C that you can use C functions ... and macros. In which case
you're basically writing C.

That's why I mentioned upthread that C++ is probably the only
reasonably likely direction to go in, if we ever change, and probably
only a progressive change based on a restricted subset of C++ if we
start needing C++-only features, start having platform support issues,
etc. The common subset of C and C++ is a large portion of the C
language.

> And that also means it should be easier to write them in a different
> language than C,
> because they don't need access to any PostgreSQL internal data structures,
> or make use of existing C functions.

Not the case at all. You'd need a large adapter layer for pretty much
any language to handle interaction with Pg's data types, memory
management, etc.
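
As a rough illustration of that adapter layer, here is a minimal Rust sketch. It assumes a made-up C-side text representation (`RawText`, a pointer plus a length), which is a stand-in and not PostgreSQL's actual varlena layout. The function names `raw_text_as_str`, `count_vowels` and `count_vowels_raw` are likewise hypothetical. Even for a function whose result depends only on its input, the conversion and safety checks outweigh the logic itself.

```rust
use std::slice;

/// Hypothetical C-side text value: a pointer plus a byte length.
/// A stand-in for illustration; NOT PostgreSQL's real varlena layout.
#[repr(C)]
pub struct RawText {
    pub data: *const u8,
    pub len: usize,
}

/// Adapter step: borrow the C-side bytes as a Rust `&str`.
/// Returns `None` for a null data pointer or invalid UTF-8.
///
/// Safety: `t.data` must point to `t.len` readable bytes that stay
/// valid for as long as the returned reference is used.
pub unsafe fn raw_text_as_str(t: &RawText) -> Option<&str> {
    if t.data.is_null() {
        return None;
    }
    let bytes = unsafe { slice::from_raw_parts(t.data, t.len) };
    std::str::from_utf8(bytes).ok()
}

/// The "immutable" logic itself: depends only on its input.
pub fn count_vowels(s: &str) -> i32 {
    s.chars().filter(|c| "aeiouAEIOU".contains(*c)).count() as i32
}

/// C-ABI entry point. Returns -1 on null or non-UTF-8 input.
/// A real export would also need a no-mangle attribute and a
/// shared-library build; both are omitted in this sketch.
pub extern "C" fn count_vowels_raw(t: *const RawText) -> i32 {
    if t.is_null() {
        return -1;
    }
    match unsafe { raw_text_as_str(&*t) } {
        Some(s) => count_vowels(s),
        None => -1,
    }
}
```

The one-line `count_vowels` is the only part that is "just a function of its input". Everything else is glue, and a real server would add memory-context and error-handling concerns on top.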

We don't currently define a formal public API or a "safe" subset of
PostgreSQL's C interfaces. Anyone who wanted to make a serious attempt
at writing "safe" or "fenced" C extensions in another language that
supports restricted execution would need to start there. Whether they
want to use NaCL, .NET Core or Mono and C#, Java with SecurityManager,
or whatever.

That's what the existing PL/Perl does, it just has a _very_ small part
of PostgreSQL's interfaces exposed, so it's very limited in what it
can actually do.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] RustgreSQL

From
Joel Jacobson
Date:
On Mon, Jan 9, 2017 at 6:03 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> Immutable functions can and do use functionality from all over the
> server. They just don't depend on user-visible mutable _state_
> elsewhere in the server.

OK, fair point, but at least the functionality *could* be written without using
existing C functions, since it's only the input that determines what
output will be returned. The dependencies used by the immutable
functions can also be ported, function by function, until there are
no dependencies.



Re: [HACKERS] RustgreSQL

From
Jan de Visser
Date:
On Monday, January 9, 2017 7:39:49 PM EST Joel Jacobson wrote:
> On Mon, Jan 9, 2017 at 6:03 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> > Immutable functions can and do use functionality from all over the
> > server. They just don't depend on user-visible mutable _state_
> > elsewhere in the server.
> 
> OK, fair point, but at least the functionality *could* be written without
> using existing C functions, since it's only the input that determines what
> output will be returned. The dependencies used by the immutable
> functions can also be ported, function by function, until there are
> no dependencies.

Be that as it may, I don't think you have convinced anybody that that is 
something worth doing. The fact it *could* be done doesn't mean it *should* be 
done.

You're proposing to introduce a metric eff-ton of instability in a project 
that routinely spends ten-message email threads discussing changing an elog to 
ereport.

To give you some perspective: *everybody* agrees autotools (the mechanism used 
to generate makefiles) is awful. Everybody. About a year ago somebody showed up
saying "Hey, I have a draft patch replacing autotools with cmake". Cmake is 
infinitely better (mostly because it was developed in this century as opposed 
to the early 80s, and so is more in tune with current toolchains). Yury has 
been working on it for a year now, and I personally don't think it's going to 
land in version 10. And this is "just" the make infrastructure.

What you are proposing is not going to happen unless you get some serious buy-
in from a significant number of veteran contributors. And those are exactly the 
people that say "C? What's the problem?"




Re: [HACKERS] RustgreSQL

From
Robert Haas
Date:
On Tue, Jan 10, 2017 at 8:42 AM, Jan de Visser <jan@de-visser.net> wrote:
> Be that as it may, I don't think you have convinced anybody that that is
> something worth doing. The fact it *could* be done doesn't mean it *should* be
> done.

+1.

> What you are proposing is not going to happen unless you get some serious buy-
> in from a significant number of veteran contributors. And those are exactly the
> people that say "C? What's the problem?"

+1.

I'm not meaning to be funny or sarcastic or disrespectful when I say
that I think C is the best possible language for PostgreSQL.  It works
great, and we've got a ton of investment in making it work.  I can't
see why we'd want to start converting even a part of the code to
something else.  Perhaps it seems like a good idea from 10,000 feet,
but in practice I believe it would be fraught with difficulties - and
if it injected even a few additional instructions into hot code paths,
it would be a performance loser.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] RustgreSQL

From
otheus uibk
Date:
Joel Jacobson joel@trustly.com wrote:
> My motivation is primarily I don't want to learn all the over-complicated details of C,

That's rich, mate. C is one of the simplest languages. Its simplicity is its main benefit and its biggest drawback: it shields very little from the actual underlying hardware and system. C++ is incredibly complex, while still granting one access to the underlying system.  Every other high-level language I've seen that shields its users from the "details" ultimately suffers for it: either you the programmer must accept the limitations of what the language provides at the cost of flexibility and power, or you can't do what you really want to do, or you have to use lower-level primitives to accomplish what you want.

Craig Ringer said:
> This is only partly a deficiency of C. Lots of it is down to low level systems being complex, hard and varied. Weak vs strong memory ordering, LP64 vs ILP64, etc etc etc.

Well-said. Adding to that: interprocess management and communication.

> I suspect we'd land up having to move to C++ exceptions

Craig, isn't it the case that C++ exceptions still cause tremendous slow-downs of the entire code-base?

Re: [HACKERS] RustgreSQL

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I'm not meaning to be funny or sarcastic or disrespectful when I say
> that I think C is the best possible language for PostgreSQL.  It works
> great, and we've got a ton of investment in making it work.

Yeah.  There's certainly a whole lot of path dependency in that statement
--- if you were starting to write Postgres from scratch today, you would
very likely choose some other language.  But given where we are, there's
just not a lot of attraction in trying to convert to another language.

As other people noted, the one path that might possibly make sense is
a gradual upgrade to C++.  But getting past the exceptions issue is a
pretty high bar that we'd have to clear before we could do much in
that direction; and it's not obvious that C++ would offer enough benefit
to be worth it.  Most of us would rather spend our time on new features
or performance improvements, not fighting with a language changeover.
        regards, tom lane



Re: [HACKERS] RustgreSQL

From
Kevin Grittner
Date:
On Tue, Jan 10, 2017 at 7:48 AM, Robert Haas <robertmhaas@gmail.com> wrote:

> I'm not meaning to be funny or sarcastic or disrespectful when I say
> that I think C is the best possible language for PostgreSQL.  It works
> great, and we've got a ton of investment in making it work.  I can't
> see why we'd want to start converting even a part of the code to
> something else.  Perhaps it seems like a good idea from 10,000 feet,
> but in practice I believe it would be fraught with difficulties - and
> if it injected even a few additional instructions into hot code paths,
> it would be a performance loser.

It strikes me that exactly the set of functions that Joel is
suggesting could be written in another language is the set where
the declarations in the .h files could be considered for
replacement with static inline functions, potentially giving a
significant performance boost which would not be available if they
were instead converted to another language.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] RustgreSQL

From
Robert Haas
Date:
On Tue, Jan 10, 2017 at 10:55 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I'm not meaning to be funny or sarcastic or disrespectful when I say
>> that I think C is the best possible language for PostgreSQL.  It works
>> great, and we've got a ton of investment in making it work.
>
> Yeah.  There's certainly a whole lot of path dependency in that statement
> --- if you were starting to write Postgres from scratch today, you would
> very likely choose some other language.  But given where we are, there's
> just not a lot of attraction in trying to convert to another language.

Really?  What language would you pick in a vacuum?  The Linux kernel
is written in C, too, for pretty much the same reasons: it's the
canonical language for system software.  I don't deny that there may
be some newer languages out which could theoretically be used and work
well, but do any of them really have a development community and user
base around them that is robust enough that we'd want to be downstream
of it?  C has its annoyances, but its sheer pervasiveness is an
extremely appealing feature.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] RustgreSQL

From
"Joshua D. Drake"
Date:
On 01/10/2017 08:12 AM, Robert Haas wrote:

> Really?  What language would you pick in a vacuum?  The Linux kernel
> is written in C, too, for pretty much the same reasons: it's the
> canonical language for system software.  I don't deny that there may
> be some newer languages out which could theoretically be used and work
> well, but do any of them really have a development community and user
> base around them that is robust enough that we'd want to be downstream
> of it?  C has its annoyances, but its sheer pervasiveness is an
> extremely appealing feature.

If we boil this down, I don't think any of this idea has to do with the 
fact that our database is written in C. I think it has to do with C no
longer being "hip". We don't want to be hip. We are database people. Leave
hip to MongoDB.

We want performance, stability, maturity and portability. (Not 
necessarily in that order).

There is not a single above-hardware language (e.g., let's not rewrite in 
assembly) that provides those four requirements.

Rust is awesome. It is also 5 years old.
Go is awesome. It is also 8 years old.

C is awesome. It is 39 years old.

In human terms, C is the only one of these that has been around long 
enough to realize it isn't a teenager (or child really), and although 
you may still be able to do the things you could in your 20s, you are 
going to pay for them the next day.

JD

-- 
Command Prompt, Inc.                  http://the.postgres.company/                        +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.



Re: [HACKERS] RustgreSQL

From
Robert Haas
Date:
On Tue, Jan 10, 2017 at 12:24 PM, Joshua D. Drake <jd@commandprompt.com> wrote:
> In human terms, C is the only one of these that has been around long enough
> to realize it isn't a teenager (or child really), and although you may still
> be able to do the things you could in your 20s, you are going to pay for
> them the next day.

On aging, tell me about it.

On language selection, if I were working at Google, I'd be fine with
doing my next project in Go, because if Go goes away, then either
Google will pay me or someone else to rewrite all of my code, or
they'll be the ones to suffer the fallout of telling me to write in Go
in the first place.  Either way, no problem.  If I were working at a
startup that will either fail or go public within 5 years, Go would be
fine for that, too.  If Google stops supporting it, by the time they'd
be likely to make any decisions that would adversely affect the
company, I'd be either rich or employed elsewhere.  But if I were
starting a database project that I hoped or expected to last another
20 years, I'm not sure I'd want to be tied to a language with less
than 10 years of history.

We've often talked about the value of having a PostgreSQL community
which is not controlled by any one company.  We'd lose at least
some of that value if our entire code base were written in a language
controlled by one company.  C is boring, but it's not going away.
It's also extremely fast and resource-efficient, and it's got a big,
rich ecosystem of tools and libraries around it.  There are all sorts
of things that are best written in some other language, but for system
software it's hard to beat.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] RustgreSQL

From
Josh Berkus
Date:
On 01/09/2017 05:54 PM, Joel Jacobson wrote:
> On Mon, Jan 9, 2017 at 3:22 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> I do wonder if there are parts of the codebase that would be much better
>> suited to a language other than C, and could reasonably be ported.
>> Especially if that could be done in such a way that the net result is still
>> C code so we're not adding dependencies to non developers (similar to
>> bison).
>>
>> Extensions are a step in that direction, but they're ultimately not core
>> Postgres (which is a different issue).
> 
> I think this is a great idea!
> 
> That way the amount of C code could be reduced over time,
> while safely extending the official version with new functionality on
> the surface,
> without increasing the amount of C code.

Even if you don't ever end up touching core Postgres, being able to
write extensions in languages other than C (like, full-featured
extensions) would be its own benefit.

Why not start there?  That is, assuming that Joel has gobs of time to
work on this?  For that matter, I know that Jeff Davis is quite fond of
Rust.

-- 
Josh Berkus
Red Hat OSAS
(any opinions are my own)



Re: [HACKERS] RustgreSQL

From
Craig Ringer
Date:
On 10 January 2017 at 23:10, otheus uibk <otheus.uibk@gmail.com> wrote:

> Craig, isn't it the case that C++ exceptions still cause tremendous
> slow-downs of the entire code-base?

No, and it hasn't been so for a long time even for gcc.

See e.g. http://stackoverflow.com/questions/13835817/are-exceptions-in-c-really-slow

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] RustgreSQL

From
Amit Langote
Date:
On 2017/01/11 8:02, Josh Berkus wrote:
> On 01/09/2017 05:54 PM, Joel Jacobson wrote:
>> On Mon, Jan 9, 2017 at 3:22 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>>> I do wonder if there are parts of the codebase that would be much better
>>> suited to a language other than C, and could reasonably be ported.
>>> Especially if that could be done in such a way that the net result is still
>>> C code so we're not adding dependencies to non developers (similar to
>>> bison).
>>>
>>> Extensions are a step in that direction, but they're ultimately not core
>>> Postgres (which is a different issue).
>>
>> I think this is a great idea!
>>
>> That way the amount of C code could be reduced over time,
>> while safely extending the official version with new functionality on
>> the surface,
>> without increasing the amount of C code.
> 
> Even if you don't ever end up touching core Postgres, being able to
> write extensions in languages other than C (like, full-featured
> extensions) would be its own benefit.
> 
> Why not start there?  That is, assuming that Joel has gobs of time to
> work on this?  For that matter, I know that Jeff Davis is quite fond of
> Rust.

It seems someone tried (perhaps ran out of steam, unfortunately):

* Postgres extensions written in Rust *
https://github.com/thehydroimpulse/postgres-extension.rs

Thanks,
Amit





Re: [HACKERS] RustgreSQL

From
David Fetter
Date:
On Mon, Jan 09, 2017 at 12:51:43PM -0500, Robert Haas wrote:
> On Sun, Jan 8, 2017 at 4:59 AM, Gavin Flower
> <GavinFlower@archidevsys.co.nz> wrote:
> >> Is this completely unrealistic or is it carved in stone PostgreSQL will
> >> always be a C project forever and ever?
> >>
> > From my very limited understanding, PostgreSQL is more likely to be
> > converted to C++!
> 
> I'm tempted to snarkily reply that we should start by finishing the
> conversion of PostgreSQL from LISP to C before we worry about
> converting it to anything else.  There are various code comments that
> imply that it actually was LISP at one time and I can certainly
> believe that given our incredibly wasteful use of linked lists in so
> many places.  gram.y asserts that this problem was fixed as far as the
> grammar is concerned...
> 
>  *        AUTHOR                        DATE                    MAJOR EVENT
>  *        Andrew Yu                     Sept, 1994              POSTQUEL to SQL conversion
>  *        Andrew Yu                     Oct, 1994               lispy code conversion
> 
> ...but I think it'd be fair to say that even there it was fixed only in part.

David Gould (added to Cc:) mentioned that he had some ideas as to how
to address this.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] RustgreSQL

From
Ewan Higgs
Date:
> It seems someone tried (perhaps ran out of steam, unfortunately):
> * Postgres extensions written in Rust *
> https://github.com/thehydroimpulse/postgres-extension.rs


There is another effort which looks active and promising:


(R)ust (P)ost(g)res FFI

https://github.com/posix4e/rpgffi

This is a bunch of ffi functions largely generated by rust-bindgen.


JsonCDC (json change data capture)

https://github.com/posix4e/jsoncdc

jsoncdc is an interesting tool which takes the WAL and converts it to
JSON for consumption by other processes. Obviously the performance of
consuming JSON will be questionable, but it is to serve as an example
project to help future extensions.


The jsoncdc people are on freenode in #jsoncdc.
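
To give a feel for the change-to-JSON step such a tool performs, here is a minimal Rust sketch. It assumes a simplified, hypothetical `Change` record and an invented JSON shape; it is not jsoncdc's actual output format, and `escape_json` and `change_to_json` are illustrative names only.

```rust
/// Hypothetical, simplified view of one decoded row change.
/// Column values are carried as text for brevity.
pub struct Change<'a> {
    pub table: &'a str,
    pub op: &'a str, // e.g. "insert", "update", "delete"
    pub columns: Vec<(&'a str, &'a str)>,
}

/// Escape a string for use inside a JSON string literal.
fn escape_json(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Render one change as a single-line JSON object.
pub fn change_to_json(ch: &Change<'_>) -> String {
    let cols: Vec<String> = ch
        .columns
        .iter()
        .map(|(k, v)| format!("\"{}\":\"{}\"", escape_json(k), escape_json(v)))
        .collect();
    format!(
        "{{\"table\":\"{}\",\"op\":\"{}\",\"columns\":{{{}}}}}",
        escape_json(ch.table),
        escape_json(ch.op),
        cols.join(",")
    )
}
```

Hand-rolled escaping keeps the sketch dependency-free; a real extension would more likely use an established JSON library.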

Yours,
Ewan