Thread: RE: WAL versus Postgres (or: what goes around, comes around)
> > I've read this paper ~2 years ago. My plans so far were:
> >
> > 1. WAL in 7.1
> > 2. New (overwriting) storage manager in 7.2
> >
> > Comments?
>
> Vadim,
>
> Perhaps the best solution would be to keep both (or three) storage
> managers - and specify which one to use at database creation time.
>
> After reading Stonebraker's paper, I think there are situations
> where we want the no-overwrite storage manager and others where the
> overwrite storage manager may offer better performance.
> Wasn't Postgres originally designed to allow different storage
> managers?

Overwriting and non-overwriting smgr-s have quite different natures.
The access methods would take care of what type of smgr is used for a
specific table/index...

Vadim
>>> "Mikheev, Vadim" said:
> > Perhaps the best solution would be to keep both (or three) storage
> > managers - and specify which one to use at database creation time.
> >
> > After reading Stonebraker's paper, I think there are situations
> > where we want the no-overwrite storage manager and others where the
> > overwrite storage manager may offer better performance.
> > Wasn't Postgres originally designed to allow different storage
> > managers?
>
> Overwriting and non-overwriting smgr-s have quite different natures.
> The access methods would take care of what type of smgr is used for a
> specific table/index...

In light of the discussion about whether we can use Berkeley DB (or
Sleepycat DB?) - perhaps it is indeed a good idea to start working on the
access methods layer, or at least to define a more 'reasonable' SMGR
layer at a higher level than the current Postgres code.

The idea is: once we have this storage manager layer, we could use
different storage managers (or heap managers, in current terms) to manage
different tables/databases.

My idea to use different managers at the database level comes from the
fact that we do not have transactions that span databases, and
transactions are probably the thing that will be difficult to implement
(in a short time) for heaps using different storage managers - such as
one table no-overwrite, another table WAL, a third table Berkeley DB,
etc. From Vadim's response I gather he considers this easier to
implement...

On the license issue - it is unlikely that PostgreSQL will rip out its
storage internals to replace everything with Berkeley DB. That might
have worked three or five years ago, but the current storage manager is
reasonable (especially its crash recovery - I have not seen any other
DBMS that comes close to PostgreSQL in terms of 'cost of crash recovery'
- but that is a different topic).
But if we have the storage manager layer, it may be possible to use
Berkeley DB as an additional access method - for databases/applications
that may benefit from it, performance-wise and where the license permits.

Daniel
Yesterday I sent out a message explaining Sleepycat's standard licensing
policy with respect to binary redistribution. That policy generally
imposes GPL-like restrictions on the embedding application, unless the
distributor purchases a separate license.

We've talked it over at Sleepycat, and we're willing to write a special
agreement for PostgreSQL's use of Berkeley DB. That agreement would
permit redistribution of Berkeley DB with PostgreSQL at no charge, in
binary or source code form, by any party, with or without modifications
to the engine. In short, we can adopt the PostgreSQL license terms for
PostgreSQL's use of Berkeley DB.

The remaining issues are technical ones. Rather than replacing just the
storage manager, you'd be replacing the access methods, buffer manager,
transaction manager, and some of the shared memory plumbing with our
stuff. I wasn't sufficiently clear in my earlier message, and talked
about "no-overwrite" as if it were the only component.

Clearly, that's a lot of work. On the other hand, you'd have the benefit
of an extremely well-tested and widely deployed library to provide those
services. Lots of different groups have used the software, so the
abstractions that the API presents are well thought out and work well in
most cases.

The group is interested in multi-version concurrency control, so that
readers never block on writers. If that's genuinely critical, we'd be
willing to see some work done to add it to Berkeley DB, so that it can do
either conventional 2PL without versioning, or MV. Naturally, we'd
participate in any kind of design discussions you wanted, but we'd like
to see the PostgreSQL implementors build it, since you understand the
feature you want.

Finally, there's the question of whether a tree-based heap store with an
artificial key will be as fast as the heap structure you're using now.
Benchmarking is the only way to know for sure. I don't believe that this
will be a major problem.
The internal nodes of any btree generally wind up in the cache very
quickly, and stay there because they're hot. So you're not doing a lot
of disk I/O to get a record off disk; you're chasing pointers in memory.

We don't lose technical evaluations on performance, as a general thing;
I think that you will be satisfied with the speed.

mike
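A back-of-the-envelope calculation shows why the internal nodes stay hot:
with a large fanout, the entire internal layer of a btree over millions of
records is only a few pages deep and small enough to live in cache. The
page and entry sizes below are illustrative assumptions, not Berkeley DB's
actual on-disk layout.

```python
import math

PAGE_SIZE = 8192      # bytes per btree page (assumed)
ENTRY_SIZE = 32       # bytes per key + child pointer (assumed)
RECORDS = 10_000_000  # records in the table

fanout = PAGE_SIZE // ENTRY_SIZE          # children per internal page
leaf_pages = math.ceil(RECORDS / fanout)  # pages holding actual records

# Count internal pages level by level, walking up to the root.
internal_pages = 0
level = leaf_pages
while level > 1:
    level = math.ceil(level / fanout)
    internal_pages += level

print(f"fanout={fanout}, leaf pages={leaf_pages}, "
      f"internal pages={internal_pages}")
# With these numbers: fanout=256, ~39k leaf pages, but only ~154
# internal pages (~1.2 MB) - easily cached, so a lookup costs at most
# one disk read for the leaf page.
```

So even a cold index lookup touches memory for every level except, at
worst, the leaf, which is the point being made above.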
At 06:57 16/05/00 -0700, Michael A. Olson wrote:
>We've talked it over at Sleepycat, and we're willing to write a
>special agreement for PostgreSQL's use of Berkeley DB. That
>agreement would permit redistribution of Berkeley DB with
>PostgreSQL at no charge, in binary or source code form, by any
>party, with or without modifications to the engine.

Just to clarify - if I take PostgreSQL and make a few minor changes to
create a commercial product called Boastgress, your proposed license
would allow the distribution of binaries for the new product without
further interaction, payments, or licensing from Sleepycat?

Similarly, if changes were made to BDB, I would not have to send those
changes to you, nor would I have to make the source available?

Please don't misunderstand me - it seems to me that you are making a very
generous offer, and I want to be sure that I have understood correctly.

----------------------------------------------------------------
Philip Warner
Albatross Consulting Pty. Ltd. (A.C.N. 008 659 498)
Tel: +61-03-5367 7422 | Fax: +61-03-5367 7430
http://www.rhyme.com.au
PGP key available upon request, and from pgp5.ai.mit.edu:11371
At 12:28 AM 5/17/00 +1000, you wrote:

> Just to clarify - if I take PostgreSQL and make a few minor changes to
> create a commercial product called Boastgress, your proposed license
> would allow the distribution of binaries for the new product without
> further interaction, payments, or licensing from Sleepycat?

Correct.

> Similarly, if changes were made to BDB, I would not have to send those
> changes to you, nor would I have to make the source available?

Also correct. However, the license would only permit redistribution of
the Berkeley DB software embedded in the PostgreSQL engine or the
derivative product that the proprietary vendor distributes. The vendor
would not be permitted to extract Berkeley DB from PostgreSQL and
distribute it separately, as part of some other product offering or as a
standalone embedded database engine.

The intent here is to clear the way for use of Berkeley DB in PostgreSQL,
but not to apply PostgreSQL's license to Berkeley DB for other uses.

mike
> Also correct. However, the license would only permit redistribution of
> the Berkeley DB software embedded in the PostgreSQL engine or the
> derivative product that the proprietary vendor distributes. The
> vendor would not be permitted to extract Berkeley DB from PostgreSQL
> and distribute it separately, as part of some other product offering
> or as a standalone embedded database engine.
>
> The intent here is to clear the way for use of Berkeley DB in
> PostgreSQL, but not to apply PostgreSQL's license to Berkeley DB for
> other uses.

Totally agree, and totally reasonable.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
On Tue, 16 May 2000, Michael A. Olson wrote:

> Rather than replacing just the storage manager, you'd be replacing
> the access methods, buffer manager, transaction manager, and some
> of the shared memory plumbing with our stuff.

So, basically, we rip out 3+ years of work on our backend and put an SQL
front-end on top of BerkeleyDB?
> On Tue, 16 May 2000, Michael A. Olson wrote:
>
> > Rather than replacing just the storage manager, you'd be replacing
> > the access methods, buffer manager, transaction manager, and some
> > of the shared memory plumbing with our stuff.
>
> So, basically, we rip out 3+ years of work on our backend and put an SQL
> front-end on top of BerkeleyDB?

Well, if we look at our main components -
parser/rewrite/optimizer/executor - they stay pretty much the same. It
is the lower-level stuff that would change.

Now, no one is suggesting we do this. The issue is exploring what gains
we could make in doing this. I would hate to throw out our code, but I
would also hate to not make a change because we think our code is better,
without objectively judging ours against someone else's.

In the end, we may find that the needs of a database for storage are
different enough that SDB would not be a win, but I think it is worth
exploring to see if that is true.

--
  Bruce Momjian
At 01:05 PM 5/16/00 -0300, The Hermit Hacker wrote:

> So, basically, we rip out 3+ years of work on our backend and put an SQL
> front-end on top of BerkeleyDB?

I'd put this differently. Given that you're considering rewriting the
low-level storage code anyway, and given that Berkeley DB offers a number
of interesting services, you should consider using it. It may make sense
for you to leverage the 9+ years of work in Berkeley DB to save yourself
a major reimplementation effort now.

We'd like you guys to make the decision on technical merit, so we agreed
to the license terms you require for PostgreSQL.

mike
Then <pgman@candle.pha.pa.us> spoke up and said:

> In the end, we may find that the needs of a database for storage are
> different enough that SDB would not be a win, but I think it is worth
> exploring to see if that is true.

Actually, there are other possibilities, too. As a for-instance, it
might be interesting to see what a reiserfs-based storage manager
looks/performs like.

--
=====================================================================
| JAVA must have been developed in the wilds of West Virginia.      |
| After all, why else would it support only single inheritance??    |
=====================================================================
| Finger geek@cmu.edu for my public key.                            |
=====================================================================
On Tue, 16 May 2000, Bruce Momjian wrote:

> > So, basically, we rip out 3+ years of work on our backend and put an
> > SQL front-end on top of BerkeleyDB?
>
> Now, no one is suggesting we do this. The issue is exploring what gains
> we could make in doing this.

Definitely ... I'm just reducing it down to simpler terms, that's all :)
> On Tue, 16 May 2000, Bruce Momjian wrote:
> > Now, no one is suggesting we do this. The issue is exploring what
> > gains we could make in doing this.
>
> Definitely ... I'm just reducing it down to simpler terms, that's all :)

I am glad you did. I like the fact that we are open to re-evaluating our
code and considering code from outside sources. Many open-source efforts
have problems with code-not-made-here.

--
  Bruce Momjian
Bruce Momjian wrote:
> I am glad you did. I like the fact that we are open to re-evaluating
> our code and considering code from outside sources. Many open-source
> efforts have problems with code-not-made-here.

I have been planning to add a full-text index (a.k.a. inverted index) for
postgres text and array types for some time already. This is the only
major index type not yet supported by postgres, and it is currently
implemented as an external index in our products.

My good excuse (to myself) for postponing that work has been the limited
size of the text datatype (actually of the tuples), but AFAIK that is
going away in 7.1 ;)

I have done a full-text index for a major national newspaper that has
worked ok for several years, using python and old (v1.86, pre-Sleepycat)
BSD DB code - I stayed away from the 2.x versions due to the SC license
terms.

I'm happy to hear that BSD DB and PostgreSQL storage schemes are designed
to be compatible. But I still suspect that taking some existing postgres
index (most likely btree) as a base would be less effort than integrating
the locking/transactions of PG/BDB.

------------
Hannu
At 09:46 AM 5/17/00 +1000, you wrote:

> What if I am an evil software empire who takes postgresql, renames it,
> cuts out most of the non-sleepycat code and starts competing with
> sleepycat?

We make zero dollars in the SQL market today, so I'm not risking anything
by promoting this project. Our embedded engine is a very different
product from your object/relational client/server engine. I'm not
worried about competition from you guys or from derivatives. It's just
not my market.

I'd really like to see PostgreSQL go head-to-head with the established
proprietary vendors of relational systems. I believe Berkeley DB will
help. If you use it and if you're successful, then I get to brag to all
my customers in the embedded market about the number of installed sites I
have worldwide.

> Or what if I am just some company who wants to make proprietary use of
> sleepycat but doesn't want to pay the fee?

We've done these licenses pretty often for other groups. Gnome is an
example that was mentioned recently. We do two things:

+ Define the embedding application with reasonable precision; and

+ Explicitly state that the embedding application may not surface our
  interfaces directly to third parties.

The first bullet keeps bad guys from using the trick you identify. The
second keeps bad guys from helping their friends use the trick.

If we decide that the integration is a good idea, we'll draft a letter
and you can get it reviewed by your attorney. I don't know whether
PostgreSQL.org has a lawyer, but if not, Great Bridge will probably loan
you one. We'll work with you to be sure that the language is right.

I don't want to minimize concern on this point. Certainly the agreement
granting PostgreSQL these rights will require some care. But that's what
I do for a living, and I can assure you that you'll get a letter that's
fair and that lives up to the promises we've made to the list.

I don't want to get hung up on the legal issues. Those are tractable.
It's more important to me to know whether there's a reasonable technical
fit.

mike
At 10:41 AM 5/17/00 +1000, Chris Bitmead wrote:

> Can you explain in technical terms what "surface our interfaces" means?

Basically, it would not be permitted to write trivial wrappers around
Berkeley DB functions like db_open, simply to permit applications other
than PostgreSQL to call them to get around Sleepycat's license terms for
Berkeley DB.

I realize that we can argue at length about what constitutes a "trivial
wrapper," and how much gray area there is around that. We'd write the
agreement so that there was plenty of room for you to improve PostgreSQL
without violating the terms. You'll be able to review the agreement and
to get legal advice on it.

Let's hold the legal discussion until we decide whether we need to have
it at all. If there's just no technical fit, we can save the trouble and
expense of drafting a letter agreement and haggling over terms. If the
technical fit is good, then the next hurdle will be the agreement, and we
can focus on that with our full attention.

mike
At 11:13 AM 5/17/00 +1000, Chris Bitmead wrote:

> Why do you even need wrappers? You just link with libpgbackend.so and
> call whatever functions you want. Or would the agreement say something
> like you can only call postgresql functions that are not part of
> sleepycat?

Yes, the agreement would state that only the PostgreSQL application could
call the Berkeley DB interfaces directly.

mike
> Yes, the agreement would state that only the PostgreSQL application
> could call the Berkeley DB interfaces directly.

Well then, let's start looking into the options. I know Vadim has a new
storage manager planned for 7.2, so he is going to look at the sleepycat
code and give us an opinion.

--
  Bruce Momjian
Bruce Momjian wrote:
> Well then, let's start looking into the options. I know Vadim has a
> new storage manager planned for 7.2, so he is going to look at the
> sleepycat code and give us an opinion.

Just curious, but is he *allowed* to view the code? I realize it's
pseudo-open source, but to my understanding one cannot even look at GPL
code without fear of "infecting" whatever project you may be working on.
If Vadim looks at the code, decides "Nah, I can do this in two weeks
time", and then the overwrite system appears in 7.2, aren't there issues
there?

Mike Mascari
> Just curious, but is he *allowed* to view the code? I realize it's
> pseudo-open source, but to my understanding one cannot even look at
> GPL code without fear of "infecting" whatever project you may be
> working on. If Vadim looks at the code, decides "Nah, I can do this
> in two weeks time", and then the overwrite system appears in 7.2,
> aren't there issues there?

I have never heard that about GNU code, and I assume it is not true.

--
  Bruce Momjian
At 23:11 16/05/00 -0400, Mike Mascari wrote:

>But is he *allowed* to view the code? I realize it's pseudo-open
>source, but to my understanding one cannot even look at GPL code
>without fear of "infecting" whatever project you may be working
>on. If Vadim looks at the code, decides "Nah, I can do this in two
>weeks time", and then the overwrite system appears in 7.2, aren't
>there issues there?

There used to be, but I'm not sure if the laws have changed. I *think*
the problem was differentiating reverse engineering from copying - in the
most extreme cases I have heard of companies paying someone to document
the thing to be reverse engineered, then hunting for someone who has
never seen the thing in question to actually do the work from the
documentation.

In this case, Vadim is not planning to reverse engineer SDB, but it
*might* be worth getting an opinion or, better, a waiver from
Sleepycat... or getting someone else to look into it, depending on how
busy Vadim is.

P.S. Isn't it amazing (depressing) how a very generous offer to help from
people who want the same outcomes as we do seems to turn rapidly into
discussions of litigation. This cannot be a healthy legal system.

----------------------------------------------------------------
Philip Warner
* Michael A. Olson <mao@sleepycat.com> [000516 18:42] wrote:

> Let's hold the legal discussion until we decide whether we need to have
> it at all. If there's just no technical fit, we can save the trouble
> and expense of drafting a letter agreement and haggling over terms. If
> the technical fit is good, then the next hurdle will be the agreement,
> and we can focus on that with our full attention.

Not that I really have any say about it, but...

I'm sorry, this proposal will probably lead to hurt on both sides: one
for Sleepycat, possibly losing intellectual rights by signing into a
BSDL'd program, and another for Postgresql, which might feel the
aftermath.

Making conditions on what constitutes a legal derivative of the
Postgresql engine versus a simple wrapper is so arbitrary that it really
scares me.

Now if Sleepycat could/would release under a full BSDL license, or if
Postgresql didn't have a problem with going closed source, there wouldn't
be a problem, but I don't see either happening.

Finally, all this talk about changing licenses (the whole GPL mess) and
incorporating _encumbered_ code (Sleepycat DB) is concerning, to say the
least.
Are you guys serious about compromising the codebase and tainting it in
such a way that it becomes impossible for the people that are actively
working on it to eventually profit from it? (With the exception of an
over-hyped IPO, but even that well has dried up.)

-Alfred
"Michael A. Olson" <mao@sleepycat.com> writes:

> Let's hold the legal discussion until we decide whether we need to have
> it at all.

Seems to me that Mike has stated that Sleepycat is willing to do what
they have to do to meet our requirement that Postgres remain completely
free. Let's take that as read for the moment and pay more attention to
the technical issues. As he says, if there are technical showstoppers
then there's no point in haggling over license wording. We can come back
to the legal details when and if we decide that the idea looks like a
good one technically.

I'm going to be extremely presumptive here and try to push the technical
discussion into several specific channels. It looks to me like we've got
four major areas of concern technically:

1. MVCC semantics. If we can't keep MVCC then the deal's dead in the
water, no question. Vadim's by far the best man to look at this issue;
Vadim, do you have time to think about it soon?

2. Where to draw the API line. Berkeley DB's API doesn't seem to fit
very neatly into the existing modularization of Postgres. How should we
approach that, and how much work would it cost us? Are there parts of
BDB that we just don't want to use at all?

3. Additional access methods. Mike thinks we could live with using
BDB's Recno access method for primary heap storage. I'm dubious (OK,
call me stubborn...). We also have index methods that BDB hasn't got.
I'm not sure that our GIST code is being used or is even functional, but
the rtree code has certainly got users. So, how hard might it be to add
raw-heap and rtree access methods to BDB?

4. What are we buying for the above work? With all due respect to Mike,
he's said that he hasn't looked at the Postgres code since it was in
Berkeley's hands. We've made some considerable strides since then, and
maybe moving to BDB wouldn't be as big a win as he thinks.
On the other hand, Vadim is about to invest a great deal of work in
WAL + new smgr; maybe we'd get more bang for the buck by putting the same
amount of effort into interfacing to BDB. We've got to look hard at
this.

Anyone see any major points that I've missed here?

How can we move forward on looking at these questions? Seems like we
ought to try for the "quick kill": if anyone can identify any clear
showstoppers in any of these areas, nailing down any one will end the
discussion. As long as we don't find a showstopper it seems that we
ought to keep talking about it.

regards, tom lane
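For readers unfamiliar with point 1, the MVCC behavior at stake can be
sketched in a few lines: each row version records which transaction
created it (xmin) and, once deleted or updated, which transaction removed
it (xmax), and a reader's snapshot decides visibility. This is a toy
model of the idea only, not PostgreSQL's actual tuple-header or
visibility-check code.

```python
class Version:
    """One version of a row, tagged with creating/deleting xids."""
    def __init__(self, value, xmin, xmax=None):
        self.value = value
        self.xmin = xmin    # xid of the transaction that created it
        self.xmax = xmax    # xid that deleted it, or None if live

def visible(v, committed_at_snapshot):
    """A version is visible to a snapshot if its creator committed
    before the snapshot and no deleter committed before it."""
    if v.xmin not in committed_at_snapshot:
        return False
    if v.xmax is not None and v.xmax in committed_at_snapshot:
        return False
    return True

# xid 10 inserted a row; xid 20 later updated it (marking the old
# version deleted and inserting a new one), but xid 20 had not yet
# committed when our reader took its snapshot.
versions = [Version("old", xmin=10, xmax=20), Version("new", xmin=20)]
snapshot = {10}  # set of xids committed at snapshot time

seen = [v.value for v in versions if visible(v, snapshot)]
print(seen)  # the reader still sees ["old"] - no lock, no blocking
```

This is why readers never block writers: the in-progress update simply
isn't visible to the older snapshot, so no read lock is ever needed.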
Tom Lane wrote:

> 3. Additional access methods. Mike thinks we could live with using
> BDB's Recno access method for primary heap storage. I'm dubious
> (OK, call me stubborn...). We also have index methods that BDB hasn't
> got. I'm not sure that our GIST code is being used or is even
> functional,

There was some discussion about GIST in 6.x, and at least some people
seemed to use it for their specific needs.

> but the rtree code has certainly got users. So, how hard
> might it be to add raw-heap and rtree access methods to BDB?

What if we go ahead and add R-tree and GiST and whatnot to BDB - won't
they become the property of Sleepycat, licensable for business use by
them only? Ditto for other things we may need to push inside BDB to keep
a good structure.

> Anyone see any major points that I've missed here?

I still feel kind of eerie about making some parts of the code
proprietary/GPL/BDB-PL and using PostgreSQL only for the SQL layer and
not storage.

We should probably also look at the pre-Sleepycat (v 1.x.x) BDB code that
had the original Berkeley license and see what we could make of that. It
does not have transaction support, but we already have MVCC. It does not
have page size restrictions either ;) And using even that still gives
some bragging rights to Sleepycat ;)

---------
Hannu
on 5/16/00 9:57 AM, Michael A. Olson at mao@sleepycat.com wrote:

> The group is interested in multi-version concurrency control, so that
> readers never block on writers. If that's genuinely critical, we'd
> be willing to see some work done to add it to Berkeley DB, so that it
> can do either conventional 2PL without versioning, or MV. Naturally,
> we'd participate in any kind of design discussions you wanted, but
> we'd like to see the PostgreSQL implementors build it, since you
> understand the feature you want.

I don't think this point has been made strongly enough yet: readers never
blocking writers AND writers never blocking readers are *critical* to any
serious web application. It is one of the main reasons (besides
marketing) why Oracle crushed Informix and Sybase in the web era. Oracle
solves this problem using its rollback segments to pull out old data
(sometimes resulting in the nasty "snapshot too old" error if a
transaction is trying to pull out data so old that it has been removed
from the rollback segment!), and Postgres uses MVCC (which is much
cleaner, IMHO).

As a user of databases for web applications only, I (and many others
looking to Postgres for a serious open-source Oracle replacement) would
be forced to completely drop Postgres if this (backwards) step were
taken.

If you love gory details about this web & locking stuff, check out
http://photo.net/wtr/aolserver/introduction-2.html (search for Postgres
and read on from there).

-Ben
I know people are still reviewing the SDB implementation for PostgreSQL,
but I was thinking about it today. This is the first time I realized how
efficient our current system is.

We have shared buffers that are mapped into the address space of each
backend. When a table is sequentially scanned, buffers are loaded into
that area and the backend accesses that 8k straight out of memory. If I
remember the optimizations I added, much of that access uses inlined
functions (macros), meaning the buffers are scanned at amazing speeds. I
know inlining a few of those functions gained a 10% speedup. I wonder
how SDB performs such file scans.

Of course, the real trick is getting those buffers loaded faster. For
sequential scans, the kernel prefetch does a good job, but index scans
that hit many tuples have problems, I am sure. ISAM helps in this
regard, but I don't see that SDB has it. There is also the Linux problem
of preventing read-ahead after a seek(), while the BSD/HP kernels prevent
prefetch only when prefetched blocks remain unused.

And there is the problem of cache wiping, where a large sequential scan
removes all other cached blocks from the buffer. I don't know a way to
prevent that one, though we could have large sequential scans reuse their
own buffers, rather than grabbing the oldest buffer.

--
  Bruce Momjian
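The "reuse their own buffers" idea can be sketched with a toy buffer
pool: a big sequential scan routed through a small private ring of
buffers leaves the shared pool's hot pages (index roots, frequently hit
heap pages) untouched, while routing it through the shared pool wipes
them. This is a simplified model for illustration - the real PostgreSQL
buffer manager lives in shared memory and does not work this way
literally.

```python
from collections import OrderedDict

class BufferPool:
    """Tiny LRU buffer pool: page id -> page data."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def read(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)          # cache hit: mark recent
        else:
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)    # evict oldest buffer
            self.pages[page] = f"data-{page}"     # "read from disk"
        return self.pages[page]

HOT = ["idx-root", "idx-leaf", "hot-1", "hot-2"]

# Naive: the 1000-page sequential scan goes through the shared pool
# and evicts every hot page ("cache wiping").
shared_naive = BufferPool(capacity=8)
for p in HOT:
    shared_naive.read(p)
for p in range(1000):
    shared_naive.read(f"heap-{p}")
print("hot pages survive naive scan:", "idx-root" in shared_naive.pages)

# Ring strategy: the scan reuses its own two buffers; the shared pool
# never sees the scan's pages and stays warm.
shared = BufferPool(capacity=8)
for p in HOT:
    shared.read(p)
ring = BufferPool(capacity=2)
for p in range(1000):
    ring.read(f"heap-{p}")
print("hot pages survive ring scan:", "idx-root" in shared.pages)
```

The trade-off is that the scan itself gets no caching benefit across
repeated scans, but a scan larger than the pool was never going to be
cached anyway.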
On Sat, 20 May 2000, Bruce Momjian wrote:

> And there is the problem of cache wiping, where a large sequential scan
> removes all other cached blocks from the buffer. I don't know a way to
> prevent that one, though we could have large sequential scans reuse
> their own buffers, rather than grabbing the oldest buffer.

On some systems, you can specify (or hint) to the kernel that the file
you are reading should not be buffered.

The only (completely) real solution for this is to use raw devices,
uncached by the kernel, without any filesystem overhead... Are there any
plans to support that?
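The kernel hint mentioned above exists on modern POSIX systems as
posix_fadvise (POSIX.1-2001, so it postdates this thread); whether the
kernel actually honors each advice value varies by OS. A minimal sketch,
shown here via Python's binding of the same call:

```python
import os
import tempfile

# Create a scratch file to "scan".
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 65536)
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    if hasattr(os, "posix_fadvise"):  # not available on all platforms
        # Hint: we will read this file sequentially (kernel may
        # increase readahead).
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
    data = os.read(fd, 65536)
    if hasattr(os, "posix_fadvise"):
        # Hint: we won't need these pages again, so the kernel may
        # drop them from its cache instead of evicting other data.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
finally:
    os.close(fd)
    os.unlink(path)

print(len(data))
```

In C the equivalent is posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) from
<fcntl.h>. This addresses kernel-cache wiping only; it does nothing for
the database's own shared buffers.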
> On some systems, you can specify (or hint) to the kernel that the file
> you are reading should not be buffered.

Well, I was actually thinking of the cache wiping that happens to our own
PostgreSQL shared buffers, which we certainly do control.

> The only (completely) real solution for this is to use raw devices,
> uncached by the kernel, without any filesystem overhead...

We are not sure if we want to go in that direction. Commercial vendors
have implemented it, but the gain seems to be minimal, especially with
modern file systems. Trying to duplicate all the disk buffer management
in our code seems to be of marginal benefit. We have bigger fish to fry,
as the saying goes.

--
  Bruce Momjian
> The only (completely) real solution for this is to use raw devices,
> uncached by the kernel, without any filesystem overhead...
> Are there any plans to support that?

No specific plans. afaik no "anti-plans" either, but the reason we don't do
this yet is that it isn't clear to all of us that it would be a real
performance win. If someone wanted to do it as a project that would result
in a benchmark, that would help move things along...

                      - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
>> The only (completely) real solution for this is to use raw devices,
>> uncached by the kernel, without any filesystem overhead...
>> Are there any plans to support that?

> No specific plans. afaik no "anti-plans" either, but the reason that
> we don't do this yet is that it isn't clear to all of us that this
> would be a real performance win.

... whereas it *is* clear that it would be a portability loss ...

> If someone wanted to do it as a project that would result in a
> benchmark, that would help move things along...

I think we'd want to see some indisputable evidence that there'd be a
substantial gain in the Postgres context. We could be talked into living
with the portability issues if the prize is worthy enough; but that is
unproven as far as I've seen.

At the moment, we have a long list of known performance gains that we can
get without any portability compromise (for example, the lack of pg_index
caching that we were getting our noses rubbed in just this morning). So I
think none of the key developers feel particularly excited about raw I/O.
There's lots of lower-hanging fruit.

Still, if you want to pursue it, be our guest. The great thing about
open-source software is there's room for everyone to play.

                        regards, tom lane
Hi,

Alex Pilosov:
> The only (completely) real solution for this is to use raw devices,
> uncached by the kernel, without any filesystem overhead...

...and with no OS caching _at_all_.

> Are there any plans to support that?

IMHO it's interesting to note that even Oracle, which used to be one of the
"you gotta use a raw partition if you want any speed at all" guys, has moved
into the "use a normal partition or a regular file unless you do things like
sharing a RAID between two hosts" camp. Or so I was told a year or so ago.

-- 
Matthias Urlichs  |  noris network GmbH  |  smurf@noris.de  |  ICQ: 20193661
The quote was selected randomly. Really.  |  http://smurf.noris.de/
-- 
Lawsuit (noun) -- A machine which you go into as a pig and come out as a
sausage.
                                        --Ambrose Bierce
> IMHO it's interesting to note that even Oracle, which used to be one of
> the "you gotta use a raw partition if you want any speed at all" guys,
> has moved into the "use a normal partition or a regular file unless you
> do things like sharing a RAID between two hosts" camp.

Yes, we noticed that. We are glad we didn't waste time going in that
direction.

-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
At 12:14 AM 5/20/00 -0400, Bruce Momjian wrote:

> We have shared buffers that are mapped into the address space of each
> backend.

[ ... ]

> I wonder how SDB performs such file scans.

Berkeley DB is an embedded toolkit, and works hard to provide mechanism in
place of policy. That is to say, you can request memory-mapped access to
database files, or you can use a shared memory buffer cache. We give you
the tools to do what you want; we try not to force you to do what we want.

We don't do query processing; we do fast storage and retrieval. Query
planning and optimization, including access path selection and buffer
management policy selection, get built on top of Berkeley DB. We offer a
variety of ways, via the API, to control the behavior of the lock, log,
shmem, and transaction subsystems.

                                        mike