Aberdeen Study on OS RDBMS: exceprts and breakdown. - Mailing list pgsql-advocacy

The Aberdeen group recently released a study on open source databases,
focusing on BerkleyDB, My$QL, and PostgreSQL.  Sleepycat Software (the
makers of BerkleyDB) have released a copy of the report available at:
ttp://www.sleepycat.com/aberdeenwp (registration required).  You can
read the press release for yourselves, though I thought I would point
out one item I thought was interesting; they refer to BerkleyDB as "the
most widely used application-specific data management software in the
world with over 200 million deployments".  I have no reason to believe
this isn't true, but I do notice that it does rub against my$ql's slogan
of being "the worlds most popular open source database".

Anyway, it is worth noting that the report is slanted to show BerkleyDB
in a positive light, but I wanted to included some excerpts of the full
report below that I think are relevant to our group:

--
Motivations For Choosing Open Source Databases (For ISV's)
    *    Control over maintenance and support
    * Availability of source code
    * Cost, such as for licenses and subsequent upgrades (Some cite
development cost and TCO as well, but others find that open source
database development cost and TCO are greater than for a proprietary
alternative.)
    * Flexibility to add new features as needed (Note that users rarely
change open source databases.)
    * Reliability

--
In it's description of my$ql, it describes them as providing the
following features (among others)
"* MySQL is a full relational database with SQL support.
* ACID (atomicity, consistency, isolation, and durability) compliance
(That is, MySQL guarantees that some system failure will not corrupt the
database and make it transactionally inconsistent.)"

These two bullet-points are important for this group because one of the
messages that we often use in comparisons between our systems is that
my$ql is not a full relational database, and that it is not truly ACID
compliant. While I tend to agree with those statements, obviously the
general mindset does not agree with that.

It mentions my$ql as lacking all the usual suspects like views/triggers
etc... but also lists this:
"* Incremental and online/parallel backup/restore"

which I find a little bit interesting because afaik they do offer this
as a proprietary add on solution.

--
The following is their general description of PostgreSQL, my comments
are surrounded by ##

The PostgreSQL Open Source Database

PostgreSQL is a full relational database with SQL support. Many users
view PostgreSQL as more complex and powerful than MySQL, but much less
powerful than an enterprise database. Interestingly, some knowledgeable
MySQL users view MySQL as more scalable than PostgreSQL.

# I think this links back to my$ql's ability to scale up on read only
type database systems, and having some big names associated with this
type of use (yahoo for instance). The other half being our legacy of
being a "slow" database.  #


PostgreSQL offers the following key features beyond the relational
“basics”:

    * Tablespace backups, i.e., the ability to back up a set of tables
rather than the entire database — with limits
    * ODBC support
    * Online backup (with limits)
    * ACID compliance for transactions
    * Deadlock detection
    * Unicode support (non-U.S. character sets)
    * Views (read-only, i.e., with limits)
    * Stored procedures and triggers (with limits on
programming-language support, including no support for Java)

# I thought it was interesting in how many of ours were labeled "with
limits" while the other descriptions did not have this so much...
however I find this one especially interesting, since we have support
for 10+ languages and the ability to add more. I think this stems from
not having all of these languages available "out of the box", and
potentially from the time frame they were studying in. #

    * Object table and data-type support

PostgreSQL does not offer the following key features typically provided
in enterprise databases:

    * Incremental and parallel backup/restore
    * Encryption (security)
    * Deadlock detection

# but up above you said we did have it... ?  I'd grant that it may not
be as robust as some of the commercial guys, but it's there. #

    * Row-level locking (typically required by large packaged
applications) — although PostgreSQL does offer alternative sub-table
locking schemes

# uh... we do have row level locking, and in fact we have better than
row level locking #

    * Bit-mapped indexing (for large data warehouses)
    * A single GUI administrative interface

# again... nothing packaged with the server itself, though we do have
several options in this area #

    * View update/insert/delete

# cough...rules..cough#

PostgreSQL users indicate that improvements in some areas are
contemplated, but the next version of PostgreSQL may come as late as two
years from now, and the community that drives PostgreSQL development is
not committing to any particular improvements.

# uh... two years out... i couldn't come up with an explanation for
that...  well, many of these features are planned for in 7.5, and I
wonder if it is worth coming up with a "response" to the information
above #


--

Key Characteristics of Enterprise Databases

Scalability

 OLTP scalability technologies:
· SMP support
· Clustering support
· Multithreading

Decision-support scalability technologies:
· Query optimization
· Replication
· Cost optimization
· Bit-mapped indexing

“Mixed” scalability technologies:
· Stored procedures
· Distributed database synchronization (2PC)
· Running the application “inside the database”
· Load balancing

Robustness/administrative costs

· Online, parallel load and backup/recovery, monitoring, and metadata
management
· “Zero-administration” tools and automated reorganization
· Cross-database tools

Flexibility

· Standards support
 - SQL- Java/EJBs
 - XML·
 Complex-data-type support
 - Rich-media support
 - File support
- Object and object-relational technology
 - Content management technology

· Integration with infrastructure software
 - Application servers
 - Infrastructure APIs and component libraries

Programmer-productivity support

· High-level development environment support
·Java programming support
 ·Frameworks


Of the above, the following four were listed as critical to
distinguishing between high end and low end DBMS:
   1. Stored procedures (and triggers)
   2. Replication (and two-phase commit)
   3. SMP (symmetric multiprocessing) support
   4. Cost-based query optimization

# Seems we are pretty darn close... but read on#

Many are clearly deficient in at least the first three aforementioned
technologies — they do not offer (or offer limited) stored procedures,
do not offer two-phase commit, and do not offer exceptional
multiprocessing support. However, interviewees’ perception is that in
many cases, today’s open source databases are enterprise class in
scalability and robustness. Moreover, like databases offered by lower
profile proprietary suppliers, such as Pervasive and Progress, open
source databases offer exceptional manageability, requiring little
maintenance compared with major enterprise databases.

--

Key Potential Applications for Open Source Databases

* Financial -- Investment, credit card, bank, insurance
* Telecommunications -- Telephone companies and PTTs, wireless suppliers
* Retail -- Department store chains, restaurants
* Manufacturing -- Car makers, such as General Motors and Ford;
equipment makers, such as Caterpillar; semiconductor suppliers
* Health care -- Hospitals and hospital chains
* Oil/gas/aerospace -- Energy suppliers, such as Exxon Mobil; aerospace
manufacturers, such as Boeing
* Travel/entertainment -- Hotels; luxury cruise lines; television,
movie, and music companies
* Transportation -- Truck, train, and shipping companies
* Government -- NSA, Department of Defense
* Education -- Universities

# It would be interesting have someone gather a comprehensive list of
large businesses (global 5000 maybe?) that use postgresql in each of the
major markets listed above #

--

"Although some open source suppliers have fallen by the wayside as the
Web frenzy subsided and they were forced to identify sources of revenue,
it now appears clear that several open source databases — including
MySQL, PostgreSQL, and Sleepycat Software’s Berkeley DB — are here for
the long haul."

I thought this was important to note since we've recently seen several
articles that refer back to the demise of Great Bridge...

--

In a who is using open source databases section, then mention "Most
applications involved a substantial transaction stream of reads, with
relatively few — but a significant number of — complex queries and
updates." which would be a good indicator of why my$ql and berkelydb are
so much more popular.

--

when asked why the ISV's chose an OS DB "All interviewees immediately
cited “price” (license cost) as the main reason."

--

when asked what they would like to see from their current open source
vendor... "One interviewee cited full-text indexing, one cited triggers
and a “geographic information system (GIS),” one cited rollback, and one
cited a GUI “control center.”"... seems they should all be looking at
postgresql a little closer :-)  Also in response to that question was
this telling line.. "Two-phase commit is not available in PostgreSQL.
Every PostgreSQL user requested this feature." ... makes you wish you
email those folks about the patches that have been posted to -hackers
:-\

--

finally, of the respondents in the study whom were using proprietary
databases for the past several years, none expressed interest in trying
an open source database package.

--

Hope this was informative for everyone here, and this will hopefully
help give some inspiration to new ideas on how we can push forward
postgresql adoption in the future.



Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


pgsql-advocacy by date:

Previous
From: "Marc G. Fournier"
Date:
Subject: Re: Best open source db poll currently
Next
From: David Costa
Date:
Subject: Advocacy initiative PHP5/PostgreSQL