Thread: Re: Comparison of PGSQL and DB2

Re: Comparison of PGSQL and DB2

From

Josh Berkus

Date:

09 March 2004, 18:28:01

Shashank:

What follows is my reply to IBM's 2-year-old feature comparison of PostgreSQL
and DB2.   Each of the entries is an item that DB2 has and claims that
PostgreSQL does not.    I've put my replies to these claims.

> On Demand Log Archiving
> (Allows a user to close and archive the active log of a recoverable atabase
whenever necessary, and use those log files to update a standby database.)
> Incremental Backup
> (Allows a user to perform interim backups that capture database changes
since the last successful full (or incremental) backup. Minimizes backup
window for large databases.)
> Incremental Online Backups
> (Ability to perform 'delta' backups while database is in use.)

IBM is correct.  We do not currently have the above features.   However, they
are all expected in or very soon after the release of 7.5.

> Dual Logging
> (Mirrors the active log files, which help to protect databases from
accidental deletion of an active log and data corruption caused by hardware
failure.)

This "feature" seems redundant with the previous 3 features; I'm not sure what
IBM is getting at.

> High Availability Support from Vendors.
> (Third party.)

We have this, in quite a variety of forms.    PGSQL Inc., Command Prompt, SRA
Inc., Linux Labs, Veritas, and a couple of companies in Europe whose names
escape me at the moment, all have some kind of HA PostgreSQL support.

> Dynamic Storage Allocation
> (Ability to add storage capacity to the database while the database is in
use.)

This is a "misfeature".   PostgreSQL does not have "Dynamic Storage
Allocation" becuase we don't have the filesytem access issues that DB2 does
-- PostgreSQL can expand to fill the host filesystem, no allocation required.
Or, to look at it another way, we've had this since Postgres 1.0.   This is
definitely an area in which we are superior to DB2.

> Automatic Integrity Checking
> (Automatic checking of the structural integrity of data pages when the data
is accessed by an application and during
> backup.)

Another "feature" we don't have becuase it's not needed.   We're more
interested in preventing corruption in the first place than finding out if
it's happened.

> Encryption Functions
> (Encryption and decryption functions with a key strength of 128 bits.)

We've had these for at least 3 versions.

> Group Authentication
> (Authentication by user groups & roles, rather than by username for more
manageability.)

Once again, groups are something we've had for several versions.  We don't
support "roles" becuase they are redundant to groups.

> Raw Device Support
> (Ability to leverage raw-disk access, bypassing the overhead of the
underlying operating system.
> (Note: May be supported soon))

This, in our opinion, is another "misfeature".    Raw Filesystem access is
either a crutch for DBMSes which don't know how to use their host OS
properly, or a legacy of older server OSes which did not support good
filesystems.   It is not useful for modern databases running on modern OSes.

> Automatic Summary Tables
> (Transparently provide the pre-aggregation/summarization of data to
dramatically improve the performance of transactions or queries that require
such data.)

Under development, possibily available with 7.5.   These are otherwise know as
"materialized views".

> User Defined Task Prioritization
> (Ability to define rules on how the database should prioritize workloads
either by query cost or user profiles. Ability to monitor resource usage and
adjust the priority of queries or cancel runaway queries that exceed
predefined limits.)

This would be nice.   We're not sure how to implement it; don't expect it soon
for PostgreSQL.

> Isolation Levels
> (Support all four ANSI isolation levels (UR, CS, RS, RR).)

I'm not sure about this one; I suspect that we do, however, since MVCC,
invented for the Postgres Project, has become a standard for transaction
isolation in the database industry.

> Clustered Indexes
> (Allows data records to be clustered on pages based on the sequence of a
particular index. Clustering increases the efficiency of data retrieval when
it involves accessing values by range.)

We've had these for at least 3 versions.

> Advanced Memory Management
> (Ability to pre-allocate memory for data required by applications before it
is needed (pre-fetching). Ability for considerable memory availability in
choosing access plan to resolve queries.)

Another "feature" caused by DB2's ignorance of the host OS.   We *are* working
on making Postgres' use of RAM and I/O more efficient.   However, I suspect
that it is already better than DB2's use.

> Dynamic Bit Map Indexing
> (Warehouse queries typically involve many AND predicates. Dynamic bit-map
technology efficiently combines multiple indexes for improved performance.)

Aha!  Yes, this would be nice.   We don't have it, under development or
otherwise.  We'd like it.

> Bi-Directional Indexes
> (Ability to perform both forward and reverse scans on a single index.
Improves the performance of reverse scans by eliminating the need to use
temporary tables.)

<laugh>  This is a *feature*?   I call it a bug fix.   We've had this since
7.1. -- BEFORE DB2.

For that matter, we support MANY more *types* of indexes thann DB2 does, from
my brief check.

> Data Partitioning
> (Ability to divide database into partitions, each of which can be stored and
managed on any combination of multiple logical partitions (for large SMP
machines) or physical partitions on separate machines in a "shared-nothing"
massively parallel
> hardware platform.)

This is under development and will (probably) be released with 7.5, for
seperate tables.   Currently, an ad-hoc solution is possible for any talented
DBA/sysadmin, but is somewhat hard to administer.   Partitioning individual
tables has been discussed but there is not yet any development code.

> SMP Support for complex queries
> (Ability to divide up work of a single complex/large query among processors
in a multi-processor environment to resolve subqueries in parallel.)

Nope, don't have this; another thing that would be nice to have.

> Parallel Backup / Restore
> (Ability to perform backup/restore of large databases faster by leveraging
all the processors in a multi-processor machine.)

This is directly related to the above feature.

> Mass Data Load
> (Ability to perform data load into the database faster by leveraging all
processors in a multi-processor machine.)

Same.

> Federated Database Support
> (Ability to allow applications to access & perform JOIN operations on
multiple disparate databases.)

This is a feature which we do not have because it violates the ANSI SQL
Specification.

> Built-in OLE DB Support
> (Allows for access to data from another database that supports the Microsoft
OLE DB standard.)

Don't have it, nor are likely to.   Our interface libraries are strictly
seperate from the core program, and should and will remain so.   Currently we
support ODBC and .Net access for MS tools.   OLE-DB is under development.

> DRDA Support
> (Support for DRDA (industry standard communications protocol for relational
databases).)

Not sure.  This is the first time I've heard of this standard.  We may support
it through a 3rd-party project or library.

> MQSeries Integration
> (Features to reduce effort required to integrate database with MQSeries
Infrastructure.)

Never heard of MQSeries before, either.   See above.

> J2EE Certified
> (Independently certified to be J2EE compliant.)

Um, why would a database be "J2EE certified"?    We *are* talking about an
RDBMS here, not a JVM.

> SQLJ
> (Allows developing Java applications with embedded static SQL.)

We don't have SQLJ because, to be frank, it's a worthless hybrid.   We
currently have two different implementations of procedural Java which can be
used inside the database, either of which is superior in design to SQLJ,
although I'm not sure of their production status.

> FIPS/SQL92E Support
> (Complies with FIPS/SQL92E standard.)

Not certain about this one.   We may support it, we may not.

> Database XML Tools
> (XML tools to easily store, retrieve, and search the content of XML
documents. Ability to map XML definitions to relational tables to dynamically
compose and decompose XML documents on demand.)

Our support for XML is through 3rd-party projects and plug-ins.  As I do not
do a lot of XML-to-DB, I cannot relate the exact status of these projects.

> Java UDFs
> (Support for cross platform User Defined Functions.)
> Java Stored Procedures
> (Support for cross platform Stored Procedures.)

We have PL/java and PL/J to do both of the above things.   My personal
knowledge is insufficient to compare our implementation with DB2s.

For that matter, we support 10 Procedural Languages *in addtion to* Java.
These include Perl, Python, Ruby, R, Tcl, and PHP.   DB2 supports, AFAIK,
only Java and a SQL-scripting dialect.

> WW 24x7 Support

We have a few companies that offer 24x7 support in specific areas.   Certainly
IBM has a leg up on us for offering worldwide support -- especially since
they support PostgreSQL in some locations!

--
Josh Berkus
PostgreSQL Advocacy Volunteer
http://advocacy.postgresql.org/
San Francisco

Re: Comparison of PGSQL and DB2

From

Andrew Sullivan

Date:

10 March 2004, 16:37:48

On Tue, Mar 09, 2004 at 10:22:24AM -0800, Josh Berkus wrote:
> > User Defined Task Prioritization
> > (Ability to define rules on how the database should prioritize workloads
> either by query cost or user profiles. Ability to monitor resource usage and
> adjust the priority of queries or cancel runaway queries that exceed
> predefined limits.)
>
> This would be nice.   We're not sure how to implement it; don't
> expect it soon for PostgreSQL.

Note that we do have a part of this, if "runaway queries" == "queries
which run too long".  I suspect that what would really be needed,
however, is something running on the OS to detect timeslice or
memory hogs.

> > Parallel Backup / Restore (Ability to perform backup/restore of
> > large databases faster by leveraging
> all the processors in a multi-processor machine.)
>
> This is directly related to the above feature.

We can simulate it though, right?  Use the binary file format for
pg_dump, and hand out pieces of recovery to different restorers
depending on your number of processors, one table at a time per
restorer.  (Afilias -- Chris Browne, really -- wrote something along
these lines for the import of the .org database.  I talked about it
a little at OSCON last year.)  This is very far from the
user-friendly tool that IBM offers, of course.

A

--
Andrew Sullivan  | ajs@crankycanuck.ca

Re: Comparison of PGSQL and DB2

From

Robert Treat

Date:

11 March 2004, 14:40:04

On Tue, 2004-03-09 at 13:22, Josh Berkus wrote:
> Shashank:
>
> What follows is my reply to IBM's 2-year-old feature comparison of PostgreSQL
> and DB2.   Each of the entries is an item that DB2 has and claims that
> PostgreSQL does not.    I've put my replies to these claims.
>
<snip>
>
> > Isolation Levels
> > (Support all four ANSI isolation levels (UR, CS, RS, RR).)
>
> I'm not sure about this one; I suspect that we do, however, since MVCC,
> invented for the Postgres Project, has become a standard for transaction
> isolation in the database industry.
>

Not exactly sure which acronyms above correspond to which levels (as I
know them by different names apparently) but we support Read Committed
and Serializable levels, which are by far the most common of the two
AFAIK. The others as I know them are Read Uncommitted aka phantom reads,
and Repeatable Read, which we do not support. Of course we also supply a
extensive locking methods for further control.

More info on our isolation levels can be found at
http://www.postgresql.org/docs/7.4/static/transaction-iso.html

<snip>
> > Federated Database Support
> > (Ability to allow applications to access & perform JOIN operations on
> multiple disparate databases.)
>
> This is a feature which we do not have because it violates the ANSI SQL
> Specification.
>

Perhaps worth a mention that using dblink, you can actually make
functionality similar to this.  I have recently been experimenting with
using views calling dblink functions to create "local tables" that
really live in a separate database on a completely separate machine. I
don't know if I would recommend the technique but I think the
capabilities are there.

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

Re: Comparison of PGSQL and DB2

From

Peter Eisentraut

Date:

11 March 2004, 14:59:50

Josh Berkus wrote:
> > Isolation Levels
> > (Support all four ANSI isolation levels (UR, CS, RS, RR).)
>
> I'm not sure about this one; I suspect that we do, however, since
> MVCC, invented for the Postgres Project, has become a standard for
> transaction isolation in the database industry.

Surely MVCC wasn't "invented for the Postgres Project"; otherwise Vadim
wouldn't have done so much copying from Oracle. ;-)

7.5 will support all four isolation levels.

> > Federated Database Support
> > (Ability to allow applications to access & perform JOIN operations
> > on
>
> multiple disparate databases.)
>
> This is a feature which we do not have because it violates the ANSI
> SQL Specification.

That would be news to me.

Re: Comparison of PGSQL and DB2

From

Robert Treat

Date:

11 March 2004, 15:21:17

On Thu, 2004-03-11 at 13:59, Peter Eisentraut wrote:
> Josh Berkus wrote:
> > > Isolation Levels
> > > (Support all four ANSI isolation levels (UR, CS, RS, RR).)
> >
> > I'm not sure about this one; I suspect that we do, however, since
> > MVCC, invented for the Postgres Project, has become a standard for
> > transaction isolation in the database industry.
>
> 7.5 will support all four isolation levels.
>

I wasn't aware of anyone doing work on this... do you have any more
info?

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

Re: Comparison of PGSQL and DB2

From

Bruno Wolff III

Date:

11 March 2004, 15:44:29

On Thu, Mar 11, 2004 at 14:21:09 -0500,
  Robert Treat <xzilla@users.sourceforge.net> wrote:
> On Thu, 2004-03-11 at 13:59, Peter Eisentraut wrote:
> > Josh Berkus wrote:
> > > > Isolation Levels
> > > > (Support all four ANSI isolation levels (UR, CS, RS, RR).)
> > >
> > > I'm not sure about this one; I suspect that we do, however, since
> > > MVCC, invented for the Postgres Project, has become a standard for
> > > transaction isolation in the database industry.
> >
> > 7.5 will support all four isolation levels.
> >
>
> I wasn't aware of anyone doing work on this... do you have any more
> info?

My memory of past discussion on this is that you would be able to set all
4 transaction isolation levels in a command, but you would really get
the next level up for the two that we don't provide. This is OK since
since they get at least the transaction isolation safety they need.
The reason for doing this is portability.

Re: Comparison of PGSQL and DB2

From

Peter Eisentraut

Date:

11 March 2004, 16:23:25

Bruno Wolff III wrote:
> My memory of past discussion on this is that you would be able to set
> all 4 transaction isolation levels in a command, but you would really
> get the next level up for the two that we don't provide. This is OK
> since since they get at least the transaction isolation safety they
> need. The reason for doing this is portability.

That's exactly right, but it's more marketable with all four levels
"available". ;-)

Re: Comparison of PGSQL and DB2

From

Jonathan Gardner

Date:

11 March 2004, 16:34:39

On Tuesday 09 March 2004 10:22 am, Josh Berkus wrote:
> > Automatic Summary Tables
> > (Transparently provide the pre-aggregation/summarization of data to
> > dramatically improve the performance of transactions or queries that
> > require such data.)
>
> Under development, possibily available with 7.5.   These are otherwise
> know as "materialized views".
>

It is certainly possible to implement Materialized Views in PostgreSQL -- we
are using them for RedWeek.com right now. I wrote a page about the
experience, and I am currently working with a version 2 of the document.

As far as a "CREATE MATERIALIZED VIEW" syntax, and generic functions to do
any MV, those don't exist and probably won't in time for 7.5.

--
Jonathan Gardner
jgardner@jonathangardner.net

Re: Comparison of PGSQL and DB2

From

"scott.marlowe"

Date:

11 March 2004, 17:32:50

On Thu, 11 Mar 2004, Bruno Wolff III wrote:

> On Thu, Mar 11, 2004 at 14:21:09 -0500,
>   Robert Treat <xzilla@users.sourceforge.net> wrote:
> > On Thu, 2004-03-11 at 13:59, Peter Eisentraut wrote:
> > > Josh Berkus wrote:
> > > > > Isolation Levels
> > > > > (Support all four ANSI isolation levels (UR, CS, RS, RR).)
> > > >
> > > > I'm not sure about this one; I suspect that we do, however, since
> > > > MVCC, invented for the Postgres Project, has become a standard for
> > > > transaction isolation in the database industry.
> > >
> > > 7.5 will support all four isolation levels.
> > >
> >
> > I wasn't aware of anyone doing work on this... do you have any more
> > info?
>
> My memory of past discussion on this is that you would be able to set all
> 4 transaction isolation levels in a command, but you would really get
> the next level up for the two that we don't provide. This is OK since
> since they get at least the transaction isolation safety they need.
> The reason for doing this is portability.

I thought the other two were considered meaningless within the context of
MVCC...  I.e. they were defined when row locking was the standard way of
storing data, and reflect that, and MVCC changed the rules so much that
the other two, which were a cheaper to implement locking model in a row
locking database, became no cheaper, and therefore there was no reason go
program a way to expose the data the way that locking model defined, since
doing so would provide poorer data integrity while actually being the same
speed or slower.

Or something like that.

So, are those isolation levels being actually implemented, or are we just
pretending to set the level in 7.5 but still using the next level higher?

Re: Comparison of PGSQL and DB2

From

Christopher Kings-Lynne

Date:

11 March 2004, 21:42:53

>>>Federated Database Support
>>>(Ability to allow applications to access & perform JOIN operations on
>>
>>multiple disparate databases.)
>>
>>This is a feature which we do not have because it violates the ANSI SQL
>>Specification.

Dude - the reason we don't have it is because no-one can be bothered
coding it...nothing to do with the SQL spec...

Chris

Re: Comparison of PGSQL and DB2

From

Josh Berkus

Date:

13 March 2004, 15:41:39

Chris,

> >>This is a feature which we do not have because it violates the ANSI SQL
> >>Specification.
>
> Dude - the reason we don't have it is because no-one can be bothered
> coding it...nothing to do with the SQL spec...

?? I'm pretty sure that the SQL spec puts database at the top of the heirarchy
tree, and treats them as inviolate.

Unfortunately, my copy of Date is in storage.   Can someone check this?

--
Josh Berkus
PostgreSQL Advocacy Volunteer
http://advocacy.postgresql.org/
San Francisco

Re: Comparison of PGSQL and DB2

From

Rod Taylor

Date:

13 March 2004, 15:41:39

> are we just
> pretending to set the level in 7.5 but still using the next level higher?

I believe Peter found verbiage in the spec that said to do exactly that.
Something about the isolation level being the minimum requirement, and
better than that was acceptable.

--
Rod Taylor <rbt [at] rbt [dot] ca>

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/signature.asc

Attachment

signature.asc

Re: Comparison of PGSQL and DB2

From

Josh Berkus

Date:

13 March 2004, 15:41:47

Peter,

> Surely MVCC wasn't "invented for the Postgres Project"; otherwise Vadim
> wouldn't have done so much copying from Oracle. ;-)

Please!   Reverse Engineering.    I can't imagine that anyone got a look at
Oracle's code.

Hmmm ... I'd always had the impression that MVCC came out of Berkeley.
Possibly mistaken, I wasn't around.

--
Josh Berkus
PostgreSQL Advocacy Volunteer
http://advocacy.postgresql.org/
San Francisco

Re: Comparison of PGSQL and DB2

From

Neil Conway

Date:

13 March 2004, 17:00:43

Josh Berkus <josh@postgresql.org> writes:
> Please!   Reverse Engineering.    I can't imagine that anyone got a look at
> Oracle's code.
>
> Hmmm ... I'd always had the impression that MVCC came out of Berkeley.
> Possibly mistaken, I wasn't around.

MVCC is a reasonably well known concurrency control technique -- it is
described in the DBMS literature, etc. It is by no means a PostgreSQL
invention.

-Neil

Re: Comparison of PGSQL and DB2

From

Peter Eisentraut

Date:

13 March 2004, 19:13:10

Josh Berkus wrote:
> ?? I'm pretty sure that the SQL spec puts database at the top of the
> heirarchy tree, and treats them as inviolate.

Here are some relevant sections from SQL92.  (SQL99 rearranged this and
I can't quite see through it, but certainly they didn't add any restrictions.)

         4.12  Catalogs

         Catalogs are named collections of schemas in an SQL-environment. An
         SQL-environment contains zero or more catalogs. A catalog con-
         tains one or more schemas, but always contains a schema named
         INFORMATION_SCHEMA that contains the views and domains of the
         Information Schema. The method of creation and destruction of
         catalogs is implementation-defined. The set of catalogs that
         can be referenced in any SQL-statement, during any particular
         SQL-transaction, or during the course of an SQL-session is also
         implementation-defined. The default catalog for a <module> whose
         <module authorization clause> does not specify an explicit <cata-
         log name> to qualify the <schema name> is implementation-defined.
         The default catalog for <preparable statement>s that are dynami-
         cally prepared in the current SQL-session through the execution
         of <prepare statement>s and <execute immediate statement>s is ini-
         tially implementation-defined but may be changed by the use of <set
         catalog statement>s.


         4.13  Clusters of catalogs

         A cluster is an implementation-defined collection of catalogs.
         Exactly one cluster is associated with an SQL-session and it
         defines the totality of the SQL-data that is available to that
         SQL-session.

         An instance of a cluster is described by an instance of a defi-
         nition schema. Given some SQL-data object, such as a view, a con-
         straint, a domain, or a base table, the definition of that object,
         and of all the objects that it directly or indirectly references,
         are in the same cluster of catalogs. For example, no <referential
         constraint definition> and no <joined table> can "cross" a cluster
         boundary.

         Whether or not any catalog can occur simultaneously in more than
         one cluster is implementation-defined.

         Within a cluster, no two catalogs have the same name.