Thread: Re: Comparison of PGSQL and DB2
Shashank: What follows is my reply to IBM's 2-year-old feature comparison of PostgreSQL and DB2. Each of the entries is an item that DB2 has and claims that PostgreSQL does not. I've put my replies to these claims. > On Demand Log Archiving > (Allows a user to close and archive the active log of a recoverable atabase whenever necessary, and use those log files to update a standby database.) > Incremental Backup > (Allows a user to perform interim backups that capture database changes since the last successful full (or incremental) backup. Minimizes backup window for large databases.) > Incremental Online Backups > (Ability to perform 'delta' backups while database is in use.) IBM is correct. We do not currently have the above features. However, they are all expected in or very soon after the release of 7.5. > Dual Logging > (Mirrors the active log files, which help to protect databases from accidental deletion of an active log and data corruption caused by hardware failure.) This "feature" seems redundant with the previous 3 features; I'm not sure what IBM is getting at. > High Availability Support from Vendors. > (Third party.) We have this, in quite a variety of forms. PGSQL Inc., Command Prompt, SRA Inc., Linux Labs, Veritas, and a couple of companies in Europe whose names escape me at the moment, all have some kind of HA PostgreSQL support. > Dynamic Storage Allocation > (Ability to add storage capacity to the database while the database is in use.) This is a "misfeature". PostgreSQL does not have "Dynamic Storage Allocation" becuase we don't have the filesytem access issues that DB2 does -- PostgreSQL can expand to fill the host filesystem, no allocation required. Or, to look at it another way, we've had this since Postgres 1.0. This is definitely an area in which we are superior to DB2. > Automatic Integrity Checking > (Automatic checking of the structural integrity of data pages when the data is accessed by an application and during > backup.) Another "feature" we don't have becuase it's not needed. We're more interested in preventing corruption in the first place than finding out if it's happened. > Encryption Functions > (Encryption and decryption functions with a key strength of 128 bits.) We've had these for at least 3 versions. > Group Authentication > (Authentication by user groups & roles, rather than by username for more manageability.) Once again, groups are something we've had for several versions. We don't support "roles" becuase they are redundant to groups. > Raw Device Support > (Ability to leverage raw-disk access, bypassing the overhead of the underlying operating system. > (Note: May be supported soon)) This, in our opinion, is another "misfeature". Raw Filesystem access is either a crutch for DBMSes which don't know how to use their host OS properly, or a legacy of older server OSes which did not support good filesystems. It is not useful for modern databases running on modern OSes. > Automatic Summary Tables > (Transparently provide the pre-aggregation/summarization of data to dramatically improve the performance of transactions or queries that require such data.) Under development, possibily available with 7.5. These are otherwise know as "materialized views". > User Defined Task Prioritization > (Ability to define rules on how the database should prioritize workloads either by query cost or user profiles. Ability to monitor resource usage and adjust the priority of queries or cancel runaway queries that exceed predefined limits.) This would be nice. We're not sure how to implement it; don't expect it soon for PostgreSQL. > Isolation Levels > (Support all four ANSI isolation levels (UR, CS, RS, RR).) I'm not sure about this one; I suspect that we do, however, since MVCC, invented for the Postgres Project, has become a standard for transaction isolation in the database industry. > Clustered Indexes > (Allows data records to be clustered on pages based on the sequence of a particular index. Clustering increases the efficiency of data retrieval when it involves accessing values by range.) We've had these for at least 3 versions. > Advanced Memory Management > (Ability to pre-allocate memory for data required by applications before it is needed (pre-fetching). Ability for considerable memory availability in choosing access plan to resolve queries.) Another "feature" caused by DB2's ignorance of the host OS. We *are* working on making Postgres' use of RAM and I/O more efficient. However, I suspect that it is already better than DB2's use. > Dynamic Bit Map Indexing > (Warehouse queries typically involve many AND predicates. Dynamic bit-map technology efficiently combines multiple indexes for improved performance.) Aha! Yes, this would be nice. We don't have it, under development or otherwise. We'd like it. > Bi-Directional Indexes > (Ability to perform both forward and reverse scans on a single index. Improves the performance of reverse scans by eliminating the need to use temporary tables.) <laugh> This is a *feature*? I call it a bug fix. We've had this since 7.1. -- BEFORE DB2. For that matter, we support MANY more *types* of indexes thann DB2 does, from my brief check. > Data Partitioning > (Ability to divide database into partitions, each of which can be stored and managed on any combination of multiple logical partitions (for large SMP machines) or physical partitions on separate machines in a "shared-nothing" massively parallel > hardware platform.) This is under development and will (probably) be released with 7.5, for seperate tables. Currently, an ad-hoc solution is possible for any talented DBA/sysadmin, but is somewhat hard to administer. Partitioning individual tables has been discussed but there is not yet any development code. > SMP Support for complex queries > (Ability to divide up work of a single complex/large query among processors in a multi-processor environment to resolve subqueries in parallel.) Nope, don't have this; another thing that would be nice to have. > Parallel Backup / Restore > (Ability to perform backup/restore of large databases faster by leveraging all the processors in a multi-processor machine.) This is directly related to the above feature. > Mass Data Load > (Ability to perform data load into the database faster by leveraging all processors in a multi-processor machine.) Same. > Federated Database Support > (Ability to allow applications to access & perform JOIN operations on multiple disparate databases.) This is a feature which we do not have because it violates the ANSI SQL Specification. > Built-in OLE DB Support > (Allows for access to data from another database that supports the Microsoft OLE DB standard.) Don't have it, nor are likely to. Our interface libraries are strictly seperate from the core program, and should and will remain so. Currently we support ODBC and .Net access for MS tools. OLE-DB is under development. > DRDA Support > (Support for DRDA (industry standard communications protocol for relational databases).) Not sure. This is the first time I've heard of this standard. We may support it through a 3rd-party project or library. > MQSeries Integration > (Features to reduce effort required to integrate database with MQSeries Infrastructure.) Never heard of MQSeries before, either. See above. > J2EE Certified > (Independently certified to be J2EE compliant.) Um, why would a database be "J2EE certified"? We *are* talking about an RDBMS here, not a JVM. > SQLJ > (Allows developing Java applications with embedded static SQL.) We don't have SQLJ because, to be frank, it's a worthless hybrid. We currently have two different implementations of procedural Java which can be used inside the database, either of which is superior in design to SQLJ, although I'm not sure of their production status. > FIPS/SQL92E Support > (Complies with FIPS/SQL92E standard.) Not certain about this one. We may support it, we may not. > Database XML Tools > (XML tools to easily store, retrieve, and search the content of XML documents. Ability to map XML definitions to relational tables to dynamically compose and decompose XML documents on demand.) Our support for XML is through 3rd-party projects and plug-ins. As I do not do a lot of XML-to-DB, I cannot relate the exact status of these projects. > Java UDFs > (Support for cross platform User Defined Functions.) > Java Stored Procedures > (Support for cross platform Stored Procedures.) We have PL/java and PL/J to do both of the above things. My personal knowledge is insufficient to compare our implementation with DB2s. For that matter, we support 10 Procedural Languages *in addtion to* Java. These include Perl, Python, Ruby, R, Tcl, and PHP. DB2 supports, AFAIK, only Java and a SQL-scripting dialect. > WW 24x7 Support We have a few companies that offer 24x7 support in specific areas. Certainly IBM has a leg up on us for offering worldwide support -- especially since they support PostgreSQL in some locations! -- Josh Berkus PostgreSQL Advocacy Volunteer http://advocacy.postgresql.org/ San Francisco
On Tue, Mar 09, 2004 at 10:22:24AM -0800, Josh Berkus wrote: > > User Defined Task Prioritization > > (Ability to define rules on how the database should prioritize workloads > either by query cost or user profiles. Ability to monitor resource usage and > adjust the priority of queries or cancel runaway queries that exceed > predefined limits.) > > This would be nice. We're not sure how to implement it; don't > expect it soon for PostgreSQL. Note that we do have a part of this, if "runaway queries" == "queries which run too long". I suspect that what would really be needed, however, is something running on the OS to detect timeslice or memory hogs. > > Parallel Backup / Restore (Ability to perform backup/restore of > > large databases faster by leveraging > all the processors in a multi-processor machine.) > > This is directly related to the above feature. We can simulate it though, right? Use the binary file format for pg_dump, and hand out pieces of recovery to different restorers depending on your number of processors, one table at a time per restorer. (Afilias -- Chris Browne, really -- wrote something along these lines for the import of the .org database. I talked about it a little at OSCON last year.) This is very far from the user-friendly tool that IBM offers, of course. A -- Andrew Sullivan | ajs@crankycanuck.ca
On Tue, 2004-03-09 at 13:22, Josh Berkus wrote: > Shashank: > > What follows is my reply to IBM's 2-year-old feature comparison of PostgreSQL > and DB2. Each of the entries is an item that DB2 has and claims that > PostgreSQL does not. I've put my replies to these claims. > <snip> > > > Isolation Levels > > (Support all four ANSI isolation levels (UR, CS, RS, RR).) > > I'm not sure about this one; I suspect that we do, however, since MVCC, > invented for the Postgres Project, has become a standard for transaction > isolation in the database industry. > Not exactly sure which acronyms above correspond to which levels (as I know them by different names apparently) but we support Read Committed and Serializable levels, which are by far the most common of the two AFAIK. The others as I know them are Read Uncommitted aka phantom reads, and Repeatable Read, which we do not support. Of course we also supply a extensive locking methods for further control. More info on our isolation levels can be found at http://www.postgresql.org/docs/7.4/static/transaction-iso.html <snip> > > Federated Database Support > > (Ability to allow applications to access & perform JOIN operations on > multiple disparate databases.) > > This is a feature which we do not have because it violates the ANSI SQL > Specification. > Perhaps worth a mention that using dblink, you can actually make functionality similar to this. I have recently been experimenting with using views calling dblink functions to create "local tables" that really live in a separate database on a completely separate machine. I don't know if I would recommend the technique but I think the capabilities are there. Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
Josh Berkus wrote: > > Isolation Levels > > (Support all four ANSI isolation levels (UR, CS, RS, RR).) > > I'm not sure about this one; I suspect that we do, however, since > MVCC, invented for the Postgres Project, has become a standard for > transaction isolation in the database industry. Surely MVCC wasn't "invented for the Postgres Project"; otherwise Vadim wouldn't have done so much copying from Oracle. ;-) 7.5 will support all four isolation levels. > > Federated Database Support > > (Ability to allow applications to access & perform JOIN operations > > on > > multiple disparate databases.) > > This is a feature which we do not have because it violates the ANSI > SQL Specification. That would be news to me.
On Thu, 2004-03-11 at 13:59, Peter Eisentraut wrote: > Josh Berkus wrote: > > > Isolation Levels > > > (Support all four ANSI isolation levels (UR, CS, RS, RR).) > > > > I'm not sure about this one; I suspect that we do, however, since > > MVCC, invented for the Postgres Project, has become a standard for > > transaction isolation in the database industry. > > 7.5 will support all four isolation levels. > I wasn't aware of anyone doing work on this... do you have any more info? Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
On Thu, Mar 11, 2004 at 14:21:09 -0500, Robert Treat <xzilla@users.sourceforge.net> wrote: > On Thu, 2004-03-11 at 13:59, Peter Eisentraut wrote: > > Josh Berkus wrote: > > > > Isolation Levels > > > > (Support all four ANSI isolation levels (UR, CS, RS, RR).) > > > > > > I'm not sure about this one; I suspect that we do, however, since > > > MVCC, invented for the Postgres Project, has become a standard for > > > transaction isolation in the database industry. > > > > 7.5 will support all four isolation levels. > > > > I wasn't aware of anyone doing work on this... do you have any more > info? My memory of past discussion on this is that you would be able to set all 4 transaction isolation levels in a command, but you would really get the next level up for the two that we don't provide. This is OK since since they get at least the transaction isolation safety they need. The reason for doing this is portability.
Bruno Wolff III wrote: > My memory of past discussion on this is that you would be able to set > all 4 transaction isolation levels in a command, but you would really > get the next level up for the two that we don't provide. This is OK > since since they get at least the transaction isolation safety they > need. The reason for doing this is portability. That's exactly right, but it's more marketable with all four levels "available". ;-)
On Tuesday 09 March 2004 10:22 am, Josh Berkus wrote: > > Automatic Summary Tables > > (Transparently provide the pre-aggregation/summarization of data to > > dramatically improve the performance of transactions or queries that > > require such data.) > > Under development, possibily available with 7.5. These are otherwise > know as "materialized views". > It is certainly possible to implement Materialized Views in PostgreSQL -- we are using them for RedWeek.com right now. I wrote a page about the experience, and I am currently working with a version 2 of the document. As far as a "CREATE MATERIALIZED VIEW" syntax, and generic functions to do any MV, those don't exist and probably won't in time for 7.5. -- Jonathan Gardner jgardner@jonathangardner.net
On Thu, 11 Mar 2004, Bruno Wolff III wrote: > On Thu, Mar 11, 2004 at 14:21:09 -0500, > Robert Treat <xzilla@users.sourceforge.net> wrote: > > On Thu, 2004-03-11 at 13:59, Peter Eisentraut wrote: > > > Josh Berkus wrote: > > > > > Isolation Levels > > > > > (Support all four ANSI isolation levels (UR, CS, RS, RR).) > > > > > > > > I'm not sure about this one; I suspect that we do, however, since > > > > MVCC, invented for the Postgres Project, has become a standard for > > > > transaction isolation in the database industry. > > > > > > 7.5 will support all four isolation levels. > > > > > > > I wasn't aware of anyone doing work on this... do you have any more > > info? > > My memory of past discussion on this is that you would be able to set all > 4 transaction isolation levels in a command, but you would really get > the next level up for the two that we don't provide. This is OK since > since they get at least the transaction isolation safety they need. > The reason for doing this is portability. I thought the other two were considered meaningless within the context of MVCC... I.e. they were defined when row locking was the standard way of storing data, and reflect that, and MVCC changed the rules so much that the other two, which were a cheaper to implement locking model in a row locking database, became no cheaper, and therefore there was no reason go program a way to expose the data the way that locking model defined, since doing so would provide poorer data integrity while actually being the same speed or slower. Or something like that. So, are those isolation levels being actually implemented, or are we just pretending to set the level in 7.5 but still using the next level higher?
>>>Federated Database Support >>>(Ability to allow applications to access & perform JOIN operations on >> >>multiple disparate databases.) >> >>This is a feature which we do not have because it violates the ANSI SQL >>Specification. Dude - the reason we don't have it is because no-one can be bothered coding it...nothing to do with the SQL spec... Chris
Chris, > >>This is a feature which we do not have because it violates the ANSI SQL > >>Specification. > > Dude - the reason we don't have it is because no-one can be bothered > coding it...nothing to do with the SQL spec... ?? I'm pretty sure that the SQL spec puts database at the top of the heirarchy tree, and treats them as inviolate. Unfortunately, my copy of Date is in storage. Can someone check this? -- Josh Berkus PostgreSQL Advocacy Volunteer http://advocacy.postgresql.org/ San Francisco
> are we just > pretending to set the level in 7.5 but still using the next level higher? I believe Peter found verbiage in the spec that said to do exactly that. Something about the isolation level being the minimum requirement, and better than that was acceptable. -- Rod Taylor <rbt [at] rbt [dot] ca> Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL PGP Key: http://www.rbt.ca/signature.asc
Attachment
Peter, > Surely MVCC wasn't "invented for the Postgres Project"; otherwise Vadim > wouldn't have done so much copying from Oracle. ;-) Please! Reverse Engineering. I can't imagine that anyone got a look at Oracle's code. Hmmm ... I'd always had the impression that MVCC came out of Berkeley. Possibly mistaken, I wasn't around. -- Josh Berkus PostgreSQL Advocacy Volunteer http://advocacy.postgresql.org/ San Francisco
Josh Berkus <josh@postgresql.org> writes: > Please! Reverse Engineering. I can't imagine that anyone got a look at > Oracle's code. > > Hmmm ... I'd always had the impression that MVCC came out of Berkeley. > Possibly mistaken, I wasn't around. MVCC is a reasonably well known concurrency control technique -- it is described in the DBMS literature, etc. It is by no means a PostgreSQL invention. -Neil
Josh Berkus wrote: > ?? I'm pretty sure that the SQL spec puts database at the top of the > heirarchy tree, and treats them as inviolate. Here are some relevant sections from SQL92. (SQL99 rearranged this and I can't quite see through it, but certainly they didn't add any restrictions.) 4.12 Catalogs Catalogs are named collections of schemas in an SQL-environment. An SQL-environment contains zero or more catalogs. A catalog con- tains one or more schemas, but always contains a schema named INFORMATION_SCHEMA that contains the views and domains of the Information Schema. The method of creation and destruction of catalogs is implementation-defined. The set of catalogs that can be referenced in any SQL-statement, during any particular SQL-transaction, or during the course of an SQL-session is also implementation-defined. The default catalog for a <module> whose <module authorization clause> does not specify an explicit <cata- log name> to qualify the <schema name> is implementation-defined. The default catalog for <preparable statement>s that are dynami- cally prepared in the current SQL-session through the execution of <prepare statement>s and <execute immediate statement>s is ini- tially implementation-defined but may be changed by the use of <set catalog statement>s. 4.13 Clusters of catalogs A cluster is an implementation-defined collection of catalogs. Exactly one cluster is associated with an SQL-session and it defines the totality of the SQL-data that is available to that SQL-session. An instance of a cluster is described by an instance of a defi- nition schema. Given some SQL-data object, such as a view, a con- straint, a domain, or a base table, the definition of that object, and of all the objects that it directly or indirectly references, are in the same cluster of catalogs. For example, no <referential constraint definition> and no <joined table> can "cross" a cluster boundary. Whether or not any catalog can occur simultaneously in more than one cluster is implementation-defined. Within a cluster, no two catalogs have the same name.