Re: Data Warehouse Reevaluation - MySQL vs Postgres -- - Mailing list pgsql-performance
| From | Joe Conway |
|---|---|
| Subject | Re: Data Warehouse Reevaluation - MySQL vs Postgres -- |
| Date | |
| Msg-id | 41485CA6.1080505@joeconway.com |
| In response to | Re: Re: Data Warehouse Reevaluation - MySQL vs Postgres -- (<simon@2ndquadrant.com>) |
| Responses | Re: Data Warehouse Reevaluation - MySQL vs Postgres --; Re: Data Warehouse Reevaluation - MySQL vs Postgres -- |
| List | pgsql-performance |
simon@2ndquadrant.com wrote:
> Joe Conway <mail@joeconway.com> wrote on 15.09.2004, 06:30:24:
>> We're not completely done with our data conversion (from a commercial
>> RDBMS), but so far the results have been excellent. Similar to what
>> others have said in this thread, the conversion involved restructuring
>> the data to better suit Postgres and the application (data
>> analysis/mining vs. the source system, which is operational). As a
>> result we've compressed a > 1TB database down to ~0.4TB, and seen at
>> least one typical query reduced from ~9 minutes down to ~40 seconds.
>
> Sounds interesting.
>
> The performance gain comes from partition elimination of the inherited
> tables under the root?
>
> I take it the compression comes from use of arrays, avoiding the need
> for additional rows and key overhead?

Sorry, in trying to be concise I was not very clear. I'm using the term
"compression" very generally here. I'll try to give a bit more background.

The original data source is a database schema designed for use by an
operational application that my company sells to provide enhanced
management of equipment that we also sell. The application needs to be
very flexible in exactly what data it stores in order to be useful across
a wide variety of equipment models and versions. To achieve that, there is
a very large central "transaction" table that stores name->value pairs in
varchar columns. The name->value pairs come from parsed output of the
equipment, and as such there is a fair amount of redundancy and unneeded
data that ends up getting stored. At each installation in the field this
table can get very large (> 1 billion rows). Additionally, the application
prematerializes a variety of summaries for use by the operators using the
GUI.

We collect the data exported from each of the systems in the field and
accumulate it in a single central database for data mining and analysis.
This is the database that is actually being converted.

By "compression" I really mean that unneeded and redundant data is being
stripped out, and data known to be of a certain datatype is stored in that
type instead of varchar (e.g. values known to be int are stored as int);
see the first sketch below. Also, the summaries are not being converted
(although we do some post-processing to create new materialized
summaries).

My points in telling this were:

- the use of inherited tables to partition this huge number of rows and
  yet allow simple query access to it seems to work well, at least in
  early validation tests (see the second sketch below)
- had we simply taken the original database and "slammed" it into Postgres
  with no further thought, we would not have seen the big improvements,
  and thus the project might have been seen as a failure (even though it
  saves substantial $)

Hope that's a bit more clear. I'm hoping to write up a more detailed case
study once we've cut the Postgres system over to production and the dust
settles a bit.

Joe
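To make the "compression" concrete, here is a minimal sketch of what stripping the name->value pairs down to properly typed columns can look like. The table and column names (transaction_raw, transaction_clean, cycle_count, temperature) are illustrative assumptions, not the actual schema from the thread:

```sql
-- Hypothetical operational layout: everything lands as text pairs,
-- so attribute names are repeated on every row and numeric data is
-- stored as varchar.
CREATE TABLE transaction_raw (
    equipment_id  integer,
    recorded_at   timestamp,
    attr_name     varchar(64),    -- e.g. 'cycle_count', 'temperature'
    attr_value    varchar(255)    -- always text, whatever the real type
);

-- Hypothetical warehouse layout: keep only the attributes that matter,
-- one per column, stored in its natural type (ints as int, etc.).
CREATE TABLE transaction_clean (
    equipment_id  integer    NOT NULL,
    recorded_at   timestamp  NOT NULL,
    cycle_count   integer,           -- known int stored as int
    temperature   numeric(6,2)
);

-- One way to pivot the pairs into typed columns during conversion.
INSERT INTO transaction_clean
SELECT  equipment_id,
        recorded_at,
        max(CASE WHEN attr_name = 'cycle_count'
                 THEN attr_value::integer END),
        max(CASE WHEN attr_name = 'temperature'
                 THEN attr_value::numeric END)
FROM    transaction_raw
GROUP BY equipment_id, recorded_at;
```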
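And a minimal sketch of inheritance-based partitioning of the kind mentioned in the first bullet; again the names and the monthly split are illustrative assumptions, not the actual design. Queries against the root table transparently cover every child; planner-level partition elimination via constraint_exclusion only appeared in later PostgreSQL releases (8.1), and the thread does not say which mechanism the gains relied on:

```sql
-- One root table plus one child per month, each child carrying a
-- CHECK constraint describing the rows it holds.
CREATE TABLE readings (
    equipment_id  integer    NOT NULL,
    taken_at      timestamp  NOT NULL,
    metric_id     integer    NOT NULL,
    metric_value  numeric
);

CREATE TABLE readings_2004_08 (
    CHECK (taken_at >= '2004-08-01' AND taken_at < '2004-09-01')
) INHERITS (readings);

CREATE TABLE readings_2004_09 (
    CHECK (taken_at >= '2004-09-01' AND taken_at < '2004-10-01')
) INHERITS (readings);

-- Loads target the child that matches the date range.
INSERT INTO readings_2004_09 VALUES (42, '2004-09-15 06:30', 7, 99.5);

-- Queries can simply use the root; all children are scanned
-- transparently, so application SQL stays simple.
SELECT equipment_id, avg(metric_value)
FROM   readings
WHERE  taken_at >= '2004-09-01'
GROUP  BY equipment_id;
```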