Home > mailing lists

Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables - Mailing list pgsql-performance

From	Mischa Sandberg
Subject	Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables
Date	September 16, 2004 05:20:39
Msg-id	AcL1d.26650$XP3.14397@edtnps84 Whole thread
In response to	Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables ("Simon Riggs" <simon@2ndquadrant.com>)
List	pgsql-performance

Tree view

Simon Riggs wrote:
>>Jim C. Nasby
>>On Mon, Sep 13, 2004 at 11:07:35PM +0100, Simon Riggs wrote:
>>
>>>PostgreSQL's functionality is in many ways similar to Oracle
>>Partitioning.
>>
>>>Loading up your data in many similar tables, then creating a view like:
>>>
>>>CREATE VIEW BIGTABLE (idate, col1, col2, col3...) AS
>>>SELECT 200409130800, col1, col2, col3... FROM table200409130800
>>>UNION ALL
>>>SELECT 200409131000, col1, col2, col3... FROM table200409131000
>>>UNION ALL
>>>SELECT 200409131200, col1, col2, col3... FROM table200409131200
>>>...etc...
>>>
>>>will allow the PostgreSQL optimizer to eliminate partitions
>>from the query
>>>when you run queries which include a predicate on the
>>partitioning_col, e.g.
>>
>>>select count(*) from bigtable where idate >= 200409131000
>
> The "partitions" are just tables, so no need for other management tools.
> Oracle treats the partitions as sub-tables, so you need a range of commands
> to add, swap etc the partitions of the main table.

A few years ago I wrote a federated query engine (wrapped as an ODBC
driver) that had to handle thousands of contributors (partitions) to a
pseudotable / VIEWofUNIONs. Joins did require some special handling in
the optimizer, because of the huge number of crossproducts between
different tables. It was definitely worth the effort at the time,
because you need different strategies for: joining a partition to
another partition on the same subserver; joining two large partitions on
different servers; and joining a large partition on one server to a
small one on another.

The differences may not be so great for a solitary server;
but they're still there, because of disparity in subtable sizes. The
simplistic query plans tend to let you down, when you're dealing with
honking warehouses.

I'm guessing that Oracle keeps per-subtable AND cross-all-subtables
statistics, rather than building the latter from scratch in the course
of evaluating the query plan. That's the one limitation I see in
emulating their partitioned tables with Views.

pgsql-performance by date:

From: Steve Atkins
Date: 16 September 2004, 05:09:48
Subject: Re: Data Warehouse Reevaluation - MySQL vs Postgres --

From: Christopher Kings-Lynne
Date: 16 September 2004, 05:31:36
Subject: Re: Data Warehouse Reevaluation - MySQL vs Postgres --

Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables - Mailing list pgsql-performance

Previous

Next