
From: Christopher Browne
Subject: Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables
Date:
Msg-id: m3acvulx5f.fsf@wolfe.cbbrowne.com
In response to: Data Warehouse Reevaluation - MySQL vs Postgres (Mark Cotner <mcotner@yahoo.com>)
Responses: Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables (Mark Cotner <mcotner@yahoo.com>)
           Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables ("Simon Riggs" <simon@2ndquadrant.com>)
List: pgsql-performance
The world rejoiced as Mischa Sandberg <ischamay.andbergsay@activestateway.com> wrote:
> Mark Cotner wrote:
>> Requirements:
>> Merge table definition equivalent.  We use these
>> extensively.

> Looked all over mysql.com etc, and afaics merge table is indeed
> exactly a view of a union-all. Is that right?

> PG supports views, of course, as well (now) as tablespaces, allowing
> you to split tables/tablesets across multiple disk systems.  PG is
> also pretty efficient in query plans on such views, where (say) you
> make one column a constant (identifier, sort of) per input table.
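For the archives, a rough sketch of the sort of view Mischa describes
(the per-month tables and column names here are invented purely for
illustration):

  CREATE TABLE stats_2004_08 (modem_id int, bytes_in bigint,
                              sampled_at timestamptz);
  CREATE TABLE stats_2004_09 (modem_id int, bytes_in bigint,
                              sampled_at timestamptz);

  -- The "merge table" equivalent: a UNION ALL view, with a constant
  -- identifier column per input table that the planner can use to
  -- prune branches when a query constrains on it.
  CREATE VIEW stats AS
    SELECT '2004-08'::text AS partition, * FROM stats_2004_08
    UNION ALL
    SELECT '2004-09'::text AS partition, * FROM stats_2004_09;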

The thing that _doesn't_ work well with these sorts of UNION views is
self-joins.  Supposing the view has 10 member tables, each side of the
join expands into all 10 branches, so a self-join turns into a 100-way
join, which is not particularly pretty.
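Against the hypothetical view above, the innocuous-looking query is:

  -- Each side of the self-join expands into every member table, so
  -- 10 members means the planner faces 10 x 10 = 100 pairwise joins.
  SELECT a.modem_id
    FROM stats a
    JOIN stats b ON a.modem_id = b.modem_id;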

I'm quite curious as to how MySQL(tm) copes with this, though the
question may be moot; they may not support self-joins on merge tables
at all...

>> Um, gonna sound silly, but the web interface has to remain "snappy"
>> under load.  I don't see this as a major concern since you don't
>> require table locking.

> Agreed. It's more in your warehouse design, and intelligent bounding
> of queries. I'd say PG's query analyzer is a few years ahead of
> MySQL for large and complex queries.

The challenge comes in if enormous amounts of effort have gone into
attuning the application exactly to MySQL(tm)'s feature set.

The guys working on RT/3 have found this a challenge; they had rather
a lot of dependencies on MySQL's case-insensitive string comparisons,
which caused considerable grief.

> On the other hand, if you do warehouse-style loading (Insert, or PG
> COPY, into a temp table; and then 'upsert' into the perm table), I
> can guarantee 2500 inserts/sec is no problem.

The big wins are thus:

 1.  Group plenty of INSERTs into a single transaction.

 2.  Better still, use COPY to cut parsing costs plenty more.

 3.  Adding indexes _after_ the COPY is a further win.  (A sketch of
     all three follows.)
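Roughly, sticking with the invented schema from earlier:

  BEGIN;                              -- win 1: one transaction per batch
  COPY stats_2004_09 FROM '/tmp/batch.dat';  -- win 2: COPY, not INSERTs
  -- win 3: build indexes only after the bulk data is in
  CREATE INDEX stats_2004_09_modem_idx ON stats_2004_09 (modem_id);
  COMMIT;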

Another possibility is to do clever things with stored procs; load
incoming data using the above optimizations, and then run stored
procedures to use some more or less fancy logic to put the data where
it's ultimately supposed to be.  Having the logic running inside the
engine is the big optimization.
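A sketch of that pattern, again with the invented schema (PG has no
single-statement upsert, so the function does an UPDATE and then
INSERTs whatever is left over):

  CREATE TABLE stats_load (modem_id int, bytes_in bigint,
                           sampled_at timestamptz);

  CREATE OR REPLACE FUNCTION merge_stats() RETURNS void AS $$
  BEGIN
      -- Update rows that already exist in the permanent table...
      UPDATE stats_2004_09 s
         SET bytes_in = l.bytes_in
        FROM stats_load l
       WHERE s.modem_id = l.modem_id
         AND s.sampled_at = l.sampled_at;
      -- ...then insert the rows that don't.
      INSERT INTO stats_2004_09
      SELECT l.* FROM stats_load l
       WHERE NOT EXISTS (SELECT 1 FROM stats_2004_09 s
                          WHERE s.modem_id = l.modem_id
                            AND s.sampled_at = l.sampled_at);
      -- Clear the staging table for the next batch.
      DELETE FROM stats_load;
  END;
  $$ LANGUAGE plpgsql;

  -- Usage: COPY stats_load FROM '/tmp/batch.dat'; SELECT merge_stats();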
--
wm(X,Y):-write(X),write('@'),write(Y). wm('cbbrowne','linuxfinances.info').
http://linuxfinances.info/info/spreadsheets.html
Rules  of  the  Evil  Overlord   #198.   "I  will  remember  that  any
vulnerabilities I have  are to be revealed strictly  on a need-to-know
basis.  I will also remember that no one needs to know."
<http://www.eviloverlord.com/>
