Re: Data Warehouse Reevaluation - MySQL vs Postgres -- - Mailing list pgsql-performance
From | Iain |
---|---|
Subject | Re: Data Warehouse Reevaluation - MySQL vs Postgres -- |
Date | |
Msg-id | 00b301c49bab$38275070$7201a8c0@mst1x5r347kymb Whole thread Raw |
In response to | Re: Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables ("Jim C. Nasby" <decibel@decibel.org>) |
List | pgsql-performance |
Hi Joe, You went to quite a bit of effort, thanks I have the picture now. Using inheritence seems to be a useful refinement on top of the earlier outlined aproach using the UNION ALL view with appropriate predicates on the condition used to do the partitioning. Having the individual partitions derived from a parent table makes a lot of sense. regards Iain ----- Original Message ----- From: "Joe Conway" <mail@joeconway.com> To: "Iain" <iain@mst.co.jp> Cc: <pgsql-performance@postgresql.org> Sent: Thursday, September 16, 2004 1:07 PM Subject: Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- > Iain wrote: > >>That's exactly what we're doing, but using inherited tables instead of a > >>union view. With inheritance, there is no need to rebuild the view each > >>time a table is added or removed. Basically, in our application, tables > >>are partitioned by either month or week, depending on the type of data > >>involved, and queries are normally date qualified. > > > > That sounds interesting. I have to admit that I havn't touched iheritance in > > pg at all yet so I find it hard to imagine how this would work. If you have > > a chance, would you mind elaborating on it just a little? > > OK, see below: > ===================== > > create table foo(f1 int, f2 date, f3 float8); > > create table foo_2004_01() inherits (foo); > create table foo_2004_02() inherits (foo); > create table foo_2004_03() inherits (foo); > > create index foo_2004_01_idx1 on foo_2004_01(f2); > create index foo_2004_02_idx1 on foo_2004_02(f2); > create index foo_2004_03_idx1 on foo_2004_03(f2); > > insert into foo_2004_02 values(1,'2004-feb-15',3.14); > > > -- needed just for illustration since these are toy tables > set enable_seqscan to false; > explain analyze select * from foo where f2 = '2004-feb-15'; > > QUERY PLAN > -------------------------------------------------------------------------- ---------------------------------------------------------------------- > Result (cost=100000000.00..100000061.32 rows=16 width=16) (actual > time=0.224..0.310 rows=1 loops=1) > -> Append (cost=100000000.00..100000061.32 rows=16 width=16) > (actual time=0.214..0.294 rows=1 loops=1) > -> Seq Scan on foo (cost=100000000.00..100000022.50 rows=5 > width=16) (actual time=0.004..0.004 rows=0 loops=1) > Filter: (f2 = '2004-02-15'::date) > -> Index Scan using foo_2004_01_idx1 on foo_2004_01 foo > (cost=0.00..17.07 rows=5 width=16) (actual time=0.101..0.101 rows=0 loops=1) > Index Cond: (f2 = '2004-02-15'::date) > -> Index Scan using foo_2004_02_idx1 on foo_2004_02 foo > (cost=0.00..4.68 rows=1 width=16) (actual time=0.095..0.101 rows=1 loops=1) > Index Cond: (f2 = '2004-02-15'::date) > -> Index Scan using foo_2004_03_idx1 on foo_2004_03 foo > (cost=0.00..17.07 rows=5 width=16) (actual time=0.066..0.066 rows=0 loops=1) > Index Cond: (f2 = '2004-02-15'::date) > Total runtime: 0.582 ms > (11 rows) > > create table foo_2004_04() inherits (foo); > create index foo_2004_04_idx1 on foo_2004_04(f2); > > explain analyze select * from foo where f2 = '2004-feb-15'; > > QUERY PLAN > -------------------------------------------------------------------------- ---------------------------------------------------------------------- > Result (cost=100000000.00..100000078.38 rows=21 width=16) (actual > time=0.052..0.176 rows=1 loops=1) > -> Append (cost=100000000.00..100000078.38 rows=21 width=16) > (actual time=0.041..0.159 rows=1 loops=1) > -> Seq Scan on foo (cost=100000000.00..100000022.50 rows=5 > width=16) (actual time=0.004..0.004 rows=0 loops=1) > Filter: (f2 = '2004-02-15'::date) > -> Index Scan using foo_2004_01_idx1 on foo_2004_01 foo > (cost=0.00..17.07 rows=5 width=16) (actual time=0.012..0.012 rows=0 loops=1) > Index Cond: (f2 = '2004-02-15'::date) > -> Index Scan using foo_2004_02_idx1 on foo_2004_02 foo > (cost=0.00..4.68 rows=1 width=16) (actual time=0.016..0.022 rows=1 loops=1) > Index Cond: (f2 = '2004-02-15'::date) > -> Index Scan using foo_2004_03_idx1 on foo_2004_03 foo > (cost=0.00..17.07 rows=5 width=16) (actual time=0.008..0.008 rows=0 loops=1) > Index Cond: (f2 = '2004-02-15'::date) > -> Index Scan using foo_2004_04_idx1 on foo_2004_04 foo > (cost=0.00..17.07 rows=5 width=16) (actual time=0.095..0.095 rows=0 loops=1) > Index Cond: (f2 = '2004-02-15'::date) > Total runtime: 0.443 ms > (13 rows) > > For loading data, we COPY into foo, and have a trigger that redirects > the rows to the appropriate partition. > > Notice that the partitions which do not contain any data of interest are > still probed for data, but since they have none it is very quick. In a > real life example I got the following results just this afternoon: > > - aggregate row count = 471,849,665 > - total number inherited tables = 216 > (many are future dated and therefore contain no data) > - select one month's worth of data for one piece of equipment by serial > number (49,257 rows) = 526.015 ms > > Not too bad -- quick enough for my needs. BTW, this is using NFS mounted > storage (NetApp NAS). > > Joe
pgsql-performance by date: