Re: Data Warehouse Reevaluation - MySQL vs Postgres -- - Mailing list pgsql-performance

From Joe Conway
Subject Re: Data Warehouse Reevaluation - MySQL vs Postgres --
Date
Msg-id 41491186.1060708@joeconway.com
Whole thread Raw
In response to Re: Data Warehouse Reevaluation - MySQL vs Postgres --  ("Iain" <iain@mst.co.jp>)
List pgsql-performance
Iain wrote:
>>That's exactly what we're doing, but using inherited tables instead of a
>>union view. With inheritance, there is no need to rebuild the view each
>>time a table is added or removed. Basically, in our application, tables
>>are partitioned by either month or week, depending on the type of data
>>involved, and queries are normally date qualified.
>
> That sounds interesting. I have to admit that I havn't touched iheritance in
> pg at all yet so I find it hard to imagine how this would work. If you have
> a chance, would you mind elaborating on it just a little?

OK, see below:
=====================

create table foo(f1 int, f2 date, f3 float8);

create table foo_2004_01() inherits (foo);
create table foo_2004_02() inherits (foo);
create table foo_2004_03() inherits (foo);

create index foo_2004_01_idx1 on foo_2004_01(f2);
create index foo_2004_02_idx1 on foo_2004_02(f2);
create index foo_2004_03_idx1 on foo_2004_03(f2);

insert into foo_2004_02 values(1,'2004-feb-15',3.14);


  -- needed just for illustration since these are toy tables
set enable_seqscan to false;
explain analyze select * from foo where f2 = '2004-feb-15';

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------
  Result  (cost=100000000.00..100000061.32 rows=16 width=16) (actual
time=0.224..0.310 rows=1 loops=1)
    ->  Append  (cost=100000000.00..100000061.32 rows=16 width=16)
(actual time=0.214..0.294 rows=1 loops=1)
          ->  Seq Scan on foo  (cost=100000000.00..100000022.50 rows=5
width=16) (actual time=0.004..0.004 rows=0 loops=1)
                Filter: (f2 = '2004-02-15'::date)
          ->  Index Scan using foo_2004_01_idx1 on foo_2004_01 foo
(cost=0.00..17.07 rows=5 width=16) (actual time=0.101..0.101 rows=0 loops=1)
                Index Cond: (f2 = '2004-02-15'::date)
          ->  Index Scan using foo_2004_02_idx1 on foo_2004_02 foo
(cost=0.00..4.68 rows=1 width=16) (actual time=0.095..0.101 rows=1 loops=1)
                Index Cond: (f2 = '2004-02-15'::date)
          ->  Index Scan using foo_2004_03_idx1 on foo_2004_03 foo
(cost=0.00..17.07 rows=5 width=16) (actual time=0.066..0.066 rows=0 loops=1)
                Index Cond: (f2 = '2004-02-15'::date)
  Total runtime: 0.582 ms
(11 rows)

create table foo_2004_04() inherits (foo);
create index foo_2004_04_idx1 on foo_2004_04(f2);

explain analyze select * from foo where f2 = '2004-feb-15';

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------
  Result  (cost=100000000.00..100000078.38 rows=21 width=16) (actual
time=0.052..0.176 rows=1 loops=1)
    ->  Append  (cost=100000000.00..100000078.38 rows=21 width=16)
(actual time=0.041..0.159 rows=1 loops=1)
          ->  Seq Scan on foo  (cost=100000000.00..100000022.50 rows=5
width=16) (actual time=0.004..0.004 rows=0 loops=1)
                Filter: (f2 = '2004-02-15'::date)
          ->  Index Scan using foo_2004_01_idx1 on foo_2004_01 foo
(cost=0.00..17.07 rows=5 width=16) (actual time=0.012..0.012 rows=0 loops=1)
                Index Cond: (f2 = '2004-02-15'::date)
          ->  Index Scan using foo_2004_02_idx1 on foo_2004_02 foo
(cost=0.00..4.68 rows=1 width=16) (actual time=0.016..0.022 rows=1 loops=1)
                Index Cond: (f2 = '2004-02-15'::date)
          ->  Index Scan using foo_2004_03_idx1 on foo_2004_03 foo
(cost=0.00..17.07 rows=5 width=16) (actual time=0.008..0.008 rows=0 loops=1)
                Index Cond: (f2 = '2004-02-15'::date)
          ->  Index Scan using foo_2004_04_idx1 on foo_2004_04 foo
(cost=0.00..17.07 rows=5 width=16) (actual time=0.095..0.095 rows=0 loops=1)
                Index Cond: (f2 = '2004-02-15'::date)
  Total runtime: 0.443 ms
(13 rows)

For loading data, we COPY into foo, and have a trigger that redirects
the rows to the appropriate partition.

Notice that the partitions which do not contain any data of interest are
still probed for data, but since they have none it is very quick. In a
real life example I got the following results just this afternoon:

  - aggregate row count = 471,849,665
  - total number inherited tables = 216
    (many are future dated and therefore contain no data)
  - select one month's worth of data for one piece of equipment by serial
    number (49,257 rows) = 526.015 ms

Not too bad -- quick enough for my needs. BTW, this is using NFS mounted
storage (NetApp NAS).

Joe

pgsql-performance by date:

Previous
From: Greg Stark
Date:
Subject: Re: Partitioning
Next
From: Steve Atkins
Date:
Subject: Re: Data Warehouse Reevaluation - MySQL vs Postgres --