Re: Partitioned table performance - Mailing list pgsql-performance

From Stacy White
Subject Re: Partitioned table performance
Date
Msg-id 001b01c4dcde$fc27be50$0200a8c0@grownups
Whole thread Raw
In response to Partitioned table performance  ("Stacy White" <harsh@computer.org>)
Responses Re: Partitioned table performance
List pgsql-performance
Thanks for the quick reply, Josh.  Here are some more, with wider tables and
'EXPLAIN ANALYZE' output.  These tests use the same basic structure as
before, but with 20 data columns rather than just the one:

   CREATE TABLE one_big_foo (
         partition INTEGER,
         bar1 INTEGER,
         ...
         bar20 INTEGER
    )

Each set of test tables holds 1,000,000 tuples with a partition value of
'1', and 1,000,000 with a partition value of '2'.  The bar* columns are all
set to non-null values.  The 'one_big_foo' table stores all 2M rows in one
table.  'super_foo' and 'union_foo' split the data into two tables, and use
inheritance and union views (respectively) to tie them together, as
described in my previous message.

Query timings and 'EXPLAIN ANALYZE' results for full table scans and for
partition scans follow:


vod=# SELECT COUNT(*), MAX(bar1) FROM one_big_foo ;
Time: 3695.274 ms

vod=# SELECT COUNT(*), MAX(bar1) FROM super_foo ;
Time: 4641.992 ms

vod=# SELECT COUNT(*), MAX(bar1) FROM union_foo ;
Time: 16035.025 ms


vod=# SELECT COUNT(*), MAX(bar1) FROM one_big_foo WHERE partition = 1 ;
Time: 4395.274 ms

vod=# SELECT COUNT(*), MAX(bar1) FROM super_foo WHERE partition = 1 ;
Time: 3050.920 ms

vod=# SELECT COUNT(*), MAX(bar1) FROM union_foo WHERE partition = 1 ;
Time: 7468.664 ms




vod=# EXPLAIN ANALYZE SELECT COUNT(*), MAX(bar1) FROM one_big_foo ;
                                                          QUERY PLAN
----------------------------------------------------------------------------
---------------------------------------------------
 Aggregate  (cost=61747.92..61747.92 rows=1 width=4) (actual
time=18412.471..18412.474 rows=1 loops=1)
   ->  Seq Scan on one_big_foo  (cost=0.00..51747.61 rows=2000061
width=4) (actual time=0.097..10079.192 rows=2000000 loops=1)
 Total runtime: 18412.597 ms
(3 rows)

Time: 18413.919 ms


vod=# EXPLAIN ANALYZE SELECT COUNT(*), MAX(bar1) FROM super_foo ;
                                                                QUERY
PLAN
----------------------------------------------------------------------------
---------------------------------------------------------------
 Aggregate  (cost=61749.87..61749.87 rows=1 width=4) (actual
time=30267.913..30267.916 rows=1 loops=1)
   ->  Append  (cost=0.00..51749.24 rows=2000125 width=4) (actual
time=0.127..22830.610 rows=2000000 loops=1)
         ->  Seq Scan on super_foo  (cost=0.00..0.00 rows=1 width=4)
(actual time=0.005..0.005 rows=0 loops=1)
         ->  Seq Scan on sub_foo1 super_foo  (cost=0.00..25874.62
rows=1000062 width=4) (actual time=0.113..5808.899 rows=1000000 loops=1)
         ->  Seq Scan on sub_foo2 super_foo  (cost=0.00..25874.62
rows=1000062 width=4) (actual time=0.075..5829.095 rows=1000000 loops=1)
 Total runtime: 30268.061 ms
(6 rows)

Time: 30303.271 ms


vod=# EXPLAIN ANALYZE SELECT COUNT(*), MAX(bar1) FROM union_foo ;
                                                                   QUERY
PLAN
----------------------------------------------------------------------------
--------------------------------------------------------------------
 Aggregate  (cost=98573.40..98573.40 rows=1 width=4) (actual
time=62542.849..62542.852 rows=1 loops=1)
   ->  Subquery Scan union_foo  (cost=0.00..88573.20 rows=2000040
width=4) (actual time=0.130..55536.040 rows=2000000 loops=1)
         ->  Append  (cost=0.00..68572.80 rows=2000040 width=80) (actual
time=0.122..43210.763 rows=2000000 loops=1)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..34286.40
rows=1000020 width=80) (actual time=0.118..16312.708 rows=1000000
loops=1)                     ->  Seq Scan on union_foo1
(cost=0.00..24286.20 rows=1000020 width=80) (actual time=0.107..7763.460
rows=1000000 loops=1)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..34286.40
rows=1000020 width=80) (actual time=0.116..16610.387 rows=1000000
loops=1)                     ->  Seq Scan on union_foo2
(cost=0.00..24286.20 rows=1000020 width=80) (actual time=0.095..7549.522
rows=1000000 loops=1)
 Total runtime: 62543.098 ms



vod=# EXPLAIN ANALYZE SELECT COUNT(*), MAX(bar1) FROM one_big_foo WHERE
partition = 1 ;
QUERY PLAN
----------------------------------------------------------------------------
-------------------------------------------------
 Aggregate  (cost=61711.25..61711.25 rows=1 width=4) (actual
time=11592.135..11592.139 rows=1 loops=1)
   ->  Seq Scan on one_big_foo  (cost=0.00..56747.76 rows=992697
width=4) (actual time=0.106..7627.170 rows=1000000 loops=1)
         Filter: (partition = 1::numeric)
 Total runtime: 11592.264 ms
(4 rows)

Time: 11593.749 ms

vod=# EXPLAIN ANALYZE SELECT COUNT(*), MAX(bar1) FROM super_foo WHERE
partition = 1 ;

QUERY PLAN
----------------------------------------------------------------------------
---------------------------------------------------------------------------
 Aggregate  (cost=33377.11..33377.11 rows=1 width=4) (actual
time=15670.309..15670.312 rows=1 loops=1)
   ->  Append  (cost=0.00..28376.79 rows=1000064 width=4) (actual
time=6.699..12072.483 rows=1000000 loops=1)
         ->  Seq Scan on super_foo  (cost=0.00..0.00 rows=1 width=4)
(actual time=0.005..0.005 rows=0 loops=1)
               Filter: (partition = 1::numeric)
         ->  Seq Scan on sub_foo1 super_foo  (cost=0.00..28374.78
rows=1000062 width=4) (actual time=0.106..6688.812 rows=1000000 loops=1)
               Filter: (partition = 1::numeric)
         ->  Index Scan using idx_sub_foo2_partition on sub_foo2
super_foo  (cost=0.00..2.01 rows=1 width=4) (actual time=0.221..0.221
rows=0 loops=1)
               Index Cond: (partition = 1::numeric)
 Total runtime: 15670.463 ms
(9 rows)

Time: 15672.235 ms

vod=# EXPLAIN ANALYZE SELECT COUNT(*), MAX(bar1) FROM union_foo WHERE
partition = 1 ;
                                                                   QUERY
PLAN
----------------------------------------------------------------------------
--------------------------------------------------------------------
 Aggregate  (cost=98573.40..98573.40 rows=1 width=4) (actual
time=31897.629..31897.632 rows=1 loops=1)
   ->  Subquery Scan union_foo  (cost=0.00..88573.20 rows=2000040
width=4) (actual time=0.134..28323.692 rows=1000000 loops=1)
         ->  Append  (cost=0.00..68572.80 rows=2000040 width=80) (actual
time=0.125..21969.522 rows=1000000 loops=1)
               ->  Subquery Scan "*SELECT* 1"  (cost=0.00..34286.40
rows=1000020 width=80) (actual time=0.120..16867.005 rows=1000000
loops=1)
                     ->  Seq Scan on union_foo1  (cost=0.00..24286.20
rows=1000020 width=80) (actual time=0.108..8017.931 rows=1000000
loops=1)
               ->  Subquery Scan "*SELECT* 2"  (cost=0.00..34286.40
rows=1000020 width=80) (actual time=0.011..0.011 rows=0 loops=1)
                     ->  Result  (cost=0.00..24286.20 rows=1000020
width=80) (actual time=0.004..0.004 rows=0 loops=1)
                           One-Time Filter: false
                           ->  Seq Scan on union_foo2
(cost=0.00..24286.20 rows=1000020 width=80) (never executed)
 Total runtime: 31897.897 ms
(10 rows)

Time: 31900.204 ms



----- Original Message -----
From: "Josh Berkus" <josh@agliodbs.com>
To: <pgsql-performance@postgresql.org>
Cc: "Stacy White" <harsh@computer.org>
Sent: Sunday, December 05, 2004 3:06 PM
Subject: Re: [PERFORM] Partitioned table performance


Stacy,

Thanks for the stats!

> In some cases we've seen some increased performance in tests by splitting
> the table into several smaller tables. Both 'UNION ALL' views, and the
> superclass/subclass scheme work well at pruning down the set of rows a
> query uses, but they seem to introduce a large performance hit to the time
> to process each row (~50% for superclass/subclass, and ~150% for union
> views).

This seems reasonable, actually, given your test.   Really, what you should
be
comparing it against is not against selecting from an individual partition,
but selecting from the whole business as one large table.

I also suspect that wider rows results in less overhead proportionally; note
that your test contains *only* the indexed rows.   I should soon have a test
to prove this, hopefully.

However, I would be interested in seeing EXPLAIN ANALYZE from your tests
rather than just EXPLAIN.

--
Josh Berkus
Aglio Database Solutions
San Francisco


pgsql-performance by date:

Previous
From: Andrew Sullivan
Date:
Subject: Re: Config review
Next
From: Christopher Kings-Lynne
Date:
Subject: Slow insert