Re: Partitioned table performance - Mailing list pgsql-performance

From Stacy White
Subject Re: Partitioned table performance
Date
Msg-id 001f01c4e26c$99719680$0200a8c0@grownups
Whole thread Raw
In response to Partitioned table performance  ("Stacy White" <harsh@computer.org>)
Responses Re: Partitioned table performance
List pgsql-performance
Josh,

You're absolutely correct that the overhead becomes less significant as the
partitioning prunes more rows.  I can even see a two-partition table being
useful in some situations (e.g., a table divided into a relatively small
"recent data" partition and a much larger "historical data" partition).  The
break-even point is when your partitioning scheme prunes 20% of the rows
(assuming you're using the inheritance based scheme).

Thanks again for the reply.  So it sounds like the answer to my original
question is that it's expected that the pseudo-partitioning would introduce
a fairly significant amount of overhead.  Correct?



----- Original Message -----
From: "Josh Berkus" <josh@agliodbs.com>
To: <pgsql-performance@postgresql.org>
Cc: "Stacy White" <harsh@computer.org>
Sent: Friday, December 10, 2004 9:52 PM
Subject: Re: [PERFORM] Partitioned table performance


Stacy,

> Each set of test tables holds 1,000,000 tuples with a partition value of
> '1', and 1,000,000 with a partition value of '2'. The bar* columns are all
> set to non-null values. The 'one_big_foo' table stores all 2M rows in one
> table. 'super_foo' and 'union_foo' split the data into two tables, and use
> inheritance and union views (respectively) to tie them together, as
> described in my previous message.
>
> Query timings and 'EXPLAIN ANALYZE' results for full table scans and for
> partition scans follow:

Hmmm .... interesting.   I think you've demonstrated that
pseudo-partitioning
doesn't pay for having only 2 partitions.   Examine this:

-> Index Scan using idx_sub_foo2_partition on sub_foo2
super_foo (cost=0.00..2.01 rows=1 width=4) (actual time=0.221..0.221
rows=0 loops=1)
Index Cond: (partition = 1::numeric)
Total runtime: 15670.463 ms

As you see, even though the aggregate operation requires a seq scan, the
planner is still able to scan, and discard, sub_foo2, using its index in 0.2
seconds.  Unfortunately, super_foo still needs to contend with:

-> Append (cost=0.00..28376.79 rows=1000064 width=4) (actual
time=6.699..12072.483 rows=1000000 loops=1)

Right there, in the Append, you lose 6 seconds.   This means that
pseudo-partitioning via inheritance will become a speed gain once you can
"make up" that 6 seconds by being able to discard more partitions.   If you
want, do a test with 6 partitions instead of 2 and let us know how it comes
out.

Also, keep in mind that there are other reasons to do pseudo-partitioning
than
your example.  Data write performance, expiring partitions, and vacuum are
big reasons that can motivate partitioning even in cases when selects are
slower.

--
Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to majordomo@postgresql.org so that your
      message can get through to the mailing list cleanly


pgsql-performance by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Speeding up pg_dump
Next
From: sarlav kumar
Date:
Subject: \d output to a file