Thread: how small to split a table?

how small to split a table?

From

Vivek Khera

Date:

20 March 2007, 12:02:48

I've got one logging table that is over 330 million rows to store 6
months' worth of data.  It consists of two integers and a 4-character
long string.  I have one primary key which is the two integers, and
an additional index on the second integer.

I'm planning to use inheritance to split the table into a bunch of
smaller ones by using a modulo function on one of the integers on
which we scan often.

My question is how small to make each inherited piece?  If I do
modulo 10, then each sub-table will be between 32 and 34 million rows
today based on current distribution.

If I expect to increase traffic 2 times over the next year (thus
doubling my logs) what would you recommend?

Attachment

smime.p7s

Re: how small to split a table?

From

"Heiko W.Rupp"

Date:

20 March 2007, 12:21:04

> My question is how small to make each inherited piece?  If I do
> modulo 10, then each sub-table will be between 32 and 34 million
> rows today based on current distribution.

You might try this with various sizes.
I did some testing lateley and found out that insert performance -
even if only inserting into one partition through the master
table abould halfed the speed with 4 partitions and made a 50%
increase for 2 partitions.
Please note: this is not representative in any kind!

So while it might be cool in your case to have e.g. one partition per
month, this might slow inserts down too much, so
that a different number of partitions could be better. The same
applies for queries as well (here perhaps in the other
direction).

--
Heiko W.Rupp
heiko.rupp@redhat.com, http://www.dpunkt.de/buch/3-89864-429-4.html

Re: how small to split a table?

From

Vivek Khera

Date:

20 March 2007, 13:17:15

On Mar 20, 2007, at 11:20 AM, Heiko W.Rupp wrote:

> partition through the master
> table abould halfed the speed with 4 partitions and made a 50%
> increase for 2 partitions.
> Please note: this is not representative in any kind!

I fully intend to build knowledge of the partitions into the insert
part of the logging.  Only the queries which do joins on the current
big table would use the master name.  Everything else can be trained
to go directly to the proper subtable.

Thanks for your note.  It implies to me I'm making the right choice
to build that knowledge into the system.

Attachment

smime.p7s