Thread: At what point does a big table start becoming too big?
I have a table with 40 million rows and haven't had any performance issues yet.

Are there any rules of thumb as to when a table starts getting too big?

For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc?
<p><tt><font size="2">> <br />> I have a table with 40 million rows and haven't had any performance <br />> issuesyet.<br />> <br />> Are there any rules of thumb as to when a table starts getting too big?<br />> <br />>For example, maybe if the index size is 6x the amount of ram, if the<br />> table is 10% of total disk space, etc?<br/>> <br />> <br />> -- <br />> </font></tt><br /><br /><tt><font size="2">My rule here is that a tableis too big when performance starts degrading beyond an acceptable level. </font></tt><br /><br /><tt><font size="2">Ifthe database and server are delivering consistent and acceptable performance levels despite an index being 6xRAM or a table consuming 10% of disk, then I tend to leave it be until an issue is raised.</font></tt><br /><br /><tt><fontsize="2">Cheers<br /></font></tt><br /><tt><font size="2">Martin</font></tt><font face="sans-serif">=============================================Romax Technology Limited Rutherford House Nottingham Science& Technology Park Nottingham, NG7 2PZ England Telephone numbers: +44 (0)115 951 88 00 (main) For other officelocations see: http://www.romaxtech.com/Contact ================================= =============== E-mail: info@romaxtech.comWebsite: www.romaxtech.com ================================= ================ Confidentiality StatementThis transmission is for the addressee only and contains information that is confidential and privileged. Unlessyou are the named addressee, or authorised to receive it on behalf of the addressee you may not copy or use it, ordisclose it to anyone else. If you have received this transmission in error please delete from your system and contactthe sender. Thank you for your cooperation. =================================================</font>
On Wed, Aug 22, 2012 at 4:06 PM, Nick <nboutelier@gmail.com> wrote:
> I have a table with 40 million rows and haven't had any performance issues yet.
>
> Are there any rules of thumb as to when a table starts getting too big?
>
> For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc?

Performance on a big table is going to depend very heavily on the sorts of queries executed against it. I don't think you can come up with a rule of thumb of that sort.

Best Wishes,
Chris Travers
On 2012-08-22, Nick <nboutelier@gmail.com> wrote:
> I have a table with 40 million rows and haven't had any performance issues yet.
>
> Are there any rules of thumb as to when a table starts getting too big?

When you need to run a query that needs to fetch too many rows.

> For example, maybe if the index size is 6x the amount of ram,
> if the table is 10% of total disk space, etc?

If you only need one row at a time and you have the indices for it, no size is too big; the larger they are, the more impressive indices become. O(log(n)) beats O(n) more and more as n grows.

-- 
⚂⚃ 100% natural
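To make the single-row-lookup point concrete, here is a minimal sketch; the table, columns, and values are hypothetical, not something taken from the thread:

-- A large table with a btree index covering the lookup key.
CREATE TABLE measurements (
    id        bigserial PRIMARY KEY,
    device_id integer NOT NULL,
    taken_at  timestamptz NOT NULL,
    value     double precision
);

CREATE INDEX measurements_device_taken_idx
    ON measurements (device_id, taken_at);

-- A single-row fetch remains an index scan (roughly O(log n) page visits)
-- no matter how large the table grows; EXPLAIN should show an Index Scan.
EXPLAIN ANALYZE
SELECT value
FROM   measurements
WHERE  device_id = 42
  AND  taken_at  = '2012-08-22 16:06:00+00';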
In response to "Martin French" <Martin.French@romaxtech.com>: > > > > I have a table with 40 million rows and haven't had any performance > > issues yet. > > > > Are there any rules of thumb as to when a table starts getting too big? > > > > For example, maybe if the index size is 6x the amount of ram, if the > > table is 10% of total disk space, etc? > > My rule here is that a table is too big when performance starts degrading beyond an acceptable level. The challenge there is that if you wait until performance degrades beyond an acceptable level, you've allowed yourself to get into a situation where clients are upset and frustrated, and fixing the problem is difficult because there's so much data to manipulate to rearrange things. And the advice I have along those lines is to establish now what constitutes unacceptable performance, and put some sort of monitoring and tracking in place to know what your performance degradation looks like and predict when you'll have to react. For example, a MRTG graph that runs an experimental query once a day during off hours and graphs the time it takes vs. the # of rows in the table will prove a valuable tool that can sometimes predict exactly when you'll have to change things before it becomes a problem. Other tricks work as well, such as having the application send an email any time a process takes more than 50% of the allowable maximum time. The key is to have visibility into what's going on so your guesses are at least informed. People will often point out that no monitoring or trend tracking is 100% accurate, but if it allows you to predict and plan for 90% of the future issues, you'll have that much more time available to deal with the 10% that you don't expect. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
On Wed, Aug 22, 2012 at 6:06 PM, Nick <nboutelier@gmail.com> wrote:
> I have a table with 40 million rows and haven't had any performance issues yet.
>
> Are there any rules of thumb as to when a table starts getting too big?
>
> For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc?

Well, that begs the question: ...and do what? I guess you probably mean partitioning.

Partitioning doesn't reduce index size -- it makes total index size *bigger* since you have to duplicate higher nodes in the index -- unless you can exploit the table structure around the partition so that less fields have to be indexed. Where partitioning helps is by speeding certain classes of bulk operations like deleting a bunch of rows -- maybe you can set it up so that a partition can be dropped instead for a huge efficiency win. Partitioning also helps by breaking up administrative operations such as vacuum, analyze, cluster, create index, reindex, etc. So I'd argue that it's time to start thinking about plan 'b' when you find yourself getting concerned about performance of those operations.

Partitioning aside, the way to reduce the number of rows you're dealing with is to explore reorganizing your data: classic normalization or use of arrays are a couple of examples of things you can try.

merlin
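As a sketch of the "drop a partition instead of deleting rows" case Merlin mentions: this uses declarative range partitioning, which only exists in newer PostgreSQL releases (older versions need inheritance plus triggers), and the table and column names are made up for illustration:

CREATE TABLE events (
    id         bigint,
    created_at timestamptz NOT NULL,
    payload    text
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2012_08 PARTITION OF events
    FOR VALUES FROM ('2012-08-01') TO ('2012-09-01');
CREATE TABLE events_2012_09 PARTITION OF events
    FOR VALUES FROM ('2012-09-01') TO ('2012-10-01');

-- Retiring a month of data becomes a quick catalog operation
-- instead of a row-by-row DELETE followed by vacuum work:
DROP TABLE events_2012_08;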
> > > I have a table with 40 million rows and haven't had any performance
> > > issues yet.
> > >
> > > Are there any rules of thumb as to when a table starts getting too big?
> > >
> > > For example, maybe if the index size is 6x the amount of ram, if the
> > > table is 10% of total disk space, etc?
> >
> > My rule here is that a table is too big when performance starts
> > degrading beyond an acceptable level.
>
> The challenge there is that if you wait until performance degrades
> beyond an acceptable level, you've allowed yourself to get into a
> situation where clients are upset and frustrated, and fixing the
> problem is difficult because there's so much data to manipulate to
> rearrange things.

Apologies, I could/should have phrased that better... My rule here is that a table is too big when performance starts degrading beyond a MEASURABLE level. :)

Cheers

Martin
On Thu, Aug 23, 2012 at 6:46 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Wed, Aug 22, 2012 at 6:06 PM, Nick <nboutelier@gmail.com> wrote:
>> I have a table with 40 million rows and haven't had any performance issues yet.
>>
>> Are there any rules of thumb as to when a table starts getting too big?
>>
>> For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc?
>
> Well, that begs the question: ...and do what? I guess you probably
> mean partitioning.
>
> Partitioning doesn't reduce index size -- it makes total index size
> *bigger* since you have to duplicate higher nodes in the index --
> unless you can exploit the table structure around the partition so
> that less fields have to be indexed.

Depending on the operation it may make the effective index size bigger or smaller. For example, if querying only one child table your effective index size is much smaller.

However, if you are worried about that, partial indexes rock :-D

Best Wishes,
Chris Travers
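A minimal sketch of the partial-index idea (the orders table, its columns, and the predicate are purely illustrative):

-- Index only the rows the hot query touches, e.g. not-yet-shipped orders;
-- the index stays small even as the table accumulates finished rows.
CREATE INDEX orders_unshipped_idx
    ON orders (customer_id)
    WHERE shipped_at IS NULL;

-- Queries whose WHERE clause implies the index predicate can use it:
SELECT order_id, customer_id
FROM   orders
WHERE  customer_id = 123
  AND  shipped_at IS NULL;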
Chris Travers <chris.travers@gmail.com> writes:
> On Thu, Aug 23, 2012 at 6:46 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> Partitioning doesn't reduce index size -- it makes total index size
>> *bigger* since you have to duplicate higher nodes in the index --
>> unless you can exploit the table structure around the partition so
>> that less fields have to be indexed.

> Depending on the operation it may make the effective index size bigger
> or smaller. For example if querying only one child table your
> effective index size is much smaller.

I tend to think of it like this: partitioning means *manually* replacing the first level of index search. As such, it is almost never a win for either complexity or performance of simple searches and updates. As Merlin said, pretty much the only compelling reason to do it is if you can match up the partition boundaries with bulk tasks that are common in your application, such as dropping a month's worth of data at a time.

			regards, tom lane
On Wed, Aug 22, 2012 at 4:06 PM, Nick <nboutelier@gmail.com> wrote:
> I have a table with 40 million rows and haven't had any performance issues yet.
>
> Are there any rules of thumb as to when a table starts getting too big?

No. Assuming you decided it were "too big", what could you do about it? If there are chunks of data that you don't need anymore, why wait for the table to become too big before removing them?

And partitioning very often isn't the answer, either. There are very few problems that ill-conceived partitioning won't make worse. And there are very many problems which even the best-conceived partitioning will fail to improve. If you have one of the cases where partitioning is a good solution, don't wait for the table to become 'too big'. Just go do it.

> For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc?

If you need to be able to rapidly insert new rows in bulk, and you can't sort them before inserting because there are multiple indexes with completely different sort orders and they cover the entire key range of at least some of the indexes, then your performance will collapse long before you get to 6x the amount of RAM. But what can you do about it? Maybe partitioning will fix this, maybe it won't. If it will, why wait for a rule of thumb to be met? If it won't, what do you actually do once the rule of thumb is met?

I guess one rule of thumb I would have is: if for some reason I had to cluster or reindex the table, how long would it take to do so? If that is much longer than I can reasonably schedule as a maintenance window, I would be worried.

Cheers,
Jeff
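To put rough numbers behind the "how long would a cluster or reindex take" question, the built-in size functions give a feel for how much data such an operation would have to rewrite (the table name here is a placeholder):

-- Heap, index, and total (heap + indexes + TOAST) sizes for one table.
SELECT pg_size_pretty(pg_relation_size('measurements'))       AS heap_size,
       pg_size_pretty(pg_indexes_size('measurements'))        AS index_size,
       pg_size_pretty(pg_total_relation_size('measurements')) AS total_size;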
On Thu, Aug 23, 2012 at 6:37 AM, Bill Moran <wmoran@potentialtech.com> wrote:
> In response to "Martin French" <Martin.French@romaxtech.com>:
>> > I have a table with 40 million rows and haven't had any performance
>> > issues yet.
>> >
>> > Are there any rules of thumb as to when a table starts getting too big?
>> >
>> > For example, maybe if the index size is 6x the amount of ram, if the
>> > table is 10% of total disk space, etc?
>>
>> My rule here is that a table is too big when performance starts degrading beyond an acceptable level.
>
> The challenge there is that if you wait until performance degrades
> beyond an acceptable level, you've allowed yourself to get into a
> situation where clients are upset and frustrated, and fixing the
> problem is difficult because there's so much data to manipulate to
> rearrange things.

Yes, I agree with that.

> And the advice I have along those lines is to establish now what
> constitutes unacceptable performance, and put some sort of monitoring
> and tracking in place to know what your performance degradation looks
> like and predict when you'll have to react. For example, a MRTG
> graph that runs an experimental query once a day during off hours and
> graphs the time it takes vs. the # of rows in the table will prove
> a valuable tool that can sometimes predict exactly when you'll have
> to change things before it becomes a problem.

This seems inconsistent with your previous advice. By the time your experimental query shows a problem, you no longer have any maintenance windows left large enough to fix it. Unless your experimental query was a reindex or something non-production like that, in which case running it on a production server, even off-hours, doesn't seem like a good idea.

Cheers,
Jeff
On Thu, 23 Aug 2012 17:56:37 -0700 Jeff Janes <jeff.janes@gmail.com> wrote:
> On Thu, Aug 23, 2012 at 6:37 AM, Bill Moran <wmoran@potentialtech.com> wrote:
> > The challenge there is that if you wait until performance degrades
> > beyond an acceptable level, you've allowed yourself to get into a
> > situation where clients are upset and frustrated, and fixing the
> > problem is difficult because there's so much data to manipulate to
> > rearrange things.
>
> Yes, I agree with that.
>
> > And the advice I have along those lines is to establish now what
> > constitutes unacceptable performance, and put some sort of monitoring
> > and tracking in place to know what your performance degradation looks
> > like and predict when you'll have to react. For example, a MRTG
> > graph that runs an experimental query once a day during off hours and
> > graphs the time it takes vs. the # of rows in the table will prove
> > a valuable tool that can sometimes predict exactly when you'll have
> > to change things before it becomes a problem.
>
> This seems inconsistent with your previous advice. By the time your
> experimental query shows a problem, you no longer have any maintenance
> windows left large enough to fix it. Unless your experimental query
> was a reindex or something non-production like that, in which case
> running it on a production server, even off-hours, doesn't seem like a
> good idea.

Perhaps I didn't explain the approach sufficiently.

If you can establish something like, "This specific SELECT has to run in under 5 minutes to meet the client's expectations", you can then time how long that query takes each time it's run (by capturing that information in the application, for example ... or by running it in some automated fashion ... possibly other methods as well).

If you capture that runtime on a regular basis and put the results on a graph in concert with other relevant data, such as the number of rows in the related tables, size of the data, etc, you quickly get a good picture of how fast things are growing, and frequently you can project the line out into the future and say things like "if we don't come up with a better way to do this by Sept of next year, we're going to exceed our allowed run time." You can then take that very detailed information to business planners and point out that they need to schedule developer time _before_ then if they don't want the application to slow down below the allowable level.

Unless you work for somewhere that has unlimited resources, your time is always split between feature requests, day to day operations, firefighting, etc. In my experience, keeping things like this under control is often a matter of having enough information to justify why your optimization project is more important than whizbang feature x that marketing wants so bad. Of course, if you work somewhere with unlimited resources, you should let me know so I can send in my resume.

And none of what I'm suggesting is intended to belittle the other suggestions either -- if you know of a way to optimize the data better, why not do it now? If you can be purging old data, why wait until performance is a problem to start purging, etc. It's just another trick to have in your bag.

-- 
Bill Moran <wmoran@potentialtech.com>
Hi,

On 23 August 2012 23:37, Bill Moran <wmoran@potentialtech.com> wrote:
> And the advice I have along those lines is to establish now what
> constitutes unacceptable performance, and put some sort of monitoring
> and tracking in place to know what your performance degradation looks
> like and predict when you'll have to react. For example, a MRTG
> graph that runs an experimental query once a day during off hours and
> graphs the time it takes vs. the # of rows in the table will prove
> a valuable tool that can sometimes predict exactly when you'll have
> to change things before it becomes a problem. Other tricks work as
> well, such as having the application send an email any time a process
> takes more than 50% of the allowable maximum time.

I like to use Apdex (http://apdex.org/specs.html). You can configure your database to log the time for all statements and then calculate the Apdex score based on the last N log entries (N > 10). The Apdex score is a weighted score based on the number of data points within three zones:

0..T          Satisfied zone
T..F (= 4*T)  Tolerating zone
4T...         Frustrated zone

You can choose T (or F; then T = F/4), i.e. under normal circumstances all queries should finish under 20ms (T = 20ms, F = 4T = 80ms). The Apdex score is:

  score = (Satisfied count + Tolerating count / 2) / Total samples

You can get this number, for example, every minute and plot it using Ganglia / MRTG / ...

-- 
Ondrej Ivanic
(ondrej.ivanic@gmail.com)
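A rough sketch of that Apdex arithmetic in SQL, assuming a hypothetical query_log table with logged_at and elapsed_ms columns, T = 20 ms, and aggregate FILTER support (available only in newer PostgreSQL releases):

-- Apdex = (satisfied + tolerating / 2) / total over the last 1000 samples,
-- with T = 20 ms and F = 4T = 80 ms.
SELECT (count(*) FILTER (WHERE elapsed_ms <= 20)
        + count(*) FILTER (WHERE elapsed_ms > 20 AND elapsed_ms <= 80) / 2.0)
       / count(*) AS apdex_score
FROM  (SELECT elapsed_ms
       FROM   query_log
       ORDER  BY logged_at DESC
       LIMIT  1000) recent;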
On 23/08/12 11:06, Nick wrote:
> I have a table with 40 million rows and haven't had any performance issues yet.
>
> Are there any rules of thumb as to when a table starts getting too big?
>
> For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc?

I think it would be good to specify the context. For example: a database required to support a ship-based anti-missile system would have far more stringent timing requirements than a database used to retrieve scientific images based on complicated criteria.

The size of records, how often they are updated/deleted, the types of queries, ... would also be useful to know.

Unfortunately it might simply be a case of "It depends..."!

Cheers,
Gavin