Thread: At what point does a big table start becoming too big?
I have a table with 40 million rows and haven't had any performance issues yet.

Are there any rules of thumb as to when a table starts getting too big?

For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc?
<p><tt><font size="2">> <br />> I have a table with 40 million rows and haven't had any performance <br />> issuesyet.<br />> <br />> Are there any rules of thumb as to when a table starts getting too big?<br />> <br />>For example, maybe if the index size is 6x the amount of ram, if the<br />> table is 10% of total disk space, etc?<br/>> <br />> <br />> -- <br />> </font></tt><br /><br /><tt><font size="2">My rule here is that a tableis too big when performance starts degrading beyond an acceptable level. </font></tt><br /><br /><tt><font size="2">Ifthe database and server are delivering consistent and acceptable performance levels despite an index being 6xRAM or a table consuming 10% of disk, then I tend to leave it be until an issue is raised.</font></tt><br /><br /><tt><fontsize="2">Cheers<br /></font></tt><br /><tt><font size="2">Martin</font></tt><font face="sans-serif">=============================================Romax Technology Limited Rutherford House Nottingham Science& Technology Park Nottingham, NG7 2PZ England Telephone numbers: +44 (0)115 951 88 00 (main) For other officelocations see: http://www.romaxtech.com/Contact ================================= =============== E-mail: info@romaxtech.comWebsite: www.romaxtech.com ================================= ================ Confidentiality StatementThis transmission is for the addressee only and contains information that is confidential and privileged. Unlessyou are the named addressee, or authorised to receive it on behalf of the addressee you may not copy or use it, ordisclose it to anyone else. If you have received this transmission in error please delete from your system and contactthe sender. Thank you for your cooperation. =================================================</font>
On Wed, Aug 22, 2012 at 4:06 PM, Nick <nboutelier@gmail.com> wrote:
> I have a table with 40 million rows and haven't had any performance issues yet.
>
> Are there any rules of thumb as to when a table starts getting too big?
>
> For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc?

Performance on a big table is going to depend very heavily on the sorts of queries executed against it. I don't think you can come up with a rule of thumb of that sort.

Best Wishes,
Chris Travers
On 2012-08-22, Nick <nboutelier@gmail.com> wrote:
> I have a table with 40 million rows and haven't had any performance issues yet.
>
> Are there any rules of thumb as to when a table starts getting too big?

When you need to run a query that needs to fetch too many rows.

> For example, maybe if the index size is 6x the amount of ram,
> if the table is 10% of total disk space, etc?

If you only need one row at a time and you have the indices for it, no size is too big; the larger they are, the more impressive indices become. O(log(n)) beats O(n) more and more as n grows.

-- 
⚂⚃ 100% natural
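To make the single-row-lookup point concrete, here is a minimal sketch; the table, columns, and values are hypothetical, not something taken from the thread:

-- A large table with a btree index covering the lookup key.
CREATE TABLE measurements (
    id        bigserial PRIMARY KEY,
    device_id integer NOT NULL,
    taken_at  timestamptz NOT NULL,
    value     double precision
);

CREATE INDEX measurements_device_taken_idx
    ON measurements (device_id, taken_at);

-- A single-row fetch remains an index scan (roughly O(log n) page visits)
-- no matter how large the table grows; EXPLAIN should show an Index Scan.
EXPLAIN ANALYZE
SELECT value
FROM   measurements
WHERE  device_id = 42
  AND  taken_at  = '2012-08-22 16:06:00+00';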
In response to "Martin French" <Martin.French@romaxtech.com>: > > > > I have a table with 40 million rows and haven't had any performance > > issues yet. > > > > Are there any rules of thumb as to when a table starts getting too big? > > > > For example, maybe if the index size is 6x the amount of ram, if the > > table is 10% of total disk space, etc? > > My rule here is that a table is too big when performance starts degrading beyond an acceptable level. The challenge there is that if you wait until performance degrades beyond an acceptable level, you've allowed yourself to get into a situation where clients are upset and frustrated, and fixing the problem is difficult because there's so much data to manipulate to rearrange things. And the advice I have along those lines is to establish now what constitutes unacceptable performance, and put some sort of monitoring and tracking in place to know what your performance degradation looks like and predict when you'll have to react. For example, a MRTG graph that runs an experimental query once a day during off hours and graphs the time it takes vs. the # of rows in the table will prove a valuable tool that can sometimes predict exactly when you'll have to change things before it becomes a problem. Other tricks work as well, such as having the application send an email any time a process takes more than 50% of the allowable maximum time. The key is to have visibility into what's going on so your guesses are at least informed. People will often point out that no monitoring or trend tracking is 100% accurate, but if it allows you to predict and plan for 90% of the future issues, you'll have that much more time available to deal with the 10% that you don't expect. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
On Wed, Aug 22, 2012 at 6:06 PM, Nick <nboutelier@gmail.com> wrote:
> I have a table with 40 million rows and haven't had any performance issues yet.
>
> Are there any rules of thumb as to when a table starts getting too big?
>
> For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc?

Well, that begs the question: ...and do what? I guess you probably mean partitioning.

Partitioning doesn't reduce index size -- it makes total index size *bigger* since you have to duplicate higher nodes in the index -- unless you can exploit the table structure around the partition so that less fields have to be indexed. Where partitioning helps is by speeding certain classes of bulk operations like deleting a bunch of rows -- maybe you can set it up so that a partition can be dropped instead for a huge efficiency win. Partitioning also helps by breaking up administrative operations such as vacuum, analyze, cluster, create index, reindex, etc. So I'd argue that it's time to start thinking about plan 'b' when you find yourself getting concerned about performance of those operations.

Partitioning aside, the way to reduce the number of rows you're dealing with is to explore reorganizing your data: classic normalization or use of arrays are a couple of examples of things you can try.

merlin
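As a sketch of the "drop a partition instead of deleting rows" case Merlin mentions: this uses declarative range partitioning, which only exists in newer PostgreSQL releases (older versions need inheritance plus triggers), and the table and column names are made up for illustration:

CREATE TABLE events (
    id         bigint,
    created_at timestamptz NOT NULL,
    payload    text
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2012_08 PARTITION OF events
    FOR VALUES FROM ('2012-08-01') TO ('2012-09-01');
CREATE TABLE events_2012_09 PARTITION OF events
    FOR VALUES FROM ('2012-09-01') TO ('2012-10-01');

-- Retiring a month of data becomes a quick catalog operation
-- instead of a row-by-row DELETE followed by vacuum work:
DROP TABLE events_2012_08;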
> > > I have a table with 40 million rows and haven't had any performance
> > > issues yet.
> > >
> > > Are there any rules of thumb as to when a table starts getting too big?
> > >
> > > For example, maybe if the index size is 6x the amount of ram, if the
> > > table is 10% of total disk space, etc?
> >
> > My rule here is that a table is too big when performance starts
> > degrading beyond an acceptable level.
>
> The challenge there is that if you wait until performance degrades
> beyond an acceptable level, you've allowed yourself to get into a
> situation where clients are upset and frustrated, and fixing the
> problem is difficult because there's so much data to manipulate to
> rearrange things.

Apologies, I could/should have phrased that better... My rule here is that a table is too big when performance starts degrading beyond a MEASURABLE level. :)

Cheers

Martin
On Thu, Aug 23, 2012 at 6:46 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Wed, Aug 22, 2012 at 6:06 PM, Nick <nboutelier@gmail.com> wrote:
>> I have a table with 40 million rows and haven't had any performance issues yet.
>>
>> Are there any rules of thumb as to when a table starts getting too big?
>>
>> For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc?
>
> Well, that begs the question: ...and do what? I guess you probably
> mean partitioning.
>
> Partitioning doesn't reduce index size -- it makes total index size
> *bigger* since you have to duplicate higher nodes in the index --
> unless you can exploit the table structure around the partition so
> that less fields have to be indexed.

Depending on the operation it may make the effective index size bigger or smaller. For example, if querying only one child table your effective index size is much smaller.

However, if you are worried about that, partial indexes rock :-D

Best Wishes,
Chris Travers
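A minimal sketch of the partial-index idea (the orders table, its columns, and the predicate are purely illustrative):

-- Index only the rows the hot query touches, e.g. not-yet-shipped orders;
-- the index stays small even as the table accumulates finished rows.
CREATE INDEX orders_unshipped_idx
    ON orders (customer_id)
    WHERE shipped_at IS NULL;

-- Queries whose WHERE clause implies the index predicate can use it:
SELECT order_id, customer_id
FROM   orders
WHERE  customer_id = 123
  AND  shipped_at IS NULL;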
Chris Travers <chris.travers@gmail.com> writes:
> On Thu, Aug 23, 2012 at 6:46 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>> Partitioning doesn't reduce index size -- it makes total index size
>> *bigger* since you have to duplicate higher nodes in the index --
>> unless you can exploit the table structure around the partition so
>> that less fields have to be indexed.

> Depending on the operation it may make the effective index size bigger
> or smaller. For example if querying only one child table your
> effective index size is much smaller.

I tend to think of it like this: partitioning means *manually* replacing the first level of index search. As such, it is almost never a win for either complexity or performance of simple searches and updates. As Merlin said, pretty much the only compelling reason to do it is if you can match up the partition boundaries with bulk tasks that are common in your application, such as dropping a month's worth of data at a time.

			regards, tom lane
On Wed, Aug 22, 2012 at 4:06 PM, Nick <nboutelier@gmail.com> wrote:
> I have a table with 40 million rows and haven't had any performance issues yet.
>
> Are there any rules of thumb as to when a table starts getting too big?

No. Assuming you decided it were "too big", what could you do about it? If there are chunks of data that you don't need anymore, why wait for the table to become too big before removing them?

And partitioning very often isn't the answer, either. There are very few problems that ill-conceived partitioning won't make worse. And there are very many problems which even the best-conceived partitioning will fail to improve. If you have one of the cases where partitioning is a good solution, don't wait for the table to become 'too big'. Just go do it.

> For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc?

If you need to be able to rapidly insert new rows in bulk, and you can't sort them before inserting because there are multiple indexes with completely different sort orders and they cover the entire key range of at least some of the indexes, then your performance will collapse long before you get to 6x the amount of RAM. But what can you do about it? Maybe partitioning will fix this, maybe it won't. If it will, why wait for a rule of thumb to be met? If it won't, what do you actually do once the rule of thumb is met?

I guess one rule of thumb I would have is: if for some reason I had to cluster or reindex the table, how long would it take to do so? If that is much longer than I can reasonably schedule as a maintenance window, I would be worried.

Cheers,
Jeff
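To put rough numbers behind the "how long would a cluster or reindex take" question, the built-in size functions give a feel for how much data such an operation would have to rewrite (the table name here is a placeholder):

-- Heap, index, and total (heap + indexes + TOAST) sizes for one table.
SELECT pg_size_pretty(pg_relation_size('measurements'))       AS heap_size,
       pg_size_pretty(pg_indexes_size('measurements'))        AS index_size,
       pg_size_pretty(pg_total_relation_size('measurements')) AS total_size;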
On Thu, Aug 23, 2012 at 6:37 AM, Bill Moran <wmoran@potentialtech.com> wrote:
> In response to "Martin French" <Martin.French@romaxtech.com>:
>> > I have a table with 40 million rows and haven't had any performance
>> > issues yet.
>> >
>> > Are there any rules of thumb as to when a table starts getting too big?
>> >
>> > For example, maybe if the index size is 6x the amount of ram, if the
>> > table is 10% of total disk space, etc?
>>
>> My rule here is that a table is too big when performance starts degrading beyond an acceptable level.
>
> The challenge there is that if you wait until performance degrades
> beyond an acceptable level, you've allowed yourself to get into a
> situation where clients are upset and frustrated, and fixing the
> problem is difficult because there's so much data to manipulate to
> rearrange things.

Yes, I agree with that.

> And the advice I have along those lines is to establish now what
> constitutes unacceptable performance, and put some sort of monitoring
> and tracking in place to know what your performance degradation looks
> like and predict when you'll have to react. For example, a MRTG
> graph that runs an experimental query once a day during off hours and
> graphs the time it takes vs. the # of rows in the table will prove
> a valuable tool that can sometimes predict exactly when you'll have
> to change things before it becomes a problem.

This seems inconsistent with your previous advice. By the time your experimental query shows a problem, you no longer have any maintenance windows left large enough to fix it. Unless your experimental query was a reindex or something non-production like that, in which case running it on a production server, even off-hours, doesn't seem like a good idea.

Cheers,
Jeff
On Thu, 23 Aug 2012 17:56:37 -0700 Jeff Janes <jeff.janes@gmail.com> wrote:
> On Thu, Aug 23, 2012 at 6:37 AM, Bill Moran <wmoran@potentialtech.com> wrote:
> > The challenge there is that if you wait until performance degrades
> > beyond an acceptable level, you've allowed yourself to get into a
> > situation where clients are upset and frustrated, and fixing the
> > problem is difficult because there's so much data to manipulate to
> > rearrange things.
>
> Yes, I agree with that.
>
> > And the advice I have along those lines is to establish now what
> > constitutes unacceptable performance, and put some sort of monitoring
> > and tracking in place to know what your performance degradation looks
> > like and predict when you'll have to react. For example, a MRTG
> > graph that runs an experimental query once a day during off hours and
> > graphs the time it takes vs. the # of rows in the table will prove
> > a valuable tool that can sometimes predict exactly when you'll have
> > to change things before it becomes a problem.
>
> This seems inconsistent with your previous advice. By the time your
> experimental query shows a problem, you no longer have any maintenance
> windows left large enough to fix it. Unless your experimental query
> was a reindex or something non-production like that, in which case
> running it on a production server, even off-hours, doesn't seem like a
> good idea.

Perhaps I didn't explain the approach sufficiently.

If you can establish something like, "This specific SELECT has to run in under 5 minutes to meet the client's expectations", you can then time how long that query takes each time it's run (by capturing that information in the application, for example ... or by running it in some automated fashion ... possibly other methods as well).

If you capture that runtime on a regular basis and put the results on a graph in concert with other relevant data, such as the number of rows in the related tables, size of the data, etc, you quickly get a good picture of how fast things are growing, and frequently you can project the line out into the future and say things like "if we don't come up with a better way to do this by Sept of next year, we're going to exceed our allowed run time." You can then take that very detailed information to business planners and point out that they need to schedule developer time _before_ then if they don't want the application to slow down below the allowable level.

Unless you work for somewhere that has unlimited resources, your time is always split between feature requests, day to day operations, firefighting, etc. In my experience, keeping things like this under control is often a matter of having enough information to justify why your optimization project is more important than whizbang feature x that marketing wants so bad. Of course, if you work somewhere with unlimited resources, you should let me know so I can send in my resume.

And none of what I'm suggesting is intended to belittle the other suggestions either -- if you know of a way to optimize the data better, why not do it now? If you can be purging old data, why wait until performance is a problem to start purging, etc. It's just another trick to have in your bag.

-- 
Bill Moran <wmoran@potentialtech.com>
Hi,

On 23 August 2012 23:37, Bill Moran <wmoran@potentialtech.com> wrote:
> And the advice I have along those lines is to establish now what
> constitutes unacceptable performance, and put some sort of monitoring
> and tracking in place to know what your performance degradation looks
> like and predict when you'll have to react. For example, a MRTG
> graph that runs an experimental query once a day during off hours and
> graphs the time it takes vs. the # of rows in the table will prove
> a valuable tool that can sometimes predict exactly when you'll have
> to change things before it becomes a problem. Other tricks work as
> well, such as having the application send an email any time a process
> takes more than 50% of the allowable maximum time.

I like to use Apdex (http://apdex.org/specs.html). You can configure your database to log the time for all statements and then calculate the Apdex score based on the last N log entries (N > 10). The Apdex score is a weighted score based on the number of data points within three zones:

0..T          Satisfied zone
T..F (= 4*T)  Tolerating zone
4T...         Frustrated zone

You can choose T (or F; then T = F/4), i.e. under normal circumstances all queries should finish under 20ms (T = 20ms, F = 4T = 80ms). The Apdex score is:

  score = (Satisfied count + Tolerating count / 2) / Total samples

You can get this number, for example, every minute and plot it using Ganglia / MRTG / ...

-- 
Ondrej Ivanic
(ondrej.ivanic@gmail.com)
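A rough sketch of that Apdex arithmetic in SQL, assuming a hypothetical query_log table with logged_at and elapsed_ms columns, T = 20 ms, and aggregate FILTER support (available only in newer PostgreSQL releases):

-- Apdex = (satisfied + tolerating / 2) / total over the last 1000 samples,
-- with T = 20 ms and F = 4T = 80 ms.
SELECT (count(*) FILTER (WHERE elapsed_ms <= 20)
        + count(*) FILTER (WHERE elapsed_ms > 20 AND elapsed_ms <= 80) / 2.0)
       / count(*) AS apdex_score
FROM  (SELECT elapsed_ms
       FROM   query_log
       ORDER  BY logged_at DESC
       LIMIT  1000) recent;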
On 23/08/12 11:06, Nick wrote:
> I have a table with 40 million rows and haven't had any performance issues yet.
>
> Are there any rules of thumb as to when a table starts getting too big?
>
> For example, maybe if the index size is 6x the amount of ram, if the table is 10% of total disk space, etc?

I think it would be good to specify the context. For example: a database required to support a ship-based anti-missile system would have far more stringent timing requirements than a database used to retrieve scientific images based on complicated criteria.

The size of records, how often they are updated/deleted, the types of queries, ... would also be useful to know.

Unfortunately it might simply be a case of "It depends..."!

Cheers,
Gavin