Thread: sorting big tables :(
I think soon people are going to start calling me Mr. Big...Tables...

I have a big table: 40M rows. On disk, its size is 2,090,369,024 bytes,
so about 2 gigs. On a 9 gig drive I can't sort this table. How should one
decide, based on table size, how much room is needed?

Also, this simple table, consisting of only two int4 values, is the exact
size of an equally sized table consisting of only one int2. There seems
to be too much overhead here. I realise there are extra things that have
to be saved, but I am not getting the size/performance I had hoped for...
I am starting to think this segment of the database would be better
implemented without a dbms, because it is not expected to change at all...

-Mike
> I have a big table: 40M rows. On disk, its size is 2,090,369,024 bytes,
> so about 2 gigs. On a 9 gig drive I can't sort this table. How should
> one decide, based on table size, how much room is needed?
>
> Also, this simple table, consisting of only two int4 values, is the
> exact size of an equally sized table consisting of only one int2. There
> seems to be too much overhead here.

It is taking so much disk space because it is using a TAPE sorting
method, by breaking the file into tape chunks and sorting in pieces, then
merging.

Can you try increasing your postgres -S parameter to some huge amount
like 32MB and see if that helps? It should. i.e.

    postmaster -i -B 400 $DEBUG -o '-F -S 1024' "$@" >server.log 2>&1

--
Bruce Momjian                          |  830 Blythe Avenue
maillist@candle.pha.pa.us              |  Drexel Hill, Pennsylvania 19026
  + If your life is a hard drive,      |  (610) 353-9879(w)
  + Christ can be your backup.         |  (610) 853-3000(h)
On Fri, 15 May 1998, Bruce Momjian wrote:

> > I have a big table: 40M rows. On disk, its size is 2,090,369,024
> > bytes, so about 2 gigs. On a 9 gig drive I can't sort this table. How
> > should one decide, based on table size, how much room is needed?
>
> It is taking so much disk space because it is using a TAPE sorting
> method, by breaking the file into tape chunks and sorting in pieces,
> then merging.

The files grow until I have 6 files of almost a gig each. At that point,
I start running out of space...

This TAPE sorting method: is it a simple merge sort? Do you know of a way
this could be done using constant space and no more complexity in the
algorithm? Even if it is a little slower, the DBMS could decide based on
the table size whether it should use the tape sort or another one...
Bubble sort would not be my first choice tho :)

-Mike
> > It is taking so much disk space because it is using a TAPE sorting
> > method, by breaking the file into tape chunks and sorting in pieces,
> > then merging.
>
> The files grow until I have 6 files of almost a gig each. At that
> point, I start running out of space...
>
> This TAPE sorting method: is it a simple merge sort? Do you know of a
> way this could be done using constant space and no more complexity in
> the algorithm? Even if it is a little slower, the DBMS could decide
> based on the table size whether it should use the tape sort or another
> one... Bubble sort would not be my first choice tho :)

Tape sort is a standard Knuth sort. It basically sorts in pieces, and
merges. If you don't do this, the access pattern gets very poor as you
page fault all over the file, and the cache becomes useless.

There is something optimal about having seven sort files. Not sure what
to suggest. No one has complained about this before.

--
Bruce Momjian  |  maillist@candle.pha.pa.us
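To make the tape method concrete, here is a rough sketch of the same idea
done with standard Unix tools rather than inside the backend. The file
names, sort key, and run size are made up for illustration, and the
backend's internal implementation differs in its details:

    # Illustrative external (tape-style) merge sort: break the input into
    # runs that fit in memory, sort each run, then merge the sorted runs.
    split -l 5000000 bigtable.txt run.            # writes run.aa, run.ab, ...
    for f in run.??; do
        sort -n -o "$f.sorted" "$f"               # sort each run by itself
    done
    sort -n -m run.??.sorted > bigtable.sorted    # single merge pass over all runs
    rm run.?? run.??.sorted

Note that the original input, the sorted runs, and the merged output all
coexist on disk at the end of the merge, which is where the extra space
beyond the table size goes.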
> > This TAPE sorting method: is it a simple merge sort? Do you know of a
> > way this could be done using constant space and no more complexity in
> > the algorithm?
>
> Tape sort is a standard Knuth sort. It basically sorts in pieces, and
> merges. If you don't do this, the access pattern gets very poor as you
> page fault all over the file, and the cache becomes useless.
>
> There is something optimal about having seven sort files. Not sure what
> to suggest. No one has complained about this before.

I think this is a bug. There is no reason to use more than a little bit
over three times the input size for a sort. That is: input file, run
files, output file. If we are not able to sort a 2 gig table on a 9 gig
partition, we need to fix it. I suspect we have a bug in the
implementation, but perhaps we need to look at our choice of algorithm.
In any case this problem should go on the todo list.

-dg

David Gould            dg@illustra.com            510.628.3783 or 510.305.9468
Informix Software  (No, really)          300 Lakeside Drive  Oakland, CA 94612
 "Of course, someone who knows more about this will correct me if I'm wrong,
  and someone who knows less will correct me if I'm right."
               --David Palmer (palmer@tybalt.caltech.edu)
On Sun, 17 May 1998, David Gould wrote:

> I think this is a bug. There is no reason to use more than a little bit
> over three times the input size for a sort. That is: input file, run
> files, output file. If we are not able to sort a 2 gig table on a 9 gig
> partition, we need to fix it. I suspect we have a bug in the
> implementation, but perhaps we need to look at our choice of algorithm.
> In any case this problem should go on the todo list.

Have to agree here... Michael, if you were to dump that table into a text
file, how big would it turn out to be? Much smaller than 2 gig, no? Then
perform a Unix sort on that; how long would that take? Then reload the
data...

Needing more than 7 gig to sort a 2 gig table sounds slightly off to me
as well :(

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org          secondary: scrappy@{freebsd|postgresql}.org
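For reference, the dump / Unix sort / reload workaround Marc describes
would look roughly like the sketch below. The database name, table names,
column names, and paths are all hypothetical, and the target table is
assumed to match Michael's int4, int4, int2 layout:

    # Rough sketch of the copy-out / sort / copy-in workaround.
    psql -c "COPY bigtable TO '/tmp/bigtable.txt';" mydb              # tab-delimited dump
    sort -k1,1n -k2,2n /tmp/bigtable.txt > /tmp/bigtable.sorted       # sort on the two int4 columns
    psql -c "CREATE TABLE bigtable_sorted (a int4, b int4, c int2);" mydb
    psql -c "COPY bigtable_sorted FROM '/tmp/bigtable.sorted';" mydb

The text dump carries none of the per-tuple overhead of the on-disk
table, so the intermediate files are far smaller than the table itself.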
On Sun, 17 May 1998, Bruce Momjian wrote:

> Tape sort is a standard Knuth sort. It basically sorts in pieces, and
> merges. If you don't do this, the access pattern gets very poor as you
> page fault all over the file, and the cache becomes useless.

Right. I wasn't reading the right chapter. Internal sorting is much
different from external sorting; internal suggests the use of a Quicksort
algorithm.

Marc and I discussed this over lunch. If I did a select * into, would it
not make more sense to sort the results into the resulting table rather
than into pieces that are then copied into a table? From my limited
knowledge, I think this should save 8/7 N of the space.

On this issue, I think there must be a lot more overhead than necessary.
The table consists of only

    int4, int4, int2

so I read 10 bytes/row of actual data here. Instead, 40M rows in 2 gigs
is about 50 bytes/record. What is there other than the oid (4? bytes)?

-Mike
> Marc and I discussed this over lunch. If I did a select * into, would
> it not make more sense to sort the results into the resulting table
> rather than into pieces that are then copied into a table? From my
> limited knowledge, I think this should save 8/7 N of the space.
>
> On this issue, I think there must be a lot more overhead than necessary.

Not sure if the internal tape is the same structure as a real table, but
I doubt it. I seem to remember there is less overhead.

> The table consists of only
>
>     int4, int4, int2
>
> so I read 10 bytes/row of actual data here. Instead, 40M rows in 2 gigs
> is about 50 bytes/record. What is there other than the oid (4? bytes)?

Internal stuff so it looks like a real table, even though it is a result,
I think.

--
Bruce Momjian  |  maillist@candle.pha.pa.us
On Tue, 19 May 1998, Bruce Momjian wrote:

> > Marc and I discussed this over lunch. If I did a select * into, would
> > it not make more sense to sort the results into the resulting table
> > rather than into pieces that are then copied into a table?
>
> Not sure if the internal tape is the same structure as a real table,
> but I doubt it. I seem to remember there is less overhead.

Okay...I get to jump in here with both feet and arms flailing :)

Michael and I had lunch today and talked about this, and I asked him to
send an email in to the list about it...unfortunately, he didn't
translate our chat very well for here :)

This whole thing makes absolutely no sense to me, as far as why it takes
2.5 times more space to *sort* the table than the table size itself. He
starts with a 2 gig table, and it runs out of disk space on a 9 gig file
system...

Now, looking at question 3.26 in the FAQ, we have:

    40 bytes + each row header (approximate)
    10 bytes + two int4 fields + one int2 field
     4 bytes + pointer on page to tuple
    --------
    54 bytes per row

    The data page size in PostgreSQL is 8192 (8k) bytes, so:

    8192 bytes per page
    -------------------  =  157 rows per database page (rounded up)
     54 bytes per row

    40000000 data rows
    ------------------  =  254777 database pages
    157 rows per page

    254777 database pages * 8192 bytes per page = 2,087,133,184 or ~1.9 gig

Now, as a text file, this would amount to, what...~50MB?

So, if I were to do a 'copy out' to a text file, a Unix sort and then a
'copy in', I would use up *less* disk space (by several orders of
magnitude) than doing the sort inside of PostgreSQL? Why?

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org          secondary: scrappy@{freebsd|postgresql}.org
The Hermit Hacker wrote:

> Now, as a text file, this would amount to, what...~50MB?

40M records producing a 50MB text file? How would you sort such a
*compressed* file? ;-)

> So, if I were to do a 'copy out' to a text file, a Unix sort and then a
> 'copy in', I would use up *less* disk space (by several orders of
> magnitude) than doing the sort inside of PostgreSQL?

Well, I think it might be optimised slightly. Am I right that postgres
uses heap files (i.e. they look like tables) during sorting? While this
is a merge sort, those files don't have to be table-like files. They
could certainly be variable-length records without pages (aren't they
accessed sequentially?). Moreover, we could consider packing the tape
files before writing them out if necessary. Of course this would cost
some performance, but it's better to have less performance than to be
unable to sort at all.

Last question... What's the purpose of such a big sort? If somebody gets
40M sorted records as the result of some query, what would he do with
them? Is he going to spend the next few years reading them? I mean,
isn't it better to query the database for only the necessary information
and then sort that?

Mike

--
WWW: http://www.lodz.pdi.net/~mimo    tel: Int. Acc. Code + 48 42 148340
add: Michal Mosiewicz  *  Bugaj 66 m.54  *  95-200 Pabianice  *  POLAND
On Wed, 20 May 1998, Michal Mosiewicz wrote:

> > Now, as a text file, this would amount to, what...~50MB?
>
> 40M records producing a 50MB text file? How would you sort such a
> *compressed* file? ;-)

My math off? 40M rows at 11 bytes each (2 x int4 + int2 + \n?)...
oops...ya, just off by a factor of ten. Still, 500MB is a quarter of the
size of the 2 gig file we started with...

> Last question... What's the purpose of such a big sort? If somebody
> gets 40M sorted records as the result of some query, what would he do
> with them? Is he going to spend the next few years reading them? I
> mean, isn't it better to query the database for only the necessary
> information and then sort that?

This I don't know...I never even really thought about that,
actually...Michael? :)  Only you can answer that one.
> My math off? 40M rows at 11 bytes each (2 x int4 + int2 + \n?)...
> oops...ya, just off by a factor of ten. Still, 500MB is a quarter of
> the size of the 2 gig file we started with...

Actually, my description of the use of tape files was somewhat off. The
file is sorted by putting several batches in each tape file, then reading
the batches to make another tape file with bigger batches, until there is
one tape file and one big sorted batch.

Also, if the data is already sorted, it can be done in one pass, without
making all those small batches, because of the way the data structure
sorts them in memory. Only Knuth can do the description justice, but
suffice it to say that the data can appear in up to two places at once.

This is the first time I remember someone complaining about it.

--
Bruce Momjian  |  maillist@candle.pha.pa.us
> > Last question... What's the purpose of such a big sort? If somebody
> > gets 40M sorted records as the result of some query, what would he do
> > with them? Is he going to spend the next few years reading them? I
> > mean, isn't it better to query the database for only the necessary
> > information and then sort that?
>
> This I don't know...I never even really thought about that,
> actually...Michael? :)  Only you can answer that one.

I have an idea. Can he run CLUSTER on the data? If so, the sort will not
use small batches, and the disk space during the sort will be reduced.
However, I think CLUSTER will NEVER finish on such a file unless it is
already pretty well sorted.

--
Bruce Momjian  |  maillist@candle.pha.pa.us
On Wed, 20 May 1998, Bruce Momjian wrote:

> I have an idea. Can he run CLUSTER on the data? If so, the sort will
> not use small batches, and the disk space during the sort will be
> reduced. However, I think CLUSTER will NEVER finish on such a file
> unless it is already pretty well sorted.

Okay...then we *do* have a table size limit problem? Tables that just get
too large to be manageable? Maybe this is one area we should be looking
at as far as performance is concerned?

One thing that just pop'd to mind, concerning the above CLUSTER
command...what would it take to have *auto-clustering*? Maybe provide a
means of marking a field in a table for this purpose?

One of the things that the Unix FS does is auto-defragmenting, at least
the UFS one does. Whenever the system is idle (from my understanding),
the kernel uses that time to clean up the file systems, to reduce file
system fragmentation.

This is by no means SQL92, but it would be a neat "extension"...let me
specify a "CLUSTER on" field. Then, as I'm entering data into the
database, periodically check for fragmentation of the data and clean up
accordingly. If done by the system reasonably often, it shouldn't take up
*too* much time, as most of the data should already be in order...

That would have the side benefit of speeding up the "ORDER BY" on that
field also...
> > I have an idea. Can he run CLUSTER on the data? If so, the sort will
> > not use small batches, and the disk space during the sort will be
> > reduced. However, I think CLUSTER will NEVER finish on such a file
> > unless it is already pretty well sorted.
>
> Okay...then we *do* have a table size limit problem? Tables that just
> get too large to be manageable? Maybe this is one area we should be
> looking at as far as performance is concerned?

Well, CLUSTER moves one row at a time, so if the table is very
fragmented, the code is slow because it is seeking all over the table.
See the cluster manual pages for an alternate solution that uses ORDER
BY.

> One thing that just pop'd to mind, concerning the above CLUSTER
> command...what would it take to have *auto-clustering*? Maybe provide a
> means of marking a field in a table for this purpose?

Hard to do. That's what we have indexes for.

> This is by no means SQL92, but it would be a neat "extension"...let me
> specify a "CLUSTER on" field. Then, as I'm entering data into the
> database, periodically check for fragmentation of the data and clean up
> accordingly.

We actually could have a CLUSTER ALL command that does this. No one has
implemented it yet.

--
Bruce Momjian  |  maillist@candle.pha.pa.us
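For context, the manual-page alternative Bruce mentions amounts to
rebuilding the table already sorted and swapping it into place. A rough
sketch follows, with hypothetical database, table, and column names, and
assuming your version accepts ORDER BY in SELECT ... INTO and supports
ALTER TABLE ... RENAME:

    # Sketch of the ORDER BY alternative to CLUSTER from the manual page.
    psql -c "SELECT * INTO TABLE bigtable_new FROM bigtable ORDER BY keycol;" mydb
    psql -c "DROP TABLE bigtable;" mydb
    psql -c "ALTER TABLE bigtable_new RENAME TO bigtable;" mydb

Note that the SELECT ... ORDER BY still goes through the same external
sort discussed above, so it needs the same temporary disk space.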
On Wed, 20 May 1998, The Hermit Hacker wrote:

> One of the things that the Unix FS does is auto-defragmenting, at least
> the UFS one does. Whenever the system is idle (from my understanding),
> the kernel uses that time to clean up the file systems, to reduce file
> system fragmentation.

No, that doesn't happen. The only way to eliminate fragmentation is a
dump/newfs/restore cycle. UFS does do fragmentation avoidance (which is
the reason UFS filesystems have a 10% reserve).

Tom
On Wed, 20 May 1998, Tom wrote:

> No, that doesn't happen. The only way to eliminate fragmentation is a
> dump/newfs/restore cycle. UFS does do fragmentation avoidance (which is
> the reason UFS filesystems have a 10% reserve).

Okay, then we have two different understandings of this. My understanding
was that the 10% reserve gave the OS a 'temp area' in which to move
blocks to/from so that it could defrag on the fly...

Am CC'ing this into freebsd-hackers@freebsd.org for a "third
opinion"...am willing to admit I'm wrong *grin*
> Okay, then we have two different understandings of this. My
> understanding was that the 10% reserve gave the OS a 'temp area' in
> which to move blocks to/from so that it could defrag on the fly...
>
> Am CC'ing this into freebsd-hackers@freebsd.org for a "third
> opinion"...am willing to admit I'm wrong *grin*

You are wrong.

--
Bruce Momjian  |  maillist@candle.pha.pa.us
On Wed, 20 May 1998, Bruce Momjian wrote:

> > Okay, then we have two different understandings of this. My
> > understanding was that the 10% reserve gave the OS a 'temp area' in
> > which to move blocks to/from so that it could defrag on the fly...
>
> You are wrong.

I just love short answers *roll eyes*
On Wed, May 20, 1998 at 01:17:34PM -0400, The Hermit Hacker wrote:
> On Wed, 20 May 1998, Tom wrote:
> > No, that doesn't happen. The only way to eliminate fragmentation is a
> > dump/newfs/restore cycle. UFS does do fragmentation avoidance (which
> > is the reason UFS filesystems have a 10% reserve).
>
> Okay, then we have two different understandings of this. My
> understanding was that the 10% reserve gave the OS a 'temp area' in
> which to move blocks to/from so that it could defrag on the fly...

No. What is done is (quite correctly) fragmentation avoidance. Big files
are even sometimes fragmented on purpose, to allow small files that are
written later to avoid being fragmented.

Eivind.
> Last question... What's the purpose of such a big sort? If somebody
> gets 40M sorted records as the result of some query, what would he do
> with them? Is he going to spend the next few years reading them? I
> mean, isn't it better to query the database for only the necessary
> information and then sort that?

Not all query results are for human eyes. I'd venture to say that more
queries are fed into report generators for formatting than are looked at
directly from psql.

A sort is also required in some cases where it is not explicitly
requested; for example, a GROUP BY clause. You _could_ get the data back
ungrouped, but then you'd have to pipe it to another application or
script to do the sorting and then the grouping (or perhaps group on the
fly). But then perhaps that app/script would eat all the memory or disk
space and you'd be in the same pickle as before.

darrenk
Hello!

On Wed, 20 May 1998, The Hermit Hacker wrote:

> > No, that doesn't happen. The only way to eliminate fragmentation is a
> > dump/newfs/restore cycle. UFS does do fragmentation avoidance (which
> > is the reason UFS filesystems have a 10% reserve).
>
> Okay, then we have two different understandings of this. My
> understanding was that the 10% reserve gave the OS a 'temp area' in
> which to move blocks to/from so that it could defrag on the fly...

No, you are wrong. This 10% is a reserve area for emergency situations -
for when root brings the system down to single-user and does system
maintenance.

Oleg.
----
Oleg Broytmann    http://members.tripod.com/~phd2/    phd2@earthling.net
        Programmers don't die, they just GOSUB without RETURN.
On Thu, 21 May 1998, Oleg Broytmann wrote:

> > Okay, then we have two different understandings of this. My
> > understanding was that the 10% reserve gave the OS a 'temp area' in
> > which to move blocks to/from so that it could defrag on the fly...
>
> No, you are wrong. This 10% is a reserve area for emergency situations
> - for when root brings the system down to single-user and does system
> maintenance.

Actually, on this one you are only partly right. Only root has *access*
to that extra 10%, but, as I've been corrected by several people,
including a couple on the FreeBSD list, the 10% is meant to
*prevent/reduce* fragmentation.