Thread: filesystem performance with lots of files
This subject has come up a couple of times just today (and it looks like one that keeps popping up). Under Linux, ext2/3 have two known weaknesses (or rather one weakness with two manifestations): searching through large objects on disk is slow. This applies both to directories (creating, opening, and deleting files when there are, or have been, lots of files in a directory) and to files (seeking to the right place in a large file). The rule of thumb I have used for years is that once files get over a few tens of megs, or directories get over a couple thousand entries, you will start slowing down.

Common places you can see this (outside of postgres):

1. Directories: mail or news storage. If you let your /var/spool/mqueue directory get large (for example on a server that can't send mail for a while, or that mail gets misconfigured on), there may only be a few files left in it after the problem is fixed, but if the directory was once large, just doing an ls on it will be slow. News servers that store each message as a separate file suffer from this as well; they work around it by using multiple layers of nested directories so that no one directory has too many files in it (navigating the layers of directories costs something too, it's all about the tradeoffs). Mail servers that use maildir (and Cyrus, which uses a similar scheme) have the same problem. To fix an oversized directory you have to create a new directory, move the files into it, and then rename the new directory over the old one. ext3 has an option to make searching directories faster (htree), but enabling it kills performance when you create files. And this doesn't help with large files.

2. Files: mbox-formatted mail files and log files. As these files get large, the process of appending to them takes more time. syslog makes this very easy to test: on a box that does synchronous syslog writes (the default for most systems using standard syslog; on Linux make sure there is not a - in front of the logfile name), time how long it takes to write a bunch of syslog messages, then make the log file large and time it again.

A few weeks ago I did a series of tests to compare different filesystems. The test was for a different purpose, so the particulars are not what I would do for testing aimed at postgres, but I think the data is relevant, and I saw major differences between different filesystems. I'll see about re-running the tests to get a complete set of benchmarks in the next few days. My tests had their times vary from 4 min to 80 min depending on the filesystem in use (ext3 with dir_index posted the worst case). What testing have other people done with different filesystems?

David Lang
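(For anyone who wants to reproduce the two effects described above, a rough bash sketch; the paths, file counts, and log file name are made up for illustration, and real numbers depend heavily on caching.)

# the directory half: operations slow down as a directory grows, and the
# directory stays slow even after it is emptied, because the directory
# file itself never shrinks on ext2/3
D=/tmp/dirtest
mkdir -p "$D"
time for i in $(seq 1 2000); do : > "$D/f$i"; done     # first 2000 creates
time for i in $(seq 2001 4000); do : > "$D/f$i"; done  # next 2000 are slower
rm "$D"/f*
time ls "$D" > /dev/null                               # still scans the big directory

# the file half, per the syslog suggestion: time a burst of messages,
# pad the log file, then time the same burst again
time for i in $(seq 1 500); do logger -p local0.info "fs test $i"; done
dd if=/dev/zero bs=1M count=1024 >> /var/log/local0.log   # hypothetical logfile
time for i in $(seq 1 500); do logger -p local0.info "fs test $i"; done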
"David Lang" <dlang@invendra.net> wrote > > a few weeks ago I did a series of tests to compare different filesystems. > the test was for a different purpose so the particulars are not what I > woud do for testing aimed at postgres, but I think the data is relavent) > and I saw major differences between different filesystems, I'll see aobut > re-running the tests to get a complete set of benchmarks in the next few > days. My tests had their times vary from 4 min to 80 min depending on the > filesystem in use (ext3 with hash_dir posted the worst case). what testing > have other people done with different filesystems? > That's good ... what benchmarks did you used? Regards, Qingqing
On Thu, 1 Dec 2005, Qingqing Zhou wrote:

> "David Lang" <dlang@invendra.net> wrote:
>>
>> A few weeks ago I did a series of tests to compare different filesystems.
>> The test was for a different purpose, so the particulars are not what I
>> would do for testing aimed at postgres, but I think the data is relevant,
>> and I saw major differences between different filesystems. I'll see about
>> re-running the tests to get a complete set of benchmarks in the next few
>> days. My tests had their times vary from 4 min to 80 min depending on the
>> filesystem in use (ext3 with dir_index posted the worst case). What
>> testing have other people done with different filesystems?
>>
>
> That's good ... what benchmarks did you use?

I was doing the testing in the context of a requirement to sync over a million small files from one machine to another (rsync would take >10 hours to do this over a 100Mb network, so I started with the question "how long would it take to do a tar-ftp-untar cycle with no smarts?"). I created 1M x 1K files in a three-deep directory tree (10 dirs / 10 dirs / 10 dirs / 1000 files each) and timed a simple copy of the tree, creating a tar of it, extracting from the tar, and copying the tarfile itself (a 1.6G file). I flushed the cache between each test with cat largefile >/dev/null (I know now that I should have unmounted and remounted between each test), with source and destination on different IDE controllers.

I don't have all the numbers readily available (and I didn't do all the tests on every filesystem), but I found that even with only 1000 files/directory ext3 had some problems, and if you enabled dir_index (the htree option) some functions would speed up, but writing lots of files would just collapse (that was the 80 min run).

I'll have to script it and re-do the tests (and when I do, I'll also set it up to run a test with far fewer, far larger files as well).

David Lang
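(For reference, a bash sketch of roughly what that test procedure looks like; the mount points, the "largefile" used to push the cache out, and the exact tar invocations are placeholders, and as noted above an unmount/remount between runs is the more honest way to clear the cache.)

SRC=/mnt/src/tree          # source and destination on different controllers
DST=/mnt/dst
BIG=/mnt/src/largefile     # something larger than RAM

# build the 10/10/10 tree with 1000 x 1K files per leaf (~1M files -- slow!)
for a in $(seq 0 9); do for b in $(seq 0 9); do for c in $(seq 0 9); do
  d="$SRC/$a/$b/$c"; mkdir -p "$d"
  for f in $(seq 1 1000); do dd if=/dev/zero of="$d/$f" bs=1k count=1 2>/dev/null; done
done; done; done

cat "$BIG" > /dev/null                       # crude cache flush
time cp -a "$SRC" "$DST/tree-copy"           # time to copy the tree

cat "$BIG" > /dev/null
time tar cf /mnt/src/tree.tar -C "$SRC" .    # time to create the tar (~1.6G)

cat "$BIG" > /dev/null
time tar xf /mnt/src/tree.tar -C "$DST"      # time to extract from the tar

cat "$BIG" > /dev/null
time cp /mnt/src/tree.tar "$DST"/            # time to copy the tarfile itself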
On Fri, 2 Dec 2005, David Lang wrote:

> I don't have all the numbers readily available (and I didn't do all the
> tests on every filesystem), but I found that even with only 1000
> files/directory ext3 had some problems, and if you enabled dir_index (the
> htree option) some functions would speed up, but writing lots of files
> would just collapse (that was the 80 min run).

Interesting. I would suggest that testing a small number of bigger files would be better if the target is a database performance comparison. By a small number I mean 10^2 - 10^3; by bigger I mean file sizes from 8K up to 1G (a PostgreSQL data file is at most this size under a normal installation).

Let's take TPC-C as an example: if we get a TPC-C database of 500 files, each at most 1G (PostgreSQL has this feature/limit in an ordinary installation), that gives us a 500G database, which is big enough for your current configuration.

Regards,
Qingqing
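(A scaled-down bash sketch of that kind of test; the directory, file count, and per-file size are placeholders to be pushed toward the 10^2-10^3 files of up to 1G being suggested.)

DIR=/mnt/test/pgfiles
mkdir -p "$DIR"
N=100            # number of data-file-sized segments
MB=100           # MB per file; scale toward 1024 (1G) as space allows

# sequential write of N files, fsync'd so the writes actually hit disk
time for i in $(seq 1 $N); do
  dd if=/dev/zero of="$DIR/seg$i" bs=1M count=$MB conv=fsync 2>/dev/null
done

# sequential read back
time for i in $(seq 1 $N); do cat "$DIR/seg$i" > /dev/null; done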
On Fri, 2 Dec 2005, Qingqing Zhou wrote:

>> I don't have all the numbers readily available (and I didn't do all the
>> tests on every filesystem), but I found that even with only 1000
>> files/directory ext3 had some problems, and if you enabled dir_index (the
>> htree option) some functions would speed up, but writing lots of files
>> would just collapse (that was the 80 min run).
>
> Interesting. I would suggest that testing a small number of bigger files
> would be better if the target is a database performance comparison. By a
> small number I mean 10^2 - 10^3; by bigger I mean file sizes from 8K up to
> 1G (a PostgreSQL data file is at most this size under a normal
> installation).

I agree. That round of tests was done on my system at home, in response to a friend who had rsync over a local LAN take >10 hours for <10G of data, but even so it generated some interesting info. I need to make a more controlled run at it, though.

> Let's take TPC-C as an example: if we get a TPC-C database of 500 files,
> each at most 1G (PostgreSQL has this feature/limit in an ordinary
> installation), that gives us a 500G database, which is big enough for your
> current configuration.
>
> Regards,
> Qingqing
David Lang wrote:
> how long would it take to do a tar-ftp-untar cycle with no smarts
Note that you can do the tarring, zipping, copying and untarring concurrently. I can't remember the exact netcat command line options, but it goes something like this:

Box1:

tar czvf - myfiles/* | netcat myserver 12345

Box2:

netcat -l -p 12345 | tar xzvf -

Not only do you gain from doing it all concurrently, but not writing a temp file means that disk seeks are reduced too if you have a one-spindle machine.

Also consider just copying the files onto a network mount (a minimal sketch follows after this message). It may not be as fast as the above, but it will be faster than rsync, which has high CPU usage and is thus not a good choice on a LAN.
Hmm, sorry this is not directly postgres anymore...
David
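(For the network-mount suggestion above, the smallest possible version is just an NFS mount plus a recursive copy; "myserver:/export/backup" and the mount point are placeholders.)

mount -t nfs myserver:/export/backup /mnt/backup
time cp -a myfiles /mnt/backup/
umount /mnt/backup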
> ext3 has an option to make searching directories faster (htree), but
> enabling it kills performance when you create files. And this doesn't
> help with large files.

The ReiserFS white paper talks about the data structure it uses to store directories (some kind of tree), and claims it's quick to both read and write. Don't forget that if you find ls slow, that could just be ls: it's ls, not the filesystem, that sorts the files into alphabetical order.
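(A quick way to separate the two effects: ls's -f option skips the sort, so comparing it against a plain ls on the same big directory, /var/spool/mqueue here just as an example, shows how much of the time is sorting versus the filesystem.)

cd /var/spool/mqueue      # or any directory with a lot of entries

time ls > /dev/null       # readdir() the whole directory, then sort the names
time ls -f > /dev/null    # readdir() only, no sorting -- if this is much
                          # faster, the time was going into ls, not the fs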
On Tue, Dec 20, 2005 at 01:26:00PM +0000, David Roussel wrote:
> Note that you can do the tarring, zipping, copying and untarring
> concurrently. I can't remember the exact netcat command line options,
> but it goes something like this:
>
> Box1:
> tar czvf - myfiles/* | netcat myserver 12345
>
> Box2:
> netcat -l -p 12345 | tar xzvf -

You can also use ssh... something like

tar -cf - blah/* | ssh machine tar -xf -

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461
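(If the network rather than the disks is the bottleneck, the same pipeline can compress in flight; "machine" and blah/* are the same placeholders as above, and both variants use standard tar and ssh options.)

# compress inside the tar stream ...
tar -czf - blah/* | ssh machine 'tar -xzf -'

# ... or let ssh do the compression instead
tar -cf - blah/* | ssh -C machine 'tar -xf -'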