Thread: filesystem performance with lots of files
This subject has come up a couple of times just today (and it looks like one that keeps popping up). Under Linux, ext2/3 have two known weaknesses (or rather one weakness with two manifestations): searching through large objects on disk is slow. This applies both to directories (creating, opening, and deleting files when there are, or have been, lots of files in a directory) and to files (seeking to the right place in a large file). The rule of thumb I have used for years is that once files get over a few tens of megs, or directories get over a couple thousand entries, you will start slowing down.

Common places you can see this (outside of postgres):

1. Directories: mail or news storage. If you let your /var/spool/mqueue directory get large (for example on a server that can't send mail for a while, or that mail gets misconfigured on), there may only be a few files left in it after the problem is fixed, but if the directory was once large, just doing an ls on it will be slow. News servers that store each message as a separate file suffer from this as well; they work around it by using multiple layers of nested directories so that no one directory has too many files in it (navigating the layers of directories costs something too, it's all about the tradeoffs). Mail servers that use maildir (and Cyrus, which uses a similar scheme) have the same problem. To fix an oversized directory you have to create a new directory, move the files into it, and then rename the new directory over the old one. ext3 has an option to make searching directories faster (htree), but enabling it kills performance when you create files. And this doesn't help with large files.

2. Files: mbox-formatted mail files and log files. As these files get large, the process of appending to them takes more time. syslog makes this very easy to test: on a box that does synchronous syslog writes (the default for most systems using standard syslog; on Linux make sure there is not a - in front of the logfile name), time how long it takes to write a bunch of syslog messages, then make the log file large and time it again.

A few weeks ago I did a series of tests to compare different filesystems. The test was for a different purpose, so the particulars are not what I would do for testing aimed at postgres, but I think the data is relevant, and I saw major differences between different filesystems. I'll see about re-running the tests to get a complete set of benchmarks in the next few days. My tests had their times vary from 4 min to 80 min depending on the filesystem in use (ext3 with dir_index posted the worst case). What testing have other people done with different filesystems?

David Lang
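(For anyone who wants to reproduce the two effects described above, a rough bash sketch; the paths, file counts, and log file name are made up for illustration, and real numbers depend heavily on caching.)

# the directory half: operations slow down as a directory grows, and the
# directory stays slow even after it is emptied, because the directory
# file itself never shrinks on ext2/3
D=/tmp/dirtest
mkdir -p "$D"
time for i in $(seq 1 2000); do : > "$D/f$i"; done     # first 2000 creates
time for i in $(seq 2001 4000); do : > "$D/f$i"; done  # next 2000 are slower
rm "$D"/f*
time ls "$D" > /dev/null                               # still scans the big directory

# the file half, per the syslog suggestion: time a burst of messages,
# pad the log file, then time the same burst again
time for i in $(seq 1 500); do logger -p local0.info "fs test $i"; done
dd if=/dev/zero bs=1M count=1024 >> /var/log/local0.log   # hypothetical logfile
time for i in $(seq 1 500); do logger -p local0.info "fs test $i"; done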
"David Lang" <dlang@invendra.net> wrote > > a few weeks ago I did a series of tests to compare different filesystems. > the test was for a different purpose so the particulars are not what I > woud do for testing aimed at postgres, but I think the data is relavent) > and I saw major differences between different filesystems, I'll see aobut > re-running the tests to get a complete set of benchmarks in the next few > days. My tests had their times vary from 4 min to 80 min depending on the > filesystem in use (ext3 with hash_dir posted the worst case). what testing > have other people done with different filesystems? > That's good ... what benchmarks did you used? Regards, Qingqing
On Thu, 1 Dec 2005, Qingqing Zhou wrote:

> "David Lang" <dlang@invendra.net> wrote:
>>
>> A few weeks ago I did a series of tests to compare different filesystems.
>> The test was for a different purpose, so the particulars are not what I
>> would do for testing aimed at postgres, but I think the data is relevant,
>> and I saw major differences between different filesystems. I'll see about
>> re-running the tests to get a complete set of benchmarks in the next few
>> days. My tests had their times vary from 4 min to 80 min depending on the
>> filesystem in use (ext3 with dir_index posted the worst case). What
>> testing have other people done with different filesystems?
>>
>
> That's good ... what benchmarks did you use?

I was doing the testing in the context of a requirement to sync over a million small files from one machine to another (rsync would take >10 hours to do this over a 100Mb network, so I started with the question "how long would it take to do a tar-ftp-untar cycle with no smarts?"). I created 1M x 1K files in a three-deep directory tree (10 dirs / 10 dirs / 10 dirs / 1000 files each) and timed a simple copy of the tree, creating a tar of it, extracting from the tar, and copying the tarfile itself (a 1.6G file). I flushed the cache between each test with cat largefile >/dev/null (I know now that I should have unmounted and remounted between each test), with source and destination on different IDE controllers.

I don't have all the numbers readily available (and I didn't do all the tests on every filesystem), but I found that even with only 1000 files/directory ext3 had some problems, and if you enabled dir_index (the htree option) some functions would speed up, but writing lots of files would just collapse (that was the 80 min run).

I'll have to script it and re-do the tests (and when I do, I'll also set it up to run a test with far fewer, far larger files as well).

David Lang
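(For reference, a bash sketch of roughly what that test procedure looks like; the mount points, the "largefile" used to push the cache out, and the exact tar invocations are placeholders, and as noted above an unmount/remount between runs is the more honest way to clear the cache.)

SRC=/mnt/src/tree          # source and destination on different controllers
DST=/mnt/dst
BIG=/mnt/src/largefile     # something larger than RAM

# build the 10/10/10 tree with 1000 x 1K files per leaf (~1M files -- slow!)
for a in $(seq 0 9); do for b in $(seq 0 9); do for c in $(seq 0 9); do
  d="$SRC/$a/$b/$c"; mkdir -p "$d"
  for f in $(seq 1 1000); do dd if=/dev/zero of="$d/$f" bs=1k count=1 2>/dev/null; done
done; done; done

cat "$BIG" > /dev/null                       # crude cache flush
time cp -a "$SRC" "$DST/tree-copy"           # time to copy the tree

cat "$BIG" > /dev/null
time tar cf /mnt/src/tree.tar -C "$SRC" .    # time to create the tar (~1.6G)

cat "$BIG" > /dev/null
time tar xf /mnt/src/tree.tar -C "$DST"      # time to extract from the tar

cat "$BIG" > /dev/null
time cp /mnt/src/tree.tar "$DST"/            # time to copy the tarfile itself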
On Fri, 2 Dec 2005, David Lang wrote:

> I don't have all the numbers readily available (and I didn't do all the
> tests on every filesystem), but I found that even with only 1000
> files/directory ext3 had some problems, and if you enabled dir_index (the
> htree option) some functions would speed up, but writing lots of files
> would just collapse (that was the 80 min run).

Interesting. I would suggest that testing a small number of bigger files would be better if the target is a database performance comparison. By a small number I mean 10^2 - 10^3; by bigger I mean file sizes from 8K up to 1G (a PostgreSQL data file is at most this size under a normal installation).

Let's take TPC-C as an example: if we get a TPC-C database of 500 files, each at most 1G (PostgreSQL has this feature/limit in an ordinary installation), that gives us a 500G database, which is big enough for your current configuration.

Regards,
Qingqing
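(A scaled-down bash sketch of that kind of test; the directory, file count, and per-file size are placeholders to be pushed toward the 10^2-10^3 files of up to 1G being suggested.)

DIR=/mnt/test/pgfiles
mkdir -p "$DIR"
N=100            # number of data-file-sized segments
MB=100           # MB per file; scale toward 1024 (1G) as space allows

# sequential write of N files, fsync'd so the writes actually hit disk
time for i in $(seq 1 $N); do
  dd if=/dev/zero of="$DIR/seg$i" bs=1M count=$MB conv=fsync 2>/dev/null
done

# sequential read back
time for i in $(seq 1 $N); do cat "$DIR/seg$i" > /dev/null; done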
On Fri, 2 Dec 2005, Qingqing Zhou wrote:

>> I don't have all the numbers readily available (and I didn't do all the
>> tests on every filesystem), but I found that even with only 1000
>> files/directory ext3 had some problems, and if you enabled dir_index (the
>> htree option) some functions would speed up, but writing lots of files
>> would just collapse (that was the 80 min run).
>
> Interesting. I would suggest that testing a small number of bigger files
> would be better if the target is a database performance comparison. By a
> small number I mean 10^2 - 10^3; by bigger I mean file sizes from 8K up to
> 1G (a PostgreSQL data file is at most this size under a normal
> installation).

I agree. That round of tests was done on my system at home, in response to a friend who had rsync over a local LAN take >10 hours for <10G of data, but even so it generated some interesting info. I need to make a more controlled run at it, though.

> Let's take TPC-C as an example: if we get a TPC-C database of 500 files,
> each at most 1G (PostgreSQL has this feature/limit in an ordinary
> installation), that gives us a 500G database, which is big enough for your
> current configuration.
>
> Regards,
> Qingqing
David Lang wrote:
> how long would it take to do a tar-ftp-untar cycle with no smarts
Note that you can do the tarring, zipping, copying and untarring concurrently. I can't remember the exact netcat command line options, but it goes something like this:

Box1:

tar czvf - myfiles/* | netcat myserver 12345

Box2:

netcat -l -p 12345 | tar xzvf -

Not only do you gain from doing it all concurrently, but not writing a temp file means that disk seeks are reduced too if you have a one-spindle machine.

Also consider just copying the files onto a network mount (a minimal sketch follows after this message). It may not be as fast as the above, but it will be faster than rsync, which has high CPU usage and is thus not a good choice on a LAN.
Hmm, sorry this is not directly postgres anymore...
David
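(For the network-mount suggestion above, the smallest possible version is just an NFS mount plus a recursive copy; "myserver:/export/backup" and the mount point are placeholders.)

mount -t nfs myserver:/export/backup /mnt/backup
time cp -a myfiles /mnt/backup/
umount /mnt/backup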
> ext3 has an option to make searching directories faster (htree), but
> enabling it kills performance when you create files. And this doesn't
> help with large files.

The ReiserFS white paper talks about the data structure it uses to store directories (some kind of tree), and claims it's quick to both read and write. Don't forget that if you find ls slow, that could just be ls: it's ls, not the filesystem, that sorts the files into alphabetical order.
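(A quick way to separate the two effects: ls's -f option skips the sort, so comparing it against a plain ls on the same big directory, /var/spool/mqueue here just as an example, shows how much of the time is sorting versus the filesystem.)

cd /var/spool/mqueue      # or any directory with a lot of entries

time ls > /dev/null       # readdir() the whole directory, then sort the names
time ls -f > /dev/null    # readdir() only, no sorting -- if this is much
                          # faster, the time was going into ls, not the fs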
On Tue, Dec 20, 2005 at 01:26:00PM +0000, David Roussel wrote:
> Note that you can do the tarring, zipping, copying and untarring
> concurrently. I can't remember the exact netcat command line options,
> but it goes something like this:
>
> Box1:
> tar czvf - myfiles/* | netcat myserver 12345
>
> Box2:
> netcat -l -p 12345 | tar xzvf -

You can also use ssh... something like

tar -cf - blah/* | ssh machine tar -xf -

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461
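(If the network rather than the disks is the bottleneck, the same pipeline can compress in flight; "machine" and blah/* are the same placeholders as above, and both variants use standard tar and ssh options.)

# compress inside the tar stream ...
tar -czf - blah/* | ssh machine 'tar -xzf -'

# ... or let ssh do the compression instead
tar -cf - blah/* | ssh -C machine 'tar -xf -'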