Thread: AW: AW: Re: New Linux xfs/reiser file systems

AW: AW: Re: New Linux xfs/reiser file systems

From
Zeugswetter Andreas SB
Date:
> > 2. The allocation time for raw devices is by far better (near
> >     instantaneous) than creating preallocated files in a
> >     fs. Providing 1 Tb of raw devices is a task of minutes,
> >     creating 1 Tb filesystems with preallocated 2 Gb files is a
> >     task of hours at best.
> 
> Filesystem dependent, surely?  Veritas' VxFS can create filesystems
> quickly, and quickly preallocate space for the files.

And are you sure that this does not create a sparse file, which is exactly 
what we do not want? Can you name one other example?

> > 3. absolute control over writes and page location (you don't want
> > interleaved pages)
> 
> As well as a filesystem, most large systems I'm familiar with use
> volume management software (VxVM, LVM, ...) and their "disks" will be
> allocated space on disk arrays.

Of course. My thinking has long since switched to volume groups and logical 
volumes. This, however, does not alter the fact that one LV can be 
regarded as one mostly contiguous block on disk for optimization 
purposes. When reading a logical volume sequentially, head movement 
will be minimal.
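
For illustration only, a minimal sketch of how an application can tell the
kernel it is about to read such a volume sequentially, so readahead is ramped
up rather than fought against (the device name is a placeholder, and
posix_fadvise() is simply the portable hint; nothing here is LVM-specific):

    /* Read a logical volume sequentially, hinting the kernel first. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/vg00/lvdata", O_RDONLY);   /* placeholder LV */
        if (fd < 0) { perror("open"); return 1; }

        /* offset 0, length 0 = "the whole thing will be read in order" */
        posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

        char buf[256 * 1024];          /* large requests, issued in order */
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            ;                          /* ... process buf here ... */

        close(fd);
        return 0;
    }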

Andreas


Re: AW: AW: Re: New Linux xfs/reiser file systems

From
Giles Lean
Date:
> > Filesystem dependent, surely?  Veritas' VxFS can create filesystems
> > quickly, and quickly preallocate space for the files.
> 
> And are you sure that this does not create a sparse file, which is exactly 
> what we do not want? Can you name one other example?

http://docs.hp.com//hpux/onlinedocs/B3929-90011/00/00/35-con.html#s3-2
   Reservation: Preallocating Space to a File
   VxFS makes it possible to preallocate space to a file at the time
   of the request rather than when data is written into the file.
   This space cannot be allocated to other files in the file system.
   VxFS prevents any unexpected out-of-space condition on the file
   system by ensuring that a file's required space will be associated
   with the file before it is required.

I can't name another example -- I'm not familiar with what IBM's JFS
or SGI's XFS filesystems are capable of doing.
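
For what it's worth, the distinction Andreas is asking about can be sketched
with the portable POSIX calls (this is not the VxFS-specific interface the HP
document describes, and the file name and size are placeholders):

    /* Sparse vs. preallocated, in portable terms. */
    #define _FILE_OFFSET_BITS 64       /* 64-bit off_t on 32-bit systems */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("datafile", O_RDWR | O_CREAT, 0600);
        if (fd < 0) { perror("open"); return 1; }

        /* Sparse: 2 Gb of apparent size, no blocks, ENOSPC possible later.
         *     ftruncate(fd, (off_t)2 * 1024 * 1024 * 1024);
         *
         * Preallocated: the filesystem reserves real blocks now, so later
         * writes cannot fail for lack of space.  On filesystems without
         * native support the library may fall back to writing zeroes,
         * which is exactly the slow case being complained about. */
        int err = posix_fallocate(fd, 0, (off_t)2 * 1024 * 1024 * 1024);
        if (err != 0)
            fprintf(stderr, "posix_fallocate failed: %d\n", err);

        close(fd);
        return 0;
    }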

> Of course. My thinking has long since switched to volume groups and logical 
> volumes. This, however, does not alter the fact that one LV can be 
> regarded as one mostly contiguous block on disk for optimization 
> purposes. When reading a logical volume sequentially, head movement 
> will be minimal.

I'm no storage guru, but I'd certainly hope that sequential reads were
"efficient" on just about any storage device.

My mild concern is that any model of storage system behaviour that
includes "head movement" is inadequate for anything but small systems,
and is challenged for them by the presence of caches everywhere.

A storage array such as those made by Hitachi and EMC will have SCSI
LUNs (aka "disks") that are sized and configured by software inside
the storage device.

Good performance on such storage systems might depend on keeping as
much work queued up to the device as possible, to let it determine
what order to service the requests in.  Attempts to minimise "head
movement" may hurt, not help.  But as I said, I'm no storage guru, and
I'm not a performance consultant either. :-)
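
To make the point concrete, a rough sketch of "keeping work up to the
device": issue a batch of reads at once and let the array complete them in
whatever order it likes (POSIX AIO is used purely as an illustration; the
device name and offsets are made up):

    /* Several outstanding reads at once; link with -lrt on some systems. */
    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define NREQ 8
    #define BLK  (64 * 1024)

    int main(void)
    {
        int fd = open("/dev/rdsk/c0t0d0", O_RDONLY);   /* placeholder */
        if (fd < 0) { perror("open"); return 1; }

        static char bufs[NREQ][BLK];
        struct aiocb cbs[NREQ];
        const struct aiocb *list[NREQ];

        for (int i = 0; i < NREQ; i++) {
            memset(&cbs[i], 0, sizeof cbs[i]);
            cbs[i].aio_fildes = fd;
            cbs[i].aio_buf    = bufs[i];
            cbs[i].aio_nbytes = BLK;
            cbs[i].aio_offset = (off_t)i * 10 * 1024 * 1024;  /* scattered */
            if (aio_read(&cbs[i]) != 0) perror("aio_read");
            list[i] = &cbs[i];
        }

        /* All eight requests are now queued; the device picks the order. */
        int done = 0;
        while (done < NREQ) {
            aio_suspend(list, NREQ, NULL);
            done = 0;
            for (int i = 0; i < NREQ; i++)
                if (aio_error(&cbs[i]) != EINPROGRESS)
                    done++;
        }

        close(fd);
        return 0;
    }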

Regards,

Giles


Re: AW: AW: Re: New Linux xfs/reiser file systems

From
test@test.com
Date:
On Tue, 8 May 2001 09:09:08 +0000 (UTC), giles@nemeton.com.au (Giles
Lean) wrote:

>Good performance on such storage systems might depend on keeping as
>much work queued up to the device as possible, to let it determine
>what order to service the requests in.  Attempts to minimise "head
>movement" may hurt, not help.

Letting the device determine the sequence of IO increases throughput
but reduces performance (how quickly any single request is serviced).

If you want the maximum throughput, so you can reduce the money you
spend on storage, you queue the requests and sort the queues based on
the minimum work required to complete the aggregated requests.

If you want performance, you put your request first and make the queue
wait. Some storage systems allow the specification of two or more
priorities so your IO can go first and everyone else's goes second.
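
Roughly, the two policies look like this (the structures are invented for
illustration; a real elevator also tracks the head's current position and
direction):

    #include <stdio.h>
    #include <stdlib.h>

    struct io_req {
        long block;       /* starting block of the request */
        int  priority;    /* 0 = normal, 1 = must go first  */
    };

    /* Throughput: one sweep in block order, minimum total head movement. */
    static int by_block(const void *a, const void *b)
    {
        const struct io_req *x = a, *y = b;
        return (x->block > y->block) - (x->block < y->block);
    }

    /* Performance: high-priority requests jump the queue, rest sweep. */
    static int by_priority_then_block(const void *a, const void *b)
    {
        const struct io_req *x = a, *y = b;
        if (x->priority != y->priority)
            return y->priority - x->priority;
        return by_block(a, b);
    }

    int main(void)
    {
        struct io_req q[] = { {900, 0}, {10, 0}, {500, 1}, {40, 0} };
        size_t n = sizeof q / sizeof q[0];

        qsort(q, n, sizeof q[0], by_block);               /* 10 40 500 900 */
        qsort(q, n, sizeof q[0], by_priority_then_block); /* 500 10 40 900 */

        for (size_t i = 0; i < n; i++)
            printf("block %ld (prio %d)\n", q[i].block, q[i].priority);
        return 0;
    }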

"lazy" page writes and all the other tricks used to keep IO in memory
have the effect of reducing writes at the expense of data being lost
during a power failure. Some storage devices were built with batteries
to allow writes to complete after power loss. If the batteries could
maintain writes for 5 seconds after power loss, writes could be held
up for nearly 5
seconds in the hope that many duplicate writes to the same location
could be dropped.
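
A toy sketch of why holding writes back pays off: pending writes are kept by
block number, so a second write to the same block just replaces the first and
only one physical write ever happens (the table, sizes and stub are all
invented for illustration):

    #include <stdio.h>
    #include <string.h>

    #define NSLOTS 1024
    #define BLKSZ  4096

    struct pending {
        int  used;                       /* 0 = slot free */
        long block;
        char data[BLKSZ];
    };

    static struct pending cache[NSLOTS];

    /* Queue a write; a duplicate block simply overwrites the pending copy.
     * (Hash collisions are ignored in this toy version.) */
    static void delayed_write(long block, const char *data)
    {
        struct pending *p = &cache[block % NSLOTS];
        p->used  = 1;
        p->block = block;
        memcpy(p->data, data, BLKSZ);
    }

    /* Called when the delay expires, or when power is about to be lost. */
    static void flush_all(void (*write_block)(long, const char *))
    {
        for (int i = 0; i < NSLOTS; i++)
            if (cache[i].used) {
                write_block(cache[i].block, cache[i].data);
                cache[i].used = 0;
            }
    }

    static void physical_write(long block, const char *data)
    {
        (void)data;
        printf("physical write of block %ld\n", block);
    }

    int main(void)
    {
        char page[BLKSZ] = {0};
        delayed_write(7, page);
        delayed_write(7, page);          /* duplicate: dropped             */
        delayed_write(9, page);
        flush_all(physical_write);       /* only two physical writes occur */
        return 0;
    }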

I know a lot of storage systems from the hardware up, and few
outperform an equivalent system where the money was focused on more
memory in the computer. Most add-on storage systems offering
"spectacular" performance make the most financial sense when they are
attached to a computer that is at a physical limit of expansion. If
you have 4 Gb on a 32 bit computer, adding a storage system with 2 Gb
of cache can be a sound investment. Adding the same 2 Gb cache to a 32
bit system expanded to just 2 Gb usually costs more than adding the
extra 2 Gb to the computer.

Once 64 bit computers with 32, 64 or 128 Gb of DDR become available,
the best approach will go back to heaps of RAM on the computer and
none on disk.

If you are looking at one of the 64 bit x86-style replacement
processors and equivalents, the best disk arrangement would be to have
no file system or operating system intervention and have the whole
disk allocated to the processor's paging function, similar to the
theory behind AS/400s and equivalents. Each disk would be on a single
fibre, serve 64 gigabytes, and be mirrored on an adjacent disk. The
only processing in the CPU would be ECC; the disk controller would
perform the RAID 1 processing and perform the IO in a pendulum sweep
pattern with just enough cache to handle one sweep. You would, of
course, need power supplies big enough to cover a few extra sweeps and
something to tell the page processing to flush everything when the
power is dropping.
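
A sketch of the "disk as nothing but backing store" idea, assuming an OS that
will let you map a raw device at all (many won't, and real single-level
storage lives in the OS, not in an application; the device name and size are
placeholders):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t size = (size_t)1 << 30;                 /* 1 Gb of the disk */
        int fd = open("/dev/rdsk/c1t0d0", O_RDWR);     /* placeholder      */
        if (fd < 0) { perror("open"); return 1; }

        /* No filesystem: the device itself is the backing store, and the
         * VM system pages chunks of it in and out on demand. */
        char *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd, 0);
        if (mem == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

        mem[0] = 42;                 /* touching a page faults it in */

        msync(mem, size, MS_SYNC);   /* "flush everything" before power loss */
        munmap(mem, size);
        close(fd);
        return 0;
    }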

When you have multiple computers in a cluster, you could build an
intermediate device to handle the page flow much the same as a network
switch.

All these technologies were tried and proven several times in the last
30 years and work perfectly when the computer's maximum address space
is larger than the total size of all open files. They worked perfectly
when people had 100 Mb databases on 200 Mb disks in systems that could
address 4 Gb. Doubling the number of bits in the address range puts 64
bit systems out in front of both disks and memory again. There are
already 128 bit and 256 bit processors in use, so systems could be
planned to stay ahead of disk design and you never have to worry about
a file system again.

The AMD slot A and Intel slot 1 could be sold the way you buy Turkish
pizza, by the foot. Just walk up to the hardware shop and ask for 300
bits of address space. Shops could have specials, like an extra 100
bits of address space for all orders over $20.