Thread: which ext3 fs type should I use for postgresql

which ext3 fs type should I use for postgresql

From
"Philippe Amelant"
Date:
Hi all,
using mkfs.ext3 I can use "-T" to tune the filesystem

mkfs.ext3 -T fs_type ...

fs_type are in /etc/mke2fs.conf (on debian)

Is there a recommended setting for this parameter?

thanks

Re: which ext3 fs type should I use for postgresql

From
Matthew Wakeling
Date:
On Thu, 15 May 2008, Philippe Amelant wrote:
> using mkfs.ext3 I can use "-T" to tune the filesystem
>
> mkfs.ext3 -T fs_type ...
>
> fs_type are in /etc/mke2fs.conf (on debian)

If you look at that file, you'd see that tuning really doesn't change that
much. In fact, the only thing it does change (if you avoid "small" and
"floppy") is the number of inodes available in the filesystem. Since
Postgres tends to produce few large files, you don't need that many
inodes, so the "largefile" option may be best. However, note that the
number of inodes is a hard limit of the filesystem - if you try to create
more files on the filesystem than there are available inodes, then you
will get an out of space error even if the filesystem has space left.
The only real benefit of having not many inodes is that you waste a little
less space, so many admins are pretty generous with this setting.
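
A rough sketch of the arithmetic behind that trade-off. The ratios and the
128-byte inode size below are assumptions for illustration, not values read
from any particular /etc/mke2fs.conf, and the 500 GiB partition is
hypothetical. mke2fs also rounds per block group, so real counts will differ
somewhat.

```python
# Approximate inode counts for a given filesystem size and bytes-per-inode
# ratio (mke2fs creates roughly one inode per `ratio` bytes of space).
# ASSUMPTION: these ratios are illustrative; check your own mke2fs.conf.
ASSUMED_INODE_RATIOS = {
    "default": 16384,
    "largefile": 1024 * 1024,
    "largefile4": 4 * 1024 * 1024,
}


def estimated_inodes(fs_bytes, bytes_per_inode):
    """Approximate inode count: one inode per bytes_per_inode bytes."""
    return fs_bytes // bytes_per_inode


def inode_table_bytes(inode_count, inode_size=128):
    """Disk space used by the inode tables (inode_size is an assumption)."""
    return inode_count * inode_size


if __name__ == "__main__":
    fs_bytes = 500 * 1024**3  # hypothetical 500 GiB data partition
    for name, ratio in ASSUMED_INODE_RATIOS.items():
        count = estimated_inodes(fs_bytes, ratio)
        mib = inode_table_bytes(count) / 1024**2
        print(f"{name:>10}: ~{count} inodes, ~{mib:.0f} MiB of inode tables")
```

The space saved by a sparser ratio is a small fraction of the partition,
which is why being generous with inodes costs little.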

Probably of more use are some of the other settings:

  -m reserved-blocks-percentage - this reserves a portion of the filesystem
     that only root can write to. If root has no need for it, you can kill
     this by setting it to zero. The default is for 5% of the disc to be
     wasted.
  -j turns the filesystem into ext3 instead of ext2 - many people say that
     for Postgres you shouldn't do this, as ext2 is faster.

Matthew

--
The surest protection against temptation is cowardice.
                                              -- Mark Twain

Re: which ext3 fs type should I use for postgresql

From
david@lang.hm
Date:
On Thu, 15 May 2008, Matthew Wakeling wrote:

> On Thu, 15 May 2008, Philippe Amelant wrote:
>> using mkfs.ext3 I can use "-T" to tune the filesystem
>>
>> mkfs.ext3 -T fs_type ...
>>
>> fs_type are in /etc/mke2fs.conf (on debian)
>
> If you look at that file, you'd see that tuning really doesn't change that
> much. In fact, the only thing it does change (if you avoid "small" and
> "floppy") is the number of inodes available in the filesystem. Since Postgres
> tends to produce few large files, you don't need that many inodes, so the
> "largefile" option may be best. However, note that the number of inodes is a
> hard limit of the filesystem - if you try to create more files on the
> filesystem than there are available inodes, then you will get an out of space
> error even if the filesystem has space left.
> The only real benefit of having not many inodes is that you waste a little
> less space, so many admins are pretty generous with this setting.

IIRC postgres likes to do 1M/file, which isn't very large as far as the -T
setting goes.

> Probably of more use are some of the other settings:
>
> -m reserved-blocks-percentage - this reserves a portion of the filesystem
>    that only root can write to. If root has no need for it, you can kill
>    this by setting it to zero. The default is for 5% of the disc to be
>    wasted.

think twice about this. ext2/3 get slow when they fill up (they have
fragmentation problems when free space gets too small); the 5% that
only root can use also serves as a buffer against that.

> -j turns the filesystem into ext3 instead of ext2 - many people say that
>    for Postgres you shouldn't do this, as ext2 is faster.

for the partition with the WAL on it you may as well do ext2 (the WAL is
written synchronously and sequentially so the journal doesn't help you),
but for the data partition you may benefit from the journal.

David Lang

Re: which ext3 fs type should I use for postgresql

From
Matthew Wakeling
Date:
On Thu, 15 May 2008, david@lang.hm wrote:
> IIRC postgres likes to do 1M/file, which isn't very large as far as the -T
> setting goes.

ITYF it's actually 1GB/file.
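
To put a number on that, here is a minimal sketch of how many files one
table or index would occupy at 1GB per file. The function name is made up
for illustration, and it ignores the small catalog files and any auxiliary
files Postgres keeps alongside the data.

```python
SEGMENT_BYTES = 1024**3  # 1 GB per file, the figure quoted above


def segment_files(relation_bytes, segment_bytes=SEGMENT_BYTES):
    """Number of on-disk files for one relation of the given size.

    Uses integer ceiling division; an empty relation still has one file.
    """
    return max(1, -(-relation_bytes // segment_bytes))


if __name__ == "__main__":
    # Hypothetical 200 GB table
    print(segment_files(200 * 1024**3))
```

Even a very large database therefore needs far fewer inodes than a
general-purpose filesystem of the same size.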

> think twice about this. ext2/3 get slow when they fill up (they have
> fragmentation problems when free space gets too small); the 5% that only
> root can use also serves as a buffer against that.

It makes sense to me that the usage pattern of Postgres would be much less
susceptible to causing fragmentation than normal filesystem usage. Has
anyone actually tested this and found out?

Matthew

--
Isn't "Microsoft Works" something of a contradiction?

Re: which ext3 fs type should I use for postgresql

From
Craig James
Date:
Matthew Wakeling wrote:
> Probably of more use are some of the other settings:
>
>  -m reserved-blocks-percentage - this reserves a portion of the filesystem
>     that only root can write to. If root has no need for it, you can kill
>     this by setting it to zero. The default is for 5% of the disc to be
>     wasted.

This is not a good idea.  The 5% is NOT reserved for root's use, but rather
is to prevent severe file fragmentation.  As the disk gets full, the
remaining empty spaces tend to be small spaces scattered all over the disk,
meaning that even for modest-sized files, the kernel can't allocate
contiguous disk blocks.  If you reduce this restriction to 0%, you are
virtually guaranteed poor performance when you fill up your disk, since
those files that are allocated last will be massively fragmented.

Worse, the fragmented files that you create remain fragmented even if you
clean up to get back below the 95% mark.  If Postgres happened to insert a
lot of data on a 99% full file system, those blocks could be spread all over
the place, and they'd stay that way forever, even after you cleared some
space.
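
One way to watch for this on a mounted filesystem is sketched below. It
relies on os.statvfs, where f_bfree counts all free blocks and f_bavail
counts only those available to unprivileged users; on ext2/3 the difference
is the part of the reserve that is still unused. The function name and the
path checked are only examples.

```python
import os


def space_report(path):
    """Return (used_fraction, reserved_bytes_still_free) for path's filesystem."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    free_including_reserve = st.f_bfree * st.f_frsize
    free_for_users = st.f_bavail * st.f_frsize
    # Guard against pseudo-filesystems that report zero blocks.
    used_fraction = 1 - free_including_reserve / total if total else 0.0
    reserved_still_free = free_including_reserve - free_for_users
    return used_fraction, reserved_still_free


if __name__ == "__main__":
    used, reserved = space_report(".")  # example path; use the PG data directory
    print(f"{used:.1%} used, {reserved} bytes held in reserve")
```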

Craig

Re: which ext3 fs type should I use for postgresql

From
Guillaume Cottenceau
Date:
Craig James <craig_james 'at' emolecules.com> writes:

> Matthew Wakeling wrote:
>> Probably of more use are some of the other settings:
>>
>>  -m reserved-blocks-percentage - this reserves a portion of the filesystem
>>     that only root can write to. If root has no need for it, you can kill
>>     this by setting it to zero. The default is for 5% of the disc to be
>>     wasted.
>
> This is not a good idea.  The 5% is NOT reserved for root's
> use, but rather is to prevent severe file fragmentation.  As

Also, IIRC when PG writes data up to a full filesystem,
postmaster won't be able to then restart if the filesystem is
still full (it needs some free disk space for its startup).

Or maybe this has been fixed in recent versions?

--
Guillaume Cottenceau

Re: which ext3 fs type should I use for postgresql

From
Matthew Wakeling
Date:
On Thu, 15 May 2008, Guillaume Cottenceau wrote:
> Also, IIRC when PG writes data up to a full filesystem,
> postmaster won't be able to then restart if the filesystem is
> still full (it needs some free disk space for its startup).
>
> Or maybe this has been fixed in recent versions?

Ah, the "not enough space to delete file, delete some files and try again"
problem. Anyway, that isn't relevant to the reserved percentage, as that
will happen whether or not the filesystem is 5% smaller.

Matthew

--
Let's say I go into a field and I hear "baa baa baa". Now, how do I work
out whether that was "baa" followed by "baa baa", or if it was "baa baa"
followed by "baa"?
         - Computer Science Lecturer

Re: which ext3 fs type should I use for postgresql

From
Guillaume Cottenceau
Date:
Matthew Wakeling <matthew 'at' flymine.org> writes:

> On Thu, 15 May 2008, Guillaume Cottenceau wrote:
>> Also, IIRC when PG writes data up to a full filesystem,
>> postmaster won't be able to then restart if the filesystem is
>> still full (it needs some free disk space for its startup).
>>
>> Or maybe this has been fixed in recent versions?
>
> Ah, the "not enough space to delete file, delete some files and try
> again" problem. Anyway, that isn't relevant to the reserved
> percentage, as that will happen whether or not the filesystem is 5%
> smaller.

It is still relevant: with a 5% margin, you can temporarily change
it to 0% with tune2fs, just long enough to start PG and remove
some data via SQL, then shut down and set the margin back to 5%.
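
The sequence can be sketched as a dry-run plan. This only builds and prints
the commands, it does not run them. The device and data directory are
hypothetical placeholders, the function name is made up, and the tune2fs
steps would need to be run as root.

```python
import shlex


def recovery_plan(device, pgdata, normal_percent=5):
    """Return the commands, in order, as argv lists. Nothing is executed."""
    return [
        # Release the reserved blocks so the postgres user can write again.
        ["tune2fs", "-m", "0", device],
        # Start PG and wait for it to come up; then free space via SQL.
        ["pg_ctl", "start", "-D", pgdata, "-w"],
        # Once enough data has been removed, shut down cleanly.
        ["pg_ctl", "stop", "-D", pgdata, "-m", "fast"],
        # Put the reserve back to its normal value.
        ["tune2fs", "-m", str(normal_percent), device],
    ]


if __name__ == "__main__":
    # ASSUMPTION: placeholder device and data directory, not real paths.
    for argv in recovery_plan("/dev/sdX1", "/var/lib/postgresql/data"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```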

--
Guillaume Cottenceau

Re: which ext3 fs type should I use for postgresql

From
"Joshua D. Drake"
Date:
Guillaume Cottenceau wrote:
> Matthew Wakeling <matthew 'at' flymine.org> writes:

> It is still relevant: with a 5% margin, you can temporarily change
> it to 0% with tune2fs, just long enough to start PG and remove
> some data via SQL, then shut down and set the margin back to 5%.
>

I find that if you actually reach that level of capacity failure it is
due to lack of management and likely there is much lower hanging fruit
left over by a lazy dba or sysadmin than having to adjust filesystem
level parameters.

Manage actively and the above change is absolutely irrelevant.

Joshua D. Drake

Re: which ext3 fs type should I use for postgresql

From
Guillaume Cottenceau
Date:
"Joshua D. Drake" <jd 'at' commandprompt.com> writes:

> Guillaume Cottenceau wrote:
>> Matthew Wakeling <matthew 'at' flymine.org> writes:
>
>> It is still relevant: with a 5% margin, you can temporarily change
>> it to 0% with tune2fs, just long enough to start PG and remove
>> some data via SQL, then shut down and set the margin back to 5%.
>
> I find that if you actually reach that level of capacity failure it is
> due to lack of management and likely there is much lower hanging fruit
> left over by a lazy dba or sysadmin than having to adjust filesystem
> level parameters.
>
> Manage actively and the above change is absolutely irrelevant.

Of course. I didn't say otherwise. I'm only saying that it's useful in
that case. E.g. if you're using a dedicated partition for PG, what I
describe is a good solution, better than the horrifying option of
removing some random PG files, and it still works when you cannot
temporarily move some of them and symlink from the PG partition.
I'm not advocating getting into that situation; it should of course
be avoided by sane management. But bad management is no reason to
hide solutions to the problems that can happen!

--
Guillaume Cottenceau

Re: which ext3 fs type should I use for postgresql

From
david@lang.hm
Date:
On Thu, 15 May 2008, david@lang.hm wrote:

> On Thu, 15 May 2008, Matthew Wakeling wrote:
>
>> On Thu, 15 May 2008, Philippe Amelant wrote:
>>> using mkfs.ext3 I can use "-T" to tune the filesystem
>>>
>>> mkfs.ext3 -T fs_type ...
>>>
>>> fs_type are in /etc/mke2fs.conf (on debian)
>>
>> If you look at that file, you'd see that tuning really doesn't change that
>> much. In fact, the only thing it does change (if you avoid "small" and
>> "floppy") is the number of inodes available in the filesystem. Since
>> Postgres tends to produce few large files, you don't need that many inodes,
>> so the "largefile" option may be best. However, note that the number of
>> inodes is a hard limit of the filesystem - if you try to create more files
>> on the filesystem than there are available inodes, then you will get an out
>> of space error even if the filesystem has space left.
>> The only real benefit of having not many inodes is that you waste a little
>> less space, so many admins are pretty generous with this setting.
>
> IIRC postgres likes to do 1M/file, which isn't very large as far as the -T
> setting goes.
>
>> Probably of more use are some of the other settings:
>>
>> -m reserved-blocks-percentage - this reserves a portion of the filesystem
>>    that only root can write to. If root has no need for it, you can kill
>>    this by setting it to zero. The default is for 5% of the disc to be
>>    wasted.
>
> think twice about this. ext2/3 get slow when they fill up (they have
> fragmentation problems when free space gets too small); the 5% that only
> root can use also serves as a buffer against that.
>
>> -j turns the filesystem into ext3 instead of ext2 - many people say that
>>    for Postgres you shouldn't do this, as ext2 is faster.
>
> for the partition with the WAL on it you may as well do ext2 (the WAL is
> written synchronously and sequentially so the journal doesn't help you), but
> for the data partition you may benefit from the journal.

a fairly recent article on the subject


http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/

David Lang

Re: which ext3 fs type should I use for postgresql

From
"Scott Marlowe"
Date:
On Thu, May 15, 2008 at 9:38 AM, Joshua D. Drake <jd@commandprompt.com> wrote:
> Guillaume Cottenceau wrote:
>>
>> Matthew Wakeling <matthew 'at' flymine.org> writes:
>
>> It is still relevant: with a 5% margin, you can temporarily change
>> it to 0% with tune2fs, just long enough to start PG and remove
>> some data via SQL, then shut down and set the margin back to 5%.
>>
>
> I find that if you actually reach that level of capacity failure it is due
> to lack of management and likely there is much lower hanging fruit left over
> by a lazy dba or sysadmin than having to adjust filesystem level parameters.
>
> Manage actively and the above change is absolutely irrelevant.

Sorry, but that's like saying that open heart surgery isn't a fix for
clogged arteries because you should have been taking aspirin every day
and exercising.  It might not be the best answer, but sometimes it's
the only answer you've got.

I know that being able to drop the margin from x% to 0% for 10 minutes
has pulled more than one db back from the brink for me (usually
consulting on other people's databases, only once or so on my own) :)

Re: which ext3 fs type should I use for postgresql

From
"Joshua D. Drake"
Date:
On Fri, 16 May 2008 11:07:17 -0600
"Scott Marlowe" <scott.marlowe@gmail.com> wrote:

> Sorry, but that's like saying that open heart surgery isn't a fix for
> clogged arteries because you should have been taking aspirin every day
> and exercising.  It might not be the best answer, but sometimes it's
> the only answer you've got.
>
> I know that being able to drop the margin from x% to 0% for 10 minutes
> has pulled more than one db back from the brink for me (usually
> consulting on other people's databases, only once or so on my own) :)

My point is, if you are adjusting that parameter you probably have a
stray log or a bunch of rpms etc... that can be truncated to get
you where you need to be.

Of course there is always the last ditch effort of what you suggest but
first you should look for the more obvious possible solution.

Sincerely,

Joshua D. Drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate


