Thread: How filesystems matter with PostgreSQL
Some interesting data about different filesystems I tried with PostgreSQL and how it came out.

I have an application backed by PostgreSQL, using Java JDBC to access it. The tests were all done on an opensuse 11.2 64-bit machine, on the same hard drive (just ran mkfs between each test) on the same input with the same code base. All filesystems were created with the default options.

XFS (logbufs=8): ~4 hours to finish
ext4: ~1 hour 50 minutes to finish
ext3: 15 minutes to finish
ext3 on LVM: 15 minutes to finish

--
Jon Schewe | http://mtu.net/~jpschewe
If you see an attachment named signature.asc, this is my digital signature. See http://www.gnupg.org for more information.
On Friday 04 June 2010 14:17:35 Jon Schewe wrote:
> XFS (logbufs=8): ~4 hours to finish
> ext4: ~1 hour 50 minutes to finish
> ext3: 15 minutes to finish
> ext3 on LVM: 15 minutes to finish

Hi Jon,

Any chance you can do the same test with reiserfs?

Thanks,
Joost
On Friday 04 June 2010 14:17:35 Jon Schewe wrote:
> XFS (logbufs=8): ~4 hours to finish
> ext4: ~1 hour 50 minutes to finish
> ext3: 15 minutes to finish
> ext3 on LVM: 15 minutes to finish

My guess is that some of the difference comes from barrier differences. ext4 uses barriers by default, ext3 does not.

Andres
Andres Freund <andres@anarazel.de> writes:
> On Friday 04 June 2010 14:17:35 Jon Schewe wrote:
>> XFS (logbufs=8): ~4 hours to finish
>> ext4: ~1 hour 50 minutes to finish
>> ext3: 15 minutes to finish
>> ext3 on LVM: 15 minutes to finish

> My guess is that some of the difference comes from barrier differences.
> ext4 uses barriers by default, ext3 does not.

Or, to put it more clearly: the reason ext3 is fast is that it's unsafe.

            regards, tom lane
On Friday 04 June 2010 16:25:30 Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On Friday 04 June 2010 14:17:35 Jon Schewe wrote:
>>> XFS (logbufs=8): ~4 hours to finish
>>> ext4: ~1 hour 50 minutes to finish
>>> ext3: 15 minutes to finish
>>> ext3 on LVM: 15 minutes to finish
>>
>> My guess is that some of the difference comes from barrier differences.
>> ext4 uses barriers by default, ext3 does not.
>
> Or, to put it more clearly: the reason ext3 is fast is that it's unsafe.

Jon: To verify you can enable it via the barrier=1 option during mounting..

Andres
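For reference, barriers are controlled per mount; a minimal sketch of the options under discussion, with the device and mount point as placeholders rather than anything taken from this thread:

    mount -o barrier=1 /dev/sdX1 /var/lib/pgsql    # ext3 with write barriers enabled (safer, slower)
    mount -o nobarrier /dev/sdX1 /var/lib/pgsql    # ext4 with barriers disabled (faster, unsafe on power loss)

The same options can also go in the options field of the matching /etc/fstab entry.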
UFS2 w/ soft updates on FreeBSD might be an interesting addition to the list of test cases
On Fri, Jun 4, 2010 at 9:33 AM, Andres Freund <andres@anarazel.de> wrote:
> Jon: To verify you can enable it via the barrier=1 option during mounting..
I'm running on Linux, so that's not really an option here.
On 6/4/10 1:20 PM, Bryan Hinton wrote:
> UFS2 w/ soft updates on FreeBSD might be an interesting addition to the
> list of test cases
On 6/4/10 9:33 AM, Andres Freund wrote:
> On Friday 04 June 2010 16:25:30 Tom Lane wrote:
>> Or, to put it more clearly: the reason ext3 is fast is that it's unsafe.
> Jon: To verify you can enable it via the barrier=1 option during mounting..

First some details:
Linux kernel 2.6.31
postgres version: 8.4.2

More test results:
reiserfs: ~1 hour 50 minutes
ext3 barrier=1: ~15 minutes
ext4 nobarrier: ~15 minutes
jfs: ~15 minutes

--
Jon Schewe | http://mtu.net/~jpschewe
What types of journaling on each fs?
On Fri, Jun 4, 2010 at 1:26 PM, Jon Schewe <jpschewe@mtu.net> wrote:
> More test results:
> reiserfs: ~1 hour 50 minutes
> ext3 barrier=1: ~15 minutes
> ext4 nobarrier: ~15 minutes
> jfs: ~15 minutes
I just used standard mkfs for each filesystem and mounted them without options, unless otherwise specified.
On 6/4/10 1:37 PM, Bryan Hinton wrote:
> What types of journaling on each fs?
On Friday 04 June 2010 20:26:27 Jon Schewe wrote:
> ext3 barrier=1: ~15 minutes
> ext4 nobarrier: ~15 minutes

Any message in the kernel log about barriers or similar?

Andres
On 6/4/10 1:46 PM, Andres Freund wrote:
> On Friday 04 June 2010 20:26:27 Jon Schewe wrote:
>> ext3 barrier=1: ~15 minutes
>> ext4 nobarrier: ~15 minutes
> Any message in the kernel log about barriers or similar?

No.

--
Jon Schewe | http://mtu.net/~jpschewe
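If one wanted to double-check, the kernel log can be grepped for barrier-related messages; a sketch, assuming a stock syslog layout:

    dmesg | grep -i barrier
    grep -i barrier /var/log/messages

Kernels of that era log a message along the lines of "barrier-based sync failed ... disabling barriers" when a device or filesystem cannot honor them.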
Jon Schewe wrote:
> The tests were all done on an opensuse 11.2 64-bit machine,
> on the same hard drive (just ran mkfs between each test) on the same
> input with the same code base.

So no controller card, just the motherboard and a single hard drive?

If that's the case, what you've measured is which filesystems are safe because they default to flushing drive cache (the ones that take around 15 minutes) and which do not (the ones that take >= around 2 hours). You can't make ext3 flush the cache correctly no matter what you do with barriers, they just don't work on ext3 the way PostgreSQL needs them to.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us
On 06/05/2010 05:36 PM, Greg Smith wrote:
> So no controller card, just the motherboard and a single hard drive?

Correct.

> If that's the case, what you've measured is which filesystems are
> safe because they default to flushing drive cache (the ones that take
> around 15 minutes) and which do not (the ones that take >= around 2
> hours). You can't make ext3 flush the cache correctly no matter what
> you do with barriers, they just don't work on ext3 the way PostgreSQL
> needs them to.

So the 15 minute runs are doing it correctly and safely, but the slow ones are doing the wrong thing? That would imply that ext3 is the safe one. But your last statement suggests that ext3 is doing the wrong thing.
Jon Schewe wrote:
> So the 15 minute runs are doing it correctly and safely, but the slow
> ones are doing the wrong thing? That would imply that ext3 is the safe
> one. But your last statement suggests that ext3 is doing the wrong thing.

I goofed and reversed the two times when writing that. As is always the case with this sort of thing, the unsafe runs are the fast ones. ext3 does not ever do the right thing no matter how you configure it; you have to compensate for its limitations with correct hardware setup to make database writes reliable.

--
Greg Smith  2ndQuadrant US
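One common compensation on a bare SATA drive, not something raised in this thread, is to turn off the drive's volatile write cache so that an fsync'd write actually reaches the platters; a sketch, with the device name as a placeholder:

    hdparm -W0 /dev/sdX    # disable the drive's write cache: slower, but fsync means what it says

The usual alternative, which comes up below, is a battery-backed caching RAID controller.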
On 06/05/2010 05:52 PM, Greg Smith wrote:
> I goofed and reversed the two times when writing that. As is always
> the case with this sort of thing, the unsafe runs are the fast ones.
> ext3 does not ever do the right thing no matter how you configure it,
> you have to compensate for its limitations with correct hardware setup
> to make database writes reliable.

OK, so if I want the 15 minute speed, I need to give up safety (OK in this case as this is just research testing), or see if I can tune postgres better.
On Sat, Jun 5, 2010 at 5:03 PM, Jon Schewe <jpschewe@mtu.net> wrote:
> OK, so if I want the 15 minute speed, I need to give up safety (OK in
> this case as this is just research testing), or see if I can tune
> postgres better.

Or use a trustworthy hardware caching battery backed RAID controller, either in RAID mode or JBOD mode.
On 06/05/2010 06:54 PM, Scott Marlowe wrote:
> Or use a trustworthy hardware caching battery backed RAID controller,
> either in RAID mode or JBOD mode.

Right, because the real danger here is if the power goes out you can end up with a scrambled database, correct?
On Sat, Jun 5, 2010 at 5:58 PM, Jon Schewe <jpschewe@mtu.net> wrote:
> Right, because the real danger here is if the power goes out you can end
> up with a scrambled database, correct?

Correct. Assuming you can get power applied again before the battery in the RAID controller dies, it will then flush out its cache and your data will still be coherent.
On 06/05/2010 07:02 PM, Scott Marlowe wrote:
> Correct. Assuming you can get power applied again before the battery
> in the RAID controller dies, it will then flush out its cache and your
> data will still be coherent.

Or if you really don't care if your database is scrambled after a power outage you can go without the battery backed RAID controller.
On Sat, Jun 5, 2010 at 6:07 PM, Jon Schewe <jpschewe@mtu.net> wrote:
> Or if you really don't care if your database is scrambled after a power
> outage you can go without the battery backed RAID controller.

I do that all the time, on Slony replication slaves. You can use a considerably less powerful machine, IO-wise, with fsync disabled and a handful of cheap SATA drives.
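For reference, the trade-off Scott describes maps onto postgresql.conf roughly as follows (a sketch; synchronous_commit is my addition, not something mentioned in this thread):

    fsync = off                 # never force WAL to disk: fast, but a crash can corrupt the database
    #synchronous_commit = off   # milder: keeps the database consistent, but a crash can lose recent commits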
Jon Schewe wrote:
> OK, so if I want the 15 minute speed, I need to give up safety (OK in
> this case as this is just research testing), or see if I can tune
> postgres better.

Depending on your app, one more possibility would be to see if you can re-factor the application so it can do multiple writes in parallel rather than waiting for each one to complete. If I understand right, then many transactions could potentially be handled by a single fsync.
On 06/06/10 14:51, Ron Mayer wrote:
> Jon Schewe wrote:
>> OK, so if I want the 15 minute speed, I need to give up safety (OK in
>> this case as this is just research testing), or see if I can tune
>> postgres better.
>
> Depending on your app, one more possibility would be to see if you
> can re-factor the application so it can do multiple writes in parallel
> rather than waiting for each one to complete. If I understand right,
> then many transactions could potentially be handled by a single fsync.

By using a commit delay, yes (see postgresql.conf). You do open up the risk of losing transactions committed within the commit delay period, but you don't risk corruption the way you do with fsync turned off.

Sometimes you can also batch work into bigger transactions. The classic example here is the usual long stream of individual auto-committed INSERTs, which when wrapped in an explicit transaction can be vastly quicker.

-- Craig Ringer
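Since the original poster drives the database through Java JDBC, a minimal sketch of Craig's last suggestion, wrapping a long stream of INSERTs in one explicit transaction, might look like the following; the connection details, table, and column names are invented for illustration:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class BatchedInserts {
        public static void main(String[] args) throws Exception {
            // Connection details are placeholders, not taken from the thread.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/testdb", "user", "secret")) {
                // One auto-committed INSERT per row means one WAL flush per row.
                // Grouping many rows into a single explicit transaction pays
                // that cost once per batch instead.
                conn.setAutoCommit(false);
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO results (run_id, score) VALUES (?, ?)")) {
                    for (int i = 0; i < 10000; i++) {
                        ps.setInt(1, i);
                        ps.setDouble(2, Math.random());
                        ps.addBatch();             // queue the row client-side
                        if (i % 1000 == 0) {
                            ps.executeBatch();     // send a chunk to the server
                        }
                    }
                    ps.executeBatch();
                }
                conn.commit();  // one commit, and one WAL flush, for the whole batch
            }
        }
    }

The commit delay Craig mentions is a server-side knob (commit_delay in postgresql.conf) and needs no application changes.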