Thread: Re: Postgresql Performance on an HP DL385 and
Luke,
I thought so. In my test, I tried to be fair/equal since my Sun box has two 4-disc arrays each on their own channel. So, I just used one of them which should be a little slower than the 6-disc with 192MB cache.
Incidentally, the two internal SCSI drives, which are on the 6i adapter, generated a TPS of 18.
I thought this server would be impressive, based on notes I've read in the group. This is why I thought I might be doing something wrong. I'm stumped about which way to take this. There is no obvious fault, but something isn't right.
Steve
On 8/8/06, Luke Lonergan <LLonergan@greenplum.com> wrote:
Steve,
> Sun box with 4-disc array (4GB RAM. 4 167GB 10K SCSI RAID10
> LSI MegaRAID 128MB). This is after 8 runs.
>
> dbserver-dual-opteron-centos,08/08/06,Tuesday,20,us,12,2,5
> dbserver-dual-opteron-centos,08/08/06,Tuesday,20,sy,59,50,53
> dbserver-dual-opteron-centos,08/08/06,Tuesday,20,wa,1,0,0
> dbserver-dual-opteron-centos,08/08/06,Tuesday,20,id,45,26,38
>
> Average TPS is 75
>
> HP box with 8GB RAM. six disc array RAID10 on SmartArray 642
> with 192MB RAM. After 8 runs, I see:
>
> intown-vetstar-amd64,08/09/06,Tuesday,23,us,31,0,3
> intown-vetstar-amd64,08/09/06,Tuesday,23,sy,16,0,1
> intown-vetstar-amd64,08/09/06,Tuesday,23,wa,99,6,50
> intown-vetstar-amd64,08/09/06,Tuesday,23,id,78,0,42
>
> Average TPS is 31.
Note that the high, low and average I/O wait (wa) on the HP box are all
*much* higher than on the Sun box. The average I/O wait was 50% of one
CPU, which is huge. By comparison there was virtually no I/O wait on
the Sun machine.
This is indicating that your HP machine is indeed I/O bound and
furthermore is tying up a PG process waiting for the disk to return.
- Luke
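For anyone trying to reproduce this comparison, here is a minimal sketch of capturing the same CPU and I/O-wait picture during a run. It assumes the sysstat tools are installed; the interval, duration and log names are just examples, not part of Steve's setup.

#!/bin/sh
# Sample CPU states and per-device utilization for 5 minutes while the
# benchmark runs; high "wa" in vmstat plus high %util/await in iostat -x
# points at the array rather than the CPUs.
INTERVAL=5
COUNT=60
vmstat $INTERVAL $COUNT > vmstat.log &
iostat -x $INTERVAL $COUNT > iostat.log &
wait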
Luke,
I checked dmesg one more time and found this regarding the cciss driver:
Filesystem "cciss/c1d0p1": Disabling barriers, not supported by the underlying device.
Don't know if it means anything, but thought I'd mention it.
Steve
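For what it's worth, a quick way to see what the kernel decided about barriers and how the array is mounted; the device and mount names below are placeholders, not necessarily Steve's layout.

# which filesystems had write barriers disabled at mount time?
dmesg | grep -i barrier
# filesystem type and mount options actually in effect for the array
grep cciss /proc/mounts

That particular "Disabling barriers" line is typically printed by XFS when the controller/driver doesn't support cache-flush requests; for ext3 the equivalent knob is the barrier=0/1 mount option.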
On 8/8/06, Steve Poe <steve.poe@gmail.com> wrote:
Luke,
I thought so. In my test, I tried to be fair/equal since my Sun box has two 4-disc arrays each on their own channel. So, I just used one of them which should be a little slower than the 6-disc with 192MB cache.
Incidently, the two internal SCSI drives, which are on the 6i adapter, generated a TPS of 18.
I thought this server would impressive from notes I've read in the group. This is why I thought I might be doing something wrong. I stumped which way to take this. There is no obvious fault but something isn't right.
Steve
On 8/8/06, Luke Lonergan <LLonergan@greenplum.com> wrote:
Steve,
> Sun box with 4-disc array (4GB RAM. 4 167GB 10K SCSI RAID10
> LSI MegaRAID 128MB). This is after 8 runs.
>
> dbserver-dual-opteron-centos,08/08/06,Tuesday,20,us,12,2,5
> dbserver-dual-opteron-centos,08/08/06,Tuesday,20,sy,59,50,53
> dbserver-dual-opteron-centos,08/08/06,Tuesday,20,wa,1,0,0
> dbserver-dual-opteron-centos,08/08/06,Tuesday,20,id,45,26,38
>
> Average TPS is 75
>
> HP box with 8GB RAM. six disc array RAID10 on SmartArray 642
> with 192MB RAM. After 8 runs, I see:
>
> intown-vetstar-amd64,08/09/06,Tuesday,23,us,31,0,3
> intown-vetstar-amd64,08/09/06,Tuesday,23,sy,16,0,1
> intown-vetstar-amd64,08/09/06,Tuesday,23,wa,99,6,50
> intown-vetstar-amd64,08/09/06,Tuesday,23,id,78,0,42
>
> Average TPS is 31.
Note that the I/O wait (wa) on the HP box high, low and average are all
*much* higher than on the Sun box. The average I/O wait was 50% of one
CPU, which is huge. By comparison there was virtually no I/O wait on
the Sun machine.
This is indicating that your HP machine is indeed I/O bound and
furthermore is tying up a PG process waiting for the disk to return.
- Luke
Jim,
I'll give it a try. However, I did not see anywhere in the BIOS configuration of the 642 RAID adapter to enable writeback. It may have been mislabeled as "cache accelerator", where you can give a percentage to read/write. That aspect did not change the performance the way it does on the LSI MegaRAID adapter.
Steve
On 8/9/06, Jim C. Nasby <jnasby@pervasive.com> wrote:
On Tue, Aug 08, 2006 at 10:45:07PM -0700, Steve Poe wrote:
> Luke,
>
> I thought so. In my test, I tried to be fair/equal since my Sun box has two
> 4-disc arrays each on their own channel. So, I just used one of them which
> should be a little slower than the 6-disc with 192MB cache.
>
> Incidently, the two internal SCSI drives, which are on the 6i adapter,
> generated a TPS of 18.
You should try putting pg_xlog on the 6 drive array with the data. My
(limited) experience with such a config is that on a good controller
with writeback caching enabled it won't hurt you, and if the internal
drives aren't caching writes it'll probably help you a lot.
> I thought this server would impressive from notes I've read in the group.
> This is why I thought I might be doing something wrong. I stumped which way
> to take this. There is no obvious fault but something isn't right.
>
> Steve
>
> On 8/8/06, Luke Lonergan <LLonergan@greenplum.com> wrote:
> >
> >Steve,
> >
> >> Sun box with 4-disc array (4GB RAM. 4 167GB 10K SCSI RAID10
> >> LSI MegaRAID 128MB). This is after 8 runs.
> >>
> >> dbserver-dual-opteron-centos,08/08/06,Tuesday,20,us,12,2,5
> >> dbserver-dual-opteron-centos,08/08/06,Tuesday,20,sy,59,50,53
> >> dbserver-dual-opteron-centos,08/08/06,Tuesday,20,wa,1,0,0
> >> dbserver-dual-opteron-centos,08/08/06,Tuesday,20,id,45,26,38
> >>
> >> Average TPS is 75
> >>
> >> HP box with 8GB RAM. six disc array RAID10 on SmartArray 642
> >> with 192MB RAM. After 8 runs, I see:
> >>
> >> intown-vetstar-amd64,08/09/06,Tuesday,23,us,31,0,3
> >> intown-vetstar-amd64,08/09/06,Tuesday,23,sy,16,0,1
> >> intown-vetstar-amd64,08/09/06,Tuesday,23,wa,99,6,50
> >> intown-vetstar-amd64,08/09/06,Tuesday,23,id,78,0,42
> >>
> >> Average TPS is 31.
> >
> >Note that the I/O wait (wa) on the HP box high, low and average are all
> >*much* higher than on the Sun box. The average I/O wait was 50% of one
> >CPU, which is huge. By comparison there was virtually no I/O wait on
> >the Sun machine.
> >
> >This is indicating that your HP machine is indeed I/O bound and
> >furthermore is tying up a PG process waiting for the disk to return.
> >
> >- Luke
> >
> >
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
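For reference, relocating pg_xlog onto the 642 array as Jim suggests is normally just a stop/move/symlink. The paths below are examples only, not Steve's actual directories.

# stop/start as the postgres user; mkdir/chown may need root
PGDATA=/var/lib/pgsql/data
NEW=/mnt/array642/pg_xlog            # directory on the SmartArray 642 volume

pg_ctl -D "$PGDATA" stop -m fast     # shut the cluster down first
mkdir -p "$NEW" && chown postgres:postgres "$NEW"
mv "$PGDATA"/pg_xlog/* "$NEW"/
rmdir "$PGDATA"/pg_xlog
ln -s "$NEW" "$PGDATA"/pg_xlog       # postgres follows the symlink
pg_ctl -D "$PGDATA" start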
I believe it does; I'll need to check. Thanks for the correction.
Steve
On 8/9/06, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
On Wed, 2006-08-09 at 16:11, Steve Poe wrote:
> Jim,
>
> I'll give it a try. However, I did not see anywhere in the BIOS
> configuration of the 642 RAID adapter to enable writeback. It may have
> been mislabled cache accelerator where you can give a percentage to
> read/write. That aspect did not change the performance like the LSI
> MegaRAID adapter does.
Nope, that's not the same thing.
Does your RAID controller have battery-backed cache, or plain/regular
cache? Write-back is unsafe without battery backup.
The default is write through (i.e. the card waits for the data to get
written out before acking an fsync). In write back, the card's driver
writes the data to the bb cache, then returns on an fsync while the
cache gets written out at leisure. In the event of a loss of power, the
cache is flushed on restart.
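One rough way to tell which mode the card is really in is to time synchronous 8k writes on the array. This assumes a GNU dd new enough to support oflag=dsync, and the target path is a placeholder.

# write-through: expect roughly the platter rotation rate
#   (~160-250 writes/sec on 10K/15K RPM drives)
# working write-back cache: expect thousands per second
time dd if=/dev/zero of=/mnt/array642/synctest bs=8k count=1000 oflag=dsync
rm -f /mnt/array642/synctest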
Scott,
Do you know how to activate write-back on the HP RAID controller?
Steve
On 8/9/06, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
On Wed, 2006-08-09 at 16:11, Steve Poe wrote:
> Jim,
>
> I'll give it a try. However, I did not see anywhere in the BIOS
> configuration of the 642 RAID adapter to enable writeback. It may have
> been mislabled cache accelerator where you can give a percentage to
> read/write. That aspect did not change the performance like the LSI
> MegaRAID adapter does.
Nope, that's not the same thing.
Does your RAID controller have battery-backed cache, or plain/regular
cache? Write-back is unsafe without battery backup.
The default is write through (i.e. the card waits for the data to get
written out before acking an fsync). In write back, the card's driver
writes the data to the bb cache, then returns on an fsync while the
cache gets written out at leisure. In the event of a loss of power, the
cache is flushed on restart.
Jim,

I tried as you suggested and my performance dropped by 50%. I went from
a 32 TPS to 16. Oh well.

Steve

On Wed, 2006-08-09 at 16:05 -0500, Jim C. Nasby wrote:
> You should try putting pg_xlog on the 6 drive array with the data. My
> (limited) experience with such a config is that on a good controller
> with writeback caching enabled it won't hurt you, and if the internal
> drives aren't caching writes it'll probably help you a lot.
On Wed, Aug 09, 2006 at 08:29:13PM -0700, Steve Poe wrote:
> I tried as you suggested and my performance dropped by 50%. I went from
> a 32 TPS to 16. Oh well.

If you put data & xlog on the same array, put them on separate
partitions, probably formatted differently (ext2 on xlog).

Mike Stone
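A minimal sketch of that layout, with placeholder device names: a small ext2 filesystem dedicated to pg_xlog alongside the main data filesystem.

mkfs.ext2 /dev/cciss/c1d0p2          # small partition, no journal wanted here
mkdir -p /mnt/pg_xlog
mount -o noatime /dev/cciss/c1d0p2 /mnt/pg_xlog
# then stop postgres, move $PGDATA/pg_xlog onto /mnt/pg_xlog and symlink
# it back, as in the earlier relocation sketch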
Scott,
I *could* rip out the LSI MegaRAID 2X from my Sun box; it belongs to me for testing. But I don't know if it will fit in the DL385. Do they have full-height/length slots? I've not worked on this type of box before. I was thinking this is the next step. In the meantime, I've discovered there's no email support for them, so I am hoping to find a support contact through the sales rep this box was purchased from.
Steve
On 8/10/06, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
On Thu, 2006-08-10 at 10:15, Luke Lonergan wrote:
> Mike,
>
> On 8/10/06 4:09 AM, "Michael Stone" <mstone+postgres@mathom.us> wrote:
>
> > On Wed, Aug 09, 2006 at 08:29:13PM -0700, Steve Poe wrote:
> >> I tried as you suggested and my performance dropped by 50%. I went from
> >> a 32 TPS to 16. Oh well.
> >
> > If you put data & xlog on the same array, put them on seperate
> > partitions, probably formatted differently (ext2 on xlog).
>
> If he's doing the same thing on both systems (Sun and HP) and the HP
> performance is dramatically worse despite using more disks and having faster
> CPUs and more RAM, ISTM the problem isn't the configuration.
>
> Add to this the fact that the Sun machine is CPU bound while the HP is I/O
> wait bound and I think the problem is the disk hardware or the driver
> therein.
I agree. The problem here looks to be the RAID controller.
Steve, got access to a different RAID controller to test with?
Jim,
I have to say Michael is onto something here, to my surprise. I partitioned the RAID10 on the SmartArray 642 adapter into two parts: PGDATA formatted with XFS and pg_xlog as ext2. Performance jumped up to a median of 98 TPS. I could reproduce a similar result with the LSI MegaRAID 2X adapter as well as with my own 4-disc drive array.
The problem lies with the HP SmartArray 6i adapter and/or the internal SCSI discs. Putting pg_xlog on it kills the performance.
Steve
On 8/14/06, Jim C. Nasby <jnasby@pervasive.com> wrote:
On Thu, Aug 10, 2006 at 07:09:38AM -0400, Michael Stone wrote:
> On Wed, Aug 09, 2006 at 08:29:13PM -0700, Steve Poe wrote:
> >I tried as you suggested and my performance dropped by 50%. I went from
> >a 32 TPS to 16. Oh well.
>
> If you put data & xlog on the same array, put them on seperate
> partitions, probably formatted differently (ext2 on xlog).
Got any data to back that up?
The problem with seperate partitions is that it means more head movement
for the drives. If it's all one partition the pg_xlog data will tend to
be interspersed with the heap data, meaning less need for head
repositioning.
Of course, if ext2 provided enough of a speed improvement over ext3 with
data=writeback then it's possible that this would be a win, though if
the controller is good enough to make putting pg_xlog on the same array
as $PGDATA a win, I suspect it would make up for most filesystem
performance issues associated with pg_xlog as well.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Mon, Aug 14, 2006 at 10:38:41AM -0500, Jim C. Nasby wrote:
> Got any data to back that up?

yes. that I'm willing to dig out? no. :)

> The problem with seperate partitions is that it means more head movement
> for the drives. If it's all one partition the pg_xlog data will tend to
> be interspersed with the heap data, meaning less need for head
> repositioning.

The pg_xlog files will tend to be created up at the front of the disk
and just sit there. Any effect the positioning has one way or the other
isn't going to be measurable/repeatable. With a write cache for pg_xlog
the positioning isn't really going to matter anyway, since you don't
have to wait for a seek to do the write.

From what I've observed in testing, I'd guess that the issue is that
certain filesystem operations (including, possibly, metadata operations)
are handled in order. If you have xlog on a separate partition there
will never be anything competing with a log write on the server side,
which won't necessarily be true on a shared filesystem. Even if you have
a battery-backed write cache, you might still have to wait a relatively
long time for the pg_xlog data to be written out if there's already a
lot of other stuff in a filesystem write queue.

Mike Stone
On Mon, Aug 14, 2006 at 08:51:09AM -0700, Steve Poe wrote:
> Jim,
>
> I have to say Michael is onto something here to my surprise. I partitioned
> the RAID10 on the SmartArray 642 adapter into two parts, PGDATA formatted
> with XFS and pg_xlog as ext2. Performance jumped up to median of 98 TPS. I
> could reproduce the similar result with the LSI MegaRAID 2X adapter as well
> as with my own 4-disc drive array.
>
> The problem lies with the HP SmartArray 6i adapter and/or the internal SCSI
> discs. Putting the pg_xlog on it kills the performance.

Wow, interesting. IIRC, XFS is lower performing than ext3, so if your
previous tests were done with XFS, that might be part of it. But without
a doubt, if you don't have a good raid controller you don't want to try
combining pg_xlog with PGDATA.

--
Jim C. Nasby, Sr. Engineering Consultant   jnasby@pervasive.com
On Mon, Aug 14, 2006 at 12:05:46PM -0500, Jim C. Nasby wrote:
> Wow, interesting. IIRC, XFS is lower performing than ext3,

For xlog, maybe. For data, no. Both are definitely slower than ext2 for
xlog, which is another reason to have xlog on a small filesystem which
doesn't need metadata journalling.

Mike Stone
On Mon, Aug 14, 2006 at 01:03:41PM -0400, Michael Stone wrote:
> On Mon, Aug 14, 2006 at 10:38:41AM -0500, Jim C. Nasby wrote:
> > Got any data to back that up?
>
> yes. that I'm willing to dig out? no. :)

Well, I'm not digging hard numbers out either, so that's fair. :) But it
would be very handy if people posted results from any testing they're
doing as part of setting up new hardware. Actually, a wiki would
probably be ideal for this...

> The pg_xlog files will tend to be created up at the front of the disk
> and just sit there. Any affect the positioning has one way or the other
> isn't going to be measurable/repeatable. With a write cache for pg_xlog
> the positioning isn't really going to matter anyway, since you don't
> have to wait for a seek to do the write.

Certainly... my contention is that if you have a good controller that's
caching writes then drive layout basically won't matter at all, because
the controller will just magically make things optimal.

> From what I've observed in testing, I'd guess that the issue is that
> certain filesystem operations (including, possibly, metadata operations)
> are handled in order. If you have xlog on a seperate partition there
> will never be anything competing with a log write on the server side,
> which won't necessarily be true on a shared filesystem. Even if you have
> a battery backed write cache, you might still have to wait a relatively
> long time for the pg_xlog data to be written out if there's already a
> lot of other stuff in a filesystem write queue.

Well, if the controller is caching with a BBU, I'm not sure that order
matters anymore, because the controller should be able to re-order at
will. Theoretically. :) But this is why having some actual data posted
somewhere would be great.

--
Jim C. Nasby, Sr. Engineering Consultant   jnasby@pervasive.com
On Mon, Aug 14, 2006 at 01:09:04PM -0400, Michael Stone wrote:
> On Mon, Aug 14, 2006 at 12:05:46PM -0500, Jim C. Nasby wrote:
> > Wow, interesting. IIRC, XFS is lower performing than ext3,
>
> For xlog, maybe. For data, no. Both are definately slower than ext2 for
> xlog, which is another reason to have xlog on a small filesystem which
> doesn't need metadata journalling.

Are 'we' sure that such a setup can't lose any data? I'm worried about
files getting lost when they get written out before the metadata does.

--
Jim C. Nasby, Sr. Engineering Consultant   jnasby@pervasive.com
On Tue, Aug 15, 2006 at 11:25:24AM -0500, Jim C. Nasby wrote:
> Well, if the controller is caching with a BBU, I'm not sure that order
> matters anymore, because the controller should be able to re-order at
> will. Theoretically. :) But this is why having some actual data posted
> somewhere would be great.

You're missing the point. It's not a question of what happens once it
gets to the disk/controller, it's a question of whether the xlog write
has to compete with some other write activity before the write gets to
the disk (e.g., at the filesystem level). If you've got a bunch of stuff
in a write buffer on the OS level and you try to push the xlog write
out, you may have to wait for the other stuff to get to the controller
write cache before the xlog does. It doesn't matter if you don't have to
wait for the write to get from the controller cache to the disk if you
already had to wait to get to the controller cache. The effect is a
*lot* smaller than not having a non-volatile cache, but it is an
improvement.

(Also, the difference between ext2 and xfs for the xlog is pretty big
itself, and a good reason all by itself to put xlog on a separate
partition that's small enough to not need journalling.)

Mike Stone
On Tue, Aug 15, 2006 at 11:29:26AM -0500, Jim C. Nasby wrote:
> Are 'we' sure that such a setup can't lose any data?

Yes. If you check the archives, you can even find the last time this was
discussed...

The bottom line is that the only reason you need a metadata journalling
filesystem is to save the fsck time when you come up. On a little
partition like xlog, that's not an issue.

Mike Stone
On Tue, Aug 15, 2006 at 11:29:26AM -0500, Jim C. Nasby wrote:
> Are 'we' sure that such a setup can't lose any data? I'm worried about
> files getting lost when they get written out before the metadata does.

I've been worrying about this myself, and my current conclusion is that
ext2 is bad because: a) fsck, and b) data can be lost or corrupted, which
could lead to the need to trash the xlog.

Even ext3 in writeback mode allows for the indirect blocks to be updated
without the data underneath, allowing for blocks to point to random data,
or worse, previous apparently sane data (especially if the data is from
a drive only used for xlog - the chance is high that a block might look
partially valid?).

So, I'm sticking with ext3 in ordered mode.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com
On Tue, Aug 15, 2006 at 01:26:46PM -0400, Michael Stone wrote:
> On Tue, Aug 15, 2006 at 11:29:26AM -0500, Jim C. Nasby wrote:
> > Are 'we' sure that such a setup can't lose any data?
> Yes. If you check the archives, you can even find the last time this was
> discussed...

I looked last night (coincidence actually) and didn't find proof that
you cannot lose data. How do you deal with the file system structure
being updated before the data blocks are (re-)written? I don't think
you can.

> The bottom line is that the only reason you need a metadata journalling
> filesystem is to save the fsck time when you come up. On a little
> partition like xlog, that's not an issue.

fsck isn't only about time to fix. fsck is needed, because the file
system is broken. If the file system is broken, how can you guarantee
data has not been corrupted?

Cheers,
mark
On Tue, Aug 15, 2006 at 02:33:27PM -0400, mark@mark.mielke.cc wrote:
> I looked last night (coincidence actually) and didn't find proof that
> you cannot lose data.

You aren't going to find proof, any more than you'll find proof that
you won't lose data if you do use a journalling fs. (Because there
isn't any.) Unfortunately, many people misunderstand what a metadata
journal does for you, and overstate its importance in this type of
application.

> How do you deal with the file system structure being updated before the
> data blocks are (re-)written?

*That's what the postgres log is for.* If the latest xlog entries don't
make it to disk, they won't be replayed; if they didn't make it to
disk, the transaction would not have been reported as committed. An
application that understands filesystem semantics can guarantee data
integrity without metadata journaling.

> > The bottom line is that the only reason you need a metadata journalling
> > filesystem is to save the fsck time when you come up. On a little
> > partition like xlog, that's not an issue.
>
> fsck isn't only about time to fix. fsck is needed, because the file system
> is broken.

fsck is needed to reconcile the metadata with the on-disk allocations.
To do that, it reads all the inodes and their corresponding directory
entries. The time to do that is proportional to the size of the
filesystem, hence the comment about time. fsck is not needed "because
the filesystem is broken", it's needed because the filesystem is marked
dirty.

Mike Stone
On Tue, Aug 15, 2006 at 03:02:56PM -0400, Michael Stone wrote:
> *That's what the postgres log is for.* If the latest xlog entries don't
> make it to disk, they won't be replayed; if they didn't make it to
> disk, the transaction would not have been reported as committed. An
> application that understands filesystem semantics can guarantee data
> integrity without metadata journaling.

So what causes files to get 'lost' and get stuck in lost+found? AFAIK
that's because the file was written before the metadata. Now, if
fsync'ing a file also ensures that all the metadata is written, then
we're probably fine... if not, then we're at risk every time we create a
new file (every WAL segment if archiving is on, and every time a
relation passes a 1GB boundary).

FWIW, the way that FreeBSD gets around the need to fsck a dirty
filesystem before use without using a journal is to ensure that metadata
operations are always on the drive before the actual data is written.
There's still a need to fsck a dirty filesystem, but it can now be done
in the background, with the filesystem mounted and in use.

> fsck is needed to reconcile the metadata with the on-disk allocations.
> To do that, it reads all the inodes and their corresponding directory
> entries. The time to do that is proportional to the size of the
> filesystem, hence the comment about time. fsck is not needed "because
> the filesystem is broken", it's needed because the filesystem is marked
> dirty.

--
Jim C. Nasby, Sr. Engineering Consultant   jnasby@pervasive.com
On Tue, Aug 15, 2006 at 03:02:56PM -0400, Michael Stone wrote:
> You aren't going to find proof, any more than you'll find proof that you
> won't lose data if you do lose a journalling fs. (Because there isn't
> any.) Unfortunately, many people misunderstand the what a metadata
> journal does for you, and overstate its importance in this type of
> application.

Yes, many people do. :-)

> *That's what the postgres log is for.* If the latest xlog entries don't
> make it to disk, they won't be replayed; if they didn't make it to
> disk, the transaction would not have been reported as commited. An
> application that understands filesystem semantics can guarantee data
> integrity without metadata journaling.

No. This is not true. Updating the file system structure (inodes, indirect
blocks) touches a separate part of the disk than the actual data. If
the file system structure is modified, say, to extend a file to allow
it to contain more data, but the data itself is not written, then upon
a restore, with a system such as ext2, or ext3 with writeback, or xfs,
it is possible that the end of the file, even the postgres log file,
will contain a random block of data from the disk. If this random block
of data happens to look like a valid xlog block, it may be played back,
and the database corrupted.

If the file system is only used for xlog data, the chance that it looks
like a valid block increases, would it not?

> fsck is needed to reconcile the metadata with the on-disk allocations.
> To do that, it reads all the inodes and their corresponding directory
> entries. The time to do that is proportional to the size of the
> filesystem, hence the comment about time. fsck is not needed "because
> the filesystem is broken", it's needed because the filesystem is marked
> dirty.

This is also wrong. fsck is needed because the file system is broken.
It takes time, because it doesn't have a journal to help it, therefore
it must look through the entire file system and guess what the problems
are. There are classes of problems such as I describe above, for which
fsck *cannot* guess how to solve the problem. There is not enough
information available for it to deduce that anything is wrong at all.
The probability is low, for sure - but then, the chance of a file
system failure is already low. Betting on ext2 + postgresql xlog has
not been confirmed to me as reliable.

Telling me that journalling is misunderstood doesn't prove to me that
you understand it. I don't mean to be offensive, but I won't accept
what you say, as it does not make sense with my understanding of how
file systems work. :-)

Cheers,
mark
On Tue, Aug 15, 2006 at 02:15:05PM -0500, Jim C. Nasby wrote:
> So what causes files to get 'lost' and get stuck in lost+found?
> AFAIK that's because the file was written before the metadata. Now, if
> fsync'ing a file also ensures that all the metadata is written, then
> we're probably fine... if not, then we're at risk every time we create a
> new file (every WAL segment if archiving is on, and every time a
> relation passes a 1GB boundary).

Only if fsync ensures that the data written to disk is ordered, which as
far as I know, is not done for ext2. Dirty blocks are written in
whatever order is fastest for them to be written, or sequential order,
or some order that isn't based on examining the metadata.

If my understanding is correct - and I've seen nothing yet to say that
it isn't - ext2 is not safe, postgresql xlog or not, fsck or not. It is
safer than no postgresql xlog - but there exists windows, however small,
where the file system can be corrupted.

The need for fsck is due to this problem. If fsck needs to do anything
at all, other than replay a journal, the file system is broken.

Cheers,
mark
mark@mark.mielke.cc writes:
> I've been worrying about this myself, and my current conclusion is that
> ext2 is bad because: a) fsck, and b) data can be lost or corrupted, which
> could lead to the need to trash the xlog.

> Even ext3 in writeback mode allows for the indirect blocks to be updated
> without the data underneath, allowing for blocks to point to random data,
> or worse, previous apparently sane data (especially if the data is from
> a drive only used for xlog - the chance is high that a block might look
> partially valid?).

At least for xlog, this worrying is misguided, because we zero and fsync
a WAL file before we ever put any valuable data into it. Unless the
filesystem is lying through its teeth about having done an fsync, there
should be no metadata changes happening for an active WAL file (other
than mtime of course).

			regards, tom lane
On Tue, Aug 15, 2006 at 04:05:17PM -0400, Tom Lane wrote:
> At least for xlog, this worrying is misguided, because we zero and fsync
> a WAL file before we ever put any valuable data into it. Unless the
> filesystem is lying through its teeth about having done an fsync, there
> should be no metadata changes happening for an active WAL file (other
> than mtime of course).

Hmmm... I may have missed a post about this in the archive. WAL file is
never appended - only re-written? If so, then I'm wrong, and ext2 is
fine. The requirement is that no file system structures change as a
result of any writes that PostgreSQL does. If no file system structures
change, then I take everything back as uninformed.

Please confirm whichever. :-)

Cheers,
mark
mark@mark.mielke.cc writes:
> WAL file is never appended - only re-written?
> If so, then I'm wrong, and ext2 is fine. The requirement is that no
> file system structures change as a result of any writes that
> PostgreSQL does. If no file system structures change, then I take
> everything back as uninformed.

That risk certainly exists in the general data directory, but AFAIK
it's not a problem for pg_xlog.

			regards, tom lane
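Tom's preallocation point is easy to check on a running cluster: every segment in pg_xlog should already be at its full size (16MB by default), so normal WAL writes never extend a file. The path below is only an example.

# all WAL segments should show the same size (16777216 bytes by default),
# i.e. they were zero-filled and fsync'd before first use
ls -l /var/lib/pgsql/data/pg_xlog/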
On Tue, Aug 15, 2006 at 02:15:05PM -0500, Jim C. Nasby wrote:
> Now, if fsync'ing a file also ensures that all the metadata is written,
> then we're probably fine...

...and it does. Unclean shutdowns cause problems in general because
filesystems operate asynchronously. postgres (and other similar
programs) go to great lengths to make sure that critical operations are
performed synchronously. If the program *doesn't* do that, metadata
journaling isn't a magic wand which will guarantee data integrity--it
won't. If the program *does* do that, all the metadata journaling adds
is the ability to skip fsck and start up faster.

Mike Stone
On Tue, Aug 15, 2006 at 03:39:51PM -0400, mark@mark.mielke.cc wrote:
> No. This is not true. Updating the file system structure (inodes, indirect
> blocks) touches a separate part of the disk than the actual data. If
> the file system structure is modified, say, to extend a file to allow
> it to contain more data, but the data itself is not written, then upon
> a restore, with a system such as ext2, or ext3 with writeback, or xfs,
> it is possible that the end of the file, even the postgres log file,
> will contain a random block of data from the disk. If this random block
> of data happens to look like a valid xlog block, it may be played back,
> and the database corrupted.

you're conflating a whole lot of different issues here. You're ignoring
the fact that postgres preallocates the xlog segment, you're ignoring
the fact that you can sync a directory entry, you're ignoring the fact
that syncing some metadata (such as atime) doesn't matter (only the
block allocation is important in this case, and the blocks are
pre-allocated).

> This is also wrong. fsck is needed because the file system is broken.

nope, the file system *may* be broken. the dirty flag simply indicates
that the filesystem needs to be checked to find out whether or not it
is broken.

> I don't mean to be offensive, but I won't accept what you say, as it does
> not make sense with my understanding of how file systems work. :-)

<shrug> I'm not getting paid to convince you of anything.

Mike Stone
On Tue, Aug 15, 2006 at 04:58:59PM -0400, Michael Stone wrote:
> you're conflating a whole lot of different issues here. You're ignoring
> the fact that postgres preallocates the xlog segment, you're ignoring
> the fact that you can sync a directory entry, you're ignoring the fact
> that syncing some metadata (such as atime) doesn't matter (only the
> block allocation is important in this case, and the blocks are
> pre-allocated).

Yes, no, no, no. :-)

I didn't know that the xlog segment only uses pre-allocated space. I
ignore mtime/atime as they don't count as file system structure
changes to me. It's updating a field in place. No change to the
structure.

With the pre-allocation knowledge, I agree with you. Not sure how I
missed that in my reviewing of the archives... I did know it
pre-allocated once upon a time... Hmm....

> > This is also wrong. fsck is needed because the file system is broken.
> nope, the file system *may* be broken. the dirty flag simply indicates
> that the filesystem needs to be checked to find out whether or not it is
> broken.

Ah, but if we knew it wasn't broken, then fsck wouldn't be needed, now
would it? So we assume that it is broken. A little bit of a game, but
it is important to me. If I assumed the file system was not broken, I
wouldn't run fsck. I run fsck, because I assume it may be broken. If
broken, it indicates potential corruption.

The difference for me is that if you are correct, that the xlog is
safe, then for a disk that only uses xlog, fsck is not ever necessary,
even after a system crash. If fsck is necessary, then there is
potential for a problem.

With the pre-allocation knowledge, I'm tempted to agree with you that
fsck is not ever necessary for partitions that only hold a properly
pre-allocated xlog.

> > I don't mean to be offensive, but I won't accept what you say, as it does
> > not make sense with my understanding of how file systems work. :-)
> <shrug> I'm not getting paid to convince you of anything.

Just getting you to back up your claim a bit... As I said, no intent to
offend. I learned from it.

Thanks,
mark
On Tue, Aug 15, 2006 at 05:38:43PM -0400, mark@mark.mielke.cc wrote:
> I didn't know that the xlog segment only uses pre-allocated space. I
> ignore mtime/atime as they don't count as file system structure
> changes to me. It's updating a field in place. No change to the structure.
>
> With the pre-allocation knowledge, I agree with you. Not sure how I
> missed that in my reviewing of the archives... I did know it
> pre-allocated once upon a time... Hmm....

This is only valid if the pre-allocation is also fsync'd *and* fsync
ensures that both the metadata and file data are on disk. Anyone
actually checked that? :)

BTW, I did see some anecdotal evidence on one of the lists a while ago.
A PostgreSQL DBA had suggested doing a 'pull the power cord' test to the
other DBAs (all of which were responsible for different RDBMSes,
including a bunch of well known names). They all thought he was off his
rocker. Not too long after that, an unplanned power outage did occur,
and PostgreSQL was the only RDBMS that recovered every single database
without intervention.

--
Jim C. Nasby, Sr. Engineering Consultant   jnasby@pervasive.com
On Tue, Aug 15, 2006 at 05:20:25PM -0500, Jim C. Nasby wrote:
> This is only valid if the pre-allocation is also fsync'd *and* fsync
> ensures that both the metadata and file data are on disk. Anyone
> actually checked that? :)

fsync() does that, yes. fdatasync() (if it exists), OTOH, doesn't sync
the metadata.

/* Steinar */
--
Homepage: http://www.sesse.net/
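If anyone wants to see which sync call their backends actually issue at commit, strace shows it directly. The PID lookup below assumes the usual postmaster.pid layout, and the data directory path is only an example.

PGDATA=/var/lib/pgsql/data
PID=$(head -1 "$PGDATA"/postmaster.pid)    # first line is the postmaster PID
# follow newly forked backends and watch sync-related calls for a while
strace -f -tt -e trace=fsync,fdatasync -p "$PID" 2>&1 | head -50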
On Tue, 15 Aug 2006, mark@mark.mielke.cc wrote:
>>> This is also wrong. fsck is needed because the file system is broken.
>> nope, the file system *may* be broken. the dirty flag simply indicates
>> that the filesystem needs to be checked to find out whether or not it is
>> broken.
>
> Ah, but if we knew it wasn't broken, then fsck wouldn't be needed, now
> would it? So we assume that it is broken. A little bit of a game, but
> it is important to me. If I assumed the file system was not broken, I
> wouldn't run fsck. I run fsck, because I assume it may be broken. If
> broken, it indicates potential corruption.

note that the ext3, reiserfs, jfs, and xfs developers (at least)
consider fsck necessary even for journaling filesystems. they just let
you get away without it being mandatory after an unclean shutdown.

David Lang
"Steinar H. Gunderson" <sgunderson@bigfoot.com> writes: > On Tue, Aug 15, 2006 at 05:20:25PM -0500, Jim C. Nasby wrote: >> This is only valid if the pre-allocation is also fsync'd *and* fsync >> ensures that both the metadata and file data are on disk. Anyone >> actually checked that? :) > fsync() does that, yes. fdatasync() (if it exists), OTOH, doesn't sync the > metadata. Well, the POSIX spec says that fsync should do that ;-) My guess is that most/all kernel filesystem layers do indeed try to sync everything that the spec says they should. The Achilles' heel of the whole business is disk drives that lie about write completion. The kernel is just as vulnerable to that as any application ... regards, tom lane
Hi, Jim,

Jim C. Nasby wrote:
> Well, if the controller is caching with a BBU, I'm not sure that order
> matters anymore, because the controller should be able to re-order at
> will. Theoretically. :) But this is why having some actual data posted
> somewhere would be great.

Well, actually, the controller should not reorder over write barriers.

Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS
Everyone,

I wanted to follow-up on bonnie results for the internal RAID1 which is
connected to the SmartArray 6i. I believe this is the problem, but I am
not good at interpreting the results. Here's a sample of three runs:

scsi disc array ,16G,47983,67,65492,20,37214,6,73785,87,89787,6,578.2,0,16,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++
scsi disc array ,16G,54634,75,67793,21,36835,6,74190,88,89314,6,579.9,0,16,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++
scsi disc array ,16G,55056,76,66108,20,36859,6,74108,87,89559,6,585.0,0,16,+++++,+++,+++++,+++,+++++,+++,+

This was run on the internal RAID1 on the outer portion of the discs,
formatted as ext2.

Thanks.

Steve

On Thu, 2006-08-10 at 10:35 -0500, Scott Marlowe wrote:
> I agree. The problem here looks to be the RAID controller.
>
> Steve, got access to a different RAID controller to test with?
Steve, If this is an internal RAID1 on two disks, it looks great. Based on the random seeks though (578 seeks/sec), it looks like maybe it's 6 disks in a RAID10? - Luke On 8/16/06 7:10 PM, "Steve Poe" <steve.poe@gmail.com> wrote: > Everyone, > > I wanted to follow-up on bonnie results for the internal RAID1 which is > connected to the SmartArray 6i. I believe this is the problem, but I am > not good at interepting the results. Here's an sample of three runs: > > scsi disc > array ,16G,47983,67,65492,20,37214,6,73785,87,89787,6,578.2,0,16,+++++, > +++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++ > scsi disc > array ,16G,54634,75,67793,21,36835,6,74190,88,89314,6,579.9,0,16,+++++, > +++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++ > scsi disc > array ,16G,55056,76,66108,20,36859,6,74108,87,89559,6,585.0,0,16,+++++, > +++,+++++,+++,+++++,+++,+++++,+++,+ > > This was run on the internal RAID1 on the outer portion of the discs > formatted at ext2. > > Thanks. > > Steve > > On Thu, 2006-08-10 at 10:35 -0500, Scott Marlowe wrote: >> On Thu, 2006-08-10 at 10:15, Luke Lonergan wrote: >>> Mike, >>> >>> On 8/10/06 4:09 AM, "Michael Stone" <mstone+postgres@mathom.us> wrote: >>> >>>> On Wed, Aug 09, 2006 at 08:29:13PM -0700, Steve Poe wrote: >>>>> I tried as you suggested and my performance dropped by 50%. I went from >>>>> a 32 TPS to 16. Oh well. >>>> >>>> If you put data & xlog on the same array, put them on seperate >>>> partitions, probably formatted differently (ext2 on xlog). >>> >>> If he's doing the same thing on both systems (Sun and HP) and the HP >>> performance is dramatically worse despite using more disks and having faster >>> CPUs and more RAM, ISTM the problem isn't the configuration. >>> >>> Add to this the fact that the Sun machine is CPU bound while the HP is I/O >>> wait bound and I think the problem is the disk hardware or the driver >>> therein. >> >> I agree. The problem here looks to be the RAID controller. >> >> Steve, got access to a different RAID controller to test with? >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 1: if posting/reading through Usenet, please send an appropriate >> subscribe-nomail command to majordomo@postgresql.org so that your >> message can get through to the mailing list cleanly > >
That's about what I was getting for a 2 disk RAID 0 setup on a PE 2950. Here are bonnie++ numbers for the RAID10x4 and RAID0x2; unfortunately I only have the 1.93 numbers since this was before I got the advice to run with the earlier version of bonnie and larger file sizes, so I don't know how meaningful they are.

RAID 10x4

bash-2.05b$ bonnie++ -d bonnie -s 1000:8k
Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
              1000M   585  99 21705   4 28560   9  1004  99 812997  98 5436 454
Latency             14181us   81364us   50256us   57720us    1671us    1059ms
Version 1.93c       ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  4712  10 +++++ +++ +++++ +++  4674  10 +++++ +++ +++++ +++
Latency              807ms    21us    36us   804ms   110us    36us
1.93c,1.93c, ,1,1155207445,1000M,,585,99,21705,4,28560,9,1004,99,812997,98,5436,454,16,,,,,4712,10,+++++,+++,+++++,+++,4674,10,+++++,+++,+++++,+++,14181us,81364us,50256us,57720us,1671us,1059ms,807ms,21us,36us,804ms,110us,36us
bash-2.05b$

RAID 0x2

bash-2.05b$ bonnie++ -d bonnie -s 1000:8k
Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
              1000M   575  99 131621  25 104178  26  1004  99 816928  99 6233 521
Latency             14436us   26663us   47478us   54796us    1487us   38924us
Version 1.93c       ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  4935  10 +++++ +++ +++++ +++  5198  11 +++++ +++ +++++ +++
Latency              738ms    32us    43us   777ms    24us    30us
1.93c,1.93c,beast.corp.lumeta.com,1,1155210203,1000M,,575,99,131621,25,104178,26,1004,99,816928,99,6233,521,16,,,,,4935,10,+++++,+++,+++++,+++,5198,11,+++++,+++,+++++,+++,14436us,26663us,47478us,54796us,1487us,38924us,738ms,32us,43us,777ms,24us,30us

A RAID 5 configuration seems to outperform this on the PE 2950 though (at least in terms of raw read/write perf). If anyone's interested in some more detailed tests of the 2950, I might be able to reconfigure the raid for some tests next week before I start setting up the box for long term use, so I'm open to suggestions. See earlier posts in this thread for details about the hardware.

Thanks,

Bucky

-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Luke Lonergan
Sent: Friday, August 18, 2006 10:38 AM
To: steve.poe@gmail.com; Scott Marlowe
Cc: Michael Stone; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Postgresql Performance on an HP DL385 and

Steve,

If this is an internal RAID1 on two disks, it looks great.

Based on the random seeks though (578 seeks/sec), it looks like maybe it's 6 disks in a RAID10?

- Luke

On 8/16/06 7:10 PM, "Steve Poe" <steve.poe@gmail.com> wrote:

> Everyone,
>
> I wanted to follow-up on bonnie results for the internal RAID1 which is
> connected to the SmartArray 6i. I believe this is the problem, but I am
> not good at interepting the results. Here's an sample of three runs:
>
> scsi disc
> array ,16G,47983,67,65492,20,37214,6,73785,87,89787,6,578.2,0,16,+++++,
> +++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++
> scsi disc
> array ,16G,54634,75,67793,21,36835,6,74190,88,89314,6,579.9,0,16,+++++,
> +++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++
> scsi disc
> array ,16G,55056,76,66108,20,36859,6,74108,87,89559,6,585.0,0,16,+++++,
> +++,+++++,+++,+++++,+++,+
>
> This was run on the internal RAID1 on the outer portion of the discs
> formatted at ext2.
>
> Thanks.
>
> Steve
>
> On Thu, 2006-08-10 at 10:35 -0500, Scott Marlowe wrote:
>> On Thu, 2006-08-10 at 10:15, Luke Lonergan wrote:
>>> Mike,
>>>
>>> On 8/10/06 4:09 AM, "Michael Stone" <mstone+postgres@mathom.us> wrote:
>>>
>>>> On Wed, Aug 09, 2006 at 08:29:13PM -0700, Steve Poe wrote:
>>>>> I tried as you suggested and my performance dropped by 50%. I went from
>>>>> a 32 TPS to 16. Oh well.
>>>>
>>>> If you put data & xlog on the same array, put them on seperate
>>>> partitions, probably formatted differently (ext2 on xlog).
>>>
>>> If he's doing the same thing on both systems (Sun and HP) and the HP
>>> performance is dramatically worse despite using more disks and having faster
>>> CPUs and more RAM, ISTM the problem isn't the configuration.
>>>
>>> Add to this the fact that the Sun machine is CPU bound while the HP is I/O
>>> wait bound and I think the problem is the disk hardware or the driver
>>> therein.
>>
>> I agree. The problem here looks to be the RAID controller.
>>
>> Steve, got access to a different RAID controller to test with?
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 1: if posting/reading through Usenet, please send an appropriate
>> subscribe-nomail command to majordomo@postgresql.org so that your
>> message can get through to the mailing list cleanly
>
>

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend
Luke,
Nope. It is only a RAID1 for the 2 internal discs connected to the SmartArray 6i. This is where I *had* the pg_xlog located when the performance was very poor. Also, I just found out the default stripe size is 128k. Would this be a problem for pg_xlog?
The 6-disc RAID10 you speak of is on the SmartArray 642 RAID adapter.
Steve
On 8/18/06, Luke Lonergan <llonergan@greenplum.com> wrote:
Steve,
If this is an internal RAID1 on two disks, it looks great.
Based on the random seeks though (578 seeks/sec), it looks like maybe it's 6
disks in a RAID10?
- Luke
On 8/16/06 7:10 PM, "Steve Poe" <steve.poe@gmail.com > wrote:
> Everyone,
>
> I wanted to follow-up on bonnie results for the internal RAID1 which is
> connected to the SmartArray 6i. I believe this is the problem, but I am
> not good at interepting the results. Here's an sample of three runs:
>
> scsi disc
> array ,16G,47983,67,65492,20,37214,6,73785,87,89787,6,578.2,0,16,+++++,
> +++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++
> scsi disc
> array ,16G,54634,75,67793,21,36835,6,74190,88,89314,6, 579.9,0,16,+++++,
> +++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++
> scsi disc
> array ,16G,55056,76,66108,20,36859,6,74108,87,89559,6,585.0,0,16,+++++,
> +++,+++++,+++,+++++,+++,+++++,+++,+
>
> This was run on the internal RAID1 on the outer portion of the discs
> formatted at ext2.
>
> Thanks.
>
> Steve
>
> On Thu, 2006-08-10 at 10:35 -0500, Scott Marlowe wrote:
>> On Thu, 2006-08-10 at 10:15, Luke Lonergan wrote:
>>> Mike,
>>>
>>> On 8/10/06 4:09 AM, "Michael Stone" <mstone+postgres@mathom.us > wrote:
>>>
>>>> On Wed, Aug 09, 2006 at 08:29:13PM -0700, Steve Poe wrote:
>>>>> I tried as you suggested and my performance dropped by 50%. I went from
>>>>> a 32 TPS to 16. Oh well.
>>>>
>>>> If you put data & xlog on the same array, put them on seperate
>>>> partitions, probably formatted differently (ext2 on xlog).
>>>
>>> If he's doing the same thing on both systems (Sun and HP) and the HP
>>> performance is dramatically worse despite using more disks and having faster
>>> CPUs and more RAM, ISTM the problem isn't the configuration.
>>>
>>> Add to this the fact that the Sun machine is CPU bound while the HP is I/O
>>> wait bound and I think the problem is the disk hardware or the driver
>>> therein.
>>
>> I agree. The problem here looks to be the RAID controller.
>>
>> Steve, got access to a different RAID controller to test with?
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 1: if posting/reading through Usenet, please send an appropriate
>> subscribe-nomail command to majordomo@postgresql.org so that your
>> message can get through to the mailing list cleanly
>
>
Steve,

On 8/18/06 10:39 AM, "Steve Poe" <steve.poe@gmail.com> wrote:

> Nope. it is only a RAID1 for the 2 internal discs connected to the SmartArray
> 6i. This is where I *had* the pg_xlog located when the performance was very
> poor. Also, I just found out the default stripe size is 128k. Would this be a
> problem for pg_xlog?

ISTM that the main performance issue for xlog is going to be the rate at which fdatasync operations complete, and the stripe size shouldn't hurt that.

What are your postgresql.conf settings for the xlog: how many logfiles, sync_method, etc?

> The 6-disc RAID10 you speak of is on the SmartArray 642 RAID adapter.

Interesting - the seek rate is very good for two drives, are they 15K RPM?

- Luke
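A rough way to see the fdatasync completion rate Luke is talking about is a small write-and-sync loop. The sketch below is a hypothetical illustration (the file name, 8kB block size, and iteration count are arbitrary); the number it prints is approximately the ceiling on commit rate when the xlog sits on that volume.

/* Hypothetical micro-benchmark: how many 8kB write + fdatasync cycles
 * per second can this volume sustain?  Each cycle stands in roughly
 * for one synchronous commit hitting the xlog. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

#define ITERATIONS 1000

int main(void)
{
    char block[8192];
    struct timeval start, end;
    double secs;
    int i;
    int fd = open("fsync_test.dat", O_WRONLY | O_CREAT, 0600);

    if (fd < 0) { perror("open"); return 1; }
    memset(block, 'x', sizeof(block));

    gettimeofday(&start, NULL);
    for (i = 0; i < ITERATIONS; i++)
    {
        if (lseek(fd, 0, SEEK_SET) < 0 ||
            write(fd, block, sizeof(block)) != (ssize_t) sizeof(block) ||
            fdatasync(fd) != 0)
        { perror("write/fdatasync"); return 1; }
    }
    gettimeofday(&end, NULL);

    secs = (end.tv_sec - start.tv_sec) + (end.tv_usec - start.tv_usec) / 1e6;
    printf("%.1f synced writes/sec\n", ITERATIONS / secs);

    close(fd);
    unlink("fsync_test.dat");
    return 0;
}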
Luke,
ISTM that the main performance issue for xlog is going to be the rate at
which fdatasync operations complete, and the stripe size shouldn't hurt
that.
I thought so. However, I've also tried running the PGDATA off of the RAID1 as a test and it is poor.
What are your postgresql.conf settings for the xlog: how many logfiles,
sync_method, etc?
wal_sync_method = fsync # the default varies across platforms:
# fsync, fdatasync, open_sync, or open_datasync
# - Checkpoints -
checkpoint_segments = 14 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 300 # range 30-3600, in seconds
#checkpoint_warning = 30 # 0 is off, in seconds
#commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5
What stumps me is I use the same settings on a Sun box (dual Opteron 4GB w/ LSI MegaRAID 128M) with the same data. This is on pg 7.4.13.
> The 6-disc RAID10 you speak of is on the SmartArray 642 RAID adapter.
Interesting - the seek rate is very good for two drives, are they 15K RPM?
Nope, 10K RPM.
HP's recommendation for testing is to connect the RAID1 to the second channel off of the SmartArray 642 adapter since they use the same driver, and, according to HP, I should not have to rebuild the RAID1.
I have to send the new server to the hospital next week, so I have very little testing time left.
Steve
Steve,
One thing here is that “wal_sync_method” should be set to “fdatasync” and not “fsync”. In fact, the default is fdatasync, but because you have uncommented the standard line in the file, it is changed to “fsync”, which is a lot slower. This is a bug in the file defaults.
That could speed things up quite a bit on the xlog.
WRT the difference between the two systems, I’m kind of stumped.
- Luke
On 8/18/06 12:00 PM, "Steve Poe" <steve.poe@gmail.com> wrote:
Luke,
ISTM that the main performance issue for xlog is going to be the rate at
which fdatasync operations complete, and the stripe size shouldn't hurt
that.
I thought so. However, I've also tried running the PGDATA off of the RAID1 as a test and it is poor.
What are your postgresql.conf settings for the xlog: how many logfiles,
sync_method, etc?
wal_sync_method = fsync # the default varies across platforms:
# fsync, fdatasync, open_sync, or open_datasync
# - Checkpoints -
checkpoint_segments = 14 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 300 # range 30-3600, in seconds
#checkpoint_warning = 30 # 0 is off, in seconds
#commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5
What stumps me is I use the same settings on a Sun box (dual Opteron 4GB w/ LSI MegaRAID 128M) with the same data. This is on pg 7.4.13.
> The 6-disc RAID10 you speak of is on the SmartArray 642 RAID adapter.
Interesting - the seek rate is very good for two drives, are they 15K RPM?
Nope. 10K. RPM.
HP's recommendation for testing is to connect the RAID1 to the second channel off of the SmartArray 642 adapter since they use the same driver, and, according to HP, I should not have to rebuilt the RAID1.
I have to send the new server to the hospital next week, so I have very little testing time left.
Steve
Luke,
I'll try it, but you're right, it should not matter. The two systems are:
HP DL385 (dual Opteron 265, I believe), 8GB of RAM, two internal U320 10K discs in RAID1.
Sun W2100z (dual Opteron 245, I believe), 4GB of RAM, one U320 10K drive, with an LSI MegaRAID 2X 128M driving two external 4-disc arrays of U320 10K drives in a RAID10 configuration. Both run the same version of Linux (CentOS 4.3) and the same kernel version, with no kernel changes on either of them, and the same *.conf files for PostgreSQL 7.4.13.
Steve
On 8/18/06, Luke Lonergan <llonergan@greenplum.com> wrote:
Steve,
One thing here is that "wal_sync_method" should be set to "fdatasync" and not "fsync". In fact, the default is fdatasync, but because you have uncommented the standard line in the file, it is changed to "fsync", which is a lot slower. This is a bug in the file defaults.
That could speed things up quite a bit on the xlog.
WRT the difference between the two systems, I'm kind of stumped.
- Luke
On 8/18/06 12:00 PM, "Steve Poe" <steve.poe@gmail.com> wrote:
Luke,
ISTM that the main performance issue for xlog is going to be the rate at
which fdatasync operations complete, and the stripe size shouldn't hurt
that.
I thought so. However, I've also tried running the PGDATA off of the RAID1 as a test and it is poor.
What are your postgresql.conf settings for the xlog: how many logfiles,
sync_method, etc?
wal_sync_method = fsync # the default varies across platforms:
# fsync, fdatasync, open_sync, or open_datasync
# - Checkpoints -
checkpoint_segments = 14 # in logfile segments, min 1, 16MB each
checkpoint_timeout = 300 # range 30-3600, in seconds
#checkpoint_warning = 30 # 0 is off, in seconds
#commit_delay = 0 # range 0-100000, in microseconds
#commit_siblings = 5
What stumps me is I use the same settings on a Sun box (dual Opteron 4GB w/ LSI MegaRAID 128M) with the same data. This is on pg 7.4.13.
> The 6-disc RAID10 you speak of is on the SmartArray 642 RAID adapter.
Interesting - the seek rate is very good for two drives, are they 15K RPM?
Nope. 10K. RPM.
HP's recommendation for testing is to connect the RAID1 to the second channel off of the SmartArray 642 adapter since they use the same driver, and, according to HP, I should not have to rebuilt the RAID1.
I have to send the new server to the hospital next week, so I have very little testing time left.
Steve