Thread: The results of my PostgreSQL/filesystem performance tests

The results of my PostgreSQL/filesystem performance tests

From
Bill Moran
Date:
Hey all.

I said I was going to do it, and I finally did it.

As with all performance tests/benchmarks, there are probably dozens or
more reasons why these results aren't as accurate or wonderful as they
should be.  Take them for what they are and hopefully everyone can
learn a few things from them.

Intelligent feedback is welcome.

http://www.potentialtech.com/wmoran/postgresql.php

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


Re: The results of my PostgreSQL/filesystem performance tests

From
"Balazs Wellisch"
Date:
Bill,

Very interesting results. I'd like to command you on your honesty.
Having started out with the intentions of proving that FreeBSD is faster
than Linux only to find that the opposite is true must not have been
rewarding for you. However, these unexpected results serve only to
reinforce the integrity of your tests.

Thanks for all the work.

Balazs



-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Bill Moran
Sent: Tuesday, August 26, 2003 6:48 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] The results of my PostgreSQL/filesystem performance
tests

Hey all.

I said I was going to do it, and I finally did it.

As with all performance tests/benchmarks, there are probably dozens or
more reasons why these results aren't as accurate or wonderful as they
should be.  Take them for what they are and hopefully everyone can
learn a few things from them.

Intelligent feedback is welcome.

http://www.potentialtech.com/wmoran/postgresql.php

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings



Re: The results of my PostgreSQL/filesystem performance tests

From
"Shridhar Daithankar"
Date:
On 26 Aug 2003 at 21:47, Bill Moran wrote:

> Hey all.
>
> I said I was going to do it, and I finally did it.
>
> As with all performance tests/benchmarks, there are probably dozens or
> more reasons why these results aren't as accurate or wonderful as they
> should be.  Take them for what they are and hopefully everyone can
> learn a few things from them.
>
> Intelligent feedback is welcome.
>
> http://www.potentialtech.com/wmoran/postgresql.php

Can we have these benchmarks and relevant information stored in a central
archive at techdocs(Say)?

That would be better than searching mail archives..


Bye
 Shridhar

--
    "... freedom ... is a worship word..."    "It is our worship word too."        -- Cloud
William and Kirk, "The Omega Glory", stardate unknown


Re: The results of my PostgreSQL/filesystem performance

From
Dennis Björklund
Date:
On Tue, 26 Aug 2003, Bill Moran wrote:

> As with all performance tests/benchmarks, there are probably dozens or
> more reasons why these results aren't as accurate or wonderful as they
> should be.  Take them for what they are and hopefully everyone can
> learn a few things from them.

What version of pg was used in debian and redhat? For freebsd it's 7.2.4
it says on the page, but I see nothing about the other two. The version
that comes with Redhat 9 (Shrike) is 7.3.2.

--
/Dennis


Re: The results of my PostgreSQL/filesystem performance

From
Ron Johnson
Date:
On Tue, 2003-08-26 at 20:47, Bill Moran wrote:
> Hey all.
>
> I said I was going to do it, and I finally did it.
>
> As with all performance tests/benchmarks, there are probably dozens or
> more reasons why these results aren't as accurate or wonderful as they
> should be.  Take them for what they are and hopefully everyone can
> learn a few things from them.
>
> Intelligent feedback is welcome.
>
> http://www.potentialtech.com/wmoran/postgresql.php

Hi,

Woody has pg 7.2.1.  Note also that Woody comes with kernel 2.4.18.

It would be interesting to see how Debian Sid (kernel 2.4.21 and
pg 7.3.3) would perform.

Thanks for the results!

--
-----------------------------------------------------------------
Ron Johnson, Jr. ron.l.johnson@cox.net
Jefferson, LA USA

"Oh, great altar of passive entertainment, bestow upon me thy
discordant images at such speed as to render linear thought impossible"
Calvin, regarding TV


Re: The results of my PostgreSQL/filesystem performance

From
Ron Johnson
Date:
On Tue, 2003-08-26 at 20:47, Bill Moran wrote:
> Hey all.
>
> I said I was going to do it, and I finally did it.
>
> As with all performance tests/benchmarks, there are probably dozens or
> more reasons why these results aren't as accurate or wonderful as they
> should be.  Take them for what they are and hopefully everyone can
> learn a few things from them.
>
> Intelligent feedback is welcome.
>
> http://www.potentialtech.com/wmoran/postgresql.php

I notice that the Linux FSs weren't tested with noatime.  Any
reason?

--
-----------------------------------------------------------------
Ron Johnson, Jr. ron.l.johnson@cox.net
Jefferson, LA USA

"As I like to joke, I may have invented it, but Microsoft made
it popular"
David Bradley, regarding Ctrl-Alt-Del


Re: The results of my PostgreSQL/filesystem performance tests

From
Ludek Finstrle
Date:
> Intelligent feedback is welcome.
>
> http://www.potentialtech.com/wmoran/postgresql.php

Good work. But I can't find information about xfs. Do you plan to add
this one FS in test?

Luf

Re: The results of my PostgreSQL/filesystem performance tests

From
Christopher Browne
Date:
A long time ago, in a galaxy far, far away, balazs@neusolutions.com ("Balazs Wellisch") wrote:
> Very interesting results. I'd like to command you on your honesty.
> Having started out with the intentions of proving that FreeBSD is faster
> than Linux only to find that the opposite is true must not have been
> rewarding for you. However, these unexpected results serve only to
> reinforce the integrity of your tests.

Well put.

To see a result that the tester didn't really want to see/present does
suggest good things about the tester's honesty.  There was incentive
to hide unfavorable results.

What it still leaves quite open is just what happens when the OS has
more than one disk drive or CPU to play with.  It's not clear what
happens in such cases, whether FreeBSD would catch up, or be "left
further in the dust."  The traditional "propaganda" has been that
there are all sorts of reasons to expect PostgreSQL on FreeBSD to run
a bit faster than on Linux; it is a bit unexpected for the opposite to
seem true.
--
output = reverse("gro.mca" "@" "enworbbc")
http://www3.sympatico.ca/cbbrowne/sap.html
"I am aware of the benefits  of a micro kernel approach.  However, the
fact remains  that Linux is  here, and GNU  isn't --- and  people have
been working on Hurd for a lot longer than Linus has been working on
Linux." -- Ted T'so, 1992.

Re: The results of my PostgreSQL/filesystem performance

From
Jeff
Date:
On Tue, 26 Aug 2003, Bill Moran wrote:

>
> Intelligent feedback is welcome.
>
That's some good work there, Lou. You'll make sgt for that someday.

But I think the next step, before trying out other filesystems and options
would be concurrency. Run a bunch of these beasts together and see what
happens (I don't think too many of us have a single session running).
Perhaps even make them "interfere" with each other to create as much
"pain" as possible?

on a side note - I might be blind here - I didn't see what version of pg
you were using or any postgresql.conf tweaks - or did you just use
whatever came with each distro?

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/



Re: The results of my PostgreSQL/filesystem performance tests

From
Rod Taylor
Date:
Couple of questions:

What was the postgresql.conf configuration used? Default?

How many threads of the script ran? Looks like a single user only.

I assume there was nothing else running at the time (cron, sendmail,
etc. were all off?)

Do you know whether the machines were disk or I/O bound?

Was PostgreSQL compiled the same for each OS or did you use the rpm,
deb, tgz that were available?

On Tue, 2003-08-26 at 21:47, Bill Moran wrote:
> Hey all.
>
> I said I was going to do it, and I finally did it.
>
> As with all performance tests/benchmarks, there are probably dozens or
> more reasons why these results aren't as accurate or wonderful as they
> should be.  Take them for what they are and hopefully everyone can
> learn a few things from them.
>
> Intelligent feedback is welcome.
>
> http://www.potentialtech.com/wmoran/postgresql.php

Attachment

Re: The results of my PostgreSQL/filesystem performance

From
Tomka Gergely
Date:
2003-08-28 ragyogó napján Christopher Browne ezt üzente:

> A long time ago, in a galaxy far, far away, balazs@neusolutions.com ("Balazs Wellisch") wrote:
> > Very interesting results. I'd like to command you on your honesty.
> > Having started out with the intentions of proving that FreeBSD is faster
> > than Linux only to find that the opposite is true must not have been
> > rewarding for you. However, these unexpected results serve only to
> > reinforce the integrity of your tests.
>
> Well put.
>
> To see a result that the tester didn't really want to see/present does
> suggest good things about the tester's honesty.  There was incentive
> to hide unfavorable results.
>
> What it still leaves quite open is just what happens when the OS has
> more than one disk drive or CPU to play with.  It's not clear what
> happens in such cases, whether FreeBSD would catch up, or be "left
> further in the dust."  The traditional "propaganda" has been that
> there are all sorts of reasons to expect PostgreSQL on FreeBSD to run
> a bit faster than on Linux; it is a bit unexpected for the opposite to
> seem true.

AFAIK *BSD better in the handling of big loads - maybe when multiple
concurrent tests run against a linux and a bsd box, we see better result.
Or not.

--
Tomka Gergely
"S most - vajon barbárok nélkül mi lesz velünk?
Ők mégiscsak megoldás voltak valahogy..."


Re: The results of my PostgreSQL/filesystem performance

From
Tomka Gergely
Date:
2003-08-28 ragyogó napján Ludek Finstrle ezt üzente:

> > Intelligent feedback is welcome.
> >
> > http://www.potentialtech.com/wmoran/postgresql.php
>
> Good work. But I can't find information about xfs. Do you plan to add
> this one FS in test?

http://mail.sth.sze.hu/~hsz/sql/



--
Tomka Gergely
"S most - vajon barbárok nélkül mi lesz velünk?
Ők mégiscsak megoldás voltak valahogy..."


Re: The results of my PostgreSQL/filesystem performance tests

From
Al Hulaton
Date:
> http://www.potentialtech.com/wmoran/postgresql.php
> --
> Bill Moran
> Potential Technologies
> http://www.potentialtech.com

Adding my voice to the many, thanks for sharing your results Bill. Very
instructive.

--
Best,
Al Hulaton    |  Sr. Account Engineer  |  Command Prompt, Inc.
503.222.2783  |  ahulaton@commandprompt.com
Home of Mammoth PostgreSQL and 'Practical PostgreSQL'
Managed PostgreSQL, Linux services and consulting
Read and Search O'Reilly's 'Practical PostgreSQL' at
http://www.commandprompt.com


Re: The results of my PostgreSQL/filesystem performance tests

From
Sean Chittenden
Date:
> What it still leaves quite open is just what happens when the OS has
> more than one disk drive or CPU to play with.  It's not clear what
> happens in such cases, whether FreeBSD would catch up, or be "left
> further in the dust."  The traditional "propaganda" has been that
> there are all sorts of reasons to expect PostgreSQL on FreeBSD to
> run a bit faster than on Linux; it is a bit unexpected for the
> opposite to seem true.

Let me nip this in the butt before people run away with ideas that
aren't correct.  When the tests were performed in FreeBSD 5.1 and
Linux, the hard drives were running UDMA.  When running 4.8, for some
reason his drives settled in on PIO mode:

ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) retrying
ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) retrying
ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) retrying
ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) falling back to
PIOmode 

The benchmarks were hardly conclusive as UDMA runs vastly faster than
PIO.  Until we hear back as to whether cables were jarred loose
between the tests or hearing if something else changed, I'd hardly
consider these conclusive tests given PIO/UDMA is apples to oranges in
terms of speed and I fully expect that FreeBSD 4.8 will perform at
least faster than 5.1 (5.x is still being unwound from Giant), but
should out perform Linux as well if industry experience iss any
indicator.

-sc

--
Sean Chittenden

Re: The results of my PostgreSQL/filesystem performance

From
Bill Moran
Date:
I need to step in and do 2 things:

First, apologize for posting inaccurate test results.

Second, verify that Sean is absolutely correct.  FreeBSD 4.8 was accessing
the drives in PIO mode, which is significantly lousier than DMA, which
RedHat was able to use.  As a result, the tests are unreasonably skewed
in favor of Linux.

The only thing that the currently posted results prove is that Linux is
better at dealing with crappy hardware than BSD (which I feel we already
knew).

I did some rescrounging, and found some newer hardware stuffed in a
corner that I had forgotten was even around.  I am currently re-running
the tests and will post new results as soon as there are enough to
be interesting to talk about.

In an attempt to avoid the same mistake, I did a timed test with dd(1)
to a raw partition on both Linux and FreeBSD to ensure that both systems
are able to access the hardware at more or less the same speed. The
results of this will be included.

I'm also gathering considerably more information about the state of
the system during the tests, which should answer a number of questions
I've been getting.

To the many people who asked questions like "why not try filesystem x
on distribution y" and similar questions, the answer in most cases is
time. I've pared the tests down some so they run faster, and I'm hoping
to be able to run more tests on more combinations of configurations as
a result. Also, I never intended for anyone to assume that I was _done_
testing, just that I had enough results for folks to talk about.

I'll post again when I have enough results to be interesting, until then,
I apologize again for the inaccurate results.

Sean Chittenden wrote:
>>What it still leaves quite open is just what happens when the OS has
>>more than one disk drive or CPU to play with.  It's not clear what
>>happens in such cases, whether FreeBSD would catch up, or be "left
>>further in the dust."  The traditional "propaganda" has been that
>>there are all sorts of reasons to expect PostgreSQL on FreeBSD to
>>run a bit faster than on Linux; it is a bit unexpected for the
>>opposite to seem true.
>
> Let me nip this in the butt before people run away with ideas that
> aren't correct.  When the tests were performed in FreeBSD 5.1 and
> Linux, the hard drives were running UDMA.  When running 4.8, for some
> reason his drives settled in on PIO mode:
>
> ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) retrying
> ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) retrying
> ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) retrying
> ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) falling back to
PIOmode 
>
> The benchmarks were hardly conclusive as UDMA runs vastly faster than
> PIO.  Until we hear back as to whether cables were jarred loose
> between the tests or hearing if something else changed, I'd hardly
> consider these conclusive tests given PIO/UDMA is apples to oranges in
> terms of speed and I fully expect that FreeBSD 4.8 will perform at
> least faster than 5.1 (5.x is still being unwound from Giant), but
> should out perform Linux as well if industry experience iss any
> indicator.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


Re: The results of my PostgreSQL/filesystem performance tests

From
Sean Chittenden
Date:
> I need to step in and do 2 things:

Thanks for posting that.  Let me know if you have any questions while
doing your testing.  I've found that using 16K blocks on FreeBSD
results in about an 8% speedup in writes to the database, fwiw.

I'm likely going to make this the default for PostgreSQL on FreeBSD
starting with 7.4 (just posted something to -hackers about this)f.  If
you'd like to do this in your testing, just apply the following patch.

Right now PostgreSQL defaults to 8K blocks, but FreeBSD uses 16K
blocks which means that currently, reading two blocks of data in PG is
two read calls to the OS, one reads 16K of data off disk and returns
the 1st page, the 2nd call pulls the 2nd block from the FS cache.  In
making things 16K, it avoids the need for the 2nd system call which is
where the performance difference is coming from, afaikt.  -sc

--
Sean Chittenden

Attachment

Re: The results of my PostgreSQL/filesystem performance tests

From
Vivek Khera
Date:
>>>>> "SC" == Sean Chittenden <sean@chittenden.org> writes:

>> I need to step in and do 2 things:
SC> Thanks for posting that.  Let me know if you have any questions while
SC> doing your testing.  I've found that using 16K blocks on FreeBSD
SC> results in about an 8% speedup in writes to the database, fwiw.

Where/how does one set this?  In postgresql.conf or on the file system
or during compilation of postgres?  I'm on FreeBSD 4.8 still.

I've got a box right now on which I'm comparing the speed merits of
hardware RAID10, RAID5, and RAID50 using a filesystem benchmark
utility (bonnie++).  If I have time I'm gonna try different striping
block sizes.  Right now I'm using 32k byte stripe size.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: The results of my PostgreSQL/filesystem performance tests

From
Vivek Khera
Date:
>>>>> "SC" == Sean Chittenden <sean@chittenden.org> writes:

>> I need to step in and do 2 things:
SC> Thanks for posting that.  Let me know if you have any questions while
SC> doing your testing.  I've found that using 16K blocks on FreeBSD
SC> results in about an 8% speedup in writes to the database, fwiw.

ok.. ignore my prior request about how to set that... i missed you had
included a patch.

Any recommendations on newfs parameters for an overly large file
system used solely for Postgres?  Over 100Gb (with raid 10) or over
200Gb (with raid 5)?


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: The results of my PostgreSQL/filesystem performance

From
"scott.marlowe"
Date:
On Thu, 28 Aug 2003, Sean Chittenden wrote:

> > What it still leaves quite open is just what happens when the OS has
> > more than one disk drive or CPU to play with.  It's not clear what
> > happens in such cases, whether FreeBSD would catch up, or be "left
> > further in the dust."  The traditional "propaganda" has been that
> > there are all sorts of reasons to expect PostgreSQL on FreeBSD to
> > run a bit faster than on Linux; it is a bit unexpected for the
> > opposite to seem true.
>
> Let me nip this in the butt before people run away with ideas that
> aren't correct.  When the tests were performed in FreeBSD 5.1 and
> Linux, the hard drives were running UDMA.  When running 4.8, for some
> reason his drives settled in on PIO mode:
>
> ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) retrying
> ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) retrying
> ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) retrying
> ad0s1a: UDMA ICRC error writing fsbn 1458368 of 729184-729215 (ad0s1 bn 1458368; cn 241 tn 12 sn 44) falling back to
PIOmode 
>
> The benchmarks were hardly conclusive as UDMA runs vastly faster than
> PIO.  Until we hear back as to whether cables were jarred loose
> between the tests or hearing if something else changed, I'd hardly
> consider these conclusive tests given PIO/UDMA is apples to oranges in
> terms of speed and I fully expect that FreeBSD 4.8 will perform at
> least faster than 5.1 (5.x is still being unwound from Giant), but
> should out perform Linux as well if industry experience iss any
> indicator.

Plus, in most "real" servers you're gonna be running SCSI, so it might be
nice to see a test with a good SCSI controller (Symbios 875 is a nice
choice) and a couple hard drives, one each for WAL and data.  This would
more closely resemble actual usage and there are likely to be fewer issues
with things like UDMA versus PIO on SCSI.


Re: The results of my PostgreSQL/filesystem performance tests

From
Sean Chittenden
Date:
> >> I need to step in and do 2 things:
> SC> Thanks for posting that.  Let me know if you have any questions while
> SC> doing your testing.  I've found that using 16K blocks on FreeBSD
> SC> results in about an 8% speedup in writes to the database, fwiw.
>
> ok.. ignore my prior request about how to set that... i missed you
> had included a patch.
>
> Any recommendations on newfs parameters for an overly large file
> system used solely for Postgres?  Over 100Gb (with raid 10) or over
> 200Gb (with raid 5)?

Nope, you'll have to test and see.  If you find something that works,
however, let me know.  -sc

--
Sean Chittenden

Re: The results of my PostgreSQL/filesystem performance

From
Bill Moran
Date:
Shridhar Daithankar wrote:
> On 26 Aug 2003 at 21:47, Bill Moran wrote:
>
>
>>Hey all.
>>
>>I said I was going to do it, and I finally did it.
>>
>>As with all performance tests/benchmarks, there are probably dozens or
>>more reasons why these results aren't as accurate or wonderful as they
>>should be.  Take them for what they are and hopefully everyone can
>>learn a few things from them.
>>
>>Intelligent feedback is welcome.
>>
>>http://www.potentialtech.com/wmoran/postgresql.php
>
>
> Can we have these benchmarks and relevant information stored in a central
> archive at techdocs(Say)?
>
> That would be better than searching mail archives..

I agree.

There doesn't seem to be a place on techdocs for benchmarks at the time.
Is there another part of the site that would be good for these to go?
I'll keep them posted on my site until a better location is found.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


Re: The results of my PostgreSQL/filesystem performance

From
"Christopher Kings-Lynne"
Date:
> > As with all performance tests/benchmarks, there are probably dozens or
> > more reasons why these results aren't as accurate or wonderful as they
> > should be.  Take them for what they are and hopefully everyone can
> > learn a few things from them.
> >
> > Intelligent feedback is welcome.
> >
> > http://www.potentialtech.com/wmoran/postgresql.php
>
> I notice that the Linux FSs weren't tested with noatime.  Any
> reason?

My friend, (a FreeBSD committer), was wondering what the results are if you
turn off softupdates (to match Linux default installation) and use noatime.
He also wonders how bug the default IO is?

Chris


Re: The results of my PostgreSQL/filesystem performance tests

From
"Christopher Kings-Lynne"
Date:
> I'm likely going to make this the default for PostgreSQL on FreeBSD
> starting with 7.4 (just posted something to -hackers about this)f.  If
> you'd like to do this in your testing, just apply the following patch.
>
> Right now PostgreSQL defaults to 8K blocks, but FreeBSD uses 16K
> blocks which means that currently, reading two blocks of data in PG is
> two read calls to the OS, one reads 16K of data off disk and returns
> the 1st page, the 2nd call pulls the 2nd block from the FS cache.  In
> making things 16K, it avoids the need for the 2nd system call which is
> where the performance difference is coming from, afaikt.  -sc

Are you _sure_ this won't cause any atomicity problems?  Can FreeBSD write
16k as an atomic unit?

Chris


Re: The results of my PostgreSQL/filesystem performance

From
Bill Moran
Date:
Christopher Kings-Lynne wrote:
>>>As with all performance tests/benchmarks, there are probably dozens or
>>>more reasons why these results aren't as accurate or wonderful as they
>>>should be.  Take them for what they are and hopefully everyone can
>>>learn a few things from them.
>>>
>>>Intelligent feedback is welcome.
>>>
>>>http://www.potentialtech.com/wmoran/postgresql.php
>>
>>I notice that the Linux FSs weren't tested with noatime.  Any
>>reason?
>
> My friend, (a FreeBSD committer), was wondering what the results are if you
> turn off softupdates (to match Linux default installation) and use noatime.

Keep an eye on the page.  The test results will be posted shortly after I
finish them.

Keep in mind, I'm more interested in figuring out what can be done to make
Postgres _faster_, so tests along that line are going to have a higher
priority than ones that specifically compare "apples to apples" or anything
like that.

> He also wonders how bug the default IO is?

Huh?

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


Re: The results of my PostgreSQL/filesystem performance tests

From
Bruce Momjian
Date:
Christopher Kings-Lynne wrote:
> > I'm likely going to make this the default for PostgreSQL on FreeBSD
> > starting with 7.4 (just posted something to -hackers about this)f.  If
> > you'd like to do this in your testing, just apply the following patch.
> >
> > Right now PostgreSQL defaults to 8K blocks, but FreeBSD uses 16K
> > blocks which means that currently, reading two blocks of data in PG is
> > two read calls to the OS, one reads 16K of data off disk and returns
> > the 1st page, the 2nd call pulls the 2nd block from the FS cache.  In
> > making things 16K, it avoids the need for the 2nd system call which is
> > where the performance difference is coming from, afaikt.  -sc
>
> Are you _sure_ this won't cause any atomicity problems?  Can FreeBSD write
> 16k as an atomic unit?

We pre-modified page images to WAL before modifying the page.  The disks
are only 512-byte blocks, so we don't rely on file system atomicity
anymore anyway.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: The results of my PostgreSQL/filesystem performance tests

From
Bruce Momjian
Date:
Balazs Wellisch wrote:
>
> Bill,
>
> Very interesting results. I'd like to command you on your honesty.
> Having started out with the intentions of proving that FreeBSD is faster
> than Linux only to find that the opposite is true must not have been
> rewarding for you. However, these unexpected results serve only to
> reinforce the integrity of your tests.

Looking at the results, ext2 is out because it isn't crash safe, and I
have heard Reiser uses a lot more CPU to do its work.  It does show
ext3 as slow, which was expected.  Interesting how JFS came out, and XFS
would be interesting.  And, of course, it is multiple backends that
really shows PostgreSQL off, so it could radically affect the results.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

>>>>> "SC" == Sean Chittenden <sean@chittenden.org> writes:

>> I need to step in and do 2 things:
SC> Thanks for posting that.  Let me know if you have any questions while
SC> doing your testing.  I've found that using 16K blocks on FreeBSD
SC> results in about an 8% speedup in writes to the database, fwiw.

Just double checking: if I do this, then I need to halve the
parameters in postgresql.conf that involve buffers, specifically,
max_fsm_pages and shared_buffers.  I think max_fsm_pages should be
adjusted since the number of pages in the system overall has been
halved.

Anything else that should be re-tuned for this?

My tests are still running so I don't have numbers yet.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: FreeBSD page size

From
Vivek Khera
Date:
Ok... simple tests have completed.  Here are some numbers.

FreeBSD 4.8
PG 7.4b2
4GB Ram
Dual Xeon 2.4GHz processors
14 U320 SCSI disks attached to Dell PERC3/DC RAID controller in RAID 5
 config with 32k stripe size

Dump file:
-rw-r--r--  1 vivek  wheel  1646633745 Aug 28 11:01 19-Aug-2003.dump

When restored (after deleting one index that took up ~1Gb -- turned
out it was redundant to another multi-column index):

% df -k /u/d02
Filesystem    1K-blocks     Used     Avail Capacity  Mounted on
/dev/amrd1s1e 226408360 18067260 190228432     9%    /u/d02



postgresql.conf alterations from standard:
shared_buffers = 60000
sort_mem = 8192
vacuum_mem=131702
max_fsm_pages=1000000
effective_cache_size=25600
random_page-cost = 2


restore time: 14777 seconds
vacuum analyze time: 30 minutes
select count(*) from user_list where owner_id=315;   50388.64 ms


the restore complained often about checkpoints occurring every few
seconds:

Sep  2 11:57:14 d02 postgres[49721]: [5-1] LOG:  checkpoints are occurring too frequently (15 seconds apart)
Sep  2 11:57:14 d02 postgres[49721]: [5-2] HINT:  Consider increasing CHECKPOINT_SEGMENTS.

The HINT threw me off since I had to set checkpoint_segments in
postgresql.conf, where as CHECKPOINT_SEGMENTS implied to me a
compile-time constant.

Anyhow, so I deleted the PG data directory, and made these two
changes:

checkpoint_segments=50
sort_mem = 131702

This *really* improved the time for the restore:

restore time: 11594 seconds

then I reset the checkpoint_segments and sort_mem back to old
values...

vacuum analyze time is still 30 minutes
select count(*) from user_list where owner_id=315;   51363.98 ms

so the select appears a bit slower but it is hard to say why.  the
system is otherwise idle as it is not in production yet.


Then I took the suggestion to update PG's page size to 16k and did the
same increase on sort_mem and checkpoint_segments as above.  I also
halved the shared_buffers and max_fsm_pages  (probably should have
halved the effective_cache_size too...)

restore time: 11322 seconds
vacuum analyze time: 27 minutes
select count(*) from user_list where owner_id=315;   48267.66 ms


Granted, given this simple test it is hard to say whether the 16k
blocks will make an improvement under live load, but I'm gonna give it
a shot.  The 16k block size shows me roughly 2-6% improvement on these
tests.

So throw in my vote for 16k blocks on FreeBSD (and annotate the docs
to tell which parameters need to be halved to account for it).


--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: FreeBSD page size

From
Sean Chittenden
Date:
> Ok... simple tests have completed.  Here are some numbers.
>
> FreeBSD 4.8
> PG 7.4b2
> 4GB Ram
> Dual Xeon 2.4GHz processors
> 14 U320 SCSI disks attached to Dell PERC3/DC RAID controller in RAID 5
>  config with 32k stripe size
[snip]
> Then I took the suggestion to update PG's page size to 16k and did the
> same increase on sort_mem and checkpoint_segments as above.  I also
> halved the shared_buffers and max_fsm_pages  (probably should have
> halved the effective_cache_size too...)
>
> restore time: 11322 seconds
> vacuum analyze time: 27 minutes
> select count(*) from user_list where owner_id=315;   48267.66 ms
>
>
> Granted, given this simple test it is hard to say whether the 16k
> blocks will make an improvement under live load, but I'm gonna give it
> a shot.  The 16k block size shows me roughly 2-6% improvement on these
> tests.
>
> So throw in my vote for 16k blocks on FreeBSD (and annotate the docs
> to tell which parameters need to be halved to account for it).

I haven't had a chance to run any tests yet (ELIFE), but there was a
suggestion that 32K blocks was a better performer than 16K blocks
(!!??!!??).  I'm not sure why this is and my only guess is that it
relies more heavily on the disk cache to ease IO.  Since you have the
hardware setup, Vivek, would it be possible for you to run a test with
32K blocks?

I've started writing a threaded benchmarking program called pg_crush
that I hope to post here in a few days that'll time connection startup
times, INSERTs, DELETEs, UPDATEs, and both sequential scans as well as
index scans for random and sequentially ordered tuples.  It's similar
to pgbench, except it generates its own data, uses pthreads (chears on
KSE!), and returns more fine grained timing information for the
various activities.

-sc

--
Sean Chittenden

Re: FreeBSD page size

From
Vivek Khera
Date:
>>>>> "SC" == Sean Chittenden <sean@chittenden.org> writes:

SC> hardware setup, Vivek, would it be possible for you to run a test with
SC> 32K blocks?

Will do.  What's another 4 hours... ;-)

I guess I'll halve the buffer size parameters again...

SC> I've started writing a threaded benchmarking program called pg_crush
SC> that I hope to post here in a few days that'll time connection startup

Ok.  Please post it when it is ready.  I've decided to wait until 7.4
is final before going to production so I've got this very expensive
very fast box doing not much of anything for a little while...

Re: FreeBSD page size

From
"Marc G. Fournier"
Date:

Just curious, but Bruce(?) mentioned that apparently a 32k block size was
found to show a 15% improvement ... care to run one more test? :)

On Wed, 3 Sep 2003, Vivek Khera wrote:

> Ok... simple tests have completed.  Here are some numbers.
>
> FreeBSD 4.8
> PG 7.4b2
> 4GB Ram
> Dual Xeon 2.4GHz processors
> 14 U320 SCSI disks attached to Dell PERC3/DC RAID controller in RAID 5
>  config with 32k stripe size
>
> Dump file:
> -rw-r--r--  1 vivek  wheel  1646633745 Aug 28 11:01 19-Aug-2003.dump
>
> When restored (after deleting one index that took up ~1Gb -- turned
> out it was redundant to another multi-column index):
>
> % df -k /u/d02
> Filesystem    1K-blocks     Used     Avail Capacity  Mounted on
> /dev/amrd1s1e 226408360 18067260 190228432     9%    /u/d02
>
>
>
> postgresql.conf alterations from standard:
> shared_buffers = 60000
> sort_mem = 8192
> vacuum_mem=131702
> max_fsm_pages=1000000
> effective_cache_size=25600
> random_page-cost = 2
>
>
> restore time: 14777 seconds
> vacuum analyze time: 30 minutes
> select count(*) from user_list where owner_id=315;   50388.64 ms
>
>
> the restore complained often about checkpoints occurring every few
> seconds:
>
> Sep  2 11:57:14 d02 postgres[49721]: [5-1] LOG:  checkpoints are occurring too frequently (15 seconds apart)
> Sep  2 11:57:14 d02 postgres[49721]: [5-2] HINT:  Consider increasing CHECKPOINT_SEGMENTS.
>
> The HINT threw me off since I had to set checkpoint_segments in
> postgresql.conf, where as CHECKPOINT_SEGMENTS implied to me a
> compile-time constant.
>
> Anyhow, so I deleted the PG data directory, and made these two
> changes:
>
> checkpoint_segments=50
> sort_mem = 131702
>
> This *really* improved the time for the restore:
>
> restore time: 11594 seconds
>
> then I reset the checkpoint_segments and sort_mem back to old
> values...
>
> vacuum analyze time is still 30 minutes
> select count(*) from user_list where owner_id=315;   51363.98 ms
>
> so the select appears a bit slower but it is hard to say why.  the
> system is otherwise idle as it is not in production yet.
>
>
> Then I took the suggestion to update PG's page size to 16k and did the
> same increase on sort_mem and checkpoint_segments as above.  I also
> halved the shared_buffers and max_fsm_pages  (probably should have
> halved the effective_cache_size too...)
>
> restore time: 11322 seconds
> vacuum analyze time: 27 minutes
> select count(*) from user_list where owner_id=315;   48267.66 ms
>
>
> Granted, given this simple test it is hard to say whether the 16k
> blocks will make an improvement under live load, but I'm gonna give it
> a shot.  The 16k block size shows me roughly 2-6% improvement on these
> tests.
>
> So throw in my vote for 16k blocks on FreeBSD (and annotate the docs
> to tell which parameters need to be halved to account for it).
>
>
> --
> =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Vivek Khera, Ph.D.                Khera Communications, Inc.
> Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
> AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>

Re: FreeBSD page size

From
Bruce Momjian
Date:
Vivek Khera wrote:
> the restore complained often about checkpoints occurring every few
> seconds:
>
> Sep  2 11:57:14 d02 postgres[49721]: [5-1] LOG:  checkpoints are occurring too frequently (15 seconds apart)
> Sep  2 11:57:14 d02 postgres[49721]: [5-2] HINT:  Consider increasing CHECKPOINT_SEGMENTS.
>
> The HINT threw me off since I had to set checkpoint_segments in
> postgresql.conf, where as CHECKPOINT_SEGMENTS implied to me a
> compile-time constant.

Woo hoo, my warning worked.  Great.

I uppercased it because config parameters are uppercased in the
documentation.  Do we mention config parameters in any other error
messages?  Should it be lowercased?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: FreeBSD page size

From
Rod Taylor
Date:
> I uppercased it because config parameters are uppercased in the
> documentation.  Do we mention config parameters in any other error
> messages?  Should it be lowercased?

How about changing the hint?

Consider increasing CHECKPOINT_SEGMENTS in your postgresql.conf

Attachment

Re: FreeBSD page size

From
"Marc G. Fournier"
Date:

On Wed, 3 Sep 2003, Bruce Momjian wrote:

> Vivek Khera wrote:
> > the restore complained often about checkpoints occurring every few
> > seconds:
> >
> > Sep  2 11:57:14 d02 postgres[49721]: [5-1] LOG:  checkpoints are occurring too frequently (15 seconds apart)
> > Sep  2 11:57:14 d02 postgres[49721]: [5-2] HINT:  Consider increasing CHECKPOINT_SEGMENTS.
> >
> > The HINT threw me off since I had to set checkpoint_segments in
> > postgresql.conf, where as CHECKPOINT_SEGMENTS implied to me a
> > compile-time constant.
>
> Woo hoo, my warning worked.  Great.
>
> I uppercased it because config parameters are uppercased in the
> documentation.  Do we mention config parameters in any other error
> messages?  Should it be lowercased?

k, to me upper case denotes a compiler #define, so I would have been
confused ... I'd go with lower case and single quotes around it to denote
its a variable to be changed ...

Re: FreeBSD page size

From
Bruce Momjian
Date:
Marc G. Fournier wrote:
> On Wed, 3 Sep 2003, Bruce Momjian wrote:
>
> > Vivek Khera wrote:
> > > the restore complained often about checkpoints occurring every few
> > > seconds:
> > >
> > > Sep  2 11:57:14 d02 postgres[49721]: [5-1] LOG:  checkpoints are occurring too frequently (15 seconds apart)
> > > Sep  2 11:57:14 d02 postgres[49721]: [5-2] HINT:  Consider increasing CHECKPOINT_SEGMENTS.
> > >
> > > The HINT threw me off since I had to set checkpoint_segments in
> > > postgresql.conf, where as CHECKPOINT_SEGMENTS implied to me a
> > > compile-time constant.
> >
> > Woo hoo, my warning worked.  Great.
> >
> > I uppercased it because config parameters are uppercased in the
> > documentation.  Do we mention config parameters in any other error
> > messages?  Should it be lowercased?
>
> k, to me upper case denotes a compiler #define, so I would have been
> confused ... I'd go with lower case and single quotes around it to denote
> its a variable to be changed ...

Done.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: FreeBSD page size

From
Vivek Khera
Date:
>>>>> "MGF" == Marc G Fournier <scrappy@hub.org> writes:

MGF> Just curious, but Bruce(?) mentioned that apparently a 32k block size was
MGF> found to show a 15% improvement ... care to run one more test? :)


Well, it is hard to tell based on my quick and dirty test:

16k page size:
restore time: 11322 seconds
vacuum analyze time:  1663 seconds (27 minutes)
select count(*) from user_list where owner_id=315;   56666.64 ms



32k page size:
restore time: 11430 seconds
vacuum analyze time: 1346 seconds
select count(*) from user_list where owner_id=315;   63275.73 ms


one anomaly I note is that if I re-run the select count(*) query
above, the large the page size, the longer the query takes.  In the
standard 8k page size, it was on the order of 306ms, with 16k page
size it was over 1400, and with 32k page size nearly 3000ms.

Another anomaly I note is that for the larger indexes, the relpages
doesn't scale as expected.  ie, I'd expect roughly half the relpages
per index for 32k page size as for 16k page size, but this is not
always the case... some are about the same size and some are about 2/3
and some are about 1/2.  The smaller indexes are often the same number
of pages (when under 20 pages).


I think I'm going to write a synthetic load generator that does a
bunch of inserts to some linked tables with several indexes, then goes
thru and pounds on it (update/select) from multiple children with
occasional vacuum's thrown in.  That's the only way to get 'real'
numbers, it seems.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: FreeBSD page size

From
Bruce Momjian
Date:
Vivek Khera wrote:
> >>>>> "MGF" == Marc G Fournier <scrappy@hub.org> writes:
>
> MGF> Just curious, but Bruce(?) mentioned that apparently a 32k block size was
> MGF> found to show a 15% improvement ... care to run one more test? :)
>
>
> Well, it is hard to tell based on my quick and dirty test:

The 32k number is from Tatsuo testing a few years ago.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: FreeBSD page size

From
Vivek Khera
Date:
>>>>> "BM" == Bruce Momjian <pgman@candle.pha.pa.us> writes:


BM> The 32k number is from Tatsuo testing a few years ago.

Can you verify for me that these parameters in postgresql.conf are
based on the BLCKSZ (ie one buffer is the size of the BLCKSIZ macro):

shared_buffers
effective_cache_size

Logically it makes sense, but I want to be sure I'm adjusting my
postgresql.conf parameters accordingly when I try different block
sizes.

Thanks.

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.                Khera Communications, Inc.
Internet: khera@kciLink.com       Rockville, MD       +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

Re: FreeBSD page size (was Re: The results of my

From
"scott.marlowe"
Date:
On Wed, 3 Sep 2003, Vivek Khera wrote:

> >>>>> "SC" == Sean Chittenden <sean@chittenden.org> writes:
>
> >> I need to step in and do 2 things:
> SC> Thanks for posting that.  Let me know if you have any questions while
> SC> doing your testing.  I've found that using 16K blocks on FreeBSD
> SC> results in about an 8% speedup in writes to the database, fwiw.
>
> Just double checking: if I do this, then I need to halve the
> parameters in postgresql.conf that involve buffers, specifically,
> max_fsm_pages and shared_buffers.  I think max_fsm_pages should be
> adjusted since the number of pages in the system overall has been
> halved.
>
> Anything else that should be re-tuned for this?

Yes, effective_cache_size as well is measured in pgsql blocks.


Re: FreeBSD page size

From
Bruce Momjian
Date:
Vivek Khera wrote:
> >>>>> "BM" == Bruce Momjian <pgman@candle.pha.pa.us> writes:
>
>
> BM> The 32k number is from Tatsuo testing a few years ago.
>
> Can you verify for me that these parameters in postgresql.conf are
> based on the BLCKSZ (ie one buffer is the size of the BLCKSIZ macro):
>
> shared_buffers
> effective_cache_size
>
> Logically it makes sense, but I want to be sure I'm adjusting my
> postgresql.conf parameters accordingly when I try different block
> sizes.

Uh, yes, I think they have to be the same because they are pages in the
shared buffer cache, not disk blocks.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: FreeBSD page size

From
Bruce Momjian
Date:
Vivek Khera wrote:
> >>>>> "BM" == Bruce Momjian <pgman@candle.pha.pa.us> writes:
>
>
> BM> The 32k number is from Tatsuo testing a few years ago.
>
> Can you verify for me that these parameters in postgresql.conf are
> based on the BLCKSZ (ie one buffer is the size of the BLCKSIZ macro):
>
> shared_buffers
> effective_cache_size
>
> Logically it makes sense, but I want to be sure I'm adjusting my
> postgresql.conf parameters accordingly when I try different block
> sizes.

Also, to check, you can use ipcs to see the shared memory sizes
allocated.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073