Thread: New to PostgreSQL, performance considerations

New to PostgreSQL, performance considerations

From
"Daniel van Ham Colchete"
Date:
Hi y'all,

Although I've worked with databases for more than 7 years now, I'm
pretty new to PostgreSQL.

I have an application using SQLite3 as an embedded SQL solution
because it's simple and it can handle the load that *most* of my
clients have.

Because of that '*most*' part, because of the client/server model, and
because of the license, I'm thinking about starting to use PostgreSQL.

My app uses only three tables: one has low read and really high write
rates, a second has high read and low write, and the third is equally
high on both.

I need a db that can handle something like 500 operations/sec
continuously. It's something like 250 writes/sec and 250 reads/sec. My
databases use indexes.

Each table would have to handle 5 million rows/day. So I'm thinking
about creating different tables (clusters?) for different days to make
queries return faster. Am I right, or is there no problem in having
150 million rows (one month) in a table?

All my data is e-mail traffic: users' quarantine, inbound traffic,
outbound traffic, senders, recipients, subjects, attachments, etc...

What do you people say, is it possible with PostgreSQL? What kind of
hardware would I need to handle that kind of traffic?

In a first test, on a badly tuned AMD Athlon XP 1800+ (ergh!) I could
do 1400 writes/sec locally after I disabled fsync. We have UPSs; in
the last year we had only one power failure.

Thank you all for your tips.

Best regards,
Daniel Colchete

Re: New to PostgreSQL, performance considerations

From
Shane Ambler
Date:
Daniel van Ham Colchete wrote:
> Although I've worked with databases for more than 7 years now, I'm
> pretty new to PostgreSQL.

Same here.

> I need a db that can handle something like 500 operations/sec
> continuously. It's something like 250 writes/sec and 250 reads/sec. My
> databases use indexes.

Taken from an email to the admin list about a week ago -

Stats about the system:
Postgres 8.1.4
db size: 200+ GB
Inheritance is used extremely heavily, so in figuring out what could
cause a create to hang, it may be of interest to know that there are:
101,745 tables
314,821 indexes
1,569 views
The last averages taken on the number of writes per hour on this
database: ~3 million (this stat is a few weeks old)

Machine info:
OS: Solaris 10
Sunfire X4100 XL
2x AMD Opteron Model 275 dual core procs
8GB of ram


> Each table would have to handle 5 million rows/day. So I'm thinking
> about creating different tables (clusters?) for different days to make
> queries return faster. Am I right, or is there no problem in having
> 150 million rows (one month) in a table?

Sounds to me like a month might be on the large side for really fast
response times - I would think of separating weekly rather than daily.

Start with
http://www.postgresql.org/docs/8.2/interactive/ddl-inherit.html
and then the next chapter, which explains using inheritance to partition
data into different tables dependent on specified criteria.
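
A minimal sketch of the idea (untested; table and column names are
invented for illustration):

CREATE TABLE maillog (
    logged_at  timestamp NOT NULL,
    sender     text,
    recipient  text
);

-- one child table per week; the CHECK constraint tells the planner
-- which rows can live in this child
CREATE TABLE maillog_2006w50 (
    CHECK ( logged_at >= DATE '2006-12-11'
        AND logged_at <  DATE '2006-12-18' )
) INHERITS (maillog);

-- route inserts to the right child (rule, trigger, or application
-- logic), query the parent, and let constraint exclusion skip the
-- irrelevant children:
SET constraint_exclusion = on;
SELECT count(*) FROM maillog
 WHERE logged_at >= DATE '2006-12-12'
   AND logged_at <  DATE '2006-12-13';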

You may be interested in tsearch2, which is in the contrib dir and adds
full text indexing.
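
From memory, the tsearch2 version looks something like this (treat it
as a sketch only; 'messages' and its columns are made up):

-- after installing contrib/tsearch2 into the database:
ALTER TABLE messages ADD COLUMN idxfti tsvector;
UPDATE messages SET idxfti = to_tsvector('default', subject);
CREATE INDEX messages_fti ON messages USING gist (idxfti);

SELECT * FROM messages
 WHERE idxfti @@ to_tsquery('default', 'invoice');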


--

Shane Ambler
pgSQL@007Marketing.com

Get Sheeky @ http://Sheeky.Biz

Re: New to PostgreSQL, performance considerations

From
"Daniel van Ham Colchete"
Date:
Hi Gene,

In my postgresql.conf, the only non-commented lines are:
fsync = off
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
max_connections = 100
shared_buffers = 5000
temp_buffers = 1000
work_mem = 4096

The only two values I changed are shared_buffers and work_mem.

*** BUT ***
I'm using Gentoo Linux, so all my libraries (including glibc, which is
very important to PostgreSQL) and all my software are compiled with
good CFLAG options for my processor ("-O2 -march=athlon-xp (...)"). My
Linux is not an Intel-AMD binary compatible turtle like
Fedora/RedHat/SUSE/... It's really important to have your GLIBC
compiled for your processor.  It is essential for performance.

I can't imagine anyone buying a $1k quad-core Xeon and using an
i586-compatible distro that doesn't even know what the fudge
SSE/SSE2/vectorized instructions are.

Best regards,
Daniel Colchete

On 12/10/06, Gene <genekhart@gmail.com> wrote:
> I have a similar type application, I'm partitioning using constraint
> exclusion so queries only have to look at a few tables. I've found that
> there is some overhead to using partitioning so you should test to see how
> many partitions you want to create. Could I check out your postgresql.conf
> parameters to compare? Thanks
>
>
> Gene Hart

Re: New to PostgreSQL, performance considerations

From
"Daniel van Ham Colchete"
Date:
Shane,

really good answer man!!! It really helped me!

I can't wait to have time to test this partitioning thing. It will
really solve a lot of my problems (and another one that I was
delaying).

Instead of making partitions over days/weeks, I can tune it down to
the second (timestamp) and try to find the best partition size for
performance. That partitioning thing is really good.

That server you sent me handles about 15 times more writes than I would have to.

Thank you very much for your answer.

Best regards,
Daniel Colchete

On 12/10/06, Shane Ambler <pgsql@007marketing.com> wrote:
> Daniel van Ham Colchete wrote:
> > Although I've worked with databases for more than 7 years now, I'm
> > pretty new to PostgreSQL.
>
> Same here.
>
> > I need a db that can handle something like 500 operations/sec
> > continuously. It's something like 250 writes/sec and 250 reads/sec. My
> > databases use indexes.
>
> Taken from an email to the admin list about a week ago -
>
> Stats about the system:
> Postgres 8.1.4
> db size: 200+ GB
> Inheritance is used extremely heavily, so in figuring out what could
> cause a create to hang, it may be of interest to know that there are:
> 101,745 tables
> 314,821 indexes
> 1,569 views
> The last averages taken on the number of writes per hour on this
> database: ~3 million (this stat is a few weeks old)
>
> Machine info:
> OS: Solaris 10
> Sunfire X4100 XL
> 2x AMD Opteron Model 275 dual core procs
> 8GB of ram
>
>
> > Each table would have to handle 5 million rows/day. So I'm thinking
> > about creating different tables (clusters?) for different days to make
> > queries return faster. Am I right, or is there no problem in having
> > 150 million rows (one month) in a table?
>
> Sounds to me like a month might be on the large side for really fast
> response times - I would think of separating weekly rather than daily.
>
> Start with
> http://www.postgresql.org/docs/8.2/interactive/ddl-inherit.html
> and then the next chapter, which explains using inheritance to partition
> data into different tables dependent on specified criteria.
>
> You may be interested in tsearch2, which is in the contrib dir and adds
> full text indexing.
>
>
> --
>
> Shane Ambler
> pgSQL@007Marketing.com
>
> Get Sheeky @ http://Sheeky.Biz
>

Re: New to PostgreSQL, performance considerations

From
Gene
Date:
I'm using Gentoo as well. I'm having performance issues as the number of partitions increases, I imagine due to the overhead of managing them and figuring out where to put each insert/update, so I'm switching to weekly partitions instead of daily. I believe that in PG 8.2 constraint exclusion works with updates/deletes also, so I'm eager to upgrade. I get about 1 million records per day in each of two tables, each record updated about 4 times within 30 minutes.
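
For what it's worth, this is roughly how I check that exclusion is
kicking in (hypothetical table name; the plan should only list the
partitions whose CHECK constraints overlap the WHERE clause):

SET constraint_exclusion = on;
EXPLAIN SELECT * FROM maillog
 WHERE logged_at >= DATE '2006-12-04'
   AND logged_at <  DATE '2006-12-11';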

Do you think using UTF8 vs US-ASCII hurts performance significantly? Some of my smaller tables require Unicode, and I don't think you can have some tables be Unicode and some be ASCII.

On 12/10/06, Daniel van Ham Colchete <daniel.colchete@gmail.com> wrote:
Hi Gene,

In my postgresql.conf, the only non-commented lines are:
fsync = off
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
max_connections = 100
shared_buffers = 5000
temp_buffers = 1000
work_mem = 4096

The only two values I changed are shared_buffers and work_mem.

*** BUT ***
I'm using Gentoo Linux, so all my libraries (including glibc, which is
very important to PostgreSQL) and all my software are compiled with
good CFLAG options for my processor ("-O2 -march=athlon-xp (...)"). My
Linux is not an Intel-AMD binary compatible turtle like
Fedora/RedHat/SUSE/... It's really important to have your GLIBC
compiled for your processor.  It is essential for performance.

I can't imagine anyone buying a $1k quad-core Xeon and using an
i586-compatible distro that doesn't even know what the fudge
SSE/SSE2/vectorized instructions are.

Best regards,
Daniel Colchete

On 12/10/06, Gene <genekhart@gmail.com> wrote:
> I have a similar type application, I'm partitioning using constraint
> exclusion so queries only have to look at a few tables. I've found that
> there is some overhead to using partitioning so you should test to see how
> many partitions you want to create. Could I check out your postgresql.conf
> parameters to compare? Thanks
>
>
> Gene Hart




--
Gene Hart
cell: 443-604-2679

Re: New to PostgreSQL, performance considerations

From
"Daniel van Ham Colchete"
Date:
Gene,

The Postgres docs say that "constraint checks are relatively
expensive". From what you're saying, it's really worth studying the
matter deeply first.

I never understood what difference the ASCII/ISO-8859-1/UTF8 charsets
make to a database. They're all simple C strings that don't have a
zero byte in the middle (like UTF-16 would) and that don't require any
different processing unless you are doing a case-insensitive search
(then you would have a problem).

ASCII chars are correct UTF8 chars as well. The first 128 Unicode
code points are the same as the ASCII chars. So you would not have any
problems changing your table from ASCII to UTF8. My software uses
UTF-16 and UTF-8 in some of its internals, and I only notice performance
problems with UTF-16 (because of the zero-byte thing, the processing I
do is different). So, I imagine that you wouldn't have any
performance issues changing from ASCII to UTF8 if necessary.

Nowadays everything is turning to Unicode (thank god). I wouldn't
start anything with any other charset; I would only be asking for a
rewrite in the near future.

Best,
Daniel

On 12/10/06, Gene <genekhart@gmail.com> wrote:
> I'm using Gentoo as well. I'm having performance issues as the number of
> partitions increases, I imagine due to the overhead of managing them and
> figuring out where to put each insert/update, so I'm switching to weekly
> partitions instead of daily. I believe that in PG 8.2 constraint exclusion
> works with updates/deletes also, so I'm eager to upgrade. I get about 1
> million records per day in each of two tables, each record updated about 4
> times within 30 minutes.
>
> Do you think using UTF8 vs US-ASCII hurts performance significantly? Some
> of my smaller tables require Unicode, and I don't think you can have some
> tables be Unicode and some be ASCII.

Re: New to PostgreSQL, performance considerations

From
Michael Glaesemann
Date:
On Dec 11, 2006, at 10:47, Daniel van Ham Colchete wrote:

> I never understood what difference the ASCII/ISO-8859-1/UTF8
> charsets make to a database.

If what you mean by ASCII is SQL_ASCII, then there is at least one
significant difference between UTF8 (the PostgreSQL encoding) and
SQL_ASCII. AIUI, SQL_ASCII does no checking at all with respect to
what bytes are going in, while UTF8 does make sure to the best of its
ability that the bytes represent valid UTF-8 characters, throwing an
error if an invalid byte sequence is detected.
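
A quick way to see the difference (hypothetical table; the exact error
text varies by version):

-- in a database created with ENCODING 'UTF8':
CREATE TABLE enc_test (s text);
INSERT INTO enc_test VALUES (E'\xc3\x28');  -- not valid UTF-8
-- ERROR:  invalid byte sequence for encoding "UTF8"
-- the same INSERT into a SQL_ASCII database is accepted as raw bytes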

There's more information regarding this here:
http://www.postgresql.org/docs/8.2/interactive/multibyte.html#MULTIBYTE-CHARSET-SUPPORTED

Michael Glaesemann
grzm seespotcode net



Re: New to PostgreSQL, performance considerations

From
Michael Stone
Date:
On Sun, Dec 10, 2006 at 11:02:44PM -0200, Daniel van Ham Colchete wrote:
>I'm using Gentoo Linux, so all my libraries (including glibc, which is
>very important to PostgreSQL) and all my software are compiled with
>good CFLAG options for my processor ("-O2 -march=athlon-xp (...)"). My
>Linux is not an Intel-AMD binary compatible turtle like
>Fedora/RedHat/SUSE/... It's really important to have your GLIBC
>compiled for your processor.  It is essential for performance.

Please, point to the benchmarks that demonstrate this for a postgres
application.

Mike Stone

Re: New to PostgreSQL, performance considerations

From
Dave Cramer
Date:
Hi Daniel
On 10-Dec-06, at 8:02 PM, Daniel van Ham Colchete wrote:

> Hi Gene,
>
> In my postgresql.conf, the only non-commented lines are:
> fsync = off
This can, and will, result in lost data.
> lc_messages = 'C'
> lc_monetary = 'C'
> lc_numeric = 'C'
> lc_time = 'C'

How much memory does this machine have and what version of postgresql
are you using?
> max_connections = 100
> shared_buffers = 5000
> temp_buffers = 1000
> work_mem = 4096
>
> The only two values I changed are shared_buffers and work_mem.

Dave
>
> *** BUT ***
> I'm using Gentoo Linux, so all my libraries (including glibc, which is
> very important to PostgreSQL) and all my software are compiled with
> good CFLAG options for my processor ("-O2 -march=athlon-xp (...)"). My
> Linux is not an Intel-AMD binary compatible turtle like
> Fedora/RedHat/SUSE/... It's really important to have your GLIBC
> compiled for your processor.  It is essential for performance.
>
> I can't imagine anyone buying a $1k quad-core Xeon and using an
> i586-compatible distro that doesn't even know what the fudge
> SSE/SSE2/vectorized instructions are.
>
> Best regards,
> Daniel Colchete
>
> On 12/10/06, Gene <genekhart@gmail.com> wrote:
>> I have a similar type application, I'm partitioning using constraint
>> exclusion so queries only have to look at a few tables. I've found
>> that
>> there is some overhead to using partitioning so you should test to
>> see how
>> many partitions you want to create. Could I check out your
>> postgresql.conf parameters to compare? Thanks
>>
>>
>> Gene Hart
>


Re: New to PostgreSQL, performance considerations

From
Alexander Staubo
Date:
On Dec 11, 2006, at 02:47, Daniel van Ham Colchete wrote:

> I never understood what difference the ASCII/ISO-8859-1/UTF8 charsets
> make to a database. They're all simple C strings that don't have a
> zero byte in the middle (like UTF-16 would) and that don't require any
> different processing unless you are doing a case-insensitive search
> (then you would have a problem).

That's not the whole story. UTF-8 and other variable-width encodings
don't provide a 1:1 mapping of logical characters to single bytes; in
particular, combining characters open the possibility of multiple
different byte sequences mapping to the same code point; therefore,
string comparison in such encodings generally cannot be done at the
byte level (unless, of course, you first ascertain that the strings
involved are all normalized to an unambiguous subset of your encoding).

PostgreSQL's use of strings is not limited to string comparison.
Substring extraction, concatenation, regular expression matching, up/
downcasing, tokenization and so on are all part of PostgreSQL's small
library of text manipulation functions, and all deal with logical
characters, meaning they must be Unicode-aware.
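
For example, in a UTF8-encoded database the character-aware functions
count logical characters rather than bytes (a small illustration):

SELECT length('résumé'), octet_length('résumé');
-- length: 6 characters; octet_length: 8 bytes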

Alexander.

Re: New to PostgreSQL, performance considerations

From
Tatsuo Ishii
Date:
> That's not the whole story. UTF-8 and other variable-width encodings
> don't provide a 1:1 mapping of logical characters to single bytes; in
> particular, combining characters open the possibility of multiple
> different byte sequences mapping to the same code point; therefore,
> string comparison in such encodings generally cannot be done at the
> byte level (unless, of course, you first ascertain that the strings
> involved are all normalized to an unambiguous subset of your encoding).

Can you show me such encodings supported by PostgreSQL other
than UTF-8?
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Re: New to PostgreSQL, performance considerations

From
Alexander Staubo
Date:
On Dec 11, 2006, at 04:35, Tatsuo Ishii wrote:

>> That's not the whole story. UTF-8 and other variable-width encodings
>> don't provide a 1:1 mapping of logical characters to single bytes; in
>> particular, combining characters open the possibility of multiple
>> different byte sequences mapping to the same code point; therefore,
>> string comparison in such encodings generally cannot be done at the
>> byte level (unless, of course, you first ascertain that the strings
>> involved are all normalized to an unambiguous subset of your
>> encoding).
>
> Can you tell me such encodings supported by PostgreSQL other
> than UTF-8?

http://www.postgresql.org/docs/8.1/interactive/multibyte.html#MULTIBYTE-CHARSET-SUPPORTED

Alexander.

Re: New to PostgreSQL, performance considerations

From
Alexander Staubo
Date:
On Dec 11, 2006, at 10:20, Tatsuo Ishii wrote:

> My question was what kind of encoding other than UTF-8 has a
> characteristic such as: "combining characters open the possibility
> of multiple different byte sequences mapping to the same code point"

No idea; perhaps only UTF-8. What I said was that variable-width
encodings don't provide a 1:1 mapping of logical characters to single
bytes; combining characters are a feature of Unicode.

Alexander.


Re: New to PostgreSQL, performance considerations

From
"Daniel van Ham Colchete"
Date:
On 12/11/06, Alexander Staubo <alex@purefiction.net> wrote:
> On Dec 11, 2006, at 02:47 , Daniel van Ham Colchete wrote:
>
> > I never understood what difference the ASCII/ISO-8859-1/UTF8 charsets
> > make to a database. They're all simple C strings that don't have a
> > zero byte in the middle (like UTF-16 would) and that don't require any
> > different processing unless you are doing a case-insensitive search
> > (then you would have a problem).
>
> That's not the whole story. UTF-8 and other variable-width encodings
> don't provide a 1:1 mapping of logical characters to single bytes; in
> particular, combining characters open the possibility of multiple
> different byte sequences mapping to the same code point; therefore,
> string comparison in such encodings generally cannot be done at the
> byte level (unless, of course, you first ascertain that the strings
> involved are all normalized to an unambiguous subset of your encoding).
>
> PostgreSQL's use of strings is not limited to string comparison.
> Substring extraction, concatenation, regular expression matching, up/
> downcasing, tokenization and so on are all part of PostgreSQL's small
> library of text manipulation functions, and all deal with logical
> characters, meaning they must be Unicode-aware.
>
> Alexander.
>

You're right. I was thinking only about my own cases, which take
Unicode normalization for granted and don't use
regexps/tokenization/...
Thanks

Best
Daniel

Re: New to PostgreSQL, performance considerations

From
"Daniel van Ham Colchete"
Date:
Hi Dave,

On 12/11/06, Dave Cramer <pg@fastcrypt.com> wrote:
> Hi Daniel
> On 10-Dec-06, at 8:02 PM, Daniel van Ham Colchete wrote:
>
> > Hi Gene,
> >
> > In my postgresql.conf, the only non-commented lines are:
> > fsync = off
> This can, and will, result in lost data.

I know... If there is a power failure, things can happen. I know, but
the performance difference is really, really big. I just have to decide
if I'm willing to take that chance or not.

> > lc_messages = 'C'
> > lc_monetary = 'C'
> > lc_numeric = 'C'
> > lc_time = 'C'
>
> How much memory does this machine have and what version of postgresql
> are you using?
It's only a test server with 512MB RAM; I only used it to see how well
PostgreSQL would do in an ugly case.

Daniel

Re: New to PostgreSQL, performance considerations

From
Dave Cramer
Date:
On 11-Dec-06, at 5:36 AM, Daniel van Ham Colchete wrote:

> Hi Dave,
>
> On 12/11/06, Dave Cramer <pg@fastcrypt.com> wrote:
>> Hi Daniel
>> On 10-Dec-06, at 8:02 PM, Daniel van Ham Colchete wrote:
>>
>> > Hi Gene,
>> >
>> > In my postgresql.conf, the only non-commented lines are:
>> > fsync = off
>> This can, and will, result in lost data.
>
> I know... If there is a power failure, things can happen. I know, but
> the performance difference is really, really big. I just have to decide
> if I'm willing to take that chance or not.
>
>> > lc_messages = 'C'
>> > lc_monetary = 'C'
>> > lc_numeric = 'C'
>> > lc_time = 'C'
>>
>> How much memory does this machine have and what version of postgresql
>> are you using?
> It's only a test server with 512MB RAM; I only used it to see how well
> PostgreSQL would do in an ugly case.

Given that optimal performance for postgresql can require up to 50%
of available memory, you are going to leave the OS with 256MB of
memory?

Dave
>
> Daniel
>


Re: New to PostgreSQL, performance considerations

From
"Daniel van Ham Colchete"
Date:
Mike,

Unfortunately I don't have any benchmarks right now. Doing something
like this on the same machine would be two days' work (at least).
Installing Gentoo and getting it to run well is not a quick task
(although it is easy).

But, trust me on this one. It's worth it. Think of this: PostgreSQL
and GNU libc use a lot of complex algorithms: btrees, hashes,
checksums, string functions, etc... And you have a lot of ways to
compile them into binary code. Now you have the Pentium4's vectorization
that allows you to run plenty of instructions in parallel, but AMD
doesn't have this. Intel also has SSE2, which makes double-precision
floating-point operations a lot faster; AMD doesn't have this either (at
least on 32 bits). Now imagine that you're RedHat and that you have to
deliver one CD for AMD and Intel servers. That means you can't use any
AMD-specific or Intel-specific technology at the binary level.

Things can get worse. If you take a look at GCC's code (at
gcc/config/i386/i386.c), you'll see that GCC knows what runs faster on
each processor. Let me give an example with the FDIV and FSQRT
instructions:

ARCH - FDIV - FSQRT (costs relative to an ADD)
i386: 88 - 122
i486: 73 - 83
pentium: 39 - 70
k6: 56 - 56
Athlon: 24 - 35
K8: 19 - 35
Pentium4: 43 - 43
Nocona: 40 - 44

Imagine that you are GCC and that you have two options in front of
you: you can use FSQRT or FDIV plus 20 ADD/SUB. In a Pentium
situation you should use the second option (39 + 20 = 59 is cheaper
than 70), but on a Pentium4 or an Athlon you should choose the first
one (43 < 43 + 20, and 35 < 24 + 20). This was only an example;
nowadays you would have to choose between 387, 3DNow!, 3DNow!+, SSE,
SSE2, ...

With this info, GCC knows how to choose the best way of doing things
for each processor. When you are compiling for a generic i586 (like
Fedora and RedHat), you are using the Pentium's timings.

An example that I know of: it's impossible to run my software at a
highly demanding customer without compiling it for their processor (I
make 5 compilations on every release). Using Intel's compiler for
Intel's processors makes things even faster, but it is not free, and
the "how fast" part really depends on how your application is coded.

With 64-bit processors, AMD and Intel restarted the process, and
everyone has SSE2 (but not SSE3). Even so, the timings are still very
different.

Best regards
Daniel

On 12/11/06, Michael Stone <mstone+postgres@mathom.us> wrote:
> On Sun, Dec 10, 2006 at 11:02:44PM -0200, Daniel van Ham Colchete wrote:
> >I'm using Gentoo Linux, so all my libraries (including glibc, which is
> >very important to PostgreSQL) and all my software are compiled with
> >good CFLAG options for my processor ("-O2 -march=athlon-xp (...)"). My
> >Linux is not an Intel-AMD binary compatible turtle like
> >Fedora/RedHat/SUSE/... It's really important to have your GLIBC
> >compiled for your processor.  It is essential for performance.
>
> Please, point to the benchmarks that demonstrate this for a postgres
> application.
>
> Mike Stone

Re: New to PostgreSQL, performance considerations

From
"Daniel van Ham Colchete"
Date:
> >>
> >> How much memory does this machine have and what version of postgresql
> >> are you using?
> > It's only a test server with 512MB RAM; I only used it to see how
> > well PostgreSQL would do in an ugly case.
>
> Given that optimal performance for postgresql can require up to 50%
> of available memory, you are going to leave the OS with 256MB of
> memory?

If that were the case I wouldn't have any problem letting the OS use
only 256MB, but this is not my production server. My production server
isn't built yet. It'll have at least 2GB of memory.

But it's good to know anyway.

Re: New to PostgreSQL, performance considerations

From
"Steinar H. Gunderson"
Date:
On Mon, Dec 11, 2006 at 09:05:56AM -0200, Daniel van Ham Colchete wrote:
> But, trust me on this one. It's worth it.

You know what? I don't.

> Think of this: PostgreSQL and GNU libc use a lot of complex algorithms:
> btrees, hashes, checksums, string functions, etc... And you have a lot of
> ways to compile them into binary code. Now you have the Pentium4's
> vectorization that allows you to run plenty of instructions in parallel,
> but AMD doesn't have this. Intel also has SSE2, which makes
> double-precision floating-point operations a lot faster; AMD doesn't have
> this either (at least on 32 bits).

Athlon 64 has SSE2, also in 32-bit mode.

Of course, it doesn't really matter, since at the instant you hit the disk
even once, it's going to take a million cycles and any advantage you got from
saving single cycles is irrelevant.

> Imagine that you are GCC and that you have two options in front of
> you: you can use FSQRT or FDIV plus 20 ADD/SUB.

Could you please describe a reasonable case where GCC would have such an
option? I cannot imagine any.

> An example that I know of: it's impossible to run my software at a
> highly demanding customer without compiling it for their processor (I
> make 5 compilations on every release).

What's "your software"? How can you make such assertions without backing them
up? How can you know that the same holds for PostgreSQL?

As Mike said, point to the benchmarks showing this "essential" difference
between -O2 and -O2 -mcpu=pentium4 (or whatever). The only single worthwhile
difference I can think of is that glibc can use the SYSENTER instruction if it
knows you have a 686 or higher (which includes AMD), and with recent kernels,
I'm not even sure if that is needed anymore.

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: New to PostgreSQL, performance considerations

From
Michael Stone
Date:
On Mon, Dec 11, 2006 at 09:05:56AM -0200, Daniel van Ham Colchete wrote:
>Unfortunately I don't have any benchmarks right now.

That's fairly normal for gentoo users pushing their compile options.

Mike Stone

Re: New to PostgreSQL, performance considerations

From
"Daniel van Ham Colchete"
Date:
On 12/11/06, Steinar H. Gunderson <sgunderson@bigfoot.com> wrote:
> On Mon, Dec 11, 2006 at 09:05:56AM -0200, Daniel van Ham Colchete wrote:
> > But, trust me on this one. It's worth it.
>
> You know what? I don't.
So test it yourself.

> > Think of this: PostgreSQL and GNU libc use a lot of complex algorithms:
> > btrees, hashes, checksums, string functions, etc... And you have a lot of
> > ways to compile them into binary code. Now you have the Pentium4's
> > vectorization that allows you to run plenty of instructions in parallel,
> > but AMD doesn't have this. Intel also has SSE2, which makes
> > double-precision floating-point operations a lot faster; AMD doesn't have
> > this either (at least on 32 bits).
>
> Athlon 64 has SSE2, also in 32-bit mode.
It's true. But I'm not saying that Postfix is faster on AMD or Intel
systems. I'm saying that it's a lot faster when you compile Postfix and
your glibc for your processor. AMD also has features that Intel systems
don't: 3DNow!, for example. The fact is that if your distro is
compatible with a plain Athlon, you can use neither SSE nor SSE2.

> Of course, it doesn't really matter, since at the instant you hit the disk
> even once, it's going to take a million cycles and any advantage you got from
> saving single cycles is irrelevant.

Really??? We're talking about high-performance systems, and every case
is different. I once saw a DDR2-RAM-based storage device (like 1TB).
Before you say it: I don't understand how it works, but you won't lose
your data on a reboot or power failure. It was very expensive but it
really solved this IO bottleneck thing. And even when your bottleneck
is the IO, it still makes no sense to waste CPU resources
unnecessarily.

> > Imagine that you are GCC and that you have two options in front of
> > you: you can use FSQRT or FDIV plus 20 ADD/SUB.
>
> Could you please describe a reasonable case where GCC would have such an
> option? I cannot imagine any.
As I said, it is an example. Take floating-point division. You have
plenty of ways of doing it: 387, MMX, SSE, 3DNow!, etc... Here GCC has
to make a choice. And this is only one case. Usually, compiler
optimizations are really complex and the processor's timings count a
lot.

At every optimization the compiler needs to measure the quickest path,
so it uses information about how the processor will run the code. If
you take a look at AMD's docs you will see that their SSE2
implementation is internally different from Intel's. So sometimes the
quickest path uses SSE2 and sometimes it doesn't. You also have to
count the cost of moving values between SSE registers and common ones.

If you still can't imagine any case, you can read Intel's assembler
reference. You'll see that there are a lot of ways of doing a lot of
things.

> > An example that I know of: it's impossible to run my software at a
> > highly demanding customer without compiling it for their processor (I
> > make 5 compilations on every release).
>
> What's "your software"? How can you make such assertions without backing them
> up? How can you know that the same holds for PostgreSQL?
>
> As Mike said, point to the benchmarks showing this "essential" difference
> between -O2 and -O2 -mcpu=pentium4 (or whatever). The only single worthwhile
> difference I can think of is that glibc can use the SYSENTER instruction if it
> knows you have a 686 or higher (which includes AMD), and with recent kernels,
> I'm not even sure if that is needed anymore.

Steinar, you should really test it. I won't read the PostgreSQL source
to point you to where it could use SSE or SSE2 or whatever. And I won't
read glibc's code.

You don't need to believe what I'm saying. You can read the GCC docs,
Intel's assembler reference, and AMD's docs about their processors and
about how different that arch is.

Best regards,
Daniel Colchete

Re: New to PostgreSQL, performance considerations

From
"Daniel van Ham Colchete"
Date:
This is very very very true :-)!

I just remembered one case with MySQL. When I changed the distro from
Conectiva 10 (an RPM-based, discontinued Brazilian distro) to Gentoo, a
MySQL operation that usually took 2 minutes to run finished in 47
seconds. This is absolutely vague; I have no way to prove it to you.
The old situation doesn't even exist anymore because I used the same
hardware for the upgrade. And I can't measure how much each factor
helped: compiling glibc and MySQL with good CFLAGS, rebuilding my
database in an ordered way, a newer kernel, etc. All I know is that
this process still runs in less than 1 minute (my database is larger
now).

I used the very same hardware: P4 3.0GHz, SATA disk without RAID. And
I only upgraded because Conectiva's support for version 10 ended and I
needed to keep my system up to date with the security patches.

Best,
Daniel

On 12/11/06, Michael Stone <mstone+postgres@mathom.us> wrote:
> On Mon, Dec 11, 2006 at 09:05:56AM -0200, Daniel van Ham Colchete wrote:
> >Unfortunately I don't have any benchmarks right now.
>
> That's fairly normal for gentoo users pushing their compile options.
>
> Mike Stone

Re: New to PostgreSQL, performance considerations

From
Christopher Browne
Date:
Oops! daniel.colchete@gmail.com ("Daniel van Ham Colchete") was seen spray-painting on a wall:
> But, trust me on this one. It's worth it.

No, the point of performance analysis is that you *can't* trust the
people that say "trust me on this one."

If you haven't got a benchmark where you can demonstrate a material
and repeatable difference, then you're just some Gentoo "speed racer"
making things up.
--
select 'cbbrowne' || '@' || 'acm.org';
http://linuxdatabases.info/info/wp.html
One  last  point  about metaphor,  poetry,   etc.   As  an example  to
illustrate these  capabilities  in   Sastric Sanskrit,    consider the
"bahuvrihi" construct (literally  "man with a  lot of rice") which  is
used currently  in   linguistics to  describe  references  outside  of
compounds.

Re: New to PostgreSQL, performance considerations

From
"Daniel van Ham Colchete"
Date:
You are right, Christopher.

Okay. Let's solve this matter.

What PostgreSQL benchmark software should I use???

I'll test PostgreSQL 8.1 on Fedora Core 6 and on Gentoo. I'll get
the same version FC6 uses and install it on my Gentoo. I'll use the
same hardware (different partitions for each).

I'm not saying that Gentoo is faster than FC6. I just want to prove
that if you compile your software to make better use of your
processor, it will run faster.

It might take a few days because I'm pretty busy right now at my job.

Best regards,
Daniel

On 12/11/06, Christopher Browne <cbbrowne@acm.org> wrote:
> Oops! daniel.colchete@gmail.com ("Daniel van Ham Colchete") was seen spray-painting on a wall:
> > But, trust me on this one. It's worth it.
>
> No, the point of performance analysis is that you *can't* trust the
> people that say "trust me on this one."
>
> If you haven't got a benchmark where you can demonstrate a material
> and repeatable difference, then you're just some Gentoo "speed racer"
> making things up.
> --
> select 'cbbrowne' || '@' || 'acm.org';
> http://linuxdatabases.info/info/wp.html
> One  last  point  about metaphor,  poetry,   etc.   As  an example  to
> illustrate these  capabilities  in   Sastric Sanskrit,    consider the
> "bahuvrihi" construct (literally  "man with a  lot of rice") which  is
> used currently  in   linguistics to  describe  references  outside  of
> compounds.
>

Re: New to PostgreSQL, performance considerations

From
"Steinar H. Gunderson"
Date:
On Mon, Dec 11, 2006 at 11:09:13AM -0200, Daniel van Ham Colchete wrote:
>> You know what? I don't.
> So test it yourself.

You're making the claims; you're supposed to be proving them...

> As I said, it is an example. Take floating-point division. You have
> plenty of ways of doing it: 387, MMX, SSE, 3DNow!, etc... Here GCC has
> to make a choice.

No, you don't. MMX, SSE and 3Dnow! will all give you the wrong result
(reduced precision). SSE2, on the other hand, has double precision floats, so
you might have a choice there -- except that PostgreSQL doesn't really do a
lot of floating-point anyhow.

> And this is only one case. Usually, compiler optimizations are really
> complex and the processor's timings count a lot.

You keep asserting this, with no good backing.

> If you still can't imagine any case, you can read Intel's assembler
> reference. You'll see that there are a lot of ways of doing a lot of
> things.

I've been programming x86 assembler for ten years or so...

> Steinar, you should really test it. I won't read the PostgreSQL source
> to point you to where it could use SSE or SSE2 or whatever. And I won't
> read glibc's code.

Then you should stop making these sort of wild claims.

> You don't need to believe what I'm saying. You can read the GCC docs,
> Intel's assembler reference, and AMD's docs about their processors and
> about how different that arch is.

I have.

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: New to PostgreSQL, performance considerations

From
"Daniel van Ham Colchete"
Date:
On 12/11/06, Steinar H. Gunderson <sgunderson@bigfoot.com> wrote:
> On Mon, Dec 11, 2006 at 11:09:13AM -0200, Daniel van Ham Colchete wrote:
> >> You know what? I don't.
> > So test it yourself.
>
> You're making the claims, you're supposed to be proving them...
>
> > As I said, it is an example. Take floating-point division. You have
> > plenty of ways of doing it: 387, MMX, SSE, 3DNow!, etc... Here GCC has
> > to make a choice.
>
> No, you don't. MMX, SSE and 3Dnow! will all give you the wrong result
> (reduced precision). SSE2, on the other hand, has double precision floats, so
> you might have a choice there -- except that PostgreSQL doesn't really do a
> lot of floating-point anyhow.
>
> > And this is only one case. Usually, compiler optimizations are really
> > complex and the processor's timings count a lot.
>
> You keep asserting this, with no good backing.
>
> > If you still can't imagine any case, you can read Intel's assembler
> > reference. You'll see that there are a lot of ways of doing a lot of
> > things.
>
> I've been programming x86 assembler for ten years or so...
So, I'm a newbie to you. I learned x86 assembler last year.

> > Steinar, you should really test it. I won't read the PostgreSQL source
> > to point you to where it could use SSE or SSE2 or whatever. And I won't
> > read glibc's code.
>
> Then you should stop making these sort of wild claims.
>
> > You don't need to believe what I'm saying. You can read the GCC docs,
> > Intel's assembler reference, and AMD's docs about their processors and
> > about how different that arch is.
>
> I have.
>
> /* Steinar */

Okay, I'll do the benchmarks. I just sent an e-mail about this to the
list. If you have any suggestions on how to run the benchmark, please
let me know.
I like it when I prove myself wrong, although it's much better when I'm
right :-)...

Best regards,
Daniel Colchete

Re: New to PostgreSQL, performance considerations

From
"Steinar H. Gunderson"
Date:
On Mon, Dec 11, 2006 at 11:17:06AM -0200, Daniel van Ham Colchete wrote:
> I just remembered one case with MySQL. When I changed the distro from
> Conectiva 10 (an RPM-based, discontinued Brazilian distro) to Gentoo, a
> MySQL operation that usually took 2 minutes to run finished in 47 seconds.

How do you know that this improvement had _anything_ to do with the use of
different optimization flags? Were the MySQL versions or configuration even
the same?

> This is absolutely vague.

Indeed it is.

> I have no way to prove it to you.

No, but you should stop making these sorts of "absolutely essential"
claims if you can't.

> And I can't measure how much each factor helped: compiling glibc and MySQL
> with good CFLAGS, rebuilding my database in an ordered way, a newer kernel,
> etc.

Exactly. So why are you attributing it to the first factor only? And why do
you think this would carry over to PostgreSQL?

Remember, anecdotal evidence isn't.

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: New to PostgreSQL, performance considerations

From
"Steinar H. Gunderson"
Date:
On Mon, Dec 11, 2006 at 11:31:48AM -0200, Daniel van Ham Colchete wrote:
> What PostgreSQL benchmark software should I use???

Look up the list archives; search for "TPC".

> I'll test PostgreSQL 8.1 on Fedora Core 6 and on Gentoo. I'll get
> the same version FC6 uses and install it on my Gentoo. I'll use the
> same hardware (different partitions for each).

Why do you want to compare FC6 and Gentoo? Wasn't your point that the -march=
was supposed to be the relevant factor here? In that case, you want to keep
all other things equal; so use the same distribution, only with -O2
-march=i686 vs. -march=athlon-xp (or whatever).

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: New to PostgreSQL, performance considerations

From
"Merlin Moncure"
Date:
On 12/11/06, Daniel van Ham Colchete <daniel.colchete@gmail.com> wrote:
> But, trust me on this one. It's worth it. Think of this: PostgreSQL
> and GNU libc use a lot of complex algorithms: btrees, hashes,
> checksums, string functions, etc... And you have a lot of ways to
> compile them into binary code. Now you have the Pentium4's vectorization
> that allows you to run plenty of instructions in parallel, but AMD
> doesn't have this. Intel also has SSE2, which makes double-precision
> floating-point operations a lot faster; AMD doesn't have this either (at
> least on 32 bits). Now imagine that you're RedHat and that you have to
> deliver one CD for AMD and Intel servers. That means you can't use any
> AMD-specific or Intel-specific technology at the binary level.

AMD processors since the K6-2, and I think Intel ones since the P-Pro,
are essentially RISC processors with a hardware microcode compiler that
translates and reorganizes instructions on the fly.  Instruction
choice and ordering were extremely important on older 32-bit
architectures (like the 486) but are much less important these days.  I
think you will find that while an optimized glibc might be faster in
specific contrived cases, the whole is unfortunately less than the sum
of its parts.

While SSE2 might be able to optimize things like video decoding and
the like, for most programs it's of little benefit and IMO a waste of
time.  Also, as others pointed out, things like cache hits/misses and
I/O considerations are actually much more important than instruction
execution speed.  We ran Gentoo here for months and did not find it
fast enough to merit the bleeding-edge quirks it has for production
environments.

If you dig assembly, there was an interesting tackle of the spinlocks
code on the hackers list last year IIRC.

merlin

Re: New to PostgreSQL, performance considerations

From
Chris Browne
Date:
daniel.colchete@gmail.com ("Daniel van Ham Colchete") writes:
> You are right Christopher.
>
> Okay. Let's solve this matter.
>
> What PostgreSQL benchmark software should I use???

pgbench is one option.

There's a TPC-W at pgFoundry
(<http://pgfoundry.org/projects/tpc-w-php/>).

There's the Open Source Database Benchmark
(<http://osdb.sourceforge.net/>).

Those are three reasonable options.

> I'll test PostgreSQL 8.1 on Fedora Core 6 and on Gentoo. I'll get
> the same version FC6 uses and install it on my Gentoo. I'll use the
> same hardware (different partitions for each).

Wrong approach.  You'll be comparing apples to oranges, because Gentoo
and Fedora pluck sources from different points in the source code
stream.

In order to prove what you want to prove, you need to run the
benchmarks all on Gentoo, where you run with 4 categorizations:

 1.  Where you run PostgreSQL and GLIBC without any processor-specific
     optimizations

 2.  Where you run PostgreSQL and GLIBC with all relevant
     processor-specific optimizations

 3.  Where you run PostgreSQL with, and GLIBC without
     processor-specific optimizations

 4.  Where you run PostgreSQL without, and GLIBC with processor-specific
     optimizations

That would allow one to clearly distinguish which optimizations are
particularly relevant.

> I'm not saying that Gentoo is faster than FC6. I just want to prove
> that if you compile your software to make better use of your
> processor, it will run faster.
>
> It might take a few days because I'm pretty busy right now at my
> job.

I expect that you'll discover, if you actually do these tests, that
this belief is fairly much nonsense.

- Modern CPUs do a huge amount of on-CPU self-tuning.

- CPU features that could have a material effect tend to be unusable
  when compiling general purpose libraries and applications.  GCC
  doesn't generate MMX-like instructions.

- Database application performance tends to be I/O driven.

- When database application performance *isn't* I/O driven, it is
  likely to be driven by cache management, which compiler options
  won't affect.
--
output = reverse("ofni.secnanifxunil" "@" "enworbbc")
http://linuxdatabases.info/info/sgml.html
"very few people approach me in real life and insist on proving they
are drooling idiots."  -- Erik Naggum, comp.lang.lisp

Re: New to PostgreSQL, performance considerations

From
Ron
Date:
This definitely is the correct approach.

Actually, Daniel van Ham Colchete may not be as "all wet" as some
around here think.  We've had previous data that shows that pg can
become CPU bound (see previous posts by Josh Berkus and others
regarding CPU overhead in what should be IO bound tasks).

In addition, we know that x86-compatible 64-bit implementations differ
enough between AMD and Intel products that it sometimes shows up in
benchmarks.

Evidence outside the DBMS arena supports the hypothesis that recent
CPUs need more hand-holding and product-specific compilation, not
less, compared to their previous versions.

Side note: I wonder what, if anything, pg could gain from using SWAR
instructions (SSE*, MMX, etc.)?

I'd say the fairest attitude is to do everything we can to support
having the proper experiments done w/o presuming the results.

Ron Peacetree


At 10:47 AM 12/11/2006, Chris Browne wrote:

>In order to prove what you want to prove, you need to run the
>benchmarks all on Gentoo, where you run with 4 categorizations:
>
>  1.  Where you run PostgreSQL and GLIBC without any processor-specific
>      optimizations
>
>  2.  Where you run PostgreSQL and GLIBC with all relevant
>      processor-specific optimizations
>
>  3.  Where you run PostgreSQL with, and GLIBC without
>      processor-specific optimizations
>
>  4.  Where you run PostgreSQL without, and GLIBC with processor-specific
>      optimizations
>
>That would allow one to clearly distinguish which optimizations are
>particularly relevant.
>
> > I'm not saying that Gentoo is faster than FC6. I just want to prove
> > that if you compile your software to make better use of your
> > processor, it will run faster.
> >
> > It might take a few days because I'm pretty busy right now at my
> > job.
>
>I expect that you'll discover, if you actually do these tests, that
>this belief is fairly much nonsense.
>
>- Modern CPUs do a huge amount of on-CPU self-tuning.
>
>- CPU features that could have a material effect tend to be unusable
>   when compiling general purpose libraries and applications.  GCC
>   doesn't generate MMX-like instructions.
>
>- Database application performance tends to be I/O driven.
>
>- When database application performance *isn't* I/O driven, it is
>   likely to be driven by cache management, which compiler options
>   won't affect.


Re: New to PostgreSQL, performance considerations

From
Michael Stone
Date:
On Mon, Dec 11, 2006 at 12:15:51PM -0500, Ron wrote:
>I'd say the fairest attitude is to do everything we can to support
>having the proper experiments done w/o presuming the results.

Who's presuming results?[1] It is fair to say that making extraordinary
claims without any evidence should be discouraged. It's also fair to say
that if there are specific things that need cpu-specific tuning they'll
be fairly limited critical areas (e.g., locks) which would probably be
better implemented with hand-tuned code and runtime cpu detection than
by magical mystical compiler invocations.

Mike Stone

[1] I will say that I have never seen a realistic benchmark of general
code where the compiler flags made a statistically significant
difference in the runtime. There are some particularly cpu-intensive
codes, like some science simulations or encoding routines where they
matter, but that's not the norm--and many of those algorithms already
have hand-tuned versions which will outperform autogenerated code. You'd
think that with all the talk that the users of certain OS's generate
about CFLAG settings, there'd be some well-published numbers backing up
the hype. At any rate if there were numbers to back the claim then I
think they could certainly be considered without prejudice.

Re: New to PostgreSQL, performance considerations

From
Ron
Date:
Statements like these cannot be reasonably interpreted in any manner
_except_ that of presuming the results:

"I expect that you'll discover, if you actually do these tests, that
this belief  (that using arch specific compiler options lead to
better performing SW) is fairly much nonsense."

"...IMO a waste of time..."

etc

The correct objective response to claims w/o evidence is to request
evidence, and to do everything we can to support it being properly
gathered.  Not to try to discourage the claimant from even trying by
ganging up on them with multiple instances of Argument From Authority
or variations of Ad Hominem attacks.
(The validity of the claim has nothing to do with the skills or
experience of the claimant or anyone else in the discussion.  Only on
the evidence.)

  It is a tad unfair and prejudicial to call claims that CPU
optimizations matter to the performance of a DB product "extraordinary".
Evidence outside the DBMS field exists; and previous posts here show
that pg can indeed become CPU-bound during what should be IO bound tasks.
At the moment, Daniel's claims are not well supported.  That is far
different from being "extraordinary" given the current circumstantial
evidence.

Let's also bear in mind that as a community project, we can use all
the help we can get.  Driving potential resources away is in
opposition to that goal.

[1] The evidence that arch specific flags matter to performance can
be found as easily as recompiling your kernel or your
compiler.   While it certainly could be argued how "general purpose"
such SW is, the same could be said for just about any SW at some
level of abstraction.

Ron Peacetree


At 12:31 PM 12/11/2006, Michael Stone wrote:
>On Mon, Dec 11, 2006 at 12:15:51PM -0500, Ron wrote:
>>I'd say the fairest attitude is to do everything we can to support
>>having the proper experiments done w/o presuming the results.
>
>Who's presuming results?[1] It is fair to say that making
>extraordinary claims without any evidence should be discouraged.
>It's also fair to say that if there are specific things that need
>cpu-specific tuning they'll be fairly limited critical areas (e.g.,
>locks) which would probably be better implemented with hand-tuned
>code and runtime cpu detection than by magical mystical compiler invocations.
>
>Mike Stone
>
>[1] I will say that I have never seen a realistic benchmark of
>general code where the compiler flags made a statistically
>significant difference in the runtime. There are some particularly
>cpu-intensive codes, like some science simulations or encoding
>routines where they matter, but that's not the norm--and many of
>those algorithms already have hand-tuned versions which will
>outperform autogenerated code. You'd think that with all the talk
>that the users of certain OS's generate about CFLAG settings,
>there'd be some well-published numbers backing up the hype. At any
>rate if there were numbers to back the claim then I think they could
>certainly be considered without prejudice.


Re: New to PostgreSQL, performance considerations

From
"Luke Lonergan"
Date:
Michael,

On 12/11/06 9:31 AM, "Michael Stone" <mstone+postgres@mathom.us> wrote:

> [1] I will say that I have never seen a realistic benchmark of general
> code where the compiler flags made a statistically significant
> difference in the runtime.

Here's one - I wrote a general purpose Computational Fluid Dynamics analysis
method used by hundreds of people to perform aircraft and propulsion systems
analysis.  Compiler flag tuning would speed it up by factors of 2-3 or even
more on some architectures.  The reason it was so effective is that the
structure of the code was designed to be general, but also to expose the
critical performance sections in a way that the compilers could use - deep
pipelining/vectorization, unrolling, etc, were carefully made easy for the
compilers to exploit in critical sections.  Yes, this made the code in those
sections harder to read, but it was a common practice because it might take
weeks of runtime to get an answer and performance mattered.

The problem I see with general purpose DBMS code the way it's structured in
pgsql (and others) is that many of the critical performance sections are
embedded in abstract interfaces that obscure them from optimization.  An
example is doing a simple "is equal to" operation has many layers
surrounding it to ensure that UDFs can be declared and that special
comparison semantics can be accomodated.  But if you're simply performing a
large number of INT vs. INT comparisons, it will be thousands of times
slower than a CPU native operation because of the function call overhead,
etc.  I've seen presentations that show IPC of Postgres at about 0.5, versus
the 2-4 possible from the CPU.

Column databases like C-Store remove these abstractions at planner time to
expose native operations in large chunks to the compiler and the IPC
reflects that - typically 1+ and as high as 2.5.  If we were to redesign the
executor and planner to emulate that same structure we could achieve similar
speedups and the compiler would matter more.

- Luke



Re: New to PostgreSQL, performance considerations

From
Michael Stone
Date:
On Mon, Dec 11, 2006 at 01:20:50PM -0500, Ron wrote:
>(The validity of the claim has nothing to do with the skills or
>experience of the claimant or anyone else in the discussion.  Only on
>the evidence.)

Please go back and reread the original post. I don't think the response
was unwarranted.

>  It is a tad unfair and prejudicial to call claims that CPU
>optimizations matter to the performance of a DB product "extraordinary".
>Evidence outside the DBMS field exists; and previous posts here show
>that pg can indeed become CPU-bound during what should be IO bound tasks.
>At the moment, Daniel's claims are not well supported.  That is far
>different from being "extraordinary" given the current circumstantial
>evidence.

No, they're extraordinary regardless of whether postgres is CPU bound.
The question is whether cpu-specific compiler flags will have a
significant impact--which is, historically, fairly unlikely. Far more
likely is that performance can be improved with either a
non-cpu-specific optimization (e.g., loop unrolling vs not) or with an
algorithmic enhancement.

More importantly, you're arguing *your own* point, not the original
claim. I'll refresh your memory: "My Linux is not an Intel-AMD binary
compatible turtle like Fedora/RedHat/SUSE/... It's really important to
have your GLIBC compiled for your processor.  It is essential for
performance." You wanna draw the line between that (IMO, extraordinary)
claim and the rational argument that you're trying to substitute in its
place?

>[1] The evidence that arch specific flags matter to performance can
>be found as easily as recompiling your kernel or your
>compiler.

Then, please, point to the body of evidence. IME, the results of such
efforts aren't statistically all that significant on most workloads. I'm
sure there are edge cases, but it's certainly not going to be on my top
ten things to look at when tuning a database system. (If your kernel's
cpu utilization is the bottleneck in your database, you've probably got
bigger problems than compiler flags can solve.) Where you get the real
big benefits in a (linux) kernel recompile is when you select code
that's specifically tuned for a particular processor--not from
arch-specific gcc flags--and those sorts of things are increasingly
moving toward boot-time autotuning rather than compile-time manual
tuning for obvious reasons.

Mike Stone

Re: New to PostgreSQL, performance considerations

From
Michael Stone
Date:
On Mon, Dec 11, 2006 at 10:30:55AM -0800, Luke Lonergan wrote:
>Here's one - I wrote a general purpose Computational Fluid Dynamics analysis
>method used by hundreds of people to perform aircraft and propulsion systems
>analysis.

That's kinda the opposite of what I meant by general code. I was trying
(perhaps poorly) to distinguish between scientific codes and other
stuff (especially I/O or human interface code).

>Compiler flag tuning would speed it up by factors of 2-3 or even
>more on some architectures.  The reason it was so effective is that the
>structure of the code was designed to be general, but also to expose the
>critical performance sections in a way that the compilers could use - deep
>pipelining/vectorization, unrolling, etc, were carefully made easy for the
>compilers to exploit in critical sections.

It also sounds like code specifically written to take advantage of
compiler techniques, rather than random code thrown at a pile of cflags.
I don't disagree that it is possible to get performance improvements if
code is written to be performant code; I do (and did) disagree with the
idea that you'll get huge performance improvements by taking regular old
C application code and playing with compiler flags.

>Yes, this made the code in those
>sections harder to read, but it was a common practice because it might
>take weeks of runtime to get an answer and performance mattered.

IMO that's appropriate for some science codes (although I think even
that sector is beginning to find that they've gone too far in a lot of
ways), but for a database I'd rather have people debugging clean, readable
code than risking my data to something incomprehensible that runs in
optimal time.

>Column databases like C-Store remove these abstractions at planner time to
>expose native operations in large chunks to the compiler and the IPC
>reflects that - typically 1+ and as high as 2.5.  If we were to redesign the
>executor and planner to emulate that same structure we could achieve similar
>speedups and the compiler would matter more.

gcc --make-it-really-fast-by-rewriting-it-from-the-ground-up?

Mike Stone

Re: New to PostgreSQL, performance considerations

From
"Merlin Moncure"
Date:
On 12/11/06, Ron <rjpeace@earthlink.net> wrote:
> Statements like these can not be reasonably interpreted in any manner
> _except_ that of presuming the results:
>
> "I expect that you'll discover, if you actually do these tests, that
> this belief  (that using arch specific compiler options lead to
> better performing SW) is fairly much nonsense."
>
> "...IMO a waste of time..."
>
> etc
>
> The correct objective response to claims w/o evidence is to request
> evidence, and to do everything we can to support it being properly
> gathered.  Not to try to discourage the claimant from even trying by
> ganging up on them with multiple instances of Argument From Authority
> or variations of Ad Hominem attacks.
> (The validity of the claim has nothing to do with the skills or
> experience of the claimant or anyone else in the discussion.  Only on
> the evidence.)

/shrugs, this is not debate class; I just happen to have barked up
this particular tree before, and decided to share my insights from it.
A lot of the misunderstanding here stems from legacy perceptions
about how CPUs work, not to mention the architecture as a whole.  If
somebody produces hard facts to the contrary, great, and I encourage
them to do so.

also, some people posting here, not necessarily me, are authority figures. :-)

merlin

Re: New to PostgreSQL, performance considerations

From
Ron
Date:
At 01:47 PM 12/11/2006, Michael Stone wrote:
>On Mon, Dec 11, 2006 at 01:20:50PM -0500, Ron wrote:
>>(The validity of the claim has nothing to do with the skills or
>>experience of the claimant or anyone else in the discussion.  Only
>>on the evidence.)
>
>Please go back and reread the original post. I don't think the
>response was unwarranted.

So he's evidently young and perhaps a trifle over-enthusiastic.  We
were once too. ;-)

We are not going to get valuable contributions nor help people become
more valuable to the community by "flaming them into submission".

...and who knows, a =properly= done experiment may provide both
surprises and unexpected insights/benefits.

I agree completely with telling him he needs to get better evidence
and even with helping him understand how he should go about getting it.

It should be noted that his opposition has not yet done these
experiments either.  (Else they could just simply point to the
results that refute Daniel's hypothesis.)

The reality is that a new CPU architecture and multiple new memory
technologies are part of this discussion.  I certainly do not expect
them to change the fundamental thinking regarding how to get best
performance for a DBMS.  OTOH, there are multiple valid reasons to
give such new stuff a thorough and rigorous experimental shake-down.

ATM, =both= sides of this debate are lacking evidence for their POV.

Let's support getting definitive evidence.  No matter who brings it
to the table ;-)
Ron Peacetree


Re: New to PostgreSQL, performance considerations

From
"Daniel van Ham Colchete"
Date:
On 12/11/06, Steinar H. Gunderson <sgunderson@bigfoot.com> wrote:
> On Mon, Dec 11, 2006 at 11:17:06AM -0200, Daniel van Ham Colchete wrote:
> > I just remebered one case with MySQL. When I changed the distro from
> > Conectiva 10 (rpm-based ended brazilian distro) to Gentoo, a MySQL
> > operation that usually took 2 minutes to run, ended in 47 seconds.
>
> How do you know that this improvement had _anything_ to do with the use of
> different optimization flags? Were even the MySQL versions or configuration
> the same?
>
> > This is absolutely vage.
>
> Indeed it is.
Finally we agreed on something.

>
> > I don't have how to prove it to you.
>
> No, but you should stop making this sort of "absolutely essential" claims if
> you can't.
>
> > And I can't mesure how each factor helped: compiling glibc and Mysql with
> > good cflags, rebuilding my database in a ordered way, never kernel, etc..
>
> Exactly. So why are you attributing it to the first factor only? And why do
> you think this would carry over to PostgreSQL?
>
> Remember, anecdotal evidence isn't.

But that's exactly what I said. I'm not attributing this case to the
optimization factor alone. As I said, there are a lot of factors involved.
The MySQL version changed only by a minor upgrade (from 4.1.15 to 4.1.21).

Steinar, I'll do the benchmark and that will be the end of the story.

>
> /* Steinar */

Re: New to PostgreSQL, performance considerations

From
Ron
Date:
At 02:28 PM 12/11/2006, Merlin Moncure wrote:

>also, some people posting here, not necessarily me, are authority figures. :-)
>
>merlin

Noam Chomsky was one of the most influential thinkers Linguistics has
yet seen.  He was proven wrong a number of times, even
within Linguistics.
There are plenty of other historical examples.

As others have said, opinion without evidence and logic is just that- opinion.
And even Expert Opinion has been known to be wrong.  Sometimes very much so.

Part of what makes an expert an expert is that they can back up their
statements with evidence and logic that are compelling even to the
non expert when asked to do so.

All I'm saying is let's all remember how "assume" is spelled and
support the getting of some hard data.
Ron Peacetree



Re: New to PostgreSQL, performance considerations

From
"Luke Lonergan"
Date:
Michael,

On 12/11/06 10:57 AM, "Michael Stone" <mstone+postgres@mathom.us> wrote:

> That's kinda the opposite of what I meant by general code. I was trying
> (perhaps poorly) to distinguish between scientific codes and other
> stuff (especially I/O or human interface code).

Yes - choice of language has often been a differentiator in these markets -
LISP versus FORTRAN, C++ versus SQL/DBMS.  This isn't just about science;
it's also true in Business Intelligence - e.g. special-purpose datamining
code versus algorithms expressed inside a data management engine.

> It also sounds like code specifically written to take advantage of
> compiler techniques, rather than random code thrown at a pile of cflags.
> I don't disagree that it is possible to get performance improvements if
> code is written to be performant code; I do (and did) disagree with the
> idea that you'll get huge performance improvements by taking regular old
> C application code and playing with compiler flags.

Agreed - that's my point exactly.

> IMO that's appropriate for some science codes (although I think even
> that sector is beginning to find that they've gone too far in a lot of
> ways), but for a database I'd rather have people debugging clean, readable
> code than risking my data to something incomprehensible that runs in
> optimal time.

Certainly something of a compromise is needed.

>> Column databases like C-Store remove these abstractions at planner time to
>
> gcc --make-it-really-fast-by-rewriting-it-from-the-ground-up?

Maybe not from ground->up, but rather from about 10,000 ft -> 25,000 ft?

There are some who have done a lot of work studying the impact of more
efficient DBMS, see here:
  http://homepages.cwi.nl/~boncz/x100.html

- Luke



Re: New to PostgreSQL, performance considerations

From
"Daniel van Ham Colchete"
Date:
On 12/11/06, Steinar H. Gunderson <sgunderson@bigfoot.com> wrote:
> On Mon, Dec 11, 2006 at 11:31:48AM -0200, Daniel van Ham Colchete wrote:
> > What PostgreSQL benchmark software should I use???
>
> Look up the list archives; search for "TPC".
>
> > I'll test PostgreSQL 8.1 on a Fedora Core 6 and on a Gentoo. I'll get
> > the same version FC6 uses and install it at my Gentoo. I'll use the
> > same hardware (diferent partitions to each).
>
> Why do you want to compare FC6 and Gentoo? Wasn't your point that the -march=
> was supposed to be the relevant factor here? In that case, you want to keep
> all other things equal; so use the same distribution, only with -O2
> -march=i686 vs. -march=athlon-xp (or whatever).
>
> /* Steinar */
Using Gentoo is just an easy way to apply cflag optimizations to all the
other libs as well: glibc, ...

I can also measure performance on Gentoo with a cflag-optimized
PostgreSQL versus a plain PostgreSQL.

Re: New to PostgreSQL, performance considerations

From
Michael Stone
Date:
On Mon, Dec 11, 2006 at 02:51:09PM -0500, Ron wrote:
>Let's support getting definitive evidence.

Since nobody opposed the concept of contrary evidence, I don't suppose
you're fighting an uphill battle on that particular point.

It's fine to get preachy about supporting intellectual curiosity, but do
remember that it's a waste of everyone's (limited) time to give equal
time to all theories. If someone comes to you with an idea for a
perpetual motion machine your effort is probably better spent on
something other than helping him build it, regardless of whether that
somehow seems unfair. (Now if he brings a working model, that's a
different story...) Heck, even building a bunch of non-working perpetual
motion machines as a demonstration is a waste of time, because it's
always easy to say "well, if you had just...".  That's precisely why the
person bringing the extraordinary claim is also expected to bring the
proof, rather than expecting that everyone else prove the status quo.

Mike Stone

Re: New to PostgreSQL, performance considerations

From
"Daniel van Ham Colchete"
Date:
On 12/11/06, Chris Browne <cbbrowne@acm.org> wrote:
> daniel.colchete@gmail.com ("Daniel van Ham Colchete") writes:
> > You are right Christopher.
> >
> > Okay. Let's solve this matter.
> >
> > What PostgreSQL benchmark software should I use???
>
> pgbench is one option.
>
> There's a TPC-W at pgFoundry
> (<http://pgfoundry.org/projects/tpc-w-php/>).
>
> There's the Open Source Database Benchmark.
> (<http://osdb.sourceforge.net/>)
>
> Those are three reasonable options.
Thanks Chris, I'm going to take a look at those options.

>
> > I'll test PostgreSQL 8.1 on a Fedora Core 6 and on a Gentoo. I'll get
> > the same version FC6 uses and install it at my Gentoo. I'll use the
> > same hardware (diferent partitions to each).
>
> Wrong approach.  You'll be comparing apples to oranges, because Gentoo
> and Fedora pluck sources from different points in the source code
> stream.
>
> In order to prove what you want to prove, you need to run the
> benchmarks all on Gentoo, where you run with 4 categorizations:
>
>  1.  Where you run PostgreSQL and GLIBC without any processor-specific
>      optimizations
>
>  2.  Where you run PostgreSQL and GLIBC with all relevant
>      processor-specific optimizations
>
>  3.  Where you run PostgreSQL with, and GLIBC without
>      processor-specific optimizations
>
>  4.  Where you run PostgreSQL without, and GLIBC with processor-specific
>      optimizations
>
> That would allow one to clearly distinguish which optimizations are
> particularly relevant.

Good idea too, and it's much easier to do as well.
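
A four-way script like this should cover it (just a sketch - assuming
Portage picks up CFLAGS from the environment; flags and scale are
illustrative):

    for libc in "-O2 -march=i686" "-O2 -march=athlon-xp"; do
      for pg in "-O2 -march=i686" "-O2 -march=athlon-xp"; do
        CFLAGS="$libc" emerge --oneshot glibc       # generic vs. CPU-specific glibc
        CFLAGS="$pg"   emerge --oneshot postgresql  # same choice for PostgreSQL
        /etc/init.d/postgresql restart              # pick up the new build
        dropdb mydb; createdb mydb
        pgbench -i -s 10 mydb 2> /dev/null
        pgbench -v -n -t 600 -c 5 mydb              # record tps per combination
      done
    done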

Re: New to PostgreSQL, performance considerations

From
"Craig A. James"
Date:
Ron wrote:
> We are not going to get valuable contributions nor help people become
> more valuable to the community by "flaming them into submission".
>
> Let's support getting definitive evidence.  No matter who brings it to
> the table ;-)

Thanks, Ron, for a voice of respect and reason.  Since I first started using Usenet back in 1984, inexplicable rudeness
has been a plague on otherwise civilized people.  We're a community; we're all in this to help one another.  Sometimes
we give good advice, and sometimes even those "wearing the mantle of authority" can make boneheaded comments.  I know I
do, and when it happens, I always appreciate it when I'm taken to task with good humor and tolerance.

When someone comes to this forum with an idea you disagree with, no matter how brash or absurd their claims, it's so
easy to challenge them with grace and good humor, rather than chastising with harsh words and driving someone from our
community.  If you're right, you will have taught a valuable lesson to someone.  And if on occasion a newcomer shows us
something new, then we've all learned.  Either way, we have a new friend and contributor to the community.

Craig

Re: New to PostgreSQL, performance considerations

From
Mark Kirkwood
Date:
Daniel van Ham Colchete wrote:
> On 12/11/06, Steinar H. Gunderson <sgunderson@bigfoot.com> wrote:
>> On Mon, Dec 11, 2006 at 11:31:48AM -0200, Daniel van Ham Colchete wrote:
>> > What PostgreSQL benchmark software should I use???
>>
>> Look up the list archives; search for "TPC".
>>
>> > I'll test PostgreSQL 8.1 on a Fedora Core 6 and on a Gentoo. I'll get
>> > the same version FC6 uses and install it at my Gentoo. I'll use the
>> > same hardware (diferent partitions to each).
>>
>> Why do you want to compare FC6 and Gentoo? Wasn't your point that the
>> -march=
>> was supposed to be the relevant factor here? In that case, you want to
>> keep
>> all other things equal; so use the same distribution, only with -O2
>> -march=i686 vs. -march=athlon-xp (or whatever).
>>
>> /* Steinar */
> Using Gentoo is just an easy way to apply cflag optimizations to all the
> other libs as well: glibc, ...
>
> I can also measure performance on Gentoo with a cflag-optimized
> PostgreSQL versus a plain PostgreSQL.
>

I can certainly recall that when I switched from Fedora Core (2 or 3
can't recall now) to Gentoo that the machine was "faster" for many
activities. Of course I can't recall precisely what now :-(.

To actually track down *why* and *what* made it faster is another story,
and custom CFLAGS is only one of the possible factors; others could be:

- different kernel versions (Gentoo would have possibly been later)
- different kernel patches (both RedHat and Gentoo patch 'em)
- different versions of glibc (Gentoo possibly later again).
- different config options for glibc (not sure if they in fact are, but
it's possible...)
- kernel and glibc built with different versions of gcc (again I suspect
Gentoo may have used a later version)

So there are a lot of variables to consider if you want to settle this
debate once and for all :-)!

Best wishes

Mark

Re: New to PostgreSQL, performance considerations

From
"Daniel van Ham Colchete"
Date:
Hi yall,

I made some preliminary tests.

Before the results, I would like to acknowledge two things:
1 - I haven't shown any proof of the things I've said until now.
2 - It really is a waste of everyone's time to say one thing when I
can't prove it.

But everything I said comes from the knowledge I have been accumulating
over the past few years working on a project where optimization is
important. After algorithmic optimizations, compiler options are second
on my list, and with my software they show measurable improvement. The
other software I use seems to run faster too, but I didn't measure it.

TEST PROCEDURE
================
I ran this test on a Gentoo test machine I have here. It's a Pentium 4
3.0GHz (I don't know which P4) with 1 GB of RAM. It only uses
SATA drives. I didn't change my glibc (or any other lib) during the
test. I used GCC 3.4.6.

I ran each test three times, so we can get an idea of the average
values and standard deviation.

Each time I ran the test with the following commands:
dropdb mydb
createdb mydb
pgbench -i -s 10 mydb 2> /dev/null
psql -c 'vacuum analyze' mydb
psql -c 'checkpoint' mydb
sync
pgbench -v -n -t 600 -c 5 mydb

My postgresql.conf was the default one, except for:
    fsync = <depends on the test>
    shared_buffers = 10000
    work_mem = 10240

Every test's output begins with the following, which I removed because
it's always the same:
transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 5
number of transactions per client: 600
number of transactions actually processed: 3000/3000

TESTS RESULTS
==============
TEST 01: CFLAGS="-O2 -march=i686" fsync=false

tps = 734.948620 (including connections establishing)
tps = 736.866642 (excluding connections establishing)

tps = 713.225047 (including connections establishing)
tps = 715.039059 (excluding connections establishing)

tps = 721.769259 (including connections establishing)
tps = 723.631065 (excluding connections establishing)


TEST 02: CFLAGS="-O2 -march=i686" fsync=true

tps = 75.466058 (including connections establishing)
tps = 75.485675 (excluding connections establishing)

tps = 75.115797 (including connections establishing)
tps = 75.135311 (excluding connections establishing)

tps = 73.883113 (including connections establishing)
tps = 73.901997 (excluding connections establishing)


TEST 03: CFLAGS="-O2 -march=pentium4" fsync=false

tps = 846.337784 (including connections establishing)
tps = 849.067017 (excluding connections establishing)

tps = 829.476269 (including connections establishing)
tps = 832.008129 (excluding connections establishing)

tps = 831.416457 (including connections establishing)
tps = 835.300001 (excluding connections establishing)


TEST 04 CFLAGS="-O2 -march=pentium4" fsync=true

tps = 83.224016 (including connections establishing)
tps = 83.248157 (excluding connections establishing)

tps = 80.811892 (including connections establishing)
tps = 80.834525 (excluding connections establishing)

tps = 80.671406 (including connections establishing)
tps = 80.693975 (excluding connections establishing)


CONCLUSIONS
Everyone can draw their own conclusions. Mine are:

1 - You do get an improvement when you compile your PostgreSQL with
processor-specific instructions. With fsync on, the tps rate improves
by about 9%. With fsync off, the improvement is about 15.6%.

2 - You can still improve your indexes, SQL and everything else; this
only adds another possible improvement.

3 - I can't prove this, but I *think* this is related to the fact
that GCC knows how to do the same thing better on each processor.

4 - I'm still using source-based distros.

WHAT NOW
There are other things I wish to test:
 1 - What effect does an optimized glibc have on PostgreSQL?
 2 - How much improvement can I get playing with my postgresql.conf?
 3 - What effect do optimizations have on concurrency?
 4 - What if I used the Intel C++ Compiler instead of GCC?
 5 - What if I used GCC 4.1.1 instead of GCC 3.4.6?

I'm thinking about writing a script to run all the tests (more than 3
times each), collect the data and plot some graphs.

I don't have the time to do it right now; maybe next week I will.

I invite everyone to comment or make suggestions on the procedure or
the results.

Best regards,
Daniel Colchete

Re: New to PostgreSQL, performance considerations

From
"Luke Lonergan"
Date:
Daniel,

Good stuff.

Can you try this with just "-O3" versus "-O2"?

- Luke


On 12/11/06 2:22 PM, "Daniel van Ham Colchete" <daniel.colchete@gmail.com>
wrote:

> Hi yall,
>
> I made some preliminary tests.
> <snip>



Re: New to PostgreSQL, performance considerations

From
Michael Stone
Date:
On Mon, Dec 11, 2006 at 08:22:42PM -0200, Daniel van Ham Colchete wrote:
>TEST 01: CFLAGS="-O2 -march=i686" fsync=false
>
>tps = 734.948620 (including connections establishing)
>tps = 736.866642 (excluding connections establishing)
[snip]
>TEST 03: CFLAGS="-O2 -march=pentium4" fsync=false
>
>tps = 846.337784 (including connections establishing)
>tps = 849.067017 (excluding connections establishing)

Can anyone else reproduce these results? I'm on similar hardware (2.5GHz
P4, 1.5G RAM) and my test results are more like this:

(postgresql 8.2.0)

CFLAGS=(default)

tps = 527.300454 (including connections establishing)
tps = 528.898671 (excluding connections establishing)

tps = 517.874347 (including connections establishing)
tps = 519.404970 (excluding connections establishing)

tps = 534.934905 (including connections establishing)
tps = 536.562150 (excluding connections establishing)

CFLAGS=686

tps = 525.179375 (including connections establishing)
tps = 526.801278 (excluding connections establishing)

tps = 557.821136 (including connections establishing)
tps = 559.602414 (excluding connections establishing)

tps = 532.142941 (including connections establishing)
tps = 533.740209 (excluding connections establishing)


CFLAGS=pentium4

tps = 518.869825 (including connections establishing)
tps = 520.394341 (excluding connections establishing)

tps = 537.759982 (including connections establishing)
tps = 539.402547 (excluding connections establishing)

tps = 538.522198 (including connections establishing)
tps = 540.200458 (excluding connections establishing)

Mike Stone

Re: New to PostgreSQL, performance considerations

From
Christopher Browne
Date:
After a long battle with technology, mstone+postgres@mathom.us (Michael Stone), an earthling, wrote:
> [1] I will say that I have never seen a realistic benchmark of
> general code where the compiler flags made a statistically
> significant difference in the runtime.

When we were initially trying out PostgreSQL on AIX, I did some
(limited, admittedly) comparisons between behaviour when compiled
using GCC 3.something, VisualAge C, and VisualAge C++.  I did some
modifications of -O values; I didn't find differences amounting to
more than a percent or two between any of the combinations.

If there's to be a difference, anywhere, it ought to have figured
pretty prominently between a pretty elderly GCC version and IBM's top
of the line PowerPC compiler.
--
output = reverse("gro.mca" "@" "enworbbc")
http://cbbrowne.com/info/linuxdistributions.html
"I  doubt this language  difference would  confuse anybody  unless you
were providing instructions on the insertion of a caffeine enema."
-- On alt.coffee

Re: New to PostgreSQL, performance considerations

From
Greg Smith
Date:
On Mon, 11 Dec 2006, Michael Stone wrote:

> Can anyone else reproduce these results? I'm on similar hardware (2.5GHz P4,
> 1.5G RAM)...

There are two likely candidates for why Daniel's P4 3.0GHz significantly
outperforms your 2.5GHz system.

1) Most 2.5GHZ P4 processors use a 533MHz front-side bus (FSB); most
3.0GHZ ones use an 800MHz bus.

2) A typical motherboard paired with a 2.5GHz era processor will have a
single-channel memory interface; a typical 3.0GHZ era board supports
dual-channel DDR.

These changes could easily explain the magnitude of difference in results
you're seeing, especially when combined with a 20% greater raw CPU clock.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: New to PostgreSQL, performance considerations

From
Tom Lane
Date:
Greg Smith <gsmith@gregsmith.com> writes:
> On Mon, 11 Dec 2006, Michael Stone wrote:
>> Can anyone else reproduce these results? I'm on similar hardware (2.5GHz P4,
>> 1.5G RAM)...

> There are two likely candidates for why Daniel's P4 3.0GHz significantly
> outperforms your 2.5GHz system.

Um, you entirely missed the point: the hardware speedups you mention are
quite independent of any compiler options.  The numbers we are looking
at are the relative speeds of two different compiles on the same
hardware, not whether hardware A is faster than hardware B.

            regards, tom lane

Re: New to PostgreSQL, performance considerations

From
"Daniel van Ham Colchete"
Date:
Mike,

are you using "-mtune/-mcpu" or "-march" with GCC?
Witch GCC version? Are you working with a 32bits OS or 64bits?

Daniel

On 12/11/06, Michael Stone <mstone+postgres@mathom.us> wrote:
>
> Can anyone else reproduce these results? I'm on similar hardware (2.5GHz
> P4, 1.5G RAM) and my test results are more like this:
>
> (postgresql 8.2.0)
> <snip>
>
> Mike Stone

Re: New to PostgreSQL, performance considerations

From
Guido Neitzer
Date:
On 12.12.2006, at 02:37, Michael Stone wrote:

> Can anyone else reproduce these results? I'm on similar hardware
> (2.5GHz P4, 1.5G RAM) and my test results are more like this:

I'm on totally different hardware / software (MacBook Pro 2.33GHz
C2D) and I can't reproduce the results.

I have played with a lot of settings in the CFLAGS, including -march
and -O3 and -O2 - there is no significant difference in the tests.

With fsync=off I get around 2100tps on average with all the different
settings I have tested. I tried to get the rest of the setup as
similar to the one described by Daniel as possible. It might be that
the crappy Pentium 4 needs some special handling, but I can't get the
Core 2 Duo in my laptop to produce different tps numbers with the
different optimizations.

Btw: best results were 2147 with -march=i686 and 2137 with
-march=nocona. Both with -O3.

cug

Re: New to PostgreSQL, performance considerations

From
Alexander Staubo
Date:
On Dec 11, 2006, at 23:22 , Daniel van Ham Colchete wrote:

> I ran this test at a Gentoo test machine I have here. It's a Pentium 4
> 3.0GHz (I don't know witch P4)

Try cat /proc/cpuinfo.

> TESTS RESULTS
> ==============

On a dual-core Opteron 280 with 4G RAM with an LSI PCI-X Fusion-MPT
SAS controller, I am getting wildly uneven results:

tps = 264.775137 (excluding connections establishing)
tps = 160.365754 (excluding connections establishing)
tps = 151.967193 (excluding connections establishing)
tps = 148.010349 (excluding connections establishing)
tps = 260.973569 (excluding connections establishing)
tps = 144.693287 (excluding connections establishing)
tps = 148.147036 (excluding connections establishing)
tps = 259.485717 (excluding connections establishing)

I suspect the system's real maximum performance is ~150 tps, and that
the LSI's write cache is buffering the writes. I would love to validate
this hypothesis, but I'm not sure how.

Alexander.

Re: New to PostgreSQL, performance considerations

From
Michael Stone
Date:
On Tue, Dec 12, 2006 at 01:35:04AM -0500, Greg Smith wrote:
>These changes could easily explain the magnitude of difference in results
>you're seeing, expecially when combined with a 20% greater raw CPU clock.

I'm not interested in comparing the numbers between the systems (which
is obviously pointless); I am interested in the fact that there was a
consistent difference among the numbers on his system (based on
different optimizations) and no difference among mine.

Mike Stone

Re: New to PostgreSQL, performance considerations

From
Michael Stone
Date:
On Tue, Dec 12, 2006 at 07:10:34AM -0200, Daniel van Ham Colchete wrote:
>are you using "-mtune/-mcpu" or "-march" with GCC?

I used exactly the options you said you used.

>Which GCC version? Are you working with a 32-bit or a 64-bit OS?

3.3.5; 32

Mike Stone

Re: New to PostgreSQL, performance considerations

From
Michael Stone
Date:
On Tue, Dec 12, 2006 at 12:29:29PM +0100, Alexander Staubo wrote:
>I suspect the hardware's real maximum performance of the system is
>~150 tps, but that the LSI's write cache is buffering the writes. I
>would love to validate this hypothesis, but I'm not sure how.

With fsync off? The write cache shouldn't really matter in that case.
(And for this purpose that's probably a reasonable configuration.)

Mike Stone

Re: New to PostgreSQL, performance considerations

From
Cosimo Streppone
Date:
Luke Lonergan wrote:

> Can you try this with just "-O3" versus "-O2"?

Thanks to Daniel for doing these tests.
I happen to have done the same tests about 3/4 years ago,
and concluded that gcc flags did *not* influence performance.

Out of curiosity, I have now rerun those tests on a test
machine (single P4 @ 3.2 GHz, with 2 MB cache and 512 MB RAM).

Here are the results:

http://www.streppone.it/cosimo/work/pg/gcc.png

In short: tests executed with postgresql 8.2.0,
gcc version 3.4.3 20041212 (Red Hat 3.4.3-9.EL4),
tps figures computed as average of 9 pgbench runs (don't ask why 9... :-),
with exactly the same commands given by Daniel:

"-O0" ~ 957 tps
"-O1 -mcpu=pentium4 -mtune=pentium4" ~ 1186 tps
"-O2 -mcpu=pentium4 -mtune=pentium4" ~ 1229 tps
"-O3 -mcpu=pentium4 -mtune=pentium4" ~ 1257 tps
"-O6 -mcpu=pentium4 -mtune=pentium4" ~ 1254 tps

I'm curious now to get the same tests run with
a custom-cflags-compiled glibc.

--
Cosimo

Re: New to PostgreSQL, performance considerations

From
"Steinar H. Gunderson"
Date:
On Tue, Dec 12, 2006 at 01:42:06PM +0100, Cosimo Streppone wrote:
> "-O0" ~ 957 tps
> "-O1 -mcpu=pentium4 -mtune=pentium4" ~ 1186 tps
> "-O2 -mcpu=pentium4 -mtune=pentium4" ~ 1229 tps
> "-O3 -mcpu=pentium4 -mtune=pentium4" ~ 1257 tps
> "-O6 -mcpu=pentium4 -mtune=pentium4" ~ 1254 tps

For the record, -O3 = -O6 for regular gcc. It used to matter for pgcc, but
that is hardly in use anymore.

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: New to PostgreSQL, performance considerations

From
Michael Stone
Date:
On Tue, Dec 12, 2006 at 01:42:06PM +0100, Cosimo Streppone wrote:
>"-O0" ~ 957 tps
>"-O1 -mcpu=pentium4 -mtune=pentium4" ~ 1186 tps
>"-O2 -mcpu=pentium4 -mtune=pentium4" ~ 1229 tps
>"-O3 -mcpu=pentium4 -mtune=pentium4" ~ 1257 tps
>"-O6 -mcpu=pentium4 -mtune=pentium4" ~ 1254 tps
>
>I'm curious now to get the same tests run with
>a custom-cflags-compiled glibc.

I'd be curious to see -O2 with and without the arch-specific flags,
since that's mostly what the discussion is about.

Mike Stone

Re: New to PostgreSQL, performance considerations

From
Michael Stone
Date:
On Tue, Dec 12, 2006 at 07:48:06AM -0500, Michael Stone wrote:
>I'd be curious to see -O2 with and without the arch-specific flags,
>since that's mostly what the discussion is about.

That came across more harshly than I intended; I apologize for that.
It's certainly a useful data point to compare the various optimization
levels.

I'm rerunning the tests with gcc 4.1.2 and another platform to see if
that makes any difference.

Mike Stone

Re: New to PostgreSQL, performance considerations

From
Brad Nicholson
Date:
On Mon, 2006-12-11 at 20:22 -0200, Daniel van Ham Colchete wrote:
>
> I'm thinking about writing a script to make all the tests (more than 3
> times each), get the data and plot some graphs.
>
> I don't have the time right now to do it, maybe next week I'll have.

Check out the OSDL test suite stuff.  It runs the test and handles all
the reporting for you (including graphs).
http://www.osdl.org/lab_activities/kernel_testing/osdl_database_test_suite/

> I invite everyone to comment/sugest on the procedure or the results.

I'd recommend running each test for quite a bit longer.  Get your
buffers dirty and exercised, and see what things look like then.

Also, if you have the disk, offload your wal files to a separate disk.
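
One common way to do that (a sketch - paths are illustrative, and the
server must be stopped first):

    mv $PGDATA/pg_xlog /mnt/disk2/pg_xlog
    ln -s /mnt/disk2/pg_xlog $PGDATA/pg_xlog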
--
Brad Nicholson  416-673-4106
Database Administrator, Afilias Canada Corp.


Re: New to PostgreSQL, performance considerations

From
Alexander Staubo
Date:
On Dec 12, 2006, at 13:32 , Michael Stone wrote:

> On Tue, Dec 12, 2006 at 12:29:29PM +0100, Alexander Staubo wrote:
>> I suspect the system's real maximum performance is ~150 tps, and
>> that the LSI's write cache is buffering the writes. I would love to
>> validate this hypothesis, but I'm not sure how.
>
> With fsync off? The write cache shouldn't really matter in that
> case. (And for this purpose that's probably a reasonable
> configuration.)

No, fsync=on. The tps values are similarly unstable with fsync=off,
though -- I'm seeing bursts of high tps values followed by low-tps
valleys, a kind of staccato flow indicative of a write cache being
filled up and flushed.

Alexander.


Re: New to PostgreSQL, performance considerations

From
Ron
Date:
1= In all these results I'm seeing, no one has yet reported what
their physical IO subsystem is... ...when we are benching a DB.

2= So far we've got ~ a factor of 4 performance difference between
Michael Stone's 1S 1C Netburst era 2.5GHz P4 PC and Guido Neitzer's
1S 2C MacBook Pro 2.33GHz  C2D.  If the physical IO subsystems are
even close to equivalent across the systems benched so far, we've
clearly established that pg performance is more sensitive to factors
outside the physical IO subsystem than might usually be thought with
regard to a DBMS.  (At least for this benchmark SW.)

3= Daniel van Ham Colchete is running Gentoo.  That means every SW
component on his box has been compiled to be optimized for the HW it
is running on.
There may be a combination of effects going on for him that others
not running a system optimized from the ground up for its HW do not see.

4= If we are testing arch specific compiler options and only arch
specific compiler options, we should remove the OS as a variable.
Since Daniel has presented evidence in support of his hypothesis, the
first step should be to duplicate his environment as =exactly= as
possible and see if someone can independently reproduce the results
when the only significant difference is the human involved.  This
will guard against procedural error in the experiment.

Possible Outcomes
A= Daniel made a procedural error.  We all learn what it is and how to avoid it.
B= The Gentoo results are confirmed but no other OS shows this
effect.  Much digging ensues ;-)
C= Daniel's results are confirmed as platform independent once we
take all factors into account properly.
We all learn more re: how to best set up pg for highest performance.

Ron Peacetree


At 01:35 AM 12/12/2006, Greg Smith wrote:
>On Mon, 11 Dec 2006, Michael Stone wrote:
>
>>Can anyone else reproduce these results? I'm on similar hardware
>>(2.5GHz P4, 1.5G RAM)...
>
>There are two likely candidates for why Daniel's P4 3.0GHz
>significantly outperforms your 2.5GHz system.
>
>1) Most 2.5GHZ P4 processors use a 533MHz front-side bus (FSB); most
>3.0GHZ ones use an 800MHz bus.
>
>2) A typical motherboard paired with a 2.5GHz era processor will
>have a single-channel memory interface; a typical 3.0GHZ era board
>supports dual-channel DDR.
>
>These changes could easily explain the magnitude of difference in
>results you're seeing, especially when combined with a 20% greater
>raw CPU clock.


Re: New to PostgreSQL, performance considerations

From
Florian Weimer
Date:
* Cosimo Streppone:

> "-O0" ~ 957 tps
> "-O1 -mcpu=pentium4 -mtune=pentium4" ~ 1186 tps
> "-O2 -mcpu=pentium4 -mtune=pentium4" ~ 1229 tps
> "-O3 -mcpu=pentium4 -mtune=pentium4" ~ 1257 tps
> "-O6 -mcpu=pentium4 -mtune=pentium4" ~ 1254 tps

-mcpu and -mtune are synonymous.  You really should use -march here (but
the result is non-generic code).  Keep in mind that GCC does not
contain an instruction scheduler for the Pentium 4s.  I also believe
that the GCC switches are not fine-grained enough to cover the various
Pentium 4 variants.  For instance, some chips don't like the CMOV
instruction at all, but others can process it with decent speed.

--
Florian Weimer                <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99

Re: New to PostgreSQL, performance considerations

From
Bill Moran
Date:
In response to Ron <rjpeace@earthlink.net>:
>
> 3= Daniel van Ham Colchete is running Gentoo.  That means every SW
> component on his box has been compiled to be optimized for the HW it
> is running on.
> There may be a combination of effects going on for him that others
> not running a system optimized from the ground up for its HW do not see.

http://www.potentialtech.com/wmoran/source.php

You get an idea of how old these tests are from the fact that the latest
and greatest was FreeBSD 4.9 at the time, but I suppose they may still
be relevant.

--
Bill Moran
Collaborative Fusion Inc.

Re: New to PostgreSQL, performance considerations

From
Tom Lane
Date:
Alexander Staubo <alex@purefiction.net> writes:
> No, fsync=on. The tps values are similarly unstable with fsync=off,
> though -- I'm seeing bursts of high tps values followed by low-tps
> valleys, a kind of staccato flow indicative of a write cache being
> filled up and flushed.

It's notoriously hard to get repeatable numbers out of pgbench :-(

A couple of tips:
    * don't put any faith in short runs.  I usually use -t 1000
      plus -c whatever.
    * make sure you loaded the database (pgbench -i) with a scale
      factor (-s) at least equal to the maximum -c you want to test.
      Otherwise you're mostly measuring update contention.
    * pay attention to when checkpoints occur.  You probably need
      to increase checkpoint_segments if you want pgbench not to be
      checkpoint-bound.
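
Putting these together, an illustrative run (values are just examples)
might look like:

    pgbench -i -s 10 mydb        # scale factor >= the largest -c tested
    pgbench -t 1000 -c 10 mydb   # longer run, with -c no larger than -s
    # and in postgresql.conf, to space checkpoints out (default is 3):
    checkpoint_segments = 16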

            regards, tom lane

Re: New to PostgreSQL, performance considerations

From
"Daniel van Ham Colchete"
Date:
On 12/12/06, Florian Weimer <fweimer@bfk.de> wrote:
> * Cosimo Streppone:
>
> > "-O0" ~ 957 tps
> > "-O1 -mcpu=pentium4 -mtune=pentium4" ~ 1186 tps
> > "-O2 -mcpu=pentium4 -mtune=pentium4" ~ 1229 tps
> > "-O3 -mcpu=pentium4 -mtune=pentium4" ~ 1257 tps
> > "-O6 -mcpu=pentium4 -mtune=pentium4" ~ 1254 tps
>
> -mcpu and -mtune are synonymous.  You really should use -march here (but
> the result is non-generic code).  Keep in mind that GCC does not
> contain an instruction scheduler for the Pentium 4s.  I also believe
> that the GCC switches are not fine-grained enough to cover the various
> Pentium 4 variants.  For instance, some chips don't like the CMOV
> instruction at all, but others can process it with decent speed.

You can use -march=pentium4, -march=prescott and -march=nocona for the
different Pentium 4 processors. But you have to use -march (and not
-mcpu or -mtune) because without it you are still using only i386
instructions.
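
One quick way to see the difference (a sketch - any small C file will do):

    gcc -O2 -march=i686     -S -o i686.s     foo.c
    gcc -O2 -march=prescott -S -o prescott.s foo.c
    diff i686.s prescott.s   # the prescott output may use SSE/SSE2 instructions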

Daniel

Re: New to PostgreSQL, performance considerations

From
David Boreham
Date:
Alexander Staubo wrote:

> No, fsync=on. The tps values are similarly unstable with fsync=off,
> though -- I'm seeing bursts of high tps values followed by low-tps
> valleys, a kind of staccato flow indicative of a write cache being
> filled up and flushed.

Databases with checkpointing typically exhibit this cyclical throughput
syndrome.  (Put another way: this is to be expected, and you are correct
that it indicates buffered data being flushed to disk periodically.)




Re: New to PostgreSQL, performance considerations

From
Greg Smith
Date:
On Tue, 12 Dec 2006, Tom Lane wrote:

> Um, you entirely missed the point: the hardware speedups you mention are
> quite independent of any compiler options.  The numbers we are looking
> at are the relative speeds of two different compiles on the same
> hardware, not whether hardware A is faster than hardware B.

The point that I failed to make clear is that expecting Mike's system to
perform like Daniel's just because they have similar processors isn't
realistic, considering the changes that happened in the underlying
hardware during that period.  Having very different memory subsystems will
shift which optimizations are useful and which have minimal impact even if
the processor is basically the same.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: New to PostgreSQL, performance considerations

From
Alvaro Herrera
Date:
Tom Lane wrote:
> Alexander Staubo <alex@purefiction.net> writes:
> > No, fsync=on. The tps values are similarly unstable with fsync=off,
> > though -- I'm seeing bursts of high tps values followed by low-tps
> > valleys, a kind of staccato flow indicative of a write caching being
> > filled up and flushed.
>
> It's notoriously hard to get repeatable numbers out of pgbench :-(
>
> A couple of tips:
>     * don't put any faith in short runs.  I usually use -t 1000
>       plus -c whatever.
>     * make sure you loaded the database (pgbench -i) with a scale
>       factor (-s) at least equal to the maximum -c you want to test.
>       Otherwise you're mostly measuring update contention.
>     * pay attention to when checkpoints occur.  You probably need
>       to increase checkpoint_segments if you want pgbench not to be
>       checkpoint-bound.

While skimming over the pgbench source it has looked to me like it's
necessary to pass the -s switch (scale factor) to both the
initialization (-i) and the subsequent (non -i) runs.  I'm not sure if
this is obvious from the documentation but I thought it may be useful to
mention.


Re: New to PostgreSQL, performance considerations

From
Greg Smith
Date:
On Tue, 12 Dec 2006, Alvaro Herrera wrote:

> While skimming over the pgbench source it has looked to me like it's
> necessary to pass the -s switch (scale factor) to both the
> initialization (-i) and the subsequent (non -i) runs.

For non-custom runs, it's computed based on the number of branches.
Around line 1415 you should find:

                 res = PQexec(con, "select count(*) from branches");
                 ...
                 scale = atoi(PQgetvalue(res, 0, 0));

So it shouldn't be required during the run, just the initialization.

However, note that there were some recent bug fixes to the scaling
implementation, and I would recommend using the version that comes with
8.2 (pgbench 1.58 2006/10/21).  It may compile fine even if you copy that
pgbench.c into an older version's contrib directory; it's certainly a
drop-in replacement (and improvement) for the pgbench 1.45 that comes with
current Postgres 8.1 versions.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: New to PostgreSQL, performance considerations

From
"Daniel van Ham Colchete"
Date:
Mike,

I'm making some other tests here on another machine (also Gentoo). I
found out that PostgreSQL stops for a while if I change the -t
parameter on pgbench from 600 to 1000, and I get ~150 tps instead of
~950 tps.

I don't know why PostgreSQL stopped, but it was longer than 5 seconds
and my disk IO was consuming 100% of my CPU time during this period.
And I'm testing with my fsync turned off.

Maybe if you lower your -t value you will see this improvement too.

Best regards,
Daniel

On 12/12/06, Michael Stone <mstone+postgres@mathom.us> wrote:
> On Tue, Dec 12, 2006 at 07:10:34AM -0200, Daniel van Ham Colchete wrote:
> >are you using "-mtune/-mcpu" or "-march" with GCC?
>
> I used exactly the options you said you used.
>
> >Which GCC version? Are you working with a 32-bit or a 64-bit OS?
>
> 3.3.5; 32
>
> Mike Stone

Re: New to PostgreSQL, performance considerations

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> While skimming over the pgbench source it has looked to me like it's
> necessary to pass the -s switch (scale factor) to both the
> initialization (-i) and the subsequent (non -i) runs.

No, it's not supposed to be, and I've never found it needed in practice.
The code seems able to pull the scale out of the database (I forget how
it figures it out exactly).

            regards, tom lane

Re: New to PostgreSQL, performance considerations

From
Tom Lane
Date:
"Daniel van Ham Colchete" <daniel.colchete@gmail.com> writes:
> I'm making some other tests here on another machine (also Gentoo). I
> found out that PostgreSQL stops for a while if I change the -t
> parameter on pgbench from 600 to 1000, and I get ~150 tps instead of
> ~950 tps.

> I don't know why PostgreSQL stopped, but it was longer than 5 seconds
> and my disk IO was consuming 100% of my CPU time during this period.

Checkpoint?

            regards, tom lane

Re: New to PostgreSQL, performance considerations

From
Ron
Date:
At 10:47 AM 12/12/2006, Tom Lane wrote:

>It's notoriously hard to get repeatable numbers out of pgbench :-(
That's not a good characteristic in benchmarking SW...

Does the OSDL stuff have an easier time getting reproducible results?


>A couple of tips:
>         * don't put any faith in short runs.  I usually use -t 1000
> plus -c whatever.
>         * make sure you loaded the database (pgbench -i) with a
> scale factor (-s) at least equal to the maximum -c you want to test.
>           Otherwise you're mostly measuring update contention.
>         * pay attention to when checkpoints occur.  You probably
> need to increase checkpoint_segments if you want pgbench not to be
> checkpoint-bound.
This all looks very useful.  Can you give some guidance as to what
checkpoint_segments should be increased to?   Do the values you are
running  pgbench with suggest what value checkpoint_segments should be?

Ron Peacetree


Re: New to PostgreSQL, performance considerations

From
"Daniel van Ham Colchete"
Date:
I just made another test with a second Gentoo machine:

Pentium 4 3.0Ghz Prescott
GCC 4.1.1
Glibc 2.4
PostgreSQL 8.1.5
Kernel 2.6.17

Same postgresql.conf as yesterday's.

First test
==========
  GLIBC: -O2 -march=i686
  PostgreSQL: -O2 -march=i686
  Results: 974.638731 975.602142 975.882051 969.142503 992.914167
983.467131 983.231575 994.901330 970.375221 978.377467
  Average (error):  980 tps (13 tps)

Second test
===========
  GLIBC: -O2 -march=i686
  PostgreSQL: -O2 -march=prescott
  Results: 988.319643 976.152973 1006.482553 992.431322 983.090838
992.674065 989.216746 990.897615 987.129802 975.907955
  Average (error):  988 tps (15 tps)

Third test
==========
  GLIBC: -O2 -march=prescott
  PostgreSQL: -O2 -march=i686
  Results: 969.085400 966.187309 994.882325 968.715150 956.766771
970.151542 960.090571 967.680628 986.568462 991.756520
  Average (error): 973 tps (19 tps)

Fourth test
==========
  GLIBC: -O2 -march=prescott
  PostgreSQL: -O2 -march=prescott
  Results: 980.888371 978.128269 969.344669 978.021509 979.256603
993.236457 984.078399 981.654834 976.295925 969.796277
  Average (error):  979 tps (11 tps)

The results showed no significant change. The conclusion of today's
test would be that there is no improvement in PostgreSQL when using
-march=prescott.

I only see 3 differences between yesterday's server and today's: the
kernel version (y: 2.6.18, t: 2.6.17), the server uses an IDE hard drive
(yesterday's was SATA), and the gcc version (3.4.6 -> 4.1.1).

I don't know why we saw an improvement yesterday and not today.

Best
Daniel

On 12/12/06, Daniel van Ham Colchete <daniel.colchete@gmail.com> wrote:
> I'm making some other tests here on another machine (also Gentoo). I
> found out that PostgreSQL stops for a while if I change the -t
> parameter on pgbench from 600 to 1000, and I get ~150 tps instead of
> ~950 tps.

Re: New to PostgreSQL, performance considerations

From
Ron
Date:
At 01:35 PM 12/12/2006, Daniel van Ham Colchete wrote:
>I just made another test with a second Gentoo machine:

><snip>
>
>The results showed no significant change. The conclusion of today's
>test would be that there is no improvement in PostgreSQL when using
>-march=prescott.
>
>I only see 3 differences between yesterday's server and today's: the
>kernel version (y: 2.6.18, t: 2.6.17), the server uses an IDE
>hard drive (yesterday's was SATA), and the gcc version (3.4.6 -> 4.1.1).
>
>I don't know why we saw an improvement yesterday and not today.
SATA HDs, particularly SATA II HDs and _especially_ 10Krpm 150GB
SATA II Raptors, are going to have far better performance than older IDE HDs.

Do some raw bonnie++ benches on the two systems.  If the numbers from
bonnie++ are close to those obtained during the pgbench runs, then
the HDs are limiting pgbench.
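
E.g., something like this (a sketch - adjust the directory, size and
user to your setup; the -s value should be roughly twice RAM):

    bonnie++ -d /var/lib/postgresql/data -s 2048 -u postgres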

  Best would be to use the exact same HD IO subsystem on both boxes,
but that may not be feasible.

In general, it would be helpful if the entire config, HW + OS + pg
stuff, was documented when submitting benchmark results.
(For instance, it would not be outside the realm of plausibility for
Guido's C2D laptop to be HD IO limited and for Michael's 2.5 GHz P4 PC
to be CPU limited during pgbench runs.)

Ron Peacetree


Re: New to PostgreSQL, performance considerations

From
"Bucky Jordan"
Date:
> I just made another test with a second Gentoo machine:
>
> Pentium 4 3.0Ghz Prescott
> GCC 4.1.1
> Glibc 2.4
> PostgreSQL 8.1.5
> Kernel 2.6.17
>
> Same postgresql.conf as yesterday's.
>
> First test
> ==========
>   GLIBC: -O2 -march=i686
>   PostgreSQL: -O2 -march=i686
>   Results: 974.638731 975.602142 975.882051 969.142503 992.914167
> 983.467131 983.231575 994.901330 970.375221 978.377467
>   Average (error):  980 tps (13 tps)

Do you have any scripts for the above process you could share with the
list? I have a few machines (a woodcrest and an older Xeon) that I could
run this on...

Thanks,

Bucky

Re: New to PostgreSQL, performance considerations

From
Greg Smith
Date:
On Tue, 12 Dec 2006, Daniel van Ham Colchete wrote:

> I'm making some other tests here on another machine (also Gentoo). I
> found out that PostgreSQL stops for a while if I change the -t
> parameter on pgbench from 600 to 1000, and I get ~150 tps instead of
> ~950 tps.

Sure sounds like a checkpoint to me; the ones pgbench generates really
aren't fun to watch when running against IDE drives.  I've seen my test
system with 2 IDE drives pause for 15 seconds straight to process one when
fsync is on, caching is disabled on the WAL disk, and the shared_buffers
cache is large.

If you were processing 600 transactions/client without hitting a
checkpoint but 1000 is, try editing your configuration file, double
checkpoint_segments, restart the server, and then try again.  This is
cheating but will prove the source of the problem.
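
E.g. (illustrative values - the 8.1 default is 3):

    # in postgresql.conf:
    checkpoint_segments = 6
    # then restart, e.g.: pg_ctl restart -D $PGDATA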

This kind of behavior is what other list members were trying to suggest to
you before:  once you get disk I/O involved, that drives the performance
characteristics of so many database operations that small improvements in
CPU optimization are lost.  Running the regular pgbench code is so wrapped
in disk writes that it's practically a worst-case for what you're trying
to show.

I would suggest that you run all your optimization tests with the -S
parameter to pgbench that limits it to select statements.  That will let
you benchmark whether the core code is benefitting from the CPU
improvements without having disk I/O as the main driver of performance.
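
E.g., the same procedure as before, but select-only:

    pgbench -v -n -S -t 600 -c 5 mydb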

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: New to PostgreSQL, performance considerations

From
Christopher Browne
Date:
alex@purefiction.net (Alexander Staubo) wrote:
> On Dec 12, 2006, at 13:32 , Michael Stone wrote:
>
>> On Tue, Dec 12, 2006 at 12:29:29PM +0100, Alexander Staubo wrote:
>>> I suspect the hardware's real maximum performance of the system  is
>>> ~150 tps, but that the LSI's write cache is buffering the
>>> writes. I  would love to validate this hypothesis, but I'm not
>>> sure how.
>>
>> With fsync off? The write cache shouldn't really matter in that
>> case. (And for this purpose that's probably a reasonable
>> configuration.)
>
> No, fsync=on. The tps values are similarly unstable with fsync=off,
> though -- I'm seeing bursts of high tps values followed by low-tps
> valleys, a kind of staccato flow indicative of a write cache being
> filled up and flushed.

If that seems coincidental with checkpoint flushing, that would be one
of the notable causes of that sort of phenomenon.

You could get more readily comparable numbers either by:
 a) Increasing the frequency of checkpoint flushes, so they would be
    individually smaller, or
 b) Decreasing the frequency so you could exclude it from the time of
    the test.
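
In postgresql.conf terms that's roughly (a sketch for 8.1-era
settings, where checkpoint_timeout is given in seconds):

   # (a) smaller, more frequent checkpoints
   checkpoint_segments = 1
   checkpoint_timeout = 60

   # (b) checkpoints pushed outside the test window
   checkpoint_segments = 64
   checkpoint_timeout = 3600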
--
output = ("cbbrowne" "@" "gmail.com")
http://cbbrowne.com/info/slony.html
"It   can be   shown   that for any  nutty  theory,  beyond-the-fringe
political view or  strange religion there  exists  a proponent  on the
Net. The proof is left as an exercise for your kill-file."
-- Bertil Jonell

Re: New to PostgreSQL, performance considerations

From
Sven Geisler
Date:
Hi,

Did someone try '-mfpmath=sse -msse3'?

Would be interesting to know if -mfpmath=sse boosts the performance.

I'd guess the difference in the generated code isn't that large between
i686 and prescott. The bigger step is i386 to i686. '-mfpmath=sse
-msse3' will also use the SSE unit, which the classic i686 doesn't have.

CFLAGS=-O2 -march=prescott -mfpmath=sse -msse3
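
For a from-source build that would be something like (a sketch):

   CFLAGS="-O2 -march=prescott -mfpmath=sse -msse3" ./configure
   make && make install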

Best regards
Sven.

Daniel van Ham Colchete schrieb:
> I just made another test with a second Gentoo machine:
>
> Pentium 4 3.0Ghz Prescott
> GCC 4.1.1
> Glibc 2.4
> PostgreSQL 8.1.5
> Kernel 2.6.17
>
> Same postgresql.conf as yesterday's.
>
> First test
> ==========
>  GLIBC: -O2 -march=i686
>  PostgreSQL: -O2 -march=i686
>  Results: 974.638731 975.602142 975.882051 969.142503 992.914167
> 983.467131 983.231575 994.901330 970.375221 978.377467
>  Average (error):  980 tps (13 tps)
>
> Second test
> ===========
>  GLIBC: -O2 -march=i686
>  PostgreSQL: -O2 -march=prescott
>  Results: 988.319643 976.152973 1006.482553 992.431322 983.090838
> 992.674065 989.216746 990.897615 987.129802 975.907955
>  Average (error):  988 tps (15 tps)
>
> Third test
> ==========
>  GLIBC: -O2 -march=prescott
>  PostgreSQL: -O2 -march=i686
>  Results: 969.085400 966.187309 994.882325 968.715150 956.766771
> 970.151542 960.090571 967.680628 986.568462 991.756520
>  Average (error): 973 tps (19 tps)
>
> Forth test
> ==========
>  GLIBC: -O2 -march=prescott
>  PostgreSQL: -O2 -march=prescott
>  Results: 980.888371 978.128269 969.344669 978.021509 979.256603
> 993.236457 984.078399 981.654834 976.295925 969.796277
>  Average (error):  979 tps (11 tps)
>
> The results showed no significant change. The conclusion of today's
> test would be that there is no improvement in PostgreSQL when using
> -march=prescott.
>
> I only see 3 differences between yesterday's server and today's: the
> kernel version (y: 2.6.18, t: 2.6.17), the server uses an IDE hard drive
> (yesterday was SATA), and the gcc version (3.4.6 -> 4.1.1).
>
> I don't know why we saw an improvement yesterday but not today.
>
> Best
> Daniel
>
> On 12/12/06, Daniel van Ham Colchete <daniel.colchete@gmail.com> wrote:
>> I'm running some other tests here on another machine (also Gentoo). I
>> found out that PostgreSQL stalls for a while if I change the -t
>> parameter on pgbench from 600 to 1000 and I get ~150 tps instead of
>> ~950 tps.

Re: New to PostgreSQL, performance considerations

From
Cosimo Streppone
Date:
Michael Stone wrote:

> On Tue, Dec 12, 2006 at 01:42:06PM +0100, Cosimo Streppone wrote:
>> "-O0" ~ 957 tps
>> "-O1 -mcpu=pentium4 -mtune=pentium4" ~ 1186 tps
>> "-O2 -mcpu=pentium4 -mtune=pentium4" ~ 1229 tps
>> "-O3 -mcpu=pentium4 -mtune=pentium4" ~ 1257 tps
>> "-O6 -mcpu=pentium4 -mtune=pentium4" ~ 1254 tps
>>
>> I'm curious now to get the same tests run with
>> a custom-cflags-compiled glibc.
>
> I'd be curious to see -O2 with and without the arch-specific flags,
> since that's mostly what the discussion is about.

I ran the same tests, but only for:

1) '-O2'
2) '-O2 -march=pentium4 -mtune=pentium4 -mcpu=pentium4'
    (so no more doubts here, and thanks for gcc hints :-)

and I obtained respectively an average of 1238 (plain -O2)
vs. 1229 tps on 9 runs.
Disk subsystem is a standard desktop SATA, no more than that.

I also tried recompiling *only* pgbench with various options, but as
I expected (and hoped) nothing changed.

Interesting, eh?

--
Cosimo


Re: New to PostgreSQL, performance considerations

From
Ron
Date:
At 11:11 AM 12/13/2006, Cosimo Streppone wrote:

>Interesting, eh?
>
>Cosimo

What I find interesting is that so far Guido's C2D Mac laptop has
gotten the highest values by far in this set of experiments, and no
one else is even close.
The slowest results, Michael's, are on the system with what appears
to be the slowest CPU of the bunch; and the ranking of the rest of
the results seems to similarly depend on relative CPU
performance.  This is not what one would naively expect when benching
an IO-intensive app like a DBMS.

Given that the typical laptop has 1 HD, and a relatively
modest one at that in terms of performance (the fastest available are
7200rpm SATA drives or Seagate's 5400rpm perpendicular-recording
models), this feels very much like other factors are bottlenecking the
experiments to the point where Daniel's results regarding compiler
options are not actually being tested.

Anyone got a 2.33 GHz C2D box with a decent HD IO subsystem more
representative of a typical DB server hooked up to it?

Again, the best way to confirm/deny Daniel's results is to duplicate
the environment he obtained those results with as closely as possible
(preferably exactly) and then have someone else try to duplicate his results.

Also, I think the warnings regarding proper configuration of pgbench
and which version of pgbench to use are worthy of note.  Do we have
guidance yet as to what checkpoint_segments should be set
to?   Should we be considering using something other than pgbench for
such experiments?

Ron Peacetree



Re: New to PostgreSQL, performance considerations

From
"Bucky Jordan"
Date:
> What I find interesting is that so far Guido's C2D Mac laptop has
> gotten the highest values by far in this set of experiments, and no
> one else is even close.
> The slowest results, Michael's, are on the system with what appears
> to be the slowest CPU of the bunch; and the ranking of the rest of
> the results seems to similarly depend on relative CPU
> performance.  This is not what one would naively expect when benching
> an IO-intensive app like a DBMS.
>
> Given that the typical laptop has 1 HD, and a relatively
> modest one at that in terms of performance (the fastest available are
> 7200rpm SATA drives or Seagate's 5400rpm perpendicular-recording
> models), this feels very much like other factors are bottlenecking the
> experiments to the point where Daniel's results regarding compiler
> options are not actually being tested.
>
> Anyone got a 2.33 GHz C2D box with a decent HD IO subsystem more
> representative of a typical DB server hooked up to it?

I've only seen pgbench numbers > 2,000 tps either on really large
hardware (none of the above mentioned comes close) or where the results
are in memory due to a small database size (aka measuring update contention).

Just a guess, but these tests (compiler opts.) seem like they sometimes
show a benefit where the database is mostly in RAM (which I'd guess many
people have) since that would cause more work to be put on the
CPU/Memory subsystems.

Other people on the list hinted at this, but I share their hypothesis
that once you get IO involved as a bottleneck (which is a more typical
DB situation) you won't notice compiler options.

I've got a 2 socket x 2 core woodcrest poweredge 2950 with a BBC 6 disk
RAID I'll run some tests on as soon as I get a chance.

I'm also thinking for this test, there's no need to tweak the default
config other than maybe checkpoint_segments, since I don't really want
postgres using large amounts of RAM (all that does is require me to
build a larger test DB). Thoughts?
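
As a sizing sketch ('bench' is a placeholder database name; each -s
increment adds 100,000 rows to the accounts table, very roughly 15MB):

   pgbench -i -s 100 bench   # ~10M accounts rows, roughly 1.5GB
   pgbench -c 8 -t 10000 bench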

Thanks,

Bucky

Re: New to PostgreSQL, performance considerations

From
Ron
Date:
At 01:49 PM 12/13/2006, Bucky Jordan wrote:

>I've only seen pgbench numbers > 2,000 tps either on really large
>hardware (none of the above mentioned comes close) or where the results
>are in memory due to a small database size (aka measuring update contention).
Which makes a laptop achieving such numbers all the more interesting IMHO.


>Just a guess, but these tests (compiler opts.) seem like they
>sometimes show a benefit where the database is mostly in RAM (which
>I'd guess many people have) since that would cause more work to be
>put on the CPU/Memory subsystems.
The cases where the working set, or the performance-critical part of
the working set, of the DB is RAM-resident are very important ones ITRW.


>Other people on the list hinted at this, but I share their
>hypothesis that once you get IO involved as a bottleneck (which is a
>more typical DB situation) you won't notice compiler options.
Certainly makes intuitive sense.  OTOH, this list has seen discussion
of operations that should be IO-bound turning out to be CPU-bound,
evidently due to the expense of processing pg data structures.  Only
objective benches are going to tell us where the various limitations
on pg performance really are.


>I've got a 2 socket x 2 core woodcrest poweredge 2950 with a BBC 6
>disk RAID I'll run some tests on as soon as I get a chance.
>
>I'm also thinking for this test, there's no need to tweak the
>default config other than maybe checkpoint_segments, since I don't
>really want postgres using large amounts of RAM (all that does is
>require me to build a larger test DB).

Daniel's original system had 512MB RAM.  This suggests to me that
tests involving 256MB of pg memory should be plenty big enough.
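
In 8.1-era postgresql.conf syntax, where shared_buffers is a count of
8kB pages (assuming the default block size), 256MB would be:

   shared_buffers = 32768   # 32768 x 8kB = 256MB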


>Thoughts?
Hope they are useful.

Ron Peacetree


Re: New to PostgreSQL, performance considerations

From
Guido Neitzer
Date:
On 13.12.2006, at 19:03, Ron wrote:

> What I find interesting is that so far Guido's C2D Mac laptop has
> gotten the highest values by far in this set of experiments, and no
> one else is even close.

This might be the case because I tested with fsync=off, as my
internal hard drive would be a limiting factor and the results
wouldn't be really helpful. Perhaps it's still the IO system, I don't
know. I can try to reproduce the tests as closely as possible again.
Perhaps I had different settings on something, but I doubt that.

The new Core * CPUs from Intel are extremely fast with PostgreSQL.

> Anyone got a 2.33 GHz C2D box with a decent HD IO subsystem more
> representative of a typical DB server hooked up to it?

I also now have an Xserve with two Dual-Core Xeons and two SAS drives
(15k Seagates) in a mirrored RAID here. Will do some testing tomorrow.

Btw: I always compare only to my own results to have something
comparable - same test, same scripts, same db version, same operating
system and so on. The rest is just pure interest.

cug

Re: New to PostgreSQL, performance considerations

From
Bruce Momjian
Date:
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > While skimming over the pgbench source it has looked to me like it's
> > necessary to pass the -s switch (scale factor) to both the
> > initialization (-i) and the subsequent (non -i) runs.
>
> No, it's not supposed to be, and I've never found it needed in practice.
> The code seems able to pull the scale out of the database (I forget how
> it figures it out exactly).

pgbench is designed to be a general benchmark, meaning it exercises
all parts of the system.  I am thinking just re-executing a single SELECT
over and over again would be a better test of the CPU optimizations.
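
A crude sketch of that, reusing pgbench's own accounts table through
a single session ('bench' is a placeholder database name):

   # feed 100,000 copies of one indexed SELECT to a single backend
   yes 'SELECT abalance FROM accounts WHERE aid = 17;' | head -n 100000 > /tmp/onesel.sql
   time psql -q -d bench -f /tmp/onesel.sql > /dev/null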

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: New to PostgreSQL, performance considerations

From
"Rajesh Kumar Mallah"
Date:
[offtopic];
hmm, quite a long thread -- below are stats on the postings
Total Messages: 87    Total Participants: 27
-------------------------------------------------
     19 Daniel van Ham Colchete
     12 Michael Stone
      9 Ron
      5 Steinar H. Gunderson
      5 Alexander Staubo
      4 Tom Lane
      4 Greg Smith
      3 Luke Lonergan
      3 Christopher Browne
      2 Merlin Moncure
      2 Guido Neitzer
      2 Dave Cramer
      2 Cosimo Streppone
      2 Bucky Jordan
      1 Tatsuo Ishii
      1 Sven Geisler
      1 Shane Ambler
      1 Michael Glaesemann
      1 Mark Kirkwood
      1 Gene
      1 Florian Weimer
      1 David Boreham
      1 Craig A. James
      1 Chris Browne
      1 Brad Nicholson
      1 Bill Moran
      1 Alvaro Herrera
-------------------------------------------

Re: New to PostgreSQL, performance considerations

From
Josh Berkus
Date:
Bruce,

> pgbench is designed to be a general benchmark, meaning it exercises
> all parts of the system.  I am thinking just re-executing a single SELECT
> over and over again would be a better test of the CPU optimizations.

Mostly, though, pgbench just gives the I/O system a workout.  It's not a
really good general workload.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

Re: New to PostgreSQL, performance considerations

From
"Joshua D. Drake"
Date:
On Wed, 2006-12-13 at 18:36 -0800, Josh Berkus wrote:
> Bruce,
>
> > pgbench is designed to be a general benchmark, meaning it exercises
> > all parts of the system.  I am thinking just re-executing a single SELECT
> > over and over again would be a better test of the CPU optimizations.
>
> Mostly, though, pgbench just gives the I/O system a workout.  It's not a
> really good general workload.

It also will not utilize all cpus on a many cpu machine. We recently
found that the only way to *really* test with pgbench was to actually
run 4+ copies of pgbench at the same time.
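
e.g., roughly ('bench' is a placeholder; sum the per-copy tps figures
by hand afterwards):

   for i in 1 2 3 4; do
       pgbench -c 10 -t 10000 bench > pgbench.$i.out 2>&1 &
   done
   wait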

J


>
--

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate




Re: New to PostgreSQL, performance considerations

From
Tom Lane
Date:
"Joshua D. Drake" <jd@commandprompt.com> writes:
> On Wed, 2006-12-13 at 18:36 -0800, Josh Berkus wrote:
>> Mostly, though, pgbench just gives the I/O system a workout.  It's not a
>> really good general workload.

> It also will not utilize all cpus on a many cpu machine. We recently
> found that the only way to *really* test with pgbench was to actually
> run 4+ copies of pgbench at the same time.

The pgbench app itself becomes the bottleneck at high transaction
rates.  Awhile back I rewrote it to improve its ability to issue
commands concurrently, but then desisted from submitting the
changes --- if we change the app like that, future numbers would
be incomparable to past ones, which sort of defeats the purpose of a
benchmark no?

            regards, tom lane

Re: New to PostgreSQL, performance considerations

From
Michael Glaesemann
Date:
On Dec 14, 2006, at 14:44 , Tom Lane wrote:

> The pgbench app itself becomes the bottleneck at high transaction
> rates.  Awhile back I rewrote it to improve its ability to issue
> commands concurrently, but then desisted from submitting the
> changes --- if we change the app like that, future numbers would
> be incomparable to past ones, which sort of defeats the purpose of a
> benchmark no?

At the same time, if the current pgbench isn't the tool we want to
use, is this kind of backward comparison going to hinder any move to
improve it? It sounds like there's quite a bit of room for
improvement in pgbench, and in my opinion we should move forward to
make an improved tool, one that measures what we want to measure. And
while comparison with past results might not be possible, there
remains the possibility of rerunning the improved pgbench on previous
systems, I should think.

Michael Glaesemann
grzm seespotcode net



Re: New to PostgreSQL, performance considerations

From
Ron
Date:
Benchmarks, like any other SW, need modernizing and updating from time to time.

Given the multi-core CPU approach to higher performance as the
current fad in CPU architecture, we need a benchmark that is appropriate.

If SPEC feels it is appropriate to rev their benchmark suite
regularly, we probably should as well.

Ron Peacetree

At 12:44 AM 12/14/2006, Tom Lane wrote:
>"Joshua D. Drake" <jd@commandprompt.com> writes:
> > On Wed, 2006-12-13 at 18:36 -0800, Josh Berkus wrote:
> >> Mostly, though, pgbench just gives the I/O system a workout.  It's not a
> >> really good general workload.
>
> > It also will not utilize all cpus on a many cpu machine. We recently
> > found that the only way to *really* test with pgbench was to actually
> > run 4+ copies of pgbench at the same time.
>
>The pgbench app itself becomes the bottleneck at high transaction
>rates.  Awhile back I rewrote it to improve its ability to issue
>commands concurrently, but then desisted from submitting the
>changes --- if we change the app like that, future numbers would
>be incomparable to past ones, which sort of defeats the purpose of a
>benchmark no?


Re: New to PostgreSQL, performance considerations

From
"Gregory S. Williamson"
Date:
(Re)-Design it to do both, unless there's reason to believe that doing one after the other would skew the results.

Then old results are available, new results are also visible and useful for future comparisons. And seeing them side by
side might be an interesting exercise as well, at least for a while.

(sorry for top-posting -- web based interface that doesn't do proper quoting)

Greg Williamson
DBA
GlobeXplorer LLC

-----Original Message-----
From:    pgsql-performance-owner@postgresql.org on behalf of Michael Glaesemann
Sent:    Wed 12/13/2006 10:11 PM
To:    Tom Lane
Cc:    Joshua D. Drake; Josh Berkus; pgsql-performance@postgresql.org; Bruce Momjian; Alvaro Herrera; Alexander Staubo; Michael Stone
Subject:    Re: [PERFORM] New to PostgreSQL, performance considerations


On Dec 14, 2006, at 14:44 , Tom Lane wrote:

> The pgbench app itself becomes the bottleneck at high transaction
> rates.  Awhile back I rewrote it to improve its ability to issue
> commands concurrently, but then desisted from submitting the
> changes --- if we change the app like that, future numbers would
> be incomparable to past ones, which sort of defeats the purpose of a
> benchmark no?

At the same time, if the current pgbench isn't the tool we want to
use, is this kind of backward comparison going to hinder any move to
improve it? It sounds like there's quite a bit of room for
improvement in pgbench, and in my opinion we should move forward to
make an improved tool, one that measures what we want to measure. And
while comparison with past results might not be possible, there
remains the possibility of rerunning the improved pgbench on previous
systems, I should think.

Michael Glaesemann
grzm seespotcode net









Re: New to PostgreSQL, performance considerations

From
Greg Smith
Date:
On Wed, 13 Dec 2006, Ron wrote:

> The slowest results, Michael's, are on the system with what appears to be the
> slowest CPU of the bunch; and the ranking of the rest of the results seems to
> similarly depend on relative CPU performance.  This is not what one would
> naively expect when benching an IO-intensive app like a DBMS.

pgbench with 3000 total transactions and fsync off is barely doing I/O to
disk; it's writing a bunch of data to the filesystem cache and ending the
benchmark before the data even makes it to the hard drive.  This is why
his results become completely different as soon as the number of
transactions increases. With little or no actual disk writes, you should
expect results to be ranked by CPU speed.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: New to PostgreSQL, performance considerations

From
Greg Smith
Date:
On Thu, 14 Dec 2006, Tom Lane wrote:

> The pgbench app itself becomes the bottleneck at high transaction rates.
> Awhile back I rewrote it to improve its ability to issue commands
> concurrently, but then desisted from submitting the changes --- if we
> change the app like that, future numbers would be incomparable to past
> ones, which sort of defeats the purpose of a benchmark no?

Not at all.  Here's an example from the PC hardware benchmarking
landscape.  Futuremark Corporation has a very popular benchmark for 3D
hardware called 3DMark.  Every year, they release a new version, and
numbers from it are completely different from those produced by the
previous year's version.  That lets them rev the entire approach taken by
the benchmark to reflect current practice.  So when the standard for
high-end hardware includes, say, acceleration of lighting effects, the new
version will include a lighting test.  In order to break 1000 points (say)
on that test, you absolutely have to have lighting acceleration, even
though on the previous year's test you could score that high without it.

That is not an isolated example; every useful PC benchmark gets updated
regularly, completely breaking backward compatibility, to reflect the
capabilities of current hardware and software.  Otherwise we'd still be
testing how well DOS runs on new processors.  Right now everyone is (or
has already) upgraded their PC benchmarking code such that you need a
dual-core processor to do well on some of the tests.

If you have a pgbench version with better concurrency features, I for one
would love to see it.  I'm in the middle of patching that thing up right
now anyway but hadn't gotten that far (yet--I just spent some of yesterday
staring at how it submits into libpq trying to figure out how to improve
that).  I would be happy to take your changes, my changes, changes to the
base code since you forked it, and reconcile everything together into a
pgbench2007--whose results can't be directly compared to the earlier
version, but are more useful on current gen multi-processor/core systems.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: New to PostgreSQL, performance considerations

From
Michael Stone
Date:
On Wed, Dec 13, 2006 at 01:03:04PM -0500, Ron wrote:
>What I find interesting is that so far Guido's C2D Mac laptop has
>gotten the highest values by far in this set of experiments, and no
>one else is even close.
>The slowest results, Michael's, are on the system with what appears
>to be the slowest CPU of the bunch; and the ranking of the rest of
>the results seems to similarly depend on relative CPU
>performance.  This is not what one would naively expect when benching
>an IO-intensive app like a DBMS.

Note that I ran with fsync off, that the data set is <300M, and that all
of the systems (IIRC) have at least 1G RAM. This is exactly the
distribution I would expect since we're configuring the benchmark to
determine whether cpu-specific optimizations affect the results.

Mike Stone

Re: New to PostgreSQL, performance considerations

From
Ron
Date:
At 10:00 AM 12/14/2006, Greg Smith wrote:
>On Wed, 13 Dec 2006, Ron wrote:
>
>>The slowest results, Michael's, are on the system with what appears
>>to be the slowest CPU of the bunch; and the ranking of the rest of
>>the results seems to similarly depend on relative CPU
>>performance.  This is not what one would naively expect when
>>benching an IO-intensive app like a DBMS.
>
>pgbench with 3000 total transactions and fsync off is barely doing
>I/O to disk; it's writing a bunch of data to the filesystem cache
>and ending the benchmark before the data even makes it to the hard
>drive.  This is why his results become completely different as soon
>as the number of transactions increases. With little or no actual
>disk writes, you should expect results to be ranked by CPU speed.
I of course agree with you in the general sense.  OTOH, I'm fairly
sure the exact point where this cross-over occurs is dependent on the
components and configuration of the system involved.

(Nor do I want to dismiss this scenario as irrelevant or
unimportant.  There are plenty of RW situations where this takes
place or where the primary goal of a tuning effort is to make it take
place.  Multi-GB BB RAID caches anyone?)

In addition, let's keep in mind that overall system
performance is limited by whatever component hits its limits
first.  Local pg performance has been known to be limited by any of:
CPUs, memory subsystems, or physical IO subsystems.  Intuitively, one
tends to expect only the latter to be a limiting factor in the vast
majority of DBMS tasks.  pg has a history of regularly surprising
such intuition in many cases.
IMO, this makes good benchmarking tools and procedures all the more
important to have.

(If nothing else, knowing what component is likely to be the
bottleneck in system "X" made of components "x1, x2, x3, ...." for
task "Y" is valuable lore for the pg community to have as preexisting
data when first asked any given question on this list! )

One plausible positive effect of tuning like that which Daniel advocates
is to "move" the level of system activity at which the physical IO
subsystem becomes the limiting factor on overall system performance.

We are not going to know definitively if such an effect exists, or to
what degree, or how to achieve it, if we don't have appropriately
rigorous and reproducible experiments (and documentation of them) in
place to test for it.

  So it seems to make sense that the community should have a
discussion about the proper benchmarking of pg and to get some
results based on it.

Ron Peacetree


Re: New to PostgreSQL, performance considerations

From
Matthew O'Connor
Date:
Tom Lane wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
>> On Wed, 2006-12-13 at 18:36 -0800, Josh Berkus wrote:
>>> Mostly, though, pgbench just gives the I/O system a workout.  It's not a
>>> really good general workload.
>
>> It also will not utilize all cpus on a many cpu machine. We recently
>> found that the only way to *really* test with pgbench was to actually
>> run 4+ copies of pgbench at the same time.
>
> The pgbench app itself becomes the bottleneck at high transaction
> rates.  Awhile back I rewrote it to improve its ability to issue
> commands concurrently, but then desisted from submitting the
> changes --- if we change the app like that, future numbers would
> be incomparable to past ones, which sort of defeats the purpose of a
> benchmark no?

What is to stop us from running the new pgbench against older versions
of PGSQL?  Any stats taken from a run of pgbench a long time ago
probably aren't relevant against a modern test anyway as the underlying
hardware and OS are likely to have changed or been updated.



Re: New to PostgreSQL, performance considerations

From
Alexander Staubo
Date:
On Dec 14, 2006, at 16:00 , Greg Smith wrote:

> On Wed, 13 Dec 2006, Ron wrote:
>
>> The slowest results, Michael's, are on the system with what
>> appears to be the slowest CPU of the bunch; and the ranking of the
>> rest of the results seems to similarly depend on relative CPU
>> performance.  This is not what one would naively expect when
>> benching an IO-intensive app like a DBMS.
>
> pgbench with 3000 total transactions and fsync off is barely doing
> I/O to disk; it's writing a bunch of data to the filesystem cache
> and ending the benchmark before the data even makes it to the hard
> drive.  This is why his results become completely different as soon
> as the number of transactions increases. With little or no actual
> disk writes, you should expect results to be ranked by CPU speed.

I also second your suggestion that pgbench should be run with -S to
disable updates. As far as I can see, nobody has reported numbers for
this setting, so here goes. I also increased the buffer size, which I
found was needed to avoid hitting the disk for block reads, and
increased the memory settings.

My PostgreSQL config overrides, then, are:

shared_buffers = 1024MB
work_mem = 1MB
maintenance_work_mem = 16MB
fsync = off

Environment: Linux 2.6.15-23-amd64-generic on Ubuntu. Dual-core AMD
Opteron 280 with 4GB of RAM. LSI PCI-X Fusion-MPT SAS.

Running with: pgbench -S -v -n -t 5000 -c 5.

Results as a graph: http://purefiction.net/paste/pgbench.pdf

Stats for CFLAGS="-O0": 18440.181894 19207.882300 19894.432185
19635.625622 19876.858884 20032.597042 19683.597973 20370.166669
19989.157881 20207.343510 19993.745956 20081.353580 20356.416424
20047.810017 20319.834190 19417.807528 19906.788454 20536.039929
19491.308046 20002.144230

Stats for CFLAGS="-O3 -msse2 -mfpmath=sse -funroll-loops -m64
-march=opteron -pipe": 23830.358351 26162.203569 25569.091264
26762.755665 26590.822550 26864.908197 26608.029665 26796.116921
26323.742015 26692.576261 26878.859132 26106.770425 26328.371664
26755.595130 25488.304946 26635.527959 26377.485023 24817.590708
26480.245737 26223.427801

Alexander.


Re: New to PostgreSQL, performance considerations

From
Ron
Date:
Alexander,   Good stuff.

Can you do runs with just CFLAGS="-O3" and just CFLAGS="-msse2
-mfpmath=sse -funroll-loops -m64 -march=opteron -pipe" as well?

As it is, you've given good lower and upper bounds on the
performance obtainable using compiler options, but you've given no
data to show what effect arch-specific compiler options have by themselves.

Also, what HDs are you using?  How many in what config?

Thanks in Advance,
Ron Peacetree

At 02:14 PM 12/14/2006, Alexander Staubo wrote:

>My PostgreSQL config overrides, then, are:
>
>shared_buffers = 1024MB
>work_mem = 1MB
>maintenance_work_mem = 16MB
>fsync = off
>
>Environment: Linux 2.6.15-23-amd64-generic on Ubuntu. Dual-core AMD
>Opteron 280 with 4GB of RAM. LSI PCI-X Fusion-MPT SAS.
>
>Running with: pgbench -S -v -n -t 5000 -c 5.
>
>Results as a graph: http://purefiction.net/paste/pgbench.pdf
>
>Stats for CFLAGS="-O0": 18440.181894 19207.882300 19894.432185
>19635.625622 19876.858884 20032.597042 19683.597973 20370.166669
>19989.157881 20207.343510 19993.745956 20081.353580 20356.416424
>20047.810017 20319.834190 19417.807528 19906.788454 20536.039929
>19491.308046 20002.144230
>
>Stats for CFLAGS="-O3 -msse2 -mfpmath=sse -funroll-loops -m64
>-march=opteron -pipe": 23830.358351 26162.203569 25569.091264
>26762.755665 26590.822550 26864.908197 26608.029665 26796.116921
>26323.742015 26692.576261 26878.859132 26106.770425 26328.371664
>26755.595130 25488.304946 26635.527959 26377.485023 24817.590708
>26480.245737 26223.427801


Re: New to PostgreSQL, performance considerations

From
Alexander Staubo
Date:
On Dec 14, 2006, at 20:28 , Ron wrote:

> Can you do runs with just CFLAGS="-O3" and just CFLAGS="-msse2
> -mfpmath=sse -funroll-loops -m64 -march=opteron -pipe" as well?

All right. From my perspective, the effect of -O3 is significant,
whereas architecture-related optimizations have no statistically
significant effect. As far as I'm aware, though, there are no other
arch targets on the Opteron that would make sense, there being no
predecessor CPU instruction set to choose from; -march=pentium4
doesn't exist.

> Also, what HDs are you using?  How many in what config?

I believe the volume is a two-drive RAID 1 configuration, but I'm not
managing these servers, so I'll ask the company's support people.

Interestingly enough I see that PostgreSQL seems to be writing around
1MB/s during the pgbench run, even though I'm running pgbench in the
-S mode. I haven't had the chance to look at the source yet; is it
really only doing selects?

Alexander.

Re: New to PostgreSQL, performance considerations

From
Ron
Date:
At 05:39 PM 12/14/2006, Alexander Staubo wrote:
>On Dec 14, 2006, at 20:28 , Ron wrote:
>
>>Can you do runs with just CFLAGS="-O3" and just CFLAGS="-msse2
>>-mfpmath=sse -funroll-loops -m64 -march=opteron -pipe" as well?
>
>All right. From my perspective, the effect of -O3 is significant,
>whereas architecture-related optimizations have no statistically
>significant effect.

Is this opinion?  Or have you rerun the tests using the flags I
suggested?  If so, can you post the results?

If  "-O3 -msse2 - mfpmath=sse -funroll-loops -m64 - march=opteron
-pipe" results in a 30-40% speed up over "-O0", and
" -msse2 - mfpmath=sse -funroll-loops -m64 - march=opteron -pipe"
results in a 5-10% speedup, then ~ 1/8 - 1/3 of the total possible
speedup is due to arch specific optimizations.

(testing "-O3" in isolation in addition tests for independence of
factors as well as showing what "plain" "-O3" can accomplish.)

Some might argue that a 5-10% speedup which represents 1/8 - 1/3 of
the total speedup is significant...

But enough speculating.  I look forward to seeing your data.

Ron Peacetree


Re: New to PostgreSQL, performance considerations

From
Alexander Staubo
Date:
On Dec 15, 2006, at 01:16 , Ron wrote:

> At 05:39 PM 12/14/2006, Alexander Staubo wrote:
>> On Dec 14, 2006, at 20:28 , Ron wrote:
>>
>>> Can you do runs with just CFLAGS="-O3" and just CFLAGS="-msse2
>>> -mfpmath=sse -funroll-loops -m64 -march=opteron -pipe" as well?
>>
>> All right. From my perspective, the effect of -O3 is significant,
>> whereas architecture-related optimizations have no statistically
>> significant effect.
>
> Is this opinion?  Or have you rerun the tests using the flags I
> suggested?  If so, can you post the results?

Sorry, I neglected to include the pertinent graph:

   http://purefiction.net/paste/pgbench2.pdf

The raw data:

CFLAGS="-msse2 -mfpmath=sse -funroll-loops -m64 -march=opteron -pipe":

18480.899621 19977.162108 19640.562003 19823.585944 19500.293284
19964.383540 20228.664827
20515.766366 19956.431120 19740.795459 20184.551390 19984.907398
20457.260691 19771.395220
20159.225628 19907.248149 20197.580815 19947.498185 20209.450748
20088.501904

CFLAGS="-O3"

23814.672315 26846.761905 27137.807960 26957.898233 27109.057570
26997.227925 27291.056939
27565.553643 27422.624323 27392.397185 27757.144967 27402.365372
27563.365421 27349.544685
27544.658154 26957.200592 27523.824623 27457.380654 27052.910082
24452.819263

CFLAGS="-O0"

18440.181894 19207.882300 19894.432185 19635.625622 19876.858884
20032.597042 19683.597973
20370.166669 19989.157881 20207.343510 19993.745956 20081.353580
20356.416424 20047.810017
20319.834190 19417.807528 19906.788454 20536.039929 19491.308046
20002.144230

CFLAGS="-O3 -msse2 -mfpmath=sse -funroll-loops -m64 -march=opteron -
pipe"

23830.358351 26162.203569 25569.091264 26762.755665 26590.822550
26864.908197 26608.029665
26796.116921 26323.742015 26692.576261 26878.859132 26106.770425
26328.371664 26755.595130
25488.304946 26635.527959 26377.485023 24817.590708 26480.245737
26223.427801

> If  "-O3 -msse2 - mfpmath=sse -funroll-loops -m64 - march=opteron -
> pipe" results in a 30-40% speed up over "-O0", and
> " -msse2 - mfpmath=sse -funroll-loops -m64 - march=opteron -pipe"
> results in a 5-10% speedup, then ~ 1/8 - 1/3 of the total possible
> speedup is due to arch specific optimizations.

Unfortunately, I don't see a 5-10% speedup; "-O0" and "-msse2 ..."
are statistically identical.

Alexander.

Re: New to PostgreSQL, performance considerations

From
Ron
Date:
At 07:27 PM 12/14/2006, Alexander Staubo wrote:

>Sorry, I neglected to include the pertinent graph:
>
>   http://purefiction.net/paste/pgbench2.pdf
In fact, your graph suggests that using arch specific options in
addition to -O3 actually =hurts= performance.

...that seems unexpected...
Ron Peacetree


Re: New to PostgreSQL, performance considerations

From
Greg Smith
Date:
On Thu, 14 Dec 2006, Alexander Staubo wrote:

> Interestingly enough I see that PostgreSQL seems to be writing around 1MB/s
> during the pgbench run, even though I'm running pgbench in the -S mode. I
> haven't had the chance to look at the source yet; is it really only doing
> selects?

I've noticed the same thing and have been meaning to figure out what the
cause is.  It's just doing a select in there; it's not even in a begin/end
block.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: New to PostgreSQL, performance considerations

From
Alexander Staubo
Date:
On Dec 15, 2006, at 04:09 , Ron wrote:

> At 07:27 PM 12/14/2006, Alexander Staubo wrote:
>
>> Sorry, I neglected to include the pertinent graph:
>>
>>   http://purefiction.net/paste/pgbench2.pdf
> In fact, your graph suggests that using arch specific options in
> addition to -O3 actually =hurts= performance.

The difference is very slight. I'm going to run without -funroll-loops
and -pipe (which are not arch-related) to get better data.

Alexander.



Re: New to PostgreSQL, performance considerations

From
"Steinar H. Gunderson"
Date:
On Fri, Dec 15, 2006 at 10:53:25AM +0100, Alexander Staubo wrote:
> The difference is very slight. I'm going to run without -funroll-loops
> and -pipe (which are not arch-related) to get better data.

-pipe does not matter for the generated code; it only affects compiler speed.
(It simply means that the compiler runs cpp | cc1 | as instead of cpp > tmp;
cc1 < tmp > tmp2; as < tmp2.)

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: New to PostgreSQL, performance considerations

From
Ron
Date:
At 04:54 AM 12/15/2006, Alexander Staubo wrote:
>On Dec 15, 2006, at 04:09 , Ron wrote:
>
>>At 07:27 PM 12/14/2006, Alexander Staubo wrote:
>>
>>>Sorry, I neglected to include the pertinent graph:
>>>
>>>   http://purefiction.net/paste/pgbench2.pdf
>>In fact, your graph suggests that using arch specific options in
>>addition to -O3 actually =hurts= performance.
>
>According to the tech staff, this is a Sun X4100 with a two-drive
>RAID 1 volume. No idea about the make of the hard drives.
>
>Alexander.
http://www.sun.com/servers/entry/x4100/features.xml

So we are dealing with a 1U 1-4S (which means 1-8C) AMD Kx box with
up to 32GB of ECC RAM (DDR2 ?) and 2 Seagate 2.5" SAS HDs.

http://www.seagate.com/cda/products/discsales/index/1,,,00.html?interface=SAS

My bet is the X4100 contains one of the 3 models of Cheetah
15K.4's.  A simple du, dkinfo, whatever, will tell you which.

I'm looking more closely into exactly what the various gcc -O
optimizations do on Kx's as well.
64b vs 32b gets x86 compatible code access to ~ 2x as many registers;
and MMX or SSE instructions get you access to not only more
registers, but wider ones as well.

As one wit has noted, "all optimization is an exercise in caching."
(Terje Mathisen- one of the better assembler coders on the planet.)

It seems unusual that code generation options which give access to
more registers would ever result in slower code...
Ron Peacetree


Re: New to PostgreSQL, performance considerations

From
"Merlin Moncure"
Date:
On 12/15/06, Ron <rjpeace@earthlink.net> wrote:
> I'm looking more closely into exactly what the various gcc -O
> optimizations do on Kx's as well.
> 64b vs 32b gets x86 compatible code access to ~ 2x as many registers;
> and MMX or SSE instructions get you access to not only more
> registers, but wider ones as well.
>
> As one wit has noted, "all optimization is an exercise in caching."
> (Terje Mathisen- one of the better assembler coders on the planet.)
>
> It seems unusual that code generation options which give access to
> more registers would ever result in slower code...

The slower is probably due to the unroll loops switch which can
actually hurt code due to the larger footprint (less cache coherency).

The extra registers are not all that important because of pipelining
and other hardware tricks.  Pretty much all the old assembly
strategies such as forcing local variables to registers are basically
obsolete...especially with regards to integer math.  As I said before,
modern CPUs are essentially RISC engines with a CISC preprocessing
engine laid on top.

Things are much more complicated than they were in the old days where
you could count instructions for the assembly optimization process.  I
suspect that there is little or no difference between -march=i686
and the various specific architectures.  Did anybody think to look at
the binaries and check the amount of differences?  I bet code
compiled for -march=opteron will run just fine on a pentium 2 if compiled
for 32 bit.
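
One crude way to check, as a sketch: compile the same source file
(foo.c is a placeholder) for both targets and diff the generated
assembly (-m32 assumes a multilib-capable gcc):

   gcc -m32 -O2 -march=i686 -S -o foo.i686.s foo.c
   gcc -m32 -O2 -march=opteron -S -o foo.opteron.s foo.c
   diff foo.i686.s foo.opteron.s | wc -l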

merlin

Re: New to PostgreSQL, performance considerations

From
Greg Smith
Date:
On Fri, 15 Dec 2006, Merlin Moncure wrote:

> The slower is probably due to the unroll loops switch which can
> actually hurt code due to the larger footprint (less cache coherency).

The cache issues are so important with current processors that I'd suggest
throwing -Os (optimize for size) into the mix of options people test.  That
one may stack usefully with -O2, but probably not with -O3 (-O3 includes
optimizations that increase code size).

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: New to PostgreSQL, performance considerations

From
Ron
Date:
At 09:23 AM 12/15/2006, Merlin Moncure wrote:
>On 12/15/06, Ron <rjpeace@earthlink.net> wrote:
>>
>>It seems unusual that code generation options which give access to
>>more registers would ever result in slower code...
>
>The slower is probably due to the unroll loops switch which can
>actually hurt code due to the larger footprint (less cache coherency).

I have seen that effect as well occasionally in the last few decades
;-)  OTOH, suspicion is not _proof_; and I've seen other
"optimizations" turn out to be "pessimizations" over the years as well.


>The extra registers are not all that important because of pipelining
>and other hardware tricks.

No.  Whoever told you this or gave you such an impression was
mistaken.  There are many instances of x86 compatible code that get
30-40% speedups just because they get access to 16 rather than 8 GPRs
when recompiled for x86-64.


>Pretty much all the old assembly strategies such as forcing local
>variables to registers are basically obsolete...especially with
>regards to integer math.

Again, not true.  OTOH, humans are unlikely at this point to be able
to duplicate the accuracy of the compiler's register coloring
algorithms.  Especially on x86 compatibles.  (The flip side is that
_expert_ humans can often put the quirky register set and instruction
pipelines of x86 compatibles to more effective use for a specific
chunk of code than even the best compilers can.)


>As I said before, modern CPUs are essentially RISC engines with a
>CISC preprocessing engine laid on top.

I'm sure you meant modern =x86 compatible= CPUs are essentially RISC
engines with a CISC engine on top.  Just as "all the world's not a
VAX", "all CPUs are not x86 compatibles".  Forgetting this has
occasionally cost folks I know...


>Things are much more complicated than they were in the old days
>where you could count instructions for the assembly optimization process.

Those were the =very= old days in computer time...


>I suspect that there is little or no difference between
>-march=i686 and the various specific architectures.

There should be.  The FSF compiler folks (and the rest of the
industry compiler folks for that matter) are far from stupid.  They
are not just adding compiler switches because they randomly feel like it.

Evidence suggests that the most recent CPUs are in need of =more=
arch specific TLC compared to their ancestors, and that this trend is
not only going to continue, it is going to accelerate.


>Did anybody think to look at the binaries and check the amount of
>differences?  I bet code compiled for -march=opteron will run just
>fine on a pentium 2 if compiled
>for 32 bit.
Sucker bet, given that the whole point of a 32b x86 compatible is to
be able to run code on any IA32 ISA CPU.
OTOH, I bet that code optimized for best performance on a P2 is not
getting best performance on a P4.  Or vice versa. ;-)

The big arch-specific differences in Kx's are in 64b mode, not 32b.


Ron Peacetree.


Re: New to PostgreSQL, performance considerations

From
"Merlin Moncure"
Date:
On 12/15/06, Ron <rjpeace@earthlink.net> wrote:
> At 09:23 AM 12/15/2006, Merlin Moncure wrote:
> >On 12/15/06, Ron <rjpeace@earthlink.net> wrote:
> >>
> >>It seems unusual that code generation options which give access to
> >>more registers would ever result in slower code...
> >
> >The slower is probably due to the unroll loops switch which can
> >actually hurt code due to the larger footprint (less cache coherency).
>
> I have seen that effect as well occasionally in the last few decades
> ;-)  OTOH, suspicion is not _proof_; and I've seen other
> "optimizations" turn out to be "pessimizations" over the years as well.
>
> >The extra registers are not all that important because of pipelining
> >and other hardware tricks.
>
> No.  Whoever told you this or gave you such an impression was
> mistaken.  There are many instances of x86 compatible code that get
> 30-40% speedups just because they get access to 16 rather than 8 GPRs
> when recompiled for x86-64.

I'm not debating that this is true in specific cases.  Encryption and
video en/decoding have been shown to be faster in 64 bit mode on the same
architecture (a cursory search in Google will confirm this).  However,
32 vs. 64 bit is not the same argument, since there are a lot of other
variables besides more registers.  64 bit mode is often slower on many
programs because of the extra code size from 64 bit pointers.  We
benchmarked PostgreSQL internally here and found it to be fastest in
32 bit mode running on a 64 bit platform -- this was on a quad opteron
870 running our specific software stack; your results might be
different of course.

> >Pretty much all the old assembly strategies such as forcing local
> >variables to registers are basically obsolete...especially with
> >regards to integer math.
>
> Again, not true.  OTOH, humans are unlikely at this point to be able
> to duplicate the accuracy of the compiler's register coloring
> algorithms.  Especially on x86 compatibles.  (The flip side is that
> _expert_ humans can often put the quirky register set and instruction
> pipelines of x86 compatibles to more effective use for a specific
> chunk of code than even the best compilers can.)
>
>
> >As I said before, modern CPUs are essentially RISC engines with a
> >CISC preprocessing engine laid on top.
>
> I'm sure you meant modern =x86 compatible= CPUs are essentially RISC
> engines with a CISC engine on top.  Just as "all the world's not a
> VAX", "all CPUs are not x86 compatibles".  Forgetting this has
> occasionally cost folks I know...

Yes, in fact I made this point earlier.

> >Things are much more complicated than they were in the old days
> >where you could count instructions for the assembly optimization process.
>
> Those were the =very= old days in computer time...
>
>
> >I suspect that there is little or no difference between
> >-march=i686 and the various specific architectures.
>
> There should be.  The FSF compiler folks (and the rest of the
> industry compiler folks for that matter) are far from stupid.  They
> are not just adding compiler switches because they randomly feel like it.
>
> Evidence suggests that the most recent CPUs are in need of =more=
> arch specific TLC compared to their ancestors, and that this trend is
> not only going to continue, it is going to accelerate.
>
>
> >Did anybody think to look at the binaries and check the amount of
> >differences?  I bet code compiled for -march=opteron will run just
> >fine on a pentium 2 if compiled
> >for 32 bit.
> Sucker bet, given that the whole point of a 32b x86 compatible is to
> be able to run code on any IA32 ISA CPU.
> OTOH, I bet that code optimized for best performance on a P2 is not
> getting best performance on a P4.  Or vice versa. ;-)
>
> The big arch-specific differences in Kx's are in 64b mode, not 32b.

I don't think so.  IMO all the processor specific instruction sets were
hacks of 32 bit mode to optimize specific tasks.  Except for certain
things, these instructions are rarely, if ever, used in 64 bit mode,
especially in integer math (i.e. database binaries).  Since Intel and
AMD 64 bit are virtually identical, I submit that -march is not
really important anymore except for very, very specific (but
important) cases like spinlocks.  This thread is about how much
architecture-dependent binaries can beat standard ones.  I say they
don't very much at all, and with the specific exception of Daniel's
benchmarking, the results posted to this list bear that out.

merlin

Re: New to PostgreSQL, performance considerations

From
Ron
Date:
At 09:50 AM 12/15/2006, Greg Smith wrote:
>On Fri, 15 Dec 2006, Merlin Moncure wrote:
>
>>The slower is probably due to the unroll loops switch which can
>>actually hurt code due to the larger footprint (less cache coherency).
>
>The cache issues are so important with current processors that I'd
>suggest throwing -Os (optimize for size) into the mix of options people
>test.  That one may stack usefully with -O2, but probably not with
>-O3 (-O3 includes optimizations that increase code size).

-Os
Optimize for size. -Os enables all -O2 optimizations that do not
typically increase code size. It also performs further optimizations
designed to reduce code size.

-Os disables the following optimization flags:
-falign-functions -falign-jumps -falign-loops -falign-labels
-freorder-blocks -freorder-blocks-and-partition
-fprefetch-loop-arrays
-ftree-vect-loop-version

Hmmm.  That list of disabled flags bears thought.

-falign-functions -falign-jumps -falign-loops -falign-labels

1= Most RISC CPUs' performance is very sensitive to misalignment
issues.  Not recommended to turn these off.

-freorder-blocks
Reorder basic blocks in the compiled function in order to reduce
number of taken branches and improve code locality.

Enabled at levels -O2, -O3.
-freorder-blocks-and-partition
In addition to reordering basic blocks in the compiled function, in
order to reduce number of taken branches, partitions hot and cold
basic blocks into separate sections of the assembly and .o files, to
improve paging and cache locality performance.

This optimization is automatically turned off in the presence of
exception handling, for link once sections, for functions with a
user-defined section attribute and on any architecture that does not
support named sections.

2= Most RISC CPUs are cranky about branchy code and (lack of) cache
locality.  Wouldn't suggest punting these either.

-fprefetch-loop-arrays
If supported by the target machine, generate instructions to prefetch
memory to improve the performance of loops that access large arrays.

This option may generate better or worse code; results are highly
dependent on the structure of loops within the source code.

3= OTOH, this one looks worth experimenting with turning off.

-ftree-vect-loop-version
Perform loop versioning when doing loop vectorization on trees. When
a loop appears to be vectorizable except that data alignment or data
dependence cannot be determined at compile time then vectorized and
non-vectorized versions of the loop are generated along with runtime
checks for alignment or dependence to control which version is
executed. This option is enabled by default except at level -Os where
it is disabled.

4= ...and this one looks like a 50/50 shot.

Ron Peacetree




Re: New to PostgreSQL, performance considerations

From
Ron
Date:
At 10:55 AM 12/15/2006, Merlin Moncure wrote:
>On 12/15/06, Ron <rjpeace@earthlink.net> wrote:
>>
>>There are many instances of x86 compatible code that get
>>30-40% speedups just because they get access to 16 rather than 8 GPRs
>>when recompiled for x86-64.
>
>...We benchmarked PostgreSQL internally here and found it to be
>fastest in 32 bit mode running on a 64 bit platform -- this was on a
>quad opteron 870 running our specific software stack; your results
>might be different of course.

On AMD Kx's, you probably will get best performance in 64b mode (so
you get all those extra registers and other stuff) while using 32b
pointers (to keep code size and memory footprint down).

On Intel C2's, things are more complicated since Intel's x86-64
implementation and memory IO architecture are both different enough
from AMD's to have caused some consternation on occasion when Intel's
64b performance did not match AMD's.



>>The big arch-specific differences in Kx's are in 64b mode, not 32b.
>
>I don't think so.  IMO all the processor specific instruction sets were
>hacks of 32 bit mode to optimize specific tasks.  Except for certain
>things, these instructions are rarely, if ever, used in 64 bit mode,
>especially in integer math (i.e. database binaries).  Since Intel and
>AMD 64 bit are virtually identical, I submit that -march is not
>really important anymore except for very, very specific (but
>important) cases like spinlocks.

Take a good look at the processor specific manuals and the x86-64
benches around the net.  The evidence outside the DBMS domain is
pretty solidly in contrast to your statement and
position.  Admittedly, DBMS are not web servers or FPS games or ...
That's why we need to do our own rigorous study of the subject.


>This thread is about how much architecture-dependent binaries can beat
>standard ones.  I say they don't very much at all, and with the
>specific exception of Daniel's
>benchmarking, the results posted to this list bear that out.
...and IMHO the issue is still very much undecided given that we
don't have enough properly obtained and documented evidence.

ATM, the most we can say is that in a number of systems with modest
physical IO subsystems that are not running Gentoo Linux we have not
been able to duplicate the results.  (We've also gotten some
interesting results like yours suggesting the arch specific
optimizations are bad for pg performance in your environment.)

In the process, questions have been asked and issues raised regarding
both the tools involved and the proper way to use them.

We really do need to have someone other than Daniel duplicate his
Gentoo environment and independently try to duplicate his results.


...and let us bear in mind that this is not just intellectual
curiosity.  The less mysterious pg is, the better the odds pg will be
adopted in any specific case.
Ron Peacetree



Re: New to PostgreSQL, performance considerations

From
Alexander Staubo
Date:
On Dec 15, 2006, at 17:53 , Ron wrote:

> At 09:50 AM 12/15/2006, Greg Smith wrote:
>> On Fri, 15 Dec 2006, Merlin Moncure wrote:
>>
>>> The slower is probably due to the unroll loops switch which can
>>> actually hurt code due to the larger footprint (less cache
>>> coherency).
>>
>> The cache issues are so important with current processors that I'd
>> suggest throwing -Os (optimize for size) into the mix of options people
>> test.  That one may stack usefully with -O2, but probably not with
>> -O3 (-O3 includes optimizations that increase code size).
>
> -Os
> Optimize for size. -Os enables all -O2 optimizations that do not
> typically increase code size. It also performs further
> optimizations designed to reduce code size.

So far I have been compiling PostgreSQL and running my pgbench script
manually, but this makes me want to modify my script to run pgbench
automatically using all possible permutations of a set of compiler
flags.
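
A rough sketch of such a driver (the install prefix, the 'bench'
database, and the server restart step are placeholders):

   #!/bin/sh
   for flags in "-O0" "-O2" "-O3" "-O3 -msse2 -mfpmath=sse -march=opteron"; do
       ( cd postgresql-8.1.5 &&
         make distclean >/dev/null 2>&1
         CFLAGS="$flags" ./configure --prefix=/usr/local/pgtest >/dev/null &&
         make >/dev/null && make install >/dev/null )
       # ...restart the server on the freshly installed binaries here...
       pgbench -S -v -n -t 5000 -c 5 bench >> "results.$(echo $flags | tr ' ' '_').txt"
   done

Full permutations of independent flags would just mean nesting loops
or generating the flag strings up front.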

The last time I tried getting GCC to produce 32-bit code on this Opteron
system, though, it complained about the lack of a compiler; can I persuade
it to generate 32-bit code (or 32-bit pointers for that matter) without
going the cross-compilation route?

Alexander.

Re: New to PostgreSQL, performance considerations

From
Michael Stone
Date:
On Fri, Dec 15, 2006 at 12:24:46PM -0500, Ron wrote:
>ATM, the most we can say is that in a number of systems with modest
>physical IO subsystems


So I reran it on a 3.2GHz Xeon with 6G RAM off a ramdisk; I/O ain't the
bottleneck on that one. Results didn't show any significant
gains regardless of compilation options (results hovered around 12k
tps). If people want to continue this, I will point out that they should
make sure they're linked against the optimized libpq rather than an
existing one elsewhere in the library path. Beyond that, I'm done with
this thread. Maybe there are some gains to be found somewhere, but the
testing done thus far (while limited) is sufficient, IMO, to demonstrate
that compiler options aren't going to provide a blow-your-socks-off
dramatic performance improvement.
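
A quick way to verify which libpq a binary actually resolves (the
install paths here are only examples -- adjust for your layout):

ldd /usr/local/pgsql/bin/pgbench | grep libpq
# and to force the freshly built copy if the wrong one turns up:
LD_LIBRARY_PATH=/usr/local/pgsql/lib ./pgbench ...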

Mike Stone

Re: New to PostgreSQL, performance considerations

From
Cosimo Streppone
Date:
Alexander Staubo wrote:

> On Dec 15, 2006, at 17:53, Ron wrote:
>
>> At 09:50 AM 12/15/2006, Greg Smith wrote:
>>
>>> On Fri, 15 Dec 2006, Merlin Moncure wrote:
>>>
>>>> The slowdown is probably due to the unroll-loops switch, which can
>>>> actually hurt code due to the larger footprint (worse cache locality).
>>>
>>> The cache issues are so important with current processors that I'd
>>> suggest throwing -Os (optimize for size) into the mix of options people test.
>
> So far I have been compiling PostgreSQL and running my pgbench script
> manually, but this makes me want to modify my script to run pgbench
> automatically using all possible permutations of a set of compiler flags.

I don't know if it's practical, but this link comes to mind (Acovea
uses a genetic algorithm to search for good GCC option combinations):

http://clusty.com/search?query=acovea

--
Cosimo

Re: New to PostgreSQL, performance considerations

From
Ron
Date:
At 07:06 PM 12/15/2006, Michael Stone wrote:
>On Fri, Dec 15, 2006 at 12:24:46PM -0500, Ron wrote:
>>ATM, the most we can say is that in a number of systems with modest
>>physical IO subsystems
>
>
>So I reran it on a 3.2GHz Xeon with 6G RAM off a ramdisk; I/O ain't
>the bottleneck on that one. Results didn't show any significant
>gains regardless of compilation options (results hovered around 12k
>tps). If people want to continue this, I will point out that they
>should make sure they're linked against the optimized libpq rather
>than an existing one elsewhere in the library path. Beyond that, I'm
>done with this thread. Maybe there are some gains to be found
>somewhere, but the testing done thus far (while limited) is
>sufficient, IMO, to demonstrate that compiler options aren't going
>to provide a blow-your-socks-off dramatic performance improvement.
AFAICT, no one has stated there would be a "blow-your-socks-off
dramatic performance improvement" for pg due to compilation
options.  Just that there might be some, and there might be some that
are arch specific.

So far these experiments have shown:
= multiple instances of a ~30-35% performance improvement going from
-O0 to -O3
= 1 instance of arch specific options hurting performance when
combined with -O3
= 1 instance of arch specific options helping performance on an OS
that only one person has tested (Gentoo Linux)
= that a 2.33 GHz C2D Mac laptop (under what OS?) with a typically
modest laptop physical IO subsystem can do ~2100 tps
= that pg has a "speed limit" of ~12K tps on a 3.2GHz Xeon (which
kind?) with 6G RAM off a ramdisk (under what OS?)
(I'd be curious to see what this limit is with better CPUs and memory
subsystems)

Note that except for the first point, all the other results are
singletons that have not yet been reproduced.

The most important "gain" IMO is knowledge, and I'd say there is
still more to learn and/or verify. YMMV.

Ron Peacetree


Re: New to PostgreSQL, performance considerations

From
"Steinar H. Gunderson"
Date:
On Sat, Dec 16, 2006 at 10:53:21AM -0500, Ron wrote:
> AFAICT, no one has stated there would be a "blow-your-socks-off
> dramatic performance improvement" for pg due to compilation
> options.  Just that there might be some, and there might be some that
> are arch specific.

FWIW, the original claim was: "It's really important to have your GLIBC
compiled for your processor.  It is essencial for performance."

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: New to PostgreSQL, performance considerations

From
Michael Stone
Date:
On Sat, Dec 16, 2006 at 10:53:21AM -0500, Ron wrote:
>The most important "gain" IMO is knowledge, and I'd say there is
>still more to learn and/or verify. YMMV.

Well, I think there are other areas I can spend my time on where
potential gains are more likely. YMMV (although, I note, you don't
seem to be spending much of your own time testing this).

Mike Stone

Re: New to PostgreSQL, performance considerations

From
Ron
Date:
Sorry for the delay in responding.  I had familial obligations.

As a matter of fact, I am spending a decent amount of time on this.
I don't usually pore through documentation for compilers and OSs to
the degree I have been since this thread started.  Nor do I usually
try to get access to the HW I'm presently tracking down.

I'll post my thoughts re: detailed analysis of gcc/g++ compiler
options later today or tomorrow as work schedule allows.

Why this is worth it:
1= Any gains from setup and configuration are the cheapest ones
available once we codify how to obtain them.
2= Any public database or knowledge about how best to set up,
configure, and test pg is very good for the community.
3= Developers need to know and agree on proper procedure and
technique for generating results for discussion, or we end up wasting
a lot of time (see the sketch after this list).
4= Measuring and documenting pg performance means we know where best
to allocate resources for improving pg.  Or where using pg is
(in)appropriate compared to competitors.
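
As a concrete starting point for 3=, here is a sketch of the minimum
build/environment info anyone posting numbers could capture alongside
their tps figures:

pg_config --configure    # options the server was built with
pg_config --cflags       # compiler flags actually used
gcc --version | head -1
uname -a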

Potential performance gains are not the only value of this thread.
Ron Peacetree


At 12:33 PM 12/16/2006, Michael Stone wrote:
>On Sat, Dec 16, 2006 at 10:53:21AM -0500, Ron wrote:
>>The most important "gain" IMO is knowledge, and I'd say there is
>>still more to learn and/or verify. YMMV.
>
>Well, I think there are other areas I can spend my time on where
>potential gains are more likely. YMMV (although, I note, you don't
>seem to be spending much of your own time testing this).