Thread: count * performance issue

count * performance issue

From
"sathiya psql"
Date:
count(*) takes too much time...

but with a WHERE clause we can make it use an index... what WHERE clause can we use??

I am using Postgres 7.4 on Debian with 1 GB RAM,

and I have a table with nearly 50 lakh (5 million) records;

it has more than 15 columns. I want to count how many records there are, and it is taking nearly 17 seconds to do that...

I know that to get an approximate count we can use
         SELECT reltuples FROM pg_class WHERE relname = 'tablename';

but this gives an approximate count, and I require an exact count...

Re: count * performance issue

From
Chris
Date:
sathiya psql wrote:
> count(*) tooks much time...
>
> but with the where clause we can make this to use indexing,... what
> where clause we can use??
>
> Am using postgres 7.4 in Debian OS with 1 GB RAM,
>
> am having a table with nearly 50 lakh records,

Looks suspiciously like a question asked yesterday:

http://archives.postgresql.org/pgsql-performance/2008-03/msg00068.php

--
Postgresql & php tutorials
http://www.designmagick.com/

Re: count * performance issue

From
"A. Kretschmer"
Date:
On Thu, 06 Mar 2008 at 11:13:01 +0530, sathiya psql wrote:
> count(*) tooks much time...
>
> but with the where clause we can make this to use indexing,... what where
> clause we can use??

Without a WHERE clause, an index can't help avoid a sequential scan.


>
> Am using postgres 7.4 in Debian OS with 1 GB RAM,

PG 7.4 is very old... Recent versions are MUCH faster.



>
> am having a table with nearly 50 lakh records,
>
> it has more than 15 columns, i want to count how many records are there, it is
> taking nearly 17 seconds to do that...
>
> i know that to get a approximate count we can use
>          SELECT reltuples FROM pg_class where relname = TABLENAME;
>
> but this give approximate count, and i require exact count...

There isn't a general solution. If you really need the exact count of
tuples, you can use a TRIGGER to increase/decrease the tuple count for
this table in an extra table.
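
For illustration, here is a minimal sketch of such a trigger-maintained counter. The table, function and trigger names are made up, and the dollar-quoted function body assumes PostgreSQL 8.0 or later:

CREATE TABLE row_counts (
    table_name text PRIMARY KEY,
    n          bigint NOT NULL
);

-- seed the counter once from the current contents
INSERT INTO row_counts SELECT 'mytable', count(*) FROM mytable;

CREATE FUNCTION mytable_count_trig() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE row_counts SET n = n + 1 WHERE table_name = 'mytable';
    ELSE  -- DELETE
        UPDATE row_counts SET n = n - 1 WHERE table_name = 'mytable';
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mytable_count
    AFTER INSERT OR DELETE ON mytable
    FOR EACH ROW EXECUTE PROCEDURE mytable_count_trig();

-- the exact count is then a single-row lookup:
SELECT n FROM row_counts WHERE table_name = 'mytable';

Note that every insert and delete now has to update the same counter row, which is the locking cost discussed further down this thread.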


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: count * performance issue

From
"Shoaib Mir"
Date:
On Thu, Mar 6, 2008 at 5:08 PM, A. Kretschmer <andreas.kretschmer@schollglas.com> wrote:
> am having a table with nearly 50 lakh records,
>
> it has more than 15 columns, i want to count how many records are there, it is
> taking nearly 17 seconds to do that...
>
> i know that to get a approximate count we can use
>          SELECT reltuples FROM pg_class where relname = TABLENAME;
>
> but this give approximate count, and i require exact count...

> There isn't a general solution. If you really need the exact count of
> tuples, you can use a TRIGGER to increase/decrease the tuple count for
> this table in an extra table.



Or do something like:

ANALYZE tablename;
SELECT reltuples FROM pg_class WHERE relname = 'tablename';

That will also return the (estimated) total number of rows in the table, and I guess it might be much faster than doing a count(*); but yes, if a trigger is an option, that can be the easiest way to do it, and the fastest too.

--
Shoaib Mir
Fujitsu Australia Software Technology
shoaibm[@]fast.fujitsu.com.au

Re: count * performance issue

From
"sathiya psql"
Date:
but every time I need to run ANALYZE...
this takes the same time as count(*) takes, so what is the use??


Re: count * performance issue

From
"Shoaib Mir"
Date:
On Thu, Mar 6, 2008 at 5:19 PM, sathiya psql <sathiya.psql@gmail.com> wrote:
> but every time I need to run ANALYZE...
> this takes the same time as count(*) takes, so what is the use??



Don't you have autovacuuming running in the background, which is taking care of the ANALYZE as well?

If not, then turn it on; a manual ANALYZE after that shouldn't, I guess, take much time!

But yes, I will say that if it's possible, go with the trigger option, as that might be more helpful and a very fast way to do it.

--
Shoaib Mir
Fujitsu Australia Software Technology
shoaibm[@]fast.fujitsu.com.au

Re: count * performance issue

From
Mark Mielke
Date:
 
> There isn't a general solution. If you really need the exact count of
> tuples, you can use a TRIGGER to increase/decrease the tuple count for
> this table in an extra table.

Of course, this means accepting the cost of obtaining update locks on the count table.

The original poster should understand that they can either get a fast estimated count, or they can get a slow accurate count (either slow in terms of select using count(*) or slow in terms of updates using triggers and locking).

Other systems have their own issues. An index scan may be faster than a table scan for databases that can accurately determine counts using only the index, but it's still a relatively slow operation, and people don't normally need an accurate count for records in the range of 100,000+? :-)

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>

Re: count * performance issue

From
"sathiya psql"
Date:
Will you please tell me what autovacuuming is... and what it will do... is there any good article on this?


Re: count * performance issue

From
"Shoaib Mir"
Date:


On Thu, Mar 6, 2008 at 5:31 PM, sathiya psql <sathiya.psql@gmail.com> wrote:
> Will you please tell me what autovacuuming is... and what it will do... is there any good article on this?



Read this --> http://www.postgresql.org/docs/8.3/interactive/routine-vacuuming.html#AUTOVACUUM

--
Shoaib Mir
Fujitsu Australia Software Technology
shoaibm[@]fast.fujitsu.com.au

Re: count * performance issue

From
"A. Kretschmer"
Date:
On Thu, 06 Mar 2008 at 01:26:46 -0500, Mark Mielke wrote:
>
>
>         There aren't a general solution. If you realy need the exact count of
>         tuples than you can play with a TRIGGER and increase/decrease the
>         tuple-count for this table in an extra table.
>
>
> Of course, this means accepting the cost of obtaining update locks on the count
> table.
>
> The original poster should understand that they can either get a fast estimated
> count, or they can get a slow accurate count (either slow in terms of select
> using count(*) or slow in terms of updates using triggers and locking).
>
> Other systems have their own issues. An index scan may be faster than a table
> scan for databases that can accurately determine counts using only the index,

No. The current index implementation contains no information about
row visibility within the current transaction. You need to scan the
whole data table to determine whether a given row is visible to the
current transaction.


> but it's still a relatively slow operation, and people don't normally need an
> accurate count for records in the range of 100,000+? :-)

right.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: count * performance issue

From
"sathiya psql"
Date:
Is there any way to explicitly force Postgres to use an index scan?


Fwd: count * performance issue

From
"sathiya psql"
Date:


---------- Forwarded message ----------
From: sathiya psql <sathiya.psql@gmail.com>
Date: Thu, Mar 6, 2008 at 12:17 PM
Subject: Re: [PERFORM] count * performance issue
To: "A. Kretschmer" <andreas.kretschmer@schollglas.com>
Cc: psql-performance@postgresql.org


A TRIGGER I can use if I want the count of the whole table, but I require it for some of the rows, with a WHERE condition....

so how to do that???




Re: count * performance issue

From
"A. Kretschmer"
Date:
On Thu, 06 Mar 2008 at 12:13:17 +0530, sathiya psql wrote:
> is there any way to explicitly force the postgres to use index scan

Not really. PG uses a cost-based optimizer and will use an INDEX when it
makes sense.


>
> On Thu, Mar 6, 2008 at 12:06 PM, A. Kretschmer <
> andreas.kretschmer@schollglas.com> wrote:

please, no silly top-posting with the complete quote below.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: count * performance issue

From
"A. Kretschmer"
Date:
On Thu, 06 Mar 2008 at 12:17:55 +0530, sathiya psql wrote:
> TRIGGER i can use if i want the count of the whole table, but i require for
> some of the rows with WHERE condition....
>
> so how to do that ???

Okay, in this case a TRIGGER is a bad idea. You can use an INDEX on
this column. Can you show us the output of EXPLAIN ANALYSE SELECT
count(*) FROM <your_table> WHERE <your_column> = ... ?


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: count * performance issue

From
"sathiya psql"
Date:

                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=205756.95..205756.95 rows=1 width=0) (actual time=114675.042..114675.042 rows=1 loops=1)
   ->  Seq Scan on call_log  (cost=0.00..193224.16 rows=5013112 width=0) (actual time=11.754..91429.594 rows=5061619 loops=1)
         Filter: (call_id > 0)
 Total runtime: 114699.797 ms
(4 rows)


It is now taking 114 seconds, I think because of load on my system.... Anyway, will you explain what this COST, actual time and the other stuff mean?


Re: count * performance issue

From
"A. Kretschmer"
Date:
On Thu, 06 Mar 2008 at 12:36:48 +0530, sathiya psql wrote:
>
>                                                           QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=205756.95..205756.95 rows=1 width=0) (actual time=
> 114675.042..114675.042 rows=1 loops=1)
>    ->  Seq Scan on call_log  (cost=0.00..193224.16 rows=5013112 width=0)
> (actual time=11.754..91429.594 rows=5061619 loops=1)
>          Filter: (call_id > 0)
>  Total runtime: 114699.797 ms
> (4 rows)

'call_id > 0' is your WHERE condition? An INDEX can't help: all rows
with call_id > 0 are in the result, and I guess that's all the records in
the table.


>
>
> it is now taking 114 seconds, i think because of load in my system.... any way
> will you explain., what is this COST, actual time and other stuffs....


08:16 < akretschmer> ??explain
08:16 < rtfm_please> For information about explain
08:16 < rtfm_please> see http://explain-analyze.info
08:16 < rtfm_please> or http://www.depesz.com/index.php/2007/08/06/better-explain-analyze/
08:16 < rtfm_please> or http://www.postgresql.org/docs/current/static/sql-explain.html

and

http://redivi.com/~bob/oscon2005_pgsql_pdf/OSCON_Explaining_Explain_Public.pdf


Read this to learn more about explain.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: count * performance issue

From
Dave Cramer
Date:
On 6-Mar-08, at 1:43 AM, sathiya psql wrote:

> is there any way to explicitly force the postgres to use index scan
>
>

If you want to count all the rows in the table there is only one way
to do it (without keeping track yourself with a trigger): a seq scan.

An index will not help you.

The only thing that is going to help you is really fast disks, and
more memory, and you should consider moving to 8.3 for all the other
performance benefits.

Dave

Re: count * performance issue

From
Dave Cramer
Date:
Hi,

On 6-Mar-08, at 6:58 AM, sathiya psql wrote:

> > The only thing that is going to help you is really fast disks, and
> > more memory, and you should consider moving to 8.3 for all the other
> > performance benefits.
>
> Is 8.3 a stable version, or what is the latest stable version of Postgres??

Yes, it is the latest stable version.

> Moving my database from 7.4 to 8.3, will it do any harm??

You will have to test this yourself. There may be issues.

> What are all the advantages of moving from 7.4 to 8.3?

Every version of PostgreSQL has improved performance and robustness, so you will get better overall performance. However, I want to caution you that this is not a panacea. It will NOT solve your seq scan problem.


Dave


Re: count * performance issue

From
"sathiya psql"
Date:

> Yes, it is the latest stable version.

Is there any article describing the differences between 7.3 and 8.4?


Re: count * performance issue

From
"Harald Armin Massa"
Date:
Of course, the official documentation covers that information in its
release notes

http://www.postgresql.org/docs/8.3/static/release.html

best wishes

Harald

On Thu, Mar 6, 2008 at 1:43 PM, sathiya psql <sathiya.psql@gmail.com> wrote:
> > Yes, it is the latest stable version.
>
> Is there any article describing the differences between 7.3 and 8.4?



--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
fx 01212-5-13695179
-
EuroPython 2008 will take place in Vilnius, Lithuania - Stay tuned!

Re: count * performance issue

From
"A. Kretschmer"
Date:
On Thu, 06 Mar 2008 at 18:13:50 +0530, sathiya psql wrote:
>
> > Yes, it is the latest stable version.
>
> Is there any article describing the differences between 7.3 and 8.4?

http://developer.postgresql.org/pgdocs/postgres/release.html


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: count * performance issue

From
Alvaro Herrera
Date:
sathiya psql wrote:
> > Yes, it is the latest stable version.
>
> Is there any article describing the differences between 7.3 and 8.4?

http://www.postgresql.org/docs/8.3/static/release.html

In particular,
http://www.postgresql.org/docs/8.3/static/release-8-3.html
http://www.postgresql.org/docs/8.3/static/release-8-2.html
http://www.postgresql.org/docs/8.3/static/release-8-1.html
http://www.postgresql.org/docs/8.3/static/release-8-0.html
which are all the major releases between 7.4 and 8.3.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: count * performance issue

From
Mark Mielke
Date:
A. Kretschmer wrote:
> On Thu, 06 Mar 2008 at 12:17:55 +0530, sathiya psql wrote:
> > A TRIGGER I can use if I want the count of the whole table, but I require it
> > for some of the rows, with a WHERE condition....
> >
> > so how to do that???
>
> Okay, in this case a TRIGGER is a bad idea. You can use an INDEX on
> this column. Can you show us the output of EXPLAIN ANALYSE SELECT
> count(*) FROM <your_table> WHERE <your_column> = ... ?

Actually - in this case, a TRIGGER can be a good idea. If your count table can include the WHERE information, then you no longer require an effective table-wide lock for updates.

In the past I have used sequential article numbers within a topic for an online community. Each topic row had an article_count. To generate a new article, I could update the article_count and use the number I had generated as the article number. To query the number of articles in a particular topic, article_count was available. Given thousands of topics and tens of thousands of articles, the system worked pretty well. Not in the millions range like the original poster, but I saw no reason why this wouldn't scale.
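
As a rough sketch of that keyed-counter idea (all table, column, and value names here are hypothetical, and it assumes the article number and the count are maintained inside one transaction):

CREATE TABLE topic (
    topic_id      integer PRIMARY KEY,
    article_count integer NOT NULL DEFAULT 0
);

-- adding an article: bump the per-topic counter and reuse it as the
-- article number, all inside one transaction
BEGIN;
UPDATE topic SET article_count = article_count + 1 WHERE topic_id = 42;
SELECT article_count FROM topic WHERE topic_id = 42;  -- the new article number
-- INSERT INTO article (topic_id, article_no, body) VALUES (42, <that number>, '...');
COMMIT;

-- counting the articles in one topic touches a single row:
SELECT article_count FROM topic WHERE topic_id = 42;

Concurrent inserts into different topics then update different counter rows, so they do not serialize on one global counter.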

For the original poster: You might be desperate and looking for help from the only place you know to get it from, but some of your recent answers have shown that you are either not reading the helpful responses provided to you, or you are unwilling to do your own research. If that continues, I won't be posting to aid you.

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>

Re: count * performance issue

From
Greg Smith
Date:
On Thu, 6 Mar 2008, sathiya psql wrote:

> is there any article saying the difference between this 7.3 and 8.4

I've collected a list of everything on this topic I've seen at
http://www.postgresqldocs.org/index.php/Version_8.3_Changes

The Feature Matrix linked to there will be a quicker way to see what's
happened than sorting through the release notes.

None of these changes change the fact that getting an exact count in this
situation takes either a sequential scan or triggers.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: count * performance issue

From
Greg Smith
Date:
On Thu, 6 Mar 2008, sathiya psql wrote:

> any way will you explain., what is this COST, actual time and other
> stuffs....

There's a long list of links to tools and articles on this subject at
http://www.postgresqldocs.org/index.php/Using_EXPLAIN

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: count * performance issue

From
Craig James
Date:
In the 3 years I've been using Postgres, the problem of count() performance has come up more times than I can recall,
and each time the answer is, "It's a sequential scan -- redesign your application."

My question is: What do the other databases do that Postgres can't do, and why not?

Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they do?

On the one hand, I understand that Postgres has its architecture, and I understand the issue of row visibility, and so
forth. On the other hand, my database is just sitting there, nothing going on, no connections except me, and... it
takes FIFTY FIVE SECONDS to count 20 million rows, a query that either Oracle or MySQL would answer in a fraction of a
second. It's hard for me to believe there isn't a better way.

This is a real problem.  Countless people (including me) have spent significant effort rewriting applications because
of this performance flaw in Postgres.  Over and over, the response is, "You don't really need to do that ... change your
application."  Well, sure, it's always possible to change the application, but that misses the point.  To most of us
users, count() seems like it should be a trivial operation.  On other relational database systems, it is a trivial
operation.

This is really a significant flaw on an otherwise excellent relational database system.

My rant for today...
Craig

Re: count * performance issue

From
Bruce Momjian
Date:
Craig James wrote:
> This is a real problem.  Countless people (including me) have
> spent significant effort rewriting applications because of this
> performance flaw in Postgres.  Over and over, the response is,
> "You don't really need to do that ... change your application."
> Well, sure, it's always possible to change the application, but
> that misses the point.  To most of us users, count() seems like
> it should be a trivial operation.  On other relational database
> systems, it is a trivial operation.
>
> This is really a significant flaw on an otherwise excellent
> relational database system.

Have you read the TODO items related to this?

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

Re: count * performance issue

From
"Steinar H. Gunderson"
Date:
On Thu, Mar 06, 2008 at 07:28:50AM -0800, Craig James wrote:
> Count() on Oracle and MySQL is almost instantaneous, even for very large
> tables. So why can't Postgres do what they do?

In MySQL's case: Handle transactions. (Try COUNT(*) on an InnoDB table.)

/* Steinar */
--
Homepage: http://www.sesse.net/


Re: count * performance issue

From
Bill Moran
Date:
In response to Craig James <craig_james@emolecules.com>:

> In the 3 years I've been using Postgres, the problem of count() performance has come up more times than I can recall,
> and each time the answer is, "It's a sequential scan -- redesign your application."
>
> My question is: What do the other databases do that Postgres can't do, and why not?
>
> Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they
> do?

I don't know about Oracle, but MySQL has this problem as well.  Use
innodb tables and see how slow it is.  The only reason myisam tables
don't have this problem is because they don't implement any of the
features that make the problem difficult to solve.

> On the one hand, I understand that Postgres has its architecture, and I understand the issue of row visibility, and
> so forth.  On the other hand, my database is just sitting there, nothing going on, no connections except me, and... it
> takes FIFTY FIVE SECONDS to count 20 million rows, a query that either Oracle or MySQL would answer in a fraction of a
> second. It's hard for me to believe there isn't a better way.

There's been discussion about putting visibility information in indexes.
I don't know how far along that effort is, but I expect that will improve
count() performance significantly.

> This is a real problem.  Countless people (including me) have spent significant effort rewriting applications because
> of this performance flaw in Postgres.  Over and over, the response is, "You don't really need to do that ... change your
> application."  Well, sure, it's always possible to change the application, but that misses the point.  To most of us
> users, count() seems like it should be a trivial operation.  On other relational database systems, it is a trivial
> operation.
>
> This is really a significant flaw on an otherwise excellent relational database system.

Not really.  It really is a design flaw in your application ... it doesn't
make relational sense to use the number of rows in a table for anything.
Just because other people do it frequently doesn't make it right.

That being said, it's still a useful feature, and I don't hear anyone
denying that.  As I said, google around a bit WRT to PG storing
visibility information in indexes, as I think that's the way this will
be improved.

> My rant for today...

Feel better now?

--
Bill Moran

Re: count * performance issue

From
Greg Smith
Date:
On Thu, 6 Mar 2008, Steinar H. Gunderson wrote:

> On Thu, Mar 06, 2008 at 07:28:50AM -0800, Craig James wrote:
>> Count() on Oracle and MySQL is almost instantaneous, even for very large
>> tables. So why can't Postgres do what they do?
>
> In MySQL's case: Handle transactions. (Try COUNT(*) on an InnoDB table.)

Exactly.  There is a good discussion of this at
http://www.mysqlperformanceblog.com/2007/04/10/count-vs-countcol/ and I
found the comments from Ken Jacobs were the most informative.

In short, if you want any reasonable database integrity you have to use
InnoDB with MySQL, and once you make that choice it has the same problem.
You only get this accelerated significantly when using MyISAM, which can
tell you an exact count of all the rows it hasn't corrupted yet.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: count * performance issue

From
"Dave Page"
Date:
On Thu, Mar 6, 2008 at 3:49 PM, Greg Smith <gsmith@gregsmith.com> wrote:
>
>  You only get this accelerated significantly when using MyISAM, which can
>  tell you an exact count of all the rows it hasn't corrupted yet.

Please don't do that again. I'm going to have to spend the next hour
cleaning coffee out of my laptop keyboard.

:-)

--
Dave Page
EnterpriseDB UK Ltd: http://www.enterprisedb.com
PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk

Re: count * performance issue

From
"Mark Lewis"
Date:
On Thu, 2008-03-06 at 07:28 -0800, Craig James wrote:
...
> My question is: What do the other databases do that Postgres can't do, and why not?
>
> Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they
> do?
...

I can vouch that Oracle can still take linear time to perform a
count(*), at least in some cases.

I have also seen count(*) fast in some cases too... my understanding is
that they maintain a list of "interested transactions" on a per-relation
basis.  Perhaps they do an optimization based on the index size if there
are no pending DML transactions?

-- Mark



Re: count * performance issue

From
"D'Arcy J.M. Cain"
Date:
On Thu, 06 Mar 2008 07:28:50 -0800
Craig James <craig_james@emolecules.com> wrote:
> In the 3 years I've been using Postgres, the problem of count() performance has come up more times than I can recall,
> and each time the answer is, "It's a sequential scan -- redesign your application."
>
> My question is: What do the other databases do that Postgres can't do, and why not?
>
> Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they
> do?

It's a tradeoff.  The only way to get that information quickly is to
maintain it internally when you insert or delete a row.  So when do you
want to take your hit.  It sounds like Oracle has made this decision
for you.  In PostgreSQL you can use triggers and rules to manage this
information if you need it.  You can even do stuff like track how many
of each type of something you have.  That's something you can't do if
your database engine has done a generic speedup for you.  You would
still have to create your own table for something like that and then
you get the hit twice.

--
D'Arcy J.M. Cain <darcy@druid.net>         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.

Re: count * performance issue

From
Tom Lane
Date:
Craig James <craig_james@emolecules.com> writes:
> Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they
> do?

AFAIK the above claim is false for Oracle.  They have the same
transactional issues we do.

            regards, tom lane

Re: count * performance issue

From
Mark Kirkwood
Date:
Craig James wrote:
>
> My question is: What do the other databases do that Postgres can't do,
> and why not?
>
> Count() on Oracle and MySQL is almost instantaneous, even for very
> large tables. So why can't Postgres do what they do?
>

I think Mysql can only do that for the myisam engine - innodb and
falcon  are similar to Postgres.

I don't believe Oracle optimizes bare count(*) on a table either - tho
it may be able to use a suitable index (if present) to get the answer
quicker.

regards

Mark

Re: count * performance issue

From
Craig James
Date:
Tom Lane wrote:
> Craig James <craig_james@emolecules.com> writes:
>> Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they
>> do?
>
> AFAIK the above claim is false for Oracle.  They have the same
> transactional issues we do.

My experience doesn't match this claim.  When I ported my application from Oracle to Postgres, this was the single
biggest performance problem.  count() in Oracle was always very fast.  We're not talking about a 20% or 50% difference,
we're talking about a small fraction of a second (Oracle) versus a minute (Postgres) -- something like two or three
orders of magnitude.

It may be that Oracle has a way to detect when there's no transaction and use a faster method.  If so, this was a
clever optimization -- in my experience, that represents the vast majority of the times you want to use count().  It's
not very useful to count the rows of a table that many apps are actively modifying since the result may change the
moment your transaction completes.  Most of the time when you use count(), it's because you're the only one modifying
the table, so the count will be meaningful.

Craig


Re: count * performance issue

From
paul rivers
Date:
Craig James wrote:
> Tom Lane wrote:
>> Craig James <craig_james@emolecules.com> writes:
>>> Count() on Oracle and MySQL is almost instantaneous, even for very
>>> large tables. So why can't Postgres do what they do?
>>
>> AFAIK the above claim is false for Oracle.  They have the same
>> transactional issues we do.
>
> My experience doesn't match this claim.  When I ported my application
> from Oracle to Postgres, this was the single biggest performance
> problem.  count() in Oracle was always very fast.  We're not talking
> about a 20% or 50% difference, we're talking about a small fraction of
> a second (Oracle) versus a minute (Postgres) -- something like two or
> three orders of magnitude.
>
> It may be that Oracle has a way to detect when there's no transaction
> and use a faster method.  If so, this was a clever optimization -- in
> my experience, that represents the vast majority of the times you want
> to use count().  It's not very useful to count the rows of a table
> that many apps are actively modifying since the result may change the
> moment your transaction completes.  Most of the time when you use
> count(), it's because you're the only one modifying the table, so the
> count will be meaningful.
>
> Craig
>
>

Oracle will use a btree index on a not null set of columns to do a fast
full index scan, which can be an order of magnitude or more faster
compared to a table scan.  Also, Oracle can use a bitmap index (in cases
where a bitmap index isn't otherwise silly) for a bitmap fast index
scan/bitmap conversion, with similar dramatic results.

For "large" tables, Oracle is not going to be as fast as MyISAM tables
in MySQL, even with these optimizations, since MyISAM doesn't have to
scan even index pages to get a count(*) answer against the full table.

Paul



Re: count * performance issue

From
Mark Kirkwood
Date:
Craig James wrote:
> Tom Lane wrote:
>> Craig James <craig_james@emolecules.com> writes:
>>> Count() on Oracle and MySQL is almost instantaneous, even for very
>>> large tables. So why can't Postgres do what they do?
>>
>> AFAIK the above claim is false for Oracle.  They have the same
>> transactional issues we do.
>
> My experience doesn't match this claim.  When I ported my application
> from Oracle to Postgres, this was the single biggest performance
> problem.  count() in Oracle was always very fast.  We're not talking
> about a 20% or 50% difference, we're talking about a small fraction of
> a second (Oracle) versus a minute (Postgres) -- something like two or
> three orders of magnitude.
>

To convince yourself do this in Oracle:

EXPLAIN PLAN FOR SELECT count(*) FROM table_without_any_indexes

and you will see a full table scan. If you add (suitable) indexes you'll
see something like an index full fast scan.


In fact you can make count(*) *very* slow indeed in Oracle, by having an
older session try to count a table that a newer session is modifying and
committing to. The older session's data for the count is reconstructed
from the rollback segments - which is very expensive.

regards

Mark



Re: count * performance issue

From
Josh Berkus
Date:
Tom,

> > Count() on Oracle and MySQL is almost instantaneous, even for very
> > large tables. So why can't Postgres do what they do?
>
> AFAIK the above claim is false for Oracle.  They have the same
> transactional issues we do.

Nope.  Oracle's MVCC is implemented through rollback segments, rather than
non-overwriting the way ours is.  So Oracle can just do a count(*) on the
index, then check the rollback segment for any concurrent
update/delete/insert activity and adjust the count.  This sucks if there's
a *lot* of concurrent activity, but in the usual case it's pretty fast.

I've been thinking that when we apply the Dead Space Map we might be able
to get a similar effect in PostgreSQL.  That is, just do a count over the
index, and visit only the heap pages flagged in the DSM.  Again, for a
heavily updated table this wouldn't have any benefit, but for most cases
it would be much faster.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

Re: count * performance issue

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> Tom,
>>> Count() on Oracle and MySQL is almost instantaneous, even for very
>>> large tables. So why can't Postgres do what they do?
>>
>> AFAIK the above claim is false for Oracle.  They have the same
>> transactional issues we do.

> Nope.  Oracle's MVCC is implemented through rollback segments, rather than
> non-overwriting the way ours is.  So Oracle can just do a count(*) on the
> index, then check the rollback segment for any concurrent
> update/delete/insert activity and adjust the count.  This sucks if there's
> a *lot* of concurrent activity, but in the usual case it's pretty fast.

Well, scanning an index to get a count might be significantly faster
than scanning the main table, but it's hardly "instantaneous".  It's
still going to take time proportional to the table size.

Unless they keep a central counter of the number of index entries;
which would have all the same serialization penalties we've talked
about before...

            regards, tom lane

Re: count * performance issue

From
Greg Smith
Date:
On Fri, 7 Mar 2008, Tom Lane wrote:

> Well, scanning an index to get a count might be significantly faster
> than scanning the main table, but it's hardly "instantaneous".  It's
> still going to take time proportional to the table size.

If this is something that's happening regularly, you'd have to hope that
most of the index is already buffered in memory somewhere though, so now
you're talking a buffer/OS cache scan that doesn't touch disk much.
Should be easier for that to be true because the index is smaller than the
table, right?

I know when I'm playing with pgbench the primary key index on the big
accounts table is 1/7 the size of the table, and when using that table
heavily shared_buffers ends up being mostly filled with that index. The
usage counts are so high on the index blocks relative to any section of
the table itself that they're very sticky in memory.  And that's toy data;
on some of the webapps people want these accurate counts for the ratio of
index size to table data is even more exaggerated (think web forum).

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: count * performance issue

From
Mark Mielke
Date:
Josh Berkus wrote:
> > > Count() on Oracle and MySQL is almost instantaneous, even for very
> > > large tables. So why can't Postgres do what they do?
> >
> > AFAIK the above claim is false for Oracle.  They have the same
> > transactional issues we do.
>
> Nope.  Oracle's MVCC is implemented through rollback segments, rather than
> non-overwriting the way ours is.  So Oracle can just do a count(*) on the
> index, then check the rollback segment for any concurrent
> update/delete/insert activity and adjust the count.  This sucks if there's
> a *lot* of concurrent activity, but in the usual case it's pretty fast.

I read the "almost instantaneous" against "the above claim is false" and "Nope.", and I am not sure from the above whether you are saying that Oracle keeps an up-to-date count for the index (which might make it instantaneous?), or whether you are saying it still has to scan the index - which can take time if the index is large (therefore not instantaneous).

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>

Re: count * performance issue

From
Tom Lane
Date:
Greg Smith <gsmith@gregsmith.com> writes:
> I know when I'm playing with pgbench the primary key index on the big
> accounts table is 1/7 the size of the table, and when using that table
> heavily shared_buffers ends up being mostly filled with that index. The
> usage counts are so high on the index blocks relative to any section of
> the table itself that they're very sticky in memory.  And that's toy data;
> on some of the webapps people want these accurate counts for the ratio of
> index size to table data is even more exaggerated (think web forum).

Remember that our TOAST mechanism acts to limit the width of the
main-table row.

            regards, tom lane

Re: count * performance issue

From
paul rivers
Date:
Mark Mielke wrote:
> Josh Berkus wrote:
>>>> Count() on Oracle and MySQL is almost instantaneous, even for very
>>>> large tables. So why can't Postgres do what they do?
>>>>
>>> AFAIK the above claim is false for Oracle.  They have the same
>>> transactional issues we do.
>>>
>>
>> Nope.  Oracle's MVCC is implemented through rollback segments, rather than
>> non-overwriting the way ours is.  So Oracle can just do a count(*) on the
>> index, then check the rollback segment for any concurrent
>> update/delete/insert activity and adjust the count.  This sucks if there's
>> a *lot* of concurrent activity, but in the usual case it's pretty fast
>
> I read the "almost instantaneous" against "the above claim is false" and
> "Nope.", and I am not sure from the above whether you are saying that
> Oracle keeps an up-to-date count for the index (which might make it
> instantaneous?), or whether you are saying it still has to scan the
> index - which can take time if the index is large (therefore not
> instantaneous).
>
> Cheers,
> mark
>
> --
> Mark Mielke <mark@mielke.cc>
>

Oracle scans the index pages, if the b-tree index is on non-nullable
columns, or if the bitmap index is on low-ish cardinality data.
Otherwise, it table scans.  MyISAM in MySQL would be an example where a
counter is kept.





Re: count * performance issue

From
Arjen van der Meijden
Date:
On 6-3-2008 16:28 Craig James wrote:
> On the one hand, I understand that Postgres has its architecture, and I
> understand the issue of row visibility, and so forth.  On the other
> hand, my database is just sitting there, nothing going on, no
> connections except me, and... it takes FIFTY FIVE SECONDS to count 20
> million rows, a query that either Oracle or MySQL would answer in a
> fraction of a second.  It's hard for me to believe there isn't a better
> way.

Can you explain to me how you'd fit this in a fraction of a second?

mysql> select count(*) from messages;
+----------+
| count(*) |
+----------+
| 21908505 |
+----------+
1 row in set (8 min 35.09 sec)

This is a table containing the messages on forumtopics and is therefore
relatively large. The hardware is quite beefy for a forum however (4
3Ghz cores, 16GB, 14+1 disk raid5). This table has about 20GB of data.

If I use a table that contains about the same number of records as the
one above, but which before this query was probably much less present in
the innodb buffer (though also less frequently touched by other queries),
we see this:

mysql> select count(*) from messagesraw;
+----------+
| count(*) |
+----------+
| 21962804 |
+----------+
1 row in set (5 min 16.41 sec)

This table is about 12GB.

In both cases MySQL claimed to be 'Using index' with the PRIMARY index,
which for those tables is more or less identical.

Apparently the time is still table-size related, not necessarily
tuple-count related. As this shows:

mysql> select count(*) from articlestats;
+----------+
| count(*) |
+----------+
| 34467246 |
+----------+
1 row in set (54.14 sec)

that table is only 2.4GB, but contains 57% more records, although this
was on another database on a system with somewhat different specs (8
2.6Ghz cores, 16GB, 7+7+1 raid50), used a non-primary index and I have
no idea how well that index was in the system's cache prior to this query.

Repeating it makes it do that query in 6.65 seconds, repeating the
12GB-query doesn't make it any faster.

Anyway, long story short: MySQL's table-count stuff also seems
table-size related. As soon as the index it uses fits in the cache or it
doesn't have to use the primary index, it might be a different story,
but when the table(index) is too large to fit, it is quite slow.
Actually, it doesn't appear to be much faster than Postgresql's (8.2)
table-based counts. If I use a much slower machine (2 2Ghz opterons, 8GB
ddr memory, 5+1 old 15k rpm scsi disks in raid5) with a 1GB, 13M record
table wich is similar to the above articlestats, it is able to return a
count(*) in 3 seconds after priming the cache.

If you saw instantaneous results with MySQL, you have either seen the
query-cache at work or were using MyISAM. Or perhaps with a fast
system, you had small tuples with a nice index in a nicely primed cache.

Best regards,

Arjen

Re: count * performance issue

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Well, scanning an index to get a count might be significantly faster
> than scanning the main table, but it's hardly "instantaneous".  It's
> still going to take time proportional to the table size.

Hm, Mark's comment about bitmap indexes makes that not entirely true. A bitmap
index can do RLE compression which makes the relationship between the size of
the table and the time taken to scan the index more complex. In the degenerate
case where there are no concurrent updates (assuming you can determine that
quickly) it might actually be constant time.

> Unless they keep a central counter of the number of index entries;
> which would have all the same serialization penalties we've talked
> about before...

Bitmap indexes do in fact have concurrency issues -- arguably they're just a
baroque version of this central counter in this case.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

Re: count * performance issue

From
"Joe Mirabal"
Date:
Gregory,

I just joined this listserv and was happy to see this posting.  I have a 400GB table that I have indexed (building the index took 27 hours); loading the table with 10 threads took 9 hours.  I run queries on the data and get immediate max and min as well as other aggregate functions very quickly; however, a select count(*) of the table takes forever, usually nearly an hour or more.

Do you have any tuning recommendations?  We in our warehouse use count(*) as our verification of counts by day/month etc., and in Netezza it's immediate.  I tried adding OIDs, but what I learned was that adding OIDs to the table adds a significant amount of space to the data AND the index.

As you may gather from this we are relatively new on Postgres.

Any suggestions you can give me would be most helpful.

Cheers,
Joe




--
Mirabili et Veritas
Joe Mirabal

Re: count * performance issue

From
Bill Moran
Date:
In response to "Joe Mirabal" <jmmirabal@gmail.com>:

> Gregory,
>
> I just joined this listserv and was happy to see this posting.  I have a
> 400GB table that I have indexed (building the index took 27 hours) , Loading
> the table with 10 threads took 9 hours.  I run queries on the data nad get
> immediate max and min as well as other aggrgate functions very quickly,
> however a select count(*) of the table takes forever usually nearly an hour
> or more.
>
> Do you have any tuning recommendations.  We in our warehouse use the
> count(*) as our verification of counts by day/month's etc and in Netezza its
> immediate.  I tried by adding oids. BUT the situation I learned was that
> adding the oids in the table adds a significasnt amount of space to the data
> AND the index.
>
> As you may gather from this we are relatively new on Postgres.
>
> Any suggestions you can give me would be most helpful.

One approach to this problem is to create triggers that keep track of
the total count whenever rows are added or deleted.  This adds some
overhead to the update process, but the correct row count is always
quickly available.

Another is to use EXPLAIN to get an estimate of the # of rows from
the planner.  This works well if an estimate is acceptable, but can't
be trusted for precise counts.
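
One common way to get at that planner estimate from SQL is a small PL/pgSQL helper along these lines (a sketch only: the function name is made up, it parses EXPLAIN's text output, and it only ever returns the planner's guess, never an exact count):

CREATE OR REPLACE FUNCTION count_estimate(query text) RETURNS integer AS $$
DECLARE
    rec  record;
    rows integer;
BEGIN
    -- run EXPLAIN on the supplied query and pull "rows=<n>" out of the
    -- first plan line that carries an estimate
    FOR rec IN EXECUTE 'EXPLAIN ' || query LOOP
        rows := substring(rec."QUERY PLAN" FROM ' rows=([[:digit:]]+)');
        EXIT WHEN rows IS NOT NULL;
    END LOOP;
    RETURN rows;
END;
$$ LANGUAGE plpgsql;

-- usage (table and column names are made up):
SELECT count_estimate('SELECT 1 FROM some_big_table WHERE some_col > 0');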

Some searches through the archives should turn up details on these
methods.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023

Re: count * performance issue

From
Greg Smith
Date:
On Mon, 10 Mar 2008, Bill Moran wrote:

> Some searches through the archives should turn up details on these
> methods.

I've collected up what looked like the best resources on this topic into
the FAQ entry at http://www.postgresqldocs.org/index.php/Slow_Count

General Bits has already done two good summary articles here and I'd think
wading through the archives directly shouldn't be necessary.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: count * performance issue

From
"Scott Marlowe"
Date:
On Mon, Mar 10, 2008 at 1:54 PM, Joe Mirabal <jmmirabal@gmail.com> wrote:
> Gregory,
>
> I just joined this listserv and was happy to see this posting.  I have a
> 400GB table that I have indexed (building the index took 27 hours) , Loading
> the table with 10 threads took 9 hours.  I run queries on the data nad get
> immediate max and min as well as other aggrgate functions very quickly,
> however a select count(*) of the table takes forever usually nearly an hour
> or more.
>
> Do you have any tuning recommendations.  We in our warehouse use the
> count(*) as our verification of counts by day/month's etc and in Netezza its
> immediate.  I tried by adding oids. BUT the situation I learned was that
> adding the oids in the table adds a significasnt amount of space to the data
> AND the index.

Yeah, this is a typical problem people run into with MVCC databases to
one extent or another.  PostgreSQL has no native way to just make it
faster.  However, if it's a table with wide rows, you can use a lookup
table to help a bit.  Have a FK with cascading deletes from the master
table to a table that just holds the PK for it, and do count(*) on
that table.
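
A rough sketch of that lookup-table idea (names are hypothetical; deletes propagate through the cascading foreign key, while inserts have to be mirrored by the application or a trigger):

CREATE TABLE master_ids (
    id integer PRIMARY KEY
               REFERENCES master (id) ON DELETE CASCADE
);

-- populate once from the existing data, then insert into both tables together
INSERT INTO master_ids SELECT id FROM master;

-- the count now scans the narrow table instead of the wide one
SELECT count(*) FROM master_ids;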

Otherwise, you have the trigger solution mentioned previously.

Also, if you only need an approximate count, then you can use the
system tables to get that with something like

select reltuples from pg_class where relname='tablename';

after an ANALYZE.  It won't be 100% accurate, but it will be pretty
close most of the time.

Re: count * performance issue

From
Greg Smith
Date:
On Mon, 10 Mar 2008, Joe Mirabal wrote:

> I run queries on the data nad get immediate max and min as well as other
> aggrgate functions very quickly, however a select count(*) of the table
> takes forever usually nearly an hour or more.

Are you sure the form of "select count(*)" you're using is actually
utilizing the index to find a useful subset?  What do you get out of
EXPLAIN ANALYZE on the query?

In order for indexes to be helpful a couple of things need to happen:
1) They have to be structured correctly to be useful
2) There need to be large enough settings for shared_buffers and
effective_cache_size that the database thinks it can use them efficiently
3) The tables involved need to be ANALYZEd to keep their statistics up to
date.

The parameters to run a 400GB *table* are very different from the
defaults; if you want tuning suggestions you should post the non-default
entries in your postgresql.conf file from what you've already adjusted
along with basic information about your server (PostgreSQL version, OS,
memory, disk setup).
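
For what it's worth, the entries people usually post look something like the following; the values here are placeholders, not recommendations, since sensible settings depend entirely on the machine:

# postgresql.conf (illustrative placeholder values only)
shared_buffers = 256MB          # memory PostgreSQL manages for its own page cache
effective_cache_size = 2GB      # rough estimate of RAM available for caching overall
work_mem = 16MB                 # per-sort / per-hash memory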

> We in our warehouse use the count(*) as our verification of counts by
> day/month's etc

If you've got a database that size and you're doing that sort of thing on
it, you really should be considering partitioning as well.

  --
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: count * performance issue

From
"Robins Tharakan"
Date:
Hi,

I have been reading this conversation for a few days now and I just wanted to ask this. From the release notes, one of the new additions in 8.3 is (Allow col IS NULL to use an index (Teodor)).

Sorry, if I am missing something here, but shouldn't something like this allow us to get a (fast) accurate count ?

SELECT COUNT(*) from table WHERE indexed_field IS NULL
+
SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL

Robins Tharakan


Re: count * performance issue

From
Mark Mielke
Date:
Robins Tharakan wrote:
> Hi,
>
> I have been reading this conversation for a few days now and I just wanted to ask this. From the release notes, one of the new additions in 8.3 is (Allow col IS NULL to use an index (Teodor)).
>
> Sorry, if I am missing something here, but shouldn't something like this allow us to get a (fast) accurate count ?
>
> SELECT COUNT(*) from table WHERE indexed_field IS NULL
> +
> SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL

For PostgreSQL: You still don't know whether the row is visible until you check the row. That it's NULL or NOT NULL does not influence this truth.

Cheers,
mark

-- 
Mark Mielke <mark@mielke.cc>

Re: count * performance issue

From
"Joshua D. Drake"
Date:
On Tue, 11 Mar 2008 08:27:05 +0530
"Robins Tharakan" <tharakan@gmail.com> wrote:

> SELECT COUNT(*) from table WHERE indexed_field IS NULL
> +
> SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL

If the selectivity is appropriate, yes.  However, if you have 1 million
rows, and 200k of those rows are null (or not null), it is still going
to seqscan.

joshua d. drake

--
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director |  PostgreSQL political pundit



Re: count * performance issue

From
"Scott Marlowe"
Date:
On Mon, Mar 10, 2008 at 7:57 PM, Robins Tharakan <tharakan@gmail.com> wrote:
> Hi,
>
> I have been reading this conversation for a few days now and I just wanted
> to ask this. From the release notes, one of the new additions in 8.3 is
> (Allow col IS NULL to use an index (Teodor)).
>
> Sorry, if I am missing something here, but shouldn't something like this
> allow us to get a (fast) accurate count ?
>
> SELECT COUNT(*) from table WHERE indexed_field IS NULL
>  +
> SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL

It really depends on the distribution of the null / not nulls in the
table.  If it's 50/50 there's no advantage to using the index, as you
still have to check visibility info in the table itself.

OTOH, if NULL (or conversely NOT NULL) rows are rare, then yes, the index
can help.  I.e. if 1% of the tuples are null, the select count(*) from
table where field is null can use the index efficiently.
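
The easy way to see which side of that tradeoff a particular table falls on is
to look at the plan.  A minimal sketch ('orders' and 'shipped_at' are invented
names, not anything from this thread):

    -- On 8.3 a plain btree index can satisfy an IS NULL condition, so when
    -- NULLs are rare the planner may pick an index scan; EXPLAIN ANALYZE
    -- shows which plan it actually chose and how long it took.
    CREATE INDEX orders_shipped_at_idx ON orders (shipped_at);
    ANALYZE orders;
    EXPLAIN ANALYZE SELECT count(*) FROM orders WHERE shipped_at IS NULL;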

Re: count * performance issue

From
Albert Cervera Areny
Date:
On Tuesday, 11 March 2008 at 04:11, Scott Marlowe wrote:
> On Mon, Mar 10, 2008 at 7:57 PM, Robins Tharakan <tharakan@gmail.com> wrote:
> > Hi,
> >
> > I have been reading this conversation for a few days now and I just
> > wanted to ask this. From the release notes, one of the new additions in
> > 8.3 is (Allow col IS NULL to use an index (Teodor)).
> >
> > Sorry, if I am missing something here, but shouldn't something like this
> > allow us to get a (fast) accurate count ?
> >
> > SELECT COUNT(*) from table WHERE indexed_field IS NULL
> >  +
> > SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL
>
> It really depends on the distribution of the null / not nulls in the
> table.  If it's 50/50 there's no advantage to using the index, as you
> still have to check visibility info in the table itself.
>
> OTOH, if NULL (or converserly not null) are rare, then yes, the index
> can help.  I.e. if 1% of the tuples are null, the select count(*) from
> table where field is null can use the index efficiently.

But you'll get a sequential scan for the NOT NULL case, which will end up
taking more time.  (Seq Scan + Index Scan > Seq Scan)

--
Albert Cervera Areny
Dept. Informàtica Sedifa, S.L.

Av. Can Bordoll, 149
08202 - Sabadell (Barcelona)
Tel. 93 715 51 11
Fax. 93 715 51 12


Re: count * performance issue

From
Bill Moran
Date:
In response to "Robins Tharakan" <tharakan@gmail.com>:

> Hi,
>
> I have been reading this conversation for a few days now and I just wanted
> to ask this. From the release notes, one of the new additions in 8.3 is
> (Allow col IS NULL to use an index (Teodor)).
>
> Sorry, if I am missing something here, but shouldn't something like this
> allow us to get a (fast) accurate count ?
>
> SELECT COUNT(*) from table WHERE indexed_field IS NULL
> +
> SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL

For certain, qualified definitions of "fast", sure.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran@collaborativefusion.com
Phone: 412-422-3463x4023


Re: count * performance issue

From
Matthew
Date:
On Tue, 11 Mar 2008, Bill Moran wrote:

> In response to "Robins Tharakan" <tharakan@gmail.com>:
>> Sorry, if I am missing something here, but shouldn't something like this
>> allow us to get a (fast) accurate count ?
>>
>> SELECT COUNT(*) from table WHERE indexed_field IS NULL
>> +
>> SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL
>
> For certain, qualified definitions of "fast", sure.

And certain, qualified definitions of "accurate" as well. Race condition?

Matthew

--
"Television is a medium because it is neither rare nor well done."
  -- Fred Friendly

Re: count * performance issue

From
Tino Wildenhain
Date:
Hi,

Matthew wrote:
> On Tue, 11 Mar 2008, Bill Moran wrote:
>
>> In response to "Robins Tharakan" <tharakan@gmail.com>:
>>> Sorry, if I am missing something here, but shouldn't something like this
>>> allow us to get a (fast) accurate count ?
>>>
>>> SELECT COUNT(*) from table WHERE indexed_field IS NULL
>>> +
>>> SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL
>>
>> For certain, qualified definitions of "fast", sure.
>
> And certain, qualified definitions of "accurate" as well. Race condition?

You mean in a three-state-logic? null, not null and something different?

;-)

Tino

Re: count * performance issue

From
Matthew
Date:
On Tue, 11 Mar 2008, Tino Wildenhain wrote:
>> And certain, qualified definitions of "accurate" as well. Race condition?
>
> You mean in a three-state-logic? null, not null and something different?

True, False, and FILE_NOT_FOUND.

No, actually I was referring to a race condition. So, you find the count
of rows with IS NULL, then someone changes a row, then you find the count
of rows with IS NOT NULL. Add the two together, and there may be rows that
were counted twice, or not at all.

Matthew

--
It's one of those irregular verbs - "I have an independent mind," "You are
an eccentric," "He is round the twist."
                                      -- Bernard Woolly, Yes Prime Minister

Re: count * performance issue

From
"Heikki Linnakangas"
Date:
Matthew wrote:
> No, actually I was referring to a race condition. So, you find the count
> of rows with IS NULL, then someone changes a row, then you find the
> count of rows with IS NOT NULL. Add the two together, and there may be
> rows that were counted twice, or not at all.

Not a problem if you use a serializable transaction, or if you do

SELECT COUNT(*) from table WHERE indexed_field IS NULL
UNION ALL
SELECT COUNT(*) from table WHERE indexed_field IS NOT NULL

as one statement.
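
The serializable variant is equally short; a sketch with placeholder names
(some_table / indexed_field stand in for the real table and column):

    -- Under SERIALIZABLE both statements run against the same snapshot, so a
    -- concurrent update cannot make a row be counted twice or not at all.
    BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    SELECT count(*) FROM some_table WHERE indexed_field IS NULL;
    SELECT count(*) FROM some_table WHERE indexed_field IS NOT NULL;
    COMMIT;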

However, this makes no sense whatsoever, as both index scans (assuming
the planner even chooses an index scan for them, which seems highly
unlikely) still have to visit each tuple in the heap.  It's always going
to be slower than a single "SELECT COUNT(*) FROM table" with a seq scan.

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: count * performance issue

From
Andrew Sullivan
Date:
On Tue, Mar 11, 2008 at 02:19:09PM +0000, Matthew wrote:
> of rows with IS NULL, then someone changes a row, then you find the count
> of rows with IS NOT NULL. Add the two together, and there may be rows that
> were counted twice, or not at all.

Only if you count in READ COMMITTED.

A


Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Craig James
Date:
I just received a new server and thought benchmarks would be interesting.  I think this looks pretty good, but maybe
there are some suggestions about the configuration file.  This is a web app, a mix of read/write, where writes tend to
be "insert into ... (select ...)" where the resulting insert is on the order of 100 to 10K rows of two integers.  An
external process also uses a LOT of CPU power along with each query.

Thanks,
Craig


Configuration:
  Dell 2950
  8 CPU (Intel 2GHz Xeon)
  8 GB memory
  Dell Perc 6i with battery-backed cache
  RAID 10 of 8x 146GB SAS 10K 2.5" disks

Everything (OS, WAL and databases) is on the one RAID array.

Diffs from original configuration:

max_connections = 1000
shared_buffers = 400MB
work_mem = 256MB
max_fsm_pages = 1000000
max_fsm_relations = 5000
wal_buffers = 256kB
effective_cache_size = 4GB

Bonnie output (slightly reformatted)

------------------------------------------------------------------------------

Delete files in random order...done.
Version  1.03
         ------Sequential Output------       --Sequential Input-      --Random-
      -Per Chr-   --Block--    -Rewrite-     -Per Chr-   --Block--    --Seeks--
 Size K/sec %CP   K/sec  %CP   K/sec  %CP    K/sec %CP   K/sec  %CP    /sec %CP
  16G 64205  99   234252  38   112924  26    65275  98   293852  24   940.3   1

         ------Sequential Create------    --------Random Create--------
      -Create--   --Read---   -Delete--   -Create--   --Read---   -Delete--
files  /sec %CP    /sec %CP    /sec %CP    /sec %CP    /sec %CP    /sec %CP
   16 12203  95   +++++ +++   19469  94   12297  95   +++++ +++   15578  82


www.xxx.com,16G,64205,99,234252,38,112924,26,65275,98,293852,24,940.3,1,16,12203,95,+++++,+++,19469,94,12297,95,+++++,+++,15578,82

------------------------------------------------------------------------------

$ pgbench -c 10 -t 10000 -v test -U test
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 10
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
tps = 2786.377933 (including connections establishing)
tps = 2787.888209 (excluding connections establishing)



Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
"Scott Marlowe"
Date:
On Wed, Mar 12, 2008 at 9:55 PM, Craig James <craig_james@emolecules.com> wrote:
> I just received a new server and thought benchmarks would be interesting.  I think this looks pretty good, but maybe
> there are some suggestions about the configuration file.  This is a web app, a mix of read/write, where writes tend to
> be "insert into ... (select ...)" where the resulting insert is on the order of 100 to 10K rows of two integers.  An
> external process also uses a LOT of CPU power along with each query.

Have you been doing each insert individually, or as part of a
larger transaction?  Wrapping a few thousand up in a begin;end; pair
can really help.  You can reasonably wrap 100k or more inserts into a
single transaction; if any one insert fails, the whole insert sequence
fails.
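
For illustration only (the table name is invented), the batched form looks
like this:

    -- The whole batch commits (and fsyncs) once instead of once per row.
    -- If any statement errors, the entire transaction rolls back.
    BEGIN;
    INSERT INTO pairs (a, b) VALUES (1, 2);
    INSERT INTO pairs (a, b) VALUES (3, 4);
    -- ... a few thousand more rows ...
    COMMIT;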

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
"Joshua D. Drake"
Date:

On Wed, 12 Mar 2008 21:55:18 -0700
Craig James <craig_james@emolecules.com> wrote:


> Diffs from original configuration:
> 
> max_connections = 1000
> shared_buffers = 400MB
> work_mem = 256MB
> max_fsm_pages = 1000000
> max_fsm_relations = 5000
> wal_buffers = 256kB
> effective_cache_size = 4GB

I didn't see which OS but I assume linux. I didn't see postgresql so I
assume 8.3.

wal_sync_method = open_sync
checkpoint_segments = 30
shared_buffers = 2000MB
synchronous_commit = off

Try again.

Thanks this is useful stuff!

Joshua D. Drake


> 


--
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
      PostgreSQL political pundit | Mocker of Dolphins


Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
"Justin Graf"
Date:
I just got a new server from Dell two weeks ago as well;
went with more memory, a slower CPU, and smaller hard drives.
Have not run pgbench yet.

Dell PE 2950 III
2 Quad Core 1.866 Ghz
16 gigs of ram.
8 hard drives 73Gig 10k RPM SAS
2 drives in Mirrored for OS, Binaries, and WAL
6 in a raid 10
Dual Gig Ethernet
OS Ubuntu 7.10
-----------------------------------------------

Version 1.03
                     ------Sequential Output------ --Sequential Input- --Random-
                       -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size        K/sec %CP  K/sec   %CP    K/sec %CP   K/sec %CP K/sec    %CP /sec %CP
PriData 70000M  51030   90   107488    29      50666 10     38464 65     102931     9    268.2 0

------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
PriData,70000M,51030,90,107488,29,50666,10,38464,65,102931,9,268.2,0,16,
+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++


The difference in our results is interesting.

What are the settings on the RAID card?  I have the cache turned on with Read Ahead.


---- Message from Craig James <craig_james@emolecules.com> at 03-12-2008 09:55:18 PM ------
I just received a new server and thought benchmarks would be interesting.  I think this looks pretty good, but maybe there are some suggestions about the configuration file.  This is a web app, a mix of read/write, where writes tend to be "insert into ... (select ...)" where the resulting insert is on the order of 100 to 10K rows of two integers.  An external process also uses a LOT of CPU power along with each query.

Thanks,
Craig


Configuration:
  Dell 2950
  8 CPU (Intel 2GHz Xeon)
  8 GB memory
  Dell Perc 6i with battery-backed cache
  RAID 10 of 8x 146GB SAS 10K 2.5" disks

Everything (OS, WAL and databases) are on the one RAID array.

Diffs from original configuration:

max_connections = 1000
shared_buffers = 400MB
work_mem = 256MB
max_fsm_pages = 1000000
max_fsm_relations = 5000
wal_buffers = 256kB
effective_cache_size = 4GB

Bonnie output (slightly reformatted)

------------------------------------------------------------------------------

Delete files in random order...done.
Version  1.03
         ------Sequential Output------       --Sequential Input-      --Random-
      -Per Chr-   --Block--    -Rewrite-     -Per Chr-   --Block--    --Seeks--
 Size K/sec %CP   K/sec  %CP   K/sec  %CP    K/sec %CP   K/sec  %CP    /sec %CP
  16G 64205  99   234252  38   112924  26    65275  98   293852  24   940.3   1

         ------Sequential Create------    --------Random Create--------
      -Create--   --Read---   -Delete--   -Create--   --Read---   -Delete--
files  /sec %CP    /sec %CP    /sec %CP    /sec %CP    /sec %CP    /sec %CP
   16 12203  95   +++++ +++   19469  94   12297  95   +++++ +++   15578  82

www.xxx.com,16G,64205,99,234252,38,112924,26,65275,98,293852,24,940.3,1,16,12203,95,+++++,+++,19469,94,12297,95,+++++,+++,15578,82

------------------------------------------------------------------------------

$ pgbench -c 10 -t 10000 -v test -U test
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 10
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
tps = 2786.377933 (including connections establishing)
tps = 2787.888209 (excluding connections establishing)





Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Doug Knight
Date:
All,
I am in the process of specing out a purchase for our production systems, and am looking at the Dell 2950s as well. I am very interested to see where this thread goes, and what combinations work with different application loading types. Our systems will have one pair of heartbeat-controlled, drbd mirrored servers running postgresql 8.3, with a more write intensive, multiple writers and few readers application. The other similarly configured pair will have lots of readers and few writers. Our initial plan is RAID 10 for the database (four 300GB 15K drives in an attached MD1000 box) and RAID 1 for the OS (pair of 73GB drives internal to the 2950). PERC 6i for the internal drives (256MB battery backed cache), PERC 6E for the external drives (512MB battery backed cache). 8GB RAM, also dual Gig NICs for internet and heartbeat/drbd. Not sure which processor we're going with, or if 8GB memory will be enough. Keep the benchmarks coming.

Doug

On Thu, 2008-03-13 at 04:11 -0400, Justin Graf wrote:
I recent just got a new server also from dell 2 weeks ago
went with more memory slower CPU, and smaller harddrives
 have not run pgbench

Dell PE 2950 III
2 Quad Core 1.866 Ghz
16 gigs of ram.
8 hard drives 73Gig 10k RPM SAS
2 drives in Mirrored for OS, Binaries, and WAL
6 in a raid 10
Dual Gig Ethernet
OS Ubuntu 7.10
-----------------------------------------------

Version 1.03
                     ------Sequential Output------ --Sequential Input- --Random-
                       -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size        K/sec %CP  K/sec   %CP    K/sec %CP   K/sec %CP K/sec    %CP /sec %CP
PriData 70000M  51030   90   107488    29      50666 10     38464 65     102931     9    268.2 0

------Sequential Create------ --------Random Create--------
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
PriData,70000M,51030,90,107488,29,50666,10,38464,65,102931,9,268.2,0,16,
+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++

the difference in our results are interesting.

What are the setting on the RAID card .  I have the cache turned on with Read Ahead 




Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Will Weaver
Date:
Justin,

This may be a bit out of context, but did you run into any troubles getting your Perc6i RAID controller to work under Ubuntu 7.10? I've heard there were issues with that.

Thanks,
Will


On Mar 13, 2008, at 3:11 AM, Justin Graf wrote:

I recent just got a new server also from dell 2 weeks ago
went with more memory slower CPU, and smaller harddrives
 have not run pgbench 

Dell PE 2950 III 
2 Quad Core 1.866 Ghz 
16 gigs of ram. 
8 hard drives 73Gig 10k RPM SAS 
2 drives in Mirrored for OS, Binaries, and WAL 
6 in a raid 10 
Dual Gig Ethernet 
OS Ubuntu 7.10
-----------------------------------------------

Version 1.03 
                     ------Sequential Output------ --Sequential Input- --Random- 
                       -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- 
Machine Size        K/sec %CP  K/sec   %CP    K/sec %CP   K/sec %CP K/sec    %CP /sec %CP 
PriData 70000M  51030   90   107488    29      50666 10     38464 65     102931     9    268.2 0 

------Sequential Create------ --------Random Create-------- 
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- 
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 
16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ 
PriData,70000M,51030,90,107488,29,50666,10,38464,65,102931,9,268.2,0,16, 
+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++


the difference in our results are interesting.

What are the setting on the RAID card .  I have the cache turned on with Read Ahead  





Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
"Justin Graf"
Date:
I did not run into a single install problem.  I read a thread about people having problems, but that thread is over a year old now.

I used the 7.10 Gutsy amd64 server version.

I then installed the GNOME desktop because it's not installed by default.  "I'm a Windows admin, I have to have my GUI."

Then installed Postgres 8.3 for Gutsy.

It took about 3 hours to get the server set up.


---- Message from Will Weaver <will@myemma.com> at 03-13-2008 08:11:06 AM ------
Justin,

This may be a bit out of context, but did you run into any troubles getting your Perc6i RAID controller to work under Ubuntu 7.10? I've heard there were issues with that.

Thanks,
Will


On Mar 13, 2008, at 3:11 AM, Justin Graf wrote:

I recent just got a new server also from dell 2 weeks ago
went with more memory slower CPU, and smaller harddrives
 have not run pgbench 

Dell PE 2950 III 
2 Quad Core 1.866 Ghz 
16 gigs of ram. 
8 hard drives 73Gig 10k RPM SAS 
2 drives in Mirrored for OS, Binaries, and WAL 
6 in a raid 10 
Dual Gig Ethernet 
OS Ubuntu 7.10
-----------------------------------------------

Version 1.03 
                     ------Sequential Output------ --Sequential Input- --Random- 
                       -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- 
Machine Size        K/sec %CP  K/sec   %CP    K/sec %CP   K/sec %CP K/sec    %CP /sec %CP 
PriData 70000M  51030   90   107488    29      50666 10     38464 65     102931     9    268.2 0 

------Sequential Create------ --------Random Create-------- 
-Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- 
files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 
16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ 
PriData,70000M,51030,90,107488,29,50666,10,38464,65,102931,9,268.2,0,16, 
+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++


the difference in our results are interesting.

What are the setting on the RAID card .  I have the cache turned on with Read Ahead  







Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Craig James
Date:
Justin Graf wrote:
> I recent just got a new server also from dell 2 weeks ago
> went with more memory slower CPU, and smaller harddrives
>  have not run pgbench
>
> Dell PE 2950 III
> 2 Quad Core 1.866 Ghz
> 16 gigs of ram.
> 8 hard drives 73Gig 10k RPM SAS
> 2 drives in Mirrored for OS, Binaries, and WAL
> 6 in a raid 10
> Dual Gig Ethernet
> OS Ubuntu 7.10
> -----------------------------------------------
>
> Version 1.03
>                      ------Sequential Output------ --Sequential Input-
> --Random-
>                        -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
> --Seeks--
> Machine Size        K/sec %CP  K/sec   %CP    K/sec %CP   K/sec %CP
> K/sec    %CP /sec %CP
> PriData 70000M  51030   90   107488    29      50666 10     38464 65
> 102931     9    268.2 0
>
> ------Sequential Create------ --------Random Create--------
> -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
> files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
> 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++
> PriData,70000M,51030,90,107488,29,50666,10,38464,65,102931,9,268.2,0,16,
> +++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++,+++++,+++
>
> the difference in our results are interesting.
>
> What are the setting on the RAID card .  I have the cache turned on with
> Read Ahead

First, did you get the Perc 6i with battery-backed cache?  Not all versions have it.  I found this really confusing
when trying to order -- we had to work pretty hard to figure out exactly what to order to be sure we got this feature.
(Does anyone at Dell follow these discussions?)

Second, Dell ships a Linux driver with this hardware, and we installed it.  I have no idea what the driver does,
because I think you can run the system without it, but my guess is that without the Dell driver, it's using the Perc6
card in some "normal" mode that doesn't take advantage of its capabilities.

With a 6-disk RAID 10, you should get numbers at least in the same ballpark as my numbers.

Craig


Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Craig James
Date:
Doug Knight wrote:
> All,
> I am in the process of specing out a purchase for our production
> systems, and am looking at the Dell 2950s as well. I am very interested
> to see where this thread goes, and what combinations work with different
> application loading types. Our systems will have one pair of
> heartbeat-controlled, drbd mirrored servers running postgresql 8.3, with
> a more write intensive, multiple writers and few readers application.
> The other similarly configured pair will have lots of readers and few
> writers. Our initial plan is RAID 10 for the database (four 300GB 15K
> drives in an attached MD1000 box) and RAID 1 for the OS (pair of 73GB
> drives internal to the 2950). PERC 6i for the internal drives (256MB
> battery backed cache), PERC 6E for the external drives (512MB battery
> backed cache). 8GB RAM, also dual Gig NICs for internet and
> heartbeat/drbd. Not sure which processor we're going with, or if 8GB
> memory will be enough. Keep the benchmarks coming.

We considered this configuration too.  But in the end, we decided that by going with the 146 GB 2.5" drives, we could
get 8 disks in the main box, and save the cost of the MD1000, which almost doubles the price of the system.  We end up
with a 546 GB RAID assembly, more than enough for our needs.

I think that 8 10K disks in a RAID 10 will be faster than 4 15K disks, and you only gain a little space (two 300GB
versus four 146GB).  So it seemed like we'd be paying more and getting less.  With the battery-backed Perc 6i RAID, the
advice seemed to be that the OS, WAL and Database could all share the disk without conflict, and I think the numbers
will back that up.  We're not in production yet, so only time will tell.

Craig



Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Craig James
Date:
Joshua D. Drake wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Wed, 12 Mar 2008 21:55:18 -0700
> Craig James <craig_james@emolecules.com> wrote:
>
>
>> Diffs from original configuration:
>>
>> max_connections = 1000
>> shared_buffers = 400MB
>> work_mem = 256MB
>> max_fsm_pages = 1000000
>> max_fsm_relations = 5000
>> wal_buffers = 256kB
>> effective_cache_size = 4GB
>
> I didn't see which OS but I assume linux. I didn't see postgresql so I
> assume 8.3.

Right on both counts.

> wal_sync_method = open_sync
> checkpoint_segments = 30
> shared_buffers = 2000MB
> synchronous_commit = off
>
> Try again.

Nice improvement!  About 25% increase in TPS:

$ pgbench -c 10 -t 10000 -v test -U test
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 10
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
tps = 3423.636423 (including connections establishing)
tps = 3425.957521 (excluding connections establishing)

For reference, here are the results before your suggested changes:

$ pgbench -c 10 -t 10000 -v test -U test
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 10
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
tps = 2786.377933 (including connections establishing)
tps = 2787.888209 (excluding connections establishing)

Thanks!
Craig

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Greg Smith
Date:
On Thu, 13 Mar 2008, Craig James wrote:

>> wal_sync_method = open_sync

There was a bug report I haven't had a chance to investigate yet that
suggested some recent Linux versions have issues when using open_sync.
I'd suggest popping that back to the default for now unless you have time
to really do a long certification process that your system runs reliably
with it turned on.

I suspect most of the improvement you saw from Joshua's recommendations
was from raising checkpoint_segments.

> $ pgbench -c 10 -t 10000 -v test -U test
> scaling factor: 1
> number of clients: 10

A scaling factor of 1 means you are operating on a positively trivial 16MB
database.  It also means there's exactly one entry in a table that every
client updates on every transaction.  You have 10 clients, and they're
all fighting over access to it.

If you actually want something that approaches useful numbers here, you
need to at least run 'pgbench -i -s 10' to get a scaling factor of 10 and a
160MB database.  Interesting results on this class of hardware are when
you set scaling to 100 or more (100=1.6GB database).  See
http://www.westnet.com/~gsmith/content/postgresql/pgbench-scaling.htm for
some examples of how that works, from a less powerful system than yours.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
"justin"
Date:
Absolutely on the battery backup.

I did not load the Linux drivers from Dell; it works, so I figured I was not
going to worry about it.  This server is so oversized for its load it's
unreal.  I have always gone way overboard on server specs and making sure
it's redundant.

The difference in our bonnie++ numbers is interesting.  In some cases my
setup blows by yours, and in others yours destroys mine.

On the RAID setup: I'm a Windows guy, so I set it up like a Windows machine,
keeping the OS/logs well separate from the data.

I chose to use ext3 on these partitions.


---- Message from Craig James <craig_james@emolecules.com> at 03-13-2008
07:29:23 AM ------

  First, did you get the Perc 6i with battery-backed cache?  Not all
versions have it.  I found this really confusing when trying to order -- we
had to work pretty hard to figure out exactly what to order to be sure we
got this feature.  (Does anyone at Dell follow these discussions?)

  Second, Dell ships a Linux driver with this hardware, and we installed it.
I have no idea what the driver does, because I think you can run the system
without it, but my guess is that without the Dell driver, it's using the
Perc6 card in some "normal" mode that doesn't take advantage of its
capabilities.

  With a 6-disk RAID 10, you should get numbers at least in the same
ballpark as my numbers.

  Craig






Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
"Joshua D. Drake"
Date:

On Thu, 13 Mar 2008 12:01:50 -0400 (EDT)
Greg Smith <gsmith@gregsmith.com> wrote:

> On Thu, 13 Mar 2008, Craig James wrote:
> 
> >> wal_sync_method = open_sync
> 
> There was a bug report I haven't had a chance to investigate yet that 
> suggested some recent Linux versions have issues when using
> open_sync. I'd suggest popping that back to the default for now
> unless you have time to really do a long certification process that
> your system runs reliably with it turned on.

Well, the default would be ugly; that's fsync.  fdatasync is probably a
better choice in that case.

Sincerely,

Joshua D. Drake


--
The PostgreSQL Company since 1997: http://www.commandprompt.com/ 
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
      PostgreSQL political pundit | Mocker of Dolphins


Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Greg Smith
Date:
On Thu, 13 Mar 2008, Joshua D. Drake wrote:

> Greg Smith <gsmith@gregsmith.com> wrote:
>>>> wal_sync_method = open_sync
>>
>> There was a bug report I haven't had a chance to investigate yet that
>> suggested some recent Linux versions have issues when using
>> open_sync. I'd suggest popping that back to the default for now
>> unless you have time to really do a long certification process that
>> your system runs reliably with it turned on.
>
> Well the default would be ugly, that's fsync, fdatasync is probably a
> better choice in that case.

I haven't found fdatasync to be significantly better in my tests on Linux
but I never went out of my way to try and quantify it.  My understanding
is that some of the write barrier implementation details on ext3
filesystems make any sync call a relatively heavy operation but I haven't
poked at the code yet to figure out why.

There are really some substantial gains for WAL-heavy loads under Linux
just waiting for someone to dig into this more.  For example, I have a
little plan sitting here to allow opening the WAL files with noatime even
if the rest of the filesystem can't be mounted that way, which would
collapse one of the big reasons to use a separate WAL disk.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
"justin"
Date:
----- Original Message -----
From: "Greg Smith" <gsmith@gregsmith.com>
To: <pgsql-performance@postgresql.org>
Sent: Thursday, March 13, 2008 4:27 PM
Subject: Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10


> On Thu, 13 Mar 2008, Joshua D. Drake wrote:
>
>> Greg Smith <gsmith@gregsmith.com> wrote:
>>>>> wal_sync_method = open_sync
>>>
>>> There was a bug report I haven't had a chance to investigate yet that
>>> suggested some recent Linux versions have issues when using
>>> open_sync. I'd suggest popping that back to the default for now
>>> unless you have time to really do a long certification process that
>>> your system runs reliably with it turned on.
>>
>> Well the default would be ugly, that's fsync, fdatasync is probably a
>> better choice in that case.
>
> I haven't found fdatasync to be significantly better in my tests on Linux
> but I never went out of my way to try and quantify it.  My understanding
> is that some of the write barrier implementation details on ext3
> filesystems make any sync call a relatively heavy operation but I haven't
> poked at the code yet to figure out why.
>
> There are really some substantial gains for WAL-heavy loads under Linux
> just waiting for someone to dig into this more.  For example, I have a
> little plan sitting here to allow opening the WAL files with noatime even
> if the rest of the filesystem can't be mounted that way, which would
> collapse one of the big reasons to use a separate WAL disk.
>
> --
> * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
>

I ran pgbench from my laptop against the new server.

My laptop is dual core with 2 gigs of RAM and a 1 gig Ethernet connection to
the server, so I don't think the network is going to be a problem in the test.

When I look at the server memory it's only consuming 463 megs.  I have
effective_cache_size set at 12 gigs, shared_buffers at 100 megs, and work_mem
set to 50 megs.

transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 1
number of transactions per client: 10
number of transactions actually processed: 10/10
tps = 20.618557 (including connections establishing)
tps = 20.618557 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 10
number of transactions actually processed: 100/100
tps = 18.231541 (including connections establishing)
tps = 18.231541 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 100
number of transactions actually processed: 1000/1000
tps = 19.116073 (including connections establishing)
tps = 19.116073 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 40
number of transactions per client: 1000
number of transactions actually processed: 40000/40000
tps = 20.368217 (including connections establishing)
tps = 20.368217 (excluding connections establishing)


Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
"Scott Marlowe"
Date:
On Thu, Mar 13, 2008 at 4:53 PM, justin <justin@emproshunts.com> wrote:
>
>  I'm ran pgbench from my laptop to the new server
>
>  My laptop is dual core with 2 gigs of ram and 1 gig enthernet connection to
>  server.   so i don't think the network is going to be a problem in the test.
>
>  When i look at the server memory its only consuming  463 megs.   I have the
>  effective cache set at 12 gigs and sharebuffer at 100megs and work mem set
>  to 50megs

You do know that effective_cache_size is the size of the OS level
cache.  i.e. it won't show up in postgresql's memory usage.  On a
machine with (I assume) 12 or more gigs of memory, you should have
your shared_buffers set to a much higher number than 100Meg.  (unless
you're still running 7.4 but that's another story.)

pgbench will never use 50 megs of work_mem, as it's transactional and
hitting single rows at a time, not sorting huge lists of rows.  Having
PostgreSQL use up all the memory is NOT necessarily your best bet.
Letting the OS cache your data is quite likely a good choice here, so
I'd keep your shared_buffers in the 500M to 2G range.
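
For reference, the running values of those settings can be checked from psql
before touching postgresql.conf; a minimal sketch using the standard parameter
names (shared_buffers requires a server restart to change, the others do not):

    SHOW shared_buffers;
    SHOW effective_cache_size;
    SHOW work_mem;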

>  transaction type: TPC-B (sort of)
>  scaling factor: 100
>  number of clients: 1
>
> number of transactions per client: 10
>  number of transactions actually processed: 10/10
>  tps = 20.618557 (including connections establishing)
>  tps = 20.618557 (excluding connections establishing)
>
>
>  transaction type: TPC-B (sort of)
>  scaling factor: 100
>
> number of clients: 10
>  number of transactions per client: 10
>  number of transactions actually processed: 100/100
>  tps = 18.231541 (including connections establishing)
>  tps = 18.231541 (excluding connections establishing)
>
>
>  transaction type: TPC-B (sort of)
>  scaling factor: 100
>
> number of clients: 10
>  number of transactions per client: 100
>  number of transactions actually processed: 1000/1000
>  tps = 19.116073 (including connections establishing)
>  tps = 19.116073 (excluding connections establishing)
>
>
>  transaction type: TPC-B (sort of)
>  scaling factor: 100
>
> number of clients: 40
>  number of transactions per client: 1000
>  number of transactions actually processed: 40000/40000
>  tps = 20.368217 (including connections establishing)
>  tps = 20.368217 (excluding connections establishing)

Those numbers are abysmal.  I had a P-III-750 5 years ago that ran
well into the hundreds on  a large scaling factor (1000 or so) pgbench
db with 100 or more concurrent connections all the way down to 10
threads.  I.e. it never dropped below 200 or so during the testing.
This was with a Perc3 series LSI controller with LSI firmware and the
megaraid 2.0.x driver, which I believe is the basis for the current
LSI drivers today.

A few points.  10 or 100 total transactions is far too few
transactions to really get a good number.  1000 is about the minimum
to run to get a good average, and running 10000 or so is about the
minimum I shoot for.  So your later tests are likely to be less noisy.
 They're all way too slow for a modern server, and point to
non-optimal hardware.  An untuned pgsql database should be able to get
to or over 100 tps.  I had a sparc-20 that could do 80 or so.

Do you know if you're I/O bound or CPU bound?

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
"Scott Marlowe"
Date:
On Thu, Mar 13, 2008 at 3:09 PM, justin <justin@emproshunts.com> wrote:

>  I chose to use ext3 on these partition

You should really consider another file system.  ext3 has two flaws
that mean I can't really use it properly: a 2TB file system size
limit (at least on the servers I've tested), and the fact that it locks the whole
file system while deleting large files, which can take several seconds
and stop ANYTHING from happening during that time.  This means that
dropping or truncating large tables in the middle of the day could
halt your database for seconds at a time.  This one misfeature means
that ext2/3 are unsuitable for running under a database.
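
(A crude way to reproduce the delete stall, assuming a scratch directory on
the ext3 volume and roughly 20 GB of free space; the file size is only
illustrative:)

dd if=/dev/zero of=/path/on/ext3/bigfile bs=1M count=20000
time rm /path/on/ext3/bigfile    # on ext3 the unlink itself can take seconds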

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
"Scott Marlowe"
Date:
On Fri, Mar 14, 2008 at 12:17 AM, Jesper Krogh <jesper@krogh.cc> wrote:
>
> Scott Marlowe wrote:
>  > On Thu, Mar 13, 2008 at 3:09 PM, justin <justin@emproshunts.com> wrote:
>  >
>  >>  I chose to use ext3 on these partition
>  >
>  > You should really consider another file system.  ext3 has two flaws
>  > that mean I can't really use it properly.  A 2TB file system size
>  > limit (at least on the servers I've tested) and it locks the whole
>  > file system while deleting large files, which can take several seconds
>  > and stop ANYTHING from happening during that time.  This means that
>  > dropping or truncating large tables in the middle of the day could
>  > halt your database for seconds at a time.  This one misfeature means
>  > that ext2/3 are unsuitable for running under a database.
>
>  I cannot acknowledge or deny the last one, but the first one is not
>  true. I have several volumes in the 4TB+ range on ext3 performing nicely.
>
>  I can test the "large file stuff", but how large? .. several GB is not a
>  problem here.

Is this on a 64 bit or 32 bit machine?  We had the problem with a 32
bit linux box (not sure what flavor) just a few months ago.  I would
not create a filesystem on a partition of 2+TB

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
"Scott Marlowe"
Date:
On Fri, Mar 14, 2008 at 12:19 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>
> On Fri, Mar 14, 2008 at 12:17 AM, Jesper Krogh <jesper@krogh.cc> wrote:
>  >
>  > Scott Marlowe wrote:
>  >  > On Thu, Mar 13, 2008 at 3:09 PM, justin <justin@emproshunts.com> wrote:
>  >  >
>  >  >>  I chose to use ext3 on these partition
>  >  >
>  >  > You should really consider another file system.  ext3 has two flaws
>  >  > that mean I can't really use it properly.  A 2TB file system size
>  >  > limit (at least on the servers I've tested) and it locks the whole
>  >  > file system while deleting large files, which can take several seconds
>  >  > and stop ANYTHING from happening during that time.  This means that
>  >  > dropping or truncating large tables in the middle of the day could
>  >  > halt your database for seconds at a time.  This one misfeature means
>  >  > that ext2/3 are unsuitable for running under a database.
>  >
>  >  I cannot acknowledge or deny the last one, but the first one is not
>  >  true. I have several volumes in the 4TB+ range on ext3 performing nicely.
>  >
>  >  I can test the "large file stuff", but how large? .. several GB is not a
>  >  problem here.
>
>  Is this on a 64 bit or 32 bit machine?  We had the problem with a 32
>  bit linux box (not sure what flavor) just a few months ago.  I would
>  not create a filesystem on a partition of 2+TB
>

OK, according to this it's 16TiB:
http://en.wikipedia.org/wiki/Ext2

so I'm not sure what problem we were having.  It was a friend setting
up the RAID and I'd already told him to use xfs but he really wanted
to use ext3 because he was more familiar with it.

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Jesper Krogh
Date:
Scott Marlowe wrote:
> On Fri, Mar 14, 2008 at 12:17 AM, Jesper Krogh <jesper@krogh.cc> wrote:
>> Scott Marlowe wrote:
>>  > On Thu, Mar 13, 2008 at 3:09 PM, justin <justin@emproshunts.com> wrote:
>>  >
>>  >>  I chose to use ext3 on these partition
>>  >
>>  > You should really consider another file system.  ext3 has two flaws
>>  > that mean I can't really use it properly.  A 2TB file system size
>>  > limit (at least on the servers I've tested) and it locks the whole
>>  > file system while deleting large files, which can take several seconds
>>  > and stop ANYTHING from happening during that time.  This means that
>>  > dropping or truncating large tables in the middle of the day could
>>  > halt your database for seconds at a time.  This one misfeature means
>>  > that ext2/3 are unsuitable for running under a database.
>>
>>  I cannot acknowledge or deny the last one, but the first one is not
>>  true. I have several volumes in the 4TB+ range on ext3 performing nicely.
>>
>>  I can test the "large file stuff", but how large? .. several GB is not a
>>  problem here.
>
> Is this on a 64 bit or 32 bit machine?  We had the problem with a 32
> bit linux box (not sure what flavor) just a few months ago.  I would
> not create a filesystem on a partition of 2+TB

It is on a 64 bit machine, but ext3 doesn't have anything specific in it
as far as I know. I have mounted filesystems created on 32 bit on 64
bit and the other way around. The filesystems are years old.

http://en.wikipedia.org/wiki/Ext3 => Limit seems to be 16TB currently
(It might get down to something lower if you choose a small blocksize).

--
Jesper

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Jesper Krogh
Date:
Scott Marlowe wrote:
> On Thu, Mar 13, 2008 at 3:09 PM, justin <justin@emproshunts.com> wrote:
>
>>  I chose to use ext3 on these partition
>
> You should really consider another file system.  ext3 has two flaws
> that mean I can't really use it properly.  A 2TB file system size
> limit (at least on the servers I've tested) and it locks the whole
> file system while deleting large files, which can take several seconds
> and stop ANYTHING from happening during that time.  This means that
> dropping or truncating large tables in the middle of the day could
> halt your database for seconds at a time.  This one misfeature means
> that ext2/3 are unsuitable for running under a database.

I cannot acknowledge or deny the last one, but the first one is not
true. I have several volumes in the 4TB+ range on ext3 performing nicely.

I can test the "large file stuff", but how large? .. several GB is not a
problem here.

Jesper
--
Jesper


Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Justin
Date:
> Is this on a 64 bit or 32 bit machine?  We had the problem with a 32
> bit linux box (not sure what flavor) just a few months ago.  I would
> not create a filesystem on a partition of 2+TB
>
Yes this machine is 64bit
> You do know that effective_cache_size is the size of the OS level
> cache.  i.e. it won't show up in postgresql's memory usage.  On a
> machine with (I assume) 12 or more gigs or memory, you should have
> your shared_buffers set to a much higher number than 100Meg.  (unless
> you're still running 7.4 but that's another story.)


Sorry for my ignorance of Linux; I'm used to the Windows Task Manager or
Performance Monitor showing all the memory usage.  I decided to move to
Linux on the new server to get 64-bit, so I'm still on the learning curve
with that.
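
(On Linux the file system cache doesn't show up under any one process; a
quick way to see where the memory actually went, using only standard tools:)

free -m    # the "cached" column is the OS page cache holding your data files
top        # the postgres backends will look small; most of the RAM is in that cache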

I played with shared_buffers and never saw much of an improvement from
100 all the way up to 800 megs.  I moved the checkpoint segments from 3 to
30 and still saw no movement in the numbers.

I agree with you, those numbers are terrible.  I realized after posting
that I had the -C option turned on.  If I read the -C option correctly, it
disconnects and reconnects between transactions, so as I read it the -C
option creates the worst case.

The RAID controller is set up to make sure it doesn't lie about fsync.

shared_buffers = 800MB
temp_buffers = 204800
work_mem = 256MB
fsync = on
wal_sync_method = fsync



C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 10 -t 10000 -v -h
192.168.1.9 -U postgres empro
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
tps = 1768.940935 (including connections establishing)
tps = 1783.230500 (excluding connections establishing)


C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 40 -t 10000 -v -h
192.168.1.9 -U
postgres empro
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 40
number of transactions per client: 10000
number of transactions actually processed: 400000/400000
tps = 567.149831 (including connections establishing)
tps = 568.648692 (excluding connections establishing)

--------------now with just Select --------------

C:\Program Files\PostgreSQL\8.3\bin>pgbench -S -c 10 -t 10000 -h
192.168.1.9 -U
postgres empro
Password:
starting vacuum...end.
transaction type: SELECT only
scaling factor: 100
number of clients: 10
number of transactions per client: 10000
number of transactions actually processed: 100000/100000
tps = 16160.310278 (including connections establishing)
tps = 17436.791630 (excluding connections establishing)

C:\Program Files\PostgreSQL\8.3\bin>pgbench -S -c 40 -t 10000 -h
192.168.1.9 -U
postgres empro
Password:
starting vacuum...end.
transaction type: SELECT only
scaling factor: 100
number of clients: 40
number of transactions per client: 10000
number of transactions actually processed: 400000/400000
tps = 18338.529250 (including connections establishing)
tps = 20031.048125 (excluding connections establishing)





Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Greg Smith
Date:
On Fri, 14 Mar 2008, Justin wrote:

> I played with shared_buffer and never saw much of an improvement from
> 100 all the way up to 800 megs  moved the checkpoints from 3 to 30 and
> still never saw no movement in the numbers.

Increasing shared_buffers normally improves performance as the size of the
database goes up, but since the pgbench workload is so simple the
operating system will cache it pretty well even if you don't give the
memory directly to PostgreSQL.  Also, on Windows large settings for
shared_buffers don't work very well; you might as well keep it in the
100MB range.

> wal_sync_method=fsync

You might get a decent boost in the results that write data (not the
SELECT ones) by changing

wal_sync_method = open_datasync

which is the default on Windows.  The way you've got your RAID controller
set up, this is no more or less safe than using fsync.

> i agree with you, those numbers are terrible i realized after posting i
> had the option -C turned on if i read the option -C correctly it is
> disconnecting and reconnecting between transactions. The way read -C
> option creates the worst case.

In addition to being an odd testing mode, there's an outstanding bug in
how -C results are computed that someone submitted a fix for, but it
hasn't been applied yet.  I would suggest forgetting you ever ran that
test.

> number of clients: 10
> number of transactions per client: 10000
> number of transactions actually processed: 100000/100000
> tps = 1768.940935 (including connections establishing)

> number of clients: 40
> number of transactions per client: 10000
> number of transactions actually processed: 400000/400000
> tps = 567.149831 (including connections establishing)
> tps = 568.648692 (excluding connections establishing)

Note how the total number of transactions goes up here, because it's
actually doing clients x requested transactions in total.  The 40 client
case is actually doing 4X as many total operations.  That also means you
can expect 4X as many checkpoints during that run.  It's in a longer run
like this second one that you might see some impact from increasing
checkpoint_segments.

To keep comparisons like this more fair, I like to keep the total
transactions constant and just divide that number by the number of clients
to figure out what to set the -t parameter to.  400000 is a good medium
length test, so for that case you'd get

-c 10 -t 40000
-c 40 -t 10000

as the two to compare.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Justin
Date:


Greg Smith wrote:
> On Fri, 14 Mar 2008, Justin wrote:
>
>> I played with shared_buffers and never saw much of an improvement from
>> 100 all the way up to 800 megs.  I moved the checkpoint segments from 3
>> to 30 and still saw no movement in the numbers.
>
> Increasing shared_buffers normally improves performance as the size of
> the database goes up, but since the pgbench workload is so simple the
> operating system will cache it pretty well even if you don't give the
> memory directly to PostgreSQL.  Also, on Windows large settings for
> shared_buffers don't work very well; you might as well keep it in the
> 100MB range.
>
>> wal_sync_method = fsync
>
> You might get a decent boost in the results that write data (not the
> SELECT ones) by changing
>
> wal_sync_method = open_datasync
>
> which is the default on Windows.  The way you've got your RAID controller
> set up, this is no more or less safe than using fsync.

I moved the Windows server back to fsync a long time ago.  Around here we
are super paranoid about making sure the data does not become corrupt;
performance is secondary.  The new server along with the old server is way
overbuilt for the load it will ever see.  I will be making the old server
a Slony replicator located in the manufacturing building.

Also *note*: I tried setting the value open_datasync and got an invalid
parameter error, so instead I use open_sync.

>> I agree with you, those numbers are terrible.  I realized after posting
>> that I had the -C option turned on.  If I read the -C option correctly,
>> it disconnects and reconnects between transactions, so as I read it the
>> -C option creates the worst case.
>
> In addition to being an odd testing mode, there's an outstanding bug in
> how -C results are computed that someone submitted a fix for, but it
> hasn't been applied yet.  I would suggest forgetting you ever ran that
> test.

Why is the -C option odd?

> Note how the total number of transactions goes up here, because it's
> actually doing clients x requested transactions in total.  The 40 client
> case is actually doing 4X as many total operations.  That also means you
> can expect 4X as many checkpoints during that run.  It's in a longer run
> like this second one that you might see some impact from increasing
> checkpoint_segments.
>
> To keep comparisons like this more fair, I like to keep the total
> transactions constant and just divide that number by the number of
> clients to figure out what to set the -t parameter to.  400000 is a good
> medium length test, so for that case you'd get
>
> -c 10 -t 40000
> -c 40 -t 10000
>
> as the two to compare.

---- retested with fsync turned on -----

C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 10 -t 40000 -v -h 192.168.1.9 -U
postgres empro
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 40000
number of transactions actually processed: 400000/400000
tps = 767.040279 (including connections establishing)
tps = 767.707166 (excluding connections establishing)


C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 40 -t 10000 -v -h 192.168.1.9 -U
postgres empro
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 40
number of transactions per client: 10000
number of transactions actually processed: 400000/400000
tps = 648.988227 (including connections establishing)
tps = 650.935720 (excluding connections establishing)


-------open_sync------------

C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 10 -t 40000 -v -h 192.168.1.9 -U
postgres empro
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 40000
number of transactions actually processed: 400000/400000
tps = 798.030461 (including connections establishing)
tps = 798.752349 (excluding connections establishing)

C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 40 -t 10000 -v -h 192.168.1.9 -U
postgres empro
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 40
number of transactions per client: 10000
number of transactions actually processed: 400000/400000
tps = 613.879195 (including connections establishing)
tps = 615.592023 (excluding connections establishing)

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Michael Stone
Date:
On Thu, Mar 13, 2008 at 05:27:09PM -0400, Greg Smith wrote:
>I haven't found fdatasync to be significantly better in my tests on Linux
>but I never went out of my way to try and quantify it.  My understanding
>is that some of the write barrier implementation details on ext3
>filesystems make any sync call a relatively heavy operation but I haven't
>poked at the code yet to figure out why.

Which is why having the checkpoints on a separate ext2 partition tends
to be a nice win.  (Even if it's not on a separate disk.)

Mike Stone
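
(A minimal sketch of that arrangement, assuming a hypothetical ext2
partition already mounted at /pg_wal and a data directory of
/var/lib/pgsql/data -- both paths are placeholders -- with the server
stopped first:)

pg_ctl -D /var/lib/pgsql/data stop
mv /var/lib/pgsql/data/pg_xlog /pg_wal/pg_xlog
ln -s /pg_wal/pg_xlog /var/lib/pgsql/data/pg_xlog
pg_ctl -D /var/lib/pgsql/data start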

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Justin
Date:
I decided to reformat the RAID 10 into ext2 to see if there was any real
big difference in performance, as some people have noted.  Here are the
test results.

Please note the WAL files are still on the RAID 0 set, which is still in
ext3 file system format.  These tests were run with fsync on as
before.  I made sure everything was the same as with the first test.

As you can see there is a 3 to 3.5 times increase in the performance
numbers just from changing the file system.

With the -S option set there is no change in the performance numbers.

-------First Run 10 clients------
C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 10 -t 40000 -v -h
192.168.1.9 -U
postgres play
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 40000
number of transactions actually processed: 400000/400000
tps = 2108.036891 (including connections establishing)
tps = 2112.902970 (excluding connections establishing)

-----Second Run 10 clients -----
C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 10 -t 40000 -v -h
192.168.1.9 -U
postgres play
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 40000
number of transactions actually processed: 400000/400000
tps = 2316.114949 (including connections establishing)
tps = 2321.990410 (excluding connections establishing)


-----First Run 40 clients --------
C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 40 -t 10000 -v -h
192.168.1.9 -U
postgres play
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 40
number of transactions per client: 10000
number of transactions actually processed: 400000/400000
tps = 2675.585284 (including connections establishing)
tps = 2706.707899 (excluding connections establishing)

---Second Run ----
C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 40 -t 10000 -v -h
192.168.1.9 -U
postgres play
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 40
number of transactions per client: 10000
number of transactions actually processed: 400000/400000
tps = 2600.560421 (including connections establishing)
tps = 2629.952529 (excluding connections establishing)

---- Select Only Option ------
C:\Program Files\PostgreSQL\8.3\bin>pgbench -S -c 10 -t 40000 -v -h
192.168.1.9
-U postgres play
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: SELECT only
scaling factor: 100
number of clients: 10
number of transactions per client: 40000
number of transactions actually processed: 400000/400000
tps = 18181.818182 (including connections establishing)
tps = 18550.294486 (excluding connections establishing)

C:\Program Files\PostgreSQL\8.3\bin>pgbench -S -c 40 -t 10000 -v -h
192.168.1.9
-U postgres play
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: SELECT only
scaling factor: 100
number of clients: 40
number of transactions per client: 10000
number of transactions actually processed: 400000/400000
tps = 18991.548761 (including connections establishing)
tps = 20729.684909 (excluding connections establishing)




Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Dave Cramer
Date:
On 16-Mar-08, at 2:19 AM, Justin wrote:

>
> I decided to reformat the raid 10 into ext2 to see if there was any
> real big difference in performance as some people have noted   here
> is the test results
>
> please note the WAL files are still on the raid 0 set which is still
> in ext3 file system format.  these test where run with the fsync as
> before.   I made sure every thing was the same as with the first test.
>
This is opposite to the way I run things. I use ext2 on the WAL and
ext3 on the data. I'd also suggest RAID 10 on the WAL; it is mostly
write.

Dave


Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Craig James
Date:
Dave Cramer wrote:
>
> On 16-Mar-08, at 2:19 AM, Justin wrote:
>
>>
>> I decided to reformat the raid 10 into ext2 to see if there was any
>> real big difference in performance as some people have noted   here is
>> the test results
>>
>> please note the WAL files are still on the raid 0 set which is still
>> in ext3 file system format.  these test where run with the fsync as
>> before.   I made sure every thing was the same as with the first test.
>>
> This is opposite to the way I run things. I use ext2 on the WAL and ext3
> on the data. I'd also suggest RAID 10 on the WAL it is mostly write.

Just out of curiosity: Last time I did research, the word seemed to be
that xfs was better than ext2 or ext3.  Is that not true?  Why use
ext2/3 at all if xfs is faster for Postgres?

Criag

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Craig James
Date:
Craig James wrote:
> Dave Cramer wrote:
>>
>> On 16-Mar-08, at 2:19 AM, Justin wrote:
>>
>>>
>>> I decided to reformat the raid 10 into ext2 to see if there was any
>>> real big difference in performance as some people have noted   here
>>> is the test results
>>>
>>> please note the WAL files are still on the raid 0 set which is still
>>> in ext3 file system format.  these test where run with the fsync as
>>> before.   I made sure every thing was the same as with the first test.
>>>
>> This is opposite to the way I run things. I use ext2 on the WAL and
>> ext3 on the data. I'd also suggest RAID 10 on the WAL it is mostly write.
>
> Just out of curiosity: Last time I did research, the word seemed to be
> that xfs was better than ext2 or ext3.  Is that not true?  Why use
> ext2/3 at all if xfs is faster for Postgres?
>
> Criag

And let's see if I can write my own name ...

Craig

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Dave Cramer
Date:
On 16-Mar-08, at 3:04 PM, Craig James wrote:

> Dave Cramer wrote:
>> On 16-Mar-08, at 2:19 AM, Justin wrote:
>>>
>>> I decided to reformat the raid 10 into ext2 to see if there was
>>> any real big difference in performance as some people have noted
>>> here is the test results
>>>
>>> please note the WAL files are still on the raid 0 set which is
>>> still in ext3 file system format.  these test where run with the
>>> fsync as before.   I made sure every thing was the same as with
>>> the first test.
>>>
>> This is opposite to the way I run things. I use ext2 on the WAL and
>> ext3 on the data. I'd also suggest RAID 10 on the WAL it is mostly
>> write.
>
> Just out of curiosity: Last time I did research, the word seemed to
> be that xfs was better than ext2 or ext3.  Is that not true?  Why
> use ext2/3 at all if xfs is faster for Postgres?
>
I would like to see the evidence of this. I doubt that it would be
faster than ext2. There is no journaling on ext2.

Dave

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
"Scott Marlowe"
Date:
On Sun, Mar 16, 2008 at 1:36 PM, Dave Cramer <pg@fastcrypt.com> wrote:
>
>  On 16-Mar-08, at 3:04 PM, Craig James wrote:
>  > Just out of curiosity: Last time I did research, the word seemed to
>  > be that xfs was better than ext2 or ext3.  Is that not true?  Why
>  > use ext2/3 at all if xfs is faster for Postgres?
>  >
>  I would like to see the evidence of this. I doubt that it would be
>  faster than ext2. There is no journaling on ext2.

Well, if you're dropping a large table, ext2/3 has that very long wait
thing that can happen.  Don't know how much battery-backed cache would
help.

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Justin
Date:
OK, I'm showing my ignorance of Linux.  On Ubuntu I can't seem to figure
out if the XFS file system is installed and, if not, how to get it
installed.

I would like to see the difference between XFS and ext2 performance
numbers.

Any pointers would be nice.  I'm not going to reinstall the OS, nor do
I want to install some unstable library into the kernel.

Dave Cramer wrote:
>
> On 16-Mar-08, at 3:04 PM, Craig James wrote:
>
>> Dave Cramer wrote:
>>> On 16-Mar-08, at 2:19 AM, Justin wrote:
>>>>
>>>> I decided to reformat the raid 10 into ext2 to see if there was any
>>>> real big difference in performance as some people have noted   here
>>>> is the test results
>>>>
>>>> please note the WAL files are still on the raid 0 set which is
>>>> still in ext3 file system format.  these test where run with the
>>>> fsync as before.   I made sure every thing was the same as with the
>>>> first test.
>>>>
>>> This is opposite to the way I run things. I use ext2 on the WAL and
>>> ext3 on the data. I'd also suggest RAID 10 on the WAL it is mostly
>>> write.
>>
>> Just out of curiosity: Last time I did research, the word seemed to
>> be that xfs was better than ext2 or ext3.  Is that not true?  Why use
>> ext2/3 at all if xfs is faster for Postgres?
>>
> I would like to see the evidence of this. I doubt that it would be
> faster than ext2. There is no journaling on ext2.
>
> Dave
>

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Tino Wildenhain
Date:
Justin wrote:
> OK i'm showing my ignorance of linux.  On Ubuntu i can't seem to figure
> out if  XFS file system is installed, if not installed  getting it
> installed.

Hm? Installed/not installed? You can select that when you are preparing
your partitions. If you run the automated partitioner there is of course
not much choice, but you can try the manual mode. Even after that you
can format individual partitions with XFS if you want. XFS has long been
included in recent Linux kernels; there is also reiserfs if you feel
desperate (in fact reiserfs is OK too, but you should not use it
on flaky hardware). Both xfs and reiserfs are journaling file systems -
it is believed that xfs performs better with large files and reiserfs
does well with many small files (like Maildir, for example).

I'd suggest a test with your data and workload to be sure.

Regards
Tino

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
"Andrej Ricnik-Bay"
Date:
On 17/03/2008, Justin <justin@emproshunts.com> wrote:
> OK i'm showing my ignorance of linux.  On Ubuntu i can't seem to figure
>  out if  XFS file system is installed, if not installed  getting it
>  installed.
...
>  any pointers would be nice.  I 'm not going to reinstall the OS.  Nor do
>  i want to install some unstable library into the kernel.
It's there.  All you need to do is (I hope ;}) back up the
partition with the database files on it, reformat using mkfs.xfs
(man mkfs.xfs for details), modify /etc/fstab to say xfs where
it says ext2 for the database partition, restore the data, and
use it...


Cheers,
Andrej

--
Please don't top post, and don't use HTML e-Mail :}  Make your quotes concise.

http://www.american.edu/econ/notes/htmlmail.htm

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Craig Ringer
Date:
Justin wrote:
> OK i'm showing my ignorance of linux.  On Ubuntu i can't seem to figure
> out if  XFS file system is installed, if not installed  getting it
> installed.

There are two parts to the file system, really. One is the kernel driver
for the file system. This is almost certainly available, as it will ship
with the kernel. It might be a module that is loaded on demand or it
might be compiled into the kernel itself.

On my Debian Etch system it's a module, xfs.ko, that can be loaded
manually with:

modprobe xfs

... however, you should not need to do that, as it'll be autoloaded when
you try to mount an xfs volume.

The other part to the file system is the userspace tools for creating,
checking, resizing, etc the file system. An `apt-cache search xfs' shows
that these tools have the package name xfsprogs, at least on Debian.

You can install them with "apt-get install xfsprogs". If they're already
installed no action will be taken.

When xfsprogs is installed you can use mkfs.xfs (see: man mkfs.xfs) to
format a block device (say, a partition like /dev/sda1 or an LVM logical
volume like /dev/SOMELVMVG/somelvmlv) with the xfs file system.

Once the file system is formatted you can mount it manually with the
mount command, eg:

mkdir /mnt/tmp
mount -t xfs /dev/sda1 /mnt/tmp

... or have it mounted on boot using an fstab entry like:

/dev/sda1 /path/to/desired/mountpoint xfs defaults 0 0

--
Craig Ringer

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Justin
Date:
Well, everything worked right up to the point where I tried to mount the file system:

Warning:  xfs_db: /dev/sdb1 contains a mounted file system

fatal error -- couldn't initialize XFS library.

Think I'm missing something???
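
(For what it's worth, that warning usually just means the old file system on
the device is still mounted; a likely fix, assuming /dev/sdb1 is the data
partition and nothing is using it:)

umount /dev/sdb1
mkfs.xfs -f /dev/sdb1    # -f is needed to overwrite the existing ext2 signature
mount -t xfs /dev/sdb1 /path/to/mountpoint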

Craig Ringer wrote:
> Justin wrote:
>> OK, I'm showing my ignorance of Linux.  On Ubuntu I can't seem to figure
>> out if the XFS file system is installed and, if not, how to get it
>> installed.
>
> There are two parts to the file system, really. One is the kernel driver
> for the file system. This is almost certainly available, as it will ship
> with the kernel. It might be a module that is loaded on demand or it
> might be compiled into the kernel itself.
>
> On my Debian Etch system it's a module, xfs.ko, that can be loaded
> manually with:
>
> modprobe xfs
>
> ... however, you should not need to do that, as it'll be autoloaded when
> you try to mount an xfs volume.
>
> The other part to the file system is the userspace tools for creating,
> checking, resizing, etc the file system. An `apt-cache search xfs' shows
> that these tools have the package name xfsprogs, at least on Debian.
>
> You can install them with "apt-get install xfsprogs". If they're already
> installed no action will be taken.
>
> When xfsprogs is installed you can use mkfs.xfs (see: man mkfs.xfs) to
> format a block device (say, a partition like /dev/sda1 or an LVM logical
> volume like /dev/SOMELVMVG/somelvmlv) with the xfs file system.
>
> Once the file system is formatted you can mount it manually with the
> mount command, eg:
>
> mkdir /mnt/tmp
> mount -t xfs /dev/sda1 /mnt/tmp
>
> ... or have it mounted on boot using an fstab entry like:
>
> /dev/sda1 /path/to/desired/mountpoint xfs defaults 0 0
>
> --
> Craig Ringer

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Dave Cramer
Date:
On 17-Mar-08, at 2:50 PM, Justin wrote:

>
>>
>> Just out of curiosity: Last time I did research, the word seemed to
>> be that xfs was better than ext2 or ext3.  Is that not true?  Why
>> use ext2/3 at all if xfs is faster for Postgres?
>>
>> Criag
>
> Ext2 vs XFS on my setup there is difference in the performance
> between the two file systems but its not OMG let switch.   XFS did
> better then Ext2 only one time, then Ext2 won out by small margin at
> best was 6%.   the other test ran at 3 to 4% better than XFS
> performance.
>
> XFS has journaling so it should be safer.  I think i may stick with
> XFS as it has journaling
>
> One thing i think is clear don't use ext3  it just kills performance
> by factors not small percents
>
> here is article i found on XFS http://linux-xfs.sgi.com/projects/xfs/papers/xfs_white/xfs_white_paper.html
>
> I hope this is helpful to people.  I know the process has taught me
> new things, and thanks to those that helped me out.
>
> Before i throw this sever into production any one else want
> performance numbers.
>
> C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 10 -t 40000 -v -h
> 192.168.1.9 -U
> postgres play
> Password:
> starting vacuum...end.
> starting vacuum accounts...end.
> transaction type: TPC-B (sort of)
> scaling factor: 100
> number of clients: 10
> number of transactions per client: 40000
> number of transactions actually processed: 400000/400000
> tps = 2181.512770 (including connections establishing)
> tps = 2187.107004 (excluding connections establishing)
>


2000 tps??? Do you have fsync turned off?

Dave


Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Justin
Date:
>
> Just out of curiosity: Last time I did research, the word seemed to be
> that xfs was better than ext2 or ext3.  Is that not true?  Why use
> ext2/3 at all if xfs is faster for Postgres?
>
> Criag

Ext2 vs XFS: on my setup there is a difference in performance between
the two file systems, but it's not an "OMG, let's switch" difference.
XFS did better than Ext2 only one time; otherwise Ext2 won out by a
small margin, at best 6%, with the other tests running 3 to 4% better
than XFS.

XFS has journaling so it should be safer.  I think I may stick with XFS
as it has journaling.

One thing I think is clear: don't use ext3.  It kills performance by
factors, not small percents.

Here is an article I found on XFS:
http://linux-xfs.sgi.com/projects/xfs/papers/xfs_white/xfs_white_paper.html

I hope this is helpful to people.  I know the process has taught me new
things, and thanks to those that helped me out.

Before I throw this server into production, does anyone else want
performance numbers?

 C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 10 -t 40000 -v -h
192.168.1.9 -U
postgres play
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 40000
number of transactions actually processed: 400000/400000
tps = 2181.512770 (including connections establishing)
tps = 2187.107004 (excluding connections establishing)

C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 10 -t 40000 -v -h
192.168.1.9 -U
postgres play
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 10
number of transactions per client: 40000
number of transactions actually processed: 400000/400000
tps = 2248.365719 (including connections establishing)
tps = 2254.308547 (excluding connections establishing)

-----------Clients log increased to 40------------

C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 40 -t 10000 -v -h
192.168.1.9 -U
postgres play
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 40
number of transactions per client: 10000
number of transactions actually processed: 400000/400000
tps = 2518.447629 (including connections establishing)
tps = 2548.014141 (excluding connections establishing)

C:\Program Files\PostgreSQL\8.3\bin>pgbench -c 40 -t 10000 -v -h
192.168.1.9 -U
postgres play
Password:
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 100
number of clients: 40
number of transactions per client: 10000
number of transactions actually processed: 400000/400000
tps = 2606.933139 (including connections establishing)
tps = 2638.626859 (excluding connections establishing)


Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Justin
Date:


>>
>>
>
>
> 2000 tps ??? do you have fsync turned off ?
>
> Dave
>

No, it's turned on.

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Craig James
Date:
Justin wrote:
>> 2000 tps ??? do you have fsync turned off ?
>>
>> Dave
>>
>
> No its turned on.

Unless I'm seriously confused, something is wrong with these numbers.
That's the sort of performance you expect from a good-sized RAID 10
six-disk array.  With a single 7200 rpm SATA disk and XFS, I get 640
tps.  There's no way you could get 2000 tps from a single disk.

Craig


Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Justin
Date:

Craig James wrote:
> Justin wrote:
>>> 2000 tps ??? do you have fsync turned off ?
>>>
>>> Dave
>>>
>>
>> No its turned on.
>
> Unless I'm seriously confused, something is wrong with these numbers.
> That's the sort of performance you expect from a good-sized RAID 10
> six-disk array.  With a single 7200 rpm SATA disk and XFS, I get 640
> tps.  There's no way you could 2000 tps from a single disk.
>
> Craig
>

It is a RAID 10 controller with 6 SAS 10K 73 gig drives.  The server
is 3 weeks old now.

It has 16 gigs of RAM,
2 quad core Xeon 1.88 GHz processors, and
2 gig Ethernet cards.
The RAID controller is a PERC 6/i with a 512 MB battery-backed cache,
set up not to lie about fsync.

WAL is on a RAID 0 drive along with the OS.

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Enrico Sirola
Date:
Hi Justin,

Il giorno 17/mar/08, alle ore 20:38, Justin ha scritto:
> it is a RAID 10 controller with 6 SAS 10K 73 gig drives.    The
> server is 3 weeks old now.
>
> it has 16 gigs of RAM
> 2 quad core Xenon 1.88 Ghz processors
> 2 gig Ethernet cards. RAID controller perc 6/i with battery backup
> 512meg cache, setup not lie about fsync
>
> WAL is on a RAID 0 drive along with the OS

Did you try with a single RAID 10 hosting DB + WAL? It gave me much
better performance on similar hardware.
Bye,
e.


Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
"Scott Marlowe"
Date:
On Mon, Mar 17, 2008 at 2:58 PM, Enrico Sirola <enrico.sirola@gmail.com> wrote:
> Hi Justin,
>
>  Il giorno 17/mar/08, alle ore 20:38, Justin ha scritto:
>
> > it is a RAID 10 controller with 6 SAS 10K 73 gig drives.    The
>  > server is 3 weeks old now.
>  >
>  > it has 16 gigs of RAM
>  > 2 quad core Xenon 1.88 Ghz processors
>  > 2 gig Ethernet cards. RAID controller perc 6/i with battery backup
>  > 512meg cache, setup not lie about fsync
>  >
>  > WAL is on a RAID 0 drive along with the OS
>
>  Did you try with a single raid 10 hosting DB + WAL? It gave me much
>  better performances on similar hardware
>  Bye,

Note that it can often be advantageous to have one big physical
partition on RAID-10 and then break it into logical partitions for
the computer, so that you have a partition with just ext2 for the WAL.
Since the WAL then has its own file system, you usually get better
performance without having to actually hard-partition out a separate
RAID-1 or RAID-10 for the WAL.
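
(A sketch of that layout, assuming the array shows up as /dev/sdb and using
hypothetical mount points; partition sizes and mount options are
illustrative only:)

# after carving /dev/sdb1 (bulk of the array) and /dev/sdb2 (a few GB) with fdisk or parted:
mkfs.xfs /dev/sdb1     # data
mkfs.ext2 /dev/sdb2    # WAL partition, no journal wanted
# /etc/fstab entries:
/dev/sdb1  /var/lib/pgsql/data    xfs   defaults,noatime  0 0
/dev/sdb2  /var/lib/pgsql/pg_wal  ext2  defaults,noatime  0 0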

Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
James Mansion
Date:
justin@emproshunts.com wrote:
>
> WAL is on a RAID 0 drive along with the OS
Isn't that just as unsafe as having the whole lot on RAID0?



Re: Benchmark: Dell/Perc 6, 8 disk RAID 10

From
Michael Stone
Date:
On Sun, Mar 16, 2008 at 12:04:44PM -0700, Craig James wrote:
>Just out of curiosity: Last time I did research, the word seemed to be that
>xfs was better than ext2 or ext3.  Is that not true?  Why use ext2/3 at all
>if xfs is faster for Postgres?

For the WAL, the filesystem is largely irrelevant. (It's relatively
small, the files are preallocated, the data is synced to disk so there's
no advantage from write buffering, etc.) The best filesystem is one
that does almost nothing and stays out of the way--ext2 is a good choice
for that. The data is a different story and a different filesystem is
usually a better choice. (If for no other reason than to avoid long
fsck times.)

Mike Stone