New server to improve performance on our large and busy DB - advice? (v2)

From
"Carlo Stonebanks"
Date:
My client just informed me that new hardware is available for our DB server.

. Intel Core 2 Quad (quad core)
. 48 GB RAM
. 4-disk RAID array (RAID level TBD)

I have put the ugly details of what we do with our DB below, as well as the
postgresql.conf settings. But, to summarize: we have a PostgreSQL 8.3.6 DB
with very large tables and the server is always busy serving a constant
stream of single-row UPDATEs and INSERTs from parallel automated processes.

There are fewer than 10 users, as the server is devoted to the KB production
system.

My questions:

1) Which RAID level would you recommend?
2) Which Windows OS would you recommend? (currently 2008 x64 Server)
3) If we were to port to a *NIX flavour, which would you recommend? (which
support trouble-free PG builds/makes please!)
4) Is this the right PG version for our needs?

Thanks,

Carlo

The details of our use:

. The DB hosts a data warehouse and a knowledgebase (KB) tracking the
professional information of 1.3M individuals.
. The KB tables related to these 1.3M individuals are naturally also large
. The DB is in a perpetual state of serving TCL-scripted Extract, Transform
and Load (ETL) processes
. These ETL processes typically run 10 at-a-time (i.e. in parallel)
. We would like to run more, but the server appears to be the bottleneck
. The ETL write processes are 99% single-row UPDATEs or INSERTs.
. There are few, if any, DELETEs
. The ETL source data are "import tables"
. The import tables are permanently kept in the data warehouse so that we
can trace the original source of any information.
. There are 6,000+ import tables, and counting
. The import tables range from dozens to hundreds of thousands of rows each.
They rarely require more than a pkey index.
. Linking the KB to the source import data requires an "audit table" of 500M
rows, and counting.
. The size of the audit table makes it very difficult to manage, especially
if we need to modify the design.
. Because we query the audit table in different ways to audit the ETL
processes' decisions, almost every column in the audit table is indexed.
. The maximum number of physical users is 10 and these users RARELY perform
any kind of write
. By contrast, the 10+ ETL processes are writing constantly
. We find that internal stats drift, for whatever reason, causing seq
scans instead of index scans.
. So far, we have never seen a situation where a seq scan has improved
performance, which I would attribute to the size of the tables
. We believe our requirements are exceptional, and we would benefit
immensely from setting up the PG planner to always favour index-oriented
decisions - which seems to contradict everything that PG advice suggests as
best practice.

Current non-default conf settings are:

autovacuum = on
autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 250
autovacuum_naptime = 1min
autovacuum_vacuum_scale_factor = 0.2
autovacuum_vacuum_threshold = 500
bgwriter_lru_maxpages = 100
checkpoint_segments = 64
checkpoint_warning = 290
datestyle = 'iso, mdy'
default_text_search_config = 'pg_catalog.english'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
log_destination = 'stderr'
log_line_prefix = '%t '
logging_collector = on
maintenance_work_mem = 16MB
max_connections = 200
max_fsm_pages = 204800
max_locks_per_transaction = 128
port = 5432
shared_buffers = 500MB
vacuum_cost_delay = 100
work_mem = 512MB



Re: New server to improve performance on our large and busy DB - advice? (v2)

From
Dave Crooke
Date:
I'll bite ....


1. In general, RAID-10 is the only suitable RAID configuration for a database. The decision making comes in how many drives, and splitting stuff up into LUNs (like putting pg_xlog on its own LUN).
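
By way of illustration only (mount point and table name are hypothetical):
pg_xlog itself is relocated at the filesystem level (e.g. a symlink), while
a big table can be steered onto its own LUN with a tablespace:

-- sketch: put the audit table on a dedicated RAID-10 LUN
CREATE TABLESPACE audit_space LOCATION '/lun2/pg_audit';
ALTER TABLE audit_log SET TABLESPACE audit_space;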


2. None of the above - you're asking the wrong question, really. PostgreSQL is open source, and is developed on Unix. The Windows version is a pretty good port, as Windows ports of OSS stuff go, but it's just that, a port. Your server is being dedicated to running Postgres, so the right question to ask is "What is the best OS for running Postgres?".

For any given database engine, regardless of the marketing and support stance, there is only one true "primary" enterprise OS platform that most big mission critical sites use, and is the best supported and most stable platform for that RDBMS. For Oracle, that's HP-UX (but 10 years ago, it was Solaris). For PostgreSQL, it's Linux.

The biggest problem with Postgres on Windows is that it only comes in 32-bit. RAM is the ultimate performance tweak for an RDBMS, and to make proper use of modern amounts of RAM, you need a 64-bit executable.
 


3. The two choices I'd consider are both Linux:

- for the conservative / supported approach, get Red Hat and buy support from them and (e.g.) EnterpriseDB
- if you plan to keep pretty current and are happy actively managing versions and running locally compiled builds, go with Ubuntu


4. The general wisdom is that there are a lot of improvements from 8.3 to 8.4, but how much benefit you'll see in your environment is another question. If you're building a new system and have to migrate anyway, it seems like a good opportunity to upgrade.


Cheers
Dave


Re: New server to improve performance on our large and busy DB - advice? (v2)

From
Craig Ringer
Date:
On 15/01/2010 6:35 AM, Dave Crooke wrote:
> The biggest problem with Postgres on Windows is that it only comes in
> 32-bit. RAM is the ultimate performance tweak for an RDBMS, and to make
> proper use of modern amounts of RAM, you need a 64-bit executable.

.... though that's much less important for Pg than for most other
things, as Pg uses a one-process-per-connection model and lets the OS
handle much of the caching. So long as the OS can use all that RAM for
caching, Pg will benefit, and it's unlikely you need >2GB for any given
client connection or for the postmaster.

It's nice to have the flexibility to push up shared_buffers, and it'd be
good to avoid any overheads in running 32-bit code on win64. However,
it's not that unreasonable to run a 32-bit Pg on a 64-bit OS and expect
good performance.

You can always go 64-bit once 8.5/9.0 hits and has stabilized, anyway.

--
Craig Ringer

Re: New server to improve performance on our large and busy DB - advice? (v2)

From
Ivan Voras
Date:
Hi,

You wrote a lot of information here so let's confirm in a nutshell what
you have and what you are looking for:

* A database that is of small to medium size (5 - 10 GB)?
* Around 10 clients that perform constant write operations to the
database (UPDATE/INSERT)
* Around 10 clients that occasionally read from the database
* Around 6000 tables in your database
* A problem with tuning it all
* Migration to new hardware and/or OS

Is this all correct?

First thing that is noticeable is that you seem to have way too few
drives in the server - not because of disk space required but because of
speed. You didn't say what type of drives you have and you didn't say
what you would consider desirable performance levels, but off hand
(because of the "10 clients perform constant writes" part) you will
probably want at least 2x-4x more drives.

 > 1) Which RAID level would you recommend

With only 4 drives, RAID 10 is the only thing usable here.

 > 2) Which Windows OS would you recommend? (currently 2008 x64 Server)

Would not recommend Windows OS.

 > 3) If we were to port to a *NIX flavour, which would you recommend?
(which
 > support trouble-free PG builds/makes please!)

Practically any. I'm biased toward FreeBSD, but a nice, supported version
of Linux will probably be fine.

 > 4) Is this the right PG version for our needs?

If you are starting from scratch on a new server, go for the newest
version you can get - 8.4.2 in this case.

Most importantly, you didn't say what you would consider desirable
performance. The hardware and the setup you described will work, but not
necessarily fast enough.

 > . So far, we have never seen a situation where a seq scan has improved
 > performance, which I would attribute to the size of the tables

... and to the small number of drives you are using.

 > . We believe our requirements are exceptional, and we would benefit
 > immensely from setting up the PG planner to always favour
index-oriented decisions

Have you tried decreasing random_page_cost in postgresql.conf? Or
setting (as a last resort) enable_seqscan = off?
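
For example (values purely illustrative - verify the effect with EXPLAIN
ANALYZE on your own queries):

-- random_page_cost can go in postgresql.conf (default is 4.0) or be
-- tried per session; enable_seqscan is the last-resort switch:
SET random_page_cost = 2.0;   -- makes index (random) I/O look cheaper
SET enable_seqscan = off;     -- planner avoids seq scans wherever possible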



Re: New server to improve performance on our large and busy DB - advice? (v2)

From
Robert Haas
Date:
On Fri, Jan 15, 2010 at 8:43 AM, Ivan Voras <ivoras@freebsd.org> wrote:
> Have you tried decreasing random_page_cost in postgresql.conf? Or setting
> (as a last resort) enable_seqscan = off?

If you need to set enable_seqscan to off to get the planner to use
your index, the chances that that index is actually going to improve
performance are extremely poor.
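
It's easy to measure on any given query (table and predicate are made up
here):

EXPLAIN ANALYZE SELECT * FROM audit_log WHERE source_id = 42;  -- planner's pick
SET enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM audit_log WHERE source_id = 42;  -- forced index plan
RESET enable_seqscan;
-- if the forced plan isn't clearly faster, the index won't save you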

...Robert

Re: Re: New server to improve performance on our large and busy DB - advice? (v2)

From
"Ing. Marcos L. Ortiz Valmaseda"
Date:
On 15/01/2010 14:43, Ivan Voras wrote:
> With only 4 drives, RAID 10 is the only thing usable here.
>
> Practically any. I'm biased toward FreeBSD, but a nice, supported version
> of Linux will probably be fine.
I have a question about that, since all of you recommend RAID-10 for the
implementation of this system: could you suggest a workable architecture
based on all these considerations?
As for the questions, I too recommend FreeBSD for a PostgreSQL production
server (and for other things, not only Pg), but with Linux you can get a
strong, reliable environment that can be more efficient than Windows.

Regards


Re: New server to improve performance on our large and busy DB - advice? (v2)

From
Tony McC
Date:
On Thu, 14 Jan 2010 16:35:53 -0600
Dave Crooke <dcrooke@gmail.com> wrote:

> For any given database engine, regardless of the marketing and support
> stance, there is only one true "primary" enterprise OS platform that
> most big mission critical sites use, and is the best supported and
> most stable platform for that RDBMS. For Oracle, that's HP-UX (but 10
> years ago, it was Solaris). For PostgreSQL, it's Linux.

I am interested in this response and am wondering if this is just
Dave's opinion or some sort of official PostgreSQL policy.  I am
learning PostgreSQL by running it on FreeBSD 8.0-STABLE.  So far I
have found no problems and have even read a few posts that are critical
of Linux's handling of fsync.  I really don't want to start a Linux vs
FreeBSD flame war (I like Linux and use that too, though not for
database use), I am just intrigued by the claim that Linux is somehow
the natural OS for running PostgreSQL.  I think if Dave had said "for
PostgreSQL, it's a variant of Unix" I wouldn't have been puzzled.  So I
suppose the question is: what is it about Linux specifically (as
contrasted with other Unix-like OSes, especially Open Source ones) that
makes it particularly suitable for running PostgreSQL?

Best,
Tony


Re: New server to improve performance on our large and busy DB - advice? (v2)

From
Richard Broersma
Date:
On Fri, Jan 15, 2010 at 8:10 AM, Tony McC <afmcc@btinternet.com> wrote:

>> most stable platform for that RDBMS. For Oracle, that's HP-UX (but 10
>> years ago, it was Solaris). For PostgreSQL, it's Linux.
>
> I am interested in this response and am wondering if this is just
> Dave's opinion or some sort of official PostgreSQL policy.

>I really don't want to start a Linux vs
> FreeBSD flame war (I like Linux and use that too, though not for
> database use), I am just intrigued by the claim that Linux is somehow
> the natural OS for running PostgreSQL.


I would wager that this response is a tad flame-bait-ish.



--
Regards,
Richard Broersma Jr.

Visit the Los Angeles PostgreSQL Users Group (LAPUG)
http://pugs.postgresql.org/lapug

Re: New server to improve performance on our large and busy DB - advice? (v2)

From
Robert Haas
Date:
On Fri, Jan 15, 2010 at 11:10 AM, Tony McC <afmcc@btinternet.com> wrote:
> what is it about Linux specifically (as
> contrasted with other Unix-like OSes, especially Open Source ones) that
> makes it particularly suitable for running PostgreSQL?

Nothing that I know of.

...Robert

Re: Re: New server to improve performance on our large and busy DB - advice? (v2)

From
Pierre Frédéric Caillaud
Date:
>  > 2) Which Windows OS would you recommend? (currently 2008 x64 Server)
>
> Would not recommend Windows OS.

    BTW, I'd be interested to know the NTFS fragmentation stats of your
database file.

Re: New server to improve performance on our large and busy DB - advice? (v2)

From
Tom Lane
Date:
Richard Broersma <richard.broersma@gmail.com> writes:
> On Fri, Jan 15, 2010 at 8:10 AM, Tony McC <afmcc@btinternet.com> wrote:
>>> most stable platform for that RDBMS. For Oracle, that's HP-UX (but 10
>>> years ago, it was Solaris). For PostgreSQL, it's Linux.

>> I am interested in this response and am wondering if this is just
>> Dave's opinion or some sort of official PostgreSQL policy.

>> I really don't want to start a Linux vs
>> FreeBSD flame war (I like Linux and use that too, though not for
>> database use), I am just intrigued by the claim that Linux is somehow
>> the natural OS for running PostgreSQL.

> I would wager that this response is a tad flame-bait-"ish".

Indeed.  It's certainly not "project policy".

Given the Linux kernel hackers' apparent disinterest in fixing their
OOM kill policy or making write barriers work well (or at all, with
LVM), I think arguing that Linux is the best database platform requires
a certain amount of suspension of disbelief.

            regards, tom lane

Re: New server to improve performance on our large and busy DB - advice? (v2)

From
Greg Smith
Date:
Tom Lane wrote:
> Given the Linux kernel hackers' apparent disinterest in fixing their
> OOM kill policy or making write barriers work well (or at all, with
> LVM), I think arguing that Linux is the best database platform requires
> a certain amount of suspension of disbelief.
>

Don't forget the general hostility toward how the database allocates
shared memory on that list too.

I was suggesting Linux as being the best in the context of consistently
having up to date packages that install easily if you can use the PGDG
yum repo, since that was a specific request.  The idea that Linux is
somehow the preferred platform for PostgreSQL is pretty weird; it's
just a popular one, and has plenty of drawbacks.

I think it's certainly the case that you have to enter into using
PostgreSQL with Linux with the understanding that you only use the most
basic and well understood parts of the OS.  Filesystem other than ext3?
Probably buggy, may get corrupted.  Using the latest write-barrier code
rather than the most basic fsync approach?  Probably buggy, may get
corrupted.  Using LVM instead of simple partitions?  Probably going to
perform badly, maybe buggy and get corrupted too.  Assuming software
RAID can replace a hardware solution with a battery-backed write cache?
Never happen.

There's a narrow Linux setup for PostgreSQL that works well for a lot of
people, but some days it does feel like that's in spite of the
priorities of the people working on the Linux kernel.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com


Re: New server to improve performance on our large and busy DB - advice? (v2)

From
Scott Marlowe
Date:
On Fri, Jan 15, 2010 at 11:28 AM, Greg Smith <greg@2ndquadrant.com> wrote:
> There's a narrow Linux setup for PostgreSQL that works well for a lot of
> people, but some days it does feel like that's in spite of the priorities of
> the people working on the Linux kernel.

As someone who uses Linux to run PostgreSQL DBs, I tend to agree.
It's not quite alchemy or anything, but there are very real caveats to
be aware of when using Linux as the OS for PostgreSQL to run on top
of.  I will say that XFS seems to be a very stable file system, and we
use it for some of our databases with no problems at all.  But most of
our stuff sits on ext3 because it's stable and reliable and fast
enough.

Each OS has some warts when it comes to running pg on it.  Could be a
narrower selection of hardware drivers, buggy locale support, iffy
kernel behaviour when lots of memory is allocated.  And most have a
way to work around those issues as long as you're careful what you're
doing.  If you're familiar with one OS and its warts, you're more
likely to be bitten by the warts of another OS that's new to you no
matter how good it is.

And as always, test the crap outta your setup, cause the time to find
problems is before you put a machine into production.

Re: New server to improve performance on our large and busy DB - advice? (v2)

From
"Kevin Grittner"
Date:
Scott Marlowe <scott.marlowe@gmail.com> wrote:

> I will say that XFS seems to be a very stable file system, and we
> use it for some of our databases with no problems at all.  But
> most of our stuff sits on ext3 because it's stable and reliable
> and fast enough.

Our PostgreSQL data directories are all on xfs, with everything else
(OS, etc) on ext3.  We've been happy with it, as long as we turn off
write barriers, which is only safe with a RAID controller with BBU
cache.

> And as always, test the crap outta your setup, cause the time to
> find problems is before you put a machine into production.

Absolutely.

-Kevin

Re: New server to improve performance on our large and busy DB - advice? (v2)

From
Dave Crooke
Date:
Just opinion, and like Greg, I was suggesting it along the lines of "it's the platform most production PG instances run on, so you're following a well trodden path, and any issue you encounter is likely to have been found and fixed by someone else".

It's not about the general suitability of the OS as a database platform, or its feature set, it's about a combination of specific versions of OS, kernel, DB etc that are known to work reliably.


I am curious about the write barrier and shmem issues that other folks have alluded to ... I am pretty new to using PG, but I've used other databases on Linux in production (mostly Oracle, some MySQL) which also use these kernel resources and never encountered problems related to them even under very high loads.

I'd also like to know what OS'es the PG core folks like Tom use.

Cheers
Dave


Re: New server to improve performance on our large and busy DB - advice? (v2)

From
Dave Crooke
Date:
This is the second time I've heard that "PG shared buffer on Windows doesn't matter" ... I'd like to understand the reasoning behind that claim, and why it differs from other DB servers.

> .... though that's much less important for Pg than for most other things, as Pg uses a one-process-per-connection model and lets the OS handle much of the caching. So long as the OS can use all that RAM for caching, Pg will benefit, and it's unlikely you need >2GB for any given client connection or for the postmaster.

Any DB software would benefit from the OS buffer cache, but there is still the overhead of copying that data into the shared buffer area, and as necessary unpacking it into in-memory format.

Oracle uses a more or less identical process and memory model to PG, and for sure you can't have too much SGA with it.

> It's nice to have the flexibility to push up shared_buffers, and it'd be good to avoid any overheads in running 32-bit code on win64. However, it's not that unreasonable to run a 32-bit Pg on a 64-bit OS and expect good performance.

My reasoning goes like this:

a. there is a significant performance benefit to using a large proportion of memory as in-process DB server cache instead of OS level block / filesystem cache

b. the only way to do so on modern hardware (i.e. >>4GB) is with a 64-bit binary

c. therefore, a 64-bit binary is essential


You're the second person that's said a. is only a "nice to have" with PG ... what makes the difference?

Cheers
Dave

Re: New server to improve performance on our large and busy DB - advice? (v2)

From
Tom Lane
Date:
Dave Crooke <dcrooke@gmail.com> writes:
> This is the second time I've heard that "PG shared buffer on Windows doesn't
> matter" ... I'd like to understand the reasoning behind that claim, and why
> it differs from other DB servers.

AFAIK we don't really understand why, but the experimental evidence is
that increasing shared_buffers to really large values doesn't help much
on Windows.  You can probably find more in the archives.

I'm not sure that this has been retested recently, so it might be
obsolete information, but it's what we've got.

            regards, tom lane

Re: New server to improve performance on our large and busy DB - advice? (v2)

From
Greg Smith
Date:
Dave Crooke wrote:

> My reasoning goes like this:
> a. there is a significant performance benefit to using a large
> proportion of memory as in-process DB server cache instead of OS level
> block / filesystem cache
> b. the only way to do so on modern hardware (i.e. >>4GB) is with a
> 64-bit binary
> c. therefore, a 64-bit binary is essential
> You're the second person that's said a. is only a "nice to have" with
> PG ... what makes the difference?

The PostgreSQL model presumes that it's going to be cooperating with the
operating system cache.  In a default config, all reads and writes go
through the OS cache.  You can get the WAL written in a way that
bypasses the OS cache, but even that isn't the default.  This makes
PostgreSQL's effective cache size equal to shared_buffers *plus* the OS
cache.  This is why Windows can perform OK even without having a giant
amount of dedicated RAM; it just leans on the OS more heavily instead.
That's not as efficient, because you're shuffling more things between
shared_buffers and the OS than you would on a UNIX system, but it's
still way faster than going all the way to disk for something.  On, say,
a system with 16GB of RAM, you can setup Windows to use 256MB of
shared_buffers, and expect that you'll find at least another 14GB or so
of data cached by the OS.
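
In postgresql.conf terms, that hypothetical 16GB Windows box might look
something like this:

shared_buffers = 256MB         # modest dedicated cache
effective_cache_size = 14GB    # hint to the planner about likely OS caching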

The reasons why Windows is particularly unappreciative of being
allocated memory directly aren't well understood.  But the basic property
that shared_buffers is not the only source, or even the largest source,
of caching is not unique to that platform.

> Oracle uses a more or less identical process and memory model to PG,
> and for sure you can't have too much SGA with it.

The way data goes in and out of Oracle's SGA is often via direct I/O
instead of even touching the OS read/write cache.  That's why the
situation is so different there.  If you're on an Oracle system, and you
need to re-read a block that was recently evicted from the SGA, it's
probably going to be read from disk.  In the same situation with
PostgreSQL, it's likely you'll find it's still in the OS cache.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com


Re: Re: New server to improve performance on our large and busy DB - advice? (v2)

From
"Carlo Stonebanks"
Date:
> * A database that is of small to medium size (5 - 10 GB)?
> * Around 10 clients that perform constant write operations to the database
> (UPDATE/INSERT)
> * Around 10 clients that occasionally read from the database
> * Around 6000 tables in your database
> * A problem with tuning it all
> * Migration to new hardware and/or OS
>
> Is this all correct?

Actually, the tablespace is very large, over 500GB. However, the actual
production DB is 200GB.

> First thing that is noticeable is that you seem to have way too few drives
> in the server - not because of disk space required but because of speed.
> You didn't say what type of drives you have and you didn't say what you
> would consider desirable performance levels, but off hand (because of the
> "10 clients perform constant writes" part) you will probably want at least
> 2x-4x more drives.

> With only 4 drives, RAID 10 is the only thing usable here.

What would be the optimum RAID level and number of disks?

> > 2) Which Windows OS would you recommend? (currently 2008 x64 Server)
>
> Would not recommend Windows OS.

We may be stuck, as my client is only considering Red Hat Linux (still
waiting to find out which version). If it turns out that this move doesn't
give better than a marginal improvement, then there is no incentive to
create more complications in what is basically a Windows shop (although
the project manager is a Linux advocate).

> Most importantly, you didn't say what you would consider desirable
> performance. The hardware and the setup you described will work, but not
> necessarily fast enough.

Once again, it seems as though we are down to the number of drives...

> Have you tried decreasing random_page_cost in postgresql.conf? Or setting
> (as a last resort) enable_seqscan = off?

In critical code sections, we do - we have stored procedures and code
segments which save the current enable_seqscan value, set it to off (local
to the transaction), then restore it after the code has run.
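
In essence (a sketch only - the query shown is just a stand-in for the
real work):

BEGIN;
SET LOCAL enable_seqscan = off;  -- reverts automatically at COMMIT/ROLLBACK
SELECT * FROM audit_log WHERE person_id = 42;  -- hypothetical critical query
COMMIT;
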
"planner cost" values are all default. Is this what you would choose for a
Intel Core 2 Quads Quad with 48 GB RAM?

# - Planner Cost Constants -
#seq_page_cost = 1.0   # measured on an arbitrary scale
#random_page_cost = 4.0   # same scale as above
#cpu_tuple_cost = 0.01   # same scale as above
#cpu_index_tuple_cost = 0.005  # same scale as above
#cpu_operator_cost = 0.0025  # same scale as above
#effective_cache_size = 128MB
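
For example - and this is purely a guess on our part, not something we
have tested - would values along these lines suit the new 48 GB box?

random_page_cost = 2.0          # lower than the 4.0 default, favours indexes
effective_cache_size = 36GB     # most of the 48 GB should end up as OS cache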

Thanks for the help,

Carlo