Re: Re: New server to improve performance on our large and busy DB - advice? (v2) - Mailing list pgsql-performance

From Ing. Marcos L. Ortiz Valmaseda
Subject Re: Re: New server to improve performance on our large and busy DB - advice? (v2)
Date
Msg-id 4B50330E.4020901@uci.cu
Whole thread Raw
In response to Re: New server to improve performance on our large and busy DB - advice? (v2)  (Ivan Voras <ivoras@freebsd.org>)
List pgsql-performance
El 15/01/2010 14:43, Ivan Voras escribió:
> hi,
>
> You wrote a lot of information here so let's confirm in a nutshell
> what you have and what you are looking for:
>
> * A database that is of small to medium size (5 - 10 GB)?
> * Around 10 clients that perform constant write operations to the
> database (UPDATE/INSERT)
> * Around 10 clients that occasionally read from the database
> * Around 6000 tables in your database
> * A problem with tuning it all
> * Migration to new hardware and/or OS
>
> Is this all correct?
>
> First thing that is noticeable is that you seem to have way too few
> drives in the server - not because of disk space required but because
> of speed. You didn't say what type of drives you have and you didn't
> say what you would consider desirable performance levels, but off hand
> (because of the "10 clients perform constant writes" part) you will
> probably want at least 2x-4x more drives.
>
> > 1) Which RAID level would you recommend
>
> With only 4 drives, RAID 10 is the only thing usable here.
>
> > 2) Which Windows OS would you recommend? (currently 2008 x64 Server)
>
> Would not recommend Windows OS.
>
> > 3) If we were to port to a *NIX flavour, which would you recommend?
> (which
> > support trouble-free PG builds/makes please!)
>
> Practically any. I'm biased for FreeBSD, a nice and supported version
> of Linux will probably be fine.
>
> > 4) Is this the right PG version for our needs?
>
> If you are starting from scratch on a new server, go for the newest
> version you can get - 8.4.2 in this case.
>
> Most importantly, you didn't say what you would consider desirable
> performance. The hardware and the setup you described will work, but
> not necessarily fast enough.
>
> > . So far, we have never seen a situation where a seq scan has improved
> > performance, which I would attribute to the size of the tables
>
> ... and to the small number of drives you are using.
>
> > . We believe our requirements are exceptional, and we would benefit
> > immensely from setting up the PG planner to always favour
> index-oriented decisions
>
> Have you tried decreasing random_page_cost in postgresql.conf? Or
> setting (as a last resort) enable_seqscan = off?
>
>
> Carlo Stonebanks wrote:
>> My client just informed me that new hardware is available for our DB
>> server.
>>
>> . Intel Core 2 Quads Quad
>> . 48 GB RAM
>> . 4 Disk RAID drive (RAID level TBD)
>>
>> I have put the ugly details of what we do with our DB below, as well
>> as the
>> postgres.conf settings. But, to summarize: we have a PostgreSQL 8.3.6 DB
>> with very large tables and the server is always busy serving a constant
>> stream of single-row UPDATEs and INSERTs from parallel automated
>> processes.
>>
>> There are less than 10 users, as the server is devoted to the KB
>> production
>> system.
>>
>> My questions:
>>
>> 1) Which RAID level would you recommend
>> 2) Which Windows OS would you recommend? (currently 2008 x64 Server)
>> 3) If we were to port to a *NIX flavour, which would you recommend?
>> (which
>> support trouble-free PG builds/makes please!)
>> 4) Is this the right PG version for our needs?
>>
>> Thanks,
>>
>> Carlo
>>
>> The details of our use:
>>
>> . The DB hosts is a data warehouse and a knowledgebase (KB) tracking the
>> professional information of 1.3M individuals.
>> . The KB tables related to these 130M individuals are naturally also
>> large
>> . The DB is in a perpetual state of serving TCL-scripted Extract,
>> Transform
>> and Load (ETL) processes
>> . These ETL processes typically run 10 at-a-time (i.e. in parallel)
>> . We would like to run more, but the server appears to be the bottleneck
>> . The ETL write processes are 99% single row UPDATEs or INSERTs.
>> . There are few, if any DELETEs
>> . The ETL source data are "import tables"
>> . The import tables are permanently kept in the data warehouse so
>> that we
>> can trace the original source of any information.
>> . There are 6000+ and counting
>> . The import tables number from dozens to hundreds of thousands of rows.
>> They rarely require more than a pkey index.
>> . Linking the KB to the source import date requires an "audit table"
>> of 500M
>> rows, and counting.
>> . The size of the audit table makes it very difficult to manage,
>> especially
>> if we need to modify the design.
>> . Because we query the audit table different ways to audit the ETL
>> processes
>> decisions, almost every column in the audit table is indexed.
>> . The maximum number of physical users is 10 and these users RARELY
>> perform
>> any kind of write
>> . By contrast, the 10+ ETL processes are writing constantly
>> . We find that internal stats drift, for whatever reason, causing row
>> seq
>> scans instead of index scans.
>> . So far, we have never seen a situation where a seq scan has improved
>> performance, which I would attribute to the size of the tables
>> . We believe our requirements are exceptional, and we would benefit
>> immensely from setting up the PG planner to always favour index-oriented
>> decisions - which seems to contradict everything that PG advice
>> suggests as
>> best practice.
>>
>> Current non-default conf settings are:
>>
>> autovacuum = on
>> autovacuum_analyze_scale_factor = 0.1
>> autovacuum_analyze_threshold = 250
>> autovacuum_naptime = 1min
>> autovacuum_vacuum_scale_factor = 0.2
>> autovacuum_vacuum_threshold = 500
>> bgwriter_lru_maxpages = 100
>> checkpoint_segments = 64
>> checkpoint_warning = 290
>> datestyle = 'iso, mdy'
>> default_text_search_config = 'pg_catalog.english'
>> lc_messages = 'C'
>> lc_monetary = 'C'
>> lc_numeric = 'C'
>> lc_time = 'C'
>> log_destination = 'stderr'
>> log_line_prefix = '%t '
>> logging_collector = on
>> maintenance_work_mem = 16MB
>> max_connections = 200
>> max_fsm_pages = 204800
>> max_locks_per_transaction = 128
>> port = 5432
>> shared_buffers = 500MB
>> vacuum_cost_delay = 100
>> work_mem = 512MB
>>
>>
>>
>
>
I have a question about that, due to all of you recommend RAID-10 for
the implementatio of this system. Would you give a available
arquitecture based on all these considerations?
About the questions, I recommend FreeBSD too for a PostgreSQL production
server (and for other things too, not only Pg), but with Linux you can
obtain a strong, reliable environment that can be more efficient that
Windows.

Regards


pgsql-performance by date:

Previous
From: Robert Haas
Date:
Subject: Re: Re: New server to improve performance on our large and busy DB - advice? (v2)
Next
From: "Fernando Hevia"
Date:
Subject: Re: new server I/O setup