Configuring a Large RAM PostgreSQL Server - Mailing list pgsql-performance

From Alex Hochberger
Subject Configuring a Large RAM PostgreSQL Server
Date
Msg-id C78952BC-8F12-486E-9BD3-368EFAF448AD@dsgi.us
Responses Re: Configuring a Large RAM PostgreSQL Server  (Richard Huxton <dev@archonet.com>)
Re: Configuring a Large RAM PostgreSQL Server  (Josh Berkus <josh@agliodbs.com>)
List pgsql-performance
Does anyone have any white papers or basic guides for a large RAM
server?

We are consolidating two databases, currently running on a 4 GB and a
2 GB machine, to enable better data-mining.  The data issues on the 4
GB machine are numerous: things like CREATE INDEX and UPDATE queries
fail with out-of-memory errors.  Refactoring the data is helping, but
isn't finishing the job.

The new machine will have 48 GB of RAM, so figuring out starting
points for shared_buffers and work_mem/maintenance_work_mem is going
to be a crapshoot, since the defaults still seem to be aimed at
machines with 256 MB of RAM or less.
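
For what it's worth, this is roughly where I was planning to start in
postgresql.conf; the numbers are my own scaled-up guesses, not
anything I've seen recommended for a box this size:

    shared_buffers = 12GB           # guessing at roughly 1/4 of RAM
    effective_cache_size = 36GB     # guess at shared_buffers + OS cache
    work_mem = 256MB                # per sort/hash, per backend -- guess
    maintenance_work_mem = 2GB      # for CREATE INDEX / VACUUM -- guess
    checkpoint_segments = 32        # guess, to spread out checkpoint I/O
    wal_buffers = 16MB              # guess

Does that look like a sane starting point, or am I off by an order of
magnitude somewhere?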

Usage:
    Most of the time, the database is being hit with a handful of
poorly written and unoptimized queries from a Ruby on Rails app that
is being refactored into a simple Ruby-DBI app, since we need to
support our legacy code but don't need the Rails environment, just a
lot of SQL writes.  Some stored procedures should streamline this.
However, each transaction will do about 5 or 6 writes.
    Common Usage: we have a reporting tool that is also being
refactored, but it does a lot of aggregate queries.  None of these
take more than 500 ms after indexing on the 2 GB database, so we're
assuming that more RAM should help and eliminate those problems.
    Problem Usage: we have a 20 GB table with 120m rows that we are
splitting into some sub-tables.  Generally, we do large data pulls
from here, 1 million - 4 million records at a time, stored in a new
table for export.  These queries are problematic because we are
unable to build the indexes they need; CREATE INDEX fails with
out-of-memory errors.  Most of my cleanup has resorted to FOR-IN
loops in PL/pgSQL to manage the data one row at a time (see the
sketch after this list).  This is problematic because many of these
scripts are taking 4-5 days to run.
    Other usage: we will import between 10k and 10m rows at a time
from CSVs into the big database table (see the COPY sketch below).  I
got my gig here because this was all failing and the data was
becoming worthless.  These imports involve a lot of writes.
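
To give a concrete idea of the Problem Usage above, what I'd like to
be able to run instead of the row-at-a-time loop is just a
per-session memory bump plus plain set-based SQL, roughly like this
(table and column names are made up, not our real schema):

    SET maintenance_work_mem = '2GB';  -- per-session bump; value is a guess
    CREATE INDEX idx_big_events_date ON big_events (event_date);

    SET work_mem = '512MB';            -- per-session, for the big sort; a guess
    CREATE TABLE export_batch AS
        SELECT *
        FROM big_events
        WHERE event_date BETWEEN '2007-01-01' AND '2007-03-31';

Is it reasonable to lean on per-session SETs like that, or should the
server-wide settings just be sized high enough to begin with?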

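For the CSV imports in "Other usage", we're moving toward something
like this (the path and column names are placeholders), dropping
indexes before the load and rebuilding them afterwards:

    BEGIN;
    COPY big_events (id, event_date, payload)
        FROM '/data/incoming/batch.csv' WITH CSV HEADER;
    COMMIT;

If there's a better pattern for the 10m-row end of that range, I'm
all ears.
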
    Our simultaneous queries are small, and currently run
acceptably.  It's the big imports, data-mining pulls, and system
manipulation, where we routinely wait days on a query, that we are
looking to speed up.

Thanks,
Alex
