Re: "large" spam tables and performance: postgres memory parameters - Mailing list pgsql-performance

From Stephen Frost
Subject Re: "large" spam tables and performance: postgres memory parameters
Msg-id 20100108020258.GG17756@tamriel.snowman.net
In response to "large" spam tables and performance: postgres memory parameters  (Gary Warner <gar@cis.uab.edu>)
List pgsql-performance
* Gary Warner (gar@cis.uab.edu) wrote:
>  - if you have 8 pentium cores, 12GB of RAM and "infinite" diskspace,
> what sorts of memory settings would you have in your start up tables?

If the PG database is the only thing on the system, I'd probably go with
something like:

shared_buffers = 4GB
temp_buffers = 1GB
work_mem = 128MB # Maybe adjust this during a session if you have
                 # big/complex queries
maintenance_work_mem = 256MB
checkpoint_segments = 20 # Maybe more..
effective_cache_size = 8GB # Maybe more if you have a SAN which is doing
                           # caching for you too..
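
As a sketch of the per-session adjustment mentioned above (the value is
just an example, not a recommendation):

SET work_mem = '512MB';  -- affects only the current session
-- run the big sort/aggregate query here
RESET work_mem;          -- back to the server default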


>   My biggest question mark there really has to do with how many users I have and how that might alter the results.

Presuming what you mean by this is "how would the number of users change
the settings I'm suggesting above", I'd say "probably not much for the
number of users you're talking about".  Really, 12-25 users just isn't
all that many.  You probably need a queueing system to handle requests
that are going to take a long time to complete (as in, don't expect the
user or their web browser to stick around while you run a query that
takes half an hour to complete...).
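
If you do go the queueing route, one minimal sketch is to keep the queue
in the database itself (the table and column names here are invented; a
worker polls for pending jobs):

CREATE TABLE query_jobs (
    id         serial PRIMARY KEY,
    submitted  timestamptz NOT NULL DEFAULT now(),
    query_text text NOT NULL,
    status     text NOT NULL DEFAULT 'pending'
);

-- a worker claims the oldest pending job; if another worker got there
-- first, zero rows come back and this worker simply retries
UPDATE query_jobs SET status = 'running'
WHERE status = 'pending'
  AND id = (SELECT id FROM query_jobs
            WHERE status = 'pending'
            ORDER BY submitted LIMIT 1)
RETURNING id, query_text;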

> (I'll let this thread run a bit, and then come back to ask questions
> about "vacuum analyze" and "partitioned tables" as a second and third
> round of questions.)

You should definitely be looking to do partitioning based on the type of
data and the way you want to restrict the queries.  I'm not convinced
you'd actually *need* to restrict the queries to recent things if you
partition correctly; you might restrict the total *range* to be
something small enough that it won't take too long.
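
As a rough sketch of date-based partitioning as it works at the moment
(inheritance plus CHECK constraints; the table and column names are
made up for illustration):

-- parent table that queries run against
CREATE TABLE spam_log (
    received_at timestamptz NOT NULL,
    sender      text,
    subject     text
);

-- one child per month, with a CHECK constraint the planner can use
CREATE TABLE spam_log_2010_01 (
    CHECK (received_at >= '2010-01-01' AND received_at < '2010-02-01')
) INHERITS (spam_log);
CREATE INDEX spam_log_2010_01_received_at
    ON spam_log_2010_01 (received_at);

-- with constraint_exclusion = on (or 'partition' on 8.4) set in
-- postgresql.conf, this only scans the January child:
SELECT count(*) FROM spam_log
WHERE received_at >= '2010-01-15' AND received_at < '2010-01-16';

(Inserts have to be routed to the right child, either directly or with
a trigger on the parent.)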

It sounds like you have a number of systems, in which case you might
consider sharding if you get a large number of users (where large is a
whole lot bigger than 25...) or you find that users really do need
real-time results on very large ranges.  It involves a fair bit of code
to do and do well though, so you really have to consider it carefully
and make sure it will help your important use cases (and not too badly
impact your other use cases) before going that route.

autovacuum is your friend..  though you might need to tune it.
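
If the defaults don't keep up with your insert/update rate, per-table
storage parameters are one knob (8.4 syntax; the table name is again
made up):

ALTER TABLE spam_log
    SET (autovacuum_vacuum_scale_factor = 0.05,
         autovacuum_analyze_scale_factor = 0.02);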

    Thanks,

        Stephen

