Thread: Need advice on postgresql.conf settings

Need advice on postgresql.conf settings

From: "Shane | SkinnyCorp" <shanew@skinnycorp.com>
Date: Nov 9, 2004

Thanks in advance for anything you can do to help.


The real issue is this: we have THE SAME queries taking anywhere from .001
to 90.0 seconds... the server is using 98% of the available RAM at all
times (because of the persistent connections via PHP), and I don't know
what to do.  Every time I change a .conf setting I feel like it slows
things down even more... and I can't seem to find the balance.  I'll give
you everything I've got, and I hope to god someone can point out some
areas where I could improve the speed of queries overall.

For months I've been optimizing my queries, and working around places
where I don't need them.  They are so incredibly honed, I couldn't even
begin to explain... and when there are fewer than 10 users browsing our
sites, they are LIGHTNING fast... even with 5x the amount of dummy data in
the database(s)... so basically the LARGEST factor in this whole
performance issue that I can find is the number of users browsing the
sites at any given time... but lowering shared_buffers to raise
max_connections is hurting performance immensely... so I'm totally
lost... please help!!  Bless you!


THE DETAILS:


(For the databases, I'll list only the 'main' tables, as the others are
fairly small.)
Database 1:
    5000 'users'
    20,000 'threads'
    500,000 'posts'
    ...
Database 2: (just starting out)
    150 'users'
    150 'entries'
    ...

Hardware:
    Pentium 4, 2.44GHz
    1.5GB RAM
    7200RPM SATA disk

Software:
    Red Hat Linux (kernel 2.4.21-9.EL)
    PostgreSQL 7.4.2
    PHP 4.3.6 (using persistent connections to pgsql)

Usage:
    uptime: 12:23:08 up 132 days, 19:16, 2 users, load average: 19.75, 17.34, 18.86
    roughly 100-200 users connected to our server at any given moment
    roughly 10-15 queries per HTTP page load


----------------------------------------------------------

# -----------------------------
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form:
#
#   name = value
#
# (The '=' is optional.) White space may be used. Comments are introduced
# with '#' anywhere on a line. The complete list of option names and
# allowed values can be found in the PostgreSQL documentation. The
# commented-out settings shown in this file represent the default values.
#
# Any option can also be given as a command line switch to the
# postmaster, e.g. 'postmaster -c log_connections=on'. Some options
# can be changed at run-time with the 'SET' SQL command.
#
# This file is read on postmaster startup and when the postmaster
# receives a SIGHUP. If you edit the file on a running system, you have
# to SIGHUP the postmaster for the changes to take effect, or use
# "pg_ctl reload".


#---------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#---------------------------------------------------------------------------

# - Connection Settings -

tcpip_socket = true
max_connections = 75
        # note: increasing max_connections costs about 500 bytes of shared
        # memory per connection slot, in addition to costs from
        # shared_buffers and max_locks_per_transaction.
#superuser_reserved_connections = 2
port = 5432
#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777 # octal
#virtual_host = ''              # what interface to listen on; defaults to any
#rendezvous_name = ''           # defaults to the computer name

# - Security & Authentication -

#authentication_timeout = 60    # 1-600, in seconds
ssl = true
password_encryption = true
#krb_server_keyfile = ''
#db_user_namespace = false


#---------------------------------------------------------------------------
# RESOURCE USAGE (except WAL)
#---------------------------------------------------------------------------

# - Memory -


shared_buffers = 8192           # min 16, at least max_connections*2, 8KB each
sort_mem = 8192                 # min 64, size in KB
vacuum_mem = 4096               # min 1024, size in KB

# - Free Space Map -

max_fsm_pages = 20000           # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 1000        # min 100, ~50 bytes each

# - Kernel Resource Usage -

max_files_per_process = 3052    # min 25
#preload_libraries = ''


#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------

# - Settings -

fsync = true                    # turns forced synchronization on or off
#wal_sync_method = fsync        # the default varies across platforms:
                                # fsync, fdatasync, open_sync, or open_datasync
wal_buffers = 192               # min 4, 8KB each

# - Checkpoints -

#checkpoint_segments = 3        # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300       # range 30-3600, in seconds
#checkpoint_warning = 30        # 0 is off, in seconds
#commit_delay = 0               # range 0-100000, in microseconds
#commit_siblings = 5            # range 1-1000


#---------------------------------------------------------------------------
# QUERY TUNING
#---------------------------------------------------------------------------

# - Planner Method Enabling -

#enable_hashagg = true
#enable_hashjoin = true
#enable_indexscan = true
#enable_mergejoin = true
#enable_nestloop = true
enable_seqscan = false
#enable_sort = true
#enable_tidscan = true

# - Planner Cost Constants -

effective_cache_size = 131072   # typically 8KB each

random_page_cost = 2            # units are one sequential page fetch cost



cpu_tuple_cost = .01            # (same) default .01
cpu_index_tuple_cost = .001     # (same) default .001
cpu_operator_cost = 0.0025      # (same) default .0025

# - Genetic Query Optimizer -

geqo = true
geqo_threshold = 20
#geqo_effort = 1
#geqo_generations = 0
#geqo_pool_size = 0             # default based on tables in statement,
                                # range 128-1024
#geqo_selection_bias = 2.0      # range 1.5-2.0

# - Other Planner Options -

#default_statistics_target = 10 # range 1-1000
#from_collapse_limit = 8
#join_collapse_limit = 8        # 1 disables collapsing of explicit JOINs


#---------------------------------------------------------------------------
# ERROR REPORTING AND LOGGING
#---------------------------------------------------------------------------

# - Syslog -

#syslog = 0                     # range 0-2; 0=stdout; 1=both; 2=syslog
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'

# - When to Log -

client_min_messages = error     # Values, in order of decreasing detail:
                                #   debug5, debug4, debug3, debug2, debug1,
                                #   log, info, notice, warning, error

log_min_messages = error        # Values, in order of decreasing detail:
                                #   debug5, debug4, debug3, debug2, debug1,
                                #   info, notice, warning, error, log, fatal,
                                #   panic

log_error_verbosity = default   # terse, default, or verbose messages

log_min_error_statement = panic # Values in order of increasing severity:
                                #   debug5, debug4, debug3, debug2, debug1,
                                #   info, notice, warning, error, panic(off)

log_min_duration_statement = -1 # Log all statements whose execution time
                                # exceeds the value, in milliseconds.
                                # Zero prints all queries.  Minus-one disables.

#silent_mode = false             # DO NOT USE without Syslog!

# - What to Log -

debug_print_parse = false
debug_print_rewritten = false
debug_print_plan = false
debug_pretty_print = false
log_connections = false
log_duration = false
log_pid = false
log_statement = false
log_timestamp = false
log_hostname = false
log_source_port = false


#---------------------------------------------------------------------------
# RUNTIME STATISTICS
#---------------------------------------------------------------------------

# - Statistics Monitoring -

log_parser_stats = false
log_planner_stats = false
log_executor_stats = false
log_statement_stats = false

# - Query/Index Statistics Collector -

stats_start_collector = false
stats_command_string = false
stats_block_level = false
stats_row_level = false
stats_reset_on_server_start = false


#---------------------------------------------------------------------------
# CLIENT CONNECTION DEFAULTS
#---------------------------------------------------------------------------

# - Statement Behavior -

#search_path = '$user,public'   # schema names
#check_function_bodies = true
#default_transaction_isolation = 'read committed'
#default_transaction_read_only = false
#statement_timeout = 0          # 0 is disabled, in milliseconds

# - Locale and Formatting -

#datestyle = 'iso, mdy'
#timezone = unknown             # actually, defaults to TZ environment setting
#australian_timezones = false
#extra_float_digits = 0         # min -15, max 2
#client_encoding = sql_ascii    # actually, defaults to database encoding

# These settings are initialized by initdb -- they may be changed
lc_messages = 'en_US.UTF-8'             # locale for system error message strings
lc_monetary = 'en_US.UTF-8'             # locale for monetary formatting
lc_numeric = 'en_US.UTF-8'              # locale for number formatting
lc_time = 'en_US.UTF-8'                 # locale for time formatting

# - Other Defaults -

explain_pretty_print = true
#dynamic_library_path = '$libdir'
#max_expr_depth = 10000         # min 10


#---------------------------------------------------------------------------
# LOCK MANAGEMENT
#---------------------------------------------------------------------------

#deadlock_timeout = 1000        # in milliseconds
#max_locks_per_transaction = 64 # min 10, ~260*max_connections bytes each


#---------------------------------------------------------------------------
# VERSION/PLATFORM COMPATIBILITY
#---------------------------------------------------------------------------

# - Previous Postgres Versions -

#add_missing_from = true
regex_flavor = advanced         # advanced, extended, or basic
#sql_inheritance = true

# - Other Platforms & Clients -

#transform_null_equals = false


------
I've been tweaking the postgresql.conf file for about 5 hours... just
today.  We've had problems in the past (and I've also emailed this list in
the past, but perhaps I failed to ask the right questions)...

I guess I need some help with the postgresql configuration file.  I'd like
to start off by asking that you not link me to the same basic .conf
overview; what I really need at this point is real-world experience and
wisdom, as opposed to cold, poorly documented, and incredibly abstract
(trial-and-error) manual entries.


Re: Need advice on postgresql.conf settings

From: Tom Lane

"Shane | SkinnyCorp" <shanew@skinnycorp.com> writes:
> The real issue is this, we have THE SAME queries taking anywhere from .001 -
> 90.0 seconds... the server is using 98% of the available RAM at all times
> (because of the persistant connections via php), and I don't know what to
> do.

I have a feeling that the answer is going to boil down to "buy more RAM"
--- it sounds a lot like you're just overstressing your server.  The
more active backends you have, the more RAM goes to process-local
memory, and the less is available for kernel disk cache.  Even if you
don't go into outright swapping, the amount of disk I/O needed goes up
the smaller the kernel disk cache gets.

Another possible line of attack is to use persistent (pooled)
connections to cut down the number of live backend processes you need.
However, depending on what your application software is, that might
take more time/effort (= money) than dropping in some more RAM.

You can investigate this theory by watching "top" output (the first few
lines about memory usage, not the process listing) as well as "vmstat"
output.
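
For example, something like this while the site is busy (the columns to
watch are noted as comments):

$ vmstat 5
#   si/so      : swap-in/swap-out; nonzero means you really are swapping
#   bi/bo      : blocks in/out; sustained high values suggest disk I/O pressure
#   free/cache : what's left over for the kernel disk cache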

>     uptime: 12:23:08 up 132 days, 19:16, 2 users, load average: 19.75, 17.34, 18.86

Load averages approaching 20 are not good either ... what sort of box
are you running on anyway?

As for the postgresql.conf settings, the only ones I'd seriously question are

max_fsm_pages = 20000           # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 1000        # min 100, ~50 bytes each

These are the defaults, and are probably too small for a DB exceeding a
hundred meg or so.

max_files_per_process = 3052    # min 25

You really have your kernel set to support 3052 * 75 simultaneously open
files?  Back this off.  I doubt values beyond a couple hundred buy
anything except headaches.

wal_buffers = 192

This is an order-of-magnitude overkill too, especially if your
transactions are mostly small.  I know it's only a megabyte or two,
but you evidently need that RAM more elsewhere.

enable_seqscan = false

I don't think this is a good idea in general.
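
Put together, the changes questioned above might end up looking something
like this in postgresql.conf (the numbers are illustrative only -- size
the FSM settings against your actual database):

max_fsm_pages = 200000          # default 20000 is too small once the DB
                                # is much over a hundred MB
max_fsm_relations = 5000        # default 1000
max_files_per_process = 200     # "a couple hundred", not 3052
wal_buffers = 16                # 192 is order-of-magnitude overkill
enable_seqscan = true           # let the planner choose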

            regards, tom lane

Re: Need advice on postgresql.conf settings

From: Sean Chittenden

>> The real issue is this: we have THE SAME queries taking anywhere from
>> .001 to 90.0 seconds... the server is using 98% of the available RAM
>> at all times (because of the persistent connections via PHP), and I
>> don't know what to do.
>
> Another possible line of attack is to use persistent (pooled)
> connections to cut down the number of live backend processes you need.
> However, depending on what your application software is, that might
> take more time/effort (= money) than dropping in some more RAM.

This particular feature is pure evilness.  Using all of my fingers and
toes, I can't count the number of times I've had a client do this and
get themselves into a world of hurt.  Somewhere in the PHP
documentation, there should be a big warning wrapped in the blink tag
that steers people away from setting this.  The extra time necessary to
set up a TCP connection is less than the performance drag induced on the
backend when persistent connections are enabled.  Reread that last
sentence until it sinks in.  On a local network, this is a premature
optimization that's hurting you.

> max_files_per_process = 3052    # min 25
>
> You really have your kernel set to support 3052 * 75 simultaneously open
> files?  Back this off.  I doubt values beyond a couple hundred buy
> anything except headaches.

This, on the other hand, has made a large difference for me.  The time
necessary to complete open(2) calls can be expensive, especially when
the database is poorly designed and a backend is touching many different
parts of the database spread across multiple files.  3000 is high, but
I've found 500 to be vastly too low in some cases... in others, it's
just fine.  My rule of thumb has become: if you're doing lots of
aggregate functions (e.g. SUM(), COUNT()) more than once in the lifetime
of a backend, increasing this value helps... otherwise it buys you
little (where it does apply, 1500 is generally sufficient).  Faster I/O,
however, is going to save you here.  If you can, increase your disk
caching in the OS.  On FreeBSD, increase your KVA_PAGES and NBUF.  Since
you've freed up more RAM by disabling persistent connections, this
shouldn't be a problem.
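
In .conf terms, that rule of thumb is just (the value is illustrative):

max_files_per_process = 1500    # backends running many aggregates;
                                # otherwise a few hundred is plenty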

--
Sean Chittenden


Re: Need advice on postgresql.conf settings

From: Jeff Trout <jeff@jefftrout.com>

On Nov 9, 2004, at 2:01 PM, Shane | SkinnyCorp wrote:

> Thanks in advance for anything you can do to help.
>
>
> The real issue is this: we have THE SAME queries taking anywhere from
> .001 to 90.0 seconds... the server is using 98% of the available RAM at
> all times (because of the persistent connections via PHP), and I don't
> know what to do.  Every time I change a [...]

I'd strongly recommend ditching pconnect and using pgpool + regular
connect.  It's a terrific combination that provides pooled connections
the way you'd think they should work (a pool of N connections to PG
shared by Y processes, instead of a 1:1 mapping).
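
On the PHP side the switch is tiny.  Assuming the pages currently call
pg_pconnect(), something like this (the connection-string values here are
made up; 9999 is pgpool's usual default port):

// before: one persistent backend held open per Apache child
$db = pg_pconnect("host=localhost port=5432 dbname=mydb user=www");

// after: a regular connection to pgpool, which multiplexes a small
// pool of real backends behind the scenes
$db = pg_connect("host=localhost port=9999 dbname=mydb user=www");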

Curiously, have you noticed any pattern to the slowdowns?  They could be
induced by a checkpoint or a vacuum.

Are you swapping at all?

Are your PHP scripts leaking at all, etc.?

Your load average is high; how does your CPU idle look?  (If load is
high and the CPUs are mostly idle, that's an indicator of being I/O
bound.)

good luck.

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/


How to speed-up inserts with jdbc

From: Michael Kleiser <mkl@webde-ag.de>
Date: Nov 10, 2004

In PostgreSQL 7.2.2 on Linux 2.4.27 (dual-processor Pentium III 900MHz),
we have this table:

create table testtable (
    id        SERIAL PRIMARY KEY,
    coni      VARCHAR(255),
    date      TIMESTAMP,
    direction VARCHAR(255),
    partner   VARCHAR(255),
    type      VARCHAR(255),
    block     VARCHAR(255)
);


We are using Java with the JDBC driver pg72jdbc2.jar.


Our Java test program is:


import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Date;
import java.util.Properties;

public class Stresser implements Runnable {

    // Connection settings.  (The original mail loaded these elsewhere;
    // the values below are placeholders so the program compiles.)
    private static final Properties prop = new Properties();
    static {
        prop.setProperty("host", "localhost");
        prop.setProperty("port", "5432");
        prop.setProperty("dbname", "test");
        prop.setProperty("user", "postgres");
        prop.setProperty("pwd", "");
    }

    public void run() {
        System.out.println("-> start");
        try {
            Class.forName("org.postgresql.Driver");
            Connection con = DriverManager.getConnection(
                    "jdbc:postgresql://" + prop.getProperty("host") + ":"
                        + prop.getProperty("port") + "/" + prop.getProperty("dbname"),
                    prop.getProperty("user"),
                    prop.getProperty("pwd"));
            con.setAutoCommit(true);   // each insert commits individually
            Statement st = con.createStatement();
            java.sql.Timestamp datum = new java.sql.Timestamp(new Date().getTime());
            Date start = new Date();
            System.out.println(start);
            for (int i = 0; i < 100; ++i) {
                // nb: the test inserts into 'history', not the 'testtable' above
                st.executeUpdate("insert into history(uuid,coni,date,direction,partner,type)"
                        + " values('uuid','content','" + datum + "','dir','partner','type')");
            }
            Date end = new Date();
            System.out.println(end);
            con.close();
        } catch (Exception e) {
            System.out.println("Exception!");
            e.printStackTrace();
        }
        System.out.println("-> ende");
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; ++i) {
            Stresser s = new Stresser();
            Thread t = new Thread(s);
            t.start();
        }
    }
}


It runs in 10 threads, and each thread makes 100 inserts.

For the 1000 inserts (10 threads x 100 inserts each)
we need 8 seconds.
That's 125 inserts/second.

How could we make it faster?

Inserting 1000 rows via INSERT ... SELECT is much faster.
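
(For comparison, the set-oriented form is a single statement and a single
transaction; the staging table here is hypothetical:

insert into history (uuid, coni, date, direction, partner, type)
select uuid, coni, date, direction, partner, type
from history_staging;
)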

regards
   Michael

Re: How to speed-up inserts with jdbc

From: Jeff Trout <jeff@jefftrout.com>

On Nov 10, 2004, at 8:51 AM, Michael Kleiser wrote:
> It runs in 10 threads, and each thread makes 100 inserts.
>
> For the 1000 inserts (10 threads x 100 inserts each)
> we need 8 seconds.
> That's 125 inserts/second.
> How could we make it faster?
>

Batch the inserts up into a transaction.

So you'd have
BEGIN
insert
insert
insert
...
COMMIT

Your numbers will suddenly skyrocket.
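
In JDBC that's just autocommit off plus a single commit() -- a minimal
sketch of the loop from the test program above, batched:

con.setAutoCommit(false);        // stop committing after every statement
Statement st = con.createStatement();
for (int i = 0; i < 100; ++i) {
    st.executeUpdate("insert into history(uuid,coni,date,direction,partner,type)"
            + " values('uuid','content','" + datum + "','dir','partner','type')");
}
con.commit();                    // one commit for the whole batch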

--
Jeff Trout <jeff@jefftrout.com>
http://www.jefftrout.com/
http://www.stuarthamm.net/


Re: How to speed-up inserts with jdbc

From: Dave Cramer

A couple of things:

1) That is a fairly old version of Postgres; there have been considerable
performance improvements in the two releases since, and even more in the
pending release.
2) If you are going to insert more rows than that, consider dropping the
index before the insert and recreating it afterwards -- see the sketch
below.
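
For example (the index name is hypothetical):

drop index testtable_coni_idx;
-- ... bulk insert here ...
create index testtable_coni_idx on testtable (coni);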

Dave

Michael Kleiser wrote:

> In PostgreSQL 7.2.2 on Linux 2.4.27 (dual-processor Pentium III 900MHz),
> we have this table: [create table and Java test program snipped; see the
> original post above]
>
> For the 1000 inserts (10 threads x 100 inserts each)
> we need 8 seconds.
> That's 125 inserts/second.
>
> How could we make it faster?
>
> Inserting 1000 rows via INSERT ... SELECT is much faster.

--
Dave Cramer
http://www.postgresintl.com
519 939 0336
ICQ#14675561


Re: How to speed-up inserts with jdbc

From: Edwin Eyan Moragas
Date: Nov 11, 2004

On Wed, 10 Nov 2004 14:51:57 +0100, Michael Kleiser <mkl@webde-ag.de> wrote:
>     Statement st = con.createStatement();
>     java.sql.Timestamp datum = new java.sql.Timestamp(new Date().getTime());
>     Date start = new Date();
>     System.out.println(start);
>     for (int i = 0; i < 100; ++i) {
>         st.executeUpdate("insert into history(uuid,coni,date,direction,partner,type)"
>                 + " values('uuid','content','" + datum + "','dir','partner','type')");
>     }

How about using a PreparedStatement?  That's on the Java end.  On the PG
end, maybe do a BEGIN before the for loop and an END at the end of the
for loop.
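
A minimal sketch of the java.sql.PreparedStatement version, reusing the
same hypothetical history table and values as the test program:

PreparedStatement ps = con.prepareStatement(
        "insert into history(uuid,coni,date,direction,partner,type)"
        + " values (?, ?, ?, ?, ?, ?)");
for (int i = 0; i < 100; ++i) {
    ps.setString(1, "uuid");
    ps.setString(2, "content");
    ps.setTimestamp(3, datum);       // the java.sql.Timestamp from above
    ps.setString(4, "dir");
    ps.setString(5, "partner");
    ps.setString(6, "type");
    ps.executeUpdate();              // values are bound, not string-concatenated
}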
--
i'm not flying. i'm falling... in style.

Re: How to speed-up inserts with jdbc

From: "Steinar H. Gunderson" <sgunderson@bigfoot.com>
Date: Nov 11, 2004

On Thu, Nov 11, 2004 at 04:04:06PM +0800, Edwin Eyan Moragas wrote:
> How about using a PreparedStatement?  That's on the Java end.  On the PG
> end, maybe do a BEGIN before the for loop and an END at the end of the
> for loop.

You don't even need a "BEGIN" and "END"; his code has a setAutoCommit(true)
before the for loop, which just has to be changed to setAutoCommit(false)
(and an explicit commit() added after the for loop, of course).

/* Steinar */
--
Homepage: http://www.sesse.net/

Re: How to speed-up inserts with jdbc

From: Edwin Eyan Moragas

On Thu, 11 Nov 2004 11:04:18 +0100, Steinar H. Gunderson
<sgunderson@bigfoot.com> wrote:
> You don't even need a "BEGIN" and "END"; his code has a setAutoCommit(true)
> before the for loop, which just has to be changed to setAutoCommit(false)
> (and an explicit commit() added after the for loop, of course).

amen. i stand corrected.

-eem