Thread: Optimizing PostgreSQL config
I'm new to PostgreSQL, but my friend and I are trying to benchmark the database by populating one table with 100,000 records, with each operation touching around 5 columns of a row. The problem is our processing time is very slow, around 30 minutes to 1 hour. At times it just hangs and stops execution. I was just wondering if this was normal.

We structured our PHP code in a way that isn't optimized, in order to isolate possible problems with the database. We also used indices to speed up the process on the database side. Any suggestions on how to optimize PostgreSQL through its config files? Thanks in advance!
Joanne Formoso <joanneformoso@yahoo.com> writes:
> I'm new to PostgreSQL, but my friend and I are trying to benchmark the
> database by populating one table with 100,000 records, with each operation
> touching around 5 columns of a row. The problem is our processing time is
> very slow, around 30 minutes to 1 hour. At times it just hangs and stops
> execution. I was just wondering if this was normal.

Seems mighty slow --- but it's impossible to really tell with this lack of information. Show us the table schema, the actual query, and EXPLAIN's output for the query (or better yet, EXPLAIN ANALYZE), and then you might get some useful advice.

Of course there's the standard newbie mistake: have you run VACUUM and/or ANALYZE on this table?

You might also care to boost the shared_buffers setting, which is unreasonably low out-of-the-box.

> We structured our PHP code in a way that isn't optimized, in order to
> isolate possible problems with the database.

Are you sure it's not the PHP code that's the problem? One way to investigate is to watch "top" while running your test. If the php process is taking more CPU than the postgres backend, then I'd look to fixing the PHP side first ...

			regards, tom lane
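For anyone following along, a minimal PHP sketch of the two checks Tom suggests (VACUUM ANALYZE, then EXPLAIN ANALYZE on a sample query). The table and column names (dbspeed2, pky) are taken from the code Joanne posts later in the thread; the connection parameters are placeholders.

<?php
// Sketch only: refresh statistics, then ask the planner what it actually did.
$connection = pg_connect("dbname=test user=postgres");

// VACUUM ANALYZE reclaims dead rows and updates planner statistics
pg_query($connection, "VACUUM ANALYZE dbspeed2;");

// EXPLAIN ANALYZE runs the query and returns one plan line per result row
$result = pg_query($connection,
    "EXPLAIN ANALYZE SELECT * FROM dbspeed2 WHERE pky = 42;");
while ($row = pg_fetch_row($result)) {
    echo $row[0] . "\n";
}
?>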
Hello Joanne,

On Thu, 1 May 2003 06:02:38 -0700 (PDT), Joanne Formoso <joanneformoso@yahoo.com> wrote:
> I'm new to PostgreSQL, but my friend and I are trying to benchmark the
> database by populating one table with 100,000 records, with each operation
> touching around 5 columns of a row. The problem is our processing time is
> very slow, around 30 minutes to 1 hour. At times it just hangs and stops
> execution. I was just wondering if this was normal.
>
> We structured our PHP code in a way that isn't optimized, in order to
> isolate possible problems with the database. We also used indices to speed
> up the process on the database side. Any suggestions on how to optimize
> PostgreSQL through its config files? Thanks in advance!

Perhaps I don't understand your question well, but I have sometimes (for testing) populated a table of 46 columns and 1,348,215 rows in 6 MINUTES using \copy from a sequential file (on an AMD 900 MHz). It takes more than 50 HOURS if I do that with a "C" program using libpq ... !

And don't create secondary indexes before the table is populated; depending on the number of indexes, that can take ... some days.

Hope this helps,

-- 
Alain Lucari (Eurlix)
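Alain's \copy approach can also be driven from PHP. A hedged sketch, assuming the dbspeed2 layout used later in this thread (pky, a text column, number, number1..number4); pg_copy_from() sends tab-delimited lines through COPY ... FROM STDIN, and the connection parameters are placeholders.

<?php
// Sketch: bulk-load the benchmark rows with a single COPY instead of
// 100,000 individual INSERT statements.
$connection = pg_connect("dbname=test user=postgres");

$rows = array();
for ($i = 1; $i <= 100000; $i++) {
    // one tab-delimited line per row: pky, text, number, number1..number4
    $rows[] = "$i\tThis is row # $i\t" . mt_rand(0, 1000) . "\t0\t0\t0\t0";
}

if (!pg_copy_from($connection, "dbspeed2", $rows)) {
    echo "COPY failed: " . pg_last_error($connection) . "\n";
}
?>

As Alain notes, create the secondary indexes after the load, not before.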
Thanks for all the suggestions! We're really quite new to databases in general, but we are experimenting with PostgreSQL; our knowledge of databases really comes from MySQL. Anyway, we will be moving on to learning PostgreSQL. We modified our old code to 'batch' the processes instead of one whole command. Based on some of the suggestions earlier, here is our modified PHP code:

******************************************************

// ($connection and $time_connect are set up earlier in the script, not shown)

$query = "VACUUM ANALYZE dbspeed2;";
$result = pg_query($connection, $query); // or die(" Error in query: $query. " . pg_last_error($connection));
if (!$result) {
    $time_elapsed = time() - $time_connect;
    echo "Database failure: Error during VACUUM ANALYZE 1<br>Time Elapsed: $time_elapsed <br><br> POSTGRESQL ERROR:<br>" . pg_last_error($connection);
    exit;
}

// insert 100,000 rows, 1000 per transaction
$count = 1;
for ($i = 0; $i < 100; $i++) {
    $query = "BEGIN;";
    for ($j = 0; $j < 1000; $j++) {
        $query = $query . "INSERT INTO dbspeed2 VALUES ($count, 'This is loop pass # $i-$j', random()*1000, 0, 0, 0, 0);";
        $count = $count + 1;
    }
    $query = $query . "COMMIT;";
    $result = pg_query($connection, $query);
    if (!$result) {
        $time_elapsed = time() - $time_connect;
        echo "Database failure: INSERT GROUP $i-$j<br>Time Elapsed: $time_elapsed <br><br> POSTGRESQL ERROR:<br>" . pg_last_error($connection);
        exit;
    }
}

$query = "VACUUM ANALYZE dbspeed2;";
$result = pg_query($connection, $query);
if (!$result) {
    $time_elapsed = time() - $time_connect;
    echo "Database failure: Error during VACUUM ANALYZE 2<br>Time Elapsed: $time_elapsed <br><br> POSTGRESQL ERROR:<br>" . pg_last_error($connection);
    exit;
}

// update every row, 1000 per transaction
$count = 1;
for ($i = 0; $i < 100; $i++) {
    $query = "BEGIN;";
    for ($j = 0; $j < 1000; $j++) {
        $query = $query . "UPDATE dbspeed2 SET number1 = number/4, number2 = number/4, number3 = number/4, number4 = number/4 WHERE pky = $count;";
        $count++;
    }
    $query = $query . "COMMIT;";
    $result = pg_query($connection, $query);
    if (!$result) {
        $time_elapsed = time() - $time_connect;
        echo "Database failure: UPDATE $i-$j<br>Time Elapsed: $time_elapsed <br><br> POSTGRESQL ERROR:<br>" . pg_last_error($connection);
        exit;
    }
}

******************************************************

The code above executes in about 20 minutes. We have placed indexes on the columns that are used by the code. I know the code is a little long... but is there any way to improve the code and PostgreSQL's performance? The basic goal of this code is to gauge how fast PostgreSQL will run given high-volume data. I'll post the config file of PostgreSQL in a minute. Once again, thanks in advance!

Regards,
Joanne Formoso
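A side note on the second loop: if the goal is simply to update every row, the 100,000 single-row UPDATEs can be collapsed into one set-based statement and the server does the work in a single pass. A minimal sketch (only relevant if per-row round trips are not the thing being benchmarked; connection parameters are placeholders):

<?php
// Sketch: one set-based UPDATE instead of 100,000 "WHERE pky = ..." statements.
$connection = pg_connect("dbname=test user=postgres");

$result = pg_query($connection,
    "UPDATE dbspeed2 SET number1 = number/4, number2 = number/4,
                         number3 = number/4, number4 = number/4;");
if (!$result) {
    echo "UPDATE failed: " . pg_last_error($connection) . "\n";
}
?>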
Hello,

Attached to this email is the configuration file that we are currently using for PostgreSQL (postgresql.txt). We are using the Apache web server together with PHP and PostgreSQL 7.3.2. Our test server has the following specs: AMD Athlon 1.2 GHz with 128 MB of RAM. Any help and suggestions on the matter would be very much appreciated. Thank you.

Regards,
Joanne Formoso

[Attachment: postgresql.txt]

#
# PostgreSQL configuration file
# -----------------------------
#
# This file consists of lines of the form:
#
#   name = value
#
# (The '=' is optional.) White space may be used. Comments are introduced
# with '#' anywhere on a line. The complete list of option names and
# allowed values can be found in the PostgreSQL documentation. The
# commented-out settings shown in this file represent the default values.
#
# Any option can also be given as a command line switch to the
# postmaster, e.g. 'postmaster -c log_connections=on'. Some options
# can be changed at run-time with the 'SET' SQL command.
#
# This file is read on postmaster startup and when the postmaster
# receives a SIGHUP. If you edit the file on a running system, you have
# to SIGHUP the postmaster for the changes to take effect, or use
# "pg_ctl reload".

#========================================================================

#
# Connection Parameters
#
tcpip_socket = true
#ssl = false

max_connections = 64
#superuser_reserved_connections = 2

port = 5432
#hostname_lookup = false
#show_source_port = false

#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777   # octal

#virtual_host = ''

#krb_server_keyfile = ''

#
# Shared Memory Size
#
shared_buffers = 11207            # min max_connections*2 or 16, 8KB each
#max_fsm_relations = 1000         # min 10, fsm is free space map, ~40 bytes
#max_fsm_pages = 10000            # min 1000, fsm is free space map, ~6 bytes
#max_locks_per_transaction = 64   # min 10
#wal_buffers = 8                  # min 4, typically 8KB each

#
# Non-shared Memory Sizes
#
sort_mem = 32168                  # min 64, size in KB
#vacuum_mem = 8192                # min 1024, size in KB

#
# Write-ahead log (WAL)
#
#checkpoint_segments = 3          # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 300         # range 30-3600, in seconds
#
#commit_delay = 0                 # range 0-100000, in microseconds
#commit_siblings = 5              # range 1-1000
#
fsync = false
#wal_sync_method = fsync          # the default varies across platforms:
                                  # fsync, fdatasync, open_sync, or open_datasync
#wal_debug = 0                    # range 0-16

#
# Optimizer Parameters
#
#enable_seqscan = true
#enable_indexscan = true
#enable_tidscan = true
#enable_sort = true
#enable_nestloop = true
#enable_mergejoin = true
#enable_hashjoin = true

#effective_cache_size = 1000      # typically 8KB each
#random_page_cost = 4             # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01            # (same)
#cpu_index_tuple_cost = 0.001     # (same)
#cpu_operator_cost = 0.0025       # (same)

#default_statistics_target = 10   # range 1-1000

#
# GEQO Optimizer Parameters
#
#geqo = true
#geqo_selection_bias = 2.0        # range 1.5-2.0
#geqo_threshold = 11
#geqo_pool_size = 0               # default based on tables in statement,
                                  # range 128-1024
#geqo_effort = 1
#geqo_generations = 0
#geqo_random_seed = -1            # auto-compute seed

#
# Message display
#
#server_min_messages = notice     # Values, in order of decreasing detail:
                                  #   debug5, debug4, debug3, debug2, debug1,
                                  #   info, notice, warning, error, log, fatal,
                                  #   panic
#client_min_messages = notice     # Values, in order of decreasing detail:
                                  #   debug5, debug4, debug3, debug2, debug1,
                                  #   log, info, notice, warning, error
#silent_mode = false

#log_connections = false
#log_pid = false
#log_statement = false
#log_duration = false
#log_timestamp = false

#log_min_error_statement = error  # Values in order of increasing severity:
                                  #   debug5, debug4, debug3, debug2, debug1,
                                  #   info, notice, warning, error, panic(off)

#debug_print_parse = false
#debug_print_rewritten = false
#debug_print_plan = false
#debug_pretty_print = false

#explain_pretty_print = true

# requires USE_ASSERT_CHECKING
#debug_assertions = true

#
# Syslog
#
#syslog = 0                       # range 0-2
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'

#
# Statistics
#
#show_parser_stats = false
#show_planner_stats = false
#show_executor_stats = false
#show_statement_stats = false

# requires BTREE_BUILD_STATS
#show_btree_build_stats = false

#
# Access statistics collection
#
#stats_start_collector = true
#stats_reset_on_server_start = true
#stats_command_string = false
#stats_row_level = false
#stats_block_level = false

#
# Lock Tracing
#
#trace_notify = false

# requires LOCK_DEBUG
#trace_locks = false
#trace_userlocks = false
#trace_lwlocks = false
#debug_deadlocks = false
#trace_lock_oidmin = 16384
#trace_lock_table = 0

#
# Misc
#
#autocommit = true
#dynamic_library_path = '$libdir'
#search_path = '$user,public'
#datestyle = 'iso, us'
#timezone = unknown               # actually, defaults to TZ environment setting
#australian_timezones = false
#client_encoding = sql_ascii      # actually, defaults to database encoding
#authentication_timeout = 60      # 1-600, in seconds
#deadlock_timeout = 1000          # in milliseconds
#default_transaction_isolation = 'read committed'
#max_expr_depth = 10000           # min 10
#max_files_per_process = 1000     # min 25
#password_encryption = true
#sql_inheritance = true
#transform_null_equals = false
#statement_timeout = 0            # 0 is disabled, in milliseconds
#db_user_namespace = false

#
# Locale settings
#
# (initialized by initdb -- may be changed)
LC_MESSAGES = 'en_US.iso885915'
LC_MONETARY = 'en_US.iso885915'
LC_NUMERIC = 'en_US.iso885915'
LC_TIME = 'en_US.iso885915'
On Fri, 2 May 2003 00:28:34 -0700 (PDT), Joanne Formoso <joanneformoso@yahoo.com> wrote:
> We are using the Apache web server together with PHP and PostgreSQL 7.3.2.
> Our test server has the following specs: AMD Athlon 1.2 GHz with 128 MB of
> RAM.

Who sold you a 1.2 GHz machine with 128 MB? Next time better buy 600 MHz and 256 MB RAM ;-) So the short answer is: buy more memory. OTOH you *can* configure Postgres to run satisfactorily on small hardware.

> shared_buffers = 11207            # min max_connections*2 or 16, 8KB each

This means almost 90 MB, which is way too much for your small machine. I'd recommend shared_buffers = 1000.

> sort_mem = 32168                  # min 64, size in KB

32 MB sort mem together with 90 MB shared memory, so a single sort can cause your system to start swapping, not to mention Apache and other processes running on that system ...

> fsync = false

Only do this if you don't care for consistency after a crash.

> LC_MESSAGES = 'en_US.iso885915'
> LC_MONETARY = 'en_US.iso885915'
> LC_NUMERIC = 'en_US.iso885915'
> LC_TIME = 'en_US.iso885915'

This may slow down certain kinds of queries; locale C seems to be a better choice if you have speed in mind. Search the archives for details.

Servus
 Manfred
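Pulled together, Manfred's advice amounts to roughly the following postgresql.conf changes; this is a sketch, and the sort_mem value is illustrative (borrowed from his later suggestion for a larger machine), not something he specifies for this box:

shared_buffers = 1000    # about 8 MB of shared buffers instead of ~90 MB
sort_mem = 4096          # per-sort memory in KB; far less than the current 32 MB
fsync = true             # keep fsync on unless losing consistency after a crash is acceptable

Note that in 7.3 the collation locale is fixed at initdb time, so switching to locale C generally means re-initializing the database cluster rather than just editing this file.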
Joanne,

> Attached to this email is the configuration file that we are currently
> using for PostgreSQL (postgresql.txt). We are using the Apache web server
> together with PHP and PostgreSQL 7.3.2. Our test server has the following
> specs: AMD Athlon 1.2 GHz with 128 MB of RAM. Any help and suggestions on
> the matter would be very much appreciated. Thank you.

Buy more RAM. In my experience, Apache tends to use about 128 MB by itself under moderate concurrent user load, causing it to fight with PostgreSQL for RAM. I think you'll find that doubling your RAM will quintuple your PostgreSQL performance on large queries.

-- 
-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________
                                     Josh Berkus
  Complete information technology    josh@agliodbs.com
   and data management solutions     (415) 565-7293
  for law firms, small businesses    fax 621-2533
   and non-profit organizations.     San Francisco
On Fri, 02 May 2003 16:22:00 +0200, Manfred Koizar <mkoi-pg@aon.at> wrote:

> Who sold you a 1.2 GHz machine with 128 MB? Next time better buy 600 MHz
> and 256 MB RAM ;-) So the short answer is: buy more memory. OTOH you *can*
> configure Postgres to run satisfactorily on small hardware.

I agree with you, memory seems to be more important than CPU speed. (But it's not easy nowadays to find less than 4 GHz of CPU speed at the street corner.) Do you have an idea how to configure an AMD 800 MHz with 512 MB of RAM for <= 16 users?

> 32 MB sort mem together with 90 MB shared memory, so a single sort can
> cause your system to start swapping, not to mention Apache and other
> processes running on that system ...

Without Apache or others ... for now.

>> fsync = false
>
> Only do this if you don't care for consistency after a crash.

Really dangerous with a Mylex DAC960 AcceleRAID ... which seems to sync whenever it wants?

>> LC_MESSAGES = 'en_US.iso885915'
>> LC_MONETARY = 'en_US.iso885915'
>> LC_NUMERIC = 'en_US.iso885915'
>> LC_TIME = 'en_US.iso885915'
>
> This may slow down certain kinds of queries; locale C seems to be a better
> choice if you have speed in mind. Search the archives for details.

Can you explain a little more, or give the archive references?

Many thanks for the answers,

-- 
Alain Lucari (Eurlix)
Thanks for all the tips! I'll check out the database again when I get back to work tomorrow. Most likely I'll have some follow-ups, but thanks again!

Regards,
Joanne Formoso
On Fri, 2 May 2003 22:55:15 +0200, eurlix <eurlix@libertysurf.fr> wrote:
> I agree with you, memory seems to be more important than CPU speed. (But
> it's not easy nowadays to find less than 4 GHz of CPU speed at the street
> corner.)

:-) and it's equally hard to find a new PC with less than 1 GB <g>.

> Do you have an idea how to configure an AMD 800 MHz with 512 MB of RAM
> for <= 16 users?

Without any knowledge about your database size and usage patterns, only vague ideas. I'd start with

	shared_buffers = 1000
	sort_mem = 4096
	effective_cache_size = 50000

>> This may slow down certain kinds of queries; locale C seems to be a
>> better choice if you have speed in mind. Search the archives for details.
>
> Can you explain a little more, or give the archive references?

Google for something like

	locale index like
or
	default locale considered harmful

Servus
 Manfred
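For scale, using the units noted in the configuration file comments: shared_buffers = 1000 is 1000 x 8 KB, about 8 MB of shared buffers; sort_mem = 4096 is 4 MB per sort step in each backend; and effective_cache_size = 50000 is 50000 x 8 KB, roughly 400 MB, which is only a hint to the planner about how much data the operating system is likely to cache, not memory that PostgreSQL allocates itself.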