Thread: 500 "too many non super user connections". Guilty parameter: "statement_timeout = 0"
500 "too many non super user connections". Guilty parameter: "statement_timeout = 0"
From
Enric Naval
Date:
Hello: Warning: this both a ranting and an advice to novices about stupid configuration parameter "statement_timeout", which is by default set to "0", rendering thus statements inmortal, and increasing up the number of connections occupied by days-old unused statements. If you think I am wrong, please correct me after reading on. Problems aroused after switching from postgres 6.2, where everything worked fine, to postgres 7.3 (details later). So I had no old postgresql.conf to relay on. I lost the exact wording of the 500 error, but was like this: "too many non super user connections" Now, this was an error rendered by our JSP server (caucho.resin-3.0.3, which makes roughly same things as Tomcat), but the source of the error is obviously the database backend (postgresql-server-7.3.2-3 RedHat RPM), as the JSP configuration has no parameter named like that, and postgres does have one like that. OS is RedHat 9.0 totally up to date running 24 hours in a Pentium 2.4 GHz, 512 Mg RAM, 80 GB hard disk, so it is not a perfomance problem. Server is standalone JSP+Postgres. JSP pages use these instructions to open and close the database. DRIVER = "org.postgresql.Driver"; STRING = "jdbc:postgresql://127.0.0.1:5432/aip"; // get driver and connection Driver DriverPest = (Driver)Class.forName(DRIVER).newInstance(); Connection ConnPest = DriverManager.getConnection(STRING,USERNAME,PASSWORD); // // some statements, prepared statements, etc. // // politely close ResultSets and connections (NOT IN // ALL FILES, SOME FILES "FORGET" TO CLOSE THE // CONNECTION, surely the problem starts there) rstPest.close(); ConnPest.close(); The 500 error appeared after five/six days in a row of flawless working, so I interrogated the machine before restarting the server, to guess the source problem. Command "netstat -antp" gives me +150 lines the likes of this: tcp 0 0 127.0.0.1:5432 127.0.0.1:43741 ESTABLISHED 10033/postgres: aip tcp 0 0 127.0.0.1:43741 127.0.0.1:5432 ESTABLISHED 19181/java ...and another +150 lines saying CLOSE_WAIT instead of ESTABLISHED, all in paired postgres/java ports. Really slick. This looks like the "java" process (the JSP server) connecting to postgres, and the postgres connecting back to the "java" process. "aip" is a database name. There are only two databases in the server, "aip" and "cel". Some connections have "aip" name, others hace "cel" name, so I know it is not database especific. "ps aux" is very funny (look at the end of the message). Postgres processes are inmortal, and more processes keep happily popping up day after day until they stop, in sunday. The 500 error was first noticed in monday morning, and JSP server process was not restarted until tuesday morning, after half an hour collecting enough evidence to make a forensics analysis. Once the JSP server was stopped, all extra postgres processes and connections desappeared after a few seconds. JSP server was restarted and worked apparently normal. Then I looked over the postgresql.conf file and found the guilty line: statement_timeout = 0 # 0 is disabled, in milliseconds Statements never time out! I changed it to: statement_timeout = 600000 # 0 is disabled, in milliseconds Perhaps a bit too long timeout (10 minutes), but want be on the sure side, and server is fast and new, so what the heck. The JSP server gets only 1500-2000 hits a day. That's 75 to 100 visits. With max_connections=128, it should work fine. Please understand: if all JSP pages closed connections correctly, then *perhaps* this could not happen, but these are legacy pages from long ago, and anyways I can't make myself responsible for every clumsy programmer mistake. The server has to work perfectly *always*. I must say that Apache+PHP+mysql (all latest RPMs from RedHat) was running happily all that time in that server with no problem at all. Oh, well. Thanks for listening to my rant. Corrections welcome. ************************* POSTGRESQL.CONF ************************** # # Connection Parameters # tcpip_socket = true ssl = false max_connections = 128 superuser_reserved_connections = 2 port = 5432 #hostname_lookup = false #show_source_port = false #unix_socket_directory = '' #unix_socket_group = '' #unix_socket_permissions = 0777 # octal #virtual_host = '' #krb_server_keyfile = '' # # Shared Memory Size # shared_buffers = 256 # min max_connections*2 or 16, 8KB each max_fsm_relations = 2000 # min 10, fsm is free space map, ~40 bytes max_fsm_pages = 20000 # min 1000, fsm is free space map, ~6 bytes max_locks_per_transaction = 64 # min 10 wal_buffers = 32 # min 4, typically 8KB each # # Non-shared Memory Sizes # sort_mem = 1024 # min 64, size in KB vacuum_mem = 8192 # min 1024, size in KB # # Write-ahead log (WAL) # #checkpoint_segments = 3 # in logfile segments, min 1, 16MB each #checkpoint_timeout = 300 # range 30-3600, in seconds # #commit_delay = 0 # range 0-100000, in microseconds #commit_siblings = 5 # range 1-1000 # #fsync = true #wal_sync_method = fsync # the default varies across platforms: # # fsync, fdatasync, open_sync, or open_datasync #wal_debug = 0 # range 0-16 # # Optimizer Parameters # #enable_seqscan = true #enable_indexscan = true #enable_tidscan = true #enable_sort = true #enable_nestloop = true #enable_mergejoin = true #enable_hashjoin = true effective_cache_size = 2000 # typically 8KB each #random_page_cost = 4 # units are one sequential page fetch cost #cpu_tuple_cost = 0.01 # (same) #cpu_index_tuple_cost = 0.001 # (same) #cpu_operator_cost = 0.0025 # (same) #default_statistics_target = 10 # range 1-1000 # # GEQO Optimizer Parameters # #geqo = true #geqo_selection_bias = 2.0 # range 1.5-2.0 #geqo_threshold = 11 #geqo_pool_size = 0 # default based on tables in statement, # range 128-1024 #geqo_effort = 1 #geqo_generations = 0 #geqo_random_seed = -1 # auto-compute seed # # Message display # #server_min_messages = notice # Values, in order of decreasing detail: # debug5, debug4, debug3, debug2, debug1, # info, notice, warning, error, log, fatal, # panic #client_min_messages = notice # Values, in order of decreasing detail: # debug5, debug4, debug3, debug2, debug1, # log, info, notice, warning, error #silent_mode = false log_connections = true #log_pid = false #log_statement = false log_duration = true log_timestamp = true log_min_error_statement = error # Values in order of increasing severity: # debug5, debug4, debug3, debug2, debug1, # info, notice, warning, error, panic(off) #debug_print_parse = false #debug_print_rewritten = false #debug_print_plan = false #debug_pretty_print = false #explain_pretty_print = true # requires USE_ASSERT_CHECKING #debug_assertions = true # # Syslog # syslog = 0 # range 0-2 #syslog_facility = 'LOCAL0' syslog_ident = 'postgres' # # Statistics # #show_parser_stats = false #show_planner_stats = false #show_executor_stats = false #show_statement_stats = false # requires BTREE_BUILD_STATS #show_btree_build_stats = false # # Access statistics collection # #stats_start_collector = true #stats_reset_on_server_start = true #stats_command_string = false #stats_row_level = false #stats_block_level = false # # Lock Tracing # #trace_notify = false # requires LOCK_DEBUG #trace_locks = false #trace_userlocks = false #trace_lwlocks = false #debug_deadlocks = false #trace_lock_oidmin = 16384 #trace_lock_table = 0 # # Misc # #autocommit = true #dynamic_library_path = '$libdir' search_path = '$user,public' #datestyle = 'iso, us' #timezone = unknown # actually, defaults to TZ environment setting #australian_timezones = false #client_encoding = sql_ascii # actually, defaults to database encoding #authentication_timeout = 60 # 1-600, in seconds #deadlock_timeout = 1000 # in milliseconds #default_transaction_isolation = 'read committed' #max_expr_depth = 10000 # min 10 #max_files_per_process = 1000 # min 25 password_encryption = true #sql_inheritance = true #transform_null_equals = false statement_timeout = 600000 # 0 is disabled, in milliseconds #db_user_namespace = false # # Locale settings # # (initialized by initdb -- may be changed) LC_MESSAGES = 'ca_ES.UTF-8' LC_MONETARY = 'ca_ES.UTF-8' LC_NUMERIC = 'ca_ES.UTF-8' LC_TIME = 'ca_ES.UTF-8' ****************************** NETSTAT -ATNP OUTPUT ******************************* postgres 680 0.0 0.0 13940 0 ? SW Nov06 0:00 postgres: aipo ai postgres 4641 0.0 0.0 13924 0 ? SW Nov07 0:00 postgres: aipo ai postgres 4642 0.0 0.0 13924 0 ? SW Nov07 0:00 postgres: aipo ai postgres 6410 0.0 0.0 13924 0 ? SW Nov07 0:00 postgres: aipo ai postgres 6411 0.0 0.0 13924 0 ? SW Nov07 0:00 postgres: aipo ai postgres 6412 0.0 0.0 13940 0 ? SW Nov07 0:00 postgres: aipo ai postgres 6414 0.0 0.0 13940 0 ? SW Nov07 0:00 postgres: aipo ai postgres 6416 0.0 0.0 13940 0 ? SW Nov07 0:00 postgres: aipo ai postgres 6417 0.0 0.0 13924 0 ? SW Nov07 0:00 postgres: aipo ai postgres 6418 0.0 0.0 13924 0 ? SW Nov07 0:00 postgres: aipo ai postgres 7720 0.0 0.0 13940 0 ? SW Nov07 0:00 postgres: aipo ai postgres 8275 0.0 0.1 13908 604 ? S Nov07 0:00 postgres: cel cel postgres 8278 0.0 0.1 13908 592 ? S Nov07 0:00 postgres: cel cel postgres 8281 0.0 0.1 13908 548 ? S Nov07 0:00 postgres: cel cel postgres 8526 0.0 0.1 13940 668 ? S Nov07 0:00 postgres: aipo ai postgres 8585 0.0 0.1 13924 576 ? S Nov07 0:00 postgres: aipo ai postgres 8586 0.0 0.1 13924 672 ? S Nov07 0:00 postgres: aipo ai postgres 8587 0.0 0.0 13940 300 ? S Nov07 0:00 postgres: aipo ai postgres 9156 0.0 0.1 13924 532 ? S Nov08 0:00 postgres: aipo ai postgres 9157 0.0 0.1 13924 672 ? S Nov08 0:00 postgres: aipo ai postgres 10028 0.0 0.1 13932 608 ? S Nov08 0:00 postgres: cel cel postgres 10033 0.0 0.1 13924 556 ? S Nov08 0:00 postgres: aipo ai postgres 10034 0.0 0.1 13924 680 ? S Nov08 0:00 postgres: aipo ai postgres 10035 0.0 0.1 13940 676 ? S Nov08 0:00 postgres: aipo ai postgres 10159 0.0 0.1 13932 652 ? S Nov08 0:00 postgres: cel cel postgres 10233 0.0 0.0 13924 496 ? S Nov08 0:00 postgres: aipo ai postgres 10234 0.0 0.1 13924 680 ? S Nov08 0:00 postgres: aipo ai postgres 10235 0.0 0.1 13940 676 ? S Nov08 0:00 postgres: aipo ai postgres 11014 0.0 0.1 13924 608 ? S Nov08 0:00 postgres: aipo ai postgres 11015 0.0 0.1 13924 680 ? S Nov08 0:00 postgres: aipo ai postgres 11016 0.0 0.1 13940 676 ? S Nov08 0:00 postgres: aipo ai postgres 11163 0.0 0.1 13932 584 ? S Nov08 0:00 postgres: cel cel postgres 11479 0.0 0.1 13940 676 ? S Nov08 0:00 postgres: aipo ai postgres 11643 0.0 0.1 13924 636 ? S Nov08 0:00 postgres: aipo ai postgres 11644 0.0 0.1 13924 680 ? S Nov08 0:00 postgres: aipo ai postgres 11645 0.0 0.1 13940 676 ? S Nov08 0:00 postgres: aipo ai postgres 12114 0.0 0.0 13940 500 ? S Nov08 0:00 postgres: cel cel postgres 12124 0.0 0.1 13940 680 ? S Nov08 0:00 postgres: aipo ai postgres 12138 0.0 0.1 13924 672 ? S Nov08 0:00 postgres: aipo ai postgres 12139 0.0 0.1 13924 684 ? S Nov08 0:00 postgres: aipo ai postgres 12140 0.0 0.1 13940 680 ? S Nov08 0:00 postgres: aipo ai postgres 12277 0.0 0.1 13932 660 ? S Nov08 0:00 postgres: cel cel postgres 12282 0.0 0.1 13940 684 ? S Nov08 0:00 postgres: aipo ai postgres 12312 0.0 0.1 13940 684 ? S Nov08 0:00 postgres: aipo ai postgres 12314 0.0 0.1 13924 624 ? S Nov08 0:00 postgres: aipo ai postgres 12315 0.0 0.1 13924 688 ? S Nov08 0:00 postgres: aipo ai postgres 12316 0.0 0.1 13940 684 ? S Nov08 0:00 postgres: aipo ai postgres 12318 0.0 0.1 13932 520 ? S Nov08 0:00 postgres: cel cel postgres 12471 0.0 0.1 13924 556 ? S Nov08 0:00 postgres: aipo ai postgres 12472 0.0 0.1 13924 692 ? S Nov08 0:00 postgres: aipo ai postgres 12473 0.0 0.1 13940 684 ? S Nov08 0:00 postgres: aipo ai postgres 12517 0.0 0.1 13924 604 ? S Nov08 0:00 postgres: aipo ai postgres 12518 0.0 0.1 13924 696 ? S Nov08 0:00 postgres: aipo ai postgres 12519 0.0 0.1 13940 688 ? S Nov08 0:00 postgres: aipo ai postgres 12625 0.0 0.1 13940 632 ? S Nov08 0:00 postgres: aipo ai postgres 12642 0.0 0.1 13940 692 ? S Nov08 0:00 postgres: aipo ai postgres 12657 0.0 0.1 13940 576 ? S Nov08 0:00 postgres: aipo ai postgres 12816 0.0 0.1 13932 676 ? S Nov08 0:00 postgres: cel cel postgres 12824 0.0 0.0 13932 284 ? S Nov08 0:00 postgres: cel cel postgres 12828 0.0 0.0 13924 432 ? S Nov08 0:00 postgres: aipo ai postgres 12829 0.0 0.1 13924 668 ? S Nov08 0:00 postgres: aipo ai postgres 12830 0.0 0.1 13940 696 ? S Nov08 0:00 postgres: aipo ai postgres 12832 0.0 0.1 13940 696 ? S Nov08 0:00 postgres: aipo ai postgres 12839 0.0 0.1 13944 540 ? S Nov08 0:00 postgres: aipo ai postgres 12842 0.0 0.1 13940 696 ? S Nov08 0:00 postgres: aipo ai postgres 12984 0.0 0.1 13924 572 ? S Nov08 0:00 postgres: aipo ai postgres 12985 0.0 0.1 13924 700 ? S Nov08 0:00 postgres: aipo ai postgres 12986 0.0 0.1 13940 660 ? S Nov08 0:00 postgres: aipo ai postgres 12993 0.0 0.1 13924 696 ? S Nov08 0:00 postgres: aipo ai postgres 12994 0.0 0.1 13924 700 ? S Nov08 0:00 postgres: aipo ai postgres 12995 0.0 0.1 13944 692 ? S Nov08 0:00 postgres: aipo ai postgres 12996 0.0 0.1 13928 584 ? S Nov08 0:00 postgres: aipo ai postgres 12997 0.0 0.1 13928 692 ? S Nov08 0:00 postgres: aipo ai postgres 12998 0.0 0.1 13944 688 ? S Nov08 0:00 postgres: aipo ai postgres 12999 0.0 0.1 13928 552 ? S Nov08 0:00 postgres: aipo ai postgres 13000 0.0 0.1 13928 692 ? S Nov08 0:00 postgres: aipo ai postgres 13001 0.0 0.1 13944 688 ? S Nov08 0:00 postgres: aipo ai postgres 13007 0.0 0.1 13928 564 ? S Nov08 0:00 postgres: aipo ai postgres 13008 0.0 0.1 13928 692 ? S Nov08 0:00 postgres: aipo ai postgres 13009 0.0 0.1 13944 688 ? S Nov08 0:00 postgres: aipo ai postgres 13017 0.0 0.1 13928 684 ? S Nov08 0:00 postgres: aipo ai postgres 13018 0.0 0.1 13928 692 ? S Nov08 0:00 postgres: aipo ai postgres 13019 0.0 0.1 13944 692 ? S Nov08 0:00 postgres: aipo ai postgres 13319 0.0 0.1 13944 692 ? S Nov09 0:00 postgres: aipo ai postgres 13321 0.0 0.1 13936 560 ? S Nov09 0:00 postgres: cel cel postgres 13340 0.0 0.1 13944 696 ? S Nov09 0:00 postgres: aipo ai postgres 13496 0.0 0.1 13928 688 ? S Nov09 0:00 postgres: aipo ai postgres 13497 0.0 0.1 13928 700 ? S Nov09 0:00 postgres: aipo ai postgres 13498 0.0 0.1 13944 696 ? S Nov09 0:00 postgres: aipo ai postgres 13500 0.0 0.1 13928 532 ? S Nov09 0:00 postgres: aipo ai postgres 13501 0.0 0.1 13928 696 ? S Nov09 0:00 postgres: aipo ai postgres 13502 0.0 0.1 13944 692 ? S Nov09 0:00 postgres: aipo ai postgres 13503 0.0 0.1 13928 688 ? S Nov09 0:00 postgres: aipo ai postgres 13504 0.0 0.1 13928 696 ? S Nov09 0:00 postgres: aipo ai postgres 13505 0.0 0.1 13944 608 ? S Nov09 0:00 postgres: aipo ai postgres 13509 0.0 0.0 13928 460 ? S Nov09 0:00 postgres: aipo ai postgres 13510 0.0 0.1 13928 684 ? S Nov09 0:00 postgres: aipo ai postgres 13511 0.0 0.1 13944 680 ? S Nov09 0:00 postgres: aipo ai postgres 14274 0.0 0.1 13936 568 ? S Nov09 0:00 postgres: cel cel postgres 14284 0.0 0.1 13928 600 ? S Nov09 0:00 postgres: aipo ai postgres 14285 0.0 0.1 13928 688 ? S Nov09 0:00 postgres: aipo ai postgres 14286 0.0 0.1 13944 684 ? S Nov09 0:00 postgres: aipo ai postgres 21700 0.0 0.1 13944 684 ? S Nov09 0:00 postgres: aipo ai postgres 21990 0.0 0.0 13928 508 ? S Nov09 0:00 postgres: aipo ai postgres 21991 0.0 0.1 13928 688 ? S Nov09 0:00 postgres: aipo ai postgres 21992 0.0 0.1 13944 632 ? S Nov09 0:00 postgres: aipo ai postgres 21993 0.0 0.0 13928 388 ? S Nov09 0:00 postgres: aipo ai postgres 21994 0.0 0.1 13928 688 ? S Nov09 0:00 postgres: aipo ai postgres 21995 0.0 0.1 13944 684 ? S Nov09 0:00 postgres: aipo ai postgres 22003 0.0 0.1 13912 648 ? S Nov09 0:00 postgres: aipo ai postgres 22004 0.0 0.1 13912 628 ? S Nov09 0:00 postgres: aipo ai postgres 22010 0.0 0.1 13928 564 ? S Nov09 0:00 postgres: aipo ai postgres 22011 0.0 0.1 13928 688 ? S Nov09 0:00 postgres: aipo ai postgres 22012 0.0 0.1 13944 680 ? S Nov09 0:00 postgres: aipo ai postgres 22016 0.0 0.1 13944 684 ? S Nov09 0:00 postgres: aipo ai postgres 22019 0.0 0.1 13944 684 ? S Nov09 0:00 postgres: aipo ai postgres 22022 0.0 0.1 13944 684 ? S Nov09 0:00 postgres: aipo ai postgres 22025 0.0 0.1 13944 684 ? S Nov09 0:00 postgres: aipo ai postgres 22028 0.0 0.1 13944 684 ? S Nov09 0:00 postgres: aipo ai postgres 22030 0.0 0.1 13944 684 ? S Nov09 0:00 postgres: aipo ai postgres 22033 0.0 0.1 13944 684 ? S Nov09 0:00 postgres: aipo ai postgres 22034 0.0 0.0 13928 416 ? S Nov09 0:00 postgres: aipo ai postgres 22035 0.0 0.1 13928 688 ? S Nov09 0:00 postgres: aipo ai postgres 22036 0.0 0.1 13944 684 ? S Nov09 0:00 postgres: aipo ai postgres 22038 0.0 0.1 13928 644 ? S Nov09 0:00 postgres: aipo ai postgres 22039 0.0 0.1 13928 688 ? S Nov09 0:00 postgres: aipo ai ===== Enric Naval Estudiante de Inform�tica de Gesti�n en la Udl (Lleida) __________________________________ Do you Yahoo!? Protect your identity with Yahoo! Mail AddressGuard http://antispam.yahoo.com/whatsnewfree
Enric, I've noticed a few things that may be major problems, IMHO: 1 - If you're using JSP/Tomcat with postgresql, than you, at least, should consider using a connection pool. You simply can't predict the number of pageviews being requested by web users. Period. If every page spawns a new database connection, soon or later you're doomed. 2 - You made clear that you're dealing with legacy code, but, in using java, the programmer should code database access like this: try { get_db_connection(); do_some_queries(); } catch { deal_exceptions(); } finally { close_db_connection(); } I'm not an expert in computer programming. Maybe you can think better ways to do the same thing, but, at the end of the process, the database connection should be closed. Otherwise you'll get leaked connections. 3 - We have a similar setup here. But our database was being accessed from different perspectives: - a JSP page for report generation - normal, application specific data handling, storage and retrival - human beings checking data and managing the database One thing in common: everyone was connecting from the same machine running the database (it is also a web/jsp server), with the same user. We noticed some connection leaks and, despite I could link the leakage with apache/jsp/tomcat (more on that later), we agreed that in other circumstances we simply could not be able to determine where it was coming from. Quick solution: one database, multiple users, so we can track what process have spawned the db connection more easily. 4 - connection leaks from JSP Application servers normally are set up with very specific memory and garbage collection tuning. In java, garbage collection has a bit of undesirability, since it freezes the entire VM during its own execution. Maybe your JSP pages are leaking connections because the GC isn't running too often as you desire. Since your server isn't a very busy one, increasing the number of garbage collection runnings should close the undesired connections without touching legacy code. 5 - server reliability You stated that servers should work despite client behaviour or configuration. I respectfully disagree with that. Well, I even don't think your issue with the number of database connections is a problem after all. Postgres was able to determine that more people were trying to connect to it than what was speficied for it to handle. During this entire process, all your data were safe, all other active connections were being serviced. So, what's the issue here? If the number of connections were set to "infinite", you really could get into a Denial Of Service issue. That's why hard limits are useful. They are not mistakes or errors, but part of the process. That's all I had to say... Sorry for not including your original message, but I think this email is already too long to read without configuration files and commands output. Marcus.