Thread: Out of Memory - 8.2.4
"PostgreSQL 8.2.4 on i386-pc-solaris2.10, compiled by GCC gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)"

Week-old install... still tuning and tweaking this thing. Over the last 2 days I have spotted 10 "Out of Memory" errors in the postgres logs (never saw them before with the same app/usage patterns on tuned hardware/postgres under FreeBSD):

Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848 local0.warning] [6-1] 2007-08-22 18:08:24 CDT ERROR: out of memory
Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848 local0.warning] [6-2] 2007-08-22 18:08:24 CDT DETAIL: Failed on request of size 536870910.

What I found interesting is that it's ALWAYS the same size... 536870910.

I am running autovacuum and slony, but I see nothing in the logs anywhere near the "out of memory" errors related to either. (Autovacuum used to log INFO messages under 8.0.X every time it vacuumed, which came in handy... I assume it doesn't do this any more?)

The events are fairly spread out, and (by looking at the app logs and the rest of the DB logs) I cannot correlate them to any specific query or activity.

Any help would be appreciated.

Box is a Sun X4600 with 8 dual-core processors and 32 gig of RAM.

# su - pgsql
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
-bash-3.00$ ulimit -a
core file size        (blocks, -c) unlimited
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
open files                    (-n) 256
pipe size          (512 bytes, -p) 10
stack size            (kbytes, -s) 10240
cpu time             (seconds, -t) unlimited
max user processes            (-u) 16357
virtual memory        (kbytes, -v) unlimited

shared_buffers = 3GB                    # min 128kB or max_connections*16kB
temp_buffers = 1000                     # min 800kB; was 8MB
max_prepared_transactions = 450         # can be 0 or more
work_mem = 100MB                        # min 64kB
maintenance_work_mem = 512MB            # min 1MB
#max_stack_depth = 2MB                  # min 100kB
max_fsm_pages = 208000                  # min max_fsm_relations*16, 6 bytes each
max_fsm_relations = 10000               # min 100, ~70 bytes each
#max_files_per_process = 1000           # min 25
#shared_preload_libraries = ''          # (change requires restart)
fsync = on                              # turns forced synchronization on or off
wal_sync_method = fdatasync             # the default is the first option
full_page_writes = off                  # recover from partial page writes
wal_buffers = 2300                      # min 32kB
commit_delay = 10                       # range 0-100000, in microseconds
#commit_siblings = 5                    # range 1-1000
checkpoint_segments = 128               # in logfile segments, min 1, 16MB each
checkpoint_timeout = 5min               # range 30s-1h
checkpoint_warning = 99s                # 0 is off
On Aug 24, 2007, at 10:09 AM, Jeff Amiel wrote:

> Over last 2 days, have spotted 10 "Out of Memory" errors in postgres logs
> (never saw before with same app/usage patterns on tuned hardware/postgres
> under FreeBSD)
>
> Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848 local0.warning] [6-1] 2007-08-22 18:08:24 CDT ERROR: out of memory
> Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848 local0.warning] [6-2] 2007-08-22 18:08:24 CDT DETAIL: Failed on request of size 536870910.
>
> What I found interesting is that It's ALWAYS the same size....536870910
>
> maintenance_work_mem = 512MB            # min 1MB

A few weeks ago I got the same error on the same server. In fact, the only differences are memory, where you have 32GB and I have 16GB, and maintenance_work_mem, where you have 512MB and I have 256MB. I point out the maintenance_work_mem setting because the failed request size in my error was almost exactly 256MB, just as yours is almost exactly 512MB (i.e., mine was yours/2). However, that was the only time I've seen this. Below is the full context of the error report in my log. I see that there is an Autovacuum context as well as references to a toast table, so... something to do with autovacuum?
TopMemoryContext: 14466424 total in 1758 blocks; 7160792 free (12578 chunks); 7305632 used
TopTransactionContext: 8192 total in 1 blocks; 7688 free (10 chunks); 504 used
Type information cache: 8192 total in 1 blocks; 1800 free (0 chunks); 6392 used
Operator class cache: 8192 total in 1 blocks; 4872 free (0 chunks); 3320 used
Autovacuum context: 16769024 total in 11 blocks; 6959320 free (11 chunks); 9809704 used
smgr relation table: 24576 total in 2 blocks; 11952 free (4 chunks); 12624 used
TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0 chunks); 16 used
Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
PortalMemory: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 8192 total in 1 blocks; 256 free (0 chunks); 7936 used
CacheMemoryContext: 1183288 total in 20 blocks; 889824 free (4094 chunks); 293464 used
pg_toast_356294_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 288 free (0 chunks); 736 used
pg_index_indrelid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_namespace_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_authid_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 288 free (0 chunks); 736 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_operator_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_class_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 288 free (0 chunks); 736 used
pg_amproc_opc_proc_index: 1024 total in 1 blocks; 216 free (0 chunks); 808 used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 216 free (0 chunks); 808 used
Per-database table: 57344 total in 3 blocks; 37592 free (13 chunks); 19752 used
Per-database table: 57344 total in 3 blocks; 37592 free (13 chunks); 19752 used
Per-database table: 57344 total in 3 blocks; 37592 free (13 chunks); 19752 used
Per-database table: 57344 total in 3 blocks; 37592 free (13 chunks); 19752 used
Per-database table: 24576 total in 2 blocks; 13040 free (5 chunks); 11536 used
Per-database table: 100671512 total in 22 blocks; 5803552 free (91 chunks); 94867960 used
Databases hash: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used
MdSmgr: 8192 total in 1 blocks; 7936 free (226 chunks); 256 used
LOCALLOCK hash: 24576 total in 2 blocks; 10000 free (4 chunks); 14576 used
Timezones: 48616 total in 2 blocks; 5968 free (0 chunks); 42648 used
Postmaster: 24576 total in 2 blocks; 20264 free (155 chunks); 4312 used
ErrorContext: 8192 total in 1 blocks; 8176 free (4 chunks); 16 used
2007-08-08 20:21:05 CDT 3716 :ERROR: out of memory
2007-08-08 20:21:05 CDT 3716 :DETAIL: Failed on request of size 268435452.

Erik Jones

Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
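[Editor's note: the request-size observation above checks out arithmetically. Both failed allocations are just a few bytes under the respective maintenance_work_mem settings. A quick sanity check, added for illustration and not part of the original thread:]

```python
# Compare the failed request sizes from the two error reports against
# each poster's maintenance_work_mem setting (512MB and 256MB).
MB = 1024 * 1024

jeff_request = 536870910   # from Jeff's ERROR detail
erik_request = 268435452   # from Erik's ERROR detail

print(512 * MB - jeff_request)       # -> 2 (bytes shy of 512MB)
print(256 * MB - erik_request)       # -> 4 (bytes shy of 256MB)
print(jeff_request // erik_request)  # -> 2 (Jeff's request is twice Erik's)
```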
Jeff Amiel <becauseimjeff@yahoo.com> writes:
> Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848 local0.warning] [6-1] 2007-08-22 18:08:24 CDT ERROR: out of memory
> Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848 local0.warning] [6-2] 2007-08-22 18:08:24 CDT DETAIL: Failed on request of size 536870910.

> What I found interesting is that It's ALWAYS the same size....536870910

> maintenance_work_mem = 512MB            # min 1MB

Apparently this maintenance_work_mem setting is higher than your system can reliably provide. Knock it back a bit.

			regards, tom lane
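[Editor's note: Tom's advice amounts to lowering the setting in postgresql.conf and reloading. A sketch, with a purely illustrative value; the thread does not settle on a number:]

```
maintenance_work_mem = 256MB    # was 512MB; takes effect on reload (SIGHUP)
```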
On Aug 24, 2007, at 11:46 AM, Tom Lane wrote:

> Apparently this maintenance_work_mem setting is higher than your system
> can reliably provide.  Knock it back a bit.

I'm not so sure. In my case, the request size was only 256MB and we maintain about 10-11 GB free of our 16 GB of memory (2GB shared_buffers, 42MB work_mem, and 256MB maintenance_work_mem). The toast table that was involved in the error was pretty small, and I was able to successfully vacuum it myself virtually instantly. However, this being (so far) a one-time error in my case, I don't have much more data to contribute. We constantly monitor and graph our system's I/O, CPU, and memory usage and scan our logs for errors, so if anything else comes up I'll be sure to share.

Erik Jones
Erik Jones wrote:
> I'm not so sure.  In my case, the request size was only 256MB and we
> maintain about 10 - 11 GB free of our 16 GB of memory (2GB
> shared_buffers, 42MB work_mem, and 256 MB maintenance_work_mem).

We are actually diagnosing a similar problem on this end, where we get a failure at 1920... I am currently trying to get some DEBUG output.

Sincerely,

Joshua D. Drake

--
      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
			UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
--- "Joshua D. Drake" <jd@commandprompt.com> wrote:

> We are actually diagnosing a similar problem on this end, where we get a
> failure at 1920... I am currently trying to get some DEBUG output.

We are actually getting it semi-regularly today (3 times already). I would be happy to provide some more info if somebody guides me (just set log_min_messages to one of the debug settings?)
Jeff Amiel wrote:
> We are actually getting it semi-regularly today (3 times already)....
> I would be happy to provide some more info if somebody guides me (just
> set log_min_messages to one of the debug settings?)

Having log_line_prefix with at least %p and %m (or %t), plus a log_min_messages of DEBUG2, would be great.

Joshua D. Drake
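[Editor's note: Joshua's suggested logging setup corresponds to postgresql.conf entries along these lines. The exact prefix string is illustrative; only the %m/%p escapes and the DEBUG2 level come from his message:]

```
log_line_prefix = '%m %p '      # %m = millisecond timestamp, %p = process ID
log_min_messages = debug2
```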
--- "Joshua D. Drake" <jd@commandprompt.com> wrote:

> Having log_line_prefix with at least %p and %m (or %t) plus a
> log_min_messages of DEBUG2 would be great.

I am getting the additional timestamp/PID on my log lines now, but no additional debug output. Is log_min_messages one of the settings that requires a restart?
Jeff Amiel <becauseimjeff@yahoo.com> writes:
> is log_min_messages one of them that requires a restart?

No, SIGHUP (pg_ctl reload) should be sufficient.

			regards, tom lane
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:

> No, SIGHUP (pg_ctl reload) should be sufficient.

Weird... it looks like some items are going to syslog and some to my defined postgres logfile (from the -L option). Bizarre. Anyway, I hope this helps someone.

At 10:46 I find this in my syslog files:

Aug 27 10:46:01 db-1 postgres[27008]: [ID 748848 local0.warning] [85355-1] 2007-08-27 10:46:01.888 CDT 27008ERROR: out of memory
Aug 27 10:46:01 db-1 postgres[27008]: [ID 748848 local0.warning] [85355-2] 2007-08-27 10:46:01.888 CDT 27008DETAIL: Failed on request of size 536870910.

and at the same time in my postgres logfile I see this (and only this):

TopMemoryContext: 169608 total in 10 blocks; 18832 free (34 chunks); 150776 used
TopTransactionContext: 8192 total in 1 blocks; 7648 free (9 chunks); 544 used
CFuncHash: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used
Type information cache: 8192 total in 1 blocks; 1800 free (0 chunks); 6392 used
Operator class cache: 8192 total in 1 blocks; 4872 free (0 chunks); 3320 used
Autovacuum context: 8192 total in 1 blocks; 5416 free (8 chunks); 2776 used
smgr relation table: 8192 total in 1 blocks; 2808 free (0 chunks); 5384 used
TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0 chunks); 16 used
Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
PortalMemory: 0 total in 0 blocks; 0 free (0 chunks); 0 used
Relcache by OID: 8192 total in 1 blocks; 3376 free (0 chunks); 4816 used
CacheMemoryContext: 659000 total in 19 blocks; 264904 free (15 chunks); 394096 used
sl_seqlog_idx: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
PartInd_istream_replication_cluster_sl_log_1-node-1: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
sl_log_1_idx1: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_index_indrelid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 288 free (0 chunks); 736 used
pg_autovacuum_vacrelid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_type_typname_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_type_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 288 free (0 chunks); 736 used
pg_statistic_relid_att_index: 1024 total in 1 blocks; 288 free (0 chunks); 736 used
pg_auth_members_member_role_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_auth_members_role_member_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 256 free (0 chunks); 768 used
pg_proc_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
pg_operator_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_opclass_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 216 free (0 chunks); 808 used
pg_namespace_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_namespace_nspname_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_language_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_language_name_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_inherits_relid_seqno_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_index_indexrelid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_authid_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_authid_rolname_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_database_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_conversion_oid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
pg_conversion_name_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_conversion_default_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
pg_class_relname_nsp_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_class_oid_index: 1024 total in 1 blocks; 352 free (0 chunks); 672 used
pg_cast_source_target_index: 1024 total in 1 blocks; 288 free (0 chunks); 736 used
pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 288 free (0 chunks); 736 used
pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 328 free (0 chunks); 696 used
pg_amproc_opc_proc_index: 1024 total in 1 blocks; 216 free (0 chunks); 808 used
pg_amop_opr_opc_index: 1024 total in 1 blocks; 288 free (0 chunks); 736 used
pg_amop_opc_strat_index: 1024 total in 1 blocks; 216 free (0 chunks); 808 used
pg_aggregate_fnoid_index: 1024 total in 1 blocks; 392 free (0 chunks); 632 used
Per-database table: 122880 total in 4 blocks; 44680 free (19 chunks); 78200 used
Per-database table: 24576 total in 2 blocks; 13040 free (5 chunks); 11536 used
Per-database table: 24576 total in 2 blocks; 13040 free (5 chunks); 11536 used
Per-database table: 24576 total in 2 blocks; 13040 free (5 chunks); 11536 used
Databases hash: 8192 total in 1 blocks; 4936 free (0 chunks); 3256 used
MdSmgr: 8192 total in 1 blocks; 8056 free (1 chunks); 136 used
LOCALLOCK hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
Timezones: 48616 total in 2 blocks; 5968 free (0 chunks); 42648 used
Postmaster: 24576 total in 2 blocks; 13576 free (123 chunks); 11000 used
ErrorContext: 8192 total in 1 blocks; 8176 free (11 chunks); 16 used
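[Editor's note: dumps like the one above are easier to compare across incidents if you tally the per-context totals. A small, hypothetical parser, added for illustration; the line format is the one shown in the dump, and nothing here comes from PostgreSQL itself:]

```python
import re

# Matches memory-context dump lines of the form:
#   <name>: <total> total in <N> blocks; <free> free (<N> chunks); <used> used
LINE = re.compile(
    r"^(?P<name>.+): (?P<total>\d+) total in \d+ blocks; "
    r"(?P<free>\d+) free \(\d+ chunks\); (?P<used>\d+) used$"
)

def summarize(dump: str) -> dict:
    """Return aggregate total/free/used bytes across all contexts in a dump."""
    sums = {"total": 0, "free": 0, "used": 0}
    for line in dump.splitlines():
        m = LINE.match(line.strip())
        if m:
            for key in sums:
                sums[key] += int(m.group(key))
    return sums

# Three lines taken verbatim from the dump above.
sample = """\
TopMemoryContext: 169608 total in 10 blocks; 18832 free (34 chunks); 150776 used
Autovacuum context: 8192 total in 1 blocks; 5416 free (8 chunks); 2776 used
ErrorContext: 8192 total in 1 blocks; 8176 free (11 chunks); 16 used
"""
print(summarize(sample))  # -> {'total': 185992, 'free': 32424, 'used': 153568}
```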
--- "Joshua D. Drake" <jd@commandprompt.com> wrote: > We are actually diagnosing a similar problem on this > end, where we get a > failure at 1920... I am currently trying to get some > DEBUG output. Tracking for last few days. Does not appear to happen when little or no user activity (like Saturday) I don't know if that rules out autovacuum or not (if no update threshholds are reached, no vacuuming will take place anyway) Aug 23 11:11:51 db-1 postgres[8455]: [ID 748848 local0.warning] [2-1] 2007-08-23 11:11:51 CDT ERROR: out of memory Aug 23 11:11:51 db-1 postgres[8455]: [ID 748848 local0.warning] [2-2] 2007-08-23 11:11:51 CDT DETAIL: Failed on request of size 536870910. -- Aug 23 11:34:49 db-1 postgres[8910]: [ID 748848 local0.warning] [2-1] 2007-08-23 11:34:49 CDT ERROR: out of memory Aug 23 11:34:49 db-1 postgres[8910]: [ID 748848 local0.warning] [2-2] 2007-08-23 11:34:49 CDT DETAIL: Failed on request of size 536870910. -- Aug 23 12:06:47 db-1 postgres[9562]: [ID 748848 local0.warning] [2-1] 2007-08-23 12:06:47 CDT ERROR: out of memory Aug 23 12:06:47 db-1 postgres[9562]: [ID 748848 local0.warning] [2-2] 2007-08-23 12:06:47 CDT DETAIL: Failed on request of size 536870910. -- Aug 23 12:58:47 db-1 postgres[10617]: [ID 748848 local0.warning] [2-1] 2007-08-23 12:58:47 CDT ERROR: out of memory Aug 23 12:58:47 db-1 postgres[10617]: [ID 748848 local0.warning] [2-2] 2007-08-23 12:58:47 CDT DETAIL: Failed on request of size 536870910. -- Aug 23 15:15:35 db-1 postgres[13400]: [ID 748848 local0.warning] [2-1] 2007-08-23 15:15:35 CDT ERROR: out of memory Aug 23 15:15:35 db-1 postgres[13400]: [ID 748848 local0.warning] [2-2] 2007-08-23 15:15:35 CDT DETAIL: Failed on request of size 536870910. -- Aug 23 16:50:47 db-1 postgres[15422]: [ID 748848 local0.warning] [2-1] 2007-08-23 16:50:47 CDT ERROR: out of memory Aug 23 16:50:47 db-1 postgres[15422]: [ID 748848 local0.warning] [2-2] 2007-08-23 16:50:47 CDT DETAIL: Failed on request of size 536870910. 
-- Aug 24 10:46:46 db-1 postgres[10508]: [ID 748848 local0.warning] [2-1] 2007-08-24 10:46:46 CDT ERROR: out of memory Aug 24 10:46:46 db-1 postgres[10508]: [ID 748848 local0.warning] [2-2] 2007-08-24 10:46:46 CDT DETAIL: Failed on request of size 536870910. -- Aug 24 11:29:00 db-1 postgres[11539]: [ID 748848 local0.warning] [2-1] 2007-08-24 11:29:00 CDT ERROR: out of memory Aug 24 11:29:00 db-1 postgres[11539]: [ID 748848 local0.warning] [2-2] 2007-08-24 11:29:00 CDT DETAIL: Failed on request of size 536870910. -- Aug 24 11:50:04 db-1 postgres[12051]: [ID 748848 local0.warning] [2-1] 2007-08-24 11:50:04 CDT ERROR: out of memory Aug 24 11:50:04 db-1 postgres[12051]: [ID 748848 local0.warning] [2-2] 2007-08-24 11:50:04 CDT DETAIL: Failed on request of size 536870910. -- Aug 24 12:00:33 db-1 postgres[12310]: [ID 748848 local0.warning] [2-1] 2007-08-24 12:00:33 CDT ERROR: out of memory Aug 24 12:00:33 db-1 postgres[12310]: [ID 748848 local0.warning] [2-2] 2007-08-24 12:00:33 CDT DETAIL: Failed on request of size 536870910. -- Aug 24 16:03:19 db-1 postgres[18263]: [ID 748848 local0.warning] [2493-1] 2007-08-24 16:03:19.296 CDT 18263ERROR: out of memory Aug 24 16:03:19 db-1 postgres[18263]: [ID 748848 local0.warning] [2493-2] 2007-08-24 16:03:19.296 CDT 18263DETAIL: Failed on request of size 536870910. -- Aug 24 16:45:46 db-1 postgres[19313]: [ID 748848 local0.warning] [3356-1] 2007-08-24 16:45:46.804 CDT 19313ERROR: out of memory Aug 24 16:45:46 db-1 postgres[19313]: [ID 748848 local0.warning] [3356-2] 2007-08-24 16:45:46.804 CDT 19313DETAIL: Failed on request of size 536870910. -- Aug 24 17:29:16 db-1 postgres[20379]: [ID 748848 local0.warning] [4238-1] 2007-08-24 17:29:16.926 CDT 20379ERROR: out of memory Aug 24 17:29:16 db-1 postgres[20379]: [ID 748848 local0.warning] [4238-2] 2007-08-24 17:29:16.926 CDT 20379DETAIL: Failed on request of size 536870910. 
--
Aug 24 17:40:02 db-1 postgres[20651]: [ID 748848 local0.warning] [4452-1] 2007-08-24 17:40:02.682 CDT 20651ERROR: out of memory
Aug 24 17:40:02 db-1 postgres[20651]: [ID 748848 local0.warning] [4452-2] 2007-08-24 17:40:02.682 CDT 20651DETAIL: Failed on request of size 536870910.
--
Aug 26 11:14:56 db-1 postgres[22161]: [ID 748848 local0.warning] [56115-1] 2007-08-26 11:14:56.077 CDT 22161ERROR: out of memory
Aug 26 11:14:56 db-1 postgres[22161]: [ID 748848 local0.warning] [56115-2] 2007-08-26 11:14:56.077 CDT 22161DETAIL: Failed on request of size 536870910.
--
Aug 26 11:27:41 db-1 postgres[22477]: [ID 748848 local0.warning] [56381-1] 2007-08-26 11:27:41.141 CDT 22477ERROR: out of memory
Aug 26 11:27:41 db-1 postgres[22477]: [ID 748848 local0.warning] [56381-2] 2007-08-26 11:27:41.141 CDT 22477DETAIL: Failed on request of size 536870910.
--
Aug 26 11:37:27 db-1 postgres[22729]: [ID 748848 local0.warning] [56603-1] 2007-08-26 11:37:27.476 CDT 22729ERROR: out of memory
Aug 26 11:37:27 db-1 postgres[22729]: [ID 748848 local0.warning] [56603-2] 2007-08-26 11:37:27.476 CDT 22729DETAIL: Failed on request of size 536870910.
--
Aug 26 13:02:47 db-1 postgres[24831]: [ID 748848 local0.warning] [58357-1] 2007-08-26 13:02:47.721 CDT 24831ERROR: out of memory
Aug 26 13:02:47 db-1 postgres[24831]: [ID 748848 local0.warning] [58357-2] 2007-08-26 13:02:47.721 CDT 24831DETAIL: Failed on request of size 536870910.
--
Aug 26 14:15:54 db-1 postgres[26625]: [ID 748848 local0.warning] [59885-1] 2007-08-26 14:15:54.583 CDT 26625ERROR: out of memory
Aug 26 14:15:54 db-1 postgres[26625]: [ID 748848 local0.warning] [59885-2] 2007-08-26 14:15:54.583 CDT 26625DETAIL: Failed on request of size 536870910.
--
Aug 26 14:38:10 db-1 postgres[27167]: [ID 748848 local0.warning] [60334-1] 2007-08-26 14:38:10.817 CDT 27167ERROR: out of memory
Aug 26 14:38:10 db-1 postgres[27167]: [ID 748848 local0.warning] [60334-2] 2007-08-26 14:38:10.817 CDT 27167DETAIL: Failed on request of size 536870910.
--
Aug 26 14:57:42 db-1 postgres[27662]: [ID 748848 local0.warning] [60748-1] 2007-08-26 14:57:42.690 CDT 27662ERROR: out of memory
Aug 26 14:57:42 db-1 postgres[27662]: [ID 748848 local0.warning] [60748-2] 2007-08-26 14:57:42.690 CDT 27662DETAIL: Failed on request of size 536870910.
--
Aug 26 17:25:41 db-1 postgres[1352]: [ID 748848 local0.warning] [63840-1] 2007-08-26 17:25:41.189 CDT 1352ERROR: out of memory
Aug 26 17:25:41 db-1 postgres[1352]: [ID 748848 local0.warning] [63840-2] 2007-08-26 17:25:41.189 CDT 1352DETAIL: Failed on request of size 536870910.
--
Aug 26 18:10:21 db-1 postgres[2467]: [ID 748848 local0.warning] [64756-1] 2007-08-26 18:10:21.684 CDT 2467ERROR: out of memory
Aug 26 18:10:21 db-1 postgres[2467]: [ID 748848 local0.warning] [64756-2] 2007-08-26 18:10:21.684 CDT 2467DETAIL: Failed on request of size 536870910.
--
Aug 26 18:42:15 db-1 postgres[3246]: [ID 748848 local0.warning] [65420-1] 2007-08-26 18:42:15.973 CDT 3246ERROR: out of memory
Aug 26 18:42:15 db-1 postgres[3246]: [ID 748848 local0.warning] [65420-2] 2007-08-26 18:42:15.973 CDT 3246DETAIL: Failed on request of size 536870910.
--
Aug 27 08:05:48 db-1 postgres[23092]: [ID 748848 local0.warning] [82122-1] 2007-08-27 08:05:48.214 CDT 23092ERROR: out of memory
Aug 27 08:05:48 db-1 postgres[23092]: [ID 748848 local0.warning] [82122-2] 2007-08-27 08:05:48.214 CDT 23092DETAIL: Failed on request of size 536870910.
--
Aug 27 08:25:06 db-1 postgres[23569]: [ID 748848 local0.warning] [82520-1] 2007-08-27 08:25:06.407 CDT 23569ERROR: out of memory
Aug 27 08:25:06 db-1 postgres[23569]: [ID 748848 local0.warning] [82520-2] 2007-08-27 08:25:06.407 CDT 23569DETAIL: Failed on request of size 536870910.
--
Aug 27 08:38:05 db-1 postgres[23909]: [ID 748848 local0.warning] [82785-1] 2007-08-27 08:38:05.991 CDT 23909ERROR: out of memory
Aug 27 08:38:05 db-1 postgres[23909]: [ID 748848 local0.warning] [82785-2] 2007-08-27 08:38:05.991 CDT 23909DETAIL: Failed on request of size 536870910.
--
Aug 27 09:20:09 db-1 postgres[24945]: [ID 748848 local0.warning] [83640-1] 2007-08-27 09:20:09.331 CDT 24945ERROR: out of memory
Aug 27 09:20:09 db-1 postgres[24945]: [ID 748848 local0.warning] [83640-2] 2007-08-27 09:20:09.331 CDT 24945DETAIL: Failed on request of size 536870910.
--
Aug 27 09:30:08 db-1 postgres[25155]: [ID 748848 local0.warning] [83857-1] 2007-08-27 09:30:08.536 CDT 25155ERROR: out of memory
Aug 27 09:30:08 db-1 postgres[25155]: [ID 748848 local0.warning] [83857-2] 2007-08-27 09:30:08.536 CDT 25155DETAIL: Failed on request of size 536870910.
--
Aug 27 09:40:01 db-1 postgres[25396]: [ID 748848 local0.warning] [84040-1] 2007-08-27 09:40:01.195 CDT 25396ERROR: out of memory
Aug 27 09:40:01 db-1 postgres[25396]: [ID 748848 local0.warning] [84040-2] 2007-08-27 09:40:01.195 CDT 25396DETAIL: Failed on request of size 536870910.
--
Aug 27 09:53:16 db-1 postgres[25729]: [ID 748848 local0.warning] [84289-1] 2007-08-27 09:53:16.815 CDT 25729ERROR: out of memory
Aug 27 09:53:16 db-1 postgres[25729]: [ID 748848 local0.warning] [84289-2] 2007-08-27 09:53:16.815 CDT 25729DETAIL: Failed on request of size 536870910.
--
Aug 27 10:46:01 db-1 postgres[27008]: [ID 748848 local0.warning] [85355-1] 2007-08-27 10:46:01.888 CDT 27008ERROR: out of memory
Aug 27 10:46:01 db-1 postgres[27008]: [ID 748848 local0.warning] [85355-2] 2007-08-27 10:46:01.888 CDT 27008DETAIL: Failed on request of size 536870910.
Jeff Amiel <becauseimjeff@yahoo.com> writes:
> Tracking for last few days. Does not appear to happen when little or no
> user activity (like Saturday). I don't know if that rules out autovacuum
> or not (if no update thresholds are reached, no vacuuming will take place
> anyway)

Can you correlate these occurrences with anything in the regular system logs (kernel log in particular)? The Postgres log shows nothing out of the ordinary --- it's simply that the kernel won't give us 512M for some reason. I'm guessing it's got something to do with overall system load.

regards, tom lane
On Mon, Aug 27, 2007 at 09:12:17AM -0700, Jeff Amiel wrote:
> Tracking for last few days. Does not appear to happen when little or no
> user activity (like Saturday). I don't know if that rules out autovacuum
> or not (if no update thresholds are reached, no vacuuming will take place
> anyway)

I don't think I've seen it so far this thread, but what are your memory overcommit settings and allocated swap? At least on Linux you would need a significant chunk of swap to be able to work with that much memory, even with overcommit off. Check the rules for your system.

Another thing I haven't seen mentioned: you appear to be on a 32-bit architecture and with 2GB shared_buffers you've lost half your address space on that alone. Perhaps you simply don't have enough contiguous address space to alloc 512MB.

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
On Aug 27, 2007, at 12:15 PM, Martijn van Oosterhout wrote:
> On Mon, Aug 27, 2007 at 09:12:17AM -0700, Jeff Amiel wrote:
>> Tracking for last few days. Does not appear to happen when little or no
>> user activity (like Saturday) I don't know if that rules out autovacuum
>> or not (if no update thresholds are reached, no vacuuming will take
>> place anyway)
>
> I don't think I've seen it so far this thread, but what are your memory
> overcommit settings and allocated swap? At least on Linux you would
> need a significant chunk of swap to be able to work with that much
> memory, even with overcommit off. Check the rules for your system.
>
> Another thing I haven't seen mentioned: you appear to be on a 32-bit
> architecture and with 2GB shared_buffers you've lost half your address
> space on that alone. Perhaps you simply don't have enough contiguous
> address space to alloc 512MB.

The X4600 runs with 64-bit Dual Opterons.

Erik Jones
Software Developer | Emma®
erik@myemma.com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com
Erik Jones wrote:
> On Aug 27, 2007, at 12:15 PM, Martijn van Oosterhout wrote:
>> [overcommit and 32-bit address-space questions snipped]
>
> The X4600 runs with 64-bit Dual Opterons.

The machine we are tracking this problem on is also 64bit.

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997 http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
--- "Joshua D. Drake" <jd@commandprompt.com> wrote: > > The machine we are tracking this problem on is also 64bit. Hmmmm.....looks like 3 different people are tracking a similar issue on 64 bit platforms.....you, Erik and myself. ____________________________________________________________________________________Ready for the edge of your seat? Check out tonight's top picks on Yahoo! TV. http://tv.yahoo.com/
Yes, but fortunately for me, unfortunately for the list, it's only happened to me once so I don't really have anything to go on wrt repeating the problem. I can only say, "Yep! It's happened!" I am watching my db closely, though. Well, my monitoring scripts are :)

On Aug 27, 2007, at 1:56 PM, Jeff Amiel wrote:
> --- "Joshua D. Drake" <jd@commandprompt.com> wrote:
>> The machine we are tracking this problem on is also 64bit.
>
> Hmmmm.....looks like 3 different people are tracking a similar issue on
> 64 bit platforms.....you, Erik and myself.

Erik Jones
I notice the log entries for the out of memory events have no username, database name or host identifier (while regular logged events do). Does that mean anything to anybody?

Aug 28 08:25:50 db-1 postgres[29019]: [ID 748848 local0.warning] [111900-1] 2007-08-28 08:25:50.081 CDT 29019ERROR: out of memory
Aug 28 08:25:50 db-1 postgres[29019]: [ID 748848 local0.warning] [111900-2] 2007-08-28 08:25:50.081 CDT 29019DETAIL: Failed on request of size 536870910.

(regular log entry)
Aug 28 08:26:45 db-1 postgres[28785]: [ID 748848 local0.info] [114999-1] 2007-08-28 08:26:45.413 CDT jboss prod 192.168.20.44 28785LOG: duration: 22606.146 ms execute <unnamed>: select

--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Can you correlate these occurrences with anything in the regular system
> logs (kernel log in particular)? The Postgres log shows nothing out of
> the ordinary --- it's simply that the kernel won't give us 512M for some
> reason. I'm guessing it's got something to do with overall system load.
>
> regards, tom lane
Jeff Amiel wrote:
> I notice the log entries for the out of memory events have no username,
> database name or host identifier (while regular logged events do). Does
> that mean anything to anybody?
>
> Aug 28 08:25:50 db-1 postgres[29019]: [ID 748848 local0.warning] [111900-1] 2007-08-28 08:25:50.081 CDT 29019ERROR: out of memory
> Aug 28 08:25:50 db-1 postgres[29019]: [ID 748848 local0.warning] [111900-2] 2007-08-28 08:25:50.081 CDT 29019DETAIL: Failed on request of size 536870910.
>
> (regular log entry)
> Aug 28 08:26:45 db-1 postgres[28785]: [ID 748848 local0.info] [114999-1] 2007-08-28 08:26:45.413 CDT jboss prod 192.168.20.44 28785LOG: duration: 22606.146 ms execute <unnamed>: select

Interesting. What's your log_line_prefix? Does it have "%q" somewhere?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Jeff Amiel <becauseimjeff@yahoo.com> writes:
> I notice the log entries for the out of memory events have no username,
> database name or host identifier (while regular logged events do). Does
> that mean anything to anybody?

Means they're coming from autovacuum, likely? Autovacuum probably *should* populate the database field when possible, but the other two are meaningless for it.

regards, tom lane
--- Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Interesting. What's your log_line_prefix? Does it have "%q" somewhere?

No, no %q...not quite sure what it means: "stop here in non-session processes"
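[For readers following along: %q in log_line_prefix produces no output itself; it just marks the point after which non-session processes (such as autovacuum) stop expanding the prefix. That is why session-only fields like user and database print as blanks, or nothing at all, for background processes. A hedged sketch of such a prefix -- this layout is illustrative, not Jeff's actual setting:]

```
# postgresql.conf sketch: timestamp and PID for every process,
# then %q, then the session-only fields (user, database, host)
log_line_prefix = '%t %p %q%u %d %h '
```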
On Aug 28, 2007, at 3:55 PM, Tom Lane wrote:
> Jeff Amiel <becauseimjeff@yahoo.com> writes:
>> I notice the log entries for the out of memory events have no username,
>> database name or host identifier (while regular logged events do). Does
>> that mean anything to anybody?
>
> Means they're coming from autovacuum, likely?

Such was my original supposition given that the memory context output that preceded the actual error in my log included a line for "Autovacuum Context: ..."

Erik Jones
On 8/24/07, Jeff Amiel <becauseimjeff@yahoo.com> wrote:
> Over last 2 days, have spotted 10 "Out of Memory" errors in postgres logs
> (never saw before with same app/usage patterns on tuned hardware/postgres
> under FreeBSD)
>
> Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848 local0.warning] [6-1] 2007-08-22 18:08:24 CDT ERROR: out of memory.
> Aug 22 18:08:24 db-1 postgres[16452]: [ID 748848 local0.warning] [6-2] 2007-08-22 18:08:24 CDT DETAIL: Failed on request of size 536870910.
>
> What I found interesting is that it's ALWAYS the same size....536870910
>
> I am running autovacuum and slony.....but I see nothing in the logs
> anywhere near the "out of memory" errors related to either (autovacuum
> used to, under 8.0.X, log INFO messages every time it vacuumed, which
> came in handy...I assume it doesn't do this any more?)
>
> The events are fairly spread out...and cannot (by looking at app logs and
> rest of DB logs) correlate to any specific query or activity.
>
> Any help would be appreciated

I've experienced something similar. The reason turned out to be a combination of overcommit=off, big maint_mem and several parallel vacuums for fast-changing tables. Seems like VACUUM allocates full maint_mem before start, whatever the actual size of the table.

Fix was to put "set maint_mem=32M" before small vacuums and serialize some of them.

--
marko
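[Marko's observation that VACUUM sizes its dead-TID array from maintenance_work_mem up front also fits the constant request size in Jeff's logs. A quick sanity check, assuming the 512MB maintenance_work_mem from the posted config and 6-byte ItemPointerData entries:]

```python
# Sanity check: the failed request of 536870910 bytes matches a dead-TID
# array sized to a full 512MB maintenance_work_mem, assuming each TID
# (ItemPointerData) is 6 bytes.
maintenance_work_mem = 512 * 1024 * 1024   # 536870912 bytes, per the posted config
tid_size = 6                               # assumed sizeof(ItemPointerData)
alloc = (maintenance_work_mem // tid_size) * tid_size
print(alloc)  # 536870910 -- the exact size in every log entry
```

[That the request is always exactly this number, regardless of table size, is consistent with the full budget being allocated before the vacuum starts.]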
On Aug 28, 2007, at 4:33 PM, Marko Kreen wrote:
> On 8/24/07, Jeff Amiel <becauseimjeff@yahoo.com> wrote:
>> [original out-of-memory report snipped]
>
> I've experienced something similar. The reason turned out to be a
> combination of overcommit=off, big maint_mem and several parallel
> vacuums for fast-changing tables. Seems like VACUUM allocates
> full maint_mem before start, whatever the actual size of the table.
>
> Fix was to put "set maint_mem=32M" before small vacuums and
> serialize some of them.

I'm not sure about the OP but I know that we just run autovacuum so no problem with parallel vacuums. In addition, Solaris doesn't have overcommit.

Erik Jones
Marko Kreen escribió:
> I've experienced something similar. The reason turned out to be a
> combination of overcommit=off, big maint_mem and several parallel
> vacuums for fast-changing tables. Seems like VACUUM allocates
> full maint_mem before start, whatever the actual size of the table.

Hmm. Maybe we should have VACUUM estimate the maximum amount of memory that would be used, given the size of the table, and allocate only that much.

--
Alvaro Herrera
Erik Jones escribió:
> On Aug 28, 2007, at 4:33 PM, Marko Kreen wrote:
>> I've experienced something similar. The reason turned out to be a
>> combination of overcommit=off, big maint_mem and several parallel
>> vacuums for fast-changing tables. Seems like VACUUM allocates
>> full maint_mem before start, whatever the actual size of the table.
>>
>> Fix was to put "set maint_mem=32M" before small vacuums and
>> serialize some of them.
>
> I'm not sure about the OP but I know that we just run autovacuum so no
> problem with parallel vacuums. In addition, Solaris doesn't have
> overcommit.

Note that as of 8.3 you will have vacuums running in parallel with autovac ... 3 in the default config. So keep an eye on that maintenance_work_mem setting.

--
Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"XML!" Exclaimed C++. "What are you doing here? You're not a programming
language." "Tell that to the people who use me," said XML.
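[Marko's workaround upthread, spelled out for readers. The table name is hypothetical; the setting lasts for the session, and the '32MB' unit syntax requires 8.2 or later (older releases take the value in kB):]

```sql
-- Hedged sketch of the workaround: shrink the vacuum memory budget
-- before vacuuming small, fast-changing tables, so each parallel
-- VACUUM doesn't reserve the full global maintenance_work_mem.
SET maintenance_work_mem = '32MB';
VACUUM my_small_queue_table;        -- hypothetical table name
RESET maintenance_work_mem;
```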
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Marko Kreen escribió:
>> I've experienced something similar. The reason turned out to be a
>> combination of overcommit=off, big maint_mem and several parallel
>> vacuums for fast-changing tables. Seems like VACUUM allocates
>> full maint_mem before start, whatever the actual size of the table.
>
> Hmm. Maybe we should have VACUUM estimate the maximum amount of memory
> that would be used, given the size of the table, and allocate only that
> much.

Yeah --- given the likelihood of parallel vacuum activity in 8.3, it'd be good to not expend memory we certainly aren't going to need.

We could set a hard limit at RelationGetNumberOfBlocks * MaxHeapTuplesPerPage TIDs, but that is *extremely* conservative (it'd work out to allocating about a quarter of the table's actual size in bytes, if I did the math right). Given that the worst-case consequence is extra index vacuum passes, which don't hurt that much when a table is small, maybe some smaller estimate like 100 TIDs per page would be enough. Or, instead of using a hard-wired constant, look at pg_class.reltuples/relpages to estimate the average tuple density ...

regards, tom lane
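[Tom's back-of-the-envelope figure can be reproduced, assuming the default 8192-byte block size, 6-byte TIDs, and the MaxHeapTuplesPerPage value of 292 he cites later in the thread (the exact constant is build-dependent):]

```python
# Rough check of the worst-case bound: reserving MaxHeapTuplesPerPage
# TID slots per heap page, as a fraction of the table's on-disk size.
BLCKSZ = 8192                    # default page size
max_heap_tuples_per_page = 292   # value Tom cites for CVS HEAD; build-dependent
tid_size = 6                     # assumed sizeof(ItemPointerData)
bytes_per_page = max_heap_tuples_per_page * tid_size   # 1752 bytes of TIDs per page
fraction = bytes_per_page / BLCKSZ
print(round(fraction, 3))  # ~0.214 -- a bit over a fifth, in the ballpark of Tom's "quarter"
```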
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Hmm. Maybe we should have VACUUM estimate the maximum amount of memory
>> that would be used, given the size of the table, and allocate only that
>> much.
>
> Yeah --- given the likelihood of parallel vacuum activity in 8.3,
> it'd be good to not expend memory we certainly aren't going to need.
>
> We could set a hard limit at RelationGetNumberOfBlocks *
> MaxHeapTuplesPerPage TIDs, but that is *extremely* conservative
> (it'd work out to allocating about a quarter of the table's actual size
> in bytes, if I did the math right).

Another idea is to consider applying this patch:
http://thread.gmane.org/gmane.comp.db.postgresql.devel.patches/19384/focus=19393
which is said to reduce the amount of memory needed to store the TID array.

> Given that the worst-case consequence is extra index vacuum passes,
> which don't hurt that much when a table is small, maybe some smaller
> estimate like 100 TIDs per page would be enough. Or, instead of
> using a hard-wired constant, look at pg_class.reltuples/relpages
> to estimate the average tuple density ...

This sounds like a reasonable compromise.

--
Alvaro Herrera Valdivia, Chile ICBM: S 39º 49' 18.1", W 73º 13' 56.4"
Management by consensus: I have decided; you concede. (Leonard Liu)
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane escribió:
>> We could set a hard limit at RelationGetNumberOfBlocks *
>> MaxHeapTuplesPerPage TIDs, but that is *extremely* conservative
>> (it'd work out to allocating about a quarter of the table's actual size
>> in bytes, if I did the math right).
>
> Another idea is to consider applying this patch:
> http://thread.gmane.org/gmane.comp.db.postgresql.devel.patches/19384/focus=19393
> which is said to reduce the amount of memory needed to store the TID
> array.

Yeah, but that's orthogonal to the question of how many TID slots we need.

>> Given that the worst-case consequence is extra index vacuum passes,
>> which don't hurt that much when a table is small, maybe some smaller
>> estimate like 100 TIDs per page would be enough. Or, instead of
>> using a hard-wired constant, look at pg_class.reltuples/relpages
>> to estimate the average tuple density ...
>
> This sounds like a reasonable compromise.

Do you want to make it happen?

regards, tom lane
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>> Given that the worst-case consequence is extra index vacuum passes,
>>> which don't hurt that much when a table is small, maybe some smaller
>>> estimate like 100 TIDs per page would be enough. Or, instead of
>>> using a hard-wired constant, look at pg_class.reltuples/relpages
>>> to estimate the average tuple density ...
>
>> This sounds like a reasonable compromise.
>
> Do you want to make it happen?

OK, I'll work on it tomorrow.

--
Alvaro Herrera http://www.PlanetPostgreSQL.org/
Hi! I'm a .signature virus! cp me into your .signature file to help me spread!
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>> Given that the worst-case consequence is extra index vacuum passes,
>>> which don't hurt that much when a table is small, maybe some smaller
>>> estimate like 100 TIDs per page would be enough. Or, instead of
>>> using a hard-wired constant, look at pg_class.reltuples/relpages
>>> to estimate the average tuple density ...
>
>> This sounds like a reasonable compromise.
>
> Do you want to make it happen?

I'm not having much luck really. I think the problem is that ANALYZE stores reltuples as the number of live tuples, so if you delete a big portion of a big table, then ANALYZE and then VACUUM, there's a huge misestimation and extra index cleanup passes happen, which is a bad thing. There seems to be no way to estimate the dead space, is there? We could go to pgstats but that seems backwards.

I was having a problem at first with estimating for small tables which had no valid info in pg_class.reltuples, but I worked around that by using MaxHeapTuplesPerPage. (I was experimenting with the code that estimates average tuple width in estimate_rel_size() but then figured it was too much work.) So this part is fine AFAICS.

I attach the patch I am playing with, and the simple test I've been examining (on which I comment the ANALYZE on some runs, change the conditions on the DELETE, put the CREATE INDEX before insertion instead of after it, etc).

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I'm not having much luck really. I think the problem is that ANALYZE
> stores reltuples as the number of live tuples, so if you delete a big
> portion of a big table, then ANALYZE and then VACUUM, there's a huge
> misestimation and extra index cleanup passes happen, which is a bad
> thing.

Yeah ... so just go with a constant estimate of say 200 deletable tuples per page?

regards, tom lane
On 8/29/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> I'm not having much luck really. I think the problem is that ANALYZE
>> stores reltuples as the number of live tuples, so if you delete a big
>> portion of a big table, then ANALYZE and then VACUUM, there's a huge
>> misestimation and extra index cleanup passes happen, which is a bad
>> thing.
>
> Yeah ... so just go with a constant estimate of say 200 deletable tuples
> per page?

Note that it's much better to err on the smaller values. Extra index pass is really no problem.

VACUUM getting "Out of memory" may not sound like a big problem, but the scary thing is - the last VACUUM's memory request may succeed, and that means following queries start failing, and that is a big problem.

--
marko
"Marko Kreen" <markokr@gmail.com> writes: > Note that it's much better to err on the smaller values. > Extra index pass is really no problem. I beg to differ ... What this may actually suggest is that autovacuum needs its own value of "maintenance_work_mem", or that it should automatically divide the value by the authorized number of workers. regards, tom lane
Marko Kreen escribió:
> On 8/29/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah ... so just go with a constant estimate of say 200 deletable tuples
>> per page?
>
> Note that it's much better to err on the smaller values.
>
> Extra index pass is really no problem.

Humm, is it? If you have a really big table (say, a hundred million tuples) and two indexes then you are not happy when vacuum must make two passes over the indexes. It may mean vacuum taking five hours instead of three with vacuum delay. Remember, you must scan each index *completely* each time.

> VACUUM getting "Out of memory" may not sound like a big problem, but
> the scary thing is - the last VACUUM's memory request may succeed and
> that means following queries start failing and that is big problem.

Maybe what we should do is spill the TID list to disk instead. TODO for 8.4?

--
Alvaro Herrera
Tom Lane escribió:
> "Marko Kreen" <markokr@gmail.com> writes:
>> Note that it's much better to err on the smaller values.
>> Extra index pass is really no problem.
>
> I beg to differ ...
>
> What this may actually suggest is that autovacuum needs its own value of
> "maintenance_work_mem", or that it should automatically divide the value
> by the authorized number of workers.

Hmm, this makes sense.

--
Alvaro Herrera Valdivia, Chile ICBM: S 39º 49' 18.1", W 73º 13' 56.4"
"How we put our fingers into each other's clay. That is friendship: playing
potter and seeing what shapes we can make of each other" (C. Halloway in
Something Wicked This Way Comes, R. Bradbury)
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> I'm not having much luck really. I think the problem is that ANALYZE
>> stores reltuples as the number of live tuples, so if you delete a big
>> portion of a big table, then ANALYZE and then VACUUM, there's a huge
>> misestimation and extra index cleanup passes happen, which is a bad
>> thing.
>
> Yeah ... so just go with a constant estimate of say 200 deletable tuples
> per page?

How about we use a constant estimate using the average tuple width code?

--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
"In fact, the basic problem with Perl 5's subroutines is that they're not
crufty enough, so the cruft leaks out into user-defined code instead, by
the Conservation of Cruft Principle." (Larry Wall, Apocalypse 6)
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane escribió:
>> Yeah ... so just go with a constant estimate of say 200 deletable tuples
>> per page?
>
> How about we use a constant estimate using the average tuple width code?

I think that's overthinking the problem. The point here is mostly for vacuum to not consume 512MB (or whatever you have maintenance_work_mem set to) when vacuuming a ten-page table. I think that if we significantly increase the risk of having to make multiple index passes on medium-size tables, we'll not be doing anyone any favors.

If we went with allocating MaxHeapTuplesPerPage slots per page (292 in CVS HEAD), 512MB would correspond to a bit over 300,000 pages, and you'd get memory savings for anything less than that. But that's already a 2GB table --- do you want to risk multiple index passes because you were chintzy with your memory allocation?

Ultimately, the answer for a DBA who sees "out of memory" a lot is to reduce his maintenance_work_mem. I don't think VACUUM should be trying to substitute for the DBA's judgment.

BTW, if an autovac worker gets an elog(ERROR) on one table, does it die or continue on with the next table?

regards, tom lane
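[Tom's "a bit over 300,000 pages" figure checks out under the same assumptions (292 TID slots per page, 6 bytes per TID, 8KB pages):]

```python
# How many heap pages a full 512MB TID allocation can cover when
# reserving MaxHeapTuplesPerPage slots per page.
maintenance_work_mem = 512 * 1024 * 1024
slots_per_page = 292     # MaxHeapTuplesPerPage in CVS HEAD, per Tom; build-dependent
tid_size = 6             # assumed sizeof(ItemPointerData)
pages = maintenance_work_mem // (slots_per_page * tid_size)
table_gb = pages * 8192 / 1024**3
print(pages)             # 306433 -- "a bit over 300,000 pages"
print(round(table_gb, 1))  # ~2.3 -- roughly the "2GB table" Tom mentions
```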
On 8/30/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Marko Kreen" <markokr@gmail.com> writes:
> > Note that it's much better to err on the smaller values.
>
> > Extra index pass is really no problem.
>
> I beg to differ ...

Well, if Postgres tries to cut down on passes by using max memory, then
the admin is forced to cut down maint_mem for safety reasons...

> What this may actually suggest is that autovacuum needs its own value of
> "maintenance_work_mem", or that it should automatically divide the value
> by the authorized number of workers.

+1

Autovacuum having its own value, and handing out pieces of it to the
individual vacuums, is probably the easiest path.  Then the per-backend
maint_mem does not need to be large.

-- 
marko
Tom Lane escribió:
> BTW, if an autovac worker gets an elog(ERROR) on one table, does it die
> or continue on with the next table?

It continues with the next table if interrupted (SIGINT), but the worker
exits on any other error.  I would ask you to review that code -- it's
in do_autovacuum, the PG_TRY block at the end.  It was committed in rev
1.52 of autovacuum.c.

-- 
Alvaro Herrera                        http://www.advogato.org/person/alvherre
"We are who we choose to be", sang the goldfinch
when the sun is high (Sandman)
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane escribió:
>> BTW, if an autovac worker gets an elog(ERROR) on one table, does it die
>> or continue on with the next table?

> It continues with the next table if interrupted (SIGINT), but the worker
> exits on any other error.

Hmm, that seems backwards.  What's the rationale?

			regards, tom lane
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> >> BTW, if an autovac worker gets an elog(ERROR) on one table, does it die
> >> or continue on with the next table?
>
> > It continues with the next table if interrupted (SIGINT), but the worker
> > exits on any other error.
>
> Hmm, that seems backwards.  What's the rationale?

I don't remember :-(  We now use SIGTERM for shutdown, which closes the
worker for good.  It may well be that we need to change this so that the
worker is shut down on specific error codes, instead of continuing on
some other codes.  Or it may need more thorough surgery.

-- 
Alvaro Herrera                               http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
>> VACUUM getting "Out of memory" may not sound like a big problem, but
>> the scary thing is - the last VACUUM's memory request may succeed and
>> that means following queries start failing and that is big problem.

> Maybe what we should do is spill the TID list to disk instead.  TODO for
> 8.4?

I'm inclined to think that that'd be counterproductive.  Searching the
TID list has to be *fast*, else the index cleanup scans will take
forever.  It's probably better to have a smaller list and do two index
passes than to try to do it in one pass using a list that doesn't fit in
memory --- in the former case you're at least doing a seqscan of the
index, rather than randomly faulting in different parts of the TID list.

It's fairly likely that 512MB is already too high and is causing some
swapping on the OP's machine ...

			regards, tom lane
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> >> VACUUM getting "Out of memory" may not sound like a big problem, but
> >> the scary thing is - the last VACUUM's memory request may succeed and
> >> that means following queries start failing and that is big problem.
>
> > Maybe what we should do is spill the TID list to disk instead.  TODO for
> > 8.4?
>
> I'm inclined to think that that'd be counterproductive.  Searching the
> TID list has to be *fast*, else the index cleanup scans will take
> forever.  It's probably better to have a smaller list and do two index
> passes than to try to do it in one pass using a list that doesn't fit in
> memory --- in the former case you're at least doing a seqscan of the
> index, rather than randomly faulting in different parts of the TID list.

Maybe we could load it in a more compact form after the heap cleanup
pass, instead of a plain TID list.

-- 
Alvaro Herrera                               http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Maybe we could load it in a more compact form after the heap cleanup
> pass, instead of a plain TID list.

Hmm ... a non-lossy bitmap would be an interesting alternative.
I think I went with bsearch mainly because I knew I could generate
the TID list already sorted.  Somebody should try to measure the
probe time of the tidbitmap code against bsearch ...

			regards, tom lane
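[For anyone curious what such a measurement might look like, here is a toy
sketch in Python -- purely illustrative, with made-up sizes; the real test
would pit tidbitmap.c against the bsearch in vacuumlazy.c.  A sorted-list
binary search probe is compared against a hash set standing in for the
non-lossy bitmap:]

```python
# Toy stand-in for the suggested measurement: probe a sorted "dead TID"
# list with binary search vs. a set acting as a non-lossy bitmap.
# All sizes here are invented for illustration.
import bisect
import random

random.seed(42)
tids = sorted(random.sample(range(10_000_000), 200_000))  # sorted dead-TID list
tidset = set(tids)                                        # bitmap stand-in
probes = random.sample(range(10_000_000), 10_000)         # index entries to check

def hits_by_bsearch():
    n = 0
    for p in probes:
        i = bisect.bisect_left(tids, p)
        if i < len(tids) and tids[i] == p:
            n += 1
    return n

def hits_by_bitmap():
    return sum(p in tidset for p in probes)

# both strategies must agree on which probed entries point to dead tuples
assert hits_by_bsearch() == hits_by_bitmap()
```

[Timing the two probe loops, e.g. with timeit, would give the comparison
asked for above.]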
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > I'm not having much luck really.  I think the problem is that ANALYZE
> > stores reltuples as the number of live tuples, so if you delete a big
> > portion of a big table, then ANALYZE and then VACUUM, there's a huge
> > misestimation and extra index cleanup passes happen, which is a bad
> > thing.
>
> Yeah ... so just go with a constant estimate of say 200 deletable tuples
> per page?

This seems the most reasonable approach for now.  So this is the patch.

-- 
Alvaro Herrera                        http://www.PlanetPostgreSQL.org/
"I call it GNU/Linux. Except the GNU/ is silent."  (Ben Reiter)
Alvaro Herrera <alvherre@commandprompt.com> writes:
> ! 	/* no need to allocate more space than we have pages */
> ! 	maxtuples = LAZY_ALLOC_TUPLES * relblocks;

Uh ... you need to guard against integer overflow in this calculation.
Perhaps leave the current initialization alone, and then after the
Min-reduction steps add

	/* curious coding to ensure the multiplication can't overflow */
	if ((BlockNumber) (maxtuples / LAZY_ALLOC_TUPLES) > relblocks)
		maxtuples = relblocks * LAZY_ALLOC_TUPLES;

which is demonstrably overflow-proof once maxtuples has been clamped
to less than INT_MAX.

			regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Maybe we could load it in a more compact form after the heap cleanup
>> pass, instead of a plain TID list.
>
> Hmm ... a non-lossy bitmap would be an interesting alternative.
> I think I went with bsearch mainly because I knew I could generate
> the TID list already sorted.  Somebody should try to measure the
> probe time of the tidbitmap code against bsearch ...

Incidentally, there was a previous discussion about this a while back.
I can't seem to find it in the archives though.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
Alvaro Herrera <alvherre@commandprompt.com> writes:
> It continues with the next table if interrupted (SIGINT), but the worker
> exits on any other error.  I would ask you to review that code -- it's
> in do_autovacuum, the PG_TRY block at the end.  It was committed in rev
> 1.52 of autovacuum.c.

While looking at this I came across something I didn't like at all:

	 * We somewhat ignore the risk that the launcher changes its PID
	 * between we reading it and the actual kill; we expect ProcKill to be
	 * called shortly after us, and we assume that PIDs are not reused too
	 * quickly after a process exits.

I'm fairly sure that Windows has a bad habit of recycling PIDs almost
immediately.  I didn't actually read the code to see what the assumption
is for --- I just noticed this comment and it set off alarm bells.

Can you rework the logic to not depend on PIDs at all?  (Perhaps the
"session IDs" that Florian's patch will create would serve instead?
I imagine those will be assigned during InitProcess, so they should be
available to identify individual autovac workers.)

			regards, tom lane
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > It continues with the next table if interrupted (SIGINT), but the worker
> > exits on any other error.  I would ask you to review that code -- it's
> > in do_autovacuum, the PG_TRY block at the end.  It was committed in rev
> > 1.52 of autovacuum.c.
>
> While looking at this I came across something I didn't like at all:
>
> 	 * We somewhat ignore the risk that the launcher changes its PID
> 	 * between we reading it and the actual kill; we expect ProcKill to be
> 	 * called shortly after us, and we assume that PIDs are not reused too
> 	 * quickly after a process exits.
>
> I'm fairly sure that Windows has a bad habit of recycling PIDs almost
> immediately.  I didn't actually read the code to see what the assumption
> is for --- I just noticed this comment and it set off alarm bells.

Well, this is not much of a risk, because what's going on is that the
worker wants to signal the launcher.  So the launcher would need to shut
down for this to happen, which would be pretty rare on its own.  Also,
note that the time interval we're talking about is between one proc_exit
handler firing and the next.

Also, note that the worst thing that can happen is that the wrong
process gets a SIGUSR1 signal, and the launcher misses an opportunity
for starting another worker and rebalancing the vacuum cost parameters.

-- 
Alvaro Herrera                 http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"No me acuerdo, pero no es cierto.  No es cierto, y si fuera cierto,
no me acuerdo."  (Augusto Pinochet a una corte de justicia)
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Also, note that the worst thing that can happen is that the wrong
> process gets a SIGUSR1 signal, and the launcher misses an opportunity
> for starting another worker and rebalancing the vacuum cost parameters.

Hmmm ... okay, but I note that part of that assumption is that every
postgres-owned process either ignores SIGUSR1 or handles it in a
fashion such that an extra signal won't cause any Bad Things.  This is
not obvious, especially considering that the Unix default action for
SIGUSR1 is abnormal process termination.

I'm starting to think that we need a README somewhere collecting all
the system's assumptions about signal handling.

			regards, tom lane
Tom Lane escribió:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > ! 	/* no need to allocate more space than we have pages */
> > ! 	maxtuples = LAZY_ALLOC_TUPLES * relblocks;
>
> Uh ... you need to guard against integer overflow in this calculation.

Done this way.  I applied it all the way back to 8.0, because 8.2 was
requested and supporting 8.1 in this way still seems necessary; and the
8.1 patch applied cleanly to 8.0 so I did that one too.

It didn't apply cleanly to 7.4 though, but I don't think I've ever seen
a complaint about this with so old a version.  It's only a wording
change in a comment though, so it's trivial to apply it back if
somebody feels it's needed.

Sorry it didn't make it into 8.2.5 :-(

-- 
Alvaro Herrera                               http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support