Thread: Vacuum full - disk space eaten by WAL logfiles
Hi all,
When we do our weekly “vacuum full”, PG uses up all the disk space and goes down.
checkpoint_segments | 30
checkpoint_timeout | 300
select version();
version
-------------------------------------------------------------
PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC 2.96
The error message is:
Jan 8 20:25:33 mybox postgres[1602]: [13] PANIC: ZeroFill failed to write /my/pg_xlog/xlogtemp.1602: No space left on device
Jan 8 20:25:35 mybox postgres[8213]: [163] LOG: removing transaction log file 00001AB2000000EC
Jan 8 20:25:35 mybox postgres[1602]: [14-1] LOG: statement: COPY xxxxx (domain, domain_id, customer_id, action_unspecified, action_unknown,
Jan 8 20:25:35 mybox postgres[8213]: [164] LOG: removing transaction log file 00001AB2000000ED
Jan 8 20:25:35 mybox postgres[8213]: [165] LOG: removing transaction log file 00001AB2000000EE
Jan 8 20:25:35 mybox postgres[1602]: [14-2] action_none, action_deny, action_fail, action_strip, action_tag, action_quarantine, action_clean, action_copy, action_allow,
Jan 8 20:25:35 mybox postgres[8213]: [166] LOG: removing transaction log file 00001AB2000000F0
Jan 8 20:25:35 mybox postgres[1602]: [14-3] module, period, created) FROM stdin
Jan 8 20:25:35 mybox postgres[8213]: [167] LOG: removing transaction log file 00001AB2000000F1
Jan 8 20:25:35 mybox postgres[8213]: [168] LOG: removing transaction log file 00001AB2000000F2
Jan 8 20:25:36 mybox postgres[8213]: [169] LOG: removing transaction log file 00001AB2000000F3
Jan 8 20:25:36 mybox postgres[8213]: [170] LOG: removing transaction log file 00001AB2000000F4
Jan 8 20:25:36 mybox postgres[8213]: [171] LOG: removing transaction log file 00001AB2000000F5
Jan 8 20:25:36 mybox postgres[862]: [13] LOG: server process (pid 1602) was terminated by signal 6
Jan 8 20:25:36 mybox postgres[862]: [14] LOG: terminating any other active server processes
Jan 8 20:25:36 mybox postgres[8601]: [13-1] WARNING: copy: line 1, Message from PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[8601]: [13-2] ^IThe Postmaster has informed me that some other backend
Jan 8 20:25:36 mybox postgres[8601]: [13-3] ^Idied abnormally and possibly corrupted shared memory.
Jan 8 20:25:36 mybox postgres[8601]: [13-4] ^II have rolled back the current transaction and am
Jan 8 20:25:36 mybox postgres[7284]: [13-1] WARNING: Message from PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[3658]: [13-1] WARNING: Message from PostgreSQL backend:
While the error is happening, we see as many as 563 WAL files under pg_xlog at one time. My understanding is that we
should expect at most 61 WAL files (2 * checkpoint_segments + 1), each 16MB.
Our /my/pg_xlog is a 10GB partition that typically has 8GB free.
The table being vacuumed full is 35GB.
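
As a rough sanity check (an editor's sketch, using only the numbers above): with checkpoint_segments = 30, the documented ceiling of 2 * checkpoint_segments + 1 segments works out to under 1GB, while 563 segments at 16MB each is about 8.8GB, enough to exhaust the 8GB of free space:

    # Back-of-the-envelope check; figures taken from this thread.
    checkpoint_segments=30
    segment_mb=16
    echo "expected ceiling: $(( (2 * checkpoint_segments + 1) * segment_mb )) MB"  # 976 MB
    echo "observed usage:   $(( 563 * segment_mb )) MB"                            # 9008 MB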
What can I do to fix this mess??
Regards,
"Lee Wu" <Lwu@mxlogic.com> writes: > When we do weekly "vacuum full", PG uses all space and causes PG down. This implies that checkpoints aren't completing for some reason. If they were, they'd be recycling WAL space. I'm not aware of any problems in 7.3 that would block a checkpoint indefinitely, but we have seen cases where it just took too darn long to do the checkpoint --- implying either a ridiculously large shared_buffers setting, or a drastic shortage of I/O bandwidth. You might want to try strace'ing the checkpoint process to see if it seems to be making progress or not. Also, are you certain that this is happening during a VACUUM? The log messages you show refer to COPY commands. > PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC 2.96 Are you aware of the number and significance of post-7.3.2 bug fixes in the 7.3 branch? You really ought to be on 7.3.8, if you can't afford to migrate to 7.4 right now. regards, tom lane
Hi Tom,

1. shared_buffers | 32768
2. I/O bandwidth is not an issue, to the best of our knowledge.
3. It is "vacuum full", as shown:

Jan 8 20:25:38 mybox postgres[8603]: [15] FATAL: The database system is in recovery mode
Jan 8 20:25:38 mybox postgres[7284]: [14] LOG: statement: vacuum full analyze the_35G_table
Jan 8 20:25:39 mybox postgres[8604]: [15] FATAL: The database system is in recovery mode

Also, this error happened on each of the last two Saturdays and matched our vacuum log timing:

20050108-194000 End vacuum full analyze on table1.
20050108-194000 Begin vacuum full analyze on the_35G_table.
WARNING: Message from PostgreSQL backend:
        The Postmaster has informed me that some other backend
        died abnormally and possibly corrupted shared memory.
        I have rolled back the current transaction and am
        going to terminate your database system connection and exit.
        Please reconnect to the database system and repeat your query.
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost
20050108-202539 End vacuum full analyze on the_35G_table.
20050108-202539 Begin vacuum full analyze on table3.
psql: FATAL: The database system is in recovery mode

We only do vacuum full on Saturdays; this error has not been seen at any other time.

4. The PG upgrade issue is out of my (a DBA's) control.

Thanks,
"Lee Wu" <Lwu@mxlogic.com> writes: > 3. It is "vacuum full" as shown: Hmm. I'm not able to replicate any such problem here: AFAICS, checkpoints go through to completion just fine while a VACUUM is running. This might mean that the problem was fixed between 7.3.2 and 7.3.8 (though I see nothing relevant-looking in the changelog). Or it might mean that you've not given enough information to let someone else duplicate the problem. What else is going on in parallel with the VACUUM? (We know at least some COPY operations...) Again, it would be useful to know whether the checkpoint process is making progress or not. regards, tom lane
Hi Tom,

I am re-sending this because my last email, which had an attachment, did not get through.

It is a live production database, and I do not know what else happened on Saturdays because I do not work on Saturdays. However, I can ask our shift to find out what other queries are running in PG next Saturday, if that helps you/us.

On the other hand, shouldn't PG use a fixed number/amount of log files like Oracle, even when there are other activities at the same time? From the archives, I see a post: http://archives.postgresql.org/pgsql-hackers/2004-07/msg01055.php It said:

> > You have 35 because the max files in pg_xlog is
> > 2*checkpoint_segments + 1 or something like that. This is documented
> > in the SGML.

Here is the postgres log file, except that I changed the hostname and table names. I do not know how to strace the checkpoint process and would appreciate it if you can show me:
1. how
2. when

Thanks,

Postgres.log:
Jan 8 20:15:52 mybox postgres[8037]: [73] LOG: recycled transaction log file 00001AB100000060
all other recycling transaction log ...
Jan 8 20:15:52 mybox postgres[8037]: [74] LOG: removing transaction log file 00001AB100000061
all other removing transaction log ...
Jan 8 20:17:27 mybox postgres[8213]: [13] LOG: recycled transaction log file 00001AB2000000A3
all other recycling transaction log ...
Jan 8 20:17:42 mybox postgres[8213]: [74] LOG: removing transaction log file 00001AB200000077
all other removing transaction log ...
Jan 8 20:25:33 mybox postgres[1602]: [13] PANIC: ZeroFill failed to write /my/pg_xlog/xlogtemp.1602: No space left on device
Jan 8 20:25:35 mybox postgres[8213]: [163] LOG: removing transaction log file 00001AB2000000EC
Jan 8 20:25:35 mybox postgres[1602]: [14-1] LOG: statement: COPY table1 (domain, domain_id, customer_id, action_unspecified, action_unknown,
Jan 8 20:25:35 mybox postgres[8213]: [164] LOG: removing transaction log file 00001AB2000000ED
Jan 8 20:25:35 mybox postgres[8213]: [165] LOG: removing transaction log file 00001AB2000000EE
Jan 8 20:25:35 mybox postgres[1602]: [14-2] action_none, action_deny, action_fail, action_strip, action_tag, action_quarantine, action_clean, action_copy, action_allow,
Jan 8 20:25:35 mybox postgres[8213]: [166] LOG: removing transaction log file 00001AB2000000F0
Jan 8 20:25:35 mybox postgres[1602]: [14-3] module, period, created) FROM stdin
Jan 8 20:25:35 mybox postgres[8213]: [167] LOG: removing transaction log file 00001AB2000000F1
Jan 8 20:25:35 mybox postgres[8213]: [168] LOG: removing transaction log file 00001AB2000000F2
Jan 8 20:25:36 mybox postgres[8213]: [169] LOG: removing transaction log file 00001AB2000000F3
Jan 8 20:25:36 mybox postgres[8213]: [170] LOG: removing transaction log file 00001AB2000000F4
Jan 8 20:25:36 mybox postgres[8213]: [171] LOG: removing transaction log file 00001AB2000000F5
Jan 8 20:25:36 mybox postgres[862]: [13] LOG: server process (pid 1602) was terminated by signal 6
Jan 8 20:25:36 mybox postgres[862]: [14] LOG: terminating any other active server processes
Jan 8 20:25:36 mybox postgres[8601]: [13-1] WARNING: copy: line 1, Message from PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[8601]: [13-2] ^IThe Postmaster has informed me that some other backend
Jan 8 20:25:36 mybox postgres[8601]: [13-3] ^Idied abnormally and possibly corrupted shared memory.
Jan 8 20:25:36 mybox postgres[8601]: [13-4] ^II have rolled back the current transaction and am
Jan 8 20:25:36 mybox postgres[7284]: [13-1] WARNING: Message from PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[3658]: [13-1] WARNING: Message from PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[29685]: [13-1] WARNING: Message from PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[1605]: [13-1] WARNING: Message from PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[29687]: [13-1] WARNING: Message from PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[1603]: [13-1] WARNING: Message from PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[29684]: [13-1] WARNING: Message from PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[1604]: [13-1] WARNING: Message from PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[29686]: [13-1] WARNING: Message from PostgreSQL backend:
Jan 8 20:25:36 mybox postgres[8601]: [13-5] ^Igoing to terminate your database system connection and exit.
Jan 8 20:25:36 mybox postgres[7284]: [13-2] ^IThe Postmaster has informed me that some other backend
Jan 8 20:25:36 mybox postgres[3658]: [13-2] ^IThe Postmaster has informed me that some other backend
Jan 8 20:25:36 mybox postgres[29685]: [13-2] ^IThe Postmaster has informed me that some other backend
Jan 8 20:25:36 mybox postgres[1605]: [13-2] ^IThe Postmaster has informed me that some other backend
Jan 8 20:25:36 mybox postgres[29687]: [13-2] ^IThe Postmaster has informed me that some other backend
Jan 8 20:25:36 mybox postgres[1603]: [13-2] ^IThe Postmaster has informed me that some other backend
Jan 8 20:25:36 mybox postgres[29684]: [13-2] ^IThe Postmaster has informed me that some other backend
Jan 8 20:25:36 mybox postgres[1604]: [13-2] ^IThe Postmaster has informed me that some other backend
Jan 8 20:25:37 mybox postgres[29686]: [13-2] ^IThe Postmaster has informed me that some other backend
Jan 8 20:25:37 mybox postgres[8601]: [13-6] ^IPlease reconnect to the database system and repeat your query.
Jan 8 20:25:37 mybox postgres[7284]: [13-3] ^Idied abnormally and possibly corrupted shared memory.
Jan 8 20:25:37 mybox postgres[3658]: [13-3] ^Idied abnormally and possibly corrupted shared memory.
Jan 8 20:25:37 mybox postgres[29685]: [13-3] ^Idied abnormally and possibly corrupted shared memory.
Jan 8 20:25:37 mybox postgres[1605]: [13-3] ^Idied abnormally and possibly corrupted shared memory.
Jan 8 20:25:37 mybox postgres[29687]: [13-3] ^Idied abnormally and possibly corrupted shared memory.
Jan 8 20:25:37 mybox postgres[1603]: [13-3] ^Idied abnormally and possibly corrupted shared memory.
Jan 8 20:25:37 mybox postgres[29684]: [13-3] ^Idied abnormally and possibly corrupted shared memory.
Jan 8 20:25:37 mybox postgres[1604]: [13-3] ^Idied abnormally and possibly corrupted shared memory.
Jan 8 20:25:37 mybox postgres[29686]: [13-3] ^Idied abnormally and possibly corrupted shared memory.
Jan 8 20:25:37 mybox postgres[8601]: [14-1] LOG: copy: line 1, statement: COPY table1 (domain, domain_id, customer_id, email, bytes_received, mails,
Jan 8 20:25:37 mybox postgres[7284]: [13-4] ^II have rolled back the current transaction and am
Jan 8 20:25:37 mybox postgres[3658]: [13-4] ^II have rolled back the current transaction and am
Jan 8 20:25:37 mybox postgres[29685]: [13-4] ^II have rolled back the current transaction and am
Jan 8 20:25:37 mybox postgres[1605]: [13-4] ^II have rolled back the current transaction and am
Jan 8 20:25:37 mybox postgres[29687]: [13-4] ^II have rolled back the current transaction and am
Jan 8 20:25:37 mybox postgres[1603]: [13-4] ^II have rolled back the current transaction and am
Jan 8 20:25:37 mybox postgres[29684]: [13-4] ^II have rolled back the current transaction and am
Jan 8 20:25:37 mybox postgres[1604]: [13-4] ^II have rolled back the current transaction and am
Jan 8 20:25:37 mybox postgres[29686]: [13-4] ^II have rolled back the current transaction and am
Jan 8 20:25:37 mybox postgres[8601]: [14-2] period, created) FROM stdin
Jan 8 20:25:37 mybox postgres[7284]: [13-5] ^Igoing to terminate your database system connection and exit.
Jan 8 20:25:38 mybox postgres[3658]: [13-5] ^Igoing to terminate your database system connection and exit.
Jan 8 20:25:38 mybox postgres[29685]: [13-5] ^Igoing to terminate your database system connection and exit.
Jan 8 20:25:38 mybox postgres[1605]: [13-5] ^Igoing to terminate your database system connection and exit.
Jan 8 20:25:38 mybox postgres[29687]: [13-5] ^Igoing to terminate your database system connection and exit.
Jan 8 20:25:38 mybox postgres[1603]: [13-5] ^Igoing to terminate your database system connection and exit.
Jan 8 20:25:38 mybox postgres[29684]: [13-5] ^Igoing to terminate your database system connection and exit.
Jan 8 20:25:38 mybox postgres[1604]: [13-5] ^Igoing to terminate your database system connection and exit.
Jan 8 20:25:38 mybox postgres[29686]: [13-5] ^Igoing to terminate your database system connection and exit.
Jan 8 20:25:38 mybox postgres[8602]: [15] FATAL: The database system is in recovery mode
Jan 8 20:25:38 mybox postgres[7284]: [13-6] ^IPlease reconnect to the database system and repeat your query.
Jan 8 20:25:38 mybox postgres[3658]: [13-6] ^IPlease reconnect to the database system and repeat your query.
Jan 8 20:25:38 mybox postgres[29685]: [13-6] ^IPlease reconnect to the database system and repeat your query.
Jan 8 20:25:38 mybox postgres[1605]: [13-6] ^IPlease reconnect to the database system and repeat your query.
Jan 8 20:25:38 mybox postgres[29687]: [13-6] ^IPlease reconnect to the database system and repeat your query.
Jan 8 20:25:38 mybox postgres[1603]: [13-6] ^IPlease reconnect to the database system and repeat your query.
Jan 8 20:25:38 mybox postgres[29684]: [13-6] ^IPlease reconnect to the database system and repeat your query.
Jan 8 20:25:38 mybox postgres[1604]: [13-6] ^IPlease reconnect to the database system and repeat your query.
Jan 8 20:25:38 mybox postgres[29686]: [13-6] ^IPlease reconnect to the database system and repeat your query.
Jan 8 20:25:38 mybox postgres[8603]: [15] FATAL: The database system is in recovery mode
Jan 8 20:25:38 mybox postgres[7284]: [14] LOG: statement: set vacuum_mem=131072; vacuum full analyze the_35G_table
Jan 8 20:25:39 mybox postgres[8604]: [15] FATAL: The database system is in recovery mode
All other The database system is in recovery mode...
Jan 8 20:25:41 mybox postgres[862]: [15] LOG: all server processes terminated; reinitializing shared memory and semaphores
Jan 8 20:25:41 mybox postgres[9355]: [16] FATAL: The database system is starting up
Jan 8 20:25:41 mybox postgres[9346]: [16] LOG: database system was interrupted at 2005-01-08 20:25:32 MST
Jan 8 20:25:41 mybox postgres[9365]: [16] FATAL: The database system is starting up
Jan 8 20:25:41 mybox postgres[9346]: [17] LOG: checkpoint record is at 1AB3/692F2358
Jan 8 20:25:41 mybox postgres[9378]: [16] FATAL: The database system is starting up
Jan 8 20:25:41 mybox postgres[9346]: [18] LOG: redo record is at 1AB3/584199FC; undo record is at 0/0; shutdown FALSE
Jan 8 20:25:42 mybox postgres[9477]: [16] FATAL: The database system is starting up
Jan 8 20:25:42 mybox postgres[9346]: [19] LOG: next transaction id: 85474993; next oid: 901857280
Jan 8 20:25:42 mybox postgres[9478]: [16] FATAL: The database system is starting up
Jan 8 20:25:42 mybox postgres[9346]: [20] LOG: database system was not properly shut down; automatic recovery in progress
Jan 8 20:25:42 mybox postgres[9479]: [16] FATAL: The database system is starting up
Jan 8 20:25:42 mybox postgres[9346]: [21] LOG: redo starts at 1AB3/584199FC
Jan 8 20:25:42 mybox postgres[9481]: [16] FATAL: The database system is starting up
All other The database system is starting up...
Jan 8 20:37:38 mybox postgres[9346]: [22] LOG: open of /my/pg_xlog/00001AB500000030 (log file 6837, segment 48) failed: No such file or directory
Jan 8 20:37:38 mybox postgres[9346]: [23] LOG: redo done at 1AB5/2FFFC1F8
Jan 8 20:37:45 mybox postgres[9346]: [24] LOG: recycled transaction log file 00001AB300000021
All other recycling transaction log ...
Jan 8 20:37:48 mybox postgres[9346]: [85] LOG: removing transaction log file 00001AB2000000FB
Jan 8 20:37:48 mybox postgres[9346]: [86] LOG: removing transaction log file 00001AB2000000FC
Jan 8 20:37:48 mybox postgres[14114]: [16] FATAL: The database system is starting up
Jan 8 20:37:48 mybox postgres[9346]: [87] LOG: removing transaction log file 00001AB2000000FD
All other removing transaction log ...
Jan 8 20:43:28 mybox postgres[14419]: [484] LOG: removing transaction log file 00001AB50000002E
Jan 8 20:48:38 mybox postgres[14547]: [16] LOG: recycled transaction log file 00001AB500000031
"Lee Wu" <Lwu@mxlogic.com> writes: > On the other hand, should not PG use fix number/amount > of LOG files like Oracle even though there may be other activities at > the same time? No, it certainly should not. Anyone who's admin'd an Oracle installation will tell you what a PITA it is that Oracle keels over and dies when you exceed the fixed log space allocation. The real question here is why the log space isn't getting recycled in a timely fashion. > Postgres.log: > Jan 8 20:15:52 mybox postgres[8037]: [73] LOG: recycled transaction > log file 00001AB100000060 > all other recycling transaction log ... > Jan 8 20:15:52 mybox postgres[8037]: [74] LOG: removing transaction > log file 00001AB100000061 > all other removing transaction log ... > Jan 8 20:17:27 mybox postgres[8213]: [13] LOG: recycled transaction > log file 00001AB2000000A3 > all other recycling transaction log ... > Jan 8 20:17:42 mybox postgres[8213]: [74] LOG: removing transaction > log file 00001AB200000077 > all other removing transaction log ... > Jan 8 20:25:33 mybox postgres[1602]: [13] PANIC: ZeroFill failed to > write /my/pg_xlog/xlogtemp.1602: No space left on device > Jan 8 20:25:35 mybox postgres[8213]: [163] LOG: removing transaction > log file 00001AB2000000EC > Jan 8 20:25:35 mybox postgres[1602]: [14-1] LOG: statement: COPY > table1 (domain, domain_id, customer_id, action_unspecified, > action_unknown, > Jan 8 20:25:35 mybox postgres[8213]: [164] LOG: removing transaction > log file 00001AB2000000ED > Jan 8 20:25:35 mybox postgres[8213]: [165] LOG: removing transaction > log file 00001AB2000000EE > Jan 8 20:25:35 mybox postgres[1602]: [14-2] action_none, action_deny, > action_fail, action_strip, action_tag, action_quarantine, action_clean, > action_copy, action_allow, Hmm. You seem to have removed all the evidence about the interesting question, which is what process 8213 (which was evidently doing a checkpoint) was doing between 20:17:42 and 20:25:35. Also, what postgresql.conf parameters are you using? The mix of "removing" and "recycling" operations seems a bit odd. regards, tom lane
Here is the postgres log from 20:17:42 to 20:25:35. I did NOT remove anything except the hostname and table names.

Thanks,

name | setting
--------------------------------+---------------------------------------
australian_timezones | off
authentication_timeout | 60
autocommit | on
checkpoint_segments | 30
checkpoint_timeout | 300
client_encoding | SQL_ASCII
client_min_messages | notice
commit_delay | 0
commit_siblings | 5
cpu_index_tuple_cost | 0.001
cpu_operator_cost | 0.0025
cpu_tuple_cost | 0.01
DateStyle | ISO with US (NonEuropean) conventions
db_user_namespace | off
deadlock_timeout | 2500
debug_pretty_print | off
debug_print_parse | off
debug_print_plan | off
debug_print_rewritten | off
default_statistics_target | 10
default_transaction_isolation | read committed
dynamic_library_path | $libdir
effective_cache_size | 32000
enable_hashjoin | on
enable_indexscan | on
enable_mergejoin | on
enable_nestloop | on
enable_seqscan | on
enable_sort | on
enable_tidscan | on
explain_pretty_print | on
fixbtree | on
fsync | off
geqo | on
geqo_effort | 1
geqo_generations | 0
geqo_pool_size | 0
geqo_random_seed | -1
geqo_selection_bias | 2
geqo_threshold | 11
hostname_lookup | off
krb_server_keyfile | unset
lc_messages | en_US.iso885915
lc_monetary | en_US.iso885915
lc_numeric | en_US.iso885915
lc_time | en_US.iso885915
log_connections | off
log_duration | off
log_min_error_statement | debug1
log_pid | off
log_statement | off
log_timestamp | off
max_connections | 400
max_expr_depth | 10000
max_files_per_process | 1000
max_fsm_pages | 5000000
max_fsm_relations | 500
max_locks_per_transaction | 64
password_encryption | on
port | 5432
pre_auth_delay | 0
random_page_cost | 1.1
search_path | $user,public
server_encoding | SQL_ASCII
server_min_messages | notice
shared_buffers | 32768
show_executor_stats | off
show_parser_stats | off
show_planner_stats | off
show_source_port | on
show_statement_stats | off
silent_mode | off
sort_mem | 131072
sql_inheritance | on
ssl | off
statement_timeout | 0
stats_block_level | on
stats_command_string | on
stats_reset_on_server_start | on
stats_row_level | on
stats_start_collector | on
superuser_reserved_connections | 2
syslog | 2
syslog_facility | LOCAL0
syslog_ident | postgres
tcpip_socket | on
TimeZone | unknown
trace_notify | off
TRANSACTION ISOLATION LEVEL | READ COMMITTED
transform_null_equals | off
unix_socket_directory | unset
unix_socket_group | unset
unix_socket_permissions | 511
vacuum_mem | 524288
virtual_host | unset
wal_buffers | 2048
wal_debug | 0
wal_sync_method | open_sync
(98 rows)

Jan 8 20:16:59 mybox postgres[8037]: [294] LOG: removing transaction log file 00001AB200000055
Jan 8 20:17:27 mybox postgres[8213]: [13] LOG: recycled transaction log file 00001AB2000000A3
Jan 8 20:17:27 mybox postgres[8213]: [14] LOG: recycled transaction log file 00001AB2000000A4
Jan 8 20:17:27 mybox postgres[8213]: [15] LOG: recycled transaction log file 00001AB2000000A5
Jan 8 20:17:27 mybox postgres[8213]: [16] LOG: recycled transaction log file 00001AB2000000A6
Jan 8 20:17:27 mybox postgres[8213]: [17] LOG: recycled transaction log file 00001AB2000000A7
Jan 8 20:17:27 mybox postgres[8213]: [18] LOG: recycled transaction log file 00001AB2000000A8
Jan 8 20:17:27 mybox postgres[8213]: [19] LOG: recycled transaction log file 00001AB2000000A9
Jan 8 20:17:27 mybox postgres[8213]: [20] LOG: recycled transaction log file 00001AB2000000AA
Jan 8 20:17:27 mybox postgres[8213]: [21] LOG: recycled transaction log file 00001AB2000000AB
Jan 8 20:17:27 mybox postgres[8213]: [22] LOG: recycled transaction log file 00001AB2000000AC
Jan 8 20:17:27 mybox postgres[8213]: [23] LOG: recycled transaction log file 00001AB2000000AD
Jan 8 20:17:27 mybox postgres[8213]: [24] LOG: recycled transaction log file 00001AB2000000AE
Jan 8 20:17:27 mybox postgres[8213]: [25] LOG: recycled transaction log file 00001AB2000000AF
Jan 8 20:17:30 mybox postgres[8213]: [26] LOG: recycled transaction log file 00001AB2000000B0
Jan 8 20:17:36 mybox postgres[8213]: [27] LOG: recycled transaction log file 00001AB2000000A2
Jan 8 20:17:36 mybox postgres[8213]: [28] LOG: recycled transaction log file 00001AB2000000B1
Jan 8 20:17:36 mybox postgres[8213]: [29] LOG: recycled transaction log file 00001AB2000000B2
Jan 8 20:17:37 mybox postgres[8213]: [30] LOG: recycled transaction log file 00001AB2000000B3
Jan 8 20:17:37 mybox postgres[8213]: [31] LOG: recycled transaction log file 00001AB2000000B4
Jan 8 20:17:38 mybox postgres[8213]: [32] LOG: recycled transaction log file 00001AB2000000B5
Jan 8 20:17:39 mybox postgres[8213]: [33] LOG: recycled transaction log file 00001AB2000000B8
Jan 8 20:17:39 mybox postgres[8213]: [34] LOG: recycled transaction log file 00001AB2000000BC
Jan 8 20:17:39 mybox postgres[8213]: [35] LOG: recycled transaction log file 00001AB2000000C0
Jan 8 20:17:40 mybox postgres[8213]: [36] LOG: recycled transaction log file 00001AB2000000C4
Jan 8 20:17:40 mybox postgres[8213]: [37] LOG: recycled transaction log file 00001AB2000000CA
Jan 8 20:17:40 mybox postgres[8213]: [38] LOG: recycled transaction log file 00001AB2000000D4
Jan 8 20:17:40 mybox postgres[8213]: [39] LOG: recycled transaction log file 00001AB2000000DE
Jan 8 20:17:40 mybox postgres[8213]: [40] LOG: recycled transaction log file 00001AB2000000E9
Jan 8 20:17:40 mybox postgres[8213]: [41] LOG: recycled transaction log file 00001AB2000000EF
Jan 8 20:17:40 mybox postgres[8213]: [42] LOG: recycled transaction log file 00001AB200000057
Jan 8 20:17:40 mybox postgres[8213]: [43] LOG: recycled transaction log file 00001AB200000058
Jan 8 20:17:40 mybox postgres[8213]: [44] LOG: recycled transaction log file 00001AB200000059
Jan 8 20:17:40 mybox postgres[8213]: [45] LOG: recycled transaction log file 00001AB20000005A
Jan 8 20:17:40 mybox postgres[8213]: [46] LOG: recycled transaction log file 00001AB20000005B
Jan 8 20:17:40 mybox postgres[8213]: [47] LOG: recycled transaction log file 00001AB20000005C
Jan 8 20:17:40 mybox postgres[8213]: [48] LOG: recycled transaction log file 00001AB20000005D
Jan 8 20:17:40 mybox postgres[8213]: [49] LOG: recycled transaction log file 00001AB20000005E
Jan 8 20:17:40 mybox postgres[8213]: [50] LOG: recycled transaction log file 00001AB20000005F
Jan 8 20:17:40 mybox postgres[8213]: [51] LOG: recycled transaction log file 00001AB200000060
Jan 8 20:17:40 mybox postgres[8213]: [52] LOG: recycled transaction log file 00001AB200000061
Jan 8 20:17:40 mybox postgres[8213]: [53] LOG: recycled transaction log file 00001AB200000062
Jan 8 20:17:41 mybox postgres[8213]: [54] LOG: recycled transaction log file 00001AB200000063
Jan 8 20:17:41 mybox postgres[8213]: [55] LOG: recycled transaction log file 00001AB200000064
Jan 8 20:17:41 mybox postgres[8213]: [56] LOG: recycled transaction log file 00001AB200000065
Jan 8 20:17:41 mybox postgres[8213]: [57] LOG: recycled transaction log file 00001AB200000066
Jan 8 20:17:41 mybox postgres[8213]: [58] LOG: recycled transaction log file 00001AB200000067
Jan 8 20:17:41 mybox postgres[8213]: [59] LOG: recycled transaction log file 00001AB200000068
Jan 8 20:17:41 mybox postgres[8213]: [60] LOG: recycled transaction log file 00001AB200000069
Jan 8 20:17:41 mybox postgres[8213]: [61] LOG: recycled transaction log file 00001AB20000006A
Jan 8 20:17:41 mybox postgres[8213]: [62] LOG: recycled transaction log file 00001AB20000006B
Jan 8 20:17:41 mybox postgres[8213]: [63] LOG: recycled transaction log file 00001AB20000006C
Jan 8 20:17:41 mybox postgres[8213]: [64] LOG: recycled transaction log file 00001AB20000006D
Jan 8 20:17:41 mybox postgres[8213]: [65] LOG: recycled transaction log file 00001AB20000006E
Jan 8 20:17:41 mybox postgres[8213]: [66] LOG: recycled transaction log file 00001AB20000006F
Jan 8 20:17:41 mybox postgres[8213]: [67] LOG: recycled transaction log file 00001AB200000070
Jan 8 20:17:42 mybox postgres[8213]: [68] LOG: recycled transaction log file 00001AB200000071
Jan 8 20:17:42 mybox postgres[8213]: [69] LOG: recycled transaction log file 00001AB200000072
Jan 8 20:17:42 mybox postgres[8213]: [70] LOG: recycled transaction log file 00001AB200000073
Jan 8 20:17:42 mybox postgres[8213]: [71] LOG: recycled transaction log file 00001AB200000074
Jan 8 20:17:42 mybox postgres[8213]: [72] LOG: recycled transaction log file 00001AB200000075
Jan 8 20:17:42 mybox postgres[8213]: [73] LOG: recycled transaction log file 00001AB200000076
Jan 8 20:17:42 mybox postgres[8213]: [74] LOG: removing transaction log file 00001AB200000077
Jan 8 20:17:42 mybox postgres[8213]: [75] LOG: removing transaction log file 00001AB200000078
Jan 8 20:17:43 mybox postgres[8213]: [76] LOG: removing transaction log file 00001AB200000079
Jan 8 20:17:44 mybox postgres[8213]: [77] LOG: removing transaction log file 00001AB20000007A
Jan 8 20:17:46 mybox postgres[8213]: [78] LOG: removing transaction log file 00001AB20000007B
Jan 8 20:17:46 mybox postgres[8213]: [79] LOG: removing transaction log file 00001AB20000007C
Jan 8 20:17:46 mybox postgres[8213]: [80] LOG: removing transaction log file 00001AB20000007D
Jan 8 20:17:46 mybox postgres[8213]: [81] LOG: removing transaction log file 00001AB20000007E
Jan 8 20:17:46 mybox postgres[8213]: [82] LOG: removing transaction log file 00001AB20000007F
Jan 8 20:17:46 mybox postgres[8213]: [83] LOG: removing transaction log file 00001AB200000080
Jan 8 20:17:46 mybox postgres[8213]: [84] LOG: removing transaction log file 00001AB200000081
Jan 8 20:17:46 mybox postgres[8213]: [85] LOG: removing transaction log file 00001AB200000082
Jan 8 20:17:47 mybox postgres[8213]: [86] LOG: removing transaction log file 00001AB200000083
Jan 8 20:17:48 mybox postgres[8213]: [87] LOG: removing transaction log file 00001AB200000084
Jan 8 20:17:49 mybox postgres[8213]: [88] LOG: removing transaction log file 00001AB200000085
Jan 8 20:17:54 mybox postgres[8213]: [89] LOG: removing transaction log file 00001AB200000086
Jan 8 20:17:57 mybox postgres[8213]: [90] LOG: removing transaction log file 00001AB200000087
Jan 8 20:18:21 mybox postgres[8213]: [91] LOG: removing transaction log file 00001AB200000088
Jan 8 20:18:32 mybox postgres[8213]: [92] LOG: removing transaction log file 00001AB200000089
Jan 8 20:18:40 mybox postgres[8213]: [93] LOG: removing transaction log file 00001AB20000008A
Jan 8 20:18:43 mybox postgres[8213]: [94] LOG: removing transaction log file 00001AB20000008B
Jan 8 20:18:46 mybox postgres[8213]: [95] LOG: removing transaction log file 00001AB20000008C
Jan 8 20:18:52 mybox postgres[8213]: [96] LOG: removing transaction log file 00001AB20000008D
Jan 8 20:19:01 mybox postgres[8213]: [97] LOG: removing transaction log file 00001AB20000008E
Jan 8 20:19:01 mybox postgres[8213]: [98] LOG: removing transaction log file 00001AB20000008F
Jan 8 20:19:11 mybox postgres[8213]: [99] LOG: removing transaction log file 00001AB200000090
Jan 8 20:19:27 mybox postgres[8213]: [100] LOG: removing transaction log file 00001AB200000091
Jan 8 20:19:31 mybox postgres[8213]: [101] LOG: removing transaction log file 00001AB200000092
Jan 8 20:19:34 mybox postgres[8213]: [102] LOG: removing transaction log file 00001AB200000093
Jan 8 20:19:36 mybox postgres[8213]: [103] LOG: removing transaction log file 00001AB200000094
Jan 8 20:19:36 mybox postgres[8213]: [104] LOG: removing transaction log file 00001AB200000095
Jan 8 20:19:36 mybox postgres[8213]: [105] LOG: removing transaction log file 00001AB200000096
Jan 8 20:19:36 mybox postgres[8213]: [106] LOG: removing transaction log file 00001AB200000097
Jan 8 20:19:36 mybox postgres[8213]: [107] LOG: removing transaction log file 00001AB200000098
Jan 8 20:19:36 mybox postgres[8213]: [108] LOG: removing transaction log file 00001AB200000099
Jan 8 20:19:37 mybox postgres[8213]: [109] LOG: removing transaction log file 00001AB20000009A
Jan 8 20:19:37 mybox postgres[8213]: [110] LOG: removing transaction log file 00001AB20000009B
Jan 8 20:19:37 mybox postgres[8213]: [111] LOG: removing transaction log file 00001AB20000009C
Jan 8 20:19:37 mybox postgres[8213]: [112] LOG: removing transaction log file 00001AB20000009D
Jan 8 20:19:38 mybox postgres[8213]: [113] LOG: removing transaction log file 00001AB20000009E
Jan 8 20:19:38 mybox postgres[8213]: [114] LOG: removing transaction log file 00001AB20000009F
Jan 8 20:19:40 mybox postgres[8213]: [115] LOG: removing transaction log file 00001AB2000000A0
Jan 8 20:19:40 mybox postgres[8213]: [116] LOG: removing transaction log file 00001AB2000000A1
Jan 8 20:19:43 mybox postgres[8213]: [117] LOG: removing transaction log file 00001AB2000000B6
Jan 8 20:19:43 mybox postgres[8213]: [118] LOG: removing transaction log file 00001AB2000000B7
Jan 8 20:19:44 mybox postgres[8213]: [119] LOG: removing transaction log file 00001AB2000000B9
Jan 8 20:19:53 mybox postgres[8213]: [120] LOG: removing transaction log file 00001AB2000000BA
Jan 8 20:20:25 mybox postgres[8213]: [121] LOG: removing transaction log file 00001AB2000000BB
Jan 8 20:21:02 mybox postgres[8213]: [122] LOG: removing transaction log file 00001AB2000000BD
Jan 8 20:21:23 mybox postgres[8213]: [123] LOG: removing transaction log file 00001AB2000000BE
Jan 8 20:21:33 mybox postgres[8213]: [124] LOG: removing transaction log file 00001AB2000000BF
Jan 8 20:21:38 mybox postgres[8213]: [125] LOG: removing transaction log file 00001AB2000000C1
Jan 8 20:21:41 mybox postgres[8213]: [126] LOG: removing transaction log file 00001AB2000000C2
Jan 8 20:21:43 mybox postgres[8213]: [127] LOG: removing transaction log file 00001AB2000000C3
Jan 8 20:22:01 mybox postgres[8213]: [128] LOG: removing transaction log file 00001AB2000000C5
Jan 8 20:22:01 mybox postgres[8213]: [129] LOG: removing transaction log file 00001AB2000000C6
Jan 8 20:22:01 mybox postgres[8213]: [130] LOG: removing transaction log file 00001AB2000000C7
Jan 8 20:22:01 mybox postgres[8213]: [131] LOG: removing transaction log file 00001AB2000000C8
Jan 8 20:22:01 mybox postgres[8213]: [132] LOG: removing transaction log file 00001AB2000000C9
Jan 8 20:22:08 mybox postgres[8213]: [133] LOG: removing transaction log file 00001AB2000000CB
Jan 8 20:22:08 mybox postgres[8213]: [134] LOG: removing transaction log file 00001AB2000000CC
Jan 8 20:22:08 mybox postgres[8213]: [135] LOG: removing transaction log file 00001AB2000000CD
Jan 8 20:22:14 mybox postgres[8213]: [136] LOG: removing transaction log file 00001AB2000000CE
Jan 8 20:22:29 mybox postgres[8213]: [137] LOG: removing transaction log file 00001AB2000000CF
Jan 8 20:22:36 mybox postgres[8213]: [138] LOG: removing transaction log file 00001AB2000000D0
Jan 8 20:22:49 mybox postgres[8213]: [139] LOG: removing transaction log file 00001AB2000000D1
Jan 8 20:22:56 mybox postgres[8213]: [140] LOG: removing transaction log file 00001AB2000000D2
Jan 8 20:23:01 mybox postgres[8213]: [141] LOG: removing transaction log file 00001AB2000000D3
Jan 8 20:23:01 mybox postgres[8213]: [142] LOG: removing transaction log file 00001AB2000000D5
Jan 8 20:23:01 mybox postgres[8213]: [143] LOG: removing transaction log file 00001AB2000000D6
Jan 8 20:23:27 mybox postgres[8213]: [144] LOG: removing transaction log file 00001AB2000000D7
Jan 8 20:23:31 mybox postgres[8213]: [145] LOG: removing transaction log file 00001AB2000000D8
Jan 8 20:23:35 mybox postgres[8213]: [146] LOG: removing transaction log file 00001AB2000000D9
Jan 8 20:23:35 mybox postgres[8213]: [147] LOG: removing transaction log file 00001AB2000000DA
Jan 8 20:23:43 mybox postgres[8213]: [148] LOG: removing transaction log file 00001AB2000000DB
Jan 8 20:23:47 mybox postgres[8213]: [149] LOG: removing transaction log file 00001AB2000000DC
Jan 8 20:24:05 mybox postgres[8213]: [150] LOG: removing transaction log file 00001AB2000000DD
Jan 8 20:24:17 mybox postgres[8213]: [151] LOG: removing transaction log file 00001AB2000000DF
Jan 8 20:24:17 mybox postgres[8213]: [152] LOG: removing transaction log file 00001AB2000000E0
Jan 8 20:24:17 mybox postgres[8213]: [153] LOG: removing transaction log file 00001AB2000000E1
Jan 8 20:24:21 mybox postgres[8213]: [154] LOG: removing transaction log file 00001AB2000000E2
Jan 8 20:24:26 mybox postgres[8213]: [155] LOG: removing transaction log file 00001AB2000000E3
Jan 8 20:24:26 mybox postgres[8213]: [156] LOG: removing transaction log file 00001AB2000000E4
Jan 8 20:24:26 mybox postgres[8213]: [157] LOG: removing transaction log file 00001AB2000000E5
Jan 8 20:24:26 mybox postgres[8213]: [158] LOG: removing transaction log file 00001AB2000000E6
Jan 8 20:24:31 mybox postgres[8213]: [159] LOG: removing transaction log file 00001AB2000000E7
Jan 8 20:24:51 mybox postgres[8213]: [160] LOG: removing transaction log file 00001AB2000000E8
Jan 8 20:25:02 mybox postgres[8213]: [161] LOG: removing transaction log file 00001AB2000000EA
Jan 8 20:25:25 mybox postgres[8213]: [162] LOG: removing transaction log file 00001AB2000000EB
Jan 8 20:25:33 mybox postgres[1602]: [13] PANIC: ZeroFill failed to write /my/pg_xlog/xlogtemp.1602: No space left on device
Jan 8 20:25:35 mybox postgres[8213]: [163] LOG: removing transaction log file 00001AB2000000EC
Jan 8 20:25:35 mybox postgres[1602]: [14-1] LOG: statement: COPY table1 (domain, domain_id, customer_id, action_unspecified, action_unknown,
Jan 8 20:25:35 mybox postgres[8213]: [164] LOG: removing transaction log file 00001AB2000000ED
Jan 8 20:25:35 mybox postgres[8213]: [165] LOG: removing transaction log file 00001AB2000000EE
Jan 8 20:25:35 mybox postgres[1602]: [14-2] action_none, action_deny, action_fail, action_strip, action_tag, action_quarantine, action_clean, action_copy, action_allow,
Jan 8 20:25:35 mybox postgres[8213]: [166] LOG: removing transaction log file 00001AB2000000F0
Jan 8 20:25:35 mybox postgres[1602]: [14-3] module, period, created) FROM stdin
Jan 8 20:25:35 mybox postgres[8213]: [167] LOG: removing transaction log file 00001AB2000000F1
Jan 8 20:25:35 mybox postgres[8213]: [168] LOG: removing transaction log file 00001AB2000000F2
Jan 8 20:25:36 mybox postgres[8213]: [169] LOG: removing transaction log file 00001AB2000000F3
"Lee Wu" <Lwu@mxlogic.com> writes: > Here is postgres log from 20:17:42 to 20:25:35. > I did NOT removing anything except hostname and table names. Hmm. That shows the checkpoint process (8213) in a tight loop removing unwanted XLOG files. But apparently it is being starved for cycles --- look at the gaps between log entries. There's no problem with removing eight per second at 20:17:46, but after that there are periods of 10 to 20 seconds where it doesn't get to run at all. It really should have finished this phase in a few seconds, but instead it's still trying to do the removal seven minutes later when the COPY process fails. Apparently the COPY and VACUUM and whatever else is going on are able to create/use XLOG files faster than the checkpoint process can remove them. Which makes no sense. I think there is something fairly odd about either your kernel scheduler or your filesystem. What sort of platform is this anyway? regards, tom lane
cat /etc/redhat-release
Red Hat Linux release 7.3 (Valhalla)

pg_xlog is on a local disk, while $PGDATA is on RAID 10.

Thanks,