Thread: Postgres slave not catching up (on 9.2)
Hello,
I have built up a hot-standby between a master running 9.2.4 and a slave running 9.2.9. I did the initial copy using "pg_basebackup" . My recovery.conf looks like:
standby_mode = 'on'
primary_conninfo = 'host=pXXXXXXXX port=XXXX user=replicator'
trigger_file = 'failover.now'
restore_command = 'test -f /ORA/dbs02/PUPDBTST/archive/%f && ln -fns /ORA/dbs02/PUPDBTST/archive/%f %p'
archive_cleanup_command = '/usr/local/pgsql/postgresql-9.2.9/bin/pg_archivecleanup /ORA/dbs03/PUPDBTST/data %r'
The slave (I don't have control on the master) is using 2 NFS file systems, one for WALs and another one for the data, on Netapp controllers:
dbnasg401-12a:/vol/dodpupdbtst02 on /ORA/dbs02/PUPDBTST type nfs (rw,remount,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,timeo=600)
dbnasg403-12a:/vol/dodpupdbtst03 on /ORA/dbs03/PUPDBTST type nfs (rw,remount,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,timeo=600)
The master produces quite a lot of WALs. This is what I get on the slave (number of WAL files, date-hour, Total size in MB), so per day is more than 400GB:
1582 20140930-17 25312MB
1328 20140930-18 21248MB
1960 20140930-19 31360MB
1201 20140930-20 19216MB
1467 20140930-21 23472MB
1298 20140930-22 20768MB
1579 20140930-23 25264MB
1646 20141001-00 26336MB
1274 20141001-01 20384MB
1652 20141001-02 26432MB
1756 20141001-03 28096MB
1628 20141001-04 26048MB
1015 20141001-05 16240MB
1624 20141001-06 25984MB
1652 20141001-07 26432MB
1286 20141001-08 20576MB
1485 20141001-09 23760MB
1987 20141001-10 31792MB
1432 20141001-11 22912MB
1235 20141001-12 19760MB
1690 20141001-13 27040MB
1442 20141001-14 23072MB
1500 20141001-15 24000MB
1306 20141001-16 20896MB
1491 20141001-17 23856MB
1535 20141001-18 24560MB
1548 20141001-19 24768MB
1068 20141001-20 17088MB
1519 20141001-21 24304MB
1608 20141001-22 25728MB
1019 20141001-23 16304MB
1568 20141002-00 25088MB
1411 20141002-01 22576MB
1781 20141002-02 28496MB
1280 20141002-03 20480MB
1556 20141002-04 24896MB
1114 20141002-05 17824MB
1906 20141002-06 30496MB
1316 20141002-07 21056MB
1483 20141002-08 23728MB
1245 20141002-09 19920MB
The slave is running on a server with 48GB, and 8 cores (Intel(R) Xeon(R) CPU E5630 @ 2.53GHz) running red hat 5.10, herewith the postgresql.conf:
postgres=# show all;
name | setting | description
---------------------------------+------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------
allow_system_table_mods | off | Allows modifications of the structure of system tables.
application_name | psql | Sets the application name to be reported in statistics and logs.
archive_command | (disabled) | Sets the shell command that will be called to archive a WAL file.
archive_mode | off | Allows archiving of WAL files using archive_command.
archive_timeout | 0 | Forces a switch to the next xlog file if a new file has not been started within N seconds.
array_nulls | on | Enable input of NULL elements in arrays.
authentication_timeout | 1min | Sets the maximum allowed time to complete client authentication.
autovacuum | on | Starts the autovacuum subprocess.
autovacuum_analyze_scale_factor | 0.1 | Number of tuple inserts, updates, or deletes prior to analyze as a fraction of reltuples.
autovacuum_analyze_threshold | 50 | Minimum number of tuple inserts, updates, or deletes prior to analyze.
autovacuum_freeze_max_age | 200000000 | Age at which to autovacuum a table to prevent transaction ID wraparound.
autovacuum_max_workers | 3 | Sets the maximum number of simultaneously running autovacuum worker processes.
autovacuum_naptime | 20s | Time to sleep between autovacuum runs.
autovacuum_vacuum_cost_delay | 5ms | Vacuum cost delay in milliseconds, for autovacuum.
autovacuum_vacuum_cost_limit | 1000 | Vacuum cost amount available before napping, for autovacuum.
autovacuum_vacuum_scale_factor | 0.1 | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples.
autovacuum_vacuum_threshold | 50 | Minimum number of tuple updates or deletes prior to vacuum.
backslash_quote | safe_encoding | Sets whether "\'" is allowed in string literals.
bgwriter_delay | 200ms | Background writer sleep time between rounds.
bgwriter_lru_maxpages | 100 | Background writer maximum number of LRU pages to flush per round.
bgwriter_lru_multiplier | 2 | Multiple of the average buffer usage to free per round.
block_size | 8192 | Shows the size of a disk block.
bonjour | off | Enables advertising the server via Bonjour.
bonjour_name | | Sets the Bonjour service name.
bytea_output | hex | Sets the output format for bytea.
check_function_bodies | on | Check function bodies during CREATE FUNCTION.
checkpoint_completion_target | 0.9 | Time spent flushing dirty buffers during checkpoint, as fraction of checkpoint interval.
checkpoint_segments | 64 | Sets the maximum distance in log segments between automatic WAL checkpoints.
checkpoint_timeout | 10min | Sets the maximum time between automatic WAL checkpoints.
checkpoint_warning | 1min | Enables warnings if checkpoint segments are filled more frequently than this.
client_encoding | UTF8 | Sets the client's character set encoding.
client_min_messages | notice | Sets the message levels that are sent to the client.
commit_delay | 0 | Sets the delay in microseconds between transaction commit and flushing WAL to disk.
commit_siblings | 5 | Sets the minimum concurrent open transactions before performing commit_delay.
config_file | /ORA/dbs03/PUPDBTST/data/postgresql.conf | Sets the server's main configuration file.
constraint_exclusion | on | Enables the planner to use constraints to optimize queries.
cpu_index_tuple_cost | 0.005 | Sets the planner's estimate of the cost of processing each index entry during an index scan.
cpu_operator_cost | 0.0025 | Sets the planner's estimate of the cost of processing each operator or function call.
cpu_tuple_cost | 0.01 | Sets the planner's estimate of the cost of processing each tuple (row).
cursor_tuple_fraction | 0.1 | Sets the planner's estimate of the fraction of a cursor's rows that will be retrieved.
data_directory | /ORA/dbs03/PUPDBTST/data | Sets the server's data directory.
DateStyle | ISO, MDY | Sets the display format for date and time values.
db_user_namespace | off | Enables per-database user names.
deadlock_timeout | 1s | Sets the time to wait on a lock before checking for deadlock.
debug_assertions | off | Turns on various assertion checks.
debug_pretty_print | on | Indents parse and plan tree displays.
debug_print_parse | off | Logs each query's parse tree.
debug_print_plan | off | Logs each query's execution plan.
debug_print_rewritten | off | Logs each query's rewritten parse tree.
default_statistics_target | 50 | Sets the default statistics target.
default_tablespace | | Sets the default tablespace to create tables and indexes in.
default_text_search_config | pg_catalog.english | Sets default text search configuration.
default_transaction_deferrable | off | Sets the default deferrable status of new transactions.
default_transaction_isolation | read committed | Sets the transaction isolation level of each new transaction.
default_transaction_read_only | off | Sets the default read-only status of new transactions.
default_with_oids | off | Create new tables with OIDs by default.
dynamic_library_path | $libdir | Sets the path for dynamically loadable modules.
effective_cache_size | 36GB | Sets the planner's assumption about the size of the disk cache.
effective_io_concurrency | 1 | Number of simultaneous requests that can be handled efficiently by the disk subsystem.
enable_bitmapscan | on | Enables the planner's use of bitmap-scan plans.
enable_hashagg | on | Enables the planner's use of hashed aggregation plans.
enable_hashjoin | on | Enables the planner's use of hash join plans.
enable_indexonlyscan | on | Enables the planner's use of index-only-scan plans.
enable_indexscan | on | Enables the planner's use of index-scan plans.
enable_material | on | Enables the planner's use of materialization.
enable_mergejoin | on | Enables the planner's use of merge join plans.
enable_nestloop | on | Enables the planner's use of nested-loop join plans.
enable_seqscan | on | Enables the planner's use of sequential-scan plans.
enable_sort | on | Enables the planner's use of explicit sort steps.
enable_tidscan | on | Enables the planner's use of TID scan plans.
escape_string_warning | on | Warn about backslash escapes in ordinary string literals.
event_source | PostgreSQL | Sets the application name used to identify PostgreSQL messages in the event log.
exit_on_error | off | Terminate session on any error.
external_pid_file | | Writes the postmaster PID to the specified file.
extra_float_digits | 0 | Sets the number of digits displayed for floating-point values.
from_collapse_limit | 8 | Sets the FROM-list size beyond which subqueries are not collapsed.
fsync | off | Forces synchronization of updates to disk.
full_page_writes | on | Writes full pages to WAL when first modified after a checkpoint.
geqo | on | Enables genetic query optimization.
geqo_effort | 5 | GEQO: effort is used to set the default for other GEQO parameters.
geqo_generations | 0 | GEQO: number of iterations of the algorithm.
geqo_pool_size | 0 | GEQO: number of individuals in the population.
geqo_seed | 0 | GEQO: seed for random path selection.
geqo_selection_bias | 2 | GEQO: selective pressure within the population.
geqo_threshold | 12 | Sets the threshold of FROM items beyond which GEQO is used.
gin_fuzzy_search_limit | 0 | Sets the maximum allowed result for exact search by GIN.
hba_file | /ORA/dbs03/PUPDBTST/data/pg_hba.conf | Sets the server's "hba" configuration file.
hot_standby | on | Allows connections and queries during recovery.
hot_standby_feedback | off | Allows feedback from a hot standby to the primary that will avoid query conflicts.
ident_file | /ORA/dbs03/PUPDBTST/data/pg_ident.conf | Sets the server's "ident" configuration file.
ignore_system_indexes | off | Disables reading from system indexes.
integer_datetimes | on | Datetimes are integer based.
IntervalStyle | postgres | Sets the display format for interval values.
join_collapse_limit | 8 | Sets the FROM-list size beyond which JOIN constructs are not flattened.
krb_caseins_users | off | Sets whether Kerberos and GSSAPI user names should be treated as case-insensitive.
krb_server_keyfile | FILE:/etc/postgresql/krb5.keytab | Sets the location of the Kerberos server key file.
krb_srvname | postgres | Sets the name of the Kerberos service.
lc_collate | en_US.UTF-8 | Shows the collation order locale.
lc_ctype | en_US.UTF-8 | Shows the character classification and case conversion locale.
lc_messages | en_US.UTF-8 | Sets the language in which messages are displayed.
lc_monetary | en_US.UTF-8 | Sets the locale for formatting monetary amounts.
lc_numeric | en_US.UTF-8 | Sets the locale for formatting numbers.
lc_time | en_US.UTF-8 | Sets the locale for formatting date and time values.
listen_addresses | * | Sets the host name or IP address(es) to listen to.
lo_compat_privileges | off | Enables backward compatibility mode for privilege checks on large objects.
local_preload_libraries | | Lists shared libraries to preload into each backend.
log_autovacuum_min_duration | 0 | Sets the minimum execution time above which autovacuum actions will be logged.
log_checkpoints | off | Logs each checkpoint.
log_connections | off | Logs each successful connection.
log_destination | stderr | Sets the destination for server log output.
log_directory | pg_log | Sets the destination directory for log files.
log_disconnections | off | Logs end of a session, including duration.
log_duration | off | Logs the duration of each completed SQL statement.
log_error_verbosity | default | Sets the verbosity of logged messages.
log_executor_stats | off | Writes executor performance statistics to the server log.
log_file_mode | 0600 | Sets the file permissions for log files.
log_filename | postgresql-%Y-%m-%d_%H%M%S.log | Sets the file name pattern for log files.
log_hostname | off | Logs the host name in the connection logs.
log_line_prefix | <%m> | Controls information prefixed to each log line.
log_lock_waits | off | Logs long lock waits.
log_min_duration_statement | -1 | Sets the minimum execution time above which statements will be logged.
log_min_error_statement | error | Causes all statements generating error at or above this level to be logged.
log_min_messages | warning | Sets the message levels that are logged.
log_parser_stats | off | Writes parser performance statistics to the server log.
log_planner_stats | off | Writes planner performance statistics to the server log.
log_rotation_age | 1d | Automatic log file rotation will occur after N minutes.
log_rotation_size | 10MB | Automatic log file rotation will occur after N kilobytes.
log_statement | none | Sets the type of statements logged.
log_statement_stats | off | Writes cumulative performance statistics to the server log.
log_temp_files | -1 | Log the use of temporary files larger than this number of kilobytes.
log_timezone | Europe/Paris | Sets the time zone to use in log messages.
log_truncate_on_rotation | off | Truncate existing log files of same name during log rotation.
logging_collector | off | Start a subprocess to capture stderr output and/or csvlogs into log files.
maintenance_work_mem | 1GB | Sets the maximum memory to be used for maintenance operations.
max_connections | 80 | Sets the maximum number of concurrent connections.
max_files_per_process | 1000 | Sets the maximum number of simultaneously open files for each server process.
max_function_args | 100 | Shows the maximum number of function arguments.
max_identifier_length | 63 | Shows the maximum identifier length.
max_index_keys | 32 | Shows the maximum number of index keys.
max_locks_per_transaction | 64 | Sets the maximum number of locks per transaction.
max_pred_locks_per_transaction | 64 | Sets the maximum number of predicate locks per transaction.
max_prepared_transactions | 0 | Sets the maximum number of simultaneously prepared transactions.
max_stack_depth | 2MB | Sets the maximum stack depth, in kilobytes.
max_standby_archive_delay | 30s | Sets the maximum delay before canceling queries when a hot standby server is processing archived WAL data.
max_standby_streaming_delay | 30s | Sets the maximum delay before canceling queries when a hot standby server is processing streamed WAL data.
max_wal_senders | 0 | Sets the maximum number of simultaneously running WAL sender processes.
password_encryption | on | Encrypt passwords.
port | 6600 | Sets the TCP port the server listens on.
post_auth_delay | 0 | Waits N seconds on connection startup after authentication.
pre_auth_delay | 0 | Waits N seconds on connection startup before authentication.
quote_all_identifiers | off | When generating SQL fragments, quote all identifiers.
random_page_cost | 4 | Sets the planner's estimate of the cost of a nonsequentially fetched disk page.
replication_timeout | 1min | Sets the maximum time to wait for WAL replication.
restart_after_crash | on | Reinitialize server after backend crash.
search_path | "$user",public | Sets the schema search order for names that are not schema-qualified.
segment_size | 1GB | Shows the number of pages per disk file.
seq_page_cost | 1 | Sets the planner's estimate of the cost of a sequentially fetched disk page.
server_encoding | UTF8 | Sets the server (database) character set encoding.
server_version | 9.2.9 | Shows the server version.
server_version_num | 90209 | Shows the server version as an integer.
session_replication_role | origin | Sets the session's behavior for triggers and rewrite rules.
shared_buffers | 12GB | Sets the number of shared memory buffers used by the server.
shared_preload_libraries | | Lists shared libraries to preload into server.
sql_inheritance | on | Causes subtables to be included by default in various commands.
ssl | off | Enables SSL connections.
ssl_ca_file | | Location of the SSL certificate authority file.
ssl_cert_file | server.crt | Location of the SSL server certificate file.
ssl_ciphers | ALL:!ADH:!LOW:!EXP:!MD5:@STRENGTH | Sets the list of allowed SSL ciphers.
ssl_crl_file | | Location of the SSL certificate revocation list file.
ssl_key_file | server.key | Location of the SSL server private key file.
ssl_renegotiation_limit | 512MB | Set the amount of traffic to send and receive before renegotiating the encryption keys.
standard_conforming_strings | on | Causes '...' strings to treat backslashes literally.
statement_timeout | 0 | Sets the maximum allowed duration of any statement.
stats_temp_directory | pg_stat_tmp | Writes temporary statistics files to the specified directory.
superuser_reserved_connections | 3 | Sets the number of connection slots reserved for superusers.
synchronize_seqscans | on | Enable synchronized sequential scans.
synchronous_commit | off | Sets the current transaction's synchronization level.
synchronous_standby_names | | List of names of potential synchronous standbys.
syslog_facility | local0 | Sets the syslog "facility" to be used when syslog enabled.
syslog_ident | postgres | Sets the program name used to identify PostgreSQL messages in syslog.
tcp_keepalives_count | 0 | Maximum number of TCP keepalive retransmits.
tcp_keepalives_idle | 0 | Time between issuing TCP keepalives.
tcp_keepalives_interval | 0 | Time between TCP keepalive retransmits.
temp_buffers | 8MB | Sets the maximum number of temporary buffers used by each session.
temp_file_limit | -1 | Limits the total size of all temporary files used by each session.
temp_tablespaces | | Sets the tablespace(s) to use for temporary tables and sort files.
TimeZone | Europe/Paris | Sets the time zone for displaying and interpreting time stamps.
timezone_abbreviations | Default | Selects a file of time zone abbreviations.
trace_notify | off | Generates debugging output for LISTEN and NOTIFY.
trace_recovery_messages | log | Enables logging of recovery-related debugging information.
trace_sort | off | Emit information about resource usage in sorting.
track_activities | on | Collects information about executing commands.
track_activity_query_size | 1024 | Sets the size reserved for pg_stat_activity.query, in bytes.
track_counts | on | Collects statistics on database activity.
track_functions | none | Collects function-level statistics on database activity.
track_io_timing | off | Collects timing statistics for database I/O activity.
transaction_deferrable | off | Whether to defer a read-only serializable transaction until it can be executed with no possible serialization failures.
transaction_isolation | read committed | Sets the current transaction's isolation level.
transaction_read_only | on | Sets the current transaction's read-only status.
transform_null_equals | off | Treats "expr=NULL" as "expr IS NULL".
unix_socket_directory | /var/lib/pgsql | Sets the directory where the Unix-domain socket will be created.
unix_socket_group | | Sets the owning group of the Unix-domain socket.
unix_socket_permissions | 0777 | Sets the access permissions of the Unix-domain socket.
update_process_title | on | Updates the process title to show the active SQL command.
vacuum_cost_delay | 0 | Vacuum cost delay in milliseconds.
vacuum_cost_limit | 200 | Vacuum cost amount available before napping.
vacuum_cost_page_dirty | 20 | Vacuum cost for a page dirtied by vacuum.
vacuum_cost_page_hit | 1 | Vacuum cost for a page found in the buffer cache.
vacuum_cost_page_miss | 10 | Vacuum cost for a page not found in the buffer cache.
vacuum_defer_cleanup_age | 0 | Number of transactions by which VACUUM and HOT cleanup should be deferred, if any.
vacuum_freeze_min_age | 50000000 | Minimum age at which VACUUM should freeze a table row.
vacuum_freeze_table_age | 150000000 | Age at which VACUUM should scan whole table to freeze tuples.
wal_block_size | 8192 | Shows the block size in the write ahead log.
wal_buffers | 8MB | Sets the number of disk-page buffers in shared memory for WAL.
wal_keep_segments | 0 | Sets the number of WAL files held for standby servers.
wal_level | hot_standby | Set the level of information written to the WAL.
wal_receiver_status_interval | 10s | Sets the maximum interval between WAL receiver status reports to the primary.
wal_segment_size | 16MB | Shows the number of pages per write ahead log segment.
wal_sync_method | fdatasync | Selects the method used for forcing WAL updates to disk.
wal_writer_delay | 200ms | WAL writer sleep time between WAL flushes.
work_mem | 288MB | Sets the maximum memory to be used for query workspaces.
xmlbinary | base64 | Sets how binary values are to be encoded in XML.
xmloption | content | Sets whether XML data in implicit parsing and serialization operations is to be considered as documents or content fragments.
zero_damaged_pages | off | Continues processing past damaged page headers.
The streaming is working perfectly:
/usr/local/pgsql/postgresql-9.2.9/bin/psql -U admin -h XXXXX -p XXXXX -d puppetdb -c "select pid,usesysid, usename, application_name, client_addr, state, sent_location,write_location,replay_location,sync_state from pg_stat_replication;"
pid | usesysid | usename | application_name | client_addr | state | sent_location | write_location | replay_location | sync_state
--------+----------+------------+------------------+-------------+-----------+---------------+----------------+-----------------+------------
117659 | 16384 | replicator | walreceiver | 10.16.7.137 | streaming | AA74/DD630978 | AA74/DD630978 | A977/F84F0BE0 | async
(1 row)
But the lag is increasing constantly, it looks the replay can not cope with:
/usr/local/pgsql/postgresql-9.2.9/bin/psql -U postgres -h /var/lib/pgsql/ -p 6600 -d puppetdb -c "SELECT now(), now() - pg_last_xact_replay_timestamp() AS time_lag" | perl -ne 'if (/\|\s+(\d{2}):(\d{2}):(\d{2})\.\d+/) {$hour=$1;$min=$2;$sec=$3; print $_;}'
2014-09-22 18:40:02.004482+02 | 00:00:03.166557
2014-09-22 18:50:02.001836+02 | 00:00:00.765323
2014-09-22 19:00:01.229943+02 | 00:00:00.600354
2014-09-22 19:10:01.655343+02 | 00:00:07.85969
2014-09-22 19:20:01.872653+02 | 00:07:23.727307
2014-09-22 19:30:01.458236+02 | 00:14:16.244349
2014-09-22 19:40:01.715044+02 | 00:20:31.616665
2014-09-22 19:50:01.949646+02 | 00:28:20.129136
2014-09-22 20:00:01.216571+02 | 00:31:11.289404
...
2014-09-30 14:00:01.67156+02 | 23:20:11.229815
2014-09-30 14:10:01.884168+02 | 23:23:59.162476
2014-09-30 14:20:02.022649+02 | 23:26:39.082807
2014-09-30 14:30:01.263667+02 | 23:34:02.517874
2014-09-30 14:40:01.421162+02 | 23:41:30.496393
2014-09-30 14:50:01.650239+02 | 23:49:11.102335
2014-09-30 15:00:02.055655+02 | 23:56:07.012862
I tried to play with how the IO is handled, making it less strict setting synchronous_commit and fsync to off with not much success.
I have also done a second test increasing shared_buffers from 12GB to 24GB (we are running on a 48GB, 8 cores server).
Please let me know if you can see something obvious I am missing.
Thanks for your help,
Ruben
On Sat, Nov 8, 2014 at 2:11 PM, Ruben Domingo Gaspar Aparicio wrote: > The slave (I don't have control on the master) is using 2 NFS file systems, > one for WALs and another one for the data, on Netapp controllers: > > dbnasg401-12a:/vol/dodpupdbtst02 on /ORA/dbs02/PUPDBTST type nfs > (rw,remount,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,timeo=600) > > dbnasg403-12a:/vol/dodpupdbtst03 on /ORA/dbs03/PUPDBTST type nfs > (rw,remount,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,timeo=600) You should use noatime to avoid unnecessary IO. > The master produces quite a lot of WALs. This is what I get on the slave > (number of WAL files, date-hour, Total size in MB), so per day is more than > 400GB: > I tried to play with how the IO is handled, making it less strict setting > synchronous_commit and fsync to off with not much success. > > I have also done a second test increasing shared_buffers from 12GB to 24GB > (we are running on a 48GB, 8 cores server). > Please let me know if you can see something obvious I am missing. Your IO system needs to be able to deliver sustained IO bandwith at least as large as you need to read and write all the changes. What raw IO bandwidth do those NFS file systems deliver _long term_? I am not talking about spikes because there are buffers. I am talking about the minimum of network throughput on one hand and raw disk IO those boxes can do on the other hand. Then, how much of it is available to your slave? Did you do the math to ensure that the IO bandwidth you have available on the slave is at least as high as what is needed? Note that it's not simply the WAL size that needs to be written and read but also data pages. Kind regards robert -- [guy, jim].each {|him| remember.him do |as, often| as.you_can - without end} http://blog.rubybestpractices.com/
Hi, On 2014-11-08 13:11:25 +0000, Ruben Domingo Gaspar Aparicio wrote: > Hello, > > I have built up a hot-standby between a master running 9.2.4 and a slave running 9.2.9. I did the initial copy using "pg_basebackup". My recovery.conf looks like: > > standby_mode = 'on' > primary_conninfo = 'host=pXXXXXXXX port=XXXX user=replicator' > trigger_file = 'failover.now' > restore_command = 'test -f /ORA/dbs02/PUPDBTST/archive/%f && ln -fns /ORA/dbs02/PUPDBTST/archive/%f %p' > archive_cleanup_command = '/usr/local/pgsql/postgresql-9.2.9/bin/pg_archivecleanup /ORA/dbs03/PUPDBTST/data %r' > > The slave (I don't have control on the master) is using 2 NFS file systems, one for WALs and another one for the data,on Netapp controllers: > > dbnasg401-12a:/vol/dodpupdbtst02 on /ORA/dbs02/PUPDBTST type nfs (rw,remount,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,timeo=600) > dbnasg403-12a:/vol/dodpupdbtst03 on /ORA/dbs03/PUPDBTST type nfs (rw,remount,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,timeo=600) > > The streaming is working perfectly: > > /usr/local/pgsql/postgresql-9.2.9/bin/psql -U admin -h XXXXX -p XXXXX -d puppetdb -c "select pid,usesysid, usename, application_name,client_addr, state, sent_location,write_location,replay_location,sync_state from pg_stat_replication;" > pid | usesysid | usename | application_name | client_addr | state | sent_location | write_location | replay_location| sync_state > --------+----------+------------+------------------+-------------+-----------+---------------+----------------+-----------------+------------ > 117659 | 16384 | replicator | walreceiver | 10.16.7.137 | streaming | AA74/DD630978 | AA74/DD630978 | A977/F84F0BE0 | async > (1 row) > > But the lag is increasing constantly, it looks the replay can not cope with: I have a couple of questions: 1) Is the standby actually used for querying? Is it possible that replay frequently conflicts with active queries? As you don't have hot_standby_feedback enabled that seems quite possible. 2) Is the startup process on the standby CPU or IO bound? 3) Does the workload involve loads of temporary tables or generally transactions locking lots of tables exclusively in one transaction? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Indeed I could save some IO with noatime. I must say I haven’t found any recommendation about mount options for postgresql,likely because this is not encourage. The ones you see are taking from a Oracle cluster configuration where severalnodes see the same files. It's not the case on this setup. The IO is not an issue. The storage is not at all saturated. Slave gets streaming perfectly but the apply is quite slow,looks like always working with pages of 8k at a time: --datafiles [root@ ~]# /ORA/dbs01/syscontrol/projects/dfm/bin/smetrics -i 5 -n 100 -o vol dbnasg403-12a:/vol/dodpupdbtst03 Instance total_opsread_ops write_ops read_data write_data avg_latency read_latency write_latenc /s /s /s b/s b/s us us us dodpupdbtst03 6466 0 162 0 2350619 31.53 0 764.70 dodpupdbtst03 6762 0 843 0 8751169 48.32 0 263.10 dodpupdbtst03 7023 0 1547 0 14914498 112.88 0 303.16 dodpupdbtst03 5373 0 321 6501 3809930 58.44 11287.75 467.21 dodpupdbtst03 5618 0 183 0 1661200 20.91 0 265.61 dodpupdbtst03 5538 0 214 0 3471380 29.24 0 374.27 dodpupdbtst03 5753 0 425 0 4973131 45.36 0 351.08 dodpupdbtst03 6110 0 142 0 2331695 20.96 0 378.95 --WALs Bye Bye[root@ ~]# /ORA/dbs01/syscontrol/projects/dfm/bin/smetrics -i 5 -n 100 -o vol dbnasg401-12a:/vol/dodpupdbtst02 Instancetotal_ops read_ops write_ops read_data write_data avg_latency read_latency write_latenc /s /s /s b/s b/s us us us dodpupdbtst02 1017 202 93 5915899 2637111 2033.22 10116.09 172.61 dodpupdbtst02 1284 242 141 7368712 4309409 1235.11 6306.37 172.89 dodpupdbtst02 1357 231 268 6869816 8489466 957.55 5104.09 192.26 dodpupdbtst02 1566 264 288 8142965 9008529 747.96 4069.78 180.00 dodpupdbtst02 1333 235 153 7601051 4755791 993.81 5394.99 176.99 dodpupdbtst02 1261 199 287 6124821 9075170 896.32 5150.28 203.81 dodpupdbtst02 963 161 192 4955996 6066333 1757.66 10035.06 213.12 dodpupdbtst02 924 159 157 4782617 4807262 1092.61 5804.85 236.91 dodpupdbtst02 591 97 137 2899085 4275046 1218.24 6980.66 190.20 Writes are usually fast (us as they use the NVRAM )and reads are about 5 ms which is quite ok considering SATA disks (theyhave a flash cache of 512GB, this is why we get this average). Thank you, Ruben
Hi Andres, Sorry for my delay to reply. Here below my replies: > I have a couple of questions: > 1) Is the standby actually used for querying? Is it possible that replay > frequently conflicts with active queries? As you don't have > hot_standby_feedback enabled that seems quite possible. Nowadays we don't manage to have a decent lag so the standby is not use at all. No clients connect to it. > 2) Is the startup process on the standby CPU or IO bound? The servers is almost idle. I don't see any bottle neck either on CPU or IO. > 3) Does the workload involve loads of temporary tables or generally > transactions locking lots of tables exclusively in one transaction? We have monitored the master for a couple of days we haven't detected any "create temp table" statement. For the locks I see it's also not the case in my opinion, at a given point in time I don't see many tables lock in exclusivemode: puppetdb=# SELECT locktype, relation::regclass, mode, transactionid AS tid, datname, virtualtransaction AS vtid, pid, granted FROM pg_catalog.pg_locks l LEFT JOIN pg_catalog.pg_database db ON db.oid = l.database WHERE NOT pid = pg_backend_pid(); locktype | relation | mode | tid | datname | vtid | pid | granted ---------------+----------------------------------------------+------------------+-----------+----------+------------+--------+--------- relation | resource_params_cache_pkey | AccessShareLock | | puppetdb | 5/3099716 | 54422| t relation | resource_params_cache | AccessShareLock | | puppetdb | 5/3099716 | 54422| t relation | catalog_resources_pkey | AccessShareLock | | puppetdb | 5/3099716 | 54422| t relation | idx_catalog_resources_exported_true | AccessShareLock | | puppetdb | 5/3099716 | 54422| t relation | idx_catalog_resources_resource | AccessShareLock | | puppetdb | 5/3099716 | 54422| t relation | idx_catalog_resources_type | AccessShareLock | | puppetdb | 5/3099716 | 54422| t relation | idx_catalog_resources_type_title | AccessShareLock | | puppetdb | 5/3099716 | 54422| t relation | catalog_resources | AccessShareLock | | puppetdb | 5/3099716 | 54422| t relation | idx_catalogs_transaction_uuid | AccessShareLock | | puppetdb | 5/3099716 | 54422| t relation | idx_catalogs_transaction_uuid | RowExclusiveLock | | puppetdb | 5/3099716 | 54422| t relation | catalogs_certname_key | AccessShareLock | | puppetdb | 5/3099716 | 54422| t relation | catalogs_certname_key | RowExclusiveLock | | puppetdb | 5/3099716 | 54422| t relation | catalogs_hash_key | AccessShareLock | | puppetdb | 5/3099716 | 54422| t relation | catalogs_hash_key | RowExclusiveLock | | puppetdb | 5/3099716 | 54422| t relation | catalogs_pkey | AccessShareLock | | puppetdb | 5/3099716 | 54422| t relation | catalogs_pkey | RowExclusiveLock | | puppetdb | 5/3099716 | 54422| t relation | catalogs | AccessShareLock | | puppetdb | 5/3099716 | 54422| t relation | catalogs | RowExclusiveLock | | puppetdb | 5/3099716 | 54422| t relation | certnames_pkey | AccessShareLock | | puppetdb | 5/3099716 | 54422| t relation | certnames_pkey | RowExclusiveLock | | puppetdb | 5/3099716 | 54422| t relation | certnames | AccessShareLock | | puppetdb | 5/3099716 | 54422| t relation | certnames | RowExclusiveLock | | puppetdb | 5/3099716 | 54422| t virtualxid | | ExclusiveLock | | | 5/3099716 | 54422| t relation | resource_params_cache_pkey | AccessShareLock | | puppetdb | 20/2901642 | 100098| t relation | resource_params_cache | AccessShareLock | | puppetdb | 20/2901642 | 100098| t relation | catalog_resources_pkey | AccessShareLock | | puppetdb | 20/2901642 | 100098| t relation | idx_catalog_resources_exported_true | AccessShareLock | | puppetdb | 20/2901642 | 100098| t relation | idx_catalog_resources_resource | AccessShareLock | | puppetdb | 20/2901642 | 100098| t relation | idx_catalog_resources_type | AccessShareLock | | puppetdb | 20/2901642 | 100098| t relation | idx_catalog_resources_type_title | AccessShareLock | | puppetdb | 20/2901642 | 100098| t relation | catalog_resources | AccessShareLock | | puppetdb | 20/2901642 | 100098| t relation | idx_catalogs_transaction_uuid | AccessShareLock | | puppetdb | 20/2901642 | 100098| t relation | idx_catalogs_transaction_uuid | RowExclusiveLock | | puppetdb | 20/2901642 | 100098| t relation | catalogs_certname_key | AccessShareLock | | puppetdb | 20/2901642 | 100098| t relation | catalogs_certname_key | RowExclusiveLock | | puppetdb | 20/2901642 | 100098| t relation | catalogs_hash_key | AccessShareLock | | puppetdb | 20/2901642 | 100098| t relation | catalogs_hash_key | RowExclusiveLock | | puppetdb | 20/2901642 | 100098| t relation | catalogs_pkey | AccessShareLock | | puppetdb | 20/2901642 | 100098| t relation | catalogs_pkey | RowExclusiveLock | | puppetdb | 20/2901642 | 100098| t relation | catalogs | AccessShareLock | | puppetdb | 20/2901642 | 100098| t relation | catalogs | RowExclusiveLock | | puppetdb | 20/2901642 | 100098| t relation | certnames_pkey | AccessShareLock | | puppetdb | 20/2901642 | 100098| t relation | certnames_pkey | RowExclusiveLock | | puppetdb | 20/2901642 | 100098| t relation | certnames | AccessShareLock | | puppetdb | 20/2901642 | 100098| t relation | certnames | RowExclusiveLock | | puppetdb | 20/2901642 | 100098| t virtualxid | | ExclusiveLock | | | 20/2901642 | 100098| t relation | idx_catalogs_transaction_uuid | AccessShareLock | | puppetdb | 21/2767248 | 13044| t relation | catalogs_certname_key | AccessShareLock | | puppetdb | 21/2767248 | 13044| t relation | catalogs_hash_key | AccessShareLock | | puppetdb | 21/2767248 | 13044| t relation | catalogs_pkey | AccessShareLock | | puppetdb | 21/2767248 | 13044| t relation | catalogs | AccessShareLock | | puppetdb | 21/2767248 | 13044| t relation | certnames_pkey | AccessShareLock | | puppetdb | 21/2767248 | 13044| t relation | certnames_pkey | RowExclusiveLock | | puppetdb | 21/2767248 | 13044| t relation | certnames | AccessShareLock | | puppetdb | 21/2767248 | 13044| t relation | certnames | RowExclusiveLock | | puppetdb | 21/2767248 | 13044| t virtualxid | | ExclusiveLock | | | 21/2767248 | 13044| t relation | edges | AccessShareLock | | puppetdb | 27/1597400 | 77873| t relation | resource_params_cache_pkey | AccessShareLock | | puppetdb | 27/1597400 | 77873| t relation | resource_params_cache | AccessShareLock | | puppetdb | 27/1597400 | 77873| t relation | resource_params_cache | RowShareLock | | puppetdb | 27/1597400 | 77873| t relation | catalog_resources_pkey | AccessShareLock | | puppetdb | 27/1597400 | 77873| t relation | catalog_resources_pkey | RowExclusiveLock | | puppetdb | 27/1597400 | 77873| t relation | idx_catalog_resources_exported_true | AccessShareLock | | puppetdb | 27/1597400 | 77873| t relation | idx_catalog_resources_exported_true | RowExclusiveLock | | puppetdb | 27/1597400 | 77873| t relation | idx_catalog_resources_resource | AccessShareLock | | puppetdb | 27/1597400 | 77873| t relation | idx_catalog_resources_resource | RowExclusiveLock | | puppetdb | 27/1597400 | 77873| t relation | idx_catalog_resources_type | AccessShareLock | | puppetdb | 27/1597400 | 77873| t relation | idx_catalog_resources_type | RowExclusiveLock | | puppetdb | 27/1597400 | 77873| t relation | idx_catalog_resources_type_title | AccessShareLock | | puppetdb | 27/1597400 | 77873| t relation | idx_catalog_resources_type_title | RowExclusiveLock | | puppetdb | 27/1597400 | 77873| t relation | catalog_resources | AccessShareLock | | puppetdb | 27/1597400 | 77873| t relation | catalog_resources | RowExclusiveLock | | puppetdb | 27/1597400 | 77873| t relation | idx_catalogs_transaction_uuid | AccessShareLock | | puppetdb | 27/1597400 | 77873| t relation | idx_catalogs_transaction_uuid | RowExclusiveLock | | puppetdb | 27/1597400 | 77873| t relation | catalogs_certname_key | AccessShareLock | | puppetdb | 27/1597400 | 77873| t relation | catalogs_certname_key | RowExclusiveLock | | puppetdb | 27/1597400 | 77873| t relation | catalogs_hash_key | AccessShareLock | | puppetdb | 27/1597400 | 77873| t relation | catalogs_hash_key | RowExclusiveLock | | puppetdb | 27/1597400 | 77873| t relation | catalogs_pkey | AccessShareLock | | puppetdb | 27/1597400 | 77873| t relation | catalogs_pkey | RowExclusiveLock | | puppetdb | 27/1597400 | 77873| t relation | catalogs | AccessShareLock | | puppetdb | 27/1597400 | 77873| t relation | catalogs | RowExclusiveLock | | puppetdb | 27/1597400 | 77873| t relation | certnames_pkey | AccessShareLock | | puppetdb | 27/1597400 | 77873| t relation | certnames_pkey | RowExclusiveLock | | puppetdb | 27/1597400 | 77873| t relation | certnames | AccessShareLock | | puppetdb | 27/1597400 | 77873| t relation | certnames | RowExclusiveLock | | puppetdb | 27/1597400 | 77873| t virtualxid | | ExclusiveLock | | | 27/1597400 | 77873| t relation | certnames | RowShareLock | | puppetdb | 27/1597400 | 77873| t relation | edges | RowExclusiveLock | | puppetdb | 27/1597400 | 77873| t transactionid | | ExclusiveLock | 191755866 | | 27/1597400 | 77873| t transactionid | | ExclusiveLock | 191755880 | | 20/2901642 | 100098| t relation | edges_certname_source_target_type_unique_key | AccessShareLock | | puppetdb | 27/1597400 | 77873| t relation | edges_certname_source_target_type_unique_key | RowExclusiveLock | | puppetdb | 27/1597400 | 77873| t transactionid | | ExclusiveLock | 191755874 | | 5/3099716 | 54422| t transactionid | | ExclusiveLock | 191755883 | | 21/2767248 | 13044| t (95 rows) Jus to comment that we are running a DBaaS, we don't know much about the apps running on our servers. Thank you, Ruben
> > The slave (I don't have control on the master) is using 2 NFS file > > systems, one for WALs and another one for the data, on Netapp controllers: > > > > dbnasg401-12a:/vol/dodpupdbtst02 on /ORA/dbs02/PUPDBTST type nfs > > > (rw,remount,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,timeo=60 > > 0) > > > > dbnasg403-12a:/vol/dodpupdbtst03 on /ORA/dbs03/PUPDBTST type nfs > > > (rw,remount,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,timeo=60 > > 0) > > You should use noatime to avoid unnecessary IO. > Just to mention that changing the mount points from: dbnasg403-12a:/vol/dodpupdbtst03 on /ORA/dbs03/PUPDBTST type nfs (rw, actimeo=0,hard,nointr,rsize=65536,wsize=65536,tcp,timeo=600) dbnasg401-12a:/vol/dodpupdbtst02 on /ORA/dbs02/PUPDBTST type nfs (rw, actimeo=0,hard,nointr,rsize=65536,wsize=65536,tcp,timeo=600) to dbnasg403-12a:/vol/dodpupdbtst03 on /ORA/dbs03/PUPDBTST type nfs (rw,noatime,hard,nointr,rsize=65536,wsize=65536,tcp,timeo=600) dbnasg401-12a:/vol/dodpupdbtst02 on /ORA/dbs02/PUPDBTST type nfs (rw,noatime,hard,nointr,rsize=65536,wsize=65536,tcp,timeo=600) it did have a big impact. The profile of the recovery process on terms of calls changed quite a lot: From: [postgres@itrac1202 tmp]$ strace -p 9596 -c Process 9596 attached - interrupt to quit Process 9596 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 78.73 0.217824 0 456855 381376 read 17.87 0.049453 0 515320 lseek 2.89 0.007989 12 669 669 poll 0.33 0.000912 25 36 open 0.07 0.000206 0 994 994 stat 0.05 0.000151 0 995 787 rt_sigreturn 0.05 0.000133 0 673 write 0.00 0.000000 0 36 close 0.00 0.000000 0 52 kill ------ ----------- ----------- --------- --------- ---------------- 100.00 0.276668 975630 383826 total To: [postgres@itrac1202 tmp]$ strace -p 9596 -c Process 9596 attached - interrupt to quit Process 9596 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 78.73 0.217824 0 456855 381376 read 17.87 0.049453 0 515320 lseek 2.89 0.007989 12 669 669 poll 0.33 0.000912 25 36 open 0.07 0.000206 0 994 994 stat 0.05 0.000151 0 995 787 rt_sigreturn 0.05 0.000133 0 673 write 0.00 0.000000 0 36 close 0.00 0.000000 0 52 kill ------ ----------- ----------- --------- --------- ---------------- 100.00 0.276668 975630 383826 total We did also increased the shared_buffers from 12 to 24GB. The lag has decreased most of the time: */10 * * * * /usr/local/pgsql/postgresql-9.2.9/bin/psql -U postgres -h /var/lib/pgsql/ -p 6600 -d puppetdb -c "SELECTnow(), now() - pg_last_xact_replay_timestamp() AS time_lag" | perl -ne 'if (/\|\s+(\d{2}):(\d{2}):(\d{2})\.\d+/){$hour=$1;$min=$2;$sec=$3; print $_;}' >> /tmp/lag929morememmount.log ... 2014-12-14 14:10:01.688947+01 | 00:00:00.096524 2014-12-14 14:20:01.798223+01 | 00:00:00.024083 2014-12-14 14:30:01.884448+01 | 00:00:00.420791 2014-12-14 14:40:01.960623+01 | 00:00:00.168318 2014-12-14 14:50:01.191487+01 | 00:00:00.163832 2014-12-14 15:00:02.146436+01 | 00:00:00.026934 2014-12-14 15:10:01.277963+01 | 00:00:00.332185 2014-12-14 15:20:01.353979+01 | 00:00:00.020616 2014-12-14 15:30:01.417092+01 | 00:00:00.584768 2014-12-14 15:40:01.575347+01 | 00:00:00.151685 2014-12-14 15:50:01.205507+01 | 00:00:00.102073 2014-12-14 16:00:01.321511+01 | 00:00:00.590677 2014-12-14 16:10:01.570474+01 | 00:00:00.182683 2014-12-14 16:20:01.640095+01 | 00:00:00.420185 2014-12-14 16:30:01.767033+01 | 00:00:00.015989 2014-12-14 16:40:01.849532+01 | 00:00:00.106296 2014-12-14 16:50:01.920876+01 | 00:00:00.258851 2014-12-14 17:00:02.000278+01 | 00:00:00.119841 2014-12-14 17:10:01.894227+01 | 00:00:00.091599 2014-12-14 17:20:01.61729+01 | 00:00:00.367367 2014-12-14 17:30:01.683326+01 | 00:00:00.103884 2014-12-14 17:40:01.755904+01 | 00:00:00.051262 2014-12-14 17:50:01.833825+01 | 00:00:00.06901 2014-12-14 18:00:01.901236+01 | 00:00:00.17467 2014-12-14 18:10:01.186283+01 | 00:00:00.214941 2014-12-14 18:20:01.145413+01 | 00:00:00.03517 2014-12-14 18:30:01.241746+01 | 00:00:00.207842 2014-12-14 18:40:01.299413+01 | 00:00:00.147878 2014-12-14 18:50:01.368541+01 | 00:00:00.393893 2014-12-14 19:00:01.430736+01 | 00:00:00.031226 2014-12-14 19:10:01.672117+01 | 00:05:03.512832 2014-12-14 19:20:01.9195+01 | 00:06:39.08761 2014-12-14 19:30:02.184486+01 | 00:00:00.307668 2014-12-14 19:40:01.227278+01 | 00:00:00.054831 2014-12-14 19:50:01.305485+01 | 00:00:00.425595 2014-12-14 20:00:01.410501+01 | 00:00:00.394526 2014-12-14 20:10:01.984196+01 | 00:00:00.388844 2014-12-14 20:20:01.031042+01 | 00:00:00.503092 2014-12-14 20:30:01.225871+01 | 00:00:00.241493 2014-12-14 20:40:01.305696+01 | 00:00:00.280656 2014-12-14 20:50:01.379617+01 | 00:00:00.151103 2014-12-14 21:00:01.468849+01 | 00:00:00.014412 2014-12-14 21:10:01.724514+01 | 00:00:00.147476 2014-12-14 21:20:01.799292+01 | 00:00:00.08696 2014-12-14 21:30:01.866336+01 | 00:00:00.035226 2014-12-14 21:40:01.942882+01 | 00:00:00.111701 2014-12-14 21:50:02.010419+01 | 00:00:00.215121 2014-12-14 22:00:01.110033+01 | 00:00:16.460612 2014-12-14 22:10:01.568286+01 | 00:00:00.077897 2014-12-14 22:20:01.682714+01 | 00:00:00.104112 2014-12-14 22:30:01.758958+01 | 00:00:00.061474 2014-12-14 22:40:01.970545+01 | 00:00:00.108613 2014-12-14 22:50:01.038908+01 | 00:00:00.039637 2014-12-14 23:00:01.120295+01 | 00:00:00.338731 2014-12-14 23:10:01.365371+01 | 00:00:00.680065 2014-12-14 23:20:01.423365+01 | 00:00:00.154614 2014-12-14 23:30:01.48998+01 | 00:00:00.014643 2014-12-14 23:40:01.569452+01 | 00:00:00.126961 2014-12-14 23:50:01.63047+01 | 00:00:00.303156 2014-12-15 00:00:01.278047+01 | 00:00:00.351391 2014-12-15 00:10:01.382566+01 | 00:00:00.012265 2014-12-15 00:20:01.444746+01 | 00:07:39.002651 2014-12-15 00:30:01.510413+01 | 00:16:13.476753 2014-12-15 00:40:01.97735+01 | 00:00:00.105011 2014-12-15 00:50:01.082528+01 | 00:01:10.313796 2014-12-15 01:00:01.124843+01 | 00:00:01.508016 2014-12-15 01:10:01.818415+01 | 00:00:00.082441 2014-12-15 01:20:01.961064+01 | 00:00:00.048221 2014-12-15 01:30:01.355472+01 | 00:00:00.37941 2014-12-15 01:40:01.42728+01 | 00:00:00.013836 2014-12-15 01:50:01.486446+01 | 00:00:00.110321 2014-12-15 02:00:01.566731+01 | 00:00:00.290281 2014-12-15 02:10:01.236574+01 | 00:01:15.954532 2014-12-15 02:20:01.440259+01 | 00:00:00.471677 2014-12-15 02:30:01.5733+01 | 00:00:00.208574 2014-12-15 02:40:01.662425+01 | 00:00:00.591091 2014-12-15 02:50:01.263385+01 | 00:00:00.050648 2014-12-15 03:00:01.340777+01 | 00:00:00.289115 2014-12-15 03:10:01.993079+01 | 00:00:00.790201 2014-12-15 03:20:01.061826+01 | 00:00:00.043176 2014-12-15 03:30:01.125639+01 | 00:00:00.172924 2014-12-15 03:40:01.252033+01 | 00:03:05.113579 2014-12-15 03:50:01.362396+01 | 00:00:00.254974 2014-12-15 04:00:01.370922+01 | 00:00:00.208254 2014-12-15 04:10:01.472816+01 | 00:00:00.077214 2014-12-15 04:20:01.553443+01 | 00:00:00.135887 2014-12-15 04:30:01.63607+01 | 00:00:00.027272 2014-12-15 04:40:01.696442+01 | 00:00:00.130954 2014-12-15 04:50:01.786961+01 | 00:00:00.572573 2014-12-15 05:00:01.790753+01 | 00:00:00.491799 2014-12-15 05:10:01.078332+01 | 00:07:58.438202 **** likely autovacuum 2014-12-15 05:20:01.139541+01 | 00:00:00.057486 2014-12-15 05:30:01.251079+01 | 00:00:00.053462 2014-12-15 05:40:01.322349+01 | 00:00:00.084701 2014-12-15 05:50:01.607937+01 | 00:00:00.205241 2014-12-15 06:00:01.699406+01 | 00:00:00.121415 2014-12-15 06:10:01.756047+01 | 00:00:00.20769 2014-12-15 06:20:01.854222+01 | 00:00:00.03397 2014-12-15 06:30:02.041054+01 | 00:03:07.271295 2014-12-15 06:40:01.891882+01 | 00:00:00.263748 2014-12-15 06:50:01.987809+01 | 00:00:00.155619 2014-12-15 07:00:01.068556+01 | 00:00:00.119866 2014-12-15 07:10:01.318478+01 | 00:00:00.092856 2014-12-15 07:20:01.704899+01 | 00:00:00.106533 2014-12-15 07:30:01.773268+01 | 00:00:00.135743 2014-12-15 07:40:01.730152+01 | 00:00:00.06358 2014-12-15 07:50:01.798179+01 | 00:00:00.529685 2014-12-15 08:00:01.868205+01 | 00:00:00.194482 2014-12-15 08:10:01.219339+01 | 00:00:00.063553 2014-12-15 08:20:01.309426+01 | 00:00:00.056698 2014-12-15 08:30:01.120431+01 | 00:00:00.425596 2014-12-15 08:40:01.201882+01 | 00:00:00.00909 2014-12-15 08:50:01.272526+01 | 00:00:00.019492 2014-12-15 09:00:01.361022+01 | 00:00:00.423997 2014-12-15 09:10:01.603702+01 | 00:00:00.066705 2014-12-15 09:20:01.682277+01 | 00:00:09.251202 2014-12-15 09:30:01.934477+01 | 00:00:00.311553 2014-12-15 09:40:02.03221+01 | 00:00:00.125678 2014-12-15 09:50:01.105372+01 | 00:00:00.294006 2014-12-15 10:00:01.201109+01 | 00:00:00.014641 2014-12-15 10:10:01.164478+01 | 00:01:51.375378 2014-12-15 10:20:01.264589+01 | 00:09:54.476361 **** likely autovacuum 2014-12-15 10:30:01.351103+01 | 00:00:00.213636 2014-12-15 10:40:01.623903+01 | 00:00:00.488103 2014-12-15 10:50:01.768132+01 | 00:00:00.080799 2014-12-15 11:00:01.880247+01 | 00:00:20.401738 2014-12-15 11:10:01.215509+01 | 00:00:00.036288 2014-12-15 11:20:01.265607+01 | 00:00:00.057142 2014-12-15 11:30:01.343731+01 | 00:00:00.036609 2014-12-15 11:40:01.41248+01 | 00:00:00.218139 2014-12-15 11:50:01.48113+01 | 00:00:00.242754 2014-12-15 12:00:01.685114+01 | 00:00:00.82528 2014-12-15 12:10:01.995243+01 | 00:02:29.971448 2014-12-15 12:20:01.962833+01 | 00:00:00.118112 2014-12-15 12:30:01.100587+01 | 00:00:00.214437 2014-12-15 12:40:01.226111+01 | 00:00:00.052599 2014-12-15 12:50:01.300061+01 | 00:00:00.162205 2014-12-15 13:00:01.4007+01 | 00:00:00.707891 2014-12-15 13:10:02.005526+01 | 00:00:00.162238 2014-12-15 13:20:01.072375+01 | 00:00:00.214978 2014-12-15 13:30:01.446005+01 | 00:00:00.121816 2014-12-15 13:40:01.483524+01 | 00:00:00.650178 2014-12-15 13:50:01.796143+01 | 00:00:00.065482 2014-12-15 14:00:01.886071+01 | 00:00:00.237577 2014-12-15 14:10:01.134148+01 | 00:00:00.193941 2014-12-15 14:20:01.199047+01 | 00:00:00.068058 2014-12-15 14:30:01.27777+01 | 00:00:00.022991 2014-12-15 14:40:01.361959+01 | 00:00:00.439753 2014-12-15 14:50:01.421515+01 | 00:00:00.037749 2014-12-15 15:00:01.500559+01 | 00:00:00.174448 2014-12-15 15:10:01.811804+01 | 00:06:09.196648 **** likely autovacuum .. It goes up till a maximum of 25 minutes (for the last two weeks), it looks correlated with an autovacuum at the master inone of the big tables of the schema. It happens at about 5hours interval. Is there a way to avoid this ? Should I ask tothe master db dba to try to have a more active autovacuum policy? Thank you, Ruben
> dbnasg403-12a:/vol/dodpupdbtst03 on /ORA/dbs03/PUPDBTST type nfs (rw, > actimeo=0,hard,nointr,rsize=65536,wsize=65536,tcp,timeo=600) > dbnasg401-12a:/vol/dodpupdbtst02 on /ORA/dbs02/PUPDBTST type nfs (rw, > actimeo=0,hard,nointr,rsize=65536,wsize=65536,tcp,timeo=600) > > to > > dbnasg403-12a:/vol/dodpupdbtst03 on /ORA/dbs03/PUPDBTST type nfs > (rw,noatime,hard,nointr,rsize=65536,wsize=65536,tcp,timeo=600) > dbnasg401-12a:/vol/dodpupdbtst02 on /ORA/dbs02/PUPDBTST type nfs > (rw,noatime,hard,nointr,rsize=65536,wsize=65536,tcp,timeo=600) > > it did have a big impact. The profile of the recovery process on terms of calls > changed quite a lot: > > From: > > [postgres@itrac1202 tmp]$ strace -p 9596 -c Process 9596 attached - > interrupt to quit Process 9596 detached > % time seconds usecs/call calls errors syscall > ------ ----------- ----------- --------- --------- ---------------- > 78.73 0.217824 0 456855 381376 read > 17.87 0.049453 0 515320 lseek > 2.89 0.007989 12 669 669 poll > 0.33 0.000912 25 36 open > 0.07 0.000206 0 994 994 stat > 0.05 0.000151 0 995 787 rt_sigreturn > 0.05 0.000133 0 673 write > 0.00 0.000000 0 36 close > 0.00 0.000000 0 52 kill > ------ ----------- ----------- --------- --------- ---------------- > 100.00 0.276668 975630 383826 total > This one should read: [root@itrac1202 ~]# strace -c -p 28073 Process 28073 attached - interrupt to quit Process 28073 detached % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 59.16 10.756007 5 2201974 1202832 read 40.69 7.398247 3 2367885 lseek 0.14 0.025970 154 169 open 0.00 0.000057 0 169 close 0.00 0.000038 0 169 kill 0.00 0.000033 1 29 write 0.00 0.000000 0 1 semop ------ ----------- ----------- --------- --------- ---------------- 100.00 18.180352 4570396 1202832 total Apologies for the confusion.