Thread: BUG #17179: EOF detected

BUG #17179: EOF detected

From: PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      17179
Logged by:          Никита Шумилов
Email address:      66666666nikita@gmail.com
PostgreSQL version: 12.6
Operating system:   Ubuntu
Description:

Queries running for more than 30 minutes are disconnected:
SELECT pg_sleep(1820)
SSL SYSCALL error: EOF detected

Increasing the tcp_user_timeout and tcp_keepalives_interval timeouts didn't help.

Configuration:
name                         setting   unit
autovacuum_naptime           60        s
timezone_abbreviations       Default
client_encoding              UNICODE
vacuum_freeze_min_age        50000000
search_path                  "$user", public
password_encryption          md5
ssl_passphrase_command
ssl_key_file                 /etc/ssl/private/ssl-cert-snakeoil.key
allow_system_table_mods      off
ident_file                   /etc/postgresql/12/main/pg_ident.conf
segment_size                 131072    8kB
wal_block_size               8192
debug_assertions             off
data_directory_mode          700
cluster_name                 pgsql
geqo_selection_bias          2
geqo_seed                    0
geqo_effort                  5
constraint_exclusion         partition
cpu_tuple_cost               0.01
enable_mergejoin             on
synchronous_standby_names
wal_keep_segments            8
max_wal_senders              5
wal_sender_timeout           60000     ms
max_replication_slots        5
max_standby_archive_delay    30000     ms
recovery_min_apply_delay     0         ms
vacuum_cost_page_miss        10
vacuum_cost_page_dirty       20
max_files_per_process        1000
log_planner_stats            off
log_executor_stats           off
track_counts                 on
stats_temp_directory         pg_stat_tmp
array_nulls                  on
backslash_quote              safe_encoding
operator_precedence_warning  off
recovery_target_lsn
wal_log_hints                on
wal_init_zero                on


Re: BUG #17179: EOF detected

From: Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes:
> Queries running for more than 30 minutes are disconnected:
> SELECT pg_sleep(1820)
> SSL SYSCALL error: EOF detected

Ideally, you'd fix your broken network infrastructure.  However,
if you can't ...

> Increasing the tcp_user_timeout and tcp_keepalives_interval timeouts didn't help.

... that's exactly backwards about how to work around it.  You need
to *reduce* the interval at which keepalives are sent.  Try setting
tcp_keepalives_idle to 10min or so.  (At least on my Linux box,
it seems to default to 7200s = 2 hours, so that it's no surprise that a
router that drops connections after 30 minutes would cause problems.)

            regards, tom lane
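
The workaround suggested above could be applied on the server roughly as
follows. This is only a sketch: it assumes superuser access, and the
interval and count values are illustrative assumptions that do not come
from the thread. Only the 10-minute idle time was suggested there.

```sql
-- Send the first keepalive after 10 minutes of idle time.
-- The value is in seconds when no unit is given.
ALTER SYSTEM SET tcp_keepalives_idle = 600;

-- Assumed values (not from the thread): how often to retry an
-- unacknowledged keepalive, and how many may be lost before the
-- connection is considered dead.
ALTER SYSTEM SET tcp_keepalives_interval = 60;
ALTER SYSTEM SET tcp_keepalives_count = 5;

-- Apply the change without a restart, then check it from a new
-- TCP connection.
SELECT pg_reload_conf();
SHOW tcp_keepalives_idle;
```

These settings have no effect on sessions connected through a Unix-domain
socket, where they read as zero. If the server configuration cannot be
changed, libpq clients can likely get a similar effect with the
keepalives_idle connection parameter.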