Critical errors during logical decoding - Mailing list pgsql-general

From Colin Morelli
Subject Critical errors during logical decoding
Date
Msg-id CAPtU-Uqf4AR1uQ_UCo15JB6QRBecEnm2EvByWMcKyJRnJCL00Q@mail.gmail.com
List pgsql-general
List,

For no apparent reason, my logical replication slot has started failing with errors every time my client tries to connect to it. At the moment, I'm getting:

2018-02-07 19:14:31 UTC [3985-1] svc_app@app LOG:  00000: received replication command: START_REPLICATION SLOT event_stream LOGICAL 0/0 ("startup_params_format" '1', "no_txinfo" 'true', "expected_encoding" 'UTF8', "max_proto_version" '1', "proto_format" 'json', "min_proto_version" '1')
2018-02-07 19:14:31 UTC [3985-2] svc_app@app LOCATION:  exec_replication_command, walsender.c:1341
2018-02-07 19:14:31 UTC [3985-3] svc_app@app LOG:  00000: starting logical decoding for slot "event_stream"
2018-02-07 19:14:31 UTC [3985-4] svc_app@app DETAIL:  streaming transactions committing after 0/8DDFD48, reading WAL from 0/8DDDC00
2018-02-07 19:14:31 UTC [3985-5] svc_app@app LOCATION:  CreateDecodingContext, logical.c:408
2018-02-07 19:14:31 UTC [3985-6] svc_app@app LOG:  00000: logical decoding found consistent point at 0/8DDDC00
2018-02-07 19:14:31 UTC [3985-7] svc_app@app DETAIL:  There are no running transactions.
2018-02-07 19:14:31 UTC [3985-8] svc_app@app LOCATION:  SnapBuildFindSnapshot, snapbuild.c:1245
2018-02-07 19:14:31 UTC [3985-9] svc_app@app ERROR:  XX000: no known snapshots
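
As a reading aid for the positions in the log above: a PostgreSQL LSN is printed as two hexadecimal numbers separated by a slash, the high and low 32 bits of a 64-bit WAL position. The sketch below is only an illustration, not part of the original report. The helper names `lsn_to_int` and `lsns_in_line` are hypothetical. It pulls the LSNs out of a log line and measures how far apart they are in bytes.

```python
import re

# Matches LSNs as they appear in the log, e.g. "0/8DDFD48".
LSN_RE = re.compile(r"\b([0-9A-Fa-f]{1,8})/([0-9A-Fa-f]{1,8})\b")


def lsn_to_int(lsn: str) -> int:
    """Convert 'hi/lo' hex notation to a 64-bit WAL byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def lsns_in_line(line: str) -> list:
    """Return every LSN-looking token in a log line, in order."""
    return [m.group(0) for m in LSN_RE.finditer(line)]


# The DETAIL line from the log above.
detail = (
    "2018-02-07 19:14:31 UTC [3985-4] svc_app@app DETAIL:  "
    "streaming transactions committing after 0/8DDFD48, "
    "reading WAL from 0/8DDDC00"
)

commit_after, read_from = lsns_in_line(detail)
# Distance in bytes between where WAL reading starts and the point
# after which committed transactions are streamed.
gap = lsn_to_int(commit_after) - lsn_to_int(read_from)
print(commit_after, read_from, gap)
```

For the line shown, decoding starts reading WAL 8520 bytes before the point after which commits are streamed. This only helps interpret the log; it does not explain the "no known snapshots" error.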

Other errors have included missing TOAST entries and missing base/ files on disk. I deleted the entire data directory and, after checking filesystem consistency, restored the database from a pg_dump into a fresh directory. The logical replication stream runs for a few minutes and then fails again. This is PostgreSQL 9.6.6 on Ubuntu, installed from packages.

Does anyone have any insight into what could be happening here, or other steps I could try to rectify the problem? The database itself does not appear to have any issues that I can see.

Best,
Colin
