Re: PostgreSQL logical replication depends on WAL segments? - Mailing list pgsql-general

From Achilleas Mantzios
Subject Re: PostgreSQL logical replication depends on WAL segments?
Msg-id 640878d7-b2a6-2801-3dc5-b9f844db2bc0@matrix.gatewaynet.com
In response to PostgreSQL logical replication depends on WAL segments?  (Josef Machytka <josef.machytka@gmail.com>)
List pgsql-general
On 22/1/19 3:18 p.m., Josef Machytka wrote:
Hello, I have already asked on Stack Overflow, but so far without success.

Could someone help me please?

****

I am successfully using logical replication between two PG 11 cloud VMs for the latest data. But when I also tried to publish some older tables to transfer data between databases, I got a strange error about a missing WAL segment.

These older partitions contain data that is 5-6 days old. I successfully published them on the master and refreshed the subscription on the logical replica. But now I am getting these strange error messages on the logical replica:

2019-01-21 15:03:14.713 UTC [17203] LOG:  logical replication table synchronization worker for subscription "mysubscription", table "mytable_20190115" has finished
2019-01-21 15:03:19.768 UTC [18877] LOG:  logical replication apply worker for subscription "mysubscription" has started
2019-01-21 15:03:19.797 UTC [18877] ERROR:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000098E000000CB has already been removed
2019-01-21 15:03:19.799 UTC [29534] LOG:  background worker "logical replication worker" (PID 18877) exited with exit code 1
2019-01-21 15:03:24.806 UTC [18910] LOG:  logical replication apply worker for subscription "mysubscription" has started
2019-01-21 15:03:24.824 UTC [18911] LOG:  logical replication table synchronization worker for subscription "mysubscription", table "mytable_20190116" has started
2019-01-21 15:03:24.831 UTC [18910] ERROR:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000098E000000CB has already been removed
2019-01-21 15:03:24.834 UTC [29534] LOG:  background worker "logical replication worker" (PID 18910) exited with exit code 1
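The segment named in these errors can be decoded to see which WAL position the apply worker is asking the publisher for. The sketch below assumes the default 16 MB `wal_segment_size` (PG 11 allows a different size to be chosen with `initdb --wal-segsize`, so check your cluster). The function name `decode_wal_segment` is hypothetical. On a running server, `pg_walfile_name()` maps in the other direction, from an LSN to a file name.

```python
# Minimal sketch: decode a WAL segment file name into (timeline, start LSN).
# Assumes the default 16 MB wal_segment_size; pass a different size if the
# cluster was initialised with initdb --wal-segsize.

DEFAULT_SEG_SIZE = 16 * 1024 * 1024  # 16 MB


def decode_wal_segment(name: str, seg_size: int = DEFAULT_SEG_SIZE) -> tuple[int, str]:
    """Return (timeline, start LSN as 'X/Y') for a 24-hex-digit segment name."""
    if len(name) != 24:
        raise ValueError(f"expected 24 hex digits, got {name!r}")
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)       # high 32 bits of the LSN
    seg_in_log = int(name[16:24], 16)  # segment number within that 4 GB range
    segs_per_log = 0x100000000 // seg_size
    segno = log_id * segs_per_log + seg_in_log
    start = segno * seg_size           # first byte covered by this segment
    return timeline, f"{start >> 32:X}/{start & 0xFFFFFFFF:X}"


print(decode_wal_segment("000000010000098E000000CB"))
# -> (1, '98E/CB000000')
```

So the failing request starts at LSN 98E/CB000000 on timeline 1, which can be compared against the slot positions reported on the publisher.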

If you have WAL archiving enabled, then try to find the missing WAL files and copy them into your data/pg_wal directory.
Normally the replication slot will retain all WAL that has not been applied to the subscriber yet, so what you describe is not normal. Do you have any cron task that touches files in this directory?
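The slot-retention rule mentioned above can be sketched roughly as follows: segments older than the one containing a slot's `restart_lsn` (visible in `pg_replication_slots`) become eligible for removal, while that segment and all later ones are kept. This ignores `wal_keep_segments` and archiving, which can retain more. The helper names, the example file list and the example LSN are hypothetical; timeline 1 and 16 MB segments are assumed.

```python
# Minimal sketch: which WAL segment files must a replication slot keep?
# Assumes timeline 1 and the default 16 MB wal_segment_size; the LSN and
# file names below are illustrative, not values taken from this thread.

SEG_SIZE = 16 * 1024 * 1024
SEGS_PER_LOG = 0x100000000 // SEG_SIZE


def parse_lsn(lsn: str) -> int:
    """Convert an 'X/Y' LSN string to a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def segment_name(lsn: int, timeline: int = 1) -> str:
    """Name of the segment file that contains the given LSN."""
    segno = lsn // SEG_SIZE
    return f"{timeline:08X}{segno // SEGS_PER_LOG:08X}{segno % SEGS_PER_LOG:08X}"


def must_keep(files: list[str], restart_lsn: str) -> list[str]:
    """Segments the slot still needs: the one holding restart_lsn and later ones."""
    oldest_needed = segment_name(parse_lsn(restart_lsn))
    # Names on one timeline are fixed-width uppercase hex, so string order
    # matches WAL order.
    return sorted(f for f in files if f >= oldest_needed)


files = ["000000010000098E000000CA", "000000010000098E000000CB", "000000010000098E000000CC"]
print(must_keep(files, "98E/CB123456"))
# -> ['000000010000098E000000CB', '000000010000098E000000CC']
```

If a subscriber then asks for a position inside a segment that is no longer in this "keep" set, the publisher can only answer with an error like the one in the logs above.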

This is confusing to me. I searched for information but found nothing about logical replication depending on WAL segments.

There is no streaming replication running on that particular master, and I see these error messages on both the master and the replica, which are connected only via logical replication.

Am I doing something wrong? Is there some special way to publish older data? For newer data and the latest data, everything works without problems.

Of course, since I published about 20 tables, it took some time for the replica to process them all; currently it always processes 2 at a time. But I still do not understand why it should depend on WAL segments... Thank you very much.
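The "two at a time" behaviour is consistent with the default value of `max_sync_workers_per_subscription`, which is 2 (sync workers are also drawn from the `max_logical_replication_workers` pool). The toy model below only illustrates the batching. It assumes every table takes the same time to copy, which real initial syncs do not, and the function and table names are hypothetical.

```python
# Toy model of initial table synchronization limited by a worker cap.
# Assumption: every table copy takes the same time, so tables finish in
# fixed-size waves; real sync workers pick up the next table as soon as
# any worker becomes free.

def sync_waves(tables: list[str], max_sync_workers: int = 2) -> list[list[str]]:
    """Group tables into the waves in which they would be copied."""
    if max_sync_workers < 1:
        raise ValueError("need at least one sync worker")
    return [tables[i:i + max_sync_workers] for i in range(0, len(tables), max_sync_workers)]


tables = [f"mytable_201901{day:02d}" for day in range(1, 21)]  # 20 hypothetical partitions
waves = sync_waves(tables)
print(len(waves))  # -> 10
print(waves[0])    # -> ['mytable_20190101', 'mytable_20190102']
```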

I tried to unpublish and unsubscribe these older tables and then publish and subscribe them again, but I still get the same error message for exactly the same WAL segment number.

I unpublished and unsubscribed those problematic tables and the error messages stopped, so they were definitely related to logical replication. Could they be caused by the snapshot?

I even had another strange experience with WAL segment errors. My logical replica had only quite a small disk, and during all that fiddling I forgot to check disk usage, so PostgreSQL on the logical replica crashed due to a full disk. Since I use GCE, I just resized the root disk and got more space after restarting the instance. But the missing WAL segment errors in connection with logical replication also came back. My PostgreSQL log on the replica is now full of a repeating sequence of these 3 lines:

2019-01-22 09:47:14.408 UTC [1946] LOG:  logical replication apply worker for subscription "mysubscription" has started
2019-01-22 09:47:14.429 UTC [1946] ERROR:  could not receive data from WAL stream: ERROR:  requested WAL segment 000000010000099D0000007A has already been removed
2019-01-22 09:47:14.431 UTC [737] LOG:  background worker "logical replication worker" (PID 1946) exited with exit code 1
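Comparing the segment named here with the one from the earlier errors shows how far apart in the WAL stream the two requested positions are. Both names carry timeline 1 in their first eight digits; the default 16 MB segment size is an assumption, and the helper names are hypothetical.

```python
# Rough distance, in segments and bytes, between two WAL segment files on the
# same timeline. Assumes the default 16 MB wal_segment_size.

SEG_SIZE = 16 * 1024 * 1024
SEGS_PER_LOG = 0x100000000 // SEG_SIZE


def segno(name: str) -> int:
    """Absolute segment number encoded in a 24-hex-digit WAL file name."""
    return int(name[8:16], 16) * SEGS_PER_LOG + int(name[16:24], 16)


def segments_between(older: str, newer: str) -> int:
    """Number of segments from `older` to `newer` (same timeline only)."""
    if older[:8] != newer[:8]:
        raise ValueError("segments are on different timelines")
    return segno(newer) - segno(older)


first = "000000010000098E000000CB"   # requested in the 2019-01-21 errors
second = "000000010000099D0000007A"  # requested in the 2019-01-22 errors
n = segments_between(first, second)
print(n, f"segments = {n * SEG_SIZE / 1024**3:.1f} GiB")
# -> 3759 segments = 58.7 GiB
```

Under these assumptions the second request points roughly 59 GiB further along the WAL stream than the first, i.e. it is a different position rather than the same stuck one.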

Why does logical replication depend on some old WAL segments? Today's data seems to work perfectly, even though not all of today's WAL segments can still be available on the logical master. But I am unable to publish older data...

Thanks for help.

Josef Machytka



-- 
Achilleas Mantzios
IT DEV Lead
IT DEPT
Dynacom Tankers Mgmt
