Re: Unresolved repliaction hang and stop problem. - Mailing list pgsql-hackers

From Lukasz Biegaj
Subject Re: Unresolved repliaction hang and stop problem.
Date
Msg-id 224ac459-9ea9-32ab-8b5e-18fb885267e3@unitygroup.com
In response to Re: Unresolved repliaction hang and stop problem.  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: Unresolved repliaction hang and stop problem.
List pgsql-hackers
Hey, thanks for reaching out, and sorry for the late reply - we had a
few days of national holidays.

On 29.04.2021 15:55, Alvaro Herrera wrote:
> https://www.postgresql.org/message-id/flat/CANDwggKYveEtXjXjqHA6RL3AKSHMsQyfRY6bK%2BNqhAWJyw8psQ%40mail.gmail.com
> https://www.postgresql.org/message-id/flat/8bf8785c-f47d-245c-b6af-80dc1eed40db%40unitygroup.com
> 
> Krzysztof said "after upgrading to pg13 we started having problems",
> which implicitly indicates that the same thing worked well in pg10 ---
> but if the problem has been correctly identified, then this wouldn't
> have worked in pg10 either.  So something in the story doesn't quite
> match up.  Maybe it's not the same problem after all, or maybe they
> weren't doing X in pg10 which they are attempting in pg13.
> 

The problem started occurring after the upgrade from pg10 to pg13. No
other changes were made, in particular none to the database structure
or to the operations performed on it.

The problem is as described in 
https://www.postgresql.org/message-id/flat/8bf8785c-f47d-245c-b6af-80dc1eed40db%40unitygroup.com

It occurs on two separate production clusters and one test cluster,
all belonging to the same customer, although each processes slightly
different data (it's an e-commerce store with multiple languages and
a separate production database for each language).

We've tried recreating the database from a dump and recreating the
replication, but without any positive effect - the problem persists.

We did not roll the databases back to pg10; instead we've stayed with
pg13 and implemented a shell script that kills the walsender process if
it seems stuck in `hash_seq_search`. It's ugly, but it works, and we
back up and monitor the data integrity anyway.
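
For illustration only, here is a minimal sketch of such a watchdog,
written in Python rather than shell. It is not the script described
above. It assumes that gdb and pgrep are installed, that the user
running it may attach to the walsender processes, and that the process
title contains "postgres: walsender". The helper names, the number of
samples and the interval are all hypothetical choices.

```python
import os
import signal
import subprocess
import time
from typing import List

# Function name taken from the message above; everything else is assumed.
STUCK_SYMBOL = "hash_seq_search"


def backtrace(pid: int) -> str:
    """Return a gdb backtrace of the given process as text."""
    result = subprocess.run(
        ["gdb", "-p", str(pid), "-batch", "-ex", "bt"],
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout


def looks_stuck(samples: List[str], symbol: str = STUCK_SYMBOL) -> bool:
    """True only if there is at least one sample and every sampled
    backtrace contains the symbol. A single hit is not enough, because
    a healthy walsender may pass through the same function briefly."""
    return bool(samples) and all(symbol in s for s in samples)


def walsender_pids() -> List[int]:
    """List PIDs whose command line matches the assumed process title."""
    result = subprocess.run(
        ["pgrep", "-f", "postgres: walsender"],
        capture_output=True, text=True,
    )
    return [int(pid) for pid in result.stdout.split()]


def check_once(samples_per_pid: int = 3, interval: float = 5.0) -> None:
    """Sample each walsender several times and terminate the stuck ones."""
    for pid in walsender_pids():
        samples = []
        for _ in range(samples_per_pid):
            samples.append(backtrace(pid))
            time.sleep(interval)
        if looks_stuck(samples):
            # SIGTERM asks the walsender to exit; the subscriber is then
            # expected to reconnect and restart replication.
            os.kill(pid, signal.SIGTERM)
```

Calling check_once() periodically (for example from cron) would
approximate the workaround. Requiring the symbol in several consecutive
backtraces is a guess at how to avoid killing a walsender that is only
momentarily inside that function.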

I'd be happy to help debug the issue if I knew how to do it :-). If
you'd like, we can also try rolling the installation back to pg10 to
make certain that this was not caused by schema changes.


-- 
Lukasz Biegaj | Unity Group | https://www.unitygroup.com/
System Architect, AWS Certified Solutions Architect


