Bug in ALTER SUBSCRIPTION ... SERVER / ... CONNECTION with broken old server - Mailing list pgsql-hackers

From Chao Li
Subject Bug in ALTER SUBSCRIPTION ... SERVER / ... CONNECTION with broken old server
Date
Msg-id D908370F-2695-4231-851D-17179A6A6F2A@gmail.com
Whole thread
List pgsql-hackers
Hi,

I tested the v19 new feature CREATE SUBSCRIPTION ... SERVER yesterday, and found an issue: once the old server becomes
broken,the subscription cannot be recovered by switching it to a good server. 

Here is a repro:
```
evantest=# CREATE EXTENSION postgres_fdw;
CREATE EXTENSION

# Create two servers
evantest=# CREATE SERVER old_srv FOREIGN DATA WRAPPER postgres_fdw
evantest-# OPTIONS (host '127.0.0.1', dbname 'postgres', port '5432');
CREATE SERVER
evantest=# CREATE SERVER new_srv FOREIGN DATA WRAPPER postgres_fdw
evantest-# OPTIONS (host '127.0.0.1', dbname 'postgres', port '5432');
CREATE SERVER
evantest=# CREATE USER MAPPING FOR CURRENT_USER SERVER old_srv
evantest-# OPTIONS (user 'dummy', password 'dummy');
CREATE USER MAPPING
evantest=# CREATE USER MAPPING FOR CURRENT_USER SERVER new_srv
evantest-# OPTIONS (user 'dummy', password 'dummy');
CREATE USER MAPPING

# Add old_server to a subscription
evantest=# CREATE SUBSCRIPTION sub_bug SERVER old_srv
evantest-# PUBLICATION pub_does_not_matter
evantest-# WITH (connect = false, slot_name = NONE);
WARNING:  subscription was created, but is not connected
HINT:  To initiate replication, you must manually create the replication slot, enable the subscription, and alter the
subscriptionto refresh publications. 
CREATE SUBSCRIPTION

# Break old_srv
evantest=# DROP USER MAPPING FOR CURRENT_USER SERVER old_srv;
DROP USER MAPPING

# Fail to switch to a good server because old_srv is broken
evantest=# ALTER SUBSCRIPTION sub_bug SERVER new_srv;
ERROR:  user mapping not found for user "chaol", server "old_srv"
evantest=# ALTER SUBSCRIPTION sub_bug CONNECTION 'host=127.0.0.1 dbname=postgres port=5432 user=dummy password=dummy';
ERROR:  user mapping not found for user "chaol", server "old_srv"
```

In AlterSubscription(), this comment made me think this is a bug:
```
    /*
     * Skip ACL checks on the subscription's foreign server, if any. If
     * changing the server (or replacing it with a raw connection), then the
     * old one will be removed anyway. If changing something unrelated,
     * there's no need to do an additional ACL check here; that will be done
     * by the subscription worker anyway.
     */
    sub = GetSubscription(subid, false, false);
```

The comment explicitly says to skip ACL checks on the old server because it will be removed anyway.

But after looking into GetSubscription(), I found that when the aclcheck parameter is false, it still calls
ForeignServerConnectionString().I think that is the root cause of the bug. 

To fix this, I worked out a solution that stores the server OID in Subscription, and only calls
ForeignServerConnectionString()lazilywhen sub->conninfo is actually needed. I also added a new test case to cover this
scenario.Without the fix, the new test fails. 
See attached patch for details.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/




Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [PATCH] Fix: Partitioned parent index remains invalid after child indexes are repaired
Next
From: David Rowley
Date:
Subject: Re: [PATCH] Allow SJE to recognize GiST-backed temporal primary keys