Re: [Proposal] Add foreign-server health checks infrastructure - Mailing list pgsql-hackers

From Katsuragi Yuta
Subject Re: [Proposal] Add foreign-server health checks infrastructure
Date
Msg-id 3cb43b39672c17786566063b0aa1225d@oss.nttdata.com
Whole thread Raw
In response to RE: [Proposal] Add foreign-server health checks infrastructure  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Responses RE: [Proposal] Add foreign-server health checks infrastructure  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
List pgsql-hackers
Hi Kuroda-san,

Thank you for updating the patch!


>> I rethought the pqSocketPoll part. Current interpretation of
>> arguments seems a little bit confusing because a specific pattern
>> of arguments has a different meaning. What do you think about
>> introducing a new argument like `int forConnCheck`? This seems
>> straightforward and readable.
> 
> I think it may be better, so fixed.
> But now we must consider another thing - will we support combination
> of conncheck
> and {read|write} or timeout? Especially about timeout, if someone
> calls pqSocketPoll()
> with forConnCheck = 1 and end_time = -1, the process may be stuck
> because it waits
> till socket peer closed connection.
> Currently the forConnCheck can be specified with other request, but
> timeout must be zero.

Yes, we need to consider these.
I'm wondering whether we need a special care for the combination
of event and timeout. Surely, if forConnCheck is set and end_time = -1,
pqSocketPoll blocks until the connection close. However, the
behavior matches the meaning of the arguments and does not seem
confusing (also not an error state). Do we need to restrict this
kind of usage in the pqSocketPoll side? I think like the following
might be fine.

```
      if (!forRead && !forWrite)
    {
        if (!(forConnCheck && PQconnCheckable()))
            return 0;
    }
```



> While making the patch, I come up with idea that
> postgres_fdw_verify_connection_states*
> returns NULL if PQconnCheck() cannot work on this platform. This can be 
> done by
> adding PQconnCheckable(). It may be reasonable because the checking is
> not really
> done on this environment. How do you think?

I agree with you.

Followings are comments for v33. Please check.
0001:
1. the comment of PQconnCheck
+/*
+ * Check whether the socket peer closed connection or not.
+ *

Check whether the socket peer closed 'the' connection or not?

2. the comment of pqSocketCheck
- * or both.  Returns >0 if one or more conditions are met, 0 if it 
timed
- * out, -1 if an error occurred.
+ * or both. Moreover, this function can check the health of socket on 
some
+ * limited platforms if end_time is 0.

the health of socket -> the health of the connection?
if end_time is 0 -> if forConnCehck is specified?


3. the comment of pqSocketPoll
- * If neither forRead nor forWrite are set, immediately return a 
timeout
- * condition (without waiting).  Return >0 if condition is met, 0
- * if a timeout occurred, -1 if an error or interrupt occurred.
+ * Moreover, this function can check the health of socket on some 
limited
+ * platforms if end_time is 0.

the health of socket -> the health of the connection?
if end_time is 0 -> if forConnCehck is specified?

4. the code of pqSocketPoll
+#if defined(POLLRDHUP)
+    if (forConnCheck)
+        input_fd.events |= POLLRDHUP;
+#endif

I think it is better to use PQconnCheckable() to remove the macro.

5. the code of pqSocketCheck and the definition of pqSocketPoll
-        result = pqSocketPoll(conn->sock, forRead, forWrite, end_time);
+        result = pqSocketPoll(conn->sock, forRead, forWrite, forConnCheck, 
end_time);

-pqSocketPoll(int sock, int forRead, int forWrite, time_t end_time)
+pqSocketPoll(int sock, int forRead, int forWrite, int forConnCheck, 
time_t end_time)

Should these be divided into two lines?


6. the comment of verify_cached_connections
+ * This function emits warnings if a disconnection is found, and this 
returns
+ * false only when the lastly verified server seems to be disconnected.

It seems better to write the case where this function returns
true.

7. the comment of postgres_fdw_verify_connection_states
+ * This function emits a warning if a disconnection is found. This 
returns
+ * false only when the verified server seems to be disconnected, and 
reutrns
+ * NULL if the connection check had not been done.

It seems better to write the case where this function returns
true.

8. the code of
+                (errcode(ERRCODE_CONNECTION_FAILURE),
+                 errmsg("%s", str.data),
+                 errdetail("Socket close is detected."),
+                 errhint("Plsease check the health of server.")));

Is it better to use "Connection close is detected" rather than
"Socket close is detected"?

9. the document of postgres_fdw
The document of postgres_fdw_verify_connection_states_* is a little
bit old. Could you update it?

regards,

-- 
Katsuragi Yuta
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: Non-superuser subscription owners
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Time delayed LR (WAS Re: logical replication restrictions)