RE: [Proposal] Add foreign-server health checks infrastructure - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: [Proposal] Add foreign-server health checks infrastructure
Date
Msg-id TYAPR01MB58667F1E17C0B79BDFFB48F9F5A49@TYAPR01MB5866.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: [Proposal] Add foreign-server health checks infrastructure  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
Dear Peter,

Thank you for reviewing! Latest version can be seen in [1].

> 1.
> PQcanConnCheck seemed like a strange API name. Maybe it can have the
> same prefix as the other?
> 
> e.g.
> 
> - PQconnCheck()
> - PGconnCheckSupported()
> 
> or
> 
> - PQconnCheck()
> - PGconnCheckable()
>

I choose PQconnCheckable().

> 2.
> PqconnCheck() function allows to check the status of socket by polling
> the socket. This function is currently available only on systems that
> support the non-standard POLLRDHUP extension to the poll system call,
> including Linux.
> 
> PqcanConnCheck() checks whether above function is available or not.
> 
> ~
> 
> 2a.
> "status of socket" --> "status of the connection"
> 
> ~
> 
> 2b.
> "above function" --> "the above function"

Fixed.

> doc/src/sgml/libpq.sgml
> 
> 3. PQconnCheck
> 
> Returns the health of the socket.
> 
> int PQconnCheck(PGconn *conn);
> 
> Unlike PQstatus, this function checks socket health. This check is
> performed by polling the socket. This function is currently available
> only on systems that support the non-standard POLLRDHUP extension to
> the poll system call, including Linux. PQconnCheck returns greater
> than zero if the remote peer seems to be closed, returns 0 if the
> socket is valid, and returns -1 if the connection has been already
> invalid or an error is error occurred.
> 
> ~
> 
> 3a.
> Should these descriptions be referring to the health of the
> *connection* rather than the health of the socket?

Reworded.

> 3b.
> "has been already invalid" ?? wording

I checked codes and found that the socket becomes PGINVALID_SOCKET
after being closed. So I clarified that.

> 4. PQcanConnCheck
> 
> Returns whether PQconnCheck is available on this platform.
> PQcanConnCheck returns 1 if the function is supported, otherwise
> returns 0.
> 
> ~
> 
> I thought this should be worded using "true" and "false" same as other
> boolean functions on this page.
> 
> SUGGESTION
> Returns true (1) or false (0) to indicate if the PQconnCheck function
> is supported on this platform.

Fixed.

> ======
> src/interfaces/libpq/fe-misc.c
> 
> 5.
> -static int pqSocketCheck(PGconn *conn, int forRead, int forWrite,
> -   time_t end_time);
> +static int pqSocketIsReadableOrWritableOrValid(PGconn *conn, int forRead,
> + int forWrite, time_t end_time);
> 
> I was not 100% sure overloading this API is the right thing to do.
> Doesn't this introduce a subtle side-effect on some of the existing
> callers? e.g. Previously pqWaitTimed would ALWAYS return 0 if
> forRead/forWrite were both false. But now other return values like
> errors will be possible. Is that OK?

I checked pqWaitTimed() and seems not to affect other parts.

As the first place, it is an internal function and will be never called from clients.
There are 2 functions that call that - connectDBComplete() and pqWait(), and both of
them are not set finish_time to zero. In connectDBComplete(), basically finish_time 
is set to -1,  and set to time(NULL) + connect_timeout if the timeout is enabled.


> 6. pqSocketPoll
> 
> /*
>  * Check a file descriptor for read and/or write data, possibly waiting.
>  * If neither forRead nor forWrite are set, immediately return a timeout
>  * condition (without waiting).  Return >0 if condition is met, 0
>  * if a timeout occurred, -1 if an error or interrupt occurred.
>  *
>  * Timeout is infinite if end_time is -1.  Timeout is immediate (no blocking)
>  * if end_time is 0 (or indeed, any time before now).
>  *
>  * Moreover, when neither forRead nor forWrite is requested and timeout is
>  * disabled, try to check the health of socket.
>  */
> 
> 
> The new comment "Moreover..." is contrary to the earlier part of the
> same comment which already said, "If neither forRead nor forWrite are
> set, immediately return a timeout condition (without waiting)."
> 
> There might be side-effects to previous/existing callers of this
> function (e.g. pqWaitTimed via pqSocketCheck)

Comments were fixed. About the side-effect, please see previous discussion.

> 7.
>   if (!forRead && !forWrite)
> - return 0;
> + {
> + /* Try to check the health if requested */
> + if (end_time == 0)
> +#if defined(POLLRDHUP)
> + input_fd.events = POLLRDHUP | POLLHUP | POLLNVAL;
> +#else
> + return 0;
> +#endif /* defined(POLLRDHUP) */
> + else
> + return 0;
> + }
> 
> FYI - I think the new code can be simpler without needing #else by
> calling your other new function.
> 
> SUGGESTION
> if (!forRead && !forWrite)
> {
> if (!PQcanConnCheck() || end_time != 0)
> return 0;
> 
> /* Check the connection health when end_time is 0 */
> Assert(PQcanConnCheck() && end_time == 0);
> #if defined(POLLRDHUP)
> input_fd.events = POLLRDHUP | POLLHUP | POLLNVAL;
> #endif
> }

Fixed.

> 8. PQconnCheck
> +/*
> + * Check whether PQconnCheck() can work well on this platform.
> + *
> + * Returns 1 if this can use PQconnCheck(), otherwise 0.
> + */
> +int
> +PQcanConnCheck(void)
> +{
> +#if (defined(HAVE_POLL) && defined(POLLRDHUP))
> + return true;
> +#else
> + return false;
> +#endif
> +}
> 
> ~
> 
> 8a.
> "can work well" --> "works"
> 
> ~
> 
> 8b.
> Maybe better to say "true (1)" and "otherwise false (0)"

Fixed.

[1]:
https://www.postgresql.org/message-id/TYAPR01MB58669C95604A0EB7BCC1B58CF5A49%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


pgsql-hackers by date:

Previous
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: [Proposal] Add foreign-server health checks infrastructure
Next
From: Peter Eisentraut
Date:
Subject: Add support for unit "B" to pg_size_pretty()