RE: [Proposal] Add foreign-server health checks infrastructure - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: [Proposal] Add foreign-server health checks infrastructure
Date
Msg-id TYAPR01MB5866AAA4EB0632B4001AA2C5F50E9@TYAPR01MB5866.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: [Proposal] Add foreign-server health checks infrastructure  (Önder Kalacı <onderkalaci@gmail.com>)
Responses Re: [Proposal] Add foreign-server health checks infrastructure
List pgsql-hackers
Dear Önder,

Sorry for my late reply. I understood that we got agreement the basic design of first version. Thanks!
I attached new version patches.

> I think more experienced hackers could guide us here. I don't see a problem
> with that, but checking other functions like pqSocketPoll(), I see that
> Postgres uses the select system call. I wonder if we can have something
> similar with select? Seems no, but wanted to bring up in case you know more
> about that?

Here I have been still checking...

> I guess it kind of makes sense to have the flexibility to check all
> connections vs only in tx connections.
>
> Though, maybe we should follow a similar approach
> to postgres_fdw_disconnect(servername) / postgres_fdw_disconnect_all()
>     postgres_fdw_verify_connection_states(servername) /
> postgres_fdw_verify_connection_states_all()
>
> That seems like a more natural API considering the other UDFs. You can also
> use in combination with postgres_fdw_get_connections()

I think you are right, fixed.

> Hmm, I don't know the reason. Is that maybe epoll is available on a smaller
> number of platforms and libpq can be used on more platforms as being a
> client library?

Here I have been still checking...

> So, should we also be using all these flags like: input_fd.events =
> POLLRDHUP | *errflags*;  ?

Fixed.

> Is erroring out always necessary? Maybe we should just return true/false,
> and let the caller decide what to do? For example, you might want to
> rollback to savepoint and retry?  If the caller wants to rollback the whole
> transaction, that is possible anyway.

I changed as you suggested once, but I was not sure we can really do such a thing.
Please see the following case.

```
BEGIN;

SAVEPOINT s;
SELECT 1 FROM ft1 LIMIT 1; -- connect to foreign server

SELECT pg_terminate_backend(pid, 180000) FROM pg_stat_activity
    WHERE application_name = 'fdw_retry_check'; -- kill the remote connection

SELECT 1 FROM ft1 LIMIT 1; -- will fail
-> ERROR:  FATAL:  terminating connection due to administrator command


ROLLBACK TO SAVEPOINT s; -- OK, go back to savepoint...
COMMIT;
-> ERROR:  connection to server "loopback" was lost
```

The test means that even if we go back to the savepoint, 
we could not commit the transaction because postgres_fdw tries to connect to the remove
in pgfdw_xact_callback().

Do we really have a flexibility when the remote connection seems to be closed?
If we do not have, I will revert the change.

> Do we really need this new file or just an oversight in your patch?

Previously there were three file because sometimes the ordering of
output  was changed, but it was no more needed.
Currently this patch contains two comparison files, one is for linux,
and another one is for unsupported OSes like Windows.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Attachment

pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: ExecRTCheckPerms() and many prunable partitions
Next
From: Ashutosh Bapat
Date:
Subject: Re: Avoid distributing new catalog snapshot again for the transaction being decoded.