Hello Folks,
Thanks for your inspiration; I made some progress (found
a way to avoid the issue).
The issue is most likely not related to postgres.
Ron Johnson said:
>> A configuration problem on the machine(s) can be ruled out,
> Famous last words.
Trust me. :)
> Is there a way to test pmc authentication via some other tool, like psql?
Sure, that works. The problem is contained inside the running
application program(s), everything else doesn't show it.
> If *only* the application changed, then by definition it can't be a
> database problem. *Something* in the application changed; you just haven't
> found it.
Obviously, yes. But then, such a change might expose an undesired
behaviour elsewhere.
> Specifically, I'd read the Discourse 2.3.0 and 2.3.1 release notes.
Correction: it is actually 3.2.0 and 3.3.1.
I finally went the way of bisecting, and it's not really a problem in
Discourse either. It comes from a feature I had enabled in the course
of migrating, a filesystem change monitor based on kqueue:
https://man.freebsd.org/cgi/man.cgi?query=kqueue
Removing that feature solves the issue for now.
I still have no idea how that tool might lead to mishandled sockets
elsewhere; it might somehow have to do with the async processing of
the DB connect. That would need a thorough look into the code where
this is done.
Tom Lane wrote:
>The TCP trace looks like the client side is timing out too quickly
>in the unsuccessful case. It's not clear to me how the different
>Discourse version would lead to the Kerberos library applying a
>different timeout.
It's not a timeout; a timeout would close the socket. Rather, it
seems to forget the socket.
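
To illustrate the distinction, here is a minimal sketch in Python (not
taken from Discourse, libpq or the Kerberos code; the helper name
peer_sees_eof and the use of socketpair() are assumptions made purely
for demonstration). A closed socket shows up as EOF on the peer; a
forgotten one stays open and the peer just sees silence:

```python
import socket

def peer_sees_eof(close_client: bool, wait: float = 0.2) -> bool:
    """Return True if the server side observes EOF from the client."""
    # Hypothetical stand-in for a client/server connection.
    client, server = socket.socketpair()
    try:
        if close_client:
            # What a timeout handler would normally do.
            client.close()
        # Otherwise the descriptor stays open, but nobody services it.
        server.settimeout(wait)
        try:
            # An empty read means the peer closed the connection (EOF).
            return server.recv(1) == b""
        except socket.timeout:
            # No EOF and no data: the peer simply went silent.
            return False
    finally:
        server.close()
        client.close()
```

In a packet trace the first case should show a FIN (or RST) from the
client, while in the second case the traffic just stops.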
>Still, it seems like most of the moving parts
>here are outside of Postgres' control --- I don't think that libpq
>itself has much involvement in the KDC communication.
Kerberos is weird. It goes into libgssapi, but libgssapi doesn't
do much on its own; it just maps so-called "mech"s, which then point
to the actual kerberos code - which in the case of FreeBSD is very
ancient (but work should be underway to modernize it). It's one of
the most creepy pieces of code I've looked into.
> I concur with looking at the Discourse release notes and maybe asking
> some questions in that community.
They only support that app to run in a certain containerization
on a specific brand of Linux. They don't like my questions and
might just delete them.
Anyway, I have a lead now to either avoid the problem or where to
look more closely. And it has nothing directly to do with postgres, but
rather with genuine socket mishandling and/or maybe some flaw in
FreeBSD.
cheers,
PMc