Thread: Fixing cache pollution in the Kerberos test suite

From: Jacob Champion

Hi all,

I was running tests with a GSS-enabled stack, and ran into some very
long psql timeouts after running the Kerberos test suite. It turns out
the suite pushes test credentials into the user's global cache, and
these no-longer-useful credentials persist after the suite has
finished. (You can see this in action by running the test/kerberos
suite and then running `klist`.) This leads to long hangs, I assume
while the GSS implementation tries to contact a KDC that no longer
exists.

Attached is a patch that initializes a local credentials cache inside
tmp_check/krb5cc, and tells psql to use it via the KRB5CCNAME envvar.
This prevents the global cache pollution. WDYT?
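
For readers following along, the idea can be sketched in shell. This is only an illustration under stated assumptions, not the attached patch: the `tmp_check` directory here is a stand-in created with `mktemp`, and `seen_by_child` is a name made up for the demonstration.

```shell
#!/bin/sh
# Stand-in for the suite's tmp_check directory (hypothetical location).
tmp_check=$(mktemp -d)

# Point Kerberos clients at a private cache file inside that directory.
KRB5CCNAME="$tmp_check/krb5cc"
export KRB5CCNAME

# Every process started from here on inherits the private cache location.
seen_by_child=$(sh -c 'printf "%s" "$KRB5CCNAME"')
echo "child processes see KRB5CCNAME=$seen_by_child"

# With Kerberos client tools installed, kinit and klist run from this
# point would use $tmp_check/krb5cc rather than the user's default cache.
```

Because the cache lives under the test's own scratch directory, it goes away with the rest of the test output instead of lingering in the user's cache.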

--Jacob

Attachment

Re: Fixing cache pollution in the Kerberos test suite

From: Stephen Frost

Greetings,

* Jacob Champion (pchampion@vmware.com) wrote:
> I was running tests with a GSS-enabled stack, and ran into some very
> long psql timeouts after running the Kerberos test suite. It turns out
> the suite pushes test credentials into the user's global cache, and
> these no-longer-useful credentials persist after the suite has
> finished. (You can see this in action by running the test/kerberos
> suite and then running `klist`.) This leads to long hangs, I assume
> while the GSS implementation tries to contact a KDC that no longer
> exists.
> Attached is a patch that initializes a local credentials cache inside
> tmp_check/krb5cc, and tells psql to use it via the KRB5CCNAME envvar.
> This prevents the global cache pollution. WDYT?

Ah, yeah, that generally seems like a good idea.

Thanks,

Stephen

Attachment

Re: Fixing cache pollution in the Kerberos test suite

From: Tom Lane

Stephen Frost <sfrost@snowman.net> writes:
> * Jacob Champion (pchampion@vmware.com) wrote:
>> I was running tests with a GSS-enabled stack, and ran into some very
>> long psql timeouts after running the Kerberos test suite. It turns out
>> the suite pushes test credentials into the user's global cache, and
>> these no-longer-useful credentials persist after the suite has
>> finished. (You can see this in action by running the test/kerberos
>> suite and then running `klist`.) This leads to long hangs, I assume
>> while the GSS implementation tries to contact a KDC that no longer
>> exists.
>> Attached is a patch that initializes a local credentials cache inside
>> tmp_check/krb5cc, and tells psql to use it via the KRB5CCNAME envvar.
>> This prevents the global cache pollution. WDYT?

> Ah, yeah, that generally seems like a good idea.

Yeah, changing global state is just awful.  However, I don't
actually see any change here (RHEL8):

$ klist
klist: Credentials cache 'KCM:1001' not found
$ make check
...
Result: PASS
$ klist
klist: Credentials cache 'KCM:1001' not found

I suppose in an environment where someone was really using Kerberos,
the random kinit would be more of a problem.

Also, why are you only setting the ENV variable within narrow parts
of the test script?  I'd be inclined to enforce it throughout.

            regards, tom lane



Re: Fixing cache pollution in the Kerberos test suite

From: Jacob Champion

On Mon, 2021-01-25 at 13:49 -0500, Tom Lane wrote:
> Yeah, changing global state is just awful.  However, I don't
> actually see any change here (RHEL8):

Interesting. I'm running Ubuntu 20.04:

$ klist
klist: No credentials cache found (filename: /tmp/krb5cc_1000)

$ make check
...

$ klist
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: test1@EXAMPLE.COM

Valid starting       Expires              Service principal
... krbtgt/EXAMPLE.COM@EXAMPLE.COM
... postgres/auth-test-localhost.postgresql.example.com@
... postgres/auth-test-localhost.postgresql.example.com@EXAMPLE.COM

I wonder if your use of a KCM cache type rather than FILE makes the
difference?

> Also, why are you only setting the ENV variable within narrow parts
> of the test script?  I'd be inclined to enforce it throughout.

I considered it and decided I didn't want to pollute the server's
environment with it, since the server shouldn't need the client cache.
But I think it'd be fine (and match the current situation) if it were
set once for the whole script, if you prefer.

--Jacob

Re: Fixing cache pollution in the Kerberos test suite

From: Tom Lane

Jacob Champion <pchampion@vmware.com> writes:
> On Mon, 2021-01-25 at 13:49 -0500, Tom Lane wrote:
>> Yeah, changing global state is just awful.  However, I don't
>> actually see any change here (RHEL8):

> Interesting. I'm running Ubuntu 20.04:

Hmm.  I'll poke harder.

>> Also, why are you only setting the ENV variable within narrow parts
>> of the test script?  I'd be inclined to enforce it throughout.

> I considered it and decided I didn't want to pollute the server's
> environment with it, since the server shouldn't need the client cache.

True, but if it did try to access the cache, accessing the user's
normal cache would be strictly worse than accessing the test cache.
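
The trade-off under discussion can be illustrated in shell: an assignment prefixed to a single command is visible only to that command, while an exported variable is inherited by everything started afterwards. A minimal sketch, assuming a hypothetical cache path and made-up variable names:

```shell
#!/bin/sh
cache="${TMPDIR:-/tmp}/krb5cc_demo"   # hypothetical cache path
print_cache='printf "%s" "${KRB5CCNAME:-unset}"'

unset KRB5CCNAME

# Scoped: only this one child process sees the variable.
scoped=$(KRB5CCNAME="$cache" sh -c "$print_cache")

# A later child does not see it, so it would fall back to the default cache.
after_scoped=$(sh -c "$print_cache")

# Global: export once, and every later child inherits it.
KRB5CCNAME="$cache"
export KRB5CCNAME
global=$(sh -c "$print_cache")

echo "scoped=$scoped after_scoped=$after_scoped global=$global"
```

With the scoped form, any process that is not explicitly wrapped falls back to the user's default cache, which is the failure mode described above.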

            regards, tom lane



Re: Fixing cache pollution in the Kerberos test suite

From: Jacob Champion

On Mon, 2021-01-25 at 14:04 -0500, Tom Lane wrote:
> Jacob Champion <pchampion@vmware.com> writes:
> > On Mon, 2021-01-25 at 13:49 -0500, Tom Lane wrote:
> > > Also, why are you only setting the ENV variable within narrow parts
> > > of the test script?  I'd be inclined to enforce it throughout.
> > I considered it and decided I didn't want to pollute the server's
> > environment with it, since the server shouldn't need the client cache.
> 
> True, but if it did try to access the cache, accessing the user's
> normal cache would be strictly worse than accessing the test cache.

That's fair. Attached is a v2 that just sets KRB5CCNAME globally. Makes
for a much smaller patch :)

--Jacob

Attachment

Re: Fixing cache pollution in the Kerberos test suite

From: Tom Lane

I wrote:
> Jacob Champion <pchampion@vmware.com> writes:
>> Interesting. I'm running Ubuntu 20.04:

> Hmm.  I'll poke harder.

Ah ... on both RHEL8 and Fedora 33, I find this:

--- snip ---
$ cat /etc/krb5.conf.d/kcm_default_ccache
# This file should normally be installed by your distribution into a
# directory that is included from the Kerberos configuration file (/etc/krb5.conf)
# On Fedora/RHEL/CentOS, this is /etc/krb5.conf.d/
#
# To enable the KCM credential cache enable the KCM socket and the service:
#   systemctl enable sssd-secrets.socket sssd-kcm.socket
#   systemctl start sssd-kcm.socket
#
# To disable the KCM credential cache, comment out the following lines.

[libdefaults]
    default_ccache_name = KCM:
--- snip ---

Even more interesting, that service seems to be enabled by default
(I'm pretty darn sure I didn't ask for it...)

However, this doesn't seem to explain why the test script isn't
causing a global state change.  Whether the state is held in a
file or the sssd daemon shouldn't matter, it seems like.

Also, it looks like the test causes /tmp/krb5cc_<uid> to get
created or updated despite this setting.  If I force klist
to look at that:

$ KRB5CCNAME=/tmp/krb5cc_1001 klist
Ticket cache: FILE:/tmp/krb5cc_1001
Default principal: test1@EXAMPLE.COM

Valid starting     Expires            Service principal
01/25/21 14:31:57  01/26/21 14:31:57  krbtgt/EXAMPLE.COM@EXAMPLE.COM
01/25/21 14:31:57  01/26/21 14:31:57  postgres/auth-test-localhost.postgresql.example.com@
        Ticket server: postgres/auth-test-localhost.postgresql.example.com@EXAMPLE.COM

where the time corresponds to my having just run the test again.

So I'm still mightily confused, but it is clear that the test's
kinit is touching a file it shouldn't.

            regards, tom lane



Re: Fixing cache pollution in the Kerberos test suite

From: Jacob Champion

On Mon, 2021-01-25 at 14:36 -0500, Tom Lane wrote:
> However, this doesn't seem to explain why the test script isn't
> causing a global state change.  Whether the state is held in a
> file or the sssd daemon shouldn't matter, it seems like.
> 
> Also, it looks like the test causes /tmp/krb5cc_<uid> to get
> created or updated despite this setting.

Huh. I wonder, if you run `klist -A` after running the tests, do you
get anything more interesting? I am seeing a few bugs on Red Hat's
Bugzilla that center around strange KCM behavior [1]. But we're now
well outside my area of competence.

--Jacob

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1712875

Re: Fixing cache pollution in the Kerberos test suite

From: Tom Lane

Jacob Champion <pchampion@vmware.com> writes:
> On Mon, 2021-01-25 at 14:04 -0500, Tom Lane wrote:
>> True, but if it did try to access the cache, accessing the user's
>> normal cache would be strictly worse than accessing the test cache.

> That's fair. Attached is a v2 that just sets KRB5CCNAME globally. Makes
> for a much smaller patch :)

I tweaked this to make it look a bit more like the rest of the script,
and pushed it.  Thanks!

            regards, tom lane



Re: Fixing cache pollution in the Kerberos test suite

From: Tom Lane

Jacob Champion <pchampion@vmware.com> writes:
> On Mon, 2021-01-25 at 14:36 -0500, Tom Lane wrote:
>> Also, it looks like the test causes /tmp/krb5cc_<uid> to get
>> created or updated despite this setting.

> Huh. I wonder, if you run `klist -A` after running the tests, do you
> get anything more interesting?

"klist -A" prints nothing.

> I am seeing a few bugs on Red Hat's
> Bugzilla that center around strange KCM behavior [1]. But we're now
> well outside my area of competence.

Mine too.  But I verified that the /tmp file is no longer modified
with the adjusted script, so one way or the other this is better.
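
One way such a check could be scripted is sketched below. It is not necessarily how the verification above was done; the helper name `modified_by` is hypothetical, and it relies on `mktemp` and `find -newer` being available.

```shell
#!/bin/sh
# modified_by FILE COMMAND [ARGS...]
# Runs COMMAND and succeeds if FILE was created or modified while it ran.
modified_by() {
    target=$1
    shift
    marker=$(mktemp) || return 2      # timestamp reference taken before the run
    "$@"
    changed=$(find "$target" -newer "$marker" 2>/dev/null)
    rm -f "$marker"
    [ -n "$changed" ]
}

# Demonstration on a scratch file with a command that touches nothing.
demo=$(mktemp)
if modified_by "$demo" true; then
    echo "modified"
else
    echo "untouched"
fi

# Possible use against the cache file discussed in this thread:
#   modified_by "/tmp/krb5cc_$(id -u)" make check
```

The comparison is only as fine as the filesystem's timestamp resolution, so a very fast write immediately after the marker is created could in principle be missed.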

            regards, tom lane