Thread: Fixing cache pollution in the Kerberos test suite
Hi all,

I was running tests with a GSS-enabled stack, and ran into some very
long psql timeouts after running the Kerberos test suite. It turns out
the suite pushes test credentials into the user's global cache, and
these no-longer-useful credentials persist after the suite has
finished. (You can see this in action by running the test/kerberos
suite and then running `klist`.) This leads to long hangs, I assume
while the GSS implementation tries to contact a KDC that no longer
exists.

Attached is a patch that initializes a local credentials cache inside
tmp_check/krb5cc, and tells psql to use it via the KRB5CCNAME envvar.
This prevents the global cache pollution. WDYT?

--Jacob
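For readers unfamiliar with the mechanism, here is a minimal shell sketch of the idea. It is not the attached patch: the function name `use_private_ccache` and the scratch directory are made up for illustration. The only real interface it relies on is the KRB5CCNAME environment variable, which Kerberos client tools consult to locate the credentials cache.

```shell
#!/bin/sh
# Sketch only: function and directory names are hypothetical, not from the patch.

# Point Kerberos clients at a cache file inside a private directory,
# so that tickets never land in the user's default cache.
use_private_ccache() {
    ccache_dir=$1
    mkdir -p "$ccache_dir" || return 1
    KRB5CCNAME="$ccache_dir/krb5cc"
    export KRB5CCNAME
}

# Scratch directory standing in for the suite's tmp_check directory.
demo_dir=$(mktemp -d) || exit 1
trap 'rm -rf "$demo_dir"' EXIT

use_private_ccache "$demo_dir"

# Any child process (kinit, psql, klist, ...) now inherits the setting.
sh -c 'echo "child sees KRB5CCNAME=$KRB5CCNAME"'
```

Because the variable is exported, a later `kinit` started from the same script would write its tickets to the private file rather than to the default cache.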
Greetings,

* Jacob Champion (pchampion@vmware.com) wrote:
> I was running tests with a GSS-enabled stack, and ran into some very
> long psql timeouts after running the Kerberos test suite. It turns out
> the suite pushes test credentials into the user's global cache, and
> these no-longer-useful credentials persist after the suite has
> finished. (You can see this in action by running the test/kerberos
> suite and then running `klist`.) This leads to long hangs, I assume
> while the GSS implementation tries to contact a KDC that no longer
> exists.
> Attached is a patch that initializes a local credentials cache inside
> tmp_check/krb5cc, and tells psql to use it via the KRB5CCNAME envvar.
> This prevents the global cache pollution. WDYT?

Ah, yeah, that generally seems like a good idea.

Thanks,

Stephen
Stephen Frost <sfrost@snowman.net> writes:
> * Jacob Champion (pchampion@vmware.com) wrote:
>> I was running tests with a GSS-enabled stack, and ran into some very
>> long psql timeouts after running the Kerberos test suite. It turns out
>> the suite pushes test credentials into the user's global cache, and
>> these no-longer-useful credentials persist after the suite has
>> finished. (You can see this in action by running the test/kerberos
>> suite and then running `klist`.) This leads to long hangs, I assume
>> while the GSS implementation tries to contact a KDC that no longer
>> exists.
>> Attached is a patch that initializes a local credentials cache inside
>> tmp_check/krb5cc, and tells psql to use it via the KRB5CCNAME envvar.
>> This prevents the global cache pollution. WDYT?

> Ah, yeah, that generally seems like a good idea.

Yeah, changing global state is just awful. However, I don't
actually see any change here (RHEL8):

$ klist
klist: Credentials cache 'KCM:1001' not found

$ make check
...
Result: PASS

$ klist
klist: Credentials cache 'KCM:1001' not found

I suppose in an environment where someone was really using Kerberos,
the random kinit would be more of a problem.

Also, why are you only setting the ENV variable within narrow parts
of the test script? I'd be inclined to enforce it throughout.

regards, tom lane
On Mon, 2021-01-25 at 13:49 -0500, Tom Lane wrote:
> Yeah, changing global state is just awful. However, I don't
> actually see any change here (RHEL8):

Interesting. I'm running Ubuntu 20.04:

$ klist
klist: No credentials cache found (filename: /tmp/krb5cc_1000)

$ make check
...

$ klist
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: test1@EXAMPLE.COM

Valid starting  Expires  Service principal
...  krbtgt/EXAMPLE.COM@EXAMPLE.COM
...  postgres/auth-test-localhost.postgresql.example.com@
...  postgres/auth-test-localhost.postgresql.example.com@EXAMPLE.COM

I wonder if your use of a KCM cache type rather than FILE makes the
difference?

> Also, why are you only setting the ENV variable within narrow parts
> of the test script? I'd be inclined to enforce it throughout.

I considered it and decided I didn't want to pollute the server's
environment with it, since the server shouldn't need the client cache.
But I think it'd be fine (and match the current situation) if it were
set once for the whole script, if you prefer.

--Jacob
Jacob Champion <pchampion@vmware.com> writes:
> On Mon, 2021-01-25 at 13:49 -0500, Tom Lane wrote:
>> Yeah, changing global state is just awful. However, I don't
>> actually see any change here (RHEL8):

> Interesting. I'm running Ubuntu 20.04:

Hmm. I'll poke harder.

>> Also, why are you only setting the ENV variable within narrow parts
>> of the test script? I'd be inclined to enforce it throughout.

> I considered it and decided I didn't want to pollute the server's
> environment with it, since the server shouldn't need the client cache.

True, but if it did try to access the cache, accessing the user's
normal cache would be strictly worse than accessing the test cache.

regards, tom lane
On Mon, 2021-01-25 at 14:04 -0500, Tom Lane wrote:
> Jacob Champion <pchampion@vmware.com> writes:
> > On Mon, 2021-01-25 at 13:49 -0500, Tom Lane wrote:
> > > Also, why are you only setting the ENV variable within narrow parts
> > > of the test script? I'd be inclined to enforce it throughout.
> > I considered it and decided I didn't want to pollute the server's
> > environment with it, since the server shouldn't need the client cache.
>
> True, but if it did try to access the cache, accessing the user's
> normal cache would be strictly worse than accessing the test cache.

That's fair. Attached is a v2 that just sets KRB5CCNAME globally. Makes
for a much smaller patch :)

--Jacob
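The two scoping options discussed above (setting the variable only around individual client commands versus once for the whole script) can be illustrated with a small shell sketch. This is an analogy, not the test script itself, and the cache path `/tmp/example_test_cache` is a placeholder.

```shell
#!/bin/sh
# Sketch only: the path is a placeholder, not taken from the test suite.

unset KRB5CCNAME

# Narrow scope: the assignment applies to this one command only.
narrow=$(KRB5CCNAME=/tmp/example_test_cache sh -c 'echo "${KRB5CCNAME:-unset}"')
after_narrow=${KRB5CCNAME:-unset}

# Whole-script scope: export once, and every later child inherits it,
# including any server process started afterward.
KRB5CCNAME=/tmp/example_test_cache
export KRB5CCNAME
global=$(sh -c 'echo "${KRB5CCNAME:-unset}"')

echo "narrow child:     $narrow"
echo "parent afterward: $after_narrow"
echo "exported child:   $global"
```

In the narrow form the parent's environment is left untouched, which is what keeps the setting away from the server; in the exported form everything started later sees the test cache, which is the behavior v2 opts for.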
I wrote:
> Jacob Champion <pchampion@vmware.com> writes:
>> Interesting. I'm running Ubuntu 20.04:

> Hmm. I'll poke harder.

Ah ... on both RHEL8 and Fedora 33, I find this:

--- snip ---
$ cat /etc/krb5.conf.d/kcm_default_ccache
# This file should normally be installed by your distribution into a
# directory that is included from the Kerberos configuration file (/etc/krb5.conf)
# On Fedora/RHEL/CentOS, this is /etc/krb5.conf.d/
#
# To enable the KCM credential cache enable the KCM socket and the service:
#   systemctl enable sssd-secrets.socket sssd-kcm.socket
#   systemctl start sssd-kcm.socket
#
# To disable the KCM credential cache, comment out the following lines.

[libdefaults]
    default_ccache_name = KCM:
--- snip ---

Even more interesting, that service seems to be enabled by default
(I'm pretty darn sure I didn't ask for it...)

However, this doesn't seem to explain why the test script isn't
causing a global state change. Whether the state is held in a
file or the sssd daemon shouldn't matter, it seems like.

Also, it looks like the test causes /tmp/krb5cc_<uid> to get
created or updated despite this setting. If I force klist
to look at that:

$ KRB5CCNAME=/tmp/krb5cc_1001 klist
Ticket cache: FILE:/tmp/krb5cc_1001
Default principal: test1@EXAMPLE.COM

Valid starting     Expires            Service principal
01/25/21 14:31:57  01/26/21 14:31:57  krbtgt/EXAMPLE.COM@EXAMPLE.COM
01/25/21 14:31:57  01/26/21 14:31:57  postgres/auth-test-localhost.postgresql.example.com@
        Ticket server: postgres/auth-test-localhost.postgresql.example.com@EXAMPLE.COM

where the time corresponds to my having just run the test again.

So I'm still mightily confused, but it is clear that the test's
kinit is touching a file it shouldn't.

regards, tom lane
On Mon, 2021-01-25 at 14:36 -0500, Tom Lane wrote:
> However, this doesn't seem to explain why the test script isn't
> causing a global state change. Whether the state is held in a
> file or the sssd daemon shouldn't matter, it seems like.
>
> Also, it looks like the test causes /tmp/krb5cc_<uid> to get
> created or updated despite this setting.

Huh. I wonder, if you run `klist -A` after running the tests, do you
get anything more interesting? I am seeing a few bugs on Red Hat's
Bugzilla that center around strange KCM behavior [1]. But we're now
well outside my area of competence.

--Jacob

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1712875
Jacob Champion <pchampion@vmware.com> writes:
> On Mon, 2021-01-25 at 14:04 -0500, Tom Lane wrote:
>> True, but if it did try to access the cache, accessing the user's
>> normal cache would be strictly worse than accessing the test cache.

> That's fair. Attached is a v2 that just sets KRB5CCNAME globally. Makes
> for a much smaller patch :)

I tweaked this to make it look a bit more like the rest of the script,
and pushed it. Thanks!

regards, tom lane
Jacob Champion <pchampion@vmware.com> writes:
> On Mon, 2021-01-25 at 14:36 -0500, Tom Lane wrote:
>> Also, it looks like the test causes /tmp/krb5cc_<uid> to get
>> created or updated despite this setting.

> Huh. I wonder, if you run `klist -A` after running the tests, do you
> get anything more interesting?

"klist -A" prints nothing.

> I am seeing a few bugs on Red Hat's
> Bugzilla that center around strange KCM behavior [1]. But we're now
> well outside my area of competence.

Mine too. But I verified that the /tmp file is no longer modified
with the adjusted script, so one way or the other this is better.

regards, tom lane
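The thread does not say how the "no longer modified" check was done. One possible way to automate it is sketched below: compare a content fingerprint of the file before and after running a command. The helper names `fingerprint` and `changed_by` are hypothetical, and the demo uses a scratch file standing in for /tmp/krb5cc_<uid>.

```shell
#!/bin/sh
# Sketch only: helper names and the demo file are hypothetical.

# Print a fingerprint of a file's contents, or "absent" if it does not exist.
fingerprint() {
    if [ -e "$1" ]; then cksum < "$1"; else echo absent; fi
}

# Run a command and succeed if the watched file changed across it.
# Usage: changed_by FILE COMMAND [ARGS...]
changed_by() {
    watched=$1; shift
    before=$(fingerprint "$watched")
    "$@" >/dev/null 2>&1
    after=$(fingerprint "$watched")
    [ "$before" != "$after" ]
}

# Demo with a scratch file in place of the real cache file.
scratch=$(mktemp) || exit 1
if changed_by "$scratch" sh -c 'echo ticket >> "$1"' sh "$scratch"; then
    echo "cache file was modified"
fi
if ! changed_by "$scratch" true; then
    echo "cache file untouched"
fi
rm -f "$scratch"
```

Against a real checkout one might run something like `changed_by "/tmp/krb5cc_$(id -u)" make check`, assuming that path is where the default FILE cache lives on the machine in question. Note that a content fingerprint will not notice a rewrite that leaves the bytes identical; comparing modification times would catch that case.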