Thread: Re: pgsql: Add WL_EXIT_ON_PM_DEATH pseudo-event.
Re: Thomas Munro 2018-11-23 <E1gQ6IU-0002sR-Fm@gemulon.postgresql.org>
> Add WL_EXIT_ON_PM_DEATH pseudo-event.

I think this broke something:

TRAP: FailedAssertion("!(!IsUnderPostmaster || (wakeEvents & (1 << 5)) || (wakeEvents & (1 << 4)))", File: "/build/postgresql-12-JElZNq/postgresql-12-12~~devel~20181124.1158/build/../src/backend/storage/ipc/latch.c", Line: 389)
2018-11-24 15:20:43.193 CET [17834] LOG: server process (PID 18425) was terminated by signal 6: Aborted

I can trigger it just by opening an SSL connection; non-SSL TCP
connections are fine.

Debian unstable/amd64.

Christoph
On Sun, Nov 25, 2018 at 3:38 AM Christoph Berg <myon@debian.org> wrote:
> Re: Thomas Munro 2018-11-23 <E1gQ6IU-0002sR-Fm@gemulon.postgresql.org>
> > Add WL_EXIT_ON_PM_DEATH pseudo-event.
>
> I think this broke something:
>
> TRAP: FailedAssertion("!(!IsUnderPostmaster || (wakeEvents & (1 << 5)) || (wakeEvents & (1 << 4)))", File: "/build/postgresql-12-JElZNq/postgresql-12-12~~devel~20181124.1158/build/../src/backend/storage/ipc/latch.c", Line: 389)
> 2018-11-24 15:20:43.193 CET [17834] LOG: server process (PID 18425) was terminated by signal 6: Aborted
>
> I can trigger it just by opening an ssl connection, non-ssl tcp
> connections are fine.

Thanks. I was initially surprised that this didn't come up in
check-world, but I see now that I need to go and add
PG_TEST_EXTRA="ssl ldap" to my testing routine (and cfbot's).

Reproduced here, and it's a case where we were not handling postmaster
death, which is exactly what this assertion was designed to find. The
following is one way to fix the assertion failure, though I'm not sure
if it would be better to request WL_POSTMASTER_DEATH and generate a
FATAL error like secure_read() does:

--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -406,9 +406,9 @@ aloop:
                  * StartupPacketTimeoutHandler() which directly exits.
                  */
                 if (err == SSL_ERROR_WANT_READ)
-                    waitfor = WL_SOCKET_READABLE;
+                    waitfor = WL_SOCKET_READABLE | WL_EXIT_ON_PM_DEATH;
                 else
-                    waitfor = WL_SOCKET_WRITEABLE;
+                    waitfor = WL_SOCKET_WRITEABLE | WL_EXIT_ON_PM_DEATH;

--
Thomas Munro
http://www.enterprisedb.com
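The condition that fired can be sketched in standalone C for readers who want to see why the one-line change in the diff satisfies it. This is a minimal sketch, not the PostgreSQL source: the bit values for WL_POSTMASTER_DEATH and WL_EXIT_ON_PM_DEATH are taken from the expanded macros in the assertion message, the pairing of names to bits and the socket-flag values are assumptions, and both helper functions are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Event flags. (1 << 4) and (1 << 5) appear in the assertion message;
 * which name maps to which bit, and the socket-flag values, are
 * assumptions made for this sketch.
 */
#define WL_SOCKET_READABLE   (1 << 1)
#define WL_SOCKET_WRITEABLE  (1 << 2)
#define WL_POSTMASTER_DEATH  (1 << 4)
#define WL_EXIT_ON_PM_DEATH  (1 << 5)

/*
 * Hypothetical helper: returns true when an event mask would pass the
 * asserted condition, i.e. a process running under the postmaster must
 * ask for one of the two postmaster-death events.
 */
bool
wait_events_handle_pm_death(bool is_under_postmaster, int wake_events)
{
    return !is_under_postmaster ||
        (wake_events & WL_EXIT_ON_PM_DEATH) != 0 ||
        (wake_events & WL_POSTMASTER_DEATH) != 0;
}

/*
 * Hypothetical helper modelled on the patched branch: pick the socket
 * event and always add WL_EXIT_ON_PM_DEATH.
 */
int
ssl_handshake_wait_events(bool want_read)
{
    if (want_read)
        return WL_SOCKET_READABLE | WL_EXIT_ON_PM_DEATH;
    return WL_SOCKET_WRITEABLE | WL_EXIT_ON_PM_DEATH;
}

/* Compare the pre-patch masks with the patched ones. */
void
self_check(void)
{
    /* Pre-patch: a bare socket event under the postmaster fails the check. */
    assert(!wait_events_handle_pm_death(true, WL_SOCKET_READABLE));
    assert(!wait_events_handle_pm_death(true, WL_SOCKET_WRITEABLE));

    /* Patched: both branches now pass. */
    assert(wait_events_handle_pm_death(true, ssl_handshake_wait_events(true)));
    assert(wait_events_handle_pm_death(true, ssl_handshake_wait_events(false)));
}
```

Calling self_check() from a small main() exercises both cases. The alternative raised in the message, requesting WL_POSTMASTER_DEATH and raising a FATAL error, would pass the same check through the other flag.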
On Sun, Nov 25, 2018 at 12:59 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
> On Sun, Nov 25, 2018 at 3:38 AM Christoph Berg <myon@debian.org> wrote:
> > TRAP: FailedAssertion("!(!IsUnderPostmaster || (wakeEvents & (1 << 5)) || (wakeEvents & (1 << 4)))", File: "/build/postgresql-12-JElZNq/postgresql-12-12~~devel~20181124.1158/build/../src/backend/storage/ipc/latch.c", Line: 389)
> > 2018-11-24 15:20:43.193 CET [17834] LOG: server process (PID 18425) was terminated by signal 6: Aborted

Fix pushed.

By way of penance, I have now configured PG_TEST_EXTRA="ssl ldap
kerberos" for my build farm animals elver and eelpout. elver should
pass at the next build, as I just tested it with --nosend, but eelpout
is so slow I'll just take my chances and see if that works. I'll also
review the firewall config on those VMs, since apparently everyone is
too chicken to run those tests, perhaps for those sorts of reasons.

I've also set those tests up for cfbot, which would have caught this
when draft patches were posted, and also enabled -Werror on cfbot,
which would have caught a GCC warning I missed because I usually
develop/test with clang.

--
Thomas Munro
http://www.enterprisedb.com
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> Fix pushed.
> By way of penance, I have now configured PG_TEST_EXTRA="ssl ldap
> kerberos" for my build farm animals elver and eelpout. elver should
> pass at the next build, as I just tested it with --nosend, but eelpout
> is so slow I'll just take my chances and see if that works.

Nope :-(. Looks like something about key length ... probably just
misconfiguration?

> I'll also
> review the firewall config on those VMs, since apparently everyone is
> too chicken to run those tests, perhaps for those sorts of reasons.

I think in many cases the answer is just "it's not in the default
buildfarm configuration". I couldn't think of a strong reason not
to run the ssl check on longfin, so I've just updated that to do so.

regards, tom lane
On Mon, Nov 26, 2018 at 6:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
> > Fix pushed.
> > By way of penance, I have now configured PG_TEST_EXTRA="ssl ldap
> > kerberos" for my build farm animals elver and eelpout. elver should
> > pass at the next build, as I just tested it with --nosend, but eelpout
> > is so slow I'll just take my chances and see if that works.
>
> Nope :-(. Looks like something about key length ... probably just
> misconfiguration?

It seems that we have keys in our tree that are unacceptable to
OpenSSL 1.1.1 as shipped in Debian buster:

2018-11-25 20:32:22.519 UTC [26882] FATAL: could not load server certificate file "server-cn-only.crt": ee key too small

That's what you get if you use the libssl-dev package (1.1.1a-1), but
you can still install libssl1.0-dev (which uninstalls 1.1's dev
package). I've done that and the ssl test passes on that machine, so
fingers crossed for the next build farm run.

I see now that Michael already wrote about this recently[1], but that
thread hasn't yet reached a conclusion.

[1] https://www.postgresql.org/message-id/flat/20180917131340.GE31460%40paquier.xyz

--
Thomas Munro
http://www.enterprisedb.com
On Mon, Nov 26, 2018 at 09:53:19AM +1300, Thomas Munro wrote:
> I see now that Michael already wrote about this recently[1], but that
> thread hasn't yet reached a conclusion.
>
> [1] https://www.postgresql.org/message-id/flat/20180917131340.GE31460%40paquier.xyz

Yes, I heard nothing but crickets on this one. So what I have been
doing is just to update my SSL configuration when running the tests.
That's annoying... Still not impossible to solve. If there are extra
opinions to move on with a key replacement, I could always do so.

--
Michael