Thread: LDAP check flapping on crake due to race
Hi, There are one or two failures per month on crake. It looks like when authentication is rejected, as expected in the tests, the psql process is exiting, but there is a race where the Perl script still wants to write a dummy query to its stdin (?), so you get: psql: FATAL: LDAP authentication failed for user "test1" ack Broken pipe: write( 13, 'SELECT 1' ) at /usr/share/perl5/vendor_perl/IPC/Run/IO.pm line 549. Example: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=crake&dt=2019-11-10%2023%3A36%3A04 tmunro=> select animal, snapshot, branch from run where fail_stage = 'ldapCheck' order by snapshot desc; animal | snapshot | branch --------+---------------------+--------------- crake | 2020-08-02 02:32:30 | REL_13_STABLE crake | 2020-07-22 23:36:04 | REL_12_STABLE crake | 2020-07-14 00:52:04 | REL_13_STABLE crake | 2020-05-15 17:35:05 | REL_11_STABLE crake | 2020-04-07 20:51:03 | REL_12_STABLE mantid | 2020-03-04 18:17:58 | REL_12_STABLE mantid | 2020-03-04 17:59:58 | REL_11_STABLE crake | 2020-01-17 14:33:21 | REL_12_STABLE crake | 2019-11-10 23:36:04 | REL_11_STABLE crake | 2019-09-09 08:48:25 | HEAD crake | 2019-08-05 21:18:23 | REL_12_STABLE crake | 2019-07-19 01:33:31 | HEAD crake | 2019-07-16 01:06:02 | REL_11_STABLE (13 rows) (Ignore mantid, it had a temporary setup problem that was resolved.)
On Sun, Aug 02, 2020 at 05:29:57PM +1200, Thomas Munro wrote: > There are one or two failures per month on crake. It looks like when > authentication is rejected, as expected in the tests, the psql process > is exiting, but there is a race where the Perl script still wants to > write a dummy query to its stdin (?), so you get: > > psql: FATAL: LDAP authentication failed for user "test1" > ack Broken pipe: write( 13, 'SELECT 1' ) at > /usr/share/perl5/vendor_perl/IPC/Run/IO.pm line 549. Do you suppose a fix like e12a472 would cover this? ("psql <&-" fails with status 1 after successful authentication, and authentication failure gives status 2.)
Noah Misch <noah@leadboat.com> writes: > On Sun, Aug 02, 2020 at 05:29:57PM +1200, Thomas Munro wrote: >> There are one or two failures per month on crake. It looks like when >> authentication is rejected, as expected in the tests, the psql process >> is exiting, but there is a race where the Perl script still wants to >> write a dummy query to its stdin (?), so you get: >> psql: FATAL: LDAP authentication failed for user "test1" >> ack Broken pipe: write( 13, 'SELECT 1' ) at >> /usr/share/perl5/vendor_perl/IPC/Run/IO.pm line 549. > Do you suppose a fix like e12a472 would cover this? ("psql <&-" fails with > status 1 after successful authentication, and authentication failure gives > status 2.) AFAICT the failure is happening down inside PostgresNode::psql's call of IPC::Run::run, so we don't really have the option to adjust things in exactly that way. (And messing with sub psql for the benefit of this one caller seems pretty risky anyway.) I'm inclined to suggest that the LDAP test's test_access could use an empty stdin and pass "-c 'SELECT 1'" as a command line option instead. (Maybe that's exactly what you meant, but I'm not sure.) I've not been able to duplicate this locally, so I have no idea if that'd really fix it. regards, tom lane
On Mon, Aug 3, 2020 at 4:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm inclined to suggest that the LDAP test's test_access could use > an empty stdin and pass "-c 'SELECT 1'" as a command line option > instead. (Maybe that's exactly what you meant, but I'm not sure.) Good idea. Here's a patch like that. > I've not been able to duplicate this locally, so I have no idea if > that'd really fix it. Me neither -- I guess someone who enjoys perl could hack IPC::Run to take a short nap at the right moment.
Attachment
On Mon, Aug 03, 2020 at 12:12:57PM +1200, Thomas Munro wrote: > On Mon, Aug 3, 2020 at 4:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I'm inclined to suggest that the LDAP test's test_access could use > > an empty stdin and pass "-c 'SELECT 1'" as a command line option > > instead. (Maybe that's exactly what you meant, but I'm not sure.) > > Good idea. Here's a patch like that. While I had meant a different approach, this is superior. > > I've not been able to duplicate this locally, so I have no idea if > > that'd really fix it. > > Me neither -- I guess someone who enjoys perl could hack IPC::Run to > take a short nap at the right moment. Not essential to reproduce first, I think.
On Mon, Aug 3, 2020 at 12:29 PM Noah Misch <noah@leadboat.com> wrote: > On Mon, Aug 03, 2020 at 12:12:57PM +1200, Thomas Munro wrote: > > On Mon, Aug 3, 2020 at 4:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > I'm inclined to suggest that the LDAP test's test_access could use > > > an empty stdin and pass "-c 'SELECT 1'" as a command line option > > > instead. (Maybe that's exactly what you meant, but I'm not sure.) > > > > Good idea. Here's a patch like that. > > While I had meant a different approach, this is superior. Thanks. Pushed.