Regression tests versus the buildfarm environment - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Regression tests versus the buildfarm environment |
Date | |
Msg-id | 26400.1281501751@sss.pgh.pa.us |
List | pgsql-hackers |
There's an interesting buildfarm failure here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=polecat&dt=2010-08-10%2023:46:10

It appears to me that this was caused by the concurrent run of another
buildfarm animal on the same physical machine, namely:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=colugos&dt=2010-08-11%2000:02:58

Both animals are trying to test HEAD, which means that pg_regress
defaults to the same postmaster port number in both builds:

    if (temp_install && !port_specified_by_user)

        /*
         * To reduce chances of interference with parallel installations, use
         * a port number starting in the private range (49152-65535)
         * calculated from the version number.
         */
        port = 0xC000 | (PG_VERSION_NUM & 0x3FFF);

We observe colugos successfully starting on that port:

============== starting postmaster ==============
running on port 57332 with pid 47019
============== creating database "regression" ==============
CREATE DATABASE
ALTER DATABASE
... etc etc ...

polecat comes along what must be only moments later, and tries to use
the same port for its temp install:

============== starting postmaster ==============
running on port 57332 with pid 47022
============== creating database "regression" ==============
ERROR: duplicate key value violates unique constraint "pg_database_datname_index"
DETAIL: Key (datname)=(regression) already exists.
command failed: "/usr/local/src/build-farm-3.2/builds/HEAD/pgsql.15278/src/test/regress/./tmp_check/install//usr/local/src/build-farm-3.2/builds/HEAD/inst/bin/psql" -X -c "CREATE DATABASE \"regression\" TEMPLATE=template0 ENCODING='SQL_ASCII' LC_COLLATE='C' LC_CTYPE='C'" "postgres"
pg_ctl: PID file "/usr/local/src/build-farm-3.2/builds/HEAD/pgsql.15278/src/test/regress/./tmp_check/data/postmaster.pid" does not exist
Is server running?

pg_regress: could not stop postmaster: exit code was 256

Now the postmaster log shows that the second postmaster correctly
recognized that the port number was already in use, so it bailed out:

================== pgsql.15278/src/test/regress/log/postmaster.log ===================
[4c61f2d2.b7ae:1] FATAL: lock file "/tmp/.s.PGSQL.57332.lock" already exists
[4c61f2d2.b7ae:2] HINT: Is another postmaster (PID 47019) using socket file "/tmp/.s.PGSQL.57332"?

However, pg_regress failed to have a clue about what had happened, and
bulled ahead trying to run the regression tests (against the postmaster
started by the other pg_regress instance). A look at the code shows
that it is merely trying to run psql, and if psql reports that it can
connect to the specified port, then pg_regress thinks the postmaster
started OK. Of course, psql was really reporting that it could connect
to the other instance's postmaster.

I've seen similar multiple-postmaster-interference symptoms before in
the buildfarm, but never really understood the cause.

I am not sure if there's anything very good we can do about the problem
of pg_regress misidentifying the postmaster it's managed to connect to.
A real solution would probably be much more trouble than it's worth,
anyway. However, it does seem like we ought to be able to do something
about two buildfarm critters defaulting to the same choice of port
number. The buildfarm infrastructure goes to great lengths to pick
nonconflicting port numbers for the "installed" postmasters it runs;
but we're ignoring all that effort and just using a hardwired port
number for "make check". This is dumb.

pg_regress does have a --port argument that can be used to override
that default. I don't know whether the buildfarm script calls
pg_regress directly or does "make check". If the latter, we'd need to
twiddle the Makefiles to allow a port number to get passed in. But
this seems well worthwhile to me.

Comments?

			regards, tom lane
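
As a rough illustration of why both animals landed on port 57332, the
default-port expression quoted above can be worked through in isolation.
This is only a sketch: the helper names default_temp_port and choose_port
are hypothetical (they are not pg_regress functions), and the value 90100
for PG_VERSION_NUM is an assumption about what HEAD reported at the time,
chosen because it reproduces the port seen in the logs.

```c
#include <stdbool.h>

/*
 * Hypothetical helper mirroring the expression quoted from pg_regress:
 * force the two high bits (0xC000 = 49152) so the result falls in the
 * private range 49152-65535, and fill the low 14 bits from the version
 * number.
 */
int
default_temp_port(int pg_version_num)
{
    return 0xC000 | (pg_version_num & 0x3FFF);
}

/*
 * Hypothetical wrapper: an explicitly supplied port (as with --port)
 * wins; otherwise fall back to the version-derived default.
 */
int
choose_port(bool port_specified_by_user, int user_port, int pg_version_num)
{
    if (!port_specified_by_user)
        return default_temp_port(pg_version_num);
    return user_port;
}

/*
 * Assumed PG_VERSION_NUM of 90100 (an assumption, see above):
 *   90100 & 0x3FFF = 8180, and 0xC000 | 8180 = 57332.
 * Every build of the same version computes the same value, so two
 * temp installs on one machine collide unless a port is passed in.
 */
```

Since the default depends only on the version number, nothing about the
build directory or the animal name enters into it; passing distinct
explicit ports to each animal is what would separate them.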