Thread: installcheck-world concurrency issues

installcheck-world concurrency issues

From
Andres Freund
Date:
Hi,

while working on installcheck support with meson, that currently running
installcheck-world fails regularly with meson and occasionally with make.

A way to quite reliably reproduce this with make is

make -s -j48 -C contrib/ USE_MODULE_DB=1 installcheck-adminpack-recurse installcheck-passwordcheck-recurse

that will fail with diffs like:

diff -du10 /home/andres/src/postgresql/contrib/passwordcheck/expected/passwordcheck.out
/home/andres/build/postgres/dev-assert/vpath/contrib/passwordcheck/res>
--- /home/andres/src/postgresql/contrib/passwordcheck/expected/passwordcheck.out        2022-10-03 15:56:57.900326662
-0700
+++ /home/andres/build/postgres/dev-assert/vpath/contrib/passwordcheck/results/passwordcheck.out        2022-10-03
15:56:59.930329973-0700
 
@@ -1,19 +1,22 @@
 LOAD 'passwordcheck';
 CREATE USER regress_user1;
 -- ok
 ALTER USER regress_user1 PASSWORD 'a_nice_long_password';
+ERROR:  tuple concurrently deleted
 -- error: too short
 ALTER USER regress_user1 PASSWORD 'tooshrt';
-ERROR:  password is too short
+ERROR:  role "regress_user1" does not exist
 -- error: contains user name
 ALTER USER regress_user1 PASSWORD 'xyzregress_user1';
-ERROR:  password must not contain user name
+ERROR:  role "regress_user1" does not exist
 -- error: contains only letters

 LOAD 'passwordcheck';
 CREATE USER regress_user1;
 -- ok
 ALTER USER regress_user1 PASSWORD 'a_nice_long_password';
+ERROR:  tuple concurrently deleted
 -- error: too short
 ALTER USER regress_user1 PASSWORD 'tooshrt';
-ERROR:  password is too short
+ERROR:  role "regress_user1" does not exist
 -- error: contains user name


That's not surprising, given the common name of "regress_user1".

The attached patch fixes a number of instances of this issue. With it I got
through ~5 iterations of installcheck-world on ac, and >30 iterations with
meson.

There's a few further roles that seem to pose some danger goign forward:

./contrib/file_fdw/sql/file_fdw.sql:CREATE ROLE regress_no_priv_user LOGIN;                 -- has priv but no user
mapping
./contrib/postgres_fdw/sql/postgres_fdw.sql:CREATE ROLE regress_view_owner SUPERUSER;
./contrib/postgres_fdw/sql/postgres_fdw.sql:CREATE ROLE regress_nosuper NOSUPERUSER;
./contrib/passwordcheck/sql/passwordcheck.sql:CREATE USER regress_passwordcheck_user1;
./contrib/citext/sql/create_index_acl.sql:CREATE ROLE regress_minimal;
./src/test/modules/test_rls_hooks/sql/test_rls_hooks.sql:CREATE ROLE regress_r1;
./src/test/modules/test_rls_hooks/sql/test_rls_hooks.sql:CREATE ROLE regress_s1;
./src/test/modules/test_oat_hooks/sql/test_oat_hooks.sql:CREATE ROLE regress_role_joe;
./src/test/modules/test_oat_hooks/sql/test_oat_hooks.sql:CREATE USER regress_test_user;
./src/test/modules/unsafe_tests/sql/rolenames.sql:CREATE ROLE regress_testrol0 SUPERUSER LOGIN;
./src/test/modules/unsafe_tests/sql/rolenames.sql:CREATE ROLE regress_testrolx SUPERUSER LOGIN;
./src/test/modules/unsafe_tests/sql/rolenames.sql:CREATE ROLE regress_testrol2 SUPERUSER;
./src/test/modules/unsafe_tests/sql/rolenames.sql:CREATE ROLE regress_testrol1 SUPERUSER LOGIN IN ROLE
regress_testrol2;
./src/test/modules/unsafe_tests/sql/rolenames.sql:CREATE ROLE regress_role_haspriv;
./src/test/modules/unsafe_tests/sql/rolenames.sql:CREATE ROLE regress_role_nopriv;
./src/test/modules/unsafe_tests/sql/guc_privs.sql:CREATE ROLE regress_admin SUPERUSER;
./src/test/modules/test_ddl_deparse/sql/alter_function.sql:CREATE ROLE regress_alter_function_role;


BTW, shouldn't src/test/modules/unsafe_tests use the PG_TEST_EXTRA mechanism
somehow?  Seems not great to run it as part of installcheck-world, if we don't
want to run it as part of installcheck.


A second issue I noticed is that advisory_lock.sql often fails, because the
pg_locks queries don't restrict to the current database. Patch attached.

I haven't seen that with autoconf installcheck-world, presumably because of
this:

# There are too many interdependencies between the subdirectories, so
# don't attempt parallel make here.
.NOTPARALLEL:


With those two patches applied, I got through 10 iterations of running all
regress / isolation tests concurrently with meson without failures.

I attached the meson patch as well, but just because I used it to to get to
these patches.

Greetings,

Andres Freund

Attachment

Re: installcheck-world concurrency issues

From
Michael Paquier
Date:
On Mon, Oct 03, 2022 at 04:41:11PM -0700, Andres Freund wrote:
> There's a few further roles that seem to pose some danger goign forward:

I have never seen that myself, but 0001 is a nice cleanup.
generated.sql includes a user named "regress_user11".  Perhaps that's
worth renaming while on it?

> BTW, shouldn't src/test/modules/unsafe_tests use the PG_TEST_EXTRA mechanism
> somehow?  Seems not great to run it as part of installcheck-world, if we don't
> want to run it as part of installcheck.c

Indeed.

> A second issue I noticed is that advisory_lock.sql often fails, because the
> pg_locks queries don't restrict to the current database. Patch attached.

As in prepared_xacts.sql or just advisory locks taken in an installed
cluster?  Or both?

> I attached the meson patch as well, but just because I used it to to get to
> these patches.

I am still studying a lot of this area, but it seems like all the
spots requiring a custom configuration (aka NO_INSTALLCHECK) are
covered.  --setup running is working here with 0003.
--
Michael

Attachment

Re: installcheck-world concurrency issues

From
Andres Freund
Date:
Hi,

On 2022-10-04 17:05:40 +0900, Michael Paquier wrote:
> On Mon, Oct 03, 2022 at 04:41:11PM -0700, Andres Freund wrote:
> > There's a few further roles that seem to pose some danger goign forward:
> 
> I have never seen that myself, but 0001 is a nice cleanup.
> generated.sql includes a user named "regress_user11".  Perhaps that's
> worth renaming while on it?

I think regress_* without a "namespace" is what's src/test/regress uses, so
there's not really a need?


> > A second issue I noticed is that advisory_lock.sql often fails, because the
> > pg_locks queries don't restrict to the current database. Patch attached.
> 
> As in prepared_xacts.sql or just advisory locks taken in an installed
> cluster?  Or both?

There's various isolation tests, including several in src/test/isolation, that
use advisory locks.

prepared_xacts.sql shouldn't be an issue, because it's scheduled in a separate
group.


> > I attached the meson patch as well, but just because I used it to to get to
> > these patches.
> 
> I am still studying a lot of this area, but it seems like all the
> spots requiring a custom configuration (aka NO_INSTALLCHECK) are
> covered.  --setup running is working here with 0003.

Thanks for checking.

Greetings,

Andres Freund



Re: installcheck-world concurrency issues

From
Peter Eisentraut
Date:
On 04.10.22 01:41, Andres Freund wrote:
> BTW, shouldn't src/test/modules/unsafe_tests use the PG_TEST_EXTRA mechanism
> somehow?  Seems not great to run it as part of installcheck-world, if we don't
> want to run it as part of installcheck.

I think there are different levels and kinds of unsafeness.  The ssl and 
kerberos tests start open server processes on your machine.  The 
modules/unsafe_tests just make a mess of your postgres instance.  The 
latter isn't a problem when run against a temp instance.




Re: installcheck-world concurrency issues

From
Andres Freund
Date:
Hi,

On 2022-10-05 08:16:37 +0200, Peter Eisentraut wrote:
> On 04.10.22 01:41, Andres Freund wrote:
> > BTW, shouldn't src/test/modules/unsafe_tests use the PG_TEST_EXTRA mechanism
> > somehow?  Seems not great to run it as part of installcheck-world, if we don't
> > want to run it as part of installcheck.
> 
> I think there are different levels and kinds of unsafeness. The ssl and
> kerberos tests start open server processes on your machine.  The
> modules/unsafe_tests just make a mess of your postgres instance.  The latter
> isn't a problem when run against a temp instance.

I agree - but I suspect our definition of danger is reversed. For me breaking
an existing cluster is a lot more likely to incur "real world" danger than
starting a throway instance listening to tcp on localhost...

Greetings,

Andres Freund



Re: installcheck-world concurrency issues

From
Michael Paquier
Date:
On Tue, Oct 04, 2022 at 11:35:53AM -0700, Andres Freund wrote:
> On 2022-10-04 17:05:40 +0900, Michael Paquier wrote:
>> I am still studying a lot of this area, but it seems like all the
>> spots requiring a custom configuration (aka NO_INSTALLCHECK) are
>> covered.  --setup running is working here with 0003.
>
> Thanks for checking.

For the archives' sake: this has been applied as of 6a20b04 and
c3315a7.
--
Michael

Attachment