Thread: Windows CFBot is broken because ecpg dec_test.c error

Windows CFBot is broken because ecpg dec_test.c error

From
Jelte Fennema-Nio
Date:
Since about ~11 hours ago the ecpg test is consistently failing on
Window with this error[1]:

> Could not open file C:/cirrus/build/src/interfaces/ecpg/test/compat_informix/dec_test.c for reading

I took a quick look at possible causes but couldn't find a clear
winner. My current guess is that there's some dependency rule missing
in the meson file and due to some infra changes files now get compiled
in the wrong order.

One recent suspicious commit seems to be:
7819a25cd101b574f5422edb00fe3376fbb646da
But there are a bunch of successful changes that include that commit,
so it seems to be a red herring. (CC-ed Noah anyway)

[1]: https://cirrus-ci.com/task/6305422665056256?logs=check_world#L143



Re: Windows CFBot is broken because ecpg dec_test.c error

From
Andres Freund
Date:
Hi,

On January 28, 2025 7:13:16 AM EST, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
>Since about ~11 hours ago the ecpg test is consistently failing on
>Window with this error[1]:
>
>> Could not open file C:/cirrus/build/src/interfaces/ecpg/test/compat_informix/dec_test.c for reading
>
>I took a quick look at possible causes but couldn't find a clear
>winner. My current guess is that there's some dependency rule missing
>in the meson file and due to some infra changes files now get compiled
>in the wrong order.
>
>One recent suspicious commit seems to be:
>7819a25cd101b574f5422edb00fe3376fbb646da
>But there are a bunch of successful changes that include that commit,
>so it seems to be a red herring. (CC-ed Noah anyway)

I think it's due to a new version of meson. Seems we under specified test dependencies. I'll write up a patch.

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: Windows CFBot is broken because ecpg dec_test.c error

From
Nazir Bilal Yavuz
Date:
Hi,

On Tue, 28 Jan 2025 at 17:02, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On January 28, 2025 7:13:16 AM EST, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> >Since about ~11 hours ago the ecpg test is consistently failing on
> >Window with this error[1]:
> >
> >> Could not open file C:/cirrus/build/src/interfaces/ecpg/test/compat_informix/dec_test.c for reading
> >
> >I took a quick look at possible causes but couldn't find a clear
> >winner. My current guess is that there's some dependency rule missing
> >in the meson file and due to some infra changes files now get compiled
> >in the wrong order.
> >
> >One recent suspicious commit seems to be:
> >7819a25cd101b574f5422edb00fe3376fbb646da
> >But there are a bunch of successful changes that include that commit,
> >so it seems to be a red herring. (CC-ed Noah anyway)
>
> I think it's due to a new version of meson. Seems we under specified test dependencies. I'll write up a patch.

The cause is that meson fixed a bug [1] in v.1.7.0. Before meson
v1.7.0; although --no-rebuild is used while running tests, meson was
building all targets. This is fixed with v.1.7.0.

The change below fixes the problem:

diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
index 18e944ca89d..c7a94ff6471 100644
--- a/.cirrus.tasks.yml
+++ b/.cirrus.tasks.yml
@@ -17,7 +17,7 @@ env:
   CHECK: check-world PROVE_FLAGS=$PROVE_FLAGS
   CHECKFLAGS: -Otarget
   PROVE_FLAGS: --timer
-  MTEST_ARGS: --print-errorlogs --no-rebuild -C build
+  MTEST_ARGS: --print-errorlogs -C build
   PGCTLTIMEOUT: 120 # avoids spurious failures during parallel tests
   TEMP_CONFIG: ${CIRRUS_WORKING_DIR}/src/tools/ci/pg_ci_base.conf
   PG_TEST_EXTRA: kerberos ldap ssl libpq_encryption load_balance

And I think this is the correct approach. It builds all of the
not-yet-built targets before running the tests. Another solution might
be manually building ecpg target before running tests but I think the
former approach is more suitable for the CI.

CI run after this change applied: https://cirrus-ci.com/build/6264369203380224

[1] https://mesonbuild.com/Release-notes-for-1-7-0.html#test-targets-no-longer-built-by-default

--
Regards,
Nazir Bilal Yavuz
Microsoft



Re: Windows CFBot is broken because ecpg dec_test.c error

From
Andres Freund
Date:
Hi,

On 2025-01-29 18:24:45 +0300, Nazir Bilal Yavuz wrote:
> On Tue, 28 Jan 2025 at 17:02, Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On January 28, 2025 7:13:16 AM EST, Jelte Fennema-Nio <postgres@jeltef.nl> wrote:
> > >Since about ~11 hours ago the ecpg test is consistently failing on
> > >Window with this error[1]:
> > >
> > >> Could not open file C:/cirrus/build/src/interfaces/ecpg/test/compat_informix/dec_test.c for reading
> > >
> > >I took a quick look at possible causes but couldn't find a clear
> > >winner. My current guess is that there's some dependency rule missing
> > >in the meson file and due to some infra changes files now get compiled
> > >in the wrong order.
> > >
> > >One recent suspicious commit seems to be:
> > >7819a25cd101b574f5422edb00fe3376fbb646da
> > >But there are a bunch of successful changes that include that commit,
> > >so it seems to be a red herring. (CC-ed Noah anyway)
> >
> > I think it's due to a new version of meson. Seems we under specified test dependencies. I'll write up a patch.

Sorry, got distracted with somewhat pressing matters.


> The cause is that meson fixed a bug [1] in v.1.7.0. Before meson
> v1.7.0; although --no-rebuild is used while running tests, meson was
> building all targets. This is fixed with v.1.7.0.
>
> [1] https://mesonbuild.com/Release-notes-for-1-7-0.html#test-targets-no-longer-built-by-default

That's not *quite* right - it wasn't that the targets were built when
--no-rebuild was specified, it's that the default build target (for just
'ninja'), built all test dependencies.


> The change below fixes the problem:
>
> diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
> index 18e944ca89d..c7a94ff6471 100644
> --- a/.cirrus.tasks.yml
> +++ b/.cirrus.tasks.yml
> @@ -17,7 +17,7 @@ env:
>    CHECK: check-world PROVE_FLAGS=$PROVE_FLAGS
>    CHECKFLAGS: -Otarget
>    PROVE_FLAGS: --timer
> -  MTEST_ARGS: --print-errorlogs --no-rebuild -C build
> +  MTEST_ARGS: --print-errorlogs -C build
>    PGCTLTIMEOUT: 120 # avoids spurious failures during parallel tests
>    TEMP_CONFIG: ${CIRRUS_WORKING_DIR}/src/tools/ci/pg_ci_base.conf
>    PG_TEST_EXTRA: kerberos ldap ssl libpq_encryption load_balance
>
> And I think this is the correct approach. It builds all of the
> not-yet-built targets before running the tests. Another solution might
> be manually building ecpg target before running tests but I think the
> former approach is more suitable for the CI.
>
> CI run after this change applied: https://cirrus-ci.com/build/6264369203380224

I don't think that's the entirety of the issue.

Our dependencies aren't quite airtight enough. With a sufficiently modern
meson, try doing e.g.

  rm -rf tmp_install/ && ninja clean && meson test --suite setup --suite ecpg

It'll fail, because the dependencies of the tests are insufficient.

See the set of patches at
https://www.postgresql.org/message-id/qh4c5tvkgjef7jikjig56rclbcdrrotngnwpycukd2n3k25zi2%4044hxxvtwmgum


I think the only reason your patch on its own suffices, is that the "all"
target, that we ran separately beforehand, actually has sufficient
dependencies to make things work.



The nice thing is that with this meson improvement we should be able to get
rid of the "setup" test suite and instead generate the test install via
dependencies.  Obviously we either have to wait a fair bit or do it depending
on the meson version...


Greetings,

Andres Freund



Re: Windows CFBot is broken because ecpg dec_test.c error

From
Nazir Bilal Yavuz
Date:
Hi,

On Wed, 29 Jan 2025 at 19:50, Andres Freund <andres@anarazel.de> wrote:
>
> On 2025-01-29 18:24:45 +0300, Nazir Bilal Yavuz wrote:
>
> > The cause is that meson fixed a bug [1] in v.1.7.0. Before meson
> > v1.7.0; although --no-rebuild is used while running tests, meson was
> > building all targets. This is fixed with v.1.7.0.
> >
> > [1] https://mesonbuild.com/Release-notes-for-1-7-0.html#test-targets-no-longer-built-by-default
>
> That's not *quite* right - it wasn't that the targets were built when
> --no-rebuild was specified, it's that the default build target (for just
> 'ninja'), built all test dependencies.
>
>
> > The change below fixes the problem:
> >
> > diff --git a/.cirrus.tasks.yml b/.cirrus.tasks.yml
> > index 18e944ca89d..c7a94ff6471 100644
> > --- a/.cirrus.tasks.yml
> > +++ b/.cirrus.tasks.yml
> > @@ -17,7 +17,7 @@ env:
> >    CHECK: check-world PROVE_FLAGS=$PROVE_FLAGS
> >    CHECKFLAGS: -Otarget
> >    PROVE_FLAGS: --timer
> > -  MTEST_ARGS: --print-errorlogs --no-rebuild -C build
> > +  MTEST_ARGS: --print-errorlogs -C build
> >    PGCTLTIMEOUT: 120 # avoids spurious failures during parallel tests
> >    TEMP_CONFIG: ${CIRRUS_WORKING_DIR}/src/tools/ci/pg_ci_base.conf
> >    PG_TEST_EXTRA: kerberos ldap ssl libpq_encryption load_balance
> >
> > And I think this is the correct approach. It builds all of the
> > not-yet-built targets before running the tests. Another solution might
> > be manually building ecpg target before running tests but I think the
> > former approach is more suitable for the CI.
> >
> > CI run after this change applied: https://cirrus-ci.com/build/6264369203380224
>
> I don't think that's the entirety of the issue.
>
> Our dependencies aren't quite airtight enough. With a sufficiently modern
> meson, try doing e.g.
>
>   rm -rf tmp_install/ && ninja clean && meson test --suite setup --suite ecpg
>
> It'll fail, because the dependencies of the tests are insufficient.
>
> See the set of patches at
> https://www.postgresql.org/message-id/qh4c5tvkgjef7jikjig56rclbcdrrotngnwpycukd2n3k25zi2%4044hxxvtwmgum
>
>
> I think the only reason your patch on its own suffices, is that the "all"
> target, that we ran separately beforehand, actually has sufficient
> dependencies to make things work.

Yes, you are right. I agree that what you said is the correct solution
and that should be the ultimate goal. What I shared could be a
band-aid fix to make the Windows CI task happy until the patches you
shared get committed. Another solution might be to downgrade the meson
version in the Windows images at the CI repository [1], that would be
better for the commit history.

[1] https://github.com/anarazel/pg-vm-images

-- 
Regards,
Nazir Bilal Yavuz
Microsoft