Thread: Use windows VMs instead of windows containers on the CI

Use windows VMs instead of windows containers on the CI

From
Nazir Bilal Yavuz
Date:

Hi,

I propose using Windows VMs instead of containers on the CI; the patch is attached. Currently, Windows containers are used on the CI, but the container images need to be pulled on every CI run, and they are also slow to run.

These VM images are created in the same way as the container images [1].

The comparison between VMs and containers (based on d952373a98, with the same number of CPUs and the same amount of memory) is:

Scheduling step:

              | VS 2019 | MinGW64
--------------+---------+---------
VM [2]        | 00:17m  | 00:16m
Container [3] | 03:51m  | 04:28m

Execution step:

              | VS 2019 | MinGW64
--------------+---------+---------
VM [2]        | 12:16m  | 07:55m
Container [3] | 26:02m  | 16:34m

There is more than 2x speed gain when VMs are used.
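
The ratios behind that claim can be recomputed from the numbers above. This is a minimal sketch that assumes the timings are in MM:SS form; the helper names `to_seconds` and `speedups` are made up for illustration.

```python
def to_seconds(mmss: str) -> int:
    """Convert an 'MM:SS' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# Timings copied from the comparison above, assumed to be MM:SS.
TIMINGS = {
    "scheduling": {
        "VS 2019": {"vm": "00:17", "container": "03:51"},
        "MinGW64": {"vm": "00:16", "container": "04:28"},
    },
    "execution": {
        "VS 2019": {"vm": "12:16", "container": "26:02"},
        "MinGW64": {"vm": "07:55", "container": "16:34"},
    },
}


def speedups(timings):
    """Return container_time / vm_time for every (step, task) pair."""
    result = {}
    for step, tasks in timings.items():
        for task, t in tasks.items():
            result[(step, task)] = to_seconds(t["container"]) / to_seconds(t["vm"])
    return result


if __name__ == "__main__":
    for (step, task), ratio in speedups(TIMINGS).items():
        print(f"{step:10s} {task:8s} {ratio:.2f}x")
```

Under that assumption the execution step comes out at roughly 2.1x for both tasks, and the scheduling step improves by a much larger factor.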

[1] https://github.com/anarazel/pg-vm-images/blob/main/packer/windows.pkr.hcl
[2] https://cirrus-ci.com/build/4720774045499392
[3] https://cirrus-ci.com/build/5468256027279360

Regards,
Nazir Bilal Yavuz
Microsoft

Attachment

Re: Use windows VMs instead of windows containers on the CI

From
Nazir Bilal Yavuz
Date:

Hi,

The tables didn't render well in the web interface. Re-sending with corrected formatting.

Scheduling step:

          | VS 2019 | MinGW64
----------+---------+---------
VM        | 00:17m  | 00:16m
Container | 03:51m  | 04:28m

Execution step:

          | VS 2019 | MinGW64
----------+---------+---------
VM        | 12:16m  | 07:55m
Container | 26:02m  | 16:34m


Regards,
Nazir Bilal Yavuz
Microsoft

Re: Use windows VMs instead of windows containers on the CI

From
Nazir Bilal Yavuz
Date:

Hi,

It didn't work again. Sending numbers until I figure out how to solve this.

Scheduling Step:

VM + VS 2019: 00:17m
Container + VS 2019: 03:51m

VM + MinGW64: 00:16m
Container + MinGW64: 04:28m


Execution step:

VM + VS 2019: 12:16m
Container + VS 2019: 26:02m

VM + MinGW64: 07:55m
Container + MinGW64: 16:34m


Sorry for the multiple mails.


Regards,
Nazir Bilal Yavuz
Microsoft

Re: Use windows VMs instead of windows containers on the CI

From
Justin Pryzby
Date:
On Tue, Jan 10, 2023 at 03:20:18PM +0300, Nazir Bilal Yavuz wrote:
> Hi,
> 
> I propose using Windows VMs instead of containers on the CI; the patch is
> attached. Currently, Windows containers are used on the CI, but the container
> images need to be pulled on every CI run, and they are also slow to run.

+1

> There is more than 2x speed gain when VMs are used.

One consideration is that if windows runs twice as fast, we'll suddenly
start using twice as many resources at cirrus/google/amazon - the
windows task has been throttling everything else.  Not sure if we should
do anything beyond the limits that cfbot already uses.

-- 
Justin



Re: Use windows VMs instead of windows containers on the CI

From
Andres Freund
Date:
Hi,

On 2023-01-10 09:22:12 -0600, Justin Pryzby wrote:
> > There is more than 2x speed gain when VMs are used.
> 
> One consideration is that if windows runs twice as fast, we'll suddenly
> start using twice as many resources at cirrus/google/amazon - the
> windows task has been throttling everything else.  Not sure if we should
> do anything beyond the limits that cfbot already uses.

I'm not sure we would. cfbot has a time based limit for how often it tries to
rebuild entries, and I think we were just about keeping up with that. In which
case we shouldn't, on average, schedule more jobs than we currently
do. Although peak "job throughput" would be higher.

Thomas?

Greetings,

Andres Freund



Re: Use windows VMs instead of windows containers on the CI

From
Thomas Munro
Date:
On Wed, Jan 11, 2023 at 8:20 AM Andres Freund <andres@anarazel.de> wrote:
> On 2023-01-10 09:22:12 -0600, Justin Pryzby wrote:
> > > There is more than 2x speed gain when VMs are used.
> >
> > One consideration is that if windows runs twice as fast, we'll suddenly
> > start using twice as many resources at cirrus/google/amazon - the
> > windows task has been throttling everything else.  Not sure if we should
> > do anything beyond the limits that cfbot already uses.
>
> I'm not sure we would. cfbot has a time based limit for how often it tries to
> rebuild entries, and I think we were just about keeping up with that. In which
> case we shouldn't, on average, schedule more jobs than we currently
> do. Although peak "job throughput" would be higher.
>
> Thomas?

It currently tries to re-test each patch every 24 hours, but doesn't
achieve that.  It looks like it's currently re-testing every ~30
hours.  Justin's right, we'll consume more non-Windows resources if
Windows speeds up, but not 2x, more like 1.25x when cfbot's own
throttling kicks in.  Or I could change the cycle target to 36 or 48
hours, to spread the work out more.

Back-of-a-napkin maths:

 * there are currently 240 entries in a testable status
 * it takes ~0.5 hours to test (because that's the slow Windows time)
 * therefore it takes ~120 hours to test them all
 * but we can do 4 at a time, so that's ~30 hours to get through them
   all and start again
 * that matches what we see:

cfbot=> select created - lag(created) over (order by created) from
branch where submission_id = 4068;
       ?column?
-----------------------

 1 day 06:30:00.265047
 1 day 05:43:59.978949
 1 day 04:13:59.754048
 1 day 05:28:59.811916
 1 day 07:00:00.651655
(6 rows)

If, with this change, we can test in only ~0.25 hours, then we'll only
need 60 hours of Cirrus time to test them all.  With a target of
re-testing every 24 hours, it should now only have to run ~2.5 jobs at
all times.  Having free slots would be kind to Cirrus, and also lower
the latency when a new patch is posted (which currently has to wait
for a free slot before it can begin).  Great news.
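
The arithmetic above can be written down as a small model. This is only a sketch of the napkin maths, not cfbot's actual scheduling logic; the function names are made up for illustration, and reading the ~1.25x figure as the ratio of the old ~30 hour cycle to the 24 hour target is an assumption.

```python
def cycle_hours(entries: int, hours_per_test: float, parallel_jobs: int) -> float:
    """Hours needed to test every entry once with a fixed number of parallel jobs."""
    return entries * hours_per_test / parallel_jobs


def jobs_needed(entries: int, hours_per_test: float, target_cycle_hours: float) -> float:
    """Average number of concurrently running jobs needed to meet the cycle target."""
    return entries * hours_per_test / target_cycle_hours


ENTRIES = 240        # entries in a testable status
JOB_LIMIT = 4        # jobs that can run at the same time
TARGET_CYCLE = 24.0  # hours between re-tests that cfbot aims for

# Before: ~0.5 hours per test, limited by the slow Windows task.
before = cycle_hours(ENTRIES, 0.5, JOB_LIMIT)   # ~30 hours per cycle

# After: ~0.25 hours per test.
total_after = ENTRIES * 0.25                    # ~60 hours of Cirrus time
jobs_after = jobs_needed(ENTRIES, 0.25, TARGET_CYCLE)

# Assumed reading of "more like 1.25x": the cycle shrinks from ~30 hours
# to the 24 hour target, so the other tasks run that much more often.
extra_load = before / TARGET_CYCLE

if __name__ == "__main__":
    print(f"cycle before: {before} hours")
    print(f"total after:  {total_after} hours")
    print(f"jobs after:   {jobs_after}")
    print(f"extra load:   {extra_load}x")
```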



Re: Use windows VMs instead of windows containers on the CI

From
Justin Pryzby
Date:
On Tue, Jan 10, 2023 at 03:20:18PM +0300, Nazir Bilal Yavuz wrote:
> Hi,
> 
> I propose using Windows VMs instead of containers on the CI; the patch is
> attached. Currently, Windows containers are used on the CI, but the container
> images need to be pulled on every CI run, and they are also slow to run.

> @@ -589,8 +591,10 @@ task:
>    # otherwise it'll be sorted before other tasks
>    depends_on: SanityCheck
>  
> -  windows_container:
> -    image: $CONTAINER_REPO/windows_ci_mingw64:latest
> +  compute_engine_instance:
> +    image_project: $IMAGE_PROJECT
> +    image: family/pg-ci-windows-ci-mingw64
> +    platform: windows
>      cpu: $CPUS
>      memory: 4G

It looks like MinGW currently doesn't have the necessary perl modules:

[19:58:46.356] Message: Can't locate IPC/Run.pm in @INC (you may need to install the IPC::Run module) (@INC contains: C:/msys64/ucrt64/lib/perl5/site_perl/5.32.1 C:/msys64/ucrt64/lib/perl5/site_perl/5.32.1 C:/msys64/ucrt64/lib/perl5/site_perl C:/msys64/ucrt64/lib/perl5/vendor_perl C:/msys64/ucrt64/lib/perl5/core_perl) at config/check_modules.pl line 11.
 
[19:58:46.356] BEGIN failed--compilation aborted at config/check_modules.pl line 11.
[19:58:46.356] meson.build:1337: WARNING: Additional Perl modules are required to run TAP tests.

That could be caused by a transient failure combined with bad error
handling - if there's an error while building the image, it shouldn't be
uploaded.
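
The principle (an image whose build had an error should not be uploaded) could be sketched like this. It is a generic Python illustration and not how packer or the pg-vm-images scripts work; `run_steps`, `build_and_upload` and the `upload` callback are hypothetical names.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each command in order; return the first non-zero exit status, else 0."""
    for cmd in steps:
        completed = subprocess.run(cmd)
        if completed.returncode != 0:
            # Stop immediately instead of carrying on with a half-built image.
            return completed.returncode
    return 0


def build_and_upload(steps, upload):
    """Call upload() only if every provisioning step succeeded."""
    if run_steps(steps) != 0:
        return False
    upload()
    return True


if __name__ == "__main__":
    # Stand-in provisioning steps: the second one fails on purpose.
    steps = [
        [sys.executable, "-c", "print('installing packages')"],
        [sys.executable, "-c", "import sys; sys.exit(3)"],
    ]
    ok = build_and_upload(steps, lambda: print("uploading image"))
    print("uploaded" if ok else "build failed, nothing uploaded")
```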

-- 
Justin



Re: Use windows VMs instead of windows containers on the CI

From
Andres Freund
Date:
Hi,

On 2023-01-11 17:21:21 -0600, Justin Pryzby wrote:
> On Tue, Jan 10, 2023 at 03:20:18PM +0300, Nazir Bilal Yavuz wrote:
> > Hi,
> > 
> > I propose using Windows VMs instead of containers on the CI; the patch is
> > attached. Currently, Windows containers are used on the CI, but the container
> > images need to be pulled on every CI run, and they are also slow to run.
> 
> > @@ -589,8 +591,10 @@ task:
> >    # otherwise it'll be sorted before other tasks
> >    depends_on: SanityCheck
> >  
> > -  windows_container:
> > -    image: $CONTAINER_REPO/windows_ci_mingw64:latest
> > +  compute_engine_instance:
> > +    image_project: $IMAGE_PROJECT
> > +    image: family/pg-ci-windows-ci-mingw64
> > +    platform: windows
> >      cpu: $CPUS
> >      memory: 4G
> 
> It looks like MinGW currently doesn't have the necessary perl modules:
> 
> [19:58:46.356] Message: Can't locate IPC/Run.pm in @INC (you may need to install the IPC::Run module) (@INC contains: C:/msys64/ucrt64/lib/perl5/site_perl/5.32.1 C:/msys64/ucrt64/lib/perl5/site_perl/5.32.1 C:/msys64/ucrt64/lib/perl5/site_perl C:/msys64/ucrt64/lib/perl5/vendor_perl C:/msys64/ucrt64/lib/perl5/core_perl) at config/check_modules.pl line 11.
 
> [19:58:46.356] BEGIN failed--compilation aborted at config/check_modules.pl line 11.
> [19:58:46.356] meson.build:1337: WARNING: Additional Perl modules are required to run TAP tests.
> 
> That could be caused by a transient failure combined with bad error
> handling - if there's an error while building the image, it shouldn't be
> uploaded.

Yea, there's a problem where packer on windows doesn't seem to abort after a
powershell script errors out. The reason isn't quite clear yet. I think Bilal
is working on a workaround.

Greetings,

Andres Freund



Re: Use windows VMs instead of windows containers on the CI

From
Nazir Bilal Yavuz
Date:
Hi,


On 1/12/2023 3:30 AM, Andres Freund wrote:
> Yea, there's a problem where packer on windows doesn't seem to abort after a
> powershell script errors out. The reason isn't quite clear yet. I think Bilal
> is working on a workaround.


That should be fixed now. Also, I'm attaching a patch for PG15. There were
conflicts while applying the current patch to the REL_15_STABLE branch.


Regards,
Nazir Bilal Yavuz
Microsoft

Attachment

Re: Use windows VMs instead of windows containers on the CI

From
Andres Freund
Date:
Hi,

On 2023-02-02 17:47:37 +0300, Nazir Bilal Yavuz wrote:
> On 1/12/2023 3:30 AM, Andres Freund wrote:
> > Yea, there's a problem where packer on windows doesn't seem to abort after a
> > powershell script errors out. The reason isn't quite clear yet. I think Bilal
> > is working on a workaround.
> 
> 
> That should be fixed now. Also, I'm attaching a patch for PG15. There were
> conflicts while applying the current patch to the REL_15_STABLE branch.

And pushed!  I think an improvement in CI times of this degree is pretty
awesome.


Unfortunately, I also noticed that the TAP tests on MinGW don't run
anymore, due to IPC::Run not being available. But that's independent of
this change. I don't know when that broke. Could you check it out?

Greetings,

Andres Freund



Re: Use windows VMs instead of windows containers on the CI

From
Thomas Munro
Date:
On Fri, Feb 3, 2023 at 6:57 PM Andres Freund <andres@anarazel.de> wrote:
> And pushed!  I think an improvement in CI times of this degree is pretty
> awesome.

+1

A lot of CI compute time is saved.  The Cirrus account[1] was
previously hitting the 4 job limit all day long, and now it's often
running 1 or 2 jobs when I look, and it has spare capacity to start a
new job immediately if someone posts a new patch.  I'll monitor it
over the next few days but it looks great.

[1] https://cirrus-ci.com/github/postgresql-cfbot/postgresql



Re: Use windows VMs instead of windows containers on the CI

From
Bharath Rupireddy
Date:
On Fri, Feb 3, 2023 at 3:27 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Fri, Feb 3, 2023 at 6:57 PM Andres Freund <andres@anarazel.de> wrote:
> > And pushed!  I think an improvement in CI times of this degree is pretty
> > awesome.
>
> +1
>
> A lot of CI compute time is saved.  The Cirrus account[1] was
> previously hitting the 4 job limit all day long, and now it's often
> running 1 or 2 jobs when I look, and it has spare capacity to start a
> new job immediately if someone posts a new patch.  I'll monitor it
> over the next few days but it looks great.
>
> [1] https://cirrus-ci.com/github/postgresql-cfbot/postgresql

Oh, wow! This commit drastically improved testing time on Windows. The
Windows tests were always the ones lagging behind in my GitHub repo's CI;
now I can see they got much faster. Thanks for working on this.

Windows tests were taking around 27min
(https://cirrus-ci.com/build/6060448332644352) before the patch, but
it came down to 13min (https://cirrus-ci.com/build/6528980879147008)
after the patch - Yay! 2X improvement :).

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com