Re: Robocopy might be not robust enough for never-ending testing on Windows - Mailing list pgsql-hackers

From Alexander Lakhin
Subject Re: Robocopy might be not robust enough for never-ending testing on Windows
Msg-id 8b724988-ba94-25b4-8064-068b6c4b0520@gmail.com
In response to Re: Robocopy might be not robust enough for never-ending testing on Windows  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Robocopy might be not robust enough for never-ending testing on Windows
List pgsql-hackers
Hello Thomas,

14.09.2024 23:32, Thomas Munro wrote:
> On Sun, Sep 15, 2024 at 1:00 AM Alexander Lakhin <exclusion@gmail.com> wrote:
>> (That is, 0.1-0.2 MB leaks per one robocopy run.)
>>
>> I observed this on Windows 10 (Version 10.0.19045.4780), with all updates
>> installed, but not on Windows Server 2016 (10.0.14393.0). Moreover, using
>> robocopy v14393 on Windows 10 doesn't affect the issue.
> I don't understand Windows but that seems pretty weird to me, as it
> seems to imply that a driver or something fairly low level inside the
> kernel is leaking objects (at least by simple minded analogies to
> operating systems I understand better).  Either that or robocopy.exe
> has userspace stuff involving at least one thread still running
> somewhere after it's exited, but that seems unlikely as I guess you'd
> have noticed that...

Yes, I see no robocopy process left after the test, and I think userspace
threads would not survive logoff.

> Just a thought: I was surveying the block cloning landscape across
> OSes and filesystems while looking into clone-based CREATE DATABASE
> (CF #4886) and also while thinking about the new TAP test initdb
> template copy trick, is that robocopy.exe tries to use Windows' block
> cloning magic, just like cp on recent Linux and FreeBSD systems (at
> one point I was wondering if that was causing some funky extra flush
> stalls on some systems, I need to come back to that...).  It probably
> doesn't actually work unless you have Windows 11 kernel with DevDrive
> enabled (from reading, no Windows here), but I guess it still probably
> uses the new system interfaces, probably something like CopyFileEx().
> Does it still leak if you use /nooffload or /noclone?

I tested the following (with the script above):
Windows 10 (Version 10.0.19045.4780):
robocopy.exe (10.0.19041.4717) /NOOFFLOAD
iteration 1
496611328
...
iteration 1000
609701888

That is, it leaks (about 113 MB over 1000 iterations).

/NOCLONE is not supported by that robocopy version:
ERROR : Invalid Parameter #1 : "/NOCLONE"

Then, Windows 11 (Version 10.0.22000.613), robocopy 10.0.22000.469:
iteration 1
141217792
...
iteration 996
151670784
...
iteration 997
152817664
...
iteration 1000
151674880

That is, it doesn't leak.

robocopy.exe /NOOFFLOAD
iteration 1
152666112
...
iteration 1000
153341952

No leak.

/NOCLONE is not supported by that robocopy version either.

Then I updated that Windows 11 to Version 10.0.22000.2538 (with KB5031358),
robocopy 10.0.22000.1516:
iteration 1
122753024
...
iteration 1000
244674560

It does leak.

robocopy /NOOFFLOAD
iteration 1
167522304
...
iteration 1000
283484160

It leaks as well.

Finally, I've installed newest Windows 11 Version 10.0.22631.4169, with
robocopy 10.0.22621.3672:
Non-paged pool increased from 133 to 380 MB after 1000 robocopy runs.

robocopy /NOOFFLOAD leaks too.

/NOCLONE is not supported by that robocopy version either.

So this leak looks like a recent defect that is still present.
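
To compare the runs on one scale, the growth per robocopy run can be estimated
from the first and last samples quoted above. This is only a rough sketch: it
assumes the printed numbers are byte counts of the monitored memory counter,
that exactly one robocopy run happens between consecutive iterations, and the
configuration labels are my own shorthand, not anything the script printed.

```python
# First and last samples (bytes) as quoted in this message.
# The dictionary keys are informal labels chosen for this sketch.
SAMPLES = {
    "Win10 19045.4780, /NOOFFLOAD": (496611328, 609701888),
    "Win11 22000.613, default": (141217792, 151674880),
    "Win11 22000.613, /NOOFFLOAD": (152666112, 153341952),
    "Win11 22000.2538, default": (122753024, 244674560),
    "Win11 22000.2538, /NOOFFLOAD": (167522304, 283484160),
}

ITERATIONS = 1000  # samples were taken at iteration 1 and iteration 1000


def growth_per_run_mb(first, last, iterations=ITERATIONS):
    """Average growth in MB (10**6 bytes) per run between the two samples.

    Assumes one run separates consecutive samples, so there are
    (iterations - 1) runs between the first and the last sample.
    """
    return (last - first) / (iterations - 1) / 1e6


if __name__ == "__main__":
    for label, (first, last) in SAMPLES.items():
        print(f"{label}: {growth_per_run_mb(first, last):.3f} MB/run")
```

Two end-point samples cannot distinguish steady growth from fluctuation (see
iterations 996-1000 on Windows 11 22000.613), so a small per-run figure should
not be read as a leak on its own.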

(Sorry for the delay; fighting with OS updates/installation took me a while.)

Best regards,
Alexander


