Re: Robocopy might be not robust enough for never-ending testing on Windows - Mailing list pgsql-hackers

From Alexander Lakhin
Subject Re: Robocopy might be not robust enough for never-ending testing on Windows
Msg-id 71a57d38-1c4f-4c2d-15e4-520802283c56@gmail.com
In response to Re: Robocopy might be not robust enough for never-ending testing on Windows  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers

Hello Thomas,

17.09.2024 04:01, Thomas Munro wrote:
> On Mon, Sep 16, 2024 at 6:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
>> So this leak looks like a recent and still existing defect.
>  From my cartoon-like understanding of Windows, I would guess that if
> event handles created by a program are leaked after it has exited, it
> would normally imply that they've been duplicated somewhere else that
> is still running (for example see the way that PostgreSQL's
> dsm_impl_pin_segment() calls DuplicateHandle() to give a copy to the
> postmaster, so that the memory segment continues to exist after the
> backend exits), and if it's that, you'd be able to see the handle
> count going up in the process monitor for some longer running process
> somewhere (as seen in this report from the Chrome hackers[1]).  And if
> it's not that, then I would guess it would have to be a kernel bug
> because something outside userspace must be holding onto/leaking
> handles.  But I don't really understand Windows beyond trying to debug
> PostgreSQL at a distance, so my guesses may be way off.  If we wanted
> to try to find a Windows expert to look at a standalone repro, does
> your PS script work with *any* source directory, or is there something
> about the initdb template, in which case could you post it in a .zip
> file so that a non-PostgreSQL person could see the failure mode?
>
> [1] https://randomascii.wordpress.com/2021/07/25/finding-windows-handle-leaks-in-chromium-and-others/

That's very interesting reading. I'll try to investigate the issue in that
depth later (though I guess this case is different: after logging off and
logging in as another user, I can't see any processes belonging to the
first one, while those "Event objects" in non-paged pool still occupy
memory), but finding a Windows expert who could perhaps look at
robocopy's sources would be good too (and more productive).

So, the repro we can show is:
# Create a source directory with 1000 small non-empty files
rm -r c:\temp\source
mkdir c:\temp\source
for ($i = 1; $i -le 1000; $i++)
{
    echo 1 > "c:\temp\source\$i"
}

# Copy it repeatedly, printing the non-paged pool size after each copy
for ($i = 1; $i -le 1000; $i++)
{
    echo "iteration $i"
    rm -r c:\temp\target
    robocopy.exe /E /NJH /NFL /NDL /NP c:\temp\source c:\temp\target
    Get-WmiObject -Class Win32_PerfRawData_PerfOS_Memory | % PoolNonpagedBytes
}

It produces for me (on Windows 10 [Version 10.0.19045.4780]):
iteration 1
...
216887296
...
iteration 1000


------------------------------------------------------------------------------

                Total    Copied   Skipped  Mismatch    FAILED    Extras
     Dirs :         1         1         0         0         0         0
    Files :      1000      1000         0         0         0         0
    Bytes :     7.8 k     7.8 k         0         0         0         0
    Times :   0:00:00   0:00:00                       0:00:00   0:00:00


    Speed :               17660 Bytes/sec.
    Speed :               1.010 MegaBytes/min.
    Ended : Monday, September 16, 2024 8:58:09 PM

365080576

Just "touch c:\temp\source\$i" is not enough; the files must be non-empty
for the leak to happen.
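
To put the two PoolNonpagedBytes readings above into perspective, here is a
rough back-of-the-envelope calculation in Python. It is only a sketch: it
assumes the first reading was printed after iteration 1 and the last after
iteration 1000, so that 999 robocopy runs of 1000 files each lie between
them, and it ignores that the counter is system-wide and may include
unrelated activity. The variable names are made up for illustration.

```python
# Readings copied from the output above (PoolNonpagedBytes, in bytes).
first_reading = 216_887_296   # printed after iteration 1
last_reading = 365_080_576    # printed after iteration 1000

# Assumption: the two readings bracket 999 robocopy runs of 1000 files each.
runs_between = 1000 - 1
files_per_run = 1000

growth = last_reading - first_reading
per_run = growth / runs_between
per_file = per_run / files_per_run

print(f"total growth: {growth} bytes ({growth / 2**20:.1f} MiB)")
print(f"per robocopy run: {per_run:.0f} bytes")
print(f"per copied file: {per_file:.1f} bytes")
```

Under those assumptions the non-paged pool grew by roughly 141 MiB over the
run, which works out to about 148 bytes per copied file.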

Best regards,
Alexander


