Re: Robocopy might be not robust enough for never-ending testing on Windows - Mailing list pgsql-hackers
From | Alexander Lakhin |
---|---|
Subject | Re: Robocopy might be not robust enough for never-ending testing on Windows |
Date | |
Msg-id | 71a57d38-1c4f-4c2d-15e4-520802283c56@gmail.com |
In response to | Re: Robocopy might be not robust enough for never-ending testing on Windows (Thomas Munro <thomas.munro@gmail.com>) |
Responses | Re: Robocopy might be not robust enough for never-ending testing on Windows |
List | pgsql-hackers |
Hello Thomas,

17.09.2024 04:01, Thomas Munro wrote:
> On Mon, Sep 16, 2024 at 6:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
>> So this leak looks like a recent and still existing defect.
>
> From my cartoon-like understanding of Windows, I would guess that if
> event handles created by a program are leaked after it has exited, it
> would normally imply that they've been duplicated somewhere else that
> is still running (for example see the way that PostgreSQL's
> dsm_impl_pin_segment() calls DuplicateHandle() to give a copy to the
> postmaster, so that the memory segment continues to exist after the
> backend exits), and if it's that, you'd be able to see the handle
> count going up in the process monitor for some longer running process
> somewhere (as seen in this report from the Chrome hackers[1]). And if
> it's not that, then I would guess it would have to be a kernel bug
> because something outside userspace must be holding onto/leaking
> handles. But I don't really understand Windows beyond trying to debug
> PostgreSQL at a distance, so my guesses may be way off. If we wanted
> to try to find a Windows expert to look at a standalone repro, does
> your PS script work with *any* source directory, or is there something
> about the initdb template, in which case could you post it in a .zip
> file so that a non-PostgreSQL person could see the failure mode?
>
> [1] https://randomascii.wordpress.com/2021/07/25/finding-windows-handle-leaks-in-chromium-and-others/

That's very interesting reading. I'll try to dig into the issue that
deeply later (though I guess this case is different: after logging off
and logging in as another user, I can't see any processes belonging to
the first one, while those "Event objects" in non-paged pool still
occupy memory), but finding a Windows expert who could perhaps look at
robocopy's sources would be good too (and more productive).
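For the duplicated-handle theory quoted above, one quick check is to sample per-process handle counts before and after a batch of robocopy runs and see whether any long-lived process keeps growing. A minimal sketch, assuming Windows PowerShell; the function name Get-TopHandleHolders and its -Count parameter are made up for illustration and appear nowhere in the thread. Only Get-Process and its HandleCount property are relied on.

```powershell
# Hypothetical helper (not from the thread): list the processes that
# currently hold the most handles, so two samples can be compared by eye.
function Get-TopHandleHolders {
    param([int]$Count = 10)

    Get-Process |
        Sort-Object -Property HandleCount -Descending |
        Select-Object -First $Count -Property Name, Id, HandleCount
}

# Take one sample before and one after a batch of robocopy runs.
Get-TopHandleHolders -Count 10 | Format-Table -AutoSize
```

If no surviving process shows a growing count while PoolNonpagedBytes still climbs, that would be consistent with the observation in this message that the leaked objects are not tied to any remaining user process.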
So, the repro we can show is:

rm -r c:\temp\source
mkdir c:\temp\source
for ($i = 1; $i -le 1000; $i++)
{
    echo 1 > "c:\temp\source\$i"
}

for ($i = 1; $i -le 1000; $i++)
{
    echo "iteration $i"
    rm -r c:\temp\target
    robocopy.exe /E /NJH /NFL /NDL /NP c:\temp\source c:\temp\target
    Get-WmiObject -Class Win32_PerfRawData_PerfOS_Memory | % PoolNonpagedBytes
}

It produces for me (on Windows 10 [Version 10.0.19045.4780]):

iteration 1
...
216887296
...
iteration 1000

------------------------------------------------------------------------------

               Total    Copied   Skipped  Mismatch    FAILED    Extras
    Dirs :         1         1         0         0         0         0
   Files :      1000      1000         0         0         0         0
   Bytes :     7.8 k     7.8 k         0         0         0         0
   Times :   0:00:00   0:00:00                       0:00:00   0:00:00

   Speed :               17660 Bytes/sec.
   Speed :               1.010 MegaBytes/min.
   Ended : Monday, September 16, 2024 8:58:09 PM

365080576

Just "touch c:\temp\source\$i" is not enough; the files must be non-empty
for the leak to happen.

Best regards,
Alexander
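To put the two PoolNonpagedBytes readings above in perspective, the growth can be worked out per robocopy run and per copied file. This is only a rough sketch: it assumes the first reading was printed after iteration 1 (so 999 runs separate the two values), and it attributes all of the growth to robocopy, ignoring other activity on the machine. The variable names are illustrative.

```powershell
# Readings copied from the output above (PoolNonpagedBytes, in bytes).
$first = 216887296   # assumed: printed after iteration 1
$last  = 365080576   # printed after iteration 1000

# Assumption: 999 robocopy runs separate the two readings, 1000 files each.
$runs        = 999
$filesPerRun = 1000

$growth  = $last - $first          # total growth in bytes
$perRun  = $growth / $runs         # average growth per robocopy run
$perFile = $perRun / $filesPerRun  # average growth per copied file

"growth:   {0} bytes ({1:N1} MiB)" -f $growth, ($growth / 1MB)
"per run:  {0:N0} bytes" -f $perRun
"per file: {0:N0} bytes" -f $perFile
```

Under those assumptions the pool grew by 148193280 bytes (roughly 141 MiB), which averages out to about 148,000 bytes per run, or on the order of 148 bytes per copied file.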