Re: zstd compression for pg_dump - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: zstd compression for pg_dump |
Date | |
Msg-id | 0022dd9e-a439-d72f-d09b-ec512c23949b@enterprisedb.com Whole thread Raw |
In response to | Re: zstd compression for pg_dump (Justin Pryzby <pryzby@telsasoft.com>) |
Responses |
Re: zstd compression for pg_dump
Re: zstd compression for pg_dump |
List | pgsql-hackers |
On 3/27/23 19:28, Justin Pryzby wrote: > On Fri, Mar 17, 2023 at 03:43:31AM +0100, Tomas Vondra wrote: >> On 3/16/23 05:50, Justin Pryzby wrote: >>> On Fri, Mar 10, 2023 at 12:48:13PM -0800, Jacob Champion wrote: >>>> On Wed, Mar 8, 2023 at 10:59 AM Jacob Champion <jchampion@timescale.com> wrote: >>>>> I did some smoke testing against zstd's GitHub release on Windows. To >>>>> build against it, I had to construct an import library, and put that >>>>> and the DLL into the `lib` folder expected by the MSVC scripts... >>>>> which makes me wonder if I've chosen a harder way than necessary? >>>> >>>> It looks like pg_dump's meson.build is missing dependencies on zstd >>>> (meson couldn't find the headers in the subproject without them). >>> >>> I saw that this was added for LZ4, but I hadn't added it for zstd since >>> I didn't run into an issue without it. Could you check that what I've >>> added works for your case ? >>> >>>>> Parallel zstd dumps seem to work as expected, in that the resulting >>>>> pg_restore output is identical to uncompressed dumps and nothing >>>>> explodes. I haven't inspected the threading implementation for safety >>>>> yet, as you mentioned. >>>> >>>> Hm. Best I can tell, the CloneArchive() machinery is supposed to be >>>> handling safety for this, by isolating each thread's state. I don't feel >>>> comfortable pronouncing this new addition safe or not, because I'm not >>>> sure I understand what the comments in the format-specific _Clone() >>>> callbacks are saying yet. >>> >>> My line of reasoning for unix is that pg_dump forks before any calls to >>> zstd. Nothing zstd does ought to affect the pg_dump layer. But that >>> doesn't apply to pg_dump under windows. This is an opened question. If >>> there's no solid answer, I could disable/ignore the option (maybe only >>> under windows). >> >> I may be missing something, but why would the patch affect this? Why >> would it even affect safety of the parallel dump? And I don't see any >> changes to the clone stuff ... > > zstd supports using threads during compression, with -Z zstd:workers=N. > When unix forks, the child processes can't do anything to mess up the > state of the parent processes. > > But windows pg_dump uses threads instead of forking, so it seems > possible that the pg_dump -j threads that then spawn zstd threads could > "leak threads" and break the main thread. I suspect there's no issue, > but we still ought to verify that before declaring it safe. > OK. I don't have access to a Windows machine so I can't test that. Is it possible to disable the zstd threading, until we figure this out? >>> The function is first checking if it was passed a filename which already >>> has a suffix. And if not, it searches through a list of suffixes, >>> testing for an existing file with each suffix. The search with stat() >>> doesn't happen if it has a suffix. I'm having trouble seeing how the >>> hasSuffix() branch isn't dead code. Another opened question. >> >> AFAICS it's done this way because of this comment in pg_backup_directory >> >> * ... >> * ".gz" suffix is added to the filenames. The TOC files are never >> * compressed by pg_dump, however they are accepted with the .gz suffix >> * too, in case the user has manually compressed them with 'gzip'. >> >> I haven't tried, but I believe that if you manually compress the >> directory, it may hit this branch. > > That would make sense, but when I tried, it didn't work like that. > The filenames were all uncompressed names. Maybe it worked differently > in an older release. Or maybe it changed during development of the > parallel-directory-dump patch and it's actually dead code. > Interesting. Would be good to find out. I wonder if a little bit of git-log digging could tell us more. Anyway, until we confirm it's dead code, we should probably do what .gz does and have the same check for .lz4 and .zst files. > This is rebased over the updated compression API. > > It seems like I misunderstood something you said before, so now I put > back "supports_compression()". > Thanks! I need to do a bit more testing and review, but it seems pretty much RFC to me, assuming we can figure out what to do about threading. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: