Re: zstd compression for pg_dump - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: zstd compression for pg_dump
Date
Msg-id ZCd6sb7XA/W6nLaD@telsasoft.com
Whole thread Raw
In response to Re: zstd compression for pg_dump  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: zstd compression for pg_dump
List pgsql-hackers
On Sat, Apr 01, 2023 at 02:11:12AM +0200, Tomas Vondra wrote:
> On 4/1/23 01:16, Justin Pryzby wrote:
> > On Tue, Mar 28, 2023 at 06:23:26PM +0200, Tomas Vondra wrote:
> >> On 3/27/23 19:28, Justin Pryzby wrote:
> >>> On Fri, Mar 17, 2023 at 03:43:31AM +0100, Tomas Vondra wrote:
> >>>> On 3/16/23 05:50, Justin Pryzby wrote:
> >>>>> On Fri, Mar 10, 2023 at 12:48:13PM -0800, Jacob Champion wrote:
> >>>>>> On Wed, Mar 8, 2023 at 10:59 AM Jacob Champion <jchampion@timescale.com> wrote:
> >>>>>>> I did some smoke testing against zstd's GitHub release on Windows. To
> >>>>>>> build against it, I had to construct an import library, and put that
> >>>>>>> and the DLL into the `lib` folder expected by the MSVC scripts...
> >>>>>>> which makes me wonder if I've chosen a harder way than necessary?
> >>>>>>
> >>>>>> It looks like pg_dump's meson.build is missing dependencies on zstd
> >>>>>> (meson couldn't find the headers in the subproject without them).
> >>>>>
> >>>>> I saw that this was added for LZ4, but I hadn't added it for zstd since
> >>>>> I didn't run into an issue without it.  Could you check that what I've
> >>>>> added works for your case ?
> >>>>>
> >>>>>>> Parallel zstd dumps seem to work as expected, in that the resulting
> >>>>>>> pg_restore output is identical to uncompressed dumps and nothing
> >>>>>>> explodes. I haven't inspected the threading implementation for safety
> >>>>>>> yet, as you mentioned.
> >>>>>>
> >>>>>> Hm. Best I can tell, the CloneArchive() machinery is supposed to be
> >>>>>> handling safety for this, by isolating each thread's state. I don't feel
> >>>>>> comfortable pronouncing this new addition safe or not, because I'm not
> >>>>>> sure I understand what the comments in the format-specific _Clone()
> >>>>>> callbacks are saying yet.
> >>>>>
> >>>>> My line of reasoning for unix is that pg_dump forks before any calls to
> >>>>> zstd.  Nothing zstd does ought to affect the pg_dump layer.  But that
> >>>>> doesn't apply to pg_dump under windows.  This is an opened question.  If
> >>>>> there's no solid answer, I could disable/ignore the option (maybe only
> >>>>> under windows).
> >>>>
> >>>> I may be missing something, but why would the patch affect this? Why
> >>>> would it even affect safety of the parallel dump? And I don't see any
> >>>> changes to the clone stuff ...
> >>>
> >>> zstd supports using threads during compression, with -Z zstd:workers=N.
> >>> When unix forks, the child processes can't do anything to mess up the
> >>> state of the parent processes.  
> >>>
> >>> But windows pg_dump uses threads instead of forking, so it seems
> >>> possible that the pg_dump -j threads that then spawn zstd threads could
> >>> "leak threads" and break the main thread.  I suspect there's no issue,
> >>> but we still ought to verify that before declaring it safe.
> >>
> >> OK. I don't have access to a Windows machine so I can't test that. Is it
> >> possible to disable the zstd threading, until we figure this out?
> > 
> > I think that's what's best.  I made it issue a warning if "workers" was
> > specified.  It could also be an error, or just ignored.
> > 
> > I considered disabling workers only for windows, but realized that I
> > haven't tested with threads myself - my local zstd package is compiled
> > without threading, and I remember having some issue recompiling it with
> > threading.  Jacob's recipe for using meson wraps works well, but it
> > still seems better to leave it as a future feature.  I used that recipe
> > to enabled zstd with threading on CI (except for linux/autoconf).
> 
> +1 to disable this if we're unsure it works correctly. I agree it's
> better to just error out if workers are requested - I rather dislike
> when a tool just ignores an explicit parameter. And AFAICS it's what
> zstd does too, when someone requests workers on incompatible build.
> 
> FWIW I've been thinking about this a bit more and I don't quite see why
> would the threading cause issues (except for Windows). I forgot
> pg_basebackup already supports zstd, including the worker threading, so
> why would it work there and not in pg_dump? Sure, pg_basebackup is not
> parallel, but with separate pg_dump processes that shouldn't be an issue
> (although I'm not sure when zstd creates threads).

There's no concern at all except under windows (because on windows
pg_dump -j is implemented using threads rather than forking).
Especially since zstd:workers is already allowed in the basebackup
backend process.

> I'll try building zstd with threading enabled, and do some tests over
> the weekend.

Feel free to wait until v17 :)

I used "meson wraps" to get a local version with threading.  Note that
if you want to use a zstd subproject, you may have to specify -D
zstd=enabled, or else meson may not enable the library at all.

Also, in order to introspect its settings, I had to do like this:

mkdir subprojects
meson wrap install zstd
meson subprojects download
mkdir build.meson
meson setup -C build.meson --force-fallback-for=zstd

-- 
Justin



pgsql-hackers by date:

Previous
From: Melanie Plageman
Date:
Subject: Re: Option to not use ringbuffer in VACUUM, using it in failsafe mode
Next
From: Thomas Munro
Date:
Subject: WL_SOCKET_ACCEPT fairness on Windows