Re: zstd compression for pg_dump - Mailing list pgsql-hackers

From Jacob Champion
Subject Re: zstd compression for pg_dump
Date
Msg-id CAAWbhmjNb-m+bPVRWk4fk9hfZDERu3ZkU5E1RrFhTz5tn-JRMw@mail.gmail.com
Whole thread Raw
In response to Re: zstd compression for pg_dump  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On Wed, Mar 15, 2023 at 9:50 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Fri, Mar 10, 2023 at 12:48:13PM -0800, Jacob Champion wrote:
> > It looks like pg_dump's meson.build is missing dependencies on zstd
> > (meson couldn't find the headers in the subproject without them).
>
> I saw that this was added for LZ4, but I hadn't added it for zstd since
> I didn't run into an issue without it.  Could you check that what I've
> added works for your case ?

I thought I replied to this, sorry -- your newest patchset builds
correctly with subprojects, so the new dependency looks good to me.
Thanks!

> > Hm. Best I can tell, the CloneArchive() machinery is supposed to be
> > handling safety for this, by isolating each thread's state. I don't feel
> > comfortable pronouncing this new addition safe or not, because I'm not
> > sure I understand what the comments in the format-specific _Clone()
> > callbacks are saying yet.
>
> My line of reasoning for unix is that pg_dump forks before any calls to
> zstd.  Nothing zstd does ought to affect the pg_dump layer.  But that
> doesn't apply to pg_dump under windows.  This is an opened question.  If
> there's no solid answer, I could disable/ignore the option (maybe only
> under windows).

To (maybe?) move this forward a bit, note that pg_backup_custom's
_Clone() function makes sure that there is no active compressor state
at the beginning of the new thread. pg_backup_directory's
implementation has no such provision. And I don't think it can,
because the parent thread might have concurrently set one up -- see
the directory-specific implementation of _CloseArchive(). Perhaps we
should just NULL out those fields after the copy, instead?

To illustrate why I think this is tough to characterize: if I've read
the code correctly, the _Clone() and CloneArchive() implementations
are running concurrently with code that is actively modifying the
ArchiveHandle and the lclContext. So safety is only ensured to the
extent that we keep track of which fields threads are allowed to
touch, and I don't have that mental model.

--Jacob



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Why mark empty pages all visible?
Next
From: Michael Paquier
Date:
Subject: Re: Add pg_walinspect function with block info columns