Re: refactoring basebackup.c (zstd workers) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: refactoring basebackup.c (zstd workers)
Msg-id CA+TgmoZz-F1pA1Vv=-6FOPWuS1eFi2ifNyySV4u6pW0+FBeG-g@mail.gmail.com
In response to Re: refactoring basebackup.c  (Dipesh Pandit <dipesh.pandit@gmail.com>)
Responses Re: refactoring basebackup.c (zstd workers)  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On Mon, Mar 21, 2022 at 9:18 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Sun, Mar 20, 2022 at 09:38:44PM -0400, Robert Haas wrote:
> > > This patch also needs to update the other user-facing docs.
> >
> > Which ones exactly?
>
> I mean pg_basebackup -Z
>
> -Z level
> -Z [{client|server}-]method[:level]
> --compress=level
> --compress=[{client|server}-]method[:level]

Ah, right. Thanks.

Here's v3. I have updated that section of the documentation. I also
went and added a bunch more test cases for validation of compression
detail strings, many inspired by your examples, and fixed all the bugs
that I found in the process. I think the crashes you complained about
are now fixed, but please let me know if I have missed any. I also
added _() calls as you suggested. I searched for the "contain a an"
typo that you mentioned but was not able to find it. Can you give me a
more specific pointer?
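
As a rough illustration of the two -Z forms quoted above (this is not code
from the patch), the syntax can be sketched as a small parser. The names
parse_compress_option and CompressSpec are hypothetical, and the sketch
checks only the shape of the string; it does not check that the method is a
real algorithm or that the level is in range.

```python
import re
from typing import NamedTuple, Optional


class CompressSpec(NamedTuple):
    """Hypothetical container for a parsed -Z / --compress value."""
    location: Optional[str]  # "client", "server", or None if not given
    method: Optional[str]    # e.g. "gzip"; None for the bare "-Z level" form
    level: Optional[int]     # None if no level was given


# Shape of "[{client|server}-]method[:level]" as quoted in the thread.
_SPEC_RE = re.compile(
    r"(?:(?P<location>client|server)-)?"
    r"(?P<method>[a-z][a-z0-9]*)"
    r"(?::(?P<level>[0-9]+))?"
)


def parse_compress_option(value: str) -> CompressSpec:
    """Parse either a bare level or [{client|server}-]method[:level]."""
    # Older form: "-Z level", a bare non-negative integer.
    if re.fullmatch(r"[0-9]+", value):
        return CompressSpec(None, None, int(value))

    match = _SPEC_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid compression specification: {value!r}")

    level = match.group("level")
    return CompressSpec(
        match.group("location"),
        match.group("method"),
        int(level) if level is not None else None,
    )
```

For example, parse_compress_option("server-zstd:3") yields a location of
"server", a method of "zstd", and a level of 3, while a malformed value such
as "gzip:" raises ValueError.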

I looked a little bit more at the compression method vs. compression
algorithm thing. I agree that there is some inconsistency in
terminology here, but I'm still not sure that we are well-served by
trying to make it totally uniform, especially if we pick the word
"method" as the standard rather than "algorithm". In my opinion,
"method" is less specific than "algorithm". If someone asks me to
choose a compression algorithm, I know that I should give an answer
like "lz4" or "zstd". If they ask me to pick a compression method, I'm
not quite sure whether they want that kind of answer or whether they
want something more detailed, like "use lz4 with compression level 3
and a 1MB block size". After all, that is (at least according to my
understanding of how English works) a perfectly valid answer to the
question "what method should I use to compress this data?" -- but not
to the question "what algorithm should I use to compress this data?".
The latter can ONLY be properly answered by saying something like
"lz4". And I think that's really the root of my hesitation to make the
kinds of changes you want here. If it's just a question of specifying
a compression algorithm and a level, I don't think using the name
"method" for the algorithm is going to be too bad. But as we enrich
the system with multiple compression algorithms each of which may have
multiple and different parameters, I think the whole thing becomes
murkier and the need for precision in language goes up.

Now that is of course an arguable position and you're welcome to
disagree with it, but I think that's part of why I'm hesitating.
Another part of it, at least for me, is that complete uniformity is
not always a positive. I suppose all of us have had the experience at
some point of reading a manual that says something like "to activate
the boil water function, press and release the 'boil water' button"
and rolling our eyes at how useless it was. It's important to me that
we don't fall into that trap. We clearly don't want to go ballistic
and have random inconsistencies in language for no reason, but at the
same time, it's not useful to tell people that METHOD should be
replaced with a compression method and LEVEL with a compression level.
I mean, if you end up saying something like that interspersed with
non-obvious information, that is OK, and I don't want to overstate the
point I'm trying to make. But it seems to me that if there's a little
variation in phrasing and we end up saying that METHOD means the
compression algorithm or that ALGORITHM means the compression method
or whatever, that can actually make things more clear. Here again it's
debatable: how much variation in phraseology is helpful, and at what
point does it just start to seem inconsistent? Well, everyone may have
their own opinion.

I'm not trying to pretend that this patch (or the existing code base)
gets this all right. But I do think that, to the extent that we have a
considered position on what to do here, we can make that change later,
perhaps even after getting some user feedback on what does and does
not make sense to other people. And I also think that what we end up
doing here may well end up being more nuanced than a blanket
search-and-replace. I'm not saying we couldn't make a blanket
search-and-replace. I just don't see it as necessarily creating value,
or being all that closely connected to the goal of this patch, which
is to quickly clean up a forward-compatibility risk before we hit
feature freeze.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com
