Thread: Introduce pg_receivewal gzip compression tests

Introduce pg_receivewal gzip compression tests

From
Georgios
Date:
Hi,

As suggested on a different thread [1], pg_receivewal can increase it's test
coverage. There exists a non trivial amount of code that handles gzip
compression. The current patch introduces tests that cover creation of gzip
compressed WAL files and the handling of gzip partial segments. Finally the
integrity of the compressed files is verified.

I hope you find this useful.

Cheers,
//Georgios

[1]
https://www.postgresql.org/message-id/flat/ZCm1J5vfyQ2E6dYvXz8si39HQ2gwxSZ3IpYaVgYa3lUwY88SLapx9EEnOf5uEwrddhx2twG7zYKjVeuP5MwZXCNPybtsGouDsAD1o2L_I5E%3D%40pm.me
Attachment

Re: Introduce pg_receivewal gzip compression tests

From
Michael Paquier
Date:
On Fri, Jul 09, 2021 at 11:26:58AM +0000, Georgios wrote:
> As suggested on a different thread [1], pg_receivewal can increase it's test
> coverage. There exists a non trivial amount of code that handles gzip
> compression. The current patch introduces tests that cover creation of gzip
> compressed WAL files and the handling of gzip partial segments. Finally the
> integrity of the compressed files is verified.

+       # Verify compressed file's integrity
+       my $gzip_is_valid = system_log('gzip', '--test', $gzip_wals[0]);
+       is($gzip_is_valid, 0, "program gzip verified file's integrity");
libz and gzip are usually split across different packages, hence there
is no guarantee that this command is always available (same comment as
for LZ4 from a couple of days ago).

+               [
+                       'pg_receivewal', '-D',     $stream_dir, '--verbose',
+                       '--endpos',      $nextlsn, '-Z', '5'
+               ],
I would keep the compression level to a minimum here, to limit CPU
usage but still compress something faster.

+       # Verify compressed file's integrity
+       my $gzip_is_valid = system_log('gzip', '--test', $gzip_wals[0]);
+       is($gzip_is_valid, 0, "program gzip verified file's integrity");
Shouldn't this be coded as a loop going through @gzip_wals?
--
Michael

Attachment

Re: Introduce pg_receivewal gzip compression tests

From
gkokolatos@pm.me
Date:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Monday, July 12th, 2021 at 08:42, Michael Paquier <michael@paquier.xyz> wrote:

> On Fri, Jul 09, 2021 at 11:26:58AM +0000, Georgios wrote:
>
> > As suggested on a different thread [1], pg_receivewal can increase it's test
> >
> > coverage. There exists a non trivial amount of code that handles gzip
> >
> > compression. The current patch introduces tests that cover creation of gzip
> >
> > compressed WAL files and the handling of gzip partial segments. Finally the
> >
> > integrity of the compressed files is verified.
>
> -         # Verify compressed file's integrity
>
>
> -         my $gzip_is_valid = system_log('gzip', '--test', $gzip_wals[0]);
>
>
> -         is($gzip_is_valid, 0, "program gzip verified file's integrity");
>
>
>
> libz and gzip are usually split across different packages, hence there
>
> is no guarantee that this command is always available (same comment as
>
> for LZ4 from a couple of days ago).


Of course. Though while going for it, I did find in Makefile.global.in:

  TAR = @TAR@
  XGETTEXT = @XGETTEXT@

  GZIP    = gzip
  BZIP2   = bzip2

  DOWNLOAD = wget -O $@ --no-use-server-timestamps

Which is also used by GNUmakefile.in

  distcheck: dist
      rm -rf $(dummy)
      mkdir $(dummy)
      $(GZIP) -d -c $(distdir).tar.gz | $(TAR) xf -
      install_prefix=`cd $(dummy) && pwd`; \


This to my understanding means that gzip is expected to exist.
If this is correct, then simply checking for the headers should
suffice, since that is the only dependency for the files to be
created.

If this is wrong, then I will add the discovery code as in the
other patch.

>
> -                 [
>
>
> -                         'pg_receivewal', '-D',     $stream_dir, '--verbose',
>
>
> -                         '--endpos',      $nextlsn, '-Z', '5'
>
>
> -                 ],
>
>
>
> I would keep the compression level to a minimum here, to limit CPU
>
> usage but still compress something faster.
>
> -         # Verify compressed file's integrity
>
>
> -         my $gzip_is_valid = system_log('gzip', '--test', $gzip_wals[0]);
>
>
> -         is($gzip_is_valid, 0, "program gzip verified file's integrity");
>
>
>
> Shouldn't this be coded as a loop going through @gzip_wals?

I would hope that there is only one gz file created. There is a line
further up that tests exactly that.

+   is (scalar(@gzip_wals), 1, "one gzip compressed WAL was created");


Then there should also be a partial gz file which is tested further ahead.

Cheers,
//Georgios

> -----------------------------------------------------------
>
> Michael



Re: Introduce pg_receivewal gzip compression tests

From
gkokolatos@pm.me
Date:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Monday, July 12th, 2021 at 11:42, <gkokolatos@pm.me> wrote:

> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>
> On Monday, July 12th, 2021 at 08:42, Michael Paquier michael@paquier.xyz wrote:
>
> > On Fri, Jul 09, 2021 at 11:26:58AM +0000, Georgios wrote:
> >
> > > As suggested on a different thread [1], pg_receivewal can increase it's test
> > >
> > > coverage. There exists a non trivial amount of code that handles gzip
> > >
> > > compression. The current patch introduces tests that cover creation of gzip
> > >
> > > compressed WAL files and the handling of gzip partial segments. Finally the
> > >
> > > integrity of the compressed files is verified.
> >
> > -           # Verify compressed file's integrity
> >
> >
> > -           my $gzip_is_valid = system_log('gzip', '--test', $gzip_wals[0]);
> >
> >
> > -           is($gzip_is_valid, 0, "program gzip verified file's integrity");
> >
> >
> >
> > libz and gzip are usually split across different packages, hence there
> >
> > is no guarantee that this command is always available (same comment as
> >
> > for LZ4 from a couple of days ago).
>
> Of course. Though while going for it, I did find in Makefile.global.in:
>
> TAR = @TAR@
>
> XGETTEXT = @XGETTEXT@
>
> GZIP = gzip
>
> BZIP2 = bzip2
>
> DOWNLOAD = wget -O $@ --no-use-server-timestamps
>
> Which is also used by GNUmakefile.in
>
> distcheck: dist
>
> rm -rf $(dummy)
>
> mkdir $(dummy)
>
> $(GZIP) -d -c $(distdir).tar.gz | $(TAR) xf -
>
> install_prefix=`cd $(dummy) && pwd`; \
>
> This to my understanding means that gzip is expected to exist.
>
> If this is correct, then simply checking for the headers should
>
> suffice, since that is the only dependency for the files to be
>
> created.
>
> If this is wrong, then I will add the discovery code as in the
>
> other patch.
>
> > -                   [
> >
> >
> > -                           'pg_receivewal', '-D',     $stream_dir, '--verbose',
> >
> >
> > -                           '--endpos',      $nextlsn, '-Z', '5'
> >
> >
> > -                   ],
> >
> >
> >
> > I would keep the compression level to a minimum here, to limit CPU
> >
> > usage but still compress something faster.
> >
> > -           # Verify compressed file's integrity
> >
> >
> > -           my $gzip_is_valid = system_log('gzip', '--test', $gzip_wals[0]);
> >
> >
> > -           is($gzip_is_valid, 0, "program gzip verified file's integrity");
> >
> >
> >
> > Shouldn't this be coded as a loop going through @gzip_wals?
>
> I would hope that there is only one gz file created. There is a line
>
> further up that tests exactly that.
>
> -   is (scalar(@gzip_wals), 1, "one gzip compressed WAL was created");


Let me amend that. The line should be instead:

      is (scalar(keys @gzip_wals), 1, "one gzip compressed WAL was created");

To properly test that there is one entry.

Let me provide with v2 to fix this.

Cheers,
//Georgios

>
>     Then there should also be a partial gz file which is tested further ahead.
>
>     Cheers,
>
>     //Georgios
>
> > Michael



Re: Introduce pg_receivewal gzip compression tests

From
gkokolatos@pm.me
Date:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Monday, July 12th, 2021 at 11:56, <gkokolatos@pm.me> wrote:

> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>
> On Monday, July 12th, 2021 at 11:42, gkokolatos@pm.me wrote:
>
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> >
> > On Monday, July 12th, 2021 at 08:42, Michael Paquier michael@paquier.xyz wrote:
> >
> > > On Fri, Jul 09, 2021 at 11:26:58AM +0000, Georgios wrote:
> > >
> > > > As suggested on a different thread [1], pg_receivewal can increase it's test
> > > >
> > > > coverage. There exists a non trivial amount of code that handles gzip
> > > >
> > > > compression. The current patch introduces tests that cover creation of gzip
> > > >
> > > > compressed WAL files and the handling of gzip partial segments. Finally the
> > > >
> > > > integrity of the compressed files is verified.
> > >
> > > -             # Verify compressed file's integrity
> > >
> > >
> > > -             my $gzip_is_valid = system_log('gzip', '--test', $gzip_wals[0]);
> > >
> > >
> > > -             is($gzip_is_valid, 0, "program gzip verified file's integrity");
> > >
> > >
> > >
> > > libz and gzip are usually split across different packages, hence there
> > >
> > > is no guarantee that this command is always available (same comment as
> > >
> > > for LZ4 from a couple of days ago).
> >
> > Of course. Though while going for it, I did find in Makefile.global.in:
> >
> > TAR = @TAR@
> >
> > XGETTEXT = @XGETTEXT@
> >
> > GZIP = gzip
> >
> > BZIP2 = bzip2
> >
> > DOWNLOAD = wget -O $@ --no-use-server-timestamps
> >
> > Which is also used by GNUmakefile.in
> >
> > distcheck: dist
> >
> > rm -rf $(dummy)
> >
> > mkdir $(dummy)
> >
> > $(GZIP) -d -c $(distdir).tar.gz | $(TAR) xf -
> >
> > install_prefix=`cd $(dummy) && pwd`; \
> >
> > This to my understanding means that gzip is expected to exist.
> >
> > If this is correct, then simply checking for the headers should
> >
> > suffice, since that is the only dependency for the files to be
> >
> > created.
> >
> > If this is wrong, then I will add the discovery code as in the
> >
> > other patch.
> >
> > > -                     [
> > >
> > >
> > > -                             'pg_receivewal', '-D',     $stream_dir, '--verbose',
> > >
> > >
> > > -                             '--endpos',      $nextlsn, '-Z', '5'
> > >
> > >
> > > -                     ],
> > >
> > >
> > >
> > > I would keep the compression level to a minimum here, to limit CPU
> > >
> > > usage but still compress something faster.
> > >
> > > -             # Verify compressed file's integrity
> > >
> > >
> > > -             my $gzip_is_valid = system_log('gzip', '--test', $gzip_wals[0]);
> > >
> > >
> > > -             is($gzip_is_valid, 0, "program gzip verified file's integrity");
> > >
> > >
> > >
> > > Shouldn't this be coded as a loop going through @gzip_wals?
> >
> > I would hope that there is only one gz file created. There is a line
> >
> > further up that tests exactly that.
> >
> > -   is (scalar(@gzip_wals), 1, "one gzip compressed WAL was created");
>
> Let me amend that. The line should be instead:
>
> is (scalar(keys @gzip_wals), 1, "one gzip compressed WAL was created");
>
> To properly test that there is one entry.
>
> Let me provide with v2 to fix this.


Please find v2 attached with the above.

Cheers,
//Georgios

>
> Cheers,
>
> //Georgios
>
> >     Then there should also be a partial gz file which is tested further ahead.
> >
> >     Cheers,
> >
> >     //Georgios
> >
> >
> > > Michael
Attachment

Re: Introduce pg_receivewal gzip compression tests

From
Gilles Darold
Date:
Le 12/07/2021 à 12:27, gkokolatos@pm.me a écrit :
>>>>
>>>> Shouldn't this be coded as a loop going through @gzip_wals?
>>> I would hope that there is only one gz file created. There is a line
>>>
>>> further up that tests exactly that.
>>>
>>> -   is (scalar(@gzip_wals), 1, "one gzip compressed WAL was created");
>> Let me amend that. The line should be instead:
>>
>> is (scalar(keys @gzip_wals), 1, "one gzip compressed WAL was created");
>>
>> To properly test that there is one entry.
>>
>> Let me provide with v2 to fix this.


The following tests are not correct in Perl even if Perl returns the
right value.

+    is (scalar(keys @gzip_wals), 1, "one gzip compressed WAL was created");


+    is (scalar(keys @gzip_partial_wals), 1,
+        "one partial gzip compressed WAL was created");


Function keys or values are used only with hashes but here you are using
arrays. To obtain the length of the array you can just use the scalar
function as Perl returns the length of the array when it is called in a
scalar context. Please use the following instead:


+    is (scalar(@gzip_wals), 1, "one gzip compressed WAL was created");


+    is (scalar(@gzip_partial_wals), 1,
+        "one partial gzip compressed WAL was created");


--
Gilles Darold
http://www.darold.net/





Re: Introduce pg_receivewal gzip compression tests

From
Michael Paquier
Date:
On Mon, Jul 12, 2021 at 09:42:32AM +0000, gkokolatos@pm.me wrote:
> This to my understanding means that gzip is expected to exist.
> If this is correct, then simply checking for the headers should
> suffice, since that is the only dependency for the files to be
> created.

You cannot expect this to work on Windows when it comes to MSVC for
example, as gzip may not be in the environment PATH so the test would
fail hard.  Let's just rely on $ENV{GZIP} instead, and skip the test
if it is not defined.
--
Michael

Attachment

Re: Introduce pg_receivewal gzip compression tests

From
gkokolatos@pm.me
Date:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Monday, July 12th, 2021 at 13:00, Gilles Darold <gilles@darold.net> wrote:

> Le 12/07/2021 à 12:27, gkokolatos@pm.me a écrit :
>
> > > > > Shouldn't this be coded as a loop going through @gzip_wals?
> > > > >
> > > > > I would hope that there is only one gz file created. There is a line
> > > >
> > > > further up that tests exactly that.
> > > >
> > > > -   is (scalar(@gzip_wals), 1, "one gzip compressed WAL was created");
> > > >
> > > >     Let me amend that. The line should be instead:
> > >
> > > is (scalar(keys @gzip_wals), 1, "one gzip compressed WAL was created");
> > >
> > > To properly test that there is one entry.
> > >
> > > Let me provide with v2 to fix this.
>
> The following tests are not correct in Perl even if Perl returns the
>
> right value.
>
> +    is (scalar(keys @gzip_wals), 1, "one gzip compressed WAL was created");
>
> +    is (scalar(keys @gzip_partial_wals), 1,
>
> +        "one partial gzip compressed WAL was created");
>
> Function keys or values are used only with hashes but here you are using
>
> arrays. To obtain the length of the array you can just use the scalar
>
> function as Perl returns the length of the array when it is called in a
>
> scalar context. Please use the following instead:
>
> +    is (scalar(@gzip_wals), 1, "one gzip compressed WAL was created");
>
> +    is (scalar(@gzip_partial_wals), 1,
>
> +        "one partial gzip compressed WAL was created");

You are absolutely correct. I had used that in v1, yet since it got called out
I doubted myself, assumed I was wrong and the rest is history. I shall ament the
amendment for v3 of the patch.

Cheers,
//Georgios

>
>
>
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Gilles Darold
>
> http://www.darold.net/



Re: Introduce pg_receivewal gzip compression tests

From
gkokolatos@pm.me
Date:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Monday, July 12th, 2021 at 13:04, Michael Paquier <michael@paquier.xyz> wrote:

> On Mon, Jul 12, 2021 at 09:42:32AM +0000, gkokolatos@pm.me wrote:
>
> > This to my understanding means that gzip is expected to exist.
> >
> > If this is correct, then simply checking for the headers should
> >
> > suffice, since that is the only dependency for the files to be
> >
> > created.
>
> You cannot expect this to work on Windows when it comes to MSVC for
>
> example, as gzip may not be in the environment PATH so the test would
>
> fail hard. Let's just rely on $ENV{GZIP} instead, and skip the test
>
> if it is not defined.

I am admittedly not so well versed on Windows systems. Thank you for
informing me.

Please find attached v3 of the patch where $ENV{GZIP_PROGRAM} is used
instead. To the best of my knowledge one should avoid using $ENV{GZIP}
because that would translate to the obsolete, yet used environment
variable GZIP which holds a set of default options for gzip. In essence
it would be equivalent to executing:
   GZIP=gzip gzip --test <files>
which can result to errors similar to:
   gzip: gzip: non-option in GZIP environment variable

Cheers,
//Georgios

>
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Michael
Attachment

Re: Introduce pg_receivewal gzip compression tests

From
gkokolatos@pm.me
Date:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Monday, July 12th, 2021 at 17:07, <gkokolatos@pm.me> wrote:

> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>
> On Monday, July 12th, 2021 at 13:04, Michael Paquier michael@paquier.xyz wrote:
>
> > On Mon, Jul 12, 2021 at 09:42:32AM +0000, gkokolatos@pm.me wrote:
> >
> > > This to my understanding means that gzip is expected to exist.
> > >
> > > If this is correct, then simply checking for the headers should
> > >
> > > suffice, since that is the only dependency for the files to be
> > >
> > > created.
> >
> > You cannot expect this to work on Windows when it comes to MSVC for
> >
> > example, as gzip may not be in the environment PATH so the test would
> >
> > fail hard. Let's just rely on $ENV{GZIP} instead, and skip the test
> >
> > if it is not defined.
>
> I am admittedly not so well versed on Windows systems. Thank you for
>
> informing me.
>
> Please find attached v3 of the patch where $ENV{GZIP_PROGRAM} is used
>
> instead. To the best of my knowledge one should avoid using $ENV{GZIP}
>
> because that would translate to the obsolete, yet used environment
>
> variable GZIP which holds a set of default options for gzip. In essence
>
> it would be equivalent to executing:
>
> GZIP=gzip gzip --test <files>
>
> which can result to errors similar to:
>
> gzip: gzip: non-option in GZIP environment variable
>

After a bit more thinking, I went ahead and added on top of v3 a test
verifying that the gzip program can actually be called.

Please find v4 attached.

Cheers,
//Georgios

>
> > Michael
Attachment

Re: Introduce pg_receivewal gzip compression tests

From
Michael Paquier
Date:
On Mon, Jul 12, 2021 at 04:46:29PM +0000, gkokolatos@pm.me wrote:
> On Monday, July 12th, 2021 at 17:07, <gkokolatos@pm.me> wrote:
>> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

Are you using outlook?  The format of your messages gets blurry on the
PG website, so does it for me.

>> I am admittedly not so well versed on Windows systems. Thank you for
>> informing me.
>> Please find attached v3 of the patch where $ENV{GZIP_PROGRAM} is used
>> instead. To the best of my knowledge one should avoid using $ENV{GZIP}
>> because that would translate to the obsolete, yet used environment
>> variable GZIP which holds a set of default options for gzip. In essence
>> it would be equivalent to executing:
>> GZIP=gzip gzip --test <files>
>> which can result to errors similar to:
>> gzip: gzip: non-option in GZIP environment variable

-# make this available to TAP test scripts
+# make these available to TAP test scripts
 export TAR
+export GZIP_PROGRAM=$(GZIP)

Wow.  So this comes from the fact that the command gzip can feed on
the environment variable from the same name.  I was not aware of
that, and a comment would be in place here.  That means complicating a
bit the test flow for people on Windows, but I am fine to live with
that as long as this does not fail hard.  One extra thing we could do
is drop this part of the test, but I agree that this is useful to have
around as a validity check.

> After a bit more thinking, I went ahead and added on top of v3 a test
> verifying that the gzip program can actually be called.

+       system_log($gzip, '--version') != 0);
Checking after that does not hurt, indeed.  I am wondering if we
should do that for TAR.

Another thing I find unnecessary is the number of the tests.  This
does two rounds of pg_receivewal just to test the long and short
options of -Z/-compress, which brings only coverage to make sure that
both option names are handled.  That's a high cost for a low amound of
extra coverage, so let's cut the runtime in half and just use the
round with --compress.

There was also a bit of confusion with ZLIB and gzip in the variable
names and the comments, the latter being only the command while the
compression happens with zlib.  With a round of indentation on top of
all that, I ge tthe attached.

What do you think?
--
Michael

Attachment

Re: Introduce pg_receivewal gzip compression tests

From
gkokolatos@pm.me
Date:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Tuesday, July 13th, 2021 at 03:53, Michael Paquier <michael@paquier.xyz> wrote:

> On Mon, Jul 12, 2021 at 04:46:29PM +0000, gkokolatos@pm.me wrote:
>
> > On Monday, July 12th, 2021 at 17:07, gkokolatos@pm.me wrote:
> >
> > > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>
> Are you using outlook? The format of your messages gets blurry on the
>
> PG website, so does it for me.

I am using protonmail's web page. I was not aware of the issue. Thank you
for bringing it up to my attention. I shall try to address it.

>
> > > I am admittedly not so well versed on Windows systems. Thank you for
> > >
> > > informing me.
> > >
> > > Please find attached v3 of the patch where $ENV{GZIP_PROGRAM} is used
> > >
> > > instead. To the best of my knowledge one should avoid using $ENV{GZIP}
> > >
> > > because that would translate to the obsolete, yet used environment
> > >
> > > variable GZIP which holds a set of default options for gzip. In essence
> > >
> > > it would be equivalent to executing:
> > >
> > > GZIP=gzip gzip --test <files>
> > >
> > > which can result to errors similar to:
> > >
> > > gzip: gzip: non-option in GZIP environment variable
>
> -# make this available to TAP test scripts
>
> +# make these available to TAP test scripts
>
> export TAR
>
> +export GZIP_PROGRAM=$(GZIP)
>
> Wow. So this comes from the fact that the command gzip can feed on
>
> the environment variable from the same name. I was not aware of
>
> that, and a comment would be in place here. That means complicating a
>
> bit the test flow for people on Windows, but I am fine to live with
>
> that as long as this does not fail hard. One extra thing we could do
>
> is drop this part of the test, but I agree that this is useful to have
>
> around as a validity check.

Great.

>
> > After a bit more thinking, I went ahead and added on top of v3 a test
> >
> > verifying that the gzip program can actually be called.
>
> -         system_log($gzip, '--version') != 0);
>
>
>
> Checking after that does not hurt, indeed. I am wondering if we
>
> should do that for TAR.

I do not think that this will be a necessity for TAR. TAR after all
is discovered by autoconf, which gzip is not.

>
> Another thing I find unnecessary is the number of the tests. This
>
> does two rounds of pg_receivewal just to test the long and short
>
> options of -Z/-compress, which brings only coverage to make sure that
>
> both option names are handled. That's a high cost for a low amound of
>
> extra coverage, so let's cut the runtime in half and just use the
>
> round with --compress.

I am sorry this was not so clear. It is indeed running twice the binary
with different flags. However the goal is not to check the flags, but
to make certain that the partial file has now been completed. That is
why there was code asserting that the previous FILENAME.gz.partial file
after the second invocation is converted to FILENAME.gz.

Additionally the second invocation of pg_receivewal is extending the
coverage of FindStreamingStart().

The different flags was an added bonus.

>
> There was also a bit of confusion with ZLIB and gzip in the variable
>
> names and the comments, the latter being only the command while the
>
> compression happens with zlib. With a round of indentation on top of
>
> all that, I ge tthe attached.
>
> What do you think?

Thank you very much for the patch. I would prefer to keep the parts that
tests that .gz.partial are completed on a subsequent run if you agree.

Cheers,
//Georgios

>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Michael



Re: Introduce pg_receivewal gzip compression tests

From
Michael Paquier
Date:
On Tue, Jul 13, 2021 at 06:36:59AM +0000, gkokolatos@pm.me wrote:
> I am sorry this was not so clear. It is indeed running twice the binary
> with different flags. However the goal is not to check the flags, but
> to make certain that the partial file has now been completed. That is
> why there was code asserting that the previous FILENAME.gz.partial file
> after the second invocation is converted to FILENAME.gz.

The first run you are adding checks the same thing thanks to
pg_switch_wal(), where pg_receivewal completes the generation of
000000010000000000000002.gz and finishes with
000000010000000000000003.gz.partial.

> Additionally the second invocation of pg_receivewal is extending the
> coverage of FindStreamingStart().

Hmm.  It looks like a waste in runtime once we mix LZ4 in that as that
would mean 5 runs of pg_receivewal, but we really need only three of
them with --endpos:
- One with ZLIB compression.
- One with LZ4 compression.
- One without compression.

Do you think that we could take advantage of what is now the only run
of pg_receivewal --endpos for that?  We could make the ZLIB checks run
first, conditionally, and then let the last command with --endpos
perform a full scan of the contents in $stream_dir with the .gz files
already in place.  The addition of LZ4 would be an extra conditional
block similar to what's introduced in ZLIB, running before the last
command without compression.
--
Michael

Attachment

Re: Introduce pg_receivewal gzip compression tests

From
Michael Paquier
Date:
On Tue, Jul 13, 2021 at 04:37:53PM +0900, Michael Paquier wrote:
> Hmm.  It looks like a waste in runtime once we mix LZ4 in that as that
> would mean 5 runs of pg_receivewal, but we really need only three of
> them with --endpos:
> - One with ZLIB compression.
> - One with LZ4 compression.
> - One without compression.
>
> Do you think that we could take advantage of what is now the only run
> of pg_receivewal --endpos for that?  We could make the ZLIB checks run
> first, conditionally, and then let the last command with --endpos
> perform a full scan of the contents in $stream_dir with the .gz files
> already in place.  The addition of LZ4 would be an extra conditional
> block similar to what's introduced in ZLIB, running before the last
> command without compression.

Poking at this problem, I partially take this statement back as this
requires an initial run of pg_receivewal --endpos to ensure the
creation of one .gz and one .gz.partial.  So I guess that this should
be structured as:
1) Keep the existing pg_receivewal --endpos.
2) Add the ZLIB block, with one pg_receivewal --endpos.
3) Add at the end one extra pg_receivewal --endpos, outside of the ZLIB
block, which should check the creation of a .partial, non-compressed
segment.  This should mention that we place this command at the end of
the test for the start streaming point computation.

LZ4 tests would be then between 2) and 3), or 1) and 2).
--
Michael

Attachment

Re: Introduce pg_receivewal gzip compression tests

From
gkokolatos@pm.me
Date:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Tuesday, July 13th, 2021 at 09:37, Michael Paquier <michael@paquier.xyz> wrote:

> On Tue, Jul 13, 2021 at 06:36:59AM +0000, gkokolatos@pm.me wrote:
>
> > I am sorry this was not so clear. It is indeed running twice the binary
> > with different flags. However the goal is not to check the flags, but
> > to make certain that the partial file has now been completed. That is
> > why there was code asserting that the previous FILENAME.gz.partial file
> > after the second invocation is converted to FILENAME.gz.
>
> The first run you are adding checks the same thing thanks to
> pg_switch_wal(), where pg_receivewal completes the generation of
> 000000010000000000000002.gz and finishes with
> 000000010000000000000003.gz.partial.

This is correct. It is the 000000010000000000000003 awl that the rest
of the tests are targeting.

>
> > Additionally the second invocation of pg_receivewal is extending the
> > coverage of FindStreamingStart().
>
> Hmm. It looks like a waste in runtime once we mix LZ4 in that as that
> would mean 5 runs of pg_receivewal, but we really need only three of
> them with --endpos:
> -   One with ZLIB compression.
> -   One with LZ4 compression.
> -   One without compression.
>
>     Do you think that we could take advantage of what is now the only run
>     of pg_receivewal --endpos for that? We could make the ZLIB checks run
>     first, conditionally, and then let the last command with --endpos
>     perform a full scan of the contents in $stream_dir with the .gz files
>     already in place. The addition of LZ4 would be an extra conditional
>     block similar to what's introduced in ZLIB, running before the last
>     command without compression.

I will admit that for the current patch I am not taking lz4 into account as
at the moment I have little idea as to how the lz4 patch will advance with the
review rounds. I simply accepted that it will be rebased on top of the patch
in the current thread and probably need to modify the current then.

But I digress. I would like have some combination of .gz and .gz.parial but
I will not take too strong of a stance. I am happy to go with your suggestion.

Cheers,
//Georgios

>
>     --
>
>     Michael



Re: Introduce pg_receivewal gzip compression tests

From
gkokolatos@pm.me
Date:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Tuesday, July 13th, 2021 at 10:14, Michael Paquier <michael@paquier.xyz> wrote:

> On Tue, Jul 13, 2021 at 04:37:53PM +0900, Michael Paquier wrote:
>
> Poking at this problem, I partially take this statement back as this
> requires an initial run of pg_receivewal --endpos to ensure the
> creation of one .gz and one .gz.partial. So I guess that this should
> be structured as:
>
> 1.  Keep the existing pg_receivewal --endpos.
> 2.  Add the ZLIB block, with one pg_receivewal --endpos.
> 3.  Add at the end one extra pg_receivewal --endpos, outside of the ZLIB
>     block, which should check the creation of a .partial, non-compressed
>     segment. This should mention that we place this command at the end of
>     the test for the start streaming point computation.
>     LZ4 tests would be then between 2) and 3), or 1) and 2).

Sounds great. Let me cook up v6 for this.

>
>     --
>
>     Michael



Re: Introduce pg_receivewal gzip compression tests

From
Michael Paquier
Date:
On Tue, Jul 13, 2021 at 08:28:44AM +0000, gkokolatos@pm.me wrote:
> Sounds great. Let me cook up v6 for this.

Thanks.  Could you use v5 I posted upthread as a base?  There were
some improvements in the variable names, the comments and the test
descriptions.
--
Michael

Attachment

Re: Introduce pg_receivewal gzip compression tests

From
gkokolatos@pm.me
Date:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Tuesday, July 13th, 2021 at 12:26, Michael Paquier <michael@paquier.xyz> wrote:

> On Tue, Jul 13, 2021 at 08:28:44AM +0000, gkokolatos@pm.me wrote:
> > Sounds great. Let me cook up v6 for this.
>
> Thanks. Could you use v5 I posted upthread as a base? There were
> some improvements in the variable names, the comments and the test
> descriptions.

Agreed. For the record that is why I said v6 :)

Cheers,
//Georgios

>
-----------------------------------------------------------------------------------------------------------------------------------------------------
> Michael



Re: Introduce pg_receivewal gzip compression tests

From
Michael Paquier
Date:
On Tue, Jul 13, 2021 at 11:16:06AM +0000, gkokolatos@pm.me wrote:
> Agreed. For the record that is why I said v6 :)

Okay, thanks.
--
Michael

Attachment

Re: Introduce pg_receivewal gzip compression tests

From
gkokolatos@pm.me
Date:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Wednesday, July 14th, 2021 at 04:17, Michael Paquier <michael@paquier.xyz> wrote:

> On Tue, Jul 13, 2021 at 11:16:06AM +0000, gkokolatos@pm.me wrote:
> > Agreed. For the record that is why I said v6 :)
> Okay, thanks.

Please find v6 attached.

Cheers,
//Georgios

> ---------------
>
> Michael
Attachment

Re: Introduce pg_receivewal gzip compression tests

From
Michael Paquier
Date:
On Wed, Jul 14, 2021 at 02:11:09PM +0000, gkokolatos@pm.me wrote:
> Please find v6 attached.

Thanks.  I have spent some time checking this stuff in details, and
I did some tests on Windows while on it.  A run of pgperltidy was
missing.  A second thing is that you added one useless WAL segment
switch in the ZLIB block, and two at the end, causing the first two in
the set of three (one in the ZLIB block and one in the final command)
to be no-ops as they followed a previous WAL switch.  The final one
was not needed as no WAL is generated after that.

And applied.  Let's see if the buildfarm has anything to say.  Perhaps
this will even catch some bugs that pre-existed.
--
Michael

Attachment

Re: Introduce pg_receivewal gzip compression tests

From
gkokolatos@pm.me
Date:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Thursday, July 15th, 2021 at 09:00, Michael Paquier <michael@paquier.xyz> wrote:

> On Wed, Jul 14, 2021 at 02:11:09PM +0000, gkokolatos@pm.me wrote:
>
> > Please find v6 attached.
>
> Thanks. I have spent some time checking this stuff in details, and
> I did some tests on Windows while on it. A run of pgperltidy was
> missing. A second thing is that you added one useless WAL segment
> switch in the ZLIB block, and two at the end, causing the first two in
> the set of three (one in the ZLIB block and one in the final command)
> to be no-ops as they followed a previous WAL switch. The final one
> was not needed as no WAL is generated after that.
>

Thank you for the work and comments.

> And applied. Let's see if the buildfarm has anything to say. Perhaps
> this will even catch some bugs that pre-existed.

Let us hope that it will prevent some bugs from happening.

Cheers,
//Georgios

>
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Michael



Re: Introduce pg_receivewal gzip compression tests

From
Michael Paquier
Date:
On Thu, Jul 15, 2021 at 07:48:08AM +0000, gkokolatos@pm.me wrote:
> Let us hope that it will prevent some bugs from happening.

The buildfarm has two reports.

1) bowerbird on Windows/MSVC:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2021-07-15%2010%3A30%3A36
pg_receivewal: fatal: could not fsync existing write-ahead log file
"000000010000000000000002.partial": Permission denied
not ok 20 - streaming some WAL using ZLIB compression
I don't think the existing code can be blamed for that as this means a
failure with gzflush().  Likely a concurrency issue as that's an
EACCES.  If that's repeatable, that could point to an actual issue
with pg_receivewal --compress.

2) curculio:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=curculio&dt=2021-07-15%2010%3A30%3A15
# Running: gzip --test

/home/pgbf/buildroot/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/t_020_pg_receivewal_primary_data/archive_wal/000000010000000000000002.gz

/home/pgbf/buildroot/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/t_020_pg_receivewal_primary_data/archive_wal/000000010000000000000003.gz.partial
gzip:

/home/pgbf/buildroot/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/t_020_pg_receivewal_primary_data/archive_wal/000000010000000000000003.gz.partial:
 unknown suffix: ignored
 not ok 24 - gzip verified the integrity of compressed WAL segments

Looking at the OpenBSD code (usr.bin/compress/main.c), long options
are supported, where --version does exit(0) without printing
anything, and --test is supported even if that's not on the man pages.
set_outfile() is doing a discard of the file suffixes it does not
recognize, and I think that their implementation bumps on .gz.partial
and generates an exit code of 512 to map with WARNING.  I still wish
to keep this test, and I'd like to think that the contents of
@zlib_wals are enough in terms of coverage.  What do you think?
--
Michael

Attachment

Re: Introduce pg_receivewal gzip compression tests

From
Michael Paquier
Date:
On Thu, Jul 15, 2021 at 08:35:27PM +0900, Michael Paquier wrote:
> 1) bowerbird on Windows/MSVC:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2021-07-15%2010%3A30%3A36
> pg_receivewal: fatal: could not fsync existing write-ahead log file
> "000000010000000000000002.partial": Permission denied
> not ok 20 - streaming some WAL using ZLIB compression
> I don't think the existing code can be blamed for that as this means a
> failure with gzflush().  Likely a concurrency issue as that's an
> EACCES.  If that's repeatable, that could point to an actual issue
> with pg_receivewal --compress.

For this one, I'll try to test harder on my own host.  I am curious to
see if the other Windows members running the TAP tests have anything
to say.  Looking at the code of zlib, this would come from gz_zero()
in gzflush(), which could blow up on a write() in gz_comp().

> 2) curculio:
>
> Looking at the OpenBSD code (usr.bin/compress/main.c), long options
> are supported, where --version does exit(0) without printing
> anything, and --test is supported even if that's not on the man pages.
> set_outfile() is doing a discard of the file suffixes it does not
> recognize, and I think that their implementation bumps on .gz.partial
> and generates an exit code of 512 to map with WARNING.  I still wish
> to keep this test, and I'd like to think that the contents of
> @zlib_wals are enough in terms of coverage.  What do you think?

After thinking more about this one, I have taken the course to just
remove the .gz.partial segment from the check, a full segment should
be enough in terms of coverage.  I prefer this simplification over a
rename of the .partial segment or a tweak of the error code to map
with WARNING.
--
Michael

Attachment

Re: Introduce pg_receivewal gzip compression tests

From
gkokolatos@pm.me
Date:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Thursday, July 15th, 2021 at 14:35, Michael Paquier <michael@paquier.xyz> wrote:

> On Thu, Jul 15, 2021 at 08:35:27PM +0900, Michael Paquier wrote:

>
> > 2.  curculio:
> > Looking at the OpenBSD code (usr.bin/compress/main.c), long options
> > are supported, where --version does exit(0) without printing
> > set_outfile() is doing a discard of the file suffixes it does not
> > recognize, and I think that their implementation bumps on .gz.partial
> > and generates an exit code of 512 to map with WARNING. I still wish
> > to keep this test, and I'd like to think that the contents of
> > @zlib_wals are enough in terms of coverage. What do you think?
>
> After thinking more about this one, I have taken the course to just
> remove the .gz.partial segment from the check, a full segment should
> be enough in terms of coverage. I prefer this simplification over a
> rename of the .partial segment or a tweak of the error code to map
> with WARNING.

Fair enough.

Cheers,
//Georgios

>
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Michael



Re: Introduce pg_receivewal gzip compression tests

From
Michael Paquier
Date:
On Thu, Jul 15, 2021 at 09:35:52PM +0900, Michael Paquier wrote:
> For this one, I'll try to test harder on my own host.  I am curious to
> see if the other Windows members running the TAP tests have anything
> to say.  Looking at the code of zlib, this would come from gz_zero()
> in gzflush(), which could blow up on a write() in gz_comp().

bowerbird has just failed for the second time in a row on EACCESS, so
there is more here than meets the eye.  Looking at the code, I think I
have spotted what it is and the buildfarm logs give a very good hint:
# Running: pg_receivewal -D
:/prog/bf/root/HEAD/pgsql.build/src/bin/pg_basebackup/tmp_check/t_020_pg_receivewal_primary_data/archive_wal
--verbose --endpos 0/3000028 --compress 1
pg_receivewal: starting log streaming at 0/2000000 (timeline 1)
pg_receivewal: fatal: could not fsync existing write-ahead log file
"000000010000000000000002.partial": Permission denied
not ok 20 - streaming some WAL using ZLIB compression

--compress is used and the sync fails for a non-compressed segment.
Looking at the code it is pretty obvious that open_walfile() is
getting confused with the handling of an existing .partial segment
while walmethods.c uses dir_data->compression in all the places that
matter.  So that's a legit bug, that happens only when mixing
pg_receivewal runs for where successive runs use the compression or
non-compression modes.

I am amazed that the other buildfarm members are not complaining, to
be honest.  jacana runs this TAP test with MinGW and ZLIB, and does
not complain.
--
Michael

Attachment

Re: Introduce pg_receivewal gzip compression tests

From
Michael Paquier
Date:
On Fri, Jul 16, 2021 at 08:59:11AM +0900, Michael Paquier wrote:
> --compress is used and the sync fails for a non-compressed segment.
> Looking at the code it is pretty obvious that open_walfile() is
> getting confused with the handling of an existing .partial segment
> while walmethods.c uses dir_data->compression in all the places that
> matter.  So that's a legit bug, that happens only when mixing
> pg_receivewal runs for where successive runs use the compression or
> non-compression modes.

Ditto.  After reading the code more carefully, the code is actually
able to work even if it could be cleaner:
1) dir_existsfile() would check for the existence of a
non-compressed, partial segment all the time.
2) If this non-compressed file was padded, the code would use
open_for_write() that would open a compressed, partial segment.
3) The compressed, partial segment would be the one flushed.

This behavior is rather debatable, and it would be more instinctive to
me to just skip any business related to the pre-padding if compression
is enabled, at the cost of one extra callback in WalWriteMethod to
grab the compression level (dir_open_for_write() skips that for
compression) to allow receivelog.c to handle that.  But at the same
time few users are going to care about that as pg_receivewal has most
likely always the same set of options, so complicating this code is
not really appealing either.

> I am amazed that the other buildfarm members are not complaining, to
> be honest.  jacana runs this TAP test with MinGW and ZLIB, and does
> not complain.

I have spent more time on that with my own environment, and while
testing I have bumped on a different issue with zlib, which was
really weird.  In the same scenario as above, gzdopen() has been
failing for me at step 2), causing the test to loop forever.  We
document to use DLLs for ZLIB coming from zlib.net, but the ones
available there are really outdated as far as I can see (found some
called zlib.lib/dll myself, breaking Solution.pm).  For now I have
disabled those tests on Windows to bring back bowerbird to green, but
there is something else going on here.  We don't do much tests with
ZLIB on Windows for pg_basebackup and pg_dump, so there may be some
more issues?

@Andrew: which version of ZLIB are you using on bowerbird?  That's the
one in c:\prog\3p64.  That's a zdll.lib, right?
--
Michael

Attachment

Re: Introduce pg_receivewal gzip compression tests

From
Michael Paquier
Date:
On Fri, Jul 16, 2021 at 02:08:57PM +0900, Michael Paquier wrote:
> This behavior is rather debatable, and it would be more instinctive to
> me to just skip any business related to the pre-padding if compression
> is enabled, at the cost of one extra callback in WalWriteMethod to
> grab the compression level (dir_open_for_write() skips that for
> compression) to allow receivelog.c to handle that.  But at the same
> time few users are going to care about that as pg_receivewal has most
> likely always the same set of options, so complicating this code is
> not really appealing either.

I have chewed on that over the weekend, and skipping the padding logic
if we are in compression mode in open_walfile() makes sense, so
attached is a patch that I'd like to backpatch.

Another advantage of this patch is the handling of ".gz" is reduced to
one code path instead of four.  That makes a bit easier the
introduction of new compression methods.

A second thing that was really confusing is that the name of the WAL
segment generated in this code path completely ignored the type of
compression.  This led to one confusing error message if failing to
open a segment for write where we'd mention a .partial file rather
than a .gz.partial file.  The versions of zlib I used on Windows
looked buggy so I cannot conclude there, but I am sure that this
should allow bowerbird to handle the test correctly.
--
Michael

Attachment

Re: Introduce pg_receivewal gzip compression tests

From
Michael Paquier
Date:
On Mon, Jul 19, 2021 at 04:03:33PM +0900, Michael Paquier wrote:
> Another advantage of this patch is the handling of ".gz" is reduced to
> one code path instead of four.  That makes a bit easier the
> introduction of new compression methods.
>
> A second thing that was really confusing is that the name of the WAL
> segment generated in this code path completely ignored the type of
> compression.  This led to one confusing error message if failing to
> open a segment for write where we'd mention a .partial file rather
> than a .gz.partial file.  The versions of zlib I used on Windows
> looked buggy so I cannot conclude there, but I am sure that this
> should allow bowerbird to handle the test correctly.

After more testing and more review, I have applied and backpatched
this stuff.  Another thing I did on HEAD was to enable again the ZLIB
portion of the pg_receivewal tests on Windows.  bowerdird should stay
green (I hope), and it is better to have as much more coverage as
possible for all that.
--
Michael

Attachment