Re: Decompression bug in astreamer_lz4 - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Decompression bug in astreamer_lz4
Date
Msg-id aFyGYWedCAs_gr21@paquier.xyz
List pgsql-hackers
On Wed, Jun 25, 2025 at 11:07:32PM +0300, Mikhail Gribkov wrote:
> The same situation can be reproduced using randomly filled files. Let’s
> imagine the extension saves its own statistics in pg_stat:
> dd if=/dev/urandom of=$PGDATA/pg_stat/junk bs=1M count=16

Original, I like that.  It's not hard to imagine some extension code
that decides to dump some file at the root of the data directory using
one of the shutdown callbacks.  Not seeing the problem would then be a
matter of luck.  So yes, that's a legit bug IMO.

> In the vast majority of cases the mystreamer->bytes_written value is zero
> at the moment of next_out/avail_out initialization, thus the problem does
> not appear. In fact we were totally unable to reproduce a non-zero initial
> value of mystreamer->bytes_written without poorly-compressible custom
> format files. I'm not that deep in the compression algorithms to say
> exactly why that is, and I realize that very few users will really face
> the problem, but the problem is definitely there.

That's not surprising, noted.

> Attached is a simple patch implementing the above fix. The issue disappears
> with it.

It seems like you have spent some time digging into all that; thanks
for the analysis.
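
If I am reading the report correctly, the gist is that next_out and
avail_out get reset to the top of the output buffer even if some
decompressed bytes are still waiting to be flushed to the next
streamer, so these pending bytes get overwritten.  Here is a small
standalone simulation of that pattern; the struct and function names
are made up, only next_out, avail_out and bytes_written come from your
description, so don't read it as the actual astreamer code or your
patch:

#include <stdio.h>
#include <string.h>

#define BUFSZ 8

struct streamer
{
    char        buffer[BUFSZ + 1];  /* fixed-size output buffer */
    size_t      bytes_written;      /* decompressed, not yet flushed */
};

/* Copy one decompressed chunk into the streamer's output buffer. */
static void
append_chunk(struct streamer *st, const char *chunk, int fixed)
{
    char       *next_out;
    size_t      avail_out;
    size_t      n = strlen(chunk);

    if (fixed)
    {
        /* account for bytes already sitting in the buffer */
        next_out = st->buffer + st->bytes_written;
        avail_out = BUFSZ - st->bytes_written;
    }
    else
    {
        /* buggy pattern: always restart at the top of the buffer */
        next_out = st->buffer;
        avail_out = BUFSZ;
    }

    if (n > avail_out)
        n = avail_out;
    memcpy(next_out, chunk, n);
    st->bytes_written += n;
}

int
main(void)
{
    struct streamer bug = {"", 0};
    struct streamer fix = {"", 0};

    /* two chunks arrive before the buffer gets flushed downstream */
    append_chunk(&bug, "AAAA", 0);
    append_chunk(&bug, "BBBB", 0);
    append_chunk(&fix, "AAAA", 1);
    append_chunk(&fix, "BBBB", 1);

    printf("buggy: %s\n", bug.buffer);  /* BBBB, first chunk overwritten */
    printf("fixed: %s\n", fix.buffer);  /* AAAABBBB */
    return 0;
}

With the first pattern, the bytes of the first chunk are silently
overwritten while bytes_written keeps growing, which is consistent
with the checksum mismatches showing up once more than one chunk is
produced before a flush.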

> What do you think?

I think that it would be nice to have some test coverage for such
cases in pg_verifybackup, for both client-side and server-side
compressed backups, relying on the backend to generate some random
data dumped into a file at the root of the data folder.  You could
just plug that into one of the scripts before taking a backup, and
check all the compression methods we need to support.  See for example
brin.sql, which uses a combination of string_agg() and fipshash(), as
one reference; we could just reuse something like that.

I am detecting a threshold close to 512kB to be able to reproduce the
problem, so we don't need a file as large as the 16MB one above to see
the failure, keeping such tests fast with less data required, as long
as the file is written once into a data folder used by all the backups
taken:
$ rm $PGDATA/junk && dd if=/dev/urandom of=$PGDATA/junk bs=1k count=512
$ rm -r /tmp/client-backup-lz4/*
$ pg_basebackup --format=tar --no-sync -cfast \
   -Xfetch --compress client-lz4 -D /tmp/client-backup-lz4
$ pg_verifybackup -n /tmp/client-backup-lz4/
pg_verifybackup: error: checksum mismatch for file "junk" in archive "base.tar.lz4"
--
Michael

