Home > mailing lists

Re: Add LZ4 compression in pg_dump - Mailing list pgsql-hackers

From	gkokolatos@pm.me
Subject	Re: Add LZ4 compression in pg_dump
Date	January 25, 2023 19:50:49
Msg-id	dtVQJmORKPCq98pCTjlcc5fl4y-ktNMCHZNgU4z55nzNZV__auUO_cuoviQEcr8cSQt6gKkylEK3Y_9FA9ppRDVgfUHEOZgztFoVrNcYfSw=@pm.me Whole thread
In response to	Re: Add LZ4 compression in pg_dump (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List	pgsql-hackers

Tree view





------- Original Message -------
On Wednesday, January 25th, 2023 at 6:28 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:


>
>
>
> On 1/25/23 16:37, gkokolatos@pm.me wrote:
>
> > ------- Original Message -------
> > On Wednesday, January 25th, 2023 at 2:42 AM, Justin Pryzby pryzby@telsasoft.com wrote:
> >
> > > On Tue, Jan 24, 2023 at 03:56:20PM +0000, gkokolatos@pm.me wrote:
> > >
> > > > On Monday, January 23rd, 2023 at 7:00 PM, Justin Pryzby pryzby@telsasoft.com wrote:
> > > >
> > > > > On Mon, Jan 23, 2023 at 05:31:55PM +0000, gkokolatos@pm.me wrote:
> > > > >
> > > > > > Please find attached v23 which reintroduces the split.
> > > > > >
> > > > > > 0001 is reworked to have a reduced footprint than before. Also in an attempt
> > > > > > to facilitate the readability, 0002 splits the API's and the uncompressed
> > > > > > implementation in separate files.
> > > > >
> > > > > Thanks for updating the patch. Could you address the review comments I
> > > > > sent here ?
> > > > > https://www.postgresql.org/message-id/20230108194524.GA27637%40telsasoft.com
> > > >
> > > > Please find v24 attached.
> > >
> > > Thanks for updating the patch.
> > >
> > > In 001, RestoreArchive() does:
> > >
> > > > -#ifndef HAVE_LIBZ
> > > > - if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP &&
> > > > - AH->PrintTocDataPtr != NULL)
> > > > + supports_compression = false;
> > > > + if (AH->compression_spec.algorithm == PG_COMPRESSION_NONE ||
> > > > + AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> > > > + supports_compression = true;
> > > > +
> > > > + if (AH->PrintTocDataPtr != NULL)
> > > > {
> > > > for (te = AH->toc->next; te != AH->toc; te = te->next)
> > > > {
> > > > if (te->hadDumper && (te->reqs & REQ_DATA) != 0)
> > > > - pg_fatal("cannot restore from compressed archive (compression not supported in this installation)");
> > > > + {
> > > > +#ifndef HAVE_LIBZ
> > > > + if (AH->compression_spec.algorithm == PG_COMPRESSION_GZIP)
> > > > + supports_compression = false;
> > > > +#endif
> > > > + if (supports_compression == false)
> > > > + pg_fatal("cannot restore from compressed archive (compression not supported in this installation)");
> > > > + }
> > > > }
> > > > }
> > > > -#endif
> > >
> > > This first checks if the algorithm is implemented, and then checks if
> > > the algorithm is supported by the current build - that confused me for a
> > > bit. It seems unnecessary to check for unimplemented algorithms before
> > > looping. That also requires referencing both GZIP and LZ4 in two
> > > places.
> >
> > I am not certain that it is unnecessary, at least not in the way that is
> > described. The idea is that new compression methods can be added, without
> > changing the archive's version number. It is very possible that it is
> > requested to restore an archive compressed with a method not implemented
> > in the current binary. The first check takes care of that and sets
> > supports_compression only for the supported versions. It is possible to
> > enter the loop with supports_compression already set to false, for example
> > because the archive was compressed with ZSTD, triggering the fatal error.
> >
> > Of course, one can throw the error before entering the loop, yet I think
> > that it does not help the readability of the code. IMHO it is easier to
> > follow if the error is thrown once during that check.
>
>
> Actually, I don't understand why 0001 moves the check into the loop. I
> mean, why not check HAVE_LIBZ before the loop?

The intention is to be able to restore archives that don't contain
data. In that case compression becomes irrelevant as only the data in
an archive is compressed.

>
> > > I think it could be written to avoid the need to change for added
> > > compression algorithms:
> > >
> > > + if (te->hadDumper && (te->reqs & REQ_DATA) != 0)
> > >
> > > + {
> > > + /* Check if the compression algorithm is supported */
> > > + pg_compress_specification spec;
> > > + parse_compress_specification(AH->compression_spec.algorithm, NULL, &spec);
> > >
> > > + if (spec->parse_error != NULL)
> > >
> > > + pg_fatal(spec->parse_error);
> > >
> > > + }
> >
> > I am not certain how that would work in the example with ZSTD above.
> > If I am not wrong, parse_compress_specification() will not throw an error
> > if the codebase supports ZSTD, yet this specific pg_dump binary will not
> > support it because ZSTD is not implemented. parse_compress_specification()
> > is not aware of that and should not be aware of it, should it?
>
>
> Not sure. What happens in a similar situation now? That is, when trying
> to deal with an archive gzip-compressed in a build without libz?


In case that there are no data chunks, the archive will be restored.

Cheers,
//Georgios


>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

pgsql-hackers by date:

From: Aleksander Alekseev
Date: 25 January 2023, 19:27:38
Subject: Re: [PATCH] Make ON CONFLICT DO NOTHING and ON CONFLICT DO UPDATE consistent

From: Aleksander Alekseev
Date: 25 January 2023, 19:55:48
Subject: Re: to_hex() for negative inputs

Re: Add LZ4 compression in pg_dump - Mailing list pgsql-hackers

Previous

Next