Re: Add LZ4 compression in pg_dump - Mailing list pgsql-hackers

From gkokolatos@pm.me
Subject Re: Add LZ4 compression in pg_dump
Date
Msg-id XPMIIorJ5munrlLRZGquartiN5tuapdjiBhaPZ6iEmJccRiKWCIHSowSLPhaGj4gLpqvaS1pQn7PGkA_WNwvuUTVdl-lrU_TzHA81LqujJ4=@pm.me
Whole thread Raw
In response to Re: Add LZ4 compression in pg_dump  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: Add LZ4 compression in pg_dump  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Re: Add LZ4 compression in pg_dump  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Re: Add LZ4 compression in pg_dump  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers




------- Original Message -------
On Thursday, March 16th, 2023 at 10:20 PM, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:


>
>
>
>
> On 3/16/23 18:04, gkokolatos@pm.me wrote:
>
> > ------- Original Message -------
> > On Tuesday, March 14th, 2023 at 4:32 PM, Tomas Vondra tomas.vondra@enterprisedb.com wrote:
> >
> > > On 3/14/23 16:18, gkokolatos@pm.me wrote:
> > >
> > > > ...> Would you mind me trying to come with a patch to address your points?
> > >
> > > That'd be great, thanks. Please keep it split into smaller patches - two
> > > might work, with one patch for "cosmetic" changes and the other tweaking
> > > the API error-handling stuff.
> >
> > Please find attached a set for it. I will admit that the splitting in the
> > series might not be ideal and what you requested. It is split on what
> > seemed as a logical units. Please advice how a better split can look like.
> >
> > 0001 is unifying types and return values on the API
> > 0002 is addressing the constant definitions
> > 0003 is your previous 0004 adding comments
>
>
> Thanks. I think the split seems reasonable - the goal was to not mix
> different changes, and from that POV it works.
>
> I'm not sure I understand the Gzip_read/Gzip_write changes in 0001. I
> mean, gzread/gzwrite returns int, so how does renaming the size_t
> variable solve the issue of negative values for errors? I mean, this
>
> - size_t ret;
> + size_t gzret;
>
> - ret = gzread(gzfp, ptr, size);
> + gzret = gzread(gzfp, ptr, size);
>
> means we still lost the information gzread() returned a negative value,
> no? We'll still probably trigger an error, but it's a bit weird.

You are obviously correct. My bad, I miss-read the return type of gzread().

Please find an amended version attached.

> Unless I'm missing something, if gzread() ever returns -1 or some other
> negative error value, we'll cast it to size_t, while condition will
> evaluate to "true" and we'll happily chew on some random chunk of data.
>
> So the confusion is (at least partially) a preexisting issue ...
>
> For gzwrite() it seems to be fine, because that only returns 0 on error.
> OTOH it's defined to take 'int size' but then we happily pass size_t
> values to it.
>
> As I wrote earlier, this apparently assumes we never need to deal with
> buffers larger than int, and I don't think we have the ambition to relax
> that (I'm not sure it's even needed / possible).

Agreed.


> I see the read/write functions are now defined as int, but we only ever
> return 0/1 from them, and then interpret that as bool. Why not to define
> it like that? I don't think we need to adhere to the custom that
> everything returns "int". This is an internal API. Or if we want to
> stick to int, I'd define meaningful "nice" constants for 0/1.

The return types are now booleans and the callers have been made aware.


> 0002 seems fine to me. I see you've ditched the idea of having two
> separate buffers, and replaced them with DEFAULT_IO_BUFFER_SIZE. Fine
> with me, although I wonder if this might have negative impact on
> performance or something (but I doubt that).
>

I doubt that too. Thank you.

> 0003 seems fine too.

Thank you.


> > As far as the error handling is concerned, you had said upthread:
> >
> > > I think the right approach is to handle all library errors and not just
> > > let them through. So Gzip_write() needs to check the return value, and
> > > either call pg_fatal() or translate it to an error defined by the API.
> >
> > While working on it, I thought it would be clearer and more consistent
> > for the pg_fatal() to be called by the caller of the individual functions.
> > Each individual function can keep track of the specifics of the error
> > internally. Then the caller upon detecting that there was an error by
> > checking the return value, can call pg_fatal() with a uniform error
> > message and then add the specifics by calling the get_error_func().
>
>
> I agree it's cleaner the way you did it.
>
> I was thinking that with each compression function handling error
> internally, the callers would not need to do that. But I haven't
> realized there's logic to detect ENOSPC and so on, and we'd need to
> duplicate that in every compression func.
>

If you agree, I can prepare a patch to improve on the error handling
aspect of the API as a separate thread, since here we are trying to
focus on correctness.

Cheers,
//Georgios

>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
Attachment

pgsql-hackers by date:

Previous
From: Zheng Li
Date:
Subject: Re: Support logical replication of DDLs
Next
From: Paul Jungwirth
Date:
Subject: Re: Exclusion constraints on partitioned tables