Re: [PATCH] Refactor bytea_sortsupport(), take two - Mailing list pgsql-hackers

From Aleksander Alekseev
Subject Re: [PATCH] Refactor bytea_sortsupport(), take two
Date
Msg-id CAJ7c6TOP=kjnT+Aeoaige-qp7pPstZ_M8CcrMTfvwUaiYu952w@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Refactor bytea_sortsupport(), take two  (John Naylor <johncnaylorls@gmail.com>)
Responses Re: [PATCH] Refactor bytea_sortsupport(), take two
List pgsql-hackers
Hi John,

Many thanks for the feedback.

> For example in *_abbrev_convert:
>
> [...]
>
> The first part of the comment explained why we do this at all. The
> bytea type does not use strcoll, so it's possible that a future patch
> may decide to skip cardinality estimation entirely. It's obviously not
> the responsibility of this refactoring patch to investigate that, but
> throwing the reason out makes it harder for someone to discover that
> bytea could do something different here. In this case, maybe we could
> point to varstr_abbrev_convert instead of repeating the whole comment.

Fixed. I choose to repeat the entire comment since it is relatively
short in this case.

>         /*
> -        * Clamp cardinality estimates to at least one distinct value.  While
> -        * NULLs are generally disregarded, if only NULL values were
> seen so far,
> -        * that might misrepresent costs if we failed to clamp.
> +        * Clamp cardinality estimates to at least one distinct value.
>          */
>
> The old comment explained why we clamp, but the new comment just says
> what the code is doing, and it's obvious what the code is doing.

Fixed in the same manner.

> -       /*
> -        * In the worst case all abbreviated keys are identical, while
> at the same
> -        * time there are differences within full key strings not captured in
> -        * abbreviations.
> -        */
>
> Why is this gone? I could go on, but hopefully you get my point. If
> it's short, just keep it, and adjust as necessary. If it's long, point
> to varlena.c if the idea is the same.

OK, I made sure all the comments are in place. I referenced
corresponding varlena.c functions where the comments were too long to
my taste to repeat them entirely.

> Back to the actual patch:
>
> - * More generally, it's okay that bytea callers can have NUL bytes in
> - * strings because abbreviated cmp need not make a distinction between
> - * terminating NUL bytes, and NUL bytes representing actual NULs in the
> - * authoritative representation.  Hopefully a comparison at or past one
> - * abbreviated key's terminating NUL byte will resolve the comparison
> - * without consulting the authoritative representation; specifically, some
> - * later non-NUL byte in the longer string can resolve the comparison
> - * against a subsequent terminating NUL in the shorter string.  There will
> - * usually be what is effectively a "length-wise" resolution there and
> - * then.
> - *
> - * If that doesn't work out -- if all bytes in the longer string
> - * positioned at or past the offset of the smaller string's (first)
> - * terminating NUL are actually representative of NUL bytes in the
> - * authoritative binary string (perhaps with some *terminating* NUL bytes
> - * towards the end of the longer string iff it happens to still be small)
> - * -- then an authoritative tie-breaker will happen, and do the right
> - * thing: explicitly consider string length.
>
> Why is this gone -- AFAICT it's still true for bytea, no matter what
> file it's located in?

Actually for bytea_abbrev_convert() this comment makes no sense IMO.
Presumably the reader is aware that '\x0000...' is a valid bytea, and
there is no such thing as "terminating NUL bytes" for bytea.

For varstr_abbrev_convert() as you correctly pointed out before [1] we
actually disallow NUL bytes [2]. The reason why the comment is
currently there is that varstr_abbrev_convert() may have bytea callers
which may have NUL bytes.

So it seems to me that rewriting this particular part is exactly in
scope of the refactoring, unless I'm missing something.

[1]: https://postgr.es/m/CANWCAZbDKESq30bn_6QPZqOyrP7JYxxz27Q5gymv0qJEDVj6_A%40mail.gmail.com
[2]: https://www.postgresql.org/docs/current/datatype-character.html

-- 
Best regards,
Aleksander Alekseev

Attachment

pgsql-hackers by date:

Previous
From: Tender Wang
Date:
Subject: Fix a typo in the comment for gettuple_eval_partition()
Next
From: Álvaro Herrera
Date:
Subject: Re: [PATCH] Add pg_get_tablespace_ddl() function to reconstruct CREATE TABLESPACE statement