Re: [PATCH] Refactor bytea_sortsupport(), take two - Mailing list pgsql-hackers
| From | Aleksander Alekseev |
|---|---|
| Subject | Re: [PATCH] Refactor bytea_sortsupport(), take two |
| Date | |
| Msg-id | CAJ7c6TOP=kjnT+Aeoaige-qp7pPstZ_M8CcrMTfvwUaiYu952w@mail.gmail.com Whole thread Raw |
| In response to | Re: [PATCH] Refactor bytea_sortsupport(), take two (John Naylor <johncnaylorls@gmail.com>) |
| Responses |
Re: [PATCH] Refactor bytea_sortsupport(), take two
|
| List | pgsql-hackers |
Hi John, Many thanks for the feedback. > For example in *_abbrev_convert: > > [...] > > The first part of the comment explained why we do this at all. The > bytea type does not use strcoll, so it's possible that a future patch > may decide to skip cardinality estimation entirely. It's obviously not > the responsibility of this refactoring patch to investigate that, but > throwing the reason out makes it harder for someone to discover that > bytea could do something different here. In this case, maybe we could > point to varstr_abbrev_convert instead of repeating the whole comment. Fixed. I choose to repeat the entire comment since it is relatively short in this case. > /* > - * Clamp cardinality estimates to at least one distinct value. While > - * NULLs are generally disregarded, if only NULL values were > seen so far, > - * that might misrepresent costs if we failed to clamp. > + * Clamp cardinality estimates to at least one distinct value. > */ > > The old comment explained why we clamp, but the new comment just says > what the code is doing, and it's obvious what the code is doing. Fixed in the same manner. > - /* > - * In the worst case all abbreviated keys are identical, while > at the same > - * time there are differences within full key strings not captured in > - * abbreviations. > - */ > > Why is this gone? I could go on, but hopefully you get my point. If > it's short, just keep it, and adjust as necessary. If it's long, point > to varlena.c if the idea is the same. OK, I made sure all the comments are in place. I referenced corresponding varlena.c functions where the comments were too long to my taste to repeat them entirely. > Back to the actual patch: > > - * More generally, it's okay that bytea callers can have NUL bytes in > - * strings because abbreviated cmp need not make a distinction between > - * terminating NUL bytes, and NUL bytes representing actual NULs in the > - * authoritative representation. Hopefully a comparison at or past one > - * abbreviated key's terminating NUL byte will resolve the comparison > - * without consulting the authoritative representation; specifically, some > - * later non-NUL byte in the longer string can resolve the comparison > - * against a subsequent terminating NUL in the shorter string. There will > - * usually be what is effectively a "length-wise" resolution there and > - * then. > - * > - * If that doesn't work out -- if all bytes in the longer string > - * positioned at or past the offset of the smaller string's (first) > - * terminating NUL are actually representative of NUL bytes in the > - * authoritative binary string (perhaps with some *terminating* NUL bytes > - * towards the end of the longer string iff it happens to still be small) > - * -- then an authoritative tie-breaker will happen, and do the right > - * thing: explicitly consider string length. > > Why is this gone -- AFAICT it's still true for bytea, no matter what > file it's located in? Actually for bytea_abbrev_convert() this comment makes no sense IMO. Presumably the reader is aware that '\x0000...' is a valid bytea, and there is no such thing as "terminating NUL bytes" for bytea. For varstr_abbrev_convert() as you correctly pointed out before [1] we actually disallow NUL bytes [2]. The reason why the comment is currently there is that varstr_abbrev_convert() may have bytea callers which may have NUL bytes. So it seems to me that rewriting this particular part is exactly in scope of the refactoring, unless I'm missing something. [1]: https://postgr.es/m/CANWCAZbDKESq30bn_6QPZqOyrP7JYxxz27Q5gymv0qJEDVj6_A%40mail.gmail.com [2]: https://www.postgresql.org/docs/current/datatype-character.html -- Best regards, Aleksander Alekseev
Attachment
pgsql-hackers by date: