Re: [proposal] de-TOAST'ing using a iterator - Mailing list pgsql-hackers
From | John Naylor |
---|---|
Subject | Re: [proposal] de-TOAST'ing using a iterator |
Date | |
Msg-id | CACPNZCuTJk5SUsjQAZqdg43OTq8+QX6Xi49BWVUmZhJuH67efw@mail.gmail.com Whole thread Raw |
In response to | Re: [proposal] de-TOAST'ing using a iterator (Binguo Bao <djydewang@gmail.com>) |
Responses |
Re: [proposal] de-TOAST'ing using a iterator
|
List | pgsql-hackers |
On Fri, Aug 16, 2019 at 10:48 PM Binguo Bao <djydewang@gmail.com> wrote: > [v8 patch with cosmetic changes] Okay, looks good. I'll make a few style suggestions and corrections. In the course of looking at this again, I have a few other questions below as well. It looks like you already do this for the most part, but I'll mention that we try to keep lines, including comments, less than 80 characters long. pgindent can try to fix that, but the results don't always look nice. About variable names: The iterator pointers are variously called "iter", "iterator", and "fetch_iter". I found this confusing the first time I read this code. I think we should use "iter" if we have only one kind in the function, and "detoast_iter" and "fetch_iter" if we have both kinds. -- init_detoast_iterator(): + * The "iterator" variable is normally just a local variable in the caller. I don't think this comment is helpful to understand this function or its use. + * It only make sense to initialize de-TOAST iterator for external on-disk value. s/make/makes/ "a" de-TOAST iterator s/value/values/ The comments in this function that start with "This is a ..." could be shortened like this: /* indirect pointer -- dereference it */ While looking at this again, I noticed we no longer need to test for the in-line compressed case at all. I also tried some other cosmetic rearrangements. Let me know what you think about the attached patch. Also, I wonder if the VARATT_IS_EXTERNAL_INDIRECT case should come first. Then the two normal cases are next to eachother. free_detoast_iterator(), free_fetch_datum_iterator(), and free_toast_buffer(): These functions should return void. + * Free the memory space occupied by the de-TOAST iterator include buffers and + * fetch datum iterator. Perhaps "Free memory used by the de-TOAST iterator, including buffers and fetch datum iterator." The check if (iter->buf != iter->fetch_datum_iterator->buf) is what we need to do for the compressed case. Could we use this directly instead of having a separate state variable iter->compressed, with a macro like this? #define TOAST_ITER_COMPRESSED(iter) \ (iter->buf != iter->fetch_datum_iterator->buf) Or maybe that's too clever? detoast_iterate(): + * As long as there is another data chunk in compression or external storage, We no longer use the iterator with in-line compressed values. + * de-TOAST it into toast buffer in iterator. Maybe "into the iterator's toast buffer" fetch_datum_iterate(): My remarks for detoast_iterate() also apply here. init_toast_buffer(): + * Note the constrain buf->position <= buf->limit may be broken + * at initialization. Make sure that the constrain is satisfied + * when consume chars. s/constrain/constraint/ (2 times) s/consume/consuming/ Also, this comment might be better at the top the whole function? pglz_decompress_iterate(): + * Decompresses source into dest until the source is exhausted. This comment is from the original function, but I think it would be better to highlight the differences from the original, something like: "This function is based on pglz_decompress(), with these additional requirements: 1. We need to save the current control byte and byte position for the caller's next iteration. 2. In pglz_decompress(), we can assume we have all the source bytes available. This is not the case when we decompress one chunk at a time, so we have to make sure that we only read bytes available in the current chunk." (I'm not sure about the term 'byte position', maybe there's a better one.) + * In the while loop, sp may go beyond the srcend, provides a four-byte + * buffer to prevent sp from reading unallocated bytes from source buffer. + * When source->limit reaches source->capacity, don't worry about reading + * unallocated bytes. Here's my suggestion: "In the while loop, sp may be incremented such that it points beyond srcend. To guard against reading beyond the end of the current chunk, we set srcend such that we exit the loop when we are within four bytes of the end of the current chunk. When source->limit reaches source->capacity, we are decompressing the last chunk, so we can (and need to) read every byte." + for (; ctrlc < 8 && sp < srcend && dp < destend; ctrlc++) Note you can also replace 8 with INVALID_CTRLC here. tuptoaster.h: + * Constrains that need to be satisfied: s/constrains/constraints/ + * If "ctrlc" field in iterator is equal to INVALID_CTRLC, it means that + * the field is invalid and need to read the control byte from the + * source buffer in the next iteration, see pglz_decompress_iterate(). + */ +#define INVALID_CTRLC 8 I think the macro might be better placed in pg_lzcompress.h, and for consistency used in pglz_decompress(). Then the comment can be shorter and more general. With my additional comment in init_detoast_iterator(), hopefully it will be clear to readers. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pgsql-hackers by date: