From 48d74effe7043576008f31551e7f1ac08d24496b Mon Sep 17 00:00:00 2001
From: Aleksander Alekseev
Date: Wed, 17 Aug 2022 20:48:43 +0300
Subject: [PATCH v1] Clarify the comments about varlena header encoding

This patch fixes somewhat misleading comments regarding the encoding of
the varlena header.

Author: Aleksander Alekseev
Reviewed-by: TODO FIXME
Discussion: TODO FIXME
---
 src/include/postgres.h | 39 ++++++++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 15 deletions(-)

diff --git a/src/include/postgres.h b/src/include/postgres.h
index 31358110dc..0f9dac73ec 100644
--- a/src/include/postgres.h
+++ b/src/include/postgres.h
@@ -178,27 +178,36 @@ typedef struct
 /*
  * Bit layouts for varlena headers on big-endian machines:
  *
- *	00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
- *	01xxxxxx 4-byte length word, aligned, *compressed* data (up to 1G)
- *	10000000 1-byte length word, unaligned, TOAST pointer
- *	1xxxxxxx 1-byte length word, unaligned, uncompressed data (up to 126b)
+ *	00xxxxxx xxxxxxxx xxxxxxxx xxxxxxxx, uncompressed data (up to 1G)
+ *	01xxxxxx xxxxxxxx xxxxxxxx xxxxxxxx, compressed data (up to 1G)
+ *	10000000 xxxxxxxx, TOAST pointer (struct varatt_external)
+ *	1xxxxxxx, uncompressed data (up to 126b)
  *
  * Bit layouts for varlena headers on little-endian machines:
  *
- *	xxxxxx00 4-byte length word, aligned, uncompressed data (up to 1G)
- *	xxxxxx10 4-byte length word, aligned, *compressed* data (up to 1G)
- *	00000001 1-byte length word, unaligned, TOAST pointer
- *	xxxxxxx1 1-byte length word, unaligned, uncompressed data (up to 126b)
+ *	xxxxxx00 xxxxxxxx xxxxxxxx xxxxxxxx, uncompressed data (up to 1G)
+ *	xxxxxx10 xxxxxxxx xxxxxxxx xxxxxxxx, compressed data (up to 1G)
+ *	00000001 xxxxxxxx, TOAST pointer (struct varatt_external)
+ *	xxxxxxx1, uncompressed data (up to 126b)
+ *
+ * The "xxx" bits are the length of the attribute. It always includes the length
+ * of the varlena header.
  *
- * The "xxx" bits are the length field (which includes itself in all cases).
  * In the big-endian case we mask to extract the length, in the little-endian
- * case we shift.  Note that in both cases the flag bits are in the physically
- * first byte.  Also, it is not possible for a 1-byte length word to be zero;
- * this lets us disambiguate alignment padding bytes from the start of an
- * unaligned datum.  (We now *require* pad bytes to be filled with zero!)
+ * case we shift. Note that in both cases the flag bits are stored in the
+ * physically first byte.
+ *
+ * In the first two cases, where the length is encoded in 30 bits, the varlena
+ * header is aligned to 4 bytes. In the other two cases the header is unaligned.
+ * Padding bytes are required to be filled with zeroes; because a 1-byte header
+ * is never zero, this makes the encoding unambiguous.
+ *
+ * In the second case the first 4 bytes of compressed data store the length
+ * of the uncompressed data.
  *
- * In TOAST pointers the va_tag field (see varattrib_1b_e) is used to discern
- * the specific type and length of the pointer datum.
+ * In the third case the va_tag field (see varattrib_1b_e) is used to discern
+ * the specific type and length of the pointer datum. On disk the "xxx" bits
+ * currently always store sizeof(varatt_external) + 2.
  */
 
 /*
-- 
2.37.1
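
For reviewers, the little-endian layouts in the comment can be sketched as a small standalone decoder. This is only an illustration under stated assumptions: the names header_kind, header_info and decode_le_header are hypothetical and do not exist in postgres.h, the buffer is assumed to point at the first byte of a header rather than at a padding byte, and the bytes are read explicitly in little-endian order so the sketch behaves the same on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not PostgreSQL identifiers. */
typedef enum
{
    HDR_4B_UNCOMPRESSED,    /* xxxxxx00: 4-byte header, uncompressed data */
    HDR_4B_COMPRESSED,      /* xxxxxx10: 4-byte header, compressed data */
    HDR_1B_TOAST_POINTER,   /* 00000001: second byte carries the tag */
    HDR_1B_SHORT            /* xxxxxxx1: 1-byte header, uncompressed data */
} header_kind;

typedef struct
{
    header_kind kind;
    uint32_t    value;      /* total length including the header, or the
                             * tag byte in the TOAST pointer case */
} header_info;

/*
 * Decode a varlena-style header stored in the little-endian layout.
 * Assumes buf points at a header: 4 readable bytes for the 4-byte forms,
 * 2 for a TOAST pointer, 1 for the short form.
 */
static header_info
decode_le_header(const uint8_t *buf)
{
    header_info info;

    assert(buf != NULL);

    if (buf[0] == 0x01)
    {
        /* 00000001 xxxxxxxx: the flag byte is followed by the tag byte */
        info.kind = HDR_1B_TOAST_POINTER;
        info.value = buf[1];
    }
    else if (buf[0] & 0x01)
    {
        /* xxxxxxx1: shift out the flag bit to get the 7-bit length */
        info.kind = HDR_1B_SHORT;
        info.value = (uint32_t) (buf[0] >> 1) & 0x7F;
    }
    else
    {
        /* Assemble the 4-byte word in little-endian order */
        uint32_t word = (uint32_t) buf[0]
                      | ((uint32_t) buf[1] << 8)
                      | ((uint32_t) buf[2] << 16)
                      | ((uint32_t) buf[3] << 24);

        /* Bit 1 of the first byte distinguishes compressed data */
        info.kind = (buf[0] & 0x02) ? HDR_4B_COMPRESSED : HDR_4B_UNCOMPRESSED;
        /* Shift out the two flag bits to get the 30-bit length */
        info.value = word >> 2;
    }
    return info;
}
```

For example, the bytes 0x28 0x00 0x00 0x00 decode as a 4-byte uncompressed header with total length 10, and the single byte 0x0B decodes as a short header with total length 5. The big-endian layouts would mask instead of shift, as the comment describes.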